Porting a Pythonesque library over to .NET

I am exploring the possibility of porting the Python library Beautiful Soup to .NET, mainly because I really like the parser and there simply aren't any good HTML parsers on the .NET platform (the Html Agility Pack is outdated, buggy, undocumented, and does not work well unless the exact schema is known).

One of my main goals is for the basic DOM selection functionality to really parallel the beauty and simplicity of BeautifulSoup, allowing developers to easily craft expressions that find the elements they are looking for.

BeautifulSoup takes advantage of loose binding and named (keyword) parameters to make this happen. For example, to find all a tags with an id of test and a title that contains the word foo, I could do:

soup.find_all('a', id='test', title=re.compile('foo'))
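
For readers less familiar with Python, here is a rough sketch of the mechanism that call relies on. It is not Beautiful Soup's actual implementation: Node, matches and this stand-alone find_all are hypothetical stand-ins, and the nodes are a flat list rather than a parsed document. The point is only that **attrs collects whatever keyword arguments the caller supplies.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a parsed tag (not Beautiful Soup's Tag class)."""
    name: str
    attrs: dict = field(default_factory=dict)


def matches(value, expected):
    """Compare one attribute value against a string or a compiled regex."""
    if value is None:
        return False
    if isinstance(expected, re.Pattern):
        return expected.search(value) is not None
    return value == expected


def find_all(nodes, name=None, **attrs):
    """Return the nodes whose tag name and attributes satisfy every filter.

    **attrs gathers any keyword arguments into a dict, so the signature
    does not need to list attribute names in advance.
    """
    return [
        node for node in nodes
        if (name is None or node.name == name)
        and all(matches(node.attrs.get(key), want) for key, want in attrs.items())
    ]


# Illustrative data only.
nodes = [
    Node("a", {"id": "test", "title": "some foo link"}),
    Node("a", {"id": "test", "title": "bar"}),
    Node("div", {"id": "test", "title": "foo"}),
]
found = find_all(nodes, "a", id="test", title=re.compile("foo"))
```

Here id and title are not declared anywhere in the signature of find_all; they arrive as entries in the attrs dict. That is the part with no direct C# equivalent.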

However, C# has no concept of an arbitrary number of named arguments. The .NET 4 runtime has named parameters, but they must match an existing method prototype.

My question is: What is the C# design pattern that most closely parallels this Pythonic construct?

Some ideas:

I would like to approach this based on how I, as a developer, would want to write the calling code. Implementing it is beyond the scope of this post. One idea is to use anonymous types. Something like:

soup.FindAll("a", new { Id = "Test", Title = new Regex("foo") });

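This loosely mirrors the Python call, but it has some drawbacks:

  • FindAll would have to use reflection to read the properties of the anonymous type.
  • FindAll would have to accept a plain Object, so the signature alone does not tell callers what to pass. As far as I know, a method cannot be declared to require an anonymous type.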

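Another idea, perhaps more idiomatic in .NET but further from the library's Python roots, is a fluent interface. Something like: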

soup.FindAll("a")
    .Attr("id", "Test")
    .Attr("title", new Regex("foo"));

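This would require building up the query from the chained calls and then locating the matching nodes in the DOM.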

A third idea is to use LINQ. Something like:

var nodes = (from n in soup
             where n.Tag == "a" &&
             n["id"] == "Test" &&
             Regex.Match(n["title"], "foo").Success
             select n);

I would appreciate any insight from anyone who has experience porting Python code to C#.


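Have you considered IronPython? As far as I know, it would let you run the existing code on .NET without having to touch the Python.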
