How can I emulate ":contains" using BeautifulSoup?

I am working on a project where I need to do a little scraping. The project runs on Google App Engine, and we are currently using Python 2.5. Ideally we would use PyQuery, but because we are on App Engine and Python 2.5 that is not an option.

I have seen questions like the one about finding an HTML tag with specific text, but they don't quite hit the mark.

I have HTML that looks like this:

<div class="post">
    <div class="description">
        This post is about <a href="http://www.wikipedia.org">Wikipedia.org</a>
    </div>
</div>
<!-- More posts of similar format -->

In PyQuery, I could do something like this (as far as I know):

s = pq(html)
s(".post:contains('This post is about Wikipedia.org')")
# returns all posts containing that text

Naively, I thought I could do something like this in BeautifulSoup:

soup = BeautifulSoup(html)
soup.findAll(True, "post", text=("This post is about Google.com"))
# []

However, this returned no results. I changed my query to use a regular expression and got a bit further, but still no luck:

soup.findAll(True, "post", text=re.compile(".*This post is about.*Google.com.*"))
# []

It works if I omit Google.com, but then I have to do all the filtering manually. Is there a way to emulate :contains using BeautifulSoup?

Alternatively, is there a PyQuery-like library that can be used on App Engine (on Python 2.5)?


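From the BeautifulSoup docs:

"text is an argument that lets you search for NavigableString objects instead of Tags"

That is to say, your code:

soup.findAll(True, "post", text=re.compile(".*This post is about.*Google.com.*"))

is not the same as: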

regex = re.compile('.*This post is about.*Google.com.*')
[post for post in soup.findAll(True, 'post') if regex.match(post.text)]

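The reason it works when you omit Google.com is that the BeautifulSoup tree contains one NavigableString for "This post is about", while "Google.com" is a separate NavigableString inside the a element, so no single string contains both.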
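To see the split for yourself, here is a small sketch. It assumes the newer bs4 package (BeautifulSoup 4) rather than the 3.x version used in the question, and the one-line HTML string is a condensed stand-in for the markup above, not the original page.

```python
import re
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4, not the 3.x used above

# Condensed, hypothetical version of one post from the question
html = ('<div class="post"><div class="description">'
        'This post is about <a href="http://www.google.com">Google.com</a>'
        '</div></div>')
soup = BeautifulSoup(html, "html.parser")

# Every text node in the tree, in document order
strings = [str(s) for s in soup.strings]
print(strings)  # ['This post is about ', 'Google.com']

# A string search is tested against each text node on its own,
# so a pattern spanning both nodes finds nothing
spanning = soup.find_all(string=re.compile("This post is about.*Google"))
print(spanning)  # []

# A pattern that fits inside one node does match
partial = soup.find_all(string=re.compile("This post is about"))
print(len(partial))  # 1
```

The text before the link and the link text are two separate nodes, which is why only the shorter pattern matches.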

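Incidentally, post.text exists but is not documented, so I would not rely on it either; I wrote that code by accident! Use some other means of joining together all the text under post.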
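One way to join the text with documented calls is sketched below. It assumes the bs4 package (BeautifulSoup 4), where get_text() is documented; the helper name posts_containing and the two-post sample HTML are made up for illustration. On BeautifulSoup 3 under Python 2.5, something like ''.join(post.findAll(text=True)) should play the same role as get_text().

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4, not the 3.x used above

# Hypothetical sample: two posts in the format shown in the question
HTML = """
<div class="post">
    <div class="description">
        This post is about <a href="http://www.wikipedia.org">Wikipedia.org</a>
    </div>
</div>
<div class="post">
    <div class="description">
        This post is about <a href="http://www.google.com">Google.com</a>
    </div>
</div>
"""

def posts_containing(soup, needle):
    """Return every element with class "post" whose combined text contains needle."""
    # Collapse whitespace runs so line breaks in the markup do not matter
    wanted = " ".join(needle.split())
    matches = []
    for post in soup.find_all(True, class_="post"):
        # get_text() concatenates all text nodes under the tag
        text = " ".join(post.get_text().split())
        if wanted in text:
            matches.append(post)
    return matches

soup = BeautifulSoup(HTML, "html.parser")
found = posts_containing(soup, "This post is about Google.com")
print(len(found))  # 1
```

This does a plain substring test, which is roughly what :contains does; swapping in a compiled regex and regex.search(text) would work the same way.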
