Extract class name from beautifulsoup python tag
I have the following HTML code:
<td class="image">
<a href="/target/tt0111161/" title="Target Text 1">
<img alt="target img" height="74" src="img src url" title="image title" width="54"/>
</a>
</td>
<td class="title">
<span class="wlb_wrapper" data-caller-name="search" data-size="small" data-tconst="tt0111161">
</span>
<a href="/target/tt0111161/">
Other Text
</a>
<span class="year_type">
(2013)
</span>
I am trying to use a beautiful soup to parse some elements in a tab delimited file. I got a lot of help and had:
for td in soup.select('td.title'):
span = td.select('span.wlb_wrapper')
if span:
print span[0].get('data-tconst') # To get `tt0082971`
Now I want to get "Target Text 1".
I tried some things, such as the text above, for example:
for td in soup.select('td.image'): #trying to select the <td class="image"> tag
img = td.select('a.title') #from inside td I now try to look inside the a tag that also has the word title
if img:
print img[2].get('title') #if it finds anything, then I want to return the text in class 'title'
+5
2 answers
If you are trying to get different td values depending on the class (for example, td class = "image" and td class = "title", you can use beautiful soup as a dictionary to get different classes.
This will find all td class = "image" in the table.
from bs4 import BeautifulSoup
page = """
<table>
<tr>
<td class="image">
<a href="/target/tt0111161/" title="Target Text 1">
<img alt="target img" height="74" src="img src url" title="image title" width="54"/>
</a>
</td>
<td class="title">
<span class="wlb_wrapper" data-caller-name="search" data-size="small" data-tconst="tt0111161">
</span>
<a href="/target/tt0111161/">
Other Text
</a>
<span class="year_type">
(2013)
</span>
</td>
</tr>
</table>
"""
soup = BeautifulSoup(page)
tbl = soup.find('table')
rows = tbl.findAll('tr')
for row in rows:
cols = row.find_all('td')
for col in cols:
if col.has_attr('class') and col['class'][0] == 'image':
hrefs = col.find_all('a')
for href in hrefs:
print href.get('title')
elif col.has_attr('class') and col['class'][0] == 'title':
spans = col.find_all('span')
for span in spans:
if span.has_attr('class') and span['class'][0] == 'wlb_wrapper':
print span.get('data-tconst')
+3