Saturday, November 9, 2013

Python: Extracting an attribute value with beautifulsoup



Previously I demonstrated how to parse div content from HTML source. Now I will show you how to extract an attribute value from a specific tag in a given HTML document.


For this example, let's consider the hidden input whose name attribute is __VIEWSTATE. This field is very important when you are scraping sites built with Microsoft's ASP.NET. In this example I am going to show you how to extract the value of __VIEWSTATE using BeautifulSoup.

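from bs4 import BeautifulSoup  # assumes BeautifulSoup 4 (the bs4 package)

# content holds the HTML source of the page
soup = BeautifulSoup(content, "html.parser")
viewstate = soup.findAll("input", {"type": "hidden", "name": "__VIEWSTATE"})
print(viewstate[0]['value'])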


__VIEWSTATE usually lives in a hidden input field inside a form. So I search for tags named input that have type=hidden and name=__VIEWSTATE. The example code returns all the matches from the HTML as a list.
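
To see what the search returns, here is a small self-contained sketch. The sample HTML and the variable names are made up for illustration. It assumes BeautifulSoup 4 (the bs4 package) and Python's built-in html.parser. In bs4, find_all is the current spelling of findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical page source, made up for this illustration
sample_html = """
<form method="post" action="Default.aspx">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTA4NzczMzUxMjs7Pg==" />
  <input type="hidden" name="other_field" value="abc123" />
  <input type="text" name="username" value="" />
</form>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find_all returns a list-like result containing every matching tag
matches = soup.find_all("input", {"type": "hidden", "name": "__VIEWSTATE"})
match_count = len(matches)

# Each match is a tag object; its attributes are read like dictionary keys
first_value = matches[0]["value"]
print(match_count, first_value)
```

Only the input that satisfies both conditions is matched, so the other hidden field and the text field are ignored.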

Here I have selected the first match (index 0). In your case the value you want may be at a different index if multiple matches are found. Also, it's very important to check the number of matches (using len()) before indexing into the result. Otherwise Python will raise an IndexError (list index out of range) when no matches are found.
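
The length check described above could be wrapped in a small helper, roughly like this. The function name extract_viewstate and the two sample strings are hypothetical, and the sketch again assumes BeautifulSoup 4 with the built-in html.parser. Returning None when nothing matches is just one possible choice.

```python
from bs4 import BeautifulSoup


def extract_viewstate(html):
    """Return the __VIEWSTATE value, or None if the field is missing."""
    soup = BeautifulSoup(html, "html.parser")
    matches = soup.find_all("input", {"type": "hidden", "name": "__VIEWSTATE"})
    # Check the count first so that indexing cannot raise IndexError
    if len(matches) == 0:
        return None
    # .get returns None instead of raising KeyError if value is absent
    return matches[0].get("value")


# Hypothetical inputs, made up for this illustration
with_field = '<form><input type="hidden" name="__VIEWSTATE" value="abc" /></form>'
without_field = '<form><input type="text" name="q" /></form>'

print(extract_viewstate(with_field))
print(extract_viewstate(without_field))
```

The caller can then test the result against None before using it in the next request.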




