If a simple regular expression already solves your problem, then no, there is no more efficient solution. For crawling, the alternative would be to load the entire HTML page into memory as a DOM document and search it with XPath or even XQuery. But if the information is easy to extract with regular expressions, then don't worry about it, especially if you are not familiar with XPath.
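To make the comparison concrete, here is a minimal sketch of both approaches in Python. The sample page and the function names are made up for illustration. It assumes well-formed markup, because the standard library's `xml.etree.ElementTree` is an XML parser and supports only a subset of XPath; real-world HTML usually needs a more tolerant parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (an assumption for this sketch).
PAGE = """<html><body>
<h1>Products</h1>
<a href="/item/1">First</a>
<a href="/item/2">Second</a>
</body></html>"""


def links_with_regex(page):
    # Scan the raw text directly; no document tree is built.
    return re.findall(r'<a\s+href="([^"]*)"', page)


def links_with_dom(page):
    # Parse the whole page into an in-memory tree, then query it
    # with an XPath-style expression.
    root = ET.fromstring(page)
    return [a.get("href") for a in root.findall(".//a")]


print(links_with_regex(PAGE))
print(links_with_dom(PAGE))
```

For a page this simple both functions return the same list of links, and the regular expression gets there without building a tree.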
The power of XPath shows when you need to do complex searches, and it is more elegant than regular expressions for that task (at least in the W3C's opinion). But if you want a quick solution, you have already found it, and it also uses less RAM.
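As an example of a search that depends on document structure, the sketch below selects only the links inside "result" blocks that are marked as in stock. The markup, class names, and function name are hypothetical, and the page is again assumed to be well-formed so that Python's `xml.etree.ElementTree` (which implements only part of XPath) can parse it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; the class names are assumptions for this sketch.
PAGE = """<html><body>
<div class="nav"><a href="/home">Home</a></div>
<div class="result"><a href="/item/1">First</a><span>in stock</span></div>
<div class="result"><a href="/item/2">Second</a></div>
</body></html>"""


def in_stock_links(page):
    root = ET.fromstring(page)
    # Pick <div class="result"> elements that have a <span> child whose
    # text is "in stock", then take the <a> children of those divs.
    query = ".//div[@class='result'][span='in stock']/a"
    return [a.get("href") for a in root.findall(query)]


print(in_stock_links(PAGE))
```

Expressing the same condition with a regular expression would mean matching across nested tags, which is where the tree-based query becomes easier to read and maintain.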