I am writing a basic script that simply extracts all the links from a web page. It is written in Perl and uses the WWW::Mechanize and HTML::TreeBuilder::XPath modules, both of which I installed through CPAN.
I know that this can easily be done with WWW::Mechanize alone, but I would like to learn how to do it using XPath.
So the script parses the entire web page, reads the href attribute of each anchor tag, and prints the link to the console or writes it to a file. Note that the script below does not use strict, since I am only writing it to understand how to use XPath to traverse an HTML tree.
Here is the script:
use WWW::Mechanize;
use HTML::TreeBuilder::XPath;
use warnings;
$url="https://example.com";
$mech=WWW::Mechanize->new();
$mech->get($url);
$tree=HTML::TreeBuilder::XPath->new();
$tree->parse($mech->content);
$nodes=$tree->findnodes(q{'//a'});
foreach $node($nodes)
{
print $node->attr('href');
}
And this gives an error:
Can't locate object method "attr" via package "XML::XPathEngine::Literal" at pagegetter.pl line 23.
I changed the script as follows:
$nodes=$tree->findnodes(q{'//a/@href'});
while($node=$nodes->shift)
{
print $node->attr('href');
}
Error:
Can't locate object method "shift" via package "XML::XPathEngine::Literal"
How do I loop over $nodes to get the href of each link?
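For reference, here is a minimal sketch of what the loop might look like once two likely problems are addressed. First, q{'//a'} keeps the inner single quotes, so the XPath engine sees a string literal rather than a path, which would explain the XML::XPathEngine::Literal in both errors. Second, findnodes in scalar context returns a single node-set object, so foreach over $nodes runs only once; calling it in list context gives the individual nodes. The sub names extract_links and extract_link_values and the inline sample HTML are made up for illustration, and the sample HTML stands in for $mech->content so the sketch runs without network access.

use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper: returns the href of every anchor that has one.
sub extract_links {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse_content($html);    # parses and signals end of input

    # q{//a[@href]} has no inner quotes, so it is evaluated as a path.
    # In list context findnodes returns the matching element nodes.
    my @links = map { $_->attr('href') } $tree->findnodes(q{//a[@href]});

    $tree->delete;                  # free the tree
    return @links;
}

# Hypothetical variant: select the attribute nodes and take their values.
# The stringification forces plain strings before the tree is deleted.
sub extract_link_values {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse_content($html);
    my @links = map { "$_" } $tree->findvalues(q{//a/@href});
    $tree->delete;
    return @links;
}

# Sample input standing in for $mech->content.
my $html = '<html><body><a href="/one">One</a><a name="x">No href</a>'
         . '<a href="/two">Two</a></body></html>';

print "$_\n" for extract_links($html);
print "$_\n" for extract_link_values($html);

With //a/@href the matched nodes are attribute nodes rather than elements, so calling attr('href') on them would not work; findvalues (or getValue on each node) is one way to read them.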