Getting an element using xpath and cheerio

I am trying to write a function in Node.js that gets an element by XPath.

I have the XPath of the desired DOM element, for example:

xpath = '/html/body/div/div[2]/div/h1/span'

My DOM is loaded into cheerio via the fs module (because I have this web page stored locally):

var fs = require('fs')
var cheerio = require('cheerio')

var file = fs.readFileSync( "aaa.html", "utf8" )
var inDom = cheerio.load( file )

Then I iterate over each part of the XPath. For the current element I check its children, and if a child's tag name and position match that part of the XPath, I store it in rez as the matching element and move on to the next part. The code looks like this, but it does not give me what I want: right after I get the first match and set rez to the matching element, in the next iteration of the loop this new element does not seem to have any children.

var rez = inDom('html');
var xpath = inXpath.split( "/" );
for( var i = iterateStart; i < xpath.length; i++ ) {
    var selector = xpath[ i ].split('[')[0];
    var matches = xpath[ i ].match(/\[(.*?)\]/);
    var child = 0;
    if( matches ) {
        child = matches[ 1 ];
    }

    for( var k = 0; k < rez.length; k++ ) {
        var found = false
        var curE = rez[ k ]

        for( var p = 0; p < curE.children.length; p++ ) {
            var curE_child = curE.children[ p ]

            if( curE_child.name = selector ) {
                if( child > 0 ) {
                    child--
                }
                else {
                    rez = curE_child
                    found = true
                    break
                }
            }               
        }
        if( found ) {
            break
        }
    }       
}

Can someone help me fix this code, using only the modules mentioned above (fs and cheerio)?
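The loop above fails for three reasons that are visible in the code itself: curE_child.name = selector assigns instead of comparing, so every child matches (text nodes included); XPath positions start at 1, but the code skips n matches instead of n - 1; and rez = curE_child replaces the array with a single node, so on the next pass rez.length is undefined and the inner loop never runs, which is why the new element seems to have no children. A minimal corrected sketch of the same manual walk follows. It assumes cheerio's underlying node objects (type, name, children, as produced by domhandler) and supports only plain element steps with an optional [n]; findByXpath is a hypothetical name, not a cheerio function.

```javascript
// Minimal sketch: walk a simple absolute XPath such as
// '/html/body/div/div[2]/div/h1/span' over domhandler-style nodes
// ({ type, name, children }), the objects cheerio keeps internally.
// Hypothetical helper; supports plain element steps with an optional [n].
function findByXpath(root, xpath) {
  // '/html/body/div[2]' -> ['html', 'body', 'div[2]'] (drop the empty first part)
  var steps = xpath.split('/').filter(function (s) { return s.length > 0; });
  var context = [root]; // always an array, even when only one node matches

  for (var i = 0; i < steps.length; i++) {
    var m = steps[i].match(/^([\w-]+)(?:\[(\d+)\])?$/);
    if (!m) {
      throw new Error('Unsupported XPath step: ' + steps[i]);
    }
    var name = m[1].toLowerCase();
    var position = m[2] ? parseInt(m[2], 10) : null; // null = no [n], keep all

    var next = [];
    context.forEach(function (node) {
      // Compare with ===; text and comment nodes have no name, so they never match
      var sameName = (node.children || []).filter(function (c) {
        return c.name === name;
      });
      if (position === null) {
        next = next.concat(sameName);
      } else if (position >= 1 && position <= sameName.length) {
        // XPath positions are 1-based and counted separately for each parent
        next.push(sameName[position - 1]);
      }
    });
    context = next;
  }
  return context;
}
```

With cheerio this would be called as findByXpath(inDom.root()[0], '/html/body/div/div[2]/div/h1/span'); the returned array of nodes can be wrapped again with inDom(...) to use the usual cheerio methods such as .text() or .html().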

3 answers

Can you post an example of the HTML page?

Cheerio's API lets you do this with ordinary selectors, for example:

var fs = require('fs')
var cheerio = require('cheerio')

var html = fs.readFileSync('aaa.html', 'utf8')
var $ = cheerio.load(html)
var selector = 'div' // some selector here, to be tuned to the actual HTML page
var parent = $(selector)
var childSelector = 'p' // some other selector
var children = parent.find(childSelector)
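Following this approach, a simple XPath like the one in the question can be translated into an equivalent CSS selector, so no manual tree walk is needed: each XPath / step is a direct-child relationship (CSS >), and an XPath step such as div[2], "the second div child of its parent", is what CSS div:nth-of-type(2) expresses. A minimal sketch, assuming the XPath contains only plain element steps with an optional [n]; xpathToCss is a hypothetical helper, not part of cheerio.

```javascript
// Hypothetical helper (not part of cheerio): converts a simple XPath such as
// '/html/body/div/div[2]/div/h1/span' into an equivalent CSS selector.
// Assumes plain element steps with an optional 1-based [n] only:
// no '//', axes, attribute predicates or text().
function xpathToCss(xpath) {
  if (xpath.indexOf('//') !== -1) {
    throw new Error("'//' (descendant) steps are not supported: " + xpath);
  }
  return xpath
    .split('/')
    // a leading '/' produces an empty first part; drop it
    .filter(function (step) { return step.length > 0; })
    .map(function (step) {
      var m = step.match(/^([\w-]+)(?:\[(\d+)\])?$/);
      if (!m) {
        throw new Error('Unsupported XPath step: ' + step);
      }
      var name = m[1].toLowerCase();
      // XPath 'div[2]' = 2nd <div> child of its parent = CSS 'div:nth-of-type(2)'
      return m[2] ? name + ':nth-of-type(' + m[2] + ')' : name;
    })
    // each XPath '/' step is a direct-child relationship, i.e. CSS '>'
    .join(' > ');
}

console.log(xpathToCss('/html/body/div/div[2]/div/h1/span'));
// html > body > div > div:nth-of-type(2) > div > h1 > span
```

The result is then used as an ordinary cheerio selector, e.g. $(xpathToCss(xpath)).text(); css-select, the selector engine cheerio relies on, supports :nth-of-type.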

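Here is a solution that uses only cheerio, without an XPath library.

The XPath is split into steps and walked one step at a time: a plain step becomes a children() call, and an indexed step such as DIV[2] becomes children('DIV').eq(1), since XPath positions start at 1 while eq() counts from 0.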

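var fs = require('fs')
var cheerio = require('cheerio')

var inXpath = "BODY/DIV[1]/DIV[2]/DIV[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/DIV[3]/DIV[3]/DIV[1]/DIV[1]/DIV[1]/DIV[1]/SPAN[1]"
var body = fs.readFileSync('aaa.html', 'utf8')
var xpath = inXpath.split( "/" );
var dom_body = cheerio.load(body);
// Start from <html>: its children include <body>, the first XPath step
var sss = dom_body('html');
for( var i = 0; i < xpath.length; i++ ) {
    if (xpath[i].indexOf('[') == -1){
        // Plain step: keep every matching child
        sss = sss.children(xpath[i])
    } else {
        // Indexed step: XPath positions are 1-based, .eq() is 0-based
        var selector = xpath[i].split('[')[0];
        var matches = xpath[i].match(/\[(.*?)\]/);
        var index = parseInt(matches[1], 10) - 1;
        sss = sss.children(selector).eq(index)
    }
}
// .html() returns null on an empty selection, so check before trimming
console.log(sss.length ? sss.html().trim() : 'No element found')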

Yes, there is an XPath implementation on npm, the xpath package:

npm install xpath @xmldom/xmldom

Example:

var xpath = require('xpath')
var dom = require('@xmldom/xmldom').DOMParser

var xml = "<book><title>Harry Potter</title></book>"
var doc = new dom().parseFromString(xml, 'text/xml')
var title = xpath.select("//title/text()", doc).toString()
console.log(title)

Source:    https://www.npmjs.org/package/xpath
