XMLParser eats my spaces

Question


I am losing significant whitespace from the wiki page I am parsing, and I suspect the parser is responsible. I have this in my Groovy script:

@Grab(group='org.ccil.cowan.tagsoup', module='tagsoup', version='1.2' )
def slurper = new XmlSlurper(new org.ccil.cowan.tagsoup.Parser())
slurper.keepWhitespace = true
inputStream.withStream{ doc = slurper.parse(it) 
println "originalContent = " + doc.'**'.find{ it.@id == 'editpageform' }.'**'.find { it.@name=='originalContent'}.@value
}

Here inputStream is initialized from a GET request to the edit URL of a Confluence wiki page. Inside the withStream block, I do this:

println "originalContent = " + doc.'**'.find{ it.@id == 'editpageform' }.'**'.find { it.@name=='originalContent'}.@value

I noticed that the original content of the page comes back with all of its newlines stripped. Initially I thought it was a server issue, but when I made the same request in my browser and viewed the source, I could see the newlines in the hidden parameter "originalContent". Is there an easy way to disable the whitespace normalization and keep the contents of the field intact? The above was done against an internal Confluence wiki page, but it can most likely be reproduced when editing any arbitrary wiki page.

Setting "slurper.keepWhitespace = true", as above, does not help. Is this down to the underlying Java XMLParser?


java xml xml-parsing groovy

Cliff 30 '12 3:54


stackmagic · Answer 1 · 2012-06-08T16:17:47+0000

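The whitespace may be getting lost in TagSoup rather than in the slurper.

TagSoup has an ignorable-whitespace feature, which the script below sets on the parser.

Can you share some HTML that reproduces the problem? This is what I tested with: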

@Grab(group='org.ccil.cowan.tagsoup', module='tagsoup', version='1.2' )

String html = """\
<html><head><title>test</title></head><body>
<p>
    <form id="editpageform">
        <p>
            <input name="originalContent" value="         ">         

            </input>
        </p>
    </form>
</p>
</body></html>
"""
def inputStream = new ByteArrayInputStream(html.getBytes())

def parser = new org.ccil.cowan.tagsoup.Parser()
parser.setFeature("http://www.ccil.org/~cowan/tagsoup/features/ignorable-whitespace", true)

def slurper = new XmlSlurper(parser)
slurper.keepWhitespace = true

inputStream.withStream{ doc = slurper.parse(it) 
    def parse = { doc.'**'.find{ it.@id == 'editpageform' }.'**'.find { it.@name=='originalContent'} }
    println "originalContent (name)  = '${parse().@name}'"
    println "originalContent (value) = '${parse().@value}'"
    println "originalContent (text)  = '${parse().text()}'"
}

Cliff · Answer 2 · 2012-06-08T17:45:02+0000

The whitespace I am losing is inside the value attribute. Here is a modified example:

@Grab(group='org.ccil.cowan.tagsoup', module='tagsoup', version='1.2' )

String html = """\
<html><head><title>test</title></head><body>
<p>
    <form id="editpageform">
        <p>
            <input name="originalContent" value=" 



                    ">         

            </input>
        </p>
    </form>
</p>
</body></html>
"""
def inputStream = new ByteArrayInputStream(html.getBytes())

def parser = new org.ccil.cowan.tagsoup.Parser()
parser.setFeature("http://www.ccil.org/~cowan/tagsoup/features/ignorable-whitespace", true)

def slurper = new XmlSlurper(parser)
slurper.keepWhitespace = true

inputStream.withStream{ doc = slurper.parse(it) 
    def parse = { doc.'**'.find{ it.@id == 'editpageform' }.'**'.find { it.@name=='originalContent'} }
    println "originalContent (name)  = '${parse().@name}'"
    println "originalContent (value) = '${parse().@value}'"
    println "originalContent (text)  = '${parse().text()}'"
    assert parse().@value.toString().contains('\n') : "Should contain a newline"
}
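
For comparison, here is a minimal sketch of how a plain XML parser treats the same kind of attribute. It uses the JDK's default DOM parser rather than TagSoup, so it only illustrates XML attribute-value normalization (XML 1.0, section 3.3.3) and says nothing definite about what TagSoup does with HTML. The valueOf helper and the sample markup are made up for this illustration. As far as I can tell, keepWhitespace on the slurper only concerns whitespace text between elements, which would explain why it has no effect on attribute values.

```groovy
import javax.xml.parsers.DocumentBuilderFactory

// Hypothetical helper: parse a one-element document with the JDK's default
// XML parser (not TagSoup) and return the root element's 'value' attribute.
def valueOf = { String xml ->
    def builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    def dom = builder.parse(new ByteArrayInputStream(xml.getBytes('UTF-8')))
    dom.documentElement.getAttribute('value')
}

// A literal newline inside the attribute value ('\n' is a real newline here)...
def literalValue = valueOf('<input name="originalContent" value="a\nb"/>')
// ...versus the same newline written as a character reference.
def escapedValue = valueOf('<input name="originalContent" value="a&#10;b"/>')

println "literal newline     -> ${literalValue.inspect()}"
println "character reference -> ${escapedValue.inspect()}"

// An XML parser replaces the literal newline with a space,
// while the character reference survives as a newline.
assert literalValue == 'a b'
assert escapedValue == 'a\nb'
```

If the real page is going through a similar normalization step, the newlines would only survive parsing if the server wrote them as character references, which is not something the slurper can control.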
