Parsing HTML with Jsoup and removing spans with a specific style

I am writing an application for a friend, but I ran into a problem: the website contains elements like this:

<span style="display:none">&amp;0000000000000217000000</span>

We have no idea what they are, but I need to remove them because my application displays their contents.

Is there any way to check whether such an element is present in an Elements collection and remove it? I parse the page with a for-each loop, but I can't figure out how to remove this element effectively.

Thanks.

3 answers

If you want to remove these spans completely based on their style attribute, try this code:

String html = "<span style=\"display:none\">&amp;0000000000000217000000</span>";
html += "<span style=\"display:none\">&amp;1111111111111111111111111</span>";
html += "<p>Test paragraph should not be removed</p>";

Document doc = Jsoup.parse(html);

doc.select("span[style*=display:none]").remove();

System.out.println(doc);

Here is the result:

<html>
 <head></head>
 <body>
  <p>Test paragraph should not be removed</p>
 </body>
</html>
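
Since the question mentions a for-each loop, here is a minimal sketch of the same idea done inside a loop. It assumes the hidden spans may be written with extra whitespace or different casing (for example "display: none"), which the substring selector above would miss when a space follows the colon. The class name HiddenSpanFilter and the method visibleText are hypothetical names chosen for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HiddenSpanFilter {

    // Hypothetical helper: returns the body text with hidden spans dropped.
    static String visibleText(String html) {
        Document doc = Jsoup.parse(html);

        // select() returns a separate list, so detaching elements from the
        // document while iterating over that list is safe.
        for (Element span : doc.select("span[style]")) {
            // Normalise the inline style: lower-case it and strip whitespace,
            // so "display: none" and "DISPLAY:NONE" are treated alike.
            String style = span.attr("style").toLowerCase().replaceAll("\\s+", "");
            if (style.contains("display:none")) {
                span.remove(); // detach the span from the document
            }
        }
        return doc.body().text();
    }

    public static void main(String[] args) {
        String html = "<span style=\"display: none\">&amp;0000000000000217000000</span>"
                + "<p>Test paragraph should not be removed</p>";
        System.out.println(visibleText(html));
    }
}
```

This only looks at inline style attributes; spans hidden through a CSS class or an external stylesheet would not be caught.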

Just try the following:

//Assuming you have all the data in a Document called doc:
String cleanData = doc.select("query").text();

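The .text() method strips the HTML tags and returns only the text content. There is also ownText(), which returns only the text that belongs directly to the element, excluding the text of its child elements.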


With Jsoup you can read an element's inner HTML and write it back after modifying it:

Elements elements = doc.select("span");
for(Element e : elements) {
    e.html( e.html().replaceAll("&amp;","") );
}

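This strips the &amp; from each span but leaves the digits that follow it.

Note that &amp; is the HTML entity for &, which is a reserved character in HTML.

To remove the digits as well: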

// eliminate ampersand and all trailing numbers
e.html( e.html().replaceAll("&amp;[0-9]*","") );

For more information on regular expressions, see the Javadoc for java.util.regex.Pattern.
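
To see what the pattern above actually matches, here is a small standalone sketch that uses only java.util.regex, with no Jsoup involved. The class name MarkerStripper and the method strip are hypothetical; the input strings stand in for whatever e.html() would return.

```java
import java.util.regex.Pattern;

public class MarkerStripper {

    // Literal "&amp;" followed by zero or more ASCII digits.
    private static final Pattern MARKER = Pattern.compile("&amp;[0-9]*");

    // Hypothetical helper: removes every match of MARKER from the given markup.
    static String strip(String innerHtml) {
        return MARKER.matcher(innerHtml).replaceAll("");
    }

    public static void main(String[] args) {
        // The whole marker is matched, so nothing is left.
        System.out.println("[" + strip("&amp;0000000000000217000000") + "]"); // prints "[]"
        // Text around the marker is kept.
        System.out.println("[" + strip("a&amp;12b") + "]");                   // prints "[ab]"
    }
}
```

Because `*` also allows zero digits, the pattern removes an ordinary &amp; as well, so "Tom &amp; Jerry" would lose its ampersand. Using `&amp;[0-9]+` instead would only touch an ampersand that is followed by at least one digit.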
