Removing italic and bold html tags from a string?

What is a safer way to remove bold and italics than the following?

        String text = "<b>Remove <i>bold</i> and italics</b>";
        System.out.println(text);
        text = text.replaceAll("\\<.*?\\>", ""); //remove all but only want to remove b and i?
        System.out.println(text);

Also, is there a more extensible way (for example, if I want to include other tags such as "strong" or "em", and match case-insensitively so that both "b" and "B" are handled)?

3 answers

You can use this regex:
        <\/?[bi]>

THE CODE:

    String text = "<b>Remove <i>bold</i> and italics</b>"; 
    text = text.replaceAll("<\\/?[bi]>", "");  
    System.out.println(text);

OUTPUT

Remove bold and italics

To match case-insensitively, prefix the pattern with the inline flag (?i), for example (?i)<\/?[bi]>.

EXPLANATION

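  • <      the literal opening angle bracket
  • \/?    an optional slash, so closing tags match as well as opening ones
  • [bi]   exactly one character, either b or i
  • >      the literal closing angle bracket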
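
The same pattern can be extended to the other tags the question mentions. The following is a minimal sketch, not a definitive implementation: the class name TagStripper, the method strip, and the tag list are hypothetical, and the optional attribute part assumes that no attribute value contains a ">" character. A regex is not an HTML parser, so this only suits simple, well-formed input.

```java
import java.util.regex.Pattern;

// Hypothetical helper: strips a fixed set of formatting tags and keeps their text.
final class TagStripper {
    // Add more names to the alternation to strip further tags.
    // (?:\s[^>]*)? tolerates attributes such as <b class="x">; the ">" that must
    // follow the tag name (or the attributes) keeps <br> and <img> from matching.
    private static final Pattern FORMATTING_TAGS = Pattern.compile(
            "</?(?:b|i|strong|em)(?:\\s[^>]*)?>",
            Pattern.CASE_INSENSITIVE);

    static String strip(String html) {
        return FORMATTING_TAGS.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "<B>Remove <i>bold</i> and <EM>italics</EM></b>, keep <br> and <img src=\"x.png\">";
        // prints: Remove bold and italics, keep <br> and <img src="x.png">
        System.out.println(strip(text));
    }
}
```

Compiling the pattern once with Pattern.CASE_INSENSITIVE is equivalent to the (?i) inline flag and avoids recompiling the regex on every call, which String.replaceAll would do.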


You can use Jsoup.clean with a Whitelist (newer jsoup releases name this class Safelist). A Whitelist is extensible: you add the tags and attributes that should be kept, and everything else is stripped.

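According to the javadoc, a Whitelist defines which HTML (elements and attributes) is allowed through the cleaner; everything else is removed.

To customize a Whitelist, use these methods: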

  • addTags (java.lang.String...)
  • addAttributes (java.lang.String, java.lang.String...)
  • addEnforcedAttribute (java.lang.String, java.lang.String, java.lang.String)
  • addProtocols (java.lang.String, java.lang.String, java.lang.String...)

Example:

    String text = "<b>Remove <i>bold</i> and italics</b>";
    System.out.println(text);
    // An empty Whitelist allows no tags at all, so every tag is stripped and only the text remains
    String doc = Jsoup.clean(text, new Whitelist());
    System.out.println(doc);

Output:

<b>Remove <i>bold</i> and italics</b>
Remove bold and italics

To manipulate HTML, it is better to use a parser such as JSoup.

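    // Requires org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element, org.jsoup.select.Elements
    Document doc = Jsoup.parse(content);
    Elements elements = doc.getElementsByTag("b");
    for (Element element : elements) {
        // unwrap() drops the tag but keeps its children; remove() would also delete the text inside
        element.unwrap();
    }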

Do the same for the other tag, replacing "b" with "i".

JSoup also supports CSS selectors, for example doc.select("strong, em, b, i"), which returns the matching Elements.
