How to exclude substring from string using regular expression?

I have string input in the following two forms.

1.

<!--XYZdfdjf., 15456, hdfv.4002-->
<!DOCTYPE

2.

<!--XYZdfdjf., 15456, hdfv.4002
<!DOCTYPE

I want a match when the input has form 2 and no match when it has form 1. In other words, the regular expression should accept any characters between <!-- and <!DOCTYPE, except when --> occurs between them.

I am using Pattern and Matcher from java.util.regex. I am looking for a regular expression that can be passed to Pattern.compile().

Thanks in advance.

+3
5 answers
Pattern p = Pattern.compile("(?s)<!--(?:(?!-->).)*<!DOCTYPE");

(?:(?!-->).)* matches one character at a time, after checking that it is not the first character of -->.

(?s) turns on DOTALL (a.k.a. single-line) mode, which lets . match line terminators as well.

If the input may contain more than one match, replace * with the reluctant *?, like this:

"(?s)<!--(?:(?!-->).)*?<!DOCTYPE"

With *?, the match stops at the first <!DOCTYPE it finds rather than the last.
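
As a rough check, the reluctant pattern above can be run against both forms from the question. This is only a sketch: the class name CommentMatchDemo and the helper hasUnclosedComment are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Pattern;

public class CommentMatchDemo {
    // (?s) lets '.' cross line breaks; (?:(?!-->).)*? consumes one character
    // at a time, but only if "-->" does not start at that position.
    static final Pattern UNCLOSED =
        Pattern.compile("(?s)<!--(?:(?!-->).)*?<!DOCTYPE");

    // Hypothetical helper: true when "<!--" reaches "<!DOCTYPE" without "-->".
    static boolean hasUnclosedComment(String input) {
        return UNCLOSED.matcher(input).find();
    }

    public static void main(String[] args) {
        String form1 = "<!--XYZdfdjf., 15456, hdfv.4002-->\n<!DOCTYPE";
        String form2 = "<!--XYZdfdjf., 15456, hdfv.4002\n<!DOCTYPE";
        System.out.println(hasUnclosedComment(form1)); // false
        System.out.println(hasUnclosedComment(form2)); // true
    }
}
```

find() is used rather than matches() because the pattern only needs to occur somewhere in the input, not span all of it.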

+4

Alternatively, you could skip the regex and simply use String.contains():

if (yourHtml.contains("-->")) {
    // exclude
} else {
    // extract the content you need
    String content = 
        yourHtml.substring("<!--".length(), yourHtml.indexOf("<!DOCTYPE"));
}


+3
\<!--([\s\S](?!--\>))*?(?=\<\!DOCTYPE)

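This uses a negative lookahead to stop before -->, and a positive lookahead for <!DOCTYPE, so <!DOCTYPE itself is not included in the match (a lookahead consumes no characters).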
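
To illustrate that the trailing lookahead leaves <!DOCTYPE out of the matched text, here is a small sketch using the same regex with Java string escaping. The class LookaheadDemo and the method commentPrefix are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Same regex as above, with backslashes doubled for a Java string literal.
    static final Pattern P =
        Pattern.compile("\\<!--([\\s\\S](?!--\\>))*?(?=\\<\\!DOCTYPE)");

    // Hypothetical helper: returns the matched text, or null if nothing matches.
    static String commentPrefix(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String form2 = "<!--XYZdfdjf., 15456, hdfv.4002\n<!DOCTYPE";
        String matched = commentPrefix(form2);
        // The match ends just before "<!DOCTYPE" because the lookahead is zero-width.
        System.out.println(matched.endsWith("<!DOCTYPE")); // false
    }
}
```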

+2

I don't have a test system at hand, so I can't give you a tested regular expression, but look in the Pattern documentation for what is called a negative lookahead assertion. It lets you express rules of the form: match this only if it is not followed by that.

That should help you :)
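
For a minimal picture of a negative lookahead on its own, the sketch below uses the toy pattern foo(?!bar), which is not from the question; the class and method names are likewise invented for illustration.

```java
import java.util.regex.Pattern;

public class NegativeLookaheadDemo {
    // Matches "foo" only when it is NOT immediately followed by "bar".
    static final Pattern FOO_NOT_BAR = Pattern.compile("foo(?!bar)");

    // Hypothetical helper wrapping Matcher.find().
    static boolean found(String input) {
        return FOO_NOT_BAR.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("foobar")); // false
        System.out.println(found("foobaz")); // true
    }
}
```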

+1

A regular expression may not be the best answer to your problem. Have you tried splitting the first line off from everything else and checking whether it contains -->?

In particular, something like:

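// htmlString is assumed to hold the input text
String firstLine = htmlString.split("\r?\n")[0];
if (firstLine.contains("-->")) {
    // form 1: the comment is closed on the first line, so no match
} else {
    // form 2: the comment is left open, so this is a match
}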
+1
