Remove non-printable UTF-8 characters, except control characters, from a String

I have a string containing text, control characters, numbers, umlauts (German) and other utf8 characters.

I want to remove all UTF-8 characters that are not "part of the language". Special characters such as (not a complete list) ":/\ßä,;\n\t" must all be preserved.

Sadly, Stack Overflow strips all of these characters, so I have to add an image (link).

Any ideas? Help is much appreciated!

PS: If anyone knows a paste service that does not kill these special characters, I would happily upload the lines. I just could not find one.

[Edit]: I think the pattern "\P{Cc}" matches all the characters that I want to preserve. Can this regular expression be inverted so that all characters not matching it are returned?

+5
2 answers

You have already found Unicode character properties.

You can invert a character property by changing the case of the leading "p".

E.g.

\p{L} matches all letters

\P{L} matches all characters that do not have the letter property.

So, if you think that \P{Cc} is what you need, then \p{Cc} will match the opposite.

More on regular-expressions.info
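As an illustration of the case inversion described above, here is a minimal Java sketch. The class name `PropertyInversionDemo` and its helper methods are hypothetical names chosen for this example; only `java.util.regex.Pattern` comes from the standard library. Non-ASCII characters are written as Unicode escapes so the file compiles regardless of source encoding.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: shows that \P{L} is the inverse of \p{L}.
final class PropertyInversionDemo {
    // \p{L} matches any single character with the Unicode "letter" property.
    private static final Pattern LETTER = Pattern.compile("\\p{L}");
    // \P{L} (upper-case P) matches any single character WITHOUT that property.
    private static final Pattern NOT_LETTER = Pattern.compile("\\P{L}");

    // Deletes every letter, keeping digits, punctuation and whitespace.
    static String removeLetters(String s) {
        return LETTER.matcher(s).replaceAll("");
    }

    // Deletes everything that is not a letter.
    static String keepOnlyLetters(String s) {
        return NOT_LETTER.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String sample = "Gr\u00fc\u00dfe, 123!"; // "Gruesse" spelled with u-umlaut and sharp s
        System.out.println(removeLetters(sample));   // ", 123!"
        System.out.println(keepOnlyLetters(sample)); // the five letters only
    }
}
```

The umlaut and the sharp s count as letters here, so they survive `keepOnlyLetters` and are dropped by `removeLetters`.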

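Be careful, though: \p{Cc} also matches characters you probably want to keep, e.g. the tab (0x09), linefeed (0x0A) and carriage return (0x0D).

But you can build your own character class, like this:

[^\P{Cc}\t\r\n]

[^...] is a negated character class, so this matches everything that is not "not a control character" (a double negation, so it matches control characters) and is also not tab, CR or LF.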
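A minimal sketch of how that character class could be used from Java, assuming the goal is to delete control characters while keeping tab, CR and LF. `ControlCharStripper` and `strip` are hypothetical names for this example; the pattern itself is the one given above, with backslashes doubled for the Java string literal.

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes control characters but keeps tab, CR and LF.
final class ControlCharStripper {
    // [^\P{Cc}\t\r\n] = not (non-control), and not tab, CR or LF,
    // i.e. every control character except those three.
    private static final Pattern UNWANTED = Pattern.compile("[^\\P{Cc}\\t\\r\\n]");

    static String strip(String input) {
        // Strings are immutable, so the cleaned text is the return value.
        return UNWANTED.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // NUL and BEL are control characters; tab and newline must survive.
        String dirty = "a\u0000b\tc\u0007\n";
        System.out.println(strip(dirty).equals("ab\tc\n")); // true
    }
}
```

Letters such as umlauts are not in category Cc, so they pass through unchanged.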

+8

your_string = your_string.replaceAll("\\p{C}", "");
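Two things are worth knowing about this one-liner. `replaceAll` returns a new string, so the result has to be assigned. Also, `\p{C}` covers the whole Unicode "Other" category (control, format, private-use and unassigned characters, among others), so it removes tab and line breaks too. The sketch below, with the hypothetical class `OtherCategoryStripper`, contrasts the plain pattern with a variant that uses Java's character-class intersection (`&&`) to keep tab, CR and LF. Treat it as one possible approach, not the answer's original intent.

```java
// Hypothetical helper contrasting \p{C} with a whitespace-preserving variant.
final class OtherCategoryStripper {
    // Removes every character in Unicode category C, including tab and newlines.
    static String stripAllOther(String s) {
        return s.replaceAll("\\p{C}", "");
    }

    // Removes category C characters except tab, CR and LF,
    // using Java's intersection syntax: [\p{C}&&[^\t\r\n]].
    static String stripOtherKeepWhitespace(String s) {
        return s.replaceAll("[\\p{C}&&[^\\t\\r\\n]]", "");
    }

    public static void main(String[] args) {
        // The sample contains NUL (control) and a zero-width space (format).
        String dirty = "a\u0000b\u200b\tc\n";
        System.out.println(stripAllOther(dirty).equals("abc"));                 // true
        System.out.println(stripOtherKeepWhitespace(dirty).equals("ab\tc\n")); // true
    }
}
```

Because unassigned code points are part of category C, which characters count as unassigned depends on the Unicode version shipped with the JDK in use.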
0
