C++: Strip non-ASCII characters from a string

Before you get started: yes, I know this is a duplicate question, and yes, I looked at the published solutions. My problem is that I could not get them to work.

bool invalidChar (char c)
{ 
    return !isprint((unsigned)c); 
}
void stripUnicode(string & str)
{
    str.erase(remove_if(str.begin(),str.end(), invalidChar), str.end()); 
}

I tested this method on "Prusæus, Ægyptians" and it didn't do anything. I also tried replacing isprint with isalnum.

The real problem arises in another section of my program, where I convert string -> wstring. The conversion fails if there are Unicode characters in the string.

Ref:

How can you strip non-ASCII characters from a string? (in C#)

How to remove all non-alphanumeric characters from a string in C++?

Edit:

I would still like to remove all non-ASCII characters regardless, but in case it helps, this is where it crashes:

// Convert to wstring
wchar_t* UnicodeTextBuffer = new wchar_t[ANSIWord.length()+1];
wmemset(UnicodeTextBuffer, 0, ANSIWord.length()+1);
mbstowcs(UnicodeTextBuffer, ANSIWord.c_str(), ANSIWord.length());
wWord = UnicodeTextBuffer; //CRASH

Error dialog:

Microsoft Visual C++ Debug Library

Debug Assertion Failed!

Program: //myproject

File: f:\dd\vctools\crt_bld\self_x86\crt\src\isctype.c

Line: //

Expression: (unsigned)(c + 1) <= 256
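For the conversion step itself, here is a minimal sketch that avoids the manual new[] (which is never freed above) and checks the return value of mbstowcs. The function name toWide is hypothetical, and the result for bytes above 127 depends on the current C locale, so treat this as one possible shape rather than a drop-in fix:

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: converts a narrow multibyte string to a wide string
// using the current C locale. Returns false if mbstowcs reports an invalid
// multibyte sequence.
bool toWide(const std::string& narrow, std::wstring& wide)
{
    // The number of wide characters never exceeds the number of bytes,
    // so size() + 1 elements always leave room for the terminator.
    std::vector<wchar_t> buffer(narrow.size() + 1, L'\0');

    const std::size_t written =
        std::mbstowcs(buffer.data(), narrow.c_str(), buffer.size());

    if (written == static_cast<std::size_t>(-1))
        return false; // invalid sequence for the current locale

    wide.assign(buffer.data(), written);
    return true;
}
```

The vector owns the temporary buffer, so nothing leaks on either the success path or the failure path.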

Edit:

More information: the strings are read from a .txt file encoded as ANSI.

Solution:

bool invalidChar (char c) 
{  
    return !(c>=0 && c <128);   
} 
void stripUnicode(string & str) 
{ 
    str.erase(remove_if(str.begin(),str.end(), invalidChar), str.end());  
}
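The snippet above assumes the right headers and a using-directive for std. A self-contained variant might look like the sketch below. The names isNonAscii and stripNonAscii are made up for illustration, and the comparison goes through unsigned char, which should give the same result as the c>=0 && c<128 test whether plain char is signed or unsigned:

```cpp
#include <algorithm>
#include <string>

// True for any byte outside the 7-bit ASCII range [0, 127].
// Going through unsigned char makes the test independent of whether
// plain char is signed on the target platform.
bool isNonAscii(char c)
{
    return static_cast<unsigned char>(c) > 127;
}

// Removes every non-ASCII byte from str in place (erase-remove idiom).
void stripNonAscii(std::string& str)
{
    str.erase(std::remove_if(str.begin(), str.end(), isNonAscii), str.end());
}
```

Applied to the Latin-1 bytes of "Prusæus, Ægyptians", this leaves "Prusus, gyptians". If the text is UTF-8 instead, every byte of a multibyte character is above 127, so those characters are dropped in full as well.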

EDIT:

Related functions: __isascii, iswascii


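Your invalidChar function is incorrect. It should be:

return !isprint( static_cast<unsigned char>( c ) );

Casting a char to unsigned gives a very, very large value if the char is negative (UINT_MAX + 1 + c). Passing such a value to isprint is undefined behavior.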
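To make the difference between the two casts concrete, here is a small sketch. The helper names are hypothetical, and signed char is used explicitly so the behavior does not depend on the platform's choice for plain char. It assumes 8-bit bytes:

```cpp
#include <climits>

// What (unsigned)c does for a negative char: the value is converted
// modulo UINT_MAX + 1, producing a huge number.
unsigned viaUnsigned(signed char c)
{
    return static_cast<unsigned>(c);
}

// What static_cast<unsigned char>(c) does: the value is converted
// modulo UCHAR_MAX + 1, so it stays within 0..255 and is safe to pass
// to the <cctype> functions.
unsigned viaUnsignedChar(signed char c)
{
    return static_cast<unsigned char>(c);
}
```

For the byte 0xE6 ('æ' in Latin-1), which is -26 as a signed char, viaUnsigned gives UINT_MAX - 25, while viaUnsignedChar gives 230. Only the second is a valid argument for isprint.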


isprint depends on the locale, so a character only counts as printable if it is printable in the current locale.

If you want strictly ASCII, check that the value is in the range [0..127]. If you want printable ASCII, check the range and also call isprint.
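A minimal sketch of the "range plus isprint" variant, under the assumption that the program runs in the default "C" locale. The names notPrintableAscii and keepPrintableAscii are invented for this example:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// True if c is outside [0, 127] or is not printable in the current locale.
// The range check runs first, and isprint only ever sees an unsigned char
// value, which keeps the call well defined.
bool notPrintableAscii(char c)
{
    const unsigned char uc = static_cast<unsigned char>(c);
    return uc > 127 || !std::isprint(uc);
}

// Keeps only printable ASCII characters in str.
void keepPrintableAscii(std::string& str)
{
    str.erase(std::remove_if(str.begin(), str.end(), notPrintableAscii),
              str.end());
}
```

Unlike the plain range check, this also drops control characters such as tab and newline, so use it only if that is what you want.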
