Encode / Decode std::string to UTF-16

I need to process a file format (both read from it and write to it) in which strings are encoded in UTF-16 (2 bytes per code unit). Since characters outside the ASCII table are rarely used in the application domain, all strings in my C++ model classes are stored in std::string instances (UTF-8 encoded).

I am looking for a library (I searched the STL and Boost without any luck) or a set of C/C++ functions to handle this std::string ↔ UTF-16 conversion when loading from or saving to the file format (actually modeled as a bytestream), including generation/recognition of surrogate pairs and all that Unicode stuff (I am by no means a specialist)...

Any suggestions? Thanks!

EDIT: I forgot to mention that it must be cross-platform (Win/Mac) and cannot use C++11.

+5
2 answers

C++11 has this functionality:

std::string s = u8"Hello, World!";

// #include <locale>   // std::wstring_convert, std::codecvt
// #include <codecvt>
std::wstring_convert<std::codecvt<char16_t,char,std::mbstate_t>,char16_t> convert;

std::u16string u16 = convert.from_bytes(s);
std::string u8 = convert.to_bytes(u16);

However, as far as I know, the only implementation that has this so far is libc++. C++11 also has std::codecvt_utf8_utf16<char16_t>, which has some other implementations. In particular, codecvt_utf8_utf16 works in VS 2010 and above, and since wchar_t is used by Windows to represent UTF-16, you can use this to convert between UTF-8 and Windows' native encoding.
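
As a rough illustration of the codecvt_utf8_utf16 route, here is a minimal sketch assuming a C++11 compiler. The helper names utf8_to_utf16 and utf16_to_utf8 and the typedef are made up for this example and are not part of any library. Note that <codecvt> and std::wstring_convert were later deprecated in C++17, so newer compilers may warn.

```cpp
#include <codecvt>  // std::codecvt_utf8_utf16 (C++11; deprecated since C++17)
#include <locale>   // std::wstring_convert
#include <string>

// Hypothetical typedef and helper names, chosen for this sketch only.
typedef std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t>
    Utf8Utf16Converter;

// UTF-8 bytes -> UTF-16 code units. Code points above U+FFFF come out as
// surrogate pairs. wstring_convert throws std::range_error on bad input
// (no error string was supplied to the converter).
std::u16string utf8_to_utf16(const std::string& utf8) {
    Utf8Utf16Converter convert;
    return convert.from_bytes(utf8);
}

// UTF-16 code units -> UTF-8 bytes. Surrogate pairs are recombined into a
// single 4-byte UTF-8 sequence.
std::string utf16_to_utf8(const std::u16string& utf16) {
    Utf8Utf16Converter convert;
    return convert.to_bytes(utf16);
}
```

For instance, utf8_to_utf16("\xF0\x9F\x98\x80") (U+1F600 in UTF-8) should yield the two code units 0xD83D 0xDE00. Byte order for the file format is a separate concern: the result is a sequence of 16-bit code units, not bytes.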


"The specialization codecvt<char16_t, char, mbstate_t> converts between the UTF-16 and UTF-8 encoding schemes, and the specialization codecvt<char32_t, char, mbstate_t> converts between the UTF-32 and UTF-8 encoding schemes."

[locale.codecvt] 22.4.1.4/3


Note that std::codecvt has a protected destructor, so it cannot be used with wstring_convert directly; it needs a small adapter such as:

template <class Facet>
class usable_facet : public Facet {
public:
    using Facet::Facet; // inherit constructors
    ~usable_facet() {}

    // workaround for compilers without inheriting constructors:
    // template <class ...Args> usable_facet(Args&& ...args) : Facet(std::forward<Args>(args)...) {}
};

template<typename internT, typename externT, typename stateT> 
using codecvt = usable_facet<std::codecvt<internT, externT, stateT>>;

// The second argument is needed: wstring_convert's element type defaults to wchar_t.
std::wstring_convert<codecvt<char16_t,char,std::mbstate_t>,char16_t> convert;
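
Since the question rules out C++11, the surrogate-pair arithmetic can also be written by hand. The following is only a sketch under stated assumptions: the bytestream is taken to be little-endian UTF-16 without a byte-order mark, and the names Bytes, put_unit, utf8_to_utf16le and utf16le_to_utf8 are hypothetical. It uses nothing beyond C++03.

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<unsigned char> Bytes;  // hypothetical bytestream type

// Append one UTF-16 code unit, low byte first (little-endian is an assumption).
static void put_unit(Bytes& out, unsigned long unit) {
    out.push_back(static_cast<unsigned char>(unit & 0xFF));
    out.push_back(static_cast<unsigned char>((unit >> 8) & 0xFF));
}

// UTF-8 -> UTF-16LE bytes. Returns false on malformed input; in that case
// `out` may already hold the units converted before the error.
bool utf8_to_utf16le(const std::string& in, Bytes& out) {
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned long cp = static_cast<unsigned char>(in[i++]);
        int extra = 0;          // number of continuation bytes expected
        unsigned long min = 0;  // smallest code point allowed for this length
        if (cp < 0x80)                { extra = 0; }
        else if ((cp & 0xE0) == 0xC0) { extra = 1; cp &= 0x1F; min = 0x80; }
        else if ((cp & 0xF0) == 0xE0) { extra = 2; cp &= 0x0F; min = 0x800; }
        else if ((cp & 0xF8) == 0xF0) { extra = 3; cp &= 0x07; min = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte
        for (; extra > 0; --extra) {
            if (i >= in.size()) return false;  // truncated sequence
            unsigned char c = static_cast<unsigned char>(in[i++]);
            if ((c & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, out-of-range values and encoded surrogates.
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        if (cp < 0x10000) {
            put_unit(out, cp);
        } else {  // split into a high/low surrogate pair
            cp -= 0x10000;
            put_unit(out, 0xD800 + (cp >> 10));
            put_unit(out, 0xDC00 + (cp & 0x3FF));
        }
    }
    return true;
}

// UTF-16LE bytes -> UTF-8. Returns false on odd length or unpaired surrogates.
bool utf16le_to_utf8(const Bytes& in, std::string& out) {
    if (in.size() % 2 != 0) return false;
    for (std::size_t i = 0; i < in.size(); i += 2) {
        unsigned long cp = in[i] | (static_cast<unsigned long>(in[i + 1]) << 8);
        if (cp >= 0xDC00 && cp <= 0xDFFF) return false;  // low surrogate first
        if (cp >= 0xD800 && cp <= 0xDBFF) {              // high surrogate
            if (i + 3 >= in.size()) return false;        // no room for the low half
            unsigned long lo = in[i + 2] | (static_cast<unsigned long>(in[i + 3]) << 8);
            if (lo < 0xDC00 || lo > 0xDFFF) return false;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            i += 2;  // the low surrogate has been consumed as well
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return true;
}
```

If the file format turns out to be big-endian or to carry a BOM, only put_unit and the two places that assemble a code unit from a byte pair would need to change.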
+11

What about Boost.Locale? It supports conversion between UTF encodings and integrates with IOStreams.

+4
