Let’s say you need to write an XML file with this content:

    <?xml version="1.0" encoding="UTF-8"?>
    <root description="this is a naïve example">
    </root>
How do we write that in C++? At first glance, you could be tempted to write it like this:

    #include <fstream>
    #include <string>

    int main()
    {
        std::ofstream testFile;

        testFile.open("demo.xml", std::ios::out | std::ios::binary);

        // Narrow string literal: the bytes depend on the compiler's character set.
        std::string text =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            "<root description=\"this is a naïve example\">\n</root>";

        testFile << text;

        testFile.close();

        return 0;
    }
When you open the file in IE, for instance, surprise! It’s not rendered correctly. So you could be tempted to say “let’s switch to wstring and wofstream”.

    #include <fstream>
    #include <string>

    int main()
    {
        std::wofstream testFile;

        testFile.open("demo.xml", std::ios::out | std::ios::binary);

        // Wide string literal, written through a wide stream.
        std::wstring text =
            L"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            L"<root description=\"this is a naïve example\">\n</root>";

        testFile << text;

        testFile.close();

        return 0;
    }
And when you run it and open the file again, no change. So, where is the problem? Well, the problem is that neither ofstream nor wofstream writes the text in UTF-8 format. An ofstream writes the bytes of the string as they are, and a wofstream converts wide characters through its locale, which by default does not produce UTF-8. If you want the file to really be in UTF-8 format, you have to encode the output buffer in UTF-8 yourself.

To do that we can use WideCharToMultiByte(). This Windows API maps a wide character string to a new character string (which is not necessarily from a multibyte character set). The first argument indicates the code page. For UTF-8 we need to specify CP_UTF8.

The following helper functions encode a std::wstring into a UTF-8 stream, wrapped into a std::string.

    #include <windows.h>
    #include <string>

    std::string to_utf8(const wchar_t* buffer, int len)
    {
        // First call: ask how many bytes the UTF-8 output needs.
        int nChars = ::WideCharToMultiByte(
            CP_UTF8, 0, buffer, len,
            NULL, 0, NULL, NULL);
        if (nChars == 0) return "";

        std::string newbuffer;
        newbuffer.resize(nChars);

        // Second call: do the conversion into the resized buffer.
        ::WideCharToMultiByte(
            CP_UTF8, 0, buffer, len,
            &newbuffer[0], nChars, NULL, NULL);

        return newbuffer;
    }

    std::string to_utf8(const std::wstring& str)
    {
        return to_utf8(str.c_str(), (int)str.size());
    }
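As a side note, to see what encoding in UTF-8 means for the ï in the example, here is a minimal, portable sketch that encodes a single Unicode code point by hand. The function name `encode_utf8` is made up for this illustration; it is not part of the Windows API and it does not replace the call to WideCharToMultiByte().

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper, for illustration only: encode one Unicode code point
// as UTF-8. Returns an empty string for surrogates (U+D800..U+DFFF) and for
// values above U+10FFFF, which have no valid UTF-8 form.
std::string encode_utf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx (plain ASCII)
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out; // surrogate
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For ï, which is U+00EF, `encode_utf8(0xEF)` returns the two bytes 0xC3 0xAF, while every ASCII character in the XML stays a single byte. That two-byte sequence is what a UTF-8 aware reader expects to find in the file.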
With the to_utf8() helpers in hand, all you have to do is make the following changes:

    #include <fstream>
    #include <string>

    // to_utf8() is the helper defined above.

    int main()
    {
        std::ofstream testFile;

        testFile.open("demo.xml", std::ios::out | std::ios::binary);

        std::wstring text =
            L"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            L"<root description=\"this is a naïve example\">\n</root>";

        // Convert to UTF-8 bytes before writing to the narrow stream.
        std::string outtext = to_utf8(text);

        testFile << outtext;

        testFile.close();

        return 0;
    }
And now when you open the file, you get what you wanted in the first place. And that is all!
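Looking back at the first attempt, one way to see why the file was rejected is to check whether its bytes form well-formed UTF-8 at all. The sketch below is a simplified structural check, not a full validator, and the name `looks_like_utf8` is invented for this example. It assumes the source file was saved in a single-byte code page such as Windows-1252, so that ï ended up in the output as the single byte 0xEF.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, for illustration only. Checks that every lead byte is
// followed by the right number of continuation bytes (10xxxxxx).
// Simplified: it rejects the always-invalid bytes 0xC0, 0xC1 and 0xF5..0xFF,
// but it does not catch every overlong form or encoded surrogate.
bool looks_like_utf8(const std::string& bytes)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0; // number of continuation bytes expected

        if (lead < 0x80)                       extra = 0; // ASCII
        else if (lead >= 0xC2 && lead <= 0xDF) extra = 1;
        else if (lead >= 0xE0 && lead <= 0xEF) extra = 2;
        else if (lead >= 0xF0 && lead <= 0xF4) extra = 3;
        else return false; // stray continuation byte or invalid lead byte

        if (bytes.size() - i - 1 < extra) return false; // sequence cut short

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false; // not a continuation byte
        }
        i += extra + 1;
    }
    return true;
}
```

Under that assumption, `looks_like_utf8("na\xEFve")` returns false: 0xEF announces a three-byte sequence, but the next byte is the plain letter v. The converted text passes, since `looks_like_utf8("na\xC3\xAFve")` returns true.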