Boost.Locale: Default Encoding under Microsoft Win...

 兰亭文艺 2019-12-13

All modern operating systems use Unicode.

  • The Unix operating system family uses UTF-8 encoding by default.
  • Microsoft Windows has migrated to the Wide/UTF-16 API. The narrow encodings have been deprecated, and the native OS API became the so-called 'Wide API'.
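The difference is easy to see in code. The minimal C++ sketch below (standard library only) stores the Hebrew word שלום, which the Boost.Filesystem example later uses as a file name, in both encodings. The constant names are hypothetical. The UTF-8 bytes are written as hex escapes so the result does not depend on the compiler's execution character set.

```cpp
#include <string>

// "שלום" (U+05E9 U+05DC U+05D5 U+05DD) in UTF-8, the Unix default:
// each of these Hebrew letters takes 2 bytes.
const std::string shalom_utf8 = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";

// The same word in UTF-16, the encoding of the Windows 'Wide API':
// each of these letters fits in a single 16-bit code unit.
const std::u16string shalom_utf16 = u"\u05E9\u05DC\u05D5\u05DD";
```

Here shalom_utf8.size() is 8 while shalom_utf16.size() is 4, so code that moves text between the two worlds has to convert it, not just copy it.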

As a result of these radically different approaches, it is very hard to write portable Unicode-aware applications.

Boost.Locale fully supports both the narrow and the wide API. On Windows, the default character encoding is assumed to be UTF-8.

So if the default operating system locale is 'English_USA.1252', the default locale for Boost.Locale on Windows would be 'en_US.UTF-8'.

When the created locale object is installed globally, any library that uses std::codecvt to convert between the narrow API and the native wide API will handle UTF-8 correctly.
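Conceptually, the narrow-to-wide step such a facet performs on Windows is UTF-8 decoding into UTF-16 code units. The following is a hand-written sketch of that step only. It is not Boost's implementation and not a std::codecvt facet. The function name utf8_to_utf16 is hypothetical, and it simply throws std::runtime_error on malformed input where a real facet would return an error result.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t len;
        // The lead byte tells how many bytes the sequence has.
        if (lead < 0x80)              { cp = lead;        len = 1; }
        else if ((lead >> 5) == 0x6)  { cp = lead & 0x1F; len = 2; }
        else if ((lead >> 4) == 0xE)  { cp = lead & 0x0F; len = 3; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + len > in.size()) throw std::runtime_error("truncated UTF-8 sequence");
        // Each continuation byte (10xxxxxx) contributes 6 more bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont >> 6) != 0x2) throw std::runtime_error("invalid continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }
        // Reject overlong forms, surrogate code points, and values above U+10FFFF.
        static const std::uint32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_for_len[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::runtime_error("invalid code point");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));  // one UTF-16 unit
        } else {
            cp -= 0x10000;                              // needs a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

For instance, the eight bytes D7 A9 D7 9C D7 95 D7 9D decode to the four UTF-16 units of שלום, and the four-byte sequence F0 9F 98 80 (U+1F600) becomes the surrogate pair D83D DE00.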

A good example of such a library is Boost.Filesystem v3.

For example:

#include <locale>
#include <boost/locale.hpp>
#include <boost/filesystem/path.hpp>
#include <boost/filesystem/fstream.hpp>

int main()
{
    // Create and install a global locale (UTF-8 narrow encoding by default on Windows)
    std::locale::global(boost::locale::generator().generate(""));
    // Make Boost.Filesystem use it for narrow <-> wide path conversion
    boost::filesystem::path::imbue(std::locale());
    // Now UTF-8 file names work perfectly fine!
    // (The literal must reach the binary as UTF-8, e.g. MSVC's /utf-8 option.)
    boost::filesystem::ofstream hello("שלום.txt");
}

However, this behavior may break existing software that assumes the current encoding is a single-byte encoding such as code page 1252.
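A small illustration of that breakage, assuming a hypothetical legacy routine that equates bytes with characters: the word café is 4 bytes in code page 1252 but 5 bytes in UTF-8, so byte-based length, indexing, or truncation logic silently changes meaning once the narrow encoding becomes UTF-8. All names below are illustrative.

```cpp
#include <cstddef>
#include <string>

// "café" as code page 1252 stores it: é is the single byte 0xE9.
const std::string cafe_cp1252 = "caf\xE9";
// The same word in UTF-8: é becomes the two bytes 0xC3 0xA9.
const std::string cafe_utf8 = "caf\xC3\xA9";

// Hypothetical legacy routine written for a single-byte code page:
// it assumes one byte == one character.
std::size_t legacy_char_count(const std::string& s) { return s.size(); }

// Counting only bytes that are not UTF-8 continuation bytes (10xxxxxx)
// gives the number of code points instead.
std::size_t utf8_code_point_count(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s)
        if ((c & 0xC0) != 0x80) ++n;
    return n;
}
```

Here legacy_char_count returns 4 for the cp1252 string but 5 for the UTF-8 one, while utf8_code_point_count returns 4. Truncating cafe_utf8 to its first 4 bytes would also leave a dangling 0xC3 lead byte.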

The boost::locale::generator class has a use_ansi_encoding() property that switches back to the legacy behavior and selects the ANSI code page as the default system encoding.

So, when the current locale is 'English_USA.1252' and use_ansi_encoding is turned on, the default locale would be 'en_US.windows-1252'.
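The naming rule in the two 'English_USA.1252' examples can be summarized by the following hypothetical C++ function. It is only a sketch: the function name and the two-entry lookup table are assumptions for illustration, and Boost.Locale itself obtains the language, country, and ANSI code page from Windows rather than from a table like this.

```cpp
#include <map>
#include <string>

// Hypothetical sketch of the default-locale naming rule described above.
// use_ansi mirrors the generator's use_ansi_encoding() property.
std::string default_boost_locale_name(const std::string& windows_name, bool use_ansi) {
    // Illustrative table only: Windows "Language_Country" -> "ll_CC".
    static const std::map<std::string, std::string> known = {
        {"English_USA", "en_US"},
        {"Hebrew_Israel", "he_IL"},
    };
    // Split e.g. "English_USA.1252" into "English_USA" and code page "1252".
    const std::string::size_type dot = windows_name.find('.');
    const std::string base = windows_name.substr(0, dot);
    const std::string codepage =
        dot == std::string::npos ? "" : windows_name.substr(dot + 1);

    // Throws std::out_of_range for names missing from the table.
    const std::string& lang_country = known.at(base);
    if (use_ansi && !codepage.empty())
        return lang_country + ".windows-" + codepage;  // legacy ANSI behavior
    return lang_country + ".UTF-8";                    // default behavior
}
```

With this sketch, default_boost_locale_name("English_USA.1252", false) yields "en_US.UTF-8" and default_boost_locale_name("English_USA.1252", true) yields "en_US.windows-1252", matching the two cases in the text.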

Note:
The winapi backend does not support ANSI encodings, so UTF-8 is always used for narrow characters.
