
Handling a Unicode String in Delphi Versions <= 2007

 quasiceo 2012-12-19

Background: This question relates to versions of Delphi below 2009 (i.e. without Unicode support built in). I have a specification that requires me to transmit a Unicode-encoded string over a TCP connection, but I do not have Delphi 2009.

Question: Is there a single function or very small library (I don't need too much bulk) that I can use to encode a single string into UTF-8 immediately prior to sending it over the wire? As a second part of my question: if UTF-8 encoded strings are sent back as a response, I guess I would then need another function to get them back into a Delphi string format. I understand the limitations of Unicode support done this way.


What about Utf8ToAnsi and AnsiToUtf8 located in system.pas? – Uwe Raabe Dec 20 '08 at 11:05

5 Answers

Accepted answer (19 votes)

Delphi versions prior to Delphi 2009 do have Unicode support built in. The WideString type has been available since Delphi 4, I think, maybe earlier. WideString isn't as nice as the new UnicodeString type, but it still holds 16-bit Unicode characters, and you can type-cast it to PWideChar to send strings to Unicode API functions. The Windows unit declares most of the "wide" versions of the API functions, and there's nothing to stop you from declaring other functions yourself if you find some missing.

What prior versions don't have is Unicode support in the VCL. For that, you can use the Tnt Unicode controls. They used to be free. Looks like there are a few places where the latest free version is still available: (1), (2).

The JCL has a couple of units for working with Unicode. The JclWideStrings unit has mostly light-weight utility functions. The JclUnicode unit is more complete, but it also includes a sizable resource for determining character properties of all Unicode characters.

With the JCL you have a few choices for classes to hold lists of WideString values. I think Delphi 7 even comes with a class for that.

Don't think that just because you don't have Delphi 2009 you can't write a Unicode program.

If you have a WideString value, and you want to encode it as UTF-8, then call the Utf8Encode function. It will return an AnsiString value, or possibly Utf8String, if your Delphi version declares that type. It's not the same as Delphi 2009's Utf8String type, though. Delphi 2009's will automatically convert to UnicodeString or AnsiString(x) and vice versa in assignment statements. Prior versions just have a single AnsiString type, so you need to keep track for yourself which variables hold UTF-8 data and which hold Ansi data. (Hungarian notation on your variable and parameter names can help you keep track.) And of course, there's also a Utf8Decode function for converting UTF-8 data back to WideString.
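As a minimal sketch of that round trip, assuming a pre-2009 Delphi whose System unit declares UTF8Encode, UTF8Decode and UTF8String: the program name, the variable names (with "ws" and "utf8" prefixes in the spirit of the Hungarian-notation tip) and the sample text are illustrative only.

```delphi
program Utf8RoundTrip;

{$APPTYPE CONSOLE}

var
  wsText: WideString;    // UTF-16 text used inside the program
  utf8Wire: UTF8String;  // UTF-8 bytes, ready to go over the wire
  wsBack: WideString;    // text decoded from a UTF-8 response
begin
  // Build the sample text from code points so that the encoding of
  // this source file does not matter (u-umlaut, sharp s, two CJK characters).
  wsText := 'Gr';
  wsText := wsText + WideChar($00FC) + WideChar($00DF) + 'e ';
  wsText := wsText + WideChar($4E16) + WideChar($754C);

  // WideString -> UTF-8. The result is an ordinary byte string,
  // so Length counts bytes, not characters.
  utf8Wire := UTF8Encode(wsText);
  Writeln('UTF-16 code units: ', Length(wsText));
  Writeln('UTF-8 bytes:       ', Length(utf8Wire));

  // To transmit, hand PAnsiChar(utf8Wire) and Length(utf8Wire)
  // to whatever send routine your socket component provides.

  // UTF-8 -> WideString, as you would do with a received response.
  wsBack := UTF8Decode(utf8Wire);
  if wsBack = wsText then
    Writeln('Round trip preserved the text')
  else
    Writeln('Round trip changed the text');
end.
```

The compiler cannot tell utf8Wire apart from an Ansi string in these versions, which is why the naming convention matters.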

For handling other character encodings, you want to check out Open XML, a free XML library for Delphi. As part of its XML handling, it has support for converting between 70 different encodings.


Prior to Delphi 2009, if you assign a WideString to a String, you get an automatic conversion from Unicode to string=AnsiString, using the current code page of the process. Utf8Decode/Utf8Encode are necessary if you want to use UTF-8 instead of the current code page for string storage. – Arnaud Bouchez Nov 27 '10 at 17:39
That link to the JCL appears to be out of date. Is wiki./index.php?title=JEDI_Code_Library the current equivalent? – Jessica Brown Jan 22 at 18:07

I built a full Unicode application without using Delphi 2009 (prior to its release).

I used the following:

  1. Used WideString as the main string data type.

  2. Used a database component with Unicode support (ADO uses WideString too, but I didn't use it because it doesn't handle Unicode field names).

  3. Used the free TNTControls for the UI. They worked fine, but they are equivalent to the standard controls and don't have as many features as other third-party controls.

  4. Set up a VM with a different language, so I could test the application on a system that doesn't support my language.

  5. Used FastReport as my reporting tool, which supports Unicode too.

I also used DIConverters from Delphi Inspiration to convert a database from Ansi to UTF8 with its functions. You can use it for conversion from/to UTF8, and it's freeware ;-)

There's also an open source project, Delphi Fundamentals, which has useful functions for Unicode.

But I think that if you could use D2009 for full Unicode support, your work would be much easier and faster, because you would not be using the slow WideString data type, and you will find that most third-party vendors either already offer a Unicode version or are working on one.



Use the WideString type and the encoding functions to/from UTF-8 (UTF8Encode/UTF8Decode).

DON'T USE THE STRING TYPE and don't use the Ansi functions. If you do, you lose information.


If OP is indeed aware of limitations of such Unicode support, and all UTF-8 encoded strings in question are convertible to and from current system charset without any loss of information, then this answer is wrong. – mghie Dec 20 '08 at 15:35

Converting a pre-Delphi 2009 application to Unicode is difficult but doable. I'd split it up into 3 tasks.

  1. First, make sure your database handles Unicode strings, preferably with UTF-16 support. Make sure all your database code handles WideStrings correctly, and that the drivers you are using handle them correctly too.
  2. Convert all your business logic from using strings to using WideStrings. It's very easy to miss some, and you won't get any errors, as the compiler will implicitly convert WideString to string if you forget any methods. Also make sure you change all standard string functions to their WideString equivalents. This process needs to extend to any third-party components you may use.
  3. The final part is to change the standard and third-party visual components you may have to WideString equivalents. This needs to be done wherever you'll be displaying strings which may contain Unicode characters.

On top of all that, make sure your testing is thorough and uses Unicode characters which actually use the high byte. If you test using only the Latin character set, you will miss bugs.
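The silent conversion mentioned in step 2 can be seen with a small check like the one below. It is only a sketch: the program and variable names are made up, the CJK test character is an arbitrary choice, and the outcome depends on the code page of the machine it runs on.

```delphi
program ImplicitConversionCheck;

{$APPTYPE CONSOLE}

var
  wsOriginal: WideString;   // text with a character that uses the high byte
  sAnsi: string;            // AnsiString in pre-2009 Delphi
  wsRoundTrip: WideString;
begin
  wsOriginal := 'abc';
  wsOriginal := wsOriginal + WideChar($4E16);

  // This assignment compiles without an error. The text is converted
  // to the current Ansi code page of the process, which may not be
  // able to represent every character.
  sAnsi := wsOriginal;

  // Converting back shows whether anything was lost on the way.
  wsRoundTrip := sAnsi;
  if wsRoundTrip = wsOriginal then
    Writeln('Text survived the Ansi round trip on this system')
  else
    Writeln('Text was altered by the implicit conversion');
end.
```

With test data limited to ASCII, the comparison always succeeds, which is how a forgotten string variable can go unnoticed.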


Why the need for UTF-16 support in the database? UTF-8 would be a much better fit for an Ansi Delphi program, which would most probably be using system conversion functions for Ansi <-> UTF-8. – mghie Dec 20 '08 at 16:00
Use WideString (not reference counted, COM heavy) for UTF-16. Use TUtf8String/string (reference counted, light) for UTF-8; make sure you do something like "type TUtf8String = type string;" to make TUtf8String distinct from but compatible with string. – Jeroen Wiert Pluimers Oct 15 '09 at 12:23

If all you need to do is indeed convert your program-internal strings from the system encoding to UTF-8 and back, then use the library functions that Uwe Raabe mentioned. If you are still on Delphi 4 or 5 (which do not have those functions), you could use the functions that are in GNU gettext for Delphi.

And don't let all the answers about going completely WideString scare you - using UTF-8 as the encoding for data exchange (this is how I understand your question) should be possible in a normal Ansi Delphi program without big problems, as long as you are dealing with data that is 100% representable in your Windows encoding.
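A minimal sketch of that approach, assuming a Delphi version whose System unit provides AnsiToUtf8 and Utf8ToAnsi (the functions from Uwe Raabe's comment). The program name, variable names and sample text are placeholders.

```delphi
program AnsiUtf8Exchange;

{$APPTYPE CONSOLE}

var
  sLocal: string;        // text in the system (Ansi) code page
  utf8Out: UTF8String;   // the same text encoded as UTF-8
  sBack: string;         // a UTF-8 response converted back
begin
  sLocal := 'Hello, world';

  // Encode immediately before sending.
  utf8Out := AnsiToUtf8(sLocal);
  Writeln('Bytes to send: ', Length(utf8Out));

  // Decode a received UTF-8 string into the system code page.
  // Characters that the code page cannot represent are not preserved.
  sBack := Utf8ToAnsi(utf8Out);
  if sBack = sLocal then
    Writeln('Round trip preserved the text')
  else
    Writeln('Round trip changed the text');
end.
```

This keeps the rest of the program on ordinary strings and confines UTF-8 to the point where data enters or leaves.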
