Creating UTF-8 String with Team Developer 3.1

Recently we had to create UTF-8 strings with Team Developer 3.1. Searching the Windows API turned up two promising functions:

MultiByteToWideChar(..) and WideCharToMultiByte(..). The first one converts from any character encoding (ANSI, UTF-8 and many other code pages) to UTF-16 (WideChar). The second one converts UTF-16 to any character encoding.

The conversion therefore takes two steps:
Gupta string --MultiByteToWideChar(..)--> UTF-16 --WideCharToMultiByte(..)--> UTF-8.
I think the names of the two functions are a bit confusing: "MultiByte" means any encoding, including single-byte encodings like ANSI. "WideChar" means the Windows internal UTF-16LE encoding.
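The two-step route can be checked outside of Gupta. The following Python sketch only mirrors the encodings involved; it does not call the Windows API. It assumes that the ANSI code page behind CP_ACP is Windows-1252, and the function names are hypothetical.

```python
# Assumption: CP_ACP resolves to Windows-1252 (typical for Western European systems).
ANSI_CODEC = "cp1252"


def ansi_to_utf16(ansi_bytes: bytes) -> bytes:
    """Step 1: ANSI bytes -> UTF-16LE bytes (the role of MultiByteToWideChar)."""
    return ansi_bytes.decode(ANSI_CODEC).encode("utf-16-le")


def utf16_to_utf8(utf16_bytes: bytes) -> bytes:
    """Step 2: UTF-16LE bytes -> UTF-8 bytes (the role of WideCharToMultiByte)."""
    return utf16_bytes.decode("utf-16-le").encode("utf-8")


def ansi_to_utf8(ansi_bytes: bytes) -> bytes:
    """Both steps chained, like StringToUtf8 in the article."""
    return utf16_to_utf8(ansi_to_utf16(ansi_bytes))


if __name__ == "__main__":
    ansi = b"M\xfcnchen"  # "München" in Windows-1252, ü is the single byte 0xFC
    print(ansi_to_utf8(ansi).hex(" "))  # 4d c3 bc 6e 63 68 65 6e
```

In the result, the single ANSI byte 0xFC has become the two UTF-8 bytes 0xC3 0xBC, which is a handy pattern to look for when checking the Gupta output.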
The only question now is how to use these functions in Gupta. We start with MultiByteToWideChar(..), which takes the following parameters:

_In_      UINT   CodePage = Code page of the source string. In our case this is the Windows default (ANSI) code page, so we use the constant CP_ACP = 0x00.
_In_      DWORD  dwFlags = Controls subtleties of the character conversion. For example, the character "Ä" can be represented in two different ways in Unicode: as a single precomposed Ä (U+00C4) with dwFlags = MB_PRECOMPOSED (0x01), or as a composite character consisting of A and the combining diaeresis ̈ (U+0041 and U+0308) with dwFlags = MB_COMPOSITE (0x02).
_In_      LPCSTR lpMultiByteStr = The source string.
_In_      int    cbMultiByte = Length of the source string in bytes, or -1 if it is null-terminated, which is the case with Gupta strings.
_Out_opt_ LPWSTR lpWideCharStr = Buffer for the target string.
_In_      int    cchWideChar = Length of the target buffer in characters(!), or 0, in which case the function returns the required buffer length.
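The two representations of "Ä" mentioned for dwFlags can be made visible with a few lines of Python. This is only an illustration using the Unicode normalization forms NFC and NFD, which are assumed here to be a close analogy for the precomposed and composite forms; it is not a statement about what the Windows flags do in every case. The helper name is hypothetical.

```python
import unicodedata

precomposed = "\u00c4"   # Ä as a single code point
composite = "A\u0308"    # A followed by the combining diaeresis


def utf16_units(text: str) -> int:
    """Number of 2-byte UTF-16 code units needed for text."""
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    # Both strings render as the same character but differ in length.
    print(utf16_units(precomposed), utf16_units(composite))  # 1 2
    print(precomposed.encode("utf-8").hex(" "))              # c3 84
    print(composite.encode("utf-8").hex(" "))                # 41 cc 88
    # Normalization converts between the two forms.
    print(unicodedata.normalize("NFC", composite) == precomposed)  # True
    print(unicodedata.normalize("NFD", precomposed) == composite)  # True
```

The choice matters for the buffer size: the composite form needs twice as many UTF-16 code units for this character and yields different UTF-8 bytes.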

First you have to call this function with cchWideChar = 0 to get the required length of the buffer in characters(!).
But what does that mean? How many bytes is it?

The answer is simple: a WideChar character is always 2 bytes long, so you have to multiply the return value by 2.

But is that really true? After all, Unicode contains more characters than can be encoded with 2 bytes. A standard example is the treble clef, MUSICAL SYMBOL G CLEF
(U+1D11E), whose UTF-16 encoding is 0xD834 0xDD1E, i.e. four bytes.
The answer is: yes, it really is true. I have tested it. I encoded the treble clef in UTF-8 (0xF0 0x9D 0x84 0x9E) and requested the UTF-16 buffer length with MultiByteToWideChar(..), which returned 2, although it is in fact only one character. So "characters" here means UTF-16 code units of 2 bytes each, not Unicode characters.
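The treble clef test can be reproduced without the Windows API. The Python sketch below counts UTF-16 code units the same way the article interprets the return value; the helper names are hypothetical. The extra unit for the terminator reflects the documented behaviour that, with a source length of -1, the returned length includes the terminating null.

```python
import struct

CLEF = "\U0001D11E"  # MUSICAL SYMBOL G CLEF


def utf16_units(text: str) -> int:
    """Number of 2-byte UTF-16 code units in text (no terminator)."""
    return len(text.encode("utf-16-le")) // 2


def utf16_buffer_bytes(text: str, with_terminator: bool = True) -> int:
    """Buffer size in bytes: code units (plus one for the null) times 2."""
    units = utf16_units(text) + (1 if with_terminator else 0)
    return units * 2


if __name__ == "__main__":
    print(CLEF.encode("utf-8").hex(" "))      # f0 9d 84 9e
    print(CLEF.encode("utf-16-le").hex(" "))  # 34 d8 1e dd
    high, low = struct.unpack("<2H", CLEF.encode("utf-16-le"))
    print(hex(high), hex(low))                # 0xd834 0xdd1e
    print(utf16_units(CLEF))                  # 2
    print(utf16_buffer_bytes(CLEF))           # 6
```

One Unicode character, two code units, four bytes of payload: multiplying the reported length by 2 gives the right buffer size even for characters outside the Basic Multilingual Plane.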


WideCharToMultiByte(..) takes the following parameters:

_In_      UINT    CodePage = Destination code page. For UTF-8 this is CP_UTF8 (65001).
_In_      DWORD   dwFlags = Flags that control the behaviour when a character cannot be converted because the target code page does not contain it.
_In_      LPCWSTR lpWideCharStr = Source string as UTF-16.
_In_      int     cchWideChar = Length of the source string in WideChar characters, or -1 if it is null-terminated.
_Out_opt_ LPSTR   lpMultiByteStr = Buffer for the target string.
_In_      int     cbMultiByte = Length of the target buffer in bytes.
_In_opt_  LPCSTR  lpDefaultChar = Default character that is used when a UTF-16 character is not contained in the target code page. For the conversion to UTF-8 it must be null.
_Out_opt_ LPBOOL  lpUsedDefaultChar = A receive variable that indicates whether a default character was used in the conversion. For the conversion to UTF-8 this parameter must be passed as null. Unfortunately Gupta does not allow that: if you set a receive parameter for an external function to null (e.g. Set bUsedDefaultChar = NUMBER_Null) and pass it to the external function, you get a run-time error. The only workaround is to declare this parameter in the external function declaration as a BOOL and pass a value of 0.
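Why a default character is needed for ANSI targets but not for UTF-8 can be shown with a small Python sketch. It imitates the idea behind lpDefaultChar and lpUsedDefaultChar with Python's own codecs; the function name is hypothetical, and the "?" substitute is Python's replacement character, assumed here only as a stand-in for the Windows default.

```python
from typing import Tuple


def encode_with_default(text: str, codec: str) -> Tuple[bytes, bool]:
    """Encode text; report whether a substitute character had to be used."""
    try:
        return text.encode(codec), False
    except UnicodeEncodeError:
        # The target code page lacks at least one character: substitute "?".
        return text.encode(codec, errors="replace"), True


if __name__ == "__main__":
    text = "G-clef: \U0001D11E"
    print(encode_with_default(text, "cp1252"))  # (b'G-clef: ?', True)
    data, used_default = encode_with_default(text, "utf-8")
    print(used_default)                          # False
```

UTF-8 can represent every Unicode character, so a substitute is never needed there, which fits the rule that both default-character parameters must be null for that target.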


With this knowledge we can write our function:

Function: StringToUtf8  ! __exported
    Description: Converts a Gupta string to UTF-8
    Returns
        String:
    Parameters
        String: sAnsi
    Actions
        Return Utf16ToUtf8( StringToUtf16( sAnsi ) )

Function: StringToUtf16  ! __exported
    Description: Converts a Gupta string (ANSI) to UTF-16
    Returns
        String:
    Parameters
        String: sAnsi
    Local variables
        Number: nBufferUtf16Len
        String: sUtf16
    Actions
        If ( NOT sAnsi )
            Return ''
        Call SalStrSetBufferLength( sUtf16, 1 ) ! Buffer must be 1 or bigger - otherwise we get a Gupta error when calling MultiByteToWideChar(..)
        ! First call with cchWideChar = 0: returns the required buffer length in WideChars
        Set nBufferUtf16Len = MultiByteToWideChar( CP_ACP, MB_PRECOMPOSED, sAnsi, -1, sUtf16, 0 )
        Call SalStrSetBufferLength( sUtf16, nBufferUtf16Len * 2 ) ! Buffer size in "WideChars" - must be multiplied by 2 to get the buffer size in bytes
        Call MultiByteToWideChar( CP_ACP, MB_PRECOMPOSED, sAnsi, -1, sUtf16, nBufferUtf16Len ) ! Here the buffer length is given in characters again
        Return sUtf16

Function: Utf16ToUtf8  ! __exported
    Description: Converts a UTF-16 string to UTF-8
    Returns
        String:
    Parameters
        String: sUtf16
    Local variables
        Number: nBufferUtf8Len
        String: sUtf8
    Actions
        If ( NOT sUtf16 )
            Return ''
        Call SalStrSetBufferLength( sUtf8, 1 ) ! Buffer must be 1 or bigger - otherwise we get a Gupta error when calling WideCharToMultiByte(..)
        ! The target code page must be CP_UTF8 (65001); with CP_ACP the result would be ANSI again
        ! First call with cbMultiByte = 0: returns the required buffer length in bytes
        Set nBufferUtf8Len = WideCharToMultiByte( CP_UTF8, 0, sUtf16, -1, sUtf8, 0, STRING_Null, 0 )
        Call SalStrSetBufferLength( sUtf8, nBufferUtf8Len )
        Call WideCharToMultiByte( CP_UTF8, 0, sUtf16, -1, sUtf8, nBufferUtf8Len, STRING_Null, 0 )
        Return sUtf8

The source code is available for download: UtfStringConverion.apt

Happy coding!


About thomasuttendorfer
I am head of development at the software company [ frevel & fey ] in Munich. We develop business software for publishers, using Gupta Team Developer and Visual Studio.

One Response to Creating UTF-8 String with Team Developer 3.1

  1. Magic Thomas ! Thanks for sharing . Keep up the Gupta expertness !
