A Taiwanese user of ReactOS came into the IRC channel to report a bug. If you set ReactOS's code page to 936, the function RtlMultiByteToUnicodeSize will crash during startup.
I can't code a fix for it myself, but I can describe how it should work. The algorithm goes like this:
- If the code page is not DBCS, don't bother, and just set *UnicodeSize to MbSize * sizeof(WCHAR). This is already done.
- Begin counting with a length of 0.
- While MbSize is not zero:
-- Grab a byte and decrement MbSize.
-- Determine whether it is a DBCS lead byte for the code page.
-- If it is a lead byte:
--- If MbSize is now zero, increment length, set *UnicodeSize to length * sizeof(WCHAR) and return STATUS_SUCCESS. The broken half-character is counted.
--- Otherwise, skip the trail byte (decrement MbSize) and increment length. Two DBCS bytes just became a single Unicode character. We ignore the value of the second byte.
-- If it is not a lead byte:
--- Increment length.
- Set *UnicodeSize to length * sizeof(WCHAR) and return STATUS_SUCCESS.
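To make the steps above concrete, here is a rough, standalone C sketch. It is not the ReactOS source: the names MbToUnicodeSizeSketch, IsLeadByteSketch, WCHAR_T and SKETCH_STATUS_SUCCESS are made up for illustration, the IsDbcs flag stands in for whatever the real code checks, and the lead-byte test simply assumes the 0x81..0xFE range used by code page 936. Real code would consult the lead-byte information in the loaded NLS table instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the NT types so the sketch compiles on its own.
 * These names are hypothetical; the real code uses WCHAR and NTSTATUS. */
typedef uint16_t WCHAR_T;
typedef int32_t  STATUS_T;
#define SKETCH_STATUS_SUCCESS 0

/* Hypothetical lead-byte test. Assumption: 0x81..0xFE are lead bytes,
 * as in code page 936. A real implementation would look this up in the
 * code page's lead-byte table. */
static int IsLeadByteSketch(unsigned char Byte)
{
    return Byte >= 0x81 && Byte <= 0xFE;
}

/* Computes how many bytes of UTF-16 output MbSize bytes of input need. */
static STATUS_T MbToUnicodeSizeSketch(uint32_t *UnicodeSize,
                                      const char *MbString,
                                      uint32_t MbSize,
                                      int IsDbcs)
{
    uint32_t Length = 0;

    /* Single-byte code page: every byte becomes one WCHAR. */
    if (!IsDbcs)
    {
        *UnicodeSize = (uint32_t)(MbSize * sizeof(WCHAR_T));
        return SKETCH_STATUS_SUCCESS;
    }

    while (MbSize != 0)
    {
        /* Grab a byte and decrement MbSize. */
        unsigned char Byte = (unsigned char)*MbString++;
        MbSize--;

        if (IsLeadByteSketch(Byte))
        {
            if (MbSize == 0)
            {
                /* Truncated pair: count the broken half-character. */
                Length++;
                break;
            }

            /* Skip the trail byte; its value is ignored. */
            MbString++;
            MbSize--;
        }

        /* One single byte, or one lead/trail pair, is one character. */
        Length++;
    }

    *UnicodeSize = (uint32_t)(Length * sizeof(WCHAR_T));
    return SKETCH_STATUS_SUCCESS;
}
```

With this sketch, the four bytes 'A', 0x81, 0x40, 'B' count as three characters (6 bytes of output), and the two bytes 'A', 0x81 count as two characters (4 bytes), because the dangling lead byte is still counted.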
Is it possible for a DBCS character's mapping to be a UTF-16 surrogate? If so, the routine becomes more complicated.
I personally think ReactOS should support UTF-8 as a default code page, but I doubt that others agree. This function is one of the many that would have to change...
Melissa