Just because Timo asked so nicely:
1) It will only get _bigger_, never smaller. ICU has a lot of stuff, but
by no means all the data we need. Notably absent is any kind of support
for complex input (i.e. IMEs).
2) It will never, ever, ever go away. Don't count on it. I plan to make
it impossible to ever remove our dependency on ICU once it's put in.
If it's good enough for embedded devices (it is), it will have to be
good enough for us, too. If it isn't, we will make it good enough.
3) It's mostly data, not code: data that cannot be found anywhere else,
at least not in the form ICU provides it, and that the badly
overstretched ReactOS team cannot afford to maintain. Yes, we need this.
Nobody else is going to do it. Postponing the issue will gain us nothing
but grief.
4) Not to mention the algorithms that nobody is going to be clever enough
to implement, algorithms that are part of the Unicode specification
itself. As for the "useless" code (samples, extra tools), it is a rather
trivial part of the whole.
5) To us, ICU represents a low-maintenance route to implementing almost
all of the Win32 NLS support, forever, maintained and kept current by a
reliable third party (IBM). Most, if not all, NLS APIs will be thin
wrappers around ICU functions. The module I committed is just a sample
of how straightforward it is to implement Unicode and I18n APIs with
ICU. Eventually, a large part of kernel32.dll is going to be a statically
linked ICU (what's currently compiled as "icu4ros"). Other DLLs will get
the "ICU treatment" too, but kernel32.dll will be the major beneficiary.
6) ICU is extremely clean C++ code without any "surprises": no RTTI, no
exceptions, no overuse of templates; it doesn't even include PSDK
headers! I expect it will always compile very fast. Also, it assumes no
particular threading model: it relies on a simple double-checked lock
for initialization and keeps all global data read-only, so it can safely
be used anywhere.
7) The only open question is how to handle the data files: a single
large file (all of 11 MB, not even _that_ big for a file that contains
_all_ Unicode character properties and names, _all_ character
sets/codepages, _all_ locales, calendars, timezones, etc.), or several
smaller ones that can be installed separately (but our installer is
nowhere near up to the task yet). Also, until we have a better rbuild,
I'll have to build the data file(s) "offline" and commit the binary
output to the repository. If repository size is such an issue, tell me
whether you'd prefer 45 MB of text or 11 MB of binary data; to me, it
doesn't matter much.
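Item 6 says ICU initializes itself with a double-checked lock and then keeps its global data read-only. As a rough illustration of that pattern only (this is not ICU's actual code), here is a minimal C11 sketch. The table, `init_table`, `get_table` and `g_init_count` are hypothetical names, and a spin lock built on `atomic_flag` stands in for whatever lock a real implementation would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical global table: written once during init, read-only afterwards. */
static int g_table[256];
static atomic_bool g_initialized = false;
static atomic_flag g_lock = ATOMIC_FLAG_INIT;
static int g_init_count = 0; /* how many times init_table actually ran */

static void init_table(void)
{
    assert(g_init_count == 0); /* the pattern must run this exactly once */
    for (int i = 0; i < 256; i++)
        g_table[i] = i * 2;
    g_init_count++;
}

const int *get_table(void)
{
    /* First check: lock-free fast path once the table has been published. */
    if (!atomic_load_explicit(&g_initialized, memory_order_acquire)) {
        /* Slow path: take a simple spin lock. */
        while (atomic_flag_test_and_set_explicit(&g_lock, memory_order_acquire))
            ; /* spin until the lock is free */
        /* Second check: another thread may have finished init while we waited. */
        if (!atomic_load_explicit(&g_initialized, memory_order_relaxed)) {
            init_table();
            /* Release ordering publishes the table writes to any thread
               that later observes g_initialized == true. */
            atomic_store_explicit(&g_initialized, true, memory_order_release);
        }
        atomic_flag_clear_explicit(&g_lock, memory_order_release);
    }
    return g_table;
}
```

After the first call, every caller takes only the fast path (a single acquire load), and because the table is never written again after publication, readers need no lock at all. That is the property that lets such a library be used from any thread without imposing a threading model.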
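To give a sense of scale for item 4, here is about the simplest algorithm in the Unicode Standard: arithmetic decomposition of precomposed Hangul syllables into conjoining jamo (Unicode Standard, section 3.12, "Conjoining Jamo Behavior"). The C sketch below follows the constants and formulas given there; it is an independent illustration, not ICU's code, and `hangul_decompose` is a hypothetical name. It covers only one small step of canonical decomposition; full normalization, collation, bidi and line breaking are considerably more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants from the Hangul syllable algorithm in the Unicode Standard, 3.12 */
enum {
    SBase = 0xAC00, LBase = 0x1100, VBase = 0x1161, TBase = 0x11A7,
    LCount = 19, VCount = 21, TCount = 28,
    NCount = VCount * TCount,   /* 588 */
    SCount = LCount * NCount    /* 11172 */
};

/* Decomposes a precomposed Hangul syllable into conjoining jamo.
   Writes 2 or 3 code points to out and returns how many were written;
   returns 0 (writing nothing) if s is not a precomposed Hangul syllable. */
size_t hangul_decompose(uint32_t s, uint32_t out[3])
{
    assert(out != NULL);
    if (s < SBase || s >= SBase + SCount)
        return 0;
    uint32_t sindex = s - SBase;
    out[0] = LBase + sindex / NCount;             /* leading consonant */
    out[1] = VBase + (sindex % NCount) / TCount;  /* vowel */
    uint32_t t = TBase + sindex % TCount;         /* trailing consonant, if any */
    if (t == TBase)
        return 2;                                 /* no trailing consonant */
    out[2] = t;
    return 3;
}
```

For example, U+D4DB decomposes to U+1111 U+1171 U+11B6, the same worked example the standard uses, while U+AC00 yields just U+1100 U+1161 because it has no trailing consonant.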
Considering that I won't officially add anything from "nls" to the build
until I have fully working code that passes regression tests, and that
this could take time, would you rather lend me a hand in speeding up
the process, or would you rather I do my development in a branch?