I've committed the first version of a DIB Blt code generator (the generator
is in tools/gendib, the file it generates ends up in
subsys/win32k/dib/dib16gen.c). For now, it only generates code for 16bpp
destination surfaces; other depths will follow.
I've decided on a compromise: only code for named rops will be generated.
This keeps the size of the code reasonable (win32k.sys grew from 952k to
1128k, an increase of 176k or about 18%), while still speeding up the most used
rop codes considerably. For example, the speed of PATINVERT increased by a
factor of 7. Optimized code was already present for SRCCOPY and PATCOPY, so the
generated code isn't faster for these cases.
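To illustrate why a dedicated routine per named rop can be so much faster
than a generic one, here is a minimal sketch. It is not the code in
dib16gen.c; the function names are hypothetical. It assumes the standard
ROP3 encoding, where the 8-bit rop code is a truth table indexed by the
pattern, source and destination bits (PATINVERT is 0x5A, i.e. D = D ^ P).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic path: evaluate the ROP3 truth table one bit at a time.
   Table index is (P << 2) | (S << 1) | D, as in the Win32 ROP3 encoding. */
static uint16_t rop3_generic_pixel(uint8_t rop3, uint16_t dst,
                                   uint16_t src, uint16_t pat)
{
    uint16_t result = 0;
    int bit;

    for (bit = 0; bit < 16; bit++) {
        unsigned idx = (((pat >> bit) & 1u) << 2)
                     | (((src >> bit) & 1u) << 1)
                     |  ((dst >> bit) & 1u);
        result |= (uint16_t)(((rop3 >> idx) & 1u) << bit);
    }
    return result;
}

/* Generic row: works for any of the 256 rop codes, but slowly. */
static void rop3_generic_row16(uint8_t rop3, uint16_t *dst,
                               const uint16_t *src, size_t count,
                               uint16_t pat)
{
    size_t i;

    assert(dst != NULL && src != NULL);
    for (i = 0; i < count; i++)
        dst[i] = rop3_generic_pixel(rop3, dst[i], src[i], pat);
}

/* Specialized row for PATINVERT (0x5A): one XOR per pixel, no table
   lookup and no source read. This is the kind of routine a generator
   could emit for each named rop. */
static void patinvert_row16(uint16_t *dst, size_t count, uint16_t pat)
{
    size_t i;

    assert(dst != NULL);
    for (i = 0; i < count; i++)
        dst[i] ^= pat;
}
```

Both paths produce the same pixels for rop code 0x5A; the specialized one
simply does far less work per pixel. A real implementation would also
handle strides, clipping and non-solid brushes, which this sketch ignores.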
We totally smoke the Windows XP DIB engine Blt routines now for 16bpp.
Gé van Geldorp.
-----Original Message-----
From: ros-dev-bounces@reactos.com
[mailto:ros-dev-bounces@reactos.com] On Behalf Of Ge van Geldorp
Sent: Friday, June 10, 2005 12:03
To: 'ReactOS Development List'
Subject: [ros-dev] DIB code generator
One of the things which has bothered me a bit is the code
duplication we have in our DIB engine (subsys/win32k/dib).
Most of the BitBlt routines in there are very similar. With
the recent interest in optimizations, a bunch of new (almost
identical) routines were added. Don't get me wrong, I'm not
saying that adding those optimizations was a bad idea, I'm
just pointing out that we have a lot of code duplication.
There are 256 possible ROP codes and we support six depths
(1bpp, 4bpp, 8bpp, 16bpp, 24bpp and 32bpp), so in theory
there could be 256 * 6 = 1536 routines with basically the
same structure. I've been playing around with the idea of
writing a code generator which would generate the source
code for those routines. That would
cut down on the duplicated source code and associated
maintenance problems (you only need to change the code
generator) while still allowing optimized code for each
individual ROP code.
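The idea can be sketched in a few lines of C. This is not gendib.c itself,
just a minimal illustration under stated assumptions: the table of rops,
the struct, and the emitted function names (Rop16_...) are all made up
for the example, and only the 16bpp case with a solid pattern is covered.
Each entry pairs a standard Win32 rop name with its expression over D
(destination), S (source) and P (pattern).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One entry per routine to generate (hypothetical layout). */
struct named_rop {
    const char *name;   /* Win32 rop name, used in the function name */
    const char *expr;   /* C expression over D, S and P */
};

static const struct named_rop named_rops[] = {
    { "SRCINVERT", "D ^ S" },
    { "SRCAND",    "D & S" },
    { "SRCPAINT",  "D | S" },
    { "PATINVERT", "D ^ P" },
    { "DSTINVERT", "~D"    },
};

/* Write the C source of one 16bpp routine into 'out'.
   Returns the snprintf result (length needed, or negative on error). */
static int emit_rop_routine(char *out, size_t out_size,
                            const struct named_rop *rop)
{
    assert(out != NULL && rop != NULL);
    assert(strlen(rop->name) > 0 && strlen(rop->expr) > 0);

    return snprintf(out, out_size,
        "static void Rop16_%s(uint16_t *Dest, const uint16_t *Source,\n"
        "                     uint16_t Pattern, size_t Count)\n"
        "{\n"
        "    size_t i;\n"
        "    for (i = 0; i < Count; i++) {\n"
        "        uint16_t D = Dest[i], S = Source[i], P = Pattern;\n"
        "        (void)S; (void)P;\n"
        "        Dest[i] = (uint16_t)(%s);\n"
        "    }\n"
        "}\n\n",
        rop->name, rop->expr);
}

/* Emit every routine in the table to 'out'.
   Returns the number of routines written, or -1 on failure. */
static int emit_all_named_rops(FILE *out)
{
    char buf[512];
    int emitted = 0;
    size_t i;

    for (i = 0; i < sizeof(named_rops) / sizeof(named_rops[0]); i++) {
        int n = emit_rop_routine(buf, sizeof(buf), &named_rops[i]);
        if (n < 0 || (size_t)n >= sizeof(buf))
            return -1;
        fputs(buf, out);
        emitted++;
    }
    return emitted;
}
```

Calling emit_all_named_rops(stdout) prints one routine per table entry.
Changing the loop structure then means editing a single format string
rather than every routine by hand.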
Just to give you an idea what such a code generator would
look like, I've attached my first attempt. Please note that
it doesn't really try to optimize the generated code yet,
it's just to give an impression. The generated code (16bpp
only at the moment) is rather large; you can get it from
ftp://ftp.geldorp.nl/pub/ReactOS/dib16gen.c if you like (or
compile the code generator ("gcc -o gendib gendib.c") and run it).
A possible problem is that the generated code is quite large.
When using the generated 16bpp code, the size of win32k.sys
increases by about 350kb.
Extrapolating this to all bpps, it would mean that
win32k.sys would triple in size.
So, I'm wondering what you guys are thinking. Should we
basically trade memory for speed? The problem is that I can't
quantify the speed increase at the moment.
Gé van Geldorp.