Ash wrote:
Hello,
I'd like to provide a small
VS.NET project file with 5 different tests.
- 46ffffe9 tells everything is alright with the function calculation
- times are measured with QueryPerformanceCounter
- loop run cnt: 0x2ffffff
result orig function 46ffffe9
it took 1491052
result orig function inlined 46ffffe9
it took 1035547
result second proposal inlined 46ffffe9
it took 1244434
result optimized asm 46ffffe9
it took 1338367
result debug asm 46ffffe9
it took 8774815
The second proposal is the original proposal but more shifts - still
slower tho.
Interesting is the inlined version generated by MSVC, shaving off
almost 1/3 of the overall time.
Also stared as "optimized asm" - no gurantee on register safety tho ;)
For portability and performance sake it should be considered to create
a compiler macro.
This function is terribly small, any optimisations inside are
outweighted by the calling overhead in this case.
The most impressive one is the original function inlined, althought
the ASM would only work on x86.
Please do not think about using 64k tables, thats what, 1/2 of a
Sempron L2 cache?
It would really really trash performance.
Hi Ash,
Thanks a lot for your tests. I don't have much time tonight, but if
you'd like/can, can you add two more tests?
One using "bsr", the intel opcode. It does all the work for you and
returns the index.
i think it's as simple as "bsr ecx, eax" where "ecx" is the mask
and eax
is the returned index. Or it might be backwards.
The second test is using a 256-byte log2 table:
const char LogTable256[] =
{
0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7
};
and a lookup code like this:
Addon = 16;
if ((IntMask = Mask >> 16)) {
Addon = 0;
IntMask = Mask;
}
if (IntMask && 0xFFFFFF00) {
Addon += 8;
}
HighBit = LogTable256[(Mask >> Addon)] + Addon;