Rick Parrish wrote:
Couldn't you just take the binary log and round down? Okay ... just kidding.
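For reference, the number of significant bits in a nonzero x is floor(log2(x)) + 1. The integer loop below computes it without floating point. It is a minimal sketch of the kind of table-free baseline that the timings later in this message are compared against; the function name `bit_length_loop` is hypothetical, and the post does not show the actual no-table code.

```c
/* Hypothetical table-free baseline: number of significant bits in x.
   For x > 0 this equals floor(log2(x)) + 1; for x == 0 it returns 0. */
static unsigned bit_length_loop(unsigned long x)
{
    unsigned n = 0;
    while (x != 0) {   /* one iteration per significant bit */
        x >>= 1;
        n++;
    }
    return n;
}
```

For example, bit_length_loop(1000) returns 10, since 1000 is 1111101000 in binary. The cost grows with the bit length of the argument, which is what the table versions try to avoid.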
Seriously, I think Mark had the right idea ... if you use a smaller
table. This is similar to some ECC code I use for counting the number
of 1's in a word. Removing the while() loop is left "as an exercise
for the reader" ...
/* list[n] is the number of significant bits in the 4-bit value n. */
static const unsigned char list[16] = {0, 1, 2, 2, 3, 3, 3, 3,
                                       4, 4, 4, 4, 4, 4, 4, 4};
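The while() loop that goes with the table above isn't shown, so here is a guess at it, assuming the goal is the bit length of an unsigned 32-bit value: strip whole nibbles until what remains indexes the table. The second function is one way to do the reader's exercise for a fixed 32-bit width, narrowing to a single nibble with three compares and then doing one lookup. Both function names are hypothetical.

```c
#include <stdint.h>

/* Number of significant bits in a 4-bit value (index 0..15). */
static const unsigned char list[16] = {0, 1, 2, 2, 3, 3, 3, 3,
                                       4, 4, 4, 4, 4, 4, 4, 4};

/* Loop version: drop 4 bits at a time until the remainder fits the table. */
static unsigned bit_length16(uint32_t x)
{
    unsigned n = 0;
    while (x > 15) {
        x >>= 4;
        n += 4;
    }
    return n + list[x];
}

/* Loop-free version for 32-bit values: halve the search range
   (16, then 8, then 4 bits) and finish with one table lookup. */
static unsigned bit_length16_noloop(uint32_t x)
{
    unsigned n = 0;
    if (x > 0xFFFFu) { x >>= 16; n += 16; }
    if (x > 0xFFu)   { x >>= 8;  n += 8;  }
    if (x > 0xFu)    { x >>= 4;  n += 4;  }
    return n + list[x];
}
```

Both return 17 for 0x10000 and 32 for 0xFFFFFFFF. The loop-free version always does three compares, while the loop version does up to seven iterations for large values.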
I have experimented with both 16- and 256-entry tables. The 16-entry
table was somewhat slower than the version with no tables, and the
256-entry table was somewhat faster, but both results depend on the code
and tables staying in cache. I don't expect our particular target to
benefit much from caching: it runs very frequently, but not in a tight
loop, so the table is likely to have been evicted between calls.
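The 256-entry variant isn't shown either; a sketch under the same 32-bit bit-length assumption might look like this. The table name `bits8` and the `R*` fill macros are hypothetical. The trade-off is the footprint: 256 bytes spans four cache lines on a machine with 64-byte lines, versus a fraction of one line for the 16-entry table, which is why its advantage depends on the table still being cached.

```c
#include <stdint.h>

/* Fill macros: Rk(n) expands to k copies of n. */
#define R2(n)   n, n
#define R4(n)   R2(n), R2(n)
#define R8(n)   R4(n), R4(n)
#define R16(n)  R8(n), R8(n)
#define R32(n)  R16(n), R16(n)
#define R64(n)  R32(n), R32(n)
#define R128(n) R64(n), R64(n)

/* bits8[n] is the number of significant bits in the 8-bit value n:
   0 -> 0, 1 -> 1, 2..3 -> 2, 4..7 -> 3, ..., 128..255 -> 8. */
static const unsigned char bits8[256] = {
    0, 1, R2(2), R4(3), R8(4), R16(5), R32(6), R64(7), R128(8)
};

/* Two compares narrow a 32-bit value to one byte, then one lookup. */
static unsigned bit_length256(uint32_t x)
{
    unsigned n = 0;
    if (x > 0xFFFFu) { x >>= 16; n += 16; }
    if (x > 0xFFu)   { x >>= 8;  n += 8;  }
    return n + bits8[x];
}
```

Compared with the nibble table, this saves one compare-and-shift per call in exchange for a larger table.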