Rick Parrish wrote:
Couldn't you just take the binary log and round down? Okay ... just kidding.
Seriously, I think Mark had the right idea ... if you use a smaller table. This is similar to some ECC code I use for counting the number of 1's in a word. Removing the while() loop is left "as an exercise for the reader" ...
static const unsigned char list[16] = {0, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4};
I have experimented with both 16- and 256-entry tables. The 16-entry table was somewhat slower than no table at all, and the 256-entry table was somewhat faster, but both of those results depend on the code and tables staying in cache. I don't anticipate our particular target benefiting much from the cache... the routine runs very frequently, but not in a tight loop, so the table is likely to have been evicted between calls.
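For anyone following along, here is a rough sketch of the three variants being compared. The original routine isn't shown in this thread, so this assumes the quoted table is meant to give the number of significant bits in a value (the position of the highest set bit, counting from 1), which is what its entries match. The function names, the 32-bit word size, and the 256-entry table built at startup are all my own assumptions for illustration, not the code anyone here actually measured.

```c
#include <stdint.h>

/* Table from the quoted post: number of significant bits in a 4-bit value. */
static const unsigned char list[16] = {0, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4};

/* No-table baseline (hypothetical): shift right until nothing is left. */
static unsigned bit_length_loop(uint32_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* 16-entry variant (hypothetical): drop whole nibbles, then look up the top one. */
static unsigned bit_length_nibble(uint32_t x)
{
    unsigned n = 0;
    while (x > 15) {
        x >>= 4;
        n += 4;
    }
    return n + list[x];
}

/* 256-entry variant (hypothetical): table derived once from the 16-entry one. */
static unsigned char list256[256];

static void init_list256(void)
{
    for (unsigned i = 0; i < 256; i++)
        list256[i] = (i > 15) ? (unsigned char)(4 + list[i >> 4]) : list[i];
}

/* Drop whole bytes, then look up the top one. Call init_list256() first. */
static unsigned bit_length_byte(uint32_t x)
{
    unsigned n = 0;
    while (x > 255) {
        x >>= 8;
        n += 8;
    }
    return n + list256[x];
}
```

The trade-off is the one described above: the bigger table needs fewer loop iterations per call, but it occupies 256 bytes that have to be in cache for the lookup to be cheap, whereas the no-table version touches no data memory at all.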