Ash wrote:
It doesn't make much sense to put the optimized ASM in there, nor is there much hope of GCC having a good day and doing a lot of optimisation. So far the best option would be the macro with a lookup table (only one global kernel table, though).
What about using inline assembler in a macro, only as a platform-specific optimization of course?
result bsr inlined 46ffffe9 it took 1751088 21%
With GCC, this should be much faster when using the following macro:
#define get_bits(value) \
({ \
    int bits = -1; \
    __asm("bsr %1, %0\n" \
        : "+r" (bits) \
        : "rm" (value)); \
    bits; \
})
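This macro returns -1 when no bits are set. I tested it and it works as expected. If -1 as an "error" value isn't suitable, you might want to change it to 0: that is the initializer "int bits = -1" on line 3 of the macro. Note that this relies on bsr leaving the destination register untouched for a zero input, which Intel's manual formally lists as undefined.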
result macro 46ffffe9 it took 653692 7%
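To illustrate the "platform-specific optimization" idea, here is a rough sketch of how the bsr code could be wrapped so that other targets get a plain C fallback. The name highest_bit and the preprocessor checks are my own assumptions, not something from the kernel source, and I added an explicit zero test so the result does not depend on what bsr does with a zero input.

```c
/* Hypothetical wrapper (name and #if checks are assumptions). */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
static inline int highest_bit(unsigned int value)
{
    int bits;

    if (value == 0)
        return -1;              /* no bit set */

    /* AT&T syntax: bsr source, destination */
    __asm__("bsr %1, %0"
            : "=r" (bits)       /* output: index of the highest set bit */
            : "rm" (value)      /* input: register or memory operand */
            : "cc");            /* bsr modifies the flags */
    return bits;
}
#else
/* Portable fallback: shift right until nothing is left. */
static inline int highest_bit(unsigned int value)
{
    int bits = -1;

    while (value != 0) {
        value >>= 1;
        bits++;
    }
    return bits;
}
#endif
```

With the test value from the results above, highest_bit(0x46ffffe9) gives 30, and highest_bit(0) gives -1 on both code paths.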
The table approach doesn't seem to be bad either <g>.
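For comparison, this is roughly what I understand by the table approach. It is only a sketch: the table and function names are made up, the actual table code is not shown in this thread, and it assumes a 32-bit unsigned int.

```c
/* Hypothetical single global table: highest set bit of each byte value,
 * with -1 for 0. All names here are assumptions. */
static signed char high_bit_table[256];

/* Fill the table once at startup. */
static void init_high_bit_table(void)
{
    int i;

    high_bit_table[0] = -1;
    for (i = 1; i < 256; i++)
        high_bit_table[i] = (signed char)(high_bit_table[i / 2] + 1);
}

/* Check the bytes from the top down and look up the first non-zero one.
 * Assumes unsigned int is 32 bits wide. */
static int table_get_bits(unsigned int value)
{
    if (value >> 24)
        return 24 + high_bit_table[value >> 24];
    if (value >> 16)
        return 16 + high_bit_table[(value >> 16) & 0xff];
    if (value >> 8)
        return 8 + high_bit_table[(value >> 8) & 0xff];
    return high_bit_table[value & 0xff];
}
```

After init_high_bit_table() has run once, table_get_bits() needs at most three comparisons and one table access, and it returns -1 for 0 just like the macro.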
Regards, Mark