have an optimized implementation of memset() somewhere else. One that can
be inlined, and checks the size and branches to the optimal implementation.
Yep, that's a good way to minimize where asm is used (if asm is used at all),
and it makes fast general-purpose functions available to any other function.
But don't call it memset, since memset fills byte values only. Call it
something like MemFill with an element size param, or provide MemFill32,
MemFill64, MemFill16 and MemFill8.
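
Something along these lines, as a rough sketch only. The names MemFill8/16/32/64 and MemFill are just the ones suggested above, not an existing API, and the threshold of 8 elements is an arbitrary placeholder that would need measuring on the real target. It assumes dst is suitably aligned for the element type, and the "large" path here is plain unrolled C where a tuned or asm version could be plugged in.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: names and the small/large threshold are assumptions. */

/* Byte fill: this case is exactly what memset already does. */
static inline void MemFill8(void *dst, uint8_t value, size_t count)
{
    memset(dst, value, count);
}

/* 32-bit fill: checks the size and branches to a small or a large path. */
static inline void MemFill32(void *dst, uint32_t value, size_t count)
{
    uint32_t *p = (uint32_t *)dst;   /* assumes dst is 4-byte aligned */
    size_t i;

    if (count < 8) {                 /* small: a simple loop is cheapest */
        for (i = 0; i < count; i++)
            p[i] = value;
        return;
    }
    /* large: 4x unrolled loop, a stand-in for an optimized implementation */
    for (i = 0; i + 4 <= count; i += 4) {
        p[i] = value;
        p[i + 1] = value;
        p[i + 2] = value;
        p[i + 3] = value;
    }
    for (; i < count; i++)           /* remaining 0..3 elements */
        p[i] = value;
}

/* 16- and 64-bit fills kept as plain loops; they could branch the same way. */
static inline void MemFill16(void *dst, uint16_t value, size_t count)
{
    uint16_t *p = (uint16_t *)dst;   /* assumes dst is 2-byte aligned */
    for (size_t i = 0; i < count; i++)
        p[i] = value;
}

static inline void MemFill64(void *dst, uint64_t value, size_t count)
{
    uint64_t *p = (uint64_t *)dst;   /* assumes dst is 8-byte aligned */
    for (size_t i = 0; i < count; i++)
        p[i] = value;
}

/* Generic entry point with an element size param (in bytes).
 * count is a number of elements, not bytes.
 * Returns 0 on success, -1 if the element size is not supported. */
static inline int MemFill(void *dst, uint64_t value, size_t elem_size, size_t count)
{
    switch (elem_size) {
    case 1: MemFill8(dst, (uint8_t)value, count);   return 0;
    case 2: MemFill16(dst, (uint16_t)value, count); return 0;
    case 4: MemFill32(dst, (uint32_t)value, count); return 0;
    case 8: MemFill64(dst, value, count);           return 0;
    default: return -1;
    }
}
```

Since everything is static inline, a call such as MemFill(buf, 0, 4, n) with a constant element size should let the compiler drop the switch and inline the right variant directly, though that depends on the compiler and optimization level.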
Good post, Michael.
Jose Catena
DIGIWAVES S.L.