That looks almost like the version I wrote, except that inside the loop I
used registers instead of memory accesses, which should save some cycles.
Yes, it is the same. Register versus memory makes no difference here: a memory access takes the same time as a register access when the data is in the L1 cache (both just 1 cycle), and it will always be there.
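For anyone who wants to check this on their own machine, here is a minimal sketch of the two loop styles being compared. The original routine is not shown in this thread, so the function names and the summing loop are hypothetical stand-ins. `volatile` is used only to stop the compiler from turning the memory version into the register version. Both return the same result; any timing difference has to be measured on the target CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant A: the running total lives in memory.
   'volatile' forces a real load and store on every iteration. */
void sum_via_memory(const uint32_t *data, size_t n, volatile uint32_t *total)
{
    *total = 0;
    for (size_t i = 0; i < n; i++)
        *total += data[i];
}

/* Hypothetical variant B: the running total is a local variable,
   which the compiler is free to keep in a register. */
uint32_t sum_via_register(const uint32_t *data, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += data[i];
    return acc;
}
```

Timing each function over a buffer small enough to stay in L1 would show whether the two differ on a given processor.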
Jose Catena DIGIWAVES S.L.