- A builtin / intrinsic != inline asm
I never said that. I used "instead". I apologize for not being clear enough.
but how would you want to optimize "rep stosd" anyway?
No way. That's what I said, possibly with the exception of using a 64 bit
equivalent if we could assume that the CPU is 64 bit capable.
But Alex knows better; he is calling me ignorant. He says that
    L1: mov [edi], eax
        add edi, 4
        dec ecx
        jnz L1
is faster than
    rep stosd
Both do exactly the same thing, the latter much smaller AND FASTER on
any CPU from the 386 to the i7.
And he shows an irrelevant portion of code that proves nothing regarding what I
said. BTW, we don't know what his compiler generated for the loop.
In other cases he changes the meaning of what I wrote, corrects something I
didn't say at all, or makes baseless assumptions.
I'm not going to answer him, LOL! This would be an endless loop. Anyway, I
always agreed with him that asm is not helpful in this and most other cases.
This discussion is a waste of time.
I thought from previous posts that he had better knowledge, and perhaps he
has, but he certainly does not know much about assembly and CPU architectures,
yet he pretends to and doesn't like to be corrected... bad for him.
none of the compilers I tested was able to generate a rep stosd from
either a loop or memset
LOL, are we really in 2009? Try the C source I posted; it should be compiled
as rep stosd. MSVC and Intel certainly do that regardless of the target CPU,
and not only in recent versions. Let me know if yours doesn't; I wouldn't
like a compiler that doesn't do such a basic and evident optimization.
Most often I know pretty well what a compiler will generate without looking
at the generated asm; the way the C code is written matters in some cases.
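For reference, a minimal sketch of the kind of plain C loop in question. This is not the source posted earlier in the thread; the name fill32 and its signature are made up for illustration. Whether a given compiler turns it into rep stosd depends on the compiler, its version and the options used, so check the asm listing (e.g. /FA with MSVC, -S with GCC) rather than taking it on faith.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (name and signature are assumptions, not from the
 * thread): store `value` into `count` consecutive 32-bit slots at `dst`.
 * This is the C-level equivalent of the mov/add/dec/jnz loop and of
 * rep stosd with EDI = dst, EAX = value, ECX = count. */
static void fill32(uint32_t *dst, uint32_t value, size_t count)
{
    while (count--)
        *dst++ = value;
}
```

Called as fill32(buf, 0xDEADBEEF, n), it writes n dwords and nothing past them.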
As for memset, MSVC's inline memset will generate rep stosd, possibly followed
by a stosw and/or stosb if the byte count is not a multiple of the maximum
store size or is not constant, which is OK. The library version uses the same
instructions, plus the call overhead. Anyway, memset is not suitable here: it
is for 8-bit values and wmemset for 16-bit values, while we want to store
32-bit values.
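To make the memset point concrete, here is a small sketch (memset_word is a made-up name for illustration). Standard memset converts its fill argument to unsigned char, so only the low byte of a 32-bit pattern survives and is replicated into every byte.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical demo: fill one 32-bit word using memset and return it.
 * memset converts the fill value to unsigned char, so only the low
 * 8 bits of `value` end up in each byte of the word. */
static uint32_t memset_word(uint32_t value)
{
    uint32_t word;
    memset(&word, (int)value, sizeof word);
    return word;
}
```

With this, memset_word(0x12345678) yields 0x78787878, not 0x12345678, which is why a dword fill needs a loop (or an intrinsic) instead.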
Jose Catena
DIGIWAVES S.L.