Jose Catena wrote:
  A correction of my previous msg:
 In asm I would write the loop as:
        mov eax, iColor
        mov ebx, pulLine
        mov edx, cy
 L1:
        mov edi, ebx
        mov ecx, _cx
        rep stosd
        add ebx, lDelta
        dec edx
        jnz L1
 It is not possible to optimize the loop further AFAIK, and this only saves a
 cmp and jnz in the outer loop, a tiny gain.
    
That looks almost like the version I wrote, except that I kept everything
in registers instead of accessing memory inside the loop, which should save some cycles.
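For comparison, here is a rough C sketch of the same fill loop. It assumes (not stated in the thread) that pulLine points at the first 32-bit pixel of the top row, that lDelta is the row stride in bytes and a multiple of 4, and that cx and cy are the rectangle's width and height. The function name fill_rect32 is hypothetical. Because every loop-invariant value here is a parameter whose address is never taken, the compiler is free to keep it in a register for the whole loop. That is the C-level counterpart of avoiding the memory loads of _cx and lDelta inside the asm loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill a cx-by-cy rectangle with a 32-bit colour.
   Assumption: lDelta is the row stride in BYTES and a multiple of 4. */
static void fill_rect32(uint32_t *pulLine, long lDelta,
                        int cx, int cy, uint32_t iColor)
{
    /* Walk rows with a byte pointer, since the stride is in bytes. */
    unsigned char *row = (unsigned char *)pulLine;

    /* cy, cx, lDelta and iColor are plain parameters (no address taken),
       so the compiler can keep them in registers across the loop. */
    for (int y = cy; y > 0; --y) {            /* like: dec edx / jnz L1   */
        uint32_t *p = (uint32_t *)row;        /* like: mov edi, ebx       */
        for (int x = 0; x < cx; ++x)          /* like: mov ecx, _cx + rep stosd */
            p[x] = iColor;
        row += lDelta;                        /* like: add ebx, lDelta    */
    }
}
```

Unlike the asm version, the outer test is `y > 0` rather than "decrement until zero", so a cy of 0 simply does nothing instead of wrapping around.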