Re: [ros-dev] [ros-diffs] [tkreuzer] 42353: asm version of DIB_32BPP_ColorFill: - Add frame pointer - Get rid of algin_draw, 32bpp surfaces must be DWORD aligned - Optimize the loop - Add comments

5 Aug 2009


      ...
> On most processors, less than 8 iterations will be faster with a move than
> with a rep.
I'd say more like 4 (separate moves), and that is not feasible when the
number of iterations is variable, as in our case. A loop with many moves
inside would be possible, or even better SSE stores, followed by a rep stosd
for the remainder; that would indeed be faster for large cx counts. Does any
compiler currently generate that automatically? None of the ones I know, but
it can be done to some extent by writing it that way in C. Possible in asm?
Of course. DMA fill? No joy. GPU accelerated fill? Perhaps in the future.
I keep thinking that this is not important enough to justify asm, or even
breaking the loop in two in C. At least not before ROS is complete and
stable and we want to optimize every bit. And by then we may very well be
thinking about GPU accelerated GDI too.
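
For illustration only, "breaking the loop in two in C" could look roughly
like the sketch below. It is not the ReactOS DIB_32BPP_ColorFill code:
fill_dwords and its parameters are hypothetical names, the unroll factor of 8
is an arbitrary choice, and the remainder is a plain C loop rather than a
rep stosd. Whether it beats a single rep stosd depends on the processor and
the compiler, so it would need measuring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill `count` DWORDs starting at `dst` with `color`.
 * `dst` is assumed to be DWORD aligned, as 32bpp surfaces must be. */
static void fill_dwords(uint32_t *dst, uint32_t color, size_t count)
{
    assert(dst != NULL || count == 0);

    /* First loop: many separate moves per iteration (8 here). */
    while (count >= 8) {
        dst[0] = color;
        dst[1] = color;
        dst[2] = color;
        dst[3] = color;
        dst[4] = color;
        dst[5] = color;
        dst[6] = color;
        dst[7] = color;
        dst += 8;
        count -= 8;
    }

    /* Second loop: the remainder, at most 7 DWORDs. */
    while (count > 0) {
        *dst++ = color;
        count--;
    }
}
```

A caller would invoke it once per scanline, passing the start of the
rectangle on that line and its width in pixels as count.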
Jose Catena
DIGIWAVES S.L.
