greatlrd@svn.reactos.org wrote:
Author: greatlrd
Date: Thu Aug 31 01:17:53 2006
New Revision: 23826
URL: http://svn.reactos.org/svn/reactos?rev=23826&view=rev
Log:
Wrote RtlUshotByteSwap RtlUlongByteSwap and RtlUlonglongByteSwap to asm code. but we need a C api for header to linking it right. Put the asm version to i386
Surely there must be a way to avoid this double-function-call overhead?
+.globl _UlongByteSwap
+.intel_syntax noprefix
+/* FUNCTIONS ***************************************************************/
+_UlongByteSwap:
+    push ebp            // save base
+    mov ebp,esp         // move stack to base
+    mov eax,[ebp+8]     // load the ULONG
+    bswap eax           // swap the ULONG
+    pop ebp             // restore the base
+    ret
this should work:
_UlongByteSwap:
    mov eax,[esp+4]     // load the ULONG (no frame pointer, so the argument sits at esp+4)
    bswap eax           // swap the ULONG
    ret
+.globl _UlonglongByteSwap
+.intel_syntax noprefix
+/* FUNCTIONS ***************************************************************/
+_UlonglongByteSwap:
+    push ebp            // save base
+    mov ebp,esp         // move stack to base
+    mov edx,[ebp+8]     // load the higher part of ULONGLONG
+    mov eax,[ebp+12]    // load the lower part of ULONGLONG
+    bswap edx           // swap the higher part
+    bswap eax           // swap the lower part
+    pop ebp             // restore the base
+    ret
_UlonglongByteSwap:
    mov edx,[esp+4]     // load the higher part of ULONGLONG
    mov eax,[esp+8]     // load the lower part of ULONGLONG
    bswap edx           // swap the higher part
    bswap eax           // swap the lower part
    ret
+_UshortByteSwap:
+    push ebp            // save base
+    mov ebp,esp         // move stack to base
+    mov eax,[ebp+8]     // load the USHORT
+    bswap eax           // swap the USHORT, xchg is slow so we use bswap with rol
+    rol eax,16          // make it USHORT
+    pop ebp             // restore the base
+    ret
_UshortByteSwap:
    mov eax,[esp+4]     // load the USHORT
    bswap eax           // swap the USHORT, xchg is slow so we use bswap with rol
    rol eax,16          // make it USHORT
    ret
or to save a byte...
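_UshortByteSwap:
    mov ecx,[esp+4]     // load the USHORT (ecx is caller-saved, unlike ebx)
    mov al, ch          // high byte becomes the low byte
    mov ah, cl          // low byte becomes the high byte
    ret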
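For comparison, here is a minimal portable C sketch of what the three routines are expected to compute. The names swap16, swap32 and swap64 are hypothetical and are not the ReactOS symbols; the sketch is only meant as a reference to check the asm variants against, not as a replacement for them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference versions; not the ReactOS implementations. */

/* Exchange the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Reverse the four bytes of a 32-bit value (what BSWAP does). */
uint32_t swap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}

/* Reverse all eight bytes: swap each half, then exchange the halves. */
uint64_t swap64(uint64_t x)
{
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);
    return ((uint64_t)swap32(lo) << 32) | swap32(hi);
}

/* Example use with fixed inputs. */
void swap_self_check(void)
{
    assert(swap16(0x1122u) == 0x2211u);
    assert(swap32(0x11223344u) == 0x44332211u);
    assert(swap64(0x0011223344556677ull) == 0x7766554433221100ull);
}
```

Note that swap64 has to exchange the two halves as well as reversing the bytes inside each one, which is why the 64-bit asm version loads the low dword into edx and the high dword into eax.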
--- Reuel ben Yisrael reuel@ev1.net wrote:
Surely there must be a way to avoid this double-function-call overhead?
The problem was probably the different calling conventions. If you changed the asm versions to be FASTCALL then you wouldn't need the wrappers.
Hi Royce3, Alex solved it, and he also rewrote some other Rtl* functions in asm in the same commit.
----- Original Message ----- From: "Reuel ben Yisrael" reuel@ev1.net To: ros-dev@reactos.org Sent: Wednesday, August 30, 2006 11:29 PM Subject: Re: [ros-dev] [ros-diffs] [greatlrd] 23826: Wrote RtlUshotByteSwap RtlUlongByteSwap and RtlUlonglongByteSwap to asm code. but we need a C api for header to linking it right. Put the asm version to i386
Ros-dev mailing list Ros-dev@reactos.org http://www.reactos.org/mailman/listinfo/ros-dev
The new implementations for RtlFillMemory, RtlZeroMemory, and RtlMoveMemory have a bug where they don't write the last 1-3 bytes if the length isn't a multiple of the word size. They have an "or ecx, ecx" which needs to be "or ecx, edx". Also the new RtlUlonglongByteSwap is sub-optimal, it swaps eax and edx when they could have just been read in reverse order like the other version did.
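To make the reported failure mode concrete, here is a rough C sketch of a word-at-a-time fill with a byte tail. The function name fill_bytes is hypothetical and the layout of the real asm routines is an assumption; the point is only that the 1-3 leftover bytes need their own pass once the whole words are written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not the ReactOS routine: set len bytes at dst to value. */
void fill_bytes(void *dst, size_t len, uint8_t value)
{
    unsigned char *p = dst;
    uint32_t word = value * 0x01010101u;  /* replicate the byte into all four lanes */
    size_t words = len / 4;               /* number of whole 32-bit words */
    size_t tail = len % 4;                /* the 0-3 leftover bytes */

    assert(dst != NULL || len == 0);

    while (words--) {
        memcpy(p, &word, sizeof word);    /* memcpy avoids alignment assumptions */
        p += sizeof word;
    }

    /* Skipping this loop reproduces the reported bug: the tail is never written. */
    while (tail--)
        *p++ = value;
}
```

With a length of 7, for example, one whole word is written and the remaining three bytes are handled by the tail loop.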
--- Magnus Olsen magnus@itkonsult-olsen.com wrote:
Hi Royce3, Alex solved it, and he also rewrote some other Rtl* functions in asm in the same commit.
mrnobo1024 wrote:
The new implementations for RtlFillMemory, RtlZeroMemory, and RtlMoveMemory have a bug where they don't write the last 1-3 bytes if the length isn't a multiple of the word size. They have an "or ecx, ecx" which needs to be "or ecx, edx".
Noted, thanks! Surprised it still booted!
Also the new RtlUlonglongByteSwap is sub-optimal, it swaps eax and edx when they could have just been read in reverse order like the other version did.
Not true. BSWAP will pair on the pipe and be executed much faster.
--- Alex Ionescu ionucu@videotron.ca wrote:
mrnobo1024 wrote:
The new implementations for RtlFillMemory, RtlZeroMemory, and RtlMoveMemory have a bug where they don't write the last 1-3 bytes if the length isn't a multiple of the word size. They have an "or ecx, ecx" which needs to be "or ecx, edx".
Noted, thanks! Surprised it still booted!
Looking at them again I noticed a couple other problems. At the end of RtlCompareMemory it subtracts the length (esp+20) instead of the buffer start (esp+12), and RtlCompareMemoryUlong subtracts from esi when it's using edi.
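As a sketch of why the subtraction operand matters: RtlCompareMemory returns the number of leading bytes that match, which is the distance from the start of the buffer to the first mismatch. The C below uses a hypothetical name, compare_bytes, and is not the ReactOS code; it only illustrates the pointer arithmetic being discussed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: count how many leading bytes of a and b are equal. */
size_t compare_bytes(const void *a, const void *b, size_t len)
{
    const unsigned char *start = a;
    const unsigned char *p = a;
    const unsigned char *q = b;

    assert((a != NULL && b != NULL) || len == 0);

    while (len != 0 && *p == *q) {
        ++p;
        ++q;
        --len;
    }

    /* Subtract the buffer start, not the length, to get the match count. */
    return (size_t)(p - start);
}
```

Subtracting the length instead of the start address would yield a value unrelated to the number of matching bytes.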
I also noticed some of the functions clear the direction flag, and some don't. This would be either a bug or just unnecessary code. I don't know if the Win32 calling conventions require the flag to be clear before calling a function or not.
Also the new RtlUlonglongByteSwap is sub-optimal, it swaps eax and edx when they could have just been read in reverse order like the other version did.
Not true. BSWAP will pair on the pipe and be executed much faster.
I meant that instead of mov edx,[esp+8] mov eax,[esp+4] it could be mov edx,[esp+4] mov eax,[esp+8]. I don't think that would affect instruction pairing.
mrnobo1024 wrote:
--- Alex Ionescu ionucu@videotron.ca wrote:
I meant that instead of mov edx,[esp+8] mov eax,[esp+4] it could be mov edx,[esp+4] mov eax,[esp+8]. I don't think that would affect instruction pairing.
That wouldn't swap them... you would just be inverting the bytes. Each ULONG must be swapped here.
--- Alex Ionescu ionucu@videotron.ca wrote:
mrnobo1024 wrote:
--- Alex Ionescu ionucu@videotron.ca wrote:
I meant that instead of mov edx,[esp+8] mov eax,[esp+4] it could be mov edx,[esp+4] mov eax,[esp+8]. I don't think that would affect instruction pairing.
That wouldn't swap them... you would just be inverting the bytes. Each ULONG must be swapped here.
Which can be done without an explicit swap. Just change the way they're read from the stack. As an example, suppose you're calling it with 0x0011223344556677:
                     edx       eax       ecx
mov edx, [esp+8]     00112233
mov eax, [esp+4]     00112233  44556677
bswap edx            33221100  44556677
bswap eax            33221100  77665544
mov ecx, eax         33221100  77665544  77665544
mov eax, edx         33221100  33221100  77665544
mov edx, ecx         77665544  33221100  77665544

                     edx       eax
mov edx, [esp+4]     44556677
mov eax, [esp+8]     44556677  00112233
bswap edx            77665544  00112233
bswap eax            77665544  33221100
They both give the correct answer in edx:eax, but the second would swap the ulongs implicitly by reading the high ulong into eax and the low one into edx. Much like how RtlUshortByteSwap works, but with ulongs instead of bytes.
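The two instruction sequences can also be checked mechanically. The following C sketch models each register as a uint32_t and replays both sequences step by step; the function names are hypothetical, and low and high stand for the dwords at [esp+4] and [esp+8] respectively.

```c
#include <assert.h>
#include <stdint.h>

/* Models the x86 BSWAP instruction on a 32-bit value. */
uint32_t bswap32_model(uint32_t v)
{
    return (v << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8) | (v >> 24);
}

/* First sequence: straight loads, byte swaps, then exchange eax/edx via ecx. */
uint64_t swap_with_exchange(uint32_t low, uint32_t high)
{
    uint32_t edx = high;            /* mov edx, [esp+8] */
    uint32_t eax = low;             /* mov eax, [esp+4] */
    uint32_t ecx;

    edx = bswap32_model(edx);       /* bswap edx */
    eax = bswap32_model(eax);       /* bswap eax */
    ecx = eax;                      /* mov ecx, eax */
    eax = edx;                      /* mov eax, edx */
    edx = ecx;                      /* mov edx, ecx */
    return ((uint64_t)edx << 32) | eax;
}

/* Second sequence: crossed loads make the exchange unnecessary. */
uint64_t swap_with_crossed_loads(uint32_t low, uint32_t high)
{
    uint32_t edx = low;             /* mov edx, [esp+4] */
    uint32_t eax = high;            /* mov eax, [esp+8] */

    edx = bswap32_model(edx);       /* bswap edx */
    eax = bswap32_model(eax);       /* bswap eax */
    return ((uint64_t)edx << 32) | eax;
}
```

For the 0x0011223344556677 example, both functions are called with low = 0x44556677 and high = 0x00112233, and both return 0x7766554433221100 in the modelled edx:eax pair.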
If this is to be optimized, inline asm should be used, as follows:
Reuel ben Yisrael wrote:
+.globl _UlongByteSwap
+.intel_syntax noprefix
+/* FUNCTIONS ***************************************************************/
+_UlongByteSwap:
+    push ebp            // save base
+    mov ebp,esp         // move stack to base
+    mov eax,[ebp+8]     // load the ULONG
+    bswap eax           // swap the ULONG
+    pop ebp             // restore the base
+    ret
this should work:
_UlongByteSwap:
    mov eax,[esp+4]     // load the ULONG (no frame pointer, so the argument sits at esp+4)
    bswap eax           // swap the ULONG
    ret
static force_inline ULONG UlongByteSwap(ULONG x)
{
    asm volatile(
        "bswap %0;"
        : "=r" (x)
        : "0" (x)
    );
    return x;
}
+.globl _UlonglongByteSwap
+.intel_syntax noprefix
+/* FUNCTIONS ***************************************************************/
+_UlonglongByteSwap:
+    push ebp            // save base
+    mov ebp,esp         // move stack to base
+    mov edx,[ebp+8]     // load the higher part of ULONGLONG
+    mov eax,[ebp+12]    // load the lower part of ULONGLONG
+    bswap edx           // swap the higher part
+    bswap eax           // swap the lower part
+    pop ebp             // restore the base
+    ret
_UlonglongByteSwap:
    mov edx,[esp+4]     // load the higher part of ULONGLONG
    mov eax,[esp+8]     // load the lower part of ULONGLONG
    bswap edx           // swap the higher part
    bswap eax           // swap the lower part
    ret
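static force_inline ULONGLONG UlonglongByteSwap(ULONGLONG x)
{
    ULONG h, l;

    /* Split x (held in edx:eax) into its low half (eax) and high half (edx). */
    asm volatile ("" : "=a" (l), "=d" (h) : "A" (x));

    /* Feed the halves back crossed, so each byte-swapped half lands in the opposite register. */
    asm volatile (
        "bswap %%eax;"
        "bswap %%edx;"
        : "=A" (x)
        : "a" (h), "d" (l)
    );
    return x;
}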
+_UshortByteSwap:
+    push ebp            // save base
+    mov ebp,esp         // move stack to base
+    mov eax,[ebp+8]     // load the USHORT
+    bswap eax           // swap the USHORT, xchg is slow so we use bswap with rol
+    rol eax,16          // make it USHORT
+    pop ebp             // restore the base
+    ret
_UshortByteSwap:
    mov eax,[esp+4]     // load the USHORT
    bswap eax           // swap the USHORT, xchg is slow so we use bswap with rol
    rol eax,16          // make it USHORT
    ret
or to save a byte...
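_UshortByteSwap:
    mov ecx,[esp+4]     // load the USHORT (ecx is caller-saved, unlike ebx)
    mov al, ch          // high byte becomes the low byte
    mov ah, cl          // low byte becomes the high byte
    ret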
static force_inline USHORT UshortByteSwap(USHORT x)
{
    asm volatile(
        "rolw $8, %0;"
        : "=r" (x)
        : "0" (x)
    );
    return x;
}