Am 15.05.2011 01:49, schrieb Alex Ionescu:
Look, I don't understand why this is so hard for the project to understand:

Certain Windows functions are BROKEN. Certain Windows algorithms make NO SENSE. Other things are BADLY OPTIMIZED. Other things make WRONG ASSUMPTIONS.

But these set of behaviors and design failures are what billions of customers and pieces of software DEPEND ON. Microsoft cannot change the STACK LAYOUT of IE due to fears that BHOs which look up the file handle up the stack will break. They have SHIMS that LIE about the stack order to make those BHOs work, even as the stack changes.

Software depends on behaviour. If 2 functions behave the same, then its safe to replace one with another. This is a very basic concept that ignores assembly code sequences etc. But if we couldn't rely on this concept the whole project is doomed to fail anyway, unless we copy most of MS code on assembly level, since we have no chance of evaluating where we can safely differ from MS code and where we can't.



Microsoft, on this function, writes,

" Remarks

This function is not atomic because it is implemented as two separate locked instructions. An atomic 64-bit read that occurs on another thread during the execution of this intrinsic could result in an inconsistent value being read.

"

I don't know where you have that from, but current MSDN doesn't show that remark and google gives no results either. So you can safely assume that the majority of people will not know about it. So all their code might depend on is that the function works correctly as descibed, not the opposite.



Can you claim, with absolute CERTAINTY that NOBODY in the WORLD depends on this behavior? That it's two separate intrusctions, and that an inconsistent value can be read? How do you know certain VMs don't look for this code sequence, how do you know security/emulation software doesn't abuse/utilise this? ETC ETC ETC. It's such a simple/short sequence that it's easy to recognize. 

As a simple example, HexRays recognizes the particular ASM sequence as InterlockedAddLargeStatistic. Now, in ReactOS, it won't.
Extending this argument, we couldn't succeed, unless we would copy almost every piece of code on assembly level. Implementing ONE function identically doesn't solve this problem. How can we know that certain kind of software doesn't depend on MSVC being used as a compiler? Or that certain instructions appear on the beginning of a function? What about a particular call chain being detected? What about symbols from the MS server? HexRays probably recognizes a lot of things. Because that's its job. But find me a driver that won't run, because this function is not broken.


What's worse, is that recent Windows kernels define ExInterlocked.... to the compiler intrinsic _Interlocked. Do you suggest to also implement the intrinsic differently, which is in-lined? So now you will inline those 50 lines of (perfect, yes) code in every caller? These functions are called at ISR level usually -- great idea!

Or maybe you keep the (broken) MS implementation in the intrinsic, but use this one for the export? Great, now you introduce two sets of behaviors, and add even more non-determinism.
No, it doesn't. Except the assembly sequence everything is fine. One intrinsic (broken) function and one correctly implemented function being executed at the same time on the same memory would work very well together. The only non-deterministic behaviour comes from the broken implementation (running at the same time with another broken implementation).


Do you really believe Microsoft is stupid and didn't realize this? That they didn't change things around just because they forgot? No, it's 20 years too late to change anything, and if you start doing this in ReactOS, you are certain to fail.

When the behaviour is "known" to be broken and must be that way, then why didn't the comments in the function say that? You obviously like to document every single assembly opcode to describe what it does, while every developer who can read assembly knows that anyway, but you forget to document the only and most important information, which is that this function behaves in this broken way on Windows. Maybe that was at a time where you weren't really capable of understanding such issues? When you "wrote" the code, did you even notice the problem?


Please leave all personal attacks and politics aside, tell me where I'm technically wrong. You cannot prove this better implementation won't break things. You have the compiler intrinsic to worry about. I named at least one piece of software that depends on it. I named critical speed/code size issues with this function. What more is needed to prove the point?
Anything that provides substance maybe?

Anyway, I'll leave it to Aleksey to decide if the function needs to be identical or not. I'm not going to waste any more time discussing this completely retarded and irrelevant issue.

Thanks,
Timo