You (the code) are spinning non-atomically on a spinlock at PASSIVE_LEVEL, then raising the IRQL to DISPATCH_LEVEL to "acquire" it. This makes no sense whatsoever: other code can race you between the check and the acquire.
Why not just use a normal spinlock?
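To illustrate the race, here is a minimal user-mode sketch using C11 atomics. It is not the kernel code under review, and every `demo_` name is hypothetical. The point is only that the test and the set have to be one atomic step; a plain read of the lock word followed by a separate "take it" step leaves a window in which a second caller also sees the lock as free.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-mode stand-in for a kernel spinlock (illustration only). */
typedef struct {
    atomic_flag locked;
} demo_spinlock;

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* One attempt: test and set happen as a single atomic operation,
 * so two callers can never both observe the lock as free. */
static inline bool demo_try_acquire(demo_spinlock *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->locked,
                                              memory_order_acquire);
}

/* Spin until the atomic test-and-set succeeds. */
static inline void demo_acquire(demo_spinlock *lock)
{
    while (!demo_try_acquire(lock)) {
        /* busy-wait */
    }
}

/* Release with release ordering so writes made under the lock are visible
 * to the next owner. */
static inline void demo_release(demo_spinlock *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}
```

In the kernel, a normal spinlock (KeAcquireSpinLock / KeReleaseSpinLock) gives you the IRQL raise and the atomic acquire together.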
I can guess why such a construct is used, but better let Aleksey explain it.
Logging operations take a short amount of time and are spurious; in this case a worker thread would work better, it seems.
Not sure what is spurious.
I tried a worker thread, but with intensive logging the system became less responsive: Task Manager showed high CPU usage (> 50%) for the System process. The dedicated thread approach was smoother.
Uhh, MSDN says there is, and it should be an intrinsic in intrin_x86.h, please check again.
I also thought that it is in intrin_x86.h, but no such luck.
MSDN: "This function is supported only on Itanium Processor Family (IPF)."
You're doing store math:
KdpFreeBytes -= num; KdpFreeBytes = KdpBufferSize;
on the variable without an interlock, so those operations will not be safe w.r.t. your interlock.
You're also doing load math:
num = KdpFreeBytes;
on the variable without a fence, so this operation will not be safe w.r.t. your interlock (only MSVC generates fences around volatile variables, and even then they're not sufficient on IA64/Alpha systems).
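A rough sketch of what "every access goes through the interlock" could look like, written with C11 atomics in user mode rather than the kernel Interlocked* routines. The `demo_` names and the buffer size are hypothetical stand-ins for KdpFreeBytes and KdpBufferSize, and the "fail if not enough room" check is my assumption, not something taken from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for KdpBufferSize / KdpFreeBytes. */
#define DEMO_BUFFER_SIZE ((size_t)4096)
static atomic_size_t demo_free_bytes = DEMO_BUFFER_SIZE;

/* Load with acquire ordering instead of a plain read of the variable. */
static inline size_t demo_available(void)
{
    return atomic_load_explicit(&demo_free_bytes, memory_order_acquire);
}

/* Atomic version of "free -= num": a compare-exchange loop that retries
 * if another thread changed the counter between our load and our store.
 * Returns false (and changes nothing) if fewer than num bytes are free. */
static inline bool demo_reserve(size_t num)
{
    size_t old = atomic_load_explicit(&demo_free_bytes, memory_order_acquire);
    do {
        if (old < num)
            return false;
        /* On failure, old is refreshed with the current value. */
    } while (!atomic_compare_exchange_weak_explicit(
                 &demo_free_bytes, &old, old - num,
                 memory_order_acq_rel, memory_order_acquire));
    return true;
}

/* Atomic version of "free = size": store with release ordering. */
static inline void demo_reset(void)
{
    atomic_store_explicit(&demo_free_bytes, DEMO_BUFFER_SIZE,
                          memory_order_release);
}
```

The kernel-side counterparts would be along the lines of InterlockedExchangeAdd for the subtraction and InterlockedExchange for the reset.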
Yes, none of it is safe on SMP, but on UP I see only one (small) bug.