While working on proper logoff/shutdown processing, I'm running into an interesting race condition inside kernel32. The shutdown command (apps/utils/shutdown) calls ExitWindowsEx() and then terminates, which causes kernel32's DllMain() function to be called with the DLL_PROCESS_DETACH reason. This will clean up kernel32 resources, like calling RtlDeleteCriticalSection(&ConsoleLock).
Now, before the process is completely shutdown, CSRSS will send it a CTRL_LOGOFF_EVENT. To deliver this event, CSRSS calls CreateRemoteThread(). So CreateRemoteThread() is called while the main thread is executing DLL_PROCESS_DETACH cleanup. During handling of CTRL_LOGOFF_EVENT, the new thread will try to enter the ConsoleLock critical section, which was already deleted by the main thread. Chaos results.
To solve this, first I was looking for a way to disable new thread creation when a process enters ExitProcess(), but that won't solve the problem. We can still have a thread A which calls ExitProcess(), have a context switch to existing thread B of the same process which does something that needs the ConsoleLock. Then I considered SuspendThread()ing all other threads of the process when ExitProcess() is called, but that could potentially lead to deadlocks if the suspended threads hold synchronization objects. So now I'm inclined to believe that the solution is to just not do the resource cleanup during DLL_PROCESS_DETACH handling. But that doesn't feel very clean either. Anyone have a better idea?
GvG
The DllMain() help in the PSDK has the following tidbit:
While it is acceptable to create synchronization objects in DllMain, you should not perform synchronization in DllMain (or a function called by DllMain) because all calls to DllMain are serialized. Waiting on synchronization objects in DllMain can cause a deadlock.
Does ReactOS serialize calls to DllMain? I would think if it was, CreateRemoteThread() would block waiting for DLL_THREAD_ATTACH until DLL_PROCESS_DETACH had finished.
(And then of course, there are other issues to be dealt with because the DLL is unloaded.)
ExitProcess() PSDK has the following tidbit which might also help:
The ExitProcess, ExitThread, CreateThread, CreateRemoteThread functions, and a process that is starting (as the result of a call by CreateProcess) are serialized between each other within a process. Only one of these events can happen in an address space at a time. This means the following restrictions hold:
During process startup and DLL initialization routines, new threads can be created, but they do not begin execution until DLL initialization is done for the process. Only one thread in a process can be in a DLL initialization or detach routine at a time. If any process is in its DLL initialization or detach routine, ExitProcess does not return.
Hope this helps.
Thanks,
Joseph
Ge van Geldorp wrote:
While working on proper logoff/shutdown processing, I'm running into an interesting race condition inside kernel32. The shutdown command (apps/utils/shutdown) calls ExitWindowsEx() and then terminates, which causes kernel32's DllMain() function to be called with the DLL_PROCESS_DETACH reason. This will clean up kernel32 resources, like calling RtlDeleteCriticalSection(&ConsoleLock).
Now, before the process is completely shutdown, CSRSS will send it a CTRL_LOGOFF_EVENT. To deliver this event, CSRSS calls CreateRemoteThread(). So CreateRemoteThread() is called while the main thread is executing DLL_PROCESS_DETACH cleanup. During handling of CTRL_LOGOFF_EVENT, the new thread will try to enter the ConsoleLock critical section, which was already deleted by the main thread. Chaos results.
To solve this, first I was looking for a way to disable new thread creation when a process enters ExitProcess(), but that won't solve the problem. We can still have a thread A which calls ExitProcess(), have a context switch to existing thread B of the same process which does something that needs the ConsoleLock. Then I considered SuspendThread()ing all other threads of the process when ExitProcess() is called, but that could potentially lead to deadlocks if the suspended threads hold synchronization objects. So now I'm inclined to believe that the solution is to just not do the resource cleanup during DLL_PROCESS_DETACH handling. But that doesn't feel very clean either. Anyone have a better idea?
GvG
_______________________________________________ Ros-dev mailing list Ros-dev@reactos.org http://www.reactos.org/mailman/listinfo/ros-dev
From: Joseph Galbraith
Does ReactOS serialize calls to DllMain? I would think if it was, CreateRemoteThread() would block waiting for DLL_THREAD_ATTACH until DLL_PROCESS_DETACH had finished.
No, ReactOS doesn't serialize the calls.
Hope this helps.
It does, thanks! I'm not sure I know exactly how to solve it now, but it gives me something to think about.
GvG
Ge van Geldorp wrote:
From: Joseph Galbraith
Does ReactOS serialize calls to DllMain? I would think if it was, CreateRemoteThread() would block waiting for DLL_THREAD_ATTACH until DLL_PROCESS_DETACH had finished.
No, ReactOS doesn't serialize the calls.
Hope this helps.
It does, thanks! I'm not sure I know exactly how to solve it now, but it gives me something to think about.
GvG
Hi Ge,
I have some loader fixes which handle serialization and fix similar race conditions. If you look on my blog, you can see the timeframe for when I plan to commit them... if you're not going to submit the shutdown code until then, then you can probably consider this issue fixed and work on other problems. Of course, if you prefer, you can write a quick hack if you need this working now :)
Best regards, Alex Ionescu
From: Alex Ionescu
I have some loader fixes which handle serialization and fix similar race conditions. If you look on my blog, you can see the timeframe for when I plan to commit them... if you're not going to submit the shutdown code until then, then you can probably consider this issue fixed and work on other problems. Of course, if you prefer, you can write a quick hack if you need this working now :)
I had promised TwoTailedFox to commit my shutdown work this week. The race condition only results in a usermode exception, terminating the thread. Since the process was going to exit anyway, it's not really a big deal and doesn't block my progress. The reason I brought it up was that race conditions like this are generally hard to reproduce, so when I do hit a reproducible case I tend to start fixing it before moving on. So, in this case I'll continue on the shutdown stuff and commit it this week, leaving the kernel32 issue to you.
GvG
Ge van Geldorp wrote:
From: Alex Ionescu
I have some loader fixes which handle serialization and fix similar race conditions. If you look on my blog, you can see the timeframe for when I plan to commit them... if you're not going to submit the shutdown code until then, then you can probably consider this issue fixed and work on other problems. Of course, if you prefer, you can write a quick hack if you need this working now :)
I had promised TwoTailedFox to commit my shutdown work this week. The race condition only results in a usermode exception, terminating the thread. Since the process was going to exit anyway, it's not really a big deal and doesn't block my progress. The reason I brought it up was that race conditions like this are generally hard to reproduce, so when I do hit a reproducible case I tend to start fixing it before moving on. So, in this case I'll continue on the shutdown stuff and commit it this week, leaving the kernel32 issue to you.
GvG
Hi,
Thanks,
I'm -really- busy this week (paid work) so if you have some extra time to write a test case (which spefically hits this race condition) I would really appreciate that... I can do it myself later, but it's just something that might take you only 10 minutes to work on since you've diagnosed the issue. It woudl be a good generic regression test too... just a thought..and only if you have a lot of free time for it.
Best regards, Alex Ionescu
Hi, Is this what you are writing about?
lib/setupapi/parser.c:1531) (00630c50,L"RegistrationPhase2",L"RegisterDlls"): r eturning 0 (lib/setupapi/parser.c:1737) context 00630c50/00630c50/2/0 index 1 returning L"O leControlDlls" (lib/setupapi/parser.c:1319) (00630c50,L"OleControlDlls") returning 15 (lib/setupapi/parser.c:1531) (./ntoskrnl/ke/exception.c:94) KiRaiseException (ntoskrnl/ke/i386/exp.c:1312) Unhandled UserMode exception, terminating thread
This is where the wiz stops and waits for you to tell it to reboot. But, it just set there w/o anyway to select to go back or next. The only way to exit this is to right click and click on close to reboot.
Thanks, James
From: James Tabor
Hi, Is this what you are writing about?
lib/setupapi/parser.c:1531) (00630c50,L"RegistrationPhase2",L"RegisterDlls"): returning 0 (lib/setupapi/parser.c:1737) context 00630c50/00630c50/2/0 index 1 returning L"OleControlDlls" (lib/setupapi/parser.c:1319) (00630c50,L"OleControlDlls") returning 15 (lib/setupapi/parser.c:1531) (./ntoskrnl/ke/exception.c:94) KiRaiseException (ntoskrnl/ke/i386/exp.c:1312) Unhandled UserMode exception, terminating thread
No, this is an unrelated problem. I haven't seen this one before though.
GvG
From: James Tabor lib/setupapi/parser.c:1531) (00630c50,L"RegistrationPhase2",L"RegisterDlls"): r eturning 0 (lib/setupapi/parser.c:1737) context 00630c50/00630c50/2/0 index 1 returning L"O leControlDlls" (lib/setupapi/parser.c:1319) (00630c50,L"OleControlDlls") returning 15 (lib/setupapi/parser.c:1531) (./ntoskrnl/ke/exception.c:94) KiRaiseException (ntoskrnl/ke/i386/exp.c:1312) Unhandled UserMode exception, terminating thread
This is where the wiz stops and waits for you to tell it to reboot. But, it just set there w/o anyway to select to go back or next. The only way to exit this is to right click and click on close to reboot.
r19797 fixes the wizard so at least you can continue normally. The underlying problem (an exception during registration of DLLs) isn't fixed yet though (I can't reproduce it).
GvG
Ge van Geldorp wrote:
From: James Tabor lib/setupapi/parser.c:1531) (00630c50,L"RegistrationPhase2",L"RegisterDlls"): r eturning 0 (lib/setupapi/parser.c:1737) context 00630c50/00630c50/2/0 index 1 returning L"O leControlDlls" (lib/setupapi/parser.c:1319) (00630c50,L"OleControlDlls") returning 15 (lib/setupapi/parser.c:1531) (./ntoskrnl/ke/exception.c:94) KiRaiseException (ntoskrnl/ke/i386/exp.c:1312) Unhandled UserMode exception, terminating thread
This is where the wiz stops and waits for you to tell it to reboot. But, it just set there w/o anyway to select to go back or next. The only way to exit this is to right click and click on close to reboot.
r19797 fixes the wizard so at least you can continue normally. The underlying problem (an exception during registration of DLLs) isn't fixed yet though (I can't reproduce it).
GvG
Oh wow! Yes it is very intermittent, happens 5 out of 7 boots. I'll test it tonight.
Thank you! James