Hi, I believe the new CVS changes by Hartmut have caused the elusive lockups I've been having to pop up more often now. Two between yesterday and today.
This printout looks good, but I've had debug messages overlapping each other at the beginning of the build process.
Debug output before the lockup: there seems to be a double printout where before there was just one per KernelTime update, and the KernelTime looks like it has stopped:
(KERNEL32:mem/global.c:412) Memory Load: 14 (ex/sysinfo.c:964) SystemFullMemoryInformation (ex/sysinfo.c:972) PID: 1, KernelTime: 4266465 PFFree: 65536 PFUsed: 0 MC_CACHE 22993, MC_USER 4858, MC_PPOOL 1079, MC_NPPOOL 7069, MmStats.NrFreePages 217935
(KERNEL32:mem/global.c:412) Memory Load: 14 (ex/sysinfo.c:964) SystemFullMemoryInformation (ex/sysinfo.c:972) PID: 1, KernelTime: 4266465 PFFree: 65536 PFUsed: 0 MC_CACHE 23009, MC_USER 4864, MC_PPOOL 1079, MC_NPPOOL 7069, MmStats.NrFreePages 217913
(KERNEL32:mem/global.c:412) Memory Load: 14 (ex/sysinfo.c:964) SystemFullMemoryInformation (ex/sysinfo.c:972) PID: 1, KernelTime: 4266465 PFFree: 65536 PFUsed: 0 MC_CACHE 23009, MC_USER 4864, MC_PPOOL 1079, MC_NPPOOL 7069, MmStats.NrFreePages 217913
(KERNEL32:mem/global.c:412) Memory Load: 14 (ex/sysinfo.c:964) SystemFullMemoryInformation (ex/sysinfo.c:972) PID: 1, KernelTime: 4266471 PFFree: 65536 PFUsed: 0 MC_CACHE 23025, MC_USER 4870, MC_PPOOL 1079, MC_NPPOOL 7069, MmStats.NrFreePages 217891
(KERNEL32:mem/global.c:412) Memory Load: 14 (ex/sysinfo.c:964) SystemFullMemoryInformation (ex/sysinfo.c:972) PID: 1, KernelTime: 4266471 PFFree: 65536 PFUsed: 0 MC_CACHE 23025, MC_USER 4870, MC_PPOOL 1079, MC_NPPOOL 7069, MmStats.NrFreePages 217891
(KERNEL32:mem/global.c:412) Memory Load: 14 (ex/sysinfo.c:964) SystemFullMemoryInformation (ex/sysinfo.c:972) PID: 1, KernelTime: 4266474 PFFree: 65536 PFUsed: 0 MC_CACHE 22065, MC_USER 4876, MC_PPOOL 1079, MC_NPPOOL 7069, MmStats.NrFreePages 218845
(KERNEL32:mem/global.c:412) Memory Load: 13 (ex/sysinfo.c:964) SystemFullMemoryInformation (ex/sysinfo.c:972) PID: 1, KernelTime: 4266474 PFFree: 65536 PFUsed: 0 MC_CACHE 22065, MC_USER 4876, MC_PPOOL 1079, MC_NPPOOL 7069, MmStats.NrFreePages 218845
(KERNEL32:mem/global.c:412) Memory Load: 13 (ex/sysinfo.c:964) SystemFullMemoryInformation (ex/sysinfo.c:972) PID: 1, KernelTime: 4266474 PFFree: 65536 PFUsed: 0 MC_CACHE 22081, MC_USER 4882, MC_PPOOL 1079, MC_NPPOOL 7069, MmStats.NrFreePages 218823
(KERNEL32:mem/global.c:412) Memory Load: 13 (ex/sysinfo.c:964) SystemFullMemoryInformation (ex/sysinfo.c:972) PID: 1, KernelTime: 4266474 PFFree: 65536 PFUsed: 0 MC_CACHE 22081, MC_USER 4882, MC_PPOOL 1079, MC_NPPOOL 7069, MmStats.NrFreePages 218823
(KERNEL32:mem/global.c:412) Memory Load: 13 (ex/sysinfo.c:964) SystemFullMemoryInformation (ex/sysinfo.c:972) PID: 1, KernelTime: 4266474 PFFree: 65536 PFUsed: 0 MC_CACHE 22097, MC_USER 4888, MC_PPOOL 1079, MC_NPPOOL 7069, MmStats.NrFreePages 218801
(KERNEL32:mem/global.c:412) Memory Load: 13 (ex/sysinfo.c:964) SystemFullMemoryInformation (ex/sysinfo.c:972) PID: 1, KernelTime: 4266474 PFFree: 65536 PFUsed: 0 MC_CACHE 22097, MC_USER 4888, MC_PPOOL 1079, MC_NPPOOL 7069, MmStats.NrFreePages 218801
(KERNEL32:mem/global.c:412) Memory Load: 13 (ex/sysinfo.c:964) SystemFullMemoryInformation (ex/sysinfo.c:972) PID: 1, KernelTime: 4266474 PFFree: 65536 PFUsed: 0 MC_CACHE 22113, MC_USER 4889, MC_PPOOL 1079, MC_NPPOOL 7069, MmStats.NrFreePages 218784
(KERNEL32:mem/global.c:412) Memory Load: 13 (ex/sysinfo.c:964) SystemFullMemoryInformation (ex/sysinfo.c:972) PID: 1, KernelTime: 4266474 PFFree: 65536 PFUsed: 0 MC_CACHE 22113, MC_USER 4889, MC_PPOOL 1079, MC_NPPOOL 7069, MmStats.NrFreePages 218784
(KERNEL32:mem/global.c:412) Memory Load: 13 (ex/sysinfo.c:964) SystemFullMemoryInformation (ex/sysinfo.c:972) PID: 1, KernelTime: 4266474 PFFree: 65536 PFUsed: 0 MC_CACHE 22129, MC_USER 4900, MC_PPOOL 1079, MC_NPPOOL 7069, MmStats.NrFreePages 218757
(KERNEL32:mem/global.c:412) Memory Load: 13 (ex/sysinfo.c:964) SystemFullMemoryInformation (ex/sysinfo.c:972) PID: 1, KernelTime: 4266474 PFFree: 65536 PFUsed: 0 MC_CACHE 22129, MC_USER 4900, MC_PPOOL 1079, MC_NPPOOL 7069, MmStats.NrFreePages 218757
(KERNEL32:mem/global.c:412) Memory Load: 13 (ex/sysinfo.c:964) SystemFullMemoryInformation (ex/sysinfo.c:972) PID: 1, KernelTime: 4266474 PFFree: 65536 PFUsed: 0 MC_CACHE 22145, MC_USER 4904, MC_PPOOL 1079, MC_NPPOOL 7069, MmStats.NrFreePages 218737
(KERNEL32:mem/global.c:412) Memory Load: 13 (ex/sysinfo.c:964) SystemFullMemoryInformation (ex/sysinfo.c:972) PID: 1, KernelTime: 4266474 PFFree: 65536 PFUsed: 0 MC_CACHE 22145, MC_USER 4904, MC_PPOOL 1079, MC_NPPOOL 7069, MmStats.NrFreePages 218737
(KERNEL32:mem/global.c:412) Memory Load: 13
System lockup, but the keyboard still seems to work: the Num Lock and Scroll Lock lights toggle on and off.
Anyone have a good guess where to start?
Thanks, James
-----Original Message----- From: ros-dev-bounces@reactos.com [mailto:ros-dev-bounces@reactos.com] On Behalf Of James Tabor Sent: Thursday, November 04, 2004 7:55 AM To: ReactOS Development List Subject: [ros-dev] Race Condition?
Hi, I believe the new CVS changes by Hartmut have caused the elusive lockups I've been having to pop up more often now. Two between yesterday and today.
This printout looks good, but I've had debug messages overlapping each other at the beginning of the build process.
Debug output before the lockup: there seems to be a double printout where before there was just one per KernelTime update, and the KernelTime looks like it has stopped,
Hi,
I cannot reproduce your problem. I'm using a nearly clean CVS tree from yesterday. I've added some small modifications so that I can compile ros with gcc-3.4.1 and the latest w32api. I also use modified inf files for the installation. My test system is a SCSI-only MP machine. I assume you are talking about the UP system. I'm using cmd.exe as the login shell. I can compile ros on ros.

I don't think the changes to the HAL and the IRQ handling are the problem. I've added some changes to the sequence in which threads acquire the PiThreadLock and the DispatcherDatabaseLock. Currently the locking sequence is only a problem for MP systems, not for UP systems. The problem on MP systems is that one thread acquires one lock while a second thread acquires the other; after this, each thread tries to acquire the other lock. This is currently not a problem on a UP system, because after a thread has acquired one lock the system runs at DISPATCH_LEVEL and no thread switching is possible, so no other thread can acquire the second lock. There are more problems with the locking sequence on MP machines. You have reported the KernelTime problem. One of Filip's changes contained a small mistake which called KeUpdateSystemTime at the wrong IRQL; the time was always accounted to the interrupt time. I've fixed this, but didn't add a comment.
- Hartmut
Hi! Hartmut Birr wrote:
Hi,
I cannot reproduce your problem. I'm using a nearly clean CVS tree from yesterday. I've added some small modifications so that I can compile ros with gcc-3.4.1 and the latest w32api. I also use modified inf files for the installation. My test system is a SCSI-only MP machine. I assume you are talking about the UP system. I'm using cmd.exe as the login shell. I can compile ros on ros. I don't think the changes to the HAL and the IRQ handling are the problem. I've added some changes to the sequence in which threads acquire the PiThreadLock and the DispatcherDatabaseLock. Currently the locking sequence is only a problem for MP systems, not for UP systems. The problem on MP
oh okay, mp = SMP
systems is that one thread acquires one lock while a second thread acquires the other; after this, each thread tries to acquire the other lock. This is currently not a problem on a UP system, because after a thread has acquired one lock the system runs at DISPATCH_LEVEL and no thread switching is possible, so no other thread can acquire the second lock. There are more problems with the locking sequence on MP machines. You have reported the KernelTime problem. One of Filip's changes contained a small mistake which called KeUpdateSystemTime at the wrong IRQL; the time was always accounted to the interrupt time. I've fixed this, but didn't add a comment.
- Hartmut
Okay.
FYI, you have to let ros run for over ~12 hours to get it to fail. Dividing out the KernelTime clock, that works out to approx. 11.85 hours until it froze. Oh, this was tested without Explorer running, just the plain cmd console after boot-up.
I will start with the current cvs tree at this time and restart the run.
I'll post some more information next time. 8^(
Thanks, James
Hartmut Birr wrote:
Hi,
I can not reproduce your problem. I'm using a nearly clean cvs tree from yesterday. I've add some little modifications that I can compile ros with gcc-3.4.1 and the latest w32api. I use also modified inf files for the installation. My test system is a scsi only mp machine. I assume that you spoke about the up system. I'm using cmd.exe as login shell. I can compile ros on ros. I think that the changes to the hal and the irq handling are not the problem. I've add some changes for the sequence how threads acquire the PiThreadLock and the DispatcherDatabaseLock.
I believe there is only one lock (the dispatcher lock) on Windows (<= XP). On W2K3 there is another lock in the KPRCB, because the dispatching structures were moved there and are per-processor now. The per-processor idle thread then moves threads from other processors' queues, AFAIK. Please correct me if I'm wrong.
Filip
P.S. The thread dispatching that is currently in Ps should be moved to Ke.
Filip Navara wrote:
I believe there is only one lock (the dispatcher lock) on Windows (<= XP).
That's true - I already prepared the thread management for that. The current lock could be renamed to the dispatcher lock; when inserting/removing threads into/from a process, the process should just be locked for the duration of the action. I have written a patch for the process-locking mechanism that Win2K uses, but Alex prefers to "upgrade" to XP and use pushlocks, which aren't implemented yet.
P.S. The thread dispatching that is currently in Ps should be moved to Ke.
I agree, and not just that: everything that has to do with the KTHREAD/KPROCESS structures should be moved there as well.
Thomas
Thomas Weidenmueller wrote:
Filip Navara wrote:
I believe there is only one lock (the dispatcher lock) on Windows (<= XP).
That's true - I already prepared the thread management for that. The current lock could be renamed to the dispatcher lock; when inserting/removing threads into/from a process, the process should just be locked for the duration of the action. I have written a patch for the process-locking mechanism that Win2K uses, but Alex prefers to "upgrade" to XP and use pushlocks, which aren't implemented yet.
P.S. The thread dispatching that is currently in Ps should be moved to Ke.
I agree, and not just that: everything that has to do with the KTHREAD/KPROCESS structures should be moved there as well.
And Ke should stop using Ps functions.
Thomas _______________________________________________ Ros-dev mailing list Ros-dev@reactos.com http://reactos.com:8080/mailman/listinfo/ros-dev
Best regards, Alex Ionescu
Hi,
I would like it if you could commit these changes or send me a diff. The race condition between the two locks is one of the major problems on SMP machines. Another problem is the registry: some registry functions are not thread-safe, which results in wrong values and very often in a crash.
- Hartmut
-----Original Message----- From: ros-dev-bounces@reactos.com [mailto:ros-dev-bounces@reactos.com] On Behalf Of Thomas Weidenmueller Sent: Friday, November 05, 2004 5:47 PM To: ReactOS Development List Subject: Re: [ros-dev] Race Condition?
Filip Navara wrote:
I believe there is only one lock (the dispatcher lock) on Windows (<= XP).
That's true - I already prepared the thread management for that. The current lock could be renamed to the dispatcher lock; when inserting/removing threads into/from a process, the process should just be locked for the duration of the action. I have written a patch for the process-locking mechanism that Win2K uses, but Alex prefers to "upgrade" to XP and use pushlocks, which aren't implemented yet.
P.S. The thread dispatching that is currently in Ps should be moved to Ke.
I agree, and not just that: everything that has to do with the KTHREAD/KPROCESS structures should be moved there as well.
Thomas
"Hartmut Birr" hartmut.birr@gmx.de wrote:
I would like it if you could commit these changes or send me a diff. The race condition between the two locks is one of the major problems on SMP machines. Another problem is the registry: some registry functions are not thread-safe, which results in wrong values and very often in a crash.
Which registry functions are not thread-safe?
Regards, Eric
Eric Kohl wrote:
"Hartmut Birr" hartmut.birr@gmx.de wrote:
I would like it if you could commit these changes or send me a diff. The race condition between the two locks is one of the major problems on SMP machines. Another problem is the registry: some registry functions are not thread-safe, which results in wrong values and very often in a crash.
Which registry functions are not thread-safe?
Regards, Eric
Hi Eric,
AFAIK none of them are thread-safe... what I've seen on Windows is that callers of registry functions usually take a special registry lock before accessing them. The functions themselves are not thread-safe.
Best regards, Alex Ionescu
-----Original Message----- From: ros-dev-bounces@reactos.com [mailto:ros-dev-bounces@reactos.com] On Behalf Of Eric Kohl Sent: Saturday, November 06, 2004 5:14 PM To: ReactOS Development List Subject: Re: [ros-dev] Race Condition?
Which registry functions are not thread-safe?
Hi,
on my SMP machine I very often get a crash in CmiObjectParse/CmiAddKeyToList/RtlCopyMemory. In most cases something is wrong with the parent key. Sometimes the value of NumberOfSubKeys is 0xcccccccc. Sometimes the keys 'Windows NT', 'CurrentVersion' or 'SysFontSubstitutes' are inserted in random places in the registry. The registry parsing seems to be triggered from GetFontFamilyInfoForSubstitutes (win32k).
- Hartmut
"Hartmut Birr" hartmut.birr@gmx.de wrote:
Hi,
on my SMP machine I very often get a crash in CmiObjectParse/CmiAddKeyToList/RtlCopyMemory. In most cases something is wrong with the parent key. Sometimes the value of NumberOfSubKeys is 0xcccccccc. Sometimes the keys 'Windows NT', 'CurrentVersion' or 'SysFontSubstitutes' are inserted in random places in the registry. The registry parsing seems to be triggered from GetFontFamilyInfoForSubstitutes (win32k).
Most of these issues should be fixed by the attached patch. It replaces the hive locks and the hive list lock (executive resources) with a global registry lock. The result is that only a single thread can modify the registry at a time.
Regards, Eric
Hi Eric
Most of these issues should be fixed by the attached patch. It replaces the hive locks and the hive list lock (executive resources) with a global registry lock. The result is that only a single thread can modify the registry at a time.
What is the general criterion (if one exists) for choosing a synchronization object in kernel-mode code?
I ask because a possible race condition was reported by Gé in the LPC code. As a preliminary step toward fixing that, I introduced a FAST_MUTEX to queue threads that use the LPC facility. Now I see that you use an ERESOURCE to fix the same problem in the CM.
Emanuele
Aliberti Emanuele wrote:
Hi Eric
Most of these issues should be fixed by the attached patch. It replaces the hive locks and the hive list lock (executive resources) with a global registry lock. The result is that only a single thread can modify the registry at a time.
What is the general criterion (if one exists) for choosing a synchronization object in kernel-mode code?
I ask because a possible race condition was reported by Gé in the LPC code. As a preliminary step toward fixing that, I introduced a FAST_MUTEX to queue threads that use the LPC facility. Now I see that you use an ERESOURCE to fix the same problem in the CM.
Emanuele
Hi,
A FAST_MUTEX cannot be shared (it's always exclusive), while an ERESOURCE can be acquired exclusive or shared. It's usually a waste of resources to use an ERESOURCE if you only need exclusive locking, but since we need to support shared access as well, an ERESOURCE is a good choice. It's commonly used for reading/writing to disk (e.g., the registry).
Best regards, Alex Ionescu
At 16.57 05/11/2004, you wrote:
P.S. The thread dispatching that is currently in Ps should be moved to Ke.
Fun fact about the scheduler: the reason there are both KeXxx and PsXxx scheduling functions, and both KTHREAD and ETHREAD, is yet another leftover from the initial microkernel design. Basically, KeXxx is the kernel proper (the rest of the kernel is called the "executive"), in the microkernel sense of the term: it only does scheduling and obscure low-level architecture-specific stuff. It doesn't care how threads are created, how they are kept alive, or when they are destroyed: it just requires its subsystems to please hold the dispatcher lock while they add or remove threads from its queues. It effectively allows multiple kernels to coexist with the same scheduler (provided they have a way to share hardware resources). Keep this in mind when you read our spaghetti code that happily mixes Ps and Ke together :-P
Hi all!
Here is the last run: TickCount at 121 hours, about 5.069 days. No stats from the PCR kernel time; you can read the rest.
(ex/sysinfo.c:965) SystemFullMemoryInformation (ex/sysinfo.c:977) PID: 1, KernelTime: 37578485 PcrKTime: 0 PcrIdleTime: 37252553 TickCount: 43795486 PFFree: 65536 PFUsed: 0 MC_CACHE 20272, MC_USER 7568, MC_PPOOL 2709, MC_NPPOOL 6913, MmStats.NrFreePages 216472 (KERNEL32:mem/global.c:412) Memory Load: 14
CVS from 11/05, James
-----Original Message----- From: ros-dev-bounces@reactos.com [mailto:ros-dev-bounces@reactos.com] On Behalf Of James Tabor Sent: Thursday, November 11, 2004 8:53 AM To: ReactOS Development List Subject: Re: [ros-dev] Race Condition?
Hi all!
Here is the last run: TickCount at 121 hours, about 5.069 days. No stats from the PCR kernel time; you can read the rest.
Pcr->KernelTime and Pcr->UserTime were added after the counting functions were implemented; that is why they are not incremented.
- Hartmut