Hello,
I have identified a major deficiency in the x86 kernel: NMI support. Fixing it requires a minor overhaul of multiple low-level components, and the problem stems from poorly understood implementation details of the x86 architecture.
NMIs are Non-Maskable Interrupts, similar to SMIs (generated for SMM mode) but typically used for critical hardware errors (somewhat a precursor to MCEs).
Most modern operating systems support NMIs as a debugging tool: when a system is deadlocked due to interrupt issues, or is stuck in a state with interrupts disabled, it is often hard to "break in" to the system to analyze the issue. NMIs are also used as a last resort to terminate the system in case of a major hardware error (such as power issues or parity errors).
NMIs can also be generated by external hardware:
This simple circuit generates a tri-state one-cycle SERR# pulse on the PCI bus, which causes an NMI. It can be used as an emergency dump switch, when other methods fail.
The following issues exist in ReactOS that hinder NMI support:
- The I/O Privilege Map (IOPM) configuration is done dangerously and incorrectly. A number of misunderstood hardcoded values are used throughout the code, assumptions are made on the number of IOPMs, and IOPM switching is done very poorly during BIOS Calls.
- BIOS Calls currently execute on the TSS context of the current state. This works fine with the normal KGDT_TSS that NTOS executes on, but causes dangerous errors in scenarios such as Double Faults (KGDT_DF_TSS) and NMIs (KGDT_NMI_TSS). These TSS segments do not have an IOPM allocated, which causes memory corruption when the BIOS Call code attempts to save and restore the IOPM by assuming it's there. It also causes BIOS code to fail during execution: after tracing with an IDP we discovered that BIOS I/O Port accesses were generating exceptions, which turned out to be because the BIOS was reading the bogus, non-existent IOPM and thus failing to validate I/O Port access. This is currently a problem in ReactOS, as a double-fault trap will trigger massive corruption: the panic code will attempt to draw the "Blue screen of death", which requires a Video ROM Interrupt 10h through a BIOS Call, which will fail as explained. The same scenario would also happen in the NMI case.
- The NMI trap code is not yet implemented.
- The KeRegisterNmiCallback and KeDeregisterNmiCallback routines are not yet implemented.
- KPRCB Context-switching is not yet implemented, along with related routines. Only the high-level routines used during debug traps are implemented, but not the support required for resuming after an NMI.
- HalHandleNMI is subject to recursive NMI scenarios.
- BIOS Calls do excessive TLB flushing.
- The logic for IDT write-protection during BIOS Calls is overcomplicated. The IDT should always be made read-only and then restored to its previous state.
- TLB flushing in the HAL appears to be broken when global pages are used. Additionally, the same problem exists in NTOS -- there is no support for TLB flushing when Global Pages are used, even though Global Page support is enabled in the MMU and the bit is used on kernel PTEs. This leads to either over-flushing global pages during context switching, which shouldn't be done, or non-flushing of global pages, when they should be flushed.
- Tangential issue: I have written a new UNIMPLEMENTED_PATH macro that describes the exact path that was touched. Previously only the PC was given, which made it nearly impossible to connect it to the line of source causing the issue, especially for non-developers. The new macro outputs a string reason. Additionally, an UNIMPLEMENTED_V86_PATH macro is used for paths that are only expected in VDM/V8086 scenarios, to differentiate them from unlikely paths in normal execution flows.
These issues have all been fixed in my toilet. All this work stemmed from doing some testing of the new ARM3 section code written recently (never debug other people's code!), which led to significant debugging pains without NMI support. It has nothing to do with the ARM port but since I've written it, I might as well pass it on instead of keeping it locally for eternity.
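To illustrate the IOPM problem from the BIOS Call item above, here is a rough Python model of the x86 I/O permission check. It assumes the BIOS code runs in virtual-8086 mode, where the CPU consults the bitmap in the current TSS for every port access. The helper names (make_tss, io_access_allowed) are made up for this sketch and are not ReactOS code, and the model is simplified: real hardware reads two bitmap bytes per access.

```python
# Simplified model of the x86 I/O permission bitmap (IOPM) check.
# All names here are hypothetical and exist only for illustration.

IO_MAP_BASE_OFFSET = 0x66  # 16-bit "I/O map base" field in a 32-bit TSS
MIN_TSS_SIZE = 0x68        # 104 bytes: a TSS with no bitmap behind it
IOPM_SIZE = 8192           # one bit per port, ports 0..65535


def make_tss(with_iopm):
    """Return (tss_bytes, segment_limit) for a TSS with or without an IOPM."""
    tss = bytearray(MIN_TSS_SIZE)
    if with_iopm:
        # All-zero bitmap (every port allowed) plus the trailing 0xFF byte.
        tss += bytearray(IOPM_SIZE) + b"\xff"
    # The base points just past the fixed part of the TSS. Without a bitmap
    # this is already beyond the segment limit.
    tss[IO_MAP_BASE_OFFSET:IO_MAP_BASE_OFFSET + 2] = MIN_TSS_SIZE.to_bytes(2, "little")
    return tss, len(tss) - 1


def io_access_allowed(tss, limit, port, width=1):
    """Return True if every port in [port, port + width) may be accessed."""
    base = int.from_bytes(tss[IO_MAP_BASE_OFFSET:IO_MAP_BASE_OFFSET + 2], "little")
    for p in range(port, port + width):
        offset = base + p // 8
        if offset > limit:
            return False  # bitmap byte lies outside the TSS: access faults
        if (tss[offset] >> (p % 8)) & 1:
            return False  # bit set: port denied
    return True


if __name__ == "__main__":
    normal_tss, normal_limit = make_tss(with_iopm=True)
    bare_tss, bare_limit = make_tss(with_iopm=False)
    # 0x3C0 is a VGA port, the kind of access a video BIOS call performs.
    print("normal TSS:", io_access_allowed(normal_tss, normal_limit, 0x3C0))
    print("TSS without IOPM:", io_access_allowed(bare_tss, bare_limit, 0x3C0))
```

With a TSS that has no bitmap behind it, every port lookup lands outside the segment, which matches the exceptions we saw on BIOS I/O Port accesses.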
Thoughts/comments?
-r
Dear ReactOS Members,
We'd like to issue you our warmest holiday greetings and a happy new year!
Withal, receive our cordial gratitude for the recent work on getting the ARM tree building again as well as for extending support for Windows and Mac OS X build systems throughout this troubled time.
-r
Hi,
just for your information, I have already written a script and it's already
available in trunk (/trunk/tools/changelog/autocl.py), but Aleksey decided to
make the ChangeLog manually. Of course the script can't do any magic, but it
was intended to set up a ChangeLog base for manual adjustments, with the
great advantage that nothing will be forgotten. Maybe we can improve the
script to fit our needs and make it as flexible as possible, but
nevertheless, we have to give it a chance before we judge it...
Matthias
--
Matthias Kupfer phone +49 (0) 371 236 46 52
Wilhelm-Firl-Straße 21 mobile +49 (0) 160 859 43 54
09122 Chemnitz, Germany
Hi,
This is incredible. I can understand a delay in a release because of fixing Blockers, and I can understand a delay because we are afraid of the "2012" movie. But I cannot understand why the Changelog is not done! On the 3rd of November Aleksey sent an email saying a Blocker was preventing the 0.3.11 release. This means that by the 3rd of November a full Changelog should have been done, with just 2 lines (explaining the fix, if needed) to be added afterwards. But the fact is that the Blocker was solved more than a week ago, we are still waiting for the Changelog to be done (after more than a month), and the binaries are stored on SF waiting indefinitely to be linked.
So the question strikes again: what is happening with Changelogs? What happens if no one wants to write their Changelog? Are we going to stay here waiting for it indefinitely? Is our current procedure correct? Or is it not practical? How can we shorten the time for a release? What happens if one of the steps before releasing (like the Changelog step) is not properly done? Do we have a Plan B?
Let's review our theoretical steps before a release and find the bottlenecks, as I do in my work:
1) Coding Time. During this time Devs create ReactOS code. Since 0.3.9 the GoldenApps and CandidateApps are also tested on a regular basis during this time to reduce the possible Blockers.
2) Choosing Minute. An ISO is chosen as Candidate. Doubts: When is the Candidate chosen? Following which criteria? Why do we sometimes take an ISO after one month and other times after 2 months?
3) Colin creates a Pre-release ISO.
4) Colin creates the 0.3.XX wiki page. Tests begin and the Changelog begins to be written.
5) Performing Testing on the Candidate. It usually takes less than a week, and thanks to the previous testing during Coding Time, there are fewer Blockers each time.
6A) Blocker found. If a Blocker is found, regression testing begins (if needed) and a patch/hack is made. Go to 7.
6B) Blocker not found. The prerelease ISO is released as definitive. END.
7) The patch is added to the release branch, Colin creates a second RC and testers perform a full test focusing on the regression. Release. END.
Let's begin by studying the 0.3.11 case.
Step 1 was done correctly. We have devs still working on ROS. Great!
Step 2 was CHAOS. I was asked to select an ISO first, before asking Colin to make a prerelease ISO of it. Testing that first ISO we found the VMware regression, but it took a while to fix (more than a month), so we selected a different ISO and performed a full test (again) before asking Colin to create a prerelease.
Changing the ISO is not part of the theoretical steps, but since the patch for VMware wasn't made yet and we had time, it wasn't really a loss of time.
Step 3. First bottleneck. It took a while to reach Colin, because he was busy with RL, so we had to wait for him to create a branch, include the reverts and create the prerelease. If I recall correctly it took more than a week. Without the prerelease it is impossible to test anything. We should have an alternative to Colin in case Colin is busy with RL. We don't have a Plan B for this situation.
Step 4. Colin creates the wiki page.
Step 5. Testing began but the Changelog didn't begin to be written. It has been asked for twice via the ML, and zillions of times via IRC. Currently we are waiting for it to be completed.
And now the second bottleneck:
In the 0.3.11 case, the Blockers were solved BEFORE the prerelease was made, so when Colin uploaded the prerelease ISO it didn't have any Blocker and was ready to be released. And then the bottleneck comes: the Changelog. The Changelog can be created without hurry if we are in step 6A (a Blocker found), but in case 6B (as with the 0.3.11 prerelease) you don't really have time to create it. When a Blocker is found, the Changelog can be created during the extra time of regression testing + finding a patch + adding it to the branch + creating a new ISO + performing all the tests again (this extra time is usually 4 weeks). But when no Blocker is found in step 6, the Changelog stops our release. You can't make a proper Changelog of 2 months of changes in a week. So our procedure is currently not optimal at all.
First bottleneck: relying on just one guy to merge stuff into the branch and create the ISO should be solved.
Second bottleneck: if we expect our prereleases not to have any Blockers, then the Changelog will be stopping our releases. To avoid this I propose a new procedure for 0.3.12, which is more multitasking.
STEP1: Create the 0.3.12 Changelog page. It is open for including changes from the beginning.
STEP2: Coding time. Meanwhile, testers test GoldenApps and CandidateApps.
STEP3: Choosing minute. The candidate ISO is selected. Devs are told on the Changelog page which is the latest revision whose changes must be included, and that they have just one week to finish their Changelogs.
STEP4: Release Engineers (not just Colin) create the branch and the prerelease ISO.
STEP5: Release Engineers create a Test 0.3.12 wiki page.
STEP6: Tests are performed. This gives one week of extra time to get the Changelog done (more if a Blocker is found).
If the Changelog is critical for releasing, then we need a Plan B in case a Dev refuses or is unable to write a changelog. This is called a SCRIPT. I'm really tired of those guys who don't want the Script but don't write their changes in the Changelog either.
We should have a Script that, on request or because of the laziness of the devs, can make a Changelog given a revision range, a component and/or an author. This will make life easier and healthier. Btw, some Devs don't want to waste time writing Changelogs and prefer coding, so this tool is a must. I don't want to see devs sending less code to avoid writing a changelog.
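To make the idea concrete, here is a minimal Python sketch of such a script. It is not the existing autocl.py and says nothing about how that script works. It assumes the input is the XML produced by `svn log --xml -v`, and it assumes that the component is the directory right under /trunk/reactos/. The function names and the sample log entries are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up sample in the format produced by `svn log --xml -v`.
SAMPLE_LOG = """<log>
<logentry revision="100">
<author>alice</author>
<paths><path action="M">/trunk/reactos/ntoskrnl/ke/example.c</path></paths>
<msg>Fix the example trap handler.</msg>
</logentry>
<logentry revision="101">
<author>bob</author>
<paths><path action="M">/trunk/reactos/hal/example.c</path></paths>
<msg>Clean up the example BIOS call path.</msg>
</logentry>
<logentry revision="105">
<author>alice</author>
<paths><path action="A">/trunk/reactos/hal/another.c</path></paths>
<msg>Add another example file.</msg>
</logentry>
</log>"""


def parse_log(xml_text):
    """Turn `svn log --xml -v` output into a list of plain dicts."""
    entries = []
    for node in ET.fromstring(xml_text).iter("logentry"):
        entries.append({
            "revision": int(node.get("revision")),
            "author": node.findtext("author", default=""),
            "paths": [p.text for p in node.iter("path")],
            "msg": (node.findtext("msg") or "").strip(),
        })
    return entries


def component_of(path):
    """Assumed convention: the directory right under /trunk/reactos/."""
    parts = path.strip("/").split("/")
    return parts[2] if len(parts) > 2 else parts[-1]


def build_changelog(entries, first_rev, last_rev, component=None, author=None):
    """Group matching commits by component and return a text draft."""
    grouped = defaultdict(list)
    for entry in entries:
        if not first_rev <= entry["revision"] <= last_rev:
            continue
        if author is not None and entry["author"] != author:
            continue
        for comp in sorted({component_of(p) for p in entry["paths"]}):
            if component is None or comp == component:
                grouped[comp].append(entry)
    lines = []
    for comp in sorted(grouped):
        lines.append("== %s ==" % comp)
        for entry in sorted(grouped[comp], key=lambda e: e["revision"]):
            summary = entry["msg"].splitlines()[0] if entry["msg"] else "(no message)"
            lines.append("* r%d (%s): %s" % (entry["revision"], entry["author"], summary))
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_changelog(parse_log(SAMPLE_LOG), 100, 105, author="alice"))
```

The output is only a draft to be adjusted by hand, but it guarantees that no commit in the range is forgotten.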
Sorry about this long post. My fingers were trained on our Wiki this morning. Btw, I hope someone can review my recent changes to the 0.3.11 Changelog to be sure I put the commits in their correct place. Thanks.