Author: sir_richard
Date: Sun Jun 6 03:04:03 2010
New Revision: 47605
URL: http://svn.reactos.org/svn/reactos?rev=47605&view=rev
Log:
[NTOS]: Fix for the the bug that broke ARM3 paged pool (and has been corrupting ReactOS paged pool behind the scenes for years):
When a KCB (key stuff) is allocated, the key name associated with it receives an NCB (name stuff). In case this name is already used, a cache exists, and an existing NCB is grabbed, and its reference count is increased. When the KCB goes away, its NCB loses a reference. When all references are gone, the NCB is destroyed. Simple enough.
It turns out that what was currently happening is that an NCB would get dereferenced to 0, deleted, but still remained attached to a valid KCB (shouldn't happen). When that KCB went away, the NCB's reference count was dropped to... -1, and then -2, -3, -4, etc. Remember this is a FREED NCB. In other words, freed pool, that might now belong to someone else, was getting "-1" operations on it. So any value stored in that freed pool would get decremented by one. In ARM3 paged pool, because the allocator keeps a linked list, what would happen is that the FLINK pointer would be 0xE0F01234 instead of 0xE1A01234. What happened is that "0xE1A0" was treated as the reference count of the freed NCB, and it kept getting dereferenced down to 0xE0F0.
Proving this was easy, by adding an ASSERT(Ncb->RefCount >= 1) to the routine that dereferences NCBs. Obviously, we should not try to dereference an NCB that has a reference count of 0, because that NCB is now gone. Adding this ASSERT immediately caught the error, regardless of which pool implementation was being used, so this was a problem in ReactOS today, right now.
My first thought was that we were taking references to NCBs without incrementing the reference count. The NCB gets referenced in two places: when it gets created, and everytime a cached NCB is re-used for a new KCB (all this in CmpGetNameControlBlock).
After adding some tracing code, I discovered that CmpGetNameControlBlock would sometimes return an NCB that was cached, but without referencing it. I did not understand why, since the code says "if (Found) Ncb->RefCount++".
Further analysis showed that what would happen, on this particular instance, is that NCB "Foo" was being Found, but NCB "Bar" was returned instead. Therefore, causing some serious issues: First, NCB Foo was receiving too many references. Secondly, NCB Bar was not being referenced.
Worse though, it turns out this would happen when "Foo" was the CORRECT NCB, and "Bar" was an INCORRECT NCB. What do we mean by correct and incorrect? Well, because NCBs are hashed, it's possible for two NCB hashes to be VERY SIMILAR, but only ONE OF THOSE NCBs will be the right one -- for example, HKLM\Software\Hello vs HKLM\Software\Hell.
In our case, when a KCB for "Hello" was searching for the "Hello" NCB, the "Hello NCB would get a reference, but the "Hell" NCB would be returned. In other words, whenever a HASH COLLISION happened, the incorrect NCB was returned, probably messing up registry code in the process. Subsequently, when the KCB was dereferneced, it was attached to this incorrect, under-referenced NCB.
Since in ANY hash collision with "Hell", in our example, the "Hell" NCB would come first, subsequent searches for "Hellmaster", "Hellboy", "Hello World" would all still return "Hell". Eventually when all these KCBs would go away, the "Hell" NCB would reach even -18 references.
The simple solution? When the CORRECT NCB is found, STOP SEARCHING! By adding a simple "break" statement. Otherwise, even after the correct NCB is found, further, incorrect, collided NCBs are found, and eventually the last one ("Hell", in our example) got returned, and under-referenced, while "Hellmaster" and "Hellboy" were not returned, but LEAKED REFERENCES.
There you have it folks, MEMORY CORRUPTION (USE-AFTER-FREE), INCORRECT REGISTRY NAME PARSHING, REFERENCE LEAKS and REFERENCE UNDERRUNS, all due to ONE missing "break;".
-r
Modified:
trunk/reactos/ntoskrnl/config/cmkcbncb.c
Modified: trunk/reactos/ntoskrnl/config/cmkcbncb.c
URL: http://svn.reactos.org/svn/reactos/trunk/reactos/ntoskrnl/config/cmkcbncb.c…
==============================================================================
--- trunk/reactos/ntoskrnl/config/cmkcbncb.c [iso-8859-1] (original)
+++ trunk/reactos/ntoskrnl/config/cmkcbncb.c [iso-8859-1] Sun Jun 6 03:04:03 2010
@@ -234,7 +234,9 @@
if (Found)
{
/* Reference it */
+ ASSERT(Ncb->RefCount != 0xFFFF);
Ncb->RefCount++;
+ break;
}
}
@@ -320,6 +322,7 @@
CmpAcquireNcbLockExclusiveByKey(ConvKey);
/* Decrease the reference count */
+ ASSERT(Ncb->RefCount >= 1);
if (!(--Ncb->RefCount))
{
/* Find the NCB in the table */
@@ -579,7 +582,7 @@
NewRefCount = OldRefCount - 1;
/* Check if we still have references */
- if( (NewRefCount & 0xFFFF) > 0)
+ if ((NewRefCount & 0xFFFF) > 0)
{
/* Do the dereference */
if (InterlockedCompareExchange((PLONG)&Kcb->RefCount,
Author: sir_richard
Date: Sat Jun 5 21:32:46 2010
New Revision: 47601
URL: http://svn.reactos.org/svn/reactos?rev=47601&view=rev
Log:
[NTOS]: Even after allowing ARM3 paged pool, we should still use the old allocator to free allocations made by the old allocator!
Modified:
trunk/reactos/ntoskrnl/mm/ARM3/expool.c
Modified: trunk/reactos/ntoskrnl/mm/ARM3/expool.c
URL: http://svn.reactos.org/svn/reactos/trunk/reactos/ntoskrnl/mm/ARM3/expool.c?…
==============================================================================
--- trunk/reactos/ntoskrnl/mm/ARM3/expool.c [iso-8859-1] (original)
+++ trunk/reactos/ntoskrnl/mm/ARM3/expool.c [iso-8859-1] Sat Jun 5 21:32:46 2010
@@ -763,8 +763,8 @@
//
// Check for paged pool
//
- if (!(AllowPagedPool) && ((P >= MmPagedPoolBase) &&
- (P <= (PVOID)((ULONG_PTR)MmPagedPoolBase + MmPagedPoolSize))))
+ if ((P >= MmPagedPoolBase) &&
+ (P <= (PVOID)((ULONG_PTR)MmPagedPoolBase + MmPagedPoolSize)))
{
//
// Use old allocator
Author: sir_richard
Date: Sat Jun 5 21:19:28 2010
New Revision: 47600
URL: http://svn.reactos.org/svn/reactos?rev=47600&view=rev
Log:
[NTOS]: Fix up POOL_PREV_BLOCK based on suggestion by hpoussin.
[NTOS]: Fix up NTAPI location in function definition.
[NTOS]: Implement even more stringent header checks: ExpCheckPoolHeader and ExpCheckPoolBlocks. Normally we would only want this on a DBG build, but I am enabling them for now until I can fix paged pool. If your machine crashes, reverting this commit is NOT the solution (boots for me).
[NTOS]: Add a AllowPagedPool BOOLEAN that will allow us to selectively enable when the ARM3 pool can be used, playing around with the situation that causes the corruption, and perhaps making it easier to find/fix.
Modified:
trunk/reactos/ntoskrnl/mm/ARM3/expool.c
Modified: trunk/reactos/ntoskrnl/mm/ARM3/expool.c
URL: http://svn.reactos.org/svn/reactos/trunk/reactos/ntoskrnl/mm/ARM3/expool.c?…
==============================================================================
--- trunk/reactos/ntoskrnl/mm/ARM3/expool.c [iso-8859-1] (original)
+++ trunk/reactos/ntoskrnl/mm/ARM3/expool.c [iso-8859-1] Sat Jun 5 21:19:28 2010
@@ -19,6 +19,8 @@
#undef ExAllocatePoolWithQuota
#undef ExAllocatePoolWithQuotaTag
+BOOLEAN AllowPagedPool = FALSE;
+
/* GLOBALS ********************************************************************/
ULONG ExpNumberOfPagedPools;
@@ -33,7 +35,7 @@
#define POOL_FREE_BLOCK(x) (PLIST_ENTRY)((ULONG_PTR)(x) + sizeof(POOL_HEADER))
#define POOL_BLOCK(x, i) (PPOOL_HEADER)((ULONG_PTR)(x) + ((i) * POOL_BLOCK_SIZE))
#define POOL_NEXT_BLOCK(x) POOL_BLOCK((x), (x)->BlockSize)
-#define POOL_PREV_BLOCK(x) POOL_BLOCK((x), -(x)->PreviousSize)
+#define POOL_PREV_BLOCK(x) POOL_BLOCK((x), -((x)->PreviousSize))
/*
* Pool list access debug macros, similar to Arthur's pfnlist.c work.
@@ -51,22 +53,22 @@
*
* For now, these are not made inline, so we can get good stack traces.
*/
-NTAPI
PLIST_ENTRY
+NTAPI
ExpDecodePoolLink(IN PLIST_ENTRY Link)
{
return (PLIST_ENTRY)((ULONG_PTR)Link & ~1);
}
-NTAPI
PLIST_ENTRY
+NTAPI
ExpEncodePoolLink(IN PLIST_ENTRY Link)
{
return (PLIST_ENTRY)((ULONG_PTR)Link | 1);
}
-NTAPI
-VOID
+VOID
+NTAPI
ExpCheckPoolLinks(IN PLIST_ENTRY ListHead)
{
if ((ExpDecodePoolLink(ExpDecodePoolLink(ListHead->Flink)->Blink) != ListHead) ||
@@ -80,22 +82,22 @@
}
}
-NTAPI
-VOID
+VOID
+NTAPI
ExpInitializePoolListHead(IN PLIST_ENTRY ListHead)
{
ListHead->Flink = ListHead->Blink = ExpEncodePoolLink(ListHead);
}
-NTAPI
BOOLEAN
+NTAPI
ExpIsPoolListEmpty(IN PLIST_ENTRY ListHead)
{
return (ExpDecodePoolLink(ListHead->Flink) == ListHead);
}
-NTAPI
-VOID
+VOID
+NTAPI
ExpRemovePoolEntryList(IN PLIST_ENTRY Entry)
{
PLIST_ENTRY Blink, Flink;
@@ -105,8 +107,8 @@
Blink->Flink = ExpEncodePoolLink(Flink);
}
-NTAPI
PLIST_ENTRY
+NTAPI
ExpRemovePoolHeadList(IN PLIST_ENTRY ListHead)
{
PLIST_ENTRY Entry, Flink;
@@ -117,8 +119,8 @@
return Entry;
}
-NTAPI
PLIST_ENTRY
+NTAPI
ExpRemovePoolTailList(IN PLIST_ENTRY ListHead)
{
PLIST_ENTRY Entry, Blink;
@@ -129,8 +131,8 @@
return Entry;
}
-NTAPI
-VOID
+VOID
+NTAPI
ExpInsertPoolTailList(IN PLIST_ENTRY ListHead,
IN PLIST_ENTRY Entry)
{
@@ -144,8 +146,8 @@
ExpCheckPoolLinks(ListHead);
}
-NTAPI
-VOID
+VOID
+NTAPI
ExpInsertPoolHeadList(IN PLIST_ENTRY ListHead,
IN PLIST_ENTRY Entry)
{
@@ -159,6 +161,131 @@
ExpCheckPoolLinks(ListHead);
}
+VOID
+NTAPI
+ExpCheckPoolHeader(IN PPOOL_HEADER Entry)
+{
+ PPOOL_HEADER PreviousEntry, NextEntry;
+
+ /* Is there a block before this one? */
+ if (Entry->PreviousSize)
+ {
+ /* Get it */
+ PreviousEntry = POOL_PREV_BLOCK(Entry);
+
+ /* The two blocks must be on the same page! */
+ if (PAGE_ALIGN(Entry) != PAGE_ALIGN(PreviousEntry))
+ {
+ /* Something is awry */
+ KeBugCheckEx(BAD_POOL_HEADER,
+ 6,
+ (ULONG_PTR)PreviousEntry,
+ __LINE__,
+ (ULONG_PTR)Entry);
+ }
+
+ /* This block should also indicate that it's as large as we think it is */
+ if (PreviousEntry->BlockSize != Entry->PreviousSize)
+ {
+ /* Otherwise, someone corrupted one of the sizes */
+ KeBugCheckEx(BAD_POOL_HEADER,
+ 5,
+ (ULONG_PTR)PreviousEntry,
+ __LINE__,
+ (ULONG_PTR)Entry);
+ }
+ }
+ else if (PAGE_ALIGN(Entry) != Entry)
+ {
+ /* If there's no block before us, we are the first block, so we should be on a page boundary */
+ KeBugCheckEx(BAD_POOL_HEADER,
+ 7,
+ 0,
+ __LINE__,
+ (ULONG_PTR)Entry);
+ }
+
+ /* This block must have a size */
+ if (!Entry->BlockSize)
+ {
+ /* Someone must've corrupted this field */
+ KeBugCheckEx(BAD_POOL_HEADER,
+ 8,
+ 0,
+ __LINE__,
+ (ULONG_PTR)Entry);
+ }
+
+ /* Okay, now get the next block */
+ NextEntry = POOL_NEXT_BLOCK(Entry);
+
+ /* If this is the last block, then we'll be page-aligned, otherwise, check this block */
+ if (PAGE_ALIGN(NextEntry) != NextEntry)
+ {
+ /* The two blocks must be on the same page! */
+ if (PAGE_ALIGN(Entry) != PAGE_ALIGN(NextEntry))
+ {
+ /* Something is messed up */
+ KeBugCheckEx(BAD_POOL_HEADER,
+ 9,
+ (ULONG_PTR)NextEntry,
+ __LINE__,
+ (ULONG_PTR)Entry);
+ }
+
+ /* And this block should think we are as large as we truly are */
+ if (NextEntry->PreviousSize != Entry->BlockSize)
+ {
+ /* Otherwise, someone corrupted the field */
+ KeBugCheckEx(BAD_POOL_HEADER,
+ 5,
+ (ULONG_PTR)NextEntry,
+ __LINE__,
+ (ULONG_PTR)Entry);
+ }
+ }
+}
+
+VOID
+NTAPI
+ExpCheckPoolBlocks(IN PVOID Block)
+{
+ BOOLEAN FoundBlock;
+ SIZE_T Size = 0;
+ PPOOL_HEADER Entry;
+
+ /* Get the first entry for this page, make sure it really is the first */
+ Entry = PAGE_ALIGN(Block);
+ ASSERT(Entry->PreviousSize == 0);
+
+ /* Now scan each entry */
+ while (TRUE)
+ {
+ /* When we actually found our block, remember this */
+ if (Entry == Block) FoundBlock = TRUE;
+
+ /* Now validate this block header */
+ ExpCheckPoolHeader(Entry);
+
+ /* And go to the next one, keeping track of our size */
+ Size += Entry->BlockSize;
+ Entry = POOL_NEXT_BLOCK(Entry);
+
+ /* If we hit the last block, stop */
+ if (Size >= (PAGE_SIZE / POOL_BLOCK_SIZE)) break;
+
+ /* If we hit the end of the page, stop */
+ if (PAGE_ALIGN(Entry) == Entry) break;
+ }
+
+ /* We must've found our block, and we must have hit the end of the page */
+ if ((PAGE_ALIGN(Entry) != Entry) || !(FoundBlock))
+ {
+ /* Otherwise, the blocks are messed up */
+ KeBugCheckEx(BAD_POOL_HEADER, 10, (ULONG_PTR)Block, __LINE__, (ULONG_PTR)Entry);
+ }
+}
+
/* PRIVATE FUNCTIONS **********************************************************/
VOID
@@ -331,8 +458,8 @@
//
// Check for paged pool
//
- if (PoolType == PagedPool) return ExAllocatePagedPoolWithTag(PagedPool, NumberOfBytes, Tag);
-
+ if (!(AllowPagedPool) && (PoolType == PagedPool)) return ExAllocatePagedPoolWithTag(PagedPool, NumberOfBytes, Tag);
+
//
// Some sanity checks
//
@@ -418,6 +545,7 @@
ExpCheckPoolLinks(ListHead);
Entry = POOL_ENTRY(ExpRemovePoolHeadList(ListHead));
ExpCheckPoolLinks(ListHead);
+ ExpCheckPoolBlocks(Entry);
ASSERT(Entry->BlockSize >= i);
ASSERT(Entry->PoolType == 0);
@@ -534,6 +662,7 @@
// and release the lock since we're done
//
Entry->PoolType = PoolType + 1;
+ ExpCheckPoolBlocks(Entry);
ExUnlockPool(PoolDesc, OldIrql);
//
@@ -590,12 +719,14 @@
//
// Release the pool lock
//
+ ExpCheckPoolBlocks(Entry);
ExUnlockPool(PoolDesc, OldIrql);
}
//
// And return the pool allocation
//
+ ExpCheckPoolBlocks(Entry);
Entry->PoolTag = Tag;
return POOL_FREE_BLOCK(Entry);
}
@@ -628,12 +759,12 @@
POOL_TYPE PoolType;
PPOOL_DESCRIPTOR PoolDesc;
BOOLEAN Combined = FALSE;
-#if 1
+
//
// Check for paged pool
//
- if ((P >= MmPagedPoolBase) &&
- (P <= (PVOID)((ULONG_PTR)MmPagedPoolBase + MmPagedPoolSize)))
+ if (!(AllowPagedPool) && ((P >= MmPagedPoolBase) &&
+ (P <= (PVOID)((ULONG_PTR)MmPagedPoolBase + MmPagedPoolSize))))
{
//
// Use old allocator
@@ -641,7 +772,6 @@
ExFreePagedPool(P);
return;
}
-#endif
//
// Quickly deal with big page allocations
@@ -680,6 +810,7 @@
//
// Check if the next allocation is at the end of the page
//
+ ExpCheckPoolBlocks(Entry);
if (PAGE_ALIGN(NextEntry) != NextEntry)
{
//