0x7F (memory leak)

In this post, we will be looking at a memory leak caused by a program called NotMyFault, which is supplied by Sysinternals. They have some excellent tools that you should check out if you’re interested.
You can download NotMyFault from the link below.

http://live.sysinternals.com/Files/NotMyFault.zip

Let’s take a look.

BugCheck 7F, {8, 80050033, 406f8, fffff80002e69f2c}

This bugcheck indicates the kernel encountered a trap which it’s not allowed to catch, so the error cannot be resolved and the system must bugcheck. The first parameter (8) tells us the trap was a double fault.
A double fault occurs when an exception takes place during the processing of another exception. If yet another exception occurs while processing the double fault, a triple fault can occur.
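As a small sketch of how the first parameter maps to a trap, here is a lookup in Python. The table is my own and only lists a handful of x86 exception vectors, and the function name is made up for illustration.

```python
# Hypothetical lookup table: a few x86 exception vectors that can show up
# as the first parameter of a 0x7F bugcheck. It is not a complete list.
TRAP_NAMES = {
    0x00: "Divide Error",
    0x04: "Overflow",
    0x05: "Bounds Check",
    0x06: "Invalid Opcode",
    0x08: "Double Fault",
}


def describe_trap(param1: int) -> str:
    """Return a readable name for the trap number in parameter 1."""
    return TRAP_NAMES.get(param1, f"Unlisted trap 0x{param1:X}")


# First parameter from: BugCheck 7F, {8, 80050033, 406f8, fffff80002e69f2c}
print(describe_trap(0x8))
```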

Looking at the callstack, this is what we see. Do note this is only a small snippet, as the callstack is massive, with the same Nvidia driver function repeated at the same address.

fffff880`02fddce8 fffff800`02ec7169 : 00000000`0000007f 00000000`00000008 00000000`80050033 00000000`000406f8 : nt!KeBugCheckEx
fffff880`02fddcf0 fffff800`02ec5632 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiBugCheckDispatch+0x69
fffff880`02fdde30 fffff800`02e69f2c : fffffa80`035d4000 00000000`00000000 00000000`00000000 fffff800`02ff947c : nt!KiDoubleFaultAbort+0xb2
fffff880`009ab000 fffff800`02ff947c : 00000000`00000000 fffff880`009ab080 00000000`00000000 00000000`00000000 : nt!MiExpandNonPagedPool+0x14
fffff880`009ab020 fffff800`02ffbf26 : fffff800`030586c0 00000000`00000003 00000000`00000000 fffff880`049f9c05 : nt!MiAllocatePoolPages+0xdfd
fffff880`009ab160 fffff880`04a1ea55 : 00000000`00000000 00000000`00000001 fffff880`009ab2b8 fffff880`00000000 : nt!ExAllocatePoolWithTag+0x316
fffff880`009ab250 fffff880`04a1b6e8 : fffffa80`05b75000 00000000`00000002 00000000`00000002 fffffa80`036a7000 : nvlddmkm+0x1bfa55
fffff880`009ab280 fffff880`04ae392a : fffff880`009ab318 fffffa80`00000018 fffffa80`036a7000 fffffa80`05b75000 : nvlddmkm+0x1bc6e8
fffff880`009ab2e0 fffff880`04b9f804 : 00000000`00100005 00000000`00000000 00000000`00100006 fffffa80`05b75000 : nvlddmkm+0x28492a
fffff880`009ab310 fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340804
fffff880`009ab350 fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340827
fffff880`009ab390 fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340827
fffff880`009ab3d0 fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340827
fffff880`009ab410 fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340827
fffff880`009ab450 fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340827
fffff880`009ab490 fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340827
fffff880`009ab4d0 fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340827

So what is happening is that the Nvidia driver is being blamed (probably because it was on the stack when the last context was saved, which was the exception). It calls the same function over and over, and appears to be allocating more pages, until a double fault is initiated. I suspect the double fault occurred because memory could not be allocated, which caused an exception, and then another exception occurred while handling it.

So looking at the virtual memory usage we can see the following.

3: kd> !vm

*** Virtual Memory Usage ***
    Physical Memory:     1036418 (   4145672 Kb)
    Page File: \??\C:\pagefile.sys
      Current:   4145672 Kb  Free Space:   3702732 Kb
      Minimum:   4145672 Kb  Maximum:     12437016 Kb
    Available Pages:      100902 (    403608 Kb)
    ResAvail Pages:       209219 (    836876 Kb)
    Locked IO Pages:           0 (         0 Kb)
    Free System PTEs:   33504448 ( 134017792 Kb)
    Modified Pages:         4479 (     17916 Kb)
    Modified PF Pages:      4364 (     17456 Kb)
    NonPagedPool Usage:   764909 (   3059636 Kb)
    NonPagedPool Max:     764972 (   3059888 Kb)
    ********** Excessive NonPaged Pool Usage *****

We can see that the non paged pool has been completely depleted, which caused the system to crash.
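To put a number on how depleted it is, here is a quick calculation using the two NonPagedPool lines from the !vm output. The function name is my own, and the 4 Kb page size is taken from the ratio between the page counts and the Kb columns in that output.

```python
PAGE_KB = 4  # 764909 pages are shown as 3059636 Kb, so one page is 4 Kb


def pool_usage(used_pages: int, max_pages: int):
    """Return (percent of the maximum in use, Kb still free)."""
    percent = used_pages / max_pages * 100
    free_kb = (max_pages - used_pages) * PAGE_KB
    return percent, free_kb


# NonPagedPool Usage and NonPagedPool Max from the !vm output
percent, free_kb = pool_usage(764909, 764972)
print(f"{percent:.2f}% used, {free_kb} Kb free")
```

Only 63 pages were left out of roughly 3 GB of non paged pool, which is why the debugger flags the usage as excessive.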
Now you might be asking, can’t it just put the memory onto disk to stop it crashing?
Well, moving memory from RAM onto disk is known as paging, which is used to free up RAM when memory usage is high. However, kernel pool memory is divided into two main categories:

-Paged Pool
-Non Paged Pool

Paged pool is for allocations that can be moved to disk when not in use, which frees up RAM. Non paged pool, on the other hand, can’t be moved to disk under any circumstances. Device drivers and other critical operating system components rely on these dynamic memory allocations to function correctly, so they must be available immediately.

So why can’t they just page the memory back from disk when needed?

Well, it’s not that simple. Paging is expensive: it takes time and puts a lot of pressure on the drive, which is much slower than RAM.
Not only that, but the IRQL must be at 1 or below for paging to take place; when the IRQL is higher than 1, paging is not allowed. Say, for example, we get a request that needs servicing quickly at an IRQL of 7. It may need a device driver to perform certain tasks, but the driver can’t because its memory is paged out, and we can’t page it in because the IRQL needs to be at 1 or below.
We can’t just lower the IRQL either, because the higher the IRQL, the higher the priority of the work being done. Touching paged out memory at an elevated IRQL causes a bugcheck of 0xA or 0xD1 instead.
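The rule above can be modelled in a few lines of Python. This is only a toy model of the decision, not how the memory manager is implemented, and the function name is made up; the level names are the standard ones for IRQL 0, 1 and 2.

```python
PASSIVE_LEVEL = 0
APC_LEVEL = 1
DISPATCH_LEVEL = 2


def touch_memory(irql: int, resident: bool) -> str:
    """Model what happens when code running at `irql` touches a page."""
    if resident:
        return "ok"  # the page is already in RAM, no page fault is needed
    if irql <= APC_LEVEL:
        return "page fault serviced"  # the page can be read back from disk
    return "bugcheck (0xA or 0xD1)"  # paging is not allowed at this IRQL


print(touch_memory(PASSIVE_LEVEL, resident=False))
print(touch_memory(7, resident=False))
print(touch_memory(7, resident=True))
```

The last call is the reason non paged pool exists: memory that is always resident is safe to touch at any IRQL.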

So why is the memory being leaked and what is it?

A memory leak occurs when an object acquires memory but doesn’t free it after it has finished using it. Those pages can’t be allocated to anything else because they were never freed, even though they’re no longer in use.
If the object keeps calling ExAllocatePool, it keeps allocating memory without using it, and just because the pages aren’t in use doesn’t mean they can be used by anything else.
When the last of the non paged pool has been used up, critical objects can no longer allocate the memory they need to function, so the system crashes.
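Here is a minimal simulation of that behaviour. The Pool class and run function are hypothetical stand-ins for the real pool allocator, with a tiny pool so the effect shows up quickly.

```python
class Pool:
    """Toy fixed-size pool; stands in for the non paged pool."""

    def __init__(self, max_pages: int):
        self.max_pages = max_pages
        self.in_use = 0

    def allocate(self, pages: int) -> bool:
        if self.in_use + pages > self.max_pages:
            return False  # pool exhausted, the allocation fails
        self.in_use += pages
        return True

    def free(self, pages: int) -> None:
        self.in_use -= pages


def run(pool: Pool, requests: int, pages_each: int, leak: bool) -> int:
    """Return how many requests succeeded before the pool ran dry."""
    succeeded = 0
    for _ in range(requests):
        if not pool.allocate(pages_each):
            break
        succeeded += 1
        if not leak:
            pool.free(pages_each)  # a well behaved driver frees its memory
    return succeeded


print(run(Pool(100), requests=1000, pages_each=10, leak=False))
print(run(Pool(100), requests=1000, pages_each=10, leak=True))
```

The well behaved caller gets through all 1000 requests, while the leaking one exhausts a 100 page pool after only 10.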

We can look at the assembly instructions to see what is happening.

3: kd> .trap fffff880`02fdde30
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=00000000000bac2c rbx=0000000000000000 rcx=0000000000000001
rdx=fffff880009ab0b8 rsi=0000000000000000 rdi=0000000000000000
rip=fffff80002e69f2c rsp=fffff880009ab000 rbp=fffff880009ab080
 r8=ffffffffffffffff  r9=fffffa80035eb5b8 r10=00000000ffffffff
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl zr na po nc
nt!MiExpandNonPagedPool+0x14:
fffff800`02e69f2c 4156            push    r14

So it’s calling a function which, I believe, tries to expand the non paged pool so that objects can allocate from it when it is too small for the request.

 3: kd> u nt!MiExpandNonPagedPool+0x14
nt!MiExpandNonPagedPool+0x14:
fffff800`02e69f2c 4156            push    r14
fffff800`02e69f2e 4157            push    r15
fffff800`02e69f30 4881ecd0000000  sub     rsp,0D0h
fffff800`02e69f37 488db9ff010000  lea     rdi,[rcx+1FFh]
fffff800`02e69f3e 4881e700feffff  and     rdi,0FFFFFFFFFFFFFE00h
fffff800`02e69f45 483bf9          cmp     rdi,rcx
fffff800`02e69f48 0f82cfbffeff    jb      nt! ?? ::FNODOBFM::`string'+0x1e009 (fffff800`02e55f1d)
fffff800`02e69f4e 488b0513762100  mov     rax,qword ptr [nt!MiSystemVaTypeCount+0x28 (fffff800`03081568)]

Here we can see push instructions, which add data onto the stack. Because there is no more memory left for the data to go into, the push fails and the system crashes.
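The lea/and/cmp/jb sequence in that disassembly is a round-up with an overflow check, which can be sketched in Python as below. I am assuming rcx holds some kind of size or count; the disassembly doesn’t say what the unit is, and the function name is mine.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # registers are 64 bits wide


def round_up_to_0x200(rcx: int) -> int:
    """Mirror lea/and/cmp/jb: round rcx up to a multiple of 0x200."""
    rdi = (rcx + 0x1FF) & MASK64  # lea rdi,[rcx+1FFh]
    rdi &= 0xFFFFFFFFFFFFFE00     # and rdi,0FFFFFFFFFFFFFE00h
    if rdi < rcx:                 # cmp rdi,rcx / jb (the addition wrapped)
        raise OverflowError("request wrapped past 64 bits")
    return rdi


# rcx is shown as 1 in the trap frame
print(hex(round_up_to_0x200(0x1)))
```

So whatever was requested gets rounded up to the next multiple of 0x200 (512) before the pool is expanded.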


0x9F DRIVER_POWER_STATE_FAILURE

First off I’d like to say I’m sorry I’ve not been posting in a while but I’ll try to post a bit more.

BugCheck 9F, {4, 258, fffffa8007005660, fffff800053e83d0}

There are two common types of 0x9F bugcheck, distinguished by the first parameter. The first is indicated by a 3, which means a device object has been blocking an IRP for too long, so the system bugchecked as it was holding everything else up.
The second, indicated by a 4, is the one we’re going to look at: a thread has been holding up a power transition for too long, which causes it to time out and bugcheck.

In more detail, this bugcheck indicates a power IRP failed to synchronise with the PnP manager. A power IRP is an I/O Request Packet that sends power transitions down a device stack.
All power IRPs must reach the PDO (Physical Device Object) at the bottom of the stack to ensure power transitions are done correctly.
When one doesn’t reach the bottom for any reason, the system bugchecks. In this case a thread was blocking the IRP, so it couldn’t be completed within the allotted time interval.
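As a rough sketch, the parameters of this bugcheck can be decoded like this. The meanings of parameters 2 and 3 (timeout in seconds and the thread holding the PnP lock) come from Microsoft’s documentation for 0x9F rather than from the dump itself, and the function name is made up.

```python
def describe_9f(p1: int, p2: int, p3: int, p4: int) -> str:
    """Summarise a 0x9F bugcheck from its four parameters."""
    if p1 == 0x3:
        return "A device object has been blocking an IRP for too long"
    if p1 == 0x4:
        return (f"Power transition timed out after {p2} seconds waiting "
                f"on PnP; thread {p3:#x} holds the PnP lock")
    return f"Type {p1:#x} is not covered by this sketch"


# BugCheck 9F, {4, 258, fffffa8007005660, fffff800053e83d0}
print(describe_9f(0x4, 0x258, 0xfffffa8007005660, 0xfffff800053e83d0))
```

0x258 is 600 in decimal, so the system waited ten minutes before giving up.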

So let’s look at the locks on the system which are blocking the IRP.
To understand what this means we need to know what a lock is. These locks (ERESOURCE structures) are synchronisation mechanisms that allow drivers to share access to resources efficiently.
A lock can be acquired in two main ways, exclusive and shared: an exclusive lock has a single owner, while a shared lock can be held by multiple threads at once.
Together they form a read/write mechanism where only one thread can write but multiple threads can read simultaneously.
Acquiring a lock exclusively requires that no other threads are currently sharing it. If a thread cannot acquire a lock straight away, it is put into a wait state until the lock is available.
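The shared/exclusive rules can be shown with a toy model. This is not real ERESOURCE code and the class is entirely hypothetical; it ignores details such as recursive acquisition and simply returns False where a real thread would be put into a wait state.

```python
class ResourceModel:
    """Toy model of shared/exclusive lock ownership."""

    def __init__(self):
        self.exclusive_owner = None
        self.shared_owners = set()

    def try_acquire_shared(self, thread: str) -> bool:
        if self.exclusive_owner is not None:
            return False  # a writer owns it, so readers would have to wait
        self.shared_owners.add(thread)
        return True

    def try_acquire_exclusive(self, thread: str) -> bool:
        if self.exclusive_owner is not None or self.shared_owners:
            return False  # anyone holding it blocks a new exclusive owner
        self.exclusive_owner = thread
        return True

    def release(self, thread: str) -> None:
        if self.exclusive_owner == thread:
            self.exclusive_owner = None
        self.shared_owners.discard(thread)


lock = ResourceModel()
print(lock.try_acquire_shared("A"), lock.try_acquire_shared("B"))
print(lock.try_acquire_exclusive("C"))
```

Two readers get in together, but thread C has to wait for exclusive access until both have released the lock.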

This was only a brief explanation, for more information check out this article:

http://msdn.microsoft.com/en-us/library/ff548046.aspx

Back to the topic, looking at the locks.

0: kd> !locks
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks..

Resource @ nt!IopDeviceTreeLock (0xfffff80003492ce0)    Shared 1 owning threads
    Contention Count = 1
     Threads: fffffa8007005660-01
KD: Scanning for held locks.

Resource @ nt!PiEngineLock (0xfffff80003492be0)    Exclusively owned
    Contention Count = 21
    NumberOfExclusiveWaiters = 1
     Threads: fffffa8007005660-01
     Threads Waiting On Exclusive Access:
              fffffa800f308b50      

KD: Scanning for held locks...
18855 total locks, 2 locks currently held

Let’s look at the thread that owns the lock exclusively.

0: kd> !thread fffffa8007005660
THREAD fffffa8007005660  Cid 0004.0048  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Non-Alertable
    fffffa800d035ee8  NotificationEvent
IRP List:
    fffffa8008f5cc10: (0006,03e8) Flags: 00000000  Mdl: 00000000
Not impersonating
DeviceMap                 fffff8a000008c10
Owning Process            fffffa8006f8d890       Image:         System
Attached Process          N/A            Image:         N/A
Wait Start TickCount      396427         Ticks: 38463 (0:00:10:00.026)
Context Switch Count      44059          IdealProcessor: 2  NoStackSwap
UserTime                  00:00:00.000
KernelTime                00:00:00.343
Win32 Start Address nt!ExpWorkerThread (0xfffff80003298150)
Stack Init fffff88003bd2c70 Current fffff88003bd2280
Base fffff88003bd3000 Limit fffff88003bcd000 Call 0
Priority 15 BasePriority 12 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           : Args to Child                                                           : Call Site
fffff880`03bd22c0 fffff800`032845f2 : fffffa80`07005660 fffffa80`07005660 00000000`00000000 00000000`00000000 : nt!KiSwapContext+0x7a
fffff880`03bd2400 fffff800`0329599f : fffffa80`0d0df208 fffff880`0ae9e10b fffffa80`00000000 00000000`00000000 : nt!KiCommitThreadWait+0x1d2
fffff880`03bd2490 fffff880`0ae915dd : fffffa80`0d035000 00000000`00000000 fffffa80`0dd8ca00 00000000`00000000 : nt!KeWaitForSingleObject+0x19f
fffff880`03bd2530 fffff880`0ae92627 : fffffa80`0d035000 00000000`00000000 fffffa80`0c0891a0 fffff880`03bd2670 : ZTEusbnet+0x35dd
fffff880`03bd2580 fffff880`0215d809 : fffffa80`0c0891a0 fffff880`020f0ecd fffff880`03bd2670 fffffa80`091c5550 : ZTEusbnet+0x4627
fffff880`03bd25b0 fffff880`0215d7d0 : fffffa80`091c54a0 fffffa80`0c0891a0 fffff880`03bd2670 fffffa80`08fc2ac0 : ndis!NdisFDevicePnPEventNotify+0x89
fffff880`03bd25e0 fffff880`0215d7d0 : fffffa80`08fc2a10 fffffa80`0c0891a0 fffffa80`091f9010 fffffa80`091f90c0 : ndis!NdisFDevicePnPEventNotify+0x50
fffff880`03bd2610 fffff880`0219070c : fffffa80`0c0891a0 00000000`00000000 00000000`00000000 fffffa80`0c0891a0 : ndis!NdisFDevicePnPEventNotify+0x50
fffff880`03bd2640 fffff880`021a1da2 : 00000000`00000000 fffffa80`08f5cc10 00000000`00000000 fffffa80`0c0891a0 : ndis! ?? ::LNCPHCLB::`string'+0xddf
fffff880`03bd26f0 fffff800`034fb121 : fffffa80`091c7060 fffffa80`0c089050 fffff880`03bd2848 fffffa80`070bfa00 : ndis!ndisPnPDispatch+0x843
fffff880`03bd2790 fffff800`0367b3a1 : fffffa80`070bfa00 00000000`00000000 fffffa80`0dc19990 fffff880`03bd2828 : nt!IopSynchronousCall+0xe1
fffff880`03bd2800 fffff800`03675d78 : fffffa80`09196e00 fffffa80`070bfa00 00000000`0000030a 00000000`00000308 : nt!IopRemoveDevice+0x101
fffff880`03bd28c0 fffff800`0367aee7 : fffffa80`0dc19990 00000000`00000000 00000000`00000003 00000000`00000136 : nt!PnpSurpriseRemoveLockedDeviceNode+0x128
fffff880`03bd2900 fffff800`0367b000 : 00000000`00000000 fffff8a0`11d1c000 fffff8a0`049330d0 fffff880`03bd2a58 : nt!PnpDeleteLockedDeviceNode+0x37
fffff880`03bd2930 fffff800`0370b97f : 00000000`00000002 00000000`00000000 fffffa80`09122010 00000000`00000000 : nt!PnpDeleteLockedDeviceNodes+0xa0
fffff880`03bd29a0 fffff800`0370c53c : fffff880`03bd2b78 fffffa80`114ab700 fffffa80`07005600 fffffa80`00000000 : nt!PnpProcessQueryRemoveAndEject+0x6cf
fffff880`03bd2ae0 fffff800`035f573e : 00000000`00000000 fffffa80`114ab7d0 fffff8a0`123a25b0 00000000`00000000 : nt!PnpProcessTargetDeviceEvent+0x4c
fffff880`03bd2b10 fffff800`03298261 : fffff800`034f9f88 fffff8a0`11d1c010 fffff800`034342d8 fffff800`034342d8 : nt! ?? ::NNGAKEGL::`string'+0x54d9b
fffff880`03bd2b70 fffff800`0352b2ea : 00000000`00000000 fffffa80`07005660 00000000`00000080 fffffa80`06f8d890 : nt!ExpWorkerThread+0x111
fffff880`03bd2c00 fffff800`0327f8e6 : fffff880`03965180 fffffa80`07005660 fffff880`0396ffc0 00000000`00000000 : nt!PspSystemThreadStartup+0x5a
fffff880`03bd2c40 00000000`00000000 : fffff880`03bd3000 fffff880`03bcd000 fffff880`03bd2410 00000000`00000000 : nt!KxStartSystemThread+0x16

As a brief explanation, looking at the callstack we can see ndis functions calling the ZTEusbnet network driver about a PnP event (the frames below it show a surprise removal of the device). The network driver then sits in KeWaitForSingleObject, so it looks like the IRP being sent down the stack is blocked by the network driver and cannot get to the bottom of the stack. I first thought the bottom would be pci.sys, but I’m not too sure given it’s a USB network card and not a PCI card.

So let’s look at the IRP.

0: kd> !irp fffffa8008f5cc10 7
Irp is active with 10 stacks 10 is current (= 0xfffffa8008f5cf68)
 No Mdl: No System Buffer: Thread fffffa8007005660:  Irp stack trace. 
Flags = 00000000
ThreadListEntry.Flink = fffffa8007005a50
ThreadListEntry.Blink = fffffa8007005a50
IoStatus.Status = c00000bb
IoStatus.Information = 00000000
RequestorMode = 00000000
Cancel = 00
CancelIrql = 0
ApcEnvironment = 00
UserIosb = fffff88003bd27c0
UserEvent = fffff88003bd27d0
Overlay.AsynchronousParameters.UserApcRoutine = 00000000
Overlay.AsynchronousParameters.UserApcContext = 00000000
Overlay.AllocationSize = 00000000 - 00000000
CancelRoutine = 00000000  
UserBuffer = 00000000
&Tail.Overlay.DeviceQueueEntry = fffffa8008f5cc88
Tail.Overlay.Thread = fffffa8007005660
Tail.Overlay.AuxiliaryBuffer = 00000000
Tail.Overlay.ListEntry.Flink = 00000000
Tail.Overlay.ListEntry.Blink = 00000000
Tail.Overlay.CurrentStackLocation = fffffa8008f5cf68
Tail.Overlay.OriginalFileObject = 00000000
Tail.Apc = 00000000
Tail.CompletionKey = 00000000
     cmd  flg cl Device   File     Completion-Context
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
>[ 1b,17]   0  0 fffffa800c089050 00000000 00000000-00000000   
           \Driver\ZTEusbnet

            Args: 00000000 00000000 00000000 00000000
IO verifier information:
No information available - the verifier is probably disabled

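Here we can see the IRP reached ZTEusbnet but stopped there, so I think this driver is to blame. The [1b,17] on that line identifies it as a PnP IRP (IRP_MJ_PNP) with the minor code for surprise removal, which matches the PnpSurpriseRemoveLockedDeviceNode frame in the callstack.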
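The two numbers in the brackets on the current stack location are the major and minor function codes, and they can be looked up like this. The numeric values are the ones defined in wdm.h, but the tables below are deliberately tiny and the function name is my own.

```python
# A small subset of the IRP function codes from wdm.h
IRP_MAJOR = {0x16: "IRP_MJ_POWER", 0x1B: "IRP_MJ_PNP"}
PNP_MINOR = {
    0x00: "IRP_MN_START_DEVICE",
    0x02: "IRP_MN_REMOVE_DEVICE",
    0x17: "IRP_MN_SURPRISE_REMOVAL",
}


def decode_irp(major: int, minor: int) -> str:
    """Turn the [major,minor] pair shown by !irp into readable names."""
    major_name = IRP_MAJOR.get(major, f"major 0x{major:02x}")
    if major == 0x1B:
        minor_name = PNP_MINOR.get(minor, f"minor 0x{minor:02x}")
    else:
        minor_name = f"minor 0x{minor:02x}"  # only PnP minors are listed
    return f"{major_name} / {minor_name}"


# >[ 1b,17] from the !irp output
print(decode_irp(0x1B, 0x17))
```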
One last thing, let’s look at the device object.

0: kd> !devobj fffffa800c089050
Device object (fffffa800c089050) is for:
 NDMP14 \Driver\ZTEusbnet DriverObject fffffa800deeae70
Current Irp 00000000 RefCount 0 Type 00000017 Flags 00002050
Dacl fffff9a10009b881 DevExt fffffa800c0891a0 DevObjExt fffffa800c08a8c0
ExtensionFlags (0x00000800)  DOE_DEFAULT_SD_PRESENT
Characteristics (0x00000100)  FILE_DEVICE_SECURE_OPEN
AttachedTo (Lower) fffffa80070bfa00 \Driver\usbccgp
Device queue is not busy.

So we can see the network driver is the upper layer and usbccgp, which is a USB bus driver, is the lower layer.
I believe the way around this would be to update the driver, though I’ve had no reply from the OP since.

I checked the timestamp for the network driver and it’s very outdated, which is probably why it was causing such issues.

0: kd> lm vm ZTEusbnet
start             end                 module name
fffff880`0ae8e000 fffff880`0aebc000   ZTEusbnet   (no symbols)          
    Loaded symbol image file: ZTEusbnet.sys
    Image path: \SystemRoot\system32\DRIVERS\ZTEusbnet.sys
    Image name: ZTEusbnet.sys
    Timestamp:        Mon Oct 13 06:50:10 2008 (48F2E192)
    CheckSum:         000329ED
    ImageSize:        0002E000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
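The hex value in brackets on the Timestamp line is the number of seconds since 1 January 1970, so it can be checked with a couple of lines of Python. The function name is made up for this sketch.

```python
from datetime import datetime, timezone


def link_time(stamp_hex: str) -> datetime:
    """Convert the hex timestamp shown by lm vm into a UTC date."""
    return datetime.fromtimestamp(int(stamp_hex, 16), tz=timezone.utc)


# Timestamp: Mon Oct 13 06:50:10 2008 (48F2E192)
built = link_time("48F2E192")
print(built.strftime("%a %b %d %Y"))
```

The debugger prints the time in the local time zone of the machine doing the analysis, so the hour can differ from the UTC value, but the date confirms the driver was built in October 2008.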

Hope you enjoyed reading.

0xF4 debugging



This will be a short post, seeing as I haven’t managed to get hold of a Kernel memory dump that involves memory leakage. I have seen it happen, but I think the memory dump was from a while ago and I deleted it, although I can’t remember exactly.

BugCheck F4, {3, fffffa800bb94b30, fffffa800bb94e10, fffff800035e3270}

This bugcheck indicates a critical process has terminated, which crashes the system because the process is critical to the system’s operation.

The first parameter (3) indicates that it was a process that terminated, so we can use the !process command on the second parameter.

2: kd> !process fffffa800bb94b30

GetPointerFromAddress: unable to read from fffff80003515000
PROCESS fffffa800bb94b30
    SessionId: none  Cid: 0174    Peb: 7fffffda000  ParentCid: 0154
    DirBase: 321389000  ObjectTable: fffff8a00b4f9840  HandleCount:
    Image: csrss.exe

The process that crashed is csrss.exe (Client/Server Runtime Subsystem), which is the Windows Subsystem. Although Windows was designed to support multiple subsystems, having each subsystem implement functions such as display I/O would result in duplicated functions, which would inevitably reduce performance. The designers therefore put a lot of basic functions within this primary subsystem. As a result, the Windows Subsystem (implemented within csrss.exe) is marked as a critical process, even on servers where display I/O isn’t needed, so if it exits for any reason the system must bugcheck.

Now, this bugcheck is mainly caused by disk I/O errors, so what is a disk I/O error?

Well, when a drive cannot perform basic operations such as reads and writes, Windows cannot perform its basic routines, so the system fails and crashes. This is usually caused by a failing disk.

EXCEPTION_CODE: (NTSTATUS) 0xc0000006 – The instruction at 0x%p referenced memory at 0x%p. The required data was not placed into memory because of an I/O error status of 0x%x.

X64_0xF4_IOERR_IMAGE_csrss.exe

Secondly, severe memory leakage can cause this problem, as it can drain all of the system’s resources (normally the non paged pool) so the system cannot function and crashes.

It’s caused by programs not freeing their pages of memory after they’ve finished using them, so the pages are no longer in use by the application but can’t be used by anything else as they haven’t been freed.

To determine whether or not you have a memory leak you can use different programs; the Pool Monitor is one of them.

It sorts the pool memory used on the system into categories of your choice, such as Paged and Nonpaged pools.

For more information on the Pool Monitor look here:

How to use Memory Pool Monitor (Poolmon.exe) to troubleshoot kernel mode memory leaks

Another way is a Kernel Debugger, which is my personal favourite. You will need Kernel memory dumps to find pool leaks.

You can start by using the !poolused 2 command.

I’ll show an example, as I found a 0xF4 Kernel dump file, although it wasn’t caused by a memory leak.

(This was due to a disk I/O failure.)

EXCEPTION_CODE: (NTSTATUS) 0xc0000006 – The instruction at 0x%p referenced memory at 0x%p. The required data was not placed into memory because of an I/O error status of 0x%x.

Using the !vm command we can look at all the Virtual Memory being used at the time of the crash.

1: kd> !vm

*** Virtual Memory Usage ***
    Physical Memory:     4175860 (  16703440 Kb)
    Page File: \??\C:\pagefile.sys
      Current:  16703440 Kb  Free Space:  16703436 Kb
      Minimum:  16703440 Kb  Maximum:     50110320 Kb
    Available Pages:     3833632 (  15334528 Kb)
    ResAvail Pages:      4052929 (  16211716 Kb)
    Locked IO Pages:           0 (         0 Kb)
    Free System PTEs:   33497223 ( 133988892 Kb)
    Modified Pages:       137120 (    548480 Kb)
    Modified PF Pages:     37803 (    151212 Kb)
    NonPagedPool Usage:    13122 (     52488 Kb)
    NonPagedPool Max:    3116636 (  12466544 Kb)
    PagedPool 0 Usage:     27172 (    108688 Kb)
    PagedPool 1 Usage:      5597 (     22388 Kb)
    PagedPool 2 Usage:         0 (         0 Kb)
    PagedPool 3 Usage:         0 (         0 Kb)
    PagedPool 4 Usage:        56 (       224 Kb)
    PagedPool Usage:       32825 (    131300 Kb)
    PagedPool Maximum:  33554432 ( 134217728 Kb)

Although we can see the PagedPool usage, that isn’t normally the cause of crashes due to memory leakage, as it can be paged out to disk. It’s non paged pool leakage caused by device drivers that causes these issues.
Let’s look at all the pool tags that are using the Nonpaged pool.
Do note the list is very long and is ordered by memory usage, so only the top few lines are of use.

The 2 flag sorts the output by nonpaged pool usage; 4 would sort it by paged pool usage.

1: kd> !poolused 2

….

 Sorting by NonPaged Pool Consumed

               NonPaged                  Paged
 Tag     Allocs         Used     Allocs         Used

 VfPT         1      8388608          0            0  Verifier Allocate/Free Pool stack traces , Binary: nt!Vf
 XENO        30      2955056          0            0  UNKNOWN pooltag 'XENO', please update pooltag.txt
 Obtd         1      2625536          0            0  UNKNOWN pooltag 'Obtd', please update pooltag.txt
 NVRM      3064      2461228          1       528384  UNKNOWN pooltag 'NVRM', please update pooltag.txt
 4KBS       564      2319168          0            0  UNKNOWN pooltag '4KBS', please update pooltag.txt

The only pool usage that sticks out here is from Driver Verifier (the VfPT tag), which is running and sets aside certain pool allocations so it can monitor specific drivers.
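For a sense of scale, the rows shown above can be put into a small Python list and converted to megabytes. The values are copied from the output; the variable and function names are mine.

```python
# (tag, nonpaged allocations, nonpaged bytes used) from the rows shown
ROWS = [
    ("VfPT", 1, 8388608),
    ("XENO", 30, 2955056),
    ("Obtd", 1, 2625536),
    ("NVRM", 3064, 2461228),
    ("4KBS", 564, 2319168),
]


def biggest_consumer(rows):
    """Return (tag, megabytes) for the tag using the most nonpaged pool."""
    tag, _allocs, used = max(rows, key=lambda row: row[2])
    return tag, used / (1024 * 1024)


tag, megabytes = biggest_consumer(ROWS)
print(f"{tag}: {megabytes:.1f} MB")
```

The largest tag only accounts for 8 MB in a single allocation, which is nowhere near enough to suggest a leak.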

We can confirm this by running the !verifier command.

1: kd> !verifier

Verify Flags Level 0x00000dbb

STANDARD FLAGS:
    [X] (0x00000000) Automatic Checks
    [X] (0x00000001) Special pool
    [X] (0x00000002) Force IRQL checking
    [X] (0x00000008) Pool tracking
    [X] (0x00000010) I/O verification
    [X] (0x00000020) Deadlock detection
    [X] (0x00000080) DMA checking
    [X] (0x00000100) Security checks
    [X] (0x00000800) Miscellaneous checks

ADDITIONAL FLAGS:
    [ ] (0x00000004) Randomized low resources simulation
    [ ] (0x00000200) Force pending I/O requests
    [X] (0x00000400) IRP logging

    [X] Indicates flag is enabled

We can see every option apart from Force pending I/O requests and Low resource simulation is enabled. Those two options create unrealistic environments for drivers that can cause them to crash when in reality they might not crash at all, which leads to false positive reports.
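The Verify Flags Level is just the enabled flags added together, so it can be decoded with a bit mask. The values and names below are taken from the !verifier output; the function name is made up.

```python
# Flag values and names as listed by !verifier
VERIFIER_FLAGS = {
    0x001: "Special pool",
    0x002: "Force IRQL checking",
    0x004: "Randomized low resources simulation",
    0x008: "Pool tracking",
    0x010: "I/O verification",
    0x020: "Deadlock detection",
    0x080: "DMA checking",
    0x100: "Security checks",
    0x200: "Force pending I/O requests",
    0x400: "IRP logging",
    0x800: "Miscellaneous checks",
}


def enabled_flags(level: int):
    """Return the names of every flag whose bit is set in the level."""
    return [name for bit, name in VERIFIER_FLAGS.items() if level & bit]


# Verify Flags Level 0x00000dbb
for name in enabled_flags(0xDBB):
    print(name)
```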

For more information on Driver Verifier options look here:

Driver Verifier Options (Windows Drivers)

Lastly, if pool usage is too high and causing system crashes, we can take a look at the IRPs in use, as sometimes they can keep calling functions that use up memory.

We can do this with the !irpfind command. Unfortunately it doesn’t look like they’re saved in this dump file for some reason, which I’ve never seen before.

A great example of this bugcheck can be found here.

This was originally recommended by my friend Vir Gnarus.

But I will see what happens regarding this situation and see if changing the disk drive solves the issue.

I forgot to mention the last way of tracking pool usage. Driver Verifier can use Pool Tracking to monitor all the selected drivers and check that each one has freed its allocations by the time it unloads. If it hasn’t, the system crashes with a 0xC4 bugcheck, hopefully catching the culprit.

This is not to be confused with Special Pool, where Driver Verifier allocates driver memory from a special pool that can be monitored for incorrect usage. For example, if a driver tries to access memory that has already been freed, the system will crash straight away.

If a driver allocates 100 bytes but writes 110 bytes, it will write into another driver’s pool header. That can get a different driver blamed long after the culprit has left the scene, so when the police come and investigate the crime scene the wrong person is locked up.

Special Pool changes how things are set up. When a driver allocates memory, a guard page is set next to the buffer, along with slop bytes around it. If the driver tries to write more than it allocated and reaches the guard page, the system immediately bugchecks. When the memory is freed, the slop bytes are checked to see whether they have been overwritten; if they have, the system bugchecks and blames that driver.
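The idea of catching the overrun at the moment it happens can be shown with a toy buffer. This is only an illustration of the principle, not how Windows implements Special Pool, and the class is entirely made up.

```python
class GuardedBuffer:
    """Toy stand-in for an allocation with a guard page right after it."""

    def __init__(self, size: int):
        self.size = size
        self.data = bytearray(size)

    def write(self, offset: int, payload: bytes) -> None:
        if offset < 0 or offset + len(payload) > self.size:
            # With a guard page the faulting driver is caught red-handed
            raise MemoryError("write reached the guard page")
        self.data[offset:offset + len(payload)] = payload


buffer = GuardedBuffer(100)
buffer.write(0, b"A" * 100)      # exactly fills the allocation
try:
    buffer.write(0, b"A" * 110)  # the 100 vs 110 byte case from above
except MemoryError as error:
    print("caught:", error)
```

Without the guard, the extra 10 bytes would silently land in whatever sits next to the buffer and the crash would come later, in someone else’s code.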

For more information on Special Pool visit the previous link.

Debugging 0x124

Sorry for not posting in a while, I’ve been distracted by other things.
Anyway, let’s get into this.
So what’s a 0x124 bugcheck?

Well, it means the CPU has raised a flag saying a fatal hardware error has occurred. This is reported to Windows via a standard interface, normally through a Machine Check Exception. Windows then bugchecks and stops the system in its tracks.

To debug 0x124s properly we need multiple dump files to gather enough evidence of the cause, which is normally the CPU.

BugCheck 124, {0, fffffa80031f8028, b6472000, 1a000135}

The first parameter, which is zero, indicates a machine check exception: the CPU has found a hardware error and Windows has bugchecked. This is normally the case with 0x124s.
The second parameter is the address of the WHEA error record, which should give us insight into the cause of the error.
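The third and fourth parameters aren’t wasted either. For a machine check exception they are documented as the high and low 32 bits of the machine check status register, so they can be joined back together like this (the function name is my own):

```python
def mci_status(param3: int, param4: int) -> int:
    """Join the high and low halves of the 64-bit status value."""
    return (param3 << 32) | param4


# BugCheck 124, {0, fffffa80031f8028, b6472000, 1a000135}
print(hex(mci_status(0xB6472000, 0x1A000135)))
```

This should match the Status value that !errrec reports in the MCA section of the error record.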

3: kd> !errrec fffffa80031f8028
===============================================================================
Common Platform Error Record @ fffffa80031f8028
-------------------------------------------------------------------------------
Record Id     : 01cf93c95694ec87
Severity      : Fatal (1)
Length        : 928
Creator       : Microsoft
Notify Type   : Machine Check Exception
Timestamp     : 6/29/2014 20:39:44 (UTC)
Flags         : 0x00000000

===============================================================================
Section 0     : Processor Generic
-------------------------------------------------------------------------------
Descriptor    @ fffffa80031f80a8
Section       @ fffffa80031f8180
Offset        : 344
Length        : 192
Flags         : 0x00000001 Primary
Severity      : Fatal

Proc. Type    : x86/x64
Instr. Set    : x64
Error Type    : Cache error
Operation     : Data Read
Flags         : 0x00
Level         : 1
CPU Version   : 0x0000000000100f53
Processor ID  : 0x0000000000000003

===============================================================================
Section 1     : x86/x64 Processor Specific
-------------------------------------------------------------------------------
Descriptor    @ fffffa80031f80f0
Section       @ fffffa80031f8240
Offset        : 536
Length        : 128
Flags         : 0x00000000
Severity      : Fatal

Local APIC Id : 0x0000000000000003
CPU Id        : 53 0f 10 00 00 08 04 03 - 09 20 80 00 ff fb 8b 17
                00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
                00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00

Proc. Info 0  @ fffffa80031f8240

===============================================================================
Section 2     : x86/x64 MCA
-------------------------------------------------------------------------------
Descriptor    @ fffffa80031f8138
Section       @ fffffa80031f82c0
Offset        : 664
Length        : 264
Flags         : 0x00000000
Severity      : Fatal

Error         : DCACHEL1_DRD_ERR (Proc 3 Bank 0)
  Status      : 0xb64720001a000135
  Address     : 0x0000000013a87500
  Misc.       : 0x0000000000000000

So what does this mean?

Well, the error you’re looking at is a Level 1 data read cache error, which means the CPU failed to retrieve data stored in its Level 1 cache.
This is normally the first sign of a bad CPU, but a single dump file isn’t enough to fully determine the cause.
The other dump files (3 more) all indicate the same error on the same bank (0) of the same processor (3), which is enough to point to a bad CPU.
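The name DCACHEL1_DRD_ERR isn’t magic; it can be rebuilt from the low 16 bits of the Status value. Here is a sketch of the decoding, assuming the usual machine check layout where a cache error code has the form 0000 0001 RRRR TTLL and the top bits of the register are status flags. The tables are trimmed down and the function name is mine.

```python
FLAG_BITS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
             59: "MISCV", 58: "ADDRV", 57: "PCC"}
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR", 5: "IRD"}
KIND = {0: "ICACHE", 1: "DCACHE", 2: "GCACHE"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}


def decode_status(status: int):
    """Return (flags that are set, rebuilt error name) for a cache error."""
    flags = [name for bit, name in FLAG_BITS.items() if (status >> bit) & 1]
    code = status & 0xFFFF
    if code >> 8 != 0x01:
        return flags, "not a cache error code"
    request = REQUEST.get((code >> 4) & 0xF, "UNKNOWN")
    kind = KIND.get((code >> 2) & 0x3, "UNKNOWN")
    level = LEVEL[code & 0x3]
    return flags, f"{kind}{level}_{request}_ERR"


flags, name = decode_status(0xB64720001A000135)
print(name, flags)
```

The UC and PCC flags being set means the error was uncorrected and the processor context may be corrupt, which is why the severity is Fatal.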

However, this can also be caused by overclocking, which can cause a lot of problems. In that case it can be resolved, provided no permanent damage has been done.

3: kd> !sysinfo cpuinfo

[CPU Information]

~MHz = REG_DWORD 3206

Component Information = REG_BINARY 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

Configuration Data = REG_FULL_RESOURCE_DESCRIPTOR ff,ff,ff,ff,ff,ff,ff,ff,0,0,0,0,0,0,0,0

Identifier = REG_SZ AMD64 Family 16 Model 5 Stepping 3

ProcessorNameString = REG_SZ AMD Phenom(tm) II X4 840 Processor

VendorIdentifier = REG_SZ AuthenticAMD

The CPU isn’t overclocked, as this processor should be running at 3.2GHz, which is 3200 MHz.

I would say it’s safe to say the CPU is bad, but we can always remove the CMOS battery to reset any improper timings, and check for overheating.
If neither of those helps, the CPU should be replaced.

I hope this has cleared up some questions about these types of bugchecks.

0x101 bugcheck analysis

Today we’ll take a look at the 0x101 bugcheck error and what it means.

 BugCheck 101, {31, 0, fffff88003165180, 2}

So first of all what does all this mean?
Well, this bugcheck indicates a clock interrupt was not received on a processor within the allotted time interval, so the system crashed.

A clock interrupt is a periodic interrupt that lets the processors keep time and stay in sync. When it’s sent out, all the processors have to respond within the allotted time interval, which in this case is 0x31 (49) clock ticks.

The third parameter is the address of the processor control block for the hung processor.
The fourth parameter is the index of the processor that was not responding.
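Putting the parameters together in a small helper makes them easier to read. The debugger prints bugcheck parameters in hex, so the first one has to be converted; the function name is made up for this sketch.

```python
def describe_101(p1: int, p2: int, p3: int, p4: int) -> str:
    """Summarise a 0x101 bugcheck from its four parameters."""
    return (f"Processor {p4} missed the clock interrupt for {p1} ticks; "
            f"its control block is at {p3:#x}")


# BugCheck 101, {31, 0, fffff88003165180, 2}
print(describe_101(0x31, 0x0, 0xFFFFF88003165180, 0x2))
```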

0: kd> kv

Child-SP          RetAddr           : Args to Child                                                           : Call Site

fffff880`06e53328 fffff800`03b2da4a : 00000000`00000101 00000000`00000031 00000000`00000000 fffff880`03165180 : nt!KeBugCheckEx

fffff880`06e53330 fffff800`03ae06f7 : 00000000`00000000 fffff800`00000002 00000000`00002710 fffff880`06e53450 : nt! ?? ::FNODOBFM::`string’+0x4e3e

fffff880`06e533c0 fffff800`03a22895 : fffff800`03a48460 fffff880`06e53570 fffff800`03a48460 00000000`00000000 : nt!KeUpdateSystemTime+0x377

fffff880`06e534c0 fffff800`03ad3113 : fffff800`03c51e80 00000000`00000001 00000000`00000001 fffff800`03a61000 : hal!HalpHpetClockInterrupt+0x8d

fffff880`06e534f0 fffff800`03aab937 : fffff800`03ae0aa5 00000000`000406f8 fffff880`03165180 fffff880`06e53710 : nt!KiInterruptDispatchNoLock+0x163 (TrapFrame @ fffff880`06e534f0)

fffff880`06e53680 fffff800`03de1d4f : 00000000`00000000 fffff880`06e53b60 00000000`00000000 00000000`00000000 : nt!KeFlushProcessWriteBuffers+0x63

fffff880`06e536f0 fffff800`03de23ad : 00000000`0307b2d0 fffff800`03dcccce 00000000`00000000 00000000`00000286 : nt!ExpQuerySystemInformation+0x13af

fffff880`06e53aa0 fffff800`03ad5e53 : 00000000`00000000 fffff880`06e53b60 ffffffff`fffe7960 000007fe`f3a80b90 : nt!NtQuerySystemInformation+0x4d

fffff880`06e53ae0 00000000`7799161a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`06e53ae0)

00000000`02e1f5b8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x7799161a

So the primary processor (#0) was handling the clock interrupt and bugchecked because processor #2 didn’t respond.
Let’s look at the registers at the time of the interrupt dispatch.

0: kd> .trap fffff880`06e534f0

NOTE: The trap frame does not contain all registers.

Some register values may be zeroed or incorrect.

rax=0000000000000001 rbx=0000000000000000 rcx=0000000000000202

rdx=00000000000c00e1 rsi=0000000000000000 rdi=0000000000000000

rip=fffff80003aab937 rsp=fffff88006e53680 rbp=0000000000000000

 r8=00000000000000e1  r9=0000000000000001 r10=0000000000000000

r11=fffff880031d7180 r12=0000000000000000 r13=0000000000000000

r14=0000000000000000 r15=0000000000000000

iopl=0         nv up ei pl nz na pe nc

nt!KeFlushProcessWriteBuffers+0x63:

fffff800`03aab937 f390            pause

Here we see the processor trying to flush the write buffers. This forces the other processors to sync up, and the calling processor waits in a loop until they have all done so. The other processors are flushing their Translation Lookaside Buffers at the same time, which should let them get through their work and sync with the rest of the cores. That didn’t happen before the time interval ran out, so the system called a bugcheck.
So this is where I’ll do some unassembling.

0: kd> u @rip

nt!KeFlushProcessWriteBuffers+0x63:

fffff800`03aab937 f390            pause 

fffff800`03aab939 8b8380200000    mov     eax,dword ptr [rbx+2080h]

fffff800`03aab93f 3bc5            cmp     eax,ebp

fffff800`03aab941 75f4            jne     nt!KeFlushProcessWriteBuffers+0x63 (fffff800`03aab937)

fffff800`03aab943 400fb6c6        movzx   eax,sil

fffff800`03aab947 440f22c0        mov     cr8,rax

fffff800`03aab94b 4c8d5c2460      lea     r11,[rsp+60h]

fffff800`03aab950 498b5b10        mov     rbx,qword ptr [r11+10h]

Let’s unassemble just the instructions that make up the loop.

0: kd> u fffff800`03aab937 fffff800`03aab943
nt!KeFlushProcessWriteBuffers+0x63:
fffff800`03aab937 f390            pause <– Pause (CPU delay) inside a spin loop, probably waiting for the spinlock to be released
fffff800`03aab939 8b8380200000    mov     eax,dword ptr [rbx+2080h]
fffff800`03aab93f 3bc5            cmp     eax,ebp
fffff800`03aab941 75f4            jne     nt!KeFlushProcessWriteBuffers+0x63 (fffff800`03aab937) <– Jump if not equal, so we stay in the loop until the two values match
fffff800`03aab943 400fb6c6        movzx   eax,sil 
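
The loop above can be sketched in Python to show the shape of it: poll a value, compare, and go round again if it doesn’t match. This is only an illustration; the names and the max_spins limit are my own assumptions (the real loop has no such limit, so a processor that never responds leaves the caller spinning until the bugcheck).

```python
def spin_until_equal(read_value, expected, max_spins=1_000_000):
    """Poll read_value() until it returns `expected`.

    Returns the number of failed polls before success, or None if the
    (hypothetical) max_spins limit is reached first.
    """
    for spins in range(max_spins):
        # pause: a real CPU would briefly relax here before polling again
        if read_value() == expected:  # mov eax,[...] then cmp eax,ebp
            return spins              # values match, fall through past the jne
        # jne: values differ, go round the loop again
    return None                       # the other side never responded


# Hypothetical responder that only acknowledges on the third poll.
responses = iter([0, 0, 1])
print(spin_until_equal(lambda: next(responses), expected=1))
```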

Now let’s look at the other processors.
This processor is #1, which is actually the second one since the numbering starts at #0.

1: kd> kv

Child-SP          RetAddr           : Args to Child                                                           : Call Site

fffff880`0a5b3bb0 fffff800`03b0b3cd : 00000000`00000000 00000000`00000001 00000000`476ca000 00000000`476ca000 : nt!KxFlushEntireTb+0x93 <– flushing the translation lookaside buffers.

fffff880`0a5b3bf0 fffff800`03b2f9b0 : 00000000`00000002 fffff680`0023b670 00000000`00000001 00000000`00015e7e : nt!KeFlushTb+0x119

fffff880`0a5b3c70 fffff800`03ae678f : fffff680`0023b670 fffff880`0a5b3d00 fffff700`01080000 fffffa80`0af873f8 : nt! ?? ::FNODOBFM::`string’+0xada2

fffff880`0a5b3cb0 fffff800`03af3bfe : 00000000`00000000 00000000`476ce000 fffff880`0a5b4010 fffff680`0023b670 : nt!MiResolveDemandZeroFault+0x1ff

fffff880`0a5b3da0 fffff800`03ae3179 : fffffa80`0baa6ab0 ffffffff`ffffffff fffff8a0`0c1fe740 00000000`00007000 : nt!MiDispatchFault+0x8ce

fffff880`0a5b3eb0 fffff800`03ad4cee : 00000000`00000000 00000000`476ce000 00000000`00000000 ffffffff`ffffffff : nt!MmAccessFault+0x359

fffff880`0a5b4010 fffff800`03af2798 : fffff8a0`0bebf960 fffffa80`0918f0f0 00000000`00000000 fffff800`03c0ae80 : nt!KiPageFault+0x16e (TrapFrame @ fffff880`0a5b4010)

fffff880`0a5b41a0 fffff880`0ff88ba1 : fffffa80`0ba99520 00000000`00000000 fffff880`00000002 00000000`476cc000 : nt!MmProbeAndLockPages+0x118

fffff880`0a5b42b0 fffff880`0ff87d0d : fffffa80`0918f1e0 fffff8a0`10969ad0 fffff880`4b677844 00000000`00007000 : dxgmms1!VIDMM_SEGMENT::SafeProbeAndLockPages+0x229

fffff880`0a5b4340 fffff880`0ff827d8 : fffff8a0`10969ad0 fffff8a0`10969ad0 fffff8a0`00000000 00000000`00000001 : dxgmms1!VIDMM_SEGMENT::LockAllocationBackingStore+0x8d

fffff880`0a5b43b0 fffff880`0ff76c2b : fffffa80`06ac0c80 fffffa80`095ec000 fffffa80`095ec000 fffff880`0ff76820 : dxgmms1!VIDMM_APERTURE_SEGMENT::CommitResource+0x1c4

fffff880`0a5b4400 fffff880`0ff7a59f : fffffa80`095ec000 fffff880`0a5b44d8 fffffa80`06ac0c00 00000000`00000000 : dxgmms1!VIDMM_GLOBAL::PageInAllocations+0xbb

fffff880`0a5b4460 fffff880`0ff74dd6 : fffff8a0`10969ad0 fffffa80`06ac0c80 fffff8a0`10969ad0 fffffa80`095ec000 : dxgmms1!VIDMM_GLOBAL::PageInOneAllocation+0x107

fffff880`0a5b44d0 fffff880`0ff744b3 : fffff880`0a5b4828 fffffa80`0b6a7010 fffff880`0a5b47a0 fffff880`0a5b4838 : dxgmms1!VIDMM_GLOBAL::ProcessDeferredCommand+0x3d2

fffff880`0a5b45f0 fffff880`0ff8e3ad : 00000000`00000001 fffff880`0ff600e0 fffffa80`095e0d50 fffffa80`095e0d50 : dxgmms1!VidMmiProcessSystemCommand+0x23

fffff880`0a5b4620 fffff880`0ff8d538 : fffff880`0a5b4780 00000000`00000000 fffffa80`095e0c00 00000000`00000001 : dxgmms1!VidSchiSubmitSystemCommand+0x39

fffff880`0a5b4650 fffff880`0ff5f786 : 00000000`00000000 fffffa80`095e0d50 fffffa80`095e0c00 fffffa80`095e0c00 : dxgmms1!VidSchiSubmitQueueCommand+0x74

fffff880`0a5b4680 fffff880`0ff8faa3 : fffffa80`095d4af0 fffffa80`095dd000 fffffa80`095e0c00 fffff8a0`10e11000 : dxgmms1!VidSchiSubmitQueueCommandDirect+0x1e6

fffff880`0a5b4710 fffff880`0ff745dd : fffffa80`00000001 fffffa80`09187410 fffffa80`095ec000 fffffa80`095ec000 : dxgmms1!VidSchiSubmitCommandPacketToQueue+0x15f

fffff880`0a5b4780 fffff880`0ff749f0 : fffff880`0a5b48b8 00000000`00000001 fffff880`0a5b48b8 00000000`00000001 : dxgmms1!VIDMM_GLOBAL::QueueSystemCommandAndWait+0xf9

fffff880`0a5b47f0 fffff880`0ff71cb2 : fffffa80`095ec000 fffffa80`06ac0c80 00000000`00000000 00000000`00040000 : dxgmms1!VIDMM_GLOBAL::QueueDeferredCommandAndWait+0x4c

fffff880`0a5b4860 fffff880`0ff57260 : fffffa80`095ec000 fffff8a0`10e12301 00000000`00000000 fffff880`02ee207f : dxgmms1!VIDMM_GLOBAL::BeginCPUAccess+0xcfa

fffff880`0a5b4930 fffff880`02f3f0e7 : 00000000`000ae260 fffffa80`095d5000 00000000`000ae260 fffff880`02ee107b : dxgmms1!VidMmBeginCPUAccess+0x28

fffff880`0a5b4980 fffff880`02f3f7ae : fffff8a0`0f3e34e0 fffff8a0`0f3e34e0 fffff880`0a5b4b60 00000000`00000000 : dxgkrnl!DXGDEVICE::Lock+0x287

fffff880`0a5b49e0 fffff960`002513a2 : 00000000`000ae260 00000000`40000101 00000000`000007db fffff8a0`10e11000 : dxgkrnl!DxgkLock+0x22a

fffff880`0a5b4ab0 fffff800`03ad5e53 : fffffa80`0a2d7750 fffff880`0a5b4b60 00000000`fffdb000 00000000`28d3c000 : win32k!NtGdiDdDDILock+0x12

fffff880`0a5b4ae0 00000000`7518156a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`0a5b4ae0)

00000000`000ae238 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x7518156a

Here we see some display routines from the DirectX graphics kernel. It looks like they’re allocating memory and then locking it into physical memory using the MmProbeAndLockPages routine.

There’s not much unassembling to be done on this processor, so I won’t include the code.

Now let’s look at the problematic processor, which is #2.

2: kd> kv

Child-SP          RetAddr           : Args to Child                                                           : Call Site

00000000`00000000 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x0

2: kd> r

rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000

rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000

rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000

 r8=0000000000000000  r9=0000000000000000 r10=0000000000000000

r11=0000000000000000 r12=0000000000000000 r13=0000000000000000

r14=0000000000000000 r15=0000000000000000

iopl=0         nv up di pl nz na pe nc

cs=0000  ss=0000  ds=0000  es=0000  fs=0000  gs=0000             efl=00000000

00000000`00000000 ??              ???

The callstack and registers are zeroed out, which is never good, so we’ll have to look at the raw stack. Either the IRQL was too high or the processor was too hung to record the information.

2: kd> !irql

Debugger saved IRQL for processor 0x2 — 0 (LOW_LEVEL)

The IRQL is at passive level, so the IRQL isn’t the problem; most likely the processor was too hung to record the information.

fffff880`0318a978  fffff880`04285e6f*** ERROR: Module load completed but symbols could not be loaded for cfosspeed6.sys

 cfosspeed6+0x44e6f

fffff880`0318a980  fffff880`0318aa38

fffff880`0318a988  fffff880`042ca86d cfosspeed6+0x8986d

fffff880`0318a990  fffff880`0318aa30

fffff880`0318a998  fffff880`0318aa38

fffff880`0318a9a0  fffffa80`0b0c5580

fffff880`0318a9a8  fffff880`04284cb7 cfosspeed6+0x43cb7

cfosspeed6.sys belongs to cFosSpeed, an internet accelerator, which I find to be pointless anyway.

fffff880`0318bae8  fffff880`02fd6061*** ERROR: Symbol file could not be found.  Defaulted to export symbols for VBoxNetFlt.sys – 

 VBoxNetFlt+0x3061

fffff880`0318baf0  fffffa80`06c6f830

fffff880`0318baf8  fffffa80`071efe70

fffff880`0318bb00  00000000`00000000

fffff880`0318bb08  fffffa80`071f1010

fffff880`0318bb10  fffffa80`070681a0

fffff880`0318bb18  fffff880`017b3a22 ndis!ndisMSendPacketCompleteToOpen+0x102

fffff880`0318bb20  00000000`00000000

fffff880`0318bb28  fffffa80`071efe70

fffff880`0318bb30  fffffa80`071f1010

fffff880`0318bb38  fffffa80`071efe70

fffff880`0318bb40  fffffa80`071a4010

fffff880`0318bb48  fffff880`05488848*** ERROR: Module load completed but symbols could not be loaded for tap0901.sys

 tap0901+0x3848

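The VirtualBox network driver and the TAP-Win32 Adapter V9 driver are being flagged as well.
It looks like the internet accelerator is conflicting with the other network drivers on NDIS (Network Driver Interface Specification), which is the network component that shows up in the raw stack.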
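
Picking third-party modules out of a raw stack by eye gets tedious, so here’s a rough Python sketch that counts module names in raw stack text. The sample is condensed from the output above (the symbol error messages are dropped and each module is put on the same line as its address), and the regular expression is my own assumption about the module+offset format, not a debugger feature.

```python
import re
from collections import Counter

# Matches "module+0xOFFSET" or "module!symbol+0xOFFSET" as printed in the raw stack.
FRAME = re.compile(r"\b([A-Za-z_]\w*)(?:![\w:]+)?\+0x[0-9a-fA-F]+")


def modules_on_stack(raw_stack: str) -> Counter:
    """Count how often each module name appears in raw stack text."""
    return Counter(m.group(1).lower() for m in FRAME.finditer(raw_stack))


sample = """
fffff880`0318a978  fffff880`04285e6f cfosspeed6+0x44e6f
fffff880`0318a988  fffff880`042ca86d cfosspeed6+0x8986d
fffff880`0318a9a8  fffff880`04284cb7 cfosspeed6+0x43cb7
fffff880`0318bae8  fffff880`02fd6061 VBoxNetFlt+0x3061
fffff880`0318bb18  fffff880`017b3a22 ndis!ndisMSendPacketCompleteToOpen+0x102
fffff880`0318bb48  fffff880`05488848 tap0901+0x3848
"""

for name, count in modules_on_stack(sample).most_common():
    print(name, count)
```

Addresses left on a raw stack can be stale, so a module showing up here is a lead to follow rather than proof that it was running at the time.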

Finally, let’s look at the last processor.

3: kd> kv

Child-SP          RetAddr           : Args to Child                                                           : Call Site

fffff880`09ff6020 fffff800`03b0b3cd : 00000000`00000000 00000000`00000001 00000000`00000002 fffffa80`08da7c58 : nt!KxFlushEntireTb+0xcd

fffff880`09ff6060 fffff800`03b2f9b0 : 00000000`00000001 fffff680`0001e088 00000000`00000001 ffffffff`ffffffff : nt!KeFlushTb+0x119

fffff880`09ff60e0 fffff800`03ae389d : fffffa80`01e33c30 fffff880`09ff6180 fffffa80`0a3ec450 00000000`00000000 : nt! ?? ::FNODOBFM::`string’+0xada2

fffff880`09ff6120 fffff800`03ad4cee : 00000000`00000001 00000000`03c11000 00000000`00000000 00000000`00000001 : nt!MmAccessFault+0xa7d

fffff880`09ff6280 fffff960`0011ad9f : fffff960`0011ac60 fffff900`c264e670 00000000`00000000 00000000`03c10000 : nt!KiPageFault+0x16e (TrapFrame @ fffff880`09ff6280)

fffff880`09ff6410 fffff960`0011ab34 : fffff880`09ff6860 00000000`0000021d 00000000`03c10000 00000000`00000000 : win32k!vSolidFillRect1+0x13f

fffff880`09ff6450 fffff960`000f5d61 : 00000000`00000005 fffff880`09ff6860 fffff900`c2976870 fffff880`00000000 : win32k!vDIBSolidBlt+0x204

fffff880`09ff6630 fffff960`000df7d9 : 00000000`00000000 00000000`00000000 00000000`00000000 fffff900`c264e670 : win32k!EngBitBlt+0x79d

fffff880`09ff6740 fffff960`000df3c0 : 00000000`00000000 fffff880`09ff6b60 fffff880`09ff69d0 fffff900`0000f0f0 : win32k!GrePatBltLockedDC+0x2f9

fffff880`09ff67f0 fffff960`002b9538 : fffff880`09ff69d0 00000000`fffdb000 00000000`0008ebc0 fffff960`00000000 : win32k!GrePolyPatBltInternal+0x2ec

fffff880`09ff6940 fffff800`03ad5e53 : 00000000`0008e308 00000000`00f00021 00000000`0008ebc0 00000000`00000001 : win32k!NtGdiPolyPatBlt+0x308 

fffff880`09ff6a70 00000000`751804ca : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`09ff6ae0)

00000000`0008e2e8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x751804ca

These are some win32k.sys routines. Let’s look at the trap frame for the page fault.

3: kd> .trap fffff880`09ff6280

NOTE: The trap frame does not contain all registers.

Some register values may be zeroed or incorrect.

rax=0000000003c10ff8 rbx=0000000000000000 rcx=0000000000000000

rdx=0000000000000006 rsi=0000000000000000 rdi=0000000000000000

rip=fffff9600011ad9f rsp=fffff88009ff6410 rbp=00000000ffffffff

 r8=0000000000000016  r9=0000000003c10874 r10=0000000000000003

r11=000000000000021d r12=0000000000000000 r13=0000000000000000

r14=0000000000000000 r15=0000000000000000

iopl=0         nv up ei pl nz ac po cy

win32k!vSolidFillRect1+0x13f:

fffff960`0011ad9f ??              ???

3: kd> u @rip

win32k!vSolidFillRect1+0x13f:

fffff960`0011ad9f ??              ???

            ^ Memory access error in ‘u @rip’

The memory at that address can’t be read from the dump, so there’s nothing to unassemble.

So with all this said, the cause looks pretty clear: some network-related drivers are causing the issues.
This dump file was sitting in my downloads, so I can’t recall the exact solution.

0x1A debugging

This is my first blog so I’ll see how this goes.

So what exactly is this blog about?

Well, on a daily basis I go on forums and help people out with what is commonly known as the Blue Screen of Death, or BSOD. I like to go into detail to analyse the exact cause.

I will try to explain everything as best I can throughout this blog to help you understand what I’m rambling on about.

I thought I’d look through some old Kernel Memory Dump files in my downloads and see what I can find.
Let’s begin.

BugCheck 1A, {41790, fffffa80015c69c0, ffff, 0}
So what does this bugcheck mean?
I can guess what you’re probably thinking if you’re new to debugging:

“It just looks like a bunch of random numbers and letters. How can you work with that?”

Well, in Windows Debugger we can use the !analyze -v command to make some sense of it.

MEMORY_MANAGEMENT (1a)
    # Any other values for parameter 1 must be individually examined.
Arguments:
Arg1: 0000000000041790, A page table page has been corrupted. On a 64 bit OS, parameter 2
contains the address of the PFN for the corrupted page table page.
On a 32 bit OS, parameter 2 contains a pointer to the number of used
PTEs, and parameter 3 contains the number of used PTEs.
Arg2: fffffa80015c69c0
Arg3: 000000000000ffff
Arg4: 0000000000000000
So what does this mean?

Well, to put it simply, a page table page has been corrupted.

What’s a Page Table?

A page table is the data structure that maps virtual memory addresses to physical memory addresses in RAM. It makes memory look like a flat, continuous range of virtual addresses when in fact the underlying physical pages could be spread out all over the place.
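
As a rough illustration of how a virtual address gets mapped, here’s a Python sketch that splits an address into the indexes used to walk the tables. I’m assuming standard x64 4-level paging with 4 KB pages (9 bits per level and a 12-bit offset); the function name and the example address are made up for illustration.

```python
def split_virtual_address(va: int) -> dict:
    """Break a 48-bit x64 virtual address into its four table indexes and page offset.

    Assumes 4-level paging with 4 KB pages: 9 + 9 + 9 + 9 + 12 bits.
    """
    return {
        "pml4_index": (va >> 39) & 0x1FF,   # top-level table
        "pdpt_index": (va >> 30) & 0x1FF,   # page directory pointer table
        "pd_index": (va >> 21) & 0x1FF,     # page directory
        "pt_index": (va >> 12) & 0x1FF,     # page table (holds the PTE)
        "page_offset": va & 0xFFF,          # byte offset inside the 4 KB page
    }


# Hypothetical address built from known parts so the result is easy to check.
example = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
for field, value in split_virtual_address(example).items():
    print(field, value)
```

Each index selects one entry in a table, and that entry points at the next table down, which is how scattered physical pages end up looking like one continuous range.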

So the second parameter contains the address of the Page Frame Number (PFN) entry for the corrupted page table page.

The Page Frame Number database is a way to track physical pages of memory. It keeps track of which pages are allocated to working sets, which are free, which are available and so on.
So it’s an efficient way for the memory manager to know which pages are in use and which are available to use.

So let’s take a look at the page table page that’s been corrupted.

2: kd> dt nt!_MMPFN fffffa80015c69c0
   +0x000 u1               :
   +0x008 u2               :
   +0x010 PteAddress       : 0xfffff6fb`400001e0 _MMPTE
   +0x010 VolatilePteAddress : 0xfffff6fb`400001e0 Void
   +0x010 Lock             : 0n1073742304
   +0x010 PteLong          : 0xfffff6fb`400001e0
   +0x018 u3               :
   +0x01c UsedPageTableEntries : 0xffff
   +0x01e VaType           : 0 ''
   +0x01f ViewCount        : 0 ''
   +0x020 OriginalPte      : _MMPTE
   +0x020 AweReferenceCount : 0n128
   +0x028 u4               :
This indicates that the used page table entry count has actually fallen below zero: the UsedPageTableEntries field reads 0xffff. This is normally caused by drivers calling the MmUnlockPages function too many times on the same memory descriptor list (MDL).
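
Here’s a tiny Python sketch of why a count that drops below zero shows up as 0xffff. It assumes the counter is a 16-bit unsigned field, which is how the value is displayed above; the helper names are mine.

```python
def decrement_u16(value: int) -> int:
    """Subtract one from a 16-bit unsigned counter, wrapping around at zero."""
    return (value - 1) & 0xFFFF


def as_signed16(value: int) -> int:
    """Reinterpret a 16-bit unsigned value as a signed one."""
    return value - 0x10000 if value & 0x8000 else value


used_entries = 0                              # nothing left to release
used_entries = decrement_u16(used_entries)    # one release too many
print(hex(used_entries), as_signed16(used_entries))
```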

So let’s look at the callstack, which lists the function calls leading up to the bugcheck.
The callstack starts with the oldest call at the bottom and works its way up to the most recent, so it’s basically like a timeline.

2: kd> knL
 # Child-SP          RetAddr           Call Site
00 fffff880`138be698 fffff800`03b45d50 nt!KeBugCheckEx <– BSOD
01 fffff880`138be6a0 fffff800`03b077d9 nt! ?? ::FNODOBFM::`string’+0x35084
02 fffff880`138be860 fffff800`03dee0f1 nt!MiRemoveMappedView+0xd9
03 fffff880`138be980 fffff960`00099d06 nt!MiUnmapViewOfSection+0x1b1
04 fffff880`138bea40 fffff960`002c194b win32k!EngUnmapFontFileFD+0x8a
05 fffff880`138beab0 fffff960`00288ade win32k!ttfdSemDestroyFont+0x8b
06 fffff880`138beae0 fffff960`00286d0a win32k!PDEVOBJ::DestroyFont+0xf2
07 fffff880`138beb50 fffff960`000a933f win32k!RFONTOBJ::vDeleteRFONT+0x4a
08 fffff880`138bebc0 fffff960`000a8d73 win32k!RFONTOBJ::bMakeInactiveHelper+0x427
09 fffff880`138bec40 fffff960`000aa324 win32k!RFONTOBJ::vMakeInactive+0xa3
0a fffff880`138bece0 fffff960`00062a95 win32k!RFONTOBJ::bInit+0x1ec
0b fffff880`138bee00 fffff960`0006223f win32k!GreExtTextOutWLocked+0x7e5
0c fffff880`138bf220 fffff960`00062125 win32k!GreExtTextOutWInternal+0x10f
0d fffff880`138bf2d0 fffff960`00055c37 win32k!GreExtTextOutW+0x3d
0e fffff880`138bf330 fffff960`0006c90a win32k!DrawIt+0xd7
0f fffff880`138bf390 fffff960`00067560 win32k!DrawFrameControl+0x324
10 fffff880`138bf4b0 fffff960`00067224 win32k!CreateBitmapStrip+0x308
11 fffff880`138bf510 fffff960`00073677 win32k!xxxSetWindowNCMetrics+0x354
12 fffff880`138bf790 fffff960`00072e6e win32k!xxxUpdatePerUserSystemParameters+0x7f3
13 fffff880`138bfbf0 fffff800`03ad3e53 win32k!NtUserUpdatePerUserSystemParameters+0x2a
14 fffff880`138bfc20 00000000`76ea3d4a nt!KiSystemServiceCopyEnd+0x13
15 00000000`00abf7b8 00000000`00000000 0x76ea3d4a
So we see a lot of win32k functions, which may or may not be related to the bugcheck.

Win32k.sys is a Kernel Mode device driver that contains the window manager, the Graphics Device Interface and wrappers for DirectX support.

The window manager controls all windows, screen output and mouse and keyboard input, and passes information to user mode applications.

The Graphics Device Interface (GDI) is a library of functions for graphics output devices; it communicates with them via device drivers.
Basically, applications call user mode functions to request things such as windows and buttons. The window manager passes these requests to the GDI, which formats them and sends them to the device driver. The device driver is then paired up with a video miniport driver to complete the display output.

So there’s not much revealing in the callstack, but there’s something else that sticks out.
WARNING: !chkimg output was truncated to 50 lines. Invoke !chkimg without ‘-lo [num_lines]’ to view  entire output.
Page 31a1d9 not present in the dump file. Type “.hh dbgerr004” for details
Page 3199f1 not present in the dump file. Type “.hh dbgerr004” for details
Page 31a167 not present in the dump file. Type “.hh dbgerr004” for details
483 errors : !win32k (fffff96000056248-fffff9600023e2b9)
What does this gibberish mean?

Well, !chkimg compares the executable images loaded in memory (.dll, .exe and so on) against the original copies of the files and reports any bytes that differ, which shows whether an image has been altered or corrupted in memory. It’s a little more complicated than that, which I need to look into, but that’s the basics.

Now, these images can be corrupted for various reasons, but we’re not getting many clues here besides possibly bad RAM.

I decided to look at all the IRPs present in the system to see if anything else cropped up.

fffffa801620dd00 [fffffa8016215580] irpStack: ( e, 0)  fffffa800b17ae40 [ \Driver\aswNdisFlt]
fffffa8016210a60 [fffffa8016241060] irpStack: ( e, 0)  fffffa800b17ae40 [ \Driver\aswNdisFlt]
fffffa8016213680 [fffffa8016243060] irpStack: ( e, 0)  fffffa800b17ae40 [ \Driver\aswNdisFlt]
fffffa80162162f0 [fffffa8016243640] irpStack: ( e, 0)  fffffa800b17ae40 [ \Driver\aswNdisFlt]
fffffa8016219400 [fffffa8016244b50] irpStack: ( e, 0)  fffffa800b17ae40 [ \Driver\aswNdisFlt]
fffffa8016219c60 [fffffa801551d060] irpStack: ( c, 2)  fffffa8014813030 [ \FileSystem\Ntfs]
fffffa801621aa50 [fffffa8016246640] irpStack: ( e, 0)  fffffa800b17ae40 [ \Driver\aswNdisFlt]
fffffa801621eb10 [fffffa801551d060] irpStack: ( c, 2)  fffffa8014813030 [ \FileSystem\Ntfs]
fffffa80162207b0 [fffffa8016006b50] irpStack: ( e, 0)  fffffa800b044ba0 [ \FileSystem\FltMgr]
fffffa80162209e0 [fffffa8016006b50] irpStack: ( e, 0)  fffffa800b044ba0 [ \FileSystem\FltMgr]
fffffa8016228c60 [fffffa801551d060] irpStack: ( c, 2)  fffffa8014813030 [ \FileSystem\Ntfs]
fffffa8016237ee0 [fffffa8016247640] irpStack: ( d, 0)  fffffa800b1a3df0 [ \FileSystem\Npfs]
fffffa8016238a00 [fffffa8016248640] irpStack: ( e, 0)  fffffa800b17ae40 [ \Driver\aswNdisFlt]
fffffa80162399b0 [fffffa8016246b50] irpStack: ( e, 0)  fffffa800b17ae40 [ \Driver\aswNdisFlt]
fffffa801623e6d0 [fffffa8016244060] irpStack: ( e, 0)  fffffa800b17ae40 [ \Driver\aswNdisFlt]
aswNdisFlt is the Avast antivirus firewall driver. It appears in a lot of these IRPs, which makes me believe it is part of the problem, given that Avast is problematic.
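
Counting these by eye is error-prone, so here’s a short Python sketch that tallies IRPs per driver from a listing like the one above. The sample is a handful of lines copied from that output, and the regular expression is my own assumption about the line format.

```python
import re
from collections import Counter

# The driver name is the last bracketed field on each line, e.g. [ \Driver\aswNdisFlt]
DRIVER = re.compile(r"\[\s*(\\[^\]]+)\]\s*$")


def irps_per_driver(listing: str) -> Counter:
    """Count how many IRP lines name each driver."""
    counts = Counter()
    for line in listing.splitlines():
        match = DRIVER.search(line)
        if match:
            counts[match.group(1).strip()] += 1
    return counts


sample_irps = r"""
fffffa801620dd00 [fffffa8016215580] irpStack: ( e, 0)  fffffa800b17ae40 [ \Driver\aswNdisFlt]
fffffa8016210a60 [fffffa8016241060] irpStack: ( e, 0)  fffffa800b17ae40 [ \Driver\aswNdisFlt]
fffffa8016219c60 [fffffa801551d060] irpStack: ( c, 2)  fffffa8014813030 [ \FileSystem\Ntfs]
fffffa80162207b0 [fffffa8016006b50] irpStack: ( e, 0)  fffffa800b044ba0 [ \FileSystem\FltMgr]
fffffa8016237ee0 [fffffa8016247640] irpStack: ( d, 0)  fffffa800b1a3df0 [ \FileSystem\Npfs]
fffffa8016238a00 [fffffa8016248640] irpStack: ( e, 0)  fffffa800b17ae40 [ \Driver\aswNdisFlt]
"""

for driver, count in irps_per_driver(sample_irps).most_common():
    print(driver, count)
```

A high count on its own only says the driver is busy, so I treat it as a pointer for where to look next rather than a verdict.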

I won’t post more output as it would fill the page, but I’m seeing a lot of HID USB and other USB IRPs as well. Logitech drivers, specifically for the keyboard and possibly the mouse, also appear to be causing problems.

Finally, Intel Rapid Storage Technology is being flagged, which isn’t surprising given that this driver is very problematic.

With all this said, it looks like there is a mixture of causes, possibly including bad RAM, which can easily be tested using Memtest86+.