Instruction pointer misalignments

This time I’ll talk about instruction pointer misalignment.
So what is an instruction pointer misalignment?

Well, when code references memory it uses a pointer to (you guessed it) point to a certain memory address. Reading or writing the data stored at that address through the pointer is known as dereferencing.

When a pointer is misaligned it can end up reading data from the wrong address, which can cause severe memory corruption depending on what is stored there. If it is allowed to write, it can completely corrupt the data at that address; the culprit then escapes, some innocent code comes along, tries to use the data and gets blamed by the computer police.
This is why the system bugchecks: to stop such memory corruption before it spreads. Data structures are arranged so that the CPU reads and writes them in chunks of 4 bytes (sometimes larger), which means their offsets are multiples of the word size. This is done to maximise performance by working with the way the CPU accesses memory.
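
To make "a multiple of the word size" concrete, here is a minimal Python sketch. It assumes a 4-byte word as in the text, although on x64 the natural word is 8 bytes. For a power-of-two alignment, an address is aligned exactly when its low bits are all zero:

```python
def is_aligned(address: int, alignment: int = 4) -> bool:
    """Return True if address is a multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # For a power of two, the low bits (alignment - 1) must all be zero.
    return address & (alignment - 1) == 0


for addr in (0x1000, 0x1004, 0x1006):
    print(hex(addr), is_aligned(addr))  # True, True, False
```

The bitmask test is equivalent to `address % alignment == 0`; it only works because the alignment is a power of two.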

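When the memory being referenced isn't a multiple of 4, that's when things go wrong: on processors that enforce alignment, it results in an alignment fault, also known as a bus error. (x86 and x64 processors tolerate misaligned accesses for most instructions, at a performance cost, unless alignment checking has been switched on.)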

This line, taken from the call stack of a crash dump, shows how a location in code is written:
    nt!MmCleanProcessAddressSpace+0xe6

nt is the module, in this case the Windows kernel (ntoskrnl.exe), and the Mm prefix tells us the function belongs to the Memory Manager.
MmCleanProcessAddressSpace is the actual function; as the name suggests, it cleans up a process's address space so the memory can be reused.
The +0xe6 is the offset in bytes from the start of the function. It's like a house number on a street: it tells you exactly where inside the function execution was.
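
The module!function+offset format is regular enough to pull apart mechanically. A minimal Python sketch, assuming symbols always look like the one above (real debugger output has other forms, such as nvlddmkm+0x1bfa55 with no function name, which this sketch rejects):

```python
import re

# Matches "module!function+0xoffset"; the offset part is optional.
SYMBOL_RE = re.compile(
    r"^(?P<module>[^!]+)!(?P<function>[^+]+)(?:\+0x(?P<offset>[0-9a-fA-F]+))?$"
)


def parse_symbol(text: str) -> tuple[str, str, int]:
    """Split a debugger symbol into (module, function, byte offset)."""
    match = SYMBOL_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a module!function+offset symbol: {text!r}")
    offset = int(match.group("offset") or "0", 16)
    return match.group("module"), match.group("function"), offset


print(parse_symbol("nt!MmCleanProcessAddressSpace+0xe6"))
# ('nt', 'MmCleanProcessAddressSpace', 230)
```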

I was actually looking at the differences between a segmentation fault and a bus error, as both involve the CPU being unable to access the memory being referenced.

  • The segmentation fault (or access violation exception) occurs when memory outside the allowed locations is referenced (not to be confused with buffer overruns, which involve writing past the end of allocated memory into another buffer).
  • The bus error occurs when an address that is not aligned is referenced, which, as you know, means it isn't a multiple of 4.
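
To make the distinction concrete, here is a toy model in Python, not how a real CPU or Windows does it: a hypothetical list of mapped address ranges, where an access that falls outside all of them is an access violation, and one that is mapped but not a multiple of the access size is an alignment fault (as on a processor that enforces alignment):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A hypothetical mapped range of addresses [start, end)."""
    start: int
    end: int

    def contains(self, address: int, size: int) -> bool:
        return self.start <= address and address + size <= self.end


def check_access(regions: list[Region], address: int, size: int = 4) -> str:
    """Classify an access in this toy model of a strict-alignment CPU."""
    if not any(r.contains(address, size) for r in regions):
        return "access violation"  # segmentation fault: not mapped/allowed
    if address % size != 0:
        return "alignment fault"    # bus error: mapped, but misaligned
    return "ok"


mapped = [Region(0x1000, 0x2000)]
print(check_access(mapped, 0x1004))  # ok
print(check_access(mapped, 0x1006))  # alignment fault
print(check_access(mapped, 0x3000))  # access violation
```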

Another thing to note: in dump files, where the analysis reports a misaligned IP (instruction pointer), it says the crash was probably caused by hardware. A misaligned instruction pointer points into the middle of an instruction rather than at its first byte. Correctly working code never jumps there, so the debugger suspects the CPU itself rather than a particular driver.
Misaligned IPs aren't always down to hardware, though. A driver writing more data into a buffer on the stack than the buffer can hold (a stack buffer overrun) can overwrite a saved return address and send the instruction pointer somewhere it shouldn't be; this also results in a bugcheck to prevent critical memory corruption.

I hope this has helped you understand the differences between these faults, and a bit more about instruction pointers.

    Hexadecimal and Binary

    This post will be a little different from my usual debugging posts.
    I will be talking about hexadecimal and binary. They can be difficult to fully understand, but we should be able to get through it.

    Now, at school I was never really good at Maths and struggled with a lot of things, but I've picked up a few things through debugging, as Windows internals use these number systems everywhere to perform operations that would be awkward to express in decimal.

    Binary can be difficult to get your head round, but computers use it because it makes things a lot simpler for them.
    Remember at school when you used a T chart to count in tens?
    So “10” would be 1 ten and 0 ones; in binary, “10” means 1 two and 0 ones.
    “100” in binary would be 4 (2×2), “1000” would be 8 (4×2) and so on.
    Binary suits computers because each digit has only two states, on and off, which maps directly onto the electrical states inside the hardware.
    There are also far fewer rules than in decimal, which simplifies things for the computer; for us, though, long strings of 1s and 0s are hard to read, so we rely on tools to translate them into something we can make sense of.
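
The column-by-column counting above can be written as a short Python sketch: moving one column to the left doubles a digit's value, so the running total is doubled before each new digit is added. Python's built-in `int(text, 2)` does the same conversion and is used here as a cross-check:

```python
def binary_to_decimal(digits: str) -> int:
    """Convert a string of binary digits to an int using place values."""
    total = 0
    for digit in digits:
        if digit not in "01":
            raise ValueError(f"not a binary digit: {digit!r}")
        # Moving one column left doubles a digit's value, just as moving
        # one column left in decimal multiplies it by ten.
        total = total * 2 + int(digit)
    return total


for text in ("10", "100", "1000"):
    print(text, "->", binary_to_decimal(text), int(text, 2))  # 2, 4, 8
```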

    Hexadecimal is used because it writes large numbers more compactly, and it converts to and from binary easily: each hexadecimal digit stands for exactly four binary digits. Instead of powers of 10, hexadecimal uses powers of 16, so “10” in hexadecimal would be 16, as it's 1 sixteen and 0 ones.
    “25” in hexadecimal would be 2 sixteens and 5 ones so it would be 37.

    But how does that work when there aren't single-digit symbols for 10 to 15?
    This is why we use the letters A to F for 10 to 15; here's a good conversion chart to help you understand.

    Here we can see how they all convert into each other; obviously, the higher the figures, the harder they become to read.
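
The same column idea works for hexadecimal, with 16 in place of 2. This Python sketch converts a hexadecimal string one digit at a time, and also prints a chart of the same kind (decimal, binary and hexadecimal for 0 to 15), which shows that each hexadecimal digit corresponds to exactly four binary digits:

```python
HEX_DIGITS = "0123456789ABCDEF"


def hex_to_decimal(digits: str) -> int:
    """Convert a hexadecimal string to an int, one column at a time."""
    total = 0
    for digit in digits.upper():
        value = HEX_DIGITS.index(digit)  # A-F stand for 10-15
        total = total * 16 + value       # each column is worth 16x the next
    return total


def conversion_chart(limit: int = 16) -> list[str]:
    """Rows of decimal, 4-bit binary and hexadecimal for 0..limit-1."""
    return [f"{n:>3}  {n:04b}  {n:X}" for n in range(limit)]


print(hex_to_decimal("10"))  # 16
print(hex_to_decimal("25"))  # 37 (2 sixteens + 5 ones)
for row in conversion_chart():
    print(row)
```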

    Hopefully this has helped a lot of you understand if you didn’t already.

    0x7F (memory leak)

    In this post, we will be looking at a memory leak caused by a program called NotMyFault, which is supplied by Sysinternals (they have some excellent tools you should check out if you're interested).
    To download NotMyFault, here's the link:

    http://live.sysinternals.com/Files/NotMyFault.zip

    Let’s take a look.

    BugCheck 7F, {8, 80050033, 406f8, fffff80002e69f2c}

    This bugcheck indicates the kernel encountered a trap that it isn't allowed to catch, which means it cannot be resolved and the system must bugcheck. The first parameter, 8, tells us the trap was a double fault, which cannot be resolved and crashes the system.
    A double fault occurs when an exception takes place while the processor is handling another exception; if a further exception occurs while it is handling the double fault, a triple fault occurs.
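
Reading the first parameter can be sketched in Python. The trap names below are the values documented for bugcheck 0x7F (they match x86 exception vector numbers); only the common ones are listed, and the parser assumes the exact "BugCheck code, {p1, p2, ...}" layout shown above:

```python
import re

# Parameter 1 values documented for bugcheck 0x7F (UNEXPECTED_KERNEL_MODE_TRAP).
# They are x86/x64 exception vector numbers; only the common ones are listed.
TRAP_NAMES = {
    0x0: "divide by zero error",
    0x4: "overflow",
    0x5: "bound check fault",
    0x6: "invalid opcode",
    0x8: "double fault",
}

BUGCHECK_RE = re.compile(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}")


def parse_bugcheck(line: str) -> tuple[int, list[int]]:
    """Split 'BugCheck 7F, {8, ...}' into the code and its parameters."""
    match = BUGCHECK_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised bugcheck line: {line!r}")
    code = int(match.group(1), 16)
    # Values are hexadecimal without a 0x prefix; drop any ` separators.
    params = [int(p.strip().replace("`", ""), 16) for p in match.group(2).split(",")]
    return code, params


code, params = parse_bugcheck("BugCheck 7F, {8, 80050033, 406f8, fffff80002e69f2c}")
if code == 0x7F:
    print(TRAP_NAMES.get(params[0], "other trap"))  # double fault
```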

    Looking at the call stack, this is what we see. Do note this is only a small snippet: the call stack is massive, with the same Nvidia driver function repeated at the same address.

    fffff880`02fddce8 fffff800`02ec7169 : 00000000`0000007f 00000000`00000008 00000000`80050033 00000000`000406f8 : nt!KeBugCheckEx
    fffff880`02fddcf0 fffff800`02ec5632 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiBugCheckDispatch+0x69
    fffff880`02fdde30 fffff800`02e69f2c : fffffa80`035d4000 00000000`00000000 00000000`00000000 fffff800`02ff947c : nt!KiDoubleFaultAbort+0xb2
    fffff880`009ab000 fffff800`02ff947c : 00000000`00000000 fffff880`009ab080 00000000`00000000 00000000`00000000 : nt!MiExpandNonPagedPool+0x14
    fffff880`009ab020 fffff800`02ffbf26 : fffff800`030586c0 00000000`00000003 00000000`00000000 fffff880`049f9c05 : nt!MiAllocatePoolPages+0xdfd
    fffff880`009ab160 fffff880`04a1ea55 : 00000000`00000000 00000000`00000001 fffff880`009ab2b8 fffff880`00000000 : nt!ExAllocatePoolWithTag+0x316
    fffff880`009ab250 fffff880`04a1b6e8 : fffffa80`05b75000 00000000`00000002 00000000`00000002 fffffa80`036a7000 : nvlddmkm+0x1bfa55
    fffff880`009ab280 fffff880`04ae392a : fffff880`009ab318 fffffa80`00000018 fffffa80`036a7000 fffffa80`05b75000 : nvlddmkm+0x1bc6e8
    fffff880`009ab2e0 fffff880`04b9f804 : 00000000`00100005 00000000`00000000 00000000`00100006 fffffa80`05b75000 : nvlddmkm+0x28492a
    fffff880`009ab310 fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340804
    fffff880`009ab350 fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340827
    fffff880`009ab390 fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340827
    fffff880`009ab3d0 fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340827
    fffff880`009ab410 fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340827
    fffff880`009ab450 fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340827
    fffff880`009ab490 fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340827
    fffff880`009ab4d0 fffff880`04b9f827 : 00000000`00100004 00000000`00100006 fffffa80`05b75000 fffffa80`05b75000 : nvlddmkm+0x340827

    What is happening is that the Nvidia driver gets the blame (probably because it was on the stack when the context of the last exception was saved) and is calling lots of functions that appear to be allocating more pages, until a double fault is initiated. I suspect the double fault occurred because memory couldn't be allocated, which caused an exception, and then another exception occurred while that one was being handled.

    So looking at the virtual memory usage we can see the following.

    3: kd> !vm

    *** Virtual Memory Usage ***
        Physical Memory:     1036418 (   4145672 Kb)
        Page File: \??\C:\pagefile.sys
          Current:   4145672 Kb  Free Space:   3702732 Kb
          Minimum:   4145672 Kb  Maximum:     12437016 Kb
        Available Pages:      100902 (    403608 Kb)
        ResAvail Pages:       209219 (    836876 Kb)
        Locked IO Pages:           0 (         0 Kb)
        Free System PTEs:   33504448 ( 134017792 Kb)
        Modified Pages:         4479 (     17916 Kb)
        Modified PF Pages:      4364 (     17456 Kb)
        NonPagedPool Usage:   764909 (   3059636 Kb)
        NonPagedPool Max:     764972 (   3059888 Kb)
        ********** Excessive NonPaged Pool Usage *****

    We can see that non paged pool has been completely depleted (764909 pages used out of a maximum of 764972), which caused the system to crash.
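
The "completely depleted" claim can be checked from the numbers. This Python sketch parses the "Name: pages ( size Kb)" lines of !vm (only the two non paged pool lines, copied from above) and assumes 4 KB pages, which the output itself supports since every Kb figure is the page count times 4:

```python
import re

PAGE_SIZE_KB = 4  # x86/x64 pages are 4 KB; the Kb column in !vm is pages * 4

# Two lines copied from the !vm output above.
VM_OUTPUT = """\
    NonPagedPool Usage:   764909 (   3059636 Kb)
    NonPagedPool Max:     764972 (   3059888 Kb)
"""

LINE_RE = re.compile(r"^\s*([A-Za-z ]+?):\s+(\d+)\s+\(\s*(\d+) Kb\)", re.MULTILINE)


def parse_vm_pages(text: str) -> dict[str, int]:
    """Map each 'Name: pages ( kb Kb)' line of !vm output to its page count."""
    pages = {}
    for match in LINE_RE.finditer(text):
        name, count, kb = match.group(1), int(match.group(2)), int(match.group(3))
        if count * PAGE_SIZE_KB != kb:
            raise ValueError(f"{name}: page and Kb columns disagree")
        pages[name] = count
    return pages


vm = parse_vm_pages(VM_OUTPUT)
used, limit = vm["NonPagedPool Usage"], vm["NonPagedPool Max"]
print(f"non paged pool {used / limit:.3%} full, {(limit - used) * PAGE_SIZE_KB} Kb left")
# non paged pool 99.992% full, 252 Kb left
```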
    Now you might be asking, can’t it just put the memory onto disk to stop it crashing?
    Moving memory from RAM onto disk is known as paging, and it is used to free up RAM when memory usage is high. However, kernel memory is divided into two main pools:

    -Paged Pool
    -Non Paged Pool

    Paged pool holds kernel allocations that can be moved to disk when they're not in use, to free up RAM. Non paged pool, on the other hand, can't be moved to disk under any circumstances, because device drivers and other critical operating system components rely on these allocations being available immediately, whenever they need them.

    So why can’t they just page the memory back from disk when needed?

    Well, it's not that simple. Paging can be very expensive: it takes time and puts a lot of pressure on the drive, which is much slower than RAM.
    Not only that, but the IRQL must be 1 (APC_LEVEL) or below for a page to be brought in from disk; when the IRQL is higher than 1, paging is not allowed. Say, for example, we get an interrupt that needs servicing quickly at an IRQL of 7. Servicing it may need the device driver to perform certain tasks, but it can't, because the code or data it needs is paged out, and we can't page it back in because the IRQL needs to be 1 or below.
    Nor can we just lower the IRQL, because the higher the IRQL, the higher the priority of the work being done. Touching paged-out memory at too high an IRQL results in a 0xA (IRQL_NOT_LESS_OR_EQUAL) or 0xD1 (DRIVER_IRQL_NOT_LESS_OR_EQUAL) bugcheck.
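
The rule described above can be summarised as a toy Python model. It is not kernel code: the IRQL values are the documented PASSIVE_LEVEL (0), APC_LEVEL (1) and DISPATCH_LEVEL (2), device interrupts run above these, and the outcome strings are just labels. Note that touching pageable memory that happens to be resident doesn't fault at all, which is why these bugs can be intermittent:

```python
from enum import IntEnum


class Irql(IntEnum):
    """The lowest IRQLs on Windows; device interrupts run above DISPATCH_LEVEL."""
    PASSIVE_LEVEL = 0
    APC_LEVEL = 1
    DISPATCH_LEVEL = 2


def touch_pageable(irql: int, resident: bool) -> str:
    """Toy model of touching pageable memory at a given IRQL."""
    if resident:
        return "ok"  # no page fault happens, so the IRQL is never checked
    if irql <= Irql.APC_LEVEL:
        return "page fault: page read back in from disk"
    # At DISPATCH_LEVEL or above the page fault can't be serviced.
    return "bugcheck 0xA / 0xD1"


print(touch_pageable(Irql.PASSIVE_LEVEL, resident=False))
print(touch_pageable(7, resident=False))  # e.g. a device interrupt at IRQL 7
print(touch_pageable(7, resident=True))   # works... until the page is paged out
```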

    So why is the memory being leaked and what is it?

    A memory leak occurs when code allocates memory but doesn't free it after it has finished using it. Those pages are no longer in use, but nothing else can allocate them until they are freed.
    If a driver keeps calling ExAllocatePool without freeing what it allocated, it keeps taking memory it isn't using; just because that memory isn't in use doesn't mean anything else can use it.
    So when the last of the non paged pool has been used up, the system can't function any more, because critical components can't allocate the memory they need, and it crashes.
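
A leak is easier to picture with a toy model. In this Python sketch, NonPagedPool is a hypothetical stand-in for the real pool (it just counts pages), the leaky driver allocates a page per request and never frees it, and the well-behaved one frees each page when it is done. MemoryError stands in for the bugcheck:

```python
class NonPagedPool:
    """A hypothetical fixed-size pool that hands out pages until it runs dry."""

    def __init__(self, total_pages: int):
        self.free_pages = total_pages

    def allocate(self, pages: int) -> int:
        if pages > self.free_pages:
            raise MemoryError("pool exhausted")  # the real system would bugcheck
        self.free_pages -= pages
        return pages

    def free(self, pages: int) -> None:
        self.free_pages += pages


def leaky_driver(pool: NonPagedPool, requests: int) -> int:
    """Allocate a page per request and never free it; return requests served."""
    served = 0
    for _ in range(requests):
        pool.allocate(1)   # like calling ExAllocatePoolWithTag...
        served += 1        # ...but never freeing the allocation
    return served


def well_behaved_driver(pool: NonPagedPool, requests: int) -> int:
    """Allocate a page per request and free it once finished with it."""
    served = 0
    for _ in range(requests):
        pages = pool.allocate(1)
        served += 1
        pool.free(pages)   # give the page back when done
    return served


print(well_behaved_driver(NonPagedPool(100), 1000))  # 1000
try:
    leaky_driver(NonPagedPool(100), 1000)
except MemoryError as exc:
    print("leaky driver:", exc)  # leaky driver: pool exhausted
```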

    We can look at the assembly instructions to see what is happening.

    3: kd> .trap fffff880`02fdde30
    NOTE: The trap frame does not contain all registers.
    Some register values may be zeroed or incorrect.
    rax=00000000000bac2c rbx=0000000000000000 rcx=0000000000000001
    rdx=fffff880009ab0b8 rsi=0000000000000000 rdi=0000000000000000
    rip=fffff80002e69f2c rsp=fffff880009ab000 rbp=fffff880009ab080
     r8=ffffffffffffffff  r9=fffffa80035eb5b8 r10=00000000ffffffff
    r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
    r14=0000000000000000 r15=0000000000000000
    iopl=0         nv up ei pl zr na po nc
    nt!MiExpandNonPagedPool+0x14:
    fffff800`02e69f2c 4156            push    r14

    So the crash happened inside a function which, judging by its name, tries to expand non paged pool so that more allocations can be satisfied when the pool is too small.

    3: kd> u nt!MiExpandNonPagedPool+0x14
    nt!MiExpandNonPagedPool+0x14:
    fffff800`02e69f2c 4156            push    r14
    fffff800`02e69f2e 4157            push    r15
    fffff800`02e69f30 4881ecd0000000  sub     rsp,0D0h
    fffff800`02e69f37 488db9ff010000  lea     rdi,[rcx+1FFh]
    fffff800`02e69f3e 4881e700feffff  and     rdi,0FFFFFFFFFFFFFE00h
    fffff800`02e69f45 483bf9          cmp     rdi,rcx
    fffff800`02e69f48 0f82cfbffeff    jb      nt! ?? ::FNODOBFM::`string'+0x1e009 (fffff800`02e55f1d)
    fffff800`02e69f4e 488b0513762100  mov     rax,qword ptr [nt!MiSystemVaTypeCount+0x28 (fffff800`03081568)]
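
The lea/and pair above is a standard trick for rounding a value up to a multiple of a power of two, the same alignment idea as earlier: add 0x1FF, then clear the low 9 bits, giving the next multiple of 0x200 (512). The cmp/jb that follows catches the case where the addition wrapped past 2^64. What rcx holds here isn't shown (possibly a size or page count; that's an assumption), and the trap frame warns its register values may be incorrect, but rcx=1 in the trap frame above would round up to 0x200. A Python sketch that mimics those four instructions:

```python
MASK64 = (1 << 64) - 1  # registers are 64 bits wide, so arithmetic wraps


def round_up_like_disassembly(rcx: int) -> tuple[int, bool]:
    """Mimic: lea rdi,[rcx+1FFh]; and rdi,0FFFFFFFFFFFFFE00h; cmp rdi,rcx; jb ..."""
    rdi = (rcx + 0x1FF) & MASK64   # lea: 64-bit add, wrapping on overflow
    rdi &= 0xFFFFFFFFFFFFFE00      # and: clear the low 9 bits
    wrapped = rdi < rcx            # jb (unsigned below) is taken if it wrapped
    return rdi, wrapped


print(round_up_like_disassembly(1))       # (512, False): 1 rounds up to 0x200
print(round_up_like_disassembly(0x200))   # (512, False): already a multiple
print(round_up_like_disassembly(MASK64))  # (0, True): wrap-around caught
```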

    Here we can see the push instructions, which add data onto the stack. The fault happened on the very first one, push r14, which suggests there was no memory left for it to write to, so the system crashed.

    0x9F DRIVER_POWER_STATE_FAILURE

    First off I’d like to say I’m sorry I’ve not been posting in a while but I’ll try to post a bit more.

    BugCheck 9F, {4, 258, fffffa8007005660, fffff800053e83d0}

    There are several types of 0x9F bugcheck, indicated by the first parameter. The two you'll usually see are 3, which means a device object has held onto an IRP for too long, holding everything else up, and 4, which is the one we're going to look at.
    A 4 indicates that a power transition timed out waiting to synchronise with the PnP manager, because a thread held onto the PnP lock for too long; the third parameter, fffffa8007005660, is that thread.
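
The parameters can be read off directly. A short Python sketch, with the values copied from the bugcheck line above; the meanings of parameters 2 and 3 for sub-type 4 (timeout in seconds, and the thread holding the PnP lock) are as I understand Microsoft's 0x9F documentation:

```python
# Parameters from: BugCheck 9F, {4, 258, fffffa8007005660, fffff800053e83d0}
params = [0x4, 0x258, 0xFFFFFA8007005660, 0xFFFFF800053E83D0]

# Parameter 1 selects the sub-type; only the two discussed here are listed.
SUBTYPES = {
    0x3: "a device object has been blocking an IRP for too long",
    0x4: "a power transition timed out waiting to synchronise with PnP",
}

subtype, timeout_seconds, thread = params[0], params[1], params[2]
print(SUBTYPES[subtype])
# For sub-type 4, parameter 2 is the timeout in seconds: 0x258 = 600.
print(f"timeout: {timeout_seconds} s = {timeout_seconds / 60:g} min")  # 600 s = 10 min
# Parameter 3 is the thread holding the PnP lock.
print(f"thread to inspect: {thread:x}")  # fffffa8007005660
```

The 10 minutes matches the Ticks value (0:00:10:00.026) in the !thread output below, and the thread address matches the one holding both locks in !locks.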

    In more detail, this bugcheck indicates a power transition failed to synchronise with the PnP manager. A power IRP is an I/O Request Packet that carries a power transition down a device stack.
    All power IRPs must reach the PDO (Physical Device Object) at the bottom of the stack to ensure power transitions are done correctly.
    When one can't get to the bottom for any reason, the system bugchecks; in this case a thread was blocking progress, so the transition couldn't be completed within the allowed time.

    So let's look at the locks on the system that are blocking the IRP.
    To understand what this means, we need to know what a lock is (here, an ERESOURCE structure). Locks are synchronisation mechanisms that let drivers share access to resources safely.
    An ERESOURCE can be acquired in two ways: exclusively, where a single thread owns it, or shared, where multiple threads can hold it at the same time.
    This gives a read/write mechanism where only one thread can write, but multiple threads can read simultaneously.
    Acquiring a lock exclusively requires that no other thread is currently holding it; a thread that wants a lock that isn't available is put into a wait state until it is.
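
ERESOURCE itself is a kernel structure used through routines such as ExAcquireResourceExclusiveLite, which can't be shown runnably here, so this is a Python analogue of the same shared/exclusive idea, built on threading.Condition. It's a sketch, not how ERESOURCE is implemented (for one thing, it does nothing to stop a steady stream of readers starving a waiting writer):

```python
import threading


class SharedExclusiveLock:
    """Toy shared/exclusive (reader/writer) lock, loosely like an ERESOURCE."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._shared_count = 0   # threads holding the lock shared
        self._exclusive = False  # True while one thread holds it exclusively

    def acquire_shared(self) -> None:
        with self._cond:
            # Readers only have to wait while someone holds it exclusively.
            while self._exclusive:
                self._cond.wait()
            self._shared_count += 1

    def release_shared(self) -> None:
        with self._cond:
            self._shared_count -= 1
            if self._shared_count == 0:
                self._cond.notify_all()  # a waiting writer may proceed now

    def acquire_exclusive(self) -> None:
        with self._cond:
            # A writer waits (like a thread in a wait state) until nobody
            # holds the lock in either mode.
            while self._exclusive or self._shared_count:
                self._cond.wait()
            self._exclusive = True

    def release_exclusive(self) -> None:
        with self._cond:
            self._exclusive = False
            self._cond.notify_all()


lock = SharedExclusiveLock()
lock.acquire_shared()
lock.acquire_shared()  # two readers at once is fine
waiting = threading.Thread(target=lock.acquire_exclusive, daemon=True)
waiting.start()
waiting.join(timeout=0.2)
print("writer still waiting:", waiting.is_alive())  # True
lock.release_shared()
lock.release_shared()
waiting.join(timeout=2)
print("writer got the lock:", not waiting.is_alive())  # True
```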

    This was only a brief explanation; for more information, check out this article:

    http://msdn.microsoft.com/en-us/library/ff548046.aspx

    Back to the topic, looking at the locks.

    0: kd> !locks
    **** DUMP OF ALL RESOURCE OBJECTS ****
    KD: Scanning for held locks..

    Resource @ nt!IopDeviceTreeLock (0xfffff80003492ce0)    Shared 1 owning threads
        Contention Count = 1
         Threads: fffffa8007005660-01
    KD: Scanning for held locks.

    Resource @ nt!PiEngineLock (0xfffff80003492be0)    Exclusively owned
        Contention Count = 21
        NumberOfExclusiveWaiters = 1
         Threads: fffffa8007005660-01
         Threads Waiting On Exclusive Access:
                  fffffa800f308b50      

    KD: Scanning for held locks...
    18855 total locks, 2 locks currently held

    Let's look at the thread that owns the lock exclusively.

    0: kd> !thread fffffa8007005660
    THREAD fffffa8007005660  Cid 0004.0048  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Non-Alertable
        fffffa800d035ee8  NotificationEvent
    IRP List:
        fffffa8008f5cc10: (0006,03e8) Flags: 00000000  Mdl: 00000000
    Not impersonating
    DeviceMap                 fffff8a000008c10
    Owning Process            fffffa8006f8d890       Image:         System
    Attached Process          N/A            Image:         N/A
    Wait Start TickCount      396427         Ticks: 38463 (0:00:10:00.026)
    Context Switch Count      44059          IdealProcessor: 2  NoStackSwap
    UserTime                  00:00:00.000
    KernelTime                00:00:00.343
    Win32 Start Address nt!ExpWorkerThread (0xfffff80003298150)
    Stack Init fffff88003bd2c70 Current fffff88003bd2280
    Base fffff88003bd3000 Limit fffff88003bcd000 Call 0
    Priority 15 BasePriority 12 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5
    Child-SP          RetAddr           : Args to Child                                                           : Call Site
    fffff880`03bd22c0 fffff800`032845f2 : fffffa80`07005660 fffffa80`07005660 00000000`00000000 00000000`00000000 : nt!KiSwapContext+0x7a
    fffff880`03bd2400 fffff800`0329599f : fffffa80`0d0df208 fffff880`0ae9e10b fffffa80`00000000 00000000`00000000 : nt!KiCommitThreadWait+0x1d2
    fffff880`03bd2490 fffff880`0ae915dd : fffffa80`0d035000 00000000`00000000 fffffa80`0dd8ca00 00000000`00000000 : nt!KeWaitForSingleObject+0x19f
    fffff880`03bd2530 fffff880`0ae92627 : fffffa80`0d035000 00000000`00000000 fffffa80`0c0891a0 fffff880`03bd2670 : ZTEusbnet+0x35dd
    fffff880`03bd2580 fffff880`0215d809 : fffffa80`0c0891a0 fffff880`020f0ecd fffff880`03bd2670 fffffa80`091c5550 : ZTEusbnet+0x4627
    fffff880`03bd25b0 fffff880`0215d7d0 : fffffa80`091c54a0 fffffa80`0c0891a0 fffff880`03bd2670 fffffa80`08fc2ac0 : ndis!NdisFDevicePnPEventNotify+0x89
    fffff880`03bd25e0 fffff880`0215d7d0 : fffffa80`08fc2a10 fffffa80`0c0891a0 fffffa80`091f9010 fffffa80`091f90c0 : ndis!NdisFDevicePnPEventNotify+0x50
    fffff880`03bd2610 fffff880`0219070c : fffffa80`0c0891a0 00000000`00000000 00000000`00000000 fffffa80`0c0891a0 : ndis!NdisFDevicePnPEventNotify+0x50
    fffff880`03bd2640 fffff880`021a1da2 : 00000000`00000000 fffffa80`08f5cc10 00000000`00000000 fffffa80`0c0891a0 : ndis! ?? ::LNCPHCLB::`string’+0xddf
    fffff880`03bd26f0 fffff800`034fb121 : fffffa80`091c7060 fffffa80`0c089050 fffff880`03bd2848 fffffa80`070bfa00 : ndis!ndisPnPDispatch+0x843
    fffff880`03bd2790 fffff800`0367b3a1 : fffffa80`070bfa00 00000000`00000000 fffffa80`0dc19990 fffff880`03bd2828 : nt!IopSynchronousCall+0xe1
    fffff880`03bd2800 fffff800`03675d78 : fffffa80`09196e00 fffffa80`070bfa00 00000000`0000030a 00000000`00000308 : nt!IopRemoveDevice+0x101
    fffff880`03bd28c0 fffff800`0367aee7 : fffffa80`0dc19990 00000000`00000000 00000000`00000003 00000000`00000136 : nt!PnpSurpriseRemoveLockedDeviceNode+0x128
    fffff880`03bd2900 fffff800`0367b000 : 00000000`00000000 fffff8a0`11d1c000 fffff8a0`049330d0 fffff880`03bd2a58 : nt!PnpDeleteLockedDeviceNode+0x37
    fffff880`03bd2930 fffff800`0370b97f : 00000000`00000002 00000000`00000000 fffffa80`09122010 00000000`00000000 : nt!PnpDeleteLockedDeviceNodes+0xa0
    fffff880`03bd29a0 fffff800`0370c53c : fffff880`03bd2b78 fffffa80`114ab700 fffffa80`07005600 fffffa80`00000000 : nt!PnpProcessQueryRemoveAndEject+0x6cf
    fffff880`03bd2ae0 fffff800`035f573e : 00000000`00000000 fffffa80`114ab7d0 fffff8a0`123a25b0 00000000`00000000 : nt!PnpProcessTargetDeviceEvent+0x4c
    fffff880`03bd2b10 fffff800`03298261 : fffff800`034f9f88 fffff8a0`11d1c010 fffff800`034342d8 fffff800`034342d8 : nt! ?? ::NNGAKEGL::`string'+0x54d9b
    fffff880`03bd2b70 fffff800`0352b2ea : 00000000`00000000 fffffa80`07005660 00000000`00000080 fffffa80`06f8d890 : nt!ExpWorkerThread+0x111
    fffff880`03bd2c00 fffff800`0327f8e6 : fffff880`03965180 fffffa80`07005660 fffff880`0396ffc0 00000000`00000000 : nt!PspSystemThreadStartup+0x5a
    fffff880`03bd2c40 00000000`00000000 : fffff880`03bd3000 fffff880`03bcd000 fffff880`03bd2410 00000000`00000000 : nt!KxStartSystemThread+0x16

    Briefly: looking at the call stack, we can see ndis functions notifying the ZTEusbnet network driver about a PnP event (further down we can see nt!PnpSurpriseRemoveLockedDeviceNode and nt!IopRemoveDevice), and ZTEusbnet is stuck waiting in KeWaitForSingleObject. This looks like the IRP being sent down the stack but blocked by the network driver, so it can't get to the bottom of the stack. I believe the bottom in this case would be pci.sys, but I'm not too sure, given it's a USB network card and not a PCI card.

    So let’s look at the IRP.

    0: kd> !irp fffffa8008f5cc10 7
    Irp is active with 10 stacks 10 is current (= 0xfffffa8008f5cf68)
     No Mdl: No System Buffer: Thread fffffa8007005660:  Irp stack trace. 
    Flags = 00000000
    ThreadListEntry.Flink = fffffa8007005a50
    ThreadListEntry.Blink = fffffa8007005a50
    IoStatus.Status = c00000bb
    IoStatus.Information = 00000000
    RequestorMode = 00000000
    Cancel = 00
    CancelIrql = 0
    ApcEnvironment = 00
    UserIosb = fffff88003bd27c0
    UserEvent = fffff88003bd27d0
    Overlay.AsynchronousParameters.UserApcRoutine = 00000000
    Overlay.AsynchronousParameters.UserApcContext = 00000000
    Overlay.AllocationSize = 00000000 - 00000000
    CancelRoutine = 00000000  
    UserBuffer = 00000000
    &Tail.Overlay.DeviceQueueEntry = fffffa8008f5cc88
    Tail.Overlay.Thread = fffffa8007005660
    Tail.Overlay.AuxiliaryBuffer = 00000000
    Tail.Overlay.ListEntry.Flink = 00000000
    Tail.Overlay.ListEntry.Blink = 00000000
    Tail.Overlay.CurrentStackLocation = fffffa8008f5cf68
    Tail.Overlay.OriginalFileObject = 00000000
    Tail.Apc = 00000000
    Tail.CompletionKey = 00000000
         cmd  flg cl Device   File     Completion-Context
     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

                Args: 00000000 00000000 00000000 00000000
     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

                Args: 00000000 00000000 00000000 00000000
     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

                Args: 00000000 00000000 00000000 00000000
     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

                Args: 00000000 00000000 00000000 00000000
     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

                Args: 00000000 00000000 00000000 00000000
     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

                Args: 00000000 00000000 00000000 00000000
     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

                Args: 00000000 00000000 00000000 00000000
     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

                Args: 00000000 00000000 00000000 00000000
     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

                Args: 00000000 00000000 00000000 00000000
    >[ 1b,17]   0  0 fffffa800c089050 00000000 00000000-00000000   
               \Driver\ZTEusbnet

                Args: 00000000 00000000 00000000 00000000
    IO verifier information:
    No information available - the verifier is probably disabled

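    So here we can see the IRP reached ZTEusbnet but stopped there, so I think this driver is to blame. The current stack location is [1b,17], which is IRP_MJ_PNP with IRP_MN_SURPRISE_REMOVAL, so the stuck IRP is actually a PnP surprise-removal request; the power transition timed out because it was waiting for the PnP lock held by the thread processing it.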
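
The [major,minor] pair in the current stack location is what identifies the request. A Python sketch that decodes it, with a handful of code values from wdm.h (IRP_MJ_POWER is 0x16, IRP_MJ_PNP is 0x1B, IRP_MN_SURPRISE_REMOVAL is 0x17); the tables are deliberately incomplete:

```python
import re

# A few major function codes from wdm.h (not a complete list).
MAJOR = {0x16: "IRP_MJ_POWER", 0x1B: "IRP_MJ_PNP"}
# A few IRP_MJ_PNP minor codes (not a complete list).
PNP_MINOR = {
    0x00: "IRP_MN_START_DEVICE",
    0x02: "IRP_MN_REMOVE_DEVICE",
    0x17: "IRP_MN_SURPRISE_REMOVAL",
}


def decode_stack_location(line: str) -> str:
    """Decode the '[ major,minor]' field of an !irp stack location line."""
    match = re.search(r"\[\s*([0-9a-fA-F]+),\s*([0-9a-fA-F]+)\]", line)
    if match is None:
        raise ValueError(f"no [major,minor] field in {line!r}")
    major, minor = int(match.group(1), 16), int(match.group(2), 16)
    major_name = MAJOR.get(major, f"major 0x{major:x}")
    if major == 0x1B:
        return f"{major_name} / {PNP_MINOR.get(minor, f'minor 0x{minor:x}')}"
    return f"{major_name} / minor 0x{minor:x}"


print(decode_stack_location(">[ 1b,17]   0  0 fffffa800c089050 00000000 00000000-00000000"))
# IRP_MJ_PNP / IRP_MN_SURPRISE_REMOVAL
```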
    One last thing, let’s look at the device object.

    0: kd> !devobj fffffa800c089050
    Device object (fffffa800c089050) is for:
     NDMP14 \Driver\ZTEusbnet DriverObject fffffa800deeae70
    Current Irp 00000000 RefCount 0 Type 00000017 Flags 00002050
    Dacl fffff9a10009b881 DevExt fffffa800c0891a0 DevObjExt fffffa800c08a8c0
    ExtensionFlags (0x00000800)  DOE_DEFAULT_SD_PRESENT
    Characteristics (0x00000100)  FILE_DEVICE_SECURE_OPEN
    AttachedTo (Lower) fffffa80070bfa00 \Driver\usbccgp
    Device queue is not busy.

    So we can see the network driver is the upper layer and usbccgp is the lower layer; usbccgp is the USB generic parent driver, which acts as a bus driver for composite USB devices.
    I believe the fix would be to update the driver, although I've had no reply from the OP since, so I can't confirm it.

    I checked the network driver's timestamp and it's very outdated (October 2008), which is probably why it was causing such issues.

    0: kd> lm vm ZTEusbnet
    start             end                 module name
    fffff880`0ae8e000 fffff880`0aebc000   ZTEusbnet   (no symbols)          
        Loaded symbol image file: ZTEusbnet.sys
        Image path: \SystemRoot\system32\DRIVERS\ZTEusbnet.sys
        Image name: ZTEusbnet.sys
        Timestamp:        Mon Oct 13 06:50:10 2008 (48F2E192)
        CheckSum:         000329ED
        ImageSize:        0002E000
        Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
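
The Timestamp field is the TimeDateStamp from the driver's PE header: seconds since 1 January 1970 UTC, shown in hexadecimal in brackets. A Python sketch converting it; the debugger presumably prints the date in the local time zone of the machine doing the debugging, which would explain why the hour below differs by one from the output above. Treat timestamps as a rough guide, since some build tools don't store a real date there:

```python
from datetime import datetime, timezone


def pe_timestamp_to_utc(hex_stamp: str) -> datetime:
    """Convert a PE TimeDateStamp (seconds since 1970-01-01 UTC) to a datetime."""
    return datetime.fromtimestamp(int(hex_stamp, 16), tz=timezone.utc)


print(pe_timestamp_to_utc("48F2E192"))  # 2008-10-13 05:50:10+00:00
```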

    Hope you enjoyed reading.

    0xF4 debugging



    This will be a short post, seeing as I haven't managed to get hold of a kernel memory dump that involves memory leakage. I have seen it happen, but I think that dump was from a while ago and I deleted it; I can't remember exactly.

    BugCheck F4, {3, fffffa800bb94b30, fffffa800bb94e10, fffff800035e3270}

    This bugcheck indicates that a process critical to the system's operation has terminated for some reason, so the system has to crash.

    The first parameter, 3, indicates that a process (rather than a thread) has terminated, and the second parameter is the process object, so we can use the !process command on it.

    2: kd> !process fffffa800bb94b30

    GetPointerFromAddress: unable to read from fffff80003515000
    PROCESS fffffa800bb94b30
        SessionId: none  Cid: 0174    Peb: 7fffffda000  ParentCid: 0154
        DirBase: 321389000  ObjectTable: fffff8a00b4f9840  HandleCount:
        Image: csrss.exe

    The process that terminated is csrss.exe (Client/Server Runtime Subsystem), which implements the Windows subsystem. Although Windows was designed to support multiple subsystems, calling out to each subsystem for functions such as display I/O would mean duplicating those functions, which would inevitably reduce performance, so the designers put a lot of basic functions into this primary subsystem. As a result, the Windows subsystem (implemented in csrss.exe) is marked as a critical process, even on servers where display I/O isn't needed, so if it exits for any reason the system must bugcheck.

    This bugcheck is mainly caused by disk I/O errors, so what is a disk I/O error?

    When the drive cannot perform basic operations such as reads and writes, Windows cannot carry out basic routines (such as reading a critical process's code back in from disk), so the system fails and crashes. This is usually the sign of a failing disk.

    EXCEPTION_CODE: (NTSTATUS) 0xc0000006 - The instruction at 0x%p referenced memory at 0x%p. The required data was not placed into memory because of an I/O error status of 0x%x.

    X64_0xF4_IOERR_IMAGE_csrss.exe

    Secondly, severe memory leakage can cause this problem, as it can drain the system's resources (normally non paged pool) until the system cannot function and crashes.

    It's caused by programs not freeing their pages of memory after they've finished using them, so the pages are no longer in use by the program but can't be used by anything else, because they haven't been freed.

    To determine whether or not you have a memory leak you can use different programs; the Pool Monitor (Poolmon) is one of them.

    It lists pool allocations by pool tag, and you can sort them, for example to see which tags use the most paged or nonpaged pool.

    For more information on the Pool Monitor look here:

    How to use Memory Pool Monitor (Poolmon.exe) to troubleshoot kernel mode memory leaks

    Another way, and my personal favourite, is the kernel debugger; you will need kernel memory dumps to find pool leaks.

    You can start by using the !poolused 2 command.

    I'll show an example, as I found a 0xF4 kernel dump file, although this crash wasn't caused by a memory leak.

    (This was due to a disk I/O failure.)

    EXCEPTION_CODE: (NTSTATUS) 0xc0000006 - The instruction at 0x%p referenced memory at 0x%p. The required data was not placed into memory because of an I/O error status of 0x%x.

    Using the !vm command we can look at all the Virtual Memory being used at the time of the crash.

    1: kd> !vm

    *** Virtual Memory Usage ***
        Physical Memory:     4175860 (  16703440 Kb)
        Page File: \??\C:\pagefile.sys
          Current:  16703440 Kb  Free Space:  16703436 Kb
          Minimum:  16703440 Kb  Maximum:     50110320 Kb
        Available Pages:     3833632 (  15334528 Kb)
        ResAvail Pages:      4052929 (  16211716 Kb)
        Locked IO Pages:           0 (         0 Kb)
        Free System PTEs:   33497223 ( 133988892 Kb)
        Modified Pages:       137120 (    548480 Kb)
        Modified PF Pages:     37803 (    151212 Kb)
        NonPagedPool Usage:    13122 (     52488 Kb)
        NonPagedPool Max:    3116636 (  12466544 Kb)
        PagedPool 0 Usage:     27172 (    108688 Kb)
        PagedPool 1 Usage:      5597 (     22388 Kb)
        PagedPool 2 Usage:         0 (         0 Kb)
        PagedPool 3 Usage:         0 (         0 Kb)
        PagedPool 4 Usage:        56 (       224 Kb)
        PagedPool Usage:       32825 (    131300 Kb)
        PagedPool Maximum:  33554432 ( 134217728 Kb)

    Although we can see the paged pool usage, paged pool isn't normally the cause of crashes due to memory leakage, as it can be paged out to disk; it's non paged pool leaks caused by device drivers that cause these issues. Here non paged pool usage is only 13122 pages out of a maximum of 3116636, well under 1%.
    Let's look at which pool tags are using the non paged pool.
    Do note the list is very long and it is ordered by memory usage, so only the top few lines are of use.

    The 2 flag sorts the output by non paged pool usage; 4 would sort it by paged pool usage.

    1: kd> !poolused 2

    ….

    Sorting by NonPaged Pool Consumed

                   NonPaged                  Paged
     Tag     Allocs         Used     Allocs         Used

     VfPT         1      8388608          0            0    Verifier Allocate/Free Pool stack traces , Binary: nt!Vf
     XENO        30      2955056          0            0    UNKNOWN pooltag 'XENO', please update pooltag.txt
     Obtd         1      2625536          0            0    UNKNOWN pooltag 'Obtd', please update pooltag.txt
     NVRM      3064      2461228          1       528384    UNKNOWN pooltag 'NVRM', please update pooltag.txt
     4KBS       564      2319168          0            0    UNKNOWN pooltag '4KBS', please update pooltag.txt

    The only pool usage that sticks out here is VfPT, which belongs to Driver Verifier: it records stack traces of pool allocations and frees for the drivers it is monitoring.

    We can confirm Driver Verifier is running with the !verifier command.

    1: kd> !verifier

    Verify Flags Level 0x00000dbb

      STANDARD FLAGS:
        [X] (0x00000000) Automatic Checks
        [X] (0x00000001) Special pool
        [X] (0x00000002) Force IRQL checking
        [X] (0x00000008) Pool tracking
        [X] (0x00000010) I/O verification
        [X] (0x00000020) Deadlock detection
        [X] (0x00000080) DMA checking
        [X] (0x00000100) Security checks
        [X] (0x00000800) Miscellaneous checks

      ADDITIONAL FLAGS:
        [ ] (0x00000004) Randomized low resources simulation
        [ ] (0x00000200) Force pending I/O requests
        [X] (0x00000400) IRP logging

        [X] Indicates flag is enabled

    We can see every option is enabled apart from Force pending I/O requests and Randomized low resources simulation. Those two create unrealistic environments that can make drivers crash when in reality they might not crash at all, producing false positives.
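
The Verify Flags Level is just those bits added together. A Python sketch that decodes it, using only the flag values listed in the !verifier output above (Automatic Checks, 0x00000000, is always on and has no bit, so it's left out):

```python
# Flag values exactly as listed in the !verifier output above.
VERIFIER_FLAGS = {
    0x00000001: "Special pool",
    0x00000002: "Force IRQL checking",
    0x00000004: "Randomized low resources simulation",
    0x00000008: "Pool tracking",
    0x00000010: "I/O verification",
    0x00000020: "Deadlock detection",
    0x00000080: "DMA checking",
    0x00000100: "Security checks",
    0x00000200: "Force pending I/O requests",
    0x00000400: "IRP logging",
    0x00000800: "Miscellaneous checks",
}


def decode_verifier_level(level: int) -> tuple[list[str], list[str]]:
    """Split the known flags into (enabled, disabled) for a Verify Flags Level."""
    enabled = [name for bit, name in VERIFIER_FLAGS.items() if level & bit]
    disabled = [name for bit, name in VERIFIER_FLAGS.items() if not level & bit]
    return enabled, disabled


enabled, disabled = decode_verifier_level(0x00000DBB)
print("disabled:", disabled)
# disabled: ['Randomized low resources simulation', 'Force pending I/O requests']
```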

    For more information on Driver Verifier options look here:

    Driver Verifier Options (Windows Drivers)

    Lastly, if pool usage is too high and causing system crashes, we can take a look at the IRPs in use, as IRPs are themselves allocated from non paged pool and a driver can keep issuing them, using up memory.

    We can do this with the !irpfind command. Unfortunately, the IRPs don't appear to be saved in this dump file for some reason, which I've never seen before.

    A great example of this bugcheck can be found here.

    This was originally recommended by my friend Vir Gnarus.

    As for this crash, I will see what happens and whether changing the disk drive solves the issue.

    I forgot to mention the last way of tracking pool usage: Driver Verifier's Pool Tracking option monitors the selected drivers and checks whether they have freed all their allocations when the driver unloads. If a driver hasn't, the system crashes with a 0xC4 bugcheck, hopefully catching the culprit.

    This is not to be confused with Special Pool, where Driver Verifier allocates a driver's memory from a special pool that can be monitored for incorrect usage. For example, if a driver tries to access memory that has already been freed, the system crashes straight away, because freed special pool memory is made inaccessible rather than being handed out again immediately.

    Without it, if a driver allocates 100 bytes but writes 110 bytes, it writes into the header of the next allocation, which may belong to another driver. The crash then blames a different driver long after the culprit has left the scene, so when the police come and investigate the crime scene, the wrong person is locked up.

    Special Pool changes how things are set up: when a driver allocates memory, the buffer is placed next to a guard page, with slop bytes filling the unused space around it. If the driver writes past its allocation into the guard page, the system bugchecks immediately. When the memory is freed, the slop bytes are checked to see whether they've been overwritten; if they have, the system bugchecks and blames that driver.
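
To see why a small overrun is caught at free time while a bigger one is caught immediately, here's a toy Python model of one special pool allocation: the buffer sits at the end of a 4096-byte page, the next page is treated as the guard page, and the rest of the page is filled with slop bytes. The fill byte (0xAA) and the 16-byte alignment are assumptions for illustration, not the real values:

```python
PAGE_SIZE = 4096
SLOP_FILL = 0xAA   # hypothetical fill byte; the real pattern may differ
ALIGNMENT = 16     # assumed allocation alignment


class GuardPageFault(Exception):
    """Stands in for the immediate bugcheck when the guard page is touched."""


class SlopCorruption(Exception):
    """Stands in for the bugcheck raised at free time if slop bytes changed."""


class SpecialPoolAllocation:
    """Toy model: one buffer placed at the end of a page, then a guard page."""

    def __init__(self, size: int):
        if not 0 < size <= PAGE_SIZE:
            raise ValueError("size must fit in one page")
        self.size = size
        # Round up so the buffer starts on an aligned address; any gap
        # between the buffer's end and the page's end is slop.
        rounded = (size + ALIGNMENT - 1) & ~(ALIGNMENT - 1)
        self.start = PAGE_SIZE - rounded
        self.page = bytearray([SLOP_FILL]) * PAGE_SIZE

    def write(self, offset: int, data: bytes) -> None:
        if offset < 0:
            raise ValueError("this toy model only checks overruns")
        begin = self.start + offset
        end = begin + len(data)
        if end > PAGE_SIZE:
            # The next page is the no-access guard page: fault straight away.
            raise GuardPageFault(f"overrun of {end - (self.start + self.size)} bytes")
        self.page[begin:end] = data

    def free(self) -> None:
        before = self.page[:self.start]
        after = self.page[self.start + self.size:]
        # Overwritten slop means the driver wrote outside its allocation.
        if any(b != SLOP_FILL for b in before + after):
            raise SlopCorruption("slop bytes overwritten")


ok = SpecialPoolAllocation(100)
ok.write(0, b"x" * 100)
ok.free()  # fine: stayed inside the 100 bytes

small_overrun = SpecialPoolAllocation(100)
small_overrun.write(0, b"x" * 110)  # lands in slop, not yet the guard page
try:
    small_overrun.free()
except SlopCorruption as exc:
    print("caught at free:", exc)

try:
    SpecialPoolAllocation(100).write(0, b"x" * 120)
except GuardPageFault as exc:
    print("caught immediately:", exc)  # overrun of 20 bytes
```

In this model, the 110-byte write from the example above only reaches the slop left over by rounding 100 bytes up to 112, so it is caught when the memory is freed; the 120-byte write crosses into the guard page and is caught at once.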

    For more information on Special Pool visit the previous link.