0x133 DPC_WATCHDOG_VIOLATION

I’ve not posted in a while but I found an interesting case on a forum and managed to acquire a Kernel memory dump.
I’m not going into detail about DPCs or interrupts as I have made blog posts on these in the past.

DPC_WATCHDOG_VIOLATION (133)
The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL
or above.
Arguments:
Arg1: 0000000000000000, A single DPC or ISR exceeded its time allotment. The offending
    component can usually be identified with a stack trace.
Arg2: 0000000000000501, The DPC time count (in ticks).
Arg3: 0000000000000500, The DPC time allotment (in ticks).
Arg4: 0000000000000000

So here it states that we encountered a DPC which exceeded the allocated time for it to finish executing. The problem is that it went over this time, and as stated before DPCs can hold up the system when taking too long to execute which can result in lagging, a slow system or even sound cutting out.

So lets look at our stack trace.

ffffd001`50c93c98 fffff800`9238bcc2 : 00000000`00000133 00000000`00000000 00000000`00000501 00000000`00000500 : nt!KeBugCheckEx
ffffd001`50c93ca0 fffff800`92271115 : 00000000`00000000 00000000`00000000 00000000`00000000 fffff801`dceabf17 : nt! ?? ::FNODOBFM::`string’+0x18b12
ffffd001`50c93d30 fffff800`929a07b5 : ffffe001`00400a02 fffff800`922fcae6 fffff801`daed3cf8 ffffe001`00008201 : nt!KeClockInterruptNotify+0x95
ffffd001`50c93f40 fffff800`922e80e3 : ffffd001`50c93f60 00000000`00000008 ffff5377`5487cf7d 00000000`0000000c : hal!HalpTimerClockIpiRoutine+0x15
ffffd001`50c93f70 fffff800`9236412a : ffffe001`9c600500 ffffe001`9e8de1a0 00000000`00000000 00000000`00000000 : nt!KiCallInterruptServiceRoutine+0xa3
ffffd001`50c93fb0 fffff800`92364a9b : 44454c49`4146203a 696c6564`206f7420 6e657665`20726576 20212121`20352074 : nt!KiInterruptSubDispatchNoLockNoEtw+0xea
ffffd001`50c853a0 fffff800`922e8383 : ffffe001`9e92d030 ffffe001`9e968030 00000000`02290a8d 00000000`00000018 : nt!KiInterruptDispatchNoLockNoEtw+0xfb
ffffd001`50c85530 fffff801`dcfa5751 : ffffe001`9e66e7a0 ffffe001`00000000 ffffe001`9e92fbe0 00000000`fffff850 : nt!KeAcquireSpinLockRaiseToDpc+0x13
ffffd001`50c85560 fffff801`dcfa531d : ffffe001`9e96b840 fffff801`dcf2c48f ffffe001`9eb68490 fffff801`dcf2c550 : athwbx+0x161751
ffffd001`50c855f0 fffff801`dcf60c42 : ffffe001`9e96b840 ffffd001`50c85650 ffffd001`50c85654 00000000`00000000 : athwbx+0x16131d
ffffd001`50c85630 fffff801`dcf33472 : ffffe001`9e9bf030 fffff801`00000000 ffffd001`50c856d0 fffff801`dd074319 : athwbx+0x11cc42
ffffd001`50c85680 fffff801`dd0c129f : ffffe001`9e9bf030 ffffffff`ffffffff ffffe001`9e6d97e8 fffff801`dd011189 : athwbx+0xef472
ffffd001`50c856f0 fffff801`dd08679e : ffffe001`9e968030 00000000`00000000 00000000`00000000 00000000`00000000 : athwbx+0x27d29f
ffffd001`50c85720 fffff801`dae9e81e : ffffe001`9e961030 00000000`00000000 ffffd001`50c85790 00000000`00000000 : athwbx+0x24279e
ffffd001`50c85760 fffff800`92252130 : ffffd001`50c85b00 00000000`00000000 00000000`00000200 fffff800`92274ae0 : ndis!ndisInterruptDpc+0x269ce
ffffd001`50c85860 fffff800`9225134b : ffffd001`50c5c180 ffffe001`9e8f4010 ffffe001`9c46b900 ffffe001`a12f3080 : nt!KiExecuteAllDpcs+0x1b0
ffffd001`50c859b0 fffff800`923667ea : ffffd001`50c5c180 ffffd001`50c5c180 ffffd001`50c682c0 ffffe001`9dbbb540 : nt!KiRetireDpcList+0xdb
ffffd001`50c85c60 00000000`00000000 : ffffd001`50c86000 ffffd001`50c80000 00000000`00000000 00000000`00000000 : nt!KiIdleLoop+0x5a

So in this callstack we see our processor in an idle loop, when idle it tends to execute any DPCs if there are any waiting in the DPC queue.
It begins to execute all the DPCs in the queue (also known as draining) when get execute an [B]ndis dpc interrupt[/B], this begins to call network functions and then acquire a spinlock and raise to DPC/Dispatch IRQL level if it hasn’t already (this is the standard routine that is used, I can’t remember if it is required), we then recieve more interrupts followed by a clock interrupt and a bugcheck.

Okay so we know that we bugchecked because a DPC was taking too long to finish executing and risk holding up the system, especially where spinlocks are concerned.

The main thing that interests me is why is there a clock interrupt?

3: kd> !dpcs
CPU Type      KDPC       Function
 3: Normal  : 0xffffe0019e66e880 0xfffff801dae78eb0 ndis!ndisMTimerObjectDpc
 3: Normal  : 0xffffd00150c61668 0xfffff80092327b28 nt!PpmPerfAction
 3: Normal  : 0xffffd0015589a280 0xfffff80092258854 nt!PopExecuteProcessorCallback
 3: Threaded: 0xffffd00150c617c0 0xfffff8009231a0a0 nt!KiDpcWatchdog

I believe the ndis dpc interrupt is related to this timer object but I may be wrong, if it is related then the clock interrupt makes sense as the system requires intervals for clock interrupts to take place in order to keep track of system time and logical run time for threads and timers. Processes can modify the clock interrupt interval for their needs to process timers much quicker, I’ll not go into detail as I will talk about timers another time.

The only problem is that I ran into a dead end, I couldn’t find anything related to the network driver in terms of modifying the clock interrupt timer.

3: kd> !list “-e -x \”dt nt!_EPROCESS @$extret-@@(#FIELD_OFFSET(nt!_EPROCESS, TimerResolutionLink)) ImageFileName SmallestTimerResolution RequestedTimerResolution\” nt!ExpTimerResolutionListHead”
dt nt!_EPROCESS @$extret-@@(#FIELD_OFFSET(nt!_EPROCESS, TimerResolutionLink)) ImageFileName SmallestTimerResolution RequestedTimerResolution
   +0x438 ImageFileName            : [15]  “???”
   +0x638 RequestedTimerResolution : 0x9c3d2000
   +0x63c SmallestTimerResolution  : 0xffffe001

dt nt!_EPROCESS @$extret-@@(#FIELD_OFFSET(nt!_EPROCESS, TimerResolutionLink)) ImageFileName SmallestTimerResolution RequestedTimerResolution
   +0x438 ImageFileName            : [15]  “svchost.exe”
   +0x638 RequestedTimerResolution : 0
   +0x63c SmallestTimerResolution  : 0x2710

So I thought I’d look into this a bit more.

3: kd> u ndis!ndisInterruptDpc+0x269ce
ndis!ndisInterruptDpc+0x269ce:
fffff801`dae9e81e 488b75a7        mov     rsi,qword ptr [rbp-59h]
fffff801`dae9e822 e9e297fdff      jmp     ndis!ndisInterruptDpc+0x1b9 (fffff801`dae78009)
fffff801`dae9e827 33d2            xor     edx,edx
fffff801`dae9e829 488d4dd7        lea     rcx,[rbp-29h]
fffff801`dae9e82d 448d420d        lea     r8d,[rdx+0Dh]
fffff801`dae9e831 e8160c0000      call    ndis!ndisPcwEndCycleCounter (fffff801`dae9f44c)
fffff801`dae9e836 90              nop
fffff801`dae9e837 e9d797fdff      jmp     ndis!ndisInterruptDpc+0x1c3 (fffff801`dae78013)

It appears the interrupt routine is looping for some reason.

I can’t find anything on the cycle counter function as it is undocumented but I’ll take a guess and say that it’s keeping track of the time the interrupt has been executing, AFAIK this is don’t by using a counter on the currently executing thread to see how long it’s running.

3: kd> u nt!KeAcquireSpinLockRaiseToDpc+0x13
nt!KeAcquireSpinLockRaiseToDpc+0x13:
fffff800`922e8383 f605fcac270021  test    byte ptr [nt!PerfGlobalGroupMask+0x6 (fffff800`92563086)],21h
fffff800`922e838a 751f            jne     nt!KeAcquireSpinLockRaiseToDpc+0x3b (fffff800`922e83ab)
fffff800`922e838c f0480fba2900    lock bts qword ptr [rcx],0
fffff800`922e8392 7209            jb      nt!KeAcquireSpinLockRaiseToDpc+0x2d (fffff800`922e839d)
fffff800`922e8394 0fb6c3          movzx   eax,bl
fffff800`922e8397 4883c420        add     rsp,20h
fffff800`922e839b 5b              pop     rbx
fffff800`922e839c c3              ret

Here we can see the same DPC interrupt routine trying to acquire a spinlock yet it’s not managing to do it and therefore looping all whilst it is still running at DPC level and therefore preventing normal thread execution.

Eventually it seems it managed to acquire the spinlock and then call a clock interrupt in order to perform some operation, I suspect updating the system time in order to service the network driver with higher response times.
The system realised that it was taking too long to complete and therefore bugchecked.

3: kd> lmvm athwbx
start             end                 module name
fffff801`dce44000 fffff801`dd1ff000   athwbx     (no symbols)          
    Loaded symbol image file: athwbx.sys
    Image path: \SystemRoot\system32\DRIVERS\athwbx.sys
    Image name: athwbx.sys
    Timestamp:        Thu Oct 17 10:46:01 2013 (525FB1D9)
    CheckSum:         003BC161
    ImageSize:        003BB000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

So the network driver was quite outdated, he updated it and the blue screens stopped so it looks like it was an easy fix.

Thanks for reading.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s