CodeMachine
Windows on ARM - An assembly language primer
The ARM CPU has garnered significant attention in the recent past due to
its wide-spread usage in mobile devices.
With Windows 8, for the first time Microsoft has released a mainstream
Windows OS to run on the ARM CPU. Windows CE has been running on ARM for
more than a decade now.
Developers and support engineers working with the Windows on ARM (WoA)
platform need a basic understanding of the ARM CPU and ARM assembler
in order to be able to effectively troubleshoot and debug issues that
occur at the lowest levels of the operating system.
Although there is no shortage of information on the ARM CPU architecture and
assembly language, there is very little information on the usage of ARM
assembly on Windows 8.
This article attempts to provide the reader with enough information to gain
a basic understanding of the ARM assembly language as used by Windows.
It does not attempt to be a comprehensive reference manual for the ARM CPU;
please refer to the references section for detailed information on this topic.
This section covers some of the tools that were used to research this article.
In order to test the conversion of C/C++ constructs to ARM assembler,
the ARM cross compiler that ships with VS2013 was used.
To build the ARM executables the compiler was run from a console window
as shown below.
The following section assumes VS2013 is installed on the system in the default
install path.
C:\> cd C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\x86_arm
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\x86_arm> vcvarsx86_arm.bat
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\x86_arm> cd C:\work
C:\work> cl /FAcs /Zi /D _ARM_WINAPI_PARTITION_DESKTOP_SDK_AVAILABLE=1 HelloWorld.c
HelloWorld.c contains the C source code, as shown below:
#include "stdio.h"
void main (void )
{
    printf ( "Hello World\n");
}
To study how the individual ARM assembler instructions are translated
into ARM opcodes the ARM assembler was used. Once the assembler generated
the object (.OBJ) file, the linker (link.exe) was used to examine
opcode sequences.
All these steps are shown below.
The ARM assembler and linker also ship with Visual Studio 2013.
C:\work> armasm HelloASM.asm
C:\work> link -dump -disasm HelloASM.obj
HelloASM.asm contains some arbitrary ARM assembler instructions, as listed below:
AREA |.text|, CODE, THUMB
subs r0,r0,r3
add r4,r6,r0,lsl #3
addw r11,sp,#8
rsbs r5,r1,#0
As shown below, the output of the linker contains the opcodes from the .text
section of the .OBJ file.
Microsoft (R) COFF/PE Dumper Version 12.00.21005.1
Copyright (C) Microsoft Corporation.
All rights reserved.
Dump of file HelloASM.obj
File Type: COFF OBJECT
: EB06 04C0 add r4,r6,r0,lsl #3
: F20D 0B08 addw
0000000A: 424D
0000000C: 4770
0000000E: E7FE
: F000 F800 bl
84 .debug$S
CPU Version
The research for this article was performed on a Microsoft Surface RT
(Generation 1) running on an Nvidia TEGRA 3 Quad Core CPU.
The Secure Boot Signing Policy that retail devices like Surface RT ship
with does not allow live kernel debugging.
It is however possible to configure Surface RT devices to generate complete
kernel memory dumps and these memory dumps can be loaded and analyzed on
both the X86 and X64 versions of WinDBG.
So all the research for this article was done using kernel mode and
user mode memory dumps generated on the Surface RT device.
User mode dumps on the Surface RT device were generated by simply using
Task Manager's "Create dump file" option.
To generate a complete kernel memory dump on a Surface RT system, the
commands listed below were run from an administrative command prompt,
followed by a system reboot, and the system was then bug-checked using the
RightCtrl+ScrollLock+ScrollLock key sequence.
wmic recoveros set DebugInfoType = 1
reg add "HKLM\SYSTEM\CurrentControlSet\Services\kbdhid\Parameters" /v CrashOnCtrlScroll /t REG_DWORD /d 0x1
Loading a complete kernel memory dump generated on the Surface RT in
WinDBG v6.3.9600 for X86/X64 shows that the Windows 8 kernel was running
on a quad-core ARM CPU in Thumb-2 mode.
Loading Dump File [MEMORY.DMP]
Kernel Bitmap Dump File: Full address space is available
************* Symbol Path validation summary **************
Response Time (ms) Location
SRV*c:\SYMBOLS*/download/symbols
Symbol search path is: SRV*c:\SYMBOLS*/download/symbols
Executable search path is:
Windows 8 Kernel Version 9200 MP (4 procs) Free ARM (NT) Thumb-2
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: .armfre.win8_gdr.3
Machine Name:
Kernel base = 0x PsLoadedModuleList = 0x835d08c0
The "!sysinfo cpuinfo" command describes the CPU as ARM Family 7 Cortex-A9 r02p09.
Based on this information, the ARM Cortex-A9 Technical Reference Manual
and the ARM Architecture Reference Manual for ARMv7-A and ARMv7-R
were used to research this article.
0: kd> !sysinfo cpuinfo
[CPU Information]
~MHz = REG_DWORD 1300
Component Information = REG_BINARY 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Configuration Data = REG_FULL_RESOURCE_DESCRIPTOR ff,ff,ff,ff,ff,ff,ff,ff,0,0,0,0,0,0,0,0
Identifier = REG_SZ ARM Family 7 Model C09 Revision 209
ProcessorNameString = REG_SZ NVIDIA(R) TEGRA(R) 3 Quad Core CPU
VendorIdentifier = REG_SZ NVIDIA
The ARM CPU can execute in User, System, Supervisor, Abort, Undefined,
Interrupt (IRQ) and Fast Interrupt (FIQ) modes.
In total the ARM CPU has 37 physical registers, each one 32 bits wide.
Out of these 37 registers, only 17 registers are visible to software at
any given point in time, depending on the mode the CPU is executing in.
These registers comprise thirteen general-purpose registers (r0 to
r12), three special purpose registers (r13-r15) and the Current Program
Status Register (CPSR).
The special purpose registers r13, r14, r15 are also referred to as
SP, LR, and PC respectively.
The CPSR register is similar to the X86/X64 flags register.
Unlike the X86, ARM does not contain any segment registers.
The table below lists the ARM CPU registers and their usage.
Register   Description
r0         Contains the 1st parameter passed to functions.
           32-bit function return value, similar to the EAX register on X86.
           Low-word of 64-bit function return value.
r1         Contains the 2nd parameter passed to functions.
           High-word of 64-bit function return value.
r2         Contains the 3rd parameter passed to functions.
r3         Contains the 4th parameter passed to functions.
r4-r10     General purpose registers, callee saved.
r11        Frame Pointer, similar to the EBP register on X86.
r12        General purpose register.
r13 (SP)   Stack Pointer, similar to X86 ESP, callee saved.
r14 (LR)   Link Register, contains the return address during a function call.
           Callee saved for non-leaf functions.
           Leaf functions don't save this register since they don't modify it.
r15 (PC)   Program Counter, similar to EIP on X86.
CPSR       Current Program Status Register. Similar to the EFlags register on X86.
APSR       Application Program Status Register.
           This is not a separate register but the NZCVQ and GE bits of the CPSR
           that are writable from user mode.
SPSR       Saved Program Status Register.
           Copy of the CPSR at the time an exception occurs. SPSR contains the
           pre-exception value of the CPSR. The CPU contains a separate instance
           of the SPSR for every exception mode that is supported by the ARM CPU.
Of the 17 registers mentioned above, r0-r7 and r15 are unbanked registers,
i.e. they map to the same physical registers irrespective of the mode the
CPU is executing in.
Registers r8 through r14 are banked, i.e. they map to different physical
registers depending on the CPU's execution mode.
The purpose of banked registers is for the CPU to automatically save
and restore these register contents across execution mode changes
and ensure that the registers are not overwritten during an exception.
Registers r13 and r14 are banked in all execution modes except System Mode.
Registers r8-r12 are banked only in FIQ mode.
In addition, the CPSR register is banked into the SPSR registers in all
modes, except in System Mode.
The ARM documentation refers to banked registers with the suffixes
svc, abt, und, irq or fiq representing the execution modes of the CPU in
which the registers are used.
The following table shows the banked and unbanked registers in all of the
different execution modes of the CPU:
User    System  Supervisor  Abort     Undefined  IRQ       FIQ
r0      r0      r0          r0        r0         r0        r0
r1      r1      r1          r1        r1         r1        r1
r2      r2      r2          r2        r2         r2        r2
r3      r3      r3          r3        r3         r3        r3
r4      r4      r4          r4        r4         r4        r4
r5      r5      r5          r5        r5         r5        r5
r6      r6      r6          r6        r6         r6        r6
r7      r7      r7          r7        r7         r7        r7
r8      r8      r8          r8        r8         r8        r8_fiq
r9      r9      r9          r9        r9         r9        r9_fiq
r10     r10     r10         r10       r10        r10       r10_fiq
r11     r11     r11         r11       r11        r11       r11_fiq
r12     r12     r12         r12       r12        r12       r12_fiq
SP      SP      SP_svc      SP_abt    SP_und     SP_irq    SP_fiq
LR      LR      LR_svc      LR_abt    LR_und     LR_irq    LR_fiq
PC      PC      PC          PC        PC         PC        PC
CPSR    CPSR    SPSR_svc    SPSR_abt  SPSR_und   SPSR_irq  SPSR_fiq
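The banking rules in the table can be restated as a small lookup. The following Python sketch is only an illustration of the table above: the function names and the lowercase mode suffixes used as keys are choices made for this example, not part of any ARM or Windows API.

```python
# Illustrative model of the banked-register table above (names are hypothetical).
EXCEPTION_MODES = ("svc", "abt", "und", "irq", "fiq")
FIQ_BANKED = ("r8", "r9", "r10", "r11", "r12")

def physical_register(mode, reg):
    """Return the physical register that `reg` refers to while in `mode`."""
    if reg == "SPSR":
        if mode not in EXCEPTION_MODES:
            raise ValueError("User and System modes have no SPSR")
        return "SPSR_" + mode
    if mode not in EXCEPTION_MODES:
        return reg                      # User and System share one register set
    if reg in ("SP", "LR"):
        return reg + "_" + mode         # banked in every exception mode
    if mode == "fiq" and reg in FIQ_BANKED:
        return reg + "_fiq"             # r8-r12 are banked only in FIQ mode
    return reg                          # r0-r7, PC and CPSR are never banked

def count_physical_registers():
    """Count the distinct physical registers reachable across all modes."""
    visible = ["r%d" % i for i in range(13)] + ["SP", "LR", "PC", "CPSR"]
    names = set()
    for mode in ("usr", "sys") + EXCEPTION_MODES:
        names.update(physical_register(mode, reg) for reg in visible)
        if mode in EXCEPTION_MODES:
            names.add(physical_register(mode, "SPSR"))
    return len(names)
```

Counting the distinct names produced by this model gives 37, which agrees with the number of physical registers quoted earlier.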
The list of ARM registers can be examined using the debugger's register display
command (r), as shown below:
r0===835c3d3c r3== r5=835df580
r6======82b30890
r12=912ca010 sp=82b305e0 lr= pc= psr= -ZC-- ARM
nt!KeBugCheck2+0xfc:
f1150020 adds r0,r5,#0x20
Current Program Status Register (CPSR)
The following figure shows the format of the CPSR register.
Figure 1 : ARM CPSR Register Format
The five mode bits M[4:0] contain the values listed in the
following table, indicating the mode the CPU is currently operating in:
Mode  Value  Description
USR   0x10   User Mode
FIQ   0x11   Fast Interrupt Mode
IRQ   0x12   Interrupt Mode
SVC   0x13   Supervisor Mode
ABT   0x17   Abort Mode
UDF   0x1B   Undefined Mode
SYS   0x1F   System Mode
The J & T bits determine if the CPU is in ARM or Thumb mode,
where J = Jazelle and T = Thumb.
J=0 & T=0 ARM Mode
J=0 & T=1 Thumb Mode
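As a rough illustration of how these fields can be pulled out of a raw CPSR value, here is a minimal Python sketch. The bit positions (N, Z, C, V, Q in bits 31 to 27, J in bit 24, T in bit 5) and the two extra J/T combinations are taken from the ARMv7-A architecture as an assumption, since the figure itself is not reproduced here; `decode_cpsr` is a hypothetical helper name.

```python
# Bit positions are an assumption based on the ARMv7-A CPSR layout.
MODE_NAMES = {
    0x10: "USR", 0x11: "FIQ", 0x12: "IRQ", 0x13: "SVC",
    0x17: "ABT", 0x1B: "UDF", 0x1F: "SYS",
}
# (J, T) -> instruction set state; the last two are not covered in the text.
STATE_NAMES = {(0, 0): "ARM", (0, 1): "Thumb", (1, 0): "Jazelle", (1, 1): "ThumbEE"}

def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into mode, instruction set state and flags."""
    flags = "".join(
        name if (cpsr >> bit) & 1 else "-"
        for name, bit in (("N", 31), ("Z", 30), ("C", 29), ("V", 28), ("Q", 27))
    )
    j = (cpsr >> 24) & 1
    t = (cpsr >> 5) & 1
    return {
        "mode": MODE_NAMES.get(cpsr & 0x1F, "unknown"),
        "state": STATE_NAMES[(j, t)],
        "flags": flags,
    }
```

For example, a made-up value such as 0x60000033 decodes to Supervisor mode, Thumb state, with the Z and C flags set.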
The NZCV bits are used by conditional flow control instructions to alter
program execution based on the result of compare operations.
These bits are set by instructions like cmp, tst, or any other instruction
that has an "S" suffix.
The 2-letter acronyms in the Condition column in the following table are
used as suffixes to branch instructions.
The Flags column shows the value of one or more condition bits that would
result in the corresponding branch being taken.
Examples of such conditional flow control instructions are Conditional
Compare and Branch (CBxx) and Conditional Branch (Bxx) and its
variants. The xx is the condition suffix as shown below:
Opcode    Condition  Flags                   Description
0000 (0)  EQ         Z == 1                  Equal
0001 (1)  NE         Z == 0                  Not equal
0010 (2)  CS         C == 1                  Carry set
0011 (3)  CC         C == 0                  Carry clear
0100 (4)  MI         N == 1                  Minus, negative
0101 (5)  PL         N == 0                  Plus, positive or zero
0110 (6)  VS         V == 1                  Overflow
0111 (7)  VC         V == 0                  No overflow
1000 (8)  HI         (C == 1) && (Z == 0)    Unsigned higher
1001 (9)  LS         (C == 0) || (Z == 1)    Unsigned lower or same
1010 (a)  GE         N == V                  Signed greater than or equal
1011 (b)  LT         N != V                  Signed less than
1100 (c)  GT         (Z == 0) && (N == V)    Signed greater than
1101 (d)  LE         (Z == 1) || (N != V)    Signed less than or equal
1110 (e)  AL         Any                     Always (unconditional)
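To make the table concrete, the sketch below models how a compare sets the N, Z, C and V flags and how each condition suffix is then evaluated. It is a simplified Python model written for this article, with hypothetical function names; it assumes 32-bit operands and the usual ARM convention that C is set when a subtraction does not borrow.

```python
def flags_from_cmp(a, b):
    """Model the N, Z, C, V flags produced by 'cmp a, b' on 32-bit values."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    result = (a - b) & 0xFFFFFFFF
    n = result >> 31                         # sign bit of the result
    z = int(result == 0)
    c = int(a >= b)                          # carry set means "no borrow"
    v = ((a ^ b) & (a ^ result)) >> 31       # signed overflow on subtraction
    return n, z, c, v

CONDITIONS = {
    "EQ": lambda n, z, c, v: z == 1,
    "NE": lambda n, z, c, v: z == 0,
    "CS": lambda n, z, c, v: c == 1,
    "CC": lambda n, z, c, v: c == 0,
    "MI": lambda n, z, c, v: n == 1,
    "PL": lambda n, z, c, v: n == 0,
    "VS": lambda n, z, c, v: v == 1,
    "VC": lambda n, z, c, v: v == 0,
    "HI": lambda n, z, c, v: c == 1 and z == 0,
    "LS": lambda n, z, c, v: c == 0 or z == 1,
    "GE": lambda n, z, c, v: n == v,
    "LT": lambda n, z, c, v: n != v,
    "GT": lambda n, z, c, v: z == 0 and n == v,
    "LE": lambda n, z, c, v: z == 1 or n != v,
    "AL": lambda n, z, c, v: True,
}

def condition_passed(suffix, n, z, c, v):
    """Return True if a branch with the given condition suffix would be taken."""
    return CONDITIONS[suffix](n, z, c, v)
```

Comparing 0x80000000 with 1 shows why signed and unsigned suffixes differ: the model reports both LT (signed less than) and HI (unsigned higher) as true for the same flags.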
Trap Frames
The trap frame structure (KTRAP_FRAME) is used by Windows to save
and restore register contents during interrupts, system calls and exceptions.
Due to the use of banked registers the ARM CPU does not push anything on the
stack during an exception, hence the trap frame on the ARM CPU is entirely
defined by software.
The trap frame structure for ARM CPU, defined in ntddk.h, is as follows:
0: kd> dt nt!_KTRAP_FRAME
+0x000 Arg3
+0x004 FaultStatus
+0x008 FaultAddress
+0x008 TrapFrame
+0x00c Reserved
+0x010 ExceptionActive
+0x011 ContextFromKFramesUnwound : UChar
+0x012 DebugRegistersValid : UChar
+0x013 PreviousMode
+0x013 PreviousIrql
+0x014 VfpState : Ptr32 _KARM_VFP_STATE
+0x018 Bvr : [8] Uint4B
+0x038 Bcr : [8] Uint4B
+0x058 Wvr : [1] Uint4B
+0x05c Wcr : [1] Uint4B
+0x070 R12
+0x07c R11
+0x084 Cpsr
The kernel debugger .trap command switches the debugger's register context to the
given trap frame and displays the contents of the trap frame as shown below:
0: kd> .trap 9f40bd40
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
r1=00e8fa70
r9===00e8fa40
sp=00e8f8d0
lr=754c0c4d
pc= psr= ----- Thumb
As highlighted by the "NOTE:" in the above output, the trap frame structure
does not contain fields for non-volatile registers i.e. R4-R10. At the time of
an exception the non-volatile registers are saved in another structure called
the KEXCEPTION_FRAME.
The KEXCEPTION_FRAME structure is not exposed through public symbols but
it is defined in ntddk.h.
The macros GENERATE_EXCEPTION_FRAME and RESTORE_EXCEPTION_FRAME are defined in
the WDK header file kxarm.h. These macros are used at the beginning and end of
functions respectively to set up and tear down the KEXCEPTION_FRAME structures
on the stack.
In addition to the CPU registers described above, the KTRAP_FRAME also contains
a copy of the CPU's Breakpoint Value registers (Bvr) and the Breakpoint Control
Registers (Bcr) which control the configuration and usage of the Bvrs.
The KTRAP_FRAME also contains a copy of the CPU's Watchpoint Value Registers (Wvr)
and the Watchpoint Control Registers (Wcr) which control the configuration and usage
of the Wvrs.
All of the breakpoint and watchpoint registers reside in co-processor CP14;
co-processors are covered later.
The maximum number of breakpoints and watchpoints that are available on a
CPU are defined in hardware and these values are cached in the Kernel Processor
Control Region (KPCR) structure.
The fields KPCR.MaxBreakpoints and KPCR.MaxWatchpoints cache the maximum
number of breakpoints and watchpoints respectively.
The content of these fields in the KPCR structure is shown below:
0: kd> !pcr
KPCR for Processor 0 at 835df000:
Major 1 Minor 1
Panic Stack
Irql addresses:
Routine 835df000
0: kd> dt nt!_KPCR 835df000 -y Prcb.Max
+0x580 Prcb
+0x510 MaxBreakpoints : 6
+0x514 MaxWatchpoints : 1
The trap frame also optionally points to the Vector Floating Point (VFP)
registers, which reside in co-processor CP10.
These registers are used as either 64-bit "D" floating point registers or
as the NEON 128-bit SIMD or "Q" registers.
These "D" and "Q" registers are aliased and they map to the same physical
bits in the VFP.
The VFP register values can be read using the VSTM instruction and written
to using the VLDM instruction.
The debugger's default register mask on the ARM i.e. 0x01 causes the 'r'
command to display only the integer registers.
The other registers described above can be examined by setting the
register mask to 0x4f as shown below:
Register output mask is 1:
1 - Integer state (32-bit)
0: kd> rm 4f
Register output mask is 4f:
2 - Integer state (64-bit)
4 - Floating-point state
8 - CP14 Debug registers
40 - NEON registers
r2=835c3d3c
r5=835df580
r9===82b30890
r12=912ca010
sp=82b305e0
pc= psr= -ZC-- ARM
q00=-326.276 -..
q01=-0..3 -1.1.
q02=1.9.. -5.2
q03=-1..9 -4.0.3
q04=0 0 0 0
q05=0 0 0 0
q06=0 0 0 0
q07=0 0 0 0
q08=-2.8 -1.1.9
q09=-2.0 -0..8 -2.1
q10=6.8 -0..6.7
q11=-1.7.635
q12=-1..9 -4.4.9
q13=1.9. -4.0.3
q14=-1.2 -6.1. -9.3
q15=-2. -2.6.2 -3.4
nt!KeBugCheck2+0xfc:
f1150020 adds
r0,r5,#0x20
Instruction Set
Windows, like all other modern operating systems, uses the ARM CPU in
Thumb-2 mode in which instructions are either 16 bits (Thumb) or 32 bits (ARM).
Thumb mode, which was introduced in early ARM processors, allows for higher
instruction density and uniform instruction coding but these instructions
are limited in functionality as compared to their 32-bit ARM counterparts.
Here are some of the limitations:
16-bit Thumb instructions only contain 3-bits to identify source and
destination registers.
Consequently only registers R0 - R7 can be accessed by them.
The 32-bit ARM instructions, on the other hand, can access the full set
of R0 - R15 registers.
Following are some examples of 16-bit instructions accessing registers r0 - r7.
Opcode  Mnemonic  Operand
2304    movs      r3,#4
4605    mov       r5,r0
2D00    cmp       r5,#0
3B01    subs      r3,#1
005C    lsls      r4,r3,#1
Thumb instructions cannot be predicated i.e. they cannot be made to
operate conditionally using the NZCV bits like the ARM instruction set.
Immediate Values are restricted to 12 bits, so only numbers from 0 to 4095
can be encoded with the instruction.
However using the barrel shifter, described later in this article, the
immediate number can be multiplied and added to an existing register
value to increase its range.
A Thumb routine can call both Thumb code and ARM code, but it cannot
contain non-Thumb instructions. The same goes for an ARM routine.
Thumb-2, introduced in modern ARM processors, allows these limitations
to be worked around by enabling compilers and the processor to generate
and understand functions which combine both Thumb and ARM instructions
in the same instruction stream, without requiring branch instructions to
switch from one mode to the other.
During a branch operation the ARM CPU must be told that the target of
the branch is a Thumb-2 instruction.
This is indicated by setting the least significant bit of the branch address.
As a consequence of this, when a function pointer is examined in WinDBG
it always points at a one byte offset within the function, as illustrated below:
0: kd> x nt!IopErrorLogWorkItem
nt!IopErrorLogWorkItem = <no type information>
0: kd> dt 835d84d0 nt!_WORK_QUEUE_ITEM
+0x000 List : _LIST_ENTRY [ 0x0 - 0x0 ]
+0x008 WorkerRoutine : 0x836c8189 nt!IopErrorLogThread+0
+0x00c Parameter
0: kd> u 0x836c8189
nt!IopErrorLogThread+0x1:
836cff0 push {r4-r11,lr}
836c818c f20d0b1c addw r11,sp,#0x1C
836cfbb8 bl nt!_security_push_cookie (834e0904)
836cd58 subw sp,sp,#0x658
836c819a 930b r3,[sp,#0x2C]
836c819c 930c r3,[sp,#0x30]
836c819e f7fffc1b bl nt!IopErrorLogConnectSession (836c79d8)
The function nt!IopErrorLogThread begins at address 0x836c8188, however
the field WORK_QUEUE_ITEM.WorkerRoutine contains the address 0x836c8189,
which has the least significant bit set, indicating a Thumb-2 instruction stream.
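When reading such pointers out of a dump it can help to separate the two pieces of information explicitly. The following is a minimal Python sketch with a hypothetical helper name; it simply masks off bit 0 as described above.

```python
def split_code_pointer(pointer):
    """Return (instruction_address, is_thumb) for a code pointer value."""
    is_thumb = bool(pointer & 1)      # bit 0 set marks a Thumb-2 target
    address = pointer & ~1            # the first instruction is at the even address
    return address, is_thumb
```

Applied to the WorkerRoutine value shown above, 0x836c8189, this yields the function start 0x836c8188 together with the Thumb indication.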
Instruction Encoding
Since instruction sizes in ARM Thumb-2 mode can be both 16 and 32 bit, the
way an instruction is encoded plays a critical role in determining the actual
instruction size.
32-bit instructions are encoded as 2 separate 16-bit half-words.
The value of bits[15:11] of the first half-word determines if the instruction
is made of a single half-word (16 bits) or double half-word (32-bits).
If the value of bits[15:11] of the first half-word is 11101, 11110 or 11111,
the half-word is the first half-word of a 32-bit instruction, otherwise it is
a 16-bit instruction.
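The size rule above is easy to express in code. This Python sketch applies it to a list of half-words; the function names are made up for the example, and the sample values used in the usage note are the half-words visible in the HelloASM.obj dump earlier in the article.

```python
def thumb_instruction_size(first_halfword):
    """Return the instruction size in bytes, decided by bits[15:11]."""
    top5 = (first_halfword >> 11) & 0x1F
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2

def split_instructions(halfwords):
    """Group a sequence of 16-bit half-words into Thumb-2 instructions."""
    instructions = []
    index = 0
    while index < len(halfwords):
        if thumb_instruction_size(halfwords[index]) == 4:
            if index + 1 >= len(halfwords):
                raise ValueError("truncated 32-bit instruction")
            instructions.append((halfwords[index], halfwords[index + 1]))
            index += 2
        else:
            instructions.append((halfwords[index],))
            index += 1
    return instructions
```

Feeding it the half-words EB06 04C0 F20D 0B08 424D 4770 E7FE F000 F800 produces three 32-bit instructions and three 16-bit instructions, matching the way the linker groups them in its disassembly.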
Here is an example of an instruction encoded as a 32-bit opcode:
0:000> u 77b485de L1
ntdll!TppWorkerThread+0x92:
77b485de f3bf8f5b dmb ish
The opcode for "dmb ish", f3bf8f5b, is made of two 16-bit numbers as
illustrated below, with the opcode displayed as a single word (32-bit) and
as two half-words (16-bit).
0:000> dd 77b485de
0:000> dw /c1 77b485de
The following excerpt from the ARMv7 Architecture Reference Manual
Section A8.8.43 shows the encoding of the above mentioned DMB instruction
in Thumb-2 mode.
Figure 2 : ARM 32-bit instruction encoding
The first (lower) 16 bit part of the opcode (0xf3bf) is represented by the binary
number "1111 0011 1011 1111" which matches the first half of the instruction encoding.
The second (higher) 16 bit part of the opcode (0x8f5b) is represented by the binary
number "1000 1111 0101 1011" which matches the second half of the instruction encoding.
The "option" value is binary 1011, and specifies the ISH option to the DMB instruction
as shown below:
Figure 3 : ISH option of DMB instruction
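The field extraction described above can be sketched in a few lines of Python. The operation and option values in the two tables below are quoted from the ARMv7 barrier encodings as an assumption (they are not listed in this article), and `decode_barrier` is a hypothetical helper, not a debugger or SDK function.

```python
# Encodings assumed from the ARMv7-A/R Thumb barrier instruction layout.
BARRIER_OPS = {0b0100: "dsb", 0b0101: "dmb", 0b0110: "isb"}
BARRIER_OPTIONS = {
    0b1111: "sy", 0b1110: "st",
    0b1011: "ish", 0b1010: "ishst",
    0b0111: "nsh", 0b0110: "nshst",
    0b0011: "osh", 0b0010: "oshst",
}

def decode_barrier(first_halfword, second_halfword):
    """Decode a Thumb-2 DMB/DSB/ISB opcode, or return None for anything else."""
    if first_halfword != 0xF3BF or (second_halfword & 0xFF00) != 0x8F00:
        return None
    op = BARRIER_OPS.get((second_halfword >> 4) & 0xF)
    if op is None:
        return None
    option = second_halfword & 0xF          # the 4-bit "option" field
    return "%s %s" % (op, BARRIER_OPTIONS.get(option, "#%d" % option))
```

With the opcode f3bf8f5b this returns "dmb ish", and with f3bf8f6f (seen later in KiInitializeExceptionVectorTable) it returns "isb sy".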
Here is an example of an instruction encoded as a 16-bit opcode:
0: kd> u 834daa62 L1
nt!KiIdleLoop+0x3e:
834daa62 bf10
0: kd> .formats bf10
Evaluate expression:
Decimal: 48912
Thu Jan 01 08:35:12 1970
low 6.8 high 0
The opcode (0xbf10) is represented by the binary number "1011 1111 0001 0000",
which matches the instruction bit encoding shown below.
Figure 4 : ARM 16-bit instruction encoding
Instructions on the ARM CPU have different variants depending on the suffix
that follows the primary mnemonic. These suffixes can be S, W, or .W and
determine how the instruction is encoded, whether the CPSR flags are affected
and how some of the operands are interpreted.
Instructions that have an S suffix change the NZCV bits of the CPSR
register based on the result of the operation.
Instructions that have a .W suffix are always encoded as 32-bit ARM
instructions as opposed to 16-bit Thumb instructions.
Instructions that have a W suffix zero extend their 12-bit immediate
value i.e. the 3rd operand. ARM 32-bit instructions that don't have the
W suffix treat their 3rd operand as a 12-bit constant value and decode it
based on the value of most significant 4 bits of the constant i.e. bits 11-8.
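The two interpretations of the 12-bit field can be compared side by side. The Python sketch below follows the "modified immediate" expansion rules as they are understood from the ARMv7 Architecture Reference Manual; treat it as an approximation for study purposes (it does not flag the encodings the manual marks as unpredictable), and note that all function names are invented for this example.

```python
def imm12_from_opcode(first_halfword, second_halfword):
    """Assemble the i:imm3:imm8 field of a 32-bit data-processing opcode."""
    i = (first_halfword >> 10) & 1
    imm3 = (second_halfword >> 12) & 0x7
    imm8 = second_halfword & 0xFF
    return (i << 11) | (imm3 << 8) | imm8

def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF

def expand_modified_immediate(imm12):
    """Decode imm12 the way instructions without the W suffix do."""
    imm8 = imm12 & 0xFF
    if (imm12 >> 10) == 0:
        pattern = (imm12 >> 8) & 0x3
        if pattern == 0:
            return imm8                          # 0x000000XY
        if pattern == 1:
            return (imm8 << 16) | imm8           # 0x00XY00XY
        if pattern == 2:
            return (imm8 << 24) | (imm8 << 8)    # 0xXY00XY00
        return imm8 * 0x01010101                 # 0xXYXYXYXY
    unrotated = 0x80 | (imm12 & 0x7F)            # 8-bit value with top bit forced
    return ror32(unrotated, imm12 >> 7)
```

For the opcode f4335300 (bics r3,r3,#0x2000, which appears later in this article) the field is 0xD00 and expands to 0x2000, whereas for f20d0b08 (addw r11,sp,#8) the field is used directly as the zero-extended value 8.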
Following are some variants of the ADD instruction with the same operands,
encoded differently based on the suffix immediately following the instruction
mnemonic. The first column is the opcode for the instruction.
Barrel Shifter
The ARM instruction set has the capability to combine shift and rotate
operations along with arithmetic, logical, compare, load and store
operations in a single instruction.
This is achieved through the barrel shifter, a hardware logic unit in
the CPU shown below:
Figure 5 : ARM Barrel Shifter
The barrel shifter implements shift and rotate operations that can be of
arithmetic or logical type like:
Logical Shift Left (LSL)
Logical Shift Right (LSR)
Arithmetic Shift Right (ASR)
Rotate Right (ROR)
Rotate Right with Extend (RRX)
Examples of instructions that use the barrel shifter:
ea445302 orr r3,r4,r2,lsl #0x14
eb033412 add r4,r3,r2,lsr #0xC
The ORR instruction performs a Logical Shift Left (LSL) of register r2 by 20 positions.
The resulting operation becomes r3 = LogicalOR ( r4, LogicalShiftLeft ( r2, 0x14) ).
The ADD instruction performs a Logical Shift Right (LSR) of register r2 by 12 positions.
The resulting operation becomes r4 = Add ( r3, LogicalShiftRight ( r2, 0xc) ).
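The shift and rotate operations themselves can be modelled on 32-bit values as follows. This is a small Python sketch for experimenting with the two examples above; the helper names and the sample register values in the usage note are arbitrary.

```python
MASK32 = 0xFFFFFFFF

def lsl(value, amount):
    """Logical Shift Left: zeros enter from the right."""
    return (value << amount) & MASK32

def lsr(value, amount):
    """Logical Shift Right: zeros enter from the left."""
    return (value & MASK32) >> amount

def asr(value, amount):
    """Arithmetic Shift Right: the sign bit is replicated from the left."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32              # reinterpret as a negative number
    return (value >> amount) & MASK32

def ror(value, amount):
    """Rotate Right: bits leaving on the right re-enter on the left."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32

def orr_lsl(rn, rm, amount):
    """Model of 'orr rd,rn,rm,lsl #amount'."""
    return rn | lsl(rm, amount)

def add_lsr(rn, rm, amount):
    """Model of 'add rd,rn,rm,lsr #amount'."""
    return (rn + lsr(rm, amount)) & MASK32
```

For instance, with r4 = 0xABC and r2 = 0x123 the ORR example evaluates to 0x12300ABC, and with r3 = 0x1000 and r2 = 0x00ABC000 the ADD example evaluates to 0x1ABC.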
Instruction Ordering
Modern compilers attempt to optimize program execution by generating instruction
sequences which may be different from what was intended by the high level
programming language.
Modern CPUs also perform multiple run time optimizations like instruction
pipelining, write buffering, instruction and data caching, speculative
execution and out of order execution.
While these optimizations result in faster program execution, there are
cases where they may lead to undesirable results.
This is especially true for low level operations performed by the OS like
cache operations, TLB flushes, page table updates and device register accesses.
Barriers prevent both the compiler and CPU from performing the above mentioned
optimizations.
The ARM CPU documentation uses the term barrier to refer to CPU optimization prevention.
There are 3 different types of barriers that can be used on the ARM CPU.
Instr.  Barrier Type and Description
DMB     Data Memory Barrier
        Ensures that all explicit memory accesses before the DMB instruction
        complete before any explicit memory accesses after the DMB instruction start.
        The DMB instruction is automatically inserted by the compiler whenever
        any Interlocked family of functions are used in C or C++.
        Additionally declaring a global variable as volatile results in the compiler
        generating DMB instructions provided the file is compiled with the
        /volatile:ms, instead of the /volatile:iso option.
DSB     Data Synchronization Barrier
        Completes when all instructions before this instruction complete.
        The DSB instruction can be directly inserted using the macro
        _DataSynchronizationBarrier() which is defined in winnt.h.
ISB     Instruction Synchronization Barrier
        Flushes the pipeline in the CPU, so that all instructions following
        the ISB are fetched from cache or memory, after the ISB has been completed.
        The ISB instruction can be directly inserted using the macro
        _InstructionSynchronizationBarrier() which is defined in winnt.h.
The scope of these barrier instructions can be restricted to sharing
domains as well as to specific memory access types.
These can be specified optionally as instruction suffixes to the barrier
instructions.
If a barrier instruction does not have a suffix, its scope is assumed to be
system wide and it applies to both read and write type memory accesses.
Sharing Domain   Suffix  Description
Non-Shareable    NSH     Per-Core TLBs
Inner Shareable  ISH     System Memory
Outer Shareable  OSH     Device Memory
Full System      SY      System and Device Memory

Access Type     Suffix  Comments
Read and Write  (none)  For full system read and write access, the sharing domain
                        and access are combined into the suffix SY.
Write only      ST      For full system write only access, the sharing domain and
                        access are combined into the suffix ST.
The following annotated code snippet shows the usage of the ISB instruction to perform a
pipeline flush before updating the exception handling settings on the ARM CPU and another
one after the update to fetch subsequent instructions directly from memory.
0: kd> uf nt!KiInitializeExceptionVectorTable
nt!KiInitializeExceptionVectorTable:
834a1b40 4b07 ldr r3,=nt!KiArmExceptionVectors+0x1 (834dc6a1)
834a1b42 f0330301 bics
834a1b46 ee0c3f10 mcr p15,#0,r3,c12,c0 ; Vector Base Address Register (VBAR) = r3
834a1b4a f3bf8f6f isb
834a1b4e ee113f10 mrc p15,#0,r3,c1,c0 ; r3 = System Control Register (SCTLR)
834a1b52 f4335300 bics r3,r3,#0x2000 ; SCTLR.V = 0, use VBAR + Low Offset
834a1b56 ee013f10 mcr p15,#0,r3,c1,c0 ; System Control Register (SCTLR) = r3
834a1b5a f3bf8f6f isb
834a1b5e 4770
Interlocked Operations
Unlike the X86 and X64 CPUs, which use the lock prefix before instructions to make
them atomic across multiple CPUs, the ARM CPU uses the LDREX and STREX instructions
and their variants to implement interlocked operations.
The LDREX and STREX instructions are used in pairs but there can be other
intervening instructions between them.
The following code snippet shows the assembly instructions generated by the
compiler during a call to the function InterlockedIncrement ( &g_Lock );.
004010cc f3bf8f5b dmb ish
ldr r1,=g_Lock ()
e8512f00 ldrex r2,[r1]
adds r2,#1
e8412300 strex r3,r2,[r1]
004010dc 2b00
004010de d1f8
f3bf8f5b dmb ish
In the above function, the combination of the instructions LDREX and STREX
form an atomic read modify/write pair with the intervening adds instruction
performing the value increment.
The following snippet describes the functionality of the LDREX and STREX
instructions.
LDREX r2,[r1] performs the following steps:
    R2 = *R1
    Place an exclusive lock on address R1
STREX r3,r2,[r1] performs the following steps:
    if ( exclusive lock is held )
        *R1 = R2; R3 = 0
    else // no exclusive lock
        R3 = 1
In the STREX example above the R3 register contains success (0) or
failure (1) depending on whether R2 was stored in memory pointed to by R1.
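The retry loop formed by these two instructions can be illustrated with a toy model. The Python sketch below is a single-threaded simulation written for this article: the `ExclusiveMonitor` class, its `store` method and the `interfere` callback are all hypothetical, and a real exclusive monitor is a per-core hardware mechanism with more subtle rules than a single reserved address.

```python
class ExclusiveMonitor:
    """Toy model of memory plus one exclusive reservation."""

    def __init__(self):
        self.memory = {}
        self.reserved = None

    def ldrex(self, address):
        self.reserved = address            # place the exclusive lock
        return self.memory[address]

    def store(self, address, value):
        """An ordinary store by some other observer; it breaks the lock."""
        self.memory[address] = value
        if self.reserved == address:
            self.reserved = None

    def strex(self, address, value):
        if self.reserved != address:
            return 1                       # failure: nothing is written
        self.memory[address] = value
        self.reserved = None
        return 0                           # success

def interlocked_increment(monitor, address, interfere=None):
    """Retry until the STREX succeeds, like the adds/strex/cmp/bne loop."""
    while True:
        value = monitor.ldrex(address) + 1
        if interfere is not None:
            interfere()                    # simulate another CPU writing once
            interfere = None
        if monitor.strex(address, value) == 0:
            return value
```

If another writer changes the location between the load and the store, the first STREX fails, the loop reloads the new value and the increment is applied to that value instead of being lost.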
Commonly Used Instructions
This section lists the most common instructions that are encountered in
functions on the WoA platform.
Familiarity with these instructions helps in reading and understanding
most of the assembler code generated by the Visual Studio compiler targeting
ARM.
Instruction opcodes are included to clearly distinguish between 16 and 32
bit Thumb-2 instructions.
Arithmetic Instructions
Opcode    Instruction            Operation
          subs r0,r0,r3          Subtract. r0 = r0 - r3
eb0604c0  add r4,r6,r0,lsl #3    Add with Shift. r4 = r6 + LeftShift (r0, 3)
f20d0b08  addw r11,sp,#8         Add. r11 = sp + 0x8. The W suffix selects a 32-bit
                                 opcode with a zero-extended 12-bit immediate.
          rsbs r5,r1,#0          Reverse Subtract. r5 = 0 - r1.
          sxth r3,r3             Signed Extend Half-word. r3 = SignExtend16To32Bit(r3).
                                 Similar to X86 MOVSX instruction.
f2c00a61  movt r10,#0x61         Move to Top Half. r10[31:16] = 0x61.
Logical Instructions
Opcode    Instruction               Operation
f0530302  orrs r3,r3,#2             Bitwise OR. r3 = r3 | 0x02
ea834271  eor r2,r3,r1,ror #0x11    Bitwise XOR. r2 = r3 ^ RotateRight(r1,0x11)
f06f0200  mvn r2,#0                 Bitwise NOT. r2 = ~(0x0)
          ands r3,r3,r2             Bitwise AND. r3 = r3 & r2
          asrs r3,r3,#1             Arithmetic Shift Right. r3 = r3 >> 1
f033043f  bics r4,r3,#0x3f          Bitwise Bit Clear. r4 = r3 & (~0x3f)
f36f040b  bfc r4,#0,#0xC            Bit Field Clear, sets the specified bit range
                                    to zero. r4[11:0] = 0.
fa94f3a4  rbit r3,r4                Reverse Bits. r3[31:0] = r4[0:31]
f3c30644  ubfx r6,r3,#1,#5          Unsigned Bit Field Extract. r6 = ZeroExtend(r3[5:1]).
                                    Extract Bits 1 through 5 from r3, zero extend
                                    the result and store in r6.
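The bit-field instructions in this table are the least familiar to readers coming from X86, so here is a short Python sketch of what they compute on 32-bit values. The function names mirror the mnemonics but are otherwise just illustrative helpers.

```python
MASK32 = 0xFFFFFFFF

def ubfx(value, lsb, width):
    """Unsigned Bit Field Extract: bits [lsb+width-1 : lsb], zero extended."""
    return (value >> lsb) & ((1 << width) - 1)

def bfc(value, lsb, width):
    """Bit Field Clear: zero `width` bits starting at bit `lsb`."""
    field = ((1 << width) - 1) << lsb
    return value & ~field & MASK32

def bic(value, mask):
    """Bit Clear: value AND NOT mask."""
    return value & ~mask & MASK32

def rbit(value):
    """Reverse the order of all 32 bits."""
    return int(format(value & MASK32, "032b")[::-1], 2)
```

As an example, ubfx(0b1101010, 1, 5) drops bit 0 and keeps the next five bits, giving 0b10101, and bfc(0xFFFFFFFF, 0, 0xC) clears bits 11 through 0, giving 0xFFFFF000.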
Branch Instructions
Opcode    Instruction        Operation
          b                  PC Relative Branch. Similar to X86 jmp instruction.
f7fefc0e  bl 83454c44        PC Relative Branch and Link. LR = Address of next
                             instruction. Similar to the X86 call instruction.
          bx lr              Branch to LR. PC=LR. Similar to X86 ret instruction.
          blx r3             Branch with Link and Exchange. PC=R3, LR = Address of
                             next instruction. Similar to BL except that BLX can
                             change instruction set from ARM to Thumb, or vice versa.
f02aaa9b  beq                PC Relative Conditional Branch if equal.
                             If (CPSR.Z == 1) PC = BranchTarget. Similar to the
                             X86 JZ instruction.
          beq 83429a36       PC Relative Conditional Branch if equal.
                             If (CPSR.Z == 1) PC = BranchTarget. Similar to the
                             X86 JZ instruction. Since the opcode for this
                             instruction is 32-bits its target range is much larger
                             than the previous instruction.
          cbnz r3,83429d80   PC Relative Compare and Branch on Nonzero.
                             if ( R3 != 0 ) PC = BranchTarget. The range of such
                             branches is +4 to +130 bytes.
Compare and Test
Opcode    Instruction     Operation
f0130f10  tst r3,#0x10    Set flags based on bitwise AND operation. CPSR.Flags = r3 & 0x10
ea930f00  teq r3,r0       Set flags based on bitwise XOR operation. CPSR.Flags = r3 ^ r0
          cmp r0,#0       Set flags based on subtraction operation.
                          CPSR.Flags = r0 - ZeroExtend(0x0). The immediate operand
                          is zero extended to make it 32-bits wide.
f1150f02  cmn r5,#2       Set flags based on addition operation. CPSR.Flags = r5 + 0x2
Data Movement
Opcode    Instruction                  Operation
          ldrb r3,[r3]                 Load Register Byte. r3 = ZeroExtend(*r3).
                                       Similar to the X86 mov byte ptr instruction.
          ldr r3,[r7,#0x18]            Load Register. r3 = *(r7+0x18)
          strh r3,[r1,r5]              Store Register Halfword. *(r1+r5) = r3[15:0]
f8858166  strb r8,[r5,#0x166]          Store Register Byte. *(r5+0x166) = r8[7:0]
e92d48b8  push {r3-r5,r7,r11,lr}       Save registers r3, r4, r5, r7, r11, r14 to the
                                       stack and decrement SP
e8bd8800  pop {r11,pc}                 Restore register r11 and r15 from the stack
                                       and increment SP
          mov r2,r8                    r2 = r8
Special Instructions
Windows uses the ARM CPU's capability of generating exceptions on undefined
instructions to process "well known" undefined instructions, which are essentially
opcodes that are construed as undefined by ARM but convey meaning to the Windows
exception handling mechanism.
16-bit instructions starting with 0xDE are undefined and lead to an
Undefined Instruction exception which is handled by nt!KiUndefinedInstructionException.
While executing an undefined instruction, the CPSR.Mode is set to 11011b i.e. Undefined.
KiUndefinedInstructionException() directly handles certain undefined
instructions like __rdpmccntr64, but for the rest, it simply dispatches the
exception to KiDispatchException() which in turn calls KiPreprocessInternalInvalidOpcode().
WoA uses the following undefined instructions:
Opcode  Mnemonic        Description
0xDEFE  __debugbreak    Breaks into the debugger. Used by ntdll!DbgUserBreakPoint().
0xDEFC  __assertfail    Used to indicate critical assertion failures in the kernel
                        debugger. Used by KeAccumulateTicks()
0xDEFB  __fastfail      Indicates fast fail conditions resulting in
                        KeBugCheckEx(KERNEL_SECURITY_CHECK_FAILURE). Called by
                        functions like InsertTailList() upon detecting a corrupted list.
0xDEFA  __rdpmccntr64   Reads the 64-bit performance counter co-processor register
                        and returns the value in R0+R1. Used by
                        ReadTimeStampCounter(), KiCacheFlushTrial() etc.
0xDEFD  __debugservice  Invoke debugger breakpoint. Used by
                        DbgBreakPointWithStatusEnd(), DebugPrompt() etc.
0xDEF9  __brkdiv0       Divide By Zero Exception, used by functions like
                        nt!_rt_udiv. Also generated by the compiler to check the
                        divisor before division operations.
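When scanning a disassembly or a raw dump for these opcodes, a simple lookup is enough. The Python sketch below encodes only the table above; `classify_undefined_opcode` is a made-up helper name and the fallback text is not something Windows itself reports.

```python
# Values copied from the table of "well known" undefined instructions above.
WELL_KNOWN_UNDEFINED = {
    0xDEFE: "__debugbreak",
    0xDEFD: "__debugservice",
    0xDEFC: "__assertfail",
    0xDEFB: "__fastfail",
    0xDEFA: "__rdpmccntr64",
    0xDEF9: "__brkdiv0",
}

def classify_undefined_opcode(halfword):
    """Name a 16-bit 0xDExx opcode, or return None if it is not in that range."""
    if (halfword >> 8) != 0xDE:
        return None                      # not an undefined 0xDExx opcode
    return WELL_KNOWN_UNDEFINED.get(halfword, "undefined (not listed above)")
```

For example, 0xDEFB maps to __fastfail, while an ordinary instruction such as 0x4770 is reported as not belonging to this group at all.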
Calling Convention
The ARM CPU and the X64 CPU have very similar calling conventions
in that the first four parameters to a function are passed via registers.
However, unlike the X64 that has a register spill space, the ARM compiler does
not reserve any space on the stack for register based parameters.
Another similarity between X64 and ARM is that only the function prolog
and epilog modify the value of the stack pointer (SP), the function body
never changes SP.
The registers used for parameter passing on the ARM CPU are listed below:
R0 = Parameter #1
R1 = Parameter #2
R2 = Parameter #3
R3 = Parameter #4
The fifth parameter onwards is stored on the stack.
The following figure shows the assembler code sequence during a function call.
Figure 7 : Function Parameters
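The register and stack assignment described above can be sketched in a few lines of Python. This is a simplification that assumes every parameter is a 32-bit integer or pointer (64-bit and floating point parameters follow additional rules that are not covered here); the helper name parameter_locations is hypothetical.

```python
ARG_REGISTERS = ("r0", "r1", "r2", "r3")


def parameter_locations(count):
    """Location of each parameter as seen by the caller at the call instruction.

    Assumes 32-bit integer/pointer parameters only (an assumption of this sketch).
    """
    locations = []
    for index in range(count):
        if index < len(ARG_REGISTERS):
            # Parameters #1 to #4 travel in r0 to r3.
            locations.append(ARG_REGISTERS[index])
        else:
            # Parameter #5 onwards is stored on the stack, 4 bytes apart.
            offset = 4 * (index - len(ARG_REGISTERS))
            locations.append("[sp,#0x%x]" % offset)
    return locations


if __name__ == "__main__":
    for number, where in enumerate(parameter_locations(7), start=1):
        print("Parameter #%d -> %s" % (number, where))
```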
Function Prolog and Epilog
The following code snippet is an example of instructions that typically
make up the prolog of a non-leaf function:
nt!ExAllocatePoolWithTag:
                  push  {r4-r11,lr}
                  addw  r11,sp,#0x1C
835a7008 b08f     sub   sp,sp,#0x3C
The push instruction above saves the non-volatile registers R4, R5, R6, R7,
R8, R9, R10, R11 and the link register LR (R14) on the stack. LR holds the
return address that is used to return execution control back to the caller.
The addw instruction sets up the r11 register to point to the location on the stack
where the old r11 register was saved. This creates a frame pointer chain
similar to the one created on the X86 with the EBP register.
And finally the sub instruction creates space on the stack for local variables.
The corresponding function epilog is shown below:
nt!ExAllocatePoolWithTag+0x98:
835a7098 b00f     add   sp,sp,#0x3C
835a709a e8bd8ff0 pop   {r4-r11,pc}
The add instruction in the above snippet simply adjusts the stack pointer
to skip over the local variables.
The pop instruction restores the contents of the non-volatile registers
that were saved in the function prolog.
The value of the saved LR register (i.e. the return address) is restored
back into the PC, thus returning control back to the caller and obviating
the need for an explicit branch instruction.
Figure 8 : Function Prolog and Epilog
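To see why the prolog adds 0x1C to SP to find the saved r11, it helps to decode the register list. The Python sketch below assumes the 32-bit Thumb-2 PUSH.W/POP.W encodings (first halfword 0xE92D or 0xE8BD, second halfword a register bit mask) and relies on the fact that the lowest numbered register is stored at the lowest address. The helper names are hypothetical, and 0xE92D4FF0 is used as the PUSH.W encoding of push {r4-r11,lr}.

```python
REG_NAMES = ["r%d" % n for n in range(13)] + ["sp", "lr", "pc"]


def decode_reglist(opcode):
    """Decode a 32-bit Thumb-2 PUSH.W (0xE92Dxxxx) or POP.W (0xE8BDxxxx)."""
    high, mask = opcode >> 16, opcode & 0xFFFF
    if high == 0xE92D:
        mnemonic = "push"
    elif high == 0xE8BD:
        mnemonic = "pop"
    else:
        raise ValueError("not a PUSH.W/POP.W encoding")
    # Bit n of the mask selects register n.
    regs = [REG_NAMES[bit] for bit in range(16) if mask & (1 << bit)]
    return mnemonic, regs


def saved_offset(regs, name):
    """Byte offset from SP, right after the push, of a saved register."""
    return 4 * regs.index(name)


if __name__ == "__main__":
    _, pushed = decode_reglist(0xE92D4FF0)
    print("pushed:", ",".join(pushed))
    print("saved r11 at sp+0x%X" % saved_offset(pushed, "r11"))
    print("popped:", ",".join(decode_reglist(0xE8BD8FF0)[1]))
```

The offset of the saved r11 comes out as 0x1C, which matches the immediate used by the addw instruction in the prolog.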
The prolog and epilog for leaf functions (i.e. functions that do not call other functions)
are very different from the sequence shown above.
Following is the complete disassembly of a leaf function:
nt!IopGetDeviceAttachmentBase:
                  ldr   r3,[r0,#0xB0]
8345647c e002     b     nt!IopGetDeviceAttachmentBase+0xc
                  ldr   r3,[r3,#0xB0]
                  ldr   r3,[r3,#0x18]
nt!IopGetDeviceAttachmentBase+0x6 (8345647e)
In the code snippet shown above, the LR register contains the return address
of the caller upon entry. Since this function does not modify the LR
register contents, returning to the caller simply involves branching to
LR i.e. "bx lr".
Function Disassembly Walkthrough
To tie together all the concepts introduced above, this section provides
a complete annotated listing of the user mode function CreateFileA() in kernelbase.dll.
Here is the prototype of CreateFileA() along with the registers and stack
locations that would contain the parameters passed in by the caller.
HANDLE WINAPI
CreateFile(
    LPCTSTR lpFileName,                          // P1 = r0
    DWORD dwDesiredAccess,                       // P2 = r1
    DWORD dwShareMode,                           // P3 = r2
    LPSECURITY_ATTRIBUTES lpSecurityAttributes,  // P4 = r3
    DWORD dwCreationDisposition,                 // P5 = stack sp[0]
    DWORD dwFlagsAndAttributes,                  // P6 = stack sp[4]
    HANDLE hTemplateFile );                      // P7 = stack sp[8]
The term "callee" used in the following code snippet refers to the function
CreateFileW() which is called by CreateFileA().
Figure 9 depicts the state of the stack after the sub instruction has
executed i.e. prolog for CreateFileA() has completed.
0:000> uf kernelbase!CreateFileA
KERNELBASE!CreateFileA:
                  push  {r4-r6,r11,lr}  ; save only those non-volatile registers that will be overwritten
757a162c f20d0b0c addw  r11,sp,#0xC     ; point r11 to the location on the stack where the caller's r11 (frame pointer) is stored
                  sub   sp,sp,#0x1C     ; create space for local variables (0x10 bytes) and for parameters to callees (0xc bytes)
                  mov   r6,r1           ; r6 = r1 = dwDesiredAccess (caller P2)
                  mov   r1,r0           ; r1 = r0 = lpFileName (caller P1)
                  add   r0,sp,#0x10     ; r0 = sp + 0x10
                  mov   r4,r3           ; r4 = r3 = lpSecurityAttributes (caller P4)
757a163a 4615     mov   r5,r2           ; r5 = r2 = dwShareMode (caller P3)
757a163c f000fc1a bl    KERNELBASE!Basep8BitStringToDynamicUnicodeString (757a1e74)
                  cbz   r0,KERNELBASE!CreateFileA+0x44 (757a166c) ; if the return value from the previous function call (r0) is 0 then branch to 757a166c (exit)
KERNELBASE!CreateFileA+0x1a:
                  ldr   r3,[sp,#0x38]   ; r3 = *(sp+0x38) = hTemplateFile (caller P7)
                  ldr   r0,[sp,#0x14]   ; r0 = *(sp+0x14) Local = lpFileName (callee P1)
                  mov   r2,r5           ; r2 = r5 = dwShareMode (callee P3)
                  str   r3,[sp,#8]      ; *(sp+0x8) = r3 = hTemplateFile (callee P7)
757a164a 9b0d     ldr   r3,[sp,#0x34]   ; r3 = *(sp+0x34) = dwFlagsAndAttributes (caller P6)
757a164c 4631     mov   r1,r6           ; r1 = r6 = dwDesiredAccess (callee P2)
757a164e 9301     str   r3,[sp,#4]      ; *(sp+0x4) = r3 = dwFlagsAndAttributes (callee P6)
                  ldr   r3,[sp,#0x30]   ; r3 = *(sp+0x30) = dwCreationDisposition (caller P5)
                  str   r3,[sp]         ; *(sp+0x0) = r3 = dwCreationDisposition (callee P5)
                  mov   r3,r4           ; r3 = r4 = lpSecurityAttributes (callee P4)
                  bl    KERNELBASE!CreateFileW (757a231c) ; invoke callee i.e. CreateFileW()
757a165a 4b06     ldr   r3,=KERNELBASE!_imp_RtlFreeUnicodeString
757a165c 4604     mov   r4,r0           ; r4 = r0 = return value from CreateFileW()
757a165e a804     add   r0,sp,#0x10     ; r0 = sp + 0x10 = UnicodeString = P1 to RtlFreeUnicodeString()
                  ldr   r3,[r3]         ; r3 = ntdll!RtlFreeUnicodeString
                  blx   r3              ; call RtlFreeUnicodeString()
KERNELBASE!CreateFileA+0x3c:
                  mov   r0,r4           ; CreateFileA() return value in r0
                  add   sp,sp,#0x1C     ; free locals and parameter space on stack
                  pop   {r4-r6,r11,pc}  ; restore all saved non-volatile registers and return to caller
KERNELBASE!CreateFileA+0x44:
757a166c f06f0400 mvn   r4,#0           ; r4 = ~0x0 = 0xffffffff = -1 (INVALID_HANDLE_VALUE)
                  b     KERNELBASE!CreateFileA+0x3c (757a1664)
Figure 9 : Stack Layout for kernelbase!CreateFileA()
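The stack offsets used in the listing above (0x30, 0x34 and 0x38 for the caller's P5, P6 and P7) follow directly from the prolog: five saved registers plus 0x1C bytes of locals and outgoing parameter space. A minimal Python sketch of that arithmetic, with a hypothetical helper name, is shown below.

```python
def caller_stack_param_offset(param_number, saved_registers, local_bytes):
    """Offset from the callee's SP (after its prolog) to a caller stack parameter.

    param_number is 1-based; only parameter #5 and up live on the stack.
    saved_registers is the number of registers pushed by the prolog and
    local_bytes is the amount subtracted from SP by the prolog.
    """
    if param_number < 5:
        raise ValueError("parameters #1 to #4 are passed in r0-r3")
    return local_bytes + 4 * saved_registers + 4 * (param_number - 5)


if __name__ == "__main__":
    # CreateFileA: push {r4-r6,r11,lr} saves 5 registers, sub sp,sp,#0x1C.
    for number in (5, 6, 7):
        print("P%d at sp+0x%X" % (number, caller_stack_param_offset(number, 5, 0x1C)))
```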
Disassembly Listing
One of the things that quickly becomes apparent when examining
ARM disassembly in WinDBG is that, more often than not, the debugger's
"uf" command displays the following warning.
0: kd> uf nt!IoCallDriver
Flow analysis was incomplete, some code may be missing
This section explains why this happens.
The ARM compiler generates branches to absolute addresses using instruction
sequences similar to the following:
0: kd> uf nt!IoCallDriver
nt!SMKM_STORE_MGR<SM_TRAITS>::SmpPageEvict+0x3b0:
f6411c87 mov r12,#0x1987
f2c83c55 movt r12,#0x8355
4760 bx r12
WinDBG, as of version 6.3.9600, does not pay attention to mov instructions
because they do not fall under the category of flow control instructions.
WinDBG encounters the bx r12 instruction, and gives up on the static disassembly
because it assumes that the value of r12 will be determined at runtime.
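It however misses the fact that the above sequence loads r12 with 0x83551987
(mov supplies the lower 16 bits and movt the upper 16 bits) and therefore amounts to bx 0x83551987,
which is nothing but a call to another function, as shown in the figure below: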
Figure 6 : Indirect Branch
So any time WinDBG encounters an indirect branch via a register it
fails to follow the function in its entirety.
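The branch target that the flow analysis misses can be recovered statically from the two opcodes. The following Python sketch assumes the Thumb-2 MOVW (T3) and MOVT (T1) encodings, in which the 16-bit immediate is scattered across the imm4, i, imm3 and imm8 fields; the helper names are hypothetical.

```python
def decode_mov_imm16(opcode):
    """Decode a 32-bit Thumb-2 MOVW/MOVT into (mnemonic, rd, imm16)."""
    hw1, hw2 = opcode >> 16, opcode & 0xFFFF
    # Mask out the i bit (bit 10) and imm4 (bits 3:0) of the first halfword.
    kind = {0xF240: "mov", 0xF2C0: "movt"}.get(hw1 & 0xFBF0)
    if kind is None or hw2 & 0x8000:
        raise ValueError("not a MOVW/MOVT encoding")
    i = (hw1 >> 10) & 1
    imm4 = hw1 & 0xF
    imm3 = (hw2 >> 12) & 0x7
    rd = (hw2 >> 8) & 0xF
    imm8 = hw2 & 0xFF
    return kind, rd, (imm4 << 12) | (i << 11) | (imm3 << 8) | imm8


def resolve_branch_target(mov_opcode, movt_opcode):
    """Combine a mov/movt pair that targets the same register."""
    low_kind, low_rd, low = decode_mov_imm16(mov_opcode)
    high_kind, high_rd, high = decode_mov_imm16(movt_opcode)
    if (low_kind, high_kind) != ("mov", "movt") or low_rd != high_rd:
        raise ValueError("expected a mov/movt pair on one register")
    return (high << 16) | low


if __name__ == "__main__":
    target = resolve_branch_target(0xF6411C87, 0xF2C83C55)
    # Bit 0 of a bx target selects Thumb state; the code itself starts at target & ~1.
    print("bx target 0x%08X, code at 0x%08X" % (target, target & ~1))
```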
Co-Processor
The ARM CPU has multiple co-processors that implement functionality that is not
a part of core instruction execution. The co-processors that are used by Windows,
as well as other operating systems, are:
CP10 (Vector Floating Point Co-processor)
CP14 (Debug Co-processor)
CP15 (System control Co-processor)
The MRC and MCR instructions are used to access the co-processor registers.
The VFP (CP10) can also be accessed using the VMSR and VMRS instructions.
The compiler intrinsics _MoveFromCoprocessor() and _MoveToCoprocessor()
and their variants can be used to access ARM co-processors from C/C++.
The Visual Studio 2013 CRT source file "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\crt\src\ARM\helpexcept.c"
has examples on how to use these intrinsics.
Since the CP15 co-processor contains the most critical registers required
by Windows, some details of this co-processor are included in this section.
The CP15 registers are organized by function groups with each group
represented by a single primary co-processor register referred to as CRn.
The function group descriptions and the corresponding primary control registers
are listed in the table below:
CRn   Functionality
c0    ID and Feature Registers
c1    System Control Register
c2    Translation Table Base
c3    Domain Access Control
c5    Fault Status
c6    Fault Address Register
c7    Cache/Write Buffer Control
c8    TLB Maintenance Operations
c9    Performance Counters
c10   Memory Mapping Registers & TLB Operations
c11   DMA Control
c12   Security Extensions registers
c13   Process, Context & Thread ID Registers
The following table contains some examples of CP15 registers that are used
by Windows for various low level operations.
Individual CP15 registers are selected by the primary co-processor register
(CRn), the secondary co-processor register (CRm), OpCode #1 (Op1) and OpCode#2 (Op2).
CP#   Opc1   CRn   CRm   Opc2   Description
p15   0      c1    c0    0      SCTLR System Control Register (Used by KiInitializeExceptionVectorTable to setup exception handling)
p15   0      c2    c0    0      TTBR0 Translation Table Base Register 0 (KxSwapProcess writes the Page Table Base Address to this during context switch, similar to X86 CR3)
p15   0      c5    c0    0      DFSR Data Fault Status Register (KiDataAbortException uses this to find the type of data fault that occurred)
p15   0      c5    c0    1      IFSR Instruction Fault Status Register (KiPrefetchAbortException uses this to find the type of instruction fetch fault that occurred)
p15   0      c6    c0    0      DFAR Data Fault Address Register (KiDataAbortException uses this to find the address at which the fault occurred, similar to X86 CR2)
p15   0      c6    c0    2      IFAR Instruction Fault Address Register (KiPrefetchAbortException uses this to find the address at which the fault occurred, similar to X86 CR2)
p15   0      c9    c13   0      PMCCNTR Cycle Count Register (Used by the compiler intrinsic __rdpmccntr64)
p15   0      c12   c0    0      VBAR Vector Base Address Register (base of exception table, contains nt!KiArmExceptionVectors)
p15   0      c13   c0    1      CONTEXTIDR Context ID Register (contains Address Space IDentifier i.e. KPROCESS->Asid)
p15   0      c13   c0    2      TPIDRURW Thread ID User Read Write (TEB Thread Environment Block)
p15   0      c13   c0    3      TPIDRURO Thread ID User Read Only, Privileged Read Write (31:6 KTHREAD, 3:0 IRQL)
p15   0      c13   c0    4      TPIDRPRW Privileged Read Write (KPCR Kernel Processor Control Region)
A full list of co-processor options is available in
The following code snippet shows the MRC and MCR instructions accessing
the contents of the TPIDRURO register in CP15 using primary register c13,
secondary register c0, OpCode1=0 and OpCode2=3.
The MRC instruction reads the contents of TPIDRURO into ARM register r0.
The MCR instruction writes the contents of ARM register r0 to TPIDRURO.
Figure 10 labels the various operands passed to the MRC instruction.
ee1d0f70 mrc   p15,#0,r0,c13,c0,#3 ; r0 = TPIDRURO
ee0d0f70 mcr   p15,#0,r0,c13,c0,#3 ; TPIDRURO = r0
Figure 10 : Co-processor Register Access
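The operands of MRC and MCR can also be read straight out of the 32-bit opcode. The Python sketch below assumes the standard MRC/MCR field layout (opc1 in bits 23:21, direction in bit 20, CRn in bits 19:16, Rt in bits 15:12, co-processor number in bits 11:8, opc2 in bits 7:5 and CRm in bits 3:0); the helper names are hypothetical.

```python
def decode_coproc_move(opcode):
    """Split a 32-bit MRC/MCR opcode into its operand fields."""
    if (opcode >> 24) & 0xF != 0xE or not (opcode >> 4) & 1:
        raise ValueError("not an MRC/MCR encoding")
    return {
        "mnemonic": "mrc" if (opcode >> 20) & 1 else "mcr",
        "coproc": (opcode >> 8) & 0xF,
        "opc1": (opcode >> 21) & 0x7,
        "rt": (opcode >> 12) & 0xF,
        "crn": (opcode >> 16) & 0xF,
        "crm": opcode & 0xF,
        "opc2": (opcode >> 5) & 0x7,
    }


def format_coproc_move(opcode):
    """Render the opcode in the operand order used by the listings."""
    f = decode_coproc_move(opcode)
    return "%s p%d,#%d,r%d,c%d,c%d,#%d" % (
        f["mnemonic"], f["coproc"], f["opc1"], f["rt"], f["crn"], f["crm"], f["opc2"])


if __name__ == "__main__":
    for value in (0xEE1D0F70, 0xEE0D0F70, 0xEE1D3F50):
        print("%08x %s" % (value, format_coproc_move(value)))
```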
Here are some Windows kernel mode functions that access CP15 registers.
0: kd> uf nt!PsGetCurrentProcess
nt!PsGetCurrentProcess:
83442a18 ee1d3f70 mrc   p15,#0,r3,c13,c0,#3 ; r3 = TPIDRURO
83442a1c f033033f bics  r3,r3,#0x3F         ; r3 = r3 & ~0x3f = KTHREAD
                  ldr   r0,[r3,#0x74]       ; r0 = *(r3 + 0x74) = KTHREAD.ApcState.Process
                  bx    lr                  ; return
0: kd> uf hal!KeGetCurrentIrql
hal!KeGetCurrentIrql:
         ee1d3f70 mrc   p15,#0,r3,c13,c0,#3 ; r3 = TPIDRURO
                  ands  r0,r3,#0xF          ; r0 = r3 & 0xf = Irql
                  bx    lr                  ; return
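Both functions above extract a different field from the same TPIDRURO value. The masking can be sketched in Python as follows; the sample register value is made up for illustration and the helper name is hypothetical.

```python
def split_tpidruro(value):
    """Split a 32-bit TPIDRURO value the way the two functions above do."""
    kthread = value & ~0x3F & 0xFFFFFFFF  # bics r3,r3,#0x3F
    irql = value & 0xF                    # ands r0,r3,#0xF
    return kthread, irql


if __name__ == "__main__":
    # 0x8E7A4C42 is a hypothetical register value, not taken from a real system.
    kthread, irql = split_tpidruro(0x8E7A4C42)
    print("KTHREAD = 0x%08X, IRQL = %d" % (kthread, irql))
```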
Following is a user mode function that accesses CP15.
Some of the low level macros like NtGetCurrentTeb() which access CP15 are
defined in winnt.h.
0:000> uf kernel32!GetCurrentThreadId
kernel32!GetCurrentThreadId:
77361fd0 ee1d3f50 mrc   p15,#0,r3,c13,c0,#2 ; r3 = TPIDRURW
                  ldr   r0,[r3,#0x24]       ; r0 = TEB.ClientId.UniqueThread
                  bx    lr                  ; return
System Calls
The SVC instruction causes a Supervisor Call exception.
This provides a mechanism for unprivileged software (user mode applications)
to make calls into the operating system (kernel routines).
WoA uses this mechanism to implement native system calls similar
to the int 0x2e, sysenter and syscall instructions on the X86 and X64 CPUs.
In the code snippet shown below the NTDLL native API NtClose uses the
SVC #1 instruction to invoke the exception handler for system call exceptions
(nt!KiSWIException).
The service index for NtClose() is 0x0d. The usage of register r12 to pass
the service index into the system call is recommended by the ARM
Application Binary Interface (ABI).
0:000> uf ntdll!NtClose
ntdll!NtClose:
77b8e230 f04f0c0d mov   r12,#0xD ; r12 = system call identifier
77b8e234 df01     svc   #1       ; call into kernel
77b8e236 4770     bx    lr       ; return to caller
WoA uses a system service dispatch table similar to the one on X64.
The kernel variable nt!KiServiceTable points to a table that
contains 32 bit entries each containing a 28 bit relative service offset
and a 4 bit argument count.
The kernel initialization function nt!KeCompactServiceTable() sets up the table.
The logic ServiceAddress = KiServiceTable + ( KiServiceTable[ServiceIndex] >> 4 )
computes the address of the function that implements the native service.
When the service routine completes, the "return from exception" instruction
(i.e. RFE sp) transfers execution back to user mode.
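The table entry format described above can be sketched in Python as shown below. The sketch assumes that the 28-bit offset is signed (i.e. the shift is arithmetic, as it is on X64), which the text does not state explicitly; the table base and entry values are made up for illustration and the helper name is hypothetical.

```python
def decode_service_entry(table_base, entry):
    """Return (service_address, argument_count) for a 32-bit table entry."""
    argument_count = entry & 0xF
    # Assumption: the entry is a signed 32-bit value, so >> 4 is an arithmetic shift.
    signed = entry - (1 << 32) if entry & 0x80000000 else entry
    offset = signed >> 4
    return (table_base + offset) & 0xFFFFFFFF, argument_count


if __name__ == "__main__":
    # Hypothetical values, not taken from a real nt!KiServiceTable.
    address, count = decode_service_entry(0x80000000, 0x00012342)
    print("service at 0x%08X, argument count %d" % (address, count))
```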
The following example shows the address of the function nt!NtClose being
computed relative to the base of the table at nt!KiServiceTable using the
service index 0x0d.
0: kd> u nt!KiServiceTable + ( poi(nt!KiServiceTable + (d * 4)) >> 4 )
nt!NtClose+0x1:
                  push  {r4-r11,lr}
                  addw  r11,sp,#0x1C
8364a92c f695ffea bl    nt!_security_push_cookie (834e0904)
                  sub   sp,sp,#0x28
         ee1d3f70 mrc   p15,#0,r3,c13,c0,#3
                  bics  r3,r3,#0x3F
8364a93c f993815a ldrsb r8,[r3,#0x15A]
Exception Handling
On the X86/X64 CPU the Interrupt Descriptor Table (IDT) contains pointers
to exception handlers, software interrupt handlers and hardware interrupt handlers.
The ARM CPU, on the other hand, has a separate exception vector table that contains
instruction opcodes instead of function pointers.
The opcode for each type of exception in the table is the same (0xf8dff01c).
It encodes a PC relative load (ldr pc,[pc,#0x1c]) that reads the address of
the handler for that exception from a slot located 0x20 bytes after the
vector entry and transfers execution control to it.
As a part of system startup, the kernel function nt!KiInitializeExceptionVectorTable()
writes the address of the Windows exception vector table (nt!KiArmExceptionVectors) to
the Vector Base Address Register (VBAR) in CP15.
The ARM exception table along with the registered exception handlers is shown below.
0: kd> u nt!KiArmExceptionVectors
nt!KiArmExceptionVectors:
834dc6a0 f8dff01c ldr pc,=0xFFFFFFFF ; [nt!KiArmExceptionVectors+0x20(834dc6c0)]
834dc6a4 f8dff01c ldr pc,=nt!KiUndefinedInstructionException+0x1 (834dade1) ; [nt!KiArmExceptionVectors+0x24 (834dc6c4)]
834dc6a8 f8dff01c ldr pc,=nt!KiSWIException+0x1 (834db941) ; [nt!KiArmExceptionVectors+0x28 (834dc6c8)]
834dc6ac f8dff01c ldr pc,=nt!KiPrefetchAbortException+0x1 (834db001) ; [nt!KiArmExceptionVectors+0x2c (834dc6cc)]
834dc6b0 f8dff01c ldr pc,=nt!KiDataAbortException+0x1 (834db161) ; [nt!KiArmExceptionVectors+0x30 (834dc6d0)]
834dc6b4 f8dff01c ldr pc,=0xFFFFFFFF ; [nt!KiArmExceptionVectors+0x34(834dc6d4)]
834dc6b8 f8dff01c ldr pc,=nt!KiInterruptException+0x1 (834db601) ; [nt!KiArmExceptionVectors+0x38 (834dc6d8)]
834dc6bc f8dff01c ldr pc,=nt!KiFIQException+0x1 (834db721) ; [nt!KiArmExceptionVectors+0x3c (834dc6dc)]
Figure 11 : ARM Exception Table
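The bracketed addresses in the listing above can be reproduced from the opcode alone. The Python sketch below assumes the Thumb-2 LDR (literal) encoding, where the low 12 bits hold the offset and the base is the address of the instruction plus 4, aligned down to 4 bytes; the helper name is hypothetical.

```python
def vector_literal_slot(vector_address, opcode=0xF8DFF01C):
    """Address of the slot from which a vector entry loads the handler address."""
    # 0xF8DF = LDR (literal) with a positive offset, top nibble of the
    # second halfword = destination register, which must be PC (15).
    if opcode >> 16 != 0xF8DF or (opcode >> 12) & 0xF != 0xF:
        raise ValueError("expected ldr pc,[pc,#imm12]")
    imm12 = opcode & 0xFFF
    pc = (vector_address + 4) & ~3  # Thumb PC reads as instruction + 4, word aligned
    return pc + imm12


if __name__ == "__main__":
    base = 0x834DC6A0  # nt!KiArmExceptionVectors from the listing above
    for index in range(8):
        entry = base + 4 * index
        print("%08x -> [%08x]" % (entry, vector_literal_slot(entry)))
```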
On the X86 and X64 there is a single exception handler that handles all
types of page faults.
On the ARM CPU there are two different handlers one for data page faults
(nt!KiDataAbortException) and another one for code page faults
(nt!KiPrefetchAbortException). Both these exception handlers call the
common routine nt!KiCommonMemoryManagementAbort to perform the bulk of
page fault handling.
Fast IRQ handling is not supported on the WoA platform.
Examining the implementation of the FIQ exception handler (nt!KiFIQException)
shows that this function if ever called would bug-check the system with
the stop code 0x3d (INTERRUPT_EXCEPTION_NOT_HANDLED).
0: kd> uf KiFIQException
nt!KiFIQException:
834db720 e98dc011 srs
834db724 e9cd4502 strd  r4,r5,[sp,#8]
834db728 466c     mov   r4,sp
nt!KiFIQException+0x10c:
834db82c 203d     movs  r0,#0x3D
834db82e 2100     movs  r1,#0
834db830 2200     movs  r2,#0
834db832 2300     movs  r3,#0
834db834 468c     mov   r12,r1
834db836 f000fa51 bl    nt!KiBugCheckDispatch (834dbcdc)
834db83a defe     __debugbreak
Interrupt Descriptor Tables
On the X86/X64 CPU, drivers register their interrupt service routines (ISRs)
through a system provided template directly in the interrupt descriptor table (IDT).
ARM platforms that have a generic interrupt controller (GIC) do not
support vectored interrupts.
So WoA routes all hardware interrupts through a single entry point
(nt!KiInterruptException) which is responsible for determining the source
of the interrupt from the GIC and then dispatching the interrupt to the
appropriate driver's ISR.
Similar to the X64 CPU, WoA uses a total of 16 IRQLs.
The IRQLs associated with hardware devices are in the range 0x8 through 0xb.
For each device IRQL, the first 16 device interrupts at that IRQL are
registered directly in the KPCR->Idt[] array. Any overflow interrupts, i.e.
those beyond the 16 interrupts per device IRQL, are registered in the
KPCR->IdtExt[] array.
The function KiConnectInterruptInternal() determines if there is an overflow
situation and accordingly allocates the extended IDT at KPCR->IdtExt from
NonPagedPool with 0x400 entries.
Both the primary IDT (KPCR->Idt[]) and the extended IDT (KPCR->IdtExt[])
contain pointers to KINTERRUPT structures that were allocated as a result
of drivers registering their ISRs.
The following debugger commands show one such KINTERRUPT structure.
0: kd> !pcr
KPCR for Processor 0 at 835df000:
Major 1 Minor 1
Panic Stack
Irql addresses:
Routine 835df000
0: kd> dt nt!_KPCR 835df000 -a Idt
+0x12c Idt :
[128] 0x8fb38a80 Void
[129] 0x8fb38880 Void
[130] 0x8e723980 Void
[131] 0x8e723600 Void
[132] 0x8e723900 Void
[133] 0x8e723200 Void
[134] 0x8e723e00 Void
[135] 0x8e723e80 Void
[144] 0x8fb38b00 Void
[145] 0x8fb38900 Void
[146] 0x8e723a80 Void
[147] 0x8e723680 Void
[148] 0x8e723a00 Void
[149] 0x8e723500 Void
[150] 0x8e723300 Void
[151] 0x8e723f00 Void
0: kd> dt nt!_KINTERRUPT 0x8fb38a80
+0x000 Type
+0x002 Size
+0x004 InterruptListEntry : _LIST_ENTRY [ 0x8fb38a84 - 0x8fb38a84 ]
+0x00c ServiceRoutine : unsigned char dxgkrnl!DpiFdoLineInterruptRoutine+0
+0x010 MessageServiceRoutine : (null)
+0x014 MessageIndex
+0x018 ServiceContext : 0x87b1d768 Void
+0x01c SpinLock
+0x020 TickCount
+0x024 ActualLock : 0x877cb360
+0x028 DispatchAddress
+0x02c Vector
+0x030 Irql
+0x031 SynchronizeIrql
+0x032 FloatingSave
+0x033 Connected
+0x034 Number
+0x038 ShareVector
+0x03a ActiveCount
+0x03c InternalState
+0x040 Mode : 0 ( LevelSensitive )
+0x044 Polarity : 0 ( InterruptPolarityUnknown )
+0x048 ServiceCount
+0x04c DispatchCount
+0x050 PassiveEvent
+0x054 TrapFrame : 0x82b30d30 _KTRAP_FRAME
+0x058 DispatchCode
+0x068 DisconnectData
+0x06c ServiceThread
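The indexes shown in the KPCR->Idt[] dump above (128 to 135 and 144 to 151) suggest that each IRQL owns a block of 16 consecutive entries, so that IRQL 0x8 starts at index 128 and IRQL 0x9 at index 144. The Python sketch below encodes that reading; the index formula is an inference from the dump rather than something confirmed by the kernel sources, and the helper name is hypothetical.

```python
ENTRIES_PER_IRQL = 16  # "the first 16 device interrupts at that IRQL"


def primary_idt_index(irql, slot):
    """Assumed index into KPCR->Idt[] for the slot-th interrupt at an IRQL.

    Returns None when the slot overflows the primary IDT, in which case the
    interrupt would be registered in the extended IDT instead.
    """
    if not 0 <= irql <= 0xF:
        raise ValueError("WoA uses 16 IRQLs (0x0 through 0xf)")
    if slot < 0:
        raise ValueError("slot must be non-negative")
    if slot >= ENTRIES_PER_IRQL:
        return None
    # Assumption inferred from the dump: 16 entries per IRQL, laid out back to back.
    return irql * ENTRIES_PER_IRQL + slot


if __name__ == "__main__":
    print(primary_idt_index(0x8, 0), primary_idt_index(0x9, 7), primary_idt_index(0x8, 16))
```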
Unlike the X86/X64 where the IDT is a hardware defined structure, on the
ARM CPU the IDT is software defined.
This has an interesting security benefit in that the KINTERRUPT structure
on ARM no longer needs to contain any executable code, as can be observed
from the size of the KINTERRUPT.DispatchCode[] array in the above output,
and hence it can be allocated out of Non-Executable NonPagedPool.
In addition to the primary and extended IDTs described above, WoA
also uses a global secondary IDT for General Purpose I/O (GPIO) interrupts.
This IDT is allocated from non-paged pool and is pointed to by the global
variable nt!KiGlobalSecondaryIDT.
Each entry in this table is of type KSECONDARY_IDT_ENTRY which contains
an embedded KINTERRUPT structure as shown below.
The current implementation allocates the secondary IDT with 0x100 entries.
0: kd> db nt!KiSecondaryInterruptServicesEnabled L1
0: kd> ? poi (nt!KiGlobalSecondaryIDT)
Evaluate expression: -2037473280 = 868ea000
0: kd> dt 868ea000 nt!_KSECONDARY_IDT_ENTRY
+0x000 SpinLock
+0x004 ConnectLock
+0x014 LineMasked
+0x018 InterruptList : 0x8fb38e80 _KINTERRUPT
0: kd> dt 0x8fb38e80 nt!_KINTERRUPT
+0x000 Type
+0x002 Size
+0x004 InterruptListEntry : _LIST_ENTRY [ 0x8fb38e84 - 0x8fb38e84 ]
+0x00c ServiceRoutine : 0x8c7ab761 unsigned char portcls!CInterruptSync::GetKInterrupt+0
+0x010 MessageServiceRoutine : (null)
+0x014 MessageIndex
+0x018 ServiceContext : 0x87d92d68 Void
+0x01c SpinLock
+0x020 TickCount
+0x024 ActualLock : 0x87d92d98
+0x028 DispatchAddress
+0x02c Vector
+0x030 Irql
+0x031 SynchronizeIrql
+0x032 FloatingSave
+0x033 Connected
+0x034 Number
+0x038 ShareVector
+0x03a ActiveCount
+0x03c InternalState
+0x040 Mode : 1 ( Latched )
+0x044 Polarity : 0 ( InterruptPolarityUnknown )
+0x048 ServiceCount
+0x04c DispatchCount
+0x050 PassiveEvent
+0x054 TrapFrame
+0x058 DispatchCode
+0x068 DisconnectData
+0x06c ServiceThread
As of WinDBG v6.3.9600, the debugger's "!idt" and "!idt -a"
commands display all of the 3 IDTs mentioned above, but only expand the
entries in the secondary IDT as shown below:
0: kd> !idt -a
Dumping IDT: 835df12c
Dumping Extended IDT:
Dumping Secondary IDT: 868ea000
1000:portcls!CInterruptSync::GetKInterrupt+0x20 (KINTERRUPT 8fb38e80)
1002:mbtu97w8arm+0x358c (KMDF) (KINTERRUPT 8fb38f00)
1003:hidi2c!OnInterruptIsr (KMDF) (KINTERRUPT 8e723180)
1004:SurfaceHomeButton+0x28bc (KMDF) (KINTERRUPT 8e723100)
1005:SurfaceHomeButton+0x28bc (KMDF) (KINTERRUPT 8e723080)
1006:SurfaceHomeButton+0x28bc (KMDF) (KINTERRUPT 8e723000)
1007:SurfaceHomeButton+0x28bc (KMDF) (KINTERRUPT 8fb38f80)
1008:hidi2c!OnInterruptIsr (KMDF) (KINTERRUPT 8e723280)
100a:nvthml+0x2ebc (KMDF) (KINTERRUPT 8fb38d00)
100b:sdbus!SdbusGpioInterrupt (KINTERRUPT 8fb38b80)
Conclusion
This article described the ARM CPU, registers and Thumb-2 instructions.
It explained the functionality of the instructions typically seen in code
generated by the Visual Studio compiler as well as details of the function
calling convention.
It covered some unique aspects of the ARM CPU like the barrel shifter,
the co-processors, and explicit opcodes used for memory barriers and undefined
instructions, while also explaining how such aspects are used by Windows.
This article also highlighted some of the key differences between how
certain features like trap frames, exception handling, interrupt dispatching,
interrupt descriptor tables, system calls and interlocked operations are
implemented on ARM as compared to X86/X64.
Special thanks to
(@aionescu) for his review and valuable feedback on this article.