Chapter 18. System

System "Lite"

There are other instructions available in your processor, but they have little to no relationship to your application code. As mentioned at the beginning of this book, there are basically three types of instructions. (Note that I am oversimplifying here!) They are general-purpose, floating-point, and system instructions. The latter exist for writing system-level (that is, operating system) code. They are not typically accessible to, or needed by, programmers writing non-operating system code. As this book is not targeted at that market, there is no need to make the standard application programmer wade through them. But as some of you may just cry foul, I have included a very light overview of these instructions. Besides, there are some tidbits in here for all of you!

Chapter 3, "Processor Differential Insight," as well as Chapter 16, "What CPUID?" gave some background on the processor. We shall now continue with that information. Some of what is included here is not necessarily just for system programmers as some features of the 80×86 are system related but are accessible from the application level. Note the System "Lite" part? Keep in mind that this is a superficial overview. If you need an in-depth explanation, please refer to documentation direct from the manufacturer.

System Timing Instructions

RDPMC — Read Performance-Monitoring Counters

Mnemonic      P     PII   K6    3D!   3Mx+  SSE   SSE2  A64   SSE3  E64T
RDPMC         ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓

rdpmc

This instruction loads the 40-bit performance monitoring counter indexed by ECX into the EDX:EAX register pair. For 64-bit mode, RDX[0...31]:RAX[0...31]=[RCX]. This instruction is accessible from any layer inclusive of the application layer only if the PCE flag in CR4 is set. When the flag is clear, this instruction can only be run from privilege level 0.

RDTSC — Read Time-Stamp Counter

Mnemonic      P     PII   K6    3D!   3Mx+  SSE   SSE2  A64   SSE3  E64T
RDTSC         ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓

rdtsc

This system instruction reads the 64-bit time-stamp counter and loads the value into the EDX:EAX registers. The counter is incremented every clock cycle and is cleared (reset) to zero upon the processor being reset. This instruction is accessible from any layer inclusive of the application layer unless the TSD flag in CR4 is set. So far while running under Win32 the flag has been clear as a default, thus allowing an application to access this instruction.

  ;       void CpuDelaySet(void)

         public  CpuDelaySet
  CpuDelaySet    proc    near
          rdtsc                      ; Read time-stamp counter

          mov     tclkl,eax          ; Save low 32 bits
          mov     tclkh,edx          ; Save high 32 bits
          ret
  CpuDelaySet    endp
  ; long int CpuDelayCalc(void)
  ;
  ; This function is called after CpuDelaySet() to get the
  ; elapsed interval in clock cycles.
  ;
  ; Note: On a 400MHz computer, only reading the lower 32 bits
  ; gives a maximum 10 second sampling before rollover.

           public   CpuDelayCalc
  CpuDelayCalc proc     near
           rdtsc                     ; Read time-stamp counter

           sub      eax,tclkl
           sbb      edx,tclkh        ; edx:eax = total elapsed interval

           ret                       ; return edx:eax = 64 bits of info.
  CpuDelayCalc endp

These two functions can be used for time trials while optimizing code. Due to multithreaded environments, another thread or interrupt can steal your time slice while you are trying to do time analysis on a bit of code. You could divide the number of loops into the total delay to get an average loop delay count. What I like to do is run a benchmark of executing the same code a few thousand times, ignoring the effects the prefetch has on these times or the fact the Nth time around the data is already sitting in memory. One time I took the governor off an MPEG decoder so it would run full speed, allowing code to be optimized so that it would run faster and faster.
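As a rough illustration of the averaging mentioned above, the following C sketch divides the elapsed count by the number of loop iterations. The name CyclesPerLoop and its parameters are assumptions made for this example and are not part of the routines above; the two counts are presumed to be 64-bit time-stamp readings taken before and after the timed loop.

```c
#include <stdint.h>

/* Hypothetical helper (not from the listing above): average clock
   cycles per loop iteration. tStart and tEnd are 64-bit time-stamp
   counts read before and after the loop; nLoops is the number of
   iterations that were timed. */
uint64_t CyclesPerLoop(uint64_t tStart, uint64_t tEnd, uint32_t nLoops)
{
    if (nLoops == 0)                  /* Nothing was timed */
    {
        return 0;
    }

    /* Unsigned subtraction stays correct if the counter wrapped once */
    return (tEnd - tStart) / nLoops;
}
```

The average still includes any time stolen by other threads or interrupts, so it is best treated as an upper estimate rather than an exact cost.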

Calculating Processor Speed

The following code snippet can be included within your own code for determining computer speed. The computer quite often is not running at the speed you may think. I had a weird problem in an application running on my laptop and it did not make any sense until I wrote this code. Even then I thought it had a bug, until I realized the laptop had a thermal problem and had dropped its processor speed by 50% or more so as to run cooler. Clients running your application may have some weird problems or be misinformed of their machines' capabilities, and this code can give you or customer support representatives more debugging insight.

  #include <windows.h>
  #include <mmsystem.h>          // Multimedia timer API (link winmm.lib)

  typedef unsigned int uint;

  void CpuDelaySet(void);        // Assembly functions from the
  long int CpuDelayCalc(void);   // previous listing

  typedef struct SpeedDataType
  {
    volatile uint tSpeed;        // Clock cycles per timer tick (averaged)
    volatile uint tSpeedState;   // 0=start, 1=1st tick, 2=running average
    volatile uint nCnt;          // Number of intervals measured
    uint wTimerID;               // ID returned by timeSetEvent()
  } SpeedData;


  // Win32 Timer - Calculate CPU Speed

  void CALLBACK SpeedCalcTimer(UINT wTimerID, UINT msg, DWORD_PTR dwUser,
                               DWORD_PTR dw1, DWORD_PTR dw2)
  {
     SpeedData *sp = (SpeedData *)dwUser;

     if (sp->wTimerID != wTimerID) // Is this our timer ID?
     {
        return;
     }

     switch(sp->tSpeedState)
     {
     case 2:                      // 2nd tick (avg of the two intervals)
        sp->tSpeed = ((uint)CpuDelayCalc() + sp->tSpeed) >> 1;
        sp->nCnt++;
        CpuDelaySet();
        break;

     case 1:                      // 1st tick
        sp->tSpeed = (uint)CpuDelayCalc();
        sp->nCnt++;

        // Allow flow through!

     case 0:                      // Starting tick
        CpuDelaySet();
        sp->tSpeedState++;
        break;

     default:
        break;
     }
  }


  // Be VERY careful when this is called, as your OS may not like it!
  // Returns the approximate processor speed in MHz.

  uint SpeedCalc(void)
  {
     TIMECAPS tc;
     uint wTimerRes, nCnt, bPeriod;
     SpeedData sd;

     wTimerRes = 1;
     bPeriod = 0;

     // Set the timer resolution for the multimedia timer
     if (TIMERR_NOERROR == timeGetDevCaps(&tc, sizeof(TIMECAPS)))
     {
        wTimerRes = min(max(tc.wPeriodMin, 1), tc.wPeriodMax);
        bPeriod = (TIMERR_NOERROR == timeBeginPeriod(wTimerRes)); // 1ms resolution
     }

     sd.nCnt = sd.tSpeed = sd.tSpeedState = 0;
     sd.wTimerID = 0;

     sd.wTimerID = timeSetEvent(1, wTimerRes, SpeedCalcTimer,
         (DWORD_PTR)&sd, TIME_PERIODIC | TIME_KILL_SYNCHRONOUS);

     if (sd.wTimerID)               // If we were given a TimerId
     {                              // (Should not fail!)
        do {
           nCnt = sd.nCnt;
           Sleep(10);               // Sleep 10ms

           if (sd.nCnt > 100)       // Cycle 100 times
           {
              timeKillEvent(sd.wTimerID);
              if (bPeriod) timeEndPeriod(wTimerRes);
              return sd.tSpeed/1000;   // Cycles per 1ms tick -> MHz
           }
        } while (nCnt != sd.nCnt);  // If the same, the timer failed!

        timeKillEvent(sd.wTimerID);
     }

     if (bPeriod) timeEndPeriod(wTimerRes);

     // Didn't work? Try it the really not-so-accurate way!

     CpuDelaySet();
     Sleep(10);
     return (uint)CpuDelayCalc()/10000;   // Cycles per 10ms -> MHz
  }
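The two divisors in the listing (1000 for a 1 ms tick, 10000 for a 10 ms sleep) both convert a cycle count into megahertz. A minimal sketch of that arithmetic follows; CyclesToMHz is a hypothetical name introduced here for illustration only.

```c
#include <stdint.h>

/* Hypothetical helper: convert a cycle count measured over an
   interval of nMillisec milliseconds into megahertz.
   (cycles per millisecond) / 1000 = cycles per microsecond = MHz. */
uint32_t CyclesToMHz(uint64_t nCycles, uint32_t nMillisec)
{
    if (nMillisec == 0)               /* Avoid a divide by zero */
    {
        return 0;
    }

    return (uint32_t)(nCycles / ((uint64_t)nMillisec * 1000u));
}
```

For example, 400,000 cycles counted during a 1 ms tick and 4,000,000 cycles counted during a 10 ms sleep both work out to 400 MHz.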

80×86 Architecture

The Intel and AMD processors have similar functional architecture. Different processors have different numbers of caches, on chip cache, off chip cache, different speeds, different instruction sets, different methods of pipelining instructions. All this book is interested in is helping you, the application programmer, make your code go fast by writing it in assembly. You have no control over what flavor of processor the user of your application chooses to run their applications on. (Of course you could program your application to check these parameters and refuse to run on a system you do not like! But that would be evil!)

test  ebx,ebx
mov   ecx,ebx
mov   esi,ebx
mov   edi,ebx
test  ebx,ebx

Using full registers (as in the 32-bit Protected Mode snippet above) allows the instructions to execute on the same clock.

Partial stalls occur if a short version of a register is written to and then immediately followed by a larger version. For example:

mov al,9
add bx,ax     ; clock stall

mov al,9
add ebx,eax   ; clock stall

mov ax,9
add ebx,eax   ; clock stall

Writing to the AL register causes the next instruction to incur a partial stall if that instruction reads a larger form of the same register, such as AX, EAX, or RAX. This is like being at a red signal light in your car and when the light turns green you slam down on the accelerator; your car will sputter, spit a little, hesitate (stall), and then finally accelerate.

CPU Status Registers (32-Bit EFLAGS/64-Bit RFLAGS)

Figure 18-1. CPU status register

EFLAG         Code         Bit       Flag Descriptions
EFLAGS_CF     000000001h   0         Carry
-             000000002h   1         1
EFLAGS_PF     000000004h   2         Parity
-             000000008h   3         0
EFLAGS_AF     000000010h   4         Auxiliary Carry
-             000000020h   5         0
EFLAGS_ZF     000000040h   6         Zero
EFLAGS_SF     000000080h   7         Sign
EFLAGS_TF     000000100h   8         Trap
EFLAGS_IF     000000200h   9         Interrupt Enable
EFLAGS_DF     000000400h   10        Direction
EFLAGS_OF     000000800h   11        Overflow
EFLAGS_IOPL   000003000h   12, 13    I/O Privilege Level
EFLAGS_NT     000004000h   14        Nested Task
-             000008000h   15        0
EFLAGS_RF     000010000h   16        Resume
EFLAGS_VM     000020000h   17        Virtual-8086 Mode
EFLAGS_AC     000040000h   18        Alignment Check
EFLAGS_VIF    000080000h   19        Virtual Interrupt
EFLAGS_VIP    000100000h   20        Virtual Interrupt Pending
EFLAGS_ID     000200000h   21        CPUID
-             -            22...31   0

And in 64-bit mode the upper 32 bits of the RFLAGS register (0:EFLAGS):

-             -            32...63   RFLAG (extra) bits
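The codes in the table are bit masks, so testing a flag is a single AND. The following C sketch shows one way this might look, given a copy of the EFLAGS value (for example, one obtained with PUSHFD/POP). The function names EflagsTest and EflagsIopl are assumptions for this example; the mask values come from the table.

```c
#include <stdint.h>

#define EFLAGS_CF    0x00000001u   /* Bit 0: Carry                  */
#define EFLAGS_ZF    0x00000040u   /* Bit 6: Zero                   */
#define EFLAGS_IF    0x00000200u   /* Bit 9: Interrupt Enable       */
#define EFLAGS_IOPL  0x00003000u   /* Bits 12,13: I/O Priv. Level   */

/* Hypothetical helper: returns 1 if any bit in mask is set */
int EflagsTest(uint32_t eflags, uint32_t mask)
{
    return (eflags & mask) != 0;
}

/* Hypothetical helper: extracts the 2-bit I/O privilege level */
unsigned EflagsIopl(uint32_t eflags)
{
    return (unsigned)((eflags & EFLAGS_IOPL) >> 12);
}
```

With an EFLAGS value of 00000246h, EflagsTest reports the Zero and Interrupt Enable flags as set and the Carry flag as clear, and EflagsIopl returns 0.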

Protection Rings

The 386 and above have layers of protection referred to as protection rings.

Figure 18-2. Protection rings

The inner ring #0 contains the operating system kernel. The two middle rings (#1 and #2) contain the operating system services (device drivers), and the outer ring #3 is where the application (user code) resides. The ring numbers are also referred to as privilege levels with 0 being the highest and 3 being the lowest.

An application can access functions in the other rings by means of a gate. The SYSCALL and SYSENTER instructions are two such methods. This is a protection system to protect the inner rings from the outer. You know, to keep the riffraff out! Any attempt to access an inner ring without going through a gate will cause a general protection fault.

Control Registers

There are four control registers {CR0, CR2, CR3, CR4} that control system level operations. Note that CR1 is reserved.

Table 18-1. Control register 0 (CR0) extensions

CR0       Code         Bit       Flag Descriptions
CR0_PE    000000001h   0         Protection Enable
CR0_MP    000000002h   1         Monitor Coprocessor
CR0_EM    000000004h   2         Emulation
CR0_TS    000000008h   3         Task Switched
CR0_ET    000000010h   4         Extension Type
CR0_NE    000000020h   5         Numeric Error
-         -            6...15    -
CR0_WP    000010000h   16        Write Protected
-         -            17        -
CR0_AM    000040000h   18        Alignment Mask
-         -            19...28   -
CR0_NW    020000000h   29        Not Write-Through
CR0_CD    040000000h   30        Cache Disable
CR0_PG    080000000h   31        Paging

And in 64-bit mode the upper 32 bits of the CR0 register (0:CR0):

-         -            32...63   -

Control register 2 (CR2) holds the 32/64-bit page fault linear address.

Table 18-2. Control register 3 (CR3) extensions

CR3             Code         Bit       Flag Descriptions
-               -            0...2     -
CR3_PWT         000000008h   3         Page Writes Transparent
CR3_PCD         000000010h   4         Page Cache Disable
Page Dir.Base   -            12...31   -

And in 64-bit mode the upper 32 bits of the CR3 register (0:CR3):

-               -            32...63   CR3 (extra) bits

Table 18-3. Control register 4 (CR4) extensions

CR4              Code         Bit       Flag Descriptions
CR4_VME          000000001h   0         Virtual-8086 Mode Ext.
CR4_PVI          000000002h   1         Protected Virtual Int.
CR4_TSD          000000004h   2         Time Stamp Disable
CR4_DE           000000008h   3         Debugging Extensions
CR4_PSE          000000010h   4         Page Size Extension
CR4_PAE          000000020h   5         Physical Address Ext.
CR4_MCE          000000040h   6         Machine Check Enable
CR4_PGE          000000080h   7         Global Page Enable
CR4_PCE          000000100h   8         RDPMC Enabled
CR4_OSFXSR       000000200h   9         FXSAVE, FXRSTOR
CR4_OSXMMEXCPT   000000400h   10        Unmasked SIMD FP Exception

And in 64-bit mode the upper 32 bits of the CR4 register (0:CR4):

-                -            32...63   CR4 (extra) bits
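Two of these CR4 flags gate the timing instructions described earlier in this chapter: TSD restricts RDTSC, and PCE enables RDPMC outside of privilege level 0. The C sketch below merely models those two rules. An application cannot read CR4 itself, so the cr4 and cpl arguments are supplied by the caller, and the function names are assumptions made for this illustration.

```c
#include <stdint.h>

#define CR4_TSD  0x00000004u   /* Bit 2: Time Stamp Disable */
#define CR4_PCE  0x00000100u   /* Bit 8: RDPMC Enabled      */

/* Model only: may RDTSC run at current privilege level cpl (0..3)?
   Allowed at any level unless TSD is set; level 0 always may. */
int RdtscAllowed(uint32_t cr4, unsigned cpl)
{
    return (cpl == 0) || ((cr4 & CR4_TSD) == 0);
}

/* Model only: may RDPMC run at current privilege level cpl (0..3)?
   Allowed at any level only if PCE is set; level 0 always may. */
int RdpmcAllowed(uint32_t cr4, unsigned cpl)
{
    return (cpl == 0) || ((cr4 & CR4_PCE) != 0);
}
```

Note the opposite polarity of the two flags: TSD takes a privilege away when set, while PCE grants one when set.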

(TPR) Task Priority Registers — (CR8)

Table 18-4. Control register 8 (CR8) extensions. This is new for EM64T.

CR8        Code   Bit      Flag Descriptions
CR8_APSC   -      0...3    Arbitration Priority Sub-class
CR8_AP     -      4...7    Arbitration Priority
-          -      8...63   CR8 (extra bits)

Debug Registers

There are eight debug registers: {DR0, DR1, DR2, DR3, DR4, DR5, DR6, DR7}. Knowing them is unimportant as you are most likely using a debugger to develop your application, not building a debugger. These are privileged resources and only accessible at the system level to set up and monitor the breakpoints {0...3}.

Cache Manipulation

Several mechanisms have been put into place to squeeze optimal throughput from the processors. One method of cache manipulation discussed in Chapter 10, "Branching," is Intel's hint as to the prediction of logic flow through branches counter to the static prediction logic. Another mechanism is a hint to the processor about cache behavior so as to give the processor insight into how a particular piece of code is utilizing memory access. Here is a brief review of some terms that have already been discussed:

  • Temporal data — Memory that requires multiple accesses and therefore needs to be loaded into a cache for better throughput.

  • Non-temporal hint — A hint (an indicator) to the processor that memory only requires a single access (one shot). This would be similar to copying a block of memory or performing a calculation, but the result is not going to be needed for a while so there is no need to write it into the cache. Thus, the memory access has no need to read and load cache, and therefore the code can be faster.

For speed and efficiency, when memory is accessed for a read or write, the cache line containing that data (whose length depends upon manufacturer and version) is copied from system memory into high-speed cache memory. The processor then performs its read/write operations on the cache memory. When a modified cache line is invalidated, it is first written back to system memory; this second stage is called a "write back." In a multiprocessor system this occurs frequently, because the processors do not share their internal caches.

Cache Sizes

Different processors have different cache sizes for data and for code. These are dependent upon processor model, manufacturer, etc., as shown below:

CPU          L1 Cache (Data / Code)   L2 Cache
Celeron      16KB / 16KB              256KB
Pentium 4    8KB / 12Kμops            512KB
Athlon XP    64KB / 64KB              256KB
Duron        64KB / 64KB              64KB
Pentium M    32KB / 32KB              1024KB
Xeon         -                        512KB

Depending on your code and level of optimization, the size of the cache may be of importance. For the purposes of this book, however, it is being ignored, as that topic is more suitable for a book very specifically targeting heavy-duty optimization. This book, however, is interested in the cache line size as that is more along the lightweight optimization that has been touched on from time to time. It should be noted that AMD uses a minimum size of 32 bytes.

Cache Line Sizes

The (code/data) cache line size determines how many instruction/data bytes can be preloaded.

Intel        Cache Line Size (bytes)
PIII         32
Pentium M    64
P4           64
Xeon         64

AMD          Cache Line Size (bytes)
Athlon       64
Opteron      64

The cache line size can be obtained by using the CPUID instruction with EAX set to 1. The following calculation will give you the actual cache line size.

mov    eax,1
cpuid

and    ebx,00000FF00h
shr    ebx,8-3                   ; ebx = size of cache line
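The same calculation can be sketched in C for readers who obtain the EBX value through a compiler's CPUID support. The field in bits 8 through 15 is reported in units of 8 bytes, which is why the assembly shifts right by 8-3 rather than by 8. CacheLineSize is a hypothetical name for this example, and the EBX value is assumed to have been read already.

```c
#include <stdint.h>

/* Hypothetical helper: given EBX as returned by CPUID with EAX=1,
   return the cache line size in bytes. Bits 8..15 hold the size
   in 8-byte units, so multiply the field by 8. */
uint32_t CacheLineSize(uint32_t ebx)
{
    return ((ebx >> 8) & 0xFFu) * 8u;
}
```

A field value of 08h therefore corresponds to a 64-byte cache line, and 04h to a 32-byte line.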

PREFETCHx — Prefetch Data into Caches

Mnemonic      P     PII   K6    3D!   3Mx+  SSE   SSE2  A64   SSE3  E64T
PREFETCH      -     -     -     ✓     ✓     -     -     ✓     -     -
PREFETCHNTA   -     -     -     -     ✓     ✓     ✓     ✓     ✓     ✓
PREFETCHT0    -     -     -     -     ✓     ✓     ✓     ✓     ✓     ✓
PREFETCHT1    -     -     -     -     ✓     ✓     ✓     ✓     ✓     ✓
PREFETCHT2    -     -     -     -     ✓     ✓     ✓     ✓     ✓     ✓
PREFETCHW     -     -     -     ✓     ✓     -     -     ✓     -     -

3DNow!   prefetch      mSrc8
         prefetchw     mSrc8
SSE      prefetcht0    mSrc8
         prefetcht1    mSrc8
         prefetcht2    mSrc8
         prefetchnta   mSrc8

The PREFETCHNTA instruction performs a non-temporal hint to the processor with respect to all the caches, to load from system memory mSrc8 into the first-level cache for a PIII or a second-level cache for a P4 or Xeon processor.

The PREFETCHT0 instruction performs a temporal hint to the processor to load from system memory mSrc8 into the first- or second-level cache for a PIII, or a second-level cache for a P4 or Xeon processor.

The PREFETCHT1 instruction performs a temporal hint to the processor with respect to the first-level cache to load from system memory mSrc8 into the second-level cache for PIII, P4, or Xeon processor.

The PREFETCHT2 instruction performs a temporal hint to the processor with respect to the second-level cache to load from system memory mSrc8 into the first-level cache for PIII or the second-level cache for P4 or Xeon processor.

If data is already loaded at the same or higher cache, then no operation is performed.

AMD processors alias PREFETCHT1 and PREFETCHT2 instructions to the PREFETCHT0 instructions, so they all have the PREFETCHT0 functionality.

The 3DNow! PREFETCH instruction loads a cache line into the L1 data cache from the mSrc8.

The 3DNow! PREFETCHW instruction loads a cache line into the L1 data cache from the mSrc8 but sets a hint indicating that it is for write operations.
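From C, a prefetch hint is usually requested through a compiler facility rather than by naming the instruction. The sketch below assumes a GCC- or Clang-compatible compiler and uses its __builtin_prefetch built-in; which prefetch instruction (if any) the compiler actually emits is up to the compiler and target. The function name SumWithPrefetch and the look-ahead distance of 16 elements are arbitrary choices for illustration, not tuned values.

```c
#include <stddef.h>
#include <stdint.h>

#define PREFETCH_AHEAD 16   /* Elements to look ahead (untuned guess) */

/* Sums an array while hinting that data a little further along
   will be needed soon. The result is the same with or without
   the hint; only the timing may differ. */
uint64_t SumWithPrefetch(const uint32_t *pData, size_t nCnt)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i < nCnt; i++)
    {
        if (i + PREFETCH_AHEAD < nCnt)   /* Stay inside the array */
        {
            /* 0 = prefetch for read, 3 = high temporal locality */
            __builtin_prefetch(&pData[i + PREFETCH_AHEAD], 0, 3);
        }

        sum += pData[i];
    }

    return sum;
}
```

Because a prefetch is only a hint, it should be verified with timing measurements; a poorly chosen distance can cost more than it saves.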

LFENCE — Load Fence

Mnemonic      P     PII   K6    3D!   3Mx+  SSE   SSE2  A64   SSE3  E64T
LFENCE        -     -     -     -     -     -     ✓     ✓     ✓     ✓

lfence

This instruction is similar to the MFENCE instruction, but it acts as a barrier between memory load instructions issued before and after the LFENCE and MFENCE instructions.

SFENCE — Store Fence

Mnemonic      P     PII   K6    3D!   3Mx+  SSE   SSE2  A64   SSE3  E64T
SFENCE        -     -     -     -     ✓     ✓     ✓     ✓     ✓     ✓

sfence

This instruction is similar to the instruction MFENCE but it acts as a barrier between memory save instructions issued before and after the SFENCE or MFENCE instructions.

MFENCE — Memory Fence

Mnemonic      P     PII   K6    3D!   3Mx+  SSE   SSE2  A64   SSE3  E64T
MFENCE        -     -     -     -     -     -     ✓     ✓     ✓     ✓

mfence

This instruction is a barrier (fence) to isolate system memory to and from cache memory operations that occur before and after this instruction.

CLFLUSH — Flush Cache Line

Mnemonic      P     PII   K6    3D!   3Mx+  SSE   SSE2  A64   SSE3  E64T
CLFLUSH       -     -     -     -     -     -     ✓     ✓     ✓     ✓

clflush mSrc8

This instruction invalidates the cache line (code or data) containing the linear address specified by mSrc8. If the line is dirty (that is, it has been written to and so differs from system memory), it is written back to system memory first. This instruction is ordered by the MFENCE instruction. Check CPUID feature bit #19 (CLFSH), returned in EDX when CPUID is executed with EAX set to 1, to see if this instruction is available.

INVD — Invalidate Cache (WO/Writeback)

Mnemonic      P     PII   K6    3D!   3Mx+  SSE   SSE2  A64   SSE3  E64T
INVD          ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓

invd

This instruction invalidates the internal caches without waiting for write back of modified cache lines and initiates bus cycles for external caches to flush. This is similar to WBINVD but without a write back.

WBINVD — Write Back and Invalidate Cache

Mnemonic      P     PII   K6    3D!   3Mx+  SSE   SSE2  A64   SSE3  E64T
WBINVD        ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓

wbinvd

This instruction writes back all the modified cache lines, then invalidates the internal caches, and initiates bus cycles for external caches to flush. This is similar to INVD but with a write back.

System Instructions

System instructions are outside the scope of this book; refer to the Intel- and AMD-specific documentation for full specifications. They are considered OS/system instructions. Some are accessible from the application layer at the lowest privilege level but are not part of the general application development process. They are referenced here only for informational purposes and to ensure this book lists all instructions available at the time of its publication.

ARPL — Adjust Requested Privilege Level

Mnemonic      P     PII   K6    3D!   3Mx+  SSE   SSE2  A64   SSE3  E64T
ARPL          ✓     ✓     ✓     ✓     ✓     ✓     ✓     32    ✓     32

arpl rmDst16, rSrc16

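This system instruction adjusts the RPL (Requested Privilege Level) by comparing the RPL field (the low two bits) of the segment selector in rmDst with that of rSrc. If the RPL of rmDst is less than the RPL of rSrc, the RPL of rmDst is raised to match and the Zero flag is set; otherwise the Zero flag is cleared (reset) and rmDst is left unchanged. This instruction can be accessed by an application.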
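The adjustment can be modeled in a few lines of C, assuming the comparison is made on the RPL field held in the low two bits of each 16-bit selector. ArplModel is a hypothetical name; this is a sketch of the behavior for illustration, not a replacement for the instruction.

```c
#include <stdint.h>

/* Model of ARPL: if the destination selector's RPL (bits 0..1) is
   numerically lower (more privileged) than the source's, raise it
   to match. Returns the resulting Zero flag: 1 if adjusted, else 0. */
int ArplModel(uint16_t *pDst, uint16_t src)
{
    unsigned dstRpl = *pDst & 3u;
    unsigned srcRpl = src & 3u;

    if (dstRpl < srcRpl)
    {
        *pDst = (uint16_t)((*pDst & ~3u) | srcRpl);
        return 1;                 /* Zero flag set */
    }

    return 0;                     /* Zero flag cleared */
}
```

For example, a destination selector of 0008h (RPL 0) adjusted against a source of 0023h (RPL 3) becomes 000Bh with the Zero flag set; reversing the operands leaves the destination unchanged.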

BOUND — Check Array Index For Bounding Error

Mnemonic      P     PII   K6    3D!   3Mx+  SSE   SSE2  A64   SSE3  E64T
BOUND         ✓     ✓     ✓     ✓     ✓     ✓     ✓     32    ✓     32

bound   rSrcA16, mSrcB16 ^ 16
bound   rSrcA32, mSrcB32 ^ 32

This system instruction checks whether the array index rSrcA is within the bounds of the array specified by mSrcB, a pair of lower and upper limits in memory. A #BR (Bound Range) exception is triggered if the index does not fall within those limits, inclusive.
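In C terms the check amounts to a signed, inclusive range test against a lower/upper pair stored in memory. The sketch below models only the comparison; where the instruction would raise #BR, this hypothetical BoundOk function simply returns 0.

```c
#include <stdint.h>

/* Model of the BOUND test: bounds[0] is the lower limit and
   bounds[1] the upper limit, both inclusive and both signed.
   Returns 1 if index is in range, 0 where #BR would be raised. */
int BoundOk(int32_t index, const int32_t bounds[2])
{
    return (index >= bounds[0]) && (index <= bounds[1]);
}
```

With limits of 0 and 9, the indices 0 and 9 pass, while -1 and 10 fail.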

CLTS — Clear Task Switch Flag

Mnemonic      P     PII   K6    3D!   3Mx+  SSE   SSE2  A64   SSE3  E64T
CLTS          ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓

clts

This system instruction clears the task switch flag TS, bit #3 of CR0 (CR0_TS). The operating system sets this flag every time a task switch occurs, and this instruction is used to clear it. It is used in conjunction with the synchronization of the task switch with the FPU.

HLT — Halt Processor

Mnemonic      P     PII   K6    3D!   3Mx+  SSE   SSE2  A64   SSE3  E64T
HLT           ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓

hlt

This is a system instruction that stops the processor and puts it into a halt state.

UD2 — Undefined Instruction

Mnemonic      P     PII   K6    3D!   3Mx+  SSE   SSE2  A64   SSE3  E64T
UD2           ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓

ud2

UD2 is an undefined instruction and guaranteed to throw an opcode exception in all modes.

INVLPG — Invalidate TLB

Mnemonic      P     PII   K6    3D!   3Mx+  SSE   SSE2  A64   SSE3  E64T
INVLPG        ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓

invlpg mSrc

This instruction invalidates the TLB (Translation Lookaside Buffer) page referenced by mSrc.

LAR — Load Access Rights

Mnemonic      P     PII   K6    3D!   3Mx+  SSE   SSE2  A64   SSE3  E64T
LAR           ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓

lar   rDst16, rmSrc16
lar   rDst32, rmSrc32
lar   rDst64, rmSrc64

This system instruction copies the access rights from the segment descriptor referenced by the source rmSrc, stores them in the destination rDst, and sets the zero flag. This instruction can only be called from Protected Mode.

LOCK — Assert Lock # Signal Prefix

Mnemonic      P     PII   K6    3D!   3Mx+  SSE   SSE2  A64   SSE3  E64T
LOCK          ✓     ✓     -     -     -     ✓     ✓     ✓     ✓     ✓

lock

This system instruction is a code prefix to turn the trailing instruction into an atomic instruction. In a multiprocessor environment it ensures that the processor using the lock has exclusive access to memory shared with the other processor.

This prefix can only be used with the following instructions, and only when they are performing a write operation to memory: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XADD, XCHG, XOR.

This instruction works best with a read-modify-write operation such as the BTS instruction.
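In C, the usual way to ask for such an atomic read-modify-write is through the C11 <stdatomic.h> operations. The sketch below performs an atomic bit test-and-set in the spirit of a locked BTS; on 80x86 targets compilers typically implement it with a LOCK-prefixed instruction, but the exact instruction chosen is up to the compiler. AtomicBitTestAndSet is a hypothetical name for this example.

```c
#include <stdatomic.h>

/* Atomically sets bit number 'bit' (0..31) in *pWord and returns
   the previous state of that bit (0 or 1). No other thread can
   observe or modify the word between the read and the write. */
int AtomicBitTestAndSet(atomic_uint *pWord, unsigned bit)
{
    unsigned mask = 1u << bit;
    unsigned old = atomic_fetch_or(pWord, mask);

    return (old & mask) != 0;
}
```

A return value of 0 means the caller was the one that set the bit, which is the basis of a simple ownership flag shared between processors.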

LSL — Load Segment Limit

Mnemonic      P     PII   K6    3D!   3Mx+  SSE   SSE2  A64   SSE3  E64T
LSL           ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓

lsl   rDst16, rmSrc16
lsl   rDst32, rmSrc32
lsl   rDst64, rmSrc64

This system instruction copies the segment limit from the segment descriptor referenced by the source rmSrc to the destination rDst.

MOV — Move To/From Control Registers

Mnemonic      P     PII   K6    3D!   3Mx+  SSE   SSE2  A64   SSE3  E64T
MOV CR        ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓

mov   cr{0...4}, r32      32
mov   r32, cr{0...4}

This system instruction copies the contents of a control register to a general-purpose register, or of a general-purpose register to a control register.

MOV — Move To/From Debug Registers

Mnemonic      P     PII   K6    3D!   3Mx+  SSE   SSE2  A64   SSE3  E64T
MOV DR        ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓

mov   r32, dr{0...7}      32
mov   dr{0...7}, r32

This system instruction copies the contents of a debug register to a general-purpose register, or of a general-purpose register to a debug register.

STMXCSR — Save MXCSR Register State

Mnemonic      P     PII   K6    3D!   3Mx+  SSE   SSE2  A64   SSE3  E64T
STMXCSR       -     -     -     -     -     ✓     ✓     ✓     ✓     ✓

stmxcsr mDst32

This system instruction saves the MXCSR control and status register to the destination mDst32. The complement to this instruction is LDMXCSR.

LDMXCSR — Load MXCSR Register State

Mnemonic      P     PII   K6    3D!   3Mx+  SSE   SSE2  A64   SSE3  E64T
LDMXCSR       -     -     -     -     -     ✓     ✓     ✓     ✓     ✓

ldmxcsr mSrc32

This system instruction loads the MXCSR control and status register from the source mSrc32. The complement of this instruction is STMXCSR.

The default value is 00001F80h.
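The default value can be read field by field: bits 7 through 12 are the six exception mask bits (all set in 00001F80h) and bits 13 and 14 select the rounding mode (0, round to nearest, in the default). The C sketch below decodes those two fields from a value such as one saved by STMXCSR; the macro and function names are assumptions for this example.

```c
#include <stdint.h>

#define MXCSR_EXCEPT_MASKS  0x00001F80u   /* Bits 7..12: exception masks */
#define MXCSR_RC_SHIFT      13            /* Bits 13..14: rounding mode  */

/* Returns 1 if all six SIMD floating-point exceptions are masked */
int MxcsrAllMasked(uint32_t mxcsr)
{
    return (mxcsr & MXCSR_EXCEPT_MASKS) == MXCSR_EXCEPT_MASKS;
}

/* Returns the rounding mode field:
   0 = nearest, 1 = down, 2 = up, 3 = toward zero (truncate) */
unsigned MxcsrRounding(uint32_t mxcsr)
{
    return (unsigned)((mxcsr >> MXCSR_RC_SHIFT) & 3u);
}
```

For the default 00001F80h, MxcsrAllMasked returns 1 and MxcsrRounding returns 0.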

SGDT/SIDT — Save Global/Interrupt Descriptor Table

Mnemonic      P     PII   K6    3D!   3Mx+  SSE   SSE2  A64   SSE3  E64T
SGDT          ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓
SIDT          ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓

sgdt   m
sidt   m

The SGDT system instruction copies the Global Descriptor Table Register (GDTR) to the destination. The complement of this instruction is LGDT.

The SIDT system instruction copies the Interrupt Descriptor Table Register (IDTR) to the destination. The complement of this instruction is LIDT.
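The memory image written by SGDT or SIDT is a 16-bit limit followed by the table's base address (32 bits in Protected Mode, 64 bits in 64-bit mode). As a rough sketch of how such an image could be picked apart afterward, here is some portable C; DescTableReg, dtr_decode, and dtr_entry_count are hypothetical names used only for this example, and the entry count assumes 8-byte descriptors as in a 32-bit GDT or IDT.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of a GDTR/IDTR image. */
typedef struct {
    uint16_t limit;  /* table size in bytes, minus one */
    uint64_t base;   /* linear address of the table */
} DescTableReg;

/* Decode the little-endian image stored by SGDT/SIDT.
   base_bytes is 4 for the 6-byte form or 8 for the 10-byte form. */
DescTableReg dtr_decode(const uint8_t *image, size_t base_bytes)
{
    DescTableReg r;
    size_t i;

    r.limit = (uint16_t)(image[0] | (image[1] << 8));
    r.base = 0;
    for (i = 0; i < base_bytes; i++)
        r.base |= (uint64_t)image[2 + i] << (8 * i);
    return r;
}

/* Number of 8-byte descriptors that fit within the limit. */
unsigned dtr_entry_count(DescTableReg r)
{
    return ((unsigned)r.limit + 1u) / 8u;
}
```

For example, an image with a limit of 03FFh describes a table of 400h bytes, which is room for 128 eight-byte descriptors.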

LGDT/LIDT — Load Global/Interrupt Descriptor Table

Mnemonic  P  PII  K6  3D!  3Mx+  SSE  SSE2  A64  SSE3  E64T
LGDT      ✓  ✓    ✓   ✓    ✓     ✓    ✓     ✓    ✓     ✓
LIDT      ✓  ✓    ✓   ✓    ✓     ✓    ✓     ✓    ✓     ✓

lgdt mSrc16 ^ (32/64)
lidt mSrc16 ^ (32/64)

The LGDT system instruction loads the source mSrc16 into the Global Descriptor Table Register (GDTR).

The LIDT system instruction loads the source mSrc16 into the Interrupt Descriptor Table Register (IDTR).

SLDT — Save Local Descriptor Table

Mnemonic  P  PII  K6  3D!  3Mx+  SSE  SSE2  A64  SSE3  E64T
SLDT      ✓  ✓    ✓   ✓    ✓     ✓    ✓     ✓    ✓     ✓

sldt rmDst16

This system instruction copies the segment selector from the Local Descriptor Table Register (LDTR) to the destination rmDst16.
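A segment selector, such as the one SLDT returns, packs three fields into its 16 bits: a descriptor index, a table indicator, and a requested privilege level. The sketch below, in portable C, shows one way to separate them; Selector and selector_decode are hypothetical names invented for this illustration.

```c
#include <stdint.h>

/* Hypothetical decoded view of a 16-bit segment selector. */
typedef struct {
    unsigned index;  /* descriptor index within the table (bits 3-15) */
    unsigned ti;     /* table indicator: 0 = GDT, 1 = LDT (bit 2) */
    unsigned rpl;    /* requested privilege level (bits 0-1) */
} Selector;

/* Split a selector value into its three fields. */
Selector selector_decode(uint16_t sel)
{
    Selector s;
    s.index = (unsigned)sel >> 3;
    s.ti    = ((unsigned)sel >> 2) & 1u;
    s.rpl   = (unsigned)sel & 3u;
    return s;
}
```

As a usage example, the selector value 013Fh decodes to index 39, table indicator 1 (LDT), and requested privilege level 3.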

LLDT — Load Local Descriptor Table

Mnemonic  P  PII  K6  3D!  3Mx+  SSE  SSE2  A64  SSE3  E64T
LLDT      ✓  ✓    ✓   ✓    ✓     ✓    ✓     ✓    ✓     ✓

lldt rmSrc16

This system instruction loads the source rmSrc16 into the segment selector element of the Local Descriptor Table Register (LDTR). This instruction is only available in Protected Mode.

SMSW — Save Machine Status Word

Mnemonic  P  PII  K6  3D!  3Mx+  SSE  SSE2  A64  SSE3  E64T
SMSW      ✓  ✓    ✓   ✓    ✓     ✓    ✓     ✓    ✓     ✓

smsw rmDst16

This system instruction copies the lower 16 bits of control register CR0 into the destination rmDst16.
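The word that SMSW returns carries the low-order CR0 flags. A minimal sketch of testing a few of them in portable C follows; the MSW_ constants and both helper functions are hypothetical names for this example, while the bit positions are the documented CR0 ones.

```c
#include <stdint.h>

/* Low-order CR0 bits visible through the machine status word. */
#define MSW_PE 0x0001u  /* bit 0: protection enable */
#define MSW_MP 0x0002u  /* bit 1: monitor coprocessor */
#define MSW_EM 0x0004u  /* bit 2: x87 emulation */
#define MSW_TS 0x0008u  /* bit 3: task switched */
#define MSW_ET 0x0010u  /* bit 4: extension type */
#define MSW_NE 0x0020u  /* bit 5: numeric error */

/* Nonzero when the word reports Protected Mode (PE set). */
int msw_in_protected_mode(uint16_t msw)
{
    return (msw & MSW_PE) != 0;
}

/* Nonzero when an x87 instruction would run without a
   device-not-available fault, that is, EM and TS are both clear. */
int msw_x87_ready(uint16_t msw)
{
    return (msw & (MSW_EM | MSW_TS)) == 0;
}
```

For instance, a status word of 003Bh (PE, MP, TS, ET, NE) reports Protected Mode, but with TS set the next x87 instruction would fault.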

LMSW — Load Machine Status Word

Mnemonic  P  PII  K6  3D!  3Mx+  SSE  SSE2  A64  SSE3  E64T
LMSW      ✓  ✓    ✓   ✓    ✓     ✓    ✓     ✓    ✓     ✓

lmsw rmSrc16

This system instruction loads the lower four bits of the source rmSrc16 and overwrites the lower four bits of the control register CR0.
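The effect on CR0 can be modeled as a simple bit merge. The following portable C sketch is a software model only, and lmsw_model is a hypothetical name; it also reflects one documented restriction that the text above does not spell out, namely that LMSW can set the PE bit but cannot clear it.

```c
#include <stdint.h>

/* Model of LMSW: only the low four bits (PE, MP, EM, TS) of CR0 change.
   PE is sticky: once set in CR0, LMSW cannot clear it. */
uint32_t lmsw_model(uint32_t cr0, uint16_t src)
{
    uint32_t low4 = (uint32_t)src & 0xFu;  /* bits taken from the source */

    low4 |= cr0 & 0x1u;                    /* keep PE if already set */
    return (cr0 & ~(uint32_t)0xFu) | low4; /* upper CR0 bits untouched */
}
```

With CR0 at 80000011h and a source of 000Ah, the model yields 8000001Bh: MP and TS are set from the source, PE stays set, and the upper bits are untouched.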

STR — Save Task Register

Mnemonic  P  PII  K6  3D!  3Mx+  SSE  SSE2  A64  SSE3  E64T
STR       ✓  ✓    ✓   ✓    ✓     ✓    ✓     ✓    ✓     ✓

str rmDst16

This system instruction reads the task register and saves the segment selector value into the 16-bit destination rmDst16. When the destination is a 32-bit register, the selector is written to the lower 16 bits and the upper 16 bits are cleared to zero.

str eax            ; selector in AX, upper 16 bits of EAX cleared to zero

LTR — Load Task Register

Mnemonic  P  PII  K6  3D!  3Mx+  SSE  SSE2  A64  SSE3  E64T
LTR       ✓  ✓    ✓   ✓    ✓     ✓    ✓     ✓    ✓     ✓

ltr rmSrc16

This system instruction sets the task register with the segment selector stored in the 16-bit source rmSrc16.

RDMSR — Read from Model Specific Register

Mnemonic  P  PII  K6  3D!  3Mx+  SSE  SSE2  A64  SSE3  E64T
RDMSR     ✓  ✓    ✓   ✓    ✓     ✓    ✓     ✓    ✓     ✓

rdmsr

This is a system instruction that may only be run in Privilege Level 0. The Model Specific Register (MSR) indexed by ECX is loaded into the EDX:EAX register pair.

WRMSR — Write to Model Specific Register

Mnemonic  P  PII  K6  3D!  3Mx+  SSE  SSE2  A64  SSE3  E64T
WRMSR     ✓  ✓    ✓   ✓    ✓     ✓    ✓     ✓    ✓     ✓

wrmsr

This system instruction writes the 64-bit value in EDX:EAX to the Model Specific Register specified by the ECX register. In 64-bit mode, the lower 32 bits of each 64-bit register, RDX[0...31]:RAX[0...31], form the 64-bit value that is written to the MSR specified by the RCX register.

MSR[ecx] = edx:eax
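The EDX:EAX pairing used by RDMSR and WRMSR is plain bit arithmetic, which the following portable C sketch models. It does not touch any MSR; msr_join, msr_join64, and msr_split are hypothetical helper names for this illustration.

```c
#include <stdint.h>

/* Combine the EDX (high) and EAX (low) halves into one 64-bit value. */
uint64_t msr_join(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* 64-bit mode: only the low 32 bits of RDX and RAX take part. */
uint64_t msr_join64(uint64_t rdx, uint64_t rax)
{
    return msr_join((uint32_t)rdx, (uint32_t)rax);
}

/* Split a 64-bit value into the halves RDMSR would return. */
void msr_split(uint64_t value, uint32_t *edx, uint32_t *eax)
{
    *edx = (uint32_t)(value >> 32);
    *eax = (uint32_t)value;
}
```

For example, EDX = 12345678h and EAX = 9ABCDEF0h combine to 123456789ABCDEF0h, and msr_split reverses the operation.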

SWAPGS — Swap GS Base Register

Mnemonic  P  PII  K6  3D!  3Mx+  SSE  SSE2  A64  SSE3  E64T
SWAPGS    -  -    -   -    -     -    -     64   -     64

swapgs

This system instruction swaps the GS base register value with the value in the MSR at address C0000102h (the kernel GS base).
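As a software model only, the exchange can be pictured as two 64-bit values trading places. In this portable C sketch, GsState and swapgs_model are hypothetical names, and the two fields stand in for the active GS base and the value held in MSR C0000102h.

```c
#include <stdint.h>

/* Hypothetical model of the two values that SWAPGS exchanges. */
typedef struct {
    uint64_t gs_base;         /* base currently used for GS-relative access */
    uint64_t kernel_gs_base;  /* value held in MSR C0000102h */
} GsState;

/* Exchange the active GS base with the stored kernel GS base. */
void swapgs_model(GsState *s)
{
    uint64_t tmp = s->gs_base;

    s->gs_base = s->kernel_gs_base;
    s->kernel_gs_base = tmp;
}
```

Applying the swap twice restores the original state, which is why system code typically issues one SWAPGS on entry and another on exit.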

SYSCALL — 64-Bit Fast System Call

Mnemonic  P  PII  K6  3D!  3Mx+  SSE  SSE2  A64  SSE3  E64T
SYSCALL   -  -    ✓   ✓    ✓     -    -     ✓    -     64

syscall

This instruction is a fast 64-bit system call to privilege level 0. It allows code at the lower privilege levels to call code within Privilege Level 0.

SYSRET — Fast Return from 64-Bit Fast System Call

Mnemonic  P  PII  K6  3D!  3Mx+  SSE  SSE2  A64  SSE3  E64T
SYSRET    -  -    ✓   ✓    ✓     -    -     ✓    -     64

sysret

This instruction is a return from a fast 64-bit system call. It is a complement to SYSCALL.

SYSENTER — Fast System Call

Mnemonic  P  PII  K6  3D!  3Mx+  SSE  SSE2  A64  SSE3  E64T
SYSENTER  -  -    -   -    -     ✓    ✓     -    ✓     ✓

sysenter

This instruction is a fast system call to Privilege Level 0. It allows code at the lower privilege levels to call code within Privilege Level 0.

SYSEXIT — Fast Return from Fast System Call

Mnemonic  P  PII  K6  3D!  3Mx+  SSE  SSE2  A64  SSE3  E64T
SYSEXIT   -  -    -   -    -     ✓    ✓     -    ✓     ✓

sysexit

This instruction is a return from a fast system call. It is a complement to SYSENTER.

RSM — Resume from System Management Mode

Mnemonic  P  PII  K6  3D!  3Mx+  SSE  SSE2  A64  SSE3  E64T
RSM       ✓  ✓    ✓   ✓    ✓     ✓    ✓     ✓    ✓     ✓

rsm

This system instruction returns control from the System Management Mode (SMM) back to the operating system or the application that was interrupted by the SMM interrupt.

VERR/VERW — Verify Segment for Reading/Writing

Mnemonic  P  PII  K6  3D!  3Mx+  SSE  SSE2  A64  SSE3  E64T
VERR      ✓  ✓    ✓   ✓    ✓     ✓    ✓     ✓    ✓     ✓
VERW      ✓  ✓    ✓   ✓    ✓     ✓    ✓     ✓    ✓     ✓

verr rm16
verw rm16

mov  ax,cs
verr ax
verw ax

These instructions verify whether the segment referenced by the specified selector (from CS, DS, ES, FS, or GS) is readable (VERR) or writeable (VERW). The zero flag is set to 1 if it is, or cleared if it is not. Code segments are never verified as writeable. The stack segment selector (SS) is not an allowed register. These instructions are not available in Real Mode.

LDS/LES/LFS/LGS/LSS — Load Far Pointer

Mnemonic  P  PII  K6  3D!  3Mx+  SSE  SSE2  A64  SSE3  E64T
LDS       ✓  ✓    ✓   ✓    ✓     ✓    ✓     ✓    ✓     ✓
LES       ✓  ✓    ✓   ✓    ✓     ✓    ✓     ✓    ✓     ✓
LFS       ✓  ✓    ✓   ✓    ✓     ✓    ✓     ✓    ✓     ✓
LGS       ✓  ✓    ✓   ✓    ✓     ✓    ✓     ✓    ✓     ✓
LSS       ✓  ✓    ✓   ✓    ✓     ✓    ✓     ✓    ✓     ✓

lds  r32Dst, mSrc(16:32)    Protected Mode            48
lds  r16Dst, mSrc(16:16)    Real Mode                 32
les  r32Dst, mSrc(16:32)    Protected Mode            48
les  r16Dst, mSrc(16:16)    Real Mode                 32
lfs  r64Dst, mSrc(16:64)    64-bit Mode               80
lfs  r32Dst, mSrc(16:32)    64-bit, Protected Mode    48
lfs  r16Dst, mSrc(16:16)    64-bit, Real Mode         32
lgs  r64Dst, mSrc(16:64)    64-bit Mode               80
lgs  r32Dst, mSrc(16:32)    64-bit, Protected Mode    48
lgs  r16Dst, mSrc(16:16)    64-bit, Real Mode         32
lss  r64Dst, mSrc(16:64)    64-bit Mode               80
lss  r32Dst, mSrc(16:32)    64-bit, Protected Mode    48
lss  r16Dst, mSrc(16:16)    64-bit, Real Mode         32

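These are special memory pointer instructions that load a far pointer from memory: the offset goes into the destination general-purpose register, and the selector (segment) goes into the segment register named by the mnemonic (DS, ES, FS, GS, or SS). The form you use is determined by the (64-bit/Protected/Real) mode your code is for.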

Flags  O.flow  Sign  Zero  Aux  Parity  Carry
       -       -     -     -    -       -

Flags: None are altered by this opcode.

Protected Mode Win95 programmers do not normally need to get at the VGA. But if you have an old monochrome adapter plugged into your system, these instructions come in handy together with Microsoft's secret (unpublished) selector {013fh}, which gets you access to every linear address on your machine {013fh:00000000...0ffffffffh}. This became the data selector for Win 95B; for Win32 and Win64 developers, using it is a Bounds Error.

monoadr dd      0b0000h
monosel dw      013fh


        mov     edi,monoadr
        mov     es,monosel

or

        les     edi,FWORD PTR monoadr

Of course, that pointer is used in a function such as:

        mov     es:[edi],eax
        add     edi,4

Saving that pointer back to the address:

        mov     monoadr,edi
        mov     monosel,es

That was fine and dandy, but the following is a quicker method even though it takes a little organization and is very easy to make a mistake due to its length.

monobase FWORD 013f000b0000h

        les   edi,monobase

The declaration has too many zeros and is a lil' too darn long, don't you think! It almost looks like binary. Loading that address into the pointer is very quick, but trying to save the pointer back to the address isn't so slick and it seems a little murky to me.

        mov     DWORD PTR monobase,edi
        mov     WORD PTR monobase+4,es
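The two stores above work because of how the 48-bit constant is laid out: the 32-bit offset occupies the low four bytes and the 16-bit selector sits at byte offset 4. A small portable C sketch of that packing follows; FarPtr48, farptr_unpack, and farptr_pack are hypothetical names used only for this illustration.

```c
#include <stdint.h>

/* Hypothetical view of a 16:32 far pointer. */
typedef struct {
    uint32_t offset;    /* low 32 bits: loaded into the destination register */
    uint16_t selector;  /* bits 32-47: loaded into the segment register */
} FarPtr48;

/* Split a 48-bit FWORD value into offset and selector. */
FarPtr48 farptr_unpack(uint64_t fword)
{
    FarPtr48 p;
    p.offset   = (uint32_t)(fword & 0xFFFFFFFFu);
    p.selector = (uint16_t)((fword >> 32) & 0xFFFFu);
    return p;
}

/* Build the 48-bit FWORD value from selector and offset. */
uint64_t farptr_pack(uint16_t selector, uint32_t offset)
{
    return ((uint64_t)selector << 32) | offset;
}
```

Unpacking 013f000b0000h gives offset 000b0000h and selector 013fh, matching the separate monoadr and monosel declarations used earlier.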

An alternate method would be using a data structure such as follows:

;       Protected Mode address (Far)
PMADR   STRUC
        adr     dd       ?         ; PM Address
        sel     dw       ?         ; PM Segment (Selector)
PMADR   ends

monobase PMADR   {000b0000h,013fh} ; Monochrome Base Address

And to actually get the pointer:

        les     edi,FWORD PTR monobase

Save the pointer back:

        mov     monobase.adr,edi
        mov     monobase.sel,es

Now, doesn't that look much cleaner? Assembly coding can get convoluted enough without creating one's own confusion. Now for those Real Mode programmers, a touch of VGA nostalgia:

vgaseg  dw    0a000h

mov     di,0
mov     es,vgaseg

The following code snippet is similar to the previous 32-bit version but scaled down for 16-bit, using the same techniques:

;       Real Mode address (Far)
RMADR   STRUC
        off     dw       ?        ; Real Mode Offset
        rseg    dw       ?        ; Real Mode Segment
RMADR   ends

vgabase RMADR     {0,0a000h}      ; VGA Base Address

les     di,vgabase

And using that pointer:

        mov     es:[di],ax
        add     di,2
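In Real Mode, the ES:DI pair built above maps to a physical address by shifting the segment left four bits and adding the offset. The following portable C sketch shows the arithmetic; real_mode_linear is a hypothetical helper name, not part of any library.

```c
#include <stdint.h>

/* Real Mode address translation: linear = segment * 16 + offset. */
uint32_t real_mode_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}
```

So the VGA base 0a000h:0000h is linear address 0a0000h, and after the add di,2 above the pointer refers to 0a0002h.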

Hyperthreading Instructions

Hyperthreading instructions are outside the scope of this book; refer to the Intel-specific documentation for full specifications. They are considered OS/System instructions. Although they are accessible by the application layer at the low privilege level, they are not part of the general application development process and are referenced here only for informational purposes.

MONITOR — Monitor

Mnemonic  P  PII  K6  3D!  3Mx+  SSE  SSE2  A64  SSE3  E64T
MONITOR   -  -    -   -    -     -    -     -    ✓     ✓

monitor

This system instruction sets up a hardware monitor using an address stored in the EAX register and arms the monitor. Registers ECX and EDX contain information to be sent to the monitor. The instruction is accessible at any privilege level, provided the MONITOR flag reported by CPUID is set; a clear flag indicates the processor does not support this instruction. It is used in conjunction with the MWAIT instruction.
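The MONITOR flag is reported in bit 3 of ECX after executing CPUID with EAX set to 1. As a minimal sketch, assuming the caller has already run CPUID and captured that ECX value, the check reduces to a single mask; CPUID1_ECX_MONITOR and cpuid_has_monitor are hypothetical names for this example.

```c
#include <stdint.h>

/* CPUID function 1, ECX bit 3: MONITOR/MWAIT supported. */
#define CPUID1_ECX_MONITOR (1u << 3)

/* cpuid1_ecx is the ECX value returned by CPUID with EAX = 1;
   obtaining it is left to the caller. */
int cpuid_has_monitor(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & CPUID1_ECX_MONITOR) != 0;
}
```

Keeping the CPUID call itself out of the helper means the check can be exercised with canned register values on any machine.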

MWAIT — Wait

Mnemonic  P  PII  K6  3D!  3Mx+  SSE  SSE2  A64  SSE3  E64T
MWAIT     -  -    -   -    -     -    -     -    ✓     ✓

mwait

This system instruction is similar to a NOP but works in conjunction with the MONITOR instruction for signaling to a hardware monitor. It is a hint to the processor that it is okay to stop instruction execution until a monitor-related event occurs. MONITOR can be used by itself, but if MWAIT is used, only one MWAIT instruction should follow each MONITOR instruction (especially in a loop).
