There are multiple manufacturers making different models of 80×86 microprocessors. A few are highly specialized variations of the Intel processors, but most are not; they are clones of the Intel processor family built on their own internal designs, which require different optimization methods. Most of these manufacturers publish technical manuals, usually in PDF format, that can be downloaded from the Internet and used for all your custom optimization needs. If the project you are coding for uses custom hardware, then you are probably using a custom processor such as National Semiconductor's NS486SXF under an operating system such as pSOS. When you are designing code for a specific processor, your code can be highly optimized and tuned accordingly.
When the hardware you are writing code for is more generic, you need a method to identify the exact model of processor your code is running on. Each manufacturer has published a sample CPU detection algorithm that uses the CPUID instruction. This is great, but these code samples are not exactly compatible with each other. Since it is impractical to write code that merges all of these samples, I have written this chapter to help you. You can find all sorts of variations of the following program on the Internet, but this one is designed to be expandable and versatile.
Most of these Intel processors are derivatives of one another, but if we take a closer look at their "family type" we see a pattern of 80(x)86, where x is the family number: 3 is the 80386, 4 is the 80486, and so on. Using this family number, we can group the processors into categories of functionality, as each group has its own subset of instructions that it can execute.
Other manufacturers have second-sourced various models of the 80×86 processor line. Intel and AMD are the primary manufacturers, but others have brought their own modified or less expensive versions of these same processors to market.
Workbench Files: \Bench\x86\chap03\project\platform
Mnemonic | P | PII | K6 | 3D! | 3Mx+ | SSE | SSE2 | A64 | SSE3 | E64T |
---|---|---|---|---|---|---|---|---|---|---|
CPUID |
cpuid
This instruction uses the value stored in the EAX register as a function number and returns the requested information in the EAX, EBX, ECX, and EDX registers.
With the release of the Pentium, Intel introduced the CPUID instruction, which gives detailed information about the capabilities of the individual processor. It was also added to later releases of the Intel 80486. AMD has implemented it in all models since the Am486. This makes it much easier to identify the capabilities of the CPU being tested.
Before using this instruction, you must test whether bit 21 (the ID flag) of EFLAGS/RFLAGS is writable. If the bit can be toggled, the CPUID instruction exists and can be called. The code mainly uses the PUSHFD/PUSHFQ and POPFD/POPFQ instructions to read and write the EFLAGS/RFLAGS register.
    EFLAGS_ID = 00200000h           ; Bit 21: ID flag

        pushfd                      ; Push EFLAGS register
        pop    eax                  ; Pop those flags into EAX
        mov    ecx,eax              ; Keep a copy of the original flags
        xor    eax,EFLAGS_ID        ; Flip ID bit #21
        push   eax                  ; Push modified flags on stack
        popfd                       ; Pop flags back into EFLAGS
        pushfd                      ; Push resulting EFLAGS on stack
        pop    eax                  ; Pop those flags into EAX
        push   ecx                  ; Restore the original EFLAGS
        popfd
        xor    eax,ecx              ; Which bits differ from the original?
        test   eax,EFLAGS_ID
        jz     $nope                ; Jump if bit did not flip: no CPUID
        ; If here then bit flipped so CPUID exists
        mov    eax,0                ; Function #0
        cpuid
At a very minimum, all CPUs that support the CPUID instruction support both functions #0 and #1.
Flags | O.flow | Sign | Zero | Aux | Parity | Carry |
---|---|---|---|---|---|---|
- | - | - | - | - | - |
Flags: None are altered by this opcode.
Function | Returned Data |
---|---|
EAX=0 | EAX = The highest CPUID function number this CPU can handle. The Intel Pentium and 486 return 1 in EAX; the Pentium Pro returns 2. EBX, EDX, ECX (in that order) contain a 12-character vendor identifier: AMD = "Auth", "enti", "cAMD"; Centaur = "Cent", "aurH", "auls"; Cyrix = "Cyri", "xIns", "tead"; Intel = "Genu", "ineI", "ntel" |
EAX=1 | EAX = Version information: Bits 0...3 – Stepping ID; Bits 4...7 – Model; Bits 8...11 – Generation / family; Bits 12...15 – Reserved; Bits 16...19 – Extended model; Bits 20...27 – Extended family; Bits 28...31 – Reserved. EBX = Bits 0...7 – Brand index; Bits 8...15 – CLFLUSH line size (in 8-byte units); Bits 16...23 – (Intel) # of logical processors, (AMD) Reserved; Bits 24...31 – Processor's initial local APIC ID. ECX = (Intel) Feature info, (AMD) Reserved. EDX = Feature info |
Intel EAX=2 | EAX, EBX, ECX, EDX = Cache and TLB descriptors |
Intel EAX=3 | (Pentium III only) ECX, EDX = Lower 64 bits of the processor serial number; EAX, EBX = Reserved. Reserved on later processors. |
Intel EAX=4 | (ECX on input = cache index) EAX = Bits 0...4 – Cache type; Bits 5...7 – Cache level; Bit 8 – Self-initializing cache; Bit 9 – Fully associative cache; Bits 10...13 – Reserved; Bits 14...25 – Number of threads sharing cache, minus 1; Bits 26...31 – Number of processor cores on the die, minus 1. EBX = Bits 0...11 – L = System coherency line size, minus 1; Bits 12...21 – P = Physical line partitions, minus 1; Bits 22...31 – W = Ways of associativity, minus 1. ECX = Bits 0...31 – Number of sets, minus 1. EDX = Reserved |
Intel EAX=5 | EAX = Bits 0...15 – Smallest monitor-line byte size Bits 16...31 – Reserved EBX, ECX, EDX = Reserved |
AMD, Cyrix, and WinChip EAX= 80000000h | If the vendor string returned by function #0 is AMD, Cyrix, or WinChip, test for this function. If the value returned in EAX is 80000000h or higher, an extended function set is supported; just as with function #0, EAX contains the highest extended function that the CPU can handle. |
Intel EAX= 80000000h | EAX = Maximum input value for extended CPUIDs EBX, ECX, EDX = Reserved |
AMD, Cyrix, and WinChip EAX= 80000001h | EAX = Processor signature; EBX, ECX = Reserved; EDX = Extended feature flags (see the AMD – Extended #1 CPUID EDX-Feature Flags section). |
Intel EAX= 80000001h | Extended processor signature and extended feature bits. See the Intel – Extended #1 CPUID EDX-Feature Flags section. |
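AMD, Cyrix, WinChip, and Intel EAX= 80000002h, 80000003h, 80000004h | EAX, EBX, ECX, EDX = 16 characters of the processor name string per function; the three functions together return a 48-byte text string (4 registers × 4 bytes × 3 functions) |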
AMD, Cyrix, WinChip, and Intel EAX= 80000006h | ECX = L2 cache information: Bits 0...7 – Cache line size (bytes); Bits 8...11 – Lines per tag; Bits 12...15 – L2 associativity; Bits 16...31 – Number of 1K cache blocks (L2 size in KB). EAX, EBX, EDX = Reserved |
AMD EAX= 80000007h | EDX = Advanced power management EAX, EBX, ECX = Reserved |
Intel EAX= 80000007h | EAX, EBX, ECX, EDX = Reserved |
AMD, Intel EAX= 80000008h | EAX = Bits 0...7 – Physical address bits Bits 8...15 – Virtual address bits Bits 16...31 – Reserved EBX, ECX, EDX = Reserved |
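The version fields returned by function 1 can be unpacked with shifts and masks. The sketch below is an illustration in C rather than the book's CpuDetect(); the CpuVersion type and decode_version() function are hypothetical names. It assumes Intel's published rule for combining the base and extended fields, which the comment spells out; check the vendor's manual before relying on it for other manufacturers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the decoded CPUID function 1 EAX fields. */
typedef struct {
    unsigned stepping;  /* bits 0..3 */
    unsigned model;     /* bits 4..7, plus extended model (bits 16..19) when used */
    unsigned family;    /* bits 8..11, plus extended family (bits 20..27) when used */
} CpuVersion;

/* Decode the version signature that CPUID function 1 returns in EAX.
   Intel's rule: add the extended family only when the base family is 0Fh,
   and prepend the extended model when the base family is 6 or 0Fh.
   (AMD prepends the extended model only for base family 0Fh.) */
void decode_version(uint32_t eax, CpuVersion *out)
{
    unsigned base_model  = (eax >> 4)  & 0x0Fu;
    unsigned base_family = (eax >> 8)  & 0x0Fu;
    unsigned ext_model   = (eax >> 16) & 0x0Fu;
    unsigned ext_family  = (eax >> 20) & 0xFFu;

    assert(out != NULL);
    out->stepping = eax & 0x0Fu;

    out->family = base_family;
    if (base_family == 0x0Fu)
        out->family += ext_family;

    out->model = base_model;
    if (base_family == 0x06u || base_family == 0x0Fu)
        out->model += ext_model << 4;
}
```

For example, a signature of 00000F41h decodes to family 15, model 4, stepping 1, while 000106A5h decodes to family 6, model 1Ah, stepping 5.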
The initial CPUID call gives us the manufacturer ID string.
    .data
    Intel   db  "GenuineIntel"          ; 12-byte vendor string

    .code
        mov   eax,0                     ; Function #0
        cpuid
        cmp   ebx,dword ptr Intel       ; "Genu"
        jne   $Nope                     ; Jump if not a match
        cmp   edx,dword ptr Intel+4     ; "ineI"
        jne   $Nope                     ; Jump if not a match
        cmp   ecx,dword ptr Intel+8     ; "ntel"
        jne   $Nope                     ; Jump if not a match
        ; We have a match!!! (If an Intel chip!)
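Because the vendor string is spread across three registers in the order EBX, EDX, ECX, rebuilding it in C takes a little byte shuffling. This is a minimal sketch, assuming the register values have already been captured (for example by the assembly above); vendor_string() and is_genuine_intel() are hypothetical helpers, not part of the book's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy one 32-bit register into 4 characters, low byte first, which is the
   order the characters occupy in memory on a little-endian x86. */
static void reg_to_chars(uint32_t reg, char *dst)
{
    int i;
    for (i = 0; i < 4; i++)
        dst[i] = (char)((reg >> (8 * i)) & 0xFFu);
}

/* Build the 12-character vendor string from CPUID function 0.
   Note the register order: EBX, then EDX, then ECX. buf must hold 13 bytes. */
void vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx, char buf[13])
{
    assert(buf != NULL);
    reg_to_chars(ebx, buf);
    reg_to_chars(edx, buf + 4);
    reg_to_chars(ecx, buf + 8);
    buf[12] = '\0';
}

/* C equivalent of the assembly comparison above (hypothetical helper). */
int is_genuine_intel(uint32_t ebx, uint32_t edx, uint32_t ecx)
{
    char buf[13];
    vendor_string(ebx, edx, ecx, buf);
    return strcmp(buf, "GenuineIntel") == 0;
}
```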
; CPUID (EDX= flags) <<< Command EAX=1
CPUIDFLG_ | Code | Bit | Flag Descriptions |
---|---|---|---|
FPU | 000000001h | 0 | Floating-point support |
VME | 000000002h | 1 | Virtual Mode Extensions |
DE | 000000004h | 2 | Debugging Extensions |
PSE | 000000008h | 3 | Page Size Extension |
TSC | 000000010h | 4 | RDTSC supported |
MSR | 000000020h | 5 | RDMSR and WRMSR |
PAE | 000000040h | 6 | Physical Address Extensions |
MCE | 000000080h | 7 | Machine Check Exception |
CX8 | 000000100h | 8 | CMPXCHG8B supported |
APIC | 000000200h | 9 | Advanced Programmable Interrupt Controller |
--- | 000000400h | 10 | Reserved |
SEP | 000000800h | 11 | SYSENTER, SYSEXIT supported |
MTRR | 000001000h | 12 | Memory-type Range Reg |
PGE | 000002000h | 13 | Page Global Enable |
MCA | 000004000h | 14 | Machine Check Architecture |
CMOV | 000008000h | 15 | CMOV supported |
PAT | 000010000h | 16 | Page Attribute Table |
PSE36 | 000020000h | 17 | 36-bit Page-Size Extensions |
PSN | 000040000h | 18 | (Intel) Processor Serial # (AMD) Reserved |
CLFLUSH | 000080000h | 19 | CLFLUSH supported |
--- | 000100000h | 20 | Reserved |
DS | 000200000h | 21 | (Intel) Debug Store (AMD) Reserved |
ACPI | 000400000h | 22 | (Intel) Thermal Monitor (AMD) Reserved |
MMX | 000800000h | 23 | MMX supported |
FXSR | 001000000h | 24 | Fast floating-point save and load |
SSE | 002000000h | 25 | SSE supported |
SSE2 | 004000000h | 26 | SSE2 supported |
SS | 008000000h | 27 | (Intel) Self Snoop (AMD) Reserved |
HTT | 010000000h | 28 | (Intel) HTT (HyperThread) (AMD) Reserved |
TM | 020000000h | 29 | (Intel) Thermal Monitor (AMD) Reserved |
--- | 040000000h | 30 | Reserved |
PBE | 080000000h | 31 | (Intel) Pending Break (AMD) Reserved |
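Testing a feature is a single AND against the mask in the Code column. The sketch below, in C, turns a function 1 EDX value into a list of flag names. Only a handful of the table's flags are included, and the CPUIDFLG_ macro names and the edx_feature_names() helper are assumptions modeled on the table header, not the book's own definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A few function 1 EDX masks from the table above (assumed macro names). */
#define CPUIDFLG_FPU   0x00000001u  /* bit 0  */
#define CPUIDFLG_TSC   0x00000010u  /* bit 4  */
#define CPUIDFLG_CX8   0x00000100u  /* bit 8  */
#define CPUIDFLG_CMOV  0x00008000u  /* bit 15 */
#define CPUIDFLG_MMX   0x00800000u  /* bit 23 */
#define CPUIDFLG_FXSR  0x01000000u  /* bit 24 */
#define CPUIDFLG_SSE   0x02000000u  /* bit 25 */
#define CPUIDFLG_SSE2  0x04000000u  /* bit 26 */
#define CPUIDFLG_HTT   0x10000000u  /* bit 28 */

typedef struct { uint32_t mask; const char *name; } FlagName;

static const FlagName edx_names[] = {
    { CPUIDFLG_FPU, "FPU" },   { CPUIDFLG_TSC, "TSC" },
    { CPUIDFLG_CX8, "CX8" },   { CPUIDFLG_CMOV, "CMOV" },
    { CPUIDFLG_MMX, "MMX" },   { CPUIDFLG_FXSR, "FXSR" },
    { CPUIDFLG_SSE, "SSE" },   { CPUIDFLG_SSE2, "SSE2" },
    { CPUIDFLG_HTT, "HTT" },
};

/* Write the names of the set flags into buf, space separated.
   Output is silently truncated if buf is too small. */
void edx_feature_names(uint32_t edx, char *buf, size_t size)
{
    size_t i;

    assert(buf != NULL && size > 0);
    buf[0] = '\0';
    for (i = 0; i < sizeof edx_names / sizeof edx_names[0]; i++) {
        if ((edx & edx_names[i].mask) == 0)
            continue;
        if (buf[0] != '\0')
            strncat(buf, " ", size - strlen(buf) - 1);
        strncat(buf, edx_names[i].name, size - strlen(buf) - 1);
    }
}
```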
; CPUID (ECX= flags) <<< Command EAX=1
CPUIDFLG_ | Code | Bit | Flag Descriptions |
---|---|---|---|
SSE3 | 000000001h | 0 | SSE3 supported |
--- | 000000006h | 1, 2 | Reserved |
MONITOR | 000000008h | 3 | MONITOR, MWAIT supported |
DS_CPL | 000000010h | 4 | CPL Qualified Debug Store |
--- | 000000060h | 5, 6 | Reserved |
EIST | 000000080h | 7 | Enhanced Intel SpeedStep |
TM2 | 000000100h | 8 | Thermal Monitor 2 |
--- | 000000200h | 9 | Reserved |
CID | 000000400h | 10 | Context ID |
--- | 000003800h | 11-13 | Reserved |
xTPR | 000004000h | 14 | Send Task Priority Messages |
--- | 0FFFF8000h | 15-31 | Reserved |
; CPUID (EDX= flags) <<< Command EAX=8000:0001h
AMD_EFLG | Code | Bit | Flag Descriptions |
---|---|---|---|
FPU | 000000001h | 0 | Floating Point support |
VME | 000000002h | 1 | Virtual Mode Extensions |
DE | 000000004h | 2 | Debugging Extensions |
PSE | 000000008h | 3 | Page Size Extension |
TSC | 000000010h | 4 | RDTSC supported |
MSR | 000000020h | 5 | RDMSR and WRMSR |
PAE | 000000040h | 6 | Physical Address Extensions |
MCE | 000000080h | 7 | Machine Check Exception |
CX8 | 000000100h | 8 | CMPXCHG8B supported |
APIC | 000000200h | 9 | Advanced Programmable Interrupt Controller |
--- | 000000400h | 10 | Reserved |
SEP | 000000800h | 11 | SYSCALL, SYSRET enabled |
MTRR | 000001000h | 12 | Memory-type Range Reg |
PGE | 000002000h | 13 | Global Page Extension |
MCA | 000004000h | 14 | Machine Check Architecture |
CMOV | 000008000h | 15 | CMOV supported |
PAT | 000010000h | 16 | Page Attribute Table |
PSE36 | 000020000h | 17 | 36-bit Page-Size Extensions |
--- | 0000C0000h | 18, 19 | Reserved |
NEPP | 000100000h | 20 | No-Execute Page Protection |
--- | 000200000h | 21 | Reserved |
MMXEXT | 000400000h | 22 | MMX Extensions supported |
MMX | 000800000h | 23 | MMX supported |
FXSAVE | 001000000h | 24 | FXSAVE, FXRSTOR enable |
FFXSAVE | 002000000h | 25 | Fast FXSAVE, FXRSTOR |
--- | 01C000000h | 26-28 | Reserved |
EM64T | 020000000h | 29 | Long Mode (AMD64 / EM64T) supported |
3DNOWX | 040000000h | 30 | 3DNow! MMX+ supported |
3DNOW | 080000000h | 31 | 3DNow! supported |
Intel introduced a processor serial number (PSN) feature with the Pentium III, the processor that also introduced the original SSE instruction set, but after a public uproar over privacy it was removed from subsequent processors. In some respects it was useful to be able to identify a particular computer; for example, a violator of an online gaming network could have that exact machine banned by its fingerprint. Others, however, felt that people would lose their anonymity on the Internet.
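        mov   eax,1
        cpuid
        test  edx,CPUIDFLG_PSN
        jz    $xit                  ; Serial number not supported or disabled

        ; CPUID serial number is supported and enabled!
        push  eax                   ; Save signature (upper 32 bits of serial #)
        mov   eax,3
        cpuid                       ; EDX:ECX = lower 64 bits (EBX is also overwritten)
        pop   eax
        ; eax:edx:ecx = 96-bit serial number, usually displayed as
        ; uppercase hex digits: XXXX-XXXX-XXXX-XXXX-XXXX-XXXX
    $xit: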
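The three registers are usually shown as the grouped hex string in the comment above. A minimal C sketch of that formatting, assuming the signature (EAX from function 1) and EDX:ECX from function 3 have already been read; format_psn() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format the 96-bit processor serial number as six groups of four hex
   digits. hi is EAX from function 1; mid and lo are EDX and ECX from
   function 3. buf needs 30 bytes (29 characters plus the terminator). */
void format_psn(uint32_t hi, uint32_t mid, uint32_t lo, char buf[30])
{
    assert(buf != NULL);
    snprintf(buf, 30, "%04X-%04X-%04X-%04X-%04X-%04X",
             (unsigned)(hi >> 16),  (unsigned)(hi & 0xFFFFu),
             (unsigned)(mid >> 16), (unsigned)(mid & 0xFFFFu),
             (unsigned)(lo >> 16),  (unsigned)(lo & 0xFFFFu));
}
```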
CPUID reports many features, but most of them are not needed for what we are doing here. I have documented some of what this instruction does (a lot more than I normally need), but if you are truly interested in this instruction, I strongly recommend that you download the manufacturers' technical manuals.
Most programs written these days target a Protected Mode environment, so at a minimum we only need to deal with the first processor capable of truly running in Protected Mode: the 386. (The 80286 does not count!) This CPU detection algorithm detects the model, manufacturer, and capabilities, and sets flags accordingly. As we really only deal with 32-bit modes in this book, we do not bother detecting an 8086, 80186, or 80286. We do, however, detect a 386 or above. In our algorithm we use the following CPU IDs.
This instruction has been enhanced since I wrote Vector Game Math Processors, as newer instructions have been added to the processor. It has been used throughout the book, but let us examine it a bit more closely.
    ; CPU Detect - definition IDs
    CPU_386         = 3         ; 80386
    CPU_486         = 4         ; 80486
    CPU_PENTIUM     = 5         ; P5 (Pentium)
    CPU_PENTIUM_PRO = 6         ; Pentium Pro
    CPU_PII         = 6         ; PII
Prior to the Pentium processor, a computer system could optionally have a separate floating-point chip containing an FPU. In the case of CPUs, no functionality is lost as one upgrades to a more advanced processor; they are all backward compatible. This is not the case with the FPU: some functionality was lost, so if you write floating-point instructions, you should know which FPU you are coding for. Some external FPU chips did not exactly match the processor generation but were compatible with it, such as an 80287 paired with a 386.
    ; Legacy CPUs and compatible FPU coprocessors
    ;   CPU_086     NONE, FPU_087
    ;   CPU_186     NONE, FPU_087
    ;   CPU_286     NONE, FPU_287
    ;   CPU_386     NONE, FPU_287, FPU_387
    ;   CPU_486     NONE, FPU_387, FPU_487

    ; FPU Detect - definition IDs
    FPU_NONE    = 0             ; No FPU chip
    FPU_087     = 1             ; 8087
    FPU_287     = 2             ; 80287
    FPU_387     = 3             ; 80387
    FPU_487     = CPU_486
    FPU_PENTIUM = CPU_PENTIUM
    FPU_PII     = CPU_PII
The various manufacturers originally implemented the same functionality as Intel, but recently they have begun to add their own. As a result, their feature sets overlap only partially (unions and intersections can be drawn), so we use individual flags to indicate each CPU capability.
    typedef enum
    {
        CPUBITS_FPU       = 0x0001,   // FPU flag
        CPUBITS_MMX       = 0x0002,   // MMX flag
        CPUBITS_3DNOW     = 0x0004,   // 3DNow! flag
        CPUBITS_FXSR      = 0x0008,   // Fast FP Store
        CPUBITS_SSE       = 0x0010,   // SSE
        CPUBITS_SSE2      = 0x0020,   // SSE (Ext 2)
        CPUBITS_3DNOW_MMX = 0x0040,   // 3DNow! (MMX Ext)
        CPUBITS_3DNOW_EXT = 0x0080,   // 3DNow! (Ext)
        CPUBITS_3DNOW_SSE = 0x0100,   // 3DNow! Professional
        CPUBITS_HTT       = 0x0200,   // Hyperthreading Tech
        CPUBITS_SSE3      = 0x0400,   // Prescott NI
        CPUBITS_EM64T     = 0x0800,   // EM64T supported
        CPUBITS_AMD64     = 0x1000,   // AMD Long Mode
    } CPUBITS;
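Detection code has to translate the raw CPUID registers into these CPUBITS flags. One possible translation is sketched below in C; it is not the book's CpuDetect(). The CpuidRegs structure and cpubits_from_cpuid() are hypothetical, the bit positions come from the flag tables earlier in this section, and the 3DNow! extension and 64-bit flags are deliberately left out because this chapter does not spell out which vendor bit feeds each of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the CPUBITS enum above (subset), so the sketch compiles alone. */
enum {
    CPUBITS_FPU   = 0x0001,
    CPUBITS_MMX   = 0x0002,
    CPUBITS_3DNOW = 0x0004,
    CPUBITS_FXSR  = 0x0008,
    CPUBITS_SSE   = 0x0010,
    CPUBITS_SSE2  = 0x0020,
    CPUBITS_HTT   = 0x0200,
    CPUBITS_SSE3  = 0x0400
};

/* Hypothetical bundle of the raw registers a detection routine would collect. */
typedef struct {
    uint32_t edx1;     /* EDX from function 1 */
    uint32_t ecx1;     /* ECX from function 1 */
    uint32_t edx_ext;  /* EDX from function 80000001h, or 0 if unavailable */
} CpuidRegs;

uint32_t cpubits_from_cpuid(const CpuidRegs *r)
{
    uint32_t bits = 0;

    assert(r != NULL);
    if (r->edx1 & (1u << 0))     bits |= CPUBITS_FPU;
    if (r->edx1 & (1u << 23))    bits |= CPUBITS_MMX;
    if (r->edx1 & (1u << 24))    bits |= CPUBITS_FXSR;
    if (r->edx1 & (1u << 25))    bits |= CPUBITS_SSE;
    if (r->edx1 & (1u << 26))    bits |= CPUBITS_SSE2;
    if (r->edx1 & (1u << 28))    bits |= CPUBITS_HTT;
    if (r->ecx1 & (1u << 0))     bits |= CPUBITS_SSE3;
    if (r->edx_ext & (1u << 31)) bits |= CPUBITS_3DNOW;
    return bits;
}
```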
Each manufacturer has its own optimization methods, so we also record the vendor name.
Example 16-1. ...inc???CpuAsm.h
    typedef enum
    {
        CPUVEN_UNKNOWN   = 0,   // Unknown
        CPUVEN_INTEL     = 1,   // Intel
        CPUVEN_AMD       = 2,   // AMD
        CPUVEN_CYRIX     = 3,   // Cyrix
        CPUVEN_CENTAUR   = 4,   // IDT Centaur (WinChip)
        CPUVEN_NATIONAL  = 5,   // National Semiconductor
        CPUVEN_UMC       = 6,   // UMC
        CPUVEN_NEXGEN    = 7,   // NexGen
        CPUVEN_RISE      = 8,   // Rise
        CPUVEN_TRANSMETA = 9    // Transmeta
    } CPUVEN;
We use the following data structure to reference the extracted CPU information.
    typedef struct CpuInfoType
    {
        uint nCpuId;        // CPU type identifier
        uint nFpuId;        // Floating-point unit ID
        uint nBits;         // Feature bits
        uint nMfg;          // Manufacturer
        byte nProcCnt;      // # of logical processors
        byte pad[3];
    } CpuInfo;

    CpuInfo struct 4
    nCpuId   dd 0           ; CPU type identifier
    nFpuId   dd 0           ; Floating-point unit identifier
    nBits    dd 0           ; Feature bits
    nMfg     dd 0           ; Manufacturer
    nProcCnt db 0           ; # of logical processors
    pad      db 0,0,0
    CpuInfo ends
This book's CPU detection uses the following data structure to find matching vendor information. Each microprocessor that supports the CPUID instruction returns a 12-byte text string identifying its manufacturer.
    ; Vendor Data Structure
    VENDOR STRUCT 4
    vname   BYTE  '------------'
    Id      DWORD CPUVEN_UNKNOWN
    VENDOR ENDS

    VENDOR { "AMD ISBETTER", CPUVEN_AMD }        ; AMD Proto
    VENDOR { "AuthenticAMD", CPUVEN_AMD }        ; AMD
    VENDOR { "CyrixInstead", CPUVEN_CYRIX }      ; Cyrix & IBM
    VENDOR { "GenuineIntel", CPUVEN_INTEL }      ; Intel
    VENDOR { "CentaurHauls", CPUVEN_CENTAUR }    ; Centaur
    VENDOR { "UMC UMC UMC ", CPUVEN_UMC }        ; UMC (retired)
    VENDOR { "NexGenDriver", CPUVEN_NEXGEN }     ; NexGen (retired)
    VENDOR { "RiseRiseRise", CPUVEN_RISE }       ; Rise
    VENDOR { "GenuineTMx86", CPUVEN_TRANSMETA }  ; Transmeta
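The same lookup can be written in C as a linear scan of the table. This sketch assumes the 12-byte vendor string has already been extracted from EBX, EDX, and ECX; the Vendor type and vendor_lookup() are hypothetical, and the enum values are copied from CPUVEN above so the block compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values mirror the CPUVEN enum above. */
enum { CPUVEN_UNKNOWN = 0, CPUVEN_INTEL = 1, CPUVEN_AMD = 2, CPUVEN_CYRIX = 3,
       CPUVEN_CENTAUR = 4, CPUVEN_NATIONAL = 5, CPUVEN_UMC = 6,
       CPUVEN_NEXGEN = 7, CPUVEN_RISE = 8, CPUVEN_TRANSMETA = 9 };

typedef struct { const char *vname; int id; } Vendor;

static const Vendor vendors[] = {
    { "AMD ISBETTER", CPUVEN_AMD },       /* AMD prototype */
    { "AuthenticAMD", CPUVEN_AMD },
    { "CyrixInstead", CPUVEN_CYRIX },
    { "GenuineIntel", CPUVEN_INTEL },
    { "CentaurHauls", CPUVEN_CENTAUR },
    { "UMC UMC UMC ", CPUVEN_UMC },
    { "NexGenDriver", CPUVEN_NEXGEN },
    { "RiseRiseRise", CPUVEN_RISE },
    { "GenuineTMx86", CPUVEN_TRANSMETA },
};

/* Return the CPUVEN id for a 12-byte vendor string, or CPUVEN_UNKNOWN.
   Only the first 12 bytes are compared, so no terminator is required. */
int vendor_lookup(const char *vendor12)
{
    size_t i;

    assert(vendor12 != NULL);
    for (i = 0; i < sizeof vendors / sizeof vendors[0]; i++)
        if (memcmp(vendor12, vendors[i].vname, 12) == 0)
            return vendors[i].id;
    return CPUVEN_UNKNOWN;
}
```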
Example 16-2. ...RootApp.cpp
    #include <iostream>         // cout, endl
    #include "CpuAsm.h"         // CPU module
    using namespace std;

    int main()
    {
        CpuInfo cinfo;
        char szBuf[ CPU_SZBUF_MAX ];

        CpuDetect( &cinfo );                        // Detect CPU
        cout << "\nCPU Detection Code Snippet\n";

        // Fills in buffer 'szBuf' with CPU information!
        cout << CpuInfoStr( szBuf, &cinfo ) << endl;

        CpuSetup( &cinfo );                         // Now set up function pointers
        return 0;
    }
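CpuInfoStr() fills szBuf with a one-line summary of the detected CPU. Its source is not shown in this section, so the following is only a guess at how such a formatter could work: cpu_info_str() and its parameters are hypothetical, and the order in which flag names are printed was chosen to match the sample output line that follows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* CPUBITS values from the enum shown earlier (subset). */
#define CPUBITS_FPU   0x0001u
#define CPUBITS_MMX   0x0002u
#define CPUBITS_3DNOW 0x0004u
#define CPUBITS_FXSR  0x0008u
#define CPUBITS_SSE   0x0010u
#define CPUBITS_SSE2  0x0020u
#define CPUBITS_HTT   0x0200u
#define CPUBITS_SSE3  0x0400u

static const struct { uint32_t bit; const char *name; } bit_names[] = {
    { CPUBITS_FPU, "FPU" },   { CPUBITS_MMX, "MMX" },   { CPUBITS_3DNOW, "3DNOW" },
    { CPUBITS_FXSR, "FXSR" }, { CPUBITS_SSE, "SSE" },   { CPUBITS_SSE2, "SSE2" },
    { CPUBITS_SSE3, "SSE3" }, { CPUBITS_HTT, "HTT" },
};

/* Hypothetical formatter in the style of the sample output.
   Returns buf so the result can be streamed directly, as CpuInfoStr() is. */
char *cpu_info_str(char *buf, size_t size, unsigned cpu_id,
                   const char *vendor, uint32_t bits)
{
    size_t i;
    int n;

    assert(buf != NULL && vendor != NULL && size > 0);
    n = snprintf(buf, size, "CpuId:%u '%s'", cpu_id, vendor);
    if (n < 0 || (size_t)n >= size)
        return buf;                         /* already truncated: stop */
    for (i = 0; i < sizeof bit_names / sizeof bit_names[0]; i++) {
        if (bits & bit_names[i].bit) {
            size_t len = strlen(buf);
            snprintf(buf + len, size - len, " %s", bit_names[i].name);
        }
    }
    return buf;
}
```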
This is an example of what gets filled into the ASCII buffer with a call to the function CpuInfoStr().
"CpuId:15 'INTEL' FPU MMX FXSR SSE SSE2 SSE3 HTT"
That took care of the initial detection code. Now comes the fun part: function mapping. Every function you write should have a slower default version written in a high-level language such as C. This is really very simple. First there are the private definitions:
    void FmdSetup(const CpuInfo * const pcinfo);

    void vmp_FMulGeneric(float * const pfD, float fA, float fB);
    void vmp_FMulAsm3DNow(float * const pfD, float fA, float fB);
    void vmp_FMulAsmSSE(float * const pfD, float fA, float fB);

    void vmp_FDivGeneric(float * const pfD, float fA, float fB);
    void vmp_FDivAsm3DNow(float * const pfD, float fA, float fB);
    void vmp_FDivAsmSSE(float * const pfD, float fA, float fB);
    void vmp_FDivFastAsm3DNow(float * const pfD, float fA, float fB);
    void vmp_FDivFastAsmSSE(float * const pfD, float fA, float fB);
Then there are the public application definitions:
    // Multiplication
    typedef void (*vmp_FMulProc)(float * const pfD, float fA, float fB);
    extern vmp_FMulProc vmp_FMul;

    // Division
    typedef void (*vmp_FDivProc)(float * const pfD, float fA, float fB);
    extern vmp_FDivProc vmp_FDiv;
    extern vmp_FDivProc vmp_FDivFast;
There are the generic as well as processor-based functions such as:
    // Multiplication
    void vmp_FMulGeneric(float * const pfD, float fA, float fB)
    {
        ASSERT_PTR4(pfD);
        *pfD = fA * fB;
    }
The initialization code assigns the appropriate processor-based function to the public function pointer:
    void CpuSetup(const CpuInfo * const pcinfo)
    {
        ASSERT_PTR4(pcinfo);

        if (CPUBITS_SSE & pcinfo->nBits)
        {
            vmp_FMul     = vmp_FMulAsmSSE;
            vmp_FDiv     = vmp_FDivAsmSSE;
            vmp_FDivFast = vmp_FDivFastAsmSSE;      // ***FAST***
        }
        else if (CPUBITS_3DNOW & pcinfo->nBits)
        {
            vmp_FMul     = vmp_FMulAsm3DNow;
            vmp_FDiv     = vmp_FDivAsm3DNow;
            vmp_FDivFast = vmp_FDivFastAsm3DNow;    // ***FAST***
        }
        else
        {
            vmp_FMul     = vmp_FMulGeneric;
            vmp_FDiv     = vmp_FDivGeneric;
            vmp_FDivFast = vmp_FDivGeneric;
        }
    }
You will probably need to experiment with the mapping until you get used to it. You could use case statements, function table lookups, or other methods, but because the processor types overlap so much, I find that conditional branching on the feature bits works best.
What is supplied here should be thought of as a starting point. It should be included in most applications, even those that do not use any custom assembly code, as it compiles a breakdown of the computer that ran the application. With custom assembly code, it is the building block for writing cross-processor code. There is one more bit of "diagnostic" information that you can use: the processor speed. It can give you an idea of why your application is not running well. (Sometimes processors do not run at their rated speed, through either misconfiguration or overheating.) This is discussed in Chapter 18, "System."
The listed information can be obtained by using the included function CpuDetect(); however, from your point of view, who manufactured the CPU is not nearly as important as the CPUBITS flags listed above! Each of those bits, when set, indicates that the associated functionality exists. Your program merely checks the bit and selects the matching set of code. If the processor sets the CPUBITS_3DNOW bit, the program would vector to the 3DNow!-based algorithm; if the CPUBITS_SSE bit is set, it would vector to that set of code. Keep in mind that when I first started writing this book, the two never existed on the same CPU, but while I was writing it, AMD released 3DNow! Professional, a union of the two instruction families (excluding SSE3), for which there is also a CPU bit definition. However, that can easily change in the future. My recommendation is to rank the code paths from highest to lowest performance in your program's initialization logic, based on your application's criteria.
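The ranking recommended above can also be expressed as a function table, one of the alternatives mentioned earlier. The following is a minimal sketch in C, not the book's CpuSetup(); the table, the select_fmul() helper, and the fmul_* stand-in functions are hypothetical, and only the CPUBITS_SSE and CPUBITS_3DNOW values are taken from the enum shown earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CPUBITS_3DNOW 0x0004u
#define CPUBITS_SSE   0x0010u

typedef void (*FMulProc)(float *pfD, float fA, float fB);

/* Stand-ins for the real implementations; each one just multiplies in C.
   In a real build the SSE and 3DNow! entries would be assembly routines. */
static void fmul_generic(float *pfD, float fA, float fB) { *pfD = fA * fB; }
static void fmul_sse(float *pfD, float fA, float fB)     { *pfD = fA * fB; }
static void fmul_3dnow(float *pfD, float fA, float fB)   { *pfD = fA * fB; }

/* Candidates ranked from highest to lowest expected performance.
   The last entry requires no bits, so it always matches. */
static const struct { uint32_t required; FMulProc proc; } fmul_table[] = {
    { CPUBITS_SSE,   fmul_sse },
    { CPUBITS_3DNOW, fmul_3dnow },
    { 0,             fmul_generic },
};

/* Return the first candidate whose required CPUBITS are all present. */
FMulProc select_fmul(uint32_t nBits)
{
    size_t i;

    for (i = 0; i < sizeof fmul_table / sizeof fmul_table[0]; i++)
        if ((nBits & fmul_table[i].required) == fmul_table[i].required)
            return fmul_table[i].proc;
    assert(!"unreachable: generic entry always matches");
    return fmul_generic;
}
```

Adding a new code path, such as one for SSE3, then means adding one table row in the right position instead of another branch.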