Chapter 1. Introduction

When the processor manufacturer Intel is mentioned, two 64-bit processors come to mind: EM64T and the Itanium. For AMD: the AMD64. Non-80×86 manufacturers discovered years ago that competing against an established desktop market is difficult to impossible. The key to successful market injection is being able to run a large quantity of pre-existing applications. Intel and AMD have built their business upon this by periodically creating superset instruction sets for their 80×86 processors so that pre-existing software still runs on the new equipment and new software can be written for the new equipment.

The technology has been forked into two 64-bit paths. One uses the Itanium-based platform with a new 64-bit primary instruction set that belongs to the IA-64 family. The other is a superset to the IA-32, referred to as the Extended Memory 64 Technology (EM64T). Newer P4 and Xeon processors are of this alternate type.

This book targets the AMD32/64, IA-32, and EM64T processor technology. It is not written for the Itanium series. (Look for a future book to cover the Itanium processor.) The EM64T supports a new superset instruction set, SSE3, and 64-bit extensions to the IA-32 general-purpose instruction set. It also allows 64-bit operating systems such as Windows XP Professional x64 and Windows Server 2003 x64 editions to run both 64-bit and 32-bit software on the same machine.

This book can be used for both 32-bit and 64-bit instruction sets, but there is an operating system application dependency that needs to be followed.

Operating System	App (32-bit) IA-32	App (64-bit) AMD64/EM64T
Win9X (32-bit)	×
WinXP (32-bit)	×
Win2K (32-bit)	×
Win2003 (32-bit)	×
XP - X64 (64-bit)	×	×
Win Server 2003 X64	×	×

The 80×86 processor has joined the domain of the supercomputer since the introduction of SIMD (single instruction, multiple data) instruction sets, such as those in Intel's Pentium III (used in the Xbox), in all later x86 processors including the Pentium IV, and in AMD's 3DNow! extension instructions used in PCs. These are now available in 64-bit processors as well. Both fixed-point (inclusive of integer) and floating-point math are being used by the computer, video gaming, and embedded worlds in assembly and vector-based operations.

3D graphic rendering hardware has been going through major increases in the numbers of polygons that can be handled by using geometry engines as part of their rendering hardware to accelerate the speed of mathematical calculations. There is also the recent introduction of the programmable vertex and pixel shaders built into newer video cards that use this same vector functionality. (This is another type of assembly language programming. For more information on shaders read my book Learn Vertex and Pixel Shader Programming with DirectX 9.) These work well for rendering polygons with textures, depth ordering Z-buffers or W-buffers, and translucency controlled alpha channels with lighting, perspective correction, etc., at relatively high rates of speed. The problem is that the burden of all the other 3D processing, culling, transformations, rotations, etc., are put on the computer's central processing unit (CPU), which is needed for artificial intelligence (AI), terrain following, landscape management, property management, sound, etc. Fortunately for most programmers, a continuous growth market of middle-ware providers is developing key building blocks such as the Unreal 3D rendering libraries and physics packages such as Havok. Whether you are looking to become employed by these companies and generate this technology or merely one who wishes to use these libraries, you should keep in mind that the introduction of new hardware technology has created a surplus of CPU processor power that can now be used to fulfill aspects of your programming projects as well as develop new technologies. All of this creates openings for programmers needing to write assembly language, whether using a scalar or parallel architecture.

There are perhaps only two reasons for writing code in assembly language: writing low-level kernels in operating systems and writing high-speed optimized critical code. A vector processor can be given sequences and arrays of calculations to perform to enhance the performance above that of scalar operations that high-level compilers typically generate during a compile.

There are exceptions to this, as some vector compilers do exist but have not yet been adopted into the mainstream marketplace. These are well worth investigating if you are in need of high-level C code that takes advantage of SIMD instruction sets.
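The difference between scalar and vector-style processing can be sketched in portable C. This is only an illustration: the function names add_scalar and add_4wide are hypothetical, and plain C stands in for the SIMD instruction. In real vector code the group of four additions would typically be replaced by a single 128-bit instruction.

```c
#include <stddef.h>

/* Scalar form: one addition per loop iteration. */
void add_scalar(float *d, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        d[i] = a[i] + b[i];
}

/* Vector-style form: four additions per iteration, which is the work
   a single 128-bit SIMD add would perform. Plain C stands in for the
   instruction here so the sketch builds on any compiler. */
void add_4wide(float *d, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        d[i + 0] = a[i + 0] + b[i + 0];
        d[i + 1] = a[i + 1] + b[i + 1];
        d[i + 2] = a[i + 2] + b[i + 2];
        d[i + 3] = a[i + 3] + b[i + 3];
    }
    for (; i < n; i++)          /* leftover elements, fewer than four */
        d[i] = a[i] + b[i];
}
```

Both functions produce the same results; the second is merely arranged so that each group of four maps onto one packed operation.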

One other item to keep in mind is that if you understand this information, it may be easier for you to get a job in the game or embedded software development industry. This is because you will have enhanced your programming foundation and possibly have a leg up on your competition. Even if you rarely program in 80×86 assembly language, peeking at the disassembly output of your high-level compiler while debugging your application can give you insight into code bloat due to your coding methodology and you will better be able to resolve some of the weird bugs you encounter in your applications.

Goal: A better understanding of 80×86 assembly.

I know how a number of you like technical books to be like a resource bible, but I hate for assembly books (no matter how detailed) to be arranged in that fashion, because:

It takes me too long to find what I am looking for!
They almost always put me to sleep!

This book is not arranged like a bible, but it contains the same information. By using the instruction mnemonic lookup in Appendix B, it becomes an abstracted bible. It is instead arranged in chapters of functionality. If you want that bible-like alpha-sorted organization, just look at the index or Appendix B of this book, scan for the instruction you are looking for, and turn to the page.

I program multiple processors in assembly and occasionally have to reach for a book to look up the correct mnemonic. Quite often my own books! Manufacturers almost always seem to camouflage those needed instructions. As an example, the mnemonics for shifting versus rotating can be located all over the place in a book. In the 80×86, {psllw, pslld, psllq, ..., shld, shr, shrd} are mild cases due to the closeness of their spellings, but for Boolean bit logic, {and...or, pand...xor} are all over the place in an alphabetical arrangement. When grouped in chapters of functionality, however, one merely turns to the chapter related to what functionality is required and then leafs through the pages. For these examples, merely turn to Chapter 4, "Bit Mangling" or Chapter 5, "Bit Wrangling." Okay, okay, so I had a little fun with the chapter titles, but there is no having to wade through pages of extra information trying to find what you are looking for. In addition (not meant to be a pun), there are practical examples near the descriptions as well as in Chapter 19, which are even more helpful in jogging your memory as to an instruction's usage. Even the companion code for this book uses this same orientation.

The examples are for the 80×86. I tried to minimize printed computer code as much as possible so that the pages of the book do not turn into a mere source code listing! Hopefully I did not overtrim and make it seem confusing. If that occurs, merely open your source code editor or integrated development environment (IDE) to the chapter and project in the accompanying code related to that point in the book you are trying to understand. By the way, if you find a discrepancy between the code and the book, you should favor the code, as the listings in the book were cut and pasted from it and elements of code could have been lost during the editing process.

The book is also written in a friendly style so as to occasionally be amusing and thus help you in remembering the information over a longer period of time. What good is a technical book that is purely mathematical in nature, difficult to extract any information from, and just puts you (I mean me) to sleep? You would most likely have to reread the information again once you woke up! The idea is that you should be able to sit down in a comfortable setting and read the book cover to cover to get a global overview. (My favorite place to read is in a lawn chair on the back patio with a wireless laptop.) Then go back to your computer and, using the book as a tool, implement what you need or cut and paste into your code. But use at your own risk! You should use this book as an appendix to more in-depth technical information to gain an understanding of that information.

An attempt was made to layer the information so you would be able to master the information at your skill level. In regard to cutting and pasting: You will find portions of this book also occur inside one of my other published books: Vector Game Math Processors. There is a degree of overlap, but this book is to be considered the prequel and a foundation for that book. Any duplication of information between the two has been enhanced in this book as it is now almost three years later and the technology has been extended.

The code is broken down by platform, chapter, and project, but most of the code has not been optimized. This is explained later, but briefly: optimized code is difficult to read and understand. For that reason, I tried to keep this book as clear and as readable as possible. Code optimizers such as Intel's VTune program are available for purposes of optimization.

This book, as mentioned, is divided into chapters of functionality. It is geared toward learning to write 80×86 assembly language for games, or embedded and scientific applications. (Except for writing a binary-coded decimal (BCD) package, there is not a big need for assembly language in pure business applications. Graphically or statistically oriented applications, however, are another matter.) With that in mind, you will learn from this book:

Adapted coding standards that this book recommends
Bit manipulations and movement
Converting data from one form into another
Addition/subtraction (integer/floating-point/BCD)
Multiplication/division (integer/floating-point)
Special functions
(Some) trigonometric functionality
Branching and branchless coding
Some vector foundations
Debugging

It is very important to write functions in a high-level language such as C before rewriting them in assembly. This allows the algorithm to be debugged and mathematical patterns to be identified before writing the assembly version. In addition, the results of both versions can be compared to verify that they are identical and thus that the assembly code is functioning as expected. Do not write code destined for assembly using the C++ programming language because you will have to untangle it later. Assembly language is designed for low-level development and C++ for high-level object-oriented development using inheritance, name mangling, and other levels of abstraction, which makes the code harder to simplify. There is of course no reason why you would not wrap your assembly code with C++ functions or libraries. But I strongly recommend you debug your assembly language function before locking it away in a static or dynamic library, as debugging it will become harder.
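The comparison step described above can be sketched as a small harness. Everything here is an assumption made for illustration: the names sum_reference, sum_candidate, and results_match are hypothetical, and sum_candidate is written in C only so that the sketch builds anywhere. In practice it would be the assembly routine under test.

```c
#include <stddef.h>

/* Signature shared by the C reference and the version under test. */
typedef float (*sum_fn)(const float *p, size_t n);

/* Reference version: written and debugged in C first. */
float sum_reference(const float *p, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Stand-in for the assembly rewrite. It keeps two partial sums, the
   way an unrolled routine might, so the operation order differs. */
float sum_candidate(const float *p, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f;
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        s0 += p[i];
        s1 += p[i + 1];
    }
    if (i < n)                  /* odd element left over */
        s0 += p[i];
    return s0 + s1;
}

/* Returns 1 when both versions agree within tol on the given input.
   A tol of 0.0f demands identical results. */
int results_match(sum_fn ref, sum_fn cand,
                  const float *p, size_t n, float tol)
{
    float diff = ref(p, n) - cand(p, n);
    if (diff < 0.0f)
        diff = -diff;
    return diff <= tol;
}
```

A tolerance parameter is included because a rewrite that reorders floating-point operations can legitimately differ in the last bits; for integer routines a tolerance of zero is the natural choice.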

Tip

Sometimes examining compiler output can give insight into writing optimized code. (That means peeking at the disassembly view while debugging your application.)

Conventions Used in This Book

Companion Code

This book is accompanied by downloadable code available from www.wordware.com/files/8086 and www.leiterman.com/books.html. Each chapter with related sample code will have a table similar to the following:

Workbench Files: \Bench\x86\chap02\project\platform

	project	platform
Ram Test	ramtest	vc6
ram	ram	vc.net

Substituting the data elements in a column into the "Workbench Files:" path will establish a path as to where a related project is stored.

The idea of the code is to be a usable sample of assembly code that has been split open to display its internals for observation and learning. Each module has its own initialization routine to set up function prototype pointers. These are used to vector to the appropriate function designed to be executed by a specific instruction set. This is mainly for the PC as the Xbox only uses a single PIII processor. The correct functions best suited for that processor are assigned to the pointers. If you are one of those (unusual) people running two or more processors in a multiprocessor computer, and they are not identical, then I am sorry! A few motherboards still support this scenario. The good news is that the samples are single threaded so you should theoretically be okay.
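The initialization scheme described above might look roughly like the following sketch. None of these names come from the companion code: the CPU_* flags, vec_add, and vec_module_init are hypothetical, and the "SSE" routine has a plain C body so the example builds on any machine. Real code would fill the flags from a CPUID query and point at an assembly routine.

```c
/* Hypothetical capability flags; real code would derive these
   from the CPUID instruction. */
enum { CPU_GENERIC = 0, CPU_SSE = 1 };

/* Prototype shared by every implementation of the operation. */
typedef void (*vec_add_fn)(float *pD, const float *pA, const float *pB);

/* Generic C version, valid on any processor. */
void vec_add_generic(float *pD, const float *pA, const float *pB)
{
    for (int i = 0; i < 4; i++)
        pD[i] = pA[i] + pB[i];
}

/* Placeholder for an SSE assembly routine. The body is plain C here
   so the sketch compiles everywhere. */
void vec_add_sse(float *pD, const float *pA, const float *pB)
{
    for (int i = 0; i < 4; i++)
        pD[i] = pA[i] + pB[i];
}

/* Callers always go through this pointer. */
vec_add_fn vec_add = vec_add_generic;

/* Module initialization: pick the routine best suited to the CPU. */
void vec_module_init(unsigned cpu_flags)
{
    if (cpu_flags & CPU_SSE)
        vec_add = vec_add_sse;
    else
        vec_add = vec_add_generic;
}
```

After vec_module_init has run once at startup, application code calls vec_add(d, a, b) without knowing which instruction set sits behind it.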

One other item: There are many flavors of processors and although the basis of each function call was a cut-and-paste from other functions, the core functionality of the function was not. So there may be some debris where certain comments are not exactly correct; forgive me. There is a lot of code to have to wade through and keep in sync with the different models of 80×86 processors. It has been primarily borrowed from my vector book.

Image Patterning

The images in this book show flow control and sometimes have a split in the middle. These types of images are typically 128-bit SSE, but in cases where the 64-bit MMX has identical functionality, the right side of the split is used to represent the 64-bit subset, bits 0...63.

Sometimes the sub-pattern is not appropriate and so it will be shown separately.

Some of these instructions support not only 4× single-precision floating-point (SSE) and 2× single-precision floating-point (3DNow!, which uses the 64-bit MMX registers), but also 2× double-precision floating-point (SSE2). It just so happens that the two-element patterns for the 64-bit MMX registers and SSE2 match, so refer to the MMX pattern when working with double-precision floating-point.

Processor Legend

Each mnemonic will indicate a version of processor that supports it.

	Intel
P	Pentium
PII	Pentium II (MMX)
SSE	SSE (Katmai NI)
SSE2	SSE2
SSE3	SSE3 (Prescott NI)
E64T	64-bit Memory

	AMD
K6	K6
3D!	3DNow!
3Mx+	3DNow! and MMX Ext.
A64	AMD64

Mnemonic	P	PII	K6	3D!	3Mx+	SSE	SSE2	A64	SSE3	E64T
INC								32		32
SYSRET										64

	Mnemonic supported
32	32-bit support only
64	64-bit support only

Each mnemonic will also have a table showing its organization as well as type of data, bit size of data, and operand method for each processor supported.

	Op rmDst, rSrc(8/16/32/64)	Signed
	Op rDst, rmSrc(8/16/32/64)
MMX	Op mmxDst, mmxSrc/m64	Signed	64
3DNow!	Op mmxDst, mmxSrc/m64	Signed	64
SSE	Op xmmDst, xmmSrc/m128	Single-precision	128
SSE2	Op xmmDst, xmmSrc/m128	Single-precision	128

rDst	General-purpose register — {8-, 16-, 32-, 64-bit}
rSrc{8/16/32/64}
m32, m64, m128	32-, 64-, 128-bit memory access
rmDst	General-purpose register and/or memory
mmxDst, mmxSrc	MMX registers
xmmDst, xmmSrc	XMM registers

Note

When general-purpose registers specify 64 bit, such as rDst, rmSrc{8/16/32/64}, that size is only available when running in 64-bit mode. In contrast, 64-bit data is available to the MMX and SSE instruction sets regardless of mode.

Some operators take zero, one, two, or three arguments. Unlike on some non-80×86 processors, sourceB is the same as the destination: the 80×86 uses a D += A methodology. When the SIMD operator is being explained, both arguments will be used, such as "The Op instruction operates upon the sources aSrc (xmmSrc) and bSrc (xmmDst), and the result is stored in the destination Dst (xmmDst)." In some cases operand is used instead of source and/or destination, especially when the instruction has no destination. SSE registers (xmm) are favored over MMX (mmx) registers in explanations. 32-bit general-purpose registers are favored over 8/16/64-bit registers as that is the default size for both 32-bit and 64-bit processors.
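The D += A methodology can be modeled in C as follows. This is a sketch under assumptions: the vec128i type and both function names are hypothetical, and the comments name the packed-add and register-copy instructions (paddd, movdqa) only to show what each C step stands for.

```c
#include <stdint.h>

/* A 128-bit register viewed as four 32-bit lanes (illustrative type). */
typedef struct { int32_t lane[4]; } vec128i;

/* Two-operand form used by the 80x86: the destination is also a
   source. Models "paddd xmmDst, xmmSrc", that is, D += A.
   The unsigned casts give wraparound on overflow, as the packed
   integer add does. */
void add_two_operand(vec128i *dst, const vec128i *src)
{
    for (int i = 0; i < 4; i++)
        dst->lane[i] = (int32_t)((uint32_t)dst->lane[i] +
                                 (uint32_t)src->lane[i]);
}

/* Three-operand behavior, D = A + B, built from a copy plus D += A.
   The copy is what preserves B. */
void add_three_operand(vec128i *d, const vec128i *a, const vec128i *b)
{
    vec128i tmp = *b;           /* like: movdqa xmmTmp, xmmB */
    add_two_operand(&tmp, a);   /* like: paddd  xmmTmp, xmmA */
    *d = tmp;
}
```

The practical consequence is that a value needed again later must be copied to another register before it is used as a destination.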

The following list describes the data types given above.

Unsigned — Unsigned integer
Signed — Signed integer
[Un]signed — Sign neutral, thus integer can be either signed or unsigned
Single-Precision — Single-precision floating-point
Double-Precision — Double-precision floating-point
Extended-Precision — Double extended-precision floating-point
DPFP → INT32 — (Conversion) double-precision floating-point to signed 32-bit integer

Note

Technically, an integer whose sign is not specified (sign neutral) is called signless, as in languages such as Pascal, or [un]signed.

Normally there is no third case: if an integer is not signed, then it is unsigned. In this book, however, I needed a third case to represent a value that could be either signed or unsigned, thus [un]signed is utilized.
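A short C sketch may help show why the third case is useful. The function names are hypothetical, and the code assumes the usual two's complement representation. Addition yields the same bit pattern whether the operands are viewed as signed or unsigned, so it is sign neutral; comparison is not.

```c
#include <stdint.h>

/* Addition viewed as unsigned: result truncated to 8 bits. */
uint8_t add_bits_unsigned(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* The same bits viewed as signed, added, then returned as raw bits.
   Assumes two's complement conversion between uint8_t and int8_t. */
uint8_t add_bits_signed(uint8_t a, uint8_t b)
{
    int8_t sa = (int8_t)a;
    int8_t sb = (int8_t)b;
    return (uint8_t)(sa + sb);
}

/* Comparison depends on the view, so it is not sign neutral. */
int less_unsigned(uint8_t a, uint8_t b)
{
    return a < b;
}

int less_signed(uint8_t a, uint8_t b)
{
    return (int8_t)a < (int8_t)b;
}
```

With the bit patterns 0xF0 and 0x20, both additions give 0x10, yet 0xF0 is greater than 0x20 as an unsigned value (240 versus 32) and less than it as a signed value (-16 versus 32).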

Notes, Tips, and Hints

A Hint is an indicator of something to watch for and take notice. It is typically something that can be helpful in the development of code, such as the following:

The No 64-bit icon indicates that a particular instruction is not supported in 64-bit mode. It looks like the following:

The No App (Sys Only) icon indicates that a particular instruction is only accessible from Real Mode or Privilege Level 0.

Pseudo Vec

A pseudo vector declaration will have an associated C language macro or functions to emulate the functionality of a SIMD operator.

The source code contains various sets of code for the different processors. Almost none of the functions actually return a value; instead the results of the equation are assigned to a pointer specified by the first (and sometimes second) function parameters. They will always be arranged in an order similar to the following:

void vmp_functionPlatform(argDest, argSourceA, argSourceB)

For the generic C code, the term "Generic" will be embedded as in the following:

void vmp_FAddGeneric(float *pfD,                   // Dest
                     float fA, float fB);          // SrcA, SrcB

void vmp_QVecAddGeneric(vmp3DQVector * const pvD,  // Dest
                  const vmp3DQVector * const pvA,  // SrcA
                  const vmp3DQVector * const pvB); // SrcB

The important item to note is the naming of the actual parameters, {pvD, pvA, pvB} in the above example. The "F" in FAdd represents a scalar single-precision float.

Table 1-1. Float data type declaration

"F" in vmp_FAdd	Scalar single-precision float
"Vec" in vmp_VecAdd	Single-precision (3) float vector
"QVec" in vmp_QVecAdd	Single-precision (4) float vector
"DVec" in vmp_DVecAdd	Double-precision (4) float vector

Table 1-2. Data size of an element

"v" as in pvD	128-bit vector typically (SP float)
"b" as in pbD	8-bit [Un]signed packed
"h" as in phD	16-bit [Un]signed packed
"w" as in pwD	32-bit [Un]signed packed
"d" as in pdD	64-bit [Un]signed packed

Really complicated, don't you think? Of course when you selectively cut and paste some of this library code into your functions and change them just enough to make them your own, you will most probably be using your own conventions and not mine. Whatever works for you!
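For reference, the two generic declarations shown earlier could be filled in along the following lines. This is a guess at the obvious implementation, not the companion code: the layout of vmp3DQVector (four floats named x, y, z, w) is an assumption, and the real definition in the downloadable source may differ.

```c
/* Assumed layout: four single-precision floats. The actual
   vmp3DQVector is defined in the companion code. */
typedef struct { float x, y, z, w; } vmp3DQVector;

/* Scalar add: the result goes through the first (destination)
   parameter rather than a return value. */
void vmp_FAddGeneric(float *pfD,                   /* Dest */
                     float fA, float fB)           /* SrcA, SrcB */
{
    *pfD = fA + fB;
}

/* Quad vector add: element-by-element, emulating a 4x
   single-precision SIMD add in plain C. */
void vmp_QVecAddGeneric(vmp3DQVector * const pvD,  /* Dest */
                  const vmp3DQVector * const pvA,  /* SrcA */
                  const vmp3DQVector * const pvB)  /* SrcB */
{
    pvD->x = pvA->x + pvB->x;
    pvD->y = pvA->y + pvB->y;
    pvD->z = pvA->z + pvB->z;
    pvD->w = pvA->w + pvB->w;
}
```

Note how the destination-first ordering and the p/v/f prefixes follow the naming tables above.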

Pseudo Vec (x86) (3DNow!) (3DNow!+) (MMX) (MMX+) (SSE) (SSE2) (SSE3)

Declarations that are followed by a processor label specify alternate assembly code specifically for use by that processor to emulate an operation.

Graphics 101 (x86) (3DNow!) (MMX)

An algorithm typically found in graphics programming; the 101 is a play on a university beginning class number. This is a graphics algorithm implementation for the specified processor instruction set.

Algebraic Law

This item is self-explanatory. Algebraic laws are used in this book as well as other books related to mathematics. The first time one of the laws is encountered it will be defined similarly to this as a reminder to the rules of algebra learned back in school.

The following are some of the symbols used throughout this book.

I-VU-Q

The I-VU-Q abbreviation marks an interview question that has been encountered, along with its solution.

I have thrown interview questions and their answers into this book that have come up from time to time during my interviews over the years. Hopefully they will help you when you have to handle a programming test, as it seems that some of those same questions are continuously passed around from company to company.

If you wish to read my diatribe related to programming tests, either buy my vector book or one of my new books to be published, such as Programming Pyrite — The Fool's Gold of Programming Video Games!

So let us begin! But wait! During the development of this book (or any of my books) I frequent bookstores and examine similar books to see what the good and bad features were, what they did to make their books more complete, etc. I was just about to send this book to the publisher when a new book came out. I rushed over to check out a copy. I picked it up, saw the reduced price, leafed through it, smiled, and returned it to the shelf. For 80×86 processors, do not buy a book published prior to 2001 unless you are truly a beginner and are trying to learn basic foundations. Even if the book was just published, look at it carefully and compare it to others. All the post-MMX instructions came out from 2001 to the present. These later instructions are really the interesting ones. Some of those early instructions are not even available on the latest processors, so you would learn something that would not even be valid. I have seen some of these books not only discuss but document all the interrupt function methods including video interrupt 10h, DOS interrupt 21h, printer interrupts, or some of the other interrupts long since retired with the release of Win95. (That was around 1995, was it not?) Those were DOS commands (pre-32-bit Windows!). (Okay, I admit I did a tad, but only enough to explain the functionality of the INT instruction.) The organization of CD or disk drive sectors was important back then as well, but not now unless you are writing your own kernel drivers for the manufacturer. The point is, you will not find filler such as that in this book.

At the time of this writing this book was the most complete book in terms of instruction sets that I could find. Besides, books with large page counts are hard to just kick back with feet on the desk and read. Your hands get cramps just trying to hold the book in a comfortable position.

This book does not come with a CD, but the source code used in this book is downloadable from both www.wordware.com/files/8086 and www.leiterman.com/books.html.

Now we are ready to begin!
