This chapter reminds me of the old Batman and Robin show where in a fight scene we see sound effect words flash on screen such as "OOF," "KABONG," "ZING," "MASM," "ZOT," "TASM," "ZANG," "WASM," and "POW." These, by an amazing coincidence, are the war cry of the assembly language programmer. Wars have started for far less than trying to get one of these programmers to use a different C/C++ compiler or assembler. At one company I spent almost two years writing a good portion of the application core libraries and all the documentation for their SDK that allowed internal and external programmers to write online computer games using dedicated game servers. For the last year and a half there I worked on their Win32 Network API SDK (not to be confused with Microsoft's GameSDK). For the first four months there I wrote the DOS SDK, which uses 16-bit and various forms of 32-bit Extended DOS. It was a mix and match of C/C++ compilers, assemblers, linkers, and DOS extenders. It seemed every company had their own flavor. They had gotten used to their favorite combination and nothing was better! So every now and then I had to create libraries for that new flavor to entice new clients. Some of the code I currently write also uses the High C/C++ compiler with MASM or Pharlap's 386ASM. I do not use it these days, but there is also the Watcom C/C++ with their WASM Assembler. Occasionally on software I write today, I get inquiries if my libraries are compatible with the Borland TASM Assembler. I have used all of these and a few others, and to date my favorite is MASM by Microsoft.
There is a form of assembly that we should not forget: in-line assembly. Some people swear by it. I, on the other hand, swear at it! I rarely use it, and then only for specific types of data conversion that need to be fast without the overhead of a function call into a pure assembly routine. It is akin to programming with one arm tied behind one's back, because a lot of macro assembler features are not available.
I have read book reviews in which advocates of non-MASM assemblers say a book would have been a lot better if the author had used TASM instead of MASM. Again, a personal bias! Although I have a few reservations about MASM, I have a personal bias for it. In writing this book I have tried to appease the critics by keeping the examples as generic as possible, and if this was not good enough for you, "RASPBERRIES!" MASM is available on its own only as a download, but it is also bundled with the Visual C++ 6 and VC .NET compilers.
You should always use the latest and greatest version of your favorite assembler, because an older version could have bugs (I find them all the time) or be too old to support some of the newer instructions. Back when MMX first came out I had to use Intel's IAMMX.INC with MASM as a workaround just to support MMX instructions. Since then MMX support has been built into MASM. Now for SSE3 support you need to either get the latest VC .NET or download the ia_pni.inc file, which emulates the new instructions with macros. With one other company's assembler I had to hand-code the opcodes to make sure I got the appropriate JMP instruction. The jump instruction that I had written in assembly was not the jump instruction being encoded into machine code. A bug was being introduced into compiled code because of a bug in the assembler itself!
With the latest instruction sets, two assemblers seem to be at the forefront in supporting recently introduced instructions: MASM and NASM. Whichever assembler you use, the example code in this book uses the following equates as placeholders for the arguments being passed in:
arg1    equ     8               ; Argument #1
arg2    equ     (arg1+4)        ; Argument #2
arg3    equ     (arg2+4)        ; Argument #3
arg4    equ     (arg3+4)        ; Argument #4
arg5    equ     (arg4+4)        ; Argument #5
arg6    equ     (arg5+4)        ; Argument #6
arg7    equ     (arg6+4)        ; Argument #7
arg8    equ     (arg7+4)        ; Argument #8

; void unzip(byte *pRaw, byte *pZip, uint nWidth);

        public  unzip
unzip   proc    near
        push    ebp
        mov     ebp,esp
        push    ebx
        push    esi
        push    edi

        mov     esi,[ebp+arg1]  ; pRaw
        mov     edi,[ebp+arg2]  ; pZip
        mov     ecx,[ebp+arg3]  ; nWidth
        ;
        ;
        ;
        pop     edi
        pop     esi
        pop     ebx
        pop     ebp
        ret
unzip   endp
You will note that I used arg1 instead of 8 as shown below:
mov eax,[ebp+8]
As an alternative to the arg1 you could use a define to make the argument name make more sense.
pRaw    =       arg1
        mov     eax,[ebp+pRaw]
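To see why arg1 is 8 rather than 0, it can help to model the frame in C. The following is only a sketch and is not part of the original listings: the helper names frame_build and frame_read are hypothetical, and it assumes a 32-bit frame as it looks right after "push ebp / mov ebp,esp", with the caller's EBP at [ebp+0] and the return address at [ebp+4].

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets from EBP, mirroring the arg1..arg3 equates. */
enum { ARG1 = 8, ARG2 = ARG1 + 4, ARG3 = ARG2 + 4 };

#define FRAME_BYTES 20  /* saved EBP + return address + three 32-bit args */

/* Hypothetical helper: lay out a simulated frame as it looks after
   "push ebp / mov ebp,esp"; frame[0] plays the role of [ebp+0]. */
static void frame_build(uint8_t frame[FRAME_BYTES], uint32_t saved_ebp,
                        uint32_t ret_addr, const uint32_t args[3])
{
    memcpy(frame + 0, &saved_ebp, 4);   /* [ebp+0]  caller's EBP       */
    memcpy(frame + 4, &ret_addr, 4);    /* [ebp+4]  return address     */
    memcpy(frame + 8, args, 12);        /* [ebp+8]  onward: arguments  */
}

/* Hypothetical helper: the C analogue of "mov eax,[ebp+offset]". */
static uint32_t frame_read(const uint8_t *ebp, int offset)
{
    uint32_t value;
    assert(offset >= 0 && offset + 4 <= FRAME_BYTES);
    memcpy(&value, ebp + offset, sizeof value);
    return value;
}
```

Reading at ARG1 skips the 8 bytes occupied by the saved EBP and the return address, which is exactly what the equate encodes. The later pushes of EBX, ESI, and EDI land below EBP and so do not shift these offsets.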
The following information is a brief overview and you should refer to your assembler's documentation for specific information.
When using the Microsoft Macro Assembler you should use the latest and greatest version because of the extended instruction sets. However, the newer instructions are not accepted by default; you have to enable them with directives.
When using this assembler, the first thing you need to do is activate the appropriate CPU target by using one of the following assembler directives depending on what processor will be executing that section of code. There are several of these directives such as .386, .486, .586, etc. If the target is for an embedded 486, then obviously the .586 directive would not be used, as instructions would be allowed that the 486 would not understand. When you write your code for a single processor you can merely set the appropriate directive(s) at the top of the file, but quite often a single file will contain sets of code unique to individual processors.
.686: This allows family 6 (P6 class, Pentium Pro and later) x86 code to be assembled. It does not by itself enable MMX; the .MMX directive that normally follows it on the next line is what actually allows MMX instructions to be assembled. You can pretty much always include that directive, as most processors being released these days support MMX. You do need to make sure that the code is only going to be executed by one of those processors, however.
The directives target not only processors but also specific instruction sets, so take care when choosing which directives to set.
.MMX: An alternate method is to enable individual instruction sets, such as this directive for enabling MMX code.
.K3D: This is the directive for the 3DNow! instruction set. As you do not want an Intel processor trying to execute one of these instructions, only insert this above 3DNow! instruction code. These directives are also order dependent, so this one must occur after the .MMX directive.
.XMM: Use this if you are using any SSE-based instructions requiring XMM registers.
There are other legacy declarations such as .387, .286, .386, .386P, .486, .486P, etc. The suffix "P" indicates an enabling of privileged instructions.
For more information, see http://msdn2.microsoft.com/library/afzk3475(en-us,vs.80).aspx.
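The directives only control what the assembler will accept; they do nothing to stop the resulting code from running on a processor that lacks those instructions. A run-time check is therefore the usual companion. The following is a minimal sketch rather than anything from the original text: the names CpuFeatures, decode_features, and query_features are hypothetical, and the CPUID call assumes the GCC/Clang cpuid.h helper header on an x86 target (other compilers provide different intrinsics). The bit positions used are CPUID leaf 1 EDX bit 23 (MMX) and bit 25 (SSE), and leaf 80000001h EDX bit 31 (3DNow!).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: GCC or Clang on x86, where <cpuid.h> supplies __get_cpuid. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_CPUID_H 1
#else
#define HAVE_CPUID_H 0
#endif

/* Hypothetical summary of the instruction sets the directives enable. */
typedef struct {
    int mmx;        /* corresponds to .MMX */
    int sse;        /* corresponds to .XMM */
    int amd3dnow;   /* corresponds to .K3D */
} CpuFeatures;

/* Decode raw CPUID register values into feature flags. */
static CpuFeatures decode_features(uint32_t leaf1_edx, uint32_t ext1_edx)
{
    CpuFeatures f;
    f.mmx      = (int)((leaf1_edx >> 23) & 1u);
    f.sse      = (int)((leaf1_edx >> 25) & 1u);
    f.amd3dnow = (int)((ext1_edx  >> 31) & 1u);
    return f;
}

/* Query the running processor; reports no features where CPUID
   is unavailable to this sketch. */
static CpuFeatures query_features(void)
{
    unsigned int a = 0, b = 0, c = 0, std_edx = 0, ext_edx = 0;
#if HAVE_CPUID_H
    if (!__get_cpuid(1, &a, &b, &c, &std_edx))
        std_edx = 0;
    if (!__get_cpuid(0x80000001u, &a, &b, &c, &ext_edx))
        ext_edx = 0;
#endif
    (void)a; (void)b; (void)c;
    return decode_features(std_edx, ext_edx);
}
```

A caller would test, for example, query_features().amd3dnow before branching into a routine assembled under .K3D, and fall back to a plain x86 routine otherwise.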
Here is a sample file that you should be able to drop into a Win32 application in conjunction with the Visual C++ compiler:
        TITLE   zipX86M.asm - My x86 (MASM) Assembly
        PAGE    53,160
; This module is designed to Blah Blah Blah!
;
;       Created - 20 May 98 - J.Leiterman
;       Tabs = 8

        .686
        .MMX
        .K3D
        .model  flat, C

        .data
        ALIGN   4
foo     dd      0               ; Data value

zipX86M SEGMENT USE32 PUBLIC 'CODE'
;
; void unzip(byte *pRaw, byte *pZip, uint nWidth);
;
        align   16
unzip   PROC C PUBLIC USES ebx esi edi pRaw:PTR, pZip:PTR, nWidth:DWORD
        mov     esi,pRaw
        mov     edi,pZip
        mov     ecx,nWidth
        ;
        ;
        ;
        ret
unzip   endp

zipX86M ends
        end
The function is declared PUBLIC, meaning it's global in definition and can therefore be accessed by functions in other files.
unzip PROC C PUBLIC USES ebx esi edi pRaw:PTR, pZip:PTR, nWidth:DWORD
For convenience, you can specify the registers to push onto the stack and in what order. The RET instruction is actually a macro when used within this PROC, and therefore the registers are popped automatically in a reverse order wherever a RET instruction is encountered. The coup de grâce? No more pesky code like:
mov esi,[ebp+arg1] ; pRaw
Instead, you just use:
mov esi,pRaw
The assembler expands the PROC macro and takes care of everything for you, making your code a little more readable.
You will notice that I used the default data segment (.data) as this is a flat memory model, but I declared a 32-bit Protected Mode code segment. The reasoning is that I tend to group my assembly files using an object-oriented approach, and as such all my decompression functions/procedures would reside within this segment. Other assembly code related to other functionality would be contained in a different file with a different segment name. Unrelated code can share the same segment name, but it would not appear very organized, especially in the application address/data map.
zipX86M SEGMENT USE32 PUBLIC 'CODE'
        :
        :
zipX86M ends
Since segments are being mentioned, I am going to give you a snapshot of segments back in the days of DOS and DOS extenders. Code and data were differentiated by 16-bit versus 32-bit code/data addressing. The following is a snippet of code from those days.
; Segment Ordering Sequence

INN_CODE32 segment para USE32 'CODE'
INN_CODE32 ends

_TEXT      segment
_TEXT      ends

INN_DATA32 segment para USE32 'DATA'
INN_DATA32 ends

DGROUP     GROUP   INN_DATA32
CGROUP     GROUP   _TEXT
CGROUP     GROUP   INN_CODE32
We also must not forget the (end) signal to the assembler that it has reached the end of the file:
End
I personally think this is just a carryover from the good old days of much simpler assemblers. With the advent of conditional assembly directives such as the following, you can turn on or off various sections of code and not just the bottom portion of your file:
if 0
else
endif
Visual C++ has never really had a peaceful coexistence with its own MASM Assembler. In the early days of around version 3.x you had to assemble your files using batch files or external make files and only link the *.obj files into your project files. Microsoft has fortunately made this a little simpler, but in my opinion it still seems shortsighted. My assumption is that they would prefer you to use either inline assembly or none at all. (But I've been known to be wrong before!)
The first thing you need to do is add the MASM hooks into your version 5.0 or above Visual C++ environment. Select the Tools|Options menu item, and then select the Directories tab. Set the following to the path of your MASM installation:
Executable Files:   c:\masm\bin
                    c:\masm\binr
Include Files:      c:\masm\include
With your project loaded in your FileView tab, just right-click on the project files folder, and select the pop-up menu item Add Files to Project. The Insert Files into Project dialog box will be displayed. That dialog seems to support almost every file type known except for assembly! What you need to do is select the All Files (*.*) option, select the assembly file you desire, and then press the OK button.
Now that the file occurs in your list of files in your project, right-click on that file and select the Settings item from the pop-up menu. In the Commands edit box insert the following:
ml @MyGame.amk ..\util\unzipx86.asm
This will execute the assembler using the option switches defined in the MyGame.amk file. In the Outputs edit box insert the following:
unzipx86.obj
Then press the OK button.
To make my life simpler I use a file, such as the following, that I refer to as my assembly make file. I clone it from project to project, as you'll never know when you'll need to tweak it.
File: MYGAME.AMK

/L../util /c /coff /Cp /Fl /Fm /FR /Sg /Zd /Zi
For those of you who would prefer to use in-line assembly or just plain don't have an assembler, you can do the same thing with the following from within your C/C++ code.
void unzip(byte *pRaw, byte *pZip, uint nWidth)
{
    __asm {
        mov esi,pRaw
        mov edi,pZip
        mov ecx,nWidth
    };
}
Be very careful when you mix C/C++ and in-line assembly code: unless you push the registers you modify and pop them afterward, you risk clobbering values that the surrounding compiled code expects to be preserved. Setting a breakpoint at the beginning of your function and then examining the generated code in the debugger at run time can help point out any register use conflicts.
MASM is my favorite macro assembler because of its excellent macro expansion ability. Not only can new instructions be incorporated through macros, but the predefined macro facilities can be used to advantage because they are C-like. In some cases, I find it better than C. In fact, in-line assembly sucks! (Another technical term!) (Note: I only said in some cases!) The following are some of the highlights. For details, read the technical manuals. For example, the MASM toolset has the following manuals:
Environment and Tools
Programmer's Guide
Reference
In the following charts, notice the C method on the left and the MASM method on the right.
Defines are pretty similar; however, enums do not exist and so must be emulated with a standard equate.
C                               MASM

#define FOO 3                   FOO = 3

typedef enum
{
    CPUVEN_UNKNOWN = 0,         CPUVEN_UNKNOWN = 0
} CPUVEN;
MASM can contain a structure definition just like C:
C                               MASM

typedef struct CpuInfoType
{                               CpuInfo struct 4
    uint nCpuId;   // CPU Id    nCpuId    dd 0    ; CPU Id
    uint nFpuId;   // FPU Id    nFpuId    dd 0    ; FPU Id
    uint nBits;    // Feature   nBits     dd 0    ; Feature
    uint nMfg;     // Mfg       nMfg      dd 0    ; Mfg.
    uint16 SerialNum[6];        SerialNum dw 0,0,0,0,0,0
    uint nSpeed;   // Speed     nSpeed    dd 0    ; Speed
} CpuInfo;                      CpuInfo ends
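When the same structure is declared on both sides, the two layouts have to agree byte for byte, and the C side can be made to check that. The sketch below is an illustration, not part of the original chart: it assumes that uint is a 32-bit and uint16 a 16-bit unsigned type (spelled here with the stdint.h names), and the function cpuinfo_layout_matches is hypothetical. The expected offsets are the ones implied by the dd and dw sizes in the MASM column.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: uint == uint32_t and uint16 == uint16_t. */
typedef struct CpuInfoType {
    uint32_t nCpuId;        /* dd      offset 0            */
    uint32_t nFpuId;        /* dd      offset 4            */
    uint32_t nBits;         /* dd      offset 8            */
    uint32_t nMfg;          /* dd      offset 12           */
    uint16_t SerialNum[6];  /* dw x 6  offset 16, 12 bytes */
    uint32_t nSpeed;        /* dd      offset 28           */
} CpuInfo;

/* Returns 1 when the C layout matches the offsets implied by the
   MASM declaration, 0 otherwise. */
static int cpuinfo_layout_matches(void)
{
    return offsetof(CpuInfo, nCpuId)    == 0
        && offsetof(CpuInfo, nFpuId)    == 4
        && offsetof(CpuInfo, nBits)     == 8
        && offsetof(CpuInfo, nMfg)      == 12
        && offsetof(CpuInfo, SerialNum) == 16
        && offsetof(CpuInfo, nSpeed)    == 28
        && sizeof(CpuInfo)              == 32;
}
```

Calling cpuinfo_layout_matches() once at start-up (or turning the comparisons into compile-time assertions) catches a mismatch early, for example after someone adds a field to only one of the two declarations.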
The C preprocessor has no looping macro expansion; a definition is simply expanded once wherever it is used. A macro assembler, however, offers some special macro functionality.
MASM supports the REPEAT directive, typically used in conjunction with an assembly-time variable.
        i = 0
        REPEAT  5
        mov     [i + ebx],eax
        i = i + 4
        ENDM
The i exists only at assembly time and never appears in the generated code. In this example the body of the REPEAT block is replicated five times, with i advancing by 4 (the size of the write) on every iteration. So the code is unrolled:
        mov     [0 + ebx],eax
        mov     [4 + ebx],eax
        mov     [8 + ebx],eax
        mov     [12 + ebx],eax
        mov     [16 + ebx],eax
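For contrast, here is roughly what the same unrolling looks like in C, where the preprocessor cannot loop and each store has to be written out by hand. This is only a sketch: the STORE32 macro and the store5 function are hypothetical names, base stands in for EBX, and value stands in for EAX.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One store: the C analogue of "mov [offset + ebx],eax". */
#define STORE32(base, offset, value) \
    memcpy((uint8_t *)(base) + (offset), &(value), sizeof(uint32_t))

/* Hand-unrolled equivalent of the REPEAT 5 expansion: five 32-bit
   stores at byte offsets 0, 4, 8, 12, and 16. */
static void store5(void *base, uint32_t value)
{
    assert(base != NULL);
    STORE32(base,  0, value);
    STORE32(base,  4, value);
    STORE32(base,  8, value);
    STORE32(base, 12, value);
    STORE32(base, 16, value);
}
```

Changing the count from five to eight means editing the C function by hand, whereas the assembler version only needs a different number after REPEAT.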
MASM also supports a while loop.
        i = 0
        WHILE   i LT 20         ; i < 20
        mov     [i + ebx],eax
        i = i + 4
        ENDM
This expands to essentially the same code as the REPEAT example. The example was a simple loop, but WHILE is typically used when the loop condition is more complex than a fixed count.
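The choice of relational operator decides how many copies of the body are emitted, so it is worth counting before trusting the expansion. The small C sketch below is an illustration only; while_expansions is a hypothetical helper that models an assembly-time WHILE whose variable starts at 0 and advances by a fixed step, with the inclusive flag selecting LE rather than LT.

```c
#include <assert.h>

/* Hypothetical helper: number of times a WHILE body is emitted when
   i starts at 0 and advances by step. inclusive != 0 models
   "WHILE i LE limit"; inclusive == 0 models "WHILE i LT limit". */
static int while_expansions(int limit, int step, int inclusive)
{
    int count = 0;
    assert(step > 0);
    for (int i = 0; inclusive ? (i <= limit) : (i < limit); i += step)
        ++count;
    return count;
}
```

With a step of 4, while_expansions(20, 4, 0) is 5, matching REPEAT 5, while while_expansions(20, 4, 1) is 6, which would add a sixth store at offset 20.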
MASM also supports a for loop.
        FOR     arg, <1,3,5,7,9,11,13,17,19,23>
        mov     al,arg          ; OUT takes its data from AL/AX/EAX,
        out     dx,al           ; not from an immediate value
        ENDM
As mentioned, these are examples of MASM-related code. Assembly-time loops like these are not available to a C compiler. Other features are available as well, including control over data/code segment specification and access to all assembly instructions, whereas inline assembly offers only a limited set of instructions. The macro assembler allows code and data to be intermixed, while a C compiler does not. The IF-ELSE-ENDIF conditionals are also available, along with other features found in a standard C compiler.
The more recent Visual C++ and Intel compilers support a method of programming in assembly language referred to as intrinsics. Here the SIMD instructions are wrapped in C-callable functions, and the compiler expands each call inline rather than generating a function call. Let us examine the following example:
void test(float *c, float a, float b)
{
    *c = a + b;
}
Not to oversimplify the power of using intrinsics to get code up and running quickly, the following code uses intrinsics in conjunction with (__m128) XMM registers and SSE single-precision floating-point instructions. Note that it looks more complicated, but I chose a simple scalar expression to resolve.
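#include <xmmintrin.h>

void test(float *c, float a, float b)
{
    __m128 ta, tb;

    // Caution: _mm_load_ps and _mm_store_ps move four packed floats and
    // expect 16-byte-aligned addresses, so each call here touches memory
    // beyond the single float involved.
    ta = _mm_load_ps(&a);
    tb = _mm_load_ps(&b);
    ta = _mm_add_ps(ta, tb);
    _mm_store_ps(c, ta);
}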
But underneath in the pure assembly code generated by the compiler this breaks down to something similar to the following:
        push    ebx
        mov     ebx,esp
        sub     esp,8
        and     esp,0FFFFFFF0h          ; 16-byte align stack
        add     esp,4
        push    ebp
        mov     ebp,dword ptr [ebx+4]
        mov     dword ptr [esp+4],ebp
        mov     ebp,esp
        sub     esp,98h
        push    esi
        push    edi

; __m128 ta, tb
; ta = _mm_load_ps(&a);
        lea     eax,[ebx+0Ch]
        movaps  xmm0,xmmword ptr [eax]
        movaps  xmmword ptr [ebp-30h],xmm0
        movaps  xmm0,xmmword ptr [ebp-30h]
        movaps  xmmword ptr [ebp-10h],xmm0

; tb = _mm_load_ps(&b);
        lea     eax,[ebx+10h]
        movaps  xmm0,xmmword ptr [eax]
        movaps  xmmword ptr [ebp-40h],xmm0
        movaps  xmm0,xmmword ptr [ebp-40h]
        movaps  xmmword ptr [ebp-20h],xmm0

; ta = _mm_add_ps(ta, tb);
        movaps  xmm0,xmmword ptr [ebp-20h]
        movaps  xmm1,xmmword ptr [ebp-10h]
        addps   xmm1,xmm0
        movaps  xmmword ptr [ebp-50h],xmm1
        movaps  xmm0,xmmword ptr [ebp-50h]
        movaps  xmmword ptr [ebp-10h],xmm0

; _mm_store_ps(c, ta);
        movaps  xmm0,xmmword ptr [ebp-10h]
        mov     eax,dword ptr [ebx+8]
        movaps  xmmword ptr [eax],xmm0

        pop     edi
        pop     esi
        mov     esp,ebp
        pop     ebp
        mov     esp,ebx
        pop     ebx
        ret
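Every movaps in that listing moves a full 16 bytes even though only one float per operand carries meaning. For a genuinely scalar expression, the scalar forms of the same intrinsics are a closer fit. The following is a hedged sketch, not the author's code: the function name test_scalar is hypothetical, and the portable fallback branch is there only so the example still builds where SSE is not enabled.

```c
#include <assert.h>

/* Use the SSE intrinsics only where the compiler reports SSE support. */
#if defined(__SSE__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define USE_SSE 1
#else
#define USE_SSE 0
#endif

/* Hypothetical scalar variant of test(): touches exactly one float
   per operand and has no alignment requirement. */
static void test_scalar(float *c, float a, float b)
{
    assert(c != 0);
#if USE_SSE
    __m128 ta = _mm_load_ss(&a);    /* load a into lane 0, zero lanes 1-3 */
    __m128 tb = _mm_load_ss(&b);    /* load b into lane 0, zero lanes 1-3 */
    ta = _mm_add_ss(ta, tb);        /* lane 0 = a + b                     */
    _mm_store_ss(c, ta);            /* write only lane 0 to *c            */
#else
    *c = a + b;                     /* fallback when SSE is not enabled   */
#endif
}
```

The packed _ps forms come into their own when a, b, and c each point at four aligned floats, so that one addps does the work of four additions.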