Converting an ASCII string to binary-coded decimal (BCD) is as easy as pie (or is it a piece of cake?). In BCD, the lower and upper 4-bit nibbles of every byte each store a value from 0 to 9 (think of a two-digit hex number in which the upper six values, A through F, are never used).
Workbench Files: \Bench\x86\chap15\project\platform
 | project | platform |
ASE to VMP | ase2vmp | vc6 |
BCD 2N | cd | vc.net |
Table 15-1. ASCII numerical digit to hex and decimal values
ASCII | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Hex | 0x30 | 0x31 | 0x32 | 0x33 | 0x34 | 0x35 | 0x36 | 0x37 | 0x38 | 0x39 |
Decimal | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 |
BCD | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Binary | 0000 | 0001 | 0010 | 0011 | 0100 | 0101 | 0110 | 0111 | 1000 | 1001 |
Converting an ASCII digit to a BCD nibble is as easy as subtracting 0x30 (the character '0', or 48 decimal) from the ASCII value, which leaves a result in the range {0...9}.
byte ASCIItoBCD(char c)
{
    ASSERT(('0' <= c) && (c <= '9'));
    return (byte)(c - '0');
}
When the 8086 processor was first manufactured, the FPU was a separate optional chip (the 8087). There was a need for some BCD operations similar to those of other processors, so they were incorporated into the CPU. The 8087 had some BCD support as well. When the 64-bit processor was developed, it was decided that BCD support in the CPU was no longer required, since the FPU offers an alternative method.
The FPU uses the first nine bytes to support 18 BCD digits. The uppermost bit of the 10th byte indicates the value is negative if set or positive if the bit is clear.
Figure 15-1. Ten-byte BCD data storage. The MSB of the far left byte (byte #9) is the sign bit, and the rightmost nine bytes (#8...0) contain the BCD value pairs. The 18th BCD digit resides in the upper nibble of byte #8 and the 1st BCD digit resides in the lower nibble of byte #0.
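As a rough illustration of the layout in Figure 15-1, the following C sketch packs a signed integer into the ten-byte format. The name EncodeBCD80 is hypothetical and not part of the sample code, and the sketch assumes the magnitude fits in 18 decimal digits.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of the sample code: pack a signed
   integer into the ten-byte BCD layout of Figure 15-1.
   Assumes the magnitude fits in 18 decimal digits. */
void EncodeBCD80(long long value, unsigned char bcd[10])
{
    unsigned long long mag;
    int i;

    memset(bcd, 0, 10);

    if (value < 0) {
        bcd[9] = 0x80;                          /* sign bit: MSB of byte #9 */
        mag = 0ULL - (unsigned long long)value; /* magnitude without overflow */
    } else {
        mag = (unsigned long long)value;
    }
    assert(mag <= 999999999999999999ULL);       /* 18 digits maximum */

    for (i = 0; i < 9; i++) {                   /* bytes #0...#8 */
        unsigned lo = (unsigned)(mag % 10);         /* lower nibble */
        unsigned hi = (unsigned)((mag / 10) % 10);  /* upper nibble */
        bcd[i] = (unsigned char)((hi << 4) | lo);
        mag /= 100;                             /* next digit pair */
    }
}
```

Byte #0 receives the two lowest digits, so a value such as 1234 ends up as 0x34 in byte #0 and 0x12 in byte #1, with the remaining value bytes left at zero.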
Setting the upper nibble of a byte is merely a matter of shifting a BCD digit left by four bits, then logically ORing (or summing) it with the lower nibble.
byte BCDtoByte(byte lo, byte hi) { return (hi << 4) | lo; }
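Putting the subtraction and the nibble packing together, a string of ASCII digits can be packed two digits per byte. This is only a sketch: PackDigits is a hypothetical name, and it assumes the string contains nothing but the digits '0' through '9' and that the lowest digit pair is stored first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: pack a string of ASCII digits into BCD bytes,
   lowest digit pair first. Returns the number of bytes written,
   or 0 if out[] is too small. */
size_t PackDigits(const char *str, unsigned char *out, size_t outSize)
{
    size_t len = strlen(str);
    size_t count = (len + 1) / 2;       /* two digits per byte, rounded up */
    size_t i;

    if (count > outSize) {
        return 0;
    }

    for (i = 0; i < count; i++) {
        size_t pos = len - 1 - 2 * i;   /* index of this pair's low digit */
        unsigned lo, hi = 0;            /* hi stays 0 for an odd leading digit */

        assert(str[pos] >= '0' && str[pos] <= '9');
        lo = (unsigned)(str[pos] - '0');

        if (pos > 0) {
            assert(str[pos - 1] >= '0' && str[pos - 1] <= '9');
            hi = (unsigned)(str[pos - 1] - '0');
        }
        out[i] = (unsigned char)((hi << 4) | lo);
    }
    return count;
}
```

For the string "12345" this writes the three bytes 0x45, 0x23, 0x01.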
daa | Signed |
The DAA general-purpose instruction adjusts the AL register and the EFLAGS for a decimal carry after an addition of packed BCD values.
| Flags | O.flow | Sign | Zero | Aux | Parity | Carry |
|---|---|---|---|---|---|---|
| | - | - | - | X | - | X |
Flags: The Aux and Carry flags are set to 1 if an addition resulted in a decimal carry in their associated 4-bit nibble; otherwise they are cleared to 0.
        xor   eax,eax       ; Reset Carry(s)
$L1:    mov   al,[edi]      ; D = D + A
        adc   al,[esi]
        daa
        mov   [edi],al      ; Store result
        dec   esi
        dec   edi
        dec   ecx
        jne   $L1           ; Loop for n BCD bytes
Note that this function steps through memory in reverse byte order, which is not processor efficient. High digits are in low offset bytes, and low digits are in high offset bytes: {N...0}. So the operation must go to the end of the buffer and traverse memory backward from low-digit pairs to high-digit pairs. If not working with the FPU to handle BCD, then each nibble pair could be stored in reverse order: {0...N}. Only when they need to be displayed or printed would there be a reverse increment through memory. Note this is backward to the ordering of the FPU! The sample code uses this method.
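For readers who want to check the arithmetic without the DAA instruction, here is a portable C sketch of the same D = D + A operation. BcdAdd is a hypothetical name, and the sketch assumes the reversed {0...N} ordering described above, with the lowest digit pair in byte 0, so that memory is walked forward.

```c
#include <stddef.h>

/* Sketch: D = D + A over n packed BCD bytes stored lowest pair first.
   Performs in C the per-nibble decimal adjustment that ADC followed
   by DAA performs. Returns the final decimal carry (0 or 1). */
unsigned BcdAdd(unsigned char *d, const unsigned char *a, size_t n)
{
    unsigned carry = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        unsigned lo = (d[i] & 0x0Fu) + (a[i] & 0x0Fu) + carry;
        unsigned hi;

        carry = (lo > 9);               /* decimal carry out of low nibble */
        if (carry) {
            lo -= 10;
        }

        hi = (unsigned)(d[i] >> 4) + (unsigned)(a[i] >> 4) + carry;
        carry = (hi > 9);               /* decimal carry out of the byte */
        if (carry) {
            hi -= 10;
        }

        d[i] = (unsigned char)((hi << 4) | lo);
    }
    return carry;
}
```

Adding 1 to 0999 (bytes 0x99, 0x09) yields 1000 (bytes 0x00, 0x10) with no final carry.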
das | Signed |
The DAS general-purpose instruction adjusts the AL register and the EFLAGS for a decimal borrow after a subtraction of packed BCD values.
| Flags | O.flow | Sign | Zero | Aux | Parity | Carry |
|---|---|---|---|---|---|---|
| | - | - | - | X | - | X |
Flags: The Aux and Carry flags are set to 1 if a subtraction resulted in a decimal borrow in their associated 4-bit nibble; otherwise they are cleared to 0.
        xor   eax,eax       ; Reset Carry(s)
$L1:    mov   al,[edi]      ; D = D - A
        sbb   al,[esi]
        das
        mov   [edi],al      ; Store result
        dec   esi
        dec   edi
        dec   ecx
        jne   $L1           ; Loop for n BCD bytes
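The subtraction can be sketched in portable C in the same way. BcdSub is a hypothetical name, and the sketch assumes the lowest digit pair is stored in byte 0 rather than the reverse order walked by the assembly loop.

```c
#include <stddef.h>

/* Sketch: D = D - A over n packed BCD bytes stored lowest pair first.
   Performs in C the per-nibble decimal adjustment that SBB followed
   by DAS performs. Returns the final decimal borrow (0 or 1). */
unsigned BcdSub(unsigned char *d, const unsigned char *a, size_t n)
{
    unsigned borrow = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        int lo = (int)(d[i] & 0x0F) - (int)(a[i] & 0x0F) - (int)borrow;
        int hi;

        borrow = (lo < 0);              /* borrow from the upper nibble */
        if (borrow) {
            lo += 10;
        }

        hi = (int)(d[i] >> 4) - (int)(a[i] >> 4) - (int)borrow;
        borrow = (hi < 0);              /* borrow from the next byte */
        if (borrow) {
            hi += 10;
        }

        d[i] = (unsigned char)((hi << 4) | lo);
    }
    return borrow;
}
```

Subtracting 1 from 1000 (bytes 0x00, 0x10) yields 0999 (bytes 0x99, 0x09) with no final borrow.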
aaa | Signed |
The AAA general-purpose instruction adjusts the AL register and the EFLAGS for a decimal carry after an addition of unpacked BCD digits. If the result is greater than 9, then AL is set to the remainder (0...9) and AH is incremented.
| Flags | O.flow | Sign | Zero | Aux | Parity | Carry |
|---|---|---|---|---|---|---|
| | - | - | - | X | - | X |
Flags: The Aux and Carry flags are set to 1 if a decimal carry resulted; otherwise they are cleared to 0.
        add   al,ah
        aaa
        or    al,'0'        ; '0' + {0...9} = ASCII '0...9'
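The net effect of an addition followed by AAA can be modeled in a few lines of C. This is a sketch for valid digit inputs only: UnpackedAdd is a hypothetical name, and the flags are reduced to a single carry value.

```c
#include <assert.h>

/* Sketch: add two unpacked BCD digits (0...9 each). Returns the
   adjusted digit (the value AAA leaves in AL); *carry receives 1
   when the sum exceeded 9, the case in which AAA increments AH. */
unsigned char UnpackedAdd(unsigned char a, unsigned char b, unsigned *carry)
{
    unsigned sum;

    assert(a <= 9 && b <= 9);

    sum = (unsigned)a + (unsigned)b;
    *carry = (sum > 9);
    if (*carry) {
        sum -= 10;                      /* keep the remainder 0...9 */
    }
    return (unsigned char)sum;
}
```

ORing the returned digit with '0' converts it back to its ASCII character, just as the `or al,'0'` line does.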
aas | Signed |
The AAS general-purpose instruction adjusts the AL register and the EFLAGS after a subtraction of unpacked BCD digits. If the calculation sets the carry, indicating that a borrow has occurred, then AL is set to the remainder (0...9) and AH is decremented.
| Flags | O.flow | Sign | Zero | Aux | Parity | Carry |
|---|---|---|---|---|---|---|
| | - | - | - | X | - | X |
Flags: The Aux and Carry flags are set to 1 if a decimal borrow resulted; otherwise they are cleared to 0.
        sub   al,'7'
        aas
        or    al,'0'        ; '0' + {0...9} = ASCII '0...9'
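The net effect of a subtraction followed by AAS can be modeled the same way in C. This is a sketch for valid digit inputs only: UnpackedSub is a hypothetical name, and the flags are reduced to a single borrow value.

```c
#include <assert.h>

/* Sketch: subtract unpacked BCD digit b from digit a (0...9 each).
   Returns the adjusted digit (the value AAS leaves in AL); *borrow
   receives 1 when a borrow occurred, the case in which AAS
   decrements AH. */
unsigned char UnpackedSub(unsigned char a, unsigned char b, unsigned *borrow)
{
    int diff;

    assert(a <= 9 && b <= 9);

    diff = (int)a - (int)b;
    *borrow = (diff < 0);
    if (*borrow) {
        diff += 10;                     /* keep the remainder 0...9 */
    }
    return (unsigned char)diff;
}
```

For example, 3 minus 7 leaves the digit 6 with a borrow to be taken from the next higher digit.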
aam | Signed |
The AAM general-purpose instruction adjusts the AX register and the EFLAGS after a multiplication of two unpacked BCD digits: the product in AL is divided by 10, the quotient is placed in AH, and the remainder is placed in AL.
| Flags | O.flow | Sign | Zero | Aux | Parity | Carry |
|---|---|---|---|---|---|---|
| | - | X | X | - | X | - |
Flags: The Sign, Zero, and Parity flags are set to the resulting value in the AL register.
        mul   bh            ; AX = AL * BH
        aam
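In C terms, the adjustment AAM makes to the product amounts to a divide by ten. The sketch below uses a hypothetical function name and models only the register result, not the flags.

```c
/* Sketch: split a binary product (0...81 for two BCD digits) into
   two unpacked BCD digits, as AAM does with its default base of 10:
   the quotient goes to AH and the remainder to AL. */
void AdjustAfterMultiply(unsigned char product,
                         unsigned char *ah, unsigned char *al)
{
    *ah = (unsigned char)(product / 10);    /* tens digit */
    *al = (unsigned char)(product % 10);    /* ones digit */
}
```

Multiplying 7 by 9 gives 63, which this splits into the digits 6 and 3.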
aad | Signed |
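The AAD general-purpose instruction adjusts the AX register and the EFLAGS in preparation for a division operation: the unpacked BCD digits in AH and AL are combined into the binary value AH * 10 + AL, which is stored in AL, and AH is cleared.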
| Flags | O.flow | Sign | Zero | Aux | Parity | Carry |
|---|---|---|---|---|---|---|
| | - | X | X | - | X | - |
Flags: The Sign, Zero, and Parity flags are set to the resulting value in the AL register.
        and   eax,0000111100001111b
        aad
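AAD is the inverse of AAM: it folds two unpacked digits back into one binary value so that an ordinary binary divide can follow. The sketch below uses a hypothetical function name and models only the register result, not the flags.

```c
#include <assert.h>

/* Sketch: combine two unpacked BCD digits into a binary value, as
   AAD does with its default base of 10. The return value is what
   AAD leaves in AL (AH becomes 0). */
unsigned char AdjustBeforeDivide(unsigned char ah, unsigned char al)
{
    assert(ah <= 9 && al <= 9);
    return (unsigned char)(ah * 10 + al);
}
```

The digits 6 and 3 become the binary value 63, which can then be divided by 7 to give 9.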
FPU | fbld | source | BCD | 80 |
How does this all work? Well, the FPU has a single instruction that loads a BCD value and converts it to an 80-bit (10-byte) double extended-precision floating-point value that it stores on the FPU stack. This can then be written back to computer memory as double-precision floating-point. It is simple and fast, with minimal excess code and nothing time intensive.
Example 15-1. ...\chap15\ase2vmp\util.cpp
unsigned char bcd[10];
double f;

__asm {
    fbld tbyte ptr bcd      ; Load (80-bit) BCD
    fstp f                  ; Write 64-bit double-precision
}
The returned floating-point value contains the BCD number as an integer with no fractional component. For example:
byte bcd[10] = {0x68, 0x23, 0x45, 0x67, 0x89, 0x98, 0x87, 0x76, 0x65, 0x80};
The float returned is –657,687,988,967,452,368.0
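To check the digits by hand without the FPU, the ten bytes can also be decoded with integer arithmetic. This is a sketch only: DecodeBCD80 is a hypothetical name, it returns a 64-bit integer rather than a double, and it assumes every nibble holds a valid digit 0 through 9.

```c
/* Sketch: decode the ten-byte BCD layout into a signed 64-bit
   integer. Eighteen digits always fit in a long long. */
long long DecodeBCD80(const unsigned char bcd[10])
{
    long long value = 0;
    int i;

    for (i = 8; i >= 0; i--) {          /* byte #8 holds the top pair */
        value = value * 100
              + (bcd[i] >> 4) * 10      /* upper nibble: tens */
              + (bcd[i] & 0x0F);        /* lower nibble: ones */
    }
    return (bcd[9] & 0x80) ? -value : value;    /* sign bit in byte #9 */
}
```

Because the result is an integer, all 18 digits survive intact, whereas a double-precision value holds only about 15 to 17 significant decimal digits.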
At this point the decimal point needs to be moved to its correct position by multiplying by a power of ten, 10^-n. This can be done with either a simple table lookup or a call to the function pow(10,-e), but the table lookup is faster. And speed is what it is all about.
All of you who start a processing tool to convert art resources or game resources into a game database and then leave to have lunch, get a soda, have a snack, go to the bathroom, pick up your kids from school, or go home, all yell, "ME!"
WOW! That was loud! It could be heard reverberating across the planet.
Those of you who have worked on games in the past, did you meet your timelines? Did you find yourself working lots of extra (crunch) time to meet a milestone? (We will ignore E3 and the final milestones!) How often do you have to wait for a tool to complete a data conversion? Add up all that "waiting" time. What did your tally come to?
You don't really know? Here is a thought: Add a wee bit of code to your program and write the results to an accumulative log file. Then check it from time to time to see where some of that time is going.
Some people believe in optimizing the game only if there is time somewhere in the schedule. Management quite often counts the time beans and decides that getting the milestone met is much more important than early ongoing debugging or optimization. But just think of that time savings if your tools are written with optimization. Just do not tell management about it or they will think they can ship the product early.
3D rendering tools are expensive and so programmers typically do not have ready access to a live tool. They sometimes write plug-ins, but quite often they will merely write an ASCII scene exporter (ASE) file parser to import the 3D data into their tools that generate the game databases. With this method, programmers do not have to have a licensed copy of a very expensive tool sitting on their desks.
This little item brings up a trivial item of artist versus programmer wars. It all comes down to who will have the task of running the tools to export and convert data into a form loaded and used by a game application. Neither typically wants the task and both consider it mundane, but it is nevertheless required. Artists need to run the tools occasionally so as to check results of their changes to art resources. Programmers occasionally need to run the tools to test changes to database designs, etc. But nobody wants to do it all the time. So my suggestion is to automate the tools and incorporate the who and what into the game design, technical design, and art bibles for the project. In that way there will be no misperception.
Let's talk about something else but related to assembly.
In this particular case, an ASE file is an ASCII export from 3D Studio MAX. How many of you have actually written a parser and have wondered where all your processing time had gone? Did you use streaming file reads to load a line at a time, or a block read to read the entire file into memory?
I personally write ASE parsers by loading the entire file into memory even when they are 20MB or larger in size. The core ASE parser code included with this book can actually parse an entire 20MB file and convert about 1.15 million floating-point values from ASCII to doubles in a few seconds. But here is where it really gets interesting!
Calling the standard C language function atof() to convert an ASCII floating-point value to single or double-precision will add significant time onto your processing time for those large ASE files.
But I have good news for you. The following function will carve those hours back to something a lot more reasonable. What it does is take advantage of a little-known functionality within the floating-point unit of the 80×86 processor.
As discussed in Chapter 8, the FPU loads and handles the following data types:
(4-byte) single-precision floating-point
(8-byte) double-precision floating-point
(10-byte) double extended-precision floating-point
(10-byte) binary-coded decimal (BCD)
Note that the following code sample expects a normal floating-point number with no exponent. The ASE files do not contain exponents, just really long ASCII floating-point numbers, which is why this code traps numbers with more than 18 digits.
Example 15-2. ...\chap15\ase2vmp\util.cpp
double exptbl[] =   // -e
{
    1.0, 0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001,
    0.0000001, 0.00000001, 0.000000001, 0.0000000001,
    0.00000000001, 0.000000000001, 0.0000000000001,
    0.00000000000001, 0.000000000000001,
    0.0000000000000001, 0.00000000000000001,
    0.000000000000000001
};  // Limit 18 places

double ASCIItoDouble(const char *pStr)
{
#ifdef CC_VMP_WIN32
    unsigned int dig[80], *pd;
    unsigned char bcd[10+2], *pb;
    double f;
    int n, e;
    const char *p;

    ASSERT_PTR(pStr);

    *(((uint32*)bcd)+0) = 0;    // Clear (12 bytes)
    *(((uint32*)bcd)+1) = 0;
    *(((uint32*)bcd)+2) = 0;    // 2 + 2 spare bytes

    // Collect negative/positive – and delimiters are pre-stripped.
    p = pStr;
    if ('-' == *p)
    {
        *(bcd+9) = 0x80;        // Set the negative bit into the BCD
        p++;
    }

    // Collect digits and remember position of decimal point
    *dig = 0;                   // Prepend a leading zero
    e = n = 0;
    pd = dig+1;

    // Stop at 79 digits so that dig[] cannot overflow
    while (('0' <= *p) && (*p <= '9') && (n < 79))
    {
        *pd++ = (*p++ - '0');   // Collect a digit
        n++;

        // The decimal place is checked after the first digit as no
        // floating-point value should start with a decimal point.
        // Even values between 0 and 1 should have a leading zero! 0.1
        if ('.' == *p)          // Decimal place?
        {                       // Remember its position
            e = n;
            p++;
        }
    }

    // Check for a really BIG (and thus ridiculous) number
    if (n > 18)                 // More than 18 digits?
    {
        return atof(pStr);
    }

    if (e)                      // 0=1.0 1=0.1 2=0.01 3=0.001, etc.
    {
        e = n - e;              // Get correct exponent
    }

    // repack into BCD (preset lead zeros)
    // last to first digit
    n = (n+1)>>1;               // Start in middle of BCD buffer
    pb = bcd;                   // Calc. 1st BCD character position

    while (n--)                 // loop for digit pairs
    {
        pd -= 2;                // Roll back to last 2 digits
        *pb++ = ((*(pd+0)<<4) | *(pd+1));   // blend two digits
    }

    __asm {
        fbld tbyte ptr bcd      ; Load (10-byte) BCD
        fstp f                  ; Write 64-bit double-precision
    }

    return f * exptbl[e];       // FASTER
//  return f * pow( 10.0, (double) -e );    // FAST
#else
    return atof(pStr);          // Really SLOW
#endif
}
If you do not believe me about the speed, replace all the atof() calls in your current tool with a macro that assigns 0.0 and measure the difference in speed. Or better yet, embed the atof() call within this function and do a float comparison using a precision slop factor. By now you should be very aware that you never, ever compare two floating-point numbers for equivalence unless a precision slop factor (accuracy) is utilized.
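One way to set up that comparison is sketched below. The names NearlyEqual, ConvertFn, and CountMismatches are hypothetical and not part of the sample code, and the slop factor is applied relative to the larger magnitude (with a floor of 1.0), which is only one of several reasonable choices.

```c
#include <stdlib.h>

/* Compare two doubles using a precision slop factor that scales
   with the larger magnitude (never less than 1.0). */
int NearlyEqual(double a, double b, double slop)
{
    double diff  = (a > b) ? (a - b) : (b - a);
    double absA  = (a < 0.0) ? -a : a;
    double absB  = (b < 0.0) ? -b : b;
    double scale = (absA > absB) ? absA : absB;

    if (scale < 1.0) {
        scale = 1.0;
    }
    return diff <= slop * scale;
}

/* Signature shared by atof() and the converter under test. */
typedef double (*ConvertFn)(const char *);

/* Run the converter under test against atof() on each sample and
   return how many results fall outside the slop factor. */
int CountMismatches(ConvertFn fast, const char *const *samples,
                    int count, double slop)
{
    int i, bad = 0;

    for (i = 0; i < count; i++) {
        if (!NearlyEqual(fast(samples[i]), atof(samples[i]), slop)) {
            bad++;
        }
    }
    return bad;
}
```

Passing ASCIItoDouble as the first argument, along with a handful of strings lifted from a real ASE file, should report zero mismatches if the optimized path is behaving.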
One should always test optimized code (vector based or not) in conjunction with slow scalar code written in C to ensure that the code is functioning as required.
One more thing: If you insist on using atof() or sscanf(), copy the ASCII number to a scratch buffer before processing it with either of these two functions because processing them within a 20MB file dramatically increases the processing time by hours. Apparently these conversion functions scan the string until they reach the terminator, which in the case of an ASE file can be a few megabytes away instead of a few bytes.
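A scratch-buffer copy might look something like the following sketch. AtofToken is a hypothetical name, and the 64-byte buffer and the accepted character set (sign, digits, and decimal point, with no exponent) are assumptions that match the ASE numbers described earlier.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: copy one numeric token out of a large text
   buffer into a small terminated scratch buffer, then convert it
   with atof(). *ppText is advanced past the characters consumed. */
double AtofToken(const char **ppText)
{
    char scratch[64];
    size_t n = 0;
    const char *p = *ppText;

    while (n < sizeof(scratch) - 1 &&
           (*p == '-' || *p == '+' || *p == '.' ||
            (*p >= '0' && *p <= '9'))) {
        scratch[n++] = *p++;            /* copy sign, digits, and point */
    }
    scratch[n] = '\0';                  /* terminator only a few bytes away */

    *ppText = p;
    return atof(scratch);
}
```

The conversion function now sees a string of at most 63 characters, no matter how many megabytes follow the number in the file buffer.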