Real mode

Segment registers are a rather interesting topic, as they tell the processor which memory areas may be accessed and how exactly they may be accessed. In real mode, a segment register contains a 16-bit segment address. The difference between a normal address and a segment address is that the latter is shifted 4 bits to the right (that is, divided by 16) when stored in the segment register. For example, if a segment register is loaded with the value 0x1234, it in fact points to the address 0x12340; pointers in real mode are therefore offsets into the segments pointed to by segment registers. As an example, let's take the DI register (we are talking about 16-bit real mode now), which is used with the DS (data segment) register by default in ordinary memory operands, and load it with, say, 0x4321 while the DS register holds 0x1234. The resulting 20-bit address is 0x12340 + 0x4321 = 0x16661. Because the address is 20 bits wide, at most 1 MB of memory can be addressed in real mode.
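The calculation above can be sketched as a small Python model. This is only an illustration of the arithmetic, not processor code: the function name linear_address and the ADDRESS_LINES constant are hypothetical, and the 20-bit mask assumes the address bus of the original 8086/8088, where addresses past 1 MB wrap around to zero.

```python
ADDRESS_LINES = 20  # assumption: address bus width of the original 8086/8088


def linear_address(segment: int, offset: int) -> int:
    """Return the physical address for a real-mode segment:offset pair."""
    base = (segment & 0xFFFF) << 4           # segment value times 16
    address = base + (offset & 0xFFFF)       # add the 16-bit offset
    return address & ((1 << ADDRESS_LINES) - 1)  # keep the low 20 bits


# DS = 0x1234, DI = 0x4321, as in the example in the text
print(hex(linear_address(0x1234, 0x4321)))  # 0x16661
```

The highest address reachable without wrapping is 0xFFFFF, which is the last byte of the 1 MB address space.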

There are six segment registers in total:

  • CS: This register contains the segment address of the currently used code segment.
  • DS: This register contains the segment address of the currently used data segment.
  • SS: This register contains the segment address of the currently used stack segment.
  • ES: This is the extra data segment for the programmer's use.
  • FS and GS: These were introduced with the Intel 80386 processor. These two segment registers have no specific hardware-defined function and are for the programmer's use. It is important to know that they do have specific tasks in Windows and Linux, but those tasks are defined by the operating system and have no connection to the hardware specification.

The CS register is used together with the IP register (the instruction pointer, also known as the program counter on other platforms), where IP (or EIP in protected mode and RIP in long mode) holds the offset, within the code segment, of the instruction following the one currently being executed.

In string instructions, DS and ES are implied when using the SI and DI registers, respectively; in ordinary memory operands, both SI and DI default to DS unless another segment register is explicitly specified in the instruction. For example, the lodsb instruction, although it is written with no operands, loads a byte from the address specified by DS:SI into the AL register, and the stosb instruction (which has no visible operands either) stores a byte from the AL register at the address specified by ES:DI. Using the SI/DI registers with other segments requires explicitly naming the relevant segment register. Consider the following code, for example:

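mov ax, [si]       ; load a word from DS:SI (DS is the default segment)
mov [es:di], ax    ; store the word at ES:DI (explicit segment override)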

The preceding code loads a word (16 bits, since the destination is AX) from the location pointed to by DS:SI and stores it at another location pointed to by ES:DI.
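To make the effect of the two instructions concrete, here is a minimal Python model of them. It is a sketch under stated assumptions: memory is a flat 1 MB byte array, the helper names (linear_address, read_word, write_word) and the register values are hypothetical, and words are stored little-endian, as on x86.

```python
MEMORY_SIZE = 1 << 20  # 1 MB real-mode address space


def linear_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) % MEMORY_SIZE


def read_word(memory, segment, offset):
    """Read a little-endian 16-bit word from segment:offset."""
    low = memory[linear_address(segment, offset)]
    high = memory[linear_address(segment, offset + 1)]
    return low | (high << 8)


def write_word(memory, segment, offset, value):
    """Write a little-endian 16-bit word to segment:offset."""
    memory[linear_address(segment, offset)] = value & 0xFF
    memory[linear_address(segment, offset + 1)] = (value >> 8) & 0xFF


memory = bytearray(MEMORY_SIZE)
# Hypothetical register contents chosen for the illustration
regs = {"ds": 0x1000, "es": 0x2000, "si": 0x0010, "di": 0x0020, "ax": 0}

write_word(memory, regs["ds"], regs["si"], 0xBEEF)      # test data at DS:SI

regs["ax"] = read_word(memory, regs["ds"], regs["si"])  # mov ax, [si]
write_word(memory, regs["es"], regs["di"], regs["ax"])  # mov [es:di], ax
```

After the two modelled instructions, the word that was at DS:SI (address 0x10010) also sits at ES:DI (address 0x20020).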

The interesting thing about segment registers, and segments in general, is that segments may peacefully overlap. Consider a situation where you want to copy a portion of code either to another place in the code segment or into a temporary buffer (for example, for a decryptor). In such a case, the CS and DS registers may both point to the same location, or the DS register may point somewhere into the code segment.
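Overlap follows directly from the address arithmetic: segment bases are only 16 bytes apart, while each segment spans 64 KB, so many segment:offset pairs name the same byte. The following Python sketch illustrates this; the helper names are hypothetical, and the overlap check ignores wraparound at the top of the 1 MB address space for simplicity.

```python
SEGMENT_SIZE = 0x10000  # a real-mode segment spans 64 KB


def linear_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset (no wraparound modelled)."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


def segments_overlap(segment_a, segment_b):
    """True if the two 64 KB windows share at least one byte."""
    return abs((segment_a << 4) - (segment_b << 4)) < SEGMENT_SIZE


# Two different segment:offset pairs that name the same byte
print(hex(linear_address(0x1234, 0x4321)))  # 0x16661
print(hex(linear_address(0x1666, 0x0001)))  # 0x16661

# CS and DS loaded with the same value cover exactly the same 64 KB
print(segments_overlap(0x1234, 0x1234))     # True
```

With DS set equal to CS, or pointing anywhere within 64 KB of it, data accesses through DS can read and write bytes that also belong to the code segment.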

