Structures (or structs, for short) are similar to arrays, but they comprise elements of different types. Structures are commonly used by malware authors to group information. It’s sometimes easier to use a structure than to maintain many different variables independently, especially if many functions need access to the same group of variables. (Windows API functions often use structures that must be created and maintained by the calling program.)
In Example 6-26, we define a structure at ❶ made up of an integer array, a character, and a double. In main, we allocate memory for the structure and pass a pointer to it to the test function. The pointer gms, defined at ❷, is a global variable.
Example 6-26. C code for a struct example
```c
#include <stdlib.h>  /* malloc */

struct my_structure {  /* ❶ */
    int x[5];
    char y;
    double z;
};

struct my_structure *gms;  /* ❷ global pointer to the structure */

void test(struct my_structure *q)
{
    int i;
    q->y = 'a';
    q->z = 15.6;
    for (i = 0; i < 5; i++) {
        q->x[i] = i;
    }
}

int main(void)
{
    gms = (struct my_structure *) malloc(sizeof(struct my_structure));
    test(gms);
    return 0;
}
```
Structures, like arrays, are accessed through a base address used as a starting pointer: each member sits at a fixed offset from that address, so compiled code refers to members only by offset. As a result, it is difficult to determine whether nearby data types are part of the same struct or whether they just happen to be next to each other in memory. Depending on the structure’s context, your ability to identify a structure can have a significant impact on your ability to analyze malware.
Example 6-27 shows the main function from Example 6-26, disassembled. Since gms is a global variable, the pointer it holds is stored at the memory location dword_40EA30; after the call to malloc, that location contains the base address of the structure. The base address is then passed to the sub_401000 (test) function via the push eax at ❶.
Example 6-27. Assembly code for the main function in the struct example in Example 6-26
```asm
00401050  push    ebp
00401051  mov     ebp, esp
00401053  push    20h
00401055  call    malloc
0040105A  add     esp, 4
0040105D  mov     dword_40EA30, eax
00401062  mov     eax, dword_40EA30
00401067  push    eax ❶
00401068  call    sub_401000
0040106D  add     esp, 4
00401070  xor     eax, eax
00401072  pop     ebp
00401073  retn
```
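The push 20h before the call to malloc is the constant the compiler emitted for sizeof(struct my_structure). The following minimal sketch, assuming a common x86 or x86-64 ABI (4-byte int, double aligned to 4 or 8 bytes), prints the member offsets and total size the compiler chooses for this layout. The helper name print_layout is hypothetical.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof */
#include <stdio.h>

/* Same layout as Example 6-26. */
struct my_structure {
    int x[5];
    char y;
    double z;
};

/* Hypothetical helper: print where each member lives relative to the base
   address, plus the size that the code passes to malloc. */
void print_layout(void)
{
    printf("x at 0x%zx\n", offsetof(struct my_structure, x));
    printf("y at 0x%zx\n", offsetof(struct my_structure, y));
    printf("z at 0x%zx\n", offsetof(struct my_structure, z));
    printf("sizeof = 0x%zx\n", sizeof(struct my_structure));
}
```

On those ABIs the output shows y at 0x14, z at 0x18, and a total size of 0x20; the three bytes at 0x15 through 0x17 are padding inserted so that the double is aligned. Other ABIs may lay the structure out differently.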
Example 6-28 shows the disassembly of the test function shown in Example 6-26. arg_0 holds the base address of the structure. Offset 0x14 stores the character within the struct, and 0x61 corresponds to the letter a in ASCII.
Example 6-28. Assembly code for the test function in the struct example in Example 6-26
```asm
00401000  push    ebp
00401001  mov     ebp, esp
00401003  push    ecx
00401004  mov     eax, [ebp+arg_0]
00401007  mov     byte ptr [eax+14h], 61h
0040100B  mov     ecx, [ebp+arg_0]
0040100E  fld     ds:dbl_40B120 ❶
00401014  fstp    qword ptr [ecx+18h]
00401017  mov     [ebp+var_4], 0
0040101E  jmp     short loc_401029
00401020  loc_401020:
00401020  mov     edx, [ebp+var_4]
00401023  add     edx, 1
00401026  mov     [ebp+var_4], edx
00401029  loc_401029:
00401029  cmp     [ebp+var_4], 5
0040102D  jge     short loc_40103D
0040102F  mov     eax, [ebp+var_4]
00401032  mov     ecx, [ebp+arg_0]
00401035  mov     edx, [ebp+var_4]
00401038  mov     [ecx+eax*4], edx ❷
0040103B  jmp     short loc_401020
0040103D  loc_40103D:
0040103D  mov     esp, ebp
0040103F  pop     ebp
00401040  retn
```
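We can tell that offset 0x18 holds a double because the value loaded by the floating-point instruction at ❶ is written there with fstp qword ptr, an 8-byte floating-point store. We can also tell that integers are moved into offsets 0, 4, 8, 0xC, and 0x10 by examining the for loop and the instruction at ❷, which stores the loop counter var_4 at [ecx+eax*4], using the same counter scaled by 4 as the offset. We can infer the contents of the structure from this analysis.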
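The disassembly never mentions member names; it only writes to fixed offsets from arg_0. The sketch below, a hypothetical test_by_offset function, performs the same stores as Example 6-28 using only those raw offsets, assuming a common x86 or x86-64 ABI so that the offsets match the compiled layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>  /* memcpy, memset */

struct my_structure {
    int x[5];
    char y;
    double z;
};

/* Hypothetical re-creation of test() written the way the disassembly sees it:
   every store is "base address + constant offset", with no member names. */
void test_by_offset(void *base)
{
    unsigned char *p = base;
    char c = 0x61;     /* mov byte ptr [eax+14h], 61h ('a' in ASCII) */
    double d = 15.6;   /* fld ds:dbl_40B120 / fstp qword ptr [ecx+18h] */

    assert(base != NULL);
    memcpy(p + 0x14, &c, sizeof c);
    memcpy(p + 0x18, &d, sizeof d);
    for (int i = 0; i < 5; i++) {
        memcpy(p + i * 4, &i, sizeof i);  /* mov [ecx+eax*4], edx */
    }
}
```

Calling test_by_offset on a struct my_structure and then reading y, z, and x[0] through x[4] by name yields 'a', 15.6, and 0 through 4, which is the offset-to-field mapping inferred from the listing. memcpy is used instead of casting p + 0x18 to double * so the sketch stays within portable C's alignment and aliasing rules.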
In IDA Pro, you can create structures and assign them to memory references using the T hotkey. Doing this will change the instruction mov [eax+14h], 61h to mov [eax + my_structure.y], 61h. The latter is easier to read, and marking structures can often help you understand the disassembly more quickly, especially when the same structure is used throughout the code. To use the T hotkey effectively in this example, you would need to create the my_structure structure manually using IDA Pro’s structure window. This can be a tedious process, but it is worthwhile for structures that you encounter frequently.
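Writing the inferred layout down as a C declaration is one way to double-check the offsets before recreating the structure by hand in IDA Pro's structure window. This is a hypothetical reconstruction: the member names field_0, field_14, pad_15, and field_18 are placeholders invented for illustration, since member names are not normally present in a compiled binary without debug information, and the C11 _Static_assert checks assume a common x86 or x86-64 ABI.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof */

/* Hypothetical reconstruction from the offsets seen in Example 6-28.
   Padding is spelled out so that every offset in the listing is visible. */
struct recovered_struct {
    int    field_0[5];  /* 0x00-0x13: written with 0..4 by the loop at ❷  */
    char   field_14;    /* 0x14: set to 0x61 ('a')                         */
    char   pad_15[3];   /* 0x15-0x17: alignment padding, never accessed    */
    double field_18;    /* 0x18: written by fstp qword ptr [ecx+18h]       */
};                      /* total 0x20, the size pushed before malloc       */

/* Fail the build if the declared layout drifts from the disassembly. */
_Static_assert(offsetof(struct recovered_struct, field_14) == 0x14, "field_14 offset");
_Static_assert(offsetof(struct recovered_struct, field_18) == 0x18, "field_18 offset");
_Static_assert(sizeof(struct recovered_struct) == 0x20, "structure size");
```

Making the padding explicit is a design choice: it documents the gap at 0x15 through 0x17 instead of leaving it implied by alignment rules.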