Understanding the memory layout of composite data types

Let's first take a look at a simple example of a composite type for tracking the coordinates of a point:

struct Point
    x
    y
end

When a field's type is not specified, it is implicitly Any, the supertype of all types. Hence, the preceding code is semantically equivalent to the following (except that we have named the type Point2 to avoid confusion):

struct Point2
    x::Any
    y::Any
end

The fields x and y have the Any type, meaning that they can be anything: Int64, Float64, or any other data type. To compare the memory layout and utilization, it is worth creating a new point type that uses a small concrete type, such as UInt8:

struct Point3
    x::UInt8
    y::UInt8
end

As we know, a UInt8 occupies a single byte of storage, so an object with both x and y fields should consume only two bytes. Perhaps we should just prove it to ourselves, which we can do with the sizeof function.
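A minimal sketch of such a check follows. The variable name p3 is only illustrative, and the struct definition is repeated so that the snippet runs on its own:

```julia
# Same definition as Point3 above, repeated so the snippet is self-contained.
struct Point3
    x::UInt8
    y::UInt8
end

p3 = Point3(0x01, 0x02)   # 0x01 and 0x02 are UInt8 literals

sizeof(p3)        # size of this instance in bytes: 2
sizeof(Point3)    # the same answer, computed from the type alone: 2
```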

Clearly, a single Point3 object occupies only two bytes. Now consider the original Point type holding the same two UInt8 values.
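The same check on Point might be sketched as follows. The name p is illustrative, and the 16-byte figure assumes a 64-bit system, where a pointer is eight bytes wide:

```julia
# Same definition as Point above: both fields default to Any.
struct Point
    x
    y
end

p = Point(0x01, 0x02)     # store two one-byte values

sizeof(p)                 # 16 on a 64-bit system
sizeof(Ptr{Cvoid})        # width of one pointer: 8 on a 64-bit system
```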

The Point object takes 16 bytes on a 64-bit system, even though we only want to store two bytes of data. Recall that the x and y fields of Point can hold values of any type. Now, let's repeat the exercise with a larger data type, such as Int128.
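A sketch of that exercise, again assuming a 64-bit system and using the illustrative name big_point:

```julia
# Same definition as Point above: both fields default to Any.
struct Point
    x
    y
end

big_point = Point(Int128(1), Int128(2))

sizeof(big_point.x)    # a single Int128 value: 16 bytes
sizeof(big_point)      # the Point object itself: 16 on a 64-bit system
```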

An Int128 is a 128-bit integer, which occupies 16 bytes in memory. Interestingly, even though we are storing two Int128 values in a Point, the size of the object remains 16 bytes.

Why? It is because Point actually stores two 64-bit pointers, each occupying eight bytes of storage. The x and y values themselves live elsewhere in memory, and each field of the Point object merely points to its value.
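We can get a rough picture of this layout from the REPL as well. The following sketch uses the standard reflection functions fieldtypes, isbitstype, and fieldoffset; the offsets in the comments assume a 64-bit system:

```julia
# Same definition as Point above: both fields default to Any.
struct Point
    x
    y
end

fieldtypes(Point)        # (Any, Any)
isbitstype(Point)        # false: the fields are references, not inline data
fieldoffset(Point, 1)    # byte offset of x within the object: 0
fieldoffset(Point, 2)    # byte offset of y: 8 on a 64-bit system, one pointer after x
```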

When the field types are concrete, the Julia compiler knows exactly what the memory layout looks like. With two UInt8 fields, the object is compactly represented in two bytes. With two Int128 fields, it will occupy 32 bytes. Let's define such a type, Point4, and try it in the REPL.
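A minimal sketch of that experiment follows. Point4 is assumed here to be a struct with two Int128 fields, and p4 is an illustrative name:

```julia
# Assumed definition of Point4: like Point3, but with Int128 fields.
struct Point4
    x::Int128
    y::Int128
end

p4 = Point4(Int128(1), Int128(2))

sizeof(p4)        # two 16-byte integers stored inline: 32
sizeof(Point4)    # the same answer from the type alone: 32
```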

The memory layout of Point4 is compact: the two 16-byte values are stored inline, one directly after the other, with no pointers involved.
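As a rough check of this compact layout, we can ask Julia where each field sits inside the object. This sketch again assumes that Point4 has two Int128 fields:

```julia
# Assumed definition of Point4: two concrete Int128 fields.
struct Point4
    x::Int128
    y::Int128
end

isbitstype(Point4)        # true: the object is plain data with no references
fieldoffset(Point4, 1)    # x starts at byte 0
fieldoffset(Point4, 2)    # y starts at byte 16, right after x
```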

Now that we know the difference in memory layout, we can immediately see the benefits of using concrete types. If a field has a concrete type, then its data is stored right there in the object whenever we access x or y. If the fields are just pointers, then we have to dereference the pointer to find the data. Furthermore, the physical memory locations of the x and y values may not even be adjacent to each other, which may cause hardware cache misses, further hurting performance.

So, should we simply follow the rule of always using concrete types in field definitions? Not necessarily. There are other options to consider, which we will explore in the following sections.
