For the sake of efficiency, most information stored in the disk data structures of an Ext2 partition are copied into RAM when the filesystem is mounted, thus allowing the kernel to avoid many subsequent disk read operations. To get an idea of how often some data structures change, consider some fundamental operations:
When a new file is created, the values of the
s_free_inodes_count
field in the Ext2 superblock
and of the bg_free_inodes_count
field in the
proper group descriptor must be decremented.
If the kernel appends some data to an existing file so that the
number of data blocks allocated for it increases, the values of the
s_free_blocks_count
field in the Ext2 superblock
and of the bg_free_blocks_count
field in the group
descriptor must be modified.
Even just rewriting a portion of an existing file involves an update
of the s_wtime
field of the Ext2 superblock.
Since all Ext2 disk data structures are stored in blocks of the Ext2 partition, the kernel uses the buffer cache and the page cache to keep them up to date (see Section 14.2.4).
Table 17-6 specifies, for each type of data related to Ext2 filesystems and files, the data structure used on the disk to represent its data, the data structure used by the kernel in memory, and a rule of thumb used to determine how much caching is used. Data that is updated very frequently is always cached; that is, the data is permanently stored in memory and included in the buffer cache or in the page cache until the corresponding Ext2 partition is unmounted. The kernel gets this result by keeping the buffer’s usage counter greater than 0 at all times.
Table 17-6. VFS images of Ext2 data structures
Type |
Disk data structure |
Memory data structure |
Caching mode |
---|---|---|---|
Superblock |
|
|
Always cached |
Group descriptor |
|
|
Always cached |
Block bitmap |
Bit array in block |
Bit array in buffer |
Fixed limit |
Inode bitmap |
Bit array in block |
Bit array in buffer |
Fixed limit |
Inode |
|
|
Dynamic |
Data block |
Unspecified |
Buffer page |
Dynamic |
Free inode |
|
None |
Never |
Free block |
Unspecified |
None |
Never |
The never-cached data is not kept in any cache since it does not represent meaningful information.
In between these extremes lie two other modes:
fixed-limit
and
dynamic
. In the fixed-limit mode, a specific
number of data structures can be kept in the buffer cache; older ones
are flushed to disk when the number is exceeded. In the dynamic mode,
the data is kept in a cache as long as the associated object (an
inode or data block) is in use; when the file is closed or the data
block is deleted, the shrink_caches( )
function may
remove the associated data from the cache.
When an Ext2 filesystem is mounted, the u
field of
the
VFS superblock, which contains
filesystem-specific data, is loaded with a structure of type
ext2_sb_info
so that the kernel can find out
things related to the filesystem as a whole. This structure includes
the following information:
Most of the disk superblock fields
The block bitmap cache, tracked by the
s_block_bitmap
and
s_block_bitmap_number
arrays (see the next
section)
The inode bitmap cache, tracked by the
s_inode_bitmap
and
s_inode_bitmap_number
arrays (see the next
section)
An s_sbh
pointer to the buffer head of the buffer
containing the disk superblock
An s_es
pointer to the buffer containing the disk
superblock
The number of group descriptors, s_desc_ per_block
, that can be packed in a block
An s_group_desc
pointer to an array of buffer
heads of buffers containing the group descriptors (usually, a single
entry is sufficient)
Other data related to mount state, mount options, and so on
Similarly, when an inode object pertaining to an Ext2 file is
initialized, the u
field is loaded with a
structure of type ext2_inode_info
, which includes
this information:
Most of the fields found in the disk’s inode structure that are not kept in the generic VFS inode object (see Table 12-3 in Chapter 12)
The fragment size and the fragment number (not yet used)
The block_group
block group index at which the
inode belongs (see Section 17.2 earlier in this chapter)
The i_prealloc_block
and
i_prealloc_count
fields, which are used for data
block preallocation (see Section 17.6.5 later in this chapter)
The i_osync
field, which is a flag specifying
whether the disk inode should be synchronously updated
When the kernel mounts an Ext2 filesystem, it allocates a buffer for the Ext2 disk superblock and reads its contents from disk. The buffer is released only when the Ext2 filesystem is unmounted. When the kernel must modify a field in the Ext2 superblock, it simply writes the new value in the proper position of the corresponding buffer and then marks the buffer as dirty.
Unfortunately, this approach cannot be adopted for all Ext2 disk data structures. The tenfold increase in disk capacity reached in recent years has induced a tenfold increase in the size of inode and data block bitmaps, so we have reached the point at which it is no longer convenient to keep all the bitmaps in RAM at the same time.
For instance, consider a 4-GB disk with a 1-KB block size. Since each bitmap fills all the bits of a single block, each of them describes the status of 8,192 blocks — that is, of 8 MB of disk storage. The number of block groups is 4,096 MB/8 MB=512. Since each block group requires both an inode bitmap and a data block bitmap, 1 MB of RAM would be required to store all 1,024 bitmaps in memory.
The solution adopted to limit the memory requirements of the Ext2
descriptors is to use, for any mounted Ext2 filesystem, two caches of
size EXT2_MAX_GROUP_LOADED
(usually 8). One cache
stores the most recently accessed inode bitmaps, while the other
cache stores the most recently accessed block bitmaps. Buffers that
contain bitmaps included in a cache have a usage counter greater than
0, therefore they are never freed by shrink_mmap( )
(see Section 16.7).
Conversely, buffers that contain bitmaps not included in a bitmap
cache have a null usage counter, so they can be freed if free memory
becomes scarce.
Each cache is implemented by means of two arrays of
EXT2_MAX_GROUP_LOADED
elements. One array contains
the indexes of the block groups whose bitmaps are currently in the
cache, while the other array contains pointers to the buffer heads
that refer to those bitmaps.
The ext2_sb_info
structure stores the arrays
pertaining to the inode bitmap cache; indexes of block groups are
found in the s_inode_bitmap
field and pointers to
buffer heads are found in the
s_inode_bitmap_number
field. The corresponding
arrays for the block bitmap cache are stored in the
s_block_bitmap
and
s_block_bitmapnumber
fields.
The load_inode_bitmap( )
function loads the inode
bitmap of a specified block group and returns the cache position in
which the bitmap can be found.
If the bitmap is not already in the bitmap cache,
load_inode_bitmap( )
invokes
read_inode_bitmap( )
. The latter function gets the
number of the block containing the bitmap from the
bg_inode_bitmap
field of the group descriptor, and
then invokes bread( )
to allocate a new buffer and
read the block from disk if it is not already included in the buffer
cache.
If the number of block groups in the Ext2 partition is less than or
equal to EXT2_MAX_GROUP_LOADED
, the index of the
cache array position in which the bitmap is inserted always matches
the block group index passed as the parameter to the
load_inode_bitmap( )
function.
Otherwise, if there are more block groups than cache positions, a bitmap is removed from the cache, if necessary, by using a Least Recently Used (LRU) policy, and the requested bitmap is inserted in the first cache position. Figure 17-3 illustrates the three possible cases in which the bitmap in block group 5 is referenced: where the requested bitmap is already in cache, where the bitmap is not in cache but there is a free position, and where the bitmap is not in cache and there is no free position.
The load_block_bitmap( )
and
read_block_bitmap( )
functions are very similar to
load_inode_bitmap( )
and
read_inode_bitmap( )
, but they refer to the block
bitmap cache of an Ext2 partition.
Figure 17-4 illustrates the memory data structures
of a mounted Ext2 filesystem. In our example, there are three block
groups whose descriptors are stored in three blocks on disk;
therefore, the s_group_desc
field of the
ext2_sb_info
points to an array of three buffer
heads. We have shown just one inode bitmap having index 2 and one
block bitmap having index 4, although the kernel may keep 2
× EXT2_MAX_GROUP_LOADED
bitmaps in the bitmap caches, and even more may be stored in the
buffer cache.