The first block in any Ext2 partition is never managed by the Ext2 filesystem, since it is reserved for the partition boot sector (see Appendix A). The rest of the Ext2 partition is split into block groups, each of which has the layout shown in Figure 17-1. As you will notice from the figure, some data structures must fit in exactly one block, while others may require more than one block. All the block groups in the filesystem have the same size and are stored sequentially, so the kernel can derive the location of a block group on disk simply from its integer index.
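Because all groups have the same size and are laid out back to back, locating one takes a single multiplication. The following Python sketch illustrates the idea in user space; the function name is hypothetical, and its parameters stand for the superblock values that the kernel headers call `s_first_data_block` (the first useful block) and `s_blocks_per_group`.

```python
def group_first_block(group_index: int,
                      blocks_per_group: int,
                      first_data_block: int = 1) -> int:
    """Return the block number at which block group `group_index` starts.

    Assumes groups are numbered from 0 and stored sequentially right
    after the reserved boot block(s), as described in the text.
    """
    if group_index < 0:
        raise ValueError("group index must be non-negative")
    return first_data_block + group_index * blocks_per_group
```

For instance, with 8,192 blocks per group, `group_first_block(2, 8192)` gives the starting block of the third group.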
Block groups reduce file fragmentation, since the kernel tries to keep the data blocks belonging to a file in the same block group, if possible. Each block in a block group contains one of the following pieces of information:
A copy of the filesystem’s superblock
A copy of the group of block group descriptors
A data block bitmap
A group of inodes
An inode bitmap
A chunk of data that belongs to a file; i.e., a data block
If a block does not contain any meaningful information, it is said to be free.
As can be seen from Figure 17-1, both the superblock and the group descriptors are duplicated in each block group. Only the superblock and the group descriptors included in block group 0 are used by the kernel, while the remaining superblocks and group descriptors are left unchanged; in fact, the kernel doesn’t even look at them. When the e2fsck program executes a consistency check on the filesystem status, it refers to the superblock and the group descriptors stored in block group 0, and then copies them into all other block groups. If data corruption occurs and the main superblock or the main group descriptors in block group 0 become invalid, the system administrator can instruct e2fsck to refer to the old copies of the superblock and the group descriptors stored in a block group other than the first. Usually, the redundant copies store enough information to allow e2fsck to bring the Ext2 partition back to a consistent state.
How many block groups are there? Well, that depends both on the partition size and the block size. The main constraint is that the block bitmap, which is used to identify the blocks that are used and free inside a group, must be stored in a single block. Therefore, in each block group, there can be at most 8×b blocks, where b is the block size in bytes. Thus, the total number of block groups is roughly s/(8×b), where s is the partition size in blocks.
For example, let’s consider an 8 GB Ext2 partition with a 4-KB block size. In this case, each 4-KB block bitmap describes 32K data blocks — that is, 128 MB. Therefore, at most 64 block groups are needed. Clearly, the smaller the block size, the larger the number of block groups.
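The arithmetic of this example can be checked with a short Python sketch. The function names are hypothetical; the only rule encoded is the one stated above, namely that a one-block bitmap can describe at most 8 × b blocks.

```python
def max_blocks_per_group(block_size: int) -> int:
    """A bitmap stored in one block has 8 bits per byte, one bit per block."""
    return 8 * block_size


def block_group_count(partition_bytes: int, block_size: int) -> int:
    """Number of block groups needed when every group is as large as possible."""
    total_blocks = partition_bytes // block_size
    per_group = max_blocks_per_group(block_size)
    # Ceiling division: a partial last group still counts as a group.
    return -(-total_blocks // per_group)
```

Calling `block_group_count(8 * 2**30, 4096)` reproduces the 8 GB case discussed above, and passing 1024 as the block size shows how quickly the number of groups grows with smaller blocks.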
An Ext2 disk superblock is stored in an ext2_super_block structure, whose fields are listed in Table 17-1. The __u8, __u16, and __u32 data types denote unsigned numbers of length 8, 16, and 32 bits respectively, while the __s8, __s16, and __s32 data types denote signed numbers of length 8, 16, and 32 bits.
Table 17-1. The fields of the Ext2 superblock
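Type | Field | Description |
---|---|---|
__u32 | s_inodes_count | Total number of inodes |
__u32 | s_blocks_count | Filesystem size in blocks |
__u32 | s_r_blocks_count | Number of reserved blocks |
__u32 | s_free_blocks_count | Free blocks counter |
__u32 | s_free_inodes_count | Free inodes counter |
__u32 | s_first_data_block | Number of first useful block (always 1) |
__u32 | s_log_block_size | Block size |
__s32 | s_log_frag_size | Fragment size |
__u32 | s_blocks_per_group | Number of blocks per group |
__u32 | s_frags_per_group | Number of fragments per group |
__u32 | s_inodes_per_group | Number of inodes per group |
__u32 | s_mtime | Time of last mount operation |
__u32 | s_wtime | Time of last write operation |
__u16 | s_mnt_count | Mount operations counter |
__s16 | s_max_mnt_count | Number of mount operations before check |
__u16 | s_magic | Magic signature |
__u16 | s_state | Status flag |
__u16 | s_errors | Behavior when detecting errors |
__u16 | s_minor_rev_level | Minor revision level |
__u32 | s_lastcheck | Time of last check |
__u32 | s_checkinterval | Time between checks |
__u32 | s_creator_os | OS where filesystem was created |
__u32 | s_rev_level | Revision level |
__u16 | s_def_resuid | Default UID for reserved blocks |
__u16 | s_def_resgid | Default GID for reserved blocks |
__u32 | s_first_ino | Number of first nonreserved inode |
__u16 | s_inode_size | Size of on-disk inode structure |
__u16 | s_block_group_nr | Block group number of this superblock |
__u32 | s_feature_compat | Compatible features bitmap |
__u32 | s_feature_incompat | Incompatible features bitmap |
__u32 | s_feature_ro_compat | Read-only compatible features bitmap |
__u8 [16] | s_uuid | 128-bit filesystem identifier |
char [16] | s_volume_name | Volume name |
char [64] | s_last_mounted | Pathname of last mount point |
__u32 | s_algorithm_usage_bitmap | Used for compression |
__u8 | s_prealloc_blocks | Number of blocks to preallocate |
__u8 | s_prealloc_dir_blocks | Number of blocks to preallocate for directories |
__u16 | s_padding1 | Alignment to word |
__u32 [204] | s_reserved | Nulls to pad out 1,024 bytes |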
The s_inodes_count field stores the number of inodes, while the s_blocks_count field stores the number of blocks in the Ext2 filesystem.
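To make the layout concrete, here is a hedged Python sketch that decodes the leading fields of a superblock from a raw partition image. It rests on assumptions that are not stated in this section: that the primary superblock starts at byte offset 1,024 of the partition, that fields are stored little-endian, and that the magic signature is 0xEF53. The function name and the decision to stop at s_magic are illustrative choices, not the kernel's code.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # assumption: primary superblock at byte 1,024
EXT2_MAGIC = 0xEF53        # assumption: Ext2 magic signature

# Leading fields only, in on-disk order: seven __u32, one __s32,
# five __u32, then __u16, __s16, __u16. "<" selects little-endian, no padding.
_HEAD = struct.Struct("<7Ii5IHhH")
_NAMES = (
    "s_inodes_count", "s_blocks_count", "s_r_blocks_count",
    "s_free_blocks_count", "s_free_inodes_count", "s_first_data_block",
    "s_log_block_size", "s_log_frag_size", "s_blocks_per_group",
    "s_frags_per_group", "s_inodes_per_group", "s_mtime", "s_wtime",
    "s_mnt_count", "s_max_mnt_count", "s_magic",
)


def parse_superblock_head(image: bytes) -> dict:
    """Decode the first 58 bytes of the superblock found in `image`."""
    raw = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + _HEAD.size]
    if len(raw) < _HEAD.size:
        raise ValueError("image too short to contain a superblock")
    fields = dict(zip(_NAMES, _HEAD.unpack(raw)))
    if fields["s_magic"] != EXT2_MAGIC:
        raise ValueError("magic signature does not match Ext2")
    return fields
```

In practice the `image` argument would be the first few kilobytes read from a partition or image file opened in binary mode.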
The s_log_block_size field expresses the block size as a power of 2, using 1,024 bytes as the unit. Thus, 0 denotes 1,024-byte blocks, 1 denotes 2,048-byte blocks, and so on. The s_log_frag_size field is currently equal to s_log_block_size, since block fragmentation is not yet implemented.
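In other words, the stored value is a shift count applied to 1,024. A minimal Python sketch of the conversion follows; the function name is hypothetical.

```python
def block_size_from_log(s_log_block_size: int) -> int:
    """Convert the superblock's logarithmic block size to bytes."""
    if s_log_block_size < 0:
        raise ValueError("s_log_block_size must be non-negative")
    return 1024 << s_log_block_size
```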
The s_blocks_per_group, s_frags_per_group, and s_inodes_per_group fields store the number of blocks, fragments, and inodes in each block group, respectively.
Some disk blocks are reserved for the superuser (or for some other user or group of users selected by the s_def_resuid and s_def_resgid fields). These blocks allow the system administrator to continue to use the filesystem even when no more free blocks are available for normal users.
The s_mnt_count, s_max_mnt_count, s_lastcheck, and s_checkinterval fields set up the Ext2 filesystem to be checked automatically at boot time. These fields cause e2fsck to run after a predefined number of mount operations has been performed, or when a predefined amount of time has elapsed since the last consistency check. (Both kinds of checks can be used together.) The consistency check is also enforced at boot time if the filesystem has not been cleanly unmounted (for instance, after a system crash) or when the kernel discovers some errors in it. The s_state field stores the value 0 if the filesystem is mounted or was not cleanly unmounted, 1 if it was cleanly unmounted, and 2 if it contains errors.
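The decision described in this paragraph can be summarized as a small predicate. The Python sketch below is a simplification written for illustration, not the logic of e2fsck itself: the function name is hypothetical, a non-positive limit is assumed to mean "this kind of check is disabled", and `now` is assumed to be expressed in the same time unit as s_lastcheck.

```python
CLEANLY_UNMOUNTED = 1  # s_state value for a cleanly unmounted filesystem


def needs_check(s_state: int, s_mnt_count: int, s_max_mnt_count: int,
                s_lastcheck: int, s_checkinterval: int, now: int) -> bool:
    """Return True if a boot-time consistency check should be forced."""
    # State 0 (not cleanly unmounted) or 2 (errors detected) forces a check.
    if s_state != CLEANLY_UNMOUNTED:
        return True
    # Mount-count policy (assumption: a limit <= 0 disables it).
    if s_max_mnt_count > 0 and s_mnt_count >= s_max_mnt_count:
        return True
    # Elapsed-time policy (assumption: an interval <= 0 disables it).
    if s_checkinterval > 0 and now >= s_lastcheck + s_checkinterval:
        return True
    return False
```

Both policies are evaluated independently, which matches the remark that the two kinds of checks can be used together.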
Each block group has its own group descriptor, an ext2_group_desc structure whose fields are illustrated in Table 17-2.
Table 17-2. The fields of the Ext2 group descriptor
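Type | Field | Description |
---|---|---|
__u32 | bg_block_bitmap | Block number of block bitmap |
__u32 | bg_inode_bitmap | Block number of inode bitmap |
__u32 | bg_inode_table | Block number of first inode table block |
__u16 | bg_free_blocks_count | Number of free blocks in the group |
__u16 | bg_free_inodes_count | Number of free inodes in the group |
__u16 | bg_used_dirs_count | Number of directories in the group |
__u16 | bg_pad | Alignment to word |
__u32 [3] | bg_reserved | Nulls to pad out 32 bytes |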
The bg_free_blocks_count, bg_free_inodes_count, and bg_used_dirs_count fields are used when allocating new inodes and data blocks. These fields determine the most suitable block group in which to allocate each data structure. The bitmaps are sequences of bits, where the value 0 specifies that the corresponding inode or data block is free and the value 1 specifies that it is used. Since each bitmap must be stored inside a single block and since the block size can be 1,024, 2,048, or 4,096 bytes, a single bitmap describes the state of 8,192, 16,384, or 32,768 blocks.
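A bitmap lookup amounts to selecting a byte and then a bit inside it. The Python sketch below illustrates this under one assumption that the text does not spell out: bit 0 of the bitmap is taken to be the least significant bit of the first byte. All function names are hypothetical.

```python
from typing import Optional


def blocks_described(block_size: int) -> int:
    """Number of items a one-block bitmap can describe (8 bits per byte)."""
    return 8 * block_size


def is_used(bitmap: bytes, index: int) -> bool:
    """True if bit `index` is 1, meaning the inode or block is in use."""
    # Assumption: bits are numbered from the least significant bit of byte 0.
    return bool(bitmap[index // 8] & (1 << (index % 8)))


def first_free(bitmap: bytes) -> Optional[int]:
    """Index of the first 0 bit, or None if every item is in use."""
    for index in range(8 * len(bitmap)):
        if not is_used(bitmap, index):
            return index
    return None
```

A real allocator would scan whole words rather than single bits, but the free/used convention is the same.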
The inode table consists of a series of consecutive blocks, each of which contains a predefined number of inodes. The block number of the first block of the inode table is stored in the bg_inode_table field of the group descriptor.
All inodes have the same size: 128 bytes. A 1,024-byte block contains 8 inodes, while a 4,096-byte block contains 32 inodes. To figure out how many blocks are occupied by the inode table, divide the total number of inodes in a group (stored in the s_inodes_per_group field of the superblock) by the number of inodes per block.
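The division just described can be written out as follows. This is a minimal Python sketch with hypothetical function names; it rounds up so that a partially filled last block is still counted, which goes slightly beyond the plain division in the text.

```python
INODE_SIZE = 128  # bytes, as stated in the text


def inodes_per_block(block_size: int, inode_size: int = INODE_SIZE) -> int:
    """How many on-disk inodes fit in one block."""
    return block_size // inode_size


def inode_table_blocks(s_inodes_per_group: int, block_size: int,
                       inode_size: int = INODE_SIZE) -> int:
    """Blocks occupied by one group's inode table (rounded up)."""
    per_block = inodes_per_block(block_size, inode_size)
    return -(-s_inodes_per_group // per_block)
```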
Each Ext2 inode is an ext2_inode structure whose fields are illustrated in Table 17-3.
Table 17-3. The fields of an Ext2 disk inode
Type | Field | Description |
---|---|---|
__u16 | i_mode | File type and access rights |
__u16 | i_uid | Owner identifier |
__u32 | i_size | File length in bytes |
__u32 | i_atime | Time of last file access |
__u32 | i_ctime | Time that inode last changed |
__u32 | i_mtime | Time that file contents last changed |
__u32 | i_dtime | Time of file deletion |
__u16 | i_gid | Group identifier |
__u16 | i_links_count | Hard links counter |
__u32 | i_blocks | Number of data blocks of the file |
__u32 | i_flags | File flags |
union | osd1 | Specific operating system information |
__u32 [EXT2_N_BLOCKS] | i_block | Pointers to data blocks |
__u32 | i_generation | File version (used when the file is accessed by a network filesystem) |
__u32 | i_file_acl | File access control list |
__u32 | i_dir_acl | Directory access control list |
__u32 | i_faddr | Fragment address |
union | osd2 | Specific operating system information |
Many fields related to POSIX specifications are similar to the corresponding fields of the VFS’s inode object and have already been discussed in Section 12.2.2. The remaining ones refer to the Ext2-specific implementation and deal mostly with block allocation.
In particular, the i_size field stores the effective length of the file in bytes, while the i_blocks field stores the number of data blocks (in units of 512 bytes) that have been allocated to the file.
The values of i_size and i_blocks are not necessarily related. Since a file is always stored in an integer number of blocks, a nonempty file receives at least one data block (since fragmentation is not yet implemented) and i_size may be smaller than 512 × i_blocks. On the other hand, as we shall see in Section 17.6.4 later in this chapter, a file may contain holes. In that case, i_size may be greater than 512 × i_blocks.
The i_block field is an array of EXT2_N_BLOCKS (usually 15) pointers to blocks used to identify the data blocks allocated to the file (see Section 17.6.3 later in this chapter).
The 32 bits reserved for the i_size field limit the file size to 4 GB. Actually, the highest-order bit of the i_size field is not used, so the maximum file size is limited to 2 GB. However, the Ext2 filesystem includes a “dirty trick” that allows larger files on 64-bit architectures like Hewlett-Packard’s Alpha. Essentially, the i_dir_acl field of the inode, which is not used for regular files, represents a 32-bit extension of the i_size field. Therefore, the file size is stored in the inode as a 64-bit integer. The 64-bit version of the Ext2 filesystem is somewhat compatible with the 32-bit version because an Ext2 filesystem created on a 64-bit architecture may be mounted on a 32-bit architecture, and vice versa. On a 32-bit architecture, a large file cannot be accessed unless it is opened with the O_LARGEFILE flag set (see Section 12.6.1).
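Combining the two 32-bit fields into one 64-bit length is a shift and an OR. The Python sketch below shows the idea for a regular file, assuming that i_dir_acl holds the high-order 32 bits and i_size the low-order 32 bits; the function name is hypothetical.

```python
def regular_file_size(i_size: int, i_dir_acl: int) -> int:
    """Reassemble a 64-bit file length from two 32-bit on-disk fields.

    Only meaningful for regular files, where i_dir_acl is otherwise unused.
    """
    if not (0 <= i_size < 2**32 and 0 <= i_dir_acl < 2**32):
        raise ValueError("both fields must fit in 32 bits")
    return (i_dir_acl << 32) | i_size
```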
Recall that the VFS model requires each file to have a different inode number. In Ext2, there is no need to store on disk a mapping between an inode number and the corresponding block number because the latter value can be derived from the block group number and the relative position inside the inode table. For example, suppose that each block group contains 4,096 inodes and that we want to know the address on disk of inode 13,021. Since inode numbers start at 1 and block groups are numbered from 0, the inode belongs to block group 3 (the fourth group) and is stored in the 733rd entry of that group's inode table. As you can see, the inode number is just a key used by the Ext2 routines to retrieve the proper inode descriptor on disk quickly.
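The derivation can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes that inode numbers start at 1, that block groups are numbered from 0, and that inodes are 128 bytes long. It returns the group index, the zero-based position inside that group's inode table, the block offset from the start of the table, and the byte offset inside that block.

```python
def locate_inode(ino: int, inodes_per_group: int,
                 block_size: int = 1024, inode_size: int = 128):
    """Map an inode number to (group, index, table_block_offset, byte_offset)."""
    if ino < 1:
        raise ValueError("inode numbers start at 1")
    # Zero-based group index and position inside that group's inode table.
    group, index = divmod(ino - 1, inodes_per_group)
    per_block = block_size // inode_size
    # Which block of the inode table, and which slot inside that block.
    block_offset, slot = divmod(index, per_block)
    return group, index, block_offset, slot * inode_size
```

For inode 13,021 with 4,096 inodes per group, the zero-based position 732 corresponds to the 733rd entry of the table; adding the returned block offset to the group's bg_inode_table value gives the disk block that holds the inode.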
The different types of files recognized by Ext2 (regular files, pipes, etc.) use data blocks in different ways. Some files store no data and therefore need no data blocks at all. This section discusses the storage requirements for each type, which are listed in Table 17-4.
Table 17-4. Ext2 file types
File_type | Description |
---|---|
0 | Unknown |
1 | Regular file |
2 | Directory |
3 | Character device |
4 | Block device |
5 | Named pipe |
6 | Socket |
7 | Symbolic link |
Regular files are the most common case and receive almost all the attention in this chapter. But a regular file needs data blocks only when it starts to have data. When first created, a regular file is empty and needs no data blocks; it can also be emptied by the truncate() or open() system calls. Both situations are common; for instance, when you issue a shell command that includes the string >filename, the shell creates an empty file or truncates an existing one.
Ext2 implements directories as a special kind of file whose data blocks store filenames together with the corresponding inode numbers. In particular, such data blocks contain structures of type ext2_dir_entry_2. The fields of that structure are shown in Table 17-5. The structure has a variable length, since the last name field is a variable-length array of up to EXT2_NAME_LEN characters (usually 255). Moreover, for reasons of efficiency, the length of a directory entry is always a multiple of 4 and, therefore, null
characters (