The ext3 filesystem adds journalling as an important enhancement over the ext2 filesystem: the filesystem maintains a journal in which it logs changes.
How is the journalling feature of a filesystem helpful?
After an unclean shutdown caused by a power failure or system crash, each mounted ext2 partition must be checked for consistency with the e2fsck program. This significantly delays system boot, especially for large partitions containing a large number of files. During the check, the data on those partitions is unreachable.
If fsck needs to be run on a live system, the partitions must first be remounted read-only. When a filesystem is remounted read-only, all pending metadata updates (and writes) are forced to the disk before the remount completes. This leaves the filesystem in a consistent state, so it is then possible to run fsck -n.
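The remount-then-check sequence described above can be sketched as a small shell helper. This is only an illustration: the names run and check_live_fs, the DRY_RUN switch, and the example mount point and device are assumptions, not part of any standard tool. Note that the remount is likely to fail if some process still has files open for writing.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: check a mounted filesystem without modifying it.
#   $1 = mount point, $2 = block device (both supplied by the caller)
check_live_fs() {
    mountpoint=$1
    device=$2
    # Remounting read-only forces pending updates to the disk first.
    run mount -o remount,ro "$mountpoint" || return 1
    # -n opens the filesystem read-only and answers "no" to every question.
    run fsck -n "$device"
}
```

Running it with DRY_RUN=1 first, for example "DRY_RUN=1; check_live_fs /home /dev/sda2", only prints the two commands, so they can be reviewed before anything is touched. Once the check is done, the filesystem can be made writable again with mount -o remount,rw.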
The journalling feature provided by the ext3 file system means that this sort of file system check is no longer necessary after an unclean system shutdown.
The time to recover an ext3 file system after an unclean system shutdown does not depend on the size of the file system or the number of files; rather, it depends on the size of the journal used to maintain consistency. The default journal size takes about a second to recover, depending on the speed of the hardware. So journalling has made running fsck after an unclean unmount unnecessary.
Is fsck still necessary when a filesystem has journalling?
Yes. In extreme cases such as hard drive failures, a file system consistency check (fsck) is still very much necessary.
How does journalling work?
There are three modes of journalling:
1) ordered (only the metadata is journalled; this is the default)
2) writeback (only the metadata is journalled, with no guarantee on the order in which data and metadata reach the disk)
3) journal (both data and metadata are journalled)
Ordered mode:
mount -o data=ordered
In ordered mode, the data blocks related to a metadata change are written to the disk before the metadata is committed to the journal. This ensures that every metadata change recorded in the journal reflects writes that have actually been made to the disk.
Ordered mode is the default journal mode used in most systems.
Writeback mode:
mount -o data=writeback
This is the fastest mode. Here, metadata may be committed to the journal before the data blocks related to the metadata change are written to the disk. As a result, files may contain stale data after a crash.
Journal mode:
mount -o data=journal
In this mode, both the metadata and the data blocks related to the metadata change are journalled.
Here, a copy of each modified data block is first written to the journal. Then the modified data blocks are written to the filesystem. Once the I/O transfer to the filesystem completes (the data is committed to the filesystem), the copies of the blocks in the journal are discarded.
This is the slowest mode of journalling, because more total disk I/O is done. However, it merges many small writes scattered around the disk into efficient linear I/O, which helps avoid expensive seeks for small, random writes.
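To see which of the three modes a mounted filesystem is using, one option is to look for the data= entry in its mount options. The helper below is a minimal sketch; the name data_mode is hypothetical, and the fallback to "ordered" when no data= option is listed is an assumption based on ordered being the default mode.

```shell
# Hypothetical helper: print the journalling mode named in a
# comma-separated mount option string such as "rw,relatime,data=writeback".
data_mode() {
    # Split the options onto separate lines and keep the value of the first data= entry.
    mode=$(printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^data=//p' | head -n 1)
    # Assumption: no explicit data= option means the default (ordered) mode.
    printf '%s\n' "${mode:-ordered}"
}
```

On a live system the option string could be taken from the fourth field of the matching line in /proc/mounts, for example: data_mode "$(awk '$2 == "/home" {print $4}' /proc/mounts)", where /home is just an example mount point.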
How to know if a filesystem has journalling enabled or not?
# dumpe2fs /dev/sda2 | grep -i has_journal
dumpe2fs 1.41.12 (17-May-2010)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
# debugfs -R features /dev/sda2
debugfs 1.41.12 (17-May-2010)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
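For use in scripts, the same check can be reduced to an exit status. This is a small sketch, and the function name has_journal is made up for the example; it simply reads dumpe2fs-style output on standard input and looks for the has_journal feature flag.

```shell
# Hypothetical helper: succeed (exit status 0) if the "Filesystem features:"
# line read from standard input lists the has_journal flag.
has_journal() {
    grep -i '^Filesystem features:' | grep -qw 'has_journal'
}
```

A possible use would be: dumpe2fs -h /dev/sda2 2>/dev/null | has_journal && echo "journalling enabled". The -h option limits dumpe2fs to the superblock information, which is all this check needs.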
How to find the size of the journal?
# dumpe2fs /dev/sda2 | egrep -i '(journal|size)'
dumpe2fs 1.41.12 (17-May-2010)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Block size: 4096
Fragment size: 4096
Flex block group size: 16
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Journal backup: inode blocks
Journal features: journal_incompat_revoke
Journal size: 128M
Journal length: 32768
Journal sequence: 0x0002cb40
Journal start: 1
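The journal size can also be derived from the output above: the journal length is counted in filesystem blocks, so 32768 blocks of 4096 bytes each come to 128 MiB, which matches the reported "Journal size". A rough sketch of that arithmetic follows; the function name journal_size_mib is invented for the example.

```shell
# Hypothetical helper: read dumpe2fs-style output on standard input and
# print the journal size in MiB (journal length in blocks x block size).
journal_size_mib() {
    awk -F': *' '
        /^Block size:/     { bs = $2 }    # bytes per filesystem block
        /^Journal length:/ { len = $2 }   # journal length in blocks
        END {
            if (bs == "" || len == "") exit 1   # a needed field was missing
            print bs * len / 1048576
        }
    '
}
```

For example, dumpe2fs -h /dev/sda2 2>/dev/null | journal_size_mib should print 128 for the filesystem shown above.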
How to improve journal performance?
2) The journal partition must be created with the same block size as the filesystem it is journalling.
Let us see how to go about creating an external journal partition. Say that, for a filesystem in a partition on the device /dev/sda, we want to create an external journal partition on the device /dev/sdb.
# dumpe2fs /dev/sda1 | egrep -i '(journal|size)'
# tune2fs -O ^has_journal /dev/sda1
# mke2fs -O journal_dev -b <block-size> /dev/sdb1
# tune2fs -j -J device=/dev/sdb1 /dev/sda1
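Since these commands rewrite filesystem structures, it can help to print them for review before running anything. The sketch below does only that; the function name external_journal_plan and its three arguments are hypothetical, and it assumes the filesystem will be unmounted before the printed commands are actually executed.

```shell
# Hypothetical helper: print (do not run) the commands that move a
# filesystem's journal to an external device.
#   $1 = filesystem device, $2 = journal device, $3 = filesystem block size
external_journal_plan() {
    fs_dev=$1
    journal_dev=$2
    block_size=$3
    # 1. Remove the existing internal journal from the filesystem.
    printf 'tune2fs -O ^has_journal %s\n' "$fs_dev"
    # 2. Format the journal device with the same block size as the filesystem.
    printf 'mke2fs -O journal_dev -b %s %s\n' "$block_size" "$journal_dev"
    # 3. Attach the external journal to the filesystem.
    printf 'tune2fs -j -J device=%s %s\n' "$journal_dev" "$fs_dev"
}
```

For the devices used here, external_journal_plan /dev/sda1 /dev/sdb1 4096 prints the three commands with the block size filled in; the value 4096 is the block size that the first dumpe2fs command would report in this example.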