Friday, August 1, 2014

Hadoop Namenode - Image, Checkpoint (fsimage), Journal (edits)

Image

The HDFS NameNode keeps the entire namespace in RAM. The inode data and the list of blocks belonging to each file comprise the metadata of the name system, called the image.

Checkpoint (fsimage)

The persistent record of the image, stored in the NameNode's local native filesystem, is called a checkpoint. The locations of block replicas may change over time and are not part of the persistent checkpoint. The checkpoint is also called the fsimage (filesystem image).

The fsimage file contains a serialized form of all the directory and file inodes in the filesystem. Each inode is an internal representation of a file or directory's metadata and contains such information as the file's replication level, modification and access times, access permissions, block size, and the blocks the file is made up of. For directories, the modification time, permissions, and quota metadata are stored.
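
A client can see most of this per-file metadata through the standard FileSystem API. Below is a minimal sketch; the path /user/example/data.txt and the class name are made-up examples, not anything prescribed by HDFS:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class InodeMetadata {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            // Hypothetical path - replace with any file that exists in your cluster.
            FileStatus st = fs.getFileStatus(new Path("/user/example/data.txt"));

            // The same attributes that the fsimage records for this inode.
            System.out.println("replication : " + st.getReplication());
            System.out.println("block size  : " + st.getBlockSize());
            System.out.println("mtime       : " + st.getModificationTime());
            System.out.println("atime       : " + st.getAccessTime());
            System.out.println("permissions : " + st.getPermission());
        }
    }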

Note: The fsimage file does not record the datanodes on which the blocks are stored. Instead, the namenode keeps this mapping in memory, which it constructs by asking the datanodes for their block lists when they join the cluster, and periodically afterward, to ensure the namenode's block mapping is up to date.

Journal (edit log)

The NameNode also stores the modification log of the image, called the journal, in the local host's native filesystem. The journal is also called the edits (edit log). When a filesystem client performs a write operation (such as creating or moving a file), it is first recorded in the edit log. The NameNode also keeps an in-memory representation of the filesystem metadata, which it updates after the edit log has been modified. This in-memory metadata is used to serve read requests.
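
As a rough illustration, each namespace mutation performed by the client code below is journaled by the namenode before its in-memory metadata is updated. The paths and class name are hypothetical; the comments describe what happens on the namenode, not extra API calls:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class JournaledWrites {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path tmp = new Path("/tmp/journal-demo.txt");   // hypothetical path

            // Creating and closing a file are namespace mutations: the namenode
            // records them in the edit log before updating its in-memory image.
            FSDataOutputStream out = fs.create(tmp);
            out.writeUTF("hello hdfs");
            out.close();

            // Renaming is another mutation that goes through the edit log first.
            fs.rename(tmp, new Path("/tmp/journal-demo-renamed.txt"));
        }
    }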

Upon namenode startup, the fsimage file is loaded into RAM and any changes in the edits file are replayed, bringing the in-memory view of the filesystem up to date.
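
A conceptual sketch of this startup sequence is shown below. The types (Checkpoint, EditRecord, Namespace) are made-up stand-ins used only to illustrate the order of operations; this is not the actual NameNode code:

    // Conceptual sketch only - these types are illustrative, not Hadoop classes.
    interface Checkpoint { Namespace load(); }
    interface EditRecord { void applyTo(Namespace ns); }
    class Namespace { /* in-memory directory/file inodes and their block lists */ }

    class NamenodeStartup {
        Namespace start(Checkpoint fsimage, Iterable<EditRecord> edits) {
            // 1. Deserialize the latest checkpoint into the in-memory namespace.
            Namespace ns = fsimage.load();

            // 2. Replay every journal record, in order, on top of the checkpoint,
            //    bringing the in-memory view of the filesystem up to date.
            for (EditRecord op : edits) {
                op.applyTo(ns);
            }

            // 3. Block locations are in neither the fsimage nor the edits; they
            //    are rebuilt later from block reports sent by the datanodes.
            return ns;
        }
    }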

How is HDFS different from traditional filesystems like ext3?


  1. Traditional filesystems like ext3 are implemented as kernel modules. HDFS is a userspace filesystem: its code runs outside the kernel as ordinary OS processes and is not registered with or exposed via the Linux VFS layer.
  2. Traditional filesystems need to be mounted. An HDFS filesystem need not be mounted, since it simply runs as OS processes.
  3. HDFS is a distributed filesystem, spread across many machines, so the size of an HDFS file is not limited by the capacity of any single machine. In traditional filesystems, a file cannot exceed the disk capacity of the machine it lives on.
  4. Traditional filesystems use a block size of 4 KB or 8 KB. HDFS uses a much larger block size, 64 MB by default.
  5. Unlike conventional filesystems, HDFS provides an API that exposes the locations of a file's blocks. This allows applications such as the MapReduce framework to schedule a task on the node where the data is located, improving read performance (see the sketch after this list).
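
A minimal sketch of using that API is below, assuming a file already exists at a made-up path; it asks the namenode for the datanodes that hold each block of the file:

    import java.util.Arrays;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WhereAreMyBlocks {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            // Hypothetical path - use any existing HDFS file.
            FileStatus st = fs.getFileStatus(new Path("/user/example/big-input.txt"));

            // Ask the namenode which datanodes hold each block of the file.
            BlockLocation[] blocks = fs.getFileBlockLocations(st, 0, st.getLen());
            for (BlockLocation b : blocks) {
                System.out.println("offset " + b.getOffset()
                        + ", length " + b.getLength()
                        + ", hosts " + Arrays.toString(b.getHosts()));
            }
        }
    }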