Friday, August 1, 2014

How is HDFS different from traditional filesystems like ext3?


  1. Traditional filesystems like ext3 are implemented as kernel modules. HDFS is a userspace filesystem: the filesystem code runs outside the kernel as an ordinary OS process and is not registered with or exposed through the Linux VFS layer.
  2. Traditional filesystems need to be mounted. An HDFS filesystem need not be mounted; it simply runs as an OS process and is accessed through its own client API.
  3. HDFS is a distributed filesystem, spread across many machines, so the size of an HDFS file is not limited by the capacity of any single machine. In a traditional filesystem, a file cannot exceed the disk capacity of the machine it lives on.
  4. Traditional filesystems use a block size of 4 KB or 8 KB. HDFS uses a much larger block size, 64 MB by default, and the block size can also be chosen per file at creation time (see the second sketch after this list).
  5. Unlike conventional filesystems, HDFS provides an API that exposes the locations of a file's blocks. This allows applications like the MapReduce framework to schedule a task on the machine where the data is located, improving read performance (see the first sketch after this list).
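As a rough illustration of point 5, the sketch below uses the Hadoop Java FileSystem API to ask the NameNode which hosts hold each block of a file. The path /data/example.txt is hypothetical, and the snippet assumes fs.defaultFS in the client configuration already points at the HDFS cluster.

    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocationsExample {
        public static void main(String[] args) throws Exception {
            // Assumes fs.defaultFS is configured to point at the cluster.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Hypothetical file path, used only for illustration.
            Path file = new Path("/data/example.txt");
            FileStatus status = fs.getFileStatus(file);

            // Ask the NameNode which datanodes hold each block of the file.
            BlockLocation[] blocks =
                    fs.getFileBlockLocations(status, 0, status.getLen());

            for (BlockLocation block : blocks) {
                System.out.printf("offset=%d length=%d hosts=%s%n",
                        block.getOffset(), block.getLength(),
                        Arrays.toString(block.getHosts()));
            }
            fs.close();
        }
    }

A framework like MapReduce uses exactly this kind of information to run a map task on (or near) one of the hosts returned for a block, so the data is read locally instead of over the network.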

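For point 4, here is a minimal sketch of setting the block size per file, using the FileSystem.create overload that takes buffer size, replication and block size. The output path, the 128 MB block size and the replication factor of 3 are illustrative choices, not values from the original post.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSizeExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Hypothetical output path, used only for illustration.
            Path out = new Path("/data/large-output.bin");

            // create(path, overwrite, bufferSize, replication, blockSize):
            // here the block size is 128 MB instead of the 64 MB default.
            long blockSize = 128L * 1024 * 1024;
            FSDataOutputStream stream =
                    fs.create(out, true, 4096, (short) 3, blockSize);
            stream.writeUTF("hello hdfs");
            stream.close();
            fs.close();
        }
    }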