HDFS vs NFS

 In this article, we are going to discuss the difference between HDFS(Hadoop distributed file system) and NFS (Network file system).

HDFS (Hadoop distributed file system) :
It is a distributed file system that handles a large set of data that is running on commodity hardware and in which data is distributed among many data nodes or networked computers. 

It is mainly used for enlarging a single Apache Hadoop cluster to hundreds and even sometimes thousands of nodes. It is considered one of the major components of Apache Hadoop. It is not similar to Apache HBase, which is a column-oriented non-relational DBMS that sits on top of HDFS, which can better support real-time data with its in-memory processing engine. 

It is mainly used to store big data and also makes it responsible for faster data transactions. 
This file system stores multiple replicas of files, that’s why it is called fault-tolerant. Here the default replication level is 3.

NFS (Network file system) :
This NFS file system is a distributed file system that permits its client to access the file over the network. This file system is an open standard. That’s the reason this file system can be implemented easily. Initially, this file system was created for experimental purposes, but later its second variety was released for public use after the first success.


All data is accumulated on one main system and all the remaining systems of the network can access the data stored on that as if it was stored in their local system. But here one problem arises. If the main system went down, then there is a high possibility of a loss of data and here the storage also relies on the space available on that system.

Here, a mount command is used for accessing the exported data. After the successful accessing of data, the client machine can interconnect with the file systems within the specified parameters.

Difference between HDFS & NFS :

  • NFS does not have any built-in fault-tolerance but HDFS was designed to survive failures as it has fault-tolerance or replication.
  • HDFS’s storage capacity is comparatively high.

Benefits of HDFS over NFS? 
Other than fault tolerance, HDFS does support multiple replicas of files, which avoids the common bottleneck of many clients accessing a single file. It has reading performance scales better than NFS because of having multiple replicas on different physical disks.

The difference between HDFS & NFS in tabular form :

Criteria

HDFS

NFS

DefinitionIt is a file system in which data is distributed among many data nodes or networked computers.It is a file system or protocol which allows its client to access the file over the network.
Supporting Data Size –It is mainly used to store and process big data. It can store and process a small amount of data.
Data storage –Its data blocks are dispersed on the local drives of hardware.Data is stored on a single dedicated hardware.
Reliability –Its data is stored reliably. Here, data is available even after machine failure.No reliability, data is not available in case of machine failure.
Data Redundancy –It runs on a cluster of different machines, data redundancy may occur due to replication protocol.It runs on a single machine, with no chance of data redundancy.
Domain –It is for multi-domain.It is for a single domain.
Client-Server Trust –Here, client identity is trusted by the OS.Here, client identity is trusted by default.
Compatibility with O/S –It has different calls. It is mainly used for non-interactive programs.It has the same system calls as O/S.