Integrating LVM with Hadoop and providing Elasticity to Data Node Storage

Hey all, I'm back with an article-cum-demonstration. Here I will introduce what exactly Hadoop is and how it helps with the distributed storage of Big Data.
Then we will talk about the slave nodes (DataNodes) that provide storage to the master node, also called the NameNode: how storage can be wasted because of this static association, and how the LVM (Logical Volume Management) concept can solve this issue.


Hadoop is an open-source tool from the Apache community. It provides a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. It is mainly used for storing and processing Big Data.

Let’s start with the Hadoop cluster setup:
Step 1: Create 4 EC2 instances on the AWS cloud and name them “Master”, “DN1”, “DN2” and “Client”.

Log in to all the nodes with the PuTTY tool, using their respective private keys.

Step 2: I already had the Hadoop configuration files for the master, the slaves and the client, and I transferred them via “WinSCP” from my local workspace to the Linux instances in the cloud.
I did the same with the Java (JDK) and Hadoop software packages and installed both using the “rpm” command. First install Java:
> rpm -ivh jdk-<versionfile_x64>.rpm

Now install Hadoop using the following rpm command:
> rpm -ivh hadoop-<version_64>.rpm --force

Step 3: Copy both configuration files, which contain the XML settings, into the Hadoop configuration directory “/etc/hadoop”. The configuration file names are:
1) hdfs-site.xml
2) core-site.xml
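The contents of these two files are not shown in this article. As a minimal sketch, assuming the Hadoop 1.x property names (fs.default.name in core-site.xml, dfs.name.dir on the master and dfs.data.dir on the data nodes), the Python snippet below builds both files. The master address MASTER_IP:9001 and the /nn metadata folder are hypothetical placeholders; /dn matches the data node folder used later in this article.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder: replace with your master's IP and chosen port.
MASTER_URI = "hdfs://MASTER_IP:9001"


def hadoop_config(properties: dict) -> str:
    """Build a Hadoop *-site.xml document from a name -> value mapping."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# core-site.xml: every node (master, data nodes, client) points at the NameNode.
core_site = hadoop_config({"fs.default.name": MASTER_URI})

# hdfs-site.xml differs per role (Hadoop 1.x names; /nn is a hypothetical folder).
master_hdfs_site = hadoop_config({"dfs.name.dir": "/nn"})
datanode_hdfs_site = hadoop_config({"dfs.data.dir": "/dn"})

if __name__ == "__main__":
    # Print only; nothing is written to /etc/hadoop.
    for label, xml in [("core-site.xml", core_site),
                       ("hdfs-site.xml (master)", master_hdfs_site),
                       ("hdfs-site.xml (data node)", datanode_hdfs_site)]:
        print(f"--- {label} ---")
        print(xml)
```

The printed XML can be saved as /etc/hadoop/core-site.xml and /etc/hadoop/hdfs-site.xml on the matching node; the client only needs core-site.xml.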

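Step 4: Format the NameNode (only needed the first time), start the NameNode service, and check whether it is running using the Java process status command:
> hadoop namenode -format
> hadoop-daemon.sh start namenode
> jps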
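jps prints one line per running JVM in the form `<pid> <main class>`. A small sketch, assuming that output format and that jps (from the installed JDK) is on the PATH, turns this check into a reusable Python helper; the function names are hypothetical.

```python
import subprocess


def running_java_processes(jps_output: str) -> set:
    """Return the main-class names listed by jps (each line is '<pid> <name>')."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            names.add(parts[1])
    return names


def is_running(service: str) -> bool:
    """Run jps on this machine and report whether the given service is listed."""
    output = subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
    return service in running_java_processes(output)
```

On the master, `is_running("NameNode")` should return True once the NameNode is up; on DN1 and DN2 the same call with "DataNode" applies.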

Repeat the same set of steps on the DN1 and DN2 data nodes. On each data node you also need to create a folder that provides the data node's storage to the master node in the Hadoop Distributed File System (HDFS) cluster, and point the data node's hdfs-site.xml at it. Here I have created the “/dn” folder.
Then start the DataNode using the command:
> hadoop-daemon.sh start datanode

Now move to the NameNode and run the report below; it will now show the storage shared by one of the data nodes with the master node over HDFS:
> hadoop dfsadmin -report

Also run the jps command on the data node; it will now show that the DataNode has started.
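To compare capacities before and after resizing, a sketch like the one below can pull the byte values out of `hadoop dfsadmin -report`. It assumes the report prints lines of the form `Configured Capacity: <bytes> (<size>)`, with the cluster total first and one entry per data node after it; the helper names are hypothetical.

```python
import re
import subprocess

# Matches report lines such as "Configured Capacity: <bytes> (<human-readable size>)".
CAPACITY_RE = re.compile(r"^Configured Capacity:\s*(\d+)", re.MULTILINE)


def configured_capacities(report: str) -> list:
    """Return every 'Configured Capacity' value, in bytes, in report order.

    Assumption: the first value is the cluster total, the rest are per data node.
    """
    return [int(value) for value in CAPACITY_RE.findall(report)]


def bytes_to_gib(n: int) -> float:
    """Convert bytes to GiB (1024 ** 3 bytes)."""
    return n / 1024 ** 3


def cluster_capacity_gib() -> float:
    """Run the report on the NameNode (the hadoop CLI must be on PATH)."""
    report = subprocess.run(["hadoop", "dfsadmin", "-report"],
                            capture_output=True, text=True, check=True).stdout
    capacities = configured_capacities(report)
    return bytes_to_gib(capacities[0]) if capacities else 0.0
```

Calling `cluster_capacity_gib()` before and after each resize below gives a quick numeric check of the change. The value HDFS reports is based on the mounted filesystem, so expect it to be somewhat below the raw volume size.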

Now, if we need more storage, we can attach another hard disk to the data node and mount it on the folder shared with the master node. The challenge is that a statically partitioned and mounted disk is not flexible: its size is fixed when it is created, so the shared storage cannot grow or shrink as the need changes.

To overcome this, we will first set up the new hard disk with LVM and then mount the resulting logical volume on the data node folder that is shared with the master node.

Let’s start creating the LVM setup.

Create a physical volume on the new disk using the command:
> pvcreate <diskname>

Now create a volume group on that physical volume before creating the logical volume:
> vgcreate <vgname> <diskname>

After that, we can create a logical volume of the desired size, e.g. 20G, and format it with the ext4 filesystem:
> lvcreate --size 20G --name <lvname> <vgname>
> mkfs.ext4 /dev/<vgname>/<lvname>

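Now mount the logical volume on the /dn folder and then restart the DataNode service:
> mount /dev/<vgname>/<lvname> /dn
> hadoop-daemon.sh stop datanode
> hadoop-daemon.sh start datanode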
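The LVM steps above can be collected into one dry-run script. This is a minimal sketch: the disk /dev/xvdf and the names hadoopvg and hadooplv are hypothetical, and the commands are only printed unless `run(..., dry_run=False)` is called as root on the data node.

```python
import shlex
import subprocess

# Hypothetical names for illustration: adjust to your disk and naming scheme.
DISK = "/dev/xvdf"
VG = "hadoopvg"
LV = "hadooplv"
MOUNT_POINT = "/dn"


def lvm_setup_commands(disk: str, vg: str, lv: str, size: str, mount_point: str) -> list:
    """Return, in order, the commands that turn a raw disk into a mounted ext4 LV."""
    device = f"/dev/{vg}/{lv}"
    return [
        ["pvcreate", disk],                              # physical volume
        ["vgcreate", vg, disk],                          # volume group on top of it
        ["lvcreate", "--size", size, "--name", lv, vg],  # logical volume of chosen size
        ["mkfs.ext4", device],                           # ext4 filesystem
        ["mount", device, mount_point],                  # mount on the DataNode folder
        ["hadoop-daemon.sh", "stop", "datanode"],        # restart the DataNode so it
        ["hadoop-daemon.sh", "start", "datanode"],       # uses the newly mounted volume
    ]


def run(commands: list, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False (needs root)."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(lvm_setup_commands(DISK, VG, LV, "20G", MOUNT_POINT))  # print-only
```

Keeping the commands as argument lists avoids shell quoting issues and makes the sequence easy to review before anything touches the disk.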

Next, let’s increase and then decrease the size of the LVM partition.

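Use “lvextend” to extend the logical volume. Let’s increase it by 10G, grow the ext4 filesystem into the new space with “resize2fs” (ext4 can be grown while mounted), and then check the shared storage again from the NameNode using “hadoop dfsadmin -report”:
> lvextend --size +10G /dev/<vgname>/<lvname>
> resize2fs /dev/<vgname>/<lvname>
> hadoop dfsadmin -report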
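A matching print-only sketch for the grow step, with the same hypothetical volume names. resize2fs is included because lvextend grows the logical volume but not the ext4 filesystem on it (lvextend's --resizefs option can combine the two). The final report command is meant to be run on the NameNode.

```python
import shlex


def lvm_extend_commands(vg: str, lv: str, increase: str) -> list:
    """Grow the LV, then grow the mounted ext4 filesystem into the new space."""
    device = f"/dev/{vg}/{lv}"
    return [
        ["lvextend", "--size", f"+{increase}", device],  # add space to the logical volume
        ["resize2fs", device],                           # ext4 can be grown while mounted
        ["hadoop", "dfsadmin", "-report"],               # run on the NameNode to confirm
    ]


if __name__ == "__main__":
    # Hypothetical VG/LV names; commands are printed, not executed.
    for cmd in lvm_extend_commands("hadoopvg", "hadooplv", "10G"):
        print(shlex.join(cmd))
```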

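Now let’s decrease the logical volume by 10G, from 30G back to 20G. Unlike growing, an ext4 filesystem cannot be shrunk while it is mounted, so stop the DataNode and unmount /dn first, then check and shrink the filesystem before reducing the logical volume:
> hadoop-daemon.sh stop datanode
> umount /dn
> e2fsck -f /dev/<vgname>/<lvname>
> resize2fs /dev/<vgname>/<lvname> 20G
> lvreduce --size -10G /dev/<vgname>/<lvname>
> mount /dev/<vgname>/<lvname> /dn
> hadoop-daemon.sh start datanode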
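The shrink sequence has to be ordered carefully: the filesystem is shrunk first and the logical volume second, otherwise the end of the filesystem would be cut off. The print-only sketch below uses the same hypothetical names and an absolute target size for both resize2fs and lvreduce, so the two sizes always agree. It assumes the data already stored in /dn fits in the smaller size.

```python
import shlex


def lvm_shrink_commands(vg: str, lv: str, new_size: str, mount_point: str) -> list:
    """Shrink an ext4 LV to new_size; ext4 can only be shrunk while unmounted."""
    device = f"/dev/{vg}/{lv}"
    return [
        ["hadoop-daemon.sh", "stop", "datanode"],    # stop HDFS using the volume
        ["umount", mount_point],
        ["e2fsck", "-f", device],                    # resize2fs wants a fresh check
        ["resize2fs", device, new_size],             # shrink the filesystem first...
        ["lvreduce", "--size", new_size, device],    # ...then the LV to the same size
        ["mount", device, mount_point],
        ["hadoop-daemon.sh", "start", "datanode"],
    ]


if __name__ == "__main__":
    # Hypothetical names; 20G is the 30G volume minus 10G. Printed, not executed.
    for cmd in lvm_shrink_commands("hadoopvg", "hadooplv", "20G", "/dn"):
        print(shlex.join(cmd))
```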

Hence, we have increased the volume by 10G online and decreased it by 10G again with only a short DataNode restart, sharing a dynamically sized volume with the master node of the Hadoop distributed cluster.
Thanks for reading! Hope it helps you :)
