Integrate LVM with Hadoop to Achieve Dynamic Storage

Shubham Jangid
5 min read · Mar 14, 2021

What is a Physical Volume?

A physical volume (PV) is the lowest layer of LVM: a whole disk or a disk partition that has been initialized for use by the Logical Volume Manager. LVM writes a small piece of metadata on each PV so it can be identified, tracked, and pooled together with other PVs to store server data.

What is a Volume Group?

A volume group (VG) is the central unit of the Logical Volume Manager (LVM) architecture. It is what we create when we combine multiple physical volumes into a single storage pool, with a capacity equal to the combined capacity of the underlying physical devices.

What is LVM?

In Linux, Logical Volume Manager (LVM) is a device mapper framework that provides logical volume management for the Linux kernel. Most modern Linux distributions are LVM-aware to the point of being able to have their root file systems on a logical volume.

We will use LVM in a Hadoop cluster to provide dynamic (resizable) storage to the data nodes.

Steps to be followed:

Step 1: Attach two HDDs to the system which you want to make the data node

Step 2: Check the hard drive names, which we need in order to create the PVs

Here we have two new drives, /dev/nvme0n2 and /dev/nvme0n3.
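A quick way to confirm the device names of freshly attached disks (the NVMe names above are from my VM; yours may differ) is:

```shell
# List all block devices; new, unused disks show no mount point
# and no partitions underneath them
lsblk

# fdisk also prints the size of each disk, which helps match
# the device name to the drive you just attached
fdisk -l /dev/nvme0n2 /dev/nvme0n3
```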

Step 3: Create a PV on each drive
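On the data node, each drive is initialized as an LVM physical volume with pvcreate (run as root; substitute your own device names):

```shell
# Initialize both drives as LVM physical volumes
pvcreate /dev/nvme0n2 /dev/nvme0n3

# Verify: pvdisplay shows each PV's size and VG membership (none yet)
pvdisplay /dev/nvme0n2 /dev/nvme0n3
```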

Step 4: Create a volume group using both PVs
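Both PVs are then pooled into one volume group. I use the VG name dynamic_storage here, matching the script later in this post:

```shell
# Combine both PVs into a single volume group named "dynamic_storage"
vgcreate dynamic_storage /dev/nvme0n2 /dev/nvme0n3

# Verify: the VG size should be roughly the sum of both drives
vgdisplay dynamic_storage
```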

Step 5: Create an LV (12 GiB in my case)
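A logical volume of the desired size is carved out of the VG. The LV name hadoop_dn matches the script below; the size can be anything up to the VG's free capacity:

```shell
# Carve a 12 GiB logical volume named "hadoop_dn" out of the VG
lvcreate --size 12G --name hadoop_dn dynamic_storage
```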

Step 6: Display the LV to confirm it was created
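lvdisplay confirms the new LV and shows the device path we will format and mount in the next step:

```shell
# Show the new LV; its device path is /dev/dynamic_storage/hadoop_dn
lvdisplay dynamic_storage/hadoop_dn
```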

Step 7: Format the LV and mount it on the “/data_node” directory to provide the LV’s storage to the data node
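The LV needs a filesystem before it can be mounted. I use ext4 here, as the script below does:

```shell
# Put an ext4 filesystem on the LV, create the mount point, and mount it
mkfs.ext4 /dev/dynamic_storage/hadoop_dn
mkdir -p /data_node
mount /dev/dynamic_storage/hadoop_dn /data_node

# Confirm: df shows /data_node backed by the LV with ~12 GiB capacity
df -h /data_node
```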

Step 8: Configure hdfs-site.xml for the name node
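A minimal Hadoop 1.x-style hdfs-site.xml for the name node looks like the fragment below; /name_node is an assumed metadata directory, so use whatever directory you created on the name node:

```xml
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/name_node</value>
  </property>
</configuration>
```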

Step 9: Configure core-site.xml for the name node
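In core-site.xml the name node declares where the HDFS service listens. Binding to 0.0.0.0 accepts connections on all interfaces; port 9001 is the value I assume throughout this post:

```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>
```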

Step 10: Configure hdfs-site.xml for the data node
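On the data node, dfs.data.dir points at the directory where we mounted the LV; this is exactly how the LV’s capacity becomes the node’s contribution to HDFS:

```xml
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/data_node</value>
  </property>
</configuration>
```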

Step 11: Configure core-site.xml for the data node
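The data node’s core-site.xml points at the name node’s address and port. The IP below is a placeholder; substitute your name node’s actual IP:

```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.10:9001</value>
  </property>
</configuration>
```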

Step 12: Format the name node
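Formatting is a one-time initialization of the name node’s metadata directory (Hadoop 1.x command style):

```shell
# Initialize the name node's metadata directory (one-time operation)
hadoop namenode -format
```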

Step 13: Start the name node
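The name node daemon is started with the Hadoop 1.x daemon script:

```shell
# Start the name node daemon
hadoop-daemon.sh start namenode

# jps should now list a NameNode process
jps
```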

Step 14: Start the data node
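The same daemon script, run on the data node machine, starts the data node service:

```shell
# Start the data node daemon on the data node machine
hadoop-daemon.sh start datanode

# jps should now list a DataNode process
jps
```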

Step 15: Check the cluster report to verify that the storage contributed by the data node equals the size of the LV we created
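The HDFS admin report lists every live data node together with its configured capacity, which should be close to the 12 GiB LV (minus ext4 filesystem overhead):

```shell
# Show cluster capacity and per-data-node configured storage
hadoop dfsadmin -report
```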

You can also perform the LVM part with the Python script I have provided. In the steps above I did not use the script, so as to demonstrate how the LVM setup is created with individual commands.

The script that automates the LVM setup is given below:

#!/usr/bin/python3
# Runs the LVM commands on the data node over SSH (requires sshpass on this machine).
import subprocess as sp
import getpass as gp

ip_dn = input("Enter the IP of the system which you want to make the data node: ")
ip_nn = input("Enter the IP of the system which you want to make the name node: ")
p_dn = gp.getpass("Enter the passwd of the data node: ")
p_nn = gp.getpass("Enter the passwd of the name node: ")  # kept for later Hadoop setup

print("Both of the HDDs must already be attached to the data node")
hdd1 = input("Enter the first HDD name which you want to add in the VG: ")
hdd2 = input("Enter the second HDD name which you want to add in the VG: ")

# Common sshpass/ssh prefix for running commands on the data node as root
run_dn = "sshpass -p " + p_dn + " ssh root@" + ip_dn + " "

create_pv1 = run_dn + "pvcreate " + hdd1
create_pv2 = run_dn + "pvcreate " + hdd2
create_vg = run_dn + "vgcreate dynamic_storage " + hdd1 + " " + hdd2
create_lv = run_dn + "lvcreate --size 12G --name hadoop_dn dynamic_storage"
format_lv = run_dn + "mkfs.ext4 /dev/dynamic_storage/hadoop_dn"
create_dir = run_dn + "mkdir /data_node"  # must match the mount point below
mnt_dir = run_dn + "mount /dev/dynamic_storage/hadoop_dn /data_node/"
show_mnt = run_dn + "df -h"

sp.getoutput(create_pv1)
sp.getoutput(create_pv2)
sp.getoutput(create_vg)
sp.getoutput(create_lv)
sp.getoutput(format_lv)
sp.getoutput(create_dir)
sp.getoutput(mnt_dir)
print(sp.getoutput(show_mnt))  # show the mounted LV to confirm success

GitHub URL:-
