Integrate LVM with Hadoop to Achieve Dynamic Storage
What is a Physical Volume?
A physical volume (PV) is a whole disk or a disk partition that has been initialized for use by LVM. It is the lowest layer of the LVM stack: because a physical volume can be any portion of one or more disks, you must tell LVM exactly which devices to use when creating one.
What is a Volume Group?
A volume group (VG) is the central unit of the Logical Volume Manager (LVM) architecture. It is what we create when we combine multiple physical volumes into a single storage pool, whose capacity equals the combined capacity of the underlying physical devices.
What is LVM?
In Linux, Logical Volume Manager (LVM) is a device mapper framework that provides logical volume management for the Linux kernel. Most modern Linux distributions are LVM-aware to the point of being able to have their root file systems on a logical volume.
Here we use the concepts of LVM in a Hadoop cluster to provide dynamic storage to the data nodes.
Steps to be followed:
Step 1: Attach two HDDs to the system that you want to make the data node
Step 2: Check the hard drive names, which we need in order to create the PVs
Here we have two new drives, /dev/nvme0n2 and /dev/nvme0n3
Step 3: Create a PV on each drive
Step 4: Create a volume group using both PVs
Step 5: Create an LV (12 GiB in my case)
Step 6: Display the LV to check that it was created
Step 7: Format the LV and mount it on the "/data_node" directory to provide the LV's storage to the data node
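The LVM part of Steps 3–7 can be sketched in Python, in the same spirit as the script further below. This is a sketch under the assumptions of the walkthrough (drive names, the dynamic_storage VG, the 12 GiB hadoop_dn LV); the helpers lvm_commands and run_cmds are my own names, and with dry_run=True the commands are only printed, since actually running them requires root on the data node:

```python
# Sketch of the LVM commands from Steps 3-7, assembled in Python.
# Device, VG, and LV names match the walkthrough above.
import subprocess

def lvm_commands(hdd1="/dev/nvme0n2", hdd2="/dev/nvme0n3",
                 vg="dynamic_storage", lv="hadoop_dn", size="12G",
                 mount_point="/data_node"):
    lv_path = f"/dev/{vg}/{lv}"
    return [
        ["pvcreate", hdd1],                              # Step 3: physical volumes
        ["pvcreate", hdd2],
        ["vgcreate", vg, hdd1, hdd2],                    # Step 4: volume group
        ["lvcreate", "--size", size, "--name", lv, vg],  # Step 5: logical volume
        ["lvdisplay", f"{vg}/{lv}"],                     # Step 6: verify the LV
        ["mkfs.ext4", lv_path],                          # format before mounting
        ["mkdir", "-p", mount_point],
        ["mount", lv_path, mount_point],                 # Step 7: mount for the data node
    ]

def run_cmds(cmds, dry_run=True):
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))     # just show what would run
        else:
            subprocess.run(cmd, check=True)  # needs root on the data node

run_cmds(lvm_commands())
```

Keeping the commands as argument lists (rather than one shell string) avoids quoting problems if a device or mount path ever contains spaces.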
Step 8: Configure hdfs-site.xml for the name node
Step 9: Configure core-site.xml for the name node
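For reference, on Hadoop 1.x the name-node side of Steps 8–9 looks roughly like the fragments below. The property names are the standard Hadoop 1.x ones, but the /name_node directory and port 9001 are placeholder assumptions; adjust them to your setup:

```
<!-- hdfs-site.xml (name node) -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/name_node</value>
  </property>
</configuration>

<!-- core-site.xml (name node) -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>
```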
Step 10: Configure hdfs-site.xml for the data node
Step 11: Configure core-site.xml for the data node
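On the data-node side, dfs.data.dir points at the /data_node directory where we mounted the LV, which is what makes the LV's capacity show up as the data node's storage. Again a Hadoop 1.x sketch; NAME_NODE_IP and the port are placeholders:

```
<!-- hdfs-site.xml (data node) -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/data_node</value>
  </property>
</configuration>

<!-- core-site.xml (data node) -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://NAME_NODE_IP:9001</value>
  </property>
</configuration>
```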
Step 12: Format the name node
Step 13: Start the name node
Step 14: Start the data node
Step 15: Check the report to verify that the storage contributed by the data node equals the size of the LV we created
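The check in Step 15 can also be automated by parsing the output of `hadoop dfsadmin -report`. A minimal sketch: configured_capacity_bytes is a hypothetical helper of mine, and the sample report text below is illustrative (a real report comes from running the command on the cluster):

```python
# Sketch: extract the configured capacity from `hadoop dfsadmin -report`
# output and compare it against the ~12 GiB LV created above.
import re

def configured_capacity_bytes(report: str) -> int:
    """Return the cluster-wide 'Configured Capacity' in bytes."""
    m = re.search(r"Configured Capacity:\s*(\d+)", report)
    if m is None:
        raise ValueError("no 'Configured Capacity' line found")
    return int(m.group(1))

# Illustrative report snippet; a real one comes from:
#   subprocess.getoutput("hadoop dfsadmin -report")
sample = """Configured Capacity: 12622168064 (11.76 GB)
Present Capacity: 12003528704 (11.18 GB)
DFS Remaining: 12003487744 (11.18 GB)
Datanodes available: 1 (1 total, 0 dead)
"""

cap = configured_capacity_bytes(sample)
print(round(cap / 2**30, 2), "GiB")  # roughly the 12 GiB LV minus ext4 overhead
```

Note that the reported capacity is a little below 12 GiB, because the ext4 filesystem reserves some space for metadata.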
You can perform the LVM part with the Python script I have provided. In the steps above I did not use the script, in order to demonstrate how we can create the LVM setup with individual commands.
The script to automate the LVM setup is given below:
#!/usr/bin/python3
import subprocess as sp
import getpass as gp

ip_dn = input("Enter the IP of the system which you want to make the data node :- ")
ip_nn = input("Enter the IP of the system which you want to make the name node :- ")
p_dn = gp.getpass("Enter the passwd of Data Node :- ")
p_nn = gp.getpass("Enter the passwd of Name Node :- ")
# ip_nn and p_nn are collected for the Hadoop part; the LVM steps below only use the data node

print("It is compulsory to attach both of the HDDs to the data node first")
hdd1 = input("Enter the first HDD name which you want to add in the VG :- ")
hdd2 = input("Enter the second HDD name which you want to add in the VG :- ")

# Build the LVM commands to run on the data node over SSH
create_pv1 = "sshpass -p " + p_dn + " ssh root@" + ip_dn + " pvcreate " + hdd1
create_pv2 = "sshpass -p " + p_dn + " ssh root@" + ip_dn + " pvcreate " + hdd2
create_vg = "sshpass -p " + p_dn + " ssh root@" + ip_dn + " vgcreate dynamic_storage " + hdd1 + " " + hdd2
create_lv = "sshpass -p " + p_dn + " ssh root@" + ip_dn + " lvcreate --size 12G --name hadoop_dn dynamic_storage"
format_lv = "sshpass -p " + p_dn + " ssh root@" + ip_dn + " mkfs.ext4 /dev/dynamic_storage/hadoop_dn"
create_dir = "sshpass -p " + p_dn + " ssh root@" + ip_dn + " mkdir /data_node"
mnt_dir = "sshpass -p " + p_dn + " ssh root@" + ip_dn + " mount /dev/dynamic_storage/hadoop_dn /data_node/"
show_mnt = "sshpass -p " + p_dn + " ssh root@" + ip_dn + " df -h"

sp.getoutput(create_pv1)
sp.getoutput(create_pv2)
sp.getoutput(create_vg)
sp.getoutput(create_lv)
sp.getoutput(format_lv)
sp.getoutput(create_dir)
sp.getoutput(mnt_dir)
print(sp.getoutput(show_mnt))
GitHub URL:-