Hadoop uses the concept of parallelism to upload the split data while addressing the Velocity problem.
TASK-DESCRIPTION: -
🔷 According to popular articles, Hadoop uses the concept of parallelism to upload the split data while addressing the Velocity problem.
👉🏻 Research with your team and prove or disprove this statement with proper evidence
✴️Hint: tcpdump
>>tcpdump is one of the most powerful and widely used command-line packet analyzer tools. It is used to capture or filter TCP/IP packets that are received or transmitted over a network on a specific interface, and it also gives us the option to save captured packets to a file for later analysis.
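For reference, a minimal sketch of the tcpdump invocations used in this task (the interface name eth0 is just a placeholder for your cluster's network interface):

# print packets seen on the interface
tcpdump -i eth0

# save the captured packets to a file for later analysis
tcpdump -i eth0 -w capture.pcap

# read a saved capture back
tcpdump -r capture.pcap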
For this task, I created a Hadoop cluster and traced how the packets flow using tcpdump.

Step 1: - We upload the data from any client, and then we can observe how the packets are getting transferred.
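A sketch of this step, assuming a file named bigfile.txt and an HDFS path /user/test/ (both placeholders), and assuming the DataNodes use the default data-transfer port 50010 (the Hadoop 1.x/2.x default; Hadoop 3.x uses 9866 by default):

# on each DataNode: watch the HDFS data-transfer port
tcpdump -i eth0 -nn port 50010

# on the client: upload a file into HDFS
hadoop fs -put bigfile.txt /user/test/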
Step 2: - We can also read the file back to observe how the data is read from the Hadoop cluster.
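Continuing the sketch with the same placeholder path:

# keep tcpdump running on the DataNodes, then read the file from the client
hadoop fs -cat /user/test/bigfile.txt > /dev/null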
Conclusion: - I found that the client uploads the data only to the first DataNode; the remaining replicas are created by the DataNodes themselves. While the client is still streaming data to the first DataNode, the first DataNode simultaneously forwards a replica to the second DataNode, and the second DataNode forwards a replica to the third DataNode.
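One way to corroborate this from the NameNode's point of view is to list where the replicas of the uploaded file ended up (same placeholder path as above):

# show the blocks of the file and the DataNodes holding each replica
hdfs fsck /user/test/bigfile.txt -files -blocks -locations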
I also noticed that the first DataNode keeps pinging the NameNode, so the NameNode knows there is a live node to which data can be transferred, while the second DataNode keeps pinging the first and the third keeps pinging the second, marking each upstream node as a live DataNode; this matches the acknowledgements flowing back up the HDFS write pipeline.
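A quick way to see this heartbeat traffic in the same sketch is to filter on the NameNode's RPC port, which depends on your fs.defaultFS setting (commonly 8020 or 9000; 9000 is assumed here):

# on a DataNode: heartbeat/RPC traffic toward the NameNode
tcpdump -i eth0 -nn dst port 9000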