Hadoop uses the concept of parallelism to upload the split data while addressing the Velocity problem.
TASK-DESCRIPTION: -
🔷 According to popular articles, Hadoop uses the concept of parallelism to upload the split data while addressing the Velocity problem.
👉🏻 Research with your team and prove or disprove this statement with proper evidence
✴️Hint: tcpdump
>>tcpdump is one of the most powerful and widely used command-line packet analyzer tools. It is used to capture or filter TCP/IP packets that are received or transmitted over a network on a specific interface, and it also gives us the option to save captured packets to a file for later analysis.
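For reference, a minimal sketch of the tcpdump invocations used in this task (the interface name eth0 is just a placeholder for your cluster's network interface):

# print packets seen on the interface
tcpdump -i eth0

# save the captured packets to a file for later analysis
tcpdump -i eth0 -w capture.pcap

# read a saved capture back
tcpdump -r capture.pcap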
For this task, I created a Hadoop cluster and traced how the packets flow using tcpdump.

Step 1: - We upload the data from any client, and then we can observe how the packets are getting transferred.
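A sketch of this step, assuming a file named bigfile.txt and an HDFS path /user/test/ (both placeholders), and assuming the DataNodes use the default data-transfer port 50010 (the Hadoop 1.x/2.x default; Hadoop 3.x uses 9866 by default):

# on each DataNode: watch the HDFS data-transfer port
tcpdump -i eth0 -nn port 50010

# on the client: upload a file into HDFS
hadoop fs -put bigfile.txt /user/test/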
Step 2: - We can also read the file back to observe how the data is read from the Hadoop cluster.
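Continuing the sketch with the same placeholder path:

# keep tcpdump running on the DataNodes, then read the file from the client
hadoop fs -cat /user/test/bigfile.txt > /dev/null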
Conclusion: - I found that the client uploads the data only to the first DataNode; the remaining replicas are created by the DataNodes themselves. While the client is still streaming data to the first DataNode, the first DataNode simultaneously forwards a replica to the second DataNode, and the second DataNode forwards a replica to the third DataNode.
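One way to corroborate this from the NameNode's point of view is to list where the replicas of the uploaded file ended up (same placeholder path as above):

# show the blocks of the file and the DataNodes holding each replica
hdfs fsck /user/test/bigfile.txt -files -blocks -locations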
I also noticed that the first DataNode keeps pinging the NameNode, so the NameNode knows there is a live node to which data can be transferred, while the second DataNode keeps pinging the first and the third keeps pinging the second, marking each upstream node as a live DataNode; this matches the acknowledgements flowing back up the HDFS write pipeline.
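A quick way to see this heartbeat traffic in the same sketch is to filter on the NameNode's RPC port, which depends on your fs.defaultFS setting (commonly 8020 or 9000; 9000 is assumed here):

# on a DataNode: heartbeat/RPC traffic toward the NameNode
tcpdump -i eth0 -nn dst port 9000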