Enabling Multi-pipeline Data Transfer in HDFS for Big Data Applications

Date and Time: 2015 April 14 @ 4:00pm
Location: FL2-1022 Large Auditorium
Speaker: Liqiang Wang

Authors: Liqiang Wang, Hong Zhang (University of Wyoming), Hai Huang (IBM T.J. Watson Research Center)

The current HDFS in Hadoop is inefficient at uploading data files from a client’s local file system, especially when the storage cluster is configured to keep multiple replicas of each block. The root cause is HDFS’s synchronous pipeline design. We introduce an improved HDFS design called SMARTH, which uses asynchronous multi-pipeline data transfers instead of a single-pipeline, stop-and-wait mechanism. SMARTH records the actual transfer speed of data blocks and sends this information to the namenode along with periodic heartbeat messages. The namenode continuously tracks this information and ranks datanodes by their past performance. When a client initiates an upload request, the namenode sends it a list of “high-performance” datanodes that it expects to yield the highest throughput for that client. By choosing higher-performance datanodes relative to each client and by taking advantage of the multi-pipeline design, SMARTH significantly improves the performance of data write operations compared to HDFS in our experiments. Specifically, SMARTH improves data-transfer throughput by 27-245% in a heterogeneous virtual cluster on Amazon EC2.
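The namenode-side ranking described in the abstract can be illustrated with a small sketch. This is not SMARTH’s implementation; it is a minimal Python model under stated assumptions. The class `ThroughputTracker`, its methods, the heartbeat sample format `(client_id, bytes_written, seconds)`, and the use of an exponentially weighted moving average (EWMA) to smooth reported speeds are all hypothetical choices made for illustration. Estimates are kept per client so that the ranking is “relative to each client”, as the abstract describes.

```python
from collections import defaultdict


class ThroughputTracker:
    """Hypothetical namenode-side table of per-client datanode throughput.

    Assumptions (illustrative only, not taken from SMARTH):
      * each heartbeat carries (client_id, bytes_written, seconds) samples
        measured by the reporting datanode;
      * samples are smoothed with an exponentially weighted moving average.
    """

    def __init__(self, alpha=0.5, default_mbps=0.0):
        self.alpha = alpha                # weight given to the newest sample
        self.default_mbps = default_mbps  # estimate for nodes with no history
        # estimates[client_id][datanode_id] -> smoothed throughput in MB/s
        self.estimates = defaultdict(dict)

    def on_heartbeat(self, datanode_id, samples):
        """Fold the transfer-speed samples from one heartbeat into the table."""
        for client_id, nbytes, seconds in samples:
            if seconds <= 0:
                continue  # skip malformed samples
            mbps = nbytes / seconds / 1e6
            per_client = self.estimates[client_id]
            old = per_client.get(datanode_id)
            per_client[datanode_id] = (
                mbps if old is None else self.alpha * mbps + (1 - self.alpha) * old
            )

    def choose_targets(self, client_id, live_datanodes, count):
        """Return the `count` datanodes with the highest estimated throughput
        for this client; ties keep the order of `live_datanodes`."""
        per_client = self.estimates.get(client_id, {})
        ranked = sorted(
            live_datanodes,
            key=lambda dn: per_client.get(dn, self.default_mbps),
            reverse=True,
        )
        return ranked[:count]


# Usage: three datanodes report speeds for clientA; dn4 has no history yet.
tracker = ThroughputTracker(alpha=0.5)
tracker.on_heartbeat("dn1", [("clientA", 80_000_000, 1.0)])  # 80 MB/s
tracker.on_heartbeat("dn2", [("clientA", 20_000_000, 1.0)])  # 20 MB/s
tracker.on_heartbeat("dn3", [("clientA", 50_000_000, 1.0)])  # 50 MB/s
print(tracker.choose_targets("clientA", ["dn1", "dn2", "dn3", "dn4"], 3))
# ['dn1', 'dn3', 'dn2']
```

Smoothing with an EWMA lets a datanode’s rank recover gradually after a single slow or fast transfer instead of swinging on one sample; a client with no recorded history simply gets datanodes in the order supplied, which a real system would likely replace with its default placement policy.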

Speaker Description: 

Dr. Liqiang Wang is an associate professor in the Department of Computer Science at the University of Wyoming. He is currently on sabbatical leave, working as a visiting research scientist at IBM T.J. Watson Research Center. His research lies at the intersection of big-data computing and software analytics. His work applies program analysis techniques to improve the correctness and resilience of data-intensive computing and to optimize its performance and scalability, especially on cloud, GPU, and multicore platforms. He received an NSF CAREER Award in 2011.

Attachment: SEA2015_Smarth-Wang.pdf (1.16 MB)
