DFS-Perf: A Benchmarking System with Scalability and Extensibility for Distributed File Systems

Background:

Distributed file systems (DFS) store large-scale data and play an important role in many Big Data applications. Because they are the foundation on which upper-layer distributed computing frameworks are built, DFS have become both widely used and highly diverse.

Significance:
Evaluating the performance of DFS can:

help users choose a suitable DFS for their applications (one benchmark on different DFS).
guide developers in improving a DFS itself (different benchmarks on one DFS).
reveal fundamental research issues in DFS (comparing different DFS).

DFS-Perf is, as far as we know, the first unified benchmarking framework for evaluating the performance of various DFS. It is designed for scalability and extensibility:

scale to multi-node, multi-process, and multi-thread testing modes.
easily add new workloads or plug in new underlying file systems.
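
The two properties above can be illustrated with a minimal sketch. It does not use the DFS-Perf API: every name in it (`FileSystemAdapter`, `InMemoryFS`, `write_read_task`, `run_benchmark`) is hypothetical and chosen only for illustration. It assumes that an underlying file system is plugged in by implementing a small interface, that a workload is a function written against that interface, and that multi-thread mode means running the same workload concurrently in several threads.

```python
import time
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from threading import Lock


class FileSystemAdapter(ABC):
    """Hypothetical plug-in point: one subclass per underlying file system."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...


class InMemoryFS(FileSystemAdapter):
    """Stand-in backend so the sketch runs without a real DFS."""

    def __init__(self) -> None:
        self._files = {}
        self._lock = Lock()

    def write(self, path: str, data: bytes) -> None:
        with self._lock:
            self._files[path] = data

    def read(self, path: str) -> bytes:
        with self._lock:
            return self._files[path]


def write_read_task(fs: FileSystemAdapter, thread_id: int, files: int, size: int) -> int:
    """Hypothetical workload: write `files` files of `size` bytes, then read each back."""
    payload = bytes(size)
    total = 0
    for i in range(files):
        path = f"/bench/t{thread_id}/f{i}"
        fs.write(path, payload)
        total += len(fs.read(path))
    return total


def run_benchmark(fs: FileSystemAdapter, threads: int = 4, files: int = 8, size: int = 1024) -> dict:
    """Run the workload in `threads` concurrent threads and report bytes read and elapsed time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(write_read_task, fs, t, files, size) for t in range(threads)]
        bytes_read = sum(f.result() for f in futures)
    elapsed = time.perf_counter() - start
    return {"bytes_read": bytes_read, "seconds": elapsed}


if __name__ == "__main__":
    print(run_benchmark(InMemoryFS()))
```

In this sketch, supporting another file system would mean adding one more `FileSystemAdapter` subclass, and adding a workload would mean writing one more task function; the runner itself stays unchanged.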

We have implemented a prototype of DFS-Perf. It includes several typical workloads and supports a number of popular DFS, such as Tachyon, HDFS, GlusterFS and GPFS.

For more information about the design of DFS-Perf and up-to-date documentation on many of our research ideas, check out our website: https://github.com/PasaLab/tachyon-perf.