DFS-Perf: A Benchmarking System with Scalability and Extensibility for Distributed File Systems.

Background:

Distributed file systems (DFS) store large-scale data and play an important role in many Big Data applications. DFS form the foundation of higher-level distributed computing frameworks, which has made them both widely used and highly diverse.

Significance:
Evaluating the performance of DFS can:
  • help users choose a suitable DFS for their applications (one benchmark run on different DFS).
  • guide developers in improving the DFS itself (different benchmarks run on one DFS).
  • reveal fundamental research issues in DFS (by comparing different DFS).
DFS-Perf is, to the best of our knowledge, the first unified benchmarking framework for evaluating the performance of various DFS. It offers excellent scalability and extensibility:
  • Scalability: it supports multi-node, multi-process, and multi-thread testing modes.
  • Extensibility: new workloads can be added and new underlying file systems plugged in easily.
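To make the extensibility and multi-thread ideas concrete, here is a minimal sketch of how such a framework could separate file-system plug-ins from workloads. It is not DFS-Perf's actual API: the names FileSystem, InMemoryFileSystem, Workload, WriteThenRead, and run_benchmark are hypothetical, the in-memory backend merely stands in for a real DFS client, and only the multi-thread mode is shown (multi-process and multi-node modes are omitted).

```python
import abc
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FileSystem(abc.ABC):
    """Hypothetical plug-in interface: one subclass per underlying DFS."""

    @abc.abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abc.abstractmethod
    def read(self, path: str) -> bytes: ...


class InMemoryFileSystem(FileSystem):
    """Illustrative stand-in backend; a real plug-in would wrap a DFS client."""

    def __init__(self) -> None:
        self._files = {}  # path -> file contents
        self._lock = threading.Lock()  # workers write concurrently

    def write(self, path: str, data: bytes) -> None:
        with self._lock:
            self._files[path] = data

    def read(self, path: str) -> bytes:
        with self._lock:
            return self._files[path]


class Workload(abc.ABC):
    """Hypothetical workload interface, independent of any file system."""

    @abc.abstractmethod
    def run(self, fs: FileSystem, worker_id: int) -> int:
        """Run on one worker and return the number of bytes read back."""


class WriteThenRead(Workload):
    """Each worker writes its own files, then reads each one back."""

    def __init__(self, files_per_worker: int, file_size: int) -> None:
        self.files_per_worker = files_per_worker
        self.file_size = file_size

    def run(self, fs: FileSystem, worker_id: int) -> int:
        payload = b"x" * self.file_size
        total = 0
        for i in range(self.files_per_worker):
            path = f"/bench/worker-{worker_id}/file-{i}"
            fs.write(path, payload)
            total += len(fs.read(path))
        return total


def run_benchmark(fs: FileSystem, workload: Workload, num_threads: int):
    """Run the workload on num_threads threads; return (total bytes, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        total = sum(pool.map(lambda wid: workload.run(fs, wid), range(num_threads)))
    return total, time.perf_counter() - start


if __name__ == "__main__":
    total, elapsed = run_benchmark(
        InMemoryFileSystem(), WriteThenRead(files_per_worker=10, file_size=1024), num_threads=4
    )
    print(f"{total} bytes in {elapsed:.4f} s")
```

In this design, supporting another DFS means adding one FileSystem subclass, and adding a benchmark means adding one Workload subclass; neither change touches the runner. The same workload can therefore be run against different backends, or different workloads against one backend.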

We have implemented a prototype of DFS-Perf. It includes several typical workloads and supports popular DFS such as Tachyon, HDFS, GlusterFS, and GPFS.

For more information about the design of DFS-Perf and up-to-date documentation on many of our research ideas, see our repository: https://github.com/PasaLab/tachyon-perf.