Running DFS-Perf on a Cluster
This guide describes how to get DFS-Perf running on a cluster.
Prerequisites
The prerequisites for this part contains all the prerequisites of Running DFS-Perf Locally. In addition, to run on a cluster, you need a Distributed File System from GlusterFS, GPFS, HDFS and Alluxio.
Configurations
Prepare the binary distribution of DFS-Perf:
$ tar xvfz dfs-perf-0.1-bin.tar.gz
$ cd dfs-perf-0.1
Before running DFS-Perf, requisite environment variables must be specified in conf/dfs-perf-env.sh
$ cp conf/dfs-perf-env.sh.template conf/dfs-perf-env.sh
To run on a cluster, these variables in conf/dfs-perf-env.sh
should be set as follows:
export JAVA={where.your.java}
# Set to the distributed file system, here take hdfs as example
export DFS_PERF_DFS_ADRESS="hdfs://master:9000"
# Set master address
DFS_PERF_MASTER_HOSTNAME="master"
For different DFS, some specific variables need to be set. You can see the details in Running DFS-Perf on GlusterFS, Running DFS-Perf on GPFS, Running DFS-Perf on HDFS and Running DFS-Perf on Alluxio.
Then, set the slaves in conf/slaves
. Each line represents a slave (actually a process).
Example
Now you can run DFS-Perf on DFS. For example, run the metadata workload.
$ bin/dfs-perf-clean
$ bin/dfs-perf Metadata
$ bin/dfs-perf-collect Metadata
See more examples on Examples.