Running Octopus on a Cluster
This guide describes how to get Octopus running on a cluster.
Prerequisites
The prerequisites for this part include all the prerequisites of Running Octopus Locally. In addition, to run on a cluster you need a distributed file system, e.g. HDFS or Tachyon, and at least one computing framework, e.g. Spark, Hadoop MapReduce, or MPI.
One more step: if you want to use the apply method, you need the Rserve package on each of your cluster nodes. You can install Rserve with the following steps (note that this must be executed on every node; a quick connectivity check follows the steps):
1. Start an R shell with root permission
2. > install.packages("Rserve")
3. Quit the R shell and drop root
4. $ R CMD Rserve
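To confirm that Rserve is up and accepting connections on a node, you can run a quick check from a fresh R session. This is a minimal sketch, assuming the RSclient package (install.packages("RSclient")) and Rserve's default port 6311:
> library(RSclient)
> conn <- RS.connect(host = "localhost", port = 6311)  # default Rserve port
> RS.eval(conn, 1 + 1)  # evaluates remotely; should return 2
> RS.close(conn)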
Configurations
Prepare the binary distribution of Octopus:
$ tar xvfz octopus-0.1-bin.tar.gz
$ cd octopus-0.1
Before installing Octopus, the requisite environment variables must be specified in conf/octopus-env.R. To run on a cluster, set the variables in conf/octopus-env.R as follows:
# If you have Spark as a computing framework, set this to TRUE
OCTOPUS_SPARK_START=TRUE
# Set the home directory path of Octopus
OCTOPUS_HOME="{where.your.octopus-0.1}"
# To run on a cluster, set this to a distributed file system, e.g. hdfs or tachyon
OCTOPUS_UNDERFS_ADDRESS="hdfs://ip:port"
# Set the path of Octopus's data folder in the underlying file system
OCTOPUS_WAREHOUSE="/tmp/octopus_warehouse"
Moreover, set the variables that correspond to your computing framework. Configurations for different computing frameworks are distinguished by their prefixes, e.g. OCTOPUS_SPARK_, OCTOPUS_HADOOP_, or OCTOPUS_MPI_.
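For instance, a Spark deployment might carry entries like the ones below. Note that the variable names after the prefix here are illustrative assumptions, not confirmed names; consult the comments in conf/octopus-env.R for the exact variables your version expects:
# Hypothetical examples; check conf/octopus-env.R for the actual variable names
OCTOPUS_SPARK_MASTER="spark://master:7077"
OCTOPUS_SPARK_HOME="{where.your.spark}"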
Then, the following command should be added so that R can load the configurations. It can be added to Rprofile.site in the installation path of R, e.g. /usr/lib64/R/etc/Rprofile.site, or to .Rprofile in the user's home directory, e.g. ~/.Rprofile:
source("{where.your.octopus-0.1}/conf/octopus-env.R")
Now, you can install Octopus by running:
$ R CMD INSTALL {where.your.octopus-0.1}/R/pkg
Before using Octopus, you can format your Octopus Warehouse by running:
$ sbin/octopus-format.sh
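If HDFS is your underlying file system, you can verify that the warehouse directory is in place after formatting, assuming the standard HDFS shell is on your path:
$ hdfs dfs -ls /tmp/octopus_warehouse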
Example
Now you can write a short R script to verify that the installed OctMatrix package works well.
require(OctMatrix)
# Here we take Spark and HDFS as an example
engineType <- "Spark"
outputPath <- "hdfs://master:9000/tmp/octopus-test-c"
# Build two 2x2 distributed matrices from the vectors 1:4 and 5:8
a <- OctMatrix(1:4, 2, 2, engineType)
b <- OctMatrix(5:8, 2, 2, engineType)
# Element-wise addition runs on the chosen engine
c <- a + b
c
# Write the result matrix to the target path on HDFS
WriteOctMatrix(c, outputPath)
You should see the correct result of a + b, and the result c is written to the target path.
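With Rserve running on each node (see Prerequisites), you can also exercise the apply method. The sketch below assumes that apply on an OctMatrix mirrors base R's apply(X, MARGIN, FUN) signature; adjust it if your version's documentation differs:
require(OctMatrix)
a <- OctMatrix(1:4, 2, 2, "Spark")
# Assumed base-R-style signature: apply the function over each column (MARGIN = 2)
d <- apply(a, 2, function(x) x * 2)
d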