Running Octopus on Spark
To run Octopus on Spark, you must install Octopus in cluster mode.
Configurations
Before installing the OctMatrix R package, you should set these Spark-related variables in conf/octopus-env.R:
# Make sure Spark has been started
OCTOPUS_SPARK_START=TRUE
# Set the master URL for the Spark cluster
OCTOPUS_SPARK_MASTER="spark://ip:port"
# Set the maximum number of CPU cores the application may request from the cluster. If not set, the default is spark.deploy.defaultCores on Spark's standalone cluster manager.
OCTOPUS_SPARK_CORES_MAX="4"
# Set the amount of memory to use per executor process, in the same format as JVM memory strings (e.g. 512m, 2g).
OCTOPUS_SPARK_EXECUTOR_MEMORY="8g"
# Set the default number of tasks for the shuffle process
OCTOPUS_SPARK_DEFAULT_PARALLELISM=4
Usage
When using the OctMatrix package in an R script, set the engineType argument to "Spark" and the matrix will be handled by Spark.
a <- OctMatrix(data, nrow, ncol, "Spark", byrow)
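For example, a minimal end-to-end sketch of creating and combining Spark-backed matrices. The overloaded operators shown here (`+`, `%*%`) are assumptions based on standard R matrix semantics; consult the OctMatrix reference for the exact operator set it supports.

```r
library(OctMatrix)

# Build two 4x4 OctMatrix objects backed by Spark from plain R vectors.
# Passing "Spark" as engineType routes subsequent operations to the
# Spark cluster configured in conf/octopus-env.R.
data <- runif(16)
a <- OctMatrix(data, nrow = 4, ncol = 4, "Spark", byrow = TRUE)
b <- OctMatrix(data, nrow = 4, ncol = 4, "Spark", byrow = TRUE)

# Assumed: OctMatrix overloads common matrix operators, so these run as
# distributed Spark jobs rather than in local R memory.
sum_ab  <- a + b      # element-wise addition
prod_ab <- a %*% b    # matrix multiplication
```

Because the engine type is chosen per matrix at construction time, the same script can be retargeted to another backend by changing only the engineType argument.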