Octopus Overview

Octopus is a high-level, unified programming model and platform for big data analytics and mining. It works transparently on top of various distributed computing frameworks, so data analysts and big data application programmers can easily design and implement machine learning and data mining algorithms for big data analytics. Its R package provides easy-to-use, scalable matrix operations from R and seamlessly executes the computation either on single-node R or on distributed computing frameworks such as Spark, Hadoop, and MPI. Users can run R scripts or shell commands interactively on a cluster without knowing distributed programming models such as Hadoop MapReduce or Spark RDDs.
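
To make the idea of a unified model over interchangeable execution engines concrete, here is a minimal conceptual sketch. It is written in Python purely for illustration and is not the Octopus API (which is exposed through R). Every name in it (Backend, LocalBackend, PartitionedBackend, Matrix, analysis) is hypothetical, and the "partitioned" backend only simulates distributed execution by processing row blocks independently in one process.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical execution engine interface; not part of Octopus."""

    @abstractmethod
    def matmul(self, a, b):
        """Multiply two matrices given as lists of rows."""


def _multiply_rows(rows, b):
    # Multiply a block of rows by the full right-hand matrix.
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in rows]


class LocalBackend(Backend):
    """Computes the whole product in one step, like single-node execution."""

    def matmul(self, a, b):
        return _multiply_rows(a, b)


class PartitionedBackend(Backend):
    """Stands in for a cluster engine by handling row blocks independently."""

    def __init__(self, block_size=1):
        self.block_size = block_size

    def matmul(self, a, b):
        result = []
        for start in range(0, len(a), self.block_size):
            block = a[start:start + self.block_size]
            result.extend(_multiply_rows(block, b))
        return result


class Matrix:
    """User-facing matrix that delegates computation to its backend."""

    def __init__(self, rows, backend):
        self.rows = rows
        self.backend = backend

    def __matmul__(self, other):
        return Matrix(self.backend.matmul(self.rows, other.rows), self.backend)


def analysis(backend):
    # The "user code": identical no matter which backend runs it.
    a = Matrix([[1, 2], [3, 4]], backend)
    b = Matrix([[5, 6], [7, 8]], backend)
    return (a @ b).rows
```

Calling analysis(LocalBackend()) and analysis(PartitionedBackend(block_size=1)) runs the same user-level code on both engines and yields the same product. That is the property the overview describes, although a real platform would also have to handle data distribution, scheduling, and fault tolerance.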

Gitbucket Repository | Releases and Downloads | User Documentation | JIRA | User Mailing List | Developer Documentation

Current Features

User Documentation

Running Octopus Locally: Get Octopus up and running on a single node for a quick trial in about 2 minutes.

Running Octopus on a Cluster: Get Octopus up and running on your own cluster.

Running Octopus on Spark: Get Octopus running on Spark.

Running Octopus on Hadoop: Get Octopus running on Hadoop MapReduce.

Running Octopus on MPI: Get Octopus running on MPI.

Configuration Settings: How to configure Octopus.

Octopus User API (R)

Octopus Machine Learning Library (R)

Octopus Presentations:

Support or Contact

If you are interested in trying out Octopus on your cluster, please contact Yihua Huang and Rong Gu. Users are welcome to join our mailing list to ask questions and make suggestions. We use JIRA to track development and issues.

Acknowledgement

Octopus is a research project started at the Nanjing University PASA Lab and currently led by Yihua Huang and Rong Gu. This research and development is funded in part by the Jiangsu Province Industry Support Program (BE2014131) and China NSF Grants (No. 61223003). We would also like to thank our initial project contributors in the PASA Lab.