Marlin: Efficient Large-Scale Distributed Matrix Computation library with Spark

Background:

Matrix computation is the core of many massive data-intensive scientific applications such as large-scale numerical analysis, data mining, and computational physics. In the Big Data era, as the scale of the matrix grows, traditional single-node matrix computation can hardly cope with such large data and computation.

Significance:

Marlin builds on top of Spark, providing high-level matrix computation primitives for users to deal with large-scale distributed matrices. The performance of Marlin is far beyond than MapReduce implementations and even better than traditional methods based-on MPI which is not widely-used in big data era.

For more information about the design of Marlin and up-to-date documentation on many of our research ideas, check out our website：https://github.com/PasaLab/marlin.

System stack: