Marlin: Efficient Large-Scale Distributed Matrix Computation library with Spark

Background:

Matrix computation is the core of many massive data-intensive scientific applications such as large-scale numerical analysis, data mining, and computational physics. In the Big Data era, as the scale of the matrix grows, traditional single-node matrix computation can hardly cope with such large data and computation...More About Marlin→

Dolphin: A parallel Deep Learning algorithm(StackedAutoencoder) Based on Intel Xeon Phi

Background:

Deep learning is a unsupervised feature extraction algorithm. The training process of it usually contains large number matrix operations. It can be quite time-consuming as data size increases. Intel Xeon Phi is a many-core platform introduced by Intel which is suitable for vector operations. With the help Intel MKL math lib, The platform can deal with the training process easily...More About Dolphin→

Cichlid: Efficient Large Scale RDFS/OWL Reasoning with Spark

Introduction:

Cichlid is a distributed RDFS & OWL reasoning system based on Spark. Cichlid achieves higher efficiency and scalability than the existing large scale reasoning systems based the MapReduce framework or the P2P self-organizing networks. The major contributions and novelties of Cichlid are summarized as follows...More About Cichlid→

Perf: A Benchmarking System with Scalability and Extensibility for Distributed File Systems

Background:

Distributed file systems (DFS) store large-scale data and play an import role in various Big Data applications. DFS form the cornerstones of the upper distributed computing frameworks, which makes them become widely-used and diversified...More About DFS-Perf→

YARM: Efficient and Scalable RDFS Semantic Reasoning Engine Based on MapReduce

Primary goals:

Optimize the distributed parallel reasoning algorithm on MapReduce;Overcome the lack of scalability of the existing semantic reasoning engines;To improve the efficiency of reasoning...More About YARM→

HIBase: An Hierarchical Indexing System for Storing and Querying Big Data

Introduction:

HiBase is a hierarchical indexing mechanism and a prototype distributed data-storage system, which has hierarchical indexes for non-primary keys in tables and makes data querying more efficient. HiBase uses HBase as the lower data storage and creates a memory cache with Hot Score Algorithm for more efficient data transmission...More About HIBase→

YAFIM: A Parallel Frequent Itemset Mining Algorithm with Spark

Background:

The frequent itemset mining (FIM) is one of the most important techniques to extract knowledge from data in many real-world...More About YAFIM→