YARM: Efficient and Scalable RDFS Semantic Reasoning Engine Based on MapReduce.
Primary goals:
- Optimize the distributed parallel reasoning algorithm on MapReduce.
- Overcome the limited scalability of existing semantic reasoning engines.
- Improve the efficiency of reasoning.
YARM includes four major optimizations:
- It adopts a well-designed data partitioning schema and a corresponding reasoning algorithm to minimize the amount of data transferred among computing nodes.
- It optimizes the execution order of the reasoning rules to speed up computation.
- It removes the duplicates produced during the reasoning process efficiently, which avoids the need for extra MapReduce jobs dedicated to deduplication.
- It combines the optimizations above in a new parallel reasoning algorithm, designed and implemented on the Hadoop MapReduce framework.
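The optimizations listed above can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not YARM's actual algorithm: the summary does not describe the real partitioning schema or rule order. The sketch assumes that the (small) schema triples are available to every mapper, that instance triples are partitioned by subject, that the domain rule (rdfs2) is applied before the subclass rule (rdfs9), and that the reducer removes duplicates within the same job. The function names `superclasses`, `map_triple`, and `run_job` are hypothetical, and plain Python stands in for Hadoop.

```python
from collections import defaultdict

TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"


def superclasses(schema, cls):
    """Return all transitive superclasses of cls (rdfs:subClassOf closure)."""
    parents = defaultdict(set)
    for s, p, o in schema:
        if p == SUBCLASS:
            parents[s].add(o)
    seen, stack = set(), [cls]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def map_triple(triple, schema):
    """Map step (hypothetical): emit (subject, triple) pairs.

    Assumption: schema triples are replicated to every mapper, so only
    instance triples and their derivations need to be shuffled.
    """
    s, p, o = triple
    yield s, triple
    types = {o} if p == TYPE else set()
    # rdfs2 first: (p rdfs:domain C), (s p o) => (s rdf:type C)
    for ps, pp, po in schema:
        if pp == DOMAIN and ps == p:
            types.add(po)
    # rdfs9 second, so types derived by rdfs2 are expanded in the same pass:
    # (C rdfs:subClassOf D), (s rdf:type C) => (s rdf:type D)
    for c in list(types):
        types |= superclasses(schema, c)
    for c in types:
        yield s, (s, TYPE, c)


def run_job(instance_triples, schema):
    """Simulate one MapReduce job: shuffle by subject, deduplicate in reduce."""
    groups = defaultdict(set)  # one set per key acts as the reducer-side dedup
    for triple in instance_triples:
        for key, value in map_triple(triple, schema):
            groups[key].add(value)
    return [t for key in sorted(groups) for t in sorted(groups[key])]


if __name__ == "__main__":
    schema = [
        ("ex:Student", SUBCLASS, "ex:Person"),
        ("ex:enrolledIn", DOMAIN, "ex:Student"),
    ]
    data = [
        ("ex:alice", "ex:enrolledIn", "ex:cs101"),
        ("ex:alice", TYPE, "ex:Student"),
    ]
    for t in run_job(data, schema):
        print(t)
```

Because every derived `rdf:type` triple is keyed by its subject, identical triples derived from different inputs reach the same reducer and collapse there, so no separate deduplication job is needed in this sketch. Applying rdfs2 before rdfs9 means a single pass suffices for these two rules; a real engine would cover the full RDFS rule set.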
Experimental results on both real-world and synthetic datasets show that YARM is about 10 times faster than the latest reasoning engine and also achieves better scalability.