INTRODUCTION TO PASA LAB

China Cloud · Internet Forum

In recent years, big data has become one of the hottest topics in information technology and has developed rapidly around the world. It has attracted great attention and interest from industry, academia, and the governments of many countries, leading to an upsurge of research and development comparable to that of the Information Highway in the 1990s. Because of its value and impact, big data is regarded as the new "oil" and is expected to have an important impact on science, technology, and the economy in the near future.

The PASA Big Data Lab (PASA: Parallel Algorithms, Systems, and Applications) at Nanjing University is one of the earliest groups in China dedicated to research and teaching on big data. We entered the big data research area in 2009, when big data had not yet attracted much attention. Over the past 5 years, we have conducted comprehensive research on big data, including distributed big data storage and management, big data parallel computing models and systems, Hadoop/Spark performance improvement and function enhancement, parallel algorithm design for machine learning and data mining, large-scale semantic data analytics, big data computer architecture and cloud computing, large-scale web data mining, and big data applications. We have undertaken a number of research projects under national and provincial research programs and have received collaborative research grants from well-known industry partners such as Intel, Google, ZTE, and Baidu. We also conduct collaborative research and development with the UC Berkeley AMP Lab on Spark and Tachyon, and we are now contributors to Apache Spark and Tachyon. In recent years, we have published more than 30 journal and conference papers and 2 textbooks on big data. The recently published textbook "Understanding Big Data: Big Data Parallel Processing and Programming" has been listed by the Steering Committee of the Ministry of Education for the Teaching of Computer Majors as one of a series of textbooks for cultivating students' computer systems skills.

In 2009, we received funding from Google to create and offer the course "Big Data Parallel Processing with MapReduce" for graduate students in our department, making us one of the earliest groups in China to offer a big data course. Over the same 5 years, we have gradually built a large-scale big data processing cluster with more than 150 server nodes and about 1 PB of shared data storage to support the research and teaching in our department.