A Case Study of Accelerating Apache Spark with FPGA

Best Demo Award

Abstract

Apache Spark is an efficient distributed computing framework for big data processing. It supports in-memory computation of RDDs (Resilient Distributed Dataset) and provides a provision of reusability, fault tolerance, and real-time stream processing. However, the tasks in Spark framework are only performed on CPU. The low degree of parallelism and power inefficiency of CPU may restrict the performance and scalability of the cluster. In order to improve the performance and power dissipation of the data center, heterogeneous accelerators such as FPGA, GPU, MIC (Many Integrated Core) exhibit more efficient performance than the general-purpose processor in big data processing. In this work, we propose a framework to integrate FPGA accelerator into a Spark cluster. We use FPGA to accelerate the Spark tasks developed with Python, and in this way, the main computing load is performed on FPGA instead of CPU. We illustrate the performance of the FPGA based Spark framework with a case study of 2D-FFT algorithm acceleration. The results showed that FPGA based Spark implementation acquires 1.79x speedup than CPU implementation.

Publication
In 2018 IEEE International Conference On Trust, Security And Privacy In Computing And Communications
Click the Cite button above to demo the feature to enable visitors to import publication metadata into their reference management software.
Click the Slides button above to demo Academic's Markdown slides feature.

Supplementary notes can be added here, including code and math.

Related