Performance Comparison of Apache Spark MLlib

Sharma, Pallavi

dc.contributor.author	Sharma, Pallavi
dc.description.abstract	This study examines the performance of Apache Spark and its MLlib library. To this end, an Apache Spark cluster computing system was set up and five supervised machine learning algorithms (Naïve Bayes, Decision Tree, Random Forest, Support Vector Machine, and Logistic Regression) were investigated. Of the available cluster modes, the algorithms were run in two: Local mode and GPU Cluster mode. Classification accuracy, area under the ROC curve, and area under the PR curve were measured for each algorithm on three datasets. The algorithms were computed in parallel in both modes, and GPU Cluster mode completed faster than Local mode for every algorithm. These performance metrics, however, did not differ between the two modes, suggesting that parallel computation does not play a major role in determining them.	en_US
dc.publisher	North Dakota State University	en_US
dc.rights	NDSU Policy 190.6.2
dc.title	Performance Comparison of Apache Spark MLlib	en_US
dc.type	Master's paper	en_US
dc.date.accessioned	2018-09-20T21:42:51Z
dc.date.available	2018-09-20T21:42:51Z
dc.date.issued	2018
dc.identifier.uri	https://hdl.handle.net/10365/28864
dc.subject.lcsh	Spark (Electronic resource : Apache Software Foundation)
dc.subject.lcsh	Machine learning.
dc.subject.lcsh	Computer algorithms.
dc.subject.lcsh	Big data.
dc.rights.uri	https://www.ndsu.edu/fileadmin/policy/190.pdf
ndsu.degree	Master of Science (MS)	en_US
ndsu.college	Engineering	en_US
ndsu.department	Computer Science	en_US
ndsu.program	Computer Science	en_US
ndsu.advisor	Ludwig, Simone

Files in this item

Name: Performance Comparison of Apache ...
Size: 4.464Mb
Format: PDF
Description: Performance Comparison of Apache ...

This item appears in the following Collection(s)

Computer Science Masters Papers