Performance Comparison of Apache Spark MLlib

dc.contributor.author: Sharma, Pallavi
dc.date.accessioned: 2018-09-20T21:42:51Z
dc.date.available: 2018-09-20T21:42:51Z
dc.date.issued: 2018
dc.description.abstract: This study examines the performance of Apache Spark and its machine learning library, MLlib. An Apache Spark cluster computing environment was set up, and five supervised machine learning algorithms (Naïve Bayes, Decision Tree, Random Forest, Support Vector Machine, and Logistic Regression) were evaluated. Of the cluster modes available, the algorithms were run in two: Local mode and GPU Cluster mode. Classification accuracy, area under the ROC curve, and area under the precision-recall (PR) curve were measured for each algorithm on three datasets. The results show that the algorithms are computed in parallel in both modes, and that GPU Cluster mode completed faster than Local mode for every algorithm. However, these metrics did not differ between the two modes, suggesting that parallel computation does not play a major role in determining them. [en_US]
dc.identifier.uri: https://hdl.handle.net/10365/28864
dc.publisher: North Dakota State University [en_US]
dc.rights: NDSU Policy 190.6.2
dc.rights.uri: https://www.ndsu.edu/fileadmin/policy/190.pdf
dc.subject.lcsh: Spark (Electronic resource : Apache Software Foundation)
dc.subject.lcsh: Machine learning.
dc.subject.lcsh: Computer algorithms.
dc.subject.lcsh: Big data.
dc.title: Performance Comparison of Apache Spark MLlib [en_US]
dc.type: Master's paper [en_US]
ndsu.advisor: Ludwig, Simone
ndsu.college: Engineering [en_US]
ndsu.degree: Master of Science (MS) [en_US]
ndsu.department: Computer Science [en_US]
ndsu.program: Computer Science [en_US]

Files

Original bundle

Name: Performance Comparison of Apache Spark MLlib.pdf
Size: 4.46 MB
Format: Adobe Portable Document Format
Description: Performance Comparison of Apache Spark MLlib

License bundle

Name: license.txt
Size: 1.63 KB
Format: Item-specific license agreed to upon submission