
dc.contributor.author: Sharma, Pallavi
dc.description.abstract: This study attempts to understand the performance of Apache Spark and its MLlib platform. To this end, an Apache Spark cluster computing system was set up and five supervised machine learning algorithms (Naïve Bayes, Decision Tree, Random Forest, Support Vector Machine and Logistic Regression) were investigated. Of the available cluster modes, the algorithms were implemented on two: Local mode and GPU Cluster mode. Performance metrics such as classification accuracy, area under the ROC curve and area under the PR curve were evaluated for the algorithms on three datasets. The study concludes that the algorithms are computed in parallel in both modes, with GPU Cluster mode taking less time to complete than Local mode for all algorithms. However, these performance metrics did not differ between the two modes, suggesting that parallel computation does not play a major role in determining them. [en_US]
dc.publisher: North Dakota State University [en_US]
dc.rights: NDSU Policy 190.6.2
dc.title: Performance Comparison of Apache Spark MLlib [en_US]
dc.type: Master's paper [en_US]
dc.date.accessioned: 2018-09-20T21:42:51Z
dc.date.available: 2018-09-20T21:42:51Z
dc.date.issued: 2018
dc.identifier.uri: https://hdl.handle.net/10365/28864
dc.subject.lcsh: Spark (Electronic resource : Apache Software Foundation)
dc.subject.lcsh: Machine learning.
dc.subject.lcsh: Computer algorithms.
dc.subject.lcsh: Big data.
dc.rights.uri: https://www.ndsu.edu/fileadmin/policy/190.pdf
ndsu.degree: Master of Science (MS) [en_US]
ndsu.college: Engineering [en_US]
ndsu.department: Computer Science [en_US]
ndsu.program: Computer Science [en_US]
ndsu.advisor: Ludwig, Simone

