Performance Comparison of Apache Spark MLlib
dc.contributor.author | Sharma, Pallavi | |
dc.date.accessioned | 2018-09-20T21:42:51Z | |
dc.date.available | 2018-09-20T21:42:51Z | |
dc.date.issued | 2018 | |
dc.description.abstract | This study attempts to understand the performance of Apache Spark and its MLlib machine learning library. To this end, an Apache Spark cluster computing system was set up and five supervised machine learning algorithms (Naïve Bayes, Decision Tree, Random Forest, Support Vector Machine, and Logistic Regression) were investigated. Of the available cluster modes, the algorithms were run in two: Local mode and GPU Cluster mode. Performance metrics such as classification accuracy, area under the ROC curve, and area under the precision-recall (PR) curve were evaluated on three datasets. The results show that the algorithms are computed in parallel in both modes, with GPU Cluster mode completing faster than Local mode for all algorithms. However, these metrics did not differ between the two modes, suggesting that parallel computation does not play a major role in determining them. | en_US |
dc.identifier.uri | https://hdl.handle.net/10365/28864 | |
dc.publisher | North Dakota State University | en_US |
dc.rights | NDSU Policy 190.6.2 | |
dc.rights.uri | https://www.ndsu.edu/fileadmin/policy/190.pdf | |
dc.subject.lcsh | Spark (Electronic resource : Apache Software Foundation) | |
dc.subject.lcsh | Machine learning. | |
dc.subject.lcsh | Computer algorithms. | |
dc.subject.lcsh | Big data. | |
dc.title | Performance Comparison of Apache Spark MLlib | en_US |
dc.type | Master's paper | en_US |
ndsu.advisor | Ludwig, Simone | |
ndsu.college | Engineering | en_US |
ndsu.degree | Master of Science (MS) | en_US |
ndsu.department | Computer Science | en_US |
ndsu.program | Computer Science | en_US |
Files
Original bundle
- Name: Performance Comparison of Apache Spark MLlib.pdf
- Size: 4.46 MB
- Format: Adobe Portable Document Format
- Description: Performance Comparison of Apache Spark MLlib
License bundle
- Name: license.txt
- Size: 1.63 KB
- Format: Item-specific license agreed to upon submission