dc.contributor.author: Zheng, Songtao
dc.description.abstract: Machine learning algorithms can take advantage of the Hadoop distributed computing platform and the MapReduce programming model to process data in parallel. Researchers have investigated transforming many machine learning algorithms to the MapReduce paradigm in order to make use of the Hadoop Distributed File System (HDFS). The Naïve Bayes classifier is one of the supervised learning classification algorithms that can be programmed in the form of MapReduce. In our study, we build a Naïve Bayes MapReduce model and evaluate the classifier on five datasets based on prediction accuracy. A scalability analysis is also conducted to measure the speedup in data processing time as the number of nodes in the cluster increases. Results show that running the Naïve Bayes MapReduce model across multiple nodes can save a considerable amount of time compared with running the model on a single node, without sacrificing classification accuracy. [en_US]
dc.publisher: North Dakota State University [en_US]
dc.rights: NDSU Policy 190.6.2
dc.title: Naïve Bayes Classifier: A MapReduce Approach [en_US]
dc.type: Master's paper [en_US]
dc.date.accessioned: 2014-12-23T14:55:46Z
dc.date.available: 2014-12-23T14:55:46Z
dc.date.issued: 2014
dc.identifier.uri: http://hdl.handle.net/10365/24752
dc.subject.lcsh: Big data. [en_US]
dc.subject.lcsh: Machine learning. [en_US]
dc.subject.lcsh: Apache Hadoop. [en_US]
dc.subject.lcsh: MapReduce (Computer file) [en_US]
dc.subject.lcsh: Bayesian statistical decision theory. [en_US]
dc.rights.uri: https://www.ndsu.edu/fileadmin/policy/190.pdf
ndsu.degree: Master of Science (MS) [en_US]
ndsu.college: Engineering [en_US]
ndsu.department: Computer Science [en_US]
ndsu.program: Computer Science [en_US]
ndsu.advisor: Ludwig, Simone

