
dc.contributor.author: Nayakam, GhanaShyam Nath
dc.description.abstract: MapReduce is a programming model for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. A similarity metric is a basic measure used by many data mining algorithms to quantify how alike two data objects are; each object may have one or more attributes. In this paper, given input data of users and their page-entity pairs, we calculate the similarity between users with respect to their page edits. We consider four algorithms for calculating similarity coefficients: the Jaccard, Cosine, Tanimoto, and Dice coefficients. We implement these algorithms using the MapReduce programming model and study their behavior with respect to different input sizes and cluster sizes.
dc.publisher: North Dakota State University
dc.rights: NDSU Policy 190.6.2
dc.title: Study of Similarity Coefficients Using MapReduce Programming Model
dc.type: Master's paper
dc.date.accessioned: 2013-03-01T21:22:48Z
dc.date.available: 2013-03-01T21:22:48Z
dc.date.issued: 2013
dc.identifier.uri: http://hdl.handle.net/10365/22599
dc.subject.lcsh: Big data.
dc.subject.lcsh: MapReduce (Computer file)
dc.subject.lcsh: Computer algorithms.
dc.subject.lcsh: Data mining.
dc.rights.uri: https://www.ndsu.edu/fileadmin/policy/190.pdf
ndsu.degree: Master of Science (MS)
ndsu.college: Engineering
ndsu.department: Computer Science
ndsu.program: Computer Science
ndsu.advisor: Ludwig, Simone
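The four coefficients named in the abstract can be sketched in plain Python. This is a minimal illustration, not the paper's MapReduce implementation: the per-user dictionaries of page → edit count (`u1`, `u2`) and the `edits_by_user` grouping helper are assumptions standing in for the map/shuffle stage described in the abstract.

```python
from math import sqrt

def edits_by_user(pairs):
    # Group raw (user, page) records into a page -> edit-count dict per user,
    # mimicking the shuffle step that collects each user's intermediate values.
    out = {}
    for user, page in pairs:
        counts = out.setdefault(user, {})
        counts[page] = counts.get(page, 0) + 1
    return out

def jaccard(a, b):
    # |A ∩ B| / |A ∪ B| over the sets of pages each user edited
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def dice(a, b):
    # 2|A ∩ B| / (|A| + |B|) over the same page sets
    sa, sb = set(a), set(b)
    return 2 * len(sa & sb) / (len(sa) + len(sb))

def cosine(a, b):
    # dot(a, b) / (||a|| * ||b||) over edit-count vectors keyed by page
    dot = sum(a[p] * b[p] for p in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def tanimoto(a, b):
    # dot / (||a||^2 + ||b||^2 - dot); reduces to Jaccard for 0/1 vectors
    dot = sum(a[p] * b[p] for p in a.keys() & b.keys())
    sq_a = sum(v * v for v in a.values())
    sq_b = sum(v * v for v in b.values())
    return dot / (sq_a + sq_b - dot)

# Hypothetical edit-count vectors for two users
u1 = {"PageA": 3, "PageB": 1}
u2 = {"PageB": 2, "PageC": 4}
```

In a MapReduce setting, the map phase would emit one `(user, page)` pair per edit record, the framework's shuffle would group pairs by user (as `edits_by_user` does locally), and a reduce phase would compute the coefficients for each pair of users.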


