NDSU North Dakota State University
Fargo, N.D.

NDSU Institutional Repository

Study of Similarity Coefficients Using MapReduce Programming Model


dc.description.abstract MapReduce is a programming model for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. A similarity metric is a basic measure used by many data mining algorithms to quantify how alike two data objects are; each object may have one or more attributes. In this paper, given input data of user and page-entity pairs, we calculate the similarity index between users with respect to their page edits. We consider four algorithms for computing similarity coefficients: the Jaccard, Cosine, Tanimoto, and Dice coefficients. We implement these algorithms using the MapReduce programming structure and study their behavior with respect to different input sizes and cluster sizes. en_US
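
The four coefficients named in the abstract can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: the function names and the toy user/page data are assumptions, and the page sets stand in for the binary edit attributes the abstract describes. For binary data, the Tanimoto coefficient reduces to the Jaccard coefficient.

```python
import math

def jaccard(a, b):
    # |A ∩ B| / |A ∪ B|
    return len(a & b) / len(a | b)

def dice(a, b):
    # 2|A ∩ B| / (|A| + |B|)
    return 2 * len(a & b) / (len(a) + len(b))

def cosine(a, b):
    # |A ∩ B| / sqrt(|A| * |B|)  (cosine similarity of binary vectors)
    return len(a & b) / math.sqrt(len(a) * len(b))

def tanimoto(a, b):
    # |A ∩ B| / (|A| + |B| - |A ∩ B|); equals Jaccard for binary data
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)

# Hypothetical input: each user is represented by the set of pages edited,
# mirroring the user/page-entity pairs described in the abstract.
user1 = {"page1", "page2", "page3"}
user2 = {"page2", "page3", "page4"}

print(jaccard(user1, user2))   # 2 shared pages of 4 total -> 0.5
print(dice(user1, user2))
print(cosine(user1, user2))
print(tanimoto(user1, user2))
```

In a MapReduce setting, the map phase would emit (user, page) pairs and the reduce phase would group pages per user before applying one of these coefficients to each user pair; the sketch above shows only the per-pair coefficient computation.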
dc.title Study of Similarity Coefficients Using MapReduce Programming Model en_US
dc.date.accessioned 2013-03-01T21:22:48Z
dc.date.available 2013-03-01T21:22:48Z
dc.date.issued 2013-03-01
dc.identifier.uri http://hdl.handle.net/10365/22599
dc.thesis.degree Paper (M.S.)--North Dakota State University, 2013. en_US
dc.contributor.advisor Ludwig, Simone
dc.subject.lcsh Computer algorithms. en_US
dc.subject.lcsh Data mining. en_US
dc.subject.lcsh Big data.
dc.subject.lcsh MapReduce (Computer file)
dc.creator.author Nayakam, GhanaShyam Nath
dc.degree.departmentCollege Master of Science / Computer Science, College of Science and Mathematics, 2013.
dc.date.created 2013
