Study of Similarity Coefficients Using MapReduce Programming Model

Nayakam, GhanaShyam Nath

dc.contributor.author	Nayakam, GhanaShyam Nath
dc.description.abstract	MapReduce is a programming model for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. A similarity metric is a basic measurement used by many data mining algorithms to quantify how similar two data objects are. These objects may have one or more attributes associated with them. In this paper, given input data consisting of user/page-entity pairs, we calculate the similarity index between users with respect to their page edits. We consider four algorithms for calculating similarity coefficients: Jaccard, Cosine, Tanimoto, and Dice's coefficient. We implement these algorithms using the MapReduce programming model and study their behavior with respect to different input sizes and cluster sizes.	en_US
dc.publisher	North Dakota State University	en_US
dc.rights	NDSU Policy 190.6.2
dc.title	Study of Similarity Coefficients Using MapReduce Programming Model	en_US
dc.type	Master's paper	en_US
dc.date.accessioned	2013-03-01T21:22:48Z
dc.date.available	2013-03-01T21:22:48Z
dc.date.issued	2013
dc.identifier.uri	http://hdl.handle.net/10365/22599
dc.subject.lcsh	Big data.	en_US
dc.subject.lcsh	MapReduce (Computer file)	en_US
dc.subject.lcsh	Computer algorithms.	en_US
dc.subject.lcsh	Data mining.	en_US
dc.rights.uri	https://www.ndsu.edu/fileadmin/policy/190.pdf
ndsu.degree	Master of Science (MS)	en_US
ndsu.college	Engineering	en_US
ndsu.department	Computer Science	en_US
ndsu.program	Computer Science	en_US
ndsu.advisor	Ludwig, Simone
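
The abstract combines the map/reduce contract with four similarity coefficients. As a rough illustration only, the Python sketch below simulates MapReduce in memory (no Hadoop cluster) and chains three hypothetical jobs to score pairs of users from (user, page) edit records. The function names, the input format, and the choice to compute Jaccard and Dice on sets of distinct edited pages while Cosine and Tanimoto use per-page edit counts are all assumptions made for this example, not the implementation described in the paper.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt

def run_job(records, mapper, reducer):
    """In-memory stand-in for one MapReduce job: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for k2, v2 in mapper(key, value):
            groups[k2].append(v2)      # shuffle: collect all values for each intermediate key
    results = []
    for k2 in sorted(groups):          # sorted only to make the output deterministic
        results.extend(reducer(k2, groups[k2]))
    return results

# Job 1 (hypothetical): per-user statistics, keyed by user.
def user_stats_map(user, page):
    yield user, page

def user_stats_reduce(user, pages):
    counts = Counter(pages)            # edits per page for this user
    yield user, (len(counts), sum(c * c for c in counts.values()))  # (distinct pages, squared norm)

# Job 2 (hypothetical): per-page co-occurrence, keyed by page.
def cooccur_map(user, page):
    yield page, user

def cooccur_reduce(page, users):
    counts = Counter(users)
    for u, v in combinations(sorted(counts), 2):
        yield (u, v), (1, counts[u] * counts[v])  # (one shared page, dot-product term)

# Job 3 (hypothetical): sum the partial results for each user pair.
def pair_sum_map(pair, value):
    yield pair, value

def pair_sum_reduce(pair, values):
    yield pair, (sum(s for s, _ in values), sum(d for _, d in values))

def similarities(edits):
    """edits: iterable of (user, page) records, one per edit (assumed input format)."""
    edits = list(edits)
    stats = dict(run_job(edits, user_stats_map, user_stats_reduce))
    partial = run_job(edits, cooccur_map, cooccur_reduce)
    pairs = run_job(partial, pair_sum_map, pair_sum_reduce)
    scores = {}
    for (u, v), (shared, dot) in pairs:
        n_u, sq_u = stats[u]
        n_v, sq_v = stats[v]
        scores[(u, v)] = {
            "jaccard": shared / (n_u + n_v - shared),   # |A and B| / |A or B| on page sets
            "dice": 2 * shared / (n_u + n_v),           # 2|A and B| / (|A| + |B|)
            "cosine": dot / sqrt(sq_u * sq_v),          # a.b / (|a| |b|) on edit counts
            "tanimoto": dot / (sq_u + sq_v - dot),      # a.b / (|a|^2 + |b|^2 - a.b)
        }
    return scores

if __name__ == "__main__":
    sample = [("alice", "P1"), ("alice", "P2"), ("alice", "P2"),
              ("bob", "P2"), ("bob", "P3"), ("carol", "P1")]
    for pair, s in similarities(sample).items():
        print(pair, {name: round(val, 3) for name, val in s.items()})
```

Pairs of users who share no page are never emitted by the co-occurrence job, so their similarity is implicitly zero. In a real cluster, the per-user statistics from the first job would have to be made available to the final scoring step as side data rather than as an in-process dictionary.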

Files in this item

Name: Ghanashyam Nath Nayakam.pdf
Size: 954.0Kb
Format: PDF
Description: GhanaShyam Nath Nayakam


This item appears in the following Collection(s)

Computer Science Masters Papers
