dc.contributor.author: Woznica, Szymon
dc.description.abstract: Efficient querying and discovery of meaningful patterns in data become increasingly important as the volume of data published on the Internet each day grows. The tree-pruning-based algorithms used in most popular search programs have trouble with infrequent query strings, which limits the number of returned results that might be of interest to the user. Furthermore, existing tools cannot efficiently find data patterns that would inform the user about the frequency of occurrence and location of a specific set of words in large, user-defined sets of textual data. In this paper, we present a new search tool based on n-grams and modern software technologies. Our tool efficiently indexes the word n-grams in large, user-defined sets of textual data and then assists users in querying the text corpus, helping them find hidden patterns and their locations in the input data. We describe an algorithm for extracting word n-grams with the parameter "n" equal to two, three, and four, and demonstrate how the end user of the search tool can leverage it to mine data in a new way. The presented tool offers a unique feature that allows the user to search a set of n-grams extracted from abstracts of biomedical publications obtained from the U.S. National Library of Medicine (NLM), filtering the search results by words that exist in the English language. The data tier of the search tool is based on Microsoft SQL Server 2008, supported by a set of Common Language Runtime (CLR) functions and Transact-SQL (T-SQL) stored procedures, whereas the business logic and the user interface use C# .NET 3.5 libraries to support regular expression patterns, database connection (LINQ to SQL), and multithreaded system operations. [en_US]
dc.publisher: North Dakota State University [en_US]
dc.rights: NDSU policy 190.6.2 [en_US]
dc.title: N-gram-based Search Procedure [en_US]
dc.type: Master's Paper [en_US]
dc.date.accessioned: 2024-05-07T21:38:09Z
dc.date.available: 2024-05-07T21:38:09Z
dc.date.issued: 2009
dc.identifier.uri: https://hdl.handle.net/10365/33813
dc.subject.lcsh: Information retrieval. [en_US]
dc.subject.lcsh: Database searching. [en_US]
dc.subject.lcsh: Internet searching. [en_US]
dc.rights.uri: https://www.ndsu.edu/fileadmin/policy/190.pdf [en_US]
ndsu.degree: Master of Science (MS) [en_US]
ndsu.college: Science and Mathematics [en_US]
ndsu.department: Computer Science [en_US]
ndsu.program: Computer Science [en_US]
ndsu.advisor: Denton, Anne
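
The abstract describes extracting word n-grams with "n" equal to two, three, and four, and reporting how often and where they occur in the input text. The record does not include the algorithm itself, so the following is only a minimal sketch of that general idea, written in Python rather than the C# and SQL Server stack the paper names. The tokenization rule (lowercased runs of letters and digits) and the names word_ngrams and index_ngrams are assumptions made for illustration, not the paper's method.

```python
import re


def word_ngrams(text, n):
    """Return the word n-grams of `text` as tuples, in order of appearance.

    Assumption: a "word" is a lowercased run of ASCII letters or digits.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def index_ngrams(documents, sizes=(2, 3, 4)):
    """Map each n-gram to the (document id, word position) pairs where it starts.

    `documents` is a dict of document id -> text (a hypothetical input shape).
    """
    index = {}
    for doc_id, text in documents.items():
        for n in sizes:
            for position, gram in enumerate(word_ngrams(text, n)):
                index.setdefault(gram, []).append((doc_id, position))
    return index


if __name__ == "__main__":
    docs = {
        "a": "gene expression in tumor cells",
        "b": "tumor cells and gene expression",
    }
    index = index_ngrams(docs)
    # Frequency is the number of recorded locations for an n-gram.
    print(len(index[("gene", "expression")]), index[("gene", "expression")])
```

Keeping a list of locations per n-gram answers both questions the abstract raises: the length of the list gives the frequency, and its entries give the positions.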

