Show simple item record

dc.contributor.authorWu, Jianfei
dc.description.abstractAdvances in storage technology have long been driving the need for new data mining techniques. Not only are typical data sets becoming larger, but the diversity of available attributes is increasing in many problem domains. In biological applications for example, a single protein may have associated sequence-, text-, graph-, continuous and item data. Correspondingly, there is growing need for techniques to find patterns in such complex data. Many techniques exist for mapping specific types of data to vector space representations, such as the bag-of-words model for text [58] or embedding in vector spaces of graphs [94, 91]. However, there are few techniques that recognize the resulting vector space representations as units that may be combined and further processed. This research aims to mine important vector-item patterns hidden across multiple and diverse data sources. We consider sets of related continuous attributes as vector data and search for patterns that relate a vector attribute to one or more items. The presence of an item set defines a subset of vectors that may or may not show unexpected density fluctuations. Two types of vector-item pattern mining algorithms have been developed, namely histogram-based vector-item pattern mining algorithms and point distribution vector-item pattern mining algorithms. In histogram-based vector-item pattern mining algorithms, a vector-item pattern is significant or important if its density histogram significantly differs from what is expected for a random subset of transactions, using χ² goodness-of-fit test or effect size analysis. For point distribution vector-item pattern mining algorithms, a vector-item pattern is significant if its probability density function (PDF) has a big KullbackLeibler divergence from random subsamples. We have applied the vector-item pattern mining algorithms to several application areas, and by comparing with other state-of-art algorithms we justify the effectiveness and efficiency of the algorithms.en_US
dc.publisherNorth Dakota State Universityen_US
dc.rightsNDSU Policy 190.6.2
dc.titleVector-Item Pattern Mining Algorithms and their Applicationsen_US
dc.typeDissertationen_US
dc.date.accessioned2018-08-30T17:59:58Z
dc.date.available2018-08-30T17:59:58Z
dc.date.issued2011en_US
dc.identifier.urihttps://hdl.handle.net/10365/28841
dc.subject.lcshData mining.en_US
dc.subject.lcshPattern recognition systems.en_US
dc.subject.lcshComputer algorithms.en_US
dc.rights.urihttps://www.ndsu.edu/fileadmin/policy/190.pdf
ndsu.degreeDoctor of Philosophy (PhD)en_US
ndsu.collegeEngineeringen_US
ndsu.departmentComputer Scienceen_US
ndsu.programComputer Scienceen_US
ndsu.advisorDenton, Anne M.


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record