Vector-Item Pattern Mining Algorithms and their Applications

dc.contributor.author: Wu, Jianfei
dc.date.accessioned: 2018-08-30T17:59:58Z
dc.date.available: 2018-08-30T17:59:58Z
dc.date.issued: 2011 [en_US]
dc.description.abstract: Advances in storage technology have long been driving the need for new data mining techniques. Not only are typical data sets becoming larger, but the diversity of available attributes is increasing in many problem domains. In biological applications, for example, a single protein may have associated sequence, text, graph, continuous, and item data. Correspondingly, there is a growing need for techniques to find patterns in such complex data. Many techniques exist for mapping specific types of data to vector space representations, such as the bag-of-words model for text [58] or the embedding of graphs in vector spaces [94, 91]. However, few techniques recognize the resulting vector space representations as units that may be combined and further processed. This research aims to mine important vector-item patterns hidden across multiple and diverse data sources. We consider sets of related continuous attributes as vector data and search for patterns that relate a vector attribute to one or more items. The presence of an item set defines a subset of vectors that may or may not show unexpected density fluctuations. Two types of vector-item pattern mining algorithms have been developed: histogram-based algorithms and point-distribution algorithms. In the histogram-based algorithms, a vector-item pattern is considered significant if its density histogram differs significantly from what is expected for a random subset of transactions, as measured by a χ² goodness-of-fit test or effect size analysis. In the point-distribution algorithms, a vector-item pattern is considered significant if its probability density function (PDF) has a large Kullback-Leibler divergence from that of random subsamples.
We have applied the vector-item pattern mining algorithms to several application areas, and by comparing them with other state-of-the-art algorithms we demonstrate their effectiveness and efficiency. [en_US]
dc.identifier.uri: https://hdl.handle.net/10365/28841
dc.publisher: North Dakota State University [en_US]
dc.rights: NDSU Policy 190.6.2
dc.rights.uri: https://www.ndsu.edu/fileadmin/policy/190.pdf
dc.subject.lcsh: Data mining. [en_US]
dc.subject.lcsh: Pattern recognition systems. [en_US]
dc.subject.lcsh: Computer algorithms. [en_US]
dc.title: Vector-Item Pattern Mining Algorithms and their Applications [en_US]
dc.type: Dissertation [en_US]
ndsu.advisor: Denton, Anne M.
ndsu.college: Engineering [en_US]
ndsu.degree: Doctor of Philosophy (PhD) [en_US]
ndsu.department: Computer Science [en_US]
ndsu.program: Computer Science [en_US]
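
Illustrative sketch (not taken from the dissertation): the two significance criteria named in the abstract can be illustrated on toy data. The code below is a minimal sketch under several assumptions: the vector attribute is reduced to a single scalar per transaction, the expected histogram is taken to be the full-population histogram scaled to the subset size, the effect size is Cohen's w, and the PDF is approximated by a smoothed histogram. The function names, bin edges, and data are hypothetical.

```python
import math


def histogram(values, edges):
    """Count values per bin [edges[i], edges[i+1]); the last bin also includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts


def chi_square_vs_population(subset_counts, population_counts):
    """Chi-square statistic and Cohen's w of a subset histogram against the
    population histogram scaled to the subset size (assumed null model)."""
    n_sub = sum(subset_counts)
    n_pop = sum(population_counts)
    stat = 0.0
    for observed, pop in zip(subset_counts, population_counts):
        expected = n_sub * pop / n_pop
        if expected > 0:  # empty population bins contribute nothing
            stat += (observed - expected) ** 2 / expected
    effect_size = math.sqrt(stat / n_sub) if n_sub else 0.0
    return stat, effect_size


def kl_divergence(p_counts, q_counts, eps=1e-9):
    """KL(P || Q) between two histograms, with a small additive constant
    so that empty bins do not produce log(0) or division by zero."""
    p_total = sum(p_counts) + eps * len(p_counts)
    q_total = sum(q_counts) + eps * len(q_counts)
    kl = 0.0
    for p_c, q_c in zip(p_counts, q_counts):
        p = (p_c + eps) / p_total
        q = (q_c + eps) / q_total
        kl += p * math.log(p / q)
    return kl


# Hypothetical transactions: (scalar attribute value, set of items).
transactions = [
    (0.1, {"a"}), (0.2, {"a", "b"}), (0.3, {"a"}), (0.4, {"b"}),
    (0.6, {"a"}), (0.7, {"b"}), (0.8, set()), (0.9, {"b"}),
]
edges = [0.0, 0.5, 1.0]

population = histogram([v for v, _ in transactions], edges)
subset = histogram([v for v, items in transactions if "a" in items], edges)

stat, effect = chi_square_vs_population(subset, population)
kl = kl_divergence(subset, population)
print(f"chi2={stat:.3f}  w={effect:.3f}  KL={kl:.4f}")
```

In this sketch the item "a" selects the subset of transactions, and a large statistic, effect size, or divergence would flag the pattern as a candidate. A p-value would additionally require the χ² distribution with (number of bins - 1) degrees of freedom, which is left out here to keep the example free of third-party dependencies.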

Files

Original bundle

Name: Wu_Jianfei_Computer Science PHD_2011.pdf
Size: 6.27 MB
Format: Adobe Portable Document Format
Description: Vector-Item Pattern Mining Algorithms and their Applications
