Show simple item record

dc.contributor.authorKabir, Md Faisal
dc.description.abstractIn healthcare, a large number of medical data has emerged. To effectively use these data to improve healthcare outcomes, clinicians need to identify the relevant measures and apply the correct analysis methods for the type of data at hand. In this dissertation, we present various machine learning (ML) and data mining (DM) methods that could be applied to the type of data sets that are available in the healthcare area. The first part of the dissertation investigates DM methods on healthcare or medical data to find significant information in the form of rules. Class association rule mining, a variant of association rule mining, was used to obtain the rules with some targeted items or class labels. These rules can be used to improve public awareness of different cancer symptoms and could also be useful to initiate prevention strategies. In the second part of the thesis, ML techniques have been applied in healthcare or medical data to build a predictive model. Three different classification techniques on a real-world breast cancer risk factor data set have been investigated. Due to the imbalance characteristics of the data set various resampling methods were used before applying the classifiers. It is shown that there was a significant improvement in performance when applying a resampling technique as compared to applying no resampling technique. Moreover, super learning technique that uses multiple base learners, have been investigated to boost the performance of classification models. Two different forms of super learner have been investigated - the first one uses two base learners while the second one uses three base learners. The models were then evaluated against well-known benchmark data sets related to the healthcare domain and the results showed that the SL model performs better than the individual classifier and the baseline ensemble. Finally, we assessed cancer-relevant genes of prostate cancer with the most significant correlations with the clinical outcome of the sample type and the overall survival. Rules from the RNA-sequencing of prostate cancer patients was discovered. Moreover, we built the regression model and from the model rules for predicting the survival time of patients were generated.en_US
dc.publisherNorth Dakota State Universityen_US
dc.rightsNDSU policy 190.6.2en_US
dc.titleExtracting Useful Information and Building Predictive Models from Medical and Health-Care Data Using Machine Learning Techniquesen_US
dc.typeDissertationen_US
dc.typeVideoen_US
dc.date.accessioned2021-05-25T18:18:53Z
dc.date.available2021-05-25T18:18:53Z
dc.date.issued2020
dc.identifier.urihttps://hdl.handle.net/10365/31924
dc.subjectclassificationen_US
dc.subjectdata miningen_US
dc.subjectgene expressionen_US
dc.subjecthealthcareen_US
dc.subjectmachine learningen_US
dc.subjectsuper learneren_US
dc.identifier.orcid0000-0001-6088-9487
dc.rights.urihttps://www.ndsu.edu/fileadmin/policy/190.pdfen_US
ndsu.degreeDoctor of Philosophy (PhD)en_US
ndsu.collegeEngineeringen_US
ndsu.departmentComputer Scienceen_US
ndsu.programComputer Scienceen_US
ndsu.advisorLudwig, Simone


Files in this item

Thumbnail
Thumbnail

This item appears in the following Collection(s)

Show simple item record