Extracting Useful Information and Building Predictive Models from Medical and Health-Care Data Using Machine Learning Techniques

Kabir, Md Faisal

dc.contributor.author	Kabir, Md Faisal
dc.description.abstract	In healthcare, a large amount of medical data has emerged. To use these data effectively to improve healthcare outcomes, clinicians need to identify the relevant measures and apply the correct analysis methods for the type of data at hand. In this dissertation, we present various machine learning (ML) and data mining (DM) methods that can be applied to the types of data sets available in the healthcare area. The first part of the dissertation investigates DM methods on healthcare or medical data to find significant information in the form of rules. Class association rule mining, a variant of association rule mining, was used to obtain rules with targeted items or class labels. These rules can be used to improve public awareness of different cancer symptoms and could also be useful for initiating prevention strategies. In the second part of the dissertation, ML techniques were applied to healthcare or medical data to build a predictive model. Three different classification techniques were investigated on a real-world breast cancer risk factor data set. Because the data set is imbalanced, various resampling methods were applied before the classifiers. Performance improved significantly when a resampling technique was applied compared to no resampling. Moreover, a super learning technique that uses multiple base learners was investigated to boost the performance of classification models. Two forms of super learner (SL) were investigated: the first uses two base learners, while the second uses three. The models were then evaluated on well-known benchmark data sets from the healthcare domain, and the results showed that the SL model performs better than the individual classifiers and the baseline ensemble.
Finally, we assessed the cancer-relevant genes in prostate cancer that correlate most significantly with the clinical outcomes of sample type and overall survival. Rules were discovered from the RNA-sequencing data of prostate cancer patients. Moreover, we built a regression model, from which rules for predicting patient survival time were generated.	en_US
dc.publisher	North Dakota State University	en_US
dc.rights	NDSU policy 190.6.2	en_US
dc.title	Extracting Useful Information and Building Predictive Models from Medical and Health-Care Data Using Machine Learning Techniques	en_US
dc.type	Dissertation	en_US
dc.type	Video	en_US
dc.date.accessioned	2021-05-25T18:18:53Z
dc.date.available	2021-05-25T18:18:53Z
dc.date.issued	2020
dc.identifier.uri	https://hdl.handle.net/10365/31924
dc.subject	classification	en_US
dc.subject	data mining	en_US
dc.subject	gene expression	en_US
dc.subject	healthcare	en_US
dc.subject	machine learning	en_US
dc.subject	super learner	en_US
dc.identifier.orcid	0000-0001-6088-9487
dc.rights.uri	https://www.ndsu.edu/fileadmin/policy/190.pdf	en_US
ndsu.degree	Doctor of Philosophy (PhD)	en_US
ndsu.college	Engineering	en_US
ndsu.department	Computer Science	en_US
ndsu.program	Computer Science	en_US
ndsu.advisor	Ludwig, Simone

Files in this item

Name: Extracting Useful Information ...
Size: 1.507Mb
Format: PDF
Description: Extracting Useful Information ...

Name: Md Faisal Kabir - Dissertation ...
Size: 73.87Mb
Format: MPEG-4 video
Description: Md Faisal Kabir - Dissertation ...

This item appears in the following Collection(s)

Computer Science Doctoral Work