Chemical Compound Classification Ensemble

Zhu, Ya

dc.contributor.author	Zhu, Ya
dc.description.abstract	In health science research, scientists often need to screen numerous chemical compounds to find drugs that can treat a disease. Testing the function of these compounds in the laboratory is very time-consuming, and computational methods have been used to accelerate the process. These methods rest on the principle that chemical compounds with similar structures often have similar functions. They therefore maintain a database of chemical compounds whose function has been verified by laboratory experiments. For each compound, the database records its chemical structural formula, the 3D coordinates of every atom, and whether it has a certain function, e.g., whether it can kill a virus. For a new compound, the programs compare its structure with those in the database and predict whether it has the function based on structural similarity. Predicting the function of a compound is thus a two-class classification problem. In this project, we address this two-class classification problem using global and local similarity between compounds. Global similarity measures the overall structural resemblance between two compounds. When a group of compounds has the same function, they usually share some common sub-structures, which may correspond to their functional sites. Local similarity is computed from the occurrences of common sub-structures between compounds. We built several classification models based on global and local similarity and, to improve the classification results, used an ensemble of these models to predict the function of compounds in the NCI cancer datasets. We predict whether a compound can inhibit cancer cell growth, obtaining an AUC higher than 80% on all five datasets. We compared our results with other state-of-the-art methods, and our classification results are the best on all five datasets.
Our results show that local similarity is more useful than global similarity for predicting compound function. An ensemble method integrating global and local similarity achieves much better performance than any single predictive model.	en_US
dc.publisher	North Dakota State University	en_US
dc.rights	NDSU Policy 190.6.2
dc.title	Chemical Compound Classification Ensemble	en_US
dc.type	Thesis	en_US
dc.description	The title page incorrectly classifies this document as a dissertation; the NDSU Graduate School decided to classify it as a thesis.	en_US
dc.date.accessioned	2017-12-15T19:12:45Z
dc.date.available	2017-12-15T19:12:45Z
dc.date.issued	2013
dc.identifier.uri	https://hdl.handle.net/10365/27059
dc.rights.uri	https://www.ndsu.edu/fileadmin/policy/190.pdf
ndsu.degree	Master of Science (MS)	en_US
ndsu.college	Engineering	en_US
ndsu.department	Computer Science	en_US
ndsu.program	Computer Science	en_US
ndsu.advisor	Yan, Changhui
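The abstract does not say how the global and local similarity scores are computed or how the ensemble combines its models. The Python sketch below is therefore only an illustration under stated assumptions, not the thesis's method: each compound is represented as counts of hypothetical substructure identifiers; global similarity is the Tanimoto coefficient over all substructures present; local similarity is the occurrence overlap restricted to a hypothetical set of mined common sub-structures (`motifs`); each model scores a compound by k-nearest-neighbor voting; the ensemble takes an unweighted average of the two scores; and AUC is computed as the probability that a randomly chosen active compound outscores a randomly chosen inactive one.

```python
from __future__ import annotations

from collections import Counter
from itertools import product
from typing import Callable

# ASSUMPTION: a compound is a Counter mapping a (hypothetical) substructure
# identifier to the number of times that substructure occurs in the compound.
Similarity = Callable[[Counter, Counter], float]


def global_similarity(a: Counter, b: Counter) -> float:
    """Tanimoto coefficient over all substructures present in either compound."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def local_similarity(a: Counter, b: Counter, motifs: set[str]) -> float:
    """Shared occurrences of the selected motifs, normalized to [0, 1]."""
    shared = sum(min(a[m], b[m]) for m in motifs)
    total = sum(max(a[m], b[m]) for m in motifs)
    return shared / total if total else 0.0


def knn_score(query: Counter, train: list[Counter], labels: list[int],
              sim: Similarity, k: int = 3) -> float:
    """Fraction of actives (label 1) among the k most similar training compounds."""
    ranked = sorted(zip(train, labels), key=lambda p: sim(query, p[0]), reverse=True)
    top = ranked[:k]
    return sum(y for _, y in top) / len(top)


def ensemble_score(query: Counter, train: list[Counter], labels: list[int],
                   motifs: set[str], k: int = 3) -> float:
    """Unweighted average of the global- and local-similarity model scores."""
    g = knn_score(query, train, labels, global_similarity, k)
    loc = knn_score(query, train, labels,
                    lambda a, b: local_similarity(a, b, motifs), k)
    return (g + loc) / 2


def auc(scores: list[float], labels: list[int]) -> float:
    """ROC AUC: chance a random active outscores a random inactive (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration: two actives followed by two inactives.
train = [Counter({"s1": 1, "s2": 2}), Counter({"s1": 2, "s2": 1, "s3": 1}),
         Counter({"s3": 1, "s4": 1}), Counter({"s4": 2, "s5": 1})]
labels = [1, 1, 0, 0]
motifs = {"s2", "s4"}  # hypothetical mined common sub-structures
queries = [Counter({"s1": 1, "s2": 1}), Counter({"s4": 1, "s5": 1})]
scores = [ensemble_score(q, train, labels, motifs, k=2) for q in queries]
print(scores, auc(scores, [1, 0]))  # prints [1.0, 0.0] 1.0
```

An unweighted average is the simplest way to combine the two models; the thesis may weight, vote, or stack its models differently, and a real pipeline would derive the substructure counts and motifs from the compounds' structural formulas rather than hand-written toy data.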

Files in this item

Name:: Chemical Compound Classification ...
Size:: 961.3Kb
Format:: PDF

This item appears in the following Collection(s)

Computer Science Masters Theses