Using Machine Learning and Text Mining Algorithms to Facilitate Research Discovery of Plant Food Metabolomics and Its Application for Human Health Benefit Targets
Abstract
As the number of scholarly articles published each day grows, so does the need for automated tools that support systematic, exploratory literature review. Advances in text mining and machine learning have spurred the research and development of such exploratory tools across scientific domains. This research aims to identify the keyphrase extraction algorithm and the topic modeling algorithm best suited to serve as the foundation and main components of a tool that aids Systematic Literature Review. Based on experiments on a set of highly relevant scholarly articles from the food science domain, two graph-based keyphrase extraction algorithms, TopicalPageRank and PositionRank, were selected as the best of nine candidate algorithms for picking domain-specific keywords. Of the two topic modeling algorithms compared, Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF), NMF classified the documents used in this research into the most suitable topics, as validated by a domain expert. This research lays the groundwork for faster development of Systematic Literature Review tools.
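To give a flavor of how a graph-based keyphrase extractor such as PositionRank operates, the following is a minimal sketch and not the implementation evaluated in this research. It builds a word co-occurrence graph and runs a PageRank variant biased toward words that occur early and often. The function name `position_rank`, the tiny stopword list, the window size, the damping factor, and the iteration count are all illustrative assumptions. The published algorithm filters candidate words by part of speech and assembles multi-word phrases, both of which are omitted here.

```python
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; the published algorithm
# selects candidate words by part of speech instead).
STOPWORDS = {"a", "an", "the", "of", "in", "and", "for", "to", "is", "are",
             "by", "with", "on", "its"}


def position_rank(text, window=2, damping=0.85, iterations=50):
    """Score words with a position-biased PageRank over a co-occurrence graph."""
    tokens = [t.strip(".,;:()").lower() for t in text.split()]
    tokens = [t for t in tokens if t]

    # Position bias: an occurrence at 1-based position i contributes 1/i,
    # so early and frequent words receive more teleport probability.
    bias = defaultdict(float)
    for i, tok in enumerate(tokens, start=1):
        if tok not in STOPWORDS:
            bias[tok] += 1.0 / i
    if not bias:
        return {}
    total = sum(bias.values())
    bias = {word: b / total for word, b in bias.items()}

    # Undirected co-occurrence graph: link each word to the next `window` tokens.
    weights = defaultdict(lambda: defaultdict(float))
    for i, u in enumerate(tokens):
        if u not in bias:
            continue
        for v in tokens[i + 1 : i + 1 + window]:
            if v in bias and v != u:
                weights[u][v] += 1.0
                weights[v][u] += 1.0
    out_weight = {u: sum(nbrs.values()) for u, nbrs in weights.items()}

    # Power iteration of the biased PageRank update.
    scores = dict(bias)
    for _ in range(iterations):
        scores = {
            v: (1 - damping) * bias[v]
            + damping * sum(scores[u] * w / out_weight[u]
                            for u, w in weights.get(v, {}).items())
            for v in bias
        }
    return scores


# Hypothetical usage on a sentence adapted from the title.
example = "Plant food metabolomics and its application for human health."
ranked = sorted(position_rank(example).items(), key=lambda kv: kv[1], reverse=True)
print(ranked[:3])
```

In a fuller pipeline, the highest-scoring adjacent words would be merged into candidate keyphrases before being shown to a reviewer.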