Mining Novel Knowledge from Biomedical Literature using Statistical Measures and Domain Knowledge
Abstract
Literature Based Discovery (LBD) is the problem of inferring novel knowledge from implicit facts by logically connecting independent fragments of literature. To discover hidden links in LBD, it is important to determine the relevance between concepts using appropriate information measures. In this study, nine distinct methods, comprising variants of statistical information measures and semantic knowledge derived from a domain ontology, are designed and compared for their ability to discover interesting links latent in large corpora. A series of experiments is performed and analyzed for the proposed methods. In addition, a new preprocessing strategy is proposed that removes terms with little chance of constituting a new discovery. Finally, an organized list of final concepts deemed worthy of scientific investigation is provided to the user. Overall, our research presents a comprehensive analysis of how different statistical information measures and semantic knowledge affect the knowledge discovery procedure.