Comparison of Non-Learned and Learned Molecule Representations for Catalyst Discovery
Abstract
Catalyst discovery is an important task in storing renewable energy, which helps address climate change and energy scarcity globally. Catalyst candidates can be represented by molecular descriptors or modeled as graphs. Their properties can be calculated with Density Functional Theory (DFT), but DFT is computationally expensive, so machine learning algorithms are applied to predict these properties instead. However, machine learning algorithms cannot operate directly on some standard molecular formats, so representing molecules in a format that these algorithms require is the primary task to tackle. This paper therefore compares non-learned and learned representation methods. The accuracy of each representation method, measured by root-mean-square error (RMSE), is reported and discussed. The results show that learned representations perform more stably than non-learned representations regardless of which linear model is used for the downstream task.
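
The RMSE-based comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's actual pipeline: the feature matrices are random placeholders standing in for molecule representations, the target is a synthetic stand-in for a DFT-computed property, and the helper names (`rmse`, `fit_linear`, `predict_linear`, `evaluate_representation`) are hypothetical. Ordinary least squares is assumed as one possible linear downstream model.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def fit_linear(X, y):
    """Fit ordinary least squares with an intercept (assumed downstream model)."""
    X1 = np.column_stack([X, np.ones(len(X))])
    weights, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return weights


def predict_linear(weights, X):
    """Predict with weights returned by fit_linear."""
    X1 = np.column_stack([X, np.ones(len(X))])
    return X1 @ weights


def evaluate_representation(X, y, n_train):
    """Train on the first n_train rows and report RMSE on the remaining rows."""
    weights = fit_linear(X[:n_train], y[:n_train])
    return rmse(y[n_train:], predict_linear(weights, X[n_train:]))


# Synthetic placeholders only: these are NOT real descriptors, learned
# embeddings, or DFT results, so the resulting scores carry no meaning.
rng = np.random.default_rng(0)
n_molecules = 200
target = rng.normal(size=n_molecules)
representations = {
    "representation_a": rng.normal(size=(n_molecules, 8)),
    "representation_b": rng.normal(size=(n_molecules, 16)),
}

scores = {
    name: evaluate_representation(features, target, n_train=150)
    for name, features in representations.items()
}
for name, score in scores.items():
    print(f"{name}: held-out RMSE = {score:.3f}")
```

In a real study, each entry in `representations` would be replaced by the output of one representation method for the same set of molecules, and the loop would be repeated for each linear model under comparison so that the spread of RMSE values per representation can be examined.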