Exploring the GDB-13 chemical space using deep generative models

Arús-Pous, Josep; Blaschke, Thomas; Ulander, Silas; Reymond, Jean-Louis; Chen, Hongming; Engkvist, Ola (2019). Exploring the GDB-13 chemical space using deep generative models. Journal of Cheminformatics, 11(1), p. 20. Springer. DOI: 10.1186/s13321-019-0341-z

Published Version.
Available under License Creative Commons: Attribution (CC-BY).

Recent applications of recurrent neural networks (RNNs) have made it possible to train models that sample chemical space. In this study, we train RNNs on molecular string representations (SMILES) using a subset of the enumerated database GDB-13 (975 million molecules). We show that a model trained on 1 million structures (0.1% of the database) reproduces 68.9% of the entire database when 2 billion molecules are sampled after training. We also develop a method to assess the quality of the training process using negative log-likelihood plots. Furthermore, we use a mathematical model based on the “coupon collector problem” to compare the trained model with an upper bound, which allows us to quantify how much it has learned. We suggest that this method can also serve as a tool to benchmark the learning capabilities of any molecular generative model architecture. Finally, an analysis of the generated chemical space shows that, mostly because of the syntax of SMILES, complex molecules with many rings and heteroatoms are more difficult to sample.
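The abstract does not give the formula behind the coupon-collector upper bound. As a minimal sketch, assuming the bound is an idealized model that samples each of the n database molecules with equal probability 1/n, a given molecule is missed by all k independent draws with probability (1 - 1/n)^k, so the expected covered fraction is 1 - (1 - 1/n)^k, approximately 1 - e^(-k/n). The function name `expected_coverage` is hypothetical and is not taken from the paper.

```python
import math


def expected_coverage(n_molecules: int, n_samples: int) -> float:
    """Expected fraction of n equiprobable molecules seen at least once after
    n_samples independent draws with replacement (coupon collector setting).

    Assumption: the ideal sampler gives every molecule probability 1/n.
    Each molecule is missed by one draw with probability (1 - 1/n), hence
    by all draws with probability (1 - 1/n)**k; coverage is the complement.
    """
    if n_molecules <= 0 or n_samples < 0:
        raise ValueError("n_molecules must be positive and n_samples non-negative")
    if n_molecules == 1:
        # A single molecule is covered as soon as one draw is made.
        return 1.0 if n_samples > 0 else 0.0
    # log1p/expm1 keep the result accurate when 1/n is tiny (n ~ 1e9).
    return -math.expm1(n_samples * math.log1p(-1.0 / n_molecules))


if __name__ == "__main__":
    n = 975_000_000      # GDB-13 size as stated in the abstract
    k = 2_000_000_000    # number of sampled molecules
    print(f"ideal uniform sampler: {expected_coverage(n, k):.1%} of the database")
    print("reported RNN model:    68.9% of the database")
```

With the numbers from the abstract, k/n is about 2.05, so this idealized sampler would cover roughly 87.1% of the database; the gap to the reported 68.9% is the kind of quantity the upper-bound comparison is meant to expose.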

Item Type: Journal Article (Original Article)
Division/Institute: 08 Faculty of Science > Department of Chemistry, Biochemistry and Pharmaceutical Sciences (DCBP)

UniBE Contributor: Arus Pous, Josep; Reymond, Jean-Louis

Subjects:
  500 Science > 570 Life sciences; biology
  500 Science > 540 Chemistry
ISSN: 1758-2946
Publisher: Springer
Language: English
Submitter: Sandra Tanja Zbinden Di Biase
Date Deposited: 21 Jan 2020 15:02
Last Modified: 02 Mar 2023 23:33
Publisher DOI: 10.1186/s13321-019-0341-z
PubMed ID: 30868314
BORIS DOI: 10.7892/boris.138445
URI: https://boris.unibe.ch/id/eprint/138445