Datasets and their influence on the development of computer assisted synthesis planning tools in the pharmaceutical domain

Thakkar, Amol; Kogej, Thierry; Reymond, Jean-Louis; Engkvist, Ola; Bjerrum, Esben Jannik (2020). Datasets and their influence on the development of computer assisted synthesis planning tools in the pharmaceutical domain. Chemical Science, 11(1), pp. 154-168. The Royal Society of Chemistry. DOI: 10.1039/C9SC04944D

c9sc04944d.pdf - Published Version
Available under License Creative Commons: Attribution (CC-BY).

Computer Assisted Synthesis Planning (CASP) has gained considerable interest recently. Herein we investigate a template-based retrosynthetic planning tool, trained on a variety of datasets consisting of up to 17.5 million reactions. We demonstrate that models trained on datasets such as internal Electronic Laboratory Notebooks (ELN) and the publicly available United States Patent and Trademark Office (USPTO) extracts are sufficient for the prediction of full synthetic routes to compounds of interest in medicinal chemistry. We assessed the models on 1731 compounds from 41 virtual libraries for which experimental results were known. Furthermore, we show that accuracy is a misleading metric for assessment of the policy network, and propose that the number of successfully applied templates, in conjunction with the overall ability to generate full synthetic routes, be examined instead. We found that the specificity of the templates comes at the cost of generalizability and overall model performance. This is supplemented by a comparison of the underlying datasets and their corresponding models.

Item Type: Journal Article (Original Article)
Division/Institute: 08 Faculty of Science > Department of Chemistry, Biochemistry and Pharmaceutical Sciences (DCBP)
UniBE Contributor: Thakkar, Amol Vijay; Reymond, Jean-Louis
Subjects:
  500 Science > 570 Life sciences; biology
  500 Science > 540 Chemistry
ISSN: 2041-6520
Publisher: The Royal Society of Chemistry
Language: English
Submitter: Sandra Tanja Zbinden Di Biase
Date Deposited: 24 Jan 2020 15:05
Last Modified: 05 Dec 2022 15:35
Publisher DOI: 10.1039/C9SC04944D
BORIS DOI: 10.7892/boris.138520
URI: https://boris.unibe.ch/id/eprint/138520
