Extraction of organic chemistry grammar from unsupervised learning of chemical reactions

Schwaller, Philippe; Hoover, Benjamin; Reymond, Jean-Louis; Strobelt, Hendrik; Laino, Teodoro (2021). Extraction of organic chemistry grammar from unsupervised learning of chemical reactions. Science Advances, 7(15). American Association for the Advancement of Science. doi:10.1126/sciadv.abe4166

sciadv.abe4166.pdf - Published Version
Available under License Creative Commons: Attribution-Noncommercial (CC-BY-NC).

Humans use different domain languages to represent, explore, and communicate scientific concepts. During the last few hundred years, chemists compiled the language of chemical synthesis inferring a series of “reaction rules” from knowing how atoms rearrange during a chemical transformation, a process called atom-mapping. Atom-mapping is a laborious experimental task and, when tackled with computational methods, requires continuous annotation of chemical reactions and the extension of logically consistent directives. Here, we demonstrate that Transformer Neural Networks learn atom-mapping information between products and reactants without supervision or human labeling. Using the Transformer attention weights, we build a chemically agnostic, attention-guided reaction mapper and extract coherent chemical grammar from unannotated sets of reactions. Our method shows remarkable performance in terms of accuracy and speed, even for strongly imbalanced and chemically complex reactions with nontrivial atom-mapping. It provides the missing link between data-driven and rule-based approaches for numerous chemical reaction tasks.
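As a rough illustration of what an attention-guided mapper does, the sketch below assigns each product atom to a reactant atom by greedily taking the highest attention weights first. It is a toy example under stated assumptions, not the authors' algorithm: the function name `greedy_atom_map`, the plain list-of-lists attention matrix, and the same-element constraint are all hypothetical choices made here for illustration.

```python
from typing import Dict, List, Sequence


def greedy_atom_map(
    attention: Sequence[Sequence[float]],
    product_atoms: Sequence[str],
    reactant_atoms: Sequence[str],
) -> Dict[int, int]:
    """Map product atom indices to reactant atom indices.

    attention[i][j] is an assumed attention weight from product atom i
    to reactant atom j. Only atoms with the same element symbol may be
    paired, and each reactant atom is used at most once.
    """
    # Collect every chemically admissible (same element) pair.
    candidates: List[tuple] = [
        (attention[i][j], i, j)
        for i, p in enumerate(product_atoms)
        for j, r in enumerate(reactant_atoms)
        if p == r
    ]
    # Highest attention first; indices break ties deterministically.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))

    mapping: Dict[int, int] = {}
    used_reactant_atoms = set()
    for _, i, j in candidates:
        if i in mapping or j in used_reactant_atoms:
            continue  # one of the two atoms is already assigned
        mapping[i] = j
        used_reactant_atoms.add(j)
    return mapping


# Hypothetical two-atom product and three-atom reactant set.
example_attention = [
    [0.1, 0.7, 0.2],  # product C attends mostly to reactant atom 1
    [0.8, 0.1, 0.1],  # product O attends mostly to reactant atom 0
]
example_mapping = greedy_atom_map(
    example_attention, ["C", "O"], ["O", "C", "C"]
)
```

Here the attention values are made up; in practice they would have to come from a trained model, and a real mapper would need to handle details this sketch ignores, such as symmetric atoms and neighbouring-atom consistency.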

Item Type: Journal Article (Original Article)
Division/Institute: 08 Faculty of Science > Department of Chemistry, Biochemistry and Pharmaceutical Sciences (DCBP)
UniBE Contributor: Reymond, Jean-Louis
Subjects: 500 Science > 570 Life sciences; biology
          500 Science > 540 Chemistry
ISSN: 2375-2548
Publisher: American Association for the Advancement of Science
Language: English
Submitter: Sandra Tanja Zbinden Di Biase
Date Deposited: 19 Jan 2022 15:01
Last Modified: 05 Dec 2022 15:59
Publisher DOI: 10.1126/sciadv.abe4166
PubMed ID: 33827815
BORIS DOI: 10.48350/162987
URI: https://boris.unibe.ch/id/eprint/162987
