Interpreting High-resolution Spectroscopy of Exoplanets using Cross-correlations and Supervised Machine Learning

Fisher, Chloe; Hoeijmakers, H. Jens; Kitzmann, Daniel; Márquez-Neila, Pablo; Grimm, Simon L.; Sznitman, Raphael; Heng, Kevin (2020). Interpreting High-resolution Spectroscopy of Exoplanets using Cross-correlations and Supervised Machine Learning. The Astronomical Journal, 159(5), p. 192. American Astronomical Society. doi:10.3847/1538-3881/ab7a92

Fisher_2020_AJ_159_192.pdf - Published Version
Available under License Creative Commons: Attribution (CC-BY).

We present a new method for performing atmospheric retrieval on ground-based, high-resolution data of exoplanets. Our method combines cross-correlation functions with a random forest, a supervised machine-learning technique, to overcome challenges associated with high-resolution data. A series of cross-correlation functions are concatenated to give a "CCF-sequence" for each model atmosphere, which reduces the dimensionality by a factor of ~100. The random forest, trained on our grid of ~65,000 models, provides a likelihood-free method of retrieval. The precomputed grid spans 31 values of both temperature and metallicity, and incorporates a realistic noise model. We apply our method to HARPS-N observations of the ultra-hot Jupiter KELT-9b and obtain a metallicity consistent with solar ($\log M = -0.2 \pm 0.2$). Our retrieved transit chord temperature ($T = 6000_{-200}^{+0}$ K) is unreliable as strong ion lines lie outside of the extent of the training set, which we interpret as being indicative of missing physics in our atmospheric model. We compare our method to traditional nested sampling, as well as other machine-learning techniques, such as Bayesian neural networks. We demonstrate that the likelihood-free aspect of the random forest makes it more robust than nested sampling to different error distributions, and that the Bayesian neural network we tested is unable to reproduce complex posteriors. We also address the claim in Cobb et al. (2019) that our random forest retrieval technique can be overconfident but incorrect. We show that this is an artifact of the training set, rather than of the machine-learning method, and that the posteriors agree with those obtained using nested sampling.
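
As a rough illustration of the CCF-sequence idea described in the abstract, the following sketch cross-correlates a toy spectrum with two line templates and concatenates the results into one feature vector. Everything here is an assumption made for illustration: the function names `ccf` and `ccf_sequence`, the Gaussian toy templates, the integer-pixel lags, and the noise level are not taken from the paper, and the real analysis works in velocity space with physically computed model spectra.

```python
import numpy as np

N_PIX = 4000   # toy spectrum length in pixels (assumed value)
MAX_LAG = 10   # CCF is evaluated at lags -10..+10 pixels (assumed value)

def line_template(centres, width=3.0, depth=0.1):
    """Toy template: flat continuum minus Gaussian absorption lines."""
    pixels = np.arange(N_PIX)
    absorption = np.zeros(N_PIX)
    for c in centres:
        absorption += np.exp(-0.5 * ((pixels - c) / width) ** 2)
    return 1.0 - depth * absorption

def ccf(spectrum, template, max_lag=MAX_LAG):
    """Cross-correlation of mean-subtracted arrays at integer pixel lags."""
    s = spectrum - spectrum.mean()
    t = template - template.mean()
    lags = np.arange(-max_lag, max_lag + 1)
    # np.roll applies a circular shift, which is adequate for a toy example.
    return np.array([np.dot(s, np.roll(t, lag)) for lag in lags])

def ccf_sequence(spectrum, templates, max_lag=MAX_LAG):
    """Concatenate the CCFs against each template into one feature vector."""
    return np.concatenate([ccf(spectrum, t, max_lag) for t in templates])

rng = np.random.default_rng(0)
template_a = line_template(rng.choice(N_PIX, size=20, replace=False))
template_b = line_template(rng.choice(N_PIX, size=20, replace=False))

# Toy "observation": template A at zero shift plus Gaussian noise.
spectrum = template_a + rng.normal(0.0, 0.01, N_PIX)

features = ccf_sequence(spectrum, [template_a, template_b])
print(features.shape)  # (42,)
```

Here a 4000-pixel spectrum is summarised by 42 numbers, and the block belonging to the matching template peaks at zero lag. In a retrieval setting, one such vector would be computed for every model in the training grid and passed, together with that model's temperature and metallicity, to a random forest regressor (for example scikit-learn's `RandomForestRegressor`); that step is omitted here to keep the sketch light on dependencies.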

Item Type:

Journal Article (Original Article)


Division/Institute:

08 Faculty of Science > Physics Institute > Space Research and Planetary Sciences
08 Faculty of Science > Physics Institute
10 Strategic Research Centers > Center for Space and Habitability (CSH)
08 Faculty of Science > Physics Institute > NCCR PlanetS

UniBE Contributor:

Fisher, Chloe Elizabeth; Hoeijmakers, Herman Jens; Kitzmann, Daniel; Grimm, Simon Lukas; Heng, Kevin


Subjects:

500 Science > 520 Astronomy
500 Science > 530 Physics
600 Technology > 620 Engineering




Publisher:

American Astronomical Society




Submitter:

Simon Lukas Grimm

Date Deposited:

09 Mar 2021 14:58

Last Modified:

11 Mar 2021 07:12

Publisher DOI:

10.3847/1538-3881/ab7a92


