Interpreting High-Resolution Spectroscopy of Exoplanets Using Cross-Correlations and Supervised Machine Learning

Fisher, Chloe; Hoeijmakers, H. Jens; Kitzmann, Daniel; Márquez-Neila, Pablo; Grimm, Simon L.; Sznitman, Raphael; Heng, Kevin (3 March 2020). Interpreting High-Resolution Spectroscopy of Exoplanets Using Cross-Correlations and Supervised Machine Learning (arXiv). Cornell University

1910.11627.pdf - Draft Version
Restricted to registered users only
Available under License Publisher holds Copyright.


We present a new method for performing atmospheric retrieval on ground-based, high-resolution data of exoplanets. Our method combines cross-correlation functions with a random forest, a supervised machine learning technique, to overcome challenges associated with high-resolution data. A series of cross-correlation functions are concatenated to give a "CCF-sequence" for each model atmosphere, which reduces the dimensionality by a factor of ~100. The random forest, trained on our grid of ~65,000 models, provides a likelihood-free method of retrieval. The pre-computed grid spans 31 values of both temperature and metallicity, and incorporates a realistic noise model. We apply our method to HARPS-N observations of the ultra-hot Jupiter KELT-9b, and obtain a metallicity consistent with solar (log M = −0.2 ± 0.2). Our retrieved transit chord temperature (T = 6000^{+0}_{−200} K) is unreliable as the ion cross-correlations lie outside of the training set, which we interpret as being indicative of missing physics in our atmospheric model. We compare our method to traditional nested-sampling, as well as other machine learning techniques, such as Bayesian neural networks. We demonstrate that the likelihood-free aspect of the random forest makes it more robust than nested-sampling to different error distributions, and that the Bayesian neural network we tested is unable to reproduce complex posteriors. We also address the claim in Cobb et al. (2019) that our random forest retrieval technique can be over-confident but incorrect. We show that this is an artefact of the training set, rather than the machine learning method, and that the posteriors agree with those obtained using nested-sampling.
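The "CCF-sequence" idea in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy Gaussian line templates, the species labels A and B, the integer pixel shifts, and all sizes are assumptions chosen for illustration only. A 2200-pixel toy spectrum is cross-correlated with two templates over 11 shifts each, and the two cross-correlation functions are concatenated into a 22-element feature vector, a reduction by a factor of 100 in this toy setup. A vector of this kind is what would be passed to a supervised learner such as a random forest.

```python
import math

N_PIXELS = 2200   # toy spectrum length (assumption)
MAX_SHIFT = 5     # CCF evaluated at integer shifts -5..+5 (assumption)


def gaussian_lines(n_pixels, centres, width=1.0):
    """Toy template: a sum of unit-height Gaussian lines at the given pixel centres."""
    return [
        sum(math.exp(-0.5 * ((x - c) / width) ** 2) for c in centres)
        for x in range(n_pixels)
    ]


def cross_correlate(spectrum, template, max_shift):
    """Cross-correlation of spectrum with template at shifts -max_shift..+max_shift."""
    n = len(spectrum)
    ccf = []
    for shift in range(-max_shift, max_shift + 1):
        total = 0.0
        for i in range(n):
            j = i - shift  # template pixel that lines up with spectrum pixel i
            if 0 <= j < n:
                total += spectrum[i] * template[j]
        ccf.append(total)
    return ccf


def ccf_sequence(spectrum, templates, max_shift):
    """Concatenate one CCF per template into a single feature vector."""
    sequence = []
    for template in templates:
        sequence.extend(cross_correlate(spectrum, template, max_shift))
    return sequence


# Hypothetical line lists for two species, labelled A and B.
centres_a = [300, 900, 1500, 2000]
centres_b = [500, 1100, 1800]
template_a = gaussian_lines(N_PIXELS, centres_a)
template_b = gaussian_lines(N_PIXELS, centres_b)

# Toy "observation": the lines of species A, Doppler-shifted by 3 pixels.
spectrum = gaussian_lines(N_PIXELS, [c + 3 for c in centres_a])

sequence = ccf_sequence(spectrum, [template_a, template_b], MAX_SHIFT)

peak_index = sequence.index(max(sequence))
print("CCF-sequence length:", len(sequence))
print("reduction factor:", len(spectrum) // len(sequence))
print("shift at the peak of the species-A CCF:", peak_index - MAX_SHIFT)
```

In this sketch the species-A part of the sequence peaks at the injected shift, while the species-B part stays near zero because that species is absent from the toy spectrum. Noise, continuum removal, and a velocity grid in physical units are all omitted here.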

Item Type:

Working Paper


08 Faculty of Science > Physics Institute > Space Research and Planetary Sciences
10 Strategic Research Centers > ARTORG Center for Biomedical Engineering Research
08 Faculty of Science > Physics Institute
10 Strategic Research Centers > Center for Space and Habitability (CSH)
08 Faculty of Science > Physics Institute > NCCR PlanetS

UniBE Contributor:

Fisher, Chloe Elizabeth; Hoeijmakers, Herman Jens; Kitzmann, Daniel; Márquez Neila, Pablo; Grimm, Simon Lukas; Sznitman, Raphael; Heng, Kevin


500 Science > 570 Life sciences; biology
600 Technology > 610 Medicine & health
500 Science > 520 Astronomy
500 Science
500 Science > 530 Physics




Cornell University




Danielle Zemp

Date Deposited:

13 May 2020 11:33

Last Modified:

02 Mar 2023 23:33



