Horvat, Christian (2022). Density estimation on low-dimensional manifolds (Unpublished). (Dissertation)
Text: horvat_christian_PhDThesis.pdf (18MB). Restricted to registered users only. Available under License BORIS Standard License.
Machine learning models large datasets in potentially high dimensions using the mathematical rigor of probability theory. A fundamental assumption is that there exist a latent variable $Z\in \mathbb{R}^{d}$ with latent density $\pi(z)$ and a generator mapping $f$ such that the data are realizations of the random variable $X=f(Z) \in \mathbb{R}^{D}$ with density $p(x)$. A special case of this setting is where $f$ is an embedding, i.e. a continuously differentiable mapping with a continuously differentiable inverse. If $d<D$, this special case is often referred to as the manifold hypothesis, i.e. high-dimensional data populate a low-dimensional manifold in the embedding space. Normalizing Flows (NFs) are bijective neural networks which can be used to learn any $p(x)$ with support diffeomorphic to $\mathbb{R}^{D}$, i.e. NFs learn $f$ exactly when $d=D$. However, when $d<D$, standard NFs fail to learn $f$ and therefore $p(x)$. In this thesis, we show how to overcome this topological constraint of standard NFs (first main result). We prove that by adding a specific noise in the manifold's normal space, we can still learn $p(x)$ exactly using a standard NF. When using a standard Gaussian instead of a Gaussian in the manifold's normal space, our method can be used to approximate any density $p(x)$ supported on an unknown low-dimensional manifold. Based on this theoretical foundation, we show that we can learn not only $f$ and $p(x)$ but also the inverse $f^{-1}$, which allows us to compress the data into low dimensions (second main result). The method, coined denoising normalizing flow (DNF), learns a denoising mapping after inflating the data with standard Gaussian noise and is trained such that the first $d$ latent variables are noise-insensitive and thus encode the manifold. However, this requires knowing $d$ a priori, which limits the applicability of the DNF in real-world scenarios where this number is unknown.
Existing methods for estimating $d$ do not scale to high dimensions. We provide a new method that can estimate $d$ in this high-dimensional case as well (third main result).
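To make the inflation step described in the abstract concrete, the following is a minimal sketch of the two kinds of noise on a toy manifold. It is not the thesis implementation: the unit circle (so $d=1$, $D=2$), the noise scale `sigma`, and the helper names `sample_circle`, `inflate_normal`, and `inflate_isotropic` are all assumptions chosen for illustration, and no normalizing flow is trained here.

```python
import numpy as np

def sample_circle(n, rng):
    """Toy generator f: maps a latent angle z (d=1) to the unit circle in R^2 (D=2)."""
    z = rng.uniform(0.0, 2.0 * np.pi, size=n)
    x = np.stack([np.cos(z), np.sin(z)], axis=1)
    return z, x

def inflate_normal(x, sigma, rng):
    """Add Gaussian noise only in the normal space of the manifold.

    For the unit circle the normal direction at x is x itself (an assumption
    specific to this toy example; in general the normal space is unknown).
    """
    normals = x / np.linalg.norm(x, axis=1, keepdims=True)
    eps = rng.normal(0.0, sigma, size=(x.shape[0], 1))
    return x + eps * normals

def inflate_isotropic(x, sigma, rng):
    """Add isotropic Gaussian noise in the full embedding space R^D."""
    return x + rng.normal(0.0, sigma, size=x.shape)

rng = np.random.default_rng(0)
z, x = sample_circle(1000, rng)
x_normal = inflate_normal(x, sigma=0.05, rng=rng)   # noise confined to the normal space
x_iso = inflate_isotropic(x, sigma=0.05, rng=rng)   # noise in all D directions
```

In this sketch, `x_normal` moves each point only radially, so its displacement has no tangential component, whereas `x_iso` also perturbs points along the circle. Both inflated datasets have full-dimensional support in $\mathbb{R}^{2}$, which is the property a standard NF needs.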
Item Type: | Thesis (Dissertation) |
---|---|
Division/Institute: | 04 Faculty of Medicine > Pre-clinic Human Medicine > Institute of Physiology |
Graduate School: | Graduate School for Cellular and Biomedical Sciences (GCB) |
UniBE Contributor: | Horvat, Christian |
Subjects: | 600 Technology > 610 Medicine & health |
Funders: | [4] Swiss National Science Foundation |
Language: | English |
Submitter: | Christian Horvat |
Date Deposited: | 28 Dec 2023 07:12 |
Last Modified: | 28 Dec 2023 07:12 |
BORIS DOI: | 10.48350/190721 |
URI: | https://boris.unibe.ch/id/eprint/190721 |