Density estimation on low-dimensional manifolds

Horvat, Christian (2022). Density estimation on low-dimensional manifolds (Unpublished). (Dissertation)

horvat_christian_PhDThesis.pdf - Other
Restricted to registered users only
Available under License BORIS Standard License.


Machine learning models large, potentially high-dimensional datasets using the mathematical rigor of probability theory. A fundamental assumption is that there is a latent variable $Z\in \mathbb{R}^{d}$ with latent density $\pi(z)$ and a generator mapping $f$ such that the data are realizations of the random variable $X=f(Z) \in \mathbb{R}^{D}$ with density $p(x)$. A special case of this setting is where $f$ is an embedding, i.e. a continuously differentiable mapping with a continuously differentiable inverse. If $d<D$, this special case is often referred to as the manifold hypothesis, i.e. high-dimensional data populate a low-dimensional manifold in the embedding space. Normalizing Flows (NFs) are bijective neural networks which can be used to learn any $p(x)$ with support diffeomorphic to $\mathbb{R}^{D}$, i.e. NFs learn $f$ exactly when $d=D$. However, when $d<D$, standard NFs fail to learn $f$ and therefore $p(x)$. In this thesis, we show how to overcome this topological constraint of standard NFs (first main result). We prove that by adding a specific noise in the manifold's normal space, we can still learn $p(x)$ exactly using a standard NF. When a standard Gaussian is used instead of a Gaussian in the manifold's normal space, our method can be used to approximate any density $p(x)$ supported on an unknown low-dimensional manifold. Based on this theoretical foundation, we show that we can learn not only $f$ and $p(x)$, but also the inverse $f^{-1}$, which allows us to compress the data into low dimensions (second main result). The method, coined denoising normalizing flow (DNF), learns a denoising mapping after inflating the data with standard Gaussian noise and is trained such that the first $d$ latent variables are noise-insensitive and thus encode the manifold. However, this requires knowing $d$ a priori, which limits the applicability of the DNF in real-world scenarios where this number is unknown. Existing methods to estimate $d$ do not scale to high dimensions. We provide a new method that can estimate $d$ even in this high-dimensional case (third main result).
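
The inflation step mentioned in the abstract can be illustrated with a small NumPy sketch. This is not code from the thesis: the toy manifold (an open half-circle, $d=1$, $D=2$), the function names `sample_arc`, `inflate_normal` and `inflate_isotropic`, and the noise scale `sigma` are all assumptions chosen for illustration. It only contrasts noise restricted to the manifold's normal space with isotropic Gaussian noise in the embedding space; it does not train a flow.

```python
import numpy as np

def sample_arc(n, rng):
    """Draw n points on the upper unit half-circle (toy manifold, d=1, D=2)."""
    theta = rng.uniform(0.0, np.pi, size=n)
    return np.stack([np.cos(theta), np.sin(theta)], axis=1)

def inflate_normal(x, sigma, rng):
    """Add Gaussian noise only along the normal direction of the arc."""
    # On a unit circle centred at the origin, the unit normal at x is x itself.
    eps = rng.normal(0.0, sigma, size=(x.shape[0], 1))
    return x + eps * x

def inflate_isotropic(x, sigma, rng):
    """Add isotropic Gaussian noise in the full embedding space."""
    return x + rng.normal(0.0, sigma, size=x.shape)

rng = np.random.default_rng(0)
x = sample_arc(1000, rng)
x_normal = inflate_normal(x, 0.05, rng)   # stays on the ray through each x
x_iso = inflate_isotropic(x, 0.05, rng)   # also moves tangentially

# Tangential displacement: 2D cross product between x and the noisy point.
tangential_normal = x[:, 0] * x_normal[:, 1] - x[:, 1] * x_normal[:, 0]
tangential_iso = x[:, 0] * x_iso[:, 1] - x[:, 1] * x_iso[:, 0]
```

In this toy setting, `tangential_normal` is zero up to floating-point error, so each noisy point can be mapped back to a unique point on the arc, whereas the isotropic variant also perturbs points along the manifold.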

Item Type:

Thesis (Dissertation)

Division/Institute:

04 Faculty of Medicine > Pre-clinic Human Medicine > Institute of Physiology

Graduate School:

Graduate School for Cellular and Biomedical Sciences (GCB)

UniBE Contributor:

Horvat, Christian

Subjects:

600 Technology > 610 Medicine & health

Funders:

Swiss National Science Foundation

Language:

English

Submitter:

Christian Horvat

Date Deposited:

28 Dec 2023 07:12

Last Modified:

28 Dec 2023 07:12

BORIS DOI:

10.48350/190721

URI:

https://boris.unibe.ch/id/eprint/190721
