Szabo, Attila (2019). Learning Interpretable Representations of Images. (Dissertation, Universität Bern, Philosophisch-naturwissenschaftliche Fakultät)
Full text: Attila_Szabo_thesis.pdf (27 MB), restricted to registered users, available under the BORIS Standard License.
Computers represent images as grids of pixels, where each pixel stores three numbers for the red, green and blue colour values. These raw numbers are meaningless to humans, and they are largely useless when fed directly into classical machine learning techniques such as linear classifiers. Interpretable representations are the attributes that humans understand: the colour of a person's hair, the viewpoint of a car or the 3D shape of the object in the scene. Many computer vision tasks can be viewed as learning interpretable representations; for example, a supervised classification algorithm directly learns to represent images with their class labels.
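As a concrete illustration (not from the thesis), a colour image is simply a height × width × 3 array of numbers, while an interpretable representation is a handful of human-readable attributes:

```python
import numpy as np

# A 2x2 RGB image: each pixel is three numbers in [0, 255].
image = np.array([
    [[255, 0, 0], [0, 255, 0]],       # red pixel, green pixel
    [[0, 0, 255], [255, 255, 255]],   # blue pixel, white pixel
], dtype=np.uint8)
print(image.shape)  # (2, 2, 3): height, width, colour channels

# An interpretable representation, by contrast, consists of a few
# human-readable attributes (hypothetical names, for illustration).
attributes = {"hair_colour": "blond", "car_viewpoint_deg": 30.0}
```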
In this work we aim to learn interpretable representations (or features) indirectly, with lower levels of supervision. This approach reduces annotation costs and allows the learned features to be reused for multiple downstream tasks. We make contributions in three main areas: weakly supervised learning, unsupervised learning and 3D reconstruction.
In the weakly supervised case we use image pairs as supervision. Each pair shares a common attribute and differs in a varying attribute. We propose a training method that learns to separate these attributes into distinct feature vectors, which are then used for attribute transfer and classification. We also present theoretical results on the ambiguities of the learning task and on how to avoid degenerate solutions.
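A minimal sketch of this pair-based training idea, assuming a simple autoencoder whose latent code is split into a "common" part and a "varying" part; all names, sizes and the swap-based loss below are illustrative assumptions, not the thesis implementation:

```python
import torch
import torch.nn as nn

class SplitAutoencoder(nn.Module):
    """Encodes an image into a common part c and a varying part v."""
    def __init__(self, dim=784, c_dim=8, v_dim=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(),
                                 nn.Linear(128, c_dim + v_dim))
        self.dec = nn.Sequential(nn.Linear(c_dim + v_dim, 128), nn.ReLU(),
                                 nn.Linear(128, dim))
        self.c_dim = c_dim

    def encode(self, x):
        z = self.enc(x)
        return z[:, :self.c_dim], z[:, self.c_dim:]  # common, varying

def pair_loss(model, x1, x2):
    # x1 and x2 share the common attribute and differ in the varying one.
    c1, v1 = model.encode(x1)
    c2, v2 = model.encode(x2)
    # Reconstructing x1 from x2's common part and x1's varying part
    # forces the shared attribute into c and the rest into v.
    recon = model.dec(torch.cat([c2, v1], dim=1))
    return nn.functional.mse_loss(recon, x1)

# Usage: x1, x2 = a batch of image pairs, flattened to vectors.
model = SplitAutoencoder()
x1, x2 = torch.rand(4, 784), torch.rand(4, 784)
loss = pair_loss(model, x1, x2)
loss.backward()
```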
We present a method for unsupervised representation learning that separates semantically meaningful concepts. Through explanations and ablation studies, we show how the components of our proposed method work together: a mixing autoencoder, a generative adversarial network and a classifier.
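A toy sketch of how such a mixing autoencoder could be paired with an adversarial loss; the chunk-wise mixing, layer sizes and loss below are assumptions for illustration, not the thesis architecture:

```python
import torch
import torch.nn as nn

# Toy components; sizes and layers are illustrative assumptions.
enc = nn.Sequential(nn.Linear(784, 64))   # image -> latent code
dec = nn.Sequential(nn.Linear(64, 784))   # latent code -> image
disc = nn.Sequential(nn.Linear(784, 1))   # real-vs-mixed score

def mix(z_a, z_b, k=32):
    # Take the first k latent dimensions from one image and the rest
    # from another; a good code makes any mix decode realistically.
    return torch.cat([z_a[:, :k], z_b[:, k:]], dim=1)

x_a, x_b = torch.rand(4, 784), torch.rand(4, 784)
z_a, z_b = enc(x_a), enc(x_b)
x_mix = dec(mix(z_a, z_b))

# Adversarial signal: the discriminator should not be able to tell
# decoded mixes from real images (non-saturating GAN loss).
g_loss = nn.functional.binary_cross_entropy_with_logits(
    disc(x_mix), torch.ones(4, 1))
```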
We propose a method for single-image 3D reconstruction that is trained using only images: no human annotation, stereo pairs, synthetic renderings or ground-truth depth maps are needed. We train a generative model that learns the 3D shape distribution, together with an encoder that reconstructs the 3D shape. To do so, we exploit the notion of image realism: the 3D reconstruction of an object must look realistic when rendered from different random viewpoints. We prove the efficacy of our method from first principles.
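A schematic of the realism objective, with a placeholder "renderer" standing in for a real differentiable renderer; every name, shape and layer here is an illustrative assumption rather than the thesis implementation:

```python
import torch
import torch.nn as nn

encoder = nn.Linear(784, 32)   # image -> 3D shape code (toy stand-in)
disc = nn.Linear(784, 1)       # realism score of a rendered view

def render(shape_code, angle):
    """Placeholder differentiable renderer: in practice this would
    project the 3D shape to an image from the given viewpoint."""
    view = torch.cat([shape_code,
                      angle.expand(shape_code.size(0), 1)], dim=1)
    return torch.tanh(nn.functional.linear(view, torch.rand(784, 33)))

x = torch.rand(4, 784)           # batch of input images
shape = encoder(x)               # predicted 3D shape code
angle = torch.rand(1) * 6.28     # random viewpoint in radians
fake_view = render(shape, angle)

# A rendering from a random angle should fool the realism
# discriminator, pushing the encoder toward plausible 3D shapes.
loss = nn.functional.binary_cross_entropy_with_logits(
    disc(fake_view), torch.ones(4, 1))
loss.backward()
```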
| Item Type: | Thesis (Dissertation) |
| --- | --- |
| Division/Institute: | 08 Faculty of Science > Institute of Computer Science (INF); 08 Faculty of Science > Institute of Computer Science (INF) > Computer Vision Group (CVG) |
| UniBE Contributor: | Szabo, Attila |
| Subjects: | 000 Computer science, knowledge & systems; 500 Science > 510 Mathematics |
| Language: | English |
| Submitter: | Llukman Cerkezi |
| Date Deposited: | 31 Aug 2022 15:00 |
| Last Modified: | 05 Dec 2022 16:23 |
| BORIS DOI: | 10.48350/172507 |
| URI: | https://boris.unibe.ch/id/eprint/172507 |