Methods for identifying emergent concepts in deep neural networks.

Räz, Tim (2023). Methods for identifying emergent concepts in deep neural networks. Patterns, 4(6), p. 100761. Cell Press. doi:10.1016/j.patter.2023.100761

1-s2.0-S266638992300106X-main.pdf - Published Version
Available under License Creative Commons: Attribution-Noncommercial-No Derivative Works (CC-BY-NC-ND).

The present perspective discusses methods to detect concepts in internal representations (hidden layers) of deep neural networks (DNNs), such as network dissection, feature visualization, and testing with concept activation vectors (TCAV). I argue that these methods provide evidence that DNNs are able to learn non-trivial relations between concepts. However, the methods also require users to specify or detect concepts via (sets of) instances. This underdetermines the meaning of concepts, making the methods unreliable. The problem could be overcome, to some extent, by systematically combining the methods and by using synthetic datasets. The perspective also discusses how conceptual spaces (sets of concepts in internal representations) are shaped by a trade-off between predictive accuracy and compression. I argue that conceptual spaces are useful, or even necessary, to understand how concepts are formed in DNNs but that there is a lack of methods for studying conceptual spaces.

Item Type:

Journal Article (Further Contribution)

Division/Institute:

06 Faculty of Humanities > Department of Art and Cultural Studies > Institute of Philosophy

UniBE Contributor:

Räz, Tim

Subjects:

100 Philosophy

ISSN:

2666-3899

Publisher:

Cell Press

Language:

English

Submitter:

Pubmed Import

Date Deposited:

06 Jul 2023 16:04

Last Modified:

16 Jul 2023 02:27

Publisher DOI:

10.1016/j.patter.2023.100761

PubMed ID:

37409048

Uncontrolled Keywords:

TCAV; concepts; deep neural networks; feature visualization; image classification; internal representation; interpretability; network dissection

BORIS DOI:

10.48350/184550

URI:

https://boris.unibe.ch/id/eprint/184550
