Räz, Tim (2023). Methods for identifying emergent concepts in deep neural networks. Patterns, 4(6), p. 100761. Cell Press. DOI: 10.1016/j.patter.2023.100761
1-s2.0-S266638992300106X-main.pdf - Published Version. Available under License Creative Commons: Attribution-Noncommercial-No Derivative Works (CC-BY-NC-ND).
The present perspective discusses methods to detect concepts in internal representations (hidden layers) of deep neural networks (DNNs), such as network dissection, feature visualization, and testing with concept activation vectors (TCAV). I argue that these methods provide evidence that DNNs are able to learn non-trivial relations between concepts. However, the methods also require users to specify or detect concepts via (sets of) instances. This underdetermines the meaning of concepts, making the methods unreliable. The problem could be overcome, to some extent, by systematically combining the methods and by using synthetic datasets. The perspective also discusses how conceptual spaces (sets of concepts in internal representations) are shaped by a trade-off between predictive accuracy and compression. I argue that conceptual spaces are useful, or even necessary, to understand how concepts are formed in DNNs but that there is a lack of method for studying conceptual spaces.
Item Type: | Journal Article (Further Contribution) |
---|---|
Division/Institute: | 06 Faculty of Humanities > Department of Art and Cultural Studies > Institute of Philosophy |
UniBE Contributor: | Räz, Tim |
Subjects: | 100 Philosophy |
ISSN: | 2666-3899 |
Publisher: | Cell Press |
Language: | English |
Submitter: | Pubmed Import |
Date Deposited: | 06 Jul 2023 16:04 |
Last Modified: | 16 Jul 2023 02:27 |
Publisher DOI: | 10.1016/j.patter.2023.100761 |
PubMed ID: | 37409048 |
Uncontrolled Keywords: | TCAV, concepts, deep neural networks, feature visualization, image classification, internal representation, interpretability, network dissection |
BORIS DOI: | 10.48350/184550 |
URI: | https://boris.unibe.ch/id/eprint/184550 |