Self-Attention and Ingredient-Attention Based Model for Recipe Retrieval from Image Queries

Fontanellaz, Matthias; Christodoulidis, Stergios; Mougiakakou, Stavroula (21 October 2019). Self-Attention and Ingredient-Attention Based Model for Recipe Retrieval from Image Queries. In: 5th International Workshop on Multimedia Assisted Dietary Management (MADiMa '19) (pp. 25-31). New York: ACM 10.1145/3347448.3357163


Direct computer vision-based estimation of nutrient content is a demanding task, due to deformation and occlusion of ingredients, as well as high intra-class and low inter-class variability between meal classes. To tackle these issues, we propose a system for recipe retrieval from images. The recipe information can subsequently be used to estimate the nutrient content of the meal. In this study, we utilize the multi-modal Recipe1M dataset, which contains over 1 million recipes accompanied by over 13 million images. The proposed model can operate as a first step in an automatic pipeline for the estimation of nutrient content by providing hints related to ingredients and instructions. Through self-attention, our model can directly process raw recipe text, making the upstream instruction sentence embedding process redundant and thus reducing training time, while providing desirable retrieval results. Furthermore, we propose the use of an ingredient attention mechanism to gain insight into which instructions, parts of instructions, or single instruction words are important for processing a single ingredient within a certain recipe. Attention-based recipe text encoding helps to address the issue of high intra-class/low inter-class variability by focusing on preparation steps specific to the meal. The experimental results demonstrate the potential of such a system for recipe retrieval from images. A comparison with two baseline methods is also presented.
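
The ingredient-attention and retrieval ideas in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention with ingredient embeddings as queries over instruction word embeddings, and it assumes images and recipes are embedded in a shared space and ranked by cosine similarity, as is common in cross-modal retrieval. The function names `ingredient_attention` and `rank_recipes`, the embedding size, and the random inputs are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the maximum along the axis for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def ingredient_attention(ingredients, instruction_tokens):
    """Scaled dot-product attention with ingredients as queries (assumed form).

    ingredients:        (n_ingredients, d) ingredient embeddings
    instruction_tokens: (n_tokens, d) instruction word embeddings
    Returns (context, weights), where weights[i, j] indicates how strongly
    ingredient i attends to instruction token j.
    """
    d = ingredients.shape[-1]
    scores = ingredients @ instruction_tokens.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    context = weights @ instruction_tokens
    return context, weights


def rank_recipes(image_embedding, recipe_embeddings):
    """Return recipe indices sorted by descending cosine similarity."""
    img = image_embedding / np.linalg.norm(image_embedding)
    rec = recipe_embeddings / np.linalg.norm(
        recipe_embeddings, axis=1, keepdims=True
    )
    return np.argsort(-(rec @ img))


# Hypothetical toy data: 3 ingredients, 5 instruction tokens, 8 dimensions.
rng = np.random.default_rng(0)
ingredients = rng.normal(size=(3, 8))
tokens = rng.normal(size=(5, 8))
context, weights = ingredient_attention(ingredients, tokens)

# Hypothetical retrieval: one image embedding against 4 recipe embeddings.
recipes = np.eye(4)
image = np.array([0.1, 0.9, 0.2, 0.0])
ranking = rank_recipes(image, recipes)
```

Each row of `weights` sums to one, so it can be read as a distribution over instruction words for a given ingredient, which is the kind of insight the abstract attributes to the ingredient attention mechanism.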

Item Type:	Conference or Workshop Item (Paper)
Division/Institute:	10 Strategic Research Centers > ARTORG Center for Biomedical Engineering Research
Graduate School:	Graduate School for Cellular and Biomedical Sciences (GCB)
UniBE Contributor:	Fontanellaz, Matthias Andreas; Christodoulidis, Stergios; Mougiakakou, Stavroula
Subjects:	600 Technology > 610 Medicine & health; 600 Technology > 620 Engineering
ISBN:	978-1-4503-6916-9
Publisher:	ACM
Language:	English
Submitter:	Stavroula Mougiakakou
Date Deposited:	17 Dec 2019 14:37
Last Modified:	02 Mar 2023 23:32
Publisher DOI:	10.1145/3347448.3357163
Uncontrolled Keywords:	Neural Networks, Deep Learning, Cross-modal Retrieval, Natural Language Processing, Self-attention
BORIS DOI:	10.7892/boris.135255
URI:	https://boris.unibe.ch/id/eprint/135255
