Detecting food items with possible allergens using deep learning

Mayank Mishra
Tanmay Sarkar
Tanupriya Choudhury

Project Overview


Food-triggered allergies are common, and in severe cases an allergic reaction can be life-threatening. Such reactions can often be avoided by knowing which allergens a food item may contain. The most common intolerances are caused by foods containing compounds such as lactose, histamine, caffeine, gluten, and salicylate. Our project is a step toward building a robust object detection model that recognizes food items containing these compounds, helping to prevent possible allergic reactions.

Figure 1: Common allergy-causing compounds

Dataset preparation


  • We built our own dataset for the project. After extensive research, we finalized a list of commonly consumed food categories that are rich in the compounds shown in Figure 1.

  • For each food item on the list, we used JavaScript queries to crawl several search engines (Google, Bing, etc.) for suitable images. Images with incomplete RGB channels were removed, and the images collected from the different search engines were merged into one collection.

  • Merging the collections introduced a number of duplicate images. We used image hashing to detect these duplicates and deleted the extra copies.

  • Many images returned by search engines are irrelevant to our purpose, especially those containing text. We used the EAST text detector (based on the paper EAST: An Efficient and Accurate Scene Text Detector) to identify and separate such images.

  • Finally, we conducted a comprehensive manual inspection to ensure the relevancy of images in the dataset.
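The page does not say which image-hashing method was used for de-duplication. Purely as an illustration, the sketch below assumes a difference hash (dHash) computed on grayscale images given as plain lists of pixel rows; the helper names (resize_nearest, dhash, hamming, deduplicate) and the max_distance parameter are hypothetical. Two images are treated as duplicates when their 64-bit hashes differ in at most max_distance bits.

```python
from typing import Dict, List, Sequence

# Hypothetical input format: a grayscale image as rows of 0-255 intensities.
Gray = Sequence[Sequence[int]]


def resize_nearest(img: Gray, width: int, height: int) -> List[List[int]]:
    """Nearest-neighbour resize (a real pipeline would use an image library)."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(img: Gray, hash_size: int = 8) -> int:
    """Difference hash: one bit per pair of horizontally adjacent pixels
    in a (hash_size + 1) x hash_size thumbnail, set when left > right."""
    small = resize_nearest(img, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def deduplicate(images: Dict[str, Gray], max_distance: int = 0) -> List[str]:
    """Keep the first image of every group of near-identical hashes
    (assumed policy: later copies are the ones deleted)."""
    kept: List[str] = []
    kept_hashes: List[int] = []
    for name, img in images.items():
        h = dhash(img)
        if all(hamming(h, other) > max_distance for other in kept_hashes):
            kept.append(name)
            kept_hashes.append(h)
    return kept
```

Raising max_distance above 0 would also catch re-encoded or slightly resized copies, at the cost of occasionally merging distinct but visually similar images.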

Object Detection


We used several object detection algorithms to recognize food items with potential allergens in natural scene images. As a baseline, we deployed two-stage object detection algorithms from the R-CNN model family. These models first propose a set of regions of interest, using either selective search or a region proposal network, and a classifier then processes the candidate regions to make predictions.
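To make the second stage more concrete, the sketch below shows how region proposals are commonly labeled when training an R-CNN-style classifier: each proposal takes the class of the ground-truth box it overlaps most, measured by intersection over union (IoU). The thresholds (IoU of at least 0.5 for foreground, 0.1 as the lower bound for background) follow the Fast R-CNN defaults and are assumptions here, as are the function names and the example class label.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), corner coordinates


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_proposals(
    proposals: Sequence[Box],
    gt_boxes: Sequence[Box],
    gt_labels: Sequence[str],
    fg_thresh: float = 0.5,  # assumed Fast R-CNN-style foreground threshold
    bg_low: float = 0.1,     # assumed lower bound for background samples
) -> List[Optional[str]]:
    """Return a class label, "background", or None (ignored) per proposal."""
    labels: List[Optional[str]] = []
    for p in proposals:
        overlaps = [iou(p, g) for g in gt_boxes]
        best_iou = max(overlaps, default=0.0)
        if best_iou >= fg_thresh:
            labels.append(gt_labels[overlaps.index(best_iou)])
        elif best_iou >= bg_low:
            labels.append("background")
        else:
            labels.append(None)  # too little overlap to be a useful sample
    return labels
```

For example, with a single hypothetical ground-truth box labeled "cheese" at (0, 0, 10, 10), a proposal at the same location is labeled "cheese", one shifted to (5, 0, 15, 10) has IoU 1/3 and becomes "background", and a distant one is ignored.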

We further studied the performance of several one-stage object detection algorithms on our dataset: YOLO, which achieves very fast inference by predicting over a limited number of bounding boxes; the Single Shot Detector (SSD), which uses the pyramidal feature hierarchy of a convolutional neural network to detect objects of various sizes efficiently; and RetinaNet, which is based on a featurized image pyramid and uses the focal loss.
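RetinaNet's focal loss rescales cross-entropy so that well-classified (easy) examples contribute less to training: FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class. The sketch below is a minimal per-example version for a single binary prediction; α = 0.25 and γ = 2 are the defaults reported in the RetinaNet paper and are not settings confirmed for this project. The function names are hypothetical.

```python
import math


def binary_cross_entropy(p: float, target: int, eps: float = 1e-12) -> float:
    """Standard cross-entropy for one prediction; p = P(positive class)."""
    p_t = p if target == 1 else 1.0 - p
    return -math.log(max(p_t, eps))  # eps guards against log(0)


def binary_focal_loss(
    p: float, target: int, alpha: float = 0.25, gamma: float = 2.0, eps: float = 1e-12
) -> float:
    """Focal loss: cross-entropy weighted by alpha_t and the modulating
    factor (1 - p_t) ** gamma, which shrinks the loss of easy examples."""
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))
```

With γ = 0 the expression reduces to α-weighted cross-entropy. With γ = 2, a positive predicted at p = 0.9 keeps only (1 - 0.9)^2 = 1% of its α-weighted cross-entropy, which is how the loss keeps the many easy background anchors from dominating training.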


The dataset, code, and paper will be released soon.

Webpage last updated: June 2021