Detecting food items with possible allergen using deep learning
Mayank Mishra
Tanmay Sarkar
Tanupriya Choudhury
Project Overview
Food allergies are common, and in severe cases they can trigger a life-threatening reaction. Such an event can be avoided by knowing which allergens a food item may contain. The most common intolerances are caused by food compounds such as lactose, histamine, caffeine, gluten, and salicylate. Our project is a step toward building a robust object detection model that recognizes food items containing these compounds and so helps prevent a possible allergic reaction.
Figure 1: Common allergy-causing compounds
Dataset Preparation
We built our own dataset for the project. We first researched and finalized a list of food categories that are rich in the compounds shown in Figure 1 and are commonly consumed on a daily basis.
We used several search engines (Google, Bing, etc.) to crawl for suitable images, issuing JavaScript queries for each food item on the list. Images with incomplete RGB channels were removed, and the images collected from the different search engines were then merged into a single set.
Merging produced a number of duplicate images. We used image hashing to detect these duplicates and deleted the extra copies.
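The channel filter described above can be sketched as follows. This is a minimal illustration, not the project's actual code. It assumes that "incomplete RGB channels" means any image that does not decode to exactly three channels (for example grayscale or RGBA), and that each image's array shape is already known. The helper names and the (height, width, channels) shape layout are hypothetical.

```python
def has_three_channels(shape):
    """Return True if an image shape looks like (height, width, 3).

    Assumption: a grayscale image has a 2-tuple shape, and an image with
    an alpha channel has 4 in the last position.
    """
    return len(shape) == 3 and shape[2] == 3


def keep_rgb(shapes):
    """Given a dict of file name -> image shape, return the names to keep."""
    return [name for name, shape in shapes.items() if has_three_channels(shape)]
```

In practice the shapes would come from whichever image library decodes the files. The check itself does not depend on that choice.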
Figure 2: Removing duplicates using hashing
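To illustrate how hashing can flag duplicates, here is a minimal sketch. The text does not say which hash was used, so this example assumes a difference hash (dHash) computed on grayscale pixel grids. The function names and the nearest-neighbour resize are illustrative choices, not the project's implementation.

```python
def resize_nearest(gray, out_h, out_w):
    """Nearest-neighbour resize of a 2D list of grayscale values."""
    in_h, in_w = len(gray), len(gray[0])
    return [
        [gray[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def dhash(gray, hash_size=8):
    """Difference hash: one bit per horizontally adjacent pixel pair."""
    # Shrink to hash_size rows and hash_size + 1 columns so that each row
    # yields exactly hash_size left/right comparisons.
    small = resize_nearest(gray, hash_size, hash_size + 1)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | int(left > right)
    return bits


def find_duplicates(images):
    """Given a dict of name -> grayscale grid, return the names to delete.

    The first image seen with a given hash (in sorted name order) is kept;
    later images with the same hash are reported as duplicates.
    """
    seen = {}
    duplicates = []
    for name in sorted(images):
        h = dhash(images[name])
        if h in seen:
            duplicates.append(name)
        else:
            seen[h] = name
    return duplicates
```

Exact hash equality only catches identical or near-identical copies. A looser variant would compare hashes by Hamming distance against a threshold, at the cost of occasional false matches.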
Many images downloaded from search engines are irrelevant to our purpose, especially those containing text. We used the EAST text detector (from the paper "EAST: An Efficient and Accurate Scene Text Detector") to identify and separate such images.
Finally, we conducted a comprehensive manual inspection to ensure the relevance of the images in the dataset.
Object Detection
We used several object detection algorithms to recognize food items with potential allergens in natural scene images. As a baseline, we used two-stage object detection algorithms from the R-CNN model family. These models first propose a set of regions of interest, using either selective search or a region proposal network, and a classifier then processes each region candidate to make a prediction.
We also studied the performance of several one-stage object detection algorithms on our dataset. These include YOLO, which achieves very fast inference by predicting over a limited number of bounding boxes; the Single Shot Detector (SSD), which uses the pyramidal feature hierarchy of a convolutional neural network to detect objects of various sizes efficiently; and RetinaNet, which is built on a feature pyramid network and trained with focal loss.
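As a brief illustration of the focal loss mentioned for RetinaNet, the sketch below computes the binary form FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t) for a single prediction. The defaults gamma = 2 and alpha = 0.25 follow the RetinaNet paper. The function name and the clipping constant are illustrative, and this is not the training code used in the project.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss for one prediction.

    p: predicted probability of the positive class, in [0, 1]
    y: ground-truth label, 1 (positive) or 0 (negative)
    """
    # Clip to avoid log(0) for fully confident predictions.
    p = min(max(p, eps), 1.0 - eps)
    # p_t is the probability assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t) ** gamma factor shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma set to 0 the expression reduces to alpha-weighted cross-entropy. With gamma = 2, a positive example predicted at 0.9 contributes about one hundredth of its cross-entropy loss, so training concentrates on the hard, misclassified examples.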