Learning to Discover and Detect Objects
Vladimir Fomenko 3* Ismail Elezi 1 Deva Ramanan 2 Laura Leal-Taixé 1 Aljoša Ošep 1, 2
Technical University of Munich 1
Carnegie Mellon University 2
Microsoft Azure AI 3
* work done while at Technical University of Munich
NeurIPS 2022 TUM CMU

Abstract
We tackle the problem of novel class discovery, detection, and localization (NCDL). In this setting, we assume a source dataset with labels for objects of commonly observed classes. Instances of other classes need to be discovered, classified, and localized automatically based on visual similarity, without human supervision. To this end, we propose a two-stage object detection network, Region-based NCDL (RNCDL), that uses a region proposal network to localize object candidates and is trained to classify each candidate, either as one of the known classes, seen in the source dataset, or one of the extended set of novel classes, with a long-tail distribution constraint on the class assignments, reflecting the natural frequency of classes in the real world. By training our detection network with this objective in an end-to-end manner, it learns to classify all region proposals for a large variety of classes, including those that are not part of the labeled object class vocabulary.
Our experiments on the COCO and LVIS datasets show that our method is significantly more effective than multi-stage pipelines that rely on traditional clustering algorithms or use pre-extracted crops. Furthermore, we demonstrate the generality of our approach by applying it to the large-scale Visual Genome dataset, where our network successfully learns to detect various semantic classes without explicit supervision.
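The long-tail constraint on class assignments mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how the constraint is enforced, so everything below is an assumption made for illustration: the power-law prior, the function names `long_tail_prior` and `constrained_assignments`, the `decay` and `temperature` parameters, and the use of Sinkhorn-Knopp style alternating normalisation. It only shows one plausible way to make soft assignments follow a prescribed, skewed class marginal.

```python
import math


def long_tail_prior(num_classes, decay=0.5):
    """Hypothetical long-tailed class marginal: weight of class k ~ (k + 1) ** -decay."""
    weights = [(k + 1) ** -decay for k in range(num_classes)]
    total = sum(weights)
    return [w / total for w in weights]


def constrained_assignments(logits, class_prior, num_iters=50, temperature=0.5):
    """Soft-assign proposals (rows) to classes (columns) under a marginal constraint.

    Alternates column and row rescaling (Sinkhorn-Knopp style) so that the
    share of total assignment mass per class approaches class_prior, while
    each proposal keeps a probability vector that sums to 1.
    """
    n, k = len(logits), len(class_prior)
    # Subtract the global maximum before exponentiating for numerical stability.
    top = max(max(row) for row in logits)
    q = [[math.exp((x - top) / temperature) for x in row] for row in logits]
    for _ in range(num_iters):
        # Rescale columns so that column j carries mass class_prior[j].
        col_sums = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] * class_prior[j] / col_sums[j] for j in range(k)]
             for i in range(n)]
        # Rescale rows so that every proposal carries mass 1 / n.
        q = [[v / (sum(row) * n) for v in row] for row in q]
    # Multiply by n so each row is a probability distribution over classes.
    return [[v * n for v in row] for row in q]
```

In a setup like this, the resulting soft assignments could serve as pseudo-labels for the unlabeled proposals; whether RNCDL does exactly this is not stated on this page.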

Task and framework overview
In the novel class discovery and localization task, we assume a pool of labeled images containing annotations for known, frequently observed semantic classes, and a pool of unlabeled images that may contain instances of novel classes. Our network learns to localize and recognize the common semantic classes, as well as to categorize novel classes for which no supervision was given.

Our approach: Region-based Novel Category Discovery and Localization (RNCDL)
Top: during the supervised training phase, we train the backbone and RPN using labeled data, together with a classification head and a class-agnostic localization head. During the discovery phase, we freeze all layers of the network apart from the classification head, then attach and train a novel classification head using unlabeled data.
Bottom: during the inference phase, we perform a standard R-CNN pass, using the classification heads of both known and novel categories to predict a class assignment for each proposal. This can be either one of the K classes that were presented as labeled samples during model training, or any novel object class that appears in the training data.
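The inference step described above, where both heads contribute to a single class assignment per proposal, might look roughly like the following. This is a minimal sketch under stated assumptions: the helper names `softmax` and `classify_proposal` are hypothetical, and combining the heads by concatenating their logits and taking a joint softmax is an illustrative choice, not something this page specifies. Background handling and box regression are left out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_proposal(known_logits, novel_logits):
    """Pick one label for a proposal from the joint known + novel label space.

    Assumption: the two heads are merged by concatenating their logits.
    Returns (label_index, is_novel, score), where indices below
    len(known_logits) refer to the K known classes and the remaining
    indices refer to novel classes.
    """
    joint = list(known_logits) + list(novel_logits)
    probs = softmax(joint)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, best >= len(known_logits), probs[best]
```

For example, `classify_proposal([0.1, 2.0], [0.5, 3.0, 0.2])` selects joint index 3, which falls in the novel part of the label space.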

Results
We evaluate our method on the COCO and LVIS datasets, where we use a random 50% of the COCO dataset during the supervised phase and the remaining images, without labels, during the discovery phase. We compare our method with k-means baselines and three recent state-of-the-art methods that we adapted to our scenario. We present the results for both known (COCO) and unknown (the rest of LVIS) classes. In the manuscript, we provide additional results on the Visual Genome dataset.
Our method significantly outperforms baselines and SOTA methods: it reaches 4.74 higher overall mAP than the previous best NCD method, UNO, and also outperforms the approach by Weng et al.
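The evaluation protocol in this section uses a random half of the images for the supervised phase and the rest, unlabeled, for discovery. A minimal sketch of such a split follows; the function name `split_image_ids`, the `seed` parameter, and the use of integer image IDs are assumptions for illustration, and the exact split used in the experiments is not given on this page.

```python
import random


def split_image_ids(image_ids, labeled_fraction=0.5, seed=0):
    """Randomly partition image IDs into a labeled pool and an unlabeled pool.

    Sorting before shuffling makes the split depend only on the seed,
    not on the order in which the IDs were supplied.
    """
    ids = sorted(image_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    cut = int(len(ids) * labeled_fraction)
    return ids[:cut], ids[cut:]
```

Annotations would be kept only for the first pool; the second pool is passed to the discovery phase as raw images.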

Video presentation