Learning to Discover and Detect Objects


Vladimir Fomenko ³*

Ismail Elezi ¹

Deva Ramanan ²

Laura Leal-Taixé ¹

Aljoša Ošep ¹,²

Technical University of Munich¹

Carnegie Mellon University²

Microsoft Azure AI³

* work done while at Technical University of Munich

NeurIPS 2022


📄 Paper

💻 Source code

📽 Video

Abstract

We tackle the problem of novel class discovery, detection, and localization (NCDL). In this setting, we assume a source dataset with labels for objects of commonly observed classes. Instances of other classes need to be discovered, classified, and localized automatically based on visual similarity, without human supervision. To this end, we propose a two-stage object detection network, Region-based NCDL (RNCDL), that uses a region proposal network to localize object candidates and is trained to classify each candidate either as one of the known classes seen in the source dataset or as one of an extended set of novel classes, with a long-tail distribution constraint on the class assignments that reflects the natural frequency of classes in the real world. By training our detection network with this objective in an end-to-end manner, it learns to classify all region proposals for a large variety of classes, including those that are not part of the labeled object class vocabulary.

Our experiments on the COCO and LVIS datasets show that our method is significantly more effective than multi-stage pipelines that rely on traditional clustering algorithms or use pre-extracted crops. Furthermore, we demonstrate the generality of our approach by applying it to the large-scale Visual Genome dataset, where our network successfully learns to detect various semantic classes without explicit supervision.
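The long-tail constraint on class assignments can be illustrated with a small sketch. This is not taken from the released source code: it assumes a Sinkhorn-Knopp style normalization (a common way to impose a marginal constraint on soft assignments), a hypothetical power-law prior `long_tail_prior`, and hypothetical parameters `decay`, `num_iters`, and `temperature`.

```python
import math


def long_tail_prior(num_classes, decay=0.5):
    """Hypothetical long-tailed prior: class k gets weight 1 / (k + 1) ** decay."""
    weights = [1.0 / (k + 1) ** decay for k in range(num_classes)]
    total = sum(weights)
    return [w / total for w in weights]


def constrained_assignments(logits, class_prior, num_iters=50, temperature=0.1):
    """Soft class assignments whose per-class mass follows class_prior.

    logits: list of per-proposal score lists (one score per novel class).
    Assumed technique: Sinkhorn-Knopp iterations with a non-uniform column marginal.
    """
    n = len(logits)
    k = len(class_prior)
    # Subtract the global maximum before exponentiating for numerical stability.
    top = max(max(row) for row in logits)
    q = [[math.exp((v - top) / temperature) for v in row] for row in logits]
    for _ in range(num_iters):
        # Column step: class j receives a total mass of class_prior[j].
        col_sums = [sum(q[i][j] for i in range(n)) for j in range(k)]
        for i in range(n):
            for j in range(k):
                q[i][j] *= class_prior[j] / col_sums[j]
        # Row step: each proposal carries a total mass of 1 / n.
        for i in range(n):
            row_sum = sum(q[i])
            q[i] = [v / (row_sum * n) for v in q[i]]
    # Rescale so that every row is a probability distribution over classes.
    return [[v * n for v in row] for row in q]
```

Calling `constrained_assignments(logits, long_tail_prior(num_classes))` returns one soft pseudo-label per proposal. Averaged over proposals, the assignments approximately match the prior, so frequent classes absorb more proposals than rare ones.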

Task and framework overview

In the novel class discovery and localization task, we assume a pool of labeled images with annotations for known, frequently observed semantic classes, and a pool of unlabeled images that may contain instances of novel classes. Our network learns to localize and recognize common semantic classes, as well as to categorize novel classes for which no supervision was given.

Our approach: Region-based Novel Category Discovery and Localization (RNCDL)

Top: during the supervised training phase, we train our backbone and RPN using labeled data, together with a classification head and a class-agnostic localization head. During the discovery phase, we freeze all layers of the network apart from the classification head, then attach and train a novel classification head using unlabeled data.

Bottom: during the inference phase, we perform a standard R-CNN pass, using the classification heads of both known and novel categories to predict a class assignment for each proposal. This can be either one of the K known classes, which were presented as labeled samples during model training, or any novel object class that appears in the training data.
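The class assignment at inference can be sketched as follows. This is a simplified illustration, not the released implementation: it assumes the known and novel heads each emit a list of logits per proposal and that the two are combined with a joint softmax. The function name `classify_proposal` is hypothetical, and the background class and box regression are left out.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_proposal(known_logits, novel_logits):
    """Pick the best class for one proposal across both heads.

    Returns (group, index, score), where group is "known" or "novel" and
    index is the position inside the corresponding head.
    """
    num_known = len(known_logits)
    # Assumption: the heads are merged by concatenating their logits.
    probs = softmax(list(known_logits) + list(novel_logits))
    best = max(range(len(probs)), key=probs.__getitem__)
    if best < num_known:
        return ("known", best, probs[best])
    return ("novel", best - num_known, probs[best])
```

For example, a proposal whose largest logit comes from the novel head is reported as a novel class with an index local to that head, so known class ids stay unchanged after the discovery phase.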

Results

We evaluate our method on the COCO and LVIS datasets: we use a random 50% of the COCO dataset during the supervised phase and the remaining images, without labels, during the discovery phase. We compare our method with k-means baselines and three recent state-of-the-art methods that we adapted to our scenario. We present the results for both known (COCO) and unknown (the rest of LVIS) classes. In the manuscript, we provide additional results on the Visual Genome dataset.
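The data split described above can be sketched in a few lines. The function name `split_labeled_unlabeled`, the `seed` parameter, and the use of integer image ids are assumptions made for illustration; the exact split used in the experiments is defined by the released code, not by this sketch.

```python
import random


def split_labeled_unlabeled(image_ids, labeled_fraction=0.5, seed=0):
    """Randomly split image ids into a labeled pool and an unlabeled pool."""
    ids = sorted(image_ids)  # sort first so the result depends only on the seed
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * labeled_fraction)
    # The first part keeps its annotations (supervised phase); the rest is
    # used without labels (discovery phase).
    return ids[:cut], ids[cut:]
```

Fixing the seed keeps the two pools disjoint and reproducible across runs.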

Our method significantly outperforms baselines and SOTA methods: the previous best NCD method, UNO, by reaching 4.74 higher overall mAP, and the approach by Weng et al.

Video presentation

Vladimir Fomenko, Ismail Elezi, Deva Ramanan, Laura Leal-Taixé, Aljoša Ošep

Learning to Discover and Detect Objects

Advances in Neural Information Processing Systems 35 (NeurIPS 2022)
