Image Classification with Minimal Supervision
With growing collections of images and video, it is imperative to have automated techniques for extracting information from visual data. A primary task at the heart of information extraction is image classification, which refers to classifying images or parts of them as belonging to certain categories. Accurate and reliable image classification has diverse applications, including web image and video search, content-based image retrieval, medical image analysis, autonomous robotics, and gesture-based human-computer interfaces.
However, because images vary widely and their representations are typically high-dimensional, training predictive models requires substantial amounts of annotated data, often provided through human supervision. Supplying such data is expensive and tedious. This training bottleneck motivates the development of robust algorithms that can build powerful predictive models with little training or supervision.
In this thesis, we propose new algorithms for learning from limited labeled data, focusing particularly on active learning. Instead of passively accepting training data, the basic idea in active learning is to select the most informative data samples for the human to annotate. This can lead to a far more efficient allocation of annotation effort, and results in predictive models that require far fewer training samples than in the passive setting.
We first propose an active sample selection criterion for training large multi-class classifiers with hundreds of categories. The criterion is easy to compute, and extends traditional two-class active learning to the multi-class setting.
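As a rough illustration of what an easy-to-compute multi-class selection criterion can look like, the sketch below ranks unlabeled samples by the gap between their two highest predicted class probabilities and queries those with the smallest gap. This margin heuristic is a common choice assumed here for illustration only; it is not necessarily the criterion proposed in the thesis. The names margin_scores, select_queries, and pool_probabilities are hypothetical, and the probabilities are made-up toy values, not experimental data.

```python
from typing import List, Sequence


def margin_scores(probabilities: Sequence[Sequence[float]]) -> List[float]:
    """Return, for each sample, the gap between its two largest class probabilities.

    Assumes every row has at least two classes. A small gap means the
    classifier is torn between two classes for that sample.
    """
    scores = []
    for row in probabilities:
        best, second_best = sorted(row, reverse=True)[:2]
        scores.append(best - second_best)
    return scores


def select_queries(probabilities: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k samples with the smallest margin (most ambiguous first)."""
    scores = margin_scores(probabilities)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return ranked[:k]


# Toy class-probability estimates for four unlabeled images (illustrative values only).
pool_probabilities = [
    [0.90, 0.05, 0.05],  # confident prediction, large margin
    [0.40, 0.35, 0.25],  # ambiguous between the first two classes
    [0.50, 0.49, 0.01],  # most ambiguous sample in this pool
    [0.70, 0.20, 0.10],  # fairly confident prediction
]

# Indices of the two samples a human annotator would be asked to label next.
queries = select_queries(pool_probabilities, k=2)
print(queries)
```

In the two-class case the margin is equivalent to distance from the decision boundary, which is one way such a score can be seen as extending two-class uncertainty sampling to many classes.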
We then generalize the approach to handle only binary (yes/no) feedback while still performing classification in the multi-class domain. The proposed modality substantially simplifies the interaction and makes it easy to distribute the training process across many users.
Active learning has been studied from two different perspectives: selective sampling from a pool, and query synthesis. Each offers different tradeoffs. We propose a formulation that combines both approaches while leveraging their individual strengths, resulting in a scalable and efficient multi-class active learning scheme. Experimental results show efficient training of classification systems with a pool of a few million images on a single computer.
Active learning is intimately related to a large body of previous work on experiment design and optimal sensing; we discuss the similarities and key differences between active learning and this earlier work. We also propose a new greedy batch-mode sample selection algorithm that shows substantial benefits over random batch selection when iterative querying cannot be applied.
Finally, we discuss two applications of active selection: i) active learning of compact hash codes for fast image search and classification, and ii) incremental learning of a classifier in a resource-constrained environment to handle changing scene conditions.
Throughout the thesis, we focus on thorough experimental validation on a variety of image datasets to analyze strengths and weaknesses of the proposed methods.