Audio localization in The Automatic Cameraman

2010

Abstract (summary)

This dissertation studies the audio localization component of a touchless interactive display located in the CSE building at UC San Diego. The display, named The Automatic Cameraman (TAC), consists of four large television displays, a pan-tilt-zoom (PTZ) camera, and a microphone array. In this work, we propose a simple solution to the problem of accurately pointing the PTZ camera at speaking humans who are interacting with TAC.

The focus of this dissertation is a novel audio localization and tracking algorithm based on what we call the coordinate-free approach. Previous approaches to localization assume a precisely known geometry for the microphone array, expressed through a coordinate system for the room with an exact position for each microphone element. As a result, arrays are typically built so that microphone positions can be determined easily, e.g., as linear or planar arrays with fixed spacing. The coordinate-free method we propose requires no knowledge of such a coordinate system, allowing for ad hoc placement of microphones.

Our coordinate-free localization algorithm takes a statistical approach, learning a mapping from observed time delays between microphone pairs directly to a pan and tilt directive for the PTZ camera. In addition, we explicitly use the fact that the training set of time-delay vectors lies on a low-dimensional structure, namely a three-dimensional structure governed by the sound source’s true spatial location. We explore various regressor models, with special attention to those that are known to exploit this intrinsic low dimensionality.
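The idea of regressing from time-delay vectors directly to pan and tilt can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the microphone and camera positions are invented and used only to simulate delays (the learner never sees them), and k-nearest-neighbor averaging stands in for the regressor models studied in the dissertation. The function names (`time_delays`, `pan_tilt`, `knn_predict`, `evaluate`) are hypothetical.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def time_delays(source, mics):
    """Time-delay vector: arrival-time difference for every microphone pair."""
    arrivals = [math.dist(source, m) / SPEED_OF_SOUND for m in mics]
    return [arrivals[i] - arrivals[j]
            for i in range(len(mics)) for j in range(i + 1, len(mics))]


def pan_tilt(source, camera):
    """Pan and tilt angles (radians) that point a camera at `source`."""
    dx, dy, dz = (s - c for s, c in zip(source, camera))
    return math.atan2(dy, dx), math.atan2(dz, math.hypot(dx, dy))


def knn_predict(train_x, train_y, query, k=3):
    """Average the pan/tilt labels of the k nearest training delay vectors."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    pan = sum(train_y[i][0] for i in nearest) / k
    tilt = sum(train_y[i][1] for i in nearest) / k
    return pan, tilt


def evaluate(n_train=2000, n_test=50, n_mics=7, seed=0):
    """Mean pan/tilt error (radians) on synthetic data."""
    rng = random.Random(seed)
    # Ad hoc microphone placement: known to the simulator, hidden from the learner.
    mics = [(rng.uniform(-0.5, 0.5), rng.uniform(-1, 1), rng.uniform(0.5, 2))
            for _ in range(n_mics)]
    camera = (0.0, 0.0, 1.0)

    def sample_source():
        # Sources stay in front of the camera, so pan never wraps around.
        return (rng.uniform(1, 4), rng.uniform(-2, 2), rng.uniform(0, 2))

    train = [sample_source() for _ in range(n_train)]
    train_x = [time_delays(s, mics) for s in train]
    train_y = [pan_tilt(s, camera) for s in train]

    errors = []
    for _ in range(n_test):
        s = sample_source()
        predicted = knn_predict(train_x, train_y, time_delays(s, mics))
        errors.append(math.dist(predicted, pan_tilt(s, camera)))
    return sum(errors) / n_test


if __name__ == "__main__":
    print(f"mean pan/tilt error: {evaluate():.3f} rad")
```

With seven microphones each delay vector has 21 entries, yet the vectors are generated by a three-dimensional source position. That is the low-dimensional structure the abstract refers to, and it is why a local method such as nearest neighbors is a plausible stand-in here.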

We follow this with a study of a particle-filtering-based tracker of the time delays between microphones. Our tracker takes a novel approach to the particle filtering problem based on online learning. It introduces a new, practically useful particle resampling scheme, and it is more robust to model misspecification than traditional particle filters.
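For orientation, the traditional baseline that such a tracker is compared against can be sketched as follows. This is a textbook bootstrap particle filter with systematic resampling, not the online-learning-based scheme developed in the dissertation. The scalar delay state, the random-walk motion model, the Gaussian observation model, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def systematic_resample(particles, weights, rng):
    """Resample with one random offset and evenly spaced pointers."""
    n = len(particles)
    start = rng.random() / n
    resampled, cumulative, i = [], weights[0], 0
    for m in range(n):
        pointer = start + m / n
        # Advance to the particle whose cumulative weight covers the pointer.
        while pointer > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        resampled.append(particles[i])
    return resampled


def track_delay(observations, n_particles=500, drift=0.05, noise=0.2, seed=0):
    """Bootstrap particle filter for a slowly drifting scalar delay (arbitrary units)."""
    rng = random.Random(seed)
    particles = [rng.uniform(-5.0, 5.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: random-walk motion model (an assumed model).
        particles = [p + rng.gauss(0.0, drift) for p in particles]
        # Update: Gaussian likelihood of the observed delay.
        weights = [math.exp(-0.5 * ((z - p) / noise) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:
            # Every particle is far from z: fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        particles = systematic_resample(particles, weights, rng)
    return estimates


if __name__ == "__main__":
    rng = random.Random(1)
    truth = [2.0 * math.sin(0.01 * t) for t in range(200)]
    observed = [x + rng.gauss(0.0, 0.2) for x in truth]
    tracked = track_delay(observed)
    print(f"final estimate {tracked[-1]:.2f}, true value {truth[-1]:.2f}")
```

The fixed `drift` and `noise` values encode the assumed models. When they do not match the real signal, a filter of this kind degrades, which is the model misspecification issue the abstract mentions.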

In the final part of the dissertation, we examine an array of MEMS digital microphones that we recently implemented on an FPGA. We explore how this digital array will alleviate many of the technical deficiencies of the current analog array in TAC.

Indexing (details)

Subject
Artificial intelligence; Computer science
Classification
0463: Statistics
0800: Artificial intelligence
0984: Computer science
Identifier / keyword
Applied sciences; Pure sciences; Audio localization; Interactive displays; Particle filters
Title
Audio localization in The Automatic Cameraman
Author
Ettinger, Evan Ira
Number of pages
Publication year
Degree date
School code
Source
DAI-B 71/10, Dissertation Abstracts International
Place of publication
Ann Arbor
Country of publication
United States
Advisor
Freund, Yoav
Committee member
Belongie, Serge; Bewley, Thomas; Dasgupta, Sanjoy; Rao, Bhaskar; Saul, Lawrence
University/institution
University of California, San Diego
Department
Computer Science and Engineering
University location
United States -- California
Source type
Dissertations & Theses
Document type
Dissertation/thesis number
ProQuest document ID
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL