
Audio localization in The Automatic Cameraman


2010


Abstract (summary)

This dissertation studies the audio localization component of a touchless interactive display located in the CSE building at UC San Diego. The display has been named The Automatic Cameraman (TAC) and consists of four large television displays, a PTZ camera, and a microphone array. In this work, we propose a simple solution to the problem of accurately pointing the PTZ camera at speaking humans who are interacting with TAC.

The focus of this dissertation is a novel audio localization and tracking algorithm based on what we call the coordinate-free approach. Previous approaches to localization assume a precisely known geometry for the microphone array, expressed through a coordinate system for the room with an exact position for each microphone element. As a result, arrays are typically built so that microphone positions can be determined easily, e.g., as linear or planar arrays with fixed spacing. The coordinate-free method we propose requires no knowledge of such a coordinate system, allowing for ad hoc placement of microphones.
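To illustrate what a coordinate-based method has to assume, the following minimal sketch computes the time delay for each microphone pair from a known source position and known microphone positions. It is an illustration only and is not taken from the dissertation: the function name pairwise_delays, the microphone layout, the source position, and the speed of sound value are all assumptions chosen for the example.

```python
import math
from itertools import combinations

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature (assumed)


def pairwise_delays(source, mics, c=SPEED_OF_SOUND):
    """Return the time delay in seconds for every microphone pair (i, j), i < j.

    delay_ij = (|source - mic_i| - |source - mic_j|) / c
    """
    dists = [math.dist(source, m) for m in mics]
    return [(dists[i] - dists[j]) / c
            for i, j in combinations(range(len(mics)), 2)]


# Hypothetical 4-microphone layout in metres; these positions are made up.
mics = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

# A hypothetical speaker position; the 6 resulting delays are all determined
# by just these 3 coordinates.
delays = pairwise_delays((2.0, 1.5, 1.0), mics)
```

Every delay here depends on the exact microphone coordinates, which is the knowledge the coordinate-free method is designed to do without.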

Our coordinate-free localization algorithm takes a statistical approach, learning a mapping from observed time-delays between microphone pairs directly to a pan and tilt directive for the PTZ camera. In addition, we explicitly utilize the fact that the training set of time-delay vectors lies on a low-dimensional structure, namely a three-dimensional structure governed by the sound source’s true spatial location. We explore various regressor models with special attention to those that are known to exploit this intrinsic low dimensionality.
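As a rough sketch of the idea of mapping delay vectors directly to camera directives, the example below uses a plain k-nearest-neighbor regressor. This is a stand-in chosen for brevity, not one of the regressor models studied in the dissertation. The function name knn_pan_tilt, the training pairs, and the units (milliseconds and degrees) are all hypothetical, and the averaging assumes pan angles stay in a range with no wraparound.

```python
import math


def knn_pan_tilt(train_delays, train_angles, query, k=3):
    """Predict (pan, tilt) for a delay vector by averaging the
    angles of the k training examples with the closest delay vectors."""
    order = sorted(range(len(train_delays)),
                   key=lambda i: math.dist(train_delays[i], query))
    nearest = order[:k]
    pan = sum(train_angles[i][0] for i in nearest) / len(nearest)
    tilt = sum(train_angles[i][1] for i in nearest) / len(nearest)
    return pan, tilt


# Made-up training set: delay vectors (ms) paired with the camera
# directive (pan, tilt in degrees) recorded for the same sound source.
train_delays = [(-1.0, -0.5), (-0.5, -0.2), (0.0, 0.0), (0.5, 0.2), (1.0, 0.5)]
train_angles = [(-40.0, -10.0), (-20.0, -5.0), (0.0, 0.0), (20.0, 5.0), (40.0, 10.0)]

# Predict a directive for a new observed delay vector.
pan, tilt = knn_pan_tilt(train_delays, train_angles, (0.45, 0.2), k=1)
```

No microphone coordinates appear anywhere in this sketch; the only inputs are observed delays and the camera directives paired with them during training.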

We follow this with a study of a particle-filter-based tracker of the time-delays between microphones. Our tracker takes a novel approach to the particle filtering problem based on online learning. It introduces a new, practically useful particle resampling scheme and is more robust to model misspecification than traditional particle filters.
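For readers unfamiliar with the baseline, the sketch below shows a traditional bootstrap particle filter tracking a single time-delay. It is the standard scheme that the dissertation's tracker is contrasted with, not the online-learning-based tracker or its new resampling scheme. The random-walk dynamics, the Gaussian observation model, and every parameter value are assumptions made for illustration.

```python
import math
import random


def bootstrap_filter(observations, n_particles=500,
                     process_std=0.05, obs_std=0.2, seed=0):
    """Track a scalar time-delay with a standard bootstrap particle filter.

    Returns one posterior-mean estimate per observation.
    """
    rng = random.Random(seed)
    # Initialise particles around the first observation.
    particles = [rng.gauss(observations[0], obs_std) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: assumed random-walk dynamics for the delay.
        particles = [p + rng.gauss(0.0, process_std) for p in particles]
        # Weight: assumed Gaussian likelihood of the observed delay.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:
            # All weights underflowed; fall back to uniform weights.
            weights = [1.0] * n_particles
            total = float(n_particles)
        estimates.append(sum(w * p for w, p in zip(weights, particles)) / total)
        # Resample: plain multinomial resampling in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


# Hypothetical noisy delay measurements (ms) around a true delay of 0.3 ms.
noise = random.Random(1)
observations = [0.3 + noise.gauss(0.0, 0.2) for _ in range(50)]
estimates = bootstrap_filter(observations)
```

A filter of this kind depends on the assumed dynamics and noise levels being roughly right, which is the sensitivity to model misspecification that the abstract mentions.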

In the final part of the dissertation, we examine an array of MEMS digital microphones that we recently implemented on an FPGA. We explore how this digital array will alleviate many of the technical deficiencies of the current analog array in TAC.

Indexing (details)


Subject
Statistics;
Artificial intelligence;
Computer science
Classification
0463: Statistics
0800: Artificial intelligence
0984: Computer science
Identifier / keyword
Applied sciences; Pure sciences; Audio localization; Interactive displays; Particle filters
Title
Audio localization in The Automatic Cameraman
Author
Ettinger, Evan Ira
Number of pages
105
Publication year
2010
Degree date
2010
School code
0033
Source
DAI-B 71/10, Dissertation Abstracts International
Place of publication
Ann Arbor
Country of publication
United States
ISBN
9781124201238
Advisor
Freund, Yoav
Committee member
Belongie, Serge; Bewley, Thomas; Dasgupta, Sanjoy; Rao, Bhaskar; Saul, Lawrence
University/institution
University of California, San Diego
Department
Computer Science and Engineering
University location
United States -- California
Degree
Ph.D.
Source type
Dissertations & Theses
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
3419800
ProQuest document ID
756908077
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL
http://search.proquest.com/docview/756908077