UNSW CoFA Engineering FASS
iCinema
Postgraduate Research

Automated video, image, sound and speech analysis

Video Analysis
Video, both analogue and digital, is everywhere: on our hard disks, in archives and libraries, on tapes, on television and in the cinema. Unlike text, video is difficult to search and summarise, and it is time-consuming to watch.
Techniques for the semantic analysis of video (both image and sound) will be developed for use in the iCinema project T_Visionarium. Beginning with the shot as the atomic unit, techniques will be developed for extracting low-level features such as colour, sound, form and motion. Building on these, the project will explore the detection of higher-level features such as faces, objects, speech and music, as well as changes of scene, story and programme. These analyses will be used to augment a large video library with structured metadata suitable for real-time, content-based search and retrieval.
Prerequisites: Pattern recognition, image analysis or computer vision.
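As an illustration of the first step described above, segmenting video into shots using a low-level colour feature, the sketch below flags a hard cut wherever the colour histograms of consecutive frames differ sharply. It is a minimal sketch, not the project's method: frames are assumed to be already decoded into flat lists of (r, g, b) tuples, and the function names, the 4-bins-per-channel quantisation and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each channel in 0..255 (assumed input format)
Frame = Sequence[Pixel]       # a decoded frame as a flat sequence of pixels


def colour_histogram(frame: Frame, bins_per_channel: int = 4) -> List[float]:
    """Quantise each RGB channel into bins and return a normalised joint histogram."""
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in frame:
        # Map 0..255 onto 0..bins_per_channel-1 for each channel.
        ri = r * bins_per_channel // 256
        gi = g * bins_per_channel // 256
        bi = b * bins_per_channel // 256
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1
    total = len(frame)
    return [h / total for h in hist] if total else hist


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Half the L1 distance: 0.0 for identical histograms, 1.0 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def detect_cuts(frames: Sequence[Frame], threshold: float = 0.5,
                bins_per_channel: int = 4) -> List[int]:
    """Return each index i where a hard cut is assumed between frames i-1 and i."""
    cuts: List[int] = []
    prev = None
    for i, frame in enumerate(frames):
        hist = colour_histogram(frame, bins_per_channel)
        if prev is not None and histogram_distance(prev, hist) > threshold:
            cuts.append(i)
        prev = hist
    return cuts


# Usage: three reddish frames followed by two bluish ones.
red = [(250, 10, 10)] * 16
blue = [(10, 10, 250)] * 16
print(detect_cuts([red, red, red, blue, blue]))  # [3]
```

A global colour histogram is insensitive to camera and object motion within a shot, which is why it is a common baseline for hard cuts; gradual transitions such as fades and dissolves would need more than a single frame-to-frame threshold.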

Real-time visual tracking of the human body
One form of video analysis of particular interest to iCinema is the detection and tracking of the human form and its motion in video data. While non-real-time solutions would already find ample application in the project T_Visionarium, real-time solutions would also lend themselves to vision-based human-computer interfaces. Research would most likely begin with state-of-the-art single-image and single-camera solutions and continue into multi-camera 3D analysis suitable for tracking people in large performance spaces.
Prerequisites: Pattern recognition, image analysis or computer vision.

Speech Analysis
Substantial work in speech analysis is required for immediate application in the iCinema projects Conversations, T_Visionarium and iCLink.
Research and development is required in the following areas:
Automated detection of speech or music in an audio stream
Identification of language, accent and individual speakers
Searching sound with sound: given a particular waveform, finding the most meaningful match
Speech recognition
Prerequisites: Exposure to sound processing and pattern recognition.
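To make the "searching sound with sound" problem concrete, the sketch below uses one simple baseline: slide the query waveform along a longer recording and score each position by normalised cross-correlation, returning the best-scoring offset. This is an assumed illustration only; the function names are hypothetical, it works on raw samples, and practical audio search usually compares spectral features or fingerprints rather than raw waveforms, since raw-sample matching is sensitive to noise, pitch and tempo changes.

```python
import math
import random
from typing import Sequence, Tuple


def normalised_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson-style correlation of two equal-length windows, in [-1, 1]."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    den_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if den_a == 0 or den_b == 0:
        return 0.0  # a flat (silent) window has no shape to match against
    return num / (den_a * den_b)


def best_match(query: Sequence[float], signal: Sequence[float]) -> Tuple[int, float]:
    """Slide the query over the signal; return (offset, score) of the best window."""
    if not query or len(query) > len(signal):
        raise ValueError("query must be non-empty and no longer than the signal")
    m = len(query)
    best_offset, best_score = 0, -math.inf
    for offset in range(len(signal) - m + 1):
        score = normalised_correlation(query, signal[offset:offset + m])
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


# Usage: hide a 32-sample excerpt inside 200 samples of pseudo-random "audio".
rng = random.Random(0)
signal = [rng.uniform(-1.0, 1.0) for _ in range(200)]
query = signal[40:72]
offset, score = best_match(query, signal)
print(offset, round(score, 6))  # 40 1.0
```

This brute-force search costs O(N·M) for a signal of N samples and a query of M samples; FFT-based correlation reduces that cost, but the harder research question the topic raises is what counts as a "meaningful" match beyond exact waveform similarity.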

'Cold-start' speech to text problem
'Dictaphone' speech recognition systems, such as those used with a word processor, require extensive training on a particular voice to attain reasonable accuracy. Unfortunately, for applications such as the iCinema project Conversations, such training times are incompatible with the fast-paced schedule of public installations. Solutions are therefore required that attain the highest possible accuracy with the shortest, least intrusive training sessions. This project involves research in speech recognition and human-computer interaction.
