Automated video, image, sound and speech analysis
Video Analysis
Video, both analogue and digital, is everywhere: on our hard disks,
in archives and libraries, on tapes, on television and in the cinema.
Unlike text, video is difficult to search and summarise, and it is
time-consuming to watch.
Techniques for the semantic analysis of video (both image and sound)
will be developed for use in the iCinema project T_Visionarium.
Beginning with the shot as the atomic unit of video, techniques will
be developed for extracting low-level features such as colour, sound,
form and motion. Building on these, the detection of higher-level
features will be explored: faces, objects, speech and music, as well
as changes of scene, story and programme. The results of these
analyses will be used to augment a large video library with structured
metadata suitable for real-time, content-based search and retrieval.
Prerequisites: Pattern recognition, image analysis or computer
vision.
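As a rough illustration of the lowest-level step, segmenting video into shots, the sketch below flags a hard cut wherever the colour histograms of consecutive frames differ by more than a threshold. This is a common textbook baseline, not the project's own method. It assumes frames have already been decoded into flat, non-empty lists of 8-bit (r, g, b) tuples; the function names and the 0.5 threshold are hypothetical choices for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# Assumption: frames are already decoded and flattened to non-empty lists of
# (r, g, b) tuples with 8-bit channels (0..255). Video decoding is out of scope.
Pixel = Tuple[int, int, int]
Frame = Sequence[Pixel]


def colour_histogram(frame: Frame, bins_per_channel: int = 4) -> List[float]:
    """Joint RGB histogram with bins_per_channel**3 bins, normalised to sum to 1."""
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in frame:
        # Quantise each 0..255 channel value into 0..bins_per_channel-1.
        qr = r * bins_per_channel // 256
        qg = g * bins_per_channel // 256
        qb = b * bins_per_channel // 256
        hist[(qr * bins_per_channel + qg) * bins_per_channel + qb] += 1.0
    n = len(frame)
    return [h / n for h in hist]


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Half the L1 distance: 0.0 for identical histograms, 1.0 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def detect_cuts(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Indices i at which a hard cut is assumed between frames i-1 and i.

    The default threshold is a hypothetical value; in practice it would be
    tuned on annotated footage.
    """
    cuts: List[int] = []
    prev: Optional[List[float]] = None
    for i, frame in enumerate(frames):
        hist = colour_histogram(frame)
        if prev is not None and histogram_distance(prev, hist) > threshold:
            cuts.append(i)
        prev = hist
    return cuts


if __name__ == "__main__":
    red = [(250, 10, 10)] * 100
    blue = [(10, 10, 250)] * 100
    print(detect_cuts([red, red, blue, blue]))  # [2]
```

Because a histogram ignores where colours sit in the frame, this comparison tolerates camera and object motion within a shot, but it is blind to gradual transitions such as fades and dissolves, which need a more elaborate detector.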
Real-time visual tracking of the human body
One form of video analysis of particular interest to iCinema is the
detection of the human body in video data and the tracking of its
motion. While offline (non-real-time) solutions would find ample
application in T_Visionarium, real-time solutions would also enable
vision-based human-computer interfaces. Research would most likely
begin with state-of-the-art single-image and single-camera methods
and extend to multi-camera 3D analysis suitable for tracking people
in large performance spaces.
Prerequisites: Pattern recognition, image analysis or computer
vision.
Speech Analysis
Considerable work is required in speech analysis, with immediate
application in the iCinema projects Conversations, T_Visionarium
and iCLink.
Research and development is required in the following areas:
- Automated detection of speech or music in an audio stream
- Identification of language and accent, and discrimination between
  speakers
- Searching sound with sound: given a particular waveform, finding
  the most meaningful match
- Speech recognition
Prerequisites: Exposure to sound processing and pattern recognition.
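In its simplest form, "searching sound with sound" can be illustrated by sliding a query waveform along a longer recording and scoring each position with normalised cross-correlation. This is only a baseline under stated assumptions: both signals are mono sample sequences at the same sample rate, and the function names are hypothetical. Raw-waveform correlation finds near-identical copies of a sound rather than perceptually similar ones; a search for meaningful matches would more likely compare spectral features (for example MFCCs) instead of raw samples.

```python
import math
from typing import Sequence, Tuple


def normalised_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length windows, in [-1, 1]."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    den_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if den_a == 0.0 or den_b == 0.0:
        return 0.0  # a silent (constant) window has no shape to match
    return num / (den_a * den_b)


def best_match(query: Sequence[float], signal: Sequence[float]) -> Tuple[int, float]:
    """Slide the query across the signal; return (offset, score) of the best match."""
    m = len(query)
    if m == 0 or m > len(signal):
        raise ValueError("query must be non-empty and no longer than the signal")
    best_offset, best_score = 0, -2.0  # below any possible correlation
    for offset in range(len(signal) - m + 1):
        score = normalised_correlation(query, signal[offset:offset + m])
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score
```

Because the score is normalised, a quieter or louder copy of the query still scores close to 1. The brute-force loop costs O(N·M) for a recording of N samples and a query of M samples; FFT-based correlation reduces this for long recordings.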
'Cold-start' speech-to-text problem
'Dictaphone' speech recognition systems, such as those used with
a word processor, must be tuned extensively to a particular voice
before they attain reasonable accuracy. For applications such as the
iCinema project Conversations, however, such training times are
incompatible with the fast-paced schedule of public installations.
Solutions are therefore required that attain the highest possible
accuracy with the shortest, least intrusive training sessions. This
project involves research in speech recognition and human-computer
interaction.