Issues addressed by MultiVision

Technologies to be developed
Innovative use of existing technologies
Example retrieval application

Technologies to be developed

In this project, the following technologies will be developed:

Semi-semantic Video Processing for Video Indexing

Video Indexing Scheme for Efficient Video Retrieval

Conceptual Query to Semantic Description Translation

Video Data Caching for Reliable Hard Disk Operations

Innovative use of existing technologies

This project will also make innovative use of existing semi-semantic video analysis technologies to generate the semi-semantic descriptors used in the proposed retrieval system. The research work involved includes extensive study and trial of many combinations of video analysis tools to find the combination that performs best in real time. An immediate application of the four newly developed technologies listed under "Technologies to be developed", together with the existing technologies, is a reliable digital video surveillance system with efficient content-based video retrieval support. The example under "Example retrieval application" illustrates how a video retrieval application can be implemented.

Example retrieval application

Figure 2 shows four snapshots (in time order from left to right, top to bottom) from a video sequence. Suppose the semi-semantic analysis can classify objects into two classes, human and non-human, and index them accordingly. Figure 2 shows the moving objects that would be indexed during the encoding process (moving humans in white bounding boxes and moving non-human objects in black bounding boxes).

Figure 2: Moving humans (in white bounding boxes) and moving non-human objects (in black bounding boxes) extracted from an example video sequence.

Now suppose a user wants to retrieve a video segment in which a man dropped his briefcase. The conceptual query translation module first translates this request into a semi-semantic description for retrieval. In this particular example, it could be translated into a query that retrieves all video segments containing both moving human and moving non-human objects. Among the video segments containing the four snapshots in Figure 2, only those containing Figure 2(b)-(d) would then be retrieved. Note that in surveillance video, most of the footage resembles Figure 2(a). The semi-semantic approach is therefore usually good enough to isolate the relevant video segments.
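The translated query described above amounts to a set-containment test over the index. The following is a minimal sketch in Python, not the project's implementation: the `Segment` record, the `retrieve` function, and the class labels assigned to each snapshot are assumptions made for illustration, chosen only so that the outcome matches the one stated in the text.

```python
from dataclasses import dataclass

# Hypothetical index record: one entry per video segment, holding the
# semi-semantic classes of the moving objects found during encoding.
@dataclass(frozen=True)
class Segment:
    segment_id: str
    classes: frozenset  # subset of {"human", "non-human"}

def retrieve(index, required_classes):
    """Return the segments whose indexed classes include every required class."""
    required = frozenset(required_classes)
    return [seg for seg in index if required <= seg.classes]

# Assumed contents for the four snapshots of Figure 2; the real figure is
# not reproduced here, so these labels are illustrative only.
index = [
    Segment("a", frozenset()),                      # assumed: nothing indexed
    Segment("b", frozenset({"human", "non-human"})),
    Segment("c", frozenset({"human", "non-human"})),
    Segment("d", frozenset({"human", "non-human"})),
]

# "A man dropped his briefcase" translated to a semi-semantic query:
# segments with both a moving human and a moving non-human object.
matches = retrieve(index, {"human", "non-human"})
print([seg.segment_id for seg in matches])
```

Because the test only compares small sets of class labels, it can run over the whole index without touching the video data itself.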

This example also shows that the proposed system can easily be extended to a full-scale semantic video retrieval system. If full-scale semantic tools are available, the video segments returned by the semi-semantic retrieval could be passed to a full-scale semantic analysis process for further analysis. In this particular example, the full-scale semantic analysis could check each detected non-human object to see whether it is a briefcase. Only the video segments containing Figure 2(b) and 2(c) satisfy this criterion. The analysis may also verify that the action involved satisfies the user query (i.e. the man dropped the briefcase). With this additional constraint, only the video segment containing the snapshot in Figure 2(c) would be retrieved. Full-scale semantic processing involves more sophisticated image and video processing techniques and is expected to be more time-consuming than the semi-semantic analysis. However, under the proposed architecture, it has to be performed only on the regions of interest (e.g. the regions within the bounding boxes in Figure 2) instead of the whole video frame. In addition, since the semi-semantic retrieval should already have removed most of the irrelevant video segments, the proposed system is expected to greatly improve the speed and accuracy of retrieval tasks, resulting in a robust, efficient and extensible video retrieval system.
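The two-stage arrangement described above can be sketched as a cascade: a cheap semi-semantic filter first, then an expensive check applied only to the surviving regions of interest. The Python below is a minimal illustration under stated assumptions. The `Region` and `Segment` records, the `is_briefcase` stub (which reads a stored label instead of analysing pixels), and the contents assigned to each snapshot are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Region:
    cls: str    # semi-semantic class: "human" or "non-human"
    label: str  # assumed ground-truth label, standing in for real pixel data

@dataclass
class Segment:
    segment_id: str
    regions: list = field(default_factory=list)

def coarse_match(seg):
    """Semi-semantic stage: both a human and a non-human object are present."""
    return {"human", "non-human"} <= {r.cls for r in seg.regions}

def two_stage_retrieve(index, fine_check):
    """Run fine_check only on non-human regions of segments that pass the coarse stage."""
    hits, calls = [], 0
    for seg in index:
        if not coarse_match(seg):
            continue  # rejected cheaply; the expensive check is never run
        for region in seg.regions:
            if region.cls != "non-human":
                continue
            calls += 1
            if fine_check(region):
                hits.append(seg.segment_id)
                break
    return hits, calls

def is_briefcase(region):
    # Stub for a full-scale semantic classifier (hypothetical).
    return region.label == "briefcase"

# Assumed snapshot contents, chosen to match the outcome stated in the text.
index = [
    Segment("a", [Region("human", "person")]),
    Segment("b", [Region("human", "person"), Region("non-human", "briefcase")]),
    Segment("c", [Region("human", "person"), Region("non-human", "briefcase")]),
    Segment("d", [Region("human", "person"), Region("non-human", "other")]),
]

hits, calls = two_stage_retrieve(index, is_briefcase)
print(hits, calls)
```

The `calls` counter makes the saving visible: the expensive check runs once per candidate region, not once per frame or per segment. Verifying the action itself (that the briefcase was dropped) would be a further stage and is not modelled here.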