Background

Owing to advances in digital video compression and storage devices, digital video is now found almost everywhere, from home digital entertainment to video surveillance. In particular, traditional CCTV systems have started to migrate from analog to digital recording. As storage costs have fallen, a typical digital video surveillance system can store at least 7 days of video per camera on a single hard disk, resulting in terabytes of video data. With this much data, browsing and searching for a specific video of interest is an extremely tedious task for a human operator. When a significant incident occurs, video retrieval becomes indispensable, because a fast retrieval system helps to identify the relevant evidence quickly for investigation. It is therefore necessary to equip digital video surveillance systems with efficient and accurate video retrieval functions, so that users can search for specific videos of interest within a reasonable time and with minimal human intervention.

Meanwhile, as more and more images and videos are put onto the web, it is difficult for a search engine to match given search criteria against the content of each image or video. Manual annotation may help, but it is not a manageable task for such a large collection of images and videos. An alternative is to automate the annotation process. This, however, is an extraordinarily challenging task, because there is no commonly agreed method for analyzing image and video content, let alone for matching the annotations against a user-specified retrieval query.

The current interim solution to the video retrieval problem in typical digital video surveillance systems is to perform an exhaustive search of the whole video archive. This is inherently time-consuming, because the system has to decode each video frame and analyze its content to determine whether a particular video segment satisfies the user-specified query. Moreover, the industry lacks sufficient tools for video content analysis, so not every user query can be reliably translated into search criteria that the limited set of analysis tools can evaluate.
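
The exhaustive search described above can be sketched roughly as follows. This is only an illustrative sketch, not the implementation of any actual surveillance system: the archive is assumed to be a list of (frame index, labels) records, and `Frame`, `decode_frames` and `exhaustive_search` are hypothetical names in which a set of text labels stands in for real decoding and content analysis.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Tuple


@dataclass
class Frame:
    index: int         # position of the frame in the archive
    labels: frozenset  # stand-in for the result of content analysis


def decode_frames(archive: Iterable[Tuple[int, Iterable[str]]]) -> Iterator[Frame]:
    """Hypothetical decoder: turns each stored record into a Frame."""
    for index, labels in archive:
        yield Frame(index, frozenset(labels))


def exhaustive_search(archive, query: Callable[[Frame], bool]) -> List[Tuple[int, int]]:
    """Visit every frame and return the (start, end) index ranges that match."""
    segments = []
    start = prev = None
    for frame in decode_frames(archive):  # every frame is decoded and tested
        if query(frame):
            if start is None:
                start = frame.index       # a new matching segment begins
            prev = frame.index
        elif start is not None:
            segments.append((start, prev))  # the current segment has ended
            start = None
    if start is not None:
        segments.append((start, prev))    # close a segment that runs to the end
    return segments


# Toy archive: each record is (frame index, labels assumed to be detected).
archive = [(0, []), (1, ["person"]), (2, ["person", "bag"]), (3, []), (4, ["person"])]
matches = exhaustive_search(archive, lambda f: "person" in f.labels)
```

The cost of such a scan grows linearly with the size of the archive, since every frame must be visited for every query, which is why it does not scale to terabytes of stored video.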

The root of the retrieval problem is that there is currently no standard way to decompose a sequence of images into semantically describable entities [10-17]. The problem is usually addressed first by segmentation [4,5,8,9,18-22], which tries to decompose each video image into a set of regions, followed by pattern recognition, which tries to recognize the extracted groups of regions and classify them into different classes of objects [2,3,6,7,23-38], such as humans, animals, and furniture. With this, higher-level descriptions can be constructed from a flow of images, such as “A man sitting on the table”, “A woman walking down the hallway”, or “A person drops an unattended baggage in the public area” [12,15]. Currently, there is no generally accepted methodology for extracting this kind of semantic description from video, which is why applications related to video retrieval are progressing so slowly.
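
The three stages named above (segmentation, recognition, description) can be pictured with a deliberately simplified sketch. Everything in it is an assumption made for illustration: the image is a small grid of integers, `segment` merely groups pixels of equal value, and `classify` is a hypothetical lookup table. None of this corresponds to the methods in the cited work.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[int]]  # toy image: a grid of integer pixel values


@dataclass
class Region:
    value: int  # pixel value shared by all pixels in the region
    area: int   # number of pixels in the region


def segment(image: Image) -> List[Region]:
    """Stage 1 (toy): group pixels by identical value, ignoring connectivity."""
    counts = {}
    for row in image:
        for v in row:
            counts[v] = counts.get(v, 0) + 1
    return [Region(v, n) for v, n in sorted(counts.items())]


def classify(region: Region) -> str:
    """Stage 2 (toy): hypothetical lookup from region value to object class."""
    classes = {0: "background", 1: "person", 2: "baggage"}
    return classes.get(region.value, "unknown")


def describe(image: Image) -> str:
    """Stage 3 (toy): list the non-background classes found in the frame."""
    labels = [classify(r) for r in segment(image)]
    objects = [label for label in labels if label != "background"]
    if not objects:
        return "Frame is empty"
    return "Frame contains: " + ", ".join(objects)


description = describe([[0, 0, 1], [0, 2, 1]])
```

In a real system each stage is an open research problem in its own right, and the output of the last stage would be a description of an event over time rather than a list of object classes in a single frame.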

The Department of Computer Science at the University of Hong Kong has a team of experts in video/image processing and computer systems, and should be well placed to investigate this problem and propose a solution based on state-of-the-art techniques from its recent research [1-9]. In particular, the research in [5] provides a fundamental framework for video/image content analysis that is conducive to content-based video retrieval applications. The team also has experience in system engineering: over the years, its members have been involved in a number of industrial projects that provided practical solutions to problems encountered in real-world applications. The team should therefore be able to provide fundamental solutions to the basic research problems and transfer them for use in industry.

[1] S. Zhong, F. Chin, Y.S. Cheung and D. Kwan, “Hierarchical motion estimation based on visual patterns for video coding,” in IEEE Proc. ICASSP '96, pp. 2325-2328 (1996).

[2] Boris Wai-Sing Yiu, Kwan-Yee Kenneth Wong, Francis Y.L. Chin and R.H.Y. Chung, "Explicit Contour Model for Vehicle Tracking with Automatic Hypothesis Verification", to appear in Proc. International Conference on Image Processing (ICIP05).

[3] Angie W.K. So, Kenneth K.Y. Wong, Ronald H.Y. Chung, and Francis Y.L. Chin, "Shadow Detection for Vehicles by Locating the Object-Shadow Boundary", to appear in Proc. IASTED Conference on Signal and Image Processing (SIP 2005).

[4] Ronald H.Y. Chung, Francis Y.L. Chin, Kwan-Yee K. Wong, K.P. Chow, T. Luo and Henry S.K. Fung, "Efficient Block-based Motion Segmentation Method using Motion Vector Consistency", in Proc. IAPR Conference on Machine Vision Applications (MVA2005), Tsukuba, Japan, pp.550-553, 2005.

[5] R.H.Y. Chung, N.H.C. Yung and P.Y.S. Cheung, "An Efficient Parameter-less Quadrilateral-Based Image Segmentation Method", to appear in IEEE Trans. PAMI.

[6] S.-F. Wong and K.-Y. K. Wong. Reliable and fast human body tracking under information deficiency. In Proc. IEEE Intelligent Automation Conference, pages 491–498, Hong Kong, China, December 2003.

[7] S.-F. Wong and K.-Y. K. Wong. Fast and reliable recognition of human motion from motion trajectories using wavelet analysis. In Proc. 1st IFIP International Conference on Artificial Intelligence Applications and Innovations, Toulouse, France, August 2004.

[8] S.-F. Wong and K.-Y. K. Wong. Fast face detection using quadtree based color analysis and support vector verification. In Proc. International Conference on Image Analysis and Recognition, pages 676–683, Porto, Portugal, September 2004.

[9] S.-F. Wong and K.-Y. K. Wong. Robust image segmentation by texture sensitive snake under low contrast environment. In Proc. International Conference on Informatics in Control, Automation and Robotics, pages 430–434, Setubal, Portugal, August 2004.

[10] F.I. Bashir, A.A. Ashfaq, and D. Schonfeld. Segmented trajectory based indexing and retrieval of video data. In International Conference on Image Processing, volume 2, pages 14–17, Barcelona, Spain, September 2003.

[11] S. Dagtas, W. Al-Khatib, A. Ghafoor, and R.L. Kashyap. Models for motion-based video indexing and retrieval. IEEE Transactions on Image Processing, 9(1):88–101, January 2000.

[12] G.L. Foresti, L. Marcenaro, and C.S. Regazzoni. Automatic detection and indexing of video-event shots for surveillance applications. IEEE Transactions on Multimedia, 4(4):459–471, December 2002.

[13] P. Muneesawang and L. Guan. Adaptive video indexing and automatic/semi-automatic relevance feedback. IEEE Transactions on Circuits and Systems for Video Technology, 15(8):1032–1046, August 2005.

[14] D.T. Nguyen and W. Gillespie. A video retrieval system based on compressed data from MPEG files. In IEEE Region 10 Conference on Convergent Technologies For The Asia-Pacific Region TENCON, volume 2, pages 555–560, October 2003.

[15] E. Stringa and C.S. Regazzoni. Real-time video-shot detection for scene surveillance applications. IEEE Transactions on Image Processing, 9(1):69–79, January 2000.

[16] H. Yi, D. Rajan, and L.T. Chia. A motion based scene tree for browsing and retrieval of compressed videos. In 2nd ACM International Workshop on Multimedia Databases, pages 10–18, Washington DC, USA, November 2004.

[17] D.L. Zhang and J.F. Nunamaker. A natural language approach to content-based video indexing and retrieval for interactive e-learning. IEEE Transactions on Multimedia, 6(3):450–458, June 2004.

[18] S. Berretti and A. Del Bimbo. Multiresolution spatial partitioning for shape representation. In Proc. International Conference on Pattern Recognition, volume II, pages 775–778, Cambridge, UK, August 2004.

[19] J. Gao, N. Thakoor, and S. Jung. A motion field reconstruction scheme for smooth boundary video object segmentation. In Proc. International Conference on Image Processing, volume I, pages 381–384, Singapore, October 2004.

[20] T. Gevers. Robust segmentation and tracking of colored objects in video. IEEE Transactions on Circuits and Systems for Video Technology; T. Gevers. Image segmentation and similarity of color-texture objects. IEEE Transactions on Multimedia, 4(4):509–516, December 2002.

[21] D. Kim, C.H. Ahn, and Y.S. Ho. Video segmentation using vector-valued diffusion and clustering. In Proc. International Conference on Image Processing, volume I, pages 989–992, Barcelona, Spain, September 2003.

[22] S.L. Phung, A. Bouzerdoum, and D. Chai. Skin segmentation using color pixel classification: analysis and comparison. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(1):148–154, January 2005.

[23] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Proc. 2005 IEEE Conference on Computer Vision and Pattern Recognition, volume I, pages 886–893, San Diego, CA, USA, June 2005.

[24] B. Leibe, E. Seemann, and B. Schiele. Pedestrian detection in crowded scenes. In Proc. 2005 IEEE Conference on Computer Vision and Pattern Recognition, volume I, pages 878–885, San Diego, CA, USA, June 2005.

[25] K. Mikolajczyk, C. Schmid, and A. Zisserman. Human detection based on a probabilistic assembly of robust part detectors. In Proc. 8th European Conference on Computer Vision, volume I, pages 69–82, Prague, Czech Republic, May 2004. Springer–Verlag.

[26] A. Mittal and N. Paragios. Motion-based background subtraction using adaptive kernel density estimation. In Proc. 2004 IEEE Conference on Computer Vision and Pattern Recognition, volume II, pages 302–309, Washington, D.C., USA, June 2004.

[27] V. Nair and J.J. Clark. An unsupervised, online learning framework for moving object detection. In Proc. 2004 IEEE Conference on Computer Vision and Pattern Recognition, volume II, pages 317–324, Washington, D.C., USA, June 2004.

[28] K. Okuma, A. Taleghani, N. de Freitas, J. Little, and D. Lowe. A boosted particle filter: Multitarget detection and tracking. In Proc. of European Conf. on Computer Vision, volume I, pages 28–39, Prague, May 2004.

[29] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Proc. 2001 IEEE Conference on Computer Vision and Pattern Recognition, volume I, pages 511–518, December 2001.

[30] P. Viola, M. Jones, and D. Snow. Detecting pedestrians using patterns of motion and appearance. International Journal of Computer Vision, 63(2):153–161, June 2005.

[31] T. Yu and Y. Wu. Collaborative tracking of multiple targets. In Proc. IEEE International Conference on Computer Vision and Pattern Recognition, volume I, pages 834–841, Washington, D.C., USA, June 2004.

[32] T. Yu and Y. Wu. Decentralized multiple target tracking using netted collaborative autonomous trackers. In Proc. IEEE International Conference on Computer Vision and Pattern Recognition, volume I, pages 939–946, San Diego, CA, USA, June 2005.

[33] T. Zhao and R. Nevatia. Bayesian human segmentation in crowded situations. In Proc. 2003 IEEE Conference on Computer Vision and Pattern Recognition, volume II, pages 459–466, Madison, Wisconsin, June 2003.

[34] T. Zhao and R. Nevatia. Tracking multiple humans in crowded environment. In Proc. 2004 IEEE Conference on Computer Vision and Pattern Recognition, volume II, pages 406–413, Washington, D.C., USA, June 2004.

[35] B. Epshtein and S. Ullman. Identifying semantically equivalent object fragments. In Proc. 2005 IEEE Conference on Computer Vision and Pattern Recognition, volume I, pages 2–9, San Diego, CA, USA, June 2005.

[36] I. Haritaoglu, D. Harwood, and L.S. Davis. W4: Real-time surveillance of people and their activities. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):809–819, August 2000.

[37] C.F. Olson and D.P. Huttenlocher. Automatic target recognition by matching oriented edge pixels. IEEE Transactions on Image Processing, 6(1):103–113, January 1997.

[38] T.N. Tan and K.D. Baker. Efficient image gradient based vehicle localization. IEEE Transactions on Image Processing, 9(8):1343–1356, August 2000.