Background

Digital video recorders (DVRs) have been deployed virtually everywhere for surveillance and security purposes [1]. In particular, countries such as China, Korea and England have already installed large numbers of cameras in streets, shopping malls and educational institutions to ensure public security. To make surveillance efficient and cost-effective, there is an increasing demand for video analytics that help security officers monitor the large amount of video footage coming from these cameras. Moreover, when an incident occurs, the video files related to it must be retrieved accurately and efficiently so that they can be investigated as early as possible. A surveillance architecture that delivers proper video analytics while also offering efficient video retrieval is therefore crucial to effective city-wide or country-wide surveillance. However, although many video analytics algorithms [2-8] are available that could serve this purpose, they are not properly integrated with digital recording systems. Instead, analytics and recording are treated as separate processes, as depicted in Figure 1. As a result, each camera needs a very powerful processor to run both the encoding (compression) and analytics engines in parallel but without synchronization, which is neither efficient nor cost-effective.

On the other hand, the analytics engines available today are essentially alarm-based [9-11], which is too specific and inflexible for general video retrieval. For instance, an analytics engine may only generate alarms according to a predefined set of rules, such as unauthorized entry into an area of interest. When an incident occurs, however, such predefined rules may not capture it, and operators then have no way to retrieve precisely the relevant video footage. To alleviate this, the encoding and analytics engines must work together in a synchronized fashion, so that the analysis already performed by the encoder can be leveraged by the analytics engine. This is illustrated in Figure 2. Essentially, metadata such as motion information and frequency coefficients, obtained from the analysis done in the encoding stage, is reused by the analytics engine, which substantially reduces the computation required to achieve both video encoding and analytics. Moreover, the analytics engine should generate not only alarms according to predefined rules but also metadata that can be indexed into a database for efficient and flexible video retrieval.
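To make the reuse of encoder metadata more concrete, the following Python sketch assumes that the encoder exports one motion vector (dx, dy) per macroblock for each frame. The names `analyse_frame`, `FrameMetadata`, `min_magnitude` and the rectangular region of interest are hypothetical and do not describe the project's actual engines. From the same motion field, the sketch derives both a rule-based alarm and a metadata record that could be indexed for later retrieval, without decoding any pixels.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical input: per-macroblock motion vectors (dx, dy) exported by the
# encoder for one frame, laid out as rows x cols of the macroblock grid.
MotionField = List[List[Tuple[int, int]]]


@dataclass
class FrameMetadata:
    frame_index: int
    moving_blocks: int
    # Bounding box of moving blocks in macroblock units (row0, col0, row1, col1),
    # inclusive; None when no block is moving.
    bbox: Optional[Tuple[int, int, int, int]]
    alarm: bool


def analyse_frame(frame_index: int, mv: MotionField,
                  roi: Tuple[int, int, int, int],
                  min_magnitude: float = 2.0) -> FrameMetadata:
    """Derive motion metadata and a rule-based alarm from encoder motion vectors."""
    threshold_sq = min_magnitude * min_magnitude
    # A block counts as "moving" if its motion vector is long enough.
    moving = [
        (r, c)
        for r, row in enumerate(mv)
        for c, (dx, dy) in enumerate(row)
        if dx * dx + dy * dy >= threshold_sq
    ]
    if not moving:
        return FrameMetadata(frame_index, 0, None, False)
    rows = [r for r, _ in moving]
    cols = [c for _, c in moving]
    bbox = (min(rows), min(cols), max(rows), max(cols))
    r0, c0, r1, c1 = roi
    # Example predefined rule: raise an alarm if any moving block lies in the ROI.
    alarm = any(r0 <= r <= r1 and c0 <= c <= c1 for r, c in moving)
    # The metadata record is produced whether or not the alarm fires.
    return FrameMetadata(frame_index, len(moving), bbox, alarm)


if __name__ == "__main__":
    field = [[(0, 0)] * 4 for _ in range(3)]  # 3 x 4 macroblock grid, all static
    field[1][2] = (3, 4)                      # one block moving with magnitude 5
    print(analyse_frame(7, field, roi=(0, 2, 2, 3)))
```

The design point is that the alarm is only one output: the `FrameMetadata` record is kept even when no rule fires, so footage can still be searched later by motion activity or location.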

Furthermore, since a typical city-wide or country-wide surveillance scenario produces a huge amount of video and metadata, a distributed architecture for video retrieval is necessary to ensure scalability. We therefore propose in this project to develop a surveillance architecture that is scalable and distributed in nature while offering efficient analytics, encoding and retrieval capabilities. The data flow of retrieval requests, retrieval results and real-time analytics alarms in such a distributed architecture is illustrated in Figure 3. As described above, each DVR in this architecture contains both an encoding engine and an analytics engine. The encoded video bitstream can be used for traditional recording, the analytics alarms can be used to alert operators, and the analytics metadata can be indexed into the local database in the DVR. In this way, the video retrieval database is fully distributed across the whole surveillance network, making it more efficient to serve video retrieval requests, since they can be answered in parallel by the individual DVRs rather than by a single central database.
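The retrieval data flow can be sketched in Python under simplifying assumptions: each DVR's local database is stood in for by an in-memory SQLite table, and the network fan-out of a retrieval request is stood in for by a thread pool. `DVRNode`, `distributed_query`, the `meta` table schema and the query criteria are all hypothetical illustrations, not the proposed system's interfaces. Each node answers the request from its own index, and the partial results are merged into one time-ordered list.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Hit = Tuple[str, int, float]  # (camera_id, frame, timestamp)


class DVRNode:
    """Hypothetical DVR holding a local metadata index (in-memory SQLite)."""

    def __init__(self, camera_id: str):
        self.camera_id = camera_id
        # check_same_thread=False lets a pool thread query this connection;
        # each node is queried by only one thread at a time below.
        self.db = sqlite3.connect(":memory:", check_same_thread=False)
        self.db.execute("CREATE TABLE meta (frame INTEGER, ts REAL, moving INTEGER)")
        self.db.execute("CREATE INDEX idx_ts ON meta (ts)")

    def index(self, frame: int, ts: float, moving: int) -> None:
        """Store one analytics metadata record locally."""
        self.db.execute("INSERT INTO meta VALUES (?, ?, ?)", (frame, ts, moving))

    def query(self, t0: float, t1: float, min_moving: int) -> List[Hit]:
        """Serve a retrieval request from the local index only."""
        rows = self.db.execute(
            "SELECT frame, ts FROM meta WHERE ts BETWEEN ? AND ? AND moving >= ? "
            "ORDER BY ts",
            (t0, t1, min_moving),
        ).fetchall()
        return [(self.camera_id, frame, ts) for frame, ts in rows]


def distributed_query(nodes: List[DVRNode], t0: float, t1: float,
                      min_moving: int) -> List[Hit]:
    """Fan the request out to every DVR in parallel and merge the results by time."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        partials = pool.map(lambda n: n.query(t0, t1, min_moving), nodes)
        merged = [hit for part in partials for hit in part]
    return sorted(merged, key=lambda hit: hit[2])


if __name__ == "__main__":
    a, b = DVRNode("cam-A"), DVRNode("cam-B")
    a.index(1, 10.0, 0)
    a.index(2, 11.0, 5)
    b.index(1, 10.5, 3)
    b.index(2, 20.0, 8)
    print(distributed_query([a, b], 9.0, 15.0, min_moving=2))
```

In a real deployment the thread pool would be replaced by network requests to each DVR, but the structure is the same: no central node stores all the metadata, and the coordinator only merges the per-DVR answers.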

[1] URL: http://www.sys-con.com/read/349064.htm.

[2] J. S. C. Yuk, K.-Y. K. Wong, R. H. Y. Chung, F. Y. L. Chin and K. P. Chow.
Real-time Multiple Head Shape Detection and Tracking System with Decentralized Trackers. In Proc. 6th IEEE International Conference on Intelligent System Design and Applications (ISDA06), volume II, pages 384-389, Jinan, Shandong, China, October 2006.

[3] B. W.-S. Yiu, K.-Y. K. Wong, F. Y. L. Chin and R. H. Y. Chung.
Explicit Contour Model for Vehicle Tracking with Automatic Hypothesis Validation. In Proc. International Conference on Image Processing (ICIP05), volume II, pages 582-585, Genova, September 2005.

[4] H. Y. Chung, F. Y. L. Chin, K.-Y. K. Wong, K. P. Chow, T. Luo and H. S. K. Fung.
Efficient Block-Based Motion Segmentation Method Using Motion Vector Consistency.
In Proc. IAPR Conference on Machine Vision Applications (MVA2005), pages 550-553, Tsukuba Science City, Japan, May 2005.

[5] T. Yu and Y. Wu.
Decentralized multiple target tracking using netted collaborative autonomous trackers. In Proc. IEEE International Conference on Computer Vision and Pattern Recognition, volume I, pages 939–946, San Diego, CA, USA, June 2005.

[6] P. Viola, M. Jones, and D. Snow.
Detecting pedestrians using patterns of motion and appearance. International Journal of Computer Vision, 63(2):153–161, June 2005.

[7] T. Yu and Y. Wu.
Decentralized multiple target tracking using netted collaborative autonomous trackers. In Proc. IEEE International Conference on Computer Vision and Pattern Recognition, volume I, pages 939–946, San Diego, CA, USA, June 2005.

[8] K. Okuma, A. Taleghani, N. de Freitas, J. Little and D. Lowe.
A boosted particle filter: Multitarget detection and tracking. In Proc. European Conference on Computer Vision (ECCV04), volume I, pages 28–39, Prague, May 2004.

[9] G. Foresti, L. Marcenaro, and C. Regazzoni.
Automatic detection and indexing of video-event shots for surveillance applications. IEEE Transactions on Multimedia, 4(4):459–471, December 2002.

[10] E. Stringa and C. Regazzoni.
Real-time video-shot detection for scene surveillance applications. IEEE Transactions on Image Processing, 9(1):69–79, January 2000.

[11] S. Ferrando, G. Gera and C. Regazzoni.
Classification of Unattended and Stolen Objects in Video-Surveillance System. In Proc. IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS'06), page 21, November 2006.