CSIS7101: Advanced Database Technologies

Detailed (tentative) schedule

Articles and presentation slides are available for downloading.
To view ps files, download GSview; to view pdf files, download Acrobat Reader.
Files marked gz are gzip-compressed; use gunzip on Unix/Linux or WinZip on Windows to decompress them.



1 Introduction

2 Spatial data

Project: Implement a system that stores and indexes spatial data using R*-trees (code for R*-tree provided). The system allows visualization of different datasets and supports window queries, spatial joins, and k-NN-search queries.
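
The three query types named in the spatial data project can be pinned down with a brute-force baseline, which is also handy for checking the answers returned by an index. The following is only a sketch under stated assumptions: the Rect type and every function name are hypothetical, objects are assumed to be 2-D axis-aligned rectangles, and nothing here reflects the interface of the provided R*-tree code.

```python
import heapq
import math
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle (hypothetical type, not from the provided code)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles overlap; touching edges count as overlap."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def window_query(data: list[Rect], window: Rect) -> list[int]:
    """Indices of all rectangles that intersect the query window."""
    return [i for i, r in enumerate(data) if intersects(r, window)]


def min_dist(r: Rect, x: float, y: float) -> float:
    """Smallest Euclidean distance from point (x, y) to rectangle r."""
    dx = max(r.xmin - x, 0.0, x - r.xmax)
    dy = max(r.ymin - y, 0.0, y - r.ymax)
    return math.hypot(dx, dy)


def knn_query(data: list[Rect], x: float, y: float, k: int) -> list[int]:
    """Indices of the k rectangles nearest to (x, y), nearest first."""
    return heapq.nsmallest(k, range(len(data)),
                           key=lambda i: min_dist(data[i], x, y))


def spatial_join(left: list[Rect], right: list[Rect]) -> list[tuple[int, int]]:
    """All index pairs (i, j) such that left[i] intersects right[j]."""
    return [(i, j)
            for i, a in enumerate(left)
            for j, b in enumerate(right)
            if intersects(a, b)]
```

Each function scans every object, so the cost is linear (quadratic for the join). An index-based implementation should return the same result sets while visiting far fewer objects.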

3 Spatiotemporal data

Project: Implement the TPR-tree as described in the paper by Saltenis et al. You will be provided with the R*-tree code as the basis for your implementation.
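
As a starting point for thinking about the TPR-tree, the sketch below illustrates the idea of a time-parameterized bounding rectangle: a rectangle whose lower edges move with the minimum velocity of the enclosed objects and whose upper edges move with the maximum velocity, so that it keeps enclosing them at any time from the reference time onward. The class and function names are hypothetical, objects are assumed to be 2-D points moving at constant velocity, and this is not the node layout or insertion algorithm from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MovingPoint:
    """2-D point with constant velocity; pos is its position at the reference time."""
    pos: tuple[float, float]
    vel: tuple[float, float]

    def at(self, t: float, t_ref: float) -> tuple[float, float]:
        dt = t - t_ref
        return (self.pos[0] + self.vel[0] * dt, self.pos[1] + self.vel[1] * dt)


@dataclass(frozen=True)
class TimeParamRect:
    """Bounding rectangle whose edges move linearly with time (hypothetical type)."""
    t_ref: float
    lo: tuple[float, float]    # lower corner at t_ref
    hi: tuple[float, float]    # upper corner at t_ref
    v_lo: tuple[float, float]  # velocity of the lower edges
    v_hi: tuple[float, float]  # velocity of the upper edges

    def at(self, t: float):
        """Extent of the rectangle at time t (only valid for t >= t_ref)."""
        if t < self.t_ref:
            raise ValueError("the rectangle only bounds its objects for t >= t_ref")
        dt = t - self.t_ref
        lo = tuple(p + v * dt for p, v in zip(self.lo, self.v_lo))
        hi = tuple(p + v * dt for p, v in zip(self.hi, self.v_hi))
        return lo, hi

    def contains(self, point: MovingPoint, t: float) -> bool:
        lo, hi = self.at(t)
        where = point.at(t, self.t_ref)
        return all(l <= c <= h for l, c, h in zip(lo, where, hi))


def bound(points: list[MovingPoint], t_ref: float) -> TimeParamRect:
    """Smallest rectangle at t_ref, with edge velocities set to the velocity extremes."""
    dims = range(2)
    return TimeParamRect(
        t_ref=t_ref,
        lo=tuple(min(p.pos[d] for p in points) for d in dims),
        hi=tuple(max(p.pos[d] for p in points) for d in dims),
        v_lo=tuple(min(p.vel[d] for p in points) for d in dims),
        v_hi=tuple(max(p.vel[d] for p in points) for d in dims),
    )
```

The rectangle is tight only at the reference time and generally grows afterward, which is why an index built on such rectangles has to trade off their area over a time horizon rather than at a single instant.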

4 Multimedia and Time-series data

Project: Implement time-series search using Keogh’s techniques. The system should store and visualize time series, support similarity searches, and support more complex operations (e.g., clustering).
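
The schedule does not say which of Keogh's techniques are meant. As one illustration, the sketch below assumes Piecewise Aggregate Approximation (PAA) with its lower-bounding distance, used in a filter-and-refine range search: the cheap lower bound discards series that cannot match, and the exact Euclidean distance is computed only for the survivors. The function names are hypothetical, and all series are assumed to have the same length, divisible by the number of segments.

```python
import math


def paa(series: list[float], segments: int) -> list[float]:
    """Reduce a series to `segments` values by averaging equal-width frames."""
    n = len(series)
    if segments <= 0 or n % segments != 0:
        raise ValueError("series length must be a positive multiple of segments")
    width = n // segments
    return [sum(series[i * width:(i + 1) * width]) / width
            for i in range(segments)]


def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa_lower_bound(pa: list[float], pb: list[float], n: int) -> float:
    """Distance between two PAA vectors, scaled so that it never exceeds the
    Euclidean distance between the original length-n series."""
    return math.sqrt(n / len(pa)) * euclidean(pa, pb)


def range_search(database: list[list[float]], query: list[float],
                 radius: float, segments: int) -> list[int]:
    """Indices of all series within `radius` of the query (filter, then refine)."""
    n = len(query)
    query_paa = paa(query, segments)
    hits = []
    for i, series in enumerate(database):
        # Filter: if even the lower bound exceeds the radius, skip the series.
        if paa_lower_bound(paa(series, segments), query_paa, n) > radius:
            continue
        # Refine: confirm the candidate with the exact distance.
        if euclidean(series, query) <= radius:
            hits.append(i)
    return hits
```

Because the bound never overestimates the true distance, the filter step cannot discard a genuine match. In a real system the PAA vectors would be computed once and stored in an index rather than recomputed per query.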

5 Data mining I (mining association rules and sequence patterns)

6 Public holiday (Chung Yeung Festival)

7 Data mining II (clustering and classification)

Project: Implement the PROCLUS subspace clustering method and test it on data from the UCI KDD archive and the UCI Machine Learning archive.
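
One building block of PROCLUS, as it is usually described, is the Manhattan segmental distance: the Manhattan distance restricted to a cluster's relevant dimensions and divided by the number of those dimensions, so that clusters with subspaces of different sizes remain comparable. The sketch below shows only that distance and a point-assignment step built on it. The function names are hypothetical, and the medoids and their dimension sets are assumed to be given; choosing them is the main part of the method and is not shown.

```python
from typing import Sequence


def manhattan_segmental(x: Sequence[float], y: Sequence[float],
                        dims: set[int]) -> float:
    """Average absolute difference between x and y over the dimensions in dims."""
    if not dims:
        raise ValueError("dims must contain at least one dimension")
    return sum(abs(x[d] - y[d]) for d in dims) / len(dims)


def assign_points(points: Sequence[Sequence[float]],
                  medoids: Sequence[Sequence[float]],
                  medoid_dims: Sequence[set[int]]) -> list[int]:
    """Assign each point to the medoid that is nearest under that medoid's
    own dimension set; returns one medoid index per point."""
    labels = []
    for p in points:
        dists = [manhattan_segmental(p, m, dims)
                 for m, dims in zip(medoids, medoid_dims)]
        labels.append(min(range(len(medoids)), key=dists.__getitem__))
    return labels
```

Note that each medoid is compared against a point in a different subspace, which is the reason the distance is normalized by the number of dimensions.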

8 Data warehousing and OLAP

Project: Implement DynaMat and test it with simulated query workloads. You will use the APB benchmark data to test your implementation.

9 Strings and biological data

Gonzalo Navarro: A guided tour to approximate string matching. ACM Computing Surveys 33(1): 31-88, 2001. [ pdf ] (reading: Sections 1,2,3,5.1,5.2,8.1,8.3)
Ela Hunt, Malcolm P. Atkinson, Robert W. Irving: A Database Index to Large Biological Sequences. VLDB 2001. [ pdf ] (optional reading)
Tamer Kahveci, Ambuj K. Singh: Efficient Index Structures for String Databases. VLDB 2001. [ pdf ]

10 Semi-structured and XML data

Raghav Kaushik, Pradeep Shenoy, Philip Bohannon, Ehud Gudes: Exploiting Local Similarity for Efficient Indexing of Paths in Graph Structured Data. ICDE 2002. [ ps.gz ]
Divesh Srivastava, Shurug Al-Khalifa, H. V. Jagadish, Nick Koudas, Jignesh M. Patel, Yuqing Wu: Structural Joins: A Primitive for Efficient XML Query Pattern Matching. ICDE 2002. [ pdf ]
Nicolas Bruno, Divesh Srivastava, Nick Koudas: Holistic Twig Joins: Optimal XML Pattern Matching. ACM SIGMOD 2002. [ pdf ] (optional reading)

11 Storage and query processing on modern machines

Project: Implement various database operators optimized for performance in main memory. Compare the performance of the different versions of each algorithm. Design main-memory processing techniques for more complex query types (e.g., multidimensional data analysis). Your implementation will be based on the articles above.

12 Cache conscious indexes

Project: Implement a cache conscious B+-tree and a cache conscious R*-tree. You will be given the (secondary memory) R*-tree code as a starting point for your implementation.

13 Summary




* Some slides on Data Mining topics are taken/modified from
** Some slides on Time-series similarity topics are taken/modified from