CSIS7101: Advanced Database Technologies
Detailed (tentative) schedule
Articles and presentation slides are available for downloading.
Download GSview (for viewing ps) here
. Download Acrobat Reader (for viewing pdf) here
.
gz is "gzipped". Use gunzip in Unix/Linux, or WinZip in Windows to
unzip.
1 Introduction
2 Spatial data
-
Overview by the instructor [
slides in pdf ]
-
Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger:
The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles,
ACM SIGMOD, 1990. [
pdf ] [
slides ]
-
Thomas Brinkhoff, Hans-Peter Kriegel, Bernhard Seeger: Efficient Processing
of Spatial Joins Using R-Trees, ACM SIGMOD, 1993. [
pdf ] [
slides ]
-
Gisli R. Hjaltason, Hanan Samet: Distance Browsing in Spatial Databases.
ACM TODS, Volume 24, Number 1, pp. 265-318, March 1999. [
pdf ] [
slides ]
Project: Implement a system that stores and indexes spatial data
using R*-trees (code for R*-tree provided). The system allows visualization
of different datasets and supports window queries, spatial joins, and k-NN-search
queries.
3 Spatiotemporal data
-
Overview by the instructor [
slides in pdf ]
-
George Kollios, Dimitrios Gunopulos and Vassilis Tsotras. On Indexing Mobile
Objects. ACM PODS, 1999. [
ps.gz ] [
slides ]
-
Simonas Saltenis, Christian S. Jensen, Scott T. Leutenegger, Mario A. Lopez:
Indexing the Positions of Continuously Moving Objects, ACM SIGMOD, 2000
[
pdf ] [
slides ]
-
Yufei Tao, Dimitris Papadias: MV3R-Tree: A Spatio-Temporal Access Method
for Timestamp and Interval Queries, VLDB, 2001. [
pdf ] [
slides ] (optional reading).
Project: Implement the TPR-tree as described in the paper
by Saltenis et al . You will be provided with the R*-tree code, based
on which you will do your implementation.
4 Multimedia and Time-series data
-
Overview by the instructor [
slides in pdf ][
slides in ppt ]**
-
Kevin S. Beyer, Jonathan Goldstein, Raghu Ramakrishnan, Uri Shaft: When
Is ''Nearest Neighbor'' Meaningful?, ICDT, 1999. [
ps.gz ] [
slides ]
-
Christos Faloutsos, King-Ip Lin: FastMap: A Fast Algorithm for Indexing,
Data-Mining and Visualization of Traditional and Multimedia Datasets, ACM
SIGMOD, 1995. [
pdf ] [
slides ]
-
Eamonn J. Keogh: Exact Indexing of Dynamic Time Warping, VLDB 2002. [
pdf ] [
slides ][
time-series tutorial ][
DTW computation applet ] (optional reading).
Project: Implement time-series search using Keogh’s
techniques . The system should store and visualize time-series, support
similarity searches and other more complex operations (e.g., clustering).
5 Data mining I (mining association rules and sequence patterns)
-
Presentation of data mining I topics [
slides in pdf ][
slides in ppt ]*
-
References:
Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association
Rules in Large Databases, VLDB 1994 [
pdf ]
Jiawei Han, Jian Pei, Yiwen Yin: Mining Frequent Patterns without Candidate
Generation, ACM SIGMOD, 2000. [
pdf ]
Heikki Mannila, Hannu Toivonen, and A. Inkeri Verkamo: Discovery of
frequent episodes in event sequences. Data Mining and Knowledge Discovery
1(3): 259 - 289, November 1997. [
pdf ] (optional reading).
6 Public holiday (Chung Yeung Festival)
7 Data mining II (clustering and classification)
-
Overview by the instructor [
slides in pdf ][
slides in ppt ]*
-
Sudipto Guha, Rajeev Rastogi, Kyuseok Shim: CURE: An Efficient Clustering
Algorithm for Large Databases, ACM SIGMOD, 1998. [
pdf ] [
color pdf ] [slides
in ppt ]
-
Charu C. Aggarwal, Cecilia Magdalena Procopiuc, Joel L. Wolf, Philip S.
Yu, Jong Soo Park: Fast Algorithms for Projected Clustering. ACM SIGMOD
1999. [
ps.gz ] [
slides in ppt ]
-
John C. Shafer, Rakesh Agrawal, Manish Mehta: SPRINT: A Scalable Parallel
Classifier for Data Mining, VLDB 1996. [
pdf ] [
slides in ppt ] (optional reading)
Project: Implement the PROCLUS
subspace clustering method and test it on data from the UCI
KDD archive and the UCI
Machine Learning archive .
8 Data warehousing and OLAP
-
Overview by the instructor [
slides in pdf ][
slides in ppt ]*
-
K.A. Ross and D. Srivastava, Fast Computation of Sparse Datacubes, VLDB
1997. [
ps.gz ] [slides
in ppt ] (optional reading)
-
Venky Harinarayan, Anand Rajaraman, Jeffrey D. Ullman: Implementing Data
Cubes Efficiently, ACM SIGMOD 1996. [
pdf ] [slides in pdf] [
slides in ppt ]
-
Yannis Kotidis, Nick Roussopoulos: DynaMat: A Dynamic View Management System
for Data Warehouses. ACM SIGMOD 1999. [
ps.gz ] [slides in pdf] [
slides in ppt ]
Project: Implement DynaMat
and test it with simulated query workloads. You will use the APB
benchmark data to test your implementation.
9 Strings and biological data
Gonzalo Navarro: A guided tour to approximate string matching.
ACM Computing Surveys 33(1): 31-88, 2001. [
pdf ] (reading: Sections 1,2,3,5.1,5.2,8.1,8.3)
Ela Hunt, Malcolm P. Atkinson, Robert W. Irving: A Database Index to
Large Biological Sequences. VLDB 2001. [
pdf ] (optional reading)
Tamer Kahveci, Ambuj K. Singh: Efficient Index Structures for String
Databases, VLDB 2001. [
pdf ]
10 Semi-structured and XML data
Raghav Kaushik, Pradeep Shenoy, Philip Bohannon, Ehud Gudes:
Exploiting Local Similarity for Efficient Indexing of Paths in Graph Structured
Data, ICDE 2002 [
ps.gz ]
Divesh Srivastava, Shurug Al-Khalifa, H. V. Jagadish, Nick Koudas,
Jignesh M. Patel, Yuqing Wu: Structural Joins: A Primitive for Efficient
XML Query Pattern Matching. ICDE 2002. [
pdf ]
Nicolas Bruno, Divesh Srivastava, Nick Koudas: Holistic Twig Joins:
Optimal XML Pattern Matching.ACM SIGMOD 2002. [
pdf ] (optional reading)
11 Storage and query processing on modern machines
-
Overview by the instructor [
slides in pdf ][
slides in ppt ]
Reference: Anastassia Ailamaki, David J. DeWitt, Mark D. Hill, Marios
Skounakis: Weaving Relations for Cache Performance. VLDB 2001 [
pdf ] (optional reading)
-
Ambuj Shatdal, Chander Kant, Jeffrey F. Naughton: Cache Conscious Algorithms
for Relational Query Processing. VLDB 1994 [
pdf ] [slides
in ppt]
-
Peter A. Boncz, Stefan Manegold, Martin L. Kersten: Database Architecture
Optimized for the New Bottleneck: Memory Access. VLDB 1999. [
pdf ] [slides
in ppt]
Project: Implement various database operators for optimized performance
in main-memory. Compare the performance of various versions of the algorithms.
Design main-memory processing techniques for more complex query types (e.g.,
multidimensional data analysis). Your implementation will be based on the
articles above.
12 Cache conscious indexes
-
Overview by the instructor [
slides in pdf ][
slides in ppt ]
-
Jun Rao, Kenneth A. Ross: Making B+-Trees Cache Conscious in Main Memory,
ACM SIGMOD 2000 [
ps.gz ] [slides
in ppt]
-
Shimin Chen, Phillip B. Gibbons, Todd C. Mowry, Gary Valentin: Fractal
Prefetching B+trees: Optimizing Both Cache and Disk Performance. ACM SIGMOD
2002. [
pdf ] [slides
in ppt] (optional reading)
-
Kihong Kim, Sang K. Cha, Keunjoo Kwon: Optimizing Multidimensional Index
Trees for Main Memory Access. ACM SIGMOD 2001 [
pdf ] [slides
in ppt]
Project: Implement cache conscious B+-tree and cache conscious R*-tree.
You will be given the (secondary memory) R*-tree code to start with your
implementation.
13 Summary
* Some slides on Data Mining topics are taken/modified from
** Some slides on Time-series similarity topics are taken/modified from