List of Seminars

Large graph mining: patterns, tools and case studies
Christos Faloutsos (CMU), Hanghang Tong (CMU)

Efficient Approximate Search on String Collections
Marios Hadjieleftheriou (AT&T Labs), Chen Li (UC Irvine)

Enumerating Large Query Results
Sara Cohen (Hebrew University of Jerusalem), Benny Kimelfeld (Hebrew University of Jerusalem), Yehoshua Sagiv (Hebrew University of Jerusalem)

Preference Queries from OLAP and Data Mining Perspective
Jian Pei (SFU), Yufei Tao (CUHK), Jiawei Han (UIUC)

Distributed Object Bases: An Integrated Approach
Markus Kirchberg (Institute for Infocomm Research, A*STAR, Singapore and Information Science Research Centre, Palmerston North, New Zealand), Hui Ma (Information Science Research Centre, Palmerston North, New Zealand and Victoria University of Wellington, New Zealand), Klaus-Dieter Schewe (Information Science Research Centre, Palmerston North, New Zealand)

Mashups, SaaS, and Cloud Computing: Evolutions and Revolutions in the Integration Landscape
Boualem Benatallah (University of New South Wales, Sydney, Australia), Fabio Casati (University of Trento, Italy), Florian Daniel (University of Trento, Italy), Jin Yu (University of New South Wales, Sydney, Australia)

Similarity Searching: Indexing, Nearest Neighbor Finding, Dimensionality Reduction, and Embedding Methods for Applications in Multimedia Databases
Hanan Samet (University of Maryland, College Park)

Seminar #1:
Large graph mining: patterns, tools and case studies

Christos Faloutsos (CMU),
Hanghang Tong (CMU)
Duration: 3 hrs

What do graphs look like? How do they evolve over time? How can we find patterns, anomalies and regularities in them? How can we find influential nodes in a network? We will present theoretical results and algorithms as well as case studies on several real applications. Our emphasis is on the intuition behind each method, and on guidelines for the practitioner.
The tutorial has the following parts: (a) Statistical properties, models and graph generators of static and evolving networks. (b) Tools for the analysis of static and dynamic graphs, such as the Singular Value Decomposition, tensor decomposition for community detection, HITS/PageRank, etc. (c) Proximity measurements on graphs: the main ideas for quantifying the closeness of two nodes of a graph, fast algorithms to compute the proximity scores, and applications of proximity such as CenterPiece subgraphs, pattern matching, trend analysis, etc. (d) Case studies of how a virus, information or influence spreads through a network, how to find influential bloggers or nodes to target for viral marketing, how to find fraudsters on eBay, and how to find communities in graphs.
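As a rough illustration of one of the tools named in part (b), the following minimal sketch computes PageRank by power iteration. The function name `pagerank`, the damping factor of 0.85, the handling of nodes without outgoing links and the toy graph are assumptions made for this example; they are not taken from the tutorial material.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a graph given as {node: [successor, ...]}."""
    # Collect every node that appears as a source or as a target.
    nodes = sorted(set(out_links) | {v for ts in out_links.values() for v in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread uniformly (an assumption).
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u in nodes:
            targets = out_links.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy graph: "c" is pointed to by three of the four nodes.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_graph)
most_central = max(scores, key=scores.get)
```

On this toy graph the node with the most incoming links ends up with the highest score, which is the intuition behind using such scores to look for influential nodes.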
For more details, visit the tutorial's website

Christos Faloutsos is a Professor at Carnegie Mellon University. He has received the Presidential Young Investigator Award from the National Science Foundation (1989), the Research Contributions Award at ICDM 2006, twelve "best paper" awards, and several teaching awards. He has served as a member of the executive committee of SIGKDD and has published over 160 refereed articles, 11 book chapters and one monograph. He holds five patents and has given over 20 tutorials and 10 invited distinguished lectures. His research interests include data mining for streams and graphs, fractals, database performance, and indexing for multimedia and bio-informatics data.
(Full CV at www.cs.cmu.edu/~christos/webvitae.pdf)

Hanghang Tong is a senior Ph.D. student in the Machine Learning Department at Carnegie Mellon University. He has received best paper awards from SIAM-DM 2008 and ICDM 2006, and he has 25 refereed publications. He holds an M.S. degree and a B.S. degree from Tsinghua University, P.R. China. His research interests include data mining for multimedia and for graphs.
(Full CV at www.cs.cmu.edu/~htong/pdf/cv_Tong.pdf)

Seminar #2:
Efficient Approximate Search on String Collections

Marios Hadjieleftheriou (AT&T Labs),
Chen Li (UC Irvine)
Duration: 3 hrs

This tutorial provides a comprehensive overview of recent research progress on the important problem of approximate search in collections of strings. It aims to identify existing search algorithms and selectivity-estimation techniques, as well as their merits and limitations. This problem is of great interest for a variety of applications, including data cleaning, query relaxation, and spell checking. The performance of approximate string searching algorithms is critical in these applications in order to support very large database sizes and high query throughput. In addition, accurate selectivity estimation of approximate string queries is equally important for query optimization purposes. We will present a succinct summary of existing work, which will portray the latent relationships between different approaches to performing approximate string searches and hence give a deeper understanding of the state of the art. We will also conduct a comparative study of the merits and pitfalls associated with different algorithms and techniques, which will help identify the right tool for the right problem.
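To make the problem statement concrete, here is a minimal sketch of approximate search over a small string collection. It assumes edit (Levenshtein) distance as the similarity measure and a simple length filter as the only pruning step; the function names and the sample strings are hypothetical, and the algorithms surveyed in the tutorial are considerably more sophisticated than this linear scan.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # delete ca
                           cur[j - 1] + 1,               # insert cb
                           prev[j - 1] + (ca != cb)))    # substitute or match
        prev = cur
    return prev[-1]


def approximate_search(collection, query, max_dist):
    """Return (string, distance) pairs within max_dist edits of the query."""
    results = []
    for s in collection:
        # Length filter: strings whose lengths differ by more than max_dist
        # cannot be within max_dist edits, so the costly check is skipped.
        if abs(len(s) - len(query)) > max_dist:
            continue
        d = edit_distance(s, query)
        if d <= max_dist:
            results.append((s, d))
    return sorted(results, key=lambda pair: (pair[1], pair[0]))


# Hypothetical dirty data, as might arise in a data-cleaning scenario.
names = ["jerusalem", "jeruzalem", "jerusalme", "berlin"]
matches = approximate_search(names, "jerusalem", max_dist=1)
```

The length filter is a small example of the general idea of discarding candidates cheaply before running the expensive distance computation.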

Marios Hadjieleftheriou is an Inventive Researcher at AT&T Labs – Research. He received his Ph.D. degree in Computer Science from the University of California, Riverside, and his B.S. in Computer Science from the National Technical University of Athens, Greece. He was also a Postdoctoral fellow at Boston University. His research interests include core database management and indexing, data mining, data stream management, and data privacy and security.

Chen Li is an associate professor in the Department of Computer Science at the University of California, Irvine. He received his Ph.D. degree in Computer Science from Stanford University in 2001, and his M.S. and B.S. in Computer Science from Tsinghua University, China, in 1996 and 1994, respectively. He received a National Science Foundation CAREER Award in 2003 and a few other NSF grants. He was once a part-time Visiting Research Scientist at Google. His research interests are in the fields of data management and information search, including text search, data cleansing, data integration, and data warehousing.

Seminar #3:
Enumerating Large Query Results

Sara Cohen (Hebrew University of Jerusalem),
Benny Kimelfeld (Hebrew University of Jerusalem),
Yehoshua Sagiv (Hebrew University of Jerusalem)
Duration: 1.5 hrs

Query evaluation is usually measured in terms of data complexity and combined complexity. Unfortunately, these complexity classes do not encourage the design of efficient algorithms for many interesting query evaluation problems. The main weak spot of these measures is their failure to take into consideration the size of the output, as well as the speed at which answers are returned to the user. When enumerating large query results in an online scenario, the latter aspect is critical, as users will typically read the answers as they are returned. In fact, the answers should actually be returned in ranked order, an aspect that is completely ignored by the standard complexity classes.
The purpose of this tutorial is twofold. First, we will discuss complexity measures that capture desirable properties of enumeration of large query results, such as answering speed and ranking. Second, we will discuss general enumeration techniques that have proven useful in the past to derive efficient query evaluation algorithms and top-k algorithms, for a variety of problems (e.g., keyword proximity search, ranked tree patterns, full disjunctions). These techniques achieve the desired complexity goals and will be the main focus of our talk. Their importance lies in the fact that they are of a very general nature, and can be used to solve many different problems. For example, two of the techniques are of a plug-and-play variety, i.e., they provide algorithms for general problems by requiring a solution to much simpler problems. Thus, the enumeration techniques we will present are a useful and important toolbox for researchers and practitioners developing query evaluation algorithms for problems with (potentially) large results.
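The idea of returning answers incrementally and in ranked order can be illustrated with a small sketch. It assumes a toy setting in which each answer is a pair drawn from two sorted lists and is ranked by the sum of its components; the generator `ranked_pairs` and the sample data are hypothetical and are not one of the techniques presented in the tutorial.

```python
import heapq
from itertools import islice


def ranked_pairs(xs, ys):
    """Lazily yield (score, x, y) in nondecreasing score = x + y.

    Both xs and ys must be sorted in ascending order. Only a frontier of
    candidate pairs is kept in the heap, so the first answers are produced
    without materializing all len(xs) * len(ys) combinations.
    """
    if not xs or not ys:
        return
    heap = [(xs[0] + ys[0], 0, 0)]
    seen = {(0, 0)}
    while heap:
        score, i, j = heapq.heappop(heap)
        yield score, xs[i], ys[j]
        # The successors of (i, j) are the only new candidates for "next best".
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(xs) and nj < len(ys) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (xs[ni] + ys[nj], ni, nj))


xs, ys = [1, 4, 7], [2, 3, 10]
top4 = list(islice(ranked_pairs(xs, ys), 4))
```

A consumer can stop after any number of answers, so the cost paid depends on how many answers are actually read rather than on the size of the full result.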

Sara Cohen is an assistant professor of computer science at Hebrew University of Jerusalem. She received her Ph.D. degree with excellence from Hebrew University in 2004. Prior to joining Hebrew University, Sara was a faculty member at the Technion – Israel Institute of Technology. Her research interests include imprecise querying, query equivalence, automatic data generation and desktop querying.

Benny Kimelfeld is a post-doctoral researcher at the IBM Almaden Research Center. His research work is centered around management of probabilistic databases, flexible queries, keyword search over databases, ranked query evaluation and XML query optimization.

Yehoshua Sagiv is a professor of computer science at Hebrew University. He received a B.Sc. from Hebrew University in 1971, an M.Sc. from Weizmann Institute in 1976, and a Ph.D. from Princeton University in 1978. For three years he was an assistant professor at the University of Illinois, Urbana-Champaign, and since the fall of 1981 he has been on the faculty of Hebrew University. He has also held visiting positions at Stanford University and IBM Almaden Research Center. His research interests include databases, the world-wide web, information retrieval and logic programming.

Seminar #4:
Preference Queries from OLAP and Data Mining Perspective

Jian Pei (SFU),
Yufei Tao (CUHK),
Jiawei Han (UIUC)
Duration: 3 hrs

Preference queries, including ranking queries (aka top-k queries) and skyline queries, have been a hot topic in the database community. From the traditional query answering and spatial database perspectives, much research has been dedicated to developing efficient and scalable algorithms for preference queries.
Recently, a few novel studies have approached preference queries from the online analytic processing (OLAP) and data mining perspective, leading to some very interesting results and attractive applications. For example, data cubing methods have been developed so that various ranking queries and skyline queries in various subspaces can be answered efficiently. With the new cubing methods, preference queries can be used in an OLAP style, and thus the applicability and user-friendliness of preference queries are dramatically extended. As another example, various reverse preference queries are employed to model preference mining problems from different angles, which find novel applications and pose new challenges.
To provide a shortcut to the frontier of novel applications of preference queries in OLAP and data mining, in this tutorial we will present a concise survey on preference queries from the OLAP and data mining perspective. We will summarize the exciting progress of recent years, and highlight a few intriguing problems for future studies.
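For readers unfamiliar with the two kinds of preference queries named above, the following minimal sketch shows a skyline query and a top-k query over a handful of points. It assumes that smaller values are preferred in every dimension and that top-k ranking uses a weighted sum; the quadratic-time `skyline` function, the weights and the sample data are illustrative assumptions, not the cubing methods discussed in the tutorial.

```python
def dominates(p, q):
    """p dominates q if p is no worse in every dimension and better in one."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def skyline(points):
    """Return the points that no other point dominates (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def top_k(points, weights, k):
    """Return the k points with the smallest weighted sum of coordinates."""
    return sorted(points, key=lambda p: sum(w * x for w, x in zip(weights, p)))[:k]


# Hypothetical hotels described by (price, distance to the beach in km).
hotels = [(50, 8), (80, 2), (60, 5), (90, 6), (55, 9)]
sky = skyline(hotels)
best2 = top_k(hotels, weights=(1, 10), k=2)
```

The skyline needs no weights and keeps every hotel that is a best trade-off for some preference, whereas the top-k answer depends on the chosen scoring function.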

Jian Pei received his Ph.D. degree in Computing Science from Simon Fraser University, Canada, in 2002, under Professor Jiawei Han's supervision. He is currently an Associate Professor of Computing Science at Simon Fraser University, Canada. His research interests can be summarized as developing effective and efficient data analysis techniques for novel data intensive applications. Currently, he is interested in advanced techniques of data mining, data warehousing, online analytical processing, database systems, and information retrieval, as well as their applications in web search, sensor networks, health-informatics, bioinformatics, and business. He has published prolifically in refereed journals, conferences, and workshops. He is an associate editor of IEEE Transactions on Knowledge and Data Engineering. He has served regularly on the organization committees and the program committees of many international conferences and workshops, and has also been a reviewer for the leading academic journals in his fields. He is a senior member of ACM and IEEE. He is the recipient of the British Columbia Innovation Council 2005 Young Innovator Award, an IBM Faculty Award (2006), and the KDD'08 Best Application Paper Award.

Yufei Tao is engaged in research on database systems. He is particularly interested in index structures and query algorithms on multidimensional data, and has published primarily on temporal databases, spatial databases, and privacy preservation. He received the Hong Kong Young Scientist Award in 2002. He has served on the program committees of the most prestigious database conferences, such as SIGMOD, VLDB, and ICDE, and is currently an associate editor of ACM Transactions on Database Systems (TODS). He is now an assistant professor at the Chinese University of Hong Kong. In the past, he held positions at Carnegie Mellon University and the City University of Hong Kong.

Jiawei Han is a Professor in the Department of Computer Science at the University of Illinois. He has been working on research into data mining, data warehousing, stream data mining, spatiotemporal and multimedia data mining, biological data mining, social network analysis, text and Web mining, and software bug mining, with over 350 conference and journal publications. He has chaired or served on over 100 program committees of international conferences and workshops, and has served or is serving on the editorial boards of Data Mining and Knowledge Discovery, IEEE Transactions on Knowledge and Data Engineering, Journal of Computer Science and Technology, and Journal of Intelligent Information Systems. He is currently the founding Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data (TKDD). Jiawei has received three IBM Faculty Awards, the Outstanding Contribution Award at the 2002 International Conference on Data Mining, the ACM Service Award (1999), the ACM SIGKDD Innovation Award (2004), and the IEEE Computer Society Technical Achievement Award (2005). He is an ACM and IEEE Fellow. His book "Data Mining: Concepts and Techniques" (Morgan Kaufmann) has been widely used as a textbook.

Seminar #5:
Distributed Object Bases: An Integrated Approach

Markus Kirchberg (Institute for Infocomm Research, A*STAR, Singapore and Information Science Research Centre, Palmerston North, New Zealand),
Hui Ma (Information Science Research Centre, Palmerston North, New Zealand and Victoria University of Wellington, New Zealand),
Klaus-Dieter Schewe (Information Science Research Centre, Palmerston North, New Zealand)
Duration: 1.5 hrs

Throughout the 1980s and early 1990s, object bases were advocated as the superior database technology that would supersede relational database technologies. While various object base implementations emerged during that time, these systems did not ‘live up to their expectations’. Among other reasons, this failure can be explained by developers grounding their implementations on ideas that originated from relational technologies. Apparently, the latter is not a suitable technology for the processing of large sets of highly structured complex objects. Since about 2004, object bases have returned to the focus of database research. Mainly driven by open source developments, object bases have established themselves as a complement to (object-)relational systems. They have found their place as embeddable persistence solutions in devices, on clients, in packaged software, in real-time control systems, etc.
In this tutorial, we will present an approach that breaks with the traditional school of thought in order to avoid repeating the mistakes made in the 1980s and 1990s. Instead, we have been looking for alternative (including previously neglected) ways to approach the integration of object-oriented programming and database technologies. This tutorial will focus on database architectures, data fragmentation, the integration of object-oriented programming, database programming and traditional database concepts, as well as current and future challenges in the field of object base research and development.

Markus Kirchberg holds a Masters degree from Clausthal University of Technology, Germany and a PhD degree from Massey University, New Zealand. Since January 2008, he has worked as a Research Fellow at the Institute for Infocomm Research (I2R), A*STAR in Singapore. His major research interests include data caching, database architectures, database programming and querying languages, distributed object bases, foundations of services computing, service discovery and composition, service-oriented architectures, and transaction processing. Markus has publications in journals such as Data & Knowledge Engineering, LNCS Journal on Data Semantics and Journal on Universal Computer Sciences, and has presented his research at conferences such as ADBIS, ADC, DEXA, ER, WISE, etc. Presently, he is the (technical) programme committee co-chair of APCCM 2009 and APSCC 2009, and serves as PC member for APWeb-WAIM 2009, BenchmarX 2009, CISIS 2009, DASFAA 2009, DEXA 2009, and ICSEA 2009.

Hui Ma is a Lecturer in Software Engineering at Victoria University of Wellington in New Zealand. She holds a BE from Tongji University, China, and an MSc and PhD from Massey University, New Zealand. Hui’s research interests include Web information systems, data distribution, Web services, and geographical information systems. Hui has published in journals such as Data & Knowledge Engineering, the Journal of Software and Systems, and the Journal on Data Semantics, and she has served on the programme committees of international conferences and workshops such as eCoMo, WISE, Baltic DB&IS, SDKB, and FP-UML.

Klaus-Dieter Schewe holds a Masters and a PhD degree from University of Bonn, Germany, and a DSc degree from the Brandenburgian Technical University, Germany. He is Director of the Information Science Research Centre in New Zealand. His major research interests include database theory and systems, logic in databases, formal methods and semantics, and systems development methodologies, in particular for Web information systems. He has more than 200 refereed publications in renowned journals and international conferences, has been programme committee chair for several international events such as FMLDO, ADC, QSIC, ER, SDKB, FoIKS, and WISE, and has been a PC member of more than 70 international conferences. Previously he gave tutorials at international conferences such as ER (1998, 2000, 2005), WMF (2000), ICWE (2005), ADBIS (2005), WAIM (2005), and iiWAS (2008).

Seminar #6:
Mashups, SaaS, and Cloud Computing:
Evolutions and Revolutions in the Integration Landscape

Boualem Benatallah (University of New South Wales, Sydney, Australia),
Fabio Casati (University of Trento, Italy),
Florian Daniel (University of Trento, Italy),
Jin Yu (University of New South Wales, Sydney, Australia)
Duration: 3 hrs

Integration is a key technique in software engineering, which aims to bring together disparate components and systems to form new, value-adding applications. In this context, web mashups, software/platform/infrastructure as a service, and cloud computing are novel, innovative paradigms and forms of integration that are fascinating a rapidly growing number of researchers and practitioners. Yet, the exact meaning and scope of those terms, the technological challenges underlying these paradigms, as well as the research and business opportunities they bring are still vague and sometimes hard to grasp.
This seminar aims at clarifying these paradigms, at discussing the relationships that exist among them, and at outlining the fundamental challenges and potentials they bring. The seminar starts by presenting the concepts and technologies that characterize web mashups and integration at the user interface level. Particular focus will be given to the similarities and differences between this novel, Web-based and often user-oriented form of integration and the traditional forms of integration that have been around for years (e.g., data and application integration and, more recently, service composition). We introduce a set of perspectives that can be used to look at mashup techniques and mashup tools, and we give various examples, discussing which characteristics of mashups are suitable for which kinds of applications and comparing the various approaches with one another and with traditional integration. We also discuss the potential shift that mashups bring in terms of “mass programming”, as opposed to programming done by a small set of skilled developers. We then discuss how the software, platform, infrastructure, and user experience provided “as a service” benefit from and affect mashup or traditional integration as well as application development techniques in general.

Boualem Benatallah is a professor in the School of Computer Science and Engineering (CSE), University of New South Wales (UNSW, Sydney, Australia). His main research interests are developing fundamental concepts and techniques in Web service composition and engineering. He has published more than 130 refereed papers including 33 journal papers. Most of his papers appeared in very selective and reputable conferences and journals. He is frequently invited to give keynote talks and tutorials on service computing at international conferences. Boualem has been PC chair of three main international conferences (BPM'05, ICSOC'05, WISE'07). He is the general chair of ICSOC'08 to be held in Sydney. He has acted as a key official (tutorial chair, workshops chair, publication chair, area chair) for several international conferences. He has been guest editor of five special issues for reputable international journals including ACM TOIT. He has been a PC member of all the reputable international conferences including VLDB, ICDE, WWW, EDBT, MDM, ICSOC, ICWS and ER. He is a member of the steering committees of BPM and ICSOC. He is on the editorial board of numerous international journals. He was a visiting professor at INRIA-LORIA, Claude Bernard University (France), University of Blaise Pascal (Clermont Ferrand, France), and University of Trento (Italy, 2007). As chair of the CSE research committee, he was a member of the team (comprising multiple university, government and industry partners) that constructed the successful bid for the new Smart Services CRC, which was awarded $30m in federal funding in 2007.

Fabio Casati is a professor of computer science at the University of Trento. He joined the University of Trento after 7 years at Hewlett-Packard USA, where he worked on research and solution development in business process intelligence and business process outsourcing. In Trento he is working on topics ranging from service execution analysis to improving the way scientists collaborate to mass programming, mass composition, and mass analysis. Fabio is a member of the editorial board of ACM TWEB and a member of the steering committees of the international conferences on Service-Oriented Computing and Business Process Management. He leads and participates in several EU, industry, and local projects in the areas above.

Florian Daniel is a postdoctoral researcher at the University of Trento, Italy. He holds a PhD in Information Technology from Politecnico di Milano. His main research interests include web applications and web mashups, adaptivity and context-awareness in web applications, and quality and privacy in business intelligence applications. Florian is an organizer of the international workshops Adaptation and Evolution in Web Systems Engineering (AEWSE) and Lightweight Composition on the Web (ComposableWeb).

Jin Yu is currently the Vice President of Engineering of Martsoft Corporation in Santa Clara, California. He is also a PhD candidate in computer science and engineering at the University of New South Wales, Australia. His research focuses on rich Internet applications, UI integration, and web mashups.

Seminar #7:
Similarity Searching: Indexing, Nearest Neighbor Finding, Dimensionality Reduction, and Embedding Methods for Applications in Multimedia Databases

Hanan Samet (Computer Science Department, Center for Automation Research, Institute for Advanced Computer Studies, University of Maryland, College Park)
Duration: 3 hrs

Similarity searching is a crucial part of retrieval in multimedia databases used for applications such as pattern recognition, image databases, and content-based retrieval. It involves finding objects in a data set S that are similar to a query object q based on some distance measure d, which is usually a distance metric. The search process is usually achieved by means of nearest neighbor finding.
Existing methods for handling similarity search in this setting fall into one of two classes. The first is based on mapping to a low-dimensional vector space, which is then indexed using representations such as k-d trees, R-trees, quadtrees, etc. The second directly indexes the objects based on distances, using representations such as the vp-tree, M-tree, etc. Mapping from a high-dimensional space into a low-dimensional space is known as dimensionality reduction and is achieved using SVD, DFT, etc. At times, when we just have distance information, the data objects are embedded in a vector space so that the distances of the embedded objects, as measured by the distance metric in the embedding space, approximate the actual distances. The search in the embedding space uses conventional indexing methods, which are often coupled with dimensionality reduction. Some commonly known embedding methods are multidimensional scaling, Lipschitz embeddings, and FastMap.
This seminar is organized into five parts that cover the five basic concepts outlined above: indexing low- and high-dimensional spaces, distance-based indexing, dimensionality reduction, embedding methods, and nearest neighbor searching.
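The second class of methods, distance-based indexing, can be sketched with a very small vp-tree. The version below is a simplified illustration under stated assumptions: the first point of each subset is used as the vantage point, the median distance is used as the split radius, and Euclidean distance stands in for the metric d. The names `build_vp_tree` and `nearest` are hypothetical and do not come from the seminar material.

```python
import math
from collections import namedtuple

# A node stores a vantage point, a split radius, and two subtrees.
VPNode = namedtuple("VPNode", "point radius inside outside")


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_vp_tree(points, dist):
    """Split the points by their distance to a vantage point (here: the first)."""
    if not points:
        return None
    vantage, rest = points[0], points[1:]
    if not rest:
        return VPNode(vantage, 0.0, None, None)
    dists = sorted((dist(vantage, p) for p in rest))
    radius = dists[len(dists) // 2]  # median distance to the vantage point
    inside = [p for p in rest if dist(vantage, p) <= radius]
    outside = [p for p in rest if dist(vantage, p) > radius]
    return VPNode(vantage, radius,
                  build_vp_tree(inside, dist), build_vp_tree(outside, dist))


def nearest(node, query, dist, best=None):
    """Return (distance, point) of the nearest neighbor of query."""
    if node is None:
        return best
    d = dist(query, node.point)
    if best is None or d < best[0]:
        best = (d, node.point)
    # Visit the side that contains the query first.
    if d <= node.radius:
        near, far = node.inside, node.outside
    else:
        near, far = node.outside, node.inside
    best = nearest(near, query, dist, best)
    # By the triangle inequality, every point on the far side is at least
    # |d - radius| away from the query, so that side can often be skipped.
    if abs(d - node.radius) <= best[0]:
        best = nearest(far, query, dist, best)
    return best


data = [(0, 0), (5, 5), (1, 2), (9, 1), (4, 7), (6, 3), (2, 8)]
tree = build_vp_tree(data, euclidean)
nn_dist, nn_point = nearest(tree, (5, 4), euclidean)
```

Only the distance function is used to build and search the index, which is why this style of structure applies even when the objects have no coordinates, as long as d satisfies the triangle inequality.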

Hanan Samet (http://www.cs.umd.edu/~hjs/) is a Professor of Computer Science at the University of Maryland. He has a Ph.D. from Stanford University. He is the author of the text "Foundations of Multidimensional and Metric Data Structures" published by Morgan-Kaufmann, an imprint of Elsevier, San Francisco, 2006 (http://www.cs.umd.edu/~hjs/multidimensional-book-flyer.pdf), and the first two texts in the field: "Design and Analysis of Spatial Data Structures" and "Applications of Spatial Data Structures: Computer Graphics, Image Processing and GIS" published by Addison-Wesley, Reading, MA, 1990. He is the founding chair of the ACM Special Interest Group on Spatial Information (SIGSPATIAL), was co-general chair of the 2008 ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS 2008), and has received best paper awards in SIGMOD 2008, SIGSPATIAL ACMGIS, and the 2007 Computers and Graphics. He is a Fellow of the ACM, IEEE, and the International Association for Pattern Recognition (IAPR).
