DBLunch – Lecture by Gao Cong

Time: Friday 11th April, 13:00
Place: Aalborg University, Selma Lagerløfs vej 300, room 0.2.13

Abstract:
I will give a brief introduction to some of my research in text mining, data mining, XML databases, and data warehouses. I will focus on the research problems rather than the detailed algorithms. My main purpose is to acquaint you with some of my past research topics, so that we can see whether I could contribute to your research projects in some way. More specifically, I plan to introduce the following research problems:

1. Extracting question-answer pairs from online forums (e.g. http://www.tripadvisor.com/ForumHome). The question-answer pairs extracted from the Web could be used to enrich the knowledge base of community-based question-answering services, such as Yahoo! Answers (http://answers.yahoo.com/).

2. Mining gene expression data. This includes two sub-problems. First, we design algorithms for mining association rules from high-dimensional data with few samples. Second, we use the discovered rules to build classifiers for gene expression data.

3. Querying XML with update syntax. A transform query is a kind of composite query newly proposed in the W3C XQuery Update working draft (http://www.w3.org/TR/2006/WD-xqupdate-20060127/). A transform query is defined in terms of XML update syntax. When a transform query is posed on an XML tree T, it returns the XML tree that would be produced by executing its embedded update on T, without modifying T itself. Transform queries support a variety of applications, e.g. the enforcement of XML access control, XML message transformation, and XML hypothetical queries.

4. Partial evaluation in distributed query evaluation. The problem is to evaluate XML queries over a tree that is fragmented, both horizontally and vertically, over a number of sites.

5. Data cleaning. The problem is to repair inconsistencies in a data warehouse. The inconsistencies here are violations of functional dependencies and conditional functional dependencies.
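
To make the notion of inconsistency in the last item concrete, here is a minimal sketch of detecting (not repairing) violations of a single functional dependency. It is only an illustration and not the method presented in the talk; the function name fd_violations, the attribute names, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def fd_violations(rows, lhs, rhs):
    """Return the groups of rows that agree on `lhs` but disagree on `rhs`.

    A functional dependency lhs -> rhs holds when any two rows with equal
    `lhs` values also have equal `rhs` values. Every group returned here
    is a witness that the dependency is violated.
    """
    # Group rows by their values on the left-hand-side attributes.
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        groups[key].append(row)

    # Keep only the groups with more than one distinct right-hand side.
    violations = {}
    for key, members in groups.items():
        rhs_values = {tuple(row[attr] for attr in rhs) for row in members}
        if len(rhs_values) > 1:
            violations[key] = members
    return violations


# Hypothetical sample data: the dependency zip -> city is assumed.
rows = [
    {"zip": "Z1", "city": "A"},
    {"zip": "Z1", "city": "B"},  # conflicts with the row above
    {"zip": "Z2", "city": "C"},
]
conflicts = fd_violations(rows, ["zip"], ["city"])
```

Under the same assumptions, a conditional functional dependency could be checked by first restricting rows to those matching the condition's pattern and then running the same test on that subset.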