Date(s) - 13/09/2016
11:00 am - 12:00 pm
As increasing volumes of RDF data are being produced and analyzed, many massively distributed architectures have been proposed for storing and querying this data. These architectures are characterized first, by their RDF partitioning and storage method, and second, by their approach for distributed query optimization, i.e. determining which operations to execute on each node in order to compute the query answers.
We have developed CliqueSquare, a novel optimization approach for evaluating conjunctive RDF queries in a massively parallel environment. We focus on reducing query response time, and thus seek to build flat plans, where the number of joins encountered on a root-to-leaf path in the plan is minimized. We present a family of optimization algorithms, relying on n-ary (star) equality joins to build flat plans, and compare their ability to find the flattest possible plans. We have deployed our algorithms in a MapReduce-based RDF platform and demonstrate experimentally the benefit of the flat plans built by our best algorithms.
Joint work with François Goasdoué, Zoi Kaoudi, Jorge Quiané-Ruiz and Stamatis Zampetakis.
Ioana Manolescu is a senior researcher at Inria Saclay, and the lead of an Inria team focusing on scalable database architectures for complex large data. She is a member of the PVLDB Endowment Board of Trustees and of the ACM SIGMOD Jim Gray PhD dissertation committee, an associate editor of the ACM Transactions on the Web, and the program chair of the Scientific and Statistical Data Management Conference in 2016. She has co-authored more than 130 articles in international journals and conferences, and contributed recently to a book on “Web Data Management” by S. Abiteboul, I. Manolescu, P. Rigaux, M.-C. Rousset and P. Senellart. She has been a post-doctoral fellow and visiting professor at Politecnico di Milano, and obtained a PhD in 2001 from Université de Versailles Saint-Quentin and Inria Rocquencourt. Her main research interests include algebraic and storage optimizations for semistructured data, in particular data models for the Semantic Web; novel data models and languages for complex data management; data models and algorithms for fact-checking; and distributed architectures for complex large data.