Lightweight Graphical Models for Selectivity Estimation Without Independence Assumptions

TitleLightweight Graphical Models for Selectivity Estimation Without Independence Assumptions
Publication TypeJournal Articles
Year of Publication2011
AuthorsTzoumas K, Deshpande A, Jensen CS
JournalProceedings of the VLDB Endowment
Volume4
Issue7
Date Published2011///
Abstract

As a result of decades of research and industrial development, mod-ern query optimizers are complex software artifacts. However, the
quality of the query plan chosen by an optimizer is largely deter-
mined by the quality of the underlying statistical summaries. Small
selectivity estimation errors, propagated exponentially, can lead to
severely sub-optimal plans. Modern optimizers typically maintain
one-dimensional statistical summaries and make the attribute value
independence and join uniformity assumptions for efficiently esti-
mating selectivities. Therefore, selectivity estimation errors in to-
day’s optimizers are frequently caused by missed correlations be-
tween attributes. We present a selectivity estimation approach that
does not make the independence assumptions. By carefully using
concepts from the field of graphical models, we are able to fac-
tor the joint probability distribution of all the attributes in the data-
base into small, usually two-dimensional distributions. We describe
several optimizations that can make selectivity estimation highly
efficient, and we present a complete implementation inside Post-
greSQL’s query optimizer. Experimental results indicate an order
of magnitude better selectivity estimates, while keeping optimiza-
tion time in the range of tens of milliseconds.