On the computation of stochastic search variable selection in linear regression with UDFs

Mario Navas, Carlos Ordonez, Veerabhadran Baladandayuthapani

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

Computing Bayesian statistics with traditional techniques is extremely slow, especially when large data sets have to be exported from a relational DBMS. We propose algorithms for large-scale processing of stochastic search variable selection (SSVS) for linear regression that can work entirely inside a DBMS. The traditional SSVS algorithm requires multiple scans of the input data in order to compute a regression model. With our optimizations, SSVS can be done either in one scan over the input table, using sufficient statistics, when the number of records is large, or in one scan per iteration when the data is high-dimensional. We consider storage layouts that efficiently exploit DBMS parallel processing of aggregate functions. Experimental results demonstrate the correctness, convergence and performance of our algorithms. Finally, the algorithms show good scalability for data with a very large number of records or a very high number of dimensions.
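
The one-scan idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' UDF code and it does not implement the SSVS sampler; it only shows, for ordinary least squares, how a single pass can reduce a table to summary matrices from which a regression model is then computed without touching the rows again. The names `accumulate`, `fit_from_stats`, `n`, `L` and `Q` are assumptions chosen for this illustration, and the data is made up.

```python
from typing import Iterable, List, Sequence, Tuple


def accumulate(rows: Iterable[Sequence[float]]) -> Tuple[int, List[float], List[List[float]]]:
    """Single pass over rows (x_1, ..., x_d, y).

    Returns n (row count), L (per-column sums) and Q (sums of cross-products).
    These summaries are the hypothetical stand-in for an in-DBMS aggregate.
    """
    n = 0
    L: List[float] = []
    Q: List[List[float]] = []
    for row in rows:
        if n == 0:
            k = len(row)
            L = [0.0] * k
            Q = [[0.0] * k for _ in range(k)]
        n += 1
        for i, zi in enumerate(row):
            L[i] += zi
            for j, zj in enumerate(row):
                Q[i][j] += zi * zj
    return n, L, Q


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [list(A[i]) + [b[i]] for i in range(m)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = M[r][m] - sum(M[r][k] * x[k] for k in range(r + 1, m))
        x[r] = s / M[r][r]
    return x


def fit_from_stats(n: int, L: List[float], Q: List[List[float]]) -> List[float]:
    """Least-squares coefficients [intercept, b_1, ..., b_d] from the summaries only."""
    d = len(L) - 1  # last column is the response y
    # Normal equations (X'X) beta = X'y for a design matrix with a leading 1s column.
    A = [[float(n)] + L[:d]] + [[L[i]] + Q[i][:d] for i in range(d)]
    b = [L[d]] + [Q[i][d] for i in range(d)]
    return solve(A, b)


# Made-up data generated from y = 1 + 2*x1 + 3*x2 (no noise).
rows = [(1.0, 2.0, 9.0), (2.0, 1.0, 8.0), (3.0, 4.0, 19.0),
        (4.0, 3.0, 18.0), (5.0, 5.0, 26.0)]

n, L, Q = accumulate(rows)       # the only pass over the data
coef = fit_from_stats(n, L, Q)   # uses the summaries, not the rows
```

The summaries have a size that depends only on the number of columns, which is why this approach suits tables with many records; for high-dimensional data the cross-product matrix itself becomes large, which is consistent with the abstract describing a separate one-scan-per-iteration variant for that case.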

Original language: English (US)
Title of host publication: Proceedings - 10th IEEE International Conference on Data Mining, ICDM 2010
Pages: 941-946
Number of pages: 6
State: Published - 2010
Event: 10th IEEE International Conference on Data Mining, ICDM 2010 - Sydney, NSW, Australia
Duration: Dec 14 2010 to Dec 17 2010

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Other

Other: 10th IEEE International Conference on Data Mining, ICDM 2010
Country/Territory: Australia
City: Sydney, NSW
Period: 12/14/10 to 12/17/10

Keywords

  • Bayesian statistics
  • UDF
  • Variable selection

ASJC Scopus subject areas

  • General Engineering
