Data mining algorithms as a service in the cloud exploiting relational database systems

Carlos Ordonez; Javier García-García; Carlos Garcia-Alvarado; Wellington Cabrera; Veerabhadran Baladandayuthapani; Mohammed S. Quraishi

doi:10.1145/2463676.2465240

Biostatistics

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

We present a novel cloud system based on DBMS technology, in which data mining algorithms are offered as a service. A local DBMS connects to the cloud, and the cloud system returns computed data mining models as small relational tables that are archived and can be easily transferred, queried and integrated with the client database. Unlike other analytic systems, our solution is not based on MapReduce. Our system avoids exporting large tables outside the local DBMS and thus avoids transmitting large volumes of data to the cloud. The system offers three processing modes: local, cloud and hybrid; a linear cost model is used to choose the processing mode. In hybrid mode, processing is split between the local DBMS and the cloud DBMS. Our system has a job scheduler with FIFO, SJF and RR policies to improve response time and return partial results early. The cloud DBMS performs dynamic job scheduling, model computation and model archive management. Our system incorporates several optimizations: local data set summarization with sufficient statistics, sampling, caching matrices in RAM and selectively transmitting small matrices back and forth. We show that, in general, the most efficient computing mechanism is hybrid processing: summarizing or sampling the data set in the local DBMS, transferring small matrices back and forth, and leaving mathematically complex methods as a task for the cloud DBMS.
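As an illustration of the summarization idea in the abstract, the following minimal Python sketch shows how a data set can be reduced to small matrices that are enough to compute a model elsewhere. The abstract does not say which sufficient statistics the system uses; the choice here (row count `n`, linear sum `L`, quadratic sum `Q`) and the function names `summarize` and `mean_and_covariance` are assumptions made for this example only, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def summarize(rows: Sequence[Sequence[float]]) -> Tuple[int, List[float], Matrix]:
    """One pass over the data: return n, L = sum of x, and Q = sum of x * x^T.

    Assumed stand-in for the "local" step: the output size depends only on
    the number of columns d, not on the number of rows.
    """
    d = len(rows[0])
    n = 0
    L = [0.0] * d
    Q = [[0.0] * d for _ in range(d)]
    for x in rows:
        n += 1
        for i in range(d):
            L[i] += x[i]
            for j in range(d):
                Q[i][j] += x[i] * x[j]
    return n, L, Q


def mean_and_covariance(n: int, L: List[float], Q: Matrix) -> Tuple[List[float], Matrix]:
    """Assumed stand-in for the "remote" step: uses only the summaries.

    Returns the mean vector and the population covariance matrix,
    cov[i][j] = Q[i][j] / n - mean[i] * mean[j].
    """
    d = len(L)
    mean = [L[i] / n for i in range(d)]
    cov = [[Q[i][j] / n - mean[i] * mean[j] for j in range(d)] for i in range(d)]
    return mean, cov


if __name__ == "__main__":
    data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    n, L, Q = summarize(data)
    print(mean_and_covariance(n, L, Q))
```

In this sketch only `n`, `L` and `Q` would need to leave the local side: for d columns that is 1 + d + d² numbers, however many rows the table has.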

Original language	English (US)
Title of host publication	SIGMOD 2013 - International Conference on Management of Data
Pages	1001-1004
Number of pages	4
DOIs	https://doi.org/10.1145/2463676.2465240
State	Published - 2013
Event	2013 ACM SIGMOD Conference on Management of Data, SIGMOD 2013 - New York, NY, United States; Duration: Jun 22 2013 → Jun 27 2013

Publication series

Name	Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print)	0730-8078

Other

Other	2013 ACM SIGMOD Conference on Management of Data, SIGMOD 2013
Country/Territory	United States
City	New York, NY
Period	6/22/13 → 6/27/13

Keywords

Algorithms
Languages
Performance
Theory

ASJC Scopus subject areas

Software
Information Systems

Cite this

Ordonez, C., García-García, J., Garcia-Alvarado, C., Cabrera, W., Baladandayuthapani, V., & Quraishi, M. S. (2013). Data mining algorithms as a service in the cloud exploiting relational database systems. In SIGMOD 2013 - International Conference on Management of Data (pp. 1001-1004). (Proceedings of the ACM SIGMOD International Conference on Management of Data). https://doi.org/10.1145/2463676.2465240

Data mining algorithms as a service in the cloud exploiting relational database systems. / Ordonez, Carlos; García-García, Javier; Garcia-Alvarado, Carlos et al.
SIGMOD 2013 - International Conference on Management of Data. 2013. p. 1001-1004 (Proceedings of the ACM SIGMOD International Conference on Management of Data).

Ordonez, C, García-García, J, Garcia-Alvarado, C, Cabrera, W, Baladandayuthapani, V & Quraishi, MS 2013, Data mining algorithms as a service in the cloud exploiting relational database systems. in SIGMOD 2013 - International Conference on Management of Data. Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 1001-1004, 2013 ACM SIGMOD Conference on Management of Data, SIGMOD 2013, New York, NY, United States, 6/22/13. https://doi.org/10.1145/2463676.2465240

Ordonez C, García-García J, Garcia-Alvarado C, Cabrera W, Baladandayuthapani V, Quraishi MS. Data mining algorithms as a service in the cloud exploiting relational database systems. In SIGMOD 2013 - International Conference on Management of Data. 2013. p. 1001-1004. (Proceedings of the ACM SIGMOD International Conference on Management of Data). doi: 10.1145/2463676.2465240

@inproceedings{9d2264da136d46a6a2ff78c4b242e9bd,

title = "Data mining algorithms as a service in the cloud exploiting relational database systems",

abstract = "We present a novel cloud system based on DBMS technology, in which data mining algorithms are offered as a service. A local DBMS connects to the cloud, and the cloud system returns computed data mining models as small relational tables that are archived and can be easily transferred, queried and integrated with the client database. Unlike other analytic systems, our solution is not based on MapReduce. Our system avoids exporting large tables outside the local DBMS and thus avoids transmitting large volumes of data to the cloud. The system offers three processing modes: local, cloud and hybrid; a linear cost model is used to choose the processing mode. In hybrid mode, processing is split between the local DBMS and the cloud DBMS. Our system has a job scheduler with FIFO, SJF and RR policies to improve response time and return partial results early. The cloud DBMS performs dynamic job scheduling, model computation and model archive management. Our system incorporates several optimizations: local data set summarization with sufficient statistics, sampling, caching matrices in RAM and selectively transmitting small matrices back and forth. We show that, in general, the most efficient computing mechanism is hybrid processing: summarizing or sampling the data set in the local DBMS, transferring small matrices back and forth, and leaving mathematically complex methods as a task for the cloud DBMS.",

keywords = "Algorithms, Languages, Performance, Theory",

author = "Carlos Ordonez and Javier Garc{\'i}a-Garc{\'i}a and Carlos Garcia-Alvarado and Wellington Cabrera and Veerabhadran Baladandayuthapani and Quraishi, {Mohammed S.}",

year = "2013",

doi = "10.1145/2463676.2465240",

language = "English (US)",

isbn = "9781450320375",

series = "Proceedings of the ACM SIGMOD International Conference on Management of Data",

pages = "1001--1004",

booktitle = "SIGMOD 2013 - International Conference on Management of Data",

}

TY - GEN

T1 - Data mining algorithms as a service in the cloud exploiting relational database systems

AU - Ordonez, Carlos

AU - García-García, Javier

AU - Garcia-Alvarado, Carlos

AU - Cabrera, Wellington

AU - Baladandayuthapani, Veerabhadran

AU - Quraishi, Mohammed S.

PY - 2013

Y1 - 2013

AB - We present a novel cloud system based on DBMS technology, in which data mining algorithms are offered as a service. A local DBMS connects to the cloud, and the cloud system returns computed data mining models as small relational tables that are archived and can be easily transferred, queried and integrated with the client database. Unlike other analytic systems, our solution is not based on MapReduce. Our system avoids exporting large tables outside the local DBMS and thus avoids transmitting large volumes of data to the cloud. The system offers three processing modes: local, cloud and hybrid; a linear cost model is used to choose the processing mode. In hybrid mode, processing is split between the local DBMS and the cloud DBMS. Our system has a job scheduler with FIFO, SJF and RR policies to improve response time and return partial results early. The cloud DBMS performs dynamic job scheduling, model computation and model archive management. Our system incorporates several optimizations: local data set summarization with sufficient statistics, sampling, caching matrices in RAM and selectively transmitting small matrices back and forth. We show that, in general, the most efficient computing mechanism is hybrid processing: summarizing or sampling the data set in the local DBMS, transferring small matrices back and forth, and leaving mathematically complex methods as a task for the cloud DBMS.

KW - Algorithms

KW - Languages

KW - Performance

KW - Theory

UR - http://www.scopus.com/inward/record.url?scp=84880559186&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84880559186&partnerID=8YFLogxK

U2 - 10.1145/2463676.2465240

DO - 10.1145/2463676.2465240

M3 - Conference contribution

AN - SCOPUS:84880559186

SN - 9781450320375

T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data

SP - 1001

EP - 1004

BT - SIGMOD 2013 - International Conference on Management of Data

T2 - 2013 ACM SIGMOD Conference on Management of Data, SIGMOD 2013

Y2 - 22 June 2013 through 27 June 2013

ER -
