TY - JOUR
T1 - Optimizing automated classification of variable stars in new synoptic surveys
AU - Long, James P.
AU - El Karoui, Noureddine
AU - Rice, John A.
AU - Richards, Joseph W.
AU - Bloom, Joshua S.
PY - 2012/3
Y1 - 2012/3
AB - Efficient and automated classification of periodic variable stars is becoming increasingly important as the scale of astronomical surveys grows. Several recent articles have used methods from machine learning and statistics to construct classifiers on databases of labeled, multi-epoch sources with the intention of using these classifiers to automatically infer the classes of unlabeled sources from new surveys. However, the same source observed with two different synoptic surveys will generally yield different derived metrics (features) from the light curve. Since such features are used in classifiers, this survey-dependent mismatch in feature space will typically lead to degraded classifier performance. In this article we show how and why feature distributions change using OGLE and Hipparcos light curves. To overcome survey systematics, we apply a noisification method, which attempts to empirically match distributions of features between the labeled sources used to construct the classifier and the unlabeled sources we wish to classify. Results from simulated and real-world light curves show that noisification can significantly improve classifier performance. In a three-class problem using light curves from Hipparcos and OGLE, noisification reduces the classifier error rate from 27.0% to 7.0%. We recommend that noisification be used for upcoming surveys such as Gaia and LSST, and we describe some of the promises and challenges of applying noisification to these surveys.
UR - http://www.scopus.com/inward/record.url?scp=84859067767&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84859067767&partnerID=8YFLogxK
U2 - 10.1086/664960
DO - 10.1086/664960
M3 - Article
AN - SCOPUS:84859067767
SN - 0004-6280
VL - 124
SP - 280
EP - 295
JO - Publications of the Astronomical Society of the Pacific
JF - Publications of the Astronomical Society of the Pacific
IS - 913
ER -