Optimizing automated classification of variable stars in new synoptic surveys

James P. Long; Noureddine El Karoui; John A. Rice; Joseph W. Richards; Joshua S. Bloom

doi:10.1086/664960

Research output: Contribution to journal › Article › peer-review

19 Scopus citations

Abstract

Efficient and automated classification of periodic variable stars is becoming increasingly important as the scale of astronomical surveys grows. Several recent articles have used methods from machine learning and statistics to construct classifiers on databases of labeled, multi-epoch sources with the intention of using these classifiers to automatically infer the classes of unlabeled sources from new surveys. However, the same source observed with two different synoptic surveys will generally yield different derived metrics (features) from the light curve. Since such features are used in classifiers, this survey-dependent mismatch in feature space will typically lead to degraded classifier performance. In this article we show how and why feature distributions change using OGLE and Hipparcos light curves. To overcome survey systematics, we apply a noisification method, which attempts to empirically match distributions of features between the labeled sources used to construct the classifier and the unlabeled sources we wish to classify. Results from simulated and real-world light curves show that noisification can significantly improve classifier performance. In a three-class problem using light curves from Hipparcos and OGLE, noisification reduces the classifier error rate from 27.0% to 7.0%. We recommend that noisification be used for upcoming surveys such as Gaia and LSST, and we describe some of the promises and challenges of applying noisification to these surveys.
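The abstract describes noisification only at a high level: labeled light curves are altered so that features derived from them resemble those of the survey to be classified. As a rough illustration, and not the authors' actual procedure, the sketch below assumes that the mismatch comes from two sources, fewer epochs and larger photometric errors in the target survey. It downsamples a well-sampled labeled light curve and adds Gaussian noise. The function name `noisify`, its parameters, and the sinusoidal test curve are all hypothetical.

```python
import math
import random


def noisify(times, mags, n_epochs, sigma, rng):
    """Downsample a light curve to n_epochs points and add Gaussian noise.

    Assumption: the target survey differs from the labeled survey only in
    number of epochs and photometric scatter (sigma, in magnitudes).
    """
    if n_epochs > len(times):
        raise ValueError("cannot draw more epochs than the light curve has")
    # Pick a random subset of epochs and keep them in time order.
    keep = sorted(rng.sample(range(len(times)), n_epochs))
    new_times = [times[i] for i in keep]
    # Perturb each retained magnitude with zero-mean Gaussian noise.
    new_mags = [mags[i] + rng.gauss(0.0, sigma) for i in keep]
    return new_times, new_mags


# Hypothetical well-sampled periodic variable (period of 3 days).
rng = random.Random(0)
times = [0.1 * i for i in range(500)]
mags = [12.0 + 0.5 * math.sin(2 * math.pi * t / 3.0) for t in times]

# Make it resemble a sparser, noisier (hypothetical) target survey.
t_noisy, m_noisy = noisify(times, mags, n_epochs=60, sigma=0.05, rng=rng)
```

Features would then be recomputed from `t_noisy` and `m_noisy` before training, so that the classifier is built on data resembling what it will see in the new survey.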

Original language	English (US)
Pages (from-to)	280-295
Number of pages	16
Journal	Publications of the Astronomical Society of the Pacific
Volume	124
Issue number	913
DOIs	https://doi.org/10.1086/664960
State	Published - Mar 2012
Externally published	Yes

ASJC Scopus subject areas

Astronomy and Astrophysics
Space and Planetary Science

Cite this

@article{edf0751a76fd4b6d97304dc02b2191b5,

title = "Optimizing automated classification of variable stars in new synoptic surveys",

abstract = "Efficient and automated classification of periodic variable stars is becoming increasingly important as the scale of astronomical surveys grows. Several recent articles have used methods from machine learning and statistics to construct classifiers on databases of labeled, multi-epoch sources with the intention of using these classifiers to automatically infer the classes of unlabeled sources from new surveys. However, the same source observed with two different synoptic surveys will generally yield different derived metrics (features) from the light curve. Since such features are used in classifiers, this survey-dependent mismatch in feature space will typically lead to degraded classifier performance. In this article we show how and why feature distributions change using OGLE and Hipparcos light curves. To overcome survey systematics, we apply a noisification method, which attempts to empirically match distributions of features between the labeled sources used to construct the classifier and the unlabeled sources we wish to classify. Results from simulated and real-world light curves show that noisification can significantly improve classifier performance. In a three-class problem using light curves from Hipparcos and OGLE, noisification reduces the classifier error rate from 27.0\% to 7.0\%. We recommend that noisification be used for upcoming surveys such as Gaia and LSST, and we describe some of the promises and challenges of applying noisification to these surveys.",

author = "Long, {James P.} and {El Karoui}, Noureddine and Rice, {John A.} and Richards, {Joseph W.} and Bloom, {Joshua S.}",

year = "2012",

month = mar,

doi = "10.1086/664960",

language = "English (US)",

volume = "124",

pages = "280--295",

journal = "Publications of the Astronomical Society of the Pacific",

issn = "0004-6280",

publisher = "University of Chicago",

number = "913",

}

TY - JOUR

T1 - Optimizing automated classification of variable stars in new synoptic surveys

AU - Long, James P.

AU - El Karoui, Noureddine

AU - Rice, John A.

AU - Richards, Joseph W.

AU - Bloom, Joshua S.

PY - 2012/3

Y1 - 2012/3

N2 - Efficient and automated classification of periodic variable stars is becoming increasingly important as the scale of astronomical surveys grows. Several recent articles have used methods from machine learning and statistics to construct classifiers on databases of labeled, multi-epoch sources with the intention of using these classifiers to automatically infer the classes of unlabeled sources from new surveys. However, the same source observed with two different synoptic surveys will generally yield different derived metrics (features) from the light curve. Since such features are used in classifiers, this survey-dependent mismatch in feature space will typically lead to degraded classifier performance. In this article we show how and why feature distributions change using OGLE and Hipparcos light curves. To overcome survey systematics, we apply a noisification method, which attempts to empirically match distributions of features between the labeled sources used to construct the classifier and the unlabeled sources we wish to classify. Results from simulated and real-world light curves show that noisification can significantly improve classifier performance. In a three-class problem using light curves from Hipparcos and OGLE, noisification reduces the classifier error rate from 27.0% to 7.0%. We recommend that noisification be used for upcoming surveys such as Gaia and LSST, and we describe some of the promises and challenges of applying noisification to these surveys.

UR - http://www.scopus.com/inward/record.url?scp=84859067767&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84859067767&partnerID=8YFLogxK

U2 - 10.1086/664960

DO - 10.1086/664960

M3 - Article

AN - SCOPUS:84859067767

SN - 0004-6280

VL - 124

SP - 280

EP - 295

JO - Publications of the Astronomical Society of the Pacific

JF - Publications of the Astronomical Society of the Pacific

IS - 913

ER -
