Applying support vector machines to imbalanced datasets

Rehan Akbani, Stephen Kwek, Nathalie Japkowicz

Research output: Contribution to journal (conference article, peer-reviewed)

902 Scopus citations

Abstract

Support Vector Machines (SVM) have been extensively studied and have shown remarkable success in many applications. However, the success of SVM is very limited when it is applied to the problem of learning from imbalanced datasets, in which negative instances heavily outnumber the positive instances (e.g. in gene profiling and credit card fraud detection). This paper discusses the factors behind this failure and explains why the common strategy of undersampling the training data may not be the best choice for SVM. We then propose an algorithm for overcoming these problems, based on a variant of the SMOTE algorithm by Chawla et al. combined with Veropoulos et al.'s different error costs algorithm. We compare the performance of our algorithm against these two algorithms, along with undersampling and regular SVM, and show that our algorithm outperforms all of them.
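The sketch below is not the authors' exact method; it is a minimal illustration of the two ingredients named in the abstract, using scikit-learn's class-weighted SVC as a stand-in for Veropoulos et al.'s different error costs and imbalanced-learn's standard SMOTE as a stand-in for the paper's SMOTE variant. The dataset, class weights, and parameters are assumptions chosen for illustration only.

```python
# Sketch: combine SMOTE-style oversampling of the minority class with
# different error costs for an SVM on an imbalanced dataset.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic imbalanced dataset: negatives heavily outnumber positives.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Oversample the minority (positive) class in the training set only.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)

# Assign a higher misclassification cost to the positive class
# (the "different error costs" idea); the weight of 5 is illustrative.
svm = SVC(kernel="rbf", class_weight={0: 1, 1: 5})
svm.fit(X_res, y_res)

print(classification_report(y_test, svm.predict(X_test)))
```

Resampling is applied only to the training split so that the test set keeps its original class distribution, which is what the reported metrics should reflect.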

Original language: English (US)
Pages (from-to): 39-50
Number of pages: 12
Journal: Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
Volume: 3201
State: Published - 2004
Externally published: Yes
Event: 15th European Conference on Machine Learning, ECML 2004 - Pisa, Italy
Duration: Sep 20, 2004 - Sep 24, 2004

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
