You are here
Alleviating class imbalance using data sampling: Examining the effects on classification algorithms
- Date Issued:
- 2006
- Summary:
- Imbalanced class distributions typically cause poor classifier performance on the minority class, which also tends to be the class with the highest cost of mis-classification. Data sampling is a common solution to this problem, and numerous sampling techniques have been proposed to address it. Prior research examining the performance of these techniques has been narrow and limited. This work uses thorough empirical experimentation to compare the performance of seven existing data sampling techniques using five different classifiers and four different datasets. The work addresses which sampling techniques produce the best performance in the presence of class unbalance, which classifiers are most robust to the problem, as well as which sampling techniques perform better or worse with each classifier. Extensive statistical analysis of these results is provided, in addition to an examination of the qualitative effects of the sampling techniques on the types of predictions made by the C4.5 classifier.
Title: | Alleviating class imbalance using data sampling: Examining the effects on classification algorithms. |
543 views
467 downloads |
---|---|---|
Name(s): |
Napolitano, Amri E. Florida Atlantic University, Degree grantor Khoshgoftaar, Taghi M., Thesis advisor College of Engineering and Computer Science Department of Computer and Electrical Engineering and Computer Science |
|
Type of Resource: | text | |
Genre: | Electronic Thesis Or Dissertation | |
Issuance: | monographic | |
Date Issued: | 2006 | |
Publisher: | Florida Atlantic University | |
Place of Publication: | Boca Raton, Fla. | |
Physical Form: | application/pdf | |
Extent: | 100 p. | |
Language(s): | English | |
Summary: | Imbalanced class distributions typically cause poor classifier performance on the minority class, which also tends to be the class with the highest cost of mis-classification. Data sampling is a common solution to this problem, and numerous sampling techniques have been proposed to address it. Prior research examining the performance of these techniques has been narrow and limited. This work uses thorough empirical experimentation to compare the performance of seven existing data sampling techniques using five different classifiers and four different datasets. The work addresses which sampling techniques produce the best performance in the presence of class unbalance, which classifiers are most robust to the problem, as well as which sampling techniques perform better or worse with each classifier. Extensive statistical analysis of these results is provided, in addition to an examination of the qualitative effects of the sampling techniques on the types of predictions made by the C4.5 classifier. | |
Identifier: | 9780542931291 (isbn), 13413 (digitool), FADT13413 (IID), fau:10263 (fedora) | |
Collection: | FAU Electronic Theses and Dissertations Collection | |
Note(s): |
College of Engineering and Computer Science Thesis (M.S.)--Florida Atlantic University, 2006. |
|
Subject(s): |
Combinatorial group theory Data mining Decision trees Machine learning |
|
Held by: | Florida Atlantic University Libraries | |
Persistent Link to This Record: | http://purl.flvc.org/fcla/dt/13413 | |
Sublocation: | Digital Library | |
Use and Reproduction: | Copyright © is held by the author, with permission granted to Florida Atlantic University to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder. | |
Use and Reproduction: | http://rightsstatements.org/vocab/InC/1.0/ | |
Host Institution: | FAU | |
Is Part of Series: | Florida Atlantic University Digital Library Collections. |