Using weighted nearest neighbor to benefit from unlabeled data

K Driessens; P Reutemann; B Pfahringer; C Leschi

doi:10.1007/11731139_10

K Driessens^*, P Reutemann, B Pfahringer, C Leschi

^*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

The development of data-mining applications such as text classification and molecular profiling has shown the need for machine learning algorithms that can benefit from both labeled and unlabeled data, where the unlabeled examples often greatly outnumber the labeled examples. In this paper we present a two-stage classifier that improves its predictive accuracy by making use of the available unlabeled data. It applies a weighted nearest neighbor classification algorithm, using the combined example sets as a knowledge base. The examples from the unlabeled set are "pre-labeled" by an initial classifier that is built using the limited available training data. By choosing appropriate weights for this pre-labeled data, the nearest neighbor classifier consistently improves on the original classifier.
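The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which initial classifier, distance measure, value of k, or weighting scheme the paper uses. The nearest-centroid first stage, the Euclidean distance, the fixed `prelabel_weight` of 0.3, and all function names below are assumptions chosen only for illustration.

```python
import math
from collections import defaultdict


def nearest_centroid_fit(X, y):
    """Stage 1 stand-in (an assumption): compute one mean vector per class."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def nearest_centroid_predict(centroids, x):
    """Return the class whose centroid is closest to x."""
    return min(centroids, key=lambda c: math.dist(centroids[c], x))


def build_knowledge_base(X_lab, y_lab, X_unlab, prelabel_weight=0.3):
    """Combine labeled examples (weight 1.0) with pre-labeled ones (reduced weight)."""
    centroids = nearest_centroid_fit(X_lab, y_lab)
    base = [(x, label, 1.0) for x, label in zip(X_lab, y_lab)]
    base += [
        (x, nearest_centroid_predict(centroids, x), prelabel_weight)
        for x in X_unlab
    ]
    return base


def weighted_knn_predict(examples, x, k=3):
    """Stage 2: the k nearest examples vote, each vote scaled by its weight."""
    nearest = sorted(examples, key=lambda e: math.dist(e[0], x))[:k]
    votes = defaultdict(float)
    for _, label, weight in nearest:
        votes[label] += weight
    return max(votes, key=votes.get)


# Toy data (hypothetical): two labeled points, four unlabeled points.
X_lab = [[0.0, 0.0], [4.0, 4.0]]
y_lab = ["a", "b"]
X_unlab = [[0.5, 0.2], [0.1, 0.6], [3.8, 4.1], [4.2, 3.7]]

kb = build_knowledge_base(X_lab, y_lab, X_unlab)
prediction = weighted_knn_predict(kb, [0.3, 0.3], k=3)
```

In this sketch the weight given to pre-labeled examples is a single constant. The abstract only states that "appropriate weights" are chosen, so in practice that value would presumably be tuned, for example on held-out labeled data.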

Original language	English
Title of host publication	Advances in Knowledge Discovery and Data Mining
Subtitle of host publication	Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006)
Editors	Wee Keong Ng, Masaru Kitsuregawa, Jianzhong Li, Kuiyu Chang
Publisher	Springer
Pages	60-69
ISBN (Print)	3-540-33206-5, 978-3-540-33206-0
DOIs	https://doi.org/10.1007/11731139_10
Publication status	Published - 2006
Externally published	Yes

Publication series

Series	Lecture Notes in Computer Science
Volume	3918
ISSN	0302-9743

Cite this

Driessens, K., Reutemann, P., Pfahringer, B., & Leschi, C. (2006). Using weighted nearest neighbor to benefit from unlabeled data. In W. Keong Ng, M. Kitsuregawa, J. Li, & K. Chang (Eds.), Advances in Knowledge Discovery and Data Mining: Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006) (pp. 60-69). Springer. https://doi.org/10.1007/11731139_10

Driessens, K ; Reutemann, P ; Pfahringer, B et al. / Using weighted nearest neighbor to benefit from unlabeled data. Advances in Knowledge Discovery and Data Mining: Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006). editor / Wee Keong Ng ; Masaru Kitsuregawa ; Jianzhong Li ; Kuiyu Chang. Springer, 2006. pp. 60-69 (Lecture Notes in Computer Science, Vol. 3918).

@inproceedings{124dc3969fa74294b650f4e7fbd45864,
title = "Using weighted nearest neighbor to benefit from unlabeled data",
abstract = "The development of data-mining applications such as text classification and molecular profiling has shown the need for machine learning algorithms that can benefit from both labeled and unlabeled data, where the unlabeled examples often greatly outnumber the labeled examples. In this paper we present a two-stage classifier that improves its predictive accuracy by making use of the available unlabeled data. It applies a weighted nearest neighbor classification algorithm, using the combined example sets as a knowledge base. The examples from the unlabeled set are {"}pre-labeled{"} by an initial classifier that is built using the limited available training data. By choosing appropriate weights for this pre-labeled data, the nearest neighbor classifier consistently improves on the original classifier.",
author = "K Driessens and P Reutemann and B Pfahringer and C Leschi",
year = "2006",
doi = "10.1007/11731139_10",
language = "English",
isbn = "3-540-33206-5",
series = "Lecture Notes in Computer Science",
publisher = "Springer",
pages = "60--69",
editor = "{Keong Ng}, Wee and Masaru Kitsuregawa and Jianzhong Li and Kuiyu Chang",
booktitle = "Advances in Knowledge Discovery and Data Mining",
address = "United States",
}

Driessens, K, Reutemann, P, Pfahringer, B & Leschi, C 2006, Using weighted nearest neighbor to benefit from unlabeled data. in W Keong Ng, M Kitsuregawa, J Li & K Chang (eds), Advances in Knowledge Discovery and Data Mining: Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006). Springer, Lecture Notes in Computer Science, vol. 3918, pp. 60-69. https://doi.org/10.1007/11731139_10

Using weighted nearest neighbor to benefit from unlabeled data. / Driessens, K; Reutemann, P; Pfahringer, B et al.
Advances in Knowledge Discovery and Data Mining: Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006). ed. / Wee Keong Ng; Masaru Kitsuregawa; Jianzhong Li; Kuiyu Chang. Springer, 2006. p. 60-69 (Lecture Notes in Computer Science, Vol. 3918).

TY  - GEN
T1  - Using weighted nearest neighbor to benefit from unlabeled data
AU  - Driessens, K
AU  - Reutemann, P
AU  - Pfahringer, B
AU  - Leschi, C
PY  - 2006
Y1  - 2006
N2  - The development of data-mining applications such as text classification and molecular profiling has shown the need for machine learning algorithms that can benefit from both labeled and unlabeled data, where the unlabeled examples often greatly outnumber the labeled examples. In this paper we present a two-stage classifier that improves its predictive accuracy by making use of the available unlabeled data. It applies a weighted nearest neighbor classification algorithm, using the combined example sets as a knowledge base. The examples from the unlabeled set are "pre-labeled" by an initial classifier that is built using the limited available training data. By choosing appropriate weights for this pre-labeled data, the nearest neighbor classifier consistently improves on the original classifier.
AB  - The development of data-mining applications such as text classification and molecular profiling has shown the need for machine learning algorithms that can benefit from both labeled and unlabeled data, where the unlabeled examples often greatly outnumber the labeled examples. In this paper we present a two-stage classifier that improves its predictive accuracy by making use of the available unlabeled data. It applies a weighted nearest neighbor classification algorithm, using the combined example sets as a knowledge base. The examples from the unlabeled set are "pre-labeled" by an initial classifier that is built using the limited available training data. By choosing appropriate weights for this pre-labeled data, the nearest neighbor classifier consistently improves on the original classifier.
U2  - 10.1007/11731139_10
DO  - 10.1007/11731139_10
M3  - Conference article in proceeding
SN  - 3-540-33206-5
SN  - 978-3-540-33206-0
T3  - Lecture Notes in Computer Science
SP  - 60
EP  - 69
BT  - Advances in Knowledge Discovery and Data Mining
A2  - Keong Ng, Wee
A2  - Kitsuregawa, Masaru
A2  - Li, Jianzhong
A2  - Chang, Kuiyu
PB  - Springer
ER  -

Driessens K, Reutemann P, Pfahringer B, Leschi C. Using weighted nearest neighbor to benefit from unlabeled data. In Keong Ng W, Kitsuregawa M, Li J, Chang K, editors, Advances in Knowledge Discovery and Data Mining: Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006). Springer. 2006. p. 60-69. (Lecture Notes in Computer Science, Vol. 3918). doi: 10.1007/11731139_10