Using weighted nearest neighbor to benefit from unlabeled data

K Driessens*, P Reutemann, B Pfahringer, C Leschi

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference article in proceeding > Academic > peer-review

Abstract

The development of data-mining applications such as text classification and molecular profiling has shown the need for machine learning algorithms that can benefit from both labeled and unlabeled data, where the unlabeled examples often greatly outnumber the labeled ones. In this paper we present a two-stage classifier that improves its predictive accuracy by making use of the available unlabeled data. It applies a weighted nearest neighbor classification algorithm with the combined example sets as its knowledge base. The examples from the unlabeled set are "pre-labeled" by an initial classifier that is built using the limited available training data. By choosing appropriate weights for this pre-labeled data, the nearest neighbor classifier consistently improves on the original classifier.
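The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid initial classifier, the Euclidean distance, the weight of 1.0 for labeled examples, and the names `fit_centroids`, `predict_centroid`, `build_knowledge_base`, `weighted_knn_predict` and `prelabel_weight` are all assumptions made for illustration only.

```python
import math
from collections import defaultdict


def fit_centroids(X, y):
    """Stage 1 (assumed initial classifier): one mean vector per class."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_centroid(centroids, x):
    """Return the class whose centroid is closest to x."""
    return min(centroids, key=lambda c: math.dist(centroids[c], x))


def build_knowledge_base(X_lab, y_lab, X_unlab, prelabel_weight=0.3):
    """Combine labeled examples (weight 1.0, an assumption) with
    pre-labeled unlabeled examples (weight prelabel_weight)."""
    centroids = fit_centroids(X_lab, y_lab)
    base = [(x, label, 1.0) for x, label in zip(X_lab, y_lab)]
    base += [(x, predict_centroid(centroids, x), prelabel_weight)
             for x in X_unlab]
    return base


def weighted_knn_predict(base, x, k=3):
    """Stage 2: weighted vote among the k nearest (point, label, weight) entries."""
    nearest = sorted(base, key=lambda e: math.dist(e[0], x))[:k]
    votes = defaultdict(float)
    for _, label, weight in nearest:
        votes[label] += weight
    return max(votes, key=votes.get)
```

In this sketch, `prelabel_weight` controls how much the pre-labeled examples count relative to the true labels. Setting it to 0 removes their influence on the vote, although they can still occupy slots among the k nearest neighbors. How the weights are actually chosen is described in the paper, not here.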
Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining
Subtitle of host publication: Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006)
Editors: Wee Keong Ng, Masaru Kitsuregawa, Jianzhong Li, Kuiyu Chang
Publisher: Springer
Pages: 60-69
ISBN (Print): 3-540-33206-5, 978-3-540-33206-0
DOIs
Publication status: Published - 2006
Externally published: Yes

Publication series

Series: Lecture Notes in Computer Science
Volume: 3918
ISSN: 0302-9743
