Shapley-Value Data Valuation for Semi-supervised Learning

Christie Courtnage, Evgueni Smirnov

Research output: Chapter in Book/Report/Conference proceeding › Chapter › Academic

Abstract

Semi-supervised learning aims at training accurate prediction models on labeled and unlabeled data. Its realization strongly depends on selecting pseudo-labeled data. The standard approach is to select instances based on the pseudo-label confidence values that they receive from the prediction models. In this paper we argue that this is an indirect approach w.r.t. the main goal of semi-supervised learning. Instead, we propose a direct approach that selects the pseudo-labeled instances based on their individual contributions to the performance of the prediction models. The individual instance contributions are computed as Shapley values w.r.t. characteristic functions related to the model performance. Experiments show that our approach outperforms the standard one when used in semi-supervised wrappers.
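To make the idea in the abstract concrete, the following is a minimal sketch of Shapley-value data valuation for pseudo-labeled instances. It is not the authors' implementation. It assumes a characteristic function v(S) equal to the validation accuracy of a model trained on the labeled data plus a subset S of the pseudo-labeled candidates, a toy one-dimensional nearest-centroid classifier, exact enumeration of subsets (feasible only for a handful of candidates), and a hypothetical selection rule that keeps candidates with a positive value. All function names and data are illustrative.

```python
from itertools import combinations
from math import factorial


def nearest_centroid_accuracy(train, validation):
    """Assumed characteristic function: validation accuracy of a 1-D
    nearest-centroid classifier trained on `train`, a list of (x, label)."""
    if not train:
        return 0.0
    sums = {}
    for x, y in train:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    centroids = {y: total / count for y, (total, count) in sums.items()}
    correct = 0
    for x, y in validation:
        predicted = min(centroids, key=lambda c: abs(x - centroids[c]))
        if predicted == y:
            correct += 1
    return correct / len(validation)


def shapley_values(candidates, labeled, validation):
    """Exact Shapley value of each pseudo-labeled candidate, where the
    value of a subset S is the accuracy after training on labeled + S."""
    n = len(candidates)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                base = labeled + [candidates[j] for j in subset]
                with_i = base + [candidates[i]]
                gain = (nearest_centroid_accuracy(with_i, validation)
                        - nearest_centroid_accuracy(base, validation))
                values[i] += weight * gain
    return values


# Toy data (illustrative only): two labeled points, a small validation set,
# and three pseudo-labeled candidates, the last with a wrong pseudo-label.
labeled = [(0.0, 0), (0.6, 1)]
validation = [(0.1, 0), (0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1), (0.9, 1)]
candidates = [(0.45, 0), (0.9, 1), (0.1, 1)]

values = shapley_values(candidates, labeled, validation)

# Hypothetical selection rule: keep candidates that help on average.
selected = [c for c, phi in zip(candidates, values) if phi > 0]
print(values)
print(selected)
```

On this toy data the mislabeled candidate receives a negative value and is filtered out, whereas a confidence-based rule could still accept it if the model happened to be confident. For realistic candidate pools, exact enumeration is exponential in the number of candidates, so a sampling-based approximation would be needed in practice.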
Original language: English
Title of host publication: Discovery Science. DS 2021
Editors: C. Soares, L. Torgo
Publisher: Springer, Cham
Pages: 94-108
ISBN (Print): 978-3-030-88941-8
Publication status: Published - 2021

Publication series

Series: Lecture Notes in Computer Science
Volume: 12986
ISSN: 0302-9743

Keywords

  • machine learning, semi-supervised learning
