A Context-Aware Approach for Extracting Hard and Soft Skills

Ivo Wings; Rohan Nanda; K.J. Adebayo

doi:10.1016/j.procs.2021.10.016

Ivo Wings*, Rohan Nanda, K.J. Adebayo

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

The continuous growth of the online recruitment industry has made candidate screening costly, labour-intensive, and time-consuming. Automating the screening process would expedite candidate selection. Recruiting is now moving towards skill-based recruitment, in which candidates are ranked according to their number of skills and their competence level and experience in each skill. It is therefore important to create a system that can accurately and automatically extract hard and soft skills from candidates' resumes and job descriptions. The task is less complex for hard skills, which in some cases could be named entities, but much more challenging for soft skills, which may appear in different linguistic forms depending on the context. In this paper, we propose context-aware sequence classification and token classification models for extracting both hard and soft skills. We used recent state-of-the-art word embedding representations as textual features for various machine learning classifiers. The models were validated on a publicly available job description dataset. Our results indicate that the best-performing sequence classification model used BERT embeddings together with part-of-speech (POS) and dependency (DEP) tags as input to a logistic regression classifier. The best-performing token classification model used fine-tuned BERT embeddings with a support vector machine classifier. © 2021 The Authors. Published by Elsevier B.V.

Original language	English
Pages (from-to)	163-172
Number of pages	10
Journal	Procedia Computer Science
Volume	193
DOIs	https://doi.org/10.1016/j.procs.2021.10.016
Publication status	Published - 2021

Keywords

Human Resources management
Natural language processing
Skill extraction

Access to Document

10.1016/j.procs.2021.10.016

Licence: CC BY-NC-ND

Cite this

@article{fd6e180dd6d14151a94e88507d6f3a1f,

title = "A Context-Aware Approach for Extracting Hard and Soft Skills",

abstract = "The continuous growth in the online recruitment industry has made the candidate screening process costly, labour intensive, and time-consuming. Automating the screening process would expedite candidate selection. In recent times, recruiting is moving towards skill-based recruitment where candidates are ranked according to the number of skills, skill's competence level and skill's experience. Therefore it is important to create a system which can accurately and automatically extract hard and soft skills from candidates' resume and job descriptions. The task is less complex for hard skills which in some cases could be named entities but much more challenging for soft skills which may appear in different linguistic forms depending on the context. In this paper, we propose a context-aware sequence classification and token classification model for extracting both hard and soft skills. We utilized the most recent state-of-the-art word embedding representations as textual features for various machine learning classifiers. The models have been validated by evaluating them on a publicly available job description dataset. Our results indicated that the best performing sequence classification model used BERT embeddings in addition with POS and DEP tags as input for a logistic regression classifier. The best performing token classification model used fine-tuned BERT embeddings with a support vector machine classifier. (C) 2021 The Authors. Published by Elsevier B.V.",

keywords = "Human Resources management, Natural language processing, Skill extraction",

author = "Ivo Wings and Rohan Nanda and K.J. Adebayo",

year = "2021",

doi = "10.1016/j.procs.2021.10.016",

language = "English",

volume = "193",

pages = "163--172",

journal = "Procedia Computer Science",

issn = "1877-0509",

publisher = "Elsevier",

}

TY - JOUR

T1 - A Context-Aware Approach for Extracting Hard and Soft Skills

AU - Wings, Ivo

AU - Nanda, Rohan

AU - Adebayo, K.J.

PY - 2021

Y1 - 2021

N2 - The continuous growth in the online recruitment industry has made the candidate screening process costly, labour intensive, and time-consuming. Automating the screening process would expedite candidate selection. In recent times, recruiting is moving towards skill-based recruitment where candidates are ranked according to the number of skills, skill's competence level and skill's experience. Therefore it is important to create a system which can accurately and automatically extract hard and soft skills from candidates' resume and job descriptions. The task is less complex for hard skills which in some cases could be named entities but much more challenging for soft skills which may appear in different linguistic forms depending on the context. In this paper, we propose a context-aware sequence classification and token classification model for extracting both hard and soft skills. We utilized the most recent state-of-the-art word embedding representations as textual features for various machine learning classifiers. The models have been validated by evaluating them on a publicly available job description dataset. Our results indicated that the best performing sequence classification model used BERT embeddings in addition with POS and DEP tags as input for a logistic regression classifier. The best performing token classification model used fine-tuned BERT embeddings with a support vector machine classifier. (C) 2021 The Authors. Published by Elsevier B.V.

KW - Human Resources management

KW - Natural language processing

KW - Skill extraction

U2 - 10.1016/j.procs.2021.10.016

DO - 10.1016/j.procs.2021.10.016

M3 - Article

SN - 1877-0509

VL - 193

SP - 163

EP - 172

JO - Procedia Computer Science

JF - Procedia Computer Science

ER -