A Retrieval-Based Dialogue System Utilizing Utterance and Context Embeddings

A. Bartl; G. Spanakis

doi:10.1109/ICMLA.2017.00011

Advanced Computing Sciences

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

Finding semantically rich and computer-understandable representations for textual dialogues, utterances and words is crucial for dialogue systems (or conversational agents), as their performance mostly depends on understanding the context of conversations. In recent research approaches, responses have been generated utilizing a decoder architecture, given the distributed vector representation (embedding) of the current conversation. In this paper, the utilization of embeddings for answer retrieval is explored by using Locality-Sensitive Hashing Forest (LSH Forest), an Approximate Nearest Neighbor (ANN) model, to find similar conversations in a corpus and rank possible candidates. Experimental results on the well-known Ubuntu Corpus (in English) and a customer service chat dataset (in Dutch) show that, in combination with a candidate selection method, retrieval-based approaches outperform generative ones and reveal promising future research directions towards the usability of such a system.
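The retrieval idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes conversation contexts have already been embedded as fixed-length vectors, and it uses plain random-hyperplane LSH with fixed-length hash keys, a simpler relative of LSH Forest, followed by cosine-similarity ranking of the retrieved candidates. The class name HyperplaneLSH, its parameters, the toy 3-dimensional embeddings and the example responses are all hypothetical.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class HyperplaneLSH:
    """Hypothetical index: several hash tables keyed by random-projection signs."""

    def __init__(self, dim, n_tables=8, n_bits=4, seed=0):
        rng = random.Random(seed)
        # One set of n_bits random hyperplanes per table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.items = []  # (context_embedding, response) pairs

    def _key(self, table_planes, vec):
        # Each bit records on which side of a hyperplane the vector lies.
        return tuple(
            sum(p * x for p, x in zip(plane, vec)) >= 0.0 for plane in table_planes
        )

    def add(self, vec, response):
        idx = len(self.items)
        self.items.append((vec, response))
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, vec)].append(idx)

    def query(self, vec, top_k=3):
        # Candidate selection: union of the buckets the query falls into.
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._key(planes, vec), []))
        # Ranking: order candidates by cosine similarity to the query.
        scored = [(self.items[i][1], cosine(vec, self.items[i][0])) for i in candidates]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


# Toy usage with made-up context embeddings and responses.
index = HyperplaneLSH(dim=3)
index.add([0.9, 0.1, 0.0], "Try running apt-get update first.")
index.add([0.0, 1.0, 0.2], "Which kernel version are you on?")
index.add([0.1, 0.0, 1.0], "Your order has been shipped.")
results = index.query([0.9, 0.1, 0.0], top_k=1)
```

A query identical to an indexed context always lands in the same buckets as that context, so its stored response is returned as the top candidate. Dissimilar contexts may be missed entirely, which is the usual accuracy-for-speed trade-off of approximate nearest neighbor search.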

Original language	English
Title of host publication	2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA)
Publisher	IEEE
Pages	1120-1125
Number of pages	6
DOIs	https://doi.org/10.1109/ICMLA.2017.00011
Publication status	Published - Dec 2017

Keywords

information retrieval
interactive systems
nearest neighbour methods
Approximate Nearest Neighbor model
Locality-Sensitive Hashing Forest
answer retrieval
computer-understandable representations
context embeddings
conversational agents
dialogue systems
distributed vector representation
retrieval-based approaches
retrieval-based dialogue system
Context modeling
Databases
Decoding
Encoding
Logic gates
Pipelines
Training
Deep Learning
Dialogue Systems
Information Retrieval

Access to Document

10.1109/ICMLA.2017.00011

Full text: final published version, 187 KB. Licence: Taverne

Cite this

@inproceedings{f1764f31e341472d9f340f4e060786de,

title = "A Retrieval-Based Dialogue System Utilizing Utterance and Context Embeddings",

abstract = "Finding semantically rich and computer-understandable representations for textual dialogues, utterances and words is crucial for dialogue systems (or conversational agents), as their performance mostly depends on understanding the context of conversations. In recent research approaches, responses have been generated utilizing a decoder architecture, given the distributed vector representation (embedding) of the current conversation. In this paper, the utilization of embeddings for answer retrieval is explored by using Locality-Sensitive Hashing Forest (LSH Forest), an Approximate Nearest Neighbor (ANN) model, to find similar conversations in a corpus and rank possible candidates. Experimental results on the well-known Ubuntu Corpus (in English) and a customer service chat dataset (in Dutch) show that, in combination with a candidate selection method, retrieval-based approaches outperform generative ones and reveal promising future research directions towards the usability of such a system.",

keywords = "information retrieval, interactive systems, nearest neighbour methods, Approximate Nearest Neighbor model, Locality-Sensitive Hashing Forest, answer retrieval, computer-understandable representations, context embeddings, conversational agents, dialogue systems, distributed vector representation, retrieval-based approaches, retrieval-based dialogue system, Context modeling, Databases, Decoding, Encoding, Logic gates, Pipelines, Training, Deep Learning, Dialogue Systems, Information Retrieval",

author = "A. Bartl and G. Spanakis",

year = "2017",

month = dec,

doi = "10.1109/ICMLA.2017.00011",

language = "English",

pages = "1120--1125",

booktitle = "2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA)",

publisher = "IEEE",

address = "United States",

}

TY - GEN

T1 - A Retrieval-Based Dialogue System Utilizing Utterance and Context Embeddings

AU - Bartl, A.

AU - Spanakis, G.

PY - 2017/12

Y1 - 2017/12

AB - Finding semantically rich and computer-understandable representations for textual dialogues, utterances and words is crucial for dialogue systems (or conversational agents), as their performance mostly depends on understanding the context of conversations. In recent research approaches, responses have been generated utilizing a decoder architecture, given the distributed vector representation (embedding) of the current conversation. In this paper, the utilization of embeddings for answer retrieval is explored by using Locality-Sensitive Hashing Forest (LSH Forest), an Approximate Nearest Neighbor (ANN) model, to find similar conversations in a corpus and rank possible candidates. Experimental results on the well-known Ubuntu Corpus (in English) and a customer service chat dataset (in Dutch) show that, in combination with a candidate selection method, retrieval-based approaches outperform generative ones and reveal promising future research directions towards the usability of such a system.

KW - information retrieval

KW - interactive systems

KW - nearest neighbour methods

KW - Approximate Nearest Neighbor model

KW - Locality-Sensitive Hashing Forest

KW - answer retrieval

KW - computer-understandable representations

KW - context embeddings

KW - conversational agents

KW - dialogue systems

KW - distributed vector representation

KW - retrieval-based approaches

KW - retrieval-based dialogue system

KW - Context modeling

KW - Databases

KW - Decoding

KW - Encoding

KW - Logic gates

KW - Pipelines

KW - Training

KW - Deep Learning

KW - Dialogue Systems

KW - Information Retrieval

U2 - 10.1109/ICMLA.2017.00011

DO - 10.1109/ICMLA.2017.00011

M3 - Conference article in proceeding

SP - 1120

EP - 1125

BT - 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA)

PB - IEEE

ER -