Abstract
Finding semantically rich and computer-understandable representations for textual dialogues, utterances and words is crucial for dialogue systems (or conversational agents), as their performance mostly depends on understanding the context of conversations. In recent research approaches, responses have been generated utilizing a decoder architecture, given the distributed vector representation (embedding) of the current conversation. In this paper, the utilization of embeddings for answer retrieval is explored by using Locality-Sensitive Hashing Forest (LSH Forest), an Approximate Nearest Neighbor (ANN) model, to find similar conversations in a corpus and rank possible candidates. Experimental results on the well-known Ubuntu Corpus (in English) and a customer service chat dataset (in Dutch) show that, in combination with a candidate selection method, retrieval-based approaches outperform generative ones and reveal promising future research directions towards the usability of such a system.
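As an illustration of the retrieval idea described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes context embeddings are already available as plain lists of floats, and it substitutes a simple multi-table random-hyperplane LSH for LSH Forest. The class `HyperplaneLSH`, its parameters, and the toy three-dimensional embeddings and responses are all hypothetical.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class HyperplaneLSH:
    """Hypothetical helper: several hash tables keyed by random-hyperplane sign bits.

    This is a simplified stand-in for LSH Forest, not the same algorithm.
    """

    def __init__(self, dim, n_tables=8, n_bits=4, seed=0):
        rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
        # One set of n_bits random hyperplanes per table.
        self.planes = [
            [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.items = []  # (context_embedding, response) pairs

    @staticmethod
    def _key(planes, vec):
        # Each bit records which side of a hyperplane the vector falls on.
        return tuple(
            sum(p * x for p, x in zip(plane, vec)) >= 0 for plane in planes
        )

    def add(self, embedding, response):
        idx = len(self.items)
        self.items.append((embedding, response))
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, embedding)].append(idx)

    def query(self, embedding, k=3):
        # Gather candidates that share a bucket with the query in any table,
        # then rank them by exact cosine similarity to the query.
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._key(planes, embedding), []))
        scored = [
            (self.items[i][1], cosine(embedding, self.items[i][0]))
            for i in candidates
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


# Toy corpus: made-up context embeddings paired with stored responses.
corpus = [
    ([0.9, 0.1, 0.0], "Try running sudo apt-get update first."),
    ([0.1, 0.9, 0.1], "You can reset your password from the login page."),
    ([0.0, 0.2, 0.9], "Your order should arrive within three days."),
]

index = HyperplaneLSH(dim=3)
for embedding, response in corpus:
    index.add(embedding, response)

best = index.query([0.9, 0.1, 0.0], k=1)
```

Querying with an embedding identical to a stored context returns that context's response first, because identical vectors fall into the same bucket in every table. For other queries the result is approximate and may miss true neighbors, which is the trade-off ANN methods accept in exchange for speed.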
Original language | English |
---|---|
Title of host publication | 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA) |
Publisher | IEEE |
Pages | 1120-1125 |
Number of pages | 6 |
Publication status | Published - Dec 2017 |
Keywords
- information retrieval
- interactive systems
- nearest neighbour methods
- Approximate Nearest Neighbor model
- Locality-Sensitive Hashing Forest
- answer retrieval
- computer-understandable representations
- context embeddings
- conversational agents
- dialogue systems
- distributed vector representation
- retrieval-based approaches
- retrieval-based dialogue system
- Context modeling
- Databases
- Decoding
- Encoding
- Logic gates
- Pipelines
- Training
- Deep Learning
- Dialogue Systems
- Information Retrieval