Embrace the Unknown

July 3, 2017, by Louisa Pragst

Humans spend years of their lives learning the meaning of words and how to respond to them. Even after we have learned the basics, we never stop expanding our knowledge, whether through new, more sophisticated words and sentence structures or by learning another language. We achieve this by constantly consuming and producing language over an extended period of time. Considering that most dialogue systems do not spend years practising with human conversation partners before they are deployed, how can they obtain the ability to hold a conversation?

Traditionally, a dialogue system uses a predefined model that specifies everything the user and the system can say. Hand-crafted rules or a dialogue policy learned from training data are used to determine the exact reaction of the dialogue system at any given time. While this results in reasonable system behaviour, it limits the system's ability to react to unforeseen user requests.

In the KRISTINA project, we can search the web for relevant information and include this information in the dialogue (if you haven't already, read last month's article; it's interesting, I promise). This means that the KRISTINA system can be more flexible in the answers it provides, since new options can be retrieved from the web at any time. However, it also implies that we can't know beforehand all the things KRISTINA is able to say. Even worse, the user has to have the opportunity to query the newly gathered information for it to be used, so we also don't want to restrict the user in what they can say. So, if we know neither what the user can say nor what the system can answer, how can we determine good system behaviour?

To solve this difficulty, we rely on generalisation. The dialogue policy is no longer defined for concrete user and system utterances but rather for their general properties. As a first step, the concrete rule 'if the user asks what Stefan's favourite food is, reply Stefan's favourite food is steak' can be made more general: 'if the user asks something that contains Stefan, food, and favourite, choose an answer that contains Stefan, favourite, food, and the name of a dish'. While such simple generalised dialogue policies no longer require knowledge of the exact user and system utterances, they still rely on knowledge about the important semantic concepts that can occur.

Assuming the rule for favourite food exists but a question about favourite sweets has not been foreseen, this kind of generalisation will not help to find a suitable system response. However, we can use a concept hierarchy, in which sweets are defined as a subclass of food, to see that sweets are a kind of food and can therefore be handled similarly. Lacking a rule for a question about Stefan, sweets, and favourite, the dialogue system will try to find a more general rule and apply it. If more than one possible answer exists (e.g. 'Stefan's favourite sweets are cakes.' and 'Stefan's favourite vegetables are mushrooms.'), the one that shares the most concepts with the original user utterance is chosen.

Apart from the concept hierarchy, we can also utilise semantic relationships between concepts to find suitable rules for unknown concepts. The question 'What is Stefan's favourite recipe?' is very similar to a request for Stefan's favourite food; in a concept hierarchy, however, there is no connection between food and recipe. By analysing huge amounts of data, we can see which concepts are often used in the same context and thereby estimate how closely they are related semantically. If two concepts are semantically similar, it is likely that applying the same rule will result in good system behaviour.
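
To make the hierarchy-based fallback more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the KRISTINA implementation: the names PARENT, RULES, generalise, find_rule, and pick_answer, the toy hierarchy, and the rule label are all hypothetical, and lifting every concept one level at a time is just one simple way to search for a more general rule.

```python
from typing import Dict, FrozenSet, Iterator, Optional, Set

# Hypothetical toy concept hierarchy (child -> parent), invented for illustration.
PARENT: Dict[str, str] = {
    "sweets": "food",
    "vegetables": "food",
    "cakes": "sweets",
    "mushrooms": "vegetables",
}

# Hypothetical generalised rules: a set of concepts maps to a rule label.
RULES: Dict[FrozenSet[str], str] = {
    frozenset({"stefan", "favourite", "food"}): "favourite_food",
}


def generalise(concepts: Set[str]) -> Iterator[FrozenSet[str]]:
    """Yield the concept set, then ever more general versions of it."""
    current = frozenset(concepts)
    seen: Set[FrozenSet[str]] = set()
    while current not in seen:
        seen.add(current)
        yield current
        # Replace each concept by its parent, if it has one.
        current = frozenset(PARENT.get(c, c) for c in current)


def find_rule(concepts: Set[str]) -> Optional[str]:
    """Return the first rule matching the concepts or a generalisation of them."""
    for candidate in generalise(concepts):
        if candidate in RULES:
            return RULES[candidate]
    return None


def pick_answer(user_concepts: Set[str], answers: Dict[str, Set[str]]) -> str:
    """Choose the answer sharing the most concepts with the user utterance."""
    return max(answers, key=lambda text: len(user_concepts & answers[text]))


if __name__ == "__main__":
    question = {"stefan", "favourite", "sweets"}
    answers = {
        "Stefan's favourite sweets are cakes.":
            {"stefan", "favourite", "sweets", "cakes"},
        "Stefan's favourite vegetables are mushrooms.":
            {"stefan", "favourite", "vegetables", "mushrooms"},
    }
    print(find_rule(question))
    print(pick_answer(question, answers))
```

With this toy data, the question about favourite sweets has no rule of its own, so it falls back to the rule for favourite food, and the answer about cakes is preferred because it shares three concepts with the question rather than two. The semantic-similarity step described above is not covered by this sketch.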

Despite our best efforts, it is almost impossible to guarantee that the dialogue system will be able to react adequately to anything the user might say. If all else fails, a dialogue system can still learn a lesson from the countless humans who just don't know what their conversation partner is trying to tell them: say anything and hope for the best. Or just get over yourself and admit you're at a loss.
