Architecting a Chatbot for Language Recognition

Published 04/02/2019 10:40 AM   |    Updated 04/02/2019 03:39 PM
Natural conversations allow for the flexibility of switching languages midconversation. In fact, for multilingual individuals such as my brothers and me, changing between languages lets us emphasize certain concepts without having to say so explicitly.

We generally speak Polish (or English if our wives are present), switch to English to fill in words we don’t know in Polish, and use Spanish to provide emphasis or a callback to something that happened in our childhood in Puerto Rico.

Chatbots, in their current state without Artificial General Intelligence (AGI), don’t allow for the nuance of language choice. However, given the state of language recognition and machine translation, we can implement a somewhat intelligent multilingual chatbot. 

In this article, I’ll outline the general automatic approach. I’ll also highlight the downsides of this approach and list the problems that need to be solved when creating a production-quality multi-language chatbot experience.

A naïve approach to a multilingual chatbot

I call the fully automated approach naïve. It’s the approach most projects start with: it’s fairly easy to put in place and moves a project into the multilingual realm quickly. But it comes with its own set of challenges. Before I dive into those, let’s review the approach.

Assuming we have a working English natural language model and English content, the bot can implement multilingual conversations as follows:

  1. Receive user input in the user’s native language.
  2. Detect the user input language and store it in the user’s preferences.
  3. If the incoming message is not English, translate it into English.
  4. Send English user utterance to a Natural Language Understanding (NLU) platform.
  5. Execute logic and render English output.
  6. If the user’s language wasn’t English, translate the output into the user’s native language.
  7. Send a response back to the user.
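
As a rough illustration, the seven steps above could be wired together as follows. This is a minimal sketch rather than a production implementation: the `detect`, `translate` and `respond` callables are hypothetical placeholders for whatever language detection, machine translation and NLU/bot logic services you use, and none of the names come from a specific SDK.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical signatures; plug in real detection, translation and NLU clients.
Detect = Callable[[str], str]                # text -> language code, e.g. "es"
Translate = Callable[[str, str, str], str]   # (text, source, target) -> text
Respond = Callable[[str], str]               # English utterance -> English reply


@dataclass
class NaiveMultilingualBot:
    detect: Detect
    translate: Translate
    respond: Respond
    base_language: str = "en"

    def handle(self, text: str, prefs: Dict[str, str]) -> str:
        # Steps 1-2: detect the input language and store it in the preferences.
        language = self.detect(text)
        prefs["language"] = language

        # Step 3: translate into English only when the input is not English.
        english = text
        if language != self.base_language:
            english = self.translate(text, language, self.base_language)

        # Steps 4-5: NLU and bot logic only ever see English.
        reply = self.respond(english)

        # Step 6: translate the English reply back into the user's language.
        if language != self.base_language:
            reply = self.translate(reply, self.base_language, language)

        # Step 7: hand the reply back to the channel.
        return reply
```

Because every message is detected independently in step 2, a single misdetected word changes the language of the reply, which is one of the weaknesses discussed next.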

This approach works, but the conversation quality is off. Although machine translation has improved by leaps and bounds, cases still exist in which the conversation feels stiff and culturally disconnected. This approach suffers in three main areas:

  • Input utterance cultural nuances: Utterance translation can sometimes feel awkward, especially for heavy slang or highly proprietary language. NLU model performance suffers as a result.
  • Ambiguous language utterance affects conversation flow: A word such as “no” or “mama” can easily switch the conversation into another language. For example, in some language detection engines, the word “no” is consistently classified as Spanish. If the bot were to ask a yes-or-no question, answering “no” could trigger a response in Spanish.
  • Output translation branding quality: Although automatic machine translation is a good start, companies and brands that want fine-tuned control over their bot’s output will cringe at the output generated by the machine translation service.

Moving to a hybrid managed approach

A more mature approach to a multilingual chatbot involves three key considerations. They vary based on risk aversion, content quality and available resources. Let’s explore options for each item as we progress through them.

Multilanguage NLU

Ideally, I like my chatbot solutions to have an NLU model for each supported language. The cost of creating and maintaining these models can be significant. For multilanguage solutions, I always ask for the highest-priority languages a client would like to support. 

If an enterprise can support 90% of employees by getting two languages working well, then we can limit the NLU scope to those two languages and use the automatic approach for any other languages.

In many of my projects, I use Microsoft’s Language Understanding Intelligent Service (LUIS). I might create one model for English and another for Simplified Chinese. That way, Chinese users don’t suffer the nuanced translation tax.

Project stakeholders also need to decide whether the chatbot should support an arbitrary number of languages or limit valid input to languages with an NLU model. If it supports arbitrary languages, the automatic approach above is applied to any language that doesn’t have its own NLU model.

Ambiguous language detection

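Short utterances are often valid in more than one language, which makes language detection ambiguous. Further complicating the matter, the detection built into translation APIs, such as those from Microsoft and Google, may return only a single best guess, and any confidence score that does come back is hard to rely on for one- or two-word inputs.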

There are numerous approaches to resolving the ambiguous language problem. Two possibilities are 1) run a concatenation of the last N user utterances through the language recognition engine, or 2) maintain a list of ambiguous words that we ignore for language detection and use the user’s last utterance language instead. 

Both are flavors of the same idea: treating the user’s language preference as a conversation-level rather than a message-level property. If we’re interested in supporting switching between languages midconversation, a mix of both approaches works well.
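
To make the two approaches concrete, here is a minimal sketch that mixes them. It assumes a hypothetical `detect` callable that maps text to a language code, and the `AMBIGUOUS_WORDS` list is purely illustrative; a real list would be built from the misdetections you observe in your own conversations.

```python
import re
from collections import deque
from typing import Callable, Deque, Optional

# Hypothetical list of words that are valid in several languages.
AMBIGUOUS_WORDS = {"no", "mama", "ok"}


class ConversationLanguage:
    """Tracks language per conversation instead of per message."""

    def __init__(self, detect: Callable[[str], str], window: int = 3) -> None:
        self.detect = detect                      # assumed: text -> language code
        self.history: Deque[str] = deque(maxlen=window)
        self.current: Optional[str] = None

    def update(self, utterance: str) -> str:
        words = re.findall(r"\w+", utterance.lower())
        only_ambiguous = bool(words) and all(w in AMBIGUOUS_WORDS for w in words)

        if only_ambiguous and self.current is not None:
            # Approach 2: skip detection and keep the last known language.
            return self.current

        # Approach 1: detect over the last N utterances joined together.
        self.history.append(utterance)
        self.current = self.detect(" ".join(self.history))
        return self.current
```

The `window` size is the main trade-off in this sketch: a larger window is more stable against one-off misdetections, but it also delays a genuine language switch, because older utterances keep influencing the result.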

Output content translation

I encourage clients to maintain the precise localized content sent by the chatbot. This is especially true for public consumer or regulated industry use cases in which any mistranslated content might result in fines or negative brand attention. 

This, again, is a risk versus effort calculation that needs to be performed by the right stakeholders. The necessity of controlling localized content and the effort involved in it typically weigh on whether the bot supports arbitrary languages or not.
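
One way to picture that trade-off is a content lookup that prefers reviewed, localized text and only falls back to machine translation when the bot is allowed to. The sketch below is an assumption-laden illustration: `MANAGED_CONTENT`, `render` and the `translate` callable are hypothetical names, and falling back to the base-language text when no translator is supplied is my choice for the example, not a rule.

```python
from typing import Callable, Dict, Optional

# Hypothetical content store: message key -> language code -> reviewed text.
MANAGED_CONTENT: Dict[str, Dict[str, str]] = {
    "greeting": {
        "en": "Hello! How can I help?",
        "es": "¡Hola! ¿En qué puedo ayudarte?",
    },
}


def render(
    key: str,
    language: str,
    translate: Optional[Callable[[str, str, str], str]] = None,
    base_language: str = "en",
) -> str:
    variants = MANAGED_CONTENT[key]
    if language in variants:
        # Managed path: reviewed content exists for this language.
        return variants[language]
    if translate is None:
        # No translator supplied (assumed policy): return the base-language text.
        return variants[base_language]
    # Automatic path: machine-translate the base-language content.
    return translate(variants[base_language], base_language, language)
```

A regulated deployment would typically call `render` without a translator, so only reviewed text ever reaches the user.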

Final architecture

Based on all of the above, a more realistic multilingual chatbot might work as follows. The bot in this case:

  1. Receives user input in the user’s native language.
  2. Detects the user input language and stores it in the user’s preferences. Language detection is based both on an API and utterance ambiguity rules.
  3. Depending on the detected language …
    1. If we have an NLU model for the detected language, the bot queries that NLU model.
    2. If not, and we want to support all languages, the bot translates the user’s message into English and uses the English NLU model to resolve intent. If we want to support only a closed set of languages, the bot may instead respond with a message saying the language isn’t supported.
  4. Executes the chatbot logic and renders localized output.
  5. If the user’s language wasn’t English and our bot supports arbitrary languages, the bot automatically translates the output into the user’s native language.
  6. Sends a response back to the user.
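
The routing decision in step 3 is the heart of the hybrid approach, so here is a minimal sketch of it. The names are hypothetical: `nlu_models` stands in for one managed NLU model per priority language (for example, separate models for English and Simplified Chinese), `translate` for a machine translation client, and `"unsupported_language"` for whatever intent your bot uses to say it can’t help in that language.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Nlu = Callable[[str], str]                   # utterance -> intent name
Translate = Callable[[str, str, str], str]   # (text, source, target) -> text


@dataclass
class HybridRouter:
    nlu_models: Dict[str, Nlu]       # one managed model per priority language
    translate: Translate
    open_language_set: bool = True   # False: reject languages without a model
    base_language: str = "en"

    def resolve_intent(self, utterance: str, language: str) -> str:
        # Step 3.1: use the native NLU model when one exists.
        model = self.nlu_models.get(language)
        if model is not None:
            return model(utterance)

        # Step 3.2 (closed set): tell the user the language isn't supported.
        if not self.open_language_set:
            return "unsupported_language"

        # Step 3.2 (open set): translate and reuse the English model.
        english = self.translate(utterance, language, self.base_language)
        return self.nlu_models[self.base_language](english)
```

Flipping `open_language_set` moves an implementation along the spectrum between the fully automatic and the fully managed approach without touching the rest of the bot logic.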

The managed models and paths to automatic translation add nuance to the automatic approach. If we imagine a spectrum where on one end we find the fully automatic approach and on the other end the fully managed approach, all implementations fall somewhere within this range. 

Clients in regulated industries and heavily branded scenarios will lean toward the fully managed end, and clients with internal or less precise use cases will typically find the automatic approach more effective and economical.

The hybrid managed/automatic implementation does take some effort but results in the best conversational chatbot experience. 
