Why is Context So Hard?

by Thomas Beutel

When giving commands to my Amazon Echo, I can say “play my classical playlist” and then, once it is playing, I can say “turn on shuffle mode”. So Echo already has a limited sense of context: it knows what shuffle mode means when it is already playing something.

But I can’t say “play my classical playlist and turn on shuffle mode”.* This is clearly a limitation of the voice interface. Turning on shuffle mode does not make sense until the player is already playing. In general, understanding context at the command level is much harder than understanding context when Echo is already doing something.

Understanding context in free-form text is still one of the hardest problems in computing. Most software still requires highly regularized and categorized text. Context, however, requires a large knowledge base of the world, with its relations and interactions.

Some services, such as OpenCalais, already offer limited contextual analysis. But a true context service would identify not only the who and what of entities, but also the relationships between them. It would also be able to answer queries and make predictions about those relationships.

I can see a context service being used for document analysis as well as for enhancing Amazon Echo. It could tell Echo exactly which discrete actions a compound command is asking it to perform, relative to Echo’s capabilities. My guess is that someday context processing will be offered as an easy-to-use service (e.g. a hypothetical “AWS Contextualizer”).
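
To make the idea of breaking a compound command into discrete actions concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the CAPABILITIES table, the action names, and the decompose function are hypothetical and are not part of any Echo or AWS API. It also cheats by splitting on the word “and”, which is exactly the kind of shortcut a real context service would have to go beyond.

```python
import re

# Hypothetical capability table: phrase pattern -> discrete action name.
# These patterns and action names are invented for this sketch.
CAPABILITIES = {
    r"play my (?P<name>\w+) playlist": "PLAY_PLAYLIST",
    r"turn on shuffle mode": "SHUFFLE_ON",
}


def decompose(command):
    """Naively split a compound command on 'and', then map each clause
    to a known action, or to UNKNOWN if no capability matches."""
    actions = []
    clauses = re.split(r"\s+and\s+(?:then\s+)?", command.lower().strip())
    for clause in clauses:
        for pattern, action in CAPABILITIES.items():
            match = re.fullmatch(pattern, clause)
            if match:
                actions.append((action, match.groupdict()))
                break
        else:
            # No capability matched this clause.
            actions.append(("UNKNOWN", {"text": clause}))
    return actions


print(decompose("Play my classical playlist and turn on shuffle mode"))
```

The sketch handles my example command only because both clauses happen to match a pattern. Anything phrased differently comes back as UNKNOWN, which is where real knowledge of context would have to take over.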


*After I wrote this post, I found out that I can say “play my classical playlist with shuffle mode” and it works. From what I have read, Echo will only respond if the command variation has been preprogrammed. In other words, Echo is only as smart as the number of variations that the programmer can think of. It won’t make an attempt to understand if your command does not match one of the templates.
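
Here is a rough Python sketch of what that kind of template matching might look like. It is only my guess at the general approach, not how Echo is actually implemented; the TEMPLATES list and the matches_template function are made up for illustration.

```python
import re

# Hypothetical templates that a programmer thought of ahead of time.
TEMPLATES = [
    re.compile(r"play my (?P<playlist>\w+) playlist"),
    re.compile(r"play my (?P<playlist>\w+) playlist with shuffle mode"),
    re.compile(r"turn on shuffle mode"),
]


def matches_template(command):
    """Return True only if the whole command matches a known template."""
    text = command.lower().strip()
    return any(template.fullmatch(text) for template in TEMPLATES)


# A variation the programmer anticipated.
print(matches_template("play my classical playlist with shuffle mode"))
# A variation nobody wrote a template for.
print(matches_template("play my classical playlist and turn on shuffle mode"))
```

The first command matches because someone wrote a template for it. The second means the same thing to a person, but it matches nothing, so a matcher like this simply gives up.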