Game Design for Voice Devices Like Alexa – Part One

by Jenn | on October 17, 2017

Fig 1. A family playing the Voice Originals game When in Rome with their Amazon Alexa


Devices that rely on user input via voice have become nearly ubiquitous with the advent of smartphones, smart TVs, and now, smart speakers. Indeed, specialized devices such as Amazon Echo and Google Home have moved beyond the pages of Wired and into homes across the world; it’s now no longer a question of if the device has support in the market, but how much support we should expect from its development community.


Skills (the apps designed specifically for these smart speakers and AIs) are being developed with fervor, but common pitfalls may prevent some of these from reaching their full potential.


At Sensible Object, we’re creating games for a more connected world. Our first game, Beasts of Balance, is a cooperative stacking game that comes to life via a tablet or other smart device. As voice powered devices become commonplace in homes around the world, we feel that voice is the next logical step to create physical games that are digitally enhanced.


We were thrilled to be part of the Alexa Accelerator, powered by Techstars, cohort (see the VentureBeat article). During our time in Seattle we explored many ways to combine voice powered devices with physical tabletop games. We’ve started development of a series of skills, known as Voice Originals. They’ve given us the opportunity to work extensively with these voice devices and determine what really lets them shine.


Voice devices are communal, ambient interfaces. That means they create a shared experience, they are easy to access simply by being there in your home, and they offer a way to interact with a digital device. To begin our discussion of how to design effectively for these devices, we need to look at the main ways that voice devices can interact with players and vice versa.


Firstly, voice devices can output speech either in their procedurally generated voices or with pre-recorded voice actor dialog. Voice devices can also output a range of audio such as music and special effects.


The main way that players can interact with a voice device is by saying commands. Within a game context, you’ll frequently have the voice device ask a question and wait for the player’s command. However, using a wake word, you can interrupt the voice device while it is speaking or playing audio. Amazon just announced that Alexa can now detect the voices of multiple users. This ‘voice fingerprinting’ technology could enable Alexa to detect different players during the game, which would help keep track of score.
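As a rough illustration of the ask-then-wait pattern and of per-player scoring, here is a minimal Python sketch. It does not use any real Alexa API: the `speaker_id` strings stand in for whatever identifier a voice recognition feature might supply, and `Question`, `Scoreboard` and `handle_answer` are hypothetical names invented for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Question:
    prompt: str   # what the device says before waiting for a command
    answer: str   # the spoken answer that counts as correct


@dataclass
class Scoreboard:
    # Maps a hypothetical speaker identifier to that player's points.
    points: dict = field(default_factory=lambda: defaultdict(int))

    def award(self, speaker_id: str, amount: int = 1) -> None:
        self.points[speaker_id] += amount


def handle_answer(question: Question, spoken: str, speaker_id: str,
                  board: Scoreboard) -> str:
    """Score one spoken answer and return the device's reply."""
    if spoken.strip().lower() == question.answer.lower():
        board.award(speaker_id)
        return "Correct!"
    return f"Not quite. The answer was {question.answer}."


question = Question(prompt="Which city is home to the Colosseum?", answer="Rome")
board = Scoreboard()
reply_a = handle_answer(question, "rome", "player-a", board)
reply_b = handle_answer(question, "Paris", "player-b", board)
```

Without a way to tell speakers apart, the game would instead have to ask each player to say their name before answering, which slows every turn down.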


Most voice devices only accept verbal commands; however, in the future we can see them being able to recognise decibel levels or other sounds. This could be useful for recognising clapping or distinct sounds from specific objects shipped with the game.


Summary of Current Interaction Types:

  • Output only:
    • Procedurally generated voice
    • Recorded dialogue
    • Music & SFX
  • Input only:
    • Voice commands – answering
    • Voice commands – interrupting
    • Mute voice device
  • Input & Output:
    • Associated mobile app
    • Buttons

Fig 2. Summary of current interaction types for Alexa

If players don’t want the voice device to hear them talking, they can forcibly mute the device. Although this isn’t really something that should be encouraged, it is a way for players to interact and affect the device.


Voice devices are linked to your phone via an app. This app can help players interact with the device by showing what the device interpreted or by helping them find new skills (i.e. voice apps) to enable.


New ways to interact with voice devices will continue to be developed and added as the user base grows. For example, Amazon has announced a range of gadgets for Echo devices, such as the Echo Buttons, which are well suited to quiz or trivia style games.


This We Believe

At Sensible Object, when working on a project, we create a series of statements that we call This We Believe. This design process was originated by Margaret Robertson and adopted at Hide&Seek, the first company of Sensible Object’s CEO. The statements are designed to be aspirational, to create a possibility space that everyone on the team can feel ownership of.


For our exploration of voice powered devices, we created a new set of This We Believe statements:

  • Physical Components: The game has a set of physical components that come in a small-ish box that we can sell at a desirable price point for consumers.
  • Using Voice AI: The game uses Voice AI for something that only Voice AI can do.
  • Voice Commands: The game makes speaking to Alexa an action full of meaning and excitement.
  • Creating atmosphere: The game uses the full potential of audio to create atmosphere.
  • Writing: The game has a sophisticated set of responses to every situation, driven by a large corpus of well-written text.
  • Meta World: The game has a rich meta which evolves over time, in a loop between player creativity, a developing AI, and new physical upgrades.

We used these to help guide our experimentation and design process.

Lessons Learnt from Designing for Voice Devices

Over the course of 13 weeks in the Amazon Alexa Techstars Accelerator, we spent time prototyping wild ideas, eventually settling on a radical new take on the travel trivia game. The lessons we learnt have been grouped based on our This We Believe statements.

Fig 3. Voice Originals – When In Rome what’s in the box

Physical Components

One of the key components of the games that we make at Sensible Object is that they have physical components. They get people together around a table and encourage face-to-face interactions, rather than face-to-screen. We aim to create games where the physical components are at least as important as the digital component.


One of the biggest advantages a digital game has over traditional tabletop games relates to the first-time experience. For tabletop games, if no one has played the game before, someone has to read through a manual, attempt to understand the rules and then convey them to everyone else at the table. In digital games, the game itself can teach you the rules as you play. Furthermore, digital games can increase difficulty gradually and block off sections of the game to create an easier space in which to learn the main gameplay.


Voice devices let us have the best of both worlds: you can teach people how to play the game, but still allow people to gather around a table to play games together in a natural way. This means that we can teach players much more complex rules than is usually possible for a tabletop game. Further, the rules can change dynamically or the voice device can add opponents for players.
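One way to picture a voice device teaching rules as play progresses is to tag each rule with the round in which it first applies and announce only the new ones each round. The sketch below is purely illustrative: the rules and round numbers are invented for this example and are not taken from any of our games.

```python
# Hypothetical rules, each tagged with the first round in which it applies.
RULES = [
    (1, "On your turn, answer one question."),
    (1, "A correct answer earns one point."),
    (2, "You may now challenge another player's answer."),
    (3, "Upgrade cards are in play from this round onward."),
]


def rules_to_announce(round_number: int) -> list:
    """Return only the rules that become active in the given round."""
    return [text for first_round, text in RULES if first_round == round_number]


def active_rules(round_number: int) -> list:
    """Return every rule the players have been taught so far."""
    return [text for first_round, text in RULES if first_round <= round_number]


first_round_intro = rules_to_announce(1)
second_round_intro = rules_to_announce(2)
```

Because the device only speaks the rules that have just become relevant, nobody at the table has to read a full manual before the first turn.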


If the game were audio only, players would need to keep a lot of information in their heads. Having physical components helps with this issue, but caution is still needed. Matching state between the physical components, players’ minds and a voice device can be very complex. There are two ways information can be passed: from a voice device to a player, who then moves physical components; or from physical components (e.g. a new card) to a player, who then tells the voice device.


When passing information from a voice device, issues arise if the physical components need to change frequently or in complex ways. That is, players may incorrectly move a component and then the version of the board that the voice device has stored will not match what players are seeing. When a player realises that the voice device has a different version of the game state, it can put players off the entire game since they are unable to determine the extent of the mismatch in state.
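The drift described above can be sketched in a few lines of Python. This is a conceptual model only, with hypothetical names (`DeviceBoard`, `find_mismatches`) and made-up pieces and spaces: a real voice device cannot see the table, so the comparison at the end is something only we, as observers of both states, can perform.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceBoard:
    """The voice device's own record of where each piece sits."""
    positions: dict = field(default_factory=dict)

    def instruct_move(self, piece: str, space: str) -> str:
        # The device updates its record as soon as it speaks the instruction;
        # it has no way to see whether the players actually moved the piece.
        self.positions[piece] = space
        return f"Move the {piece} to {space}."


def find_mismatches(device: DeviceBoard, table: dict) -> dict:
    """Return {piece: (device_space, table_space)} wherever the two disagree."""
    pieces = set(device.positions) | set(table)
    return {
        piece: (device.positions.get(piece), table.get(piece))
        for piece in sorted(pieces)
        if device.positions.get(piece) != table.get(piece)
    }


device = DeviceBoard()
table = {}  # what is physically on the table

device.instruct_move("red token", "Rome")
table["red token"] = "Rome"        # players follow the instruction

device.instruct_move("blue token", "Paris")
table["blue token"] = "Prague"     # players mishear and use the wrong space

mismatches = find_mismatches(device, table)
```

Every spoken instruction is another chance for the two records to diverge, which is why fewer and simpler component moves make for a more robust game.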


When passing information to a voice device, the main method is for a player to pick up a card and read out a special code or phrase, so that the voice device knows what is being played or can trigger the card’s effects. This technique should only be used in very special circumstances, since reading out codes to a device can be time consuming and error-prone, especially if the code is hard to pronounce, uses non-English words, or is a set of numbers and letters where each number or letter could be misunderstood.
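To make the code word problem concrete, here is a small Python sketch that maps what the device heard to a card, tolerating small mishearings. The card codes and effects are hypothetical, and the approach (fuzzy string matching with the standard library's `difflib`) is our own illustrative choice rather than anything a voice platform provides.

```python
import difflib

# Hypothetical card codes printed on the physical cards.
CARD_CODES = {
    "golden lion": "Bonus round",
    "silver gate": "Swap seats",
    "marble arch": "Double points",
}


def resolve_card(heard: str, cutoff: float = 0.8):
    """Map a (possibly misheard) spoken code to a card effect, or None."""
    # Lower-case the transcript and collapse repeated whitespace.
    normalised = " ".join(heard.lower().split())
    matches = difflib.get_close_matches(
        normalised, list(CARD_CODES), n=1, cutoff=cutoff
    )
    return CARD_CODES[matches[0]] if matches else None


exact = resolve_card("Golden Lion")
near = resolve_card("golden lyon")   # a plausible mis-transcription
unknown = resolve_card("b 7 d 9")    # letter and number codes give little to match on
```

Short, pronounceable phrases that sound very different from each other leave room for this kind of tolerance; strings of letters and numbers do not, because a single misheard character can turn one valid code into another.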


Physical Components Rules of Thumb

  • Emphasise face-to-face interactions between people
  • Use voice device to teach rules and use of components
  • Don’t ask players to move components frequently
  • Only in special cases, pass information from components to voice device using code words on components themselves
  • Games based on secret knowledge will be difficult to implement

Fig 4. Heuristics for designing physical game components

In many purely tabletop games, players are able to pick up cards that tell them something that only they know. In a voice game, if that information is something that the voice device also needs to know then the player must read it aloud. When one player reads out a code, everyone else in the room will hear what is said. This may work for some game types or on a single playthrough of the game, but over time players will learn what the unique codes mean. Similarly, if you want the voice device to tell only one of the players information, it will be very difficult without asking people to put on headphones or physically leave the room. This all means that games involving secret knowledge have added design challenges to overcome.



And that’s all for this instalment of our design blog. Stay tuned for the next article on voice AI and voice commands.