Voice: Three Challenges Muting the Industry
Everyone knows voice is having a moment. Adoption of voice-enabled devices continues to accelerate; billions of devices will soon be mic-enabled, making voice experiences more accessible than ever. It’s not hard to imagine a world where we are constantly immersed in an ambient, voice-enabled environment, interacting with voice applications as sophisticated as Scarlett Johansson’s virtual persona in the movie Her. Many technologists are convinced that this future state, or some version close to it, will unfold.
But when we consider the voice applications we interact with today, experiences like Siri (“Hey Siri, what’s the weather?”), Alexa (“Alexa, play Drake on Spotify”), or automated phone systems (“Can I just speak to an agent, please!?”), it feels like we’re pretty far from virtual assistants so good that we develop…romantic feelings for them. I would argue that most voice experiences today are actually quite frustrating, killing any magic or feeling of delight. Today, we only trust voice with the simplest utilitarian tasks, ones with a high likelihood of success. And when we’re forced to do something more complex with voice AI (think looking up your recent bill info when calling a bank), the experience is almost never pleasant.
Voice experiences will eventually meet their potential. However, there are some core problems in the space that need to be solved before we can move forward. Below is my take on some of the biggest ones.
Design
Design is an often undervalued element of product development, and that is certainly the case with voice experiences. Designing delightful voice experiences takes A LOT of design work. Without some mega-powerful general AI that can handle almost any situation, these conversations need rigid conversational paths that are designed with great care. There’s no way around it. Thoughtful design is what prevents voice experiences from breaking all the time, and it is also what repairs a voice experience when something goes wrong (in conversational design, this failure route is called a “sad path”).
Not many people in the world are good at voice design, and I don’t expect that to change. So my view is that in order for more useful voice experiences to come to market, we need software that does the design work for us. Think about how Squarespace does this for websites: it offers beautifully designed web templates that business users can conform to the style of their brand with very minimal effort. Nothing similar exists for the voice medium. I believe it will soon, and once it does, we’ll see many more compelling voice experiences than we do today.
Right now, we have great tools like Voiceflow, Botsociety, and Botmock that help with design, prototyping, and, to some extent, deployment. However, they are more like blank canvases that require the user to do all of the thinking around design (similar to Figma or Sketch). We need something more like Squarespace that helps non-experts get in on the action, too. Until then, only the most technical folks (engineers and designers) will have the ability to build applications for this medium.
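To make the “rigid conversational path” idea concrete, here is a minimal sketch of a single turn in a designed voice flow, including the sad-path repair step. The domain (booking a weekday appointment), the prompts, and the function name are all hypothetical, chosen purely for illustration.

```python
# A single turn of a rigidly designed voice flow: match the utterance
# against expected values, repair via a "sad path" prompt when it fails,
# and hand off to a human once retries are exhausted.
VALID_DAYS = {"monday", "tuesday", "wednesday", "thursday", "friday"}

def ask_day(user_reply: str, retries_left: int = 2) -> str:
    """Return the system's next prompt for one turn of a day-selection flow."""
    day = user_reply.strip().lower()
    if day in VALID_DAYS:
        # Happy path: the utterance matched an expected value.
        return f"Great, I'll book you for {day.capitalize()}. What time works best?"
    if retries_left > 0:
        # Sad path: repair the conversation instead of failing outright.
        return "Sorry, I didn't catch that. Which weekday would you like to come in?"
    # Final fallback: hand off rather than loop forever.
    return "Let me connect you with a team member who can help."
```

Every branch here — the expected values, the repair wording, the retry budget, the fallback — is a design decision a human currently has to make by hand, which is exactly the work a Squarespace-style tool would need to template.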
Access to Data
In order for voice apps to be personalized, and for them to execute useful actions, they need access to databases. One simple example would be a pizza-ordering voice experience for a local pizza shop. To be useful, this app would at a minimum need to place an order in the restaurant’s order management system. It would be even more useful if it could look up a customer in the customer database and automatically identify said customer via voiceprint or some other method. This sounds straightforward in theory but is actually quite challenging in practice. Businesses’ data is fragmented across different cloud solutions with varying levels of openness; some do not surface their data via API at all. Even more challenging, many businesses still use on-premises solutions (clutches pearls). Connecting to the requisite data will be a huge bottleneck for adoption of voice AI. The businesses that benefit from this technology first will be the ones using cloud-native software that is easy to integrate with. There are some promising solutions to this problem that I will write about in future posts.
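As a sketch of the integration work the pizza example implies, here is what placing an order over a vendor’s REST API might look like. The endpoint URL, payload shape, and field names are all hypothetical; real order-management systems vary widely, and as noted above, some expose no API at all.

```python
# Hypothetical sketch: a voice app pushing an order into a cloud
# order-management system over REST. Only possible when the vendor
# actually exposes an API like this.
import json
import urllib.request

def build_order_payload(customer_id: str, items: list) -> dict:
    """Assemble the JSON body a hypothetical /v1/orders endpoint might expect."""
    return {
        "customer_id": customer_id,  # looked up earlier, e.g. via voiceprint match
        "items": items,
        "channel": "voice",  # tag the order so the business can track this channel
    }

def place_order(payload: dict, api_key: str):
    """POST the order to the (hypothetical) order-management API."""
    req = urllib.request.Request(
        "https://api.example-oms.com/v1/orders",  # hypothetical endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    return urllib.request.urlopen(req)  # returns the HTTP response
```

The hard part is not this code; it is that for many businesses no such endpoint exists, or the data lives on-premises behind no API at all, which is the bottleneck described above.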
Dominant Voice Devices & Ecosystems
On which types of devices, and in which voice ecosystems, will most voice interactions take place? It’s still unclear which voice device and which voice ecosystem will become dominant. This matters because it influences which platforms developers and creatives are incentivized to build for, which in turn influences where consumers will engage. While there are a plethora of voice apps for Alexa and Google Assistant, we have yet to see a “killer app” built explicitly for voice devices.
Right now, the telephone (automated voice systems) is still the biggest voice channel by far. It’s not the ideal medium because the sound quality is poor, but what’s nice about it is that customers already use this channel to interact with businesses, and these experiences can live outside the restrictive voice ecosystems of Alexa, Google Assistant, etc. Discovery is also easier with telephony than with voice-speaker apps: a business can slap a phone number anywhere it wants and point customers to it. You could even have different phone numbers for different use cases, with a specialized virtual agent living on each number: “call this number to give us feedback, call this number for customer service, call this number to get product advice,” and so on.
Beyond telephony, I do think a mobile-first voice platform has to emerge. There is a big gap in mobile-first voice experiences right now, and most developers are thinking about voice solely in terms of voice speakers in the home or in the car. Maybe Alexa or Google Assistant will start to prioritize mobile, or maybe someone else will seize this opportunity and dominate the mobile-first use case. Only time will tell!