
Until the release of Amazon’s Echo, aka Alexa, the big players had paid little attention to voice technologies. By now there are numerous variants, but which are the best known, and which voice interface is the most suitable?

Today’s voice interfaces combine two components: transcription and natural language processing (NLP). A spoken sentence is transcribed into text, which is then analysed using artificial intelligence; based on that analysis, a response is generated and converted back into audible speech via speech synthesis (see also part 1).
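The two-stage pipeline described above can be sketched in a few lines. All function names and return values here are illustrative placeholders, not any vendor’s real API:

```python
def transcribe(audio: bytes) -> str:
    """Speech-to-text stage (placeholder for a real transcription service)."""
    return "reserve two seats at a two-star restaurant in hamburg"

def analyse(text: str) -> dict:
    """NLP stage: derive an intent and parameters from the text (placeholder)."""
    return {"intent": "reserve_table", "city": "hamburg", "seats": 2}

def synthesise(reply: str) -> bytes:
    """Text-to-speech stage (placeholder for real speech synthesis)."""
    return reply.encode("utf-8")

def handle_utterance(audio: bytes) -> bytes:
    """Full round trip: speech in, analysis in the middle, speech out."""
    text = transcribe(audio)
    intent = analyse(text)
    reply = f"Looking for a table for {intent['seats']} in {intent['city'].title()}."
    return synthesise(reply)
```

The point is the shape of the flow, not the stubbed-out internals: each stage consumes the previous stage’s output, and only the middle stage deals in meaning.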

Different classifications

Conversational interfaces are differentiated by whether they use so-called knowledge domains or not. Knowledge domains are digital structures that map knowledge around a given subject area.

1) Conversational interfaces with knowledge domains 

Conversational interfaces with knowledge domains do not just parse phrases; they aim to understand the actual meaning behind a sentence. Interfaces of this type are called smart assistants. Consider a sentence that is simple for us humans: “Reserve two seats at a two-star restaurant in Hamburg!” We know that a restaurant can be awarded ‘stars’, that Hamburg is a city and that seats in a restaurant can be reserved. Without this prior knowledge, however, it is difficult to make sense of the sentence: ‘Two Stars’ could just as well be the name of a specific restaurant, and what two seats are, how to reserve them, and that a restaurant with certain characteristics is to be searched for in Hamburg would all remain unclear. Smart assistants are expected to understand precisely such concepts and therefore require basic knowledge of the respective domains, such as gastronomy, events, weather and travel.
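As a toy illustration of the idea, a minimal knowledge domain could resolve the tokens of the restaurant sentence into concepts. The structure below is an assumed, much-simplified stand-in; real smart assistants use far richer models:

```python
# Assumed toy structure for a gastronomy knowledge domain.
GASTRONOMY_DOMAIN = {
    "cities": {"hamburg", "berlin", "munich"},
    "ratings": {"one-star": 1, "two-star": 2, "three-star": 3},
    "actions": {"reserve"},
}

def interpret(tokens):
    """Resolve tokens against the domain; tokens without a match stay ambiguous."""
    meaning = {}
    for token in tokens:
        t = token.lower()
        if t in GASTRONOMY_DOMAIN["actions"]:
            meaning["action"] = t
        elif t in GASTRONOMY_DOMAIN["ratings"]:
            meaning["stars"] = GASTRONOMY_DOMAIN["ratings"][t]
        elif t in GASTRONOMY_DOMAIN["cities"]:
            meaning["city"] = t
    return meaning
```

Without the `ratings` entry, ‘two-star’ would remain ambiguous — which is exactly the prior knowledge the sentence demands.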

2) Conversational Interfaces without knowledge domains

Conversational interfaces without domain knowledge, such as Alexa, do not have this ability and take a different approach. For each possible dialogue, sentence structures are specified during implementation, with variable parts known as slots. A spoken sentence is analysed and matched to one of these sentence structures; the component that generates the response is then told which sentence structure was recognised and which values the variable parts contained. The following sentence makes clear that no basic knowledge is required for this: ‘I would like to buy a red shirt’. The system does not need to know anything about clothes or colours, because it merely compares the phrase with the given phrases related to buying a shirt. To this end, the interface’s dialogue model defines a sentence structure with an ID called, for example, ‘shirt purchase’. This sentence structure is given several variants: “I want to buy a <colour> shirt”, “I want to buy a shirt in the colour <colour>” and “I want to buy a shirt in <colour>”. This also defines a variable part (slot) named ‘colour’, for which the permitted values are listed, e.g. ‘red’, ‘green’ and ‘yellow’. If the user utters such a sentence, the analysis determines that it has the ‘shirt purchase’ sentence structure with the value ‘red’ for the slot ‘colour’. In this structured form, a back-end system can readily do something with the information.
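A minimal sketch of this slot-matching approach might look as follows. The dialogue-model format is invented for illustration; it is not Alexa’s actual interaction-model schema:

```python
import re

# Invented dialogue model: one sentence structure with a 'colour' slot.
INTENT_ID = "shirt purchase"
TEMPLATES = [
    "i want to buy a {colour} shirt",
    "i want to buy a shirt in the colour {colour}",
    "i want to buy a shirt in {colour}",
]
SLOT_VALUES = {"colour": ["red", "green", "yellow"]}

def match(utterance):
    """Compare an utterance against the templates; return intent ID and slots."""
    text = utterance.lower().strip(" .!?")
    for template in TEMPLATES:
        # Replace the slot placeholder with a regex group accepting its values.
        pattern = re.escape(template).replace(
            re.escape("{colour}"),
            "(?P<colour>" + "|".join(SLOT_VALUES["colour"]) + ")",
        )
        m = re.fullmatch(pattern, text)
        if m:
            return {"intent": INTENT_ID, "slots": m.groupdict()}
    return None  # no sentence structure recognised
```

Note that the matcher knows nothing about shirts or colours; it only compares strings, which is precisely why no knowledge domain is needed.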

The current key stakeholders

Until the release of Amazon’s Echo, aka Alexa, most IT companies had paid little attention to voice technologies. Although Siri was released with a bang, it was perceived more as a helpful tool than as a whole new class of interface. The advantages of hands-free operation for mobile devices could not be dismissed, however, and today every big player is developing its own voice solution. Here is a brief introduction to the current key stakeholders:

Amazon’s Alexa

If you look at the Amazon product range, it is clear that Alexa is a logical development of existing technologies. The Fire tablets (launched 2013), the Fire Phone (2014) and the first Fire TVs (2014) were already equipped with voice control. However, Alexa’s ‘Voice Interface as a Service’ technology, the ‘Alexa Voice Service’, is still not considered a smart assistant. Instead of analysing the meaning of sentences, it simply compares them against predefined patterns in the background. When asked more complex questions, Alexa quickly reaches its limits. The reason is that it only handles superficial knowledge domains, which are not open to the developer. In addition, requests made to an Echo must be concise and not overly complex in their formulation. For example, films can be searched for using the name of an actor, or restaurants by indicating the area, but it does not get much more complex than that.

Google Assistant

Google Now was originally part of Google Search and was limited to web search. It was later spun off and its domain knowledge expanded to make it more competitive with assistants like Apple’s Siri or Samsung’s S Voice. Google Now has since been replaced by Google Assistant. How closely the various knowledge domains in Google Assistant are interlinked was impressively demonstrated at the Google developer conference with ‘Google Duplex’. As a component of the assistant, Google Duplex can phone real people to make a hairdresser’s appointment, for example, or even book a table. In doing so, the assistant not only accesses the appointment calendar, but must also have the appropriate domain knowledge.

Apple’s Siri

The story of Siri is somewhat different. The smart assistant was developed by the company Siri Inc., a spin-off of the Stanford Research Institute (SRI), and took the approach of analysing language by means of domain knowledge from the outset. Fifteen years ago, SRI had worked with numerous other research institutions on the CALO (Cognitive Assistant that Learns and Organizes) project, and the experience gained there influenced the development of Siri. Siri was released in the App Store in 2010, and Siri Inc. was promptly bought by Apple. A year later, Apple officially announced that Siri was now an integral part of iOS; it has since been rolled out across all Apple platforms. Most recently, the HomePod was released: a smart speaker that reflects the current trend in voice interfaces and is comparable to Amazon’s competing product, the Echo.

Microsoft’s Cortana

Microsoft’s Cortana was first presented to the public in 2014 at a conference. Also designed as a smart assistant, Cortana features some interesting adaptations from real life. A real assistant, for example, usually takes notes about their supervisor or client in order to get to know the person better and remember their habits; Cortana does this with a virtual notebook. When used for the first time, Cortana asks about a few preferences in order to provide personalised answers early on, and this functionality can also be invoked again as needed. The key element of Cortana is Bing: Bing-based services allow informal queries to be made against the search engine.

Samsung’s Viv

Samsung, too, has long been trying to establish intelligent software on its devices, which naturally must include a voice interface. In 2016 Samsung bought Viv Labs, a company founded by Siri’s developers. Viv Labs’ system relies fully on domain knowledge. Unlike its competitors, however, Viv allows external developers to extend its knowledge base with new domains. As a result, the system should become more intelligent and able to understand more and more. Imagine, for example, a whiskey distillery: with the help of experts, Viv is provided with knowledge about the whiskey domain and the distillery’s products. In addition, a barrel maker shares all of its knowledge about wooden barrels and their production. Viv’s domain knowledge now provides valuable expertise on how wooden barrels influence the taste of certain types of alcohol; oak barrels, for example, give whiskey a vanilla flavour. If I now ask Viv what causes the vanilla note of a particular whiskey from said distillery, Viv can answer that this taste is most likely due to oak-barrel ageing. Viv has thus merged the two domains.

IBM’s Watson

To clear up a common misunderstanding, IBM’s Watson should also be mentioned here. There is no single ‘artificial intelligence Watson’ that understands everything and continuously accumulates knowledge. Instead, Watson is a collection of various artificial-intelligence tools brought together under a common brand, which can be used to realise a wide variety of projects. There are also projects that serve to build up a large knowledge base, but one should not labour under the illusion that every Watson project has access to this knowledge. If you want to implement a project with Watson, you need to provide your own data, just as with any other machine-learning toolkit. Among other things, Watson provides tools for transcription (the IBM Speech to Text service) and text analysis (the Natural Language Understanding service); voice-interface projects with Watson build on these two tools.

From analysing the problem to finding the right voice interface

Of course, there are many other solutions, some highly specialised, which also aim to break through the restrictions imposed by the big players and offer more development opportunities. The question naturally arises: why all these different voice interfaces? As with many complex problems, there is no single universal solution. There is no ‘good’ or ‘bad’ interface, only ‘right’ or ‘wrong’ applications of the different technologies. Alexa cannot handle complex sentence structures, but is great for quick implementations and is already widely used. Viv, on the other hand, has not yet been able to establish itself, but has the potential to understand arbitrary and complex sentences.

Selecting the right voice interface therefore means weighing up criteria such as the application, its focus, the problem definition, the needs of the target group and how open an interface is to integration into your own projects.

This is the second contribution of a four-part series on the subject of voice interfaces.

On the face of it, SXSW is a pretty poor deal. You spend 12 hours on a plane and then rush around downtown Austin with 30,000 other lunatics for a week to listen to lectures and panels in air-conditioned 80s-style conference rooms. Doesn’t sound very inspiring. For me, the conference is nevertheless one of the absolute highlights of the year, because you’d be hard pressed to find a higher concentration of excellent speakers on current trends in the digital world. Read about the topics and lectures I am particularly looking forward to below.

Digitisation has arrived in society

In recent years it has become apparent that the days when the next hype platform or app guaranteed attention in the market are over. For a while now, the discussion has no longer revolved solely around digital services or the marketing behind them, because digitisation now covers all areas of life. The impact of this process on society, working life, health and urban development will be the dominant themes of the conference, as they were in 2017. The same goes for the demand for concrete solutions that bring new technologies into product development and the creative process.

The perennial favourites: VR, AR & AI

Virtual reality continues to be a hot topic, especially in the creative industries. While the search for meaningful application scenarios outside the niche continues, augmented reality is preparing to make the breakthrough into a modern storytelling tool suitable for the mass market.

AI, on the other hand, is much more established: data as the DNA of the modern world and ever better algorithms promise automation and increased efficiency in many areas. But how much of this will find its way into consumers’ everyday lives? Amazon Echo and Google Home are now in millions of homes, but currently lead a sorry existence as glorified light switches and Bluetooth speakers for Spotify. What do the truly smart assistants of the future look like in comparison? And how are various industry pioneers already using AI today for communication, data analysis or product development?

Blockchain self-confidence

This year’s theme for tech conferences is probably inevitable: the blockchain. The flagship project Bitcoin has evolved from a democratic, borderless payment system into an investment bubble for dauntless investors. But there is tremendous potential in the technology behind it. How will smart contracts and transaction-based systems change our economic life, business processes and, ultimately, marketing? Ethereum co-founder Joseph Lubin has titled his lecture “Why Ethereum Is Going To Change The World”, and the other players in the blockchain business are not lacking in self-confidence either. It will be interesting!

Gaming & eSports

Representatives of the gaming and eSports world are also confidently taking an increasingly prominent place at SXSW. Often ridiculed by outsiders, gaming has now become a dominant force in the entertainment industry. The professionalisation of the eSports scene reached new heights in 2017 with millions invested in tournaments and teams. So if you’re still around in the second week of the conference, you should drop in on the lectures of SXSW Gaming. It could be interesting to see what the industry’s ROI expectations look like and what opportunities there are in marketing.

Problem children start-ups & disrupting dystopia

In contrast, the start-up scene in Silicon Valley is experiencing a bit of a crisis. At last year’s elevator pitches, every second comment was “Nice idea, but what are you going to do in three months’ time when Zuckerberg copies you?” The stifling market position of the Big Four has noticeably cooled the willingness of investors to provide seed capital for new start-ups. How can start-ups continue to raise capital to make their ideas a reality and grow in a world dominated by Facebook, Google, Amazon and Apple?

A few months after the Trumpocalypse, the mood in 2017 was somewhat gloomy, with a rather atypical level of self-reflection for the industry. In our enthusiasm for the digitisation of all areas of life, have we underestimated the risks of a fully networked and automated world? What will be left of that quiet self-doubt in 2018? The closing keynote from sci-fi author and SXSW stalwart Bruce Sterling is likely to be an excellent barometer. An hour-long rant with subtle jabs at the self-congratulatory tech and marketing scene will surely be a highlight once again. A fitting title for 2018: Disrupting Dystopia.

Away from the lectures

In addition to the lectures and panels at the conference, the event spaces of the numerous brands and companies will be another highlight. Exciting from a German point of view: the presence of Mercedes-Benz. The joint focus of the me Convention during the IAA had already indicated far-reaching cooperation with SXSW. Mercedes and Smart are now on the starting line in Austin as super sponsors and are hosting their own lectures and events on the topic of Future Mobility in Palm Park, right next to the Convention Centre.

In addition, visits to the brand locations of the Japanese electronics giants Sony and Panasonic are also likely to be worthwhile. In 2017, Panasonic exhibited numerous prototypes developed in cooperation with students on the subject of the Smart Home. Sony, on the other hand, devoted itself to VR.

The large number of lectures, panel discussions, pop-up locations and the numerous events off the official program make planning your SXSW visit a challenge. When you think back to your time in Austin on your flight home, you often realize that the most exciting lectures were those you caught by chance, that the best Brand Lounge was one where you just happened to be passing by and you only met the most interesting people because they were standing next to you in the endless queues. Resisting the temptation to plan everything in advance makes a visit to SXSW all the more interesting.

At Serviceplan’s Innovation Day, the renowned Munich philosopher and former Minister of State for Culture Julian Nida-Rümelin and Martina Koederitz, Chair of the Management Board of IBM Germany, discussed new ethical standards together with Klaus Schwab, Managing Director of the Plan.Net Group.