2019 is already knocking on the door – new year, new trends. At the end of the year, we asked the Serviceplan Group experts about their personal trends for 2019. What’s coming next alongside influencer marketing, new work and sustainability? The communication professionals give their verdict here. Happy reading!

Scrum, Kanban, design thinking, prototyping and collaboration are working styles and methods that have their origins in product design and software development. In recent years, they have found their way into the development of digital platforms, products and services. Now we are experiencing how they are beginning to change the way people work across communication agencies: in the future, communication strategies and communication campaigns and measures alike will be designed and planned more and more collaboratively – including in partnership with customers – in sprints.

 

This article is part of the Trends 2019 series of the Serviceplan Group.

We take a look back at over 20 years in the digital world and the players involved. Then, most importantly, we turn our eyes to the road ahead as digital agencies, and to the challenges we will have to face. Action!

The new Western

It’s a tired metaphor. Way back in 1990, when American activists were worrying that the government was going to take over the Internet, they named their cause the Electronic Frontier Foundation [1]. And the word “Frontier” wasn’t chosen at random. In the American West, the frontier was the artificial boundary between a civilised world governed by laws and a wild and untamed territory [2].

The EFF saw the burgeoning web as the new Wild West, a country beyond the American frontier, where people could say whatever they liked, and were free to create, undertake and experiment.

The metaphor still holds some truth, but history has now caught up with digital. For those who have been living in the digital world for the last twenty years, life is now strangely reminiscent of Once Upon a Time in the West [3].

Twenty years ago, pioneers created the first websites and the first businesses, taking advantage of the opportunities offered by a free and boundless world. Ten years ago, the first infrastructures – search engines, social networks, eCommerce platforms – began to consolidate and become successful companies. The internet is now comparable to big cities that sprouted from the desert: some companies are still flourishing, but only in the shadow of GAFA and the infrastructure they provide at varying costs. The pioneers have become users: once creators who expressed themselves with basic tools, they are now residents in a gigantic network of sites and platforms with which they interact daily.

The Electronic Frontier of the pioneers has all but disappeared, giving way to a digital space that we would like to think of as civilised, or at least created logically and with infrastructure. And their fear of the internet falling under the jurisdiction of the government has now given way to their concerns about the stranglehold of Silicon Valley giants.

Our internet civilisation begs the question: where are humans in all this?

The Civilisation of the internet

While its infrastructure has become more sophisticated, the internet has also become democratised. Never before have so many human beings been connected to the same tool at the same time. You only need to look at the user figures of some social networks to realise how much digital technology has disappeared from our screens… and given way to interactions: Facebook reports 2.2 billion users, Instagram 1 billion amateur photographers, and WeChat has exceeded 1 billion members.

But what do these figures really mean? Do digital users have a real understanding of the ecosystems to which they are contributing? Far removed from the approach taken by the pioneers and creators of the digital world, today’s users consume digital interfaces superficially, without knowing how they work, nor even being aware of the consequences of how they generate and share their content.

In a digital and increasingly automated world that relies more and more on algorithms – and soon Artificial Intelligence – digital illiteracy is becoming more serious, and more widespread.

The myth of the Digital Native – a child who grew up with a computer being naturally gifted for technology – has long since been disproven…

Honesty and transparency

So without full knowledge of how it all works, users need not only simple interactions, but also clear and straightforward interfaces that are honest about the impact of their online behaviour and about which digital players are privy to the content they share. Better still, interfaces are needed that educate and enlighten people about the real issues of our digital society. Between Cambridge Analytica and GDPR, 2018 was a year of raising awareness.

Digital interactions must be created transparently and honestly, but must also be simple and beautiful to look at. In a nutshell, they need a great design.

These concerns are at the heart of the design profession. Designers should be coming up with systems that interact with humans while being practical, aesthetic, real and economically viable.

Theory of evolution

Between giant, industrialised infrastructures and users who aren’t fully aware of what they are doing or what they could be doing, intermediaries need to start redefining their roles.

More specifically, digital agencies – those that support companies in their daily digital experiences and provide users with the tools, content and interfaces that are the gateway to the digital world – need to take a good hard look at what they are developing.

Web agencies used to create ecosystems (websites, platforms, intra- and extranets), but are now seeing their role slowly mature.

First, because the systems used to create digital platforms are being industrialised. CMS, frameworks and other SaaS platforms sometimes render custom development completely unnecessary. Brands are now online, and services often involve adjusting an existing solution rather than developing an idea from scratch.

Audiences are also now gathering on a handful of very large platforms. Facebook, Instagram, YouTube, as well as other content platforms attract billions of users every day, and have become cornerstones to the online experience. And it’s often easier and more useful to approach people on these networks rather than trying to create a new online contact point. Digital consumers are now being targeted on the spaces and systems they already use.

Digital agencies are having to answer a new question: they are no longer asked to create new ways to use a system, but how to make the most of the existing digital space. They are asked to optimise digital interactions between brands and users. Morphing from their role as creators and builders, agencies have now become developers, interior designers, promoters…

In a word, they are now being asked to design.

Digital agency, Design agency

What is Design? According to some definitions, including Wikipedia’s suggestion, design is the intentional creation of a plan or specification for the construction of an object or system, or for the implementation of an activity or process. Design is where art, technology and society converge. And that’s what digital agencies are doing right now.

Agencies aren’t just creating websites, campaigns or content, they are also coming up with the tools and methods that allow these elements to exist and be maintained in the long term. What about the aesthetic and technical aspects of the internet? They’ve been an integral part of agency work for years. Any online content needs to be effective and beautiful, useful and enjoyable. That’s what the internet is all about.

Perhaps the new element in all this is societal responsibility. Digital agencies are now creating, designing, and recommending digital tools to a vast audience whose daily life can be influenced and transformed at the tap of a finger. And that responsibility – if not societal, then at least human – must now be shouldered by all online companies.

Creating daily digital interactions is what makes digital agencies genuine designers in their own right.

Translated from French by Ruth Simpson

The current functionality of voice interfaces is far from optimal. Vocabulary gets misunderstood, and entire sentences are interpreted wrongly. There are also many limitations on development in the most commonly available interfaces. What technological improvements need to be made to achieve greater acceptance among people? And what development trends are the big players on the market pursuing? We take a look at how voice interfaces are developing and where major potential can be unlocked.

The international market for voice interfaces is developing rapidly and in different directions. A few businesses are focusing on improving the understanding of speech, while others are working to add convenience functions to established technologies. For instance, Alexa will soon be able to distinguish between multiple users by applying voice analysis. Smart assistants will be equipped with deeper knowledge so they can understand increasingly complex speech inputs and grow smarter in the process.

Samsung, for example, is currently working on models for Viv that external developers will be able to expand in future in order to create an increasingly broad knowledge base. What’s more, niche markets are forming for highly specific application fields for conversational interfaces. These are already available for working with product data or in-car solutions, for example.

The big players’ plans

With Alexa, Amazon’s goal was not simply to bring a smart assistant to the market; rather, the idea was to offer developers the opportunity to create new skills for it. Its functionality is designed to grow, thus expanding the range of possible applications and ensuring that a market develops specifically for this interface. Other systems tend to be difficult for external developers to expand. For instance, adding a domain to Siri’s knowledge – in other words, knowledge of a particular field – could have a huge impact on its overall functionality.

A good example of this is the word “dry”, which can mean “arid” but can also describe a wine. If both knowledge domains were implemented without being coordinated with each other, a sentence such as “I like it dry” would be difficult to interpret. By contrast, the classification would be unambiguous if there were only one knowledge domain. That’s why the Apple environment offers no way of programming Siri independently. With Cortana and the Google Assistant, the expansion opportunities are restricted in a similar way: Voice Skills or Actions (Google’s equivalent of Alexa Skills) can be developed, but cannot access existing domain knowledge. For developers, this puts these platforms on an equal footing with Alexa.
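The coordination problem behind the “dry” example can be illustrated with a toy sketch. The two keyword vocabularies below are entirely hypothetical and have nothing to do with Siri’s actual implementation; they merely show how two independently developed knowledge domains that both claim the same word leave a short utterance with two competing readings.

```python
# Two hypothetical knowledge domains, developed independently,
# that both register the word "dry" with different meanings.
WEATHER_DOMAIN = {"dry": "arid", "sunny": "clear sky"}
WINE_DOMAIN = {"dry": "low residual sugar", "fruity": "fruit-forward"}

def interpret(utterance, domains):
    """Return every (domain, word, meaning) reading that matches the utterance."""
    words = {w.strip(".,!?").lower() for w in utterance.split()}
    return [(name, word, meaning)
            for name, vocab in domains.items()
            for word, meaning in vocab.items()
            if word in words]

# With both domains installed, "I like it dry" is ambiguous:
both = interpret("I like it dry", {"weather": WEATHER_DOMAIN, "wine": WINE_DOMAIN})
print(len(both))   # 2 competing readings

# With a single domain, the classification is unambiguous:
alone = interpret("I like it dry", {"wine": WINE_DOMAIN})
print(len(alone))  # 1
```

Coordinating the two vocabularies – deciding which reading wins in which context – is exactly the work a platform owner has to do before opening its assistant to external domains.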

Amazon focuses on in-skill purchasing

Microsoft and Amazon are working to integrate Alexa into Cortana, and vice versa, in order to broaden the market. Initial reviews can already be found online. What’s more, Amazon is working to bring more and more hardware for Alexa (or with direct support for Alexa) onto the market. These include buzzers – simple buttons that allow users to trigger an action in order to increase the scope of gamification – as well as all kinds of Echos and even smart hub integration for Philips Hue, among others.

So far, however, the market for Alexa Skills has proven to be a zero-sum game. Revenues boiled down to the profits generated through use of Amazon Web Services, and were only earned once a specific usage volume was achieved. The introduction of “in-skill purchasing” has changed all that, in the USA, at least. In-skill purchasing is similar to in-app purchases, and is the first method of voice interface monetisation to be supported by a provider. Amazon takes a 30% cut of every purchase and every subscription, which is roughly equivalent to what Apple and its competitors earn on the app market. This model will be coming to Germany soon, although Amazon has not yet released any more specific information on this topic.

Google focuses on artificial intelligence

Google is tackling a much broader field in its development of voice interfaces. The Duplex system was unveiled at this year’s “Google I/O” conference, and provides additional functionality for the Google Assistant. It uses artificial intelligence (AI), is capable of understanding conversations, and speaks with a remarkably realistic human voice.

But what exactly does that mean? Suppose my favourite sushi delivery service doesn’t let me place orders online, and I need to order over the phone. All telephone conversations of this kind follow the same principle: I state where I live and the dish I want to order, and in reply, I am told how much I need to pay and what time the food will get to me. Google created Duplex for exactly this kind of situation. It can be instructed to make phone calls independently and arrange appointments on your behalf, for example. And when it does so, it’s hard to believe that there isn’t a genuine caller on the line. Intonation and pauses play a special role here, as well as the natural flow of its speech. Duplex thus benefits from Google’s prior deep engagement with natural language.

Google also developed Tacotron 2 in order to artificially generate a human speaking voice (a process known as speech synthesis). Like its predecessor, this new system is trained on the established Deepmind WaveNet neural network as a basis for generating natural language; however, the new feature is that the neural network now receives data on pitch. This YouTube video by CodeEmporium shows exactly how this works and how the system functions. The system can also be tested with different languages on Cloud Text-To-Speech – just make sure you specify the “WaveNet” voice model when you do so. However, prospective users of this system should take note that it is four times as expensive as the existing Cloud Text-To-Speech.
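For orientation, selecting a WaveNet model in Cloud Text-to-Speech comes down to choosing a voice whose name contains “Wavenet”. The sketch below only builds the request body using the field names of the public v1 REST API (`text:synthesize`); the voice name is one of the published US-English WaveNet voices, and the sample text is ours.

```python
import json

# Request body for POST https://texttospeech.googleapis.com/v1/text:synthesize
request_body = {
    "input": {"text": "Voice interfaces are developing rapidly."},
    "voice": {
        "languageCode": "en-US",
        "name": "en-US-Wavenet-D",  # "Wavenet" models sit in the higher price tier
    },
    "audioConfig": {"audioEncoding": "MP3"},
}

payload = json.dumps(request_body)
print(payload)
```

Sending this payload (with valid credentials) returns the synthesised audio as base64-encoded MP3.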

Samsung and Apple keep their cards close to their chests

Unfortunately, it’s completely unclear why Samsung acquired Viv Labs and how this system is being developed. It remains to be seen whether Viv will replace Samsung’s previous Bixby solution, or whether the Viv technology will be integrated into Bixby. However, it is clear from Viv’s overall history that it represents a significantly improved version of Siri, with major potential (see Voice Interfaces – The Here and Now).

By contrast, Siri’s development seems to be stagnating. The only major innovations over the past year were voice macros, which make it possible to activate small macros using a pre-saved voice command. Yet this could be the proverbial calm before the storm. After all, Apple’s HomePod would be ideal as a possible competitor to Alexa. To achieve that, however, Apple would first need to open the Siri interface up to developers and make it possible to write software for the HomePod.

Where does the journey lead?

Beyond voice and conversational interfaces, machine learning is also a major buzzword at the moment. The advances that have been made in voice interfaces over the last few years would have been impossible to achieve without machine learning. Whether for transcription, text analysis or speech synthesis, neural networks are used everywhere, and are yielding ever more astonishing results.

For example, a voice interface that was trained on a single voice would be able to clearly recognise and process a voice belonging to a specific person – even through a din of background noise – with the help of neural networks, as well as its knowledge of all the features of that voice. Anyone who has tried to use their Alexa Smart Home controls while watching a film will understand how important this development would be. After all, users do not want to shout at their voice interface in order to make themselves heard over the ambient noise; rather, they want to communicate at a normal volume. What’s more, if individual voices could be separated, that would significantly expand the fields of application for voice interfaces.

Looking beyond optimised speech processing, it is striking that all smart assistants to date have been completely impersonal. All that might be about to change, however, as a completely digital newsreader has just been showcased in China. This offers significant potential for product providers. Even if the film “Her” depicts a particularly personal relationship with a voice, it is undoubtedly true that people build closer emotional connections with realistic personalities. Just look at the success of influencer marketing. VR and AR technology might also allow this kind of assistant to keep us company in human form wherever we go.

Where does the greatest potential lie?

Computing power:

Given the security concerns raised by the fact that all data processing performed by voice interfaces takes place in the cloud, we can predict that in future, there will be more and more solutions in which the processing takes place locally. At present, almost all data is processed and stored in the provider’s cloud, mainly because many solutions exceed the capacity of the user’s own device. Yet processing power is constantly growing and getting cheaper. As such, it’s only a matter of time before voice interfaces will be able to function perfectly on smartphones that are offline.

Language comprehension:

Many companies are also working on understanding speech at the level of content. All modern voice interfaces become useless when it comes to interpreting more than one individual sentence – such as the content of an entire story. As they currently stand, voice interfaces focus primarily on statements of intent rather than on knowledge content. The interface is designed to understand what the user wants in order to provide a response. By contrast, extracting knowledge from texts is about capturing knowledge and saving it in ordered structures.

Let’s take the example of a service employee on a hotline who has to handle a five-minute conversation with a customer regarding a complaint. In order to help the employee do their job, there are already a few solutions available that can identify keywords in the conversation and display relevant topics on a screen. Yet it would be even more useful if the interface could extract the essential content from the conversation and display the key points on a screen, so that the employee could then address these points in the discussion. For this to happen, the system would need to be able to understand the content of what the user is saying and evaluate or prioritise it as appropriate. Going further, a conversational interface could also extract information from emails or even chatbots and quickly make all the relevant facts available to service employees.
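The keyword-spotting step described above can be sketched with nothing more than term counting. Real products use far more sophisticated language models; the stop-word list and the sample call transcript below are purely illustrative.

```python
from collections import Counter

# Minimal illustrative stop-word list (real systems use full linguistic resources)
STOPWORDS = {"the", "a", "i", "my", "and", "to", "it", "of", "was", "for", "want"}

def key_terms(transcript, top_n=3):
    """Return the most frequent non-stop-words of a call transcript."""
    words = [w.strip(".,!?").lower() for w in transcript.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return [term for term, _ in counts.most_common(top_n)]

call = ("My delivery was damaged. The delivery box was open and the "
        "invoice was missing. I want a refund for the damaged delivery.")
print(key_terms(call))  # "delivery" and "damaged" dominate the conversation
```

Moving from such keyword lists to the “key points” described above is precisely the jump from spotting terms to understanding content.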

A great deal of additional research is currently underway in the field of knowledge representation and natural language understanding. Likewise, more and more self-learning technologies are being developed to undertake text analysis, such as word embedding. Here too, it is only a matter of time before systems become available that can understand highly complex content.
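The idea behind word embedding can be shown with toy vectors: each word becomes a point in a vector space, and related words end up close together, as measured by cosine similarity. The two-dimensional “embedding” below is invented for illustration; real embeddings have hundreds of dimensions learned from large text corpora.

```python
import math

# Invented 2-D embeddings for illustration only
EMBEDDINGS = {
    "complaint": [0.90, 0.10],
    "refund":    [0.85, 0.20],
    "banana":    [0.05, 0.90],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

sim_related = cosine_similarity(EMBEDDINGS["complaint"], EMBEDDINGS["refund"])
sim_unrelated = cosine_similarity(EMBEDDINGS["complaint"], EMBEDDINGS["banana"])
print(round(sim_related, 2), round(sim_unrelated, 2))  # related pair scores far higher
```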

Recognising and verbalising image content:

Something that most people tend to encounter only peripherally is the idea of accessibility in the digital world. In the past, Siri made a major and very important contribution to helping people with visual impairments to use smartphones conveniently. The use of voice interfaces is particularly relevant to people in this position.

In addition, the field of machine learning harbours many research projects that focus on recognising image content. This is no longer merely a matter of telling dogs apart from cats; rather, it revolves around image constructions with many components. To take an example: imagine a system that can recognise and describe the location of a street – what’s in front of it, what’s behind it – or can recognise whether a traffic light is currently red, or read the symbols on road signs. Taken together, these technologies would deliver significant added value: a system for the visually impaired that describes what is currently happening in front of the user, warns them when obstacles come into view, and provides reliable navigation.

Conclusion

Voice interfaces have come a long way – yet from day to day, it still doesn’t feel completely natural to use interfaces of this kind since their capacity to understand speech is still too limited. However, there are people working on these problems, and it is possible to envisage a future in which we talk to our digital assistants almost routinely – perhaps even telling them about our ups and downs, and receiving understanding responses or even ideas and encouragement in return. Time will tell what impact this might have on our social lives. Every major technology to date has brought advantages and disadvantages in its wake – we just need to make sure we deploy it prudently.

This is the final contribution of a four-part series on the subject of voice interfaces


Influencer marketing is (finally) starting to take off: measuring the contribution that bloggers, Instagrammers and the like make to advertising impact, and analysing the efficiency of their actions as part of an integrated communications mix, are just as important as the creative, individual and target-group-oriented implementation of a brand message. Slowly but surely, high-quality advertising is emerging from the hype surrounding influencers.

More and more companies and brands are interested in working with bloggers, vloggers and social media stars, big and small. In doing so, they seek to achieve authentic marketing of their products and services, increased engagement with their brand or the achievement of a new target group that is difficult to access with traditional media.
All of this can and must be delivered by influencer marketing – however, for companies entering the field, measuring the impact on business-relevant goals, such as increasing product or service awareness or driving sales, remains a challenge. Therefore – to reduce complexity – often only the reach (frequently equated with the number of followers on the respective channel) is used as the most important metric for measuring performance.

But where does this simplification come from? In my opinion, it comes from an attempt to put influencer marketing on the same level as traditional media bookings. We juggle CPCs, CPMs and CPOs, forgetting that influencer marketing can be so much more: it is about working with people who are enthusiastic about a brand and want to bring it closer to their followers – for a fee, of course. In return, influencers open up their loyal, laboriously acquired fan base to a company’s advertising activities and create their own content for it, which is usually even made available to the brand for further use.

Precisely these elements should be thrown into the balance when making future evaluations of influencer marketing – and from these considerations comes the biggest short-term trend of this discipline: the professionalisation of influencer marketing with a focus on efficiency and impact measurement in comparison with other media formats and channels.

By leveraging tools from qualitative and quantitative advertising impact research, equivalence studies, and various technical platforms that predict media-relevant KPIs – such as reach, impressions or engagement – from the planned budget, we are working on an integrated evaluation of influencer marketing projects. Only in this way can the ‘influencer marketing’ trend become a discipline that is indispensable to the advertising landscape of the future.
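Two of the media-relevant KPIs mentioned above reduce to simple formulas, which makes the comparison with traditional media bookings concrete. The campaign figures in this sketch are hypothetical.

```python
def cpm(cost, impressions):
    """Cost per mille: media cost per thousand impressions."""
    return cost / impressions * 1000

def engagement_rate(interactions, followers):
    """Interactions (likes, comments, shares) relative to follower count, in percent."""
    return interactions / followers * 100

# Hypothetical influencer post: 5,000 euros for 250,000 impressions
print(cpm(5000, 250_000))                # 20.0 euros CPM
# ...and 12,000 interactions across 300,000 followers
print(engagement_rate(12_000, 300_000))  # 4.0 percent
```

A CPM figure makes an influencer post comparable to a display booking; the engagement rate captures part of what reach alone misses.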

 


In the series The inside story x 3, experts from the Plan.Net group regularly explain a current topic from the digital world from different perspectives. What does it mean for your grandma or your agency colleague? And what does the customer – in other words, a company – get out of it?

In the western media, China’s social credit system is often compared to the Nosedive episode of Black Mirror and described as an Orwellian nightmare. Based on online search requests, shopping history, education, criminal record, social media behaviour and many other factors, every citizen is to be evaluated according to a point system. If the three-digit score is too low, there are far-reaching consequences: some jobs will be blocked for you, your children will not be able to attend good schools, travel will be denied, and you’ll also be unable to get a loan. The picture of the social credit system painted by the western media looks very disturbing. But fortunately, the reality is not quite that bad.

China’s social credit system is an ecosystem composed of various initiatives

In 2014, the Chinese government announced a plan that provided for setting up an extensive social credit system by 2020. The aim is to improve governance and create order in a country that often has to combat fraud. As China is in the process of becoming a market economy, it still does not have properly functioning institutions, such as courts, to deal with these issues. For this reason, the Chinese government is trying to establish a kind of reward and punishment system to promote trustworthiness and integrity. The central Joint Punishment System puts citizens on a blacklist if they violate certain rules. For example, if a citizen is ordered by a court to pay a fine and does not pay, they are put on a blacklist. After that, the person concerned cannot book flights, travel first class by train, or purchase luxury items on TMALL or Taobao until they have paid the fine. Furthermore, they are denied access to loans and government jobs.

However, this Joint Punishment System does not assign scores to citizens. The basis for this mistaken idea is related to Alibaba. The Chinese government is not alone in working on a social credit system – private-sector companies have also launched initiatives. In the western media, these are often lumped together and confused with each other.

Like Amazon, Alibaba is an online retailer that provides a platform for merchants to sell their products to consumers. At the time when Alibaba set up its e-commerce business, China was largely a cash country in which few people had credit cards. To be able to implement their business model, Alibaba had to secure payment transactions between buyers and sellers. As there was no provider like Visa or MasterCard in China that could handle this task, Alibaba had to set up its own payment infrastructure. Alibaba’s subsidiary Ant Financial was established for that purpose. As most people in China cannot show a documented payment history, Alibaba needed other factors to enable them to assess the creditworthiness of consumers and build trust between merchants and purchasers. That was the origin of the Sesame Credit Score system.

The score can range from 350 to 950 points and is based on several factors: the volume of purchases on Alibaba; whether purchases, as well as electricity and phone bills, are paid on time; the completeness of personal details; and social contacts.
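Ant Financial has never published the actual formula or weights behind Sesame Credit, so the sketch below is purely illustrative: it only shows how several normalised factors could, in principle, be combined into the published 350-to-950 range. Every weight here is invented.

```python
# Hypothetical weights -- the real Sesame Credit weighting is not public.
WEIGHTS = {
    "purchase_volume":  0.25,
    "payment_history":  0.35,
    "profile_complete": 0.15,
    "social_contacts":  0.25,
}

SCORE_MIN, SCORE_MAX = 350, 950  # the published score range

def illustrative_score(factors):
    """Map factor values in [0, 1] onto the 350-950 score range."""
    raw = sum(WEIGHTS[name] * value for name, value in factors.items())
    return SCORE_MIN + raw * (SCORE_MAX - SCORE_MIN)

print(illustrative_score({k: 0.0 for k in WEIGHTS}))  # 350.0 (floor)
print(illustrative_score({k: 1.0 for k in WEIGHTS}))  # 950.0 (ceiling)
```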

In addition, the People’s Bank of China (PBoC) plans to develop a national creditworthiness check comparable to the SCHUFA report in Germany. However, it lacks the data necessary for this, so in 2015 the PBoC contracted eight companies on a trial basis to develop an official credit scoring system. Sesame Credit was one of those companies. Due to privacy concerns and conflicts of interest, in the end none of these companies received an official licence for their rating system. Instead, a joint venture composed of the eight companies and the China Internet Finance Association was founded. This joint venture is called Baihang Credit, and it is the first unified credit information agency in China.

The Sesame Credit score offers my grandma lots of advantages

Participation in the Sesame Credit point system is currently voluntary and does not have any downside for its users. Actually the score resembles a loyalty programme, like collecting air miles. Ant Financial cooperates with many external partners that reward customers who have high scores and offer them many advantages. For example, my grandma with her high score does not have to pay deposits for hotel bookings, car rentals or bike rentals. She is directed to the fast lane at airport security, and her visa application for Luxembourg or Singapore gets priority treatment. Quite a few singles also post their Sesame Credit score on Baihe, Alibaba’s online dating service, in the hope of increasing their chances.

The score is intended to be a means of building up mutual trust. However, its additional use outside the Alibaba platform and the immediate financial context as a criterion for government tasks, such as airport security or issuing a visa, is a questionable mix of different sectors.

Impact of the score on companies: are product categories evaluated differently?

In a press interview, Li Yingyun, Technology Director at Sesame Credit, indicated that the type of product purchased affects the score. For example, buying nappies would increase your score because the system thinks you are a responsible parent. By contrast, if you buy a lot of video games, you are seen as less trustworthy, with a negative impact on your score. Although Ant Financial later denied this statement, doubts remain. For companies that market their products on the Alibaba platform, this represents a great uncertainty. If their products are in a category that is weighted negatively by the algorithm, that could lead to declining sales of these products in future because consumers are afraid of losing points.

Do the scores of my colleagues affect my own score?

One thing that aroused interest in the western media was the rumour that the online behaviour of your friends could be considered in the calculation of your own score. Alibaba has denied this. According to their statements, what matters is the size of your social network, not the online behaviour of your contacts. That’s because the more verified friends you have, the less likely it is that your account is fake.

We should follow the developments in China with a critical eye

It remains to be seen how the social credit system will develop up to 2020. At present, however, there is not (yet) any overarching AI-based super-system that evaluates the Chinese population according to a rating system and affects all aspects of their lives.

When it comes to China and technology, we quickly assume the worst and can easily imagine nightmare scenarios. However, the developments are often more complex than they first appear, and a critical attitude to news from the Far East is worthwhile. Especially for companies active in the Chinese market, it is essential to do their own research and keep a close eye on developments. The following websites, which report on technological, economic and cultural developments in China, can serve as a starting point:

  • Tech In Asia and Technode are blogs that discuss technology trends and the latest news on start-ups and large companies in China. Technode posts short daily briefs that explain what is happening and why the news is relevant. Their China Tech Talk podcasts are also recommended.
  • The South China Morning Post has a good business section as well as extensive technology coverage. If you’re looking for the latest headlines on China’s Internet heavyweights Alibaba, Tencent or JD.com, that’s the right place. However, you should bear in mind that Alibaba bought the newspaper in 2015.
  • Radii China primarily deals with the cultural aspects of present-day China, and Magpie Digest gives good insights into China’s youth culture.

There’s no longer any doubt that artificial intelligence (AI) can be creative. The question now is exactly what role AI is capable of playing in the creative process. Will this role be limited to that of any other tool, like a paintbrush or a camera? Or could AI become the muse, or even the independent originator of new creations? Could it even be responsible for the extinction of art directors as a species? If so, when?

For the time being at least, I can reassure my colleagues that their jobs are safe. Nonetheless, it might be wise for them to start getting on the right side of this new co-worker. Even though the beginnings of AI development go all the way back to the 1950s, it’s only today that exponential development of the three “ABC factors” is enabling it to really gather pace (for the uninitiated: A is for algorithms, B is for big data, and C is for computer chips). That’s why the time has now come for every sector and every company to ask itself how artificial intelligence should be transposed and integrated into its everyday activities.

Within marketing, applications for AI have so far been concentrated primarily in the areas of predictive analytics (for example, for providing recommendations in online shops), personalisation (for example, for individually-tailored newsletters), linguistic assistance, and automation (for example, in media planning). Another important area of marketing, which has so far been almost entirely ignored, is creativity. This is often entrusted only to human hands, and portrayed as an unassailable fortress. With sophisticated puns, poetry, sentimental melodies, magnificent graphics, and everything else that stirs our emotions, there’s surely no way that the processors of a cold machine could ever dream up creative content – is there?

Perhaps we shouldn’t be so sure. For there are already numerous examples today of how artificial intelligence can support, expand, or even imitate human creativity – and the numbers keep growing.

AI can write

How many journalists relish the prospect of laboriously scrolling through the same stock market updates, sporting results, and weather forecasts every day? No problem: the responsibility for texts like these that follow a fixed format can now be shouldered by AI – and without the reader being able to tell the difference. Who knows when robo-journalism will lead to the first advertising texts written by machines, or copy-CADs, as they’ve already been dubbed?

AI can speak

Adobe hasn’t only created the world’s most important program for image editing in the form of Photoshop, but has been hard at work on human speech as well: Adobe VoCo is Photoshop for voice files. After only 20 minutes of listening to a person talk, the program’s AI is capable of fully imitating their voice. VoCo doesn’t simply stitch together snippets of words already spoken by its human subject either, but is instead capable of pronouncing entirely new words as they are typed in.

AI can compose

A team from the University of Toronto has succeeded in programming AI to be able to compose and write catchy and memorable songs. The program, Neural Karaoke, was fed on more than 100 hours of music, based on which it produced an entire Christmas song complete with lyrics and cover graphics.

AI can create images and graphics

So-called generative adversarial networks are capable of producing astonishingly realistic images from descriptions written by people. Simply put, they function by using a “generator” to randomly create pictures which are then evaluated by a “discriminator” that has learned to recognise objects with the help of real images. This process can turn the words “a small bird with a short, pointy, orange beak” into a photorealistic image.
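The generator-and-discriminator feedback loop described above can be illustrated in miniature. The sketch below is a deliberately simplified toy, not a real neural GAN: the “real” data are just numbers near 5.0, the discriminator is a fixed closeness score, and the generator improves a single parameter by keeping any proposal the discriminator rates at least as highly as its current output.

```python
import random

# Toy sketch of the adversarial feedback loop (NOT a real neural GAN):
# "real" data live near 5.0, and the discriminator scores how real a
# sample looks. The generator keeps any proposal the discriminator
# rates at least as highly as its current output.

def discriminator(x, real_mean=5.0):
    """Score in [0, 1]: the closer x is to the real data, the higher."""
    return max(0.0, 1.0 - abs(x - real_mean) / 5.0)

def train_generator(steps=3000, seed=0):
    random.seed(seed)
    mean = 1.0  # the generator's single parameter
    for _ in range(steps):
        candidate = mean + random.gauss(0, 0.3)  # the generator "creates"
        if discriminator(candidate) >= discriminator(mean):  # the judge
            mean = candidate  # keep whatever fooled the discriminator best
    return mean

print(train_generator())  # drifts towards the "real" data around 5.0
```

In a real GAN both sides are neural networks and the discriminator is trained at the same time as the generator; here only the structure of the generate-and-judge loop is the point.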

AI can paint

AI program Vincent from product design specialists Cambridge Consultants, which is also based on generative adversarial networks, has extensively studied the styles of important painters from the Renaissance to the 20th century, and can now make any sketch drawn on a tablet resemble the work of a specific artist.

AI can do product design

Intelligent CAD system Dreamcatcher from Autodesk can generate thousands of design options for metal, plastic, and other components, all of which provide the same specified functionality. The designs also have an astonishingly organic look which couldn’t be described as “mechanical” or “logical” at all.

AI can produce videos

Working together with MIT’s Computer Science and Artificial Intelligence Laboratory, US chipmaker Nvidia has developed a technology that can synthetically produce entire high-resolution video sequences. The videos, which have a 2K resolution, are up to 30 seconds long and can be made to include complete street scenes with cars, houses, trees, and other features.

AI as the Art Director

Advertising agency McCann Japan has already been “employing” AI in the role of Creative Director for some time. AI-CD ß has been fed a diet of award-winning advertising for the last 10 years, and has already produced its own TV ad based on these data.

Big changes begin with small steps

What does all of this mean for us? Although we may still chuckle at the shortcomings of such AI applications today, development is now moving at an exponential rate – and the progress being made is impressive. This is why now is the time to start getting over the prejudices and fears, and to give proper thought to how we will construct creative processes in the future, together with the role that we want artificial intelligence to play in them. Big changes can’t be made at a single stroke, and are instead better implemented in many small steps. Barriers are best removed by being prepared to play around with new technologies in order to test them out and gather experience. True, doing this takes up a certain amount of a company’s time and resources. But those of us who begin with a small project and then slowly feel our way forward have much higher chances of achieving long-term success, and maybe even of helping to shape a new development in the AI world.

As Christmas trade slowly gathers pace, this year too it’s mainly prettily wrapped electronics that we can expect to see under German Christmas trees. November’s instalment of SEO News examines why we should keep a critical mind when it comes to technology, and also considers the possibility of Google’s homepage relaunch going awry.

Google is becoming a long quiet river

So, it’s finally happened. The last 20 years have seen Google not only set the standard for web-based search engines, but also lead the way with the minimalism and efficiency of its homepage design. During the Internet boom of the early noughties, Google’s simple search field with just its logo – or doodle – and two buttons underneath was the welcome antithesis of labyrinthine jumbles of links and tedious Flash intros. Much has happened since 1998, however, and the market leader from Mountain View is now finally bowing to the trend for constant and personalised stimulation. “Discover feed” is the name of a new feature which has been in the process of a progressive worldwide roll-out on desktop and mobile devices, including search apps, since the end of October. The first of several new functions announced by Google to celebrate its 20th birthday, Discover feed marks the first step towards an individualised response engine that delivers results without even needing to be asked questions (see our report). Although Google has experimented in the past with new homepage features that allow users to enter into popular subject areas, and with its assistant service “Now”, this is the first time that relevant content in the context of personal search histories is being presented in endless stream form. And, just like on YouTube, the whole experience is also available in a Night Mode which comes in special muted colours.

This design overhaul – the most comprehensive since Google’s very beginnings – has clearly been a difficult step for the decision-makers in Mountain View, even though the competition at Microsoft have taken a different visual tack from the start with their search engine Bing. With a striking new image to greet visitors to its homepage every day and the latest news, Bing has always provided more points of entry for its users than the market leader. It’s also interesting to compare Google with Amazon: for the Seattle-based retail search engine, content personalisation is the obvious starting point when it comes to homepage design. Perpetual upsell with the help of the A9 algorithm means that users are presented with countless individually-tailored offers. On the other hand, recent integration of increasing numbers of new features and placements has resulted in user experience and usability of design suffering significantly. The consequence seems to be that Amazon’s homepage design is devolving back into the confusing times of fragmented front page websites. Neither does user experience appear to be too great a sacrifice as long as takings are good. And for Google too, integrating paid ads into the Discover stream is naturally providing new forms of monetisation.

That said, the homepage itself may ultimately turn out to be a doomed model. Voice and visual search capabilities are now providing countless touchpoints for search engines, which may soon enough ditch classic web or app-based presentation formats to offer users a tailor-made answers and solutions package in their place. Until that time comes, SEOs will need to wait and see whether the new Google stream gains acceptance among its users, and what criteria Google’s Discover feed uses to generate its responses. This new, larger stage certainly shouldn’t go unused.

Led around by the nose

Technological progress is a function of modernity – it’s both its cause and its consequence. One of the clearest examples of just how deeply technology has embedded itself into our lives is the phenomenon of the search engine. Whether it’s Google’s vision of an invisible companion for the challenges of the unplannable outside world, or Amazon’s promise of immediate consumer satisfaction, neither project would be conceivable without the technology that functions as its beating heart. It was no different with the steam engine or with internal combustion. The difference is that the machinery driving the present chapter of modernisation is far harder to see inside. If the new diesel generator was something that you could take apart with your own hands, the same can hardly be said of algorithms and artificial intelligence, which exist only in distant clouds of data. And sometimes it’s difficult to shake the impression that the bombastic promises and visions of the high-tech industry are little more than a glitzy marketing show for a helplessly naive public.

This is why it’s always reassuring to catch the technological elite showing a more fallible side. A recent example comes from the SEO Signals Lab group, which announced a competition challenging contestants to achieve a high ranking among responses to the search term “Rhinoplasty Plano” within the space of 30 days. The term was one that users might enter in order to locate plastic surgeons in the greater Dallas area of Texas who specialise in sculpting noses. This was a query that had not formerly been the subject of a great deal of competition, and which had high local relevance. The small-scale challenge delivered some unexpected results, however. Google’s mantra for success in organic searches can be broken down into three key points: relevant content, a friendly user experience, and clean technical compatibility with all platforms. That’s why it’s more than surprising that the winning website of the Signals Lab competition is written entirely in Latin – right down to its URLs, headings and footers. The use of Latin dummy text in website production is nothing unusual; in this case, however, the ancient language wasn’t just found in a forgotten placeholder for content in production, but throughout the site, as part of a strategy to reveal the fallibility of search engine algorithms. On top of that, the website was also packed with made-up local information, forged reviews, and substandard backlinks. That Google allowed what is clearly a fake website to rank second among responses to the search term in question can only be explained either as an anomaly, or as a blind spot in the omniscient Googleverse.

Two lessons can be taken away from this little experiment. The first is that it’s a comfort for the search engine sector to know that, even with the supposedly mature level of its technology, Google can still be caught out with classic old-school fake spam SEO. The second is that users need to stay vigilant, and try to establish how far they can trust technological progress before letting themselves get swept up in all the excitement. Although search engines are certainly extremely practical, they will never become part of human reality. Whether it’s Google or Bing, at the end of the day, search engines are no more than database-supported ways of selling advertising which offer a compact and free version of real life to tempt users in. By the way: if you’re looking for Latin-speaking surgeons to operate on your nose, apparently Florida has what you need as well.

As with every trend, many see voice interfaces as a magic bullet. Yet their application is not relevant to every situation. So, for which services do they offer a genuine incremental value? What characterises a good dialogue and how do we guarantee that a customer’s data is handled securely? Let us show you what you should be paying attention to.

In theory, voice interfaces should be perfectly integrated into our everyday lives. We are accustomed to packing information into language and expressing our wishes verbally. However, this is not our only means of communicating information. Information is also passed on non-verbally, often through gestures, mimicry and tone. In online chats, we attempt to balance out the scant possibilities of non-verbal communication with the help of numerous emojis. When describing superlatives, most of us will turn to wild gesticulation. For example, we use sweeping gestures to explain the size or width of something. If we see something extraordinary and want to describe it, as with a phone call, email or letter, we can only do so verbally; this often feels limiting and explains why we gladly rely on sending pictures. With countless gadgets available online, when we come across a great one and tell a friend about it, we tend to enumerate only a few of its attributes. We do so not only because we are limited with our time, but also because we know that our counterparts might find different features exciting. Our experience tells us that it is much better to simply send friends a link to the product so that they can see for themselves what they like most about the gadget.

What applies to verbal communication in everyday life also applies to communication with voice interfaces: not every application has the potential to generate added value through the use of a voice interface. An example of this is Amazon Alexa’s Skill Store. Skills are the voice-interface equivalent of apps in the mobile world, and the store contains a lot of so-called ‘useless skills’: poorly rated skills that nobody uses. But what characterises these useless skills? They offer no incremental value to the user. Either they are simply not suited to voice interfaces, or their dialogues are poorly designed and cause user frustration. But why is that? What can be done better, and how can useless skills be avoided?

Find a meaningful application

We often use everyday phrases like “Can you just…?”, “I need a quick…” or “What was that again…?” – especially when we are short on time or have our hands full. In precisely these situations, we have no opportunity to sit down at a computer or get our mobile phones out. And this is exactly where the ideal scenarios for practical voice interface usage are found.
Voice interfaces can provide all kinds of information, control connected systems such as smart homes, and drive services such as rental car bookings. All ‘hands-free’ scenarios are also ideal candidates for voice interfaces: from the mechatronics engineer who is working on an engine with oily hands and needs some info on a spare part, to the amateur cook who wants to know the next step of a recipe while kneading dough.
In such situations, software serves to make our everyday lives easier and more pleasant. And that’s exactly what counts when using voice interfaces. It’s a question of short questions, logical support and fast results. Pragmatism is key. It is therefore important to consider exactly which service or application you want to offer with a voice interface and whether it will really help the user in their private or professional life.

Remember to always think in terms of dialogue and never in visual concepts

When the smartphone and mobile app revolution flooded the market, already existing concepts were simply scaled down and taken over. It was only over the course of time that these adapted concepts were refined and adapted for the mobile format. However, the way in which people process visual information is very selective. The subconscious mind acts like a filter, directing our attention to the things that are important to us. Additional information will only come to us later. By contrast, auditory perception works quite differently. In this case, the subconscious mind cannot decide which information to absorb first. Instead, we process everything we hear in a predetermined order.

And this is where the first big mistake arises: when designing a skill for a voice interface, it is often falsely assumed that all it takes is a simple adaptation of an already functioning visual concept. Yet visual concepts contain too much information for a voice interface. If you use all this content, the user is flooded with long texts and an endless amount of information. The result is both exhausting and unpleasant. For this reason, Amazon has introduced the ‘one-breath rule’: the text Alexa communicates in a single interaction with the user should be no longer than a slow breath. To ensure the user does not feel overwhelmed and the voice interface adapts better, it is important to look at the information to be communicated in detail and take text lengths and information restrictions into account.
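The ‘one-breath rule’ is qualitative, but it can still be checked mechanically when writing a skill’s responses. A minimal sketch, where the limit of roughly 25 words per response is purely our illustrative assumption, not Amazon’s definition:

```python
# Rough word-count proxy for the 'one-breath rule'. The 25-word limit
# is an illustrative assumption, not an official Amazon threshold.
MAX_WORDS_PER_BREATH = 25

def fits_one_breath(response_text):
    """Return True if the prompt is short enough to say in one slow breath."""
    return len(response_text.split()) <= MAX_WORDS_PER_BREATH

print(fits_one_breath("Your train to Munich leaves at 9:42 from platform 12."))  # True
```

A check like this can run over every response template in a dialogue model before release, flagging prompts that would flood the user.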

Avoid long dialogues: A second big mistake, also rooted in the adaptation of visual concepts, is overly long stretches of dialogue. Especially in e-commerce, we are used to being led through a process page by page so that, by the end, the system has all the information needed to make a purchase. These processes are stable and, in most cases, lead to success. With a voice interface, the situation is different. A simple multi-step question-and-answer dialogue that the interface can execute quickly can still take several minutes. If the user takes too long to answer, the dialogue is usually simply ended. If something is misheard or misunderstood, errors creep in. In addition, some well-known interfaces simply drop a dialogue for no apparent reason – and this is all the more annoying the further the sluggish dialogue has progressed.

In order to avoid this, certain basic user information can be queried the first time a voice interface is used and then assumed during further use. If necessary, this default data can also be pulled from another source. For example, if a user wants to book a trip to Munich, the voice interface needs the following data: place of departure, destination, date, time, preferred mode of travel and payment type. The user has previously stated that they live in Hamburg, mostly travel by train and usually pay by credit card. The next possible time is selected as the default departure time. The interface would therefore be able to make a valid booking by asking just one question, namely the destination – and all without a long, error-prone and repetitive question-and-answer game with a poor dynamic. The user should always be able to make subsequent changes to the existing data.
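The Munich booking example can be sketched as a simple slot-filling routine: defaults from the user’s profile fill most slots, and the interface only asks about what is still missing. All names and fields here are purely illustrative.

```python
# Illustrative slot-filling with stored defaults: the interface only asks
# about slots that neither the utterance nor the user's profile fills.
REQUIRED_SLOTS = ["departure", "destination", "travel_mode", "payment"]

USER_PROFILE = {  # learned from earlier interactions
    "departure": "Hamburg",
    "travel_mode": "train",
    "payment": "credit card",
}

def slots_to_ask(spoken_slots, profile):
    """Merge spoken input with profile defaults; return the open slots."""
    filled = {**profile, **spoken_slots}  # spoken input overrides defaults
    return [slot for slot in REQUIRED_SLOTS if slot not in filled]

# The user only has to name the destination:
print(slots_to_ask({"destination": "Munich"}, USER_PROFILE))  # []
```

A brand-new user with an empty profile would instead be asked about every open slot, which is exactly the long question-and-answer game the defaults are meant to avoid.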

Different phrases employed at the right time with a pleasant dynamic: Language gives us the opportunity to express a specific statement in many different ways, and linguistic variation is an expression of intelligence. So why shouldn’t voice interfaces also vary their formulations? Through enhanced dynamics and varied phrasing, the process and overall interaction feel much more natural. The interface adapts to the user, instead of the other way around. The same applies to repeated use of the interface: if the interface explains everything in detail the first time it is used, further repetition of usage instructions should be avoided unless the user asks for them.

In situations where the user needs help, there is also a lot to take into account. It is not always clear how to use a voice interface, so there should be the option of asking for help. The interface can take into account the situation in which the user finds themselves: after all, it knows whether the user is currently dealing with the shopping cart or specifying a date for a trip. This makes it easy to give the user shopping-cart-related help precisely when they are dealing with the shopping cart. This knowledge should definitely be harnessed to provide the best possible in-situ support.

Ensuring secure dialogues

As with any software development, data security is a key issue when developing voice interfaces. So, what must be considered during analysis and conception? In the article ‘Voice Interfaces – The Here and Now’, the big players were examined closely. The interfaces that it describes are all cloud-based. Thus, the speech analysis and processing does not take place locally on the user’s computer, but in the respective data centres of the provider. Within the framework of the GDPR, these providers not only have to provide information about where their processing servers are located, but also comply with applicable basic regulations. However, the question arises: why would a financial service provider or health insurance company want to store highly sensitive customer data in the cloud of a foreign company? Amazon, for example, offers a high level of security when accessing its services through encrypted transmission or authentication via OAuth2, yet everything else in its infrastructure is invisible to users and developers. It is almost impossible to anonymise a voice interface that works with sensitive data in such a way that no knowledge about the user can be tapped on the cloud side. Everything that is said is processed in the cloud, as is everything that the interface communicates to the user. Voice interfaces should therefore only be used in situations where no sensitive data is handled.

Why the cloud? The blessing and curse of current voice interfaces is that sentence transcription and analysis are based on machine-learning technology. Once a dialogue model has been developed, the system must learn this model so that it can then understand similar sentence variants. This ‘learning’ is a computationally intensive process that is performed on server hardware. From this perspective, these cloud solutions are both pragmatic and, seemingly, essential. However, there are a few solutions in the field of voice interfaces that can run on local machines or servers. For example, with its speech recognition software Dragon, software manufacturer Nuance offers a tool that enables transcription on local hardware.

What needs to be considered when dealing with PINs and passwords? Another aspect of data security is the type of interface in question. While it is easy to glance around and check whether anyone is watching as we type a password into a visual interface, with spoken language this is far more problematic: security-sensitive data is easy prey for eavesdroppers. PINs and passwords should therefore never be part of a voice interface. Here, combining it with a visual component is more advisable: the user is authenticated via the visual component, while further operations are carried out using the auditory component.

Conclusion

The handling of sensitive data still represents one of the biggest challenges when using voice interfaces. Here, it is important to work with a particularly critical eye and design dialogues accordingly. Security questions should never be part of a voice interface dialogue. While it may be tempting, visual concepts should never be transferred directly to a voice interface. This results in the user being overwhelmed and dialogues being interrupted for being too long or due to errors. If all of these points are taken into consideration, the user will find working with a voice interface pleasant, natural and helpful. Of course, whether the interface makes sense overall largely depends on the concept and field of application.

This is the third contribution of a four-part series on the subject of voice interfaces:

Until now, companies looking to advertise their products online have done so using ads on search engines, social networks or media websites. Now, there is yet another option: an increasing number of retail websites are offering space for companies to place ads. For the retail platforms, such advertising represents a lucrative source of income. The platforms offer a decisive advantage as well: first-hand access to data about what consumers want. And companies are seeing plenty of other benefits, too – for one thing, people visiting these websites are already in a buying mood.

1. Top players such as Amazon, Zalando, Otto and Alibaba

Based on such data, being able to advertise and sell where consumer interest is at its peak is an exciting prospect. Top players such as Amazon, Zalando, Otto and Alibaba have long been aware of the marketing potential to be found here. These marketplaces are set to bring in significantly more income in the future, as increasing numbers of users (more than 50% in Amazon’s case) not only start their product searches there, but in most cases sooner or later buy there too.

Ad placement, budgets and costs have a significantly closer relationship to sales on marketplaces than they do in other channels or on other platforms. The logistical aspects of marketplace operators (availability, scheduled delivery, shipping costs, special consumer benefits, etc.) are already rated by many buyers as highly relevant to their purchases – alongside price comparability or coordinated, similar product ranges.

2. What’s changing where strategies are concerned?

In order to appeal to consumers at the end of the consumer decision journey, it is essential that, in addition to performance marketing on search engines, companies also seek to make contact on marketplaces. Deploying user data on integrated retail media platforms in particular enables companies to make even more efficient use of different advertising formats. In addition, the advertising effect of brand messages beyond platforms like these should not be neglected, irrespective of the focus on conversions and sales. With often extensive coverage and direct placement within the competitive environment, retail media offer many options for generating additional income beyond simply optimising the cost-revenue ratio.

3. What about consumer spending behaviour? What’s up and what’s down?

According to the latest eMarketer report, advertising spending on Amazon in the USA is set to almost double in the next two years – primarily at the expense of Google and Facebook. Other channels and platforms on the market will continue to develop at a stable rate, however. An exciting new development is the emergence of marketplaces which offer their advertising clients even more options for placement and/or cooperation and which, if chosen correctly from a strategic point of view, provide at least one thematically relevant alternative to Amazon or price comparison websites.

Until the release of Amazon’s Echo, aka Alexa, the big players had paid little attention to voice technologies. In the meantime, there are numerous other variants, but which are the best known and which voice interface is the most suitable?

Today’s voice interfaces are a combination of two components, namely transcription and natural language processing (NLP). A spoken sentence is transcribed into text. This is analysed using artificial intelligence, based on which a reaction is generated and converted back to analogue speech via a speech synthesis (see also part 1).
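The two-component chain can be pictured as a pipeline of four stages. In the sketch below, every stage is a stub standing in for the real transcription, NLP and speech synthesis engines; only the structure of the pipeline is the point, and all intents and replies are invented for illustration.

```python
# Stub pipeline: audio -> text -> intent -> reply text -> audio.
# Every function is a placeholder for a real engine.

def transcribe(audio):
    """Speech-to-text (stub): a real engine would decode the waveform."""
    return audio["spoken_text"]

def analyse(text):
    """NLP (stub): map the transcribed text to an intent."""
    if "weather" in text.lower():
        return {"intent": "weather_query"}
    return {"intent": "unknown"}

def respond(intent):
    """Generate the reply text for the recognised intent."""
    if intent["intent"] == "weather_query":
        return "It will be sunny in Hamburg."
    return "Sorry, I didn't understand that."

def synthesise(text):
    """Text-to-speech (stub): a real engine would render audio."""
    return {"spoken_text": text}

reply = synthesise(respond(analyse(transcribe({"spoken_text": "What's the weather like?"}))))
print(reply["spoken_text"])
```

In a production system the transcription and analysis stages are exactly the cloud-hosted, machine-learned components discussed later in this article.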

Different classifications

Conversational interfaces are differentiated by whether they use so-called knowledge domains or not. Knowledge domains are digital structures that map knowledge around a given subject area.

1) Conversational interfaces with knowledge domains 

Conversational interfaces with knowledge domains are not just about parsing phrases, but about understanding the actual meaning behind a sentence. These types of interfaces are called smart assistants. Consider a sentence that is simple for us humans: “Reserve two seats at a two-star restaurant in Hamburg!” It is very easy for us to understand, because we know that a restaurant can be awarded ‘stars’, that Hamburg is a city and that you can reserve seats in a restaurant. Without this prior knowledge, however, it is difficult to make sense of the sentence: ‘Two Stars’ could just as well be the name of a specific restaurant, and it would remain unclear what two seats are, how to reserve them, and that a restaurant with certain characteristics is to be searched for in Hamburg. Smart assistants are expected to understand precisely these concepts and therefore require basic knowledge in the respective domains, such as gastronomy, events, weather and travel.

2) Conversational Interfaces without knowledge domains

Conversational interfaces without domain knowledge, such as Alexa, do not have this skill. Instead, they use a different approach. For a possible dialogue, sentence structures are specified during implementation in which variable parts, so-called slots, are defined. The spoken sentence is then analysed and assigned with a sentence structure. Subsequently, the component which generates the response to what has been said is informed of which sentence structure has been recognised by the given variable parts. The fact that this does not require any basic knowledge is clarified by the following sentence: ‘I would like to buy a red shirt’. At this point, the system does not need to know anything about clothes or colours because it just compares the phrase with given phrases related to buying a shirt. For this purpose, it is defined in the interface dialogue model that there is a sentence structure with an ID called, for example, ‘shirt purchase’. It is then subsequently determined that the sentence structure may have the following characteristics: “I want to buy a <colour> shirt”, “I want to buy a shirt in the colour <colour>” and “I want to buy a shirt in <colour>”. In this way, it also defines that there is a variable phrase (slot) named ‘colour’. The desired possibilities for this slot are indicated, e.g. ‘red’, ‘green’ and ‘yellow’. If the user utters the above sentence, the analysis shows that it has the ‘shirt purchase’ sentence structure with the value ‘red’ for the slot ‘colour’. In a correspondingly structured form, a back-end system can already begin to build something with this information.
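The shirt example above can be sketched as a pattern-matching routine: sentence structures containing a `{colour}` slot are compiled into regular expressions and compared against the utterance, with no knowledge of clothes or colours involved. The structure ID, patterns and slot values are all illustrative.

```python
import re

# Illustrative dialogue model: one sentence-structure ID ('shirt_purchase')
# whose patterns contain a variable part (slot) named 'colour'.
SENTENCE_STRUCTURES = {
    "shirt_purchase": [
        "i want to buy a {colour} shirt",
        "i would like to buy a {colour} shirt",
        "i want to buy a shirt in the colour {colour}",
        "i want to buy a shirt in {colour}",
    ],
}
SLOT_VALUES = {"colour": ["red", "green", "yellow"]}

def match_utterance(utterance):
    """Compare the utterance against every sentence structure; on a match,
    return the structure's ID plus the recognised slot values."""
    text = utterance.lower().strip(" .!?")
    for structure_id, patterns in SENTENCE_STRUCTURES.items():
        for pattern in patterns:
            regex = re.escape(pattern)
            for slot, values in SLOT_VALUES.items():
                # Replace the escaped {slot} placeholder with an
                # alternation over the allowed slot values.
                regex = regex.replace(
                    re.escape("{" + slot + "}"),
                    "(?P<{}>{})".format(slot, "|".join(values)),
                )
            match = re.fullmatch(regex, text)
            if match:
                return structure_id, match.groupdict()
    return None

print(match_utterance("I would like to buy a red shirt"))
```

The returned structure ID and slot values are exactly the “correspondingly structured form” that a back-end system can act on; anything outside the predefined patterns simply fails to match.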

The current key stakeholders

Until the release of Amazon’s Echo, aka Alexa, most IT companies had paid little attention to voice technologies. Although Siri launched with a bang, it was perceived more as a helpful tool than as a whole new class of interface. The advantages of hands-free operation on mobile devices, however, could not be dismissed, and today every big player develops its own voice solution. Here is a brief introduction to the current key stakeholders:

Amazon’s Alexa

If you look at the Amazon product range, it is clear that Alexa is a logical development of already existing technologies. The Fire tablets (launched 2013), the Fire Phone (2014) and the first Fire TVs (2014) were already equipped with voice control. However, Alexa’s ‘Voice Interface as a Service’ technology, the ‘Alexa Voice Service’, is still not considered a smart assistant. Instead of analysing the meaning of sentences, they are simply compared against predefined patterns in the background. When asked more complex questions, Alexa quickly reaches its limits. The reason for this is that it only draws on shallow knowledge domains that are not open to developers. In addition, requests to an Echo must be concise and not overly complex in their formulation. For example, films can be searched for using the name of an actor, or restaurants by indicating the area, but it does not get much more complex than this.

Google Assistant

Google Now was originally part of Google Search and available only within web search. It was later spun off and its domain knowledge expanded to make it more competitive with assistants like Apple’s Siri or Samsung’s S Voice. Google Now has since been replaced by Google Assistant. How tightly the various knowledge domains in Google Assistant are interlinked was impressively demonstrated at the Google developer conference with ‘Google Duplex’. As a component of the Assistant, Google Duplex can phone real people to make a hairdresser’s appointment, for example, or even book a table. In doing so, the Assistant not only accesses the user’s calendar, but must also command the appropriate domain knowledge.

Apple’s Siri

The story of Siri is somewhat different. The smart assistant was developed by Siri Inc., a spin-off of the Stanford Research Institute (SRI), and from the outset took the approach of analysing language by means of domain knowledge. Some fifteen years earlier, SRI and numerous partner institutions had collaborated on the CALO (Cognitive Assistant that Learns and Organizes) project, and that experience influenced the development of Siri. Siri was released in the App Store in 2010, and Siri Inc. was promptly bought by Apple. A year later, Apple officially announced that Siri was an integral part of iOS; it has since been rolled out across all of Apple’s platforms. Most recently, the HomePod was released, a smart loudspeaker that reflects the current trend in voice interfaces and is comparable to Amazon’s competing product, the Echo.

Microsoft’s Cortana

Microsoft’s Cortana was first presented to the public at a conference in 2014. Also designed as a smart assistant, Cortana features some interesting adaptations from real life. A human assistant, for example, usually takes notes about their supervisor or client in order to get to know the person better and remember their habits; Cortana uses a virtual notebook for this. When used for the first time, Cortana asks about a few preferences in order to provide personalised answers early on, and this functionality can also be invoked again as needed. The key element behind Cortana is Bing: Bing-based services allow informal queries to be answered with the search engine.

Samsung’s Viv

Samsung, too, has long been trying to establish intelligent software for its devices, which naturally must include a voice interface. In 2016 Samsung bought Viv Labs, the company founded by Siri’s original developers. Viv Labs’ system relies fully on domain knowledge. Unlike its competitors, however, Viv allows external developers to extend its knowledge base into new domains. As a result, the system should become more intelligent and able to understand more and more. Imagine, for example, a whiskey distillery: with the help of its experts, Viv is provided with knowledge about the domain of whiskey and its products. In addition, a barrel maker contributes its knowledge about wooden barrels and their production. Viv’s domain knowledge now includes which wooden barrels influence the taste of which spirits; oak barrels, for example, give whiskey a vanilla flavour. If I now ask Viv what causes the vanilla note of a particular whiskey from said distillery, Viv can answer that this taste is most likely due to ageing in oak barrels. Viv has thus merged the two domains.
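This kind of cross-domain inference can be sketched as chaining facts between two independently contributed knowledge bases. All products and facts below are invented for the whiskey example above; the point is only that neither domain alone can answer the question:

```python
# Two independently contributed knowledge domains (contents invented
# for the whiskey/barrel example; "glenfoo single malt" is hypothetical).
WHISKEY_DOMAIN = {
    "glenfoo single malt": {"aged_in": "oak barrel"},
}
BARREL_DOMAIN = {
    "oak barrel": {"imparts_flavour": "vanilla"},
}

def explain_flavour(product, flavour):
    """Chain facts across the two domains: product -> barrel -> flavour."""
    barrel = WHISKEY_DOMAIN.get(product, {}).get("aged_in")
    if barrel and BARREL_DOMAIN.get(barrel, {}).get("imparts_flavour") == flavour:
        return f"The {flavour} note most likely comes from ageing in an {barrel}."
    return "No explanation found in the available domains."

print(explain_flavour("glenfoo single malt", "vanilla"))
# The vanilla note most likely comes from ageing in an oak barrel.
```

The answer only becomes possible once both domains are present: the distillery’s knowledge links the product to a barrel, and the barrel maker’s knowledge links the barrel to a flavour.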

IBM’s Watson

To clear up a common misunderstanding, IBM’s Watson should also be mentioned here. There is no single ‘artificial intelligence Watson’ that understands everything and continuously accumulates knowledge. Rather, Watson is a collection of various artificial-intelligence tools brought together under a common brand, with which a wide variety of projects can be realised. There are also projects that serve to build up a large knowledge base, but one should not labour under the illusion that every Watson project has access to this knowledge. If you want to implement a project with Watson, you must provide your own data, just as with any other machine-learning toolkit. Among other things, Watson provides tools for transcription (the IBM Speech to Text service) and text analysis (the Natural Language Understanding service); voice-interface projects with Watson build on these two tools.

From analysing the problem to finding the right voice interface

Of course, there are many additional solutions, some of which are very specialised, but which also aim to break through the restrictions of the big players in order to offer more development opportunities. The question naturally arises: why all the different voice interfaces? As with many complex problems, there is no single universal solution. There is no ‘good’ or ‘bad’ interface, only ‘right’ or ‘wrong’ applications for the different technologies. Alexa is not suited to complex sentence structures, but it allows quick implementations and is already widely used. Viv, on the other hand, has not yet been able to establish itself, but has the potential to understand arbitrary, complex sentences.

Selecting the right voice interface therefore means weighing criteria such as the application, its focus, the problem definition, the needs of the target group and how open an interface is for integration into your own projects.

This is the second contribution of a four-part series on the subject of voice interfaces: