At 40 degrees in the shade, Germany’s favourite pastime – watching television – is suddenly becoming irrelevant. A shame, because just as the summer heatwave is setting in, the relationship between TV devices and search engines is being newly configured. Find out why this is, and why special consideration should be given to voice-based search technology when it comes to younger and older target groups, in July’s edition of SEO News.

TV and search engines – two very different siblings on a bonding session

As we swelter our way through the summer of 2019, the global economy is beginning to look distinctly overcast, with world export champion Germany particularly badly hit. These developments are forcing many companies to review their spending on marketing and advertising. Digital advertising channels offer the advantage (at least in theory) of permitting a direct comparison between costs and benefits: the price of a conversion, as well as ROI and ROAS relative to the budget allocated, is generally straightforward to plan and calculate. This is less easy in the case of TV campaigns, however, whose extensive reach cannot be assessed with the same precision.

That’s why it’s in the advertising industry’s interest to start paying closer attention to the interplay between the two channels, in order to explore possible synergies. Although the question of how TV/display devices and search engines impact one another is by no means a new one, the interesting thing is that both organic and paid search are increasingly coming to be seen as the link between a sometimes diffuse TV impact and a company’s actual turnover.

Calls-to-action as a tool for generating higher demand

As New York trade journal Digiday reports, increasing numbers of American companies are beginning to examine their attribution models in order to establish how their TV presence is reflected in organic search requests and in the performance of their paid search campaigns. An increase in search requests for brand terms in particular can be stimulated not only by increased investment in TV and display device campaigns, but also by direct cues to search such as “Just Google XYZ”, or “Search for XYZ”, which can significantly increase search volumes via conventional media channels, the report reveals.

Although this creative approach has been in use for some years in the USA and the UK, in Germany it remains the exception rather than the rule. The setup enables analytics data from searches to be harnessed to optimize cross-media campaign planning throughout the customer journey. The awareness generated by conventional high-reach campaigns can be measured in the form of changes in search volumes, and the cost of paid search conversions can in turn be used to calculate the TV campaign’s ROI/ROAS. Sustained SEO work also enables newly gained organic search volumes to be directed to landing pages with high conversion rates in a targeted way. This makes it possible to ensure an optimal user experience all the way from couch to conversion. As agency Mediaplus has established in a joint study with SevenOne Media and Google, similar advertising effects can also be achieved with the help of Google’s video search engine YouTube. This is why it’s high time that the long-standing competition between marketing siblings TV and search engines was ended, so that tight budgets can be used more effectively and efficiently in times of economic difficulty. After all, family needs to stick together.

Who’s talking to Alexa?

Even in an industry as latently hypereuphoric as ours, the intense hype about the possibilities and blessings of voice search technology has finally given way to sober realism. We’ve pointed out here many times in the past that voice search is little more than an extension of the human-machine interface for search engines, and that the new forms of interaction it enables would most likely be unable to satisfy the high expectations surrounding them. As is now being reported, the expert prediction that by 2020 around 50% of all search requests would be made using voice technology was simply the result of an incorrect interpretation of data from the People’s Republic of China.

Voice search user numbers are also growing independently of this minor market research fail, of course. This is primarily due to the flood of dialogue-capable devices entering the market. US marketing agency Path conducted a global survey to investigate how the new technology is being used by different target groups on different platforms. The study delivered multifaceted results: around 70% of participants reported using voice search on a weekly basis. A quarter use the technology as often as three times a day. When the respondents are divided into age groups, it’s striking that users at the lower (13-18 years) and upper (65+ years) ends of the spectrum in particular report using voice technology on a regular basis.

A glance at the search systems used reveals that the oldest user group communicates most often (approximately 57% of all group respondents) with Amazon’s voice assistant Alexa. Around 28% of respondents in the youngest target group aged between 18 and 22 likewise prefer the Echo/Alexa family produced by the technology giant from Seattle. This suggests that the best way to reach these especially solvent and tech-savvy groups is to employ a combination of conventional voice-based SEO with structured data and product data automation, such as Amazon SEO. Such a combination is something that many agencies on the German market have yet to offer.

Artificial intelligence has many facets. In addition to autonomous driving, media planning and other application areas, it is also a major driver of innovation in the field of voice recognition. But can voice technology offer real added value, and if so, for whom?

Even five years after Amazon released its first Echo device, voice technology is still in its early stages. The glut of new devices, both from Amazon and Google, and the ongoing release of new features and upgrades demonstrate that there is still a long way to go before voice technology reaches its full potential. Having said that, voice recognition technology already makes life easier, or at least more convenient, in many ways. For example, even for the older generation, using speech recognition to switch the light on and off, control the heating, ask about the weather or find the right answer to a Trivial Pursuit question isn’t rocket science. And in certain situations, it even makes more sense to use voice control, for example while driving (at least as long as we still have to drive our cars ourselves), cycling, or for people with disabilities or older people who do not feel confident using other interfaces. Voice control is easy to understand and accessible to everyone on account of its intuitiveness.

Voice can enhance a brand’s personality

Brands communicate with their customers primarily through combinations of text and images. And, of course, every brand is very individual in terms of tonality and imagery. But until now, ‘real’ dialogue could only take place with representatives or brand ambassadors. Voice technology is changing this. Companies have to think carefully about how they present themselves as a brand to the outside world, what answers they want – and are able – to offer, and how they can do this as authentically as possible.

Therefore, voice should always be viewed as an opportunity to broaden and enhance a brand’s personality. In doing so, it is helpful to experiment with different approaches, content formats and sales pitches in order to find your brand’s ‘voice’.

At the Plan.Net Innovation Studio, we have been working with clients from various sectors over the past few years as they take their first steps into the world of voice technology. These sectors range from finance to automotive, retail to travel, and many others.

The first question you should ask yourself with projects like these is: what role should my brand actually play on the market? Do I just want to promote my own products and services, or do I want to try to occupy an entire field? Do I just want to inform, or do I want to give my customers the opportunity to buy something directly?

Whatever you do, the important thing is that you provide added value, and dispense with mere self-promotion.

Voice technology has become integral to our working lives

Voice technology is here to stay. Like the Internet, it has become a permanent feature of day-to-day life. That is why it is essential to examine the relevance of voice technology in every field.

Let’s take web searches as an example. The first three to four search results for a keyword will garner a click from the majority of users. With voice search, only the first result is actually relevant. This leads to even greater competition, and the platforms now answer many requests completely independently. As a result, it is becoming increasingly difficult for third-party content to jockey for position. Moreover, the platforms are still very careful when it comes to the use of advertising. They are – rightly – fearful of squandering the trust of their users. That is why we are currently seeing a renaissance in audio ads, which audio streaming providers place as pre-roll ads, for example, before their content – a tried-and-tested mechanism used by YouTube for years.

In view of the above, it is especially important that colleagues and employees get to grips specifically with voice technology, that they are granted the necessary freedom to do so, and that they are able to experiment in this area. Given that there are still relatively few experts in voice technology, this presents an opportunity for interested colleagues to upskill, and in doing so to create added value for the company in this innovative field.

About the author: Jonas Heitzer has worked as a Creative Coder at Plan.Net Group’s Innovation Studio since 2017. His responsibilities include technical development.

Until the release of Amazon’s Echo, aka Alexa, the big players had paid little attention to voice technologies. Since then, numerous other variants have appeared – but which are the best known, and which voice interface is the most suitable?

Today’s voice interfaces are a combination of two components, namely transcription and natural language processing (NLP). A spoken sentence is transcribed into text. This is analysed using artificial intelligence, based on which a reaction is generated and converted back into speech via speech synthesis (see also part 1).
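This two-stage pipeline can be sketched in a few lines of Python. The transcription stage is stubbed out here, and all function names, intents and responses are invented for illustration – a real system would call an actual speech-to-text service and a far more capable NLP component:

```python
def transcribe(audio):
    """Stage 1: turn analogue speech into text (stubbed for illustration)."""
    return audio["simulated_transcript"]

def understand(text):
    """Stage 2: NLP - derive an intent from the transcript."""
    if "weather" in text.lower():
        return {"intent": "get_weather"}
    return {"intent": "unknown"}

def synthesise(reaction):
    """Stage 3: turn the generated reaction back into a spoken response."""
    responses = {
        "get_weather": "Today it will be sunny.",
        "unknown": "Sorry, I did not understand that.",
    }
    return responses[reaction["intent"]]

def voice_interface(audio):
    # The full pipeline: speech -> text -> meaning -> speech.
    return synthesise(understand(transcribe(audio)))

print(voice_interface({"simulated_transcript": "What is the weather like?"}))
```

The point of the sketch is the separation of concerns: the transcription and analysis components have existed as independent technologies for decades and are only combined in today’s voice interfaces.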

Different classifications

Conversational interfaces are differentiated by whether they use so-called knowledge domains or not. Knowledge domains are digital structures that map knowledge around a given subject area.

1) Conversational interfaces with knowledge domains 

Conversational interfaces with knowledge domains are not just about parsing phrases, but about understanding the actual meaning behind a sentence. These types of interfaces are called smart assistants. Consider a sentence that is simple for us humans: “Reserve two seats at a two-star restaurant in Hamburg!” We know that a restaurant can be awarded stars, that Hamburg is a city and that you can reserve seats in a restaurant. Without this prior knowledge, however, it is difficult to make sense of the sentence: ‘Two Stars’ could just as well be the name of a specific restaurant, it would be unclear what two seats are and how to reserve them, and even the fact that a restaurant with certain characteristics in Hamburg is being searched for would be lost. Smart assistants are expected to understand precisely these concepts and therefore require basic knowledge of the respective domains, such as gastronomy, events, weather and travel.

2) Conversational Interfaces without knowledge domains

Conversational interfaces without domain knowledge, such as Alexa, do not have this skill. Instead, they use a different approach. For a possible dialogue, sentence structures are specified during implementation in which variable parts, so-called slots, are defined. The spoken sentence is then analysed and matched to one of these sentence structures. Subsequently, the component which generates the response is informed of which sentence structure has been recognised and which values filled the variable parts. The fact that this requires no basic knowledge is illustrated by the following sentence: ‘I would like to buy a red shirt’. At this point, the system does not need to know anything about clothes or colours, because it just compares the phrase with given phrases related to buying a shirt. For this purpose, the interface’s dialogue model defines a sentence structure with an ID called, for example, ‘shirt purchase’. It is then determined that this sentence structure may take the following forms: “I want to buy a <colour> shirt”, “I want to buy a shirt in the colour <colour>” and “I want to buy a shirt in <colour>”. In this way, it also defines a variable part (slot) named ‘colour’, along with its permitted values, e.g. ‘red’, ‘green’ and ‘yellow’. If the user utters such a sentence, the analysis shows that it has the ‘shirt purchase’ sentence structure with the value ‘red’ for the slot ‘colour’. In a correspondingly structured form, a back-end system can then work with this information.
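A toy version of this slot-matching approach can be written in a few lines. The dialogue model below follows the article’s ‘shirt purchase’ example; the implementation itself is illustrative and has nothing to do with Alexa’s actual internals:

```python
import re

# The dialogue model: each intent lists fixed sentence structures with
# <slot> placeholders, plus the permitted values for each slot.
DIALOGUE_MODEL = {
    "shirt purchase": {
        "patterns": [
            "i want to buy a <colour> shirt",
            "i want to buy a shirt in the colour <colour>",
            "i want to buy a shirt in <colour>",
        ],
        "slots": {"colour": ["red", "green", "yellow"]},
    }
}

def match(utterance):
    """Match an utterance against the dialogue model; no world knowledge needed."""
    text = utterance.lower().rstrip(".!?")
    for intent, spec in DIALOGUE_MODEL.items():
        for pattern in spec["patterns"]:
            regex = re.escape(pattern)
            # Turn each "<slot>" placeholder into a named regex group
            # restricted to the values defined for that slot.
            for slot, values in spec["slots"].items():
                regex = regex.replace(re.escape(f"<{slot}>"),
                                      f"(?P<{slot}>{'|'.join(values)})")
            m = re.fullmatch(regex, text)
            if m:
                return {"intent": intent, "slots": m.groupdict()}
    return None  # no defined sentence structure fits

print(match("I want to buy a red shirt"))
# {'intent': 'shirt purchase', 'slots': {'colour': 'red'}}
```

Note that a sentence outside the defined structures (or a colour outside the slot’s value list, such as ‘blue’) simply fails to match – the system has no fallback understanding, which is exactly the limitation described above.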

The current key stakeholders

Until the release of Amazon’s Echo, aka Alexa, most IT companies had paid little attention to voice technologies. Although Siri was released with a bang, it was perceived more as a helpful tool than as a whole new class of interfaces. However, the advantages of hands-free features for mobile devices were not to be dismissed, and today every big player is developing its own voice solution. Here is a brief introduction to the current key stakeholders:

Amazon‘s Alexa

If you look at the Amazon product range, it is clear that Alexa is a logical development of already existing technologies. The Fire Tablets (launched 2013), Fire Phone (2014) and first Fire TVs (2014) were already equipped with voice control. However, Alexa’s ‘Voice Interface as a Service’ or ‘Alexa Voice Service’ technology is still not considered a smart assistant. Instead of analysing the meaning of sentences, it simply compares them against predefined patterns in the background. When asked more complex questions, Alexa quickly bails out. The reason for this is that it only handles superficial knowledge domains, which are not open to the developer. In addition, requests expressed to an Echo must be very concise and not overly complex in their formulation. For example, films can be searched for using the name of an actor, or restaurants by indicating the area. However, it does not get much more complex than this.

Google Assistant

Google Now was originally part of Google Search. Later it was spun off and its domain knowledge expanded, making it more competitive with assistants like Apple’s Siri or Samsung’s S Voice. Last year, Google Now was replaced by Google Assistant. The extent to which the various knowledge domains in the Google Assistant are interlinked was impressively demonstrated at the Google Developer Conference with the ‘Google Duplex’ product. As a component of the assistant, Google Duplex can make phone calls to real people and, for example, make hairdresser appointments or even book a table. In doing so, the assistant not only accesses the appointment calendar, but must also have appropriate domain knowledge.

Apple‘s Siri

The story of Siri is a bit different. The smart assistant was developed by the company Siri Inc., a spin-off of the Stanford Research Institute (SRI), and from the outset took the approach of analysing language by means of domain knowledge. Fifteen years ago, SRI collaborated with numerous American universities on the CALO (Cognitive Assistant that Learns and Organizes) project, the experience of which influenced the development of Siri. Siri was released in the App Store in 2010, and Siri Inc. was promptly bought by Apple. A year later, Apple officially announced that Siri would become an integral part of iOS. It has since been rolled out across all platforms. Most recently, the HomePod was released as a smart loudspeaker that reflects the current trend in voice interfaces and is comparable to Amazon’s competing product, Echo.

Microsoft’s Cortana

Microsoft’s Cortana was presented to the public for the first time in 2014 at a conference. Also designed as a smart assistant, Cortana features interesting reality-based adaptations. A real assistant, for example, usually takes notes about their supervisor or client in order to get to know the person better and remember their habits; Cortana uses a virtual notebook for this. When being used for the first time, Cortana asks about a few preferences in order to be able to provide personalised answers at an early stage, and this functionality can also be prompted as needed. The key element of Cortana is Bing: Bing-based services allow you to make informal queries with the search engine.

Samsung’s Viv

Samsung has also been trying to establish intelligent software for its devices for quite some time, which naturally must also include a voice interface. In 2016, Samsung bought Viv Labs, the company founded by Siri’s developers. Viv Labs’ system relies fully on domain knowledge. Unlike its competitors, however, Viv allows external developers to extend its knowledge base into new domains. As a result, the system should become more intelligent and able to understand more and more. For example, imagine a whiskey distillery. With the help of experts, Viv is provided with knowledge about the domain of whiskey and its products. In addition, a cooperage shares all of its knowledge concerning wooden barrels and their production. Viv’s domain knowledge now provides valuable expertise on how wooden barrels influence the taste of certain types of alcohol – for example, oak barrels give whiskey a vanilla flavour. If I now ask Viv what causes the vanilla note of a particular whiskey from said distillery, Viv can answer that this taste is most likely due to oak barrel ageing. Thus, Viv has merged both domains.

IBM’s Watson

To clear up any misunderstandings, IBM Watson should also be mentioned here. There is no ‘Artificial Intelligence Watson’ that understands everything and continuously accumulates knowledge. Instead, Watson is a collection of various artificial intelligence tools brought together under a common concept, which can be used to realise a wide variety of projects. In addition, there are projects that serve to build up a large knowledge base. However, one should not labour under the illusion that every Watson project provides access to this knowledge: if you want to implement a project with Watson, you need to provide your own database – just as with any other machine learning toolkit. Among other features, Watson provides transcription (the IBM® Speech to Text Service) and text analysis (Natural Language Understanding Service) tools. When implementing voice interfaces with Watson, you build on these two tools.

From analysing the problem to finding the right voice interface

Of course, there are many additional solutions, some of which are very specialised, but which also aim to break through the restrictions of the big players in order to offer more development opportunities. Now, the question naturally arises: why all the different voice interfaces? As with many complex problems, there is no single universal solution. There is no ‘good’ or ‘bad’ interface – there are only ‘right’ or ‘wrong’ applications for the different technologies. Alexa is not good for complex sentence structures, but is great for fast conversions and is already widely used. On the other hand, while Viv has not been able to assert itself yet, it has the potential to understand arbitrary and complex sentences.

The selection of the right voice interface therefore involves choosing certain criteria, such as the application, focus, problem definition, needs of the target group and how open an interface is for integration into your own projects.

This is the second contribution of a four-part series on the subject of voice interfaces.

Until 2015, voice interfaces were perceived by most as a nice gimmick limited to smartphones and navigation systems. But with Amazon Echo, this technology entered the living rooms of many consumers around the world virtually overnight. Amazon has held back exact sales figures and other details, but according to news portal Business Insider, 2.4 million Amazon Echos were sold worldwide in 2015 alone, rising to 5.2 million in 2016. As a result, Apple also revamped the previously neglected Siri and, after six years of silence concerning its speech recognition programme, in June 2017 announced a device of its own: the HomePod. Other companies were subsequently forced to follow this trend, even if they were unsure how to handle it.

Back to the roots

At the same time, voice and conversational interfaces are not an entirely new concept. Voice interfaces are essentially conversational interfaces with a special input channel, namely analogue speech. The development stages of the past decades may even be familiar to many market observers. If you look at the technology behind a voice interface today, you will find two different components: one is responsible for transcribing analogue speech into text; the other analyses the text and reacts accordingly, a part carried out by natural language processing and other artificial intelligence (AI) technologies. Both components have existed as separate technologies for a very long time:

1) Transcription

Transcribing simply means transforming spoken text or even sign language into a written form. Corresponding software has been available since 1982, when Dragon Systems launched DragonDictate, a somewhat rudimentary program developed for DOS (x86). Continuous transcription was not yet possible; 15 years later, however, the same company launched Dragon NaturallySpeaking 1.0. The software already understood natural language so well that it was mainly used for computer dictation. However, these early systems had to be heavily voice trained, or the vocabulary used had to be limited in order to improve recognition accuracy. There were therefore prefabricated language packs for professions with highly specialised language, such as lawyers or medical practitioners. Once optimised, these early systems delivered amazingly good results. In addition, Dragon already offered the option to control a Windows system with voice commands.

2) Natural Language Processing

After the speech has been transcribed, the text can be further processed. When considering a technology that can work with natural-sounding input text, and that is also capable of reacting coherently to it, one quickly thinks of chatbots. These are a subclass of autonomous programmes called bots that can carry out certain tasks on their own. Chatbots simulate conversation partners and often act according to topics. Although they have enjoyed increasing popularity in recent years, this is better described as a renaissance: the first chatbot was born 52 years ago, when computer scientist Joseph Weizenbaum developed ‘ELIZA’, which successfully demonstrated the processing of natural language and is today considered the prototype of modern chatbots.
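ELIZA’s core trick, matching a sentence against simple patterns and reflecting fragments of it back at the user, can be sketched in a few lines of Python. The rules below are invented and far cruder than Weizenbaum’s original script:

```python
import re

# Pronoun reflection: "my job" becomes "your job" in the bot's reply.
REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you"}

# Ordered pattern/response rules; the last rule is a catch-all.
RULES = [
    (r"i feel (.*)", "Why do you feel {0}?"),
    (r"i am (.*)", "How long have you been {0}?"),
    (r".*", "Please tell me more."),
]

def reflect(fragment):
    """Swap first-person words for second-person ones."""
    return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())

def respond(sentence):
    """Match the input against the rules and fill the reply template."""
    text = sentence.lower().rstrip(".!?")
    for pattern, template in RULES:
        m = re.fullmatch(pattern, text)
        if m:
            return template.format(*[reflect(g) for g in m.groups()])

print(respond("I feel sad about my job"))
# Why do you feel sad about your job?
```

No knowledge of any kind is involved: the bot recognises sentence shapes, not meanings, which is precisely why test subjects’ emotional reactions to ELIZA were so remarkable.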

3) Artificial Intelligence

The development of ELIZA showed that simple means are sufficient to achieve good results in the Turing test of artificial intelligence (AI), which concerns the subjective evaluation of a conversation. In spite of the bot’s simple mechanisms, test subjects began to build a personal bond with it and even wrote about private matters. The experience gained with this first conversational interface attracted a lot of attention and continuously drove improvements in chatbot technologies.

For example, in 1981, BITNET (Because It’s There NETwork) was launched, a network linking US research and teaching institutions. One component of this network was Bitnet Relay, a chat client that later became Internet Relay Chat (IRC). Over the years, students and nerds developed countless, more or less simple, chatbots for these chat systems, including ICQ. Like ELIZA, they were based on the simple recognition of sentences and not on the evaluation of knowledge.

In 2003, another important development was sparked, paving the way for a new class of chatbots: smart assistants such as Siri. CALO, the ‘Cognitive Assistant that Learns and Organizes’, was a development initiated by the Defense Advanced Research Projects Agency, involving many American universities. The system was intended to help users interact with information more effectively and provide assistance by constantly improving its ability to interpret their wishes correctly. The concept is based on digital knowledge representation: knowledge is captured in a digital system and made usable. Semantic networks allow objects and their capabilities to be mapped in relation to other objects, enabling the smart assistant to understand what a user wants to express with a given utterance. For example, if a customer wants to order a ‘dry wine’ through their smart assistant, then it needs to understand the connection between the terms ‘dry’ and ‘wine’, depending on the context. Only then does it understand that this term refers to a taste sensation and not the absence of fluid.
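The ‘dry wine’ example can be illustrated with a toy semantic network. The structure and vocabulary below are invented purely for illustration and bear no relation to CALO’s actual knowledge representation:

```python
# A tiny semantic network: objects are linked to categories, and an
# ambiguous adjective like "dry" carries a different meaning per category.
SEMANTIC_NETWORK = {
    "wine": {"is_a": "beverage"},
    "towel": {"is_a": "textile"},
    "dry": {"in_context": {
        "beverage": "low residual sugar (a taste sensation)",
        "textile": "absence of moisture",
    }},
}

def interpret(adjective, noun):
    """Resolve what an adjective means when applied to a given noun
    by following the noun's category link through the network."""
    category = SEMANTIC_NETWORK[noun]["is_a"]
    return SEMANTIC_NETWORK[adjective]["in_context"][category]

print(interpret("dry", "wine"))   # low residual sugar (a taste sensation)
print(interpret("dry", "towel"))  # absence of moisture
```

The same word resolves to different meanings depending on what it is related to – exactly the kind of contextual disambiguation that matching-based systems cannot perform.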


The simple recognition and comparison of texts, also called matching, and the intelligent analysis by means of knowledge representation are two different technologies that have evolved independently of each other. With the help of the matching approach, most applications can be implemented with straightforward resources. For more complex queries, however, a Smart Assistant is much better. However, in turn, this technology is more involved in terms of development and implementation because it requires a broad knowledge base.

Currently, the chatbots one usually comes across are based on matching technology and can be trained with the help of machine learning (ML). With this method, the system is given as many text variants as possible for a given statement, which it learns in order to recognise other, similar sentences later on – without needing any special knowledge.
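The idea of learning from phrase variants can be shown with a deliberately crude stand-in: each intent is ‘trained’ with a handful of example phrasings, and a new sentence is assigned to the intent whose examples it overlaps with most. Real systems use proper ML classifiers; the intents and example sentences below are invented:

```python
from collections import Counter

# "Training data": example phrasings for each intent.
TRAINING = {
    "opening_hours": ["when are you open", "what are your opening hours",
                      "until when are you open today"],
    "order_status": ["where is my order", "has my order shipped",
                     "track my order"],
}

def bag(text):
    """Bag-of-words representation of a sentence."""
    return Counter(text.lower().split())

def classify(sentence):
    """Assign the sentence to the intent with the best word overlap."""
    words = bag(sentence)
    best_intent, best_score = None, 0
    for intent, examples in TRAINING.items():
        score = max(sum((bag(e) & words).values()) for e in examples)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent

print(classify("When will my order ship"))  # order_status
```

As with all matching approaches, no knowledge is involved: the system recognises similarity to sentences it has seen, nothing more.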

Today we can choose between two technologies for a conversational interface. Depending on the requirements, one must ask whether a system that compares what has been said with learned sentence structures is sufficient, or whether a system is needed that understands the meaning of what has been said and reacts accordingly.

This is the first contribution of a multi-part series on the subject of voice interfaces.

What voice internet means for the future of digital marketing

The screenless internet: A bold prediction for the future

At the end of 2016, Gartner published a bold prediction: by 2020, 30% of web browsing sessions would be done without a screen. The main driver behind this push into a screenless future would be young and tech-savvy target groups fully embracing digital assistants like Siri and Google Assistant on mobile, Microsoft’s Cortana, and Amazon’s Echo.

While 30% still feels slightly optimistic in mid-2018, the vision of an increasingly screenless internet becomes more realistic every day. The adoption rate of smart speakers three years after launch is outpacing the smartphone adoption rate in the United States. And perhaps most surprisingly, it isn’t only the young early adopter crowd that is behind this success story, but parents and families. Interacting with technology seamlessly and naturally through conversation is making digital services more attractive to a wider range of consumers.

The new ubiquity of voice assistants

And it isn’t only stationary smart speakers that are growing in usage and capability: every major smartphone features its own digital assistant, and consumers can interact with their TVs and cars through voice as well. The major tech players are investing massively in the field, and within the next few years every electronic device we put in our homes, carry with us or wear will be voice-capable.

So, have we finally reached peak mobile and can finally walk the earth with our chins held high again, freed from the chains of our smartphone screens? Well, not so fast.
There’s one issue many digital assistants still face, and let’s be perfectly honest here: despite being labeled “smart” they are still pretty dumb.

Computer speech recognition has reached human-level accuracy through advancements in artificial intelligence and machine learning. But just because the machine now understands us perfectly doesn’t mean it is capable of answering in an equally meaningful way, and a lot of voice apps and services are still severely lacking. Designing better voice services and communicating with consumers is a big challenge, especially in marketing.

Peak mobile and “voice first” as the new mantra for marketing

Ever since the launch of the original iPhone in 2007 and the smartphone boom that followed, “mobile first” has been marketing’s mantra. Transforming every service and touchpoint from a desktop computer to a smaller screen and adapting to an entirely new usage situation on the go was a challenge. And even 10 years later, a lot of companies still struggle with certain aspects of the mobile revolution.

The rising popularity of video advertising on the web certainly helped iron out many issues in terms of classic advertising. After all, a pre-roll ad on a smartphone screen catches at least as much attention as it does in a browser. We figured out how to design apps, websites and shops for mobile, reduced complexity and shifted user experiences towards a new ecosystem. But this mostly worked by taking the visual assets representing our brands and services and making them smaller and touch-capable.

Brand building in a post-screen digital world

With voice, this becomes a whole new struggle. We have to reinvent how brands speak to their consumers. Literally. And this time without the training wheels of established visual assets. At this year’s SXSW, Chris Ferrel of the Richards Group gave a great talk on this topic and one of his slides has been on my mind ever since: The visual web was about how your brand looks. The voice web is about how your brand looks at the world.

In recent decades, radio advertising has mostly been reduced to a push-to-store vehicle: loud, obnoxious, and annoying consumers just long enough that visiting a store on their way home from work became a more attractive prospect than listening to any more radio ads.

On the screenless internet, we could see a renaissance of the long-lost art of audio branding. A lot of podcast advertising is already moving in this direction, although there it is mostly carried by the personalities of the hosts. Turning brands into these kinds of personalities should have priority.

The challenges of voice search and voice commerce

We will also have to look at changing search patterns in voice. Text search tends to be short and precise, mostly one to three words. With voice, search queries become longer and follow a more natural speech pattern, so keyword advertising and SEO will have to adapt.

Voice-enabled commerce poses a few interesting challenges as well. How do you sell a product when your customer can’t see it? This might be less of an issue than initially imagined, though. “Alexa, order me kitchen towels” is pretty straightforward, and Amazon already knows the brand I buy regularly. Utilizing existing customer data and working with the big marketplaces will be key, at least for FMCG brands.

But how do you get into the consumer’s relevant set? And what about sectors like fashion that rely heavily on visual impressions? This is where the tight integration of all marketing touchpoints comes into play: voice as a channel can’t be isolated from all other brand communication. Obviously, voice will not replace all other marketing channels, but it might become the first point of reference for consumers due to its ubiquity and seamless integration into their daily lives. Finding its role in the overall brand strategy will be crucial.

Navigating the twilight zone of technological evolution

What may be the biggest challenge of this brave new world of voice marketing is the fact that our connected world isn’t as connected as we would like it to be. The landscape of voice assistants is heavily fragmented and more importantly, the devices act in very isolated environments. While I can tell my digital assistant to turn on my kitchen lights or fire up my PlayStation when using compatible smart home hubs and devices, an assumedly simple task like “Siri, show me cool summer jackets from H&M on the bedroom TV” isn’t as easily accomplished.

Right now, it is often still up to users to act as the interface between voice assistants and the other gadgets in their living spaces. The screenless internet isn’t the natural endpoint in the evolution of technology; it’s more of an unavoidable consequence of iterative steps in development. For now, we have to navigate through this weird, not fully realized vision of a connected world and hope for technology to catch up and become truly interconnected. So, let’s find the voices of our brands until they regain the capability of also showing us their connected personality.