What if web browsers could talk?
We spend over 6 hours a day online. Over half of that is spent on a personal computer reading webpages through a web browser, meaning that roughly 20% of our waking day is spent within a single application.
The most popular browsers are Chrome, Safari, Internet Explorer, Edge, Firefox and Opera. Behind the scenes, they have powerful engines that render webpages, compile JavaScript, stream digital media and so on.
However, as the user, all you can do is view webpages. And that's kind of it. Sure, you can open tabs, save bookmarks and search the web, but these are all just roundabout ways of viewing more webpages. And it's been that way for over two decades.
What if the browser could do more for you? What if it could talk to you while you're online?
In this post I'm going to introduce a new UX for interacting with information online. A chatbot, built into the web browser, that chats with you while you surf the web.
I'll be looking into questions like:
- What's wrong with the way you browse the web today?
- What is a conversational browser?
- What if my web browser could comment on the webpages I'm looking at?
- What if I could respond to it and have a conversation about what I'm doing online?
- What if it could suggest ways it can help me in that moment?
Here's an example of what this means.
You're shopping for a camera on Amazon. There are lots of things you can do on this webpage: purchase the item, browse pictures, read user reviews, etc. So far this is all very familiar; you can do this in any web browser.
This is where the conversational browser steps in and says:
What's happening here is that the browser understands that you're shopping online and is actively trying to figure out how to help you. It's interacting with you through a conversation so that you can guide it.
In this article, I'm going to dive deeper into this example use case and flesh out how a conversational web browser could work, its interface and the benefits it could bring to everyone who uses the internet.
How many tabs do you have open right now?
It's likely that you don't spend all that much time thinking about your web browser. It already works well enough, so why would you need to consider interacting with the web in a different way? Usually all you need to do online is visit a webpage, read some information and then you're done and on to the next one.
But web browsing can get complicated very quickly, with your simple website visit turning into a journey. The more time you spend online, the more likely you are to think of new questions to ask and new things to do.
This non-linear path-finding can lead to frustrating experiences, such as:
- Too many tabs open
- Copying and pasting from one website to another
- Trying to re-find something you looked at online a couple of days ago
- Re-entering the same information on multiple different websites
- Having to do several searches on Google to find what you're looking for
- Maintaining open tabs with incomplete tasks waiting to be finished
- Having information on your desktop that you need on your phone, and vice versa
Despite the many benefits of tabbed browsing, tabs are also the source of many of these frustrations with the web. Tabs were popularised in the early 2000s by browsers such as Mozilla's as a way of allowing you to browse non-linearly. They allow you to spin up new online enquiries without disrupting existing ones and they enable complex information finding.
However, even with tabbed browsing, you're still restricted to dealing with only one website at a time while the others are put on hold. Plus, tabs introduce a new cognitive burden on you: the switching cost of changing tabs, having to remember what's going on in each tab, the hassle of moving information from one tab into another, and so on. This cognitive load is the cause of many of the frustrations outlined above.
Why it helps to have a browser that can talk to you
Web browsers work through a request-response pattern. You make a request and the browser responds.
The set of possible requests is mostly restricted to "show me this website", "search for this" or "click on this". In each case, the browser's only response is to show a webpage.
Complicated requests like "should I buy this camera?" cannot be answered by your browser (or Alexa or Siri for that matter).
Instead, you have to break that question down into manageable requests that your browser can handle:
With tabbed browsing, you can open up new tabs to set up each of these requests whenever they occur to you. But each must be resolved individually, one at a time. And the questions have to occur to you in the first place. And you must remember to complete them.
A conversational browser is able to make its own requests. In responding to your request to show a website, it can infer that you may have other questions you'd like to ask.
In this way, you never need to ask the complex question "should I buy this camera?". Instead, it can be extrapolated from your behaviour in the browser, i.e. viewing the camera on a website like Amazon.
The browser can then predict what your next request is likely to be (such as searching for alternative prices) so it can immediately get to work in finding responses. Once it has a set of responses, it can make its own request to you to ask what you want to do next.
By making its own requests, a browser can pre-emptively resolve complex information needs and guide your decision-making using only a single tab and a handful of clicks.
This turns your browser into a digital assistant for getting things done online. But it's not trying to replace search engines and webpages the way Siri and Alexa do. Instead, it's trying to replace tabs and manual browsing effort. Because of this, a browser that talks to you is a potential next step in how we interact with the web.
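As a rough sketch of a browser "making its own requests", the idea can be pictured as a lookup from the page you asked to see to the follow-up requests you are likely to make next. Everything in this example is an assumption made for illustration: the `PageView` shape, the page kinds and the follow-up labels are hypothetical and do not correspond to any real browser API.

```typescript
// Hypothetical shapes for illustration; no real browser exposes these.
type PageKind = "product" | "news" | "other";

interface PageView {
  url: string;
  kind: PageKind; // assumed to be classified elsewhere
}

// Follow-up requests the browser could make on the user's behalf,
// keyed by the kind of page being viewed (assumed examples).
const FOLLOW_UPS: Record<PageKind, string[]> = {
  product: ["find alternative prices", "find video reviews"],
  news: ["find social media reactions"],
  other: [],
};

// Given the user's request ("show me this page"), infer the requests
// they are likely to make next so that work can start before they ask.
function inferFollowUps(view: PageView): string[] {
  return FOLLOW_UPS[view.kind].slice(); // copy so callers cannot mutate the table
}
```

Calling `inferFollowUps` with a product page returns the two shopping follow-ups, while an unrecognised page returns an empty list, which is the browser's cue to stay quiet.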
Now, before I go into how this interaction could look and feel, let's start by thinking about what a conversational browser should do.
A conversational web browser is…
Helpful where web browsers currently can't help
A conversational web browser must help the user in some meaningful way. Its purpose is not to be a toy or a game. It's an assistant that is the personification of what the browser is already doing: helping people access information and make decisions online.
The best way to judge if a service provided by the conversational browser is genuinely helpful is to ask, does this service:
- Save time?
- Reduce the number of websites that need to be visited?
- Reduce the number of tabs that need to be open?
- Do something that isn't already possible to do on the website?
- Combine information from several websites?
- Get information from a non-user-friendly data source such as an API?
- Provide browser-only insight such as keeping track of tasks across multiple tabs/days?
- Connect different devices?
Unobtrusive and a light touch
No matter how helpful, an assistant that keeps intruding into your workflow can become annoying very fast.
When interacting with the user, the browser should:
- Only chat to the user when it can genuinely help, otherwise it should stay out of the way
- Use the user's context and actions as much as possible to judge whether they need help or not. Assume they don't.
- Communicate simply, sparingly and to the point.
- Give the user options and let them tailor their experience, but not provide too many options at once.
- Be easy to move around, hide and customise.
- Provide help with only a couple of clicks (ideally a single click).
- Require no spoken language or typed input.
- Be friendly, playful and fun to use.
Flexible to any information need
Web browsers can already render information in thousands of different ways to cater for a countless number of information needs. A conversational browser should also be flexible enough to work on any website and display any kind of information or media. It should easily allow users to follow different information pathways from the same starting point.
Respectful of privacy
A browser that talks to you is inevitably also going to be a web browser that watches what you do online. Where possible, all conversational dialogue processing should occur within the browser, and communication with third parties should be minimised. The user should have visibility and control over the conversational browser's usage of data and storage.
A speculative vision for a conversational web browser
A conversational web browser is, at its simplest, a web browser with a chatbot built into it.
But what is a chatbot built into a web browser? What does it say and how does it work?
Have a play with the demo below to get an idea of what this means. It's interactive, so press the Get Started button to give it a go:
Once you're done with that, let me explain what's going on piece by piece.
Chat thread
The base layer of the conversational browser is the chat thread. It's made up of speech bubbles, timestamps and avatars. New messages scroll from bottom to top. Different colours represent different participants.
The aim is for it to be familiar to anybody who's used any messaging service:
The chat thread is managed by the browser itself. Because of this, it can appear over any website shown in the browser; it does not need the website's permission to appear.
Kind of like the bots built by Intercom, except not restricted to a single website and not only dealing with customer service support.
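A minimal sketch of the data behind such a chat thread might look like the following. The type and field names are assumptions chosen for this example rather than a description of how the demo is built, and the colour values are arbitrary.

```typescript
// Hypothetical data model; names are assumptions for this sketch.
type Participant = "browser" | "user";

interface ChatMessage {
  from: Participant;
  avatarUrl: string; // shown next to the speech bubble
  sentAt: number;    // timestamp in milliseconds since the epoch
  text: string;
}

interface ChatThread {
  messages: ChatMessage[]; // oldest first, so new messages render at the bottom
}

// Returns a new thread with the message appended, leaving the original untouched.
function appendMessage(thread: ChatThread, message: ChatMessage): ChatThread {
  return { messages: thread.messages.concat([message]) };
}

// Speech-bubble colour per participant (colour values are arbitrary).
function bubbleColour(from: Participant): string {
  const colours: Record<Participant, string> = {
    browser: "#e0f0ff",
    user: "#e6ffe6",
  };
  return colours[from];
}
```

Keeping the thread as a plain, ordered list of messages is what lets it sit above any website: nothing in it depends on the page underneath.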
And... there's not much else to say about the chat thread design because it follows standard chat design conventions.
Instead, the main subject of this article will be the dialogue between browser and user wherein lies the core user experience. But before I dive into that, I'll briefly go over the widget.
Widget
Opening and closing the chat thread is achieved by clicking the on-page widget.
The widget is flexible: it can be moved around the page, or hidden by dragging it to the corner of the screen.
The widget only appears whenever the browser has something to say, otherwise it is hidden by default. When the browser has something very important to say, it shows a notification or a speech bubble to get your attention.
You can think of the widget as a lightweight remote control for the denser, richer chat interface.
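The widget's visibility rules can be sketched as a small function over the messages the browser is waiting to show. This is only an illustration: the `important` flag and the three state names are assumptions, since the article does not specify how importance is decided.

```typescript
// Hypothetical widget states for this sketch.
type WidgetState = "hidden" | "visible" | "notifying";

interface PendingMessage {
  text: string;
  important: boolean; // assumed flag set by whatever produced the message
}

// Hidden by default; visible when there is something to say;
// notifying when any pending message is marked important.
function widgetState(pending: PendingMessage[]): WidgetState {
  if (pending.length === 0) return "hidden";
  return pending.some((m) => m.important) ? "notifying" : "visible";
}
```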
Browser Dialogue
A browser that chats to you needs to have something important to say.
Imagine having a friend or assistant sitting next to you while you're online, reading the same webpages you are. Except they're also looking up other things online that may help you and sharing them with you. They're commenting and pointing out things you might have missed. They're thinking three steps ahead of you. And they like to send you funny gifs and jokes.
This is what it should feel like when a browser talks to you. Chatting to the browser should feel like chatting to another person online. The tone of the dialogue should be informative, but interspersed with emojis, gifs and images to make it more conversational.
A great example of this is the financial chatbot app Cleo, which gives genuinely helpful money saving tips while being fun to interact with.
To reinforce the idea that the browser is chatting to you in real time, loading times can be masked by showing an 'is typing' state familiar to conventional chat interfaces.
There are three reasons why a browser may wish to say something to you:
- To show it's understood something that you've done online
- To ask for your input
- To help
The first reason may not seem that obvious, but it is important that the browser informs you of what it thinks is going on so that you can establish the context for the rest of the conversation. As a result, every conversation should start with the browser acknowledging what you're doing online in a way that's relevant to its upcoming dialogue.
This makes it easier for you to understand why the browser needs input from you and how it can help, which I'll explore further in the next sections.
User Dialogue
Conversation requires dialogue from both sides. However, having to use your voice or type messages to your browser would add an unwanted extra workload. So how can we get around that?
In practice, you already use natural language to interact with the web every time you use a search engine. And these searches are easily observed by the browser.
By observing searches and other online interactions, the conversational browser can gather enough input from you to begin the conversation. What this means is that you can contribute your side of the conversation with the browser by simply doing what you already do online. That is, doing searches, reading webpages, clicking links and so on.
These actions are added to the chat thread as your own contributions to the conversation:
The result is that you never need to start a conversation with the browser. You just use the web as normal and the browser reacts to you. This also means that the chat thread captures a handy track record of your activities online, which helps to contextualise messages from the browser.
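To make this concrete, here is a small sketch of how observed actions could be phrased as the user's side of the conversation. The `BrowsingAction` type and the wording of each message are assumptions made for this example; how a real browser would observe these actions is left out.

```typescript
// Hypothetical browsing actions the browser could observe.
type BrowsingAction =
  | { type: "search"; query: string }
  | { type: "visit"; title: string }
  | { type: "click"; label: string };

// Phrase an observed action as if the user had typed it into the chat.
function describeAction(action: BrowsingAction): string {
  if (action.type === "search") return `Searched for "${action.query}"`;
  if (action.type === "visit") return `Visited ${action.title}`;
  return `Clicked "${action.label}"`;
}

// Turn a sequence of actions into the user's contributions to the thread.
function userContributions(actions: BrowsingAction[]): string[] {
  return actions.map(describeAction);
}
```

Because each action becomes one line in the thread, the same list doubles as the track record of what you did online.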
The concept of surfing the web as a dialogue between you and the browser was the subject of my PhD thesis. You can read more about how this affects search engines on my post about Dynamic Information Retrieval.
User Choices
The second reason a browser may want to speak with you is to ask for your input, as it may need your guidance before it can effectively help you.
The flexibility of the chat thread means that many input actions are possible. For instance, here are buttons and dropdown menus:
Clicking on an option lets the browser know what it should do in situations where there may be many ways it can help you. Each choice gets added to the ongoing dialogue as if you typed it.
Your choices trigger browser behaviour and potentially more choices, giving the browser flexibility for working with you to navigate through a complex information need with only a couple of clicks.
Altogether, this means that the way you chat to the browser is entirely through your actions online and your choices within the chat.
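The choice mechanism described above can be sketched as a small dialogue tree, where each option leads to another browser message that may offer further options. The `DialogueNode` structure and the "Browser:" and "You:" prefixes are assumptions for this illustration only.

```typescript
// Hypothetical dialogue node: a browser message plus the options it offers.
interface DialogueNode {
  say: string;
  options: { [label: string]: DialogueNode };
}

interface DialogueState {
  transcript: string[]; // browser and user lines, oldest first
  current: DialogueNode;
}

function startDialogue(root: DialogueNode): DialogueState {
  return { transcript: [`Browser: ${root.say}`], current: root };
}

// Clicking an option adds it to the transcript as if it had been typed,
// then moves to the node it leads to (which may offer more options).
function choose(state: DialogueState, label: string): DialogueState {
  const next = state.current.options[label];
  if (next === undefined) return state; // ignore options that are not on offer
  return {
    transcript: state.transcript.concat([`You: ${label}`, `Browser: ${next.say}`]),
    current: next,
  };
}
```

Each call to `choose` is one click, so a path through the tree measures how many clicks separate the user from the help they want.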
Third Party Dialogue
The main aim of the conversational browser is to help you. The most common way for it to do that is to find information from other websites for you.
Because the chatbot is contained within the browser itself, it is capable of retrieving information from the web on your behalf. It can do this in the background without you needing to intervene.
When information has been found from a third party, it can be added to the chat thread as if the third party were a participant in the conversation. This makes the third party easily identifiable as the source of the content.
The flexibility of the chat means that there is no limit to what can get added. Any media that can be embedded on a website can appear in the chat thread. And as chats progress, more third parties may take part.
This opens up the conversational experience from being a one-to-one with the web browser, to being a group chat with the whole world wide web.
Automation for the web
It's easy to embed interactive media from third parties into the chat thread. This means that over time, the chat thread becomes less about conversation, and more of an interface to useful services online. Similar to how WeChat works in Asia.
You can think of each third party contribution as a tab that doesn't need to be opened, a search that didn't need to be made and a website that didn't need to be visited. The browser is browsing the web for you. It's automation for the web. And all of it controllable from a single webpage with a click.
When the browser is automatically responding to simple online requests, you can focus on the bigger questions instead. Questions like, "should I buy this camera?", without having to expend effort in thinking about how you're going to get all the information you need.
Automation also lets you tap into established and effective online browsing patterns for getting things done. For instance, you may not think to watch a video review when shopping online, but would find it useful to watch one when it's suggested by the browser.
In this way, the browser reduces the mental effort of using the web by being your online guide.
Dialogue Triggers
The conversational browser won't try to chat to you about every little thing you do online. It waits for a trigger before it decides to chat. Triggers are in-browser observations of things you do online, such as:
- Visiting a specific website.
- Visiting a type of website (such as news, shopping etc.).
- Visiting a website containing a certain type of information (e.g. an address, a video etc.).
- Clicking links or buttons on webpages.
Triggers work in a similar way to how Zapier or IFTTT work, except restricted to things that happen in the web browser only. Once a trigger has fired, the chat widget will appear and the browser will begin a dialogue with you. Your choices within the chat, or any subsequent actions taken in the browser, may then fire new triggers.
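A minimal sketch of trigger matching, assuming each thing you do online is reduced to a simple observation record, could look like this. The `BrowserObservation` fields, the trigger ids and the example hostname are all hypothetical, and detecting a site's category or content types is assumed to happen elsewhere.

```typescript
// Hypothetical in-browser observation; not a real extension API.
interface BrowserObservation {
  kind: "visit" | "click";
  hostname: string;
  siteCategory: string;   // e.g. "shopping" or "news", assumed to be classified elsewhere
  contentTypes: string[]; // e.g. ["address", "video"], assumed to be detected elsewhere
}

interface Trigger {
  id: string;
  matches: (o: BrowserObservation) => boolean;
}

// Example triggers mirroring the four kinds listed above.
const TRIGGERS: Trigger[] = [
  { id: "specific-site", matches: (o) => o.kind === "visit" && o.hostname === "www.amazon.com" },
  { id: "site-type", matches: (o) => o.kind === "visit" && o.siteCategory === "news" },
  { id: "content-type", matches: (o) => o.kind === "visit" && o.contentTypes.indexOf("video") !== -1 },
  { id: "click", matches: (o) => o.kind === "click" },
];

// Returns the ids of every trigger fired by an observation.
// An empty result means the browser stays quiet.
function firedTriggers(o: BrowserObservation): string[] {
  return TRIGGERS.filter((t) => t.matches(o)).map((t) => t.id);
}
```

Keeping triggers as plain predicates means new ones can be added without touching the matching loop, much like adding a new rule in Zapier or IFTTT.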
Tasks
Triggers help the browser identify what kind of task you're doing online. For instance, a shopping task will be triggered by a visit to Amazon, whereas a news task will be triggered by reading a news article.
The conversational browser will respond to different tasks in different ways.
For example, during a shopping task the browser may help you by finding alternative prices for the item you're looking at. Or when you're reading a news article, the browser may show you what people are saying about the article's content on social media.
Each conversation between the browser and the user is based around an individual task, and each task will have different triggers.
It's often the case that a task will be completed over several webpages. This means that the conversation will carry over from one website to the next, with each contributing a little more to the chat thread.
The same is true if related websites are opened in different tabs.
If you change from one task to another while browsing the web, then the chat thread displayed by the browser will also switch. You'll only see the conversation relevant to the website that currently has your focus.
This also applies to tabbed browsing, meaning it is possible to have multiple ongoing chat threads running depending on which tab you have open, similar to how you may have different chat threads with different friends online.
The browser is intelligent enough to identify when tasks have changed or have become idle and will start new conversations accordingly.
What this means is that the browser isn't just having one conversation with you, it's running several, contextually relevant lines of dialogue that can help wherever your focus is at that moment in time.
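One way to picture this is a store with one chat thread per task, where the thread on display is simply the one belonging to the focused tab's task. The sketch below assumes each tab has already been labelled with a task; the `TabInfo` and `ThreadStore` names are hypothetical, and detecting idle or changed tasks is left out.

```typescript
// Hypothetical sketch: one chat thread per task, shared across tabs.
interface TabInfo {
  tabId: number;
  task: string; // e.g. "shopping" or "news", assumed to be identified by triggers
}

interface ThreadStore {
  [task: string]: string[]; // messages per task, oldest first
}

// Add a message to the thread for the tab's task, creating the thread if needed.
function postToTask(store: ThreadStore, tab: TabInfo, message: string): void {
  const thread = store[tab.task] || [];
  thread.push(message);
  store[tab.task] = thread;
}

// The thread to display is the one for the task in the focused tab.
function visibleThread(store: ThreadStore, tabs: TabInfo[], focusedTabId: number): string[] {
  const focused = tabs.filter((t) => t.tabId === focusedTabId)[0];
  if (focused === undefined) return [];
  return store[focused.task] || [];
}
```

Two tabs working on the same task share a thread, so switching between them keeps the conversation in place, while switching to a tab with a different task swaps the thread.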
Dialogue History
Over time, new messages get added to each task's chat thread with the browser. This becomes a history of the browser's conversation with you. An interactive timeline of websites you've visited, the interactions you've had and the important information you found.
Built into the conversational browser is a dashboard where these chats are grouped into task-specific threads that are easy to scroll through and search.
The effect is that each chat thread is a smart web history that allows you to finally close those open tabs because you can easily jump back into old tasks and pick up where you left off.
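A simple sketch of such a history, assuming every message is stored with its task and timestamp, might group and search entries like this. The `HistoryEntry` fields are assumptions for the example, and a real dashboard would need persistent storage, which is not shown.

```typescript
// Hypothetical history entry; the fields are assumptions for this sketch.
interface HistoryEntry {
  task: string;
  sentAt: number; // milliseconds since the epoch
  text: string;
}

// Group entries into task-specific threads, each ordered oldest first.
function groupByTask(entries: HistoryEntry[]): { [task: string]: HistoryEntry[] } {
  const grouped: { [task: string]: HistoryEntry[] } = {};
  const sorted = entries.slice().sort((a, b) => a.sentAt - b.sentAt);
  for (const entry of sorted) {
    const thread = grouped[entry.task] || [];
    thread.push(entry);
    grouped[entry.task] = thread;
  }
  return grouped;
}

// Case-insensitive text search across every thread.
function searchHistory(entries: HistoryEntry[], query: string): HistoryEntry[] {
  const needle = query.toLowerCase();
  return entries.filter((e) => e.text.toLowerCase().indexOf(needle) !== -1);
}
```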
Here's the demo again if you'd like to have another go at it now that you've read all about its design:
What's Next?
The design for the conversational browser came about while building a product called Context Scout, which was created by myself and my co-founder Andy O'Harney. After coming through the Entrepreneur First programme, we have spent the past four years trying to build a business around getting browsers to do more to help people online. We've prototyped the technology as a Chrome browser add-on and experimented with different target markets.
Over that time, I've written about the limitations of search engines and how having a smart browser is like having a GPS for the web. Our former lead designer Emily Sappington also wrote about contextual AI for the browser and beyond.
The biggest challenge of productising the conversational browser is not so much the design, but building a coherent product message around a tool that has potentially thousands of use cases. The concept is intentionally generic, which makes it hard to identify with users about their particular online problems.
Further to this, there are flaws in this design that still need to be addressed:
- It's easy to overwhelm the user with too many options.
- Important functionality can be buried under too many conversation branches.
- Finding the right balance between showing information the user needs but not overwhelming them.
- Not being annoying.
Personalising the experience would be a way to minimise these problems. Given enough time and data, the browser could learn the types of task the user likes to do online, and how they like to do them. With this, it could automate most of the interactions needed to get things done.
And given enough users, the conversational browser could learn about new tasks and workflows by observing the collective effort of how people do things online.
Mobile Devices
48% of internet access now comes through mobile devices. A question I often get asked is whether this technology is suitable for mobile and, if not, whether that means it'll soon be irrelevant.
Mobile web usage is very different to desktop usage. Research shows that information needs on mobile tend to be localised and related to answering specific questions. Further, screen real estate is scarce and the internet is usually accessed through siloed, targeted apps, e.g. shopping on the Amazon app, reading news on a news app etc.
The conversational browser works better with complex information needs, helping with scenarios where you're likely to open multiple tabs. It's these types of information needs that we tackle during the 3.5 hours per day we spend accessing the internet on our desktops.
Nonetheless, there is potential for the conversational browser to work on mobile, in tandem with desktop internet usage. Instead of embedding the chat thread within the browser itself, it could be displayed through an app on your mobile device.
In this way, chatting to the browser on your phone while you're at your desk will be no different to chatting to your friends on your phone.
The browser's messages and notifications could appear on your mobile while you browse the web on your desktop. This would open up the possibility for cross-device services: services that are triggered by actions on your desktop but lead to apps being opened and used on your mobile.
Over time and with enough traction, the conversational browser could start as a browser plug-in, move over to a complementary mobile app and then transition back onto the desktop as its own app, making it a cross-device, fully-fledged conversational web browser.
Conclusion
I've spent years thinking about the future of the web and how our interface to it could be improved. Web browsing technology still feels the same as it did 10 or 20 years ago and I strongly believe that disruption is on its way.
I wrote this article to share what we've learned so far and suggest one potential future interface for the web. Nonetheless, the conversational browser is more than just an interface. Programming it to deal with potentially thousands of tasks online with branching dialogue options requires a unique information architecture which I will talk about in a future blog post.
These ideas are not solely my own. Huge thanks to Andrew O'Harney, Stefano Bragaglia and Emily Sappington who contributed. Also to Sarah Thickett and Tim Hanson who kindly provided comments on this article.