Mozilla’s vision for the evolution of the Web
March 23, 2022
Mozilla's mission is to ensure that the Internet is a global public resource, open and accessible to all. We believe in an Internet that puts people first, where individuals can shape their own experience and are empowered, safe, and independent.
The Internet itself is low-level infrastructure — a connective backbone upon which other things are built. It’s essential that this backbone remains healthy, but it’s also not enough. People don’t experience the Internet directly. Rather, they experience it through the technology, products, and ecosystems built on top of it. The most important such system is the Web, which is by far the largest open communication system ever built.
This document describes our vision for the Web and how we intend to pursue that vision. We don’t have all the answers today, and we expect this vision to evolve over time as we identify new challenges and opportunities. We welcome collaboration — both in realizing this vision, and in expanding it in service of our mission.
While this document focuses on technical issues, we are well aware that many of the problems with the Web cannot be addressed solely by technology. Rather, technology must work hand-in-hand with social and policy changes to produce the Internet we want.
Our Values for the Web
We start by examining what makes the Web special. Here we identify a few key values, which guide our thinking. The Web in practice doesn’t always achieve these values, but we believe they reflect the Web at its best.
Everyone can access the Web, and use it to reach others.
A key strength of the Web is that there are minimal barriers to entry for both users and publishers. This differs from many other systems such as the telephone or television networks which limit full participation to large entities, inevitably resulting in a system that serves their interests rather than the needs of everyone. (Note: in this document "publishers" refers to entities who publish directly to users, as opposed to those who publish through a mediated platform.)
One key property that enables this is interoperability based on common standards; any endpoint which conforms to these standards is automatically part of the Web, and the standards themselves aim to avoid assumptions about the underlying hardware or software that might restrict where they can be deployed. This means that no single party decides which form-factors, devices, operating systems, and browsers may access the Web. It gives people more choices, and thus more avenues to overcome personal obstacles to access. Choices in assistive technology, localization, form-factor, and price, combined with thoughtful design of the standards themselves, all permit a wildly diverse group of people to reach the same Web.
Similarly, the technical architecture of the Web reduces gatekeeping on publishing. The URL acts as a global, low-friction, distributed namespace, allowing individuals to publish on their own terms, with comparatively little interference from centralized actors seeking to extract payment or exercise editorial control.
The installation-free nature of the Web also makes the default experience frictionless. Individuals can seamlessly browse between sites without obstruction or commitment, which empowers them to explore, participate, and choose when to form deeper relationships.
The Web is also durable. While not absolute, backwards-compatibility is a key principle of the Web: as new capabilities are added (typically as extensions to the core architecture), existing sites are still usable. The end result is that individuals can easily view older content and publishers can keep their content available and consistent over time without needing to recreate it.
All of these factors have worked together over time to give the Web incredible reach. The Web is not just a technology, but an enormous and thriving ecosystem that real people understand and use. Individuals can hear from a broad set of voices, and speak to a large audience. There is staggering human effort invested in the Web we have today, and it deserves thoughtful stewardship.
Once individuals reach the Web, they are empowered to accomplish their goals effectively and on their own terms.
The Web is versatile and expressive. Site authors have a wide range of tools at their disposal, enabling everything from simple information sharing to rich interactive experiences. Additionally, because websites are generally built by composing multiple subcomponents, authors can easily create powerful sites by building upon pre-existing work and services.
The deep flexibility and control afforded to authors also makes the Web open-ended. Unlike, say, cable television, the Web is routinely used in novel ways that platform operators never anticipated. This loose and unbounded nature can result in inconsistency across sites, but it also equips the Web with unique strength to serve a wide range of people and purposes.
Agency is not just for site authors, but also for individual users. The Web achieves this by offering people control. While other modalities aim to offer people choice — one can select from a menu of options such as channels on television or apps in an app store — the terms of each offering are mostly non-negotiable. Choice is good, but it’s not enough. Humans have diverse needs, and total reliance on providers to anticipate those needs is often inadequate.
The Web is different: because the basic design of the Web is intended to convey semantically meaningful information (rather than just an opaque stream of audio and video), users have a choice about how to interpret that information. If someone struggles with the color contrast or typography on a site, they can change it, or view it in Reader Mode. If someone chooses to browse the Web with assistive technology or an unusual form factor, they need not ask the site’s permission. If someone wants to block trackers, they can do that. And if they want to remix and reinterpret the content in more sophisticated ways, they can do that too.
All of this is possible because people have a user agent — the browser — which acts on their behalf. A user agent need not merely display the content the site provides, but can also shape the way it is displayed to better represent the user's interests. This can come in the form of controls allowing users to customize their experience, but also in the default settings for those controls. The result is a balance that offers unprecedented agency across constituencies: site authors have wide latitude in constructing the default experience, but individuals have the final say in how the site is interpreted. And because the Web is based on open standards, if users aren’t satisfied with one user agent, they can switch to another.
The experience of using the Web must not subject individuals to harm.
The core promise of the Web is that people may browse anywhere on the Web without unintuitive harmful consequences. It is not the individual’s responsibility to ascertain whether visiting a site will incur a bill, violate their privacy, sign them up for something, or infect their device with malware. This does not mean that it’s the browser’s job to prevent the user from seeing any content they might find objectionable – though of course users should be able to extend the browser to filter out content if that’s what they wish. Rather, the browser must protect the user from invisible harm, and if they don’t like a site — or something they see on a site — they can dismiss it with a single click.
This expectation is much stronger than with most native (i.e., downloadable rather than Web) software platforms. Traditionally, these platforms have not attempted to restrict the behavior of downloaded programs and instead required individuals to trust the author and install at their own risk. Newer platforms offer some limited technical protections, but struggle to convey the implications in a way that people can understand. As a consequence, such platforms largely lean on curated app stores and package signing to manage abuse. On the Web, no such mechanisms are needed because safety comes first: the browser is designed to safely render any page, so users can freely browse the Web without relying on someone curating the set of “safe” pages. This allows the Web to have very low barriers to entry and minimal central gatekeeping.
Safety is a promise, and maintaining it engenders trust and confidence in the Web. Trust in turn enables casual browsing, empowering individuals with unprecedented freedom to explore, and eliminating the need for gatekeepers to decide what may or may not be part of the Web. Gatekeepers can and do emerge for other reasons — but since they are not core to the design, we can work to reduce centralization without risking user safety.
Pursuing These Values
What do these values mean in practice? The Web is a mature technology, and so we don’t expect the Web in the foreseeable future to be radically different from the Web today. If we succeed, it can be the same Web at its core, only better:
Users will be able to surf without fear, knowing that they are safe not only from the sites they visit, but from attackers on the network stealing their data, as well as from being tracked around the Web.
Site authors will be able to build a broad range of content more easily than today. The Web will be the platform of choice for teams of all sizes and skill levels, making it simple to build smooth and beautiful sites and offering world-class capabilities for complex applications.
The interests of users and sites will be balanced, rather than tilted towards sites as they are today, with users able to experience the Web on their own terms.
Users will have their choice of what content they experience and who they can communicate with without being at the mercy of a few large sites. Small and medium-sized site authors will be able to succeed in reaching users without the permission of large players.
The Web will be accessible to many users who are currently shut out for economic or technical reasons.
In other words, we aim to fulfill the original ideal of the Web and the Internet as they should have been: a global resource, open and accessible to all.
The remainder of this section describes a number of specific technical areas of focus that are essential to fulfilling this vision.
Everyone’s activity on the Web should be private by default. Unfortunately, people are being spied on everywhere they go. We view this as a threat to everyone’s individual safety and to societal health, and are working through technology and policy to systematically identify and eliminate all mechanisms for cataloging individuals’ activity across sites.
The most widespread surveillance technique today is cross-site tracking, in which advertising networks leverage their presence on different sites to build detailed behavioral profiles of individuals based on their activity across the Web. While browsers have begun deploying anti-tracking measures (with Safari frequently leading the way), sites are also finding increasingly creative ways to track people using browser storage, navigation, primary identifiers, fingerprinting, or side-channel attacks. People rarely expect or meaningfully consent to this level of surveillance, so our ultimate objective is to eliminate cross-site tracking on the Web. We recognize that this may reduce the ability of many sites to monetize user visits via individually-targeted online advertising, but we consider this to be an acceptable tradeoff, and we expect profile-based ads to become less appealing because of privacy regulations and recent advances in contextual advertising.
Data Collection by Network Providers
Even with the current generation of largely encrypted network protocols, Internet Service Providers (ISPs) and Mobile Network Operators (MNOs) have significant visibility into user activity, both longitudinally (most people only send traffic through one or two providers) and horizontally (most traffic is carried by one of a few large providers). In some countries there are few barriers against misusing that information. As with cross-site tracking, our objective is to minimize the amount of data leaked to service providers, which means systematically closing a series of information leaks, with the three biggest being DNS, TLS Server Name Indication (SNI), and the IP address of the destination website.
We are fairly far along the path of protecting DNS traffic using a combination of technology and policy, particularly in the United States and Canada (with work underway for other jurisdictions). First, we are deploying DNS-over-HTTPS by default so that only a single party sees the target domain. Second, we require the DNS endpoints in Mozilla's Trusted Recursive Resolver program to legally commit to gathering a limited subset of data only for operational purposes, keeping the data confidential and deleting it promptly. Ideally the network provider agrees to these terms, but if they don’t, we can direct the query to a party who does. Long-term we also aim to augment these legal guarantees with technical mechanisms like oblivious DoH.
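As a rough illustration of what a DNS-over-HTTPS lookup carries, the sketch below encodes a DNS query in the GET form described in RFC 8484: the ordinary DNS wire-format message is base64url-encoded and sent to the resolver inside an HTTPS request, so on-path observers see only a connection to the resolver. The resolver URL and the function names are hypothetical, chosen for illustration. This is not Firefox's implementation, and the code only builds the URL without contacting any server.

```python
import base64
import struct


def build_dns_query(hostname: str, qtype: int = 1) -> bytes:
    """Encode a minimal DNS query in wire format (qtype 1 = A record)."""
    # Header: ID=0 (RFC 8484 suggests 0 to help HTTP caching), flags=0x0100
    # (recursion desired), one question, no other records.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # The name is a sequence of length-prefixed labels ending in a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # class 1 = IN
    return header + question


def doh_get_url(resolver_url: str, hostname: str) -> str:
    """Build the URL for a DoH GET request (base64url, padding stripped)."""
    wire = build_dns_query(hostname)
    encoded = base64.urlsafe_b64encode(wire).rstrip(b"=").decode("ascii")
    return f"{resolver_url}?dns={encoded}"


# Hypothetical resolver endpoint, for illustration only.
print(doh_get_url("https://dnsserver.example.net/dns-query", "www.example.com"))
```

The target domain appears only inside the encrypted HTTPS exchange with the resolver, which is why the choice of resolver, and the legal commitments it makes, matters so much.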
Protecting the SNI field is more difficult but we are actively working on Encrypted Client Hello, which will help to some degree, leaving the server’s IP address as the primary leak. While technologies such as VPNs and encrypted proxies can help prevent this attack, they are not currently practical for most ordinary uses. Providing complete privacy from ISP data collection looks to be a difficult problem and may ultimately require some combination of policy and technology. Fortunately we are seeing increasing progress in the policy domain, with more and more jurisdictions looking to limit the capacity of network providers to process and retain user data.
Protecting Browser Data
Your browsing history is just that, yours. However, many services we offer to assist Firefox users, such as sharing your browsing history and bookmarks between devices, work better with a service in the cloud. Our objective is to provide those services in a way that keeps your data secure and private, even from us. For example, Firefox Sync works by storing data on our servers so that you can retrieve it from one device even if another device is offline or destroyed. However, before storing the data, we encrypt it with a user-held key, thus preventing us from seeing the contents.
A more challenging case is remote measurements: most browsers report data back to the browser maker to help them improve their product and evolve the Web. It’s easy to slide down the slope of vacuuming up more and more data, and we work hard to not do that with Firefox. However, there are occasionally specific pieces of data — for instance pages where users are experiencing problems — which are simultaneously very useful for improving the browser and would also reveal the user’s private information. Privacy always comes first for Mozilla, but an ever-present trade-off between privacy and product quality is not a healthy state for the industry. So we’re enthusiastic to advance Privacy Preserving Measurement technologies like Prio to enable browsers and other software products to provide strong privacy guarantees without putting themselves at a disadvantage.
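One building block behind Prio-style measurement is additive secret sharing: each client splits its value into random-looking shares, sends one share to each of several non-colluding servers, and only the combined server totals reveal the aggregate. The sketch below shows that arithmetic only. It is a simplification, not the Prio protocol itself, which also has clients prove that their submissions are well-formed. The modulus, the function names, and the two-server setup are assumptions made for illustration.

```python
import secrets

# Illustrative prime modulus (an assumption, not a Prio parameter).
MODULUS = 2**61 - 1


def split_into_shares(value: int, num_servers: int = 2) -> list[int]:
    """Split a value into shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(num_servers - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def aggregate_total(client_values: list[int], num_servers: int = 2) -> int:
    """Simulate servers that each see only one share per client."""
    inboxes: list[list[int]] = [[] for _ in range(num_servers)]
    for value in client_values:
        for inbox, share in zip(inboxes, split_into_shares(value, num_servers)):
            inbox.append(share)
    # Each server publishes only the sum of the shares it received.
    server_sums = [sum(inbox) % MODULUS for inbox in inboxes]
    return sum(server_sums) % MODULUS


# Five clients each report 1 ("saw the problem") or 0 ("did not").
print(aggregate_total([1, 0, 1, 1, 0]))
```

No single server learns any individual report, since each share on its own is uniformly random, yet the combined total is exact.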
A Web that kept all history confidential would be a vast improvement over the Web we have today. However, certain publishers and platforms, particularly those of tech giants with many widely-used services, still know far too much about too many people, which puts safety at risk and entrenches incumbents. Unfortunately, such entities are unlikely to change this situation on their own: data is a competitive asset, and the easiest and most flexible approach is to record as much as possible and store it in unencrypted form. There are emerging techniques that promise to allow sites to provide a good experience without compromising on user privacy, but they are unlikely to see wide adoption without a significant restructuring of sites' incentives. This is a challenging problem to which we don’t have all the answers, but we will eventually need to find a solution if we are to protect people’s privacy.
The Web’s core promise of safety should mean that sites must never be able to compromise someone’s device. However, as a practical matter, this has not turned out to be true. The Web has an enormous API surface and therefore an enormous attack surface. Web browsers are primarily written with languages (C/C++) and techniques that have proven to be extremely difficult to use correctly, and that fail catastrophically when bugs inevitably occur. Nearly every release of any major browser includes fixes for remotely exploitable security vulnerabilities. Simply put, browsers are not living up to the guarantee that it is safe to browse. If the Web is to succeed, we need to fulfill that promise.
Unfortunately, all major browsers contain large amounts of C and C++ code, and rewriting all of it would require impractical levels of resourcing and introduce an unacceptable volume of new defects. The standard defense — pioneered by Chrome — is to isolate pieces of the browser in their own “sandboxed” processes, thus limiting the effect of compromise. This is an important technique but one which is running into a number of limitations, principally because of the resource cost of the sandbox and the engineering cost of separating out each individual component. It seems likely that the level of sandboxing implemented by Chrome, with each site in its own process, is close to the limit of practicality. Fortunately, new lightweight in-process sandboxing techniques are now available, making it possible to cheaply sandbox individual components. We believe these will allow us to protect existing C/C++ code with minimal resource consumption and effort.
Unlike many of the technologies described in this document, most of the work in this area is invisible to users. In a few cases some changes to the Web Platform are required, such as the COOP and COEP countermeasures to timing attacks like Spectre. But in general, what we want is for browsers to just get invisibly more secure. While this requires less formal coordination than changes to the observable Web Platform, it’s such a hard problem that browsers informally cooperate to share techniques and knowledge, with the result being a more secure Web.
The Web began as a research project in 1989 and like nearly all systems of that era it was unencrypted. Today’s Internet is a hostile place for unencrypted traffic and we need to ensure the confidentiality, integrity, and authenticity of every bit that is transmitted or received.
The Web also makes use of a number of other protocols beyond HTTP. We should secure new protocols from the ground up, as we did with QUIC, WebPush, and WebRTC. Furthermore, we should look for opportunities to introduce end-to-end encryption at the application layer with protocols like Messaging Layer Security. For established unencrypted protocols like DNS, we need to follow the path of HTTP by defining encrypted versions like DNS-over-HTTPS and then gradually transitioning the ecosystem to full encryption.
At the same time as technology is enabling ubiquitous encryption, we also see activity by governments to weaken encryption. We believe that this represents a threat to the security and privacy of the Internet and that governments should work to strengthen user security, not weaken it.
Safety for New Capabilities
Safety concerns also come into play whenever we consider adding new capabilities to the Web Platform. There are many advantages to publishing on the Web, but the Web also limits what sites can do. This has resulted in ongoing efforts to expand the variety of content that can be delivered via the browser by adding new capabilities, such as access to the camera or microphone for WebRTC interactions, running fast compiled code written in any language with WebAssembly, and more immersive apps with the Fullscreen API. For the most part, this is a good thing: people can access an ever-wider set of experiences with all the benefits that the Web brings. However, new capabilities can introduce risks which must be managed carefully.
Ideally, new capabilities can be designed so that they can be safely exposed to any site without requesting the user’s permission. Often this takes some care and results in a feature which is somewhat different from the equivalent functionality available to native applications. For example, it is not safe to allow Web content to open up an arbitrary TCP socket to any server on the Internet because that capability could be used to bypass corporate firewalls. Instead, WebSockets and WebRTC both implement mechanisms which ensure that the website can only talk to servers which have consented to communicate with them. This allows us to make those APIs universally available without having to explain the risks to the user.
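For WebSockets, the consent mechanism can be illustrated with the opening handshake defined in RFC 6455: the client sends a random key, and only a server that actually implements the WebSocket protocol will reply with the matching Sec-WebSocket-Accept value. A service that does not speak WebSocket, such as a database behind a firewall, cannot produce it, so the connection is refused. The sketch below shows only that check. The function names are hypothetical, and WebRTC uses a different consent mechanism that is not shown here.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value a consenting server returns."""
    material = (sec_websocket_key + WEBSOCKET_GUID).encode("ascii")
    return base64.b64encode(hashlib.sha1(material).digest()).decode("ascii")


def server_consented(sent_key: str, response_headers: dict[str, str]) -> bool:
    """Return True only if the response proves the server speaks WebSocket."""
    headers = {name.lower(): value.strip() for name, value in response_headers.items()}
    return headers.get("sec-websocket-accept") == expected_accept(sent_key)


# Sample key taken from the example in RFC 6455.
key = "dGhlIHNhbXBsZSBub25jZQ=="
print(expected_accept(key))
print(server_consented(key, {"Sec-WebSocket-Accept": expected_accept(key)}))
print(server_consented(key, {"Content-Type": "text/html"}))
```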
However, other capabilities — like camera access — involve inherent risks which are difficult to sidestep. In these cases, we cannot safely expose the capability by default. In some circumstances, it may be appropriate to offer people the choice to enable it for a given site. We use the following criteria to evaluate whether that is the case:
Value: Does the capability deliver sufficient benefit to the user to warrant the risk?
Severity: What level of harm can occur if the capability is misused?
Consent: Can individuals make an informed decision about the risk?
Transparency: Is it clear how the capability is being used, and by whom?
Revocability: What happens if someone changes their mind?
For camera access, the consequences of misuse are serious but they’re also easy to understand: the site can see whatever the camera is pointed at and users have the ability to cover or reorient their camera if they are concerned about covert usage by sites to which they have granted permission. Additionally, revoking camera access has clear implications and is easy to verify, while the value of teleconferencing on the Web is substantial. Combined with a variety of technical mitigations, we concluded that it was appropriate to offer camera access on the Web.
Permission prompts alone cannot make every capability safe, however, and we are wary of subjecting people to a barrage of inscrutable and high-stakes choices, along with the resulting decision fatigue. Some features require careful consideration of consequences, which is directly at odds with the goal of casual browsing. Moreover, individuals are often ill-equipped to understand the risks and consequences. In practice, evidence suggests that many people just accept whatever prompts they’re offered — particularly as they become more common. If this results in surprise and regret, individuals will lose trust in the Web.
For example, we think the proposed WebUSB API is not good for the Web. WebUSB enables low-level communication between a website and any device plugged into someone’s computer. Most USB devices are not hardened against adversarial input, and connecting them to untrusted sources could enable credential theft, malware injection, surveillance, or even physical harm. These risks are opaque to most people, and the protocol operates quickly and silently without any standard indication of whether and how it is being used. And while WebUSB offers real benefits for certain specialized tasks like updating firmware, those tasks are generally not things most people do on a daily basis. For this reason we concluded that it was not safe to offer the WebUSB API. By contrast, higher-level APIs like WebAuthn allow for the use of USB tokens for specific applications we have determined to be safe.
As a general principle, we are enthusiastic about bringing more content and applications to the Web. But certain applications may just not be suitable for the Web at large, and that’s OK. In some cases, we may be able to resolve this tension by allowing users to extend their browser to provide elevated capabilities to specific sites they trust (see Extensibility). But at the end of the day, our mission does not require us to move every last non-browser application into the browser. It does, however, require us to keep people safe on the Web.
People often discuss software performance in terms of abstract throughput (e.g., "twice as fast"), but what mostly matters on the Web is responsiveness, which users experience as the absence of friction. Site authors conceive experiences as instant and smooth, but delays and stutters arise as practical defects which cause the result to fall short of what was promised. The consequences of these defects fall on users, who experience frustration, cognitive overhead, and inability to accomplish their goals. This friction also reflects poorly on the site and impedes its objectives. Everyone pays a cost and nobody benefits.
We can't make every operation instantaneous. But fortunately, there are well-understood time budgets within which humans experience interactions as frustration-free. Roughly speaking, these boil down to 16 milliseconds for animation frames, 100 milliseconds for input feedback, and 1000 milliseconds for navigation. We want every user to consistently experience sites within these budgets so that the Web serves people's needs without making them miserable.
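To make the budgets concrete, the following sketch checks a set of measured durations against the three thresholds named above and reports the fraction of interactions of each kind that stayed within budget. The category names, the sample data, and the function name are hypothetical; how the durations would be collected is left out.

```python
# Budgets from the text, in milliseconds.
BUDGETS_MS = {
    "animation_frame": 16,
    "input_feedback": 100,
    "navigation": 1000,
}


def budget_report(samples: list[tuple[str, float]]) -> dict[str, float]:
    """Return, per interaction kind, the fraction of samples within budget."""
    totals: dict[str, int] = {}
    within: dict[str, int] = {}
    for kind, duration_ms in samples:
        totals[kind] = totals.get(kind, 0) + 1
        within[kind] = within.get(kind, 0) + int(duration_ms <= BUDGETS_MS[kind])
    return {kind: within[kind] / totals[kind] for kind in totals}


# Hypothetical measurements: (kind, duration in milliseconds).
samples = [
    ("animation_frame", 12.0),
    ("animation_frame", 33.0),  # a dropped frame
    ("input_feedback", 80.0),
    ("navigation", 800.0),
]
print(budget_report(samples))
```

Reporting a fraction rather than an average reflects the goal stated above: users should consistently land within budget, and an average can hide the stutters that people actually notice.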
Unfortunately, we're nowhere close to that goal — and despite considerable advancements in hardware and software, we've made very little progress towards achieving it. Simply put, people are building much more demanding sites than they were before, primarily in order to create richer experiences with less effort. As with encryption, we know by now that asking site authors to make sacrifices to achieve a responsive experience will not achieve the outcome we want. We need to offer authors the capabilities to build the experience they want in a performant way, and ensure that the easiest way to build something is also the fastest. And since performance is only as strong as its weakest link, we need to systematically apply this thinking to every layer of the stack.
Under the Hood
Second, we need to identify and eliminate performance cliffs. Site authors often struggle to obtain good performance in the Web Platform, and small, seemingly innocuous changes can make a fast page inexplicably slow. Delivering speed with consistency reduces hiccups for users and empowers authors to create and iterate without fear. For example, it's significantly more expensive to perform text layout for bidirectional languages like Arabic and Hebrew than for unidirectional languages. Firefox previously conditioned this extra computation on the presence of any bidirectional text in the document, but this meant that including a single word from such a language — by, for example, linking to a translation — would make the page substantially slower even if the vast majority of the text was unidirectional. Firefox now makes this decision at a more-granular level, but there's still work to do to eliminate the overhead completely.
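The difference between a document-wide decision and a more granular one can be sketched as follows. The code uses Unicode bidirectional character classes to decide, per paragraph, whether the more expensive bidirectional layout path is needed. This is a simplified illustration, not Firefox's actual logic: the set of classes checked, the paragraph granularity, and the function names are all assumptions.

```python
import unicodedata

# Bidi classes that indicate right-to-left content (a simplified set).
RTL_CLASSES = {"R", "AL", "AN", "RLE", "RLO", "RLI"}


def needs_bidi(text: str) -> bool:
    """Return True if the text contains any right-to-left character."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


def document_level_decision(paragraphs: list[str]) -> list[bool]:
    """Coarse approach: one RTL word makes every paragraph take the slow path."""
    slow = any(needs_bidi(p) for p in paragraphs)
    return [slow for _ in paragraphs]


def paragraph_level_decision(paragraphs: list[str]) -> list[bool]:
    """Granular approach: only paragraphs with RTL text take the slow path."""
    return [needs_bidi(p) for p in paragraphs]


paragraphs = [
    "A long English paragraph.",
    "Translation: \u05e2\u05d1\u05e8\u05d9\u05ea",  # link text in Hebrew
    "Another long English paragraph.",
]
print(document_level_decision(paragraphs))
print(paragraph_level_decision(paragraphs))
```

With the coarse check, a single translated link pushes all three paragraphs onto the slow path; with the granular check, only the paragraph containing it pays the cost.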
Finally, we’ve seen numerous optimizations to individual browser subsystems, but insufficient focus on the big picture of how these systems operate together. For example, scheduling the same work in a smarter order can have a much larger impact on the experience than an incremental reduction in total computation. Similarly, better cache management to improve the reuse of high-value resources can avoid computation and fetches altogether. We see significant opportunities to improve Firefox performance with these holistic and cross-cutting approaches, and will pursue them going forward.
There are limits to what browsers can optimize without the site’s help, and there are limits to the kinds of experiences sites can build without the right abstractions. This often necessitates new additions to the Web Platform to allow sites and browsers to cooperate on performance. These sorts of enhancements generally fall into a few categories.
First, they can provide sites with a smoother and more-specific mechanism to perform a task, with fewer constraints and observable side-effects that might require the browser to perform unnecessary work. For example, Intersection Observers supplanted much more expensive techniques for measuring when elements enter and leave the viewport.
Designing these capabilities well isn’t easy, for several reasons: they need to improve performance substantially to be worth doing; they need to be general enough to be broadly useful; they need to integrate seamlessly into the existing Web Platform; they need to be straightforward to implement across multiple browsers and hardware architectures; they need to carefully manage the risk of unintended harm to other priorities (i.e., privacy and security); and they need to be simple enough for a wide range of site authors to understand and deploy. This is a lot to ask, but we’ve seen impressive work so far and believe the industry is up to the challenges ahead.
Poor networking performance is one of the most obvious contributors to an overall slow Web experience. While to a great extent this is a result of slow network connections, in many cases we are also failing to make optimal use of those connections. In some cases these changes can be made unilaterally on the client or server side, but in others they require improvements to the Web Platform or the networking protocols that it uses.
When data must be sent, it is important to schedule that transmission to minimize the work done on the critical path. For instance, historically browsers used OCSP for certificate revocation checking. Because OCSP servers are often slow and pages cannot be rendered until the OCSP check completes, this contributes to page load latency. Increasingly browsers are preloading certificate status using technologies such as CRLSets in Chrome or CRLite in Firefox. This also has the advantage of improving user privacy by not leaking to the OCSP server which certificates the browser has requested. It seems likely that similar optimizations are possible elsewhere.
We can also improve our basic networking protocols. In recent years, we have seen the development of HTTP/2, TLS 1.3, and QUIC, all of which are designed to reduce latency — especially in the area of connection setup, which is a large contributor to overall performance. An additional benefit is that these new protocols are always encrypted while offering comparable — if not better — performance than the unencrypted protocols they replace, encouraging full encryption of the Internet. As deployment of these protocols increases, the Web will get faster, and it will get easier to obtain good performance without having to resort to the kind of hacks that sites have traditionally used to compensate for poor HTTP performance. In addition, there are a number of new potential areas for improvement (forward error correction, better prioritization, etc.) that have yet to be explored, and we expect to see more work on these in the future.
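The effect of connection setup on latency can be approximated by counting the network round trips a client must wait before it can send its first request on a fresh connection. The sketch below uses the commonly cited handshake counts: one round trip for TCP, two for a full TLS 1.2 handshake, one for TLS 1.3, and one for QUIC, which combines the transport and cryptographic handshakes. It is a back-of-the-envelope model that ignores DNS lookups, packet loss, and optimizations such as TLS False Start; the function name and the 100 ms example are assumptions.

```python
# Round trips before the first request can be sent on a new connection.
SETUP_ROUND_TRIPS = {
    "TCP + TLS 1.2": 1 + 2,  # TCP handshake, then full TLS 1.2 handshake
    "TCP + TLS 1.3": 1 + 1,  # TCP handshake, then TLS 1.3 handshake
    "QUIC": 1,               # combined transport and crypto handshake
    "QUIC (0-RTT resumption)": 0,  # returning client sends data immediately
}


def setup_delay_ms(rtt_ms: float) -> dict[str, float]:
    """Estimate connection-setup delay for a given round-trip time."""
    return {name: trips * rtt_ms for name, trips in SETUP_ROUND_TRIPS.items()}


# Example: a mobile connection with a 100 ms round-trip time.
for name, delay in setup_delay_ms(100).items():
    print(f"{name}: {delay:.0f} ms before the first request")
```

On a 100 ms link this model gives 300 ms of setup for TCP with TLS 1.2 against 100 ms for QUIC, which is a large share of the 1000 ms navigation budget when a page needs connections to several origins.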
Finally, we can reduce network latency by bringing the endpoints closer together. In the early days of the Internet, geographically-distributed hosting was available only to the most well-resourced publishers, but over time innovation and competition among CDN providers have greatly expanded the share of sites using this technique. Similarly, new edge computation techniques allow developers to apply the same approach to dynamic responses using standard Web technologies like WebAssembly. We see significant potential for innovation in this space to speed up the Web, and look forward to seeing it evolve.
No matter how much we improve the platform, we can’t make every operation instantaneous. Moreover, backwards-compatibility constraints make it very difficult to prevent sites from using inefficient patterns. Because it will always be possible to construct slow sites, authors play a crucial role in achieving a fast Web.
The traditional approach to this has been to assume that authors will monitor and tune the performance of their site, and offer them diagnostics with which to do so. We’ve seen great advancements in this area with Web APIs like Navigation Timing, developer tools like the Firefox Profiler, and services like Lighthouse and WebPageTest. However, while powerful diagnostics are necessary for a fast Web, they’re not sufficient. Despite the availability of these tools, many sites are still slow because their authors lack the resources, interest, or expertise to optimize performance. There are two broad ways to approach this.
First, we can provide sites with appropriate incentives to be fast. Sites already have natural incentives to optimize performance in order to improve retention and conversion rates, but despite keynote speakers at Web conferences highlighting these findings for years, they appear to be insufficient to generate the kind of industry-wide change we want to see. However, the enormous size of the SEO industry demonstrates that many sites will go to great lengths to improve search placement, so we’re happy to see Google’s Web Vitals initiative directly tie search placement to page performance.
Second, we can make it easier and more automatic to build fast sites. This is difficult for browsers to do unilaterally: making something automatic requires being opinionated, and it's hard for a platform as general as the Web to be opinionated about some of the complex areas — like resource loading and state management — where performance issues commonly arise. Over the past decade, an ecosystem of tools and frameworks has evolved to fill these gaps, with some powering hundreds of thousands of sites across the Web today. Consequently, the design choices and defaults of these building blocks have a large impact on the performance characteristics of the Web. The pace of evolution for these tools is impressive, and so while we see areas for improvement, we are optimistic that they will be addressed with time. We don’t intend to directly drive this evolution ourselves, but are enthusiastic to collaborate with the developers of these building blocks — incumbents and newcomers alike — to provide the necessary foundations in the platform.
Building websites has gotten substantially easier in many ways, but it’s also become more complex, and there remain a number of pain points which make the experience more difficult than it needs to be. This has several negative consequences. First, it disempowers site authors by hampering their ability to express themselves. Second, it drives content to native app platforms, which diminishes the Web’s reach. Finally, it encourages centralization by tilting the playing field towards large publishers and platform providers with sophisticated engineering teams and complex infrastructure. Our goal is to reverse these trends by making it easier to build and maintain sites.
The most powerful way to make something easier is to make it simpler, so we aim to reduce the total complexity that authors need to grapple with to produce their desired result. To be most effective, we have to prioritize what we simplify, so our strategy is to categorize development techniques into increasing tiers of complexity, and then work to eliminate the usability gaps that push people up the ladder towards more complex approaches. Typically this means building new features that allow publishers to more easily perform functions that previously required large amounts of code, often in the form of monolithic, third-party libraries, frameworks, or platforms.
The Declarative Web¶
There are two deficiencies here that are worth addressing. The first is the lack of good standardized controls that are also easily styleable across browsers. Native app platforms such as iOS and Android provide rich libraries of controls which perform well and are styled to match the rest of the platform. By contrast, the base Web platform is comparatively deficient, with a much more limited set of built-in controls. And where the Web does have equivalent controls, they’re often insufficiently styleable and have inconsistent internal structures across browsers, which makes it difficult to make them visually consistent with the rest of the Web page. We want to fill these gaps, and are pleased to see the OpenUI effort already making progress in this space.
The Web is unique because it offers users unparalleled control over the content they experience. This control is the foundation of agency, but the reality has not always lived up to the promise, and we see threats to it going forward. Consequently, we seek to protect and expand the mechanisms that empower people to experience the Web on their own terms.
The Web’s superior ability to offer control comes from its technical architecture, in which users have a choice in their user agent (i.e., browser) and sites communicate information in a way that is receptive to reinterpretation. HTML and CSS offer semantic transparency, providing the browser with a model of the presentation which can be modified or reinterpreted. Web standards give the browser wide discretion to operate, and the loose coupling of Web Platform features and their uneven and incremental deployment discourages sites from making hard assumptions about the final result. These technical properties are a necessary ingredient to effective controls, but they’re under threat from several angles.
First, the emergence of more powerful and complicated toolchains can obscure semantic intent and hinder reinterpretation. For example, developers often encode extensive semantic information in a framework’s component hierarchy, but that information gets stripped out by the tools, leaving the browser with a soup of div elements. Worse, some frameworks aim to bypass the DOM entirely by rendering directly to a canvas element. Where possible, we try to work with frameworks to find elegant and efficient mechanisms to provide the browser with a meaningful semantic model of the content.
Second, as new types of content are added to the Web, it can be technically and politically challenging to integrate them with semantic transparency in mind. For example, text-oriented sites like newspapers and magazines are usually rendered directly with the usual Web primitives, making it easy for users to save, reformat, remix, or translate them. By contrast, strong demands for digital rights management (DRM) technologies for audio and video led to their incorporation into the Web Platform with Encrypted Media Extensions. Faced with the prospect of people abandoning Firefox en masse to access streaming services, we eventually chose to support EME, but nevertheless view it as a regrettable chapter in the Web’s evolution.
Reinterpretable content is necessary for control, but it's not sufficient: the browser needs to provide users with levers by which to control their experience. Sites and browsers can work together to offer this control, for example by using prefers-color-scheme to customize the presentation in accordance with the user’s wishes.
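As a small illustration of this cooperation, a site can ask the browser which color scheme the user prefers and adapt its presentation accordingly. The sketch below wraps the standard `matchMedia` query for `prefers-color-scheme`. The helper name `preferredScheme` and the `MatchMedia` type are assumptions introduced here so the logic can run outside a browser; they are not part of any Web API.

```typescript
// Minimal shape of the object returned by window.matchMedia().
interface MediaQueryResult {
  matches: boolean;
}
type MatchMedia = (query: string) => MediaQueryResult;

// Returns the scheme the user has asked for, defaulting to "light"
// when the dark-scheme query does not match (hypothetical helper).
function preferredScheme(matchMedia: MatchMedia): "dark" | "light" {
  return matchMedia("(prefers-color-scheme: dark)").matches ? "dark" : "light";
}

// In a browser, the real API would be passed in:
//   const scheme = preferredScheme((q) => window.matchMedia(q));
//   document.documentElement.dataset.theme = scheme;
```

In many cases the same result can be had without script, using a `prefers-color-scheme` media query directly in CSS. The point in either form is that the site reads a preference the user set once in the browser rather than asking again on every site.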
There is a natural tension between functionality and simplicity. Offering control often means offering a feature, but too many features can be confusing and overwhelming, ultimately hindering people's agency. Just as sites cannot anticipate all user needs, neither can browsers.
Extensibility resolves this tension by allowing people to customize their browsing experience with add-ons. Developers have a high tolerance for scope, so browsers can offer numerous and configurable extension points to enable them to build a wide variety of features. The menu of available add-ons then provides users with extensive on-demand levers to meet their needs while keeping the default experience simple.
Add-ons have access to much more powerful capabilities than sites do, which makes them distinctly not casual. This necessitates some degree of gatekeeping and curation in order to keep people safe from malicious add-ons. We are exploring ways to reduce this friction, but ultimately need to exercise some degree of oversight to balance openness, agency, and safety for browser extensions.
Add-ons can also provide a mechanism for extending the regular Web Platform with features which are too dangerous to provide by default. Because users must explicitly install add-ons and are reminded as part of the installation experience that add-ons have powerful privileges beyond those of ordinary Web pages, site-specific add-ons can allow users to provide elevated capabilities to sites they trust without compromising the casual interaction model that underpins the Web. We are actively experimenting with this approach in Firefox.
Mediating Between Competing Interests¶
Empowering users to enhance their own experience is usually good for everyone, but sometimes sites and users have opposing goals. In these cases, sites may seek to limit user control. Sites have extensive capabilities for advancing their interests. The role of a browser like Firefox is to level the playing field by acting on behalf of individuals to provide them with tools and aggregate their influence.
This misalignment of interests tends to manifest in a few common dimensions. First, many sites narrowly focus on their own engagement and seek to commandeer the user’s attention in distracting and invasive ways. Firefox has a long history of countermeasures to these techniques, originally with pop-up blocking and more recently by restricting auto-playing videos and notification abuse, which we plan to continue. Second, many sites are deeply invested in running intrusive scripts in order to generate revenue or collect analytics, and thus disapprove of tracking protection features or content-blocking add-ons. Finally, sites sometimes attempt to disable user capabilities like form autofill, copy and paste, or right-click — either in an attempt to safeguard their intellectual property, or because they view themselves to be a better judge of the user’s best interest. In all of these cases, we believe browsers should find creative technical and non-technical solutions to keep the user in control.
The technical forms of reinterpretation described above give the user some control over their experience but they tend to fall down in allowing users to shape their experience on communications platforms (email, social networks, etc.). This is because these platforms use generic semantic structures to deliver dynamic (often user-generated) content, and so it's difficult for browsers to differentiate between what the user wants and what they don't. For instance, while it is possible to block all ads, blocking certain types of ads is a much harder problem, and filtering comments is even harder.
In these situations, the primary method for controlling one’s experience is through whatever controls are provided by the platform, which are all too often limited or opaque. This can lead to situations in which users have very negative experiences (including misinformation, bullying, doxxing, even death threats) and are helpless to do anything about it beyond disengaging entirely. In addition to cases where the platform simply fails to give users control, in the past platforms have actively cooperated in abusive behavior, for instance by providing extremely fine-grained targeting for advertisements designed to manipulate users' political behavior. For obvious reasons, in these cases platforms are not incentivized to provide control over these aspects of user experience.
We do not yet have a good understanding of how to solve this set of problems: while it is possible that enhancements to browsers or the Web Platform can help to some extent by surfacing more information that can then be used for filtering, it seems likely that creating a more positive experience for all users of the Web will eventually require social and policy changes as well as technological ones.
Less than 20% of the world’s population speaks English, and less than 5% speak it natively. The Web cannot adequately serve humanity if it provides a first-class experience only in English or a handful of dominant languages. Unfortunately, the prevalence of English among the people developing the Web’s technical infrastructure has often resulted in just that. We want the Web to work well for everyone regardless of where they live and what languages they speak.
However, support for local languages isn’t enough; sites need to actually be available in the languages people understand. Many broadly relevant sites could be useful to a much larger audience if linguistic barriers were overcome, but exist only in English and are not translated even into languages which are well-supported by the Web Platform. In order to bring the Web to everyone, we need to make it as easy as possible to support all locales.
For sites which have the resources to invest in localization, we need technology that enables that. Traditionally, localization was accomplished by simply translating blocks of text. Today's rich Web applications can be very dynamic, and thus require substantially more nuance in order to handle genders, multiple nested plural categories, declensions, and other grammatical features which differ across languages and dialects. Moreover, the experience of generating localizations and applying them to sites is error-prone and cumbersome. We want it to be as easy and flexible to localize a site as it is to style it with CSS. Over the past decade we’ve built such a system for Firefox, called Fluent. Beyond localizing Firefox, we’re working with others to bring the ideas and technology behind Fluent to other client-side software projects with the ICU4X Rust library and to the Web with the Unicode Consortium's MessageFormat Working Group. Together with a new proposal for DOM localization, we see a much more powerful story for localizing websites on the horizon.
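To see why translating blocks of text is not enough, consider plural categories. English needs two forms for a counted noun, while Polish needs three for whole numbers, chosen by the last digits of the count. The sketch below is a simplified, hand-written illustration for non-negative integers only. It is not Fluent or MessageFormat syntax, and the names `englishCategory`, `polishCategory`, and `formatCount` are assumptions introduced here. Real projects would take these rules from Unicode CLDR data, for example through the built-in `Intl.PluralRules` API.

```typescript
type PluralCategory = "one" | "few" | "many" | "other";
type Forms = Partial<Record<PluralCategory, string>>;

// English: singular for exactly 1, plural for everything else.
function englishCategory(n: number): PluralCategory {
  return n === 1 ? "one" : "other";
}

// Polish (integers only): 1 is "one"; counts ending in 2-4, except
// those ending in 12-14, are "few"; all remaining integers are "many".
function polishCategory(n: number): PluralCategory {
  const mod10 = n % 10;
  const mod100 = n % 100;
  if (n === 1) return "one";
  if (mod10 >= 2 && mod10 <= 4 && !(mod100 >= 12 && mod100 <= 14)) return "few";
  return "many";
}

const englishForms: Forms = { one: "file", other: "files" };
const polishForms: Forms = { one: "plik", few: "pliki", many: "plików" };

// Picks the noun form for the count's category and prefixes the count.
function formatCount(
  n: number,
  categoryOf: (n: number) => PluralCategory,
  forms: Forms,
): string {
  const form = forms[categoryOf(n)];
  if (form === undefined) {
    throw new Error(`no form provided for category "${categoryOf(n)}"`);
  }
  return `${n} ${form}`;
}
```

With these definitions, `formatCount(22, polishCategory, polishForms)` yields "22 pliki" while `formatCount(12, polishCategory, polishForms)` yields "12 plików", a distinction that a single translated string with a number substituted in cannot express.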
However, we know that many sites simply cannot invest in translation and will provide their content in only a small number of languages — perhaps just one. In these cases, technologies such as automated translation (and better yet, client-side translation) can still bring these sites to the world. These technologies depend on being able to access the site at a semantic level so that they can properly understand the context of the text they are translating, either through the basic Web mechanisms that allow for reinterpretation or — better yet — by explicitly providing semantic structure via mechanisms like MessageFormat.
Roughly one billion people live with some form of disability. For these people, the advent of the Web was a great step forward in their ability to participate in information exchange. Unlike other media, the early Web’s simple structure and semantic transparency made it practical to interpret with assistive technology like screen readers without much or any explicit consideration from the site (alternative text for images being one notable, but often tolerable, exception). Combined with browser features for controlling things like font size and color, the Web provides wide flexibility for overcoming obstacles to access. However, as sites evolved from simple documents to much richer and more complex experiences, the Web’s accessibility has worsened.
The biggest challenge is that modern site-building techniques tend to require much more intentional effort by the author in order to deliver an accessible experience. The Web began with only a few dynamic elements, like
Setting a High Bar¶
We want the Web to be around for a long time, so it’s important to get it right. This means browser vendors should collectively set a high bar for Web Platform quality. Every company is influenced by its own agenda, business needs, and politics. Left unchecked, those pressures can easily result in the inclusion of ill-conceived features that everyone comes to regret. This dynamic was on full display in the era of Internet Explorer 6, and we’re still unwinding the consequences.
The Web’s multi-stakeholder development process is far from perfect, but nonetheless serves as a powerful bulwark against a corporate agenda-du-jour pushing bad ideas into the Web. Every organization suffers from this blind spot, including Mozilla. Some time ago, we proposed a plethora of half-baked Web APIs as part of our FirefoxOS initiative to build a Web-based operating system. Other vendors largely ignored or rejected these proposals — which was frustrating at the time, but which we are deeply grateful for today. We believe the Web deserves a high bar, and invite others to hold us to it.
While the Web has been fantastically successful in displacing old-style "desktop" applications on personal computers, native apps remain dominant on mobile devices. This is true even for sites like Facebook and Twitter which have powerful, heavily-used Web-based versions for desktop users. There are a number of reasons why developers often choose to target mobile apps rather than the Web, including:
- Built-in app stores dramatically reduce the friction of finding and installing native software. In addition, because those apps are curated and (to some extent) sandboxed, users feel more confident installing them than they would downloading an executable to their personal computer.
- Apps have a built-in monetization story facilitated by the operating system vendor (albeit with a significant fraction paid to the platform). By contrast, Web monetization is largely DIY.
- Mobile apps are usually smoother and perform better on their intended devices. The Android and iOS teams have invested years in building fast, smooth, high-quality widget sets that enable any developer to deliver a good experience. By contrast, even experienced developers struggle to produce comparable experiences through the Web Platform. As an extreme example, on desktop operating systems Firefox uses Web technologies to render its UI but on Android we found we needed to use the native widgets.
- Native apps can take advantage of capabilities which are not offered by the Web, such as appearing on the home screen or accessing certain sensors.
While there has been significant progress on making it possible for Web-based applications to act more like mobile apps in the form of Progressive Web Applications (PWAs), native applications still dominate the space. In a number of cases, developers will choose to have both a PWA and a native app, but this just serves to highlight the quality gap between the native and Web experiences.
Much of the discussion around mobile capabilities has been driven by Google’s Project Fugu, which aims to “close gaps in the web's capabilities enabling new classes of applications to run on the web”, mostly by adding new APIs that provide capabilities that are new to the Web. While it is certainly the case that some applications which are currently deployed as native apps could instead be built as Web apps if only "API X" were available, the evidence that this would result in a wholesale shift from native to the Web is thin. To the contrary, the fact that developers choose to build native applications rather than PWAs — even when they do not need any special capabilities — suggests that this strategy will not be effective, and despite numerous new capability APIs in Chrome, developers still seem to favor native apps.
Even for the casual browsing use cases where the Web shines, we know that the experience is dramatically worse on mobile devices. This is due to a combination of factors including the limitations of small screen sizes, generally slower processors, battery lifetime, poor animation APIs, and slower networks. All of these lead to a situation where mobile is a much worse experience than desktop even on the same content, especially when the content — or the browsers — just reflect the desktop experience shrunken down to the mobile form factor and idioms.
The conclusion we draw from this is that the place to start when thinking about mobile is to address the use cases that are in principle well-served by the Web model but in practice have not been served well by the mobile Web. To a great extent this consists of the incremental improvements we have discussed earlier in this document (improved overall performance and responsiveness, frameworks which perform well by default, and browser and framework affordances which adapt to multiple screen sizes). None of this is impossible; it is merely the hard work of systematically working through every source of friction which stands in the way of having a first-class Web experience on mobile.
One improvement in particular stands out: monetization. The vast majority of the Web and much of the app ecosystem is funded by advertising; yet we know that display advertising is especially problematic on the mobile Web, both because of scarce screen real estate and because of the network performance impact of loading the ads themselves. A better monetization story would not only make the mobile Web more attractive but would also help remove one of the major factors driving people toward native apps.
Our objective is not to displace native apps entirely on mobile. Rather, it is to address the forces that push mobile users off of the Web even for the casual use cases where the Web ought to shine. Success here looks like a world in which the Web works so well in these situations that users are mostly indifferent to whether their interactions are with apps or with websites. We want a world in which developers are not forced to invest in an app merely to get an acceptable user experience, but are still able to build apps in situations where that makes sense.
The Internet consumes a lot of electricity. Estimates vary, but even conservative ones show data centers, communications infrastructure, and consumer devices each consuming hundreds of Terawatt-hours per year. This accounts for several percent of global electricity consumption, much of which is generated by fossil fuels. Simultaneously, the Internet is also a force for reducing carbon emissions by replacing energy-intensive activities such as travel-intensive meetings and paper mail with videoconferencing and electronic messaging. It also brings value to the world in many other ways, and people quickly adapt their lives to depend on its expanding capabilities. As such, simply turning it off or degrading its functionality (e.g., storing and transmitting videos at low resolutions) in order to conserve energy are not realistic solutions.
At the same time, we know that this energy consumption contributes to global climate change. While addressing this issue will ultimately require decarbonizing global electricity generation, we should look for areas where the Internet can improve today. In this vein, we see two key properties in the Internet that create opportunities to mitigate its carbon footprint in advance of a fully-renewable grid.
The first property is the relative location-independence of computation. While client endpoints and network infrastructure are not easy to move, data centers can be positioned much more intelligently and collocate services for many different customers. This allows them to either locate in places where clean energy is already abundant, or use their scale and flexibility to negotiate additional renewable capacity in areas that lack it. To the extent that market and regulatory incentives are insufficient, the recent uptick in voluntary corporate commitments to sustainable cloud services demonstrates the power of public opinion to drive these changes.
The second property is the low-friction nature of software distribution, which means that design decisions in widely-used protocols and implementations can have an outsized impact on electricity consumption. This means that we should be conscious of these considerations when designing new systems that will be deployed at scale. In many cases, the right incentives exist already. For example, consumer software increasingly runs on battery-powered devices, which creates competitive pressure to be judicious about energy consumption (it’s certainly something we spend a lot of effort on in Firefox). This also has the advantage of allowing the Web to work on low-powered devices, thus reducing the rate at which people have to replace their devices and the concomitant environmental impact. However, this isn’t always the case. For example, Bitcoin’s use of a proof-of-work algorithm for blockchain integrity causes it to consume an enormous amount of electricity, and incentivizes behavior that is bad for the environment and bad for users. We should be thoughtful in the design of new systems to ensure that the requirements enable and encourage implementations to operate in an energy-efficient way, and then work to identify the biggest opportunities for saving energy in existing systems operating at scale and direct our efforts towards optimizing them.
What We Don’t Know¶
There are some problems with the Web which we find concerning but do not yet have a clear strategy to address. We have some ideas in these areas but no silver bullets. Ultimately, we aim to collaborate with a broad coalition of like-minded organizations and individuals to identify and pursue effective measures.
We want a Web without gatekeepers. The Web’s open and distributed architecture entails much less inherent centralization than other modalities such as radio or television. Nevertheless, there are powerful technical and market forces which incentivize consolidation, and we’ve seen the Web drift towards centralization along a number of axes, including network providers, hosting companies, software developers, and content platforms. The latter category is perhaps most significant: because a huge fraction of content is accessed from a small number of platforms, they exert an enormous amount of control over user experience on the Web. Even in the best case, this has the potential to privilege certain perspectives over others, but as recent events have shown, having people receive so much of their information from a small number of platforms is a powerful force for discrimination, misinformation, and manipulation.
The Web is primarily funded through advertising. While this has the advantage that it allows people to access huge amounts of content free of charge, it also has many negative consequences. People find the ads themselves distracting and annoying, leading them to install ad blockers which can undercut the business model for many publishers. Furthermore, over time advertising on the Web has moved from simple banner ads to individually targeted ads, which depend on surveillance and profiling technologies that are deeply problematic. This is not good for users. However, even publishers generally find the situation unsatisfactory as more and more of their revenue ends up in the pockets of large ad networks and the value of their relationship with the user leaks out to third-party websites. The present model is not working and it threatens the survival of the Web as a positive force in the world.
While this is clearly a central question for the health of the Web, we do not yet have a good answer. We are taking some unilateral measures to protect our users, but we recognize that this will not be enough. Ultimately, we need to collaborate across the ecosystem to devise better monetization models that work for users and publishers. We are heartened to see publishers and others considering alternative models. For example:
- Some publishers are exploring contextual advertising methods that leverage machine learning to offer high-value ads without tracking users.
- A variety of apps and services have arisen to offer value to end users while enabling publishers to monetize users who would never pay for a subscription.
- There are some efforts underway to examine how to make the display advertising ecosystem more respectful of user privacy.
It remains to be seen whether any of these efforts will bear fruit, but solving this problem is essential for the future of the Web.
The Web is an enormous asset to humanity and Mozilla is committed to protecting it and making it better. Powerful economic and technological forces have combined to make the Web the way it is today. Making it better won’t be easy and we can’t do it alone. Some parts of the road ahead are clear and some – especially how to address monetization and centralization – are much murkier, but we believe that we can all work together as a community to make a Web that is truly open and accessible to all.