Product manager Karen Kim kicked off the chat with a question: “How can we…?” Several Zoom meetings and one Miro board later, her team had a solid wishlist of ideas, dreamed up by engineers, to pick from and deliver.
The Firefox web platform team often works behind the scenes: people only take notice when things go wrong, like when a website breaks. Its work revolves around Gecko, the engine that runs Firefox and processes online content to bring it to your screen. Gecko is one of only three major engines shared among the many browsers we see today. So why not innovate based on our own engine’s unique qualities and capabilities to advance the web?
In the fall of 2021, Kim was approached by Zibi Braniecki, then Platform Internationalization Technical Lead Manager (now a Mozilla alumnus), with a hypothesis rooted in an internal problem: How can we make our product development process more inclusive? How can we make it more joyful to build? The idea made a lot of sense: Mozilla’s engineers have their finger on the pulse of the newest web technologies. Instead of approaching an engineer with a “here’s what we decided, now build it” attitude, why not partner up early, share ideas and build some magic together for the community?
In her book Continuous Discovery Habits, Teresa Torres emphasizes innovating in a product trio, which consists of a product manager, a software engineer and a designer. “When a product trio is tasked with an outcome, they have a choice,” she says. “They can choose to engage with customers, do the work required to truly understand their customers’ context, and focus on creating value for their customers.” Because each role brings a different area of expertise, sharing these contexts early and using them to identify people-focused opportunities is integral to smooth cross-functional dynamics.
Following the conversation with Braniecki, Kim folded Torres’ insights into a series of blue sky (as in, the sky’s the limit) sessions with platform engineering teams. For those who know their improv, the objective was to foster “yes, and!” energy first: “If the sky’s the limit, what technologies that you’ve seen in movies like ‘The Matrix’ or ‘Mission: Impossible’, or in books, do you wish we could have in Firefox? How can we harness Gecko to deliver a prototype to people?”
“Stop energy” (setting limits by ruling out functionality that falls out of scope) could come later. These blue sky sessions led to awesome ideas from engineers across the platform organization. The next step was to prioritize an idea that one or two engineers could build over a short development cycle. A committee of product managers and engineers walked through each idea and compiled a shortlist based on three reference points: the Firefox 2022 vision, Mozilla’s vision for the web, and the big 2022 release themes. The ideal project needed to (1) align well with Mozilla’s current organizational priorities and (2) have the potential to meaningfully improve the overall user experience. The shortlist then went through another review, this time to determine which idea would be the most feasible to build within the expected timeframe.
Text recognition (the ability to copy text directly from images, also known as optical character recognition, or OCR) was selected to be the inaugural feature.
Now, onto how the text recognition capability was built.
Formal feature experiments are rare for the platform team, so platform discovery work became a new outlet for a growing experimental culture at Mozilla. The aim for this project was to build and test incrementally in a lightweight and low-friction environment. On top of Torres’ principles, the platform discovery process was designed to validate the theory that this environment could support a successful “product trio” moving nimbly through exploration and quick learnings.
To kick off production, Kim, as product manager, worked closely with a group of platform engineers. After further scoping, they determined that they could ship text recognition fastest if they started with macOS. Shortly after that, UX designers joined the project and soon determined what the prototype would look like. From then on, Kim from product, Greg Tatum from engineering, and Ryan Casey from design became a tight-knit trio to get the pilot feature across the finish line.
Building on that “yes, and!” energy, the team sought to make text recognition useful, accessible, and delightful: a feature that could positively impact the digital lives of users with disabilities/disabled users (see the author’s note at the end) in particular. “Images make up a huge chunk of the content that we love to share with family and friends via text or social media,” Kim says, “but by nature, they’re limited. Internet users are estimated to share about 3.2 billion images daily, yet we continue to exclude members of our community who are blind or have low vision from accessing a medium that we enjoy so frequently!” The industry had already made strides toward more inclusive images. For example, alternative text, or alt text, has become available on the most prominent social media platforms. “We wanted to prioritize a meaningful feature that would stretch the traditional barriers of a hugely popular medium, but in order to build more mindfully, we needed to gather more awareness of inclusive design. That’s why we wanted to include our accessibility (a11y) team as early as possible.”
With guidance from Morgan Rae Reschenberg, senior accessibility platform engineer, the text recognition trio drafted more a11y-friendly designs, added screen reader support to the prototype, and ran thorough quality assurance testing to ensure a smooth experience that can reach as many of our community members as possible. If you own a Mac and use Firefox 106 or higher, you can use VoiceOver to read out the results when you copy text from images, even in several different languages. Jamie Teh, co-creator of the screen reader NVDA and Morgan’s colleague, put the partnership into words: “One of the joys of inclusive design is creating products that a lot of people can use but can be especially beneficial for some. Text recognition is part of our effort to make accessibility and inclusion an even bigger part of our process across Mozilla… The Firefox desktop platform and front-end teams got us involved early on so that when the feature shipped, it’s immediately accessible and delightfully so, rather than just ticking the required boxes.” The benefit goes beyond saving time, since you no longer have to retype the caption of your favorite meme or the logistics from an event flyer: an accessible version of text recognition can open up a traditionally closed source of content, such as text inside an image, to broader audiences.
We are thrilled to share that the contributors involved in the project will continue to work closely with Morgan and Jamie to refine the UX for Firefox’s text recognition capability. The feature will extend to other operating systems and become more joyful and easier to use.
The entire discovery process was driven by relationships, within the product trio and among partners. With the help of our accessibility, legal and QA teams, along with our internal Mozillian “Foxfooders,” text recognition got robust love and support from day one to delivery. The trio designed, built, gathered more insights, made changes based on those insights, and moved forward. Maire Reavy, Firefox platform engineering director and project sponsor, said, “Relationships are built on trust, and trust is currency for a low-friction environment where we can be reactive and nimble.” To that point, relationships can be a product manager’s most important asset. Building trust by constantly communicating and opening up the floor for early collaboration can lead to a shared, deeply felt product vision that sparks excitement and keeps morale high. Trust also helps you get context, make decisions, and align more easily, or apply a “disagree and commit” policy when you need to move forward. Relationships allow for more bonding and learning, which puts more spirit into the actual building of a product.
Platform discovery was a cross-functional process, and a best practice, in which Mozillians could generate new ideas that could unlock the power and potential of Gecko. Stakeholders across disciplines identified potential growth areas in which Mozilla could invest early, and they brought together a product trio that tested, adapted and grew a feature in shorter cycles. In building text recognition, the four disciplines of product, engineering, design, and accessibility came together to ship a meaningful, joyful feature.
Author’s Note: The language around disability is ever-evolving, and currently, the most appropriate term is up to individual preference. Since we are referring to a broader group instead of an individual, we chose this term because it covers a) the people who are disabled because that is the way society treats them, b) the people who are disabled because of a medical condition or consider disability part of their core identity, and c) the people who identify as both.