Building Transparent Ranking Algorithms (Part 1)
Translating “Transparency” to Practical, Concrete Suggestions for Engineers
Introduction
According to a 2019 study conducted by the Pew Research Center, most Americans believe that technology companies should be more transparent about how they collect and use personal data. In the past few years, journalists, government officials, academics, and tech workers have called on software companies and the state to work together to enact policies that enforce algorithmic transparency. Especially after revelations from Facebook’s internal research about Instagram’s alleged negative impact on the mental health of teenage girls, Congress has accelerated its search for legislation that would require transparency from big tech.
Naive implementations of algorithmic transparency, however (such as requiring ranking algorithms and datasets to be open source), are often unrealistic. Big tech is not like academia. If Google were to open-source its ranking algorithms and data-collection sources, for example, it would risk losing its competitive edge over other search engines. Malicious actors could also exploit vulnerabilities exposed in the code and data. Furthermore, only a small subset of users would have the technical expertise, interest, and time to understand how the software works.
How do we solve the transparency problem if we cannot make our engineering processes entirely transparent? Make the algorithm simpler, Facebook whistleblower Frances Haugen suggests, so any user can understand it. “I’m a strong proponent of chronological ranking, ordering by time with a little bit of spam demotion,” she told the Senate in October 2021.
However, as a journalist at Wired points out, Facebook has presented an option for users to view posts in chronological order for some years now. There is a reason why no one uses the option. “I scrolled through 35 consecutive posts [by companies] before I hit my first entry by a human,” Wired reporter Brian Barrett writes after testing chronological ranking. The first human post “happened to be from a person I don’t remember sharing a video of Elvis impersonators.”
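Part of the appeal of Haugen’s proposal is that “chronological ranking with a little bit of spam demotion” really is simple enough to fit in a few lines. The sketch below assumes each post carries a single precomputed spam score; the field names, threshold, and demotion penalty are illustrative assumptions, not Facebook’s actual parameters.

```python
from dataclasses import dataclass

@dataclass
class Post:
    author: str
    timestamp: float   # Unix time when the post was created
    spam_score: float  # 0.0 (clean) to 1.0 (almost certainly spam)

def chronological_rank(posts, spam_threshold=0.8, demotion_seconds=86_400):
    """Order posts newest-first, pushing likely spam down the feed.

    A post whose spam_score exceeds the threshold is sorted as if it
    were published demotion_seconds earlier, so it sinks below fresher,
    cleaner content without being removed outright.
    """
    def sort_key(post):
        penalty = demotion_seconds if post.spam_score > spam_threshold else 0
        return post.timestamp - penalty
    return sorted(posts, key=sort_key, reverse=True)
```

Any user can understand this rule: newer posts come first, and suspected spam is pushed back by a fixed amount of “virtual time.” The trade-off, as the Wired test above illustrates, is that simplicity and relevance pull in opposite directions.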
Likewise, forcing software corporations like Google to stop collecting user data would come at a cost to the quality of search results and recommendations. Search is meant to satisfy users’ information needs, expressed as explicit text queries or implicit requests for recommendations. Search engines and recommendation systems understand those needs, in part, by sampling user data.
The purpose of this document is to provide real, practical ways for engineers working in a day-to-day setting to increase the transparency of their software products. We acknowledge that non-transparent software presents real social maladies, but opinion pieces in the news do not present the most comprehensive solution space, if they present any real solution at all.
Many of the suggestions presented in this document have been implemented by tech companies. Many have been suggested by social scientists and think tanks. But no single company has implemented all of these suggestions. We hope that these suggestions will give engineers a sense of choice between alternatives, and a sense of optimism that they do have the power in their day-to-day work to build more transparent software.
Why Transparency?
Transparency is a quality of the user experience. It consists of at least two components: the user should feel that they understand how the software makes decisions for them, and they should feel that they can control the software’s decision-making process.
Transparency is most often discussed in the context of ranking algorithms in information retrieval. These are the central components of search engines, recommendation systems, and news feeds. However, other decision making systems might be considered too, from automatic hiring software to risk assessment for insurance pricing. In this document, we mainly discuss ranking algorithms, but expect that many of the proposed solutions could be applied to other kinds of software.
There are at least four separate reasons why transparency may be undesirable from the perspective of software companies:
As discussed in the Introduction, companies can lose their competitive edge in the market if their engineering processes, code, or data are open-sourced.
Vulnerabilities in the system are made more apparent when the software is made transparent. Malicious actors can exploit transparency.
Creating new user experience features in favor of transparency requires more engineering resources. Many software companies, especially smaller ones, do not have the resources to develop these features.
Levels of user trust are difficult to quantify. Unlike latency and resource utilization, it is not easy to directly evaluate product features for their effects on levels of user trust.
However, we believe that the costs of developing transparent software can be mitigated, and the rewards for developing transparency are worth considering, even from the corporate perspective, for the following reasons:
Transparent software features increase levels of user trust in the product.
Enabling the user to adjust how the software makes decisions can, in many cases, improve the software’s decision-making capabilities. This is most apparent in machine learning-based ranking algorithms, which are able to learn from mistakes.
Users who trust their tools are more likely to use them again. This translates into more profit and business value generated by the product.
Least convincingly (from the corporate perspective), giving the user power and understanding over the machines that they use is ethically the “right” thing to do.
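To make the second point above concrete, one way to combine user control with machine learning is a scoring model whose weights are both user-visible and user-adjustable, and which also nudges those weights in response to explicit “show more/less like this” feedback. The feature names, default weights, and update rule below are illustrative assumptions, not any company’s production system.

```python
class AdjustableRanker:
    """Linear scorer with inspectable, editable feature weights.

    Users can read or edit self.weights directly, and the feedback()
    method shifts weights toward (or away from) the features of an
    item the user explicitly liked (or disliked).
    """

    def __init__(self, weights=None, learning_rate=0.1):
        # Weights the user can inspect and override at any time.
        self.weights = weights or {"recency": 1.0, "friend": 1.0, "topic_match": 1.0}
        self.learning_rate = learning_rate

    def score(self, item_features):
        # Dot product of the item's features with the current weights.
        return sum(self.weights.get(f, 0.0) * v for f, v in item_features.items())

    def rank(self, items):
        # items: list of (item_id, feature_dict) pairs, best first.
        return sorted(items, key=lambda it: self.score(it[1]), reverse=True)

    def feedback(self, item_features, liked):
        # "Show more/less like this": move weights toward or away
        # from the features of the item the user reacted to.
        sign = 1.0 if liked else -1.0
        for f, v in item_features.items():
            self.weights[f] = self.weights.get(f, 0.0) + sign * self.learning_rate * v
```

Because the weights are plain, named numbers rather than opaque parameters, the same mechanism serves both goals at once: the user can see why one item outranked another, and every correction they make also improves future rankings.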
Since transparency is an aspect of the user’s subjective experience, engineers have only partial control over the transparency of their applications. Transparency is at least as much a marketing problem as a product design problem. Users build trust through multiple channels, so it is also important for separate organizations to collaborate on building transparent technologies. In the following sections, we tackle these three solution spaces separately: product design, collaboration, and marketing.
Read more
Introduction (Part 1)
Product Design (Part 2)
Collaboration Initiatives (Part 3)
To be continued…