1. Introduction
The initial bargain social media offered us was the magic of frictionless connection in exchange for a few relatively unobtrusive ads. For the first time, we didn’t have to pick up a phone to tell our friends or family how we were doing because we could update everyone we knew about our triumphs and disappointments all at the same time. The appeal of social media wasn’t just about broadcasting your life—you could also witness your friends’ most important moments no matter where they were happening in the world. And in exchange for all this human connection, all we were asked to do was scroll past a few ads selling us things we were probably interested in anyway. But with the rise of hashtags and newsfeeds, it became clear that the future of social media wasn’t just about maintaining offline friendships. Social platforms could also be places where activists organized and social movements arose. After social media played a central role in the Arab Spring in 2011, its potential to give rise to new political and social causes and allow new ideas to gain traction became a key part of social media platforms’ offerings. Twitter and Facebook in particular pointed to their platforms’ potential to facilitate civic organizing when portraying their businesses to investors, casting themselves not only as tools to keep up with family and friends but as platforms that could democratize access to political power.
This feel-good mission was highly motivating for employees and investors, and it also helped social media companies manage hard economic realities. Almost from the beginning, it was clear that simply allowing distant families and friends to stay in touch was not a viable business strategy. As Mark Zuckerberg once memorably noted, Facebook makes money by selling ads (as all social media does). More accurately, the actual product that Facebook and all other social media companies sell is their users’ attention. The influx of new forms of content—first civic causes and later gaming and entertainment—meant that social media platforms had more and more ways to hold users’ attention for longer and to engage them more actively. The initial period of growth for social media was fueled by adding new users, but by the middle of the 2010s platforms needed to find ways to earn more money per user. Different platforms monetize their users in different ways, so this common goal led to diverging paths across companies. Video-focused platforms, including YouTube and TikTok, typically (although not exclusively) sell ads on a per view basis, meaning they charge advertisers based on the number of users an ad is shown to. Other platforms, including Twitter (now X) and Facebook, more commonly sell performance ads and charge based on actions that the advertiser wants the user to take, such as clicking a link or buying a product from an ad. Depending on how a company monetizes its users, its priorities might vary: It might be more important to keep users on the platform for longer or instead to get them to use the platform in a more interactive way, for example by clicking more links or engaging with more content.
To deliver on these diverse goals, all major social media platforms have come to rely on the same basic tool: algorithmic content feeds. Companies tune their algorithmic feeds to maximize certain outcomes—such as user session length, user engagement, and return visits per day—because these metrics represent the user attention they can sell to advertisers. And here lies the rub: In a very literal sense, content feed algorithms are optimized to be good for advertisers first and users second.
Algorithmic feeds have also become the focus of significant public concern. Parents worry about the impact of infinite scroll on their kids, politicians worry about the potential for political bias on foreign-owned platforms, and many people worry about the seemingly addictive nature of feed algorithms. Even for users who don’t think their social media usage is problematic, the feeling of being manipulated by a content algorithm can simply feel bad. Of course, technology that is both tremendously useful and sometimes dangerous is nothing new. In the last century, we saw the rise of society-changing technologies in the form of cars, trains, and planes, and we had to build a system of institutions, rituals, and incentives that has helped us steadily drive down the fatality rates associated with using all three. Social media, of course, lacks the immediate physical risks that these other technologies carry. Like vehicles, though, social media feeds are complex systems whose risk profiles can be dramatically altered by changes to minor components that are invisible to end users. These same factors posed particular challenges to reducing and mitigating the risks associated with vehicles, and they are factors that algorithmic feeds possess as well.
If you were to see a Ford Model T in front of you today, you would immediately recognize it as a car. Both a Model T and a Tesla Model X have wheels, windshields, steering wheels, and brake pedals. They are largely the same shape and size, and if you opened your eyes and found yourself in the passenger seat of a Model T rolling down the street, you would feel the rumble of the engine through the seats and feel the breeze on your fingers when you stuck your hand out of the window. But while the window you rolled down might look very similar in both cars, the glass used to make the windows is very different—and that difference could end your life if something went wrong. Tempered glass was not a mandatory feature in cars for nearly 30 years after the first Model T was sold. As a result, when a car window broke, the glass would break into long, sharp shards that often seriously injured or even killed the car’s occupants, instead of dissolving into the small, relatively safe pieces of today’s broken tempered glass. The progress we have seen in automotive safety did not happen overnight or in a straight line. In the early 1960s, for example, fatalities per vehicle miles traveled rose notably due to a number of factors, including the increased weight of cars and the growing prominence of interior styling choices, like chrome over fabric, that made injuries more severe [49]. No one intentionally set out to harm consumers—but many small choices made to drive sales translated in aggregate to harms that cost thousands of people their lives each year [79].
Today, we know to ask which safety features are available in a car or how a car performs in a crash. This is in part thanks to the work done in the mid-1960s to educate legislators and the public that policy choices existed that could influence how dangerous cars are. Citizens responded by pressuring Congress to pass laws that led to the creation of the agency that is now the National Highway Traffic Safety Administration. This agency standardized practices for testing new vehicles and for communicating to the public the meaningful safety differences between car models. In the following decades, these objective measuring sticks and consistent methods for communicating the choices available to consumers created a virtuous cycle that fueled innovation in a wide range of safety technologies.
No similar set of measuring sticks or communication methods for consumers yet exists for social media. One reason for this is simple: These technologies are new. But there is a larger problem. With cars, independent agencies can crash-test vehicles or collect data on fatalities, but currently no one outside of the major social media companies can collect meaningful outcome data and understand what makes each platform or product different from the rest. We can observe that platforms are different from one another, and observers speculate that some platforms may create different harms than others, but measuring either of these is currently not possible. This lack of measurability creates a vacuum of information that prevents standard market forces from driving safety improvements of these products. Consumers are unaware that the same mechanisms that make our feeds more compelling or pleasurable today than they were in 2010—like the ability to intuit what we’re “interested” in and then provide us with content that enables a “sticky” experience that’s difficult to walk away from—may also introduce significant systemic risks into society that are invisible to the casual observer.
Feed algorithms may not have literal engines or brakes, but they do have components that make them move content faster or slower, and they even have systems that allow them to pivot to new content areas more nimbly. The effects of these components are noticeable to users—how many times have you heard your friends talk about how quickly TikTok “figures them out” or say they have “poisoned” their YouTube algorithm by watching a few out-of-character videos? Social media companies, however, aren’t very clear with users about these characteristics of their algorithms and don’t always tell users when they make major changes to them. It may seem that social media platforms are completely different from one another, but, technically, just as a Chevy Suburban and a Fiat 500 both contain the same basic components, so too do the Reddit feed and TikTok’s For You page.
In this paper, we will describe all the core components that make algorithmic feeds what they are and propose a classification framework for these systems. We will analogize the points of comparison to the characteristics of cars to help explain what the different features actually do and how they work together. We will also use the framework to develop “window stickers” for five important—and very different—social media platforms. We call these visualizations Feed Cards because, like model cards, they are designed to convey the most important pieces of information about these systems to help the public quickly understand the key aspects of feed algorithms.
The goal of this work is to give everyone—not just computer scientists—an understanding of content feed algorithms. It is in the interest of tech companies to portray their algorithms as vast, complex systems that they themselves can barely control or understand, as if they were great beasts roaming the land that they have learned to steer but could never tame. This portrayal allows them to both claim credit when systems do things we like and dodge responsibility when they cause harm. It is also fundamentally inaccurate. In fact, as we will show, there are some broad similarities across feed algorithms, as well as differences that are neither hard to describe nor to understand.
We believe that social media companies owe the public a clear explanation of how their products work—one that is understandable to users—and that they need to make it clear when their systems are changing. Some companies, notably Meta, do try to do some of this in their quarterly transparency reports.
But until we have better mechanisms for obligating them to fulfill this responsibility, we hope that work like ours can contribute to building a more specific common language and framework for studying and debating these systems. As content feeds come under greater scrutiny, legislators and policymakers need to know enough about algorithmic feeds to understand what proposed legislation to regulate them would do. Judges and lawyers need to better understand the facts of the systems at the core of pressing legal matters. And regular users, some of whom feel manipulated by content feeds, want to better understand how content shows up in their feeds so that they can make more informed decisions about how they use these platforms. A better-informed public debate, and more transparency into the workings of social media systems, are the first steps toward a more productive dialogue about how to make social media systems better and safer for their users.
2. Algorithmic Feed System Design
Before we describe how algorithmic feeds can differ, we will first describe the commonalities in how these systems are designed to lay the groundwork for our comparison framework.
2.1 Feed recommendation pipeline
At a high level, social media content feeds must select content from a large pool of possible content items and ultimately order them in a sequence to show to the user. All the content feed algorithms we will review use a similar three-stage pipeline approach: first selecting candidate content, then ranking those candidates according to how well a model believes they will perform on some set of objectives, and finally assembling a feed of content based on the content’s ranking scores and other factors, such as the platform’s business goals and higher-level goals for the overall mixture of content, as shown in Figure 1. We discuss each stage below.
Inventory selection. The initial stage of feed recommendation is selecting the content that may be shown to the user from the very large pool of items the platform hosts. The size of this pool can vary wildly from hundreds to hundreds of thousands of items.
Fig. 1. Social media feed recommender system
Sometimes, a piece of content is eligible because the user has explicitly asked to see content from that particular author or category: Facebook describes its inventory selection process in this way [41]. In X’s For You feed, only 50 percent of inventory comes from sources the user has opted in to see, with the rest coming from unconnected sources that are indirectly related to the user, such as content that a friend interacted with (but did not themselves create) [80]. In other cases, content is eligible to be shown because of implicit factors the platform has observed about the user, such as being similar to other content the user has watched, as is the case on TikTok [71].
Candidate ranking. Even after the inventory selection process, users still typically have much more content that is eligible to be shown to them than they will have time to view. In the candidate ranking stage, content chosen in the inventory selection stage receives a ranking score. This score is the output of an algorithm that encapsulates the platform’s core goals for what content it will prioritize. The formula may combine predictions of outcomes that might occur if the user is shown the content—such as liking or commenting on the content—with evaluations of different content characteristics. For example, YouTube’s ranking algorithm attempts to maximize “valued watchtime” [18]. This means that for every video in a user’s inventory, YouTube makes a prediction of how long a user will watch that video, which it combines with a judgment of how much the user will “value” watching the video. A user’s general geography is commonly used as a ranking feature as well. Reddit, for example, says: “In addition, we use your selection from the Location Customization setting to serve geographically relevant content and recommendations” [56].
Ranking features are not always specific to the user. For example, some ranking algorithms use the reputation of the content creator as a positive or negative signal [73]. Additionally, Facebook’s algorithm is known to take into account the predicted downstream engagement by other users that might result from the possible resharing if the user is shown content—in addition to any engagement that might occur immediately [22]. In some cases, algorithm goals relate to content creators rather than viewers. In the case of TikTok, the ranking algorithm is sometimes selectively used to ensure that a creator gets a minimum number of views to encourage continued content creation and engagement with the platform [4]. We discuss these and other aspects of ranking algorithms in depth in section 3.3.
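To make this stage concrete, consider a minimal sketch of a weighted-sum ranking score. The outcome names and weights below are illustrative assumptions, not any platform’s actual values, though Twitter’s open-sourced “Heavy Ranker” documents a weighted combination of predicted engagement probabilities of roughly this shape [74]:

```python
# Hypothetical weights expressing how much a platform values each
# predicted outcome. All names and values here are illustrative.
OBJECTIVE_WEIGHTS = {
    "p_like": 1.0,                   # probability the user likes the item
    "p_comment": 4.0,                # comments often weighted above likes
    "p_reshare": 8.0,                # reshares drive downstream engagement
    "p_report": -50.0,               # negative outcomes heavily penalized
    "expected_watch_seconds": 0.05,  # a passive-usage objective
}

def ranking_score(predictions: dict[str, float]) -> float:
    """Collapse per-item outcome predictions into one ranking score."""
    return sum(
        weight * predictions.get(outcome, 0.0)
        for outcome, weight in OBJECTIVE_WEIGHTS.items()
    )

# One candidate item's model predictions:
candidate = {"p_like": 0.12, "p_comment": 0.02, "p_reshare": 0.01,
             "p_report": 0.001, "expected_watch_seconds": 45.0}
print(ranking_score(candidate))  # 0.12 + 0.08 + 0.08 - 0.05 + 2.25 = 2.48
```

Tuning weights like these is how a platform trades off the optimization goals we describe in section 3.3.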
Feed assembly. Once each candidate has been ranked, the user’s entire feed is assembled. While content is generally ordered according to its ranking score, adjustments are made to achieve goals such as showing users content from a diversity of creators and on a variety of topics (rather than showing users several content items from the same user back to back) [41]. Also, some content is inserted to achieve business goals. Advertising is the most common such insertion, but platforms also sometimes insert platform-created content, such as voting information, or they may at this stage “heat” content from accounts they seek to make more popular. According to leaked TikTok documents, “[t]he heating feature refers to boosting videos into the For You feed through operation intervention to achieve a certain number of video views” [4].
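The assembly stage can likewise be sketched in a few lines. The diversity rule and ad cadence below are hypothetical simplifications of the adjustments described above:

```python
def assemble_feed(ranked_items, ads, ad_every=5, max_per_author=2):
    """Order candidates by score, demote items from over-represented
    authors, and insert an ad after every `ad_every` organic items."""
    feed, author_counts, demoted = [], {}, []
    for item in sorted(ranked_items, key=lambda i: i["score"], reverse=True):
        count = author_counts.get(item["author"], 0)
        if count >= max_per_author:
            demoted.append(item)  # diversity rule: push repeats down
            continue
        author_counts[item["author"]] = count + 1
        feed.append(item)
    feed.extend(demoted)  # demoted, not removed

    # Business-goal insertion: one ad per `ad_every` organic items.
    final, ad_iter = [], iter(ads)
    for position, item in enumerate(feed, start=1):
        final.append(item)
        if position % ad_every == 0:
            ad = next(ad_iter, None)
            if ad is not None:
                final.append(ad)
    return final
```

A real assembly stage balances many more constraints (topic variety, platform-created content, “heated” items), but the structure is the same: ranking scores propose an order, and platform goals adjust it.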
2.2 Product design goals and limitations
Business goals for feed recommendation algorithms. All major social media platforms we study are operated by for-profit companies which, as a high-level goal, seek to maximize revenue while minimizing costs. These platforms currently pursue advertising as a primary (but not exclusive) source of revenue [72, 76]. This ad-driven business model creates specific constraints and incentives for platforms beyond simply maximizing user preferences and adds advertisers to the growing list of stakeholders for these multi-stakeholder systems. For example, social media companies must maintain an acceptable brand reputation so that advertisers remain willing to run ads on their platform [68]. However, not all social media advertising is sold in the same way.
Digital advertising is often grouped into two broad categories: brand advertising and performance advertising [59]. Performance advertising seeks to drive a specific action by the consumer, typically a purchase. Brand advertising seeks to increase consumers’ awareness of a brand and its products so that they might make a purchase later. Because users are much less likely to immediately click on ads that play on video-focused platforms, it is harder to measure the impact of ads on these platforms. Despite attempts to build better customer tracking [55], YouTube is still primarily seen as a platform for brand advertising. By contrast, Facebook is widely seen as a platform for performance advertising [70] and has built its ad network around metrics tracked by performance advertisers [43]. The needs of these different monetization models are visible in these companies’ respective feed algorithms, in ways we will discuss in the next section.
Proxy outcomes as substitutes for desired outcomes. Feed algorithms’ stated goals as described to users are sometimes difficult to directly measure. For example, a commonly stated goal is to serve users what they “find interesting” [56]. But how can algorithms learn to predict what content each user would say meets this criterion? Platforms could ask users directly whether they liked each piece of content after it is displayed, but we are not aware of any platform that does this; some conduct user studies along these lines in limited circumstances [18]. In practice, companies typically use proxy outcomes that they believe are related to the stated goal of serving users content they enjoy. As engineers from YouTube have stated, “It is important to emphasize that recommendation often involves solving a surrogate problem and transferring the result to a particular context” [11]. For instance, many platforms use user engagement with content (i.e., likes, comments, or reshares) as a proxy for content users enjoy seeing. Another common proxy goal is user time spent on the platform, in the form of session length or total time spent over a week or month. TikTok uses this proxy goal. Its interface—which shows one piece of content at a time and doesn’t move to the next one until the user swipes—is designed to ensure that users always provide feedback about how long they are willing to watch each video, giving the algorithm extremely granular information about what content serves this goal.
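As an illustration of how a proxy outcome might be constructed, the sketch below turns observable engagement signals into a training label standing in for the unmeasurable goal “the user found this interesting.” The signal names and weights are our own hypothetical choices, not any platform’s actual recipe:

```python
def proxy_label(event: dict) -> float:
    """Build a proxy training label from observable engagement signals."""
    label = 0.0
    if event.get("liked"):
        label += 1.0
    if event.get("commented"):
        label += 2.0
    if event.get("reshared"):
        label += 3.0
    # Watch time as a fraction of item length captures passive interest.
    if event.get("video_length_s"):
        label += event.get("watch_time_s", 0.0) / event["video_length_s"]
    return label
```

Whatever enters this label is, by construction, what the ranking model learns to maximize, which is why the choice of proxy matters so much.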
Non-content recommendations drive user feature discovery. Systems that employ feed recommendation algorithms will also sometimes recommend other users to follow, topics, or groups as a mechanism to prompt users to explicitly give the recommendation algorithm more data about the content they would watch or engage with.
Table 1. Taxonomy Categories Overview

| Inventory | Ranking | Optimization Objectives |
| --- | --- | --- |
| Directly Connected | Content-based | Usage Intensity |
| Indirectly Connected | User-based | Timeliness |
| Unconnected | | Novelty |
| | | Specificity |
Such priming is particularly important for platforms whose recommendation algorithms rely heavily on user-based, or social, features because such algorithms are prone to cold-start problems, in which the algorithm might not have enough information early in its use to work well [56, 71].
3. Social Media Feed Algorithm Comparison Framework
In this section, we propose a framework for comparing recommendation algorithms for social media content feeds by analogizing key characteristics of algorithms to features of cars. We focus our framework on areas where the differences between the platforms we study create different outcomes for users by evaluating three key elements: the pools of inventory a user’s feed is sourced from, the data features that are extracted from content and users, and the optimization characteristics of the ranking algorithms that use those data features to prioritize content. We are not aware of any rigorous research on the safety or security profiles of algorithmic feeds or their major features. However, where appropriate, we will note areas where safety concerns exist and where further research to validate or refute those concerns is warranted.
3.1 Inventory selection: the engine
There are three pools of content that the inventory for a particular user’s feed is drawn from:
Directly connected: Some social media feed algorithms prioritize content from accounts and sources that the user has affirmatively signaled they are interested in. On platforms such as YouTube, this takes the form of a user subscribing to another account [18], and all platforms we study have some form of this signaling. Directly connected content is the scarcest of the three pools, but the user’s interest in it is the most certain.
Indirectly connected: Many platforms, including X and Facebook, source content that accounts the user has indicated an interest in have themselves engaged with [41, 75]. For example, instead of seeing content from a user they have followed, a Facebook user might see content their friends have commented on or liked.
Unconnected: Content that a user has not directly indicated an interest in, and that platforms have not inferred an interest in, is also used to some degree by all platforms we study. TikTok, for example, includes this kind of content as a way of increasing novelty and discovering new user interests [71].
These categorizations are already widely used by platforms in their transparency efforts when describing algorithmic feed construction for the general public [41, 74]. Directly connected content is a bit like a bicycle crank: inherently driven by the user and therefore limited in range by the user’s efforts. Indirectly connected content is still limited by the user, but it has more range—a bit like an e-bike with an electrically assisted crank. By contrast, a system that relies on unconnected content is more like a gasoline engine, because, while the ‘fuel’ for this engine is entirely external to the user, this lack of connection gives the system much more range and speed. Many of the safety concerns raised by critics relate to this unconnected content. This is because when algorithms have a higher mix of unconnected content, users see more content from outside their own social circle—content that they have never affirmatively signaled they wish to see and that may have a higher likelihood of being unwanted or inappropriate for a wide range of reasons. Just as in real vehicles, the motors that power algorithmic feeds can be (and often are) hybrid systems that use connected and unconnected inventory together, as the sketch below illustrates.
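A hybrid inventory stage might look like the following sketch, assuming hypothetical per-pool fetch functions and a fixed mixing ratio. The 25/25/50 split is illustrative, loosely following the roughly half-connected, half-unconnected mix X has described for its For You feed [80]:

```python
import random

# Illustrative pool mix for a hypothetical hybrid feed.
POOL_MIX = {"direct": 0.25, "indirect": 0.25, "unconnected": 0.50}

def select_inventory(user_id: str, n_candidates: int, fetchers: dict) -> list:
    """Draw candidates from each pool in proportion to POOL_MIX.

    `fetchers` maps a pool name to a function returning eligible items
    for this user from that pool (all hypothetical here).
    """
    inventory = []
    for pool, share in POOL_MIX.items():
        k = int(n_candidates * share)
        inventory.extend(fetchers[pool](user_id, k))
    random.shuffle(inventory)  # ranking, not fetch order, decides placement
    return inventory
```

Shifting the values in POOL_MIX is the tuning knob that makes one platform feel like a bicycle and another like a sports car.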
3.2 Features for feed algorithms: fuel types
Feed algorithms are typically described as relying on either content-based features [54] or user network-based features (typically referred to in the academic literature as “collaborative filtering”) [17]. Significant research has focused on the impacts on spread of some of the user network-based features, in particular as vectors for “online social contagion” that can spread content quickly within a social group [16, 65]. Less research has been done on content-based features, but some work has focused on the political and factual dimensions of content and its authors, finding that more extreme partisan and less factual content received more user engagement [14].
Building on this distinction, we develop two categories of features for feed algorithms: content-based features and user-based features. Content-based features can be the content itself (such as the text of a post) or metadata directly associated with the content, such as the length of the text. User-based features can contain information about one or more users, including the relationships between users. User-based features are most obviously inferred about the content consumer, but they are also inferred about the content creator. (Here, when we refer to the content creator, we simply mean the user who created the content—not the increasingly common job title of “content creator.”) Certain features simply don’t apply to some systems. For example, TikTok has no sense of connected content because, while it lets users friend other users, it doesn’t use features related to that social graph as part of its For You algorithm.
User-based and content-based features aren’t mutually exclusive and are often used together. For example, a user’s inferred topic interests may be used in conjunction with a piece of content’s inferred topic. Unlike our classifications for inventory sources, this classification of features is not exhaustive; instead, our goal has been to create a representative list. We expect that new data sources will regularly be created and new types of data source features will be developed over time.
Content-based feature examples
Content topics: Small machine learning models called content classifiers are frequently used to identify content topics [71]. These can also be inferred from hashtags, which are often included in the content text [13].
Content language: The language of the content is sometimes determined based on the content itself, but this is not always possible. This can also be inferred from the creator’s language, which may have been directly provided or inferred from previously created content [18, 56, 71].
Content named entities: Named entities can be people, companies, or well-known locations. These are sometimes topics but are often considered separately because they are just as often context that frames another topic as they are the topic itself [60].
User-based feature examples
Social graph network: A user’s network is the collection of either one-way social ties, such as a fan following a celebrity or a local business’s account, or two-way social ties, such as two friends who follow each other [41, 56]. All these ties together form a social graph.
Social graph user behavior: The platform derives a smaller graph of the ties a user interacts with most. These ties are sometimes referred to as the user’s close friends and are tracked separately [60].
On-platform behavior: A user’s on-platform actions, including what videos they watch, which items they react to, and which items they comment on are commonly observed as an indirect measure of user interests [71].
Demographics: These include (relatively) immutable characteristics of an individual, including age and race. Income and family status can also be tracked or inferred [37].
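To ground this taxonomy, here is a minimal sketch of how the two feature families might be represented as inputs to a ranking model. The fields are illustrative assumptions, not any platform’s actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class ContentFeatures:
    """Features derived from the item itself or its metadata."""
    topics: list[str] = field(default_factory=list)        # classifiers, hashtags
    language: str | None = None                            # detected or creator-inferred
    named_entities: list[str] = field(default_factory=list)
    text_length: int = 0

@dataclass
class UserFeatures:
    """Features describing the consumer (or creator) and their ties."""
    followed_accounts: set[str] = field(default_factory=set)         # social graph
    close_friends: set[str] = field(default_factory=set)             # high-interaction ties
    topic_affinities: dict[str, float] = field(default_factory=dict) # on-platform behavior
    age_bracket: str | None = None                                   # demographics
```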
3.3 Feed algorithm ranking characteristics
Content feed algorithms are fundamentally designed to solve an optimization problem: to identify the content that maximizes the value of a mathematical function that represents the overall goals of the algorithm. As we have discussed, the business model of social media companies is ad-driven. But in order to meet the needs of advertisers over the long term, platforms must serve several—sometimes competing—goals for other stakeholders as well, including maximizing usage over the short and long term and delivering enough value to content creators that they will continue to make inventory for other users to consume. The connection between these broad objectives and the recommendation of any particular item is indirect and difficult to measure. Instead, content feed algorithms are designed to balance several different optimization objectives to serve the overall goal of maximizing usage. Here, we compare each of these design characteristics to different features of vehicles. In the case of cars or trains, designers make different design choices depending on a vehicle’s purpose and operating conditions; designers of content feed algorithms adhere to similar principles.
We classify feed algorithm optimization goals along four dimensions: usage intensity, specificity, novelty, and content timeliness. Along each of these dimensions, an algorithm can be classed as high, medium, or low. We identified these dimensions through our review of documents describing ranking functions and of the code implementing them.
Usage intensity: exposure to the elements
All the major social media platforms that we studied primarily optimize their recommendations for usage. However, not all forms of usage are equal, and there appears to be a trade-off between how intense usage is and how long it can be sustained. High intensity usage is very interactive, involving the user taking more frequent direct actions in relation to the content they are consuming, such as commenting on or reacting to posts. By contrast, low intensity usage is passive and involves the user simply watching content with little to no interactivity. Low intensity usage appears to be sustainable for longer periods of time, whereas high intensity usage creates more user reaction and engagement. This can create problems. As Mark Zuckerberg has said, “left unchecked, people will engage disproportionately with more sensationalist and provocative content” [83]. Shocking or extreme content can intensely engage users, but it is easy to see why such content can also trigger a user to end their session quickly.
Notably, in 2012 when YouTube shifted their algorithm to maximize watch time, total video views and engagement decreased [18, 44]. After Facebook’s shift in 2018 to prioritizing high intensity usage, which it refers to as Meaningful Social Interaction (MSI), time spent on the platform decreased [10, 84]. This high intensity usage is best measured by user content interactions such as reactions, comments, or reshares, and low intensity usage is best measured by user session time. In practice, even platforms that strongly prioritize either high or low intensity usage incorporate elements of both into their ranking goals.
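Both forms of usage are straightforward to measure from an event log. A minimal sketch, assuming hypothetical session events that each carry a type and a timestamp in seconds:

```python
INTERACTION_TYPES = {"like", "comment", "reshare"}

def usage_metrics(session_events: list[dict]) -> dict:
    """Summarize one session: interaction count captures high intensity
    usage; session length captures low intensity usage."""
    if not session_events:
        return {"interactions": 0, "session_minutes": 0.0}
    timestamps = [event["ts"] for event in session_events]
    interactions = sum(
        1 for event in session_events if event["type"] in INTERACTION_TYPES
    )
    return {
        "interactions": interactions,
        "session_minutes": (max(timestamps) - min(timestamps)) / 60,
    }
```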
Returning to our car analogy, intensity can be thought of as how exposed a driver might be to the road and the environment. A high intensity algorithm might be like an open-topped race car, where the driver can feel every pebble on the road and the wind in their hair (and every drop of rain, too). A low intensity algorithm is more like a Rolls-Royce, where the driver is totally isolated from any discomfort (or sensation) of the ride.
Specificity: passenger capacity
Another key area of difference between the content feed algorithms of different social media platforms is the specificity of content they recommend to users. Some platforms focus on identifying content with low specificity that therefore has broad appeal and then showing that content to as many users as possible. Others focus more on showing users content that is highly specific to their interests or networks, even if that content is niche. This factor is sometimes referred to as “personalization” or a “filter bubble” when niche interests are particularly prioritized, and understanding the impact of this factor on users has been of significant interest to researchers [33–35, 52]. Algorithms that seek to prioritize more broadly appealing content do this by heavily penalizing predicted negative user outcomes, such as a user asking to see less of that type of content. Platforms that focus on more niche content risk occasionally showing users content that they do not like and are more tolerant of occasional negative reactions.
On platforms with more user-based ranking features, more specific content is often more directly connected content. However, on platforms that rely on more content-based features, more specific content is niche content that the individual user is thought to prefer. Different content categories are known to be appealing to broader or more niche audiences (think cats vs. conspiracy theories), just as some users in social networks are more connected than others. Separate from what system builders intend, some content recommendation systems bias toward broad content because broad content receives more signals from collaborative features overall—because it is more broadly appealing, more people will watch the video or click a ‘like’ button. This problem is usually referred to as “popularity bias” [3]. Just as vehicles can be two-seaters or buses, content recommendation algorithms can similarly be designed to cater to either small or large audiences.
There is also a manifestation of specificity in the inequality of the distribution of total user views over content on a platform. A perfectly specific recommendation algorithm would recommend each piece of content to only one person, resulting in a perfectly flat distribution with each piece of content having the same number of views: one. A perfectly unspecific algorithm would broadcast a single piece of content to all users, resulting in a maximally unequal distribution, with that one piece of content having all possible user views and all other content having zero views. In practice, platforms that prioritize high-specificity content have flatter distributions of views over content, while platforms that prioritize low-specificity content instead focus views on a smaller portion of the distribution [32, 39].
Because specificity focuses (or diffuses) user attention over a larger or smaller pool of content, there is a relationship between how specific an algorithm can be and how much content moderation capacity a social media platform will require. To understand why this is the case, consider a social network that on a given day has 100 pieces of content produced and viewed by its 1,000 viewers. If all content had an equal number of views (10), then in order to review the content seen by 90 percent of viewers (ideally before it was shown to those viewers), content moderators would need to review 90 items. If, on the other hand, there were 10 favored pieces of content that were shown to all users, and the other 90 posts weren’t seen by anyone, only nine pieces of content would need to be reviewed in order to hit the same threshold of content seen by 90 percent of viewers.
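Both the inequality of the view distribution and this moderation arithmetic are easy to make concrete. The sketch below reproduces the 90-versus-9 example above:

```python
def items_to_cover(views: list[int], share: float = 0.9) -> int:
    """How many items, reviewed in order of popularity, account for
    `share` of all content views?"""
    total, covered, count = sum(views), 0, 0
    for v in sorted(views, reverse=True):
        covered += v
        count += 1
        if covered >= share * total:
            break
    return count

flat = [10] * 100                # high specificity: 100 items, 10 views each
skewed = [1000] * 10 + [0] * 90  # low specificity: 10 items seen by everyone
print(items_to_cover(flat))      # 90
print(items_to_cover(skewed))    # 9
```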
Novelty: predictability & off-road capability
Platforms use several design features to learn about users’ interests, but they also insert a small amount of novel content that they have no reason to believe a user might be interested in as a strategy to learn new social connections and content interests. All feed recommendation systems engage in a high degree of novelty-seeking with new users about whom they have little information as a way of solving the “cold-start” problem [63]. Some platforms, most notably TikTok, also incorporate novelty as an ongoing part of the user experience, even for long-standing users [71, 77]. This need to discover new user interests will sometimes come at the expense of showing them content that they might like more right now. Thus, even high-novelty platforms primarily show users familiar content, interspersed with moderate doses of novelty. We believe the best empirical measure of this dimension is the share of content in users’ feeds that they are known to have a strong affinity for, either because of a content feature or social connection.
This can also be thought of as the ability of the algorithm to take the user to new or surprising places. Trains can ride only on rails and cannot go anywhere their tracks do not take them, while ATVs can go off-road and even cross streams and rocky trails; similarly, different feed algorithms are more or less predictable as to where they will take their users.
Content timeliness: speed
All feed recommendation algorithms that we have studied show some preference for more recent content over older content, but some of these preferences are very slight compared to others. For example, both YouTube and TikTok appear to place relatively little weight on the timeliness of content when compared to X or Reddit, which heavily prioritize recent content [56, 60]. It appears that at least some preference for newer content is needed simply to counteract the fact that older content has received more user interaction and that recommendation systems therefore have more information about older content than they do about newer content.
There are two primary empirical measurements of this outcome. One is the lifespan of content—the period of time it is actively being recommended in user feeds. On platforms that prioritize more recent content, the content lifespan will be shorter, while algorithms that are willing to recommend older content will result in more content getting views days, weeks, or even years after initial creation. The other relevant metric of timeliness is how quickly content accumulates user views after it is created, or the acceleration and speed of the algorithm.
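Both measurements can be estimated from per-item view logs. A minimal sketch, assuming hypothetical view ages measured in hours since each item was created:

```python
def content_half_life(view_ages_hours: list[float]) -> float:
    """Approximate age by which an item has received half of its
    lifetime views: a simple proxy for content lifespan."""
    ages = sorted(view_ages_hours)
    return ages[len(ages) // 2]

def early_view_share(view_ages_hours: list[float],
                     window_hours: float = 24) -> float:
    """Share of lifetime views arriving within `window_hours` of creation:
    a simple proxy for how hard an algorithm accelerates content."""
    early = sum(1 for age in view_ages_hours if age <= window_hours)
    return early / len(view_ages_hours)

# A timeliness-heavy feed front-loads views:
fast = [0.5, 1, 2, 3, 5, 8, 12, 20, 30, 72]
print(content_half_life(fast))  # 8 (hours)
print(early_view_share(fast))   # 0.8
```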
More simply, content timeliness can be thought of as the speed of an algorithm. Just as some cars have bigger engines that allow them to travel at high speeds while others have tiny engines that struggle to keep up on the highway, different content feeds are designed to accelerate content at different paces. As with specificity, there is an inherent relationship between timeliness and moderation capacity, albeit a different one. The faster an algorithm accelerates content, the earlier in its lifecycle the content receives the bulk of the attention it will ever get. Feed algorithms that are relatively slower also leave more time to review content before items are shown to many users, whereas faster algorithms would require similarly paced review systems if they are to identify content that violates platform policies before it becomes widely seen.
4. Feed Cards
Taking inspiration from user-facing transparency mechanisms such as model cards [46] and privacy nutrition labels [30], we propose feed cards, a kind of window sticker for algorithmic feeds. In designing feed cards, we attempt to maximize comparability between algorithms, one of the primary goals of our comparison framework. Our framework classifies an algorithmic feed’s inventory sources, ranking features, and ranking algorithms. Feed cards display inventory sources as a stacked bar chart, a form that has repeatedly been found to be interpretable by a wide audience for proportions [66]. We do not believe that exact quantification of ranking features is practical or informative for users, because the exact ratios of features are almost constantly in flux. Instead, we recommend listing the top three most important features in order of importance. Finally, our comparison framework classifies ranking algorithms as high, medium, or low along the four dimensions we define. On our prototype feed card, we represent these dimensions as a circular lollipop graph because these graphs have commonly been used to represent multivariate data in a way that allows for easy legibility for a general audience [62]. For some dimensions, platforms have not made data about their systems transparent enough for us to make an exact classification of high/medium/low. In these cases, we have shaded a larger range of the area around the lollipop to indicate that range of uncertainty.
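As an illustration of the rendering itself, here is a minimal matplotlib sketch of the two core elements of a feed card: a stacked composition bar and a circular lollipop of the four ranking dimensions. The values are illustrative, loosely following the X card in Figure 2:

```python
import numpy as np
import matplotlib.pyplot as plt

composition = {"Directly connected": 0.25, "Indirectly connected": 0.25,
               "Unconnected": 0.50}
dimensions = {"Usage intensity": 3, "Timeliness": 3,
              "Novelty": 2, "Specificity": 1}  # 3=high, 2=medium, 1=low

fig = plt.figure(figsize=(7, 3.5))

# Stacked bar: feed composition by inventory pool.
ax1 = fig.add_subplot(1, 2, 1)
left = 0.0
for label, share in composition.items():
    ax1.barh(0, share, left=left, label=label)
    left += share
ax1.set_xlim(0, 1)
ax1.set_yticks([])
ax1.set_title("Feed composition")
ax1.legend(fontsize=7, loc="lower center")

# Circular lollipop: ranking-goal classifications.
ax2 = fig.add_subplot(1, 2, 2, projection="polar")
angles = np.linspace(0, 2 * np.pi, len(dimensions), endpoint=False)
values = list(dimensions.values())
ax2.vlines(angles, 0, values)  # stems
ax2.scatter(angles, values)    # lollipop heads
ax2.set_xticks(angles)
ax2.set_xticklabels(list(dimensions), fontsize=7)
ax2.set_yticks([1, 2, 3])
ax2.set_yticklabels(["low", "med", "high"], fontsize=7)
ax2.set_ylim(0, 3.5)
ax2.set_title("Ranking goals")

fig.tight_layout()
plt.savefig("feed_card_sketch.png", dpi=200)
```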
Fig. 2. Prototype feed card for X (formerly Twitter)
Vehicle illustration: Stefano Grassi/RIVA Illustrations.
Feed card: Laura Edelson, Frances Haugen, & Damon McCoy
In Figure 2, we show an example feed card for X, based on the results from our case study in Appendix C. For inventory sources, unconnected content makes up 50 percent of inventory, and directly connected and indirectly connected content make up 25 percent each. The top three ranking features are the social graph network, social graph user behavior, and consumer interests. For ranking goals, we classify X as high usage intensity, high content timeliness, medium novelty, and low specificity.
In Figures 3 through 6, we also present feed cards for TikTok, Reddit, Facebook, and YouTube. These feed cards are based on the sources we reviewed in section 3. Note that for platforms for which we do not know the exact breakdown of the feed composition, the “Feed Composition” bar reflects this by showing blurred lines between content categories. Sources for the feed cards for each social media platform are listed in Appendix D.
Fig. 3. Prototype feed card for TikTok
Vehicle illustration: Stefano Grassi/RIVA Illustrations.
Feed card: Laura Edelson, Frances Haugen, & Damon McCoy
Fig. 4. Prototype feed card for Reddit
Vehicle illustration: Stefano Grassi/RIVA Illustrations.
Feed card: Laura Edelson, Frances Haugen, & Damon McCoy
Fig. 5. Prototype feed card for Facebook
Vehicle illustration: Stefano Grassi/RIVA Illustrations.
Feed card: Laura Edelson, Frances Haugen, & Damon McCoy
Fig. 6. Prototype feed card for YouTube
Vehicle illustration: Stefano Grassi/RIVA Illustrations.
Feed card: Laura Edelson, Frances Haugen, & Damon McCoy
5. Discussion and Conclusions
It is clear to us from our extensive study of these five social media feed algorithms that the significant differences in design are primarily driven by differences in the companies’ business and monetization strategies. The algorithms companies develop are not neutral, nor are they fixed. For the most part, companies are trying to achieve product design goals through their algorithmic feeds that will maximize revenue over the short and long term by recruiting and retaining audiences and content creators and monetizing them.
Social media algorithms are machines built by human hands, made to capture human attention. The people who design and build them, and the companies that put them into the world, bear responsibility for what they do, both for good and ill. To be clear, we believe that social media has brought many good things to society. But it has also brought harm, particularly to its youngest users. It is this harm that has caused the greatest public outcry in recent years, as users have demanded safer systems and greater accountability.
Identifying and mitigating harmful consequences is a struggle that we have experienced with other new technologies. Improvements in automotive safety were also initially spurred on by tragedy, brought to the public’s attention by safety advocates like Ralph Nader [49]. When the public became aware of not only the number of deaths these products caused but of the fact that these deaths were caused by product designs that did not prioritize user safety, the demand for change became overwhelming [28, 36].
We do not believe that the people who build algorithmic feeds want to build products that harm their users. But at present, they have little incentive to prioritize factors that could mitigate harms, much less to build entirely new kinds of feeds aimed at other goals. Changing this calculus requires equipping the public with better information about how social media systems work. It is our hope that independent research like this paper, combined with efforts to require more transparency from social media companies, will lead to more successful efforts to hold companies to account for the design choices they make, and to building social media tools that put users first.
References
- 2024. Qualitative Document Review Artifacts. https://osf.io/437vh/?view_only=7780931ede0947aa834f1858d96aa40d.
- Himan Abdollahpouri and Robin Burke. 2019. Multi-stakeholder recommendation and its connection to multi-sided fairness. arXiv preprint arXiv:1907.13158 (2019).
- Himan Abdollahpouri, Robin Burke, and Bamshad Mobasher. 2017. Controlling popularity bias in learning-to-rank recommendation. In Proceedings of the eleventh ACM conference on recommender systems. 42–46.
- Emily Baker-White. 2023. TikTok’s Secret ‘Heating’ Button Can Make Anyone Go Viral. https://www.forbes.com/sites/emilybaker-white/2023/01/20/tiktoks-secret-heating-button-can-make-anyone-go-viral/.
- Robert M. Bell and Yehuda Koren. 2007. Lessons from the Netflix prize challenge. ACM SIGKDD Explorations Newsletter 9, 2 (2007), 75–79.
- James Bennett, Stan Lanning, et al. 2007. The Netflix prize. In Proceedings of KDD cup and workshop, Vol. 2007. New York, 35.
- Paul Bouchaud, David Chavalarias, and Maziyar Panahi. 2023. Crowdsourced audit of Twitter’s recommender systems. Scientific Reports 13, 1 (2023), 16815.
- Glenn A. Bowen. 2009. Document analysis as a qualitative research method. Qualitative research journal 9, 2 (2009), 27–40.
- Jean Burgess and Joshua Green. 2018. YouTube: Online video and participatory culture. Polity Press. 1–5 pages.
- Josh Constantine. 2018. Facebook’s U.S. user count declines as it prioritizes well-being. https://techcrunch.com/2018/01/31/facebook-time-spent/.
- Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for Youtube recommendations. In Proceedings of the 10th ACM conference on recommender systems. 191–198.
- Michael A. DeVito. 2017. From editors to algorithms: A values-based approach to understanding story selection in the Facebook news feed. Digital journalism 5, 6 (2017), 753–773.
- Hayley Dorney. 2023. The dos and don’ts of hashtags. https://web.archive.org/web/20240101081742/https://business.twitter.com/en/blog/the-dos-and-donts-of-hashtags.html.
- Laura Edelson, Minh-Kha Nguyen, Ian Goldstein, Oana Goga, Damon McCoy, and Tobias Lauinger. 2021. Understanding engagement with U.S. (mis)information news sources on Facebook. In Proceedings of the 21st ACM Internet Measurement Conference. ACM. https://doi.org/10.1145/3487552.3487859
- fbarchive.org. 2023. fbarchive. https://fbarchive.org/.
- Sharad Goel, Ashton Anderson, Jake Hofman, and Duncan J. Watts. 2015. The Structural Virality of Online Diffusion. Manag. Sci. (2015). https://doi.org/10.1287/MNSC.2015.2158
- David Goldberg, David Nichols, Brian Oki, and Douglas Terry. 1992. Using collaborative filtering to weave an information tapestry. Commun. ACM 35, 12 (1992), 61–70.
- Cristos Goodrow. 2021. On YouTube’s recommendation system. https://blog.youtube/inside-youtube/on-youtubes-recommendation-system/.
- Google. 2023. Google Scholar. https://scholar.google.com/.
- Furkan Gursoy and Ioannis A. Kakadiaris. 2022. System cards for AI-based decision-making for public policy. arXiv preprint arXiv:2203.04754 (2022).
- Ido Guy, Naama Zwerdling, Inbal Ronen, David Carmel, and Erel Uziel. 2010. Social media recommendation based on people and tags. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. 194–201.
- Keach Hagey and Jeff Horwitz. 2021. Facebook Tried to Make Its Platform a Healthier Place. It Got Angrier Instead. https://www.wsj.com/articles/facebook-algorithm-change-zuckerberg-11631654215.
- F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4 (2015), 1–19.
- Christopher M. Hoadley, Heng Xu, Joey J. Lee, and Mary Beth Rosson. 2010. Privacy as information access and illusory control: The case of the Facebook News Feed privacy outcry. Electronic commerce research and applications 9, 1 (2010), 50–60.
- Jeff Horwitz. 2021. Facebook Files. https://www.wsj.com/articles/the-facebook-files-11631713039.
- Jeff Horwitz. 2023. Broken Code: Inside Facebook and the Fight to Expose Its Harmful Secrets. Knopf Doubleday Publishing Group. 336 pages. https://books.google.com/books?id=-bSmEAAAQBAJ
- Akshay Java, Xiaodan Song, Tim Finin, and Belle Tseng. 2007. Why we twitter: understanding microblogging usage and communities. In Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis. 56–65.
- Christopher Jensen. 2015. 50 Years Ago, ‘Unsafe at Any Speed’ Shook the Auto World. https://www.nytimes.com/2015/11/27/automobiles/50-years-ago-unsafe-at-any-speed-shook-the-auto-world.html.
- Jussi Karlgren. 1990. An algebra for recommendations: Using reader data as a basis for measuring document proximity. https://jussikarlgren.wordpress.com/wp-content/uploads/1990/09/algebrawp.pdf.
- Patrick Gage Kelley, Joanna Bresee, Lorrie Faith Cranor, and Robert W. Reeder. 2009. A" nutrition label" for privacy. In Proceedings of the 5th Symposium on Usable Privacy and Security. 1–12.
- Bart P. Knijnenburg, Martijn C. Willemsen, Zeno Gantner, Hakan Soncu, and Chris Newell. 2012. Explaining the user experience of recommender systems. User modeling and user-adapted interaction 22 (2012), 441–504.
- Tomo Lazovich, Luca Belli, Aaron Gonzales, Amanda Bower, Uthaipon Tantipongpipat, Kristian Lum, Ferenc Huszar, and Rumman Chowdhury. 2022. Measuring disparate outcomes of content recommendation algorithms with distributional inequality metrics. Patterns 3, 8 (2022).
- Erwan Le Merrer, Gilles Tredan, and Ali Yesilkanat. 2023. Modeling rabbit-holes on YouTube. Social Network Analysis and Mining 13, 1 (2023), 100.
- Mark Ledwich and Anna Zaitsev. 2019. Algorithmic extremism: Examining YouTube’s rabbit hole of radicalization. arXiv preprint arXiv:1912.11211 (2019).
- Mark Ledwich, Anna Zaitsev, and Anton Laukemper. 2022. Radical bubbles on YouTube? Revisiting algorithmic extremism with personalised recommendations. First Monday (2022).
- Matthew T. Lee. 1998. The Ford Pinto case and the development of auto safety regulations, 1893—1978. Business and Economic History (1998), 390–401.
- Meiying Li, Jinyi Yao, and Bradley Ray Green. 2021. US Patent 11,163,843 B1: Systems and Methods for Recommending Content. https://imageppubs.uspto.gov/dirsearch-public/print/downloadPdf/11163843.
- Nora McDonald, Sarita Schoenebeck, and Andrea Forte. 2019. Reliability and Inter-Rater Reliability in Qualitative Research: Norms and Guidelines for CSCW and HCI Practice. Proc. ACM Hum.-Comput. Interact. 3, CSCW, Article 72 (Nov. 2019), 23 pages. https://doi.org/10.1145/3359174
- Ryan McGrady, Kevin Zheng, Rebecca Curran, Jason Baumgartner, and Ethan Zuckerman. 2023. Dialing for Videos: A Random Sample of YouTube. Journal of Quantitative Description: Digital Media 3 (2023).
- Miller McPherson, Lynn Smith-Lovin, and James M. Cook. 2001. Birds of a feather: Homophily in social networks. Annual review of sociology 27, 1 (2001), 415–444.
- Meta. 2021. How machine learning powers Facebook’s News Feed ranking algorithm. https://engineering.fb.com/2021/01/26/core-infra/news-feedranking/.
- Meta. 2022. What is the Instagram Feed? https://ai.meta.com/tools/system-cards/instagram-feed-ranking/.
- Meta. 2023. Maximize your performance. https://www.facebook.com/business/ads/performance-marketing/.
- Eric Meyerson. 2012. YouTube Now: Why We Focus on Watch Time. https://blog.youtube/news-and-events/youtube-now-why-we-focus-onwatch-time/.
- Silvia Milano, Mariarosaria Taddeo, and Luciano Floridi. 2021. Ethical aspects of multi-stakeholder recommendation systems. The information society 37, 1 (2021), 35–45.
- Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. 2019. Model cards for model reporting. In Proceedings of the conference on fairness, accountability, and transparency. 220–229.
- Sudip Mittal, Neha Gupta, Prateek Dewan, and Ponnurangam Kumaraguru. 2013. The pin-bang theory: Discovering the Pinterest world. arXiv preprint arXiv:1307.4952 (2013).
- Haradhan Kumar Mohajan. 2018. Qualitative research methodology in social sciences and related subjects. Journal of economic development, environment and people 7, 1 (2018), 23–48.
- Ralph Nader. 1965. Unsafe at Any Speed: The Designed-in Dangers of the American Automobile. Grossman. https://books.google.com/books?id=iCckAAAAMAAJ
- Arvind Narayanan. 2023. Understanding Social Media Recommendation Algorithms. Knight First Amendment Institute. (2023). https://knightcolumbia.org/content/understanding-social-media-recommendation-algorithms
- Arvind Narayanan. 2023. Twitter showed us its algorithm. What does it tell us? Knight First Amendment Institute. Retrieved April15 (2023), 2023.
- Derek O’Callaghan, Derek Greene, Maura Conway, Joe Carthy, and Pádraig Cunningham. 2015. Down the (white) rabbit hole: The extreme right and online recommender systems. Social Science Computer Review 33, 4 (2015), 459–478.
- United States Patent and Trademark Office. 2023. Patent Public Search. https://ppubs.uspto.gov/pubwebapp/static/pages/landing.html.
- Michael J. Pazzani and Daniel Billsus. 2007. Content-based recommendation systems. In The adaptive web: methods and strategies of web personalization. Springer, 325–341.
- Ekaterina Petrova. 2018. 4 questions for YouTube’s performance ads product lead. https://www.thinkwithgoogle.com/marketing-strategies/video/ performance-video-ads/.
- Reddit. 2023. How Reddit Personalizes Content and Community Recommendations. https://support.reddithelp.com/hc/en-us/articles/360056999452How-Reddit-Personalizes-Content-and-Community-Recommendations.
- Elaine Rich. 1979. User modeling via stereotypes. Cognitive science 3, 4 (1979), 329–354.
- Gary Rivlin. 2006. Wallflower at the Web Party. New York Times, https://www.nytimes.com/2006/10/15/business/yourmoney/15friend.html
- Joel Rubinson. 2019. Framework for managing brand vs performance advertising. https://blog.joelrubinson.net/2019/04/framework-for-managingbrand-vs-performance-advertising/.
- Venu Satuluri, Yao Wu, Xun Zheng, Yilei Qian, Brian Wichers, Qieyun Dai, Gui Ming Tang, Jerry Jiang, and Jimmy Lin. 2020. SimClusters: Community-based representations for heterogeneous recommendations at Twitter. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining. 3183–3193.
- Sanchan Sahai Saxena, Sergey Markov, Fei Wang, Yi-Wei Wu, Ed Ignatius Tanghal Salvana, William Taube Schurman, and Youssef Ahres. 2022. US Patent 11,245,966 B2: Matching and Ranking Content Items. https://image-ppubs.uspto.gov/dirsearch-public/print/downloadPdf/11245966.
- Jonathan Schwabish. 2021. Better Data Visualizations: A Guide for Scholars, Researchers, and Wonks. Columbia University Press. 67–132 pages.
- Suvash Sedhain, Scott Sanner, Darius Braziunas, Lexing Xie, and Jordan Christensen. 2014. Social collaborative filtering for cold-start recommendations. In Proceedings of the 8th ACM Conference on Recommender Systems. 345–348.
- Aneesh Sharma, Jerry Jiang, Praveen Bommannavar, Brian Larson, and Jimmy Lin. 2016. GraphJet: Real-time content recommendations at Twitter. In Proceedings of the VLDB Endowment 9, 13 (2016), 1281–1292.
- Jieun Shin, Lian Jian, Kevin Driscoll, and François Bar. 2018. The diffusion of misinformation on social media: Temporal pattern, message, and source. Computers in Human Behavior (2018). https://doi.org/10.1016/J.CHB.2018.02.008
- Harri Siirtola. 2019. The Cost of Pie Charts. In 2019 23rd International Conference Information Visualisation (IV). 151–156. https://doi.org/10.1109/IV. 2019.00034
- Statista. 2024. Most popular social networks worldwide as of January 2024, ranked by number of monthly active users. https://web.archive.org/web/20240210093015/https://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/
- Peter Suciu. 2023. X Has Become A Full-Fledged PR Disaster But It Won’t Go Away Anytime Soon. https://www.forbes.com/sites/petersuciu/2023/ 12/08/x-has-become-a-full-fledged-pr-disaster-but-it-wont-go-away-anytime-soon/?sh=7271071b3324.
- Özge Sürer, Robin Burke, and Edward C. Malthouse. 2018. Multistakeholder recommendation with provider constraints. In Proceedings of the 12th ACM Conference on Recommender Systems. 54–62.
- Adobe Communications team. 2022. The definitive guide to performance marketing. https://business.adobe.com/blog/basics/performance-marketing.
- TikTok. 2020. How TikTok recommends videos #ForYou. https://newsroom.tiktok.com/en-us/how-tiktok-recommends-videos-for-you.
- Twitter. 2022. About Twitter Blue. https://web.archive.org/web/20221226051236/https://help.twitter.com/en/using-twitter/twitter-blue.
- Twitter. 2023. Features Overview. https://github.com/twitter/the-algorithm-ml/blob/main/projects/home/recap/FEATURES.md.
- Twitter. 2023. Heavy Ranker. https://github.com/twitter/the-algorithm-ml/blob/b85210863f7a94efded0ef5c5ccf4ff42767876c/projects/home/recap/README.md.
- Twitter. 2023. Twitter’s Recommendation Algorithm. https://web.archive.org/web/20250105183703/https://blog.x.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm.
- YouTube. 2023. YouTube and YouTube Music ad-free, offline, and in the background. https://www.youtube.com/premium.
- Karan Vombatkere, Sepehr Mousavi, Savvas Zannettou, Franziska Roesner, and Krishna P. Gummadi. 2024. TikTok and the Art of Personalization: Investigating Exploration and Exploitation on Social Media Feeds. arXiv preprint arXiv:2403.12410 (2024).
- Zhi Wang, Wenwu Zhu, Peng Cui, Lifeng Sun, and Shiqiang Yang. 2013. Social media recommendation. Social media retrieval (2013), 23–42.
- Collision Week. 2019. U.S. Roadway Fatalities Decline for Second Straight Year. https://collisionweek.com/2019/10/23/u-s-roadway-fatalities-decline-second-straight-year/.
- X. 2023. Twitter’s Recommendation Algorithm. https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm.
- Xiwang Yang, Yang Guo, Yong Liu, and Harald Steck. 2014. A survey of collaborative filtering based social recommender systems. Computer Communications 41 (2014), 1–10.
- Hongzhi Yin and Bin Cui. 2016. Spatio-temporal recommendation in social media. Springer.
- Mark Zuckerberg. 2018. A blueprint for content governance and enforcement. https://web.archive.org/web/20181117034615/https://www.facebook.com/notes/mark-zuckerberg/a-blueprint-for-content-governance-and-enforcement/10156443129621634/.
- Mark Zuckerberg. 2018. https://www.facebook.com/zuck/posts/10104413015393571.
A. Background & Related Work
This section provides background information on the systems we study in this work and the tools of analysis and evaluation we use. We also describe related work to contextualize our transparency proposals for readers interested in this broader field.
A.1 Historical overview of content recommender systems
The first recommender system for digital content of which we are aware was described by Rich in 1979 [57]. This system, known as “Grundy,” recommended books by matching their content against profiles of users’ interests, or “stereotypes.” The system relied on a relatively rich understanding of the content being recommended and at least a superficial understanding, in the form of user profiles, of the user interests that might be matched to the characteristics of that content. This approach came to be known as “content filtering.” A primary limitation of content filtering was that the characteristics of the content had to be known before they could be matched to the user profile, which proved to be a significant barrier in the early internet era and led to the rise of other styles of recommendation system.
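To make the content-filtering approach concrete, consider a minimal sketch in Python (our illustration, not Grundy’s actual design; all attribute names and interest weights are hypothetical) that scores items by the overlap between an item’s known attributes and a user’s interest profile:

```python
# Minimal content-filtering sketch: score items by how well their known
# attributes match a user's interest profile. Attribute names and weights
# are hypothetical illustrations, not drawn from any real system.

user_profile = {"mystery": 0.9, "romance": 0.1, "fast-paced": 0.7}

items = {
    "Book A": {"mystery", "fast-paced"},
    "Book B": {"romance"},
}

def content_score(profile: dict[str, float], attributes: set[str]) -> float:
    """Sum the user's interest weights for each attribute the item has."""
    return sum(profile.get(attr, 0.0) for attr in attributes)

# Rank items by descending score; Book A (1.6) beats Book B (0.1).
ranked = sorted(items, key=lambda name: content_score(user_profile, items[name]), reverse=True)
print(ranked)
```

The sketch makes the limitation visible: the item attributes must already be cataloged, or there is nothing for the profile to match against.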
The Tapestry email system, developed at Xerox PARC and described by Goldberg et al. in 1992 [17], was the first system to incorporate user feedback when recommending content to users. It did this by allowing users to tag messages with topics or even simple recommendations to read them; other users could then filter messages based on those user-provided tags. The term for this approach, “collaborative filtering,” was also introduced by Goldberg et al., although the concept had been proposed by Karlgren in 1990 [29]. MovieLens, a film recommendation website launched in 1997, was one of the first projects to use the collaborative filtering approach to make recommendations for a general audience on the internet, and it garnered significant public attention [23]. It was noted early on that users’ social networks displayed a high degree of homophily, providing a strong theoretical basis for this approach [40]. Collaborative filtering was later refined to incorporate user behavior in addition to directly provided user feedback [81]. Another demonstration of the power of the collaborative filtering approach was the Netflix Prize, a competition that ran from 2006 to 2009 with a $1 million top prize for an algorithm that could improve on Netflix’s own system for recommending movies to its users. The prize spurred investment in the field by academic and industry researchers [5, 6].
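A minimal user-user collaborative filtering sketch shows the core idea these systems share: predict a user’s interest in an unseen item from the ratings of users with similar histories. The ratings below are hypothetical (not MovieLens or Netflix data), and the similarity-weighted average is one common variant among many:

```python
# Minimal user-user collaborative filtering sketch with hypothetical ratings.
import math

ratings = {  # user -> {item: rating}
    "alice": {"film1": 5, "film2": 1, "film3": 4},
    "bob":   {"film1": 4, "film2": 2},
    "carol": {"film1": 1, "film2": 5, "film3": 2},
}

def cosine_sim(a: dict, b: dict) -> float:
    """Cosine similarity over co-rated items (norms over full rating vectors)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def predict(user: str, item: str) -> float:
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        s = cosine_sim(ratings[user], r)
        num += s * r[item]
        den += abs(s)
    return num / den if den else 0.0

# Bob hasn't rated film3; his more-similar neighbor (alice) dominates the prediction.
print(predict("bob", "film3"))  # ~3.1
```

Note that no content attributes appear anywhere: the prediction comes entirely from other users’ behavior, which is precisely what let collaborative filtering sidestep content filtering’s cataloging problem.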
A.2 Historical overview of social media networks
The first website we are aware of with many of the features commonly associated with social media—including profiles, content sharing, and the discovery of second-order contacts—was Friendster, which launched in 2002 [58]. Profiles and content shared on Friendster could be viewed only by a user’s “friends.” When Myspace launched the following year, its major innovation was that profiles and content were public by default. Even so, users navigated these and other early social media sites by visiting other users’ pages rather than consuming a curated selection of content.
A.3 Social media content feeds
In 2006, Facebook’s News Feed launched, giving users a constantly updating “feed” of their friends’ status updates and profile changes [24]. This feature heavily influenced the development of social media and became standard for the product category; it relied on a content recommendation algorithm that could identify which items users were interested in seeing. Twitter also launched in 2006, and by 2007 attendees at public events were using it to discuss those events in real time [27]. The hashtag first became prominent on Twitter as a means of finding relevant content and was commonly used to spread information at large public gatherings or about timely current events. Several other social media networks, each built around a different media type, launched during the 2000s and early 2010s: YouTube, a video-sharing platform, launched in 2005 [9], and Pinterest, a photo-sharing site, launched in 2010 [47], among many others. By this point, content feeds had become a must-have feature of social media, and every social media network of which we are aware that launched after this point included a content feed.
A.4 Multi-stakeholder recommendation and evaluation frameworks
Algorithmic feeds for social media platforms are multi-stakeholder recommendation systems in which consumers’ interests must be balanced against the interests of content creators, platforms, and advertisers. In 2012, Knijnenburg et al. proposed a user-centric evaluation framework for recommender systems that goes beyond accuracy [31]; however, this framework largely assumes recommender systems that receive direct user input and feedback. Recommender systems for user-generated content were first described as “multi-stakeholder recommendation systems” by Sürer et al. in 2018, who focused particularly on optimizing outcomes for content creators and consumers within a set of platform-specified constraints [69]. Abdollahpouri and Burke described the fairness considerations of multi-stakeholder systems [2], and Milano et al. explored the ethical considerations of multi-stakeholder recommendation using a consequentialist framework [45]. In our study, we build on this prior conceptual work to investigate the real-world optimization goal tensions that arise from the multi-stakeholder nature of algorithmic feeds.
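As a minimal illustration of the tension these frameworks formalize (our own hypothetical sketch, not the model of Sürer et al. or any platform’s actual scoring function), a feed score might combine consumer relevance with creator and platform objectives under explicit weights:

```python
# Hypothetical multi-stakeholder scoring sketch: one feed score that trades
# off consumer relevance against creator exposure and platform revenue.
# All weights and inputs are illustrative, not taken from any platform.

def multistakeholder_score(
    consumer_relevance: float,      # predicted interest for the viewing user
    creator_exposure_boost: float,  # e.g., boost for under-exposed creators
    platform_value: float,          # e.g., predicted ad revenue contribution
    w_consumer: float = 0.7,
    w_creator: float = 0.1,
    w_platform: float = 0.2,
) -> float:
    return (w_consumer * consumer_relevance
            + w_creator * creator_exposure_boost
            + w_platform * platform_value)

# Shifting weight from w_consumer to w_platform changes which items rank
# first -- exactly the optimization-goal tension studied in this work.
print(multistakeholder_score(0.9, 0.2, 0.1))
print(multistakeholder_score(0.9, 0.2, 0.1, w_consumer=0.3, w_platform=0.6))
```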
A.5 Transparency tools for user-facing systems
Usability researchers have long sought to develop tools that increase users’ understanding of the systems they use. Kelley et al. developed a privacy “nutrition label” and demonstrated that this approach increases understanding among a general audience [30]. More recently, as machine learning systems have become more ubiquitous, model cards have been used to share information about specific models [46]. System cards, an extension of model cards developed by researchers at Meta, are a tool to share information about an entire AI-backed system that may be the product of several models [20]; their pilot application was Instagram’s content recommendation feed [42]. That implementation is highly specific to Instagram and focuses on describing to consumers the factors that contributed to content appearing in a user’s feed rather than on describing the design of the Instagram feed algorithm. In contrast, our work enables an understanding of feed algorithm designs across apps.
A.6 Algorithmic feed systems analysis
There has been a progression of studies analyzing algorithmic feed systems. In 2010, Guy et al. compared the efficacy of content filtering, collaborative filtering, and hybrid approaches, finding that hybrid approaches performed best [21]. In 2013, Wang et al. provided an overview of the major social media platforms of the time and their recommender systems, again through the prism of content versus collaborative filtering; the authors reframed these approaches as interest-oriented versus influence-oriented and also discussed hybrid options [78]. Yin and Cui described the growing use of “spatio-temporal” features for recommendation [82].
DeVito used a narrower qualitative document review approach in 2017, which we employ as well, to understand story selection in the Facebook News Feed and to identify the values that drive those selections [12]. Most similar to our work, Narayanan takes a holistic, descriptive view of social media feed recommendation algorithms but does not propose a taxonomy [50]. Finally, Bouchaud et al. perform a crowdsourced algorithm audit via browser extension-based data donation, with analysis informed by a review of X’s algorithm source code [7]. Our work differs from these in that we do not seek to measure specific forms of algorithmic bias but instead attempt to describe feed algorithm designs as a whole.
B. Appendix: Methodology
B.1 Platform selection
We consulted a recent list of the most widely used social media platforms and selected YouTube, Facebook, and TikTok because they are the three largest social media platforms with algorithmic feeds by user count (excluding Instagram and WhatsApp because of their corporate redundancy with Facebook [67]). We supplemented these with X and Reddit, selected because of the depth of scholarly work on these two platforms and the open source availability of X’s recommendation algorithm [74], which enables our case study and increases relevance to other independent work.
B.2 Data collection
Social media platforms differ in their degree of openness about how their systems operate. Some lack basic user-facing documentation about how their feed algorithms work (TikTok), while one (X) has made the code for much of its recommendation algorithm public. To develop our comparison framework, we rely on six classes of primary sources: patents, source code, product documentation, official blogs, academic publications, and leaked documents. A full list of all documents we reviewed with a mapping to the framework components they informed is available in an online supplement at the Open Science Foundation archive [1]. We describe our methodology for the collection of each of these document types below.
Patents. We used the U.S. Patent and Trademark Office’s Public Patent Search to identify patents with details about feed recommendation inventory sources, feed recommendation algorithm features, or feed recommendation algorithms assigned to the five companies that are the focus of our review [53]. We used the “basic search” tool to search for patents combining the assignee names of the corporate entities that operate the platforms in our study with the term “recommendation.”
We performed this search on September 7, 2023 for the following assignee names: Facebook (n=751), Meta (n=148), Reddit (n=0), ByteDance (n=18), TikTok (n=2), Twitter (n=63), Alphabet (n=0), and Google (n=2,111). We reviewed the abstracts of all patents returned in this search that had been filed or updated in the last 10 years. We identified patents for further review by looking for mentions in titles or abstracts of data sources or features used for feed recommendation, or algorithms or techniques for feed recommendation.

Source code. We searched GitHub for the recommendation system source code of the platforms we studied by reviewing each platform’s repositories under its listed organizational accounts. Reddit had a historical archive of its code base, including its search recommendation code, but it has been archived since 2017 and is likely not representative of Reddit’s current feed recommendation algorithm. X made the source code for its feed recommendation algorithm public [75]; we reviewed this code as of January 1, 2024.
Product documentation. We searched the help sections of each app’s website for the terms “feed” and “recommendation” to identify pages with relevant information about feed recommendation data sources, features, or algorithms. These pages were sometimes aimed at helping users understand why content was recommended to them and sometimes aimed at explaining to creators why their content may or may not be recommended; we considered both to be in scope.
Official blogs. All of these platforms maintain product and engineering blogs where they announce new features and share details about how aspects of their platforms work. We identified relevant posts on these blogs by searching for the terms “feed” and “recommendation.”
Academic publications. We used Google Scholar [19] to identify academic publications, authored by people who worked for companies that own the apps we study, about components of their recommendation systems. We did this by searching for the platform name and the term “recommendation” and reviewing the first three pages of results for relevance and authorship matching our criteria.
Leaked documents. In recent years, document leaks and whistleblowers have become an important source of information about social media algorithms. We identified leaked documents by searching for platform names and the terms “whistleblower” or “leak” and reviewing the news stories and attached documents. In the case of Facebook, we make use of fbarchive [15], an archive of redacted versions of the Facebook Files [25].
B.3 Qualitative document analysis
Qualitative document analysis [8] as a research method has been growing in popularity, particularly in the social sciences [48]. We use this approach to formalize our study of the product documentation of Facebook, Reddit, TikTok, X, and YouTube.
We began our analysis with a broad review of the consumer-facing product documentation on these companies’ support websites and of posts from their development and engineering blogs. After an initial review of these documents, we inductively developed the primary dimensions of our comparison framework. We then revisited each source document, identified key passages, and coded them for relevance to the primary dimensions of the framework. Throughout these rounds of coding, we discussed and resolved coding disagreements as they arose. Following our codebook application, we conducted axial coding to create our higher-level framework themes, which we present in the results. We did not calculate inter-rater reliability, since our work focuses on forming higher-level themes rather than on quantitative counts of codes [38]. Our codebook is available alongside the full list of source documents at the Open Science Foundation archive [1].
C. Appendix: X case study
X made parts of its recommender system open source [75], enabling review of some parts of its ranking algorithm. We also base our evaluation on other documentation that X has provided. For the purposes of this case study, we focus on the default For You feed. While X has been the most transparent about its feed recommendation algorithm of all the major social media platforms, some areas of uncertainty remain where publicly available data is insufficient to make a strict classification. In these cases, we describe a range of possible values for a category.
C.1 Inventory sources
In its blog post announcing the open sourcing of its algorithm code, X addresses the question of inventory sources directly: “Today, the For You timeline consists of 50% In-Network Tweets and 50% Out-of-Network Tweets on average, though this may vary from user to user” [75]. This means that, in total, 50 percent of content is unconnected and 50 percent is either indirectly or directly connected. Of “in-network” posts, no distinction is made between direct and indirect connections. For the purposes of our comparison framework, while the total of these two values is 50 percent, the range of uncertainty for each is 0–50 percent.
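A minimal sketch of this kind of inventory blending (our illustration; X’s actual candidate sourcing and ranking pipeline is far more complex) interleaves ranked in-network and out-of-network candidate pools at the stated 50/50 target:

```python
# Illustrative inventory-blending sketch: fill a feed from in-network and
# out-of-network candidate pools at a ~50/50 ratio, as X describes for the
# For You timeline. Pool contents and the interleaving policy are hypothetical.
import itertools

def blend_feed(in_network: list, out_of_network: list, size: int) -> list:
    """Interleave the two ranked pools so each contributes ~50% of slots."""
    feed = []
    for a, b in itertools.zip_longest(in_network, out_of_network):
        for post in (a, b):
            if post is not None and len(feed) < size:
                feed.append(post)
    return feed

print(blend_feed(["in1", "in2", "in3"], ["out1", "out2", "out3"], 6))
# ['in1', 'out1', 'in2', 'out2', 'in3', 'out3']
```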
C.2 Ranking features
The ranking features used by X’s recommendation algorithm are well documented [73]. X has published blogs and academic papers about how it models social graph features, which are highly influential in its algorithm [60, 75]. X models a synthetic social graph of its users’ interactions that it calls “SimClusters.” This synthetic social graph is updated every three weeks to bias toward recent interactions. Several content-based features also describe a relationship to a “topic,” meaning that X additionally maintains a topic model for content. SimClusters is used for recommending connected content, and a separate system called GraphJet [64] is used for recommending unconnected content. GraphJet “maintains a real-time interaction graph between users and Tweets” [75], incorporating social graph and user activity features.
As has been widely noted, especially in an analysis of X’s algorithm code by Narayanan [51], content (both posts and replies) from users who are X subscribers [72] gets an additional ranking boost, which we would classify as a “creator status feature.” Overall, the three most important features appear to be the social graph network, social graph user behavior, and consumer interests—all three of which are collaborative rather than content-based.
C.3 Algorithm optimization objectives
Usage intensity. X’s ranking algorithm for the For You feed is referred to as “Heavy Ranker” [74]. It incorporates elements of both low intensity (longer-duration) usage and high intensity (high engagement) usage, but high intensity objectives predominate. Only two of the ranking algorithm’s 10 model components are duration-focused: the probability the user will watch at least half of a post’s video and the probability the user will open the post thread and stay there for at least two minutes. These two components have a combined weighting of 10.005.
The remaining engagement-based factors—including the probability that the user will favorite, repost, or reply to the post, the probability that the post’s author will engage with the user’s reply, and the probability that the user opens the post author’s profile and likes or replies to a post—have an order of magnitude greater combined weight of 102. X also directly states that its algorithm is designed to maximize high intensity usage: “The most important component in ranking In-Network Tweets is Real Graph. Real Graph is a model which predicts the likelihood of engagement between two users. The higher the Real Graph score between you and the author of the Tweet, the more of their tweets we’ll include” [75]. Taken together, these factors mean that X’s is a predominantly high intensity recommender system.
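To make this imbalance concrete, a minimal sketch of a Heavy Ranker-style linear score follows. It uses only the aggregate weights discussed above; the per-component breakdown is simplified, and the function is our illustration rather than X’s code:

```python
# Sketch of a Heavy Ranker-style score: a weighted sum of predicted
# probabilities for user actions. The 10.005 and 102 values are the
# aggregate weights discussed in this case study; the split into two
# generic components is a simplification for illustration.

def usage_intensity_score(p_duration: float, p_engagement: float) -> float:
    """Duration components carry ~10x less weight than engagement ones."""
    return 10.005 * p_duration + 102.0 * p_engagement

# A modest engagement probability outweighs even a certain long view,
# which is why we classify X as a high intensity recommender.
print(usage_intensity_score(p_duration=1.0, p_engagement=0.0))   # 10.005
print(usage_intensity_score(p_duration=0.0, p_engagement=0.15))  # 15.3
```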
Specificity. As we have discussed, optimization for specificity often manifests in how heavily the likelihood of a negative reaction is weighted. Heavy Ranker weighs the probabilities of positive and negative reactions from a user, giving us a window into how willing X’s algorithm is to expose users to niche content they might not like. The negative outcome of a user asking to see less of a post type is weighted at -74, and the outcome of a user reporting a post as violating the terms of service at -369 [74]. On the other hand, various positive actions, including liking, reposting, and opening the post author’s profile, have a combined weight of 34.5. It is difficult to interpret this ratio of positive to negative weighting on its own because we lack comparable numbers for other platforms.
Instead, to evaluate this dimension, we can use empirical measurements of the distribution of views across a random sample of content on X in comparison to other platforms. Lazovich et al. perform such an analysis, calculating Gini coefficients on a per-user basis and finding that this number is always above 0.95 [32]. For comparison, we turn to an analysis of a large random sample of YouTube content performed by McGrady et al. [39]. The authors did not compute the Gini coefficient of their sample, but judging by other distributional measures, X appears to be significantly more imbalanced than YouTube. Ideally we would have more than one comparison point, but we are unaware of published analyses of random samples of other platforms. Nevertheless, given the high Gini coefficient on X and the relatively lower inequality on YouTube, we classify X’s as a low specificity (broad appeal) algorithm.
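For readers who want to reproduce this kind of measurement, a minimal sketch of the Gini computation on a view distribution follows (hypothetical counts, and a simplification of the per-user analysis in Lazovich et al.):

```python
# Minimal Gini coefficient sketch over a distribution of view counts.
# A value of 0 means perfectly even exposure; values near 1 mean views
# are concentrated on very few items. Counts below are hypothetical.

def gini(values: list[float]) -> float:
    """Gini coefficient via the sorted, index-weighted formula."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n

print(gini([1, 1, 1, 1]))    # 0.0: perfectly even exposure
print(gini([0, 0, 0, 100]))  # 0.75: exposure concentrated on one item
```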
Novelty. As we have already discussed, users’ feeds are a mix of connected and unconnected content [75]. While unconnected content is filtered so that only content with at least a second-order relationship to the user is eligible to be shown, X describes this as a “quality safeguard” [75] intended to exclude spam rather than as a mechanism to ensure familiarity. This novelty ratio appears balanced between the two extremes, and we classify this as a medium novelty algorithm.
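A minimal sketch of such a second-order eligibility filter (our illustration over a hypothetical follow graph, not X’s implementation) might look like this:

```python
# Sketch of a second-order eligibility filter for out-of-network content:
# a post is eligible if its author is followed by someone the user follows.
# The follow graph below is hypothetical.

follows = {
    "user": {"a", "b"},
    "a": {"c"},
    "b": {"d"},
}

def second_order_eligible(user: str, author: str) -> bool:
    """True if any account the user follows also follows the author."""
    return any(author in follows.get(friend, set())
               for friend in follows.get(user, set()))

print(second_order_eligible("user", "c"))  # True: a follows c
print(second_order_eligible("user", "e"))  # False: no path within two hops
```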
Content timeliness. Several features used by X’s ranking algorithm appear to be time-related, but none of its ranking goals explicitly factor in timeliness. Instead, timeliness appears to be a factor in X’s inventory selection process, where posts must have been recently created or have had a recent interaction in order to be eligible to appear in a user’s feed. We additionally know, based on an academic publication, that X considers a post’s shelf life to be best measured in hours (as opposed to days or weeks) [60], so we classify its algorithm as high timeliness.
© 2025, Laura Edelson, Frances Haugen, and Damon McCoy.
Cite as: Laura Edelson, Frances Haugen, and Damon McCoy, Into the Driver’s Seat With Social Media Content Feeds, 25-01 Knight First Amend. Inst. (Mar. 6, 2025), https://knightcolumbia.org/content/into-the-drivers-seat-with-social-media-content-feeds [https://perma.cc/CW48-PFC8].
https://transparency.meta.com/reports/
We experimented with other terms, including “algorithm” or “recommender,” but in practice we found these other terms did not return relevant documents that were also not returned by our initial search terms or returned so many non-relevant documents as to be impractical.
Laura Edelson is the chief technologist of the Antitrust Division of the Department of Justice.
Frances Haugen is an advocate for accountability and transparency in social media.
Damon McCoy is a professor at the Tandon School of Engineering at NYU.