
Lost In Translation: Language Gaps in Social Media Labels

Samantha Bradshaw, Miles McCain
Friday, February 4, 2022, 8:01 AM

Platforms often fail to make warning labels accessible to users who do not speak English.

A person uses a translation app. (Mohamed Hassan, https://tinyurl.com/2p8e8d3n)

On Nov. 4, 2020, a QAnon adherent posted a series of Spanish-language tweets about the 2020 U.S. presidential election on Twitter. The tweets reshared links to English-language misinformation about a U.S. Postal Service whistleblower in Michigan and to debunked claims of vote-changing software. While Twitter applied an information label to the tweet sharing the debunked voting software claim, that label did not automatically translate to Spanish. Nor did the label translate when browsing Twitter with Spanish-language settings. This incident exemplifies the ways in which online platform policies can negatively affect non-English, minority or marginalized groups of users. It also raises an important question about how platform labels appear for people who browse the internet in languages other than English.

Before the 2020 election, various platforms introduced new policies to combat the spread of misinformation over social media. One of these strategies involved creating new labels for content that has been fact-checked as false or misleading. Major social media platforms, including Facebook, Twitter, YouTube and TikTok, have existing labeling policies for different kinds of content. Some platforms focus on “source labels” that provide users with more information about the content’s source, such as whether it is a state-funded broadcaster. Other labels focus on the veracity of the content itself, such as the labels Twitter applies to synthetic and manipulated media or to content about vaccine safety. TikTok also labels dangerous content, such as stunt videos or viral challenges, to discourage users from attempting risky stunts themselves.

Platforms hope that by contextualizing content with labels they can encourage users to engage more critically with the information they consume online. For sensitive content that does not necessarily violate platform policies, labeling is an attractive alternative to removal as it enables platforms to leave potentially sensitive content online while hopefully limiting its ill effects. In practice, however, research about the empirical effectiveness of content labels suggests mixed results.

Labeling Practices

Some research on user behavior has shown that labeled content gets shared less on social media. But when it comes to affecting users’ beliefs, labeling can have mixed results. Some research argues that the timing of labels is important: If users have already been exposed to misinformation, seeing a “disputed” flag after the fact does not alter their original belief. Labeling can also lead to what some scholars call an “implied truth effect,” where unlabeled false claims are more likely to be interpreted as true. However, labels must be noticeable and understandable to users to be effective, demonstrating the importance of placement in addition to timing. Thus, the broader implications of labeling practices are still a matter of dispute. Regardless, social media companies continue to employ labeling policies in response to sensitive content.

When it comes to the practice of labeling, however, the language diversity of content and its audiences is sometimes overlooked. Social media companies operate on a global scale and therefore cannot assume that their users will understand English. Even among people in the United States, almost one in 10 speaks English less than “very well.” Given the language diversity of social media users, we explored how information labels appear for users who browse the internet in languages other than English.

Non-English Content

Our audit of Facebook’s, Twitter’s, YouTube’s and TikTok’s labeling practices focused on three kinds of English-language labeled content: 2020 election misinformation, vaccine misinformation and content from state media organizations. We assessed how a sample of these labels, originally in English, appeared for users browsing the platforms in nine languages: Spanish, Indonesian, Portuguese, Hindi, Chinese, Arabic, French, Russian and Bengali. These are some of the most widely spoken languages online; they represent significant geographic diversity and are supported by the platforms we evaluated.

We hoped that platforms would translate the content labels into users’ preferred languages. Translated labels are easier to read and therefore more likely to be effective. And if the platforms did not translate the labels, we expected platforms to display them in English, rather than hiding them altogether. While some platforms translate labels quite well—TikTok, for example, translated the labels for every language we tested—other platforms do not.
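As a rough sketch of what this kind of check can look like programmatically, the short Python script below requests a post while sending different Accept-Language headers and searches the response for known label text. It is an illustration only, with hypothetical placeholder values for the post URL and label strings, not the method behind our findings: platforms often render labels client-side with JavaScript, so a plain HTTP request may not surface them at all, and our audit relied on browsing each platform with its language settings changed.

    import requests

    # Hypothetical placeholders for illustration; none of these values come
    # from our audit. Substitute a real post URL and a platform's actual
    # label strings to run a meaningful check.
    POST_URL = "https://example.com/some-labeled-post"
    LABEL_TEXT = {
        "en": "This claim about election fraud is disputed",
        "es": "Esta afirmación sobre fraude electoral está en disputa",
        "fr": "Cette affirmation de fraude électorale est contestée",
    }

    def classify_label(html: str, lang: str) -> str:
        # "Translated" if the localized label text appears; "not translated"
        # if only the English text appears; otherwise assume the label is
        # hidden (or rendered client-side and invisible to a plain GET).
        if lang in LABEL_TEXT and LABEL_TEXT[lang] in html:
            return "translated"
        if LABEL_TEXT["en"] in html:
            return "not translated (shown in English)"
        return "hidden"

    # The nine language settings we tested, as two-letter language codes.
    for lang in ["es", "id", "pt", "hi", "zh", "ar", "fr", "ru", "bn"]:
        # Request the page the way a browser configured for this language would.
        response = requests.get(POST_URL, headers={"Accept-Language": lang}, timeout=10)
        print(lang, "->", classify_label(response.text, lang))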

Facebook

Facebook translated the election content label into all the languages we tested except Spanish, Russian and Bengali. In those three languages, the label was only partially translated: one part appeared in English while the other was translated. Facebook fully translated its state media label and its vaccine misinformation label into every language we tested.

Figure 1. Image of a partially translated label on Facebook.

Figure 2. Image of a fully translated election-related label in Arabic.

Twitter

While Twitter translated its vaccine misinformation label into all languages we tested, it did not translate its election misinformation label or state media label at all. These two labels appeared in English for every language we tested.

Figure 3. Image of a partially translated label on Twitter containing both English and Spanish text.

Figure 4. A Twitter label fully translated from English to French.

Figure 5. An RT tweet with the language preference set to Bengali. Twitter did not translate its state media label into any language we tested.

YouTube

YouTube’s approach to translating its content labels concerned us most. Unlike the other platforms, which fall back to displaying labels in English in the absence of a translation, YouTube hid the label altogether. For example, the election misinformation label was simply missing when viewing YouTube in any language we tested other than Spanish. Similarly, the vaccine misinformation label was missing in the Portuguese, Russian, and Bengali interfaces.

Figure 6. A video on YouTube with an election label translated into Spanish. Of the languages we tested, Spanish was the only language where the label did not disappear.

Figure 7. The same video on YouTube as before, but with the language preference set to Russian. Note that the label is missing. 

Consequences of Poor Translation

Examining label translation for English-language content may seem contrived: if a user can understand English-language content, one might assume the user could understand an English-language label as well. But untranslated labels are still a plausible misinformation vector. Political misinformation spread widely through American communities during the 2020 presidential election, including communities that are primarily immigrant and non-English speaking. There are several ways that English-language misinformation can create unique challenges for non-English audiences.

First, some people may know enough English to pick out important details from English misinformation, while still not knowing enough to understand an untranslated content label. For example, a post might include references in English to “vaccine” and “deadly side effects,” or “stolen election” and “voter fraud.” Such a post might promote distrust of vaccines or decrease faith in the election. A lack of English fluency might also limit one’s ability to judge the post’s credibility, highlighting the need for a translated content label. 

Second, misinformation on social media often has a visual component (for example, memes and videos). These visuals could be misleading or harmful on their own, even without understanding any accompanying text. For example, photos of allergic reactions—vaccine related or not—superimposed on a photo of a vaccination clinic might cause viewers to rethink vaccination, regardless of whether they understand the textual component of the image. Posts that contain visual media are far less limited by the language barrier, and research has shown their unique potential for harm. 

Third, English-language misinformation can be shared in non-English-speaking communities, where members might discuss the misinformation in languages other than English. For example, someone might share a piece of English election-related misinformation in a group chat whose members may not all speak English. Members of the group may repeat the misinformation in another language but not mention the associated content label.

Figure 8. A Twitter user (now suspended) reshared English-language misinformation threads in Spanish.

In all of these cases, a translated label for the original content is important. While we looked at only a small segment of content and labels, our results have broader implications for responses to misinformation online. In particular, our findings emphasize the importance of fully testing labeling mechanisms, especially for communities in which people do not speak or browse the internet in English. We have seen repeatedly that platform responses can result in biases, stigma and injustice toward certain groups of users. And labeling that disappears or does not translate can reinforce discrimination and inequity, especially among marginalized communities.

Platforms have the resources to translate and properly apply labels so that everyone, English speaking or not, can benefit from their additional contextual information. While it’s tempting to roll out responses rapidly, platforms need to invest in testing, implementation and impact assessment to ensure that, no matter what language people speak, the labels they see are clear and accessible.

Table 1. Summary of findings.

Note: Data accurate as of December 1, 2021. “Partially translated” means that one component of the label was translated, but other components were not. “Translated” means that the entire label was translated. “Not translated” means that the label was shown in English. “Hidden” means that no label was shown at all. Exceptions are noted in blue. Note that we tested a sample of labels for each platform, so our analysis is not exhaustive. There may be additional labels that do or do not translate properly in each of the categories we evaluated.

Samantha Bradshaw is an assistant professor in new technology and security at American University’s School of International Service.
Miles McCain is a researcher and student at Stanford University, currently working on digital safety and privacy. He is a research assistant at the Stanford Internet Observatory and a Non-Resident Undergraduate Fellow at the Cornell Tech Policy Lab.
