The Puzzling Non-Use of Data Access Laws: The NetzDG Case
People around the world see significant upsides to social media but also widely believe it has led to political manipulation, division, and, in Europe, the rapid spread of illegal hate speech. How should governments respond to these downsides? They could pass laws making certain kinds of content illegal or encourage companies to deplatform malign users, label content, or impose content standards. But do such interventions work?
Maybe. With a few exceptions, we really don’t know what works. Research simply is not keeping pace with the information environment’s rapid evolution. And there are significant gaps between researchers and regulators about how to interpret the information we do have. Part of the challenge is data access: Scholars cannot observe much of what happens online, especially when it comes to illicit content and efforts to remove it.
Germany’s Network Enforcement Act (NetzDG) was amended in June 2021 specifically to solve this problem. The amendment enables researchers to request highly detailed data from platforms on their content moderation efforts and on the spread of illegal content. To date, however, there is no evidence this access provision has ever been used.
This non-use should worry those hoping that legal mandates for data access, such as those coming online next year under the European Union Digital Services Act (DSA), will be enough to dramatically increase the pace of policy-relevant research on the information environment. The NetzDG experience suggests that access provisions need to be accompanied by other kinds of support. As we explain below, the research community needs one or more centralized, technically adept organizations that can support data access requests and help scholars navigate emerging laws and regulations.
To see why, we first need to dig deeply into the history of NetzDG.
NetzDG: A Story of Trial and Error
NetzDG was passed in 2017 and took full effect at the start of 2018, largely in response to Germany’s far right—often violent and newly emboldened by the refugee crisis from the Syrian civil war—which was using social media to spread hate and motivate attacks.
The law represents one of the most ambitious regulatory attempts to hold platforms accountable for domestically illegal content. It imposes specific requirements on large social network platforms (those with over 2 million users in Germany) to report and remove content that is already illegal under German hate speech, libel, and defamation laws. Within Germany, platforms must provide a mechanism for users to report content as specifically illegal (rather than simply in violation of platform rules). Upon receiving a report, the platform must immediately investigate the reported content and remove flagrantly illegal material (for example, clear and obvious hate speech) within 24 hours and all other illegal content within seven days or face fines of up to 50 million euros. The law also imposes transparency requirements; if a platform receives more than 100 complaints through its NetzDG reporting mechanism, it is obligated to publish semiannual content moderation reports. In making the case for these measures, German Justice Minister Heiko Maas said, “Experience has shown that, without political pressure, the large platform operators will not fulfill their obligations.”
When it was first implemented, NetzDG faced immediate criticism from many sides. Human Rights Watch said it could lead to “unaccountable, overbroad censorship,” and experts argued it chilled both individual expression and discourse on important topics. Toomas Hendrik Ilves, former president of Estonia, said the law had “legion” technical, jurisdictional, and implementation problems, and that authoritarian regimes would copy its language to justify their own crackdowns on dissent—a prediction that later appeared to come true. Far-right politicians censored under NetzDG used the law’s inconsistencies to chip away at it. One of their lawyers maintains an online “Facebook Wall of Shame” that features posts that do not violate German law but were nevertheless removed from the platform.
In response to this criticism and building on the law’s transparency requirements, Germany amended NetzDG in June 2021. New provisions included greater information requirements and a more standardized format for platform transparency reports, ease-of-use requirements for the content reporting mechanism, and an appeals process for removed content. Crucially, the amendment included a first-of-its-kind requirement that platforms make data available to researchers studying questions related to automatic detection and deletion of illegal content and/or the dissemination of content that has been reported or removed as illegal.
Given the pressing policy importance of these issues and the debate over NetzDG’s effectiveness, one would expect that this novel data access mechanism would have attracted significant interest and use by now.
Nearly two years after the introduction of the data access provisions, we can find no evidence that scholars have taken advantage of this opportunity. Transparency reports from both YouTube and Facebook do not mention disclosing data to researchers under NetzDG. Twitter’s NetzDG reports do not even mention the data access amendment. And we could find no evidence that these companies have used other mechanisms to disclose the kinds of granular data on content moderation and the reach of illegal content contemplated under the law. Further, none of the research papers in the first 150 Google Scholar results for the search terms “NetzDG,” “NetzDG amendment 5a,” or “NetzDG data” draws on the law’s data access provisions. Given significant debate over the efficacy of platform content moderation practices and deep concerns over how Russian propaganda about the war in Ukraine is spreading in Germany, this apparent non-use is puzzling.
A Valuable but Underused Tool
If NetzDG were working as intended, company-provided data would be helping researchers measure the impact of content removal and contribute scientific evidence to global discussions on how to combat hate speech and other forms of illegal content. They might, for example, be able to quantify how much content was removed improperly due to so-called overblocking, one of the primary concerns raised about the law. So why have NetzDG’s data access provisions failed to enable research so far?
One possibility is that researchers are simply unaware of the provision. This seems unlikely given that 1,050 Google Scholar results since 2019 include the terms “NetzDG,” “data,” “access,” and “amendment.”
A second possibility is that NetzDG’s provisions lack clarity. While the topics on which researchers can request data are clear, the conditions under which a platform can deny a researcher’s access request involve subtle balancing tests. Under the law, a platform can deny the request if the platform’s interest in protection significantly outweighs the public interest in the proposed research, or if legitimate interests of the data subjects are adversely affected and the public interest does not outweigh the subjects’ interest in secrecy. As best we can tell, neither of these tests has been adjudicated, and fear of the time and legal expense a dispute could entail may be deterring researchers from requesting information in the first place.
A third possibility is that researchers are not willing or able to pay for the data requests. NetzDG allows platforms to recoup up to 5,000 euros in expenses for meeting a data request. That amount may loom large for early-career scholars or those at less wealthy institutions, especially since the grantors supporting work on the information environment typically fund work on a project-by-project basis, the inefficiencies of which one of us has written about elsewhere. Still, 5,000 euros is modest next to typical research grants in the social sciences, which run from the tens to the hundreds of thousands of dollars, and next to the cost of the surveys social scientists routinely field.
A fourth possibility is that technical capacity is a key barrier. NetzDG Section 5a(4) requires researchers to submit a “protection concept” that details how personal data provided in response to a request will be protected as well as what precautions will be taken to protect providers’ interests. Addressing these requirements while complying with Europe’s standing General Data Protection Regulation (GDPR), which protects consumers’ data privacy, is challenging.
How challenging? One way to judge is to look at the recommendations of the European Digital Media Observatory’s draft Code of Conduct for Platform-to-Researcher Data Sharing, which outlines principles for how researchers and platforms can share data while meeting a range of legal and ethical considerations. The code lays out a risk assessment framework based on two questions: (a) how private subjects might reasonably expect the data to remain and (b) how the proposed processing could impact rights and freedoms if the data or research outputs were misused. The code then lays out recommended technical safeguards—including pseudonymization, data encryption, and virtual clean rooms—depending on those risk assessments.
For low-risk data, the code recommends straightforward measures such as data minimization (collecting only the information necessary to answer the research question) and pseudonymization (replacing identifiers with codes known only to the researcher). For medium-risk data (that is, data scoring above low on one risk criterion), the code recommends using access-restricted application programming interfaces (APIs) and, where applicable, data encryption, access restrictions, and data destruction protocols. For high-risk data (scoring above low on both criteria), the code recommends using either virtual clean rooms or physical data safe rooms. A rough sketch of how this tiering works appears below.
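To make the tiering concrete, here is a minimal, illustrative Python sketch of how a research team might encode the two-criterion assessment and one of the low-risk safeguards described above. The function names, risk levels, and thresholds are our own assumptions for exposition; they are not prescribed by NetzDG, the GDPR, or the draft code of conduct.

```python
# Illustrative sketch only: a toy mapping from the two risk criteria in the
# draft EDMO code to its recommended safeguard tiers, plus a simple
# pseudonymization helper. Names and thresholds are assumptions for exposition.
import hashlib
import hmac
from enum import Enum


class Risk(Enum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2


def classify(privacy_expectation: Risk, misuse_impact: Risk) -> str:
    """Map the two risk criteria to a safeguard tier (assumed thresholds)."""
    elevated = [r for r in (privacy_expectation, misuse_impact) if r != Risk.LOW]
    if len(elevated) == 0:
        return "low: data minimization + pseudonymization"
    if len(elevated) == 1:
        return "medium: restricted API access, encryption, destruction protocols"
    return "high: virtual clean room or physical data safe room"


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed code known only to the researcher."""
    return hmac.new(secret_key, identifier.encode(), hashlib.sha256).hexdigest()[:16]


if __name__ == "__main__":
    print(classify(Risk.LOW, Risk.MEDIUM))
    print(pseudonymize("user@example.com", secret_key=b"researcher-held-key"))
```

The pseudonymization helper uses a keyed hash so that the mapping from identifiers to codes cannot be reproduced without the researcher-held secret, one common way to approximate “codes known only to the researcher”; the draft code does not prescribe any particular technique.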
Most kinds of data covered under NetzDG access provisions likely meet the medium-risk criteria at least—they must by definition have some real or claimed connection to illegal content. And that means that researchers wanting to use the data must understand both digital and physical security measures and have staff who can set up appropriate computational facilities, which most academic and civil society researchers and research centers lack.
Bridging the Gap
The non-use to date of NetzDG’s data access provisions has important implications for other regulatory efforts to make data available to researchers. The most prominent such effort is Article 40 of the DSA, which will soon require platforms with more than 45 million monthly EU users (so-called very large online platforms) to provide data for studies of systemic risks associated with the platforms’ services. The digital services coordinators (DSCs) responsible for vetting researchers and requesting data from companies will be established by February 2024, at which point researchers should be able to begin submitting requests.
While the DSA has no requirement that researchers reimburse platforms for their expenses in providing the data, it does require researchers to provide appropriate data privacy and security guarantees and to ensure GDPR compliance. Without meaningful action to help lower these barriers, many researchers will be unable to take advantage of the new access provisions and thus unable to produce policy-relevant research on the impact of prior regulatory actions, on how platform policies shape the information environment, and on how malign actors respond to enforcement efforts.
Simple tweaks to the law will not be enough, because some barriers researchers face in using access provisions are there for very good reasons. While removing fees seems like an obvious step, exempting EU researchers from complying with the GDPR or from providing appropriate security and privacy safeguards would not be good for platforms, governments, or researchers themselves.
Rather, helping researchers meet and overcome these barriers would enable them to take advantage of legally mandated data access in an ethically responsible, security-conscious, and legally compliant manner. Just as regulators implementing DSA access provisions will need the kind of support contemplated by the European Digital Media Observatory’s proposed Intermediary Body to Support Research on Digital Platforms, scholars seeking to use the provision will need expertise that is outside their professional remit.
The most efficient way to help researchers on this score is to pool resources in larger, centralized institutions that can support data access requests and help scholars navigate emerging laws and regulations. Centralized institutions can speed research in many ways. When it comes to navigating data access provisions, they can spread the costs of meeting legal, privacy and security, and technical data protection requirements by retaining legal counsel, hosting staff who help researchers submit data access requests, vetting researchers, managing privacy and security audits, and developing secure compute environments. Centralized research institutions can also make life easier for platforms, which could negotiate privacy protections with a small number of trusted entities rather than having to vet dozens or hundreds of individual researchers.
With new data access laws, we have an opportunity to catapult forward our collective knowledge about the information environment. While the DSA provides a promising foundation for researchers to access platform data, it is only part of the solution. Researchers need support in navigating data access laws, through shared infrastructure, if they are to build the evidence base for addressing the societal problems, and seizing the opportunities, that the information environment presents.