The Nuts and Bolts of XKEYSCORE
Published by The Lawfare Institute
in Cooperation With
I've been reading through the 48 classified documents about the NSA's XKEYSCORE system released by the Intercept last week. From the article:
The NSA's XKEYSCORE program, first revealed by The Guardian, sweeps up countless people's Internet searches, emails, documents, usernames and passwords, and other private communications. XKEYSCORE is fed a constant flow of Internet traffic from fiber optic cables that make up the backbone of the world's communication network, among other sources, for processing. As of 2008, the surveillance system boasted approximately 150 field sites in the United States, Mexico, Brazil, United Kingdom, Spain, Russia, Nigeria, Somalia, Pakistan, Japan, Australia, as well as many other countries, consisting of over 700 servers. These servers store "full-take data" at the collection sites -- meaning that they captured all of the traffic collected -- and, as of 2009, stored content for 3 to 5 days and metadata for 30 to 45 days. NSA documents indicate that tens of billions of records are stored in its database. "It is a fully distributed processing and query system that runs on machines around the world," an NSA briefing on XKEYSCORE says. "At field sites, XKEYSCORE can run on multiple computers that gives it the ability to scale in both processing power and storage."
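The retention tiers in that quote can be illustrated with a toy rolling store in which each captured session keeps its content for a short window and its metadata for a longer one. This is a minimal sketch under stated assumptions, not a description of how XKEYSCORE is built. The `FullTakeBuffer` class, its field names, and the choice of 3 and 30 days (the low ends of the quoted ranges) are all hypothetical. It shows why a query against such a store is retrospective: anything still in the buffer can be searched after the fact, and once content ages out only the metadata remains.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional

# Hypothetical retention windows, using the low ends of the quoted 2009
# ranges: content for 3 to 5 days, metadata for 30 to 45 days.
CONTENT_DAYS = 3
METADATA_DAYS = 30


@dataclass
class Session:
    captured: datetime
    metadata: dict             # e.g. addresses, user agent (illustrative)
    content: Optional[bytes]   # full payload until it ages out


class FullTakeBuffer:
    """Toy full-take store: record everything, then age it out by tier."""

    def __init__(self) -> None:
        self.sessions: list[Session] = []

    def capture(self, when: datetime, metadata: dict, content: bytes) -> None:
        self.sessions.append(Session(when, metadata, content))

    def expire(self, now: datetime) -> None:
        kept = []
        for s in self.sessions:
            age = now - s.captured
            if age > timedelta(days=METADATA_DAYS):
                continue                # past both windows: drop the record
            if age > timedelta(days=CONTENT_DAYS):
                s.content = None        # past the content window: metadata only
            kept.append(s)
        self.sessions = kept

    def query(self, predicate: Callable[[dict], bool]) -> list[Session]:
        # The query runs over traffic that was already recorded, so it
        # reaches back in time: a retrospective wiretap.
        return [s for s in self.sessions if predicate(s.metadata)]


t0 = datetime(2009, 1, 1)
buf = FullTakeBuffer()
buf.capture(t0, {"user": "alice"}, b"hello")
buf.expire(t0 + timedelta(days=10))   # the content window has passed
hit = buf.query(lambda m: m.get("user") == "alice")[0]
print(hit.metadata, hit.content)      # {'user': 'alice'} None
```

Keeping the two tiers as a single record with a nullable payload mirrors the quoted arrangement: metadata outlives content, so late queries still find who talked to whom, but not what was said.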
There seem to be no access controls at all restricting how analysts can use XKEYSCORE. Standing queries -- called "workflows" -- and new fingerprints have an approval process, presumably because of processing load, but individual queries are not approved beforehand, though they may be audited after the fact. Analyst queries are supposed to be low latency, and you can't have an approval process for low-latency queries. Since a query can get at the recorded raw data, a single query is effectively a retrospective wiretap. All this means that the Intercept is correct when it writes:
These facts bolster one of Snowden's most controversial statements, made in his first video interview published by The Guardian on June 9, 2013. "I, sitting at my desk," said Snowden, could "wiretap anyone, from you or your accountant, to a federal judge to even the president, if I had a personal email."
You'll only get the data if it's in the NSA's databases, but if it is there, you'll get it. Honestly, there's not much in these documents that will surprise anyone who studied the 2013 XKEYSCORE leaks and knows what can be done with a highly customizable intrusion detection system. But it's always interesting to read the details.
One document -- "Intro to Context Sensitive Scanning with X-KEYSCORE Fingerprints" (2010) -- talks about some of the queries an analyst can run. A sample scenario: "I want to look for people using Mojahedeen Secrets encryption from an iPhone" (page 6). Mujahedeen Secrets is an encryption program written by al Qaeda supporters. It has been around since 2007. Last year, Stewart Baker cited its increased use as evidence that Snowden harmed America. I thought the opposite: that the NSA benefits from al Qaeda using this program. I wrote: "There's nothing that screams 'hack me' more than using specially designed al Qaeda encryption software." And now we see how it's done.
In the document, we read about the specific XKEYSCORE queries an analyst can use to search for traffic encrypted by Mujahedeen Secrets. Here are some of the program's fingerprints (page 10):
encryption/mojahaden2
encryption/mojahaden2/encodedheader
encryption/mojahaden2/hidden
encryption/mojahaden2/hidden2
encryption/mojahaden2/hidden44
encryption/mojahaden2/secure_file_cendode
encryption/mojahaden2/securefile
So if you want to search for all iPhone users of Mujahedeen Secrets (page 33):
fingerprint('demo/scenario4') = fingerprint('encryption/mojahdeen2') and fingerprint('browser/cellphone/iphone')
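That query composes two fingerprints with a boolean AND. As a toy model, not XKEYSCORE's actual query language or matching engine, one can treat each session as carrying the set of fingerprint tags that earlier matching attached to it, and a derived fingerprint as a predicate over that set. The helper names `fp` and `all_of` and the sample sessions are hypothetical; only the fingerprint strings come from the query as printed.

```python
from typing import Callable, FrozenSet

# Hypothetical model: a fingerprint is a predicate over the tags that earlier
# matching stages attached to a session.
Fingerprint = Callable[[FrozenSet[str]], bool]


def fp(name: str) -> Fingerprint:
    """Base fingerprint: true if the session carries the tag `name`."""
    return lambda tags: name in tags


def all_of(*fps: Fingerprint) -> Fingerprint:
    """Derived fingerprint: boolean AND of the given fingerprints."""
    return lambda tags: all(f(tags) for f in fps)


# Analogue of demo/scenario4: Mujahedeen Secrets traffic from an iPhone.
scenario4 = all_of(fp("encryption/mojahdeen2"), fp("browser/cellphone/iphone"))

sessions = {
    "s1": frozenset({"encryption/mojahdeen2", "browser/cellphone/iphone"}),
    "s2": frozenset({"encryption/mojahdeen2"}),     # not from an iPhone
    "s3": frozenset({"browser/cellphone/iphone"}),  # no Mujahedeen Secrets tag
}
print(sorted(sid for sid, tags in sessions.items() if scenario4(tags)))  # ['s1']
```

Because a derived fingerprint is itself a fingerprint, it can be nested inside further combinations with other identifiers.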
Or you can search for the program's use in the encrypted text, because (page 37): "...many of the CT Targets are now smart enough not to leave the Mojahedeen Secrets header in the E-mails they send. How can we detect that the E-mail (which looks like junk) is in fact Mojahedeen Secrets encrypted text." In short, the document's answer is that there are many ways to detect this program's use that its users can't detect. And you can combine the use of Mujahedeen Secrets with other identifiers to find targets. For example, you can specifically search for the program's use in extremist forums (page 9).
(Note that the NSA wrote that comment about Mujahedeen Secrets users improving their opsec in 2010, about three years before Snowden supposedly told them that the NSA was listening in on their communications. Honestly, I would not be surprised if the program turned out to have been a US operation to get Islamic radicals to make their traffic stand out more easily.)
It's not just Mujahedeen Secrets. Nicholas Weaver explains how you can use XKEYSCORE to identify co-conspirators who are all using PGP. And these searches are just one example. Other examples from the documents include:
- "Targets using mail.ru from behind a large Iranian proxy" (here, page 7).
- Usernames and passwords of people visiting gov.ir (here, page 26 and following).
- People in Pakistan visiting certain German-language message boards (here, page 1).
- HTTP POST traffic from Russia in the middle of the night -- useful for finding people trying to steal our data (here, page 16).
- People doing web searches on jihadist topics from Kabul (here).
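Most of these example queries are filters over session metadata. As a hedged illustration of the "HTTP POST traffic from Russia in the middle of the night" case, here is a filter over hypothetical session records. The `HttpSession` fields, the per-session UTC offset standing in for "local time", and the midnight-to-5:00 window are all assumptions, not details from the documents.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class HttpSession:
    src_country: str       # ISO country code from IP geolocation (assumed field)
    method: str            # HTTP request method, e.g. "GET" or "POST"
    ts_utc: datetime       # capture time, timezone-aware, in UTC
    utc_offset_hours: int  # assumed local UTC offset at the source


def local_hour(s: HttpSession) -> int:
    """Hour of day at the source, using the session's assumed UTC offset."""
    tz = timezone(timedelta(hours=s.utc_offset_hours))
    return s.ts_utc.astimezone(tz).hour


def night_posts_from(sessions, country: str, start: int = 0, end: int = 5):
    """POST sessions from `country` between start:00 and end:00 local time."""
    return [
        s for s in sessions
        if s.src_country == country
        and s.method == "POST"
        and start <= local_hour(s) < end
    ]


late = datetime(2010, 3, 1, 23, 30, tzinfo=timezone.utc)
noon = datetime(2010, 3, 1, 12, 0, tzinfo=timezone.utc)
sessions = [
    HttpSession("RU", "POST", late, 3),  # 02:30 local: matches
    HttpSession("RU", "GET", late, 3),   # not a POST
    HttpSession("RU", "POST", noon, 3),  # 15:00 local: daytime
    HttpSession("DE", "POST", late, 1),  # different country
]
hits = night_posts_from(sessions, "RU")
print(len(hits), local_hour(hits[0]))  # 1 2
```

Converting to the source's local time before filtering is what makes "middle of the night" meaningful; filtering on UTC hours alone would mix night and day across time zones.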
E-mails, chats, web-browsing traffic, pictures, documents, voice calls, webcam photos, web searches, advertising analytics traffic, social media traffic, botnet traffic, logged keystrokes, file uploads to online services, Skype sessions and more: if you can figure out how to form the query, you can ask XKEYSCORE for it. For an example of how complex the searches can be, look at this XKEYSCORE query published in March, showing how New Zealand used the system to spy on the World Trade Organization: it automatically tracked any email body containing particular WTO-related content for the upcoming WTO director-general election. (Good new documents to read include this, this, and this.)
I always read these NSA documents with the assumption that other countries are doing the same thing. The NSA is not made of magic, and XKEYSCORE is not some super-advanced NSA-only technology. It is the same sort of thing that every other country would use with its surveillance data. For example, Russia explicitly requires ISPs to install similar monitors as part of its SORM Internet surveillance system. As a home user, you can build your own XKEYSCORE using the open-source Bro Network Security Monitor and the related Network Time Machine attached to a back-end data-storage system. (Lawrence Berkeley National Laboratory uses this system to store three months' worth of Internet traffic for retrospective surveillance -- it used the data to study Heartbleed.) The primary advantage the NSA has is that it sees more of the Internet than anyone else, and spends more money to store the data it intercepts for longer than anyone else. And if these documents explain XKEYSCORE in 2009 and 2010, expect that it's much more powerful now.
Back to encryption and Mujahedeen Secrets. If you want to stay secure, whether you're trying to evade surveillance by Russia, China, the NSA, criminals intercepting large amounts of traffic, or anyone else, try not to stand out.
Don't use some homemade specialized cryptography that can be easily identified by a system like this. Use reasonably strong encryption software on a reasonably secure device. If you trust Apple's claims (pages 35-36), use iMessage and FaceTime on your iPhone. I really like Moxie Marlinspike's Signal for both text and voice, but I worry that it's too obvious because it's still rare. Ubiquitous encryption is the bane of listeners worldwide, and it's the best thing we can deploy to make the world safer.
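On the page 37 question of how e-mail that "looks like junk" can still be recognized as ciphertext: the document's actual techniques aren't spelled out here, so what follows is a generic, swapped-in heuristic, not anything the NSA is known to use. Text-encoded ciphertext tends to use its alphabet almost uniformly, so its per-character Shannon entropy approaches the alphabet's maximum (6 bits for base64), while prose does not. The assumption that the payload is base64, the 64-character minimum, and the 5.0-bit threshold are all hypothetical choices.

```python
import base64
import math
import os
from collections import Counter

# Assumption: the encrypted payload is carried as base64 text.
BASE64_ALPHABET = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="
)


def shannon_entropy(text: str) -> float:
    """Empirical entropy of the character distribution, in bits per character."""
    n = len(text)
    counts = Counter(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def looks_like_encoded_ciphertext(text: str, threshold: float = 5.0) -> bool:
    """Heuristic: long, base64-only, and close to uniformly distributed."""
    body = "".join(text.split())          # ignore line breaks and spaces
    if len(body) < 64:
        return False                      # too short to judge
    if not set(body) <= BASE64_ALPHABET:
        return False                      # contains non-base64 characters
    return shannon_entropy(body) >= threshold


blob = base64.b64encode(os.urandom(3000)).decode()   # stand-in for ciphertext
prose = "the quick brown fox jumps over the lazy dog " * 20
print(looks_like_encoded_ciphertext(blob), looks_like_encoded_ciphertext(prose))  # True False
```

A heuristic this crude also fires on any ordinary base64 attachment, which is exactly why ubiquitous encryption helps: the more everyday encrypted traffic there is, the less a signal like this singles anyone out.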