LAS VEGAS — Are you an internet privacy fanatic? Do you block browser tracking cookies? Do you use Duck Duck Go for anonymous web searches?
It doesn't matter. Your internet service provider (ISP) or your browser extensions can collect and sell your web-browsing history even if you take the above precautions. And anyone who obtains that data, whether the data is anonymized or not, will likely be able to figure out your real name and see exactly what you do online.
Those were the findings of German television journalist Svea Eckert and data scientist Andreas Dewes, who spoke at the DEF CON 25 hacker conference here last Friday (July 28).
Given a month's worth of browsing history from 3 million supposedly protected German citizens, the researchers identified individuals by correlating the anonymized data with public information scraped from social-media postings.
They found a German politician who searched for an herbal supplement to stimulate an aging brain; a police detective who mixed personal and business activities on his work computer; and a judge who visited many porn sites at the same time he shopped for a baby stroller.
The only ways to keep your web-browsing history truly private, Eckert and Dewes told DEF CON, are to either run the Tor anonymizing protocol through multiple exit nodes or use a virtual-private-network (VPN) service that rotates proxy servers.
If that sounds inconvenient, it is. But now that the U.S. Senate has revoked privacy-protection rules for ISPs, every American's browsing history — and, by inference, identity — is up for sale to the highest bidder.
Stopping the collection
German ISPs are barred by strict privacy laws from collecting such personal information without permission. But what about American users, whose ISPs have no such constraints? Eckert and Dewes said that one solution would be to constantly run Tor, the freely available but sometimes difficult-to-use web-anonymization protocol.
Another would be to use a VPN service, but Dewes warned users to do some research before signing up with one, because some VPN services also collect and sell user information.
In either case, your ISP wouldn't be able to see where you were going online. But you would have to make sure that the "exit" IP addresses — the exit nodes in Tor, or the proxy servers in the VPN — would be regularly changing so that the exit IP address did not become associated with a particular user.
Getting the data
Eckert and Dewes did their research for a feature called "Naked on the Net" that aired on the German television news magazine Panorama in November 2016. They knew that hundreds of companies buy and sell web-browsing data collected by websites and search engines.
Huge amounts of this data can be bought openly as long as information that would identify individual users, such as a computer or smartphone's Internet Protocol (IP) address, is stripped out.
To obtain that data, Eckert and fellow reporters created a fake online-marketing firm, complete with a slick website full of corporate buzzwords and staffers with fake LinkedIn profiles.
Posing as representatives of the firm, they approached about 150 online-data brokers and expressed interest in buying browsing data. However, they were told many times that because of strict German privacy laws, such information might be difficult to obtain.
"They said U.S. or U.K. data would be no problem, but that Germany was hard," Eckert said.
Finally, one company said it had browsing data from German residents. It offered Eckert and Dewes one month's sample for free, but didn't tell them where it came from. The data set consisted of 3 billion visits to 9 million websites by 3 million Germans.
Crunching the numbers
Each user was identified only by a number, with no corresponding IP address. But Eckert and Dewes knew that with so much data, anonymization was impossible. They used a method developed in 2008 by data scientists at the University of Texas, who had crunched user data provided by Netflix to positively identify thousands of supposedly anonymous users.
The technique is simple. Each user's entire browsing history for a month was in the anonymized data set. The researchers built up a second data set corresponding to the same month by "scraping" publicly posted data from Twitter, Facebook, YouTube, Google Maps and other online services.
Every time someone linked to, commented on or recommended a website, it went into the second data set. Finally, a computer algorithm looked for matches between the two data sets.
Statistically, millions of people may click on a single specific website in a given month. But a smaller number of people are going to click on two of the same websites in a month.
Make that three, then four, then five matching websites, and so on, and the numbers get whittled down to fewer and fewer people until only one person is left. If those matches come from social-media sources, as they did in this instance, then the anonymous user suddenly has a name.
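The whittling-down step can be illustrated with a small sketch. This is not the researchers' code; it is a toy version that assumes each anonymous user's history is a set of visited sites and that one named person's publicly shared links are a list. The function name `narrow_candidates`, the numeric user IDs and the `.example` domains are all made up for illustration.

```python
def narrow_candidates(anon_histories, shared_sites):
    """Filter anonymous user IDs down to those who visited every shared site.

    anon_histories: dict mapping an anonymous user ID to a set of visited sites.
    shared_sites:   sites one named person publicly linked to (hypothetical data).
    Returns the final candidate set and the candidate count after each site.
    """
    candidates = set(anon_histories)
    counts = []
    for site in shared_sites:
        # Keep only users whose history also contains this site.
        candidates = {uid for uid in candidates if site in anon_histories[uid]}
        counts.append(len(candidates))
    return candidates, counts


# Toy "anonymized" data set: numeric IDs only, no names or IP addresses.
anon_histories = {
    101: {"news.example", "cars.example", "recipes.example"},
    102: {"news.example", "cars.example", "chess.example"},
    103: {"news.example", "garden.example"},
}

# Links that one named social-media account shared in the same month.
shared_sites = ["news.example", "cars.example", "chess.example"]

remaining, counts = narrow_candidates(anon_histories, shared_sites)
print(counts)     # candidate pool size after each additional matching site
print(remaining)  # the anonymous IDs consistent with every shared link
```

In this toy data, each additional shared site shrinks the pool until a single anonymous ID remains, at which point the name attached to the social-media account can be attached to the whole browsing history.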
It was pretty easy to have a computer program sift through the data and come up with many exact matches, Eckert and Dewes said. In some cases, it took only one match.
Dewes found that only a logged-in Twitter user could access his or her own account-analytics page, which has a unique URL. The same was true of the German business-networking service Xing, which mandates the use of real names. So if either URL appeared in an anonymized user's history, the researchers could be pretty certain he or she was the account owner.
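A single-match check of this kind might look like the sketch below. The URL shapes are assumptions chosen for illustration (placeholder `.example` domains, not the real Twitter or Xing paths), and `OWNER_ONLY_PATTERNS` and `owner_handles` are hypothetical names.

```python
import re

# Assumed URL shapes for pages that only the logged-in account owner can open.
# These patterns are illustrative placeholders, not the services' real URLs.
OWNER_ONLY_PATTERNS = [
    re.compile(r"^https://analytics\.social\.example/user/(?P<handle>[^/?#]+)/"),
    re.compile(r"^https://business\.example/profile/(?P<handle>[^/?#]+)/settings"),
]


def owner_handles(history):
    """Return account handles found in owner-only URLs within one user's history."""
    handles = set()
    for url in history:
        for pattern in OWNER_ONLY_PATTERNS:
            match = pattern.match(url)
            if match:
                handles.add(match.group("handle"))
    return handles


history = [
    "https://news.example/politics/article-1",
    "https://analytics.social.example/user/jane_doe/home",
    "https://shop.example/strollers",
]

print(owner_handles(history))
```

Because only the account owner can load such a page, one hit is enough to tie the anonymous ID to that account, with no need for the multi-site intersection.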
Unmasking the users
Using these methods, the researchers found that Valerie Wilms, a member of the German federal parliament, had searched for Tebonin, an herbal supplement meant to increase blood flow to aging brains.
"You can see everything — sh*t!" Wilms exclaimed when Eckert showed the politician her browsing history on camera. "This is really bad to see something like this — especially if it is connected with my own name."
Wilms had tried to protect her privacy by using the Duck Duck Go search engine, which unlike Google or Bing does not log user search data. But the Duck Duck Go search string for Tebonin was right there in her anonymized user data.
The researchers identified a police detective who searched for a used car at the same time he was writing an email to send to a foreign ISP regarding a cybercrime investigation. The email itself didn't show up in the data, but the detective used Google Translate to translate his draft from German into English.
It turns out that Google Translate puts the text being translated right into the URL (try it yourself). The detective had copied and pasted the entire draft email, including his own name, email address and telephone number, into Google Translate.
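To see why text carried in a URL is so revealing, consider the following sketch. It assumes a translation page that carries the text either in a `text` query parameter or in the URL fragment; the `translate.example.com` host, the parameter name and the sample message are all hypothetical.

```python
import re
from urllib.parse import urlsplit, parse_qs, unquote

# Simple pattern for email-like strings; good enough for a demonstration.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def leaked_text(url):
    """Recover user-entered text from a URL's query string or fragment.

    Assumes the text travels in a 'text' query parameter, falling back to the
    fragment (the part after '#') when no such parameter is present.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)  # parse_qs percent-decodes the values
    if "text" in query:
        return query["text"][0]
    return unquote(parts.fragment)


# A logged URL as it might appear in a collected browsing history (made up).
url = (
    "https://translate.example.com/?sl=de&tl=en"
    "&text=Kontakt%3A%20max%40polizei.example%20Tel%20555-0100"
)

text = leaked_text(url)
print(text)
print(EMAIL_RE.findall(text))
```

Anything that logs full URLs, such as a browser extension, therefore also logs whatever the user typed or pasted into the page.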
Worst of all was the case of a judge who visited some raunchy porn sites.
"He has really specific tastes," Eckert dryly commented as the judge's browsing history appeared on the DEF CON projection screens.
The same man also looked for baby strollers and vacation spots to which a couple with a young child could travel.
"He's not doing anything criminal at all. He's just a normal guy," Eckert said. "But you see how sensitive this could be, and how he could be blackmailed, especially in his position."
Who collected the data
You might imagine that using a browser's "private" or "incognito" mode, or a tracker blocker, might stop such collection of browsing history. But a private mode simply stops the browser itself from collecting the history; it doesn't stop the ISP from collecting it.
Likewise, a tracker blocker only stops websites from logging that you've visited them, and doesn't stop your ISP from seeing that you've visited them.
However, Eckert and Dewes were pretty certain that the data they'd bought hadn't originally come from German ISPs.
In Germany, personal information such as names, addresses, IP addresses and email addresses cannot be collected by private companies without the explicit agreement of the persons concerned. (The rules killed by the U.S. Senate in April would have made American ISPs do the same.)
So where had such detailed information, which seemed to evade tracker blockers and anonymous search engines, come from? With the assistance of a security researcher, Eckert and Dewes found that a browser extension called Web of Trust had been collecting and selling the data.
Ironically, Web of Trust is meant to vet websites for "reputation and safety information" and protect users with "secure browsing while shopping and surfing," according to its page in the Chrome Web Store. The people whose browsing history Eckert and Dewes had obtained had installed the Web of Trust extension to guard against the very thing the extension was doing.
Following the Panorama broadcast in November, Web of Trust was removed from the Mozilla Firefox, Google Chrome and Opera extension stores. The extensions returned a few months later with a new feature — one that let users opt out of having their personal information collected.
"High-dimension user-related data is very hard to anonymize," Eckert said. "The increase in public information on many people makes deanonymization even easier."