Another day, another Facebook public-relations disaster.
Two caches of Facebook user data, one of them with 540 million records, were recently found unprotected on Amazon cloud servers, the data-breach-monitoring firm UpGuard said in a blog posting Wednesday (April 3).
Facebook didn't put the data there. It was collected by two third-party companies, which violated Facebook's own policies by storing it publicly. But the fact that there was so much of it, and that it seems to have been fairly easy to find, makes us wonder how much more Facebook user data there is floating in the cloud beyond Facebook's reach.
"As Facebook faces scrutiny over its data stewardship practices, they have made efforts to reduce third-party access," the UpGuard post says. "But as these exposures show, the data genie cannot be put back in the bottle. Data about Facebook users has been spread far beyond the bounds of what Facebook can control today."
To be clear, the 540 million records found in the larger of the two data sets do not translate to 540 million Facebook users. That stash of data was collected by a Mexican digital-media firm called Cultura Colectiva, which has a nice-looking, photo-heavy website in both Spanish and English devoted to everything and anything Latin American, as well as lots of pop culture. It's kind of a hybrid of Mashable, Tumblr and BuzzFeed.
Cultura Colectiva, which has been around since 2013, urges readers to share its stories on Facebook, Twitter, WhatsApp and Pinterest. Its commenting system uses Facebook's API. Anyone who wants to comment has to log into Facebook and stay logged in.
It looks like the 540 million records may in fact be an aggregate of all the "comments, likes, reactions, account names, Facebook IDs and more," as UpGuard described it, pertaining to every comment ever made on any Cultura Colectiva story.
That's still 146 gigabytes of material, UpGuard said. It would be a rich trove of Facebook user data to mine for anyone interested. But it doesn't include Facebook passwords, and it doesn't offer any path directly into Facebook accounts.
Sorry, the pool is closed
The other data set was collected by a failed social-media startup called At the Pool that aimed at "pooling" people who shared common interests and happened to be geographically close to each other. At one point, At the Pool required Facebook authentication, similar to Tinder.
The At the Pool Amazon bucket found by UpGuard contained data on 22,000 Facebook users, including their Facebook user ID, likes, friends, photos, groups and interests. Plaintext passwords were included, but they weren't Facebook passwords.
"The passwords are presumably for the 'At the Pool' app rather than for the user's Facebook account, but would put users at risk who have reused the same password across accounts," the UpGuard blog post notes.
After two years of operation, At the Pool went belly-up in 2014. The data may have sat there unprotected on Amazon's cloud for five years. On the plus side, the data was removed from the Amazon server while UpGuard's researchers were poking around in it, and before they had a chance to notify anyone.
"It is unknown if this is a coincidence, if there was a hosting period lapse, or if a responsible party became aware of the exposure at that time," UpGuard wrote. "Regardless, the application is no longer active and all signs point to its parent company having shut down."
Such was not the case with the Cultura Colectiva data. UpGuard said it told Cultura Colectiva about the unprotected stash on Jan. 10 and Jan. 14, and told Amazon server admins on Feb. 1 and Feb. 21.
"It was not until the morning of April 3rd, 2019, after Facebook was contacted by Bloomberg for comment, that the database backup ... was finally secured," the post said.
They've lost control again
These two stories have vaguely happy endings. There's no evidence that anyone other than the UpGuard researchers was aware of either data set, although naturally we can't know for sure.
On the downside, what these tales indicate is that lots and lots of companies, most of which you've never heard of, have hooks into Facebook user content. There is almost certainly far more Facebook data stored on Amazon cloud servers, only with protection in place. These two data sets are merely what the UpGuard researchers were able to find.
So far, the amount of misused Facebook data we have learned about — Cambridge Analytica and these two databases — is probably just the tip of the iceberg.
"The data exposed in each of these sets would not exist without Facebook, yet these data sets are no longer under Facebook's control," the UpGuard post says in conclusion.
"The Facebook platform facilitated the collection of data about individuals and its transfer to third parties," it adds. "The responsibility for securing [Facebook user data] lies with millions of app developers who have built on its platform."