Remember that massive data leak of mortgage and loan data we reported on Wednesday?
In case you missed it, millions of documents were found leaking after an exposed Elasticsearch server was found without a password. The data contained highly sensitive financial data on tens of thousands of individuals who took out loans or mortgages over the past decade with U.S. financial institutions. The documents were converted using a technology called OCR from their original paper documents to a computer readable format and stored in the database, but they weren’t easy to read. That said, it was possible to discern names, addresses, birth dates, Social Security numbers and other private financial data by anyone who knew where to find the server.
Independent security researcher Bob Diachenko and TechCrunch traced the source of the leaking database to a Texas-based data and analytics company, Ascension. When reached, the company said that one of its vendors, OpticsML, a New York-based document management startup, had mishandled the data and was to blame for the data leak.
It turns out that data was exposed again — but this time, it was the original documents.
Diachenko found the second trove of data in a separate exposed Amazon S3 storage server, which too was not protected with a password. Anyone who went to an easy-to-guess web address in their web browser could have accessed the storage server and see — and download — the files stored inside.
In a note to TechCrunch, Diachenko said he was “very surprised” to find the server in the first place, let alone open and accessible. Because Amazon storage servers are private by default and aren’t accessible to the web, someone would have made a conscious decision to set its permissions to public.
The bucket contained 21 files containing 23,000 pages of PDF documents stitched together — or about 1.3 gigabytes in size. Diachenko said that portions of the data in the exposed Elasticsearch database on Wednesday matched data found in the Amazon S3 bucket, confirming that some or all of the data is the same as what was previously discovered. Like in Wednesday’s report, the server contained documents from banks and financial institutions across the U.S., including loans and mortgage agreements. We also found documents from U.S. Department of Housing and Urban Development, as well as W-2 tax forms, loan repayment schedules, and other sensitive financial information.
Many of the files also contained names, addresses, phone numbers, and Social Security numbers, and more.
When we tried to reach OpticsML on Wednesday, its website had been pulled offline and the listed phone number was disconnected. After scouring through old cached version of the site, we found an email address.
TechCrunch emailed chief executive Sean Lanning, and the bucket was secured within the hour.
Lanning acknowledged our email but did not comment. Instead, OpticsML chief technology officer John Brozena confirmed the breach in a separate email, but declined to answer several questions about the exposed data — including how long the bucket was open and why it was set to public.
“We are working with the appropriate authorities and a forensic team to analyze the full extent of the situation regarding the exposed Elasticsearch server,” said Brozena. “As part of this investigation we learned that 21 documents used for testing were made identifiable by the previously discussed Elasticsearch leak. These documents were taken offline promptly.”
He added that OpticsML is “working to notify all affected parties” when asked about informing customers and state regulators, as per state data breach notification laws.
But Diachenko said there was no telling how many times the bucket might have been accessed before it was discovered.
“I would assume that after such publicity like these guys had, first thing you would do is to check if your cloud storage is down or, at least, password-protected,” he said.