Microsoft AI Researchers Accidentally Leak 38 Terabytes of Internal Sensitive Data Through Azure Storage

Microsoft's AI research team accidentally leaked 38 terabytes of internal sensitive data.

A Microsoft AI research team accidentally leaked 38 terabytes of internal sensitive data through the company's Azure storage. (Photo: Jeenah Moon/Getty Images)

Artificial intelligence (AI) researchers working for tech giant Microsoft accidentally leaked 38 terabytes of internal sensitive data through the company's Azure storage service.

New research from the cloud security company Wiz found that a large cache of private, cloud-hosted data was exposed through a misconfigured link in a repository on the software development platform GitHub used to share AI training material.

Microsoft's Massive 38-Terabyte Leak

Wiz said the data was leaked by Microsoft's AI research team while it was publishing open-source training data on GitHub. The repository directed users to download AI models from a cloud storage URL.

However, the link was misconfigured to grant access to the entire storage account, and it gave users full-control rather than read-only permissions, meaning they could delete or overwrite existing files.
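For illustration only, the sketch below uses the azure-storage-blob Python SDK to show how a shared storage link can end up with this kind of account-wide, full-control access; the account name, key, and expiry are placeholder assumptions, not details from Microsoft's actual configuration.

```python
# Hedged sketch (not Microsoft's code): how an Azure SAS URL can end up
# granting far more than read access. All identifiers are placeholders.
from datetime import datetime, timedelta

from azure.storage.blob import (
    AccountSasPermissions,
    ResourceTypes,
    generate_account_sas,
)

# An account-level SAS scoped to every container and blob in the account,
# with write and delete included, matching the "full control instead of
# read-only" misconfiguration described above.
overly_broad_token = generate_account_sas(
    account_name="exampleaccount",          # placeholder
    account_key="<storage-account-key>",    # placeholder
    resource_types=ResourceTypes(container=True, object=True),
    permission=AccountSasPermissions(read=True, write=True, delete=True, list=True),
    expiry=datetime.utcnow() + timedelta(days=365 * 30),  # effectively "forever"
)

# Anyone holding this URL can list, overwrite, or delete data account-wide.
share_url = f"https://exampleaccount.blob.core.windows.net/?{overly_broad_token}"
print(share_url)
```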

The leaked data reportedly included backups of Microsoft employees' personal computers, which contained passwords for the tech giant's services, secret keys, and more than 30,000 internal Microsoft Teams messages from 359 Microsoft employees, as per Yahoo Finance.

While open data sharing is a crucial component of AI training, sharing larger amounts of data exposes companies to greater risk when that data is shared incorrectly. Wiz reported its findings to Microsoft in June, and the company immediately worked to remove the exposed data.

When asked about the incident, a Microsoft spokesperson said the company confirmed that no customer data was exposed as part of the leak and that no other internal services were put at risk.

On Monday, the tech giant published a blog post saying it had investigated and remediated the incident. It added that the data exposed in the storage account included workstation profile backups of two former employees and internal Microsoft Teams messages belonging to those two employees.

Wiz noted that the storage account itself was not directly exposed; rather, the Microsoft AI developers had included an overly permissive shared access signature (SAS) token in the URL, according to TechCrunch.

Dangers of Using SAS Tokens

SAS tokens are an Azure mechanism that lets users create shareable links granting access to data in a storage account. Wiz co-founder and CTO Ami Luttwak said that AI unlocks massive potential for tech companies.
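As a sketch of how the same mechanism is normally meant to be used, the example below generates a read-only token for a single blob that expires within hours; the account, container, and blob names are placeholder assumptions.

```python
# Hedged sketch: a narrowly scoped SAS link, read-only and short-lived,
# for one blob rather than the whole storage account.
from datetime import datetime, timedelta

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

read_only_token = generate_blob_sas(
    account_name="exampleaccount",           # placeholder
    container_name="public-models",          # placeholder
    blob_name="model-weights.ckpt",          # placeholder
    account_key="<storage-account-key>",     # placeholder
    permission=BlobSasPermissions(read=True),        # read-only
    expiry=datetime.utcnow() + timedelta(hours=12),  # short-lived
)

download_url = (
    "https://exampleaccount.blob.core.windows.net/"
    f"public-models/model-weights.ckpt?{read_only_token}"
)
print(download_url)
```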

However, he noted that with the growing amounts of data being handled, companies need to add more security checks and safeguards to keep that data safe. Microsoft also expanded GitHub's secret scanning service, which monitors all public open-source code changes for plaintext exposure of credentials and other secrets, to cover any SAS token with overly permissive expirations or privileges.
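A rough sketch of the kind of check such a scanner could run is shown below: it parses the query string of a SAS URL and flags tokens that carry write or delete permissions or an expiry far in the future. The threshold and the example URL are illustrative assumptions, not the actual rules used by GitHub's secret scanning.

```python
# Hedged sketch: flag SAS URLs whose tokens grant more than read access
# or live far longer than an assumed policy threshold.
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse

RISKY_PERMISSIONS = set("wdca")          # write, delete, create, add
MAX_LIFETIME = timedelta(days=90)        # assumed policy threshold

def audit_sas_url(url: str) -> list[str]:
    """Return a list of findings for a URL that embeds a SAS token."""
    params = parse_qs(urlparse(url).query)
    findings = []

    # 'sp' holds the granted permissions as single letters, e.g. "racwdl".
    permissions = params.get("sp", [""])[0]
    if RISKY_PERMISSIONS & set(permissions):
        findings.append(f"token grants more than read access: sp={permissions}")

    # 'se' holds the expiry timestamp in ISO 8601 form.
    expiry_raw = params.get("se", [""])[0]
    if expiry_raw:
        expiry = datetime.fromisoformat(expiry_raw.replace("Z", "+00:00"))
        if expiry - datetime.now(timezone.utc) > MAX_LIFETIME:
            findings.append(f"token lives longer than {MAX_LIFETIME.days} days: se={expiry_raw}")

    return findings

# Illustrative URL only; the signature value is fake.
example = (
    "https://exampleaccount.blob.core.windows.net/models"
    "?sv=2021-08-06&ss=b&srt=sco&sp=racwdl&se=2051-01-01T00:00:00Z&sig=FAKE"
)
for finding in audit_sas_url(example):
    print(finding)
```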

Wiz recently warned that, due to a lack of monitoring and governance, SAS tokens pose a security risk and urged that their use be limited as much as possible. The tokens are very difficult to track, as Microsoft does not even provide a centralized way to manage them within the Azure portal.

The firm added that SAS tokens can be configured to last effectively forever, as there is no upper limit on their expiry time, and recommended avoiding the use of SAS tokens for external sharing altogether, according to Bleeping Computer.
