Microsoft AI employee accidentally leaked 38TB of data


A misconfigured link accidentally leaked access to 38TB of Microsoft data, opening up the ability to inject malicious code into its AI models.

That’s the finding from cloud security provider Wiz, which scanned the internet for exposed storage accounts. It found a software repository on Microsoft-owned GitHub dedicated to providing open-source code and AI models for image recognition.

On the affected GitHub page, a Microsoft employee had created a URL that enabled visitors to the software repository to download the AI models from an Azure storage container. “However, this URL allowed access to more than just open-source models,” Wiz said in its report. “It was configured to grant permissions on the entire storage account, accidentally exposing additional private data.”

Wiz Research’s scan also indicated that the Azure storage container held 38TB of data, including “passwords to Microsoft services, secret keys, and 30,000 internal Microsoft Teams messages from 359 Microsoft employees.”

The storage container’s URL was also created using a powerful “shared access signature,” or SAS token, which gave anyone visiting the link—including potential attackers—the ability to view, delete, or overwrite those files.
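For illustration, here is a minimal sketch using the azure-storage-blob Python SDK (v12) that contrasts a narrowly scoped container SAS with the kind of account-wide token described above. The account name, container name, and key below are placeholders, not details from the incident.

```python
from datetime import datetime, timedelta, timezone
from azure.storage.blob import (
    generate_container_sas, ContainerSasPermissions,
    generate_account_sas, AccountSasPermissions, ResourceTypes,
)

ACCOUNT = "examplestorage"  # hypothetical account name
KEY = "<account-key>"       # the secret used to sign the token

# Least privilege: read/list on a single container, short expiry.
scoped_sas = generate_container_sas(
    account_name=ACCOUNT,
    container_name="models",
    account_key=KEY,
    permission=ContainerSasPermissions(read=True, list=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)

# Over-permissive: read/write/delete on every container and blob in
# the account, with a years-long lifetime, the pattern Wiz describes.
risky_sas = generate_account_sas(
    account_name=ACCOUNT,
    account_key=KEY,
    resource_types=ResourceTypes(service=True, container=True, object=True),
    permission=AccountSasPermissions(read=True, write=True, delete=True, list=True),
    expiry=datetime.now(timezone.utc) + timedelta(days=3650),
)
```

Anyone holding a token like `risky_sas` can rewrite any blob in the account until the token expires or the signing key is rotated, which is exactly what made the leaked URL dangerous.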

“This is particularly interesting given the original purpose of the repository: to provide AI models for use in training code,” Wiz said. “That is, the attacker could have injected malicious code into all the AI models in this storage account and infected every user who trusts Microsoft’s GitHub repository.”

Wiz informed Microsoft about this in June, and the company promptly shut down the leak. “No customer data was exposed and no other internal services were compromised by this issue,” Microsoft said in its own report.

The company also said that the exposed storage container contained backups and internal Microsoft Teams messages from two former Microsoft employees. To prevent any further leaks, Microsoft is scanning GitHub for SAS tokens that “may contain excessively permissive expirations or privileges.”

“The system looked for a specific SAS URL identified by Wiz in the ‘robust-models-transfer’ repo, but the search was incorrectly marked as a false positive,” Microsoft said. “The root cause issue for this has been fixed and it has been confirmed that the system is now detecting and reporting all over-provisioned SAS tokens correctly.”
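To make that kind of scan concrete, a hypothetical checker (not Microsoft’s actual system) could parse the query string of a SAS URL and flag mutating permissions, distant expiry dates, or account-wide scope:

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs

MAX_LIFETIME_DAYS = 30           # assumed policy threshold
RISKY_PERMISSIONS = set("wdac")  # write, delete, add, create

def audit_sas_url(url: str) -> list[str]:
    """Return a list of findings for a potentially over-provisioned SAS URL."""
    params = parse_qs(urlparse(url).query)
    findings = []
    # 'sp' carries the permission letters, e.g. "racwdl"
    perms = set(params.get("sp", [""])[0])
    risky = perms & RISKY_PERMISSIONS
    if risky:
        findings.append(f"grants mutating permissions: {''.join(sorted(risky))}")
    # 'se' carries the expiry timestamp, e.g. "2051-10-08T00:00:00Z"
    expiry = params.get("se", [None])[0]
    if expiry:
        exp = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if (exp - datetime.now(timezone.utc)).days > MAX_LIFETIME_DAYS:
            findings.append(f"expires too far in the future: {expiry}")
    # 'srt' on an account SAS lists the resource types it covers;
    # "sco" means the whole service, all containers, and all objects.
    if "c" in params.get("srt", [""])[0]:
        findings.append("account-level SAS scoped to all containers")
    return findings
```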

Still, the incident is a reminder to securely configure access to cloud storage accounts, especially those housing large data sets. “As data scientists and engineers race to bring new AI solutions into production, the large amounts of data they handle require additional security checks and safeguards,” Wiz added.

Wiz’s report also details some alleged security flaws with SAS tokens on Azure accounts. But Microsoft says, “Like any key-based authentication mechanism, SAS can be revoked at any time by rotating the original key. In addition, SAS supports granular revocation at the container level, without rotating storage account keys.”
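The revocation path Microsoft describes can be sketched with the azure-mgmt-storage management SDK: regenerating a storage account key immediately invalidates every SAS token signed with that key. The subscription, resource group, and account names below are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import StorageAccountRegenerateKeyParameters

client = StorageManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

# Rotating key1 breaks all SAS URLs signed with key1, while tokens
# signed with key2 keep working until key2 is rotated in turn.
client.storage_accounts.regenerate_key(
    resource_group_name="example-rg",  # hypothetical
    account_name="examplestorage",     # hypothetical
    regenerate_key=StorageAccountRegenerateKeyParameters(key_name="key1"),
)
```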
