Microsoft AI researchers accidentally exposed tens of terabytes of sensitive data, including private keys and passwords, when they published a storage bucket of open-source training data on GitHub.
In research shared with TechCrunch, cloud security startup Wiz said it discovered a GitHub repository belonging to Microsoft’s AI research division as part of ongoing work into accidental exposure of cloud-hosted data.
Readers of the GitHub repository, which provides open-source code and AI models for image recognition, were instructed to download the models from an Azure Storage URL. However, Wiz found that this URL was configured to grant permissions on the entire storage account, exposing additional private data by mistake.
The exposed data totaled 38 terabytes and included backups of two Microsoft employees’ personal computers. It also contained passwords to Microsoft services, secret keys and other sensitive personal data, including 30,000 internal Microsoft Teams messages from hundreds of Microsoft employees.
The URL, which had exposed this data since 2020, was also misconfigured to allow “full control” rather than “read-only” permissions, according to Wiz, meaning anyone who knew where to look could potentially delete, modify or inject malicious content into the account.
Wiz notes that the storage account was not directly exposed. Instead, Microsoft AI developers included an overly permissive Shared Access Signature (SAS) token in the URL. SAS tokens are an Azure mechanism that lets users create shareable links granting access to data in an Azure Storage account.
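To make the mechanism concrete, the sketch below shows how the scope of a SAS link can be read from its query string, which carries the signed permissions (`sp`), the expiry time (`se`) and, for account-level tokens, the signed resource types (`srt`). It is only an illustration: the URL, account name and expiry date are hypothetical and do not come from the Microsoft repository, and treating anything beyond read (`r`) and list (`l`) as more than read-only is a simplifying assumption.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs

# Assumption for this sketch: only "read" and "list" count as read-only.
READ_ONLY = {"r", "l"}

# Hypothetical SAS URL; every value here is made up for illustration.
EXAMPLE_URL = (
    "https://example.blob.core.windows.net/models"
    "?sv=2020-08-04&ss=b&srt=sco&sp=rwdlac"
    "&se=2051-10-06T00:00:00Z&sig=REDACTED"
)


def inspect_sas_url(url, now=None):
    """Summarize what a SAS URL grants, based only on its query string."""
    query = parse_qs(urlsplit(url).query)
    permissions = query.get("sp", [""])[0]
    expiry_raw = query.get("se", [""])[0]

    expiry = None
    if expiry_raw:
        # Accept a trailing "Z" on Python versions older than 3.11.
        expiry = datetime.fromisoformat(expiry_raw.replace("Z", "+00:00"))
        if expiry.tzinfo is None:
            expiry = expiry.replace(tzinfo=timezone.utc)

    now = now or datetime.now(timezone.utc)
    return {
        "permissions": permissions,
        # Permission letters that go beyond reading and listing.
        "beyond_read_only": sorted(set(permissions) - READ_ONLY),
        # Account SAS tokens carry a signed resource types field.
        "is_account_sas": "srt" in query,
        "days_until_expiry": (expiry - now).days if expiry else None,
    }


if __name__ == "__main__":
    print(inspect_sas_url(EXAMPLE_URL))
```

A link meant only for downloading a model would be expected to show an empty `beyond_read_only` list and a short time to expiry; a result listing write or delete letters on an account-level token is the kind of over-sharing Wiz describes.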
“AI opens up enormous potential for tech companies,” Wiz co-founder and CTO Ami Luttwak told TechCrunch. “However, as data scientists and engineers race to bring new AI solutions into production, the large amounts of data they handle require additional security checks and security measures. As many development teams need to handle large amounts of data, share it with their peers, or collaborate on public open-source projects, cases like Microsoft’s are becoming increasingly difficult to monitor and avoid.”
Wiz said it shared its findings with Microsoft on June 22, and Microsoft revoked the SAS token two days later, on June 24. Microsoft said it completed its investigation into the potential organizational impact on August 16.
In a blog post shared with TechCrunch ahead of publication, Microsoft’s Security Response Center said “no customer data was exposed and no other internal services were compromised by this issue.”
Microsoft said that as a result of Wiz’s research, it has expanded GitHub’s secret scanning service, which monitors all public open-source code changes for plaintext exposure of credentials and other secrets, to include any SAS token that carries overly broad permissions or privileges.
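As a rough illustration of that kind of check, the sketch below scans text for URLs that look like SAS links and flags those granting more than read and list access. It is not GitHub's or Microsoft's implementation, whose detection rules are not described here; the function name, the URL pattern and the read-only rule are all assumptions made for the example.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Loose pattern for URLs embedded in code or documentation (assumption).
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")

# Assumption for this sketch: only "read" and "list" count as read-only.
READ_ONLY = {"r", "l"}


def find_risky_sas_urls(text):
    """Return (line number, extra permission letters) for each risky SAS URL."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in URL_PATTERN.finditer(line):
            query = parse_qs(urlsplit(match.group()).query)
            # A SAS link carries a signature (sig) and signed permissions (sp).
            if "sig" not in query or "sp" not in query:
                continue
            extra = sorted(set(query["sp"][0]) - READ_ONLY)
            if extra:
                findings.append((line_no, "".join(extra)))
    return findings


if __name__ == "__main__":
    # Hypothetical README text; the URLs are made up for illustration.
    sample = (
        "Download: https://example.blob.core.windows.net/m/a.ckpt?sp=r&sig=AAA\n"
        "Mirror: https://example.blob.core.windows.net/m?sp=rwdl&sig=BBB\n"
        "Docs: https://example.com/readme\n"
    )
    print(find_risky_sas_urls(sample))
```

A production scanner would also need to weigh expiry dates and account-level scope, and to handle tokens that appear outside a full URL.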