A major security vulnerability has been discovered in Microsoft's Copilot, stemming from Bing’s caching system. When public GitHub repositories were made private or deleted, Bing’s cache retained the data, allowing Copilot to surface outdated and potentially sensitive code snippets. This flaw exposed confidential information from organizations such as Google, IBM, PayPal, and even Microsoft itself.
How the Vulnerability Works
The problem originates in Bing's indexing and caching of public repositories. When a repository's status changed from public to private, the cached data was not promptly updated or removed. As a result, Copilot continued to provide suggestions based on the stale cache, unintentionally exposing intellectual property, access keys, security tokens, and internal software packages. This is not a direct flaw in Copilot or GitHub but a consequence of Bing's caching behavior.
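One practical way to gauge your exposure is to check what Bing still lists under your organization. Below is a minimal sketch assuming access to the Bing Web Search API (v7) with a subscription key in a `BING_SEARCH_KEY` environment variable; the endpoint, query syntax, and response shape are assumptions about that API, `your-org` is a placeholder, and an indexed URL does not by itself prove that cached content is still retrievable.

```python
"""Hedged sketch: list what Bing still has indexed under a GitHub organization.

Assumes a Bing Web Search (v7) subscription key; this only shows what is
indexed, cached page contents are a separate question.
"""
import os
import requests

BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"  # assumed v7 endpoint
ORG = "your-org"  # hypothetical organization name

def indexed_repo_pages(org: str) -> list[str]:
    resp = requests.get(
        BING_ENDPOINT,
        headers={"Ocp-Apim-Subscription-Key": os.environ["BING_SEARCH_KEY"]},
        params={"q": f"site:github.com/{org}", "count": 50},
        timeout=30,
    )
    resp.raise_for_status()
    pages = resp.json().get("webPages", {}).get("value", [])
    return [page["url"] for page in pages]

if __name__ == "__main__":
    for url in indexed_repo_pages(ORG):
        print(url)
```

Results that point at repositories you have since made private or deleted are the ones worth investigating first.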
Discovery and Impact
Israeli cybersecurity firm Lasso identified the issue when it found that one of its own GitHub repositories, which had since been made private, was still accessible via Copilot's suggestions. Further investigation revealed that more than 16,000 organizations were affected, with over 300 private security credentials and more than 100 internal software packages at risk. This raised significant concerns about intellectual property protection and the leakage of confidential information.
Microsoft’s Response and Ongoing Concerns
After being informed of the issue in November 2024, Microsoft updated its security policies in January 2025, restricting public access to Bing's cache to reduce further exposure. However, concerns remain about how long cached data lingers and about the potential for similar vulnerabilities in other AI-driven tools. The incident highlights the complexities of data privacy in AI systems, especially for platforms that rely on vast datasets for machine learning.
Community Reactions and Ethical Concerns
The incident has sparked debates about Copilot’s permissions, particularly its extensive read and write access to public and private repositories. Developers are demanding more transparency about how Copilot accesses and utilizes code from private repositories. This situation has fueled broader concerns about AI ethics and data privacy.
Recommendations for Developers and Organizations
- Audit Repository History: Review the full history of any repository that was ever public for sensitive data. Even brief public exposure matters, because automated tools continuously scan GitHub for access keys, usernames, passwords, and other secrets. (A minimal scanning sketch follows this list.)
- Rotate Keys and Credentials: Replace any access keys or credentials that might have been exposed. (A rotation sketch, using AWS IAM as one example, follows this list.)
- Use Secure Coding Practices: Avoid storing sensitive information like API keys in source code. Use environment variables or a secrets management tool instead. (See the sketch after this list.)
- Monitor Access: Implement monitoring solutions to detect unauthorized access or unusual activity related to your repositories.
- Demand Transparency and Control: Advocate for platforms like GitHub and Microsoft to provide more transparency and control over how AI tools access and use cached data.
- Review Permissions Regularly: Periodically review the permissions granted to Copilot and other third-party tools, limiting access to only what's necessary.
- Check Which AI Tools Have Access: Regularly review which AI tools have access to your GitHub repositories, not just in response to this issue but as an ongoing practice for safeguarding sensitive information. (A sketch that lists app installations and their permissions follows this list.)
- Restrict Public Repository Creation: Limit the ability to create public repositories to specific people within the organization. Consult with security and legal teams before making any repository public to ensure compliance with privacy regulations and intellectual property protections. (An org-setting sketch follows this list.)
- Use Separate Accounts for Public Repos: Consider creating a separate organization account solely for public repositories to minimize the risk of accidentally exposing sensitive internal information.
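For the history audit, a dedicated scanner such as gitleaks or trufflehog is the more thorough option; the sketch below is only a minimal illustration of the idea, run against a local clone, with a deliberately small set of example patterns.

```python
"""Hedged sketch for 'Audit Repository History': grep the full git history,
including all branches, for secret-like strings. The patterns are
illustrative, not exhaustive.
"""
import re
import subprocess

SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "GitHub token": re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"),
    "Private key header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "Generic assignment": re.compile(r"(?i)(password|secret|api_key)\s*[:=]\s*['\"][^'\"]{8,}"),
}

def scan_history(repo_path: str = ".") -> None:
    # Diff every commit on every branch, including files later deleted from HEAD.
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "-p", "--all"],
        capture_output=True, text=True, check=True, errors="replace",
    ).stdout
    for lineno, line in enumerate(log.splitlines(), 1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                print(f"[{label}] log output line {lineno}: {line.strip()[:120]}")

if __name__ == "__main__":
    scan_history()
```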
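For key rotation, the safe pattern is: create the replacement, deactivate the old key, and delete it only after everything is confirmed on the new one. The sketch below shows that flow for AWS IAM via boto3 as one concrete example; the user name and key ID are placeholders, and the same idea applies to whichever provider issued the exposed credential.

```python
"""Hedged sketch for 'Rotate Keys and Credentials', using AWS IAM via boto3
as one example provider.
"""
import boto3

def rotate_access_key(user_name: str, old_key_id: str) -> dict:
    iam = boto3.client("iam")
    # 1. Create the replacement key; the SecretAccessKey is only returned here,
    #    so hand it straight to your secrets manager or deployment pipeline.
    new_key = iam.create_access_key(UserName=user_name)["AccessKey"]
    # 2. Deactivate (do not delete yet) the possibly exposed key.
    iam.update_access_key(UserName=user_name, AccessKeyId=old_key_id, Status="Inactive")
    # 3. Delete the old key once every service is confirmed working on the new one.
    # iam.delete_access_key(UserName=user_name, AccessKeyId=old_key_id)
    return new_key
```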
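For keeping secrets out of source, the simplest habit is reading them from the environment (or a secrets manager) at runtime. A minimal before/after sketch, with a made-up `API_TOKEN` variable:

```python
"""Hedged sketch for 'Use Secure Coding Practices': inject secrets at runtime
instead of committing them to source.
"""
import os

# Bad: a literal token in source code survives in git history and search caches.
# API_TOKEN = "example-token-do-not-do-this"

# Better: read the value from the environment and fail loudly if it is missing.
API_TOKEN = os.environ.get("API_TOKEN")
if API_TOKEN is None:
    raise RuntimeError("API_TOKEN is not set; configure it via your environment or secrets manager")
```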
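For reviewing which apps and AI tools can reach your repositories, the organization's app installations are a good starting point. The sketch below assumes a token with organization-admin read access in `GITHUB_TOKEN` and uses GitHub's "list app installations for an organization" REST endpoint; `your-org` is a placeholder, and the field names reflect the API documentation at the time of writing.

```python
"""Hedged sketch for 'Review Permissions Regularly' / 'Check Which AI Tools
Have Access': list GitHub App installations on an organization and the
permissions each was granted.
"""
import os
import requests

ORG = "your-org"  # hypothetical organization name

def list_app_installations(org: str) -> None:
    resp = requests.get(
        f"https://api.github.com/orgs/{org}/installations",
        headers={
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
            "Accept": "application/vnd.github+json",
        },
        timeout=30,
    )
    resp.raise_for_status()
    for inst in resp.json().get("installations", []):
        # app_slug identifies the app (e.g. a Copilot or CI integration);
        # repository_selection is "all" or "selected".
        print(inst["app_slug"], inst.get("repository_selection"), inst.get("permissions"))

if __name__ == "__main__":
    list_app_installations(ORG)
```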
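Restricting who can create public repositories can also be enforced as an organization setting rather than by policy alone. The sketch below assumes an org-admin token in `GITHUB_TOKEN` and uses the `members_can_create_public_repositories` field from GitHub's "update an organization" endpoint; `your-org` is a placeholder, and availability of the setting may depend on your plan.

```python
"""Hedged sketch for 'Restrict Public Repository Creation': disable public
repository creation by organization members via the GitHub REST API.
"""
import os
import requests

ORG = "your-org"  # hypothetical organization name

resp = requests.patch(
    f"https://api.github.com/orgs/{ORG}",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    json={"members_can_create_public_repositories": False},
    timeout=30,
)
resp.raise_for_status()
print("Members can no longer create public repositories in", ORG)
```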
Business Implications and Risk Management
This incident isn't just a technical concern; it poses significant business risks. Data leaks can damage brand reputation, lead to loss of customer trust, and result in legal penalties due to non-compliance with data privacy regulations. For companies leveraging AI in their products, this serves as a reminder to implement robust data security practices. Sales and marketing teams should be proactive in communicating security measures to clients to maintain credibility and customer confidence.
Final Thoughts
This incident illustrates the challenges of data privacy in AI systems and underscores the need for transparent data policies and robust security practices. Developers and organizations must be vigilant about how their data is cached and accessed, especially when using AI-powered tools that rely on public repositories. By proactively securing sensitive information, regularly reviewing access permissions, and advocating for better transparency from technology providers, the developer community can mitigate the risks of data exposure.
Source: Lasso Security - Major Vulnerability in Microsoft Copilot
Stay frozen! ❄️
-Kobi.