Based on the audit data presented in the 2024 “Open Source Security and Risk Analysis” (OSSRA) report, organizations in all verticals should be concerned about the potential risk of litigation or threat to their intellectual property rights due to failure to comply with an open source license. The report’s findings show that over half—53%—of the 2023 audited codebases contained open source with license conflicts.
An open source license sets out the terms and conditions for using an open source component (or a snippet of a component’s code) in software, including the end user obligations that govern how the code may be used and redistributed.
Most open source licenses fall into one of two categories. A “permissive” license allows use of the component with few restrictions. Generally, the main requirement of this type of license is to attribute the original code to the original developers. A “copyleft” license (also known as a viral license) generally includes a reciprocity obligation stating that modified and extended versions must be released under the same terms and conditions as the original code, and that the source code containing changes must be provided upon request. In a general sense, most permissive licenses are considered low-risk from a compliance standpoint, while copyleft licenses can expose organizations to varying levels of IP and compliance risk.
Terms such as “low-risk” are only a guideline, and developers should not rely on them when deciding whether to use open source governed by a given license. For example, although software under the Apache License 2.0 (generally considered a low-risk license) can be included in projects licensed under the GNU General Public License 3.0 (GPLv3), GPLv3 software cannot be included in Apache projects. This is a result of the Apache Software Foundation’s licensing philosophy and the GPLv3 authors’ interpretation of copyright law. Another example is the JSON license, which is based on the permissive MIT license but adds the restriction that “the software shall be used for good, not evil.” The ambiguity of this statement leaves its meaning open to interpretation, and that adds risk. Rather than trying to interpret often convoluted licensing language, the safest strategy is for developers to consult their corporate policies and legal teams for specific guidance regarding license compliance.
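The one-way relationship between Apache License 2.0 and GPLv3 can be pictured as a directed lookup rather than a single “risk” score. The following is a minimal sketch only: the table holds just the one pair described above, the license labels and the names `KNOWN_PAIRS` and `can_include` are hypothetical, and the result is no substitute for guidance from a legal team.

```python
from typing import Optional

# Hypothetical, deliberately tiny table for illustration.
# Keys are (component_license, project_license); the value says whether
# the component may be included in a project under that license.
# Only the pair discussed in the text is listed.
KNOWN_PAIRS = {
    ("Apache-2.0", "GPL-3.0"): True,   # Apache 2.0 code inside a GPLv3 project
    ("GPL-3.0", "Apache-2.0"): False,  # GPLv3 code inside an Apache project
}


def can_include(component_license: str, project_license: str) -> Optional[bool]:
    """Return True or False for a listed pair, or None when the pair is unknown."""
    return KNOWN_PAIRS.get((component_license, project_license))


if __name__ == "__main__":
    verdicts = {True: "allowed", False: "not allowed", None: "unknown, ask legal"}
    for component, project in [
        ("Apache-2.0", "GPL-3.0"),
        ("GPL-3.0", "Apache-2.0"),
        ("JSON", "Apache-2.0"),
    ]:
        answer = can_include(component, project)
        print(f"{component} component in {project} project: {verdicts[answer]}")
```

Returning `None` for any pair that is not listed is a deliberate choice in this sketch: an unknown combination is treated as a question for the legal team, not as permission.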
As noted above, a code snippet (an extract from a larger piece of code that a developer cuts and pastes into their own code) carries with it any license terms and obligations associated with that larger piece of code. One of the more popular repositories of code snippets is the Stack Overflow site, which automatically licenses all publicly accessible user contributions, including the code snippets posted there, under Creative Commons ShareAlike (CC-SA).
The CC-SA license can be interpreted in some situations as having a viral effect similar to that of the GNU General Public License (that is, any work derived from a copyleft-licensed work must also be licensed under the same copyleft terms), so it can become a legal concern. As the data in the 2024 OSSRA report shows, CC-SA licenses were the top cause of license conflicts, with CC-SA 3.0 and 4.0 alone producing 33% of the license conflicts found.
The use of AI-powered coding suggestion tools raises questions around ownership, copyright, and licensing of the generated code. For example, a class-action lawsuit filed against GitHub, Microsoft, and OpenAI claims that GitHub Copilot, a cloud-based AI tool that offers developers autocomplete-style suggestions as they code, violates both copyright law and software licensing requirements. The lawsuit further claims that the code suggested by Copilot uses licensed materials without attribution, copyright notice, or adherence to the original licensing terms.
The Copilot case highlights the legal complexities surrounding the use of AI-generated code. For software developers, refraining from using AI-assisted coding tools until the issue is resolved by legal or government decision is the safest way to avoid an action for license or copyright violations. But the reality is that many developers are using and will continue to use those tools. Those who are using them should, at a minimum, have their organization ask its AI tool vendors whether the tools’ recommendations include source code subject to open source licenses, and if so, whether that code can be highlighted or excluded from recommendations altogether.
Another solution is to use one of the available code scanners, such as Synopsys Black Duck®, which uses snippet analysis to scan source code and match individual lines of code back to any open source project they may originate from. For those organizations planning or involved in an M&A transaction, an open source audit as part of software due diligence can highlight instances where an AI has copied code from open source projects and help organizations understand the potential risk to their IP.
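To give a rough sense of what matching lines of code back to an open source project can involve, here is a simplified sketch of snippet matching by fingerprinting short runs of lines. It is not a description of how Black Duck or any other commercial scanner works. The window size, the function names, and the `example-lib` project are assumptions made up for illustration, and exact hashing like this would miss snippets that have been renamed or reformatted beyond whitespace.

```python
import hashlib
from collections import defaultdict

WINDOW = 3  # consecutive non-blank lines per fingerprint (arbitrary choice)


def normalize(source: str) -> list[str]:
    """Strip surrounding whitespace from each line and drop blank lines."""
    return [line.strip() for line in source.splitlines() if line.strip()]


def fingerprints(source: str) -> set[str]:
    """Hash every run of WINDOW consecutive normalized lines."""
    lines = normalize(source)
    digests = set()
    for start in range(len(lines) - WINDOW + 1):
        chunk = "\n".join(lines[start:start + WINDOW])
        digests.add(hashlib.sha256(chunk.encode("utf-8")).hexdigest())
    return digests


def build_index(known_sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each fingerprint to the names of the projects that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for project, source in known_sources.items():
        for digest in fingerprints(source):
            index[digest].add(project)
    return index


def find_matches(target: str, index: dict[str, set[str]]) -> set[str]:
    """Return the projects sharing at least one fingerprint with the target."""
    matched: set[str] = set()
    for digest in fingerprints(target):
        matched |= index.get(digest, set())
    return matched


if __name__ == "__main__":
    # Hypothetical "known open source" corpus with a single project.
    known = {"example-lib": "def add(a, b):\n    total = a + b\n    return total\n"}
    # Target code that pasted the function with different indentation.
    target = (
        "import os\n\n"
        "def add(a, b):\n  total = a + b\n  return total\n"
        "print(add(1, 2))\n"
    )
    print(find_matches(target, build_index(known)))
```

Any project reported by a scan like this is only a lead: the license attached to the matched project still has to be checked against corporate policy.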
Whether your organization develops or uses software, there’s a near certainty it has open source components. Do you know exactly what those components are and whether they pose security or license risks? Without visibility into your code and proactive software hygiene practices, you’re exposing your software to potential exploits from open source vulnerabilities and IP compliance questions.
The major theme of the 2024 OSSRA report is “Do you know what’s in your code?” With the prevalence of open source and the rise in AI-generated code, that question is more important than ever.
The 2024 Open Source Security and Risk Analysis Report is here.