Open Source AI
Are the traditional open source rules applicable to AI?
This week, I attended the All Things Open Conference in Raleigh, NC. It’s one of my favorite events centered on open source software, so this week I’m sharing a related announcement.
I’ve partnered with them to produce an All Things Open AI conference in Durham, NC, in March of next year. I am very excited to be able to bring in-person AI events to the area. Feel free to join the mailing list on the site to get exclusive updates and early access to discounted tickets and training opportunities.
AI Efficiency Edge
Easily Run Private and Cost-Effective AI
For businesses seeking to adopt AI without compromising data privacy or incurring high cloud costs, Ollama offers an accessible solution. This open source framework allows companies to run large language models (LLMs) directly on their own hardware, avoiding the risks of external data transfers and delivering faster, tailored results. It’s also fairly easy to do, even on a desktop computer, provided it’s powerful enough.
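To show how little setup this takes, here is a minimal sketch that sends a prompt to a locally running Ollama server through its REST API. It assumes Ollama is installed and listening on its default local port, and that a model such as llama3 has already been pulled; swap in whichever model you actually run.

```python
# Minimal sketch: query a local Ollama server over its REST API.
# Assumes Ollama is running on the default port (11434) and a model
# (here "llama3") has already been pulled, e.g. with `ollama pull llama3`.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint

def ask_local_model(prompt: str, model: str = "llama3") -> str:
    """Send a prompt to the locally hosted model and return its answer."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one complete answer instead of a token stream
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]

if __name__ == "__main__":
    print(ask_local_model("Summarize our data-retention policy in two sentences."))
```

Because the request never leaves localhost, both the prompt and the response stay on your own hardware.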
Ollama Benefits for Business
1. Data Privacy and Control
Running AI models locally means sensitive information—like customer data or proprietary business insights—never leaves your internal systems. This setup is ideal for industries with strict data regulations (e.g., finance, healthcare) or companies needing to ensure client confidentiality.
2. Cost Savings
Ollama reduces costs typically associated with API usage or cloud service fees by eliminating reliance on cloud-based AI platforms. These savings quickly add up for businesses using AI on a large scale.
3. Performance and Responsiveness
Processing data in-house minimizes latency, leading to quicker, more responsive AI applications. This is especially beneficial in operations where real-time responses are critical.
Example Use Case: Elevating Financial Services with Secure AI
Scenario: A financial services firm wants to automate customer inquiries but faces regulatory hurdles with external data sharing. The firm chooses Ollama to build an AI assistant that runs securely within its own infrastructure (a sketch of this pattern follows the list below).
Customer Data Remains In-House: Sensitive information stays within the firm’s private network, ensuring compliance with privacy regulations.
Reduced Costs: By using Ollama to power their AI solution, the firm avoids recurring cloud expenses, improving ROI.
Improved Service Speed: Real-time data processing enables faster customer responses, enhancing service satisfaction and operational efficiency.
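As a purely hypothetical illustration of the scenario above, the sketch below wraps a locally hosted model in a simple customer-inquiry function using Ollama’s chat endpoint. The firm, the system prompt, and the sample question are invented for illustration, and it again assumes a local Ollama server with a model such as llama3 already pulled.

```python
# Hypothetical sketch: an in-house customer-inquiry assistant for a
# financial services firm, answered entirely by a local Ollama model.
# The system prompt and sample question are illustrative assumptions.
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # default local endpoint

SYSTEM_PROMPT = (
    "You are a support assistant for a financial services firm. "
    "Answer questions about accounts and policies clearly and concisely."
)

def answer_inquiry(question: str, model: str = "llama3") -> str:
    """Answer a customer question using only the locally hosted model."""
    payload = json.dumps({
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        "stream": False,  # request a single complete reply
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_CHAT_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]

if __name__ == "__main__":
    print(answer_inquiry("What are your wire transfer cutoff times?"))
```

Because the assistant calls only the local endpoint, customer questions are never forwarded to a third-party API.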
Implementing Ollama allows businesses to leverage AI's full potential while ensuring that data remains secure, costs are controlled, and performance is optimized. By running models locally, businesses can achieve high-level AI capabilities without sacrificing privacy or incurring prohibitive expenses—key advantages in today's data-centric landscape.
AI Deep Dive
Open Source AI
The Open Source Initiative's (OSI) release of version 1.0 of the Open Source AI Definition (OSAID) at the All Things Open 2024 conference in Raleigh, North Carolina, may be well-intentioned but raises several practical and strategic concerns. While touted as a breakthrough after years of collaboration among tech giants, their official endorsements are noticeably missing on the endorsements page.
Brief Overview of Open Source
Think of open source like sharing a recipe: instead of keeping it secret, anyone can see, use, and improve it. Open source software is code available to everyone, allowing users to understand, modify, and enhance it. This is different from most software, where only the creator controls how it works.
Why Should You Care?
Transparency and Trust: Open source software is open for anyone to inspect, so there are no hidden surprises or privacy concerns. This transparency builds user trust.
Freedom and Control: With open source, you’re not tied to one company’s limitations. If you need to make changes or customize the software, you can do it (or have someone do it for you).
Community and Quality: Open source thrives on global collaboration. Anyone can suggest improvements, meaning bugs are fixed quickly, new features are added, and quality improves over time.
Cost-Effectiveness: Open source software is often free to use, providing high-quality tools without licensing fees, making it a great choice for individuals and organizations.
Shared Innovation: Open source is a powerful strategy for sharing the development load on “non-differentiating” technology, the tools everyone needs but that don’t set any company apart. By collaborating on foundational technologies, industries can move forward faster, focusing their unique resources on what differentiates them.
Open Source in Your Daily Life
Open source is all around you:
Smartphones: Android, one of the most popular operating systems, is open source.
Web Browsers: Firefox and Chromium (used in Chrome) are open source, allowing secure browsing with a strong privacy focus.
Streaming Services: Behind the scenes, open source software powers Netflix, YouTube, and other streaming services, keeping them fast and reliable.
Smart Home Devices: Many smart home systems and routers use open source software to ensure flexibility and security.
How Open Source Set the Stage for AI
The LAMP stack (Linux, Apache, MySQL, and PHP) exemplifies how open source has allowed industries to innovate faster. By building the web’s foundation, LAMP helped companies and developers worldwide focus on creating new digital experiences without reinventing core technology. This shared framework enabled the rapid development of web services and cloud computing platforms, which are now crucial for artificial intelligence (AI).
Today, open source is helping AI progress in similar ways. Frameworks like TensorFlow and PyTorch, along with openly released models, provide shared foundations that developers and companies can build upon, allowing the industry to move forward faster while supporting collaboration, transparency, and innovation.
By choosing open source, you’re supporting a system that accelerates development, shares knowledge, and prioritizes openness for the good of everyone.
Open source has proven to be a powerful catalyst for innovation. It demonstrates that immense benefits accrue to everyone by removing barriers to learning, using, sharing, and improving software systems. These benefits arise from licenses that adhere to the Open Source Definition (OSD), granting key freedoms to use, study, modify, and distribute software without excessive restriction.
The same freedoms are essential for AI to enable developers, deployers, and end users to benefit from enhanced autonomy, transparency, frictionless reuse, and collaborative improvement. However, with the rise of large language models (LLMs) like Meta’s Llama 3, it’s becoming increasingly clear that applying traditional open source licensing to AI introduces unique challenges—revealing a need for an adapted framework tailored to AI.
The Complexity of Open Source in AI: Square Peg, Round Hole
The Open Source Definition was developed with software in mind, and it works best for applications with standard dependencies, accessible codebases, and achievable reproducibility. LLMs diverge from these norms in ways that make it challenging to apply the OSD to them:
Data Transparency: One of the most significant hurdles is data transparency. For an AI model to be reproducible in the open source sense, it needs a complete Training Data Bill of Materials (TDBOM), analogous to the Software Bill of Materials (SBOM) becoming popular in software supply chains. A TDBOM covers the data’s provenance, selection criteria, processing steps, and labeling processes (a hypothetical sketch of such a record follows this list). Releasing this information is difficult for many LLMs because of proprietary data sources, privacy concerns, and the sheer scale of the data involved; beyond that, it can be very hard to identify and confirm the sources of the vast corpora used to train foundation models.
Full Source Code: In traditional open source, access to the full source code facilitates modification and reproducibility. With LLMs, providing the complete code used in data preprocessing, tokenization, training configurations, and fine-tuning processes is not only resource-intensive but often reveals sensitive internal methodologies or optimizations, which companies guard as proprietary.
Parameters and Computational Barriers: Traditional open source licensing assumes accessible software that users can run and modify with standard resources. In contrast, LLMs require extensive computational power to reproduce, even with access to model weights and architecture. This makes “openness” in AI much more resource-intensive and exclusive, limiting the practical freedoms intended by the OSD.
Legal and Reproducibility Issues: Unlike traditional software, an LLM is not made truly accessible simply by releasing its weights and source code. Without the underlying data and a way to fully reproduce the results, LLMs fall short of open source’s transparency and accessibility goals.
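To make the TDBOM idea more concrete, here is a sketch of what a single machine-readable entry might capture. No standard TDBOM schema exists today, so every field name and value below is a hypothetical assumption, loosely modeled on how an SBOM describes software components.

```python
# Purely illustrative: one entry in a hypothetical Training Data Bill of
# Materials (TDBOM). No standard schema exists; all field names and values
# below are assumptions for discussion purposes only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class TrainingDataComponent:
    name: str                   # human-readable name of the dataset
    source: str                 # provenance: where the data came from
    license: str                # license or terms governing the data
    selection_criteria: str     # why this data was included
    processing_steps: List[str] = field(default_factory=list)  # cleaning, filtering, dedup
    labeling_process: str = "none"  # how labels or annotations were produced

example_entry = TrainingDataComponent(
    name="public-web-crawl-2023",
    source="Filtered crawl of publicly accessible web pages",
    license="Mixed; see per-domain records",
    selection_criteria="English-language pages above a quality-score threshold",
    processing_steps=["deduplication", "PII removal", "toxicity filtering"],
    labeling_process="none (unsupervised pretraining corpus)",
)

if __name__ == "__main__":
    print(example_entry)
```

A published collection of entries like this, shipped alongside the model weights, would give reviewers the provenance information that open source-style reproducibility assumes, without necessarily redistributing the underlying data itself.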
Case in Point: Meta’s Llama 3 License
Meta’s Llama 3 Community License Agreement offers a prominent example of how traditional open source licensing falls short in AI applications. While Meta has labeled Llama 3 as “open source,” its license diverges from OSD norms in several key ways, making the term “open source” somewhat misleading:
Controlled Redistribution and Use: Meta’s license grants a non-exclusive, non-transferable, royalty-free license but mandates prominent branding (“Built with Meta Llama 3”) on any derivative works or products using Llama 3. It also restricts using the model to train or improve any other LLM. Such conditions are inconsistent with standard open source licenses, which typically avoid branding mandates and restrictions on derivative use.
Proprietary Data and Reproducibility: The license allows users access to model weights and code but lacks data transparency—without which true reproducibility isn’t feasible. Open source emphasizes accessibility, but without a TDBOM, reproducing Llama 3’s training process is challenging, if not impossible.
Limitations on Usage: Meta’s license restricts specific applications of the Llama 3 model, including its use in military, critical infrastructure, and other high-risk environments, as outlined in its Acceptable Use Policy. Such restrictions highlight the gap between open source’s unrestrictive ethos and LLM-specific requirements.
Additional Licensing Complexity: Llama 3’s license terminates user rights if their organization exceeds 700 million monthly active users without securing an additional license. These limitations further demonstrate the controlled nature of Llama’s distribution and fall short of true open source principles.
Lessons from the Past: Custom Licenses and Market Confusion
Meta is not the first to adopt a unique license structure and label it as “open source,” a practice that has historically caused market confusion. In the late 1990s, Sun Microsystems’ “Community Source License” for Java introduced additional restrictions, leading many to question its “open source” label. Similarly, SugarCRM’s “Sugar Public License” introduced restrictive terms inconsistent with OSD, ultimately causing a backlash in the developer community. In both cases, these licenses sought to balance proprietary control with open source-like freedoms, creating confusion and fracturing trust in open source definitions. Meta’s “Community License” for Llama 3, by adding restrictions on use and redistribution, risks repeating these past mistakes, potentially confusing users and diluting the open source label.
Open Source Clarity: Avoiding Legal and Compliance Pitfalls
Many companies understand open source as software with specific freedoms and responsibilities, as defined by recognized standards like the Open Source Initiative (OSI). This clarity has allowed developers, legal teams, and business leaders to use, modify, and distribute software within a well-defined legal framework for decades. However, as new AI models and frameworks emerge that do not comply with traditional open source definitions, this shift can potentially lead to confusion or unintentional legal exposure.
The Risk of “Muddied” Open Source Definitions
Legal Ambiguity: When frameworks position themselves as open source without adhering to standard OSI definitions, legal departments face uncertainty. Traditional open source offers a consistent understanding of licensing terms and usage rights, but new interpretations may carry limitations or conditions that could surprise compliance and legal teams. For companies with strict governance over software use, this can introduce unforeseen risks.
Developer Implications: For developers accustomed to OSI-compliant open source, ambiguous licensing can lead to practices that might inadvertently violate usage terms. Developers rely on standard open source norms to integrate, modify, or distribute software without needing extensive legal reviews. Non-standard definitions can complicate this understanding, requiring additional time and legal oversight.
Risk of Non-Compliance: Companies that misunderstand or misinterpret licensing could unintentionally violate terms, leading to potential legal challenges, reputational harm, and fines. In regulated industries, this could also disrupt compliance audits or limit flexibility with internal processes.
Clear Paths Forward for Legal and Development Teams
To navigate these complexities, legal and development teams should focus on tools that strictly adhere to recognized open source definitions, or collaborate closely to review the terms of newer frameworks. Ensuring that these frameworks are used in a compliant and legally sound way means aligning their use with the company’s open source policies, or adjusting those policies to account for this evolving category of AI tooling.
Proposing a New Approach: “Responsible AI Openness”
Rather than forcing AI models like Llama into the open source category, the industry could benefit from a new framework that upholds core open source values while accommodating AI-specific needs. A “Responsible AI Openness” framework could provide a more realistic and transparent approach by focusing on the following elements:
Transparent Methodology and TDBOM: This approach would mandate disclosure of data methodologies and high-level data attributes without requiring full, unrestricted data access. A TDBOM could enable sufficient reproducibility for researchers and developers to understand and improve AI models while respecting proprietary boundaries.
Open Architecture and Weights: LLMs could be released with access to model architecture, weights, and configuration files, but limited to non-commercial use, to encourage research and responsible development without opening proprietary components to misuse.
Ethical Usage Requirements: This framework could explicitly encourage ethical usage and establish limits for high-risk applications without restricting core freedoms, balancing responsible use and openness.
Community Governance and Feedback: Similar to open source projects, a community-driven governance model could be implemented to address model evolution, facilitate responsible development, and avoid unintended consequences.
Conclusion: Reimagining Open Source for AI
For AI to benefit from open source’s foundational principles, it needs a framework that respects the unique challenges of LLMs. Calling models like Meta’s Llama 3 “open source” despite restrictions confuses the market and risks diminishing trust in the open source community. Instead, a “Responsible AI Openness” approach, focused on transparency, ethical usage, and responsible distribution, would be better suited to the needs of large language models while preserving the spirit of open source.
By embracing a new licensing model that accommodates AI’s complex requirements, we can foster collaboration, innovation, and transparency without misusing the open source label—paving the way for a more accessible and ethically responsible AI ecosystem.
AI Toolbox
Google Maps Powered By Gemini - One to watch: Google announced this week that it will be adding AI capabilities to Google Maps, Google Earth, and Waze.
Ideogram - I’m finding Ideogram to be the easiest to use and most capable generative AI tool for creating images.
Prompt of the Week
Analyze Voter Choices for Your Area
It’s election season in the United States. There are numerous local and national races, and this prompt can help you compare candidates’ positions on the issues using ChatGPT. Of course, you should verify the sources it cites, but I was impressed by the accuracy of the results.
How To Use This Prompt
Replace the postal code in the prompt with your own to gather local data. Ensure you see the notification that ChatGPT is searching the web and gathering the latest data.
"I live in Cary, NC (ZIP 27513) and am preparing to vote in the upcoming election. I would like help identifying candidates who align with my values: I lean toward financial conservatism and social liberalism, and I am especially interested in candidates with a track record of successful financial management—either in private business or public service. Please provide an overview of candidates for my area, focusing on their fiscal policies, social views, and any demonstrated financial successes. If possible, also include any available information on their leadership style or key accomplishments."
How did we do with this edition of the AIE?
Your AI Sherpa, Mark R. Hinkle