🤖 AI Law & Technology Policy

Why Big Tech Is Giving AI Models Away (Almost) for Free: The New Wave of Permissive Licenses

Meta, Google, Microsoft and Mistral are releasing some of the world's most powerful AI models under permissive licences — for anyone to use, modify, and deploy. This post explains the real business logic behind the strategy, decodes what "permissive" actually means legally, and sets out the hidden risks for businesses and developers who rely on these models.

🔑 Key topics covered in this post
Open weights AI models · Meta LLaMA 3 licence · Apache 2.0 vs custom licences · EU AI Act GPAI obligations · AI training data copyright · Downstream AI liability · Acceptable use policies · AI risk management · Open-source vs open-weights · Business compliance checklist
1
The Open-Source AI Revolution — What Is Actually Happening

In early 2023, the dominant narrative in AI was one of secrecy. OpenAI had released GPT-4 as a fully closed model — no weights, no architecture details, no access without an API key. Google followed a similar path with Gemini. The assumption was that the most powerful AI models would remain proprietary assets, accessible only through controlled commercial interfaces. That narrative collapsed faster than almost anyone predicted.

Within eighteen months, Meta released LLaMA 2 and then LLaMA 3 — models competitive with the best commercial offerings — under licences that allowed free commercial use. Mistral AI released Mistral 7B under Apache 2.0, one of the most permissive software licences that exists. Google released Gemma. Microsoft released the Phi series. Falcon, from the Technology Innovation Institute in Abu Dhabi, became one of the most downloaded models in history. The world went from a handful of gated models to a landscape where dozens of frontier-class AI systems were publicly downloadable.

📅 Key open model releases — 2023 to 2025
🦙 February 2023 — Meta LLaMA 1 (7B–65B parameters)
Released for research only — then leaked to 4chan within days, triggering mass public distribution. The leak accelerated the open-source AI movement more than any deliberate release.
Licence: research-only (then leaked)

🦙 July 2023 — Meta LLaMA 2 (7B–70B parameters)
Meta's first intentional commercial release — permitting free use for most commercial purposes under a custom community licence. Downloaded over 30 million times in its first year, it became the foundation for thousands of fine-tuned specialist models.
Licence: LLaMA 2 Community Licence (commercial, with restrictions)

🌐 September 2023 — Mistral 7B
Released by Paris-based Mistral AI under Apache 2.0. Outperformed LLaMA 2 13B on most benchmarks at a fraction of the size, proving that smaller, highly optimised models could match much larger rivals.
Licence: Apache 2.0 — fully permissive

💎 February 2024 — Google Gemma (2B–7B parameters)
Google's open-weights release built on the Gemini research infrastructure. Released under a custom Gemma licence permitting commercial use with specific restrictions on redistribution and prohibited uses.
Licence: custom Gemma licence

🦙 April 2024 — Meta LLaMA 3 (8B–70B, later 405B parameters)
Meta's most powerful open release — LLaMA 3 405B competes directly with GPT-4-class models. Released under the LLaMA 3 Community Licence with commercial rights for most uses. The 405B release marked a new threshold: open-weights models at GPT-4 parity.
Licence: LLaMA 3 Community Licence

2023–2025 — Microsoft Phi-3 / Phi-4, DeepSeek V3, Qwen 2.5, Falcon 180B
A wave of open-weights releases from Microsoft, China-based labs (DeepSeek, Alibaba's Qwen), and the UAE's Technology Innovation Institute. DeepSeek V3, released under an MIT licence at near-GPT-4 performance, sent shockwaves through the AI industry in January 2025.
Licences: MIT, Apache 2.0, DeepSeek licence
🔍 Critical distinction: "Open Weights" vs. "Open Source" — they are not the same thing
✅ True Open Source (OSI definition)
📂 Full source code available — complete training code, data pipelines, and model architecture published.
📊 Training data disclosed — the datasets used to train the model are publicly documented or accessible.
🔓 No use restrictions — any use (commercial, research, modification, redistribution) is permitted without conditions.
🏆 Examples — truly OSI-compliant AI models are rare; Mistral 7B under Apache 2.0 comes closest, but its training data is not disclosed.
⚠️ Open Weights (what most "open" models actually are)
⚖️ Model weights only — the trained neural network parameters are published, but not the training code, data, or full pipeline (see the short sketch after this comparison for what using published weights looks like in practice).
🚫 Training data unknown — what data the model was trained on is not disclosed, creating copyright and liability risks for downstream users.
📋 Acceptable use restrictions — custom licences typically prohibit specific uses (military applications, illegal content, certain user-scale thresholds), with legal consequences for breach.
🏢 Examples — Meta LLaMA 2 & 3, Google Gemma, Microsoft Phi: all released as open weights, not open source in the OSI sense.
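To make the "weights only" point concrete, here is a minimal sketch of what actually using an open-weights model involves — assuming the Hugging Face transformers library, a GPU-equipped machine, and an account that has accepted the model's licence terms (LLaMA repositories are gated behind licence acceptance). The model ID is Meta's official LLaMA 3 8B Instruct repository; everything else — hardware, serving stack, monitoring, compliance — is the deployer's responsibility, which is exactly where the hidden costs discussed below arise.

```python
# Minimal sketch: running an open-weights model locally.
# Assumes `pip install transformers torch accelerate` and a Hugging Face
# token for a gated repository whose licence terms you have accepted.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # official Meta repo (gated)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

# You now hold the trained parameters -- but not the training code, the
# training data, or any warranty. The licence and AUP accepted when
# requesting access still govern every downstream use of these weights.
prompt = "Summarise the LLaMA 3 Community Licence in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```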
405B — parameters in Meta LLaMA 3's largest model: GPT-4 class, publicly downloadable
30M+ — downloads of LLaMA 2 in its first year after release
$100M+ — estimated compute cost to train a frontier open-weights model
5,000+ — fine-tuned LLaMA derivatives published on Hugging Face
⚡ The key legal point most businesses miss
When a company describes a model as "open source", it almost certainly means "open weights" — a fundamentally different and legally more complex situation. Open weights models come with contractual licences that impose real obligations on users. Using a Meta LLaMA or Google Gemma model in a commercial product means entering a legal agreement with conditions that most developers never read. Understanding what you agreed to — and what risks you inherited — is the starting point of any serious legal analysis of open AI deployment.
2
Why Big Tech Gives Models Away — The Real Business Logic

When a company spends hundreds of millions of dollars training an AI model and then releases it for free, the question is not whether they are being generous — it is what they gain in return. The open-release strategy is one of the most calculated moves in the technology industry's recent history. Understanding it matters not just for business strategy, but for understanding the legal and regulatory implications that follow.

🔗
Reason 1: Developer Ecosystem Lock-In
When millions of developers build applications on LLaMA or Gemma, they invest time, infrastructure, and institutional knowledge in those specific models. When Meta releases LLaMA 4, those developers have a strong incentive to upgrade within the Meta ecosystem rather than switch to a competitor. The model itself is the loss leader — the ecosystem loyalty is the product. This is the same strategy that made Android the world's dominant mobile operating system: give away the platform, monetise the surrounding infrastructure.
→ The model is free. The ecosystem loyalty is not.
☁️
Reason 2: Cloud Compute Revenue
Running a 70B or 405B parameter model requires significant GPU compute — and that compute is overwhelmingly purchased from AWS, Google Cloud, Microsoft Azure, or (in Meta's case) Meta's own internal infrastructure. When Meta releases LLaMA for free, it simultaneously lists the model on AWS SageMaker, Azure AI Studio, and Google Cloud Vertex AI. Every company that fine-tunes or deploys LLaMA at scale pays cloud providers for the GPU time. The model is the advertisement; the cloud bill is the revenue. For Google, releasing Gemma as open weights while running Gemini commercially is a strategy to capture both markets at once.
→ Free weights + expensive compute = profitable strategy.
🏆
Reason 3: Talent Attraction and Research Credibility
The world's best AI researchers want to work on models that the research community can study, build on, and critique. A company that releases models openly builds academic credibility, attracts top-tier research talent, and receives a flood of external contributions — bug reports, fine-tuning improvements, safety research, and benchmark results. Meta's open release strategy has made it one of the most respected names in AI research circles, despite being a social-media company at its core. The reputational return on a model release is significant and hard to quantify, but very real.
→ Open releases convert research talent into corporate assets.
⚖️
Reason 4: Regulatory Positioning and Antitrust Defence
In a regulatory environment where OpenAI, Google, and Microsoft are under intense antitrust scrutiny for potentially monopolising AI infrastructure, releasing models openly is a powerful counter-narrative. Meta has explicitly framed its open-release strategy in regulatory terms: if the best models are freely available, no single company can be said to control AI. This argument has real weight in Washington, Brussels, and London. By commoditising the model layer, the dominant players also make it harder for regulators to draw a clear line between dominant and non-dominant actors — because anyone can, in theory, run the same model.
→ "Open" is a regulatory shield as much as a technical choice.
💥
Reason 5: Weakening Competitors by Commoditising the Model Layer
If a company's primary competitive asset is its proprietary model — as is the case for OpenAI — then a strategy that makes equally powerful models freely available destroys that asset's value. Meta has no meaningful AI API business to protect. OpenAI does. By releasing LLaMA 3 at GPT-4 parity, Meta does not directly compete with OpenAI — it undermines the entire premise that you need to pay for a frontier model. This is sometimes called "strategic commoditisation": making your competitor's core product free, so the competitive battleground shifts to a layer where you have more advantages (in Meta's case, advertising infrastructure and social graph data).
→ Make the competition's product worth zero — then compete elsewhere.
🧪
Reason 6: Outsourcing Safety Research and Red-Teaming
Finding every way a large language model can be manipulated, jailbroken, or misused requires testing at massive scale. No internal team can match the creative adversarial pressure of thousands of external researchers, security professionals, and — inevitably — malicious actors. By releasing models openly, companies receive an enormous volume of real-world safety feedback that would otherwise cost tens of millions to generate internally. The safety improvements that result flow back into future proprietary and open releases alike. The open model is, in a meaningful sense, a globally crowdsourced red-teaming exercise.
→ Global users find safety failures that internal teams miss.
📊 Who gains what from the open-release strategy
| Stakeholder | What they gain | What they give up | Net position |
| --- | --- | --- | --- |
| Meta (LLaMA) | Ecosystem lock-in, regulatory goodwill, research talent, competitive disruption of OpenAI | Model weights (which cost $100M+ to train) | Strong positive |
| Google (Gemma) | Developer mindshare, cloud compute revenue, ability to compete in both open and closed segments | Some model capability disclosure | Positive |
| Microsoft (Phi) | Azure cloud consumption, enterprise developer adoption, research positioning | Small model weights (Phi is relatively compact) | Positive (low cost) |
| DeepSeek | Global talent attraction, credibility, geopolitical soft power for Chinese AI | Model weights and architectural insights | Strategic — long-term gain unclear |
| Startups & developers | Frontier AI capability without API costs; ability to fine-tune for niche use cases | Must manage compliance, compute, and licence obligations independently | Mixed — benefit with hidden costs |
| OpenAI / Anthropic | Nothing — they lose the premium pricing power of proprietary frontier models | Revenue from API access as open alternatives approach parity | Negative pressure |
🔥 The paradox of "free" AI
The models are free. The compute to run them is not. The legal advice to use them safely is not. The compliance infrastructure to deploy them responsibly under the EU AI Act is not. The liability when something goes wrong is not. Businesses that approach open AI models as genuinely cost-free — without factoring in the operational, legal, and regulatory overhead — typically discover the true cost at the worst possible time: during an incident, a regulatory inquiry, or investor due diligence.
💡 What this means for businesses evaluating open AI models
Understanding why a company released a model openly is directly relevant to assessing the risks of using it. Models released primarily as competitive disruption tools (like DeepSeek) may have different long-term support and update commitments than models released as ecosystem builders (like Meta LLaMA). The licence, the company's track record, the jurisdictional origin of the model's developer, and the training data provenance are all factors that should inform enterprise adoption decisions — not just benchmark performance scores.
3
Decoding AI Licences — What "Permissive" Really Means and What It Doesn't

The word "permissive" in an AI licence context is doing a great deal of legal work — and businesses that take it at face value are taking on significant legal exposure. A permissive software licence like Apache 2.0 means something very specific and well-understood in software law. But most AI model releases do not use Apache 2.0 — they use custom licences that borrow the feel of permissiveness while containing restrictions that can materially affect how the model can be used commercially. Here is what each major licence actually says, in plain language.

Apache 2.0 — The True Permissive Standard (Mistral 7B, Falcon 40B; Microsoft's Phi models use the similarly permissive MIT licence)
Fully permissive
✅ What you can do
Commercial use without restriction or royalty
Modify the model weights and architecture
Redistribute original or modified versions
Sub-license to customers and partners
Integrate into proprietary products and SaaS
❌ What you cannot do
Remove copyright notices or attribution
Use contributors' names to endorse your product
Hold contributors liable for damages (no warranty)
⚠️ Important caveats
Training data not disclosed — copyright risk inherited from unknown sources
No warranty on model outputs — all liability passes to the deployer
Does not grant rights to use the model creator's trademarks or brand
Regulatory obligations (EU AI Act) apply regardless of licence type
🦙
Meta LLaMA 3 Community Licence — The "Almost Free" Licence
Custom — commercial with restrictions
✅ What you can do
Commercial use including in paid products and services
Fine-tune and create derivative models
Distribute fine-tuned versions under compatible terms
Use for research and development purposes
❌ What you cannot do
700M MAU threshold: companies with over 700 million monthly active users must obtain a separate licence directly from Meta
Use the "Llama" brand name in product names
Train other AI models using LLaMA outputs (model distillation)
All uses prohibited under the Acceptable Use Policy (AUP)
⚠️ Critical watch-points
The AUP is long and includes broad prohibitions — military, surveillance, disinformation, and more
Derivative models must include LLaMA 3 attribution
Breach of the AUP terminates the licence automatically, with no cure period
US law governs; California courts have jurisdiction
💎
Google Gemma Terms of Use — Restrictive Despite Appearances
Custom — significant restrictions
✅ What you can do
Use for research, development, and commercial applications
Fine-tune the model for specific use cases
Distribute applications built on the model
❌ What you cannot do
Use to train or improve competing AI foundation models
Circumvent Google's usage policies or safety filters
Use "Gemma" or "Google" branding in your product
All prohibited uses under the extensive prohibited use list
⚠️ Critical watch-points
Google can unilaterally update the terms — continued use constitutes acceptance
The prohibited use list is significantly broader than Apache 2.0
Google reserves the right to audit compliance — unusual for an "open" model
🔬
DeepSeek Licence — MIT-Style But with Jurisdictional Complexity
Custom — jurisdiction risk
✅ What you can do
Commercial use, modification, and redistribution
Fine-tune and deploy in production environments
Build derivative models and products
❌ What you cannot do
Use outputs to train competing foundation models
Deploy in ways violating applicable law (including Chinese law for some users)
Misrepresent or remove attribution
⚠️ Critical watch-points
Chinese jurisdiction — export control implications for US/EU businesses under ITAR, EAR, and EU dual-use regulations
Data privacy risks — API use sends data to servers in China; self-hosted weights do not, but the distinction matters for GDPR
Potentially subject to Chinese data security law obligations
⛔ Acceptable Use Policy (AUP) — the prohibitions most users never read
Common AUP prohibitions across all major licences
🚫 Generating content that sexually exploits minors (CSAM)
🚫 Creating weapons of mass destruction — biological, chemical, nuclear, or radiological
🚫 Building cyberweapons or malicious code intended to cause damage
🚫 Undermining election integrity or generating mass political disinformation
🚫 Enabling illegal surveillance, tracking, or profiling of individuals
🚫 Generating defamatory, harassing, or abusive content at scale
Less obvious prohibitions that catch businesses off guard
⚠️ Military applications — explicitly prohibited in LLaMA licences; unclear scope for dual-use defence contractors
⚠️ Autonomous decisions in high-stakes contexts — medical diagnosis, criminal sentencing, financial decisions without human oversight
⚠️ Impersonation at scale — building chatbots that claim to be human without disclosure may breach the AUP and consumer protection law
⚠️ Model distillation — using a LLaMA model's outputs to train or improve another AI model is explicitly prohibited by most major custom licences (though not by Apache 2.0 releases)
⚠️ User-scale threshold triggers — LLaMA 3's 700M monthly-active-user clause catches fast-growing platforms before they have time to negotiate a separate licence
⚠️ Automatic licence termination — the clause businesses miss
Most custom AI model licences contain automatic termination clauses: if you breach the Acceptable Use Policy or any core licence term, your right to use the model ends immediately — with no notice period and no opportunity to cure the breach. This means a company that has built a production product on LLaMA could theoretically lose the right to operate that product overnight if a compliance issue is identified. This is not a theoretical risk — it has commercial implications for enterprise software, SaaS products, and any business where the AI component is central to the value proposition.
💡 Practical licence selection guidance
For enterprise deployment with minimal legal risk, Apache 2.0 models (Mistral, certain Falcon versions) offer the cleanest contractual position. For frontier capability with manageable risk, LLaMA 3 is generally acceptable for most commercial applications if the AUP is carefully reviewed. For regulated industries (healthcare, finance, critical infrastructure), any open model deployment requires a specific legal opinion on whether the use case falls within or outside the AUP prohibitions — particularly around autonomous high-stakes decisions. Avoid the DeepSeek API (as opposed to self-hosted weights) for any data subject to the GDPR, and assess Chinese export control implications before any enterprise deployment of self-hosted DeepSeek.
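One way to operationalise this guidance is to keep machine-readable licence metadata for every model your organisation has cleared, and run a basic gate before anything ships. The sketch below is purely illustrative — the profile fields and values are simplified summaries for demonstration, not legal conclusions, and the function and model names are hypothetical.

```python
# Illustrative pre-deployment licence gate. The metadata values are
# simplified summaries for demonstration -- not a substitute for reading
# the actual licence and AUP with counsel.
from dataclasses import dataclass

@dataclass
class LicenceProfile:
    model: str
    licence: str
    aup_review_required: bool      # custom AUP that legal must sign off
    mau_threshold: int | None      # e.g. LLaMA's 700M MAU clause
    jurisdiction_flags: list[str]  # export-control / data-residency concerns

APPROVED_MODELS = {
    "mistral-7b": LicenceProfile("mistral-7b", "Apache-2.0", False, None, []),
    "llama-3-70b": LicenceProfile("llama-3-70b", "LLaMA 3 Community Licence",
                                  True, 700_000_000, []),
    "deepseek-v3": LicenceProfile("deepseek-v3", "DeepSeek / MIT-style",
                                  True, None, ["CN-origin"]),
}

def deployment_gate(model_key: str, monthly_active_users: int) -> list[str]:
    """Return a list of open issues that block deployment (empty = clear)."""
    profile = APPROVED_MODELS.get(model_key)
    if profile is None:
        return [f"{model_key}: not on the approved model list"]
    issues = []
    if profile.aup_review_required:
        issues.append("AUP sign-off from legal required for this use case")
    if profile.mau_threshold and monthly_active_users >= profile.mau_threshold:
        issues.append("MAU threshold reached -- separate licence needed")
    issues += [f"jurisdiction review: {flag}" for flag in profile.jurisdiction_flags]
    return issues

print(deployment_gate("llama-3-70b", monthly_active_users=1_200_000))
```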
4
Legal Grey Zones — Copyright, Training Data and Downstream Liability

Open AI model licences are remarkably good at defining what the model provider will allow you to do. They are remarkably poor at clarifying who bears liability when the model does something harmful, whether the training data was lawfully obtained, and what happens to the intellectual property in outputs the model generates. These are not gaps in the licence by oversight — they are deliberate allocations of risk away from the model provider and onto the deployer. Understanding what the licence does not say is as important as reading what it does.

📚
Training Data Copyright
None of the major open-weight model providers fully disclose what data was used to train their models. Meta's LLaMA technical papers describe training data in broad categories; the specific datasets and their copyright status are not provided. This matters because if a model was trained on copyrighted material without authorisation, the copyright holder's claims may extend to outputs generated from that training — and the deployer of the model may inherit that exposure.
Risk for deployers: Content generated by open models may infringe third-party copyright if the model memorised and reproduces protected expression. The deployer — not Meta or Mistral — is the entity with a customer relationship and the one most likely to face infringement claims.
⚖️
Output Ownership and IP Rights
Who owns the text, code, image descriptions, or analysis that an open model generates? The licence does not answer this question — because it cannot under current law. Most jurisdictions do not recognise AI-generated content as having a human author, which means it cannot attract copyright protection. The practical consequence is that competitors can freely copy AI-generated content from your products, and you cannot prevent them through copyright enforcement.
Risk for deployers: AI-generated content in products and marketing materials may be unprotectable as intellectual property. This affects valuation, investment due diligence, and competitive moats built on AI-generated assets.
🏥
Harmful Output Liability
If a deployed open model provides harmful medical advice, generates defamatory content about a real person, or produces outputs that cause financial harm to a user, who is legally responsible? The model licence contains full "as is" disclaimers and holds the original developer harmless. The deployer — the company that built the product — is the party with a terms-of-service relationship with end users, and is therefore the primary exposure point for any resulting liability claims.
Risk for deployers: Product liability for AI output harm sits with the deployer. This exposure is new, evolving rapidly in case law, and not yet clearly defined in most jurisdictions — making it hard to quantify but impossible to ignore.
🔒
Data Privacy in Fine-Tuning
Many businesses fine-tune open models on their own proprietary data — customer interaction logs, internal documents, or domain-specific datasets. This fine-tuning process can cause models to memorise sensitive data and reproduce it in outputs accessible to other users. This is not a theoretical risk: researchers have demonstrated that fine-tuned LLMs reproduce verbatim passages from their training data under adversarial prompting.
Risk for deployers: Fine-tuning on personal data without appropriate technical safeguards may breach GDPR data minimisation and purpose limitation principles. Memorisation of training data in a shared deployment creates serious data protection liability.
🌍
Jurisdictional Gaps in the Licence
Major AI model licences are drafted under US law. When a European company deploys a US-licensed open model to serve EU customers, the licence governs one part of the relationship — but EU consumer protection law, the EU AI Act, GDPR, and sector-specific regulations govern the product itself, regardless of what the licence says. There is no opt-out from mandatory regulatory law by reference to a US-governed contract.
Risk for deployers: Compliance with the licence does not mean compliance with EU law. EU-based businesses must layer EU regulatory obligations on top of, not instead of, their contractual licence obligations.
🔗
Supply Chain Liability for Fine-Tuned Derivatives
The open-weights ecosystem has created thousands of publicly available fine-tuned derivative models built on LLaMA and Mistral bases. Businesses that use these third-party fine-tuned models (rather than official releases) inherit both the original model's licence obligations and whatever additional risks the fine-tuner introduced — including unverified training data, removed safety filters, or modified behaviour that violates the original AUP.
Risk for deployers: Using community fine-tuned models downloaded from Hugging Face without due diligence creates compounded licence and safety risks that are often invisible until a specific incident.
⛓️ The AI liability chain — where responsibility actually sits
🏢 Model Developer
(Meta, Mistral, Google)
Releases model weights with full warranty disclaimers. Accepts no liability for model outputs, downstream uses, or harms. Holds termination rights if AUP is breached. Maintains reputational risk and faces regulatory scrutiny as GPAI provider under EU AI Act.
Contractually protected
Regulatory exposure (GPAI)
🔧 Fine-Tuner / Adapter
(Hugging Face community, B2B model vendors)
Modifies the base model for specific use cases. Inherits base licence obligations; adds its own licence terms. May introduce new risks through modified behaviour, different training data, or removed safety filters. Rarely has contractual liability to end deployers.
Partial liability — often unaccountable
🏗️ Deploying Business
(SaaS companies, enterprises, developers)
Integrates the model into a product and deploys to end users. Has the direct customer relationship. Responsible for product safety, AUP compliance, GDPR obligations, EU AI Act deployer duties, and user harm under consumer protection law. Bears the primary legal exposure in any incident involving model outputs.
Primary liability exposure
👤 End User
(consumers, B2B customers)
Uses the AI-powered product under the deployer's terms of service. May bear responsibility for misuse if terms clearly prohibit it — but consumer protection law limits the deployer's ability to disclaim liability for harms to individual users.
Limited liability (consumer protection applies)
📋 What "AS IS, WITHOUT WARRANTY OF ANY KIND" actually means for your business
What the licence clause says
Every major open AI model licence includes a standard disclaimer in capitals: the model is provided "AS IS" without any warranty of merchantability, fitness for a particular purpose, non-infringement, or accuracy. The model provider accepts no liability for direct, indirect, incidental, or consequential damages arising from use of the model.
What this means in practice
If LLaMA 3 generates factually wrong information that causes your customer financial loss, Meta has no obligation to you. If the model produces outputs that infringe a third party's copyright, Meta bears no liability. If the model fails in a critical application, your contractual recourse against Meta is essentially zero. All of this risk sits with the deploying company — and ultimately flows to the product's users.
⚠️ The training data copyright litigation wave is coming
The New York Times v. OpenAI lawsuit, Getty Images v. Stability AI, and dozens of author class actions in the US and UK signal a clear direction: courts are increasingly willing to scrutinise whether training data was lawfully obtained. While these actions currently target the model developers, a successful ruling establishing that training on copyrighted data creates infringement liability could have downstream implications for businesses that deploy the resulting models. Legal counsel should monitor this litigation closely — the risk is not resolved by the current licence terms.
💡 Managing grey zone risk in practice
The legal grey zones around open AI models are not reasons to avoid them — they are reasons to deploy them with appropriate legal and technical risk management. This means: obtaining a legal opinion on training data risk for your specific use case; implementing output filtering and human review for high-stakes applications; documenting your due diligence on the model licence and AUP; ensuring your terms of service allocate liability appropriately to end users; and maintaining a model change log to track which version and fine-tune is in production at any point in time.
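On the last point — the model change log — the following is a minimal sketch of what such a record might look like, assuming a simple append-only JSON-lines file. The field names are illustrative; what matters is being able to say, for any date, exactly which base model, fine-tune, and weights artefact was serving users, and who approved it.

```python
# Minimal sketch of an append-only production model change log.
# Field names are illustrative; adapt to your existing MLOps tooling.
import datetime
import hashlib
import json
import pathlib

LOG_PATH = pathlib.Path("model_changelog.jsonl")

def _sha256(path: pathlib.Path) -> str:
    """Hash the weights file so the exact artefact in production is traceable."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def record_deployment(base_model: str, fine_tune_id: str,
                      weights_file: str, approved_by: str) -> dict:
    """Append an entry recording exactly which weights went into production."""
    entry = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "base_model": base_model,        # e.g. "llama-3-8b-instruct"
        "fine_tune_id": fine_tune_id,    # internal identifier for the fine-tune
        "weights_sha256": _sha256(pathlib.Path(weights_file)),
        "approved_by": approved_by,      # the named AI risk owner
    }
    with LOG_PATH.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```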
5
EU AI Act and the GPAI Model Rules — What Open-Source Providers Must Still Do

The EU AI Act, which entered into force in August 2024 with GPAI obligations applying from August 2025, creates a new regulatory category specifically for large AI models: General Purpose AI (GPAI) models. This category directly affects the most powerful open-weight models — LLaMA 3 405B, Mistral Large, and any model with systemic risk potential. Critically, the EU AI Act provides only a partial exemption for open-source models — and most businesses operating in the EU will find that the exemption does not eliminate their compliance obligations, because they are deployers, not providers.

🏛️ The EU AI Act open-source GPAI exemption — what it covers and what it doesn't
✅ What the open-source exemption removes
Technical documentation obligation (Art. 53(1)(a)) — providers of GPAI models released under a free and open-source licence are exempt from preparing the full technical documentation required of other GPAI providers.
Information provision to downstream providers (Art. 53(1)(b)) — open-source providers are exempt from the obligation to supply detailed information and documentation to businesses building on the model.
Note: both exemptions fall away entirely if the model is classified as presenting systemic risk.
❌ What the exemption does NOT remove
Copyright compliance policy (Art. 53(1)(c)) — the obligation to implement a policy to comply with EU copyright law applies to open-source GPAI providers as well; the open-source exemption does not cover it.
Training data summary publication (Art. 53(1)(d)) — the obligation to publish a sufficiently detailed summary of the content used for training likewise applies regardless of open-source status.
Systemic risk obligations (Art. 55) — if a GPAI model exceeds the 10²⁵ FLOP training threshold, ALL systemic risk obligations apply regardless of whether it is open-source — including adversarial testing, incident reporting, and cybersecurity measures.
Deployer obligations — businesses that deploy open GPAI models in EU-facing products are subject to the full deployer obligation framework — risk management, transparency, human oversight — regardless of the provider's exemption status.
Prohibited use restrictions (Art. 5) — the prohibited AI practices — social scoring, real-time biometric surveillance, manipulation — apply to any AI system regardless of the model's licence or open-source status.
High-risk AI system requirements — if the deployment constitutes a high-risk AI system (Annex III), all high-risk obligations apply — data governance, conformity assessment, registration — irrespective of whether the underlying model is open-source.
📋 Articles 53 and 55 — GPAI provider obligations that apply to open-weight model developers
Art. 53(1)(a) — Technical Documentation
GPAI model providers must draw up and keep up-to-date technical documentation before placing the model on the EU market. For non-open-source models this must be comprehensive; open-source providers are exempt unless the model is classified as presenting systemic risk, in which case documentation obligations apply in full.
Provider obligation · open-source exemption, unless systemic risk

Art. 53(1)(c) — Copyright Compliance Policy
GPAI providers must implement a policy to comply with EU copyright law, specifically regarding text and data mining under the Digital Single Market Directive. Providers must be able to demonstrate that training data was either licensed, in the public domain, or used under an applicable exception.
Provider obligation · applies to open-source providers too

Art. 55(1)(a) — Adversarial Testing (Systemic Risk Models Only)
GPAI models with systemic risk must undergo adversarial testing — red-teaming — to identify and mitigate risks at the model level before and after deployment. For open-source models above the systemic risk threshold (10²⁵ FLOPs), this obligation applies in full. This is a significant cost and capability requirement.
Provider obligation · systemic risk models only

Art. 55(1)(c) — Serious Incident Reporting
Systemic-risk GPAI model providers must report serious incidents or malfunctions to the European AI Office without undue delay. This creates an ongoing operational obligation — providers of open models cannot disclaim awareness of downstream incidents if they are monitoring deployment at scale.
Provider obligation · systemic risk models only
⚠️ The systemic risk threshold — which models does it catch?
10²⁵ FLOPs
The EU AI Act designates GPAI models trained with more than 10²⁵ floating-point operations (FLOPs) as presenting "systemic risk" — triggering the enhanced obligations under Article 55. This threshold is designed to capture the most powerful frontier models, not smaller open-weight releases. Current assessment of where major models fall:
⬆ GPT-4: likely above the threshold (OpenAI does not disclose training compute)
⬆ Gemini Ultra: likely above the threshold
⬆ LLaMA 3 405B: likely above — Meta's technical report cites roughly 3.8 × 10²⁵ training FLOPs
⬇ LLaMA 3 8B / 70B: well below the threshold
⬇ Mistral 7B: well below the threshold
⬇ Phi-3 / Phi-4: below the threshold
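For orientation, a widely used rule of thumb approximates training compute as roughly 6 × parameters × training tokens. The sketch below applies that approximation to purely hypothetical figures to show how the threshold comparison works; for real models the parameter count is usually public, but the token count and exact FLOP figure often are not.

```python
# Back-of-envelope check against the EU AI Act systemic-risk presumption.
# Uses the common approximation: training FLOPs ~= 6 * parameters * tokens.
# The figures below are illustrative placeholders, not disclosed values.
SYSTEMIC_RISK_THRESHOLD_FLOPS = 1e25

def estimated_training_flops(parameters: float, training_tokens: float) -> float:
    """Rough estimate of total training compute for a dense transformer."""
    return 6.0 * parameters * training_tokens

# Hypothetical example: a 70-billion-parameter model trained on 10 trillion tokens.
flops = estimated_training_flops(parameters=70e9, training_tokens=10e12)
presumed_systemic_risk = flops > SYSTEMIC_RISK_THRESHOLD_FLOPS
print(f"Estimated compute: {flops:.2e} FLOPs -> systemic risk presumed: {presumed_systemic_risk}")
# Estimated compute: 4.20e+24 FLOPs -> systemic risk presumed: False
```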
🏗️ What EU AI Act deployer obligations apply to businesses using open models
📋
Risk Management System
Deployers of high-risk AI systems must implement a risk management system covering the full lifecycle. Where an open model is used in a high-risk context (HR, education, critical infrastructure, law enforcement), this obligation applies in full.
👁️
Human Oversight Measures
High-risk AI deployments must include human oversight capable of monitoring outputs, intervening, and switching the system off. Deployers using open models in high-risk settings must build this capability into their product architecture.
📊
Transparency to End Users
AI systems that interact with natural persons must disclose that the user is interacting with AI. Chatbots and voice assistants built on open models must include clear disclosure — irrespective of whether the underlying model provider does so (a minimal illustration follows this list).
🔍
Fundamental Rights Impact Assessment
Deployers of high-risk AI in certain public and regulated contexts (banks, insurers, public bodies) must conduct a Fundamental Rights Impact Assessment (FRIA) before deployment. This applies regardless of whether the model is open or proprietary.
🔗
Supply Chain Due Diligence
Deployers must exercise due diligence on the AI components they use — including open models. This means verifying that the model's use in your specific context is consistent with the provider's documentation and intended use cases.
📝
EU AI Act Registration
High-risk AI systems must be registered in the EU AI Act database before deployment. Deployers using open models in Annex III high-risk categories are responsible for registration — the model provider's open-source status does not transfer this obligation.
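To make two of these duties — transparency disclosure and human oversight of high-stakes outputs — more concrete, here is a minimal sketch of how a deployer might wire them into a chat product. The topic labels, queue, and function names are hypothetical; a real system would integrate with existing review tooling and a documented escalation process.

```python
# Illustrative sketch of two deployer-side duties: AI-interaction disclosure
# and human review of high-stakes outputs. Names are hypothetical.
from dataclasses import dataclass
from queue import Queue

AI_DISCLOSURE = "You are chatting with an AI assistant, not a human."
HIGH_STAKES_TOPICS = {"medical", "credit", "employment", "legal"}

@dataclass
class PendingReview:
    user_id: str
    topic: str
    draft_answer: str

human_review_queue: Queue = Queue()  # worked through by a named human reviewer

def respond(user_id: str, topic: str, draft_answer: str) -> str:
    """Return a reply to the user, routing high-stakes drafts to human review."""
    if topic in HIGH_STAKES_TOPICS:
        # Human oversight: hold the answer until a reviewer approves or edits it.
        human_review_queue.put(PendingReview(user_id, topic, draft_answer))
        return f"{AI_DISCLOSURE}\nYour request has been passed to our team for review."
    # Transparency: every AI-generated reply carries the disclosure.
    return f"{AI_DISCLOSURE}\n{draft_answer}"
```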
💡 The practical takeaway on EU AI Act and open models
For most EU-based businesses using open AI models in commercial products, the open-source status of the underlying model provides very limited regulatory relief. The deployer is responsible for ensuring the product meets EU AI Act requirements — including risk management, human oversight, transparency, and (where applicable) fundamental rights impact assessment. The model licence and the regulatory compliance framework are parallel obligations, not alternatives. Legal counsel with combined AI law and data protection expertise is essential for any business deploying open models in EU-facing products. Learn more about AI risk and liability →
6
What This Means for Businesses — A Legal Due Diligence Checklist

The open AI model landscape offers extraordinary opportunities: frontier-class AI capability at a fraction of the cost of proprietary API access, full customisability through fine-tuning, and freedom from vendor lock-in. But the legal and regulatory overhead that comes with deployment is real, growing, and often underestimated. The five questions below are the starting point for any serious legal due diligence on open AI model deployment — and the checklist that follows should be completed before any open model goes into production.

Q1
Does the model licence actually permit your specific intended use?
Many businesses assume that "commercial use permitted" in a model licence covers all commercial applications. It does not. Check the specific Acceptable Use Policy for your use case — military, medical diagnosis, financial advice, autonomous decision-making, and political content all carry specific restrictions across most major licences. The AUP, not the headline licence type, is what matters for compliance.
→ Read the AUP in full; obtain a written legal opinion for regulated industry use cases
Q2
Does your deployment constitute a high-risk AI system under the EU AI Act?
If your product falls within the Annex III high-risk categories — HR and recruitment, educational assessment, credit scoring, biometric identification, law enforcement, critical infrastructure — the full EU AI Act high-risk regime applies regardless of which model you use. The open-source status of the underlying model provides no exemption from deployer obligations in high-risk contexts.
→ Map your product against Annex III; implement risk management system if required
Q3
What data are you using to fine-tune the model, and is your use GDPR-compliant?
Fine-tuning on personal data creates GDPR obligations that do not disappear when the fine-tuning is complete. The model may memorise training data and reproduce it in outputs — constituting ongoing processing of personal data. This is a material compliance issue that requires a lawful basis, data minimisation controls, and technical safeguards to prevent inadvertent reproduction of personal data in model outputs.
→ Conduct GDPR impact assessment for fine-tuning; implement output filtering and access controls
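As one example of the kind of technical safeguard meant here, the sketch below shows a crude post-generation filter that redacts obvious personal-data patterns before a response reaches a user. A real deployment would use dedicated PII-detection tooling and log each event; the regular expressions here are deliberately simple placeholders.

```python
# Illustrative output safeguard: redact obvious personal-data patterns in
# model outputs before they are returned to a user. Real systems should use
# dedicated PII-detection tooling; these regexes are simple placeholders.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "iban":  re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b"),
}

def screen_output(text: str) -> tuple[str, list[str]]:
    """Return (possibly redacted text, list of pattern names that matched)."""
    hits = []
    for name, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            hits.append(name)
            text = pattern.sub("[REDACTED]", text)
    return text, hits

safe_text, findings = screen_output(
    "Contact Jane at jane.doe@example.com or +44 20 7946 0958."
)
print(findings)   # ['email', 'phone']
print(safe_text)  # Contact Jane at [REDACTED] or [REDACTED].
```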
Q4
Who in your organisation is responsible when the model causes harm?
Open model licences place all liability on the deployer. This needs to be reflected in your internal governance: a named product owner responsible for AI risk, a documented escalation process for AI incidents, and a response plan for scenarios including harmful outputs, data breaches from model memorisation, and regulatory inquiries. Without assigned accountability, you have a compliance gap that regulators will identify during any audit.
→ Appoint an AI risk owner; document incident response procedures before deployment
Q5
Is the jurisdictional origin of the model a compliance risk for your business?
For models developed in China (DeepSeek, Qwen), using the API (not self-hosted) involves data transfers to Chinese-jurisdiction servers — raising GDPR and data sovereignty concerns. Self-hosting the weights avoids this, but export control regulations may apply to certain deployment contexts. For EU and US government or defence contractors, the origin of the model's training data and developer jurisdiction are relevant to procurement and security compliance requirements.
→ Map model origin against your data residency requirements; obtain export control opinion for sensitive contexts
✅ Pre-deployment compliance checklist for open AI model deployment
Legal & Contractual
Licence review — full text of the model licence and AUP reviewed by legal counsel for your specific use case
EU AI Act classification — product assessed against Annex III; high-risk obligations mapped where applicable
Liability allocation — terms of service updated to reflect AI use; liability limitations clearly stated to end users
Copyright opinion — legal opinion obtained on training data copyright risk for your specific industry context
Jurisdictional compliance — model origin and data routing assessed against GDPR, export controls, and data residency requirements
Governance & Technical
AI governance policy — internal policy documenting permitted AI tools, use cases, and approval process for new model deployments
Human oversight mechanism — defined process for human review of high-stakes AI outputs before they reach end users
Output monitoring — automated monitoring for harmful, inaccurate, or policy-violating outputs; incident log maintained
Model version control — registry of which model version and fine-tune is in production; change management process documented
AI transparency disclosure — users informed they are interacting with AI; no claims of human identity in AI interactions
🚩 Signals that you need specialist AI legal counsel immediately
🏥
Regulated industry deployment
Healthcare, financial services, insurance, or legal — sector-specific regulation layers on top of AI Act obligations.
🇨🇳
Using Chinese-origin models
DeepSeek, Qwen, or similar models — GDPR, export control, and data sovereignty implications require assessment.
📊
Autonomous decision-making
AI making or heavily influencing decisions about individuals — credit, employment, benefits, healthcare — triggers high-risk obligations.
🎯
Fine-tuning on customer data
Training or fine-tuning on data that includes personal information about your customers or users creates GDPR processing obligations.
🌍
Cross-border EU deployment
Products serving EU residents trigger EU AI Act obligations regardless of where the company or server infrastructure is located.
💼
Investor due diligence upcoming
Investors increasingly conduct AI compliance due diligence — undocumented AI deployments create deal friction and valuation risk.
⚖️ AI Risk & Liability Legal Advisory
Using open AI models in your product? Get the legal layer right.
WCR Legal's AI law practice advises businesses on the full spectrum of legal risks arising from open and proprietary AI model deployment — from licence compliance and copyright exposure to EU AI Act classification, risk management system design, and AI governance frameworks. We work with technology companies, financial institutions, and regulated businesses navigating the legal complexity of the open AI model ecosystem.
AI model licence review & AUP compliance
EU AI Act classification & deployer obligations
GPAI regulatory analysis
AI-related GDPR & data protection
Copyright & training data risk assessment
AI governance policy design
AI incident response frameworks
Investor AI due diligence support

Oleg Prosin is the Managing Partner at WCR Legal, focusing on international business structuring, regulatory frameworks for FinTech companies, digital assets, and licensing regimes across various jurisdictions. He works with founders and investment firms on compliance, operating models, and cross-border expansion strategies.