What Happens to the License When You Fine‑Tune a Model

Fine-tuning a model does not create a blank-slate licence. The original model's terms follow your adapted weights, LoRA adapters, and any service built on top of them — often in ways developers do not anticipate until distribution or fundraising.

Introduction — The Licence Does Not Reset When You Fine-Tune

Fine-tuning is widely understood as a technical process: you take a pre-trained model, continue training on a curated dataset, and the result is a model better suited to your use case. What is less well understood is the legal effect of that process. The fine-tuned model, the LoRA adapter, and any product built on either of them carry the original model's licence forward — often with additional obligations triggered by the fine-tune itself.

The assumption that further training creates a new, independently owned model with a clean licence is one of the most common — and most commercially significant — mistakes in AI product development. It surfaces at the worst possible moments: when distributing a fine-tuned model publicly, when closing an enterprise contract that requires licence warranties, and during M&A due diligence when a buyer's legal team reviews the IP chain of a product's core model.

Three Common Misconceptions About Fine-Tuning and Licences

Misconception

"Fine-tuning creates a new model I own outright"

Fine-tuning modifies the base model's weights — it does not create an independent model. The fine-tuned weights are, at minimum, a derivative work that remains bound by the base model's licence. Ownership of the fine-tune is constrained by what the base licence permits you to own and distribute.

Misconception

"LoRA adapters are just training data — they're not a model"

LoRA (Low-Rank Adaptation) adapters contain learned weight modifications specific to the base model. They are not standalone — they only function when loaded alongside the base model. Distributing a LoRA adapter effectively distributes a modified version of the base model, and the base licence's distribution rules apply.

Misconception

"I'm using it as a service — the licence is irrelevant"

Running a fine-tuned model behind an API may avoid licence provisions that are triggered only by distribution of weights. But use-case restrictions, flow-down obligations, and competitor clauses in licences like Llama 3 and Gemma apply regardless of deployment method — API, SaaS, or embedded product.

Where the Legal Questions Cluster

Questions with clearer answers
Does the base model's licence bind the fine-tune? Yes — for all major model licences.
Do use-case restrictions apply to API products? Yes — Llama 3 and Gemma AUP/PUP apply regardless of deployment.
Can Mistral fine-tunes be distributed under any licence? Yes — Apache-2.0 permits relicensing of derivatives.
Must Gemma fine-tunes pass the PUP to downstream users? Yes — the flow-down obligation is explicit.
Questions without settled answers
Is a fine-tuned model a "derivative work" under copyright law? Not resolved — the legal definition is unsettled.
Who owns the new weights created by fine-tuning? Depends on base licence, training data, and jurisdiction.
Does combining two model licences create a conflict? Possible — but no authoritative case law exists yet.
When does a dataset licence bind the model trained on it? No consensus across major jurisdictions.

Four Factors That Determine Licence Exposure After Fine-Tuning

📄 Base model licence type: Apache-2.0, custom AUP (Llama/Gemma), or proprietary. Each creates a different baseline.

🔁 Fine-tuning method: full fine-tune vs LoRA adapter vs RLHF. The method affects whether weights are considered a derivative.

🚀 Deployment model: internal use, weights distribution, API service, or embedded in a product. Each triggers different clauses.

📦 Training dataset licence: CC-BY, CC-BY-SA, proprietary, or scraped data. Each carries potential obligations to the resulting model.

⚖️

Note on IP ownership: The question of who owns the weights of a fine-tuned model intersects with broader AI IP ownership questions — including whether training outputs are protectable as copyright and how ownership is structured for products built by teams using multiple base models. For background on AI IP ownership frameworks, see AI IP Ownership — wcr.legal.

Section 1 — How the Original Model Licence Applies to Fine-Tunes and LoRA Adapters

The moment you start fine-tuning a model, the base model's licence governs what you can do with the result. The three licences that matter most for commercial AI development — Meta's Llama 3 Community License, Google's Gemma Terms of Use, and the Apache-2.0 licence covering Mistral models — handle fine-tuned derivatives differently, with implications that reach from weight distribution through to enterprise product agreements.

🦙

Llama 3 — Community License

Derivative works permitted under the Llama 3 licence only

The Llama 3 Community License explicitly addresses fine-tuned derivatives. A model produced by fine-tuning Llama 3 is a "Llama 3 derivative" and must itself be distributed under the Llama 3 Community License — not under Apache-2.0, MIT, or any other licence. This means every downstream user of your fine-tune inherits the same restrictions you are subject to: the Acceptable Use Policy, the 700M MAU threshold, the competitor restriction, and the training ban.

Practically, this limits your commercialisation options. You cannot take a Llama 3 fine-tune and distribute it as if it were your proprietary model with clean IP — the licence follows the weights. However, running the fine-tuned model as a closed API service (where you do not distribute the weights) is permitted under the licence, subject to the use-case restrictions continuing to apply to that service.

Fine-tune permitted: Yes (all methods)
Derivatives licence: Llama 3 only
Relicensing: Not permitted
API service: Permitted
AUP binding: Yes (all derivatives)
700M clause: Applies to fine-tunes

💎

Gemma — Terms of Use

Flow-down obligation requires PUP transmission to all downstream recipients

Gemma's Terms of Use treat fine-tuned models as derivatives bound by the same terms. The most commercially significant implication is the flow-down obligation: if you distribute a Gemma fine-tune or run a product built on one, you must ensure that your downstream users — including enterprise clients receiving an API service — operate within the Prohibited Use Policy. This transforms a licensing obligation into a contract management obligation at every layer of your distribution chain.

Google retains unilateral termination rights for any breach of the Terms of Use — a power that extends to breaches by your downstream users if you have not adequately implemented the flow-down. The combination of flow-down obligation and unilateral termination means a compliance failure by one of your enterprise clients could, in theory, expose your own licence to termination.

Fine-tune permitted: Yes (all methods)
Derivatives licence: Gemma ToU
Flow-down required: Yes (to all users)
API service: Permitted, and the PUP applies
Termination right: Google retains
Terms updatable: Yes (continued use counts as acceptance)

🌀

Mistral — Apache-2.0

Derivatives may be distributed under any licence including proprietary

The Mistral models discussed in this guide (Mistral 7B, Mixtral 8x7B) are distributed under Apache-2.0, the most permissive framework for fine-tuning. Apache-2.0 permits derivative works to be distributed under any licence, including a proprietary licence that closes the fine-tuned weights entirely, provided the original attribution notices are retained. There is no use-case restriction, no scale threshold, and no flow-down obligation beyond attribution. The base licence includes a patent grant covering the licensed code.

The practical implication for product development is significant: a Mistral fine-tune can be proprietary, can be distributed under a custom licence, and can be transferred in an M&A transaction without the licence chain issues that arise with Llama 3 or Gemma derivatives. This is the reason Apache-2.0 models are consistently preferred in enterprise and regulated industry product stacks where IP clarity is a procurement or investment requirement.

Fine-tune permitted: Yes (all methods)
Derivatives licence: Any, including proprietary
Flow-down required: None
API service: Permitted, unrestricted
Patent grant: Included
Terms updatable: No (irrevocable grant)

🔗

LoRA Adapters — Are They Covered by the Base Model Licence?

Why adapter files are not independent of the base model's terms

What makes a LoRA adapter a derivative

A LoRA (Low-Rank Adaptation) adapter is a set of lightweight weight matrices trained to modify a specific base model's outputs. The adapter has no standalone function — it must be merged with or loaded alongside the base model weights to produce any output. From a licence perspective, distributing a LoRA adapter for a Llama 3 or Gemma model is functionally equivalent to distributing a modified version of those weights, because the adapter only exists in relationship to the base model.

What this means in practice

Sharing a LoRA adapter on Hugging Face, GitHub, or via a download link triggers the base model's distribution provisions — the same rules that would apply to sharing full fine-tuned weights. For Llama 3, the adapter must be accompanied by the Llama 3 licence. For Gemma, the flow-down obligation applies to anyone who receives and uses the adapter. For Mistral/Apache-2.0, the adapter can be shared under any licence. Treating LoRA adapters as "just training artefacts" rather than distributed model derivatives is a common compliance gap.
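
The link between an adapter and its base model can often be read directly from the adapter's own files. Adapters saved with the Hugging Face PEFT library normally include an `adapter_config.json` whose `base_model_name_or_path` field names the base model. The sketch below reads that field and maps it to a licence family. The function name, the keyword map, and the example model ID are illustrative assumptions rather than an official mapping, and a name match is only a prompt to read the actual licence.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical keyword -> licence family map (an assumption for illustration).
# It mirrors the three families discussed in this guide and must be extended
# and verified against each model's real licence before being relied on.
LICENCE_FAMILIES = {
    "llama-3": "Llama 3 Community License",
    "gemma": "Gemma Terms of Use",
    "mistral": "Apache-2.0",
    "mixtral": "Apache-2.0",
}


def adapter_licence_family(adapter_dir):
    """Return (base_model, licence_family) for a PEFT-style adapter folder."""
    config_path = Path(adapter_dir) / "adapter_config.json"
    config = json.loads(config_path.read_text())
    base_model = config.get("base_model_name_or_path") or ""
    for keyword, family in LICENCE_FAMILIES.items():
        if keyword in base_model.lower():
            return base_model, family
    # No keyword matched: flag the adapter for manual review.
    return base_model, "unknown: review manually"


# Usage with a throwaway adapter folder standing in for a real one.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "adapter_config.json").write_text(
        json.dumps({"base_model_name_or_path": "meta-llama/Meta-Llama-3-8B"})
    )
    demo_result = adapter_licence_family(tmp)
    print(demo_result)
```

Running a check like this before publishing an adapter repository makes it harder to treat the adapter as a licence-free artefact.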

Fine-Tuning Method vs Licence Coverage — Summary Matrix

| Fine-tuning scenario | Llama 3 | Gemma | Mistral (Apache) |
|---|---|---|---|
| Full fine-tune, internal use only | Permitted | Permitted | Permitted |
| LoRA adapter, internal use only | Permitted | Permitted | Permitted |
| Distribute full fine-tuned weights publicly | Llama licence only | Gemma ToU + PUP | Any licence |
| Distribute LoRA adapter publicly | Llama licence only | Gemma ToU + PUP | Any licence |
| Run fine-tune as commercial API service | AUP + 700M applies | PUP + flow-down | Unrestricted |
| Relicense fine-tune under proprietary terms | Not permitted | Not permitted | Permitted |
| Use fine-tune to train another non-derivative model | Prohibited | Prohibited | Permitted |
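
For teams that want the summary matrix available to release tooling, it can be encoded as a plain lookup table. This is a minimal sketch: the cell values follow the matrix above, while the scenario keys, model keys, and function names are made up for illustration.

```python
# Cell values follow the summary matrix; keys and names are hypothetical.
BASE_MODELS = ("llama3", "gemma", "mistral")

MATRIX = {
    "full_finetune_internal": ("Permitted", "Permitted", "Permitted"),
    "lora_internal": ("Permitted", "Permitted", "Permitted"),
    "distribute_full_weights": ("Llama licence only", "Gemma ToU + PUP", "Any licence"),
    "distribute_lora_adapter": ("Llama licence only", "Gemma ToU + PUP", "Any licence"),
    "commercial_api_service": ("AUP + 700M applies", "PUP + flow-down", "Unrestricted"),
    "relicense_proprietary": ("Not permitted", "Not permitted", "Permitted"),
    "train_other_model": ("Prohibited", "Prohibited", "Permitted"),
}

BLOCKING_VALUES = {"Not permitted", "Prohibited"}


def lookup(base_model, scenario):
    """Return the matrix cell for one base model and one scenario."""
    if base_model not in BASE_MODELS:
        raise ValueError(f"unknown base model: {base_model}")
    if scenario not in MATRIX:
        raise ValueError(f"unknown scenario: {scenario}")
    return MATRIX[scenario][BASE_MODELS.index(base_model)]


def blocked_scenarios(base_model):
    """List the scenarios the matrix marks as not allowed for a base model."""
    return [s for s in MATRIX if lookup(base_model, s) in BLOCKING_VALUES]


print(lookup("gemma", "commercial_api_service"))
print(blocked_scenarios("llama3"))
```

A table like this is only as current as the licence text it was copied from, so it is best stored alongside the licence version it reflects.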

Section 2 — Internal Use vs Distributing Your Model or Service

The distinction between using a fine-tuned model internally and making it available to others — whether as downloadable weights, a deployed API, or an embedded product — is the single most important variable in determining which licence obligations are activated. For some provisions, internal use creates no obligation at all. For others, the restriction applies regardless of whether anyone outside your organisation ever touches the model.

Understanding exactly where the "distribution trigger" falls for each licence requires treating each deployment scenario separately. The same fine-tuned model can move from zero compliance obligations (internal research) to significant downstream obligations (public release) with a single deployment decision.

Internal use — obligations at minimum

Running fine-tuned weights within your own infrastructure

When a fine-tuned model is used exclusively within your organisation — for internal tooling, research, evaluation, or employee-facing products — the licence provisions that are triggered by distribution do not apply. For Llama 3, the weight distribution rules, relicensing restrictions, and derivative model disclosure requirements only activate when you share the weights or product with external parties.

However, use-case restrictions are not gated by distribution. The Llama 3 competitor clause and Gemma's Prohibited Use Policy apply to internal use just as they apply to deployed products. Running a fine-tuned Gemma model to assist with tasks that fall within the PUP prohibition — even for internal employees — constitutes a breach of the licence.

Key boundary: "Internal" means employees and contractors working under your organisation's supervision, on your infrastructure, for your organisation's purposes. Sharing a model with a subsidiary, joint venture partner, or outsourced team may constitute distribution depending on the licence's definition of "affiliate".

Distribution — full obligation stack activated

Making weights or model outputs available externally

"Distribution" in model licence terms covers more than publishing weights on Hugging Face. It includes: making weights downloadable by any third party, bundling a model into a software product delivered to customers, providing API access to a model (for licences that treat API provision as distribution), and transferring weights in an M&A transaction or investment structure.

For Llama 3, distribution of fine-tuned weights requires the recipient to receive the Llama 3 Community License. For Gemma, distribution activates the flow-down obligation — the recipient must be contractually bound to the PUP before they can legally use the derivative. For Mistral/Apache-2.0, distribution can occur under any licence with only an attribution requirement.

The API grey area: Running a model as an API service is not weight distribution in the conventional sense — users interact with the model but do not receive the weights. Most major model licences treat this as a commercial use scenario rather than a distribution scenario, meaning weight-specific provisions (derivative licensing, source disclosure) may not apply. But use-case and flow-down provisions continue to apply to API-delivered services.

Distribution Trigger — What Activates Which Obligations

| Deployment scenario | Llama 3 | Gemma | Mistral (Apache) |
|---|---|---|---|
| Internal use (employees, own infrastructure) | AUP only | PUP only | No restrictions |
| Affiliate / subsidiary use | Review licence definition | Review ToU definition | No restrictions |
| API service to external users | AUP + 700M clause | PUP + flow-down required | No restrictions |
| Distribute fine-tuned weights publicly | Llama licence + AUP required | Gemma ToU + PUP required | Any licence + attribution |
| Embed in commercial SaaS product | AUP + competitor check | PUP + flow-down in ToS | Unrestricted |
| Transfer weights in M&A or investment | Acquirer bound by Llama licence | Acquirer bound by Gemma ToU | Licence transfers cleanly |

Before you distribute a fine-tuned model or LoRA adapter: checklist
1. Identify the base model's licence version. The licence in effect when you downloaded the weights is the starting point for your obligations; where the terms can be updated, as with Gemma, check the current version as well. Document the version, download date, and source.
2. Determine whether distribution is required. Consider whether running an API service avoids weight distribution triggers without sacrificing the commercial goal.
3. For Llama 3 derivatives: every recipient must receive the Llama 3 Community License. Ensure it is included in the release package, repository, or transfer agreement.
4. For Gemma derivatives: every recipient must be contractually bound to the Prohibited Use Policy. Add the PUP flow-down clause to any download agreement, ToS, or enterprise MSA.
5. For Mistral/Apache-2.0 derivatives: include attribution notices from the original Apache-2.0 licence. No other restriction applies, but confirm your chosen derivative licence does not inadvertently impose obligations from other sources.
6. Check LoRA adapters separately. If you release adapters independently of the full weights, the same distribution obligations apply. Treating adapters as non-model artefacts is the most common gap in fine-tuning compliance.
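
The checklist above can be turned into a simple pre-release gate. The sketch below is one possible shape for it, not an established tool: the `ReleasePlan` class, the item identifiers, and the family keys are all hypothetical names chosen for this example, and the same check is meant to run for adapters as well as full weights.

```python
from dataclasses import dataclass, field

# Hypothetical identifiers for what each licence family expects in a release.
REQUIRED_ITEMS = {
    "llama3": {"llama3_community_license_copy"},
    "gemma": {"pup_flow_down_clause"},
    "apache-2.0": {"apache_attribution_notices"},
}


@dataclass
class ReleasePlan:
    base_family: str                 # "llama3", "gemma" or "apache-2.0"
    licence_version: str = ""        # checklist step 1: documented version
    download_date: str = ""          # checklist step 1: when weights were obtained
    source: str = ""                 # checklist step 1: where they came from
    included_items: set = field(default_factory=set)


def missing_items(plan):
    """Return a list of checklist gaps for a planned weights or adapter release."""
    if plan.base_family not in REQUIRED_ITEMS:
        raise ValueError(f"unknown licence family: {plan.base_family}")
    problems = []
    for name in ("licence_version", "download_date", "source"):
        if not getattr(plan, name):
            problems.append(f"undocumented: {name}")
    for item in sorted(REQUIRED_ITEMS[plan.base_family] - plan.included_items):
        problems.append(f"missing: {item}")
    return problems


plan = ReleasePlan(
    base_family="gemma",
    licence_version="as downloaded",
    download_date="2024-05-01",
    source="model hub",
)
print(missing_items(plan))
```

An empty list means the documented items are present; it does not confirm that the documents themselves are adequate.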
⚠️

M&A and fundraising note: In due diligence for a startup whose core product is built on a fine-tuned Llama 3 or Gemma model, the question is not just "what licence does the model ship under?" — it is "does the licence bind the acquirer, and on what terms?" For Llama 3 and Gemma derivatives, the acquirer inherits the same licence constraints as the seller, including the 700M MAU clause and PUP flow-down. This is a material term that must be disclosed in the IP schedule and reviewed by the buyer's legal team as part of standard AI IP due diligence.

Section 3 — Ownership Questions Around Weights After Further Training

Fine-tuning a model generates new weight values that did not exist before. The natural assumption is that the entity performing the training owns what it creates. The reality is more nuanced: ownership of fine-tuned weights depends on the base model licence, the copyright status of the training data, employment and contractor agreements within the team doing the training, and the unsettled legal question of whether AI-generated weight changes are protectable at all under current copyright law.

The Ownership Stack — What You Are Actually Claiming Rights Over

Fine-tuned model ownership — layer by layer
Base model weights. Rights holder: licensor. Owned by the original model developer (Meta, Google, Mistral AI). Your fine-tuning does not transfer ownership of the underlying weights. You receive a licence to use them, not a transfer of IP. Even after fine-tuning, the original weights and their IP remain with the licensor.

Architecture & training code. Rights holder: licensor. The code that implements the model architecture (transformer design, attention mechanisms) is typically protected by copyright, and in some cases patents, held by the original developer or its contributors. Using the architecture under licence does not transfer the architecture's IP.

Delta weights (fine-tune modifications). Rights holder: contested. The new weight values generated by your training are where your ownership claim arises. The legal question is whether these weight modifications are (a) copyright-protectable as a creative work, (b) owned by your organisation, and (c) separable from the base weights for ownership purposes. No jurisdiction has fully resolved this.

Training data contribution. Rights holder: potentially shared. If your fine-tuning dataset contains proprietary or third-party licensed material, that material's IP holders may have a claim on the model's outputs in specific contexts. Dataset licence terms increasingly assert rights over models trained on the dataset, a question that remains unsettled in most jurisdictions.

Your fine-tuning pipeline & code. Rights holder: your organisation. The code you write to run fine-tuning, process data, and evaluate the model is standard software copyright owned by your organisation (subject to employee/contractor agreements). This is the clearest element of ownership in the stack, but it is not the model itself.

Three Unresolved Questions That Affect Fine-Tune Ownership Claims

⚖️

Is a fine-tuned model a "derivative work" under copyright?

Copyright protection requires a human-authored creative contribution. Fine-tuning involves selecting a dataset and hyperparameters — but the weight modifications themselves are generated by an automated process. Whether that process produces a copyrightable derivative work depends on the level of human creative input, which varies by fine-tuning method and by jurisdiction. No major copyright authority has issued a definitive ruling on model-weight derivatives.

🏢

Do employment and contractor agreements capture the fine-tune?

Even where a fine-tune might be copyright-protectable, the ownership question shifts to whether your organisation — rather than the individual researchers who ran the training — holds the rights. This requires valid work-for-hire or IP assignment clauses in employment and contractor agreements that expressly cover AI model training outputs. Standard software IP assignment clauses may not be drafted broadly enough to cover model weights created using third-party licensed base models.

📋

Does the base model licence constrain ownership claims?

For Llama 3 and Gemma derivatives, the licence restricts how you can characterise ownership of the fine-tune — you cannot represent your Llama 3 fine-tune as your unencumbered proprietary model because it must be distributed under the Llama 3 licence. For Apache-2.0 models, no such constraint exists: a Mistral fine-tune can be distributed as a proprietary model, and the ownership claim to the new weights is legally cleaner.

The copyright gap — why "I trained it, I own it" is not reliable
The problem

Copyright protects original human expression. Training a model does not require human expression in the weights. It requires human decisions about data selection, task framing, and evaluation criteria, but the weight values themselves emerge from a mathematical optimisation process. Several authorities, including the US Copyright Office, have declined to protect AI-generated outputs without substantial human authorship. Whether fine-tuning constitutes sufficient human authorship to generate copyright in the resulting weights is an open question with significant commercial implications.

The practical implication

If fine-tuned weights are not copyright-protectable in a given jurisdiction, your primary protection for the model is contractual (licence terms and trade secrecy) rather than copyright. This matters for enforcement: trade secrets require active protection measures and are lost if disclosed without protection. A model distributed under a custom licence without copyright backing is protected only as long as the contract is enforceable and the weights remain non-public. Due diligence for investments or acquisitions involving proprietary fine-tuned models should include a legal assessment of the copyright position in the target's key jurisdictions.

Ownership Clarity by Base Model and Scenario

| Question | Llama 3 fine-tune | Gemma fine-tune | Mistral fine-tune |
|---|---|---|---|
| Can you claim proprietary ownership of the fine-tune? | Constrained by licence | Constrained by ToU | Yes (full claim) |
| Can you prevent others from copying the fine-tune? | Via Llama licence terms | Via Gemma ToU terms | If copyright is valid |
| Can you transfer / sell the fine-tune in M&A? | Yes (licence transfers too) | Yes (ToU transfers too) | Yes (clean transfer) |
| Can you represent the fine-tune as your IP asset? | Partially (base IP excluded) | Partially (base IP excluded) | Yes (with copyright caveats) |
| Does employment / IP assignment apply clearly? | Review agreement scope | Review agreement scope | Review agreement scope |

Section 4 — Risks When You Mix Models and Datasets Under Different Licences

Most production fine-tuning pipelines involve at least two licence sources: the base model and the training dataset. Many involve more — continued pre-training on a second model, RLHF using a reward model under a separate licence, or synthetic data generated by a proprietary model. Each licence combination creates a potential conflict that, if unaddressed, can contaminate the fine-tuned model's legal status for all downstream uses.

Unlike software dependency licence conflicts, which are well-documented and covered by standard OSS compliance tooling, model licence mixing has no established resolution framework and limited case law. The risks are real, the obligations are contractual (making breaches enforceable), and the worst-case outcomes — model withdrawal, injunction, or forced re-training — are commercially severe.

Four High-Risk Mixing Scenarios

1

Fine-tuning a Llama 3 model on a CC-BY-SA dataset

Copyleft dataset meets custom model licence — obligations collide

Creative Commons BY-SA (Share-Alike) is a copyleft licence: any work derived from a CC-BY-SA dataset must be distributed under the same or compatible licence. If you fine-tune Llama 3 on a CC-BY-SA corpus and the resulting model is considered a derivative of the dataset, you face two conflicting obligations: the Llama 3 licence requires distribution under Llama 3 terms only, while CC-BY-SA requires distribution under CC-BY-SA or compatible terms. These are not reconcilable.

The question of whether a model is a "derivative work" of its training data is unsettled, but the risk is live: several dataset providers have explicitly asserted that models trained on their CC-BY-SA data are derivatives. If that position is accepted in litigation or arbitration, a Llama 3 fine-tune trained on CC-BY-SA data would face a licence conflict with no clean resolution.

Risk verdict: High — do not use CC-BY-SA datasets to fine-tune custom-licence models without specific legal clearance. Use permissively licensed datasets (CC-BY, CC0, Apache-2.0) instead.
2

Continued pre-training on Gemma, then LoRA fine-tuning on Llama 3

Two custom model licences with conflicting downstream obligations

Some development pipelines use one model for continued pre-training (domain adaptation on large unlabelled corpora) and a different model as the base for task-specific fine-tuning. If the pre-training stage uses Gemma weights and the fine-tuning stage uses Llama 3 weights — or vice versa — the resulting model carries obligations from both licences simultaneously.

Gemma requires flow-down of the Prohibited Use Policy to all downstream recipients. Llama 3 requires distribution only under the Llama 3 Community License and prohibits relicensing under other terms. If both licences bind the merged model, satisfying one may make it impossible to satisfy the other without additional agreements with both Google and Meta. In practice, this scenario should be avoided by selecting a single base model licence track and staying within it throughout the pipeline.

Risk verdict: High — mixing Gemma and Llama 3 weights in a single model creates conflicting downstream obligations that cannot be resolved without bilateral licensor cooperation.
3

Using GPT-4 / proprietary model outputs as synthetic training data

OpenAI and similar providers explicitly prohibit using outputs to train competing models

A widely used fine-tuning technique generates synthetic training data by prompting a large proprietary model (GPT-4, Claude, Gemini) and using the outputs to train a smaller open-weight model. OpenAI's Terms of Service explicitly prohibit using outputs from OpenAI models to develop AI models that compete with OpenAI's products. Similar restrictions appear in Anthropic's and Google's API terms.

This prohibition applies regardless of the base model being fine-tuned. If you generate synthetic instruction data using GPT-4 and use it to fine-tune Mistral, Llama 3, or Gemma, you may be in breach of your agreement with OpenAI, even though the weights themselves derive from a different model. The contamination is in the dataset, not the base model licence.

Risk verdict: High — proprietary model outputs used as training data create contractual obligations that bind the resulting fine-tune regardless of the base model licence. This is one of the most common undiscovered compliance gaps in commercial AI products.
4

Mixing Apache-2.0 and custom-licence model weights (model merging)

Model merging or ensemble techniques applied across licence types

Techniques like SLERP merging and linear weight interpolation combine weights from two or more base models into a single merged model, while ensembling combines the outputs of several models at inference time. If one source model is Apache-2.0 (Mistral) and the other is Llama 3 or Gemma, the combined model inherits the more restrictive licence obligations from the custom model: Apache-2.0 does not "cleanse" the merged weights.

Merged models published on Hugging Face using Llama 3 as one source must include the Llama 3 Community License. The merged model cannot be redistributed under Apache-2.0 alone, because the Llama 3 weights — even partially — are present in the merged output and remain bound by the Llama 3 licence. This is particularly relevant for the growing ecosystem of publicly shared merged models where licence chain compliance is frequently absent.

Risk verdict: Moderate to high — model merging across licence types always inherits the most restrictive source licence. The resulting model cannot be relicensed under permissive terms and must comply with all applicable custom licence obligations.
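
The inheritance rule described for merged models amounts to a set union: the merged model carries every obligation of every source. The sketch below illustrates that reading. The obligation labels paraphrase this guide rather than the licence texts, and the dictionary and function names are hypothetical.

```python
# Obligation labels paraphrase this guide; they are not licence quotations.
OBLIGATIONS = {
    "apache-2.0": {"attribution notices"},
    "llama3": {
        "Llama 3 Community License pass-through",
        "Acceptable Use Policy",
        "700M MAU clause",
    },
    "gemma": {"Gemma Terms of Use", "Prohibited Use Policy flow-down"},
}

CUSTOM_LICENCES = {"llama3", "gemma"}


def merged_obligations(source_licences):
    """Combine the obligations of every source model in a merge."""
    obligations = set()
    for licence in source_licences:
        obligations |= OBLIGATIONS[licence]
    custom = CUSTOM_LICENCES & set(source_licences)
    return {
        "obligations": sorted(obligations),
        # A permissive release is only possible if no custom licence is present.
        "permissive_relicensing_possible": not custom,
        # Two different custom licences is the conflict case from scenario 2.
        "conflicting_custom_licences": len(custom) > 1,
    }


result = merged_obligations(["apache-2.0", "llama3"])
print(result["permissive_relicensing_possible"])
print(result["obligations"])
```

Modelling the rule as a union makes the practical point visible: adding an Apache-2.0 source never removes an obligation that another source introduced.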

Dataset Licence Compatibility — Common Fine-Tuning Combinations

| Dataset licence | Llama 3 fine-tune | Gemma fine-tune | Mistral fine-tune | Risk level |
|---|---|---|---|---|
| CC0 / public domain | Compatible | Compatible | Compatible | Low |
| CC-BY (attribution only) | Compatible | Compatible | Compatible | Low |
| CC-BY-SA (share-alike) | Conflict risk | Conflict risk | Possible conflict | High |
| CC-BY-NC (non-commercial) | Blocks commercial use | Blocks commercial use | Blocks commercial use | High |
| Proprietary licensed dataset | Review licence scope | Review licence scope | Review licence scope | Medium |
| Synthetic data from GPT-4 / proprietary LLM | ToS breach risk | ToS breach risk | ToS breach risk | High |
| Apache-2.0 licensed dataset | Compatible | Compatible | Compatible | Low |

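
The compatibility table lends itself to an automated first pass over a pipeline's dataset list. The following is a rough sketch under stated assumptions: the licence tags, dataset names, and function name are invented for the example, the risk levels are taken from the table's final column, and anything not listed is deliberately flagged for manual review rather than guessed.

```python
# Risk levels follow the table's "Risk level" column; tags are hypothetical.
RISK_BY_LICENCE = {
    "cc0": ("Low", "Compatible"),
    "cc-by": ("Low", "Compatible"),
    "apache-2.0": ("Low", "Compatible"),
    "cc-by-sa": ("High", "Share-alike conflict risk"),
    "cc-by-nc": ("High", "Blocks commercial use"),
    "proprietary": ("Medium", "Review licence scope"),
    "synthetic-proprietary-llm": ("High", "ToS breach risk"),
}

RISK_ORDER = {"High": 0, "Medium": 1, "Unknown": 2, "Low": 3}


def audit_datasets(datasets):
    """Map each dataset to a risk level and return findings, highest risk first.

    `datasets` is a dict of dataset name -> licence tag.
    """
    findings = []
    for name, licence in datasets.items():
        risk, note = RISK_BY_LICENCE.get(
            licence.lower(), ("Unknown", "Not in the table: review manually")
        )
        findings.append((name, licence, risk, note))
    return sorted(findings, key=lambda f: (RISK_ORDER[f[2]], f[0]))


pipeline = {
    "wiki_subset": "CC-BY-SA",
    "internal_tickets": "proprietary",
    "public_domain_books": "CC0",
    "distilled_instructions": "synthetic-proprietary-llm",
}
for finding in audit_datasets(pipeline):
    print(finding)
```

Sorting high-risk entries to the top keeps the datasets that need replacing or legal clearance in front of the reviewer.
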
Fine-tuning compliance checklist — before you train and before you ship
Audit every dataset licence in your fine-tuning pipeline — including pre-training corpora, instruction datasets, RLHF preference data, and any evaluation sets used to generate synthetic labels.
Check for CC-BY-SA and NC licences — these are incompatible with commercial deployment under custom model licences. Replace with CC-BY, CC0, or Apache-2.0 alternatives before training.
Check for synthetic data from proprietary LLM APIs — GPT-4, Claude, and Gemini API outputs used as training data create contractual obligations that bind the resulting model regardless of the base model licence.
Select a single base model licence track — avoid mixing Gemma and Llama 3 weights in a single model or pipeline. If model merging is required, use models from the same licence family.
Document the full training provenance — base model version, download date, dataset sources and licences, any auxiliary models used (reward models, embedding models). This documentation is required for due diligence and enterprise procurement.
Review LoRA adapter distribution separately from full-weight distribution — LoRA adapters for Llama 3 or Gemma base models trigger the same licence obligations as distributing full fine-tuned weights.
For Gemma-based pipelines: implement PUP flow-down in customer-facing agreements before any external deployment — API, SaaS, or enterprise delivery.
Evaluate Mistral/Apache-2.0 as the base model wherever performance requirements allow — it eliminates the majority of fine-tuning licence risk at source.
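
The provenance item in the checklist is easier to keep up to date when the record is a structured file written at training time. Below is a minimal sketch of such a record. The field names are assumptions based on the items the checklist lists (base model version, download date, dataset sources and licences, auxiliary models), and the model and dataset names are placeholders.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetSource:
    name: str
    licence: str
    url: str = ""


@dataclass
class ProvenanceRecord:
    base_model: str
    base_model_licence: str
    licence_version: str
    download_date: str                      # ISO 8601 date string
    datasets: list = field(default_factory=list)
    auxiliary_models: list = field(default_factory=list)  # reward, embedding models

    def to_json(self):
        """Serialise the record, including nested dataset entries."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


record = ProvenanceRecord(
    base_model="example-org/example-base-7b",   # placeholder name
    base_model_licence="Apache-2.0",
    licence_version="2.0",
    download_date="2024-05-01",
    datasets=[DatasetSource(name="example-instructions", licence="CC-BY-4.0")],
    auxiliary_models=["example-org/example-reward-model"],
)
print(record.to_json())
```

Committing the JSON output next to the training configuration gives due diligence reviewers a single file to start from.
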
Conclusion

The Licence Follows the Weights — Through Every Stage of Training

Fine-tuning is a legal event as much as a technical one. The base model's licence attaches to the fine-tuned weights, the LoRA adapters, and every product built on those weights, whether it reaches users as a download, an API, or an embedded feature. The licence does not reset, it does not weaken, and it does not transfer ownership of the base model's IP to the fine-tuner.

The key distinctions that govern compliance are clear in structure: internal use avoids distribution triggers but not use-case restrictions; distributing weights (through public release, bundling in a product, or an M&A transfer) activates the full obligation stack, while API services remain subject to use-case and flow-down provisions; and model or dataset mixing inherits the most restrictive licence in the pipeline. Where Mistral/Apache-2.0 performs comparably to Llama 3 or Gemma, choosing the Apache-2.0 model removes most of the base-model licence risk at source, although dataset licence and synthetic-data risks still need to be checked.

For teams already working with Llama 3 or Gemma fine-tunes, the priority is documentation: provenance records, dataset licence audits, and contract updates (PUP flow-down, Llama 3 licence pass-through) to enterprise customer agreements. These are not just compliance items — they are the materials that will be reviewed in every fundraising round and acquisition conversation involving an AI product with a custom-licence model at its core.


For the broader framework on AI IP ownership and how model licence choice interacts with investment structuring, see AI IP Ownership — wcr.legal.

Oleg Prosin is the Managing Partner at WCR Legal, focusing on international business structuring, regulatory frameworks for FinTech companies, digital assets, and licensing regimes across various jurisdictions. He works with founders and investment firms on compliance, operating models, and cross-border expansion strategies.