AI Model Licensing · Oleg Prosin · May 17, 2026

Can I Use Customer Data to Train My AI Model? What Your Provider’s ToS Actually Says

AI Law · Founder Guide

OpenAI, Anthropic and Mistral have different rules on customer data, fine-tuning, and output ownership. Here is what the ToS says — and what investors will ask on DD.

17 May 2026 · ~7 min read · AI Governance · Series A Checklist · EU + US

In this article

1. What “using customer data” actually means: inference vs fine-tuning vs RAG
2. What each provider’s ToS actually says: OpenAI, Anthropic and Mistral compared
3. Three scenarios where this goes wrong: real DD red flags, mapped
4. What to check right now: 5-minute founder checklist
5. What’s your risk level? 3-question self-assessment
6. Common questions: what investors ask on DD

Section 1

What “Using Customer Data” Actually Means

You are in a Series A due diligence meeting. The investor asks: “Do you use customer conversation data to train or fine-tune your AI model?” The room goes quiet. You signed the API Terms of Service eighteen months ago and have never read past page two. This is not unusual — most B2B founders sign AI provider agreements on the assumption they are standard. They are not. If you need a full review, our AI Model Licensing practice covers exactly this.

Three distinct modes — only one carries real legal risk

The difference determines your entire compliance posture

1

Inference — data is processed, the model does not change

Low legal risk for API customers

A customer sends a query. Your app sends it to the API. The model responds. The model weights remain untouched: customer data passes through without changing the model, although the provider may retain inputs for a limited period under its own retention policy. Standard API terms for business customers explicitly confirm that your inputs are not used for training.

Most AI products operate primarily at inference. If you are an API customer (not free-tier ChatGPT), this is a non-issue for training questions.

2

Fine-Tuning — customer data reshapes the model’s weights permanently

High legal risk — this is where compliance concentrates

This is where the legal exposure lives. You take conversations from your customers and use them as training examples. The model learns from them — its parameters change permanently. Under GDPR, repurposing data collected for customer support into training data is a purpose limitation violation unless you obtained explicit prior consent.

Investors conducting AI-aware DD will ask specifically about fine-tuning. If you have fine-tuned on customer data without documented consent, expect follow-up questions that are uncomfortable to answer in real time.

3

RAG — data is retrieved at runtime, the model does not change

Moderate risk — data governance questions apply

Retrieval-Augmented Generation pulls relevant documents into the context window at query time. The model is not retrained on them. Data governance questions still arise — what data are you retrieving, who owns it, is it personal data — but you are not in fine-tuning territory.

RAG architectures are generally cleaner from a provider ToS perspective, but your own data handling practices still need to be defensible on DD.

Key distinction

The legal risk is almost entirely in fine-tuning. Inference is a non-issue for API customers — the model is not retrained on your customers’ data, and standard API terms confirm this. When an investor asks “do you use customer data for training?”, they are asking about fine-tuning.
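The consent requirement for fine-tuning can be enforced in the data pipeline itself, before any training file is built. The following is a minimal sketch, assuming each stored conversation carries a record of the purposes the customer agreed to at collection. The `Conversation` type, the `consented_purposes` field, and the `"model_training"` label are hypothetical names for illustration, not part of any provider's API.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """One stored customer conversation (hypothetical schema)."""
    conversation_id: str
    text: str
    # Purposes the customer agreed to at collection, e.g. {"support"}.
    consented_purposes: set[str] = field(default_factory=set)


# Assumed label used in your own consent records.
TRAINING_PURPOSE = "model_training"


def split_by_training_consent(conversations):
    """Return (eligible, excluded) lists based on documented training consent."""
    eligible, excluded = [], []
    for conv in conversations:
        if TRAINING_PURPOSE in conv.consented_purposes:
            eligible.append(conv)
        else:
            # No documented consent for training: keep out of the dataset.
            excluded.append(conv)
    return eligible, excluded


if __name__ == "__main__":
    sample = [
        Conversation("c1", "How do I reset my password?", {"support"}),
        Conversation("c2", "Export fails on large files.", {"support", "model_training"}),
    ]
    eligible, excluded = split_by_training_consent(sample)
    print("eligible:", [c.conversation_id for c in eligible])
    print("excluded:", [c.conversation_id for c in excluded])
```

Defaulting to exclusion when no purpose is recorded is a deliberate choice here: a conversation with missing consent metadata is treated the same as one where consent was never given.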

Section 2

What Each Provider’s ToS Actually Says

A plain-language comparison of what OpenAI, Anthropic and Mistral actually permit under their API Terms of Service as of 2024–2025. For a deeper analysis of the licensing structure behind these agreements, see our guide on how AI models are actually licensed.

OpenAI API (GPT-4o · GPT-4 · GPT-3.5)

Customer data used for training by default: No. API data is not used to train models by default. On the free tier, users must opt out manually.

Fine-tuning on customer data: Conditional. Permitted via the Fine-tuning API with your own data. GDPR consent from customers is still required independently.

Ownership of fine-tuned model outputs: Customer. Outputs are assigned to the customer, subject to OpenAI usage policies. Weights run on OpenAI infrastructure.

Anthropic API (Claude 3.5 · Claude 3 Opus)

Customer data used for training by default: No. Explicitly prohibited in the API terms. Inputs are isolated from model training.

Fine-tuning on customer data: Limited. Restricted to enterprise arrangements. Not available via standard API access.

Ownership of fine-tuned model outputs: Customer. Outputs are assigned to the customer under business agreements.

Mistral API (Mistral Large · Open weights)

Customer data used for training by default: No. API data is isolated from training pipelines. Open-weight models are a separate consideration.

Fine-tuning on customer data: Permitted. A fine-tuning API is available. Open-weight models operate under their own licence, so check each separately.

Ownership of fine-tuned model outputs: Customer. Outputs belong to the customer. Open model weights remain under their original licence terms.

Important distinction

API ≠ ChatGPT free tier. These rules apply to API customers only. Free-tier ChatGPT users operate under entirely different terms — and by default, their conversations may be used to improve OpenAI’s models unless they opt out manually. If your team uses the free tier for internal tasks, that is a separate governance question from your API usage.

Section 3

Three Scenarios Where This Goes Wrong

These are the situations that surface in investor DD and cause deals to be delayed, repriced, or restructured. They are fixable — but the window to fix them is before the process starts, not during it.

DD

GDPR · Consent

Fine-tuning on customer conversations without consent

The most common DD red flag in AI products

What happens: You use customer support conversations to fine-tune a model. Nobody told customers their conversations would be repurposed as training data.

Why it’s a problem: Under GDPR Article 5(1)(b), data collected for one purpose (support) cannot be repurposed for another (AI training) without separate, explicit consent. No privacy policy disclaimer fixes this retroactively.

What investors see: An automatic red flag. Remediation may require deleting the fine-tuned model and retraining from scratch. Deals restructure or fail at this point.

EU

GDPR · Cross-Border

EU user data processed by a US provider without a DPA

A structural violation most founders discover on DD

What happens: Your product serves EU customers. Every AI interaction sends personal data to OpenAI or Anthropic infrastructure in the United States. You have never executed a Data Processing Agreement.

Why it’s a problem: GDPR Article 28 requires a DPA with any processor, and Article 46 requires an appropriate safeguard, such as Standard Contractual Clauses (SCCs), for international data transfers. Without a signed DPA and SCCs, there is no legal basis for the transfer. Both providers offer DPAs, but they must be actively executed.

What investors see: On Day 1 of DD they ask for the signed DPA. If it does not exist, the deal timeline slips. See our cross-border AI compliance guide.

ToS

Vendor Lock-In

Your provider changes its Terms of Service

OpenAI updated ToS 4x in 2023–2024

What happens: Your entire product depends on a single AI provider. That provider updates its Terms of Service — changing what is permitted with fine-tuned models, or what happens to your model weights if you stop paying.

Why it’s a problem: If you have no contractual framework beyond click-through terms, a ToS change can require product redesign overnight. No negotiated termination clause, no data portability right, no notice period.

What investors see: Existential concentration risk with no contractual mitigation. This can lead to a valuation haircut, an escrow holdback, or a condition precedent requiring provider redundancy before closing.

Section 4

What to Check Right Now

A 5-minute self-assessment. If you cannot answer every item here confidently, you have gaps that will surface in investor due diligence. Work through this before you accept an LOI or term sheet.

AI provider compliance checklist

7 items — minimum for any investment-ready AI product

Confirm your API tier — Business, Enterprise, or free/consumer

The training restrictions and DPA availability differ entirely by tier. Free-tier terms are consumer-oriented; API terms are what apply to your product. Confirm which agreement governs your usage.

Verify the training opt-out is active (OpenAI API customers)

Confirm in your account settings that data is not being used for training improvements. Log in and verify — do not assume.

Account settings → Data controls

Execute a Data Processing Agreement if you have EU users

Both OpenAI and Anthropic offer DPAs for business customers. If you have EU-based users or process EU personal data through the API, a signed DPA is a legal requirement under GDPR Article 28 — not optional.

Document customer consent for any fine-tuning on their data

If you have used or plan to use customer conversation data for fine-tuning, verify that your privacy notice explicitly disclosed this use at collection — and that customers had a meaningful opportunity to consent or object.

Review upstream licences for any open-source models you have fine-tuned

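LLaMA, Mistral open-weights, Falcon and others each have their own licence terms governing commercial use. Some restrict commercial use above certain usage or revenue thresholds. Meta's Llama community licence, for example, requires a separate licence from Meta above 700 million monthly active users.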

Check: Meta AI Acceptable Use Policy · Mistral licence · Falcon licence

Include an AI output ownership clause in your customer agreements

Your customer contracts should specify who owns the AI-generated outputs your product produces — and this must be consistent with what your provider’s ToS actually assigns.

Prepare a written answer to the training data provenance question

Write it out now: which models you use, on what terms, whether customer data is used for training or fine-tuning, whether consents are in place. If you struggle to write it, you are not ready for the question in a DD meeting. See our AI Model Licensing Guide for Founders.

This is the minimum

A sophisticated investor’s AI diligence will go significantly deeper than this checklist. But if you cannot tick every item here, you have a problem that will surface before closing. Fix the checklist first — then prepare for the deeper questions.
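One way to keep this checklist honest is to track it as data rather than from memory. The sketch below is illustrative only: the short labels paraphrase the seven items above, and `open_items` and `is_dd_ready` are hypothetical helper names.

```python
# The seven checklist items from this section, as short labels.
CHECKLIST = [
    "API tier confirmed",
    "Training opt-out verified",
    "DPA executed (if EU users)",
    "Customer consent documented for fine-tuning",
    "Upstream open-source licences reviewed",
    "AI output ownership clause in customer agreements",
    "Written training data provenance answer prepared",
]


def open_items(completed):
    """Return checklist items not yet marked complete, in checklist order."""
    unknown = set(completed) - set(CHECKLIST)
    if unknown:
        # Guard against typos silently counting as progress.
        raise ValueError(f"Unknown checklist items: {sorted(unknown)}")
    return [item for item in CHECKLIST if item not in completed]


def is_dd_ready(completed):
    """True only when every item is complete (the 'tick every item' rule)."""
    return not open_items(completed)


if __name__ == "__main__":
    done = {"API tier confirmed", "DPA executed (if EU users)"}
    for item in open_items(done):
        print("OPEN:", item)
    print("DD-ready:", is_dd_ready(done))
```

The all-or-nothing `is_dd_ready` mirrors the point above: a partially completed checklist still leaves a gap that will surface before closing.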

Section 5

What’s Your Risk Level?

Answer three questions to get an immediate read on your compliance exposure ahead of fundraising.

3-question risk assessment

Question 1 of 3: Which AI provider powers your product?

Options: OpenAI API · Anthropic API · Mistral API · Open-source (LLaMA, Falcon, etc.) · Multiple providers

Question 2 of 3: Do you have EU-based users or customers?

Options: Yes · No · Not sure

Question 3 of 3: Have you fine-tuned any model using customer conversation data?

Options: Yes · No · Planning to
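The three answers can be mapped to concrete follow-up items. The text does not state how the assessment is scored, so the sketch below is an assumption throughout: `risk_flags` is a hypothetical function, and its flags simply restate issues raised in the earlier sections (consent for fine-tuning, DPAs for EU data, upstream licences for open-source models).

```python
def risk_flags(provider, eu_users, fine_tuned):
    """Map the three answers to follow-up items drawn from this article.

    Illustrative only: the flag wording and the mapping are assumptions,
    not the logic of any actual assessment tool.
    """
    flags = []

    # Question 3: fine-tuning on customer data is where the article
    # places most of the legal risk.
    if fine_tuned == "Yes":
        flags.append("Verify documented customer consent for fine-tuning")
    elif fine_tuned == "Planning to":
        flags.append("Put consent in place before fine-tuning on customer data")

    # Question 2: treat "Not sure" like "Yes" until confirmed otherwise.
    if eu_users in ("Yes", "Not sure"):
        flags.append("Confirm a signed DPA and transfer safeguards for EU data")

    # Question 1: open-source and multi-provider setups add review work.
    if provider.startswith("Open-source"):
        flags.append("Review upstream licence terms for commercial use")
    elif provider == "Multiple providers":
        flags.append("Check each provider's terms separately")

    return flags


if __name__ == "__main__":
    for flag in risk_flags("OpenAI API", "Not sure", "Planning to"):
        print("-", flag)
```

Returning a list of follow-up items rather than a single score keeps the output actionable: each flag points back to a checklist item in Section 4.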

Need a Review of Your AI Provider Agreements?

WCR Legal reviews AI model licensing agreements for B2B founders preparing for investment. We identify what your provider’s ToS actually permits, surface GDPR exposure, and prepare you for the questions investors ask in due diligence.

Section 6

Common Questions from Founders and Investors

Frequently asked questions

5 questions — what comes up most in DD and legal reviews

1

Does OpenAI train on my API data by default?

No. For API customers — ChatGPT Business, Enterprise, and direct API access — OpenAI does not use your data to train models by default. This is explicitly stated in the API terms. This is materially different from the free-tier ChatGPT experience, where training opt-out must be manually enabled by the user. If you are building on the API, your inference data is not feeding back into model improvements.

2

Can I fine-tune a model on my customers’ conversation data?

Technically yes with most providers — but legally it depends entirely on whether your customers consented to this use when their data was collected. GDPR’s purpose limitation principle (Article 5(1)(b)) requires that data collected for one purpose — such as customer support — cannot be repurposed for AI training without separate, explicit consent. A retrospective privacy policy update does not satisfy this requirement.

3

What happens to my fine-tuned model if I stop using OpenAI?

Your fine-tuned model is yours in the sense that OpenAI assigns you the outputs, but the weights run exclusively on OpenAI infrastructure and are not generally available for download. If you terminate your account, you lose access to the fine-tuned model unless your agreement’s termination provisions give you an export right. Review the termination and data deletion clauses in your specific agreement before investing significantly in fine-tuning.

4

Do I need a DPA with OpenAI or Anthropic if I have EU users?

Yes. If you process personal data of EU residents through these APIs — which you almost certainly do if your product serves EU customers — GDPR Article 28 requires a Data Processing Agreement with each processor. Both OpenAI and Anthropic offer DPAs for business customers. The DPA must be actively executed by both parties — it does not apply automatically when you sign up. This is one of the first documents a GDPR-conscious investor will request.

5

What will investors ask about AI training data on due diligence?

Expect questions in four areas: (1) which models you use and the specific terms governing that usage; (2) whether customer data is used for training or fine-tuning and what consent documentation exists; (3) what happens to your product if a provider changes their terms or you need to switch; and (4) whether proper cross-border transfer mechanisms are in place for EU data. See our AI due diligence guide for a complete investor checklist.
