Can I Use Customer Data to Train My AI Model

Can I Use Customer Data to Train My AI Model? What Your Provider’s ToS Actually Says

AI Law · Founder Guide

OpenAI, Anthropic and Mistral have different rules on customer data, fine-tuning, and output ownership. Here is what the ToS says, and what investors will ask in due diligence (DD).
17 May 2026 · ~7 min read · AI Governance · Series A Checklist · EU + US
In this article (6 sections · ~7 min)
1. What “using customer data” actually means: inference vs fine-tuning vs RAG
2. What each provider’s ToS actually says: OpenAI · Anthropic · Mistral compared
3. Three scenarios where this goes wrong: real DD red flags, mapped
4. What to check right now: 5-minute founder checklist
5. What’s your risk level? 3-question self-assessment
6. Common questions: what investors ask on DD
Section 1

What “Using Customer Data” Actually Means

You are in a Series A due diligence meeting. The investor asks: “Do you use customer conversation data to train or fine-tune your AI model?” The room goes quiet. You signed the API Terms of Service eighteen months ago and have never read past page two. This is not unusual — most B2B founders sign AI provider agreements on the assumption they are standard. They are not. If you need a full review, our AI Model Licensing practice covers exactly this.
Three distinct modes, only one of which carries real legal risk
The difference determines your entire compliance posture.
1. Inference: data is processed, the model does not change
Risk: low for API customers.
A customer sends a query. Your app sends it to the API. The model responds. The model weights remain untouched. Customer data passes through without altering the model, although the provider may retain request logs for a limited period under its own retention policy. Standard API terms for business customers confirm that your inputs are not used for training by default.
Most AI products operate primarily at inference. If you are an API customer (not a free-tier ChatGPT user), inference does not raise a training-data issue.
2. Fine-tuning: customer data reshapes the model’s weights permanently
Risk: high. This is where compliance exposure concentrates.
You take conversations from your customers and use them as training examples. The model learns from them, and its parameters change permanently. Under GDPR, repurposing data collected for customer support as training data breaches the purpose limitation principle unless the new use has its own legal basis, which in practice usually means explicit prior consent.
Investors conducting AI-aware DD will ask specifically about fine-tuning. If you have fine-tuned on customer data without documented consent, expect follow-up questions that are uncomfortable to answer in real time.
3. RAG: data is retrieved at runtime, the model does not change
Risk: moderate. Data governance questions apply.
Retrieval-Augmented Generation pulls relevant documents into the context window at query time. The model is not retrained on them. Data governance questions still arise (what data you are retrieving, who owns it, whether it is personal data), but you are not in fine-tuning territory.
RAG architectures are generally cleaner from a provider ToS perspective, but your own data handling practices still need to be defensible on DD.
Key distinction
The legal risk is almost entirely in fine-tuning. Inference is a non-issue for API customers — the model is not retrained on your customers’ data, and standard API terms confirm this. When an investor asks “do you use customer data for training?”, they are asking about fine-tuning.
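The three modes can be sketched as a small inventory of data flows. This is an illustrative sketch only: the `Mode`, `DataFlow` and `needs_consent_review` names and the example flows are hypothetical, and the risk labels simply repeat the ones used in this section.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    INFERENCE = "inference"
    FINE_TUNING = "fine_tuning"
    RAG = "rag"


# Risk labels as described in this section (not a legal determination).
RISK_BY_MODE = {
    Mode.INFERENCE: "low",
    Mode.FINE_TUNING: "high",
    Mode.RAG: "moderate",
}


@dataclass
class DataFlow:
    name: str                 # hypothetical label for a feature or pipeline
    mode: Mode
    uses_customer_data: bool


def changes_model_weights(flow: DataFlow) -> bool:
    """Only fine-tuning alters the model; inference and RAG leave it unchanged."""
    return flow.mode is Mode.FINE_TUNING


def needs_consent_review(flow: DataFlow) -> bool:
    """Flag flows that train on customer data, the case investors ask about."""
    return changes_model_weights(flow) and flow.uses_customer_data


# Hypothetical inventory of how a product touches customer data.
flows = [
    DataFlow("support chatbot replies", Mode.INFERENCE, True),
    DataFlow("knowledge-base search", Mode.RAG, True),
    DataFlow("tone model trained on tickets", Mode.FINE_TUNING, True),
]

flagged = [f.name for f in flows if needs_consent_review(f)]
print(flagged)
```

Keeping an inventory like this makes it easier to answer the training question per feature rather than for the product as a whole.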
Section 2

What Each Provider’s ToS Actually Says

A plain-language comparison of what OpenAI, Anthropic and Mistral actually permit under their API Terms of Service as of 2024–2025. For a deeper analysis of the licensing structure behind these agreements, see our guide on how AI models are actually licensed.
OpenAI API (GPT-4o · GPT-4 · GPT-3.5)
Customer data used for training by default: No. API data is not used to train models by default. Consumer ChatGPT is different: free-tier users must opt out manually.
Fine-tuning on customer data: Conditional. Permitted via the Fine-tuning API with your own data. GDPR consent from customers is still required independently.
Ownership of fine-tuned model outputs: Customer. Outputs are assigned to the customer, subject to OpenAI usage policies. Weights run on OpenAI infrastructure.
Anthropic API (Claude 3.5 · Claude 3 Opus)
Customer data used for training by default: No. Explicitly prohibited in API terms. Inputs are isolated from model training.
Fine-tuning on customer data: Limited. Restricted to enterprise arrangements. Not available via standard API access.
Ownership of fine-tuned model outputs: Customer. Outputs are assigned to the customer under business agreements.
Mistral API (Mistral Large · Open weights)
Customer data used for training by default: No. API data is isolated from training pipelines. Open-weight models are a separate consideration.
Fine-tuning on customer data: Permitted. A fine-tuning API is available. Open-weight models operate under their own licence, so check each separately.
Ownership of fine-tuned model outputs: Customer. Outputs belong to the customer. Open model weights remain under their original licence terms.
Important distinction
API ≠ ChatGPT free tier. These rules apply to API customers only. Free-tier ChatGPT users operate under entirely different terms — and by default, their conversations may be used to improve OpenAI’s models unless they opt out manually. If your team uses the free tier for internal tasks, that is a separate governance question from your API usage.
Section 3

Three Scenarios Where This Goes Wrong

These are the situations that surface in investor DD and cause deals to be delayed, repriced, or restructured. They are fixable — but the window to fix them is before the process starts, not during it.
Scenario 1 (GDPR · Consent): Fine-tuning on customer conversations without consent
The most common DD red flag in AI products.
What happens: You use customer support conversations to fine-tune a model. Nobody told customers their conversations would be repurposed as training data.
Why it’s a problem: Under GDPR Article 5(1)(b), data collected for one purpose (support) cannot be repurposed for an incompatible one (AI training) without a fresh legal basis, which in practice usually means separate, explicit consent. No privacy policy disclaimer fixes this retroactively.
What investors see: An automatic red flag. Remediation may require deleting the fine-tuned model and retraining from scratch. Deals restructure or fail at this point.
Scenario 2 (GDPR · Cross-Border): EU user data processed by a US provider without a DPA
A structural violation most founders discover on DD.
What happens: Your product serves EU customers. Every AI interaction sends personal data to OpenAI or Anthropic infrastructure in the United States. You have never executed a Data Processing Agreement (DPA).
Why it’s a problem: GDPR Article 46 requires an appropriate safeguard for international data transfers where no adequacy decision covers the recipient, and Article 28 separately requires a DPA with each processor. Without a signed DPA and a transfer mechanism such as Standard Contractual Clauses (SCCs), the transfer has no legal basis. Both providers offer DPAs, but they must be actively executed.
What investors see: On Day 1 of DD they ask for the signed DPA. If it does not exist, the deal timeline slips. See our cross-border AI compliance guide.
Scenario 3 (Vendor Lock-In): Your provider changes its Terms of Service
OpenAI updated its ToS four times in 2023–2024.
What happens: Your entire product depends on a single AI provider. That provider updates its Terms of Service, changing what is permitted with fine-tuned models or what happens to your model weights if you stop paying.
Why it’s a problem: If you have no contractual framework beyond click-through terms, a ToS change can force a product redesign overnight. You have no negotiated termination clause, no data portability right, and no notice period.
What investors see: Existential concentration risk with no contractual mitigation. This typically results in a valuation haircut, an escrow holdback, or a condition precedent requiring provider redundancy before closing.
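The first scenario is the one most directly preventable in code. A minimal sketch, assuming each stored conversation carries a consent date (the `Conversation` record, its field names and the sample data are hypothetical): a conversation enters the fine-tuning set only if training consent was documented on or before the day the data was collected.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Conversation:
    customer_id: str
    text: str
    collected_on: date
    training_consent_on: Optional[date] = None  # None means no documented consent


def has_prior_consent(conv: Conversation) -> bool:
    """True only if consent was recorded on or before the collection date."""
    return (
        conv.training_consent_on is not None
        and conv.training_consent_on <= conv.collected_on
    )


def split_for_fine_tuning(
    convs: List[Conversation],
) -> Tuple[List[Conversation], List[Conversation]]:
    """Return (usable, excluded) so the excluded set can be logged for audit."""
    usable = [c for c in convs if has_prior_consent(c)]
    excluded = [c for c in convs if not has_prior_consent(c)]
    return usable, excluded


# Hypothetical sample data.
conversations = [
    Conversation("c1", "How do I reset my password?", date(2025, 3, 1), date(2025, 1, 10)),
    Conversation("c2", "My invoice is wrong.", date(2025, 3, 2)),  # no consent record
    Conversation("c3", "Please cancel my plan.", date(2025, 3, 3), date(2025, 6, 1)),  # consent came later
]

usable, excluded = split_for_fine_tuning(conversations)
print([c.customer_id for c in usable])
print([c.customer_id for c in excluded])
```

Returning the excluded set alongside the usable one is a deliberate choice: it gives you a record of what was filtered out and why, which is the kind of documentation a DD request tends to ask for.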
Section 4

What to Check Right Now

A 5-minute self-assessment. If you cannot answer every item here confidently, you have gaps that will surface in investor due diligence. Work through this before you accept an LOI or term sheet.
AI provider compliance checklist
7 items: the minimum for any investment-ready AI product
Confirm your API tier — Business, Enterprise, or free/consumer
The training restrictions and DPA availability differ entirely by tier. Free-tier terms are consumer-oriented; API terms are what apply to your product. Confirm which agreement governs your usage.
Verify that data sharing for training is switched off (OpenAI API customers)
API data is not used for training by default, but confirm in your account settings that nobody on your team has enabled data sharing for model improvement. Log in and verify; do not assume.
Account settings → Data controls
Execute a Data Processing Agreement if you have EU users
Both OpenAI and Anthropic offer DPAs for business customers. If you have EU-based users or process EU personal data through the API, a signed DPA is a legal requirement under GDPR Article 28 — not optional.
Document customer consent for any fine-tuning on their data
If you have used or plan to use customer conversation data for fine-tuning, verify that your privacy notice explicitly disclosed this use at collection — and that customers had a meaningful opportunity to consent or object.
Review upstream licences for any open-source models you have fine-tuned
LLaMA, Mistral open-weight models, Falcon and others each have their own licence terms governing commercial use. Some restrict commercial use above certain usage or revenue thresholds.
Check: Meta Llama licence and Acceptable Use Policy · Mistral licence · Falcon licence
Include an AI output ownership clause in your customer agreements
Your customer contracts should specify who owns the AI-generated outputs your product produces — and this must be consistent with what your provider’s ToS actually assigns.
Prepare a written answer to the training data provenance question
Write it out now: which models you use, on what terms, whether customer data is used for training or fine-tuning, whether consents are in place. If you struggle to write it, you are not ready for the question in a DD meeting. See our AI Model Licensing Guide for Founders.
This is the minimum
A sophisticated investor’s AI diligence will go significantly deeper than this checklist. But if you cannot tick every item here, you have a problem that will surface before closing. Fix the checklist first — then prepare for the deeper questions.
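One way to prepare the written provenance answer is to keep it as a structured record and derive the open gaps from it. The sketch below is illustrative only: the `ModelUsage` and `ProvenanceRecord` classes, their fields and the sample values are hypothetical, and the checks cover just two of the checklist items (the DPA and fine-tuning consent).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelUsage:
    provider: str                      # e.g. "OpenAI API"
    governing_terms: str               # which agreement applies to this usage
    fine_tuned_on_customer_data: bool
    consent_documented: bool


@dataclass
class ProvenanceRecord:
    models: List[ModelUsage]
    has_eu_users: bool
    dpa_signed: bool

    def gaps(self) -> List[str]:
        """List checklist items that cannot currently be ticked."""
        found = []
        if self.has_eu_users and not self.dpa_signed:
            found.append("EU users but no signed DPA")
        for m in self.models:
            if m.fine_tuned_on_customer_data and not m.consent_documented:
                found.append(f"{m.provider}: fine-tuned on customer data without documented consent")
        return found

    def summary(self) -> str:
        """Render a short written answer to the provenance question."""
        lines = []
        for m in self.models:
            training = "yes" if m.fine_tuned_on_customer_data else "no"
            lines.append(f"{m.provider} under {m.governing_terms}; fine-tuned on customer data: {training}")
        return "\n".join(lines)


# Hypothetical example values.
record = ProvenanceRecord(
    models=[ModelUsage("OpenAI API", "API business terms", True, False)],
    has_eu_users=True,
    dpa_signed=False,
)
print(record.summary())
print(record.gaps())
```

An empty `gaps()` result is not a clean bill of health; it only means the two items modelled here are covered.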
Section 5

What’s Your Risk Level?

Answer three questions to get an immediate read on your compliance exposure ahead of fundraising.
3-question risk assessment
Question 1 of 3: Which AI provider powers your product?
Options: OpenAI API · Anthropic API · Mistral API · Open-source (LLaMA, Falcon, etc.) · Multiple providers
Question 2 of 3: Do you have EU-based users or customers?
Options: Yes · No · Not sure
Question 3 of 3: Have you fine-tuned any model using customer conversation data?
Options: Yes · No · Planning to
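The article does not publish the scoring behind this assessment, so the following is only a hedged sketch of how the three answers could map onto follow-up items from the checklist in Section 4. The `follow_ups` function, the lower-case answer strings and the wording of each item are assumptions made for illustration.

```python
from typing import List


def follow_ups(provider: str, eu_users: str, fine_tuned: str) -> List[str]:
    """Map the three answers to checklist items worth reviewing first.

    Assumed answer values:
      provider:   "openai", "anthropic", "mistral", "open-source", "multiple"
      eu_users:   "yes", "no", "not sure"
      fine_tuned: "yes", "no", "planning to"
    """
    items = []
    if provider == "open-source":
        items.append("Review upstream licence terms for each model")
    if provider == "multiple":
        items.append("Confirm which agreement governs each provider")
    if eu_users in ("yes", "not sure"):
        items.append("Confirm a signed DPA and transfer mechanism")
    if fine_tuned in ("yes", "planning to"):
        items.append("Document customer consent for fine-tuning")
    return items


print(follow_ups("openai", "yes", "planning to"))
print(follow_ups("anthropic", "no", "no"))
```

Treating "not sure" the same as "yes" is a conservative choice: an unknown answer on EU users is itself something to resolve before DD.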
Section 6

Common Questions from Founders and Investors

Frequently asked questions
5 questions: what comes up most in DD and legal reviews
1. Does OpenAI train on my API data by default?
No. For OpenAI’s business offerings (ChatGPT Business, ChatGPT Enterprise, and direct API access), OpenAI does not use your data to train models by default. This is explicitly stated in the API terms. It is materially different from the free-tier ChatGPT experience, where the user must manually opt out of training. If you are building on the API, your inference data is not feeding back into model improvements.
2. Can I fine-tune a model on my customers’ conversation data?
Technically yes with most providers, but legally it depends on whether your customers consented to this use when their data was collected. Under GDPR’s purpose limitation principle (Article 5(1)(b)), data collected for one purpose, such as customer support, cannot be repurposed for an incompatible one such as AI training without a fresh legal basis, which in practice usually means separate, explicit consent. A retrospective privacy policy update does not satisfy this requirement.
3. What happens to my fine-tuned model if I stop using OpenAI?
OpenAI assigns you the outputs of your fine-tuned model, but the model itself runs exclusively on OpenAI infrastructure. If you terminate your account, you lose access to the fine-tuned model; what you can take with you is limited to whatever the agreement’s termination and export provisions allow. Review the termination and data deletion clauses in your specific agreement before investing significantly in fine-tuning.
4. Do I need a DPA with OpenAI or Anthropic if I have EU users?
Yes. If you process personal data of EU residents through these APIs — which you almost certainly do if your product serves EU customers — GDPR Article 28 requires a Data Processing Agreement with each processor. Both OpenAI and Anthropic offer DPAs for business customers. The DPA must be actively executed by both parties — it does not apply automatically when you sign up. This is one of the first documents a GDPR-conscious investor will request.
5. What will investors ask about AI training data on due diligence?
Expect questions in four areas: (1) which models you use and the specific terms governing that usage; (2) whether customer data is used for training or fine-tuning and what consent documentation exists; (3) what happens to your product if a provider changes their terms or you need to switch; and (4) whether proper cross-border transfer mechanisms are in place for EU data. See our AI due diligence guide for a complete investor checklist.

Oleg Prosin is the Managing Partner at WCR Legal, focusing on international business structuring, regulatory frameworks for FinTech companies, digital assets, and licensing regimes across various jurisdictions. He works with founders and investment firms on compliance, operating models, and cross-border expansion strategies.
