Is Your Law Firm's Client Data Safe With AI? A Plain-English Guide for Managing Partners
Quick Take / Direct Answer
When AI is deployed on private cloud infrastructure — Azure Private Endpoint or AWS VPC, within your firm's own cloud environment — client documents are never transmitted to any external AI provider during operation. OpenAI does not train on API data when a DPA is signed. Microsoft Copilot processes data on Microsoft's cloud, not the firm's own servers. For firms with strict confidentiality obligations, private deployment is the only defensible architecture.
The Three Data Models — And What Each Means for Your Firm
Model 1: Consumer AI tools (ChatGPT free, Bing Chat)
Data is processed by the vendor and may be used for training. Do not use for client work. Full stop.
Model 2: Enterprise AI tools (Microsoft Copilot for M365, ChatGPT Enterprise, Harvey)
Data is processed on the vendor's cloud infrastructure under a DPA. The vendor commits not to use data for model training. Data does leave your firm's direct control and is processed on third-party infrastructure. Appropriate for general productivity; concerning for highly sensitive client documents where confidentiality obligations are absolute.
Model 3: Custom AI on private deployment (Govistudio model)
The entire AI system (vector database, embedding model, language model inference, query processing) runs inside your firm's own cloud environment. No data is transmitted to any external AI provider during operation. The firm retains full control of all data at all times. This is the appropriate architecture for client documents subject to attorney-client privilege.
Does OpenAI Train on Your Law Firm's Data?
This is the question every managing partner asks. The correct answer requires a distinction:
Consumer ChatGPT (chat.openai.com): OpenAI's privacy policy states that conversations may be used to improve and train models. Do not use this with client data.
OpenAI API (used by developers to build systems): When a Data Processing Agreement (DPA) is signed, OpenAI commits that API inputs and outputs are not used for training. However, data is still processed on OpenAI's shared infrastructure.
Azure OpenAI Service (Microsoft): Runs on Microsoft's cloud infrastructure, not OpenAI's, so data stays within Microsoft's environment. Microsoft's enterprise DPA similarly commits to no training use.
Govistudio's custom AI (private deployment): Govistudio builds your system to run on your firm's own Azure tenant or AWS VPC. When your attorney makes a query, it is processed by infrastructure your firm controls. The query, the documents, and the answer are not sent to OpenAI or to any shared, vendor-operated AI service.
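For firms whose IT team wants a quick sanity check on a private deployment, one simple signal is whether the system's service hostnames resolve to private network addresses from inside the firm's environment. The sketch below is illustrative only: the hostnames are hypothetical, the helper names are our own, and a private address is a necessary indication rather than proof that no data leaves the environment.

```python
import ipaddress
import socket


def resolve(hostname: str, port: int = 443) -> list[str]:
    """Look up the IP addresses a hostname resolves to from this machine."""
    infos = socket.getaddrinfo(hostname, port)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def is_private_address(address: str) -> bool:
    """Return True when the address falls in a private (non-public) range."""
    return ipaddress.ip_address(address).is_private


def all_endpoints_private(resolved: dict[str, list[str]]) -> bool:
    """True only if every hostname resolved, and only to private addresses."""
    return all(
        bool(addresses) and all(is_private_address(a) for a in addresses)
        for addresses in resolved.values()
    )


# Hypothetical hostnames and addresses, standing in for the output of resolve()
# when run from a machine inside the firm's own network.
example = {
    "vector-db.firm.internal": ["10.0.1.4"],
    "inference.firm.internal": ["10.0.2.7"],
}
print(all_endpoints_private(example))
```

In practice, your IT team would build the dictionary by calling `resolve()` for each service the system uses, and treat any public address as a prompt for further questions to the vendor.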
UK GDPR and Attorney-Client Privilege Requirements
UK GDPR (for UK firms and firms handling UK client data):
- AI vendor must sign a Data Processing Agreement (DPA) as a data processor under Article 28
- Processing of personal data must have a lawful basis documented in your Record of Processing Activities (ROPA)
- High-risk processing requires a Data Protection Impact Assessment (DPIA)
- Data residency: if data must stay in the UK, deploy in a UK region such as Azure UK South or AWS eu-west-2 (London)
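The data residency requirement can be checked mechanically against an inventory of cloud resources. The following is a minimal sketch, assuming an inventory your IT team exports by hand or by script; the resource names and the inventory format are hypothetical, and the allow-list simply mirrors the two regions named above (`uksouth` is Azure's identifier for UK South).

```python
# Regions treated as "in the UK" for this check; extend only after legal review.
ALLOWED_REGIONS = {
    "azure": {"uksouth"},
    "aws": {"eu-west-2"},
}


def residency_violations(resources: list[dict[str, str]]) -> list[str]:
    """Return the names of resources deployed outside the allowed regions."""
    violations = []
    for resource in resources:
        allowed = ALLOWED_REGIONS.get(resource["provider"], set())
        if resource["region"] not in allowed:
            violations.append(resource["name"])
    return violations


# Hypothetical inventory for illustration.
inventory = [
    {"name": "document-store", "provider": "azure", "region": "uksouth"},
    {"name": "embeddings-index", "provider": "azure", "region": "westeurope"},
    {"name": "inference-api", "provider": "aws", "region": "eu-west-2"},
]
print(residency_violations(inventory))
```

Any resource the function reports is a candidate for relocation or for documented transfer safeguards.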
Attorney-client privilege (UK: Legal Professional Privilege):
- Privilege protects communications between solicitor and client
- Communications disclosed to third parties without appropriate safeguards risk losing privilege
- A properly constructed private AI deployment — where the firm's own cloud environment processes all data — maintains privilege by keeping data within the firm's control
- Shared-cloud tools where a third-party vendor processes data introduce privilege risk that should be assessed by your data protection and professional responsibility counsel
Data Handling Comparison
| | Consumer ChatGPT | Microsoft Copilot (Enterprise) | Custom AI (Private Deployment) |
|---|---|---|---|
| Data used for model training | Potentially | No (DPA) | No |
| Data leaves firm's environment | ✓ | ✓ | ✗ |
| Processed on third-party infrastructure | ✓ | ✓ | ✗ |
| Data residency control | ✗ | Partial | ✓ Full |
| Privilege risk | High | Low–Medium | Minimal |
| Appropriate for client documents | ✗ | Consult counsel | ✓ |
What to Require From Any AI Vendor Before Proceeding
Before signing any AI vendor agreement, obtain and review:
- A signed Data Processing Agreement (DPA) designating the vendor as a data processor under applicable data protection law
- Written confirmation of data residency — specifically which data centres process your data and in which jurisdiction
- Written confirmation that your data is not used to train or improve the vendor's AI models
- A description of the security architecture — specifically whether your data is processed on shared or dedicated infrastructure
- Audit rights or a current SOC 2 Type II report covering the vendor's security controls
- A data retention and deletion policy — when you terminate, what happens to your data
Govistudio provides a standard DPA, security architecture documentation, and signed data handling commitments before any engagement begins.
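The checklist above can also be tracked as a simple record per vendor, so that nothing is signed while an item is outstanding. This is a sketch under our own naming: the field names are hypothetical labels for the six items listed, not terms from any standard.

```python
from dataclasses import dataclass, fields


@dataclass
class VendorEvidence:
    """One flag per checklist item; True once the document is received and reviewed."""
    signed_dpa: bool = False
    residency_confirmed: bool = False
    no_training_confirmed: bool = False
    architecture_described: bool = False
    audit_rights_or_soc2: bool = False
    retention_policy: bool = False


def missing_items(evidence: VendorEvidence) -> list[str]:
    """List the checklist items still outstanding, in checklist order."""
    return [f.name for f in fields(evidence) if not getattr(evidence, f.name)]


# Hypothetical vendor that has supplied only two of the six items so far.
vendor = VendorEvidence(signed_dpa=True, no_training_confirmed=True)
print(missing_items(vendor))
```

An empty list is the signal that the paperwork side of due diligence is complete; it does not replace review of the documents themselves.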
FAQs
Q: Does OpenAI train on data submitted through the API? A: No — when a Data Processing Agreement is signed with OpenAI or Microsoft (for Azure OpenAI), your data is not used for model training. This is contractually committed. The key distinction is API access (enterprise) versus consumer ChatGPT (which may use data for training under its default privacy policy).
Q: What is a private AI deployment and why do law firms need it? A: A private deployment means all AI processing occurs within your firm's own cloud environment — your Azure subscription or AWS account. No queries, document contents, or answers are transmitted to any external AI provider. This is the architecture required when client confidentiality and privilege obligations are absolute.
Q: Do we need a DPIA for our AI system? A: Under UK GDPR, a DPIA is required when processing is "likely to result in a high risk" to individuals. AI systems processing client personal data (for example, medical records or employment dispute files held for legal matters) typically meet this threshold. Govistudio provides a DPIA template as part of the implementation process.
Q: What does data residency mean and why does it matter? A: Data residency refers to the physical location of the servers that process and store your data. For UK firms under UK GDPR, data transferred outside the UK or EEA may require additional safeguards (for example, standard contractual clauses with the UK Addendum, or the UK International Data Transfer Agreement). Private deployment on Azure UK South or AWS eu-west-2 (London) keeps data in the UK, so no transfer safeguards are needed.