OpenAI HIPAA: Can You Use ChatGPT With PHI? A Developer's Guide
It's the most-asked question we get from healthtech founders: "can we use OpenAI in our healthcare product?"
The short answer is yes — but not the way most people first try it. OpenAI's consumer API at api.openai.com is not HIPAA-eligible. Microsoft's Azure OpenAI Service is. That single distinction determines whether your architecture is compliant or whether you have a finding waiting to surface in your next audit.
This is the developer-grade detail behind that distinction: what Azure OpenAI's BAA actually covers, the logging gotcha that catches most teams, the architectural patterns we use in production healthcare builds, and the specific lines of code that change between the consumer SDK and the Azure SDK.
Short answer for those in a hurry
- ❌ OpenAI consumer API (api.openai.com): not HIPAA-eligible. Don't send PHI here. ToS explicitly excludes BAA coverage; prompts and responses can be logged for safety review.
- ✅ Azure OpenAI Service: HIPAA-eligible under Microsoft's standard BAA for Azure customers. Same GPT-4-class models. Different deployment, different SDK endpoint, different logging behavior.
- ⚠️ OpenAI Enterprise: offered to enterprise customers with a separately negotiated BAA. Available but requires direct sales engagement and is generally less common in our build engagements than Azure.
If you're building a healthcare product that needs GPT-4 today and you want a defensible path: deploy on Azure OpenAI. We'll cover the why and the how below.
Why the consumer API can't carry PHI
OpenAI's standard Terms of Use and Business Terms (the contracts behind api.openai.com) explicitly exclude HIPAA from the scope of permitted use cases. OpenAI doesn't sign Business Associate Agreements at this tier. Beyond the legal absence:
- Prompts and outputs may be logged for safety review and abuse detection, with retention controlled by OpenAI's data policies — not yours.
- Whether API inputs are used to improve models depends on your organization's data sharing settings, and the default differs by tier, so it has to be verified rather than assumed.
- Geographic data residency isn't contractually guaranteed at the level US healthcare clients typically need.
Sending PHI to the consumer API is the equivalent of mailing patient records to a vendor with no BAA. The legal exposure is real even if the technical performance is identical.
We've inherited multiple healthcare codebases where the founding team prototyped on the consumer API "to validate the idea," then shipped to production without ever migrating the endpoint. The first audit found it. The remediation cost more than rebuilding the AI features against Azure from scratch would have.
What Azure OpenAI's BAA actually covers
Microsoft signs a standard BAA with Azure customers as part of the Microsoft Online Services Agreement. The BAA covers a defined inventory of Azure services, and Azure OpenAI Service is on that inventory.
Specifically:
- The Azure OpenAI inference endpoints are in BAA scope when accessed through your Azure subscription with the BAA executed.
- Azure storage, networking, key vault, and Azure AD identity services used in conjunction with Azure OpenAI are also in scope under the same BAA, provided you're using the eligible service tiers.
- The underlying GPT-4 family models (and the o-series, the GPT-4-turbo variants, the embedding models, the moderation models, the realtime/voice models — at the time of writing) are accessible through this BAA-covered endpoint.
What's NOT in scope:
- api.openai.com direct calls, even if you're billed through Azure for other things.
- Azure services outside the BAA-eligible list. Microsoft maintains a published list of covered services; new services often need explicit inclusion before they're BAA-eligible.
- Third-party plugins or external tools invoked from your Azure OpenAI deployment that touch PHI without their own BAA chain.
The published list of Azure services covered under the BAA is updated periodically. Before treating a specific Azure feature as in-scope, verify it against the current Microsoft-published list. We maintain an internal copy for client engagements and re-check at each major Azure announcement.
The logging gotcha
Azure OpenAI Service has a feature called content filtering and abuse monitoring that, by default, logs prompts and completions for up to 30 days for human review of policy violations. Microsoft documents this clearly — and many engineers building healthcare features don't notice the implication until their security team raises it.
The implication: by default, your PHI prompts may pass through a Microsoft-side abuse review pipeline. Microsoft considers this covered under the BAA (Microsoft personnel reviewing covered data are bound by the same BAA constraints), but several healthcare clients we work with want a stricter posture.
The fix: submit Microsoft's Limited Access request for "Modified Content Filtering" or "Modified Abuse Monitoring" through your Azure account team. Once approved, this disables the abuse-logging pipeline for your specific deployment. Microsoft reviews and approves these requests for legitimate compliance scenarios, and healthcare is the most common reason granted.
Document the approval in your control mapping. Auditors who know to ask about this will ask. Having the approval letter ready short-circuits a 30-minute conversation.
The architectural pattern we use
Across the healthcare AI engagements we've shipped, the same architectural pattern keeps showing up:
Application service (handles PHI)
↓
AI Inference Service (Softedge-owned, in BAA scope)
↓ (Azure SDK call)
Azure OpenAI deployment (in your Azure subscription)
↓
GPT-4 / o1 / embeddings inference
Three properties of this pattern that matter:
- The inference service is a separable component, not direct calls scattered through application code. This means: swapping providers later (to Anthropic via Bedrock, or to a self-hosted Llama for cost reasons) doesn't require touching product logic.
- The Azure deployment lives in your subscription, not ours. Customer code, customer keys, customer audit trail — we're builders, not data hosts.
- Audit logging happens at the inference service layer, not at the Azure layer. We log who, what, which model version, what came back, and what the downstream effect was — feeding both your HIPAA audit trail and your model-quality monitoring with the same data.
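The separable inference layer described above can be sketched roughly as follows. This is a minimal illustration, not our production code: InferenceProvider, InferenceService, AuditRecord, and AuditSink are hypothetical names, and the audit sink is assumed to write to a store that sits inside your BAA scope. The downstream effect of a response is not captured here; the calling application would have to record that.

```typescript
// Illustrative sketch: every type and class name here is hypothetical,
// not part of the OpenAI or Azure SDKs.

interface InferenceRequest {
  userId: string;   // who asked
  purpose: string;  // what the call was for
  prompt: string;
}

interface InferenceResult {
  text: string;
  modelVersion: string; // which model version answered
}

// Each backend (Azure OpenAI, Bedrock, self-hosted) implements this one method.
interface InferenceProvider {
  complete(prompt: string): Promise<InferenceResult>;
}

interface AuditRecord {
  timestamp: string;
  userId: string;
  purpose: string;
  modelVersion: string;
  prompt: string;
  response: string;
}

// Destination for audit records; assumed to be a store inside BAA scope.
type AuditSink = (record: AuditRecord) => Promise<void>;

class InferenceService {
  private readonly provider: InferenceProvider;
  private readonly audit: AuditSink;

  constructor(provider: InferenceProvider, audit: AuditSink) {
    this.provider = provider;
    this.audit = audit;
  }

  // Calls the provider, then records who, what, which model, and what came back.
  async run(request: InferenceRequest): Promise<string> {
    const result = await this.provider.complete(request.prompt);
    await this.audit({
      timestamp: new Date().toISOString(),
      userId: request.userId,
      purpose: request.purpose,
      modelVersion: result.modelVersion,
      prompt: request.prompt,
      response: result.text,
    });
    return result.text;
  }
}
```

Product code only ever calls run, so moving to a different backend means supplying a different InferenceProvider implementation while the audit record shape stays the same.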
Code differences — consumer vs Azure
If you've prototyped on the consumer API and need to migrate, the SDK call sites look superficially similar but differ in four important ways:
Consumer (NOT HIPAA-eligible):
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const completion = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: prompt }],
});
Azure (HIPAA-eligible under BAA):
import { AzureOpenAI } from 'openai';

const openai = new AzureOpenAI({
  endpoint: process.env.AZURE_OPENAI_ENDPOINT, // your deployment URL
  apiKey: process.env.AZURE_OPENAI_KEY,
  apiVersion: '2024-08-01-preview',
  deployment: 'gpt-4o-prod', // your deployment name in Azure
});

const completion = await openai.chat.completions.create({
  model: 'gpt-4o-prod', // deployment name, not OpenAI model name
  messages: [{ role: 'user', content: prompt }],
});
The differences:
- Endpoint points to your Azure deployment, not api.openai.com.
- Authentication uses your Azure deployment key (or Entra ID managed identity for stricter setups), not an OpenAI API key.
- model refers to your deployment name in Azure, not OpenAI's model name; you map them in the Azure portal.
- API version is explicit; Azure pins to specific API versions while consumer OpenAI evolves silently.
For stricter setups, replace the API key with Azure Entra ID authentication using DefaultAzureCredential. This eliminates a long-lived secret and lets you scope access through Azure RBAC.
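One way to keep those four values out of application code is to build the client options from configuration and fail fast if the endpoint still points at the consumer API. The sketch below is an assumption-laden illustration: buildAzureClientOptions is a hypothetical helper, and AZURE_OPENAI_API_VERSION and AZURE_OPENAI_DEPLOYMENT are variable names invented here (only AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_KEY appear in the example above). It covers the API-key path only, not Entra ID.

```typescript
// Field names mirror the constructor options used in the Azure example above.
interface AzureClientOptions {
  endpoint: string;
  apiKey: string;
  apiVersion: string;
  deployment: string;
}

type Env = Record<string, string | undefined>;

// Returns the variable's value or throws, so misconfiguration fails at startup.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (!value) {
    throw new Error(`Missing required environment variable ${name}`);
  }
  return value;
}

// Hypothetical helper: builds client options and rejects the consumer endpoint.
function buildAzureClientOptions(env: Env): AzureClientOptions {
  const endpoint = requireVar(env, 'AZURE_OPENAI_ENDPOINT');
  if (new URL(endpoint).hostname === 'api.openai.com') {
    throw new Error('Endpoint points at the consumer API; PHI must not be sent there');
  }
  return {
    endpoint,
    apiKey: requireVar(env, 'AZURE_OPENAI_KEY'),
    apiVersion: requireVar(env, 'AZURE_OPENAI_API_VERSION'), // assumed variable name
    deployment: requireVar(env, 'AZURE_OPENAI_DEPLOYMENT'),  // assumed variable name
  };
}
```

In a Node service the result would be passed to the AzureOpenAI constructor, for example new AzureOpenAI(buildAzureClientOptions(process.env)), with the same deployment value reused as the model argument on each call.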
When Claude or open-source models fit better
Azure OpenAI is the default we recommend, but it's not the only answer.
Anthropic Claude via AWS Bedrock — fits well if your stack is already AWS-heavy. AWS signs BAAs covering Bedrock-hosted Claude. Comparable model quality, similar architectural pattern (Bedrock deployment in your AWS account, BAA covers the inference path).
Self-hosted open-source models (Llama 3, Mistral) on dedicated infrastructure — best for the highest-sensitivity workloads where you want zero third-party model exposure. Higher operational overhead, lower per-inference cost at scale, full control over data flow. Common for clients building products that handle psychiatric records, addiction medicine, or behavioral health data — categories where even BAA-covered vendor exposure feels too broad.
We've shipped against all three. The selection criterion is usually: which cloud is your stack already on, and is there a specific data-sensitivity concern that pushes you off vendor-hosted entirely.
Cost notes for healthcare builds
Azure OpenAI pricing is comparable to OpenAI's consumer pricing on a per-token basis: the same GPT-4o model costs roughly the same however you access it. The overhead of being HIPAA-eligible isn't a higher per-call rate; it's the operational work of running it inside Azure with proper logging, monitoring, and BAA documentation.
Where costs do diverge:
- Provisioned throughput units in Azure OpenAI give you predictable capacity at a flat monthly cost. Useful for healthcare products that want guaranteed availability and whose traffic is steady enough that the reserved capacity doesn't sit idle.
- Reserved capacity for embedding workloads (think: indexing patient records into a vector database) can substantially reduce per-token cost compared to on-demand.
- Audit log storage is a real line item once you're at meaningful scale. Plan for it in your cost model.
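For the audit log line item, a back-of-envelope estimate is usually enough to start. The sketch below assumes a steady request rate and a fixed retention window, so the stored volume settles at requests per day times record size times retention days. The function name and every input are hypothetical; plug in your own traffic, record size, retention policy, and your storage tier's actual price.

```typescript
// All inputs are placeholders to be replaced with your own measurements.
interface AuditLogAssumptions {
  requestsPerDay: number;   // inference calls that produce one audit record each
  avgRecordBytes: number;   // prompt + response + metadata, after any compression
  retentionDays: number;    // how long records are kept before deletion
  pricePerGbMonth: number;  // your storage tier's price, in your currency
}

interface AuditLogEstimate {
  storedGb: number;
  monthlyCost: number;
}

// Steady-state estimate: once the retention window is full, the stored
// volume is the daily volume multiplied by the number of days retained.
function estimateAuditLogStorage(a: AuditLogAssumptions): AuditLogEstimate {
  const totalBytes = a.requestsPerDay * a.avgRecordBytes * a.retentionDays;
  const storedGb = totalBytes / 1e9;
  return { storedGb, monthlyCost: storedGb * a.pricePerGbMonth };
}
```

The estimate ignores indexing overhead, replication, and query costs, which can matter more than raw storage once the log feeds model-quality monitoring as well.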
Common mistakes we see
After enough healthcare AI engagements, the failure modes are predictable:
- Using consumer API in dev, Azure in prod. Don't. Dev environments handle real data too — synthetic data is often not as synthetic as engineers think. Use Azure throughout.
- Logging prompts to your APM (DataDog, Sentry, etc.) for debugging. Putting PHI prompts in APM tools sends PHI outside the BAA envelope. Log hashes for diagnostics, and keep full prompts only in your in-BAA audit log.
- Treating embeddings as not-PHI. Embedding inversion is a real attack class. Treat embeddings of PHI as PHI.
- Skipping the abuse monitoring opt-out. Default behavior may be acceptable for your org; many healthcare clients need the stricter setting. Decide deliberately.
- Hard-coding model deployment names. Use environment variables. Azure deployments get renamed during operations work, and you'll want to swap deployment names without code changes.
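The APM point in the list above can be sketched as a small helper that emits a fingerprint instead of the prompt. This is an illustration under assumptions, not a vetted control: fingerprintPrompt and toApmEvent are hypothetical names, and the sketch uses a keyed HMAC rather than a bare hash, because short prompts hashed without a key can be recovered by guessing. Whether a keyed fingerprint is acceptable outside the BAA envelope is a decision for your compliance team.

```typescript
import { createHmac } from 'node:crypto';

// What gets sent to the APM tool: no prompt text, only metadata.
interface ApmEvent {
  event: string;
  promptFingerprint: string;
  promptLength: number;
  deployment: string;
  latencyMs: number;
}

// Keyed fingerprint: the same prompt and secret always give the same value,
// so repeated failures can be correlated without storing the prompt itself.
function fingerprintPrompt(prompt: string, secret: string): string {
  return createHmac('sha256', secret).update(prompt, 'utf8').digest('hex').slice(0, 16);
}

// Builds the diagnostic event; the full prompt stays in the in-BAA audit log.
function toApmEvent(
  prompt: string,
  deployment: string,
  latencyMs: number,
  secret: string,
): ApmEvent {
  return {
    event: 'ai.inference',
    promptFingerprint: fingerprintPrompt(prompt, secret),
    promptLength: prompt.length,
    deployment,
    latencyMs,
  };
}
```

The secret is assumed to live in the same secret store as the rest of your keys; storing the same fingerprint alongside the full record in the audit log lets an engineer go from an APM alert to the underlying request without the prompt ever leaving BAA scope.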
Getting started
If you're in the "validate the idea" phase: don't prototype on the consumer API even with fake data. Stand up a small Azure OpenAI deployment from day one. The 30 minutes of Azure setup saves the painful migration later.
If you have a production codebase calling the consumer API with PHI: stop. Migrate to Azure as a priority work item. The architectural lift is small; the legal exposure of leaving it is not.
If you're architecting a new healthcare product and want to do this right from the start: that's the engagement shape we work on most often. Talk to us about your build.
For more on the broader architecture around HIPAA-compliant AI, see our 2026 architect's guide — the long-form take on the three architectural decisions that determine whether your healthcare AI features survive audit. Or browse our healthcare software development services for how we approach regulated software builds end-to-end.