
A Hospital Just Asked How Your Healthtech AI Was Trained: How to Answer the Article 10 Data Governance Section

7 min read

The questionnaire arrived from a 1,400-bed teaching hospital in Belgium. Section 4 had a single, brutal question:

"Describe the training data used for your AI clinical decision support tool. Specify the patient population represented, the geographic source of records, ethnic and demographic distribution, exclusion criteria, and how the dataset was governed under Article 10 of Regulation (EU) 2024/1689."

The CTO of a 25-person healthtech SaaS company had been waiting six weeks for this RFP to land. The annual contract was €240,000. The question on training data sat in row 47 of a 91-row spreadsheet.

There was no good answer ready.

This is the section healthtech founders most often fail. Not because their data is bad — usually it is fine — but because no one has translated what they have into the language Article 10 asks for. Here is exactly what hospital procurement teams are asking when they ask about training data, and how to answer without inventing facts.

Why Article 10 Is the Hardest Section for Healthtech

Article 10 of the EU AI Act sets the requirements for "data and data governance" of high-risk AI systems. For healthtech, the bar sits higher than in any other vertical, because the consequences of a training data failure (misdiagnosis, delayed treatment, mistriage) are immediate and physical.

Article 10(2) requires that training, validation, and testing datasets be subject to "appropriate data governance and management practices." Those practices must cover, among other things:

  • The relevant design choices about the data
  • Data collection processes and the origin of the data
  • Data preparation processing operations such as annotation, labelling, cleaning, enrichment, and aggregation
  • The formulation of relevant assumptions
  • Examination in view of possible biases that could affect health, safety, and fundamental rights
  • Identification of relevant data gaps and how those gaps are addressed

Article 10(3) requires that datasets be "relevant, sufficiently representative, and to the best extent possible, free of errors and complete in view of the intended purpose."

Hospital compliance teams have read Article 10. They are looking for evidence that you have, too.

The Six Things Hospital Procurement Is Actually Asking

When a hospital sends the training data section, they are checking six specific things. Buyers rarely list them this clearly, but every careful procurement team is testing for all six.

1. Patient population. Whose health records did your model learn from? Adult ICU patients? Outpatient adults? Pediatric oncology? "Patients" is not a useful answer.

2. Geographic representativeness. Was the data European, North American, or global? European hospitals will not accept, without explanation, a model trained only on US records for their patient populations. Genetic, dietary, and prescription patterns differ.

3. Demographic distribution. What is the age, sex, and (where lawfully recorded) ethnic distribution of the training cohort? Article 10(2)(f) explicitly requires examination of biases that affect fundamental rights. Hospitals expect a numeric answer, not "diverse."

4. Exclusion criteria. Whom did you exclude from training, and why? Pregnant patients? Pediatric patients? Patients with comorbidities? Hospitals deploy your AI on populations you may have excluded; they need to know which.

5. Data preparation. How was the data labelled? By whom? What was the inter-rater agreement on the labels? "Labelled by clinicians" is not enough. "Labelled by 3 board-certified radiologists with κ = 0.78" is. (A sketch of how to compute that figure follows this list.)

6. Known biases and gaps. Article 10 explicitly requires you to identify biases and gaps. A "no known biases" answer fails. Every dataset has known limitations; saying you found none is a credibility failure with hospital compliance teams.
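
That κ figure is cheap to produce once the annotation export exists. A minimal sketch in Python, assuming three annotators labelled the same cases; the rater names and labels are invented for illustration, and since Cohen's κ is defined for pairs of raters, the mean pairwise value is reported (Fleiss' κ is the usual alternative for three or more raters):

    # A minimal sketch, assuming three annotators labelled the same cases.
    # Rater names and labels below are invented for illustration.
    from itertools import combinations
    from sklearn.metrics import cohen_kappa_score

    ratings = {
        "radiologist_a": [1, 0, 1, 1, 0, 1, 0, 1],
        "radiologist_b": [1, 0, 1, 0, 0, 1, 0, 1],
        "radiologist_c": [1, 1, 1, 1, 0, 1, 0, 0],
    }

    # Cohen's kappa is pairwise; with more than two raters, report the
    # mean pairwise value (or use Fleiss' kappa instead).
    pairwise = [
        cohen_kappa_score(ratings[a], ratings[b])
        for a, b in combinations(ratings, 2)
    ]
    print(f"mean pairwise Cohen's kappa = {sum(pairwise) / len(pairwise):.2f}")

Run against the real annotation export, the printed value is the number that goes in the questionnaire.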

A Template for the Training Data Section

Here is a structure that works for most healthtech SaaS companies with a clinical AI feature. Adapt the specifics; do not copy the numbers.


"The AI clinical decision support module was trained on a dataset of [N] de-identified electronic health records sourced from [list partner institutions / public datasets]. The dataset spans [date range] and includes patients from [list of countries or regions].

Demographic distribution: [age breakdown — e.g., 18–40: X%, 41–65: Y%, 66+: Z%], [sex breakdown — e.g., 52% female, 48% male], [where lawfully recorded, ethnic distribution] (Article 10(2)(f), bias examination).

Exclusion criteria: patients under 18, patients with [condition X], patients without [data element Y]. The model is not validated for excluded populations; the product UI displays a warning when an excluded patient profile is detected.

Data preparation: labels were generated by [N board-certified specialists] using a written annotation protocol (available on request). Inter-rater agreement: Cohen's κ = [value] across [N] adjudicated cases. Data cleaning included [missing-value strategy], [outlier handling], [de-identification method, e.g., HIPAA Safe Harbor / GDPR pseudonymisation under Article 4(5)].

Known biases and gaps: the dataset under-represents [specific group, e.g., adults over 75]. The product warns clinicians that performance on under-represented groups has not been validated. Re-training with a broader dataset is scheduled for [date].

Data governance: training datasets are stored in [region], access is logged, and changes to the dataset are tracked in a versioned dataset registry. Data subject requests are handled per the controller agreement signed with each partner institution. The dataset has been examined for biases that could affect health, safety, and fundamental rights, in line with Article 10(2)(f), and the bias assessment is available on request."
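
The "versioned dataset registry" in that last paragraph is less heavyweight than it sounds. A minimal sketch, assuming a single training file and an append-only JSONL log; the file and field names are hypothetical, and a git-tracked manifest or a tool like DVC does the same job at scale:

    # A minimal sketch of a versioned dataset registry entry: a content
    # hash plus metadata appended to an append-only JSONL log. The file
    # and field names here are hypothetical.
    import datetime
    import hashlib
    import json
    import pathlib

    def register_dataset(path: str, registry: str = "dataset_registry.jsonl") -> str:
        digest = hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()
        entry = {
            "dataset": path,
            "sha256": digest,
            "registered_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        }
        with open(registry, "a") as f:
            f.write(json.dumps(entry) + "\n")
        return digest

    # Any later change to the training file produces a new hash, so the
    # registry doubles as the change log Article 10 reviewers ask about.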


This answer satisfies all six implicit questions a hospital is asking, in roughly 250 words.
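
The demographic distribution paragraph is usually the slowest to assemble by hand, and it does not need to be. A minimal sketch, assuming the de-identified training extract loads into a pandas DataFrame; the path and the "age" / "sex" column names are hypothetical stand-ins:

    # A minimal sketch; the file path and the "age" / "sex" column names
    # are hypothetical stand-ins for your de-identified training extract.
    import pandas as pd

    records = pd.read_parquet("training_records_deidentified.parquet")

    # Age bands matching the template's example breakdown
    bands = pd.cut(records["age"], bins=[17, 40, 65, 200],
                   labels=["18-40", "41-65", "66+"])
    age_pct = bands.value_counts(normalize=True).mul(100).round(1)
    sex_pct = records["sex"].value_counts(normalize=True).mul(100).round(1)

    print("Age distribution (%):")
    print(age_pct.to_string())
    print("Sex distribution (%):")
    print(sex_pct.to_string())

The printed percentages drop straight into the template's demographic paragraph.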

The Three Mistakes That Lose Healthtech Deals

Mistake 1: claiming the dataset is "diverse" without numbers. Hospital compliance teams treat "diverse" as a non-answer. They want the breakdown.

Mistake 2: claiming the dataset is "free of bias." Article 10 requires bias examination, not bias absence. A claim of "no bias" signals to a careful reader that you have not actually examined the data.

Mistake 3: confusing inference data with training data. Hospitals are not asking what data your product processes when used; they are asking what data the model learned from. These are different. Many vendors answer the wrong question and get bounced back.

What If Your Training Data Is Mostly American?

This is the single most common situation for US-founded healthtech SaaS companies selling into European hospitals. The honest answer wins more often than the dressed-up one.

A defensible response:

"Approximately 78 percent of training records originated from US health systems, with 22 percent from European partner institutions. We have validated the model on a held-out European cohort of [N] patients and report a [accuracy metric drop / no significant drop]. We disclose the geographic skew in product documentation and recommend that European clinicians treat the model's outputs as decision support, not decision automation, until our European cohort reaches [N] records, scheduled for [date]."

A hospital may still decide the gap is too large for them. But they will not catch you trying to hide it, which is what kills the next five deals after this one.
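
The held-out-cohort claim in that response implies a concrete check: score the model separately per region rather than reporting one blended number. A minimal sketch, assuming a fitted scikit-learn-style binary classifier and a held-out DataFrame with hypothetical "region" and "outcome" columns:

    # A minimal sketch, assuming a fitted scikit-learn-style classifier
    # `model` and a held-out DataFrame with hypothetical columns:
    # feature columns, a binary "outcome" label, and a "region" column.
    from sklearn.metrics import roc_auc_score

    def auc_by_region(model, holdout, feature_cols):
        results = {}
        for region, cohort in holdout.groupby("region"):
            scores = model.predict_proba(cohort[feature_cols])[:, 1]
            results[region] = roc_auc_score(cohort["outcome"], scores)
        return results

    # e.g. auc_by_region(model, holdout, FEATURES)
    # -> {"EU": 0.86, "US": 0.89}: a 0.03 AUROC gap to disclose

Whatever the per-region gap turns out to be, that is the figure to disclose.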

The Article 26 Tail

Once you answer Article 10, the same questionnaire usually has a follow-up under Article 26 (deployer obligations): "How will the deploying clinician verify model output against the patient in front of them?"

This is the human oversight question. It is a separate Article and a separate answer, but it builds on what you said about training data. The connection: a clinician cannot exercise meaningful oversight if they do not know what population the model was trained on.

Buyers who are paying attention will compare your Article 10 answer against your Article 26 answer for consistency. Inconsistency is a deal-breaker.

The Question Behind the Question

When a hospital asks how your AI was trained, they are not pursuing a research interest. They are deciding whether to bring your product into a clinical workflow that will affect real patients tomorrow morning. The depth and specificity of your answer are the signal they use to decide whether to trust you.


The August 2, 2026 deadline for high-risk AI obligations is the date most European hospital procurement teams have flagged in their vendor reviews. Healthtech AI is high-risk under Annex III. If your training data section reads like a pitch deck instead of an Article 10 dataset description, the deal stalls.

Try Complizo free at complizo.com — paste your first hospital questionnaire and get answers traceable back to the AI features and training data you actually ship.
