Skip to main content

Command Palette

Search for a command to run...

A BigLaw Firm Just Asked How Your AI Due Diligence Tool Performs Under Adversarial Conditions: Answering the Article 15 Robustness and Accuracy Questions

Published
4 min read

A BigLaw firm's Chief Technology Officer just sent your company a security questionnaire alongside their standard procurement intake. One section is labeled "EU AI Act — Article 15 Accuracy, Robustness, and Cybersecurity." They want to know how your AI due diligence tool behaves when documents are unusual, incomplete, or deliberately formatted to confuse a model.

This post explains what Article 15 requires for legal AI tools, what "adversarial robustness" means in practice for document analysis systems, and how to write answers that satisfy a law firm's risk committee without overclaiming.

What Article 15 Actually Requires

Article 15 of the EU AI Act requires that high-risk AI systems achieve an appropriate level of accuracy, robustness, and cybersecurity throughout their lifecycle. For a legal AI tool used in due diligence — which falls under Annex III as a system that makes or assists decisions affecting legal analysis — this means three things:

Accuracy: The system must perform at a documented level across the range of use cases it is marketed for. Accuracy claims must be measurable and disclosed.

Robustness: The system must handle errors, faults, and unexpected inputs without producing outputs that a legal professional would rely on without further review. This includes graceful degradation — outputting "I cannot reliably extract this clause" rather than a hallucinated answer.

Cybersecurity: The system must be resistant to adversarial inputs — documents crafted to manipulate the model's outputs. For legal AI, this means adversarial PDFs, deliberately mislabeled documents, or prompt injection via contract text.

Answering the Questionnaire Questions

"What is the documented accuracy of your AI system across the use cases it is marketed for?"

Our system achieves documented precision and recall rates on our standard evaluation benchmark, which covers the document types and clause categories it is marketed for. Accuracy varies by document type and language; our technical disclosure specifies accuracy ranges by category. The full benchmark report is available under NDA.

If you don't have documented accuracy benchmarks, this question reveals the gap. You need numbers before you can answer this question honestly — and law firms will push back if the answer is vague.

"How does the system behave when it encounters documents outside its training distribution?"

The system flags low-confidence extractions rather than silently returning incorrect outputs. When the model's confidence score for a clause falls below a defined threshold, the output is marked for human review rather than presented as reliable. Legal professionals using the system are trained to treat flagged outputs as suggestions requiring verification.

"How does your system resist adversarial inputs — for example, documents crafted to manipulate its outputs?"

Our system is evaluated against adversarial document test cases as part of each model release. These include prompt injection attempts via contract text and deliberately formatted documents designed to confuse extraction. Results are documented in our release notes. We do not disclose the specific adversarial test suite publicly to prevent gaming, but results are available under NDA.

"What cybersecurity measures protect the system and the documents processed through it?"

Documents are processed in isolated compute environments with no persistent storage after the session ends. Data in transit is encrypted via TLS 1.2+. Our security posture is documented in our SOC 2 Type II report, available under NDA.

Why Law Firms Ask This Now

Law firms using AI for due diligence face their own liability exposure if they rely on outputs that turn out to be wrong due to a known system deficiency the vendor never disclosed. Article 15 questions are often drafted by risk committees trying to establish whether the AI vendor has done the work to characterize its own failure modes.

The right answer isn't "our system is accurate." The right answer is: "our system achieves documented accuracy across defined use cases, degrades gracefully at the boundaries of its training distribution, and is evaluated against adversarial inputs before each release."

That answer can be assembled from your existing engineering documentation. The hard part is organizing it into the format the questionnaire expects.

The Specific Procurement Scenario

Your legal team received this from a BigLaw CTO:

"Before we can proceed with procurement, we need to understand how your system performs under edge-case and adversarial conditions per Article 15 of the EU AI Act. Specifically: documented accuracy rates, your graceful degradation protocol, and evidence of adversarial testing."

The response that moves the deal forward is specific and evidence-backed. It references your actual benchmarks, names the failure mode your system is designed to handle, and offers the adversarial testing report under NDA rather than describing adversarial testing in abstract terms.

Complizo generates structured Article 15 answers from your existing technical documentation, so you're not writing from scratch every time a new firm's risk committee asks.

Try Complizo free at complizo.com

More from this blog

Complizo

87 posts