Research · Synthetic Patients

Anna Graniczna is 31, has fear of abandonment, and doesn't exist

How we build synthetic patients for research and testing without risking the privacy of real people.

Date: April 12, 2026
Sessions: 20
Therapy duration: 5 months
Version: 1.0

What we publish and why

Every AI tool in mental health needs test data. Real session recordings are not an option for us — for three reasons. Hence synthetic patients. Anna Graniczna is the first fully published case from our pipeline.

Sessions in transcript: 20
Each session: ~50 min
Sheet rules: 10
License: CC-BY 4.0
💡 What we publish: the full patient sheet for Anna Graniczna (the input layer of our pipeline) + 20 session transcripts in ASR format + an empty sheet template. Everything under the CC-BY 4.0 license.
🔒 What we don't publish: the therapy arc, the session plan, the style guide, the sequential-coherence engine — these are elements locked inside our platform. Need a different patient? We'll generate one for you.
⚠️ What this dataset does NOT do: it is not a diagnostic tool, it does not replace supervision, and it is not used to train decision models. It is an aid for testing platforms, validating clinical-speech understanding models, and training new therapists in controlled conditions.

Three problems with real recordings

The simplest path to a dataset would be anonymized recordings of real therapy sessions. In practice, that path is closed — for three concrete reasons.

01

Privacy is non-transferable

No real patient knowingly consents to the publication of 20 hours of transcripts

Even if a patient did consent, the strength of that consent is questionable: it may be coerced by context (the relationship with the therapist, the pressure of the clinical environment). In practice, even carefully anonymized recordings contain details that re-identify the patient to people in their immediate circle. A specific memory, a specific phrase, a specific biographical event: each of these is a "fingerprint".

⚠️ Consequence: research with real patients requires bioethics committee approval, a long recruitment process, and limited sharing. This cannot be scaled to the pace of AI product development.
02

Open corpora are English-language

The Polish patient barely exists in clinical datasets

Almost all publicly available datasets of therapy transcripts originate in the US and UK. The Polish language of therapy — with its tonal specifics (the formal "Pani" forms, the shift to "ty", directness vs politeness), its cultural specifics (the role of the mother, transgenerational patterns, parental alcoholism as a frequent context), and its institutional specifics (the NFZ public health fund, private practices, modalities available on the market) — requires a separate dataset.

📊 Practical observation: a model trained on transcripts from American sessions does not understand the Polish "klasyk Anna" (classic Anna), the Polish "no nie wiem" (well, I don't know), or the silences characteristic of Polish therapy culture.
03

You can't 'order' real recordings

A specific clinical profile, a specific modality — at product pace

Every week the team needs different cases: testing feature X requires a BPD-spectrum patient, validating model Y requires a man with depression after losing his wife, a conference presentation needs a PTSD case. Real recordings are not available "on demand". Waiting on recruitment for each new profile means months.

🎯 Synthetic patients are "scalable": a good patient sheet + our pipeline produces a full series of sessions in hours, not months.

Open layer and closed layer

After generating a dozen or so synthetic patients we know one thing: the quality of the transcript is the quality of the sheet. Without a good sheet, even the best generator produces flat, textbook sessions. With a good sheet — sessions sound real.

OPEN

Patient sheet — the layer we publish

A 2-4 page document with biography, schemas, modes, and the character's language

The patient sheet is everything the model receives as input. The person's profile — biographical facts, key figures, schemas from YSQ-R3, modes with names, characteristic linguistic turns of phrase, prior treatment, what brings resistance and what brings resource. The more concrete the material (quotations, scenes, dates) — the better the resulting sessions.

We publish this layer in full. Anna Graniczna's sheet (~6 KB markdown) is one of the files in this dataset. The empty template with comments is too. Anyone can try to write their own patient — that is a valuable experience in its own right, regardless of the rest of the pipeline.

CORE

What's not in the dataset

The closed layer — part of the TherapySupport platform

  • Therapy arc — what changes from session 1 → 20
  • Session plan — cliffhangers, regressions, breakthroughs
  • Style guide — Anna's and the therapist's characteristic phrasing
  • Sequential-coherence engine — quotes from S2 returning in S15
  • Generative pipeline
FORM

Output format — ASR transcript

Just utterances, timestamps, no bracketed descriptions

All 20 sessions take the form of an ASR transcript — like a recording from an automatic speech recognition device. There is no [pause, 12 seconds], no [Anna looks out the window], no [brief laugh]. Real transcription devices don't record those things.

Pauses are visible as gaps between timestamps. Backtracks, self-irony, silences, broken-off words, "mhm", "hmm", "yyy" — everything fits within the speech itself. This is a realism requirement: an ASR model used in production must see the same format on which it was validated.
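The "no bracketed descriptions" requirement can be checked mechanically. The sketch below is a minimal lint, assuming every utterance sits on one line in the form `[HH:MM:SS] Speaker: text`, as in the published excerpt. The function name `lint_transcript` and the wording of the messages are hypothetical and not part of the dataset.

```python
import re

# Assumed line shape, taken from the published excerpt: "[HH:MM:SS] Speaker: text".
LINE_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\] ([^:]+): (.+)$")
# Any square-bracketed span inside the utterance itself would be a stage direction.
BRACKET_RE = re.compile(r"\[[^\]]*\]")


def lint_transcript(text: str) -> list[str]:
    """Return human-readable problems found in an ASR-style transcript."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines only separate utterances
        match = LINE_RE.match(line.strip())
        if match is None:
            problems.append(f"line {number}: not in '[HH:MM:SS] Speaker: text' form")
            continue
        if BRACKET_RE.search(match.group(5)):
            problems.append(f"line {number}: bracketed description in utterance")
    return problems


if __name__ == "__main__":
    sample = "[00:01:50] Anna: Hmm. [pause, 12 seconds] No dobra."
    for problem in lint_transcript(sample):
        print(problem)
```

A transcript that passes this lint carries pauses only as gaps between timestamps, which is the property the format is meant to guarantee.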

Excerpt from session 1, first three minutes — fragment kept in Polish to show ASR format.

[00:01:42] Anna: Hmm. No dobra. Trzy tygodnie temu się spierdoliłam. Przepraszam. Przeklęłam.

[00:01:50] Terapeuta: Można.

[00:01:52] Anna: Dobra. Spierdoliłam się. Pokłóciliśmy się z Markiem. Marek to mój facet. Trzy lata. Pokłóciliśmy się i ja...

[00:02:14] Anna: ...nacięłam się w przedramię. Lewo. Tu. Powierzchowne. Nic groźnego. Pierwsze od ośmiu lat.

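The 22 seconds between the timestamps 00:01:52 and 00:02:14 are where Anna trails off and falls silent before finishing her sentence. The transcript marks that pause only through the timestamps.

In English, roughly:

[00:01:42] Anna: Hmm. OK, fine. Three weeks ago I fucked up. Sorry. I swore.

[00:01:50] Therapist: That's allowed.

[00:01:52] Anna: OK. I fucked up. Marek and I had a fight. Marek is my guy. Three years. We had a fight and I...

[00:02:14] Anna: ...cut myself on the forearm. Left. Here. Superficial. Nothing serious. The first in eight years.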
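Reading pauses out of the timestamps takes only a few lines. This is a minimal sketch, assuming the `[HH:MM:SS] Speaker: text` line shape of the excerpt. The names `Utterance`, `parse`, and `gaps` are illustrative and do not come from the dataset's README. Note that a gap is measured from the start of one utterance to the start of the next, so it is an upper bound on the silence, not the silence itself.

```python
import re
from dataclasses import dataclass

LINE_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\] ([^:]+): (.+)$")

EXCERPT = """\
[00:01:42] Anna: Hmm. No dobra. Trzy tygodnie temu się spierdoliłam. Przepraszam. Przeklęłam.

[00:01:50] Terapeuta: Można.

[00:01:52] Anna: Dobra. Spierdoliłam się. Pokłóciliśmy się z Markiem. Marek to mój facet. Trzy lata. Pokłóciliśmy się i ja...

[00:02:14] Anna: ...nacięłam się w przedramię. Lewo. Tu. Powierzchowne. Nic groźnego. Pierwsze od ośmiu lat.
"""


@dataclass
class Utterance:
    start: int    # seconds from the start of the session
    speaker: str
    text: str


def parse(text: str) -> list[Utterance]:
    """Parse transcript lines; lines that do not match the assumed shape are skipped."""
    utterances = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            hours, minutes, seconds = (int(m.group(i)) for i in (1, 2, 3))
            start = hours * 3600 + minutes * 60 + seconds
            utterances.append(Utterance(start, m.group(4), m.group(5)))
    return utterances


def gaps(utterances: list[Utterance]) -> list[tuple[int, Utterance]]:
    """Seconds between consecutive timestamps, paired with the utterance that ends the gap."""
    return [(b.start - a.start, b) for a, b in zip(utterances, utterances[1:])]


if __name__ == "__main__":
    # Print only gaps of 10 seconds or more (the threshold is an arbitrary choice).
    for seconds, utterance in gaps(parse(EXCERPT)):
        if seconds >= 10:
            print(f"{seconds}s before {utterance.speaker}: {utterance.text[:30]}")
```

On the excerpt the three gaps are 8, 2, and 22 seconds, and only the last one crosses the 10-second threshold.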


10 rules that make the difference

All ten rules are applied in Anna's sheet, so we show each one with a concrete example from her document. Treat the sheet as a model to follow, not a rigid template.

01

Concretes instead of abstractions

Names, dates, places, sentences word-for-word

Not "cold mother" — but: Krystyna, 62, Polish-language teacher. To this day works at a high school. Never hugged Anna out of tenderness — only ritually. Closest childhood memory: 9 years old, a clothing store, mom says "Anko, the blue one suits your complexion better". That was the closest exchange with her mother that Anna remembers.

This level of concreteness makes the difference. The reader's brain (and the language model) needs the scene with the blue dress, not a diagnostic category.

02

One memory with a sensory detail

A concrete scene you can return to

Anna carries an image inside her: 7 years old, the kitchen in a block of flats, a green wall, pajamas with rabbits, parents arguing, no one turns around. We return to this memory in session 9 as the object of imagery rescripting, in session 14 as the backdrop for chair work, and in session 15 together with the father.

🎯 Without the pajama scene, the model doesn't know what to "return to". The biographical memory becomes the emotional anchor of the entire therapy.
03

Quotations word-for-word

What exactly the mother, the father, the grandmother said

What did the mother say, exactly? "You should", "that won't be enough", "what will people say", "strong women don't get hysterical". What did the grandmother say? "My little Aneczka." "If it's hard for you, then it's hard, don't pretend it's not."

These phrases later return in the patient's voice and in the voice of her inner Krytyczka (the Critic). Without word-for-word quotations, the inner parental voice sounds generic.

04

At least one warm figure

Even in difficult stories

Anna has grandma Halina, with her sweet rolls and Anne of Green Gables. They read together in the evenings. Grandma used to say "my little Aneczka". Without such a figure, the sessions are flat and the patient looks like a diagnosis, not a human being.

In real psychotherapy — even very difficult patients usually have such a person somewhere in the past, though you have to dig down to find them. The absence of a warm figure in the sheet means the therapy in the generated sessions has no "anchor of hope".

05

Schemas with numbers

Top 5 from YSQ-R3 with concrete percentiles

Not "has a lot of schemas" — but: Abandonment 99th percentile, Defectiveness 95, Emotional Deprivation 91, Unrelenting Standards 82, Insufficient Self-Control 74.

The numbers tell the model what should be more frequent and what should be rarer in the patient's inner dialogue. A 99th-percentile schema appears in almost every session. A 65th-percentile schema — sporadically, in context.
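One way to make "more frequent" and "rarer" concrete is to turn a percentile into a per-session chance of the schema showing up. The linear map below is purely an assumption for illustration, not the pipeline's actual logic. It is chosen only because it reproduces the two anchor points in the text: a 99th-percentile schema lands in almost every session, a 65th-percentile one in roughly a third of them.

```python
# Percentiles quoted from Anna's sheet.
ANNA_SCHEMAS = {
    "Abandonment": 99,
    "Defectiveness": 95,
    "Emotional Deprivation": 91,
    "Unrelenting Standards": 82,
    "Insufficient Self-Control": 74,
}


def appearance_probability(percentile: float) -> float:
    """Hypothetical linear map: 50th percentile -> 0.0, 100th percentile -> 1.0."""
    return min(1.0, max(0.0, (percentile - 50) / 50))


def expected_sessions(schemas: dict[str, float], n_sessions: int = 20) -> dict[str, float]:
    """Expected number of sessions (out of n_sessions) in which each schema appears."""
    return {
        name: round(appearance_probability(percentile) * n_sessions, 1)
        for name, percentile in schemas.items()
    }


if __name__ == "__main__":
    for name, count in expected_sessions(ANNA_SCHEMAS).items():
        print(f"{name}: about {count} of 20 sessions")
```

Under this assumed map, Abandonment would be expected in about 19.6 of 20 sessions and a 65th-percentile schema in about 6. Any monotone map would serve the same purpose, and the real weighting may differ.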

06

Modes with in-house names

The name = the way you talk about the mode in session

Not Detached Protector — but: Pustka (Emptiness). The patient says: "somewhere behind glass, I don't feel anything". Not Punitive Parent — but: Krytyczka (the Critic). Speaks in mom's language. Favorite sentences: "defective", "hopeless", "what will people say".

Names "in Polish" — Krytyczka, Mała Ania w piżamie (Little Ania in pajamas), Wkurzona Ania (Angry Ania), Pustka — become the way of talking about modes within the session itself. By S6 the patient names them herself.

07

Presenting problem as a concrete episode

Date, context, who was there, what exactly happened

Not "crisis" — but: three weeks before the first session, after an argument with Marek (he accused her of "hysteria" when she asked whether he loved her), a 4 cm cut on the left forearm, the first in 8 years. The next day she didn't go to work, lay in bed, didn't pick up the phone.

A concrete episode determines the intensity, the context, and the frame of the first session. "Crisis" opens up a million possibilities; "a 4 cm cut at night after an argument" — exactly one.

08

The patient's language

5-10 characteristic phrases + a description of when she uses them

Anna says "klasyk Anna" (classic Anna) when she's distancing herself from herself. She says "no nie wiem" (well, I don't know) when she wants to think. She says "to jest jakieś dziwne" (this is somehow weird) when she has unexpected sensations. In strong emotion, once a session, "kurwa" (fuck) shows up.

⚠️ Without such a list, all synthetic patients sound identical. This is the most common bug in naively generated transcripts.
09

Previous therapies + why they didn't work

The key to natural skepticism and transference

Anna has a year and a half of psychodynamic therapy behind her (she stopped: "I kept saying the same thing") and half a year of CBT (she stopped: "thought Y was not mine").

This is essential for the patient's natural skepticism in the first session ("another therapist who'll tell me to pull myself together"), for comparisons during the work ("you're the first who asked outright about self-harm"), and for transference work around sessions 12-13, when the abandonment schema gets projected onto the therapist.

10

What brings resistance, what brings resource

3-5 points each, concretely

These two lists work like control variables for generation.

Resistance: lateness, intellectualization when it hurts, "OK, fine, never mind" at moments of emotional closeness, a possible cancelled session around S12 (testing the relationship).

Resource: punctuality, registered for therapy on her own, knows the terminology (which can be an aid and a defense), wants change despite skepticism.

🎯 The model then knows when to bring in resistance and when to bring in resource. And that is why session 12 is a rupture and session 17 — a behavioral breakthrough.
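Several of the ten rules are countable, so a draft sheet can be checked before it goes anywhere near a generator. The sketch below assumes a hypothetical in-memory structure, `PatientSheet`. The published sheet is markdown, and its real layout may differ. The sketch checks only the rules that state a number: at least one warm figure (rule 04), five schemas with percentiles (rule 05), 5-10 characteristic phrases (rule 08), and 3-5 resistance and resource points (rule 10).

```python
from dataclasses import dataclass, field


@dataclass
class PatientSheet:
    """Hypothetical container for the countable parts of a patient sheet."""
    warm_figures: list[str] = field(default_factory=list)
    schemas: dict[str, int] = field(default_factory=dict)  # schema name -> percentile
    phrases: list[str] = field(default_factory=list)
    resistance: list[str] = field(default_factory=list)
    resource: list[str] = field(default_factory=list)


def check_sheet(sheet: PatientSheet) -> list[str]:
    """Return the countable rules the sheet does not yet satisfy."""
    problems = []
    if not sheet.warm_figures:
        problems.append("rule 04: no warm figure")
    if len(sheet.schemas) != 5:
        problems.append(f"rule 05: expected top 5 schemas, found {len(sheet.schemas)}")
    if any(not 0 <= p <= 100 for p in sheet.schemas.values()):
        problems.append("rule 05: percentile outside 0-100")
    if not 5 <= len(sheet.phrases) <= 10:
        problems.append(f"rule 08: expected 5-10 characteristic phrases, found {len(sheet.phrases)}")
    for label, items in (("resistance", sheet.resistance), ("resource", sheet.resource)):
        if not 3 <= len(items) <= 5:
            problems.append(f"rule 10: expected 3-5 {label} points, found {len(items)}")
    return problems


# Only what this article quotes from Anna's sheet; the full sheet contains more.
anna = PatientSheet(
    warm_figures=["grandma Halina"],
    schemas={"Abandonment": 99, "Defectiveness": 95, "Emotional Deprivation": 91,
             "Unrelenting Standards": 82, "Insufficient Self-Control": 74},
    phrases=["klasyk Anna", "no nie wiem", "to jest jakieś dziwne", "kurwa"],
    resistance=["lateness", "intellectualization when it hurts",
                "'OK, fine, never mind' at moments of closeness",
                "possible cancelled session around S12"],
    resource=["punctuality", "registered for therapy on her own",
              "knows the terminology", "wants change despite skepticism"],
)

if __name__ == "__main__":
    print(check_sheet(anna))
```

Run on the material quoted here, the check flags only rule 08, because the article cites four of Anna's phrases and not the full list. The uncountable rules (concreteness, sensory detail, word-for-word quotations) still need a human reader.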

Anna Graniczna · 20 sessions · 5 months

What you can see in the transcripts. Five key moments from the entire therapy — with verbatim quotations from the patient.

Session · Phase · Key moment
S2 · Assessment · The first time Anna cries when speaking about her grandmother (4 seconds). She withdraws: "I'm not going to bawl in front of a strange woman."
S7 · Conceptualization · After reading the case conceptualization: "So I'm not fucked up, I just learned this back when I had no choice."
S9 · Imagery · First imagery rescripting (kitchen, age 7). Anna cries for 4 minutes during the session.
S12 · Rupture · Anna cancels the session, comes back distant: "I'm afraid you're going to leave me. Or that I'll leave you first, so it's on my terms."
S17 · Real-life use · After an argument with Marek she doesn't run out. She sits in the kitchen for 5 minutes and says: "I'll come back to this in an hour, I need to be alone right now."

Therapy arc in three phases

PHASE I

S1-S7 · Assessment and education

Biographical interview, schemas (YSQ), modes (mode mapping), case conceptualization

Anna comes in skeptical. Cautious. The first therapist "who asked outright about self-harm and didn't make a big deal of it". In session 2 she cries for the first time, about grandma Halina. In session 3 she talks about the last phone conversation with her father — coldly, intellectually; the therapist notices: "you withdrew the moment it started to hurt". Silence, 30 seconds. In session 7, the case conceptualization. For the first time Anna cries differently than over her grandmother — without withdrawal, without an embarrassed laugh.

PHASE II

S8-S14 · Working with modes

Imagery rescripting, chair work, rupture and repair

The first imagery in S8 doesn't take — Pustka kicks in, Anna opens her eyes: "sorry, I can't do this, I feel stupid". The second one, in S9 — breakthrough. In S10 a fight with Marek (a thrown mug), work with Wkurzona Ania (Angry Ania). In S11 the first chair work with Krytyczka — clumsy, unfinished. S12 is the rupture — Anna cancels the session, returns distant, reveals that she was afraid the therapist would leave her. The abandonment schema active in the transference. S13 — repair. S14 three-chair work, Wkurzona Ania defending Mała Ania against Krytyczka, the first time someone shouts loudly in the consulting room.

PHASE III

S15-S20 · Behavioral change and autonomy

Closing the grief, experiments with Marek, letter to Mała Ania

S15 — closing the grief over her father (a letter + imagery rescripting in an imagined hospital). S16 — preparing a behavioral experiment with Marek. S17 — Marek reacts badly, Anna stays in the conflict (sits in the kitchen for 5 minutes, feels, returns to the conversation). The first real-life use of the techniques. S18 — regression (alcohol comes back), Pustka as caretaker, self-compassion instead of self-criticism. S19 — letter to Mała Ania w piżamie (Little Ania in pajamas). S20 — closing the stage with an opening to continuation: "I'm no longer the same Ania who came in here in April."

🔑 Closing quote: "I remember reading that conceptualization. I feel different. Not everything is fixed. But I'm no longer the same Ania who came in here in April." — Anna Graniczna, session 20.

Download and what we keep closed

We send the full dataset by email after you provide an address. Not because we want to gate access — we want to have contact with the people working with the dataset, so we can reach back out with further materials.

Element · In ZIP · Size
Anna Graniczna's patient sheet · yes · ~6 KB
20 session transcripts (markdown ASR) · yes · ~340 KB
Empty patient sheet template · yes · ~3 KB
README with reading instructions · yes · ~2 KB
Therapy arc / session plan / style guide · closed
Generative pipeline · closed
📦

Download Anna Graniczna's dataset

Provide an email address to which we'll send a ZIP (148 KB) with the full patient sheet, 20 sessions in transcript form, and an empty template for your own patients.

CC-BY 4.0 · No paywall · No sales follow-up.

Dataset limitations

  • One patient: Anna Graniczna is a single profile (BPD-spectrum, 31-year-old woman). For a fuller picture, more profiles are needed.
  • One modality: schema therapy. CBT, psychodynamic, ISTDP, EMDR patients — generated on request.
  • One language: Polish.
  • Simulated therapist: Dr. Joanna Kowal is also fictional. Her style is that of "a good schema therapy practitioner", but that is still one specific style, not a representation of the entire population of therapists.
  • No "blind" validation: we have not yet tested whether expert therapists could distinguish Anna from an anonymized recording of a real patient. This is a planned validation step.

Need a different patient?

The full generative pipeline — therapy arc, session plan, style guide, sequential-coherence engine — is part of our platform and we do not publish it.

If you'd like your own patient, write to us: kontakt@aitherapy.support
