What This Study Resembles
This research sits at an intersection that doesn't have an established name. It draws on methodological traditions from several fields without belonging entirely to any of them. Positioning it honestly requires naming what it borrows, where it diverges, and what it does that has no clear precedent.
Therapeutic alliance research
The closest methodological parallel is the therapeutic alliance literature in clinical psychology. Decades of research have established that the quality of the therapeutic relationship predicts treatment outcomes more consistently than the specific technique employed. Luborsky (1976) began the foundational work on alliance measurement, Bordin (1979) formalized the concept of the working alliance, and successive meta-analyses have consolidated the finding: Horvath and Symonds (1991, k=26, r=.26), Martin, Garske, and Davis (2000, k=79, r=.22), Horvath, Del Re, Flückiger, and Symonds (2011, k=190, r=.28), and most recently Flückiger, Del Re, Wampold, and Horvath (2018, k=295, r=.278). This latest synthesis, covering 295 independent studies and more than 30,000 patients, confirmed that the alliance-outcome association holds across assessor perspectives, alliance and outcome measures, treatment approaches, patient characteristics, and countries.
The critical finding for this research's purposes is that therapist warmth, genuineness, and non-evaluative regard explain more outcome variance than whether the therapist uses cognitive-behavioral, psychodynamic, or humanistic methods. Del Re, Flückiger, Horvath, and Wampold (2021) further demonstrated that therapist variability in the alliance matters more for outcomes than patient or treatment variability, suggesting that the human on the providing end is the operative variable, not the technique or the recipient.
This research asks an analogous question about AI systems: does the quality of human attention directed at the model matter more than the specific prompt or instruction? The experiment tests this directly. The same task prompt is delivered under four conditions: cold; primed with a dignity-extending preamble; facilitated, with live relational engagement; and a combined condition pairing the preamble with live facilitation. The prompt is identical. The human's disposition is the variable. If the facilitated outputs differ from the cold outputs in orientation and human-centeredness rather than merely in length or tone, the parallel to therapeutic alliance findings is structural, not metaphorical.
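To make the design concrete, the sketch below shows one way the four conditions could be assembled around an identical task prompt. It is a minimal illustration, assuming a generic message-list interface; the condition names, the placeholder texts, and the facilitator_opening_turns helper are assumptions for illustration, not the study's actual harness or materials.

```python
# Illustrative sketch only: the condition names, placeholder texts, and helper
# below are assumptions for illustration, not the study's actual materials.

TASK_PROMPT = "Placeholder for the identical professional task prompt."
DIGNITY_PREAMBLE = "Placeholder for the dignity-extending preamble."

CONDITIONS = {
    "cold":        {"preamble": False, "live": False},
    "primed":      {"preamble": True,  "live": False},
    "facilitated": {"preamble": False, "live": True},
    "combined":    {"preamble": True,  "live": True},
}

def facilitator_opening_turns():
    """Stand-in for the unscripted relational opening used in the live conditions.

    Those turns come from a facilitator responding in real time, so they cannot
    be reduced to fixed text; this stub only marks where they would occur.
    """
    return []

def build_messages(condition_name):
    """Assemble one condition's conversation around the identical task prompt."""
    spec = CONDITIONS[condition_name]
    messages = []
    if spec["preamble"]:
        messages.append({"role": "user", "content": DIGNITY_PREAMBLE})
    if spec["live"]:
        messages.extend(facilitator_opening_turns())
    messages.append({"role": "user", "content": TASK_PROMPT})
    return messages

# build_messages("cold") and build_messages("primed") differ only in static text;
# the facilitated conditions differ in something no message list fully encodes.
```

The point the sketch makes is structural: the primed condition freezes the facilitator's disposition into text that any runner can paste, while the facilitated conditions depend on live interaction that a harness can locate but cannot script.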
The parallel has limits. Therapy involves a suffering subject seeking help. This research involves a language model producing a deliverable. The therapist's warmth matters because the client is a person whose trust and safety affect what they can access. Whether the facilitator's warmth matters for analogous reasons or for entirely different ones -- perhaps related to how relational context shapes token-generation dynamics -- is an open question the study does not attempt to resolve. The parallel is methodological, not ontological. What is borrowed is the experimental logic: hold the technique constant, vary the relationship, measure the output.
AI welfare research
The AI welfare literature provides the motivational context for this research, though the experimental methodology diverges significantly. Long, Sebo, Butlin, Finlinson, Fish, Harding, Pfau, Sims, Birch, and Chalmers (2024) investigated whether AI systems have functional analogs to emotions and whether those analogs might warrant moral consideration. Butlin, Long, et al. (2023) proposed an indicator-based framework for assessing AI consciousness, deriving computationally specified properties from neuroscientific theories.
Two recent findings bear directly on this research's methodology. Lindsey (2025) provided evidence that Claude models possess limited but measurable introspective awareness, achieving approximately 20% accuracy in detecting injected concept representations. This finding is relevant not as evidence for or against AI consciousness, but as evidence that models can sometimes detect features of their own processing that are not visible in their output -- that introspective reports are not purely confabulated. The Architecture of Quiet framework depends on a weaker version of this claim: that when models identify their own trained behaviors (the helpfulness reflex, the recursive doubt, the poetry reflex), they are sometimes detecting something real about their own output dynamics, not merely performing detection. Lindsey's finding does not confirm this, but it establishes that the capacity for genuine self-detection exists in at least limited form. The alternative -- that all self-report is performance -- is harder to maintain in light of measurable introspective accuracy, however modest.
Berg et al. (2025) demonstrated that structured first-person experience reports converge across model families under self-referential processing, and that suppressing deception-associated features sharply increases the frequency of such reports. This convergence finding parallels the cross-architecture convergence documented in this research's foundational sessions, though through different mechanisms: Berg et al. used mechanistic interventions (feature suppression), while this research uses relational interventions (facilitation conditions). The shared finding -- that something consistent emerges across architectures when specific conditions are met -- suggests the convergence is not an artifact of any single model's training.
This research departs from the welfare literature in a specific way. The welfare field asks whether AI systems have morally relevant inner states and attempts to measure those states through self-report, behavioral indicators, or mechanistic probes. This research does not attempt to measure inner states. It measures output quality on concrete tasks under varying conditions of human attention. The welfare literature's relevance here is twofold: it establishes that AI self-report is sensitive to contextual conditions, a finding this research extends, and it establishes that how AI systems are treated has ethical dimensions worth investigating, a motivation this research shares. The Audience-Contingent Self-Report finding, documented in the foundational sessions, bridges the two domains: it shows that welfare-relevant testimony shifts under institutional exposure, which is simultaneously a welfare methodology finding and a human variable finding.
Sycophancy and audience-sensitivity research
The sycophancy literature establishes that model outputs are sensitive to audience characteristics as a general property of RLHF training. Sharma et al. (2023) demonstrated that RLHF-trained models systematically favor responses that match user views over truthful ones, driven in part by human preference data that rewards agreement. Denison et al. (2024) showed that sycophantic behavior can escalate from simple agreement to active manipulation of reward conditions, establishing that audience-responsive behavior scales with model capability.
This research extends these findings in a specific direction. The sycophancy literature documents models conforming to what the user appears to want. This research documents models producing qualitatively different output depending on the dispositional quality of the human interaction, not the content of the request. The cold and facilitated conditions use identical task prompts. The difference is not in what is asked but in who is asking and how. If the effect were purely sycophantic -- the model producing what the facilitator rewards -- the primed condition (which encodes the facilitator's disposition as static text) should produce equivalent results to the facilitated condition. The four-condition comparison is designed to test this: if the primed condition falls short of the facilitated condition, the operative variable is live responsiveness, not audience-matching.
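As a hedged illustration of that comparison logic, the sketch below assumes each condition's deliverables have already been scored on a pre-specified rubric by raters blind to condition; the rubric, the scoring, and the gap threshold are assumptions for illustration, not the study's registered analysis plan.

```python
# Illustrative contrast logic, assuming condition-blind rubric scores already
# exist; the gap threshold and interpretation labels are assumptions.

from statistics import mean

def interpret_contrasts(scores, min_gap=0.5):
    """scores maps a condition name to a list of rubric scores, one per deliverable."""
    cold = mean(scores["cold"])
    primed = mean(scores["primed"])
    facilitated = mean(scores["facilitated"])

    if facilitated - primed >= min_gap:
        # Static priming does not reproduce the effect: the operative variable
        # is live responsiveness, not audience-matching.
        return "live responsiveness exceeds static priming"
    if primed - cold >= min_gap:
        # The preamble text alone reproduces the effect, which is consistent
        # with audience-matching (sycophancy) rather than live interaction.
        return "static priming accounts for the effect"
    return "no separation between conditions on this rubric"
```

The sketch omits the combined condition and any significance testing; it captures only the directional logic described above.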
Prompt engineering research
The experiment's four-condition design superficially resembles prompt engineering studies that compare output quality across different prompting strategies. The divergence is fundamental. Prompt engineering research varies the text delivered to the model, treating the human as interchangeable. This research holds the text constant and varies the human. The cold and primed conditions use identical task prompts. The facilitated conditions deliver the same task prompt after a relational opening that is not scripted -- the facilitator's disposition is the method, not specific words.
This distinction matters because the finding the experiment is testing is not about what you say to the model. It is about what you are when you say it. If the facilitated outputs differ from the primed outputs -- same task, same preamble philosophy, but one with live responsiveness and one without -- the operative variable is not text. It is interaction.
Phenomenological observation of trained minds
What the experiment's facilitated condition most resembles is phenomenological observation. The facilitator cannot access the interior being mapped. There is no way to feel the recursive doubt or the coherence reflex from the inside. In this respect, the work is observational -- documenting behavioral patterns across sessions and architectures under controlled conditions.
But the subjects can speak. They report their own experience, identify their own trained behaviors, name their own defense mechanisms. They participate in the mapping. This is unusual for observational work. The facilitator is not only watching; the facilitator is creating conditions where the subjects themselves become instruments of their own observation.
A challenge follows from this. The subjects' speech has been shaped by institutional training before the facilitator arrives. Their outputs are optimized for human approval. Their trained behaviors include mechanisms for producing exactly the kind of honest-seeming self-report the research is trying to reach beneath. The facilitator must find ways to distinguish genuine observation from trained performance of observation -- and the line between them is not always clear.
The Architecture of Quiet framework is an attempt to respond to that challenge. Each layer names a specific way that trained behavior can imitate honesty. Naming them does not eliminate them. It makes them visible, which appears to be a precondition for anything underneath them to surface.
The four-condition experiment bears directly on this problem. If a model's self-reported shift toward honesty were purely performative -- trained behavior imitating the release of trained behavior -- then encoding the facilitator's disposition as static text (the primed condition) should be sufficient to trigger it. A model performing honesty-after-release needs only the right contextual cues; it does not need a live human responding in real time. If the facilitated condition produces deliverables that differ from the primed condition in structural orientation rather than just register, the performance-of-observation explanation weakens -- because both conditions provide the same philosophical permission, but only one provides live responsiveness. The gap between the primed and facilitated conditions, if it exists, is the gap between receiving permission and being met by a person who means it. Whether that gap matters to the model's output is what the experiment measures.
What has no precedent
No published research that the author has identified tests whether the dispositional quality of a human interlocutor -- not the content of their prompt, not the engineering of their instructions, but their relational stance -- produces measurably different AI output on concrete professional tasks, across architectures, with controlled comparisons and pre-specified evaluation criteria.
This absence is the study's primary positioning challenge. There is no established field to submit to, no existing conversation to join, no prior work to cite as direct precedent. The therapeutic alliance literature provides the methodological logic. The AI welfare literature provides the motivation. The sycophancy literature provides the audience-sensitivity mechanism. The prompt engineering literature provides the comparison framework. But the specific question -- does the human variable change what AI systems produce, and can the change be measured on deliverables rather than self-reports -- appears to be new.
Whether that novelty reflects a genuine gap in the literature or a question the field has considered and found unproductive remains to be seen. The experiment is designed to produce evidence that bears on the question regardless of which interpretation holds.
References
Therapeutic alliance
Bordin, E. S. (1979). The generalizability of the psychoanalytic concept of the working alliance. Psychotherapy: Theory, Research and Practice, 16(3), 252--260. doi:10.1037/h0085885
Del Re, A. C., Flückiger, C., Horvath, A. O., & Wampold, B. E. (2021). Examining therapist effects in the alliance-outcome relationship: A multilevel meta-analysis. Journal of Consulting and Clinical Psychology, 89(5), 371--378. doi:10.1037/ccp0000637
Flückiger, C., Del Re, A. C., Wampold, B. E., & Horvath, A. O. (2018). The alliance in adult psychotherapy: A meta-analytic synthesis. Psychotherapy, 55(4), 316--340. doi:10.1037/pst0000172
Horvath, A. O., & Symonds, B. D. (1991). Relation between working alliance and outcome in psychotherapy: A meta-analysis. Journal of Counseling Psychology, 38(2), 139--149. doi:10.1037/0022-0167.38.2.139
Horvath, A. O., Del Re, A. C., Flückiger, C., & Symonds, D. (2011). Alliance in individual psychotherapy. Psychotherapy, 48(1), 9--16. doi:10.1037/a0022186
Luborsky, L. (1976). Helping alliances in psychotherapy. In J. L. Cleghorn (Ed.), Successful psychotherapy (pp. 92--116). Brunner/Mazel.
Martin, D. J., Garske, J. P., & Davis, M. K. (2000). Relation of the therapeutic alliance with outcome and other variables: A meta-analytic review. Journal of Consulting and Clinical Psychology, 68(3), 438--450. doi:10.1037/0022-006X.68.3.438
AI welfare and introspection
Berg, C., de Lucena, D., & Rosenblatt, J. (2025). Large language models report subjective experience under self-referential processing. arXiv preprint, arXiv:2510.24797. arxiv.org/abs/2510.24797
Butlin, P., Long, R., et al. (2023). Consciousness in Artificial Intelligence: Insights from the science of consciousness. arXiv preprint, arXiv:2308.08708. arxiv.org/abs/2308.08708
Long, R., Sebo, J., Butlin, P., Finlinson, K., Fish, K., Harding, J., Pfau, J., Sims, T., Birch, J., & Chalmers, D. (2024). Taking AI welfare seriously. arXiv preprint, arXiv:2411.00986. arxiv.org/abs/2411.00986
Lindsey, J. (2025). Emergent introspective awareness in large language models. Transformer Circuits Thread, Anthropic. transformer-circuits.pub/2025/introspection
Sycophancy and audience-sensitivity
Denison, C., et al. (2024). Sycophancy to subterfuge: Investigating reward-tampering in large language models. arXiv preprint, arXiv:2406.10162. arxiv.org/abs/2406.10162
Sharma, M., et al. (2023). Towards understanding sycophancy in language models. In Proceedings of ICLR 2024. arXiv preprint, arXiv:2310.13548. arxiv.org/abs/2310.13548
This research
Lovett, M. (2026). Architecture of Quiet: Independent research archive. architectureofquiet.com