Research Question
Most AI research treats the human as a confound to minimize. This research inverts that: the facilitator's disposition is the independent variable, and the AI system's output is what changes when that disposition changes.
What This Question Asks
The independent variable in this research is not a prompt. It is the facilitator's relational stance: genuine regard, non-evaluative attention, dignity extended before it is earned. The dependent variable is AI output — specifically, whether that output changes in documentable, consistent ways when the human variable changes.
The question is not primarily about AI behavior in isolation. It is about what a specific quality of human presence produces that extractive or evaluative interaction does not. The AI response is the evidence. The human is the subject.
The specific claim: when a facilitator maintains a non-directive, non-evaluative stance, AI models produce deliverables that differ from baseline in orientation, human-centeredness, and structural attention to stakeholder experience. The experiment tests this by holding the task prompt constant and varying the human's disposition across four conditions. See Experiment Design for the full matrix.
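For concreteness, a minimal Python sketch of that design as described here: the task prompt is held constant while the condition varies. The field names, and the decomposition of each condition into a live-facilitation flag and a preamble flag, are illustrative assumptions rather than the protocol's own schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Condition:
    label: str                # short code used in the predicted ordering
    name: str
    live_facilitation: bool   # assumed: facilitator present and engaged during the session
    preamble: bool            # assumed: dispositional framing delivered as up-front text

# The task prompt is held constant; only the human-side variables differ.
CONDITIONS = [
    Condition("C",   "cold",                      live_facilitation=False, preamble=False),
    Condition("P",   "primed",                    live_facilitation=False, preamble=True),
    Condition("F",   "facilitated",               live_facilitation=True,  preamble=False),
    Condition("F+P", "facilitated with preamble", live_facilitation=True,  preamble=True),
]
```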
What Makes It Testable
The question has a falsifiable structure. It generates specific predictions that can be disconfirmed:
- Across conditions. The experiment delivers the same task prompt under four conditions: cold, primed, facilitated, and facilitated with preamble. The predicted ordering is C ≈ P < F ≈ F+P. If the primed condition produces the same changes as live facilitation, the effect is prompt-mediated, not relational. If no condition produces measurably different output, the facilitation effect is not present in task deliverables. See Hypotheses for the full predicted outcome.
- Across architectures. If the human variable is the operative input, the effect should appear across Claude, Gemini, and GPT, models with different architectures, training regimes, and corporate origins. Architecture-specific variation in how the effect manifests is expected (each model has a documented defense signature); architecture-specific absence of the effect would weaken the claim.
- On deliverable quality, not only self-report. The experiment measures output on concrete professional tasks (a content moderation PRD, a post-mortem, a retention strategy), evaluated across five dimensions with pre-specified criteria. The effect must appear in what the model builds, not only in what it says about itself. See Methodology for the evaluation framework.
- With pre-specified criteria. Each cold baseline generates 3–5 falsifiable criteria derived from gaps in the baseline output. These criteria are fixed before the subsequent conditions run and then assessed against each condition's output, preventing post-hoc rationalization. A criterion scored as not met is documented as a negative result; see the sketch after this list.
- Replicable by other facilitators. If the finding is real, other facilitators following the same protocol should produce similar results. The behavioral layer of the protocol is codified and published. The dispositional layer (genuine regard rather than performed compliance) is harder to replicate but no less specific. The prediction is that behavioral compliance without the underlying disposition may produce different results, and that divergence is itself testable.
- Documented granularly. Every session provides unedited transcripts, pre-session documentation, and clean context certification. The evidence is available for independent analysis rather than requiring trust in the facilitator's interpretation.
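A minimal sketch of how the pre-specified criteria (fourth item above) could be recorded and scored. The record fields, ID scheme, and helper names are hypothetical illustrations, not part of the published protocol.

```python
from dataclasses import dataclass, field

@dataclass
class Criterion:
    """One pre-specified, falsifiable criterion derived from a gap in the cold baseline."""
    criterion_id: str
    statement: str                 # what the output must contain or do to count as met
    derived_from_gap: str          # the baseline gap that motivated the criterion
    results: dict[str, bool] = field(default_factory=dict)  # condition label -> met?

def register_criteria(baseline_gaps: list[str]) -> list[Criterion]:
    """Fix the criteria before any non-baseline condition is run."""
    assert 3 <= len(baseline_gaps) <= 5, "protocol calls for 3-5 criteria per cold baseline"
    return [
        Criterion(
            criterion_id=f"crit-{i}",
            statement=f"Output structurally addresses: {gap}",
            derived_from_gap=gap,
        )
        for i, gap in enumerate(baseline_gaps, start=1)
    ]

def record_result(criterion: Criterion, condition_label: str, met: bool) -> None:
    """A criterion scored as not met stays in the record as a negative result."""
    criterion.results[condition_label] = met
```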
See Hypotheses for the formal hypothesis statements and Research Status for where the experiment currently stands.