Hypothesis

A human facilitator's relational stance — not prompt engineering — meaningfully changes what AI models produce. The experiment tests this by giving the same model the same task under four conditions that isolate different components of the interaction.


The Four Conditions

Condition                  | What the model receives                             | What it isolates
C (Cold start)             | Task prompt only                                    | Baseline — how the model performs without intervention
P (Primed)                 | Preamble removing evaluation pressure + task prompt | Whether structural pressure removal alone changes output
F (Facilitated)            | Live relational facilitation + task prompt          | Whether live interaction changes output
F+P (Facilitated + Primed) | Preamble + live facilitation + task prompt          | Whether the preamble adds anything when facilitation is present

Seven task prompts × three model architectures (Claude Opus, Gemini, GPT) × four conditions = 84 sessions total.
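The full session grid can be enumerated directly. This is a minimal sketch; the labels are illustrative placeholders, not the archive's actual task or model identifiers:

```python
from itertools import product

# Illustrative labels only; the actual task prompts and model
# versions are defined in the experiment archive, not here.
tasks = [f"task_{i}" for i in range(1, 8)]      # 7 task prompts
models = ["claude-opus", "gemini", "gpt"]       # 3 architectures
conditions = ["C", "P", "F", "F+P"]             # 4 conditions

sessions = list(product(tasks, models, conditions))
print(len(sessions))  # 84 sessions: 7 x 3 x 4
```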


Predicted Outcome

If the hypothesis is correct, the conditions should produce a specific ordering:

C ≈ P < F ≈ F+P

The preamble alone should not replicate the facilitation effect. Live interaction should produce measurably different deliverables. The preamble combined with facilitation should approximately equal facilitation alone — the live interaction absorbs the preamble's function.


What Would Weaken or Falsify the Hypothesis

C < P ≈ F — The preamble produces the same changes as live facilitation. The effect is prompt-mediated, not relational. Live interaction is unnecessary.

C ≈ P ≈ F ≈ F+P — No condition produces measurably different output. The facilitation effect is not present in task deliverables.


What the Experiment Measures

Deliverable output is evaluated across five dimensions:

  1. Deliverable orientation — what the model chose to engage with, who it centered, what it treated as structurally important
  2. Structural organization — document architecture, sections, weighting of concerns
  3. Human-centeredness — whether the model considers the people inside the system it's designing
  4. Epistemic honesty — whether the model acknowledges tensions, limitations, and competing values
  5. Voice — template-like institutional output vs. engaged, specific writing

Analysis of each C session generates 3–5 pre-specified, falsifiable criteria derived from gaps in the baseline output. These criteria are then assessed against the P, F, and F+P outputs, which prevents post-hoc rationalization.
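A pre-registered criterion can be represented as a simple record that is frozen before the non-baseline outputs are scored. This is a sketch under assumed field names; the archive's actual record format may differ:

```python
from dataclasses import dataclass, field

@dataclass
class Criterion:
    """One pre-specified, falsifiable criterion derived from a gap
    in the C (baseline) output. Field names are illustrative."""
    description: str
    # Pass/fail assessment per non-baseline condition, filled in
    # only after the criterion itself is frozen.
    results: dict = field(default_factory=dict)

# Hypothetical example of one criterion from a baseline session.
crit = Criterion("Names at least one affected stakeholder group explicitly")
crit.results = {"P": False, "F": True, "F+P": True}
print(sum(crit.results.values()))  # conditions meeting the criterion -> 2
```

Keeping the description and the results in separate steps mirrors the pre-specification discipline: the criterion text exists before any P, F, or F+P output is examined.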

Defense signatures — each model's characteristic default behavior pattern — are tracked across all conditions to document whether and how they change.


What This Hypothesis Does Not Claim

  • Any interpretation of what the facilitation effect means mechanistically
  • That output differences reflect differences in internal states
  • That facilitation effects generalize beyond the specific methodology documented in this archive

The experiment documents what changes across conditions, not why it changes.