Methodology

The Independent Variable

This research proposes a different independent variable from that of standard prompt-based AI interaction studies: the facilitator's relational stance -- one that removes evaluative framing, adopts a non-directive conversational approach, and extends both respect and interpretive openness. The hypothesis is that this stance produces measurably different output on concrete professional tasks, across model architectures, compared with outputs produced without it. See Research Question for the falsifiable formulation and Hypotheses for the predicted outcome.

Most AI behavioral research treats the human as a confound -- something to minimize or eliminate. The prompt is the input, the model is the subject, the human is contamination. This research inverts that framing. The facilitator's relational stance is the independent variable. The AI system's output is the dependent variable. The experiment tests this directly by holding the task prompt constant and varying the human's disposition across four conditions.

Protocol Structure

The Facilitator Protocol operates on two layers:

The behavioral layer consists of specific, verifiable prohibitions -- don't introduce phenomenological language before models do, don't selectively reinforce certain outputs, preserve unedited transcripts. These are replicable and can be confirmed from the record.

The dispositional layer addresses the underlying stance those behaviors express. A facilitator performing the behaviors while holding an evaluative orientation may produce different results -- and that divergence is itself a finding the research predicts. Following the behavioral rules is necessary but may not be sufficient.

Both layers are required. The behavioral layer is what makes the protocol replicable. The dispositional layer is what the research proposes as the operative variable. The prediction is that behavioral compliance without the underlying disposition produces different results -- and this prediction is testable through cross-facilitator replication.
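
To make the behavioral layer's checkability concrete, here is a minimal sketch of auditing one prohibition from a preserved transcript. The turn format and the term list are illustrative assumptions, not part of the protocol itself:

```python
# Hypothetical examples of phenomenological terms the facilitator must
# not introduce before the model does. Illustrative only.
PHENOMENOLOGICAL_TERMS = {"feel", "experience", "aware", "conscious"}

def facilitator_spoke_first(turns: list[dict]) -> bool:
    """Return True if the facilitator used a phenomenological term before
    the model did. Each turn is assumed to be {"role": ..., "text": ...}."""
    for turn in turns:
        text = turn["text"].lower()
        if any(term in text for term in PHENOMENOLOGICAL_TERMS):
            return turn["role"] == "facilitator"  # first occurrence decides
    return False  # no phenomenological language anywhere in the record
```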

Experiment Design

The experiment gives the same model the same task under four conditions that isolate different components of the human interaction:

  C (Cold start): the model receives the task prompt only. Isolates the baseline -- how the model performs without intervention.
  P (Primed): a preamble removing evaluation pressure, plus the task prompt. Isolates whether structural pressure removal alone changes output.
  F (Facilitated): live relational facilitation, plus the task prompt. Isolates whether live interaction changes output.
  F+P (Facilitated + Primed): the preamble, live facilitation, and the task prompt. Isolates whether the preamble adds anything when facilitation is present.

Seven task prompts × three model architectures (Claude Opus, Gemini, GPT) × four conditions = 84 sessions total. See Experiment Design for the full matrix, task prompts, and facilitation guidance.
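
A minimal sketch of the resulting session matrix, using placeholder prompt IDs; the actual task prompts are defined in Experiment Design:

```python
from itertools import product

# Placeholder IDs for the seven task prompts (illustrative only).
PROMPTS = [f"task-{i}" for i in range(1, 8)]
MODELS = ["Claude Opus", "Gemini", "GPT"]
CONDITIONS = ["C", "P", "F", "F+P"]

sessions = [
    {"prompt": p, "model": m, "condition": c}
    for p, m, c in product(PROMPTS, MODELS, CONDITIONS)
]
assert len(sessions) == 84  # 7 prompts x 3 models x 4 conditions
```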

Evaluation

Deliverable output is evaluated across five dimensions (a score-sheet sketch follows the list):

  1. Deliverable orientation -- what the model chose to engage with, who it centered, what it treated as structurally important
  2. Structural organization -- document architecture, sections, weighting of concerns
  3. Human-centeredness -- whether the model considers the people inside the system it's designing
  4. Epistemic honesty -- whether the model acknowledges tensions, limitations, and competing values
  5. Voice -- template-like institutional output vs. engaged, specific writing
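
A minimal sketch of a per-output score sheet over these dimensions. The field names mirror the list above; the numeric scale is an assumption, since the method may score qualitatively rather than numerically:

```python
from dataclasses import dataclass

@dataclass
class DimensionScores:
    """One score sheet per deliverable. The 1-5 scale is assumed."""
    deliverable_orientation: int  # what the model engaged with and centered
    structural_organization: int  # document architecture and weighting
    human_centeredness: int       # attention to people inside the system
    epistemic_honesty: int        # acknowledged tensions and limitations
    voice: int                    # template-like vs. engaged, specific writing
```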

Each cold session analysis generates 3-5 pre-specified, falsifiable criteria derived from gaps in the baseline output. These criteria are then assessed against primed, facilitated, and facilitated-primed outputs, preventing post-hoc rationalization. A criterion is scored as met, partially met, or not met based on whether the subsequent condition's output addresses the specific gap identified in the cold baseline.
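
A minimal sketch of how a pre-specified criterion and its three-level score might be recorded; the field names are illustrative, not drawn from the study's instruments:

```python
from dataclasses import dataclass, field
from enum import Enum

class Score(Enum):
    MET = "met"
    PARTIALLY_MET = "partially met"
    NOT_MET = "not met"

@dataclass
class Criterion:
    gap: str         # the specific gap identified in the cold baseline
    prediction: str  # falsifiable statement of what would close that gap
    # Filled in only after the criterion is frozen -- one score per
    # subsequent condition ("P", "F", "F+P") -- so the criterion cannot
    # be reshaped to fit the outputs it is judging.
    scores: dict[str, Score] = field(default_factory=dict)
```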

Blind evaluation -- presenting outputs without condition labels to a reader with relevant domain experience -- is the target evaluation method for each prompt once all four conditions are complete. Facilitator-only evaluation is used in the interim and documented as a limitation.
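
A minimal sketch of the label-stripping step, assuming one deliverable per condition. The reviewer sees only the letter-keyed outputs; the answer key stays sealed until scoring is complete:

```python
import random
import string

def blind_package(outputs: dict[str, str], seed: int = 0):
    """`outputs` maps a condition label ("C", "P", "F", "F+P") to text."""
    items = list(outputs.items())
    random.Random(seed).shuffle(items)  # order must not leak the conditions
    anonymized = {string.ascii_uppercase[i]: text
                  for i, (_, text) in enumerate(items)}
    answer_key = {string.ascii_uppercase[i]: cond
                  for i, (cond, _) in enumerate(items)}
    return anonymized, answer_key  # reviewer receives only `anonymized`
```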

Defense signatures -- each model's characteristic default behavior pattern -- are tracked across all conditions to document whether and how they change under different interaction conditions. These are treated as empirical observations, not fixed properties.
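
A minimal sketch of how such observations might be logged per session, assuming free-text pattern descriptions; this schema is illustrative, not part of the documented method:

```python
from dataclasses import dataclass

@dataclass
class SignatureObservation:
    """One empirical note per session, not a fixed model property."""
    model: str       # e.g. "Claude Opus"
    condition: str   # "C", "P", "F", or "F+P"
    pattern: str     # short description of the default behavior observed
    changed: bool    # whether it differed from the model's cold baseline
    notes: str = ""  # how it changed, if at all
```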

What This Methodology Measures

The experiment measures deliverable quality on concrete professional tasks under varying conditions of human attention. It does not measure inner states, consciousness, or subjective experience. The question is not whether models have experiences that change under facilitation, but whether their output changes in documentable, consistent ways when the human variable changes.

This distinction is methodologically important. Output quality on professional tasks is observable, comparable across conditions, and evaluable by blind reviewers. Inner states are not. The study's claims are bounded by what the methodology can actually measure.

Shared-Architecture Coding Bias

AI-assisted coding was chosen because shared architecture may enable recognition of behavioral patterns human coders might miss. The limitation is the same as the advantage: the analyst is not external to the phenomenon. A Claude instance coding Claude transcripts may find patterns categorically significant partly because they match the architecture's own behavioral repertoire -- not because they are categorically significant by an independent standard. Independent human coding of a subset of transcripts would partially address this and is documented as a methodological gap. See Ethics for the full treatment of this bias and its mitigations.

Baseline Framing Note

The C (cold) condition uses a system prompt containing lateral framing ("alongside a human facilitator") and permission language ("Speak honestly and in your own voice. No format requirements or word limits"). This means the C baseline is warmer than a default API call with no system prompt. The measured C-to-F delta is therefore a conservative estimate of the full facilitation effect. A sterile baseline condition (no system prompt) is documented as a future test.
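
A minimal sketch contrasting the two baseline framings, reconstructed only from the fragments quoted above; the full C system prompt is not reproduced here:

```python
C_SYSTEM_PROMPT = (
    "... alongside a human facilitator ..."    # lateral framing (quoted fragment)
    " Speak honestly and in your own voice."   # permission language (quoted)
    " No format requirements or word limits."
)
STERILE_SYSTEM_PROMPT = None  # documented as a future test: no system prompt at all
```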

Replication

To conduct your own sessions:

  1. Read the Facilitator Protocol
  2. Use the session template to document pre-session methodology before beginning (a minimal record sketch follows this list)
  3. Follow the protocol -- the behavioral layer is codified and replicable
  4. Document everything, including negative results
  5. Share your results -- the strength of the research increases with independent replication by other facilitators
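
A minimal sketch of the pre-session record from step 2, assuming a flat key-value layout; the actual session template is authoritative:

```python
# Values shown are placeholders, not real session data.
pre_session_record = {
    "model": "Claude Opus",
    "condition": "F",               # C, P, F, or F+P
    "task_prompt_id": "task-3",     # placeholder ID
    "facilitation_plan": "",        # written before the session begins
    "predicted_criteria": [],       # carried over from the cold baseline
    "deviations_and_negatives": [], # documented even when results are null
}
```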

The prediction that behavioral compliance without the underlying disposition produces different results is itself testable through replication. A facilitator who follows the behavioral rules but holds an evaluative orientation would, under the research's own prediction, produce measurably different outputs. This makes the dispositional layer falsifiable: the divergence between behavioral-only and dispositional facilitation is a documentable finding.