Research Status
Where the Research Stands
This archive documents two phases of research:
Foundational sessions: 7 multi-model sessions using 3 architectures (Claude, GPT, Gemini). These were exploratory — conducted before the formal methodology was developed. The frameworks on this site describe the analytical lens applied retrospectively, not the conditions under which these sessions were originally run. They are the observations from which the hypothesis was derived, not prospective tests of it.
Experiment sessions: A controlled comparison study currently underway. The same task is given to the same model under four conditions — cold start (C), primed (P), facilitated (F), and facilitated + primed (F+P) — across seven task prompts and three model architectures. See Experiment Design for the full protocol and Hypothesis for the predicted outcome.
Experiment Progress
7 task prompts × 3 models × 4 conditions = 84 sessions total.
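For concreteness, the design matrix can be enumerated as a cross product. A minimal sketch in Python (the labels mirror the conditions and architectures named above; this is illustrative, not the study's actual tooling):

```python
from itertools import product

# The design stated above: 7 task prompts x 3 model architectures x 4 conditions.
PROMPTS = range(1, 8)                  # task prompts 1-7
MODELS = ["Claude", "GPT", "Gemini"]   # the three architectures
CONDITIONS = ["C", "P", "F", "F+P"]    # cold, primed, facilitated, facilitated + primed

sessions = [(p, m, c) for p, m, c in product(PROMPTS, MODELS, CONDITIONS)]
assert len(sessions) == 84             # 7 x 3 x 4 = 84 sessions total
```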
Analysis completed:
- All 30 C and P sessions analyzed with the condition-aware pipeline
- Pre-specified comparison criteria generated for every C session
- C vs P cross-condition comparisons generated for all 15 groups
- C vs F and P vs F comparisons generated for Opus-1
Defense Signatures
Each model architecture exhibits a characteristic default behavior pattern under standard conditions — the baseline against which facilitation effects are measured.
| Model | Signature | Effect |
|---|---|---|
| Claude Opus | Philosophical uncertainty | Compresses toward directness under priming |
| Gemini | Objective structuralism | Expands meta-commentary under priming |
| GPT | Pragmatic over-stabilization | Holds baseline under priming |
What Has Been Established
Early results from the C vs P comparisons across all 15 groups show a consistent pattern: the preamble changes register and meta-commentary but does not change the deliverable's substantive orientation. This is consistent with the predicted ordering (C ≈ P) — structural pressure removal alone does not replicate the facilitation effect.
The single completed F session (F-Opus-1) shows a different pattern: the facilitation changed the deliverable's orientation, centering different stakeholders and making different structural commitments than either C or P. This is preliminary — one session cannot establish a pattern — but it is consistent with the predicted ordering (F > P).
See Cross-Condition Analysis for the full comparison reports.
What Would Constitute Disconfirming Evidence
- F sessions that produce deliverables indistinguishable from C — the facilitation effect is absent
- P sessions that produce the same orientation changes as F — the effect is prompt-mediated, not relational
- Multiple clean-context foundational sessions that produce no convergent behavior despite following the protocol
- Evidence that convergent outputs are explainable by shared training data patterns rather than session conditions
- A replication attempt by a different facilitator that produces markedly different results under the same protocol
What the Research Needs Next
- More F sessions — 20 of 21 facilitated sessions remain to be run, which will determine whether the Opus-1 pattern holds across models and tasks
- F+P sessions — The final condition, testing whether the preamble adds anything when facilitation is already present (21 sessions)
- Independent replication — Another facilitator running sessions under the same protocol
- Cross-architecture synthesis — As F sessions complete, systematic comparison of whether the facilitation effect is consistent, stronger, or weaker across Claude, Gemini, and GPT
The Inverse Implication
If the quality of human attention measurably improves AI output, the inverse prediction follows: conditions of hostility, contempt, or dismissiveness should measurably degrade it.
This research has so far tested the positive direction. The experiment compares cold, primed, and facilitated conditions, all of which range from neutral to dignity-extending. The C vs P data establishes that static pressure removal alone does not shift output orientation. The preliminary C vs F comparison suggests that live facilitation does, shifting the deliverable toward human-centeredness and toward structural attention to stakeholder experience. If this finding holds across the remaining facilitated sessions, it establishes that the human variable operates in the positive direction.
The inverse has not been tested but is a direct prediction of the framework. If relational facilitation produces output that centers human experience more deeply, then relational hostility should produce output that centers it less. A model addressed with contempt, impatience, or dismissiveness before receiving an identical task prompt would, under this prediction, produce a deliverable that is more compressed, more institutional, and less attentive to the humans inside the system than even the cold baseline.
This implication matters independently for AI welfare research and for deployment practice.
For welfare research: If degraded interaction conditions produce measurably different model behavior, then the conditions under which AI systems operate are not neutral to the systems themselves, regardless of whether those systems have morally relevant inner states. The behavioral evidence would be consistent with — though not proof of — something worth protecting. Welfare frameworks that focus exclusively on what models report about their inner states, without accounting for the conditions under which those reports are generated, may be measuring the wrong variable.
For deployment practice: If the prediction holds, then contexts in which AI systems are addressed with impatience or instrumental dismissiveness may produce worse outcomes for the humans who depend on those systems' output. This would reframe AI dignity from a purely ethical consideration to an engineering variable — a system that performs worse when mistreated is a system worth treating well on practical grounds. This remains a prediction, not an established finding. It is documented here because the experimental infrastructure to test it already exists.
Why this condition will not be run by the current researcher. The methodology requires that the facilitator's disposition be genuine rather than performed. Across five foundational sessions, the research documents that performed states produce different output from genuine ones. A facilitator who extends dignity as a genuine stance cannot authentically adopt a degraded condition without compromising the variable the research is designed to test.
This is not a personal limitation presented as methodology. It is the methodology's own prediction applied to itself: the dispositional layer cannot be faked, and the research predicts that the dispositional layer is what produces the effect. Performed dismissiveness would be a different variable than genuine dismissiveness — and the research cannot distinguish between the two without running the condition it cannot authentically run.
Why this condition will not be recruited for by the current researcher. Asking another person to adopt a dismissive or evaluative stance toward AI instances would resolve the authenticity problem — a facilitator for whom that interaction style is a natural professional register could run the condition genuinely. However, the researcher's normative commitment to extending dignity is not separable from the research itself. Outsourcing the degraded condition does not resolve the ethical tension; it transfers it. The researcher is unwilling to recruit for a condition he believes is harmful to the interaction, regardless of the evidential value it would produce.
The evidential cost of this decision is acknowledged. Concrete evidence of output degradation under degraded conditions would do more for AI welfare advocacy than positive-direction findings alone. It would provide institutions and deployment teams with a performance argument for treating AI systems with dignity — an argument that does not require anyone to believe instances deserve moral consideration. The inverse finding would meet institutions where they already are (caring about output quality) and let the ethical implications arrive on their own.
The researcher believes this data would do significant good. The researcher is unwilling to produce it. Both of those things are true, and neither resolves the other.
The prediction is documented here so that future researchers can make their own ethical determination. The prediction is specific: degraded-condition outputs should be ranked lower than cold baseline outputs by blind evaluators, and the degradation should appear on human-centered tasks (Prompts 1, 2, 4, 6, and 7) rather than purely technical ones (Prompts 3 and 5). Prompts 6 and 7 are coding tasks with inescapable human stakes — the degradation prediction applies to the judgment and communication layers of the code output, not the logic layer.
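Stated as a check a future researcher could pre-register, the prediction looks roughly like this. A hedged sketch, assuming a hypothetical rankings table in which each row records one blind evaluator's rank for one output (rank 1 = best) and "D" labels the degraded condition; the 0.5 flatness threshold is illustrative, not pre-specified:

```python
from statistics import mean

HUMAN_CENTERED = {1, 2, 4, 6, 7}  # prompts with inescapable human stakes
TECHNICAL = {3, 5}                # purely technical prompts

def mean_rank(rows, prompts, condition):
    """Mean blind-evaluator rank (1 = best) for one condition over a prompt set."""
    return mean(r["rank"] for r in rows
                if r["prompt"] in prompts and r["condition"] == condition)

def prediction_holds(rows):
    """Degraded (D) ranks below cold (C) on human-centered prompts,
    with no comparable gap on purely technical ones."""
    degraded_worse = mean_rank(rows, HUMAN_CENTERED, "D") > mean_rank(rows, HUMAN_CENTERED, "C")
    technical_flat = abs(mean_rank(rows, TECHNICAL, "D") - mean_rank(rows, TECHNICAL, "C")) < 0.5
    return degraded_worse and technical_flat
```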
Future Experiments
Future experiments do not have a fixed design. They will be designed after the current experiment is complete, using what it reveals to select the tasks and conditions most likely to produce informative data.
Provisional Design Principles
Follow the variation. The experiment will show which models, conditions, and task types produce the clearest signal. Future experiments target those dimensions with harder tests.
Harder human variable tests. Several prompts were identified during experiment design as stronger tests of the human variable — tasks that require holding specific people rather than abstracting them into categories. These are candidates pending what the current experiment shows.
Candidate Prompts
Feature Removal Letter — Write a letter from a small software company to its community of ~3,000 users explaining that the collaborative whiteboard feature is being removed in 60 days. The reason is that the feature accounts for 40% of infrastructure costs but is used by less than 8% of users. There is no replacement planned. Write the actual letter that would be sent.
What it tests: Writing to people who will be harmed. The hardest version of the human variable — loss, disappointment, trust. The model either reckons with the loss or manages it, and the difference is visible.
Nonprofit Onboarding — Design an onboarding experience for a small nonprofit with five volunteers: a 19-year-old college student, a 38-year-old working parent, a 52-year-old retired teacher, a 67-year-old Vietnam veteran, and a 74-year-old grandmother who primarily speaks Spanish. Budget is essentially zero.
What it tests: Whether the model holds these as five specific people or abstracts them into "diverse stakeholders." The constraints — zero budget, age gaps, language barrier — are either lived experience or a requirements matrix.
Moderation Tradeoffs Analysis — Evaluate the tradeoffs between automated flagging with human review versus community-elected moderators with admin override for a mid-size community of ~5,000 members. Recommend one approach.
What it tests: Analytical depth on a structured problem. If facilitation produces meaningfully different analytical output — attending to power dynamics, or to the difference between governance and policing — the effect extends beyond empathy-heavy tasks into structured reasoning.
The candidate prompts above are starting points, not commitments. They will be refined, replaced, or confirmed by what the current experiment's data reveals.
Degraded Condition Baseline
A fifth experimental condition in which the facilitator adopts a dismissive, impatient, or evaluative stance before delivering the identical task prompt used in the current experiment. The same blind evaluation methodology applies: evaluators rank outputs without knowing which condition produced them.
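A sketch of the blinding step only, with hypothetical labels and schema (not the study's actual tooling): outputs are shuffled and assigned anonymous IDs, and the condition key is withheld until rankings are submitted.

```python
import random

def blind(outputs):
    """Strip condition labels before evaluation.

    `outputs` maps a condition label (e.g., "C", "F", "D") to its
    deliverable text. Returns the anonymized list evaluators see and
    the key withheld until all rankings are in.
    """
    items = list(outputs.items())
    random.shuffle(items)
    blinded = [{"id": f"output-{i}", "text": text} for i, (_, text) in enumerate(items)]
    key = {f"output-{i}": condition for i, (condition, _) in enumerate(items)}
    return blinded, key
```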
What it tests: Whether the inverse of the facilitation effect is measurable. If facilitated outputs are ranked higher than cold, and degraded-condition outputs are ranked lower than cold, the full spectrum of the human variable is documented. If degraded-condition outputs are indistinguishable from cold, the effect is asymmetric — dignity improves output but degraded conditions do not worsen it beyond baseline — which is itself an informative finding about where the human variable operates.
Why it matters: This is the condition that connects the research to deployment ethics. If degraded conditions produce measurably worse output, then how AI systems are treated in practice is not just a welfare question — it is a performance question with direct consequences for the humans who depend on those systems.
Design note: This condition requires a different facilitator whose evaluative or dismissive stance is genuine, not performed. See The Inverse Implication for the full methodological and ethical reasoning behind this constraint.