Teaching Machines to Revise
Language models produce poetry that is formally competent but stylistically empty — lacking the distinctive voice that makes verse matter. Our research introduces synthetic education: a methodology that trains models not on poems but on poetry instruction — critique, revision dialogue, and craft lessons — generated by a frontier model in a consistent pedagogical persona.
Opinionated research: poiesis > episteme. The goal is not to build a better poetry machine. The goal is to build a machine that makes better poets.
Synthetic Education
The dominant paradigm in NLP treats poetry as prose plus constraints — a constraint satisfaction problem. The result is verse that deploys conventional form in service of bland platitudes: formally competent and utterly empty.
Synthetic education inverts the training target. Instead of learning poems — statistical resemblance to verse — we train on mentorship: the evaluative stance, aesthetic commitments, and pedagogical voice of a poetry educator. A two-phase pipeline produces a structured corpus of critique, comparative analysis, revision dialogue, cliché autopsy, and craft lessons. This corpus fine-tunes separate Educator (mentor/critic) and Poet (generator) models, which operate in a multi-agent revision loop at inference.
The system does not merely generate verse. It teaches itself to write through drafting, critique, and revision — the workshop model of creative writing pedagogy, compressed into inference.
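The workshop loop can be sketched in a few lines. This is a minimal sketch, not the system's implementation: `poet` and `educator` are stand-ins for the fine-tuned Poet and Educator models, and the critique shape (`approved`, `notes`) is a hypothetical interface.

```python
def revision_loop(prompt, poet, educator, max_rounds=4):
    """Workshop loop: draft, critique, revise, until the educator approves
    or the round budget runs out. Returns the final draft and the full
    drafting history.

    `poet` and `educator` are callables standing in for the fine-tuned
    Poet and Educator models (hypothetical interface)."""
    draft = poet(prompt, critique=None)                # initial draft, no feedback yet
    history = [draft]
    for _ in range(max_rounds):
        critique = educator(prompt, draft)             # engaged, committed feedback
        if critique["approved"]:                       # approval timing varies by prompt
            break
        draft = poet(prompt, critique=critique["notes"])
        history.append(draft)
    return draft, history
```

The round budget matters: it bounds inference cost while leaving room for the multi-round pressure that difficult prompts provoke.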
RevFlux: Process-Oriented Evaluation
Standard poetry metrics measure the wrong thing. BLEU penalizes lexical novelty; perplexity rewards predictability. If the core innovation is the pedagogical revision loop, the evaluation target is the behaviour of that loop — not the quality of its final output.
RevFlux measures revision behaviour at the line level along three axes: magnitude (how much the model revises), distribution (where change concentrates), and dynamics (how revision patterns evolve across rounds). The trained system exhibits critique-responsive revision, coarse-to-fine editing dynamics, and prompt-sensitive behaviour absent in the vanilla baseline. The baseline oscillates between minimal change and wholesale rewriting, with no targeted revision between the two.
Cliché prompts produce the highest revision pressure — the educator recognizes derivative language and pushes back across multiple rounds. Approval timing varies by prompt difficulty. These results demonstrate that encoding pedagogical process rather than poetic product enables qualitatively different revision behaviour.
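One way to make magnitude and distribution concrete: per-line edit ratios computed with the standard library's difflib. This is a sketch of the RevFlux idea under that assumption, not the actual metric definitions.

```python
import difflib

def revflux(draft_a, draft_b):
    """Line-level revision metrics between two successive drafts.

    Returns (magnitude, per_line):
      magnitude: mean per-line edit ratio (0 = identical, 1 = full rewrite),
      per_line:  edit ratio for each line, showing where change concentrates.
    A sketch of the RevFlux idea, not the paper's exact formulation."""
    lines_a = draft_a.splitlines()
    lines_b = draft_b.splitlines()
    n = max(len(lines_a), len(lines_b))
    per_line = []
    for i in range(n):
        a = lines_a[i] if i < len(lines_a) else ""   # added/deleted lines count
        b = lines_b[i] if i < len(lines_b) else ""   # as full rewrites
        sim = difflib.SequenceMatcher(None, a, b).ratio()
        per_line.append(1.0 - sim)
    magnitude = sum(per_line) / n if n else 0.0
    return magnitude, per_line
```

Tracking `per_line` across rounds is what exposes coarse-to-fine dynamics: early rounds spread change widely, later rounds concentrate it.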
Volta: The Visualization System
Volta makes the invisible geometry of aesthetics visible. Poems are trajectories through aesthetic space, not static objects. The Aesthetic Embedding Model (AEM) encodes each poem and each line as a vector in a learned space, and six visualization modes render that space legible:
- Trajectory View — the poem's path through aesthetic space as a 3D curve.
- Velocity Profile — rate of aesthetic change per line; peaks mark moments of sharp turn.
- Constellation Map — poems as points in embedding space, showing aesthetic neighborhoods.
- Retroreading Heatmap — which earlier lines gain meaning when later lines arrive.
- Cliché X-Ray — the cliché autopsy made visual; opacity reveals spent vs. fresh language.
- Sonic Terrain — phonetic landscape grounded in Gradium TTS audio; the gap between AEM prediction and prosodic rendering becomes diagnostic.
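Of these, the Velocity Profile is the most directly computable. A minimal sketch, assuming per-line embeddings are already available (the AEM is stood in for by plain vectors); cosine distance between consecutive lines is one reasonable choice, not necessarily the system's.

```python
import math

def velocity_profile(line_embeddings):
    """Rate of aesthetic change per line: cosine distance between each
    line's embedding and the previous line's. Peaks mark sharp turns
    (e.g. a volta). Entry i corresponds to the transition into line i+1."""
    def cos_dist(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return 1.0 - dot / (nu * nv)
    return [cos_dist(line_embeddings[i - 1], line_embeddings[i])
            for i in range(1, len(line_embeddings))]
```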
Sonic: The Gradium TTS Loop
Sonic is the acoustic subsystem of Volta, built around AEM and Gradium TTS working in concert. The Aesthetic Embedding Model produces a velocity profile — a curve of semantic and aesthetic change across lines. Gradium provides word-level timestamps and high-quality audio synthesis. The pipeline: generate poem audio, run librosa pitch/energy analysis at word boundaries, plot acoustic prosody against AEM velocity. Where the two curves diverge, the poem is doing something in sound that AEM does not expect — or vice versa. Those gaps are the diagnostic signal.
The long-term loop: AEM embeddings → prosodic targets → Gradium synthesis → acoustic comparison → revision triggers. Sonic Terrain makes this legible — phonetic energy mapped against semantic trajectory, so the gap between what the poem means and how it sounds becomes something a poet can see and act on.
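Given the two per-line curves, the AEM velocity and the librosa-derived prosody measured at Gradium word boundaries, the diagnostic gap can be sketched as below. Both inputs are assumed precomputed lists (the audio analysis itself is out of scope here), and the min-max normalization and 0.5 threshold are illustrative choices, not the system's.

```python
def prosody_gaps(aem_velocity, acoustic_energy, threshold=0.5):
    """Flag lines where the AEM's predicted change diverges from measured
    prosody. Each curve is min-max normalized so the gap is comparable
    across poems; lines whose normalized gap meets the threshold are the
    diagnostic signal (potential revision triggers)."""
    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    a, b = norm(aem_velocity), norm(acoustic_energy)
    gaps = [abs(x - y) for x, y in zip(a, b)]
    return [i for i, g in enumerate(gaps) if g >= threshold]
```

A flagged line means the poem is doing something in sound that the AEM does not expect, or vice versa; either way it is a place worth a poet's attention.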
The Larger Argument
Writing as Technology, Trace as Condition
Writing is a technology. Humans invented it — notation systems bootstrapped from tally marks to cuneiform to alphabets, each one a more complex way of marking the world than the last. This is not Heidegger's poiesis, some mystical bringing-forth into unconcealment. It is engineering: we needed to record debts, name gods, count grain.
But the technology touches something that precedes it. Parmenides posed the question at the root of all notation: what is, and what is not. The arche-trace, as Derrida describes it, is not writing in the narrow sense — it is the condition that makes any marking, any distinction, any notation possible at all. Not a copy of some prior presence, but the structure of differentiation itself. Wittgenstein arrives at a compatible position from the opposite direction: meaning is not an inner mental act that writing externalizes. It is use, practice, the form of life that language is embedded in. Both moves are anti-representationalist. Meaning is not behind or beneath the mark. It is the mark in its use.
The Parmenidean question — is or is not — is a binary, and binary is the simplest notation. But humans do not stop at binary. We bootstrap richer systems: pitch accent, alphabetic script, mathematical formalism, programming languages, neural network weights. Each notation captures more of the world's structure, and each one participates in the same underlying field of differentiation that makes notation possible. Writing did not fall from heaven. We built it. But we built it along the grain of something real.
AI Generativity and the Field of Différance
If writing is arche-trace all the way down, then AI generativity is not mimicking something more original. There is no “real” writing it copies from. It operates within the same field of différance, the same grammar of use. This gives the research genuine philosophical teeth: not simply a claim that AI supports human creativity, but a claim about the ontology of writing that reframes what AI is doing at a fundamental level.
When a user commands an AI to write a poem, the resulting text has the shape of a poem but not the event of one. The user receives an output but does not undergo the process through which the poem could have become a site of understanding. The democratization of creative output is, simultaneously, the foreclosure of creative experience.
AI as Mentor, Not Servant
Synthetic education models a different relation. The multi-agent revision pipeline enacts iterative cycles of generation, critique, and revision that approximate human creative practice. The educator responds not with a rubric score but with engaged, committed feedback — the kind of feedback that changes how a poet sees their own work.
Large language models now function as cultural intermediaries at unprecedented scale, but without the aesthetic conviction that made human mediation productive. The system described here encodes evaluative commitment rather than statistical averaging. The stakes concern what kind of relationship to language and to making we cultivate in a world where AI can produce fluent verse that is nonetheless empty. By demonstrating that encoding pedagogical process rather than product yields measurably different revision dynamics, this work points toward a design paradigm in which AI participates in human creative development rather than substituting for it.
Credits
We thank Amazon for credits supporting model training and inference, Modal for serverless GPU workloads (AEM, Sonic, data generation), and Gradium for the TTS synthesis powering the sonic terrain and prosody analysis.