Marcus Tullius Cicero, better known nowadays simply as Cicero (which, incidentally, means “chickpea”), gave the orator Crassus a confession in the first book of De Oratore that is usually read as his own: he never began an important speech without trembling. Not as a performance of modesty. The color drained from his face and a trembling moved through his limbs, and this happened every time, regardless of how many times he had done it before, regardless of how thoroughly he had prepared.
And he prepared thoroughly. Cicero was not a man who improvised. He had written the manuals. He had theorized the art. He committed his speeches to memory and delivered them with studied precision. And still his body did its panicked thing at the beginning—which tells you something. That even a speech prepared down to the last clause becomes, in the moment of delivery, a different kind of act. Something is happening that composition alone cannot account for.
By the time of the Philippics, the stakes for that something were not metaphorical. Cicero spent his final years savaging Mark Antony in a series of speeches he knew were dangerous, and when Antony’s soldiers caught up with him in 43 BC, his hands and head were cut off and displayed on the Rostra, the very platform from which he had so often spoken. In an addition that reads like dark satire, Cassius Dio reports that Antony’s wife Fulvia drove a hairpin through his tongue. The man’s hands and his tongue: the things that had made him.
Speech can do that—reach into politics and reshape the world, and get a person killed for pulling the wrong levers. But it also operates across a vastly quieter range of ordinary human situations: the conversation in which someone explains a medical condition to a doctor who has four minutes, the job interview in which a candidate tries to translate five years of effort into something a stranger can understand in thirty seconds, the classroom discussion in which a student is called on and reaches for language that is not quite there, the argument between people who know each other well enough to wound precisely. In all of these moments something is happening that goes beyond the transfer of information, and that operates differently from writing in every way that matters.
The Greeks and Romans theorized rhetoric extensively. This is well known. We know the names: Aristotle’s Rhetoric, Quintilian’s Institutio Oratoria, Cicero’s own De Inventione and De Oratore. There is a whole tradition, thoroughly documented, of systematic thinking about how spoken persuasion works. This familiarity is, in a way, a problem—because it has made it easy to think that what we are talking about, when we talk about the importance of speaking well, is a question of technique. Of arrangement and style, of figures of speech and memory palaces. Something teachable in the sense that parallel parking is teachable.
What gets obscured is a more fundamental point. Rhetoric as the ancients theorized it was not merely about moving people—it was about what happens inside a mind when it reaches for language in the presence of other minds. Cicero’s trembling hands are evidence of this. You can prepare every word, commit every phrase to memory, and still the spoken moment is cognitively distinct from the written one. The audience is already responding before you finish a sentence. The room is shaping the next thought before the current one has landed. Cognition under conditions that writing never has to meet.
The distinction matters because it changes what we think we are measuring when we attend carefully to spoken language. We are watching a mind work in real time, under pressure it cannot fully control.
The medieval university took this seriously in its own strange way. Disputation—the formal oral argument in which a student or scholar would defend a thesis against objections—was not a supplement to the curriculum but its central method. You demonstrated knowledge by speaking it under pressure, by defending it against an interlocutor who was trying to find the places where it came apart. Thomas Aquinas reportedly engaged in marathon disputes, the preparation for which involved a kind of sustained oral thinking that his written texts only partially capture. The written Summa is a record; the disputation was an event.
What ended this was not a sudden realization that writing was better. It was a gradual accumulation of pressures—the printing press, the rise of textual scholarship, the Protestant emphasis on the individual reading of Scripture, the eventual bureaucratization of education—that slowly shifted what counted as knowledge and therefore what counted as evidence of it. Writing became the primary medium of intellectual authority. The spoken word retreated into lecture, into sermon, into the courtroom. In the lecture it became, paradoxically, a way of transmitting written thought aloud—the professor reading from notes to students who transcribed them. The feedback loop of real disputation collapsed.
Modern schooling inherited this settlement. A student spends twelve years accumulating evidence of understanding by writing: essays, tests, reports. She speaks in class, but her speaking is rarely assessed, rarely returned to her with the same attention a teacher might give an essay. The essay goes home with comments in the margins. The class discussion evaporates. If she is lucky, a thoughtful teacher has noticed something in the way she talks that doesn’t show up on paper—some quality of reasoning or connection-making that the written work doesn’t catch. But there is no institutional mechanism for this. It is an observation that lives and dies with the individual teacher who made it.
Neil Mercer, the British educational researcher, spent much of his career documenting the ways that classroom talk functioned—or failed to function—as a medium for learning. His concept of “exploratory talk,” the kind of speech in which reasoning is made visible and tested collaboratively, ran against the grain of most classroom practice, where talk is more often a performance of already-held knowledge than an act of thinking. Robin Alexander’s comparative work across several countries found that most classroom interaction consisted of a narrow question-and-answer pattern—teacher asks, student responds briefly, teacher evaluates—that leaves virtually no room for the sustained spoken reasoning that Mercer was describing. These are not polemical findings. They are observations about what actually happens in rooms where learning is supposed to occur.
There is, however, one group of students in American K–12 schools whose spoken language is formally and systematically assessed every year. English language learners (students whose first language is not English, and who in many districts constitute a third or more of the enrollment) take an annual proficiency exam that includes a speaking domain, scored by trained raters against a rubric. The WIDA ACCESS assessment, which most states use to measure English language proficiency, treats oral language as a distinct and measurable construct. In this narrow sense, and almost only in this sense, the American school system takes speaking seriously enough to measure it.
The irony is considerable. The students most institutionally marked as language-deficient are the only ones whose speech the institution watches closely. Yet even here, what is being measured is surface proficiency—grammatical accuracy, vocabulary range, and syntactic complexity. WIDA does this reasonably well as a language proficiency instrument. The deeper issue is that we have no comparable instrument for the quality of spoken reasoning itself. We measure whether students sound proficient. We rarely ask whether they are thinking proficiently.
Consider two situations at the other end of the educational pipeline. In the first, a candidate for an MBA program at a selective business school submits a video essay as part of the application. She has ninety seconds to respond to a prompt she has not seen before. In the second, a first-year law student is cold-called in a Socratic classroom and asked to explain the reasoning in a case she read the previous night. In both situations, something is being evaluated that is distinct from what a written essay evaluates: how the person reasons when the reasoning cannot be fully prepared, when the audience is present, when the next sentence has to arrive before the last one is fully assessed.
The evaluators in both situations know this, vaguely. They are looking for something. They do not always know what to call it, or how to describe the difference between someone who has it and someone who does not. They use words like “clarity” or “confidence” or “presence,” which are not entirely wrong but are not quite right either. What they would actually see, if they were to slow it down, is the structure of the reasoning as it forms: whether the person organizes the thought before or during speaking, whether she tracks her own logic, whether she can locate the place where her argument is vulnerable and address it rather than talking past it.
These things can be measured, at least in principle, and tools that attempt it do exist—mostly in research contexts, occasionally in early commercial form. But they remain far outside the mainstream of how institutions actually evaluate people.
The tools that do measure thinking with any rigor—IQ tests, the GRE, the LSAT—measure it in its stabilized form. A correct answer on a standardized test is not the thinking; it is the residue of thinking, after the process has completed and produced an output that can be compared against a key. This is not a trivial achievement, and the predictive validity of these instruments across educational and professional outcomes is real, if frequently overstated. But they do not and cannot capture what happens in the formation of the answer. They measure the landing, not the flight.
What would it mean to look at the flight?
This question has occupied me for some time, partly as an intellectual problem and partly as a practical one. The gap between what spoken reasoning can reveal and what any available instrument actually measures is not subtle once you start looking for it. Out of something between curiosity and frustration, I built a tool that attempts to score spontaneous speech along dimensions of reasoning rather than form—not fluency or grammar, but things like the structure of an argument, the density of its conceptual moves, the precision with which claims are qualified. It is aimed at adults, not classrooms. The results are uneven. I do not think the problem is solved. But it has sharpened the looking.
There is a term I find myself reaching for, not as a formal claim but as a description of something I keep noticing. Expressive cognition, for lack of a better phrase: the reasoning that appears in the act of expression, that is shaped by and shapes the language it uses, that cannot be fully prepared in advance. It overlaps with intelligence and with verbal ability, but maps onto neither cleanly. It becomes visible, specifically, when a person speaks under conditions that do not permit extensive revision.
Writing conceals a great deal of this. The essay that arrives polished on the page may have been polished across several drafts, over several days, in consultation with readers whose suggestions have been absorbed. The thinking that produced the essay is not gone, but it has been processed in ways that make it very difficult to observe. Speaking, particularly speaking spontaneously—responding to a question you were not expecting, explaining something to someone who is already frowning—does not permit this processing. The thinking is more visible because it has less time to disappear behind the language.
Writing remains indispensable. But speech can show things that writing cannot, and we have been slow to take that seriously.
Large language models have made this more urgent in a way that is still working itself out. Writing, in the sense of producing grammatically correct, structurally plausible, appropriately hedged prose, is no longer a particularly reliable signal of anything. Anyone with access to a reasonably capable model can generate text that sounds like thinking: text that uses the vocabulary of careful reasoning, that qualifies its claims, that acknowledges counterarguments, that arrives at measured conclusions. Whether there is any actual reasoning underneath it is a question the text itself will not answer.
Spoken language is harder to fake in this way, because the conditions of spontaneous speech are hard to reproduce. The model can generate the script; it cannot be cold-called. The gap between what a person can produce when they have time to prepare and what they produce when they do not is, if anything, more informative now than it was before. And it was already informative.
This will not produce a sudden rush toward oral assessment. Institutions change slowly and toward convenience, not toward accuracy. But the pressure is there, underneath, quiet.
Last fall I sat in on a high school class—a discussion of a difficult text, something about justice, something that required the students to hold a complicated idea and turn it over in front of other people. A student near the back, someone who had said almost nothing for most of the period, was called on. She began speaking in a way that was visibly uncomfortable—pauses that were too long, false starts, a sentence that came apart in the middle. And then something happened. She found a thread. She followed it. The argument she made in the next two minutes was not polished; it looped back on itself, it made a comparison that didn’t quite work and then corrected the comparison, it ended in a place that was slightly different from where it had started, as though the speaking itself had adjusted the thought. The teacher let it run. The room was quiet in the way that rooms go quiet when something is actually happening.
The grade for the course would be determined mostly by the written work. This moment would not appear in any of it.
How to see what that moment revealed, and what written work alone cannot show, is the question Expressive Cognition is built to answer.