Expressive Cognition is an independent research project in applied linguistics and cognitive measurement. The tool was developed to assess how thinking is organized as it surfaces in spontaneous speech — treating language not as a container for thought, but as its medium of formation.
Each session presents five task types — abstract reasoning, explanation, analogy, compression, and argument — each designed to elicit distinct verbal reasoning behaviors. Prompts are randomly selected from a curated pool, so no two sessions are identical. Responses are spoken aloud and transcribed using automated speech recognition before scoring.
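To make the session design concrete, here is a hypothetical sketch of session assembly under the stated protocol: one randomly drawn prompt per task type. The pool contents are placeholders, not the project's actual prompts.

```python
import random

# Hypothetical sketch of session assembly: one randomly drawn prompt
# per task type, so no two sessions are identical. Pool contents are
# placeholders, not the project's actual prompts.
TASK_TYPES = [
    "abstract_reasoning", "explanation", "analogy", "compression", "argument",
]
PROMPT_POOL = {t: [f"{t} prompt #{i}" for i in range(1, 11)] for t in TASK_TYPES}

def build_session(rng=random):
    """Return five (task_type, prompt) pairs, one per task type."""
    return [(t, rng.choice(PROMPT_POOL[t])) for t in TASK_TYPES]
```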
The five tasks were chosen because they pull on genuinely different cognitive operations. Abstract reasoning requires deriving principles from cases. Explanation requires audience modeling and register control. Analogy requires structural mapping across domains. Compression requires propositional density under constraint. Argument requires position-taking, hedging, and evidential reasoning. No single task captures the full construct; the cross-task pattern is the signal.
Spoken responses are transcribed and scored across nine linguistic dimensions, divided into two tiers.

Core dimensions (six, weighted into the VRI):

- Abstraction: the level at which the speaker naturally organizes the problem — where they spontaneously land on the concrete-to-principled continuum.
- Compression: propositional density — how much cognitive content per unit of language; the capacity to hold multiple ideas simultaneously and deliver them already integrated. For example, "the tired dog slept" carries two propositions (the dog slept; the dog was tired) in four words.
- Originality: whether anything genuinely unexpected arrives — a framing, connection, or analogy that was not the obvious response; aptness plus novelty.
- Conceptual Continuity: whether ideas build on each other — whether each utterance derives from and advances what came before, or whether thought fragments, repeats, or resets.
- Epistemic Calibration: whether the speaker spontaneously distinguishes what they know from what they're inferring — differentiated confidence across claims.
- Generative Self-Monitoring: whether the speaker catches themselves and revises upward in real time — reformulating to be more accurate, noticing gaps, upgrading ideas mid-sentence.

Moderators (three):

- Lexical diversity and precision; correlates with education level more than with structural reasoning.
- Novelty of framing and unexpected connections; a real signal but inconsistent across task types.
- Sentence variety and subordination; correlates with L1 background more than cognitive engagement.
Moderators are excluded from the Verbal Reasoning Index (VRI) because they reflect background and exposure more reliably than they reflect the construct this tool is designed to measure.
The VRI aggregates weighted scores across the six core dimensions onto a scale centered at 100, with a standard deviation of 15. Scores are reported as a range (e.g., 112–120) rather than a point estimate to reflect the inherent variability in single-session speech sampling.
Dimension weights within the VRI: Abstraction (0.18), Epistemic Calibration (0.18), Compression (0.16), Originality (0.16), Conceptual Continuity (0.16), Generative Self-Monitoring (0.16).
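For concreteness, here is a minimal sketch of this aggregation, assuming each dimension score is first standardized (z-scored) against a reference sample; the six weights sum to 1.00, so a speaker exactly one reference SD above the mean on every dimension lands at 115. The half-width used to form the reported range is an illustrative placeholder, not the tool's actual uncertainty model.

```python
# Minimal sketch of the VRI aggregation, assuming z-scored dimension
# inputs. The half_width for the reported range is illustrative only.
WEIGHTS = {
    "abstraction": 0.18,
    "epistemic_calibration": 0.18,
    "compression": 0.16,
    "originality": 0.16,
    "conceptual_continuity": 0.16,
    "generative_self_monitoring": 0.16,
}

def vri_range(z, half_width=4.0):
    """z: dict of per-dimension z-scores -> (low, high) on the 100/15 scale."""
    composite = sum(WEIGHTS[d] * z[d] for d in WEIGHTS)  # weighted z-composite
    point = 100 + 15 * composite                         # center 100, SD 15
    return round(point - half_width), round(point + half_width)

print(vri_range({d: 1.0 for d in WEIGHTS}))  # -> (111, 119)
```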
Users of the full report receive approximate mappings of their VRI to five external assessments — GRE Verbal, LSAT, IELTS Academic Speaking, TOEFL iBT Speaking, and Duolingo English Test — via percentile alignment. These comparisons are illustrative, not predictive. They are intended to provide interpretive context, not to claim equivalence with those instruments.
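As a toy illustration of what percentile alignment means (not the project's actual concordance procedure), the sketch below assumes both scales are approximately normal; the GRE Verbal mean and SD are assumed for illustration, whereas real instruments publish empirical percentile tables.

```python
from statistics import NormalDist

# Toy percentile alignment under a normality assumption. The GRE Verbal
# parameters below are illustrative assumptions, not published concordances.
vri_dist = NormalDist(mu=100, sigma=15)
gre_verbal = NormalDist(mu=151, sigma=8)   # assumed mean/SD

def align(vri_score):
    pct = vri_dist.cdf(vri_score)     # VRI score -> percentile
    return gre_verbal.inv_cdf(pct)    # same percentile on the other scale

print(round(align(116)))  # ~86th percentile -> ~160 (illustrative only)
```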
This tool is in beta and undergoing empirical validation. Correlational studies between EC scores and established criterion measures have not yet been published. Dimension-level construct validity is theoretically grounded; composite VRI validity as a unified measure requires further empirical support.
Scores reflect verbal reasoning patterns across one session of five spoken responses. Day-to-day variation, fatigue, topic familiarity, recording quality, and speaking context all influence results. The score range format is designed to communicate this uncertainty directly.
Unlike written assessments, spoken responses introduce variability from accent, dialect, speech rate, and comfort with oral academic registers. The tool was designed for English-language verbal reasoning and may not perform equally across all varieties of English.
Dimension scores are generated by a large language model trained to evaluate linguistic features of spontaneous speech. AI scoring introduces its own sources of error that differ from human rater variability. Scoring rubrics and model behavior are subject to ongoing refinement.
The VRI aggregates six dimensions under an assumption of construct coherence. Whether these dimensions form a unified latent trait — or represent meaningfully distinct but correlated abilities — is an empirical question this project is designed to investigate.
This is not a clinical assessment, IQ test, or diagnostic instrument. It should not be used for educational placement, employment screening, clinical evaluation, or any decision with material consequences for an individual.
EC’s validity program is ongoing. The scoring rubric has been validated across three naturalistic speech corpora: Supreme Court oral arguments, academic seminar speech (MICASE), and intellectual podcast interviews. The standardized prompt set used on this site has not yet been independently validated; that study is planned.
Anonymized transcripts and scores are retained for research purposes. No personally identifiable information is stored alongside assessment data. No individual results are ever shared or identified.
For research inquiries: expressivecognition@gmail.com
The framework draws on the following sources, mapped to the dimensions they inform:

- Abstraction: Biggs & Collis’s SOLO Taxonomy
- Compression: Kintsch’s propositional density analysis
- Originality: Guilford’s divergent production; Finke et al.’s geneplore model
- Conceptual Continuity: Halliday & Hasan’s cohesion theory; van Dijk & Kintsch’s situation model
- Epistemic Calibration: King & Kitchener’s Reflective Judgment Model
- Generative Self-Monitoring: Levelt’s self-monitoring model; Flavell’s metacognition theory

The overall approach to validity follows Messick’s unified validity theory.
Developed by an applied linguist. Expressive Cognition Beta v0.1.