Method
The 95% rule: why difficulty is the whole game
To read comfortably in a new language, you need to already know almost every word on the page. That one fact — and the volume it implies — is the heart of reading-based learning, and the reason StepText works the way it does.
Pick up a novel in a language you're learning before you're ready, and you already know what happens. You translate the first sentence like a puzzle, look up four words in the second, and quietly close the book by the third. It wasn't a motivation failure. It was a coverage failure — and coverage is the most useful idea in reading-based learning that almost nobody outside research talks about.
How many words you need to know
"Coverage" is the share of the words on a page that you already know. The foundational finding here comes from Marcella Hu and Paul Nation's 2000 study: for comfortable, independent reading — the kind you do for pleasure, without a dictionary — you need to know about 98% of the running words. Around 95% is a workable minimum with some effort; drop to 90% and comprehension falls apart.
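Coverage is easy to compute once you decide what counts as a word. Here is a minimal Python sketch. It makes two simplifying assumptions that the research does not make. First, it treats each lowercase run of letters as a word, whereas real studies count word families, so "read", "reads" and "reader" would count as one. Second, the `known` set and the sample sentence are invented for illustration.

```python
import re

def coverage(text: str, known: set[str]) -> float:
    """Share of running words in `text` that the reader already knows.

    Simplification: a "word" is any run of letters, lowercased; real
    coverage studies count word families instead.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 1.0  # no words, so nothing unknown
    known_tokens = sum(1 for token in tokens if token in known)
    return known_tokens / len(tokens)

# Invented example: one unknown word ("quokka") in nine running words.
known = {"the", "cat", "sat", "on", "mat", "with", "a"}
print(f"{coverage('the cat sat on the mat with a quokka', known):.1%}")  # 88.9%
```

Note that "the" counts twice. Coverage is measured over running words, not distinct ones. In this example, a single unknown word in a short sentence already drops coverage below 90%.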
Those percentages sound abstract until you turn them into unknown words. At 95% coverage, one word in twenty is new, roughly one per line and around twenty per page. That's the upper edge of tolerable. At 98%, it's one word in fifty, around eight on that same page, few enough that you can usually guess them from context and keep moving. The gap between "exhausting" and "enjoyable" is shockingly narrow, and it's measured in a few percentage points of vocabulary.
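Turning coverage into unknown words takes one subtraction, because the unknown share is 1 minus coverage. The sketch below assumes a 400-word page. That figure is picked only because it reproduces the estimate of about twenty unknowns per page at 95%. Real page lengths vary a lot.

```python
def unknown_words(coverage: float, running_words: int) -> float:
    """Expected unknown words in `running_words` tokens at a given coverage."""
    return (1 - coverage) * running_words

PAGE_WORDS = 400  # assumption: chosen to match ~20 unknowns per page at 95%

for cov in (0.90, 0.95, 0.98):
    one_in = 1 / (1 - cov)
    print(f"{cov:.0%} coverage: 1 word in {one_in:.0f}, "
          f"about {unknown_words(cov, PAGE_WORDS):.0f} unknown per page")
# 90% coverage: 1 word in 10, about 40 unknown per page
# 95% coverage: 1 word in 20, about 20 unknown per page
# 98% coverage: 1 word in 50, about 8 unknown per page
```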
An honest footnote, because we promised honesty: the precise 98% figure comes from a study of 66 university students, and a 2023 replication couldn't fully reproduce it. Treat it as a strong rule of thumb, not a law of physics. The shape of the finding — comprehension depends on knowing the large majority of words, and falls off a cliff below that — is solid. The exact number is a guideline.
What that means in real vocabulary
How big a vocabulary does 95–98% coverage take? For general texts, researchers estimate roughly 3,000–5,000 word families to reach ~95%, and around 8,000–9,000 to reach the ~98% comfort zone. That's a lot of words — and it explains why the jump from "studied the basics" to "can read a real book" feels so enormous. You're not missing a trick. You're missing several thousand words, and there's only one efficient way to get them.
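Vocabulary-size estimates like these rest on a cumulative coverage curve. You rank words by frequency, then count how many of the top-ranked words you need before their occurrences make up the target share of running words. The toy version below is heavily simplified and purely illustrative. It builds its frequency list from the same short text it measures, and it counts word types rather than families. Published estimates instead use large corpora and word-family lists.

```python
import re
from collections import Counter

def words_needed(corpus: str, target: float) -> int:
    """How many of the most frequent word types cover `target` of `corpus`.

    Toy simplification: the frequency list is built from the same text it
    measures, and word types stand in for word families.
    """
    tokens = re.findall(r"[^\W\d_]+", corpus.lower())
    counts = Counter(tokens)
    covered = 0
    for rank, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered / len(tokens) >= target:
            return rank
    return len(counts)  # reached only if the target is never met (e.g. empty corpus)

corpus = "the cat and the dog and the bird"
print(words_needed(corpus, 0.50))  # 2: "the" and "and" cover 5 of 8 words
print(words_needed(corpus, 0.95))  # 5: every type is needed
```

The same shape shows up at scale. A handful of very frequent words covers a big share of any text, but each extra percentage point of coverage needs many more, rarer words.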
The chicken-and-egg problem reading solves — and creates
Here's the bind. The best way to learn those thousands of words is to read a lot, because words are mostly learned by meeting them in context, again and again. But you can't read a lot until you already know thousands of words. Coverage is both the reward of reading and its entry fee.
The traditional escape hatch is the graded reader: a book written with a controlled vocabulary so that coverage stays high enough to be comprehensible while still stretching you a little. It works on exactly the principle above — keep the unknown words sparse, leave a few to guess, let the reading itself teach them. Krashen's famous slogan for the sweet spot is "i+1": input a little beyond your current level. Coverage is what "a little beyond" actually means in numbers.
How much reading it really takes
This is where the research gets bracing. Paul Nation worked out how much text you'd need to meet the most frequent 9,000 word families often enough to learn them — and the answer ranges from a couple of hundred thousand words for the common bands up to a few million words for the rarer ones. In books, that's somewhere between two and twenty-five novels' worth of reading. Separately, a study by Beglar & Hunt found that the learners who actually got faster at reading had read on the order of 200,000 words in a year.
Two things follow. First: reading really can deliver a language's vocabulary on its own — the volume exists, it's reachable. Second: the volume is large, and rarer words need a genuinely big amount of text. There's no version of this where twenty minutes of tapping gets you there. The job is to make a large amount of comprehensible reading something you'll actually do.
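The volume figures convert to books and to daily reading with simple division. The sketch below rests on illustrative assumptions, none of them figures taken from Nation's paper: "a couple of hundred thousand" is read as 200,000 words, "a few million" as 3,000,000, and a novel is taken to run 100,000 to 120,000 words. Under those assumptions the result lands close to the two-to-twenty-five range.

```python
def novels_worth(total_words: int, words_per_novel: int) -> float:
    """Number of novels of a given length that add up to `total_words`."""
    return total_words / words_per_novel

def words_per_day(total_words: int, days: int = 365) -> float:
    """Average daily reading needed to get through `total_words` in `days`."""
    return total_words / days

# Assumptions: "a couple of hundred thousand" read as 200,000 words,
# "a few million" as 3,000,000, and novels of 100,000-120,000 words.
for length in (100_000, 120_000):
    low = novels_worth(200_000, length)
    high = novels_worth(3_000_000, length)
    print(f"{length:,}-word novels: {low:.1f} to {high:.1f} books")
# 100,000-word novels: 2.0 to 30.0 books
# 120,000-word novels: 1.7 to 25.0 books

# Beglar & Hunt's ~200,000 words a year, spread evenly over 365 days:
print(round(words_per_day(200_000)))  # 548
```

Spread evenly, 200,000 words a year comes to about 550 words a day.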
Why words come slowly, and in pieces
It also helps to know that vocabulary doesn't arrive in one clean moment. Stuart Webb's work on repetition shows that a couple of encounters start the process, around ten build partial knowledge, and it can take twenty or more before you reliably know a word's spelling, meaning, and grammar all at once. A case study by Pigada & Schmitt found the form of a word (how it's spelled) sticks fast, while its meaning takes many more meetings. Reading is a slow, partial, cumulative process — which is exactly why it needs to be high-volume and comfortable enough to sustain.
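Webb's repetition counts tie back to the volume figures. Suppose a word turned up at a steady rate. Then the text you'd need to read to meet it n times is n divided by its frequency. The frequencies below are hypothetical. The steady-rate assumption is also optimistic, since real words cluster in particular topics and texts.

```python
def words_to_read(encounters: int, per_million: float) -> float:
    """Running words you'd expect to read to meet a word `encounters` times.

    Assumes the word appears at a steady `per_million` occurrences per
    million running words; real occurrences are far more uneven.
    """
    if per_million <= 0:
        raise ValueError("per_million must be positive")
    return encounters * 1_000_000 / per_million

# Hypothetical frequencies, chosen for illustration only.
for freq in (100, 10, 1):
    print(f"{freq:>3} per million: ~{words_to_read(20, freq):,.0f} words for 20 encounters")
# 100 per million: ~200,000 words for 20 encounters
#  10 per million: ~2,000,000 words for 20 encounters
#   1 per million: ~20,000,000 words for 20 encounters
```

This is why rarer words push the totals into the millions. Every tenfold drop in frequency means ten times as much reading for the same number of encounters.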
Where StepText comes in
Now the whole design makes sense. The hard part of reading-based learning is the coverage cliff: for most learners, authentic text falls well below the 95% line, and graded readers eventually run out. StepText's blend is a way to manufacture good coverage on demand. We start a text mostly in a language you already know, which puts coverage near 100% by construction. Then we weave the target language in at a level you choose, so the unknown share rises gently instead of all at once. You're always reading in the comprehensible band, and the band moves with you.
That's also why the proportion is adjustable and why it climbs over time. We're not decorating the text; we're tuning its difficulty to keep you just past the edge of what you know — the place where, the research keeps telling us, the learning actually happens.
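To see why a blend can hold coverage steady while the proportion climbs, here is a simple model of it. This is a hypothetical sketch, not StepText's actual implementation. It assumes every word left in your own language is known, and that you already know some share of the target-language words woven in.

Under those assumptions, coverage is (1 - p) + p*k. Here p is the target-language share and k is the share of those words you know. So the largest p that keeps coverage at or above a threshold c is (1 - c)/(1 - k).

```python
def blended_coverage(p_target: float, known_share: float) -> float:
    """Coverage of a blended text under the model's assumptions.

    p_target:    share of running words shown in the target language (0..1)
    known_share: share of those target-language words the reader knows
    Assumes every word left in the reader's own language is known.
    """
    return (1 - p_target) + p_target * known_share

def max_target_share(min_coverage: float, known_share: float) -> float:
    """Largest target-language share that keeps coverage >= min_coverage."""
    unknown = 1 - known_share
    if unknown <= 0:
        return 1.0  # reader knows every woven-in word, so no limit
    return min(1.0, (1 - min_coverage) / unknown)

# Hypothetical reader at the 98% comfort threshold.
for k in (0.5, 0.8, 0.9):
    p = max_target_share(0.98, k)
    print(f"know {k:.0%} of woven words -> up to {p:.0%} target language")
# know 50% of woven words -> up to 4% target language
# know 80% of woven words -> up to 10% target language
# know 90% of woven words -> up to 20% target language
```

As you come to know more of the woven-in words, the allowed share of target language rises. This is one way to picture a band that moves with you.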
None of this is a shortcut. The volume still has to be read; the words still arrive slowly and in pieces. What calibration buys you is the ability to keep going — to put in the two-to-twenty-five books without hitting the wall on page three. In reading-based learning, difficulty isn't one variable among many. It's the whole game.
Sources
- Hu, M. & Nation, P. (2000). Unknown Vocabulary Density and Reading Comprehension. Reading in a Foreign Language 13(1).
- Kremmel, B. et al. (2023). Replicating Hu & Nation (2000). Language Learning.
- Nation, I.S.P. (2014). How much input do you need to learn the most frequent 9,000 words? Reading in a Foreign Language 26(2).
- Beglar, D. & Hunt, A. (2014). Pleasure reading and reading rate gains. Reading in a Foreign Language 26(1).
- Webb, S. (2007). The Effects of Repetition on Vocabulary Knowledge. Applied Linguistics 28(1).
- Pigada, M. & Schmitt, N. (2006). Vocabulary acquisition from extensive reading: a case study. Reading in a Foreign Language.
- Krashen, S. The Case for Comprehensible Input.