08 May 2013
We've
previously posted
on the relationship between age and vocabulary size for native speakers.
But now, almost two years later, with more than five times as much data
collected, it's time to revisit the graph.
The overall shape hasn't changed, of course (it's just smoother),
but with so much more participation,
we can now calculate not just the median vocabulary size at each age, but also
various percentiles. These give a better idea of how vocabulary sizes
are distributed among survey participants. So here
is the crown jewel of our results:
To give you an idea of just how much data is compressed into this
single graphic, the age of 21 alone accounts for over 20,000 respondents,
and each point on the percentile lines draws on over 2,000 respondents.
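A per-age percentile line of this kind can be computed by grouping respondents by age and taking quantiles within each group. The Python sketch below assumes the raw data is a list of `(age, vocab_size)` pairs and uses linear interpolation between sorted values (the same default as NumPy's `percentile`). The post doesn't say which quantile convention was actually used, and the function names here are hypothetical.

```python
from collections import defaultdict


def percentile(sorted_vals, p):
    """p-th quantile (0 <= p <= 1) of an already-sorted, non-empty list,
    using linear interpolation between neighbouring values (an assumed
    convention, not necessarily the one behind the published graph)."""
    n = len(sorted_vals)
    if n == 1:
        return float(sorted_vals[0])
    pos = p * (n - 1)            # fractional index into the sorted list
    lo = int(pos)                # floor, since pos >= 0
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac


def percentiles_by_age(responses, probs=(0.1, 0.5, 0.9)):
    """Group (age, vocab_size) pairs by age and compute the requested
    percentiles for each age. Returns {age: {p: value}}, ages ascending."""
    by_age = defaultdict(list)
    for age, vocab in responses:
        by_age[age].append(vocab)
    result = {}
    for age in sorted(by_age):
        sizes = sorted(by_age[age])
        result[age] = {p: percentile(sizes, p) for p in probs}
    return result
```

For example, with three made-up 21-year-olds at 10,000, 20,000 and 30,000 words, the median comes out at 20,000 and the 10th percentile at 12,000. With thousands of respondents per age, the choice of interpolation convention makes almost no visible difference.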
Now, remember that these percentiles are not for the population
as a whole, but only for those who have taken the test online.
Judging by self-reported SAT scores from a previous analysis,
our participants overall fall in roughly the 98th percentile of the American
population: it is apparently a very "elite" group of
people who spend their time taking vocabulary tests on the Internet!
But regardless, it's fascinating to see how test-takers aged 50, for example,
range from slightly over 20,000 words (10th percentile), to slightly
over 30,000 words (median), to nearly 40,000 words (90th percentile).
(You'll also notice that we've cut off some of the percentile lines, restricting
them to a subset of ages. This is because the data were too noisy at the
remaining ages.)
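The post doesn't state the rule used to decide where the data were too noisy. One plausible, purely hypothetical rule is sketched below: an outer percentile is estimated from the relatively few respondents in its tail, so a point is kept only if enough respondents fall on the thinner side of it. Under such a rule the 10th and 90th percentile lines are cut off at least as often as the median. The function name and the `min_tail` threshold are assumptions, not the survey's actual settings.

```python
def ages_to_plot(counts_by_age, probs=(0.1, 0.5, 0.9), min_tail=200):
    """Hypothetical noise filter: for each percentile p, return the ages at
    which at least `min_tail` respondents lie on the thinner side of p.

    counts_by_age maps age -> number of respondents at that age."""
    keep = {}
    for p in probs:
        # Share of respondents in the smaller tail (e.g. 10% for p = 0.1 or 0.9).
        tail_share = min(p, 1 - p)
        keep[p] = sorted(
            age for age, n in counts_by_age.items() if n * tail_share >= min_tail
        )
    return keep
```

With 800 respondents at some age, for instance, the median point would be kept (400 on each side) but the 10th and 90th percentile points would not (only about 80 in each tail).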