Frequently Asked Questions:
What is VocabularySize.com?
VocabularySize.com is a free service designed to assist teachers and researchers in implementing some of the best practice principles derived from the latest research in second language vocabulary acquisition. It is intended to be a tribute to the many years of research of Paul Nation, who tirelessly gave his time and ideas to the applied linguistics community. One of his contributions was the Vocabulary Size Test (VST), which was the original inspiration for this website. Paul Nation has recently announced his retirement. Nevertheless, he continues to write, research, and supervise post-graduate students at Victoria University of Wellington in New Zealand as well as distance-based students throughout the world. He continues to make frequent trips abroad to teach or collaborate with other researchers and shows no sign of slowing down any time soon.
How can I use this site?
Teachers can use this site to profile their classes’ vocabulary knowledge. By measuring the average and range of vocabulary sizes or word part knowledge levels of their classes, teachers can adjust their materials and methods to more closely match their students’ needs and abilities. They can identify which students would benefit from additional vocabulary support or instruction as well as track their progress throughout a program of language instruction to ensure their students’ development. To help match results to individual students, teachers can create custom test sessions which collect additional information, such as student ID numbers or names, which will appear in the final test report. The reports can be viewed online or downloaded.
Researchers can use this site to pilot and validate novel measures of word knowledge or use the existing tests as a data point in a larger research project. Researchers interested in creating new tests should contact us to discuss their ideas. They will also eventually be able to download population-normed measures of vocabulary difficulty to use as a baseline for other research and anonymised data sets to use in novel statistical analyses.
Why is vocabulary size important to measure?
Vocabulary size has been a topic of interest for more than a century; Holden, for example, presented on the subject to the Philosophical Society of Washington in 1875. Some attempts at measuring vocabulary size were simply done out of curiosity or as an early attempt to operationalise and quantify intelligence, but there’s also a long tradition of trying to measure vocabulary size to improve the efficiency and efficacy of second and foreign language education. One of the biggest challenges in learning a second or foreign language is not necessarily mastering the grammar, as many people assume, but rather accumulating enough words to communicate ideas. Because of the importance of words in using a language, vocabulary size has often been found to strongly correlate with other language proficiency measures such as reading and listening comprehension as well as general language proficiency. An excellent introduction to the topic of vocabulary size, its measurement, and its uses can be found in Eyckmans (2004) Measuring Receptive Vocabulary Size: Reliability and Validity of the Yes/No Vocabulary Test for French-speaking Learners of Dutch.
What's the catch?
There’s no catch. Everyone benefits from using this site. Teachers get a free service to help them better understand their students’ abilities. Researchers get a chance to trial or validate their vocabulary-related tests. Learners can get an objective measure of their vocabulary size which they can use to set learning goals and follow their progress. And everyone benefits from the population-normed performance statistics that are collected from the users of this site.
What about my privacy?
We don’t like giving out personal information to strangers either, so only a teacher or researcher can create a customised test which asks you for your personal information. If you take a test which asks for personal information such as your student ID number, date of birth, or other personal question, it means you were given a special test session URL and password to access that customised test. The personal information you provide on these customised tests can only be seen by the teacher or researcher who invited you to take the test so that they can match you to your test score. Your personal information will never be given to anyone else.
Whether you take a customised test or not, all responses to the test questions at VocabularySize.com are recorded along with basic information like your native language, age, gender, and your language learning experience. These data cannot be used to identify who you are, so they will be shared with second language vocabulary acquisition researchers. These researchers want to know more about how people learn a second language and can discover some interesting patterns by comparing vocabulary size to native language, age, gender, and language learning experience, but they will never be able to know who you are from that information.
Don't other websites already offer a similar service?
There are other websites that measure vocabulary size, but we felt that, from a teacher’s perspective, most are not usable in a classroom setting. Our aim with VocabularySize.com is to make an existing free resource, such as the VST, even more useful by integrating it into a web-based service that gives teachers the power to estimate their students’ vocabulary sizes quickly and efficiently.
What other sites can I visit to measure my vocabulary size?
Here is a list, in no particular order, of some sites we know about:
English vocabulary size tests
- The Compleat Lextutor
- This site, created by Tom Cobb, hosts many different vocabulary-related tests, including the VST. Cobb has continuously pioneered vocabulary-based web resources on his site and offered them free of charge for many years, for which we are deeply indebted. The Compleat Lextutor also offers a range of other vocabulary-related services which are invaluable to any serious student, researcher, or practitioner of vocabulary-focused language acquisition.
- _lognostics
- Paul Meara has been developing a wide-ranging suite of vocabulary-based assessment tools for many years which are distributed for free. The majority of the tests are designed as stand-alone programs for the Windows environment, including two vocabulary size tests based on the yes/no format. One measures vocabulary sizes up to the 5,000 most frequent word level while the other measures from the 5,000 to 10,000 frequency levels. Only one test, Lex30, is available online; it is notable as one of the few tests which measure productive vocabulary size (Meara & Fitzpatrick, 2000). _lognostics also has an extensive bibliography of vocabulary-related research.
- It’s All in a Word
- Vivian Cook’s extensive work in applied linguistics also covers vocabulary acquisition. As part of a book he published in 2009, he created two vocabulary size tests based on frequencies from the British National Corpus. The basic test measures up to the most common 20,000 words and the advanced test measures beyond 150,000 words. Cook’s tests seem to use word types as the unit of measure.
- Lexxica’s V-Check
- The team at Lexxica, researchers Charles Browne and Brent Culligan along with their financial backer Guy Cihi, were probably the first to develop a truly interactive, usable, online vocabulary size test. It also has a facility for teachers to collect their students’ scores from a central location. The utility of their service has been an inspiration to VocabularySize.com and we commend their efforts in bringing true technological advances to the realm of vocabulary size measurement. Their website also has an extensive library which explains the challenges and techniques for estimating vocabulary size.
- Words & Tools’ Lemma test
- Unfortunately, this test no longer exists on the web, but a working copy can still be found at The Internet Archive. This test, based on the Collins Cobuild corpus, was created by Boo Hever and defines knowing a word as the ability to identify synonyms and/or associates. The test used to be available as a software package that would generate a new test each time. Hever was one of the true pioneers of vocabulary size measurement.
- Bruce Zhang’s Vocabulary Size Test
- Zhang’s test was created in 2002 and has been available online ever since. We’re not sure exactly how the test estimates vocabulary size, but some general details about the development of the test note that measures of word frequency are based on internet search engine results.
- English Vocabulary Tester
- We are not entirely sure who is behind this project, but the test seems to be based on Diack’s 1975 book, Test Your Own Wordpower.
- Frank Horace Vizetelly’s vocabulary size test
- Vizetelly’s test is one of many from the turn of the 20th century, when estimating vocabulary sizes was a trendy research topic. Many of these tests were built upon faulty sampling techniques and tended to overestimate vocabulary sizes.
Other vocabulary size tests
- 語彙数推定テスト (NTT Communication Laboratories’ Japanese vocabulary size test)
- This check list (yes/no) test is based on NTT’s database of word familiarity. It was developed in the late 1990s based on familiarity judgements by several hundred subjects. It is one of the few tests which use empirical evidence of knowledge. A description of the test, its creation, and its limitations is available in Japanese.
How is the VST different from other tests of vocabulary size?
The VST was created to address some of the shortcomings of other tests of vocabulary size. The Vocabulary Levels Test (VLT), for example, was originally created by Paul Nation in 1983 and later improved and validated by others (Beglar & Hunt in 1999; Schmitt, Schmitt, & Clapham in 2001). It was designed as a diagnostic test to guide teachers towards the types of words that might be most useful, yet lacking, in their students’ vocabulary. Many teachers and researchers have used the limited range of words tested in the VLT to estimate vocabulary size, contrary to its design. This is an unfortunate, but recurring, problem. Not only was the VLT not designed to measure vocabulary size, but there were also few accurate frequency lists at the time it was created, so the items at each level represent a certain amount of compromise, guessing, and intuition. Later analysis has shown that many of the items are not representative of the frequency levels they putatively represent.
Later, Meara & Jones, in 1987 and 1990, developed the Eurocentres Vocabulary Size Test 10ka (EVST) based on frequency counts from Thorndike and Lorge (1944). The format of the EVST is a yes/no test where the learner indicates whether a word is known or not known. Based on the answers, an estimated vocabulary size can be calculated. Although the EVST is useful, it doesn’t verify the degree to which the word is known. The VST differs from the VLT and the EVST in the following ways:
- The multiple-choice format verifies knowledge of each word on the test
- Each word is presented in a sentence which does not give any clues to the meaning
- The words are taken from the British National Corpus, which was compiled fairly recently.
- The words are sampled from published frequency lists based on word families, which are more appropriate when measuring receptive vocabulary knowledge (Bauer & Nation, 1993)
- Because the test items are selected from known, published lists, the actual words that are represented at each frequency level can be examined and further tested. If errors are found, corrections can be made and previous scores recalculated.
How can you calculate my vocabulary size by testing my knowledge of so few words?
There are many approaches to estimating vocabulary size, but most are flawed. Despite Thorndike pointing out some of the most common flaws as early as 1924 in The Vocabularies of School Pupils, they persist to this day. The most common, yet flawed, technique is simply to open a dictionary, browse through a number of words, and note what percentage of them are known. This percentage is then multiplied by the total number of entries in the dictionary to arrive at an estimate of how many words are known.
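To make the arithmetic of this naive technique concrete, here is a minimal sketch (the function name and all sample figures are invented for illustration):

```python
def naive_dictionary_estimate(known_in_sample: int, sample_size: int,
                              total_entries: int) -> int:
    """Naive estimate: the fraction of sampled entries known, scaled up
    to the whole dictionary. As the text explains, this is flawed
    because sampled entries are rarely representative of the
    dictionary as a whole."""
    proportion_known = known_in_sample / sample_size
    return round(proportion_known * total_entries)

# Knowing 30 of 100 sampled headwords in an 80,000-entry dictionary
# gives an (unreliable) estimate of 24,000 known words.
print(naive_dictionary_estimate(30, 100, 80_000))  # prints 24000
```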
The problem is that words are not randomly distributed throughout the dictionary, nor are they equally difficult or likely to be known. The VST avoids this problem by first arranging words into word families, which means that words such as nation, national, nationalise, and international are all considered to be members of the same family. Word families are used to avoid the over-counting that can occur when different forms of a word are given their own entry in a dictionary (Bauer & Nation, 1993). They are also more appropriate to use as a unit of counting when dealing with receptive word knowledge. The word families are then arranged in order of frequency, based on the fairly strong relationship between frequency and difficulty. Higher frequency words also tend to have more members in their word family than lower frequency words. Representative words are then sampled from this list at a rate of 1:100. Therefore, each item on the VST represents itself, the members of its word family, and 99 other word families which are roughly equivalent in terms of difficulty and word family size.
So by testing just 140 words, we can roughly estimate how many unique word families are known, up to a maximum of 14,000 word families. A more expansive test covering up to 30,000 word families is currently under development. The original test development and description can be found in Nation & Beglar (2007) A Vocabulary Size Test. A recent validation of the VST by Beglar (2010), A Rasch-based validation of the Vocabulary Size Test, also suggests that representative words can be sampled at a rate of up to 1:200 without any sacrifice of precision. In future, we hope to refine the test further by testing knowledge of approximately 50 items through a computer-adaptive test format.
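The scoring arithmetic described above can be sketched in a few lines (the function name is ours; the 1:100 sampling rate and 140-item maximum come from the text):

```python
VST_ITEMS = 140       # items on the standard test
SAMPLING_RATE = 100   # each item represents roughly 100 word families

def estimate_vocabulary_size(correct_answers: int,
                             sampling_rate: int = SAMPLING_RATE) -> int:
    """Each correct item stands for its own word family plus the other
    families of similar frequency and size it was sampled to
    represent, so the estimate is simply score * sampling rate."""
    return correct_answers * sampling_rate

# A perfect score maps to the test's 14,000-family ceiling.
print(estimate_vocabulary_size(VST_ITEMS))  # prints 14000
# 85 correct answers suggest roughly 8,500 word families.
print(estimate_vocabulary_size(85))         # prints 8500
```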
So, word frequency is pretty important?
Yes and no. Word frequency is not a perfect predictor of word difficulty, nor can it always tell us the importance of a word in a text, but it is one of the easiest and best ways to estimate how likely a given word is to be known. The assumption is that high frequency words are easier to understand because they are used most often and, because of their numerical dominance, more important to all aspects of language. This idea has been the basis of many studies for about 100 years. One of the earliest and most extensive compilations of word frequency for its time was Thorndike’s The Teacher’s Word Book (1921). Thorndike, and others of that time, routinely substituted the term most frequent for most important, though Thorndike himself noted that frequency…
…is not a perfect measure of the importance of words, for two reasons. First, a word may be very important for a pupil or graduate to know and yet not figure largely in the world’s reading. Second, tens of thousands of hours of further counting would be required to measure the frequency of occurrences of all these words with exactness. If a complete count were made, there would probably be several hundred words found more deserving of a place in the top ten thousand than some of these now included, and the order of the list would be somewhat changed. [iii–iv]
Fortunately, computers can now accomplish the task that was probably beyond Thorndike’s imagination and we now have accurate frequency counts for hundreds of thousands of words.
Even with these improved lists, however, Thorndike’s first qualification above still holds true. For example, words which are low frequency in general can still be relatively high frequency within certain topics or disciplines. These types of words are often called technical vocabulary, and some research, such as Chung & Nation (2003), suggests that technical vocabulary can be just as important as, or in some cases even more important than, high frequency words.
What's the history of the site?
This site started as a small weekend hobby project to make the VST more accessible to teachers and researchers. From there, it has grown into a full-scale word knowledge testing platform and research database thanks to the hard work of a very talented group of students at Victoria University of Wellington (Team STBC). We are affiliated with Victoria University of Wellington.
How can I help make this site better?
Many people have already volunteered to help make this site more user-friendly to a wide audience by translating the website and creating localised versions of the vocabulary tests, but we are always looking for more volunteer translators to make this site more accessible. There are several ways to help:
- interface: translating a few hundred words used for menus, instructions, and feedback
- information: translating or summarising several thousand words in the FAQ and other information-only pages
- localisation: translating the entire English VST or creating a vocabulary size test of another language.
If you want to offer some of your time and skill as a translator, please contact us.
We would also like to hear any ideas for new features and tests. The back-end, or framework, of this site was created to host a large range of vocabulary knowledge tests which can include the use of multiple languages as well as various formats such as text, graphics, video, and more. In turn, the data collected will be used in future to provide learner-calibrated estimates of word difficulty which can be used to analyse word lists or texts. We also hope to store other properties of words from various frequency lists and databases to create a research platform which facilitates quick comparisons across frequency lists and extensive word tagging capabilities. If you have any suggestions or want to offer some of your time and skill to help in this project, please contact us.