That word was not found. Please try another, or change the character set in the menu.
Click or tap any hanzi, anywhere, or search in Chinese or English.
You're viewing Simplified characters. Check the menu for more options, or check out the Japanese version.
This is free and open source software. Check out the code on GitHub.
Just interested in how characters are composed? Check out the components tool.
Click a bar in the chart or a box in the calendar for details. In the calendar, brighter colors mean more studying or more cards added. Green: 75% correct or better. Blue: between 50% and 75%. Orange: between 25% and 50%. Red: less than 25% correct.
This site is a prototype, but it's reasonably usable in its current state. Feel free to look at the code (currently of hackathon-level quality) or contact the author on GitHub.
The idea is to emphasize the word-forming connections among hanzi to help learners remember them. I've found this more fun and effective than other methods, like studying stroke order, learning radicals or components, writing each character out 100 times, or doing spaced repetition on cards mapping hanzi to pinyin and English.
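One way to picture these word-forming connections is as a graph in which each hanzi is a node and two hanzi are linked whenever some word contains both. The Python sketch below illustrates that idea under simple assumptions: the word list is a tiny hypothetical sample, and `build_hanzi_graph` is an illustrative name, not a function from HanziGraph's code.

```python
from collections import defaultdict
from itertools import combinations


def build_hanzi_graph(words):
    """Map each hanzi to its neighbors, labeling every edge with the words that link them."""
    graph = defaultdict(lambda: defaultdict(set))
    for word in words:
        # Deduplicate so reduplicated words like 谢谢 don't create self-links.
        for a, b in combinations(sorted(set(word)), 2):
            graph[a][b].add(word)
            graph[b][a].add(word)
    # Convert to plain dicts so lookups of missing keys raise instead of silently adding nodes.
    return {char: dict(neighbors) for char, neighbors in graph.items()}


# Hypothetical sample vocabulary.
graph = build_hanzi_graph(["中国", "中文", "国家", "家人", "人口"])
print(sorted(graph["国"]))  # ['中', '家']
print(graph["国"]["中"])    # {'中国'}
```

Starting from 国, a learner can hop to 家 via 国家 and then to 人 via 家人, which is the kind of path the diagram invites you to explore.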
The site is a progressive web app. This means it uses modern browser APIs to make an installable app. Follow the directions for your platform to install it. A truly native app downloadable from the app stores may be a future work item.
The examples came from Tatoeba, which releases data under CC-BY 2.0 FR, and from OpenSubtitles, pulled from opus.nlpl.eu.
Definitions and pinyin transcriptions of individual words were pulled from CEDICT, which releases data under CC BY-SA 4.0. Accordingly, some of the files in the data directory should be considered released under that same license.
That depends on which character set you choose. The simplified and traditional sets should include everything present in CEDICT. The Cantonese set should also include everything in the CC-Canto project. The HSK set should have all the old HSK 2.0 words and characters. Ping me on GitHub with any issues. More examples and definitions will be added in the future.
When you add words to your study list, they will be presented to you as flashcards. You'll be shown a sentence and asked what it means; click "Show Answer" to see its translation. When you click "I didn't know that", the card is added back to the end of your to-study list. When you click "I knew that!", it will be shown again one day later, then two days later if you get it right again, then four, and so on. It is meant to be a very, very basic spaced repetition system.
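The scheduling rule described above can be sketched in a few lines of Python. This is an illustration, not the site's code: `Card` and `answer` are hypothetical names, and the page doesn't say what happens to a card's interval after a miss, so the sketch assumes a miss restarts the doubling sequence at one day.

```python
from collections import deque
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Card:
    text: str
    interval_days: int = 0  # 0 means no correct answer yet (or since the last miss)
    due: date = field(default_factory=date.today)


def answer(queue, knew_it, today):
    """Grade the card at the front of today's study queue and reschedule it."""
    card = queue.popleft()
    if knew_it:
        # "I knew that!": show it again in 1 day, then 2, 4, 8, ... days.
        card.interval_days = 1 if card.interval_days == 0 else card.interval_days * 2
        card.due = today + timedelta(days=card.interval_days)
    else:
        # "I didn't know that": back to the end of the to-study list.
        # Assumption: a miss also restarts the doubling sequence.
        card.interval_days = 0
        queue.append(card)
    return card


today = date(2024, 1, 1)
queue = deque([Card("我很好。"), Card("你呢？")])
print(answer(queue, knew_it=True, today=today).due)  # 2024-01-02
answer(queue, knew_it=False, today=today)
print([card.text for card in queue])  # ['你呢？']
```

Doubling keeps the logic trivial to reason about; fuller systems such as Anki also adjust intervals based on how easy each answer felt.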
The export button downloads a file that can be imported into a different (better) spaced repetition system, like Anki.
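The page doesn't specify the export format. Anki can import plain-text files whose fields are separated by tabs, so a minimal sketch, assuming a simple two-column layout of sentence and translation, might look like this (`export_cards_tsv` is a hypothetical name, not part of HanziGraph):

```python
import csv
import io


def export_cards_tsv(cards, out):
    """Write (front, back) pairs as tab-separated lines, one card per line."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for front, back in cards:
        writer.writerow([front, back])


buffer = io.StringIO()
export_cards_tsv([("我很好。", "I'm fine."), ("你呢？", "And you?")], buffer)
print(buffer.getvalue())
```

To write a real file instead of a buffer, open it with `open("cards.txt", "w", encoding="utf-8", newline="")` and pass that file object to the function.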
If you are signed in, the data is stored on our servers, and synced across any other device where you sign in. If you are not signed in, all data for the site is stored in localStorage. It does not leave your browser, and clearing your browser data will clear it.
As you search, click, or tap hanzi or connections in the diagram, you are shown example sentences. Then, when you add words to your study list, the examples are converted to flashcards.
This section shows how many times you've viewed examples for each of the characters in a given word, and how many cards contain those characters. The numbers reflect how things were when you viewed the examples, so if it's your first time seeing examples for a character, it'll say "seen 0 times".
In most languages, some words are used much more frequently than others. If you learn those words first, you'll understand more of what you hear and read than if you started with less common words. With Chinese, the same is true of characters: the most common ones account for an outsized share of all usage, so they are the best ones to start with.
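One way to make this concrete is to measure coverage: the share of all character occurrences in a text that its N most common characters account for. The Python sketch below uses a tiny made-up text rather than HanziGraph's data, and treats only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) as hanzi, which is a simplification.

```python
from collections import Counter


def coverage_of_top_n(text, n):
    """Fraction of hanzi occurrences in `text` covered by its n most frequent hanzi."""
    counts = Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(count for _, count in counts.most_common(n)) / total


sample = "我是学生。我是老师。他是我的朋友。"
for n in (1, 2, 5):
    print(n, round(coverage_of_top_n(sample, n), 2))
```

Here the two most common characters, 我 and 是, already cover 6 of the 14 character occurrences.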
HanziGraph tries to help learners know how important a word is via color-coding in the diagrams and by surfacing raw frequency stats alongside the definitions and examples. This way, learners can concentrate on words that provide the biggest 'bang for your buck', so to speak.
Both word and character frequency data are based on analysis of millions of lines of subtitles, Wikipedia articles, UN declarations, and website text. Particularly for words, the subtitles are given priority, since they tend to be more colloquial.
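The page doesn't say how subtitles are given priority. As a purely hypothetical illustration, one simple scheme multiplies subtitle counts by a weight before merging them with the other corpora and ranking the result; `SUBTITLE_WEIGHT`, its value, and `frequency_ranks` are assumptions, not HanziGraph's actual settings.

```python
from collections import Counter

SUBTITLE_WEIGHT = 2.0  # hypothetical; the real weighting is not documented here


def frequency_ranks(corpus_counts):
    """Merge per-corpus counts and return {word: rank}, where rank 1 is the most frequent."""
    merged = Counter()
    for corpus, counts in corpus_counts.items():
        weight = SUBTITLE_WEIGHT if corpus == "subtitles" else 1.0
        for word, count in counts.items():
            merged[word] += weight * count
    # Sort by descending weighted count, breaking ties alphabetically for stability.
    ordered = sorted(merged, key=lambda w: (-merged[w], w))
    return {word: rank for rank, word in enumerate(ordered, start=1)}


ranks = frequency_ranks({
    "subtitles": Counter({"好": 50, "我们": 30}),
    "wikipedia": Counter({"的": 80, "国家": 40}),
})
print(ranks)
```

A fuller version would probably normalize each corpus by its size first; otherwise the largest corpus dominates regardless of the weight.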
The flow diagrams are Sankey diagrams. They were generated by analyzing which words are most commonly used before and after the search term. Specifically, the top collocations of length 2 and 3 are shown. You can read the diagram itself from left to right, with taller bars meaning a word was more commonly used. The analysis was done on movie and TV subtitles, so (in theory) the diagram represents colloquial speech. You can click any of the words to see examples for it, much like the graph diagram.
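The counting step behind such a diagram can be sketched as follows. This assumes the subtitles have already been segmented into words (Chinese text has no spaces, so a word segmenter would be needed first), and `count_collocations` is a hypothetical name; the site's actual pipeline may differ.

```python
from collections import Counter


def count_collocations(sentences, term):
    """Count the 2- and 3-word sequences that contain `term`.

    `sentences` is a list of already-segmented sentences (lists of words).
    """
    counts = Counter()
    for words in sentences:
        for size in (2, 3):
            for i in range(len(words) - size + 1):
                gram = tuple(words[i:i + size])
                if term in gram:
                    counts[gram] += 1
    return counts


sentences = [["我", "喜欢", "你"], ["我", "喜欢", "猫"], ["他", "喜欢", "你"]]
counts = count_collocations(sentences, "喜欢")
print(counts.most_common(4))
```

The most common sequences, such as 我 喜欢 and 喜欢 你 here, would become the tallest bars, with words before the term on the left and words after it on the right.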
The source of each sentence is shown. Human-written sentences are greatly preferred, so most sentences come from Tatoeba. OpenAI's gpt-3.5-turbo model and the OpenSubtitles dataset on Opus were used to fill in the gaps. We're always on the lookout for other datasets. Please feel free to report anything weird or inappropriate.
In addition to the source, the average frequency rank of the characters in the sentence is shown, with emojis indicating how common (🔥🔥🔥) or uncommon (🥶🥶🥶) the average is. More emojis mean a more extreme average, whether more common or rarer. The emoji counts were based on the distribution of these averages across all sentences. The idea is to indicate how difficult a sentence is likely to be.
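A rough sketch of both steps follows. The thresholds are assumptions: the page only says the emoji counts were based on the distribution of averages, so this version hypothetically uses deciles, and the `unknown_rank` default for characters missing from the rank table is likewise made up.

```python
from statistics import mean, quantiles


def average_rank(sentence, char_rank, unknown_rank=10_000):
    """Mean frequency rank of the hanzi in `sentence` (rank 1 = most common)."""
    ranks = [char_rank.get(ch, unknown_rank) for ch in sentence if "\u4e00" <= ch <= "\u9fff"]
    # No hanzi at all: fall back to the unknown rank.
    return mean(ranks) if ranks else float(unknown_rank)


def emoji_label(avg, cuts):
    """Map an average rank to emojis using 9 ascending decile cut points."""
    if avg <= cuts[0]:
        return "🔥🔥🔥"
    if avg <= cuts[1]:
        return "🔥🔥"
    if avg <= cuts[2]:
        return "🔥"
    if avg >= cuts[8]:
        return "🥶🥶🥶"
    if avg >= cuts[7]:
        return "🥶🥶"
    if avg >= cuts[6]:
        return "🥶"
    return ""


# Hypothetical averages for a whole sentence collection.
all_averages = list(range(1, 101))
cuts = quantiles(all_averages, n=10)
print(emoji_label(5, cuts), emoji_label(50, cuts), emoji_label(95, cuts))
```

Sentences in the middle of the distribution get no emoji, so only unusually easy or unusually hard sentences stand out.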