Chinese characters

Chinese characters
	Script error: No such module "Infobox multi-lingual name".

Chinese characters;
Type	Logographic
Languages	Chinese; Japanese; Korean; Ryūkyūan; Vietnamese; Zhuang; Miao; Hachijō; Template:Midsizediv
Time period	c. 13th century BCE – present
Parent systems	(Proto-writing) Oracle bone script Chinese characters; ;
Child systems	Zhuyin; Kana; Yi script; Khitan small script; Nüshu; Jurchen script; Tangut script;
Direction	Left-to-right
ISO 15924	Hani, 500
Unicode alias	Han
Unicode range	U+4E00–U+9FFF; CJK Unified Ideographs; Template:Midsizediv;
	This article contains IPA phonetic symbols. Without proper rendering support, you may see question marks, boxes, or other symbols instead of Unicode characters. For an introductory guide on IPA symbols, see Help:IPA.

Short description: Logographic writing system

Chinese characters^{[lower-alpha 1]} are logographs used to write the Chinese languages and others from regions historically influenced by Chinese culture. Chinese characters have a documented history spanning over three millennia, representing one of the four independent inventions of writing accepted by scholars; of these, they comprise the only writing system continuously used since its invention. Over time, the function, style, and means of writing characters have evolved greatly. Informed by a long tradition of lexicography, modern states using Chinese characters have standardised their forms and pronunciations: broadly, simplified characters are used to write Chinese in mainland China, Singapore, and Malaysia, while traditional characters are used in Taiwan, Hong Kong, and Macau.

After being introduced in order to write Literary Chinese, characters were later adapted to write the languages spoken in other countries throughout the Sinosphere. In Japanese, Korean, and Vietnamese, Chinese characters are known as kanji, hanja, and Template:Langr respectively. Each of these countries used existing characters to write both native and Sino-Xenic vocabulary, and created new characters for their own use. These languages each belong to separate language families, and generally function differently from Chinese. This has contributed to Chinese characters largely being replaced with alphabets in Korean and Vietnamese, leaving Japanese as the only major non-Chinese language still written with Chinese characters.

Unlike in alphabets, where letters correspond to a language's units of sound, called phonemes—Chinese characters correspond to morphemes, a language's smallest units of meaning. Writing systems that function this way are known as logographies. In Chinese, morphemes are usually single syllables, characters may represent multi-syllable words when writing other languages.^{[lower-alpha 2]} Characters are not ideographic, as they correspond to spoken morphemes, but not to the abstracted ideas themselves. Most characters are made of smaller components that may provide information regarding the character's meaning or pronunciation.

Development

Chinese characters are accepted as representing one of four independent inventions of writing in human history.^{[lower-alpha 3]} According to Qiu Xigui, in each instance writing evolved from a system using two distinct types of ideographs. Ideographs could either be pictographs visually depicting objects or concepts, or fixed signs representing concepts only by shared convention. These systems are classified as proto-writing, because the techniques they used were insufficient to carry the meaning of spoken language by themselves.^[3]

Qiu notes various innovations that were required for Chinese characters to emerge from proto-writing. Firstly, pictographs became distinct from simple pictures in use and appearance: for example, the pictograph Template:Hani, meaning 'large', was originally a picture of a large man, but one would need to be aware of its specific meaning in order to interpret the sequence Template:Hani as signifying 'large deer', rather than being a picture of a large man and a deer next to one another. Due to this process of abstraction, as well as to make characters easier to write, pictographs gradually became more simplified and regularized—often to the extent that the original objects represented are no longer obvious.^[4]

The severe limitations of this system compelled an innovation which allowed spoken language to be encoded directly in the written symbols.^[5] In each historical case, this was accomplished by some form of the rebus technique, where the symbol for a word is used to indicate a different word with a similar pronunciation, depending on context. This allowed for words that lacked a plausible pictographic representation to be written down for the first time. This technique, called jiajie (假借) in Chinese, preempted more sophisticated methods of character creation that would further expand the lexicon. The process whereby writing emerged from proto-writing took place over a long period; when the purely pictorial use of symbols disappeared, leaving only those representing spoken words, the process was complete.^[6]

Classification

Chinese characters have been used in several different writing systems throughout history. The concept of a writing system includes the written symbols that are used, called graphemes—these may include characters, numerals, or punctuation—as well as the rules by which the graphemes are used to record language.({{{1}}}, {{{2}}}) Chinese characters are logographs, graphemes that denote words or morphemes of the language. Writing systems that use logographs are called logographies, as contrasted with alphabets and syllabaries, where graphemes correspond to the phonetic units in a language.^[7] In special cases, characters may correspond to non-morphemic syllables; due to this, written Chinese is often characterised as morphosyllabic.^[8]^{[lower-alpha 4]}

The Sinosphere has a long tradition of lexicography attempting to explain and refine the use of characters; for most of history, analysis revolved around a model first popularised in the 2nd-century Shuowen Jiezi dictionary.({{{1}}}, {{{2}}}) Newer models have since appeared, often attempting to describe both the methods by which characters were created, the characteristics of their structures, and the way they presently function.^[10]

Structural analysis

Most characters can be analysed structurally as compounds made of smaller components (偏旁; piānpáng), which may have their own functions. Phonetic components provide a hint to a character's pronunciation, and semantic components indicate some element of the character's meaning. Components that serve neither function may be classified as forms with no particular meaning, other than their presence distinguishing one character from another.^[11]

A straightforward structural classification scheme may consist of three pure classes of semantographs, phonographs and signs—having only semantic, phonetic, and form components respectively, as well as four classes corresponding to each possible combination of the three component types.({{{1}}}, {{{2}}}) According to Yang Runlu, of the 3,500 characters used frequently in Standard Chinese, pure semantographs are the rarest, accounting for about 5% of the lexicon, followed by pure signs with 18%, and semantic–form and phonetic–form compounds together accounting for 19%. The remaining 58% are phono-semantic compounds.^[12]

Qiu presents "three principles" of character formation, with semantographs describing all characters whose forms are wholly related to their meaning, regardless of the method by which the meaning was originally depicted, phonographs that include a phonetic component, and loangraphs encompassing existing characters that have been borrowed to write other words. He also acknowledges the existence of character classes that fall outside of these principles, such as pure signs.^[13]

Semantographs

Pictographs

Graphical evolution of pictographs

Template:Zhc

While relatively few in number, most of the earliest characters originated as pictographs, representational pictures of physical objects.({{{1}}}, {{{2}}}) In practice, their forms have become regularised and simplified after centuries of iteration in order to make them easier to write. Examples include 日 ('Sun'), 月 ('moon'), and 木 ('tree').^{[upper-alpha 1]}

As character forms developed, distinct depictions of various physical objects within pictographs became reduced to instances of a single written component.({{{1}}}, {{{2}}}) As such, what a pictogram is depicting is often not immediately evident, and may be considered as a pure sign without regard for its origin in picture-writing. However, if a character's use in compounds, such as 日 in Template:Zhc still reflects its meaning and is not phonetic or arbitrary, it can still be considered as a semantic component.^[14]

Due to the regularisation of character forms, individualised components may form part of a compound pictograph: For example, within a given character the component Template:Kxr often carries a meaning related to mouths, but within 高 ('tall')—a pictogram of a tall building—it instead depicts a window, ultimately lending to the character's meaning of 'tallness'. In another instance, the same 'mouth' radical depicts the lip of a vessel in the modern form of the pictogram 畐 ('full').^{[upper-alpha 2]}

Pictographs have often been extended from their original concrete meanings to take on additional layers of metaphor and synecdoche, which sometimes even displace the pictogram's original meaning. Over time, this process sometimes creates excess ambiguity between different senses of a character, which is then usually resolved by adding additional components to create new characters used for specific senses. This can result in new pictographs, but usually results in other character types.^[15]

Indicatives

Also called simple ideographs, characters in this small category represent abstract concepts that lack concrete physical forms, but nonetheless can be depicted visually in an intuitive way. Examples include 上 ('up') and 下 ('down')—these characters originally had forms consisting of dots placed above and below a line, which later evolved into their present forms, which have less potential for graphical ambiguity in context.^[16] More complex indicatives include 凸 ('convex'), 凹 ('concave'), and 平 ('flat and level').^[17]

Compound ideographs

Also referred to as logical aggregates, associative idea characters, or syssemantographs, characters in this class are formed by combining two or more pictographs or ideographs to suggest a new, synthetic meaning. The canonical example is 明 ('bright'), often interpreted as the juxtaposition of the two brightest objects in the sky: 日 ('Sun'), and 月 ('moon'), together expressing their shared quality of brightness. Though the historicity of this particular etymology has been contested in recent scholarship, it is definitively a canonical reading: for example, the common compound word 明白 means 'understanding', touching on the derived association of 明 with 'illumination'. The addition of the abbreviated Template:Kxr radical on top results in the compound ideograph Template:Zhc, alluding to the heliotropic behaviour of plant life. Other commonly cited examples include Template:Zhc, composed of pictographs Template:Kxr and Template:Kxr, and Template:Zhc, composed of Template:Kxr and Template:Kxr.^{[upper-alpha 3]}

Many traditional examples of compound ideographs are now believed to have actually originated as phono-semantic compounds, made obscure by subsequent changes in form.^[18] Peter Boodberg and William Boltz go so far as to deny that any compound ideographs were devised in ancient times, maintaining that "secondary readings" that are now lost are responsible for the apparent absence of phonetic indicators,^[19] but their arguments have been rejected by other scholars.^[20]

Compound ideographs are common in kokuji, characters originally coined in Japan. An example of a modern compound ideograph used in written Chinese is 砼 ('concrete'), which combines the Template:Kxr, Template:Kxr, and Template:Kxr radicals.^{[upper-alpha 4]}

Phonographs

Phono-semantic compounds

These characters are composed of at least one semantic component and one phonetic component.^[21] They may be formed by one of several methods, often a phonetic component added to disambiguate a loangraph or a semantic component added to represent an extended sense of the original character. A compound's phonetic component may have been selected as to indicate an additional layer of meaning to the character as a whole. As a result, determining whether a given character is a phono-semantic compound or a ideographic compound is often non-trivial.^[22]

Examples of phono-semantic compounds include Template:Zhc, Template:Zhc, Template:Zhc, Template:Zhc, and Template:Zhc. On the left-hand side of each, these characters have three short strokes: Template:Kxr, a reduced form of the Template:Kxr radical. In these cases, this indicates to the reader that the meaning of each character is related to the concept of "water". The remainder of each character is the phonetic component: Template:Zhc is pronounced identically to Template:Zhc in Standard Chinese, Template:Zhc is pronounced similarly to Template:Zhc, and Template:Zhc is pronounced similarly to Template:Zhc.^{[lower-alpha 5]} While the discrepancy in pronunciation for these examples is rather tame, the accumulation of sound changes over time often results in a character's composition being totally arbitrary to a modern reader.

While the phonetic components within some compounds do precisely relate the pronunciation, most only provide an approximation, even before the emergence of any later sound changes. Some may only share the initial or final sounds of their phonetic components.^[25] The table below lists characters that each use 也 for their phonetic part—save the final one, which uses a previous character in the list—it is apparent that none of them share its modern pronunciation. The Old Chinese pronunciation of 也 has been reconstructed by Baxter and Sagart (2014) as /*lAjʔ/, similar to that for each compound.^[26] The table illustrates the sound changes that have taken place since the Shang and Zhou dynasties, when most of the characters in question entered the lexicon. For a modern reader, the resulting drift is such that the phonetic component no longer provides any hint as to each character's pronunciation.^[27]

Phono-semantic compounds sharing phonetic component 也
Char.	Gloss^{[lower-alpha 6]}	Component		OC^{[lower-greek 1]}	MC^{[lower-greek 2]}	Modern^{[lower-greek 3]}
Char.	Gloss^{[lower-alpha 6]}	Sem.	Phon.	OC^{[lower-greek 1]}	MC^{[lower-greek 2]}	Mandarin	Cantonese	Japanese
也	Template:Gcl	N/A^{[lower-alpha 7]}		/*lAjʔ/	yae^X	yě [jè]	jaa⁵ [ja̬ː]	ya ja
池	'pool'	水 (氵) 'water'	也 /*lAjʔ/	/*Cə.lraj/	drje	chí [ʈʂʰǐ]	ci⁴ [tsʰȉː]	chi ja
馳	'gallop'	馬 'horse'		/*[l]raj/	drje	chí [ʈʂʰǐ]	ci⁴ [tsʰȉː]	chi ja
弛	'loosen'	弓 'bow'		/*l̥ajʔ/	sye^X	chí [ʈʂʰǐ] shǐ [ʂì]	ci⁴ [tsʰȉː]	chi ja shi ja
施	'set up'	㫃 'flag'		/*l̥aj/	sye	shī [ʂí]	si¹ [síː]	se ja shi ja
地	'ground'	土 'earth'		/*[l]ˤej-s/	dij^H	dì [tî]	dei⁶ [tèi]	ji ja chi ja
他㐌	Template:Gcl	人 (亻, 𠂉) 'person'		/*l̥ˤaj/	tha	tā [tʰá]	taa¹ [tʰáː]	ta ja
她	Template:Gcl	女 'female'		N/A^{[lower-alpha 8]}		tā [tʰá]	taa¹ [tʰáː]	N/A^{[lower-alpha 8]}
拖	'drag'	手 (扌) 'hand'	㐌 /*l̥ˤaj/	/*l̥ˤaj/	tha^H	tuō [tʰwó]	to¹ [tʰɔ́ː]	ta ja da ja

This method is still used to form new characters: for example Template:Zhc is the Template:Kxr radical plus the phonetic Template:Zhc—described in Chinese as "不 gives sound, 金 gives meaning". Many Chinese names for chemical elements and other characters related to chemistry were formed in this way.^[28]

Loangraphs

The phenomenon of an existing character for a word being used to write another homophonous or nearly-homophonous word was necessary to the emergence of the Chinese writing system, and it has remained common in the writing system ever since. Some loangraphs may represent words that have never been written another way—this is often the case with abstract grammatical particles such as 之 and 其—but this is not always so.^[29]

Loangraphs are also used to write words borrowed from other languages, such as the various Buddhist terminology introduced to China in antiquity, as well as contemporary non-Chinese words and names. For example, in the name Template:Zhc, each character is commonly used as a loangraph for its respective syllable. However, the barrier between a character's pronunciation and meaning is never total: when transcribing into Chinese, loangraphs are often chosen deliberately as to create certain connotations. This is regularly done with corporate brand names: for example, Coca-Cola's Chinese name is Template:Zhc, with the loangraphs selected as to possess a plausible meaning of "delicious and enjoyable".^[30]

Signs

Some characters and components are merely signs, whose meaning purely derives from their having a fixed, distinctive form. Basic examples of pure signs are found with the numerals beyond four, e.g. Template:Zhc and Template:Zhc, whose forms do not give visual hints to the quantities they represent.({{{1}}}, {{{2}}})

Traditional Shuowen Jiezi classification

The Shuowen Jiezi is a character dictionary authored by the scholar Xu Shen c. 120 CE. In its postface, Xu analyses what he sees as all the methods by which characters are created, introducing a categorisation scheme which would later become known as the Template:Zhp. Mature formulations of this scheme stated that every character belonged to one of six categories, each mentioned with varying emphasis in the Shuowen Jiezi. For nearly two millennia afterwards, this framework would serve as the traditional lens through which characters were analysed throughout the Sinosphere.^[31] Xu based most of his analysis on examples of Qin seal script that were written down several centuries before his time—these were usually the oldest forms available to him, but Xu stated that he was aware of the existence of even older forms.^[32]

Modern scholars agree that the theory presented in the Shuowen Jiezi is problematic, failing to fully capture the nature of Chinese writing, both in the present, as well as at the time Xu was writing.^[33]^[34] The traditional Chinese lexicography as embodied in the Shuowen Jiezi presupposes either a phonetic or semantic purpose for every character component, providing implausible etymologies for characters later accepted as being pure signs.^[35]^[36] However, the model has proven resilient, and it continues to serve as a guide for students in the process of memorising characters. One of the most important innovations contained in the Shuowen Jiezi is its grouping of a particular component considered to be of particular structural importance called a radical. Over 500 radicals are recognised within the Shuowen Jiezi—while this number would be reduced substantially in future dictionaries, the underlying concept would remain ubiquitous.({{{1}}}, {{{2}}})

History

According to Qiu Xigui, the broadest trend in the evolution of Chinese characters over their history has been simplification, both in graphical shape (字形; zìxíng), the "external appearances of individual graphs", and in graphical form (字体; 字體; zìtǐ), "overall changes in the distinguishing features of graphic[al] shape and calligraphic style, [...] in most cases refer[ring] to rather obvious and rather substantial changes".^[37]

Traditional invention narrative

Several works of Classical Chinese literature indicate that knotted cords were used to keep records prior to the invention of writing.^[38]^[39] Works that reference the practice include chapter 80 of the Tao Te Ching^[40] and the "Xici II" chapter within the I Ching.^[41]

According to tradition, Chinese characters were invented during the 3rd millennium BCE by Cangjie, a scribe of the legendary Yellow Emperor. Cangjie is said to have invented symbols called Template:Zhc due to his frustration with the limitations of knotting, taking inspiration from his study of animals, landscapes, and the stars in the sky. On the day that these first characters were created, grain rained down from the sky; that night, the people heard the wailing of ghosts and demons, lamenting that humans could no longer be cheated.^[42]

Neolithic

In recent decades, a series of inscribed graphs and pictures have been found at Neolithic sites in China, including Jiahu (c. 6500 BCE), Dadiwan and Damaidi from the 6th millennium BCE, and Banpo (5th millennium BCE). Often these finds are accompanied by media reports that push back the purported beginnings of Chinese writing by thousands of years.^[43]^[44] However, because these marks occur singly without any implied context and are made crudely, Qiu Xigui concludes that "we do not have any basis for stating that these constituted writing nor is there reason to conclude that they were ancestral to Shang dynasty Chinese characters."^[45] However, they do demonstrate a history of sign use in the Yellow River valley from the Neolithic through to the Shang period.^[44]

Oracle bone script

The earliest known examples of writing directly ancestral to modern characters are a body of inscriptions made on bronze vessels and oracle bones during the late Shang dynasty (c. 1250 – 1050 BCE),^[46]^[47] with the very oldest dated to c. 1200 BCE.^[48]^[49] Oracle bones and the script they bore were first documented by modern scholars in 1899, after examples were discovered being sold as "dragon bones" for medicinal purposes, with the symbols carved into them identified as being Chinese writing. By 1928, the source of the bones had been traced to a village near Anyang in Henan, which was excavated by the Academia Sinica between 1928 and 1937. To date, over 150,000 such fragments have been found.^[46]

Oracle bone inscriptions are records of divinations performed in communication with royal ancestral spirits.^[46] The inscriptions range from a few characters in length at their shortest, to around 40 characters at their longest. The Shang king would communicate with his ancestors by means of scapulimancy, inquiring about subjects such as the royal family, military success, and weather forecasting. The interpreted answers would be recorded on the divination material itself.^[46]

Oracle bone script is a well-developed writing system,^[50]^[51] suggesting that the Chinese script's origins may lie earlier than the late second millennium BCE. Although these divinatory inscriptions are the earliest surviving evidence of ancient Chinese writing, it is widely believed that writing was used for many other non-official purposes, but that the materials upon which non-divinatory writing was done—likely on wood and bamboo—were less durable than bones and shells, and have since decayed away.^[52]

Zhou scripts

The traditional notion of an orderly procession of scripts, with each suddenly invented and displacing the one previous, has been conclusively superseded by modern archaeological finds and scholarly research. More often, two or more scripts coexisted in a given area, and scripts evolved gradually. As early as the Shang dynasty, oracle bone script existed as a simplified form alongside the normal script found in bamboo books, since preserved in bronze inscriptions, as well as the elaborate pictorial forms—often clan emblems—found on many bronzes.^[53]

Based on studies of these bronze inscriptions, it is clear that the mainstream script evolved in a slow, unbroken fashion from the Shang to the Zhou dynasty, until assuming the form that is now known as small seal script in the state of Qin, without any sudden shifts.^[54]^[55]

Other scripts had evolved during the late Zhou, especially in eastern and southern regions. These include decorative scripts such as the bird-worm seal script, and the regional 'ancient' forms of eastern Zhou states, preserved as variant forms in the Shuowen Jiezi.

Qin unification and small seal script

Small seal script, which had evolved conservatively in the state of Qin during the Eastern Zhou, became standardised as the orthographic convention used throughout all of China by the imperial Qin dynasty. However, more than one script was in use at the time: a little-known, rectilinear, 'vulgar' form of the characters had coexisted alongside the more formal seal script for centuries in the Qin state; the popularity of this vulgar form grew as the practice of writing itself became more widespread.^[56] An immature form of clerical script called "early clerical" or "proto-clerical" had already developed by the Warring States period in the state of Qin^[57] based upon this vulgar form, with influence from seal script as well.^[58] The coexistence of the three scripts—small seal, vulgar and proto-clerical, with the latter evolving gradually into clerical script—runs counter to the traditional belief that the Qin dynasty only used one script, and that the clerical script was suddenly invented during the early Han.

Han clerical script

The proto-clerical script matured gradually, and by the early Han period its sophistication was comparable to small seal script.^[59] Recently discovered bamboo slips show the emergence of mature clerical script by the end of Emperor Wu of Han's reign in 141–87 BCE.^[60]

As in previous eras, multiple scripts were in use during the Han, although mature clerical script—also called Template:Zhc^[61]—was dominant. An early type of cursive script was also in use as early as 24 BCE,^{[lower-alpha 9]} incorporating cursive forms popular at the time, as well as elements from the vulgar writing that originated in Qin state. By the time of the Jin dynasty, this Han cursive style became known as Template:Zhc, sometimes known in English as 'clerical cursive', 'ancient cursive', or 'draft cursive'. Some believe this name, which uses the character Template:Zhc, arose because the style was considered by the Jin to be a more orderly form than what would become the modern form of cursive, called Template:Zhc, which had first emerged during the Jin and is still used today.^[62]

Neo-clerical

Around the midpoint of the Eastern Han, a simplified and easier form of clerical script appeared, which Qiu terms Template:Zhl.^[63] By the end of the Han, this had become the dominant script used by scribes, though clerical script remained in use for formal works, such as engraved stelae. Qiu describes neo-clerical as a transitional form between clerical and regular script, remaining in use through the Three Kingdoms period and into the Jin dynasty.^[64]

Semi-cursive

By the late Han, an early form of semi-cursive script^[63] had begun developing from a cursive form of neo-clerical script.^{[lower-alpha 10]} This semi-cursive script was traditionally attributed to Liu Desheng (劉德升; c. 147 – 188 CE), although such attributions refer to early masters of a script rather than to their actual inventors, since the scripts generally evolved into being over time. Qiu provides examples of early semi-cursive script, lending credence to its having popular origins, rather than being solely Liu's invention.^[65]

Regular script

The innovations of regular script have traditionally been credited to Cao Wei calligrapher Zhong Yao (c. 151 – 230), often called the "father of regular script". The earliest surviving manuscripts written in regular script are copies of Zhong Yao's work, including at least one copied by Wang Xizhi, often called the "Sage of Calligraphy". Regular script developed out of a neatly written form of early semi-cursive, with the addition of a Template:Zhl technique to end horizontal strokes, plus heavy tails on strokes which are written the downward-right diagonal. Thus, early regular script emerged from a neat, formal form of semi-cursive, which had itself emerged from neo-clerical, a simplified, convenient form of clerical script. It developed further during the Eastern Jin in the hands of Wang Xizhi and his son Wang Xianzhi. However, the style was still not widely used, as most writers continued to use neo-clerical and semi-cursive styles in their daily writing, with the conservative clerical script also remaining in use on some stelae. Modern cursive script began to emerge during this time, exemplified by the example of calligraphers such as Wang. It was influenced by semi-cursive, as well as the new regular style.^[66]

It was not until the Northern and Southern period that the use of regular script became dominant.^[67] Thereafter, the style would continue to evolve, with some regarding Ouyang Xun as having produced the first mature examples of the form during the early Tang dynasty. After this point, there would not be another major stylistic shift in Chinese character forms outside of calligraphic contexts.

Structure

Structural templates used in compounds, with red marking possible positions for radicals

Broadly, Chinese characters are rectilinear units of uniform width. Within the square allotted to each character, most are constructed from smaller components, which are in turn drawn with a series of strokes.^[68]^[69] Strokes can be considered both the basic unit of handwriting, as well as the basic unit of graphemic organisation within the system. Individual strokes are generally categorised according to technique and graphemic function, as exemplified by the Eight Principles of Yong. In the transition from seal to clerical script, many formerly bespoke, interlinked character components became discrete and regularised.^[70]^[71]

Characters are assembled according to predictable visual patterns, with some components usually not seen in certain positions within a character, and some taking distinct, visually congruous forms only when in a certain position—such as the Template:Kxr radical appearing as Template:Kxr on the right side of characters, but as Template:Kxr at the top of characters. Both the order in which strokes are drawn within a given component, as well as the order components are written in a character is largely fixed.^[72] This is summed up in practice with a few rules of thumb: generally components and characters are assembled from left-to-right, and from top-to-bottom, with 'enclosing' components started before, then closed after, the components they enclose.^[73]

For example, 字 is made up of two components, with each in turn composed of three strokes, drawn in the following order:

Character

Component

Stroke

宀

(1) [㇔] Error: {{Lang}}: Latn text/non-Latn script subtag mismatch (help)

(2) [㇔] Error: {{Lang}}: Latn text/non-Latn script subtag mismatch (help)

(3) [㇇] Error: {{Lang}}: Latn text/non-Latn script subtag mismatch (help)

子

(4) [㇇] Error: {{Lang}}: Latn text/non-Latn script subtag mismatch (help)

(5) [㇚] Error: {{Lang}}: Latn text/non-Latn script subtag mismatch (help)

(6) [㇐] Error: {{Lang}}: Latn text/non-Latn script subtag mismatch (help)

Variants and allographs

Variants of the Chinese character for 'turtle', collected c. 1800 from printed sources. The one at left is the traditional form used today in Taiwan and Hong Kong, 龜, though 龜 may look slightly different, or even like the second variant from the left, depending on font. The modern simplified forms used in China, 龟, and in Japan, 亀, are most similar to the variant in the middle of the bottom row, though neither is identical. A few more closely resemble the modern simplified form of the character 电; 'lightning'

Over a character's history, graphical variants with identical meanings called allographs emerge via several processes, possibly to facilitate ease of handwriting, or to create a more 'correct' composition to the writer, according to the principles generally used to compose and explain characters.^[74] For example, individual components may be replaced with visually-, phonetically-, or semantically similar alternatives.^[75]

The boundary between character structure and style, and thus between allographs of the same character versus semantically distinct characters, is often non-trivial or unclear.^[76]

Methods and styles

Ordinary handwriting on a lunch menu in Hong Kong

There are numerous styles, or Template:Zht in which characters can be written. Most that are used throughout the Sinosphere originated within China, but may have minor regional variations. Styles created outside China tend to remain localised in their use, these include the Japanese edomoji and the Vietnamese lệnh thư script.^[77]

Seal script is still used, though usually only in the seals that lend the style its name. Clerical and regular script styles are ubiquitous in print; semi-cursive styles are also common when writing by hand. Modern use of fully cursive script is limited due to being continuous and abbreviated to the point where individual strokes are no longer differentiable, though historically it has been revered for its beauty and the freedom it is seen to embody.

Calligraphy

Chinese calligraphy is usually done with ink brush, and was considered one of the four arts to be mastered by Chinese scholars. The set of rules is deliberately minimalist, but each character has a set number of brushstrokes. Strict regularity is not required, since strokes may be accentuated for dramatic effect of individual style. Calligraphy was considered a means by which scholars could artfully express their thoughts and teachings.^[78]

Printing and typefaces

'Song' typefaces (宋体; 宋體; sòngtǐ)—also called 'Ming', especially in Japan, Taiwan, and Hong Kong—are named for the respective periods whose printed styles are being imitated, considered to be periods during which woodblock printing flourished in China. Ming and sans-serif are the most popular in body text.

Sans-serif typefaces, called Template:Zhl in Chinese and 'Gothic' (ゴシック体) in Japanese, are characterised by simple lines of even thickness for each stroke, akin to sans-serif styles in Western typography.

Typefaces that emulate regular script are also common, but not as common as Ming or sans-serif typefaces in body text. Most typefaces in the Song dynasty were regular script typefaces, which resembled a particular calligrapher's handwriting, while most modern regular script typefaces tend toward general-purpose use.

Use with computers

Even before the advent of computers, the very first electromechanical input/output and text encoding methods to be designed were done so for use with alphabet-based writing systems, exemplified by the design of typewriters and the Morse code and ASCII standards. Adaptation of these technologies for use with a logography of thousands of characters was non-trivial.^[79]

Like English and other languages, Chinese characters are output on printers and screens in different fonts.^[80] In addition to the international system of measuring with points, Chinese characters are also measured by a unit called zihao (字号), first invented for Chinese printing in 1859.^[81]

Input methods

Predominantly, Chinese characters are input as strings of Latin characters, which enables the use of a standard keyboard. Phonetic encodings are usually based on existing transcription schemes, such as pinyin for Mandarin, and Jyutping for Cantonese. Writing a given character usually involves typing out its phonetic transcription, possibly followed by a number representing the tone: for example, Template:Zhc could be input as xiang1gang3 using pinyin, and as hoeng1gong2 using Jyutping.

Encodings may also be based on the form of characters. Using the existing rules of stroke order and how components are assembled into whole characters,^[82] characters may be assigned a more unique shorthand than its phonetic transcription using one of several methods, potentially increasing the speed of typing. Popular form-based encoding methods include Wubi on the mainland, and Cangjie—named after the mythological inventor of writing—in Taiwan and Hong Kong. For example, Template:Zhc is encoded as NGMWM using the Cangjie method, with each letter corresponding to the components 弓土一田一, with some omitted according to predictable rules.^[83]

Contextual constraints may be used to improve candidate character selection. When ignoring tones, 大学 and 大雪 are both transcribed as daxue, the system may prioritize which candidate should appear first based on the surrounding context.^[84]

Encoding and interchange

Text is represented digitally by a series of binary code points. Since there are potentially tens of thousands of characters that may see use,^[85] each requires its own encoding. In The Unicode Standard, which is the encoding now used for the majority of internet traffic worldwide, the Basic Multilingual Plane (BMP) is a sequence of 2¹⁶ code points: of these, most are assigned to Chinese characters, which are termed CJK Unified Ideographs by the standard.^[86] Before Unicode became predominant, the Chinese government published the GB2312 standard in 1980, which included 6,763 simplified characters. Of these, 3,755 frequently-used ones were ordered by pinyin, with the rest by radical indexing. The latest version of GB encoding is GB18030, which supports both simplified and traditional Chinese characters, and is completely one-to-one with the relevant segments of the Unicode codespace.^[87] The Big5 standard was jointly developed by five Taiwanese IT companies during the early 1980s, and remains the most widely used non-Unicode encoding for Chinese characters, being comparatively popular in Taiwan, Hong Kong, and Macau.

Vocabulary and adaptation

Writing first emerged during a stage of development in the Chinese language known as Old Chinese. In most cases, each character corresponds to a morpheme that was originally an independent Old Chinese word.^[88] However, in most modern varieties, many words are compounds of two or more morphemes, and are therefore written with several characters. In Japanese and Korean, morphemes are often multiple syllables, and as such single characters may represent several spoken syllables.^[89]

Classical Chinese is the form of written Chinese used in works of literature at the time Old Chinese was dying out. The style continued to be imitated by later authors—this subsequent form is referred to as Literary Chinese, which became entrenched as the spoken language diverged. The use of Literary Chinese was loosely analogous to that of Latin in pre-modern Europe; it remained the typical written form of the language until the 20th century, well after the spoken varieties had diverged. While it did not remain static over time, it retained many properties of spoken Old Chinese. Over time, with numerous sound mergers occurring throughout different varieties, the introduction of polysyllabic words increasingly served the function of reducing ambiguity between words that had since become homophonic.^[90] Today, it has been estimated that over two-thirds of the 3,000 most common words in modern Standard Chinese are polysyllables, with the vast majority of these being two-syllable words.^[91]

After the introduction of both Literary Chinese and the Chinese writing system to surrounding countries, local languages such as Korean, Japanese and Vietnamese eventually began to be written down as well, though the use of Literary Chinese remained predominant until the modern period. Characters were used for record-keeping, histories, and official communications in each of these languages.^[92] In these languages, Chinese characters have often been used to represent Chinese loanwords.^[93] Some characters were imported with similar pronunciations to those in the specific Chinese variety at the time of borrowing: these readings are known as Sino-Xenic pronunciations, and have been useful in the linguistic reconstruction of Middle Chinese.

Chinese characters were used in Vietnam during the millennium of Chinese rule that began in 111 BCE; they were adapted to write Vietnamese c. the 13th century, creating the Template:Langr script. Writing also arrived in Korea during the 2nd century BCE, alongside other cultural elements such as Buddhism; the practice of writing in Korea became widespread over the following three centuries. From Korea, writing spread to Japan during the 5th century CE.^[94]

Currently, the only non-Chinese language normally written with Chinese characters is Japanese. Vietnam abandoned the use of Template:Langr and Literary Chinese in the early 20th century in favour of a Latin alphabet, and Korea has largely replaced the use of hanja with hangul. Since education regarding Chinese characters is not mandatory in South Korea, the usage of hanja is rapidly disappearing.^[95]

Old Chinese

Line drawings of various ordinary objects such as books, baskets, buildings, and musical instruments are displayed beside their corresponding Chinese characters — Excerpt from a 1436 primer on Chinese characters

Words in Old Chinese were generally monosyllabic; as such, each character denoted an independent word.^[96] Affixes could be added to form a new word, which was often written with the same single character. In many cases, the pronunciations then diverged due to the systematic sound changes caused by the affixes. For example, many additional readings in modern varieties reflect the Middle Chinese 'departing tone', the major source of the 4th tone in modern Standard Chinese. Many scholars now believe that this Middle Chinese tone is the reflex of an Old Chinese derivational suffix /*-s/ called the qusheng 去聲 that served a range of semantic functions—possibly the only example of inflectional morphology extant in the otherwise analytic language.^[97]^[98] For example:

Character	OC^{[lower-greek 4]}		MC^{[lower-greek 2]}		mod.	Gloss
傳^[99]	*drjon	>	drjwen'	>	chuán (help·info)	'to transmit'
傳^[99]	*drjons	>	drjwen^H	>	zhuàn (help·info)	'a record'
磨^[99]	*maj	>	ma	>	mó (help·info)	'to grind'
磨^[99]	*majs	>	ma^H	>	mò (help·info)	'grindstone'
宿^[100]	*sjuk	>	sjuwk	>	sù (help·info)	'to stay overnight'
宿^[100]	*sjuks	>	sjuw^H	>	xiù (help·info)	'celestial mansion'
説^[101]	*hljot	>	sywet	>	shuō (help·info)	'speak'
説^[101]	*hljots	>	sywej^H	>	shuì (help·info)	'exhort'

Another common sound change occurred between voiced and voiceless initials, though the phonemic voicing distinction has disappeared in most modern varieties. This is believed to reflect an Old Chinese de-transitivising prefix, but scholars disagree on whether the voiced or voiceless form reflects the original root. Each pair of examples below reflects two words of opposite transitivity.

Character	OC^{[lower-greek 4]}		MC^{[lower-greek 2]}		mod.	Gloss
見^[102]	*kens	>	ken^H	>	jiàn (help·info)	'to see'
見^[102]	*gens	>	hen^H	>	xiàn (help·info)	'to appear'
敗^[102]	*prats	>	pæj^H	>	bài (help·info)^{[lower-alpha 11]}	'to defeat'
敗^[102]	*brats	>	bæj^H	>	bài (help·info)^{[lower-alpha 11]}	'to be defeated'
折^[103]	*tjat	>	tsyet	>	zhé (help·info)	'to bend'
折^[103]	*djat	>	dzyet	>	shé (help·info)	'to be broken by bending'

Vernacular Chinese varieties

Multi-syllable words began entering the language during the Western Zhou period; it is estimated that between 25% and 30% of the vocabulary used in Warring States period texts is polysyllabic. The process has accelerated over the centuries as phonetic change has increased the number of homophones.^[104] The most common process of Chinese word formation after the Classical period has been to create compounds of existing words. Words have also been created by appending affixes to words, by reduplicating words, and by borrowing words from other languages.^[105] While polysyllabic words are generally written with one character per syllable, abbreviations are occasionally used.^[106]

In addition, there are a number of Template:Zhl that are not generally used in formal written Chinese but represent colloquial terms in various spoken varieties of the language. In general, it is common practice to use standard characters to transcribe previously unwritten words in Chinese dialects when obvious cognates exist. However, when no obvious cognate exists due to factors like irregular sound changes or semantic drift over time, or an origin in a non-Chinese language, like a substratum or loanword, then characters to transcribe it are borrowed according to the rebus principle, or invented in an ad hoc manner.^[107] These new characters are generally phono-semantic compounds, e.g. Min Nan 侬 ('person'), although there are examples of compound ideographs, e.g. northeast Mandarin 孬 ('bad').^{[citation needed]}

There may be several ways to write a dialectal word—often, one that is etymologically correct, and one or several that are based on the word's pronunciation—e.g. the etymological 觸祭 versus the phonetic 戳鸡 (⁷tshoq¹ci) in Shanghainese, meaning 'eat'. Speakers of a dialect will generally recognise a dialectal word if it is transcribed according to pronunciation, while the etymologically correct form may be more difficult to recognise.^{[citation needed]} For example, few Gan speakers would recognise the character 隑 as meaning 'to lean' in their dialect,^{[upper-alpha 5]} because this sense of the character is now archaic in Standard Mandarin.

In Taiwan, there is also a body of semi-official characters used to represent Taiwanese Hokkien and Hakka. An example of an Hakka vernacular character is 㓾 (cii¹¹, 'kill').^{[upper-alpha 6]} Other varieties of Chinese with a significant number of speakers—like Shanghainese Wu, Gan Chinese, and Sichuanese Mandarin—also have their own series of characters, but these are not often seen, except on advertising billboards directed toward locals and are not used in formal settings except to give precise transcriptions of witness statements in legal proceedings.^{[citation needed]} Standard Chinese is the preferred written language within every region of mainland China.

Japanese

In the Japanese writing system, Chinese characters used are known as kanji. Japanese historically borrowed many words from Chinese, which were written with their original characters, while native Japanese words were also written with orthographic borrowings of Chinese characters with similar meanings. Most kanji arrived via both borrowing processes, and thus have both native Japanese readings, known as kun'yomi, as well as Chinese-original readings, known as on'yomi. Moreover, Chinese words were often borrowed multiple times from different varieties and at different times, resulting in several distinct on'yomi readings for the same character.^[108] Modern Japanese uses kanji for most word stems, as well as hiragana and katakana, a pair of syllabaries collectively known as kana. The syllabaries were derived by simplifying Chinese characters selected to represent Japanese syllables; they differ from one another in part because each selected different characters for each syllable, and used different strategies to reduce the characters for easy writing. Katakana selected smaller components from each character, while hiragana were based on cursive forms of whole characters.^[109]

Due to Japanese being a synthetic language, many words consist of multiple syllables, and as such many kanji have multi-syllable pronunciations. For example, the kanji 刀 has a native kun'yomi reading of katana. In different contexts, it can also be read with the on'yomi reading tō, such as in the Chinese loanword 日本刀, nihontō, 'Japanese sword', whose pronunciation descends from the Chinese pronunciation at the time of borrowing. (In contemporary Standard Chinese, the word is pronounced rìběndào.) Loanwords prior to the Meiji era were typically written with unrelated kanji whose on'yomi had the same pronunciation as the syllables in the loanword. These spellings are called called ateji: for example, 亜米利加 was written for modern アメリカ, Amerika, 'America', 歌留多 or 加留多 for modern カルタ, karuta, 'card', 'letter', and 天婦羅 or 天麩羅 for modern テンプラ, tenpura, 'tempura'. Only some ateji spellings are still in common use, such as 缶, kan, 'can'.

Korean

As early as the Gojoseon period, Literary Chinese was the dominant form of written communication in Korea. Although the hangul alphabet was invented by the Joseon king Sejong in 1443, it was not taken up by Korean literati, and did not come into widespread use until the late 19th century.^[110]^[111] Even today, much of the Korean vocabulary, especially in areas of science and sociology, comes directly from Chinese. However, due to the lack of tones in the Korean language, many dissimilar Sino-Korean words took on identical pronunciations, and as such are spelled identically in hangul.^[112] For example, the phonetic dictionary entry for 기사, gisa yields more than 30 different entries. In the past, this ambiguity had been efficiently resolved by parenthetically displaying the associated hanja. While hanja are sometimes used for Sino-Korean vocabulary, their use for native Korean words is rare.

When learning to write hanja, students are taught to memorise a native Korean word with the same meaning and the Sino-Korean pronunciation for each character.^[113] Examples of listings include:

Hanja	Hangul		Gloss
Hanja	Native translation	Sino-Korean	Gloss
水	물, mul	수, su	'water'
人	사람, saram	인, in	'person'
大	큰, keun	대, dae	'big'
小	작을, jakeul	소, so	'small'
下	아래, arae	하, ha	'down'
父	아비, abi	부, bu	'father'
韓	나라 이름, nara ireum	한, han	'Korea'

South Korea

Hanja are still used in South Korea, particularly in newspapers, weddings, place names, and the practice of calligraphy—although to nowhere near the extent of kanji use in Japanese society. At present, Chinese characters are sometimes used for the disambiguation of homophonous words. Additionally, their use still possesses connotations of erudition and cultural Confucianism; knowledge of Chinese characters is considered to be a high class attribute by many Koreans, and an indispensable part of a classical education.^[111] There is a clear trend toward the exclusive use of hangul in ordinary South Korean contexts.^[114] The extent of hanja use has become a politically contentious issue in the country, with some seeing its total abandonment, including ending hanja education in schools, as a "purification" of the national language and culture. Others support returning to a level of ordinary hanja use previously seen during the 1970s and 80s.^[115] There are hanja that are used more widely, alongside its hangul counterpart, such as the word 'voice', with the hanja still being considered higher in register.^[116]

Policies regarding the teaching of hanja have historically vacillated, often swayed by the inclinations of individual education ministers. Students in grades 7–12 are presently taught 1,800 characters,^[115] albeit with a principal focus on simple recognition, with the aim of achieving newspaper literacy.^[111] Hanja retains its prominence in Korean academia, as the vast majority of Korean documents, history, and literature (such as the Veritable Records of the Joseon Dynasty) were written in Literary Chinese using hanja. Therefore, a working knowledge of Chinese characters is still important for anyone wishing to interpret and study older Korean texts, or anyone who wishes to read scholarship in the humanities. Working knowledge of hanja is also useful for understanding the etymology of Sino-Korean vocabulary.^[117]

North Korea

A 1949 law in North Korea apparently banned the use of all so-called foreign languages, which has been interpreted as including hanja. However, due to the country's isolation accurate reports about its use of hanja are difficult to obtain. A textbook for university history departments published in the country in 1971 contained 3,323 distinct characters, and in the 1990s North Korean school children were still expected to learn 2,000 characters, more than in South Korea or Japan.^[118] A 2013 textbook appears to integrate the use of hanja in secondary school education.^[119] Currently, North Korea is estimated to teach around 3,000 hanja to North Korean students by the time they graduate university; in some cases, the characters appear within advertisements and newspapers, but cultural use is narrower than in the South, mostly restricted to dictionaries and textbooks.^[120]

Okinawan

Chinese characters are thought to have been first introduced to the Ryukyu Islands in 1265 by a Japanese Buddhist monk.^[121] After the Okinawan kingdoms, which included the Ryukyu Kingdom, became tributaries of Ming China, Literary Chinese saw use in court documents, but popular writing and poetry largely used hiragana. After Ryukyu became a vassal of Japan's Satsuma Domain, Chinese characters became more popular, as well as the use of kanbun. Katakana and hiragana are usually used to write modern Okinawan, but Chinese characters are still used.

Vietnamese

[[File:Tale of Kieu parallel text.svg|thumb|right|upright=1.1|The first two lines of the classic Vietnamese epic poem The Tale of Kiều, written in both Template:Langr and the Vietnamese alphabet

Borrowed characters representing Sino-Vietnamese words

Borrowed characters representing native Vietnamese words

Until the early 20th century, Literary Chinese (Template:Langr) was used for all official or scholarly writing in Vietnam. However, the Template:Langr script began to be developed around the 13th century to record folk literature in the Vietnamese language. Chinese characters, called Template:Langr (𡨸漢), Template:Langr (𡨸儒), or Template:Langr (漢字), are now limited to ceremonial use in Vietnam.

The oldest written Chinese text found in Vietnam is an epigraphy dated to the year 618, erected by local Sui officials in Template:Langr.^[122] Similar to Zhuang sawndip, some Template:Langr characters were created by combining semantic character components with phonetic components that resembled Vietnamese syllables.^[123] This process resulted in a highly complex system whose use was limited to a small portion of the Vietnamese population, never more than 5%.^[124] The oldest Template:Langr written alongside Chinese is a Buddhist inscription dated to 1209.^[123] Before 1945, the library of the French School of the Far East (EFEO) in Hanoi collected a total of around 20,000 Chinese and Vietnamese epigraphy rubbings from throughout Indochina.^[125] The oldest surviving extant manuscript in Vietnamese is a late 15th-century bilingual copy of the Buddhist Sutra of Filial Piety, currently kept by the EFEO. It features Chinese text in larger characters, and an Old Vietnamese translation in smaller characters glossing the text.^[126] Every Template:Langr book in Vietnam after the Template:Langr is dated between the 17th and the 20th centuries, with most being hand-copied works, and few printed texts. By 1987, the library of the Institute of Hán-Nôm Studies in Hanoi had collected a total of 4,808 Template:Langr manuscripts.^[127] [[File:Page of Phật thuyết đại báo phụ mẫu ân trọng kinh.png|thumb|left|upright=0.9|A page from a bilingual copy of the Sutra of Filial Piety, with Literary Chinese alongside an early form of Template:Langr, representing the Old Vietnamese pronunciation. Sometimes, pairs of characters are used to represent the consonant clusters present in Old Vietnamese]] Literary Chinese and Template:Langr fell out of use during the French colonial period, and were gradually replaced with the Vietnamese alphabet, which uses Latin characters and remains the primary writing system for Vietnamese.^[128]^[129] Contemporaneous use of Template:Langr in Vietnam is often connected with traditional culture, such as the practice of calligraphy.

Other languages

Several minority languages of South and Southwest China were formerly written with scripts based on Chinese characters, but also included many locally created characters. The most extensive is the sawndip script used to write the Zhuang languages of Guangxi, which is still in use despite efforts to encourage the writing of Zhuang with a Latin-based alphabet. Other languages written with such scripts include Miao, Yao, Bouyei, Mulam, Kam, Bai, and Hani.^[130] All these languages are now officially written using Latin-based scripts. According to surveys, traditional sawndip script has twice as many users as the official Latin script.^[131]

The dynasties founded by non-Han peoples that ruled northern China between the 10th and 13th centuries developed scripts that were inspired by Chinese characters but did not use them directly: the Khitan large script, Khitan small script, Tangut script, and Jurchen script—though Chinese characters were used to phonetically transcribe the language of the Jurchen people, renamed the 'Manchu' after the founding of the Qing dynasty. Other scripts within China that have adapted a few Chinese characters but are otherwise distinct include the Geba script, Sui script, Yi script, and the Lisu syllabary.^[130]

Transcription

Along with the Persian and Arabic scripts, the Mongolian language was also written with Chinese characters phonetically transcribing Mongolian sounds. Notably, the only surviving copies of The Secret History of the Mongols were written in such a manner.

According to the 19th century missionary John Gulick:

"The inhabitants of other Asiatic nations, who have had occasion to represent the words of their several languages by Chinese characters, have as a rule used unaspirated characters for the sounds g, d, b. The Muslims from Arabia and Persia have followed this method ... The Mongols, Manchu, and Japanese also constantly select unaspirated characters to represent the sounds g, d, b, and j of their languages. These surrounding Asiatic nations, in writing Chinese words in their own alphabets, have uniformly used g, d, b, etc., to represent the unaspirated sounds."^[132]

Standardisation

In each region, the latest published standards for character forms are:

Polity	Standard	Characters	Latest revision
China	Table of General Standard Chinese Characters	8105	2013^[133]
Hong Kong	List of Graphemes of Commonly-Used Chinese Characters	4762	2012^[134]
Taiwan ^{[lower-alpha 12]}	Chart of Standard Forms of Common National Characters	4808	1983^[136]
	Chart of Standard Forms of Less-Than-Common National Characters	6341	1983^[137]
	Chart of Rarely-Used National Characters	18388	2017^[135]
Japan	Jōyō kanji	2136	2010^[138]
South Korea	Basic Hanja for Educational Use	1800	2000^[139]

In the modern period, each polity using Chinese characters has standardised their forms, pronunciation, and stroke orders. Most characters have a single standard stroke order, but some may differ by region, occasionally resulting in different stroke counts.

Received forms

With the use of woodblock printing and the compilation of large character dictionaries such as the 1716 Kangxi Dictionary, there was a considerable standardisation in forms prior to the later efforts of the 20th century, especially during the Ming.

Simplified characters

Although most closely associated with the PRC, the modern process of character simplification began well before 1949. One of the early proponents of character simplification was Lufei Kui, who proposed in 1909 that simplified characters should be used in education. In the years following the Xinhai Revolution and its associated May Fourth Movement, many anti-imperialist Chinese intellectuals began pointing to the traditional writing system as an obstacle to the modernisation of China, proposing that it should either be reformed or abolished entirely.

During the 1930s and 1940s, discussions on character simplification took place within the Kuomintang government, and a large number of the intelligentsia maintained that character simplification would help boost literacy throughout the country.^[141]^[142] In 1935, a table of 324 simplified characters collected by Qian Xuantong was introduced as the first official batch of simplified characters; however, it was rescinded in 1936 due to fierce opposition within the party.

Traditional (們)

Simplified (们)

Comparison of strokes between character forms,^{[lower-alpha 13]} showing systematic simplification of the component Template:Kxr

Cursive script were the source of inspiration for many of the simplified forms, while others were already used in print, albeit not for most formal works. With the goal of increasing functional literacy, a major concern at the time, discussions on character simplification took place among Chinese intelligentsia and within the Kuomintang (KMT) government during the Republican period.^[143] This earlier initiative to simplify the Chinese writing system was later inherited and implemented by the Communists after its subsequent abandonment by the KMT.

Since the 1950s, the PRC has officially encouraged the use of simplified characters on the mainland. Along with the Republic of China, Hong Kong and Macau—at the time still under colonial rule—were not affected by the reform. In other Sinophone countries, the use of simplified characters is generally more common among younger people, while many older generations literate in Chinese still use traditional forms. Outside of China, Chinese-language shop signs are also often written using traditional characters.

In other Sinophone countries, the use of simplified characters is generally more common among younger people, while many older generations literate in Chinese still use traditional forms. Outside of China, Chinese-language shop signs are also often written using traditional characters.

People's Republic of China

Most simplified forms in use today are the direct result of PRC initiatives during the 1950s and 1960s. Before largely settling on simplifying the existing system, some within the PRC, including Mao Zedong, also explored the total replacement of Chinese characters with a phonetic script, usually based on the Latin alphabet, culminating in projects such as Gwoyeu Romatzyh and Latinxua Sin Wenz.^[144]

The PRC initiated the first round of simplifications with two documents published in 1956 and 1965. The reforms both simplified the forms of many characters in use, and reduced the total number of characters in the lexicon.^[145] The majority of first round characters were drawn from conventional abbreviated or ancient forms.^[146] For example, the orthodox character 來 was written as 来 in the earlier clerical script; it used one fewer stroke, and was thus adopted as a simplified form. The Template:Zhc character was written as 云 in the ancient oracle bone script. This simpler form had remained in use later as a phonetic loan with a meaning of 'to say', and with the original meaning of 'cloud' it was instead written with an added Template:Kxr radical as a semantic indicator. When using simplified forms, these two characters are merged into 云.^{[upper-alpha 7]}

A second round of simplifications was promulgated in 1977, but it was poorly received by the public, and fell out of official use very quickly, ultimately being formally rescinded in 1986. The second round of simplifications were unpopular in large part because the vast majority of its forms were completely new, in stark contrast to the many familiar variants present in the first round.^[147]

Two revised lists of simplified characters were published in 1988: the List of Commonly Used Characters in Modern Chinese having 2,500 common characters and 1,000 less common characters, and the Chart of Generally Utilised Characters of Modern Chinese with 7,000 characters, including those in the smaller list. In 2013, the revised Table of General Standard Chinese Characters replaced the 1988 lists as the new standard: it includes 8,105 characters, with 3,500 categorised as primary, 3,000 as secondary, and 1,605 as tertiary.^[148] GB 2312, an early version of the national encoding standard used in the PRC, has 6,763 code points; its modern, mandatory successor GB 18030 has a much higher number.^[149] The Chinese Proficiency Test (HSK) covers 2,663 characters and 5,000 words at its highest level, while the Chinese Proficiency Grading Standards for International Chinese Language Education would cover 3,000 characters and 11,092 words at the highest level.^[150]^[151]^[152]

Singapore

Singapore underwent three successive rounds of character simplification promulgated by the Ministry of Education, with the first two having some simplifications that differed from those used in mainland China. The first round was published in 1969, and consisted of 498 simplified and 502 traditional characters. The second round in 1974 consisted of 2287 simplified characters, including 49 differences from the PRC system that were removed with the final round in 1976.^[153] In 1993, Singapore adopted the revisions made by mainland China in 1986.

Unlike in mainland China, where personal names may only be registered using simplified characters, Singapore parents have the option of registering their children's names in traditional characters.^[154]

Malaysia

Malaysia uses simplified characters in Chinese-language schools. Chinese-language newspapers in the country are published in either simplified or traditional characters—often, headlines are printed with traditional forms, and the body with simplified forms.^[155]

Philippines

In the Philippines, most Chinese schools and businesses still use traditional characters with bopomofo, owing to Taiwanese influence due to a shared Hokkien heritage. Recently, more Chinese schools have switched to using simplified characters alongside pinyin, and many schools use some combination of the two. Since most of the readership of Chinese-language newspapers in the country belong to an older generation, they are still largely published using traditional characters.^[156]

Traditional characters

Taiwan

In Taiwan, the Ministry of Education's Chart of Standard Forms of Common National Characters lists 4,808 characters; the Chart of Standard Forms of Less-Than-Common National Characters lists another 6,341 characters. The Chinese Standard Interchange Code (CNS11643)—the official national encoding standard—supported 48,027 characters in its 1992 version; currently encoding over 96,000 characters,^[157] while BIG-5, the most widely used non-Unicode encoding, supports only 13,053. The Test of Chinese as a Foreign Language (TOCFL) covers 8,000 words at its highest level. The Taiwan Benchmarks for the Chinese Language (TBCL), a guideline designed to describe levels of Chinese language proficiency, covers 3,100 characters and 14,425 words at the highest level.^[158]^[159]

Hong Kong

In Hong Kong, which uses traditional characters, the Education and Manpower Bureau's List of Graphemes of Commonly-Used Chinese Characters, containing 4,759 characters, is intended for use in elementary and junior secondary education.

North America

Most Chinese-language newspapers and signage in the United States and Canada use traditional characters.^[160] There is some effort to get municipal governments to implement more simplified character signage due to recent immigration from mainland China.^[161]

Kanji

After World War II, the Japanese government also instituted a series of orthographic reforms. Some characters were given simplified forms called shinjitai; the older forms were then labelled the kyūjitai. The use of numerous variant forms was discouraged, and lists of characters to be learned during each grade of school were created: first the 1850-character tōyō kanji list in 1945, and then the 1945-character jōyō kanji list in 1981, with a 2136-character revision in 2010. The Japanese government restricts characters that can be used in names to the jōyō kanji plus an additional list of 983 jinmeiyō kanji historically prevalent in names.^[162] While these lists serve as a guideline, unlisted characters are still widely used by native Japanese speakers, such as the kyūjitai form of 'dragon' (龍) alongside the shinjitai form (竜).

Hanja

The South Korean Basic Hanja for Educational Use is a set of 1,800 characters standardised in 1972, with the first 900 hanja taught to middle school students, and the rest taught to high school students.^[139]

In March 1991, the Supreme Court of Korea published the 2,854-character Table of Hanja for Use in Personal Names.^[163] The list expanded gradually: by 2015 there were 8,142 hanja, including the set of basic hanja, permitted for use in Korean names.^[164]

Special cases

Contractions and abbreviations

Some compound words and set phrases have been represented by single-character contractions, often considered ligatures instead of characters representing a single morpheme. They are often used in handwriting or for decorative purposes, but are sometimes seen in print. They are called 合文; héwén, 合书; 合書; héshū or 合体字; 合體字; hétǐzì in Chinese; in the special case where two characters are combined, they are known as Template:Zhl. For the sake of standardisation, the Chinese government has sought to limit the use of polysyllabic characters in writing.^[1] A popular example is the 'double happiness' character 囍 formed as a ligature of 喜喜, and referred to by its disyllabic name 双喜; 雙喜; shuāngxǐ.^{[upper-alpha 8]}

Numerals are also sometimes written as ligatures—for example, 廿; niàn; 'twenty' is normally read as 二十; èrshí in Standard Chinese,^{[upper-alpha 9]}^[1] and as jaa⁶ in Cantonese.^[165] Calendars often use this and other numeral ligatures to save space, with 廿 being standard. Thus, one may write "21 March" as 三月廿一.

The use of contractions is as old as the writing system itself. In oracle bone script, personal names, ritual items, and even whole phrases are contracted into single characters: for example, Template:Zhc becomes Template:Zhc. A dramatic example found in medieval manuscripts writes Template:Zhl as a contracted character, composed of four 十 arranged in a 2×2 grid—derived from the Template:Kxr components within the original characters. Other historical examples include contractions used to represent SI units, which have generally fallen out of use. In Chinese, SI units usually consist of two morphemes, such as Template:Zhl and Template:Zhl. In the 19th century, these were often contracted, with 瓩 used for 千瓦 and 糎 used for 厘米. Some of these were also used in Japan, where they used pronunciations borrowed from European languages. Miscellaneous examples include Template:Zhc, a contraction of Template:Zhc.^{[upper-alpha 10]}

Multi-syllable morphemes

A small number of morphemes in Chinese are disyllabic, some of which even date back to the Classical period.^[166] Excluding loanwords, these are typically words for plants and small animals, usually written with a pair of phono-semantic compounds sharing a common radical. Examples are Template:Zhc and Template:Zhc—the first character of 'butterfly' and the second character of 'coral' each have 胡 for a phonetic component, with the Template:Kxr and Template:Kxr radicals as their respective semantic components, also present within the other character of each word. Neither of the aforementioned hú characters exist as independent morphemes, except as poetic abbreviations of the disyllabic words.

A notable example regards the name for the pipa, a type of lute. The instrument's name 枇杷 was originally shared with one for the loquat,^{[lower-alpha 14]} which has a shape reminiscent of the instrument. The name for the instrument was originally written with the Template:Kxr radical as 批把, referring to the upward and downward strokes made when playing the instrument. The name for the fruit was later changed to its present 枇杷, with the Template:Kxr radical; the name for the instrument became 琵琶, with 珡 ('guqin') incorporated into both characters.^{[upper-alpha 11]}

With the erhua phenomenon in Mandarin varieties, expressed via the fusion of the diminutive 儿; ér suffix, some monosyllabic words may be written with two characters, such as in Template:Zhp.

Rare and complex characters

Rare or antiquated character variants more often appear in personal or place names. As many computer-based systems have prioritised the most common characters, this can create problems. As a representative example, the name of Taiwanese politician Yu Shyi-kun contains the rare character Template:Zhc; printing this character is often nontrivial. Newspapers have dealt with this problem in ways including using software to combine two extant characters into a similar-looking compound, embedding a picture of the character instead of encoding it as text, substituting a homophonic character with the expectation that the reader would make the correct inference.Lua error: Internal error: The interpreter has terminated with signal "24". Generally, printed materials in Taiwan will annotate such a character with bopomofo. Japanese newspapers often replace obscure characters with katakana instead, as is accepted practice in Japanese style guides.Lua error: Internal error: The interpreter has terminated with signal "24".

There are also extremely stroke-rich characters, which tend to be rare. A notable example is Template:Zhc, which fell out of use by the end of the 5th century, containing 64 strokes. This character may not necessarily be seen as the most complex or difficult, as it simply requires writing the 16-stroke character Template:Zhc four times within the space allotted for one. Another 64-stroke character created in the same manner is Template:Zhc, composed of the character Template:Zhc in quadruplicate.

One of the most complex characters found in modern Chinese dictionaries is Template:Zhc with 36 strokes.^{[upper-alpha 12]} Other stroke-rich characters include the triplicated Template:Zhc with 39 strokes, and the quadruplicated Template:Zhc with 52, both meaning 'the loud noise of thunder'—however, these are not commonly used. As an example, the most complex character that can be input with a representative IME^{[lower-alpha 15]} is Template:Zhc. It is composed of the Template:Kxr radical in triplicate, having a total of 16 × 3 = 48 strokes. Among the most complex characters presently in common use are Template:Zhc with 32 strokes, Template:Zhc—also the character in the jōyō kanji list having the most strokes, with 29—Template:Zhc with 28, and Template:Zhc with 25. Also occasionally in modern use is Template:Zhc, a variant of Lua error: Internal error: The interpreter has terminated with signal "24". with 33 strokes.

In Japanese, an 84-stroke kokuji exists: , normally read taito. It is composed of the 'cloud' character Lua error: Internal error: The interpreter has terminated with signal "24". atop the aforementioned triple-'dragon' character, also possessing the meaning of 'appearance of a dragon in flight': it has readings Lua error: Internal error: The interpreter has terminated with signal "24"., Lua error: Internal error: The interpreter has terminated with signal "24"., and Lua error: Internal error: The interpreter has terminated with signal "24"..^[167]

Lua error: Internal error: The interpreter has terminated with signal "24"., 'verbose'
Lua error: Internal error: The interpreter has terminated with signal "24"., meaning unknown
Lua error: Internal error: The interpreter has terminated with signal "24"., 'snuffle'
Lua error: Internal error: The interpreter has terminated with signal "24"., 'appearance of a dragon in flight'
Alternative form of Lua error: Internal error: The interpreter has terminated with signal "24".
Lua error: Internal error: The interpreter has terminated with signal "24"., a kind of noodle from Shaanxi

Lexicography

Dozens of schemes have been devised for indexing Chinese characters and sorting them into dictionaries. Most of these are specific to the dictionary for which they were invented, and relatively few have seen widespread use. Often, character dictionaries incorporate several mechanisms by which users may locate entries. Methods for arranging Chinese dictionaries are divided into form-based orders that sort by visual properties, sound-based orders usually based on an extant transliteration scheme, and meaning-based orders.Lua error: Internal error: The interpreter has terminated with signal "24".Lua error: Internal error: The interpreter has terminated with signal "24".

Many character dictionaries are indexed using a technique known as radical-and-stroke sorting, where characters are grouped by radicals, which are in turn sorted by stroke number. Classification by radical was introduced by the Shuowen Jiezi, which used 540 radicals. The set of 214 Kangxi radicals were popularised by the Kangxi Dictionary promulgated in 1716, but were originally introduced in the Zihui in 1615. Another form-based system is the four-corner method, where characters are classified according to the shapes at each of the character's corners. In modern Chinese, characters and words are also ordered by their frequency of use within a given corpus. Stroke-based sorting includes techniques that combine sorting by stroke count and stroke order, as well as YES sorting.

Most modern Chinese dictionaries arrange the main character entries alphabetically according to pinyin spelling, while also providing a traditional radical-based index.Lua error: Internal error: The interpreter has terminated with signal "24".Lua error: Internal error: The interpreter has terminated with signal "24". To find a character with an unknown pronunciation using one of these dictionaries, a reader locates the character in the radical index, where they are further sorted by stroke count. The corresponding entry in the radical index will provide the character's pronunciation, or the page number of the character's main entry in the dictionary.

Studies have suggested that literate individuals within China have an active vocabulary of three to four thousand characters, while specialists in fields like classical literature or history may have a working vocabulary of five to six thousand.Lua error: Internal error: The interpreter has terminated with signal "24".Lua error: Internal error: The interpreter has terminated with signal "24". Estimates of the total number of characters in modern use can be sourced from encoding schemes and dictionaries: according to sources from mainland China, Taiwan, Hong Kong, Japan, and Korea, this number is likely around 15,000.Lua error: Internal error: The interpreter has terminated with signal "24".Lua error: Internal error: The interpreter has terminated with signal "24". For comparison, Unicode encodes over 90,000 CJK Unified Ideographs.^[169] There exist roughly 1,500 Japanese kokuji,^[170] Korean gukja, over 10,000 sawndip used to write Zhuang, and almost 20,000 Template:Langr characters created in Vietnam.^[171]