A diacritic is a mark near or through a character that changes its phonetic value or significance. For example, diacritics appear above the letter "e" in the word "résumé," distinguishing the noun from the verb "resume." Diacritics are more common in various European languages than they are in English.
The following are common diacritics:
Sounds unique to Eastern European languages were once written with two-letter combinations called "digraphs." In De Orthographia Bohemica (1412), Jan Hus proposed the use of diacritics in place of digraphs. Eight-bit character encoding, introduced in the 1980s, allows dozens of characters with diacritics to be rendered on computers and transmitted electronically. Unicode, which was incorporated into Windows in 2000, allows for an almost unlimited character set.
All the major style guides advise the writer to select a widely available reference work and to follow the spellings given in that work. Modern computer software allows dictionary and encyclopedia spellings to be reproduced exactly. Better-known names, for example "Istanbul" or "Zurich," are often spelled without diacritics in English even though diacritics are part of the local-language spelling. Lesser-known names are generally spelled in the manner of the original language. Diacritics are not normally used for sports figures or for Vietnamese names. These are just rules of thumb, and each case should be checked separately in an appropriate reference work.
Merriam-Webster[1] | American Heritage[2] | Oxford[3] | Webster's New World[4] | Random House[5] | Britannica[6] (encyclopedia) | Columbia[7] (encyclopedia) |
---|---|---|---|---|---|---|
Be·neš, Edvard | Be·neš, Eduard | Beneš, Edvard | Beneš, Edvard | Be·neš, Ed·u·ard | Edvard Beneš | Eduard Beneš |
Koś·ciusz·ko, Tadeusz Andrzei Bonawentura | Kos·ci·uśz·ko or Kos·ci·us·ko, Thaddeus | Kosciusko, Thaddeus | Kosciusko, Thaddeus | Kos·ci·us·ko, Thaddeus | Tadeusz Kościuszko | Thaddeus Kosciusko |
Mit·ter·rand, François (-Maurice) | Mit·ter·rand, François Maurice | Mitterrand, François | Mitterrand, François (Maurice) | Mit·ter·rand, Fran·çois (Mau·rice Ma·rie) | François Mitterrand | François Maurice Mitterrand |
Tō·jō Hideki | To·jo, Hideki | Tojo, Hideki | Tojo, Hideki | To·jo, Hi·de·ki | Tōjō Hideki | Tōjō Hideki |
Vö·rös·marty, Mihály[8] | N/A | N/A | N/A | N/A | Mihály Vörösmarty | Mihály Vörösmarty |
Wa·łe·sa [sic.], Lech | Wa·łę·sa, Lech | Wałęsa, Lech | Wałęsa, Lech | Wa·łę·sa, Lech | Lech Wałęsa | Lech Wałęsa |
The U.S. Board on Geographic Names (BGN) sets U.S. government usage for geographic names. The "conventional" name is the name the BGN deems suitable for English-language usage. The "approved" name is the official name in the local language.
Merriam-Webster[1] | American Heritage[2] | Oxford[3] | Webster's New World[4] | Random House[5] | Britannica[6] (encyclopedia) | Columbia[7] (encyclopedia) | U.S. Board on Geographic Names[9]: Conventional | U.S. Board on Geographic Names[9]: Approved | |
---|---|---|---|---|---|---|---|---|---|
Is·tan·bul | Is·tan·bul | Istanbul | Istanbul | Is·tan·bul | Istanbul | Istanbul | N/A | İstanbul | |
Jy·vas·ky·la | N/A | Jyväskylä | N/A | Jy·väs·ky·lä | Jyväskylä | Jyväskylä | N/A | Jyväskylä | |
Lü·beck | Lü·beck | Lübeck | Lü·beck | Lü·beck | Lübeck | Lübeck | N/A | Lübeck | |
Plo·iesti or Plo·esti | Plo·ieş·ti or Plo·eş·ti | Ploieşti | Plo·ieş·ti or Plo·eş·ti | Plo·eş·ti | Ploieşti | Ploieşti | N/A | Ploiești | |
Zu·rich | Zu·rich | Zurich | Zu·rich | Zu·rich | Zürich | Zürich | N/A | Zürich | |
Vietnamese towns | |||||||||
Ho Chi Minh City | Ho Chi Minh City | Ho Chi Minh City | Ho Chi Minh City | Ho Chi Minh City | Ho Chi Minh City | Ho Chi Minh City | Ho Chi Minh City | Thành Phố Hồ Chí Minh | |
Ha·noi | Ha·noi | Hanoi | Hanoi | Ha·noi | Hanoi | Hanoi | N/A | Hà Nội | |
Hai·phong | Hai·phong | Haiphong | Haiphong | Hai·phong | Haiphong | Haiphong | N/A | Hải Phòng | |
Hue[10] | Hue | Hué | Hue | Hué | Hue | Hue | N/A | Huế |
Eight non-English characters are included in International Morse Code: seven letters with diacritics (Ä, Á, Å, É, Ñ, Ö, and Ü) and Ch (a Czech digraph). This encoding method, which includes only capital letters, was developed by Friedrich Clemens Gerke in 1848 and was adopted as an international standard in 1865.
In the early 1900s, teletype displaced Morse code for most purposes. Teletype was encoded using Baudot, a five-bit code developed in 1870 that includes only capital letters and has no diacritics. Baudot, in turn, was displaced by ASCII, a seven-bit code developed in 1963 that includes both uppercase and lowercase letters. IBM introduced Extended ASCII, an eight-bit encoding standard, with the original PC in 1981. This set includes 37 characters with diacritics. Latin-1, a slightly revised version of the IBM character set, was adopted as an international standard in 1987.[11]
Unicode, implemented by the Windows operating system since 2000, includes Latin-1 as well as a comprehensive collection of Nordic, Eastern European, and even Asian characters. An encoded Unicode character can be up to four bytes long. The standard allows for over 1.1 million codepoints, although only 113,000 have been assigned so far.[12]
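The byte lengths mentioned above can be illustrated with a short Python sketch. It assumes the UTF-8 encoding, which the text does not name, and the sample characters and the helper `utf8_length` are chosen here purely for illustration.

```python
# Sample characters picked for illustration (not taken from the article):
# a plain letter, a Latin-1 letter, a Vietnamese letter, and a musical symbol.
samples = ["e", "\u00e9", "\u1ebf", "\U0001D11E"]


def utf8_length(ch: str) -> int:
    """Return the number of bytes needed to store one character in UTF-8."""
    return len(ch.encode("utf-8"))


for ch in samples:
    print(f"U+{ord(ch):04X} takes {utf8_length(ch)} byte(s) in UTF-8")

# The Unicode code space runs from U+0000 to U+10FFFF inclusive,
# which is where the "over 1.1 million" figure comes from.
CODE_SPACE_SIZE = 0x10FFFF + 1
print(CODE_SPACE_SIZE)
```

Letters with diacritics fall in the two-byte and three-byte ranges in this encoding, while unaccented English letters keep their one-byte ASCII form.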
Latin-1 is an eight-bit character set that covers Western European languages. It is a variation of IBM's "Extended ASCII" set and is often referred to as "ANSI." However, the standard approved by the American National Standards Institute is for an eight-bit character set, not this set specifically. The set includes the following diacritics:
In Unicode, the Latin-1 characters have codepoints from U+0000 to U+00FF.
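As a rough check of this correspondence, the following Python sketch decodes every possible byte value with Python's built-in `latin-1` codec and compares the result with the Unicode codepoint. The function name `latin1_matches_unicode` is made up for this example.

```python
def latin1_matches_unicode() -> bool:
    """Check that each Latin-1 byte value equals its Unicode codepoint."""
    for value in range(256):
        ch = bytes([value]).decode("latin-1")
        if ord(ch) != value:
            return False
    return True


# Byte 0xE9 is "e with acute accent" in Latin-1 and U+00E9 in Unicode.
e_acute = bytes([0xE9]).decode("latin-1")

print(latin1_matches_unicode())
print(f"U+{ord(e_acute):04X}")
```

Because of this one-to-one mapping, converting Latin-1 text to Unicode requires no lookup table.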
Latin-2 is an eight-bit character set intended for use with Eastern European languages. It includes the following diacritics:
Turkish can be encoded as Latin-5, while the Nordic languages may be encoded as Latin-6. Since the shift to Unicode, the various eight-bit character sets have become less relevant.
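The division of labor among these eight-bit sets can be sketched in Python, whose standard codecs `iso8859_1`, `iso8859_2`, `iso8859_9`, and `iso8859_10` correspond to Latin-1, Latin-2, Latin-5, and Latin-6. The helper `encodable_in` and the `CODECS` table are hypothetical names introduced for this illustration; the sample names come from the tables earlier in the article.

```python
# Map the informal names used in the text to Python's codec names.
CODECS = {
    "Latin-1": "iso8859_1",
    "Latin-2": "iso8859_2",
    "Latin-5": "iso8859_9",
    "Latin-6": "iso8859_10",
}


def encodable_in(text: str) -> list[str]:
    """Return the eight-bit sets that can represent every character in text."""
    result = []
    for label, codec in CODECS.items():
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this set
        result.append(label)
    return result


for name in ["L\u00fcbeck", "Wa\u0142\u0119sa", "\u0130stanbul"]:
    print(name, encodable_in(name))
```

A name that no single eight-bit set can represent in full has to be stored in Unicode, which is one reason these sets have faded from use.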
The following references may be consulted to determine proper spelling, including the correct use of diacritics:
American dictionaries
British dictionaries
Encyclopedias
Sports