English as a hybrid Romance-Germanic language

This original article by Dan Polansky and Yuwash investigates the hypothesis that English is a hybrid Romance-Germanic language rather than a Germanic language, as it is often classified. The hypothesis is not necessarily part of the scientific mainstream; many linguists would classify a language more on the basis of its grammatical properties than on the mixed origin of its core vocabulary.

English appears to be a hybrid Romance-Germanic language based on the mixed origin of its vocabulary. The degree to which English vocabulary is permeated with words stemming from Latin is remarkable. To determine the proportion of words that are of Romance origin (Latin or Latin via French), one needs to look at something like the top 5,000, 10,000 or 80,000 words. If, by contrast, one includes the large swaths of bottom-ontology scientific vocabulary, Latin and Greek are expected to outnumber everything else as the origin of the words; but that is to be expected for many European languages and does not distinguish English from them.

Anecdotally, when I (Dan Polansky) see or hear Italian, it reminds me of English; when I see or hear Danish, it reminds me of German.

Case for English being hybrid based on vocabulary

In the sections below, various sources show that, in the core English vocabulary, words of Romance origin (from Latin, French, etc.) dominate words of Germanic origin. Let us emphasize that this concerns the core vocabulary. In Simons 2017, French origin and Latin origin combined reach 40% of the vocabulary for about the 1000 most common English words, reach 50% for about the 2000 most common English words, and rise slowly as the number of most common English words analyzed increases. (In Finkenstaedt and Wolff 1973, there are more than twice as many words of Latin and French origin as of Germanic origin, but since the basis for this analysis is 80,000 words and we aim to look at core vocabulary, this is a much weaker argument.) To disregard this lexical dominance and classify modern English merely on the basis of its grammatical features appears debatable.

Simons 2017

The graph in Simons 2017^[1], in the article section "Visualizing the data", suggests that French origin and Latin origin combined reach 40% of the vocabulary for about the 1000 most common English words, reach 50% for about the 2000 most common English words, and rise slowly as the number of most common English words analyzed increases. Simons 2017 indicates wordfrequency.info as its source for word frequencies; that website states that "The data is based on the one billion word Corpus of Contemporary American English (COCA) -- the only corpus of English that is large, up-to-date, and balanced between many genres."

Williams 1975

Joseph M. Williams conducted a survey^[2] of 10,000 words based on data “compiled from several thousands of business letters” (the data originate from Roberts 1965^[3]). The breakdown is as follows:

Classification of most frequent words by origin (in %)
Decile	English	French	Latin	Danish	Other
1	83	11	2	2	2
2	34	46	11	2	7
3	29	46	14	1	10
4	27	45	17	1	10
5	27	47	17	1	8
6	27	42	19	2	10
7	23	45	17	2	13
8	26	41	18	2	13
9	25	41	17	2	15
10	25	42	18	1	14

Simons 2017 has some reservations about the methodology^[1].
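
The decile table above can be summarized by adding the French and Latin columns and comparing the sum with the English column. The following is a minimal sketch in Python; the names WILLIAMS_DECILES, romance_share and summarize are hypothetical and chosen only for this illustration, and treating French plus Latin as "Romance" is an assumption of the sketch, not a category used by Williams.

```python
# Percentages per decile, copied from the table above.
# Column order: English, French, Latin, Danish, Other.
WILLIAMS_DECILES = {
    1: (83, 11, 2, 2, 2),
    2: (34, 46, 11, 2, 7),
    3: (29, 46, 14, 1, 10),
    4: (27, 45, 17, 1, 10),
    5: (27, 47, 17, 1, 8),
    6: (27, 42, 19, 2, 10),
    7: (23, 45, 17, 2, 13),
    8: (26, 41, 18, 2, 13),
    9: (25, 41, 17, 2, 15),
    10: (25, 42, 18, 1, 14),
}


def romance_share(row):
    """Return the combined French + Latin percentage for one decile row."""
    _english, french, latin, _danish, _other = row
    return french + latin


def summarize(table):
    """Map each decile to the pair (English %, French + Latin %)."""
    return {decile: (row[0], romance_share(row)) for decile, row in table.items()}


if __name__ == "__main__":
    for decile, (english, romance) in summarize(WILLIAMS_DECILES).items():
        print(f"decile {decile:2d}: English {english:2d}%  French+Latin {romance:2d}%")
```

With the figures as given in the table, English is ahead only in the first decile; from the second decile onward, the French and Latin columns together amount to between 57% and 64%.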

Issues raised by Dan Polansky with the above section (the section in its current form was authored mostly by Yuwash):

What does "English" mean? Does it mean "Middle English"? Or "Old English"? Or does it refer to an ancestor of "Old English"?

Finkenstaedt and Wolff 1973

A computerized survey conducted by Finkenstaedt and Wolff of the “Shorter Oxford Dictionary (3rd edition)”, containing around 80,000 words, yielded the following distribution^[4]^[5]:

French, including Old French and early Anglo-French (28.3%)
Latin, including modern scientific and technical Latin (28.24%)
Germanic languages (Old/Middle English, Old Norse, Dutch) (25%)
Greek (5.32%)
No etymology given (4.03%)
Derived from proper names (3.28%)
Other (less than 1% each) (5.83%)
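
As a quick consistency check on the distribution above, the listed shares can be added up and the French and Latin shares compared with the Germanic share. The sketch below is in Python; the variable names are hypothetical, and the grouping of French plus Latin into one figure is an assumption made for this illustration.

```python
from decimal import Decimal

# Shares in percent, copied from the list above. They are written as strings
# so that Decimal keeps them exact (binary floats would add rounding noise).
FINKENSTAEDT_WOLFF = {
    "French": Decimal("28.3"),
    "Latin": Decimal("28.24"),
    "Germanic": Decimal("25"),
    "Greek": Decimal("5.32"),
    "No etymology given": Decimal("4.03"),
    "Derived from proper names": Decimal("3.28"),
    "Other": Decimal("5.83"),
}

# Sum of all categories; should be 100 if the list is complete.
total = sum(FINKENSTAEDT_WOLFF.values())

# Combined French + Latin share, and its ratio to the Germanic share.
romance = FINKENSTAEDT_WOLFF["French"] + FINKENSTAEDT_WOLFF["Latin"]
ratio = romance / FINKENSTAEDT_WOLFF["Germanic"]

print(f"total: {total}%")
print(f"French + Latin: {romance}%")
print(f"(French + Latin) / Germanic: {ratio:.2f}")
```

The listed shares add up to exactly 100%, and French and Latin together come to 56.54%, a little more than 2.2 times the Germanic share of 25%.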

Piechart based on AskOxford

The following pie chart is relevant. The chart description at File:Origins of English PieChart.svg refers to http://www.askoxford.com/asktheexperts/faq/aboutenglish/proportion?view=uk, which is available in the Wayback Machine^[5]; the AskOxford page attributes the data to Thomas Finkenstaedt and Dieter Wolff (1973) and describes the data as "the result of a computerized survey of roughly 80,000 words in the old Shorter Oxford Dictionary (3rd edition)". The chart contains minor mismatches: for example, where AskOxford states "28.24%" for Latin, the chart rounds it up to "29%", which is arguably unconventional rounding; similarly puzzling rounding up occurs for other categories. Compared to the AskOxford data, the category "No etymology given" with 4.03% is missing from the chart; it seems the chart lets these 4.03% dissolve into the other categories.

The discussed pie chart based on AskOxford:

The chart is mentioned in Simons 2017 via an old version of the English Wikipedia article "Latin influence in English" (the section has since been moved to "Foreign-language influences in English").

There is a similar chart based on Finkenstaedt and Wolff (1973)/AskOxford, with no labels for the percentages:

Above, the category "Unknown/Other" is large enough to possibly match the union of two categories from AskOxford.

Langfocus 2016

The pie chart mentioned in the "Piechart based on AskOxford" section is shown in a Langfocus 2016 video^[6]. The video gives no attribution, but the image is evidently the same.

Langfocus 2016 also relates the creole hypothesis, according to which English is a creole language. The theory highlights the huge simplification that took place in English grammar, including a considerable reduction of inflection. Old English had an inflection system not unlike that of many other inflected languages, Langfocus 2016 tells us.

Poorly identified data from Wikipedia

The Wikipedia article "Foreign-language influences in English" features the following data:

French (langue d'oïl): 41%;
"Native" English (derived from Old English): 33%;
Latin: 15%;
Old Norse: 5%;
Dutch: 1%; and
Other: 5%.

The Wikipedia article indicates the source to be Williams 1975. However, the above data bear no clear relation to the actual data published in section Williams 1975; not even the categories match.

The above data first appeared in Wikipedia in https://en.wikipedia.org/w/index.php?title=Latin_influence_in_English&oldid=393376561, on 20 October 2010, added by an anonymous IP editor, who traced the data to https://www.amazon.com/dp/0029344700, which is a 1986 edition of Williams 1975, with no page number.
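
Whether the Wikipedia figures could relate to the decile table in section Williams 1975 can be probed with simple arithmetic. The sketch below, in Python, takes the unweighted mean of each column over the ten deciles. Equal weighting of the deciles is an assumption of this sketch, not something stated by Williams or by Wikipedia, and the names DECILE_ROWS, WIKIPEDIA and column_means are hypothetical.

```python
# Decile table from section Williams 1975 (percent per decile).
COLUMNS = ("English", "French", "Latin", "Danish", "Other")
DECILE_ROWS = [
    (83, 11, 2, 2, 2),
    (34, 46, 11, 2, 7),
    (29, 46, 14, 1, 10),
    (27, 45, 17, 1, 10),
    (27, 47, 17, 1, 8),
    (27, 42, 19, 2, 10),
    (23, 45, 17, 2, 13),
    (26, 41, 18, 2, 13),
    (25, 41, 17, 2, 15),
    (25, 42, 18, 1, 14),
]

# Figures quoted in the Wikipedia article (percent).
WIKIPEDIA = {
    "French": 41, "English": 33, "Latin": 15,
    "Old Norse": 5, "Dutch": 1, "Other": 5,
}


def column_means(rows):
    """Unweighted mean of each column across all rows (assumes equal weights)."""
    count = len(rows)
    return {name: sum(row[i] for row in rows) / count
            for i, name in enumerate(COLUMNS)}


means = column_means(DECILE_ROWS)
for name, mean in means.items():
    quoted = WIKIPEDIA.get(name, "n/a")  # "n/a" where Wikipedia has no such category
    print(f"{name:8s} mean {mean:5.1f}   Wikipedia: {quoted}")
```

Under this assumption, the means for French (40.6), English (32.6) and Latin (15.0) round to the Wikipedia figures of 41%, 33% and 15%, whereas Danish (1.6) and Other (10.2) have no counterpart among Old Norse 5%, Dutch 1% and Other 5%. This does not establish how the Wikipedia figures were produced, and the categories still do not match.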

Dawkins and Pinker 2009

In a conversation with Steven Pinker, Richard Dawkins relates how, at a linguistic conference, he mentioned to linguists that he thought English was a hybrid between Germanic and Romance languages, and met with disagreement from them.^[7] While that does not prove much, it shows that a very intelligent and well-educated native English speaker can find the hypothesis worth considering.

Romance adjectives

One sign of the hybrid character is that when one wants a proper adjective for a noun (rather than an attributive use of the noun), one often switches to a Romance word. For example:

tree → arboreal
sky → celestial
earth → terrestrial
star → stellar
water → aquatic
life → biotic (ultimately from Greek rather than Latin)

Hybrid words

One consequence or sign of the lexically hybrid character of English is its hybrid Romance-Germanic words, that is, words that contain morphemes of different origins. For instance, lucidness is a Romance-Germanic hybrid since lucid is Romance and -ness is Germanic. In terminology design, one sometimes wishes to avoid hybridness, but that is in general hard to do in English. Indeed, hybridness is a Greek-Germanic hybrid, and hybridity is a Greek-Romance hybrid.

That said, hybrids can still be avoided to a limited extent when designing new vocabulary.

References

1. The English language is a lot more French than we thought, here’s why, by Andreas Simons, Apr 13, 2017, medium.com. Has interesting graphs, such as one showing how the proportion of origins changes as we move from the most common words to less common words. Not an academic article; use with caution.
2. Williams, Joseph M. (1975). "From middle English to modern English". Origins of the English language, a social and linguistic history. New York: The Free Press. pp. 67-68. ISBN 0-02-935280-0. OCLC 1082209079.
3. Roberts, Aaron Hood (1965). A statistical linguistic analysis of American English. The Hague: Mouton. OCLC 833225043.
4. Finkenstaedt, Thomas; Dieter Wolff (1973). Ordered profusion; studies in dictionaries and the English lexicon. C. Winter. ISBN 978-3-533-02253-4. OCLC 263621959.
5. "What is the proportion of English words of French, Latin, or Germanic origin?". Ask the experts. Oxford University Press. 2008. Archived from the original on 2008-08-18.
6. Is English Really a Germanic Language? Japan: Paul Jorgensen. Sep 8, 2016. Event occurs at 2:10.
7. Steven Pinker - The Genius of Charles Darwin: The Uncut Interviews, Richard Dawkins Foundation for Reason & Science, 2009, 28:20.
