Controlled vocabulary


Definition

“A controlled vocabulary is a vocabulary consisting of a ‘prescribed list of terms or headings each one having an assigned meaning’. The way a controlled vocabulary defines the relationships between these terms or headings will vary in degree of complexity according to the purpose of the vocabulary, from simple alphabetically arranged flat lists to ontologies with richly defined relationships.” (Currier et al., 2005:9)

“Controlled vocabularies provide a way to organize knowledge for subsequent retrieval. They are used in subject indexing schemes, subject headings, thesauri and taxonomies. Controlled vocabulary schemes mandate the use of predefined, authorised terms that have been preselected by the designer of the vocabulary, in contrast to natural language vocabularies, where there is no restriction on the vocabulary.” (Wikipedia, retrieved 15:51, 28 February 2009 (UTC)).

See also pedagogical vocabulary and the various standards.

To do:

Clarify the relationship between the types (classes) defined by a controlled vocabulary and their instances. Instances, for example, can be part of an ontology but not of a classification scheme.
Describe different use cases, e.g. library science vs. information technology.

Purpose

A controlled vocabulary is a list of terms (e.g. words, phrases) that is used to tag (label) information in a consistent way.

According to Wikipedia, controlled vocabularies solve the problems of homographs (words with the same spelling but different meanings), synonyms (different words with the same meaning) and polysemes (words with multiple meanings) by ensuring that each concept is described using only one authorized term and that each authorized term in the controlled vocabulary describes only one concept. In short, controlled vocabularies reduce the ambiguity inherent in normal human languages, where the same concept can be given different names, and they ensure consistency. (retrieved 15:51, 28 February 2009 (UTC))
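The mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not a real authority file: the vocabulary, the qualified headings and the function name `authorize` are hypothetical. Entry terms (synonyms, variant spellings) point to one preferred term, and a homograph is split into separately qualified headings, so that each authorized term still stands for exactly one concept.

```python
# Hypothetical vocabulary, for illustration only.
# Keys are lower-cased lookup forms; values are authorized (preferred) terms.
# Preferred terms map to themselves; entry terms (synonyms, variants) map to
# their preferred term.
PREFERRED_FOR = {
    "e-learning": "E-learning",
    "elearning": "E-learning",
    "online learning": "E-learning",
    "learning management systems": "Learning management systems",
    "learning management system": "Learning management systems",
    "lms": "Learning management systems",
}

# A homograph is split into qualified headings, one per concept.
HOMOGRAPHS = {
    "mercury": ["Mercury (planet)", "Mercury (chemical element)"],
}


def authorize(term: str) -> list[str]:
    """Return the authorized term(s) an indexer may assign for `term`.

    One result: the term (or its synonym) is resolved.
    Several results: a homograph; the indexer must choose one.
    No result: the term is not in the vocabulary and cannot be used.
    """
    key = term.strip().lower()
    if key in HOMOGRAPHS:
        return list(HOMOGRAPHS[key])
    if key in PREFERRED_FOR:
        return [PREFERRED_FOR[key]]
    return []


print(authorize("Online learning"))   # ['E-learning']
print(authorize("LMS"))               # ['Learning management systems']
print(authorize("Mercury"))           # ['Mercury (planet)', 'Mercury (chemical element)']
print(authorize("blended learning"))  # []
```

Polysemes are handled like homographs here: the ambiguous word itself is never an authorized term.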

For example, in library science a controlled vocabulary is defined as “An established list of preferred terms from which a cataloger or indexer must select when assigning subject headings or descriptors in a bibliographic record, to indicate the content of the work in a library catalog, index, or bibliographic database.” (ODLIS, retrieved 15:51, 28 February 2009 (UTC)).

In research and development, controlled vocabularies can help people talk to each other, i.e. they contribute to shared understanding and common grounding in a community of practice. For example, the repertory grid technique can be used to elicit constructs from stakeholders. These constructs can then be negotiated and common vocabularies built, e.g. with the method developed by Shaw and Gaines (1989). Such group elicitation procedures work at least for groups of people who understand a topic in similar ways.

Types of controlled vocabularies

Currier et al. (2005) distinguish the following kinds of controlled vocabularies, to which we have added metadata schemes.

Flat list: A simple flat list of terms

Glossary: An alphabetical list of terms with some explanation

Subject headings list: See subject heading. A systematic list of subject headings like those used in library catalogues. A subject heading provides one of the access points to information.

Taxonomy: In a wide sense, almost any kind of well-defined list of terms. In one narrow sense, a mono-hierarchical classification of terms, i.e. a child term in principle inherits the properties of its parent term. E.g. controlled vocabularies are a kind of vocabulary, and XHTML is a kind of XML application, which is a kind of formalism for defining a formal grammar. This is the equivalent of a kind of typology. In another narrow sense: “controlled vocabulary in which concepts are represented by preferred terms, formally organized so that paradigmatic relationships between the concepts are made explicit, and the preferred terms are accompanied by lead-in entries for synonyms or quasi-synonyms” (Willpower Information, retrieved 15:08, 27 February 2009 (UTC)). In other words, one could also define a taxonomy with non-hierarchical relationships, but we would rather call these "thesauri".
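A mono-hierarchical taxonomy can be modelled as a mapping from each term to its single parent. The Python sketch below uses the XHTML example from the text ("Grammar formalism" abbreviates "formalism for defining a formal grammar"); the property names and values are hypothetical and only show how a child term inherits, and may override, the properties of its broader terms.

```python
# Mono-hierarchy: every term has at most one parent (None marks the root).
PARENT = {
    "XHTML": "XML application",
    "XML application": "Grammar formalism",
    "Grammar formalism": None,
}

# Hypothetical properties declared directly on each term.
OWN_PROPERTIES = {
    "Grammar formalism": {"defines_grammar": True},
    "XML application": {"syntax": "XML"},
    "XHTML": {"domain": "web pages"},
}


def ancestors(term: str) -> list[str]:
    """Return the chain of broader terms, nearest first."""
    chain = []
    parent = PARENT[term]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def properties(term: str) -> dict:
    """Merge properties from the root down, so more specific terms win."""
    merged = {}
    for t in reversed([term] + ancestors(term)):
        merged.update(OWN_PROPERTIES.get(t, {}))
    return merged


print(ancestors("XHTML"))   # ['XML application', 'Grammar formalism']
print(properties("XHTML"))  # {'defines_grammar': True, 'syntax': 'XML', 'domain': 'web pages'}
```

Because each term has only one parent, the inheritance chain is a simple path; a poly-hierarchy (several parents) would need rules for conflicting properties.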

Classification scheme: A classification scheme is primarily developed for browsing, rather than as an indexing or search tool (Pedagogical vocabularies project). We could therefore qualify it as a kind of taxonomy.

Metadata schemes: Metadata schemes are a kind of classification scheme or taxonomy. The best-known scheme for the Internet is Dublin Core, and in e-learning the Learning Object Metadata Standard is popular. See metadata.
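A metadata scheme works as a controlled vocabulary for field names. The Python sketch below checks a record against the 15 elements of the Dublin Core Metadata Element Set (version 1.1); the record and the function name `unknown_elements` are made up for illustration. Note that "audience" exists as a DCMI metadata term, but it is not one of the 15 original elements.

```python
# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def unknown_elements(record: dict) -> list[str]:
    """Return the field names in `record` that are not DCMES 1.1 elements."""
    return sorted(name for name in record if name not in DC_ELEMENTS)


# Hypothetical record for illustration.
record = {
    "title": "Pedagogical Vocabularies Review",
    "creator": ["Currier, Sarah", "Campbell, Lorna M.", "Beetham, Helen"],
    "date": "2005-12-23",
    "subject": ["controlled vocabularies", "pedagogy"],
    "audience": "teachers",  # not part of the 15-element set
}

print(unknown_elements(record))  # ['audience']
```

Only the field names are controlled here; a stricter profile could additionally restrict the values of "subject" to a controlled vocabulary.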


Thesaurus: A thesaurus is like a taxonomy or a classification scheme, but richer. Leonard Will defines it as a “controlled vocabulary in which concepts are represented by preferred terms, formally organized so that paradigmatic relationships between the concepts are made explicit, and the preferred terms are accompanied by lead-in entries for synonyms or quasi-synonyms” (Willpower Information, retrieved 15:08, 27 February 2009 (UTC)). Joan M. Reitz provides a similar definition: “Also refers to an alphabetically arranged lexicon of terms comprising the specialized vocabulary of an academic discipline or field of study, showing the logical and semantic relations among terms, particularly a list of subject headings or descriptors used as preferred terms in indexing the literature of the field.” (ODLIS, retrieved 15:08, 27 February 2009 (UTC)).
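Thesaurus relationships are conventionally written as USE/UF (preferred vs. lead-in terms), BT/NT (broader/narrower terms) and RT (related terms). The Python sketch below is a minimal, hypothetical data model, not an implementation of any particular thesaurus standard: NT is derived as the inverse of BT, and RT is treated as symmetric. The entries are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    preferred: str                                  # the authorized term
    use_for: set[str] = field(default_factory=set)  # UF: lead-in terms
    broader: set[str] = field(default_factory=set)  # BT: broader terms
    related: set[str] = field(default_factory=set)  # RT: associative links


class Thesaurus:
    def __init__(self) -> None:
        self.concepts: dict[str, Concept] = {}  # preferred term -> concept
        self.lead_in: dict[str, str] = {}       # lead-in term -> preferred term (USE)

    def add(self, preferred, use_for=(), broader=(), related=()):
        self.concepts[preferred] = Concept(preferred, set(use_for), set(broader), set(related))
        for term in use_for:
            self.lead_in[term] = preferred

    def use(self, term):
        """Resolve a term to its preferred term, or None if it is unknown."""
        if term in self.concepts:
            return term
        return self.lead_in.get(term)

    def narrower(self, term):
        """NT is the inverse of BT, so it is derived instead of stored twice."""
        return sorted(p for p, c in self.concepts.items() if term in c.broader)

    def related(self, term):
        """RT is symmetric: include links declared from either side."""
        linked = set(self.concepts[term].related)
        linked |= {p for p, c in self.concepts.items() if term in c.related}
        return sorted(linked)


# Hypothetical entries.
thes = Thesaurus()
thes.add("Educational software")
thes.add("Learning management systems",
         use_for=["LMS", "Course management systems"],
         broader=["Educational software"],
         related=["E-learning"])
thes.add("E-learning", use_for=["Online learning"])

print(thes.use("Course management systems"))  # Learning management systems
print(thes.narrower("Educational software"))  # ['Learning management systems']
print(thes.related("E-learning"))             # ['Learning management systems']
```

Deriving NT and the reverse RT links, instead of storing them by hand, keeps the reciprocal relationships consistent.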

Topic map: See topic maps, an ISO standard for organizing a forest of resources. It is something in between a taxonomy and an ontology.
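The topic map data model is often summarized as topics, associations and occurrences. The Python sketch below is a simplified, hypothetical illustration of these three ideas, not the XTM syntax or a Topic Maps engine: topics point to external resources through occurrences (the example.org URLs are placeholders), and typed associations link topics, each topic playing a named role.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    topic_id: str
    name: str
    # Occurrences: links from the topic to information resources.
    occurrences: list[str] = field(default_factory=list)


@dataclass
class Association:
    assoc_type: str
    roles: dict[str, str]  # role type -> topic_id of the topic playing that role


# Hypothetical topics and associations.
topics = {
    "moodle": Topic("moodle", "Moodle", ["https://example.org/moodle-review"]),
    "lms": Topic("lms", "Learning management system", ["https://example.org/lms-overview"]),
    "e-learning": Topic("e-learning", "E-learning"),
}

associations = [
    Association("instance-of", {"instance": "moodle", "class": "lms"}),
    Association("supports", {"tool": "lms", "activity": "e-learning"}),
]


def related_topics(topic_id: str) -> list[tuple[str, str, str]]:
    """List (association type, role of the other topic, other topic's name)."""
    result = []
    for assoc in associations:
        if topic_id not in assoc.roles.values():
            continue
        for role, other in assoc.roles.items():
            if other != topic_id:
                result.append((assoc.assoc_type, role, topics[other].name))
    return result


print(related_topics("lms"))
# [('instance-of', 'instance', 'Moodle'), ('supports', 'activity', 'E-learning')]
```

Typed associations give the model some of the expressiveness of an ontology, while occurrences tie it to the forest of resources being organized.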

Ontology: In computer science, ontology refers to “a model for describing the world that consists of a set of types, properties, and relationship types. Exactly what is provided around this varies, but this is the essential of an ontology. There is also generally an expectation that there be a close resemblance between the real world and the features of the model in an ontology” (Garshol, cited by Pedagogical vocabularies project). See ontology.
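The three ingredients named in this definition (types, properties and relationship types) can be illustrated with a small, hypothetical schema in Python. The sketch treats a relationship's domain and range as constraints to check instance data against. This is a simplification: in RDFS and OWL, domain and range are used to infer types rather than to reject data.

```python
# Hypothetical ontology: types with their supertype (None = top-level type).
SUPERTYPE = {"Person": None, "Teacher": "Person", "Course": None}

# Properties (attributes with a value type) declared on a type.
PROPERTIES = {"Person": {"name": str}, "Course": {"title": str, "credits": int}}

# Relationship types with their domain (subject type) and range (object type).
RELATIONS = {"teaches": ("Teacher", "Course"), "enrolled_in": ("Person", "Course")}

# Instances and their declared types.
INSTANCE_TYPE = {"alice": "Teacher", "bob": "Person", "intro_101": "Course"}


def is_a(type_name, target):
    """True if type_name equals target or is a (transitive) subtype of it."""
    while type_name is not None:
        if type_name == target:
            return True
        type_name = SUPERTYPE[type_name]
    return False


def valid_relation(subject, relation, obj):
    domain, range_ = RELATIONS[relation]
    return is_a(INSTANCE_TYPE[subject], domain) and is_a(INSTANCE_TYPE[obj], range_)


def valid_property(instance, prop, value):
    """Properties declared on a supertype also apply to its subtypes."""
    type_name = INSTANCE_TYPE[instance]
    while type_name is not None:
        declared = PROPERTIES.get(type_name, {})
        if prop in declared:
            return isinstance(value, declared[prop])
        type_name = SUPERTYPE[type_name]
    return False


print(valid_relation("alice", "teaches", "intro_101"))      # True
print(valid_relation("bob", "teaches", "intro_101"))        # False: a Person is not necessarily a Teacher
print(valid_relation("alice", "enrolled_in", "intro_101"))  # True: every Teacher is a Person
print(valid_property("alice", "name", "Alice"))             # True (inherited from Person)
print(valid_property("intro_101", "credits", "five"))       # False: wrong value type
```

The sketch also shows the difference between types (Teacher, Course) and the instances described with them (alice, intro_101).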

Types of uncontrolled vocabularies

There are other ways to describe information in a somewhat systematic way.

Glossaries made from words in the text: A good example is the glossary of a book, which is usually made by the author or by people trained to spot the most important concepts. The glossary can be very systematic (e.g. use "see" and "see also" links), but it emerges from the content.

Free indexing and folksonomies: Any phrase can be used. A good example is folksonomies, i.e. free indexing by (usually) many users. Folksonomies are sets of free tags assigned by users to an object. Computer systems may then display maps of emergent organization (via statistical analysis) or at least some visualization such as tag clouds. This wiki's categories can be considered a kind of folksonomy. See Tagging.
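The statistical side of a folksonomy can be as simple as counting tags. The Python sketch below uses hypothetical tagging data: it counts free tags after a light normalization (lower-casing only) and scales the counts linearly into font sizes for a tag cloud. Unlike a controlled vocabulary, nothing maps "lms" to "learning management systems"; such synonyms stay separate tags.

```python
from collections import Counter

# Hypothetical taggings: (user, resource, tag) triples.
TAGGINGS = [
    ("ann", "page1", "e-learning"),
    ("bob", "page1", "E-Learning"),
    ("cat", "page1", "lms"),
    ("ann", "page2", "ontology"),
    ("bob", "page2", "e-learning"),
    ("cat", "page2", "Ontology"),
    ("dan", "page3", "moodle"),
]


def tag_counts(taggings):
    """Count tags; only case is normalized, synonyms are not merged."""
    return Counter(tag.lower() for _, _, tag in taggings)


def tag_cloud(counts, min_size=10, max_size=30):
    """Scale font sizes linearly between the least and most used tag."""
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {tag: round(min_size + (n - lo) * (max_size - min_size) / span)
            for tag, n in counts.items()}


counts = tag_counts(TAGGINGS)
print(tag_cloud(counts))  # {'e-learning': 30, 'lms': 10, 'ontology': 20, 'moodle': 10}
```

Linear scaling is only one option; with many rarely used tags, a logarithmic scale often gives a more readable cloud.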

Encyclopedias and similar: This wiki, for example, can be thought of as an interconnected list of concepts, very close to flat lists and glossaries. Emergent relationships can then be visualized in various ways, e.g. with something like the SVG visualization of this page, or with more complicated graphs that we won't show here since they could bring your PC to a halt.

Free text indexes: E.g. what Google does with this wiki. Using Google's webmaster tools, you can get an idea of how Google looks at your website.
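Free-text indexing needs no vocabulary at all: every word in the text becomes an access point. A minimal way to sketch this is an inverted index that maps each word to the documents containing it. This Python toy uses hypothetical documents; real search engines add much more (ranking, stemming, link analysis), and the sketch does not describe how Google works internally.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return ids of documents containing every query word (boolean AND)."""
    postings = [index.get(token, set()) for token in tokenize(query)]
    return sorted(set.intersection(*postings)) if postings else []


# Hypothetical documents.
DOCS = {
    "thesaurus": "A thesaurus is a controlled vocabulary with broader and related terms.",
    "folksonomy": "A folksonomy is built from free tags chosen by users.",
    "taxonomy": "A taxonomy is a hierarchical controlled vocabulary.",
}

index = build_index(DOCS)
print(search(index, "controlled vocabulary"))  # ['taxonomy', 'thesaurus']
print(search(index, "ontology"))               # []
```

Since any word can be searched, synonyms and homographs are not resolved: a query for "LMS" would miss a text that only says "learning management system".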

Formalisms

Many of today's controlled vocabularies, in particular taxonomies and ontologies, are defined in XML and XML applications such as RDF and Topic Maps, or in languages built on top of RDF such as OWL, SKOS, Dublin Core or the IEEE Learning Object Metadata Standard.
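As an illustration of one of these formalisms, the Python sketch below uses only the standard library's ElementTree to write two SKOS concepts in RDF/XML, with a preferred label, an alternative (lead-in) label and a broader link. The concept URIs under http://example.org/vocab/ and the helper `add_concept` are hypothetical; a real project would more likely use a dedicated RDF library and would also declare a skos:ConceptScheme.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang
BASE = "http://example.org/vocab/"  # hypothetical namespace for the concepts

ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)


def add_concept(parent, local_name, pref_label, alt_labels=(), broader=None, lang="en"):
    """Append one skos:Concept (identified by rdf:about) to `parent`."""
    concept = ET.SubElement(parent, f"{{{SKOS}}}Concept", {f"{{{RDF}}}about": BASE + local_name})
    pref = ET.SubElement(concept, f"{{{SKOS}}}prefLabel", {XML_LANG: lang})
    pref.text = pref_label
    for alt in alt_labels:
        label = ET.SubElement(concept, f"{{{SKOS}}}altLabel", {XML_LANG: lang})
        label.text = alt
    if broader is not None:
        # skos:broader points to the URI of the broader concept.
        ET.SubElement(concept, f"{{{SKOS}}}broader", {f"{{{RDF}}}resource": BASE + broader})
    return concept


root = ET.Element(f"{{{RDF}}}RDF")
add_concept(root, "educational-software", "Educational software")
add_concept(root, "lms", "Learning management systems",
            alt_labels=["LMS"], broader="educational-software")

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

skos:prefLabel and skos:altLabel play the role of the preferred and lead-in terms of a thesaurus, and skos:broader the role of BT.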

But many other formal, less formal or informal ways exist. For example, LDAP schemas are built with ASN.1 (Abstract Syntax Notation One, ITU-T X.680). An LDAP schema can be considered a hierarchical taxonomy of properties that can describe a person.

Typologies

A typology is a list of types whose members share similar features. These features are usually described with controlled classification criteria (vocabularies), but they can also be determined through the analysis of subjective representations, e.g. with the repertory grid technique, by analyzing taggings, or with statistical content analysis.

There is also a distinction between the type (the set of objects that share the same or similar features) and the token (the instance). For example, learning management systems are a type of educational software, and Moodle is an example (token) of that type.
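A typology defined by classification criteria can be sketched as a set of required features per type: a token belongs to every type whose required features it has. The type names follow the text, but the feature lists and the function `classify` are hypothetical simplifications in Python, not an established typology of educational software.

```python
# Types (classes), each defined by the features its members must share.
TYPES = {
    "Learning management system": {"courses", "gradebook", "learner tracking"},
    "Wiki": {"collaborative editing", "page history"},
}

# Tokens (instances) and the features they were observed to have.
TOKENS = {
    "Moodle": {"courses", "gradebook", "learner tracking", "forums"},
    "MediaWiki": {"collaborative editing", "page history", "categories"},
}


def classify(features: set[str]) -> list[str]:
    """Return every type whose required features are all present."""
    return [name for name, required in TYPES.items() if required <= features]


typology = {token: classify(features) for token, features in TOKENS.items()}
print(typology)
# {'Moodle': ['Learning management system'], 'MediaWiki': ['Wiki']}
```

A token with extra features (Moodle's forums) still belongs to the type; a token lacking a required feature does not, which is why the choice of criteria largely determines the typology.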

Links

Glossary of terms relating to thesauri and other forms of structured vocabulary for information retrieval. (Good).

Reitz, Joan M., ODLIS — Online Dictionary for Library and Information Science (Good)

Controlled vocabulary (Wikipedia)

Bibliography

Currier, Sarah, Lorna M. Campbell and Helen Beetham (2005). Pedagogical Vocabularies Review. JISC Pedagogical Vocabularies Project, Final Draft, 23 December 2005.

Falconer, Isobel, Gráinne Conole, Ann Jeffery and Peter Douglas (2006). Learning Activity Reference Model – Pedagogy. LADIE reference model guides, The e-learning framework.

Garshol, Lars Marius (2004). Metadata? Thesauri? Taxonomies? Topic Maps! Making sense of it all. Journal of Information Science, 30(4), 378-391.

Reitz, Joan M. (2004). Dictionary for Library and Information Science. Libraries Unlimited. ISBN 1591580757.

Shaw, Mildred L. G. and Brian R. Gaines (1989). Comparing Conceptual Structures: Consensus, Conflict, Correspondence and Contrast. Knowledge Acquisition, 1(4), 341-363. (A reprint is available from the Knowledge Science Institute, University of Calgary.)