Thesaurus
Definition
A hierarchical arrangement of related words and phrases, often displayed in systematized lists of synonyms.
Description
A thesaurus is "a controlled and dynamic vocabulary of semantically and generically related terms/concepts which covers a specific domain of knowledge".
Thesauri are based on concepts and the relationships between them. Relationships commonly expressed in a thesaurus include hierarchy, equivalence (synonyms) and association. These relationships are generally represented by the notation BT (broader term), NT (narrower term), UF or USED FOR (synonym), and RT (associative or related term). Associative relationships may be more detailed in some schemes. For example, the INIS Thesaurus (see below) has defined eight relationships, many of which are associative. Preferred terms for indexing and retrieval are identified. Entry terms (or non-preferred terms) point to the preferred terms to be used for each concept.
Many thesauri are large; they may include more than 50,000 terms. Most were developed for a specific discipline or a specific product or family of products.
Thesauri extend taxonomies so that they describe the world better: subjects can be arranged in a hierarchy, and other statements can also be made about those subjects.
A thesaurus can be used as a:
- Tool for document indexing
- Tool to describe knowledge/information in structured form
- Communication language between user and computer
- Tool for multilingual and semantic search
Thesaurus Structure (Relationships)
The main types of relationships include:
- hierarchical (between broader and narrower concepts e.g. flowers and roses)
- equivalence (between synonyms and near-synonyms e.g. motor-bikes, motor-cycles and motorcycles)
- associative (between concepts that are closely related in some non-hierarchical way, e.g. between a disease and the virus that causes that disease)
BT (Broader Term) - refers to the term above the current one in the hierarchy (term with wider or less specific meaning). In practice some systems allow multiple BTs for one term, while others do not.
NT (Narrower Term) - the inverse of BT: refers to a term below the current one in the hierarchy (a term with a narrower or more specific meaning). It is implied by the BT property.
SN (Scope Note) - is a string attached to the term explaining its meaning within the thesaurus. This can be useful in cases where the precise meaning of the term is not obvious from context (e.g. a technical solution vs a solution in chemistry).
USE (a specific term instead) - refers to another term that is to be used instead of the current term and implies that the terms are synonymous (the inverse property is known as UF or USED FOR). For example, on 'topic navigation map' we could put a 'USE' property referring to 'topic map'. This would mean that we recognize the term 'topic navigation map', but that 'topic map' means the same thing and we encourage the use of 'topic map' instead. If we do this we would also have a 'UF' property on 'topic map' referring to 'topic navigation map', since this is implied by the 'USE' relationship.
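Because NT is implied by BT, a system only needs to store one direction and can derive the other. The following is a minimal sketch of that idea in Python; the terms and the names broader and derive_narrower are illustrative assumptions, not part of any particular thesaurus software.

```python
from collections import defaultdict

# Hypothetical BT links: term -> list of broader terms.
# A list is used because some systems allow multiple BTs per term.
broader = {
    "roses": ["flowers"],
    "tulips": ["flowers"],
    "flowers": ["plants"],
}


def derive_narrower(broader):
    """Build the NT index as the inverse of the stored BT links."""
    narrower = defaultdict(list)
    for term, broader_terms in broader.items():
        for bt in broader_terms:
            narrower[bt].append(term)
    # Sort each NT list so the display order is stable and alphabetical.
    return {term: sorted(nts) for term, nts in narrower.items()}


narrower = derive_narrower(broader)
print(narrower["flowers"])
```

Storing only BT and deriving NT avoids the two directions drifting out of step when a term is moved in the hierarchy.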
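The USE/UF pair can be sketched as a simple lookup table, using the 'topic map' example above. This is only an illustration under assumed names (use, used_for, preferred_term); real thesaurus tools may store these links differently.

```python
# Hypothetical USE links: non-preferred (entry) term -> preferred term.
use = {
    "topic navigation map": "topic map",
}


def used_for(use):
    """Derive the UF index (preferred term -> entry terms) from the USE links."""
    uf = {}
    for entry_term, preferred in use.items():
        uf.setdefault(preferred, []).append(entry_term)
    return uf


def preferred_term(term, use):
    """Return the term to index or search with: follow USE if present."""
    return use.get(term, term)


print(preferred_term("topic navigation map", use))
print(used_for(use))
```

An indexer or search front end would call preferred_term on each term a user enters, so that documents about the same concept end up under a single term.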
TT (Top Term) - refers to the topmost ancestor of this term. The term at the other end of this property is the one that would be found by following the 'BT' property until you reach a term that has no 'BT'. This property is strictly speaking redundant, in the sense that it doesn't add any information, though it may be convenient.
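Since TT can be derived by following BT links, it can be computed rather than stored. A minimal sketch follows, assuming BT links are held in a dictionary; the function name top_terms and the sample terms are hypothetical. It returns a set because, in systems that allow multiple BTs, a term may have more than one topmost ancestor.

```python
def top_terms(term, broader):
    """Follow BT links upward and return every ancestor that has no BT."""
    seen = set()
    tops = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against accidental cycles
        seen.add(current)
        parents = broader.get(current, [])
        if not parents:
            tops.add(current)  # no BT: this is a top term
        stack.extend(parents)
    return tops


# Hypothetical BT links: term -> list of broader terms.
broader = {
    "roses": ["flowers"],
    "flowers": ["plants"],
}

print(top_terms("roses", broader))
```

A term with no BT is its own top term in this sketch.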
RT (Related Term) - refers to a term that is related to the current term, without being a synonym or a broader/narrower term. For 'topic map' we could use this to indicate that 'subject-based classification' and 'ontology' are terms related to 'topic map'.
One could say that taxonomies are thesauri that use only the BT/NT properties to build a hierarchy and make no use of the other properties described above, so every thesaurus contains a taxonomy. In short, thesauri provide a much richer vocabulary for describing terms than taxonomies do, and so are more powerful tools. Using a thesaurus instead of a taxonomy addresses several practical problems in classifying objects and in searching for them: synonyms are directed to a single preferred term, scope notes disambiguate terms, and related terms point to neighbouring concepts.
In multilingual thesauri equivalence also applies between corresponding terms in different natural languages. Establishing correspondence is not always easy, and the standards for multilingual thesauri (ISO 5964 and its successor ISO 25964, listed below) provide recommendations for handling the difficulties that commonly arise.
International Standards for Thesauri Development
There are several international standards which define the basic rules for thesaurus development:
- UNESCO Guidelines for the establishment and development of monolingual thesauri. 1970 (followed by later editions in 1971 and 1981)
- ISO 2788 Guidelines for the establishment and development of monolingual thesauri. 1974 (revised 1986)
- ISO 5964 Guidelines for the establishment and development of multilingual thesauri. 1985
- ISO 25964 Thesauri and interoperability with other vocabularies. Part 1: Thesauri for information retrieval, 2011; Part 2: Interoperability with other vocabularies, 2013
INIS Thesaurus
The INIS Thesaurus is one of the main products of the International Nuclear Information System (INIS) and is the result of a systematic study performed by subject specialists at the INIS Secretariat and in INIS Member States.
The domain of knowledge covered by the INIS Thesaurus includes
- physics (in particular, plasma physics, atomic and molecular physics, and especially nuclear and high-energy physics),
- chemistry,
- materials science,
- earth sciences,
- radiation biology,
- radioisotope effects and kinetics,
- applied life sciences,
- radiology and nuclear medicine,
- isotope and radiation source technology,
- radiation protection,
- radiation applications,
- engineering,
- instrumentation,
- fossil fuels,
- synthetic fuels,
- renewable energy sources,
- advanced energy systems,
- fission and fusion reactor technology,
- safeguards and inspection,
- waste management,
- environmental aspects of the production and consumption of energy from nuclear and non-nuclear sources,
- energy efficiency and energy conservation,
- economics and sociology of energy production and use,
- energy policy, and
- nuclear law.
The terms in the INIS Thesaurus are listed alphabetically. For each alphabetical entry, a "word block", containing the terms associated with this particular entry, is displayed. In the word block, terms that have a hierarchical relationship to the entry are identified by the symbols BT for Broader Term, and NT for Narrower Term; a term with an affinitive relationship is identified by RT, for Related Term; terms with a preferential relationship are identified by USE or SEE, and UF for Used For, and SF for Seen For. In case of multiple USE relationships for a forbidden term, all listed descriptors should be used to index or search a given concept. In case of multiple SEE relationships, one or more of the listed descriptors should be considered for indexing or searching this concept.
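The difference between USE and SEE for a forbidden term can be sketched as a query-expansion step. This is one possible reading, not the INIS implementation: it assumes that "all listed descriptors" maps to a Boolean AND and "one or more of the listed descriptors" maps to a Boolean OR. The entries below are placeholders and are not taken from the INIS Thesaurus.

```python
# Placeholder entries (hypothetical, not real INIS terms).
USE = {"forbidden term a": ["DESCRIPTOR 1", "DESCRIPTOR 2"]}
SEE = {"forbidden term b": ["DESCRIPTOR 3", "DESCRIPTOR 4"]}


def expand(term):
    """Turn a user-entered term into a Boolean query over descriptors."""
    key = term.lower()
    if key in USE:
        # Multiple USE: all listed descriptors are applied together.
        return " AND ".join(f'"{d}"' for d in USE[key])
    if key in SEE:
        # Multiple SEE: one or more may apply, so keep every candidate (assumed OR).
        return " OR ".join(f'"{d}"' for d in SEE[key])
    # Not a forbidden term: treat the input as a descriptor in its own right.
    return f'"{term.upper()}"'


print(expand("Forbidden term A"))
print(expand("Forbidden term B"))
```

For indexing rather than searching, the SEE case would more likely present the candidate descriptors to the indexer for selection instead of combining them automatically.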
Over the years the INIS Thesaurus has evolved as a result of systematic study and it contains over 40 000 terms.
Using the INIS Thesaurus as a starting point, INIS, in cooperation with the Member States, has developed the INIS Multilingual Thesaurus, with navigation capabilities that include the full thesaurus hierarchy. It is available in all official languages of the IAEA (Arabic, Chinese, English, French, Russian, and Spanish), as well as in German and Japanese.
It is a unique multilingual thesaurus in the nuclear field and serves as a major tool for indexing and describing nuclear information and knowledge in a structured form.
The INIS Multilingual Thesaurus is available online: https://nkp.iaea.org/Thesaurus/