Library Science

Indexing language

Juran Sarkhel (2017).

(Professor of Library & Information Science, University of Kalyani, India)


1.1 Introduction:

An indexing language (IL) is an artificial language made up of expressions connecting several kernel terms and adapted to the requirements of indexing. The function of an IL is to do whatever a natural language (NL) does and, in addition, to organize the semantic content through a different expression that provides a point of access for seekers of information. An IL is a system for naming subjects and has a controlled vocabulary. The vocabulary of an IL may be verbal or coded. A classification scheme uses a coded vocabulary in the form of notation, whereas authority lists use a verbal vocabulary. To study the structure of an indexing language, one must first understand the linguistic structures and functions of the language used to represent the subject content of documents. Thus, there are areas of linguistics which are of common interest to information scientists.

1.2 Meaning and Scope of Indexing Language:

A language is a code through which messages are transmitted. It is a communication medium based on the association of thoughts/ideas. In terms of linguistics, all spoken languages (i.e. natural language) consist of three basic elements: vocabulary, syntax, and semantics. Vocabulary is a list of terms/words used in a particular natural language. Syntax comprises a grammatical structure or a set of rules that govern the sequence of occurrence of terms/words in a sentence. Semantics refers to the study of what meaning is and how it operates. It is, in other words, a systematic study of how meaning is structured, expressed and understood in the use of a language. The syntax is used to resolve word meaning through the determination of context.

Information systems are concerned with the communication of information about documents to the potential users of those documents. The means of communication is the subject indexing language, or simply the indexing language. An IL is a system for naming the subjects of records of information (i.e. documents). It is an artificial language made up of expressions connecting several kernel terms/notations. The function of an IL is to do whatever an NL does and, in addition, to organize the semantic content through a different expression that provides a point of access for seekers of information. Thesauri, readymade lists of subject headings and classification schemes are examples of subject indexing languages.

1.3 Natural Language (NL) versus Indexing Language:

If the terms that appear in the documents are used without modification, the language is an NL. Since the use of an NL leads to many problems, such as those arising from different authors using different words to denote the same idea, an alternative is to use an artificial language adapted to the specific needs of indexing. The artificial language uses concept indexing rather than term indexing. Terms belong to the NL used by authors; concepts carry the standard description established in the IL. The NL is flexible and allows authors to use different terms to denote the same concept. The indexer, who is more concerned with the ideas conveyed than with the niceties of language, depends upon the artificial language. The differences between natural language and indexing language are set out below:

 

| Natural Language | Indexing Language |
| --- | --- |
| A natural language is a set of codes and their admissible expressions used for the communication of ideas in speech and writing in day-to-day life. | An indexing language is a set of codes and their admissible expressions used for representing the content of documents as well as the queries of users. |
| A natural language is “natural” in the sense that it grows freely on the lips of human beings, totally free from any control whatever. | An indexing language is “artificial” in the sense that it may depend upon the vocabulary of a natural language, though not always, but its syntax, semantics, and orthography differ from those of the natural language. |
| A natural language is developed for the communication of ideas among human beings in their day-to-day life. | An indexing language is developed and used for a special purpose, i.e. for representing the thought content of documents as well as the queries of users. |
| A natural language is a free language and there is no control of synonyms and homographs. One concept may be denoted by more than one term. There is no standardization of terms or words. Anybody can use any words/terms to express his/her ideas. | An indexing language is a controlled language. There is a restriction on the words/terms that may be used. Synonyms and homographs are controlled. There is standardization of terms/words. One concept is denoted by only one term. |
| A natural language provides auxiliaries like prepositions, conjunctions, etc. to bring out the correct meaning of the sentence. | Such auxiliaries are not available in an indexing language. The order of terms according to the syntactical rules of the indexing language, along with relational symbols like role operators or indicator digits, brings out the correct meaning of a subject heading. |

1.4 Structure of Indexing Language:

Like natural language, an indexing language consists of three elements: (a) Vocabulary (not free vocabulary, but controlled vocabulary), (b) Syntax, and (c) Semantics. All the structured indexing languages are based upon careful subject analysis. The following figure presents the structure of an indexing language:

[Figure: Structure of Indexing Language]

1.4.1 Controlled Vocabulary: An indexing language operates with a controlled vocabulary. An IL whose controlled vocabulary indicates the relationships between the terms in the index vocabulary is systematically structured. The vocabulary of an IL is either verbal or coded. Subject heading lists and thesauri come within the purview of verbal controlled vocabulary. A classification scheme employs coded vocabulary in the form of its notation. Thus, for example, in the Colon Classification (CC) schedule ‘Indian History’ is rendered as V44. In the Sears List of Subject Headings, which employs verbal vocabulary, it is rendered as: India – History. There are also controlled vocabularies like Thesaurofacet, Classaurus, etc., which possess the characteristics of both verbal and coded controlled vocabularies. In any case, the selection of terms to be used in each discipline is primary and coding is done at a later stage. The need, objectives, methods of vocabulary control, etc. are discussed in detail in another post.

1.4.2 Syntax: The etymological meaning of syntax is ‘putting things together in an orderly manner’. In the context of an indexing language, syntax refers to a set of rules or grammar which governs the sequence of words in a subject heading, or notations in a classification number.

Most of the subjects treated even in modern macro documents are of compound nature. This means that the name of a subject can no longer be represented by a single word or term. When a number of terms have to be used in representing the subject coextensively, the syntax is necessary to put the terms in a most helpful and known searchable order. In other words, we can say that syntax of an indexing language provides a pattern of the relationship which we recognize between the terms used in the system, i.e. between the terms in the index vocabulary or controlled vocabulary. This recognition is based on a careful subject analysis which is basic to the indexing language.

The order of terms according to the rules of syntax of an indexing language assumes great importance in presenting the correct meaning of a subject heading. Apart from the order of terms prescribed by its rules of syntax, it becomes necessary, at times, to use relational symbols or indicator digits to bring out the correct relations between terms. A natural language provides auxiliaries like prepositions, conjunctions, etc. to bring out the correct meaning of the sentence. In an indexing language such auxiliaries are not available, and hence the correct meaning of a subject heading has to be expressed largely through the order of terms along with relational symbols like role operators or indicator digits. The syntactical relationship is a document-dependent relationship.

1.4.3 Semantics: As stated earlier, semantics refers to the systematic study of how meaning is structured, expressed and understood, here in the use of an indexing language. Various types of semantic relationships are evident in an indexing language. These include equivalence relationships, hierarchical relationships, and associative relationships. The meaning of a term can be derived from its hierarchy. The semantic relationship is a document-independent relationship. The syntactical rules of an indexing language are also used to resolve the meaning of a term in a subject heading (consisting of a string of terms) through the determination of context.

1.5 Attributes of an Indexing Language:

An indexing language is designed for a special purpose. It serves three purposes: representing the subject content of documents, organizing a searchable file, and representing the subject content of users' queries when they search the index file. A positive result in searching is achieved only when the content representation of the document by the indexer and that of the query by the searcher match. This matching depends very much on the organization of the index file in a predetermined order and on the users' awareness of that order. Various attributes of an indexing language, such as vocabulary control, concept coordination, multiple access, syndetic devices, relation manifestation, and structural presentation, play very important roles in the successful organization of the index file and the subsequent matching of the index with the queries of the users.

1.5.1 Vocabulary Control: The vocabulary of an indexing language is controlled in order to standardize terms, i.e. one concept should be denoted by only one term. This is done by controlling synonyms, near-synonyms and word forms, and by distinguishing among homographs.
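The idea of vocabulary control can be sketched in a few lines of Python. This is only a minimal illustration, not a real vocabulary tool: the term lists, the qualifier style in parentheses and the function name `control` are all assumptions made for the example.

```python
# Hypothetical miniature controlled vocabulary; every entry is illustrative.
# Synonyms, near-synonyms and word forms all map to one preferred term.
PREFERRED = {
    "automobiles": "Automobiles",
    "automobile": "Automobiles",
    "cars": "Automobiles",
    "motor cars": "Automobiles",
}

# Homographs: one spelling with several meanings, each given a qualifier.
HOMOGRAPHS = {
    "crane": ["Crane (Bird)", "Crane (Lifting equipment)"],
}


def control(term):
    """Return the authorized heading(s) for a natural-language term."""
    key = " ".join(term.lower().split())  # normalise case and spacing
    if key in HOMOGRAPHS:
        # The indexer or searcher must choose among the qualified headings.
        return list(HOMOGRAPHS[key])
    if key in PREFERRED:
        return [PREFERRED[key]]
    return []  # the term is outside the controlled vocabulary
```

Under these assumptions, `control("Cars")` and `control("motor cars")` lead to the same heading, while `control("Crane")` returns both qualified headings so that the intended meaning can be selected.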

1.5.2 Concept Coordination: The contents of most present-day documents cannot be represented by a single term. Because multiple terms, and multiple relationships among terms, are needed to represent the content of a document, it has become imperative to provide standard guidelines for the coordination of the concepts denoted by the terms. One of the essential components of an indexing language, its syntax, governs the sequence of words in a subject proposition. A natural language provides auxiliaries like prepositions, conjunctions, etc. among the substantive words to bring out the correct meaning of the sentence, but such auxiliaries are not available in an indexing language. The correct meaning of a subject is expressed mainly through the order of terms according to the rules of syntax, sometimes along with the relational symbols of the indexing language, such as role operators or indicator digits. These rules of syntax vary from one indexing language to another. Coordination of concepts is carried out by the indexer at the time of indexing (i.e. at the input stage) in pre-coordinate indexing and by the searcher at the time of searching (i.e. at the output stage) in post-coordinate indexing.
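The difference between the two stages of coordination can be illustrated with a small sketch. The documents, index terms, the " - " separator and all function names below are hypothetical; the sketch assumes that post-coordination is modelled as the intersection of posting sets, and pre-coordination as a heading string built by the indexer.

```python
from collections import defaultdict

# Hypothetical documents, each indexed with single-concept terms.
DOCUMENTS = {
    1: {"Literacy", "Women", "India"},
    2: {"Literacy", "Children", "India"},
    3: {"Women", "Employment", "India"},
}


def build_inverted_file(documents):
    """Map each index term to the set of documents that carry it."""
    inverted = defaultdict(set)
    for doc_id, terms in documents.items():
        for term in terms:
            inverted[term].add(doc_id)
    return inverted


def post_coordinate_search(inverted, *terms):
    """Coordinate concepts at the output stage: intersect the posting sets."""
    postings = [inverted.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def pre_coordinate_heading(terms_in_citation_order):
    """Coordinate concepts at the input stage: the indexer fixes the order."""
    return " - ".join(terms_in_citation_order)
```

With this toy file, a searcher who combines "Literacy" and "Women" retrieves only document 1, whereas in a pre-coordinate index the same document would already be filed under a single heading such as the one returned by `pre_coordinate_heading(["Women", "Literacy", "India"])`.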

1.5.3 Multiple Access: The syntactical rules of a given indexing language help us to determine the order of significance in a linear representation of the subject of a document. A single linear order provides only a single point of access in the searchable index file, and the rigidity of the significance order may not meet the approaches of all the users of the index file. In order to satisfy the approaches of all users, indexing languages have introduced mechanisms for generating multiple index entries by rotating or cycling the component terms that represent the subject of the document. The rotation is carried out in such a way that each component term reaches the access position as the lead term of an index entry. Each lead term is followed by the other component terms in order to maintain the context and correct meaning of the subject proposition. The provision of a mechanism for multiple index entries by rotating or cycling the component terms is a special feature of an indexing language. However, even this multiple access mechanism covers only a fraction of the total number of possible permutations, so the index file may fail to provide the particular pattern of combination that the user is looking for. Consequently, a large portion of probable approach points is left uncovered.
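A short sketch makes the arithmetic behind this limitation concrete. It assumes the simplest form of cycling, in which the term string is rotated one position at a time; the function names and sample terms are illustrative and are not drawn from any particular indexing system.

```python
from math import factorial


def rotated_entries(terms):
    """Cycle the term string so that each term becomes the lead term once."""
    return [terms[i:] + terms[:i] for i in range(len(terms))]


def coverage(terms):
    """Return (entries produced by rotation, total possible permutations)."""
    n = len(terms)
    return n, factorial(n)


subject = ["Women", "Literacy", "Rural areas", "India"]
for entry in rotated_entries(subject):
    # The lead term comes first; the remaining terms preserve the context.
    print(" / ".join(entry))
```

For a subject with four component terms, rotation yields four entries out of 4! = 24 possible orderings, which is why a searcher who expects, say, "Literacy" followed immediately by "India" may not find that exact sequence in the index file.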

1.5.4 Syndetic Device: Syndetic device is an organizational framework in which related subjects are linked together in an underlying classificatory structure.

• Cross References: Related subjects are linked to each other by a network of references using the connecting term See also, and equivalent subjects by the connecting terms See / USE / UF.

• Inversion of Headings: The strict adherence to the natural language order of terms in a subject heading would often lead to headings in which the first word is not the most significant. In such a situation natural language order of terms is inverted in a subject heading.

1.5.5 Relation Manifestations: The range of an indexing language is not simply a matter of vocabulary. Provision must be made, through rules of syntax, for the expression of relationships between the terms comprising the vocabulary. These relationships, as conceived by a team led by J. C. Gardin during the SYNTOL (Syntagmatic Organization Language) programme in the 1960s, which aimed to develop a meta-language as a common ground between different information retrieval systems, are of two kinds: paradigmatic and syntagmatic relations.

• Paradigmatic relationship: Paradigmatic relations, also called semantic or generic relations, usually find expression in the organization of the vocabulary itself. Thus in classification schedules, such relations are made explicit through successive degrees of subordination. In readymade lists of subject headings or thesauri, hierarchical relationships are expressed through the relationship indicators BT and NT. A paradigmatic relationship is a document-independent relationship because it is established without any reference to a document.

• Syntagmatic relationship: In addition to the expression of paradigmatic relationships, rules and facilities are provided for the coordination of terms from the vocabulary in order to express more complex meanings. Syntagmatic relationships, also called syntactical relationships, are achieved by means of the syntactical rules of the given indexing language. Two major syntactic devices common in indexing languages are word or term order and relators or linking mechanisms. Kaiser’s Thing-Process, Ranganathan’s PMEST, and Coates’s Thing-Material-Action and Relationship Table are some examples of formulae for determining term order and thus standardizing and controlling syntax. A major principle underlying term order is significance. The component order of a compound subject heading can be expressed in more than one way. The question of order can be answered by reference to significance, that is, by an analysis of the relative importance to the searcher of the concepts concerned. The result of such analysis is to bring key concepts into prominence. Syntagmatic relationships are document-dependent relationships because they are established with reference to the concepts associated with the content of a given document. For example, a document entitled “Social aspects of literacy among rural women in India” will call for the combination of concepts from Sociology, Education and Geography. They represent the syntagmatic relation in the context of this specific document, and an indexing language must have a mechanism to represent these concepts in a subject heading.
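How a formula standardizes term order can be sketched as follows, taking Ranganathan's PMEST (Personality, Matter, Energy, Space, Time) as the formula. The facet labels assigned to the sample terms are illustrative assumptions made by a hypothetical indexer, not taken from any published schedule, and the function name `citation_order` is likewise invented for this example.

```python
# Facet sequence prescribed by the PMEST formula.
PMEST = ["P", "M", "E", "S", "T"]


def citation_order(tagged_terms):
    """Arrange (term, facet) pairs in PMEST order.

    `tagged_terms` is a list of (term, facet) pairs where facet is one of
    "P", "M", "E", "S", "T". The facet assignment itself is the indexer's
    analytical decision and is simply taken as input here.
    """
    rank = {facet: position for position, facet in enumerate(PMEST)}
    ordered = sorted(tagged_terms, key=lambda pair: rank[pair[1]])
    return [term for term, _facet in ordered]


# Hypothetical analysis of a document on harvesting rice in India in the 1990s.
analysed = [("India", "S"), ("1990s", "T"), ("Harvesting", "E"), ("Rice", "P")]
print(" : ".join(citation_order(analysed)))
```

Whatever order the concepts appear in the title, the formula yields one predictable sequence for the heading, which is the sense in which such formulae control syntax.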

1.5.6 Structural Presentation: The basic objective of an indexing language is to provide users with a subject approach to the contents of documents. It is generally agreed that a user-oriented approach need not be confined to the specific subject only. A user who starts a search for the specific subject ‘Conservation of tiger’ may overlook a document on ‘Conservation of wildlife’, which may contain equally valuable information on ‘Conservation of tiger’, because he believes that the more specific subject will not be covered in the broader document. Similarly, a document on ‘Conservation of tiger’ may contain equally valuable information on ‘Conservation of wildlife’. Thus broader as well as narrower subjects may help the user even with a specific search. This situation calls for structuring the indexing language in such a systematic manner that the semantic network of concepts and the relationships between concepts are displayed in it. All indexing languages display such relationships in one way or another, and thus all of them are structured. A classification scheme displays such relationships by notation, whereas a verbal indexing language, like a readymade list of subject headings or a thesaurus, displays them by the relationship indicators BT and NT.

1.6 Types of Indexing Languages:

1.6.1 Classification Schemes: It has already been mentioned in sub-section 1.4.1 of this Unit that the controlled vocabulary of an IL is of two types: verbal and coded. A classification scheme employs coded vocabulary in the form of its notation. Libraries have long been using notational schemes of library classification to organize information resources on the shelves, and to provide means for locating information resources in bibliographical tools such as catalogues, bibliographies, and abstracting and indexing documents. Classification is a mental process of grouping entities according to their degree of likeness and separating entities according to their degree of unlikeness. All class designations of subjects are names of subjects, irrespective of whether they are expressed as class numbers or as verbal specifications. A class designation assigned in the notational plane is called a class number, and the tools prepared for this purpose are classification schemes. Modern classification schemes such as Dewey Decimal Classification (DDC), Universal Decimal Classification (UDC), Library of Congress (LC) Classification, Colon Classification (CC), etc. were devised several decades ago or more. Although they have been modified and improved over the years, their main objectives remain unchanged. The Web version of DDC, 22nd edition, i.e. WebDewey, includes all updates since its publication in 2003 plus supplemental data. The most important feature of WebDewey is that it gives additional points of access by combining DDC numbers and Library of Congress Subject Headings (LCSH). It also gives access to many pre-built numbers, especially in the Literature class, which are not available in the print version. Although classification schemes were mainly designed for organizing bibliographic items, many researchers have also used classification schemes to organize information resources on the Web.

1.6.2 Subject Heading Lists: A subject heading has been defined as a word or group of words (phrase) indicating a subject under which all materials dealing with the same theme are entered in a catalogue or bibliography, or arranged in a file. A vocabulary control device depends on a master list of words/terms that can be assigned to documents. Such a master list of terms is referred to as a list of subject headings. A subject heading list is an alphabetical list of terms and phrases, with appropriate cross-references and notes, that can be used as a source of subject headings in order to represent the subject content of a document. A list of other semantically related terms or phrases is displayed under each term or phrase. A printed list of subject headings incorporates the thought and experience of many librarians from various types of libraries.

General Principles

The rules for subject headings in a dictionary catalogue were formulated by Charles Ammi Cutter in 1876 in his ‘Rules for a Dictionary Catalog’. The impact of Cutter’s principles on the construction and maintenance of subject headings is still discernible today. Both the LCSH and the Sears List of Subject Headings (SLSH) adopted Cutter’s principles in assigning subject headings to a document. The general principles that guide indexers in the choice and rendering of subject headings from the standard lists of subject headings are discussed in the following sub-sections.

• Specificity: The principle of specific and direct entry requires that a document be entered directly under the most specific subject heading that accurately and precisely represents its subject content. If a document is about penguins, it should be entered directly under the most specific heading ‘Penguins’, not under the heading ‘Birds’ or even under ‘Water Birds’, which includes ‘Penguins’. If the name of a specific subject is not available, a broader heading is used, namely the most specific authorized heading in the hierarchy that covers the content of the work. In many cases, several headings may be assigned in order to cover different aspects of a subject.

• Common Usage: This principle states that the word(s) used to express a subject must represent common usage. There may be problems in the selection of subject headings when the same concept is expressed by two or more terms. According to this principle, subject headings are to be chosen keeping in mind the needs of the users who are likely to use the index file. If a choice between spellings has to be made for dialectal reasons (for example, between American and British English), the most widely accepted spelling, based on user warrant, should be adopted. If a popular and a scientific name refer to the same concept, the form most likely to be sought by the users should be chosen. After deciding on the form of the heading, a cross-reference should be made from the non-preferred to the preferred form.

• Uniformity: The principle of uniform heading is adopted in order to bring consistency to the use of subject headings. A subject heading list has to be very precise and exact in order to ensure that each concept is represented by a single preferred term. Both synonyms and homographs are to be controlled. The list should show the other synonyms and variants as non-preferred terms with USE references to the preferred term. One uniform term must be selected from several synonyms and other variants, and this term must be applied consistently to all documents on the topic. If several meanings are attached to one term (e.g., Crane as a bird / Crane as lifting equipment), that term must be qualified so that it is clear to the users which meaning is intended.

• Consistent and Current Terminology: The principle states that the term(s) chosen as subject headings should be both consistent and current, as has already been said in justifying uniform headings. In principle, common usage prevails when there is a choice to be made among synonymous terms and other variants. Changes in usage, however, present many practical difficulties. A term chosen on the basis of common usage may become obsolete with the passage of time. A list of subject headings may then incorporate current terminology, although the large number of entries already listed under the existing subject headings poses a problem. In such a situation a subject authority file is to be maintained. Once a heading is changed, every record that was linked to the old heading can be linked to the new heading, and this decision is recorded in the subject authority file.

• Form Heading: Form headings are words or phrases that represent the literary or artistic form of a work (e.g., Essays, Poetry, Fiction, etc.). When such words or phrases follow a subject heading, they are introduced by a dash and make the subject more specific. Assigning form headings to individual works, as well as to collections and to materials about the form, enables libraries to provide users with access to these kinds of materials. Apart from literary works themselves, there are also many kinds of library materials about literary forms that require subject headings. For a document on how to write an essay, the heading “Essay” represents a subject. A topical subject heading and a form heading can be distinguished by using the singular form for the topical subject heading and the plural for the form heading (e.g., Short story, Short stories). In addition to the literary form headings, there are other form headings that are determined by the general format and purpose of the documents, such as Almanacs, Encyclopaedias, Dictionaries, and Gazetteers.

• Cross Reference: Cross-references direct the user from terms/phrases not used as headings to the term/phrase that is used, and from broader and related topics to the one chosen to represent a given subject. Three types of cross-references are used in the subject headings structure. These are discussed below:

• See (or USE) references: These references guide users from terms that are not used as headings to the authorized headings for the subject in question. ‘See’ or ‘USE’ references ensure that, in spite of different names for (or different forms of the name of) a given subject, a user will still be able to locate materials on it.

• See also (including BT, NT, and RT) references: These references guide users to headings that are related either hierarchically or associatively and are used as entries in the index file. By connecting related headings, ‘see also’ (RT, for related term) references draw the user’s attention to material related to his interest. By linking hierarchically related headings, ‘see also’ (BT, for broader term; NT, for narrower term) references direct the user to specific divisions or aspects of his subject of interest.

• General references: General references direct the users to a group or category of headings instead of individual headings. A general reference is sometimes called a ‘blanket reference’. The provision of general references in the standard list of subject headings obviates the need to make long lists of specific references and thus ensures economy of space.
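The first two kinds of cross-reference can be modelled as a simple lookup structure. The sketch below is illustrative only: the headings and their relationships are invented, the dictionary layout and function names are assumptions, and general (blanket) references are left out because they point to categories of headings rather than to individual entries.

```python
# Hypothetical reference structure; headings and links are illustrative.
REFERENCES = {
    "Cars": {"USE": ["Automobiles"]},  # non-preferred term
    "Automobiles": {
        "UF": ["Cars"],                # used for (reciprocal of USE)
        "BT": ["Motor vehicles"],      # broader term
        "NT": ["Sports cars"],         # narrower term
        "RT": ["Roads"],               # related term
    },
    "Motor vehicles": {"NT": ["Automobiles"]},
    "Sports cars": {"BT": ["Automobiles"]},
    "Roads": {"RT": ["Automobiles"]},
}


def resolve(term, references):
    """Follow a See/USE reference to the authorized heading, if any."""
    if term not in references:
        return None                    # term is unknown to the list
    targets = references[term].get("USE")
    return targets[0] if targets else term


def see_also(term, references):
    """Collect the See also references (BT, NT, RT) of the authorized heading."""
    entry = references.get(resolve(term, references), {})
    return {label: entry.get(label, []) for label in ("BT", "NT", "RT")}
```

A user who enters "Cars" is thus led to "Automobiles" and, from there, to the broader, narrower and related headings listed under it.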

Subject Authority File:

A subject authority file consisting of subject authority records ensures uniformity and consistency in subject heading terminology and cross-references. The process of creating subject authority records and maintaining a subject authority file is called subject authority control. Subject authority control is exercised at two levels: central and local. At the central level, a central agency (e.g. the Library of Congress) maintains the subject authority file (in card or machine-readable form) or subject heading list (in print form) and makes changes to existing headings and cross-references as well as adding new ones. At the local level, a library creates local subject authority records, along with the needed maintenance information, only for headings that have not yet appeared as established headings in a subject heading list. Thus, subject authority control at the local level includes correcting erroneous headings and cross-references, updating obsolete headings, and adding or revising cross-references necessitated by new headings. The ALA Glossary has defined the subject authority file as “A set of records indicating the authorized forms of terms used as subject headings in a particular set of bibliographic records; the references made to and from the authorized forms; and the information used, and its sources, in the establishment of the headings and the determination of the references to be made” (ALA Glossary of Library and Information Science. Chicago: American Library Association, 1983, p. 220). This definition suggests that a subject authority record should contain the following items of information: (a) the established subject heading; (b) scope notes, if any; (c) cross-references made from it to other headings; and (d) the sources or authorities on which the decision on the form of heading was based. A subject authority record is made when a subject heading is established and used for the first time.

The functions of a subject authority file are discussed below:

• Indexing: The subject authority file serves as the source of indexing vocabulary and as the means of verifying or validating headings assigned to individual indexing records. It helps to ensure that: a) the same heading is assigned to all works on the same subject, b) each heading represents only that particular subject, and c) all headings assigned to indexing records conform to the established forms.

• Maintenance: Indexing records need to be adjusted from time to time as a result of changes in the indexing vocabulary. When existing subject headings are revised or new headings are added, cross-references are often affected and should be adjusted. The subject authority file reflects the most current status of headings and cross-references and thus serves as the source for verifying and validating the subject headings in the indexing records. It is also useful when a library converts from manual to online form and wishes to have previously existing records reflect current practice.

• Retrieval: Subject authority file helps the users in two ways: (1) subject headings displayed in the subject authority file show the user the terminology and form of subject access points in the index file; and (2) the cross-references guide the users to related headings when user’s input terms fail to retrieve useful records.
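The items of information in a subject authority record, and the indexing and maintenance functions described above, can be sketched as a small data structure. All class, field and method names here are hypothetical, as are the sample headings and the source note; a real authority file would follow an established record format rather than this simplified model.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorityRecord:
    heading: str                                    # established subject heading
    scope_note: str = ""                            # scope note, if any
    see_from: list = field(default_factory=list)    # non-preferred forms
    see_also: list = field(default_factory=list)    # related headings
    sources: list = field(default_factory=list)     # authorities consulted


class AuthorityFile:
    def __init__(self):
        self.records = {}

    def establish(self, record):
        """Add a record when a heading is established for the first time."""
        self.records[record.heading] = record

    def is_valid(self, heading):
        """Indexing function: verify a heading against the established forms."""
        return heading in self.records

    def change_heading(self, old, new, bibliographic_records):
        """Maintenance function: record the change and relink existing records."""
        record = self.records.pop(old)
        record.see_from.append(old)   # the old form becomes a See reference
        record.heading = new
        self.records[new] = record
        for bib in bibliographic_records:
            bib["subjects"] = [new if s == old else s for s in bib["subjects"]]


authority = AuthorityFile()
authority.establish(AuthorityRecord("Aeroplanes", sources=["hypothetical usage survey"]))
catalogue = [{"title": "Sample title", "subjects": ["Aeroplanes", "Aeronautics"]}]
authority.change_heading("Aeroplanes", "Airplanes", catalogue)
```

After the change, the catalogue record carries the new heading and the old form survives only as a reference in the authority record, which is the behaviour the maintenance function is meant to guarantee.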

1.6.3 Thesaurus:

The term thesaurus has been derived from Greek and Latin words which mean ‘a treasury’ and it has been used for several centuries to mean a lexicon or treasury of words. Modern usage may be said to date from 1852, when the first edition of Thesaurus of English Words and Phrases was published by Peter Mark Roget. A thesaurus (plural: thesauri) with which we are concerned is meant for information retrieval and is used as a valuable vocabulary control device for indexing and searching in a specific subject area. The journey of the thesaurus from the linguistic domain to information retrieval is evident from the following timeline:

1736: The term ‘thesaurus’ first appeared in the OED. It came from the Greek word ‘thesauros’, which means ‘treasury or storehouse of knowledge’.

1852: Peter Mark Roget’s Thesaurus appeared. It was a linguistic thesaurus showing the word(s) by which a given idea may be most fitly and aptly expressed, i.e. a classification of ideas.

1957: Dorking Conference. Miss Helen Brownson first introduced the idea of a ‘thesaurus’ in terms of IR through a paper presented there.

1959: H. P. Luhn gave the idea of the application of the thesaurus in IR.

1969: The first thesaurus used in an IR system was developed by Du Pont in the USA.

Definition:

There are many different definitions of thesauri, ranging from quite modest definitions that focus on the relations between words without stating which kinds of relations are meant, to definitions that state more exactly which relations are concerned. The definition of the thesaurus provided by the World Science Information System of UNESCO (known as UNISIST), based on its function and structure, seems the most comprehensive for understanding the meaning and scope of the thesaurus:

“In terms of function, a thesaurus is a terminological control device used in translating from the natural language of documents, indexers or users into a more constrained `system language’ (documentation language, information language)”. “In terms of structure, a thesaurus is a controlled and dynamic vocabulary of semantically and generically related terms which covers a specific domain of knowledge”.

Purpose:

A thesaurus is a semantic network of terms. Its purposes are:

a) To provide a map of a given field of knowledge, showing how concepts or ideas about concepts are related to one another, which helps an indexer or a searcher to understand the structure of the field.

b) To provide a standard vocabulary for a given subject field which will ensure that indexers are consistent when they are making index entries to an information storage and retrieval system.

c) To provide a system of references between terms which will ensure that only one term from a set of synonyms is used for indexing one concept, and that indexers and searchers are told which of the set is the one chosen; and to provide a guide to terms which are related to any index term in other ways, either by classification structure or otherwise in the literature.

d) To provide a guide for users of the system so that they choose the correct term for a subject search; this stresses the importance of cross-references. If an indexer uses more than one synonym in the same index—for example, “abroad”, “foreign” and “overseas”—then documents are liable to be indexed haphazardly under all of these; a searcher who chooses one and finds documents indexed there will assume that he has found the correct term and will stop his search without knowing that there are other useful documents indexed under the other synonyms.

e) To locate a new concept in a scheme of relationships with existing concepts in a way which makes sense to users of the system.

f) To provide classified hierarchies so that a search can be broadened and narrowed systematically if the first choice of search term produces either too few or too many references to the materials in the store.

g) A desirable purpose, but one which it would be premature to say is being achieved, is to provide a means by which the use of terms in a given subject field may be standardized.
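
Purpose (f) can be sketched in a few lines of Python. The hierarchy and the document numbers below are invented for illustration and do not come from any real thesaurus; the function names are likewise hypothetical. The sketch only shows the idea of moving one step down the hierarchy when a term retrieves too much, and one step up when it retrieves too little.

```python
# Hypothetical data: a small hierarchy (term -> narrower terms) and an
# index file (term -> document numbers). Both are invented for this sketch.
NARROWER = {
    "Vertebrates": ["Mammals", "Birds"],
    "Mammals": ["Whales"],
}
POSTINGS = {
    "Vertebrates": {1},
    "Mammals": {2, 3},
    "Birds": {4},
    "Whales": {5},
}


def broader_of(term):
    """Return the terms that list `term` among their narrower terms."""
    return [bt for bt, nts in NARROWER.items() if term in nts]


def search(term):
    """Documents indexed directly under one term."""
    return set(POSTINGS.get(term, set()))


def narrow(term):
    """Narrow a search: offer each narrower term with its own documents."""
    return {nt: search(nt) for nt in NARROWER.get(term, [])}


def broaden(term):
    """Broaden a search: add documents indexed under the broader term(s)."""
    docs = search(term)
    for bt in broader_of(term):
        docs |= search(bt)
    return docs
```

A searcher who finds too few documents under Whales could call `broaden("Whales")` to pick up the documents indexed under Mammals as well, while `narrow("Vertebrates")` lists the more specific terms to choose from.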

Basic Thesaural Relationships:

Basic thesaural relationships, or the semantic relationships in a thesaurus, are of two types: (1) Hierarchical Relationship; and (2) Non-Hierarchical Relationship. The following figure shows the different types of relationships displayed in a thesaurus.

Figure: Thesaural Relationships

1. Hierarchical Relationship: Hierarchical relationships are based on degrees or levels of superordination and subordination, where the superordinate term represents a class or a whole, and subordinate terms refer to its members or parts. This relationship is of four types: Genus-Species (Generic) relationship, Whole-Part relationship, Instance relationship, and Polyhierarchical relationship.

Reciprocity in the hierarchical relationships is expressed by the relationship indicators: BT (Broader Term), i.e. a label for the superordinate (parent) term; and NT (Narrower Term), i.e. a label for the subordinate (child) term.

• Genus-Species (Generic) Relationship links genus and species and represents the basis of the scientific taxonomic system. The following example shows the hierarchical relationship indicators BT and NT:

Mammals

BT Vertebrates

Vertebrates

NT Mammals

• Whole-Part Relationship covers situations in which one concept is inherently included in another, regardless of context, so that the terms can be organized into logical hierarchies, with the whole treated as a broader term. For example:

Central nervous system

NT Spinal cord

Spinal cord

BT Central nervous system

• Instance Relationship identifies the link between a general category of things or events, expressed by a common noun, and an individual instance of that category, often a proper name. For example:

Mountain regions

NT Himalayas

Himalayas

BT Mountain regions

• Polyhierarchical Relationship occurs when some concepts belong, on logical grounds, to more than one category. In the following example, the term pianos is assigned to subordinate positions on the basis of its generic relationship to two broader terms. In other words, pianos would be an NT to both stringed instruments and percussion instruments.

Pianos

BT Percussion instruments

BT Stringed instruments
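
The reciprocity of BT and NT described above can be sketched as a small data structure. This is a minimal illustration in Python, not a description of any particular thesaurus software: the class name `Hierarchy` and its methods are hypothetical, and the polyhierarchy example (Organs under both Wind instruments and Keyboard instruments) is chosen here for illustration.

```python
from collections import defaultdict


class Hierarchy:
    """Stores BT/NT links so that every BT has its reciprocal NT."""

    def __init__(self):
        self.bt = defaultdict(set)  # term -> its broader terms
        self.nt = defaultdict(set)  # term -> its narrower terms

    def add(self, narrower, broader):
        # One call records both directions, so the pair cannot drift apart.
        self.bt[narrower].add(broader)
        self.nt[broader].add(narrower)

    def entry(self, term):
        """Render a term's entry in the BT/NT display style."""
        lines = [term]
        lines += [f"BT {t}" for t in sorted(self.bt[term])]
        lines += [f"NT {t}" for t in sorted(self.nt[term])]
        return "\n".join(lines)


h = Hierarchy()
h.add("Mammals", "Vertebrates")                 # genus-species
h.add("Spinal cord", "Central nervous system")  # whole-part
h.add("Himalayas", "Mountain regions")          # instance
h.add("Organs", "Wind instruments")             # polyhierarchy: one term,
h.add("Organs", "Keyboard instruments")         # two broader terms
print(h.entry("Organs"))
```

Because a term may hold more than one broader term, the structure is a graph rather than a strict tree, which is what a polyhierarchical relationship requires.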

2. Non-Hierarchical Relationship: Relationship between terms other than hierarchical is called Non-hierarchical relationship, which may further be grouped as Equivalence (or Preferential) Relationship and Associative Relationship.

a. Equivalence (or Preferential) Relationship refers to the relationship between preferred and non-preferred terms in which each term is regarded as referring to the same concept. When the same concept can be expressed by two or more terms, one of these is selected as the preferred term. A cross-reference to the preferred term should be made from any “equivalent” entry term. Reciprocity in the equivalence relationships is expressed by the relationship indicators: USE, which leads from a non-preferred (entry) term to the preferred term, and UF or USED FOR, which leads from the preferred term to the non-preferred term(s).

Four basic types of equivalence relationship are evident: (a) Synonyms; (b) Lexical variants; (c) Near-synonyms; and (d) Generic posting.

• Synonyms: Synonymy occurs when a concept can be represented by multiple terms having the same or similar meanings. A thesaurus compensates for the problems caused by synonymy by ensuring that each concept is represented by a single preferred term. It lists other synonyms and variants as non-preferred terms with USE references to the preferred term.

• Lexical variants: Lexical variants differ from synonyms in that synonyms are different terms for the same concept, while lexical variants are different word forms for the same expression. These forms may derive from spelling or grammatical variation or from abbreviated formats. The following examples indicate the preferred grammatical forms of terms.

b. Nouns and Noun Phrases: The grammatical form of a term should be a noun or noun phrase. Nouns used as terms are divided into two categories: count nouns and noncount (mass) nouns. Count nouns are names of objects or concepts that are subject to the question “How many?” but not “How much?” These should normally be expressed as plurals. For example: books, penguins, singers, vertebrates, windows, etc. Mass (noncount) nouns are names of materials or substances that are subject to the question “How much?” but not “How many?” These should be expressed in the singular. Some examples of mass nouns are: milk, water, etc.

Where the singular and plural forms of a term represent different concepts, separate terms for each are entered in the thesaurus. The distinction should be indicated by a qualifier. Some examples are: Bridge (game) / Bridges (structures); Damage (injury) / Damages (law); Wood (material) / Woods (forested areas). Noun phrases are compound terms that are established as preferred terms if they represent a single concept.

Noun phrases occur in two forms: (a) Adjectival noun phrases like Red rose, Marine birds, Cold fusion, Historical drama, etc.; and (b) Prepositional noun phrases like Plaster of Paris, Prisoners of war, Hospitals for children, etc.

c. Adjectives: Adjectives and adjectival phrases used alone are established as terms in a thesaurus under certain special circumstances. Single adjectives are used in a “nominal” way; that is, the noun is obvious from the context or the adjective is used to describe an attribute of the content object other than topics, such as colour or size. For example: small, medium, large, blue, green, red, yellow, etc.

As an alternative to the creation of multiple compound terms, adjectives may appear as separate terms when designed to be pre-coordinated in indexing or post-coordinated in searching. They should generally not be assigned as indexing terms in isolation. Given the possibility of false coordination in searching (e.g., the linking of an adjective with the wrong noun), adjectival terms should be used sparingly. Some examples of the use of adjectives as terms in pre- and post-coordination are: Airborne / Airborne troops; Offshore / Offshore drilling; Mobile / Mobile homes, etc.

Certain noun phrases may be used to modify other nouns, e.g., high frequency can modify the noun waves.

Adjectives may be used alone in general cross-references to direct the user to or from a group of terms beginning with a corresponding noun, e.g., “cardiac . . . see also the terms beginning with heart.” An example of a reference in the opposite direction (noun to adjective) is: “France see also the terms beginning with French (French art, French language, French literature, French wines).”

d. Adverbs: Adverbs such as “very” or “highly” should not be used alone as terms. A phrase beginning with such an adverb may be accepted as a term only when it has acquired a specialized meaning within a domain. Some examples of adverbial phrases are: very high frequency, very large scale integration, very low-density lipoproteins, etc.

e. Abbreviations: Abbreviations are selected as preferred terms only when they have become so well established that the full form of the term or proper name is rarely used, e.g. AIDS rather than Acquired Immune Deficiency Syndrome; Lasers rather than Light Amplification by Stimulated Emission of Radiation; UNESCO rather than United Nations Educational, Scientific, and Cultural Organization; etc. The full form of a term is selected as the preferred term when the abbreviated form is not widely used or generally understood, e.g. Automated teller machine rather than ATM; Prisoners of war rather than POW; etc. Cross-references should be made from the non-preferred forms to the preferred form.

f. Popular and Scientific Names: If a popular and a scientific name refer to the same concept, the form most likely to be sought by the users of the thesaurus should be chosen as the preferred term. For example, Penguins is chosen as the preferred term in a nontechnical thesaurus with a cross-reference from the scientific equivalent, Sphenisciformes. However, Sphenisciformes is selected as the preferred term in a zoological thesaurus with a cross-reference from the popular name, Penguins.

• Near-synonyms: Near-synonyms are terms whose meanings are generally regarded as different, but which are treated as equivalents for the purposes of a controlled vocabulary. The extent to which terms are treated as near-synonyms depends in large measure upon the domain covered by the controlled vocabulary and its size. Near-synonyms may include antonyms or represent points on a continuum. For example: Seawater / Saltwater [variant terms]; Smoothness / Roughness [antonyms].

• Generic Posting: It is a technique in which the name of a class and the names of its members are treated as equivalents, with the broader class name functioning as the preferred term. For example: Waxes UF Plant waxes; Plant waxes USE Waxes.
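
The USE / UF mechanism can be pictured as a lookup table that sends every entry term to its preferred term. The following Python sketch is only an illustration under simplifying assumptions: the class `EntryVocabulary` and its methods are hypothetical names, matching is by exact spelling, and the preferred terms are those a nontechnical thesaurus might choose.

```python
class EntryVocabulary:
    """Maps non-preferred entry terms to preferred terms (USE / UF)."""

    def __init__(self):
        self.use = {}  # non-preferred term -> preferred term (USE)
        self.uf = {}   # preferred term -> non-preferred terms (UF)

    def add_equivalence(self, preferred, non_preferred):
        # A term cannot be both preferred and non-preferred.
        if non_preferred in self.uf:
            raise ValueError(f"{non_preferred!r} is already a preferred term")
        # Recording both directions keeps USE and UF reciprocal.
        self.use[non_preferred] = preferred
        self.uf.setdefault(preferred, []).append(non_preferred)

    def resolve(self, term):
        """Return the preferred term an indexer or searcher should use."""
        return self.use.get(term, term)


vocab = EntryVocabulary()
vocab.add_equivalence("Waxes", "Plant waxes")             # generic posting
vocab.add_equivalence("Penguins", "Sphenisciformes")      # popular vs scientific name
vocab.add_equivalence("Automated teller machine", "ATM")  # abbreviation
```

With this table, `vocab.resolve("Plant waxes")` leads to Waxes, while a term that is already preferred is returned unchanged.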

3. Associative Relationships: This relationship covers associations between terms that are neither equivalent nor hierarchical, yet the terms are semantically or conceptually associated to such an extent that the link between them is made explicit in the thesaurus, on the grounds that it may suggest additional terms for use in indexing or retrieval. The associative relationship used in thesauri is indicated by the abbreviation RT (Related Term). As a general guideline, whenever one term is used, the other should always be implied within the common frames of reference shared by the users of the thesaurus. Either of the following types of terms can be linked by the associative relationship:

a) Those belonging to the same category, and

b) Those belonging to different categories.

• Relationships between terms belonging to the same category: Relationships are needed for terms belonging to the same category in various special situations, primarily to guide the user in locating the desired term. Each of the terms belonging to the same category has its own particular meaning, but the boundary between them is often blurred in common usage, to the extent that a user checking one of them in the index should be informed of documents indexed under the others.

Figure: RT references between terms belonging to the same category

• Relationships between terms belonging to different categories: It is possible to establish many grounds for associating terms belonging to different categories. Related Term references are often made between etymologically related terms, i.e., terms that contain the same root, but which do not represent the same kind of thing. The following are some representative examples of typical relational situations.

a) Process / Agent

Temperature control

RT Thermostats

Thermostats

RT Temperature control

b) Process / Counteragent

Inflammation

RT Anti-inflammatory agents

Anti-inflammatory agents

RT Inflammation

c) Action / Property

Polling

RT Public opinion

Public opinion

RT Polling

d) Action / Product:

Weaving

RT Cloth

Cloth

RT Weaving

e) Action / Target:

Harvesting

RT Crops

Crops

RT Harvesting

f) Cause / Effect:

Cloud

RT Rain

Rain

RT Cloud

g) Concept or Object / Property:

Poisons

RT Toxicity

Toxicity

RT Poisons

h) Concept or Object / Origins:

Americans

RT United States

United States

RT Americans

i) Concept or Object / Units or Mechanisms of Measurement:

Electric current

RT Amperes

Amperes

RT Electric current

j) Raw Material / Product:

Wheat

RT Flour

Flour

RT Wheat

k) Discipline or Field of Study / Object or Phenomenon Studied:

Neurology

RT Nervous system

Nervous system

RT Neurology

l) Discipline or Field of Study / Practitioner:

Mathematics

RT Mathematicians

Mathematicians

RT Mathematics

m) Antonyms:

Height

RT Depth

Depth

RT Height

n) Phrases Containing Syncategorematic Nouns and their Apparent Foci:

Ships

RT Model ships

Model ships

RT Ships

o) Coordinate Ideas:

Hinduism

RT Buddhism

Christianity

Islam
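
Most of the examples above show each RT reference matched by one in the opposite direction. Assuming that such reciprocity is wanted throughout, a compiler could check a draft thesaurus for missing return references. The Python sketch below is a hypothetical helper, not part of any standard tool; `rt` holds two of the examples above.

```python
def missing_reciprocals(rt):
    """Return (entry, target) pairs meaning `entry` still needs `RT target`.

    `rt` maps each term to the set of terms listed under its RT indicator.
    """
    missing = []
    for term, related in rt.items():
        for other in related:
            # The link term -> other exists; check for other -> term.
            if term not in rt.get(other, set()):
                missing.append((other, term))
    return sorted(missing)


rt = {
    "Temperature control": {"Thermostats"},
    "Thermostats": {"Temperature control"},
    "Hinduism": {"Buddhism", "Christianity", "Islam"},
}
for entry, target in missing_reciprocals(rt):
    print(f"{entry}\nRT {target}")
```

Here the process/agent pair is already complete, whereas the coordinate-ideas entry lists only one direction, so the check reports that Buddhism, Christianity and Islam each still need an RT reference back to Hinduism.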

1.6.4 Thesaurofacet:

Thesaurofacet: a thesaurus and faceted classification for engineering and related topics was developed from the English Electric Company’s Faceted Classification for Engineering, the first edition of which was published in 1958. Thesaurofacet came about when the third edition of Faceted Classification for Engineering, published in 1961, was up for revision. This system was used to organize documents belonging to the libraries of the English Electric Company. However, with the growing trends in science and technology and the need to use computer techniques and post-coordinate indexing, a decision was taken in 1967 to commission Jean Aitchison, a member of the Classification Research Group, to review the indexing needs of the company. The result of that review was a new and improved 4th edition of the faceted classification system, called Thesaurofacet and published in 1970. In the 4th edition, the alphabetized index to the classification scheme was replaced with a thesaurus.

Thesaurofacet covers the whole field of science and technology but subjects are treated in varying depth and only engineering and allied fields are covered exhaustively. Full subject coverage includes engineering and fields directly related to engineering like computers, measurement and testing, physics and management. Relevant management-related concepts were borrowed from the “Classification of Business Studies” developed by the London Graduate School of Business Studies.

Thesaurofacet is considered a multi-purpose retrieval language tool because it has classification schedules and a faceted thesaurus. The classification consists of main classes and facets and has a notation system that consists of upper-case letters and the numbers 2 to 9. The faceted thesaurus is the key to the uniqueness of the tool because it offers the user options to identify topics within the system. Because the two are linked, each term in the system appears twice, once in the schedule and once in the thesaurus, with a notation that links the two parts together. However, the information given about a term in the thesaurus is not the same as the information given about that term in the schedule. The two parts of the system are complementary and should be used together, not separately. Finally, Thesaurofacet can be used for the arrangement of books on the shelves and the arrangement of entries in subject catalogues. Further, the index terms are intended to be used for indexing and searching.

If we are asked for information on Documentation, we turn to the thesaurus and find:

Documentation
use Information Science

At Information Science we find:

Information Science ZR
UF  Documentation
    Librarianship
    Library science
RT  Communication (Sociology)
    Data processing
    Information theory
    Librarians

We also see the notation to the right and we are told to look for ZR in the classification schedules. At ZR in the classification schedule we find that Thesaurofacet divides Information Science using subjects and facets:

Main Class

Subject Field(s)

Fundamental Facets

Sub-Facets

Hierarchies and Arrays

ZR     Information science
ZR2    LIBRARIES
       By type:
ZR3    National libraries
ZR4    Public libraries
ZR5    Municipal libraries
ZR6    County libraries
ZRB    Educational Libraries
       By management:
ZRP    Library management

Here Information Science is called the main class. A main class can be divided into subject fields, in this case LIBRARIES and INFORMATION RETRIEVAL. Subject fields are broken down into fundamental facets, which are printed in bold typeface (e.g. by type, by management, by equipment, etc.). This is where the schedules start to use facet analysis. For example, Information Retrieval is broken down ‘By type of language’ into the facet Index languages, which is in turn broken down into the sub-facets Natural language and Controlled index languages. Terms are then listed in hierarchies and arrays. The schedules are typically used for a broad subject search and for browsing the topics that exist within a class, so a user may start with the main class and navigate through the facets and arrays to locate a more specific topic.

The classification schedules also allow for the combination or synthesis of topics and notation. For example, there is a note in the main class ZL SOCIOLOGY that tells us that this topic can be combined with ZM PSYCHOLOGY to create the synthesized subject field called ZL/ZM SOCIAL PSYCHOLOGY. Synthesis can indeed be used wherever required, there being no preferred combination order unless there is an instruction in the schedules.
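
The two-step lookup in the worked example (thesaurus entry first, then the classification schedule via the notation) can be sketched in Python as follows. The terms and notations are transcribed from the example above; the dictionary layout and the function names `lookup` and `synthesize` are assumptions made for this sketch. Selecting schedule lines by notation prefix is a simplification that happens to work for this excerpt and should not be read as a rule of Thesaurofacet notation.

```python
# Thesaurus part: entries keyed by term (layout is an assumption).
THESAURUS = {
    "Documentation": {"use": "Information Science"},
    "Information Science": {
        "notation": "ZR",
        "uf": ["Documentation", "Librarianship", "Library science"],
        "rt": ["Communication (Sociology)", "Data processing",
               "Information theory", "Librarians"],
    },
}

# Schedule part: notation -> caption, in schedule order.
SCHEDULE = {
    "ZR": "Information science",
    "ZR2": "LIBRARIES",
    "ZR3": "National libraries",
    "ZR4": "Public libraries",
    "ZR5": "Municipal libraries",
    "ZR6": "County libraries",
    "ZRB": "Educational Libraries",
    "ZRP": "Library management",
}


def lookup(term):
    """Follow a 'use' reference if present, then open the schedule."""
    entry = THESAURUS[term]
    if "use" in entry:
        term = entry["use"]
        entry = THESAURUS[term]
    notation = entry["notation"]
    # Simplification: take every schedule line whose notation starts
    # with the entry's notation.
    section = {n: c for n, c in SCHEDULE.items() if n.startswith(notation)}
    return term, notation, section


def synthesize(*notations):
    """Combine notations with a slash, as in ZL/ZM."""
    return "/".join(notations)
```

Calling `lookup("Documentation")` follows the reference to Information Science, picks up the notation ZR, and returns the schedule lines filed there; `synthesize("ZL", "ZM")` builds the combined notation mentioned for social psychology.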

Thus there appear to be two types or styles of “faceted classifications integrated with thesauri”: the first type uses subject fields as main subdivisions, with facet analysis used to determine the relationships; the second type comprises faceted thesauri in which concepts are first divided by facets.

1.6.5 Classaurus:

The vocabulary control device used for POPSI has been designated as Classaurus. It is a category-based (faceted) systematic scheme of hierarchical classification in the verbal plane, incorporating all the essential features of a conventional IR thesaurus, i.e. control of synonyms, quasi-synonyms, etc. RTs are not shown in the classaurus. A scheme of this type, for its application, calls for a complementary alphabetical index giving the address of each term occurring in the systematic part. The purpose for which a classaurus is used does not necessarily warrant any principle-based arrangement of the terms in an array. Even if the terms in each array are arranged alphabetically, the purpose is still served. This feature of the classaurus makes it largely amenable to computerization.

The structure and style of presentation of a classaurus can be systematically presented as follows:

A) Systematic Part

A1 Common Modifiers

A1.1 Form

A1.2 Time

A1.3 Environment

A1.4 Place

A2 Inter-subject Relation Modifiers

A3 Discipline and Sub-disciplines

A4 Entities

A4.1 Part

A4.2 Type

A5 Properties

A6 Actions

In respect of the systematic part, the following points are to be noted:

a) Each term in the systematic part under each category is enumerated by displaying its COSSCO relationship in a hierarchy of arrays.

b) For each term in the systematic part, the following are listed vertically: (a) definition/scope note (if required); and (b) synonyms, quasi-synonyms, and antonyms.

c) No RTs (i.e. non-hierarchically related terms) are enumerated for any term in the classaurus because of its category-based structure. It is assumed that RTs should not be dictated by the designer of the classaurus; rather, they should be dictated by the document itself. Any term may be related to any other term depending upon the nature of the thought-content of the document. Hence, RTs should not be determined beforehand.

d) Each array in the classaurus is open.

e) Each term in the systematic part is assigned a unique address which can be used as a class number.

B) Alphabetical Index Part: This part contains each and every term including synonyms, quasi-synonyms, and antonyms occurring in the systematic part along with its address.
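
The relation between the two parts can be sketched as follows: the systematic part gives every term a unique address, and the alphabetical index is derived from it by listing every term and synonym with that address. In this Python illustration the addresses, terms and synonyms are all invented, and the function names are hypothetical; only the overall shape follows the description above.

```python
# Hypothetical fragment of a systematic part:
# (address, term, synonyms and quasi-synonyms)
SYSTEMATIC = [
    ("E1", "Libraries", []),
    ("E1.1", "Public libraries", []),
    ("E1.2", "Academic libraries", ["University libraries"]),
    ("A1", "Cataloguing", ["Cataloging"]),
]


def build_index(systematic):
    """Alphabetical index: every term and synonym with its address."""
    rows = []
    for address, term, synonyms in systematic:
        rows.append((term, address))
        # A synonym points to the address of the term it belongs to.
        rows.extend((syn, address) for syn in synonyms)
    return sorted(rows, key=lambda row: row[0].lower())


def add_term(systematic, address, term, synonyms=()):
    """Arrays are open: a new term only needs an unused address."""
    if any(addr == address for addr, _, _ in systematic):
        raise ValueError(f"address {address} is already assigned")
    systematic.append((address, term, list(synonyms)))


for term, address in build_index(SYSTEMATIC):
    print(f"{term}  {address}")
```

Since the index is generated mechanically from the systematic part, adding a term with `add_term` and rebuilding the index keeps the two parts consistent, which fits the remark that the classaurus is amenable to computerization.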


This Article Collected from:

  • Sarkhel, J. (2017). Indexing languages. Retrieved from http://egyankosh.ac.in/handle/123456789/35770


Md. Ashikuzzaman

Work at North South University Library, Bangladesh.
