Thesaurus : An overview

1.0 Definition of Thesaurus:

Derivationally, the word thesaurus originated from the Greek language and symbolized the concept of a treasury or storehouse of knowledge. Webster’s dictionary defines a thesaurus as ‘a useful literary collection or selection especially a book of synonyms and antonyms’ (Webster, 1987). Unesco has defined the thesaurus as ‘a vocabulary of controlled indexing language formally organized so that the a priori relationships between concepts (e.g., ‘broader’ and ‘narrower’) are made explicit’ (Unesco, 1981a). This definition emphasizes a priori relationships, which are document independent. British Standard 5723 defines a thesaurus as ‘a means of displaying the terms in a controlled indexing language, together with indications of their apparent relationships’ (British, 1979). Ajoy Kumar Roy provides a definition that lists those whose natural language expressions are controlled: ‘an IR thesaurus is chiefly a terminological control device for transformation of natural language (NL) expressions, used by authors, referees, publishers, indexers and users who form the various links in the information transfer chain into a more constrained system of vocabulary’ (Roy, 1981). A thesaurus can also be defined functionally and structurally. Functionally, a thesaurus is a terminological control device used in translating natural language into the system’s language. Structurally, it is a tool consisting of a controlled set of terms linked by hierarchical or associative relations, which mark any needed equivalence relations (synonyms) with terms from natural language and concentrate on a particular area of knowledge (Guinchat and Menou, 1983).

While emphasizing the standardization function Rowley (1992) defines thesaurus as, ‘a compilation of words and phrases showing synonyms and hierarchical and other relationships and dependencies, the function of which is to provide a standardized vocabulary for information storage and retrieval systems’.

Emphasizing the systems which make use of thesaurus Aitchison, Gilchrist, and Bawden (2000) defined thesaurus as, ‘a vocabulary of controlled indexing language, formally organized so that a priori relationships between concepts are made explicit, to be used in information retrieval systems, ranging from the card catalogue to the Internet’. The last word in this definition indicates the potential usefulness of thesaurus in Internet information retrieval.

Davis and Rush (1979) explain the concept of vocabulary control as follows: ‘Indexing may be thought of as a process of labeling items for future reference. Considerable order can be introduced in this process by standardizing the terms that are to be used as labels. This standardization is known as vocabulary control, the systematic selection of the preferred term.’

In any information storage and retrieval system (ISRS), indexing plays an important role. An index describes the subject of documents with the help of descriptors arranged in systematic order. The indexer and the user are the two main parties who have to use precise terms, in indexing and searching respectively. Use of precise terms is essential for optimum recall and precision. Both the indexer and the user, in the process of information storage and retrieval (ISR), always come across a variety of terms (vocabulary) representing the same concept. This vocabulary must be controlled. Vocabulary control is warranted for two reasons, i.e.

1. To promote consistent representation of subject matter by indexers, thereby avoiding the dispersion of related materials, through control (merging) of synonymous and nearly synonymous expressions and by distinguishing among homographs.

2. To facilitate the conduct of a comprehensive search by linking together terms whose meanings are related paradigmatically or syntagmatically (Lancaster, 1985).

The thesaurus is a vocabulary control device. Conversely, the vocabulary contained in a thesaurus (as well as in other vocabulary control devices such as lists of subject headings, classification schemes, etc.) is called controlled vocabulary.
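The two reasons for vocabulary control given above can be illustrated with a minimal sketch. It assumes a tiny, invented entry vocabulary; the terms, the dictionary names and the function control_term are hypothetical and are not taken from any published thesaurus. Synonymous expressions are merged onto one preferred term, and a homograph is split into qualified terms.

```python
# Hypothetical entry vocabulary: each variant points to one preferred term.
ENTRY_VOCABULARY = {
    "automobiles": "cars",
    "motor cars": "cars",
    "cars": "cars",
}

# Hypothetical homographs, distinguished by a parenthetical qualifier.
HOMOGRAPHS = {
    "mercury": ["mercury (metal)", "mercury (planet)"],
}


def control_term(term):
    """Return the controlled term(s) for a natural language expression."""
    key = term.strip().lower()
    if key in HOMOGRAPHS:
        # More than one meaning: the indexer or searcher must choose.
        return HOMOGRAPHS[key]
    if key in ENTRY_VOCABULARY:
        # Synonyms and near-synonyms are merged onto the preferred term.
        return [ENTRY_VOCABULARY[key]]
    # Not covered by the controlled vocabulary.
    return []


print(control_term("Automobiles"))  # ['cars']
print(control_term("Mercury"))      # ['mercury (metal)', 'mercury (planet)']
```

Because both the indexer and the searcher pass their terms through the same mapping, documents on one concept are not dispersed under several labels.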

1.1 Language thesaurus and information retrieval (IR) thesaurus:

The various thesauri available can be divided into two broad categories, i.e. the language thesaurus and the IR thesaurus. A language thesaurus is primarily a dictionary of synonyms, though structurally it may differ, to some extent, from an ordinary word dictionary. Roget’s thesaurus (Roget, 1987) is a classic example of a language thesaurus, which, according to Vickery (1960a), is a tool for writing in English. The IR thesaurus collects the terms of a subject and organizes them so as to bring out the relations between concepts. In the same vein Schultz, quoted by Lancaster, states that the purpose of Roget’s thesaurus is to give an author a choice of alternative words to express one and the same concept (Lancaster 1972). The IR thesaurus tends to be more prescriptive. According to Wall (1962), however, the purpose of both categories of thesauri is the same.

1.2 Natural versus controlled language:

Another name for controlled vocabulary is controlled language. Its opposite is natural language, which is synonymous with ordinary discourse and is also called free text. A search based on natural language is called a free text search. Internet search is mostly free text search. Both of these indexing languages have their own advantages and disadvantages (Fugmann 1987). The debate over the pros and cons of natural and controlled indexing languages has been going on for a long time; Rowley (1994) divides it into four eras. Aitchison and Gilchrist (1987) provide a comparative account of natural language and controlled language. Keen and Digger (1972), Bhattacharyya (1974), Cleverdon (1966), Knapp (1982), Perez (1982), Dubois (1987), Rowley (1990), Fidel (1991, 1992), Milstead (1995), and Knapp, Cohen and Juedes (1998) wrote in favour of natural language.

Markey (1980) mentioned that free text gave higher recall and controlled terms gave higher precision. Blair and Maron (1985) and Coco (1984) supported controlled language. Milstead (1995) believes that thesauri [controlled language] have a prolonged life.
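Recall and precision, the two measures used in this comparison, have standard definitions: recall is the share of relevant documents that are retrieved, and precision is the share of retrieved documents that are relevant. The sketch below computes both for two invented result sets. The document identifiers and the function name recall_precision are hypothetical, and the figures illustrate the arithmetic only; they are not data from Markey or any other study cited here.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one search."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents actually retrieved
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Invented toy data: four documents are relevant to the query.
relevant = {"d1", "d2", "d3", "d4"}
free_text_result = {"d1", "d2", "d3", "d5", "d6", "d7"}  # broad, noisy search
controlled_result = {"d1", "d2"}                         # narrow, clean search

print(recall_precision(free_text_result, relevant))   # (0.75, 0.5)
print(recall_precision(controlled_result, relevant))  # (0.5, 1.0)
```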

Free search is not absolutely uncontrolled, for one has to control at least synonyms and homonyms. Moreover, free search is useful only for very new concepts, which have not yet become part of terminological control devices. For these two reasons Henzler (1978), Ulmann (1967), Holst (1966), and Lancaster (1972a) preferred hybrid systems, i.e. the combination of natural and controlled languages.

1.3 History of IR thesaurus:

As long ago as 1878, W.F. Poole, though not using the name ‘thesaurus’, held that some device of vocabulary control would be potentially useful in IR. While describing the features of Poole’s index, he insisted that references additional to those in the index itself should be issued as a separate publication like Roget’s thesaurus (Poole, 1878). Poole’s dream remained unaccomplished for the next 81 years, until the first fully operational thesaurus was developed and used at Du Pont in 1959.

In modern times, it was the group of researchers at the Cambridge Language Research Unit in England who began to discuss the applicability of the thesaurus concept to IR (Joyce and Needham, 1958).

Roberts (1984) summarizing the history of thesaurus mentions that in February 1947, C. N. Mooers made reference to the use of thesaurus in the context of mechanized information retrieval.

The credit for re-introducing the idea of thesaurus into the normally available professional literature goes to C. L. Bernier and E. J. Crane of Chemical Abstracts Services (Bernier and Crane, 1948).

Building on the idea of the thesaurus, Whelan (1958) recommended a lattice arrangement of index terms because it provided the greater freedom of conceptual connections implied by such a form. Whelan used the term ‘head term’ for what Mooers called ‘descriptors’.

It is believed that the word ‘thesaurus’ was first used in print in the context of IR by H.P. Luhn in 1959 (Kumar, Raghavendra Rao, and Kamath, 1975). However, actual use of a thesaurus in an ISRS was started by the engineering department of E. I. du Pont de Nemours and Company, USA, in 1959 (Lancaster, 1972b).

From 1960 onward there were fast developments in the concept of a thesaurus. During this period the thesaurus was widely adopted for vocabulary control. Initially, it was used only in manual post-coordinate indexing systems, then in mechanized ISRS and now it is used even in pre-coordinate indexing systems and of course in computerized ISRS.

1.4 Thesaurus and other vocabulary control devices:

There are other devices, such as authority lists, lists of subject headings, and classification schemes, which are used for vocabulary control. To resolve the terminological confusion surrounding the thesaurus itself, some professionals opined that all the above-listed vocabulary control devices should be called thesauri (Korotkin, 1965; Strater, 1969). Nevertheless, there exist some basic differences between these various vocabulary control devices.

1.4.1. Thesaurus and lists of subject headings:

Charles Ammi Cutter (Cutter, 1889) in his Rules for dictionary catalogue gave useful rules for formulating subject headings. The famous rule was that a subject is to be entered under its most specific heading. The ALA in 1895 issued its list of subject headings (American, 1895) with the general principle that ‘the heading shall be under which it is supposed that the majority of educated Americans will look, with cross references from other forms of heading’. This was followed by the first edition of Library of Congress list of subject heading (Library of Congress, 1911) which had made extensive use of sub headings.

A list of subject headings is said to be one of the simplest forms of vocabulary control device. Lancaster (1972c) has differentiated the thesaurus from a list of subject headings on the basis of conventions and application. According to him, the thesaurus distinguishes between Broader Terms (BT), Narrower Terms (NT) and Related Terms (RT), whereas lists of subject headings couple all these together under ‘see also’ references. The subject heading stands alone in an alphabetical subject catalogue, whereas the thesaural descriptor is used in conjunction with other descriptors even though in itself it may already be highly pre-coordinated.

1.4.2 Thesaurus and classification schemes:

Bates (1988) summarizes the fundamental differences between classification and the thesaurus as

a) Indexing terms are grounded on a linguistic basis, whereas classification schemes organize conceptual categories.

b) The aim of indexing dictionaries is the exposure of durable solid words and expressions for document description, while the aim of classification schemes is the creation of absolutely distinct and incompatible conceptual categories that are exhaustive in their aggregate.

c) The classification schemes’ structure, which is not evident in indexing dictionaries, is the part of the realization of such an accurate division of categories.

Lu (1990) however, in the process of making his own model of IR came to the opposite conclusion from those of Bates. According to Lu relationships that are merely lexical can be expressed with the structure of a classification “tree”, while merely semantic relations with a structure of indexing “network”.

Classification, as Miller (2001) puts it, is an indexing and retrieval tool of a mono-hierarchical, mono-aspectual information system, in which every concept can be included in one or another category according to this category’s aspect, whereas a thesaurus is the same type of aid for a multi-hierarchical, multi-aspectual system offering access via multiple aspects.

Weinberg (1995) provides a more detailed comparison of thesaurus and classification.

In spite of various differences between thesaurus and classification schemes there is a great compatibility between the principles followed in the construction of thesaurus and classification schemes (Gopinath and Prasad, 1975).

1.4.3 Classification schemes and lists of subject headings: Classification brings together related terms by careful application of principles of division. In the lists of subject headings, related terms get separated due to alphabetization. Relationships between separated headings are indicated through a syndetic structure. Lists of subject headings provide pre-coordinated headings. Such headings pose two problems.

a) The problem of word order in compound headings; and

b) Lack of specificity.

Ranganathan (1945), Pettee (1946), Coates (1960) addressed these problems.

The pre-coordinated headings failed to provide access to every term in the compound heading. The sequence of terms in the pre-coordinated heading may not be suitable. Solutions to these problems were presented by post-coordinate systems.

1.5 Thesaurus and post-coordinate indexing systems:

The post-coordinate indexing systems are based on the principle of entering a subject of a document under isolated terms and combining these terms at the time of a search. For selecting an appropriate index/search term and to provide rules for coordination at a searching stage the thesaurus is used.
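The principle described above can be sketched as an inverted file in which each document is entered under isolated terms and the terms are combined only at search time. The document identifiers, the terms and the function names build_index and search_all are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def build_index(documents):
    """Enter each document under every one of its isolated index terms."""
    index = defaultdict(set)
    for doc_id, terms in documents.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def search_all(index, *terms):
    """Coordinate the terms at search time: documents indexed under all of them."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


# Hypothetical collection indexed with single, uncombined terms.
documents = {
    "doc1": ["libraries", "automation"],
    "doc2": ["libraries", "classification"],
    "doc3": ["automation", "indexing"],
}
index = build_index(documents)
print(search_all(index, "libraries", "automation"))  # {'doc1'}
```

No compound heading such as ‘library automation’ is stored anywhere; the combination exists only in the query, which is what distinguishes this approach from pre-coordinated headings.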

Thesaurus has been widely adopted for vocabulary control in post-coordinate indexing systems for which it was designed (Lancaster, 1972d). It is now used in pre-coordinate systems also (Aitchison, 1982).

1.6 Purposes of thesaurus:

According to Foskett (1980), the purposes of the thesaurus are

a) To provide a map of a given field of knowledge.

b) To provide a standard vocabulary for a given subject field.

c) To provide a system of references between terms.

d) To provide a guide for users of the system.

e) To locate new concepts in a scheme of relationships with existing concepts in a way which makes sense to users of the system.

f) To provide classified hierarchies.

g) To provide means by which the use of terms in given subject field may be standardized.

The thesaurus can also be used for the generation of keyword lists, which form the basis for planning, priority setting, and other research management tasks (Nestsel et al., 1992). The thesaurus is also useful in computer-assisted indexing and abstracting, and it helps in defining terms.

Thesaurus can be used for three basic purposes (Rowley and Farrow, 2000a).

1. In indexing but not in searching.

2. In searching but not in indexing.

3. In both indexing and searching.

The number of online ISR systems, including the Internet, is increasing day by day. These systems are making more and more information available in full-text format. The thesaurus plays a vital role in retrieving relevant information from them (Aitchison, Gilchrist and Bawden, 2000a).

1.6.1 Information technology-based ISRS and thesaurus:

Borko (1986) claims that the use of thesaurus in ISRS will enable efficient subject search based on the inverted file of keywords extracted automatically from the natural uncontrolled language of the document. Austin (1986) questions this claim of Borko. Batty (1988) explains how authors and publishers can make information more accessible [on line] by providing indexing information that uses controlled terms from a thesaurus.

Kosovac (2000) explains the usefulness of thesauri in the retrieval of networked information available both on the Internet and on intranets.

1.7 Thesaurus construction methodologies:

1.7.1 Approaches to thesaurus construction:

Lancaster (1972e) has identified four approaches to thesaurus construction:

i) Convert an existing vocabulary; e.g. convert a list of subject headings into a thesaurus.

ii) Extract the vocabulary from an existing, more general thesaurus or develop a specialized thesaurus within the framework of a more general thesaurus; i.e. create a microthesaurus.

iii) Generate the vocabulary empirically [so called empirical method] on the basis of indexing a representative set of documents.

iv) Collect terms together from diverse sources, including glossaries and other publications and from subject specialists.

The third approach mentioned above goes by different names, e.g. inductive method (Aitchison, Gilchrist and Bawden, 2000b), bottom-up approach (Lancaster, 1985a), stalagmitic method (Wooster, 1970), literature-based method (Blagden, 1968). This method was used in the construction of the Du Pont thesaurus (Lancaster, 1972e).

The fourth approach mentioned above is also called the committee approach, deductive method (Aitchison, Gilchrist and Bawden, 2000b), stalactitic method, top-down approach (Lancaster, 1972e) and subject specialist based method (Blagden, 1968). This method was used in the construction of the Thesaurus of Engineering and Scientific Terms (Thesaurus, 1969).

Super thesaurus construction (Stern and Rischette, 1991) and the dynamic thesaurus (Rees-Potter, 1991) are new ideas that put forth the concept of the universal thesaurus. From the organizational point of view, though, their size may be a matter of concern.

The latest idea is of ‘thesaurus federations’ (Kramer, Nikolai, and Habeck, 1997) which is a project based on the development of ‘switching’ rules between different thesauri included in the federation.

The idea of constructing an ‘objective thesaurus’, based on and meant especially for the Internet, is also emerging. Such a thesaurus, as put forth by Miller (2001a), is based on real information.

1.7.2 User-oriented thesaurus:

Most thesauri are constructed with the indexers’ convenience in mind. However, the growth of online ISR systems and users’ unassisted searching have made the traditional concept of the indexer-oriented thesaurus obsolete (Bates, 1986). Bates suggested the construction of the so-called ‘user thesaurus’. Bates’s ‘user thesaurus’ includes:

i) A list of all terms in use in the database.

ii) Terms actually used in a given database and those that are not used.

iii) Scope notes for problems likely to be encountered by the end-user.

iv) Self-explanatory names for terms or relationships and,

v) A vast entry vocabulary, geared to end-user requirements.

Bates’s argument is that the users’ search query terms should be considered as a source for thesaurus vocabulary. This view is fully supported by Strong and Drott (1986) and Pejtersen (1980). The US National standard on thesaurus construction has also recommended the construction of a user thesaurus under the caption ‘user warrant’ (National, 1994).

A further account of developing a thesaurus based on generators (literary warrant) and user warrant is provided by Lopez-Huertas (1997).

Classaurus developed by Bhattacharyya (1982) is also a type of thesaurus. It has qualities and strength of both a thesaurus and classification. Fugmann (1990) and Devadasan (1985) reported on the automatic construction of classaurus.

1.7.3 Standards and guidelines for thesaurus construction:

Thesaurus construction involves decisions to be taken over various aspects of a thesaurus. In order to have consistency in these decisions and ultimately to have a useful thesaurus, various agencies have prepared standards/guidelines. All these standards/guidelines provide prescriptions and explanations over forms of terms, formations of thesaural relationships and display of terms.

Krooks and Lancaster (1993) and Williamson (1996) have presented a detailed evaluation of principles, guidelines, and standards evolved from 1959 to 1993.

Most of the individual thesauri in their introductory part provide explanations of principles, standards followed during the process of their compilations. However, there exist specially prepared guidelines (Unesco, 1981) and International standard ISO-2788 (International, 1974) for thesaurus construction. The second edition of the International standard was published in 1986 (International, 1986). There also exist a number of national standards for thesaurus construction, e.g. the French standard AFNOR NFZ-47-100 (Association, 1981) the British standard, BS-5723 (British, 1987), the German standard, DIN 1463 (Deutsches, 1987-1993), the US standard, ANSI / NISO Z39.19 (National, 1994). From this list of various national standards, it can be noticed that there is no Indian standard for thesaurus construction.

These national standards, as well as the second edition of the International Standard, are based on the Unesco’s guidelines for thesaurus construction (Unesco, 1981).

1.7.4 Manuals for thesaurus construction: In addition to the national and international standards, comprehensive manuals have been published, e.g. Gilchrist (1971), Soergel (1974), Lancaster (1986), Aitchison, Gilchrist, and Bawden (2000c). A general overview of the thesaurus and thesaurus construction is provided by Foskett (1980a), Aitchison (1992), Gilchrist (1994, 1997), Taylor (1999), and Dextre Clarke (2001). Lancaster (1985b) has also brought out a brief manual under the General Information Programme (PGI) of Unesco. A tutorial on thesaurus construction is also available (Craven, 2001).

1.7.5 Use of information technology in thesaurus construction: Thesaurus construction involves both intellectual and mechanical activities: selecting terms and establishing their relations is intellectual work, whereas arranging thesaural entries is a mechanical activity. Thesaurus construction and maintenance (i.e. adding, deleting and modifying terms) is a time-consuming job. To expedite the work and to improve its quality, computers and related technologies have long been used. Experiments were carried out to test the usefulness of computers in text analysis for thesaurus construction (Sparck Jones, 1971; Salton, 1971). Stevens (1980) and Crouch (1990) provide a detailed account of automatic thesaurus construction. Lancaster (1986), in his book ‘Vocabulary control’, devoted a full chapter to the issue of automatic thesaurus construction.

Automatic thesaurus construction can be a part of an automatic ISRS. In automatic thesaurus construction, thesaural relationships are mainly identified statistically (Aitchison, Gilchrist, and Bawden, 2000d). Svenonius (1986, 1987) has presented an account of the development of algorithms for establishing associative relationships using co-occurrence data.
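As a rough illustration of what statistical identification from co-occurrence data can look like, the sketch below counts how often two index terms are assigned to the same document and scores each pair with the Dice coefficient. This is a generic textbook measure chosen for the example, not the algorithms described by Svenonius; the threshold, the sample data and the function name cooccurrence_candidates are assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_candidates(doc_terms, threshold=0.5):
    """Propose candidate related-term pairs from term co-occurrence."""
    term_freq = Counter()  # number of documents each term appears in
    pair_freq = Counter()  # number of documents each pair appears in
    for terms in doc_terms:
        unique = sorted(set(terms))
        term_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    candidates = []
    for (a, b), both in pair_freq.items():
        # Dice coefficient: 2 * |A and B| / (|A| + |B|)
        dice = 2 * both / (term_freq[a] + term_freq[b])
        if dice >= threshold:
            candidates.append((a, b, round(dice, 2)))
    # Strongest associations first, then alphabetical for a stable order.
    return sorted(candidates, key=lambda c: (-c[2], c[0], c[1]))


# Invented descriptor sets, one list per document.
docs = [
    ["indexing", "thesauri"],
    ["indexing", "thesauri", "retrieval"],
    ["retrieval", "precision"],
    ["thesauri", "indexing"],
]
print(cooccurrence_candidates(docs))
# [('indexing', 'thesauri', 1.0), ('precision', 'retrieval', 0.67)]
```

Pairs proposed this way are only candidates: a high score shows that two terms tend to be used together, and a human editor would still have to decide whether the link is associative, hierarchical or accidental.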

Schwanhausser (1975) and Ravichandra Rao (1975) described the procedure for the computer generation of a thesaurus from a set of descriptors manually assigned to documents.

1.7.5.1 Thesaurus management software: The complicated task of thesaurus construction and maintenance can now be simplified by the use of thesaurus management software. INDEX, PROTERM, CICADE, LIDOS, TMS, DOMESTIC, and BASIC are some examples. A list of thesaurus management software packages, with information about them and their web addresses, is made available by Willpower (2002).

Ritzler (1990) presents an evaluation of three thesaurus management software packages, i.e. INDEX, PROTERM, and TMS.

The ‘THESHI program’, available with Unesco’s Micro-CDS/ISIS, is useful in online thesaurus construction (Chowdhury, 1999a). Another program available for online thesaurus construction is VOCON (Chowdhury, Neelameghan and Chowdhury, 1995).

Karisiddappa and Prasad (1993) described the use of the programming language PROLOG in thesaurus construction.

Milstead (1991) enumerates specifications for thesaurus construction software.

In order to choose appropriate thesaurus construction software from among the many available in the market, there is a need of some criteria to evaluate them before purchase. Ganzmann (1990) discusses the criteria for evaluation of thesaurus construction software.

There are others who have dealt with various aspects of thesaurus construction, e.g. Chandran (1975) has presented a case study of the selection of candidate terms for a thesaurus. Seetharama (1975) has explained concept representation in thesaurus by use of elemental and pre-combined descriptors. Thesaurus construction is a complicated time-consuming process. Prasad, Mahajan, and Thyagaragan (1975) calculate the time required for each process. Aptagiri and Prasad (1995) explain the use of Natural Language Processing (NLP) technique in grouping concepts in thesauri. Miller (1997) has illustrated the problems generally encountered during the process of thesaurus construction and methods for solving them. Once constructed, the thesaurus must be evaluated for assessing its quality. Davis and Rush (1979a) have given criteria for evaluating thesaurus.

1.8 Thesaurus relationships:

An intrinsic feature of a thesaurus is its ability to distinguish and display the relationships between the terms it contains.

Broadly there can be two types of thesaural relationships (Aitchison, Gilchrist and Bawden, 2000e).

i) The micro level, concerned with the semantic links between individual terms. These are also called inter-term relationships.

ii) The macro level, concerned with how the terms and their inter-relationships relate to the overall structure of the field.

The inter-term relationships can be of three types (Unesco, 1981b) i.e.

1.8.1 The equivalence relationship

1.8.2 The hierarchical relationship

1.8.3 The associative relationship

A critical analysis of these relationships has been made by Willets (1975), Maniez (1988) and Dextre Clarke (2000). Dextre Clarke observes that the standard rules on thesaural relationships are not always seriously applied. For this, however, she does not blame the thesaurus constructor. According to her, the inconsistency arises from pragmatic and subjective decisions about what will serve human users best.

1.8.1 The equivalence relationship: This is the relationship between preferred and non-preferred terms. Such a relationship must be attended to when one and the same concept is represented by different terms. The preferred term is the one selected for indexing; the non-preferred terms are those not selected. However, a non-preferred term also forms part of the entry vocabulary, directing the user towards the preferred term. The conventions used for expressing reciprocity are:

USE – is written as a prefix to the preferred term.
UF – is written as a prefix to the non-preferred term.

The equivalence relationship includes synonyms, lexical variants, quasi-synonyms, and factored and non-factored terms.
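The reciprocity of the USE and UF conventions can be sketched as follows: every UF reference recorded under a preferred term has a matching USE reference under the non-preferred term. The sample terms and the function name equivalence_entries are hypothetical.

```python
def equivalence_entries(preferred_to_variants):
    """Build reciprocal USE / UF references from preferred terms and their variants."""
    entries = {}
    for preferred, variants in preferred_to_variants.items():
        # Under the preferred term: UF followed by each non-preferred term.
        entries[preferred] = [f"UF {variant}" for variant in sorted(variants)]
        for variant in variants:
            # Under each non-preferred term: USE followed by the preferred term.
            entries[variant] = [f"USE {preferred}"]
    return entries


entries = equivalence_entries({"Cars": ["Automobiles", "Motor cars"]})
for term in sorted(entries):
    print(term, entries[term])
# Automobiles ['USE Cars']
# Cars ['UF Automobiles', 'UF Motor cars']
# Motor cars ['USE Cars']
```

Generating both directions from a single source keeps the entry vocabulary consistent: a USE reference can never point to a term that lacks the corresponding UF reference.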

1.8.1.1 Synonyms are terms whose meanings can be regarded as the same in a wide range of contexts, so that they are virtually interchangeable. Some of the common classes of synonyms are:

i) Popular names and scientific names

ii) Terms of different linguistic origin

iii) Common nouns and trade names

iv) Variant names of emergent concept

v) Current or favored terms versus outdated term.

1.8.1.2 Lexical variants are different word forms for the same expression, which includes spellings, irregular plurals, direct versus indirect order, abbreviated forms, etc.

1.8.1.3 Quasi-synonyms are also known as near synonyms. Quasi synonyms are generally regarded as different in ordinary usage but treated as though they are synonyms. Quasi synonyms include terms having significant overlaps, e.g. Gifted people / genius.

1.8.1.4 Factored and non-factored terms: Compound terms are often formed to express a concept, but the second or third word in the compound phrase may also be significant. Such compound terms are factored, and cross-references are provided from the compound term to the elements used in the combination.

1.8.2 The hierarchical relationship: This relationship shows levels of superordination and subordination. The superordinate terms represent a class or whole, and the subordinate terms refer to its members or parts. The conventions used to express reciprocity are:

BT (i.e. broader term) is written as a prefix to the super ordinate term
NT (i.e. narrower term) is written as a prefix to subordinate term

This relationship helps in locating broader / narrower concepts. The hierarchical relationship distinguishes a thesaurus from unstructured lists. It is through this relationship that a thesaurus helps to improve recall and precision performance. Indirectly, it also helps to clarify the scope of the term.
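A minimal sketch of the BT / NT reciprocity, and of one way it can serve recall: every BT reference is stored together with its reciprocal NT reference, and a search term can then be expanded with all of its narrower terms. The dictionary layout, the sample terms and the function names add_hierarchy and narrower_closure are assumptions made for illustration.

```python
def add_hierarchy(thesaurus, broader, narrower):
    """Record one hierarchical link together with its reciprocal."""
    thesaurus.setdefault(broader, {"BT": set(), "NT": set()})["NT"].add(narrower)
    thesaurus.setdefault(narrower, {"BT": set(), "NT": set()})["BT"].add(broader)


def narrower_closure(thesaurus, term):
    """Collect every narrower term of `term`, at all levels."""
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        for nt in thesaurus.get(current, {}).get("NT", ()):
            if nt not in found:
                found.add(nt)
                stack.append(nt)
    return found


thesaurus = {}
add_hierarchy(thesaurus, "vehicles", "cars")
add_hierarchy(thesaurus, "vehicles", "bicycles")
add_hierarchy(thesaurus, "cars", "sports cars")

print(sorted(narrower_closure(thesaurus, "vehicles")))
# ['bicycles', 'cars', 'sports cars']
print(thesaurus["cars"]["BT"])  # {'vehicles'}
```

A search on ‘vehicles’ expanded in this way also reaches documents indexed only under ‘sports cars’, which tends to raise recall; moving in the opposite direction, from a broad term to one specific narrower term, tends to raise precision. Because BT is stored as a set, a term may have more than one broader term.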

Terms forming part of the chain in a systematic part usually display hierarchical relations. However, not all terms indented under a term in a systematic display will be hierarchically related to the one at the level above. They may be associatively related too (Bean, 1998). The hierarchical relationships cover four situations i.e.

1.8.2.1 Generic relationship: It identifies the link between a class or a category and its member or species. According to the Unesco guidelines (Unesco, 1981c) this relationship may be identified by using the convention:

BTG used for broader term generic

NTG used for narrower term generic

1.8.2.2 Whole-part relationship: This relationship covers a limited range of situations where the name of a part implies the name of its possessing whole serving as the super-ordinate term.

There are four main classes of this type of relationship.

i) System and organs of a body

ii) Geographical locations

iii) Discipline or field of discourse

iv) Hierarchical social structure

1.8.2.3 Instance relationship: This relationship identifies links between a general category of things or events, expressed by a common noun and an individual instance of that category, the instance then forming a class-of-one, which is represented by a proper name.

1.8.2.4 Polyhierarchical relationship: Some concepts can belong, on logical grounds, to more than one category at the same time. They are then said to possess polyhierarchical relations.

1.8.3 The associative relationship: This relationship is found between terms which are closely related conceptually but not hierarchically, and which are not members of the same equivalence set. Associatively related terms are known as ‘related terms’ (RT). The Unesco guidelines identify the following two kinds of terms to be linked by the associative relationship (Unesco, 1981d).

i) Those that belong to the same category: These are the terms that relate to siblings with overlapping meaning such as boats and ships.

ii) Those that belong to different categories. A number of criteria for identifying this type of relationship are recommended (Unesco, 1981e), e.g.

a) An operation and its agent
b) An action and the product of the action, etc.

Aitchison and her co-authors believe that this relationship is less easy to define (Aitchison, Gilchrist and Bawden, 2000f). Biswas and Smith (1989) emphasize the importance of the associative relationship. Molholt (1996) has pointed out weaknesses in the way this relationship is established in many existing thesauri.

Willets (1975) discussed the nature of associative relationship and methods for its establishment.

Efforts have been made by Perreault (1965), Neelameghan (1975) and Rajan (1975) to list the categories that the associative relationship covers. Sparck Jones (1971) has claimed that it is possible to establish associative relationships using a computer.

The present research provides a more concrete solution to the problem of establishing associative relationships. The research reveals that depth classification schedules are very useful in this task, as the isolates across facets represent the associative relationship. Speciators also help to establish an associative relationship.

1.9 Thesaurus display methods:

The thesaurus also differs from other vocabulary control devices in its display. The terms and their relationships are displayed in various ways in a thesaurus. The Unesco guidelines (Unesco, 1981f) identify three types of display, i.e.

a) Alphabetical display

b) Systematic display

c) Graphic display

1.9.1 Alphabetical display:

It is the most common form of display. It organizes all indexing terms in a single alphabetical order. The Unesco guidelines (Unesco, 1981f) give the list and order of ancillary items to be provided along with the terms in the alphabetical thesaurus, and also prescribe the prefix conventions to be followed. These are:

1. Scope notes or definitions

2. UF references to non-preferred equivalent terms

3. TT references to top terms, if necessary

4. BT references to broader terms

5. NT references to narrower terms

6. RT references to related terms

Use of BT1, BT2 / NT1, NT2 to indicate multi-level hierarchies is also prescribed by the Unesco guidelines (Unesco, 1981f). These conventions are used in the CAB thesaurus (CAB, 1999).
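The order of ancillary items listed above can be sketched as a small formatting routine for an alphabetical display. The prefix SN for scope notes is an assumption (the list above names scope notes without giving a prefix), and the sample record, the two-space indentation and the function names format_entry and alphabetical_display are likewise hypothetical.

```python
# Order of ancillary items under each term; "SN" (scope note) is an assumed prefix.
FIELD_ORDER = ["SN", "UF", "TT", "BT", "NT", "RT"]


def format_entry(term, record):
    """Format one thesaurus entry with its references in the prescribed order."""
    lines = [term]
    for tag in FIELD_ORDER:
        for value in sorted(record.get(tag, [])):
            lines.append(f"  {tag} {value}")
    return "\n".join(lines)


def alphabetical_display(thesaurus):
    """Arrange all entries in a single alphabetical sequence."""
    ordered = sorted(thesaurus, key=str.lower)
    return "\n\n".join(format_entry(term, thesaurus[term]) for term in ordered)


# Hypothetical records; references may be supplied in any order.
thesaurus = {
    "Cars": {"RT": ["Roads"], "BT": ["Vehicles"], "UF": ["Automobiles"]},
    "Bicycles": {"BT": ["Vehicles"]},
}
print(alphabetical_display(thesaurus))
```

Keeping the order in one list means that a different convention, such as adding the BT1 / BT2 levels, would only require changing FIELD_ORDER and the stored tags.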

The ROOT thesaurus (British, 1985) uses mathematical symbols instead of the BT / NT conventions. In multilingual thesauri, such symbols help to overcome the language barrier.

To make the second and subsequent words of a compound heading serve as lead terms, a permuted index display is also used (Aitchison, 1996).
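The idea behind a permuted display can be sketched in a few lines: every word of a compound term becomes a lead word under which the whole term is filed. This is a simplified illustration; real permuted indexes usually skip function words and show the surrounding context, and the sample terms and the function name permuted_index are hypothetical.

```python
def permuted_index(terms):
    """File each compound term under every one of its words."""
    entries = []
    for term in terms:
        for word in term.split():
            # (lead word, full term): the lead word gives the filing position.
            entries.append((word.lower(), term))
    return sorted(entries)


for lead, term in permuted_index(["Library automation", "Automation systems"]):
    print(f"{lead:<12} {term}")
# automation   Automation systems
# automation   Library automation
# library      Library automation
# systems      Automation systems
```

A user who looks under ‘automation’ therefore finds ‘Library automation’ even though that term files under L in the main alphabetical sequence.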

1.9.2 Systematic display:

The thesaurus, which has a systematic display, always has two parts i.e. a systematic part and an alphabetical part. The systematic part provides an overall structure or macro classification (International, 1986). The systematic part may be displayed in various ways such as-

(a) Broad subject groups: e.g. Thesaurus of ERIC Descriptors (Thesaurus, 1995).

(b) Faceted classification based thesaurus, known as thesaurofacet (Aitchison, Gomersall and Ireland, 1969).

1.9.3 Graphic display:

Two-dimensional figures may be used to display indexing terms and their inter-relationships. The graphic display is always supported by alphabetical listings. Importance of graphical displays in thesaurus is elaborated by Rolling (1965).

The Unesco guidelines (Unesco, 1981f) identify two types of graphic display, i.e. tree structure and arrow graph. The terminograph, also called a ‘box chart’, is another form of thesaural display. This form of display is used by the SPINES thesaurus (SPINES, 1976).

1.9.4 Screen displays: A number of thesauri are now available online. These thesauri display the thesaurus entries on the computer screen. Song (2000) discusses the pros and cons of screen display and suggests methods for improving the screen display.


Original Reference Article:

  • Kumbhar, R. M. (2003). Construction of vocabulary control tool thesaurus for library and information science.
