


Introduction to Controlled Vocabularies

7. Constructing a Vocabulary or Authority


Constructing a rich and complex controlled vocabulary or authority is a time-consuming and labor-intensive process. However, the benefits are worth the cost, because the resulting vocabulary helps to ensure consistency in indexing and facilitates successful retrieval. It also saves labor, because catalogers do not have to repeatedly record the same information. The issues discussed in this chapter concern both the construction of a local authority and the construction of a new vocabulary for broader use. Further information is found in Chapter 6: Local Authorities. Given that an authority in this context is also a kind of vocabulary, both are intended by the use of the term vocabulary below.

7.1. General Criteria for the Vocabulary

Before beginning the project, the creators of the vocabulary must agree upon and document the intended compliance with standards, construction methods, plans for maintenance, desired structure, types of relationships, display formats, and policies regarding compound terms, true synonymy, and types of acceptable warrant. A first step in resolving these issues is to determine the purpose, scope, and audience of the vocabulary.

7.1.1. Local or Broader Use
Is the vocabulary intended strictly for local use or to be shared in a broader environment? Local authorities should be customized so that they work well with the specific situation and the specific collection or collections at hand. Each institution should develop a strategy for creating local authorities customized for its specific collections.

However, if the collection is or will be queried in consortial or federated environments, controlled vocabularies should be customized for retrieval across different collections; depending upon the particular situation, the requirements will differ and the terminology may be broader or narrower in scope.

In today's automated environment and with the growing tendency to share data, it can generally be assumed that a vocabulary will someday be shared with others or incorporated into a larger context, even if this is not an immediate goal of the project. Thus, it is wise to create a vocabulary that is compliant with national and international standards. Furthermore, the vocabulary should use the structure and editorial rules of existing standard vocabularies in order to make it easier to achieve interoperability in the future.

Builders of local vocabularies should investigate the possibility of contributing new terms to an existing standard vocabulary, such as the AAT or the Library of Congress Authorities. Contributing to a common resource allows an institution and others in the academic or professional community to effectively share terminology, thus avoiding redundant efforts and enhancing interoperability.

7.1.2. Purpose of the Vocabulary
What are the purpose and intended audience of the new vocabulary or local authority? Vocabularies and authorities are typically used for cataloging, retrieval, or navigation.

In an ideal situation, separate—although closely related—vocabularies are used for cataloging and for retrieval. A vocabulary primarily designed for cataloging contains expert terminology. At the same time, it is designed to encourage the greatest possible consistency among catalogers by limiting choices of terminology according to the scope of the collection and the focus of the field being indexed. In contrast, a vocabulary for retrieval is typically broader in scope and contains more nonexpert and even wrong terminology (e.g., misspelled names or incorrect, but commonly used, terms).

In a structured vocabulary intended for cataloging, equivalence relationships should be made only between terms and names that have true synonymy (identical meanings) in order to allow accuracy and precision in indexing and retrieval. However, a vocabulary for retrieval may link terms and names that have near synonymy (similar meanings) in order to broaden the results. In fact, due to limited resources, many institutions use the same vocabulary for both cataloging and retrieval, thus requiring a compromise between the two approaches.
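
The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tables `TRUE_SYNONYMS` and `NEAR_SYNONYMS` and the function `expand_query` are hypothetical names, and the entries are sample data rather than records from any published vocabulary.

```python
# Hypothetical equivalence tables; the entries are illustrative only.
TRUE_SYNONYMS = {"town halls": {"town hall"}}    # identical meaning
NEAR_SYNONYMS = {"town halls": {"moot halls"}}   # similar, overlapping meaning


def expand_query(term: str, include_near: bool = False) -> set:
    """Return the search terms for a query.

    True synonyms are always included; near synonyms are added only
    when the caller asks for broader results, as a retrieval
    vocabulary might.
    """
    expanded = {term} | TRUE_SYNONYMS.get(term, set())
    if include_near:
        expanded |= NEAR_SYNONYMS.get(term, set())
    return expanded


precise = expand_query("town halls")                    # cataloging-style precision
broad = expand_query("town halls", include_near=True)   # retrieval-style recall
print(sorted(precise))
print(sorted(broad))
```

An institution that uses one vocabulary for both purposes could keep the two kinds of links distinct in this way and decide at query time whether to broaden the search.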

If the vocabulary is to be used for navigation or browsing on a Web site, it should be very simple and aimed at the nonexpert audience rather than at specialists. Typically, such a vocabulary is not used for cataloging or retrieval beyond navigation.

7.1.3. Scope of the Vocabulary
No vocabulary can contain all terminology. Boundaries for the vocabulary should be set, and the realm of knowledge encompassed should be precisely defined. Will it have a broad scope but shallow depth? Or will it have a narrow or specific scope but great depth? An example of the latter is the AAT, for which the scope is limited to art and architecture, but the depth of hierarchies within this realm may be very extensive.

If the vocabulary is complex, as when the scope is broad or the hierarchies are deep, facets and other divisions should be established in order to divide the terms in a logical and consistent way throughout the vocabulary. The vocabulary may grow and change over time, which will affect the continuing need for divisions within the hierarchies. The levels of granularity and specificity that will be needed by the users of the vocabulary should be carefully considered. This issue is further discussed in Chapter 8: Indexing with Controlled Vocabularies.

7.1.4. Maintaining the Vocabulary
Terminology for art and material culture may change over time; vocabularies must be living, growing tools. What methodology will be used for keeping up with changing terminology? If it is possible to contribute terminology to a published vocabulary (such as the Getty vocabularies or the Library of Congress Authorities), a plan and methodology should be developed to submit new terms; this will affect workflow, which must be taken into consideration.

7.2. Data Model and Rules

The following basic issues related to the data model, minimum records, editorial rules, and other topics should be resolved before beginning work on a new vocabulary.

7.2.1. Established Standards
When populating the authority, use established authoritative standards and vocabulary resources for models, rules, and values. In order to avoid duplication of effort and to allow future interoperability, developers of a new vocabulary should attempt to incorporate existing authoritative standards and vocabularies in whole or in part, if they overlap with the scope of the intended new vocabulary. Whenever possible, the vocabulary should be populated with terminology from existing controlled vocabularies, such as the Getty vocabularies and the Library of Congress Authorities, rather than inventing terms from scratch. The unique numeric or alphanumeric identifiers of incorporated records should be included so that information may be exchanged with others and updates from the original vocabulary sources may be received.

Standard, published sources for terms or names and other information should be used when it is necessary to make new vocabulary records. Appropriate sources are discussed in Chapter 6: Local Authorities. The sources for information in the authority record should be systematically cited. If the name or term does not exist in a published source, it should be constructed according to CDWA, CCO, the Editorial Guidelines of the Getty Vocabulary Program, AACR2, or other appropriate rules.

Among synonyms, one of the terms or names should be flagged as the preferred term/name and chosen according to established rules and standards.

7.2.2. Logical Focus of the Record
Establish the logical focus of each vocabulary record. The scope of the vocabulary should be defined by determining what will be included and omitted from the vocabulary. Will there be limitations of time period, geographical extent, or topical subjects? How will each record be circumscribed? For the purpose of this discussion, a record is defined as a grouping of data that includes the terms that have an equivalence relationship to each other; links to related records; broader contexts; the scope note; and other information as required.
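
The record described above can be sketched as a small data structure. The following Python is a minimal illustration, not a prescribed model: the class names `Term` and `VocabularyRecord`, the field names, and the sample identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Term:
    """One term or name that is equivalent to the others in its record."""
    text: str
    preferred: bool = False                 # one term per record is flagged
    language: Optional[str] = None
    sources: List[str] = field(default_factory=list)  # citations for the term


@dataclass
class VocabularyRecord:
    """Hypothetical record: equivalent terms, links, broader contexts, scope note."""
    record_id: str
    terms: List[Term]
    broader_ids: List[str] = field(default_factory=list)  # broader contexts
    related_ids: List[str] = field(default_factory=list)  # links to related records
    scope_note: str = ""

    def preferred_term(self) -> Term:
        """Return the single flagged term, or fail if the record is malformed."""
        flagged = [t for t in self.terms if t.preferred]
        if len(flagged) != 1:
            raise ValueError(f"record {self.record_id} needs exactly one preferred term")
        return flagged[0]


record = VocabularyRecord(
    record_id="rec-001",  # illustrative identifier, not a real vocabulary ID
    terms=[Term("town halls", preferred=True, language="en"),
           Term("town hall", language="en")],
    broader_ids=["rec-000"],
    scope_note="Illustrative scope note.",
)
print(record.preferred_term().text)
```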

If only a small number of terms are needed for an application, perhaps all terminology may be included in a single vocabulary, with distinctions between broad types made through the use of facets. However, for medium-sized and large vocabularies, it is generally more efficient to create separate vocabularies for different types of data. A primary criterion for judging when to make separate vocabularies or a single vocabulary is to consider how similar the data is for various records. For example, a vocabulary for people's names requires information that is quite different from information about geographic names: people have biographies and very shallow hierarchies (if any), while geographic places have coordinates and a position in an administrative hierarchy. Based on these differences, it is more efficient to create separate vocabularies for people and geographic places.

7.2.3. Data Structure
Establish the entity-relationship model and data structure. After the scope is defined, the relationships between various types of data should be established. The following should be determined: Which data needs to have controlled terminology? Which elements must be a text field? Where multiple values may exist for a field, which fields must be grouped together? How are various types of information otherwise related? When designing the data model, a standard such as CDWA or CCO should be consulted, as should existing vocabulary data models such as those used for the Getty vocabularies. The model advocated in these standards is a relational model, which allows maximum versatility, power, and linking for complex, large data sets and intensive editorial requirements. However, implementers may decide on another data model if their needs are different.

In addition to the issues outlined here, there will be dozens of other technical decisions that must be made before constructing the vocabulary. What technology will be used? How will authority files, lists, and other controlled vocabularies be integrated into the rest of the system? These are critical questions that depend upon local needs and resources. If an institution is tied to a particular software, a vocabulary that operates within the parameters of that software may have to be designed, and compromises relative to the standards should be made as necessary.

7.2.4. Controlled Fields vs. Free-Text Fields
Accommodate both controlled fields and free-text fields. Controlled fields contain data values drawn from controlled terms and are formatted to allow for successful retrieval. Free-text fields communicate nuance, uncertainty, and ambiguity to end users.

The primary function of an indexed field is to facilitate end-user access. Access is improved when controlled terms are used to populate database fields. Fields in one controlled vocabulary may be controlled by terms in another controlled vocabulary; for example, the place names in a personal name vocabulary may be controlled by a geographic place name vocabulary.

Consistency is less important for a free-text field than for a controlled field, but it is still desirable. Although free-text fields by definition contain uncontrolled terminology, the use of terminology that is consistent with the terms in controlled fields is recommended for the sake of clarity. Using a consistent style, grammar, and sentence structure is also recommended.

7.2.5. Minimum Information
Establish the minimum required information for each record by determining which information in the data model is required and which is optional. The standards and vocabularies listed above may provide guidance. The data that is needed in order to use and display the vocabulary must be decided upon and supplied for every record. For example, the use of preferred terms and hierarchical placement is required for every record. Other data may be desirable but not required; a strategy may be adopted for data to be supplied incrementally over time. For example, developers of the vocabulary could work in phases, beginning with a set of minimal records and then, at a later date, filling out and supplementing the records.
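
A check for minimal records might look like the following sketch. It assumes, for illustration only, that the required data is a preferred term and a hierarchical parent; the field names and the function `missing_required` are hypothetical, and a real implementation would also need to exempt the root of the hierarchy, which has no parent.

```python
# Assumed minimum for every record; all other fields may be supplied later.
REQUIRED_FIELDS = ("preferred_term", "parent_id")
OPTIONAL_FIELDS = ("scope_note", "variant_terms", "sources")


def missing_required(record: dict) -> list:
    """Return the required fields that are absent or empty in a record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


minimal = {"preferred_term": "aerial photographs", "parent_id": "photographs"}
incomplete = {"preferred_term": "dye toning", "scope_note": "To be written."}

print(missing_required(minimal))
print(missing_required(incomplete))
```

Records that pass this check can be released in a first phase and enriched with optional data in later phases.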

7.2.6. Editorial Rules
Identify and adopt appropriate editorial rules for building the vocabulary to ensure consistent data. If an existing set of standard rules must be altered due to local requirements, the local rules should be thoroughly documented. Once the rules are in place, they should be applied consistently and without fail. To avoid altering established rules on a case-by-case basis when existing rules do not work in a given situation, a system should be in place whereby an authorized individual or team may update the rules and distribute the revisions to all users of the vocabulary.

What do editorial rules comprise? They include the following: a list of which fields are required; how to choose a preferred term for each record; which variant terms to include; the required parameters for choosing hierarchical positions for new records and how to construct new branches of the hierarchies; how to establish other relationships between terms and records; the format and syntax used to fill in each field; the language allowed for each field (is the data in English only or multilingual?); character sets; the authorized sources for each field; and decision trees regarding how to choose which information is preferred when sources disagree. Ideally, the rules should include many examples illustrating how to enter the data and make decisions.

References to a computer system should be as generic as possible in the editorial rules, so that they do not have to be entirely rewritten when new systems are adopted over time. Training or documentation on the functionality of a specific computer system should be separate from the editorial rules, so far as is practical.

7.3. Imprecise Information

For vocabularies covering art and cultural heritage, developers should take into account that information in this field of study is often imprecise or ambiguous. There is often no one established fact, date, or opinion. Systems that catalog this information must allow for the expression of multiple possibilities and the flagging of information as possibly or probably. The following examples show some of the complex issues involved.

The name and identity of a person may be unknown. A work of art may have been created by an anonymous artist who has a known oeuvre (body of artistic works) from which approximate life dates and loci of activity may be surmised. When an artist's name is not known, scholars and museums devise appellations based on various attributes: the name of an artwork (e.g., Master of the Ovile Madonna); a client (e.g., the Beardsley Limner—a combination of the word limner, referring to a painter of portraits or miniatures, and a sitter's name, Mrs. Hezekiah Beardsley); a location (e.g., Frankfurt Master); a stylistic attribute (e.g., Master of the Mountain-like Clouds); the artist's initials, if known (e.g., Master E.L.G.); or a relationship to a known artist (e.g., Pseudo Pier Francesco Fiorentino). Most anonymous artists have multiple appellations, in different languages and formats. All these appellations must be associated with the identity. If there is a suspicion that the anonymous artist may be identified with a named individual, a relationship must be established between the two entities. For example, the Master of the Parlement de Paris worked during the fifteenth century, and the style of the works and their locations would probably make him a French or Flemish artist. A vocabulary such as the ULAN provides a record for such anonymous artists, listing the appellations and all variations on them and recording approximate dates and loci of activity.

Even with named artists, biographical information may be uncertain. Uncertain dates may be expressed as ca. (circa) or possibly or in terms of a century or the reign of a ruler. The loci of activity may be uncertain (e.g., either France or Flanders), and relationships to other artists may be presumed but not documented.

In an example for geographic information, as in the TGN, the exact location of a documented historic place may be uncertain; thus, a lost settlement must be accommodated in the hierarchy.

In an example for a vocabulary of generic terms, such as the AAT, there may be multiple logical hierarchical placements for the term within the vocabulary. There may be disagreement among scholars regarding whether a concept is a period or a culture and when and where it started or ended.

Vocabularies may track such uncertain or ambiguous information in several ways, often all used together in one vocabulary. Ambiguous information may be accommodated via repeatable fields to allow indexing of multiple possible values. For example, if there are multiple possible nationalities or loci of activity for an artist, all of them should be indexed to provide access (e.g., El Greco was a Greek artist who worked in Spain). Where uncertainty or variability may exist in the hierarchical context, polyhierarchical links allow multiple parents to be recorded. Finally, note fields may be used throughout the record to allow expression and explanation of ambiguity; important information in such notes should be indexed to allow retrieval. For example, an artist's life dates for display may be born ca. 532 BCE, died before 490 BCE. This uncertain information could then be indexed as birth date:–542, death date:–490, with rules provided for estimating uncertain life spans when precise dates of birth and death are unknown.
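
The pairing of a display date with indexed dates can be sketched as follows. This is an illustration under stated assumptions: the ten-year allowance for "ca." is inferred from the example above rather than taken from a published rule, BCE years are stored as negative integers, and the names `LifeDates`, `index_life_dates`, and `CIRCA_MARGIN` are hypothetical.

```python
from dataclasses import dataclass

CIRCA_MARGIN = 10  # assumed allowance in years for "ca." dates


@dataclass
class LifeDates:
    display: str      # free text shown to users; keeps the nuance
    birth_index: int  # earliest plausible birth year (negative = BCE)
    death_index: int  # latest plausible death year (negative = BCE)


def index_life_dates(display, birth_year, death_year,
                     birth_circa=False, death_circa=False):
    """Widen circa dates so that a search on the indexed span is inclusive."""
    birth = birth_year - CIRCA_MARGIN if birth_circa else birth_year
    death = death_year + CIRCA_MARGIN if death_circa else death_year
    return LifeDates(display, birth, death)


dates = index_life_dates(
    "born ca. 532 BCE, died before 490 BCE",
    birth_year=-532,
    death_year=-490,
    birth_circa=True,
)
print(dates.birth_index, dates.death_index)
```

The display text is never parsed; the cataloger supplies both the display value and the years to be indexed, so the nuance remains visible to users while retrieval works on the integers.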

7.4. Rules for Constructing a Vocabulary

Devise consistent editorial rules for the establishment of warrant, choice of terms, placement in the hierarchy, and writing of scope notes and other data. Where possible, existing rules should be consulted, including the Editorial Guidelines of the Getty vocabularies, the CCO and CDWA chapters on authorities, AACR2, or other standard guidelines. A brief discussion of some important principles is included below.

7.4.1. Establishing Terms
Terms should be included based on how closely they represent concepts included in the vocabulary. For persons, places, iconography, etc., the name must be proven to represent the person, place, or subject intended by a given vocabulary record. For terms in a Generic Concept Authority, the terms representing a given concept should be true synonyms for the concept, established through literary warrant.

Criteria in choosing terms should include the elimination of ambiguity and the control of synonyms. Vocabularies should eliminate the ambiguity that occurs in natural language, including the ambiguity surrounding homographs. Homographs are words or terms that share the same spelling. A homograph may be a homonym or a polyseme. Homonyms have different meanings and unrelated origins, whereas polysemes are usually considered to have multiple related meanings.

For each term, it is necessary to provide descriptors, alternate descriptors, and other variant terms (used for terms) based on the principle of true synonymy. Terms that represent variant spellings, current and historical usage, various languages, and various forms of speech should be included.

The preferred term and other descriptors should be flagged. The preferred term is the term or name that should be automatically designated as the default term by algorithm in displays. The preferred term should be the one most commonly used in scholarly literature in the language of the catalog record. If sources disagree on the preferred form of the name or term, the source highest in the list of prioritized preferred sources should determine which name or term to use.
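
Resolving disagreement among sources by means of a prioritized list can be sketched as below. The source labels, the sample name forms, and the function `choose_preferred` are hypothetical; an actual list of prioritized sources would be set out in the editorial rules.

```python
from typing import Dict, List

# Hypothetical priority list, highest-priority source first.
SOURCE_PRIORITY = ["Source A", "Source B", "Source C"]


def choose_preferred(forms_by_source: Dict[str, str],
                     priority: List[str] = SOURCE_PRIORITY) -> str:
    """Return the form given by the highest-ranked source that has the term."""
    for source in priority:
        if source in forms_by_source:
            return forms_by_source[source]
    raise LookupError("no prioritized source contains this term")


# Illustrative data: two sources give different forms of the same name.
forms = {"Source C": "Henry de Gower", "Source B": "Gower, Henry de"}
print(choose_preferred(forms))
```

Here Source B outranks Source C, so its form is chosen; the other form would still be recorded in the same record as a variant.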

It is important to develop a methodology for establishing authoritative terms already in use or a means to test and validate emerging terms through usage. The use of literary warrant is recommended for validating terms and distinguishing them from a word or words used in a casual sense. To establish literary warrant, the term should be found in scholarly authoritative literature or reference sources; the usage of the term should consistently refer to the same concept in the sources. Use these sources to establish both descriptors and variants based on common usage.

For less formal vocabularies, as in a local online retrieval system, terms may be based on user warrant, which takes into account the language of users. For such vocabularies, developers should look at searches in search and retrieval systems to help devise nonexpert paths to the more formal expert terminology and associated materials. Organizational warrant may be another informal means of establishing vocabulary terms for local use, based on the needs and conventions of the organization for which the vocabulary is being developed.

7.4.1.1. Capitalization
The controlled vocabulary should serve as an orthographic authority in addition to noting preferred terminology. An appropriate combination of capitals and lowercase letters should therefore be used in terms, as dictated by usage. Generic terms should be expressed in lowercase (e.g., cathedral). Proper names should be capitalized as in standard usage (e.g., Henry de Gower). Acronyms and initialisms are generally all uppercase (e.g., USA); however, common usage may dictate only an initial capital, a mixture of upper and lowercase letters (e.g., MoMA), or letters and numbers.

7.4.2. Regulating Hierarchical Relationships
Hierarchical relationships should be recorded consistently and with an overall logic throughout the vocabulary. Some of the most important considerations are listed below.

In order for a record to be a child of a given parent, the relationships must be logical all the way up the tree. A child that is part of a given parent must also be a narrower context for its grandparent; for example, Luxor is part of its parent, Qinā governorate; its grandparent, Upper Egypt region; and its great-grandparent, Egypt. The relationships should be logical when traversing down the tree as well.

Each subset of narrower terms clustered under a broader term should be independent and mutually exclusive in meaning. Occasionally, meanings may overlap (although they are not identical) among siblings, but this should be avoided when possible. For example, the two children of municipal buildings, moot halls and town halls, are sometimes considered synonymous, so their meanings overlap. Ideally, this overlap should be captured in an associative relationship.

All records in the same branch of the hierarchy should refer to the same class of things, actions, properties, or other topics. That is, every subordinate term should refer to the same kind of concept as its superordinate term. For example, photographs are objects, and subordinate terms to photographs should be objects as well (e.g., aerial photographs). A term for a photographic technique such as dye toning should not be under the object term photographs; instead, dye toning would be better placed under photographic techniques. Associative relationships may be used to link objects such as photographs to related processes and techniques, but the object and the technique should be organized separately in the hierarchical structure.
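
A consistency check of this kind can be automated if each record carries a designation of its class. The sketch below assumes a hypothetical table in which every term has a kind (object or technique) and an optional parent; the names `RECORDS` and `misplaced_terms` are illustrative, and dye toning is deliberately misplaced to show the error being caught.

```python
# Hypothetical records: term -> (kind of concept, parent term or None).
RECORDS = {
    "photographs": ("object", None),
    "aerial photographs": ("object", "photographs"),
    "photographic techniques": ("technique", None),
    "dye toning": ("technique", "photographs"),  # deliberately misplaced
}


def misplaced_terms(records: dict) -> list:
    """Return terms whose kind differs from the kind of their parent."""
    errors = []
    for term, (kind, parent) in records.items():
        if parent is not None and records[parent][0] != kind:
            errors.append(term)
    return errors


print(misplaced_terms(RECORDS))

# Moving the term under a parent of the same kind resolves the error.
RECORDS["dye toning"] = ("technique", "photographic techniques")
print(misplaced_terms(RECORDS))
```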

7.4.2.1. Mixing Relationships
Ideally, a given vocabulary uses predominantly one type of hierarchical relationship throughout: whole/part, genus/species, or instance. If relationships are mixed in a single vocabulary, the relationship should be flagged for clarity, using codes prescribed in the ISO and NISO standards for thesaurus construction (BTP and NTP for partitive, BTG and NTG for generic, and BTI and NTI for instance relationships). The following is an example of mixed hierarchical relationships, with corresponding codes:


dresses
   BTG main garments
   NTP bodices
   NTP skirts
   NTG gowns
   NTG sheath dresses
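
The flagged relationships in the example above can be stored as coded links. The following Python sketch is one possible representation, not a format prescribed by the standards; the names `Link`, `INVERSE`, `targets`, and `reciprocal` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Link(NamedTuple):
    code: str    # BTG/NTG generic, BTP/NTP partitive, BTI/NTI instance
    target: str


# Each broader-term code is the reciprocal of a narrower-term code.
INVERSE = {"BTG": "NTG", "NTG": "BTG",
           "BTP": "NTP", "NTP": "BTP",
           "BTI": "NTI", "NTI": "BTI"}

DRESSES: List[Link] = [
    Link("BTG", "main garments"),
    Link("NTP", "bodices"),
    Link("NTP", "skirts"),
    Link("NTG", "gowns"),
    Link("NTG", "sheath dresses"),
]


def targets(links: List[Link], code: str) -> List[str]:
    """Return the targets of all links that carry the given code."""
    return [link.target for link in links if link.code == code]


def reciprocal(source: str, link: Link) -> Tuple[str, Link]:
    """Return the record that holds the inverse link, and the inverse link."""
    return link.target, Link(INVERSE[link.code], source)


print(targets(DRESSES, "NTP"))   # parts of a dress
print(targets(DRESSES, "NTG"))   # kinds of dress
print(reciprocal("dresses", DRESSES[1]))
```

Because every link carries its code, a display can separate parts from kinds even though both are narrower terms of the same record.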


7.4.2.2. Incorporating Facets and Guide Terms
One way to achieve a consistent and harmonious arrangement in a medium-sized or large vocabulary is to structure the hierarchies using facets and guide terms.

Facets, also known as faceted displays, group the records into broad classes according to various criteria that make sense for the vocabulary. For example, the AAT includes facets for activities, objects, materials, agents (people), styles, physical attributes, and abstract concepts. A facet contains a homogeneous class of concepts, the members of which share characteristics that distinguish them from members of other classes. For example, in the AAT, marble refers to a substance used in the creation of art and architecture and is placed in the Materials Facet. Impressionist denotes a visually distinctive style of art and is placed in the Styles and Periods Facet. Rather than using facets with this type of topic designation, vocabularies sometimes use geographic or temporal facets.

The tree structure of hierarchical vocabularies often descends from the root, which is the single highest level of the hierarchical structure. The facets are located directly below the root, as with the Objects Facet in the example from the AAT above. Each facet may have one or more additional levels, known as subfacets or hierarchies. In the example above, Visual Works is a subfacet.



Guide terms, also known as node labels, are levels that collocate similar sets or classes of records as necessary (illustrated in the example above with angled brackets). They should logically illustrate the principles of division among a set of sibling terms, as with levels dividing a long list of types of photographs by form, function, technique, and subject in the example above. They should be consistent with other divisions in the same or a similar hierarchy. Guide terms may represent the instance relationship in a vocabulary that otherwise comprises either whole/part or genus/species relationships.

It is advisable to avoid making overly complex divisions that cause unnecessary complexity in the structure; such divisions hinder the ability of end users to access the data through browsing the hierarchies, in addition to making parent strings (hierarchical context displayed in horizontal format) unwieldy and difficult to read. In the example above, the AAT has used a large number of guide terms in the hierarchy to provide an orderly arrangement of a large number of types of photographs. If the number of types of photographs had been small, the guide term subdivisions would have been unnecessary.

Guide terms should not be used for indexing or cataloging. In displays, they should be enclosed in angled brackets (e.g., <photographs by technique>), italicized, or otherwise visually distinguished from terms that are intended for indexing.
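
Both rules can be enforced in software if guide terms are flagged in the data. This is a minimal sketch with hypothetical names (`Node`, `display_label`, `indexable_labels`) and sample data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    label: str
    is_guide_term: bool = False


def display_label(node: Node) -> str:
    """Enclose guide terms in angled brackets; show other terms unchanged."""
    return f"<{node.label}>" if node.is_guide_term else node.label


def indexable_labels(nodes: List[Node]) -> List[str]:
    """Return only the terms that may be used for indexing or cataloging."""
    return [node.label for node in nodes if not node.is_guide_term]


branch = [
    Node("photographs"),
    Node("photographs by technique", is_guide_term=True),
    Node("aerial photographs"),
]

print([display_label(node) for node in branch])
print(indexable_labels(branch))
```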

7.5. Displaying a Controlled Vocabulary

Display issues relate to the choice of fields or subfields and how data is presented to different users. Issues of display relate to how vocabulary terms and other controlled information are displayed in a work record (i.e., the record containing information for the object being described) for certain groups of end users. A separate issue, discussed here, concerns how to display data in the controlled vocabulary itself.

7.5.1. Display for Various Types of Users
The display of a controlled vocabulary should anticipate the requirements of various types of users. Controlled vocabulary developers should ideally create different views of the vocabulary for different classes of users.


Creators: Vocabulary creators and those responsible for the maintenance of the vocabulary require access to complete information about each term and the ability to edit and add terms, relationships, and other information. They are typically experts in the subject domain of the controlled vocabulary. They require access to the revision history of the records and other administrative information that is not displayed to other users of a controlled vocabulary.

Indexers: Indexers and expert searchers typically have expertise in the subject domain of the controlled vocabulary. They require the ability to search and view equivalence, hierarchical, and associative relationships as well as definitions, dates, and notes for terms. They must have a way of suggesting or adding new terminology when the existing terms do not meet their needs.

End users: End users of the controlled vocabulary are typically unfamiliar with the jargon and complexities of thesaurus construction and online information retrieval. They probably do not understand the conventions of controlled vocabulary notation (e.g., BT, NT, UF, AD). They may have expertise in the subject area and understand its terminology. In other cases, end users are the general public, who do not have subject expertise and may need to come to the pertinent vocabulary terms for their queries through more common language or by browsing through hierarchies.


The types of displays and documentation available to indexers can be useful to end users as well, when designed with their needs in mind. End users may benefit from on-screen instructions in addition to any printed documentation that may exist.

7.5.2. Technical Considerations
The information in controlled fields is not always user-friendly, because it may need to be structured in a way that facilitates retrieval or machine manipulation (required for sorting, arithmetic calculations, etc.). Information intended for display, however, should be in a format that is easily read and understood by users.

Information for display may, in some cases, be expressed in a free-text field; in other cases, it may be concatenated or otherwise displayed from controlled fields. If the controlled terms are self-explanatory, they can be displayed as they are or concatenated with other terms. For example, a preferred geographic name and the broader hierarchical contexts for the place may be drawn from hierarchically linked records and concatenated for display.
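
Concatenation of a name with its broader contexts can be sketched as follows, using the places named in the discussion of hierarchical relationships. The table `PLACES` and the functions `broader_contexts` and `display_name` are hypothetical, and the display format (contexts in parentheses, separated by commas) is only one of many possibilities.

```python
from typing import Dict, List, Optional

# Hypothetical table: preferred name -> preferred name of the parent record.
PLACES: Dict[str, Optional[str]] = {
    "Luxor": "Qinā governorate",
    "Qinā governorate": "Upper Egypt region",
    "Upper Egypt region": "Egypt",
    "Egypt": None,
}


def broader_contexts(name: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Walk up the hierarchy and collect each broader context in order."""
    chain: List[str] = []
    seen = {name}
    current = parents[name]
    while current is not None:
        if current in seen:  # guard against circular links in the data
            raise ValueError(f"circular hierarchy at {current}")
        chain.append(current)
        seen.add(current)
        current = parents[current]
    return chain


def display_name(name: str, parents: Dict[str, Optional[str]]) -> str:
    """Concatenate a name with its parent string for display."""
    chain = broader_contexts(name, parents)
    return f"{name} ({', '.join(chain)})" if chain else name


print(display_name("Luxor", PLACES))
print(display_name("Egypt", PLACES))
```

The display string is generated on demand from the linked records, so a correction to a parent record is reflected in every display without re-entering data.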

7.5.2.1. Display Independent of Database Design
As far as possible, display or technical constraints should not drive the database design. When planning a database design and the rules for data entry, immediate display demands should not dictate database structure or data entry practice. How information displays in one context should be secondary to consistent and accurate compiling of data. Allowing local display issues or the limitations of a particular computer system to drive how a database is designed or how information is inputted may offer short-term solutions to some problems but will make it more difficult to migrate and share vocabulary data in the long term.

When vocabularies are used in an application for indexing or retrieval, the application must deal with issues surrounding how to gain access to the vocabulary data itself, how to display vocabulary data, and how to apply vocabulary data in a query across target resources. In applications that provide access to the vocabularies, users should be allowed to find the names and other information associated with a concept by either spelling a term or browsing through hierarchies and alphabetical lists.

7.5.3. Characteristics of Displays
Designing a good display is critical. Catalogers' or other users' willingness and ability to use the vocabulary are dependent upon how well they can understand and find terms. There are several types of possible displays, ranging from simple alphabetical listings to complex graphical displays. It is often desirable to provide multiple views of the vocabulary, including hierarchical displays, full record displays, and search results displays. Various methods of display, typography, capitalization, sorting, and arrangement of the data on the page or screen can be used to make terms easy to find and understand.

Usability and accessibility standards should be applied rigorously to all controlled vocabulary display designs. User interface design should take into account accessibility for people with disabilities, a growing area of research and standardization.

7.5.3.1. Format of Display
Controlled vocabularies may be delivered in print or electronic formats. Electronic formats allow greater versatility in searching and displays, including Web functionalities such as hyperlinking that are not available in print format.

7.5.3.2. Documentation
Vocabulary creators should provide user documentation for the controlled vocabulary, explaining the scope, development process, structure, basic rules for construction, and how to use the vocabulary.

Separate documentation may be desirable for vocabulary creators, indexers, and searchers. With controlled vocabularies that are published in print form, this documentation should be part of the introductory material. If the controlled vocabulary is available online, the user documentation should also be available online, with the possibility to download and print it. In software applications, the documentation may be available as in-context online help.

Comprehensive supporting documentation should include the following: the purpose of the controlled vocabulary; its scope, including the subject area covered and what is excluded; the meaning of conventions, abbreviations, and any punctuation marks used in nonstandard ways; and the rules and authorities to be used in selecting the preferred forms of terms and in establishing their relationships.

The following should be noted: if the vocabulary complies with a national or international standard for controlled vocabulary construction; the total number of terms and records; the dates and policy for releasing updates; the contact information of the responsible organization to which comments and suggestions should be sent; and any special online navigation conventions or searching options.

7.5.3.3. Displaying Hierarchies
Thesauri, taxonomies, and any vocabularies with established relationships between records should include a hierarchical display that illustrates the relationships. A primary consideration for displays is how to represent the relationships, whether through notation codes, indentation, or other graphical displays.

7.5.3.3.1. Indentation vs. Notations
In a flat display, which is often used in printed publications, the hierarchical relationships of thesauri may be indicated with relationship notations, such as BT (broader term), NT (narrower term), and UF (used for term), as in the example below.


bobbin lace
BT lace
NT Antwerp lace
NT Brussels lace
NT Chantilly lace
NT duchesse lace





The flat format has the disadvantage of typically allowing only one level of narrower terms and broader terms to be shown with clarity. This means that if any of the narrower terms have further levels of narrower terms, they are not displayed under the ancestor broader term, making the full extent of relationships difficult to visualize. In some notational displays, multiple levels of narrower terms are shown by combining the traditional notations with rudimentary indentation and level numbers (NT1, NT2, and so on), as in the example below.


lace (needlework)
UF lacework
UF dentelle (lace)
BT needlework (visual works)
   NT1 bobbin lace
       NT2 Antwerp lace
       NT2 Brussels lace (bobbin lace)
       NT2 Chantilly lace
       NT2 duchesse lace
   NT1 needle lace
       NT2 Armenian lace
       NT2 Battenberg lace
       NT2 Brussels lace (needlepoint)
       NT2 Venetian lace
           NT3 Alençon lace
           NT3 Burano lace
           NT3 point de neige
           NT3 point plat de Venise
           NT3 punto a relievo
           NT3 rose point
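
The numbered notations in a display like the one above can be generated from a nested tree. The following is a minimal sketch, assuming the hierarchy is held as nested Python dictionaries; the `TREE` structure (a small subset of the lace example), the function name, and the four-space indent are illustrative choices rather than a prescribed format.

```python
# Hypothetical nested representation: term -> dict of narrower terms.
TREE = {
    "bobbin lace": {"Antwerp lace": {}, "Chantilly lace": {}},
    "needle lace": {"Venetian lace": {"Burano lace": {}, "rose point": {}}},
}


def narrower_lines(children, level=1, indent="    "):
    """Return NT1/NT2/... lines, indenting one step per hierarchy level."""
    lines = []
    for term in sorted(children, key=str.lower):
        lines.append(f"{indent * (level - 1)}NT{level} {term}")
        # Recurse so that deeper levels follow directly under their parent.
        lines.extend(narrower_lines(children[term], level + 1, indent))
    return lines


print("\n".join(narrower_lines(TREE)))
```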


The fully realized indented hierarchy tree display as shown in the example on the opposite page is more user-friendly than relationship notation codes, because the significance of indentation as a broader/narrower context indicator is familiar to most end users and requires no knowledge of specialized jargon. Even for expert users, indentation is often clearer and more easily understood at a glance.

Broader/narrower relationships may be indicated with indentation representing a tree structure. In an automated presentation, levels may be expanded or collapsed by using a file folder symbol or another sign (such as the hierarchy tree sign in the example). It is recommended to always display the top of the hierarchy and all levels of ancestors so that the user has a clear notion of where the terms are placed in the full hierarchy.



7.5.3.3.2. Alternative Hierarchical Displays
Algorithms may be established to allow display of the hierarchy by different languages or by other alternative displays. For example, if the language or other information is flagged in the data, this data may be used to establish alternative displays for the hierarchy. In the examples on the following page, the TGN is displayed with English names as the default (when there is an English name; otherwise it defaults to the vernacular), and the alternate display includes the vernacular name (local language of the place, transliterated into the Roman alphabet) for all places below the continent. The user may toggle back and forth between English and vernacular displays. The base language of the TGN is English, but terms and scope notes may be expressed and flagged in any language.

7.5.3.3.3. Display of Polyhierarchy
If a record has multiple parents and also has children, the children must display with that record under each of its parents in all hierarchical views. Thus, the children must belong logically not only to their immediate parent but also to all of their grandparents.

When there are multiple parents, one of the parents should be flagged as a preferred parent to facilitate default displays and other technical requirements. When a record is displayed with a nonpreferred parent, there should be an indication alerting the end user to its status. In the chocolate pots example on the following pages, the nonpreferred parent relationship is indicated with an N in brackets, and in the second display, by a heading called additional parents.
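
A minimal sketch of the preferred-parent flag follows, assuming each parent link is stored as a (child, parent, preferred) tuple. The parent names are placeholders rather than actual AAT hierarchy data, and the function names are hypothetical; only the bracketed N convention comes from the text above.

```python
# Hypothetical parent links: (child, parent, is_preferred_parent).
# "parent hierarchy A/B" are placeholders, not real AAT hierarchy names.
LINKS = [
    ("chocolate pots", "parent hierarchy A", True),
    ("chocolate pots", "parent hierarchy B", False),
    ("coffeepots", "parent hierarchy A", True),
]


def missing_preferred(links):
    """Return children that do not have exactly one preferred parent."""
    counts = {}
    for child, _parent, preferred in links:
        counts[child] = counts.get(child, 0) + (1 if preferred else 0)
    return sorted(child for child, n in counts.items() if n != 1)


def children_display(parent, links):
    """List a parent's children, marking nonpreferred links with [N]."""
    lines = []
    for child, link_parent, preferred in sorted(links):
        if link_parent == parent:
            lines.append(child if preferred else f"{child} [N]")
    return lines


print(children_display("parent hierarchy B", LINKS))
```

A check such as `missing_preferred` could be run before publishing a release, so that every record with multiple parents has one, and only one, default position in the hierarchy.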











Historical relationships may be included; dates may be used to circumscribe the duration of the relationship. In the example from the TGN on the previous page, a historical flag (indicated by the letter H) and a natural-language display date (to Flanders at various times) appear in the hierarchical display. See Chapter 4: Vocabularies for Cultural Objects for more information about dates for relationships.

7.5.3.3.4. Sorting of Siblings
Siblings in hierarchical displays are generally arranged alphabetically. They may also be arranged chronologically or in another logical order, if it is deemed to be more intuitive for the user.

Special coding of siblings may be necessary to enforce a special sorting order. In the example below, a sort order number is included to force sorting in an order other than alphabetical. The sort order was established manually by an editor, who used a chronological sequence to guide the ordering.
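
As a minimal sketch of forced sibling order, the Python below sorts on an editor-assigned number first and falls back to alphabetical order for records without one. The field names `term` and `sort_order` and the sample siblings (periods of ancient Egyptian history) are assumptions made for illustration.

```python
# Hypothetical sibling records; sort_order was assigned by an editor
# to reflect chronology rather than the alphabet.
SIBLINGS = [
    {"term": "Middle Kingdom", "sort_order": 2},
    {"term": "New Kingdom", "sort_order": 3},
    {"term": "Old Kingdom", "sort_order": 1},
]


def sort_siblings(records):
    """Sort by explicit sort_order; unnumbered records follow alphabetically."""
    def key(rec):
        order = rec.get("sort_order")
        # Records lacking a number sort after numbered ones, by term.
        return (order is None, order if order is not None else 0, rec["term"].lower())
    return sorted(records, key=key)


print([rec["term"] for rec in sort_siblings(SIBLINGS)])
```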









7.5.3.3.5. Faceted Displays and Guide Terms
The display of records may be organized according to the broad categories or facets. Facets may have a further hierarchical arrangement as well so that narrower facets are arranged within broader categories.


Top of the AAT hierarchies
   Styles and Periods Facet
       Styles and Periods
           <styles and periods by region>
               <The Americas>
                   <American regions>
                       Central American
                       Caribbean
                       North American
                       South American
                   Pre-Columbian


Guide terms (node labels) are used to group both narrower and related terms into categories. Guide terms are not used for indexing but only for collocation of terms within a controlled vocabulary. They should be displayed in a way to distinguish them from terms representing concepts (postable terms). The recommended method for distinguishing guide terms is placing them in angled brackets.

7.5.3.3.6. Classification Notation or Line Number
In a tree structure, each term may be assigned a classification notation or line number, often built from the top down. When a hierarchical classification scheme is applied to such a tree structure, the notation may be inhospitable to interpolation at any level. A notation scheme consisting entirely of either letters or numbers is less versatile than a mixed alphanumeric notation. Computer-generated or human-assigned line numbers may be easily revised when terms are added, but the notation will not reflect the levels of hierarchy. See also Chapter 4: Vocabularies for Cultural Objects for a discussion of Iconclass, which is an example of an alphanumeric classification system that can be displayed as a hierarchy.






7.5.3.4. Full Record Display
Full record displays (also called term detail displays) include complete details for each record, including equivalence, associative, and hierarchical relationships as well as scope notes, sources, and other related information. In print formats, the term detail display is typically incorporated into the hierarchical display. In electronic formats, users should be able to select a term from any display type and see an expanded view of the detail for that record. Web implementations of controlled vocabularies may include a hyperlink from the term, wherever it appears, to the full term detail display. The user should be able to mark multiple records and view them together for comparison.

7.5.3.5. Displaying Equivalence and Associative Relationships
Relationships between terms in a record (equivalence relationships) and between records (associative relationships, or nonhierarchical relationships) should be clearly designated to users. It should be obvious to the user which terms are descriptors, as distinguished from alternate descriptors and other variant terms (called used for terms). The types and number of associative relationships should be evident.

Many controlled vocabularies use standard thesaural notations to express relationships between synonyms and related terms. Equivalence relationships may be expressed in a list, using notations for term type (e.g., D, AD, UF). In printed indexes, see references may be used. The standard thesaural notation for associative relationships is RT, for related term (here, term actually refers to a whole record, not just a single term).


aerial perspective
SEE atmospheric perspective

aerial photographs
AD aerial photograph
UF air photographs
UF air photos
RT bird's-eye views
BT <photographs by picture-taking technique>

aerials
SEE antennas


As is true for hierarchical relationships, displays that use the standard thesaural notations illustrated above for equivalence and associative relationships will likely be difficult for nonexperts to use. A more user-friendly display labels information in a way that both experts and nonexperts can understand. In the examples below, indications of term type are still included, but in a display that is easier for nonexperts to interpret (e.g., users can click on the hyperlink for a definition of used for term), along with language and other information about the term.


Terms
aerial photographs (preferred, descriptor, English-preferred)
aerial photograph (alternate descriptor, English)
air photographs (used for term, English)
air photos (used for term, English)
photographs, aerial (used for term, English)
photographies aériennes (descriptor, French-preferred)
photographie aérienne (alternate descriptor, French)

Related concepts
distinguished from. . . . aerial views
. . . . . . . . . . . . . . . . . . . . (<views by vantage point or orientation>, views (visual works), . . . Visual and Verbal Communication) [300015527]
distinguished from. . . . astrophotographs
. . . . . . . . . . . . . . . . . . . . (<photographs by subject type>, photographs, . . . Visual and Verbal Communication) [300134468]
distinguished from. . . . bird's-eye views
. . . . . . . . . . . . . . . . . . . . (<views by vantage point or orientation>, views (visual works), . . . Visual and Verbal Communication) [300015529]
distinguished from. . . . space photographs
. . . . . . . . . . . . . . . . . . . . (<photographs by picture-taking technique>, . . . Visual and Verbal Communication) [300246214]


7.5.3.5.1. Permuted Lists and Inverted Forms
Some controlled vocabularies include an auxiliary permuted or rotated list that gives access to every word in all the terms. In other words, a permuted display lists each compound term multiple times in the alphabetic sequence of the controlled vocabulary, once for each of the words in the term. A permuted listing is often useful in a printed product, but it is not needed for online displays, given that the terms may be found by keyword searching and other searching utilities. Furthermore, caution must be taken because automatically generated permuted displays may result in combinations that are misleading and incorrect. For example, the term library science appears as science—library in a permuted list, which may easily be misconstrued as a different concept.
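
To make the mechanics concrete, here is a minimal sketch of an automatically generated rotated list. The function names and the comma separator are assumptions (printed lists use various separators), and the output shows exactly the kind of machine-made entry that the caution above refers to.

```python
def rotations(term, sep=", "):
    """Return one entry per word, each rotated so that word leads."""
    words = term.split()
    entries = []
    for i in range(len(words)):
        lead = " ".join(words[i:])
        tail = " ".join(words[:i])
        entries.append(f"{lead}{sep}{tail}" if tail else lead)
    return entries


def permuted_list(terms, sep=", "):
    """Return (entry, original term) pairs in one alphabetical sequence."""
    pairs = [(entry, term) for term in terms for entry in rotations(term, sep)]
    return sorted(pairs, key=lambda pair: pair[0].lower())


for entry, term in permuted_list(["library science", "book jackets"]):
    print(f"{entry}  ->  {term}")
```

Nothing in this routine judges whether a rotated entry is meaningful, which is why editorially chosen inversions are preferable to a full mechanical listing.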

Useful term inversions differ from a simple permuted list in that editors create the term inversions based on the need and appropriateness of such terms. Useful inversions should be included as used for terms, whereas a full permuted listing should not.

7.5.3.5.2. Displaying Homographs
Homographs are terms or names that are spelled alike but have different meanings. Homographs must be distinguished in displays.

One method is to disambiguate the term with a qualifier, which is a word or brief phrase. In many thesauri, the qualifier is included in the same field as the term, distinguished from the term by punctuation or formatting. A more versatile implementation is to put the qualifier in a separate field, as in the example below. If the term field is dedicated to the term only, it allows implementers to decide whether or not to include the qualifier in retrieval. The qualifier should, however, be displayed with the term for end users (as in the second example on the previous page). It is customary to display the qualifier in parentheses following the term (e.g., drums (walls)).
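
A minimal sketch of the separate-field approach follows, assuming a simple record with `term` and `qualifier` attributes. The class, the `search` function, its `include_qualifier` switch, and the second drums record are hypothetical and serve only to illustrate the choice left to implementers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    term: str
    qualifier: Optional[str] = None  # stored apart from the term itself

    def display(self):
        """Show the qualifier in parentheses after the term."""
        return f"{self.term} ({self.qualifier})" if self.qualifier else self.term


def search(terms, query, include_qualifier=False):
    """Match the query against the term, optionally also the qualifier."""
    q = query.lower()
    hits = []
    for t in terms:
        haystack = t.display() if include_qualifier else t.term
        if q in haystack.lower():
            hits.append(t.display())
    return hits


# Illustrative records only; the qualifiers are examples, not verified data.
TERMS = [Term("drums", "walls"), Term("drums", "membranophones"), Term("walls")]
print(search(TERMS, "walls"))
print(search(TERMS, "walls", include_qualifier=True))
```

With the qualifier kept out of the term field, a search for walls does not retrieve drums unless the implementer opts in, yet end users still see the qualifier in every display.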



7.5.3.5.3. Sorting and Alphabetizing Terms
Terms consisting of alphabetic characters may be sorted word-by-word or letter-by-letter. Word-by-word sorting is familiar to users from alphabetized telephone directories. In word-by-word sorting (also called nothing before something filing), a space is significant, so terms that begin with the same word are kept together.

However, a disadvantage of word-by-word filing is that it separates compound words (e.g., bookbinding) from compound terms, which are terms consisting of two or more words (e.g., book jackets). Letter-by-letter sorting alleviates this problem. At its most effective, letter-by-letter sorting is performed on terms that have been normalized so that spaces, punctuation, diacritics, and capitalization are ignored (the normalized terms are stored in a table separate from the exact term strings and are generally not seen by end users). Letter-by-letter sorting is familiar to users from dictionaries. With either method, parenthetical qualifiers should be ignored in sorting; that is, terms with qualifiers should not be sorted in the same way as compound terms.


Below is an example of word-by-word sorting:

book catalogs
book cloth (textile material)
book cupboards
bookbinding
bookcases
bookends

Below is an example of letter-by-letter sorting:

bookbinding
bookcases
book catalogs
book cloth (textile material)
book cupboards
bookends
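
The two orders above can be reproduced with two different sort keys. This is a minimal sketch under simple assumptions: qualifiers are trailing parenthetical phrases, terms are plain Roman-alphabet strings, and the function names are hypothetical. Diacritics are not folded here.

```python
import re

TERMS = [
    "bookends", "book cupboards", "bookcases",
    "book cloth (textile material)", "bookbinding", "book catalogs",
]


def strip_qualifier(term):
    """Drop a trailing parenthetical qualifier so it is ignored in sorting."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", term)


def word_by_word_key(term):
    """Compare whole words in turn, so a space files before any letter."""
    return strip_qualifier(term).lower().split()


def letter_by_letter_key(term):
    """Ignore spaces, punctuation, and capitalization entirely."""
    base = strip_qualifier(term).lower()
    return "".join(ch for ch in base if ch.isalnum())


print(sorted(TERMS, key=word_by_word_key))
print(sorted(TERMS, key=letter_by_letter_key))
```

In practice the normalized keys would be computed once and stored alongside the exact term strings, as described above, rather than recomputed for every display.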


Resources such as the American Library Association Filing Rules, Library of Congress Filing Rules, and British Standard Alphabetical Arrangement and the Filing Order of Numerals and Symbols (BS 1749) contain rules for sorting and filing. However, these standards are not always compatible with each other. Electronic systems may enforce pre-established sorting rules and handling of nonalphabetic characters, while other systems provide options for developers to select the sorting rules.

7.5.3.5.4. Diacritics in Sorting
A typical database requires implementers to identify one—and only one—language for the data; the system applies pre-established sorting algorithms based on that language. However, the vocabularies discussed in this book include terms and names in many languages. Even when limiting the discussion to only the Roman alphabet, different languages have different rules for sorting characters with diacritics.

Since it is impossible to create a sorting rule that recognizes diacritics while still obeying rules of alphabetization for all languages, and since most Web users are accustomed to seeing terms and names sorted by standard ASCII characters without special weighting of diacritics, normalized diacritics should be used for sorting. For example, users expect to see all words beginning with the letter A sorted together in alphabetic displays—not those with accents or umlauts sorted before or after the rest of the As.

Normalization of diacritics by mapping them to ASCII characters in the Roman alphabet is the most practical way to deal with diacritics in sorting. If multiple alphabets are used in Unicode or another encoding scheme, the issues are even more complex. Normalization of diacritics for retrieval and sorting is discussed in Chapter 9: Retrieval Using Controlled Vocabularies.
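
One common way to fold diacritics for a sort key is Unicode decomposition followed by removal of the combining marks. The sketch below uses Python's standard `unicodedata` module; the function names and sample place names are illustrative, and this is only one possible normalization, not necessarily the one used by any given vocabulary system.

```python
import unicodedata


def fold_diacritics(text):
    """Decompose characters, then drop the combining diacritical marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def sort_key(term):
    """Normalized key used for sorting only; the stored term is unchanged."""
    return fold_diacritics(term).lower()


names = ["Zürich", "Ávila", "Aachen", "Örebro", "Oslo"]
print(sorted(names))                # raw code point order
print(sorted(names, key=sort_key))  # diacritics folded for sorting
```

Letters that have no decomposed form, such as ø, pass through unchanged with this method, so a production system would need an additional mapping table for them.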



7.5.3.5.5. Display of Diacritics
The display of diacritics may necessarily differ in systems for creators and for end users of vocabularies. Full diacritics or diacritical codes should display in the system used by creators of vocabularies and indexers. Some Web applications may not be able to display all diacritics, because certain diacritics in certain fonts may not display correctly. For this reason, creators of vocabularies and indexers should avoid such applications.

It may be unavoidable to expose end users to missing diacritics, because they typically do not have access to the native data in the editorial system. If end users are using a Web interface, implementers should make sure that it displays as many of the vocabulary's diacritics as possible. Not every font includes glyphs for every Unicode character, so glyph coverage should guide the choice of font.

For diacritics that cannot display on the Web, one solution for end users is to display the plain ASCII character that is equivalent to the diacritic. The disadvantage of this method is that the end user cannot tell that the word is missing a diacritic, the word without a diacritic is incorrect, and this practice may result in unintentional homographs being displayed in a single record. The alternative solution is to display the terms with whatever internal symbol appears in place of the diacritic, because this at least alerts the user that a diacritic is displaying incorrectly. As Web interfaces become increasingly sophisticated in displaying Unicode, this problem is diminishing over time.

Note that diacritics may appear not only in the term field but also in display dates, notes, and several other fields of the data.

7.5.3.6. Search Results Displays
Results of search queries should display both the terms that met the criteria of the search and an indication of the hierarchy and other context of the terms. Display of results lists is further discussed in Chapter 9: Retrieval Using Controlled Vocabularies.

7.5.3.6.1. Headings or Labels
Headings or labels are used in search results displays and in other displays where a brief listing of the vocabulary record is required. The heading or label is a short display that identifies the vocabulary concept, combining the term or name with additional information. Ideally, the information is recorded in separate fields and concatenated with the name or term for heading displays. In the examples on the opposite page, biographical information is used to disambiguate people with homographic names, while the broader contexts and place types (terms describing the type of place) are used to disambiguate homographic place names.







7.5.3.6.2. Ascending or Descending Order of Parents
Ascending order refers to the display of hierarchical entities in a heading from smallest to largest, familiar to users in the U.S. from mailing addresses. Descending order refers to the display of hierarchical entities in a heading from largest to smallest. This display may be familiar to users from back-of-book indexes.

For the horizontal displays of hierarchical information in headings or labels, it is most user-friendly to display the parents in ascending order (e.g., Black Forest (Paulding county, Georgia, United States)), because this is how most users are accustomed to referring to such broader contexts in speech and print.

However, listing the parent string in descending order is useful in results lists or other displays that require meaningful sorting among homographs, because the homographs can be sorted alphabetically by parent string. In the example below, the records for Springfield in Africa and Europe sort alphabetically above the records in North America, with records in Canada above the records in the United States; among the subset of records in the United States, sorting is by state, then by county.
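
A minimal sketch of sorting homographs by a descending parent string follows. The sample records are illustrative only and are not taken from the TGN; the tuple layout (name, parents in ascending order) and the function names are assumptions.

```python
# Illustrative records: (name, parents listed from smallest to largest).
RECORDS = [
    ("Springfield", ["Illinois", "United States", "North America"]),
    ("Springfield", ["Ontario", "Canada", "North America"]),
    ("Springfield", ["Scotland", "United Kingdom", "Europe"]),
    ("Springfield", ["Massachusetts", "United States", "North America"]),
    ("Springfield", ["KwaZulu-Natal", "South Africa", "Africa"]),
]


def descending_label(name, parents):
    """Display the parents from largest to smallest."""
    return f"{name} ({', '.join(reversed(parents))})"


def sort_key(record):
    """Sort by name, then by each parent from the top of the hierarchy down."""
    name, parents = record
    return (name.lower(), [p.lower() for p in reversed(parents)])


for name, parents in sorted(RECORDS, key=sort_key):
    print(descending_label(name, parents))
```

The same stored parent list can still be joined in ascending order for headings, so the choice of order remains a display decision rather than a data decision.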



7.5.3.6.3. Displaying the User's Search Term
The results list should clearly demonstrate to the user why the results were returned. The user's search string may not necessarily match the preferred term; regardless, the term that made the match should be included in the results. It is recommended that the preferred term, the terms that matched the query, and other information (such as parent strings) be displayed to provide context.
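
As a minimal sketch of such a results list, the Python below returns the preferred term, the variant terms that actually matched, and a context string for each hit. The record layout, field names, and simple substring matching are assumptions made for illustration; the sample terms are those of the aerial photographs record shown earlier.

```python
# Hypothetical record layout built from the aerial photographs example.
RECORDS = [
    {
        "preferred": "aerial photographs",
        "variants": ["aerial photograph", "air photographs", "air photos",
                     "photographs, aerial"],
        "context": "<photographs by picture-taking technique>",
    },
]


def search(records, query):
    """Return, for each hit, the preferred term plus the terms that matched."""
    q = query.lower()
    results = []
    for rec in records:
        names = [rec["preferred"]] + rec["variants"]
        matched = [name for name in names if q in name.lower()]
        if matched:
            results.append({
                "preferred": rec["preferred"],
                "matched": matched,
                "context": rec["context"],
            })
    return results


for hit in search(RECORDS, "air photo"):
    print(hit["preferred"], "| matched:", ", ".join(hit["matched"]), "|", hit["context"])
```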





7.5.3.7. Pick Lists
Some electronic implementations of controlled vocabularies use pick lists as a way to lead users to a small set of choices of terms for a given field. These are often implemented as drop-down lists. When the user comes to a particular controlled field, the full list of choices for that field is displayed for users to select from when indexing or constructing a query. Typically, pick lists do not include synonyms, although they could be tied to larger vocabularies that include synonyms and other information for the concepts.