Keywords used more than 25 times in articles published in journals on Eating Disorders indexed in MEDLINE and their equivalence with MeSH.
One of the most important authors in the indexing field, Jacques Chaumier, defined indexing as both a means and an end. As a means, indexing is the description and characterization of a document's contents and of the concepts it contains; as an end, its ultimate purpose is to enable the information stored in the system to be recovered. In other words, like many other authors, Chaumier considers indexing to be the prerequisite for the adequate recovery of information (Chaumier, 1986, as cited in Rodríguez Perojo et al., 2006).
The process of searching for information must consist of a series of ordered steps that have to be followed when searching for the answer to a question, especially in the literature. However, a command of the vocabulary used is one of the determinant factors for success when searching for information, in terms of both describing and recovering articles of interest.
Based on the idea that information is the essential ingredient of knowledge, the bibliographical search is one of the essential parts of all thorough research work. A study is not only documented by its bibliography, but the bibliography is often also its firmest foundation and the best guarantee of its relevance. Knowledge of the existing reference works and their contents is the first requirement for solving any problem of information that arises in any professional activity. However, in order to make a truly effective use of them, it is necessary to be aware of the logical procedures that lead to satisfactory results.
This need has contributed to the rapid development of Information Recovery as an increasingly complex technique requiring knowledge of indexing languages. It is related to Documentation Sciences and Computing, and covers a clearly defined subject area (in this case Eating Disorders as part of Health Sciences) which includes procedures for the selection of documents, techniques for their dissemination and description and the various ways in which their files can be accessed.
Any researcher with a superficial knowledge of information recovery systems can undertake a bibliographical search on the Internet using their computer and obtain results that are more than sufficient in terms of the number of references. Whether the contents of these results are what the researcher was really looking for, or are as exhaustive as they should be, is another matter (Sanz-Valero & Castiel, 2010).
In order to be able to recover relevant information it is therefore vital to understand the formal description of the documents (their indexing). This activity, which until a few years ago affected a group of texts that were easy to identify by type due to the fact that they were in similar formats, and were generally on paper, has been affected by the development of information and communication technologies, which has forced researchers to create reference systems for documents that are exchanged using data networks (Laguens García, 2006). Because of their volume, accessibility, quality, variety and even cost, these are now the most important information resource in the health sciences.
1.1. Computerized bibliographical databases
The computerization of documentary archives in the Health Sciences began in 1964 at the U.S. National Library of Medicine, with the development of a computerized search system called MEDLARS (Medical Literature Analysis and Retrieval System), which was designed to facilitate users' consultations of the Index Medicus. This was the beginning of the computerization of bibliographical indexes, which led to the creation of the modern health sciences databases available on the Internet, with the consequent advantages: more speed, more thoroughness, greater precision and, above all, constant and easy updating. The online availability of MEDLARS led to the creation of the well-known MEDLINE database (Sanz-Valero & Castiel, 2010).
Fortunately, today the health sciences have several databases which can deal with most conceivable enquiries. These databases have extensive coverage and powerful and sophisticated recovery systems.
As we are dealing with scientific language, the use of natural language can lead to ambiguous or unreliable results in terms of precision and exhaustiveness when databases are consulted. Knowledge Organization Systems (KOS) have been used to deal with these problems. A KOS is a semantic resource that represents the terminology and the relations between the concepts in a domain. These systems include ontologies, taxonomies, and thesauri. In practice, Knowledge Organization Systems may be used to improve the intelligibility of scientific-technical documents and to optimize the storage of information and its subsequent recovery (Sánchez-Cuadrado, 2007). They can partially solve problems of natural language arising from polysemy and synonymy, and they also reduce the complications arising from the frequent use of acronyms and abbreviations of names.
As a consequence, health sciences databases operate with a controlled, structured and hierarchical language, called a thesaurus, which is used for indexing documents. Its aim is to express each idea with a term that identifies the concept unambiguously and as precisely as possible within a specific subject, and to use this term both to store and to recover information. The thesaurus is defined as:
"The vocabulary of a controlled indexing language, formally organized so that the a priori relationships between concepts are made explicit".
In other words, it is an instrument enabling the systematization and recovery of information based on concepts which have the same meaning for the participants in the process.
The Thesaurus of the U.S. National Library of Medicine is known as MeSH (Medical Subject Headings). It has a hierarchical, tree-like structure consisting of 16 broad categories (Topics), which cover all the MeSH included in it. It is constantly renewed and updated annually, and a print copy is also published in January every year. In the psychology field, the American Psychological Association has developed a specific Thesaurus, the Thesaurus of Psychological Index Terms, which is the basic tool for accessing the PsycINFO database. The objective of both tools is to facilitate the development of information recovery systems, which behave as if they "understand" the meaning of the language of health sciences.
For example, a search for information on Dysphoria, Melancholy or Neurotic Depression can be undertaken by searching using the term "Depressive Disorder".
Likewise, if all the information in the bibliographical databases on Anorexia Nervosa, Binge-Eating Disorder, Bulimia Nervosa, Coprophagia, Female Athlete Triad Syndrome and Pica is required, using the term "Eating Disorders" is sufficient.
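The two examples above can be illustrated with a minimal sketch. It is not the way MEDLINE is implemented: the two small dictionaries are hand-written stand-ins for a thesaurus, built only from the terms named in the text, and the function names `to_descriptor` and `explode` are hypothetical.

```python
# Toy thesaurus built only from the terms named in the text; a real
# thesaurus such as MeSH is far larger and is not reproduced here.

# Natural-language entry terms mapped to the controlled descriptor.
ENTRY_TERMS = {
    "dysphoria": "Depressive Disorder",
    "melancholy": "Depressive Disorder",
    "neurotic depression": "Depressive Disorder",
}

# Broader descriptor mapped to its narrower descriptors.
NARROWER = {
    "Eating Disorders": [
        "Anorexia Nervosa",
        "Binge-Eating Disorder",
        "Bulimia Nervosa",
        "Coprophagia",
        "Female Athlete Triad Syndrome",
        "Pica",
    ],
}


def to_descriptor(term):
    """Return the controlled descriptor for a free-text term, if known."""
    return ENTRY_TERMS.get(term.strip().lower(), term)


def explode(descriptor):
    """Return the descriptor plus every narrower descriptor below it."""
    result = [descriptor]
    for child in NARROWER.get(descriptor, []):
        result.extend(explode(child))
    return result
```

Under these assumptions, `to_descriptor("Melancholy")` resolves to "Depressive Disorder", and `explode("Eating Disorders")` returns the heading itself followed by its six narrower headings, which is why a single controlled term is sufficient for the search.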
1.2. Keywords versus medical subject headings
Health sciences literature presents characteristics that make information management a complex recovery process. These difficulties are reflected in two aspects. First, there is an enormous volume of information that is constantly increasing, and an urgent need to locate the relevant responses. Second, its terminology is constantly being modified, generally as a result of new research (Morato et al., 2008).
Language is used in an unusual way in science and technology. When professionals refer to things that require a number of concepts in everyday language, they normally use a short expression with a high level of expressive effectiveness, which also has three major characteristics:
Univocity. Due to the use made of them in specialized research, the terms and propositions of scientific and technological language refer to only one specific concept, while those of everyday language are very often ambiguous and connotative.
Universality. The scientific and technological register tends to be universal, like the items to which it refers. As the situation referred to using the lexical units that comprise it in different languages is the same, their translation between languages is not usually problematic.
Verifiability. The fact that the truth of the data provided by scientific and technological language can be proven is in the final analysis the basis for our experience of reality. Words become substitutes for things. Words and the objects match each other. The features that describe scientific and technological terms belong to the real objects.
As a consequence, when writing a scientific text, which is the ultimate goal of all research work, using the correct Keywords is as important as working according to the scientific method. Their significance should not be underestimated, as incorrect use can hinder the dissemination of the document and even lead to it being completely forgotten due to problems of identification. In order to avoid this situation, the MeSH of the U.S. National Library of Medicine Thesaurus should be used as Keywords (De Granda Orive, 2005).
When we talk about Keywords in the health sciences, we are necessarily referring to a technique to help and guide the search for information, which is deemed to be a necessary step in the acquisition of knowledge to expand on or refine the information already possessed on a specific subject. Skill in discarding irrelevant information when searching for better evidence is an essential ability that has recently emerged as a result of the immense amount of information that is continually available to health sciences professionals. Indeed, effectiveness when searching for information is expressed using the same criteria as those used in a diagnostic test: in terms of sensitivity and specificity (Calvache & Delgado, 2006).
Keywords and MeSH are not exact synonyms, as while the former are words taken from natural language, the latter are univocal terms, which are hierarchically controlled and structured, belong to a thesaurus, and are organized formally in order to make the relationships between concepts explicit. Descriptors could be said to define concepts, rather than words, as they give an idea of the contents of the text they represent. For example, “Parenteral Nutrition, Total” is a concept consisting of more than one word which also delimits a subject area of knowledge.
The concurrence of Keywords with the MeSH is essential for the appropriate indexing of a scientific article when it is archived in bibliographical databases. However, it assumes a much greater importance in the recovery of documents.
MeSH are not only useful for carrying out bibliographical searches, but are also used to analyse studies by knowledge areas and they provide undeniable opportunities for an in-depth study of the subject that is impossible when only using the title or abstract of the paper (Sanz- & Red-Alonso; Tomás-Castera et al., 2009).
Some studies stress the importance of the appropriate use of MeSH in comparison with free text, highlighting greater sensitivity among the results obtained in bibliographical searches when they are used (Jenuwine & Floyd, 2004).
Knowledge of how to use MeSH correctly means that the results obtained have a high level of sensitivity (which in epidemiological terms would be considered true positives), preventing silences (articles related to the subject but not recovered) and minimizing noise (articles recovered that are not related to the search). However, in order to deal successfully with bibliographical databases in the health sciences area, the researcher must be aware of the four conditions for effective bibliographical searches: knowledge of the research question (the theoretical framework), correct use of the indexing terms (MeSH), an appropriate search strategy (or several combined strategies) and an appropriate assessment of the results. Finally, undertaking a systematic search helps this process to be as efficient as it is effective.
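The notions of sensitivity, silence and noise can be made concrete with a short sketch. It assumes that the set of articles truly related to the subject is known, which in practice is rarely the case; the function name `search_quality` and the article identifiers used with it are hypothetical.

```python
def search_quality(retrieved, relevant):
    """Summarize a search result against the set of truly relevant articles."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = retrieved & relevant
    silence = relevant - retrieved   # related articles that were not recovered
    noise = retrieved - relevant     # recovered articles unrelated to the search
    # Sensitivity: share of the relevant articles that the search recovered.
    sensitivity = len(true_positives) / len(relevant) if relevant else 0.0
    # Precision: share of the recovered articles that are relevant.
    precision = len(true_positives) / len(retrieved) if retrieved else 0.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "silence": silence,
        "noise": noise,
    }
```

For instance, if a search recovers articles 1 to 4 while the relevant ones are 2, 3 and 5, article 5 is silence, articles 1 and 4 are noise, and the sensitivity is two thirds.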
In view of the above, the objective of this study was to ascertain and analyse the Keywords used in articles published in journals on Eating Disorders indexed in the MEDLINE database and determine their relationship with the MeSH.
2. Material and methods
This was an observational, descriptive, cross-sectional study based on a bibliometric analysis of the Keywords used in articles published in the following journals on Eating Disorders: Eating and Weight Disorders, Eating Behaviors, the European Eating Disorders Review and the International Journal of Eating Disorders. All are indexed in the MEDLINE database. The journal Eating Disorders was not studied as its articles do not have Keywords.
2.1. Sources of data
The data included in this study were obtained by direct search of, and online access to, the articles published in the journals mentioned above:
Eating and Weight Disorders
Eating Behaviors
European Eating Disorders Review
International Journal of Eating Disorders
As criteria for inclusion, we decided that the articles had to be original and contain Keywords, and have been indexed in the MEDLINE database in the last 5 years (2006 to 2010).
A manual review of the Keywords in the studies published was carried out, and their relationship with MeSH was subsequently checked using the same database.
2.2. Variables studied
Number of Keywords (Kw).
Most commonly used MeSH.
Kw coinciding with the main MeSH (Major Topic).
Correctness of the Kw used in the years studied.
Frequency and percentage of articles containing all Kw matching MeSH.
Presence of the Major Topic in the title of the article.
Correlation between Kw and MeSH.
Differences between the journals studied in terms of their Kw.
Delimitation of the knowledge area according to MeSH.
The indexing of the articles according to the Kw used.
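Several of the variables listed above reduce to comparing, article by article, the author Keywords with the MeSH and Major Topics assigned in MEDLINE. The review in this study was manual; the following is only a sketch of how such a comparison could be automated, assuming exact matching after ignoring case and spacing. The function names and the sample record are hypothetical.

```python
def normalize(term):
    """Lowercase and collapse whitespace so comparison ignores case and spacing."""
    return " ".join(term.lower().split())


def match_keywords(keywords, mesh_terms, major_topics):
    """Count how many author keywords coincide with MeSH and with Major Topics."""
    mesh = {normalize(t) for t in mesh_terms}
    major = {normalize(t) for t in major_topics}
    kw = [normalize(k) for k in keywords]
    n_mesh = sum(k in mesh for k in kw)
    n_major = sum(k in major for k in kw)
    return {
        "n_keywords": len(kw),
        "kw_matching_mesh": n_mesh,
        "kw_matching_major": n_major,
        # True only when every keyword of the article is also a MeSH term.
        "all_match_mesh": bool(kw) and n_mesh == len(kw),
        "all_match_major": bool(kw) and n_major == len(kw),
    }


# Hypothetical article record, for illustration only.
record = match_keywords(
    keywords=["Eating disorders", "body image", "AAI"],
    mesh_terms=["Eating Disorders", "Body Image", "Humans", "Female"],
    major_topics=["Eating Disorders"],
)
```

Exact matching is a deliberately strict choice: a Keyword that differs from the descriptor only in number or spelling is counted as not matching.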
2.3. Analysis of data
This is a descriptive study based on the calculation of the frequencies and percentages of the variables studied, with the most relevant data shown using tables and graphs. The quantitative variables were described using the Mean and Standard Deviation and the qualitative variables with their absolute value and percentage. The Median was used as the measure of central tendency. The existence of a linear trend between qualitative variables was analyzed using a Chi-square test. An analysis of variance (ANOVA) with Tukey correction for multiple tests was used to compare the means of a quantitative variable between more than 2 groups. The Pearson correlation coefficient was used to ascertain the linear relationship between two quantitative variables. The accepted level of significance was α ≤ 0.05 (Confidence interval of 95%).
The Statistical Package for the Social Sciences (SPSS) (version 15 for Windows) was used to enter and analyse the data. The quality control of the information was carried out using double tables and the errors were corrected by consulting the originals.
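The analysis itself was carried out in SPSS. As an illustration only, the descriptive measures and the Pearson coefficient mentioned above can be sketched with the Python standard library. The interval below uses the normal approximation (mean ± 1.96 × standard error), which is an assumption of this sketch, and the function names `describe` and `pearson_r` are hypothetical. The Chi-square, ANOVA and Tukey procedures are not covered.

```python
import math
import statistics


def describe(values):
    """Mean, median, dispersion and normal-approximation 95% CI of a sample."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)    # sample standard deviation
    se = sd / math.sqrt(n)           # standard error of the mean
    half_width = 1.96 * se           # normal approximation (assumption)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "se": se,
        "ci95": (mean - half_width, mean + half_width),
    }


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

Applied to the number of Keywords per article, `describe` would give the kind of summary reported in the results, and `pearson_r` applied to the per-article counts of Keywords and of matching MeSH would give the association between the two variables.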
3. Results
This study involved the analysis of a total of 918 original articles from 4 journals selected from among those indexed in the MEDLINE database:
360 (39.22%) articles were from The International Journal of Eating Disorders (IJED)
219 (23.86%) from Eating Behaviors (EB)
174 (18.95%) from Eating and Weight Disorders (EWD)
165 (17.97%) from the European Eating Disorders Review (EEDR).
3.1. Keywords, medical subject headings or major topics in the indexing of articles
A total of 4,316 Keywords (Kw) were found in these articles, which presented the following statistical data: Maximum 10 and Minimum 2 Kw, Median and Mode equal to 5 Kw, Mean of 4.70 ± 0.04 (95%CI 4.62-4.79).
These articles were indexed in the MEDLINE database using a total of 13,278 MeSH, and presented the following statistics: Maximum 26 and Minimum 3.87 MeSH, Median and Mode equal to 14 MeSH, Mean of 14.46 ± 0.12 (95%CI 14.23-14.70).
A total of 3,549 Major Topics were observed among the MeSH used in indexing the articles studied (MeSH designating the main subjects in the article). The statistics for the articles as a whole were: Maximum 9 and Minimum 1 Major, Median and Mode equal to 4 Majors, Mean of 3.87 ± 0.05 (95%CI 3.77-3.96).
Of the 918 articles that contained Kw, only 8 (0.87%) presented a total correspondence between the Kw and MeSH, in line with the low level of association observed between these 2 variables (Pearson R = 0.12, p < 0.001).
Likewise, 3 articles (0.33%) presented a complete match between Kw and Major Topics, with practically no association observed between the 2 variables analyzed (Pearson R = 0.09, p = 0.01).
3.2. Keywords used in the articles
1,868 different Kw were found in the articles studied, and 300 of these (16.06%) matched MeSH. The most frequently used Kw was Eating Disorders, on 297 occasions (6.59%); the 17 Kw used more than 25 times, 8 of which did not match MeSH, are shown in table 1:
|Keyword||Frequency||%||Matches MeSH|
|binge eating disorder||26||0.60||no|
No positive trend was observed in the increase of Kw matching MeSH, and no matching of Kw with Major Topics was observed (see Table 2). A comparison of the means of the variable Kw matching MeSH, by analyzing the variance with Tukey's correction presented no significance when compared by year. No statistical significance was obtained when comparing the Kw matching Major Topics by year.
|1. Total Kw1||739||1044||1009||794||740|
|4. Quotient 1:2||4.13||3.91||4.39||4.27||3.70|
|5. Quotient 1:3||5.43||5.02||5.94||5.40||5.14|
|1 Total Keywords; 2 Total Keywords matching MeSH; 3 Total Keywords matching Major Topics; 4 Percentage of articles with all Keywords the same as MeSH|
3.3. Keywords in the context of journals on eating disorders
When the data were segmented by journal, EEDR presented the best results: all the Kw were found to match MeSH in 3 (1.82%) of the 165 articles reviewed in this journal.
The figures for articles in which all Kw matched Major Topics were: 1 (0.38%) in the journal EB, 1 (0.61%) in EEDR and 1 (0.57%) in the journal EWD. No article in the journal IJED had all its Kw matching Major Topics.
The distribution of the Kw and their correctness with regard to MeSH is shown in table 3 for each of the journals analyzed.
|1 Total articles; 2 Total Keywords; 3 Total Keywords matching MeSH; 4 Total Keywords matching Major Topics.|
The comparison between the means (ANOVA and the Tukey post hoc test) for the journals according to the number of Kw matching MeSH showed no significant differences at a level of 0.05 (see table 4).
|Journal||Mean||95%CI|
|IJED||1.13 ± 0.05||1.03-1.22|
|EB||1.12 ± 0.07||1.00-1.25|
|EWD||1.19 ± 0.07||1.05-1.33|
|EEDR||1.22 ± 0.08||1.08-1.37|
The comparison between the means (ANOVA and the Tukey post hoc test) for the journals according to the number of Kw matching Major Topics showed significant differences at a level of 0.05, between the journals European Eating Disorders Review and Eating Behaviors, with no significance observed for the other journals (see tables 5 and 6).
|Journal||Mean||95%CI|
|IJED||0.88 ± 0.04||0.80-0.97|
|EB||0.76 ± 0.05||0.67-0.86|
|EWD||0.89 ± 0.06||0.77-1.01|
|EEDR||0.99 ± 0.06||0.88-1.11|
Boxplots can be used to give a graphical picture of the values of the Kw matching the MeSH and/or Major Topics. These graphs are based on quartiles and present the data in their entirety. Figure 1 shows the values for Kw matching MeSH and figure 2 shows the values for Kw matching Major Topics.
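Since a boxplot is drawn from the quartiles of a variable, the values behind such a figure can be sketched as a five-number summary. The quartile convention used here (the "inclusive" method of the Python standard library) is an assumption, as statistical packages differ slightly in how they interpolate quartiles, and the function name is hypothetical.

```python
import statistics


def five_number_summary(values):
    """Minimum, quartiles and maximum: the values a boxplot is drawn from."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(values),
    }
```

Given the number of matching Kw in each article of a journal, the summary gives the box (first to third quartile), the line inside it (the median) and the range within which the whiskers are drawn.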
3.4. Use of abbreviations as keywords
The use of abbreviations as Keywords was checked by analyzing the Keywords used to facilitate the indexing of articles. 80 (8.71%) of the studies presented a total of 98 abbreviations or acronyms: 65 (7.08%) articles contained 1, 12 (1.31%) contained 2 and 3 (0.33%) contained 3.
3.5. Presence of the major topic in the title of the article
Of the 918 articles studied, 807 (87.91%) presented at least one Major Topic in the title of the paper. The statistics obtained from this variable were Maximum 5 and Minimum 0, Median and Mode equal to 1, Mean of 1.52 ± 0.03 (95%CI 1.46-1.58).
3.6. The knowledge area represented in the keywords used
A study of the hierarchical structure of the Thesaurus of the U.S. National Library of Medicine shows how studies related to Eating Disorders are indexed (see figure 3).
As a consequence, we calculated the occasions on which one of these MeSH had been used correctly as a Keyword, and the results are shown in table 7.
|MeSH||Frequency||%|
|Female Athlete Triad Syndrome||0||0.00|
4. Discussion
The most striking and interesting result of this study is the fact that only a minimal proportion of Keywords are used correctly. This is confirmed by the low level of association found between Keywords and MeSH, and also observed in the relationship with Major Topics. Equally of interest is the fact that half of the most frequently used Kw do not match MeSH, which is startling considering that the articles are to be indexed in the MEDLINE database.
Likewise, there is no apparent trend with the passing of the years; publishers now emphasize that Keywords included in articles should match MeSH, but nonetheless, no improvement in recent years has been observed.
Many studies stress the importance of the correct use of MeSH in comparison with free text when recovering scientific literature (Golder et al., 2006; Chang et al., 2006). The suitability of search equations (themed filters or documentary languages) is highlighted by using Descriptors to recover specific articles or a specific type of document with a high degree of sensitivity (Haynes et al., 2005). In the end, the implicit philosophy of search equations is the selection of evidence while considering major criteria such as validity, both internal (the level at which the study was designed and carried out, and the analysis which enables unbiased results to be obtained) and external (the consistency of results with other studies and other available knowledge) (Cabello et al., 2006). A sound methodological knowledge of search tools and strategies is necessary in order to achieve this.
In the world of scientific documentation, Keywords (subject headings) are the best tool for classifying information and one of the areas where most care is taken in the publication of any article in an internationally indexed journal. These Keywords have the following functions:
To give a brief idea of the contents of the article.
To show the reader the terms under which further information on the subject covered in the article can be sought.
To carry out indexing, analysis and classification of the article in the international databases.
Today, when the search for information begins and ends in general search engines, the selection and suitability of Keywords is of vital importance in optimizing information recovery.
Furthermore, as an information recovery system, the objective of PubMed is to provide effective access to documents in the MEDLINE database. To that end, the Keywords provided by the authors must match the MeSH assigned by the indexers when the article is classified in this database. In this respect, some studies show that in some areas of biomedicine, 60% of Keywords are closely related to MeSH (Névéol et al., 2010). The title and Keywords included in a study should facilitate access to the text by any reader, and as such it is worthwhile spending time on creating them correctly (Kremenak, 2009).
The evolution of scientific vocabulary towards Descriptors as a result of their importance in indexing studies in databases is ultimately measured by the frequency with which these ontologies are used (concepts consisting of one or several words, but with a univocal definition). Nonetheless, some studies emphasize the lack of importance placed on choosing appropriate Keywords, and that the likelihood of selection is simply proportional to the topicality of the subject at the time the choice is made (Shennan, 2008; Bentley, 2008).
Another very common error which was also highlighted in this study is the use of plural forms as Keywords, such as adults or children, when the corresponding MeSH are both in the singular form. However, the opposite also occurs, i.e. the singular form is used as a Keyword when the MeSH is a plural, e.g. Humans. This should be taken into account when selecting Keywords as it can lead to confusion among those who are not experts in the subject (Wagner, 2006).
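This kind of error lends itself to a simple automatic check. The sketch below flags a Keyword that differs from a MeSH term only in grammatical number. It relies on a deliberately naive rule for English plurals (a final "s", plus the irregular "children"), which is an assumption of this example and would need extending for real use; the function names and the short list of MeSH are illustrative.

```python
def plural_variants(term):
    """Naive English singular/plural variants of a term (suffix rules only)."""
    t = term.strip().lower()
    variants = {t}
    if t.endswith("s"):
        variants.add(t[:-1])     # humans -> human
    else:
        variants.add(t + "s")    # adult -> adults
    if t.endswith("ren"):
        variants.add(t[:-3])     # children -> child
    return variants


def near_miss(keyword, mesh_terms):
    """Return the MeSH term that differs from the keyword only in number, if any."""
    kw = keyword.strip().lower()
    for mesh in mesh_terms:
        m = mesh.lower()
        # An exact match is correct use, so only non-identical variants are flagged.
        if m != kw and m in plural_variants(kw):
            return mesh
    return None
```

With the illustrative list `["Adult", "Child", "Humans"]`, the Keywords "adults", "children" and "human" would each be flagged together with the descriptor that should have been used, whereas "Humans" would not be flagged because it already matches.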
The language of the health sciences is well known for its extensive use of abbreviations and acronyms, which are generally accepted and understood by a minority of researchers in a specific area of knowledge; but they are unknown to other possible readers, despite their possible academic background (De Granda Orive, 2003), and some studies focus on their invention by authors (Cheng, 2004; Das-Purkayastha, 2004) or advocate their definition (PLEASE—Plea to Let Each Acronym, or Abbreviation, be Spelled out Every time) (Cheng, 1995). One of the many abbreviations we found - AAI (Adult Attachment Interview) - could act as an example. It is obviously not a MeSH, and if a search is carried out using Google, it could mean (among many other possibilities): “American Association of Immunologists”, “Airports Authority of India”, “Athletic Association of Ireland”, etc. However, in Spanish its main meaning is “Autorización Ambiental Integrada,” which is the administrative procedure for granting a permit for comprehensive protection of the environment.
Taking into account the data obtained and the discussion they provoke, failure to facilitate the recovery of documents to the greatest extent possible in the era of communication and information means condemning them to oblivion (Tomás-Casterá, 2009).
In order to understand the modern concept of visibility, we must first understand the ways in which the development of the media has transformed interaction in the world of scientific publication.
To an outside observer, it is strange that those involved should analyse the reasons behind attitudes that should be inherent in research and communication. The complexities of language could lead to different conclusions on the meaning of a text. There is usually a long and intricate process between the author's thought processes, the publisher and the words that appear on the page before the reader. This makes it all the wiser to use all the means at our disposal to reach the goal of the uniformity of scientific language (Sanz-Valero, 2006).
The development of the information society is undeniable. We are witnessing a series of technological, organizational, economic, social and institutional changes that are altering the relations of production and consumption, working habits, lifestyles and quality of life and the relations between the various public and private actors in our society. This new paradigm is based around handling data; finding the best information to make the best decision. Stored information is no longer an end product, but is instead a raw material which must be subjected to a process of transformation, in order to extract knowledge that can contribute to understanding a situation, and to strategic decision-making in a specific area of activity. The data-information-knowledge-decision sequence fosters and encourages an excess of publications. In the era of communication and information, the increase in health sciences publications is no longer excellent news, and has instead become a terrible nightmare. The MEDLINE database alone already contains more than 20 million references on biomedical documentation.
Technological training and literacy of individuals and groups is a necessary condition for the advancement and development of the so-called knowledge society. Living in this society requires attitudes, knowledge, competence and skill in using its techniques in order to be able to benefit from them. As a consequence, while the creation of knowledge has become the main source of wealth and welfare, access to the sources of information they create should be a basic right in modern society. Knowledge as the result of handling information is a basic tool for dealing with modern life - knowledge to evaluate, knowledge to make decisions, and knowledge to take actions. Knowledge is the “Golden Key” which opens large and small doors, providing access and inclusion in the world of technology. The key is obtained through training, judgment, culture and knowledge (Sanz-Valero, 2010).
Another key opening the door to scientific literature could perhaps be the correct use of indexing language, which would at least facilitate access to and recovery of the necessary document.
Incorrect use of Medical Subject Headings (MeSH), failure to use Keywords that represent MeSH in the knowledge area, and the lack of at least one Major Topic in the title of the articles are factors that highlight the great difficulty detected in locating specialized information in the databases containing scientific output on Eating Disorders, and that lead to the invisibility of articles when general search engines are used.
Incorrect use of Keywords makes proper indexing difficult, and therefore inhibits the relevance and sensitivity of the bibliographical search, seriously affecting the visibility of these articles, as well as their correct classification by subject.
It is possible that the results found are due to a lack of information on the importance of the MeSH in the storage and recovery of scientific documentation from bibliographical databases, or perhaps the twofold nature of the Thesauri applicable to this knowledge area; the Medical Subject Headings of the U.S. National Library of Medicine, and the Psychological Index Terms of the American Psychological Association. Further studies are required to ascertain whether this is correct.
However, the importance of using Descriptors as Keywords in order to facilitate efficient access to this scientific literature must in any event be stressed.
- The United States National Library of Medicine (NLM), operated by the United States federal government, is the world's largest medical library. The NLM is a division of the National Institutes of Health. Its collections include more than seven million books, journals, technical reports, manuscripts, microfilms, photographs, and images on medicine and related sciences including some of the world's oldest and rarest works.
- MEDLARS (Medical Literature Analysis and Retrieval System) is a computerised biomedical bibliographic retrieval system. It was launched by the National Library of Medicine in 1964 and was the first large scale, computer based, retrospective search service available to the general public. In 1971 an online version called MEDLINE ("MEDLARS Online") became available.
- Index Medicus is a comprehensive index of medical scientific journal articles, published since 1879. It was initiated by John Shaw Billings, head of the Library of the Surgeon General's Office, United States Army. This library later evolved into the United States National Library of Medicine (NLM), which continues publication of the Index.
- MEDLINE (Medical Literature Analysis and Retrieval System Online) is a bibliographic database of life sciences and biomedical information. It includes bibliographic information for articles from academic journals covering medicine, nursing, pharmacy, dentistry, veterinary medicine, and health care. MEDLINE also covers much of the literature in biology and biochemistry, as well as fields such as molecular evolution.
- KOS is a family of formal languages designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary
- International Organization of Standardization: ISO 2788:1986, Documentation - Guidelines for the establishment and development of monolingual thesauri
- Homepage of the U.S. National Library of Medicine Thesaurus: http://www.ncbi.nlm.nih.gov/mesh