A report from the International Mathematical Union (IMU) in cooperation with the International Council of Industrial and Applied Mathematics (ICIAM) and the Institute of Mathematical Statistics (IMS)
This is a report about the use and misuse of citation data in the assessment of scientific research. The idea that research assessment must be done using "simple and objective" methods is increasingly prevalent today. The "simple and objective" methods are broadly interpreted as bibliometrics, that is, citation data and the statistics derived from them. There is a belief that citation statistics are inherently more accurate because they substitute simple numbers for complex judgments, and hence overcome the possible subjectivity of peer review. But this belief is unfounded.
- Relying on statistics is not more accurate when the statistics are improperly used. Indeed, statistics can mislead when they are misapplied or misunderstood. Much of modern bibliometrics seems to rely on experience and intuition about the interpretation and validity of citation statistics.
- While numbers appear to be "objective", their objectivity can be illusory. The meaning of a citation can be even more subjective than peer review. Because this subjectivity is less obvious for citations, those who use citation data are less likely to understand their limitations.
- The sole reliance on citation data provides at best an incomplete and often shallow understanding of research—an understanding that is valid only when reinforced by other judgments. Numbers are not inherently superior to sound judgments.
Using citation data to assess research ultimately means using citation‐based statistics to rank things—journals, papers, people, programs, and disciplines. The statistical tools used to rank these things are often misunderstood and misused.
- For journals, the impact factor is most often used for ranking. This is a simple average derived from the distribution of citations for a collection of articles in the journal. The average captures only a small amount of information about that distribution, and it is a rather crude statistic. In addition, there are many confounding factors when judging journals by citations, and any comparison of journals requires caution when using impact factors. Using the impact factor alone to judge a journal is like using weight alone to judge a person's health.
- For papers, instead of relying on the actual count of citations to compare individual papers, people frequently substitute the impact factor of the journals in which the papers appear. They believe that higher impact factors must mean higher citation counts. But this is often not the case! This is a pervasive misuse of statistics that needs to be challenged whenever and wherever it occurs.
- For individual scientists, complete citation records can be difficult to compare. As a consequence, there have been attempts to find simple statistics that capture the full complexity of a scientist's citation record with a single number. The most notable of these is the h‐index, which seems to be gaining in popularity. But even a casual inspection of the h‐index and its variants shows that these are naïve attempts to understand complicated citation records. While they capture a small amount of information about the distribution of a scientist's citations, they lose crucial information that is essential for the assessment of research.
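The loss of information described in the points above can be made concrete with a small sketch. The citation counts below are hypothetical, invented purely for illustration, and the plain mean stands in for an impact-factor-style average (the real impact factor is computed over a specific citation window, which this sketch ignores). The helper `h_index` follows the usual definition of the h-index: the largest h such that at least h papers have at least h citations each.

```python
from statistics import mean, median


def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts for ten articles in one journal.
# A single heavily cited article dominates the average.
journal_articles = [0, 0, 0, 1, 1, 2, 2, 3, 5, 86]

journal_mean = mean(journal_articles)      # 10: the "average" article
journal_median = median(journal_articles)  # 1.5: the typical article

# Two hypothetical scientists with very different citation records.
scientist_a = [10] * 10
scientist_b = [500, 300, 120, 90, 60, 40, 25, 15, 12, 10]

h_a = h_index(scientist_a)  # 10
h_b = h_index(scientist_b)  # 10, despite far more citations overall

print(journal_mean, journal_median)
print(h_a, h_b, sum(scientist_a), sum(scientist_b))
```

In this invented example the journal average of 10 citations says little about a typical article, most of which have two citations or fewer, and the two scientists share an h-index of 10 although one record totals 100 citations and the other 1172. The sketch illustrates only how a single number hides the shape of a distribution; it makes no claim about which record reflects better research.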
The validity of statistics such as the impact factor and h‐index is neither well understood nor well studied. The connection of these statistics with research quality is sometimes established on the basis of "experience." The justification for relying on them is that they are "readily available." The few studies of these statistics that were done focused narrowly on showing a correlation with some other measure of quality rather than on determining how one can best derive useful information from citation data.
We do not dismiss citation statistics as a tool for assessing the quality of research—citation data and statistics can provide some valuable information. We recognize that assessment must be practical, and for this reason easily‐derived citation statistics almost surely will be part of the process. But citation data provide only a limited and incomplete view of research quality, and the statistics derived from citation data are sometimes poorly understood and misused. Research is too important to measure its value with only a single coarse tool.
We hope those involved in assessment will read both the commentary and the details of this report in order to understand not only the limitations of citation statistics but also how better to use them. If we set high standards for the conduct of science, surely we should set equally high standards for assessing its quality.
Joint IMU/ICIAM/IMS‐Committee on Quantitative Assessment of Research
Robert Adler, Technion–Israel Institute of Technology
John Ewing (Chair), American Mathematical Society
Peter Taylor, University of Melbourne