Bibliometrics: the numbers game

In mid-December, British universities, their constituent units and departments, and most academics experienced the kind of traumatic day familiar to 18-year-olds awaiting the examination results on which their advancement, or not, to higher education depends. December 18th, 2014, was REF-Day. Since its predecessor (RAE-Day) six years before, a vast effort – by university standards – had gone into preparing bids on a department-by-department basis, in order to rank departments nationally and to conflate individual assessments into a sort of institutional league table for research excellence; hence REF stands for Research Excellence Framework (the RAE was the less meritorious-sounding Research Assessment Exercise). It resembled the Guide Michelin or Automobile Association star system for restaurants, hotels and guest houses. The reason for the six-year frenzy of activity was that the outcomes were intended to inform the selective allocation of governmental research funding. Unsurprisingly, this kind of competition stemmed from the Tory government of Margaret Thatcher, which in 1986 set the scene for ‘performance-related’ funding in place of the system it replaced, based on peer review of each individual bid for a major grant.

Itemising each aspect of the way the REF worked could send the majority of Earth Pages readers to an early and ignoble grave. It centred on each department selecting, from its full-time researchers, those deemed to be ‘research active’ and those who were not; the former had to choose four recently published works or ‘outputs’. They had to self-assess each according to its ‘impact’, defined as ‘an effect on, change or benefit to the economy, society, culture, public policy or services, health, the environment or quality of life, beyond academia’. Institutions vetted and bundled individual submissions, collated them in the subject areas designated by the REF, then sent them off to ‘REF Central’, where they were reviewed by subject-specialist panels that gave out the stars for each submitted item of work: **** = world-leading (30% were deemed to be); *** = internationally excellent (46%); ** = recognised internationally (20%); * = recognised nationally (3%); unclassified = below the standard of national recognition (1% – presumably those obviously lacking star quality were weeded out at institution level). There were more than 190 thousand ‘outputs’, which raises two questions: were all of them read by at least one specialist panel member, and against what standards were they judged?

On average, each of the roughly 1000 panellists would have had to consider about 190 outputs in greater depth than a casual skim, or more if some were read by several panellists. Outputs were rated ‘in terms of their “originality, significance and rigour”, with reference to international research quality standards’, ‘the “reach and significance” of impacts on the economy, society and/or culture’ and the part they played in their department’s contribution to ‘the vitality and sustainability… of the wider discipline or research base’. On paper – and believe me, REF Central produced plenty of wordy PDFs of guidance – this level of scrutiny makes the adjective ‘daunting’ seem a bit of an understatement. Entering into the spirit of things in the gleeful manner of a Michelin or AA assessor seems to me rather hard to imagine. I wonder whether, in reality, the panels simply checked each submission for signs of an overly hubristic vision of self-worth.
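To put rough numbers on that workload, here is a minimal back-of-envelope sketch (in Python, purely for illustration). The figures for outputs and panel members come from above; the 30 minutes per output and the 8-hour working day are my own assumptions, not REF Central’s.

```python
# Back-of-envelope REF 2014 reading load, using the figures quoted above.
# The minutes-per-output and hours-per-day values are illustrative assumptions.

outputs_submitted = 190_000   # 'more than 190 thousand' outputs
panel_members = 1_000         # roughly 1000 panellists
minutes_per_output = 30       # assumed: more than a casual skim
hours_per_working_day = 8     # assumed working day

outputs_each = outputs_submitted / panel_members
hours_each = outputs_each * minutes_per_output / 60
days_each = hours_each / hours_per_working_day

print(f"Outputs per panellist: {outputs_each:.0f}")
print(f"Reading time each: about {hours_each:.0f} hours, or {days_each:.0f} working days")
```

On those assumptions, each panellist faces roughly a hundred hours of reading, about a fortnight of doing nothing else, before any comparison or grading even begins.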

To some extent, the issue of each output’s citation count or some other bibliometric measure must at some stage have entered the REF reckoning, and this is what spurred me to defy normal cautions about boredom as a contributor to general organ failure. Physicist Reinhard Werner of Leibniz University in Hanover, Germany, believes that decisions on funding and hiring, or firing, need to steer well clear of impact factors, citations and other kinds of bibliometrics (Werner, R. 2015. The focus on bibliometrics makes papers less useful. Nature, v. 517, p. 245). Scientists cite other works for many reasons, some worthy and some less so, but in doing so we rarely express any opinion on the overall significance of the work that we choose to cite. Conversely, a researcher can choose a field, phrase their findings and pick a journal in ways calculated to boost their citation frequency and impact. Writing about some mundane topic in a publicly accessible way, reviewing the work of lots of other people, or simply reporting on this or that phenomenon as observed or measured in an especially populous country where science is booming does much the same thing. Werner makes a telling point: ‘When we believe that we will be judged by silly criteria, we will adapt and behave in silly ways’. Although he does not touch on the absurdities of the REF – why on Earth would he? – Werner comments on the distortion of both the job market and peer-reviewed journals. He also pleads for a return to proper scrutiny of scientific merit and, I suspect, for cutting hubris off at the roots.