Wednesday 16 January 2013

The r-index

Everyone within academia knows what the h-index is. For those who don't, it's a metric for how productive you are as an academic. An author's h-index is the largest number h such that they have published h papers with at least h citations each. It's something that can easily be looked up via an academic's Google Scholar profile, Scopus, Web of Knowledge, and maybe others I don't know about. A higher h-index is supposed to correlate with how productive you are as a scientist (number of papers published) and how meaningful your contribution to your field is (how often people cite your work). Universities love it. Human resources departments love it. Governments love it. Finally, it is possible to sum up the entirety of someone's value as a scientist with a simple integer value!
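(For concreteness, here's a minimal sketch of the calculation in Python, assuming you already have each paper's citation count; the numbers in the example are made up.)

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4: four papers with >= 4 citations each
```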

Or is it?

There are many problems that mask a person's true contribution to a field if one looks at the h-index alone. For instance, someone who has few but extremely highly cited (= important) papers will have a low h-index, even if the work they did was very far-reaching. The system is also easily gamed: self-citations are usually included in an h-index calculation, so you can cite yourself all the way to the bank. This guy probably takes the cake (he is a real person - I checked). If one defines another index, say w, as the largest number w such that you have w papers in which you cite yourself at least w times, this could be particularly revealing of your citation habits. I have heard this dubbed the w-index (where w stands for "wanker", rather than Wu). There are also groups of people who regularly cite each other, thereby avoiding raising their w-index, but gaming the system nonetheless.
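(Structurally the w-index is just the h-index computed over self-citation counts, so the sketch above applies unchanged; the counts below are hypothetical.)

```python
# w-index: largest w such that w papers each contain >= w self-citations.
# Reuses h_index() from the sketch above; these counts are made up.
self_cites = [6, 5, 5, 2, 1, 0]
print(h_index(self_cites))  # 3: three papers with >= 3 self-citations
```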

Ever since the h-index became a Thing, people have been getting a little crazy about it, and it's being used as a be-all and end-all metric (much like impact factor). Most of the alternatives I have seen proposed just seek to normalize the number of citations against some other quantity. I'm not convinced that any of these actually provide a measure of how much real impact, or reach, a person's research has.

My husband and I were talking about this over dinner, and he came up with a good idea: why not count the number of unique citations? I think this is quite promising. It completely eliminates double-ups (legitimately citing the same paper in multiple publications), self-citations, and gratuitous citations.

The r-index would be defined as follows: the number of citing papers a given author has such that no two counted papers share a common author, and none of them includes the cited author themselves (which rules out self-citations). The r-index could also be applied to individual publications.
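Here is a rough sketch of how it might be computed, assuming you can pull the full author list of every citing paper out of a citation database. Strictly maximising the count is a set-packing problem, so the greedy pass below (smallest author lists first) is only an approximation, and all the names are invented.

```python
def r_index(cited_author, citing_papers):
    """Count citing papers such that no two counted papers share an
    author and none includes the cited author (no self-citations).
    citing_papers is a list of author-name sets, one per citing paper."""
    seen = {cited_author}
    r = 0
    # Greedy: take papers with the fewest authors first, which leaves
    # room to count more papers afterwards.
    for authors in sorted(citing_papers, key=len):
        if seen.isdisjoint(authors):
            seen.update(authors)
            r += 1
    return r

papers = [
    {"A. Smith"},              # counted
    {"B. Jones", "C. Lee"},    # counted
    {"A. Smith", "D. Wu"},     # skipped: shares A. Smith
    {"M. Me", "E. Park"},      # skipped: self-citation
]
print(r_index("M. Me", papers))  # 2
```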

If you have a high r-index, then it indicates that your research has reach within and possibly beyond your own sub-field. It likely penalises papers with more authors, but perhaps this could be a handy tool against gratuitous authorship. I hope, though, that it wouldn't push deserving authors off a publication, although I think that outcome is highly unlikely.

What I'd really like is to look at some real-life examples of scientists at various stages of their careers, to see whether this is a reasonable measure and what the r-index of an average academic is. I calculated it for myself. It was easy. I am only a fledgling! When I tried calculating it for my supervisor, I immediately gave up. This cannot be done manually (unless you have a lot of time on your hands). Some data extraction from a citation database is required here! Stay tuned!