There was an interesting PLoS ONE paper a couple months back that was enabled by Parity technology. The paper, “Estimates of the Continuously Publishing Core in the Scientific Workforce,” by John P. A. Ioannidis and others, found that scientists who publish continuously (at least one paper per year) over an extended period of time have much more impact (in terms of number of citations and h-index) than those who don’t, even when controlling for the total number of publications. In other words, it’s not just how much you publish, but the fact that you publish continuously over time that is correlated with impact.
The key finding that has attracted attention is that while only 1% of authors have a so-called “uninterrupted, continuous presence” (UCP) over the 16 years examined, they account for 42% of papers and 87% of highly-cited (more than 1000 citations) papers in that time frame. The authors speculate on some of the causes and consequences of this skewed situation, and discuss how it varies by subject area, geographic location, and sector (academic, hospital, etc.).
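To make the UCP definition concrete, here is a minimal Python sketch of the test "at least one paper in every calendar year of the window." The function name `has_ucp`, its parameters, and the 2000 to 2015 window are hypothetical choices for illustration; they are not taken from the paper or from Scopus.

```python
from collections.abc import Iterable


def has_ucp(pub_years: Iterable[int], first_year: int, last_year: int) -> bool:
    """Return True if there is at least one paper in every calendar year
    of the inclusive window [first_year, last_year]."""
    required = set(range(first_year, last_year + 1))
    return required.issubset(set(pub_years))


# Hypothetical 16-year window, chosen only for illustration.
FIRST, LAST = 2000, 2015

# One publication year per paper; repeated years mean several papers that year.
steady_author = list(range(FIRST, LAST + 1))
gappy_author = [y for y in steady_author if y != 2007] + [2008, 2008]

print(has_ucp(steady_author, FIRST, LAST))
print(has_ucp(gappy_author, FIRST, LAST))
```

Note that the second author has more papers in total than the first, yet fails the test because of the single missing year. That is the distinction the paper draws between publishing a lot and publishing continuously.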
Where Parity comes in is that the analysis is based on Elsevier’s Scopus database and relies on Scopus author identifiers to determine unique authors. And these author identifiers are created by Parity’s AP2 Author Profiling system. AP2 processes the entire Scopus corpus of articles and creates a profile for each distinct author, distinguishing between different authors having the same name or initials, and recognizing when a single author’s name has multiple variants (e.g. due to nicknames or different transliterations). Since the accuracy of these profiles is critical to the validity of the paper’s conclusions, the paper includes a small evaluation of the profile accuracy:
A total of 150,608 author identifiers fulfilled our definition of UCP authors. An in-depth evaluation of a random sample of 20 of these author identifiers showed that polysemy (the merging of two or more authors with the same name in the same record) had no major impact on this estimate: all 20 sampled identifiers reflected a specific author who had at least 1 paper published in each and every calendar year.
… An in-depth evaluation of a random sample of 20 author identifiers without UCP showed that 13 were clearly referring to unique authors who had not published any other Scopus-indexed papers; 2 clearly belonged to authors who had published also papers clustered under 1 or 2 other author identifiers but merging the different records of the same author would not create UCP and would not affect the citation h-index of the larger of the constituent records; and for 5 it was not possible to exclude split records and/or polysemy with perfect certainty, because the names were very common
In summary: there was 100% precision on 20 UCP (and hence good-sized) profiles. On 20 other authors, 13 had 100% recall, 2 had small recall errors which didn’t affect the analysis, and the remaining 5 had ambiguities in the data that made it difficult for even a human to judge.
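For readers less familiar with the terms, the following is a minimal sketch of how precision and recall could be computed for a single author profile, assuming we know the true set of papers written by the author the profile is meant to represent. The function name, the paper IDs, and the set-based formulation are illustrative assumptions only; this is not the evaluation procedure used in the paper, nor our internal measurement code.

```python
def profile_precision_recall(
    profile_papers: set[str], true_author_papers: set[str]
) -> tuple[float, float]:
    """Precision: share of the profile's papers that really belong to the author.
    Recall: share of the author's papers that the profile actually contains."""
    correct = profile_papers & true_author_papers
    precision = len(correct) / len(profile_papers) if profile_papers else 0.0
    recall = len(correct) / len(true_author_papers) if true_author_papers else 0.0
    return precision, recall


# Hypothetical paper IDs. The profile contains no foreign papers (no merge
# error), but one of the author's papers ended up in a separate profile (a split).
profile = {"p1", "p2", "p3"}
truth = {"p1", "p2", "p3", "p4"}

precision, recall = profile_precision_recall(profile, truth)
print(precision, recall)  # 1.0 0.75
```

In these terms, a merge of two same-named authors into one profile lowers precision, while a single author split across several identifiers lowers recall.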
We do our own accuracy measurements using various precision and recall metrics, and while we achieve high accuracy in our internal measurements, it is always helpful and encouraging to get an external confirmation like this.