During my work for the KBOPENBIB project, I came across the following inconsistency in OpenAlex funding data, which might be of interest to the wider community: the number of works per funder doesn’t always match between the works and funders objects.
How to get the relevant data from OpenAlex’s API using openalexR
library(openalexR)

# query list of all funders, including works_count field
funders <- oa_request(query_url = "https://api.openalex.org/funders")
df <- oa2df(funders, entity = "funders")

# query works list grouped by funders
df2 <- oa_fetch(entity = "works", group_by = "grants.funder")
You would expect the number in the works_count field of the funder object to match the count you get by counting unique work ids per funder id in the works table.
| Agreement | Number of funders | Share  |
|-----------|-------------------|--------|
| ❌        | 8025              | 24.74% |
| ✔️        | 24412             | 75.26% |
Table 1: Agreement of funded publications per funder between works_count and “manual” count.
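The comparison behind Table 1 can be sketched roughly as follows. This is a minimal illustration on made-up toy data, not the code used for this post. It assumes the funders table has the columns id and works_count, and that the grouped works result has the columns key and count; the helper compare_counts is hypothetical, as is the choice to treat funders missing from the grouped result as having a manual count of 0.

```r
# Toy stand-ins for the two API results (made-up values, not real OpenAlex data).
# Assumed column names: id / works_count (funders), key / count (grouped works).
funders_df <- data.frame(
  id = c("F1", "F2", "F3"),
  works_count = c(100, 50, 7),
  stringsAsFactors = FALSE
)
grouped_df <- data.frame(
  key = c("F1", "F2", "F3"),
  count = c(100, 52, 6),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join both tables on the funder id and compare the counts.
compare_counts <- function(funders, grouped) {
  merged <- merge(funders, grouped, by.x = "id", by.y = "key", all.x = TRUE)
  # Assumption: a funder absent from the grouped result has a manual count of 0.
  merged$count[is.na(merged$count)] <- 0
  # Positive diff: works_count is lower than the manual count.
  merged$diff <- merged$count - merged$works_count
  merged$agree <- merged$diff == 0
  merged
}

cmp <- compare_counts(funders_df, grouped_df)

# Number and share of funders with and without agreement.
agreement <- table(cmp$agree)
print(agreement)
print(round(100 * prop.table(agreement), 2))
```

With real data, funders_df and grouped_df would be replaced by the data frames returned from the API queries, provided their column names match the assumptions above.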
For roughly a quarter of funders, the publication counts do not match. The mean difference between the manual count and works_count is 10.26, so the works_count field is missing about 10 publications per funder on average. However, if we remove the National Natural Science Foundation of China from the data (it alone is missing a whopping 343744 publications), the mean drops to -0.34.
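The effect of that single funder on the mean can be checked with a little arithmetic. The sketch below assumes that the mean of 10.26 is taken over all funders in Table 1 and that the difference is defined as manual count minus works_count; neither is stated explicitly above, so treat this as a plausibility check only.

```r
# Figures taken from the text and Table 1.
n_funders <- 8025 + 24412   # all funders, matching or not (assumed base of the mean)
mean_all <- 10.26           # reported mean difference
outlier_diff <- 343744      # publications missing for the single largest funder

# Sum of all differences implied by the reported mean.
total_diff <- mean_all * n_funders

# Mean after dropping the outlier from both the sum and the funder count.
mean_without_outlier <- (total_diff - outlier_diff) / (n_funders - 1)
print(round(mean_without_outlier, 2))
```

Under these assumptions the result rounds to -0.34, which is consistent with the figure reported above and shows how strongly one funder drives the overall mean.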
Looking at funders with diverging publication counts in Table 2, we see that for a majority of funders the difference is only one publication.
I hope that this problem description contributes to the continuous improvement of OpenAlex. Until this inconsistency is addressed, I recommend “manually” counting work ids per funder instead of relying on the works_count field.