Generating and formatting authorship lists for multi-centre research projects can be a challenging data wrangling task. In the case of collaborative research projects, there can be thousands of collaborators across hundreds of sites with a variety of roles.
ORCID provides a persistent digital identifier (an ORCID iD) that each individual own and control, and that distinguishes them from every other researcher. This is free to register for and can be used to empower collaborators to specify how their name should appear in publications. When working with 1000s collaborators, this provides a simple route to ensure accuate display of names on an authorship list and can be simply extracted from the ORCID website using the ORCID.
ORCIDs follow a specific format of 16 characters in the format of
“XXXX-XXXX-XXXX-XXXX” (16 characters in groups of 4 and separated by a
dash). The extraction from the ORCID website will not work if not in
this format. However, we can use the orcid_valid()
function to
investigate whether the ORCIDs on record are valid or not to use.
data <- tibble::tibble(n = c(1:7),
orcid = c("0000-0001-6482-9086", "0000000250183066", "0000-0002-8738-4902",
"00O0-0002-8738-490X", "0000-0002-8738-490X", "0000-0002-8738-490", NA))
collaborator::orcid_valid(data, orcid = "orcid", reason = T) %>%
knitr::kable()
n
orcid
orcid_valid_yn
orcid_valid
orcid_valid_reason
orcid_check_present
orcid_check_length
orcid_check_format
orcid_check_sum
1
0000-0001-6482-9086
Yes
0000-0001-6482-9086
NA
Yes
Yes
Yes
Yes
2
0000000250183066
Yes
0000-0002-5018-3066
NA
Yes
Yes
Yes
Yes
3
0000-0002-8738-4902
Yes
0000-0002-8738-4902
NA
Yes
Yes
Yes
Yes
4
00O0-0002-8738-490X
No
NA
Not ORCID format, Failed checksum
Yes
Yes
No
No
5
0000-0002-8738-490X
No
NA
Failed checksum
Yes
Yes
Yes
No
6
0000-0002-8738-490
No
NA
Not 16 characters, Not ORCID format,
Failed checksum
Yes
No
No
No
7
NA
No
NA
Missing ORCID
No
No
No
No
This will output the same dataframe with the “orcid_valid” column appended with a correctly formatted orcid (if it is valid to use). All non-valid orcids will be listed as “NA”. If you want to investigate further, you can use the argument (“reason==T”) in the function to get additional columns:
“orcid_check_present”: A binary value if any value was provided in that row or not (e.g. when a “NA” value was supplied)
“orcid_check_length”: A binary value if the ORCID supplied is 16 characters or not (e.g. “0000-0002-8738-490” where a character has been missed)
“orcid_check_format”: A binary value if the ORCID supplied fits the correct format of either 16 numbers or 15 numbers with an X at the end (e.g. “00O0-0002-8738-490X” value supplied)
“orcid_check_sum”: ORCID uses an internal “checksum” to make sure not just any random set of 16 characters can be entered. This is a binary value if the ORCID supplied either passes or fails this “checksum” (e.g. “0000-0002-8738-490X” is indistinguishable from a valid ORCID, except it fails the checksum, so it had to have been entered incorrectly)
If any of the values above are “No”, then the ORCID is not valid and so cannot be used. The final column “orcid_valid_reason” summarises all the reasons why an ORCID is not valid so these can be addressed.
Now we know what ORCIDs are valid, lets extract the names of just these
using orcid_name()
. Names on ORCID are recorded in 2 ways:
“Your given and family names” (“orcid_name_first” and “orcid_name_last”).
“Your published name” (orcid_name_credit”): This is the full name displayed on ORCID, however this is not automatically separated into first name / last name.
Given this is recorded in 2 different ways, there can be discrepancies between the two methods (and why both are returned). It is recommended that “Your given and family names” is preferentially used since this avoids any confusion about first/middle vs last names for authorship lists (since the format required for authorship lists is often that given names are converted into initials).
data %>%
collaborator::orcid_valid(data, orcid = "orcid", reason = F) %>%
filter(is.na(orcid_valid)==F) %>%
orcid_name(orcid = "orcid_valid", reason = F) %>%
knitr::kable()
n
orcid
orcid_valid
orcid_name_first
orcid_name_last
orcid_name_credit
orcid_name_credit_first
1
0000-0001-6482-9086
0000-0001-6482-9086
Kenneth A
McLean
Kenneth A McLean
Kenneth A
2
0000000250183066
0000-0002-5018-3066
Ewen
Harrison
Ewen Harrison
Ewen
3
0000-0002-8738-4902
0000-0002-8738-4902
Riinu
Pius
Riinu Pius
Riinu
If you need to format the names of collaborators as initials, this can
be simply done using author_name()
. This will convert every name in
the “first_name” column into initials, which can be placed before or
after the last name. This is shown in the “author_name” column below.
data %>%
collaborator::orcid_valid(data, orcid = "orcid", reason = F) %>%
collaborator::orcid_name(orcid = "orcid_valid", reason = F) %>%
collaborator::author_name(first_name = "orcid_name_first", last_name = "orcid_name_last",position = "left", initial_max=3) %>%
dplyr::select(n:orcid_valid, orcid_name_first:orcid_name_last, author_name)%>%
knitr::kable()
n
orcid
orcid_valid
orcid_name_first
orcid_name_last
author_name
1
0000-0001-6482-9086
0000-0001-6482-9086
Kenneth A
McLean
KA McLean
2
0000000250183066
0000-0002-5018-3066
Ewen
Harrison
E Harrison
3
0000-0002-8738-4902
0000-0002-8738-4902
Riinu
Pius
R Pius
Once you have your final list of authors, the report_auth()
function
aims to simplify the process of generating the fully formatted
authorship list, with inbuilt flexibility in how these are presented.
In order for the report_auth()
function to operate as intended, we
must first create a dataframe of all authors/collaborators containing at
least 1 column: “name”.
Example dataframe (data_author
):
data_author <- collaborator::example_report_author
knitr::kable(head(data_author, n=10)) # Please note all names have been randomly generated
name
hospital
country
Almond S
hospital N
England
Andersen J
hospital E
Scotland
Ashton A
hospital L
England
Avila E
hospital C
Scotland
Ayala N
hospital Q
England
Barker S
hospital D
Scotland
Beech J
hospital N
England
Berry A
hospital A
Scotland
Bowen P
hospital P
England
Bradford J
hospital I
England
At it’s most basic, report_auth()
can produce a formatted list of a
column of names.
collaborator::report_auth(data_author) %>% # Please note all names have been randomly generated
knitr::kable(, col.names= "")
Almond S, Andersen J, Ashton A, Avila E,
Ayala N, Barker S, Beech J, Berry A, Bowen P, Bradford J, Cameron G,
Cantrell G, Carlson F, Carrillo J, Cervantes S, Chamberlain H, Chan K,
Chung L, Clifford K, Conley M, Cullen F, Dalby D, Dean O, Dodson D,
Downes A, Duffy L, Ellwood M, Erickson K, Fenton U, Ferry A, Finney R,
Flores R, Fox B, Francis F, Frazier U, Fuentes A, Galindo C, Gardiner F,
Gibbons H, Gould C, Halliday S, Hanna L, Hardy E, Herman K, Hicks A,
Hodge M, Holder C, Hollis S, Houston M, Huff J, Jensen L, Kane K, Kearns
U, Keenan L, Kent M, Knights G, Lees S, Lennon J, Livingston F, Mackie
L, Marks H, Michael P, Mooney A, Morin Y, Moses S, Mustafa W, Nicholson
L, Ochoa A, O’Doherty H, Olsen H, O’Neill L, Owens I, Paine R, Patrick
S, Petty O, Phillips A, Pitt A, Plant N, Prosser E, Randolph T, Richmond
S, Riddle C, Riggs M, Rojas E, Rossi P, Rowe P, Saunders R, Skinner I,
Smart F, Stokes P, Villa Z, Wall R, Wardle A, Werner R, Whitfield A,
Whitney M, William C, Woods B, Wynn J, Yang K.
These names can be further grouped by another column in the dataframe:
collaborator::report_auth(data_author, group = "hospital") %>% # Please note all names have been randomly generated
knitr::kable(col.names= "")
Berry A, Chan K, Gould C, Jensen L
(hospital A); Clifford K, Kearns U, Livingston F, Rojas E (hospital B);
Avila E, Cullen F, Hanna L, O’Neill L (hospital C); Barker S, Gibbons H,
Kent M (hospital D); Andersen J, Cameron G, Dodson D, Downes A, Erickson
K, Francis F, Lees S, Moses S, Saunders R (hospital E); Dalby D, Houston
M, Morin Y, Stokes P (hospital F); Chamberlain H, Fox B, Keenan L,
Mackie L, Plant N (hospital G); Galindo C, Michael P, Prosser E
(hospital H); Bradford J, Flores R, Mooney A, O’Doherty H, Werner R
(hospital I); Dean O, Fuentes A, Hardy E, Herman K, Ochoa A, Pitt A,
Skinner I, Wynn J (hospital J); Hicks A, Holder C, Phillips A, Richmond
S, Whitfield A (hospital K); Ashton A, William C (hospital L); Carlson F
(hospital M); Almond S, Beech J, Ferry A, Lennon J, Smart F (hospital
N); Halliday S, Riggs M, Rossi P, Wardle A, Whitney M (hospital O);
Bowen P, Carrillo J, Fenton U, Kane K, Knights G, Riddle C (hospital P);
Ayala N, Conley M, Ellwood M, Hollis S, Mustafa W, Olsen H, Wall R
(hospital Q); Finney R, Frazier U, Paine R, Patrick S, Petty O, Villa Z
(hospital R); Cantrell G, Huff J, Rowe P, Woods B, Yang K (hospital S);
Hodge M, Owens I (hospital T); Cervantes S, Marks H, Nicholson L
(hospital U); Chung L, Duffy L, Gardiner F, Randolph T (hospital
V).
Or can be subdivided by another column in the dataframe:
collaborator::report_auth(data_author, subdivision = "country") %>% # Please note all names have been randomly generated
knitr::kable(col.names= "")
England: Almond S, Ashton A, Ayala N,
Beech J, Bowen P, Bradford J, Carlson F, Carrillo J, Chamberlain H,
Conley M, Dalby D, Dean O, Ellwood M, Fenton U, Ferry A, Flores R, Fox
B, Fuentes A, Galindo C, Halliday S, Hardy E, Herman K, Hicks A, Holder
C, Hollis S, Houston M, Kane K, Keenan L, Knights G, Lennon J, Mackie L,
Michael P, Mooney A, Morin Y, Mustafa W, Ochoa A, O’Doherty H, Olsen H,
Phillips A, Pitt A, Plant N, Prosser E, Richmond S, Riddle C, Riggs M,
Rossi P, Skinner I, Smart F, Stokes P, Wall R, Wardle A, Werner R,
Whitfield A, Whitney M, William C, Wynn J.
Northern Ireland: Cervantes S, Chung L,
Duffy L, Gardiner F, Hodge M, Marks H, Nicholson L, Owens I, Randolph
T.
Scotland: Andersen J, Avila E, Barker S,
Berry A, Cameron G, Chan K, Clifford K, Cullen F, Dodson D, Downes A,
Erickson K, Francis F, Gibbons H, Gould C, Hanna L, Jensen L, Kearns U,
Kent M, Lees S, Livingston F, Moses S, O’Neill L, Rojas E, Saunders
R.
Wales: Cantrell G, Finney R, Frazier U,
Huff J, Paine R, Patrick S, Petty O, Rowe P, Villa Z, Woods B, Yang
K.
Or groups can be further subdivided (for example by region/country, or by role)
collaborator::report_auth(data_author,
group = "hospital",
subdivision = "country") %>% # Please note all names have been randomly generated
knitr::kable(col.names= "")
England: Dalby D, Houston M, Morin Y,
Stokes P (hospital F); Chamberlain H, Fox B, Keenan L, Mackie L, Plant N
(hospital G); Galindo C, Michael P, Prosser E (hospital H); Bradford J,
Flores R, Mooney A, O’Doherty H, Werner R (hospital I); Dean O, Fuentes
A, Hardy E, Herman K, Ochoa A, Pitt A, Skinner I, Wynn J (hospital J);
Hicks A, Holder C, Phillips A, Richmond S, Whitfield A (hospital K);
Ashton A, William C (hospital L); Carlson F (hospital M); Almond S,
Beech J, Ferry A, Lennon J, Smart F (hospital N); Halliday S, Riggs M,
Rossi P, Wardle A, Whitney M (hospital O); Bowen P, Carrillo J, Fenton
U, Kane K, Knights G, Riddle C (hospital P); Ayala N, Conley M, Ellwood
M, Hollis S, Mustafa W, Olsen H, Wall R (hospital Q).
Northern Ireland: Hodge M, Owens I
(hospital T); Cervantes S, Marks H, Nicholson L (hospital U); Chung L,
Duffy L, Gardiner F, Randolph T (hospital V).
Scotland: Berry A, Chan K, Gould C, Jensen
L (hospital A); Clifford K, Kearns U, Livingston F, Rojas E (hospital
B); Avila E, Cullen F, Hanna L, O’Neill L (hospital C); Barker S,
Gibbons H, Kent M (hospital D); Andersen J, Cameron G, Dodson D, Downes
A, Erickson K, Francis F, Lees S, Moses S, Saunders R (hospital E).
Wales: Finney R, Frazier U, Paine R,
Patrick S, Petty O, Villa Z (hospital R); Cantrell G, Huff J, Rowe P,
Woods B, Yang K (hospital S).
Clear and consistent formatting of authorship lists allows the
contributions and affiliations of each collaborator/author to be
represented. Within report_auth()
, names are usually separated by a
comma (“,”), with groups separated by a semicolon (“;”). Furthermore the
name of groups are separated by round brackets (“()”). However, there is
a degree of inbuilt flexibility to facilitate customisation.
Below if for demonstration of this concept (not intented to reflect how these should be formatted!)
collaborator::report_auth(data_author, group="hospital", subdivision = "country",
name_sep = " +", group_brachet = "[]",group_sep = " --- ") %>% # Please note all names have been randomly generated
knitr::kable(col.names= "")
England: Dalby D +Houston M +Morin Y
+Stokes P [hospital F] — Chamberlain H +Fox B +Keenan L +Mackie L +Plant
N [hospital G] — Galindo C +Michael P +Prosser E [hospital H] — Bradford
J +Flores R +Mooney A +O’Doherty H +Werner R [hospital I] — Dean O
+Fuentes A +Hardy E +Herman K +Ochoa A +Pitt A +Skinner I +Wynn J
[hospital J] — Hicks A +Holder C +Phillips A +Richmond S +Whitfield A
[hospital K] — Ashton A +William C [hospital L] — Carlson F [hospital M]
— Almond S +Beech J +Ferry A +Lennon J +Smart F [hospital N] — Halliday
S +Riggs M +Rossi P +Wardle A +Whitney M [hospital O] — Bowen P
+Carrillo J +Fenton U +Kane K +Knights G +Riddle C [hospital P] — Ayala
N +Conley M +Ellwood M +Hollis S +Mustafa W +Olsen H +Wall R [hospital
Q].
Northern Ireland: Hodge M +Owens I
[hospital T] — Cervantes S +Marks H +Nicholson L [hospital U] — Chung L
+Duffy L +Gardiner F +Randolph T [hospital V].
Scotland: Berry A +Chan K +Gould C +Jensen
L [hospital A] — Clifford K +Kearns U +Livingston F +Rojas E [hospital
B] — Avila E +Cullen F +Hanna L +O’Neill L [hospital C] — Barker S
+Gibbons H +Kent M [hospital D] — Andersen J +Cameron G +Dodson D
+Downes A +Erickson K +Francis F +Lees S +Moses S +Saunders R [hospital
E].
Wales: Finney R +Frazier U +Paine R
+Patrick S +Petty O +Villa Z [hospital R] — Cantrell G +Huff J +Rowe P
+Woods B +Yang K [hospital S].
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.