library(reconqdata)
library(dplyr)
knitr::opts_chunk$set(
  echo = FALSE,
  error = FALSE,
  warning = FALSE
)
Scientometric studies report a steady increase in multi-authored scientific publications. This is clear evidence that contemporary science requires cooperation and is no longer a traditionally individualistic activity [@moody2004structure]. The dataset presented here comes from a study whose overarching goal was to understand why some scientists collaborate while others do not. In particular, we focused on the incentives that might lead them to do so. Inspired by @coleman1994foundations and, among others, @lewis2012and, we assume that the incentives to collaborate come from the academically-relevant resources that scientists possess or control and from the interests they might have in resources possessed or controlled by others. For example, a theorist and an experimentalist might be interested in each other's resources -- the ability to develop a theoretical model of the studied problem and skill in conducting experiments, respectively. The unequal distribution of these resources across the academic community, combined with the necessity of pooling them to get ahead in contemporary science, results in incentives to collaborate.
The current state of knowledge still lacks a universally accepted behavioral understanding of the scientific process, let alone standardized tools for measuring academically-relevant resources. Hence we conducted a qualitative study among Polish scientists with the following goals:
The data we share here is based on transcripts of the originally qualitative material and on their coding. The study involved 40 interviews with a sample of Polish scientists, which we describe further in Section \@ref(sample). In Section \@ref(measurement) we describe how the inventory of resources was constructed. A complete inventory with example quotes is provided on the website.
The data comes from `r sum(nodes$is_ego)` Individual In-Depth Interviews (IDI) conducted between April and August 2016 by two interviewers. The quota sample consists of 20 female and 20 male scientists from six Polish cities. Respondents represented a broad range of disciplines (natural sciences, social sciences, life sciences, the humanities, engineering, and technology) and career stages, from PhD candidates to professors. The interviewees mentioned `r sum(!nodes$is_ego)` collaborators in total. The interviews, which lasted between 24 and 90 minutes, were recorded and later transcribed.
Each interview consisted of several parts, three of which are relevant here:
While collaboration networks assembled from part (2) include alter-alter ties, the data on resources from part (3) was acquired for ego-alter dyads only.
knitr::include_graphics("cork.jpg")
The data is contained in three inter-related tables, presented diagrammatically in Figure \@ref(fig:data-model). Below we describe each table in detail.
fname <- "data-model.png"

suppressPackageStartupMessages(plot_data_model()) %>%
  datamodelr::dm_render_graph(width = 800) %>%
  DiagrammeRsvg::export_svg() %>%
  cat(file = "dm.svg")

if (!file.exists(fname)) {
  dir.create(dirname(fname), recursive = TRUE)
  file.create(fname)
}

# Write a PNG
magick::image_read_svg("dm.svg") %>%
  magick::image_convert(format = "png") %>%
  magick::image_write(path = fname)

unlink("dm.svg")
knitr::include_graphics(fname)
The table `nodes` contains information about every person in the study -- all egos and all alters. It has `r nrow(nodes)` rows and the following `r ncol(nodes)` variables:
- `id_interview` -- Interview identification number.
- `id_node` -- Person identification number, unique within each interview.
- `is_ego` -- Binary variable equal to 1 if the person is the ego (respondent), 0 otherwise.
- `is_polish` -- Binary variable equal to 1 if the person is affiliated with a Polish academic institution, 0 otherwise.
- `department` -- Identifies scientists working in the same department. If `department` is not missing, then all scientists within the same interview sharing the same value of `department` work in the same department of the same university.
- `scidegree` -- Scientific degree of the scientist. One of `"mgr"` = MA, `"dr"` = PhD, `"drhab"` = habilitated doctor, or `"prof"` = full professor.
- `female` -- Binary variable equal to 1 if the person is female, 0 if male.

The pair of variables `id_interview` and `id_node` together constitutes a key uniquely identifying each row in the `nodes` table.
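Because `id_interview` and `id_node` form a composite key, a few integrity checks are straightforward. The sketch below is a minimal example that assumes a small hypothetical `nodes` table with the columns listed above (all values are invented for illustration, not taken from the dataset). It checks that the key is unique, that each interview contains exactly one ego, and that degrees use the documented codes, and it counts egos and alters with the same expressions used in the sample description.

```r
# Hypothetical toy version of the `nodes` table (values invented for illustration)
nodes <- data.frame(
  id_interview = c(1, 1, 1, 2, 2),
  id_node      = c(1, 2, 3, 1, 2),
  is_ego       = c(1, 0, 0, 1, 0),
  is_polish    = c(1, 1, 0, 1, 1),
  department   = c("a", "a", NA, "b", NA),
  scidegree    = c("dr", "prof", "dr", "mgr", "drhab"),
  female       = c(1, 0, 1, 0, 0)
)

# (id_interview, id_node) should identify each row uniquely
key_is_unique <- !anyDuplicated(nodes[, c("id_interview", "id_node")])

# Each interview should contain exactly one ego
egos_per_interview <- tapply(nodes$is_ego, nodes$id_interview, sum)

# Scientific degrees should use only the documented codes
degrees_valid <- all(nodes$scidegree %in% c("mgr", "dr", "drhab", "prof"))

# Numbers of respondents (egos) and mentioned collaborators (alters)
n_egos   <- sum(nodes$is_ego)
n_alters <- sum(!nodes$is_ego)
```

With the package loaded, the same lines should apply unchanged to the package's own `nodes` table.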
The table `collaboration` is essentially an edge list of collaboration ties. It has `r nrow(collaboration)` rows and the following `r ncol(collaboration)` variables:
- `id_interview` -- Interview identification number.
- `from` and `to` -- Person identification numbers referencing the `id_node` variable from the `nodes` table.

In other words, a row consisting of the values, say, `id_interview=1`, `from=2`, `to=3` indicates that researchers 2 and 3 were reported as collaborating in interview 1.
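Since ego status is stored in `nodes`, ego-alter and alter-alter ties can be told apart by looking up each endpoint on the composite key. The sketch below assumes small hypothetical `nodes` and `collaboration` tables (values invented for illustration) and treats each row as an undirected tie, which the description above suggests but does not state explicitly. The helper functions `key()` and `adjacency_for()` are hypothetical, introduced only for this illustration.

```r
# Hypothetical toy tables following the schema described above
nodes <- data.frame(
  id_interview = c(1, 1, 1, 1),
  id_node      = c(1, 2, 3, 4),
  is_ego       = c(1, 0, 0, 0)
)
collaboration <- data.frame(
  id_interview = c(1, 1, 1, 1),
  from         = c(1, 1, 1, 2),
  to           = c(2, 3, 4, 3)
)

# Hypothetical helper: combine interview and person ids into one lookup key
key <- function(interview, node) paste(interview, node, sep = ":")

# Named logical vector: is the person with a given key the ego?
ego_flag    <- setNames(nodes$is_ego == 1, key(nodes$id_interview, nodes$id_node))
from_is_ego <- ego_flag[key(collaboration$id_interview, collaboration$from)]
to_is_ego   <- ego_flag[key(collaboration$id_interview, collaboration$to)]

# Ties touching the ego are ego-alter ties; all others are alter-alter ties
collaboration$tie_type <- ifelse(from_is_ego | to_is_ego, "ego-alter", "alter-alter")

# Hypothetical helper: symmetric 0/1 adjacency matrix for one interview
adjacency_for <- function(nodes, collaboration, interview) {
  ids <- as.character(nodes$id_node[nodes$id_interview == interview])
  m <- matrix(0L, nrow = length(ids), ncol = length(ids), dimnames = list(ids, ids))
  e <- collaboration[collaboration$id_interview == interview, ]
  for (i in seq_len(nrow(e))) {
    a <- as.character(e$from[i])
    b <- as.character(e$to[i])
    m[a, b] <- 1L  # ties are treated as undirected,
    m[b, a] <- 1L  # so both cells are set
  }
  m
}

adj1 <- adjacency_for(nodes, collaboration, interview = 1)
```

Keying the lookup on both `id_interview` and `id_node` matters because person identifiers are only unique within an interview.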
Data about the resources that respondents (egos) and their collaborators (alters) contributed to each collaboration was coded from the transcripts. The data is provided in the table `resources`, which has `r nrow(resources)` rows and the following `r ncol(resources)` columns:
- `id_interview` -- Interview identification number.
- `from` and `to` -- Person identification numbers (within each interview) referencing the `id_node` variable from the `nodes` table.
- `code` -- A textual code identifying the type of resource contributed by researcher `from` to the collaboration with researcher `to`.

Resources engaged in collaborations (variable `code`) were coded with a coding scheme covering different elements of the research process across disciplines. The scheme consists of `r nrow(reconqdata:::codes)` codes such as:
The complete list of codes, together with examples of coded interview fragments, is presented in Appendix \@ref(resource-inventory).
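Because resource data was collected for ego-alter dyads only, each row of `resources` should have the ego at exactly one end, and the direction of the contribution (ego to alter, or alter to ego) follows from which end is `from`. The sketch below assumes small hypothetical `nodes` and `resources` tables; all values, including the resource codes `"theory"`, `"experiment"`, and `"funding"`, are invented placeholders rather than codes from the actual scheme. The helper `key()` is likewise hypothetical.

```r
# Hypothetical toy tables; the resource codes are invented placeholders,
# not codes from the actual coding scheme
nodes <- data.frame(
  id_interview = c(1, 1, 1),
  id_node      = c(1, 2, 3),
  is_ego       = c(1, 0, 0)
)
resources <- data.frame(
  id_interview = c(1, 1, 1, 1, 1),
  from         = c(1, 2, 1, 3, 3),
  to           = c(2, 1, 3, 1, 1),
  code         = c("theory", "experiment", "theory", "funding", "experiment")
)

# Hypothetical helper: person ids are unique only within an interview
key <- function(interview, node) paste(interview, node, sep = ":")
ego_flag    <- setNames(nodes$is_ego == 1, key(nodes$id_interview, nodes$id_node))
from_is_ego <- ego_flag[key(resources$id_interview, resources$from)]
to_is_ego   <- ego_flag[key(resources$id_interview, resources$to)]

# Ego-alter dyads only: exactly one endpoint of every row is the ego
only_ego_alter <- all(xor(from_is_ego, to_is_ego))

# Direction of the contribution
resources$direction <- ifelse(from_is_ego, "ego_to_alter", "alter_to_ego")

# Cross-tabulate resource codes by direction (rows: codes, columns: direction)
code_by_direction <- table(resources$code, resources$direction)
```

Separating the two directions allows comparing what respondents say they bring into collaborations with what they say they receive from collaborators.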
Descriptive statistics: frequencies of gender by scientific degree, separately for egos and alters:
nodes %>%
  count(female, scidegree, is_ego) %>%
  tidyr::spread(is_ego, n) %>%
  knitr::kable(
    caption = "Frequencies of gender and scientific degree."
  )
Collaboration and resource networks from one of the interviews:
The data is available in the GitHub repository at https://github.com/recon-icm/reconqdata as an R package, with the data files also accessible in CSV format. Users can either install the package and use the data in R, or download the CSV files using the URLs provided in the README file.