The ManyLabs2 data analysis strategy adheres to three principles that maximize research transparency:
Principle of Equality: All data should be treated equally by the code. That is, the code should do its job generating results while at the same time being as naive as possible to the particular facts of the study being analysed. This reduces any chance of bias with respect to the outcomes of a certain dataset or a particular study. If it is necessary to add study-specific code, the second principle should be regarded.
Principle of Transparency: All operations that are crucial for obtaining an analysis result should be available for inspection by anyone who wishes to do so. This should be possible without the help of the authors who generated the code. The operations concern the application of data filtering rules, computation of variables derived from original measurements, running an analysis, and constructing graphs, tables and figures. If full transparency is not possible, the third principle should be regarded.
Principle of Reproducibility: The most basic requirement for analysis results is that they should be reproducible given the original code and the original dataset. However, any new implementation of the same analysis strategy in a different context, or application of the code to a different dataset, e.g. a replication study, should not be problematic. That is, outcomes may differ between datasets, but this should not be attributable to any details of the code or the analysis strategy.
R as a parser of online code
The pre-registered ManyLabs2 protocol describes a number of analyses per replication study that can be categorised as Primary (target replications per site), Secondary (additional analyses per site, e.g. on subgroups), and Global (analyses on the entire dataset).
These promised analyses have all been implemented in R in a transparent way, and this implementation is now ready for an independent review.
Each row in the table represents an analysis; the columns contain specific information about that analysis:
Columns A through E are identifiers for study, analysis and slate.
Columns F and G contain R commands which will extract and label the columns from the dataset needed for the analysis.
Columns H and I contain filter instructions for cases and subsamples.
Columns J through L contain information about the nature of the analysis (Global, Primary, Secondary).
Columns M through O contain information for study inclusion in figures and tables.
Column P lists the name of an analysis-specific variable function (`varfun.`), which in most cases just reorganises the variables specified in previous columns so they can be passed to the analysis code. In some cases these functions perform specific calculations required by the original analyses.
Columns Q through V contain information about the statistical tests.
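To illustrate the role of the variable functions named in column P, here is a minimal, invented sketch (the column names and the function body are hypothetical, not taken from the package): a `varfun` simply reorganises columns of the extracted data into labelled vectors that the analysis code can consume.

```r
# Hypothetical sketch of a variable function (varfun); column names are invented.
varfun.example <- function(df) {
  list(
    x = df$condition1_response,  # assumed column name
    y = df$condition2_response   # assumed column name
  )
}

# Toy data standing in for an extracted slice of the dataset
toy <- data.frame(condition1_response = c(1, 2, 3),
                  condition2_response = c(4, 5, 6))
vars <- varfun.example(toy)
```

The returned list can then be passed directly to an analysis function, e.g. `t.test(vars$x, vars$y)`.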
The R package manylabRs contains the R functions that can read the information from this sheet and conduct analyses on the data.
There are several ways to install the package.
Use the code below to install the manylabRs package directly from GitHub:
```r
library(devtools)
install_github("ManyLabsOpenScience/manylabRs")
library(manylabRs)
```
First download the tarball, then install the package locally through the RStudio package installer:
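If you prefer the R console over the RStudio installer, the downloaded tarball can also be installed with `install.packages()`; the filename below is a placeholder for whatever version you downloaded.

```r
# Install a local source tarball; the filename here is a placeholder.
install.packages("manylabRs.tar.gz", repos = NULL, type = "source")
```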
The main function to inspect is `get.analyses()`. It takes one or more analyses (`studies`) from the `masteRkey` sheet and an indication of whether the analysis is:
global - will disregard the clusters in the data and use all valid cases for analyses; both primary and secondary analyses have a global variant.
primary - target analysis of a replication study, conducted for each lab separately.
secondary - additional analyses, conducted for each lab separately.
order - presentation-order analyses disregard the clusters in the data; each order is analysed separately.
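The distinction between global and per-lab analyses can be sketched with a toy example (this is not the package's code; the data and the mean comparison are invented for illustration):

```r
# Toy dataset: two labs, a few scores each
toy <- data.frame(
  source = c("lab_a", "lab_a", "lab_b", "lab_b"),
  score  = c(10, 12, 9, 11)
)

# global: disregard the lab clusters and analyse all valid cases at once
global_mean <- mean(toy$score)

# primary/secondary: run the same analysis for each lab separately
per_lab_means <- tapply(toy$score, toy$source, mean)
```

The same analysis code is applied in both cases; only the subset of rows it sees differs.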
Have a look at `saveConsole.R`, which calls the `testScript()` function and creates a log file with lots of info about the analysis steps.
IMPORTANT FOR REVIEWERS
You will have to point the function `get.analyses()` to where you downloaded these files:
The script will assume the data are in a subdirectory of a root directory given by the arguments:
```r
rootdir = "YOUR PATHNAME TO DATAFILES ROOTDIR"
indir   = list(RAW.DATA   = "DIRECTORY IN YOUR rootdir CONTAINING DATAFILES",
               MASTERKEY  = "",
               SOURCEINFO = "")
```
```r
MyRootDir    <- "~/Documents/GitHub/manylabRs/"
MyRawDataDir <- "random3rd"
data.names   <- list(Slate1 = "ML2.Slate1.Random3rdDE.rds",
                     Slate2 = "ML2.Slate2.Random3rdDE.rds")
```
The example below runs a global analysis for study 1 (`Huang.1`):
```r
df <- get.analyses(studies       = 1,
                   analysis.type = 1,
                   rootdir       = MyRootDir,
                   data.names    = data.names,
                   indir         = list(RAW.DATA   = MyRawDataDir,
                                        MASTERKEY  = "",
                                        SOURCEINFO = ""))
```
`df` contains two named lists:^[The names correspond to the analysis name in the `masteRkey` sheet.]
This list contains dataframes with the relevant variables for each analysis, but before the analysis-specific variable functions (`varfun`) are applied. There is a Boolean variable `case.include` which indicates whether a case is valid and should be included for analysis.
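A toy illustration of how such a Boolean inclusion flag works (the data here are invented; only the filtering pattern mirrors the `case.include` idea):

```r
# Toy raw data with an inclusion flag; only rows flagged TRUE enter the analysis
raw <- data.frame(
  score        = c(5, 7, NA, 9),
  case.include = c(TRUE, TRUE, FALSE, TRUE)
)

# Keep only valid cases
valid <- raw[raw$case.include, ]
```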
The dataframe in `aggregated` contains the data as it was analysed, after the `varfun` is applied.
The output contains descriptives, sample summary characteristics and a variety of effect size measures. It also contains the console output of the statistical test that was conducted:
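As an aside, one common standardised effect size in this family of measures is Cohen's d; a minimal, self-contained sketch (not the package's actual implementation) is:

```r
# Cohen's d for two independent groups, using the pooled standard deviation.
# This is a textbook formula, not code from manylabRs.
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  sp <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / sp
}

d <- cohens_d(c(1, 2, 3, 4), c(3, 4, 5, 6))
```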
The code of the function `get.analyses()` is not very readable because it includes a lot of error-checking, error-reporting, conditional statements and... well... because we are not professional software engineers, but scientists doing the best we can.
The function was created to handle a batch of many different analyses in one go and save the results to many different files. Here, we will skip the error-checks and file-saving and focus on the four major steps by taking the first analysis in the `masteRkey` sheet, `Huang.1`, and analysing the data from the `brasilia` source.
The four steps that are applied to all analyses in ManyLabs 2 are:
The first step is always to use the `masteRkey` spreadsheet to gather all the information needed to:
Extract the appropriate data (variables and cases) for this analysis and
Conduct the appropriate analysis on the extracted data.
```r
# NOTE: This example follows some (but not all!!) steps of the main function: get.analyses()

# Select analysis 1 [Huang.1] and source 'brasilia'
runningAnalysis <- 1
runningGroup    <- 'brasilia'

# Get information about the analysis to run
masteRkeyInfo <- get.GoogleSheet(data = 'ML2masteRkey')$df[runningAnalysis, ]

# Get the appropriate 'raw' dataset [Slate1 or Slate2]
ifelse(masteRkeyInfo$study.slate == 1, data(ML2_S1), data(ML2_S2))

# Organise the information into a list object
analysisInfo <- get.info(masteRkeyInfo, colnames(ML2.S1), subset = "all")

# Use analysisInfo to generate a chain of filter instructions to select valid variables and cases
filterChain <- get.chain(analysisInfo)
```
Let's have a look at the `filterChain` object:
It contains two fields:
`$df`: a dplyr command for selecting the appropriate variables and (if applicable) filtering on source characteristics indicated by the column `masteRkey$source.include` (e.g. whether the data were collected on-line). For the present analysis we do not have to filter on source characteristics.
`$vars`: for selecting valid rows.
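The stored filter instructions are applied by evaluating them as text with `eval(parse(...))`. A toy, self-contained sketch of that pattern (the data frame and the filter string are invented; the real `filterChain$df` holds a dplyr command):

```r
# Toy raw data standing in for the real dataset
raw <- data.frame(
  source = c("brasilia", "brasilia", "madrid"),
  score  = c(1, 2, 3)
)

# A filter instruction stored as text, mimicking how filterChain$df is evaluated
filter_string <- 'subset(raw, source == "brasilia")'  # invented example filter
df.selected  <- eval(parse(text = filter_string))
```

Storing the filter as text in the `masteRkey` sheet is what lets the same generic code serve every study (Principle of Equality), at the cost of the indirection that `eval(parse(...))` introduces.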
```r
# Apply the filterChain to select appropriate variables from ML2.S1
df.raw <- eval(parse(text = paste("ML2.S1", filterChain$df)))

# Apply the filterChain to generate a list object that represents the design cells
df.split <- get.sourceData(filterChain, df.raw[df.raw$source %in% runningGroup, ], analysisInfo)

# Create a list object with data vectors and appropriate labels, that can be passed to the analysis function
vars <- eval(parse(text = paste0(masteRkeyInfo$stat.vars, '(df.split)', collapse = "")))
```
```r
# Get the parameters to use for the statistical analysis
stat.params <<- analysisInfo$stat.params

# Run the analysis listed in masteRkey column 'stat.test' using the data vectors in 'vars'
stat.test <- with(vars, eval(parse(text = masteRkeyInfo$stat.test)))
```
```r
# Return descriptives and summaries
describe <- get.descriptives(stat.test = stat.test, vars = vars, keytable = masteRkeyInfo)

# Generate output
ESCI <- generateOutput(describe = describe, runningGroup = runningGroup, runningAnalysis = runningAnalysis)
```
Together, these steps gather the information from the `masteRkey` on the analyses to run and apply the statistical test in `stat.test` to the data.