knitr::opts_chunk$set(fig.pos = 'H')
\newpage \tableofcontents
\newpage
Research can be a highly competitive environment when it comes to funding or enhancing ones reputation in the academic field. It is plausible that researchers are under pressure and have an incentive to only submit results which favor them and are more likely to be published in famous journals. The journal editors themselves may also deal with an incentive to publish more favorable findings because they have to compete with other journals over the amount of citations and the journals have to be financed as well. In combination, this may lead to the so called file drawer problem (Rosenthal 1979), selective publication or publication bias. This bias does not only occur in the field of economics. It is a wide spread problem which also concerns life sciences, medicine and health care research and other social sciences. Depending on the field of interest, holding back unfavorable results can lead to the distortion of the literature, may have consequences for the health of citizens, using scarce resources for wrong investments or have a negative impact on research and teaching practices. Fanelli (2011) shows in a study looking at 4600 papers published between 1990 and 2007 that negative results are disappearing from being published and his research suggests an increase in publication bias over these years. According to him, the share of papers finding significant support for their initial hypotheses increased by 22%. By the ethic code of researchers and journal editors, they are bound to publish both favorable as well as unfavorable results in a non selective manner. Therefore, one might argue that this decrease in unfavorable results is just the outcome of getting better at formulating hypotheses. The competition between researchers could potentially lead to the exclusion of poorly formulated hypotheses and experiments which ultimately leads to an increase in positive results (Joober et al, 2012). In contrast, replicating existing results often leads to completely different estimates with smaller magnitudes or even opposite signs which would no support this hypothesis. The publication bias is most certainly a topic which needs to be addressed. There are several methods to tackle the problem. The documented R package implements the method documented by Andrews and Kasy (2019).
Instead of going into detail about the theory and statistics Andrews and Kasy (2019) use in their paper, I will use an illustrative example they provide to show how a publication bias arises, how it can be identified and ultimately, how one can correct for it.
To get a feeling for how a publication bias can arise, we look at the behavior of journals which are responsible for publishing the papers they are provided with. Journals usually receive a lot of papers with different outcomes and they need to decide which papers will be published. Let us assume that the reported estimates $X$ are normally distributed i.e. $X \sim N(Θ, Σ^2)$ where $Θ$ are the true treatment effects, and $Σ$ are their corresponding standard errors. $Z$ denotes the standardized estimates given by $X/\Sigma$. Additionally, each study examines different treatments.
In this example, the journal publishes significant results at the 5% significance level with probability 1 ($p(z) = 1$) and insignificant results with probability 0.2 ($p(Z) = 0.2$, with $Z ∈ [-1.96, 1.96]$). This indicates a clear preference for publishing significant results. Namely, significant result are 5 times more likely to be published than insignificant results. Because of this behavior, it is clear that the published results $X$ over-estimate the magnitude of the true treatment effect $Θ$. The same holds for the confidence bounds. Small values are underestimated, while large values are overestimated. Keep in mind, selective publication may not only arise from the significance of the estimates, but could rather depend on the their signs. This behavior is currently not implemented in the package at hand. Additionally, publication decisions are based on both the researchers as well as the journals decision. Andrews and Kasy do not try to separate those two decision dimensions but rather look at them as one decision only. Therefore, the above example also holds for the researchers or even for the journals and the researchers at the same time.
This chapter summarizes the methods used to identify and correct for the publication bias following the approach by Andrews and Kasy (2019). In their paper, they differ between two approaches of identification. The first approach uses replication studies while the second one uses meta studies to determine the publication probability. Given a certain publication probability, they propose a method to construct the median unbiased estimators (i.e. the corrected estimates) and confidence bounds.
The procedure of identifying the publication probability differs between the two approaches. A replication study uses the same methods as the original study and applies them to a new sample from the same population. While using the package, we need the original estimates as well as the replication estimates to conduct the given analysis. Additionally, it is assumed that the publication probability only depends on the original estimates.
On the other hand, a meta study is a study which collects multiple estimates from studies which cover more or less the same topic. One crucial assumption regarding meta studies is the independence between the different estimates. In contrast to the replication studies, meta studies only have one set of estimates as input for the estimation of the publication probability.
To illustrate the identification approach using replication studies, the illustrative example from the beginning is continued. Remember, we assumed that $p(Z) = 1$ when $|Z| > 1.96$ , and $p(Z) = 0.2$ otherwise. A further simplification we have to take into account is that the original as well as the replication estimates both have the same variance of 1 ($ΣRep = Σ = 1$). Figure 1 shows how identification using replication studies is possible. The left panel shows 100 random draws $(Z, ZRep)$, where $ZRep$ denotes the standardized replication estimates (i.e. $XRep/\Sigma Rep$). Draws which are insignificant on the 5% level are marked as light blue $(|Z| ≤ 1.96)$ whereas draws which are significant are drawn in a darker blue $(|Z| > 1.96)$.
The right panel shows the draws $(Z, ZRep)$ which are published. These are exactly the same draws as before but now 80% of the statistically insignificant results are deleted because the journal only publishes 20% of the statistically insignificant results. That is why a lot of the light blue points vanish. To proof that some selective publication exists, we can take a closer look at the regions $A$ and $B$ in the right panel of Figure 1. In $A$, we deal with the situation that $Z$ is statistically significant while $ZRep$ is not. Obviously, we deal with the exact opposite in region $B$. At this point we have to mention that we assumed symmetry in the data generating process at the beginning. Because of this assumption, $Z$ or $ZRep$ should fall in either region with the same probability. Due to the fact that we clearly observe more data points in region A than in region B, we have evidence for selective publication. Additionally, the deviation from symmetry ($Z = ZRep$) exactly identifies the publication probability $p(.)$.
knitr::include_graphics("scatter_illu_rep.pdf")
Using the same set up as in the identification approach using replication studies and adding the crucial assumption that $Σ$ is independent of $Θ$ for true effects, we can take a closer look at Figure 2 which shows the identification approach using meta studies. Instead of dealing with original and replication estimates and having them on the X-and Y-axis, we now have the original estimates $X$ on the X-axis and the corresponding standard errors $Σ$ on the Y-axis. As before, the left panel shows 100 random draws $(X, Σ)$. Draws which are insignificant on the 5% level are marked as light blue $(|X/Σ| ≤ 1.96)$ whereas draws which are significant are drawn in a darker blue $(|X/Σ| > 1.96)$. The right panel shows the draws $(X,Σ)$ which are published. Again, these are exactly the same draws as in the left panel but now 80% of the statistically insignificant results are deleted because the journal only publishes 20% of the statistically insignificant results. Instead of looking at two regions as in the case of replication studies, we now take a closer look at two different values of $Σ$ in the right panel. These two values are at $Σ = 0.5$ and $Σ = 1.25$, highlighted by $A$ and $B$. Since we have assumed independence between $Σ$ and $Θ$, the distribution of $X$ for latent studies given greater values of $Σ$ is an inflated version of the same distribution for smaller values of $Σ$. Thus, to the degree that the same is not true for the distribution of the published estimates $X$ given $Σ$, this has to be due to selectivity in the publication process. In this example, statistically insignificant observations are absent for greater values of $Σ$. Since the likelihood of publication is higher if the result is significant, the estimated values $X$ tend to be larger on average for greater values of $Σ$. Therefore, we expect some positive correlation between the two. Considering the conditional distribution of $X$ given $Σ$ changes with $Σ$, we are again able to identify the publication probability.
knitr::include_graphics("scatter_illu_meta.pdf")
The maximum likelihood estimation is the basic estimation procedure reported by Andrews and Kasy (2019). The econometrics behind identifying the publication probability as well as its estimation is omitted in this paper as it is an extensive procedure. If one is interested, further explanations can be found in the paper and mainly in the online appendix by Andrews ans Kasy (2019). Because this approach heavily relies on the parametric assumptions on the distribution of the true treatment effects $Θ$, they provide a generalized method of moments estimator in their online appendix. The developed package always assumes a gamma distribution with a shape and scale parameter as the distribution for the true effects. The advantage of GMM is, that these parametric assumptions can be dropped and one only has to assume some functional form for the publication probability. They do this to show some robustness of their results. Meaning, that even without assuming parametric specifications, similar results can be obtained. Their result confirm this. Although moment-based methods lead to less precise results, their main findings still hold. Because it seems that the GMM estimation is a good alternative to the likelihood estimation, both approaches were implemented in the package. For further information on their method of moments approach, please check the online appendix in Andrews and Kasy (2019).
After computing the publication probability, Andrews and Kasy (2019) show how one can derive corrected estimators and confidence sets. Putting it as simple as possible, the selective publication weights the distribution of $Z$ by the publication probability. To obtain the corrected estimates and confidence sets, we only need to correct for this weighting.
In their approach, they define the distribution function for published standardized results $Z$ given their true effect. By using procedures applied in Andrews (1993) and Stock and Watson (1998), and inverting the distribution function, they are able to calculate the median unbiased estimator (i.e. corrected estimates) as well as the equal-tailed confidence sets which fully correct for the bias produced by the selective publication. To get an intuition for their result, the illustrative example is continued. Comparing the median unbiased estimator with the original and uncorrected estimates (grey line) in Figure 3, we can clearly see that the median unbiased estimator lies below the original estimator, given $Z$ is close to the cutoff 1.96. The cutoffs are shown with the dotted grey lines. With an increasing (decreasing) $Z$, we observe that the median unbiased estimator converges to the original estimator. The same observation holds for the confidence intervals. In the example in Figure 3, the corrected estimates are nearly equal the original estimates if $Z > |5|$.
knitr::include_graphics("correction_illu.pdf")
The package is currently available on GitHub. Running the code below will directly install the package with all its dependencies on your local machine.
# Installing the package directly from github devtools::install_github("t-sager/pubias") # If the package vignette should be downloaded as well devtools::install_github("t-sager/pubias", build_vignettes = TRUE) # Check vignette utils::vignette("pubias") # Load package into library library(pubias)
There are two main functions in the package. The first pubias_meta() leads to the identification of and correction for the publication bias in meta studies while pubias_replication() does the same for replication studies. The most simple syntax to get a result is the following:
# Replication Studies pubias_replication(data) # Meta Studies pubias_meta(data)
Additionally, there are four functions which are, depending on their specification, called by the two functions above. The four functions are as follows: mle_meta(), mle_replication(), gmm_meta() and gmm_replication(). For example: If the base specification for the meta studies pubias_meta(data, studynames) is used, then the function mle_meta() is called to calculate the publication probability by MLE. On the other hand, if one executes pubias_meta(data, studynames, GMM = TRUE) the function gmm_meta() will be called.
Because the input for these functions heavily depends on the input from the two main functions, I will not go into further detail about the arguments of these functions. One can also take a closer look at the built in helpfile with ?mle_meta() etc.
We have seen that the function works even without specifying more than the data containing the estimates and the standard errors. Now we will take a closer look at the different arguments which one can enter into the function. Keep in mind: the arguments for both functions are the same. The only difference is, that in the case of replication studies, the data used has additional columns because we also have to account for the replication estimates.
To check the different arguments, one can also take a closer look at the helpfiles for the two functions with ?pubias_meta() or ?pubias_replication().
The following specifications are currently possible:
pubias_meta( data, studynames, symmetric = TRUE, sign_lvl = 0.05, GMM = FALSE, print_plots = FALSE, print_dashboard = FALSE )
data: In the case of meta studies, the data consists of a n x 3 matrix where the first column contains the original estimates, the second column the associated standard errors and in the third column a cluster ID going from 1 to k, where k is the number of studies used in the meta-study. Keep in mind, k is not necessarily equal to the number of estimates because there could be several estimates coming from one single study.
If the function for the replication studies is used, the data needs to consist of a n x 4 matrix where the first (third) column contains the standardized original estimates (replication estimates) and the second (fourth) column the associated standard errors, where n is the number of estimates.
studynames: This argument is optional. Ideally a vector of type character containing all the Studynames of size n in the same order as the data argument.
symmetric: If set to TRUE, the publication probability is assumed to be symmetric around zero (default). If set to FALSE, asymmetry is allowed.
sign_lvl: A value indicating the significance level at which the analysis should be done. Ultimately leads to the threshold (z-score) for the steps of the publication probability. By default, the significance level is set at 5%, hence 0.05. Before setting the significance level, one should take a look at the distribution of the normalized estimates to determine a reasonable pick for the significance level. Because the 5% level is the most used to determine significant results, setting it at 5% will work in most cases.
GMM: If set to TRUE, the publication probability will be estimated via GMM. By default, it is set to FALSE which uses the maximum likelihood method for estimation.
print_plots: If set to TRUE, descriptive plots as well as correction plots are printed into the working directory in .pdf format.
print_dashboard: If set to TRUE, additionally to the .pdf plots, a dashboard with the same charts in a dynamic format (plotly charts) is produced. The dashboard is saved in the working directory. Only possible if print_plots is set to TRUE.
The output of the functions is pretty simple. They return a list object called pubias_results. If only the base specification is run, the list contains two elements:
Results: This element contains the three results Psihat which is the estimated publication probability, Varhat its variance and se_robust containing the robust standard errors for the publication probability. For interpretation: If the function yields Psihat = 0.25, the publication of insignificant results is only 25% as likely as the publication of significant results. Or put differently, the publication of a significant result is 4 times more likely than an insignificant result (at the given significance level). But if Psihat = 1, we deal with a situation where we see no evidence for a publication bias. Significant results are just as likely to be published as insignificant results.
Corrected Estimates: This element reports the result from the publication bias correction. First, it contains the original estimates and the median unbiased estimates (original, adj_estimates) as well as the corrected 95% confidence bounds (adj_L, adj_U). In addition, the Bonferroni corrected 95% confidence bounds are reported (adj_LB, adj_UB). If you want to lean more about the Bonferroni correction, please check Shinichi (2004).
If the functions are run with the option to print the plots and the dashboard, the result list also contains the ggplot objects for the descriptive as well as the correction plots. The plots will additionally be printed in .pdf format into the current working directory. If specified, a HTML-Dashboard with dynamic plots will be exported as well. One can also look at the plots by calling the following lines of code.
# Display the scatterplot pubias_result$`Descriptive Plots`$scatter # Display the correction plot pubias_result$`Correction Plots`$CorrectionPlot
To further illustrate the functionality of the package, we apply it to a replication study as well as a meta analysis. These are the same applications used by Andrews and Kasy (2019).
This application uses data from a study by Camerer et al. (2016). They replicate 18 economic laboratory experiment papers which were published between 2011 and 2014. The data is well suited for this application because it replicates estimates from papers which come from the same area of research and are all published in well known journals. Additionally, it seems likely that Camerer et al. would have published their results regardless of their significance. This is consistent with the assumption stated at the beginning: publication selection only arises based on the original estimates and not on the replication estimates. Ultimately, the dataset includes the original and replication estimates with their associated standard errors. This leads to a matrix with the dimensions 18 by 4.
Because the datasets for the applications are part of the package itself, it is really easy to replicate the results. The code below describes how the package is used. Of course, one can load other estimates and studynames via read.csv() or similar functions.
# Import data data <- data_econ # Has to be matrix class(data) dim(data) # 18x4 matrix # Import Studynames studynames <- studynames_econ # Has to be character class(studynames) # Apply the main function pubias_replication( data, studynames, GMM = FALSE, symmetric = TRUE, sign_lvl = 0.05, print_plots = TRUE, print_dashboard = TRUE )
After running the function, we can take a closer look at the results. The following figures correspond to the figures which are printed if one sets print_plots = TRUE. Before turning to the estimation result, let us take a closer look at the descriptives.
The histogram on the left side of Figure 4 shows the distribution of the original published standardized estimates. There is a considerable jump in the density right around the z-score of 1.96 which corresponds to the significance level of 5%. As Andrews and Kasy (2019) show in their proofs, this directly corresponds to a jump in the publication probability at the same cutoff. On the right hand side of Figure 4, we see the same type of plot as in Figure 1. Following the same interpretation, we can clearly see some evidence for publication selectivity.
knitr::include_graphics("scatter_hist_econ.pdf")
The MLE estimation leads to a publication probability of about 2.8%. This means the likelihood of publication is much lower if the results are insignificant. In other words: significant results are more than 30 times more likely to be published than insignificant results at the 5% significance level. If we run the asymmetric specification we get a similar result with a publication probability of 3.7%. We can also reject the null hypothesis that Psihat = 1, meaning, we are actually dealing with a publication bias.
It is important to note, that if we run this application with the GMM approach, we yield a really small publication probability which reports a negative sign. There are also big differences if the MLE model is run with different significance levels of 1% or 10%. This may make sense since there is only a big jump in the publication probability at 5%. This is to show, that the result does not seem to be robust if we exclude the parametric assumptions or adjust the significance level.
In the next step we correct for the said bias based on the publication probability of 2.8%. The results are summarized in the following two figures. Figure 5 plots the original, adjusted as well as the replication estimates. One can see that the adjusted estimates somewhat follow the replication estimates and are much smaller than the original estimates. Looking at the confidence sets of the adjusted and original estimates, we can see a clear difference. While only 2 of the original estimates confidence bounds included 0, adjusting for the publication bias leads to 12 confidence bounds containing 0. Therefore, the adjustment leads to substantially less significant results. Figure 6 corresponds to Figure 3 but we additionally plot the Bonferroni corrected confidence bounds. The interpretation stays the same. The corrected estimates lie clearly below the original estimate with the furthest distance exactly at our chosen cutoff. Again, the difference shrinks as the original estimates get larger or smaller.
knitr::include_graphics("org_adj_rep_econ.pdf")
knitr::include_graphics("correction_plot_econ.pdf")
The second application uses a meta-study by Croke et al. (2016). They look at the effect of a drug for deworming on the body weight of children. The authors collect a total of 22 estimates from 20 different studies. These 22 estimates together with their associated standard errors build the main data input for our function pubias_meta(). Because there are of multiple estimates for some of the studies (in this case two studies report two estimates), one has to cluster the standard errors for the study-ID. This is why the study-ID has to enter the input data as well. Contrary to the first application, we use the GMM approach in the second application.
As stated before, the data for the two documented applications can be loaded directly from the package as shown below. Of course, one can load other estimates and studynames via read.csv() or similar functions.
# Import data data <- data_dew # Has to be matrix class(data) dim(data) # 22x3 matrix # Import Studynames studynames <- studynames_dew # Has to be character class(studynames) # Apply the main function pubias_meta( data, studynames, GMM = TRUE, symmetric = TRUE, sign_lvl = 0.05, print_plots = TRUE, print_dashboard = TRUE )
As in the application for replication studies, we first take a look at the descriptives. The histogram in Figure 7 shows the density of the standardized estimates (i.e. $X/\Sigma$). We see a clear jump in the density around 0 which suggest evidence for selection for positive estimates. In the current build, there is no way to check if there exists a publication bias based on the sign of the estimates. This is why the publication probability is still calculated at the cutoff of 1.96.
On the right panel of Figure 7 we see a similar plot as in Figure 2. Applying the same logic, we do not clearly see evidence for selective probability, i.e. there is no clear positive correlation between $X$ and $\Sigma$. But because the amount of estimates is really small, we can not draw any final conclusions from the descriptives.
knitr::include_graphics("scatter_hist_dew.pdf")
Looking at the estimation results indicates a publication probability of about 25% (42% for the MLE approach). This is suggesting that statistically significant results are 4 times more likely to be included in the meta-study by Croke et al. (2016). Although this result seems decisive, Andrews and Kasy (2019) calculate different specifications for this application and show, that the results can differ a lot. They even get a result, suggesting that insignificant results are more likely to be published than significant ones. Therefore, we can not reject the null hypothesis that Psihat = 1. Consequently, we can not draw any final conclusions about the selective publication either.
Let us still take a look at the correction plots. In Figure 8 we compare the adjusted with the original estimates. It is obvious that estimates closer to the cutoff of 1.96 are adjusted quite a bit more than the other estimates. Generally, the adjusted estimates are drawn closer to zero, which would also mean that with the adjustment, we get less significant results.
Again, in Figure 9, we can see the biggest difference between the unbiased estimates and the original estimates around the cutoff 1.96. Above an estimation of 4, the adjusted equal the original estimates. The same holds for the estimates close to 0.
knitr::include_graphics("org_adj_dew.pdf")
knitr::include_graphics("correction_plot_dew.pdf")
\newpage
Currently, the package covers the base specification of the procedure proposed by Andrews and Kasy. In their paper, they propose further specifications like allowing for more than one cutoff, differentiating between numerical and analytical integration or even controlling for certain journals the papers were published in. One main functionality which the package currently lacks is the option to check for a publication bias based on the sign of the estimates. As we have seen in the meta-study application, publication might depend on the fact that estimates are positive and not only on their significance. All of these extensions could be added to the package in the future.
Primarily, the aim was to build the package as simple as possible. The user should be able to input a simple data matrix and get a reasonable result which can be easily interpreted. After conducting meta analyses or replicating a lot of different studies, there is most likely not much time left to check for something like the publication bias. This package provides an easy to use and quick way to do some robustness checks. Still, keep in mind that the package makes a lot of assumptions. For example, the assumption that the true effects are gamma distributed may not hold for all applications. It is therefore important to take a close look at the data and maybe run several specifications. As shown in the applications, results can differ substantially.
Besides the contents of package, there are obviously alternative approaches to deal with the publication bias. Andrews and Kasy discuss some of them in their paper. One of the most popular techniques are the so called meta-regression introduced by Card and Krueger (1995) and Egger et al. (1997) which regress the effect size of a study on characteristics of the study to explain the heterogeneity of treatment effects between different studies and therefore try to identify a publication bias. Andrews and Kasy also touch on the topics of manipulation or p-Hacking and discuss the question if results should even replicate or if it is completely normal if they deviate from the original estimates. In conclusion, the procedure implemented in the package is by far not the only method of dealing with a publication bias. The problem was identified a long time ago and there is still a lot of ongoing research regarding this topic.
\newpage
Andrews, Donald W. K. 1993. “Exactly Median-Unbiased Estimation of First Order Autoregressive/ Unit Root Models.” Econometrica 61 (1): 139–65.
Andrews, Isaiah, and Maximilian Kasy. 2019. "Identification of and Correction for Publication Bias." American Economic Review, 109 (8): 2766-94.
Cabin, Robert J., and Randall J. Mitchell. 2000. “To Bonferroni or Not to Bonferroni: When and How Are the Questions.” Bulletin of the Ecological Society of America. vol. 81, no. 3. pp. 246–248.
Croke, Kevin, Joan Hamory Hicks, Eric Hsu, Michael Kremer, and Edward Miguel. 2016. “Does Mass Deworming Affect Child Nutrition? Meta-Analysis, Cost-Effectiveness, and Statistical Power.” NBER Working Paper 22382.
Camerer, Colin F., Anna Dreber, Eskil Forsell, Teck-Hua Ho, Jürgen Huber, Magnus Johannesson, Michael Kirchler, et al. 2016. “Evaluating Replicability of Laboratory Experiments in Economics.” Science 351 (6280): 1433–36.
Egger, Matthias, George Davey Smith, Martin Schneider, and Christoph Minder. 1997. “Bias in Meta-Analysis Detected by a Simple, Graphical Test.” BMJ 315: 629–34.
Fanelli D. 2012 "Negative results are disappearing from most disciplines and countries." Scientometrics. 90:891–904.
Joober R, Schmitz N, Annable L, Boksa P. 2012. Publication bias: what are the challenges and can they be overcome?. J Psychiatry Neurosci. 37(3):149-152.
Rosenthal, Robert. 1979. “The File Drawer Problem and Tolerance for Null Results.” Psychological Bulletin 86 (3): 638–41.
Shinichi Nakagawa. 2004. "A farewell to Bonferroni: the problems of low statistical power and publication bias." Behavioral Ecology. Volume 15, Issue 6. Pages 1044–1045,
Stock, James H., and Mark W. Watson. 1998. “Median Unbiased Estimation of Coefficient Variance in a Time-Varying Parameter Model.” Journal of the American Statistical Association 93 (441): 349–58.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.