In josefin-werme/LAVA: Local Analysis of [co]Variant Association

Obtaining sample overlap from LDSC

To correct for potential sample overlap, LAVA uses the known or estimated sampling correlation (i.e. the phenotypic correlation that is due to sample overlap). If unknown, the intercept from cross-trait LDSC can be used to as an estimation of the sampling correlation (see Bulik-Sullivan et al. 2015 for details)

This tutorial will show you how you can create create a sampling correlation matrix from the results of cross-trait LDSC.

1. Perform cross-trait LDSC for all pairs of phenotypes (including each phenotype with itself)

see LDSC github page info on how to do this).

2. Add all the output to a single file (in bash)

```{bash, eval=F}

FILES=($(ls *_rg.log)) # assuming the output format is [phenotype]_rg.log N=$(echo ${#FILES[@]}) # and that all combinations of penotypes have been analysed

for I in ${FILES[@]}; do PHEN=$(echo $I | sed 's/_rg.log//')

    # subset log files to relevant output
    tail -n$(($N+4)) $I | head -$((N+1)) > $PHEN.rg     # (adapt as necessary)

    # add to single data set
    if [[ $I == ${FILES[0]} ]]; then
        cat $PHEN.rg > all.rg       # only including the header for the first phenotypes
    else
        cat $PHEN.rg | sed '1d' >> all.rg
    fi

done

#### 3. Extract the intercepts and create a sampling correlation matrix (in R)

```r
scor = read.table("all.rg",header=T)              # read in
scor = scor[,c("p1","p2","gcov_int")]             # retain key headers
scor$p1 = gsub("_munge.sumstats.gz","",scor$p1)   # assuming the munged files have format [phenotype]_munge.sumstats.gz
scor$p2 = gsub("_munge.sumstats.gz","",scor$p2)   # (adapt as necessary)

phen = unique(scor$p1)                  # obtain list of all phenotypes (assuming all combinations have been analysed)
n = length(phen)
mat = matrix(NA,n,n)                    # create matrix
rownames(mat) = colnames(mat) = phen    # set col/rownames

for (i in phen) {
        for (j in phen) {
                mat[i,j] = subset(scor, p1==i & p2==j)$gcov_int
        }
}

if (!all(t(mat)==mat)) { mat[lower.tri(mat)] = t(mat)[lower.tri(mat)] }  # sometimes there might be small differences in gcov_int depending on which phenotype was analysed as the outcome / predictor
mat = round(cov2cor(mat),5)                       # standardise
write.table(mat, "sample.overlap.txt", quote=F)   # save

josefin-werme/LAVA documentation built on July 4, 2024, 8:11 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com