SDAtools: SDAtools: A toolkit for SDA

SDAtools: A toolkit for SDA

Daniel Wells 2018-05-09

This vignette is a tutorial for using SDAtools to prepare data for SDA, run SDA, and analyse the results from SDA. Note that sda itself is not included in this package and should be downloaded and installed via https://jmarchini.org/sda/

Once SDA is installed we can run SDA from within R. As a toy dataset for this vignette I simulate data from the SDA model.

# Install and Load SDAtools
# remotes::install_github("marchinilab/SDAtools")
library(SDAtools)

# Simulate data
set.seed(42)
data <- simulate_2D_data()
export_data(data$Y, name = "simulated.data", path = "../data-raw/")

# Run SDA
run_SDA(out = "../data-raw/simulation_results",
        data = "../data-raw/simulated.data",
        max_iter = 200,
        save_freq = 200)

We can easily load the SDA results into an R object.

# Load the results
results <- load_results(results_folder = "../data-raw/simulation_results", data_path = "../data-raw/")
str(results)

## List of 7
##  $ scores           : num [1:100, 1:10] 1.4157 -0.7334 -0.0113 0.0596 -0.4093 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:10] "V1" "V2" "V3" "V4" ...
##  $ n                :List of 5
##   ..$ individuals     : int 100
##   ..$ components      : int 10
##   ..$ omics           : int 1
##   ..$ context_matrices: int 0
##   ..$ iterations      : chr "200"
##  $ loadings         :List of 1
##   ..$ : num [1:10, 1:500] 0.004438 0.010356 -0.001241 -0.023387 -0.000376 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : NULL
##   .. .. ..$ : chr [1:500] "Naa50" "Knstrn" "Ttc23l" "Slco6c1" ...
##  $ pips             :List of 1
##   ..$ : num [1:10, 1:500] 0.0517 0.0801 0.0345 0.1389 0.0312 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : NULL
##   .. .. ..$ : chr [1:500] "Naa50" "Knstrn" "Ttc23l" "Slco6c1" ...
##  $ free_energy      : num [1, 1:20] -81198 -80414 -79989 -79867 -79852 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:20] "V1" "V2" "V3" "V4" ...
##  $ pip_fraction     : num [1:200] 0.0002 0.781 0.7974 0.795 0.797 ...
##  $ command_arguments:Classes 'data.table' and 'data.frame':  1 obs. of  14 variables:
##   ..$ data              : chr "../data-raw/simulated.data"
##   ..$ N                 : chr "100"
##   ..$ out               : chr "../data-raw/simulation_results"
##   ..$ num_comps         : chr "10"
##   ..$ max_iter          : chr "200"
##   ..$ save_freq         : chr "200"
##   ..$ free_freq         : chr "10"
##   ..$ ignore_missing    : chr "FALSE"
##   ..$ remove_zero_comps : chr "TRUE"
##   ..$ num_blocks        : chr "1"
##   ..$ num_openmp_threads: chr "1"
##   ..$ eigen_parallel    : chr "FALSE"
##   ..$ save_everything   : chr "FALSE"
##   ..$ debug             : chr "FALSE"
##   ..- attr(*, ".internal.selfref")=<externalptr>

In this case we also have the true values of the scores so we can compare those to the SDA inferred scores.

# Compare original and recovered score vectors
check_simulation_scores(data = data, results = results)

We can check for convergence by seeing if the free energy and % PIP \<0.5 have stabalised.

check_convergence(results)

We might want to check the overall distribution of loadings, scores, or PIP.

loading_distribution(results)

scores_distribution(results)

plot_maximums(results)

plot_scree(results)

PIP_distribution(results)

PIP_component_distribution(results, 2)

PIP_threshold_distribution(results)

If we are interested in a specific gene we could check which components have the highest loadings for that gene. We could then check which are the highest genes in that component.

# Which component has the highest loading for gene 1
highest_components(results, variable_name =  "Xrn1")

# Which genes have the highest loading in component 4
highest_genes(results, component = 4)

We can also plot the gene loadings by genomic location

# Plot loadings along the genome
genome_loadings(results$loadings[[1]][1,])

# Plot individual scores
plot_scores(results, 1)

marchinilab/SDAtools documentation built on Jan. 31, 2020, 3:51 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

marchinilab/SDAtools
SDAtools: A toolkit for SDA

vignettes/vignette.md
In marchinilab/SDAtools: SDAtools: A toolkit for SDA

SDAtools: A toolkit for SDA

Running SDA within R

Load Results

Check Convergence

Plot Distributions

Check Specific Gene / Component

Manhatten-esq plot for loadings

R Package Documentation

Browse R Packages

We want your feedback!

marchinilab/SDAtools SDAtools: A toolkit for SDA

vignettes/vignette.md In marchinilab/SDAtools: SDAtools: A toolkit for SDA

SDAtools: A toolkit for SDA

Running SDA within R

Load Results

Check Convergence

Plot Distributions

Check Specific Gene / Component

Manhatten-esq plot for loadings

R Package Documentation

Browse R Packages

We want your feedback!

marchinilab/SDAtools
SDAtools: A toolkit for SDA

vignettes/vignette.md
In marchinilab/SDAtools: SDAtools: A toolkit for SDA