README.md

MetaPipe logo

R build
status

MetaPipe: A High-Performance Computing Pipeline for QTL Mapping of Large Ionomic and Metabolomic Datasets

Overview

The goal of MetaPipe is to provide an easy to use and powerful tool capable of performing QTL mapping analyses.

Installation

You can install the released version of MetaPipe from CRAN with:

install.packages("MetaPipe")

And the development version from GitHub with:

# install.packages(c("hexSticker", "kableExtra", "qpdf", "remotes")
remotes::install_github("villegar/MetaPipe", build_vignettes = TRUE)

Example

Load raw data

For details about the data structure and extended documentation, see the vignette Load Raw Data.

vignette("load-raw-data", package = "MetaPipe")

Function call

MetaPipe::load_raw(raw_data_filename = "FILE.CSV", excluded_columns = c(...))

where raw_data_filename is the filename containing the raw data, both absolute and relative paths are accepted. Next, the argument excluded_columns is a vector containing the indices of the properties, e.g. c(2, 3, ..., M).

# F1 Seedling Ionomics dataset
ionomics_path <- system.file("extdata", 
                             "ionomics.csv", 
                             package = "MetaPipe", 
                             mustWork = TRUE)
ionomics <- MetaPipe::load_raw(ionomics_path)
knitr::kable(ionomics[1:5, 1:8])

| ID | SampleWeight | Ca44 | K39 | P31 | Li7 | B11 | Na23 | | :----- | -----------: | -------: | -------: | -------: | --------: | -------: | --------: | | E_199 | 79 | 32675.79 | 6051.023 | 2679.338 | 0.1159068 | 23.32975 | 9.372606 | | E_209 | 81 | 28467.95 | 5642.651 | 2075.403 | 0.0104801 | 27.31206 | 8.787553 | | E_035 | 81 | 27901.35 | 7357.856 | 2632.343 | 0.0561879 | 16.87480 | 14.369062 | | E_197 | 79 | 27855.36 | 5225.275 | 1761.725 | 0.0104453 | 25.34740 | 11.009597 | | E_016 | 79 | 27377.40 | 6141.001 | 2145.715 | 0.0172996 | 24.64500 | 6.999958 |

Replace missing data

For extended documentation, see the vignette Replace Missing Data.

vignette("replace-missing-data", package = "MetaPipe")

Function call

MetaPipe::replace_missing(raw_data = example_data, 
                          excluded_columns = c(2), 
                          # Optional
                          out_prefix = "metapipe", 
                          prop_na = 0.5, 
                          replace_na = FALSE)

where raw_data is a data frame containing the raw data, as described in Load Raw Data and excluded_columns is a vector containing the indices of the properties, e.g. c(2, 3, ..., M). The other arguments are optional, out_prefix is the prefix for output files, prop_na is the proportion of NA values (used to drop traits), and replace_na is a logical flag to indicate whether or not NAs should be replace by half of the minimum value within each variable.

# F1 Seedling Ionomics dataset
data(ionomics) # Includes some missing data
ionomics_rev <- MetaPipe::replace_missing(ionomics, c(1, 2))
ionomics_rev <- MetaPipe::replace_missing(ionomics, 
                                          excluded_columns = c(1, 2), 
                                          prop_na =  0.025)
#> The following trait was dropped because it has 2.5% or more missing values: 
#>  - Se78
ionomics_rev <- MetaPipe::replace_missing(ionomics, 
                                          excluded_columns = c(1, 2),
                                          replace_na =  TRUE)
knitr::kable(ionomics_rev[1:5, 1:8])

| ID | SampleWeight | Ca44 | K39 | P31 | Li7 | B11 | Na23 | | :----- | -----------: | -------: | -------: | -------: | --------: | -------: | --------: | | E_001 | 79 | 15894.22 | 5888.311 | 1743.118 | 0.0128699 | 18.66673 | 6.970224 | | E_002 | 93 | 13155.45 | 7013.400 | 2244.684 | 0.0119316 | 14.47693 | 5.866392 | | E_004 | 97 | 14182.51 | 7966.273 | 2311.057 | 0.0212316 | 14.71313 | 10.251955 | | E_005 | 82 | 22550.82 | 7514.089 | 2315.675 | 0.0233063 | 20.10630 | 11.773697 | | E_006 | 99 | 15982.76 | 7608.464 | 1995.193 | 0.0588128 | 12.97801 | 11.043837 |

Assess normality

For extended documentation, see the vignette Assess Normality.

vignette("assess-normality", package = "MetaPipe")

MetaPipe assesses the normality of variables (traits) by performing a Shapiro-Wilk test on the raw data (see Load Raw Data and Replace Missing Data. Based on whether or not the data approximates a normal distribution, an array of transformations will be computed, and the normality assessed one more time.

Function call

MetaPipe::assess_normality(raw_data = raw_data, 
                           excluded_columns = c(2, 3, ..., M), 
                           # Optional
                           cpus = 1, 
                           out_prefix = "metapipe", 
                           plots_dir = tempdir(), 
                           transf_vals = c(2, exp(1), 3, 4, 5, 6, 7, 8, 9, 10),
                           alpha = 0.05,
                           pareto_scaling = FALSE,
                           show_stats = TRUE)

where raw_data is a data frame containing the raw data, as described in Load Raw Data and excluded_columns is a vector containing the indices of the properties, e.g. c(2, 3, ..., M). The other arguments are optional, cpus is the number of cores to use, in other words, the number of concurrent traits to process, out_prefix is the prefix for output files, plots_dir is the output directory where the plots will be stored, transf_vals is a vector containing the transformation values to be used when transforming the original data, alpha is the significance level for the Wilk-Shapiro tests, pareto_scaling is a boolean flag to indicate whether or not to scale the traits to the same scale, and show_stats is a boolean flag to show or hide some general statistics of the normalisation process.

# F1 Seedling Ionomics dataset
data(ionomics) # Includes some missing data
ionomics_rev <- MetaPipe::replace_missing(ionomics, 
                                          excluded_columns = c(1, 2),
                                          replace_na =  TRUE)
ionomics_normalised <- 
  MetaPipe::assess_normality(ionomics_rev,
                             excluded_columns = c(1, 2),
                             transf_vals = c(2, exp(1)),
                             out_prefix = "README-ionomics",
                             plots_dir = "man/figures/",
                             pareto_scaling = FALSE)
#> Total traits (excluding all NAs traits):     21
#> Normal traits (without transformation):      2
#> Normal traits (transformed):                 4
#> Total normal traits:                         6
#> Total skewed traits:                         15
#> 
#> Transformations summary:
#>  f(x)      Value     # traits  
#>  log       2         3         
#>  root      e         1

# Extract normalised features
ionomics_norm <- ionomics_normalised$norm
ionomics_skew <- ionomics_normalised$skew

The function call to MetaPipe::assess_normality will print a summary of the transformations performed (if any), as well as an overview of the number of traits that should be considered normal and skewed. Next, we can preview some of the partial output of the normality assessment process:

# Normal traits
knitr::kable(ionomics_norm[1:5, ]) 

| ID | Ca44 | B11 | Na23 | Mg26 | Rb85 | Sr88 | | :----- | -------: | -------: | -------: | -------: | -------: | --------: | | E_001 | 15894.22 | 4.222397 | 2.042740 | 10.77021 | 1.555742 | 7.347059 | | E_002 | 13155.45 | 3.855684 | 1.917202 | 10.54095 | 2.058711 | 6.890243 | | E_004 | 14182.51 | 3.879033 | 2.354263 | 10.51931 | 2.198422 | 9.025915 | | E_005 | 22550.82 | 4.329576 | 2.477233 | 11.13450 | 1.791578 | 15.292360 | | E_006 | 15982.76 | 3.697997 | 2.419593 | 11.72734 | 2.229866 | 13.901449 |

# Skewed traits (partial output)
knitr::kable(ionomics_skew[1:5, 1:8])

| ID | K39 | P31 | Li7 | Al27 | S34 | Fe54 | Mn55 | | :----- | -------: | -------: | --------: | --------: | -------: | -------: | -------: | | E_001 | 5888.311 | 1743.118 | 0.0128699 | 3.845879 | 1152.944 | 27.59340 | 54.53991 | | E_002 | 7013.400 | 2244.684 | 0.0119316 | 5.825639 | 1600.442 | 35.49159 | 52.57114 | | E_004 | 7966.273 | 2311.057 | 0.0212316 | 8.036047 | 1039.098 | 39.13434 | 36.66475 | | E_005 | 7514.089 | 2315.675 | 0.0233063 | 9.482051 | 1091.607 | 40.22041 | 43.24368 | | E_006 | 7608.464 | 1995.193 | 0.0588128 | 29.329605 | 1096.871 | 75.23614 | 53.64705 |

Among the transformed traits, we have B11 and Na23. Both of which seem to be skewed, but after a simple transformation, can be classify as normalised traits.

QTL mapping

Scan one QTL mapping

qtl_scone <- function(x_data, cpus = 1, ...)

where x_data

# F1 Seedling Ionomics dataset
data(father_riparia) # Genetic map
# Load cross file with genetic map and raw data for normal traits
x <- MetaPipe::read.cross(father_riparia, 
                          ionomics_norm,
                          genotypes = c("nn", "np", "--"))
#>  --Read the following data:
#>   166  individuals
#>   1115  markers
#>   7  phenotypes
#> Warning in summary.cross(cross): Some markers at the same position on chr
#> 1,4,5,7,8,9,10,12,14,15,16,17; use jittermap().
#>  --Cross type: f2

set.seed(123)
x <- qtl::jittermap(x)
x <- qtl::calc.genoprob(x, step = 1, error.prob = 0.001)
x_scone <- MetaPipe::qtl_scone(x, 1, model = "normal", method = "hk")


villegar/MetaPipe documentation built on Nov. 22, 2022, 10:44 p.m.