SpaTemHTP_pipeline: Pipeline function for HTP data


View source: R/SpaTemHTP_pipeline.R

Description

Pipeline function performing different levels of data treatment on high-throughput phenotyping (HTP) time series data.

Usage

SpaTemHTP_pipeline(
  exp_id = "exp_x",
  trait_id = "trait_i",
  out_loc = NULL,
  exp_des_data,
  pheno_data,
  raw_data = TRUE,
  raw_data_out_det = FALSE,
  raw_data_imput = FALSE,
  raw_data_out_det_imput = TRUE,
  G_BLUES_TS = TRUE,
  G_BLUES_TS_sel = TRUE,
  G_BLUES_TS_log_curve = TRUE,
  out_det = TRUE,
  miss_imp = TRUE,
  sp_adj = TRUE,
  single_mixed_model = FALSE,
  out_p_val = 0.05,
  fixed = NULL,
  random = ~row_f + col_f
)

Arguments

exp_id

Character string indicating the name of the experiment. Default = 'exp_x'.

trait_id

Character string indicating the name of the trait analyzed. Default = 'trait_i'.

out_loc

Character string indicating the path location where results will be saved. Default is the working directory.

exp_des_data

data.frame of dimension (N_genotype * N_replicate) x N_variable containing the experimental design information. It must include: a) a 'genotype' column identifying the line phenotyped; b) numeric 'row' and 'col' columns giving the row and column information; c) the same row and column information as factor columns named 'row_f' and 'col_f'. Other variables such as replicate or block can be added for use in the spatially adjusted mixed model computation. The user must set those extra variables in the correct format (generally factor).

pheno_data

data.frame of dimension (N_genotype * N_replicate) x N_days containing the measured phenotypic values.

raw_data

Logical value specifying if the user wants the raw data to be returned. Default = TRUE.

raw_data_out_det

Logical value specifying if the user wants the raw data after outlier detection to be returned. Default = FALSE.

raw_data_imput

Logical value specifying if the user wants the raw data after imputation to be returned. Default = FALSE.

raw_data_out_det_imput

Logical value specifying if the user wants the raw data after outliers detection and imputation to be returned. Default = TRUE.

G_BLUES_TS

Logical value specifying if the user wants to calculate the genotypes adjusted means (BLUEs) time series after spatial adjustment. Default = TRUE.

G_BLUES_TS_sel

Logical value specifying if the user wants to select the day with the largest heritability (h2) in the genotype adjusted means (BLUEs) time series. Default = TRUE.

G_BLUES_TS_log_curve

Logical value specifying if the user wants to perform a logistic curve fitting on the G-BLUEs time series. Default = TRUE.

out_det

Logical value specifying if outlier detection should be performed on the phenotypic data. Default = TRUE.

miss_imp

Logical value specifying if missing value imputation should be performed on the phenotypic data. Default = TRUE.

sp_adj

Logical value specifying if a mixed model with spatial adjustment (SpATS model) should be used to calculate the genotype BLUEs. Default = TRUE.

single_mixed_model

Logical value indicating if a 'single-step' mixed model should be calculated. See Details for more explanations. Default = FALSE.

out_p_val

Numeric value indicating the significance threshold for outlier detection. Default = 0.05.

fixed

Optional right-hand side formula object specifying the fixed effects of the SpATS model. Default = NULL.

random

Optional right-hand side formula object specifying the random effects of the SpATS model. Default = ~ row_f + col_f.
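The input format described above can be sketched with a small toy data set. This is only an illustration in base R: the layout (4 rows x 3 columns), the genotype names, the day_* column names, and the simulated values are all assumptions made for the example and do not come from the package. The checks at the end simply restate the requirements listed for exp_des_data, pheno_data, and random.

```r
# Toy layout: 4 rows x 3 columns = 12 plots, i.e. 6 genotypes x 2 replicates.
# All names and values below are made up for illustration.
set.seed(1)
exp_des_data <- expand.grid(row = 1:4, col = 1:3)
exp_des_data$genotype <- factor(rep(paste0("G", 1:6), times = 2))
exp_des_data$rep <- factor(rep(1:2, each = 6))

# The same row and column information stored as factors.
exp_des_data$row_f <- factor(exp_des_data$row)
exp_des_data$col_f <- factor(exp_des_data$col)

# Phenotypic values: one row per plot, one column per day (5 days here).
n_days <- 5
pheno_values <- rnorm(nrow(exp_des_data) * n_days, mean = 50, sd = 5)
pheno_data <- as.data.frame(matrix(pheno_values, ncol = n_days))
colnames(pheno_data) <- paste0("day_", seq_len(n_days))

# Right-hand side formula for the random effects, including the extra
# 'rep' variable on top of the default row_f + col_f terms.
random <- ~ rep + row_f + col_f

# Consistency checks mirroring the argument descriptions.
required_cols <- c("genotype", "row", "col", "row_f", "col_f")
stopifnot(
  all(required_cols %in% colnames(exp_des_data)),
  is.numeric(exp_des_data$row), is.numeric(exp_des_data$col),
  is.factor(exp_des_data$row_f), is.factor(exp_des_data$col_f),
  nrow(pheno_data) == nrow(exp_des_data),
  all(all.vars(random) %in% colnames(exp_des_data))
)
```

Running such checks before calling the pipeline makes format problems (for example a row_f column left as numeric) visible early rather than inside the mixed model fit.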

Details

The function performs different operations to progressively enrich the information content of the data. The user can choose how much treatment to apply to the data by selecting among the following options:

  1. Raw data with experimental design information.

  2. Raw data with outliers detected (using outliers_det_boxplot) and experimental design information.

  3. Raw data with missing values imputed after outliers detection (using miss_imp_PMM) and experimental design information.

  4. Genotype adjusted means (BLUEs) time series using the SpATS model for spatial correction after outliers detection and imputation.

  5. Selection of an optimal section or time point in the whole genotype BLUEs time series according to a heritability criterion or change point analysis (TS_select).

  6. Further analysis of the time series fitting a logistic curve to the genotype BLUEs TS.

The last two options (selection on the time series and further modelling of the time series) are conditional on the calculation of the genotype adjusted means time series (option 4).
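To give an idea of what the logistic curve fitting in option 6 involves, here is a minimal sketch using base R's nls() with the self-starting SSlogis() model on a simulated series. It is not the package's own implementation, which may use a different parameterisation or fitting routine; the 23-day series, the true parameter values, and the noise level are assumptions chosen for the example.

```r
# Simulated BLUEs time series for one genotype over 23 days (made-up values).
set.seed(42)
days <- 1:23
true_curve <- 100 / (1 + exp(-(days - 12) / 3))
blues <- true_curve + rnorm(length(days), sd = 1)

# Three-parameter logistic: Asym / (1 + exp((xmid - days) / scal)).
# SSlogis() is self-starting, so no initial parameter values are supplied.
fit <- nls(blues ~ SSlogis(days, Asym, xmid, scal))

# Estimated asymptote (Asym), inflection day (xmid), and scale (scal).
round(coef(fit), 2)

# Fitted curve, e.g. for plotting against the observed series.
fitted_curve <- predict(fit, newdata = data.frame(days = days))
```

The three fitted parameters summarise a whole time series in a few numbers (final level, timing of the inflection, and steepness), which is the usual motivation for this kind of curve fitting on growth data.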

Value

For each chosen option, the function saves the produced data in a folder created at the specified location. Will also be added.

... (develop further)

Author(s)

ICRISAT GEMS team

Examples

data(SG_PH_data)

SG_PH_data$col_f <- factor(SG_PH_data$col)
SG_PH_data$row_f <- factor(SG_PH_data$row)

SG_PH_data$rep <- factor(SG_PH_data$rep)
SG_PH_data$block <- factor(SG_PH_data$block)

exp_des_data <- SG_PH_data[, c("row", "col", "row_f", "col_f", "genotype",
                               "rep", "block")]

pheno_data <- SG_PH_data[, 6:28]

## Not run: 

out_loc <- getwd() # specify a directory where the results will be saved

results <- SpaTemHTP_pipeline(exp_id = 'Exp_XX', trait_id = 'trait_1',
                              out_loc = out_loc, exp_des_data = exp_des_data,
                              pheno_data = pheno_data,
                              random = ~ rep + rep:block + row_f + col_f)


## End(Not run)

ICRISAT-GEMS/SpaTemHTP documentation built on March 9, 2021, 12:12 a.m.