prep_predata: Creation of predata and imputation of missing data.

View source: R/prep_predata.R

prep_predata    R Documentation

Creation of predata and imputation of missing data.

Description

Creates predata, a data.frame containing environmental parameters on a grid of the study area. It can also impute missing data in predata and segdata, using either PCA from the missMDA package or Amelia from the Amelia package.

Usage

prep_predata(
  segdata,
  gridfile_name,
  varenviro,
  do_log_enviro,
  varphysio,
  do_log_physio,
  imputation = "Amelia",
  shape,
  saturate_predata = F,
  saturate_segdata = F,
  inbox_poly = T,
  col2keep = NULL,
  verbose = TRUE
)

Arguments

segdata

"Segdata" data.frame, built with prepare_data_effort.

gridfile_name

Grid of the study area containing all the environmental variables, from which predata is built.

varenviro

Vector containing all dynamic environmental variables (e.g. chlorophyll, SST).

do_log_enviro

Vector specifying which variables among varenviro should be transformed with the natural logarithm of 1 + x (log1p).

varphysio

Vector containing non-dynamic environmental variables (e.g. depth, dist200m).

do_log_physio

Vector specifying which variables among varphysio should be transformed with the natural logarithm of 1 + x (log1p).
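As an illustration of the log1p transformation described for do_log_enviro and do_log_physio, here is a minimal base R sketch. It assumes that these arguments are character vectors of column names; the toy table and its values are hypothetical, and this is not the package's internal code.

```r
# Hypothetical segdata-like table; names and values are illustrative only.
segdata_toy <- data.frame(
  CHL   = c(0, 0.5, 2.3),
  SST   = c(12.1, 14.8, 16.0),
  depth = c(35, 120, 980)
)

varenviro     <- c("CHL", "SST")
do_log_enviro <- c("CHL")  # assumption: a subset of varenviro, given by name

# log1p(x) computes log(1 + x), so a value of zero stays zero.
to_log <- intersect(do_log_enviro, varenviro)
segdata_toy[to_log] <- lapply(segdata_toy[to_log], log1p)

print(segdata_toy)
```

Only the columns listed in do_log_enviro change; SST and depth are left on their original scale.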

imputation

Method used to impute missing values in segdata and predata. Two methods are allowed:

  • "PCA" : PCA method from missMDA.

  • "Amelia" : Amelia method from Amelia.

Default method is "Amelia", as shown in Usage.
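The two imputation back ends can be tried directly on a small table, which may help when choosing between them. The sketch below is an assumption about how such a choice could be wired, not the code used inside prep_predata: impute_sketch is a hypothetical helper, the toy data are made up, and the settings (ncp = 1, m = 1) are arbitrary. It returns NULL when the required package is not installed.

```r
# Toy data with a few missing values (made-up values, not package data).
set.seed(1)
toy <- data.frame(SST = rnorm(50, 15, 2), depth = rnorm(50, 200, 50))
toy$CHL <- 0.1 * toy$SST + rnorm(50, sd = 0.2)
toy$SST[c(3, 17)] <- NA
toy$CHL[c(8, 30)] <- NA

# Hypothetical helper: dispatch on the imputation method name.
impute_sketch <- function(df, method = c("Amelia", "PCA")) {
  method <- match.arg(method)
  if (method == "PCA") {
    if (!requireNamespace("missMDA", quietly = TRUE)) return(NULL)
    # imputePCA() returns the completed table in $completeObs.
    as.data.frame(missMDA::imputePCA(df, ncp = 1)$completeObs)
  } else {
    if (!requireNamespace("Amelia", quietly = TRUE)) return(NULL)
    # m = 1 asks for a single imputed data set; p2s = 0 silences the console.
    Amelia::amelia(df, m = 1, p2s = 0)$imputations[[1]]
  }
}

filled <- impute_sketch(toy, "PCA")
```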

shape

Shapefile of the study area. It can be either a SpatialPolygonsDataFrame object, in which case the shape_layer argument is not needed, or the name of the shapefile with its ".shp" extension (e.g. "data/studyAreaShapefile.shp").

saturate_predata

Boolean. If TRUE, the saturate function is applied to all varenviro and varphysio columns of predata. The saturate function excludes extreme values and keeps only values between the 5 percent and 95 percent quantiles.

saturate_segdata

Boolean. If TRUE, the saturate function is applied to all varenviro and varphysio columns of segdata. The saturate function excludes extreme values and keeps only values between the 5 percent and 95 percent quantiles.
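To make the saturation step concrete, the following base R sketch clamps a vector to its 5 percent and 95 percent quantiles. The helper saturate_sketch is hypothetical, and clamping is only one reading of "excludes extreme values": the package's own saturate function may treat out-of-range values differently (for example by setting them to NA).

```r
# Hypothetical helper: clamp values to the [5%, 95%] quantile range.
saturate_sketch <- function(x, lower = 0.05, upper = 0.95) {
  q <- quantile(x, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  # Values below q[1] are raised to q[1]; values above q[2] are lowered to q[2].
  # Missing values stay missing.
  pmin(pmax(x, q[1]), q[2])
}

x <- c(1:99, 1000)  # one extreme value at the upper end
s <- saturate_sketch(x)
range(s)
```

The vector keeps its length, so the result can be written back into the same column of segdata or predata.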

inbox_poly

Boolean. When TRUE, only the part of the grid that matches the study area is kept.

col2keep

Character string(s) giving the names of the columns to keep in the predata output.

verbose

Boolean. Whether to display the progress bar for "PCA" or the iteration number for "Amelia".

Value

This function returns a list containing:

  1. predata : data.frame of predata. It corresponds to gridata, keeping the covariates, coordinates and cell areas.

  2. segdata : data.frame of segdata.

  3. pca_seg : output of the PCA function on segdata.

  4. pca_pred : output of the PCA function on predata.

  5. seg_mipat : output of VIM::aggr on segdata.

  6. pred_mipat : output of VIM::aggr on predata.


MathieuGenu/geffaeR documentation built on March 23, 2022, 7:50 p.m.