linkage.gp: Calculate recombination frequency, LOD and phase using...

View source: R/exported_functions.R

linkage.gp    R Documentation

Calculate recombination frequency, LOD and phase using genotype probabilities

Description

linkage.gp is used to calculate recombination frequency, LOD and phase within one type of marker or between two types of markers.
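As a rough illustration of what a recombination frequency and its LOD score express, the sketch below works through the textbook phase-known two-point case with discrete genotypes. It is not the estimator used by linkage.gp, which works from genotype probabilities; the counts n and k are made-up values.

```r
n <- 100  # hypothetical number of informative offspring
k <- 10   # hypothetical number of recombinant offspring

# Maximum-likelihood estimate of the recombination frequency
r_hat <- k / n

# log10 likelihood of the observed counts for a given recombination frequency r
log10_lik <- function(r) k * log10(r) + (n - k) * log10(1 - r)

# LOD: support for linkage at r_hat relative to free recombination (r = 0.5)
LOD <- log10_lik(r_hat) - log10_lik(0.5)
```

A LOD above zero means the data are more likely under the estimated r than under independent assortment.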

Usage

linkage.gp(
  probgeno_df,
  chk,
  pardose = NULL,
  markertype1 = c(1, 0),
  markertype2 = NULL,
  target_parent = match.arg(c("P1", "P2")),
  G2_test = FALSE,
  LOD_threshold = 0,
  prefPars = c(0, 0),
  combinations_per_iter = NULL,
  iter_RAM = 500,
  ncores = 2,
  verbose = TRUE,
  check_qall_mult = FALSE,
  method = "approx",
  log = NULL
)

Arguments

probgeno_df

A data frame as read from the scores file produced by function saveMarkerModels of R package fitPoly, or alternatively, a data frame containing the following columns:

SampleName

Name of the sample (individual)

MarkerName

Name of the marker

P0

Probabilities of dosage score '0'

P1...

Probabilities of dosage score '1' etc. (up to max dosage, e.g. P4 for tetraploid population)

maxP

Maximum genotype probability identified for a particular individual and marker combination

maxgeno

Most probable dosage for a particular individual and marker combination

geno

Most probable dosage for a particular individual and marker combination, if maxP exceeds a user-defined threshold (e.g. 0.9), otherwise NA
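The column layout above can be sketched with a toy data frame. This is only an illustration for a tetraploid (dosages 0 to 4): the sample names, marker name, probabilities and the 0.9 threshold are all made-up values, and the derived columns are computed here with base R rather than by fitPoly.

```r
# Hypothetical toy input: 2 samples scored at 1 marker in a tetraploid
probs <- data.frame(
  SampleName = c("ind_01", "ind_02"),
  MarkerName = c("mrk_A", "mrk_A"),
  P0 = c(0.01, 0.30),
  P1 = c(0.95, 0.40),
  P2 = c(0.03, 0.20),
  P3 = c(0.01, 0.05),
  P4 = c(0.00, 0.05),
  stringsAsFactors = FALSE
)

p_mat <- as.matrix(probs[, paste0("P", 0:4)])
threshold <- 0.9  # assumed user-defined cut-off

# Highest genotype probability per row
probs$maxP <- apply(p_mat, 1, max)
# Dosage with the highest probability (column index minus 1, since P0 is column 1)
probs$maxgeno <- max.col(p_mat, ties.method = "first") - 1L
# Keep the call only when maxP exceeds the threshold, otherwise NA
probs$geno <- ifelse(probs$maxP > threshold, probs$maxgeno, NA_integer_)
```

In this toy data the second sample has no dosage with probability above the threshold, so its geno is NA while its maxgeno is still reported.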

chk

Output list as returned by function checkF1

pardose

Option to include the most likely (discrete) parental dosage scores, used mainly for internal calls of this function. By default NULL

markertype1

A vector of length 2 specifying the first markertype to compare. The first element specifies the dosage in target_parent (and the second in the other parent).

markertype2

A vector of length 2 specifying the second markertype to compare. This argument is optional. If not specified, the function will calculate linkage within the markertype as specified by markertype1. The first element specifies the dosage in target_parent (and the second in the other parent).

target_parent

Which parent is being targeted, i.e. which parent is of specific interest. The only acceptable options are "P1" and "P2": specify "P1" for the maternal parent and "P2" for the paternal parent. The actual identifiers of the two parents are entered using the arguments parent1_replicates and parent2_replicates.

G2_test

Apply a G2 test (LOD of independence) in addition to the LOD of linkage.
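To give a feel for what a G2 test of independence measures, the sketch below computes the statistic on a made-up table of discrete dosage counts and converts it to a LOD scale by dividing by 2 ln(10). Both the counts and the conversion are assumptions for illustration; how linkage.gp computes its LOD of independence from genotype probabilities is not shown here.

```r
# Hypothetical offspring counts: dosage at marker A (rows) by dosage at marker B (columns)
obs <- matrix(c(40, 10,
                12, 38), nrow = 2, byrow = TRUE)

# Expected counts if the two markers segregate independently
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# G2 statistic; empty cells contribute nothing and are skipped to avoid log(0)
nz <- obs > 0
G2 <- 2 * sum(obs[nz] * log(obs[nz] / expected[nz]))

# Assumed conversion of G2 to a LOD scale
LOD_indep <- G2 / (2 * log(10))
```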

LOD_threshold

Minimum LOD score of linkages to report. Recommended when the number of marker comparisons is large (millions or more), in order to reduce memory usage.

prefPars

The estimates for preferential pairing parameters for parent 1 and 2, in range 0 <= p < 2/3. By default this is c(0,0) (so, no preferential pairing). See the function test_prefpairing and the vignette for more details.
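The stated range can be checked before calling the function. The helper below is hypothetical (it is not part of polymapR) and only encodes the constraint 0 <= p < 2/3 for both parents.

```r
# Hypothetical helper, not part of polymapR:
# TRUE when both preferential pairing estimates lie in [0, 2/3)
check_prefPars <- function(prefPars) {
  stopifnot(is.numeric(prefPars), length(prefPars) == 2)
  all(prefPars >= 0 & prefPars < 2 / 3)
}

ok_default <- check_prefPars(c(0, 0))      # the default, no preferential pairing
ok_too_big <- check_prefPars(c(0.1, 0.7))  # second value is outside the range
```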

combinations_per_iter

Optional integer. Number of marker combinations per iteration.

iter_RAM

A (very) conservative estimate of the working memory, in megabytes, used per core. It only takes the size of the frequency matrices into account. Actual usage is higher, especially when a large number of linkages is reported. Reduce memory usage by using a higher LOD_threshold.

ncores

Number of cores to use. Works both for Windows and UNIX (using doParallel). Use parallel::detectCores() to find out how many cores you have available.
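One possible way to choose a value is sketched below. Leaving one core free is a common convention rather than a polymapR recommendation, and the variable names are illustrative.

```r
library(parallel)

available <- detectCores()  # can return NA when the count cannot be determined

# Leave one core free for the rest of the system, but always use at least one
n_use <- if (is.na(available)) 1L else max(1L, available - 1L)
```

The resulting n_use could then be passed as ncores.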

verbose

Should messages be sent to stdout?

check_qall_mult

Check the qall_mult column of chk, and filter out markers with qall_mult = 0. By default FALSE.

method

Either "approx" or "mappoly". If "approx" (the default method), then an approximated estimator is used which introduces a small amount of bias in the estimator of recombination frequency. If method "mappoly" is specified, the full likelihood is used in the estimation, leading to an unbiased estimator (this has been implemented in the mappoly package of Marcelo Mollinari). The mappoly method has higher computational demands which may introduce problems for larger datasets, but will lead to higher accuracy overall.

log

Character string specifying the log filename to which standard output should be written. If NULL, the log is sent to stdout.

Value

Returns a data.frame with columns:

marker_a:

first marker of comparison. If markertype2 is specified, it has the type of markertype1.

marker_b:

second marker of comparison. It has the type of markertype2 if specified.

r:

recombination frequency

LOD:

LOD score associated with r

phase:

phase between markers
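A data frame with these columns can be filtered and sorted with base R. The sketch below uses a mock result; the marker names, values and phase labels are placeholders, and the cut-offs (LOD of at least 3, r below 0.3) are arbitrary choices for illustration.

```r
# Mock output with the documented columns (all values are made up)
linkages <- data.frame(
  marker_a = c("m1", "m1", "m2"),
  marker_b = c("m2", "m3", "m3"),
  r        = c(0.05, 0.48, 0.12),
  LOD      = c(12.4, 0.2, 6.1),
  phase    = c("coupling", "repulsion", "coupling"),  # placeholder labels
  stringsAsFactors = FALSE
)

# Keep well-supported, reasonably tight linkages
strong <- linkages[linkages$LOD >= 3 & linkages$r < 0.3, ]

# Strongest evidence first
strong <- strong[order(-strong$LOD), ]
```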

Examples

data("gp_df","chk1")
SN_SN_P1.gp <- linkage.gp(probgeno_df = gp_df,
                          chk = chk1,
                          markertype1 = c(1,0),
                          target_parent = "P1")
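
A variation on the call above, sketched under the assumption that polymapR is installed and that the gp_df and chk1 example data load as shown. The LOD_threshold value of 3 is an arbitrary choice to illustrate that only linkages at or above the threshold are reported.

```r
res <- NULL
if (requireNamespace("polymapR", quietly = TRUE)) {
  data("gp_df", "chk1", package = "polymapR")
  # Same marker type and target parent as above, but report only LOD >= 3
  res <- polymapR::linkage.gp(probgeno_df = gp_df,
                              chk = chk1,
                              markertype1 = c(1, 0),
                              target_parent = "P1",
                              LOD_threshold = 3)
  head(res)
}
```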

polymapR documentation built on Nov. 5, 2023, 1:09 a.m.