makeBC: Make a backcross (BC) population
In tpbilton/GUSMap: Genotyping Uncertainty with Sequencing Data and Linkage Mapping (GUSMap)

Description

Create a BC object from an RA object, perform standard filtering, and compute statistics specific to backcross populations.

Usage

makeBC(
  RAobj,
  pedfile,
  family = NULL,
  MNIF = 1,
  inferSNPs = FALSE,
  filter = list(MAF = 0.05, MISS = 0.2, BIN = 100, DEPTH = 5, PVALUE = 0.01,
    MAXDEPTH = 500)
)

Arguments

`RAobj`	Object of class RA created via the `readRA` function.
`pedfile`	Character string giving the file name (relative to the current directory) of the pedigree file.
`family`	Vector of character strings giving the families to retain in the BC object. This allows a pedigree file with more than one family to be supplied.
`inferSNPs`	Logical value indicating whether to infer the segregation type of SNPs using the progeny information only, in cases where the segregation type could not be inferred from the parental genotypes.
`filter`	Named list of thresholds for various filtering criteria. See below for details.

Details

This function converts an RA object into a BC (backcross) object. A BC object is an R6 object that contains the RA data, various computed statistics, and functions (methods) for analyzing backcross populations and performing linkage mapping on them. The statistics computed and the data filtering are specific to backcross populations and sequencing data.

The filtering criteria currently implemented are:

Minor allele frequency (MAF): SNPs are discarded if their MAF is less than the threshold (default is 0.05).
Proportion of missing data (MISS): SNPs are discarded if the proportion of individuals with no reads (i.e. a missing genotype) is greater than the threshold value (default is 0.2).
Bin size for SNP selection (BIN): SNPs are binned together if the distance (in base pairs) between them is less than the threshold value (default is 100). One SNP is then randomly selected from each bin and retained for final analysis. This filtering is to ensure that there is only one SNP on each sequence read.
Parental read depth (DEPTH): SNPs are discarded if the read depth of either parent is less than the threshold value (default is 5). This filter is to remove SNPs where the parental information is insufficient to infer segregation type accurately.
Segregation test P-value (PVALUE): SNPs are discarded if the p-value from a segregation test is smaller than the threshold (default is 0.01). This filters out SNPs whose segregation type has been inferred incorrectly.
Maximum average SNP depth (MAXDEPTH): SNPs with an average read depth above the threshold value are discarded (default is 500).
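
The logic of the MAF, MISS, MAXDEPTH and BIN criteria can be sketched in base R on toy read counts. This is an illustration only and not the code used inside makeBC: the read counts, the positions, the naive read-proportion estimate of the allele frequency, and the rule of starting a new bin whenever two adjacent SNPs are at least BIN base pairs apart are all assumptions made for the example.

```r
# Thresholds as given in the Usage section
thr <- list(MAF = 0.05, MISS = 0.2, BIN = 100, MAXDEPTH = 500)

# Toy read counts (made up): rows are progeny, columns are SNPs
ref <- matrix(c(3, 0, 2, 400, 10,
                0, 0, 5, 300,  9,
                2, 0, 1, 500, 12,
                4, 1, 0, 450,  8), nrow = 4, byrow = TRUE)
alt <- matrix(c(2, 0, 3, 350, 0,
                4, 0, 1, 400, 0,
                1, 0, 2, 300, 0,
                0, 0, 4, 500, 1), nrow = 4, byrow = TRUE)
depth <- ref + alt

# Naive allele frequency from pooled reads (an assumption of this sketch)
alt_freq   <- colSums(alt) / colSums(depth)
maf        <- pmin(alt_freq, 1 - alt_freq)
miss       <- colMeans(depth == 0)   # proportion of individuals with no reads
mean_depth <- colMeans(depth)        # average read depth per SNP

# A SNP is kept only if it passes all three criteria
keep <- maf >= thr$MAF & miss <= thr$MISS & mean_depth <= thr$MAXDEPTH

# BIN: group SNPs closer than thr$BIN base pairs, then pick one per bin at random
pos    <- c(100, 150, 400, 1000, 1040)           # hypothetical positions (bp)
bin_id <- cumsum(c(TRUE, diff(pos) >= thr$BIN))  # new bin when the gap is >= BIN
set.seed(1)
chosen <- vapply(split(seq_along(pos), bin_id),
                 function(idx) idx[sample.int(length(idx), 1)],
                 integer(1))
```

Here the second SNP fails on missing data (and MAF), the fourth on average depth, and the fifth on MAF, while the five positions fall into three bins.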

The segregation type of each SNP is inferred from the genotypes of the parents. A parental genotype is called homozygous for the reference allele if only reference reads are seen, heterozygous if at least one read for each of the reference and alternate alleles is seen, and homozygous for the alternate allele if only reads for the alternate allele are seen. As a result, a parental genotype may be inferred incorrectly if the read depth is too low (e.g., a heterozygous genotype is called homozygous), which is why the DEPTH filter is implemented. The segregation test performed for the PVALUE filter is described in the supplementary methods of the publication by Bilton et al. (2018) (Section 4 of File S1).
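
The calling rule above, and the reason a low parental read depth is risky, can be sketched as follows. The function names and the genotype labels "AA", "AB" and "BB" are hypothetical and chosen for this example; the probability calculation assumes that each read from a heterozygous parent samples either allele with probability 1/2 and that there are no sequencing errors.

```r
# Call a parental genotype from its reference and alternate read counts
call_parent <- function(n_ref, n_alt) {
  if (n_ref + n_alt == 0) return(NA_character_)  # no reads, no call
  if (n_ref > 0 && n_alt > 0) return("AB")       # both alleles seen
  if (n_ref > 0) "AA" else "BB"                  # only one allele seen
}

call_parent(6, 0)  # "AA"
call_parent(3, 2)  # "AB"
call_parent(0, 4)  # "BB"

# Probability that a truly heterozygous parent shows reads for only one
# allele at read depth d (d >= 1), so that it would be called homozygous
p_miscall <- function(d) 2 * 0.5^d

p_miscall(c(1, 2, 5, 10))  # 1, 0.5, 0.0625, 0.001953125
```

Under these assumptions, a heterozygous parent at the default DEPTH threshold of 5 would still be miscalled as homozygous about 6% of the time.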

If the argument inferSNPs is TRUE, the function attempts to infer, from the progeny data only, the segregation type of SNPs whose segregation type could not be determined from the parental genotypes. Note that with this approach, maternal-informative (MI) SNPs cannot be distinguished from paternal-informative (PI) SNPs (we know only that one parent is heterozygous and the other is homozygous, not which is which), so the MI and PI SNPs inferred this way are collectively referred to as semi-informative (SI) SNPs.

The pedigree file must be a csv file containing the five columns:

SampleID: A unique character string giving the sample ID. These correspond to the sample IDs found in the VCF file.
IndividualID: A character string giving the ID of the individual to which the sample corresponds. Note that several samples can come from the same individual.
Mother: The ID of the mother as given in the IndividualID. Note, if the mother is unknown then this should be left blank.
Father: The ID of the father as given in the IndividualID. Note, if the father is unknown then this should be left blank.
Family: The name of the family for a group of progeny with the same parents. Note that this is not necessary (the full-sib families are worked out from the parents) but, if given, it must be the same for all the progeny.

Grandparents can also be supplied but are only used to infer parental genotypes when the associated read depth is greater than or equal to the DEPTH threshold.

The family argument allows the user to specify the family to be used in the creation of the full-sib population. Note that this argument uses the "Family" column in the pedigree file, so the pedigree file needs to be set up correctly.
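
As an illustration of the layout described above, the following base R sketch writes a small pedigree file with the five required columns. All IDs and the family name "BC1" are made up, and leaving the Family field blank for the parents is an assumption of this example rather than a documented requirement.

```r
# Toy pedigree: two parents (individual 1 sequenced twice) and three progeny
ped <- data.frame(
  SampleID     = c("P1_a", "P1_b", "P2_a", "O1", "O2", "O3"),
  IndividualID = c("1", "1", "2", "3", "4", "5"),
  Mother       = c("", "", "", "1", "1", "1"),   # blank when unknown
  Father       = c("", "", "", "2", "2", "2"),   # blank when unknown
  Family       = c("", "", "", "BC1", "BC1", "BC1"),
  stringsAsFactors = FALSE
)

# Write the pedigree as a csv file in a temporary directory
pedfile <- file.path(tempdir(), "toy_pedigree.csv")
write.csv(ped, pedfile, row.names = FALSE, quote = FALSE)

# Read it back to check the layout
check <- read.csv(pedfile, colClasses = "character")

# With an RA object whose sample IDs match the SampleID column, the family
# would then be selected along the lines of:
# makeBC(RAobj, pedfile = pedfile, family = "BC1")
```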

Note: Only a single full-sib family can be processed at present. There are future plans to extend this out to include multiple families.

Value

An R6 object of class BC.

Author(s)

Timothy P. Bilton

References

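Bilton, T.P., Schofield, M.R., Black, M.A., Chagné, D., Wilcox, P.L., Dodds, K.G. (2018). Accounting for errors in low coverage high-throughput sequencing data when constructing genetic maps using biparental outcrossed populations. Genetics, 209(1), 65-76.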

Examples

## extract filename for Manuka dataset in GUSMap package
vcffile <- Manuka11()

## Convert VCF to RA format
rafile <- VCFtoRA(vcffile$vcf)

## read in the RA data
mkdata <- readRA(rafile)

## Create the BC population
makeBC(mkdata, pedfile=vcffile$ped)
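
A variant of the example above that passes a custom filter list and keeps the returned object might look like the following sketch. It assumes the GUSMap package is installed (the calls are skipped otherwise); the names my_filter and mkBC are chosen for this example, and all six thresholds are written out in full because it is not stated here whether a partial list is accepted.

```r
## Thresholds from the Usage section, with MISS tightened from 0.2 to 0.1
my_filter <- list(MAF = 0.05, MISS = 0.1, BIN = 100, DEPTH = 5,
                  PVALUE = 0.01, MAXDEPTH = 500)

if (requireNamespace("GUSMap", quietly = TRUE)) {
  vcffile <- GUSMap::Manuka11()
  rafile  <- GUSMap::VCFtoRA(vcffile$vcf)
  mkdata  <- GUSMap::readRA(rafile)

  ## Keep the returned R6 object of class BC for later analysis
  mkBC <- GUSMap::makeBC(mkdata, pedfile = vcffile$ped, filter = my_filter)
}
```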

tpbilton/GUSMap documentation built on Feb. 22, 2025, 12:27 p.m.