filter_unlink: Filter for "unlinked" loci

View source: R/filter_unlink.R

filter_unlink    R Documentation

Description

Parses a data table of genotypes or allele frequencies and returns a character vector of loci that are "unlinked", in the sense that they occur on different contigs.

Usage

filter_unlink(
  dat,
  chromCol = "CHROM",
  locusCol = "LOCUS",
  posCol = "POS",
  method = "random"
)

Arguments

dat

Data table: The sequencing read information. Must contain the following columns:

  1. The chromosome (or contig) ID (see param chromCol).

  2. The locus ID (see param locusCol).

  3. The SNP position (see param posCol).

chromCol

Character: The chromosome (or contig) information column. Default = 'CHROM'.

locusCol

Character: The locus information column. Default = 'LOCUS'.

posCol

Character: The locus position column. Default = 'POS'.

method

Character: How should the filtering be performed? Default = 'random', which draws a single random SNP per contig. Alternatively, 'first' draws the first SNP in each contig.
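
The difference between the two methods can be illustrated with a minimal sketch in plain data.table. This is not the package's implementation: the toy table, the object names, and the reading of "first" as "lowest position in the contig" are all assumptions made for illustration.

```r
library(data.table)

# Toy data (hypothetical): three contigs carrying 3, 2, and 1 SNPs.
toy <- data.table(
  CHROM = c("c1", "c1", "c1", "c2", "c2", "c3"),
  POS   = c(12, 45, 80, 5, 60, 33),
  LOCUS = c("c1_12", "c1_45", "c1_80", "c2_5", "c2_60", "c3_33")
)

# 'first'-style selection (assumed to mean lowest position):
# order by contig and position, then keep the first row per contig.
pick_first <- toy[order(CHROM, POS), .SD[1], by = CHROM]$LOCUS

# 'random'-style selection: draw one row index at random within each contig.
set.seed(1)
pick_random <- toy[, .SD[sample(.N, 1)], by = CHROM]$LOCUS
```

Under these assumptions, both vectors contain exactly one locus per contig. The 'first'-style result is deterministic, whereas the 'random'-style result can change between calls unless a seed is set.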

Details

Note that this function is specifically designed for RADseq data, where contigs comprise small (100s of bp) genomic regions assembled from restriction digest fragments. It should not be used on contigs from a genome assembly. It is also important to follow up this filtering with formal tests of linkage disequilibrium.
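
As a rough illustration of what a follow-up check might look at, the sketch below computes the squared correlation between genotype dosages at pairs of SNPs, a commonly used descriptive measure of linkage disequilibrium. The genotype vectors and the helper `r2` are hypothetical, and this is only a crude screen, not a formal test.

```r
# Hypothetical genotype dosages (counts of the alternate allele: 0, 1, or 2)
# for eight individuals at three retained SNPs.
snp_a <- c(0, 1, 2, 1, 0, 2, 1, 0)
snp_b <- c(0, 1, 2, 2, 0, 2, 1, 0)
snp_c <- c(2, 0, 1, 1, 2, 0, 1, 1)

# Hypothetical helper: squared Pearson correlation between two dosage vectors,
# ignoring individuals with missing genotypes.
r2 <- function(x, y) cor(x, y, use = "pairwise.complete.obs")^2

# Pairwise values for the retained SNPs; values near 1 suggest that two SNPs
# are still strongly associated despite sitting on different contigs.
r2_ab <- r2(snp_a, snp_b)
r2_ac <- r2(snp_a, snp_c)
r2_bc <- r2(snp_b, snp_c)
```

Pairs with high values would be candidates for closer inspection with a dedicated linkage disequilibrium method.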

Value

Returns a character vector of locus names from dat[[locusCol]], chosen so that no two returned loci are on the same contig in dat[[chromCol]].
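
Because the return value is a vector of locus names rather than a filtered table, a typical next step is to subset the original data with it. The sketch below shows that step on a toy table. The vector `keep` is a hand-written stand-in for the function's output, and the SAMPLE and GT columns are assumptions for illustration.

```r
library(data.table)

# Toy long-format genotype table (hypothetical): two samples, four SNPs
# spread over two contigs.
dat <- data.table(
  CHROM  = rep(c("c1", "c1", "c2", "c2"), times = 2),
  LOCUS  = rep(c("c1_12", "c1_45", "c2_5", "c2_60"), times = 2),
  SAMPLE = rep(c("ind1", "ind2"), each = 4),
  GT     = c(0, 1, 2, 0, 1, 1, 0, 2)
)

# Stand-in for the character vector returned by filter_unlink():
# one locus name per contig.
keep <- c("c1_45", "c2_5")

# Retain only the rows belonging to the selected loci.
dat_unlinked <- dat[LOCUS %in% keep]

# Sanity check: each contig should now carry exactly one locus.
loci_per_contig <- dat_unlinked[, .(N_LOCI = uniqueN(LOCUS)), by = CHROM]
```

Checking the number of loci per contig after subsetting is a cheap way to confirm that the filter behaved as intended.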

Examples

data(data_Genos)

# Number of unique SNPs (LOCUS) per contig (CHROM)
data_Genos[, length(unique(LOCUS)), by=CHROM]$V1 %>% table

# Randomly sample 1 SNP per contig
snp.rand.1st <- filter_unlink(data_Genos, method='random')
snp.rand.2nd <- filter_unlink(data_Genos, method='random')

# Number of SNPs different between random sets
setdiff(snp.rand.1st, snp.rand.2nd) %>% length

# Sample the first SNP per contig
snp.first.1st <- filter_unlink(data_Genos, method='first')
snp.first.2nd <- filter_unlink(data_Genos, method='first')

# Number of SNPs different between 'first' sets
setdiff(snp.first.1st, snp.first.2nd) %>% length


j-a-thia/genomalicious documentation built on Oct. 19, 2024, 7:51 p.m.