gen_tibble | R Documentation |
gen_tibble
A gen_tibble stores genotypes for individuals in a tidy format.
gen_tibble(
  x,
  ...,
  valid_alleles = c("A", "T", "C", "G"),
  missing_alleles = c("0", "."),
  backingfile = NULL,
  quiet = FALSE
)
## S3 method for class 'character'
gen_tibble(
  x,
  ...,
  parser = c("cpp", "vcfR"),
  n_cores = 1,
  chunk_size = NULL,
  valid_alleles = c("A", "T", "C", "G"),
  missing_alleles = c("0", "."),
  backingfile = NULL,
  quiet = FALSE
)
## S3 method for class 'matrix'
gen_tibble(
  x,
  indiv_meta,
  loci,
  ...,
  ploidy = 2,
  valid_alleles = c("A", "T", "C", "G"),
  missing_alleles = c("0", "."),
  backingfile = NULL,
  quiet = FALSE
)
x |
can be:
|
... |
if |
valid_alleles |
a vector of valid allele values; it defaults to 'A', 'T', 'C' and 'G'. |
missing_alleles |
a vector of values in the BIM file/loci dataframe that indicate a missing value for the allele value (e.g. when we have a monomorphic locus with only one allele). It defaults to '0' and '.' (the same as PLINK 1.9). |
backingfile |
the path, including the file name without extension, for
backing files used to store the data (they will be given a .bk and .RDS
automatically). This is not needed if |
quiet |
if FALSE (the default), provide information on the files used to store the data; set to TRUE to suppress it |
parser |
the name of the parser used for VCF, either "cpp" to use a
fast C++ parser (the default), or "vcfR" to use the R package |
n_cores |
the number of cores to use for parallel processing |
chunk_size |
the number of loci or individuals (depending on the format)
processed at a time (currently used if |
indiv_meta |
a list, data.frame or tibble with compulsory columns 'id'
and 'population', plus any additional metadata of interest. This is only
used if |
loci |
a data.frame or tibble, with compulsory columns 'name',
'chromosome', 'position', 'genetic_dist', 'allele_ref' and 'allele_alt'.
This is only used if |
ploidy |
the ploidy of the samples (either a single value, or a vector of values for mixed ploidy). Only used if creating a gen_tibble from a matrix of data; otherwise, ploidy is determined automatically from the data as they are read. |
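The matrix method takes three inputs that have to agree with each other. As a minimal sketch in base R, the hypothetical helper `check_gen_tibble_inputs()` below (it is not part of tidypopgen) checks the compulsory columns listed above and compares dimensions. It assumes, as in the matrix example at the end of this page, that rows of `x` are individuals and columns are loci.

```r
# Hypothetical helper (not part of tidypopgen): sanity-check the inputs of the
# matrix method before calling gen_tibble(). Returns a character vector of
# problems; an empty vector means no problem was found.
check_gen_tibble_inputs <- function(x, indiv_meta, loci) {
  required_indiv <- c("id", "population")
  required_loci <- c(
    "name", "chromosome", "position",
    "genetic_dist", "allele_ref", "allele_alt"
  )
  problems <- character(0)

  missing_indiv <- setdiff(required_indiv, names(indiv_meta))
  if (length(missing_indiv) > 0) {
    problems <- c(
      problems,
      paste("indiv_meta lacks:", paste(missing_indiv, collapse = ", "))
    )
  }

  missing_loci <- setdiff(required_loci, names(loci))
  if (length(missing_loci) > 0) {
    problems <- c(
      problems,
      paste("loci lacks:", paste(missing_loci, collapse = ", "))
    )
  }

  # Assumption: one row of x per individual, one column of x per locus.
  if ("id" %in% names(indiv_meta) && nrow(x) != length(indiv_meta[["id"]])) {
    problems <- c(problems, "nrow(x) differs from the number of individuals")
  }
  if (ncol(x) != nrow(loci)) {
    problems <- c(problems, "ncol(x) differs from the number of loci")
  }
  problems
}

# Small made-up inputs: 2 individuals, 3 loci
genotypes <- rbind(c(1, 1, 0), c(2, 1, 0))
indiv_meta <- data.frame(id = c("a", "b"), population = c("pop1", "pop1"))
loci <- data.frame(
  name = paste0("rs", 1:3),
  chromosome = "chr1",
  position = c(3L, 5L, 65L),
  genetic_dist = 0,
  allele_ref = c("A", "T", "C"),
  allele_alt = c("T", "C", NA)
)
problems <- check_gen_tibble_inputs(genotypes, indiv_meta, loci)
```

If `problems` is empty, the three objects can be passed on as `x`, `indiv_meta` and `loci`. The check covers only column names and dimensions; it does not validate allele codes or ploidy.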
VCF files: the fast cpp parser is used by default. Both the cpp and vcfR
parsers attempt to establish ploidy from the first variant; if that
variant is on a sex chromosome (or mtDNA), the parser will fail with
'Error: a genotype has more than max_ploidy alleles...'. To successfully import
such a VCF, reorder the variants so that the first chromosome is an
autosome, using a tool such as vcftools. Currently, only biallelic SNPs are
supported. If haploid variants (e.g. sex chromosomes) are included in the
VCF, they are not transformed into homozygous calls. Instead, reference
alleles will be coded as 0 and alternative alleles will be coded as 1.
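Because ploidy is taken from the first variant, it can help to look at that variant before importing. The following is a rough base R sketch, not a tidypopgen function: the hypothetical helper `first_vcf_chromosome()` returns the chromosome of the first variant record, and the `non_autosomes` vector is an assumed list of names that you would need to adapt to your reference genome.

```r
# Hypothetical helper (not part of tidypopgen): return the chromosome of the
# first variant record in a VCF file, or NA if no record is found.
first_vcf_chromosome <- function(vcf_path, max_lines = 100000L) {
  # gzfile() reads both gzip-compressed and uncompressed text files
  con <- gzfile(vcf_path, open = "rt")
  on.exit(close(con))
  for (i in seq_len(max_lines)) {
    line <- readLines(con, n = 1L)
    if (length(line) == 0) break # end of file
    # skip blank lines and header lines (which start with '#')
    if (nzchar(line) && !startsWith(line, "#")) {
      # the first tab-separated field of a record is CHROM
      return(strsplit(line, "\t", fixed = TRUE)[[1]][1])
    }
  }
  NA_character_
}

# Assumption: these names cover sex chromosomes and mtDNA in your data.
non_autosomes <- c("X", "Y", "M", "MT", "chrX", "chrY", "chrM", "chrMT")

# Tiny made-up VCF whose first variant sits on chromosome X
demo_vcf <- tempfile(fileext = ".vcf")
writeLines(c(
  "##fileformat=VCFv4.2",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1",
  "X\t100\trs1\tA\tT\t.\t.\t.\tGT\t0",
  "1\t200\trs2\tC\tG\t.\t.\t.\tGT\t0/1"
), demo_vcf)

first_chrom <- first_vcf_chromosome(demo_vcf)
starts_on_non_autosome <- first_chrom %in% non_autosomes
```

When `starts_on_non_autosome` is TRUE, the file is a candidate for reordering before it is passed to `gen_tibble()`.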
packedancestry files: when loading packedancestry files, missing alleles will be converted from 'X' to NA.
an object of the class gen_tbl.
# Create a gen_tibble from a .bed file
bed_file <-
  system.file("extdata", "lobster", "lobster.bed", package = "tidypopgen")
gen_tibble(bed_file,
  backingfile = tempfile("lobsters"),
  quiet = TRUE
)
# Create a gen_tibble from a .vcf file
vcf_path <-
  system.file("extdata", "anolis",
    "punctatus_t70_s10_n46_filtered.recode.vcf.gz",
    package = "tidypopgen"
  )
gen_tibble(vcf_path, quiet = TRUE, backingfile = tempfile("anolis_"))
# Create a gen_tibble from a matrix of genotypes:
test_indiv_meta <- data.frame(
  id = c("a", "b", "c"),
  population = c("pop1", "pop1", "pop2")
)
test_genotypes <- rbind(
  c(1, 1, 0, 1, 1, 0),
  c(2, 1, 0, 0, 0, 0),
  c(2, 2, 0, 0, 1, 1)
)
test_loci <- data.frame(
  name = paste0("rs", 1:6),
  chromosome = paste0("chr", c(1, 1, 1, 1, 2, 2)),
  position = as.integer(c(3, 5, 65, 343, 23, 456)),
  genetic_dist = as.double(rep(0, 6)),
  allele_ref = c("A", "T", "C", "G", "C", "T"),
  allele_alt = c("T", "C", NA, "C", "G", "A")
)
gen_tibble(
  x = test_genotypes,
  loci = test_loci,
  indiv_meta = test_indiv_meta,
  valid_alleles = c("A", "T", "C", "G"),
  quiet = TRUE
)