View source: R/SlimFunctions.R
To import SLiM data into R, we provide the read_slim function, which has been tested for SLiM versions 2.0-3.1. The read_slim function is only appropriate for single-nucleotide variant (SNV) data produced by SLiM's outputFull() method. We do not support output in MS or VCF format, i.e. data produced by outputMSSample() or outputVCFSample() in SLiM.
file_path
character. The file path or URL of the .txt output file created by the outputFull() method in SLiM.
keep_maf
numeric. The largest allele frequency for retained SNVs, by default
recomb_map
data frame. (Optional) A recombination map of the same format as the data frame returned by
pathway_df
data frame. (Optional) A data frame that contains the positions for each exon in a pathway of interest. See details.
recode_recurrent
logical. When
In addition to reducing the size of the data, the argument keep_maf has practical value. In family-based studies, common SNVs are generally filtered out prior to analysis. Users who intend to study common variants in addition to rare variants may need to run chromosome-specific analyses so that the resulting large data sets can be allocated in R.
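The effect of a frequency threshold can be sketched on a toy data frame. This is only an illustration, not the code read_slim runs: the values are made up, the threshold is a hypothetical choice, and the column names simply mirror the Mutations data frame described later on this page.

```r
# Toy catalog of SNVs; the values are invented for illustration.
mutations <- data.frame(
  colID    = 1:5,
  position = c(101, 250, 377, 512, 940),
  afreq    = c(0.002, 0.150, 0.010, 0.480, 0.0005)
)

keep_maf <- 0.01  # hypothetical threshold chosen for this sketch

# Retain SNVs whose derived allele frequency is at most keep_maf.
rare <- mutations[mutations$afreq <= keep_maf, ]
rare$colID
```

Only the retained SNVs would need a column in the haplotype matrix, which is where the reduction in data size comes from.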
The argument recomb_map is used to remap mutations to their actual locations and chromosomes. This is necessary when data has been simulated over non-contiguous regions such as exon-only data. If create_slimMap was used to create the recombination map for SLiM, simply supply the output of create_slimMap to recomb_map. If recomb_map is not provided we assume that the SNV data has been simulated over a contiguous segment starting with the first base pair on chromosome 1.
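The idea of remapping can be sketched as follows. This is a simplified illustration under stated assumptions, not the package's implementation: the exon_map data frame, its column names, and the helper remap_positions are hypothetical, exons are assumed to be simulated back-to-back with no gaps, and simulated coordinates are taken to be 1-based (SLiM itself reports zero-based positions).

```r
# Hypothetical exon map: actual coordinates of three exons that were
# simulated back-to-back as one contiguous segment.
exon_map <- data.frame(
  chrom = c(1, 1, 2),
  start = c(1000, 5000, 200),
  end   = c(1099, 5049, 299)
)

# Length of each exon and the simulated coordinate where each one begins.
exon_len  <- exon_map$end - exon_map$start + 1
sim_start <- cumsum(c(1, head(exon_len, -1)))

# Hypothetical helper: map simulated positions to (chrom, position) pairs.
remap_positions <- function(sim_pos) {
  seg <- findInterval(sim_pos, sim_start)  # exon containing each position
  data.frame(
    chrom    = exon_map$chrom[seg],
    position = exon_map$start[seg] + (sim_pos - sim_start[seg])
  )
}

remapped <- remap_positions(c(1, 100, 101, 150, 151, 250))
remapped
```

The format actually produced by create_slimMap may differ from this toy map, so the sketch should be read as a picture of the bookkeeping only.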
The data frame pathway_df allows users to identify SNVs located within a pathway of interest. When supplied, we expect that pathway_df does not contain any overlapping segments. All overlapping exons in pathway_df MUST be combined into a single observation. Users may combine overlapping exons with the combine_exons function.
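To show what combining overlapping exons into a single observation means, here is a minimal base R sketch. The helper merge_overlaps and the column names chrom, start, and end are hypothetical and are not the package's combine_exons function or its expected input format.

```r
# Hypothetical helper (not combine_exons): merge overlapping segments
# within each chromosome into single observations.
merge_overlaps <- function(df) {
  df <- df[order(df$chrom, df$start), ]
  out <- df[1, ]
  for (i in seq_len(nrow(df))[-1]) {
    last <- nrow(out)
    if (df$chrom[i] == out$chrom[last] && df$start[i] <= out$end[last]) {
      # Overlaps the previous segment: extend it.
      out$end[last] <- max(out$end[last], df$end[i])
    } else {
      # No overlap: start a new segment.
      out <- rbind(out, df[i, ])
    }
  }
  rownames(out) <- NULL
  out
}

exons <- data.frame(
  chrom = c(1, 1, 1, 2),
  start = c(100, 150, 400, 100),
  end   = c(200, 300, 500, 180)
)
merged <- merge_overlaps(exons)
merged
```

In this toy input the first two exons on chromosome 1 overlap, so they collapse into one segment spanning 100 to 300, while the remaining exons are left unchanged.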
When TRUE, the logical argument recode_recurrent indicates that recurrent SNVs should be recorded as a single observation. SLiM can model many types of mutations, e.g. neutral, beneficial, and deleterious mutations. When different types of mutations occur at the same position, carriers will experience different fitness effects depending on the carried mutation. However, when mutations at the same location have the same fitness effects, they represent a recurrent mutation. Even so, SLiM stores recurrent mutations separately and calculates their prevalence independently. When recode_recurrent = TRUE, we store recurrent mutations as a single observation and calculate the derived allele frequency based on their combined prevalence. This convention both reduces storage and gives a correct estimate of the derived allele frequency of the mutation. Users who prefer to store recurrent mutations from independent lineages as unique entries should set recode_recurrent = FALSE.
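The recoding convention can be sketched on a toy mutation table. This is an illustration under assumptions rather than the package's code: the data are invented, mutation type is used as a stand-in for "same fitness effect", and summing prevalences assumes that no haplotype carries more than one of the recurrent copies.

```r
n_haplotypes <- 200  # hypothetical: 100 diploid individuals

# Toy mutation records as they might be stored separately.
# The first two rows share both position and type, i.e. a recurrent mutation.
muts <- data.frame(
  position   = c(50, 50, 50, 75),
  type       = c("m1", "m1", "m2", "m1"),
  prevalence = c(4, 6, 3, 10)
)

# Combine records that share position and type, summing their prevalence.
recoded <- aggregate(prevalence ~ position + type, data = muts, FUN = sum)
recoded <- recoded[order(recoded$position, recoded$type), ]

# Derived allele frequency from the combined prevalence.
recoded$afreq <- recoded$prevalence / n_haplotypes
recoded
```

The m2 mutation at position 50 stays a separate entry because it differs in type from the two m1 records at the same position.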
The read_slim function returns an object of class SNVdata, which inherits from a list and contains the following two items:
Haplotypes
A sparse matrix of class dgCMatrix (see dgCMatrix-class). The columns in Haplotypes represent distinct SNVs, while the rows represent individual haplotypes. Note that this matrix contains two rows of data for each diploid individual in the population: one row for the maternally inherited haplotype and the other for the paternally inherited haplotype.
Mutations
A data frame cataloging SNVs in Haplotypes. The variables in the Mutations data set are described as follows:
colID
Associates the rows, i.e. SNVs, in Mutations to the columns of Haplotypes.
chrom
The chromosome that the SNV resides on.
position
The position of the SNV in base pairs.
afreq
The derived allele frequency of the SNV.
marker
A unique character identifier for the SNV.
type
The mutation type, as specified in the user's SLiM simulation.
pathwaySNV
Identifies SNVs located within the pathway of interest as TRUE.
Please note: the variable pathwaySNV will be omitted when pathway_df is not supplied to read_slim.
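To make the relationship between the two items concrete, the sketch below builds a toy pair by hand, assuming the Matrix package is installed. The objects here are invented stand-ins, not output from read_slim; only the column names colID and pathwaySNV and the dgCMatrix class are taken from the description above.

```r
library(Matrix)

# Toy Haplotypes: 4 haplotypes (2 diploid individuals) by 3 SNVs.
haplotypes <- sparseMatrix(
  i = c(1, 2, 2, 4),   # haplotype (row) carrying the derived allele
  j = c(1, 1, 3, 2),   # SNV (column) carried
  x = rep(1, 4),
  dims = c(4, 3)
)

# Toy Mutations: one row per column of the haplotype matrix.
mutations <- data.frame(
  colID      = 1:3,
  chrom      = c(1, 1, 2),
  position   = c(120, 480, 95),
  pathwaySNV = c(TRUE, FALSE, TRUE)
)

# Use colID to pull the haplotype columns for pathway SNVs.
pathway_cols <- mutations$colID[mutations$pathwaySNV]
pathway_haps <- haplotypes[, pathway_cols, drop = FALSE]

# Number of haplotypes carrying each pathway SNV.
colSums(pathway_haps)
```

Because colID ties each row of the catalog to a column of the matrix, any filter applied to the catalog can be carried over to the haplotypes in this way.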
An object of class SNVdata, which inherits from a list and contains:
Haplotypes
A sparse matrix of haplotypes. See details.
Mutations
A data frame cataloging SNVs in Haplotypes.
Haller, B., Messer, P. W. (2017). SLiM 2: Flexible, interactive forward genetic simulations. Molecular Biology and Evolution, 34(1), pp. 230-240.
Douglas Bates and Martin Maechler (2018). Matrix: Sparse and Dense Matrix Classes and Methods. R package version 1.2-14. https://CRAN.R-project.org/package=Matrix
create_slimMap, combine_exons, dgCMatrix-class
# Specify the URL of the example output data simulated by SLiM.
file_url <-
  'https://raw.githubusercontent.com/cnieuwoudt/Example--SLiMSim/master/example_SLIMout.txt'
s_out <- read_slim(file_url)

class(s_out)
str(s_out)

# As seen above, read_slim returns an object of class SNVdata,
# which contains two items. The first is a sparse matrix
# named Haplotypes, which contains the haplotypes for each individual in the
# simulation. The second item is a data set named Mutations, which catalogs
# the mutations in the Haplotypes matrix.

# View the first 5 lines of the mutation data
head(s_out$Mutations, n = 5)

# View the first 20 mutations on the first 10 haplotypes
s_out$Haplotypes[1:10, 1:20]