Efficiently reads a seqz file into R.
read.seqz(file, n_lines = NULL, col_types = "ciciidddcddccc", chr_name = NULL,
    buffer = 33554432, parallel = 1,
    col_names = c("chromosome", "position", "base.ref", "depth.normal",
        "depth.tumor", "depth.ratio", "Af", "Bf", "zygosity.normal",
        "GC.percent", "good.reads", "AB.normal", "AB.tumor",
        "tumor.strand"), ...)
file | file name
col_types | a string describing the class of each column of the input file (see
chr_name | if specified, only the selected chromosome will be extracted instead of the entire file. For
n_lines | vector of length 2 specifying the first and last line to read from the file. If specified, only the selected portion of the file will be used.
buffer | maximum size of each chunk in bytes (see
parallel | integer, number of threads used to process a seqz file (see
col_names | names of the columns of the seqz file. The default corresponds to the standard column names of a seqz file.
... | any arguments accepted by
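As a rough illustration of how the default col_types string lines up with the default col_names, the sketch below pairs each single-letter code with its column. It assumes the letters follow the compact readr-style convention (c = character, i = integer, d = double); that interpretation is an assumption and is not stated in the argument descriptions.

```r
# Default values copied from the usage section.
col_types <- "ciciidddcddccc"
col_names <- c("chromosome", "position", "base.ref", "depth.normal",
               "depth.tumor", "depth.ratio", "Af", "Bf", "zygosity.normal",
               "GC.percent", "good.reads", "AB.normal", "AB.tumor",
               "tumor.strand")

# Assumed meaning of each compact type code (readr-style convention).
type_codes <- c(c = "character", i = "integer", d = "double")

# Split the string into one code per column and attach the column names.
codes <- strsplit(col_types, "")[[1]]
stopifnot(length(codes) == length(col_names))
column_classes <- setNames(unname(type_codes[codes]), col_names)
print(column_classes)
```

If custom col_names are supplied, the same check (one type code per column name) is a cheap way to catch a mismatch before reading a large file.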
A seqz file is a tab-separated text file with 14 columns and a header row.
The first 3 columns are derived from the original pileup file and contain:
chromosome: the chromosome name
position: the base position
base.ref: the base in the reference genome. Note that this is NOT necessarily the same base as in the normal specimen.
The remaining 11 columns contain the following information:
depth.normal: read depth observed in the normal sample
depth.tumor: read depth observed in the tumor sample
depth.ratio: ratio of depth.tumor and depth.normal
Af: A-allele frequency observed in the tumor sample
Bf: B-allele frequency observed in the tumor sample in heterozygous positions
zygosity.normal: zygosity of the reference sample. "hom" corresponds to AA or BB, whereas "het" corresponds to AB or BA
GC.percent: GC-content (percent), calculated from the reference genome in fixed nucleotide windows
good.reads: number of reads in the tumor specimen that passed the quality threshold (the threshold is specified in the pre-processing software)
AB.normal: base(s) found in the germline sample; for heterozygous positions, A and B are sorted using the values of Af and Bf respectively
AB.tumor: base(s) found in the tumor sample that are not present in the normal specimen. The field includes all the variants found in the tumor alignment, separated by a colon. Each variant contains the base and the observed frequency
tumor.strand: frequency of the variant nucleotides detected in the forward orientation. The field has the same structure as AB.tumor and gives, for each variant, the fraction of the reads presenting that variant that are oriented in the forward direction
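To make the column layout concrete, the following sketch builds a two-row table with the 14 columns by hand and splits an AB.tumor value into its variants. All values are invented for illustration. The exact encoding of each variant (a base letter immediately followed by its frequency, with "." meaning no variant) is an assumption, and parse_variants is a hypothetical helper, not part of sequenza.

```r
# Toy rows with made-up values, laid out like the 14 columns described above.
seqz_toy <- data.frame(
  chromosome = c("1", "1"),
  position = c(1000L, 1050L),
  base.ref = c("A", "C"),
  depth.normal = c(30L, 40L),
  depth.tumor = c(45L, 20L),
  depth.ratio = c(1.5, 0.5),
  Af = c(0.7, 0.6),
  Bf = c(0.0, 0.4),
  zygosity.normal = c("hom", "het"),
  GC.percent = c(42, 42),
  good.reads = c(45, 20),
  AB.normal = c("A", "CT"),
  AB.tumor = c("G0.2:T0.1", "."),
  tumor.strand = c("G0.5:T1.0", "."),
  stringsAsFactors = FALSE
)

# In this toy table depth.ratio is tumor depth divided by normal depth.
stopifnot(isTRUE(all.equal(seqz_toy$depth.ratio,
                           seqz_toy$depth.tumor / seqz_toy$depth.normal)))

# Hypothetical helper: split a colon-separated variant field into a table.
# Assumes each variant is one base letter followed by a numeric frequency.
parse_variants <- function(field) {
  if (field == ".") {
    return(data.frame(base = character(0), freq = numeric(0)))
  }
  parts <- strsplit(field, ":", fixed = TRUE)[[1]]
  data.frame(
    base = substr(parts, 1, 1),
    freq = as.numeric(substring(parts, 2)),
    stringsAsFactors = FALSE
  )
}

variants <- parse_variants(seqz_toy$AB.tumor[1])
print(variants)
```

Under the same assumption, the helper can be applied to tumor.strand, since that field is described as having the same structure as AB.tumor.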
read.seqz provides efficient access to a seqz file by chromosome or by line numbers. The function can also access coordinate-specific regions of tabix-indexed seqz files.
The specific content of a seqz file is explained in the Format section.
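The idea of reading only a range of lines can be sketched with base R alone. This is not how read.seqz is implemented; it only illustrates the concept on a small temporary tab-separated file, using read.delim with skip and nrows. The file contents and the helper read_line_range are invented for the example, and the line-numbering convention of the helper (1-based data rows, header excluded) is an assumption of the sketch.

```r
# Write a small tab-separated file with a header and six data rows.
toy <- data.frame(chromosome = rep(c("1", "X"), each = 3),
                  position = c(10L, 20L, 30L, 5L, 15L, 25L),
                  stringsAsFactors = FALSE)
path <- tempfile(fileext = ".txt")
write.table(toy, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Hypothetical helper: read data rows first..last (1-based, header excluded).
read_line_range <- function(path, first, last) {
  # Read the header separately so the column names survive the skip.
  header <- names(read.delim(path, nrows = 1))
  # Skipping `first` lines drops the header plus the first - 1 data rows.
  read.delim(path, header = FALSE, skip = first, nrows = last - first + 1,
             col.names = header, colClasses = c("character", "integer"))
}

# Rows 4 to 6 hold chromosome X in this toy file.
chr_x <- read_line_range(path, 4, 6)
print(chr_x)
```

Reading by line range avoids parsing the rows before the requested block, which is the reason a precomputed start and end line per chromosome can be faster than filtering on the chromosome column.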
## Not run:
data_file <- system.file("extdata", "example.seqz.txt.gz", package = "sequenza")
## Read chromosome 1 from a seqz file.
seqz_data <- read.seqz(data_file, chr_name = 1)
## Fast access to chromosome X using the file metrics
gc.stats <- gc.sample.stats(data_file)
chrX <- gc.stats$file.metrics[gc.stats$file.metrics$chr == "X", ]
seqz.data <- read.seqz(data_file, n_lines = c(chrX$start, chrX$end))
## Compare the running time of the two different methods.
system.time(seqz.data <- read.seqz(data_file, n_lines = c(chrX$start, chrX$end)))
system.time(seqz.data <- read.seqz(data_file, chr_name = "X"))
## End(Not run)