localize: Localize CGH probes in a genome


Description

localize returns the genomic coordinates (chromosome, strand, starting position, ending position) of a set of probes in a given genome. It relies on the external BLAST-Like Alignment Tool (BLAT) to perform fuzzy matching on both strands, and provides various filters suited to CGH probes.

blatInstall needs to be executed once after the R package installation in order to use localize.

Usage

  blatInstall(blat, cygwin)

  localize(probeFile, chromFiles, chromPattern = "^(.+)\\.[^\\.]+$",
    blatArgs = character(0), rawOutput = FALSE, noMulti = TRUE, noOverlap = TRUE,
    noPartial = TRUE, verbose = 2)

Arguments

blat

Single character value, path to the BLAT executable file to use for localization.

cygwin

Single character value, path to the cygwin1.dll file that might be needed to run BLAT on Windows.

probeFile

Single character value, path to a multi-FASTA file describing the probes to localize. FASTA comments are used as probe names, and should be unique.

chromFiles

Character vector, paths to chromosome sequences (a single fasta file for each chromosome).

chromPattern

Single character value, a regular expression used to extract chromosome names from chromFiles. It must capture a single group, which is used as the replacement. The default value uses the base names of the files, without extension, as chromosome names.

blatArgs

Character vector, arguments to be passed to BLAT ("name=value" or "-flag"). See the BLAT documentation in 'References' for further details.

rawOutput

Single logical value, whether to return the merged BLAT output or the processed one (see 'Value'). Note that raw output is not filtered.

noMulti

Single logical value, whether to filter out probes located at multiple genomic positions. Ignored if rawOutput is TRUE.

noOverlap

Single logical value, whether to filter out overlapping probes (when two overlapping probes are detected, both are discarded). Ignored if rawOutput is TRUE.

noPartial

Single logical value, whether to filter out partial matches. Partial matches are still taken into account by the other filters; to disable them completely, consider using different BLAT arguments. Ignored if rawOutput is TRUE.

verbose

Single numeric value, the level of verbosity (0, 1 or 2).
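The chromosome name extraction described for chromPattern can be previewed with base R before running localize. The following is a minimal sketch, assuming the pattern is applied to the base name of each file with a "\\1" replacement, as the argument description suggests. The file paths are hypothetical, and the actual internal implementation may differ.

```r
# Hypothetical chromosome files (these paths do not need to exist here)
chromFiles <- c("genome/chr1.fa", "genome/chr2.fa", "genome/chrX.fasta")

# Default value of the chromPattern argument
chromPattern <- "^(.+)\\.[^\\.]+$"

# Assumption: the single captured group replaces the whole base name
chromNames <- sub(chromPattern, "\\1", basename(chromFiles))
print(chromNames)
```

With the default pattern, everything before the last dot of the file name is kept, so the three files above yield "chr1", "chr2" and "chrX".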

Value

If rawOutput is TRUE, localize returns the tabular section of the merged psLayout 3 file returned by BLAT (see the BLAT documentation in 'References' for further details).

Otherwise, it returns a data.frame with one row for each probe that was found and not filtered out, ordered by chrom, start, then name:

name

Character, the probe names, as defined by comments in probeFile.

chrom

Character, the chromosome on which the probe matched, named after the chromFiles entry in which the match was found (see chromPattern).

strand

Character, "+" for a forward match, "-" for a reverse complement match.

start

Integer, the lower position of the probe in the chromosome. See 'Coordinate system'.

end

Integer, the upper position of the probe in the chromosome. See 'Coordinate system'.

insertions

Integer, number of nucleotides inserted in the probe with respect to the chromosome sequence.

deletions

Integer, number of nucleotides deleted in the probe with respect to the chromosome sequence.

mismatches

Integer, number of mismatching nucleotides between the probe and the chromosome sequence.

freeEnds

Integer, number of nucleotides at the probe extremities that are ignored in the alignment.
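The noMulti and noOverlap filters described in 'Arguments' decide which rows end up in this data.frame. The sketch below illustrates the idea on a mock table with made-up values and only a subset of the columns. It is an assumption-laden approximation of the filters as documented, not the code used by localize. In particular, the order in which the package applies its filters is not reproduced here.

```r
# Mock localization table (made-up values, subset of the 'Value' columns)
loc <- data.frame(
  name  = c("p1", "p2", "p3", "p4", "p4"),
  chrom = c("chr1", "chr1", "chr1", "chr2", "chr3"),
  start = c(1L, 50L, 200L, 50L, 10L),
  end   = c(60L, 109L, 259L, 109L, 69L),
  stringsAsFactors = FALSE
)

# noMulti-like flag: probes whose name appears on more than one row
multi <- loc$name %in% loc$name[duplicated(loc$name)]

# noOverlap-like flag: rows whose closed interval [start, end] intersects
# another row on the same chromosome (pairwise test, quadratic in rows)
n <- nrow(loc)
ov <- outer(seq_len(n), seq_len(n), function(i, j) {
  i != j & loc$chrom[i] == loc$chrom[j] &
    loc$start[i] <= loc$end[j] & loc$start[j] <= loc$end[i]
})
overlapping <- rowSums(ov) > 0

# Both members of an overlapping pair are discarded, as documented
kept <- loc[!multi & !overlapping, ]
print(kept$name)
```

In this toy table, p1 and p2 overlap on chr1 and p4 matches on two chromosomes, so only p3 is kept. The pairwise comparison is only practical for small tables.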

Coordinate system

When rawOutput is FALSE, coordinates begin at 1, both boundaries are included in the sequence, and the length can be computed as end - start + 1 (Biostrings behavior).

When rawOutput is TRUE, refer to the BLAT specifications (see 'References').

In both cases, backward matches (strand = "-") are expressed in forward coordinates (start < end) (BLAT behavior).
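As an illustration of the difference between the two coordinate systems, the sketch below converts target coordinates from the usual PSL convention (zero-based start, end excluded) to the one-based, closed convention described above. The tStart and tEnd column names come from the PSL format. The values are made up, and the conversion is an assumption about how the two systems relate rather than an excerpt from the package.

```r
# Made-up PSL-style target coordinates: zero-based start, end excluded
psl <- data.frame(tStart = c(0L, 99L), tEnd = c(60L, 159L))

# One-based, closed coordinates: shift the start, keep the end
oneBased <- data.frame(start = psl$tStart + 1L, end = psl$tEnd)

# Length as documented for the processed output
oneBased$length <- oneBased$end - oneBased$start + 1L
print(oneBased)
```

Both conventions describe the same 60-nucleotide matches here: tEnd - tStart in the PSL convention equals end - start + 1 in the one-based convention.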

BLAT installation

BLAT relies on a single executable file, so installation is straightforward.

Download the executable file or compile it for your computer architecture, then use the blatInstall function to copy it to the proper package folder for later use. Precompiled executables for various systems can be found on the author's website (see 'References'), as part of the BlatSuite (only 'blat.exe' or 'blat' is needed).

Windows specificities

Running BLAT on Windows requires Cygwin. You can install Cygwin entirely on your system (see 'References'), or download only the "cygwin1.dll" file and provide it to blatInstall, as it is the only Cygwin component needed. DLL files are a common vector for computer viruses, so make sure you trust the website you download this file from. The mirrors of the official website (see 'References') are a reasonably safe source (no guarantee!): they generally keep compressed archives in /release/cygwin, in which you can find the DLL (in /usr/bin).
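Putting the installation steps together, a typical session might look like the sketch below. All file paths are hypothetical placeholders. The code also assumes that the cygwin argument can be left out on non-Windows systems, which this page does not state explicitly. Nothing is executed unless the cghRA package and all listed files are available.

```r
# Hypothetical paths, to be replaced with your own files
blatPath   <- "blat"          # downloaded or compiled BLAT executable
cygwinPath <- "cygwin1.dll"   # only relevant on Windows
probePath  <- "probes.fa"     # multi-FASTA file of probes
chromPaths <- c("chr1.fa", "chr2.fa")

onWindows <- .Platform$OS.type == "windows"
needed <- c(blatPath, probePath, chromPaths, if (onWindows) cygwinPath)
ready <- requireNamespace("cghRA", quietly = TRUE) && all(file.exists(needed))

if (ready) {
  # One-time step: copy BLAT into the package folder
  if (onWindows) {
    cghRA::blatInstall(blat = blatPath, cygwin = cygwinPath)
  } else {
    # Assumption: 'cygwin' is not needed outside Windows
    cghRA::blatInstall(blat = blatPath)
  }

  # Localize the probes with the default filters
  loc <- cghRA::localize(probeFile = probePath, chromFiles = chromPaths)
  print(head(loc))
} else {
  message("cghRA or the example files are missing; nothing was run.")
}
```

The blatInstall call only needs to be made once per package installation, so it can be dropped from later sessions.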

Author(s)

Sylvain Mareschal

References

BLAT is open-source software, freely available for academic, nonprofit and personal use. See the FAQ for further details. FAQ, specifications, source code and executables

Cygwin is free and open-source software under the GNU General Public License. Official website

See Also

bias


cghRA documentation built on May 2, 2019, 3:34 a.m.