normChr: Normalise chromosome labels.


View source: R/chr.R

Description

Normalise chromosome labels.

Usage

normChr(x)

Arguments

x

Vector of chromosome labels.

Value

Vector of normalised chromosome labels.

Chromosomes and Sequences

Although chromosomes and sequences are equivalent in some yeast genome assemblies, shmootl treats chromosomes (cell structures containing genetic material) as distinct from sequences (linkage units that correspond to all or part of a chromosome). This distinction is necessary to allow for reference genomes in which multiple sequences map to a single chromosome. (See genomeOpt for more on setting a reference genome.) While every sequence must be mapped to a specific chromosome, it is sequences, not chromosomes, that serve as the primary linkage unit throughout this package.
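As a minimal sketch of this distinction, the mapping from sequences to chromosomes can be held in a named character vector and inverted with base R's split() to list the sequences belonging to each chromosome. The genome here is hypothetical: chromosome 4 is assumed to be split into the contig '04_1D22' and an invented second contig '04_1D23'. The object names are illustrative and are not part of shmootl.

```r
# Hypothetical sequence-to-chromosome mapping (not a shmootl data structure).
# Names are sequence labels; values are the chromosomes they map to.
seqToChr <- c(`01` = "01", `02` = "02", `04_1D22` = "04", `04_1D23` = "04")

# Every sequence maps to exactly one chromosome...
stopifnot(!anyNA(seqToChr), !anyDuplicated(names(seqToChr)))

# ...but one chromosome may comprise several sequences.
chrToSeqs <- split(names(seqToChr), unname(seqToChr))
print(chrToSeqs)
```

Keying the mapping on sequence labels mirrors the package's use of sequences, rather than chromosomes, as the primary linkage unit.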

Chromosomes

A yeast nuclear chromosome can be represented by an Arabic number from 1 to 16 inclusive, or by the Roman numeral corresponding to the chromosome number. The mitochondrial chromosome can be represented by the number 17 or a capital 'M'. A chromosome label can include one of the optional prefixes 'c' or 'chr'. So, for example, any of the following can represent chromosome 4: '4', '04', 'IV', 'c4', 'cIV', 'chr4' and 'chrIV'.

Using the function normChr, all of these representations can be normalised to one consistent form: a zero-padded Arabic number (i.e. '04'). This form is used internally by shmootl and is recommended.
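The following is a minimal base-R sketch of the normalisation rules described above. It is not shmootl's implementation, and the function name normChrSketch is hypothetical. It assumes case-sensitive matching (lower-case prefixes, upper-case Roman numerals and 'M') and accepts Roman numerals only for the nuclear chromosomes 1 to 16.

```r
# Hypothetical sketch of chromosome-label normalisation; not shmootl's code.
normChrSketch <- function(x) {
  x <- as.character(x)
  stopifnot(!anyNA(x))
  core <- sub("^(chr|c)", "", x)        # drop optional 'chr' or 'c' prefix
  num <- rep(NA_integer_, length(core))

  isArabic <- grepl("^[0-9]+$", core)   # e.g. '4', '04', '17'
  num[isArabic] <- as.integer(core[isArabic])

  isMito <- core == "M"                 # mitochondrial chromosome
  num[isMito] <- 17L

  isRoman <- !isArabic & !isMito & grepl("^[IVX]+$", core)
  if (any(isRoman)) {
    # invalid numerals become NA and are rejected below
    num[isRoman] <- as.integer(utils::as.roman(core[isRoman]))
  }

  # reject unparsed labels, out-of-range numbers, and Roman numerals above 16
  bad <- is.na(num) | num < 1L | num > 17L | (isRoman & num > 16L)
  if (any(bad)) {
    stop("invalid chromosome label(s): ", paste(x[bad], collapse = ", "))
  }
  sprintf("%02d", num)                  # zero-padded Arabic number
}

normChrSketch(c("4", "IV", "chr4", "cIV", "chrM"))
# returns "04" "04" "04" "04" "17"
```

Restricting the Roman-numeral pattern to I, V and X keeps 'M' from being read as the numeral for 1000.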

Sequences

For genomes in which every sequence represents a specific chromosome, the sequence label is identical to the chromosome label. Otherwise, the sequence label should be a chromosome label followed by a sequence-specific label (e.g. a contig ID), separated by an underscore. For example, a contig '1D22' that maps to chromosome 4 can be represented as '4_1D22', 'IV_1D22', 'chr4_1D22' or 'chrIV_1D22', among other forms.

Variations in chromosome representation are possible as before, but the sequence-specific label must be consistent. As with chromosomes, the function normSeq can be used to normalise all of these forms to one consistent representation: a zero-padded Arabic number followed by the sequence-specific label (i.e. '04_1D22'). This representation is recommended, since shmootl uses it internally as the standard way to label sequences in a genome lacking a one-to-one correspondence between sequences and chromosomes.
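A minimal sketch of this sequence-label convention, assuming that the chromosome part ends at the first underscore: the label is split there, the chromosome part is normalised by a caller-supplied function, and the sequence-specific part is reattached unchanged. The names normSeqSketch, chrFun and simpleChr are hypothetical; any function that maps a vector of chromosome labels to normalised labels, such as normChr, could in principle be passed as chrFun.

```r
# Hypothetical sketch of sequence-label normalisation; not shmootl's code.
# 'chrFun' is any function that normalises a vector of chromosome labels.
normSeqSketch <- function(x, chrFun) {
  x <- as.character(x)
  hasSuffix <- grepl("_", x, fixed = TRUE)
  chrPart <- sub("_.*$", "", x)         # text before the first '_'
  seqPart <- sub("^[^_]*_", "", x)      # text after the first '_'
  chrNorm <- chrFun(chrPart)
  ifelse(hasSuffix, paste0(chrNorm, "_", seqPart), chrNorm)
}

# A deliberately simple chromosome normaliser for this example only:
# it handles Arabic numbers with an optional 'chr' or 'c' prefix.
simpleChr <- function(ch) sprintf("%02d", as.integer(sub("^(chr|c)", "", ch)))

normSeqSketch(c("4_1D22", "chr4_1D22", "c04_1D22", "chr7"), simpleChr)
# returns "04_1D22" "04_1D22" "04_1D22" "07"
```

Passing the chromosome normaliser as an argument keeps the sequence-level logic independent of how chromosome labels themselves are parsed.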

See Also

Other chromosome/sequence functions: formatChr, formatSeq, isNormChr, isNormSeq, normSeq, orderChr, orderSeq, rankChr, rankSeq, sortChr, sortSeq


gact/shmootl documentation built on Nov. 11, 2021, 6:23 p.m.