CrossInfo-class: An S4 class to hold yeast cross information.

Description

A CrossInfo object holds yeast cross information for a specific cross object. The contents of its slots should match its corresponding object. To view documentation for any methods of this class, input the name of the method preceded by a question mark (e.g. ?getPhenotypes).

Slots

seq

A non-redundant character vector of sequence identifiers, with the name of each element being the name of the given sequence. See also setSequences and getSequences.

pheno

A non-redundant vector of cross phenotypes, with the name of each element being the syntactically valid name of the phenotype ID (as output by the function make.names). See also setPhenotypes and getPhenotypes.

markers

A data.frame with information about the non-redundant set of markers in a cross (see setMarkers and getMarkers). This can optionally contain information about the sequences corresponding to each marker (see setMarkerSeqs and getMarkerSeqs).

samples

A data.frame with information about the samples in a cross. At minimum, this must contain indices of the samples in the given cross dataset. If relevant, it can contain information about sample IDs (see setSamples and getSamples), strain indices (see setStrainIndices and getStrainIndices), and tetrad indices (see setTetradIndices and getTetradIndices).

alleles

A vector of cross allele symbols. See setAlleles and getAlleles.

genotypes

A vector of cross genotype symbols. See setGenotypes and getGenotypes.

crosstype

Cross type. See setCrosstype and getCrosstype.
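
To make the slot layout above concrete, the following is a minimal sketch of a class with the same slot names. The class name CrossInfoSketch and all slot types are assumptions made for this example; they are not taken from the shmootl source. A real CrossInfo object should be built and modified through the accessor functions listed above.

```r
library(methods)

# Hypothetical stand-in for CrossInfo; slot types are assumed, not confirmed.
setClass("CrossInfoSketch", slots = c(
  seq = "character", pheno = "character",
  markers = "data.frame", samples = "data.frame",
  alleles = "character", genotypes = "character",
  crosstype = "character"
))

pheno_ids <- c("0.0", "1.0", "2.0")
x <- new("CrossInfoSketch",
  # Each element is named by its own sequence identifier.
  seq = c("01" = "01", "02" = "02"),
  # Each phenotype is named by the syntactically valid form of its ID.
  pheno = setNames(pheno_ids, make.names(pheno_ids))
)

slotNames(x)
names(x@pheno)
```

Slots that are not supplied (here markers, samples, alleles, genotypes and crosstype) are left as empty objects of the assumed type.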

Chromosomes and Sequences

Although they are equivalent for some yeast genome assemblies, chromosomes (cell structures containing genetic material) are treated by shmootl as being distinct from sequences (linkage units that correspond to all or part of a chromosome). This distinction is necessary to allow for use of reference genomes in which multiple sequences map to a single chromosome. (See genomeOpt for more on setting a reference genome.) While every sequence must be mapped to a specific chromosome, it is sequences, and not chromosomes, that are used as the primary linkage unit throughout this package.

Chromosomes

A yeast nuclear chromosome can be represented by an Arabic number in the range 1 to 16, inclusive; or by the Roman numeral corresponding to the chromosome number. The mitochondrial chromosome can be represented by the number 17 or a capital 'M'. A chromosome label can include one of the optional prefixes 'c' or 'chr'. So for example, any of the following can represent chromosome 4: '4', 'c4', 'chr4', 'IV', 'cIV', 'chrIV'.

Using the function normChr, all of these representations can be normalised to one consistent form: a zero-padded Arabic number (e.g. '04' for chromosome 4). shmootl uses this form internally, and it is the recommended representation.
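
The labelling rules above can be sketched in base R as follows. The helper norm_chr_sketch is hypothetical and written only for illustration; it is not the implementation of normChr, whose exact behaviour and arguments are not described here.

```r
# Hypothetical helper for illustration; this is not shmootl's normChr.
norm_chr_sketch <- function(x) {
  label <- sub("^(chr|c)", "", as.character(x))  # drop the optional prefix
  num <- rep(NA_integer_, length(label))

  arabic <- grepl("^[0-9]+$", label)
  if (any(arabic)) num[arabic] <- as.integer(label[arabic])

  roman <- grepl("^[IVX]+$", label)
  if (any(roman)) num[roman] <- as.integer(utils::as.roman(label[roman]))

  num[label == "M"] <- 17L  # mitochondrial chromosome

  if (anyNA(num) || any(num < 1L | num > 17L)) {
    stop("unrecognised chromosome label")
  }
  sprintf("%02d", num)  # zero-padded Arabic number
}

norm_chr_sketch(c("4", "c4", "chr4", "IV", "cIV", "chrIV"))
norm_chr_sketch(c("M", "chrXVI"))
```

In this sketch, every label in the first call maps to '04', and the second call gives '17' and '16'.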

Sequences

For genomes in which every sequence represents a specific chromosome, the sequence label is identical to the chromosome label. In other cases, the sequence label should be a chromosome label followed by a sequence-specific label (e.g. contig ID), separated by an underscore. For example, a contig '1D22' that maps to chromosome 4 can be represented as '4_1D22' or 'chrIV_1D22'.

Variations in chromosome representation are possible as before, but the sequence-specific label must be consistent. As with chromosomes, the function normSeq can be used to normalise all of these forms to one consistent representation: a zero-padded Arabic number followed by the sequence-specific label (i.e. '04_1D22'). This representation is recommended, as it is used internally by shmootl as a standard way to label sequences in a genome lacking a one-to-one correspondence between sequences and chromosomes.
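
A sequence label of this kind can be normalised by splitting it at the first underscore, normalising the chromosome part, and leaving the sequence-specific part untouched. The sketch below shows that idea in base R. The helper norm_seq_sketch is hypothetical and is not the implementation of normSeq.

```r
# Hypothetical helper for illustration; this is not shmootl's normSeq.
norm_seq_sketch <- function(x) {
  has_suffix <- grepl("_", x, fixed = TRUE)
  chr_part <- sub("_.*$", "", x)                        # text before first underscore
  suffix <- ifelse(has_suffix, sub("^[^_]*_", "", x), NA_character_)

  # Normalise the chromosome part to a zero-padded Arabic number.
  label <- sub("^(chr|c)", "", chr_part)
  num <- suppressWarnings(as.integer(label))
  roman <- is.na(num) & grepl("^[IVX]+$", label)
  if (any(roman)) num[roman] <- as.integer(utils::as.roman(label[roman]))
  num[label == "M"] <- 17L
  stopifnot(!anyNA(num), num >= 1L, num <= 17L)
  chr <- sprintf("%02d", num)

  # The sequence-specific label is kept exactly as given.
  ifelse(has_suffix, paste(chr, suffix, sep = "_"), chr)
}

norm_seq_sketch(c("4_1D22", "c4_1D22", "chrIV_1D22", "chrIV"))
```

Here the first three labels all become '04_1D22', while 'chrIV', which has no sequence-specific part, becomes '04'.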

Phenotype IDs

A phenotype ID can be any valid item ID (see package overview), although it may be changed by R/qtl to ensure that it is syntactically valid. In such cases, the original phenotype ID can be obtained from the 'info' attribute of a cross that has been loaded with readCrossCSV (see CrossInfo).
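
As a rough illustration of how an ID can change, the base R function make.names (named in the description of the pheno slot) converts arbitrary strings to syntactically valid names. The example IDs below are made up, and it is an assumption of this sketch that the adjustment made on loading matches make.names exactly.

```r
ids <- c("growth rate", "37C", "0.0", "pH_5")  # made-up phenotype IDs
syntactic <- make.names(ids)
syntactic

# Keep the original IDs recoverable by storing them under the adjusted names.
lookup <- setNames(ids, syntactic)
lookup[["X37C"]]
```

IDs that are already syntactically valid, such as 'pH_5', pass through unchanged, which is why only some phenotype IDs differ from their names.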

Time-Series Phenotypes

A set of phenotypes can be designated as a time-series by naming each phenotype with the time point at which phenotype observations were made (e.g. '0.0', '1.0', '2.0'). Time points can be in any unit, but must be non-negative, monotonically increasing, and have a consistent time step. If some time points are missing, the resulting gap in time must be a multiple of the time step.
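
These conditions can be checked with a few lines of base R. The helper below is a hypothetical sketch, not a shmootl function. It assumes that the time step is the smallest gap between consecutive time points, that 'monotonically increasing' means strictly increasing, and that IDs are given in their original form (e.g. '0.0' rather than 'X0.0').

```r
# Hypothetical check for illustration; not part of shmootl.
is_time_series_sketch <- function(ids, tol = 1e-8) {
  t <- suppressWarnings(as.numeric(ids))
  if (length(t) < 2L || anyNA(t) || any(t < 0)) return(FALSE)

  gaps <- diff(t)
  if (any(gaps <= 0)) return(FALSE)  # must be strictly increasing

  step <- min(gaps)                  # assumed time step
  ratio <- gaps / step
  all(abs(ratio - round(ratio)) < tol)  # every gap is a multiple of the step
}

is_time_series_sketch(c("0.0", "1.0", "2.0"))  # complete series
is_time_series_sketch(c("0.0", "1.0", "3.0"))  # one missing time point
is_time_series_sketch(c("0.0", "1.0", "2.5"))  # inconsistent step
```

The first two calls return TRUE and the third returns FALSE, because a gap of 1.5 is not a multiple of the step of 1.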

Locus IDs

Map locus IDs can be any valid item ID (see package overview), and are of two main types: markers and pseudomarkers. A marker ID is any valid locus ID that is not a pseudomarker ID. Pseudomarker IDs are used by R/qtl for inter-marker loci. They indicate the reference sequence and genetic map position of the locus (e.g. 'c04.loc33' for a locus at position 33 cM on chromosome IV).
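
The sketch below shows one way to tell the two types apart and to pull the sequence and position out of a pseudomarker ID. The regular expression is an assumption generalised from the single example 'c04.loc33' (a 'c' prefix, a sequence label, '.loc', then a non-negative position); the helper name and the marker ID 'YDL001W' are placeholders for this example.

```r
# Hypothetical parser for illustration; the ID pattern is assumed.
parse_pseudomarker_sketch <- function(ids) {
  pattern <- "^c(.+)\\.loc([0-9]+(\\.[0-9]+)?)$"
  is_pmk <- grepl(pattern, ids)

  seq_label <- ifelse(is_pmk, sub(pattern, "\\1", ids), NA_character_)
  pos_text <- ifelse(is_pmk, sub(pattern, "\\2", ids), NA_character_)

  data.frame(
    id = ids,
    is_pseudomarker = is_pmk,
    seq = seq_label,
    pos = as.numeric(pos_text),  # map position in cM
    stringsAsFactors = FALSE
  )
}

parse_pseudomarker_sketch(c("c04.loc33", "YDL001W"))
```

Any ID that does not match the assumed pattern is treated as a marker ID, in line with the definition above.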

Sample IDs

A sample ID can be any valid item ID (see package overview). Duplicate sample IDs are permissible, but only if referring to replicate samples of the same strain. Different strains can have different numbers of replicates, but samples from a given strain must be in consecutive rows.
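
The requirement that replicates occupy consecutive rows can be checked with run-length encoding: each sample ID should form exactly one run. The helper below is a hypothetical sketch in base R, not a shmootl function, and the sample IDs are made up.

```r
# Hypothetical check for illustration; not part of shmootl.
replicates_consecutive_sketch <- function(sample_ids) {
  runs <- rle(as.character(sample_ids))
  # A repeated run value means the same ID appears in separated blocks of rows.
  !anyDuplicated(runs$values)
}

replicates_consecutive_sketch(c("S1", "S1", "S2", "S3", "S3", "S3"))
replicates_consecutive_sketch(c("S1", "S2", "S1"))
```

The first call returns TRUE, with strains having two, one and three replicates. The second returns FALSE because the two 'S1' rows are separated by another strain.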

Tetradic Samples

Sample IDs can be used to indicate tetrad membership, even in a cross object with some missing samples. In a tetradic dataset, sample IDs with a numeric suffix (e.g. 'FS101') are taken as segregant numbers and used to infer the tetrad to which each sample ID belongs, assuming that tetrads are labelled sequentially, with four samples per tetrad. Sample IDs can also have an alphanumeric suffix (e.g. 'FS01A'), where the numeric part is a tetrad number and the final letter (i.e. 'A', 'B', 'C', or 'D') identifies the individual tetrad member.
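
Both naming schemes can be sketched in base R as follows. The helper is hypothetical and is not how shmootl is known to implement tetrad inference. It assumes that segregant numbers start at 1, so that segregants 1 to 4 form tetrad 1, segregants 5 to 8 form tetrad 2, and so on. The two schemes are mixed in one call purely for illustration.

```r
# Hypothetical helper for illustration; not part of shmootl.
infer_tetrads_sketch <- function(ids) {
  # Extract the trailing digits, with an optional tetrad member letter.
  m <- regexpr("[0-9]+[A-D]?$", ids)
  suffix <- rep(NA_character_, length(ids))
  suffix[m > 0] <- regmatches(ids, m)

  has_letter <- grepl("[A-D]$", suffix)
  num <- as.integer(sub("[A-D]$", "", suffix))

  # With a letter, the number is the tetrad number itself; without one,
  # it is a segregant number, assumed to start at 1 with four per tetrad.
  tetrad <- ifelse(has_letter, num, (num - 1L) %/% 4L + 1L)
  member <- ifelse(has_letter, sub("^[0-9]+", "", suffix), NA_character_)

  data.frame(id = ids, tetrad = tetrad, member = member,
             stringsAsFactors = FALSE)
}

infer_tetrads_sketch(c("FS01A", "FS01B", "FS02D", "FS101", "FS104", "FS105"))
```

Under these assumptions, 'FS101' and 'FS104' fall in tetrad 26 and 'FS105' starts tetrad 27. Because the tetrad is computed from the segregant number alone, the result is unaffected by missing samples.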


gact/shmootl documentation built on Nov. 11, 2021, 6:23 p.m.