compareProbes: Compare and combine results from two probes for the same SNP

View source: R/fitPolyTools.R

compareProbes R Documentation

Compare and combine results from two probes for the same SNP

Description

On Affymetrix Axiom arrays, two probes may interrogate the same SNP position. This function compares the dosage scores and checkF1 results of the two probes; if they are sufficiently similar, a new marker is generated that combines the results of both probes. Two files are written: a dosage file, in the same format as produced by writeDosagefile, with the data for the separate probes as well as the combined markers, and a file summarizing the comparison results.

Usage

compareProbes(chk, scores,
probe.suffix=c("P","Q","R"), fracdiff.threshold=0.04,
parent1, parent2, F1, ancestors=character(0), other=character(0),
polysomic=TRUE, disomic=FALSE, mixed=FALSE,
ploidy, ploidy2, qall_flavor="qall_mult", shiftParents,
compfile, combscorefile)

Arguments

chk

data frame as returned by checkF1, or a subset with at least the columns markername, parent1, parent2 (the consensus parental genotypes), the columns for the samples specified by parameters parent1, parent2 and ancestors, and bestParentfit, and containing only rows with selected markers. If a column with the name specified by qall_flavor (see below) is present, it will be written to file compfile, but it is not used: any selection of markers based on qall (or other criteria) must have been made beforehand, and the rows for the unwanted markers must have been deleted from the chk data frame.
For each marker*probe combination there may be an unshifted version (shift==0), a shifted one (shift!=0), both, or neither.
If a column shift is present it will be used to shift the dosages (and their P-values with them). If some markernames end in "_shf" this part will be ignored, but the P and Q suffixes (or alternatives as specified by probe.suffix) are required to distinguish the two probes.

scores

data frame as read from the scores file produced by function fitMarkers of package fitPoly, with at least columns MarkerName, SampleName, P0 .. P<ploidyF1> and geno (where <ploidyF1> is the ploidy of the F1, i.e. the average of ploidy and ploidy2).
If the F1 parents are scored separately, their rows should be added to the scores data.frame for the F1 samples. If their ploidy is different from that of the F1, the number of their P columns must be adjusted. The P data of the parents are not used and may all be set to NA.

probe.suffix

a 3-item character vector specifying the suffixes of the marker names that distinguish the two probes. The first two items identify the two probes; the third item is used to indicate a new marker combining the data from both probes. The three items must be different and must have the same number of characters; the default is c("P","Q","R").

fracdiff.threshold

if more than this fraction of F1 scores differs between the two probes, the probes are not combined; default 0.04

parent1

character vector with the sample names of parent 1

parent2

character vector with the sample names of parent 2

F1

character vector with the sample names of the F1 individuals

ancestors

character vector with the sample names of any other ancestors

other

character vector with the sample names of other samples that should be treated like the F1

polysomic

TRUE or FALSE; should be the same as used by checkF1 to calculate the chk data frame

disomic

TRUE or FALSE; should be the same as used by checkF1 to calculate the chk data frame

mixed

TRUE or FALSE; should be the same as used by checkF1 to calculate the chk data frame

ploidy

the ploidy of parent 1 (must be even, 2 (diploid) or larger). Should be the same as used by checkF1 to calculate the chk data frame

ploidy2

the ploidy of parent 2. If omitted it is assumed to be equal to ploidy. Should be the same as used by checkF1 to calculate the chk data frame

qall_flavor

which quality parameter column must be shown in compfile, default "qall_mult". If no quality data are wanted, specify "".

shiftParents

if there is a column shift in chk, the F1 dosages will be shifted. In that case, if shiftParents is TRUE the parents and ancestors will be shifted together with the F1; if FALSE only the F1 will be shifted.
If shiftParents is missing or NA it will be set to TRUE, except if ploidy2 != ploidy: that case results in an error, because the parents may not have been genotyped or scored together with the F1, so the user must specify explicitly what to do.

compfile

filename for tab-separated text file summarizing the comparison results; if NA no file is written. For details of the contents see the return value, component compstat

combscorefile

filename for tab-separated text file with the dosages; if NA no file is written. For details of the contents see the return value, component combscores
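
The input requirements listed above can be checked before calling compareProbes. The following is a minimal sketch in base R, not part of fitPoly: the helper split_probe_name, the marker names and the toy scores data frame are all hypothetical, and the assumption is that the probe suffix sits at the very end of the marker name once a trailing "_shf" has been removed.

```r
# Ploidy of the F1: the average of the two parental ploidies
ploidy <- 4
ploidy2 <- 4
ploidyF1 <- (ploidy + ploidy2) / 2

# Columns that the scores data frame must contain at least
required <- c("MarkerName", "SampleName", paste0("P", 0:ploidyF1), "geno")

# Toy scores data frame (hypothetical values, one marker, one sample)
scores <- data.frame(MarkerName = "snp1P", SampleName = "F1_001",
                     P0 = 0.01, P1 = 0.97, P2 = 0.02, P3 = 0, P4 = 0,
                     geno = 1, stringsAsFactors = FALSE)
missing_cols <- setdiff(required, names(scores))

# Hypothetical helper: split marker names into the SNP name and the probe
# suffix, ignoring a trailing "_shf". Assumes all suffixes in probe.suffix
# have the same number of characters, as the probe.suffix argument requires.
split_probe_name <- function(markernames, probe.suffix = c("P", "Q", "R")) {
  base <- sub("_shf$", "", markernames)
  n <- nchar(probe.suffix[1])
  len <- nchar(base)
  data.frame(snp = substr(base, 1, len - n),
             probe = substr(base, len - n + 1, len),
             stringsAsFactors = FALSE)
}

parts <- split_probe_name(c("snp1P", "snp1Q_shf"))
```

Here missing_cols lists any required columns that are absent, and parts pairs each marker name with its SNP and probe, so that the two probes of one SNP can be matched.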

Details

A combined marker is made whenever a version of each of the two probe markers is present and the two are sufficiently similar. This means that they have been assigned the same bestParentfit segregation type by checkF1, and that the frequency of conflicting scores over all samples is not more than fracdiff.threshold. The combined marker will have NA scores for individuals where both probe markers are missing, the one available score if the individual is scored for only one of the two probe markers or if both scores are equal, and the score with the higher P-value if the scores for the two probe markers are unequal.
Any single-probe markers in chk that do not have a bestParentfit segregation type are ignored and will not affect or appear in the output.
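
The combination rule described above can be illustrated with a small sketch in base R. This is not the fitPoly implementation: combine_probe_scores is a hypothetical helper, the P-values are assumed to be the probabilities of the assigned dosages, ties are assumed to keep the first probe, and the fraction of conflicts is assumed to be taken over all samples, including those with missing scores.

```r
# Hypothetical helper, not part of fitPoly: combine the dosage scores of two
# probes (P and Q) for the same samples, given the P-value of each score.
combine_probe_scores <- function(genoP, genoQ, pP, pQ) {
  stopifnot(length(genoP) == length(genoQ),
            length(genoP) == length(pP),
            length(genoP) == length(pQ))
  combined <- genoP                       # start from the scores of probe P
  onlyQ <- is.na(genoP) & !is.na(genoQ)   # scored for probe Q only
  combined[onlyQ] <- genoQ[onlyQ]
  # Conflict: both probes scored, but with different dosages
  conflict <- !is.na(genoP) & !is.na(genoQ) & genoP != genoQ
  # On conflict take the score with the higher P-value (ties keep probe P);
  # which() drops comparisons that are NA
  useQ <- which(conflict & pQ > pP)
  combined[useQ] <- genoQ[useQ]
  list(geno = combined, fracdiff = mean(conflict))
}

# Toy data for six samples (hypothetical values)
genoP <- c(0, 1, NA, 2, NA, 1)
genoQ <- c(0, 2, 1, 2, NA, NA)
pP <- c(0.99, 0.60, NA, 0.95, NA, 0.90)
pQ <- c(0.98, 0.85, 0.97, 0.99, NA, NA)

res <- combine_probe_scores(genoP, genoQ, pP, pQ)
combine_ok <- res$fracdiff <= 0.04   # compare with fracdiff.threshold
```

In this toy example one of the six samples has conflicting scores, so the fraction of conflicts exceeds the default threshold of 0.04 and the two probes would not be combined.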

Value

A list with two components, compstat and combscores.
compstat is a data frame with columns:

  • MarkerName: name of the SNP marker. If a column shift is present in data.frame chk, unshifted and shifted markers will get an "n" or "s" suffixed to the MarkerName

  • segtypeP and segtypeQ: the segtype assigned by checkF1 to the first and second probe

  • qallP and qallQ: the quality scores specified by parameter qall_flavor, assigned by checkF1 to the two probes

  • countP and countQ: the number of versions of each of the probes (0, 1, or 2, depending on whether a shifted, unshifted or both versions were present)

  • countR: the number of combinations made of versions of the two probe markers (one for each combination of a version of each of the two probe markers, if they match well enough; see Details)

If the chk data frame contains a column shift, there are separate columns for the non-shifted and shifted P and Q probe markers (suffixes Pn, Ps, Qn, Qs), and four columns for the R markers (suffixes Rnn, Rns, Rsn, Rss, where the first n/s indicates whether the P probe was non-shifted or shifted, and the second n/s indicates the same for the Q probe).

combscores is a data frame with columns:

  • MarkerName: the name of the marker. If the chk data frame contains a column shift, the P and Q marker names are suffixed with n or s, and the R marker names with nn, ns, sn, ss as described above

  • segtype: the segregation type

  • parental and ancestor samples: the dosages of those samples

  • parent1: the consensus dosage for parent1 as determined by checkF1

  • parent2: the consensus dosage for parent2 as determined by checkF1

  • F1 samples: the dosages for those samples

  • other samples: the dosages for those samples
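
The version suffixes used in compstat and combscores when chk contains a column shift can be enumerated as follows. This is only an illustrative sketch in base R; the variable names are hypothetical, and how the suffixes are attached to the full column and marker names is not shown.

```r
probe.suffix <- c("P", "Q", "R")
versions <- c("n", "s")   # n = non-shifted, s = shifted

# Single-probe versions: Pn, Ps, Qn, Qs
single <- paste0(rep(probe.suffix[1:2], each = 2), versions)

# Combined versions: the first n/s refers to the first probe, the second
# n/s to the second probe
combos <- expand.grid(q = versions, p = versions, stringsAsFactors = FALSE)
combined <- paste0(probe.suffix[3], combos$p, combos$q)
```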


fitPoly documentation built on April 3, 2025, 8:58 p.m.