snpPos: Find the column numbers of SNP identifiers/SNP numbers in a...

View source: R/snpPos.R

snpPosR Documentation

Find the column numbers of SNP identifiers/SNP numbers in a ped file

Description

Gives the column numbers of SNP identifiers or SNP numbers in a standard ped file, calculated from the SNP's positions in the corresponding map file. The column numbers are sorted in the same order as snp.select. These positions may be useful when extracting a selection of SNPs from a ped file.

Usage

snpPos(snp.select, map.file, blank.lines.skip = TRUE)

Arguments

snp.select

A character vector of the SNP identifiers (RS codes) or a numeric vector of the SNP numbers.

map.file

A character string giving the name and path of the standard map file to be used. See Details for a description of the standard map format.

blank.lines.skip

Logical. If "TRUE" (default), snpPos ignores blank lines in map.file.

Details

To extract certain SNPs from a standard ped file, one has to know their positions in the ped file. This can be obtained from the corresponding map file.

The map file should look something like this:

Chromosome SNP-identifier Base-pair-position
1               RS9629043             554636
1              RS12565286             711153 
1              RS12138618             740098

Alternatively, the map file could contain four columns. The column values should then be: Chromosome, SNP-identifier, Genetic-distance, Base-pair-position.
A header must be added to the map file if this does not already exist.

The format of the corresponding ped file should be something like this:

1104  1104-1  1104-2  1104-3  1  2  4  1  3  2
1104  1104-2       0       0  1  1  4  1  2  2
1104  1104-3       0       0  2  1  0  0  0  0
1105  1105-1  1105-2  1105-3  2  2  1  1  2  2
1105  1105-2       0       0  1  1  1  1  2  2
1105  1105-3       0       0  2  1  1  1  3  2

The column values are: Family id, Individual id, Father's id, Mother's id, Sex (1 = male, 2 = female, alternatively: 1 = male, 0 = female), and Case-control status (1 = controls, 2 = cases, alternatively: 0 = controls, 1 = cases).
Column 7 and onwards contain the genotype data, with alleles in separate columns. A “0” is used to denote missing data.

Value

A vector of the column numbers of the SNP identifiers/SNP numbers in the ped file, sorted in the same order as given in snp.select.

Note

The function does not check if the map file is formatted correctly or if the map and ped file have the same number of SNPs. The corresponding positions of the SNPs in the ped file may not be correct if the ped file has a different format from the given example.

Author(s)

Miriam Gjerdevik,
with Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
hakon.gjessing@uib.no

References

Web Site: https://haplin.bitbucket.io

See Also

convertPed, lineByLine

Examples


## Not run: 

# Find the column numbers of the SNP identifiers "RS9629043" and "RS12565286" in 
# a standard ped file
snpPos(snp.select = c("RS9629043", "RS12565286"), map.file = "mygwas.map")

## End(Not run)


Haplin documentation built on Sept. 11, 2024, 7:13 p.m.