convertPed | R Documentation |
NOTE: This function is probably less useful now that GenABEL is no longer used by Haplin. The function is used to prepare a ped file for loading into GenABEL. However, GenABEL requires unique individual IDs in the file, not only unique within family. Furthermore, numeric allele coding 1,2,3,4 is not accepted. To fix this, convertPed can be run prior to running prepPed. This will create unique IDs and do the necessary allele recoding, and possibly also select and reorder SNPs. convertPed will also update the corresponding map file.
convertPed(ped.infile, map.infile, ped.outfile, map.outfile, create.unique.id = FALSE,
convert, snp.select = NULL, choose.lines = NULL, col.sep = " ",
ask = TRUE, blank.lines.skip = TRUE, verbose = TRUE)
ped.infile |
A character string giving the name of the standard ped file to be modified. The name of the file is relative to the current working directory, unless the file name contains an absolute path. |
map.infile |
A character string giving the name and path of the to-be-modified standard map file. Optional if snp.select = NULL. A description of the standard map format is given in the Details section. |
ped.outfile |
A character string of the name and path of the converted ped file. |
map.outfile |
A character string giving the name and path of the modified map file. |
create.unique.id |
Logical. If "TRUE", the function creates a unique individual ID. |
convert |
No default. The option "ACGT_to_1234" recodes the SNP alleles from A,C,G,T to 1,2,3,4, whereas "1234_to_ACGT" converts from 1,2,3,4 to A,C,G,T. If "no_recode", no conversion occurs. |
snp.select |
A character vector of the SNP identifiers (RS codes) or a numeric vector of the SNP numbers to be extracted. Default is "NULL", which means that all SNPs are selected without reordering among the SNPs. The RS codes or SNP numbers may be listed in any order. Reordering among the selected SNPs will occur in the modified files corresponding to this listing. |
choose.lines |
A numeric vector of lines to be selected from the ped file. If "NULL" (default), all lines are selected. |
col.sep |
Specifies the separator that splits the columns in |
ask |
Logical. Default is "TRUE". If set to "FALSE", an already existing outfile will be overwritten without asking. |
blank.lines.skip |
Logical. If "TRUE" (default), |
verbose |
Logical. Default is "TRUE", which means that the line number is displayed for each iteration, i.e. each line read and modified, in addition to the first ten columns of the converted line. |
convertPed assumes a standard ped file as input.
The format of the ped file should look something like this:
1104 1 2 3 1 2 4 1 3 2 1 1
1104 2 0 0 1 1 4 1 2 2 4 1
1104 3 0 0 2 1 0 0 0 0 0 0
1105 1 2 3 2 2 1 1 2 2 4 1
1105 2 0 0 1 1 1 1 2 2 1 1
1105 3 0 0 2 1 1 1 3 2 4 4
The column values are: Family ID, Individual ID, Father's ID, Mother's ID, Sex (1 = male, 2 = female, alternatively: 1 = male, 0 = female), and Case-control status (1 = controls, 2 = cases, alternatively: 0 = controls, 1 = cases).
Column 7 and onwards contain the genotype data, with alleles in separate columns, two columns representing one SNP. A “0” is used to denote missing data.
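The column layout described above implies that SNP number k occupies ped columns 6 + 2k - 1 and 6 + 2k. The following is a minimal sketch of that index arithmetic in base R. The helper snp_columns is hypothetical and is not part of Haplin; it only illustrates how a numeric SNP selection could map to ped columns, not how convertPed works internally.

```r
# Hypothetical helper (not part of Haplin): ped columns holding the two
# alleles of each selected SNP, in the order the SNPs are listed.
snp_columns <- function(snp.numbers, n.leading = 6) {
  # SNP k occupies columns n.leading + 2k - 1 and n.leading + 2k
  as.vector(rbind(n.leading + 2 * snp.numbers - 1,
                  n.leading + 2 * snp.numbers))
}

# First line of the example ped file above, split on single spaces
ped.line <- strsplit("1104 1 2 3 1 2 4 1 3 2 1 1", " ")[[1]]

# Select SNP 3 followed by SNP 1, keeping the six leading columns
selected <- ped.line[c(1:6, snp_columns(c(3, 1)))]
print(selected)
```

Listing SNP 3 before SNP 1 places the allele pair of SNP 3 first in the result, which mirrors the reordering behaviour described for snp.select.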
The corresponding map file should look something like this:
Chromosome SNP-identifier Base-pair-position
1 RS9629043 554636
1 RS12565286 711153
1 RS12138618 740098
Alternatively, the map file could contain four columns. The column values should then be:
Chromosome, SNP-identifier, Genetic-distance, Base-pair-position.
A header must be added to the map file if it does not already have one.
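As an illustration of the map format, the sketch below reads the three-column example map file above with base R and looks up the SNP numbers (row positions) for a few RS codes. It assumes a whitespace-separated file with a header line. The object names map.text, rs.codes and snp.numbers are chosen for this example only, and the lookup is not claimed to be how convertPed resolves snp.select.

```r
# The three-column example map file above, held in memory for illustration
map.text <- c("Chromosome SNP-identifier Base-pair-position",
              "1 RS9629043 554636",
              "1 RS12565286 711153",
              "1 RS12138618 740098")

# header = TRUE because the map file is expected to carry a header line
map <- read.table(text = map.text, header = TRUE, stringsAsFactors = FALSE)

# SNP numbers (row positions in the map) for a selection of RS codes,
# returned in the order the codes are listed
rs.codes <- c("RS12138618", "RS9629043")
snp.numbers <- match(rs.codes, map[[2]])
print(snp.numbers)
```

The second column is addressed by position rather than by name, because read.table alters header names that contain hyphens.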
After creating unique individual IDs and recoding the SNP alleles from 1,2,3,4 to A,C,G,T (using convertPed with options create.unique.id = TRUE and convert = "1234_to_ACGT"), the ped file above should look like this:
1104 1104_1 1104_2 1104_3 1 2 T A G C A A
1104 1104_2 0 0 1 1 T A C C T A
1104 1104_3 0 0 2 1 0 0 0 0 0 0
1105 1105_1 1105_2 1105_3 2 2 A A C C T A
1105 1105_2 0 0 1 1 A A C C A A
1105 1105_3 0 0 2 1 A A G C T T
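The transformation between the two example ped files can be reproduced with a few lines of base R. This is a minimal sketch under stated assumptions, not the Haplin implementation: it assumes unique IDs are formed by prefixing the family ID and an underscore, as the converted example suggests, and that "0" is left unchanged both for missing parents and for missing alleles. All object names are chosen for this example.

```r
# Example ped file from the Details section, one string per line
ped.text <- c("1104 1 2 3 1 2 4 1 3 2 1 1",
              "1104 2 0 0 1 1 4 1 2 2 4 1",
              "1104 3 0 0 2 1 0 0 0 0 0 0",
              "1105 1 2 3 2 2 1 1 2 2 4 1",
              "1105 2 0 0 1 1 1 1 2 2 1 1",
              "1105 3 0 0 2 1 1 1 3 2 4 4")
ped <- do.call(rbind, strsplit(ped.text, " "))  # character matrix, 6 x 12

# Unique IDs: prefix individual, father and mother IDs (columns 2-4) with
# the family ID, leaving the missing-parent code "0" untouched
for (j in 2:4) {
  known <- ped[, j] != "0"
  ped[known, j] <- paste(ped[known, 1], ped[known, j], sep = "_")
}

# Allele recoding 1,2,3,4 -> A,C,G,T in the genotype columns;
# "0" (missing) is mapped to itself
recode <- c("0" = "0", "1" = "A", "2" = "C", "3" = "G", "4" = "T")
geno <- 7:ncol(ped)
ped[, geno] <- recode[ped[, geno]]

# Collapse each row back into a space-separated line
converted <- apply(ped, 1, paste, collapse = " ")
print(converted)
```

The sketch holds the whole file in memory for brevity. For large ped files, processing one line at a time is the more practical approach.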
There is no useful output; the objective of convertPed is the converted ped file and the modified map file.
The function does not check whether the ped or map file is formatted correctly. For instance, if the alleles follow the generic A/B Illumina coding, convertPed may still be used to create unique individual IDs and extract a selection of SNPs. Using convert = "ACGT_to_1234" would, however, result in nonsense.
Miriam Gjerdevik,
with Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
hakon.gjessing@uib.no
Web Site: https://haplin.bitbucket.io
lineByLine, Haplin:::lineConvert, snpPos
## Not run:
# Create unique individual IDs and recode SNP alleles from 1,2,3,4 to A,C,G,T
convertPed(ped.infile = "mygwas.ped", map.infile = "mygwas.map",
ped.outfile = "mygwas_modified.ped", map.outfile = "mygwas_modified.map",
create.unique.id = TRUE, convert = "1234_to_ACGT", ask = TRUE)
## End(Not run)