basic_SNP_annotation: SNP annotation using biomaRt and/or manufacturer data


View source: R/HLP_basic_SNP_annotation.R

Description

basic_SNP_annotation adds annotation data to SNP IDs using biomaRt and/or a manufacturer's annotation file while keeping the order of input SNPs.

Usage

basic_SNP_annotation(
  data,
  max.SNPs.per.biomaRt.call = 10000,
  data.SNP.columnName = "SNP",
  snpmaRt = useMart("ENSEMBL_MART_SNP", host = "feb2014.archive.ensembl.org",
    dataset = "hsapiens_snp"),
  biomaRt.SNP.columnName = "refsnp_id",
  biomaRt.filter = "snp_filter",
  biomaRt.attributes.groupColumns = c("refsnp_id", "chr_name", "chrom_start"),
  biomaRt.attributes.summarized = c("ensembl_gene_stable_id", "ensembl_type"),
  annotationFile = NULL,
  lines2skip.start = 0,
  lines2skip.end = -1,
  annofile.SNP.columnName = "Name",
  annofile.columns = NULL
)

Arguments

data

dataframe containing SNP IDs.

max.SNPs.per.biomaRt.call

numeric. Number of SNP IDs to be queried in biomaRt at once.

data.SNP.columnName

character with column name of SNP IDs in data or "row.names".

snpmaRt

biomaRt object to be used for annotation. If NULL, biomaRt annotation is skipped.

biomaRt.SNP.columnName

character with attribute name for SNP IDs of the biomaRt object.

biomaRt.filter

character with filter name to be used in biomaRt query.

biomaRt.attributes.groupColumns

character vector with attribute names to be queried in biomaRt.

biomaRt.attributes.summarized

character vector with further attribute names, which will be summarized according to the attributes in biomaRt.attributes.groupColumns (separated by ";").

annotationFile

dataframe, or character with path to a file, containing annotation data from the assay manufacturer. If NULL, manufacturer annotation is skipped.

lines2skip.start

Numeric with number of rows to skip when loading annotationFile, or a regular expression matching a character string that identifies the row number to be skipped, e.g. [Assay] in Illumina annotation files.

lines2skip.end

Numeric with number of rows to read when loading annotationFile, or a regular expression matching a character string that identifies the row number up to which rows are read, e.g. [Controls] in Illumina annotation files, which marks the start of the control probe annotation. All rows starting from that number (incl. lines2skip.end) are skipped. Negative and other invalid values are ignored.

annofile.SNP.columnName

character with column name of SNP IDs in annotationFile.

annofile.columns

Optional character vector with column names of annotationFile to be included. If NULL, all columns of annotationFile are merged.
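
The effect of regular expressions in lines2skip.start and lines2skip.end can be sketched in base R as follows. This is only an illustration of the idea, not the package's internal code: the file content, the object names (lines, start_row, end_row, anno) and the use of grep with read.csv are assumptions made for this example.

```r
# Hypothetical content of an Illumina-style annotation file (invented data).
lines <- c(
  "[Heading]",
  "Descriptor,x",
  "[Assay]",
  "Name,Chr",
  "rs1,1",
  "rs2,2",
  "[Controls]",
  "ctrl1,0"
)

# Row matching the start pattern: everything up to and including it is skipped.
start_row <- grep("\\[Assay\\]", lines)[1]

# Row matching the end pattern: this row and all following rows are skipped.
end_row <- grep("\\[Controls\\]", lines)[1]

# Parse only the rows in between; the first of them is the column header.
anno <- read.csv(
  text = lines[(start_row + 1):(end_row - 1)],
  stringsAsFactors = FALSE
)
```

In this sketch the resulting data frame holds only the SNP annotation rows, with the header section and the control probe section removed.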

Details

This function uses the SNP ID column from a given dataframe as input for adding annotation data. All annotation data is added in additional columns and does not change the order of input SNP IDs. Since biomaRt queries of large datasets (e.g. from SNP arrays) are prone to service malfunction, basic_SNP_annotation divides the data into chunks of feasible size, given in max.SNPs.per.biomaRt.call.
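
The chunking described above can be illustrated with a minimal base R sketch. The SNP IDs, the small chunk size and the object names (snp_ids, chunk_size, chunks) are hypothetical, and the package may split the data differently internally.

```r
# Hypothetical SNP IDs; not taken from the package.
snp_ids <- paste0("rs", 1:25)

# Plays the role of max.SNPs.per.biomaRt.call (kept small for illustration).
chunk_size <- 10

# Assign a chunk index to each ID, then split the IDs by that index.
chunk_index <- ceiling(seq_along(snp_ids) / chunk_size)
chunks <- split(snp_ids, chunk_index)

# Number of IDs in each chunk; each chunk would be sent as a separate query.
lengths(chunks)
```

Concatenating the chunks in order reproduces the original ID vector, so the per-chunk results can be combined without losing any SNPs.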

Before biomaRt data are merged with the input data, data columns containing multiple entries per SNP (given in biomaRt.attributes.summarized) are collapsed into a single entry, separated by semicolons. Data columns given in biomaRt.attributes.groupColumns are used as grouping variables.
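
As a rough illustration of this collapsing step, the following base R sketch groups a small, invented biomaRt-like result by the grouping columns and pastes the remaining column together with ";". The data, the object names and the use of aggregate are assumptions for this example; the package's own implementation is not shown in this page.

```r
# Invented query result: rs1 maps to two genes, rs2 to one.
bm <- data.frame(
  refsnp_id = c("rs1", "rs1", "rs2"),
  chr_name = c("1", "1", "2"),
  chrom_start = c(100L, 100L, 200L),
  ensembl_gene_stable_id = c("ENSG_A", "ENSG_B", "ENSG_C"),
  stringsAsFactors = FALSE
)

# Same roles as biomaRt.attributes.groupColumns and biomaRt.attributes.summarized.
group_cols <- c("refsnp_id", "chr_name", "chrom_start")
summarized_cols <- "ensembl_gene_stable_id"

# One row per group; multiple entries are joined by a semicolon.
collapsed <- aggregate(
  bm[summarized_cols],
  by = bm[group_cols],
  FUN = function(x) paste(x, collapse = ";")
)
```

After this step each SNP occupies exactly one row, which is what allows the result to be attached to the input data without duplicating input rows.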

If an annotationFile is specified, all included data is merged with the input dataframe. The annotationFile may be supplied directly as a dataframe or as a character containing a file path. In the latter case, the file is loaded automatically.
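
One way to merge annotation columns while keeping the input order is to look up each input SNP with match, sketched below in base R. The data and object names are invented, and using match is an assumption for illustration rather than the package's confirmed method; only the "Annofile_" prefix is taken from this documentation.

```r
# Invented input data; the order rs3, rs1, rs2 must be preserved.
data <- data.frame(SNP = c("rs3", "rs1", "rs2"), stringsAsFactors = FALSE)

# Invented manufacturer annotation; rs3 is missing, rs4 is not in the input.
anno <- data.frame(
  Name = c("rs1", "rs2", "rs4"),
  Chr = c("1", "2", "4"),
  stringsAsFactors = FALSE
)

# Position of each input SNP in the annotation (NA if absent).
idx <- match(data$SNP, anno$Name)

# Pick annotation rows in input order and prefix the new column names.
anno_cols <- anno[idx, setdiff(names(anno), "Name"), drop = FALSE]
names(anno_cols) <- paste0("Annofile_", names(anno_cols))

# Attach the annotation columns to the input data.
annotated <- cbind(data, anno_cols)
rownames(annotated) <- NULL
```

SNPs without a match in the annotation keep their row and receive NA in the added columns, so the number and order of input rows stay the same.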

Value

input dataframe annotated with biomaRt and/or manufacturer data in additional columns (starting with "SNPMart_" or "Annofile_", respectively). Order of entries within the dataframe remains unchanged.

Author(s)

Frank Ruehle


frankRuehle/systemsbio documentation built on Sept. 14, 2020, 1:18 a.m.