gene_pos_counts: gene position counts


gene_pos_counts    R Documentation

gene position counts

Description

Returns a matrix of allelic counts per gene per individual, taking SNP positions and gene coordinates as inputs.

Usage

gene_pos_counts(dt_gen, dt_snp, dt_gene, keep_indiv = NULL,
                extract_SNP = NULL, filter_gene = NULL,
                impute_missing = FALSE, impute_method = "mean")

Arguments

dt_gen

a data frame of genetic data in PLINK .raw format.

dt_snp

a data frame of SNP information with SNP and BP as column names.

dt_gene

a data frame of gene boundaries with CHR, START, END and GENE as column names. CHR should be an integer from 1 to 22, START and END should be integers, and GENE contains gene names.

keep_indiv

an option to specify individuals to retain. Mutation counts will be provided only for the individuals in the list. Default is all individuals.

extract_SNP

an option to specify the SNPs for which mutation counts are needed. Mutation counts will be provided only for the SNPs in the list. Default is all SNPs.

filter_gene

an option to restrict the output to specific genes. Mutation counts will be provided only for the genes in the list. Default is all genes.

impute_missing

an option to impute missing genotypes. Default is FALSE.

impute_method

an option to specify the imputation method. The default is imputation to the mean; alternatively, imputation can use the median. The function accepts the method in quotes: "mean" or "median". Data are rounded to two decimal places (e.g. 0.1234 becomes 0.12).
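As an illustration of what mean or median imputation with two-decimal rounding could look like for a single SNP column, here is a minimal base R sketch. The helper impute_column is hypothetical and is not part of the package; it also assumes the rounding is applied to the fill value, which the package may do at a different step.

```r
# Hypothetical helper (not a GARCOM function): fill missing genotypes in one
# SNP column with the column mean or median, rounded to two decimal places.
impute_column <- function(x, method = c("mean", "median")) {
  method <- match.arg(method)
  fill <- if (method == "mean") {
    mean(x, na.rm = TRUE)
  } else {
    median(x, na.rm = TRUE)
  }
  x[is.na(x)] <- round(fill, 2)  # assumption: rounding applies to the fill value
  x
}

# Allele counts coded 0/1/2 with one missing genotype (made-up values)
geno <- c(0, 1, 2, NA, 1, 0, 0)

imputed_mean   <- impute_column(geno, "mean")
imputed_median <- impute_column(geno, "median")
```

With these made-up values, the missing entry is filled from the six observed genotypes, and the non-missing entries are left unchanged.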

Details

Required inputs are: recoded genetic data in PLINK .raw format, SNP names with BP (base-pair position), and gene names with START and END positions. The first six columns of the genetic data follow the standard PLINK .raw format, with column names FID, IID, PAT, MAT, SEX and PHENOTYPE, followed by SNP columns as recoded by PLINK. The function returns allelic counts per gene per sample: each row represents a gene, the first column contains gene information, and each column from the second onward represents an individual.
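The three inputs described above can be mocked up as small data frames to check their shape before calling the function. This is only a sketch with made-up values: the SNP names, family and individual IDs, and coordinates are invented, and it assumes the SNP column of the SNP table matches the SNP column names of the genetic data.

```r
# Toy genetic data in PLINK .raw layout: six leading columns, then SNP columns.
# All values are made up for illustration.
dt_gen <- data.frame(
  FID = c("F1", "F2", "F3"),
  IID = c("IID_s1", "IID_s2", "IID_s3"),
  PAT = 0, MAT = 0,
  SEX = c(1, 2, 1),
  PHENOTYPE = c(1, 2, 1),
  rs1_A = c(0, 1, 2),
  rs2_G = c(1, NA, 0),
  rs3_T = c(2, 2, 1),
  stringsAsFactors = FALSE
)

# SNP table: SNP name and base-pair position
dt_snp <- data.frame(
  SNP = c("rs1_A", "rs2_G", "rs3_T"),
  BP = c(150L, 420L, 980L),
  stringsAsFactors = FALSE
)

# Gene table: chromosome, boundaries, and gene name
dt_gene <- data.frame(
  CHR = c(1L, 1L),
  START = c(100L, 900L),
  END = c(500L, 1000L),
  GENE = c("GENE1", "GENE2"),
  stringsAsFactors = FALSE
)

# Basic structural checks on the inputs
plink_cols <- c("FID", "IID", "PAT", "MAT", "SEX", "PHENOTYPE")
has_plink_header <- identical(names(dt_gen)[1:6], plink_cols)
snps_present <- all(dt_snp$SNP %in% names(dt_gen))
```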

Value

Returns a data.table object with allelic gene counts for each sample. Each row corresponds to a gene; the first column contains gene names, and the columns from the second onward correspond to individual IDs.
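To make the output layout concrete, the following base R sketch sums allele counts over the SNPs whose BP falls within each gene's START and END, giving one row per gene and one column per individual. It is not the package's implementation: count_by_gene is a hypothetical helper, the data are made up, it returns a plain data frame rather than a data.table, and it matches on position only because the SNP table described here has no chromosome column.

```r
# Toy inputs (made-up values)
dt_gen <- data.frame(
  FID = c("F1", "F2"), IID = c("IID_a", "IID_b"),
  PAT = 0, MAT = 0, SEX = c(1, 2), PHENOTYPE = c(1, 2),
  snp1 = c(0, 1), snp2 = c(2, 1), snp3 = c(1, 0),
  stringsAsFactors = FALSE
)
dt_snp <- data.frame(SNP = c("snp1", "snp2", "snp3"),
                     BP = c(120, 180, 950),
                     stringsAsFactors = FALSE)
dt_gene <- data.frame(CHR = c(1L, 1L), START = c(100L, 900L),
                      END = c(200L, 1000L), GENE = c("GENE1", "GENE2"),
                      stringsAsFactors = FALSE)

# Hypothetical helper: for each gene, add up allele counts across the SNPs
# located between START and END (inclusive), separately for each individual.
count_by_gene <- function(dt_gen, dt_snp, dt_gene) {
  rows <- lapply(seq_len(nrow(dt_gene)), function(i) {
    in_gene <- dt_snp$BP >= dt_gene$START[i] & dt_snp$BP <= dt_gene$END[i]
    snps <- dt_snp$SNP[in_gene]
    rowSums(dt_gen[, snps, drop = FALSE], na.rm = TRUE)
  })
  out <- data.frame(GENE = dt_gene$GENE, do.call(rbind, rows),
                    stringsAsFactors = FALSE)
  names(out)[-1] <- dt_gen$IID  # individual IDs start at the second column
  out
}

res <- count_by_gene(dt_gen, dt_snp, dt_gene)
```

Here res has the gene names in its first column and one column per individual ID after that, mirroring the layout described above.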

Author(s)

Sanjeev Sariya

Examples

# The package provides sample data that become available when the package is loaded.
# Not run
data(recodedgen) # PLINK .raw formatted data of 10 individuals with 10 SNPs

data(genecoord) # gene coordinates with START, END, CHR and GENE columns
# Five genes with start and end genomic coordinates

data(snppos) # SNP and BP columns with SNP names and SNP genomic locations in BP
# 10 SNPs with genomic locations

gene_pos_counts(recodedgen, snppos, genecoord) #run the function

#subset individuals
gene_pos_counts(recodedgen, snppos, genecoord,keep_indiv=c("IID_sample2","IID_sample4"))

#subset genes
gene_pos_counts(recodedgen,snppos,genecoord,filter_gene=c("GENE1","GENE2")) 

#subset genes and individual iids
gene_pos_counts(recodedgen,snppos,genecoord,filter_gene=c("GENE1","GENE2"),
keep_indiv=c("IID_sample10","IID_sample4")) 

##impute by mean
gene_pos_counts(recodedgen,snppos,genecoord,impute_missing=TRUE,impute_method="mean")

#end not RUN


sariya/GARCOM documentation built on Jan. 1, 2023, 8:29 a.m.