View source: R/gene_pos_counts.R
gene_pos_counts | R Documentation |
Returns allelic counts per gene per individual, given recoded genotype data, SNP positions and gene coordinates as inputs.
gene_pos_counts(dt_gen, dt_snp, dt_gene, keep_indiv = NULL, extract_SNP = NULL, filter_gene = NULL, impute_missing = FALSE, impute_method = "mean")
dt_gen |
a data frame of genetic data in PLINK .raw format |
dt_snp |
a data frame of SNP information with SNP and BP as column names. |
dt_gene |
a data frame of gene boundaries with CHR, START, END and GENE as column names. CHR should be an integer from 1 to 22, START and END should be integers, and GENE contains gene names. |
keep_indiv |
an option to specify individuals to retain. Mutation counts are returned only for the individuals in the list. Default is all individuals. |
extract_SNP |
an option to specify the SNPs to use. Mutation counts are computed only from the SNPs in the list. Default is all SNPs. |
filter_gene |
an option to specify genes to retain. Mutation counts are returned only for the genes in the list. Default is all genes. |
impute_missing |
an option to impute missing genotypes. Default is FALSE. |
impute_method |
an option to specify the imputation method. The default is to impute to the mean; alternatively, imputation can be carried out by the median. The function accepts the method in quotes: "mean" or "median". Data are rounded to two decimal places (e.g. 0.1234 becomes 0.12). |
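The mean/median imputation described for impute_missing and impute_method can be illustrated with a minimal base R sketch. The helper impute_column below is hypothetical and is not part of the package; it also assumes that the fill value is what gets rounded to two decimal places, which the documentation does not state explicitly.

```r
# Hypothetical helper (NOT the package's implementation): fills missing
# genotypes in one SNP column with the column mean or median.
# Assumption: the imputed value is rounded to two decimal places.
impute_column <- function(x, method = c("mean", "median")) {
  method <- match.arg(method)
  fill <- if (method == "mean") {
    mean(x, na.rm = TRUE)
  } else {
    median(x, na.rm = TRUE)
  }
  x[is.na(x)] <- round(fill, 2)
  x
}

# Allele counts for one SNP across seven individuals, one value missing.
geno <- c(0, 0, 1, NA, 2, 0, 1)

imputed_mean   <- impute_column(geno, "mean")    # NA replaced by 0.67
imputed_median <- impute_column(geno, "median")  # NA replaced by 0.5
```

With the toy vector above, the observed values average 4/6, so the mean fill is 0.67 after rounding, while the median fill is 0.5.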
The function needs three inputs: recoded genetic data in PLINK .raw format, SNP names with their BP (base-pair position), and gene names with START and END positions. The first six columns of the genetic data follow the standard PLINK .raw layout (FID, IID, PAT, MAT, SEX and PHENOTYPE) and are followed by one column per SNP as recoded by the PLINK software. The function returns allelic counts per gene per sample: each row represents a gene, the first column holds the gene information, and each column from the second onward represents an individual.
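The input layouts and the per-gene aggregation described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's actual code: the toy data are made up, SNPs are assigned to a gene purely by BP falling within START and END (dt_snp is documented with SNP and BP columns only, so chromosome is ignored here), and the counted-allele suffix that PLINK appends to .raw column names (e.g. rs1_A) is stripped to match the SNP names.

```r
# Toy inputs mirroring the documented layouts (all values are made up).
dt_gen <- data.frame(
  FID = c("F1", "F2", "F3"), IID = c("S1", "S2", "S3"),
  PAT = 0, MAT = 0, SEX = c(1, 2, 1), PHENOTYPE = c(1, 2, 1),
  rs1_A = c(0, 1, 2), rs2_G = c(1, 1, 0), rs3_T = c(2, 0, 1)
)
dt_snp  <- data.frame(SNP = c("rs1", "rs2", "rs3"), BP = c(150, 180, 900))
dt_gene <- data.frame(CHR = c(1, 1), START = c(100, 800),
                      END = c(200, 1000), GENE = c("GENE1", "GENE2"))

# Drop the six PLINK header columns and strip the counted-allele suffix
# ("rs1_A" -> "rs1") so genotype columns match dt_snp$SNP.
# Assumption: the real function may match SNP names differently.
geno <- as.matrix(dt_gen[, -(1:6)])
colnames(geno) <- sub("_[^_]+$", "", colnames(geno))

# For each gene, sum allele counts over SNPs with START <= BP <= END.
counts <- t(sapply(seq_len(nrow(dt_gene)), function(i) {
  in_gene <- dt_snp$SNP[dt_snp$BP >= dt_gene$START[i] &
                        dt_snp$BP <= dt_gene$END[i]]
  rowSums(geno[, in_gene, drop = FALSE])
}))

# One row per gene; first column is the gene name, then one column per IID.
result <- data.frame(GENE = dt_gene$GENE, counts)
colnames(result)[-1] <- dt_gen$IID
```

In this toy case GENE1 spans rs1 and rs2, so its counts are the per-individual sums of those two columns, while GENE2 contains only rs3.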
Returns a data.table object of allelic gene counts per sample. Each row corresponds to a gene; the first column contains gene names, and the columns from the second onward correspond to individual IDs.
Sanjeev Sariya
# The package provides sample data that are loaded with the package.
# not RUN
data(recodedgen) # PLINK raw formatted data of 10 individuals with 10 SNPs
data(genecoord)  # gene coordinates with START, END, CHR and GENE names.
                 # Five genes with start and end genomic coordinates
data(snppos)     # SNP and BP column names with SNP names and SNP genomic location in BP.
                 # 10 SNPs with genomic location

gene_pos_counts(recodedgen, snppos, genecoord) # run the function

# subset individuals
gene_pos_counts(recodedgen, snppos, genecoord,
                keep_indiv = c("IID_sample2", "IID_sample4"))

# subset genes
gene_pos_counts(recodedgen, snppos, genecoord,
                filter_gene = c("GENE1", "GENE2"))

# subset genes and individual IIDs
gene_pos_counts(recodedgen, snppos, genecoord,
                filter_gene = c("GENE1", "GENE2"),
                keep_indiv = c("IID_sample10", "IID_sample4"))

# impute by mean
gene_pos_counts(recodedgen, snppos, genecoord,
                impute_missing = TRUE, impute_method = "mean")
# end not RUN