pseudoRef: 'pseudoRef' Make a pseudo reference genome.
In yangjl/pseudoRef: Substitute Reference Genome with Sample Specific SNP variants


View source: R/pseudoRef.R

pseudoRef Make a pseudo reference genome.

pseudoRef(fa, snpdt, sidx = 5:ncol(snpdt), arules = NULL, outdir)

`fa`	Path for the reference fasta file. [string or DNAStringSet/DNAString object]
`snpdt`	A data.table object with heterozygous SNPs coded with IUPAC ambiguity codes. [data.table, 4 required columns: chr, pos, ref, alt, followed by the sample columns (sample1, ..., sampleN)]
`sidx`	A vector to indicate the sample columns. [vector, default=5:ncol(snpdt)].
`arules`	Additional nucleotide substitution rules defined by the user. [data.frame, 2 required columns: from, to; default=NULL]. For example, arules <- data.frame(from=c("M", "Y", "R", "K"), to=c("C", "C", "G", "T")).
`outdir`	Output directory. Sample specific sub-folders will be created. [string]
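
The `arules` argument maps IUPAC ambiguity codes to a single base. The sketch below only illustrates what such a mapping means for a vector of genotype codes; it is not the package's internal code, and `apply_rules` is a hypothetical helper introduced here for illustration. Codes without a rule are left unchanged in this sketch, which may differ from how pseudoRef treats them.

```r
# IUPAC codes for heterozygous biallelic SNPs and the two bases each stands for.
iupac <- c(M = "AC", R = "AG", W = "AT", S = "CG", Y = "CT", K = "GT")

# User-defined rules in the documented format: columns `from` and `to`.
arules <- data.frame(from = c("M", "Y", "R", "K"),
                     to   = c("C", "C", "G", "T"),
                     stringsAsFactors = FALSE)

# Sanity check: every `to` base is one of the two bases encoded by `from`.
valid <- all(mapply(function(f, t) grepl(t, iupac[[f]], fixed = TRUE),
                    arules$from, arules$to))

# Hypothetical helper (not part of pseudoRef): replace each code that has a
# rule with its target base and leave every other code as it is.
apply_rules <- function(codes, rules) {
  idx <- match(codes, rules$from)          # NA where no rule applies
  out <- codes
  hit <- !is.na(idx)
  out[hit] <- rules$to[idx[hit]]
  out
}

genotypes <- c("A", "M", "Y", "R", "K", "W")
resolved  <- apply_rules(genotypes, arules)
print(resolved)
```

With these rules, `M`, `Y`, `R` and `K` are resolved to one base each, while `W` has no rule and stays ambiguous.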

A list of summary statistics of substituted nucleotides. [list].

# First, use BCFtools to convert the VCF into an IUPAC-coded table that can be read in as a data.table:

# bcftools view JRI20_filtered_snps_annot.bcf.gz -m2 -M2 -v snps -Oz -o JRI20_bi_snps_annot.vcf.gz
# bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%IUPACGT]\n' JRI20_bi_snps_annot.vcf.gz > JRI20_bi_snps_annot.txt
# bcftools query -f 'chr\tpos\tref\talt[\t%SAMPLE]\n' JRI20_bi_snps_annot.vcf.gz > JRI20_bi_snps_annot.header

arules <- data.frame(from=c("M", "Y", "R", "K"), to=c("C", "C", "G", "T"))
res <- pseudoRef(fa, snpdt, sidx=5:24, arules, outdir)
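
The call above assumes that `snpdt` already exists. One possible way to build it from the two bcftools outputs is sketched below, under stated assumptions: the temporary files and their contents are made-up stand-ins for `JRI20_bi_snps_annot.txt` and `JRI20_bi_snps_annot.header`, and only the first line of the header file is used, since the header query likely prints one line per record. Base R is used for reading so the sketch runs without extra packages; the conversion to a data.table is attempted only if that package is installed.

```r
# Stand-ins for the two bcftools outputs (illustration data only).
txt_file    <- tempfile(fileext = ".txt")
header_file <- tempfile(fileext = ".header")
writeLines(c("1\t100\tA\tC\tA\tM",
             "1\t250\tG\tA\tR\tG"), txt_file)
writeLines(rep("chr\tpos\tref\talt\tsample1\tsample2", 2), header_file)

# Column names come from the first header line only.
cols <- strsplit(readLines(header_file, n = 1), "\t", fixed = TRUE)[[1]]

# Read every column as character so that a base such as "T" is not
# converted to the logical value TRUE.
snps <- read.delim(txt_file, header = FALSE, col.names = cols,
                   colClasses = "character", stringsAsFactors = FALSE)
snps$pos <- as.integer(snps$pos)   # assumption: positions are used as integers

# pseudoRef documents `snpdt` as a data.table; convert when possible.
if (requireNamespace("data.table", quietly = TRUE)) {
  snpdt <- data.table::as.data.table(snps)
} else {
  snpdt <- snps
}

# Sample columns start after chr, pos, ref, alt.
sidx <- 5:ncol(snpdt)
print(snpdt)
```

For real data, replace the temporary files with the paths written by the bcftools commands and set `sidx` to the sample columns actually present.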