For example,
GRCm38_68.fa
.
wget ftp://ftp-mouse.sanger.ac.uk/ref/GRCm38_68.fa
For example,
Mus_musculus.GRCm38.68.gtf.gz
for corresponding reference genome version.
wget ftp://ftp.ensembl.org/pub/release-68/gtf/mus_musculus/Mus_musculus.GRCm38.68.gtf.gz
For example,
mgp.v5.merged.snps_all.dbSNP142.vcf.gz
for multiple mice lines.
wget ftp://ftp-mouse.sanger.ac.uk/current_snps/mgp.v5.merged.snps_all.dbSNP142.vcf.gz
Note: you can do vcf calling by using mutect, varscan, etc. if you heve no pre-existing vcf
Reference genome fasta should be indexed (samtools faidx
).
vcf-files should be compressed (bgzip
) and indexed (samtools tabix
).
One script to rule them all:
python3 /full/path/to/ASE/python/prepare_reference_tmp.py --PSEUDOREF True --HETVCF True \
--pseudoref_dir /full/path/to/dir/for/pseudo/ref/out/ \
--vcf_dir /full/path/todir/for/vcf/outputs/ \
--ref /full/path/to/GRCm38_68.fa \
--name_mat 129S1_SvImJ --name_pat CAST_EiJ \
--vcf_joint mgp.v5.merged.snps_all.dbSNP142.vcf.gz \
--gtf /n/scratch2/sv111/ASE/Mus_musculus.GRCm38.68.gtf
For help:
python3 /home/am717/ASE/python/prepare_reference_tmp.py --help
--PSEUDOREF True
| | Inbred lines | Inbred lines | Individual | Individual | | --- | --- | --- | --- | --- | | | Joint lines vcf | Separate line(s) vcf | Joint individuals vcf | Separate individual vcf | | FASTA Reference genome | --ref | --ref | --ref | --ref | | VCF Variant file(s)[1] | --vcf_joint | --vcf_mat, --vcf_pat | --vcf_joind | --vcf_ind | | Name(s) | --name_mat, --name_pat | --name_mat, --name_pat | --name_ind | --name_ind | | FASTA Output directory | --pseudo_dir | --pseudo_dir | --pseudo_dir | --pseudo_dir | | VCF Output directory | --vcf_dir | --vcf_dir | --vcf_dir | --vcf_dir |
[1] If one or the alleles in case of inbred lines is reference, then everything should be provided as mat or pat only, consistently.
--HETVCF True
| | Inbred lines | Inbred lines | Individual | Individual | | --- | --- | --- | --- | --- | | | Joint lines vcf | Separate line(s) vcf | Joint individuals vcf | Separate individual vcf | | FASTA Reference genome | --ref | --ref | --ref | --ref | | nothing or GTF or BED Selected regions annotation [2] | --gtf or --bed | --gtf or --bed | --gtf or --bed | --gtf or --bed | | VCF Variant file(s)[1] | --vcf_joint | --vcf_mat, --vcf_pat | --vcf_joind | --vcf_ind | | Name(s) | --name_mat, --name_pat | --name_mat, --name_pat | --name_ind | --name_ind | | VCF Output directory | --vcf_dir | --vcf_dir | --vcf_dir | --vcf_dir |
[1] If one or the alleles in case of inbred lines is reference, then everything should be provided as mat or pat only, consistently. [2] Bed will be considered as main; if gtf provided, automatically considered regions=exons and groups=genes; bed file should have 4 columns and prepared in advance: contig, start position, end position, group ID (no colnames).
STAR --runMode genomeGenerate
).pseudoRefDirs=/full/path/to/dir/for/pseudo/ref/out/
STAR --runMode genomeGenerate --genomeDir $pseudoRefDirs/129S1_SvImJ/ --genomeFastaFiles $pseudoDir/129S1_SvImJ/129S1_SvImJ_pseudo.fa
STAR --runMode genomeGenerate --genomeDir $pseudoRefDirs/CAST_EiJ/ --genomeFastaFiles $pseudoDir/CAST_EiJ/CAST_EiJ_pseudo.fa
gunzip
) sample fasta/fastq files.Make sure that you have: * bed file (with four columns: contig, start and end positions, group ID; no column names) with selected regions, for example, exon regions grouped by genes:
awk '$3=="exon" && ($1 ~ /^[1-9]/ || $1 == "X" || $1 == "Y")' /full/path/to/Mus_musculus.GRCm38.68.gtf | cut -f1,4,5,9 | awk -F $'\t' 'BEGIN {OFS = FS} {split($4, a, "gene_id \""); split(a[2], b, "\""); print $1, $2-1, $3, b[1]}' > /full/path/to/output/Mus_musculus.GRCm38.68.EXONS.bed
grep "^#CHROM" /full/path/to/Het_Allelic_129S1_SvImJ_CAST_EiJ.exons.vcf | cut -f1-5 > /full/path/to/Het_Allelic_129S1_SvImJ_CAST_EiJ.snp_table.txt
grep -v "^#" /full/path/to/Het_Allelic_129S1_SvImJ_CAST_EiJ.exons.vcf | sort -V | cut -f1-5 >> /full/path/to/Het_Allelic_129S1_SvImJ_CAST_EiJ.snp_table.txt
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.