PD4107a: Somatic mutations data set from a primary breast cancer...
In ClusteredMutations: Location and Visualization of Clustered Somatic Mutations


PD4107a is a data set of somatic substitution mutations from a primary breast cancer whole genome with a germline mutation in BRCA1 (Nik-Zainal et al. 2012). The data set contains five variables: sample name, chromosome where the somatic mutation is located, location of the somatic mutation, the reference base and the mutated base.

The complete set of somatic mutations from a patient with breast cancer (PD4107a) was provided by the Cancer Genome Project group at the Wellcome Trust Sanger Institute (Alexandrov et al. 2013). Mutations labeled as indels were removed, so the data set contains substitutions only.

data(PD4107a)

A data frame with 9879 observations on the following 5 variables.

Sample_id: PD4107a.
Chr: From chromosome 1 to chromosome X.
Position: Mutation locations on the chromosome.
Ref_base: The reference base at the mutation locations.
Mutant_base: The mutated base at the mutation locations.
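
As a quick sanity check of the layout described above, the following sketch tests whether a data frame has the five documented columns and whether both base columns hold single nucleotides. The helper name `check_pd4107a_format` is hypothetical and not part of ClusteredMutations, the rule that the reference and mutated base must differ is an assumption that follows from the data being substitutions only, and the three rows are retyped by hand from the first rows of the data set.

```r
# Hypothetical helper (not part of ClusteredMutations): checks that a data
# frame follows the five-column layout documented above.
check_pd4107a_format <- function(df) {
  expected <- c("Sample_id", "Chr", "Position", "Ref_base", "Mutant_base")
  bases <- c("A", "C", "G", "T")
  identical(names(df), expected) &&
    all(as.character(df$Ref_base) %in% bases) &&
    all(as.character(df$Mutant_base) %in% bases) &&
    # Assumption: a substitution always changes the base.
    all(as.character(df$Ref_base) != as.character(df$Mutant_base))
}

# First three rows of PD4107a, retyped by hand for illustration.
toy <- data.frame(
  Sample_id   = "PD4107a",
  Chr         = c("1", "1", "1"),
  Position    = c(1857336, 2329409, 2620133),
  Ref_base    = c("G", "A", "C"),
  Mutant_base = c("A", "G", "G"),
  stringsAsFactors = FALSE
)

format_ok <- check_pd4107a_format(toy)
```

The same helper could be applied to the full data set after `data(PD4107a)`, provided the package is installed.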

Patient PD4107a has been described in several studies (Alexandrov et al. 2013; Fischer et al. 2013; Muino et al. 2014; Nik-Zainal et al. 2012; Roberts et al. 2013).

ftp://ftp.sanger.ac.uk/pub/cancer/AlexandrovEtAl/somatic_mutation_data/Breast/Breast_clean_somatic_mutations_for_signature_analysis.txt

Alexandrov LB, Nik-Zainal S, Wedge DC, et al. Signatures of mutational processes in human cancer. Nature. 2013 Aug 22;500(7463):415-21.

Hahsler M, Hornik K. Dissimilarity plots: A visual exploration tool for partitional clustering. Journal of Computational and Graphical Statistics. 2011 Jun;20(2):335-354.

Fischer A, Illingworth CJ, Campbell PJ, et al. EMu: probabilistic inference of mutational processes and their localization in the cancer genome. Genome Biol. 2013 Apr 29;14(4):R39.

Muino JM, Kuruoglu EE, Arndt PF. Evidence of a cancer type-specific distribution for consecutive somatic mutation distances. Comput Biol Chem. 2014 Aug 23. pii: S1476-9271(14)00091-7.

Nik-Zainal S, Alexandrov LB, Wedge DC, et al; Breast Cancer Working Group of the International Cancer Genome Consortium. Mutational processes molding the genomes of 21 breast cancers. Cell. 2012 May 25;149(5):979-93.

Roberts SA, Lawrence MS, Klimczak LJ, et al. An APOBEC cytidine deaminase mutagenesis pattern is widespread in human cancers. Nat Genet. 2013 Sep;45(9):970-6.

library(ClusteredMutations)
data(PD4107a)

### First 12 rows of the PD4107a data set
head(PD4107a, 12)

### The remaining steps are commented out in the packaged example;
### uncomment them to run.

### Generate a new data set with intermutation distances
#rainfall <- imd(data = PD4107a, chr = Chr, position = Position)

### Rainfall plot for the PD4107a cancer sample
#plot(rainfall$number, rainfall$log10distance, pch = 20,
#     ylab = "Intermutation distance (bp)", xlab = "PD4107a", yaxt = "n",
#     col = c(rep(c("black", "red"), 14)[rainfall$chr]))
#axis(2, at = c(0, 1, 2, 3, 4, 6),
#     labels = c("1", "10", "100", "1000", "10000", "1000000"),
#     las = 2, cex.axis = 0.6)

### Locate the clustered mutations
#showers(data = PD4107a, chr = Chr, position = Position)

### Visualize a dissimilarity mutation matrix for chromosome 6 using seriation
### and matrix shading, the method developed by Hahsler and Hornik (2011)
#mut.matrix <- dissmutmatrix(data = PD4107a, chr = Chr, position = Position, subset = 6)
#dissplot(mut.matrix, method = NA, options = list(col =
#  c("black", "navy", "blue", "cyan", "green", "yellow", "orange", "red",
#    "darkred", "darkred", "white")))

Loading required package: seriation
   Sample_id Chr Position Ref_base Mutant_base
1    PD4107a   1  1857336        G           A
2    PD4107a   1  2329409        A           G
3    PD4107a   1  2620133        C           G
4    PD4107a   1  3050359        G           C
5    PD4107a   1  3093904        T           A
6    PD4107a   1  3802432        T           G
7    PD4107a   1  4062326        C           A
8    PD4107a   1  4088641        C           T
9    PD4107a   1  4326173        C           T
10   PD4107a   1  5166593        G           C
11   PD4107a   1  5337071        G           T
12   PD4107a   1  5411032        T           C
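
The rainfall plot in the example relies on the distance between consecutive mutations. A minimal base-R sketch of that idea, applied to the 12 rows printed above, is given here. It assumes that the distance is the difference between neighbouring positions on the same chromosome; it is an illustration only, not the package's `imd()` implementation, and `toy` and `intermutation_distance` are hypothetical names.

```r
# Positions of the 12 chromosome 1 mutations printed above.
toy <- data.frame(
  Chr = "1",
  Position = c(1857336, 2329409, 2620133, 3050359, 3093904, 3802432,
               4062326, 4088641, 4326173, 5166593, 5337071, 5411032),
  stringsAsFactors = FALSE
)

# Hypothetical helper: distance (bp) from each mutation to the previous one
# on the same chromosome; the first mutation of a chromosome gets NA.
intermutation_distance <- function(df) {
  df <- df[order(df$Chr, df$Position), ]
  df$distance <- ave(df$Position, df$Chr,
                     FUN = function(p) c(NA, diff(p)))
  df$log10distance <- log10(df$distance)
  df
}

toy_imd <- intermutation_distance(toy)
toy_imd[, c("Position", "distance", "log10distance")]
```

Short distances indicate mutations that lie close together, which is what a rainfall plot makes visible on a log10 scale.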