SOFIA: Making Sophisticated and Aesthetical Figures in R


Description

Automatically prepares all necessary configuration files and then runs Perl-based Circos directly from R.

Usage

SOFIA(data,
chromoConfiguration=NULL,
dataColorFlag=FALSE,
dataColor=NULL,
plotType=NULL,
plotColor=NULL,
markerSize=NULL,
plotLocation=NULL,
plotBackground=NULL,
plotImportance=NULL,
plotOrientation=NULL,
density=NULL,
linksFlag=FALSE,
linkColor='red',
linkGeometry=c(.2,.2),
linkRadius=c(.7,.7),
blocksFlag=FALSE,
blocksData=NULL,
tilesFlag=FALSE,
tilesData=NULL,
tilesLocation=NULL,
blocksColor=rbind(c('blue','red'),c('green','purple')),
blocksLocation=c(.5,.01),
gaps=NULL,
ideogramThickness=20,
generalPlotConfFlag=FALSE,
generalPlotConf=NULL,
chrPrefixFont='upper',
chrPrefix='LG',
tickSeparation=50,
tickSuffix='cM',
circosLocation=NULL,
returnConf=FALSE,
circosDisplay=FALSE,
figureDisplay=TRUE,
deleteData=TRUE,
runCircos=TRUE,
confName='circos.conf')

Arguments

data

Genetic map(s) to be plotted in the Circos figure. At a minimum, it must have four variables named $locus, $chr, $pos and $map. $locus holds the names of all the positions in the map (marker names, for example), $chr is the chromosome number of each locus, $pos is the position of each locus in the map, and $map identifies the map each locus belongs to when multiple maps are plotted (if a single map is used, $map has to be 1 for all the rows (loci) in the dataframe). After these four variables (columns), all the data to be plotted has to be supplied in additional variables. Specific names for these columns/variables are not required. NA's are omitted when constructing plots. There are six basic types of plots: 1) scatter plot, 2) heatmap, 3) heatmap_interval, 4) line, 5) text and 6) glyphs. In addition, four specialized plot types are included: 1) tiles, 2) links, 3) blocks and 4) density. These four plot types are controlled differently than the basic types, and their data are not contained in the dataframe "data".
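
As an illustration only, a minimal single-map dataframe with the four basic variables and one extra numeric column might be built as follows. The marker names, positions and the column name lod are invented for this sketch and do not come from the package.

```r
# Hypothetical toy map: 2 chromosomes with 5 loci each (all values are invented).
set.seed(1)
toyData <- data.frame(
  map   = rep(1, 10),                    # single map, so 1 for every row
  chr   = rep(1:2, each = 5),            # chromosome number of each locus
  pos   = rep(c(0, 10, 25, 40, 60), 2),  # position of each locus in the map
  locus = paste0("m", 1:10),             # locus (marker) names
  stringsAsFactors = FALSE
)

# Data to be plotted go in additional columns; their names are free.
toyData$lod <- round(runif(10, 0, 5), 2)
toyData$lod[3] <- NA                     # NA's are omitted when plotting

# Quick check that the four basic variables are present.
stopifnot(all(c("locus", "chr", "pos", "map") %in% names(toyData)))
str(toyData)
```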

chromoConfiguration

Dataframe with properties to configure the plot. It must have as many rows as there are chromosomes in the dataframe "data". Five variables are required: the first two must be called $map and $order, and they describe the order in which the chromosomes are displayed in the Circos figure. The third variable must be named $rev and specifies whether a given chromosome is reversed in the figure. The fourth variable must be called $color and specifies the color of the bars that represent the chromosomes in the figure. Finally, the last variable is called $radius and specifies the radius of the chromosomes. By default, a chromoConfiguration dataframe is created in which the chromosome order is as found in the dataframe "data", and all the chromosomes are oriented clockwise and colored grey.
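
A chromoConfiguration dataframe resembling the default described above could be sketched like this. The toy input is invented, the column order follows the Examples section, and the radius of 1 is an assumption borrowed from that example rather than a documented default.

```r
# Hypothetical input: the map and chr columns of a two-map dataframe.
toyData <- data.frame(map = c(1, 1, 1, 2, 2), chr = c(1, 1, 2, 1, 2))

# One row per chromosome (unique map/chr pair), in the order found in the data.
chrs <- unique(toyData[, c("map", "chr")])

chromoConfiguration <- data.frame(
  order  = chrs$chr,                  # display order of the chromosomes
  map    = chrs$map,                  # map each chromosome belongs to
  rev    = rep(FALSE, nrow(chrs)),    # FALSE = chromosome is not reversed
  color  = rep("grey", nrow(chrs)),   # color of the chromosome bars
  radius = rep(1, nrow(chrs)),        # assumed value, as used in the Examples
  stringsAsFactors = FALSE
)
chromoConfiguration
```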

dataColor

This option is used if the dataframe "data" has more than 4 columns (that is, when plots are included in the figure) and specific colors are wanted for specific points in a plot. If that is the case, dataColor must have the same number of rows and (columns+4) as the dataframe "data". Each column of dataColor corresponds to a plot defined in the dataframe "data" after column 5.

dataColorFlag

When TRUE, the dataColor option for coloring single markers is used.

plotType

A vector defining the type of plot for each column after column 4 in the dataframe "data". The options are 1) scatter plot, 2) heatmap, 3) heatmap_interval, 4) line, 5) text and 6) glyphs. If omitted, all the data are plotted as scatter plots, unless text is found.

plotColor

A vector defining the color of the plot for each column after column 4 in the dataframe "data". If omitted, random colors are assigned.

markerSize

A vector defining the marker size to be used in the plots for each column after column 4 in the dataframe "data". If omitted, random sizes are assigned.

plotLocation

A dataframe with as many rows as there are plots. Column 1 must be named r0 and specifies the initial location of the plot (y0). Column 2 must be named r1 and specifies the final location of the plot (y1). y1 must be greater than y0 and smaller than the chromosome radius (see chromoConfiguration).
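
One possible way to generate evenly stacked plot locations and check the constraints above is sketched below. makePlotLocation is a hypothetical helper written for this illustration, not a SOFIA function, and the chromosome radius of 1 is assumed.

```r
# Hypothetical helper: stack n plots inward from `outer`, each `width` thick,
# leaving `gap` between consecutive plots.
makePlotLocation <- function(n, outer = 0.99, width = 0.09, gap = 0.01) {
  r1 <- outer - (seq_len(n) - 1) * (width + gap)  # final location of each plot
  r0 <- r1 - width                                # initial location of each plot
  data.frame(r0 = r0, r1 = r1)
}

plotLocation <- makePlotLocation(5)

# r1 must be greater than r0 and smaller than the chromosome radius (assumed 1).
chrRadius <- 1
stopifnot(all(plotLocation$r1 > plotLocation$r0),
          all(plotLocation$r1 < chrRadius))
plotLocation
```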

plotBackground

A dataframe with as many rows as there are plots. Column 1 must be named backgroundShow and specifies whether a background is displayed for every plot. Column 2 must be named backgroundColor and specifies the background color for every plot. Column 3 must be called axisShow and specifies whether horizontal axes are displayed for every plot. Finally, column 4 must be called axisSep and specifies the separation between the displayed axes.

plotImportance

A vector of numbers that controls how plots overlap when their locations are the same. Plots with larger numbers are drawn on top.

plotOrientation

A vector containing "in" or "out" for each of the plots. This determines the orientation of each plot within the figure.

density

When used, it draws a histogram of the marker density. This has to be a dataframe with one row and the following variables. $show (TRUE or FALSE) specifies whether the density plot is shown. $bins (integer > 0) controls into how many bins each chromosome is broken for counting markers. $backgroundShow (TRUE or FALSE) determines whether a background is displayed, while $backgroundColor determines its color. $axisShow (TRUE or FALSE) determines whether horizontal axes are displayed, while $axisSep (numeric > 0) determines the separation between axes.
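
A one-row density dataframe might look like the sketch below. The values are arbitrary, and the column name backgroundColor is an assumption made by analogy with the plotBackground argument.

```r
# Illustrative one-row configuration for the density histogram.
# backgroundColor is an assumed column name (by analogy with plotBackground).
densityConf <- data.frame(
  show            = TRUE,       # draw the density plot
  bins            = 20,         # number of bins per chromosome
  backgroundShow  = TRUE,       # display a background
  backgroundColor = "vvlgrey",  # background color
  axisShow        = TRUE,       # display horizontal axes
  axisSep         = 5,          # separation between axes
  stringsAsFactors = FALSE
)

# Basic sanity checks matching the constraints stated above.
stopifnot(nrow(densityConf) == 1,
          densityConf$bins > 0,
          densityConf$axisSep > 0)
```

The dataframe would then be passed as the density argument, for example density=densityConf.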

linksFlag

When TRUE and two different maps are plotted, links connecting common markers are displayed.

linkColor

Controls the color of the links when linksFlag=TRUE.

linkGeometry

A vector of length=2 which determines the bezier_radius and the crest (from 0 to 1).

linkRadius

A vector of length=2 which determines the location of both ends of the links. The first number corresponds to map 1 and the second number to map 2.

blocksFlag

When TRUE, recombination blocks are displayed.

blocksData

blocksData must be a matrix with as many rows as markers and as many columns as individuals in the population. This genetic data must correspond to phased genotypes coded as 0's and 1's.
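
To make the expected shape concrete, the following sketch simulates a small matrix of phased 0/1 genotypes (markers in rows, individuals in columns). The simulation scheme and all sizes are invented for illustration; real data would come from your own phasing pipeline.

```r
# Simulated phased genotypes: 8 markers (rows) by 4 individuals (columns).
set.seed(42)
nMarkers <- 8
nIndividuals <- 4

blocksData <- sapply(seq_len(nIndividuals), function(i) {
  start    <- sample(0:1, 1)                   # phase at the first marker
  switches <- rbinom(nMarkers - 1, 1, 0.2)     # 1 = recombination before the next marker
  (start + c(0, cumsum(switches))) %% 2        # phase flips at every switch
})

# The matrix must contain only 0's and 1's.
stopifnot(is.matrix(blocksData), all(blocksData %in% c(0, 1)))
dim(blocksData)
```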

blocksColor

If a single map is used, a vector containing two colors that represent 0's and 1's. If two maps are used, four colors are required.

blocksLocation

A vector of length=2. The first number corresponds to the outer position of the recombination blocks for plant 1. The second number corresponds to the thickness of each block.

tilesFlag

When TRUE, tiles are displayed. Tiles are small horizontal bars drawn at specific positions in the maps. Color, location and length can be controlled.

tilesData

tilesData is a dataframe with as many rows as desired tiles, and the following variables. $map and $chr specify the map and chromosome in which the tile is plotted. $pos1 and $pos2 specify the initial and final position of the tile. $color specifies the color of the tile.

tilesLocation

tilesLocation is a vector of length=2 giving the location at which the tiles are drawn (as r0 and r1). Only one set of tiles is allowed.
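
The tile arguments could be assembled as in the sketch below. The maps, chromosomes, positions, colors and radial location are arbitrary illustrative values.

```r
# Three illustrative tiles; every value here is invented.
tilesData <- data.frame(
  map   = c(1, 1, 2),                 # map in which each tile is plotted
  chr   = c(1, 3, 2),                 # chromosome in which each tile is plotted
  pos1  = c(5, 20, 0),                # initial position of each tile
  pos2  = c(15, 35, 10),              # final position of each tile
  color = c("red", "blue", "green"),  # color of each tile
  stringsAsFactors = FALSE
)

# Radial location of the single set of tiles, given as c(r0, r1).
tilesLocation <- c(0.95, 0.98)

# Each tile should end after it starts, and r1 should exceed r0.
stopifnot(all(tilesData$pos2 > tilesData$pos1),
          tilesLocation[2] > tilesLocation[1])
```

These objects would be passed together with tilesFlag=TRUE.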

gaps

gaps is a dataframe with as many rows as desired gaps in the figure. The following variables are required. $mapA and $mapB correspond to the maps in which a gap is displayed. If only one map is used, both variables must be 1. $chrA (for mapA) and $chrB (for mapB) correspond to the specific chromosomes in which the gap is displayed.

ideogramThickness

A single number controlling the thickness of the bars that represent the chromosomes.

generalPlotConfFlag

When TRUE, a vector of parameters for general configuration of plots is used. See next argument.

generalPlotConf

A vector of parameters to be applied to all the plots. If you are familiar with Circos (if not, you probably do not need this parameter), these arguments are located in the <plots>...</plots> block but outside each specific <plot>...</plot> block. The syntax is similar to that of Circos.

chrPrefixFont

'upper' or 'lower'.

chrPrefix

A prefix to name chromosome labels.

tickSeparation

Separation of the ticks for each chromosome. Default = 50.

tickSuffix

A suffix to name tick labels. Default = 'cM'.

circosLocation

Directory that contains the Circos executable (for example 'yourCircosLocation/bin/').

returnConf

When TRUE, the contents of circos.conf are printed to the console.

circosDisplay

When TRUE, the output of Circos is shown while the figure is constructed.

figureDisplay

When TRUE, the Circos figure is read and displayed in R.

deleteData

When TRUE, SOFIA deletes all the files used for making a figure (circos.conf, ideogram.conf and the data files).

runCircos

When FALSE, SOFIA only creates the files required to make a figure but does not run Circos. This is useful if you want to modify the circos.conf configuration file manually. If you set this to FALSE, you may also want to set deleteData=FALSE.

confName

Name of the Circos configuration file. Default = 'circos.conf'.

Details

Although many arguments can be defined in SOFIA, most of them are optional and figures can be generated with minimal configuration. Given a dataframe "data" with the four basic variables $map, $locus, $pos and $chr, plus any number of additional variables with numeric/text data, a Circos figure can be made using only the arguments 1) data and 2) circosLocation, for example SOFIA(data=data,circosLocation='yourCircosLocation/bin/'). In that case, all the data are plotted as scatter plots and the plots are distributed uniformly across the Circos figure. This is a good start.
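
The minimal call described above can be sketched as follows. The toy data are invented, and 'yourCircosLocation/bin/' is a placeholder that must be replaced with your own Circos directory; the guard simply skips the call when the SOFIA package or that directory is not available.

```r
# Toy dataframe with the four basic variables plus one numeric column (invented values).
toyData <- data.frame(
  map   = 1,
  chr   = rep(1:2, each = 3),
  pos   = rep(c(0, 20, 40), 2),
  locus = paste0("m", 1:6),
  lod   = c(1.2, 3.4, 0.8, 2.1, 4.0, 0.5)
)

# Placeholder path: replace with the directory of your own Circos installation.
circosDir <- "yourCircosLocation/bin/"

# Run only when the package is installed and the placeholder has been replaced.
canRun <- requireNamespace("SOFIA", quietly = TRUE) && dir.exists(circosDir)
if (canRun) {
  SOFIA::SOFIA(data = toyData, circosLocation = circosDir)
} else {
  message("Skipping: install SOFIA and set circosDir to run this sketch.")
}
```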

References

Luis Diaz-Garcia, Giovanny Covarrubias-Pazaran, Brandon Schlautman and Juan Zalapa. 2016. SOFIA: an R package for enhancing genetic visualization with Circos. Journal of Heredity. Submitted.

Martin I Krzywinski, Jacqueline E Schein, Inanc Birol, Joseph Connors, Randy Gascoyne, Doug Horsman, Steven J Jones, and Marco A Marra. 2009. Circos: An information aesthetic for comparative genomics. Genome Res.

Examples

# Loading data contained in mark0
data(mark0)
str(mark0)

# Making the dataframe data with the variables map, chr, pos and locus
data1<-data.frame(map=mark0$map,chr=mark0$lg,pos=mark0$consensus,locus=mark0$marker)

##### Adding numeric and text variables to plot in the figure

# text plot
data1$someNames<-data1$locus
data1$someNames[sample(1:10000,9000)]<-NA

# Plotting a heatmap with only two colors (black for marker presence and white for marker absence;
# the colors are defined later). 
# By using this trick, a karyotype-like plot can be obtained.
data1$kar<-data1$pos

# Random numeric data
data1$lod1<--log10(mark0$b)
data1$lod2<-sample(data1$lod1)
data1$lod3<-rev(data1$lod1)


# Up to this point, five columns with numeric and text data have been added to the dataframe data1.
# Next, the arguments related to general and specific figure configuration are defined. 


# Defining the location of each plot (5 plots total).
# In this case, the first plot is going to be in the position 0.90-0.99, the second in the 
# position 0.88-0.90, and so on. 

plotLocation<-data.frame(r0=c(0.90,.88,.78,.68,.58),r1=c(.99,.9,.87,.77,.67)) 

# Defining the background for the 5 plots. 
# In this case, the first two plots are not going to have a 
# background. All plots are going to have horizontal lines spaced at 4 units; the background
# color is vvlgrey. 


plotBackground<-data.frame(backgroundShow=c(FALSE,FALSE,TRUE,TRUE,TRUE), 
backgroundColor=rep('vvlgrey',5),axisShow=rep(TRUE,5),axisSep=rep(4,5))


# Defining the overall configuration for the figure. 
# In this example, Figure 2 has two genetic maps; the chromosomes for map 1 are 
# going to be ordered from 1 to 12, while map 2 is going to have its chromosomes 
# in order 9 to 1. In addition, the argument rev is going to flag all chromosomes 
# in map 2 to be reversed. Finally, chromosomes in map 1 are going to be colored 
# with two different blue colors (vvdblue_a3 and lblue_a3), while map 2 is going 
# to have all chromosomes colored with a single color (dgreen).  



chromoConfiguration<-data.frame(order=c(1:12,9:1),map=c(rep(1,12),rep(2,9)),
rev=c(rep(FALSE,12),rep(TRUE,9)),color=c(rep(c('vvdblue_a3','lblue_a3'),6),rep('dgreen',9)),
radius=rep(1,21))



# Defining the plot type for each of the 5 plots: a text plot, a heatmap,
# two scatter plots and a line plot.

plotType<-c('text','heatmap',rep('scatter',2),'line')

# Defining marker colors. 
# For the first plot, the text labels are going to be colored black; for the second plot, 
# the presence of numeric data is going to be colored black (for traditional 
# heatmaps with multiple colors, a color palette, such as reds-9-seq, has to be defined); 
# for the third plot, the color 'chr' tells SOFIA to use the color of the chromosome bar 
# where the marker is located; for the fourth plot, the markers are going to be colored 
# using the palette piyg-11-div (markers are colored based on their values); the fifth 
# plot, a line plot, is going to be colored dark red (dred_a3).

plotColor<-c('black','black','chr','piyg-11-div','dred_a3')

# Defining marker size. 
# For the text plot, the marker size defines the text size. 
# For scatter, it defines the circle size while for line plots, it defines the line thickness. 
# For heatmaps, any random number can be included since it is not used at all. 

markerSize<-c(8,10,16,16,1)

# Please change the argument circosLocation

SOFIA(data=data1,linkColor='chr',linkGeometry=c(.001,.1),linkRadius=c(.57,.57),
linksFlag=TRUE,chromoConfiguration=chromoConfiguration,plotBackground=plotBackground,
plotLocation=plotLocation,plotType=plotType,plotColor=plotColor,markerSize=markerSize,
circosLocation=NULL,tickSuffix='cM',returnConf=TRUE,circosDisplay=TRUE,
deleteData=TRUE,runCircos=TRUE,confName='circos.conf')

SOFIA documentation built on May 2, 2019, 2:01 p.m.
