score.markers: Fragment analysis scoring


Description

This function takes the information read from FSA files by the storing.inds function, performs SSR calling in the specified channel, and returns the index position, height and base pair position of the peaks called.

Usage

score.markers(my.inds, channel = 1, n.inds = NULL, panel=NULL, shift=0.8,
          ladder, channel.ladder=NULL, 
          ploidy=2, left.cond=c(0.6,3), right.cond=0.35, warn=FALSE, 
          window=0.5, init.thresh=200, ladd.init.thresh=200, 
          method="iter2", env = parent.frame(), my.palette=NULL,
          plotting=TRUE,  electro=FALSE, pref=3)

Arguments

my.inds

List with the channels information from the individuals specified, usually coming from the storing.inds function output

channel

The channel you wish to analyze, usually 1 is blue, 2 is green, 3 is yellow, 4 is red and so on

n.inds

Vector specifying the plants to be scored

panel

A vector containing the base pair interval where the peaks should be searched for

shift

The distance in base pairs within which peaks neighboring the tallest peaks are discarded, e.g. if 2 peaks are 0.3 bp apart, the smaller one will be discarded
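The effect of shift can be pictured with a small base R sketch. It only illustrates the rule described above and is not the package's internal code; the helper name drop_neighbors and the toy peak values are assumptions made for this example.

```r
# Hypothetical helper (not part of Fragman): keep a peak only if no taller
# peak lies within `shift` base pairs of it.
drop_neighbors <- function(bp, height, shift = 0.8) {
  keep <- vapply(seq_along(bp), function(i) {
    close <- abs(bp - bp[i]) <= shift & seq_along(bp) != i
    !any(height[close] > height[i])
  }, logical(1))
  data.frame(bp = bp[keep], height = height[keep])
}

# Toy data: the peak at 175.3 bp sits 0.3 bp from a taller peak at 175.0 bp
peaks <- drop_neighbors(bp = c(175.0, 175.3, 180.6),
                        height = c(5200, 900, 4100))
peaks
```

With the default shift of 0.8, the small peak at 175.3 bp is dropped and the other two peaks are kept.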

ladder

A vector containing the expected weights for the ladder peaks, which will be found using the find.ladder function

channel.ladder

A scalar value indicating in which channel or color the ladder was read

ploidy

A scalar value indicating the ploidy of the organism to be scored to decide the maximum number of peaks the program should look for. TO BE IMPLEMENTED SOON. STILL NOT FUNCTIONAL.

left.cond

A vector of two values. The first is a proportion (0-1) indicating when peaks to the left of the tallest peaks should be considered real based on height, e.g. a value of 0.5 means that a close peak to the left of the tallest peak is picked only if it is at least 50 percent as tall as the tallest peak. The second is a number of base pairs indicating when peaks to the left of the tallest peaks should be considered real based on distance, e.g. a value of 3 means that a close peak to the left of the tallest peak is picked only if it is at least 3 base pairs away from the tallest peak

right.cond

A proportion (0-1) indicating when peaks to the right of the tallest peaks should be considered real based on height, e.g. a value of 0.5 means that a close peak to the right of the tallest peak is picked only if it is at least 50 percent as tall as the tallest peak.
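The left.cond and right.cond rules can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the helper is_real_neighbor is hypothetical, the peak values are made up, and the sketch assumes that both left-hand conditions (height and distance) must hold together.

```r
# Hypothetical helper (not part of Fragman): flag which neighbors of the
# tallest peak pass the height/distance rules described above.
# Assumption: for left-hand peaks both conditions must hold together.
is_real_neighbor <- function(bp, height,
                             left.cond = c(0.6, 3), right.cond = 0.35) {
  top   <- which.max(height)
  ratio <- height / height[top]   # height relative to the tallest peak
  dist  <- bp[top] - bp           # positive for peaks to the left
  left_ok  <- dist > 0 & ratio >= left.cond[1] & dist >= left.cond[2]
  right_ok <- dist < 0 & ratio >= right.cond
  left_ok | right_ok              # the tallest peak itself is not flagged
}

bp     <- c(171.0, 173.5, 175.0, 177.0, 181.0)
height <- c(3500, 4000, 5000, 1000, 2000)
is_real_neighbor(bp, height)
```

Here the peak at 171.0 bp passes (70 percent as tall, 4 bp away), the peak at 173.5 bp fails on distance, the peak at 177.0 bp fails on height, and the peak at 181.0 bp passes the right-hand rule.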

warn

A TRUE/FALSE value indicating if warnings should be provided when detecting the ladder

window

A value in base pairs indicating the tolerance allowed when matching a peak in a sample to an expected peak from the panel provided.
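A minimal sketch of what a matching window means, written in base R. The helper match_panel and the toy values are assumptions for illustration; picking the tallest peak inside the window is a choice made for this sketch and may differ from what the package does internally.

```r
# Hypothetical helper (not part of Fragman): for each expected panel size,
# return the tallest observed peak within `window` bp, or NA if none.
match_panel <- function(obs_bp, obs_height, panel, window = 0.5) {
  vapply(panel, function(p) {
    hit <- which(abs(obs_bp - p) <= window)
    if (length(hit) == 0) return(NA_real_)
    obs_bp[hit[which.max(obs_height[hit])]]
  }, numeric(1))
}

panel  <- c(175.0, 180.6)
called <- match_panel(obs_bp = c(174.8, 178.2, 180.9),
                      obs_height = c(4800, 600, 3900),
                      panel = panel)
called
```

The observed peak at 178.2 bp is ignored because it is more than 0.5 bp from every panel entry.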

init.thresh

An initial intensity threshold for detecting peaks. We recommend not changing it much unless you have highly controlled DNA concentrations in your experiment.
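To show how an intensity threshold filters candidate peaks, here is a small base R sketch. The helper find_peaks and the toy signal are hypothetical; the package's own peak detection may work differently.

```r
# Hypothetical helper (not part of Fragman): indices of local maxima
# whose intensity exceeds `thresh`.
find_peaks <- function(signal, thresh = 200) {
  n   <- length(signal)
  idx <- 2:(n - 1)
  is_max <- signal[idx] > signal[idx - 1] & signal[idx] >= signal[idx + 1]
  idx[is_max & signal[idx] > thresh]
}

signal <- c(10, 50, 150, 60, 20, 300, 900, 400, 30, 250, 40)
find_peaks(signal, thresh = 200)
```

Lowering thresh to 100 would also pick up the small local maximum at index 3, which is why a low threshold tends to admit noise.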

ladd.init.thresh

If samples were not sized using the ladder.info.attach function, this value will be used to detect ladder peaks. Internally the program will use the find.ladder function. We recommend not changing it much unless you have identified special situations with your ladder

method

If samples were not sized using the ladder.info.attach function, this method will be used to detect ladder peaks. One of the 3 methods available: "cor" makes all possible combinations of peaks and searches the correlations exhaustively to find the peaks corresponding to the expected DNA weights; "ci" constructs confidence intervals to look for peaks meeting the conditions specified in the previous arguments; "iter2" is an iterative procedure looking for the most likely peaks meeting your ladder expectation. Default is "iter2".
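The idea behind an exhaustive correlation search such as the one described for "cor" can be sketched in base R. This is only an illustration of the general technique; the helper best_by_cor and the candidate positions are assumptions, and the package's own search may differ.

```r
# Hypothetical helper (not part of Fragman): among all ordered subsets of
# candidate peak positions, pick the one best correlated with the ladder.
best_by_cor <- function(candidates, ladder) {
  combos <- combn(candidates, length(ladder))   # one subset per column
  cors   <- apply(combos, 2, function(x) cor(x, ladder))
  combos[, which.max(cors)]
}

ladder     <- c(50, 75, 100, 125)
candidates <- c(1000, 1130, 1250, 1500, 1750)   # 1130 is a spurious peak
chosen <- best_by_cor(candidates, ladder)
chosen
```

The number of subsets grows quickly with the number of candidate peaks, which is one reason an iterative method can be preferable for long ladders.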

env

Used to detect the environment of the user and load the result in the same environment. Please do not change it.

my.palette

A character vector with the colors to be used when drawing the RFU plots. If NULL it will use the programmed palette.

plotting

A TRUE/FALSE value indicating if the plots should be drawn or not. The default value is TRUE.

electro

A TRUE/FALSE value indicating if the electrogram/gel should be drawn or not. The default value is FALSE.

pref

A scalar value indicating how many plots should be drawn in the output plotting. The default is 3.

Details

Method "ci" has been deprecated. The method "iter2" is currently the default; it matches the ladder provided to the observed peaks using an iterative procedure based on least squares.
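The least-squares step that turns matched ladder peaks into a sizing curve can be sketched in base R as below. This shows only a single linear fit under assumed toy values; it is not the iterative "iter2" search itself, and the package may use a different model form.

```r
# Toy ladder: index positions where ladder peaks were observed (assumed
# values) and their known sizes in base pairs.
ladder_pos <- c(1000, 1250, 1500, 1750, 2000)
ladder_bp  <- c(50, 75, 100, 125, 150)

# Least-squares fit of size on position
fit <- lm(bp ~ pos, data = data.frame(pos = ladder_pos, bp = ladder_bp))

# Convert sample peak positions to base pairs with the fitted line
sample_pos <- c(1100, 1625)
sample_bp  <- unname(predict(fit, newdata = data.frame(pos = sample_pos)))
sample_bp
```

An iterative procedure would repeat a fit like this for different candidate assignments of observed peaks to ladder sizes and keep the assignment with the smallest residual error.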

Value

If arguments are correct the function returns a plot and a list containing

$pos

the index positions for the intensities

$hei

the intensities for the fragments found

$wei

the putative weights in base pairs based on the ladder provided
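As a rough picture of how these three elements fit together, the following base R sketch builds a mock result for one sample and tabulates it. The numbers are invented for illustration and the object is not real score.markers output; to extract real results, use get.scores as shown in the Examples.

```r
# Mock of one sample's result, mirroring the documented elements.
# The numbers are made up for illustration only.
one_sample <- list(pos = c(3120, 3205),      # index positions
                   hei = c(4100, 5200),      # intensities
                   wei = c(175.04, 180.65))  # sizes in base pairs

# Tabulate the called fragments, tallest first
calls <- as.data.frame(one_sample)
calls <- calls[order(-calls$hei), ]
calls
```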

References

We have spent valuable time developing this package, please cite it in your publication:

Covarrubias-Pazaran G, Diaz-Garcia L, Schlautman B, Salazar W, Zalapa J. Fragman: An R package for fragment analysis. 2016. BMC Genetics 17(62):1-8.

Robert J. Henry. 2013. Molecular Markers in Plants. Wiley-Blackwell. ISBN 978-0-470-95951-0.

Ben Hui Liu. 1998. Statistical Genomics. CRC Press LLC. ISBN 0-8493-3166-8.

Examples

## ================================= ##
## ================================= ##
## Fragment analysis requires 
## 1) loading your data
## 2) matching your ladder
## 3) defining a panel for scoring
## 4) scoring the samples
## ================================= ##
## ================================= ##

#####################
## 1) Load your data
#####################

### you would use something like:
# folder <- "~/myfolder"
# my.plants <- storing.inds(folder)
### here we just load our sample data and use the first 2 plants

?my.plants
data(my.plants)
my.plants <- my.plants[1:2]
class(my.plants) <- "fsa_stored"

#######################
## 2) Match your ladder
#######################

### create a vector indicating the sizes of your ladder and do the match

my.ladder <- c(50, 75, 100, 125, 129, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375)
ladder.info.attach(stored=my.plants, ladder=my.ladder)

### matching your ladder is a critical step and should only happen once per batch of 
### samples read

###****************************************************************************************###
### OPTIONAL:
### If the ladder.info.attach function detects some bad samples 
### you can correct them manually using
### the ladder.corrector() function
### For example, to correct one sample in the previous data:
# ladder.corrector(stored=my.plants, 
# to.correct="FHN152-CPN01_01A_GH1x35_152-148-209_717-704-793_367-382-381.fsa", 
# ladder=my.ladder)
###****************************************************************************************###

#######################
## 3) Define a panel
#######################

### In fragment analysis you usually design a panel where you indicate
### which peaks are real. You may use the overview2 function which plots all the
### plants in the channel you want in the base pair range you want

overview2(my.inds=my.plants, channel = 2:3, ladder=my.ladder, init.thresh=5000)

### You can click on the peaks you think are real, given that the ones
### suggested by the program may not be correct. This can be done by using the 
### 'locator' function and pressing 'Esc' when you're done, i.e.:
# my.panel <- locator(type="p", pch=20, col="red")$x
### That way you can click over the peaks and get the sizes
### in base pairs stored in a vector named my.panel

### Just for demonstration purposes I will use the suggested peaks by 
### the program using overview2, which will return a vector with 
### expected DNA sizes to be used in the next step for scoring
### we'll do it in the 160-190 bp region

my.panel <- overview2(my.inds=my.plants, channel = 3, 
                    ladder=my.ladder, init.thresh=7000, 
                    xlim=c(160,190)); my.panel

##########################
## 4) Score the samples
##########################

### Once a panel is created it is time to score the samples by providing the initial
### data we read, the ladder vector, the panel vector, and our specifications
### of channel to score (other arguments are available)

### Here we will score our samples for channel 3 with our panel created previously

res <- score.markers(my.inds=my.plants, channel = 3, panel=my.panel$channel_3,
                ladder=my.ladder, electro=FALSE)

### Check the plots and make sure they were scored correctly. In case some samples 
### are wrong you might want to use the locator function again and figure out 
### the size of your peaks. To extract your peaks in a data.frame do the following:

final.results <- get.scores(res)
final.results 

Example output

my.plants               package:Fragman                R Documentation

Cranberry biparental population

Description:

     This dataset are 60 individuals from a progeny coming from the
     cross of 2 cranberry plants. Six SSR markers were run, 2 in the
     first channel (blue), 2 in the second channel (green), 2 in the
     third channel (yellow) and the Roxtrash375 ladder was run in the
     fourth channel (red).

Usage:

     data("my.plants")
     
Format:

     The format is: chr "my.plants"

Details:

     The data is basically the raw FSA files coming from the ABi
     machine. No more details for this data.

Source:

     This data was generated by the Cranberry Genomics Lab.

References:

     Covarrubias-Pazaran G, Diaz-Garcia L, Schlautman B, Salazar W,
     Zalapa J. Fragma: An R package for fragment analysis.
     http://horticulture.wisc.edu/cggl/ZalapaLab/People.html. 2015.

Examples:

     data(my.plants)
     ## look at the list structure
     str(my.plants) 
     



Sizing process complete. Information has been stored in the environment for posterior functions.
For example to be used by the overview2() or score.markers() functions.

 THE PEAKS RETURNED ARE SUGGESTIONS. 
   What you should do: 
 a) Use the locator function, i.e. ''my.panel <- locator(type='p', pch=20, col='red')$x'' 
 b) Click over the peaks you want to include in your panel 
 c) Press the 'esc' key when done selecting peaks 
 d) Make sure to provide the panel vector in the score.easy() function 
 

$channel_2
[1] 332.6583

$channel_3
[1] 173.3384 175.0418 180.6467 296.6024 300.7192 315.8293 318.1012



 THE PEAKS RETURNED ARE SUGGESTIONS. 
   What you should do: 
 a) Use the locator function, i.e. ''my.panel <- locator(type='p', pch=20, col='red')$x'' 
 b) Click over the peaks you want to include in your panel 
 c) Press the 'esc' key when done selecting peaks 
 d) Make sure to provide the panel vector in the score.easy() function 
 
$channel_3
[1] 175.0418 180.6467


1) You have used a shift of 0.8 base pairs. All peaks at that distance from the tallest peak will be ignored and be considered noise. 
2) In addition the window used is 0.5 . Which means that all peaks closer by that distance to panel peaks will be accounted as peaks. 
3) Remember using the get.scores() function to extract the results from this output as a dataframe. 


                                                                   markA.1
FHN152-CPN01_01A_GH1x35_152-148-209_717-704-793_367-382-381.fsa   175.0418
FHN152-CPN01_01B_BGBLNL95_152-148-210_717-704-794_367-382-382.fsa 175.0418
                                                                   markA.2
FHN152-CPN01_01A_GH1x35_152-148-209_717-704-793_367-382-381.fsa   175.0418
FHN152-CPN01_01B_BGBLNL95_152-148-210_717-704-794_367-382-382.fsa 180.6467

Fragman documentation built on May 2, 2019, 8:26 a.m.