Fragman-package: Fragment analysis and automatic scoring


Fragman is a package designed for fragment analysis and automatic scoring of biparental populations (such as F1, F2, and BC types) and populations for diversity studies. The program reads files with the FSA extension (which contain the readings for DNA fragments) and .txt files from the Beckman CEQ 8000 system, and extracts the DNA intensities from the channels/colors where they are located, based on ABI machine platforms, in order to perform sizing and allele scoring.

The core of the package and the fragment analysis workflow rely on the following four functions:

1) storing.inds (function in charge of reading the FSA or txt (CQS) files and storing them in a list structure)

2) ladder.info.attach (uses the information read from the FSA files and a vector containing the ladder information (DNA sizes of the fragments) to match the peaks from the channel where the ladder was run with the DNA sizes for all samples. It then loads this information into the R environment for use by subsequent functions)

3) overview2 (creates friendly plots for any number of individuals specified; overview2 can be used to design panels for subsequent automatic scoring (as licensed software does), while overview can be used for manual scoring of individuals such as parents of biparental populations or diversity populations)

4) score.markers (scores the alleles by finding the peaks provided in the panel, if one is provided; otherwise it returns all peaks present in the channel). This final function can be automated when several markers are located in the same channel by creating lists of panels, taking advantage of R capabilities and data structures.
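The idea of handling several markers in the same channel through a list of panels can be sketched in base R as follows. This is only an illustration under stated assumptions: match_panel(), the peaks vector, and the marker names are hypothetical and are not part of Fragman. With real data, each panel in the list would instead be passed to score.markers() through its panel argument.

```r
# Hypothetical helper (not a Fragman function): for each expected panel size,
# return the closest detected peak if it lies within `window` bp, else NA.
match_panel <- function(peaks, panel, window = 0.5) {
  vapply(panel, function(expected) {
    d <- abs(peaks - expected)
    if (length(d) > 0 && min(d) <= window) peaks[which.min(d)] else NA_real_
  }, numeric(1))
}

# Made-up peak sizes (bp) detected in one channel of one sample
peaks <- c(175.1, 180.6, 296.5, 315.9)

# One panel per marker located in the same channel (made-up marker names)
panels <- list(
  markA = c(175.0, 180.6),
  markB = c(296.6, 300.7, 315.8)
)

# Apply the same matching step to every panel in the list
calls <- lapply(panels, function(p) match_panel(peaks, p, window = 0.5))
print(calls)
```

Keeping the panels in a named list means that adding a marker only requires adding one list element; the matching step itself does not change.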

** During ladder sizing, some samples can fail for reasons related to sample quality (low intensity in the ladder channel, an extreme number of noisy peaks, etc.). For that reason we have introduced the ladder.corrector function, which allows the user to correct the bad samples by clicking over the real peaks. By default, the ladder.info.attach function returns the names of the samples that had a low correlation with the expected peaks.
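The notion of a "low correlation with the expected peaks" can be illustrated with a small base R sketch. It is not Fragman's internal code: the scan positions, the sample names, and the 0.99 threshold are all made up for the example, and the only thing taken from this page is the ladder vector.

```r
# Expected ladder sizes in bp (same vector used in the examples of this page)
expected_sizes <- c(50, 75, 100, 125, 129, 150, 175, 200, 225, 250,
                    275, 300, 325, 350, 375)

# Made-up scan positions of the 15 ladder peaks picked in two samples.
# sample_ok is linear in the ladder sizes; sample_bad mimics a sample where
# noisy early peaks were picked instead of the real ladder peaks.
ladder_peaks <- list(
  sample_ok  = 1000 + 12 * expected_sizes,
  sample_bad = c(seq(1100, 1400, by = 30), 4000, 4600, 5200, 5500)
)

# Pearson correlation between picked positions and expected sizes, per sample
ladder_cor <- vapply(ladder_peaks,
                     function(pos) cor(pos, expected_sizes),
                     numeric(1))

# Arbitrary threshold, chosen only for this illustration
threshold <- 0.99
to_review <- names(ladder_cor)[ladder_cor < threshold]

print(round(ladder_cor, 3))
print(to_review)
```

Samples listed in to_review would be the candidates for manual correction with ladder.corrector().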

When automatic scoring is not desired, the function overview can be used to start an interactive session and click over the peaks (using the locator function) in order to get the allele sizes.

Feel free to contact us with questions and improvement suggestions at:

covarrubiasp@wisc.edu

Please send a sample file with your question, along with the vector for your ladder, so that we can recreate the issue or bug reported.

We have spent valuable time developing this package; please cite it in your publication:

Covarrubias-Pazaran G, Diaz-Garcia L, Schlautman B, Salazar W, Zalapa J. Fragman: An R package for fragment analysis. 2016. BMC Genetics 17(62):1-8.

Giovanny Covarrubias-Pazaran, Luis Diaz-Garcia, Brandon Schlautman, Walter Salazar, Juan Zalapa.

Covarrubias-Pazaran G, Diaz-Garcia L, Schlautman B, Salazar W, Zalapa J. Fragman: An R package for fragment analysis. 2016. BMC Genetics 17(62):1-8.

Robert J. Henry. 2013. Molecular Markers in Plants. Wiley-Blackwell. ISBN 978-0-470-95951-0.

Ben Hui Liu. 1998. Statistical Genomics. CRC Press LLC. ISBN 0-8493-3166-8.

http://cggl.horticulture.wisc.edu/home-page/

## ================================= ##
## ================================= ##
## Fragment analysis requires 
## 1) loading your data
## 2) matching your ladder
## 3) define a panel for scoring
## 4) score the samples
## ================================= ##
## ================================= ##

#####################
## 1) Load your data
#####################

### you would use something like:
# folder <- "~/myfolder"
# my.plants <- storing.inds(folder)
### here we just load our sample data and use the first 2 plants

?my.plants
data(my.plants)
my.plants <- my.plants[1:2]
class(my.plants) <- "fsa_stream"
# plot(my.plants) # to visualize the raw data

#######################
## 2) Match your ladder
#######################

### create a vector indicating the sizes of your ladder and do the match

my.ladder <- c(50, 75, 100, 125, 129, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375)
ladder.info.attach(stored=my.plants, ladder=my.ladder)

### matching your ladder is a critical step and should only happen once per batch of 
### samples read

###****************************************************************************************###
### OPTIONAL:
### If the ladder.info.attach function detects some bad samples,
### you can correct them manually using
### the ladder.corrector() function.
### For example, to correct one sample in the previous data:
# ladder.corrector(stored=my.plants, 
# to.correct="FHN152-CPN01_01A_GH1x35_152-148-209_717-704-793_367-382-381.fsa", 
# ladder=my.ladder)
###****************************************************************************************###

#######################
## 3) Define a panel
#######################

### In fragment analysis you usually design a panel where you indicate
### which peaks are real. You may use the overview2 function which plots all the
### plants in the channel you want in the base pair range you want

overview2(my.inds=my.plants, channel = 2:3, ladder=my.ladder, init.thresh=5000)

### You can click on the peaks you think are real, given that the ones
### suggested by the program may not be correct. This can be done by using the 
### 'locator' function and press 'Esc' when you're done, i.e.:
# my.panel <- locator(type="p", pch=20, col="red")$x
### That way you can click over the peaks and get the sizes
### in base pairs stored in a vector named my.panel

### Just for demonstration purposes I will use the suggested peaks by 
### the program using overview2, which will return a vector with 
### expected DNA sizes to be used in the next step for scoring
### we'll do it in the 160-190 bp region

my.panel <- overview2(my.inds=my.plants, channel = 3, 
                    ladder=my.ladder, init.thresh=7000, 
                    xlim=c(160,190)); my.panel

##########################
## 4) Score the samples
##########################

### Once a panel is created, it is time to score the samples by providing the initial
### data we read, the ladder vector, the panel vector, and our specifications
### of channel to score (other arguments are available)

### Here we will score our samples for channel 3 with our panel created previously

res <- score.markers(my.inds=my.plants, channel = 3, panel=my.panel$channel_3,
                ladder=my.ladder, electro=FALSE)

### Check the plots and make sure they were scored correctly. In case some samples 
### are wrong you might want to use the locator function again and figure out 
### the size of your peaks. To extract your peaks in a data.frame do the following:

final.results <- get.scores(res)
final.results

my.plants               package:Fragman                R Documentation

Cranberry biparental population

Description:

     This dataset contains 60 individuals from the progeny of a
     cross between 2 cranberry plants. Six SSR markers were run, 2 in the
     first channel (blue), 2 in the second channel (green), 2 in the
     third channel (yellow) and the Roxtrash375 ladder was run in the
     fourth channel (red).

Usage:

     data("my.plants")
     
Format:

     The format is: chr "my.plants"

Details:

     The data is basically the raw FSA files coming from the ABI
     machine. No more details for this data.

Source:

     This data was generated by the Cranberry Genomics Lab.

References:

     Covarrubias-Pazaran G, Diaz-Garcia L, Schlautman B, Salazar W,
     Zalapa J. Fragman: An R package for fragment analysis.
     http://horticulture.wisc.edu/cggl/ZalapaLab/People.html. 2015.

Examples:

     data(my.plants)
     ## look at the list structure
     str(my.plants) 
     


Sizing process complete. Information has been stored in the environment for posterior functions.
For example to be used by the overview2() or score.markers() functions.

 THE PEAKS RETURNED ARE SUGGESTIONS. 
   What you should do: 
 a) Use the locator function, i.e. ''my.panel <- locator(type='p', pch=20, col='red')$x'' 
 b) Click over the peaks you want to include in your panel 
 c) Press the 'esc' key when done selecting peaks 
 d) Make sure to provide the panel vector in the score.easy() function 
 

$channel_2
[1] 332.6583

$channel_3
[1] 173.3384 175.0418 180.6467 296.6024 300.7192 315.8293 318.1012



 THE PEAKS RETURNED ARE SUGGESTIONS. 
   What you should do: 
 a) Use the locator function, i.e. ''my.panel <- locator(type='p', pch=20, col='red')$x'' 
 b) Click over the peaks you want to include in your panel 
 c) Press the 'esc' key when done selecting peaks 
 d) Make sure to provide the panel vector in the score.easy() function 
 
$channel_3
[1] 175.0418 180.6467


1) You have used a shift of 0.8 base pairs. All peaks at that distance from the tallest peak will be ignored and be considered noise. 
2) In addition the window used is 0.5 . Which means that all peaks closer by that distance to panel peaks will be accounted as peaks. 
3) Remember using the get.scores() function to extract the results from this output as a dataframe. 


                                                                   markA.1
FHN152-CPN01_01A_GH1x35_152-148-209_717-704-793_367-382-381.fsa   175.0418
FHN152-CPN01_01B_BGBLNL95_152-148-210_717-704-794_367-382-382.fsa 175.0418
                                                                   markA.2
FHN152-CPN01_01A_GH1x35_152-148-209_717-704-793_367-382-381.fsa   175.0418
FHN152-CPN01_01B_BGBLNL95_152-148-210_717-704-794_367-382-382.fsa 180.6467
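
As a possible follow-up to the scores printed above, the two allele columns can be turned into simple genotype labels. The sketch below is an assumption-laden illustration, not part of Fragman: it rebuilds the printed values by hand as a data frame, uses shortened made-up row names instead of the full .fsa file names, and assumes that identical sizes in both columns indicate a homozygous call.

```r
# Values copied from the get.scores() printout above; the short row names
# are made up to keep the example readable.
scores <- data.frame(
  markA.1 = c(175.0418, 175.0418),
  markA.2 = c(175.0418, 180.6467),
  row.names = c("sample_01A", "sample_01B")
)

# Assumption: a sample is homozygous when both allele columns hold the same size
scores$zygosity <- ifelse(scores$markA.1 == scores$markA.2,
                          "homozygous", "heterozygous")

# Compact genotype label built from the allele sizes rounded to whole base pairs
scores$genotype <- paste(round(scores$markA.1), round(scores$markA.2), sep = "/")

print(scores)
```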