fasta2genlight: Extract Single Nucleotide Polymorphisms (SNPs) from alignments


View source: R/import.R

Description

The function fasta2genlight reads alignments in FASTA format (extensions ".fasta", ".fas", or ".fa"), extracts the binary SNPs, and converts the output into a genlight object.

The function reads data in chunks of a few genomes at a time (minimum 1, no maximum), which makes it possible to read massive datasets with negligible RAM requirements, at the cost of longer computational time. The argument chunkSize sets the number of genomes read at a time. Increasing this value decreases the time required to read the data in, but increases memory requirements.

Multiple cores can be used to decrease the overall computational time on parallel architectures (needs the package parallel).
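
The memory/time trade-off controlled by chunkSize can be illustrated with a minimal base-R sketch. This is not adegenet's implementation: the helper find_polymorphic_chunked is hypothetical, and it assumes that every record consists of one header line followed by exactly one sequence line.

```r
## Hypothetical helper (not part of adegenet): scan an aligned FASTA file
## chunk by chunk and report which alignment positions are polymorphic.
## ASSUMPTION: each record is one ">" header line plus ONE sequence line.
find_polymorphic_chunked <- function(path, chunkSize = 2) {
  con <- file(path, open = "r")
  on.exit(close(con))
  seen <- NULL  # per position: the set of characters observed so far
  repeat {
    ## each genome takes two lines, so a chunk is 2 * chunkSize lines
    txt <- readLines(con, n = 2 * chunkSize)
    if (length(txt) == 0) break
    seqs <- tolower(txt[!startsWith(txt, ">")])
    chars <- do.call(rbind, strsplit(seqs, ""))  # genomes x positions
    if (is.null(seen)) seen <- vector("list", ncol(chars))
    for (j in seq_len(ncol(chars))) {
      seen[[j]] <- union(seen[[j]], chars[, j])
    }
  }
  ## keep positions where more than one nucleotide was observed
  nb <- vapply(seen, function(x) sum(x %in% c("a", "c", "g", "t")),
               integer(1))
  which(nb > 1)
}

## toy alignment: 3 genomes, 6 positions
tmp <- tempfile(fileext = ".fasta")
writeLines(c(">g1", "acgtac", ">g2", "acgtcc", ">g3", "atgtac"), tmp)
poly <- find_polymorphic_chunked(tmp, chunkSize = 2)
poly
```

Only chunkSize genomes are held in memory at once; a larger chunk means fewer read operations but a larger temporary matrix.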

Usage

fasta2genlight(file, quiet=FALSE, chunkSize = 1000, saveNbAlleles=FALSE,
               parallel = require("parallel"), n.cores = NULL, ...)

Arguments

file

a character string giving the path to the file to convert, with the extension ".fa", ".fas", or ".fasta".

quiet

a logical stating whether conversion messages should be printed (FALSE, default) or not (TRUE).

chunkSize

an integer indicating the number of genomes to be read at a time; larger values require more RAM but decrease the time needed to read the data.

saveNbAlleles

a logical indicating whether the number of alleles for each locus in the original alignment should be saved in the other slot (TRUE), or not (FALSE, default). For large genomes, this takes some space but makes it possible to track SNPs with more than 2 alleles, which are lost during the conversion.

parallel

a logical indicating whether multiple cores, if available, should be used for the computations (TRUE, the default when the package parallel can be loaded), or not (FALSE); requires the package parallel to be installed (see Details).

n.cores

if parallel is TRUE, the number of cores to be used in the computations; if NULL, then the maximum number of cores available on the computer is used.

...

other arguments to be passed to other functions - currently not used.
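
As an illustration of the arguments above, the following sketch reads the example file shipped with adegenet while keeping the allele counts. It only runs if adegenet is installed; the names of the elements stored in the other slot are not documented here, so the code simply inspects whatever is returned.

```r
## Runs only when adegenet is available
if (requireNamespace("adegenet", quietly = TRUE)) {
  library(adegenet)
  flu <- system.file("files/usflu.fasta", package = "adegenet")

  ## small chunks, no progress messages, single core, keep allele counts
  obj <- fasta2genlight(flu, quiet = TRUE, chunkSize = 20,
                        saveNbAlleles = TRUE, parallel = FALSE)

  ## inspect what was stored in the 'other' slot
  str(other(obj))
}
```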

Details

=== Using multiple cores ===

Most recent machines have one or several processors with multiple cores. R processes usually use a single core. The package parallel allows some computations to be parallelized across multiple cores, which can drastically decrease computational time.

To use this functionality, you need to have the latest version of the parallel package installed.
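
A minimal sketch of choosing a value for n.cores explicitly, rather than letting the function take every available core. Leaving one core free and restricting parallel runs to Unix-like systems are conservative assumptions made for this example, not requirements stated by the package.

```r
library(parallel)  # distributed with base R

## detectCores() may return NA on some platforms, so fall back to 1
avail <- detectCores()
n.cores <- if (is.na(avail)) 1L else max(1L, avail - 1L)
n.cores

## ASSUMPTION: only go parallel on Unix-like systems, where forking works
use.par <- n.cores > 1L && .Platform$OS.type == "unix"

if (requireNamespace("adegenet", quietly = TRUE)) {
  library(adegenet)
  flu <- system.file("files/usflu.fasta", package = "adegenet")
  obj <- fasta2genlight(flu, quiet = TRUE,
                        parallel = use.par, n.cores = n.cores)
}
```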

Value

an object of the class genlight
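
The returned object can be queried with the usual genlight accessors. The sketch below, which runs only if adegenet is installed, shows its dimensions and converts it to an ordinary matrix with one row per genome and one column per SNP.

```r
if (requireNamespace("adegenet", quietly = TRUE)) {
  library(adegenet)
  flu <- system.file("files/usflu.fasta", package = "adegenet")
  obj <- fasta2genlight(flu, quiet = TRUE, parallel = FALSE)

  nInd(obj)  # number of genomes
  nLoc(obj)  # number of binary SNPs retained

  ## genomes x SNPs matrix; missing data appear as NA
  mat <- as.matrix(obj)
  dim(mat)
  mat[1:5, 1:5]
}
```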

Author(s)

Thibaut Jombart t.jombart@imperial.ac.uk

See Also

- ?genlight for a description of the class genlight.

- read.snp: read SNPs in adegenet's '.snp' format.

- read.PLINK: read SNPs in PLINK's '.raw' format.

- df2genind: convert any multiallelic markers into adegenet genind.

- import2genind: read multiallelic markers from various software into adegenet.

Examples

## Not run: 
## show the example file ##
## this is the path to the file:
myPath <- system.file("files/usflu.fasta",package="adegenet")
myPath

## read the file
obj <- fasta2genlight(myPath, chunkSize=10) # process 10 sequences at a time
obj

## look at extracted information
position(obj)
alleles(obj)
locNames(obj)

## plot positions of polymorphic sites
temp <- density(position(obj), bw=10)
plot(temp, xlab="Position in the alignment", lwd=2, main="Location of the SNPs")
points(position(obj), rep(0, nLoc(obj)), pch="|", col="red")

## End(Not run)

Example output

Loading required package: ade4
Registered S3 method overwritten by 'spdep':
  method   from
  plot.mst ape 

   /// adegenet 2.1.3 is loaded ////////////

   > overview: '?adegenet'
   > tutorials/doc/questions: 'adegenetWeb()' 
   > bug reports/feature requests: adegenetIssues()


[1] "/usr/lib/R/site-library/adegenet/files/usflu.fasta"

 Converting FASTA alignment into a genlight object... 

Loading required package: parallel

 Looking for polymorphic positions... 
........................................................................................................................................................................................................................................................................................................................................................................
 Extracting SNPs from the alignment... 
........................................................................................................................................................................................................................................................................................................................................................................
 Building final object... 

...done.

 /// GENLIGHT OBJECT /////////

 // 80 genotypes,  274 binary SNPs, size: 125.4 Kb
 26 (0.12 %) missing data

 // Basic content
   @gen: list of 80 SNPbin
   @ploidy: ploidy of each individual  (range: 1-1)

 // Optional content
   @ind.names:  80 individual labels
   @loc.all:  274 alleles
   @position: integer storing positions of the SNPs
   @other: a list containing: elements without names 

  [1]    7   12   31   32   36   37   44   45   52   60   62   72   73   78   96
 [16]   99  105  108  121  128  140  146  147  150  165  177  181  187  190  191
 [31]  193  196  197  210  216  218  219  232  234  246  249  255  273  286  289
 [46]  295  310  323  327  328  365  402  405  412  417  418  419  424  438  439
 [61]  441  444  445  447  450  451  460  462  466  467  472  478  479  483  499
 [76]  508  510  511  512  513  514  516  518  520  522  524  531  537  538  547
 [91]  549  563  564  571  577  585  591  603  604  606  609  614  616  622  623
[106]  624  625  626  628  630  635  636  638  642  648  650  652  669  671  672
[121]  687  689  690  693  694  696  702  705  706  712  714  715  716  717  721
[136]  722  723  727  729  732  733  734  735  744  754  759  778  780  786  811
[151]  822  831  833  837  854  855  859  872  875  876  881  882  884  885  894
[166]  903  913  920  927  930  945  948  954  963  973  981  983  988  990  993
[181] 1005 1009 1011 1017 1023 1030 1044 1047 1053 1056 1059 1080 1087 1089 1092
[196] 1107 1116 1125 1128 1134 1140 1146 1149 1158 1163 1169 1170 1171 1173 1176
[211] 1180 1182 1185 1188 1191 1197 1200 1201 1205 1206 1218 1221 1230 1233 1246
[226] 1263 1266 1275 1290 1296 1320 1323 1325 1338 1347 1353 1389 1397 1401 1403
[241] 1404 1413 1416 1431 1446 1453 1458 1474 1475 1485 1512 1518 1524 1527 1530
[256] 1536 1548 1557 1563 1572 1574 1613 1614 1620 1627 1637 1642 1644 1652 1653
[271] 1683 1687 1688 1700
  [1] "a/g" "c/t" "t/c" "t/c" "t/c" "c/a" "t/c" "c/t" "a/g" "c/t" "g/t" "c/a"
 [13] "a/g" "a/g" "a/g" "c/t" "a/g" "g/a" "c/a" "a/g" "a/g" "a/g" "a/c" "t/c"
 [25] "t/c" "t/a" "a/g" "t/c" "g/a" "c/t" "g/a" "a/g" "g/a" "t/c" "c/t" "g/a"
 [37] "a/g" "a/g" "a/g" "g/a" "a/t" "t/c" "t/g" "c/a" "a/g" "g/a" "g/a" "a/c"
 [49] "t/c" "t/c" "c/t" "g/a" "g/a" "a/g" "a/g" "g/a" "a/g" "a/g" "c/a" "g/a"
 [61] "t/c" "g/a" "g/a" "t/c" "g/a" "a/g" "g/t" "t/a" "a/g" "a/t" "g/a" "g/a"
 [73] "t/a" "c/a" "t/c" "t/c" "g/a" "c/a" "a/c" "c/t" "a/c" "a/c" "t/c" "g/a"
 [85] "a/c" "a/t" "t/c" "g/a" "c/t" "a/t" "t/c" "g/a" "c/a" "g/a" "t/c" "t/c"
 [97] "g/a" "g/a" "a/g" "c/t" "g/a" "g/a" "g/a" "a/g" "c/t" "c/t" "a/t" "g/t"
[109] "c/t" "a/g" "t/c" "t/c" "g/a" "a/t" "g/a" "g/a" "g/a" "a/g" "g/t" "a/g"
[121] "a/g" "t/c" "c/t" "g/a" "a/g" "t/c" "g/a" "t/c" "a/g" "t/a" "g/a" "g/a"
[133] "t/g" "a/g" "g/a" "g/a" "t/c" "t/c" "c/t" "t/c" "a/c" "g/t" "a/g" "c/a"
[145] "a/t" "a/g" "t/c" "g/a" "t/c" "c/a" "c/t" "a/g" "a/g" "g/a" "g/a" "g/a"
[157] "g/a" "g/a" "a/c" "c/a" "g/a" "t/c" "c/t" "t/c" "c/t" "t/c" "c/t" "a/g"
[169] "t/a" "t/c" "g/a" "c/t" "t/c" "c/t" "g/a" "a/g" "a/c" "c/t" "g/a" "a/g"
[181] "g/a" "c/a" "g/a" "a/g" "g/t" "a/g" "c/t" "c/t" "c/t" "a/g" "t/c" "g/a"
[193] "g/a" "a/g" "c/t" "c/t" "a/g" "g/a" "c/a" "a/g" "a/t" "t/c" "t/c" "t/a"
[205] "c/a" "c/t" "c/a" "g/a" "c/t" "a/g" "a/g" "c/t" "g/a" "a/g" "g/a" "g/a"
[217] "a/g" "a/g" "a/g" "g/a" "g/a" "a/g" "a/g" "c/t" "t/a" "a/g" "t/c" "c/t"
[229] "a/g" "t/c" "c/t" "g/a" "a/g" "c/t" "c/t" "t/c" "g/a" "g/a" "a/t" "g/a"
[241] "g/a" "g/a" "g/a" "c/t" "c/t" "a/g" "c/t" "g/a" "c/t" "g/a" "t/c" "a/g"
[253] "a/g" "c/t" "a/g" "a/g" "c/t" "a/g" "t/c" "g/a" "c/t" "t/c" "a/g" "c/t"
[265] "c/t" "t/c" "c/t" "g/t" "t/c" "c/t" "g/a" "a/g" "a/g" "g/a"
  [1] "7.a/g"    "12.c/t"   "31.t/c"   "32.t/c"   "36.t/c"   "37.c/a"  
  [7] "44.t/c"   "45.c/t"   "52.a/g"   "60.c/t"   "62.g/t"   "72.c/a"  
 [13] "73.a/g"   "78.a/g"   "96.a/g"   "99.c/t"   "105.a/g"  "108.g/a" 
 [19] "121.c/a"  "128.a/g"  "140.a/g"  "146.a/g"  "147.a/c"  "150.t/c" 
 [25] "165.t/c"  "177.t/a"  "181.a/g"  "187.t/c"  "190.g/a"  "191.c/t" 
 [31] "193.g/a"  "196.a/g"  "197.g/a"  "210.t/c"  "216.c/t"  "218.g/a" 
 [37] "219.a/g"  "232.a/g"  "234.a/g"  "246.g/a"  "249.a/t"  "255.t/c" 
 [43] "273.t/g"  "286.c/a"  "289.a/g"  "295.g/a"  "310.g/a"  "323.a/c" 
 [49] "327.t/c"  "328.t/c"  "365.c/t"  "402.g/a"  "405.g/a"  "412.a/g" 
 [55] "417.a/g"  "418.g/a"  "419.a/g"  "424.a/g"  "438.c/a"  "439.g/a" 
 [61] "441.t/c"  "444.g/a"  "445.g/a"  "447.t/c"  "450.g/a"  "451.a/g" 
 [67] "460.g/t"  "462.t/a"  "466.a/g"  "467.a/t"  "472.g/a"  "478.g/a" 
 [73] "479.t/a"  "483.c/a"  "499.t/c"  "508.t/c"  "510.g/a"  "511.c/a" 
 [79] "512.a/c"  "513.c/t"  "514.a/c"  "516.a/c"  "518.t/c"  "520.g/a" 
 [85] "522.a/c"  "524.a/t"  "531.t/c"  "537.g/a"  "538.c/t"  "547.a/t" 
 [91] "549.t/c"  "563.g/a"  "564.c/a"  "571.g/a"  "577.t/c"  "585.t/c" 
 [97] "591.g/a"  "603.g/a"  "604.a/g"  "606.c/t"  "609.g/a"  "614.g/a" 
[103] "616.g/a"  "622.a/g"  "623.c/t"  "624.c/t"  "625.a/t"  "626.g/t" 
[109] "628.c/t"  "630.a/g"  "635.t/c"  "636.t/c"  "638.g/a"  "642.a/t" 
[115] "648.g/a"  "650.g/a"  "652.g/a"  "669.a/g"  "671.g/t"  "672.a/g" 
[121] "687.a/g"  "689.t/c"  "690.c/t"  "693.g/a"  "694.a/g"  "696.t/c" 
[127] "702.g/a"  "705.t/c"  "706.a/g"  "712.t/a"  "714.g/a"  "715.g/a" 
[133] "716.t/g"  "717.a/g"  "721.g/a"  "722.g/a"  "723.t/c"  "727.t/c" 
[139] "729.c/t"  "732.t/c"  "733.a/c"  "734.g/t"  "735.a/g"  "744.c/a" 
[145] "754.a/t"  "759.a/g"  "778.t/c"  "780.g/a"  "786.t/c"  "811.c/a" 
[151] "822.c/t"  "831.a/g"  "833.a/g"  "837.g/a"  "854.g/a"  "855.g/a" 
[157] "859.g/a"  "872.g/a"  "875.a/c"  "876.c/a"  "881.g/a"  "882.t/c" 
[163] "884.c/t"  "885.t/c"  "894.c/t"  "903.t/c"  "913.c/t"  "920.a/g" 
[169] "927.t/a"  "930.t/c"  "945.g/a"  "948.c/t"  "954.t/c"  "963.c/t" 
[175] "973.g/a"  "981.a/g"  "983.a/c"  "988.c/t"  "990.g/a"  "993.a/g" 
[181] "1005.g/a" "1009.c/a" "1011.g/a" "1017.a/g" "1023.g/t" "1030.a/g"
[187] "1044.c/t" "1047.c/t" "1053.c/t" "1056.a/g" "1059.t/c" "1080.g/a"
[193] "1087.g/a" "1089.a/g" "1092.c/t" "1107.c/t" "1116.a/g" "1125.g/a"
[199] "1128.c/a" "1134.a/g" "1140.a/t" "1146.t/c" "1149.t/c" "1158.t/a"
[205] "1163.c/a" "1169.c/t" "1170.c/a" "1171.g/a" "1173.c/t" "1176.a/g"
[211] "1180.a/g" "1182.c/t" "1185.g/a" "1188.a/g" "1191.g/a" "1197.g/a"
[217] "1200.a/g" "1201.a/g" "1205.a/g" "1206.g/a" "1218.g/a" "1221.a/g"
[223] "1230.a/g" "1233.c/t" "1246.t/a" "1263.a/g" "1266.t/c" "1275.c/t"
[229] "1290.a/g" "1296.t/c" "1320.c/t" "1323.g/a" "1325.a/g" "1338.c/t"
[235] "1347.c/t" "1353.t/c" "1389.g/a" "1397.g/a" "1401.a/t" "1403.g/a"
[241] "1404.g/a" "1413.g/a" "1416.g/a" "1431.c/t" "1446.c/t" "1453.a/g"
[247] "1458.c/t" "1474.g/a" "1475.c/t" "1485.g/a" "1512.t/c" "1518.a/g"
[253] "1524.a/g" "1527.c/t" "1530.a/g" "1536.a/g" "1548.c/t" "1557.a/g"
[259] "1563.t/c" "1572.g/a" "1574.c/t" "1613.t/c" "1614.a/g" "1620.c/t"
[265] "1627.c/t" "1637.t/c" "1642.c/t" "1644.g/t" "1652.t/c" "1653.c/t"
[271] "1683.g/a" "1687.a/g" "1688.a/g" "1700.g/a"

adegenet documentation built on July 18, 2021, 1:06 a.m.