Microsatellite file conversion for known and unknown data

Description

This function converts two microsatellite data files (one with the genotypes and one with the locations) into the data format required by OriGen.

Usage

ConvertMicrosatData(DataFileName,LocationFileName)

Arguments

DataFileName

Name of the file containing the genotypes of the individuals sampled at the various locations. The columns are LocationName, LocationNumber, Locus1, Locus2, etc. Each individual takes up 2 rows (one for each allele), both with the same LocationName and LocationNumber. The value under each locus column is the length of that individual's allele. Note that unknown individuals should have location number "-1".
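As a rough illustration of the layout described above, the following sketch builds a toy genotype file in base R. All names and values here (SiteA, Unknown1, the allele lengths) are hypothetical, and the tab delimiter and header row are assumptions rather than documented requirements; the sample file MicrosatTrialDataSmall.txt shipped with the package is the authoritative reference for the format.

```r
# Hypothetical toy data: two known individuals from one site and one unknown.
# Each individual occupies two consecutive rows, one per allele.
genotypes <- data.frame(
  LocationName   = c("SiteA", "SiteA", "SiteA", "SiteA", "Unknown1", "Unknown1"),
  LocationNumber = c(1, 1, 1, 1, -1, -1),   # -1 marks an unknown individual
  Locus1         = c(120, 124, 120, 120, 124, 128),  # allele lengths
  Locus2         = c(200, 204, 204, 208, 200, 200)
)

# Write a tab-delimited text file (delimiter and header row are assumptions).
data_file <- tempfile(fileext = ".txt")
write.table(genotypes, data_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Two rows per individual, so the counts follow directly from the row counts.
n_individuals <- nrow(genotypes) / 2
n_unknown     <- sum(genotypes$LocationNumber == -1) / 2
```

Here n_individuals counts every individual in the file and n_unknown counts only those flagged with location number -1.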

LocationFileName

Space- or tab-delimited text file with the location information for the individuals. The columns are LocationName, LocationNumber, Latitude, and Longitude. Note that the first two columns must be in the same order as in DataFileName.
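A matching location file could be sketched along the same lines. The site names and coordinates below are made up for illustration, and the header row is again an assumption; LocationTrialDataSmall.txt in the package shows the actual expected layout.

```r
# Hypothetical sample sites: name, number, latitude, longitude (in that order).
locations <- data.frame(
  LocationName   = c("SiteA", "SiteB"),
  LocationNumber = c(1, 2),
  Latitude       = c(40.7, 34.1),
  Longitude      = c(-74.0, -118.2)
)

# Write a tab-delimited text file (header row is an assumption).
location_file <- tempfile(fileext = ".txt")
write.table(locations, location_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back and run a few sanity checks before passing it on.
reread <- read.table(location_file, header = TRUE, sep = "\t",
                     stringsAsFactors = FALSE)
coordinates_valid <- all(abs(reread$Latitude) <= 90) &&
  all(abs(reread$Longitude) <= 180)
numbers_unique <- !any(duplicated(reread$LocationNumber))
```

Checks like these are not part of OriGen; they are simply a cheap way to catch swapped latitude and longitude columns or duplicated location numbers before conversion.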

Value

List with the following components:

DataArray

An array giving the number of alleles grouped by sample site for each locus. The dimension of this array is [MaxAlleles,SampleSites,NumberLoci].

SampleCoordinates

This is an array which gives the longitude and latitude of each of the found sample sites. The dimension of this array is [SampleSites,2], where the second dimension represents longitude and latitude respectively.

AllelesAtLocus

This is an integer vector giving the number of alleles found at each locus.

MaxAlleles

This shows the maximum of AllelesAtLocus, i.e. the largest number of alleles found at any single locus.

SampleSites

This shows the integer number of sample sites found.

NumberLoci

This shows the integer number of loci found.

NumberUnknowns

This is an integer value showing the number of unknowns found.

UnknownDataArray

An array showing the unknown individuals' genetic data. The dimension of this array is [NumberUnknowns,2,NumberLoci].

LocationNames

This is a list of all the LocationNames (the first column of the input files).

DataFileName

This shows the inputted DataFileName.

LocationFileName

This shows the inputted LocationFileName.
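To make the dimension conventions above concrete, the following sketch builds a hand-made stand-in for part of the returned list and indexes into it. Nothing here is real output from ConvertMicrosatData: the object mock and every value in it are hypothetical, and only the component names and array shapes follow the descriptions above.

```r
# Hypothetical sizes for the stand-in object.
MaxAlleles <- 3; SampleSites <- 2; NumberLoci <- 2; NumberUnknowns <- 1

mock <- list(
  # Allele counts indexed as [allele, sample site, locus].
  DataArray = array(0L, dim = c(MaxAlleles, SampleSites, NumberLoci)),
  # One row per sample site; column 1 is longitude, column 2 is latitude.
  SampleCoordinates = matrix(c(-74.0, -118.2, 40.7, 34.1),
                             nrow = SampleSites, ncol = 2),
  AllelesAtLocus = c(3L, 2L),
  MaxAlleles = MaxAlleles,
  SampleSites = SampleSites,
  NumberLoci = NumberLoci,
  NumberUnknowns = NumberUnknowns,
  # Unknown genotypes indexed as [individual, allele copy, locus].
  UnknownDataArray = array(0L, dim = c(NumberUnknowns, 2, NumberLoci))
)

# Made-up counts of the three alleles at site 1, locus 1.
mock$DataArray[, 1, 1] <- c(3L, 1L, 0L)

# Empirical allele frequencies at that site and locus.
site1_locus1_freq <- mock$DataArray[, 1, 1] / sum(mock$DataArray[, 1, 1])

# Longitude and latitude of the first sample site.
site1_lon_lat <- mock$SampleCoordinates[1, ]
```

Running str() on the real return value, as in the Examples section, is the simplest way to confirm these shapes for actual data.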

Author(s)

John Michael Ranola, John Novembre, and Kenneth Lange

References

Ranola J, Novembre J, Lange K (2014) Fast Spatial Ancestry via Flexible Allele Frequency Surfaces. Bioinformatics, in press.

See Also

ConvertMicrosatData for converting Microsatellite data files into a format appropriate for analysis,

ConvertPEDData for converting Plink PED files into a format appropriate for analysis,

FitMultinomialModel for fitting allele surfaces to the converted Microsatellite data,

PlotAlleleFrequencySurface for a quick way to plot the resulting allele frequency surfaces from FitOriGenModel or FitMultinomialModel.

Examples

#Note that sample files MicrosatTrialDataSmall.txt and 
#LocationTrialDataSmall.txt are included in data for formatting.
#Note that this was done to allow inclusion of the test data in the package.

## Not run:
MicrosatDataSmall=ConvertMicrosatData("MicrosatTrialDataSmall.txt",
		"LocationTrialDataSmall.txt")
str(MicrosatDataSmall)
MicrosatAnalysisSmall=FitMultinomialModel(MicrosatDataSmall$DataArray,
		MicrosatDataSmall$SampleCoordinates,MaxGridLength=20)
str(MicrosatAnalysisSmall)
PlotAlleleFrequencySurface(MicrosatAnalysisSmall)
## End(Not run)
