assoctuple: Searches for associations of a tuple of alignment positions...


Description

Searches for associations of a tuple of nucleotide or amino acid sequence alignment positions with one or more features.

Usage

assoctuple(path_to_file_sequence_alignment, path_to_file_assocpoint_csv_result, threshold,
	min_number_of_elements_in_tuple, max_number_of_elements_in_tuple,
	save_name_csv, column_of_feature, column_of_position, column_of_p_values,
	column_of_aa, A11, A12, A21, A22, B11, B12, B21, B22, one_feature, feature)

Arguments

path_to_file_sequence_alignment

path to the FASTA file containing the sequence alignment. For the expected format, see the example file.

path_to_file_assocpoint_csv_result

path to the CSV file with the results of SeqFeatR's assocpoint.

threshold

p-value threshold; only sequence alignment positions with a p-value below this value are considered.

min_number_of_elements_in_tuple

minimum number of positions in a tuple.

max_number_of_elements_in_tuple

maximum number of positions in a tuple.

save_name_csv

name of the file to which the results are saved in CSV format.

column_of_feature

number of the column in which the feature to be analyzed is located.

column_of_position

number of the column in which the sequence position is located.

column_of_p_values

number of the column from which the p-values are taken. See Details.

column_of_aa

number of the column from which the amino acids are taken. See Details.

A11

position of the start of the first HLA A allele in the header line of the FASTA file.

A12

position of the end of the first HLA A allele in the header line of the FASTA file.

A21

position of the start of the second HLA A allele in the header line of the FASTA file.

A22

position of the end of the second HLA A allele in the header line of the FASTA file.

B11

position of the start of the first HLA B allele in the header line of the FASTA file.

B12

position of the end of the first HLA B allele in the header line of the FASTA file.

B21

position of the start of the second HLA B allele in the header line of the FASTA file.

B22

position of the end of the second HLA B allele in the header line of the FASTA file.

one_feature

logical; TRUE if there is only one feature that is not an HLA type. See Details.

feature

identifier of the feature value to be analyzed when one_feature = TRUE. See Details.

Details

For each tuple of sequence alignment positions, Fisher's exact test is applied to a 2-by-2 contingency table of amino acid tuple (or nucleotide tuple) versus feature. The resulting p-values are returned in a table.
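The test for a single tuple can be sketched in base R. This is an illustration, not SeqFeatR's implementation: it assumes the 2-by-2 table cross-classifies sequences by whether they carry one specific residue combination at the tuple positions and by a binary feature. The toy sequences, the feature vector and the helper tuple_table are hypothetical; fisher.test is the standard function from the stats package.

```r
# Hypothetical toy alignment (one string per sequence, equal lengths)
seqs <- c("MKVL", "MKIL", "MRVL", "MKVL", "MRIL", "MKVL")
# Hypothetical binary feature of each sequence
feat <- c("yes", "yes", "no", "yes", "no", "no")

# Hypothetical helper (assumed table layout, not SeqFeatR code):
# 2-by-2 table of "carries this residue tuple" vs. feature
tuple_table <- function(seqs, positions, residues, feat) {
  # Residues of each sequence at the tuple positions, pasted together (e.g. "KV")
  observed <- vapply(seqs, function(s)
    paste(substring(s, positions, positions), collapse = ""),
    character(1), USE.NAMES = FALSE)
  carries <- ifelse(observed == paste(residues, collapse = ""), "present", "absent")
  table(tuple   = factor(carries, levels = c("present", "absent")),
        feature = factor(feat, levels = c("yes", "no")))
}

# Tuple of positions 2 and 3 with residues K and V
tab <- tuple_table(seqs, positions = c(2, 3), residues = c("K", "V"), feat = feat)
tab
fisher.test(tab)$p.value
```

For this deliberately balanced toy table (2 and 1 in the first row, 1 and 2 in the second) the two-sided p-value is 1, i.e. no evidence of association.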

As input, the result of SeqFeatR's assocpoint can be used. Alternatively, a user-generated CSV file works as well, as long as at least one column gives the sequence position and another the p-value at that position. In both cases, the FASTA file from which the CSV file originated is also required.

assoctuple considers only those sequence positions whose p-value is lower than the given threshold (threshold). Be aware that for large datasets the computation time can be long, and testing every position against every other position will almost certainly produce low-quality results: the number of tests grows rapidly with the number of positions, so chance associations become hard to tell apart from real ones.
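The effect of the threshold and of the tuple size limits can be sketched in base R. The data frame res, its column names and the helper n_tuples are hypothetical, and this is not assoctuple's internal code; it only uses combn() to enumerate candidate pairs and choose() to count how many tuples a run would have to test.

```r
# Hypothetical assocpoint-like results: one row per alignment position
res <- data.frame(position = c(12, 45, 78, 101, 150, 233),
                  p_value  = c(0.01, 0.30, 0.15, 0.04, 0.90, 0.19))
threshold <- 0.2

# Keep only positions whose p-value is strictly below the threshold
selected <- res$position[res$p_value < threshold]

# All candidate tuples of size 2 (each column of 'pairs' is one tuple)
pairs <- combn(selected, 2)

# Hypothetical helper: number of tuples with sizes k_min..k_max among n positions
n_tuples <- function(n, k_min, k_max) sum(choose(n, k_min:k_max))

n_tuples(length(selected), 2, 2)  # 4 selected positions give 6 pairs
n_tuples(100, 2, 3)               # 100 positions, tuple sizes 2 to 3
```

With 100 positions and tuple sizes 2 to 3 there are already 4950 + 161700 = 166650 tuples, which illustrates why a strict threshold matters.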

Please also be aware that assoctuple uses the positions exactly as given in your CSV file. If these are NOT the correct alignment positions, for example because empty or nearly empty alignment positions were removed, correct the CSV file before starting.
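One way to correct such positions is sketched below in base R. It assumes that the CSV positions were counted in a reduced alignment from which gap-rich columns had been removed, and that gaps are written as "-"; the toy alignment and the 0.5 gap-fraction cutoff are hypothetical and stand in for whatever filter was actually applied.

```r
# Hypothetical full alignment (rows = sequences, gaps written as "-")
seqs <- c("AC-G-T", "AC-GAT", "A--G-T")
mat <- do.call(rbind, strsplit(seqs, ""))

# Fraction of gaps per alignment column
gap_frac <- colMeans(mat == "-")
# Hypothetical filter: columns with at least 50% gaps were removed
kept <- gap_frac < 0.5

# Positions as reported in the CSV, counted in the reduced alignment
reduced_pos <- c(3, 4)
# Translate them back to column numbers of the full alignment
original_pos <- which(kept)[reduced_pos]
original_pos
```

Here columns 3 and 5 are dropped, so reduced positions 3 and 4 correspond to columns 4 and 6 of the full alignment.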

Use the same FASTA file that you used for assocpoint!

The sequence positions to be included in this analysis are normally chosen based on the corrected p-values, but any other column with values between 0 and 1 can be used, even one you added yourself. The tuple size can range from 2 up to the number of rows (= number of alignment positions) in the CSV input file. The input sequence alignment may consist either of DNA sequences (switch dna = TRUE) or of amino acid sequences (dna = FALSE). Undetermined nucleotides or amino acids have to be indicated by the letter "X".
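Replacing undetermined characters by "X" before running the analysis can be sketched with gsub(). The helpers to_x_dna and to_x_aa are hypothetical, and they assume that gaps are written as "-" and that every other character outside the unambiguous alphabet (e.g. IUPAC ambiguity codes such as N, R, Y, or B, Z, "?" and "*" for proteins) should count as undetermined; adjust the character classes if your alignment uses other conventions.

```r
# Hypothetical helper for DNA: anything other than A, C, G, T or "-" becomes "X"
to_x_dna <- function(seqs) gsub("[^ACGT-]", "X", toupper(seqs))

# Hypothetical helper for proteins: keep the 20 standard amino acids and "-"
to_x_aa <- function(seqs) gsub("[^ACDEFGHIKLMNPQRSTVWY-]", "X", toupper(seqs))

to_x_dna(c("ACGTN", "ac-ry"))
to_x_aa("mkb?l")
```

The DNA call yields "ACGTX" and "AC-XX"; the protein call yields "MKXXL".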

Features may be HLA types, indicated by four blocks in the FASTA comment (header) lines. The positions of these blocks in the comment lines are defined by the parameters A11, ..., B22. For patients who are homozygous for an HLA allele, the second allele has to be given as "00" (without the double quotes). For non-HLA-type features, set one_feature = TRUE. The value of the feature (e.g. 'yes / no' or '1 / 2 / 3') must then be given at the end of each FASTA comment, separated from the preceding text by a semicolon.
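How start and end positions such as A11 and A12 pick an allele out of a header line, and how a single feature value is read after the semicolon, can be sketched with substr() and sub(). The header strings and the positions in pos are hypothetical (they are not taken from the example file), and whether the leading ">" is counted is an assumption here; this is not SeqFeatR's parser.

```r
# Hypothetical FASTA header with four HLA blocks at fixed character positions
header <- ">id01 A*02 A*24 B*07 B*35"
# Hypothetical start/end positions playing the role of A11/A12, ..., B21/B22
# (assumption: counting starts at the leading ">")
pos <- list(A1 = c(9, 10), A2 = c(14, 15), B1 = c(19, 20), B2 = c(24, 25))

# Cut out each allele between its start and end position
alleles <- vapply(pos, function(p) substr(header, p[1], p[2]), character(1))
alleles

# A homozygous patient would carry "00" as the second allele
homozygous_A <- alleles[["A2"]] == "00"

# Non-HLA feature (one_feature = TRUE): the value after the last semicolon
header2 <- ">id02 some description;yes"
feature_value <- sub(".*;", "", header2)
feature_value
```

For the first header this extracts "02", "24", "07" and "35"; for the second it returns "yes".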

Only a single feature is analyzed per run. It is selected with feature when one_feature = TRUE, or with column_of_feature when the features are HLA types.

Value

A CSV file listing the tuple positions and the p-value of their association.

Author(s)

Bettina Budeus

See Also

assocpoint

Examples

## Not run: 
library(SeqFeatR)

# Input files
fasta_input <- system.file("extdata", "Example_aa.fasta", package="SeqFeatR")
assocpoint_result <- system.file("extdata", "assocpoint_results.csv", package="SeqFeatR")

#Usage
assoctuple(
	path_to_file_sequence_alignment=fasta_input,
	path_to_file_assocpoint_csv_result=assocpoint_result,
	threshold=0.2,
	min_number_of_elements_in_tuple=2,
	max_number_of_elements_in_tuple=2,
	save_name_csv="assoctuple_result.csv",
	column_of_feature=9,
	column_of_position=1,
	column_of_p_values=9,
	column_of_aa=12,
	A11=10,
	A12=11,
	A21=13,
	A22=14,
	B11=17,
	B12=18,
	B21=20,
	B22=21,
	one_feature=FALSE,
	feature="")

## End(Not run)

SeqFeatR documentation built on May 2, 2019, 3:10 p.m.