In BMasinde/bioinfo:

library(seqinr)
library(knitr)
library(kableExtra)

Question 1: Hardy Weinberg Equilibrium

Question 1.1

The initial allele frequencies are as follows:

$f_{0}(A)$ = p - Dominant

$f_{0}(a)$ = q - Recessive

Allele frequencies at each generation are obtained by pooling the alleles from every genotype of that generation, weighted by the expected contribution of each genotype: 1 for the homozygote and $\frac{1}{2}$ for the heterozygote.

$f_{t}(A) = f_{t}(AA) + \frac{1}{2}f_{t}(Aa)$

$f_{t}(a) = f_{t}(aa) + \frac{1}{2}f_{t}(Aa)$

Punnett Square

# Punnett square: the first row holds one parent's gametes, the first column the other's
text_tbl <- data.frame(
  Alleles = c("", "A(p)", "a(q)"),
  Combination1 = c(
    "A(p)",
    "AA(p^2)",
    "Aa(pq)"
  ),
  Combination2 = c(
    "a(q)",
    "Aa(pq)",
    "aa(q^2)"
  )
)

kable(text_tbl) %>%
  kable_styling(full_width = TRUE) %>%
  column_spec(1, bold = TRUE, border_right = TRUE) %>%
  column_spec(2, bold = TRUE, border_right = TRUE) %>%
  column_spec(3, bold = TRUE, border_right = TRUE)

Genotype frequencies sum to 1; therefore, $p^2+2pq+q^2 = 1$

$f_1(AA)=p^2=f_0(A)^2$

$f_1(Aa)=pq+qp=2pq=2f_0(A)f_0(a)$

$f_1(aa)=q^2=f_0(a)^2$
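The Punnett square can also be computed numerically as an outer product of the gamete frequencies. The sketch below is only an illustration: the value `p <- 0.6` and the names `gametes` and `punnett` are assumptions chosen for the example, not values taken from the exercise.

```r
# Hypothetical starting allele frequencies (illustrative only)
p <- 0.6
q <- 1 - p

# Gamete frequencies contributed by each parent
gametes <- c(A = p, a = q)

# Each cell is the probability of one ordered pairing of gametes
punnett <- outer(gametes, gametes)

# Genotype frequencies in the next generation
f_AA <- punnett["A", "A"]                      # p^2
f_Aa <- punnett["A", "a"] + punnett["a", "A"]  # pq + qp = 2pq
f_aa <- punnett["a", "a"]                      # q^2

print(punnett)
print(c(AA = f_AA, Aa = f_Aa, aa = f_aa))
```

Because the four cells cover every possible pairing, `sum(punnett)` equals 1 for any valid `p`, which is the identity $p^2+2pq+q^2 = 1$.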

The genotype frequencies given above describe the Hardy-Weinberg equilibrium. The allele frequencies of the next generation can be calculated as follows:

$f_{1}(A)=f_{1}(AA)+\tfrac{1}{2}f_{1}(Aa)=p^{2}+pq=p(p+q)=p=f_{0}(A)$

$f_{1}(a)=f_{1}(aa)+\tfrac{1}{2}f_{1}(Aa)=q^{2}+pq=q(p+q)=q=f_{0}(a)$

From the above we can conclude that the allele frequencies remain unchanged from one generation to the next, as the Hardy-Weinberg principle states.
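The derivation can be checked numerically. The following is a minimal sketch, assuming a hypothetical starting frequency `p0 <- 0.3`; the helper names `random_mating` and `allele_freq_A` are introduced here for illustration only.

```r
# Genotype frequencies after one round of random mating
random_mating <- function(p) {
  q <- 1 - p
  c(AA = p^2, Aa = 2 * p * q, aa = q^2)
}

# Frequency of allele A: homozygotes count fully, heterozygotes count half
allele_freq_A <- function(genotypes) {
  unname(genotypes["AA"] + 0.5 * genotypes["Aa"])
}

p0 <- 0.3   # hypothetical initial frequency of A
p_t <- p0

# Iterate a few generations; the allele frequency should not move
for (generation in 1:5) {
  p_t <- allele_freq_A(random_mating(p_t))
}

print(c(initial = p0, after_5_generations = p_t))
```

Up to floating-point rounding, `p_t` stays equal to `p0`, matching $f_{1}(A)=f_{0}(A)$.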

Even with random mating, the population can deviate from Hardy-Weinberg equilibrium if any of the following is at work:

Selection
Mutation
Migration
Small population size

Question 1.2

Observed genotype counts (1000 individuals in total):

$Obs(AA) = 357$

$Obs(Aa) = 485$

$Obs(aa) = 158$

Hence from the above we can derive that

$p = \frac{2(357) + 485}{2(1000)} \approx 0.6$

$q = 1 - p \approx 0.4$

$Exp(AA) = 360$

$Exp(Aa) = 480$

$Exp(aa) = 160$

$H_{0}$: The population is in HW equilibrium

$H_{a}$: The population is not in HW equilibrium

Here we take $\alpha=0.05$

Pearson's chi-squared test states:

$\chi^{2}= \sum (O - E)^2/E$

$= 0.025 + 0.052 + 0.025$

$=0.102$

The number of degrees of freedom is 1 (three genotype classes, minus one, minus one for the allele frequency estimated from the data).

The critical value at the 5% significance level for 1 degree of freedom is 3.84. The calculated $\chi^2$ value of 0.102 is less than this tabulated value. Hence we fail to reject the null hypothesis that the population is in Hardy-Weinberg equilibrium.
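The test can be reproduced in base R. This is a sketch under one stated assumption: the estimated allele frequency is rounded to one decimal place, as in the text, so that the expected counts come out as 360, 480 and 160. The variable names are illustrative.

```r
# Observed genotype counts from the question
observed <- c(AA = 357, Aa = 485, aa = 158)
n <- sum(observed)

# Allele frequency of A estimated from the counts
p_hat <- unname((2 * observed["AA"] + observed["Aa"]) / (2 * n))

# Assumption: round to one decimal, as the text does (p = 0.6, q = 0.4)
p <- round(p_hat, 1)
q <- 1 - p

# Expected counts under Hardy-Weinberg equilibrium
expected <- n * c(AA = p^2, Aa = 2 * p * q, aa = q^2)

# Pearson chi-squared statistic
chi_sq <- sum((observed - expected)^2 / expected)

# 1 degree of freedom: 3 classes - 1 - 1 estimated parameter
critical <- qchisq(0.95, df = 1)
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)
reject_h0 <- chi_sq > critical

print(round(c(chi_sq = chi_sq, critical = critical), 3))
print(reject_h0)
```

The statistic is computed by hand rather than with `chisq.test(observed, p = expected / n)`, because that call would use 2 degrees of freedom: it has no way of knowing that the allele frequency was estimated from the same data.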

Question 2: Exploring a genomic sequence

2.1: Protein products of the CDS

The protein product is: RecQ type DNA helicase

RecQ DNA helicase functions during DNA replication. It acts during the S phase, when chromosomes are replicated, and helps unwind the paired DNA strands. In eukaryotes, DNA replication does not proceed normally in the absence of this protein. The protein can also help repair damage caused by replication errors.

2.2: The first four Amino Acids.

The first four Amino acids are:

Methionine
Valine
Valine
Alanine

2.3:

The nucleotide sequence of the coding strand that corresponds to these amino acids was saved as a FASTA format file, "nucleotideseq2.3.FASTA".

2.4

The coding-strand sequence we obtained differs from the nucleotide sequence provided. When we reverse complemented the provided sequence, however, we obtained a sequence that translates to the same amino acids. We therefore conclude that the FASTA file we got from GenBank is the template strand of the DNA, so reverse complementing it gives the correct coding sequence. This is saved as "nucleotiderev2.4.FASTA".
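The reverse-complement step can be sketched in base R. The 12-nucleotide fragment below is hypothetical: it was constructed so that its reverse complement encodes Met-Val-Val-Ala and is not the actual GenBank sequence. The small `codon_table` covers only the codons used in this example. For full sequences, seqinr's `comp()` and `translate()` can do the same job.

```r
# Reverse complement of a DNA string (base R only)
reverse_complement <- function(dna) {
  bases <- strsplit(toupper(dna), "")[[1]]
  paste(rev(chartr("ACGT", "TGCA", bases)), collapse = "")
}

# Hypothetical fragment standing in for the downloaded (template-strand) sequence
downloaded <- "AGCAACAACCAT"

# Coding strand, read 5' to 3'
coding <- reverse_complement(downloaded)

# Split the coding strand into codons
starts <- seq(1, nchar(coding) - 2, by = 3)
codons <- substring(coding, starts, starts + 2)

# Illustrative lookup containing only the codons in this example
codon_table <- c(ATG = "Met", GTT = "Val", GCT = "Ala")
amino_acids <- unname(codon_table[codons])

print(coding)
print(amino_acids)
```

Translating `downloaded` directly would not start with a methionine codon, which mirrors the observation that only the reverse complement gives the expected amino acids.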

2.5

The genomic sequence lies on chromosome I, at positions 1-5662. The nucleotide sequence contains no stop codon. We confirmed this by back-translating the protein sequence to a nucleotide sequence: the back-translation was 5661 nucleotides long, only one nucleotide shorter than the genomic sequence.
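A check for in-frame stop codons can be sketched as follows. The helper `find_inframe_stops` and both test sequences are hypothetical, and the standard genetic code (stop codons TAA, TAG, TGA) is assumed. The last lines redo the length arithmetic from the text.

```r
# Return the codon positions (1-based) of in-frame stop codons
find_inframe_stops <- function(dna) {
  dna <- toupper(dna)
  n_codons <- nchar(dna) %/% 3
  starts <- seq(1, by = 3, length.out = n_codons)
  codons <- substring(dna, starts, starts + 2)
  which(codons %in% c("TAA", "TAG", "TGA"))  # standard genetic code assumed
}

# Hypothetical sequences for illustration
no_stop   <- "ATGGTTGTTGCT"
with_stop <- "ATGGTTGTTGCTTAA"

stops_a <- find_inframe_stops(no_stop)    # no stop codon in frame
stops_b <- find_inframe_stops(with_stop)  # stop at the 5th codon

# Length arithmetic from the text
n_residues <- 5661 / 3   # residues implied by the back-translation
leftover <- 5662 %% 3    # nucleotides left over after whole codons

print(stops_a)
print(stops_b)
print(c(n_residues = n_residues, leftover = leftover))
```

A back-translation of 5661 nucleotides corresponds to 1887 whole codons, and the 5662-nucleotide range leaves exactly one nucleotide over, which matches the one-letter difference reported above.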

Question 3.

3.1: C. elegans

Caenorhabditis elegans is a nematode. It belongs to the phylum Nematoda, whose members are unsegmented worms with a long cylindrical body tapered at the ends. The phylum includes the roundworms and the threadworms.

C. elegans is a non-hazardous, non-infectious, non-pathogenic, non-parasitic organism. It lives in the soil in most parts of the world and feeds on microbes such as bacteria.

C. elegans is a eukaryote with a relatively simple body structure and organization. Despite this, it shares many of the essential characteristics that are central problems of human biology. The worm is conceived as a single cell and undergoes complex development. It has a nervous system, exhibits behaviour and is capable of rudimentary learning. It produces sperm and eggs, mates and reproduces. After reproduction, it gradually ages and dies.

Importance of C. elegans to the scientific community.

Because of their small size, the worms can be handled like microorganisms (i.e. grown on Petri plates). All of their cells are visible under a microscope because the worm is transparent, and the average life span is 2-3 weeks. Considering its body structure and development, C. elegans is therefore an ideal compromise between complexity and tractability for studying genes. It was proposed as a model organism for the study of neural development in animals (Sydney Brenner).

3.2

The figure below shows our results from the nucleotide BLAST tool.

knitr::include_graphics("c_elagan_blastn.png")

3.3

The database genomic sequence runs in the opposite direction to the query. After taking the reverse complement, we observed that the two sequences run in the same direction. The image below shows the resulting alignment.

knitr::include_graphics("c_elagans_blastn2.png")

3.4 Chromosome and position.

The query sequence is found on chromosome V, at positions 6,936 to 7,818, in the gene ife-3.

3.5

The exons were extracted, and the combined nucleotide sequence was passed to transeq (with frame = 6). This gave a protein sequence starting with M (encoded by ATG). The protein sequence is saved as the file "emboss3_5.fa". A BLAST search of this protein sequence returned the protein name "IFE4". From WormBase we obtained the function of the gene: it codes for the protein "ife4", which is used in translation initiation.

3.6

ife-3 encodes one of five C. elegans homologs of the mRNA cap-binding protein eIF4E; by homology, IFE-3 is predicted to bind capped mRNA and mediate its recruitment to ribosomes during translation initiation; in vitro, IFE-3 binds a monomethylated guanosine cap structure but does not bind a trimethylated guanosine cap, which suggests that IFE-3 likely mediates translation of those mRNAs that do not contain a spliced-leader sequence; of the C. elegans eIF4E isoforms, IFE-3 is the most similar to human eIF4E and is the only isoform required for viability (homozygous ife-3 mutant embryos arrest in the early division stages of embryogenesis); IFE-3 is enriched in the adult gonad.

BMasinde/bioinfo documentation built on May 5, 2019, 7:06 a.m.
