README.md

Project: High Performance Computing

Introduction

This project implements the Gibbs sampling algorithm for bivariate distributions in Python and in other languages, and benchmarks the implementations.

Gibbs sampling is a Markov chain Monte Carlo (MCMC) algorithm for obtaining a sequence of observations that are approximately drawn from a specified multivariate probability distribution when direct sampling is difficult. It is commonly used as a means of statistical inference, especially Bayesian inference. It is a randomized algorithm (i.e. an algorithm that makes use of random numbers), and is an alternative to deterministic algorithms for statistical inference such as the expectation-maximization (EM) algorithm.
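To make the idea concrete, here is a minimal sketch of a Gibbs sampler in Python. It assumes a standard bivariate normal target with correlation rho, which is a common textbook choice and not necessarily the distribution used in gibbs.py. The function name and its parameters (n_samples, rho, burn_in, seed) are hypothetical and chosen only for illustration.

```python
import math
import random


def gibbs_bivariate_normal(n_samples, rho, burn_in=1000, seed=None):
    """Sample a standard bivariate normal with correlation rho (assumed target)."""
    rng = random.Random(seed)
    # Each full conditional is normal with variance 1 - rho^2.
    cond_sd = math.sqrt(1.0 - rho * rho)
    x, y = 0.0, 0.0  # arbitrary starting point
    samples = []
    for i in range(burn_in + n_samples):
        # Draw x from its conditional given the current y: N(rho * y, 1 - rho^2).
        x = rng.gauss(rho * y, cond_sd)
        # Draw y from its conditional given the new x: N(rho * x, 1 - rho^2).
        y = rng.gauss(rho * x, cond_sd)
        # Discard the first burn_in iterations so the chain can settle.
        if i >= burn_in:
            samples.append((x, y))
    return samples
```

Calling gibbs_bivariate_normal(10000, 0.5, seed=1) returns a list of 10000 (x, y) pairs. Each iteration only needs draws from the two one-dimensional conditionals, which is why the method is useful when sampling the joint distribution directly is difficult.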

Objective of the project

The objective of this project is to analyze the computation time of each implementation and compare their speed and efficiency. The languages used here are Python, R, C and PyPy (an alternative implementation of Python). The results give an idea of how fast each language is relative to the others.

Structure of the code

The algorithm is implemented in four variants: C (gibbs.C), Python (gibbs.py), R (gibbs.R) and Rcpp (Rcpp_Gibbs.cpp). Five shell scripts measure the run time of each implementation independently; the additional script, pypy.sh, records the time taken by the PyPy run. The five shell script files are:

1. C.sh
2. python.sh
3. R.sh
4. pypy.sh
5. Rcpp.sh

The five .sh files are run by executing a Python script. This script post-processes the output produced by the shell scripts and writes it to the file “out1.txt”. The sampled values are written to data.tab.
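The driver step described above could look roughly like the following sketch. It is not the project's actual script: the function names time_command and run_benchmarks are hypothetical, and the tab-separated "label, seconds" line format is an assumption, since the real layout of out1.txt is not documented here.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return (elapsed_seconds, stdout)."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command exits with an error.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return time.perf_counter() - start, result.stdout


def run_benchmarks(commands, out_path):
    """Time each labelled command and write one 'label<TAB>seconds' line each."""
    timings = {}
    with open(out_path, "w") as out:
        for label, cmd in commands.items():
            elapsed, _ = time_command(cmd)
            timings[label] = elapsed
            # Assumed output format; the real out1.txt may differ.
            out.write(f"{label}\t{elapsed:.3f}\n")
    return timings
```

With the shell scripts in the current directory, a call such as run_benchmarks({"C": ["sh", "C.sh"], "Python": ["sh", "python.sh"]}, "out1.txt") would time each script in turn. Note that this measures wall-clock time for the whole script, including interpreter start-up, which may or may not match how the original shell scripts report time.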

To install all the prerequisites and run the code, see the file “install.txt”, which is present in the following folder.



mathurshikhar/HPC documentation built on May 21, 2019, 12:55 p.m.