vcf2df: Read a vcf file, output the corresponding dataframe.

In GaoGN517/689_SNP_FastImpute: SNPFastImpute: Perform fast missing value imputation for single nucleotide polymorphism data.

View source: R/vcf2df.R

Read a vcf file, output the corresponding dataframe.

vcf2df(vcf_df)

filename

The vcf file to read. It is converted into the dataframe from which the xgboost data structure is generated.

The input vcf file comes directly from raw sequencing data. It contains (n + 9) columns (9 columns of information about each SNP, followed by the SNP values for the n samples) and p rows (SNP positions).

The information for the first sample starts at the 10th column, so we first remove the first 9 columns.

For our imputation, we need the p SNPs as features (columns), n samples as rows, so we need to transpose the dataframe.
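The two steps above (dropping the 9 information columns, then transposing) can be sketched on a toy table. This is only an illustration and not the package's actual implementation: the table `toy_vcf`, its column names, and the use of "./." for a missing genotype are assumptions made for the example.

```r
# Toy VCF-like table (hypothetical): 9 information columns followed by
# n = 3 sample columns, with p = 2 rows (SNP positions).
toy_vcf <- data.frame(
  CHROM = c("1", "1"), POS = c(101L, 202L), ID = c("rs1", "rs2"),
  REF = c("A", "G"), ALT = c("T", "C"), QUAL = c(".", "."),
  FILTER = c("PASS", "PASS"), INFO = c(".", "."), FORMAT = c("GT", "GT"),
  S1 = c("0/0", "1/1"), S2 = c("0/1", "./."), S3 = c("1/0", "0/0"),
  stringsAsFactors = FALSE
)

# Remove the first 9 columns, keeping only the per-sample genotype values.
genotypes <- toy_vcf[, -(1:9), drop = FALSE]

# Transpose so that the n samples become rows and the p SNPs become columns.
samples_by_snp <- as.data.frame(t(genotypes), stringsAsFactors = FALSE)
colnames(samples_by_snp) <- toy_vcf$ID

dim(samples_by_snp)  # n rows, p columns
```

Here the (n + 9) x p layout of the input becomes an n x p layout, with one row per sample and one column per SNP position.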

In the input dataset, each SNP position has 2 values, one for each of the two alleles: 0 (wild type) or 1 (mutant type). So the values can be "0/0", "0/1", "1/0", "1/1". Some of the values might be missing.

We sum up the two values at each position to one value to represent the corresponding SNP type.

In the output data, each entry is the corresponding SNP type, that is, the sum of the two allele values: (1) 0: both alleles are wild type; (2) 1: one of the alleles is a mutation, the other is wild type; (3) 2: both alleles are mutations; (4) NA: the SNP type of at least one of the two alleles is missing. We need to predict the value for this position.
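The summation of the two allele values can be sketched as follows, taking 0 for wild type and 1 for mutant type as stated above. The helper `genotype_to_count` is hypothetical and is not a function of the package; treating "." as the marker for a missing allele is likewise an assumption.

```r
# Hypothetical helper: convert one genotype string such as "0/1" into the
# sum of its two allele values, or NA if either allele is missing.
genotype_to_count <- function(gt) {
  alleles <- strsplit(gt, "/", fixed = TRUE)[[1]]
  # A genotype that does not split into exactly two alleles is treated as missing.
  if (length(alleles) != 2) return(NA_integer_)
  # Anything other than "0" or "1" (for example ".") is treated as missing.
  values <- ifelse(alleles %in% c("0", "1"), alleles, NA_character_)
  # sum() propagates NA, so one missing allele makes the whole entry NA.
  sum(as.integer(values))
}

calls <- c("0/0", "0/1", "1/0", "1/1", "./.", "./1")
counts <- vapply(calls, genotype_to_count, integer(1), USE.NAMES = FALSE)
counts
```

With this coding, "0/1" and "1/0" both map to 1, so the order of the two alleles does not matter in the output.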

A dataframe from which the xgboost data structure is generated.

data(vcf_df)
output_df <- vcf2df(vcf_df)
## This dataset has 112 samples and 338 SNP positions.
## The original file has 121 columns (9 information columns + 112 samples) and 338 rows.

## The output should be a dataset with 112 rows and 338 columns.

GaoGN517/689_SNP_FastImpute documentation built on Jan. 2, 2020, 11:44 a.m.
