Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/segmentExpression2CopyNumber.R

Maps single cell expression profiles to copy number profiles.

1 2 | ```
segmentExpression2CopyNumber(eps, gpc, cn, seed=0, outF=NULL, maxPloidy=8,
nCores=2, stdOUT="log.applyAR2seg")
``` |

`eps` |
Segment-by-cell matrix of expression. |

`gpc` |
Number of genes expressed per cell. |

`cn` |
Average copy number across cells for each segment (i.e. row in eps). |

`seed` |
The fraction of entries in a-priori segment-by-cell copy number matrix to be used as seed for association rule mining. |

`outF` |
Output file prefix in which to print intermediary heatmaps and histograms, or NULL (default) if no print. |

`maxPloidy` |
The maximum ploidy to accept as solution. |

`nCores` |
The numbers of threads used. |

`stdOUT` |
Log-file to which standard output is redirected during parallel processing. |

Let S := { *S_1, S_2, ... S_n* } be the set of *n* genomic segments obtained from bulk DNA-sequencing. Let *E_{ij}* and *G_{ij}* be the average number of UMIs and the number of expressed genes per segment *i* per cell *j*. The segment-by-cell expression matrix is first normalized by gene coverage. For each *x \in S*, the linear regression model:

*E_{x*} \sim ∑_{i \in S}G_{i*} *

, fits the average segment expression per cell onto the cell's overall gene coverage. The model's residuals *R_{ij}* reflect inter-cell differences in expression per segment that cannot be explained by differential gene coverage per cell. A first approximation of the segment-by-cell copy number matrix CN is given by:

*CN_{ij} := R_{ij} * (cn_i / \bar{R_{i*}} )*

, where *cn_i* is the population-average copy number of segment *i* derived from DNA-seq. Above transformation of *E_{ij}* into *CN_{ij}* is in essence a numerical optimization, shifting the distribution of each segment to the average value expected from bulk DNA-seq.

Let *x' \in CN* be the measured copy number of a given segment-cell pair, and *x* its corresponding true copy number state. The probability of assigning copy number *x* to a cell *j* at locus *i* depends on:

**A. Cell j's read count at locus i**, calculated conditional on the measurement

, the cumulative confidence of the rules in support of

We first obtain a seed of cell-segment pairs by assigning a-priori copy number states only when

*argmax_{x \in [1,8]} P_A(x|x') + P_B(x) *

Segment-by-cell matrix of copy number states.

Noemi Andor

Andor, N.*, Lau, B.*, Catalanotti, C., Kumar, V., Sathe, A., Belhocine, K., Wheeler, T., et al. (2018) Joint single cell DNA-Seq and RNA-Seq of gastric cancer reveals subclonal signatures of genomic instability and gene expression. doi: https://doi.org/10.1101/445932

Borgelt C & Kruse R. (2002) Induction of Association Rules: Apriori Implementation.

1 2 3 4 5 6 7 8 9 10 11 | ```
##Calculate number of genes expressed per each cell:
data(epg)
gpc = apply(epg>0, 2, sum)
##Call function:
data(eps)
data(segments)
cn=segments[rownames(eps),"CN_Estimate"]
cnps = segmentExpression2CopyNumber(eps, gpc, cn, seed=0.5, nCores=2, stdOUT="log")
head(eps[,1:5]); ##Expression of first five cells
head(cnps[,1:5]); ##Copy number of first five cells
``` |

noemiandor/liayson documentation built on Dec. 13, 2018, 10:24 p.m.

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.