LIN Bingqing, ZHUANG Zefan. scSimseq: A Nonparametric Simulation Method for Single-Cell RNA Data[J]. Chinese Journal of Applied Probability and Statistics, 2023, 39(6): 813-831. DOI: 10.3969/j.issn.1001-4268.2023.06.003

scSimseq: A Nonparametric Simulation Method for Single-Cell RNA Data

Abstract: With the wide use of next-generation sequencing technology, single-cell RNA data have gradually become a mainstream object of research. However, obtaining single-cell RNA data directly from organisms is often costly, so how to obtain such data simply and quickly is an important problem. To meet the needs of comparative experiments, a single-cell RNA data simulation method usually has to do more than produce simulated data whose statistics are close to those of the original data: the simulated data must also retain the genes and cell samples of the original data. Here we introduce a data-based simulation method that retains the genes and cell samples of the original data, simulates single-cell RNA data at low cost, and ensures that the simulated data resemble the original data in most characteristics. Extensive numerical experiments show that the proposed method outperforms other simulation methods in the distribution of gene expression, including the dispersion of gene expression, the proportion of zero expression, and expression outliers, and that its simulated data are closer to real data.
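The abstract compares simulators on the dispersion of gene expression, the proportion of zero expression, and expression outliers. As an illustration only, the following Python sketch computes per-gene versions of these statistics for a genes × cells count matrix and measures how far a simulated matrix is from the real one. The function names `gene_summary` and `compare`, the variance-to-mean definition of dispersion, and the Q3 + k·IQR outlier rule are assumptions made for this sketch; they are not the paper's definitions or the scSimseq implementation.

```python
import numpy as np


def gene_summary(counts, k=1.5):
    """Per-gene statistics for a genes x cells count matrix (hypothetical helper).

    Assumptions: dispersion = sample variance / mean, and an outlier is a
    count above Q3 + k * IQR of that gene's counts across cells.
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=1)
    var = counts.var(axis=1, ddof=1)  # sample variance across cells (needs >= 2 cells)
    # Genes with zero mean get NaN dispersion; suppress the 0/0 warning.
    with np.errstate(divide="ignore", invalid="ignore"):
        dispersion = np.where(mean > 0, var / mean, np.nan)
    zero_prop = (counts == 0).mean(axis=1)  # fraction of cells with zero count
    q1, q3 = np.percentile(counts, [25, 75], axis=1)
    upper = q3 + k * (q3 - q1)
    outliers = (counts > upper[:, None]).sum(axis=1)  # outlier count per gene
    return {"mean": mean, "dispersion": dispersion,
            "zero_prop": zero_prop, "outliers": outliers}


def compare(real, sim, k=1.5):
    """Mean absolute per-gene difference for each statistic (hypothetical helper).

    Assumes both matrices list the same genes in the same order.
    """
    r, s = gene_summary(real, k), gene_summary(sim, k)
    # nanmean skips genes whose dispersion is undefined (zero mean).
    return {key: float(np.nanmean(np.abs(r[key] - s[key]))) for key in r}
```

Because the method described above retains the genes and cells of the original data, a real matrix and its simulated counterpart line up row by row, which is what makes a gene-by-gene comparison like `compare(real, sim)` meaningful.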
