CRISPRpred(SEQ): a sequence-based method for sgRNA on target activity prediction using traditional machine learning.

Muhammad Rafid, Toufikuzzaman, Rahman, Rahman (2020) CRISPRpred(SEQ): a sequence-based method for sgRNA on target activity prediction using traditional machine learning. BMC Bioinformatics (IF: 3) 21(1) 223

Abstract

The latest work on CRISPR genome editing tools mainly employs deep learning techniques. However, deep learning models lack explainability and are harder to reproduce. We were motivated to build an accurate genome editing tool using sequence-based features and traditional machine learning that can compete with deep learning models. In this paper, we present CRISPRpred(SEQ), a method for sgRNA on-target activity prediction that leverages only traditional machine learning techniques and hand-crafted features extracted from sgRNA sequences. We compare the results of CRISPRpred(SEQ) with those of DeepCRISPR, the current state of the art, which uses a deep learning pipeline. Despite using only traditional machine learning methods, CRISPRpred(SEQ) convincingly beats DeepCRISPR in three of the four cell lines in the benchmark dataset (improvements of 2.174%, 6.905% and 8.119% for the three cell lines). We believe that further exploration can yield better features designed from the sgRNA sequences alone, and a better method that leverages only traditional machine learning algorithms and fully beats the deep learning models.
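
To make "hand-crafted features extracted from sgRNA sequences" concrete, here is a minimal Python sketch of two common kinds of sequence features: position-independent k-mer counts and a position-specific one-hot encoding. The abstract does not list the feature set that CRISPRpred(SEQ) uses, so the choice of features and the function names (`position_independent_counts`, `position_specific_one_hot`, `extract_features`) are assumptions made for illustration only.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def position_independent_counts(seq, k):
    """Count each possible k-mer, ignoring where it occurs in the sequence."""
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = {kmer: 0 for kmer in kmers}
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing non-ACGT symbols
            counts[window] += 1
    # Fixed k-mer order, so every sequence maps to the same column layout.
    return [counts[kmer] for kmer in kmers]


def position_specific_one_hot(seq):
    """Encode each position as four 0/1 indicators, one per nucleotide."""
    vector = []
    for base in seq:
        vector.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vector


def extract_features(seq):
    """Concatenate the illustrative feature groups into one flat vector."""
    seq = seq.upper()
    return (
        position_independent_counts(seq, 1)
        + position_independent_counts(seq, 2)
        + position_specific_one_hot(seq)
    )
```

A vector built this way has a fixed length for sequences of a fixed length, so it could be passed to any traditional learner that accepts numeric feature vectors. Which learner and which features the authors actually chose is not stated in the abstract.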

Links

http://www.ncbi.nlm.nih.gov/pubmed/32487025
http://dx.doi.org/10.1186/s12859-020-3531-9
