DeepMAsED: evaluating the quality of metagenomic assemblies

Mineeva, Rojas-Carulla, Ley, Schölkopf, Youngblut (2020). DeepMAsED: evaluating the quality of metagenomic assemblies. Bioinformatics (IF: 5.8) 36(10): 3011-3017.

Abstract

Methodological advances in metagenome assembly are rapidly increasing the number of published metagenome assemblies. However, identifying misassemblies is challenging due to a lack of closely related reference genomes that can act as pseudo ground truth. Existing reference-free methods are no longer maintained, can make strong assumptions that may not hold across a diversity of research projects, and have not been validated on large-scale metagenome assemblies.

We present DeepMAsED, a deep learning approach for identifying misassembled contigs without the need for reference genomes. Moreover, we provide an in silico pipeline for generating large-scale, realistic metagenome assemblies for comprehensive model training and testing. DeepMAsED accuracy substantially exceeds the state of the art when applied to large and complex metagenome assemblies. Our model estimates a 1% contig misassembly rate in two recent large-scale metagenome assembly publications.

DeepMAsED accurately identifies misassemblies in metagenome-assembled contigs from a broad diversity of bacteria and archaea without the need for reference genomes or strong modeling assumptions. Running DeepMAsED is straightforward, as is re-training the model with our dataset generation pipeline. Therefore, DeepMAsED is a flexible misassembly classifier that can be applied to a wide range of metagenome assembly projects.

DeepMAsED is available from GitHub at https://github.com/leylabmpi/DeepMAsED.

Supplementary data are available at Bioinformatics online.

© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Links

http://www.ncbi.nlm.nih.gov/pubmed/32096824
http://dx.doi.org/10.1093/bioinformatics/btaa124
