5/20/2023

Sequencher mtdna

Next generation sequencing (NGS) allows investigating mitochondrial DNA (mtDNA) characteristics such as heteroplasmy (i.e. intra-individual sequence variation) to a higher level of detail. While several pipelines for analyzing heteroplasmies exist, issues in usability, accuracy of results and interpretation of final data limit their usage. Here we present mtDNA-Server, a scalable web server for the analysis of mtDNA studies of any size with a special focus on usability as well as reliable identification and quantification of heteroplasmic variants. The mtDNA-Server workflow includes parallel read alignment, heteroplasmy detection, artefact or contamination identification, variant annotation as well as several quality control metrics, often neglected in current mtDNA NGS studies. All computational steps are parallelized with Hadoop MapReduce and executed graphically with Cloudgene. We validated the underlying heteroplasmy and contamination detection model by generating four artificial sample mix-ups on two different NGS devices. Our evaluation data show that mtDNA-Server detects heteroplasmies and artificial recombinations down to the 1% level with perfect specificity and outperforms existing approaches regarding sensitivity. mtDNA-Server is currently able to analyze the 1000G Phase 3 data (n = 2,504) in less than 5 h and is freely accessible.

Mitochondrial DNA (mtDNA) is maternally inherited in humans and present in thousands of copies per cell. Heteroplasmy describes a de novo mtDNA mutation often present in only a few copies. The differentiation between real mutational clones and sequencing artefacts can be complex, but is crucial in researching somatic mutations in cancer, neurodegenerative diseases and aging (1). Artefacts became even more evident with new and more sensitive sequencing technologies (2, 3). Furthermore, the paradigm shift from analyzing few reliable long reads (400–800 bp) in Sanger-based sequencing to millions of short reads (50–250 bp) in Next Generation Sequencing (NGS) requires new computational models and additional attention when interpreting results. While higher error rates within NGS can be offset by higher sequencing coverage for variant detection, interpretation of results still needs consideration when analyzing variant allele frequencies (VAF) below 10%, the detection limit for Sanger-based sequencing. While the role of such variants is acknowledged for some diseases (e.g. in mitochondrial encephalomyopathy, lactic acidosis and stroke-like episodes (MELAS) (4)), their origin and the mechanisms by which they prevail as somatic mutations are largely unknown (1).

Since the first description of analyzing mtDNA heteroplasmy on NGS devices in 2010 (5), several Unix command line pipelines have been presented (6–8). These pipelines facilitate the analysis of mtDNA data, but can be challenging to install. To eliminate these shortcomings, web servers were implemented (9–11), but they were limited to small input files, revealed shortcomings in usability, were overloaded with parameter options, or generated poor and often unreliable results (see Supplementary Tables S1–3).

Here we present mtDNA-Server, a highly scalable Hadoop-based server (12) for mtDNA NGS data processing. For handling large studies (>100 samples), we implemented new parallel mechanisms to overcome limitations of local single node architectures. We efficiently parallelized workflow steps such as sequence alignment, per-base alignment scoring (BAQ) (13), and heteroplasmy and contamination detection. To avoid misinterpretation of data which can arise from sequencing errors as well as low-level contamination of samples, we introduced extensive QC checks.
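
The 1% detection level and the 10% Sanger limit mentioned in the text are both statements about variant allele frequency (VAF) at a single mtDNA position. As a rough illustration of what such a threshold means, the sketch below computes the minor-allele fraction from per-base read counts and reports a candidate heteroplasmy only when coverage is high enough. It is not the mtDNA-Server model: the function name `call_heteroplasmy`, the `HeteroplasmyCall` record, the coverage cutoff of 1000 reads and the example counts are all assumptions made for illustration, and a real caller would also have to consider base quality, strand balance and contamination.

```python
from dataclasses import dataclass
from typing import Dict, Optional

BASES = ("A", "C", "G", "T")


@dataclass
class HeteroplasmyCall:
    """Hypothetical record for one candidate heteroplasmic position."""
    position: int
    major: str
    minor: str
    vaf: float       # fraction of reads carrying the minor allele
    coverage: int    # total reads counted over A, C, G, T


def call_heteroplasmy(
    position: int,
    counts: Dict[str, int],
    min_vaf: float = 0.01,      # 1% level quoted in the text
    min_coverage: int = 1000,   # assumed cutoff, not taken from the source
) -> Optional[HeteroplasmyCall]:
    """Return a call if the second most frequent base reaches min_vaf."""
    coverage = sum(counts.get(base, 0) for base in BASES)
    if coverage < min_coverage:
        # Too few reads to distinguish a low-level variant from noise.
        return None

    # Rank bases by read count; the top two are major and minor allele.
    ranked = sorted(BASES, key=lambda base: counts.get(base, 0), reverse=True)
    major, minor = ranked[0], ranked[1]

    vaf = counts.get(minor, 0) / coverage
    if vaf < min_vaf:
        return None
    return HeteroplasmyCall(position, major, minor, vaf, coverage)


if __name__ == "__main__":
    # Hypothetical pileup counts for a single position.
    print(call_heteroplasmy(100, {"A": 9800, "G": 200}))
    print(call_heteroplasmy(100, {"A": 9950, "G": 50}))
```

With these assumed defaults, a position with 200 variant reads out of 10,000 (2%) is reported, while 50 out of 10,000 (0.5%) falls below the 1% level and is dropped. A plain frequency cutoff like this cannot separate true heteroplasmy from sequencing artefacts or sample contamination, which is why the text stresses additional QC checks.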