Open Access Research

Assessment of a targeted resequencing assay as a support tool in the diagnosis of lysosomal storage disorders

Ana Fernández-Marmiesse1*, Marcos Morey1, Merce Pineda2, Jesús Eiris3, Maria Luz Couce1, Manuel Castro-Gago3, Jose Maria Fraga1, Lucia Lacerda4, Sofia Gouveia1, Maria Socorro Pérez-Poyato5, Judith Armstrong6, Daisy Castiñeiras1 and Jose A Cocho1

Author Affiliations

1 Unidad Diagnóstico y Tratamiento de Errores Congénitos del Metabolismo (Servicio de Neonatología), Facultad de Medicina y Odontología de la Universidad de Santiago de Compostela, 15706 Santiago de Compostela, La Coruña, Spain

2 Neuropediatra Fundación Hospital San Juan de Dios, CIBERER, Barcelona, Spain

3 Servicio de Neuropediatría, Hospital Clínico Universitario de Santiago de Compostela, Facultad de Medicina y Odontología de la Universidad de Santiago de Compostela, Santiago de Compostela, La Coruña, Spain

4 Unidade de Bioquímica Genética, Centro de Genética Médica Jacinto Magalhães, Centro Hospitalar do Porto, Porto, Portugal

5 Unidad de Neuropediatría. Hospital Clínico Universitario Marqués de Valdecilla, Santander, Spain

6 Servicio de Genética Molecular, Hospital San Juan de Dios, Barcelona, Spain

Orphanet Journal of Rare Diseases 2014, 9:59  doi:10.1186/1750-1172-9-59

The electronic version of this article is the complete one and can be found online at: http://www.ojrd.com/content/9/1/59


Received: 16 October 2013
Accepted: 7 April 2014
Published: 25 April 2014

© 2014 Fernández-Marmiesse et al.; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Abstract

Background

With over 50 different disorders and a combined incidence of up to 1/3000 births, lysosomal storage diseases (LSDs) constitute a major public health problem and place an enormous burden on affected individuals and their families. Many factors make LSD diagnosis difficult, including phenotype and penetrance variability, shared signs and symptoms, and problems inherent to biochemical diagnosis. Developing a powerful diagnostic tool could mitigate the protracted diagnostic process for these families, lead to better outcomes for current and proposed therapies, and provide the basis for more appropriate genetic counseling.

Methods

We have designed a targeted resequencing assay for the simultaneous testing of 57 lysosomal genes, using in-solution capture as the enrichment method and two different sequencing platforms. A total of 84 patients with a high, moderate, or low suspicion index for LSD were enrolled at different centers in Spain and Portugal, including 18 positive controls.

Results

We correctly diagnosed 18 positive blinded controls, provided a genetic diagnosis to 25 potential LSD patients, and brought 18 diagnostic odysseys to an end.

Conclusion

We report the assessment of a next-generation-sequencing-based approach as an accessory tool in the diagnosis of LSDs, a group of disorders with overlapping clinical profiles and genetic heterogeneity. We have also identified and quantified the strengths and limitations of next-generation sequencing (NGS) technology applied to diagnosis.

Keywords:
In-solution enrichment; Targeted resequencing; Lysosomal storage disorders; Diagnostic odysseys

Introduction

Lysosomal storage disorders (LSDs) are rare diseases with a combined incidence of ~1 in 1500 to 7000 live births [1,2]. This group of inborn errors of metabolism encompasses >50 different diseases, each characterized by the accumulation of specific substrates [3-6]. Generally, newborns with LSDs appear normal at birth and symptoms develop progressively over the first few months of life. Late-onset juvenile and adult forms of LSDs, resulting from chronic substrate accumulation, also occur, but due to their varied signs and symptoms can have a delayed diagnosis. The recent development and availability of enzyme-replacement therapy (ERT) for several LSDs means that diagnosis early in the clinical process is of particular relevance [7]. Most importantly, early genetic diagnosis can provide parents with realistic information about their child’s prognosis, enable appropriate genetic counseling about future pregnancies, and prevent ‘diagnostic odysseys’ for families [8].

Because the clinical features of many LSDs overlap, establishing a diagnosis solely on the basis of clinical presentation is difficult. Until recently, clinicians have approached LSD diagnosis in several ways. The first option has been laboratory assays based on detection of the storage product. Although many of the clinical symptoms of different LSDs result primarily from substrate storage anomalies, presentation varies widely, depending on the type, quantity, location, and time of extraction of the accumulated storage material, and this frequently gives rise to false negatives in biopsy analysis. Tests for elevated levels of secreted substrate material are routinely used to examine the pattern of glycosaminoglycans and oligosaccharides in patients suspected of having mucopolysaccharidoses (MPS) or disorders that present with oligosacchariduria. Although urine screens are very sensitive, affected individuals with normal urine screens have been reported, mainly among young patients and adults; thus, when there is a strong index of suspicion, normal urine screening results should still be followed by enzyme analysis [9-11]. Enzyme activity detected in blood spots, either individually or simultaneously, is useful in the diagnosis of a small number of LSDs, but needs verification with a second type of assay; while measurement of enzyme activity in leukocytes and plasma serves this purpose for most LSDs, a proportion of cases may still go undetected with this method. Further limitations of enzyme activity tests are that they cannot detect heterozygous carriers of a disease and are not suitable for potentially oligogenic LSDs (oligogenic inheritance has not so far been described for LSDs but is recognized for retinitis pigmentosa, deafness, and ciliopathies) [12]. All of the above methods are laborious and time-consuming, and require an accurate clinical diagnosis to reduce the number of enzymatic assays used for each patient. Moreover, all these techniques are semi-quantitative and subject to high variability, leading to false positives and negatives.

In summary, diagnosis of LSDs represents a challenge for clinicians and can take several years. Even when a diagnosis is reached with traditional techniques, a genetic diagnosis also has to be made in order to provide the family with appropriate genetic counseling, itself arguably as important as the diagnosis. Genetic analysis is usually not performed as the primary screening tool in the diagnosis of LSDs because of the cost and delay incurred by the sequential genetic tests necessary to diagnose any particular disorder. However, with the availability of next-generation sequencing (NGS) technologies, a genetically based diagnosis can be completed in 4 to 6 weeks, while reducing the cost to that of Sanger sequencing a single gene [13-15]. Here, we present the results of a pilot project to evaluate the application of NGS to mutation screening in a diagnostic context. We show the strengths and limitations of this approach and, although this assay would never suffice as the sole diagnostic tool, we propose it as a useful adjunct to diagnosis for specialists in everyday clinical management who might suspect an LSD, given its ability to provide accurate information in a short time.

Methods

Ethics statement

The study protocol adhered to the tenets of the Declaration of Helsinki and was approved by the local Ethics Committee (Comité Ético de Investigación Clínica de Galicia - CEIC). Informed written consent was obtained from each study participant. Index patients underwent a full neurological examination in each source hospital. Whenever available, blood samples from affected and unaffected family members were collected for co-segregation analysis.

Probands

A total of 84 probands were collected from different institutions in Spain and Portugal, comprising 18 positive controls and 66 patients with a suspected LSD. Positive controls had previously undergone biochemical testing and Sanger sequencing. Analyses of 13 controls and 33 patients were performed on the SOLiD platform, and analyses of 5 controls and 33 patients on the Illumina platform.

LSD diagnostic suspicion index

This is a subjective parameter chosen by the clinical specialists who managed each case. We asked them to choose between three degrees of suspicion: high (the patient is believed to have a lysosomal disease with high probability, on the basis of biochemical or clinical data), medium (lysosomal diseases are included in the differential diagnosis), and low (the main suspicion is another condition, but there is a low probability of a lysosomal disease and it is important to rule it out).

Capture array design, library construction, and NGS

A custom SureSelect oligonucleotide probe library was designed to capture the 551 exons and exon-intron boundaries of 57 genes known to be associated with LSDs, according to GeneReviews (NCBI) (Table 1) [16]. The design includes all transcripts from each target gene. The eArray web-based probe design tool was used for this purpose [17]. The following parameters were chosen for probe design: 120 bp length for baits, 5X probe-tiling frequency, and 20 bp overlap in restricted regions identified by eArray’s implementation of the RepeatMasker program. A total of 5037 unique baits, covering 183,440 bp, were generated and synthesized by Agilent Technologies (Santa Clara, CA, USA). Sequence capture, enrichment, and elution were performed according to the manufacturer’s instructions.

Table 1. Genes included in the NGS-LSD assay and their associated disorders

SOLiD4 platform

Briefly, 3–4 μg of each genomic DNA was fragmented by sonication (Covaris S2, Massachusetts, USA), purified to yield 150–180 bp fragments, and end-repaired. Adaptor oligonucleotides from Agilent Technologies were ligated to the repaired DNA fragments, which were then purified, size-selected by gel electrophoresis, nick-translated, and amplified by 12 PCR cycles. The libraries (500 ng) were then hybridized to the SureSelect biotinylated-RNA capture library for 24 h. After hybridization, washing, and elution, the captured fraction underwent 12 cycles of PCR amplification with barcoded primers, followed by purification and quantification by qPCR. Forty-eight barcoded samples were then pooled in groups for sequencing on a SOLiD4 platform as single-end 50 bp reads. Twelve sample libraries were loaded per octet of a SOLiD4 slide.

HiSeq2000 platform

Library preparation for the capture of selected DNA regions was performed according to the SureSelect XT Target Enrichment System protocol for Illumina paired-end sequencing (Agilent). In brief, 3 μg of genomic DNA was sheared on a Covaris™ E220 focused-ultrasonicator. Fragment size (150–200 bp) and quantity were confirmed with an Agilent 2100 Bioanalyzer 7500 chip. The fragmented DNA was end-repaired, adenylated, and ligated to Agilent indexing-specific paired-end adaptors. The DNA with adaptor-modified ends was PCR amplified (6 cycles, Herculase II fusion DNA polymerase) with SureSelect primers, quality controlled using the DNA 7500 assay specific for a library size of 250–350 bp, and hybridized for 24 h at 65°C. The hybridization mixture was washed in the presence of magnetic beads (Dynabeads MyOne Streptavidin T1, Life Technologies), and the eluate was PCR amplified (16 cycles) to add index tags using SureSelectXT Indexes for Illumina. The final library size and concentration were determined using an Agilent 2100 Bioanalyzer 7500 chip, and the library was sequenced on an Illumina HiSeq 2000 platform with a paired-end run of 2 × 76 bp, following the manufacturer’s protocol. Thirty-six sample libraries were loaded in three lanes of the HiSeq 2000.

Data filtering and analysis pipeline

SOLiD4 platform

Image analysis and base calling were performed using the SETS (SOLiD Experimental Tracking Software) pipeline to generate primary data. Sequence reads were aligned to the reference human genome UCSC hg19 using Life Technologies’ BioScope suite v1.3.1. Default parameters, recommended for targeted resequencing, were used. Variant calling was performed using two software programs in parallel: the diBayes algorithm embedded in the BioScope suite [18] and the Genome Analysis Toolkit (GATK) v1.5, a software package developed at the Broad Institute (Cambridge, MA) to analyze next-generation resequencing data [19]. The GATK Unified Genotyper is a state-of-the-art variant caller for NGS data and is used extensively in human sequencing projects. The variant detection pipeline uses well-established statistical models for recalibration of the base quality score and for variant calling. Low-stringency parameters were selected to avoid false negatives, although a high rate of false positives was expected.

HiSeq2000 platform

Base calling and quality control were performed on the Illumina Real Time Analysis (RTA) sequence analysis pipeline. Sequence reads were trimmed to keep only those bases with a quality index > 10 and then mapped to Human Genome build hg19 (GRCh37), using a Genome Multitool (GEM) [20] and allowing up to 4 mismatches. Reads not mapped by GEM were submitted to a last round of mapping with BLAT-like Fast Accurate Search Tool (BFAST) [21]. Uniquely mapping non-duplicate read pairs were locally realigned with GATK. Samtools suite [22] was used to call single nucleotide variants (SNVs) and short INDELs, taking into account all reads per position. Variants on regions with low mappability or variants in which there was not at least one sample with read depth ≥10 were filtered out.

Sanger sequencing

To verify the DNA sequence variants detected by NGS, we amplified the target sites and flanking sequences of each variant with specific primers designed using the free software Primer3 v.0.4.0 [23]. Next, we sequenced the PCR products using the Sanger method to ascertain the precision of the variants identified by NGS. Sequencing reactions consisted of 1.0 μl of previously purified PCR products (ExoSAP-IT, USB, Cleveland, OH), 1 μl of each primer, and 1 μl of Big Dye Terminator v3.1 from the Cycle Sequencing kit (Applied Biosystems, Foster City, CA). The reactions were run in an ABI 3730 DNA Analyzer (Applied Biosystems, Foster City, CA). Analysis was performed with the Staden package free software.

Detection of gross indels

A simple in-house Excel table was designed for the detection of gross deletions and duplications encompassing one or more exons. The table contained coverage peak areas for each exon and patient. Ratios of successive peak areas between different patients were compared to identify homozygous and heterozygous gross indels. Establishing the exact boundaries of a deletion requires cDNA Sanger sequencing studies.
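The ratio comparison described above can be illustrated with a minimal Python sketch. It is not the spreadsheet used in the study: the function names, the normalisation by each sample's total coverage, the use of the median of reference samples, and the classification thresholds are all assumptions made for illustration.

```python
from statistics import median


def exon_copy_ratios(patient, references):
    """Ratio of the patient's normalised exon coverage to the median
    normalised coverage of the reference samples.

    `patient` and each entry of `references` map exon name -> coverage
    peak area. Exons with zero reference coverage are skipped because no
    ratio can be computed for them.
    """
    def normalise(areas):
        total = sum(areas.values())
        return {exon: area / total for exon, area in areas.items()}

    patient_norm = normalise(patient)
    refs_norm = [normalise(ref) for ref in references]

    ratios = {}
    for exon, value in patient_norm.items():
        ref_median = median(ref[exon] for ref in refs_norm)
        if ref_median > 0:
            ratios[exon] = value / ref_median
    return ratios


def classify_ratio(ratio, hom_del=0.1, het_del=0.7, dup=1.3):
    """Classify a coverage ratio. Thresholds are illustrative assumptions:
    about 0 suggests a homozygous deletion, about 0.5 a heterozygous
    deletion, and about 1.5 a heterozygous duplication."""
    if ratio < hom_del:
        return "homozygous deletion"
    if ratio < het_del:
        return "heterozygous deletion"
    if ratio > dup:
        return "duplication"
    return "normal"
```

In this sketch an exon whose coverage is roughly half that of the reference samples is flagged as a candidate heterozygous deletion, which corresponds to the halved peak areas reported for the macrodeletions in the Results. Any such call would still need confirmation by an independent method.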

Filtering of annotated sequence data

We received the bam files and the annotated sequencing variants in Excel tables from the two platforms. Raw data were filtered with custom designed scripts using the free, open source statistical software R package [24]. We have submitted all novel variants included in the article to ClinVar database [25] from NCBI and to the Locus Specific Mutation Databases (LSMDs) from Human Genome Variation Society [26].

Assessment of the pathogenicity of variants

The following criteria were applied to evaluate the pathogenic nature of novel variations identified by NGS: 1) stopgain, frameshift, and splicing variants, considered the most likely to cause disease; 2) the presence of a second mutant allele, taken to indicate recessive inheritance; 3) cosegregation; 4) the absence of the variant in other samples; 5) frequency <0.01 in the 1000 Genomes (1000g2012) database; and 6) for missense mutations, amino acid conservation and prediction of pathogenicity. To evaluate criterion 6), we used the following freely available bioinformatics web tools: SIFT (Sorting Intolerant From Tolerant) [27], PolyPhen-2 [28], and MutationTaster [29]. To evaluate the possible effect of synonymous variants on gene splicing, we used the Human Splicing Finder web tool [30].

Results and discussion

Statistical data from two platforms

Total target sequence length, including coding exons and exon-intron junctions from 57 LSD genes, was 183,440 bp. Coverage distribution across the designed region was replicated among the samples for each platform. The overall sequence depth was 456X for SOLiD4 and 7,416X for HiSeq2000. SOLiD4 yielded a total of up to 158,863,367 raw 50-mer reads, with 87.28% mapped reads overall. A total of 94.5% of bases were covered by at least 20 reads, and the mean percentage of bases not covered per sample was 2.6%. The HiSeq2000 run resulted in 1,292,371,000 QC-passed reads, with 88.47% mapped reads overall. A total of 99.97% of bases were covered by at least 20 reads, leaving only 0.03% of target bases with fewer than 20 reads.
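The coverage figures reported above (mean depth, percentage of target bases covered by at least 20 reads, percentage of bases not covered) can be derived from per-base depth values. The following Python sketch shows one way of doing so; the function name and the input format (a flat list of depths, one per target base) are assumptions for illustration and do not describe the pipeline used in the study.

```python
def coverage_summary(depths, min_depth=20):
    """Summarise per-base read depth over a target region.

    `depths` is a sequence with one read-depth value per target base.
    Returns the mean depth, the percentage of bases with depth of at
    least `min_depth`, and the percentage of bases with no coverage.
    """
    n = len(depths)
    if n == 0:
        raise ValueError("depths must contain at least one position")
    return {
        "mean_depth": sum(depths) / n,
        "pct_at_least_min": 100.0 * sum(d >= min_depth for d in depths) / n,
        "pct_uncovered": 100.0 * sum(d == 0 for d in depths) / n,
    }
```

For example, `coverage_summary([0, 10, 20, 30])` gives a mean depth of 15, with 50% of positions at or above 20 reads and 25% uncovered.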

Limitations of the enrichment method

The most important challenge for this kind of enrichment is that local sequence architecture has a strong effect on the efficiency of DNA enrichment for individual exons [31]. Exons close to repetitive regions are not fully covered, because the eArray web tool does not design baits in repetitive regions. In our assay, 5 exons were affected by this initial lack of baits. We tried to minimize this pitfall by using the 5X bait-tiling option, to reduce the gaps as much as possible. However, our results showed that this approach overcame the problem only when coverage was very high. Another significant limitation of the enrichment method is that coverage decreases dramatically, even to zero, for exons located in CpG islands. With 456X coverage, 22 of 57 genes showed gaps in coverage (<20X) in one exon, and 6 in more than one exon, the most affected genes being IDUA, GBA, and GAA. As a consequence, our NGS-LSD assay did not detect mutations in the IDUA gene in a patient with biochemically confirmed Hurler disease. With 7416X coverage, no gaps were found in any of the LSD genes. Therefore, to establish this assay as a diagnostic screening tool for routine use, it will be essential to eliminate sequence gaps. Because enrichment failures with hybrid capture were reproducible, they may be amenable to rescue by individual PCRs or probe redesign.

Filtering of raw data

The main challenge of using NGS for diagnostic applications will be in interpreting the massive number of genomic variants detected by sequencing platforms. Fast and reliable identification of causative variants will be crucial to the implementation of this technology in diagnostics. In our study, simple filters based on variant function and frequency, together with careful selection of index cases versus controls, were found to be a useful way of discriminating between pathogenic mutations and background polymorphisms. Priority was given to variants considered to be specific to individual patients, based on the assumption that a mutation underlying such a monogenic disorder is highly penetrant and rare. It was also assumed that these mutations are likely to be coding and to have a major phenotypic effect.

From the 12,179 and 52,303 variants detected by the SOLiD4 and HiSeq platforms, respectively, successive filters were applied to reduce these numbers to 77 and 68. SOLiD4 raw data provided by the Bioscope SNP calling software were filtered twice, using specifically designed R scripts (Figure 1). Filter 1 selected variants with the following features: 1) non-reference allele (NRA) frequency ≤0.01 in the 1000 Genomes (1000g2012) database; 2) location in exonic or splicing regions; 3) non-synonymous, indel, stopgain, or splicing variant type; and 4) NRA frequency ≤4% in our study group. This filter reduced the initial number of variants to 219. Filter 2 then discriminated between real and false-positive changes by requiring each variant to fulfil at least 3 of the following conditions: 1) coverage of the position ≥20; 2) percentage of NRA reads ≥30; 3) mean quality values (MQV) of reference and variant ≥15; 4) [reference-variant] MQV ≤5. The number of variants was reduced to 77, implying a false-positive rate of 65%. Careful observation of the false-positive variants showed that they appeared in characteristic areas which either had low coverage or a sudden drop in coverage, or were considered coverage ‘valleys’. Such false positives were therefore the result of coverage irregularities arising from local sequence architecture, and were largely avoidable, either by improving enrichment efficiency or by increasing coverage, as shown by the results obtained with the Illumina NGS-LSD assay. Filter 2 was also adjusted by checking variants by Sanger sequencing (see next section) and by comparing data from two different variant-calling software packages, Lifescope and GATK. Variants obtained from HiSeq2000 were passed through the same Filter 1 as for SOLiD4, reducing the number of variants directly to 68. In this case, a second filter was not necessary, as the high coverage achieved meant that the false-positive rate with HiSeq2000 was insignificant.

Figure 1. Global flowchart of the filtering pipeline used for selection of most likely pathogenic mutations, starting from SOLiD4 raw data. (NRA: non-reference allele).
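The two filters described above can be sketched in Python as follows. The study used custom R scripts that are not reproduced in the article, so this is only an illustration: the dictionary field names are hypothetical, and condition 4 of Filter 2 is interpreted here as the absolute difference between the reference and variant MQV, which is an assumption.

```python
# Variant types retained by Filter 1 (labels are illustrative).
RETAINED_EFFECTS = {"nonsynonymous", "indel", "stopgain", "splicing"}
RETAINED_REGIONS = {"exonic", "splicing"}


def passes_filter1(v, max_pop_freq=0.01, max_cohort_freq=0.04):
    """Filter 1: rare, exonic or splicing, potentially damaging variants."""
    return (
        v["pop_freq_1000g"] <= max_pop_freq      # NRA frequency in 1000 Genomes
        and v["region"] in RETAINED_REGIONS
        and v["effect"] in RETAINED_EFFECTS
        and v["cohort_freq"] <= max_cohort_freq  # NRA frequency in the study group
    )


def filter2_conditions_met(v):
    """Count how many of the four Filter 2 quality conditions hold."""
    conditions = [
        v["coverage"] >= 20,
        v["nra_read_pct"] >= 30,
        v["mqv_ref"] >= 15 and v["mqv_var"] >= 15,
        # Assumption: the MQV difference is taken as an absolute value.
        abs(v["mqv_ref"] - v["mqv_var"]) <= 5,
    ]
    return sum(conditions)


def passes_filter2(v, min_conditions=3):
    """Filter 2: keep variants meeting at least `min_conditions` conditions."""
    return filter2_conditions_met(v) >= min_conditions
```

Exposing `min_conditions` as a parameter makes the trade-off explicit: requiring 3 of 4 conditions tolerates one weak quality metric, whereas requiring all 4 is stricter and would reject a variant such as one with good allele balance and quality values but coverage below 20.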

Assessment of the novel genetic tool for LSD diagnosis

All putatively pathogenic variants detected by either of the two platforms were subsequently tested by Sanger sequencing of both the index case and parents, and all were confirmed. A more thorough analysis of the variants detected by SOLiD4 was carried out to test the reliability of Filter 2 in discriminating real from false variants. Of the 71 missense variants which passed Filter 2 and were tested by Sanger sequencing (see previous section), 83% were found to be real and 17% false-positive. Of 48 variants which did not pass Filter 2, 14 were additionally selected for having values near the limit for one or more conditions; all of these were then verified as false-positive. Therefore, although we had an overall false-positive rate of 17%, we conclude that the second filter detected real variants with a high degree of confidence. If, on the other hand, we had only allowed variants fulfilling all 4 conditions of Filter 2 to pass, we would not have had any false positives, but we would instead have had a false-negative rate of 9%. A comparison of the two variant calling software packages was also found to be useful in discriminating between real and false variants, as none of the false positives seen using Lifescope appeared when using GATK, and vice versa.

Detection of gross deletions

One advantage of NGS-based strategies, as opposed to Sanger sequencing, is that, in addition to SNVs and small indels, they can also detect gross deletions and insertions which affect one or more exons. Thus, we detected two heterozygous macrodeletions, one an exon 5 deletion in GLB1 (P20), and another an exon 7 + 8 deletion in CLN3 (C18) (Figure 2). We also detected the same CLN3 macrodeletion in a homozygous state (C2 and P3; not shown).

Figure 2. Heterozygous macrodeletions detected by LSD-NGS. A) GLB1 exon 5 deletion; B) CLN3 exon 7 + 8 deletion. The area of the coverage peak for each deleted exon is half that seen in the corresponding control sample.

Limitations of SNP calling software

Using Bioscope software, heterozygous variants were found with a percentage of NRA reads between 30 and 50%, while homozygous variants had percentages of between 70 and 100%. Lower percentages were seen for small deletions and insertions, comprising a potential problem with this technology. On the other hand, a comparison of the two SNP calling software packages showed there to be a significant proportion of false negatives with each package. Thus, of 105 variants confirmed by Sanger sequencing, Bioscope had failed to detect 3 (2.8%) and GATK failed to detect 9 (8.5%). This illustrates the importance of using more than one type of variant calling software in parallel, to discriminate between real and false variants as well as to reduce the false-negative rate as much as possible.
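The benefit of running two callers in parallel can be illustrated with a short Python sketch that computes the false-negative rate of each caller against a set of Sanger-confirmed variants, and the rate obtained when the calls are pooled. The function names and the representation of variants as hashable identifiers are assumptions for illustration. The article does not state whether the variants missed by the two callers overlapped, so the pooled rate depends on data not reported here.

```python
def false_negative_rate(confirmed, called):
    """Fraction of confirmed variants that a caller failed to report.

    `confirmed` and `called` are sets of variant identifiers, for example
    (chromosome, position, ref, alt) tuples.
    """
    if not confirmed:
        raise ValueError("confirmed set must not be empty")
    return len(confirmed - called) / len(confirmed)


def combined_calls(*call_sets):
    """Union of the calls made by several variant callers."""
    return set().union(*call_sets)
```

With 105 confirmed variants, a caller that misses 3 has a false-negative rate of 3/105 and one that misses 9 has a rate of 9/105. If the missed variants do not overlap, pooling both call sets recovers every confirmed variant.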

Diagnosis achieved with the NGS-LSD screening tool

We applied the NGS-LSD method to 84 probands with a spectrum of early-onset neurodegenerative disorders that were potentially caused by a deficiency in one of the 57 LSD proteins (Figure 3). From these 84, we selected 18 control samples with mutations previously identified by Sanger sequencing, three of which had only a single mutation in the affected gene (C14, 15, 18). Of the 33 mutations expected, all but one were detected using the NGS-LSD assay (Table 2). The undetected mutation was located in exon 1 of the NAGLU gene (C8), which was not covered by hybridization baits because it is located close to repetitive regions. It is important to highlight that this mutation would have been detected with the Illumina platform. For the 66 unclassified patients, an LSD suspicion index was assigned as low, moderate, or high in each case. For each case, clinical data and the tests carried out for diagnostic purposes, including genetic tests, were collected (Table 3). Based on the presence of one or two mutations, we were able to achieve a genetic diagnosis in 26 patients. Twenty-two of these were found to carry two mutations in the same gene, consistent with clinical and/or biochemical features and establishing a genetic diagnosis (Table 4). A further 26 patients were found to be carriers of mutations (Tables 4 and 5). In all cases, the gene harboring a mutation was carefully examined to find any other variant that might have been initially undetected because it was located in an intronic region, was synonymous, or was a gross indel. In this way, a second mutation was found in 4 patients (P11, 20, 21, 23). Among the remaining carriers, we carried out biochemical assays to confirm or exclude disease in four patients (P12, 27, 28, 29), chosen for the severity of the variant found and the correlation between genotype and phenotype. Biochemical tests on P27, P28, and P29 gave negative results. In P12 the diagnosis of NPC2 could not be confirmed, but it could not be ruled out either (filipin staining was inconclusive, which is not unusual in a proportion of NPC2 patients, and only one heterozygous pathogenic mutation was found). It is important to emphasize that in the current study we were readily able to confirm mutation pathogenicity biochemically for the majority of genes analyzed, but this is not possible for pathologies that have no biochemical markers, for which confirmation will need to be obtained by other methods. No mutations were found in 19 probands (P49-P65), but the LSD index of suspicion was low for all but one of these.

Figure 3. Diagnosis achieved using NGS-LSD. (1x: one mutation found; 2x: two mutations found in the same gene; VCS: variant calling software; ??: not confirmed.)

Table 2. Results obtained for positive controls included in NGS-LSD assay

Table 3. Diagnostic suspicions and biochemical/histopathological tests for patients diagnosed using the NGS-LSD tool

Table 4. Diagnosis achieved using the NGS-LSD tool

Table 5. Results found for patients remaining un-diagnosed

Unexpected diagnoses

In cases P7, P9 and P11, the diagnoses were unexpected, adding significant value to our method. In case P7, the patient’s history did not initially suggest type 1 gangliosidosis (GM1), due to the presence of cerebellar atrophy on magnetic resonance imaging (MRI) at first consultation (age 2 years) and a consistently raised lactic acid level in both blood and cerebrospinal fluid (CSF). Together with updated clinical data, therefore, a mitochondrial cytopathy seemed likely and a muscle biopsy was performed, which showed a slight deficiency in complex I of the mitochondrial respiratory chain. Subsequent electroneurography (at 4.7 years) showed results consistent with a mixed polyneuropathy. This finding, along with the clinical data, MRI, and minimal deficiency in complex I, suggested a probable neuroaxonal dystrophy. With enzymatic and genetic data from the NGS-LSD assay, we concluded that this child suffered from a late infantile GM1 that could be categorized as atypical due to the presence of cerebellar atrophy, mixed polyneuropathy, and increased lactate in the blood and CSF. In classical GM1, MRI is usually normal or shows some late brain atrophy, while polyneuropathy and high lactate levels are not usually present.

Case P9 was shown to have juvenile type 2 gangliosidosis (GM2), following a delayed suspicion of a storage disorder due to the late onset and low specificity of symptoms, and the slow progression of the disease. In case P11, neuronal ceroid lipofuscinosis (CLN) was initially suspected due to epilepsy, cognitive regression, and a loss of memory and expressive language. Progressive multifocal epilepsy was refractory to combination therapy, the patient’s vision worsened, and he appeared ataxic and lost bowel control. There was also a reduced retinal vascular tree with papillary pallor, thought to signify the onset of retinopathy; however, the detection of a severe mutation in the GM2-associated gene HEXA altered the diagnosis. P11 currently has progressive optic atrophy. Whilst infantile GM2 is easy to recognize due to a cherry-red spot in the fundus, juvenile GM2 is difficult to diagnose without the identification of genetic variants, as discussed above. As shown in Table 4, the second mutation in HEXA for P11 was synonymous. Familial cosegregation was demonstrated, and the Human Splicing Finder program predicted the disruption of an enhancer motif for the SRp40 protein and the simultaneous creation of a new silencer motif.

A finding of interest was the presence of a hemizygous variant of the Hunter-associated gene IDS in patient P25. While this mutation is already registered in the Human Gene Mutation Database (HGMD) as associated with the Hunter phenotype, the patient showed none of the clinical features usually found in Hunter patients; he did not show increased levels of urinary GAGs, and radiological tests showed no skeletal deformities, completely excluding a Hunter diagnosis. This lends further weight to what has long been suspected, that we have only seen the ‘tip of the iceberg’ of genotype-phenotype associations in clinical genetics and that NGS will change some universally accepted paradigms and lead to new genetic prescripts, such as the implication of more than one gene in classically monogenic disorders [32].

Time-to-diagnosis and cost: comparisons with Sanger sequencing

The period between onset of disease and genetic diagnosis (GDD column, Table 4) also enhances the value of the NGS-LSD tool. Diagnostic delay in our study was on average 7 (range 1–23) years. Using NGS-LSD ended 14–15 diagnostic odysseys, thus opening the way for genetic counseling and carrier tests. In terms of cost, classical and next-generation sequencing technologies cost $2400 and $0.1 per million bases, respectively. Two examples from the current study serve to illustrate the difference between the two technologies. Although a CLN was strongly suspected, patient P5 (with 13 years of diagnostic delay) underwent five classical genetic analyses without a positive result and at a total cost of about 4000€. With an early NGS-LSD assay, the diagnosis could have been made in 8 weeks at a cost of about 900€. In case P3, several diseases were suspected over a period of 12 years, including GM1, Sandhoff, galactosialidosis, Schindler disease, and CLN. If the NGS-LSD tool had been used on first suspicion, the time-to-diagnosis would have been reduced to 1–2 months, and the cost to that of a single genetic analysis.

Future approaches to reduce non-diagnosis

The current study ended with 39–40 patients still undiagnosed, although 64% of these had a low or moderate index of suspicion of LSD. Clearly, these undiagnosed cases point to a significant limitation of our proposed tool, which stems from the considerable phenotypic overlap between many LSDs and non-LSD neurodevelopmental conditions whose causal genes are not included in the NGS-LSD panel. To address this problem, we are currently developing a broad-range genetic panel that encompasses most known neurometabolic disorders, the NGS-NMD1 tool, which will be updated as new genes come to light with continued research. Such an approach will form an important next step in optimizing the neurometabolic diagnostic process. Six of our undiagnosed patients are currently undergoing whole-exome resequencing.

Application of NGS-LSD to clinical diagnosis

NGS technologies have significantly improved sequencing capacity in the past 5 years. These technologies are now widely used for research purposes and are starting to find their way into clinical applications [33-35]. Whole-genome and exome sequencing approaches are being successfully implemented in research projects [31,36-42], but are not yet routine strategies in diagnosis because of high costs, long turnaround times (run and analysis time) and ethical issues. Targeted resequencing, on the other hand, is appealing in a clinical setting because of its lower sequencing costs, shorter sequencing time, simpler data analysis, and greater sensitivity per gene owing to the greater coverage achieved [43,44]. The results of the current study also illustrate the importance of good coverage for the reliable detection of mutations by NGS in clinical settings.

Due to the nature of the technology, NGS also brings with it uncertainties and limitations of a higher order of magnitude than previously seen in genetic diagnosis. However, uncertainty in genetic testing is nothing new, and we have the foresight and fortitude to develop tools to deal with these issues. Assays must also be carefully validated in pilot projects to assess the sensitivity and accuracy of these new technologies. Several groups are working in this direction, designing resequencing assays for panels of genes related to groups of diseases that have similar clinical manifestations and are difficult to diagnose [45-54].

To the best of our knowledge, the current study is the first to use NGS in LSD screening. Overall, it shows that a high rate of detection of mutations is possible when sequence coverage is sufficient and gaps due to limitations in the enrichment method can be overcome. Our results show that the combination of in-solution capture and NGS can be used for the parallel screening of multiple disease genes and can successfully identify disease-causing mutations. This assay can therefore be used as a supporting genetic tool that, always in combination with biochemical and clinical data, could facilitate a diagnosis. Finally, we have shown the power of our approach as a tool for making diagnoses that are particularly challenging and for bringing diagnostic odysseys to a more rapid conclusion. It will be important, therefore, to continue to work with specialists to optimize this powerful and promising technology for the benefit of patients and their families.

Abbreviations

LSDs: Lysosomal storage disorders; NGS: Next-generation sequencing; ERT: Enzyme-replacement therapy; MPS: Mucopolysaccharidoses; SETS: SOLiD Experimental Tracking Software; GATK: Genome Analysis Toolkit; RTA: Illumina Real Time Analysis; GEM: Genome Multitool; BFAST: BLAT-like Fast Accurate Search Tool; SNVs: Single nucleotide variants; LSMDs: Locus Specific Mutation Databases; SIFT: Sorting Intolerant From Tolerant; NRA: Non-reference allele; MQV: Mean quality values; GM1: Type 1 gangliosidosis; GM2: Type 2 gangliosidosis; MRI: Magnetic resonance imaging; CSF: Cerebrospinal fluid; CLN: Neuronal ceroid lipofuscinosis; HGMD: Human Gene Mutation Database.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

AFM was involved in the study design, choice of genes, interpretation of the NGS data, collection of DNA samples and clinical data, and writing of the manuscript. MM collaborated in the study design, choice of genes, DNA extraction, in-solution enrichment and sequencing on the SOLiD platform, and Sanger sequencing of detected variants. JE, MP, MCG, MLC, JMF, MSPP and JA collaborated in the collection of patients and clinical data, the evaluation of data analysis results and the management of biochemical and biopsy tests to support the results. JMF also helped to write the manuscript. LL collaborated with blinded controls and some patients. SG was involved in DNA extraction, sample preparation for NGS assays, Sanger sequencing of detected variants and correction of the draft manuscript. DC and JAC coordinated and integrated the different pieces of work, especially in the laboratory setting, and helped to prepare the final manuscript. All authors read and approved the final manuscript.

Acknowledgements

We are indebted to the families affected by LSDs who participated by allowing us to use genetic samples and clinical information. We also thank the clinicians who collected samples and data, including those not presented in the final manuscript. This study was financially supported by a grant from the Spanish Ministerio de Sanidad y Consumo (PI10/1193).

References

  1. Stone DL, Sidransky E: Hydrops fetalis: lysosomal storage disorders in extremis.

    Adv Pediatr 1999, 46:409.

  2. Fletcher JM: Screening for lysosomal storage disorders–a clinical perspective.

    J Inherit Metab Dis 2006, 29:405-408.

  3. Staretz-Chacham O, Lang TC, LaMarca ME, Krasnewich D, Sidransky E: Lysosomal storage disorders in the newborn.

    Pediatrics 2009, 123:1191-1207.

  4. Beaudet A, Scriver C, Sly W, Valle D: Genetics, biochemistry and molecular basis of variant human phenotypes. In The Metabolic and Molecular Bases of Inherited Disease. 7th edition. Edited by Scriver C, Beaudet A, Sly W, Valle D. New York: McGraw-Hill; 1995:53-118.

  5. Wraith JE: Lysosomal disorders.

    Semin Neonatol 2002, 7:75-83.

  6. Vellodi A: Lysosomal storage disorders.

    Br J Haematol 2005, 128:413-431.

  7. Meikle PJ, Hopwood JJ: Lysosomal storage disorders: Emerging therapeutic options require early diagnosis.

    Eur J Pediatr 2003, 162(Suppl 1):34-37.

  8. Altarescu G, Beeri R, Eiges R, Epsztejn-Litman S, Eldar-Geva T, Elstein D, Zimran A, Margalioth EJ, Levy-Lahad E, Renbaum P: Prevention of lysosomal storage diseases and derivation of mutant stem cell lines by preimplantation genetic diagnosis.

    Mol Biol Int 2012, 797342.

    doi:10.1155/2012/797342

  9. Harzer K, Cantz M, Sewell AC, Dhareshwar SS, Roggendorf W, Heckl RW, Schofer O, Thumler R, Peiffer J, Schlote W: Normomorphic sialidosis in two female adults with severe neurologic disease and without sialyl oligosacchariduria.

    Hum Genet 1986, 74:209-214.

  10. Peelen GO, De Jong J, Wevers RA: HPLC analysis of oligosaccharides in urine from oligosaccharidosis patients.

    Clin Chem 1994, 40:914-921.

  11. Harzer K, Rolfs A, Bauer P, Zschiesche M, Mengel E, Backes J, Kustermann-Kuhn B, Bruchelt G, Van Diggelen OP, Mayrhofer H, Krageloh-Mann I: Niemann-Pick disease type A and B are clinically but also enzymatically heterogeneous: Pitfall in the laboratory diagnosis of sphingomyelinase deficiency associated with the mutation Q292K.

    Neuropediatrics 2003, 34:301-306.

  12. Schäffer AA: Digenic inheritance in medical genetics.

    J Med Genet 2013, 50(10):641-652.

  13. Tucker T, Marra M, Friedman JM: Massively parallel sequencing: The next big thing in genetic medicine.

    Am J Hum Genet 2009, 85:142-154.

  14. Mardis E: The impact of next-generation sequencing technology on genetics.

    Trends Genet 2008, 24:133-141.

  15. García JL: Aceleración de la secuenciación genómica (rebajas en la secuenciación).

    2010.

    http://goo.gl/kpbXA

  16. Gnirke A, Melnikov A, Maguire J, Rogov P, Leproust EM, Brockman W, Fennell T, Giannoukos G, Fisher S, Russ C, Gabriel S, Jaffe DB, Lander ES, Nusbaum C: Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing.

    Nat Biotechnol 2009, 27:182-189.

  17. Agilent Technologies eArray [http://earray.chem.agilent.com/earray]

  18. McKernan KJ, Peckham HE, Costa GL, McLaughlin SF, Fu Y, Tsung EF: Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding.

    Genome Res 2009, 19:1527-1541.

  19. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA: The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

    Genome Res 2010, 20:1297-1303.

  20. Marco-Sola S, Sammeth M, Guigó R, Ribeca P: The GEM mapper: fast, accurate and versatile alignment by filtration.

    Nat Meth 2012, 1:7.

  21. Homer N, Merriman B, Nelson SF: BFAST: an alignment tool for large scale genome resequencing.

    PLoS ONE 2009, 4:e7767.

  22. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup: The sequence alignment/map format and SAMtools.

    Bioinformatics 2009, 25:2078-2079.

  23. Primer 3 [http://bioinfo.ut.ee/primer3-0.4.0/]

  24. R Project for Statistical Computing [http://www.r-project.org]

  25. ClinVar [http://www.ncbi.nlm.nih.gov/clinvar/]

  26. Human Genome Variation Society – Locus Specific Mutation Databases [http://www.hgvs.org/dblist/glsdb.html]

  27. Sorting Intolerant From Tolerant [http://sift.bii.a-star.edu.sg/]

  28. PolyPhen-2: prediction of functional effects of human nsSNPs [http://genetics.bwh.harvard.edu/pph2/]

  29. Mutation T@sting [http://www.mutationtaster.org/]

  30. Hamroun D, Desmet FO, Lalande M: Human Splicing Finding Error. [http://www.umd.be/HSF/4DACTION/input_SSF]

  31. Hoischen A, Van Bon B WM, Gilissen C, Arts P, Van Lier B, Steehouwer M, De Vries P, De Reuver R, Wieskamp N, Mortier G, Devriendt K, Amorim M, Revencu N, Kidd A, Barbosa M, Turner A, Smith J, Oley C, Henderson A, Hayes I, Thompson E, Brunner H, De Vries B, Veltman J: De novo mutations of SETBP1 cause schinzel-giedion syndrome.

    Nat Genet 2010, 42:483-485.

  32. Badano JL, Katsanis N: Beyond mendel: an evolving view of human genetic disease transmission.

    Nat Rev Genet 2002, 3:779-789.

  33. Klee EW, Hoppman-Chaney NL, Ferber MJ: Expanding DNA diagnostic panel testing: is more better?

    Expert Rev Mol Diagn 2011, 11:703-709.

  34. Mardis ER: New strategies and emerging technologies for massively parallel sequencing: applications in medical research.

    Genome Med 2009, 1:40.

  35. Baudhuin LM: A new era of genetic testing and its impact on research and clinical care.

    Clin Chem 2012, 58:1070-1071.

  36. Gilissen C, Arts HH, Hoischen A, Spruijt L, Mans DA, Arts P, Van Lier B, Steehouwer M, Van Reeuwijk J, Kant SG, Roepman R, Knoers NV, Veltman JA, Brunner HG: Exome sequencing identifies WDR35 variants involved in Sensenbrenner syndrome.

    Am J Hum Genet 2010, 87:418-423.

  37. Ng SB, Buckingham KJ, Lee C, Bigham AW, Tabor HK, Dent KM, Huff CD, Shannon PT, Jabs EW, Nickerson DA, Shendure J, Bamshad MJ: Exome sequencing identifies the cause of a mendelian disorder.

    Nat Genet 2010, 42:30-35.

  38. Ng SB, Bigham AW, Buckingham KJ, Hannibal MC, McMillin MJ, Gildersleeve H, Beck A, Tabor H, Cooper G, Mefford H, Lee C, Turner E, Smith J, Rieder M, Yoshiura K, Matsumoto N, Ohta T, Niikawa N, Nickerson D, Bamshad M, Shendure J: Exome sequencing identifies MLL2 mutations as a cause of kabuki syndrome.

    Nat Genet 2010, 42:790-793.

  39. Haack TB, Danhauser K, Haberberger B, Hoser J, Strecker V, Boehm D, Uziel G, Lamantea E, Invernizzi F, Poulton J, Rolinski B, Iuso A, Biskup S, Schmidt T, Mewes HW, Wittig I, Meitinger T, Zeviani M, Prokisch H: Exome sequencing identifies ACAD9 mutations as a cause of complex I deficiency.

    Nat Genet 2010, 42:1131-1134.

  40. Choi M, Scholl UI, Ji W, Liu T, Tikhonova IR, Zumbo P, Nayir A, Bakkaloglu A, Ozen S, Sanjad S, Nelson-Williams C, Farhi A, Mane S, Lifton R: Genetic diagnosis by whole exome capture and massively parallel DNA sequencing.

    Proc Natl Acad Sci U S A 2009, 106:19096-19101.

  41. Ng SB, Turner EH, Robertson PD, Flygare SD, Bigham AW, Lee C, Shaffer T, Wong M, Bhattacharjee A, Eichler EE, Bamshad M, Nickerson DA, Shendure J: Targeted capture and massively parallel sequencing of 12 human exomes.

    Nature 2009, 461:272-276.

  42. Vissers LELM, De Ligt J, Gilissen C, Janssen I, Steehouwer M, De Vries P, Van Lier B, Arts P, Wieskamp N, Del Rosario M, Van Bon BW, Hoischen A, De Vries BB, Brunner HG, Veltman JA: A de novo paradigm for mental retardation.

    Nat Genet 2010, 42:1109-1112.

  43. Wendl MC, Wilson RK: The theory of discovering rare variants via DNA sequencing.

    BMC Genomics 2009, 10:485.

  44. Mefford HC: Diagnostic exome sequencing — are we there yet?

    N Engl J Med 2012, 367:1951-1953.

  45. Hoischen A, Gilissen C, Arts P, Wieskamp N, van der Vliet W, Vermeer S, Steehouwer M, De Vries P, Meijer R, Seiqueros J, Knoers NV, Buckley MF, Scheffer H, Veltman JA: Massively parallel sequencing of ataxia genes after array-based enrichment.

    Hum Mutat 2010, 31:494-499.

  46. Audo I, Bujakowska KM, Leveillard T, Mohand-Said S, Lancelot ME, Germain A, Antonio A, Michiels C, Saraiva JP, Letexier M, Sahel JA, Bhattacharya SS, Zeitz C: Development and application of a next-generation-sequencing (NGS) approach to detect known and novel gene defects underlying retinal diseases.

    Orphanet J Rare Dis 2012, 7:8.

  47. Calvo SE, Compton AG, Hershman SG, Lim SC, Lieber DS, Tucker EJ, Laskowski A, Garone C, Liu S, Jaffe DB, Christodoulou J, Fletcher JM, Bruno DL, Goldblatt J, Dimauro S, Thorburn DR, Mootha VK: Molecular diagnosis of infantile mitochondrial disease with targeted next-generation sequencing.

    Sci Transl Med 2012, 4:118ra10.

  48. Shearer AE, DeLuca AP, Hildebrand MS, Taylor KR, Gurrola J, Scherer S, Scheetz T, Smith R: Comprehensive genetic testing for hereditary hearing loss using massively parallel sequencing.

    Proc Natl Acad Sci U S A 2010, 107:21104-21109.

  49. Otto EA, Hurd TW, Airik R, Chaki M, Zhou W, Stoetzel C, Patil SB, Levy S, Ghosh AK, Murga-Zamalloa CA, Van Reeuwijk J, Letteboer SJ, Sang L, Giles RH, Liu Q, Coene KL, Estrada-Cuzcano A, Collin RW, McLaughlin HM, Held S, Kasanuki JM, Ramaswami G, Conte J, Lopez I, Washburn J, Macdonald J, Hu J, Yamashita Y, Maher ER, Guay-Woodford LM, et al.: Candidate exome capture identifies mutation of SDCCAG8 as the cause of a retinal-renal ciliopathy.

    Nat Genet 2010, 42:840-850.

  50. Vasli N, Bohm J, Gras S, Muller J, Pizot C, Jost B, Echaniz-Laguna A, Laugel V, Tranchant C, Bernard R, Plewniak F, Vicaire S, Levy N, Chelly J, Mandel JL, Biancalana V, Laporte J: Next generation sequencing for molecular diagnosis of neuromuscular diseases.

    Acta Neuropathol 2012, 124:273-283.

  51. Kalender Atak Z, De Keersmaecker K, Gianfelici V, Geerdens E, Vandepoel R, Pauwels D, Porcu M, Lahortiga I, Brys V, Dirks W, Quentmeier , Cloos J, Cuppens H, Uyttebroeck N, Vandenberghe , Cools J, Aerts S: High accuracy mutation detection in leukemia on a selected panel of cancer genes.

    PLoS ONE 2012, 7:e38463.

  52. Wei X, Ju X, Yi X, Zhu Q, Qu N, Liu T, Chen Y, Jiang H, Yang G, Zhen R, Lan Z, Qi M, Wang J, Yang Y, Chu Y, Li X, Guang Y, Huang J: Identification of sequence variants in genetic disease-causing genes using targeted next-generation sequencing.

    PLoS ONE 2011, 6:e29500.

  53. Shen P, Wang W, Krishnakumar S, Palm C, Chi A, Enns G, Davis R, Speed T, Mindrinos M, Scharfe C: High-quality DNA sequence capture of 524 disease candidate genes.

    Proc Natl Acad Sci U S A 2011, 108:6549-6554.

  54. Bell CJ, Dinwiddie DL, Miller NA, Hateley SL, Ganusova EE, Mudge J, Langley RJ, Zhang L, Lee CC, Schilkey FD, Sheth V, Woodward JE, Peckham HE, Schroth GP, Kim RW, Kingsmore SF: Carrier testing for severe childhood recessive diseases by next-generation sequencing.

    Sci Transl Med 2011, 3:65ra4.