Candidate gene list. OMIM, Online Mendelian Inheritance in Man; AR, autosomal recessive; AD, autosomal dominant; XL, X-linked; nd, not defined; *, in progress (OMIM)
Inherited macrothrombocytopenias comprise a heterogeneous group of inherited platelet disorders that are characterized by large platelets, thrombocytopenia and bleeding tendencies in affected individuals. Diagnostic platforms have traditionally involved a battery of complex phenotypic tests that often fail to reach a diagnosis. Next-generation sequencing lacks the pre-analytical and analytical shortcomings of these tests and provides an attractive alternative diagnostic approach. Our group has developed a candidate gene array targeting genes known to affect platelet function and tested it in a large cohort of Australasian patients with presumed platelet function disorders, particularly macrothrombocytopenia. This array identified causative variants in a significant proportion of patients with uncharacterized platelet disorders, including transcription factor mutations that cannot easily be diagnosed with standard platelet phenotyping procedures. We propose that targeted genotypic screening can identify the genetic basis of platelet function defects and has the potential to be developed into a powerful clinical platform to help clinicians diagnose these rare disorders.
- Inherited macrothrombocytopenia
- next-generation sequencing
- candidate gene array
Platelets are essential for clot formation after tissue trauma. Initiation of the platelet plug occurs by adhesion of platelets to the damaged vascular endothelium, mediated by interactions of glycoprotein Ib/IX/V complexes with von Willebrand factor (vWF), and of GPVI and integrin α2β1 with collagen. Extension of the platelet plug requires activation of αIIbβ3 through an “inside-out” signaling cascade, which enables receptor cross-linking with fibrinogen and vWF and activation of “outside-in” signaling events [1, 2].
Primary hemostasis relies on both adequate function and number of platelets. Abnormalities in platelet function and/or number may be acquired (liver disease, chronic kidney disease) or inherited (inherited platelet function disorders, IPFDs, or inherited platelet number disorders, IPNDs). The group of inherited macrothrombocytopenias is included in the heterogeneous IPNDs and is characterized by large platelets, thrombocytopenia and bleeding tendencies in affected individuals (Figure 1A, Figure 1B, Figure 1C and Figure 1D).
Unfortunately, inherited macrothrombocytopenia is under-recognized, with the presence of large platelets on blood film examination often leading to a misdiagnosis of immune thrombocytopenic purpura (ITP), resulting in subsequent inappropriate treatment with steroids or, in some cases, removal of the spleen. Diagnostic algorithms have traditionally been based around biological laboratory tests examining functional properties and activation pathways in isolated platelets [3, 5–7]. This phenotypic approach is poorly standardized, technically difficult and not easily reproducible [6–11]. In addition, numerous pre-analytical variables may affect phenotypic test results. These variables include the effect of food (garlic), alcohol, drugs (herbal remedies, non-steroidal anti-inflammatory drugs, anti-platelet medications) and stimulants (smoking and caffeine) on platelet function, activation of platelet samples during venipuncture and transport, which necessitates careful sample handling, and the relatively large volume of blood needed (which becomes a major problem when assessing pediatric samples) [12–14]. Despite these complex phenotypic tests, many cases remain without a definitive diagnosis.
Genetic technology may overcome many of the problems surrounding phenotypic testing for thrombocytopenia as DNA is stable, can easily be transported long distances and is not affected by diet or drugs. Moreover, genetic-based tests have provided opportunities to reduce redundancy and heterogeneity of diagnostic algorithms and have shifted our ability to describe inherited platelet disorders from a level of the defective platelet pathway involved, to a molecular level.
The Sanger sequencing method has long been considered the “gold standard” technology to rapidly analyze small regions across a limited number of samples, but it is not suited to screening large numbers of genes in multiple patients. Next-generation sequencing (NGS) technologies have emerged as a diagnostic approach that generates more sequence per test, increasing the number of gene targets and decreasing costs [17, 18]. Human whole genome sequencing (WGS) and whole exome sequencing (WES) [19, 20] have proven to be clinically appropriate and practical modalities for describing new genetic mutations in families and identifying known pathogenic mutations in individuals formerly without a diagnosis.
Testing approaches may vary depending on whether a novel genetic mutation is likely. WGS and WES are powerful platforms for discovering novel causal variants in individuals with rare penetrant monogenic disorders, whilst a candidate gene approach allows assessment of known mutations in genes causing clinical phenotypes.
Whole genome approaches incorporating NGS have recently reported novel mutations in an essential platelet transcription factor GFI1B [22, 23], and a WES approach followed by targeted Sanger sequencing was used successfully to describe mutations in ACTN1 causing macrothrombocytopenia [24, 25]. Acknowledging these advancements, we employed a targeted candidate gene approach to explore cases of suspected inherited macrothrombocytopenia that remained uncharacterized despite phenotypic testing and hypothesized this to be an effective approach to diagnose inherited macrothrombocytopenia.
2. Materials and methods
Diagnostic assessment of patients with uncharacterized thrombocytopenia was performed as part of a human research ethics committee approved study conducted in accordance with the Declaration of Helsinki.
Following informed written consent, 20 ml of blood was taken from an antecubital vein and collected into EDTA tubes. This blood was easily transported, in some cases, over 1,000 km between diagnostic sites in Australia.
A total of 95 patient DNA samples were analyzed. This included two internal controls for which DNA-based diagnosis had previously been established by Sanger sequencing.
32 male patients (mean age 37.4 years, range 18–92 years) and 44 female patients (mean age 38.7 years, range 18–79 years) were included in the NGS assay. The mean age of the cohort was 38.1 years (range 18–92 years). Sixteen de-identified DNA samples were received from referring institutions for which no additional laboratory data were available.
Phenotypic testing data were available for 59 (62.1%) individuals. This included platelet functional analysis (PFA) (n = 25, 26.0% of the cohort), light transmission aggregometry / whole blood impedance aggregometry (LTA/WBIA) (n = 39, 41.3% of the cohort), flow cytometry (n = 45, 47.8% of the cohort) and electron microscopy (n = 12, 13% of the cohort). These phenotypic test results suggested a diagnosis to a “pathway level”, that is, a description to the level of the suspected defective biochemical pathway, in only 11 cases. Pathway-oriented defects included storage pool disorders (n = 3), platelet glycoprotein deficiency (n = 3), platelet signaling defects (n = 2), platelet secretion defects (n = 2) and α-granule disorder (n = 1).
2.2. DNA preparation
Genomic DNA (gDNA) was isolated from peripheral blood leukocytes using the Wizard® Genomic DNA purification kit (Promega, Alexandria, NSW, Australia). DNA quality and concentration were assessed using the Nanodrop™ 1000 spectrophotometer (Thermo Scientific, Scoresby, Vic, Australia), which measures the purity of DNA by the ratio of absorbance at 260 and 280 nm. Samples with ratios between 1.8 and 2.0 were accepted for analysis, whilst lower ratios may indicate the presence of contaminants and these samples were not processed further. At least 250 ng of input gDNA was prepared per sample.
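The acceptance criteria described above can be expressed as a simple check. The following is a minimal sketch only: the function name and parameters are hypothetical, and it assumes that the 1.8 to 2.0 window and the 250 ng minimum are applied together as hard thresholds.

```python
def passes_dna_qc(a260: float, a280: float, input_ng: float,
                  ratio_range: tuple = (1.8, 2.0),
                  min_input_ng: float = 250.0) -> bool:
    """Return True if a sample meets the purity and quantity criteria.

    a260, a280 : absorbance readings at 260 nm and 280 nm
    input_ng   : amount of gDNA available for library preparation (ng)
    The thresholds default to the values quoted in the text; treating
    them as strict cut-offs is an assumption of this sketch.
    """
    if a280 <= 0:
        # A non-positive reading cannot give a meaningful ratio.
        return False
    ratio = a260 / a280
    low, high = ratio_range
    return low <= ratio <= high and input_ng >= min_input_ng


# Illustrative readings (not data from the study):
print(passes_dna_qc(1.9, 1.0, 300.0))  # ratio 1.9, enough DNA
print(passes_dna_qc(1.5, 1.0, 300.0))  # ratio too low, possible contaminants
```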
2.3. Candidate gene identification and gene panel design
An extensive literature search using public databases was performed to assemble an initial candidate gene list of all genes reasonably hypothesized to have an impact on platelet number and size (n = 173). A final list of candidate genes (n = 19) was derived by including those genes in which mutations were known to be definitively associated with IPNDs (predominantly, macrothrombocytopenia) and by excluding genes, which although known to result in thrombocytopenia, could easily be identified by conventional and clinical methods characterized by distinct clinical phenotypes.
A TruSeq custom amplicon (TruSeq® Custom Amplicon Kit, Illumina Inc., Scoresby, Vic, Australia) specific for the target regions of the selected 19 genes (Table 1, ACTN1, CD36, ETS1, F2R, FLI1, GATA1, GFI1B, GP1BA, GP1BB, GP6, GP9, ITGA2, ITGA2B, ITGB1, ITGB3, MYH9, NBEAL2, P2RY12, RUNX1, TUBB1) was designed as an entire custom pool using the web-based software tool, Illumina Design Studio (Illumina Inc.). This generated 201 gene targets that were either exons or gene regions that were split into 632 amplicons, each of approximately 250 base pairs (bps). There were no undesignable targets and a total coverage of 91% was predicted for the panel.
|Gene||Description (OMIM)||Inheritance||Disorder (abbreviation in this paper, OMIM entry)|
|ACTN1||Alpha-Actinin-1||AD||α actinin-related thrombocytopenia (α actinin-RT, 615193)|
|CD36 (GPIV)||Thrombospondin receptor (Glycoprotein IV)||AD||Familial thrombocytopenia with GPIV deficiency (nd, 608404)|
|ETS1||V-Ets avian erythroblastosis virus E26 oncogene homolog 1||nd||nd|
|F2R||Coagulation factor II (thrombin) receptor||nd||nd|
|FLI1||Friend leukaemia virus integration 1||AD||Paris-Trousseau syndrome / Jacobsen syndrome (TCPT/JBS, 188025, 600588)|
|GATA1||GATA-binding protein 1||XL||GATA1-related disorders (GATA1-RD, 300367, 314050)|
|GFI1B||Growth factor-independent 1B||AD||GFI1B-related thrombocytopenia (GFI1B-RT, 187900)|
|GP1BA||Glycoprotein 1b-alpha polypeptide||AR||Bernard Soulier syndrome (BSS, 231200); Platelet type-von Willebrand disease (PT-VWD, 177820); Velocardiofacial syndrome (VCFS, 192430); Mediterranean thrombocytopenia (nd, 153670)|
|GP1BB||Glycoprotein 1b-beta polypeptide||AR||Bernard Soulier syndrome (BSS, 231200)|
|GP6||Glycoprotein VI||AR*||Bleeding disorder, platelet type 11|
|GP9||Glycoprotein IX||AR||Bernard Soulier syndrome (BSS, 231200)|
|ITGA2||Integrin, alpha-2||AR||GPIa/IIa deficiency (giant platelets and mitral valve insufficiency) (nd,nd)|
|ITGA2B||Integrin, alpha-2B||AD||Monoallelic ITGA2B/ITGB3-related thrombocytopenia (ITGA2B/ITGB3-RT, 187800)|
|ITGB1||Integrin, beta-1||AR||GPIa/IIa deficiency (giant platelets and mitral valve insufficiency) (nd,nd)|
|ITGB3||Integrin, beta-3||AD||Monoallelic ITGA2B/ITGB3-related thrombocytopenia (ITGA2B/ITGB3-RT, 187800)|
|MYH9||Myosin heavy-chain 9||AD||MYH9-related disease (MYH9-RD, 155100)|
|NBEAL2||Neurobeachin-like 2||AR||Gray platelet syndrome (GPS, 139090)|
|P2RY12||Purinergic receptor P2Y, G protein-coupled 12||AR*||Bleeding disorder, platelet type 8|
|RUNX1||Runt-related transcription factor 1||AD||Platelet disorder, familial, with associated myeloid malignancy (FPD/AML, 601399)|
|TUBB1||Tubulin, beta-1||AD||β1 Tubulin-related thrombocytopenia (β1 tubulin-RT, 613112)|
2.4. Next-generation sequencing
The TruSeq custom amplicon library preparation kit and the MiSeq Illumina sequencer platform (Illumina Inc.) were used to create the sequencing library and perform resequencing, respectively. All steps were performed in-house according to the manufacturer’s instructions [27, 28].
Library preparation was performed by enrichment of the target regions using an amplicon-based multiplex polymerase chain reaction (PCR) method. Here, a custom amplicon tube (CAT) containing upstream and downstream oligonucleotides specific for the target regions was hybridized to the unfragmented gDNA samples in a 96-well plate. Unbound oligonucleotides were then removed by a series of wash steps using manufacturer supplied reagents. A proprietary extension–ligation mix containing DNA polymerase and ligase (Illumina Inc.) extended and ligated the upstream bound oligonucleotide through the targeted region to the 5′ end of the downstream oligonucleotide. The resulting extension–ligation products containing the targeted genomic region flanked by common sequences required for amplification were then amplified by standard PCR on a thermal cycler. The amplicon size (250 bps), the number of amplicons in the CAT (632 amplicons) and the type of input DNA (high quality) determined the number of PCR cycles (n = 24). The PCR reaction incorporated two unique, sample-specific, multiplexing index sequences (barcoding) that would later be used by the alignment software (MiSeq reporter) to identify individual samples following library pooling, and common adapters required for cluster generation. PCR products were purified by AMPure XP beads (Beckman Coulter, Lane Cove, NSW, Australia) and the quantity of each library was normalized by an integrated bead-based method. Equal volumes of the normalized libraries were then combined, diluted in hybridization buffer (Illumina Inc.) and heat denatured.
The MiSeq Illumina instrument was used to resequence the pooled library by paired-end sequencing. The DNA library was immobilized to the single-use glass-based MiSeq flow cell through the adapter sequences. Bridge PCR amplification then generated clusters of clonal copies of each DNA molecule. These templates were then sequenced using platform-specific reversible dye terminator sequencing-by-synthesis chemistry. Sequence alignment to the reference genome (GRCh37/hg19) was performed using on-instrument software (MiSeq reporter software, Illumina Inc.) that aligned the reads in BAM format and output variant calls in .vcf files. Variant calls were generated using ANNOVAR software (http://www.openbioinformatics.org/annovar) with an acceptance threshold Q-score of 30, corresponding to a 1:1000 error rate, and genomic datasets were viewed using the Integrative Genomics Viewer (IGV) (www.broadinstitute.org/igv/). Sanger sequencing was performed to provide data for bases with insufficient coverage and to validate variants of clinical significance.
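The correspondence between a Q-score of 30 and a 1:1000 error rate follows from the standard Phred definition, Q = -10 log10(P), where P is the probability that a base call is wrong. The short sketch below illustrates the conversion in both directions; the function names are illustrative and are not part of any software used in this study.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q to a base-call error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert a base-call error probability P (0 < P <= 1) to a Phred score."""
    return -10 * math.log10(p)


# Q30 corresponds to roughly 1 error per 1000 base calls,
# Q20 to roughly 1 per 100.
for q in (20, 30):
    print(q, phred_to_error_probability(q))
```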
2.5. Data analysis
The University of California, Santa Cruz (UCSC), genome browser (http://genome.ucsc.edu) was used for variant analysis and variants were cross-checked against databases including the NHLBI Exome Sequencing Project (ESP), the 1000 Genomes Project Database and the Database of Single-Nucleotide Polymorphisms (dbSNP, http://www.ncbi.nlm.nih.gov/SNP/). The bioinformatic tools Sorting Intolerant From Tolerant (SIFT, http://sift.jcvi.org/), Polymorphism Phenotyping-2 (PolyPhen-2, http://genetics.bwh.harvard.edu/pph2/) and MutationTaster (http://www.mutationtaster.org/) were used to predict variant effects on protein structure and function for variants lacking published literature.
2.6. Nomenclature and descriptions for variant reporting
All variants identified were annotated according to Human Genome Variation Society (HGVS) nomenclature for clinical reporting (http://www.hgvs.org). The variant elements included gene name, zygosity, cDNA nomenclature, protein nomenclature, exon number and clinical assertion.
Descriptions of sequence variations were adapted from the American College of Medical Genetics and Genomics (ACMG) recommendations for standards for interpretation and reporting of sequence variations and are listed below:
Pathogenic: The sequence variation has been reported in the literature and is a recognized cause of the disorder.
Likely pathogenic: The sequence variation is previously unreported and is of the type that is expected to cause the disorder.
Variant of uncertain significance (VUS): The sequence variation is previously unreported and is of the type which may or may not be causative of the disorder.
Likely non-pathogenic: The sequence variation is previously unreported and is probably not causative of disease.
Non-pathogenic: The sequence variation is previously reported and is a recognized neutral variant.
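The reported variant elements and the five descriptive categories listed above lend themselves to a small structured record. The sketch below is illustrative only: the class and field names are hypothetical, and the clinical assertion attached to the example variant is a placeholder chosen for demonstration, not a classification taken from this study.

```python
from dataclasses import dataclass
from enum import Enum


class ClinicalAssertion(Enum):
    """The five descriptive categories used for variant reporting."""
    PATHOGENIC = "Pathogenic"
    LIKELY_PATHOGENIC = "Likely pathogenic"
    VUS = "Variant of uncertain significance"
    LIKELY_NON_PATHOGENIC = "Likely non-pathogenic"
    NON_PATHOGENIC = "Non-pathogenic"


@dataclass(frozen=True)
class VariantReport:
    """One reported variant, holding the elements named in the text."""
    gene: str
    zygosity: str
    cdna: str       # cDNA-level HGVS description
    protein: str    # protein-level HGVS description
    exon: int
    assertion: ClinicalAssertion

    def summary(self) -> str:
        return (f"{self.gene}, {self.zygosity}, {self.cdna} "
                f"({self.protein}), exon {self.exon}: {self.assertion.value}")


# Example record; the assertion is an illustrative assumption.
example = VariantReport("ACTN1", "heterozygous", "c.580G>A",
                        "p.Gly194Arg", 6, ClinicalAssertion.VUS)
print(example.summary())
```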
3.1. Next-generation sequencing platform performance
Next-generation sequencing on the Illumina platform produced 13 690 589 (96.74%) reads that passed initial filtering. This process removes any clusters demonstrating excessive intensity corresponding to bases other than the called base. Only reads that passed the quality filter were assigned a quality score. A quality score of Q30 was accepted in the run predictive of an error probability of ≤0.1%. One sample was excluded from analysis due to poor DNA quality that generated poor-quality scores across all genomic regions.
Overall coverage across all genomic targets was 92.3%. This was consistent with the initial software prediction.
3.2. Candidate gene panel results
A total of 703 non-synonymous variants were detected; 75 of these variants were novel and had not been reported in the dbSNP database. An average of eight non-synonymous variants was detected per patient.
Two individuals with mutations in GFI1B, GP1BA and GP9 previously established by Sanger sequencing were included as controls. NGS successfully called the first, GFI1B c.880-881insC, but failed to detect the second, a patient with a phenotype consistent with the inherited macrothrombocytopenia Bernard-Soulier syndrome (BSS). This patient’s genotype had previously been confirmed by Sanger sequencing and included mutations in both the GP1BA (GP1BA c.2217C>T) and the GP9 genes (c.1829A>G and c.1859T>G). Failure to detect these mutations may have been caused by sequencing errors introduced by GC-rich motifs in these regions [36, 37].
Pathogenic mutations were detected in 16 individuals (17.4% of the cohort) whilst 36 individuals (39.1%) had VUS and 40 individuals (43.5%) were without identifiable pathogenic mutations (Table 2, Table 3).
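As a cross-check, the proportions can be recomputed from the counts. This sketch assumes that the denominator is the sum of the three groups (92 individuals), which is an inference from the counts rather than a figure stated explicitly; any difference from a reported percentage would point to a different denominator.

```python
# Counts of individuals in each outcome group, as reported in the text.
counts = {
    "pathogenic": 16,
    "vus": 36,
    "no_pathogenic_variant": 40,
}

# Assumed denominator: the three groups partition the analysed cohort.
total = sum(counts.values())

# Percentage of the cohort in each group, rounded to one decimal place.
percentages = {group: round(100 * n / total, 1) for group, n in counts.items()}

print(total)
for group, pct in percentages.items():
    print(f"{group}: {pct}%")
```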
|Genes||Number of individuals with pathogenic mutations||Number of mutations detected of uncertain significance|
|Total Number||16||60 mutations in 36 individuals|
|Number of individuals without pathogenic mutations identified: 40|
|Gene||Chromosome||Zygosity||Nucleotide change||Protein alteration||Exon|
The candidate array was successful in detecting mutations in genes commonly associated with macrothrombocytopenia and included a total of nine MYH9 mutations (six of which had previously been reported in the literature as pathogenic and three of which are of uncertain significance) (Figure 2) and a compound heterozygous mutation of NBEAL2 in keeping with Gray platelet syndrome.
A homozygous mutation of ITGA2B was also detected and confirmed a suspected Glanzmann thrombasthenia phenotype. Several transcription factor variants were found, including a FLI1 mutation of uncertain significance in one patient, three GATA1 mutations of uncertain significance in three individuals from two families, three pathogenic GFI1B mutations in three individuals from two families and two of uncertain significance in two individuals in another two families. RUNX1 mutations were identified in three individuals from three families; two of these were considered likely pathogenic, whilst the third was shown to represent a false positive result (RUNX1, heterozygous, stop/gain, c.966T>G (p.Tyr322X), exon 6). False positivity was confirmed by Sanger sequencing that showed a wild-type sequence across that region.
Sanger sequencing was also performed in selected samples across regions of low coverage (Q < 30) in genes whose clinical significance is widely accepted, including GP9, GP1BA, GP1BB, FLI1 exon 3, FLI1 exon 9, MYH9 exon 20, MYH9 exon 37 and GFI1B exon 5. This confirmatory step detected a novel mutation in FLI1 that was not identified by NGS.
The diagnosis of IPFDs and IPNDs using classic phenotypic methods poses a challenge to clinicians and laboratory scientists due to lack of consensus over classification and diagnostic criteria, poor standardization of tests and heterogeneity of traditional diagnostic approaches. This diagnostic conundrum is evident in our cohort, where only 11 patients received a suspected diagnosis to a pathway level following multiple previous phenotypic tests. In addition, only 62% of patients received any form of phenotypic test, reflecting the difficulty of accessing these specialized techniques in many centers.
Sanger sequencing is widely regarded as a reliable platform for routine diagnostic genetic testing and small-scale projects. However, effective analysis of numerous disease-associated genes by Sanger sequencing in a diagnostic setting is time-consuming, expensive and not always feasible. A candidate gene array was selected as it has the potential to simultaneously analyze all of the selected coding regions of disease-targeted genes. Moreover, relative to WES and WGS, it provides good gene coverage and representation of exons, is relatively fast and cheap, and minimizes the problems of unexpected findings and the need to develop complex downstream bioinformatic pipelines for analysis.
We have demonstrated that high-quality sequence data can be generated from a candidate group of platelet genes using the Illumina MiSeq platform. Our candidate gene panel comprised 19 genes associated with IPNDs, predominantly inherited macrothrombocytopenia. Pathogenic mutations were detected in 17.4% of the cohort. The largest number of mutations was detected in the MYH9 gene. MYH9-related disorders are the most common forms of inherited thrombocytopenia and are frequently under-recognized or misdiagnosed as ITP [40–42]. Immunofluorescence staining of the peripheral blood film demonstrating abnormal clustering of non-muscle myosin heavy chain IIA (NMMHC-IIA), seen as Döhle bodies on the blood film, is regarded as a suitable diagnostic test, but is not available at all centers. A strong genotype–phenotype relationship is recognized in these disorders, with mutations affecting the motor (head and neck) region of NMMHC-IIA causing more severe thrombocytopenia and a higher risk for nephritis, cataracts and deafness, whilst those mutations affecting the tail region cause less severe thrombocytopenia and extra-hematological manifestations [43, 44]. Genetic confirmation of MYH9-related disorders, therefore, has prognostic significance. In our group of patients, three pathogenic mutations in five individuals were detected and were predicted to affect the motor region of NMMHC-IIA. Knowledge of these mutations has provided an opportunity to offer advice regarding additional non-hematological surveillance tests such as audiograms, renal function assessments and ophthalmological screening for cataracts [40, 41, 45].
Transcription factors are the key regulators for the development of the hemostatic platelet from blood stem cells. Stem cells differentiate into a bipotent megakaryocyte-erythroid progenitor, then a committed megakaryocyte that undergoes endoreplication prior to extending proplatelet extensions from the cytoplasm into the bone marrow sinusoid, forming platelets. This complex differentiation pathway is orchestrated by the activation and repression of groups of genes important for blood cell development via transcription factors [46, 47]. The candidate gene panel contained four genes that encode hemopoietic transcription factors: FLI1, GATA1, GFI1B and RUNX1. Definitive diagnosis of platelet disorders caused by mutations in these genes solely by phenotypic testing is not possible. We detected a pathogenic mutation in one of these genes, GFI1B, and likely pathogenic mutations in RUNX1. The RUNX1 gene is responsible for the familial platelet disorder with a predisposition to acute myeloid leukemia (FPD/AML). The propensity to develop acute leukemia is determined by the action of the variant, with dominant negative and haploinsufficient mutations having different leukemogenic risk. The former has a higher risk (up to 40% in some reports) of progression to AML or myelodysplastic syndrome [49–51]. Other factors include the residual level of activity of wild-type RUNX1, deregulation induced by dominant negative mutations on hemopoietic stem cell genes such as NR4A3, as well as effects on p53-dependent genes that induce genomic instability of the granulomonocytic precursors. The median age of onset of progression to myelodysplastic syndrome / acute leukemia is 33 years, and therefore the detection of two likely pathogenic RUNX1 mutations by our candidate gene panel is of obvious importance.
Despite their adverse risk, clinical guidelines regarding the best way to counsel, test and manage these patients and their family members are lacking and recommendations are largely based on expert opinion. Initial referral to a specialist team comprising a physician and a genetic counselor is recommended, as are full blood count analysis, bone marrow biopsy (to detect occult malignancy) and full human leukocyte antigen (HLA) typing of patients and their first-degree relatives (in the event a bone marrow transplant is required in the future). A biannual follow-up schedule thereafter should be established to ensure close hematological surveillance. GFI1B is another transcription factor that plays an essential role in hematopoiesis [46, 55]. Two recent publications [22, 23] described mutations in the DNA-binding zinc finger domain of GFI1B causing an autosomal dominant bleeding disorder in affected families. Our candidate gene array detected another mutation in a non-DNA-binding zinc finger domain of GFI1B (GFI1B c.503G>T). Further characterization of this c.503G>T mutation indicates a milder platelet phenotype with less clinical bleeding symptomatology than the DNA-binding mutants (Figure 3). The detection of this non-DNA-binding mutation has afforded us an opportunity to propose a genotype–phenotype relationship associated with mutations in two different regions of GFI1B. This is important to enable classification, aid diagnosis and inform treatment strategies.
The yield of pathogenic variants reported above may have been improved by more stringent patient selection criteria. In this study, all patients suspected of an inherited thrombocytopenia by treating hematologists were included regardless of the platelet phenotype. That is, not all patients demonstrated macrothrombocytopenia. In addition, in 16 cases only DNA was received and the platelet phenotype was not known. Noting that 15 of the 19 genes on the candidate panel are known to cause macrothrombocytopenia and that only 5 genes on the panel (ETS1, P2RY12, F2R, GP6, RUNX1) have an uncertain platelet phenotype or are otherwise known to cause functional disorders with normal-sized platelets, the pre-test probability of detecting a pathogenic variant in samples where macrothrombocytopenia was not present was low. Furthermore, this candidate array was performed in a research laboratory and therefore included genes (ETS1 and F2R) where the association with inherited thrombocytopenia is not well delineated. Restricting the panel to genes with clear evidence of disease association may further improve the diagnostic yield.
Variants of uncertain significance (VUS) were detected in over a third of the cohort (39.1%). Thirteen samples contained more than one VUS. One sample contained five VUS in five different genes (GFI1B, ITGA2, MYH9, NBEAL2 and TUBB1). In many instances, these variants were novel. It is likely that, as knowledge of the genes causing inherited platelet bleeding disorders increases, this percentage will decrease, with each VUS becoming recognized as either pathogenic or definitely non-pathogenic. Our analytical pathway used three bioinformatic tools (SIFT, PolyPhen-2, MutationTaster) to assist annotation of variants lacking published literature. Bioinformatic tools using sequence and/or structure to predict the effects of amino acid substitutions on protein function have been developed following observations that disease-causing mutations are more likely to occur at positions that show evolutionary conservation and/or common structural features which enable them to be distinguished from neutral substitutions [57–60]. These tools serve to guide future experiments and should not be used solely as a clinical predictor of pathogenicity. Consider the ACTN1 missense mutation (ACTN1, heterozygous, c.580G>A [p.Gly194Arg], exon 6, rs145918825) detected in our candidate gene array. It is predicted to disturb the calponin homology domain (CHD) within the actin-binding domain (ABD) of α-actinin (an important platelet structural protein). All of the ACTN1 mutations described in the literature to date lie within the functional domains (ABD and the C-terminal calmodulin-like domain [CaM]) but not within the spacer spectrin repeats [25, 61, 62]. Bioinformatic tools were applied to this variant. It is predicted to be deleterious by SIFT (a sequence homology-based tool), whereas PolyPhen-2 (a structure/sequence-based tool) predicts the amino acid alteration to be benign. This highlights two points.
Firstly, it is advisable that predictions are made by integrating the results from several tools, as reliance on one tool may lead to incorrect annotation, and secondly, bioinformatic tools provide predictions only. In this case, the functional consequences of the ACTN1 DNA variant are yet to be described and thus the variant may or may not be significant. Further family studies and additional structural analyses of the protein may clarify the pathogenicity of the variant.
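The first point, integrating several in silico predictions rather than relying on one, can be sketched as a simple concordance check. This is a hypothetical illustration: the function name is invented, and reducing each tool's output to the two labels "deleterious" and "benign" is a simplifying assumption that follows the wording used above rather than the tools' native output categories.

```python
from typing import Mapping

# Simplified labels assumed for this sketch.
VALID_CALLS = {"deleterious", "benign"}


def summarize_predictions(calls: Mapping[str, str]) -> str:
    """Summarize agreement between tools.

    calls maps a tool name to its simplified prediction label.
    Returns 'concordant_deleterious', 'concordant_benign',
    'discordant', or 'no_prediction' when no calls are supplied.
    """
    labels = set(calls.values())
    unknown = labels - VALID_CALLS
    if unknown:
        raise ValueError(f"Unrecognized prediction labels: {sorted(unknown)}")
    if not labels:
        return "no_prediction"
    if len(labels) == 1:
        return "concordant_" + next(iter(labels))
    return "discordant"


# The ACTN1 c.580G>A predictions described in the text.
actn1_calls = {"SIFT": "deleterious", "PolyPhen-2": "benign"}
print(summarize_predictions(actn1_calls))
```

A discordant result of this kind is a prompt for further family or functional studies, not a classification in itself.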
Coverage is a crucial metric for establishing accuracy as well as analytical sensitivity and specificity of an NGS testing platform. Coverage requirements depend on the application of the NGS test. In general, sequencing more reads will increase the power of the assay. We determined the necessary coverage level based on recommendations from the Royal College of Pathologists of Australasia (RCPA), whose guidance complies with National Pathology Accreditation Advisory Council (NPAAC) standards for testing of human nucleic acids, and combined this advice with recommendations from published literature and other international bodies such as the ACMG. Our accepted Q score (Q30) was met in 92.3% of all genomic targets and in 97% of exonic targets. The read coverage distribution curve displayed a classic Poisson-like distribution indicating uniformity of coverage. These data, together with the high quality of base calls, suggested that the NGS platform is able to deliver reliable sequence data. However, there were also areas of lower coverage where the platform did not perform as well and lacked sensitivity. These regions were identified at genomic targets in FLI1, GP1BA, GP1BB, GP9, ITGB1 and NBEAL2 and were predicted in the design studio report. Two false negative results were confirmed in regions where coverage was low. The first was the failed detection of GP1BA and GP9 mutations in the second internal control sample and the second was a novel pathogenic mutation in FLI1 that was confirmed by Sanger sequencing and additional laboratory investigations. To ensure coverage of the respective amplicons over the GP9 region, parallel Sanger sequencing was performed. Targeted Sanger sequencing was also performed for GP1BA and GP1BB in cases in which phenotypic details had been provided by the referring clinician and where confident exclusion of a variant in those genes was necessary. Sanger sequencing performed over these regions did not detect additional mutations.
Only a single false positive result was confirmed by Sanger sequencing (RUNX1, stop/gain, c.966T>G). This suggested good platform specificity. The question as to whether confirmatory Sanger sequencing need be performed is debated in the literature [39, 67]. Proponents argue that it is required to confirm a diagnosis as well as to remove incorrect calls introduced by experimental errors. Opponents argue that, where the NGS platform performance metrics have been established to be comparable to Sanger sequencing performance measures, a strategy dictated by the degree of coverage per nucleotide should be adopted. Under this strategy, parallel Sanger sequencing need not be performed as long as the coverage is >30 times per nucleotide at that genomic target, confirmatory testing should be performed where coverage is less than 20 times, and the need for confirmation should be determined by visual inspection where coverage is between 20 and 30 times. The authors commented that the laboratory may also simply elect to exclude the target from the report if Sanger sequencing is not performed despite low coverage.
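The coverage-dictated strategy described above amounts to a three-way decision rule. The sketch below encodes it under stated assumptions: the function name and return labels are hypothetical, and treating depths of exactly 20 and exactly 30 as falling within the visual inspection band is an interpretation of "between 20 and 30 times".

```python
def sanger_confirmation_action(depth: int) -> str:
    """Map per-nucleotide coverage depth to a confirmation action.

    > 30x     : no parallel Sanger sequencing required
    < 20x     : confirmatory Sanger sequencing
    20 to 30x : decide by visual inspection of the reads
    """
    if depth < 0:
        raise ValueError("Coverage depth cannot be negative")
    if depth > 30:
        return "no_confirmation"
    if depth < 20:
        return "confirm_by_sanger"
    return "visual_inspection"


# Illustrative depths (not data from the study):
for depth in (12, 25, 45):
    print(depth, sanger_confirmation_action(depth))
```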
An important aspect of the post-analytical process is the timely provision of a genomic test report. In the setting of inherited platelet disorders, a false negative interpretation may lead to a falsely conservative bleeding prophylactic strategy at the time of surgery, in turn placing the individual at a potentially increased risk of bleeding. A false positive result, on the other hand, may cause undue stress to the individual and their family. The genomic test report was therefore carefully and consistently structured, taking into consideration recommendations from professional bodies such as the RCPA and ACMG. The report (Appendix 1) contained a summary of the genes analyzed, reflected the scope and limitations of the assay, and indicated the context in which the test was performed. A clear, succinct interpretative comment was made regarding the detected variant. This indicated whether or not the detected variant was associated with the clinical phenotype and highlighted variants of uncertain significance. The body of the report detailed, in a structured format (see materials and methods), any detected pathogenic or clinically relevant variants and whether these had been previously described. An interpretation of the significance of the detected variant was supported by relevant references where possible, and recommendations regarding additional validation tests and/or genetic counseling and clinical screening were provided. Following the main body of the report, DNA variants that were considered to be non-pathogenic were listed. The report concluded with a description of the test method and its limitations.
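As an illustration of the structured variant format used in the body of the report, the following Python sketch models one variant entry and renders it in the style of the example report in Appendix 1. The class and field names are hypothetical and are not taken from any reporting software used in this study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReportedVariant:
    """One variant entry in the structured body of a genomic test report."""
    gene: str
    zygosity: str
    cdna_change: str      # coding DNA change, e.g. "c.287C>T"
    protein_change: str   # protein change, e.g. "p.Ser96Leu"
    exon: int
    classification: str   # e.g. "pathogenic", "uncertain significance"
    dbsnp_id: Optional[str] = None  # omitted for novel variants

    def report_line(self) -> str:
        """Render the entry as a single comma-separated report line."""
        parts = [
            self.gene,
            self.zygosity,
            f"{self.cdna_change} ({self.protein_change})",
            f"Exon {self.exon}",
        ]
        if self.dbsnp_id:
            parts.append(self.dbsnp_id)
        parts.append(self.classification)
        return ", ".join(parts) + "."


if __name__ == "__main__":
    # Values taken from Variant 1 of the example report in Appendix 1.
    variant = ReportedVariant(
        gene="MYH9",
        zygosity="Heterozygous",
        cdna_change="c.287C>T",
        protein_change="p.Ser96Leu",
        exon=2,
        classification="pathogenic",
        dbsnp_id="rs121913657",
    )
    print(variant.report_line())
```

Keeping each entry in a fixed field order makes it straightforward to list pathogenic variants, variants of uncertain significance and likely non-pathogenic variants consistently across reports.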
In conclusion, our study has demonstrated the potential to successfully diagnose inherited macrothrombocytopenia in cases that remained uncharacterized by traditional phenotypic approaches. Optimization of this format will provide patients an opportunity for a “one stop, one step” testing platform that is cost-effective and not affected by the pre-analytical variables that hinder current testing methods based on functional analysis of platelets. However, the translation of NGS from a powerful research tool into the clinical laboratory will require co-operation from international groups to establish best practice, quality and reporting standards for these conditions, as well as to generate reliable databases that link platelet phenotypes to genotypes to provide best hemostasis clinician advice.
Test performed: Candidate gene array of 19 genes (ACTN1, CD36, F2R, FLI1, ETS1, GATA1, GFI1b, GP1BA, GP1BB, GP6, GP9, ITGA2, ITGA2B, ITGB1, ITGB3, MYH9, NBEAL2, P2RY12, RUNX1, TUBB1) using the Illumina MiSeq next-generation sequencing platform.
This test has been performed for research purposes only and has not been NATA accredited in our laboratory.
Validation by Sanger sequencing has not been performed on clinically significant or novel detected variants and should be considered by the referring clinician.
Result: A mutation in a gene known or predicted to be associated with decreased platelet counts and/or function has been identified. A second variant of uncertain significance has also been identified.
DNA variants: Variant 1: MYH9, Heterozygous, c.287C>T (p.Ser96Leu), Exon 2, rs121913657, pathogenic.
Variant 2: NBEAL2, Heterozygous, c.6178C>T (p.Arg2060Cys), Exon 37, uncertain significance.
Previously described: Variant 1: Yes (rs121913657)
Variant 2: No.
Interpretation: A heterozygous 287C-T transition in the MYH9 gene, resulting in a ser96-to-leu (S96L) substitution, has been predicted to disturb the helical region of the protein, resulting in MYH9-related disorder (Epstein syndrome).
The pathogenicity of variant 2 is uncertain as information regarding this mutation is not available in the reported literature. Note that the classification of variants of uncertain/unknown significance may change over time if additional information on these variants becomes available in the reported literature.
References: Arrondel C, et al. Expression of the non-muscle myosin heavy chain IIA in the human kidney and screening for MYH9 mutations in Epstein and Fechtner syndromes. J Am Soc Nephrol 2002;13: 65–74.
Utsch B, et al. Bladder exstrophy and Epstein type congenital macrothrombocytopenia: evidence for a common cause? (Letter) Am J Med Genet 2006;140A:2251–3.
Kunishima S, et al. Immunofluorescence analysis of neutrophil non-muscle myosin heavy chain-A in MYH9 disorders: association of subcellular localization with MYH9 mutations. Lab Invest 2003;83:115–22.
Recommendations: The pathogenicity of detected candidate variants should be validated independently by Sanger sequencing. Where necessary, the functional significance of these variants should be confirmed independently by appropriate biological assays to replicate the phenotype of this patient.
MYH9-related disorders have an autosomal dominant inheritance. Genetic counselling is recommended for this individual and their family. Family screening may be appropriate after appropriate genetic counselling.
DNA variants detected of unlikely clinical significance:
NBEAL2, Heterozygous, c.1531C>G (p.Arg511Gly), Exon 13, rs11720139, likely non-pathogenic.
GP6, Homozygous, c.691G>A (p.Ala231Thr), Exon 6, rs2304167, likely non-pathogenic.
MYH9, Heterozygous, c.4876A>G (p.Ile1626Val), Exon 34, rs2269529, likely non-pathogenic.
A TruSeq custom amplicon panel specific for the target regions of 19 genes (ACTN1, CD36, F2R, FLI1, ETS1, GATA1, GFI1b, GP1BA, GP1BB, GP6, GP9, ITGA2, ITGA2B, ITGB1, ITGB3, MYH9, NBEAL2, P2RY12, RUNX1, TUBB1) was designed using Illumina DesignStudio (Illumina, Inc., San Diego, CA, USA). Next-generation sequencing was performed using the Illumina MiSeq sequencer platform (Illumina, Inc.). Obtained sequences were aligned to the reference genome (GRCh37/hg19) using MiSeq Reporter software (Illumina, Inc.) and the genomic datasets were viewed using the Integrative Genomics Viewer (IGV) (www.broadinstitute.org/igv/). Variant calls were generated using ANNOVAR software (http://www.openbioinformatics.org/annovar) with an acceptance threshold Q-score of 30, corresponding to a 1:1000 error rate. Sanger sequencing was performed to provide data for bases with insufficient coverage. The University of California, Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) was used for variant analysis, and variants were cross-checked against databases including the NHLBI Exome Sequencing Project (ESP), the 1000 Genomes Project database and the Database of Single-Nucleotide Polymorphisms (dbSNP). Bioinformatic tools (SIFT, PolyPhen-2 and MutationTaster) were used to predict variant effects on protein structure and function for variants lacking published literature.
Limitations: Overall gene coverage was 97% using this format. It is therefore possible that a genomic region containing a disease-causing mutation in the proband was not captured and that the mutation was consequently not detected.
It is also possible that a particular genetic mutation was not recognised as the underlying cause of the genetic disorder due to incomplete scientific knowledge of the impact of all variants at this point in the literature.
An example of an NGS report.