Biomarkers in Rare Genetic Diseases

Biomarkers offer a way to speed up medical research by shedding light on the physiopathological mechanisms of disease. Furthermore, biomarkers are considered invaluable tools for monitoring disease progression, prognosis, and response to drugs, especially in clinical trials, where they can be used to assess the efficacy, efficiency, and side effects of novel drugs. Biomarkers also pave the way to personalised medicine, a rapidly developing field that is of particular interest in rare diseases (RDs), i.e. those with a prevalence of less than 5/10,000, which are often genetic in origin. Although rare genetic diseases may be less appealing targets for pharmaceutical companies, they are nevertheless in urgent need of research into their diagnosis, prevention, treatment, and standards of care. Here we summarise the state of the art in RDs, genetic diagnosis, and novel strategies aimed at accurately identifying and defining gene mutations, and review the evidence emerging from the latest research and clinical trials. We focus in particular on novel biomarkers, describing the different types discovered so far, highlighting their importance and indicating how they may be translated into research, diagnostics, treatment, and preventative applications in personalised strategies for RDs.


Introduction
As each rare disease (RD) only affects a relatively small number of individuals across the globe, there are often great obstacles to its research, diagnosis, treatment, and prevention. In Europe, a disease is considered to be rare, or orphan, when it affects fewer than 5 people in 10,000, in line with the definition adopted by the European Commission (EC) in its orphan drug legislation. In addition to there being a wide spectrum of RDs, they are also characterised by great variability in the age of onset, signs and symptoms, and patterns of tissue/organ involvement. To further complicate the issue, molecular testing and phenotype analysis reveal that mutations occurring in the same gene can be associated with different clinical diagnoses, and marked intra- and interfamilial phenotype variability has been documented. RDs are therefore often extremely difficult to diagnose, and only about 4000 genes have been identified for the 7000 RDs described in the OMIM database [3]. Understandably, therefore, the IRDiRC [4] has set its members the challenge of diagnosing most, if not all, RDs by 2020, and discovering at least 200 new therapeutic options for their patients.
Nevertheless, without early diagnosis and effective treatment strategies, it is impossible to guarantee any improvement in the quality of life and/or life expectancy of such patients. Furthermore, our lack of knowledge regarding the causes, physiopathological mechanisms, and clinical progression of RDs makes it difficult to apply available treatments and to develop novel therapeutic strategies. In addition, the small number of patients complicates the recruitment of an adequate sample for clinical trials, especially among children, who make up an even smaller percentage of the overall RD population. This is an obvious deterrent to the pharmaceutical industry, which has only limited interest in developing and marketing products for this small consumer base. In order to counter some of these problems, both national governments and the EU have made orphan drug laws and funding a priority, but, despite this recent interest, treatment options are currently only available for 5% of RDs [2].
It is not only RDs that could benefit from more activity in this area, as RD research is also considered pivotal for many common diseases, and has in some cases revealed mechanisms and pathways that have been subsequently associated with other rare or common diseases [5]. Indeed, several RDs have been linked to a high degree of genetic and phenotypic heterogeneity; for example, mutations occurring in the LMNA gene can cause different disease types by affecting different tissues, such as (i) striated muscle (muscular dystrophy such as Emery-Dreifuss muscular dystrophy and limb-girdle muscular dystrophy or dilated cardiomyopathy), (ii) adipose tissue (lipodystrophy syndromes), (iii) peripheral nerve (peripheral neuropathy such as Charcot-Marie-Tooth disorder) or (iv) accelerated ageing (progeria diseases). There are also clinical signs that can be associated with both genetic and acquired disease. For instance, renal cell carcinoma is characterised by the dysregulation of metabolic pathways (oxygen, iron, and nutrient sensing) which are also manifestations of rare hereditary syndromes such as Von Hippel-Lindau (VHL, OMIM 193300) and Birt-Hogg-Dubé (BHD, OMIM 135150) syndromes, as well as hereditary leiomyomatosis and renal cell carcinoma (HLRCC, OMIM 150800) [5]. It is therefore essential for the research being carried out worldwide to focus on identifying characteristic determinants able to discriminate between specific disease states, stages, and probabilities of responding to particular treatments; put simply, biomarkers.

Biomarker: definition and utility
Biomarkers were first described and defined in 2001 by two different review papers [6,7], both of which suggested that they would be the key to understanding the physiopathology of disease and discovering novel treatment strategies. The classic definition of a biomarker is 'a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacological responses to a therapeutic intervention'. In other words, biomarkers are 'measurables' that rely on tools and technologies for assessing body fluids or tissue (blood, urine, cells, skin, etc.), such as DNA analysis [point variants, copy number variation (CNV), translocations, methylation analysis], RNA analysis [expression profile and microRNA (miRNA) characterisation], protein analysis (quantification of circulating proteins), and imaging technologies, or other means of physiological measurement [8]. Since they were first defined, the number of published papers related to biomarker discovery has increased more than 20-fold (Figure 1), and the discovery and development of novel biomarkers have kept pace with technical advances, in particular the advent of high-throughput analysis technologies. Moreover, the large number of grant projects set up over the last 5 years to fund biomarker research, including BIO-NMD [9] and NeurOmics [10], has begun to bear considerable fruit in this field.
The BIO-NMD project is a Europe-wide research network whose aim is to identify and validate biomarkers for rare neuromuscular diseases, such as dystrophinopathies (Becker muscular dystrophy, OMIM 300376; Duchenne muscular dystrophy, OMIM 310200; dilated cardiomyopathy, OMIM 302045) and COL6-related myopathies (Bethlem myopathy, OMIM 158810; Ullrich congenital muscular dystrophy, OMIM 254090). Funded by the EU (2009-2012), BIO-NMD set out to investigate different human tissues/cells/fluids using multiple -omic strategies (genomics, transcriptomics, and proteomics), an approach that led to the identification of several biomarkers. Thanks to this project, both plasma and tissue biomarkers that will be useful for monitoring disease progression, prognosis, and treatment response have been described and will ultimately help to pinpoint appropriate options for personalised treatment [11,12].
In a similar vein, the EU's NeurOmics project is still ongoing and aims to revolutionise diagnostics and develop new treatments for 10 major neuromuscular and neurodegenerative diseases by using sophisticated -omics technologies. To do this, it has brought together leading European research groups, five highly innovative SMEs, and experts from outside the EU, who are all working to identify genes and develop biomarkers for clinical application, as well as to identify drug targets and improve understanding of the physiopathology of the diseases in question.
This research activity has been largely prompted by the versatility of biomarkers. Indeed, the classical view of biomarkers as a clinical end-point, an objective snapshot that reflects how a patient feels, functions or survives, is extremely reductive. In addition to numerous applications in clinical settings, biomarkers may also serve as a surrogate end-point, a predictor of clinical benefit (or lack thereof) based on epidemiological, therapeutic, physiopathological, or other scientific evidence [13]. In other words, a biomarker may act as a clinically meaningful end-point in clinical trials. Such surrogate end-point biomarkers are foreseeably of particular benefit in RDs, in which a high percentage of diseases without a genetic cause, slow disease progression, chronic nature of the diseases, high heterogeneity of signs and symptoms within the same phenotype, and the difficulty in objectively measuring any change in symptoms dramatically increase the expense of clinical trials. Not only the cost but also the difficulty in undertaking trials based on conventional end-points severely curtails their number, and the lack of sensitive, specific, and timely outcome measures hinders the discovery and development of novel treatments.
However, a biomarker can lessen the burden of the clinical trial process by providing information about the safety and efficacy of treatments before the collection of definitive clinical data, which provides the opportunity for mid-course re-appraisal, and even interruption if the intervention being investigated is revealed as potentially harmful to participants [14]. Indeed, biomarkers are far superior to subjective measurements, which may not be directly associated with a disease characteristic, or able to detect small changes, especially in the short term. Biomarkers, on the other hand, can provide an objective measurement of aspects precisely correlated with a specific disease condition, potentially enabling small changes in status to be identified, the disease progression to be assessed, and the likely effects of the therapeutic intervention to be predicted [1] while the trial is ongoing, as well as in real-world settings.
Considering the versatility of biomarkers, the European Medicines Agency (EMA) has attempted to standardise them by drawing up a list of the features of an 'ideal' biomarker, namely [15]:
• Analytical validity. Like a fingerprint, a biomarker should enable measurement, within a specific range, of a parameter able to accurately and clearly distinguish between altered/normal status or treatment response/non-response. The test(s) used to detect a biomarker should be accurate, reliable, and reproducible, and their technological limits clearly defined.
As the analytical accuracy depends on laboratory procedures, such as sample preparation and technology application, these should be reported in order to ensure reproducibility of biomarker discovery and validation.
• Clinical validity. Like a mirror, a biomarker should accurately reflect the features of a disease (or treatment), detecting even small changes, and not be influenced by circumstantial factors such as diet, exercise, stress, age, sex, or the environment, i.e. an alteration in the disease features should always be reflected by the biomarker, and a difference in the biomarker should always reflect a change in the disease. In other words, an ideal biomarker will identify specific disease parameters and be sensitive to any change in them. Likewise, to be clinically valid, a biomarker must display a high degree of accuracy (indicating correctly whether a patient has or does not have the disease or treatment effects in the vast majority of cases).
• Clinical utility. Like a prophet, a biomarker should herald the outcome of a given situation/intervention. In other words, biomarkers should predict the members of a population who will develop a disease, manifest a disease progression, or respond to a specific treatment.
The clinical utility of a biomarker in an appropriate population can be measured by two predictive values, the positive predictive value (PPV) and the negative predictive value (NPV), which respectively quantify the probability that a person with a positive test for that biomarker will manifest the outcome predicted by the test, and the probability that a person with a negative test will not manifest it (for example, will not respond to the intervention/treatment).
• Non-invasiveness. Like an open door, a biomarker should grant accessibility, i.e. enable an early, sensitive measurement of disease severity, etc., via the simple collection of body fluids (urine/blood) or scanned images (e.g. MRI or PET). This will allow a disorder to be monitored at different time points, without recourse to invasive procedures such as biopsies or tissue analysis.
• Feasibility. Like the passage of time, a biomarker should be practical to identify and measure, as well as invariable, irrespective of the type of sample collection, processing procedures, or methods used in its detection.
• Time and cost-effectiveness. Like a moneybox, a biomarker should be quick and easy to use and not be so expensive and time-consuming to measure that it cannot be used as a surrogate endpoint in clinical trials or to aid diagnostics and disease monitoring.
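The two predictive values mentioned under clinical utility reduce to simple ratios of test outcomes. The following minimal Python sketch illustrates the arithmetic; the function name and the patient counts are hypothetical and chosen purely for illustration, not taken from any study cited here.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from the four outcome counts of a biomarker test.

    tp: positive test, outcome occurred      fp: positive test, outcome absent
    tn: negative test, outcome absent        fn: negative test, outcome occurred
    """
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("need at least one positive and one negative test")
    ppv = tp / (tp + fp)  # P(outcome | positive test)
    npv = tn / (tn + fn)  # P(no outcome | negative test)
    return ppv, npv


# Hypothetical cohort of 100 patients (illustrative numbers only).
ppv, npv = predictive_values(tp=45, fp=5, tn=40, fn=10)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Both values depend on how common the outcome is in the tested population, which is one reason why they should be estimated in a population appropriate to the intended use of the biomarker.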
Biomarkers that possess all these features will inevitably lead to improvements in clinical trials, especially in the field of personalised medicine. Personalised medicine shifts the current 'one-size-fits-all' approach to a more individual line of attack or defence, centred on giving 'the right drug to the right patients at the right time' [16]. This is particularly crucial in RDs, in which successful treatment development is generally hindered by the small number of patients and short runs that characterise trials for novel interventions.

Strategies for biomarker discovery
In recent years, novel techniques and strategies have emerged for biomarker discovery, and there are currently two major approaches being applied:
• Candidate approach. This is a hypothesis-driven method based on knowledge of the relevant physiopathological processes, disease pathway(s), or key molecule(s). It analyses the known gene/protein and their linking products in order to discover a qualitative or quantitative variation in diseased samples (fluids, cells, tissues) with respect to normal ones.
• High-throughput approach. This is a hypothesis-free strategy that takes advantage of the development of novel techniques for generating very large amounts of data to compare pathological and normal status. This 'big-data' approach is extremely powerful, although it is cost-intensive and requires significant time for validation and clinical definition of the biomarkers identified.

Discovery of genetic variations
Next-generation sequencing (NGS) techniques are based on high-throughput genomic and transcriptomic sequencing. In brief, target regions can be isolated from the entire genome by hybridisation to complementary sequences. This 'capturing' is performed on demand, to isolate sequences that may consist of protein-coding regions only (whole exome sequencing), a specifically targeted gene region (focusing on a limited number of known genes), or the entire genome (whole-genome sequencing). The captured region can then be sequenced by one of several methods (pyrosequencing, 454 Roche; sequencing by reversible termination, Illumina; sequencing by ligation, SOLiD; semiconductor sequencing, Ion Torrent), and the resulting output is composed of several sequence reads, which are then computationally aligned to the known genome in order to unravel any variations, such as small insertions or deletions [17]. Unlike traditional Sanger sequencing, which reads a sequence base by base, NGS is very time-efficient, enabling the simultaneous analysis of millions of base pairs organised in multiple aligned reads. Despite its efficiency, however, NGS is unable to detect dynamic mutations (e.g. triplet expansions) and still has limited capability to identify CNVs. Nevertheless, while we await the development of specific algorithms to overcome these limitations, NGS can be integrated with DNA profiling tools, such as array-CGH, for the detection of CNVs and other genetic imbalances.
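The idea of comparing aligned reads with a known reference can be illustrated with a toy example. The sketch below is not how production pipelines work (they rely on dedicated aligners and variant callers and also handle insertions and deletions); it assumes that the reads have already been aligned without gaps, and it reports only single-base substitutions supported by a minimum number of reads. The reference sequence, the reads, and the function name are all invented for illustration.

```python
from collections import Counter


def call_substitutions(reference, aligned_reads, min_support=2):
    """Report positions where aligned reads disagree with the reference.

    aligned_reads is a list of (start, sequence) pairs, where start is the
    0-based reference position of the first base of an ungapped read.
    Returns a list of (position, reference_base, observed_base, read_count).
    """
    # Pile up the bases observed at every reference position.
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    # Keep non-reference bases seen in at least min_support reads.
    variants = []
    for pos, counts in enumerate(pileup):
        for base, n in sorted(counts.items()):
            if base != reference[pos] and n >= min_support:
                variants.append((pos, reference[pos], base, n))
    return variants


# Toy data: three of the four reads carry a T instead of A at position 4.
reference = "ACGTACGTAC"
reads = [(0, "ACGTTCG"), (2, "GTTCGTA"), (4, "ACGTAC"), (3, "TTCGT")]
variants = call_substitutions(reference, reads)
print(variants)
```

Requiring support from several reads is a crude stand-in for the quality filters used in practice to separate true variants from sequencing errors.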
The methylation profile of genes can also be explored via epigenomics. In fact, the recent advent of methylomic profiling now allows us to determine the DNA methylation status of the entire genome, and thereby to identify an increasing number of genes that are methylated in disease states, particularly cancer [18].

Discovery of RNA variations
Complementary genome-wide information technologies can be used to identify qualitative and quantitative variations at the RNA level. For example, a gene expression microarray or high-throughput technology such as RNA sequencing (RNAseq) can be used to perform transcriptome analysis. Transcriptome profiling can be performed on samples from biopsy or cell cultures from specific affected tissues, or, less invasively, from different body fluids such as urine, blood, or saliva [1]. The technique enables the generation of enriched RNA/cDNA libraries that cover the entire transcribed region, or, alternatively, a catalogue of genes of interest that can be used to evaluate gene expression or identify novel transcripts, alternative splicing, and/or gene fusion products.
Although transcript sequencing is heavily influenced by the tissue/cell type analysed, transcription and RNA editing being profoundly tissue specific, it is highly versatile. Indeed, in addition to mRNAs, transcriptomics can be extended to non-coding RNAs such as miRNAs, single-stranded sequences of 18-25 nucleotides that regulate the expression of target genes and are already known for their role as biomarkers.
Gene expression profiling is also considered a very powerful method of identifying biomarkers of pathological status, disease progression, and/or drug response, with the advantage of exploring specific tissue behaviour [19]. Microarray technologies may be used to quantify and compare the RNA levels of many transcripts in diseased and healthy samples, or at different time points (e.g. pre- and post-treatment).
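A comparison of transcript levels between diseased and healthy samples is often summarised as a log2 fold change. The sketch below shows that calculation under simplifying assumptions: the expression values are invented, the gene names are placeholders, the pseudocount of 1 and the threshold of a two-fold change are arbitrary choices, and no normalisation or statistical testing is performed, all of which a real analysis would require.

```python
import math
from statistics import mean


def log2_fold_change(diseased, healthy, pseudocount=1.0):
    """log2 ratio of mean expression (diseased vs healthy).

    The pseudocount avoids division by zero for unexpressed transcripts.
    """
    return math.log2((mean(diseased) + pseudocount) / (mean(healthy) + pseudocount))


# Hypothetical expression values: gene -> (diseased samples, healthy samples).
expression = {
    "GENE_A": ([6.0, 7.0, 8.0], [30.0, 31.0, 32.0]),
    "GENE_B": ([14.0, 15.0, 16.0], [15.0, 15.0, 15.0]),
    "GENE_C": ([62.0, 63.0, 64.0], [14.0, 15.0, 16.0]),
}

fold_changes = {
    gene: log2_fold_change(diseased, healthy)
    for gene, (diseased, healthy) in expression.items()
}

# Flag transcripts whose expression changes at least two-fold in either direction.
candidates = sorted(g for g, lfc in fold_changes.items() if abs(lfc) >= 1.0)
print(fold_changes)
print(candidates)
```

The same function could be applied to pre- and post-treatment samples from the same patients by passing the two time points in place of the diseased and healthy groups.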

Discovery of protein biomarkers
The evolution of mass spectrometry (MS)-based technologies and the development of other proteomic strategies such as two-dimensional difference gel electrophoresis (2D-DIGE) have considerably advanced our understanding of the nature of the proteome. This can be analysed to explore specific cellular functions and the control of specific biological processes, although the complexity and size of the human proteome pose larger challenges than those encountered in genomic and transcriptomic research [20]. Indeed, the individual proteome can change markedly over the course of a lifetime, and a single gene often produces very different isoforms, by alternative splicing or post-translational modifications such as phosphorylation, glycosylation, acetylation, and ubiquitination. However, proteins are often a target for pharmacological intervention, and proteomic technologies able to evaluate the expression level of soluble proteins are emerging, thereby paving the way to the discovery and validation of protein biomarkers.
The most common novel high-throughput approaches currently being used in discovery proteomics are those based on MS. These technologies enable the analysis of complex mixtures of proteins, measuring the mass-to-charge ratio of charged particles in order to determine their mass, quantity, and elemental composition. There are essentially two different types of MS approaches, namely top-down experiments, which analyse the whole protein, and bottom-up experiments, which analyse proteins previously digested by proteases. For characterisation purposes, the resulting peptide mixtures may then be separated using different strategies, such as liquid chromatography (LC), gas chromatography, or ion mobility spectrometry, and then the identified proteins can be quantified. To achieve this, samples can be isotopically labelled by different methods, such as stable isotope labelling by amino acids in cell culture (SILAC), isotope-coded affinity tagging (ICAT), isobaric tags for relative and absolute quantification (iTRAQ), and mass tags for relative and absolute quantification (mTRAQ) [21]. A typical MS protocol would therefore consist of sample loading (of intact or digested protein), vaporisation, ionisation, and separation of the ionised sample by mass-to-charge ratio, detection in an MS instrument, and generation of a detailed profile of the exact chemical composition of a sample.
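The relationship between the measured mass-to-charge ratio and the mass of the analyte can be made concrete with a short calculation. The sketch below assumes positive-mode ionisation in which each charge is carried by an added proton, which is a common situation but not one specified in the text above; the peptide mass used is an arbitrary example and the function names are illustrative.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons (Da)


def mz_from_mass(neutral_mass, charge):
    """m/z of an ion formed by adding `charge` protons to a neutral molecule."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_from_mz(observed_mz, charge):
    """Recover the neutral mass from an observed m/z and a known charge state."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (observed_mz - PROTON_MASS)


# Arbitrary example: a peptide with a neutral mass of 1000 Da.
for z in (1, 2, 3):
    print(z, round(mz_from_mass(1000.0, z), 4))
```

Because the same molecule appears at a different m/z for each charge state, the charge must be inferred (for example from the spacing of isotope peaks) before a mass can be assigned to a signal.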
By these means, it is possible to differentially analyse proteins from different biological processes or disease states in order to discover candidate biomarkers. Many biomarkers used in current clinical practice are measured by protein quantification assays, and proteomics techniques such as 2D-DIGE can be used to separate non-digested proteins within a biological sample based upon either apparent molecular mass (by gel electrophoresis) or charge (via isoelectric focusing). Such strategies thereby provide a measure of protein abundance and enable the identification of isoforms and post-translational modifications [21]. Validation of such potential biomarkers can be performed using a common protein expression method such as Western blotting and/or antibody-based assays.
As shown in the workflow illustrated in Figure 2, biomarker discovery can be facilitated by using a strategy combining two or more of the above approaches, for example:
• High-throughput/candidate approach. This strategy exploits the benefits of both techniques by filtering the high-throughput data beforehand using a candidate list or a functional interactome map. This provides better applicability to the disease/treatment response but, considering the large amount of data generated by the high-throughput method, is very labour-intensive.
• Multiple -omics approach. This is a highly demanding method based on the simultaneous use of genomics, transcriptomics, proteomics, etc., to analyse the interactome and define functional pathways. If performed on the same individual at different times, or at different disease or treatment stages, such analyses are able to monitor changes in an individual's -omic profile, thereby lending themselves to the development of personalised medicine strategies.
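The combined high-throughput/candidate strategy amounts to intersecting a large, hypothesis-free result set with a hypothesis-driven list. The following sketch shows that filtering step in its simplest form; the gene identifiers, the scores, the threshold, and the function name are all hypothetical, and a real workflow would typically draw the candidate list from pathway or interactome resources.

```python
def filter_by_candidates(hits, candidate_genes, min_abs_score=1.0):
    """Keep high-throughput hits that are on the candidate list and pass a score threshold.

    hits: dict mapping gene identifier -> signed score (e.g. a log2 fold change).
    candidate_genes: set of identifiers chosen from prior knowledge of the disease.
    """
    return {
        gene: score
        for gene, score in hits.items()
        if gene in candidate_genes and abs(score) >= min_abs_score
    }


# Hypothetical high-throughput output and candidate list.
high_throughput_hits = {
    "GENE_1": 2.4,
    "GENE_2": -0.3,
    "GENE_3": -1.8,
    "GENE_4": 3.1,
}
pathway_candidates = {"GENE_2", "GENE_3", "GENE_4"}

shortlist = filter_by_candidates(high_throughput_hits, pathway_candidates)
print(shortlist)
```

In this toy example GENE_1 is discarded despite its large score because it is not on the candidate list, which illustrates both the gain in focus and the risk of missing unexpected biomarkers that comes with this strategy.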
The benefit of multiple -omics approaches has been clearly demonstrated by Finkel et al., in their recent 'BforSMA' cross-sectional study aimed at identifying novel biomarkers in spinal muscular atrophy (SMA, OMIM: 253300). SMA is a neurodegenerative motor neuron disorder caused by homozygous/compound heterozygous mutations in the survival motor neuron 1 (SMN1) gene [22]. It is characterised by the degeneration of the anterior horn cells of the spinal cord and leads to symmetrical muscle weakness and atrophy. The SMN protein plays a crucial role in RNA biosynthesis in all tissues, forming a large, multiprotein complex that drives the assembly of small nuclear ribonucleoproteins (snRNPs) of the spliceosomes. Through functions in RNP assembly, the SMN complex is required for the expression of essentially all protein-coding genes [23]. Preliminary results from the 'BforSMA' project, which is based on proteomics, metabolomics, and transcriptomics discovery platforms, indicate the discovery of a total of 200 candidate biomarkers, including 97 plasma proteins, 59 plasma metabolites, and 44 urine metabolites that could potentially be used to address clinical trial design and identify novel therapeutic targets in SMA [22].

Genomic biomarkers
New molecular biomarkers can be detected at different levels. According to the Food and Drug Administration/EMA definition, genomic biomarkers include both DNA and RNA determinants; they therefore encompass DNA methylation status and sequence variations, such as single-nucleotide polymorphisms (SNPs), insertions, deletions, translocations, and CNVs, as well as RNA alterations such as differential gene expression and miRNAs (Figure 3). The current research focus has shifted somewhat, from SNP to haplotype analysis, which it is hoped will furnish useful disease, prognostic, or predictive biomarkers. Indeed, DMD patients, for example, despite having common features such as the absence of dystrophin in the striated muscles, show different rates of disease progression, especially in terms of the age of loss of ambulation. This supports the idea that genetic modifiers exist and can influence both the phenotype and the clinical severity of the disease. To this end, Flanigan et al. identified SNPs located within the LTBP4 gene, which encodes the latent transforming growth factor β (TGF-β) binding protein 4 (LTBP4), in more than 200 patients, showing that individuals homozygous for the IAAM LTBP4 haplotype remained ambulatory significantly longer than those heterozygous or homozygous for the VTTT haplotype [24]. Furthermore, in long QT syndrome (LQTS), a rare hereditary cardiac disorder characterised by a prolongation of the QT interval due to mutations in genes encoding ion channels responsible for the generation of electrical impulses, it appears that the haplotype group C-G-T of the heat shock protein HSP-70 gene is strongly related to the disease condition and may therefore represent a diagnostic biomarker [25].
An example of potential RNA biomarkers has been provided in a study by Harten et al. into Hutchinson-Gilford progeria syndrome (HGPS, OMIM: 176670). This is a rare, fatal, autosomal dominant premature-ageing disease (prevalence: <1/1,000,000) caused by splicing mutations in the LMNA gene that create cryptic splice sites and lead to the production of progerin, a toxic, permanently farnesylated splicing variant [26]. In their study, the authors analysed the expression profile of several matrix metalloproteinases, identifying a donor-age-dependent reduction in the expression of MMP-3 mRNA in HGPS primary dermal fibroblast cultures, suggesting that a fall in MMP-3 correlates with disease severity in vivo [26].
RNAseq can be used in conjunction with new technologies such as NGS to analyse the whole transcriptome both quantitatively and qualitatively and thereby provide information about alterations in gene expression. This approach can potentially speed up the process of genomic biomarker discovery and was used to good effect in a recent study aimed at tracing a detailed RNA profile in both collagen VI myopathy (ColVI) patients and an animal model of the disease. Collagen VI myopathies are genetic disorders arising from mutations in the collagen VI genes; they range from the severe Ullrich congenital muscular dystrophy (UCMD, OMIM: 254090, prevalence: 1-9/1,000,000) to the milder Bethlem myopathy (BTHLM1, OMIM: 158810, prevalence: <1/1,000,000), both of which can be inherited in either a dominant or a recessive manner. Generally speaking, neither the type of mutation nor the effect of the mutation on the protein structure/function allows precise discrimination between the two phenotypes. However, by a combined RNAseq approach, the authors identified the potential involvement of circadian genes, reporting a marked deregulation of the CLOCK gene in UCMD patients alone, suggesting it as a candidate biomarker of disease severity in ColVI [27].
miRNAs also make quite appealing biomarkers, and a recent study by Eisenberg et al. found that the levels of muscle-specific miRNAs (myomirs) are correlated with disease severity in several muscular dystrophies, including limb-girdle and Duchenne/Becker muscular dystrophies [28]. miRNA studies have also been extended to other RDs, such as cystic fibrosis (CF, OMIM: 219700). This is a recessive genetic disorder (prevalence: 1-9/100,000) characterised by eccrine gland dysfunction, chronic obstructive lung disease, and exocrine pancreatic dysfunction. It is caused by mutations in the cystic fibrosis transmembrane conductance regulator gene (CFTR), and it appears that miR-494 and miR-145 are significantly over-expressed in CF tissues with respect to those of healthy individuals, suggesting their role as disease biomarkers [29].
As mentioned above, genomic biomarkers also include epigenomic modifications such as DNA methylation. Recent studies on Friedreich ataxia (FRDA, OMIM 229300), the most common ataxia, which is caused by an expanded GAA repeat in the first intron of FXN, have demonstrated that hypermethylation of the gene region upstream of the expanded GAA repeat correlates with clinical severity, while hypomethylation of the downstream region correlates with the age at onset [30]. It is evident, therefore, that genomic biomarkers may have a wide spectrum of functions as clinical and research outcome measures.

Proteomic biomarkers
Proteomic studies have several advantages over genomic analysis, not least the potential identification of biomarkers more closely related to biological function/dysfunction. Furthermore, proteomic biomarkers are more readily accessible than genomic biomarkers, being detectable in body fluids such as blood and urine (Figure 3). This makes them potentially useful in clinical trials as early indicators of the disease condition, disease progression, or treatment effects (drug response or adverse effects).
As an example, Martell et al. have provided a clear indication of biomarker accessibility and utility in Morquio A syndrome, also named mucopolysaccharidosis IVA (MPS IVA, OMIM: 253000, prevalence: 1-9/1,000,000). This recessive lysosomal storage disorder is caused by mutations in the N-acetylgalactosamine-6-sulfatase gene (GALNS), which codes for an enzyme that degrades keratan sulphate and chondroitin-6-sulphate. The mutation results in a wide spectrum of clinical features involving skeletal, cardiac, pulmonary, corneal, and hearing impairment, and the identification of biomarkers able to monitor the response to enzyme replacement therapy during clinical trials is long overdue. To this end, the authors measured the plasma levels of 88 candidate proteins, finding that three of them (alpha-1-antitrypsin, lipoprotein a, and serum amyloid P) may be suitable surrogate end-points for clinical trials [31].
The main advantage of techniques that can assess biomarkers in body fluids is, of course, their lack of invasiveness. In this regard, a new protein technology, the SOMAscan assay, an aptamer-based method able to recognise specific protein epitopes, has been used to evaluate protein levels in the sera of DMD patients. By using this technology to compare serum samples from two independent DMD cohorts with healthy individuals, 44 serum biomarkers were identified [32]. Similarly, Auray-Blais et al. have recently applied novel MS-based high-throughput technologies to protein biomarker discovery in the urine samples of patients affected by Fabry disease, succeeding in identifying the lyso-Gb3/related analogue profile as a diagnostic biomarker [33].
Low invasiveness is also a feature of the most commonly used method of measuring and validating protein biomarkers, the immunoassay. Immunoassays are based on the ability of monoclonal antibodies to capture and detect specific protein domains and enable the simultaneous investigation of several proteins using very low amounts of samples. For example, in idiopathic pulmonary fibrosis (IPF, OMIM 178500), a rare lethal lung disease (prevalence: 1-5/10,000) of unknown aetiology and variable and unpredictable course, a multiplexed assay has been used to simultaneously evaluate 92 proteins in plasma samples from more than 200 patients. By these means, three biomarkers predictive of IPF outcome were identified [34].
Other studies have used the ELISA immunoassay to evaluate serum levels of an extracellular matrix glycoprotein, tenascin-C (TN-C), in Emery-Dreifuss muscular dystrophy (EMD, OMIM 310300), a rare neuromuscular disorder (1-9/1,000,000) characterised by muscular weakness and atrophy, with early joint contractures and cardiomyopathy, finding an association between elevated circulating TN-C levels and an increased risk of developing dilated cardiomyopathy [35].
Due to the low invasiveness of the methods involved, proteomic biomarkers are also very appealing as surrogate end-points in clinical trials and/or screening (e.g. neonatal testing).

Other biomarkers
As mentioned earlier, imaging technologies, and indeed any diagnostic test that is able to measure the disease status in patients, are useful for measuring, and therefore investigating, certain biomarkers. Magnetic resonance imaging (MRI), for example, is a safe and non-invasive method of analysing muscle, connective tissue, fat, and bone. Indeed, Kinali and co-workers have demonstrated that an MRI scan focused on particular muscles can serve as a biomarker for disease progression in Duchenne muscular dystrophy (DMD, OMIM: 310200), a rare neuromuscular disease (affecting 1/3300 male births) characterised by rapidly progressive muscle weakness and wasting due to degeneration of skeletal, smooth, and cardiac muscles. MRI can be used to accurately identify which muscles are sufficiently preserved in DMD, making it a reliable tool for use in clinical trials. Similarly, MRI scans of muscle biopsies are currently being used to correlate the clinical features of muscle diseases with the structure and morphology of muscle fibres [36].
Neurophysiological measurements can also be exploited as imaging biomarkers. For instance, Vucic and colleagues have reported that transcranial magnetic stimulation (TMS) is a useful and non-invasive method of assessing the functional integrity of the motor cortex and its corticomotoneuronal projections in ALS. TMS was able to reliably distinguish ALS from peripheral disorders with a similar presentation, thereby demonstrating its potential diagnostic utility [37].
In fact, imaging biomarkers are generally considered very appealing, generating a large amount of intensive research in recent years. The ultimate aim of such research is the development of innovative methods of using imaging tools for the detection and monitoring of the signs and symptoms of RDs.

Diagnostic/prognostic biomarkers
A diagnostic, or prognostic, biomarker is one that identifies a disease or quantifies its pathogenic factors (Figure 4). Essentially, such biomarkers are signatures that divide the population into healthy and diseased individuals, but in some cases they can finely stratify the disease phenotype into different degrees of severity or sub-phenotypes. The routine diagnostic markers classically used in clinical practice are temperature, blood pressure, and cholesterol levels, among others, whereas in genetic diseases, according to the IRDiRC statement [4], all gene mutations known to cause a Mendelian disease have to be considered its primary genetic biomarkers. For example, DMD, the most common fatal genetic disorder diagnosed during early childhood, arises through mutations in the causative dystrophin (DMD) gene, which are therefore considered disease biomarkers, and can accordingly be used to select patients for enrolment in clinical trials [38].
In some cases, mutations in causative genes can be considered biomarkers of disease severity. This is the case in fragile X syndrome (FXS, OMIM: 300624), a rare intellectual disability disorder with an estimated prevalence of about 1 in 2500 to 5000 males and 1 in 4000 to 6000 females. FXS is caused by an expanded CGG trinucleotide repeat located within the 5' UTR of the FMR1 gene. The size of the triplet expansion defines four different phenotypes, ranging from healthy to severe, and can therefore be used to distinguish between them [39].
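The stratification of a repeat length into four classes can be illustrated with a minimal Python sketch. The cut-offs used here (up to 44, 45-54, 55-200, and more than 200 CGG repeats) are commonly cited ranges and are an assumption of this example rather than values taken from the text or from [39]; the function name `classify_fmr1_repeats` is likewise hypothetical.

```python
# Assumed upper bounds (inclusive) for each FMR1 CGG repeat class.
# These are commonly cited ranges, not values stated in this chapter.
FMR1_CLASSES = [
    (44, "normal"),
    (54, "intermediate"),
    (200, "premutation"),
]


def classify_fmr1_repeats(cgg_repeats: int) -> str:
    """Map a CGG repeat count to one of four illustrative classes."""
    if cgg_repeats < 0:
        raise ValueError("repeat count cannot be negative")
    for upper_bound, label in FMR1_CLASSES:
        if cgg_repeats <= upper_bound:
            return label
    # Anything above the last bound is treated as a full mutation.
    return "full mutation"


if __name__ == "__main__":
    for count in (30, 50, 120, 350):
        print(count, classify_fmr1_repeats(count))
```

In practice, laboratory reporting would follow the cut-offs of the applicable guideline, and borderline repeat sizes are usually interpreted with the measurement error of the sizing method in mind.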
In ALS (OMIM 105400), the situation is less clear cut. ALS is a devastating neurodegenerative disease with an incidence of 1/50,000 per year. Although several mutated genes have been identified in ALS (DCTN1, OMIM 601143; PRPH, OMIM 170710; SOD1, OMIM 147450; NEFH, OMIM 162230), the vast majority of patients do not show a defined genetic defect. This would seem to indicate that the causative gene has yet to be identified [40], and research in this area has therefore focused on the discovery of specific biomarkers able to assist clinical diagnosis and monitor disease progression. In this regard, Hwang et al. have correlated an increased serum level of HMGB1, a non-histone architectural protein, with the onset of ALS, even in the early stages of the disease. This increase in HMGB1 could also be useful as a severity biomarker, since the authors also found higher HMGB1 levels in patients with a severe disease status [41]. Moreover, the same group has recently correlated a reduction in the level of the protein encoded by the LG72 gene, an activator of D-amino acid oxidase, with the pathogenesis of ALS [42]. Animal model studies have demonstrated that transactivation of the GFAP promoter is an early indicator of the disease process, and that the GFAP level in the CSF could be a potential biomarker in human patients [43].
Biomarkers used in clinical practice to improve disease progression monitoring or disease-risk prediction are defined as prognostic. Simply put, a prognostic biomarker provides information on the course of a disease in an untreated individual, and an example has been identified for Marfan syndrome (MFS, OMIM: 154700), a systemic disease of the connective tissue characterised by a wide spectrum of cardiovascular, skeletal muscular, ophthalmic, and pulmonary manifestations. With an estimated prevalence of around 1/5000, patients affected by MFS suffer from an increased risk of cardiovascular complications that lead to premature death, and a correlation has been demonstrated between larger aortic root diameters, coupled with faster aortic root growth, and high serum levels of transforming growth factor-β (TGF-β). Increasing levels of TGF-β predict cardiovascular events and thereby possess significant prognostic value [44].
Another biomarker for cardiac muscular involvement has been found in Fabry disease (FD, OMIM: 301500), a rare systemic disease (prevalence 1-5/10,000) characterised by the accumulation of globotriaosylceramide in the plasma and cellular lysosomes of vessels, nerves, tissues, and organs throughout the body. This accumulation leads to progressive skin lesions, renal failure, cardiac and cerebrovascular involvement, and peripheral neuropathy. Continuously elevated cardiac troponin I (cTNI), a laboratory parameter well known to reflect acute and chronic cardiac muscle damage, has been demonstrated in a substantial proportion of patients with FD, suggesting that raised cTNI levels could be a useful laboratory marker for assessing myocardial damage in FD [45].
Finally, a recent study on DMD has indicated matrix metalloproteinase-9 (MMP-9) as both a diagnostic and prognostic biomarker. Indeed, DMD patients showed higher serum levels of MMP-9 and tissue inhibitor of metalloproteinase-1 (TIMP-1) than controls, with MMP-9 levels being even higher in older, non-ambulant patients than in ambulant patients [46].

Predictive/therapeutic biomarkers
Considering the heterogeneous nature of RDs, not all patients are expected to benefit from a newly available treatment. Hence the identification of a sub-group of patients likely to respond to a novel treatment is important both in terms of health, and in terms of cost-effectiveness [12].
To this end, a predictive, or therapeutic, marker must be able to discriminate between drug responders (patients gaining benefit from the therapy) and poor/low responders (Figure 4). Predictive biomarkers will therefore enable the most appropriate and efficacious treatments or interventions to be selected for each patient, thereby underpinning a personalised approach to treatment.
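How well a candidate marker separates responders from poor responders is often summarised by the area under the ROC curve (AUC). The following minimal Python sketch computes the AUC as the probability that a randomly chosen responder has a higher marker value than a randomly chosen poor responder, with ties counted as one half. The function name `roc_auc`, the marker values, and the assumption that higher values indicate response are all hypothetical and serve only as an illustration.

```python
from typing import Sequence


def roc_auc(responders: Sequence[float], poor_responders: Sequence[float]) -> float:
    """AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic
    divided by the number of responder/poor-responder pairs)."""
    if not responders or not poor_responders:
        raise ValueError("both groups must contain at least one value")
    score = 0.0
    for r in responders:
        for p in poor_responders:
            if r > p:
                score += 1.0   # responder ranked above poor responder
            elif r == p:
                score += 0.5   # ties count as half
    return score / (len(responders) * len(poor_responders))


if __name__ == "__main__":
    # Hypothetical baseline marker levels (arbitrary units), not real data.
    responders = [8.1, 7.4, 9.0, 6.8]
    poor_responders = [5.2, 6.9, 4.8, 6.1]
    print(f"AUC = {roc_auc(responders, poor_responders):.4f}")
```

An AUC close to 0.5 would indicate that the marker does not discriminate between the two groups, whereas values approaching 1.0 indicate good separation. In the small cohorts typical of RDs, any such estimate should be reported with a confidence interval.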
There are a few examples of therapeutic biomarkers useful in RDs, generally SNPs, as in typical pharmacogenetics, although some protein studies have also been reported. For instance, a pharmacological predictive biomarker has been reported in idiopathic nephrotic syndrome, an RD affecting the kidneys. Specifically, Wen et al. [47] found a significant difference between the serum proteomes of steroid-sensitive nephrotic syndrome (SSNS) and steroid-resistant nephrotic syndrome (SRNS, OMIM 256370) patients, predictive of their respective responses to treatment.
ed to losartan used to reduce the aortic root dilatation rate, had higher baseline TGF-β levels but exhibited lower plasma TGF-β concentrations during losartan therapy [49].
Predictive biomarkers such as these are likely to play an increasingly important role in clinical practice, since evaluating the efficacy of a treatment/intervention is fundamental to making decisions about treatment choices, and therefore determining therapy outcomes.

Conclusions
Since the definition of biomarkers in 2001, their importance in clinical and research settings has increased dramatically due to their diagnostic/prognostic functions and their ability to monitor/predict disease stage, treatment response, and/or adverse effects. Indeed, the creation of an exhaustive catalogue of approved biomarkers may be the single most important innovation in healthcare, bringing considerable clinical and economic benefits. Although current research, both academic and corporate, is heavily focused on the development of drugs and companion diagnostic tests, in the future, biomarker discovery and development will be vital for tailoring medical care to individual patients. This will be especially important in the field of RDs, in which the discovery of efficacious biomarkers is likely to greatly facilitate the process of EMA approval and the development of novel orphan drugs. In addition to being both time- and cost-effective, biomarker research also provides exciting opportunities to expand our knowledge of the physiopathological mechanisms behind rare and other diseases, helping to discriminate between distinct disease presentations and comorbidities, as well as to predict the different impacts of concomitant medication and of important demographic parameters such as gender, age, and ethnicity. In short, biomarker discovery represents a giant leap towards the ultimate goal of truly personalised medicine.