A wide range of malignant and nonmalignant diseases require hematopoietic stem cell transplantation (HSCT) as a last-resort therapeutic approach. Graft-versus-host disease (GVHD), one of the major causes of transplant-related mortality, is minimized when the human leukocyte antigens (HLAs) of donor and recipient are closely matched. Suitable donors are therefore selected by HLA typing. HLAs are highly polymorphic glycoproteins encoded by a region of genes known as the major histocompatibility complex (MHC). Their biological function is to present antigenic peptides to T lymphocytes; however, they also play an important role in HSCT acceptance or rejection. Over the years, various techniques have been developed to better characterize the HLA profile of transplant donors and recipients. This effort is challenging because of the size of the MHC and, most importantly, because of the high sequence variability between individuals in specific regions of these loci. Initially, HLA typing was performed using serological typing, hybridization techniques, and restriction fragment length polymorphism (RFLP) approaches. Later, polymerase chain reaction (PCR)-based techniques and direct sequencing by dideoxy-based Sanger sequencing with capillary electrophoresis emerged. Nowadays, second- and third-generation (next-generation) sequencing (NGS) technologies show great potential for effectively resolving these polymorphic regions.
- HLA typing
- single-molecule sequencing
The past few decades have been pivotal for understanding how many myeloid and lymphoid diseases arise and evolve. Established immunobiology and molecular biology techniques, together with current sequencing technologies and their implementation in molecular diagnostics, have all contributed to characterizing these conditions.
Novel therapeutic approaches, such as targeted therapies, optimized chemotherapy regimens, radiotherapy, and others, have been developed. Yet many of these diseases still have poor survival outcomes. In such cases, hematopoietic stem cell transplantation (HSCT) is considered a last-resort therapeutic approach when all other options have failed .
The success of HSCT depends on various factors that should be taken into consideration in advance. A limited immunological reaction between graft and host is one such major factor, and it can only be achieved when the donor and recipient of the graft are immunologically compatible. Over the past decades, immunology and molecular biology techniques have been increasingly applied to delineate the biological mechanism of this compatibility.
In recent years, the advent of high-throughput sequencing technologies has accelerated progress in this direction. In this chapter, we review the past, present, and future of these technologies in this particular area of research.
2. Transplantation as a therapeutic approach
HSCT offers potentially curative therapy for patients suffering from various congenital or acquired malignant or nonmalignant lymphohematopoietic diseases.
These mainly include myeloid malignancies such as myeloproliferative neoplasms (MPNs), in particular myelofibrosis (MF). Others include myelodysplastic syndromes (MDS), myelodysplastic syndromes/myeloproliferative neoplasms (MDS/MPNs), chronic and acute myeloid leukemias (CML and AML).
Lymphoid diseases such as acute lymphoblastic leukemia (ALL), chronic lymphocytic leukemia (CLL), and Hodgkin and non-Hodgkin lymphomas, as well as various types of anemia (Fanconi’s anemia, severe aplastic anemia), are also among the main indications [2, 3].
2.1. Factors affecting the decision and outcome of HSCT
Several factors should be taken into account in order to estimate the benefit/risk of HSCT compared to other treatment options, such as chemotherapy .
These include disease characteristics, age along with related comorbid conditions and donor availability, followed by race, socioeconomic status and financial fitness.
These can broadly be viewed as pretransplantation, transplantation-associated, and post-transplantation risk factors, although the categories are not strictly separable: the transplantation protocol is affected by pretransplantation conditions, and both may affect post-transplantation events. The latter include graft-versus-host disease (GVHD), infections, and disease relapse.
Donor suitability is a major issue affecting the course of HSCT and is analyzed in detail below.
2.1.1. Donor suitability
2.1.1.1. Human leukocyte antigen (HLA) system
It is well established that donor suitability is mainly dictated by the genomic loci of the major histocompatibility complex (MHC), located on the short arm of chromosome 6 in humans. This highly polymorphic genetic system encodes the major histocompatibility antigens that make up the human leukocyte antigen (HLA) system.
These cell-surface antigens were first characterized using allo-antibodies (allo-Abs) against leukocytes. Although they are clinically important in HSCT, their primary biological function is the regulation of the immune response .
Only 30% of patients have an HLA-matched sibling donor (HLA-MSD), which is the gold standard for allogeneic hematopoietic stem cell transplantation (allo-HSCT). The remaining 70% rely on alternative sources of stem cells.
These include suitable volunteer HLA-matched-unrelated donors (HLA-MUD), one-locus HLA-mismatched-unrelated donors (HLA-mmUD), HLA-haploidentical donors (HLA-haplo) (half matched donor, typically a parent or other relative), or umbilical cord blood (UCB) units [6–8].
The success of HSCT depends highly on the HLA compatibility between graft and patient. This is because recognition of HLA allelic differences by T lymphocytes increases the risk of graft rejection (mediated by the patient’s T cells), GVHD (mediated by the donor’s T cells), slow or incomplete immune reconstitution, and, consequently, lethal opportunistic infections .
Other important prognostic factors are age, sex, cytomegalovirus (CMV) serostatus, and natural killer (NK) cell alloreactivity .
2.1.1.2. Killer inhibitory receptor (KIR) types
KIR types comprise yet another genetic characteristic of donor that affects transplantation outcomes during allo-HSCT.
NK cells are lymphoid cells of the innate immune system that contribute to the graft-versus-tumor (GVT) effect, but not to GVHD. Their function is governed by the interaction of surface receptors with their cognate ligands on target cells. KIRs are such receptors and, like the HLA molecules, are encoded by a multigene locus.
As with the MHC, the KIR locus shows considerable genetic diversity.
Upon binding of some of the KIRs with their ligands, NK cell function is inhibited, while other KIRs promote activation of NK cells after engagement with their cognate ligand.
Most of the KIR-ligands are HLA-class I molecules.
KIRs can be subdivided into two main groups based on the strength of their affinity for their ligands; group A has been observed to bind more effectively than group B.
KIRs seem to play an important role in transplantation outcome. Transplant recipients lacking KIR ligands, especially in the absence of alloreactive T cells (e.g., in T-cell-depleted HLA-haplo HSCT), have been shown to have lower rates of disease recurrence and improved survival.
Moreover, the presence of activating KIR genes in the donor favorably affects recurrence rates in myeloid, but not lymphoid, neoplasms .
3. The human leukocyte antigen system
3.1. Organization of the human MHC genetic loci
As previously described, the human MHC, also known as HLA and located on the short arm of chromosome 6 (6p21), is a highly polymorphic, gene-dense genetic system. The HLA gene products are globular glycoproteins, each composed of two noncovalently linked chains. The products of this region include ligand molecules, cell-surface receptors, and other factors involved in the inflammatory response; in the recognition, processing, and presentation of foreign antigens to T cells as part of the adaptive immune response; and in innate immunity.
In addition to protein-coding genes, the MHC loci contain pseudogenes as well as transposon, retrotransposon, and regulatory elements .
The HLA system comprises almost 220 genes, 21 of which are of major interest. These are located within genomic region 6p21.3, and their protein products mediate the human response to infectious disease and influence the outcome of cell and organ transplants .
The human MHC loci are divided into three distinct regions.
The class I region consists of genes that encode HLA class I molecules, namely HLA-A, HLA-B, and HLA-C (as well as the nonclassical HLA-E, HLA-F, and HLA-G, and the class I-like molecules MIC-A and MIC-B). These are expressed on the surface of almost all nucleated cells and present intracellularly derived peptides to CD8+ T cells.
The class II region includes genes that encode HLA class II molecules, namely HLA-DR (DRA, DRB1, and, depending on the haplotype, DRB3, DRB4, or DRB5), HLA-DQ (DQA1, DQB1), and HLA-DP (DPA1, DPB1). These are expressed on professional antigen-presenting cells (APCs), such as macrophages, dendritic cells, and B lymphocytes, which present extracellularly derived peptides to CD4+ T cells.
Located between these two is the class III region, which contains non-HLA genes with immune functions, such as complement components (C2, C4, factor B), cytokines including tumor necrosis factor (TNF) and lymphotoxins, and heat shock proteins .
3.2. HLA nomenclature
The complexity of the HLA system requires a sophisticated nomenclature that specifies the exact locus and allele addressed each time:
HLA prefix: the HLA-prefix designates the MHC gene complex.
Genetic loci: the following capital letters indicate the specific genomic region (A, B, C, D, etc.) and, where applicable, the subregion (DR, DQ, DP, DO, DN, etc.).
Genetic loci encoding for specific class II alpha and beta peptide chains are indicated next (DRA1, DRA2, DRB1, DRB2, etc.).
Field 1 (two-digit typing) provides the allele group (or allele family), designated by two digits that define the serologic group reactivity.
Following the allele family, separated by a colon (:), is field 2 (four-digit typing) which provides the specific HLA allele (HLA protein).
The following digits, also separated by colons, contain other scientifically important information.
Field/digit 3. Alleles that differ only by synonymous nucleotide substitutions within the coding sequence (CDS) are distinguished by the use of the fifth and sixth digits (six-digit typing).
Field/digit 4. Alleles that differ only by sequence polymorphisms in noncoding regions (e.g., introns) are distinguished by the use of the seventh and eighth digits (eight-digit typing). This level of resolution distinguishes the specific HLA genomic sequence.
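To illustrate how the fields above map onto a written allele name, the following Python sketch parses names in the form used by the WHO nomenclature (e.g., HLA-A*02:01:01:01) and truncates them to a chosen number of fields. The asterisk separating the locus from the fields and the optional expression-status suffix letter (e.g., N for a null allele) follow the official naming convention but are not described above; the names parse_allele and HlaAllele are hypothetical, and the sketch does not validate names against any allele database.

```python
import re
from dataclasses import dataclass

# Locus, an asterisk, then 1-4 colon-separated numeric fields,
# optionally followed by one expression-status letter (e.g. "N" = null allele).
_ALLELE_RE = re.compile(
    r"^HLA-(?P<locus>[A-Z0-9]+)\*(?P<fields>\d+(?::\d+){0,3})(?P<suffix>[A-Z]?)$"
)


@dataclass(frozen=True)
class HlaAllele:
    locus: str               # e.g. "A", "DRB1"
    fields: tuple[str, ...]  # field 1 = allele group, field 2 = protein, ...
    suffix: str = ""         # expression-status letter, if any

    def truncate(self, n_fields: int) -> str:
        """Name reduced to its first n_fields fields (the suffix is dropped)."""
        return f"HLA-{self.locus}*" + ":".join(self.fields[:n_fields])


def parse_allele(name: str) -> HlaAllele:
    """Split a full allele name into locus, fields, and suffix."""
    match = _ALLELE_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognized HLA allele name: {name!r}")
    return HlaAllele(match["locus"], tuple(match["fields"].split(":")), match["suffix"])


allele = parse_allele("HLA-A*02:01:01:01")
print(allele.truncate(1))  # allele group (two-digit level): HLA-A*02
print(allele.truncate(2))  # specific protein (four-digit level): HLA-A*02:01
```

Truncation is how results of different resolution are compared in practice: two alleles that agree after truncation to two fields encode the same protein.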
3.3. The diversity of HLA
The HLA genes on a single chromosome, i.e., the entire set of A, B, C, DR, DQ, and DP genes, are together called a haplotype and are inherited as a unit in typical Mendelian fashion. Each parent therefore passes one HLA haplotype on to each child. As a result, two siblings are HLA haploidentical (share one haplotype) in 50% of cases, while the remaining 50% is split equally between HLA-identical siblings (both haplotypes shared) and siblings with completely different HLA haplotypes (25% each).
This first level of genetic variation may be further increased by random crossovers (chromosomal recombination) in the HLA region during meiosis in gametic cells, although this is uncommon.
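The 25%/50%/25% figures follow directly from enumerating the four equally likely haplotype combinations a child can receive. The sketch below assumes four distinct parental haplotypes (the labels a/b and c/d are arbitrary) and no recombination, matching the simplified Mendelian model above.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sibling_sharing_distribution() -> dict[int, Fraction]:
    """Probability that two siblings share 0, 1, or 2 parental HLA haplotypes."""
    father, mother = ("a", "b"), ("c", "d")  # four distinct haplotypes, no recombination
    # Each child inherits one paternal and one maternal haplotype: 4 equally likely genotypes.
    children = [frozenset(pair) for pair in product(father, mother)]
    # Compare every ordered pair of possible siblings (4 x 4 = 16 equally likely cases).
    shared = Counter(len(s1 & s2) for s1, s2 in product(children, repeat=2))
    total = sum(shared.values())
    return {k: Fraction(n, total) for k, n in sorted(shared.items())}


for n_shared, p in sibling_sharing_distribution().items():
    print(n_shared, p)  # prints "0 1/4", then "1 1/2", then "2 1/4"
```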
Additionally, amino acid variation, found mainly in the extracellular antigen-binding grooves of HLA molecules and their surrounding regions, alters the antigen-binding specificity of these molecules. This possibly contributes to a more diverse immune response after exposure to the variety of infectious and noninfectious environmental agents encountered in different areas of the world.
This amino acid variation stems from nucleotide sequence alterations such as single nucleotide polymorphisms (SNPs), copy number variations (CNVs), insertion/deletion events (InDels), and inversions, especially within the HLA class I and II gene regions [5, 11, 17].
3.4. HLAs in HSCT: the purpose of HLA typing
It has become clear by now that HLA molecules play an important role in HSCT, the success of which relies highly on the degree to which donor and recipient are HLA matched. An HLA-genotypically identical sibling is the gold standard. When such a donor is not available, a perfectly matched or well-matched unrelated donor is preferred over mismatched unrelated donors, haploidentical donors, and UCB.
HLA matching is thus especially crucial for HSCT between unrelated persons, because allo-recognition of HLA allelic differences by T cells is associated with acute and chronic GVHD, impaired engraftment, and higher mortality [12, 18].
To address this issue, molecular typing technologies have evolved substantially to more accurately determine the HLA genotype of both patients and donors before HSCT. Older techniques provided limited information compared to more advanced high-throughput sequencing methods, which have dramatically expanded the list of known HLA alleles. More than 14,000 HLA alleles have been identified, the vast majority of which are variants of HLA class I genes. These encode more than 10,000 different HLA proteins [8, 17].
3.5. HLA typing resolution
Levels of HLA typing resolution have been established by expert consortia. These include:
Low-resolution typing, or two-digit typing, is equivalent to serological typing and is also called antigen-level typing. It provides limited information, corresponding to the identification of broad allele families.
High-resolution typing, or four-digit typing, identifies one allele or a set of alleles that encode the same antigen-binding site, and excludes null alleles (i.e., alleles that are not expressed on the cell surface).
Allele-level typing (all-digit typing) refers to the determination of the exact nucleotide sequence of an HLA gene.
Intermediate-resolution typing, between low and high resolution, can define specific allele groups and subtypes.
Today, when adult donor HSCT is considered, the gold standard is high-resolution typing at the HLA-A, HLA-B, HLA-C, HLA-DRB1, and HLA-DQB1 loci (10/10 match). A single mismatch at any of these loci is associated with an increased risk of post-transplant complications, although HLA-DQB1 and, in some cases, HLA-C mismatches seem to be better tolerated than mismatches at the other loci. Moreover, not all mismatches carry the same risk: some, the so-called permissive mismatches discussed later on, appear to carry little or no increased risk.
HLA-DPB1 and KIR are also taken into account whenever possible. Because HLA-DPB1 is not tightly linked to the other loci, it is more difficult to find a fully matched donor (12/12) when this locus is also considered. Fortunately, some HLA-DPB1 mismatches are permissive and do not affect overall survival when a fully matched donor is unavailable (11/12).
Currently, HLA-DQA1 and HLA-DPA1 are not taken into account during HLA typing because of the strong linkage disequilibrium (LD) they show with the corresponding HLA-DQB1 and HLA-DPB1 loci. LD refers to the co-inheritance of certain alleles at a higher frequency than expected by chance alone.
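The x/y notation used above counts matched alleles over a set of loci, two alleles per locus. A minimal sketch, assuming genotypes are given at field-2 (four-digit) resolution as a mapping from locus name to an allele pair; the example genotypes are invented. The count is symmetric and ignores the direction of a mismatch (graft-versus-host versus host-versus-graft), which matters at homozygous loci, so it is a simplification.

```python
from collections import Counter

Genotype = dict[str, tuple[str, str]]  # locus -> two alleles at field-2 resolution

LOCI_10 = ["A", "B", "C", "DRB1", "DQB1"]
LOCI_12 = LOCI_10 + ["DPB1"]


def match_score(patient: Genotype, donor: Genotype, loci: list[str]) -> tuple[int, int]:
    """Return (matched alleles, alleles compared), e.g. (9, 10) for a 9/10 match."""
    matched = 0
    for locus in loci:
        # Multiset intersection: a homozygous allele is matched at most as often as it occurs.
        shared = Counter(patient[locus]) & Counter(donor[locus])
        matched += sum(shared.values())
    return matched, 2 * len(loci)


# Invented example: one HLA-C mismatch and one HLA-DPB1 mismatch.
patient = {"A": ("02:01", "24:02"), "B": ("07:02", "44:02"), "C": ("05:01", "07:02"),
           "DRB1": ("04:01", "15:01"), "DQB1": ("03:02", "06:02"), "DPB1": ("02:01", "04:01")}
donor = {**patient, "C": ("07:01", "07:02"), "DPB1": ("04:01", "04:02")}

print(match_score(patient, donor, LOCI_10))  # (9, 10)
print(match_score(patient, donor, LOCI_12))  # (10, 12)
```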
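LD is commonly quantified with the coefficient D, the observed frequency of a two-allele haplotype minus the product of the two allele frequencies, and its normalized form D′ (Lewontin’s D′). The text above does not specify a measure, so this sketch uses that standard choice; the frequencies in the example are invented.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (D, D') for allele A at one locus and allele B at another.

    p_ab: frequency of haplotypes carrying both A and B
    p_a, p_b: population frequencies of A and B
    """
    d = p_ab - p_a * p_b  # excess over the frequency expected by chance alone
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0  # Lewontin's normalization to [-1, 1]
    return d, d_prime


# Invented example: A and B occur at 20% and 25%, but 15% of haplotypes carry both
# (only 5% would be expected if the two loci were independent).
d, d_prime = linkage_disequilibrium(p_ab=0.15, p_a=0.20, p_b=0.25)
print(round(d, 3), round(d_prime, 3))  # 0.1 0.667
```

A D′ close to 1 means that one allele is almost always found on the same haplotype as the other, which is why typing one locus of a strongly linked pair conveys most of the information about the other.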
An algorithm has been developed to address the complicated issue of selecting the most suitable available unrelated donor for HSCT:
First, search for a donor allele-matched at 7/8 or 8/8 at HLA-A, HLA-B, HLA-C, and HLA-DRB1.
When several such donors are available, look for 9/10 or 10/10 matching at HLA-A, HLA-B, HLA-C, HLA-DRB1, and HLA-DQB1.
If no such donor is available, search for suitable UCB units matched at least 4/6 at HLA-A, HLA-B, and HLA-DRB1, with an adequate cell dose and ideally NIMA-matched. The NIMA effect refers to bidirectional transplacental trafficking of cells, which exposes the fetus to maternal cells expressing both inherited maternal antigens (IMA) and noninherited maternal antigens (NIMA), resulting in the development of NIMA-specific responses.
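The selection steps above can be sketched as a small decision function. This is a simplified illustration under explicit assumptions, not a clinical tool: genotypes are field-2 allele pairs; the names Candidate, cell_dose_ok, and nima_matched are hypothetical; ranking several acceptable adult donors by 10-allele score (then 8-allele score), and cord units by 6-allele score (then NIMA match), is one possible reading of the text; and UCB matching is treated at allele level for simplicity.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

Genotype = dict[str, tuple[str, str]]
LOCI_8 = ["A", "B", "C", "DRB1"]
LOCI_10 = LOCI_8 + ["DQB1"]
LOCI_6 = ["A", "B", "DRB1"]


def matched(patient: Genotype, donor: Genotype, loci: list[str]) -> int:
    """Number of matched alleles over the given loci (two per locus)."""
    return sum(sum((Counter(patient[locus]) & Counter(donor[locus])).values())
               for locus in loci)


@dataclass
class Candidate:
    name: str
    genotype: Genotype
    is_cord_blood: bool = False
    cell_dose_ok: bool = True    # hypothetical flag: adequate cell dose (UCB only)
    nima_matched: bool = False   # hypothetical flag: NIMA-matched unit (UCB only)


def select_donor(patient: Genotype, candidates: list[Candidate]) -> Optional[Candidate]:
    # Step 1: adult unrelated donors matched at >= 7/8 (HLA-A, -B, -C, -DRB1).
    adults = [c for c in candidates
              if not c.is_cord_blood and matched(patient, c.genotype, LOCI_8) >= 7]
    if adults:
        # Step 2: among these, prefer the best 10-allele score (adds HLA-DQB1).
        return max(adults, key=lambda c: (matched(patient, c.genotype, LOCI_10),
                                          matched(patient, c.genotype, LOCI_8)))
    # Step 3: otherwise, UCB units matched at >= 4/6 with an adequate cell dose.
    cords = [c for c in candidates
             if c.is_cord_blood and c.cell_dose_ok
             and matched(patient, c.genotype, LOCI_6) >= 4]
    if cords:
        return max(cords, key=lambda c: (matched(patient, c.genotype, LOCI_6), c.nima_matched))
    return None  # no suitable unrelated source under these rules


patient = {"A": ("01:01", "02:01"), "B": ("07:02", "08:01"), "C": ("07:01", "07:02"),
           "DRB1": ("03:01", "15:01"), "DQB1": ("02:01", "06:02")}
pool = [
    Candidate("donor_9of10", {**patient, "DQB1": ("02:01", "03:01")}),
    Candidate("donor_10of10", dict(patient)),
    Candidate("cord_5of6", {**patient, "A": ("01:01", "03:01")}, is_cord_blood=True),
]
print(select_donor(patient, pool).name)      # donor_10of10
print(select_donor(patient, pool[2:]).name)  # cord_5of6
```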
3.6. Seeking permissive (relatively well-tolerated) HLA mismatches
It is well understood that even single nucleotide substitutions might significantly affect the course of transplantation in terms of GVHD, engraftment success, transplant-related mortality, and delayed immune reconstitution, but not disease relapse.
Of course, the extent of their impact depends not only on the type of genetic alteration (SNP, CNV, InDel, inversion), its effect on the final protein product (synonymous versus nonsynonymous polymorphisms), and the gene in which it occurs (HLA class I or II), but also on its exact nucleotide position, since this determines which amino acids, more or less important for the protein’s function, are affected.
With the advent of novel sequencing techniques, allele-level typing provides extensive nucleotide sequence data which, correlated with previously available clinical data in retrospective studies, is expected to provide significant information [8, 12, 19].
4. The past and present of HLA typing
HLA genes contain 5–8 exons and range in length from 4 to 17 kb. Most high-resolution four-digit HLA typing technologies, mainly sequence-based typing (SBT) and probe-based hybridization techniques, focus primarily on deciphering the sequence of the antigen-binding groove only. This is due to the high cost of complete HLA genotyping and the limited time available before HSCT. Thus, typically only exons 2 and 3 (540 bp) of class I genes and exon 2 (270 bp) of class II genes are analyzed, providing intermediate-resolution typing.
Before these techniques came into use, less informative serological methods were used for HLA typing. Looking ahead, the future seems very promising thanks to the evolution of sequencing technologies .
Next, we briefly review the evolution of the most widely applied HLA typing techniques used in clinical laboratories.
4.1. Serological methods
4.1.1. CDC (complement-dependent lympho-cytotoxicity) technique
Incubating lymphocytes with polyclonal sera in the presence of complement was the first approach to determining patient-donor compatibility. CDC uses sera from multiparous allo-immunized women, whose HLA specificity (reactivity against a particular HLA type) is determined using a panel of lymphocytes of known HLA type.
From a population of peripheral blood lymphocytes (PBLs), T lymphocytes are used to determine class I antigens, while B lymphocytes (professional APCs) are isolated separately to determine class II antigens. The cells are incubated with the characterized serum and complement. Reactivity is determined by the complement-mediated lysis of Ab-coated lymphocytes, as shown in Figure 2.
The low-resolution two-digit serologic typing this technique offers is further limited by the availability of sera containing various HLA specificities. However, this method has value in confirming the presence or absence of an antigen in case of mutations in promoter regions or genes not otherwise analyzed [5, 18, 21].
A variation of the above technique involves incubation of the donor’s cells with serum from the patient in the presence of complement. The results are interpreted the same way as previously described.
Both techniques rely on cell viability to be successful and accurate. Cell populations also need to be lymphocyte-specific for the results to be interpretable; samples contaminated by other lymphocytes and/or precursor cells lead to inaccuracies .
4.1.2. Other serological techniques
Flow cytometry (fluorescence-activated cell sorting; FACS) overcomes two additional obstacles of the CDC method. The first is positive reactions mediated by cytotoxic Abs directed against non-HLA molecules (lower specificity). The second is that CDC detects only complement-activating Abs and thus fails to detect donor-specific Abs that act independently of complement (lower sensitivity).
However, dependence on cell viability remains an obstacle. Combined with its high cost, this makes the technique prohibitive for routine HLA typing in a clinical laboratory.
Solid-phase assays, such as enzyme-linked immunosorbent assays (ELISA) and the Luminex technology, the latter using fluorescent dye-impregnated beads bound to HLA molecules, have also been developed for HLA typing. These have mainly, but not exclusively, been studied in solid organ transplantation. In HSCT studies, these techniques focus mainly on HLA-Ab screening, owing to renewed interest in donor-specific Abs (DSA) and their importance in graft failure [22–27].
Since molecular methods have proven to be more reliable, we will focus on them for a much more comprehensive analysis .
4.2. Molecular methods
Once developed and widely adopted, molecular methods soon replaced serologic techniques for determining an individual’s HLA type.
Molecular methods provided higher-resolution typing without the need to preserve characterized sera for all HLA types, including uncommon ones, and without requiring cell viability for a successful test.
Molecular typing methods remain the main approaches for low-resolution (two-digit) or intermediate-resolution (partial four-digit) HLA typing. The four-digit resolution these techniques offer for a limited number of HSCT-relevant loci contributed to the identification of serologically indistinguishable variants of HLA class I and II molecules that differ by few, but detrimental, amino acid changes.
High-resolution typing at the four-digit level for all HLA loci is an unrealistic goal with these techniques. Molecular methods mainly focus on identifying polymorphisms in exons 2 and 3 of the class I locus and exon 2 of the class II locus, which are crucial for HSCT, as mentioned before [5, 18, 29].
4.2.1. PCR/SSP (sequence-specific priming PCR)
Genomic DNA (gDNA) is isolated from the sample under investigation, and the HLA regions of interest are amplified by PCR.
PCR uses short, synthetically prepared oligonucleotides (primers or oligos). These bind to an exact location on a DNA molecule according to base-pairing complementarity and act as starting points for producing multiple complementary copies of the region between a pair of them (amplification). Multiple primers or primer pairs may be included in a single PCR, depending on the purpose of each protocol [29–32].
Sequence specific PCR (PCR/SSP), whether characterized as allele-specific amplification PCR (PCR/AS), amplification refractory mutation system PCR (PCR/ARMS) or multiplex PCR, exploits PCR for the amplification of specific HLA regions with minor alterations in primer design each time.
The amplification primers are polymorphism-specific, meaning that they only extend and form a product if the targeted polymorphism is present. The primers are designed so that their 3’-end nucleotide is complementary to the investigated genomic alteration. Products of specific length are thus produced depending on the polymorphism and the primer design, and are then visualized by gel electrophoresis.
Because a reaction yields no amplification at all if the targeted polymorphism is absent, a second primer set is coamplified. It targets a monomorphic sequence and produces an extra fragment of distinguishable length, which confirms DNA quality and the validity of the reaction (successful amplification) .
PCR/SSP can become time- and labor-intensive when it is not used in a multiplex format to analyze many polymorphic sites. If a multiplex format is preferred, the conditions for a successful PCR need to be very stringent. The technique is also prone to false-positive bands and false-negative results, especially with degraded samples.
However, this method is especially useful if applied in conjunction with PCR/SSOP (sequence specific oligonucleotide probes) hybridization typing, providing higher resolution, since it allows the separate amplification of the two alleles in a heterozygote [28, 33–35].
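The logic of a PCR/SSP reaction with its internal control can be modeled in a few lines. This is a toy in silico sketch, not a primer-design tool: it assumes forward primers written on the same strand as the template, a known annealing position, and a simple rule that extension fails on a 3’-terminal mismatch while a limited number of internal mismatches is tolerated. All sequences, positions, and product sizes are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SspPrimer:
    sequence: str      # 5'->3', same strand as the template string
    start: int         # annealing position on the template (illustrative)
    product_size: int  # expected band size in bp if the primer extends (illustrative)


def primer_extends(primer: SspPrimer, template: str, max_internal_mismatches: int = 1) -> bool:
    """Toy rule: no extension on a 3'-terminal mismatch; few internal mismatches tolerated."""
    site = template[primer.start:primer.start + len(primer.sequence)]
    if len(site) != len(primer.sequence):
        return False  # template too short or degraded: nothing to anneal to
    if site[-1] != primer.sequence[-1]:
        return False  # 3'-terminal mismatch: polymerase cannot extend
    internal = sum(a != b for a, b in zip(site[:-1], primer.sequence[:-1]))
    return internal <= max_internal_mismatches


def ssp_reaction(template: str, specific: SspPrimer, control: SspPrimer) -> str:
    """Interpret one reaction from the bands it would produce on a gel."""
    bands = {p.product_size for p in (specific, control) if primer_extends(p, template)}
    if control.product_size not in bands:
        return "invalid"  # no internal-control band: failed reaction or poor DNA
    return "positive" if specific.product_size in bands else "negative"


control = SspPrimer("ATGCATGC", start=0, product_size=800)  # monomorphic target
specific = SspPrimer("TTAGCG", start=8, product_size=250)   # 3' end tests G at position 13
allele_g = "ATGCATGC" + "TTAGCG" + "AAAAAA"
allele_a = "ATGCATGC" + "TTAGCA" + "AAAAAA"

print(ssp_reaction(allele_g, specific, control))  # positive
print(ssp_reaction(allele_a, specific, control))  # negative
print(ssp_reaction("", specific, control))        # invalid
```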
4.2.2. PCR/SSOP (sequence-specific oligonucleotide probes)
4.2.2.1. Reverse hybridization (reverse dot blot)
The PCR step of SSOP uses chemically modified (biotin-labeled) primers that are not polymorphism-specific. Provided that the DNA is intact and no unexpected polymorphism lies at the 3’ end of either primer, each primer pair produces biotinylated amplified products (amplicons). Each primer pair is designed so that its amplicon spans the polymorphic positions of interest.
The PCR product is then incubated with a panel of probes of known polymorphic HLA sequence (SSO probes), which are enzymatically tailed with poly-thymidine (poly-T). This tailing enables their prior immobilization on a solid surface, usually a nylon membrane.
The biotinylated amplicon hybridizes only to probes with complete complementarity. The membrane is then incubated with horseradish peroxidase (HRP)-conjugated streptavidin, which binds the biotin label of the hybridized amplicons, and with a chromogenic or chemiluminescent substrate, which the bound HRP converts into a color or light signal. This signal is detected by an instrument and analyzed by computer to determine the polymorphisms of the unknown sample; because the immobilization position of each probe on the membrane is known in advance, the positions of the signals reveal which probes have hybridized.
Consequently, the gDNA sequences where the primers bind must be known in advance. The polymorphisms under investigation must also already be known, in order to prepare and immobilize the suitable set of probes.
Interpretation difficulties may arise from less intense, absent or dubious hybridization patterns, due to poor-quality and low-quantity DNA amplification or background signals due to poor membrane washing or temperature variation during hybridization.
Another limitation of this technique is that the extremely polymorphic HLA alleles, especially those of class I, cannot be fully analyzed because of the very large number of probes such a design would require. As a general rule, however, using more probes per HLA locus and investigating more HLA regions provides a higher level of resolution [29–31, 36].
4.2.2.2. Direct hybridization (conventional dot-blot)
An alternative to reverse dot-blot PCR/SSOP is the conventional dot-blot technique, in which the PCR-amplified regions are immobilized on the solid surface, and biotin-labeled SSO probes are incubated with them along with HRP-conjugated streptavidin and a substrate. Probes that bind the unknown amplicons produce a light signal of specific wavelength, which is identified in the same manner as previously described.
In addition to the limitations described previously, this technique is more cumbersome, since the number of SSOPs required for typing increases vastly owing to the highly polymorphic nature of the HLA loci [30, 31].
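Interpreting a reverse dot blot amounts to finding which pairs of reference alleles would together produce exactly the observed set of positive probe positions. The sketch below assumes a small, invented probe-reactivity table (real tables come from kit manufacturers and cover many more probes and alleles) and shows how two different genotypes can explain the same pattern, a typical source of typing ambiguity.

```python
from itertools import combinations_with_replacement

# Invented reactivity table: reference allele -> probes it hybridizes with.
REACTIVITY = {
    "allele_1": {"p1", "p3"},
    "allele_2": {"p2", "p3"},
    "allele_3": {"p1", "p4"},
    "allele_4": {"p2", "p4"},
}


def compatible_genotypes(positive_probes, reactivity=REACTIVITY):
    """All allele pairs whose combined probe pattern equals the observed one."""
    observed = set(positive_probes)
    return [
        (a, b)
        for a, b in combinations_with_replacement(sorted(reactivity), 2)
        if reactivity[a] | reactivity[b] == observed
    ]


print(compatible_genotypes({"p1", "p3"}))
# [('allele_1', 'allele_1')]
print(compatible_genotypes({"p1", "p2", "p3", "p4"}))
# [('allele_1', 'allele_4'), ('allele_2', 'allele_3')]: ambiguous
```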
4.2.3. PCR/RFLP (restriction fragment length polymorphism)
RFLP analysis involves digesting gDNA with restriction endonucleases, one at a time, each of which cleaves a specific nucleotide sequence motif, producing fragments of various lengths. This length variation can be detected by Southern blot analysis of the digested fragments. Suitable detection probes are either cloned cDNA or genomic DNA sequences, complementary mainly to the HLA class II regions, which were better studied. In fact, this technique was the first to reveal the remarkable variation of the HLA class II region.
However, it is cumbersome, requires a large amount of high-molecular-weight genomic DNA, and can only be applied to regions that carry the respective restriction sites, which is not always the case. For these reasons, it did not replace serological typing but was used as a complement to it.
PCR-combined RFLP (PCR/RFLP) analysis improves on the previous method. The amplified HLA region is incubated with a restriction endonuclease that recognizes a specific nucleotide sequence. Whether, and where, the enzyme cuts depends on whether a polymorphism creates or abolishes its recognition site, so different alleles yield fragments of different lengths. The products are analyzed by gel electrophoresis.
However, like RFLP analysis, PCR/RFLP may lead to inconsistencies during HLA typing due to complex manipulation steps and possible incomplete digestion reactions. Also, this technique is unable to detect multiple recognition sites simultaneously [29, 33, 36, 37].
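The fragment pattern of a PCR/RFLP assay can be predicted in silico from the amplicon sequence. A minimal sketch, assuming complete digestion, a linear amplicon, and a palindromic recognition site searched on one strand only; EcoRI (GAATTC, cutting after the first G) is used as a real example enzyme, while the amplicon sequences are invented.

```python
def digest(amplicon: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths (bp) after complete digestion of a linear amplicon.

    cut_offset is the cut position within the recognition site on the given strand
    (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = []
    pos = amplicon.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = amplicon.find(site, pos + 1)  # also catches overlapping sites
    bounds = [0] + cuts + [len(amplicon)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


ECORI_SITE, ECORI_CUT = "GAATTC", 1
allele_with_site = "AAAAGAATTCAAAA"  # invented 14-bp amplicon containing the site
allele_without = "AAAAGAGTTCAAAA"    # a single A>G change abolishes the site

print(digest(allele_with_site, ECORI_SITE, ECORI_CUT))  # [5, 9]
print(digest(allele_without, ECORI_SITE, ECORI_CUT))    # [14]
```

A heterozygous sample carrying both alleles would show the union of the two patterns (5, 9, and 14 bp bands), whereas incomplete digestion of a homozygous sample with the site could mimic that pattern, which is one reason the text above warns about incomplete digestion.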
4.2.4. PCR/SBT (sequencing-based typing)
Another approach to high-resolution HLA typing is the PCR amplification and subsequent direct sequencing of previously described class I and II exons. Dideoxy-based Sanger sequencing, using capillary electrophoresis, provides increased reliability especially when applied after SSO or SSP.
The two alleles of heterozygous samples, which represent a substantial source of ambiguities, are usually sequenced separately following SSO or SSP typing, thus increasing the resolution of possible genotypes [29, 37].
Although this technique is more automated, easier to implement, and less prone to technical and interpretational errors, it is less sensitive than others, such as SSO, which, when optimized, provide more accurate and less ambiguous results .
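Why heterozygous samples are ambiguous in direct sequencing can be shown with a toy example. Direct Sanger sequencing of a heterozygote yields one mixed trace in which each heterozygous position is reported with an IUPAC ambiguity code, so the phase between positions is lost. The sketch below uses invented sequences and the standard IUPAC two-base codes to build the consensus for two different allele pairs, which turn out to be indistinguishable.

```python
# Standard IUPAC codes for two-base mixtures.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def consensus(allele1: str, allele2: str) -> str:
    """Consensus of two equal-length alleles as read from one mixed Sanger trace."""
    if len(allele1) != len(allele2):
        raise ValueError("alleles must be aligned and of equal length")
    return "".join(a if a == b else IUPAC[frozenset((a, b))]
                   for a, b in zip(allele1, allele2))


# Two genotypes with the same two heterozygous positions in different phase.
genotype_1 = ("ACGTA", "GCGCA")  # A...T on one allele, G...C on the other
genotype_2 = ("ACGCA", "GCGTA")  # A...C on one allele, G...T on the other

print(consensus(*genotype_1))  # RCGYA
print(consensus(*genotype_2))  # RCGYA
```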
4.2.5. PCR/RSCA (reference strand-mediated conformational analysis)
Although PCR/SBT is able to detect unknown HLA polymorphisms, the problem of not being entirely able to resolve novel arrangements of known polymorphisms, also known as ambiguity, can be overcome by the PCR/RSCA technique .
PCR/RSCA can achieve high-resolution results without the ambiguities of the previously described methods. It is based on the principle that DNA fragments differing in nucleotide composition exhibit different mobilities in nondenaturing polyacrylamide gel electrophoresis (PAGE). The amplified alleles under investigation are hybridized with a fluorescently labeled reference strand, forming double-stranded DNA with a unique conformation (double-strand conformation analysis; DSCA).
PCR/RSCA is capable of resolving even single-base alterations and of identifying new mutations, thanks to the distinct spatial structure of the newly formed DNA-probe duplexes .
5. The present and future of HLA typing
We will mainly focus our review on the second and third generations of sequencers, which have made their way into many areas of personalized medicine, genetic disease research, and clinical diagnostics, offering reduced hands-on time, higher throughput, higher sensitivity, and lower cost per base than Sanger sequencing and other techniques.
Transplantation success depends on many factors, one of which is the similarity between donor and recipient in the sequence of genes, mainly those of the MHC loci but also others (minor histocompatibility antigens, KIR, MIC-A, and MIC-B). Characterizing these genomic sequences in both individuals before HSCT is of great importance for selecting the most appropriate graft, in order to avoid GVHD, enhance engraftment, and promote the GVT effect [38–40].
5.1. The second generation of sequencers (typically named next-generation sequencers; NGS)
When it comes to HLA typing, NGS technology overcomes many of the cons older techniques present.
First, it allows phasing of linked polymorphisms within the amplicons produced during the first steps of the technique; that is, it helps determine to which of the two alleles each group of identified variants belongs. In heterozygous samples this was a major concern with older techniques, as the detected polymorphisms could correspond to two or more different allele combinations that produced identical consensus sequences.
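To make the phase problem concrete, the toy Python sketch below (hypothetical sequences, not taken from any typing software) builds two heterozygous samples whose haplotypes carry the same two variants either in cis or in trans. Collapsing each pair into one IUPAC consensus, as a bulk Sanger trace would, gives the same string for both samples, whereas counting which alleles co-occur on individual reads that span both positions recovers the phase.

```python
# IUPAC ambiguity codes for the two-base mixtures used in this toy example.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AC"): "M",
    frozenset("GT"): "K", frozenset("CG"): "S", frozenset("AT"): "W",
}


def consensus(hap1: str, hap2: str) -> str:
    """Collapse two haplotypes into the single consensus a bulk Sanger trace shows."""
    return "".join(a if a == b else IUPAC[frozenset((a, b))] for a, b in zip(hap1, hap2))


def phase_from_reads(reads: list[str], positions: list[int]) -> dict[tuple[str, ...], int]:
    """Count the allele combinations observed together on single reads."""
    counts: dict[tuple[str, ...], int] = {}
    for read in reads:
        key = tuple(read[p] for p in positions)
        counts[key] = counts.get(key, 0) + 1
    return counts


# Hypothetical 5-bp haplotypes, heterozygous A/G at positions 0 and 4.
cis = ("ACGTA", "GCGTG")    # A...A on one allele, G...G on the other
trans = ("ACGTG", "GCGTA")  # A...G on one allele, G...A on the other

print(consensus(*cis), consensus(*trans))  # RCGTR RCGTR: indistinguishable
print(phase_from_reads(["ACGTA", "GCGTG", "ACGTA"], [0, 4]))
```

Reads from the cis sample only ever show the A/A and G/G combinations, which a single consensus trace cannot reveal.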
It also allows a large number of sequences to be determined in a single reaction, so that many exonic as well as important intronic sequences can be analyzed simultaneously. Since the expression levels of HLA genes are also very important, detecting polymorphisms outside exonic sequences, within regulatory intronic regions, is also necessary [29, 40].
Another advantage that all NGS systems share over older techniques is the higher coverage they provide, leading to increased accuracy, as previously described. Coverage (depth) refers to the number of times each nucleotide position is read and subsequently aligned successfully to a reference genome, across all sequencing runs. The higher the coverage of a position, the higher the confidence of its base call.
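A minimal Python sketch of the coverage definition, assuming reads have already been aligned and are given as hypothetical half-open (start, end) intervals on the reference. It computes per-position depth with a difference array and, for comparison, the expected mean coverage N·L/G (number of reads times read length, divided by target length).

```python
def per_base_coverage(alignments: list[tuple[int, int]], ref_length: int) -> list[int]:
    """Depth at each reference position from aligned read intervals [start, end)."""
    diff = [0] * (ref_length + 1)
    for start, end in alignments:
        diff[start] += 1  # read starts covering here
        diff[end] -= 1    # read stops covering here
    depth, running = [], 0
    for i in range(ref_length):
        running += diff[i]
        depth.append(running)
    return depth


def mean_coverage(n_reads: int, read_length: int, target_length: int) -> float:
    """Expected average depth: total sequenced bases divided by target size."""
    return n_reads * read_length / target_length


print(per_base_coverage([(0, 5), (3, 8), (4, 10)], 10))  # [1, 1, 1, 2, 3, 2, 2, 2, 1, 1]
print(mean_coverage(10_000, 300, 3_000))                 # 1000.0
```

The mean hides local dips, which is why the per-position profile, not the average, determines confidence at a given polymorphic site.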
Many companies compete to build the fastest, least expensive, and most accurate sequencing system, together with user-friendly analysis pipelines, since the huge amount of data produced by these machines requires extensive bioinformatics knowledge. Algorithms addressing a variety of data-analysis issues have been developed during the past few years. Many of these are summarized in the publications of Szolek et al. and Hosomichi et al., although new algorithms are continuously released with the aim of simplifying data analysis and making it more precise, for implementation in the everyday practice of clinical laboratories and biomedical research.
NGS, or second-generation sequencing, technologies comprise various strategies that combine template preparation, sequencing, and imaging, followed by in silico genome alignment and assembly. Of these, the sequencers most widely used for HLA typing are those of Illumina (MiSeq) and Roche/454 (454 GS FLX Titanium/454 GS Junior).
5.1.1. Library preparation
5.1.1.1. PCR based
The first step in the Roche/454 and Illumina NGS workflows is usually fragmentation, in which genomic DNA (gDNA) is digested into smaller fragments, followed by PCR amplification of the DNA samples. Primers that bind to specific gDNA sequences are designed, and the region of interest between them is enzymatically amplified.
This order may be reversed: long-range PCR, with suitable enzymes and primers, may precede the fragmentation step.
In addition to the gDNA-complementary sequence, each primer includes a number of extra nucleotides. These mainly comprise the system-specific adapter sequences and the multiplex identifier tags (MIDs). The adapters help the amplicons bind to a solid surface (a bead or a slide) and provide a universal priming site for the sequencing primers. MIDs identify the individual sample to which each amplicon, and hence each read generated from it, belongs. This is particularly useful for discriminating the reads of different samples (demultiplexing) when more than one sample is prepared and pooled for sequencing. This barcoding method is called “amplicon sequencing.”
A variation of this technique, known as “shotgun sequencing,” uses simple primers without adapter or MID tails; the adapters and MIDs are ligated to the amplified sequences after PCR and before sample pooling [16, 40].
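Demultiplexing by MID can be sketched in a few lines of Python. The sketch assumes each read begins with its sample's tag (as in the amplicon approach described above) and uses hypothetical 6-bp tags; reads that match no tag, or more than one when mismatches are tolerated, are set aside as undetermined.

```python
def demultiplex(reads: list[str], mids: dict[str, str],
                max_mismatches: int = 0) -> dict[str, list[str]]:
    """Assign each read to the sample whose MID matches its 5' end, then strip the MID."""
    bins: dict[str, list[str]] = {sample: [] for sample in mids}
    bins["undetermined"] = []
    for read in reads:
        hits = []
        for sample, tag in mids.items():
            prefix = read[: len(tag)]
            mismatches = sum(a != b for a, b in zip(prefix, tag))
            if len(prefix) == len(tag) and mismatches <= max_mismatches:
                hits.append(sample)
        if len(hits) == 1:  # unique match: keep the insert, drop the tag
            bins[hits[0]].append(read[len(mids[hits[0]]):])
        else:               # no match or ambiguous match
            bins["undetermined"].append(read)
    return bins


# Hypothetical 6-bp tags; real MID sets are longer and platform specific.
tags = {"recipient": "AAGGTT", "donor": "CCTTGA"}
reads = ["AAGGTTACGT", "CCTTGAGGCA", "TTTTTTACGT", "AAGGTAACGT"]
print(demultiplex(reads, tags))
print(demultiplex(reads, tags, max_mismatches=1))
```

Allowing one mismatch rescues the read with a sequencing error in its tag; well-designed MID sets keep tags far enough apart that such tolerance does not create ambiguous assignments.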
5.1.1.2. Hybridization based (target enrichment)
Target enrichment uses biotin-labeled DNA or RNA oligonucleotides (probes, 55–120 bp) that hybridize to their complementary target regions in previously fragmented gDNA. Streptavidin-coated magnetic beads are used to capture the probe/DNA hybrids, and PCR is then applied to amplify the captured gDNA fragments [16, 40].
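The hybridization logic can be illustrated with a small Python sketch. It tiles hypothetical 120-nt probes (the upper end of the range above) across a target region with a chosen step and returns each probe as the reverse complement of the segment it is meant to capture. The tiling scheme and its parameters are illustrative assumptions, not a published design rule.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the strand that base-pairs with `seq` (read 5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_probes(target: str, probe_length: int = 120,
                step: int = 60) -> list[tuple[int, str]]:
    """Overlapping probes covering `target`; returns (start, probe sequence) pairs."""
    if len(target) < probe_length:
        raise ValueError("target shorter than one probe")
    starts = list(range(0, len(target) - probe_length + 1, step))
    if starts[-1] + probe_length < len(target):
        starts.append(len(target) - probe_length)  # extra probe so the 3' end is covered
    return [(s, reverse_complement(target[s:s + probe_length])) for s in starts]


target = "ACGT" * 77 + "AC"  # hypothetical 310-bp region
probes = tile_probes(target)
print([start for start, _ in probes])  # [0, 60, 120, 180, 190]
```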
5.1.2. Clonal amplification and cyclic-array sequencing
5.1.2.1. Roche/454 (454 GS FLX Titanium/454 GS Junior) sequencing-by-synthesis; single-nucleotide addition (SNA)
Once the library is ready, the second step toward sequencing on the Roche/454 instruments is an “emulsion PCR” (emPCR) for clonal amplification of the amplicons already produced. During emPCR, the DNA sequences are denatured into single strands under alkaline conditions and captured on beads, ideally one single-stranded molecule per bead. The beads are then mixed with oil and aqueous buffer to create droplets within which clonal PCR amplification takes place.
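Why loading is kept dilute can be shown with a short calculation. Assuming, as a simplification not stated above, that template molecules distribute among beads as a Poisson process with a mean of λ templates per bead, the fractions of empty, clonal (exactly one template), and mixed beads are e^(−λ), λe^(−λ), and the remainder. The Python sketch below evaluates these for a few illustrative λ values.

```python
import math


def bead_occupancy(lam: float) -> tuple[float, float, float]:
    """Poisson fractions of beads with 0, exactly 1, and more than 1 template."""
    p_empty = math.exp(-lam)
    p_clonal = lam * math.exp(-lam)
    p_mixed = 1.0 - p_empty - p_clonal
    return p_empty, p_clonal, p_mixed


for lam in (0.1, 0.5, 1.0, 2.0):
    p0, p1, pm = bead_occupancy(lam)
    # Among beads that carry DNA, what share is clonal (usable) rather than mixed?
    print(f"lambda={lam}: empty={p0:.2f} clonal={p1:.2f} mixed={pm:.2f} "
          f"clonal among loaded={p1 / (1 - p0):.2f}")
```

Lower λ wastes more beads but raises the clonal share among loaded beads, which is the trade-off behind the one-molecule-per-bead goal.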
After emPCR, the beads carrying the amplicons are loaded into the wells of a PicoTiterPlate (PTP) for pyrosequencing. During pyrosequencing, only one of the four nucleotides (dATPαS, dTTP, dGTP, dCTP) is added to the PTP in each round. When the added nucleotide is complementary to the template, DNA polymerase incorporates it and releases inorganic pyrophosphate (PPi). ATP sulfurylase then converts PPi and adenosine 5′-phosphosulfate (APS) into ATP, which drives the luciferase-catalyzed conversion of luciferin into oxyluciferin, emitting visible light.
The emitted light is detected by a charge-coupled device (CCD) camera and converted by software into a peak for each incorporation event. In homopolymeric sequences (consecutive runs of the same base), more than one nucleotide may be incorporated in a single cycle. In that case, the amount of light emitted is proportional to the number of nucleotides added, producing a correspondingly higher peak.
After each round, the unincorporated nucleotides are degraded by apyrase. The next dNTP is then added to the reaction system and another pyrosequencing round is performed.
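The relationship between nucleotide flows, homopolymers, and peak height can be sketched in Python. The example assumes an idealized, noise-free instrument, a hypothetical cyclic flow order T, A, C, G, and that `read` is the sequence being synthesized; each flow's signal is simply the number of identical bases incorporated. Decoding rounds each (possibly noisy) signal to the nearest integer, which is exactly where long homopolymers become error-prone: a 6-mer and a 7-mer differ by only one unit of light on top of a large signal.

```python
def flowgram(read: str, flow_order: str = "TACG") -> list[tuple[str, int]]:
    """Ideal signal per nucleotide flow: length of the homopolymer incorporated."""
    if set(read) - set(flow_order):
        raise ValueError("read contains bases that are never flowed")
    signals, i, flow = [], 0, 0
    while i < len(read):
        base = flow_order[flow % len(flow_order)]
        run = 0
        while i < len(read) and read[i] == base:
            run += 1
            i += 1
        signals.append((base, run))  # 0 means no incorporation, hence no light
        flow += 1
    return signals


def call_bases(signals: list[tuple[str, float]]) -> str:
    """Decode a flowgram by rounding each signal to a homopolymer length."""
    return "".join(base * round(value) for base, value in signals)


print(flowgram("TTACCCG"))  # [('T', 2), ('A', 1), ('C', 3), ('G', 1)]
print(flowgram("AGT"))      # [('T', 0), ('A', 1), ('C', 0), ('G', 1), ('T', 1)]
print(call_bases([("T", 1.9), ("A", 1.1), ("C", 2.6), ("G", 0.9)]))  # TTACCCG
```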
Many studies using the Roche NGS platforms GS FLX Titanium and GS Junior for HLA typing have been conducted so far. The comparative advantages of this technology over others of its kind are the long sequence reads (from around 400 bp up to a maximum of 1000 bp), which capture critical phase information of nearby DNA variants, and the speed with which a complete run is performed (10–24 h, depending on the machine used).
The 454 GS FLX Titanium provides up to 1 million (M) reads per run, depending on the sequencing protocol, while the benchtop 454 GS Junior provides around 0.1 M reads.
Despite the advantages of long reads and rapid runs, the technique has several inherent disadvantages. These include the high cost of pyrosequencing reagents, a high error rate in homopolymers (typically those longer than six bases), and emPCR, a laborious step whose semiautomation could reduce manpower. Insertions are the most common error type, followed by deletions [16, 29, 34, 35, 38–40, 43–45].
5.1.2.2. Illumina (MiSeq and MiniSeq) sequencing-by-synthesis; cycling reversible terminator (CRT)
Illumina uses a different sequencing-by-synthesis approach, called CRT. For clonal amplification, instead of emPCR, this system uses a glass slide with lanes (flowcell). A high density of primers, complementary to the adapter sequences of the fragmented DNA amplicons, is already attached to the slide on which sequencing is later performed. Each fragment is isothermally amplified into a cluster of clonal copies through a process called bridge amplification (clustering).
After amplification, sequencing begins with the binding of the first sequencing primer to each fragment in every cluster and its extension to produce the first read. All four dNTPs are fluorescently labeled and compete with each other for incorporation into the growing chain. Once a nucleotide complementary to the template is incorporated, a washing step removes all unbound nucleotides and a light signal of characteristic wavelength and intensity is emitted. This signal, which differs between the four dNTPs, is captured by a CCD camera and recorded by the computer.
Because of the reversible-terminator chemistry, each incorporated nucleotide prevents further nucleotides from being added to the extending strand, so its fluorescent label must be cleaved before the next cycle. Once the light signal of the incorporated nucleotide has been emitted and recorded by the camera, the dye is removed and, after an additional washing step, the next cycle can begin. The read length depends on the number of sequencing cycles predetermined by the user.
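A toy Python sketch of per-cycle base calling in four-channel CRT chemistry, assuming one already-extracted intensity per dye per cycle for a single cluster (the values below are hypothetical). Each cycle is called as the brightest channel; a purity ratio, brightest / (brightest + second brightest), flags cycles where two dyes compete, for example because of residual signal from incompletely removed labels. The ratio is loosely modeled on the "chastity" idea used in Illumina quality filtering, and the 0.6 cut-off is an illustrative assumption.

```python
def call_cycle(intensities: dict[str, float]) -> tuple[str, float]:
    """Brightest channel and its purity relative to the runner-up."""
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    (best_base, best), (_, second) = ranked[0], ranked[1]
    total = best + second
    return best_base, (best / total if total > 0 else 0.0)


def call_read(cycles: list[dict[str, float]], min_purity: float = 0.6) -> str:
    """One base per cycle; ambiguous cycles become 'N'."""
    read = []
    for intensities in cycles:
        base, purity = call_cycle(intensities)
        read.append(base if purity >= min_purity else "N")
    return "".join(read)


cycles = [
    {"A": 900, "C": 50, "G": 30, "T": 20},   # clean A
    {"A": 40, "C": 60, "G": 850, "T": 70},   # clean G
    {"A": 400, "C": 380, "G": 20, "T": 10},  # A and C compete
]
print(call_read(cycles))  # AGN
```

Because exactly one labeled nucleotide is added per cycle, the read length is simply the number of cycles, as stated above.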
Illumina instruments are short-read sequencers, in contrast to those of Roche/454. They provide read lengths from as low as 25 bp up to 300 bp, with many intermediate options. The MiSeq and the more recent benchtop MiniSeq both offer an option of 44–50 M paired-end (PE) reads, more than enough for HLA typing of many samples in the same run, and are comparable to the GS machines in runtime (13–24 h) (Goodwin 2016). PE means that two distinct sequencing reads are generated, one from each end of the template DNA fragment.
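How the two reads of a pair relate to the fragment can be sketched in Python. Read 1 is taken from the start of the fragment; read 2 is sequenced from the opposite end, so it is the reverse complement of the fragment's last bases. When the fragment is shorter than twice the read length, the mates overlap and can be merged into one longer sequence. The fragment, read lengths, and minimum overlap are hypothetical, and merging here requires an exact match, which real merging tools relax.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def paired_reads(fragment: str, read_length: int) -> tuple[str, str]:
    """Read 1 from the 5' end; read 2 from the other end, on the opposite strand."""
    return fragment[:read_length], revcomp(fragment[-read_length:])


def merge_pair(r1: str, r2: str, min_overlap: int = 10) -> Optional[str]:
    """Join overlapping mates using the longest exact overlap, or return None."""
    r2_fwd = revcomp(r2)  # put read 2 back on the read-1 strand
    for overlap in range(min(len(r1), len(r2_fwd)), min_overlap - 1, -1):
        if r1[-overlap:] == r2_fwd[:overlap]:
            return r1 + r2_fwd[overlap:]
    return None


fragment = "ATGCGTACCTGAAGTCCATGGATCCTTAGC"  # hypothetical 30-bp insert
r1, r2 = paired_reads(fragment, 20)
print(merge_pair(r1, r2) == fragment)           # True: 10-bp overlap
print(merge_pair(*paired_reads(fragment, 12)))  # None: mates do not overlap
```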
The CRT method overcomes the disadvantages of SNA by incorporating only a single nucleotide at a time. However, as the sequencing reaction proceeds, the error rate of the instrument increases because of incomplete removal of the fluorescent signals from previous cycles, which raises background noise levels. Since sequencing errors accumulate toward the end of the read, longer reads, which can be trimmed, are preferred over shorter ones. Longer reads are also favored because they map more precisely to the reference genome.
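The usual remedy for errors that accumulate toward the read end is quality trimming. The Python sketch below assumes FASTQ-style Phred+33 quality strings, converts Phred scores to error probabilities (p = 10^(−Q/10)), and removes low-quality bases from the 3′ end until a base meets a chosen threshold. The Q20 threshold and the example read are illustrative; production trimmers typically use more elaborate rules, such as sliding windows.

```python
def phred_to_error(q: int) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def decode_qualities(qual: str, offset: int = 33) -> list[int]:
    """Phred scores from a FASTQ quality string (Phred+33 encoding)."""
    return [ord(c) - offset for c in qual]


def trim_3prime(seq: str, quals: list[int], min_q: int = 20) -> tuple[str, list[int]]:
    """Drop trailing bases whose quality is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


quals = decode_qualities("GFD?-)")   # hypothetical read whose quality decays
print(quals)                         # [38, 37, 35, 30, 12, 8]
print(trim_3prime("ACGTAC", quals))  # ('ACGT', [38, 37, 35, 30])
print(phred_to_error(20))            # roughly 0.01, i.e. 1 error in 100 calls
```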
The chemistry of Illumina analyzers is also more prone to substitution errors than to InDel errors, especially when the previously incorporated nucleotide is guanine (G).
5.1.2.3. Thermo Fisher (Ion PGM)
Thermo Fisher Ion instruments use a pH-based detection method. The sequencing reaction follows the sequencing-by-synthesis SNA approach described previously, but the detection of incorporated nucleotides is substantially different.
The addition of a new dNTP on the extending DNA strand involves the formation of a covalent bond and the release of pyrophosphate and a positively charged hydrogen ion (proton). The shift in the pH level is detected by an ion-sensitive layer with a sensor on the bottom of the microwells of a semiconductor chip, where sequencing takes place.
Different sequencing chips, with increasing numbers of wells, allow different strategies to be applied. Read lengths range from 200 to 400 bp and, depending on the chip, a PGM run yields from as few as 0.4 M to as many as 5.5 M reads.
The breakthrough of this technique is that it requires no optical devices, which contribute to base-calling errors, lower speed, higher cost, and larger instrument size. Moreover, the use of unmodified nucleotides circumvents potential biases arising from the incorporation of modified ones. Another advantage is the runtime of 2 to 7 h, shorter than that of its competitors.
However, the drawbacks of the SNA sequencing-by-synthesis method discussed previously also apply here, namely a higher InDel error rate and difficulties in sequencing homopolymer regions (>6 bp) [40, 43, 46].
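The throughput figures quoted in this section translate into sample capacity with simple arithmetic. The Python sketch below uses the per-run read counts given above (lower bounds where a range is given); the number of loci typed per sample, the reads required per locus, and the share of reads that pass filtering are hypothetical placeholders to be replaced with a laboratory's own validated values.

```python
# Per-run read counts quoted in this section (lower bounds where a range is given).
RUN_READS = {
    "454 GS Junior": 100_000,
    "Ion PGM (smallest chip)": 400_000,
    "454 GS FLX Titanium": 1_000_000,
    "Ion PGM (largest chip)": 5_500_000,
    "MiSeq": 44_000_000,
}


def samples_per_run(reads_per_run: int, loci_per_sample: int,
                    reads_per_locus: int, usable_percent: int = 80) -> int:
    """Whole samples that fit in one run (integer arithmetic, rounded down)."""
    usable_reads = reads_per_run * usable_percent // 100
    return usable_reads // (loci_per_sample * reads_per_locus)


# Hypothetical requirement: 11 loci per sample, 1,000 reads per locus.
for platform, reads in RUN_READS.items():
    print(f"{platform}: {samples_per_run(reads, 11, 1_000)} samples")
```

Read count is only part of the picture: read length determines how many reads are needed to span each locus, so this estimate is an upper bound rather than a validated capacity.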
5.2. The third generation of sequencers (single-molecule sequencing)
While the development and optimization of second-generation sequencers is still ongoing, the third generation, which analyzes single-molecule templates without the need for DNA pre-amplification, is already in the field.
These sequencers promise even lower cost-per-base, easier sample preparation from smaller amounts of starting gDNA, significantly faster run times, simplified primary data analysis, and longer read lengths (thousands of base pairs and more).
Longer reads simplify sequence assembly and facilitate polymorphism analysis and complete haplotype phasing, both of which are especially important for accurate HLA typing and for resolving phase ambiguities.
In addition, omitting the PCR step avoids potential biases arising from AT-rich and GC-rich target sequences, prevents the introduction of nonexistent variants through PCR amplification errors, and reduces template preparation time [38, 43].
We distinguish two such platforms: Pacific Biosciences (RS II) and Oxford Nanopore (MinION).
5.2.1. Pacific Biosciences (RS II): single-molecule real-time (SMRT) sequencing
The Pacific Biosciences (PacBio) instruments use a flowcell (SMRT Cell) containing many individual wells with transparent bottoms (zero-mode waveguides; ZMWs), each holding 20 zeptoliters (1 zL = 10⁻²¹ L).
SMRT technology uses short single-stranded hairpin adaptors (SMRTbell adaptors) that are ligated to both ends of the double-stranded DNA fragments. The resulting template is topologically circular, with single-stranded DNA (ssDNA) loops at the ends and a double-stranded DNA (dsDNA) region in the middle.
Size selection follows, to retain fragments of the preferred length, from less than 3 kb up to around 20 kb, according to the purpose of the experiment.
A single phi29 DNA polymerase molecule anchored to the bottom of each well binds a single DNA molecule and starts copying it. Labeled dNTPs are incorporated one at a time: upon binding, the fluorophore is excited by a laser and its emission is recorded by a camera; the dye is then cleaved off, and the polymerase can incorporate the next labeled dNTP. In each well, the color of each light pulse captured by the camera identifies the dNTP added to the growing strand.
Generally, the runtime and throughput of the instrument can be tuned by the user. Longer templates require longer times in order to extract consistent results.
PacBio template preparation takes 4–6 h, much less than the corresponding procedure for second-generation sequencers. In addition, as previously described, no PCR step is needed, which reduces biases and errors. Turnaround time is also shorter, with RS II runs finishing within 4 h. The average read length is 10–15 kb, with some reads reaching 20,000 bp, longer than that of any second-generation sequencer [43, 46].
The main drawback of this technique is its high error rate, owing to the short interval between two successive nucleotide incorporation events. Most errors occur as random (stochastic) events rather than systematic biases; therefore, sequencing each nucleotide of the circular template many times yields higher coverage and improves accuracy to up to 99%.
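The benefit of reading the circular template repeatedly can be estimated with a binomial calculation. The Python sketch below rests on simplifying assumptions that real PacBio data do not satisfy: per-pass errors are treated as independent substitutions at a fixed rate, whereas PacBio errors are mostly insertions and deletions, and real consensus callers align the passes first. Under those assumptions it computes the probability that most of an odd number of passes are wrong at a position, an upper bound on the error of a simple majority vote. The 15% per-pass error rate is illustrative.

```python
from math import comb


def majority_error(per_pass_error: float, passes: int) -> float:
    """P(most passes are wrong at a position), independent errors, odd passes."""
    if passes < 1 or passes % 2 == 0:
        raise ValueError("use an odd, positive number of passes to avoid ties")
    need = passes // 2 + 1  # wrong calls needed to outvote the correct base
    return sum(
        comb(passes, k) * per_pass_error**k * (1 - per_pass_error) ** (passes - k)
        for k in range(need, passes + 1)
    )


for n in (1, 3, 5, 9, 15):
    print(f"{n:2d} passes: consensus accuracy >= {1 - majority_error(0.15, n):.4f}")
```

Because random errors rarely coincide at the same position across passes, accuracy rises quickly with the number of passes, whereas a systematic error would be repeated in every pass and never voted out.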
5.2.2. Oxford Nanopore (MinION)
MinION is a third-generation sequencer from Oxford Nanopore that uses a tiny biological pore with a nanoscale diameter and an attached exonuclease.
The dsDNA fragments are ligated to two adapters, a leader and a hairpin, one at each end. The hairpin adapter holds the two strands together so that they can be read as one continuous single strand, while the leader adapter directs the DNA to the exonuclease, which cleaves off each base in turn and guides it through the pore.
The concept is that the ssDNA passing through the α-hemolysin (αHL) pore disrupts the continuous ionic current applied across the pore. The disruption is detected by standard electrophysiological techniques. The current modulation differs for each nucleotide passing through the pore, a property that allows the nucleotides to be discriminated. The ionic current is restored after each trapped nucleotide exits the pore.
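Discriminating nucleotides by current modulation amounts to segmenting the current trace into blockade events and matching each event to a characteristic level. The Python sketch below assumes an idealized trace in which the open-pore current sits well above a threshold and each trapped nucleotide produces one blockade; the per-base current levels and the trace are hypothetical numbers, not measured αHL values. Real signals are far noisier and more varied.

```python
# Hypothetical mean blockade currents (pA) per nucleotide; real values differ.
LEVELS = {"A": 50.0, "C": 40.0, "G": 60.0, "T": 30.0}


def segment_events(trace: list[float], threshold: float = 80.0) -> list[float]:
    """Mean current of each contiguous run of samples below the open-pore threshold."""
    events, current = [], []
    for sample in trace:
        if sample < threshold:
            current.append(sample)  # pore is blocked
        elif current:
            events.append(sum(current) / len(current))
            current = []            # current restored: the event ends
    if current:
        events.append(sum(current) / len(current))
    return events


def call_events(events: list[float], levels: dict[str, float] = LEVELS) -> str:
    """Assign each event to the base with the nearest reference level."""
    return "".join(min(levels, key=lambda b: abs(levels[b] - level)) for level in events)


trace = [100, 101, 51, 49, 50, 99, 100, 29, 31, 100, 61, 59, 100]  # hypothetical pA samples
print(call_events(segment_events(trace)))  # ATG
```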
This type of sequencing requires no polymerase: there is no DNA polymerization or nucleotide incorporation, and no detection of pH changes. Because all it needs is a single ssDNA molecule and two suitable adapters, one at each end, to guide it through the exonuclease and the pore, the cost of sequencing is substantially lowered.
This approach is also free of fluorescent tags and therefore shares the advantages described above for Thermo Fisher's Ion technology, although the two differ in concept. In addition, avoiding enzymes such as polymerase makes nanopore sequencing more reliable, as it is less sensitive to temperature changes during sequencing.
A drawback of this technique is its high error rate, up to 30%, mainly in InDel detection. This stems from the technique's ability to detect more than 1000 different signals arising from variation among the nucleotides passing through the pore, especially when the modified bases present in native DNA are taken into account. For the same reason, homopolymers are difficult to recognize [43, 46].
Usually, only exons 2 and 3 of HLA class I genes and exon 2 of HLA class II genes are assessed for polymorphisms, while other parts of the HLA genes are not, owing to time and cost constraints. Many databases, such as dbMHC, AFND, IMGT/HLA, and others [12, 48, 49], keep a record of known HLA allelic sequences and track newly found polymorphisms. Most HLA allelic sequences are stored there only as coding sequences (CDS) or partial exonic regions. Consequently, 8-digit (four-field) HLA typing resolution, which is the ultimate goal, cannot be achieved.
To address this issue, the complete genomic sequences of all HLA exons, together with important HLA regulatory sequences and other significant genes, must be analyzed.
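Why coding-only reference data cap resolution follows from the structure of WHO HLA allele names, in which colon-separated fields encode progressively finer differences: the allele group, the specific protein, synonymous substitutions in the coding region, and differences in noncoding regions such as introns. An "8-digit" type corresponds to four fields (eight digits when each field has two), and the fourth field cannot be assigned without intronic or untranslated sequence. The Python sketch below parses allele names and reports their resolution; the regular expression is a simplified assumption that handles common names, not a validator for the full nomenclature.

```python
import re
from typing import Optional

# Simplified pattern: optional "HLA-", gene, "*", 1-4 colon-separated fields, optional suffix.
ALLELE = re.compile(
    r"^(?:HLA-)?(?P<gene>[A-Z0-9]+)\*(?P<fields>\d{2,3}(?::\d{2,3}){0,3})(?P<suffix>[NLSCAQ]?)$"
)

FIELD_MEANING = [
    "allele group",
    "specific HLA protein",
    "synonymous substitution within the coding region",
    "difference in a noncoding region",
]


def parse_allele(name: str) -> tuple[str, list[str], Optional[str]]:
    """Split an allele name into gene, field values, and expression suffix (if any)."""
    match = ALLELE.match(name)
    if match is None:
        raise ValueError(f"unrecognized HLA allele name: {name}")
    return match["gene"], match["fields"].split(":"), match["suffix"] or None


def resolution(name: str) -> tuple[int, str]:
    """Number of fields and the finest level of difference they resolve."""
    _, fields, _ = parse_allele(name)
    return len(fields), FIELD_MEANING[len(fields) - 1]


print(resolution("HLA-DRB1*15:01"))     # (2, 'specific HLA protein')
print(resolution("HLA-A*02:01:01:01"))  # (4, 'difference in a noncoding region')
```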
The high-resolution HLA typing offered by NGS is advantageous compared with the existing PCR/SSO, PCR/SSP, and PCR/SBT techniques, but it has limitations that seem impossible to overcome, the most important being its inability to sequence fragments long enough to resolve the allele phasing issue without ambiguity.
This is important because HLA is a highly polymorphic region, so it is quite difficult to determine which variants are associated with the final phenotype, the latter resulting from the complete end-to-end haplotype, that is, all variants across the MHC and other genomic loci [16, 40].
The above issues of HLA ambiguity and of phasing multiple short-read fragments are resolved by third-generation sequencers, which can analyze long reads covering the entire intronic and exonic regions of whole genes. Further optimization, together with gradual reductions in the cost and error rates of these sequencers, will establish their dominance in the field of HLA typing.