Tracking Salmonella Enteritidis in the Genomics Era: Clade Definition Using a SNP-PCR Assay and Implications for Population Structure

Salmonella enterica serovar Enteritidis (or Salmonella Enteritidis, SE) is one of the oldest members of the genus Salmonella, based on the date of first description, yet it has only gained prominence as a significant bacterial contaminant of food over the last three or four decades. Currently, SE is the most common Salmonella serovar causing foodborne illnesses. Control measures to reduce human infections require that food isolates be characterized. Until recently, this was carried out using Pulsed-Field Gel Electrophoresis (PFGE) and phage typing as the main laboratory subtyping tools for demonstrating relatedness of isolates recovered from infected humans and the food source. The results provided by these analytical tools were presented with an easy-to-understand nomenclature; however, the techniques were inherently poorly discriminatory, which is attributable to the clonality of SE. The tools have now given way to whole genome sequencing, which provides a comprehensive account of the genetic attributes of an organism and is a superior tool for defining an isolate and for inferring genetic relatedness among isolates. A comparative phylogenomic analysis of isolates of choice provides both a visual appreciation of relatedness and quantifiable estimates of genetic distance. Despite the considerable information provided by whole genome analysis and development of a phylogenetic tree, the approach does not lend itself to generating a useful nomenclature-based description of SE subtypes. To this end, a highly discriminatory, cost-effective, high throughput, validated single nucleotide polymorphism-based genotyping polymerase chain reaction assay (SNP-PCR) was developed focussing on 60 polymorphic loci. The procedure was used to identify 25 circulating clades of SE, the largest number so far described for this organism.
The new subtyping test, which exploited whole genome sequencing data, displays the attributes of an ideal subtyping test: high discrimination, low cost, speed, high reproducibility and epidemiological concordance. The procedure is useful for identifying the subtype designation of an isolate, for defining the population structure of the organism as well as for surveillance and outbreak detection.
Salmonella spp. A Global Challenge


Introduction
The genus Salmonella contains a large number of Gram-negative bacteria primarily found in the gastrointestinal tract of vertebrates including humans, cattle, pigs, horses, companion animals, birds, reptiles and fish [1]. There are two species of Salmonella, namely Salmonella enterica and S. bongori [2]. Salmonella enterica is the species of relevance in food safety, and consists of six subspecies of varying importance in human health. Salmonella enterica subspecies enterica has received the greatest attention because of its large number of constituent organisms, now estimated at about 2,600, each defined as a serovar based on the Kauffmann-White classification [1]. Salmonella enterica serovar Enteritidis (commonly written as Salmonella Enteritidis or SE) is the most prominent. The organism was originally described as a distinct species and named Salmonella enteritidis, alongside two other species, namely Salmonella choleraesuis and Salmonella typhi. Since those early days, the taxonomy of Salmonella has changed to reflect two species and hundreds of serovars. Curiously, only a limited number of S. enterica serovars are associated with foodborne illnesses, and among these SE has emerged over the last few decades as the most prevalent cause of foodborne salmonellosis in humans worldwide [3]. However, this has not always been the case, and prior to the 1970s there was only the occasional report of foodborne salmonellosis attributable to SE.
The earliest reports of foodborne illnesses caused by Salmonella were attributed to duck egg sources, as summarized by Scott [4]. Subsequently, the organism was found in live chicks, ducks and ducklings [5,6]. Although these early reports came from different countries, SE did not become a common cause of foodborne illnesses until the 1980s [7]. By 1994, SE was the most commonly reported Salmonella serotype, with an incidence of 110 laboratory-confirmed infections per 100,000 population in the Northeast of the US, and shell eggs from hens were identified as the major vehicle for SE infection in humans [8], in contrast to the earlier reports incriminating duck eggs. A 2010 outbreak of egg-related SE infections in the US resulted in an estimated 1,939 illnesses and a recall of over 500 million eggs, which ranked as the largest egg recall in history and one of the most expensive food recalls ever [9]. Similar events occurred in other parts of the world and were severe enough to warrant a warning of a new pandemic [7]. SE, together with two other serovars, namely Typhimurium and Heidelberg, makes up the three most common serovars, which alone account for 59% of Salmonella outbreaks in humans in Canada, while the 10 most commonly observed Salmonella serovars account for about 76% of the total Salmonella infections reported. Establishing epidemiological linkages between contaminated products and human disease for Salmonella serovars has been particularly difficult for a number of reasons. One of the historically important reasons has been the clonal nature of many of the dominant serovars, especially Enteritidis, which makes discrimination of strains difficult and the attribution of a particular strain linked with illness to a food source particularly challenging.
One resource that has been used by researchers to study SE is the strain P125109 phage type 4 (PT4), which was isolated from an outbreak of human food poisoning in the United Kingdom and traced back to a poultry farm. The strain is highly virulent in newly hatched chickens and is also invasive in laying hens, resulting in egg contamination [10,11]. The complete genome sequence of the host-promiscuous SE PT4 isolate P125109 was determined by Thomson et al. in 2008 [12].
Tracking Salmonella Enteritidis in the Genomics Era: Clade Definition Using a SNP-PCR Assay… DOI: http://dx.doi.org/10.5772/intechopen.98309
Next generation sequencing (NGS), and especially whole genome sequencing (WGS), has emerged in recent years and has made it possible to sequence bacterial genomes within hours, a remarkable feat that is revolutionizing the field of microbiology. With the advent of microbial WGS, new light is being shed on the nature of pathogens, and our understanding of the biology of Salmonella is steadily increasing as Salmonella genomes are generated at an increasingly rapid rate and deposited in public databases. Further understanding of genome diversity and variation of bacterial pathogens has the potential to improve quantitative risk assessment and to clarify the evolution of Salmonella, the relationships among strains and serovars, the emergence of new strains and the role of mobile genetic elements, especially plasmids and bacteriophages, in Salmonella [13]. The recent development of the Salmonella SystOmics database (SalFoS, https://salfos.ibis.ulaval.ca/), a rich collection of over 3000 Salmonella genomes and their metadata, represents a milestone and an important resource for future approaches to mitigate the burden of foodborne salmonellosis [14].
Food safety, which is significantly impacted by Salmonella, has gained from the advent of microbial genomics. Subspecies characterization, including serovar identification and strain differentiation, can now be done using genomic approaches. As will soon be evident to the reader, there is much work yet to be done as the new capacity is yet to translate to tangible benefits to the consumer. Outbreaks caused by SE have remained at a high level or are even increasing, and there is a need to evaluate the efficacy of procedures used to detect the organism in food as well as approaches used in tracking the organism through the entire spectrum of the food chain, from farm to fork.

Culture procedures for Salmonella
Culture-based methods are commonly employed to detect pathogens in food, and in clinical and environmental samples. The Compendium of Analytical Methods (https://www.canada.ca/en/health-canada/services/food-nutrition/research-programs-analytical-methods/analytical-methods/compendium-methods.html) and the Bacteriological Analytical Manual (https://www.fda.gov/food/laboratory-methods-food/bacteriological-analytical-manual-bam) are compilations of laboratory procedures developed by the food safety regulatory agencies in Canada and the United States, respectively, and each contains a catalog of official and recommended methods for isolating and detecting Salmonella. Briefly, Salmonella detection in food relies on a series of culture steps in broth formulations optimized to resuscitate Salmonella following injury caused by food handling, processing and storage and to reduce the abundance of competing bacteria [15]. Many enrichment protocols, broths and culture plates have been described for the isolation of Salmonella from different types of samples and matrices [16][17][18]. Typically, the first step is to culture a suspect food sample in a non-selective pre-enrichment broth, examples of which are lactose broth, buffered peptone water, trypticase soy broth, brilliant green water, powdered milk with brilliant green and universal pre-enrichment broth [16]. Following an overnight incubation, commonly performed at 37°C, the culture material is transferred into a selective enrichment broth which suppresses the growth of non-salmonellae while expanding the Salmonella population, facilitating isolation by plating on the appropriate media [19,20]. Tetrathionate (TT) and Rappaport-Vassiliadis (RV) broths and RV semi-solid medium are the most commonly used selective culture conditions, with incubation at 37°C or 42°C overnight or for several days [15,19].
When used to detect the presence of a microorganism in a food sample, laboratory culture procedures are slow and time consuming, requiring the sequential use of non-selective and selective enrichment broths and could take a week or longer. Another disadvantage is the documented inherent bias in the performance of selective broths which results in the preferential recovery of certain Salmonella serovars and not others [17,21,22]. For instance, different Salmonella serotypes are recovered by culture procedures performed on non-clinical, non-human sources when compared to samples tested in hospitals and other clinical settings from patients experiencing symptoms. Experimental results show that members of some Salmonella serogroups are unable to effectively compete with other serovars leading to a reduced efficiency of recovery of some Salmonella organisms including SE, from contaminated food [21]. The use of culture-independent procedures that can lead to rapid and sensitive detection of Salmonella [23] may in time eclipse the routine use of culture methods for detection. Nevertheless, the recovery of Salmonella in food is currently required to establish risk to the consumer and in support of a regulatory action. For this reason, and for the purpose of building inventories of microbial organisms for clinical and regulatory food microbiology, culture procedures are expected to remain in use. A wide variety of selective plating media are available for the isolation of Salmonella and a number of them will now be examined.

Xylose lysine desoxycholate (XLD) agar
XLD agar is a selective growth medium originally shown to facilitate the isolation of Shigella; it also proved useful for Salmonella isolation and has been further modified since its first description [24,25]. At pH 7.4, XLD agar appears bright pink or red as a result of the phenol red indicator. Salmonella ferments xylose, a sugar, to produce acid and the bacterial colony turns yellow. In time, xylose is consumed and lysine is in turn utilized; its decarboxylation produces an alkaline environment and colonies turn back to red. In contrast, Shigella cannot ferment xylose and the colony remains red. Salmonella is able to metabolize thiosulfate to produce hydrogen sulfide, leading to the formation of colonies with black centres, which is an important feature in differentiating Salmonella colonies from Shigella. XLD agar is capable of supporting other members of Enterobacteriaceae such as Escherichia coli; however, the colonies and medium turn yellow because of the fermentation of lactose, which is also present in the agar. Pseudomonas aeruginosa is also able to grow on XLD plates as pink, flat, rough colonies but will not metabolize thiosulfate or turn black. Proteus organisms can grow on XLD to give rose-colored colonies and can sometimes metabolize thiosulfate to render the colonies black, which can readily be confused with Salmonella. In addition, Salmonella strains have been described that do not metabolize thiosulfate and grow as pink colonies, which can readily be confused with Shigella. Thus, XLD agar is a moderately selective medium for isolating Salmonella and for differentiating it from other organisms.

Xylose lysine Tergitol-4 (XLT-4) agar
Similar to XLD agar, XLT-4 agar is a selective culture medium used to isolate and identify Salmonella in food and environmental samples. Compared to XLD agar, XLT-4 is supplemented with a surfactant, 7-ethyl-2-methyl-4-undecanol hydrogen sulfate, commonly referred to as Tergitol 4, while lacking sodium chloride and sodium desoxycholate. The surfactant is responsible for the inhibition of Proteus spp. and other non-salmonellae. XLT-4 agar is clearly one of the most stringent of all selective culture plates used for isolating Salmonella, with positive colonies growing as red and eventually turning black, starting from the centre, as a result of hydrogen sulfide production. However, Salmonella strains that fail to produce hydrogen sulfide appear as yellow colonies on XLT-4 agar [26,27].

XA medium: XLD agar modified by adding D-arabinose
XA medium is an improved selective and differential medium over XLD agar following its supplementation with arabinose, a sugar that is fermented by Citrobacter and Proteus but not by Salmonella [28]. The sensitivity of isolation of Salmonella using the XA and XLD media is equally high; however, the specificity of XA medium (92.0%) is superior to that of XLD (73.0%) [28]. Many Salmonella organisms appear as black colonies on XA agar, whereas non-salmonellae will either not grow or appear as pink colonies. The use of arabinose to differentiate Salmonella from other closely related organisms represents a cost-effective approach, especially when compared to chromogenic plates (see Section 2.1.7).

Hektoen enteric (HE) agar
HE agar is a selective and differential medium for isolating and distinguishing members of the genera Salmonella and Shigella from the other Enterobacteriaceae. HE agar has a blue appearance and contains indicators of lactose fermentation and hydrogen sulfide production while inhibiting the growth of Gram-positive bacteria. Species belonging to Enterobacteriaceae that are capable of fermenting one or more carbohydrates produce yellow or salmon-orange colored colonies, e.g., Klebsiella pneumoniae, which ferments lactose. Non-fermenters produce blue-green colonies. Organisms that reduce sulfur to hydrogen sulfide, such as Salmonella, will produce black colonies or blue-green colonies with a black center. In contrast, colonies of Shigella remain green and do not turn black because of their inability to metabolize sulfur.

MacConkey agar
MacConkey agar is used for the isolation of Gram-negative enteric bacteria, a large group that prominently includes Salmonella, E. coli, Proteus, Citrobacter, Klebsiella, Pseudomonas, Shigella, Enterobacter and Yersinia. These organisms grow on the agar because of the selective property conferred by crystal violet and bile salts, which inhibit the growth of Gram-positive bacteria. The indicator system is the neutral red dye, which turns red at a pH below 6.8 but is colorless at higher pH. Thus, lactose fermenters such as E. coli, Klebsiella and Enterobacter, which contain the lac operon, form red or pink colonies on MacConkey agar. In contrast, the other organisms, including Salmonella, are generally non-lactose fermenters and do not change color. Because Salmonella produces colonies similar to those of other non-lactose fermenters on MacConkey agar, the medium does not allow for identification of Salmonella, an objective that has to be achieved by employing other, more selective agars. At the same time, lactose-fermenting Salmonella have historically been shown to be causes of severe infections and outbreaks in humans [29]. Lactose fermentation in these strains is attributable to the presence of the lac operon, carried in the chromosome or on plasmids [30], and leads to colonies that appear pink or reddish on MacConkey agar. Despite its limitations, MacConkey agar can still be a very useful addition to the collection of media needed to comprehensively isolate and identify Salmonella in contaminated samples.

Brilliant green sulfa (BGS) agar
The selectivity of BGS agar is due to the presence of brilliant green and sulfadiazine, two components that individually inhibit Gram-positive bacteria and most Gram-negative bacilli. Phenol red is the pH indicator that detects changes in pH due to the fermentation of sucrose and/or lactose. Salmonella colonies range from reddish or pink to nearly white in color with a red zone. Lactose or sucrose fermenters occasionally grow on this medium and appear as yellow-green colonies surrounded by a yellow-green zone. The presence of sulfadiazine in the medium is effective in inhibiting the growth of E. coli and Proteus and, to a large extent, Shigella species [31]. In a later modification of BGS agar, the replacement of lactose with glucose and of sulfadiazine with novobiocin to create the novobiocin-brilliant green agar (NBG) led to a higher recovery of Salmonella, but the medium could not differentiate it from hydrogen sulfide-positive Citrobacter organisms [32].

Salmonella chromogenic agar
Chromogenic plates have been developed for Salmonella as an improved alternative to procedures that rely on the ability of the organism to produce hydrogen sulfide or its inability to ferment lactose, attributes that are not fully diagnostic of Salmonella. With the agar plates described above, this reliance often results in Citrobacter and Proteus species being mistakenly identified as Salmonella, while some atypical Salmonella are missed entirely. There are a number of commercially available chromogenic culture media which incorporate different chromogenic substrates and result in different colors of Salmonella colonies. Using the Salmonella chromogenic agar marketed by Oxoid (United Kingdom) as an example, the medium contains the substrate Magenta-cap (5-bromo-6-chloro-3-indolylcaprylate), which is hydrolyzed by Salmonella species to give magenta colonies. The second substrate, X-Gal (5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside), is hydrolyzed by many non-Salmonella species including Citrobacter and Proteus to give blue colonies [33,34]. The selection for Salmonella is further enhanced by the presence of bile salts, which inhibit Gram-positive bacteria, and of two antibiotics, namely novobiocin and cefsulodin, which inhibit Proteus and Pseudomonas, respectively.
The isolation of Salmonella colonies in contaminated food demonstrates the presence of live organisms that can potentially cause harm. As indicated above, the procedure requires a combination of culture conditions, and takes time. Molecular procedures that can rapidly detect Salmonella are often used to accelerate the process, to improve on sensitivity of detection and also to confirm colonies as Salmonella because of the challenges with the isolation of the bacteria as outlined above. Many molecular techniques are now available for serotype-specific identification of SE.

Identification of Salmonella Enteritidis
Many laboratory diagnostic platforms have been applied to detect and identify Salmonella contamination in food and these include PCR, the enzyme-linked immunosorbent assay and the lateral flow assay [35][36][37]. Examples are available as commercial products. Currently, the most popular platform is PCR and the most frequently used gene target is the invA gene. Nevertheless, many commercial products do not disclose their target for proprietary reasons. PCR assays have also been developed with other gene targets present either in the chromosome, e.g., flagellin [38], OriC [39], hilA [40], ttr [41] or on plasmids, e.g., SpvR operon [42]. Multiplex PCR assays that are able to detect and distinguish among multiple serovars have also been developed by including serovar-specific gene targets such as STM4449 (Typhimurium [43]), STM 4497 (Typhimurium [44]), fliC (Typhimurium [45]), sdfI (Enteritidis [46]) and sefA [29]. Recent work by Nadin-Davis and colleagues showed that many of the previously identified serovar-specific markers, especially sefA and fliC, were shared by other serovars, while highlighting the limitation of using a plasmid-encoded target [47].
A multiplex PCR method which is capable of detecting all Salmonella spp., while identifying and distinguishing SE from the other two most prevalent serovars namely Typhimurium [48] and Heidelberg (Ogunremi et al., unpublished) is now available. The PCR was designed to amplify DNA fragments from four Salmonella genes, namely, invA gene (211-bp fragment), iroB gene (309-bp fragment), Typhimurium STM 4497 (523-bp fragment), and Enteritidis SE147228 (612-bp fragment) and has lately incorporated a 124-bp Heidelberg-specific fragment.
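The band sizes listed above lend themselves to a simple lookup when reading a gel. The following is a minimal sketch, not the published assay's interpretation procedure: the amplicon sizes come from the text, while the sizing tolerance (`TOLERANCE_BP`), the function names and the decision rule (either genus-level marker counts as Salmonella) are assumptions made for illustration.

```python
# Expected amplicon sizes in base pairs, as given in the text.
AMPLICONS = {
    "invA": 211,        # Salmonella marker
    "iroB": 309,        # Salmonella marker
    "Heidelberg": 124,  # Heidelberg-specific fragment
    "STM4497": 523,     # Typhimurium
    "SE147228": 612,    # Enteritidis
}

# Map serovar-specific targets to serovar names.
SEROVAR_MARKERS = {
    "SE147228": "Enteritidis",
    "STM4497": "Typhimurium",
    "Heidelberg": "Heidelberg",
}

TOLERANCE_BP = 10  # assumed gel sizing tolerance, not from the source


def call_targets(band_sizes, tolerance=TOLERANCE_BP):
    """Return the targets whose expected size matches an observed band."""
    found = set()
    for name, expected in AMPLICONS.items():
        if any(abs(band - expected) <= tolerance for band in band_sizes):
            found.add(name)
    return found


def interpret(band_sizes):
    """Turn a list of observed band sizes into a plain-language call."""
    targets = call_targets(band_sizes)
    # Assumed rule: at least one genus-level marker must be present.
    if not {"invA", "iroB"} & targets:
        return "Salmonella not detected"
    serovars = sorted(SEROVAR_MARKERS[t] for t in targets if t in SEROVAR_MARKERS)
    if len(serovars) == 1:
        return f"Salmonella {serovars[0]}"
    if not serovars:
        return "Salmonella, serovar other than the three targeted"
    return "Ambiguous: " + ", ".join(serovars)


if __name__ == "__main__":
    print(interpret([211, 309, 612]))
    print(interpret([211, 309]))
```

In practice, band calling would be checked against a size ladder and positive controls; the tolerance here only stands in for that step.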
The identification of members of the genus Salmonella to the subspecies level, i.e., serovar, is pivotal in tracking these pathogens along the food chain, and the above molecular methods are very promising replacements for the traditional biochemical tests because of their ease of application and high specificity for identifying SE and the other serotypes.

Serotyping
Serotyping has consistently been the basis of public health surveillance of Salmonella and has retained this primary role, as a first-line typing method, in the era of WGS through the development of novel bioinformatics tools (see Section 3.3). Serotypes of Salmonella are defined by the presence of two types of antigens, namely a heat-stable, somatic O antigen, a component of the lipopolysaccharide envelope covering the organism and an important virulence factor, and the H antigen, which is present on the flagella of the organism [49]. The antigenic properties of the O antigen are depicted as numerals, e.g., 1,9,12 for SE. In contrast, the H antigens are described using one or a few letters for the phase I antigen (e.g., g, m for SE) and, where the flagella also bear a phase II antigen, a further combination of letters and numbers (e.g., r and 1, 2 for Heidelberg). Agglutination assays are performed on the organisms using antibodies that recognize specific antigenic molecules and are developed through a laborious cross-absorption process against other serovars [50]. The result is an elaborate classification scheme, developed by Kauffmann and White [51,52], which has now led to the identification of some 2,600 serotypes of Salmonella. The complexity has been further enhanced by the ability of plasmids and prophages to alter the expression of some of the antigens, and this has led to a frequent re-evaluation of some serovar designations. Fortunately, these alterations are fairly rare and the serotyping scheme has served well since it was first proposed by Schütze in 1920 [53]. Of the large number of Salmonella serovars identified so far, only a relatively small number, perhaps no more than 100 serovars, are commonly associated with foodborne illnesses [54,55].
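The antigenic description above can be written compactly as an O : phase I H : phase II H formula. The sketch below is illustrative only: it encodes the SE antigens quoted in the text, assumes the common convention of writing "-" for an absent phase II antigen, and uses a one-entry lookup table (`KNOWN_FORMULAE`, a hypothetical name) rather than the full scheme.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AntigenicFormula:
    """O antigens, phase I H antigens and optional phase II H antigens."""
    o_antigens: Tuple[str, ...]
    h_phase1: Tuple[str, ...]
    h_phase2: Tuple[str, ...] = ()

    def __str__(self) -> str:
        # "-" marks an absent phase II antigen (assumed notation).
        phase2 = ",".join(self.h_phase2) or "-"
        return ":".join([",".join(self.o_antigens), ",".join(self.h_phase1), phase2])


# Antigens for SE as quoted in the text: O 1,9,12 and phase I H g,m.
ENTERITIDIS = AntigenicFormula(("1", "9", "12"), ("g", "m"))

# Illustrative lookup table; a real scheme holds thousands of entries.
KNOWN_FORMULAE = {ENTERITIDIS: "Enteritidis"}


def lookup_serovar(formula: AntigenicFormula) -> Optional[str]:
    """Return the serovar name for a formula, or None if it is not in the table."""
    return KNOWN_FORMULAE.get(formula)


if __name__ == "__main__":
    print(ENTERITIDIS, "->", lookup_serovar(ENTERITIDIS))
```

A frozen dataclass is used so that a formula can serve directly as a dictionary key.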

Traditional subtyping procedures for Salmonella Enteritidis
There are two approaches for the subspecies characterization of SE. Phenotypic tests rely on the biochemical properties of the live organism, and the most prominent example is phage typing. More recently, DNA-based approaches, or genotypic tests, have dominated the field, the most widely used being Pulsed-Field Gel Electrophoresis. Over the last few years, whole genome sequencing of the DNA of SE has become the dominant subtyping method in the developed world.

Pulsed-field gel electrophoresis (PFGE)
PFGE can be used to characterize bacterial isolates based on the pattern of distribution of restriction enzyme sites present in the organism's DNA. For Salmonella, the electrophoretic mobility of DNA fragments digested by the restriction enzyme XbaI or BlnI produces a characteristic fingerprinting pattern that is used to subtype the isolate. During the period between 2009 and 2019, the Canadian Food Inspection Agency used PFGE for outbreak investigations as one of the two subtyping tests for SE, the other being phage typing. Despite the presence of hundreds of different PFGE types among field isolates of SE, only two PFGE types predominated and each consisted of thousands of isolates in the Canadian PulseNet database. The two commonest Canadian primary PFGE types, namely SEN.XAI 0003 and SEN.XAI 0006, were responsible for 33.8 and 19.2% of Canadian SE isolates documented in the PulseNet database between 2012 and 2017 (Ogunremi, Allain and Nadon, unpublished). The predominance of only a few PFGE SE types was long recognized as a consequence of the poor discriminatory ability of the technique for analyzing the relatedness of SE isolates (Table 1) rather than a reflection of an evolutionary dominance of a few circulating strains [56]. These observations led to the pursuit of WGS as an alternative approach [57].
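The principle behind a PFGE fingerprint, and one reason closely related isolates can share a pattern, can be illustrated with an in silico digest. This is a minimal sketch under stated assumptions: it treats the sequence as a linear molecule (a bacterial chromosome is circular), uses toy sequences, and relies on the known recognition sites of XbaI (T^CTAGA) and BlnI (C^CTAGG); the function names are hypothetical.

```python
import re

# Recognition sequence and cut offset (bases from the start of the site).
# Both sites are palindromic, so scanning one strand is sufficient.
ENZYMES = {
    "XbaI": ("TCTAGA", 1),
    "BlnI": ("CCTAGG", 1),
}


def digest(sequence, enzyme="XbaI"):
    """Return fragment lengths (largest first) for a linear sequence."""
    site, offset = ENZYMES[enzyme]
    seq = sequence.upper()
    # A lookahead pattern also finds overlapping occurrences of the site.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return sorted((end - start for start, end in zip(bounds, bounds[1:])), reverse=True)


def same_pattern(seq_a, seq_b, enzyme="XbaI"):
    """True when two sequences give identical fragment-length patterns."""
    return digest(seq_a, enzyme) == digest(seq_b, enzyme)


if __name__ == "__main__":
    isolate_1 = "AAATCTAGAGGGG"
    isolate_2 = "AACTCTAGAGGGG"  # one base differs, outside the XbaI site
    print(digest(isolate_1))
    print(same_pattern(isolate_1, isolate_2))
```

The two toy isolates differ by a single base yet yield the same fragment lengths, which mirrors why substitutions that do not create or destroy a restriction site are invisible to PFGE.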

Phage typing
In contrast to PFGE, phage typing is a phenotypic test that exploits the ability of certain bacteriophages, i.e., viruses that infect bacteria, to differentially attach to and gain entrance into strains of bacteria. Phage typing of SE is the outcome of the pattern of susceptibility of different strains to a bacteriophage or a combination of bacteriophages, resulting in lysis of the bacterial cell [58]. A large number of phage types of SE have been described in Canada and elsewhere; however, phage types 8, 13 and 13a were observed to predominate in Canada [59]. This observation may not reflect the presence of a few circulating dominant strains of SE in Canada, but instead may be a consequence of the inadequacy of phage typing as a discriminatory tool that can accurately delineate the population structure of SE in Canada, similar to PFGE as discussed above (see Section 3.2.2 and Table 1). The plasticity of phage types also diminishes the usefulness of phage typing as a subtyping tool. Factors such as the restriction system within the bacteria, the ability of lipopolysaccharides and outer membranes to adsorb the bacteriophage, and the immune system of the vertebrate host infected by the bacteria can alter the phage type of an organism [60]. The reagents used for phage typing require very rigorous quality control and yet test performance can be remarkably different among laboratories [61]. Changes occurring within an organism, such as the acquisition or loss of an IncN plasmid [62,63], transfer of an IncX plasmid [64] or loss of the lipopolysaccharide layer [65], have been shown to lead to poor test reproducibility. Thus, two isolates with the same phage type may in fact be unrelated and, conversely, two isolates that show distinct phage types may be closely related. As a result of these factors, phage typing shows inadequate discriminatory power, partial typeability and poor reproducibility [66].

Multiple locus variable-number tandem repeat analysis (MLVA) assay
MLVA is a molecular typing method that is based on PCR amplification of polymorphic regions of the DNA containing variable numbers of tandemly repeated sequences. The method has been standardized by PulseNet International and applied to the epidemiological investigations of SE either as a supplement or substitute for PFGE subtyping [67,68]. An advantage of MLVA is the designation of the typing results with a numeric sequence of tandem repeats. This represents a simple, easy-to-understand nomenclature which facilitates the reporting and exchange of test results between laboratories, and translates to a reliable tracking of an organism during epidemiological investigations. The discriminative ability of MLVA has been variously shown to be superior to [69], equivalent to [70] or poorer than [71] that of PFGE.
Detailed genetic studies of SE have consistently pointed to the underlying cause of the poor discriminatory abilities of available subtyping tools: isolates of SE are extremely similar (i.e., are highly clonal) and this poses a difficulty in finding a definitive, distinguishing trait that could be used to track lineages [70,72,73]. The timely arrival and increasing adoption of WGS has altered the analytical landscape.
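The numeric MLVA designation follows from simple arithmetic on amplicon sizes: the repeat number at a locus is the amplicon length minus the flanking sequence, divided by the repeat unit length. The sketch below assumes hypothetical loci (`locusA` to `locusC`) with invented flank and repeat-unit sizes; it is not the PulseNet protocol or its locus set.

```python
# Hypothetical loci: (name, flanking length in bp, repeat unit length in bp).
# These values are invented for illustration and are not a published scheme.
LOCI = [
    ("locusA", 100, 7),
    ("locusB", 150, 6),
    ("locusC", 90, 12),
]


def repeat_count(amplicon_bp, flank_bp, repeat_bp):
    """Estimate the number of tandem repeats from the amplicon size."""
    if amplicon_bp < flank_bp:
        raise ValueError("amplicon is shorter than its flanking sequence")
    # Rounding absorbs small sizing errors from fragment analysis.
    return round((amplicon_bp - flank_bp) / repeat_bp)


def mlva_profile(amplicons, loci=LOCI):
    """Join per-locus repeat counts into a profile string such as '6-5-3'."""
    return "-".join(
        str(repeat_count(amplicons[name], flank, unit)) for name, flank, unit in loci
    )


if __name__ == "__main__":
    measured = {"locusA": 142, "locusB": 180, "locusC": 126}
    print(mlva_profile(measured))
```

Two isolates can then be compared by checking whether their profile strings are identical, which is what makes the nomenclature easy to exchange between laboratories.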

Application of whole genome sequencing (WGS) in Salmonella Enteritidis: identification and characterization
The development of WGS procedures has heralded the application of a powerful technology for the identification and characterization of SE [57], which has been used for outbreak investigations [74], trace back procedures [75] and surveillance [76]. Furthermore, WGS analysis of SE has provided insights into the phylogenetic relatedness of isolates and the presence and prevalence of antimicrobial resistance genes, novel mobile elements, virulence markers and bacteriophages in strains of the organism isolated from humans, food animals, production facilities and environmental sources [77][78][79]. Relevant to developing long term control and intervention strategies are the insights to be gained from the increasing application of WGS to the understanding of transmission dynamics of SE, as was done in Chile to infer possible transmission of SE between gulls, poultry, and humans [80]. Bioinformatics approaches that allow useful information to be mined from genome sequences will now be discussed.

Whole genome-based serotyping
Serovar prediction can now be done on Salmonella isolates if the whole genome sequence is available, by replacing the laborious agglutination assay (see Section 3.1) with an in silico analysis of the nucleotide sequence of the organism. Effectively, the gold standard of traditional serology based on the Kauffmann-White Scheme has been replaced in the developed economies with in silico approaches [81]. Two of the most widely used tools for this purpose are the Salmonella In Silico Typing Resource (SISTR) software and the SeqSero2 software [82,83].
SISTR is an open, web-based bioinformatics platform capable of rapid in silico analyses of minimally processed draft assemblies of Salmonella genomes to generate accurate serovar designations. A collection of markers previously developed for the various Salmonella serovars formed the basis of the new tool [84]. The performance of SISTR is enhanced by the integration of additional multilocus sequence typing tools (see Section 3.3.2), which as a separate platform has been suggested as a replacement for the use of serotypes to define taxonomic as well as evolutionary groups of Salmonella [55]. SeqSero, which was launched in 2015, was developed to employ the rfb cluster, fliC and fljB to categorize Salmonella according to serovar using draft genome assemblies [83]. A subsequent improvement of the software, released as SeqSero2, included the addition of markers at the level of the genus, species and subspecies as well as certain serotypes. Furthermore, a k-mer-based algorithm was included that ensures a genome can be analyzed and the result made available within seconds [85].
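The general idea behind k-mer-based marker detection can be shown with a toy example: a marker allele is broken into overlapping k-mers, and the fraction of those k-mers found in the genome serves as a score. This is a minimal sketch of the generic technique, not the SeqSero2 algorithm; the sequences, allele names and the choice of k are invented for illustration.

```python
def kmers(sequence, k):
    """Return the set of overlapping k-mers in a sequence."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def marker_score(genome, marker, k=5):
    """Fraction of the marker's k-mers that also occur in the genome."""
    marker_kmers = kmers(marker, k)
    if not marker_kmers:
        return 0.0
    return len(marker_kmers & kmers(genome, k)) / len(marker_kmers)


def best_marker(genome, markers, k=5):
    """Return the best-scoring marker name and the scores for all markers."""
    scores = {name: marker_score(genome, seq, k) for name, seq in markers.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy marker alleles and genome; real antigen gene alleles are far longer.
    markers = {"alleleA": "ACGTACGGTCA", "alleleB": "TTGACCATGCA"}
    genome = "GGG" + "ACGTACGGTCA" + "CCC"
    name, scores = best_marker(genome, markers)
    print(name, scores)
```

Set intersection avoids a full alignment, which is the main reason k-mer approaches are fast; a production tool would also have to handle reverse complements and sequencing errors.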

Multilocus sequence typing
Multilocus sequence typing (MLST) evaluates the nucleotide sequences of multiple housekeeping genes of an organism as a means of establishing similarities or differences among isolates [86]. Based on its sequence, each housekeeping gene is assigned an allele number, and the allele numbers are strung together into a profile that defines the organism. Although the MLST scheme was developed using the bacterium Neisseria meningitidis [86], the electronic portability of sequence data and the ease of incorporating additional genes fitted well with the advent of WGS, and the approach has gained application in food safety. This has given rise to the widely used EnteroBase (https://enterobase.warwick.ac.uk/) [87], an integrated web-based platform that permits the upload and analysis of short-read Illumina sequences. It has allowed the MLST scheme, which was initially based on six housekeeping genes [86], to be expanded into a series of flexible schemes for Salmonella: seven genes (legacy MLST); 3002 genes identified as the core genome of Salmonella (core genome MLST, cgMLST); and 21,065 orthologous genes detected in a set of 537 Salmonella genomes (whole genome MLST, wgMLST). Despite the adoption of wgMLST by PulseNet International [88], an influential international body which oversees regulatory subtyping procedures for foodborne bacteria, EnteroBase's Sequence Type (ST) became a widely adopted subtype descriptor for Salmonella. However, ST does not provide adequate resolution for epidemiological concordance and outbreak-level discrimination [89]. To address this challenge, EnteroBase additionally provides the core genome ST (cgST), complemented by a newly described hierarchical clustering scheme, HierCC, with 11 levels of genetic resolution for Salmonella (Table 1) [87,90]. The result is a tool that appears to provide the needed resolution for strain differentiation in the context of disease outbreaks.
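The allele-numbering idea behind MLST can be illustrated with a minimal sketch. The locus names are those of the seven-gene Salmonella legacy scheme, but the sequence fragments, allele numbers and sequence type labels ("ST-A", "ST-B") are invented for illustration and do not correspond to EnteroBase entries; the function names are likewise hypothetical.

```python
from typing import Dict, Optional, Tuple

# Loci of the seven-gene legacy MLST scheme for Salmonella.
LOCI = ("aroC", "dnaN", "hemD", "hisD", "purE", "sucA", "thrA")

# Toy allele dictionary (invented): sequence fragment -> allele number, per locus.
ALLELES: Dict[str, Dict[str, int]] = {
    locus: {"ACGT": 1, "ACGA": 2} for locus in LOCI
}

# Toy profile table (invented): allele profile -> sequence type label.
PROFILES: Dict[Tuple[Optional[int], ...], str] = {
    (1, 1, 1, 1, 1, 1, 1): "ST-A",
    (1, 1, 1, 2, 1, 1, 1): "ST-B",
}


def call_profile(isolate: Dict[str, str]) -> Tuple[Optional[int], ...]:
    """Return the allele number at each locus (None for an unknown sequence)."""
    return tuple(ALLELES[locus].get(isolate[locus]) for locus in LOCI)


def sequence_type(isolate: Dict[str, str]) -> Optional[str]:
    """Look up the sequence type; None if the profile is not in the table."""
    return PROFILES.get(call_profile(isolate))
```

In this toy setting, two isolates that differ at a single locus (here hisD) receive different sequence types, while an isolate carrying an unrecognized allele has no sequence type until the new allele and profile are registered.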

Single nucleotide polymorphism (SNP) pipelines
Single base substitutions represent one of the commonest variations in genomes, and the resulting polymorphisms can form the basis for the characterization of a microbe, including SE. SNPs are detected as nucleotide changes at specific locations in a genome after aligning or comparing it to a designated reference genome. Bioinformatics pipelines have been developed to automate the alignment and the identification of variants. A number of SNP pipelines are in common use and are described here. SNVPhyl, which was developed at the Public Health Agency of Canada, identifies high-quality SNPs among a set of selected isolates and is useful for generating phylogenetic trees from these SNPs [91]. Public Health England developed SnapperDB, also a high-quality SNP pipeline, which analyzes microbial genomes, evaluates genetic distances among the genomes and infers relatedness of strains [92]. Parsnp detects core genome SNPs in bacterial genomes and, with the aid of the adjunct interactive tool Gingr, can be used to display informative overviews of specific sub-clades and genomic regions [93]. The kSNP tool detects SNPs in the pan-genome and is unique in being able to compare genomes without requiring either genome alignment or a reference genome [94].
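The core operation shared by reference-based SNP pipelines, comparing a query against a reference position by position, can be sketched as follows. This is a simplified illustration and not the implementation of SNVPhyl, SnapperDB, Parsnp or kSNP: it assumes the two sequences are already aligned and of equal length, and it applies none of the read-mapping, coverage or quality filters that real pipelines use. The function names are hypothetical.

```python
from typing import List, Tuple


def call_snps(reference: str, query: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, query base) for each substitution.

    Assumes pre-aligned sequences of equal length. Positions where either
    sequence has a gap ('-') or an ambiguous base (e.g. 'N') are skipped.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for pos, (ref_base, qry_base) in enumerate(
        zip(reference.upper(), query.upper()), start=1
    ):
        if ref_base in "ACGT" and qry_base in "ACGT" and ref_base != qry_base:
            snps.append((pos, ref_base, qry_base))
    return snps


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Number of single-base differences between two aligned sequences."""
    return len(call_snps(seq_a, seq_b))
```

Pairwise SNP distances computed in this way are the kind of quantity from which pipelines go on to infer relatedness and build phylogenetic trees.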

Rationale for developing a new reliable, rapid, robust, cost-effective, epidemiologically concordant, easily implementable subtyping tool
A strategy aimed at developing a tool capable of differentiating lineages within the highly clonal S. Enteritidis will likely require interrogating a significant amount of the bacterial DNA information. Massively parallel sequencing technology [95], which deduces the entire nucleotide sequence of an organism, appeared at the outset to be the most viable option for addressing this need. Use of the genome sequence for taxonomy, including strain differentiation, could conceivably work well with strains showing significant genetic diversity, e.g., >5% differences among unrelated strains. However, this may be very difficult for a clonal organism such as SE, where diversity between unrelated strains could be as little as 1% and the similar regions of the genome would have to be ignored in order to focus on the dissimilar portions and obtain an accurate quantitative estimate of relatedness. This may explain the failure to use whole genome sequences to develop a reliable estimate of genetic distance by means of a phylogenetic tree for a group of SE isolates (Ogunremi et al., unpublished data) using a method shown to work for other bacteria [96].
This led to an effort to generate, analyze and characterize the genomes of SE. During the early phase of this endeavor, involving a select number of SE isolates from Canada, 669 SNPs were detected in the genome of SE [57]. Subsequent analysis of 135 SE genomes available in GenBank in 2014 led to the identification of a total of 1440 SNPs, providing a robust resource that was exploited for SNP-based strain differentiation and clustering of foodborne SE isolates [57]. Thus, despite the universal acceptance of the usefulness of whole genome sequences for microbes, individual organisms such as the highly clonal SE may pose a unique challenge that requires a more focused analysis of carefully selected targets within the genome.

Single nucleotide polymorphism-polymerase chain reaction test (SNP-PCR) as a new, nomenclature friendly procedure

History and development of Salmonella Enteritidis lineages/clades and SNP-PCR
The existing molecular methods investigate only very small portions or attributes of the entire bacterial genome. PFGE, for example, identifies restriction enzyme patterns in the genome, whereas WGS-based procedures have detailed information on the entire genome available as a basis for comparison and discrimination. Consequently, extremely small differences, such as single nucleotide polymorphisms (SNPs), can be identified and used for subtyping as long as these attributes are consistently preserved in a particular bacterial lineage. Notably, Allard and colleagues [97] carried out bioinformatics analysis of a total of 104 SE genomes belonging, for the most part, to the predominant PFGE pattern (JEGX01.0004). They described a total of 9 clades and found 366 genes that showed variation, i.e., presence or absence, in the SE genome. This observation complemented and expanded on an earlier study by another laboratory which showed that two isolates of SE with the same phage type, PT 13a, were differentiated by a relatively large number of loci, i.e., 250 SNPs [73]. Similarly, WGS sequence reads have been mapped to a specific reference genome, for instance SE strain P125109, to find SNPs which were then used to build maximum-likelihood phylogenetic trees.
Another study involving 55 SE strains selected from clinical and environmental samples in Minnesota and Ohio from 2001 to 2014 showed the existence of only two major groups [98]. Furthermore, WGS-based SNP analysis of 675 SE isolates from 45 countries revealed a global epidemic clade and two new clades that were found to be geographically restricted to distinct regions of Africa [99]. Using a closely related serovar, S. Gallinarum, as an outgroup, a maximum-likelihood phylogenetic tree was constructed based on the alignment of a total of 42,373 SNPs [99]. In addition, a SNP-based phylogenetic structure of 401 European SE isolates implicated outbreaks correlating with national and international egg distribution networks [75].
Thus, genetic variation that could allow the development of a routine subtyping tool for tracking purposes is present and demonstrable within the SE genome, but it was apparently not fully exploited, given the small number of subgroupings in each of the reported, sampled populations. This presented a need to properly mine the SE genome and develop a highly discriminatory subtyping procedure. In exploring this need, our hypothesis was that the use of a large number of SNPs may not necessarily improve the power of discrimination. More is not necessarily better. A large number of uninformative loci may be counterproductive and undesirable for strain differentiation. As a first step, whole-genome sequences of 11 SE isolates obtained in Canada were generated and compared to the SE P125109 reference strain (phage type 4), which led to the identification of 1361 loci at which the SE genome showed SNPs [100]. Subsequent selection of 60 SNPs spread throughout the genome and distributed among different gene types and intergenic locations led to the development of a rapid, inexpensive, fluorescence-based real-time PCR subtyping assay [55].

The SNP-PCR subtyping procedure
The SNP-PCR genotyping assay is an allele-specific, single-amplification procedure based on the specific binding of one of two competing forward primers, each 18-20 nucleotides long, which differ by a single nucleotide at the locus of interest. A single reverse primer completes the amplification process, leading to the accumulation of an amplicon bearing the SNP of interest. Each allele-specific primer is designed with a specific tail that allows complementary binding of a commercially provided, customized sequence labeled with a fluorescent dye, FAM for allele 1 or HEX for allele 2 (LGC Genomics, Beverly, MA). Thus, the first cycle of amplification ensures that the matching forward oligonucleotide in the primer mix binds to the sequence containing the SNP, to the exclusion of the other primer. The reverse primer, also 18-20 nucleotides long, binds and elongates the fragment during amplification, ensuring that the tail sequence is present. The accumulating fragment therefore carries either the FAM or the HEX fluorescent label, depending on which of the two allele-specific primers bound initially, which in turn is dictated by whether the nucleotide at the SNP position corresponds to allele 1 or allele 2. Detection is thus based on a fluorescently labeled sequence that assigns an allele number to either of the two nucleotides that may occupy the SNP position. The SNP alleles are compiled for all SE strains at the 60 loci and used as input for evolutionary history analysis by the Maximum Parsimony method, conducted using Molecular Evolutionary Genetics Analysis on the MEGA-X computing platform [101]. The distinct groupings of SE isolates are identified as clades, and each is given a specific numerical designation starting from 1.
Following the development of the SNP-PCR procedure, our initial application of the assay to a group of 55 SE isolates obtained in Canada led to the recognition of 12 clades of SE [57].
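The allele-calling and profile-compilation steps described in this section can be sketched in a few lines. This is an illustration under stated assumptions and not the authors' analysis: the signal and ratio thresholds are hypothetical, the function names are invented, and the profile comparison shown is a simple count of differing loci rather than the Maximum Parsimony analysis in MEGA-X used for clade assignment.

```python
from typing import List, Optional, Sequence, Tuple


def call_allele(
    fam: float, hex_: float, min_signal: float = 1.0, min_ratio: float = 2.0
) -> Optional[int]:
    """Call allele 1 (FAM) or allele 2 (HEX) from endpoint fluorescence.

    Thresholds are hypothetical. Returns None when neither dye gives a
    usable signal or when the two signals are too similar to call.
    """
    if max(fam, hex_) < min_signal:
        return None  # no amplification detected
    if fam >= min_ratio * hex_:
        return 1
    if hex_ >= min_ratio * fam:
        return 2
    return None  # ambiguous signal


def snp_profile(readings: Sequence[Tuple[float, float]]) -> List[Optional[int]]:
    """Compile allele calls across loci from (FAM, HEX) readings."""
    return [call_allele(fam, hex_) for fam, hex_ in readings]


def profile_distance(p: Sequence[Optional[int]], q: Sequence[Optional[int]]) -> int:
    """Count loci at which both isolates have a call and the calls differ."""
    return sum(
        1 for a, b in zip(p, q) if a is not None and b is not None and a != b
    )
```

An isolate tested at all 60 loci would yield a 60-element profile of 1s and 2s; such profiles are the input to the phylogenetic analysis that defines the clades.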

Twenty five circulating clades of Salmonella Enteritidis
Recently, the laboratory validation of the SNP-PCR assay was completed using 1,127 SE isolates obtained from food, animal, human, and environmental sources in Canada and Europe, and we observed a total of 25 circulating clades of SE (Table 1; Ogunremi et al., manuscript under preparation). In addition, 13 other globally distributed isolates identified from published papers [98,99], as well as the widely used reference SE strain P125109 phage type 4, were included in a phylogenetic comparison using the Maximum Parsimony method. These strains were distributed across the generated phylogenetic tree and homed to distinct SE clades, providing further validation of the ability of the SNP-PCR tool to cluster strains appropriately and, at the same time, to distinguish among different strains (Ogunremi et al., manuscript under preparation). The validation procedure demonstrated the robustness of the assay, its ability to estimate genetic distances and relatedness among and between clades, and its relevance in constructing an evolutionary map of SE following the testing of a large number of isolates.

Advantages of SNP-PCR: nomenclature and population structure
Previous studies aimed at evaluating the population structure of the highly clonal SE have reported fewer lineages and clades among the isolates tested. For instance, a study of 675 very diverse isolates collected over many decades in 45 countries and 6 continents revealed the presence of only 3 clades; a subgroup of 58 isolates was identified but could not be clustered by the method used by the authors [99]. Yet another study demonstrated 9 clades among a large but PFGE-uniform group of isolates [97]. These studies, which showed limited diversity among SE populations, underscore our contrasting observations and reinforce the excellent discrimination observed for SE using the validated SNP-PCR assay. The SNP-PCR compares well with the cgMLST-HierCC function in EnteroBase in discriminating among strains chosen to represent SE clades from a very diverse SE population drawn from a variety of sources and different continents (Table 1; Ogunremi et al., under preparation).
Apart from being a highly discriminatory and robust assay, the SNP-PCR is very cost-effective. Reagent costs are estimated at Can$0.25 per SNP per isolate, so that testing all 60 SNPs (Can$15) is cheaper than the traditional, less discriminatory subtyping assays (Can$26 for phage typing and Can$36 for two-enzyme PFGE analysis in reagent costs) or WGS (Can$100). The SNP-PCR validation procedure (described above) showed that only 17 SNP loci needed to be tested to assign an isolate to a clade. The test also performed very well on crude, boiled bacterial extract, obviating the need for DNA purification and creating further savings in reagents, labour and time.
Another important attribute of the SNP-PCR is its equal adaptability to a few samples or a large number of samples. Whereas Illumina WGS requires a prescribed number of samples per run (e.g., 20 Salmonella strains using the MiSeq version 3 library kit over 600-cycle sequencing, which runs for 65 hours), the SNP-PCR can be used to test one or a few samples with the appropriate controls, with no cost penalty for the small volume of analysis. At the other extreme, a single PCR run can handle a 384-well plate loaded with hundreds of samples, with the machine run completed in 2 hours. The labor costs of running the SE SNP-PCR test (2 h PCR time) and analyzing the results are at least an order of magnitude lower than those of any other subtyping approach, including traditional molecular tests and WGS.
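The reagent cost comparison above is simple arithmetic and can be reproduced as a short worked calculation. The per-SNP cost and the comparator costs are taken from the text; the function and variable names are hypothetical, and labour and instrument costs are not modelled.

```python
# Reagent cost per SNP per isolate, in Canadian dollars (from the text).
COST_PER_SNP_CAD = 0.25

# Reagent costs per isolate for comparator methods (from the text).
COMPARATOR_COSTS_CAD = {
    "phage typing": 26.0,
    "two-enzyme PFGE": 36.0,
    "WGS": 100.0,
}


def snp_pcr_reagent_cost(n_loci: int, n_isolates: int = 1) -> float:
    """Reagent cost of testing n_loci SNPs on n_isolates isolates."""
    return n_loci * COST_PER_SNP_CAD * n_isolates


full_panel = snp_pcr_reagent_cost(60)     # all 60 loci
reduced_panel = snp_pcr_reagent_cost(17)  # the 17 loci sufficient for clade assignment

for method, cost in COMPARATOR_COSTS_CAD.items():
    print(f"{method}: Can${cost:.2f} vs SNP-PCR Can${full_panel:.2f}")
```

With these figures the full 60-locus panel comes to Can$15.00 per isolate, and the reduced 17-locus panel to Can$4.25.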
The SNP-PCR test shows very good reproducibility (95%) in tests conducted in six laboratories.