Open access peer-reviewed chapter

# Tracking Salmonella Enteritidis in the Genomics Era: Clade Definition Using a SNP-PCR Assay and Implications for Population Structure

By Dele Ogunremi, Ruimin Gao, Rosemarie Slowey, Shu Chen, Olga Andrievskaia, Sadjia Bekal, Lawrence Goodridge and Roger C. Levesque

Submitted: November 6th 2020; Reviewed: May 7th 2021; Published: October 14th 2021

DOI: 10.5772/intechopen.98309

## Abstract

Salmonella enterica serovar Enteritidis (or Salmonella Enteritidis, SE) is one of the oldest members of the genus Salmonella, based on the date of first description, but has only gained prominence as a significant bacterial contaminant of food over the last three or four decades. Currently, SE is the most common Salmonella serovar causing foodborne illnesses. Control measures to reduce human infections require that food isolates be characterized. Until recently, this was carried out using Pulsed-Field Gel Electrophoresis (PFGE) and phage typing as the main laboratory subtyping tools for demonstrating relatedness of isolates recovered from infected humans and the food source. The results provided by these analytical tools were presented with an easy-to-understand nomenclature; however, the techniques were inherently poorly discriminatory, which is attributable to the clonality of SE. These tools have now given way to whole genome sequencing (WGS), which provides a comprehensive account of the genetic attributes of an organism and is a superior tool for defining an isolate and for inferring genetic relatedness among isolates. A comparative phylogenomic analysis of isolates of choice provides both a visual appreciation of relatedness and quantifiable estimates of genetic distance. Despite the considerable information provided by whole genome analysis and the development of a phylogenetic tree, the approach does not lend itself to generating a useful nomenclature-based description of SE subtypes. To this end, a highly discriminatory, cost-effective, high-throughput, validated single nucleotide polymorphism-based genotyping polymerase chain reaction assay (SNP-PCR) was developed focussing on 60 polymorphic loci. The procedure was used to identify 25 circulating clades of SE, the largest number so far described for this organism. The new subtyping test, which exploited whole genome sequencing data, displays the attributes of an ideal subtyping test: high discrimination, low cost, rapid turnaround, high reproducibility and epidemiological concordance. The procedure is useful for identifying the subtype designation of an isolate, for defining the population structure of the organism, and for surveillance and outbreak detection.

### Keywords

• Salmonella Enteritidis
• WGS
• SNP-PCR
• PFGE
• phage typing
• nomenclature
• population structure

## 1. Introduction

The genus Salmonella contains a large number of Gram-negative bacteria primarily found in the gastrointestinal tract of vertebrate organisms including humans, cattle, pigs, horses, companion animals, birds, reptiles and fish [1]. There are two species of Salmonella, namely Salmonella enterica and S. bongori [2]. Salmonella enterica is the species of relevance in food safety, and consists of six subspecies of varying importance in human health. Salmonella enterica subspecies enterica has received the greatest attention because of its large number of constituent organisms, now estimated at about 2,600, each defined as a serovar based on the Kauffmann-White classification [1]. Salmonella enterica serovar Enteritidis (commonly written as Salmonella Enteritidis or SE) is the most prominent. The organism was originally described as a distinct species and named Salmonella enteritidis, alongside two other species, namely Salmonella choleraesuis and Salmonella typhi. Since those early days, the taxonomy of Salmonella has changed to reflect two species and hundreds of serovars. Curiously, only a limited number of S. enterica serovars are associated with foodborne illnesses, of which SE has emerged over the last few decades as the most prevalent cause of foodborne salmonellosis in humans worldwide [3]. However, this has not always been the case, and prior to the 1970s there was only the occasional report of foodborne salmonellosis attributable to SE.

The earliest reports of foodborne illnesses caused by Salmonella were attributed to duck egg sources, as summarized by Scott [4]. Subsequently, the organism was found in live chicks, ducks and ducklings [5, 6]. Although these early reports came from different countries, SE did not become a common cause of foodborne illnesses until the 1980s [7]. By 1994, SE was the most commonly reported Salmonella serotype, with an incidence of 110 laboratory-confirmed infections per 100,000 population in the Northeast of the US, and shell eggs from hens were identified as the major vehicle for SE infection in humans [8], in contrast to the earlier reports incriminating duck eggs. A 2010 outbreak of egg-related SE infections in the US resulted in an estimated 1,939 illnesses and a recall of over 500 million eggs, which ranked as the largest egg recall in history and one of the most expensive food recalls ever [9]. Similar events occurred in other parts of the world and were severe enough to warrant a warning of a new pandemic [7]. SE, together with Typhimurium and Heidelberg, makes up the three most common serovars, which alone account for 59% of Salmonella outbreaks in humans in Canada, while the 10 most commonly observed Salmonella serovars account for about 76% of the total Salmonella infections reported. Establishing epidemiological linkages between contaminated products and human disease for Salmonella serovars has been particularly difficult for a number of reasons. One of the historically important reasons has been the clonal nature of many of the dominant serovars, especially Enteritidis, which makes discrimination of strains difficult and the attribution of a particular illness-associated strain to a food source particularly challenging.

One resource that has been used by researchers to study SE is the strain P125109 phage type 4 (PT4), which was isolated from an outbreak of human food poisoning in the United Kingdom and traced back to a poultry farm. The strain is highly virulent in newly hatched chickens and is also invasive in laying hens, resulting in egg contamination [10, 11]. The complete genome sequence of the host-promiscuous SE PT4 isolate P125109 was determined by Thomson et al. in 2008 [12].

Next generation sequencing (NGS), and especially whole genome sequencing (WGS), has emerged in recent years and has made it possible to sequence bacterial genomes within hours, a remarkable feat that is revolutionizing the field of microbiology. With the advent of microbial WGS, new light is being shed on the nature of pathogens, and our understanding of the biology of Salmonella is steadily increasing as Salmonella genomes are generated at an increasingly rapid rate and deposited in public databases. Further understanding of genome diversity and variation of bacterial pathogens has the potential to improve quantitative risk assessment and to clarify the evolution of Salmonella, the relationships among strains and serovars, the emergence of new strains and the role of mobile genetic elements, especially plasmids and bacteriophages, in Salmonella [13]. The recent development of the Salmonella SystOmics database (SalFoS; https://salfos.ibis.ulaval.ca/), a rich collection of over 3000 Salmonella genomes and their metadata, represents a milestone and an important resource for future approaches to mitigate the burden of foodborne salmonellosis [14].

Food safety, which is significantly impacted by Salmonella, has gained from the advent of microbial genomics. Subspecies characterization, including serovar identification and strain differentiation, can now be done using genomic approaches. As will soon be evident to the reader, there is much work yet to be done, as the new capacity is yet to translate into tangible benefits to the consumer. Outbreaks caused by SE have remained at a high level or are even increasing, and there is a need to evaluate the efficacy of procedures used to detect the organism in food as well as approaches used in tracking the organism through the entire spectrum of the food chain, from farm to fork.

## 2. Laboratory culture and identification of organism

### 2.1 Culture procedures for Salmonella

Culture-based methods are commonly employed to detect pathogens in food, and in clinical and environmental samples. The Compendium of Analytical Methods (https://www.canada.ca/en/health-canada/services/food-nutrition/research-programs-analytical-methods/analytical-methods/compendium-methods.html) and the Bacteriological Analytical Manual (https://www.fda.gov/food/laboratory-methods-food/bacteriological-analytical-manual-bam) are compilations of laboratory procedures developed by the food safety regulatory agencies in Canada and the United States, respectively, and each contains a catalog of official and recommended methods for isolating and detecting Salmonella. Briefly, Salmonella detection in food relies on a series of culture steps in broth formulations optimized to resuscitate Salmonella following injury caused by food handling, processing and storage, and to reduce the abundance of competing bacteria [15]. Many enrichment protocols, broths and culture plates have been described for the isolation of Salmonella from different types of samples and matrices [16, 17, 18]. Typically, the first step is to culture a suspect food sample in a non-selective pre-enrichment broth, examples of which are lactose broth, buffered peptone water, trypticase soy broth, brilliant green water, powdered milk with brilliant green and universal pre-enrichment broth [16]. Following an overnight incubation, commonly performed at 37°C, the culture material is transferred into a selective enrichment broth which suppresses the growth of non-salmonellae while expanding the Salmonella population, facilitating isolation by plating on the appropriate media plates [19, 20]. Tetrathionate (TT) and Rappaport-Vassiliadis (RV) broths and RV semi-solid medium are the most commonly used selective culture conditions, performed at 37°C or 42°C overnight or for several days [15, 19].

When used to detect the presence of a microorganism in a food sample, laboratory culture procedures are slow and time consuming, requiring the sequential use of non-selective and selective enrichment broths, and could take a week or longer. Another disadvantage is the documented inherent bias in the performance of selective broths, which results in the preferential recovery of certain Salmonella serovars and not others [17, 21, 22]. For instance, different Salmonella serotypes are recovered by culture procedures performed on non-clinical, non-human sources when compared to samples tested in hospitals and other clinical settings from patients experiencing symptoms. Experimental results show that members of some Salmonella serogroups are unable to effectively compete with other serovars, leading to a reduced efficiency of recovery of some Salmonella organisms, including SE, from contaminated food [21]. The use of culture-independent procedures that can lead to rapid and sensitive detection of Salmonella [23] may in time eclipse the routine use of culture methods for detection. Nevertheless, the recovery of Salmonella from food is currently required to establish risk to the consumer and to support regulatory action. For this reason, and for the purpose of building inventories of microbial organisms for clinical and regulatory food microbiology, culture procedures are expected to remain in use. A wide variety of selective plating media are available for the isolation of Salmonella, and a number of them will now be examined.

#### 2.1.1 Xylose lysine desoxycholate (XLD) agar

XLD agar is a selective growth medium originally shown to facilitate the isolation of Shigella but was demonstrably useful for Salmonella isolation and has been further modified since its first description [24, 25]. At pH 7.4, XLD agar appears bright pink or red as a result of the phenol red indicator. Salmonella ferments xylose, a sugar molecule, to produce acid, and the bacterial colony turns yellow. In time, xylose is consumed and lysine is in turn utilized; its decarboxylation produces an alkaline environment and colonies turn back to red. In contrast, Shigella cannot ferment xylose and the colony remains red. Salmonella is able to metabolize thiosulfate to produce hydrogen sulfide, leading to the formation of colonies with black centres, which is an important feature in differentiating Salmonella colonies from Shigella. XLD agar is capable of supporting other members of Enterobacteriaceae such as Escherichia coli; however, the colonies and medium turn yellow because of the fermentation of lactose, which is also present in the agar. Pseudomonas aeruginosa is also able to grow on XLD plates as pink, flat, rough colonies but will not metabolize thiosulfate nor turn black. Proteus organisms can grow on XLD to give rose-colored colonies and can sometimes metabolize thiosulfate to render the colonies black, which can readily be confused with Salmonella. In addition, Salmonella strains have been described that do not metabolize thiosulfate and grow as pink colonies, which can readily be confused with Shigella. Thus, XLD agar is a moderately selective medium for isolating Salmonella and for differentiating it from other organisms.

#### 2.1.2 Xylose lysine Tergitol-4 (XLT-4) agar

Similar to XLD agar, XLT-4 agar is a selective culture medium which is used to isolate and identify Salmonella in food and environmental samples. Compared to XLD agar, XLT-4 is supplemented with a surfactant, 7-ethyl-2-methyl-4-undecanol hydrogen sulfate, commonly referred to as Tergitol 4, while lacking sodium chloride and sodium desoxycholate. The surfactant is responsible for the inhibition of Proteus spp. and other non-salmonellae. XLT-4 agar is one of the most stringent of all selective culture plates used for isolating Salmonella, with positive colonies growing as red colonies and eventually turning black, starting from the centre, as a result of hydrogen sulfide production. However, Salmonella strains that fail to produce hydrogen sulfide appear as yellow colonies on XLT-4 agar [26, 27].

#### 2.1.3 XA medium - modified XLD agar by adding D-arabinose

XA medium is an improved selective and differential medium over XLD agar following its supplementation with arabinose, a sugar that is fermented by Citrobacter and Proteus but not by Salmonella [28]. The sensitivity of isolation of Salmonella using the XA and XLD media is equally high; however, the specificity of XA medium (92.0%) is superior to that of XLD (73.0%) [28]. Many Salmonella organisms appear as black colonies on XA agar, whereas non-salmonellae will either not grow or appear as pink colonies. The use of arabinose to differentiate Salmonella from other closely related organisms represents a cost-effective approach, especially when compared to chromogenic plates (see Section 2.1.7).

#### 2.1.4 Hektoen enteric (HE) agar

HE agar is a selective and differential medium for isolating and distinguishing members of the genera Salmonella and Shigella from the other Enterobacteriaceae. HE agar has a blue appearance and contains indicators of lactose fermentation and hydrogen sulfide production while inhibiting the growth of Gram-positive bacteria. Species belonging to Enterobacteriaceae that are capable of fermenting one or more carbohydrates produce yellow or salmon-orange colored colonies, e.g., Klebsiella pneumoniae, which ferments lactose. Non-fermenters produce blue-green colonies. Organisms that reduce sulfur to hydrogen sulfide, such as Salmonella, will produce black colonies or blue-green colonies with a black center. In contrast, colonies of Shigella remain green and do not turn black because of their inability to metabolize sulfur.

#### 2.1.5 MacConkey agar

MacConkey agar is used for the isolation of Gram-negative enteric bacteria, a large group of bacteria prominent among which are Salmonella, E. coli, Proteus, Citrobacter, Klebsiella, Pseudomonas, Shigella, Enterobacter and Yersinia. These organisms grow on the agar because of the selective property conferred by crystal violet and bile salts, which inhibit the growth of Gram-positive bacteria. The indicator system is the neutral red dye, which turns red at a pH below 6.8 but is colorless at higher pH. Thus, lactose fermenters such as E. coli, Klebsiella and Enterobacter, which contain the lac operon, form red or pink colonies on MacConkey agar. In contrast, the other organisms, including Salmonella, which are generally non-lactose fermenters, do not change color. Because Salmonella produces colonies similar to those of other non-lactose fermenters on MacConkey agar, the medium does not allow for identification of Salmonella, an objective that has to be achieved by employing other, more selective agars. At the same time, lactose-fermenting Salmonella, whose phenotype is attributable to a lac operon carried in the chromosome or on plasmids [30], have historically been shown to cause severe infections and outbreaks in humans [29]; their colonies appear pink or reddish on MacConkey agar. Despite its limitations, MacConkey agar can still be a very useful addition to the collection of media needed to comprehensively isolate and identify Salmonella in contaminated samples.

#### 2.1.6 Brilliant green sulfa (BGS) agar

The selectivity of BGS agar is due to the presence of brilliant green and sulfadiazine, two components that inhibit Gram-positive bacteria and most Gram-negative bacilli. Phenol red is the pH indicator that detects changes in pH due to the fermentation of sucrose and/or lactose. Salmonella colonies range from reddish or pink to nearly white in color with a red zone. Lactose or sucrose fermenters occasionally grow on this medium and appear as yellow-green colonies surrounded by a yellow-green zone. The presence of sulfadiazine in the medium is effective in inhibiting the growth of E. coli and Proteus and, to a large extent, Shigella species [31]. In a later modification of BGS agar, the replacement of lactose with glucose and of sulfadiazine with novobiocin, to create the novobiocin-brilliant green agar (NBG), led to a higher recovery of Salmonella, but the medium could not differentiate it from hydrogen sulfide-positive Citrobacter organisms [32].

#### 2.1.7 Salmonella chromogenic agar

Chromogenic plates have been developed for Salmonella as an improved alternative to procedures that rely on the ability of the organism to produce hydrogen sulfide or its inability to ferment lactose, attributes that are not fully diagnostic of Salmonella. Reliance on these attributes often results in Citrobacter and Proteus species being mistakenly identified as Salmonella, while some atypical Salmonella are missed entirely, when using the agar plates described above. There are a number of commercially available chromogenic culture media which incorporate different chromogenic substrates and result in different colors of Salmonella colonies. Using the Salmonella chromogenic agar marketed by Oxoid (United Kingdom) as an example, the medium contains the substrate Magenta-cap (5-bromo-6-chloro-3-indolylcaprylate), which is hydrolyzed by Salmonella species to give magenta colonies. The second substrate, X-Gal (5-bromo-4-chloro-3-indolyl-D-galactopyranoside), is hydrolyzed by many non-Salmonella species, including Citrobacter and Proteus, to give blue colonies [33, 34]. The selection for Salmonella is further enhanced by the presence of bile salts, which inhibit Gram-positive bacteria, and of two antibiotics, namely novobiocin and cefsulodin, which inhibit Proteus and Pseudomonas, respectively.

The isolation of Salmonella colonies from contaminated food demonstrates the presence of live organisms that can potentially cause harm. As indicated above, the procedure requires a combination of culture conditions and takes time. Molecular procedures that can rapidly detect Salmonella are often used to accelerate the process, to improve the sensitivity of detection and also to confirm colonies as Salmonella, because of the challenges with the isolation of the bacteria outlined above. Many molecular techniques are now available for serotype-specific identification of SE.

### 2.2 Identification of Salmonella Enteritidis

Many laboratory diagnostic platforms have been applied to detect and identify Salmonella contamination in food, and these include PCR, the enzyme-linked immunosorbent assay and the lateral flow assay [35, 36, 37]. Examples are available as commercial products. Currently, the most popular platform is PCR and the most frequently used gene target is the invA gene. Nevertheless, many commercial offerings do not disclose their target for proprietary reasons. PCR assays have also been developed with other gene targets present either in the chromosome, e.g., flagellin [38], OriC [39], hilA [40], ttr [41], or on plasmids, e.g., the SpvR operon [42]. Multiplex PCR assays that are able to detect and distinguish among multiple serovars have also been developed by including serovar-specific gene targets such as STM4449 (Typhimurium [43]), STM4497 (Typhimurium [44]), fliC (Typhimurium [45]), sdfI (Enteritidis [46]) and sefA [29]. Recent work by Nadin-Davis and colleagues showed that many of the previously identified serovar-specific markers, especially sefA and fliC, were shared by other serovars, while highlighting the limitation of using a plasmid-encoded target [47].

A multiplex PCR method which is capable of detecting all Salmonella spp., while identifying and distinguishing SE from the other two most prevalent serovars, namely Typhimurium [48] and Heidelberg (Ogunremi et al., unpublished), is now available. The PCR was designed to amplify DNA fragments from four Salmonella genes, namely the invA gene (211-bp fragment), the iroB gene (309-bp fragment), Typhimurium STM4497 (523-bp fragment) and Enteritidis SE147228 (612-bp fragment), and has lately incorporated a 124-bp Heidelberg-specific fragment.
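The way such a multiplex result might be read can be sketched in a few lines of Python. This is only an illustration: the amplicon sizes come from the paragraph above, whereas the 10-bp sizing tolerance, the rule that the invA band is required before any Salmonella call, and the wording of the returned calls are assumptions made for this sketch and are not part of the published assay.

```python
# Expected amplicon sizes in base pairs, taken from the paragraph above.
EXPECTED_BP = {
    "Heidelberg": 124,    # Heidelberg-specific fragment
    "invA": 211,
    "iroB": 309,
    "Typhimurium": 523,   # STM4497 fragment
    "Enteritidis": 612,   # SE147228 fragment
}
SEROVAR_TARGETS = ("Enteritidis", "Typhimurium", "Heidelberg")


def called_targets(observed_bp, tolerance_bp=10):
    """Return targets whose expected size lies within tolerance of a band."""
    return {
        target
        for target, size in EXPECTED_BP.items()
        if any(abs(band - size) <= tolerance_bp for band in observed_bp)
    }


def interpret(observed_bp, tolerance_bp=10):
    """Turn a list of observed band sizes (bp) into a provisional call."""
    called = called_targets(observed_bp, tolerance_bp)
    # Assumption for this sketch: the invA band must be present
    # before any Salmonella call is made.
    if "invA" not in called:
        return "Salmonella not detected or inconclusive"
    serovars = [s for s in SEROVAR_TARGETS if s in called]
    if len(serovars) == 1:
        return f"Salmonella {serovars[0]}"
    if not serovars:
        return "Salmonella spp., not one of the three targeted serovars"
    return "Mixed or ambiguous serovar signal: " + ", ".join(serovars)


print(interpret([211, 309, 612]))  # prints: Salmonella Enteritidis
```

In practice, band calling would be done against a size ladder and controls; the tolerance here simply stands in for that step.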

The identification of members of the genus Salmonella to the subspecies level, i.e., serovar, is pivotal in tracking these pathogens along the food chain, and the above molecular methods are very promising replacements for the traditional biochemical tests because of their ease of application and high specificity for identifying SE and the other serotypes.

## 3. Typing of Salmonella Enteritidis

### 3.1 Serotyping

Serotyping has consistently been the basis of public health surveillance of Salmonella and has retained this primary role, as a first-line typing method, in the era of WGS based on the development of novel bioinformatics tools (see Section 3.3). Serotypes of Salmonella are defined by the presence of two types of antigens, namely a heat-stable, somatic O antigen, a component of the lipopolysaccharide envelope covering the organism and an important virulence factor, and the H antigen, which is present on the flagella of the organism [49]. The antigenic properties of the O antigen are depicted as numerals, e.g., 1,9,12 for SE. In contrast, the H antigens are described using one or a few letters for the phase I antigen (e.g., g, m for SE) or as a combination of letters and numbers when the flagella also bear a phase II antigen (e.g., r and 1, 2 for Heidelberg). Agglutination assays are performed on the organisms using antibodies that recognize specific antigenic molecules and that are developed through a laborious cross-absorption process against other serovars [50]. The result is an elaborate classification scheme, developed by Kauffmann and White [51, 52], which has now led to the identification of some 2,600 serotypes of Salmonella. The complexity has been further enhanced by the ability of plasmids and prophages to alter the expression of some of the antigens, and this has led to a frequent re-evaluation of some serovar designations. Fortunately, these alterations are fairly rare and the serotyping scheme has served well since it was first proposed by Schütze in 1920 [53]. Of the large number of Salmonella serovars identified so far, only a relatively small number, perhaps no more than 100 serovars, are commonly associated with foodborne illnesses [54, 55].
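The antigenic formula described above lends itself to a simple data structure. The following Python sketch is illustrative only: the `O:H1:H2` string notation, the class and function names, and the two-entry lookup table are assumptions made for the example. The O antigens for SE and the H antigens for SE and Heidelberg are those given in the text; the O antigens shown for Heidelberg are not stated in this chapter and should be checked against the current White-Kauffmann-Le Minor scheme.

```python
from typing import NamedTuple, Optional


class AntigenicFormula(NamedTuple):
    o_antigens: str   # somatic (O) antigens, written as numerals
    h_phase1: str     # phase I flagellar (H) antigens
    h_phase2: str     # phase II flagellar antigens, "-" if none


# Illustrative two-entry lookup table; a real scheme has ~2,600 entries.
# The Heidelberg O antigens are an assumption not taken from the text.
FORMULA_TO_SEROVAR = {
    AntigenicFormula("1,9,12", "g,m", "-"): "Enteritidis",
    AntigenicFormula("1,4,[5],12", "r", "1,2"): "Heidelberg",
}


def parse_formula(text: str) -> AntigenicFormula:
    """Parse 'O:H1:H2' (or 'O:H1' for monophasic strains)."""
    parts = [p.strip().replace(" ", "") for p in text.split(":")]
    if len(parts) == 2:
        parts.append("-")   # no phase II antigen reported
    if len(parts) != 3:
        raise ValueError(f"Cannot parse antigenic formula: {text!r}")
    return AntigenicFormula(*parts)


def serovar_for(text: str) -> Optional[str]:
    """Return the serovar name, or None if the formula is not in the table."""
    return FORMULA_TO_SEROVAR.get(parse_formula(text))


print(serovar_for("1,9,12:g,m:-"))  # prints: Enteritidis
```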

### 3.2 Traditional subtyping procedures for Salmonella Enteritidis

There are two approaches for the subspecies characterization of SE. Phenotypic tests rely on the biochemical properties of the live organism, and the most prominent example is phage typing. More recently, DNA-based approaches, or genotypic tests, have dominated the field, the most widely used genotypic test being Pulsed-Field Gel Electrophoresis. Over the last few years, whole genome sequencing of the DNA of SE has become the dominant subtyping method in the developed world.

#### 3.2.1 Pulsed-field gel electrophoresis (PFGE)

PFGE can be used to characterize bacterial isolates based on the pattern of distribution of restriction enzyme sites present in the organism’s DNA. For Salmonella, the electrophoretic mobility of DNA fragments digested by the restriction enzyme XbaI or BlnI produces a characteristic fingerprinting pattern that is used to subtype the isolate. Between 2009 and 2019, the Canadian Food Inspection Agency used PFGE for outbreak investigations as one of the two subtyping tests for SE, the other being phage typing. Despite the presence of hundreds of different PFGE types among field isolates of SE, only two PFGE types predominated, and each consisted of thousands of isolates in the Canadian PulseNet database. The two commonest Canadian primary PFGE types, namely SEN.XAI 0003 and SEN.XAI 0006, were responsible for 33.8% and 19.2%, respectively, of Canadian SE isolates documented in the PulseNet database between 2012 and 2017 (Ogunremi, Allain and Nadon, unpublished). The predominance of only a few PFGE SE types was long recognized as a consequence of the poor discriminatory ability of the technique for analyzing the relatedness of SE isolates (Table 1) rather than a reflection of an evolutionary dominance of a few circulating strains [56]. These observations led to the pursuit of WGS as an alternative approach [57].
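The principle behind a restriction fingerprint can be illustrated with a small in silico digest. The sketch below is an assumption-laden toy, not a PFGE simulation: it treats the input as a linear sequence (the Salmonella chromosome is circular), uses a short made-up sequence, and ignores gel resolution entirely. The XbaI recognition sequence TCTAGA, cut after the first T, is standard; the function name and parameters are hypothetical.

```python
import re

XBAI_SITE = "TCTAGA"  # XbaI recognition sequence, cut as T^CTAGA
CUT_OFFSET = 1        # cut position relative to the start of the site


def digest_linear(sequence, site=XBAI_SITE, cut_offset=CUT_OFFSET):
    """Return fragment lengths from digesting a linear sequence.

    The site is palindromic, so scanning one strand is sufficient.
    A lookahead is used so that overlapping sites would also be found.
    """
    seq = sequence.upper()
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Made-up toy sequence containing two XbaI sites.
toy = "AAAA" + "TCTAGA" + "CCCCC" + "TCTAGA" + "GG"
fragments = digest_linear(toy)
print(sorted(fragments, reverse=True))  # prints: [11, 7, 5]
```

Two isolates whose genomes differ only outside restriction sites, or by changes too small to shift a band visibly, would yield the same pattern, which is one way to picture why a clonal organism such as SE is poorly resolved by this approach.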

| Clade | Strain identification | Source description | Phage type | PFGE type (SENXAI, SENBNI) | EnteroBase MLST (7 gene) | EnteroBase cgMLST_v2 + HierCC_v1 |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 2007-MI-0187-0006 | Poultry environment | Atypical | 0214, 0225 | 814 | 259062 |
| 2 | 08OTH012 6–4 | Poultry environment | 9b | 0214, 0225 | 814 | 259068 |
| 3 | 06-1472 | Animal feed | 13a | 0006, N/A | 639 | 273915 |
| 4 | OLF 10012–1 | Sea food, clams | 13, 1b | 0009, 0013 | 11 | 5485 |
| 5 | ID094888 | Clinical case | 6a | N/A, 0011 | 11 | 259098 |
| 6 | dart-1997-742-B2 | Cheese lunchables | 8 | 0003, 0003 | 11 | 259481 |
| 7 | S-MBS4754A | Chicken caecum | 51 | N/A | 8471 | 259064 |
| 8 | SE974-OLF-2015-NSub | Bovine, heifer | N/A | N/A | 11 | 260728 |
| 9 | S-MBS1982A | Chicken thigh | N/A | N/A | 11 | 259069 |
| 10 | 10OTH025 7–14 | Poultry environment | 13 | 0038, 0016 | 11 | 259063 |
| 11 | S-MBS0737R | Chicken carcass | 13a | N/A | 11 | 259067 |
| 12 | 05–3936 | Chicken breast | 13a | 0068, N/A | 11 | 259480 |
| 13 | 07–1474 | Chicken nuggets | 8 | 0003, N/A | 11 | 30959 |
| 14 | S-MBS3492A | Chicken breast | N/A | N/A | 11 | 259071 |
| 15 | S-MBS7608A | Chicken carcass | 8 | N/A | 11 | 259072 |
| 16 | 10SU010 19–1 | Poultry environment | 8 | 0003, 0003 | 11 | 5490 |
| 17 | 07–1485 | Chicken nuggets | 14b | 0003, 0003 | 11 | 30959 |
| 18 | S-MBS3006A | Chicken caecum | 8 | N/A | 11 | 259070 |
| 19 | 11OTH025 11-5 | Poultry environment | 8 | 0003, 0003 | 11 | 273916 |
| 20 | S-MBS8825A | Chicken caecum | 8 | N/A | 11 | 259066 |
| 21 | SA20100239 | Bovine liver | 2 | N/A | 11 | 14029 |
| 22 | 00D989 83–4 | Poultry environment | 23 | 0003, 0009 | 11 | 5498 |
| 23 | SE972-OLF-2015-NSub112-S19 | Water treatment plant | 8 | N/A | 11 | 259100 |
| 24 | ID112184 | Human | 8 | 0007, 0212 | 11 | 259479 |
| 25 | EN1811 | Food processing equipment | 13 | 0076, 0003 | 11 | 233056 |

### Table 1.

Clade designation of Salmonella Enteritidis organisms depicting a representative strain for each clade and comparison with the results of traditional and new subtyping assays.

The single nucleotide polymorphism polymerase chain reaction (SNP-PCR) assay was used to test Salmonella Enteritidis (SE) isolates, and a representative strain for each designated clade (from 1 to 25) is shown in comparison to traditional and whole genome sequence-based subtyping results. Only the SNP-PCR and EnteroBase core-genome multi-locus sequence typing (cgMLST) supplemented with hierarchical clustering (HierCC) showed distinct resolution of the representative strains. All other methods, including 7-gene MLST, phage typing and pulsed-field gel electrophoresis (PFGE), did not provide adequate discriminatory ability relevant for strain differentiation, outbreak investigation or tracking SE from farm to fork. N/A: Not available.

#### 3.2.2 Phage typing

In contrast to PFGE, phage typing is a phenotypic test that exploits the ability of certain bacteriophages, i.e., viruses that infect bacteria, to differentially attach to and gain entrance into strains of bacteria. The phage type of an SE strain is defined by its pattern of susceptibility to a bacteriophage or a combination of bacteriophages, where susceptibility results in lysis of the bacterial cell [58]. A large number of phage types of SE have been described in Canada and elsewhere; however, phage types 8, 13 and 13a were observed to predominate in Canada [59]. This observation may not reflect the presence of a few circulating dominant strains of SE in Canada, but instead may be a consequence of the inadequacy of phage typing as a discriminatory tool that can accurately delineate the population structure of SE in Canada, similar to PFGE as discussed above (see Section 3.2.1 and Table 1). The plasticity of phage types also diminishes the usefulness of phage typing as a subtyping tool. Factors such as the restriction system within the bacteria, the ability of lipopolysaccharides and outer membranes to adsorb the bacteriophage, and the immune system of the vertebrate host infected by the bacteria can alter the phage type of an organism [60]. The reagents used for phage typing require very rigorous quality control, and yet test performance can be remarkably different among laboratories [61]. Changes occurring within an organism, such as the acquisition or loss of an IncN plasmid [62, 63], transfer of an IncX plasmid [64] or loss of the lipopolysaccharide layer [65], have been shown to lead to poor test reproducibility. Thus, two isolates with the same phage type may in fact be unrelated and, conversely, two isolates that show distinct phage types may be closely related. As a result of these factors, phage typing shows inadequate discriminatory power, partial typeability and poor reproducibility [66].

#### 3.2.3 Multiple locus variable-number tandem repeat analysis (MLVA) assay

MLVA is a molecular typing method that is based on PCR amplification of polymorphic regions of the DNA containing variable numbers of tandemly repeated sequences. The method has been standardized by PulseNet International and applied to the epidemiological investigations of SE either as a supplement to or a substitute for PFGE subtyping [67, 68]. An advantage of MLVA is the designation of the typing results as a numeric sequence of tandem repeat counts. This represents a simple, easy-to-understand nomenclature which facilitates the reporting and exchange of test results between laboratories, and translates to a reliable tracking of an organism during epidemiological investigations. The discriminatory ability of MLVA has been variously shown to be superior to [69], equivalent to [70] or poorer than [71] that of PFGE.
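The arithmetic behind an MLVA designation can be sketched as follows: for each locus, the repeat count is the amplicon length minus the non-repeat flanking length, divided by the repeat unit length, and the counts are then concatenated in a fixed locus order. The locus names, flank lengths and repeat unit sizes below are hypothetical values chosen for illustration and do not correspond to the standardized PulseNet protocol.

```python
# Hypothetical loci: (flanking length in bp, repeat unit length in bp).
LOCI = {
    "locusA": (100, 6),
    "locusB": (150, 7),
    "locusC": (90, 12),
}


def repeat_count(amplicon_bp, flank_bp, repeat_unit_bp):
    """Estimate the number of tandem repeats from an amplicon size."""
    repeat_region = amplicon_bp - flank_bp
    if repeat_region < 0:
        raise ValueError("Amplicon is shorter than its flanking sequence")
    # Rounding absorbs small sizing errors from electrophoresis.
    return round(repeat_region / repeat_unit_bp)


def mlva_profile(amplicon_sizes, loci=LOCI):
    """Return a hyphen-separated profile in the fixed order of `loci`."""
    counts = [
        repeat_count(amplicon_sizes[name], flank, unit)
        for name, (flank, unit) in loci.items()
    ]
    return "-".join(str(c) for c in counts)


sizes = {"locusA": 130, "locusB": 192, "locusC": 126}
print(mlva_profile(sizes))  # prints: 5-6-3
```

Because the result is a short string of integers, two laboratories can compare profiles without exchanging gel images, which is the portability advantage noted above.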

Detailed genetic studies of SE have consistently pointed to the underlying cause of the poor discriminatory abilities of available subtyping tools: isolates of SE are extremely similar (i.e., are highly clonal), and this poses a difficulty in finding a definitive, distinguishing trait that could be used to track lineages [70, 72, 73]. The timely arrival and increasing adoption of WGS has altered the analytical landscape.

### 3.3 Application of whole genome sequencing (WGS) in Salmonella Enteritidis: identification and characterization

The development of WGS procedures has heralded the application of a powerful technology for the identification and characterization of SE [57], which has been used for outbreak investigations [74], trace-back procedures [75] and surveillance [76]. Furthermore, WGS analysis of SE has provided insights into the phylogenetic relatedness of isolates and the presence and prevalence of antimicrobial resistance genes, novel mobile elements, virulence markers and bacteriophages in strains of the organism isolated from humans, food animals, production facilities and environmental sources [77, 78, 79]. Relevant to developing long-term control and intervention strategies are the insights to be gained from the increasing application of WGS to the understanding of transmission dynamics of SE, as was done in Chile to infer possible transmission of SE between gulls, poultry and humans [80]. Bioinformatics approaches that allow useful information to be mined from genome sequences will now be discussed.

#### 3.3.1 Whole genome-based serotyping

Serovar prediction can now be performed on Salmonella isolates for which a whole genome sequence is available, replacing the laborious agglutination assay (see Section 3.1) with an in silico analysis of the nucleotide sequence of the organism. Effectively, the gold standard of traditional serology based on the Kauffmann-White Scheme has been replaced in the developed economies with in silico approaches [81]. Two of the most widely used tools for this purpose are the Salmonella In Silico Typing Resource (SISTR) software and the SeqSero2 software [82, 83].

SISTR is an open, web-based bioinformatics platform capable of rapid in silico analyses of minimally processed draft assemblies of Salmonella genomes to generate accurate serovar designations. A collection of markers previously developed for the various Salmonella serovars formed the basis of the new tool [84]. The performance of SISTR is enhanced by the integration of additional multilocus sequence typing tools (see Section 3.3.2), an approach which, as a separate platform, has been suggested as a replacement for the use of serotypes to define taxonomic as well as evolutionary groups of Salmonella [55]. SeqSero, which was launched in 2015, was developed to use the rfb cluster, fliC and fljB to categorize Salmonella according to serovar using draft genome assemblies [83]. A subsequent improvement of the software, released as SeqSero2, included the addition of markers at the level of the genus, species and subspecies as well as certain serotypes. Furthermore, a k-mer-based algorithm was included that ensures a genome can be analyzed and the result made available within seconds [85].

#### 3.3.2 Multilocus sequence typing

Multilocus sequence typing (MLST) evaluates the nucleotide sequences of multiple housekeeping genes of an organism as a means of establishing similarities or differences among isolates [86]. Based on its sequence, each housekeeping gene is assigned an allele number, and the allele numbers can be strung together in a nomenclature that defines the organism. Although the MLST scheme was developed using the bacterium Neisseria meningitidis [86], the electronic portability of sequence data and the ease of incorporating additional genes found a good synergy with the advent of WGS, and the approach has gained application in food safety. This has given rise to the widely used EnteroBase (https://enterobase.warwick.ac.uk/) [87], an integrated web-based platform that permits the upload and analysis of short read Illumina sequences. This has allowed the expansion of the MLST scheme, which was based initially on six housekeeping genes [86], into a series of flexible applications for Salmonella: seven genes (legacy MLST); 3002 genes identified as the core genome of Salmonella, used to produce core genome MLST (cgMLST); and 21,065 orthologous genes detected in a set of 537 Salmonella genomes, regarded as whole genome MLST (wgMLST). Despite the adoption of wgMLST by PulseNet International [88], an influential international body which oversees regulatory subtyping procedures for foodborne bacteria, EnteroBase's Sequence Type (ST) became a widely adopted subtype descriptor for Salmonella. However, ST does not provide adequate resolution for epidemiological concordance and outbreak level discrimination [89]. To address this challenge, EnteroBase additionally provides core genome STs (cgSTs), complemented by a newly described hierarchy of 11 levels of genetic resolution, or HierCC, for Salmonella (Table 1) [87, 90]. The result is a tool that appears to provide the needed resolution for strain differentiation in the context of disease outbreaks.
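The idea of stringing allele numbers into a sequence type can be illustrated with a minimal Python sketch. It is not EnteroBase's implementation. The locus names are, to our understanding, the seven genes commonly used for legacy Salmonella MLST, but the allele numbers, the lookup table and the labels "ST-A" and "ST-B" are invented purely for illustration.

```python
from typing import Dict, Tuple

# Assumed seven-locus scheme; allele numbers and ST labels below are
# made-up values, not entries from any real MLST database.
LOCI = ("aroC", "dnaN", "hemD", "hisD", "purE", "sucA", "thrA")

# Toy lookup table: ordered allele profile -> sequence type label.
PROFILE_TO_ST: Dict[Tuple[int, ...], str] = {
    (1, 1, 1, 1, 1, 1, 1): "ST-A",
    (1, 1, 2, 1, 1, 1, 1): "ST-B",
}


def allele_profile(alleles: Dict[str, int]) -> Tuple[int, ...]:
    """String the per-locus allele numbers together in a fixed locus order."""
    return tuple(alleles[locus] for locus in LOCI)


def sequence_type(alleles: Dict[str, int]) -> str:
    """Look up the sequence type; unknown profiles are reported as 'novel'."""
    return PROFILE_TO_ST.get(allele_profile(alleles), "novel")


def allelic_distance(a: Dict[str, int], b: Dict[str, int]) -> int:
    """Count the loci at which two isolates carry different alleles."""
    return sum(1 for locus in LOCI if a[locus] != b[locus])


if __name__ == "__main__":
    isolate_1 = {locus: 1 for locus in LOCI}
    isolate_2 = dict(isolate_1, hemD=2)
    print(sequence_type(isolate_1), sequence_type(isolate_2))
    print(allelic_distance(isolate_1, isolate_2))
```

The same profile-and-lookup logic extends to cgMLST simply by lengthening the locus list, which is why a seven-gene profile resolves far fewer strains than a profile built on thousands of core genes.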

#### 3.3.3 Single nucleotide polymorphism (SNP) pipelines

Single base substitutions represent one of the commonest variations in genomes and the resulting polymorphism can form the basis for the characterization of a microbe including SE. SNPs are detected as nucleotide changes at a specific location in a genome after aligning or comparing it to a designated reference genome. Bioinformatics pipelines have been developed to automate the alignment and the identification of the variants. A number of SNP pipelines are in common use and will now be described. SNVPhyl, which was developed at the Public Health Agency of Canada, identifies high quality SNPs among a set of selected isolates and is useful for generating phylogenetic trees from these SNPs [91]. Public Health England developed SnapperDB, also a high-quality SNP pipeline, which analyzes microbial genomes, evaluates genetic distances among the genomes and infers relatedness of strains [92]. Parsnp detects core genome SNPs in bacterial genomes and, with the aid of the adjunct interactive tool Gingr, can be used to display informative overviews for specific sub-clades and genomic regions [93]. The kSNP tool detects SNPs in the pan genome and is uniquely able to carry out comparisons among genomes without requiring genome alignment or the use of a reference genome [94].
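The core operation shared by reference-based pipelines, reporting positions where an isolate differs from the reference, can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the algorithm of SNVPhyl, SnapperDB, Parsnp or kSNP: it assumes the sequences are already aligned to equal length, ignores gaps and ambiguous bases, and applies none of the quality filters a real pipeline would use. The function names and toy sequences are hypothetical.

```python
from typing import List, Tuple

VALID_BASES = set("ACGT")


def call_snps(reference: str, query: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, query base) for each SNP.

    Assumes `query` is already aligned to `reference` (same length).
    Positions holding gaps or ambiguous bases are skipped.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for pos, (ref_base, qry_base) in enumerate(
        zip(reference.upper(), query.upper()), start=1
    ):
        if ref_base in VALID_BASES and qry_base in VALID_BASES and ref_base != qry_base:
            snps.append((pos, ref_base, qry_base))
    return snps


def pairwise_snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned sequences carry different bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


if __name__ == "__main__":
    reference = "ACGTACGTAC"   # toy reference, not a real SE sequence
    isolate_a = "ACGTACGAAC"
    isolate_b = "ACCTACGAAC"
    print(call_snps(reference, isolate_a))
    print(call_snps(reference, isolate_b))
    print(pairwise_snp_distance(isolate_a, isolate_b))
```

Pairwise counts of this kind are what a phylogenetic method then turns into a tree; the pipelines differ mainly in how reads are mapped and how unreliable positions are filtered out.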

### 3.4 Rationale for developing a new reliable, rapid, robust, cost-effective, epidemiologically concordant, easily implementable subtyping tool

A tool capable of differentiating lineages within the highly clonal S. Enteritidis will likely require interrogating a significant amount of the bacterial DNA information. Massively parallel sequencing technology [95], which deduces the entire nucleotide sequence of an organism, appeared at the outset to be the most viable option for addressing this need. Use of genome sequences for taxonomy, including strain differentiation, could conceivably work well with strains showing significant genetic diversity, e.g., >5% differences among unrelated strains. However, this may be very difficult for a clonal organism such as SE, where diversity between unrelated strains could be as little as 1% and the similar regions of the genome would have to be set aside in order to focus on the dissimilar portions and obtain an accurate quantitative estimate of relatedness. This may explain the failure to use whole genome sequences to develop a reliable estimate of genetic distance by means of a phylogenetic tree for a group of SE isolates (Ogunremi et al., unpublished data) using a method shown to work for other bacteria [96].

This led to an effort to develop, analyze and characterize the genomes of SE. During the early phase of this endeavor, which involved a select number of SE isolates from Canada, 669 SNPs were detected in the genome of SE [57]. Subsequent analysis of 135 SE genomes present in GenBank in 2014 led to the identification of a total of 1440 SNPs, providing a robust resource that was exploited for SNP-based strain differentiation and clustering of foodborne SE isolates [57]. Thus, despite the universal acceptance of the usefulness of whole genome sequences for microbes, individual organisms such as the highly clonal SE may pose a unique challenge that might require a more focused analysis of carefully selected targets within the genome.

## 4. Single nucleotide polymorphism-polymerase chain reaction test (SNP-PCR) as a new, nomenclature friendly procedure

### 4.1 History and development of Salmonella Enteritidis lineages/clades and SNP-PCR

The existing molecular methods investigate only very small portions or attributes of the entire bacterial genome. PFGE, as an example, identifies enzyme restriction patterns in the genome, whereas WGS-based procedures have detailed information on the entire genome available to exploit as a basis for comparison and discrimination. To that end, extremely small differences, such as single nucleotide polymorphisms (SNPs), can be identified and used for subtyping as long as these attributes are consistently preserved in a particular bacterial lineage. Notably, Allard and colleagues [97] carried out bioinformatics analysis of a total of 104 SE genomes belonging, for the most part, to the predominant PFGE pattern (JEGX01.0004). They described a total of 9 clades and found 366 genes that showed variation, i.e., presence or absence, in the SE genome. This observation complemented and expanded on an earlier study by another laboratory which showed that two isolates of SE with the same phage type, PT 13a, were differentiated by a relatively large number of loci, i.e., 250 SNPs [73]. Similarly, WGS-based sequence reads have been mapped to a specific reference genome, for instance SE strain P125109, to find SNPs which were then used to build maximum-likelihood phylogenetic trees. Another study involving 55 SE strains selected from clinical and environmental samples in Minnesota and Ohio from 2001 to 2014 showed the existence of only two major groups [98]. Furthermore, WGS-based SNP analysis of 675 SE isolates from 45 countries revealed a global epidemic clade and two new clades that were found to be geographically restricted to distinct regions of Africa [99]. Using a closely related serovar, S. Gallinarum, as an outgroup, a maximum-likelihood phylogenetic tree was constructed based on the alignment of a total of 42,373 SNPs [99]. In addition, a SNP-based phylogenetic structure of 401 European SE isolates implicated outbreaks correlating with national and international egg distribution networks [75].

Thus, genetic variation that could allow the development of a routine subtyping tool for tracking purposes is present and demonstrable within the SE genome, but was apparently not fully exploited given the small number of subgroupings in each of the reported, sampled populations. This presented a need to properly mine the SE genome and develop a highly discriminatory subtyping procedure. In exploring this need, our hypothesis was that the use of a large number of SNPs may not necessarily improve the power of discrimination. More is not necessarily better: a large number of uninformative loci may be counterproductive and undesirable for strain differentiation. As a first step, whole-genome sequences of 11 SE isolates obtained in Canada were generated and compared to the SE P125109 reference strain, phage type 4, which led to the identification of 1361 loci where the SE genome showed SNPs [100]. Subsequent selection of 60 SNPs spread throughout the genome and distributed among different gene types and intergenic locations led to the development of a rapid, inexpensive, fluorescence-based real time PCR subtyping assay [55].

### 4.2 The SNP-PCR subtyping procedure

The SNP-PCR genotype assay is an allele-specific, single amplification procedure based on the specific binding of one of two competing forward primers, each 18–20 nucleotides long, which differ by a single nucleotide at the locus of interest. A single reverse primer, also 18–20 nucleotides long, completes the amplification process, leading to the accumulation of an amplicon bearing the SNP of interest. Each forward primer is designed with a specific tail that allows complementary binding with a commercially provided, customized sequence labeled with a fluorescent dye, FAM or HEX for allele 1 or 2 respectively (LGC Genomics, Beverly, MA). Thus, the first cycle of amplification ensures that the specific forward oligonucleotide present in the primer mix binds to the sequence containing the SNP and excludes the other primer. Elongation from the reverse primer in subsequent cycles ensures that the tail sequence is incorporated, so that the accumulating fragment carries either the FAM or the HEX fluorescent label, depending on which of the two allele-specific primers bound initially, which in turn is dictated by whether the nucleotide at the SNP position corresponds to allele 1 or allele 2. Detection is therefore based on a fluorescent labeled sequence that assigns the allele number to either of the two nucleotides that may occupy the SNP position. The SNP alleles are compiled for all SE strains at the 60 loci and used as input for evolutionary history analyses using the Maximum Parsimony method, conducted with the Molecular Evolutionary Genetics Analysis software, MEGA X [101]. The distinct groupings of the SE isolates are identified as clades and each is given a specific numerical designation starting from 1.
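The flow from fluorescence reading to allele call to compiled SNP profile can be sketched in Python as follows. This is an illustrative sketch only and not the assay's analysis software: the signal threshold and FAM/HEX ratio used to call an allele are hypothetical values chosen for the example, and a simple count of differing loci stands in for the Maximum Parsimony analysis, which the authors performed in MEGA X.

```python
from typing import Dict, List, Optional, Sequence

Call = Optional[int]  # 1 = FAM allele, 2 = HEX allele, None = no call


def call_allele(fam: float, hex_: float,
                min_signal: float = 1.0, ratio: float = 2.0) -> Call:
    """Assign allele 1 (FAM) or 2 (HEX) from end-point fluorescence.

    `min_signal` and `ratio` are hypothetical cut-offs for illustration;
    real thresholds depend on the instrument and its controls.
    """
    if max(fam, hex_) < min_signal:
        return None  # no amplification detected
    if fam >= ratio * hex_:
        return 1
    if hex_ >= ratio * fam:
        return 2
    return None  # ambiguous signal, left uncalled


def profile_distance(p: Sequence[Call], q: Sequence[Call]) -> int:
    """Number of loci with differing calls, ignoring uncalled loci."""
    return sum(1 for a, b in zip(p, q)
               if a is not None and b is not None and a != b)


def group_identical(profiles: Dict[str, Sequence[Call]]) -> List[List[str]]:
    """Group isolates that share an identical SNP profile."""
    groups: Dict[tuple, List[str]] = {}
    for name, profile in profiles.items():
        groups.setdefault(tuple(profile), []).append(name)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    # Toy readings (FAM, HEX) at three loci for one made-up isolate.
    readings = [(5.0, 1.0), (0.8, 4.0), (0.2, 0.3)]
    print([call_allele(f, h) for f, h in readings])
```

In the assay itself each isolate would contribute a profile across all 60 loci; three loci are used here only to keep the example readable.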

Following the development of the SNP-PCR procedure, our initial application of the assay to a group of 55 SE isolates obtained in Canada led to the recognition of 12 clades of SE [57].

### 4.3 Twenty-five circulating clades of Salmonella Enteritidis

Recently, the laboratory validation of the SNP-PCR assay was completed using 1,127 SE isolates obtained from food, animals, humans, and environmental sources in Canada and Europe, and we observed a total of 25 circulating clades of SE (Table 1, Ogunremi et al., manuscript under preparation). In addition, 13 other globally distributed isolates identified from published papers [98, 99], as well as the widely used reference SE strain P125109 phage type 4, were included in a phylogenetic comparison using the Maximum Parsimony method. These strains were distributed across the generated phylogenetic tree and homed to distinct SE clades, providing further validation of the ability of the SNP-PCR tool to cluster strains appropriately and, at the same time, to distinguish among different strains (Ogunremi et al., manuscript under preparation). The validation procedure demonstrated the robustness of the assay, its ability to estimate genetic distances and relatedness among and between clades, and its relevance in constructing an evolutionary map of SE following the testing of a large number of isolates.

### 4.4 Advantages of SNP-PCR: nomenclature and population structure

Previous studies aimed at evaluating the population structure of the highly clonal SE have reported fewer lineages and clades among the isolates tested. For instance, a study of 675 very diverse isolates collected over many decades (1948–2013) in 45 countries and 6 continents revealed the presence of only 3 clades; a subgroup of 58 isolates was identified but could not be clustered by the method used by the authors [99]. Yet another study demonstrated 9 clades among a large but PFGE-uniform group of isolates [97]. These studies, which showed limited diversity among SE populations, contrast with our observations and highlight the excellent discrimination observed for SE using the validated SNP-PCR assay. The SNP-PCR compares well with the cgMLST-HierCC function in EnteroBase in discriminating among strains chosen to represent SE clades from a very diverse SE population drawn from a variety of sources and different continents (Table 1; Ogunremi et al., under preparation).

Apart from being a highly discriminatory and robust assay, the SNP-PCR is very cost-effective. Reagent costs are estimated at Can$0.25 per SNP per isolate, so testing all 60 SNPs (Can$15) is cheaper than the traditional, less discriminatory subtyping assays (Can$26 for phage typing and Can$36 for two-enzyme PFGE analysis in reagent costs) or WGS (Can$100). The SNP-PCR validation procedure (described above) showed that only 17 SNP loci needed to be tested to assign an isolate to a clade, and the test performed very well on crude, boiled bacterial extract, obviating the need for DNA purification and yielding further savings in reagents, labour and time.
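The reagent cost comparison follows directly from the per-SNP figure and can be checked with a short Python calculation. The costs are those quoted in the text; the function and variable names are chosen only for this illustration, and labour and equipment costs are not included.

```python
# Reagent costs in Canadian dollars, as quoted in the text.
COST_PER_SNP = 0.25
OTHER_METHODS = {"phage typing": 26.0, "two-enzyme PFGE": 36.0, "WGS": 100.0}


def snp_pcr_reagent_cost(n_loci: int, cost_per_snp: float = COST_PER_SNP) -> float:
    """Reagent cost per isolate for testing `n_loci` SNP loci."""
    return n_loci * cost_per_snp


if __name__ == "__main__":
    full_panel = snp_pcr_reagent_cost(60)     # all 60 loci
    reduced_panel = snp_pcr_reagent_cost(17)  # the 17 loci sufficient for clade assignment
    print(f"60-locus panel: Can${full_panel:.2f}")
    print(f"17-locus panel: Can${reduced_panel:.2f}")
    for method, cost in OTHER_METHODS.items():
        print(f"{method}: Can${cost:.2f} ({cost / full_panel:.1f}x the 60-locus panel)")
```

On these figures the full panel costs Can$15.00 per isolate and the 17-locus panel Can$4.25 per isolate in reagents.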

Another important attribute of the SNP-PCR is that it is equally adaptable to a few samples or to a large number of samples. Illumina WGS requires a prescribed number of samples per run (e.g., 20 Salmonella strains using the MiSeq version 3 library kit over 600-cycle sequencing, which runs for 65 hours), whereas the SNP-PCR can be used to test one or a few samples with the appropriate controls without the volume of analysis affecting the cost. At the other end of the scale, a single PCR run can handle a 384-well plate loaded with hundreds of samples, with the machine run completed in 2 hours. The labor costs of running the SE SNP-PCR test (2 h PCR time) and analyzing the results are at least an order of magnitude lower than those of any other subtyping approach, including traditional molecular tests and WGS. The SNP-PCR test shows very good reproducibility (95%) in tests conducted in six laboratories.

The SNP-PCR satisfies all seven criteria expected of an ideal subtyping test: cost effectiveness, rapid performance, robust results, typeability, high discrimination, reproducibility, and epidemiological concordance [66].

## 5. Conclusions

The bacterial pathogen Salmonella Enteritidis is one of the most prevalent causes of foodborne illness in humans worldwide, yet tracking a strain of the organism through the food safety system is challenging because of its clonal nature, evident at the genomic level, which historically has resulted in poorly discriminating laboratory typing methods. The current application of genomics has led to the development of comprehensive and highly discriminatory tools; however, there are still challenges with the interpretation of the outputs and the application of the methods to differentiate between outbreaks and sporadic infections. The effect is a poorly understood population structure of SE.

This chapter illustrates the existence of 25 clades of SE, which should be useful for defining the population structure and tracking the pathogen from farm to fork. The phylogenetic relationships among the 25 clades of SE were obtained using a population of 1127 isolates from a variety of sources in Canada and Europe. The validated SNP-PCR assay displayed the attributes of an ideal subtyping test. It can be implemented in resource-deprived countries where routine genome sequencing remains unaffordable, as well as in resource-rich countries when characterizing a few isolates may not justify the expense of a genome sequencing run, or for surveillance where characterizing a large number of lower priority, non-clinical but valuable isolates is a desirable goal.

## Acknowledgments

This work was funded by the Canadian Food Inspection Agency, Ontario Ministry of Agriculture, Food and Rural Affairs, and Genome Research and Development Initiative of the Government of Canada.

## Conflict of interest

The authors declare no conflict of interest.

## Acronyms and abbreviations

PFGE: Pulsed-field gel electrophoresis

XLD: Xylose lysine desoxycholate

XLT4: Xylose lysine tergitol-4

HE: Hektoen enteric

BGS: Brilliant green sulfa

NTS: Non-typhoidal Salmonella

NGS: Next generation sequencing

SalFoS: Salmonella Foodborne Syst-OMICS database


© 2021 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## How to cite and reference

### Cite this chapter

Dele Ogunremi, Ruimin Gao, Rosemarie Slowey, Shu Chen, Olga Andrievskaia, Sadjia Bekal, Lawrence Goodridge and Roger C. Levesque (October 14th 2021). Tracking Salmonella Enteritidis in the Genomics Era: Clade Definition Using a SNP-PCR Assay and Implications for Population Structure, Salmonella spp. - A Global Challenge, Alexandre Lamas, Patricia Regal and Carlos Manuel Franco, IntechOpen, DOI: 10.5772/intechopen.98309. Available from:
