Average nucleotide composition and G+C content obtained from yeasts from Raphia sp. (R) or Elaeis sp. (E) palm wine and their relatives after measuring nucleotide frequencies (%) in 100 sequences relative to each yeast species shown.
Sequences from three palm wine yeast species, namely Saccharomyces cerevisiae, Pichia kudriavzevii, and Candida ethanolica, were analyzed to establish their phylogenetic relationships and the geographical origin and food matrix source of their close relatives. Up to 600 sequences from close relatives of palm wine yeasts were examined. The phylogenetic trees constructed showed polyphyletic relationships in C. ethanolica, whereas close relatives of S. cerevisiae and P. kudriavzevii showed little divergence. Sequence data for both Elaeis sp. and Raphia sp. palm trees showed that the highest number of sequence submissions to GenBank for relatives of palm wine yeasts came from China. Beverages were the main sources of close relatives of S. cerevisiae and P. kudriavzevii, whereas the closest relatives of C. ethanolica came from various non-food sources. Overall, relatives of palm wine yeasts were not specific to any particular food or fermentation mix. The guanine-cytosine (G+C) content in P. kudriavzevii (57–58%) and C. ethanolica (56–57%) was higher than that of S. cerevisiae (47.3–51%). This suggests that the P. kudriavzevii and C. ethanolica strains have a higher recombination rate than the S. cerevisiae strains analyzed. The data may help to understand palm wine yeast conservation and the diverse food matrices and geographical origins where their close relatives exist.
- Saccharomyces cerevisiae
- Pichia kudriavzevii
- Candida ethanolica
Palm wine is a traditional drink consumed mainly in sub-Saharan Africa, parts of Asia, and South America. It is obtained by fermentation of the sap of different palm trees, which grow throughout tropical and subtropical regions, with just a few species found in temperate regions, possibly due to freeze intolerance of seedlings. The method of obtaining the drink by tapping has been described in many reports, and the palm sap varies according to the palm trees found in different geographical locations. Yeasts are the main organisms implicated in the fermentation of the drink and they exist as natural flora on palm trees. Irrespective of the palm tree source, a common feature of the drink is that it goes sour within 24 h unless it is subjected to cold storage. The two trees from which palm wine is mostly tapped in Nigeria are Raphia hookeri and Elaeis guineensis. There is a debate on the possible origin of these palm trees. Raphia hookeri is known as the wine palm and is the most widespread and familiar Raphia palm in the freshwater swamps of West and Central Africa. Many local varieties exist in the tropical rain forest of Nigeria, and it is also grown in India, Malaysia, and Singapore. The oil palm E. guineensis is more widely found around the world. One report pointed out that E. guineensis originated in the tropical rain forest region of West Africa and can be found in Cameroon, Côte d’Ivoire, Ghana, Liberia, Nigeria, Sierra Leone, Togo, Angola, and the Congo. According to that report, some palm fruits were taken to the Americas during the fourteenth to seventeenth centuries and from there to the Far East, where the palm thrived. Yeasts are known to reflect human history; hence it is possible that the yeast strains found in palm wine were introduced to new regions via the plant materials introduced in those locations.
Although yeasts have been used for food and beverage fermentations for hundreds of years and domestication is believed to have begun before the discovery of microbes, the extent of their genetic diversity is still under study around the world. Recent reports have shown that non-Saccharomyces yeasts have oenological properties different from those of S. cerevisiae. Other reports emphasize that even though biochemical and genomic studies of S. cerevisiae have helped our understanding of yeasts, the lesser known yeast species have not been fully exploited. A better understanding of S. cerevisiae and non-S. cerevisiae yeasts in palm wine is needed in order to learn more about the capabilities of the yeasts present in the drink or to probe for novel species. To generate more information, many investigators have used molecular characterization, and this has led to proper identification of new yeast strains in the drink. The diversity of yeasts from palm wine has not been investigated in depth, and reports that show evolutionary trees, which are the basic structures necessary to establish the relationships among organisms, are few in the literature. This chapter examines the evolutionary relationships of palm wine yeasts and their close relatives based on 26S rRNA sequence data and aims to shed more light on the diversity of yeasts found in the drink.
2.1. Ribosomal ribonucleic acid genes partial sequence data
In a previous study, partial 26S rRNA gene sequences from 18 palm wine yeast isolates were deposited under accession numbers HG425325 to HG425342. The sequences of the three yeast species identified in that study, namely S. cerevisiae, P. kudriavzevii, and C. ethanolica, from Elaeis sp. and Raphia sp. palm trees were selected and used to carry out updated searches in this report. For Elaeis sp., the sequence accession numbers used were HG425336, HG425328, and HG425333, whereas HG425332, HG425338, and HG425335 were used for the Raphia sp. palm tree. The current version of each of the six selected sequences was used separately for an updated search of the GenBank database. The searches were optimized for highly similar sequences, and for each query the first 100 sequences from relatives with the highest percent identity were marked, giving a shortlist of up to 600 sequences. The features listed for these sequences at the time of submission were examined, and the countries of origin and sources were noted. Sources were classified as beverage, food, or non-food.
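The shortlisting step described above can be illustrated with a minimal sketch. It assumes the search results have already been exported into simple records; the `Hit` class, its field names, and the accession strings are hypothetical and are not part of the original workflow, which was carried out through the GenBank search interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One search result for a query sequence (field names are assumptions)."""
    accession: str
    percent_identity: float
    country: str
    source: str  # "beverage", "food", or "non-food"


def shortlist(hits, limit=100):
    """Return up to `limit` hits with the highest percent identity.

    sorted() is stable, so hits with equal identity keep their search order.
    """
    ranked = sorted(hits, key=lambda h: h.percent_identity, reverse=True)
    return ranked[:limit]


# Invented example records, used only to show the call.
hits = [
    Hit("EXAMPLE01", 99.1, "Country A", "beverage"),
    Hit("EXAMPLE02", 99.8, "Country B", "food"),
    Hit("EXAMPLE03", 98.5, "Country A", "non-food"),
]
top_two = shortlist(hits, limit=2)
print([h.accession for h in top_two])
```

Running the same function once per query sequence with `limit=100` would give six shortlists, which together hold up to 600 sequences.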
2.2. Construction of phylogenetic trees
Phylogenetic trees were constructed from the shortlisted sequences using the Molecular Evolutionary Genetics Analysis (MEGA, version 7) software. The software allowed a seamless transfer of the sequences from GenBank. Multiple sequence alignments (MSA) were constructed within the software using the multiple sequence comparison by log expectation (MUSCLE) method reported by Edgar. The evolutionary history was inferred by using the maximum likelihood method based on the Tamura-Nei model. The tree with the highest log likelihood was chosen. Initial trees for the heuristic search were obtained using the maximum composite likelihood approach. Trees were drawn to scale, with branch lengths measured in the number of substitutions per site. All positions containing gaps and missing data were eliminated. The nucleotide composition of the sequences was calculated automatically by switching to the nucleic acid estimation mode of the software, after which the G+C content of the sequences was calculated manually from the percentage distribution of adenine, guanine, cytosine, and thymine displayed. The substitution model used assumes equal substitution rates among sites and takes into account differences between transitional and transversional rates and G+C-content bias. For brevity, only 20 of the initial 100 relatives obtained are shown in each tree together with the reference sequence.
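The manual G+C calculation described above amounts to adding the guanine and cytosine percentages. The sketch below shows one way to do this from raw sequences. It is not the MEGA procedure: it pools base counts across all sequences, which is an assumption and may differ slightly from averaging per-sequence percentages, and the function names and example sequences are illustrative only.

```python
from collections import Counter


def nucleotide_percentages(sequences):
    """Return the pooled percentage of A, C, G, and T over all sequences.

    Characters other than A, C, G, and T (gaps, ambiguity codes) are ignored.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A, C, G, or T bases found")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def gc_content(percentages):
    """G+C content (%) is the sum of the guanine and cytosine percentages."""
    return percentages["G"] + percentages["C"]


# Two short made-up sequences, not taken from the data set.
example = ["ATGCGC", "GGCCTA"]
pct = nucleotide_percentages(example)
print(round(gc_content(pct), 2))
```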
The complete list of 600 sequences analyzed, showing sources and countries of origin, is available in the public repository figshare.
3. Results and discussion
3.1. Evolutionary relationships of palm wine yeasts and their relatives
Yeasts facilitate several industrial food fermentation processes, which often rely on a specific desired strain. This may be why domestication is believed to be the main driver of the prevalence of specific yeasts in a geographical location. The understanding of the ecological basis of yeast diversity in nature remains fragmented, and cross-kingdom competition has been proposed as a method to generate industrially useful yeast strains with new metabolic traits. The diversity of palm wine yeasts has yet to be studied in depth; hence a look at their relatives will enable more information to be generated.
In the last decade, there has been an increase in submissions of palm wine yeast sequences based on 26S rRNA genes, mainly due to quality checks by academic journals. The identification of new strains is accompanied by a search with the basic local alignment search tool (BLAST), followed by submission of the DNA sequences to GenBank. According to Benson et al., GenBank is a comprehensive database that contains publicly available nucleotide sequences for up to 370,000 formally described species. These records, which contain a lot of information, are generated mainly through submissions from investigators around the world. Each sequence received is reviewed by the GenBank annotation staff to check for errors, after which an accession number is assigned.
All the sequences used in this study were the first versions submitted by investigators. The maximum likelihood method was preferred for the trees constructed because, although it is computationally intensive, it evaluates candidate trees under an explicit model of sequence evolution. The method can also be useful for widely divergent groups or other difficult situations.
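The computational cost of tree searching can be made concrete with a short calculation. The number of distinct unrooted bifurcating trees for n sequences is (2n − 5)!!, the product of the odd numbers from 1 to 2n − 5, which is why a heuristic search is used rather than scoring every topology. The sketch below evaluates this standard formula; the function name is illustrative, and the 21-sequence case simply matches the 20 relatives plus one reference sequence shown in each tree.

```python
def unrooted_tree_count(n_taxa):
    """Number of distinct unrooted bifurcating trees for n_taxa sequences.

    Uses the standard formula (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5).
    """
    if n_taxa < 3:
        raise ValueError("at least three sequences are needed")
    count = 1
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count


# Growth of the search space for small trees and for a 21-sequence tree.
for n in (4, 5, 6, 21):
    print(n, unrooted_tree_count(n))
```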
3.2. Candida ethanolica
The yeast C. ethanolica is not widely reported in palm wine. It has been described as a non-conventional yeast that may represent a large resource of yeast biodiversity for industrial applications because it is adapted to some of the stress factors present in harsh environments. In that report, C. ethanolica tolerated up to 7% v/v ethanol. This could be useful information for the development of new palm wine drinks, especially now that there is increasing interest in non-Saccharomyces yeasts with peculiar features that are able to replace or accompany S. cerevisiae during specific industrial fermentations.
The C. ethanolica strains from Raphia sp. (Figure 1) and Elaeis sp. (Figure 2) palm wine showed close relationships with other Candida species. The relatives of the Raphia sp. palm wine strain that emanated from the same node (Figure 1) came from diverse sources. The flanking close relatives (KY283163 and DQ466540) of C. ethanolica (HG425332) were isolated from composite microbial powders for aquaculture in China and from composite cocoa fermentation in Ghana. Other close relatives included species from the genus Pichia. The P. deserticola strain (KM005182) from the same node as the reference strain was isolated during aerobic deterioration of total mixed ration silage in China. For Elaeis sp. palm wine (Figure 2), close relatives of the C. ethanolica strain (HG425336) were from a laboratory culture collection with an unidentified source and from tannin-tolerant yeasts associated with naturally fermented Miang leaves in Thailand. A close P. deserticola strain with no source stated in GenBank was from a large characterization study.
In both Elaeis sp. and Raphia sp. palm wine, several monophyletic groups were formed with other Pichia species, namely P. deserticola, P. manshurica, and P. galeiformis, which indicates polyphyletic relationships. The polyphyletic nature of Pichia has been demonstrated by Kurtzman and Robnett in an analysis of gene sequences that included all known ascomycetous yeasts. Apart from possibly similar conserved regions, the nomenclature in use at the time the sequences were submitted may also explain why Pichia species, which belong to a different genus, were observed as close relatives of C. ethanolica from Elaeis sp. and Raphia sp. palm trees.
It has been reported that ascomycetous fungi submitted to the database were previously assigned names based on their life stages [32, 33]. For example, the name Candida krusei is based on the anamorphic stage of the fungus, whereas the name of its teleomorphic stage is Pichia kudriavzevii. It also has an older name, Issatchenkia orientalis. The genus Candida consists of up to 850 organisms, which can be distantly related. Hence, in order to avoid confusion, the International Botanical Congress held in Melbourne in July 2011 changed the international code of nomenclature for fungi, adopted the principle that one fungus can have only one name, and ended the system of permitting separate names to be used for anamorphs. The report emphasized that this validated all legitimate names proposed for a species, regardless of the stage on which they were typified, so that any of them can serve as the correct name for that species.
3.3. Saccharomyces cerevisiae
The yeast S. cerevisiae is generally known to be the most used microorganism in the food and drink manufacturing sector. The organism is the dominant yeast species isolated in many studies on palm wine. However, it is unclear whether S. cerevisiae as a species occurs naturally or exists solely as a domesticated species. S. cerevisiae strains are genetically diverse, largely as a result of human efforts to develop strains specifically adapted to various fermentation processes. These adaptive pressures from various ecological niches may generate behavioral differences among these strains. One review suggested that domestication in Saccharomyces is most pronounced in beer strains because they live continuously in their industrial niche, which allows only limited genetic admixture with wild stocks and minimal contact with natural environments. Because of this restriction, beer yeast genomes show complex patterns of domestication and divergence, making both ale (S. cerevisiae) and lager (S. pastorianus) strains ideal models to study domestication.
The relatives of palm wine S. cerevisiae were not distributed among many species or different genera, unlike those observed for C. ethanolica. Two nodes were observed in the S. cerevisiae trees constructed for Elaeis sp. (Figure 3) and Raphia sp. (Figure 4). The yeast strain isolated from Elaeis sp. (Figure 3) was on a different branch from most of its relatives, whereas the opposite was observed for the yeast from Raphia sp. palm wine (Figure 4). As observed for Candida species, the S. cerevisiae relatives were isolated from different sources. The close relatives flanking the strain from Elaeis sp. palm wine (HG425328, Figure 3), with accession numbers KU862639 and MF966566, were isolated from grape surface and pear sourdough, whereas the close relatives of the Raphia sp. palm wine strain (HG425338, Figure 4), with accession numbers GU080046 and HM191669, were isolated from must of spontaneous fermentation and from grape juice used to brew Musalais, a beverage made from compressed grapes.
It is believed that 99% of yeasts are still unknown, and S. cerevisiae fermentation could be specific to a particular substrate; hence more studies of S. cerevisiae from different palm trees will be beneficial. The genus Saccharomyces was previously divided into two groups, namely Saccharomyces sensu stricto and Saccharomyces sensu lato, and the sensu stricto strains are mostly associated with the fermentation industry. The S. cerevisiae strains in this study are sensu stricto. Comparative genomic analysis of S. cerevisiae and closely related species has contributed to our understanding of how new species emerge and has shed light on various mechanisms that contribute to reproductive isolation. This knowledge can be applied to palm wine yeasts to ascertain how they differ from well-characterized yeasts.
3.4. Pichia kudriavzevii
Recent molecular studies of yeasts present in palm wine show that Pichia kudriavzevii has emerged as a prevalent non-Saccharomyces yeast species in the drink. The species has shown probiotic potential and multistress tolerance. It is worth looking closely at this species because it has been shown that some P. kudriavzevii strains can produce higher quantities of ethanol from lignocellulosic biomass than conventional cells of S. cerevisiae at 45°C.
The tree constructed for P. kudriavzevii showed the least divergence when compared to S. cerevisiae or Candida palm wine yeast relatives. All the relatives and the Elaeis sp. palm wine strain (HG425333) originated from one node and formed separate taxonomic units (Figure 5). In contrast, the P. kudriavzevii (HG425335) from Raphia sp. palm wine formed a separate clade and did not lie on the same branch with the relatives (Figure 6). This indicates intraspecies diversity and confirms findings reported previously . In that study, intraspecies diversity was suggested because P. kudriavzevii (HG425335) from Raphia sp. palm wine formed a separate clade with palm wine isolates from Mexico instead of isolates from the same geographical location.
The information contained in the sequence submission of close relatives of P. kudriavzevii strains also shows different sources of isolation. The strains close to the yeast from Elaeis sp. palm wine (HG425333, Figure 5) with accession numbers KY283159 and KM234455 show that isolation was from composite microbial powders for aquaculture  and naturally fermented cashew apple juice  whereas a close relative of Raphia sp. palm wine (HG425335, Figure 6) with accession number KU167717 was isolated from activated sludge from textile dyeing .
3.5. Geographical origin and sources of palm wine yeast relatives
After ascertaining the sources of very close relatives from the phylogenetic trees constructed, the 600 shortlisted sequences from the aforementioned yeast species were examined further, and the information found was used to group the isolates according to country of isolation and according to food, beverage, or non-edible source.
3.5.1. Isolates submitted by country of origin and source
Overall, the sequences examined for the aforementioned yeast species were submitted from 38 countries, and the top six countries are presented in this report. Sequence data for both Elaeis sp. (Figure 7) and Raphia sp. (Figure 8) palm trees show that the highest number of submissions to the GenBank database was from China. The top three countries from which palm wine yeast relatives originated were the same for both palm tree species. This suggests that a large number of palm wine yeasts may have common ancestors with yeasts found in China. The sources of palm wine yeast relatives were spread across beverages, food, and non-food sources. The prevalence of S. cerevisiae, P. kudriavzevii, and C. ethanolica from these sources is shown for the Elaeis sp. palm tree (Figure 9) and the Raphia sp. palm tree (Figure 10). For palm wine from both Elaeis and Raphia palm trees, relatives of S. cerevisiae and P. kudriavzevii were isolated mainly from beverage sources, whereas relatives of C. ethanolica were isolated from non-food sources. The sources of isolation revealed that the closest relatives of palm wine yeasts were from various sources and not specific to any particular food or fermentation mix.
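The grouping by country and source described above is a simple tally. A minimal sketch follows, assuming each shortlisted sequence has been reduced to a (country, source) pair; the records and function names are invented for illustration and do not reproduce the counts behind Figures 7 to 10.

```python
from collections import Counter


def top_countries(records, n=6):
    """Return the n countries with the most submissions as (country, count)."""
    return Counter(country for country, _source in records).most_common(n)


def source_shares(records):
    """Return the percentage of records in each source category."""
    counts = Counter(source for _country, source in records)
    total = sum(counts.values())
    return {source: 100.0 * count / total for source, count in counts.items()}


# Invented (country, source) pairs, used only to show the calls.
records = [
    ("Country A", "beverage"),
    ("Country A", "non-food"),
    ("Country A", "beverage"),
    ("Country B", "food"),
    ("Country C", "food"),
    ("Country B", "beverage"),
]
print(top_countries(records, n=2))
print(source_shares(records))
```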
A report  found that laboratory estimates of optimum growth temperature could be used to predict global distributions of free-living microbes. Also, it was pointed out that population genetic analyses show that the genetic diversity of S. cerevisiae is high in the tropics and subtropics of China [51, 52]. It was suggested that without further sampling in tropical and subtropical regions, it is not possible to differentiate whether the higher diversity of S. cerevisiae in Asia reflects a greater habitat area or an Asian origin for S. cerevisiae. It would be beneficial to carry out further studies in order to establish if palm wine yeasts were taken from Africa to Asia or vice versa. The diversity could also be high in temperate regions because a study examined S. cerevisiae and S. paradoxus in northeast America and uncovered a large diversity of yeasts . Up to 24 yeast isolates could not be assigned to any known species and it was suggested that the yeasts identified may be of taxonomic, medical, or biotechnological importance.
3.6. G+C composition of palm wine yeast relatives
The G+C composition is a well-known evolutionary property of eukaryotes, archaea, and bacteria. Chen et al. suggested that concordance between proteomic architecture and the genetic code is closely related to genomic G+C content and phylogeny. It has been suggested that yeasts with higher G+C content have a higher recombination rate, and recombination is believed to be suppressed around centromeres. The data in Table 1 present the average nucleotide composition and G+C content of the partial 26S rRNA gene sequences analyzed. The table shows the percentages of adenine, guanine, thymine, and cytosine for S. cerevisiae, P. kudriavzevii, and C. ethanolica from the aforementioned palm trees. Data were obtained after measuring nucleotide frequencies (%) in 100 sequences of strains related to each palm wine yeast species listed. The G+C content in P. kudriavzevii and C. ethanolica was higher than that in S. cerevisiae. This suggests that P. kudriavzevii and C. ethanolica have a higher recombination rate than the S. cerevisiae strains analyzed in this report. The G+C range observed is within the reported average genomic G+C-content range (13–75%) among species. The values for S. cerevisiae were also within the range of G+C content (38.3–52.9%) reported for the MAT locus in different Saccharomycetaceae species.
| Yeast species (palm source) | T(U) | C | A | G | G+C |
|---|---|---|---|---|---|
| 1. S. cerevisiae-R | 26.3 | 16.6 | 26.5 | 30.7 | 47.3 |
| 2. S. cerevisiae-E | 26.7 | 20.2 | 22.7 | 30.4 | 51.0 |
| 3. P. kudriavzevii-R | 20.0 | 21.9 | 22.6 | 35.5 | 57.0 |
| 4. P. kudriavzevii-E | 19.8 | 22.2 | 22.6 | 35.5 | 58.0 |
| 5. C. ethanolica-R | 21.1 | 21.4 | 21.8 | 35.7 | 56.0 |
| 6. C. ethanolica-E | 20.9 | 21.4 | 21.9 | 35.8 | 57.0 |
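The species comparison drawn from Table 1 can be reproduced from its last column. The sketch below copies the six reported G+C values from the table and derives the range per species; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Reported G+C content (%) from the last column of Table 1,
# keyed by (species, palm source), where R = Raphia sp. and E = Elaeis sp.
reported_gc = {
    ("S. cerevisiae", "R"): 47.3,
    ("S. cerevisiae", "E"): 51.0,
    ("P. kudriavzevii", "R"): 57.0,
    ("P. kudriavzevii", "E"): 58.0,
    ("C. ethanolica", "R"): 56.0,
    ("C. ethanolica", "E"): 57.0,
}


def gc_range_by_species(values):
    """Return {species: (lowest, highest)} G+C content across palm sources."""
    ranges = {}
    for (species, _palm), gc in values.items():
        low, high = ranges.get(species, (gc, gc))
        ranges[species] = (min(low, gc), max(high, gc))
    return ranges


ranges = gc_range_by_species(reported_gc)
for species, (low, high) in ranges.items():
    print(f"{species}: {low}-{high}%")
```

The resulting ranges correspond to the values quoted for each species, with both P. kudriavzevii and C. ethanolica lying above the highest S. cerevisiae value.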
Further studies are required because G+C content is associated with multiple biases of different kinds during downstream operations, and these biases may arise from sequencing technologies as well as from biological and methodological causes. Another factor that could affect the G+C content is that some yeasts, such as Lachancea kluyveri, show an intriguing compositional heterogeneity: a region of the chromosome has an average G+C content of 52.9%, which is significantly higher than the 40.4% global G+C content of the rest of the genome.
Sequence data are useful for comparing palm wine yeasts from different trees. The data show the countries where the relatives of palm wine yeasts are dominant and may be useful for evolution and species migration studies. Palm wine yeast relatives may originate from beverage, food, and non-edible sources. The G+C nucleotide data provide insights into changes that may have occurred in conserved regions of some isolates over time. Comparing sequences of the highly conserved regions of the 26S rRNA genes gives an immediate picture of the lineage of palm wine yeasts and their relatives. It can also provide a foundation for selecting candidates for whole genome sequencing for future comparison.
The free use of MEGA 7.0 is appreciated. Software can be accessed at