This chapter addresses the main omics approaches used in studies of the genus Corynebacterium, Gram-positive microorganisms that can be isolated from many diverse environments. Currently, the genus Corynebacterium has more than 130 highly diversified species, many of which are of medical, veterinary and biotechnological importance, such as C. diphtheriae, C. pseudotuberculosis, C. ulcerans and C. glutamicum. Owing to their wide application in these fields, several omics methodologies, such as genomics, transcriptomics and proteomics, are used to better elucidate the species belonging to this genus. The genomic era has contributed to the development of more advanced and complex approaches that increase the volume of data generated and, consequently, advance the structural, functional and dynamic knowledge of biological systems.
The genus Corynebacterium was proposed by Lehmann and Neumann in 1896 to accommodate the type species, the bacillus Corynebacterium diphtheriae. Before its final taxonomic classification, C. diphtheriae had already been described under synonyms such as Microsporon diphthericum, Bacillus diphtheriae and Pacinia loeffleri. After its classification, the species was described once more under the synonym Mycobacterium diphtheriae by Krasil’nikov in 1941. Afterward, the genus came to accommodate other bacterial species with similar morphology and/or pathogenicity mechanisms. Currently, the genus has 110 valid species, of which 132 species have synonymous species and 11 subspecies.
Members of the genus Corynebacterium are typically rod-shaped, Gram-positive, nonmotile, non-spore-forming, aerobically growing and catalase-positive. They are part of the normal microbiota of the skin and mucous membranes of several hosts and are also present in the environment (soil and water, among others). Bacteria of this genus also share characteristics such as G+C content (47–74%), production of oxygenase enzymes and absence of collagenase production. In addition, their cell wall is thick and contains mycolic acids, peptidoglycan and arabinogalactan, as well as saturated and unsaturated fatty acids.
Bacterial species of this genus can be classified as pathogenic, opportunistic or saprophytic. The strains of medical and veterinary interest are commonly divided into two groups: diphtheria and nondiphtheria. The diphtheria group comprises producers of the diphtheria toxin (TD), which is encoded by the viral tox gene present in the DNA of lysogenic β bacteriophages. This group includes three species: C. pseudotuberculosis, C. diphtheriae and C. ulcerans. Nondiphtheria species, as agents of infection, are considered opportunistic pathogens because they are present in the normal microbiota of the skin and of the human nasopharynx. The species Corynebacterium jeikeium, Corynebacterium urealyticum and Corynebacterium resistens are considered opportunistic. Nonpathogenic strains, such as Corynebacterium glutamicum, Corynebacterium efficiens, Corynebacterium crenatum and Corynebacterium variabile, have biotechnological importance in the production of amino acids and in the cheese industry.
2. Main species of medical and biotechnological importance
2.1. Corynebacterium diphtheriae
Diphtheria is an acute, transmissible disease with local and systemic manifestations that affects the upper respiratory tract. It has been one of the main causes of death, especially in children, on different continents, even in countries with immunization programs. C. diphtheriae has been isolated mainly from humans; however, it has also been isolated from other hosts, such as horses, cats and dogs.
C. diphtheriae is a rod-shaped bacterium whose cells measure from 0.5 to 2.0 μm. Strains belonging to this species do not produce spores and lack structures such as capsules and flagella. The strains are classified into four biovars, mitis, gravis, intermedius and belfanti, based on infection severity, colony morphology, carbohydrate fermentation and hemolysis. These biovars are alike in producing the enzyme cystinase (Tinsdale medium) and in fermenting glucose and maltose. Moreover, they produce neither pyrazinamidase nor urease, and they are not capable of fermenting sucrose. Regarding nitrate reduction, the biovars mitis, gravis and intermedius give a positive reaction, whereas the biovar belfanti gives a negative reaction.
In its original form, C. diphtheriae does not cause disease; its pathogenicity depends on infection by a bacteriophage carrying the tox gene, which encodes TD. The lysogenic cell thus carries the tox gene, which remains highly conserved in the bacterial chromosome through generations. TD is a potent protein exotoxin capable of acting on all tissues, with special tropism for the myocardium, nervous system, kidneys and adrenals.
TD inhibits protein synthesis, causing cell death. It is composed of a single polypeptide chain containing two fragments, A and B, connected by a disulfide bond, and both are required for intoxication of tissue culture cells. Fragment A contains the active site of TD and is responsible for its enzymatic activity, while fragment B is responsible for binding the toxin to receptors on host cells.
2.2. Corynebacterium pseudotuberculosis
C. pseudotuberculosis is characterized by the production of the enzymes nitrate reductase (biovar-dependent) and urease, the fermentation of the carbohydrates maltose and glucose, and the formation of beta-hemolysis halos on blood agar. Its cells measure 0.5 to 0.6 × 1.0 to 3.0 μm, and its colonies have a whitish and viscous appearance. Biovar ovis, which is nitrate reductase negative, affects sheep and goats, and occasionally swine, causing caseous lymphadenitis; in humans it may cause chronic subacute lymphadenitis. Biovar equi, whose isolates can reduce nitrate to nitrite, mainly infects equines, buffaloes and camelids, causing ulcerative lymphangitis and edematous skin disease. Because of its prevalence in animals of economic importance, diseases associated with C. pseudotuberculosis strains cause reduced meat and milk production, wool depreciation, delayed animal development, poorer reproductive indices of the herd, carcass condemnation, early culling and occasional death of animals, as well as high treatment costs and veterinary fees.
The virulence of C. pseudotuberculosis is related to three main factors: the cell wall structure, its capacity to persist within macrophages and the production of the exotoxin phospholipase D (PLD), which is considered the main virulence factor of the species. Although its main virulence factor is already well established, toxigenic strains of this species can also produce TD [17, 18].
The diagnosis of C. pseudotuberculosis in infected animals is performed by macroscopic observation of the superficial abscesses formed, combined with laboratory culture on selective tellurite agar, bacterioscopy, the catalase test (positive for Corynebacterium) and biochemical tests. Further options include serological tests such as seroneutralization, indirect hemagglutination and enzyme-linked immunosorbent assay (ELISA), allergic tests, and molecular tests such as multiplex polymerase chain reaction (PCR) targeting the conserved genes rrs, rpoB and pld. Recently, with the addition of the narG gene, it has become possible to distinguish the biovars by their capacity to reduce nitrate.
Animals affected by caseous lymphadenitis are usually treated by lymph node drainage and isolation of the infected animals. Yet this practice does not completely eliminate the bacteria, because of the possibility of dissemination to viscera and other internal organs, as well as the great potential for contamination of the environment. In addition, antibiotic treatment does not produce satisfactory results because the drugs penetrate the abscess capsule poorly, which makes treatment unfeasible and underlines that prophylaxis is the best method to combat the disease.
2.3. Corynebacterium ulcerans
C. ulcerans has been described as the etiological agent of several infections in animals, such as goats, dogs, cats and cattle. The contact with affected animals is the main form of transmission of C. ulcerans to human hosts, causing diphtheria of zoonotic nature. The first cases of human infections were related to the consumption of milk contaminated by this microorganism. In the 1990s, it was presented as an emerging pathogen in countries of large animal production, such as England, Japan, Germany, Denmark, and Brazil .
As for its biochemical characterization, C. ulcerans produces the enzyme gelatinase and is unable to reduce nitrate. Its virulence factors include toxic lipids associated with the cell wall, which may mediate bacterial resistance to phagocyte attack. Like C. pseudotuberculosis, C. ulcerans is capable of producing PLD. The third virulence factor of C. ulcerans is the production of the diphtheria toxin. C. ulcerans strains infected by bacteriophages carrying the tox gene are mainly responsible for clinical cases in humans and animals.
The diseases related to these strains show symptoms such as frequent nasal bleeding, skin lesions similar to cutaneous diphtheria, necrosis and mucosal ulceration, granulomatous pulmonary nodules, lymph node involvement and cell death.
Although diphtheria caused by C. ulcerans is associated with TD production, the efficacy of vaccination with the diphtheria toxoid is unknown. This is due to limited knowledge of the molecular epidemiology of the bacterium, mainly regarding the structure of the tox gene, which shows specific differences both between species (compared with the C. diphtheriae tox gene) and within C. ulcerans.
2.4. Corynebacterium glutamicum
Strains of C. glutamicum are commonly found in the environment, in habitats such as soil. This bacterium is rod-shaped, mesophilic, facultatively anaerobic, and capable of reducing nitrate to nitrite and of fermenting carbohydrates. As a generally recognized as safe (GRAS) microorganism, it is widely used in the biotechnology industry for its ability to produce amino acids such as L-glutamate and L-lysine, which are used as flavor enhancers and food additives. More than 2.5 million tons of lysine are produced annually by mutant strains of C. glutamicum for animal nutrition and for applications in the pharmaceutical, cosmetics, fuel and polymer industries.
The nutrients used for industrial fermentation by C. glutamicum include glucose, fructose and sucrose derived from corn starch, cassava or wheat, as well as cane molasses and beet molasses. Obtaining sugar from raw materials and agroindustrial wastes is very common in countries with high agricultural production, such as China, the United States and Brazil, and its use reduces the cost of the industrial production process. Additionally, C. glutamicum strains are well suited to large-scale fermentation, since they tolerate the oscillations in oxygen tension and substrate supply that often occur in these industrial processes.
One of the factors considered in strain selection is the maximum theoretical yield of lysine from glucose that a cell can achieve. This yield should be around 75% conversion of carbohydrate into final product. Metabolic flux analysis, considering the main metabolic pathways that C. glutamicum can use to produce lysine, indicates that the yield is increasing, with production of more than two million tons of amino acids per year.
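The yield figure above can be made concrete with a short calculation. The sketch below assumes that the 75% value is a molar yield (mol lysine per mol glucose), which the text does not state explicitly, and uses standard molar masses for free glucose and lysine; the function names are illustrative only.

```python
# Molar masses in g/mol (standard values for the free compounds).
GLUCOSE_G_PER_MOL = 180.16
LYSINE_G_PER_MOL = 146.19


def mass_yield(molar_yield: float) -> float:
    """Convert a molar yield (mol lysine / mol glucose) into g lysine / g glucose."""
    return molar_yield * LYSINE_G_PER_MOL / GLUCOSE_G_PER_MOL


def glucose_needed(lysine_tons: float, molar_yield: float) -> float:
    """Tons of glucose required to make `lysine_tons` at the given molar yield."""
    return lysine_tons / mass_yield(molar_yield)


if __name__ == "__main__":
    # Assumption: 0.75 mol/mol is the theoretical maximum quoted in the text.
    print(f"mass yield: {mass_yield(0.75):.3f} g lysine per g glucose")
    # Glucose demand for 2.5 million tons of lysine at the theoretical maximum.
    print(f"glucose needed: {glucose_needed(2.5e6, 0.75):.3e} tons")
```

Under these assumptions, a 75% molar yield corresponds to roughly 0.61 g of lysine per gram of glucose, so real processes, which run below the theoretical maximum, need correspondingly more sugar.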
3.1. The impact of next-generation sequencing technologies on genomics of the genus Corynebacterium
Forty years ago, the advent of DNA sequencing by the Sanger method began to revolutionize genome studies. The first genomes to be sequenced were those of viruses and organelles. In 1995, Craig Venter and colleagues published the first two complete bacterial genomes: Haemophilus influenzae and Mycoplasma genitalium [29, 30]. Later, several sequencing projects were created that transformed biology as a whole, making it possible to decipher complete genes and, later, whole genomes using the methodology developed by Sanger and colleagues in 1977.
The publication of the first draft of the human genome in 2001 prompted companies to develop new sequencers that would provide more speed and accuracy, as well as cost and labor savings. Since 2005, new sequencing technologies, called next-generation sequencing (NGS) or high-throughput sequencing, have been able to generate gigabases (Gb) of data in a few days (e.g., Illumina, Ion Torrent, PacBio Single Molecule Real-Time (SMRT) and Oxford Nanopore). Hence, since the emergence of NGS platforms, public-domain databases have registered an exponential increase in the number of deposited biological sequences, with more than 144,000 bacterial genomes already registered.
Currently, the genus Corynebacterium has more than 265 genome projects registered in public databases. According to the GOLD website, a database that provides project information in all three domains of life, Corynebacterium genome deposits date back to 2007 . Since then, the increase of these data positively impacted the development of studies with transcriptomic and proteomic approaches, in order to provide a better understanding of several molecular processes from different corynebacterial species.
3.2. Comparative genomics studies
The remarkable growth in the number of complete genomes has advanced comparative analyses between genomes and enabled large-scale studies. Comparative genomics provides a global understanding of the gene repertoire of a given species or genus. It elucidates the essential genes involved in processes such as replication, transcription and translation, as well as the accessory genes, which are important for characterizing variability in genetic patterns, and it allows the analysis of genomic plasticity.
In another aspect, comparative analyses between different strains within the same phylogenetic clade make it possible to recognize similarities and differences among genomes, to clarify which sequences can give rise to phenotypic changes in organisms, and to elucidate the mechanisms of virulence among pathogenic organisms or, in the case of environmental microorganisms, their adaptation. From this premise, the pan-genome concept emerged.
Regarding C. ulcerans, a study of 19 strains identified 4120 genes composing the pan-genome, of which 1405 belonged to the core genome and 2715 to the accessory genome; proteins involved in pili formation and the tox gene were found in a large part of the genomes. Furthermore, variations in transmembrane and secreted proteins among the different species were identified, which contribute to the variability in pathogenicity between them. This study provided a greater understanding of the virulence of this emerging pathogen.
The pan-genome is constituted by the core genome, which comprises the genes present in all analyzed strains; the accessory genome, which comprises genes shared by two or more, but not all, strains and includes the genes the bacteria need to survive in a specific environment; and the strain-specific genes belonging to a single lineage, which can be acquired via horizontal transfer [37, 39]. The representatives of the genus Corynebacterium are an interesting subject for studies of comparative genomics and evolution because of their diverse lifestyles.
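The partition into core, accessory and lineage-specific genes described above can be sketched from a gene presence table. This is a minimal illustration, not the pipeline used in the cited studies: it assumes genes have already been clustered into families (the hard step in practice), and the strain names and gene contents below are invented toy data.

```python
from collections import Counter


def partition_pangenome(gene_sets):
    """Split a pan-genome into (core, shared accessory, unique) gene sets.

    gene_sets maps a strain name to the set of gene-family identifiers it
    carries. With a single strain, core and unique coincide by definition.
    """
    n_strains = len(gene_sets)
    # Count in how many strains each gene family occurs.
    counts = Counter(g for genes in gene_sets.values() for g in genes)
    core = {g for g, c in counts.items() if c == n_strains}
    unique = {g for g, c in counts.items() if c == 1}
    shared = set(counts) - core - unique
    return core, shared, unique


if __name__ == "__main__":
    # Hypothetical toy data: three strains and five gene families.
    strains = {
        "strain_A": {"dnaA", "rpoB", "pld", "tox"},
        "strain_B": {"dnaA", "rpoB", "pld"},
        "strain_C": {"dnaA", "rpoB", "narG"},
    }
    core, shared, unique = partition_pangenome(strains)
    print("pan-genome size:", len(core | shared | unique))
    print("core:", sorted(core))
    print("shared accessory:", sorted(shared))
    print("unique:", sorted(unique))
```

Applied to a study such as the C. ulcerans one, the same bookkeeping is what yields a pan-genome whose size equals the core plus the accessory fraction (4120 = 1405 + 2715).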
This approach was used in C. jeikeium by comparing 17 plasmids from different clinical isolates, which identified that plasmid pK43 can act as a natural vehicle for gene transfer conferring antimicrobial resistance between multiresistant strains and possibly between other members of the corynebacteria group, such as C. diphtheriae .
In C. pseudotuberculosis, the pan-genome of 15 strains revealed differences between the biovars of this species: biovar ovis showed clonal behavior, while the equi group showed greater genetic diversity. Recently, a study of strains isolated from equines corroborated the diversity of this biovar and also revealed a wide repertoire of resistance genes and virulence factors, such as beta-lactamases, recombination endonucleases and phage integrase.
In a comparative analysis of Corynebacterium jeikeium, Corynebacterium urealyticum, Corynebacterium kroppenstedtii, Corynebacterium resistens and Corynebacterium variabile, it was possible to identify 83 regulatory genes, including 56 DNA-binding transcriptional regulators and nine sigma factors. Furthermore, 44 regulatory proteins were found in the core genome. These genes shared by the strains are involved in the generation of short-chain volatile acids, which are related to the formation of human body odor, showing the importance of this approach for lipophilic corynebacteria.
Codon usage bias studies can aid in understanding the molecular basis of evolution through parameters such as gene expression, amino acid conservation and codon-anticodon interaction. These factors reveal the type of selective pressure acting on eukaryotic and prokaryotic genes. To understand the molecular evolution of the genus Corynebacterium, comparative analyses of G+C content and codon usage were carried out across different species, revealing evolutionary relationships that showed divergence between the groups of pathogenic and nonpathogenic corynebacteria.
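The two quantities compared in such studies, G+C content and codon usage, are simple to compute from a coding sequence. The sketch below is illustrative only: the sequence is an invented toy example, and GC3 (G+C content at third codon positions) is included as one commonly used summary, not as the specific statistic of the cited work.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_counts(cds: str) -> Counter:
    """Count codons in a coding sequence, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def gc3(cds: str) -> float:
    """G+C fraction at third codon positions, where synonymous changes concentrate."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    third = cds[2:usable:3]
    return (third.count("G") + third.count("C")) / len(third)


if __name__ == "__main__":
    toy_cds = "ATGGCCGCGAAATAA"  # hypothetical five-codon sequence
    print(f"GC content: {gc_content(toy_cds):.3f}")
    print(f"GC3: {gc3(toy_cds):.3f}")
    print(codon_counts(toy_cds).most_common())
```

Comparing such profiles gene by gene and species by species is what allows codon preferences to be related to expression level and lifestyle.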
The genomic approach reveals the DNA sequence of a given organism; however, this knowledge alone does not define how genes respond to external stimuli. For a protein to be synthesized, the DNA must first be transcribed into an RNA molecule, which is later translated into a protein molecule. However, genes are not active in the cell all the time; they are expressed when needed for a cellular biological process. The set of genes expressed in a cell under a certain physiological condition or stage of development at a specific time is called the transcriptome.
Transcriptome studies aim to analyze the collection of all transcripts and provide information about gene regulation. They also allow the functions of uncharacterized genes to be inferred, helping to understand the biology of the organism analyzed. One application of this approach is the use of the data generated to learn more about the host defense response to the survival and proliferation of bacterial pathogens, which enables an understanding of the pathogenesis of infectious diseases.
Owing to the diverse applications of transcriptomics, new technologies and high-throughput methods have been developed for large-scale analysis, such as hybridization-based methods (microarrays) and sequencing-based methods such as RNA sequencing.
Microarray technology is considered a large-scale method because it generates the expression profile of thousands of transcripts simultaneously. Studies with microarray technology have identified clusters of genes that are involved in specific physiological responses, through the variations of environmental conditions faced by microorganisms , such as ammonia limitation. This compound is used as a source of nitrogen that is essential for almost all complex macromolecules in bacteria. A study analyzed the response of C. glutamicum in ammonia-limiting medium, demonstrating that there was alteration in the expression of 285 genes, many of which encode transport proteins and proteins involved in metabolism, nitrogen regulation, energy generation and protein turnover .
Other studies with C. glutamicum evaluated the expression of genes essential to the survival of the bacterium in stressful environments. The transcriptional profile of this species growing on citrate as a source of carbon and energy, compared with glucose, showed that citM and tctCBA, which encode citrate uptake systems, were induced, while the ptsG, ptsS and ptsF genes, which encode the glucose uptake system, were repressed. Additionally, genes encoding tricarboxylic acid cycle enzymes, malic enzyme, PEP carboxykinase, gluconeogenic glyceraldehyde-3-phosphate dehydrogenase and ATP synthase were induced.
The microarray technique provided an advance in the research with important organisms, such as the members of the genus Corynebacterium. Nevertheless, this technique has some limitations, such as high noise interference, inability to detect transcripts with a low number of copies per cell, low coverage of transcripts, and dependence on prior knowledge about the genome for the preparation of the probes, consequently generating little information about the transcript sequence .
As a result of these limitations and the advent of NGS platforms, a promising alternative technique was developed: RNA-Seq. This technique makes it possible to obtain more accurate, fast and reliable analyses from cDNA sequencing. Its advantages are low or absent background interference, detection of small transcripts that would not be detected by other methods, low cost, and reduced time and labor for sample preparation. RNA-Seq is considered an ideal tool for the analysis of complete transcriptomes and is applied to explore expression profiles and to characterize differentially expressed genes. Thus, it represents an important tool to uncover the mechanisms of virulence and pathogenicity in microorganisms [51, 52].
In this context, two studies with C. pseudotuberculosis simulated the stress conditions faced by the bacterium during infection of the host. The first study used strain C. pseudotuberculosis 1002, biovar ovis, which was subjected to three stress conditions: thermal, acidic and osmotic. Most of the identified targets were related to oxidation and reduction, cell division and the cell cycle, and the stimulon of the three stresses included induced genes that participate in mechanisms of virulence, defense against oxidative stress, adhesion and regulation, revealing that they have an important role in the infection process. The other study, with strain 258, biovar equi, used a thermal stress condition similar to that applied to strain 1002. Here, 113 genes were considered induced, among which hspR, grpE, dnaK and clpB were highlighted because of their expression levels and their participation in the adaptation of the pathogen to high temperatures.
Recently, the first RNA-Seq analysis of C. diphtheriae was carried out. It investigated the change in the transcription profile between a wild-type strain and a ΔdtxR mutant and also detected operon structures from the transcriptome data of the wild-type strain. The authors revealed that approximately 15% of the genome was differentially transcribed and that DtxR may also play a role in other regulatory functions, in addition to regulating iron metabolism and diphtheria toxin. Finally, they identified 471 operons subdivided into 167 sub-operon structures.
Among the representatives of the genus, C. glutamicum is one of those whose regulation of gene expression has been most studied. The RNA-Seq approach has elucidated regulatory responses to several industrially relevant parameters, such as the dissolved oxygen (DO) concentration, which is important in industrial microbial processes, providing new information on the relationship between oxygen supply and bacterial metabolism.
Regarding amino acid production, the L-lysine-producing C. glutamicum ATCC21300 showed 543 differentially expressed genes compared with the wild-type C. glutamicum ATCC13032. These included bioA, bioB, bioD, NCgl1883, NCgl1884 and NCgl1885, which are involved in the metabolism or transport of biotin. The bioB gene was overexpressed about 20-fold, and when it was disrupted, lysine production was reduced to approximately 76% and the genes NCgl1883, NCgl1884 and NCgl1885 were repressed. Genes involved in the production of L-valine were also analyzed: 1155 differentially expressed genes were identified, among which ilvBN, ilvC, ilvD and ilvE were overexpressed, resulting in an improvement of the carbon flux used to produce valine. Thus, work involving this approach helps to better understand C. glutamicum for the generation of biotechnological products.
The RNA-Seq technique can also be applied to identify operon structures, although this approach requires a reliable genome annotation and a low proportion of genes of unknown function. Through these data, transcription start sites (TSSs) can be identified and corrected, allowing a more detailed analysis of the promoters and their classification according to their location in relation to the protein-coding sequences (CDSs). For example, see [59, 60].
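Fold-change figures such as the "about 20-fold" reported for bioB are usually derived from normalized read counts and expressed on a log2 scale (a 20-fold increase is a log2 fold change of about 4.3). The sketch below is a deliberately simplified illustration: it uses counts-per-million normalization and a pseudocount, whereas real analyses rely on biological replicates and statistical testing. The read counts, the threshold and the function names are assumptions made for this example.

```python
import math


def cpm(counts):
    """Scale raw read counts to counts per million of the library total."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_changes(treated, control, pseudocount=1.0):
    """log2(treated / control) per gene after CPM normalization.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    t, c = cpm(treated), cpm(control)
    return {
        gene: math.log2((t[gene] + pseudocount) / (c[gene] + pseudocount))
        for gene in t
        if gene in c
    }


def differentially_expressed(lfc, threshold=1.0):
    """Keep genes whose absolute log2 fold change reaches the threshold."""
    return {gene: v for gene, v in lfc.items() if abs(v) >= threshold}


if __name__ == "__main__":
    # Hypothetical read counts for a producer strain and its wild type.
    wild_type = {"bioB": 50, "ilvE": 400, "rpoB": 550}
    producer = {"bioB": 900, "ilvE": 450, "rpoB": 650}
    lfc = log2_fold_changes(producer, wild_type)
    for gene, value in sorted(differentially_expressed(lfc).items()):
        print(f"{gene}: log2FC = {value:.2f}")
```

Normalizing by library size first matters because two samples are rarely sequenced to the same depth; without it, a deeper library would make every gene look induced.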
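One way to picture operon detection from transcriptome data is to join neighboring genes on the same strand whenever reads cover the whole intergenic gap between them. The sketch below is a hypothetical simplification, not the method of the cited studies: it assumes a single per-base coverage list (real data would be strand-specific), 0-based half-open gene coordinates, and an arbitrary minimum coverage threshold.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def call_operons(genes, coverage, min_cov=5):
    """Group adjacent same-strand genes whose intergenic gap is fully covered.

    coverage is a per-base list of read depths. Genes that abut or overlap
    have an empty gap and are joined if they share a strand.
    """
    operons = []
    for gene in sorted(genes, key=lambda g: g.start):
        if operons:
            prev = operons[-1][-1]
            gap = coverage[prev.end:gene.start]
            if gene.strand == prev.strand and all(c >= min_cov for c in gap):
                operons[-1].append(gene)
                continue
        operons.append([gene])
    return operons


if __name__ == "__main__":
    # Hypothetical genes and coverage: a read-free stretch separates g2 and g3.
    genes = [
        Gene("g1", 0, 10, "+"),
        Gene("g2", 15, 25, "+"),
        Gene("g3", 40, 50, "+"),
        Gene("g4", 55, 65, "-"),
    ]
    coverage = [10] * 25 + [0] * 15 + [10] * 30
    for operon in call_operons(genes, coverage):
        print([g.name for g in operon])
```

The dependence on annotation quality mentioned above shows up directly here: the gene coordinates are inputs, so misannotated starts or missing genes change which gaps are tested.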
A central aim of molecular biology is to understand how cells work and interact with each other. These cellular processes occur through the activity of biomolecules that act together through specialized mechanisms. The whole process involves the storage of genetic information in the DNA molecule and the unidirectional flow of this information to RNA and proteins. Proteins make up a large part of the molecular machinery of the cell, and their global analysis provides the information needed to understand how cells work. This analysis is referred to as proteomics.
In 1995, the term “proteome” was defined as the set of proteins produced by a cell or tissue at a given time and under a given condition. As early as 1996, the term “proteomics” appeared to define the large-scale characterization of the entire protein content of a cell line, tissue or organism. The study of the proteome currently refers not only to knowledge of the protein content of a given organism in a given condition, but also includes the quantification, location, modifications, interactions and function of these proteins [64, 65].
This area has three strands: expression, structural and functional. Expression proteomics generally involves studies that investigate the pattern of protein expression in abnormal cells; it encompasses qualitative and quantitative analyses of the expression of total proteins under two different conditions. Structural proteomics analyzes the three-dimensional conformation and structural complexities of functional proteins. This strand makes it possible to identify all the proteins of a complex system and to characterize the possible interactions of these proteins and protein complexes. Functional proteomics reveals the function of proteins based on their interactions with specific protein complexes and on the detailed description of the cell signaling pathways in which they are involved.
The most widely used methods for the identification and quantification of proteins are those based on mass spectrometry (MS). This technique detects compounds by separating their ions according to the mass-to-charge ratio (m/z). As each compound has a unique fragmentation pattern, the samples are ionized and separated, and this pattern is then identified. Two MS-based methods are currently the most commonly used. The first involves two-dimensional electrophoresis (2-DE) followed by staining, spot selection and MS. The other involves isotopic labeling of proteins, separation by multidimensional liquid chromatography and MS analysis [65, 67, 68].
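The mass-to-charge ratio used for this separation follows directly from the neutral mass M of a molecule and its charge state z. For positive-mode ions formed by protonation, a common assumption for peptides, m/z = (M + z × m_proton) / z. The sketch below applies this formula; the example mass is arbitrary.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of a molecule carrying `charge` extra protons (positive mode)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


if __name__ == "__main__":
    # A hypothetical peptide with a neutral monoisotopic mass of 1000.0 Da.
    for z in (1, 2, 3):
        print(f"z = {z}: m/z = {mz(1000.0, z):.4f}")
```

The same molecule therefore appears at several positions in a spectrum, one per charge state, which is why charge assignment precedes mass determination.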
A typical proteomic experiment begins with sample preparation, which consists of separating and isolating the proteins from the cell lysates, followed by separation of the protein mixture so that the individual fractions can be analyzed. The analysis may follow a bottom-up or a top-down strategy. The first involves obtaining peptides by enzymatic digestion of protein solutions and then separating these peptides by liquid chromatography before MS analysis. In contrast, the top-down strategy involves the analysis of intact proteins by MS. For the quantitative determination of proteins, two approaches are most commonly used: two-dimensional electrophoresis followed by staining, spot selection and identification by MS; and isotopic labeling followed by protein separation by multidimensional liquid chromatography and MS [68, 69].
Proteomic studies of the genus Corynebacterium mainly involve C. glutamicum, largely because of its industrial importance in the production of amino acids. This species has been investigated with respect to its genetics and physiology and, consequently, a wealth of information about its molecular biology and biochemistry is available, obtained with a variety of proteomic techniques. Because of its membrane organization, with a high concentration of mycolic acids, C. glutamicum has been used as a model for the development of new proteomic technologies.
Proteomic analyses have also been used as an alternative to traditional molecular methods for the characterization of poorly known bacteria, especially those of clinical interest, because these methodologies provide fast and reliable identification of these species. The matrix-assisted laser desorption/ionization-time of flight mass spectrometry (MALDI-TOF-MS) technique was able to detect strains of Corynebacterium argentoratense, Corynebacterium confusum, Corynebacterium coyleae, Corynebacterium imitans, Corynebacterium kroppenstedtii, Corynebacterium mucifaciens, Corynebacterium riegelii and Corynebacterium ureicelerivorans isolated from different clinical samples, such as blood, wounds and abscesses, as well as samples from the respiratory, genitourinary and digestive tracts, among others. These analyses show a tendency for clinical laboratories to integrate proteomics in order to obtain faster and more sensitive results for the diagnosis of infections caused by rare bacteria.
In pathogenic corynebacterial species, the repertoire of secreted and surface-exposed proteins, the exoproteome, has been documented because of the potential of these proteins to act as antigens and virulence factors: since these molecules are readily exposed to the host cells, they are suitable vaccine and drug targets. A recent study investigated both the surface and the extracellular proteome of two C. ulcerans strains using NanoLC-MS/MS and found similar expression patterns of putative virulence factors.
The mapping of the extracellular proteome of C. diphtheriae through 2-DE and MALDI-TOF-MS detected proteins present in pathogenicity islands. According to these tests, possibly, the exoproteome of this pathogen is constituted of two distinct classes. The first involves molecules that have functions in the cytoplasm related with cell viability, such as protein synthesis and folding and detoxification mechanisms. The second class appears to be actively secreted and includes iron transporters and possible virulence factors that can be used in new vaccines .
The exoproteome of C. pseudotuberculosis has also been extensively characterized in the past years . The use of transposon-binding proteins was investigated through a method of data-independent LC-MS acquisition (LC-MSE), used for proteins identification and quantification that was applied to compare the exoproteome of two biovar ovis C. pseudotuberculosis strains, C231 and 1002, where there were found 44 presents in both isolates in a total of 93 extracellular proteins .
Further, combining different proteomic methodologies, such as 2-DE with MALDI-TOF/TOF, allowed the identification of 11 novel molecules in the C. pseudotuberculosis exoproteome that had not been characterized in the first comparative work. Integration with in silico approaches also gives important insights into the behavior of the exoproteome: pan-genome analysis can be performed to predict the set of exported proteins encoded in the large number of genomes available in public databases.
The proteomic map of a C. jeikeium strain was examined by 2-DE and MALDI-TOF-MS using peptide mass fingerprinting (PMF), a high-throughput protein identification methodology in which a protein is digested with an endoprotease into its constituent peptides, whose measured masses are then compared with theoretical peptide masses predicted from sequence databases. In this investigation, most spots were associated with functions essential for cell viability, such as protein synthesis and energy production, as well as carbohydrate, lipid and nucleotide metabolism. The surface proteins SurA and SurB, the adhesin CbpA and the cholesterol esterase Che, known to act as virulence factors, were also identified in the extracellular proteome.
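The PMF principle can be illustrated with a minimal sketch. It assumes trypsin as the endoprotease (cleavage after K or R, except before P), uses neutral monoisotopic peptide masses, and ignores missed cleavages and modifications. The function names, the example sequence and the mass tolerance are hypothetical and are not taken from the study cited above.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per peptide (free N- and C-termini)


def tryptic_digest(sequence):
    """In silico digestion: cut after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_masses, sequence, tol=0.2):
    """Number of observed masses that fall within tol (Da) of a theoretical peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        1 for m in observed_masses
        if any(abs(m - t) <= tol for t in theoretical)
    )


if __name__ == "__main__":
    candidate = "AKRPGKLL"  # hypothetical sequence, for illustration only
    for pep in tryptic_digest(candidate):
        print(pep, round(peptide_mass(pep), 4))
```

In practice, each database sequence would be scored this way against the mass list of one 2-DE spot and the best-scoring candidates reported. Dedicated search engines use more elaborate probabilistic scoring than the simple match count shown here.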
In addition to these efforts, structural characterization methods for protein elucidation have also been used. The DtxR repressor is activated by transition metal ions and modulates tox gene expression in C. diphtheriae. Through X-ray crystallography, it was possible to determine the general architecture of this biologically active Ni(II)-bound protein at a resolution of 2.4 Å. In C. pseudotuberculosis, the ArgR protein, a regulator of arginine biosynthesis, which is an important metabolic pathway for bacteria, had the crystal structure of its C-terminal domain determined by X-ray diffraction at a resolution of 1.9 Å. The interest in this molecule lies in the fact that it participates in a pathway that is absent in its hosts, which makes it a potential target for the design of new drugs.
One of the greatest challenges of the postgenomic era is handling the amount of data generated by the different approaches, as well as the functional characterization of proteins. In this context, the analysis of protein-protein interaction (PPI) networks has been used to identify essential proteins and discover new therapeutic targets. This computational method uses the topology of known interaction patterns in biological data to predict new interactions between molecules; in the resulting network, nodes represent proteins and edges represent the predicted interactions.
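The node-and-edge representation can be made concrete with a short sketch. It stores a PPI network as an adjacency map and ranks proteins by degree (number of interaction partners). Using degree as a rough proxy for essentiality is an assumption made here for illustration, not the method of the studies discussed, and the protein labels are hypothetical placeholders.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected PPI network: protein -> set of interaction partners."""
    adjacency = defaultdict(set)
    for a, b in interactions:
        if a != b:  # ignore self-interactions in this simple sketch
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def rank_by_degree(adjacency):
    """Proteins sorted by decreasing degree; ties are broken alphabetically."""
    return sorted(adjacency, key=lambda p: (-len(adjacency[p]), p))


if __name__ == "__main__":
    # Hypothetical predicted interactions (edges) between placeholder proteins (nodes).
    predicted = [
        ("protA", "protB"),
        ("protA", "protC"),
        ("protA", "protD"),
        ("protB", "protC"),
    ]
    network = build_network(predicted)
    for protein in rank_by_degree(network):
        print(protein, len(network[protein]))
```

Highly connected nodes (hubs) are often examined first as candidate essential proteins, although other topological measures, such as betweenness centrality, can be used as well.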
The interspecific PPI networks of C. pseudotuberculosis were constructed from proteins conserved in multiple pathogens, such as M. tuberculosis, Y. pestis, E. coli, C. diphtheriae and C. ulcerans, and the interaction network of the protein acetate kinase (Ack) pointed to it as a possible new broad-spectrum therapeutic target. A later study of the C. pseudotuberculosis interactome constructed networks that revealed proteins with no homologs in humans, cattle, goats, sheep and horses. Because the proteins predicted from the PPI results are essential to the pathogen but not to the hosts, they are important candidates as targets for the synthesis of new drugs.
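The selection logic described above, keeping proteins that are essential to the pathogen but have no homolog in any host, amounts to a set difference. The sketch below assumes that essentiality predictions and host homology searches have already been run elsewhere. All identifiers and the input layout are hypothetical.

```python
def candidate_targets(essential_proteins, host_homologs):
    """Return essential pathogen proteins with no homolog in any listed host.

    essential_proteins: set of pathogen protein IDs predicted to be essential.
    host_homologs: dict mapping each host name to the set of pathogen protein
                   IDs that have a detectable homolog in that host.
    """
    # Union of all pathogen proteins that resemble at least one host protein.
    with_host_homolog = set().union(*host_homologs.values())
    return sorted(essential_proteins - with_host_homolog)


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    essential = {"prot01", "prot02", "prot03", "prot04"}
    homologs = {
        "human": {"prot01"},
        "cattle": {"prot01", "prot03"},
        "goat": set(),
        "sheep": {"prot03"},
        "horse": set(),
    }
    print(candidate_targets(essential, homologs))
```

Requiring the absence of homologs in every host, rather than in a single one, is what makes the surviving candidates attractive for a pathogen that infects several animal species.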
Corynebacterium comprises several Gram-positive species known mainly for their pathogenic and biotechnological potential. With the advent of NGS platforms, several strains of the genus have had their genomes sequenced in recent years, providing significant advances in the understanding of pathogenic mechanisms, metabolism, regulation, adaptation and evolution, among other aspects of the behavior of these bacteria. Genome projects have made it possible to better understand the molecular functions and biological processes of several genes, to determine the genomic architecture of different isolates and to compare them at the DNA level, which makes these studies essential for the execution of more complex approaches. Transcriptomics, for example, has been employed in a wide variety of studies to understand how the expression of genes of interest is modulated in response to different stimuli. Proteomic analyses, in turn, provide more complete and advanced knowledge of biological systems. Hence, the integration of these fields in the genomic era has provided valuable insights toward a deeper understanding of various corynebacteria.
We thank the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—CAPES (88881.068052/2014-2101) for the financial support on this work.
Conflict of interest
The authors declare the absence of any conflict of interest.