Websites covering TB facts, information, and treatment research.
Since the first edition of this book in 2013, many new tools and databases have become publicly available, and several have been discontinued. Here, we present an updated version of web resources on tuberculosis, providing more detailed information on some key concepts. However, the purpose of this chapter is by no means to offer an exhaustive list of all the resources available on the Internet about TB, the topic of this book. This would be a massive and perhaps futile task, since the Internet evolves at a very fast pace. Rather, this chapter concentrates on a selection of the most important and stable websites relevant to several aspects of TB, such as research, treatment, main institutions, funding, and specialized platforms. We think this should complement all the other information already presented in this book, offering the reader a more integrated view of the disease, as well as access to new platforms and systems specialized in the analysis of data generated by new technologies such as DNA sequencing.
The Internet has quickly become the universal exponent of the digital world as we know it. Today, billions of people around the world use the Internet as their primary source of information and benefit from electronic mail systems, information distribution, file sharing, multimedia streaming services, and online social networking, to cite just a few examples. With the popularization of smartphones, access to the Internet has become even more frequent, and many people remain connected all the time. This medium is an unprecedented form of communication in the history of humanity, capable of bringing information from virtually anywhere almost instantaneously.
Scientists have been pioneers in benefiting from global interaction and globalized information. Nowadays, we deal with algorithms, data mining, terabytes and petabytes of data, teraFLOPS and petaFLOPS to measure computer performance, distributed computing, cloud computing, and terms and aspects of what we call big data, referring to data sets with huge sizes, exceeding the capability of commonly used software tools to manage and process these data within a feasible time frame. This kind of data stems from a variety of new (as well as old) high-throughput technologies, employed by researchers in different fields, such as astronomy and (recently) biology, as well as from information technology companies, such as Amazon, Facebook, and Google, among many other examples. For instance, the Large Hadron Collider, a particle accelerator at CERN, Switzerland, generates data on the order of terabytes per second.
However, the global connection is not restricted to the virtual world: with the ease of intercontinental travel, infectious agents can traverse the whole world in a single day. Tuberculosis (TB) poses the highest risk to people who maintain prolonged and repeated contact with an infected host. Therefore, the risk of infection when traveling is reduced, but some conditions act as risk factors. Immunocompromised travelers (whether by disease or by medication), smokers, children under 5 years old, and healthcare workers are particularly vulnerable to TB. Before traveling, the individual should consult a specialist to assess the risks of contracting TB in the area of stay, checking the incidence of TB and, especially, of resistant strains. Prophylactic measures should be taken when necessary, and medical follow-up should be provided when returning from the trip. These procedures are critical to containing the global circulation of TB strains.
TB is a pandemic disease, with an estimated one-third of the world’s population infected by the bacillus Mycobacterium tuberculosis (Mtb). Although TB is treatable, patients without proper follow-up usually abandon therapy as soon as they feel better. This and the indiscriminate use of antibiotics are causing the emergence of new drug-resistant strains, with epidemic or outbreak patterns resembling the dissemination of a new piece of information throughout the Internet.
On the other hand, there has also been a revolution in other areas. High-performance technologies, such as genomics, transcriptomics, proteomics, and metabolomics, for instance, offer a new and more integrated view of the genetics and metabolism of targeted organisms. Currently, thousands of mycobacterial genomes are already sequenced or are in progress. Hence, for example, comparing genomes of virulent and non-virulent strains of Mtb, scientists can pinpoint genes and/or polymorphisms possibly involved in pathogenesis; similarly, analyzing transcriptome data, researchers might gain an idea of the effects caused by a given drug in the bacillus’ metabolism.
Since the Internet content is continually updated and changed, providing an exhaustive list of all the resources available online concerning TB would be a cumbersome and perhaps futile work. Hence, in this chapter, we present a selection of the most relevant and stable websites exploring several aspects of TB, such as treatment, research, and funding, or providing access to analytical tools, databases, or specialized platforms for mycobacterial research. We hope this chapter can offer the reader an overview of online, publicly available, computational resources that could help us to fight TB.
2. Bioinformatics and omics
Bioinformatics is an interdisciplinary field that involves distinct areas, such as biology, computer science, mathematics, and statistics, comprising an extensive list of activities, such as research, development, or application of computational techniques or tools to acquire, store, organize, analyze, visualize, and integrate biological, medical, behavioral, or health data and information. Bioinformatics addresses scientific questions posed by biological phenomena by applying several analytical approaches, such as image processing, computational simulations, network analysis, and data mining, among others, to perform comparative genomics studies, gene expression analysis, structural protein analysis, phylogenetics, and metabolic network analysis, just to cite a few examples.
However, bioinformatics preceded the term, as its origins can be traced back to the early 1960s, when computers became essential tools in the field of molecular biology, as well as in all other research areas. At that time, Margaret Oakley Dayhoff, a biochemist with a PhD degree in quantum chemistry, pioneered the development of computer methods for comparing protein sequences and for inferring evolutionary histories based on protein sequence alignments. Among many other significant scientific contributions made during her career, she cataloged all known protein sequences and made them available to the scientific community, publishing in 1965 the Atlas of Protein Sequence and Structure, containing sequence information on 65 proteins, considered the first molecular biology database.
Indeed, toward the end of the 1960s, several algorithms and computer programs were already available for analyzing the structure, function, and evolution of nucleotide and protein sequences, as well as rudimentary protein databases [3, 5]. In the following decades, new computational methods and approaches were introduced, such as algorithms for sequence alignments, public domain databases, efficient data search and retrieval systems, sophisticated methods for protein structure prediction, tools to automatically annotate genes and genomes, and systems for functional genome analysis, among others.
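To make the idea of a sequence alignment algorithm concrete, the following minimal sketch computes a global alignment score by dynamic programming (the Needleman-Wunsch approach, a standard textbook method that is not specifically named in this chapter). The function name and the scoring values (match, mismatch, and gap) are illustrative assumptions, not parameters taken from any tool cited here.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Score of the best global alignment of a and b (linear gap penalty).

    The scoring values are illustrative assumptions.
    """
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    # Initially the prefix of a is empty, so b[:j] is aligned against j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] against an empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Three choices: pair a[i-1] with b[j-1], or open a gap in b or in a.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


# Toy sequences, chosen only for illustration.
print(global_alignment_score("ACGT", "AGT"))
```

Only the score is returned here; recovering the alignment itself would require keeping the full matrix and tracing back through it, which real alignment tools do with considerably more elaborate scoring schemes.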
The cost of DNA and RNA sequencing has been gradually decreasing over the years, and the improvements achieved with new sequencing technologies allow the whole genome or transcriptome of an organism to be sequenced in a few hours, creating new avenues for biological research. However, genome sequencing generates a massive amount of data, on the order of hundreds or even thousands of gigabytes. In the same way, but with higher cost and some technological limitations, mass spectrometry and nuclear magnetic resonance spectroscopy are also capable of identifying thousands of proteins and metabolites of an organism, respectively. In this context, new “omics” (an informal term that collectively refers to research fields of biology ending in -omics, such as genomics, proteomics, or metabolomics) approaches have arisen, involving not only the production of sequence data but also the analysis, treatment, and interpretation of biological data on a large scale. The following sections briefly present some of the main “omics” approaches, exemplifying their application in TB research.
Genomics is a field of biology that seeks to understand the structure, functions, and evolution of genes and genomes by analyzing large-scale genomic data obtained with high-throughput sequencing technologies. For instance, whole genome sequencing (WGS) and comparative analyses of genomic segments (chromosomes or syntenic regions) or the entire genome content (protein-coding genes, RNA-coding genes, pseudogenes, non-translated regions, etc.) allow the investigation of complex biological phenomena, giving insights into genome evolution and adaptation over time or even helping to improve clinical diagnoses. In the last two decades, WGS has become routine among researchers, and currently there are thousands of completely sequenced genomes available in public databases. In fact, the availability and accessibility of genomic sequences are increasing rapidly, with results generated by teams of researchers in collaborative networks or even in independent laboratories. These sequences support many essential analyses in the field, such as comparing biological sequences and searching for similarities between them, which enables the inference of functional and evolutionary relationships between genes, gene families, and genomes [8, 9].
For instance, comparative genomics, together with the combination of archeological data and DNA sequencing, has already established a plausible evolutionary scenario for the origin of the principal etiological agent of TB, the pathogen Mtb. Although TB is a millennia-old disease, drugs for its treatment appeared less than 100 years ago. Since then, multidrug-resistant (MDR) strains have emerged, as well as extensively drug-resistant (XDR) strains, and more recently, several countries have reported cases of totally drug-resistant (TDR) strains, frustrating the efforts to fight TB. A typical feature of bacterial agents that develop drug resistance is the occurrence of lateral gene transfer (LGT), but comparative genomic studies did not find evidence of LGT in Mycobacterium tuberculosis complex (MTBC) organisms. Thus, Mtb is a pathogen adapted for living in human host cells, able to remain for a long time as a latent disease, surviving in the host and adapting to yield a persistent infection, often unresponsive to treatment. The primary challenge is then to develop an effective vaccine against most Mtb strains, which would be possible by targeting conserved elements.
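As a hedged illustration of how comparing genomes can pinpoint polymorphisms, the sketch below lists single-base substitutions between two sequences that are assumed to be already aligned to the same length. The function name `find_snps` and both toy sequences are hypothetical and do not represent real Mtb data; actual comparative genomics pipelines work from read mappings or whole-genome alignments and apply quality filters that are omitted here.

```python
from typing import List, Tuple


def find_snps(reference: str, query: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, query base) per substitution.

    Both inputs are assumed to be pre-aligned, with '-' marking gaps.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    pairs = zip(reference.upper(), query.upper())
    for pos, (ref_base, qry_base) in enumerate(pairs, start=1):
        # Count only A/C/G/T substitutions; gaps and ambiguous bases are skipped.
        if ref_base != qry_base and ref_base in "ACGT" and qry_base in "ACGT":
            snps.append((pos, ref_base, qry_base))
    return snps


# Made-up aligned fragments standing in for two strains.
strain_a = "ATGGCCGTTA-GC"
strain_b = "ATGACCGTTATGC"
print(find_snps(strain_a, strain_b))
```

Positions are reported relative to the alignment columns, so a gap in either sequence (as at column 11 in the toy input) is skipped rather than reported as a substitution.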
Transcriptomics is the analysis of gene expression through the sequencing of RNA on a large scale (RNA-Seq). The transcriptome, the whole set of RNA transcripts of a given organism, organ, tissue, or cell lineage, contains different types of RNAs. Thus, transcriptomics provides essential information on the biological sample under analysis, allowing both quantitative and qualitative approaches to gene expression and providing a profile of all coding and noncoding transcripts under specific conditions. RNA-Seq technologies have been improved gradually, including novel techniques such as single-cell RNA-Seq, in which individual cells of interest obtained from culture, tissue, or dissociated cell suspensions are isolated, their RNA is converted into cDNA, and the resulting cDNA libraries are sequenced. However, transcriptomics still faces many challenges. RNA-Seq produces very short reads and presents a high error rate, and massively parallel sequencing yields a tremendous amount of data that requires significant computational resources, as well as specific algorithms and software, to analyze [11, 12, 13].
Mtb has an extensively resistant cell wall, can opportunistically switch to latency, and has many strategies to fool the host’s immune system, compromising the effectiveness of the available therapeutic approaches. The analysis of Mtb transcriptome signatures during infection, obtained from genome-wide expression profiles, for instance, revealed the expression of numerous genes used to evade host immune responses, to adapt to the intracellular lifestyle, and to respond to various antibiotic drugs.
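The quantitative side of transcriptomics can be illustrated with a small normalization sketch. It computes transcripts per million (TPM), one commonly used way to make read counts comparable across genes of different lengths; TPM is not described in this chapter and is used here only as an example. The function name, gene names, counts, and lengths are all hypothetical.

```python
from typing import Dict


def tpm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Convert raw read counts to transcripts per million (TPM).

    counts maps gene -> mapped reads; lengths_bp maps gene -> length in bp.
    """
    # Step 1: reads per kilobase corrects for gene length.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    total = sum(rpk.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    # Step 2: rescale so that the values of one sample sum to one million.
    return {gene: value / total * 1_000_000 for gene, value in rpk.items()}


# Hypothetical genes with equal counts but different lengths.
example = tpm({"geneA": 100, "geneB": 100}, {"geneA": 1000, "geneB": 2000})
print(example)
```

In this toy input both genes receive the same number of reads, but the shorter gene ends up with twice the TPM of the longer one, which is the length correction the normalization is meant to provide.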
Proteomics establishes a global analysis of a cell’s proteins. Gathering information about the proteome and comparing it with genome and transcriptome data is the way to understand the functioning of the cell. The highly diverse physicochemical properties of amino acids, protein modifications and degradation, and the interconnectivity of proteins in complexes are examples of the difficulties that make data collection harder than in the other “omics” sciences. Nowadays, mass spectrometry is regarded as the state of the art in proteomics and has been improved with the development of instruments, sample preparation, and new analytical software. Mass spectrometry is capable of characterizing an almost complete proteome, revealing the profile of the expressed proteins at the cellular and subcellular levels and providing knowledge of the functional status of a cell in response to environmental stimuli [15, 16].
Among all Mtb strains sequenced so far, with different genotypes and phenotypes, only a few have a complete identification of protein-coding regions, and the functions of many proteins that play roles in the physiology of mycobacteria remain unknown. This information would be essential to understand, for example, the mechanisms of mycobacterial pathogenicity and drug resistance. Even so, proteomics has already revealed some aspects of virulence and its mechanisms of action in Mtb strains. The next step for the contribution of proteomics to mycobacterial research, the analysis of the immune response of the host, is one of the ways to establish new treatment programs, especially in the current scenario of drug resistance.
Metabolomics encompasses the study of metabolism within a living cell; that is, it deals with the identification and analysis of the products of biochemical reactions and their processes within the cell. The metabolome is the set of these products (metabolites), and the analysis of metabolism can reveal the cell’s organic response. Metabolic analysis is a valuable resource for the identification of specific metabolite biomarkers, which would help, for example, in evaluating the response to drugs or stress agents. Nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS) are the appropriate technologies for accurate measurements of metabolites, providing a good phenotypic representation of any cell [18, 19].
Being complementary to the other “omics,” metabolomics fills some gaps toward a better understanding of diseases caused by mycobacteria. It is known that mycobacterial exosomes can be used as biomarkers, since they come from infected cells and contain mycobacterial proteins, lipoarabinomannan, and metabolites. They have been used for TB diagnosis and in the research of new vaccines. It is important to emphasize that the signatures of TB biomarkers must be validated in their geographical and ethnic context, given the worldwide and diversified nature of the strains, and taking into account concomitant infections such as malaria and HIV, as they affect metabolic biosignatures [19, 20].
2.3. One example of integration of omics
An effective vaccine for all strains of TB and the development of therapeutic approaches appropriate to the variations and stages of the disease are the main goals of current research in TB. The genomic expression catalog of a global collection of BCG vaccine strains, obtained by comparing the genomes and transcriptomes of 14 of the most widely used BCG strains, is one example of interaction between “omics” sciences in TB research, providing evidence for highly diverged metabolic and cell-wall adaptations. Moreover, quantitative proteomics has identified the major differences in protein expression, and the changes observed in the proteome confirmed those observed in the transcriptome, showing how adaptation to the environment causes phenotypic differences between BCG strains.
3. Tuberculosis facts, information, and treatment research
This section presents websites covering diverse information about TB, including history, pathogenesis, transmission, epidemiology, diagnosis, treatment, and infection control, listed in alphabetical order (Table 1). They also provide information about courses, tokens, and links to other websites and general guidelines. We do not prioritize any of them since every effort made to combat this disease has become important because of its comprehensiveness.
|Region|Institution|
|---|---|
|Americas|American Lung Association (ALA) Lung Disease Programs|
||American Public Health Association|
||Bill & Melinda Gates Foundation|
||Centers for Disease Control and Prevention, Division of Tuberculosis Elimination (CDC-DTBE)|
||Food and Drug Administration (FDA)|
||Global Tuberculosis Institute|
||Institute for Tuberculosis Research|
||National Institute of Allergy and Infectious Diseases (NIAID)|
||National Library of Medicine, PubMed|
||Pan American Health Organization (PAHO)|
|Africa|Desmond Tutu TB Centre|
||South African Tuberculosis Vaccine Initiative|
|Asia and Oceania|JATA, Research Institute of Tuberculosis|
||National Institute for Research in Tuberculosis|
||Pakistan Anti-TB Association|
|Europe|European Tuberculosis Surveillance Network|
||International Union Against Tuberculosis and Lung Disease (UNION)|
||Max Planck Institute for Infection Biology|
||National Health Service in England|
|Global|World Health Organization (WHO)|
4. Tuberculosis databases and computational tools
Bioinformatics had its origins in the 1960s, when computers became essential tools in research. Since then, numerous computational resources have been created, providing the scientific community with different analytical tools to interpret a range of biological data. Collectively, these online resources are publicly available and are dedicated to acquiring, storing, organizing, analyzing, visualizing, and/or integrating the ever-increasing amount of biological data originating from scientific experiments, scientific literature, high-throughput technologies, and computational analyses.
In this section, we provide a selection of online publicly available resources (entirely or partially) dedicated to the mycobacteria causing tuberculosis, categorized according to their purpose and functionality (Table 2). Each category is briefly reviewed, presenting the reference to the original paper describing each computational tool, as well as its electronic address.
|Category|Description|Resource|Reference|
|---|---|---|---|
|Generic and multifunctional resources|Resources for functional and evolutionary genomic study of the genus Mycobacterium, comprising extensive literature review and data annotation on mycobacterial genome polymorphism, virulence factors, in silico generated and manually reviewed information on the complete genome sequence of these organisms, and essential genes|MyBASE||
|||The MycoBrowser portal||
|Genomic mapping and functional annotation|Systems supporting functional annotation, including protein analysis, subcellular localization prediction, and mycobacterial membrane protein identification and characterization|MycoMemSVM||
|||Genolist (TubercuList, BoviList, BCGList)|[26, 27]|
|Comparative genomics|Collection of databases dedicated to mycobacterial comparative genomics, providing precomputed data of comparative genome analyses among selected mycobacterial genera, as well as inferred orthologous groups, functional annotations, and protein features|GenoMycDB||
|||Mycobacterium tuberculosis Comparative Database||
|Genetic diversity and epidemiology|Focused on genetic diversity and epidemiology of MTBC, providing information of epidemiological data, strain lineage, genotyping, and phylogeny. They offer analyses of mycobacterial interspersed repetitive units (MIRU), single nucleotide polymorphism (SNP), long sequence polymorphism (LSP), spoligotyping patterns, IS6110-based restriction fragment length polymorphism (RFLP), and regions of difference (RD) profiles|CASTB||
|Gene expression and regulation|Provide data on mycobacterial transcription factors, predicted operons, predicted transcriptional units, gene expression, and regulatory networks|CMRegNet||
|Structural biology|These tools use three-dimensional models of mycobacterial proteins to provide information of domain assignments, functional annotation, protein-protein or protein-small molecule interactions, and structural analyses of mutations potentially associated with drug resistance|CHOPIN||
|Drug targets and resistance|Tools for diagnosis of drug resistance in tuberculosis, new vaccines, and drug targets|AuTuMN TB-modeling||
|||TB Drug Resistance Mutation Database||
|||TDR Targets database||
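The genetic diversity and epidemiology resources listed in Table 2 analyze spoligotyping patterns, among other markers. As a hedged illustration of how such a pattern is commonly encoded, the sketch below converts a 43-spacer presence/absence string into the 15-digit octal code widely used to report spoligotypes: the first 42 spacers are read as 14 triplets, each becoming one octal digit, and the 43rd spacer is appended as 1 or 0. The function name and the input pattern are illustrative assumptions and are not taken from any of the listed tools.

```python
def spoligotype_to_octal(binary: str) -> str:
    """Convert a 43-character spoligotype ('1' present, '0' absent) to octal."""
    if len(binary) != 43 or set(binary) - {"0", "1"}:
        raise ValueError("expected 43 characters, each '0' or '1'")
    # Spacers 1-42 form 14 triplets; each triplet maps to one octal digit.
    triplets = [binary[i:i + 3] for i in range(0, 42, 3)]
    octal_digits = "".join(str(int(triplet, 2)) for triplet in triplets)
    # Spacer 43 has no partners, so it is appended unchanged.
    return octal_digits + binary[42]


# Illustrative pattern: spacers 1-34 absent, spacers 35-43 present.
pattern = "0" * 34 + "1" * 9
print(spoligotype_to_octal(pattern))  # 000000000003771
```

The octal form is only a compact notation; it carries exactly the same information as the binary string and can be expanded back to it digit by digit.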