Open access peer-reviewed chapter

Bioinformatics: Basics, Development, and Future

By Ibrokhim Y. Abdurakhmonov

Submitted: November 13th 2015; Reviewed: April 20th 2016; Published: July 27th 2016

DOI: 10.5772/63817

Abstract

Bioinformatics is an interdisciplinary scientific field of the life sciences. Bioinformatics research and applications include the analysis of molecular sequence and genomics data; genome annotation, gene/protein prediction, and expression profiling; molecular folding, modeling, and design; building biological networks; development of databases and data management systems; development of software and analysis tools; bioinformatics services and workflows; mining of biomedical literature and text; and bioinformatics education and training. The astronomical accumulation of genomics, proteomics, and metabolomics data, together with the need for their storage, analysis, annotation, organization, systematization, and integration into biological networks and database systems, was the main driving force for the emergence and development of bioinformatics. The critical current needs highlighted in this chapter, among others, are to understand the basics and specifics of bioinformatics and to prepare a new generation of scientists and specialists with integrated, interdisciplinary, and multilingual knowledge who can use modern bioinformatics resources powered by sophisticated operating systems, software, and database/networking technologies. In this introductory chapter, I aim to give readers an overall picture of the basics and development of the bioinformatics field, with some future perspectives, highlighting the chapters published in this book.

Keywords

  • bioinformatics
  • databases
  • molecular sequence analysis
  • software and analysis tools
  • bioinformatics training

1. Introduction

Biological data can be described as molecular sequence information and experimentally derived (“wet-bench”) content of genome and gene product analyses [1]. As an interdisciplinary branch of the life sciences, bioinformatics aims to develop methodology and analysis tools to explore large volumes of biological data, helping to store, organize, systematize, annotate, visualize, query, mine, understand, and interpret them. It draws on conventional and modern computer science, cloud computing, statistics, and mathematics, as well as pattern recognition, reconstruction, machine learning, simulation and iterative approaches, and molecular modeling/folding algorithms [1, 2]. The emergence and advance of the field, however, are tightly associated with the computer programming and software development needed for handling, and for structural and functional analysis of, large volumes of molecular sequences of DNA, RNA, proteins, and metabolites.

Presently, although still at the core of genomics and genetics, bioinformatics has become an umbrella for a wider range of biological studies that analyze various types of biological data and that structure, systematize, annotate, query, mine, and visualize the available biological information and a variety of biomedical text records [1–3]. Drawing a fine line between bioinformatics and some related fields is difficult because computers, statistics, and mathematics are increasingly applied to scientific problem solving and experiments in the life sciences; nevertheless, there should be no misperception about the description and objectives of bioinformatics. Bioinformatics should not be confused with, for example, biometry and biostatistics, the development of DNA computers, or the computerized generation and filing of imaging data.

Bioinformatics should also be differentiated from related scientific fields such as biological computation and computational biology [1, 2]. Biological computation aims to develop biological computers using advances in bioengineering, cybernetics, robotics, and molecular cell biology. In contrast, bioinformatics develops and utilizes computational algorithms to understand and interpret biological processes based on genome-derived molecular sequences and their interactions [2]. In many respects, therefore, the objectives of bioinformatics resemble those of computational biology. Computational biology concentrates on building and developing theoretical models for biological analyses [1, 2], whereas bioinformatics focuses on providing practical tools to organize and analyze basic genomic, proteomic, and other “omics” data, including sequence analysis and its visualization [1, 2]. Admittedly, both computational biology and bioinformatics use genome data and tools such as multiple sequence alignment and genome assembly. This blurs the boundary between the two fields if their theoretical and practical emphases are forgotten [2]. Thus, as mentioned above, the common core aims of bioinformatics are to handle, analyze, and interpret genome-derived molecular sequence data and its organizational principles across a broad spectrum of comparative, simulative, and evolutionary/phylogenetic perspectives. Bioinformatics tools are applicable to, and widely used in, studies related to genetics, genomics, biochemistry, physiology, biophysics, all agricultural, medical, and environmental sciences, as well as evolution, system biology, and artificial intelligence [1–10].

For instance, bioinformatics tools such as comparative analysis of genomic and genetic data and/or signal processing help to interpret and understand molecular and evolutionary processes [9] and interactions from the large volumes of raw data generated by wet-bench experimental molecular biology [1, 2]. In the “omics” fields, bioinformatics helps to sequence and annotate genomes and to identify distinct patterns, mutation profiles, genetic epistasis, gene/protein expression and regulation, and gene ontologies [1, 2, 4, 8–11], and it is instrumental in mining and querying biological data and biomedical literature text [3, 4, 7]. When applied to system biology [2, 6], bioinformatics is a key instrument for analyzing and cataloguing biochemical/genetic pathways and networks, which helps to integrate pieces of analyzed information into a model of the full picture of life processes. Reconstruction, pattern recognition, folding, simulation, and molecular modeling with bioinformatic tools can identify structural peculiarities and interactions of molecular sequences that are important for structural biology and medicinal drug design [12, 13]. None of these large-scale, genome-derived molecular sequence “Big Data” can be analyzed manually [1, 2]. This prompted the biological research community to apply interdisciplinary methods and tools for “Big Data” analysis in combination with modern computing knowledge, which resulted in the emergence of bioinformatics as a novel interdisciplinary science. Let us first take a look at the historical developments in the bioinformatics field.

1.1. History of emergence and development

The term “bioinformatics” was coined by Paulien Hogeweg and Ben Hesper in 1970 [2, 14]. Its meaning was very different from the current description: it referred to the study of information processes in biotic systems, as a field parallel to biochemistry and biophysics [14–16]. However, the emergence of bioinformatics traces back to the 1960s. It appeared in step with the development of protein sequencing methods and the growing availability of protein sequences from a variety of organisms after Frederick Sanger determined the sequence of insulin in the early 1950s [17, 18]. New computer methods were needed to analyze and compare large numbers of protein sequences from different organisms because handling many amino acid sequences manually was impractical. This led to the compilation of the first “Protein Information Resource” (PIR) [1, 19, 20] by Margaret Oakley Dayhoff and her collaborators at the National Biomedical Research Foundation [1]. Dayhoff's team successfully organized the protein sequences into distinct groups and subgroups based on sequence similarity and percent accepted mutation (PAM) matrices [1]. This work was published as a protein sequence atlas [21, 22] that has been widely used in performing protein sequence alignments and database similarity searches [1, 2, 23], and it pioneered methods of protein sequence alignment and molecular evolution [22]. In the 1970s, Elvin A. Kabat further contributed to bioinformatics through his extensive analysis of antibody sequences, released as comprehensive volumes in collaboration with Tai Te Wu between 1980 and 1991 [2, 24].

With the objective of providing a theoretical background for immunology experiments, George Bell and colleagues initiated the collection of DNA sequences into GenBank in 1974 [1]. During 1982–1992, the first version of GenBank was prepared by Walter Goad's group [1], and these efforts resulted in the development of the presently known and widely used DNA sequence databases GenBank [25], the European Molecular Biology Laboratory (EMBL) database [26], and the DNA Data Bank of Japan (DDBJ) [27] in 1979, 1980, and 1984, respectively [1]. The most important development in DNA sequence databases, however, was the incorporation of web-based search algorithms that allow researchers to find and compare target DNA sequences. The first such software, “GENEINFO,” and its derivative, “Entrez,” were developed by David Benson, David Lipman, and colleagues [1]. This software allowed researchers to rapidly search database-indexed sequences and match them with a query sequence, and it became readily available through the web-based interface of the National Center for Biotechnology Information (NCBI) database [28]. Molecular sequence analysis, comparison, and visualization methods have since improved, and many different methodologies have contributed to bioinformatics advances in this direction. Examples include dot matrix and diagram methods [29], alignment of sequences by dynamic programming [30], finding local alignments between sequences [31], multiple sequence alignment tools [32–35], prediction of RNA secondary structures [36, 37], determination of evolutionary relationships among sequences [38, 39], and assignment of gene function based on sequence similarity to models of known function [40]. The development of FASTA [41, 42], BLAST [43, 44], and their various modifications [45–47] further empowered the bioinformatics field and greatly improved biological data analysis.
The development of tools for predicting putative protein sequences and the structure and function of proteins/genes from DNA sequences [48–58], the completion of full genome sequences, and the building of web-based genome databases for many prokaryotic and eukaryotic organisms [58] have greatly advanced the bioinformatics field. In addition, rapid genome-wide gene expression profiling and analysis [59–62], biological pathway assignment and identification, and data storage, mining, and querying for large biological datasets [63–73] have given bioinformatics unprecedented popularity in world science, as briefly reviewed below.
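
To make the idea of sequence alignment by dynamic programming more concrete, the following is a minimal sketch of global alignment scoring in Python. The scoring values (match = 1, mismatch = -1, gap = -2) and the function name are illustrative assumptions and are not taken from the cited works; production tools use substitution matrices such as PAM and more elaborate gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b.

    Illustrative scoring scheme only: the default values are assumptions.
    """
    # Row 0: aligning an empty prefix of `a` with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] with an empty prefix of `b` costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Either align a[i-1] with b[j-1], or open a gap in one sequence.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


score = global_alignment_score("ACGT", "AGT")
```

Only two rows of the scoring table are kept in memory, which is enough to obtain the score; recovering the alignment itself would require storing the full table and tracing back through it.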

Figure 1.

Dynamics of bioinformatics-related publications over the past four decades. (A) Unquoted and (B) quoted keyword retrieved scientific publications from PubMed [74].

Since its emergence as an interdisciplinary scientific field in 1970, bioinformatics research has grown continuously over the past four decades. An unquoted search for the keyword bioinformatics in the PubMed database [74] found nearly 181,000 scientific publications covering the period from 1958 to March 2016. Repeating the search with the quoted keyword found 62,402 scientific publications over the four-decade period, showing that publication activity began to increase at the end of the 1990s, with a first rise in 2000/2001 followed by significant peaks in 2003/2004 and after 2013 (Figure 1). In this introductory chapter, I aim to briefly highlight these four decades of development while introducing the chapters presented in this book.

2. Bioinformatics help in handling and analysis of the genomics data, genome annotation, and expression profiling

Rapid and reliable sequence determination of DNA molecules, made possible by the sequencing techniques of Sanger and Coulson [75] and Maxam and Gilbert [76], provided large-scale DNA sequence data that needed to be analyzed computationally. This prompted the development of efficient bioinformatics methodologies. For example, the seminal sequencing of the phage Φ-X174 [2, 77] and Haemophilus influenzae [2, 78] genomes using shotgun sequencing techniques generated the sequences of many thousands of small DNA fragments, ranging from 35 to 900 nucleotides [2], which then had to be assembled into a complete genome. The ends of sequenced shotgun clones overlap and can be assembled into the complete genome using computerized similarity search algorithms, although assembly is challenging because it requires powerful computers with sufficient memory and can leave multiple gaps in the assembled genome. Genome assembly algorithms are a critical area of bioinformatics research because fragment-based sequencing has been the core approach for virtually all genomes sequenced to date [1, 2].
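
The overlap-based assembly idea described above can be illustrated with a small greedy sketch in Python. This is a toy under stated assumptions (error-free reads, a single strand, exact suffix-prefix overlaps, no fragment contained inside another), and the function names and the `min_len` threshold are hypothetical; real assemblers handle sequencing errors, repeats, and both strands.

```python
def overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the result stays split into contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Made-up reads for illustration only.
contigs = greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"])
```

When no pair of fragments overlaps by at least `min_len` bases, the loop stops and several contigs are returned, which mirrors the gaps in assembled genomes mentioned above.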

Genome sequencing is therefore unthinkable without bioinformatics tools: programs such as BLAST and other sequence alignment tools not only provide rapid, practical means to handle, analyze, compare, relate, and visualize DNA sequences but also support the sequencing process itself. The development of cost-effective next-generation sequencing (NGS) platforms [79, 80] has helped to decode, within a short period, nearly the entire genomes of many organisms, including human, many other model and specialty organisms, and crops with complex polyploid genomes. For example, according to the listings in the Genomes OnLine Database (GOLD) as of March 8, 2016, there were 79,650 genome sequencing projects, of which 8018 were completed projects, 33,489 were permanent drafts, 35,609 were incomplete projects, and 1553 were targeted projects [81]. About 73,000 organisms, including archaea (1201), bacteria (55,303), eukaryotes (11,990), and viruses (4473), were listed for sequencing. These numbers would increase further if the sequencing of 100,000 whole human genomes [82] were added.

Bioinformatics tools are also needed to annotate and predict genes in sequenced genomes, which, as mentioned above, are too large to be annotated manually. Bioinformatics-based gene finding and annotation, including the search for protein-coding genes, RNA transcripts, and other functional sequences within a genome, is possible because start and stop regions, introns, exons, motifs, repeats, and other regulatory, sensory, and signaling regions follow recognizable patterns, with some variation between genes and among organisms. With the availability of the H. influenzae genome and the need to analyze it, the first genome annotation software system was designed in 1995 by Owen White [2, 78]; it provided tools to find genes and identify putative functions of annotated sequences. White’s effort was the basis for all currently available gene annotation and prediction software, which continues to improve [2].
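
As a simple illustration of pattern-based gene finding, the sketch below scans the three forward reading frames of a DNA string for open reading frames (ORFs) that begin with ATG and end at the first in-frame stop codon. It is a deliberately naive example, not the method of any annotation system cited here: it ignores the reverse strand, introns, and alternative start codons, and the `min_codons` threshold is an arbitrary assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return sorted (start, end, sequence) tuples for forward-strand ORFs.

    `min_codons` counts the start and stop codons; `end` is exclusive.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon seen in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # resume the search after the stop codon
    return sorted(orfs)


# Made-up sequence for illustration only.
orfs = find_orfs("CCATGAAATTTTAGGG")
```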

Bioinformatics tools are very important for analyzing gene and protein expression profiles. Large-scale sequencing of cDNA libraries has generated large volumes of data from serial analysis of gene expression (SAGE), expressed sequence tags (ESTs), massively parallel signature sequencing (MPSS), transcriptome profiling (RNA-Seq), and various applications of multiplexed in situ hybridization (microarray) profiling [83–95]. All of these gene expression techniques are extremely noise-prone and/or subject to bias in the biological measurement, so statistical tools are required to separate signal from noise in high-throughput gene expression studies. In this context, the chapter by Zhao et al. in this book reviews the main tools and algorithms currently available for RNA-Seq data analysis and discusses rapidly evolving RNA-Seq technologies such as stranded RNA-Seq, targeted RNA-Seq, and single-cell RNA-Seq. Moreover, Sripathi et al. comprehensively discuss transcriptome profiling, RNA-Seq, and micro-RNA expression studies in cotton (Gossypium species), whereas Younis et al. present a chapter on skin microbiome, transcriptome, and microarray data analyses. Readers can also find an interesting chapter by Bell and Kramvis on bioinformatics challenges and tools for Hepatitis B genome analysis, which highlights the features of this small viral genome relevant to bioinformatics analysis.

Similarly, protein microarrays and high-throughput mass spectrometry require bioinformatics analysis to identify proteins through complex sequence similarity searches against protein sequence databases [96–103]. Bioinformatics is also of great help in analyzing gene regulation by searching for and comparing sequence motifs related to promoters and other regulatory elements. Using bioinformatics tools and sequence motifs/regulatory elements, genes can be clustered by function and their co-expression characteristics determined. Examples of such tools include k-means clustering, hierarchical clustering, consensus clustering methods such as Bi-CoPaM, and self-organizing maps (SOMs), which can identify functionally active sequences in very complex microarray datasets [104–107].
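
Of the clustering methods listed above, k-means is the simplest to sketch. The following pure-Python example groups made-up expression profiles into k clusters. It is a minimal illustration under assumptions: the centroids are naively initialized from the first k profiles, the iteration count is fixed, and the data values are invented; real analyses would typically rely on an established statistics library and on normalized data.

```python
def kmeans(profiles, k, iterations=20):
    """Cluster equal-length expression profiles; return one label per profile."""
    centroids = [list(p) for p in profiles[:k]]  # naive deterministic start
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in profiles
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical expression values for four genes across three conditions.
profiles = [[1.0, 1.2, 0.9], [5.0, 5.1, 4.8], [1.1, 0.9, 1.0], [4.9, 5.3, 5.0]]
labels = kmeans(profiles, k=2)
```

Genes that receive the same label have similar profiles across conditions, which is the sense in which clustering suggests co-expression.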

Beyond these applications, bioinformatics plays a major role in collecting data on the functional elements of sequenced genomes using next-generation DNA sequencing technologies and genomic tiling arrays. This is best exemplified by the “Encyclopedia of DNA Elements” (ENCODE) project [108], developed by the National Human Genome Research Institute, which describes the functional elements of the human genome. Thanks to bioinformatics and its tools, the genomes, genes, and protein sequences of different organisms can be rapidly compared, searched, and interpreted. In addition, mutations can be identified that help to assess and diagnose many complex human and plant diseases, to characterize crop traits, and to interpret complex evolutionary processes such as genome duplication, polyploidization, adaptation, and speciation.

3. Structural bioinformatics: molecular folding, modeling, and design

One of the widely used applications of bioinformatics is the identification of three-dimensional protein structures, together with molecular modeling and folding, to predict the possible function of proteins or other molecular structures, model the behavior of molecules, fold a molecule into its native, biologically functional three-dimensional structure, and design biomedical drugs for many complex human diseases. It supports de novo protein design, enzyme design, protein-ligand/drug docking, protein-peptide interaction studies, and structure prediction of biological macromolecules and macromolecular complexes [1, 2, 109].

The primary structure of a protein, which is vital to understanding its function, can be readily determined from the coding DNA sequence. Further, based on homology patterns in the primary structure and using homology modeling, important structural formations and sites of interaction with other proteins can be determined. This helps to predict reliably the structure of a protein from the known structure of homologous protein(s). Moreover, identifying the secondary, tertiary, and quaternary structures of proteins is very important for understanding their function. The exact three-dimensional structure is essential for correct function, and failure to fold into the native structure generally produces inactive or misfolded proteins that can be toxic [108]. The bioinformatics of protein folding includes (1) the energy landscape of protein folding and (2) approaches to modeling protein folding [12, 13, 109].
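
Deriving a primary structure from a coding sequence amounts to reading the sequence codon by codon through a genetic code table. The sketch below assumes the standard genetic code, written in the compact layout in which the bases are ordered T, C, A, G at each codon position; the function name and the use of "X" for unrecognized codons are illustrative choices.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence (DNA or RNA) up to the first stop codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X marks an unrecognized codon
        if aa == "*":
            break  # stop codon: end of the polypeptide
        protein.append(aa)
    return "".join(protein)


# Made-up coding sequence for illustration only.
peptide = translate("ATGGCCAAGTAA")
```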

One of the leading freely available web server/stand-alone software tools for automated protein structure prediction and structure-based functional annotation is the “Iterative Threading ASSEmbly Refinement” (I-TASSER), which “first generates full-length atomic structural models from multiple threading alignments and iterative structural assembly simulations followed by atomic-level structure refinement” [110]. Using I-TASSER, all of the above-mentioned functional and structural characteristics of proteins, including ligand-binding sites, enzyme commission numbers, and gene ontology terms, can be explored on a comparative scale [110, 111].

Molecular modeling through molecular mechanics and/or quantum chemistry approaches is a key bioinformatics approach to studying the behavior of molecules. These approaches are routinely used to investigate the structure, dynamics, surface properties, and thermodynamics of inorganic, biological, and polymeric systems. They help to explore the conformational changes associated with biomolecular function and the molecular recognition of proteins and membrane complexes. Protein folding, the catalytic sites of enzymes, and protein stability can all be studied using molecular modeling. A vast range of bioinformatics tools for modeling and designing biomolecules is available [110–112]. In this book, the chapter by Leong et al. presents bioinformatics modeling and tools for biological membranes using molecular dynamics simulations with all-atom, united-atom, and coarse-grained membrane models of lipids and proteins. In addition, Filntisi et al. describe a computational method for generating antibody-drug conjugates through site-specific cysteine conjugation, using structural prediction methods based on PDB files of a drug, linker, and antibody. Moreover, Bórquez and González-Billault present an interesting chapter on computational algorithms for predicting kinase-substrate relationships; this chapter compares prediction tools and methods and discusses how contextual information can improve substrate prediction.

4. Biological networks and system biology

Watts and Strogatz in 1998 [113, 114] and Barabási and Albert in 1999 [115–117] fueled the view that complex systems can be treated as networks in which components are represented as nodes linked through their interactions (i.e., edges). The properties of the nodes and edges form the network topology. This approach has been widely applied in many scientific fields, including bioinformatics, where it has resulted in the construction of large-scale biological networks denoted as “omes,” such as the biome, interactome, and microbiome [2, 6].
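
The node-and-edge view can be made concrete with a few lines of Python. The sketch below builds an undirected network from interaction pairs and computes two basic topological descriptors: the degree distribution and the most connected node (a "hub"). The protein names and interactions are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected adjacency map from (node, node) interaction pairs."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return network


def degree_distribution(network):
    """Map each degree to the number of nodes that have that degree."""
    counts = defaultdict(int)
    for neighbours in network.values():
        counts[len(neighbours)] += 1
    return dict(counts)


# Hypothetical protein-protein interactions, for illustration only.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
network = build_network(edges)
hub = max(network, key=lambda node: len(network[node]))
distribution = degree_distribution(network)
```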

The molecular sequence analysis, prediction, annotation, and molecular modeling approaches highlighted above are also at the core of building, organizing, and systematizing biological networks of molecules (e.g., metabolic networks and protein-protein interactions) and the genetic and biochemical pathways of complex cellular processes. These include reception, signal transduction, gene regulation, and gene co-expression. Such molecular networks integrate many different data types, including DNA sequences, regulatory RNAs, proteins, secondary metabolites and other small molecules, and gene expression data, all of which may be connected physically and functionally. Such networks of cellular processes can be constructed and organized only by combining simulative, iterative, and model-oriented bioinformatics approaches. They are useful for analyzing and visualizing the complex connections of these cellular processes and help in understanding other biological networks, such as neuronal networks, food webs, and between- and within-species interaction networks, which are a central component of modern system biology [2, 6]. Examples of “omes”-related network resources are the Kyoto Encyclopedia of Genes and Genomes (KEGG), the BioCyc database collection, the BRaunschweig ENzyme DAtabase (BRENDA), Reactome, the Comparative Toxicogenomics Database, and many others [118]. Some biological network databases and their use in plant genomics/epigenomics are discussed in the chapters by Sripathi et al. and Rahman et al. in this book.

5. Databases

A database is an organized collection of data, comprising schemas, tables, queries, reports, images, and other objects. Access to the information in a database is provided by an integrated set of computer software referred to as a “database management system” (DBMS) [119]. The DBMS allows users to access all of the data contained in the database. It provides general functions for the definition, entry, storage, update, administration, and retrieval of large quantities of information in an organized way, which requires data modeling (e.g., hierarchical and network models), clustering, query languages and query optimization, and visualization algorithms [1, 2, 119].
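
The general DBMS functions listed above (data definition, entry, update, and retrieval) can be illustrated with Python's built-in `sqlite3` module and an in-memory database. The table layout, accession numbers, and sequences below are invented for the example and do not correspond to the schema of any database mentioned in this chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database

# Data definition: a hypothetical table of sequence records.
conn.execute(
    "CREATE TABLE sequence ("
    "accession TEXT PRIMARY KEY, organism TEXT, residues TEXT)"
)

# Data entry: made-up records (note the misspelled organism in SEQ0003).
conn.executemany(
    "INSERT INTO sequence VALUES (?, ?, ?)",
    [
        ("SEQ0001", "Gossypium hirsutum", "ATGGCC"),
        ("SEQ0002", "Homo sapiens", "ATGAAATTT"),
        ("SEQ0003", "Gossypium hirsutm", "ATG"),
    ],
)

# Update: correct the misspelled organism name.
conn.execute(
    "UPDATE sequence SET organism = ? WHERE accession = ?",
    ("Gossypium hirsutum", "SEQ0003"),
)

# Retrieval: query by organism through the query language (SQL).
rows = conn.execute(
    "SELECT accession, length(residues) FROM sequence "
    "WHERE organism = ? ORDER BY accession",
    ("Gossypium hirsutum",),
).fetchall()
conn.close()
```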

The development of databases, therefore, depends significantly on bioinformatics tools, advances, research, and applications. A large number of different types of databases are available, covering all aspects of biological data storage and organization. Some of the aforementioned databases, such as GenBank, EMBL, and DDBJ, are primary nucleotide sequence databases. There are also meta-databases, such as Entrez, mGen, and Metascape, that incorporate data compiled from multiple other databases. Others are specialized, for example, organism- or topic-specific databases such as TAIR, the p53 Knowledgebase (p53), the plant alternative splicing database (PASD), and the plant secretome and subcellular proteome knowledgebase (PlantSecKB) [119]. Databases vary in their data definition, usage, format, and access types. In this book, the chapter by Kadam et al. describes databases and bioinformatics algorithms related to allergen informatics, discussing the concepts of allergen bioinformatics and the key areas for potential development in allergology, whereas Bell and Kramvis highlight a public sequence database for Hepatitis B virus. Readers can also find a comprehensive discussion of bioinformatics resources, including databases for plant “omics,” in the chapter by Rahman et al.

6. Software, analysis tools, services, and workflow

As mentioned above, the astronomical accumulation of genomic, proteomic, and metabolomic data and their expression profiles, together with their annotation, storage, organization, systematization, and integration into biological networks and database systems, and their wide use by the research community, a priori required computer programming algorithms, analysis tools, services, and workflow systems. Software and analysis tools, and bioinformatics services and workflows, have therefore been the main fields and core targets of bioinformatics since its emergence. Through the contributions of various bioinformatics companies and public institutions, bioinformatics software started out as simple command-line tools and later evolved into more complex graphical programs, standalone packages, and web services. Since the development of the first bioinformatics software and analysis tools for molecular sequence evaluation in the early 1980s, many free and open-source software tools have been developed, and they continue to grow and improve with the advances made in the genomic sciences [2, 120].

The main driving forces for the current and future development of bioinformatics software and tools are the past decade's advances in genome decoding technologies, the accumulation of large volumes of biological data and the consequent need for their analysis, and advances in computer technologies, graphics, visualization, molecular modeling, and networking techniques. Moreover, the availability of open-source code, shared object models, and community-supported plug-ins has made it easier to gather innovative ideas from the community and to perform innovative in silico experiments on existing “Big Data.” Together, these have created golden opportunities for research groups and bioinformatics companies to work, experiment, and develop a new generation of bioinformatics software and tools that are user-friendly and capable of extended and integrated analysis with better visualization and graphical output. Open-source software packages include UGENE, EMBOSS, GenGIS, GENtle, MOTHUR, BioPerl, PathVisio, BioJava, GenoCAD, Biopython, GeWorkbench, GenomeSpace, Bioclipse, .NET Bio, Apache Taverna, BioJS, Bioconductor, and BioRuby [121, 122].

Developing sharing models and web access tools is also an important bioinformatics objective, because they allow users to reach bioinformatics tools over the internet, from their own computer systems, on main computing resources hosted on servers in other parts of the world. Simple Object Access Protocol (SOAP) [123] and Representational State Transfer (REST) [124–126] are two approaches used to provide bioinformatics web services. SOAP is a standards-based web service access protocol originally developed by Microsoft. REST, which provides very simple web service access, is commonly described as a simpler alternative that addresses problems with SOAP [127]. Both typically operate over the HTTP protocol, but each has its own issues and challenges, and they differ in messaging patterns, rules, architectural style, and flexibility. The main advantage of web services is that end users do not have to deal with software and database maintenance overheads [127].
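
The difference in messaging style between the two approaches can be sketched by preparing the same request in both forms. Everything service-specific below is hypothetical: the base URL, the `sequences` resource, the `getSequence` operation, and its parameters are invented for illustration, and no network call is made. Only the SOAP 1.1 envelope namespace is a real, standard identifier.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

BASE_URL = "https://example.org/api"  # hypothetical service, not a real endpoint
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def rest_request_url(accession, fmt="fasta"):
    """REST style: the resource is named in the URL and fetched with HTTP GET."""
    return f"{BASE_URL}/sequences/{accession}?{urlencode({'format': fmt})}"


def soap_request_body(accession, fmt="fasta"):
    """SOAP style: the operation and its arguments travel in an XML envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, "getSequence")  # hypothetical operation name
    ET.SubElement(call, "accession").text = accession
    ET.SubElement(call, "format").text = fmt
    return ET.tostring(envelope, encoding="unicode")


url = rest_request_url("SEQ0001")
xml_message = soap_request_body("SEQ0001")
```

In the REST form the whole request fits in a URL, whereas the SOAP form wraps the call in an XML message that would be sent in the body of an HTTP POST.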

There are several basic bioinformatics services, for example, “Sequence Search Services” (SSS), “Multiple Sequence Alignment” (MSA), and “Biological Sequence Analysis” (BSA) [2, 128]. These web service-based resources comprise standalone or web-based data analysis tools as well as integrative, distributed, and extensible bioinformatics workflow management systems (BWMSs). A BWMS is designed specifically to compose and execute a series of interactive computational or data manipulation steps (i.e., a workflow) in a bioinformatics analysis. Such systems provide interactive analysis of biological data, build specific workflows for the analysis, enable visualization of the analysis outputs in real time, and simplify the sharing and reuse of workflows among scientists. Platforms offering this service include Galaxy, UGENE, and Taverna [2, 121]. Several chapters of this book cover bioinformatics software, web-based analysis tools, and bioinformatics services for membrane analysis (see Leong et al.), plant science and crop genomics (see the chapters by Rahman et al. and Sripathi et al.), and medicine, viral genome analysis, and drug design (see the chapters by Younis et al., Bell and Kramvis, and Filntisi et al.).
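
The notion of a workflow as an ordered series of data manipulation steps can be sketched in a few lines of Python. This toy runner is only an illustration of the concept and does not reflect how Galaxy, UGENE, or Taverna are implemented; the step names and the input sequences are made up.

```python
def run_workflow(data, steps):
    """Apply each (name, function) step in order, recording intermediate results."""
    history = []
    for name, step in steps:
        data = step(data)
        history.append((name, data))
    return data, history


# A hypothetical three-step workflow: clean, filter, then compute GC content.
steps = [
    ("clean", lambda seqs: [s.strip().upper() for s in seqs if s.strip()]),
    ("filter_short", lambda seqs: [s for s in seqs if len(s) >= 4]),
    (
        "gc_content",
        lambda seqs: {s: (s.count("G") + s.count("C")) / len(s) for s in seqs},
    ),
]

result, history = run_workflow([" atgc ", "", "gg", "GGCCAT"], steps)
```

Because the workflow is just a list of named steps, it can be stored, shared, and rerun on new data, which is the property that workflow management systems build on.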

7. Text mining

One objective of bioinformatics research and application is to use computational algorithms and bioinformatics tools to collect, organize, and structure the growing body of biomedical literature, allowing scientists to query, mine, read, and synthesize the literature and published articles relevant to their research interests [2–4, 7, 129, 130]. Biomedical literature and text mining are therefore very important for scientific development, innovation, and the integration and application of discoveries in society through information extraction (IE) and assessment of the relationships among publications [3, 4]. Analysis of the world literature shows that more than 80% of text data remain unstructured, which makes it challenging to read every paper and results in disjointed subfields of research [3]. Biomedical literature text mining uses a variety of text mining and data mining tools, applying techniques such as data clustering, visualization and navigation, information retrieval and extraction, and text categorization and summarization [3]. IE and “Natural Language Generation and Understanding” (NLG and NLU), which have tokenizing, morphological or lexical, and syntactic analysis components, help to build structured text and to extract, collect, and organize structured information [129, 130]. Pattern recognition and matching, such as the recognition of biological abbreviations, terms, and interactions, are important methods in text mining [2–4].
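
Abbreviation recognition, mentioned above as a pattern-matching task, can be sketched with a simple heuristic: find an upper-case abbreviation in parentheses and check whether the initials of the preceding words spell it out. This is a minimal illustration and an assumption on my part, not a method from the cited literature; it misses abbreviations that skip words or use internal letters (for example, DBMS or SAGE), and it returns hyphenated long forms as space-separated words.

```python
import re


def find_abbreviations(text):
    """Return {abbreviation: long form} for patterns like 'long form (LF)'."""
    found = {}
    for match in re.finditer(r"\(([A-Z]{2,10})\)", text):
        abbr = match.group(1)
        # Tokenize the text before the parenthesis into alphabetic words.
        words = re.findall(r"[A-Za-z]+", text[:match.start()])
        candidate = words[-len(abbr):]
        initials = "".join(word[0].upper() for word in candidate)
        if initials == abbr:
            found[abbr] = " ".join(candidate)
    return found


sentence = (
    "We used multiple sequence alignment (MSA) and "
    "information extraction (IE), not (XYZ)."
)
abbreviations = find_abbreviations(sentence)
```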

8. Education

Advances in the life sciences and high-throughput biology, in particular the “omics” disciplines, the scale and complexity of “Big Data,” and the growing demand for specialists with multilingual and cross-field expertise who can understand and solve multidisciplinary scientific problems all underlie a great need for training and education in bioinformatics. Bioinformatics training and education aim to create, collect, deliver, and share educational and training materials and techniques as well as to develop university degree-program curricula in bioinformatics. The goal is to prepare scientists and specialists who can use modern bioinformatics tools, together with sophisticated operating systems, software and algorithms, and database/networking technologies, to handle, analyze, interpret, and publish complex high-throughput biological data. This is a major bottleneck and a critical need of the current life sciences and bioinformatics field, especially in developing countries, as analyzed, for example, in recent reports for African [82] and Central American [131] countries.

To address this, the bioinformatics research community has made specific efforts to develop local and global platforms for bioinformatics training and education. Examples include the “Bioinformatics Training Network” (BTN) [132] and “The Global Organization for Bioinformatics Learning, Education, and Training” (GOBLET) [133], which provide a community educational and training resource for bioinformatics trainers and trainees. As an outcome of a European 7th Framework grant, BTN aimed to develop and share educational materials, short courses, and training delivery methods as well as to discuss the challenges, issues, and requirements of bioinformatics training [132]. GOBLET continues similar efforts beyond Europe, aiming to coordinate efforts at the global scale with a concentrated strategy and within the frame of a single, dedicated foundation, although this requires much time, focused strategic efforts, and modern innovative approaches [133].

“The Swiss Institute of Bioinformatics” training portal [134] also provides online courses, and software platforms designed to teach bioinformatics concepts and methods include Rosalind [135]. Videos and slides from the “Canadian Bioinformatics Workshops” are available on an open-access website [136]. Similarly, many large bioinformatics conferences and seminars contribute to training and education in bioinformatics, such as Intelligent Systems for Molecular Biology (ISMB), the European Conference on Computational Biology (ECCB), Research in Computational Molecular Biology (RECOMB), and the annual Bioinformatics Open Source Conference (BOSC) of the non-profit Open Bioinformatics Foundation [2, 128]. As with public bioinformatics databases, the MediaWiki engine with the WikiOpener extension, extensively referenced in this chapter, also contributes to bioinformatics training and education by gathering research materials and descriptions of tools that can be accessed and updated by all experts in the field [128].

With the specific objectives of developing bioinformatics research and application, integrating it into genomics research, providing training and education, and preparing a well-qualified new generation of scientists for the life sciences, we established a dedicated organization, the Center of Genomics and Bioinformatics, in the developing country of Uzbekistan [137]. As in other developing countries, there are many challenges and limitations in funding and in access to sophisticated bioinformatics tools and computer operating systems, as well as a lack of sufficient experience to carry out bioinformatics research and resource development. Our first-step goal, however, is to integrate genomics and bioinformatics curricula into the higher education system of Uzbekistan, develop training and educational materials, provide basic training and research practice to university students and biology specialists, and establish international collaborations in this direction. The long-term objective is to apply genomics and bioinformatics approaches efficiently and broadly to all areas of the life sciences at the national and regional levels, which would contribute to the development of biological sciences in Central Asia. Efforts are ongoing to establish international collaborations [138] and to provide training and education at both national and regional levels.

9. Conclusions and future perspectives

Bioinformatics has become an essential interdisciplinary field of the life sciences, supporting “omics” fields and technologies mainly by handling and analyzing “omes” data. The accumulation of high-throughput biological data due to technological advances in “omics” fields has required and prioritized the use of bioinformatics resources, research, and applications for the analysis of complex and ever-growing “Big Data” volumes, which would be impractical and useless without bioinformatics. Therefore, as highlighted herein, there is a critical need to prepare well-qualified, new generation scientists with integrated knowledge, multilingual ability, and cross-field experience who are capable of using sophisticated operating systems, software and algorithms, and database/networking technologies to handle, analyze, and interpret the increasing volume of complex high-throughput biological data.

Community resources and a globally coordinated foundation of bioinformatics training and education platforms, as well as research conferences, workshops, short online training, and web-based educational courses and materials, are available to work toward this goal. However, there is an urgent need to develop bioinformatics education and training further, in particular in developing countries, which requires innovative platforms, training techniques, better funding, web and network access, and high-performance computing systems.

On the research side, bioinformatics tools need to be improved for the analysis of the growing body of high-throughput pangenomics, metagenomics, proteomics, and metabolomics data. “Effective tools” are needed to perform genome assembly and annotation with high accuracy. This, however, requires improving the quality of sequenced genomes so that they are free of gaps, and sequencing more genome representatives, sub-genomes, polyploid species, genomes of single cells, and specific tissues, which would generate the information needed to develop, modify, and correct bioinformatics algorithms and programming approaches.

The use of third generation sequencing approaches and platforms as well as efforts on whole genome sequencing of, for example, 1000 or 100,000 human genome representatives [82] or transcriptome/exon sequencing of 1000 distinct plant species (e.g., 1KP) [139] will ultimately improve and advance bioinformatics analysis tools. These efforts also help to improve orthologous gene identification tools, which currently need attention [120]. There is a great need for sampling and handling diverse strains in pangenomic analysis, for integration of prokaryotic genome-organization frameworks (GOFs), and for integration of non-coding RNAs, pseudogenes, and epigenetic elements into bioinformatics annotation and ontology tools and software [120]. There is also a need to make sequenced genome data more functional and integrated through the construction of more organized, user-friendly, cell-wide biological networks and metabolic pathways [140] with better visualization effects, graphics outputs [120], and knowledge base (KB) construction [141]. This, however, requires the development of real-time imaging systems and high-throughput phenotyping (referred to as “phenomics”) tools that would help to efficiently determine biologically meaningful associations between genomic and phenotypic data, advancing the translational sciences, personal genomics, and personalized medicine [7] and/or agriculture [142].

Acknowledgments

I thank the Academy of Sciences of Uzbekistan and the Committee for Coordination of Science and Technology Development of Uzbekistan, the Office of International Research Programs (OIRP) of the United States Department of Agriculture (USDA) Agricultural Research Service (ARS), and the U.S. Civilian Research & Development Foundation (CRDF) for research Grants FA-F5-T030, FA-A6-T081, FA-A6-T085, I-2015-6-15/2, I5-FQ-0-89-870, P120, P120A, P121, P121B, UZB-TA-31016, UZB-TA-31017, and UZB-TA-2992, which have been the key factors for the development of plant genomics and bioinformatics in Uzbekistan. I gratefully acknowledge the support of the Uzbekistan government and the investments and guidance from the Academy of Sciences of Uzbekistan, the Ministry of Agriculture and Water Resources of Uzbekistan, the Cotton Industry Joint Stock Company of Uzbekistan, the Ministry of Foreign Economic Relations, Investments and Trade of Uzbekistan, USDA-ARS, and Texas A&M University for the establishment of the Center of Genomics and Bioinformatics in Uzbekistan. I also thank Prof. Gilbert S. Omenn, Center for Computational Medicine & Bioinformatics, University of Michigan, USA, for critical reading of this introductory chapter, and Mr. Mirzakamol Ayubov and Mr. Muhammad Mirzahmedov, Center of Genomics and Bioinformatics, Uzbekistan, for their technical assistance in preparing this chapter.

How to cite and reference

Ibrokhim Y. Abdurakhmonov (July 27th 2016). Bioinformatics: Basics, Development, and Future, Bioinformatics, Ibrokhim Y. Abdurakhmonov, IntechOpen, DOI: 10.5772/63817. Available from:
