Potential Applications and Challenges of Metagenomics in Human Viral Infections

The complex association between the human host and pathogenic viruses makes it necessary to understand the overall host-virus interaction network. Identification of the virus population and its systematic classification will help in understanding the viral association with disease outcome. Metagenomics is a recently developed approach for detecting pathogens in samples with precise interpretation in a short period of time. Metagenomic approaches have been employed to study the predominance or spread of a virus within a particular locality and the nature of the virus during infection. Metagenomics is essentially a combination of lab-based techniques and in silico methods for identifying pathogenic viruses without culturing them under specific aseptic conditions. The lack of unique conserved genes in viruses makes metagenomic study difficult. Other challenges in the field include cellular DNA contamination, free environmental DNA contamination and the continuous evolution of viruses. Recent studies have shed light on the advancement of this field in virus identification and characterization; however, further investigation is still needed to overcome these challenges. This chapter focuses on the applications of, and challenges faced in, metagenomic analysis of human viral infections.

for their propagation. Extracellular viral particles are inert in nature. Viruses can infect a wide range of hosts, including plants, bacteria, fungi, algae, protozoa, and vertebrate and invertebrate animals. In nature, an estimated 1 × 10^31 virus particles are present, a number that itself suggests the diversity of viruses. They play important roles, such as increasing host diversity via horizontal gene transfer and nutrient recycling [1]. Hooda et al. reported that the abundance of viruses in nature is around 1000 times greater than that observed via cell culture-dependent techniques [1,2]. This suggests that a large pool of viruses is still unknown; only around 219 pathogenic viruses have been identified so far [2,3].

Role in pathogenesis
Human viruses: More than 200 viruses are known to infect humans, and the number is increasing with time, but the diversity of viruses suggests that a huge number remain unknown. Yellow fever virus, discovered in 1901, was the first human pathogenic virus to be identified. The 1900s were the era of human virus discovery, and most of the common pathogenic viruses were studied during this time. At present, two out of three infection-causing organisms are viruses [4], and they cause a variety of diseases ranging from common acute infections such as the common cold, flu and gastroenteritis to deadly diseases such as hantavirus pulmonary syndrome (hantavirus), AIDS (HIV) and Ebola virus disease (ebolavirus). Recent outbreaks show the emergence of previously known viruses with modified virulence properties.

Human gut and viral infection
For decades, human gut-associated pathogenic viruses have been known to cause many gastrointestinal diseases, such as gastroenteritis. The main groups of viruses identified are rotavirus, adenovirus (serotypes 40 and 41), astrovirus, calicivirus, norovirus, torovirus, herpesviruses, coxsackieviruses, human papillomaviruses [3], Norwalk-like viruses, coronaviruses, picornaviruses and Sapporo-like viruses [4,5]. They infect the epithelial and mucosal linings of the stomach and small intestine, or a specific portion of the intestinal epithelium. Depending on the type of infection observed, different samples are used for detection of the infectious agent. In general, a fecal sample is used for microbiological examination during gut-associated infection [6,7]. Apart from feces, gastric biopsies, gastric juice, saliva [8,9], duodenal fluid and cotton swabs [5] are collected. These samples are essential for diagnosis because they directly contain the pathogen.

Traditional methods
Since viruses are inert particles outside the cell, they need to be propagated in a susceptible host or in host cells for their growth. Initially, viruses were cultured with the help of embryonated eggs or laboratory animals. The discovery of the tissue culture technique in the 1900s provided an indispensable tool for in vitro virus culture, and tissue culture was subsequently recognized as the "gold standard" for virus discovery. The major advantages of using tissue culture for virus identification are amplification of viruses, characterization of the virus, functional studies, drug targeting and genome extraction. Owing to their reliable results and sensitivity, tissue culture-based techniques are still in use for virus discovery, as well as for the study of immune responses, altered gene expression and virus characterization. Successful use of tissue culture in virus identification depends on crucial steps such as collection of the sample from a high-titer area of the body, immediate transport of the sample, sample processing and selection of an appropriate cell line [10]. The major drawbacks of the traditional method are the difficulty of identifying a susceptible cell line and its time-consuming and laborious nature [10]. Culture-based virus identification was further advanced by the evolution of new scientific techniques and the modification of existing ones. Shell vials with centrifugation, the PRE-CFE stain technique, immune-based techniques (e.g., ELISA, agglutination, precipitation, flocculation) and microscopy-based techniques reduced the time needed for virus identification, but at the cost of sensitivity.

Molecular methods
Gradually, the field of virology shifted its focus toward molecular biology methods. Traditional culture-based methods and molecular biology techniques are now used hand in hand for studying virus-associated samples [11]. Broadly, molecular biology methods are of two types: sequence dependent and sequence independent. Both have proven their usefulness, and many viruses have been identified using these techniques.
Sequence-dependent method:
These are the most sensitive molecular biology techniques; they can amplify selected DNA from mixed samples [12]. Since its discovery, PCR has opened the door to many variations for multiple purposes. PCR is the backbone of molecular biology and has been used in several approaches, such as sequencing of known viruses based on similarity in DNA sequence or on consensus sequences of previously known viruses, RFLP and diagnostics [13][14][15][16]. Another technique, the microarray, was introduced in 1995; it is used mainly for gene expression studies and gene profiling, usually in infected samples. These two methods have been used for the discovery of new viruses and taxa; gammaretroviruses, xenotropic murine leukemia virus and SARS-CoV are a few of the best examples [17]. However, subsequent studies were unable to reproduce the earlier results [6,7].

Sequence-independent method:
This approach is independent of prior knowledge of the virus genome sequence. Subtractive hybridization and representational difference analysis were methods used for gene expression studies and for repeated comparison of genome sequences [18]. These methods were helpful in the detection of human herpesvirus 8 (HHV-8) [9,19], GBV-A and GBV-B viruses [20,21], torovirus and norovirus [22].
Another sequence-independent approach is sequence-independent single-primer amplification (SISPA), which is used for the detection of unknown viral sequences by ligation of a linker oligonucleotide sequence [23]. It can further be used for molecular cloning of the viral genome for subsequent characterization. This method has been used successfully for the discovery of the well-known hepatitis E virus [10,24], parvovirus 2 and 3 [24] and Norwalk virus [11]. As viruses are devoid of consensus sequences, traditional culture-based techniques and sequence-dependent and sequence-independent molecular biology techniques are generally useful only for the study of limited samples with limited output. Most viruses remain unidentified for this reason.
Compared with the above techniques, metagenomics is a less biased approach. Any type of virus, with either RNA or DNA as its genome, cultivable or uncultivable, known or novel, can be detected quickly. In Greek, "meta" denotes "transcendent" and "ome" means all or every; collectively, the word refers to all genomic content. Metagenomics is the study of genetic material directly from an environmental sample with the help of advanced genomic research techniques and computational tools. The metagenomics approach bypasses the need for classical biochemical laboratory techniques for microbial analysis. With its help, one can investigate all types of genomic content from a variety of organisms. The technique provides an indispensable tool for the identification of nonculturable species of microbes, and it is also used to investigate known and culturable organisms with great accuracy. Another advantage is that it bypasses the need to isolate and culture individual species manually, thereby reducing the time required for a study while providing more information. Initial metagenomic analysis of raw environmental samples provides a necessary foundation for further lab-based analysis (Table 1). Metagenomics has been used for a variety of purposes in diverse areas since 2002, when the approach was first used in the field of virology [12,52].

Process of metagenomics
Metagenomics is a successful tool for surveillance in different environments such as freshwater, soil, marine water and the gut of different organisms (Table 1). Recent advances in sequencing technology have improved the speed of novel virus discovery and environmental surveillance [13,53]. In the 2000s, the growth in literature on the use of metagenomics in virome studies and the increasing number of virus databases reflect the ease of the process.
There are three main steps involved in the metagenomic analysis of a sample, as follows:

Sample preparation and processing:
In metagenomics, any type of sample can be analyzed after some pretreatment (or enrichment). For analysis of the gut-associated virome, however, different samples are collected from different parts of the human gastrointestinal region. For accurate results, sample collection, proper handling, transportation and storage of the sample are crucial. Many standard protocols are available for the collection and transport of different samples to the laboratory and for their storage [37]. Different protocols are used for fluid samples and for tissue samples. A tissue sample is generally homogenized in autoclaved saline, and the collected supernatant is filtered through 0.8, 0.45 and 0.2 μm filters; this serial filtration procedure is used to separate larger particles and bacteria from viruses. See Figure 1.
Different sample processing methods have been used for extraction of viral genomic material [16,56,57,58]. Analysis with tools such as riboPicker and BLAST searches of viral RNA sequences showed a greater number of virus domains in the sample processed via the second method, whereas the other methods showed more cellular noise [19].

Sequencing:
Metagenomic studies progressed slowly during the Sanger sequencing era; until around 2005, when other methods had yet to be developed, Sanger sequencing was the method in use. Many studies in this period showed abundant diversity in viruses, and analysis of human clinical samples also showed plenty of diversity. The speed of viral genome sequencing increased several-fold with the introduction of pyrosequencing, and new viral communities of humans and animals were identified during this period. Some important discoveries are astrovirus [21], rhabdovirus [22], coronavirus [23], picornavirus [24] and gammapapillomavirus [61]. This technology became popular in a short time because of its low cost and high number of reads. It is also used for sequencing of clinical samples from tissue fluids and tissues [11]. Pacific Biosciences sequencing and nanopore sequencing: these sequencing methods have not been popular for metagenomic studies because of their high error rates [52].

Bioinformatics analysis:
Bioinformatics analysis of the raw sequence data generated by a high-throughput sequencer is a critical step in novel virus discovery and even in diagnostics. Many ready-to-use pipelines are available for the analysis of raw data. VIP, VirFinder, Vipie, METAVIR, PHACCS, VIROME, HP Viewer, FastViromeExplorer, EzMAP, Vanator, viruspy and Viral_genome_annotator are a few commonly used pipelines for viral metagenomic analysis. A typical viral metagenomics workflow includes the following steps. First, the next-generation sequencing (NGS) data are trimmed to remove low-quality sequences and adaptor sequences (refer to Figure 2). Second, host-related (human or bacterial) sequences are removed from the trimmed data. Third, the remaining sequences are aligned to reference viral genomes for advanced functional characterization such as novel virus identification, viral taxonomy, identification of viral proteins and phylogenetic analysis.
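The three workflow steps can be illustrated with a minimal Python sketch. It is not the implementation of any pipeline named above: the function names, the adapter string, the toy sequences and the thresholds are all hypothetical, and exact k-mer sharing is used as a simple stand-in for the read aligners that real pipelines rely on.

```python
from typing import Dict, List, Set, Tuple

Read = Tuple[str, List[int]]  # (bases, per-base Phred quality scores of equal length)


def trim_read(seq: str, quals: List[int], adapter: str, min_q: int = 20) -> str:
    """Step 1: cut at the adapter (if present), then drop low-quality 3' bases."""
    pos = seq.find(adapter)
    if pos != -1:
        seq, quals = seq[:pos], quals[:pos]
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end]


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_fraction(read: str, ref_kmers: Set[str], k: int) -> float:
    """Fraction of the read's k-mers that also occur in the reference."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    return len(read_kmers & ref_kmers) / len(read_kmers)


def classify_reads(reads: List[Read], host: str, viral_refs: Dict[str, str],
                   adapter: str, k: int = 8, min_len: int = 12,
                   threshold: float = 0.5) -> Dict[str, int]:
    """Trim each read, discard host-like reads, assign the rest to a viral reference."""
    host_kmers = kmers(host, k)
    viral_kmers = {name: kmers(seq, k) for name, seq in viral_refs.items()}
    counts = {name: 0 for name in viral_refs}
    counts.update({"host": 0, "unassigned": 0, "discarded": 0})
    for seq, quals in reads:
        trimmed = trim_read(seq, quals, adapter)
        if len(trimmed) < min_len:                 # too short after trimming
            counts["discarded"] += 1
            continue
        if shared_fraction(trimmed, host_kmers, k) >= threshold:
            counts["host"] += 1                    # Step 2: host removal
            continue
        best_name, best_score = "unassigned", 0.0  # Step 3: viral assignment
        for name, ref in viral_kmers.items():
            score = shared_fraction(trimmed, ref, k)
            if score > best_score:
                best_name, best_score = name, score
        if best_score < threshold:
            best_name = "unassigned"
        counts[best_name] += 1
    return counts


# Hypothetical toy data, invented for illustration only.
ADAPTER = "AGATCGGAAGAGC"
HOST = "ACGTACGTTAGCTAGCTAGGATCCGATCGATTACGCTAGCATCGA"
VIRAL_REFS = {
    "virus_A": "TTGACCATGGCAATTCCGGAAGTCTTGACGGTACCTGAAGC",
    "virus_B": "GGCATTTAAACCCGGGTTTAAAGGCCTTAACCGGTTAAGGCC",
}
TOY_READS: List[Read] = [
    (VIRAL_REFS["virus_A"][5:25] + ADAPTER, [30] * 33),           # viral read with adapter
    (HOST[10:34], [30] * 24),                                     # host read
    (VIRAL_REFS["virus_B"][2:22] + "ACGTA", [30] * 20 + [5] * 5), # low-quality tail
    ("ACGTACGTAC", [30] * 10),                                    # too short
    ("TGCATGCATGCATGCATGCA", [30] * 20),                          # matches nothing
]

if __name__ == "__main__":
    print(classify_reads(TOY_READS, HOST, VIRAL_REFS, ADAPTER))
```

Reads left in the "unassigned" group are the ones of interest for novel virus discovery, since they match neither the host nor any known reference; in practice they would be assembled and compared against protein databases rather than simply counted.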

Challenges involved in metagenomics:
Analysis of viral genome data from high-throughput sequencing machines requires standard computational tools and software with high accuracy, which in turn demands high cost and technical expertise. A few high-quality tools for sequence data analysis, such as DIAMOND [53], UBLAST [52] and Kaiju [54], have increased the speed of metagenomic studies. Still, technical improvement is needed for rapid and accurate data analysis. The second challenge in the analysis of metagenomic sequencing data is the assembly of genomes from thousands of small fragments. Assemblers designed for single-genome data sets in the early days of sequencing are outdated or unsuitable for metagenomics; they create chimeric genomes that misrepresent the genome sequence. Nowadays, assemblers such as MetAMOS [55], MetaVelvet [62] and metaSPAdes [57] are available for such studies. The assembly process still requires manual editing to sort out genomic chimeras [15]. Another challenge for virologists is the reference database, which may sometimes cause confusion or problems. If entries in the reference database are misannotated, the results will be interpreted wrongly. If the reference database is large, speed decreases because a large number of sequence alignments are required to test the data. Interpretation of sequence data is the last and a very decisive step in metagenomics. We still lack clear knowledge of the link between the diversity of viruses in the environment and during outbreaks; our surveillance is based merely on a biased collection of clinical samples and their study, which limits our knowledge of disease spread [63]. Prediction of future outbreaks and limiting the spread of disease need proper study and the development of strong tools [15]. Therefore, further extensive studies should be encouraged to obtain maximal and precise knowledge of the environmental and gut-associated virome.
the help of surveillance pyramid. The surveillance pyramid explains that, during disease spread in the community, only a few diagnosed cases are reported; individuals with symptoms of the disease and carriers of the disease go unreported. This creates bias in sampling. Metagenomic study has therefore proved a useful tool for constant surveillance of the pathogenic virome community of the gastrointestinal tract. Surveillance with metagenomics is also useful for endemic viral diseases that cause common gastrointestinal health concerns in the community, e.g., astrovirus, calicivirus, norovirus and torovirus [64], herpesviruses, hepatitis E virus, Epstein-Barr virus and coxsackieviruses.

Discovery of new viruses and classification:
Metagenomics is a powerful tool for the identification of novel organisms. Screening of different gut samples can be useful for studying novel gut-associated viruses. Initially, sequence-based studies of Merkel cell carcinoma identified a new human polyomavirus. Merkel cell carcinoma is a carcinoma of human skin tissue in which viral DNA was found to be integrated into the tumor tissue [65]. Subsequent studies have revealed the diversity of gut-associated viruses in different animals, which helps in the study of past zoonotic events. Human-rodent interaction, arising from settlement in forest areas or from the domestication of animals, is well known and is a leading cause of zoonotic outbreaks. Knowledge of past outbreaks, together with monitoring of the present spread of known pathogenic viruses and of closely related human pathogenic viruses, provides a basis for predicting future outbreaks. This approach is also useful for limiting recurrent outbreaks through the study of disease-prone viruses and the characterization of unknown viruses. In 2011, Phan et al. extensively studied fecal samples from wild rodents in Virginia and characterized viruses belonging to mammalian virus families; many new viral families and two new genera were identified. Two viruses closely related to Aichi virus, which is associated with acute gastroenteritis worldwide, were characterized in the study [66].
Turkey meat is very popular in the USA, and its production is an important part of the US economy. One study was conducted in California in March 2011 on turkeys suffering from turkey viral hepatitis. Pyrosequencing of RNA extracted from the liver revealed the presence of a novel picornavirus, named turkey hepatitis virus [51]. In another study, an unknown disease affected milk production in cattle in Germany and the Netherlands; a metagenomic study discovered a new virus, Schmallenberg virus, in samples from infected cows [67]. Identification and characterization of such viruses will help in addressing problems that have a negative impact on a country's economic status. Like domestic animals, wild animals can also act as reservoirs of novel pathogens. Two novel simian hemorrhagic fever viruses, divergent from the original simian hemorrhagic fever virus, were identified in African green monkeys. Simian hemorrhagic fever virus has not yet been found to infect humans, but its clinical indices are comparable to those of human Ebola and Marburg viruses. This similarity places it on the list of suspected emerging viruses [49].

Diagnostic
Metagenomics is a potent method that allows broad analysis of relative genetic variation among viruses and can be used to study host-pathogen interactions. It is also popular because it can be applied to uncultivable organisms. A recently emerging approach is the use of metagenomics during epidemics and outbreaks, when a large number of samples must be handled in a short time. In hepatitis C virus (HCV) infection, identification of the infection is challenging because of the lack of apparent symptoms and the lack of easy laboratory tests for differentiating the acute and chronic phases of the disease. The available molecular methods for virus diagnosis are tedious, time-consuming and costly. A recent report from Escobar-Gutierrez et al. described the use of next-generation sequencing (NGS) in the diagnosis of HCV infection. NGS allows cost-effective and detailed analysis of a large number of samples. The study revealed low-frequency mutations and genetic variation [68]. Genetic shift and reassortment are a leading cause of the emergence of new virus strains, especially in RNA viruses. A well-known example is influenza virus, which has caused many pandemics and deaths in history. The recent H1N1 virus is a combination of swine, human and avian genomic RNA segments [69]. The best example of the metagenomic approach in the 2009 H1N1 pandemic is its use for characterization and detailed study of the virus, followed by the manufacture of a microarray-based Virochip for rapid detection and differential screening from the seasonal virus [70].
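How deep sequencing exposes low-frequency mutations can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the cited HCV study: reads are assumed to be already aligned without gaps, the function names and thresholds (`min_depth`, `min_freq`) are hypothetical, and the toy reference and reads are invented.

```python
from collections import Counter
from typing import List, Tuple

AlignedRead = Tuple[int, str]  # (0-based start position on the reference, bases)


def pileup(ref_len: int, reads: List[AlignedRead]) -> List[Counter]:
    """Count the bases observed at every reference position."""
    columns = [Counter() for _ in range(ref_len)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < ref_len:
                columns[pos][base] += 1
    return columns


def minor_variants(reference: str, reads: List[AlignedRead],
                   min_depth: int = 10,
                   min_freq: float = 0.05) -> List[Tuple[int, str, str, float]]:
    """Report (position, reference base, variant base, frequency) for non-reference
    bases seen at or above min_freq in positions covered by at least min_depth reads."""
    variants = []
    for pos, counts in enumerate(pileup(len(reference), reads)):
        depth = sum(counts.values())
        if depth < min_depth:
            continue
        for base, n in sorted(counts.items()):
            if base != reference[pos] and n / depth >= min_freq:
                variants.append((pos, reference[pos], base, n / depth))
    return variants


if __name__ == "__main__":
    # Hypothetical toy data: 18 reads match the reference, 2 carry an A->T change.
    reference = "ACGTACGTAC"
    reads = [(0, "ACGTACGTAC")] * 18 + [(0, "ACGTTCGTAC")] * 2
    for pos, ref_base, alt_base, freq in minor_variants(reference, reads):
        print(f"position {pos}: {ref_base}->{alt_base} at frequency {freq:.2f}")
```

The frequency threshold matters in practice because sequencing errors also produce rare non-reference bases; a real analysis would need an error model or replicate sequencing before calling a variant, which this sketch omits.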

Evolution of host-virus interaction:
The evolution of RNA viruses is a comparatively faster process than that of DNA viruses. Studying evolution is necessary to understand the source of new variants and their spread, and to keep a check on epidemic-initiating variants. For norovirus, an emerging RNA virus and a causative agent of gastroenteritis, inter-host and intra-host evolution and the transmission of new variants have been studied. Norovirus infection is usually a self-limiting acute disease, but in immunocompromised individuals and in newborns it may cause morbidity and mortality. No vaccine or drugs are available for treatment. Based on a metagenomic study, Bull et al. hypothesized that norovirus has multiple mechanisms of evolution. Chronic hosts are a major reservoir of new variants, while acute patients generally carry a single variant. The NGS approach assists in the comprehensive study of viral population dynamics [71]. The cardiovirus genus was originally believed to comprise two genera; metagenomic study has revealed five new genera with full characterization. Cardioviruses are the causative agents of enteric diseases with multiple symptoms in mice. In humans, they cause an encephalitis-like condition and diarrhea in children [72]. Metagenomics-based studies help in designing future approaches to these new genotypes and their associated diseases.
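One common way to summarize the contrast between a single-variant and a multi-variant intra-host population is the Shannon diversity of variant read counts. The sketch below is only an illustration of that measure; it is not taken from the cited norovirus study, and the variant names and read counts are hypothetical.

```python
import math
from typing import Dict


def shannon_diversity(variant_counts: Dict[str, int]) -> float:
    """Shannon diversity (in bits) of a viral population from per-variant read counts."""
    total = sum(variant_counts.values())
    if total == 0:
        raise ValueError("at least one read is required")
    entropy = 0.0
    for count in variant_counts.values():
        if count > 0:
            p = count / total
            entropy += p * math.log2(1.0 / p)
    return entropy


if __name__ == "__main__":
    # Hypothetical read counts per variant, invented for illustration.
    acute_host = {"variant_1": 100}
    chronic_host = {"variant_1": 25, "variant_2": 25, "variant_3": 25, "variant_4": 25}
    print("acute host:", shannon_diversity(acute_host))
    print("chronic host:", shannon_diversity(chronic_host))
```

A population dominated by one variant scores zero, and the score rises as reads are spread more evenly over more variants, so tracking it across time points gives a simple view of viral population dynamics within a host.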

Conclusion
Metagenomic studies have huge potential to describe the diversity of the microbiome in the gut microflora and, most importantly, directly in infectious samples. Among all pathogens, viruses are among those that cause severe illness in humans. With rapid improvement in genomic sequencing techniques, the overall metagenomics approach is very valuable for the discovery of new viruses and novel genes, surveillance of pathogens, discovery of new pathways, and the study of host-virus interactions and function. The leads obtained through this exercise may have a great impact on early diagnosis and treatment. However, metagenomic studies also face limitations and challenges, which need to be overcome in the near future to obtain precise results. Unified genomic extraction techniques and the development of improved analysis modules may meet the needs of metagenomics in the future.

Conflict of interest
The authors declare no conflict of interest.