Open access peer-reviewed chapter

Determination of Nucleopolyhedrovirus’ Taxonomic Position

By Yu-Shin Nai, Yu-Feng Huang, Tzu-Han Chen, Kuo-Ping Chiu and Chung-Hsiung Wang

Submitted: May 19th 2016. Reviewed: October 28th 2016. Published: April 5th 2017.

DOI: 10.5772/66634


To date, over 78 genomes of nucleopolyhedroviruses (NPVs) have been sequenced and deposited in NCBI. How to define a new virus isolated from infected larvae in the field is usually the first question. Two NPV strains, isolated from casuarina moth (Lymantria xylina) and golden birdwing (Troides aeacus) larvae, respectively, raised this question. Based on the identity of their polyhedrin (polh) sequences to those of Lymantria dispar MNPV and Bombyx mori NPV, they were provisionally named LdMNPV-like virus and TraeNPV. To further clarify the relationships of LdMNPV-like virus and TraeNPV to closely related NPVs, Kimura 2-parameter (K-2-P) analysis was performed. The results showed that LdMNPV-like virus is an LdMNPV isolate, while TraeNPV had an ambiguous relationship to BmNPV. In addition, MaviNPV, which is a mini-AcMNPV, also gave a different picture by K-2-P analysis. Since K-2-P analysis cannot resolve all species determination issues, TraeNPV needs to be sequenced to define its taxonomic position. For this purpose, different genomic sequencing technologies and bioinformatic analysis approaches are discussed. We anticipate that these applications will help to examine the nucleotide information of unknown species and give insight into this issue.


  • nucleopolyhedroviruses
  • Kimura-2-parameter analysis
  • next-generation sequencing
  • bioinformatic analysis

1. Introduction

Baculoviruses are insect-specific viruses that have a large circular double-stranded DNA genome packaged in an enveloped, rod-shaped nucleocapsid and occluded within a paracrystalline protein occlusion body (OB) [1, 2]. The family Baculoviridae has four genera: Alphabaculovirus, Betabaculovirus, Gammabaculovirus and Deltabaculovirus. Nucleopolyhedrovirus (NPV) is a member of Alphabaculovirus (lepidopteran-specific NPV) [3]; NPV replicates in the nucleus of the infected host cell and causes nuclear polyhedrosis disease. Epidemic outbreaks of NPV may play a role in regulating the natural host population [4]. NPV is therefore a potential agent for biological control, with a number of eco-friendly benefits including high virulence and specificity against target insects, environmental safety and sustained persistence with target insects. Several baculoviruses showing promising results have been commercialized as biopesticides for the control of insect pests around the world [5]. For biotechnological applications, baculoviruses have been developed over the last 30 years as eukaryotic protein expression vectors (the baculovirus expression vector system (BEVS)) and used in gene therapy trials. So far, many recombinant proteins have been expressed in insect cells by BEVS and have contributed to human life [6].

To date, baculoviruses are known to infect more than 660 insect species, most of which belong to the orders Lepidoptera, Diptera and Hymenoptera [7, 8]. Baculoviruses exhibit genetic variation among species and their isolates [9]. Although a large number of baculoviruses exist in nature, only a few have been well studied. To the best of our knowledge, a total of 78 fully sequenced genomes have been deposited in GenBank [10], and the whole genomes of several more baculoviruses may soon be sequenced and deposited (Table 1). However, these published viral genomes represent only a small fraction, and the genetic relationships among nucleopolyhedroviruses (NPVs) in the natural environment remain a puzzle.

| Genus | Virus | Virus abbreviation | GenBank accession | Genome size (bp) | A | C | G | T | GC content | ORFs | Sequencing | Assembler | Reference |
| Alphabaculovirus (Group I) | Anticarsia gemmatalis MNPV | AgMNPV-2D | NC_008520 | 132,239 | 36,623 | 29,338 | 29,513 | 36,765 | 44.5% | 158 | Sanger | PHRED/ALIGNER | [11] |
| | | AgMNPV-26 | KR815455 | 131,678 | 36,411 | 29,288 | 29,405 | 36,574 | 44.6% | 157 | Roche 454 GS FLX | Geneious | [12] |
| | Antheraea pernyi NPV | AnpeNPV | NC_008035 | 126,629 | 29,513 | 34,041 | 33,664 | 29,406 | 53.5% | 147 | Sanger | ContigExpress9.1.0 + SeqMan5.0/DNASTAR | [13] |
| | Autographa californica MNPV | AcMNPV | NC_001623 | 133,894 | 39,195 | 27,151 | 27,347 | 40,201 | 40.7% | 156 | Sanger | GCG package | [14] |
| | Autographa californica MNPV-WP10 | AcMNPV-WP10 | KM609482 | 133,926 | 39,205 | 27,157 | 27,346 | 40,199 | 40.7% | 151 | Illumina HiSeq 2000 | Newbler | [15] |
| | Bombyx mandarina NPV | BomaNPV | NC_012672 | 126,770 | 37,358 | 25,398 | 25,601 | 38,413 | 40.2% | 141 | Solexa GA | GENETYX-win Software + DNASTAR | [16] |
| | Bombyx mori NPV | BmNPV | NC_001962 | 128,413 | 37,747 | 25,828 | 26,056 | 38,782 | 40.4% | 143 | Sanger | DNASIS/PROSIS | [17] |
| | Catopsilia pomona NPV | CapoNPV | KU565883 | 128,058 | 38,938 | 25,348 | 25,444 | 38,328 | 39.7% | 131 | Roche 454 GS FLX+ | GS de novo assembler | [10] |
| | Choristoneura fumiferana DEF MNPV | CfDEFMNPV | NC_005137 | 131,160 | 35,474 | 30,110 | 29,993 | 35,580 | 45.8% | 149 | Sanger | MacVector + Lasergene/DNASTAR | [18] |
| | Choristoneura fumiferana MNPV | CfMNPV | NC_004778 | 129,593 | 32,224 | 32,656 | 32,261 | 32,452 | 50.1% | 146 | Sanger | Gene Runner | [19] |
| | Choristoneura murinana NPV | ChmuNPV | NC_023177 | 124,688 | 31,408 | 30,986 | 31,370 | 30,924 | 50.0% | 147 | Roche 454 | CLC Genomics Workbench | [20] |
| | Choristoneura occidentalis NPV | ChocNPV | NC_021925 | 128,446 | 32,108 | 31,905 | 32,481 | 31,952 | 50.1% | 148 | Roche 454 GS FLX | SeqMan Pro Lasergene/DNASTAR | [21] |
| | Choristoneura rosaceana NPV | ChroNPV | NC_021924 | 129,052 | 33,309 | 31,261 | 31,425 | 33,057 | 48.6% | 149 | | | |
| | Condylorrhiza vestigialis MNPV | CoveMNPV | NC_026430 | 125,767 | 35,904 | 26,937 | 27,038 | 35,886 | 42.9% | 138 | Roche 454 | Geneious + MIRA | [22] |
| | Dasychira pudibunda NPV | DapuNPV | KP747440 | 136,761 | 31,022 | 37,008 | 37,454 | 31,277 | 54.4% | 161 | Illumina MiSeq | Geneious | [23] |
| | Ectropis obliqua NPV | EcobNPV | NC_008586 | 131,204 | 40,683 | 24,676 | 24,708 | 41,137 | 37.6% | 126 | Sanger | Genetyx-win | [24] |
| | Hyphantria cunea NPV | HycuNPV | NC_007767 | 132,959 | 36,031 | 30,039 | 30,465 | 36,424 | 45.5% | 148 | RISA-384 | DNASIS | [25] |
| | Lonomia obliqua MNPV | LoobMNPV | KP763670 | 120,023 | 38,995 | 20,932 | 21,966 | 38,104 | 35.7% | 134 | Roche 454 GS FLX | Geneious | [26] |
| | Maruca vitrata MNPV | MaviMNPV | NC_008725 | 111,953 | 34,041 | 21,669 | 21,563 | 34,680 | 38.6% | 126 | Sanger | PHRED/PHRAP | [27] |
| | Orgyia pseudotsugata MNPV | OpMNPV | NC_001875 | 131,995 | 29,463 | 36,477 | 36,295 | 29,758 | 55.1% | 152 | Sanger | GCG package | [28] |
| | Philosamia cynthia ricini NPV | PhcyNPV | JX404026 | 125,376 | 28,966 | 33,461 | 33,809 | 29,140 | 53.7% | 138 | Sanger | N/A¹ | [29] |
| | Plutella xylostella MNPV | PlxyMNPV | NC_008349 | 134,417 | 39,437 | 27,303 | 27,396 | 40,281 | 40.7% | 152 | Sanger | Lasergene/DNASTAR | [30] |
| | Rachiplusia ou MNPV | RoMNPV | NC_004323 | 131,526 | 39,674 | 25,630 | 25,793 | 40,429 | 39.1% | 149 | Sanger | Wisconsin package + Lasergene/DNASTAR | [31] |
| | Thysanoplusia orichalcea NPV | ThorNPV | NC_019945 | 132,978 | 40,022 | 26,388 | 26,142 | 40,426 | 39.5% | 145 | Solexa GA | Edena | [32] |
| Alphabaculovirus (Group II) | Adoxophyes honmai NPV | AdhoNPV | NC_004690 | 113,220 | 36,505 | 20,025 | 20,328 | 36,362 | 35.6% | 125 | RISA-384 | PHRED/PHRAP | [33] |
| | Adoxophyes orana NPV | AdorNPV | NC_011423 | 111,724 | 36,306 | 19,404 | 19,694 | 36,320 | 35.0% | 121 | Sanger | SeqMan II Lasergene/DNASTAR | [34] |
| | Agrotis ipsilon MNPV | AgipMNPV | NC_011345 | 155,122 | 40,201 | 37,490 | 37,860 | 39,571 | 48.6% | 163 | Sanger | Lasergene/DNASTAR | [35] |
| | Agrotis segetum NPV | AgseNPV | NC_007921 | 147,544 | 40,237 | 33,200 | 34,247 | 39,860 | 45.7% | 153 | Sanger | Gap4 | [36] |
| | Agrotis segetum NPV B | AgseNPV-B | NC_025960 | 148,981 | 40,490 | 33,698 | 34,371 | 40,422 | 45.7% | 150 | Roche 454 | DNASTAR | [37] |
| | Apocheima cinerarium NPV | ApciNPV | NC_018504 | 123,876 | 41,223 | 20,865 | 20,449 | 41,332 | 33.4% | 117 | Sanger | SeqMan Pro Lasergene/DNASTAR | unpublished |
| | Buzura suppressaria NPV | BusuNPV | NC_023442 | 120,420 | 37,568 | 22,152 | 22,142 | 38,558 | 36.8% | 127 | Roche 454 GS FLX | GS de novo assembler | [38] |
| | Chrysodeixis chalcites NPV | ChchNPV | NC_007151 | 149,622 | 45,151 | 29,304 | 29,060 | 46,107 | 39.0% | 151 | Sanger | Gap4 | [39] |
| | Chrysodeixis chalcites SNPV | ChchSNPV-TF1-A | JX535500 | 149,684 | 45,090 | 29,324 | 29,133 | 46,137 | 39.1% | 150 | Roche 454 | Newbler | [40] |
| | Clanis bilineata NPV | ClbiNPV | NC_008293 | 135,454 | 41,557 | 25,560 | 25,558 | 42,779 | 37.7% | 129 | Sanger | N/A | [41] |
| | Epiphyas postvittana NPV | EppoNPV | NC_003083 | 118,584 | 35,221 | 24,287 | 23,956 | 35,120 | 40.7% | 136 | Sanger | DNASTAR | [42] |
| | Euproctis pseudoconspersa NPV | EupsNPV | NC_012639 | 141,291 | 41,736 | 28,455 | 28,549 | 42,551 | 40.3% | 139 | Sanger | Wisconsin package + GENETYX-win | [43] |
| | Helicoverpa armigera SNPV AC53 | HaSNPV-AC53 | NC_024688 | 130,442 | 39,121 | 25,389 | 25,606 | 40,326 | 39.1% | 138 | Ion Torrent PGM | CLC Genomics Workbench | [44] |
| | Helicoverpa armigera MNPV | HearMNPV | NC_011615 | 154,196 | 46,371 | 30,731 | 31,060 | 46,031 | 40.1% | 162 | Sanger | SeqMan 5.0/DNASTAR | [45] |
| | Helicoverpa armigera NPV | HearNPV | NC_003094 | 130,759 | 39,345 | 25,340 | 25,552 | 40,522 | 38.9% | 137 | Sanger | Wisconsin package + Lasergene/DNASTAR | [46,47] |
| | Helicoverpa armigera NPV G4 | HearNPV-G4 | NC_002654 | 131,405 | 39,529 | 25,530 | 25,738 | 40,608 | 39.0% | 135 | Sanger | PHRED/PHRAP | [48] |
| | Helicoverpa armigera NPV NNg1 | HearNPV-NNg1 | NC_011354 | 132,425 | 39,754 | 25,791 | 26,054 | 40,826 | 39.2% | 143 | RISA-384 | DNASIS | [49] |
| | Helicoverpa zea SNPV | HzSNPV | NC_003349 | 130,869 | 39,273 | 25,471 | 25,675 | 40,450 | 39.1% | 139 | Sanger | Wisconsin package + Lasergene/DNASTAR | [50] |
| | Hemileuca sp. NPV | HespNPV | NC_021923 | 140,633 | 42,827 | 26,977 | 26,595 | 44,234 | 38.1% | 137 | Sanger | Wisconsin package + Lasergene/DNASTAR | [51] |
| | Lambdina fiscellaria NPV | LafiNPV | NC_026922 | 157,977 | 45,363 | 34,616 | 34,350 | 43,648 | 43.7% | 137 | Roche 454 | CLC Genomics Workbench | [52] |
| | Leucania separata NPV | LeseNPV | NC_008348 | 168,041 | 42,546 | 40,683 | 40,927 | 43,885 | 48.6% | 169 | MegaBACE1000 | DNASTAR | [53] |
| | Lymantria dispar MNPV | LdMNPV | NC_001973 | 161,046 | 34,229 | 46,226 | 46,331 | 34,260 | 57.5% | 164 | Sanger | GCG package | [54] |
| | Lymantria dispar MNPV-27 | LdMNPV-27 | KP027546 | 164,158 | 35,020 | 47,133 | 47,118 | 34,887 | 57.4% | 162 | Illumina MiSeq | CLC Genomics Workbench | [55] |
| | Lymantria dispar MNPV-BNP | LdMNPV-BNP | KU377538 | 157,270 | 38,788 | 39,579 | 39,567 | 39,336 | 50.3% | 154 | Illumina MiSeq | Geneious | [56] |
| | Lymantria dispar MNPV-2161 | LdMNPV-2161 | KF695050 | 163,138 | 34,855 | 46,648 | 46,812 | 34,823 | 57.3% | 174 | Roche 454 GS Junior | SeqMan NGEN Lasergene/DNASTAR | [9] |
| | Lymantria dispar MNPV-3029 | LdMNPV-3029 | KM386655 | 161,712 | 34,321 | 46,434 | 46,457 | 34,500 | 57.4% | 163 | Roche 454 | Lasergene/DNASTAR | [57] |
| | Lymantria dispar MNPV-45 | LdMNPV-45 | KU862282 | 161,006 | 34,234 | 46,192 | 46,314 | 34,264 | 57.5% | 155 | Illumina | CLC Genomics Workbench | [58] |
| | Lymantria dispar MNPV-3054 | LdMNPV-3054 | KT626570 | 164,478 | 35,151 | 47,119 | 47,140 | 35,068 | 57.3% | 174 | Roche 454 GS Junior | LaserGene/DNASTAR | [59] |
| | Lymantria dispar MNPV-3041 | LdMNPV-3041 | KT626571 | 162,658 | 34,715 | 46,478 | 46,647 | 34,818 | 57.3% | 178 | | | |
| | Lymantria dispar MNPV-Ab-a624 | LdMNPV-Ab-a624 | KT626572 | 161,321 | 34,282 | 46,302 | 46,405 | 34,332 | 57.5% | 176 | | | |
| | Lymantria xylina MNPV | LyxyMNPV | NC_013953 | 156,344 | 36,207 | 41,674 | 41,933 | 36,530 | 53.5% | 157 | Sanger | PHRED/PHRAP | [60] |
| | Mamestra brassicae MNPV | MabrMNPV | NC_023681 | 152,710 | 46,042 | 30,311 | 30,604 | 45,753 | 39.9% | 159 | Roche 454 | GS de novo assembler | [61] |
| | Mamestra configurata NPV-A | MacoNPV-A | NC_003529 | 155,060 | 45,336 | 32,160 | 32,463 | 45,101 | 41.7% | 169 | Sanger | Wisconsin package + Lasergene/DNASTAR | [62] |
| | Mamestra configurata NPV-B | MacoNPV-B | NC_004117 | 158,482 | 47,831 | 31,504 | 31,953 | 47,194 | 40.0% | 168 | Sanger | Sequencher 4.0 | [63] |
| | Orgyia leucostigma NPV | OrleNPV | NC_010276 | 156,179 | 46,420 | 31,270 | 31,020 | 47,469 | 39.9% | 135 | Sanger | Agencourt BioScience | [64] |
| | Peridroma NPV | PespNPV | NC_024625 | 151,109 | 35,060 | 40,593 | 39,822 | 35,633 | 53.2% | 139 | Roche 454 | CLC Genomics Workbench | [65] |
| | Perigonia lusca single NPV | PeluNPV | NC_027923 | 132,831 | 39,968 | 26,167 | 26,362 | 40,256 | 39.6% | 145 | Roche 454 | Geneious | unpublished |
| | Pseudoplusia includens SNPV | PsinNPV | NC_026268 | 139,132 | 41,843 | 27,452 | 27,210 | 42,609 | 39.3% | 141 | Roche 454 GS FLX | MIRA | [66] |
| | Spodoptera exigua MNPV | SeMNPV | NC_002169 | 135,611 | 38,445 | 29,486 | 29,929 | 37,751 | 43.8% | 139 | Sanger | Wisconsin package + Lasergene/DNASTAR | [67] |
| | Spodoptera frugiperda MNPV virus | SfMNPV | NC_009011 | 131,331 | 39,417 | 26,346 | 26,507 | 39,061 | 40.2% | 143 | Sanger | Lasergene/DNASTAR | [68] |
| | Spodoptera litura MNPV | SpliMNPV-AN1956 | JX454574 | 137,998 | 37,469 | 30,803 | 30,846 | 38,880 | 44.7% | 132 | Roche 454 GS Junior | LaserGene/DNASTAR | [69] |
| | Spodoptera litura NPV | SpltNPV | NC_003102 | 139,342 | 39,180 | 29,691 | 29,904 | 40,567 | 42.8% | 141 | MegaBACE1000 | DNASIS + DNASTAR | [70] |
| | Spodoptera litura NPV II | SpltNPV-II | NC_011616 | 148,634 | 40,998 | 33,210 | 33,671 | 40,755 | 45.0% | 147 | n/a | N/A | unpublished |
| | Sucra jujuba NPV | SujuNPV | KJ676450 | 135,952 | 41,395 | 26,157 | 26,399 | 42,001 | 38.7% | 131 | Roche 454 | GS de novo assembler | [71] |
| | Trichoplusia ni SNPV | TnSNPV | NC_007383 | 134,394 | 40,601 | 6,256 | 26,117 | 41,384 | 39.0% | 145 | Sanger | PHRED/PHRAP | [72] |
| Betabaculovirus | Adoxophyes orana granulovirus | AdorGV | NC_005038 | 99,657 | 33,077 | 17,098 | 17,275 | 32,207 | 34.5% | 119 | Sanger | SeqMan II Lasergene/DNASTAR | [73] |
| | Agrotis segetum granulovirus | AgseGV | NC_005839 | 131,680 | 41,892 | 25,179 | 23,953 | 40,656 | 37.3% | 132 | n/a | n/a | unpublished |
| | Clostera anastomosis GV isolate Henan | ClasGV-A | NC_022646 | 101,818 | 27,115 | 23,832 | 23,739 | 27,132 | 46.7% | 122 | Illumina GA | SOAPdenovo | [74] |
| | Clostera anastomosis granulovirus-B | ClasGV-B | KR091910 | 107,439 | 33,648 | 19,904 | 20,673 | 33,214 | 37.8% | 123 | Roche 454 GS FLX | Newbler | [75] |
| | Cnaphalocrocis medinalis GV | CnmeGV | NC_029304 | 111,246 | 36,021 | 19,756 | 19,385 | 36,084 | 35.2% | 118 | Roche 454 GS FLX | GS de novo assembler | [76] |
| | Cnaphalocrocis medinalis granulovirus | CnmeGV | KP658210 | 112,060 | 36,295 | 19,904 | 19,529 | 36,332 | 35.2% | 133 | PacBio RS II | HGAP2.2.0 | [77] |
| | Choristoneura occidentalis GV | ChocGV | NC_008168 | 104,710 | 36,132 | 17,268 | 16,938 | 34,372 | 32.7% | 116 | Sanger | PHRED/PHRAP | [78] |
| | Clostera anachoreta granulovirus | ClanGV | NC_015398 | 101,487 | 28,188 | 22,554 | 22,523 | 28,222 | 44.4% | 123 | Illumina GA | SOAPdenovo | [79] |
| | Cydia pomonella granulovirus | CpGV | NC_002816 | 123,500 | 34,029 | 27,722 | 28,183 | 33,566 | 45.3% | 143 | Sanger | Wisconsin package + Lasergene/DNASTAR | [80] |
| | Cryptophlebia leucotreta granulovirus | CrleGV | NC_005068 | 110,907 | 38,095 | 18,090 | 17,890 | 36,832 | 32.4% | 128 | Sanger | Lasergene/DNASTAR | [81] |
| | Diatraea saccharalis granulovirus | DisaGV | NC_028491 | 98,392 | 32,133 | 17,032 | 17,337 | 31,880 | 34.9% | 125 | Roche 454 | Geneious | [82] |
| | Epinotia aporema granulovirus | EpapGV | NC_018875 | 119,082 | 35,524 | 24,984 | 24,403 | 34,171 | 41.5% | 132 | Roche 454 GS FLX | Newbler | [83] |
| | Erinnyis ello granulovirus | ErelGV | NC_025257 | 102,759 | 31,707 | 19,440 | 20,324 | 31,288 | 38.7% | 130 | Roche 454 GS FLX | Geneious | [84] |
| | Helicoverpa armigera granulovirus | HearGV | NC_010240 | 169,794 | 50,336 | 34,518 | 34,810 | 50,130 | 40.8% | 179 | Sanger | SeqMan Lasergene/DNASTAR | [85] |
| | Plodia interpunctella granulovirus | PiGV | KX151395² | 112,536 | n/a | n/a | n/a | n/a | n/a | 123 | Roche 454 GS Junior | SeqMan NGEN Lasergene/DNASTAR | [86] |
| | Phthorimaea operculella granulovirus | PhopGV | NC_004062 | 119,217 | 38,306 | 21,127 | 21,431 | 38,353 | 35.7% | 130 | Sanger | N/A | [87] |
| | Plutella xylostella granulovirus | PlxyGV | NC_002593 | 100,999 | 30,252 | 20,546 | 20,546 | 29,655 | 40.7% | 120 | DSQ-1000 L | GENETYX-win | [88] |
| | Pieris rapae granulovirus | PrGV | NC_013797 | 108,592 | 36,619 | 17,863 | 18,168 | 35,942 | 33.2% | 120 | Sanger | N/A | [89] |
| | Pseudaletia unipuncta granulovirus | PsunGV | NC_013772 | 176,677 | 53,572 | 34,993 | 35,311 | 52,799 | 39.8% | 183 | n/a | N/A | unpublished |
| | Spodoptera frugiperda GV isolate VG008 | SpfrGV | NC_026511 | 140,913 | 38,131 | 32,852 | 32,288 | 37,642 | 46.2% | 146 | Roche 454 GS FLX | Newbler | [90] |
| | Spodoptera litura granulovirus | SpliGV | NC_009503 | 124,121 | 38,360 | 23,813 | 24,377 | 37,571 | 38.8% | 136 | Sanger | N/A | [91] |
| | Xestia c-nigrum granulovirus | XcGV | NC_002331 | 178,733 | 53,166 | 36,079 | 36,627 | 52,861 | 40.7% | 181 | Sanger | DNASIS/PROSIS | [92] |
| Gammabaculovirus | Neodiprion abietis NPV | NeabNPV | NC_008252 | 84,264 | 28,292 | 13,948 | 14,177 | 27,847 | 33.4% | 93 | Sanger | PHRED/PHRAP | [93] |
| | Neodiprion lecontei NPV | NeleNPV | NC_005906 | 81,755 | 27,741 | 13,596 | 13,640 | 26,616 | 33.4% | 89 | Sanger | SeqMan Lasergene/DNASTAR | [94] |
| | Neodiprion sertifer NPV | NeseNPV | NC_005905 | 86,462 | 29,158 | 14,444 | 14,745 | 28,115 | 33.8% | 90 | Sanger | Sequencher 4.1 | [95] |
| Deltabaculovirus | Culex nigripalpus NPV | CuniNPV | NC_003084 | 108,252 | 26,623 | 27,228 | 27,839 | 26,562 | 50.9% | 109 | Sanger | CAP3 | [96] |

Table 1.

List of sequenced baculovirus genomes.

¹N/A: no information is available either in the paper or in the GenBank file.

²The GenBank file with accession number KX151395 is not available on the GenBank website.
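The GC content column of Table 1 follows directly from the four base counts. As a minimal sketch (the function name `gc_content` is hypothetical and not part of any cited tool), the calculation for the AgMNPV-2D row can be reproduced as follows:

```python
def gc_content(a, c, g, t):
    """Return GC content in percent from the counts of the four bases."""
    total = a + c + g + t
    if total == 0:
        raise ValueError("base counts must not all be zero")
    return 100.0 * (c + g) / total


# Base counts for AgMNPV-2D (NC_008520) as listed in Table 1.
agmnpv_2d = gc_content(a=36623, c=29338, g=29513, t=36765)
print(round(agmnpv_2d, 1))  # 44.5
```

The same check can be used to spot rows whose base counts do not add up to the stated genome size, which may indicate ambiguous bases or a transcription error.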

Previously, Sanger sequencing was employed to sequence viral genomic fragments cloned in plasmids. With advances in sequencing technologies, next-generation sequencing (NGS) is becoming an important technology for large-scale viral genomic sequencing. The high cost of NGS and the requirement for intensive bioinformatic analysis remain hurdles for this application. Nevertheless, NGS is an available tool to facilitate the study of the genetic relationships of baculoviruses.

2. Identification of NPVs

Biochemical and biotechnology-based methods are the most common approaches employed to identify NPVs. In most cases, more than one method is employed so that the strengths of one compensate for the weaknesses of another. For example, restriction enzyme profiling of viral genomic DNA was used to reveal genetic variations among different isolates [97–99] and to distinguish closely related viruses from one another, such as Rachiplusia ou (RoMNPV), AcMNPV, Trichoplusia ni (TnMNPV) and Galleria mellonella (GmMNPV) [100, 101] and the MNPVs of Spodoptera frugiperda [102].

Polymerase chain reaction (PCR)-based methods were then established. These methods have been shown to be not only more sensitive and faster but also more reliable than restriction enzyme analysis for classifying baculoviral species [4, 103–105]. Multiple genetic markers (e.g., egt, ac17, lef-2, polh, p35, pif-2) can be used for the identification of baculoviruses [7, 106–109]. The late expression factor 8 (lef-8), late expression factor 9 (lef-9) and polyhedrin (polh) genes were found to be highly conserved among baculoviruses [110] and were therefore used as targets for degenerate PCR to characterize lepidopteran NPVs through the amplification of conserved regions from a wide range of baculoviruses [111–113]. The Kimura 2-parameter (K-2-P) distances between the aligned polh/gran, lef-8 and lef-9 nucleotide sequences were described by Jehle et al. for baculovirus identification and species classification [3]. The K-2-P distances between aligned nucleotide sequences were determined using the pairwise distance calculation of MEGA version 3.0 with the Kimura 2-parameter model [114].

Because of the higher cost of NGS for viral genome sequencing, it is frequently necessary to combine various approaches to cut costs while still ensuring precision, e.g., PCR-based K-2-P analysis combined with an NGS approach for identifying potential new NPV species. Two NPVs, isolated from casuarina moth (Lymantria xylina) and golden birdwing (Troides aeacus) larvae collected from the field, respectively, serve as representative cases in the following sections. We first focus on the characterization of these two potential new NPVs and then use the sequences of three genes, lef-8, lef-9 and polyhedrin, of the two NPV candidates to examine their taxonomic position by K-2-P analysis. Finally, we focus on genome sequencing technology and bioinformatic analysis of NPVs.

3. The identification of ambiguous NPVs

In this section, the molecular identification of NPV species based on K-2-P distance [3] is discussed. Two new NPVs are used as examples to reveal different issues regarding the classification of NPVs.

3.1. LdMNPV-like virus

The K-2-P distances between different viruses, based on the sequences of three genes, can in most cases resolve ambiguous relationships among NPVs. A distance of less than 0.015 indicates that two isolates belong to the same baculovirus species, whereas a distance of more than 0.05 indicates that the two viruses should be considered different species. For distances between 0.015 and 0.05, complementary information is needed to determine whether the two viruses are of the same or different species [3, 9, 115].
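The three-way decision rule can be written down compactly. This is only a sketch of the thresholds stated above; the function name `classify_k2p` and the returned labels are hypothetical.

```python
def classify_k2p(distance, same_max=0.015, different_min=0.05):
    """Apply the K-2-P species demarcation thresholds to one distance."""
    if distance < same_max:
        return "same species"
    if distance > different_min:
        return "different species"
    # Between the two thresholds, complementary information is required.
    return "ambiguous"


for d in (0.010, 0.016, 0.067):
    print(d, classify_k2p(d))
```

A value such as 0.016, which only slightly exceeds the lower threshold, falls in the ambiguous band and illustrates why a single gene distance should be read together with the other genes and the concatenated sequence.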

A new multiple nucleopolyhedrovirus strain was isolated from the casuarina moth, L. xylina Swinhoe (Lepidoptera: Lymantriidae), in Taiwan. Since the polyhedrin sequence of this virus had high identity (98%) to that of L. dispar MNPV, it was named LdMNPV-like virus [116]. To clarify the relationship of three Lymantriidae-derived NPVs (LdMNPV-like virus, LdMNPV and LyxyMNPV [60]), K-2-P analysis of polh, lef-8 and lef-9 was performed. The distances between LdMNPV-like virus and LyxyMNPV exceeded 0.05 for each gene (polh, lef-8 and lef-9) and also for the concatenated polh/lef-8/lef-9 sequence (Figure 1). Between LdMNPV-like virus and LdMNPV, the distances for lef-8, lef-9 and the concatenated polh/lef-8/lef-9 sequence were lower than 0.015; only the polh distance (0.016) slightly exceeded 0.015 (Figure 1). These results strongly suggested that LdMNPV-like virus is an isolate of LdMNPV. However, as indicated by our previous report, the genome of LdMNPV-like virus is approximately 139 kb, smaller than that of LdMNPV owing to large deletions [116]. To further investigate the LdMNPV-like virus, a HindIII-PstI fragment (7,054 nucleotides) was cloned, sequenced and compared to the corresponding region of LdMNPV. Nine putative ORFs (seven full-length and two partial) and two homologous regions (hrs) were identified in this fragment (Figure 2). In order from the 5′ to the 3′ end, these ORFs encoded part of rr1, ctl-1, Ange-bro-c, LdOrf-151 and LdOrf-152-like peptides, Ld-bro-n, two Ld-bro-o and part of an LdOrf-155-like peptide (Table 2). The physical map of the HindIII-PstI fragment of LdMNPV-like virus showed that the gene organization was highly conserved compared to the corresponding region of LdMNPV, although several restriction enzyme recognition sites were different.
Additionally, the ld-bro-o gene in the LdMNPV-like virus was split into two ORFs, ORF7 and ORF8, owing to a point deletion downstream (+669) in ORF7; this deletion causes a frameshift that creates a stop codon (TGA) 73 bp further on. ORF8 overlaps the last four base pairs (ATGA) of ORF7. The nucleotide sequences of these genes were 96–100% identical to those of LdMNPV, except for ORF3, which was 68% identical to Ange-bro-c, and ORF7 and ORF8, which showed low identities to Ld-bro-o (73% and 26%, respectively). The deduced amino acid sequences were similar to those of LdMNPV, with identities of 81–100%, except that ORF3 was 70% identical to Ange-bro-c and ORF7 and ORF8 showed low identity to Ld-bro-o (67% and 26%, respectively). These results imply that the LdMNPV-like virus and LdMNPV are closely related but not identical.

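| No.* | LdMNPV-like virus: position | LdMNPV-like virus: length (nt) | LdMNPV-like virus: length (aa) | LdMNPV§: name | LdMNPV§: identity (%), nt | LdMNPV§: identity (%), aa |
| 1 | 1 → 654 | 654 | 217 | rr1 | 96 | 81 |
| 2 | 1063 → 1224 | 162 | 53 | Ctl-1 | 100 | 100 |
| 3 | 1397 → 2473 | 1077 | 358 | Ange-bro-c | 68 | 70 |
| 4 | 2590 → 3596 | 504 | 168 | LdOrf-151 | 99 | 98 |
| 5 | 3200 → 3952 | 753 | 251 | LdOrf-152 | 99 | 99 |
| 6 | 4019 → 5026 | 1005 | 335 | Ld-bro-n | 93 | 91 |
| 7 | 5645 → 6391 | 744 | 248 | Ld-bro-o | 73 | 67 |
| 8 | 6388 → 6654 | 264 | 88 | Ld-bro-o | 26 | 26 |
| 9 | 6758 → 7054 | 297 | 99 | LdOrf-155 | 100 | 100 |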

Table 2.

Comparison of the nucleotide (nt) and deduced amino acid (aa) sequences for putative ORFs in LdMNPV-like virus genomic fragment and their corresponding LdMNPV homologues.

The directions of the transcripts are indicated by arrows.

§Reference from the genome of LdMNPV (Kuzio et al. [63])

*The nine potentially expressed ORFs are numbered in the order in which they occur in the LdMNPV-like virus genomic fragment from the 5′ to the 3′ end. The two ORFs that extend past the cloning sites are printed in bold; for these, only the N-terminal portions, comprising 217 amino acids (654 nucleotides) and 99 amino acids (297 nucleotides), were examined.

Figure 1.

Pairwise K-2-P distances of the nucleotide sequences of polh, lef-8 and lef-9 and concatenated polh/lef-8/lef-9 fragments of LdMNPV-like virus, LyxyMNPV and LdMNPV. Modified data reproduced with permission of Elsevier [116].

Figure 2.

Comparison of relative restriction sites and gene locations in the LdMNPV-like virus HindIII-PstI fragment with those of the corresponding LdMNPV fragment. Arrows denote ORFs and their direction of transcription. Gray boxes represent the homologous repeat regions (hrs). ORF homologues in the corresponding regions are drawn with the same patterns. Numbers below the arrows indicate the nine putative ORFs listed in Table 2 .

Based on these results, LdMNPV-like virus has a genome significantly smaller than those of LdMNPV and LyxyMNPV and appears to be an NPV isolate distinct from both. Moreover, one gene of LdMNPV-like virus, ld-bro-o, was split into two ORFs (ORF7 and ORF8), and its sequence showed relatively low identity to that of LdMNPV (Table 2). Taken together, these results indicate that LdMNPV-like virus is a distinct LdMNPV strain with several novel features. In addition, LdMNPV-like virus and LdMNPV come from distinct geographical locations (subtropical and cold temperate zones, respectively) and differ in genotypic and phenotypic characteristics; broad genetic variation has also been reported among LdMNPV isolates [9].

3.2. An NPV isolate from T. aeacus larvae

A nucleopolyhedrosis disease was found in reared larvae of the golden birdwing butterfly (T. aeacus), and polyhedral inclusion bodies (PIBs) were observed under light microscopy (Figure 3). To further confirm NPV infection, PCR was performed to amplify the polh gene with the 35/36 primer set (Figure 3) [117, 118]. This NPV was therefore provisionally named TraeNPV. The three genes polh, lef-8 and lef-9 of TraeNPV were cloned and sequenced, and the K-2-P distances between the aligned single and concatenated polh, lef-8 and lef-9 nucleotide sequences were then analyzed. The results indicated that TraeNPV belongs to the group I baculoviruses and is closely related to the BmNPV group. Figure 4 shows that most of the distances between TraeNPV and other NPVs were between 0.015 and 0.050, whereas the polh distances between TraeNPV and the PlxyMNPV, RoMNPV and AcMNPV group exceeded 0.05. It should be noted that for all the concatenated polh/lef-8/lef-9 sequences, the distances were clearly greater than 0.015 and ranged up to 0.05. These results leave the status of this NPV isolate ambiguous; so far, we can only conclude that TraeNPV belongs to neither the BmNPV group nor the AcMNPV group. More complementary information is needed to determine the viral species of TraeNPV.

Figure 3.

Identification of an unknown NPV. (A) Light microscopy observation of liquefied tissue from the cadavers of T. aeacus larvae, scale bar = 20 μm. Black arrows indicate the polyhedral inclusion bodies (PIBs). (B) PCR detection of the partial polyhedrin gene, M = 100 bp marker, (+) = positive control and (−) = negative control.

Figure 4.

Pairwise Kimura-2-parameter distances of the nucleotide sequences of lef-8, lef-9 and polh and concatenated polh/lef-8/lef-9 fragments of TraeNPV and 12 viruses.

In summary, K-2-P distances were employed to further clarify the relationships between closely related NPVs, and two different cases analyzed by K-2-P were discussed. The sequence data strongly supported that LdMNPV-like virus is an isolate of LdMNPV. The RFLP profiles of LdMNPV-like virus showed that the genome of this isolate carries large deletions, and these deletions were consistent with our partial genomic sequences and with the K-2-P results. The K-2-P distances between TraeNPV and BmNPV or AcMNPV were between 0.015 and 0.05. We therefore cannot determine whether this virus is a new species from the RFLP evidence, partial gene sequences and K-2-P results alone; more data are needed, especially the whole genome sequence of TraeNPV.

4. The importance of whole genome sequencing on baculoviruses

The rapidly growing mass of genomic data is shifting taxonomic approaches from traditional to genome-based methods. The K-2-P distances supported LyxyMNPV as a viral species different from LdMNPV (K-2-P values = 0.067–0.088), even though the two are closely related phylogenetically. "How different are LyxyMNPV and LdMNPV?" then becomes another question, and the whole genome sequence can provide in-depth information to answer it. For example, the genomic data revealed that most ORFs (151 ORFs) were shared between LyxyMNPV and LdMNPV and were quite similar, but several ORFs were present in only one of the two genomes: in the LyxyMNPV genome, two ORFs were homologous to those of other baculoviruses and four were unique, while LdMNPV contains 23 ORFs that are absent in LyxyMNPV [60]. In addition, there is a large genomic inversion in LyxyMNPV compared to LdMNPV [60]. Another example is Maruca vitrata NPV (MaviNPV). All of the K-2-P distances indicated that MaviNPV is quite different from other NPVs (K-2-P values = 0.092–0.237) (Figure 5). However, genome sequencing showed that the gene content and gene order of MaviNPV are highly similar to those of AcMNPV and BmNPV: the genome is 100% collinear with that of AcMNPV [27], and MaviNPV shares 125 ORFs with AcMNPV and 123 with BmNPV. Such detailed information can only be captured by whole genome sequencing rather than by partial gene sequences or other phylogenetic analyses. K-2-P data may also raise other problems, as mentioned above: by K-2-P distance, LdMNPV-like virus and LdMNPV appear to be the same viral species, yet the restriction enzyme profile and partial genomic data revealed deleted fragments and different gene contents within the LdMNPV-like virus genome. For TraeNPV, most of the K-2-P values ranged from 0.015 to 0.05; thus, whole genome sequencing could be one of the best ways to resolve this ambiguous state.
The more detailed the information we obtain, the more deeply we can address issues such as taxonomic problems and further evolutionary studies.

Figure 5.

Pairwise Kimura-2-parameter distances of the nucleotide sequences of lef-8, lef-9 and polh and concatenated polh/lef-8/lef-9 fragments of MaviNPV and 12 viruses.

Figure 6.

Common bioinformatic workflow for genome assembly and analysis.

5. Genomic sequences of NPVs

5.1. Genome sequencing technology

Previous NPV genome sequencing employed three types of approaches: plasmid clone (or template) enrichment, NGS, or a combination of the two. Initially, the most common approach used restriction enzymes to fragment the viral genome into smaller pieces. Plasmid-based clone amplification was then employed to enrich templates for sequencing. Later, conventional Sanger sequencing and/or next-generation sequencing was employed for genome assembly. In addition, purely high-throughput sequencing-based approaches starting from the isolated viral genome have also been employed [9, 15]. To date, next-generation sequencing technology plays an increasingly important role in viral genome assembly. Previous research showed that Illumina HiSeq outperforms 454 FLX in yield [119–121]. Baculoviruses usually contain a novel homologous region (hr) feature, which comprises a palindrome that is usually flanked by short direct repeats located elsewhere in the genome [122]. Consequently, the shorter single-read length of Illumina sequencers may cause difficulty during genome assembly. Paired-end read sequencing could provide an alternative for sequencing across the hrs in baculoviral genomes.

5.2. Bioinformatic analysis

Construction of a complete genome map is essential for future genomic investigations. Besides sequencing, bioinformatic approaches are also required for determining the order and content of the nucleotide sequence information for the viral genome of interest. In general, bioinformatic approaches can be separated into three consecutive steps: genome assembly, genome annotation and phylogenetic relationship inference (Figure 6).

5.2.1. Genome assembly

Sequence reads are the building blocks for genome sequencing and assembly. Thus, quality control of sequence reads plays a key role in determining the fidelity of a genome assembly. The procedure of read quality checking includes, but is not limited to, the removal of unrelated sequences (such as control sequences, adaptors, vectors and potential contaminants), trimming of low-quality bases and selection of high-quality reads. Control sequences (e.g., PhiX control reads in Illumina sequencers, control DNA beads in the Roche 454 sequencer) are routinely used by sequencer manufacturers to evaluate the quality of each sequencing run. Software applications are available to identify and remove control sequences and low-quality bases. For NGS, sequencing adapters can appear in reads if the fragment size is shorter than the read length. Cutadapt [123] can be used to trim the adapter sequences. Ambiguous bases or bases with low quality values can be removed from either the 5′ or the 3′ end by PRINSEQ [124]. NGS QC Toolkit [125] provides a module to select high-quality reads. If paired-end technology was applied, paired-end reads can be joined by PANDAseq [126], PEAR [127], FLASH [128] or COPE [129] when the two reads of a pair overlap, that is, when the fragment is shorter than the combined length of the two reads.
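To make the read-cleaning steps concrete, the sketch below shows 3′ adapter removal and 3′ quality trimming on a single read. It is a simplified illustration and not how Cutadapt or PRINSEQ are implemented (those tools tolerate mismatches and offer many more options); the function names, the exact-match strategy, the quality cutoff of 20 and the example adapter string are all assumptions chosen for illustration.

```python
def phred33(qual_string):
    """Decode a FASTQ quality string (Phred+33 encoding) into integer scores."""
    return [ord(ch) - 33 for ch in qual_string]


def trim_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter by exact matching (full adapter or a prefix at the read end)."""
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Look for a partial adapter (prefix) at the very end of the read.
    for k in range(min(len(adapter), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    return read


def trim_low_quality_3prime(seq, quals, min_q=20):
    """Drop bases from the 3' end until a base with quality >= min_q is reached."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


adapter = "AGATCGGAAGAGC"  # example adapter prefix, used here only for illustration
print(trim_adapter("ACGTACGT" + adapter, adapter))        # ACGTACGT
print(trim_low_quality_3prime("ACGTAC", [30, 30, 30, 30, 10, 5]))
```

In practice, adapter trimming is usually applied before quality trimming, and reads that become too short after both steps are discarded.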

A genome can be assembled from quality-filtered paired-end or single-end reads with de novo or reference-guided approaches. There are two standard methods for de novo genome assembly: the de Bruijn graph (DBG) approach and the overlap/layout/consensus (OLC) approach. The idea of the de Bruijn graph is to decompose each read into k-mer-sized fragments with a sliding window. The k-mer-sized fragments are then used to construct a graph in which longer paths (e.g., contigs) are found. Long-range paired reads can then be utilized to build scaffolds from contigs, given the insert size and read orientation. SOAPdenovo [130] is a DBG assembler that achieves high speed through thread parallelization [131]. An OLC assembler starts by identifying all pairs of reads with sufficiently long overlaps to construct an overlap graph. Contig candidates are identified by pruning nodes to simplify the overlap graph. The final contigs are then output based on consensus regions. Newbler [132] is a widely used OLC assembler distributed by 454 Life Sciences.
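The de Bruijn idea can be illustrated with a toy example. The sketch below decomposes reads into k-mers, links each k-mer's (k-1)-base prefix to its (k-1)-base suffix and follows unambiguous edges to spell out a contig. It is a teaching sketch under strong assumptions (error-free reads, a single strand, no repeats) and does not reflect the internals of SOAPdenovo or any other assembler; all function names are hypothetical.

```python
from collections import defaultdict


def kmers(read, k):
    """Slide a window of width k along the read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each k-mer adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from start while the current node has exactly one successor."""
    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in visited:  # stop on cycles
            break
        contig += nxt[-1]
        visited.add(nxt)
        node = nxt
    return contig


reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]  # toy overlapping reads
graph = build_de_bruijn(reads, k=4)
print(walk_unambiguous(graph, "ATG"))  # ATGGCGTGCAA
```

Repeated sequences such as baculoviral hrs create nodes with several successors, which is where this simple walk stops and where paired-end information becomes necessary.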

Reference-guided genome assembly is another solution if the genome of a closely related species is already available. For viral genome assembly, closely related species can be identified by mapping quality-filtered reads against the sequenced viral genomes deposited in GenBank and selecting the top-ranked species as the reference genome(s) to facilitate the assembly of the genome of interest. A reference-guided assembler is also called a mapping assembler: the complete genome is generated by mapping the reads to the reference and identifying variants (single nucleotide polymorphisms (SNPs), insertions and deletions). For example, MIRA [133] can create a reference-based assembly by detecting the differences relative to the reference.
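The mapping-and-variant-calling principle can be illustrated with a deliberately naive sketch. Each read is placed at the ungapped position with the fewest mismatches, the reads are piled up, and a majority-vote consensus is compared with the reference. Everything here is an assumption made for illustration (function names, the mismatch threshold, the use of "N" for uncovered positions); insertions and deletions are not handled, and real mapping assemblers such as MIRA use indexed, gapped alignment instead of this exhaustive scan.

```python
from collections import Counter


def best_ungapped_position(read, reference):
    """Return (position, mismatches) of the best ungapped placement of the read."""
    best_pos, best_mm = 0, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mm = sum(1 for a, b in zip(read, window) if a != b)
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos, best_mm


def consensus_and_snps(reads, reference, max_mismatches=2):
    """Pile reads up on the reference, call a majority consensus and list SNPs."""
    pileup = [Counter() for _ in reference]
    for read in reads:
        if len(read) > len(reference):
            continue
        pos, mm = best_ungapped_position(read, reference)
        if mm > max_mismatches:
            continue  # read is too different to be placed with confidence
        for offset, base in enumerate(read):
            pileup[pos + offset][base] += 1

    consensus = []
    snps = []
    for i, (ref_base, counts) in enumerate(zip(reference, pileup)):
        if not counts:
            consensus.append("N")  # uncovered position stays undetermined
            continue
        base = counts.most_common(1)[0][0]
        consensus.append(base)
        if base != ref_base:
            snps.append((i, ref_base, base))
    return "".join(consensus), snps


reference = "ACGTACGTTAGC"
reads = ["ACGTAT", "GTATGT", "TGTTAG", "GTTAGC"]
print(consensus_and_snps(reads, reference))
```

Positions reported as "N" in the consensus correspond to the undetermined bases that later gap filling has to resolve.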

During the assembly process, gap filling (or gap elimination) is conducted to resolve the undetermined bases either by bioinformatics or other approaches such as PCR and additional sequencing. Bioinformatic approaches normally use paired-end reads to eliminate gaps. PCR coupled with Sanger sequencing is a common approach to finalize the undetermined regions [134]. In addition, Sanger sequencing can also be used for genome validation and homologous region (hr) checking.
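Before gap filling, the undetermined regions have to be located. Assuming that the draft assembly marks undetermined bases with the letter N (a common convention, but an assumption here), a short helper can list the gap coordinates, which could then guide the design of PCR primers in the flanking sequence. The function name is hypothetical.

```python
import re


def find_gaps(sequence):
    """Return (start, end) half-open, 0-based coordinates of every run of N bases."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", sequence)]


draft = "ACGTNNNNACGTNACG"
for start, end in find_gaps(draft):
    print(f"gap of {end - start} bp at {start}-{end}")
```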

5.2.2. Genome annotation

Annotation determines the locations of protein-coding and noncoding genes as well as the functional elements in the genome. Glimmer [135], N-SCAN [136], NCBI ORF Finder, GeneMark [137] and VIGOR [138] are gene prediction tools for identifying protein-coding genes in the genome. Repetitive sequence regions can be detected with RepeatMasker. Viral microRNA candidate hairpins can be predicted by Vir-Mir [139]. A circular map of the viral genome can be generated with CGView [140].
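The simplest form of protein-coding gene prediction is an open reading frame (ORF) scan over all six reading frames. The sketch below is a minimal illustration of that idea, not the method of any tool named above: it assumes ATG as the only start codon and the standard stop codons, uses an arbitrary minimum length, and treats the genome as linear, so an ORF that spans the origin of a circular baculovirus genome would be missed. All names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs_in_strand(seq, min_codons):
    """Scan three frames; report (start, end) from the first ATG to the next in-frame stop."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # length is counted in codons, stop codon included
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


def find_orfs(genome, min_codons=50):
    """Return (start, end, strand) with 0-based, half-open forward-strand coordinates."""
    genome = genome.upper()
    n = len(genome)
    results = [(s, e, "+") for s, e in find_orfs_in_strand(genome, min_codons)]
    for s, e in find_orfs_in_strand(reverse_complement(genome), min_codons):
        results.append((n - e, n - s, "-"))
    return results


print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=5))
```

Dedicated gene finders go further by modeling codon usage and promoter motifs, which is why their predictions usually differ from a plain ORF scan.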

5.2.3. Phylogenetic analysis

Phylogenetic inference reveals the evolutionary distances among species, especially closely related ones. MEGA [141] is one of the most widely used software suites and provides an integrated user interface for studying DNA and protein sequence data from species and populations. Alternatively, phylogenetic relationships based on complete viral genomes or functional regions can be estimated from multiple sequence alignments produced with Clustal Omega [142], applied either to complete genomes or to DNA fragments. ClustalW [143] can be used to convert the file format of a multiple sequence alignment. Ambiguously aligned positions can be removed with Gblocks version 0.91b [144, 145] under default settings. Phylogenetic trees can be inferred by a hierarchical Bayesian method (e.g., MrBayes [146]) or a maximum likelihood method (e.g., RAxML [147]) [148] and drawn with FigTree version 1.4.2. The divergence times of different species can be estimated with BEAST version 1.8 or version 2.3.2 [149]. In addition, pairwise sequence identity can be determined by BLASTN (NCBI BLAST package) [150] to analyze sequence-level variation, and whole-genome pairwise alignment can be done with LAGAN [151]. The CGView Comparison Tool (CCT) [152] represents the block similarity among different species. Mauve [153], a multiple genome alignment tool, helps to visualize the conserved sequence blocks among distantly related species.
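Two of the sequence-level measures used in this chapter, pairwise identity and the Kimura 2-parameter (K-2-P) distance, can be computed directly from a pair of aligned sequences. With P the proportion of transitions and Q the proportion of transversions among the compared sites, the K-2-P distance is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)]. The sketch below assumes the two sequences are already aligned (for example, two rows taken from a multiple sequence alignment) and skips any column containing a gap or an ambiguous base; the function names are hypothetical and the handling of gaps may differ from that of MEGA or other packages.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def compare_aligned(seq_a, seq_b):
    """Count compared sites, identical sites, transitions and transversions."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    sites = identical = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            identical += 1
        elif {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return sites, identical, transitions, transversions


def percent_identity(seq_a, seq_b):
    """Percentage of compared sites that are identical."""
    sites, identical, _, _ = compare_aligned(seq_a, seq_b)
    if sites == 0:
        raise ValueError("no comparable sites")
    return 100.0 * identical / sites


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned sequences."""
    sites, _, ts, tv = compare_aligned(seq_a, seq_b)
    if sites == 0:
        raise ValueError("no comparable sites")
    p = ts / sites
    q = tv / sites
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K-2-P correction")
    return -0.5 * math.log(term1 * math.sqrt(term2))


print(percent_identity("AAAAAAAAAA", "GAAAAAAAAC"))
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAC"))
```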

Up to 78 baculoviruses have been reported. Most of them have a narrow host range and infect only their homologous hosts (e.g., BmNPV, SpltNPV, SpeiNPV and MaviNPV). LyxyNPV can infect both LD and LY cell lines, whereas AcMNPV has a wide host range, with at least 40 hosts found in vitro. Therefore, the taxonomic position of a new baculovirus isolate needs to be defined and its phylogenetic relationship with known baculoviruses analyzed.

6. Conclusion

With advances in sequencing technologies, more NPV genomes have been sequenced. So far, more than 78 baculoviruses have been fully sequenced, and they can be divided into two groups by sequencing method: those sequenced by the Sanger method and those sequenced by NGS methods (Table 1). Among these genomes, 35 were sequenced by the Sanger method and 43 by NGS methods. Whole genome sequencing by NGS can be expected to become much more common in this field. However, in the upcoming metagenomic era, it is imperative to remain aware of the shortcomings of the information presented about the sequenced organisms and of the fact that the databases can oversee neither the correctness of the organismal identifications nor that of the sequences entered into them.

The natural environment harbors a large number of baculoviruses. However, only a few of them have been sequenced and studied. Much more information on the genetic relationships of NPVs in the natural environment is needed to improve our understanding of these viruses. Although NGS has become an important technology for viral genomic sequencing, its high cost for whole viral genome sequencing remains a barrier. To reduce the cost, it is necessary to evaluate whether newly collected NPVs are suitable for whole genome sequencing. Biochemical approaches and biological tools, such as PCR-based K-2-P analysis, are good options to facilitate this evaluation. Together, these applications are anticipated to help reveal the genetic information of unknown species, so that more detailed insights into their genetic makeup and functional composition can be obtained. With powerful sequencing techniques and progress in metagenomics (e.g., transcriptome analysis of insect hosts), new pathogen species in the natural environment should become easier to find in the future. With the increase in baculoviral genomic data, improved bioinformatic analysis methods and further validation of biological information may reveal a group of genes connected to viral host range and help resolve the contradictions in baculoviral genomics.


This research was supported by Grant 105AS-13.2.3-BQ-B1 from the Bureau of Animal and Plant Health Inspection and Quarantine, the Council of Agriculture, Executive Yuan and Grant 103-2313-B-197-002-MY3 from the Ministry of Science and Technology (MOST).

© 2017 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference

Yu-Shin Nai, Yu-Feng Huang, Tzu-Han Chen, Kuo-Ping Chiu and Chung-Hsiung Wang (April 5th 2017). Determination of Nucleopolyhedrovirus’ Taxonomic Position, Biological Control of Pest and Vector Insects, Vonnie D.C. Shields, IntechOpen, DOI: 10.5772/66634. Available from:
