Using Bacterial Artificial Chromosomes to Refine Genome Assemblies and to Build Virtual Genomes


Introduction
Recent years have seen an explosion in the sequencing of genomes, including those of ruminants. A number of assemblies of the sequence of the bovine genome are now available (Elsik, et al., 2009; Zimin, et al., 2009). Although the sheep genome sequence is not such a high priority, the International Sheep Genomics Consortium (ISGC_website) has a long-term strategy to develop a number of tools for the application of genomics in sheep research and breeding (Archibald, et al., 2010). We have demonstrated recently how comparative genomics and Bacterial Artificial Chromosome (BAC) libraries can be used to construct detailed virtual genomes as a framework for genome assemblies of related species (Dalrymple, et al., 2007). As new and improved assemblies of the genomes contributing to an initial virtual genome assembly are produced, the virtual genomes will need to be regularly updated to incorporate the latest available information. In the original analysis, three genomes (bovine, dog and human) with various levels of coverage and stages of assembly were used (Dalrymple, et al., 2007). With the availability of increasing numbers of assemblies, the benefit of using more than three genomes, or the most appropriate evolutionary distances of the genomes, is not immediately clear. Here we describe the construction of a modified version of the bovine Btau3.1 assembly using cattle and sheep BACs and the use of this assembly in the construction of an updated virtual sheep genome, combining information from the original sheep virtual genome (vsg 1.2) and the horse (Wade, et al., 2009) and dog (Lindblad-Toh, et al., 2005) genomes. The impact of inclusion of additional genome sequences is analysed. The approach described here for sheep is an example of an approach which can be applied more broadly to genomes of any source, for example for the fish species tilapia (Soler, et al., 2010) and catfish (Liu, et al., 2009). Indeed, the same principles also apply to the detection of differences between different individuals of the same species.

Genome coordinate conversion
The coordinates from the mapping of the sheep BESs to the dog and human genomes were converted to the framework of the bovine genome assembly Btau3.1 using the LiftOver utility (LiftOver; Fujita, et al., 2011) and the canFam2 to Btau3.1 and hg17 to Btau3.1 coordinate conversion chain files respectively, also downloaded from the UCSC Genome Bioinformatics site (UCSC; Fujita, et al., 2011). If the initial application of LiftOver was not successful for a region of the genome, regions of 100 bases either side of the BAC-end sequence were taken and positioned using LiftOver (pseudoliftOver). If this was again unsuccessful, the process was repeated in steps of 100 bases until LiftOver succeeded for a region or a distance of 10 kb was reached. Coordinate conversion (chain) files readable by the LiftOver utility, converting bovine genome assembly Btau3.1 coordinates to Btau3.#x version coordinates, were built based on the revised order of Btau3.1 contigs and scaffolds in the Btau3.#x version. Similarly, a coordinate conversion file to convert Btau3.5x coordinates to virtual sheep genome assembly coordinates was built based on the order of Btau3.5x scaffolds in the virtual sheep genome.
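The stepwise fallback described above can be sketched as follows. This is a minimal illustration in Python rather than the scripts used in this work: `lift` is a hypothetical callable standing in for one invocation of the LiftOver utility, and the function name, the argument names and the choice to widen the region symmetrically (rather than lifting the two flanks separately) are all assumptions.

```python
from typing import Callable, Optional, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end)


def pseudo_liftover(
    region: Region,
    lift: Callable[[Region], Optional[Region]],  # hypothetical LiftOver wrapper
    step: int = 100,
    max_flank: int = 10_000,
) -> Optional[Region]:
    """Convert `region`; on failure widen it by `step` bases per side and retry."""
    chrom, start, end = region
    # First attempt: the BAC-end sequence coordinates themselves.
    result = lift(region)
    if result is not None:
        return result
    # Fallback: add flanking sequence in 100-base steps up to 10 kb.
    flank = step
    while flank <= max_flank:
        widened = (chrom, max(0, start - flank), end + flank)
        result = lift(widened)
        if result is not None:
            return result
        flank += step
    return None  # no conversion found within the allowed distance
```

A region that cannot be converted within 10 kb is reported as `None`, so the caller can treat that BAC end as unmapped in the target assembly.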

www.intechopen.com

Assigning BACs to groups and building BAC contigs
BACs were assigned to groups ("tail-to-tail", "tail-to-head", etc.) on the basis of the relative orientations of the two BESs from each BAC on the relevant genome assembly and the distance between the BESs. "Outsize" BACs were those with the two BESs mapped to the same chromosome in the relevant genome assembly and less than 10 kb, or more than 200 kb, apart. BACs with both BESs mapped to the genome, but to two different chromosomes, were assigned to the "breaks" group. BACs with only one BES mapped to the genome were assigned to the "unpaired" group. Data processing was undertaken using a series of Perl scripts. BAC-comparative genomic contigs (BAC-CGCs) were constructed for the BACs from each species mapped to each genome assembly, again using Perl scripts. Starting from the beginning of each chromosome, the first BAC that overlapped with a second BAC was identified and the BAC-CGC was extended until no further overlapping BACs were identified. This process was repeated along the chromosome until the last BAC mapped on the chromosome was reached, and then for each chromosome in the genome assembly.
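The grouping and contig-building rules above can be illustrated with a short sketch. It is written in Python rather than Perl and is not the original code. The strand convention (a tail-to-tail BAC has its leftmost BES on the "+" strand and its rightmost BES on the "-" strand), the `BesHit` record and both function names are assumptions introduced for illustration. Unlike the procedure in the text, this sketch also reports a BAC that overlaps no other BAC as a single-BAC contig.

```python
from typing import List, NamedTuple, Optional, Tuple


class BesHit(NamedTuple):
    chrom: str
    pos: int
    strand: str  # '+' or '-' (assumed convention)


def classify_bac(a: Optional[BesHit], b: Optional[BesHit],
                 min_span: int = 10_000, max_span: int = 200_000) -> str:
    """Assign one BAC to a group from the mapping of its two BESs."""
    if a is None or b is None:
        return "unpaired"  # callers are assumed to pass at least one hit
    if a.chrom != b.chrom:
        return "breaks"
    left, right = sorted((a, b), key=lambda h: h.pos)
    span = right.pos - left.pos
    if span < min_span or span > max_span:
        return "outsize"
    if left.strand == "+" and right.strand == "-":
        return "tail-to-tail"
    if left.strand == "-" and right.strand == "+":
        return "head-to-head"
    return "tail-to-head"  # both BESs on the same strand


def build_contigs(spans: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping BAC spans on one chromosome into contigs."""
    contigs: List[Tuple[int, int]] = []
    for start, end in sorted(spans):
        if contigs and start <= contigs[-1][1]:
            # Overlaps the current contig: extend it.
            contigs[-1] = (contigs[-1][0], max(contigs[-1][1], end))
        else:
            contigs.append((start, end))
    return contigs
```

In practice `build_contigs` would be called once per chromosome, with the spans of the BACs selected for contig building on that chromosome.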

Construction of Btau3.5x
Using Perl scripts and the data set of the mapping of the bovine BESs to the scaffolds of the Btau3.1 genome assembly, an initial minimization of the number of non-tail-to-tail BACs was undertaken. The scripts started with the first scaffold on chromosome 1 of the assembly and, by testing the number of BAC links between this scaffold and all other scaffolds in the assembly, identified the most likely adjacent scaffold and its orientation based on maximising the number of tail-to-tail BACs. Two or more linking tail-to-tail BACs without overlapping BES mapping coordinates on both scaffolds were required to continue the chain. Only high confidence bovine BACs (Ratnakumar, et al., 2009) were used in the assembly. Adjacent scaffolds assigned to the same chromosome in the Btau3.1 assembly were preferred over a more highly linked scaffold assigned to another chromosome, if the preferred scaffold on the original chromosome was itself linked to an adjacent scaffold on the original chromosome. If no scaffold assigned to the same chromosome as the rest of the chain was linked into the chain by BACs, or the less strongly linked scaffold from the same chromosome terminated the chain, the most highly linked scaffold from another chromosome was incorporated. If the newly added scaffold was linked back to the original chromosome at the next step of scaffold incorporation it was retained in the chain; otherwise the chain was terminated and the chromosome-changing scaffold was also removed from the scaffold chain. For each scaffold in the chain, BAC links from both ends of the scaffold were assessed to enable the inclusion of scaffolds preceding the initiating scaffold, or located between two scaffolds in a chain, but which were only linked to an adjacent following scaffold. The scaffold chain building process was continued until it was terminated with a scaffold not linked by two or more BACs to another scaffold.
The penultimate scaffold in the chain was then tested for BAC links to a second scaffold, which was incorporated if it met the criteria described above. The chain building process was then continued. If no second linked scaffold could be identified, the scaffold chain building was terminated. The next unincorporated scaffold from the same chromosome of the Btau3.1 assembly was then used to initiate the next scaffold chain. When all scaffolds from the first chromosome had been tested, the first scaffold from the next chromosome was used and the process repeated until all scaffolds assigned to a chromosome of the Btau3.1 assembly had been tested.
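The core selection step of the chain building, choosing the next scaffold from the BAC links, might look like the following. This is a deliberately simplified Python sketch, not the Perl implementation used here: it ignores scaffold orientation, the check for overlapping BES coordinates, and the look-ahead rule for chromosome-changing scaffolds, and it always prefers a sufficiently linked scaffold from the same chromosome. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple


def count_links(bac_links: Iterable[Tuple[str, str]]) -> Counter:
    """Count the BACs joining each unordered pair of scaffolds."""
    counts: Counter = Counter()
    for s1, s2 in bac_links:
        if s1 != s2:  # BACs with both ends in one scaffold are not links
            counts[frozenset((s1, s2))] += 1
    return counts


def next_scaffold(current: str,
                  counts: Dict[FrozenSet[str], int],
                  chrom_of: Dict[str, str],
                  used: Set[str],
                  min_links: int = 2) -> Optional[str]:
    """Pick the scaffold to append after `current`, or None to end the chain."""
    candidates = []
    for pair, n in counts.items():
        if current in pair and n >= min_links:
            (other,) = pair - {current}
            if other not in used:
                candidates.append((other, n))
    if not candidates:
        return None
    # Simplification: prefer any qualifying scaffold on the same chromosome.
    same = [c for c in candidates if chrom_of.get(c[0]) == chrom_of.get(current)]
    pool = same or candidates
    return max(pool, key=lambda c: c[1])[0]
```

Calling `next_scaffold` repeatedly, adding each returned scaffold to `used`, yields one scaffold chain; a return value of `None` corresponds to the chain terminating for lack of two or more linking BACs.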
Once the scaffold chain assembly had been completed, the scaffolds not assigned to chromosomes in the Btau3.1 assembly (the UnChr) were linked into the scaffold chains in a similar, but separate, process. The resulting scaffold chains were then ordered and oriented using the consensus of the mapping of the order of the BACs in the physical bovine BAC map (Snelling, et al., 2007) to the BACs in the bovine scaffold chains. The initial data set Btau3.1x was then displayed as a browseable genome using Gbrowse (Stein, et al., 2002) to allow the integrity of the assembly of the scaffolds to be visually assessed. Genome contigs, scaffolds and bovine BAC mapping positions were displayed as separate groups. Clusters of non-congruent BACs (i.e. not tail-to-tail) identified regions with remaining assembly problems. Using Perl scripts and the data set of the mapping of the sheep BESs to the Btau3.1 genome assembly, including BES mapping data integrated onto the Btau3.1 assembly from the horse, dog and vsg1.2 assemblies, sheep BACs were assigned to the tail-to-tail etc. groups and displayed on the Btau3.1x genome browser in a series of tracks. Positions of the BESs mapped to the separate genomes were integrated on Btau3.1 as previously described. The mappings of the bovine genome assembly Btau3.1 sequence contigs to the human, dog and horse genomes were displayed as separate tracks on the Btau3.1x genome browser using the UCSC chromosome colour scheme (Fujita, et al., 2011) to identify the chromosome of best match in the relevant species. Asymmetric symbols were used to represent the orientation of the mapping of the contigs to the human, dog and horse genomes relative to the bovine genome. The chromosomal coordinates of the mapping in the non-bovine genome were also readily accessible to users of the browser through mouse-over and mouse-click display boxes.
This information was used in the manual refinement of the assembly, in particular in the definition of scaffold split points for the insertion of other scaffolds and/or the inversion of small numbers of adjacent contigs within a scaffold, where extensive use was made of comparative genomics information at the level of the sequence contigs. Subsequently four major rounds of revision and refinement of the bovine genome assembly were undertaken manually and decisions on the chromosomal assignment, order of scaffolds and orientation of scaffolds and of sequence contigs were made based on the cattle and sheep BAC mapping and the comparative genomics. Generally in cases of ambiguity parsimony was applied. For the construction of each new version of the assembly changes were recorded in an Excel spreadsheet and Perl scripts were used to convert the Excel spreadsheet into a genome assembly agp file (AGP_file_specification). The agp file was used to generate the sequence of the genome assembly, the coordinate conversion chain file (for use by the LiftOver utility) and the contig and scaffold tracks for the genome browser version for the new assembly. For each successive version of the revised assembly of the bovine genome the manual revision was undertaken interactively using the tracks on the genome browser to make decisions.
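As an illustration of the spreadsheet-to-agp step, the sketch below writes agp lines for one chromosome from an ordered list of scaffolds. It is a Python sketch under stated assumptions, not the Perl scripts used here: the nine tab-separated columns follow the public AGP specification as we understand it, while the component type "W", the fixed gap length of 100 bases, the gap type "contig" and the function and argument names are illustrative choices.

```python
from typing import Iterable, List, Tuple


def agp_lines(obj: str,
              parts: Iterable[Tuple[str, int, str]],
              gap: int = 100) -> List[str]:
    """Build agp lines for `obj` from (component_id, length, orientation) rows."""
    lines: List[str] = []
    pos = 1       # agp coordinates are 1-based and inclusive
    part_no = 0
    for comp_id, length, orient in parts:
        if part_no > 0 and gap > 0:
            # Gap line between consecutive components (assumed fixed size).
            part_no += 1
            fields = (obj, pos, pos + gap - 1, part_no, "N", gap, "contig", "no", "na")
            lines.append("\t".join(map(str, fields)))
            pos += gap
        # Component line: the whole scaffold, in the recorded orientation.
        part_no += 1
        fields = (obj, pos, pos + length - 1, part_no, "W", comp_id, 1, length, orient)
        lines.append("\t".join(map(str, fields)))
        pos += length
    return lines
```

Because every object coordinate is derived from the running position, the same list of rows can also be used to derive the coordinate conversion chain file for the new version of the assembly.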

Construction of the virtual sheep genome vsg2.0
To generate the virtual sheep genome assembly, the midpoint between each pair of BAC-CGCs built using sheep BACs on the bovine Btau3.5x genome assembly was identified. If the midpoint was located in a gene (NCBI human RefSeq mRNAs (NCBI_RefSeq) were used to define the extent of a gene), the position closest to the midpoint and not in a gene was identified. The flanking BAC-CGCs were then extended to this point or, in the case of the first and last BAC-CGCs on a chromosome, to the start or end coordinate of the chromosome. Thus all nucleotides in the bovine genome sequence were included in a block, and therefore the virtual sheep genome sequence is exactly the same length as the bovine Btau3.5x genome sequence. The order and orientation of the bovine genome assembly Btau3.5x-based sheep BAC-CGCs in the vsg2.0 was determined from the location and organisation of the sheep linkage map markers (Maddox, et al., 2001) mapped to the Btau3.1 genome and converted to the Btau3.5x assembly using the Btau3.1 to Btau3.5x coordinate conversion chain file and the LiftOver utility. Using Perl scripts, the agp file (AGP_file_specification) was built and used to generate the sequence of the virtual sheep genome assembly, the coordinate conversion chain file (for use by the LiftOver utility), and the contig and scaffold tracks for the virtual sheep genome browser (VSG). Using the LiftOver utility and the Btau3.5x to virtual sheep genome coordinate conversion chain file, the BES and BAC-CGC mapping coordinates, and any other features mapped to the Btau3.5x bovine genome, were converted to the virtual sheep genome coordinates. Features were also transferred from the Btau3.1 genome assembly by first converting to Btau3.5x coordinates using the Btau3.1 to Btau3.5x coordinate conversion file and the LiftOver utility and then converting from Btau3.5x to vsg2.0 coordinates. Other features were mapped directly onto the virtual sheep genome using sequence alignment programs such as BLAST and BLAT with the vsg2.0 DNA sequence.
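The rule for placing the boundary between two adjacent BAC-CGCs can be sketched as below. This Python fragment is an illustration only: the function and argument names are hypothetical, gene extents are assumed to be supplied as inclusive (start, end) intervals, the search is assumed to be restricted to the gap between the two BAC-CGCs, and the fallback to the plain midpoint when the whole gap is genic is our assumption rather than a rule stated in the text.

```python
from typing import List, Tuple


def block_boundary(left_end: int, right_start: int,
                   genes: List[Tuple[int, int]]) -> int:
    """Return the split point between two adjacent BAC-CGCs."""
    mid = (left_end + right_start) // 2

    def in_gene(p: int) -> bool:
        return any(start <= p <= end for start, end in genes)

    if not in_gene(mid):
        return mid
    # Walk outwards from the midpoint, one base at a time, in both directions.
    reach = max(mid - left_end, right_start - mid)
    for offset in range(1, reach + 1):
        for cand in (mid - offset, mid + offset):
            if left_end <= cand <= right_start and not in_gene(cand):
                return cand
    return mid  # assumed fallback: the entire gap lies within genes
```

Both flanking BAC-CGCs are then extended to the returned coordinate, so that every base of the underlying bovine sequence falls in exactly one block.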

Identification of problems with the Btau3.1 assembly of the bovine genome
The cow is the most closely related organism to sheep for which a genome assembly is available. When this project was commenced, an early draft of the bovine genome assembly Btau3.1 (Elsik, et al., 2009) was in the public domain. Since the sheep genome assembly would be built comparatively on the bovine genome, and sheep sequence contigs from the low coverage sequencing of six animals, at approximately 0.5 fold coverage each, were expected to be very small, the accuracy of the bovine genome assembly would determine the accuracy of the sheep assembly at all levels above that of the individual sequence contigs. To assess the validity of this strategy, the sheep BESs from the CHORI-243 library were mapped to the Btau3.1 genome assembly to identify the extent of segments of conserved synteny between the two genomes. The reader should keep in mind that the only BACs counted as being in the same organisation in the comparison genome as in the source genome (i.e. congruent) are the tail-to-tail BACs less than 200 kb in length. Unexpectedly large numbers of sheep BACs, more than 17% of the BACs with both ends mapped, had both BESs positioned on the bovine Btau3.1 genome assembly within 200 kb of each other, but not in the expected tail-to-tail organisation, i.e. many BACs had their two BESs mapped in the tail-to-head and head-to-head organisations (Table 1). In addition, large numbers of BACs had both BESs positioned on the same chromosome, but more than 200 kb apart, the outsize groups (Table 1). The average insert size of the BACs in the sheep BAC library is 184 kb. Such a result would normally suggest a substantial number of intra-chromosomal rearrangements between the sheep and cattle genomes. However, almost as many bovine BACs, more than 14%, were also not positioned as tail-to-tail BACs on the bovine Btau3.1 genome assembly (Table 1).
The organisation of sheep BACs at the locations of these apparent rearrangements between the two genomes was compared with the organisation of bovine BACs at the same locations in the genomes. Frequently clusters of tail-to-head sheep BACs overlapped with clusters of tail-to-head bovine BACs (Fig 1), suggesting that many such occurrences were in fact due to an incorrect assembly of the bovine genome, not true differences in the structures of the two genomes themselves. However, many clusters of tail-to-head sheep BACs that did not overlap with tail-to-head bovine BACs were also observed (Fig 1). These BACs probably represent rearrangements in the sheep genome relative to the bovine genome.

Using cattle and sheep BACs to reorganise the Btau3.1 assembly of the bovine genome
The first step in the generation of the virtual sheep genome was therefore to construct the best approximation to the correct order of the bovine sequence contigs and scaffolds in the bovine genome using the bovine and sheep BACs and comparative genomics. Initially, the scaffolds in the bovine genome assembly (Btau3.1) were kept intact and scaffolds were reordered and reoriented within bovine chromosomes to minimize the number of both cattle and sheep BACs that were not in the tail-to-tail organisation. Then scaffolds apparently assigned to the wrong chromosomes on the basis of the BAC-based links to other scaffolds in the assembly were moved, including being inserted into gaps in other scaffolds guided by the mapping of the BESs. Generally these moves were also supported by the mapping of the sequence contigs to the human, dog and horse genomes (Fig. 2). In addition, scaffolds not assigned to chromosomes in Btau3.1 were included in the assembly where BACs provided unambiguous links. Finally, reordering and reorienting of contigs within the new set of ordered and reoriented scaffolds was undertaken. Given the size of the BACs and the variation in the length of the genomic DNA contained within the BACs the correct position to insert many segments of the bovine assembly was ambiguous based solely on the BAC-end data. Throughout this process, which was mainly undertaken manually, the alignment of the bovine genome assembly contigs to the human, dog and horse genome assemblies was used in making the final decision about where exactly to insert or break scaffolds. In other words, a breakpoint between sequence contigs in an assembly scaffold was chosen that was consistent with the cattle and sheep BES data and the organisation of the human, dog and horse genomes (Fig. 2). Where conflicts between the comparative genome assemblies occurred two out of three consistent organisations were required. 
However, the integrity of sequence contigs was maintained throughout the process, although evidence for chimeric sequence contigs was also identified during the course of the analysis (data not shown).
To avoid ovinising the bovine genome, at least one bovine BAC was required to support all reorganisations, except reordering and reorienting scaffolds within chromosomes in cases where the bovine BAC fingerprint map (Snelling, et al., 2007) also supported the reorganisation. This process was undertaken iteratively to resolve any errors introduced or new links identified as the chromosome structures approached the most likely structure of the bovine genome. This revised assembly of the bovine genome based on Btau3.1 was named Btau3.5x.

In Btau3.5x the number of bovine assembly scaffolds was reduced from 3053 scaffolds assigned to chromosomes in Btau3.1 to 537 super-scaffolds linked by the cattle and sheep BACs. Of the chromosomally assigned scaffolds in Btau3.1, 974 scaffolds were inverted, and 683 scaffolds were split into 1720 pieces, of which 710 were inverted. Fourteen scaffolds were moved to a different chromosome and 2192 scaffolds previously not assigned to chromosomes were incorporated into the assembly; 104 of these scaffolds were split into 233 pieces. Coverage of the genome with scaffolds assigned to chromosomes increased from 2.4 Gb to 2.77 Gb. Even after this process, it is likely that a number of segments of the bovine genome assembly remained incorrectly assembled.

Table 2. The first twenty scaffolds of the bovine Btau3.5x assembly. Scaffolds numbered BTA1.* were assigned in numerical order to chromosome 1 of the bovine genome assembly Btau3.1 build. Scaffolds numbered BTAUn.* were not assigned to a chromosome in the bovine Btau3.1 build. "inv" indicates scaffolds inverted in the Btau3.5x genome build relative to the Btau3.1 build, and "split" indicates scaffolds split in the Btau3.5x build relative to the Btau3.1 build.

Integration of the positions of sheep BESs on the Btau3.5x, dog, horse and vsg1.2 genome assemblies
We then used the virtual genome strategy, integrating the separate mappings of the sheep BESs to the original virtual sheep genome (v1.2) and to the dog and horse genome assemblies, to maximise the positioning of sheep BESs on Btau3.5x. There was little change in the human genome assembly over the course of the work, and the mapping of the sheep BACs to the human genome was captured by using the virtual sheep genome v1.2. Thus the virtual sheep genome version 2 was built on top of v1.2, rather than being a completely de novo version. This approach, which uses much lower specificity BLAST parameters, substantially increased the number of sheep BACs able to be positioned on the bovine genome, from 47,818 (in the initial alignments) to 95,757 in the virtual sheep genome, effectively doubling the coverage of the genome (Table 1). The number of sheep BACs able to be positioned in the tail-to-tail organisation in a genome is a complex function of the sequence coverage, assembly stage and evolutionary distance from the bovine genome. The greater distance of the dog genome appears to be partially compensated for by the more advanced state of the assembly used in this analysis. Very similar numbers of BACs were mapped in the tail-to-tail organisation to the two genomes.

Table 4. Tail-to-tail BACs unique to each dataset.

The high coverage and quality of the human genome assembly and the use of the integration strategy presumably contributed to the large number of unique BACs in the tail-to-tail organisation present in vsg1.2 (Tables 3 and 4). Over and above the newer assembly of the bovine genome, the inclusion of the mapping of the sheep BACs to the horse genome assembly has the biggest impact on the number of BACs assigned and on the number of BAC contigs, where fewer is better (Table 5). This is not surprising since, of the genomes used, the horse is the most closely related species to the two ruminants.
BACs with one end directly mapped to the bovine genome and other end mapped to the bovine genome via the horse genome (Table 6). Subsequent addition of the BESs mapped via the dog genome added many fewer BACs than adding the BESs mapped via the horse genome (Table 6). The subsequent addition of the human genome data, incorporated in the vsg1.2, added slightly more BACs than the addition of the dog genome (Table 4). Thus including the dog genome had only a small impact on the coverage of the virtual sheep genome, whereas the more distant, but better assembled and higher coverage, human genome was a useful addition to the virtual genome construction. Not unexpectedly, the biggest contributions came from well assembled genomes of closely related species. In other words, building on top of vsg1.2 and the use of a higher quality assembly of the bovine genome contributed a large number of new BACs with both ends positioned on the bovine assembly (Table 5). A large group of BACs were positioned with one end using the bovine or vsg1.2 position and the other using horse or dog. Very few BACs were positioned solely using horse and/or dog positions (Table 6). On this basis, further improvement of the vsg would appear to be difficult and is most likely to come from filling of gaps in the bovine genome sequence itself. Based on the mapping of the sheep BACs to the reorganised bovine genome assembly, 943 blocks of conserved synteny, defined by overlapping sheep BACs, were identified between the sheep and cattle genomes (Table 7). Assuming a genome size of 3 Gb, the blocks had an average length of just over 3 Mb. Although this was initially disappointing, even in the bovine genome assembly 537 BAC-based super-scaffolds were required to cover the complete genome.
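The average block length quoted above follows directly from the block count and the assumed genome size. The short Python check below reproduces that arithmetic. The second figure, which divides the 2.77 Gb of chromosomally assigned bovine sequence by the 537 super-scaffolds, is our own illustrative comparison and is not a number reported in the text; the function name is hypothetical.

```python
def mean_length_mb(total_bp: float, n_blocks: int) -> float:
    """Average block length in megabases."""
    return total_bp / n_blocks / 1e6


# 943 blocks of conserved synteny over an assumed 3 Gb genome.
print(round(mean_length_mb(3e9, 943), 2))     # 3.18
# 537 bovine super-scaffolds over 2.77 Gb (illustrative comparison only).
print(round(mean_length_mb(2.77e9, 537), 2))  # 5.16
```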
The comparison of the number of blocks of conserved synteny identified across the different combinations of datasets demonstrates that the inclusion of additional species beyond the horse has a much greater impact on the reduction in the number of blocks of conserved synteny than it has on the total number of BACs positioned tail-to-tail. There was only a 25% increase in the number of BACs, but a 56% decrease in the number of blocks of conserved synteny, i.e. on average every block of conserved synteny defined based on the mapping of BACs to the bovine genome has been extended to include one adjacent block of conserved synteny.

Remaining ambiguities in the build of the bovine genome
Since there were many occasions on which there was no unambiguous basis on which to identify the correct break points in the bovine genome assembly a large number of probable inversions identified by BACs remained in the final version of the bovine genome. Most of these inversions were also supported by sheep BACs (Fig 2). In addition, whilst potentially chimeric bovine genomic sequence contigs were identified during the reassembly process, their structure has not been changed in Btau3.5x.

Construction of the virtual sheep genome (vsg2.0)
The sheep markers (sheep map version 4.7) were used to reorganise the bovine genome assembly into the vsg. In the main this involved renumbering of the bovine chromosomes, with five inverted chromosomes (or segments of chromosomes), four chromosome fusions and a single chromosome breakage (Table 8). Reordering of the segments of the bovine genome defined by the BAC comparative genome contigs (CGCs) was undertaken on four chromosomes, 7, 12, 13 and X. Apart from the X chromosome, these were local changes and involved a small number of BAC CGCs covering a small region of the genome. Given the variation in the size of BACs, and the lack of comparative data from other genomes for species specific breaks, the boundaries of such breaks could not be unambiguously identified with the data currently available. Thus no attempt was made to resolve the small potential sheep specific rearrangements within chromosomes where the break points were ambiguous and there was not sufficient marker evidence to support a change in the organisation (Fig 3). The vsg 2.0 has been used in a number of analyses of the genome organisation of sheep and in general a high level of congruence with maps determined using other approaches has been observed (Drogemuller, et al., 2008; Wu, et al., 2008; Goldammer, et al., 2009c; Wu, et al., 2009), although the vsg 2 X chromosome build appears to contain a number of significant discrepancies (Goldammer, et al., 2009a; Goldammer, et al., 2009b).

Construction of the virtual sheep genome (vsg2.0) genome browser
The cattle and sheep BAC and BES locations are displayed on the chromosome overview track of the virtual sheep genome browser (VSG) allowing a quick assessment of the quality of the assembly to be made (Fig 3). In addition, the sheep virtual genome assembly was annotated with the locations of the sheep markers, SNPs on the 1536 pilot sheep SNP chip (Kijas, et al., 2009) and the Illumina Ovine SNP50 BeadChip, and human and bovine mRNA RefSeqs downloaded from the NCBI (NCBI_RefSeq).

Conclusion
The new vsg2.0 is a significant improvement over vsg1.2, which was built on the human genome framework. Clearly, using the genome from a closely related species and allowing the data from the species of interest to direct the process has an advantage over using a very well assembled, but more distant, genome. At low resolution, down to the level of the BACs, the sheep genome has a very high level of overall conserved synteny with the bovine genome structure. A number of regions of ambiguity remain, but many of these are in regions of ambiguity of the assembly of the bovine genome and therefore await further refinement of the bovine genome assembly, or a predominantly de novo assembly of the sheep genome. However, overall it is clear that the vsg 2 makes a robust framework to assemble the large number of short contigs expected from the sequencing of the sheep genome (Archibald, et al., 2010). Using two assembled genomes from closely related species is probably the optimal balance between analysis complexity and benefit, with inclusion of a more distant, but much better assembled, genome if the genomes of closely related species are not well assembled. Thus the methods that we have described are very broadly applicable.

Acknowledgement
The authors would like to thank the members of the International Sheep Genomics Consortium (ISGC_website) in particular Jill Maddox, John McEwan and James Kijas for useful discussions. The authors also gratefully acknowledge the early pre-publication access under the Fort Lauderdale conventions to the draft equine genome sequence provided by the Broad Institute and to the draft bovine genome sequence provided by the Baylor College of Medicine Human Genome Sequencing Center and the Bovine Genome Sequencing Project Consortium. This work was partly funded by SheepGenomics (a joint venture of Meat and Livestock Australia and Australian Wool Innovation). The work was undertaken as part of the development of sheep genomics tools by the ISGC.