Sara Del Río García

1 chapter authored

Chapters authored

Big Data Supervised Pairwise Ortholog Detection in Yeasts

By Deborah Galpert Cañizares, Sara del Río García, Francisco Herrera, Evys Ancede Gallardo, Agostinho Antunes and Guillermin Agüero-Chapin

Orthologs are genes in different species that evolved from a common ancestor. Ortholog detection is essential for studying phylogenies and for predicting the function of uncharacterized genes. Because the number of sequenced genomes keeps growing, scaling both the pairwise comparison of genes (or proteins) and the subsequent classification step is a challenge. Ortholog detection algorithms based solely on sequence similarity tend to misclassify pairs, particularly in Saccharomycete yeasts, where paralogy and gene loss are rampant. This chapter proposes a new classification approach that combines pairwise similarity measures in a decision system that accounts for the extreme imbalance between ortholog and non-ortholog pairs. New gene-pair similarity measures are defined based on protein physicochemical profiles, the membership of gene pairs to conserved regions in related genomes, and protein lengths. The efficiency and scalability of computing these measures are analyzed to support their implementation for big data. The evaluated supervised algorithms for big and imbalanced data proved highly effective on Saccharomycete yeast genomes.

Part of the book: Yeast

Related collaborators

José Zavala Loría

Asteria Narváez García

Alejandro Ruiz

Maria Priscila Franco Lacerda

Natália Manuella Strohmayer Lourencetti

Maria José Soares Mendes Giannini

Edwil Gattas

Flávia Danieli Ibelli

Cleslei Fernando Zanelli

Ana Marisa Fusco Almeida