Open access

Analysis of Gene Expression Data Using Biclustering Algorithms

Written By

Fadhl M. Al-Akwaa

Submitted: 08 November 2011 Published: 12 September 2012

DOI: 10.5772/48150

From the Edited Volume

Functional Genomics

Edited by Germana Meroni and Francesca Petrera

Chapter metrics overview

3,694 Chapter Downloads

View Full Metrics

1. Introduction

One of the main research areas of bioinformatics is functional genomics; which focuses on the interactions and functions of each gene and its products (mRNA, protein) through the whole genome (the entire genetics sequences encoded in the DNA and responsible for the hereditary information). In order to identify the functions of certain gene, we should able to capture the gene expressions which describe how the genetic information converted to a functional gene product through the transcription and translation processes. Functional genomics uses microarray technology to measure the genes expressions levels under certain conditions and environmental limitations. In the last few years, microarray has become a central tool in biological research. Consequently, the corresponding data analysis becomes one of the important work disciplines in bioinformatics. The analysis of microarray data poses a large number of exploratory statistical aspects including clustering and biclustering algorithms, which help to identify similar patterns in gene expression data and group genes and conditions in to subsets that share biological significance.

1.1. What is clustering?

A large number of clustering definitions can be found in the literature. The simplest definition is shared among all and includes one fundamental concept: the grouping together of similar data items into clusters[1].

Clustering is an important explorative statistical analysis of gene expression data. It aims to identify and group genes that exhibit similar expression patterns over several conditions and also group the conditions based on the expression profiles across set of genes. The successful clustering approach should guarantee two criteria which are homogeneity high similarity between elements in the same cluster, and separation – low similarity between elements from different clusters. When homogeneity and separation are precisely defined, those are two opposing objectives: The better the homogeneity the poorer the separation, and vice versa [2]. Several algorithmic techniques were previously used for clustering gene expression data, including hierarchical clustering [3], self organizing maps [4], and graph theoretic approaches [5].

1.1.1. K-means

K-means is a classical clustering algorithm [6] invented in 1956 to classify or to group objects (genes) based on attributes or features (experimental conditions) into K number of groups (clusters). K is positive integer number and assumed to be known.

K-means computational approach starts by placing K points into the space represented by the objects that are being clustered. These points represent initial group centroids. We can take any random objects as the initial centroids or the first K objects in sequence can also be used as the initial centroids. Then the K means algorithm will do the four steps below until convergence:

  1. Determine the centroids coordinate.

  2. Determine the distance of each object to the centroids using the Euclidean distance.

  3. Group the objects based on minimum distance.

  4. Iterate the above steps till no object moves its assigned group.

Each iteration of k-means modifies the current partition by checking all possible modifications of the solution, in which one element is moved to another cluster. This is done by reducing the sum of distances between objects and the centers of their clusters. This procedure is repeated until no further improvement is achieved (No object move the group) and all the objects are grouped into the final required number of clusters.

A disadvantage of K-means algorithm could be perceived in the need to specify the number of clusters K as a parameter value prior to running the algorithm. In cases where there is no expectation about K, user has to make trails with several values of K or use external techniques to guess the no of clusters may be exist.

1.1.2. Hierarchical clustering (HCL)

Hierarchical clustering does not partition the genes into subsets. Instead it builds a down-top hierarchy of clusters using agglomerative methods or top - down hierarchy of clusters using divisive methods. The traditional graphical representation of this hierarchy is called dendrogram tree. The divisive method begins at the root and starts to breaks up clusters whose having low similarity. Whereas, the Agglomerative method begins at the leaves of the tree and starts with an initial partition into single element clusters and successively merges clusters until all elements belong to the same cluster [3]. (See Figure 1) The agglomerative method is widely used than the divisive one which is not generally available, and rarely has been applied. The idea of the agglomerative method can be summarized as following: Given a set of N items (genes in our case) to be clustered, and an N*N distance (or similarity) matrix [7],

  1. Assign each item to a cluster, so you have N clusters, each containing just one item.

  2. Find the closest (most similar) pair of clusters and merge them into a single cluster.

  3. Compute distances (similarities) between the new cluster and each of the old clusters.

  4. Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.

In Step 3, distance or similarity measurements between the merged clusters and all the other clusters can be calculated in one of three schemes: single-linkage, complete linkage and average-linkage.

Figure 1.

HCL: Agglomerative and Divisive Methods.

1.2. Biclustering

Traditional clustering approaches such as k-means and hierarchical clustering put each gene in exactly one cluster based on the assumption that all genes behave similarly in all conditions. However, recent understanding of cellular processes shows that it is possible for subset of genes to be co expressed under certain experimental conditions, and at the same time; to behave almost independently under other conditions. From this context, a new two mode clustering approach called biclustering or co-clustering has been introduced to group the genes and conditions in both dimensions simultaneously.

This allows finding subgroups of genes that show the same response under a subset of conditions, not all conditions. Also, genes may participate in more than one function, resulting in one regulation pattern in one context and a different pattern in another.

Example, if a cellular process is only active under specific conditions and there is a gene participates in multiple pathways that are differentially regulated, one would expect this gene to be included in more than one cluster; and this cannot be achieved by traditional clustering techniques.

Many biclustering methods exist in the literature [8]. Table 1 summarized some of promising biclustering algorithms developed during the last ten years. In brief, we described some of these algorithms according to their prediction strength, their promising results, to what they extend in the community, whether an implementation was available, and the feedback from their authors to explain some ambiguous issues.

1.2.1. Cheng and Church (CC)

CC algorithm[18] is considered to be the first real biclustering implementation after the primary idea has been introduced by Hartigan [19] in 1972.

AlgorithmApproach TimeComplicityPrediction ability
Bivisu [9] Exhaustive Bicluster Enumeration O(m2nlogm)aCoherent values
MSBE [10] Greedy Iterative Search O((n + m)2)Coherent values
Bimax[11] Divide-and-ConquerO(nmβlogβ)Coherent values
ROBA [12] Matrix algebra O(nmLN)Coherent Evolution
x-motif [13] Greedy Iterative SearchnmO(log(1/α)/log(1/β))Coherent Evolution
SAMBA [14] Exhaustive Bicluster EnumerationO(n2)Coherent Evolution
OPSM [15] Greedy Iterative SearchO(nm3I)Coherent Evolution
Plaid[16] Distribution Parameter IdentificationXXXbCoherent values
ISA [17] Iterative Signature AlgorithmXXXCoherent values
CC [18] Greedy Iterative SearchO((n + m)nm)Coherent values

Table 1.

Biclustering Algorithms Comparison.

CC defines a bicluster as a subset of rows and a subset of columns with a high similarity. The proposed similarity score is called mean squared residue (H) and it is used to measure the coherence of the rows and columns in the single bicluster. Given the gene expression data matrix A = (X;Y); a bicluster is defined as a uniform submatrix (I;J) having a low mean squared residue score as following:

The CC Mean Squared Residue:

H(I,J)=1IJiI,jJ(aijaiJaIj+aIJ)2E1

Where: aij is gene expression level at row i and column j, aiJ is the mean of row i, aI j is the mean of column j, aIJ is the overall mean. CC algorithm will identify the submatrix as a bicluster if the score is below a level alpha which is a user input parameter to control the quality of the output biclusters. Generally; CC algorithm performs the following major steps:

  1. Delete rows and columns with a score larger than alpha.

  2. Adding rows or columns until alpha level is reached.

  3. Iterate these steps until a maximum number of biclusters is reached or no bicluster is found [18].

1.2.2. Iterative Signature Algorithm (ISA)

The ISA algorithm [17, 20] is a novel method for the biclustering analysis of large-scale expression data. It is an efficient algorithm based on the iterative application of the signature algorithm presented in [17]. ISA considers a bicluster to be a transcription module which can be defined as a set of coexpressed genes together with the associated set of regulating conditions (Figure 2). Starting with an initial set of genes, all samples (conditions) are scored with respect to this gene set and those samples are chosen for which the score exceeds a certain threshold (usually defined by the user). In the same way, all genes are scored regarding the selected samples and a new set of genes is selected based on another user-defined threshold. The entire procedure is repeated until the set of genes and the set of samples converge and do not change anymore.

Multiple biclusters can be discovered by running the ISA algorithm on several initial gene sets. This approach requires identification of a reference gene set which needs to be carefully selected for good quality results. In the absence of pre-specified reference gene set, random set of genes is selected at the cost of results quality[17].

Figure 2.

The recurrence signature method. a, The signature algorithm. b, Recurrence as a reliability measure. The signature algorithm is applied to distinct input sets containing different subsets of the postulated transcription module. If the different input sets give rise to the same module, it is considered reliable. c, General application of the recurrent signature method. Copyright © [17].

1.2.3. Biclusters Inclusion Maximal (Bimax)

Bimax[11] is a simple binary model and new fast divide-and-conquer algorithm used to cluster the gene expression data. It is presented in 2006 by Computer Engineering and Networks Laboratory ETH Zurich, Switzerland. Bimax discretized the gene expression data matrix and convert it into a binary matrix by identifying a threshold, so transcription levels (genes expression values) above this threshold become ones and transcription levels below become zeros (or vice versa). Then, it searches for all possible biclusters that contain only ones. This can be done by iterating these steps:

  1. Rearrange the rows and columns to concentrate ones in the upper right of the matrix.

  2. Divide the matrix into two sub matrices.

  3. Whenever in one of the submatrices only ones are found, this sub matrix is returned.

1.2.4. Order Preserving Submatrix(OPSM)

The order-preserving submatrix (OPSM) algorithm [15] is a probabilistic model introduced to discover a subset of genes identically ordered among a subset of conditions. It focuses on the coherence of the relative order of the conditions rather than the coherence of actual expression levels. In other words, the expression values of the genes within a bicluster induce an identical linear ordering across the selected conditions. Accordingly, the authors define a bicluster as a subset of rows whose values induce a linear order across a subset of the columns. The time complexity of this model is O(nm3I) where n andmare the number of rows and columns of the input gene expression matrix respectively and I is the number of biclusters. A disadvantage of OPSM algorithm is that it takes long time for high dimensional datasets. And this is because its time complexity is cubic with regards to the number of columns (dimensions) of the input matrix [15].

1.2.5. Maximum Similarity Bicluster(MSBE)

MSBE Biclustering algorithm [10] is a novel polynomial time algorithm to find an optimal biclusters with the maximum similarity. The idea behind this algorithm is to find subset of genes that are related to a reference gene. The reference gene is known in advance. MSBE algorithm uses the similarity score for a sub-matrix to find the similar expressions in the microarray datasets. And the threshold of the average similarity score is a user input parameter in order to allow the user to control the quality of the biclustering results.

1.3. Clustering or biclustering

Clustering algorithms [21-23] have been used to analyze gene expression data, on the basis that genes showing similar expression patterns can be assumed to be co-regulated or part of the same regulatory pathway. Unfortunately, this is not always true. Two limitations obstruct the use of clustering algorithms with microarray data. First, all conditions are given equal weights in the computation of gene similarity; in fact, most conditions do not contribute information but instead increase the amount of background noise. Second, each gene is assigned to a single cluster, whereas in fact genes may participate in several functions and should thus be included in several clusters[24].

A new modified clustering approach to uncovering processes that are active over some but not all samples has emerged, which is called biclustering. A bicluster is defined as a subset of genes that exhibit compatible expression patterns over a subset of conditions [11].

During the last ten years, many biclustering algorithms have been proposed (see [8] for a survey), but the important questions are: which algorithm is better? And do some algorithms have advantages over others?

Recently Kevin et al.[25]proposed a semantic web algorithm to recommend the best algorithm based on user inputs like: is the dataset contain outliers, is it allowed to get overlapped clusters and the time to retrieve the biclusters.

Generally, comparing different biclustering algorithms is not straightforward as they differ in strategies, approaches, time complicity, number of parameters and prediction ability. In addition, they are strongly influenced by user selected parameter values. For these reasons, the quality of biclustering results is often considered more important than the required computation time. Although there are some analytical comparative studies to evaluate the traditional clustering algorithms[21-23], for biclustering; no such extensive comparison exist even after initial trails have been taken [11]. In the end, Biological merit is the main criterion for evaluation and comparison between the various biclustering methods.

In this chapter we attempt to develope a comparative tool (Bicat-Plus) which is showen in Figure 3 that includes the biological comparative methodology and to be as an extension to the BicAT program[26].

The Goal of BicAT-Plus is to enable researchers and biologists to compare between the different biclustering methods based on set of biological merits and draw conclusion on the biological meaning of the results. In addition, BicAT-Plus help researchers in comparing and evaluating the algorithms results multiple times according to the user selected parameter values as well as the required biological perspective on various datasets.

BicAT-Plus has many features, which could be summarized in the following:-

Algorithms required to be compared could be selected from the biclustering list (left list) to the compared list (right list). External biclustering results for other algorithms could be included in the comparison process. In addition, the organism model, selectable significance level, and GO category should be selected. Finally, Comparison criteria have to be selected based on the user biological metric.

  1. User could perform biclusters functional analysis using the three Gene Ontology (GO) categories (biological process, molecular function and cellular component) (Figure3 with label number 1).

  2. User could evaluate the quality of each biclustering algorithm results after applying the GO functional analysis and display the percentage of the enriched biclusters at different P-values (Figure3 with label number 2).

  3. User could compare between the different biclustering algorithms according to the percentage of the functionally enriched biclusters at the required significance levels, the selected GO category and with certain filtration criteria for the GO terms. (Figure3 with label number 3).

  4. User could evaluate and compare the results of external biclustering algorithms. This gives the BicAT-plus the advantage to be a generic tool that does not depend on the employed methods only. For example, it can be used to evaluate the quality of the new algorithms introduced to the field and compare against the existing ones. (Figure 3 with label number 4).

  5. User could display the results using graphical and statistical charts visualizations in multiple modes (2D and 3D).

Figure 3.

BicAT-Plus Comparison Panel.

Advertisement

2. Materials and methods

Before using the BicAT-Plus, Active Perl version 5.10 and Java Runtime Environment (JRE) version 6 are required to be installed on your machine. BicAT-Plus has been tested and show good performance on a PC machine with the following configurations: CPU: Pentium 4, 1.5 GHZ, RAM: 2.0 GB, Platform: windows XP professional with SP2.

Figure 4.

Functional analysis results of the selected bicluster. Each column represents an enriched GO functional class. And the height of the column is proportional to the significance of this enrichment (i.e. height = -log (p-value).

2.1. GO overrepresentation programs

Many programs like: BINGO[27], FUNCAT[28], GeneMerge[29] and FuncAssociate[30] were used to investigate whether the set of genes discovered by biclustering methods present significant enrichment with respect to a specific GO annotation provided by Gene Ontology Consortium [31]. BicAT-Plus used GeneMerge program as the most popular GO program. GeneMerge provides a statistical test for assessing the enrichment of each GO term in the sample test. The basic question answered by this test is as described by Steven et al.[27] "when sampling X genes (test set) out of N genes (reference set, either a graph or an annotation), what is the probability that x or more of these genes belong to a functional category C shared by n of the N genes in the reference set? The hypergeometric test, in which sampling occurs without replacement, answers this question in the form of P-value. It's counterpart with replacement, the binomial test, which provides only an approximate P-value, but requires less calculation time."

2.2. Comparative methodologies

BicAT-Plus provides reasonable methods for comparing the results of different biclustering algorithms by:

  • Identifying the percentage of enriched or overrepresented biclusters with one or more GO term per multiple significance level for each algorithm. A bicluster is said to be significantly overrepresented (enriched) with a functional category if the P-value of this functional category is lower than the preset threshold. The results are displayed using a histogram for all the algorithms compared at the different preset significance levels, and the algorithm that gives the highest proportion of enriched biclusters for all significance levels is considered the optimum because it effectively groups the genes sharing similar functions in the same bicluster.

  • Identifying the percentage of annotated genes per each enriched bicluster.

  • Estimating the predictive power of algorithms to recover interesting patterns. Genes whose transcription is responsive to a variety of stresses have been implicated in a general Yeast response to stress (awkward). Other gene expression responses appear to be specific to particular environmental conditions. BicAT-Plus compares biclustering methods on the basis of their capacity to recover known patterns in experimental data sets. For example, Gasch et al.[32] measure changes in transcript levels over time responding to a panel of environmental changes, so it was expected to find biclusters enriched with one of response to stress (GO:0006950), Gene Ontology categories such as response to heat (GO:0009408), response to cold (GO:0009409) and response to glucose starvation(GO:0042149). The details of this comparison strategy are described in the results and in Table 3.

2.3. Comparison Process Steps

The following process diagram shown in Fig 5 summarizes the required steps by the user to compare between the different algorithms using the BicAT-plus:

  1. Download BicAT-Plus from (www.bioinformatics.org/bicat-plus/).

  2. Load Gene Expression Data to BicAT-Plus then run the selected five prominent biclustering methods with setting parameters as shown in Table 2.

  3. Run GO comparison tool in the BicAT-Plus and add the available biclustering algorithms to the compared list as shown in Fig 1.

  4. Select the available GO category e.g. biological process, molecular function and cellular components.

  5. Select the P-values e.g. 0.00001, 0.0001, 0.01, 0.005, and 0.05.

  6. Press compare button.

  7. Press comparison menu, Functional enrichment and select 2D or 3D charts.

Bi/clustering AlgorithmParameter settings
ISAtg = 2.0, tc = 2.0, seeds = 500
CCδ = 0.5, α = 1.2, M = 100
OPSMl = 100
BiVisuΕ = 0.82, Nr = 10, Nc = 5, Po = 25
K-meansK=100

Table 2.

Default Parameter settings of the compared bi/clustering methods. The definitions of these parameters are listed in their original publications [9, 15, 17-18, 20] respectively.

Advertisement

3. Results & discussion

The above comparison steps is performed on the gene expression data of S. cerevisiae provided by Gasch [32]. The dataset contains 2993 genes and 173 conditions of diverse environmental transitions such as temperature shocks, amino acid starvation, and nitrogen source depletion. This dataset is freely available from Stanford University website [33]. For each biclustering algorithm, we used the default parameters as authors recommend in their corresponding publications. See Table 2.

Figure 5.

BicAT-Plus Comparison process steps

3.1. The percentage of enriched function

After applying the above steps on Gasch data[32], BicAT-plus produce the histogram shown in Fig 6. Investigating this figure, we observed that OPSM algorithm gave a high portion of functionally enriched biclusters at all significance levels (from 85% to 100 %). Next to OPSM, ISA show relatively high portions of enriched biclusters.

In order to evaluate the ability of the algorithms to group the maximum number of genes whose expression patterns are similar and sharing the same GO category, we use the filtration criteria developed in the comparative tool by neglecting those bi/clusters which have study fraction less than 25%. The study fraction of a GO term is the fraction of genes in the study set (bicluster) with this term.

StudyfractionofaGOterm=NoofgenessharingtheGOterminabiclustertotalnumberofgenesinthisbicluster×100E2

Figure 7 shows that OPSM and ISA have highly enriched biclusters/clusters that have large number of genes per each GO category. On the other hand, Bivisu biclusters are strongly affected by this filtration and they contain a lower number of genes per each category. This filtration will help in identifying the powerful and most reliable algorithms which are able to group maximum numbers of genes sharing same functions in one bicluster.

3.2. The predictability power to recover interested pattern

The user could compare bi/clusters algorithms based on which of them could recover defined pattern like which one of them could recover bi/clusters which have response to the conditions applied in Gasch experiments. In Table 2, the difference between the biclusters/clusters contents were summarized.

Figure 6.

Percentage of biclusters significantly enriched by GO Biological Process category (S. cerevisiae) for the five selected biclustering methods and K-means at different significance levels p.

Figure 7.

Percentage of significantly enriched biclusters by GO Biological Process category by setting the allowed minimum number of genes per each GO category to 10 and the study fraction to large than 50%.

Although OPSM show high percentage level of enriched biclusters (as shown in Fig 6, 7), its biclusters do not contain any genes within any GO category response to Gasch experiments. The k-means and Bivisu cluster/bicluster results distinguished a unique GO category, which is GO: 0000304 (response to singlet oxygen), and GO: 0042542 (response to hydrogen peroxide) The powerful usage of these bicluster algorithms is significantly appeared in GO: 0006995 "cellular response to nitrogen starvation" where these algorithms were able to discover 4 out of 5 annotated genes without any prior biological information or on desk experiments.

GO Term / (number
of annotated genes)
K-meansCCISABivisuOPSM
GO:0042493
Response to drug / (118)
45760
GO:0006970
response to osmotic stress / (83)
35630
GO:0006979
response to oxidative stress / (79)
271100
GO:0046686
response to cadmium ion / (102)
23220
GO:0043330
response to exogenous dsRNA / (7)
23220
GO:0046685
response to arsenic / (77)
20220
GO:0006950
response to stress / (532)
9111620
GO:0009408
response to heat / (24)
30220
GO:0009409
response to cold / (7)
00200
GO:0009267
cellular response to starvation / (44)
02000
GO:0006995
cellular response to nitrogen starvation / (5)
44400
GO:0042149
cellular response to glucose starvation / (5)
02000
GO:0009651
response to salt stress / (15)
27000
GO:0042542
response to hydrogen peroxide /(5)
00020
GO:0006974
response to DNA damage stimulus / (240)
022030
GO:0000304
response to singlet oxygen / (4)
20000

Table 3.

Gene Ontology category per number of annotated genes of the Bicluster/cluster algorithm results for the experimental condition on Gasch Experiments[32].

Advertisement

4. Conclusion

We have introduced the BicAT-Plus with reasonable comparative methodology based on the Gene Ontology. To the best of our knowledge such an automatic comparison tool of the various biclustering algorithms has not been available in the literature. BicAT-Plus is an open source tool written in java swing and it has a well structured design that can be extended easily to employ more comparative methodologies that help biologists to extract the best results of each algorithm and interpret these results to useful biological meaning.

In other words, the algorithms that show good quality of results (per the dataset) can be used to provide a simple means of gaining leads to the functions of many genes for which information is not available currently (unannotated genes).

Using BicAT-Plus, we can identify the highly enriched biclusters of the whole compared algorithms. This might be quite helpful in solving the dimensionality reduction problem of the Gene Regulatory Network construction from the gene expression data. This problem originates from the relatively few time points (conditions or samples) with respect to the large number of genes in the microarray dataset.

Finally there are several aspects of this research that worth further investigation, according to the Studies carried out so far and also introducing new ideas for consideration

  1. Enrich the BicAT-Plus with more comparative methodologies beside GO. For example, KEGG and promoter analysis by identifying the transcription factors for the clustered genes.

  2. Extend the BicAT-Plus to provide users with multiple export options for the interested enriched biclusters.

  3. Embed the BicAT-Plus as a plug-in in the Cytoscape platform[34] which is open source bioinformatics software for visualizing molecular interaction networks and biological pathways and integrating these networks with annotations, gene expression profiles and other state data. Thus, very promising challenge is to get use of the highly enriched biclusters identified by the BicAT-Plus in solving these integrated networks in the Cytoscape.

References

  1. 1. FungG. A.ComprehensiveOverview.ofBasic.ClusteringAlgorithms.Citeseer 20012001137
  2. 2. SharanR.ElkonR.ShamirR.Clusteranalysis.itsapplications.togene.expressiondata.Ernst Schering Res Found Workshop 2002200283108
  3. 3. EisenM. B.SpellmanP. T.BrownP. O.BotsteinD.Clusteranalysis.displayof.genome-wideexpression.patternsProceedings of the National Academy of Sciences of the United States of America199814863 EOF
  4. 4. TamayoP.DSMesirovJ.ZhuQ.KitareewanS.DmitrovskyE.LanderE. S.GolubT. R.Interpretingpatterns.ofgene.expressionwith.self-organizingmaps.Methodsapplicationto.hematopoieticdierentiation.In: Proceedings of the National Academy of Sciences of the United States of America,: 199929072912
  5. 5. Sharan RSaR: Click: a clustering algorithm for gene Expression analysis.In: Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology: 2000307316
  6. 6. TavazoieS.JDHughesCampbell.MJChoR. J.ChurchG. M.Systematicdetermination.ofgenetic.networkarchitecture.Nature Genetics1999281 EOF5 EOF
  7. 7. JohnsonS.Hierarchicalclustering.schemesPsychometrika.1967
  8. 8. Madeira SC, Oliveira AL: Biclustering algorithms for biological data analysis: a survey.IEEE/ACM Trans Comput Biol Bioinform 200424 EOF45 EOF
  9. 9. Cheng KO, Law NF, Siu WC, Lau TH: BiVisu: software tool for bicluster detection and visualization.Bioinformatics20072342 EOF2344 EOF
  10. 10. LiuX.WangL.Computingthe.maximumsimilarity.bi-clustersof.geneexpression.dataBioinformatics200750 EOF56 EOF
  11. 11. PrelicA.BleulerS.ZimmermannP.WilleA.BuhlmannP.GruissemW.HennigL.ThieleL.ZitzlerE. A.Systematiccomparison.evaluationof.biclusteringmethods.forgene.expressiondata.Bioinformatics20061122 EOF1129 EOF
  12. 12. TchagangA.TwefikA.Robustbiclustering.algorithm. R. O. B. A.forD. N. A.microarraydata.analysisIn: IEEE/SP 13thWorkshop on Statistical Signal Processing. 20052005984989
  13. 13. MuraliT. M. S. K.Extractingconserved.geneexpression.motifsfrom.geneexpression.dataIn: Pac Symp Biocomput. 200320037788
  14. 14. TanayA.KupiecR. S. M.ShamirR.Revealing.modularityorganizationin.theyeast.molecularnetwork.byintegrated.analysisof.highlyheterogeneous.genomewidedata.In: Proceedings of the National Academy of Sciences of the United States of America: 200429812986
  15. 15. Ben-DorA.ChorB.KarpR.YakhiniZ.Discoveringlocal.structurein.geneexpression.datathe.order-preservingsubmatrix.problemJournal of Computational Biology2003373 EOF384 EOF
  16. 16. WangH.YangW. W. J.YuP. S.Clustering.byPattern.Similaritythe. p.ClusterAlgorithm. S. I. G. M. O.SIGMOD 2002
  17. 17. IhmelsJ.FriedlanderG.BergmannS.SarigO.ZivY.BarkaiN.Revealingmodular.organizationin.theyeast.transcriptionalnetwork.Nature Genetics2002370 EOF7 EOF
  18. 18. ChengY.ChurchG. M.Biclusteringof.expressiondata.Proceedings of 8th International Conference on Intelligent Systems for Molecular Biology 2000200093103
  19. 19. HartiganJ.DirectClustering.ofa.datamatrix.Journal of the American Statistical Association1972123 EOF
  20. 20. IhmelsJ.BergmannS.BarkaiN.Definingtranscription.modulesusing.large-scalegene.expressiondata.Bioinformatics20041993 EOF2003 EOF
  21. 21. TavazoieS.HughesJ.CampbellM.ChoR.ChurchG.Systematicdetermination.ofgenetic.networkarchitecture.Nature Genetics1999281 EOF5 EOF
  22. 22. GuthkeR.MollerU.HoffmannM.ThiesF.TopferS.Dynamicnetwork.reconstructionfrom.geneexpression.dataapplied.toimmune.responseduring.bacterialinfection.Bioinformatics20051626 EOF1634 EOF
  23. 23. D’haeseleerP.LiangS.SomogyiR.Geneticnetwork.inferencefrom.co-expressionclustering.toreverse.engineeringBioinformatics 2000707 EOF26 EOF
  24. 24. ReissD.BaligaN.BonneauR.Integratedbiclustering.ofheterogeneous.genome-widedatasets.forthe.inferenceof.globalregulatory.networksB. M.BMC Bioinformatics2006280 EOF
  25. 25. YipK.YaQ.PeishenSchultzMartin.CheungDavid. W.Cheung-HoiKei.SemBiosphere. A.SemanticWeb.Approachto.RecommendingMicroarray.ClusteringServices.In: The Pacific Symposium on Biocomputing. 20062006188199
  26. 26. BarkowS.BleulerS.PrelicA.ZimmermannP.ZitzlerE.BicA. T. a.biclusteringanalysis.toolboxBioinformatics20061282 EOF1283 EOF
  27. 27. MaereS.HeymansK.KuiperM.BiN. G. O. a.Cytoscapeplugin.toassess.overrepresentationof.GeneOntology.categoriesin.BiologicalNetworks.Bioinformatics 2005
  28. 28. RueppA.ZollnerA.MaierD.AlbermannK.HaniJ.MokrejsM.TetkoI.GuldenerU.MannhauptG.MunsterkotterM.et al.TheFun.Cata.functionalannotation.schemefor.systematicclassification.ofproteins.fromwhole.genomesNucl Acids Res 2004
  29. 29. Castillo-Davis CI, Hartl DL: GeneMerge- post-genomic analysis, data mining, and hypothesis testing. Bioinformatics2003
  30. 30. BerrizG. F.KingO. D.BryantB.SanderC.RothF. P.Characterizinggene.setswith.FuncAssociate.Bioinformatics20032502 EOF2504 EOF
  31. 31. AshburnerM.BallC.BlakeJ.BotsteinD.ButlerH.CherryJ.DavisA.DolinskiK.DwightS.EppigJ.et al.Geneontology.toolfor.theunification.ofbiology.The Gene Ontology Consortium. Nat Genet 2000
  32. 32. GaschA. P.SpellmanP. T.KaoC. M.Carmel-HarelO.EisenM. B.StorzG.BotsteinD.BrownP. O.GenomicExpression.Programsin.theResponse.ofYeast.Cellsto.EnvironmentalChanges.Mol Biol Cell 20004241 EOF57 EOF
  33. 33. http://genome-www.stanford.edu/yeast/_stress
  34. 34. ShannonP.MarkielA.OzierO.BaligaN.WangJ.RamageD.AminN.SchwikowskiB.IdekerT.Cytoscapea.softwareenvironment.forintegrated.modelsof.biomolecularinteraction.networksGenome Res 2003

Written By

Fadhl M. Al-Akwaa

Submitted: 08 November 2011 Published: 12 September 2012