Plant Comparative Transcriptomics Reveals Functional Mechanisms and Gene Regulatory Networks Involved in Anther Development and Male Sterility

Gene transcription and transcriptional regulation are crucial biological processes in all cellular life. Through the next-generation sequencing (NGS) technology, transcriptome data from different tissues and developmental stages can be easily obtained, which provides us a powerful tool to reveal the transcriptional landscape of investigated tissue(s) at special developmental stage(s). Anther development is an important process not only for sexual plant reproduction but also for genic male sterility (GMS) used in agriculture production. Plant comparative transcriptomics has been widely used to uncover molecular mechanism of GMS. Here, we focused on researches of anther developmental process and plant GMS genes by using comparative transcriptomics method. In detail, the contents include the following: (1) we described the commonly used flowchart in comparative transcriptomics; (2) we summarized the comparative strategies used to analyze transcriptome data; (3) we presented a case study on a maize GMS gene, ZmMs33 ; (4) we described the methods and results previ-ously reported on gene co-expression and gene regulatory networks; (5) we presented the workflow of a case study on gene regulatory network reconstruction. The further development of comparative transcriptomics will provide us more powerful theoretical and application tools to investigate molecular mechanism underlying anther development and plant male sterility.


Introduction
Gene transcription is an important biological process by which genetic information stored within DNA molecules is transmitted to RNA molecules according to the "genetic central dogma" in molecular biology [1]. After completion of the human genome project, the researchers began to reveal the transcriptional Transcriptome Analysis landscape of all genes in a genome to further investigate the functional mechanisms underlying phenotypic variations at a genome-wide transcriptional level. Therefore, biological studies on high-throughput omics data run from the genomic level into the transcriptomic level. Transcriptome data includes biological information of gene transcriptional activities in a certain cell, a tissue, or an individual (a population of cells) and even in a pool of samples under a certain developmental stage, an environmental condition, or an experimental treatment. Compared with other omics data (e.g., data of genome, epigenome, proteome, metabolome, or phenome), the primary characteristic of transcriptome data is that it includes temporal-spatial bioinformation affected by diverse developmental stages, tissue types, and internal/external environment events. Therefore, transcriptome data is more complex than genome data.
Transcriptomic studies usually focus on the transcriptional content and gene regulations in a genome. Gene expression microarray (GEM) is an early developed but still-utilized biotechnology by which the genome-wide transcription information can be obtained for genome-sequenced or transcriptional loci available species. In 1995, Schena et al. monitored expression levels of 48 genes by GEM in Arabidopsis thaliana [2], and then GEM was gradually and widely used for the estimation of gene expression levels. Until 2013, the amount of transcripts monitored by one microarray had been reached to more than 285,000 in human transcriptomics studies (the human transcriptome array). GEM is a hybrid-based method, while the sequencing-based method has been developed much faster and became one of the most commonly used biotechnologies in scientific studies and applications related to disease diagnosis [3]. Serial analysis of gene expression (SAGE) proposed by Velculescu et al. [4] and massively parallel signature sequencing (MPSS) reported by Brenner et al. [5] are two earlier developed sequencing-based methods to estimate the transcription information at a genome level. Nowadays, the majority of transcriptome data are generated by the NGS-based RNA sequencing (RNA-seq). RNA-seq technology combining with the following developed comparative transcriptomics analysis flowchart that is mainly based on digital gene expression profile (DGEP) is a commonly used research strategy in biological studies at molecular and genomic levels.
Anther is an important organ in sexual plant reproduction. Anther development is a dynamic process from the identity of the stamen to the production of mature pollen grains. During this period, two-thirds of protein-coding genes are transcribed, and more than 6% of them are anther specific (a reanalyzed result based on [6]). Thus, the anther transcriptome is specific and complex compared with transcriptomes of other plant organs. Plant comparative transcriptomics is an effective strategy used to investigate the molecular mechanism underlying anther developmental process. The comparative method based on anther transcriptomes can be performed between different genotypes, different developmental stages, different types of anther cells, and different biotic or abiotic treatments and even between different plant species. Consequently, differentially expressed genes (DEGs) are identified from above comparisons. Based on the comparison results, functionally important coding genes and noncoding transcripts including long noncoding RNAs (lncRNAs), microRNAs (miRNAs), and other small RNAs could be uncovered. However, the goal of plant comparative transcriptomics is not only to identify DEGs but also to reconstruct gene regulatory relationships of the upstream regulators and the downstream regulated targets of the investigated genes. In this review, based on anther transcriptomes, we first summarized the research workflow commonly used in the experimental design and data analyses in plant transcriptomics studies, and then we described several types of comparison strategies in comparative transcriptomics using anther transcriptome data as the analyzed example. In the following section, we generally discussed gene regulatory and co-expression networks used to investigate the molecular foundation of developing anther in a network-based perspective. Additionally, we described two case studies in our laboratory to explain the detailed analysis processes and applications of comparative transcriptomics in plant GMS gene studies.

Comparative analysis using transcriptome data
In comparative transcriptomics, the commonly used pipeline to identify potential functional genes and to reveal the gene functions, as well as to investigate the regulatory relationships between these genes, includes five aspects. They are data preparation, DGEP analysis, DEG analysis, gene set enrichment (GSE) analysis, and gene regulatory network (GRN) analysis, respectively (Figure 1). These five aspects are closely connected in the whole pipeline, and the corresponding analyses mainly depend on data management skills in bioinformatics.
The basic application of comparative transcriptomics is to obtain a transcriptional landscape of the investigated biological sample. It is composed of not only the estimated transcription levels of annotated transcribed loci along the genomes (the known genomic loci with reported or predicted transcription abilities) but also the identification of novel transcribed loci (the stably transcribed loci not annotated or identified in previous studies). More importantly, in current biological studies, transcribed loci identified by researchers include not only the proteincoding genes but also lncRNAs and other noncoding RNAs. Both GEM and RNA-seq technologies can be used to uncover the genome-wide profiles of transcription levels of annotated genes. However, the identification of novel transcribed loci can be only effectively performed by RNA-seq method and the following DGEP analysis. This is one reason why RNA-seq is more commonly used in transcriptomics studies. Moreover, GEM method depends on hybridization probes that are designed based on known whole genome sequence or an appreciable set of sequenced transcripts Transcriptome Analysis 4 (e.g., expressed sequence tags) of the investigated species, which restrict its application on some species without whole genome information or sequence resource. On the contrary, the sequencing-based method of RNA-seq can be applicable for species without sequenced genomes. This is another reason for the popularity of RNA-seq. In genome available species, RNA-seq data should be firstly mapped to the reference genome (Figure 1).
A gene with its transcription levels significantly different between two groups of samples is defined as a DEG under a certain comparison condition (Figure 1). It is notable that the concept of DEG specially represents the expression changes of protein-coding genes at the earlier stages of expression data analysis. However, along with the rapid development of molecular biology and the deeper understanding on the functional element on the genome, the concept of DEG has been expanded to noncoding transcripts, for example, the differentially expressed (DE) miRNA and the DE lncRNA. Furthermore, if both coding and noncoding transcripts are considered in the comparative analysis of transcriptome data, transcriptional alterations between control and treated samples should be defined as DE transcribed loci or DE loci. Thus, DE loci is a broad concept used to describe transcriptional alterations of genetic element. There are several strategies for comparing transcriptomes from different research subjects to identify DE loci (described in Section 3, "Plant comparative transcriptomics in anther").
Identified DEG set or DE loci should be appropriately annotated with functional descriptions to determine which biological process or pathway these DEGs are involved in. In comparative transcriptomics, this step is a critical bridge linking transcriptional changes to gene functions and even gene regulation networks. Two commonly utilized methods to annotate DEGs consist of the Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway gene enrichment analyses. Both of them belong to GSE analysis (Figure 1). The GO database includes tens of thousands of GO terms, and each GO term contains several genes with the same biological function. Each gene has three functional aspects, including molecular function (the molecular activities of gene products), cellular component (the cellular locations of functional gene products), and biological process (the gene products' molecular functions with biological process). GSE analysis based on GO database provides some basic functional descriptions for the investigated DEG set. KEGG analysis is a pathway-based enrichment method. The KEGG database has accumulated hundreds of metabolism pathways in plants, animals, and other species. Thus, KEGG analysis can reveal significant pathways the DEGs participated in. GO-based methods can annotate more genes than KEGG-based method, as the GO terms are more flexible and include a larger number of genes. On the other hand, because most metabolic pathways are conserved across species and more significant in biological processes, annotated results obtained from KEGG-based method may be more conserved and stable. In comparative transcriptomics, GO-and KEGGbased analyses are together utilized in gene function studies.
The locations of transcribed loci on the genome, their transcription levels, and the changed expression can be identified through comparative transcriptomics analysis. The detected DEG set represents a functional gene set related to the function of investigated gene, the phenotype variation, the stress resistance ability, or the development process. Furthermore, gene regulation relationships are the underlining molecular mechanism of altered transcriptomes, and novel gene regulatory networks could be uncovered by comparative transcriptomics analysis (Figure 1). Several types of gene regulatory relationships and the reconstructions of gene regulatory networks based on plant comparative transcriptomics are described and discussed in Section 5 ("Gene co-expression and regulatory networks reconstructed by comparative transcriptomics method").

Plant comparative transcriptomics in anther
One of the major subjects of modern molecular biology is to uncover the functions of genes in the genome and reveal the molecular mechanism of phenotypic variation. Gene transcription levels and their changes in different conditions are important information that can reflect the functions and transcriptional regulation relationships of investigated genes. How to estimate the transcription levels of genes and how to obtain the transcriptional landscape of a genome are two major subjects in biological studies on gene expression. DGEP and DEG analyses are powerful tools to solve these questions. In DEG analysis, according to the scientific or application questions, the comparison strategies between investigated biological samples are classified into six types including (1) different genotypes, (2) different developmental stages, (3) different tissues, (4) different cell types, (5) different treatments, and (6) different species (Figure 2). Here, as we mainly focus on comparative transcriptomics analysis on the developmental anther tissues and the interspecies analysis on anther transcriptome data being rare, the third and sixth types will not be discussed.

Different genotypes
There are two types of genotype-based transcriptome data between wild type (WT) and mutant lines in GMS studies, which are based on whether the causal mutation is known or not ( Table 1). For transcriptomes of male sterility (MS) lines with known causal mutations, the comparison of transcriptomes between WT and MS lines will identify many DEGs associated with the function loss or expression change of the investigated mutation locus. If the causal mutation has not been identified from the MS line, comparative transcriptomics analyses will provide the researchers important results related to the unsettled genetic difference, such as how many genes are changed in expression levels in the MS lines and what the functions of these genes are, even though the causal mutation candidates can be inferred from these genes if the researchers have primary mapping results.

Different developmental stages
The phenotypic differences among tissues and organs (e.g., root, leaf, and flower in plant) due to their differences of transcriptome landscape are well known.  Furthermore, it is a developmental process for most types of plant organs from the organ identity (e.g., meristematic cells) to the final mature organ. Thus, how to reveal the dynamic changes of gene transcription levels and how to explain the morphological alterations regulated by gene expression changes are important tasks in plant comparative transcriptomics studies. Meiosis is an important step in gametophyte generation process and sexual plant reproduction. Morphologic changes during cell meiosis process have been well described by cellular level investigations, while the molecular level alterations and their corresponding gene regulatory networks are not well understood. Plant transcriptomes are a powerful dataset to estimate the gene expression changes and infer the regulatory roles of key genes. Based on GEM technology, Ma et al. investigated maize anther transcriptomes during seven developmental stages and found that transcriptomes during meiosis stages exhibited the lowest complexity [33]. Hollender et al. surveyed the gene transcription profiles of anther of woodland strawberry (Fragaria vesca) from developmental stages 7-12 and identified numerous F-Box genes induced in transcription levels at meiosis stage [34]. Besides, tapetum is the inner cell layer of anther with important functions in anther development and gametocyte maturation. The generation, development, and degradation of tapetum are fine regulated during the anther development, while the regulatory framework and the details are far from complete. Yue et al. identified 243 DEG and 108 stage-specific genes during four anther developmental stages in Hamelia patens [35]. Chen et al. investigated the expression of genes involving in tapetum development of male floral bud during eight developmental stages in Populus tomentosa [36]. Thus, anther transcriptome data during different developmental stages provide valuable data sources for anther development studies. By the combination of comparative transcriptomics and bioinformatics analyses, more key functional genes and the underlying regulatory mechanisms for anther development will be further revealed.

Different types of anther cells
The cytological structure of anther consists of four cell layers, including the epidermis, endothecium, middle layer, and tapetum, and the archesporial cells are directly surrounded by the tapetum. Thus, the transcriptome data of a whole anther tissue is a mixed gene expression data from diverse cell types with different functions in the anther development process. It is necessary to obtain transcriptional dynamics from different cell layers separately to investigate anther development and the underlying molecular mechanism at a cell type-specific level. Several studies have identified cell layer-specifically expressed genes (e.g., tapetum cells or microgametes). Ma et al. identified 104 MS-related and non-pollen expressed genes most specifically expressed in tapetum by comparative transcriptomics analysis on four diverse MS lines in Brassica oleracea [37]. The other way to obtain cell layer-specific transcriptome in anther is firstly separating the investigated cell layer by laser capture microdissection (LCM) technology and then performing RNA-seq or GEM experiment on the separated samples. This strategy has been successfully used in rice, maize, and woodland strawberry to identify the tapetum-or microgamete-specifically expressed genes and their expression dynamics [34,38,39]. A recent published research has investigated maize male meiosis using singlecell RNA sequencing (scRNA-seq) technology on pre-meiotic and meiotic cells from maize anthers, which greatly promoted studies on plant anther scRNA-seq [40]. The comparative studies on transcriptomic dynamics between different types of cells facilitate the deeper understanding of functions of specific cell layers on anther development.

Different treatments
At the reproductive stage, plant is more sensitive to external environment conditions. The abiotic stresses, such as high temperature, drought, and cold and freezing stresses, will critically affect the developmental process of anther and pollen in flowing plants. Though there have been numerous studies on stress resistance and response in plant, the regulatory pathways of stress response and their cross talk at molecular level should be further investigated for anther development. Additionally, more effective stress-resistant genes should be identified for the purpose of crop improvement. Plant comparative transcriptomics between normal and stress-treated plants provide a wide insight into the stress response mechanisms of plant during sexual reproductive stage. Zhang et al. investigated the genomewide transcriptional changes of rice panicle under heat treatment (40°C) and found thousands of DEGs participating in transcriptional regulation, transport, cellular homeostasis, and stress response [41]. Studies on photosensitive or thermosensitive GMS lines can also reveal a lot of genes responding to environmental changes.

A case study: revealing the molecular functions of a MS gene, ZmMs33, by comparative transcriptomics
The discoveries of genes that play key roles in the development of maize anther provide important genetic resources for the utilization of heterosis in maize. Analysis of functional mechanism of GMS genes can effectively promote researches on anther development biology and deepen our understanding of molecular mechanism controlling sexual plant reproduction [42]. There are several published case studies containing comparative transcriptomics analysis on maize GMS genes in our laboratory, including ZmMs7 [43], ZmMs20 [44], ZmMs30 [45], and ZmMs33 [46,47]. We used comparative transcriptomics analysis based on developmental anthers of ZmMs33 wild type and ms33-6038 mutant to analyze the transcription changes corresponding to male sterility phenotype and to further investigate the underlying molecular mechanisms of GMS regulated by ZmMs33 gene.
This ms33-6038 mutant is complete male sterility and displays small and paleyellow anthers ( Figure 3A). Transmission electron microscope (TEM) observation and dynamic scanning electron microscopy (SEM) analysis were performed to analyze the phenotypic alteration of anther wall layers, microspores, Ubisch bodies, and exine between wild type and ms33-6038 mutant during anther developmental stages (Figure 3A-C).
Maize Zm00001d007714 was identified as ZmMs33 via a map-based cloning approach (Figure 3D). ZmMs33 encodes an esterase that belongs to gene family of glycerol-3-phosphate acyltransferase (GPAT) in maize. To further confirm gene function of Zm00001d007714, a CRISPR/Cas9 system was used to generate targeted knockout lines. Three types of T 0 -generation maize plants homozygous for null alleles of Zm00001d007714 were observed to be complete male sterility (Figure 3E), suggesting that function loss of Zm00001d007714 is the causal mutation for male sterile phenotype of the ms33 mutant.
Subsequently, RNA-seq was performed using anther tissues during developmental stages 5-9 to obtain a comprehensive transcriptional profile of WT and ms33-6038. Three biological samples were collected at each developmental stage for sequencing. After data preparation and transcription level estimation, we compared similarities of transcriptional profiles of protein-coding genes by principal component analysis (PCA) (Figure 3F) and found good repeatability among three biological repeats. Finally, we identified DEGs between WT and mutant and between adjacent developmental stages, separately. We found that the amount of DEGs between WT and mutant at stages 5-7 was significantly smaller in magnitude than that at  Figure 3A-C was cited from [46]. Figure 3D and E was cited from [47]. stages 8a-9 (Figure 3G), indicating that ms33 mutant transcriptomes are significantly divergent from WT starting from stage 8a. The transcriptome landscapes of WT were similar to those of ms33 mutant at stages 5-7. Besides, DEG amounts were various between adjacent developmental stages. It is worth noting that the DEG amount between WT and mutant exceeded that between adjacent stages from stage 8a-9. This result implied that the transcriptomes were significantly changed at the later three stages. Therefore, we compared the transcriptomes between genotypes at the former three and the later three stages, separately. In contrast to a limited number of DEGs (only two genes) shared by the former three stages, there were thousands of shared DEGs at the later three stages. GSE analysis based on KEGG database suggested that the upregulated gene set was firstly enriched in the function of biosynthesis of secondary metabolites, while the downregulated genes were significantly related to the photosynthesis process. This pathway enrichment analysis partly represents the alterations in metabolisms and physiological activities closely associated with the transcriptional changes caused by function defect of ms33.

Gene co-expression and regulatory networks reconstructed by comparative transcriptomics method
Though DEGs are mainly identified by pairwise comparisons between transcriptomes of tissues, stages, or treatment conditions and can reflect most of the transcriptional changes between two sets of samples, these transcriptional alterations are not sufficient to explain the detailed molecular mechanism underlying tissue-specific development processes and stress-resistant pathways. Moreover, the molecular functions of genes act under GRNs. All the biological processes of growth, development, stress response, and reproduction are regulated by GRNs. The prediction of gene regulatory relationships and the reconstruction of the GRNs by using the transcriptome data are also the major aims in transcriptomics studies, except for the DGEP and DEG analyses.

Gene co-expression analysis
Function-related genes tend to co-express in a cell, either to form a complex or to involve in the same biological pathway. Thus, the similar pattern of gene expressions can be used as an indicator to predict gene functions. Gene co-expression (GCE) analysis is a powerful tool to discover important functional genes in biological processes including anther development. A relatively early study identified two functional GMS genes, POLYKETIDE SYNTHASE A (PKSA) and PKSB, through detecting co-expressed genes with ACOS5, a GMS gene belonging to fatty acyl-CoA synthetase gene family, based on microarray data in A. thaliana [48]. Similarly, ABORTED MICROSPORES (AMS) gene was reported participating in the pollen wall formation in rice by the analyses of 98 co-expressed genes with AMS in flower development [49]. GCE analysis can be also used to investigate the biological functions and the regulatory targets of a gene. This genome-wide analysis on GCE networks has been performed based on microarray data from A. thaliana anther tissues, and 254 complete GCE groups containing 10,513 anthertranscribed genes were revealed [50]. Another microarray-based GCE network was reconstructed in A. thaliana anther by using 10,797 genes expressed in anther/ flora, and transcriptional landscape of GMS mutant was included in the stable examination of this newly constructed network [51]. In rice, microarrays from WT anther tissue across stages 2-14 and nine GMS lines were integrated to reconstruct a big GCE network containing more than 9000 genes and 0.4 million pairs of coexpression relationships [52].
RNA-seq data-based GCE network analysis was performed in anther when high-throughput sequencing technology was developed. In woodland strawberry, stages 1-12 floral samples dissected by LCM or hand, including stages 6-12 anther tissues, were sequenced by RNA-seq. Gene co-expression network analysis was used to reconstruct GCE networks in strawberry's flower development, and 23 modules were discovered from the GCE networks including 4584 pollen-specific genes [34]. These genome-wide GCE networks are useful for characterization of genes associated with anther development and floral reproduction.

TF-encoding gene regulatory network
Genes with their products forming one protein complex, genes encoding transcription factor (TF) and TF target genes, and genes functioning in the same metabolic pathway or stress-resistant process often tend to be co-expressed in a cell. Therefore, the expression-associated genes in GCE network may be not directly functionally linked. A more accurate and robust gene regulatory network is needed for both the biological function and network researches at molecular and genome levels. One way to improve the gene regulatory network is to introduce gene regulatory types into the network. Several TF gene regulatory networks (TF-GRN), also called as transcriptional regulatory network (TRN), were reconstructed based on expression patterns of TF-encoding genes and TF target genes from transcriptome data. One TF-GRN comprised 19 TFs and their 101 target genes involving in A. thaliana pollen development [53]. Another GRN of early anther development was constructed by interactively analyzing transcriptome data from three GMS lines of TF-encoding gene knockout mutants [9]. In the maize genome, there are 2298 TF-encoding genes identified which belonged to 56 diverse families [54]. A total of 3078 TF-encoding genes belonging to 59 families are predicted in silico analysis in rice genome [55]. These TF databases, combining with increased amount of transcriptome data from mutants of TF-encoding genes and other omics data (e.g., Chip-seq, DAP-seq), provide abundant data for the reconstruction of TF-GRN with increased credibility, applicability, and completeness.

miRNA target gene regulatory network
Both transcriptional and posttranscriptional regulations are crucial in controlling the normal development and stress-resistant process in cellular life. The miRNA-mediated regulation model on target genes is a well-studied posttranscriptional gene regulation pathway that plays important roles in floral identification and the following development of flower organs [56][57][58] as well as male fertility [59,60]. Beyond numerous case studies on functional miRNAs in anther development and GMS genes [61][62][63][64], the expression profile of miRNAs and the regulatory networks were investigated to elevate our understanding on the transcriptional regulatory mechanism of miRNAs. GRNs between miRNA and their target genes have been constructed via flower/anther transcriptomics in the model plant species, A. thaliana, and some other plants [65][66][67][68]. Furthermore, comparative transcriptomics analysis on small miRNAs has been commonly used as a research method to reveal the transcriptional alterations between fertility and sterility lines in economic and food plant species, such as maize [45], tomato [69], cotton [70,71], wheat [72,73], pine [74], lycium [75], watermelon [32], and Brassica campestris [76].

ceRNA-miRNA regulatory network
It is well known that miRNAs are crucial regulators on gene expressions that control key biological functions including anther development, since miRNA was firstly found in nematodes in 1993 [77]. It is noteworthy that a novel type of gene regulatory model, the competing endogenous RNA (ceRNA) hypothesis, was recently proposed [78]. According to the ceRNA hypothesis, some endogenous transcripts have abilities to adsorb miRNA molecules; subsequently, the expression levels of miRNA target genes can be derepressed [78,79]. A typical ceRNA in plant, a long noncoding RNA, IPSI, was found in A. thaliana. It could completely sponge miRNA ath-miR399 and indirectly increase the transcription levels of an important gene involved in phosphate homeostasis [80]. The following studies revealed that transcripts of protein-coding genes, pseudogene, transposable elements, simple sequence repeat, and circular RNAs have molecular functions as ceRNAs [79,81,82], indicating that the ceRNA-miRNA relationship is an essential gene regulatory mechanism in the growth and development of plants and animals. Consequently, it is necessary to introduce ceRNA regulators into GRN construction. Here, we present our recent study on reconstructing ceRNA regulatory network mainly based on RNA-seq and small RNA-seq transcriptomes from developmental maize anther.

A case study: reconstructing ceRNA-miRNA target gene regulatory networks using transcriptome data of maize anther
Here we summarized the research progress of one recently completed research related to the ceRNA-mediated GRN in our laboratory. Generally speaking, this is the first study introducing ceRNA regulation into miRNA target gene regulatory pathway for deeply dissecting the mechanism of anther development and sexual plant reproduction at a network level. This provides a fresh example for GRN research by plant comparative transcriptomics and has dual significance in both theoretical and practical senses. It may also provide new thoughts and strategies for further transcriptome-based GRN studies.
It is well known that gene expressions are controlled by the GRN in cellular life. Newly found regulatory patterns (e.g., miRNA pathway and epigenetic modification) have enhanced our understanding on the GRN. Recently, "ceRNA hypothesis" was proposed as a novel type of gene regulatory relationship and was found to participate in different development and stress response processes of organisms by a number of case studies. However, the network level study on ceRNA regulatory functions is still rare, which limited our deep understanding on the GRN. In addition, studies on the GRN of sexual plant reproduction and male sterility are crucial for both fundamental biological significance and applications in plant hybrid breeding and seed production. We investigated ceRNA-miRNA target gene regulatory network in maize anther developmental process by plant comparative transcriptomics method. Six steps were performed from raw sequencing data preparation to the finally constructed GRN (Figure 4). Firstly, we performed RNA-and small RNA-seq using anther tissues at three developmental stages from two maize lines to obtain a relative broad transcriptional landscape in anther development and transcribed loci that are stably expressed in maize species. Secondly, we identified stably transcribed loci based on the maize reference genome and estimated their transcription levels. In this step, we only used shared transcription loci identified from RNA-seq data between two maize lines ( Figure 4A). Notably, these transcribed loci were divided into five groups such as protein-coding genes, lncRNAs, transposable elements, and unassigned loci. Thirdly, we identified known miRNAs and predicted potential novel miRNAs that may be involved in maize anther development. Sequenced small RNA data were obtained from the same samples that were used in RNA-seq. A matched dataset (e.g., matched RNA and small RNA sequenced dataset here) is important in experimental design and more powerful to reveal the investigated biological questions. Though the analysis workflow of small RNA-seq data is similar to that of RNA-seq data in general (Figure 1), there are some differences between them. In our analysis, we reanalyzed two sets of published small RNA data to compare with their results from our own sequenced data for credible known and potential novel miRNAs involved in maize anther development [23,83] ( Figure 4B). This is an important check method to confirm the stability of research results and conclusions. Fourthly, we predicted ceRNA-miRNA interaction pairs and miRNA target gene regulatory pairs by computational approach (Figure 4C). Bioinformatics analysis in this step is mainly based on genome sequence but not the transcriptomes. Fifthly, we reconstructed ceRNA-miRNA target gene regulatory networks by predicted interaction pairs and transcription correlation patterns from transcriptomics data (Figure 4D). It is well known that miRNAs could repress the transcription levels of their target genes. Additionally, ceRNA was demonstrated to negatively regulate the transcription levels of matched miRNAs. The negatively associated gene pairs in transcription levels may be more credible in mutual interactions. By integrating ceRNA-miRNA and miRNA target gene interactions, we reconstructed ceRNA-miRNA target gene regulatory networks in maize anther. Finally, we generally investigated the functional significance of genes in the regulatory network by GO enrichment analysis. In these networks, we found a number of well-studied genes and miRNA target gene pairs involved in maize anther development and male sterility, suggesting that the ceRNA-miRNA target gene regulatory networks contribute to anther development in maize. Besides, GO analysis of target genes in the network revealed that they are functionally enriched in flower development process ( Figure 4E) [84].

Conclusions
Here, we summarized major points in comparative transcriptomics analysis from the commonly utilized workflow to the closely related research cases and from the single gene-based function analysis to GRN-based gene function investigation. In GMS gene studies, the research experiments using comparative transcriptomics method to investigate key functional genes and the genome-wide GRNs in developmental anther will facilitate our systematical understanding on the biological processes and molecular regulatory networks for anther development and sexual plant reproduction. More importantly, case studies illustrated here have a general meaning on technologies and methodologies for functional researches of other biological pathways and processes. With the fast advancement of sequencing technology, plant comparative transcriptomics has achieved considerable development. However, our understanding on the transcriptional dynamics and gene regulatory relationships of biological processes are far from being completed. Consequently, more efforts are needed for the further improvement of comparative transcriptomics in plant biological studies.