The nucleosome, composed of a 147-bp segment of DNA helix wrapped around a histone protein octamer, serves as the basic unit of chromatin. Nucleosome positioning refers to the position of the DNA double helix relative to the histone octamer. Positioning plays an important role in transcription, DNA replication, and other DNA transactions, since packing DNA into nucleosomes occludes the binding sites of proteins. Moreover, nucleosomes bear histone modifications and thus have a profound effect on regulation. Nucleosome positioning and its roles have been extensively studied in the model organism yeast. In this chapter, nucleosome organization and its roles in gene regulation are reviewed. Typically, nucleosomes are depleted around transcription start sites (TSSs), resulting in a nucleosome-free region (NFR) that is flanked by two well-positioned H2A.Z-containing nucleosomes. The nucleosomes downstream of the TSS are equally spaced in a nucleosome array. DNA sequences, especially the 10–11 bp periodicities of some specific dinucleotides, partly determine nucleosome positioning. Nucleosome occupancy can be determined with high-throughput sequencing techniques. Importantly, nucleosome positions are dynamic across cell types and environments. Histone depletion, histone mutation, heat shock, and changes in carbon source profoundly change nucleosome organization. In yeast cells, upon mutation of the histones, nucleosomes change drastically at promoters, and highly expressed genes, such as ribosomal genes, undergo more change. The changes in nucleosomes are tightly associated with transcription initiation, elongation, and termination. H2A.Z is contained in the +1 and −1 nucleosomes and is thus involved in transcription. The chaperone Chz1 and the elongation factor Spt16 function in H2A.Z deposition on chromatin.
The chapter covers the basic concept of nucleosomes, nucleosome determinant, the techniques of mapping nucleosomes, nucleosome alteration upon stress and mutation, and Htz1 dynamics on chromatin.
- 10–11 bp periodicities
- nucleosome-free region
- histone mutation
1. Basic concepts of chromatin and the nucleosome
1.1. Chromatin of eukaryotic DNA, nucleosome, nucleosome compositions, and histone
Eukaryotic DNA exists as chromatin, which is composed of DNA and proteins in the nucleus (Figure 1). The proteins can be divided into histone proteins (H1/H5, H2A, H2B, H3, and H4) and non-histone proteins. The histones act as the core around which DNA winds, and the histone core wrapped with DNA forms a bead-like particle that is the basic structural unit. Non-histone proteins have three main functions: (1) enzymes used in different DNA activities, for example DNA repair, replication, and transcription, such as DNA polymerase and DNA ligase; (2) scaffold proteins, which play the role of a skeleton; and (3) motor proteins. All play essential roles in the cell structure and regulatory functions that make life possible.
Because packaged DNA must remain rapidly accessible so that the protein machinery can interact with it during replication, transcription, DNA repair, and recombination, chromatin differs greatly between cells and over time. Chromatin can be divided into euchromatin and heterochromatin. Heterochromatin is characterized by its high compactness and its inhibitory effect on DNA transactions such as gene expression. However, according to Volpe et al., many heterochromatic regions can actually be transcribed but are silenced by RNA-induced transcriptional silencing (RITS). Euchromatin is not packaged as tightly as heterochromatin, so it is more accessible. Most chromatin is euchromatin (92% of the human genome); it contains actively transcribed genes and changes its condensation during the cell cycle.
1.2. Nucleosome and histones
Nucleosomes are the basic unit of chromatin. The nucleosome consists of 147 bp of DNA wrapped in about 1.65 left-handed superhelical turns around an octamer of histones, with two copies each of H2A, H2B, H3, and H4 (Figure 2). The nucleosome cores are connected by linker DNA, which typically ranges from 10 bp to 90 bp in length, to form a “beads-on-a-string” nucleosomal array with a diameter of 11 nm. At the entry and exit sites of the nucleosome, H1 binds the DNA and holds the nucleosome in place.
The “tails” of these histone proteins stick out, especially H3 and H4, where they can be modified in many ways. Modifications of the tail include methylation, acetylation, phosphorylation, ubiquitination, SUMOylation, citrullination, and ADP-ribosylation. Through these chemical modifications, histone can change its interaction with DNA. Interestingly, many of these modifications have fixed position and function (Table 1).
|Type of modification||Histone|
|Tri-Me||A||R||R||A & R||A||R|
All histones have variants (Figure 3), which differ in biological function from the canonical histones. Their dynamic exchange with canonical histones also plays an important role in the regulation of gene expression.
1.3. Research history of nucleosomes, especially in yeast
Clark and Felsenfeld first used staphylococcal nuclease to digest chromatin in 1971 and found that some regions were sensitive to the nuclease while others were insensitive; the insensitive regions were homogeneous, suggesting that chromatin contains subunits. Hewish and Burgoyne then digested nuclei with an endogenous nuclease and isolated the DNA. A series of DNA fragments was found whose sizes corresponded to a basic unit of about 200 bp, indicating that histones bind to DNA in a regular manner so that only certain restricted regions are sensitive to the nuclease.
Kornberg and Thomas then digested chromatin with micrococcal nuclease in 1974 and centrifuged it to obtain monomers, dimers, trimers, and tetramers. Under the electron microscope, the monomer appeared as a 10 nm body and the dimer as two associated bodies. Likewise, the trimer and tetramer consisted of three and four bodies, respectively, indicating that the structural unit containing 200 bp of DNA is a “bead on a string” unit, which is called the nucleosome.
Through many kinds of experiments, it was found that the structure of the nucleosome core is relatively invariant from yeast to metazoans [11, 12], containing 147 bp of DNA wrapped around a histone protein octamer. In 2005, Yuan et al. developed a tiled microarray approach to identify at high resolution the translational positions of 2278 nucleosomes over 482 kb of Saccharomyces cerevisiae DNA, including almost all of chromosome III and 223 additional regulatory regions. However, locating nucleosomes by experiment alone is time-consuming and costly, so researchers began to build nucleosome positioning prediction models based on the existing experimental data. In the yeast genome, Segal et al. found that the DNA sequence contains a ~10-bp periodic pattern of AA-TT-TA/GC dinucleotides. Sharp bending of nucleosomal DNA occurs at every DNA helical repeat (~10 bp), where the major groove of the DNA faces inward toward the histone octamer, and again ~5 bp away, in the opposite direction, where the major groove faces outward. This ~10-bp periodicity has been called a “genomic code” for nucleosomes. Since then, many nucleosome prediction models have been developed.
2. Nucleosome positioning and its determinant
2.1. Concepts of nucleosome positioning and nucleosome occupancy
The term “nucleosome positioning” indicates where nucleosomes are located with respect to the genomic DNA sequence. Nucleosome positioning has two aspects: rotational positioning and translational positioning. Rotational positioning describes which side of the DNA helix faces the histones, and translational positioning specifies the nucleosome midpoint with respect to the DNA sequence.
“Nucleosome occupancy” is a statistical measure of the probability that a given base pair lies within a nucleosome. It can be calculated as the average nucleosome level over a given region of DNA in a population of cells. Even under ideal conditions, a nucleosome fluctuates around its preferred position. Counting the sequenced reads whose nucleosome centers fall within a ~147 bp window identifies the most conserved locus, that is, the locus most likely to carry a nucleosome (Figure 4).
2.2. The association between nucleosome positioning and 10–11 bp periodicities in DNA sequence in yeast
As early as 1990, the 10–11 bp periodicities were reported. In addition to the 3-bp periodicity, which arises because three consecutive bases encode one amino acid, genomic DNA exhibits 10–11 bp periodicities. The 10–11 bp periodicities in complete genomes reflect protein structure and DNA folding.
In alpha helices, hydrophobic amino acids (aa) recur with a period of ~3.5 aa, and all five hydrophobic amino acids L, I, V, F, and M have the base T (thymine) at the middle position of their codons. This leads to a ~3.5 × 3 = ~10.5 bp periodicity in protein-coding DNA sequences, called protein-induced periodicity.
On the other hand, the 10–11 bp periodicities are intimately associated with nucleosome positioning. To be sharply bent and tightly wrapped around a histone protein octamer, the DNA sequence has an intrinsic bias. Placing certain dinucleotides, such as AA, TA, and TT, where the minor groove faces toward the histone octamer (every 10 bp) and GG where the minor groove faces away from it favors these distortions (Figure 5). Moreover, when DNA is digested with DNase I (deoxyribonuclease I), the cleavage pattern at nucleosome positions shows a ~10.3 bp period, which matches the recurrence of the minor groove along the helix. For naked DNA, which is entirely devoid of nucleosomes, the oscillatory pattern in the cleavage profile disappears. All of this strongly suggests a role for the 10–11 bp periodicities of specific dinucleotides in positioning nucleosomes. Based on the features of DNA sequences, many models have been developed to predict nucleosome positions (Table 2).
|Segal Lab: Online Nucleosomes Prediction||This tool allows you to submit a genomic sequence and to receive a prediction of the nucleosomes positions on it, based on the nucleosome-DNA interaction model||[15, 21, 22]|
|iNuc-PseKNC||A predictor for predicting nucleosome positioning in Homo sapiens, Caenorhabditis elegans, and Drosophila melanogaster genomes|||
|NuCMap||NuCMap is based on chemical modification of engineered histones. The tool reveals novel aspects of the in vivo nucleosome organization that are linked to transcription factor (TF) binding, RNA polymerase pausing, and the higher order structure of the chromatin fiber|||
|NuPoP||Predicts nucleosome position by explicitly modeling the linker DNA length. NuPoP is based on a duration hidden Markov model (HMM)|||
|Epidaurus||Epidaurus is a bioinformatics tool used to effectively reveal inter-dataset relevance and differences through data aggregation, integration, and visualization|||
|Multi-Layer Model||Analyses nucleosome position data obtained with microarray-based approach. MLM is a classifier to distinguish between several kinds of patterns|||
|NucEnerGen||Predicts nucleosome energetics by using high throughput sequencing. It establishes that nucleosome occupancies can be explained by systematic differences in mono- and dinucleotide content between nucleosomal and linker DNA sequences|||
|nuMap||Implements the YR and W/S schemes to predict nucleosome positioning at high resolution. This methodology is based on the sequence-dependent anisotropic bending||[29, 30]|
|NPRD||Compiles the available experimental data on locations and characteristics of nucleosome formation sites (NFSs). The object of the database is a single NFS described in an individual entry|||
|AWNFR||An algorithm based on down-sampling operation and footprint in wavelet|||
|ICM||Allows users to assess nucleosome stability and fold sequences of DNA into putative chromatin templates. It uses an elastic model to place nucleosomes|||
|SymCurv||The tool is able to capture sequence constraints, which are related to structure in genomic regions|
|FineStr||Allows users to upload genomic sequences in FASTA format and to perform a single-base-resolution nucleosome mapping on them|||
|iNuc-PhysChem||Identifies nucleosomal sequences by incorporating physicochemical properties into a 1788-dimensional feature vector. iNuc-PhysChem was able to identify nucleosome positioning for an independent DNA segment extracted from the Saccharomyces cerevisiae genome|||
|TemplateFilter||High-resolution nucleosome mapping reveals transcription-dependent promoter packaging|||
|DANPOS||A comprehensive bioinformatics pipeline explicitly designed for dynamic nucleosome analysis at single-nucleotide resolution. DANPOS is also robust in defining functional dynamic nucleosomes|||
|BINOCh||A package that allows biologists to carry out an analysis of nucleosome occupancy data to discover stimulus-induced transcription factor binding|||
|PING||A package for nucleosome positioning using MNase-seq data or MNase- or sonicated ChIP-seq data. PING uses a model-based approach, which enables nucleosome predictions even in the presence of low read counts|||
|ChIPseqR||A package based on an algorithm for the analysis of nucleosome positioning and histone modification ChIP-seq experiments|||
|NUCwave||A bioinformatic tool that generates nucleosome occupation maps from chromatin digestion with micrococcal nuclease (MNase-seq), chemical cleavage (CC-seq), chromatin immunoprecipitation (ChIP-seq) and fragmentation by sonication|||
|NucPosSimulator||A simulation tool to identify positions of nucleosomes from next generation sequencing data|||
|NucHunter||Inferring nucleosome positions with their histone mark annotation from ChIP data|||
|DiNuP||A systematic approach to identify regions of differential nucleosome positioning|||
|NucTools||Allows calculations of nucleosome occupancy profiles averaged over several replicates, comparisons of nucleosome occupancy landscapes between different experimental conditions, and the estimation of the changes of integral chromatin properties. NucTools facilitates the annotation of nucleosome occupancy with other chromatin features like binding of transcription factors (TF) or architectural proteins, and epigenetic marks like histone modifications or DNA methylation|||
|Dimnp||Identifies differential nucleosome regions (DNRs) in multiple samples. Dimnp is able to identify all the DNRs that are identified by two-sample method Danpos. It shows a good capacity (area under the curve >0.87) compared with the manually identified DNRs|||
|ArchAlign||ArchAlign identifies shared chromatin structural patterns from high-resolution chromatin structural datasets derived from next-generation sequencing or tiled microarray approaches for user defined regions of interest|||
|SANEFALCON||A tool developed to calculate the fetal fraction for noninvasive prenatal testing based on genome-wide nucleosome profiles, based on single end sequencing of cell-free DNA|||
|NucDe||An R package mapping nucleosome-linker boundaries from both MNase-Chip and MNase-seq data using a non-homogeneous hidden-state model based on first-order differences of experimental data along genomic coordinates|
|Nu-OSCAR||A program that can be used to identify binding sites of known transcription factors|
|NSeq||A multithreaded Java application for finding positioned nucleosomes from sequencing data|||
|ArchTEx||The extension of mapped sequence tags is a common step in the analysis of single-end next-generation sequencing (NGS) data from protein localization and chromatin studies. ArchTEx identifies the optimal extension of sequence tags based on the maximum correlation between forward and reverse tags and extracts and visualizes sites of interest using the predicted extension|||
|PuFFIN||Builds genome-wide nucleosome maps specifically designed to take advantage of paired-end reads. This method can accurately determine a genome-wide set of nonoverlapping nucleosomes without any user-defined parameters|||
|NPS||A python software package that can identify nucleosome positions given histone-modification ChIP-seq or nucleosome sequencing at the nucleosome level|||
We further found that within frequency domains, weakly bound dinucleotides (AA, AT, and the combinations AA-TT-TA and AA-TT-TA-AT) present doublet peaks in a periodicity range of 10–11 bp, and strongly bound dinucleotides present a single peak . A time-frequency analysis, based on wavelet transformation, indicated that weakly bound dinucleotides of nucleosomal DNA sequences were spaced smaller (~10.3 bp) at the two ends, with larger (~11.1 bp) spacing in the middle section. The finding was supported by DNA curvature and was prevalent in all core DNA sequences.
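The frequency-domain view described above can be illustrated with a minimal sketch. It is not the authors' wavelet analysis: it simply projects a binary dinucleotide indicator onto a range of trial periods with a direct Fourier sum. The function names, the toy sequence, and the scanned period range (6 to 15 bp, chosen to exclude the harmonics of a 10-bp signal) are assumptions made for illustration.

```python
import numpy as np

def dinucleotide_signal(seq, dinucs=("AA", "TT", "TA")):
    """Binary indicator: 1.0 where a dinucleotide from `dinucs` starts."""
    seq = seq.upper()
    return np.array([1.0 if seq[i:i + 2] in dinucs else 0.0
                     for i in range(len(seq) - 1)])

def period_power(signal, periods):
    """Power of the mean-centred signal at each (possibly non-integer) period."""
    x = signal - signal.mean()
    n = np.arange(len(x))
    power = []
    for p in periods:
        # Direct Fourier projection at frequency 1/p.
        power.append(abs(np.sum(x * np.exp(-2j * np.pi * n / p))) ** 2 / len(x))
    return np.array(power)

# Hypothetical toy input: one "AA" every 10 bp on a GC background.
toy = ("AA" + "GCGCGCGC") * 30
periods = np.arange(6.0, 15.5, 0.5)
spectrum = period_power(dinucleotide_signal(toy), periods)
best_period = periods[np.argmax(spectrum)]
print(best_period)
```

On real nucleosomal sequences the peak would be broader and, according to the text, may split into a doublet for weakly bound dinucleotides; resolving that would need a finer period grid and many aligned sequences.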
We assessed the roles of the 10–11 bp periodicities for different kinds of dinucleotides . Near the transcription start site, the signals reveal a similar feature that the nucleosome organization exhibits (Figure 6). But, it seems that the species do not share the same dinucleotides patterns. Furthermore, the dinucleotides patterns are dominant at the specific region of genome, indicating their diverse roles in forming and organizing nucleosomes.
2.3. Nucleosome prediction models for yeast
In Table 2, the models for both nucleosome prediction and nucleosome sequencing data processing are listed.
2.4. The chromatin remodeling complex and its roles in altering nucleosomes
Chromatin remodeling complexes help the cell make genomic DNA accessible to transcription factors. The complexes fall into two major groups, namely covalent histone-modifying complexes and ATP-dependent chromatin remodeling complexes, which work in different ways.
ATP-dependent chromatin-remodeling enzymes are helicase-related enzymes that use the energy of ATP to reposition (slide, twist, or loop) nucleosomes along the DNA, expel histones from DNA, or facilitate the exchange of histone variants, thus creating nucleosome-free regions of DNA for gene activation. All known ATP-dependent chromatin remodeling complexes can be grouped into the SWI/SNF, ISWI, CHD, and INO80 families. Each family of ATPases has distinct remodeling activities, including incremental nucleosome sliding on DNA in cis; the creation of DNA loops on the surface of the nucleosome; eviction of histone H2A/H2B dimers; eviction of the histone octamer; or the exchange of histone octamer subunits within the nucleosome to change its composition.
Covalent histone-modifying complexes modify histones by acetylation, methylation, and phosphorylation, which can change the interaction between histones and DNA; for example, methylation of specific lysine residues in H3 and H4 causes further condensation of DNA around histones, making it hard for transcription factors or other proteins to bind.
2.5. The statistical model for nucleosomes distribution
A typical nucleosome distribution around the TSS is shown in Figure 7. Nucleosomes are depleted around TSSs, resulting in a nucleosome-free region (NFR) that is flanked by two well-positioned nucleosomes, whereas the nucleosomes downstream of the TSS are equally spaced in a nucleosome array. Of all nucleosomes around the gene, the +1 nucleosome often contains histone variants (H2A.Z and H3.3) and is modified by acetyltransferases and methyltransferases. These features may help evict the nucleosome when transcription is needed. The +2 nucleosome immediately follows the +1 nucleosome and shares some of its properties but contains less H2A.Z and less methylation and acetylation. In a barrier model for nucleosome organization, the nucleosome distribution is largely a consequence of statistical packing principles. The genomic sequence specifies the location of the −1 and +1 nucleosomes. The +1 nucleosome forms a barrier against which nucleosomes are packed, resulting in uniform positioning, which decays at farther distances from the barrier.
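The decay of positioning with distance from the barrier can be illustrated with a small simulation. This is a deliberately simplified sketch, not the published barrier model: it assumes that linker lengths are independent and exponentially distributed (an assumption made here), fixes the first nucleosome at the barrier, and uses the 147 bp core and 18 bp mean yeast linker quoted in this chapter. The function name and parameters are hypothetical.

```python
import numpy as np

CORE = 147  # bp of DNA in one nucleosome core (from the text)

def simulate_array(n_nuc, n_cells, mean_linker=18.0, rng=None):
    """Dyad positions (bp from the barrier) of n_nuc nucleosomes in n_cells cells."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Assumption: linkers are independent and exponentially distributed.
    linkers = rng.exponential(mean_linker, size=(n_cells, n_nuc))
    # The first (+1) nucleosome sits directly against the barrier.
    linkers[:, 0] = 0.0
    # Start of nucleosome k = all linkers up to k plus k full cores.
    starts = np.cumsum(linkers, axis=1) + CORE * np.arange(n_nuc)
    return starts + CORE // 2

positions = simulate_array(n_nuc=8, n_cells=5000)
spread = positions.std(axis=0)  # positional fuzziness of +1, +2, ...
print(np.round(spread, 1))
```

The standard deviation of the dyad position grows with the nucleosome index, which is the qualitative behavior the barrier model describes: nucleosomes close to the barrier are well positioned, and those farther away become fuzzier.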
2.6. The nucleosome determinant 
A variety of factors determine the location of nucleosomes, including DNA sequence, nucleosome remodelers, transcription factors (TFs), and elongating Pol II (Figure 8). Each of these components contributes differently to nucleosome positioning. Interestingly, these components can affect each other, resulting in more complex positioning patterns. The DNA sequence is critical for rotational positioning along the DNA helix, and it is also an important determinant of nucleosome occupancy. In particular, poly (dA:dT) and poly (dG:dC) tracts are intrinsically inhibitory to nucleosome formation, whereas non-homopolymeric GC-rich regions favor nucleosome formation.
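As a small illustration of the sequence determinant, homopolymeric tracts can be located with a regular expression. The sketch below is only an example: the function name and the minimum tract length of 5 bp are assumptions, since the chapter does not state a threshold.

```python
import re

def homopolymer_tracts(seq, min_len=5):
    """Return (start, end, base) for single-base runs of at least min_len bp.

    Runs of A or T correspond to poly (dA:dT) tracts and runs of G or C to
    poly (dG:dC) tracts when both strands are considered. Coordinates are
    0-based and half-open.
    """
    pattern = re.compile(r"A{%d,}|T{%d,}|G{%d,}|C{%d,}" % ((min_len,) * 4))
    return [(m.start(), m.end(), m.group()[0])
            for m in pattern.finditer(seq.upper())]

# Hypothetical toy sequence with one A-tract and one T-tract.
toy = "GCGC" + "A" * 7 + "GC" + "T" * 6 + "GC"
tracts = homopolymer_tracts(toy)
print(tracts)
```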
3. The experiment methods of determining nucleosome occupancy and the bioinformatics analysis for the data
3.1. The techniques of determining nucleosomes positions
Micrococcal nuclease (MNase), an endo-exonuclease from Staphylococcus aureus, digests naked DNA. MNase first induces single-strand breaks and then cleaves the complementary strand near the first break [58, 59]. Nucleosomal DNA is protected from MNase by its wrapping around the histone octamer and thus remains as DNA fragments after digestion. Taking advantage of this, the high-throughput sequencing technique MNase-seq was developed to probe nucleosome positions genome-wide. At limiting enzyme concentrations, MNase cleavage favors AT-rich regions.
DNase I, an endonuclease, cuts accessible chromatin DNA at DNase I hypersensitive sites (DHSs) and is therefore used to map open chromatin regions (Figure 9). Open chromatin regions are mainly regulatory sites for gene transcription, so they may differ between cell types, and this is reflected in the DHSs. A change in DHSs is often associated with the loss or formation of one or more nucleosomes.
DNase-seq is DNase I digestion followed by DNA sequencing. It has been widely used to probe cell-specific chromatin accessibility. The rotational positioning of individual nucleosomes can be inferred from the inherent preference of DNase I to cleave nucleosomal DNA about every 10 bp. Coupled with bioinformatics analysis, DNase-seq can be used to study TF occupancy at nucleotide resolution in a qualitative and quantitative manner. DNase-seq requires many cells as well as many sample preparation and enzyme titration steps.
ATAC-seq is an assay for transposase-accessible chromatin using high-throughput sequencing. The technique relies on the “cut and paste” function of the Tn5 transposase to probe active regulatory regions. ATAC-seq needs only a small number of cells, ~500–50,000 unfixed nuclei, and its procedure involves only two steps. It can study multiple aspects of chromatin architecture simultaneously at high resolution, including nucleosome positions and chromatin accessibility.
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) sequences the DNA fragments of interest that are isolated by immunoprecipitation. Its main application is the precise mapping of transcription factor-binding sites (TFBSs). Figure 10 shows the general procedure of a ChIP experiment, which includes DNA-protein crosslinking with formaldehyde, sonication, immunoprecipitation, reversal of crosslinking, and sequencing. With an antibody against a histone, such as histone H3, ChIP-seq can also be used to determine nucleosome positions.
3.1.5. Other techniques
In addition to the techniques mentioned above, there are other techniques often used, such as Formaldehyde-assisted isolation of regulatory elements (FAIRE-seq) and ChIP-exo. FAIRE-seq is based on the differences in crosslinking efficiencies between DNA and nucleosomes or sequence-specific DNA-binding proteins. Sequencing provides information for regions of DNA that are not occupied by histones . ChIP-exo employs the use of exonucleases to degrade strands of the protein-bound DNA in the 5′–3′ direction to within a small number of nucleotides of the protein binding site . The nucleotides of the exonuclease-treated ends are determined using DNA sequencing.
3.2. Procedures of dealing with the nucleosome DNA sequenced dataset
At present, nucleosome sequencing datasets come mainly from MNase-seq. In some studies, datasets from ATAC-seq, DNase-seq, and ChIP-seq are used to infer nucleosome positions. A general analysis workflow includes data quality control, read mapping, building the nucleosome profile, determining nucleosome positions, comparing cell types, and integrating other omics data (such as expression data) to find biological meaning.
3.2.1. Data management and genome alignment
Sequencing quality control (QC) checks read quality (for example, the fraction of mapped reads) and depth of coverage. BWA and Bowtie are widely used for read alignment. During alignment, multi-mapping reads and duplicate reads are often filtered out to remove regions of the genome that are overrepresented because of technical bias. Read filtering can be performed with SAMtools or Picard tools.
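The logic of this filtering step can be sketched in plain Python. This is a conceptual illustration only and does not reproduce SAMtools or Picard: the `Read` record, the `filter_reads` function, and the mapping-quality cutoff of 10 are hypothetical. It assumes that the aligner assigns a low mapping quality to multi-mapping reads, and it treats reads with the same chromosome, position, and strand as duplicates.

```python
from typing import NamedTuple

class Read(NamedTuple):
    chrom: str
    pos: int     # leftmost aligned coordinate
    strand: str  # "+" or "-"
    mapq: int    # mapping quality reported by the aligner

def filter_reads(reads, min_mapq=10):
    """Drop low-MAPQ reads, then keep one read per (chrom, pos, strand)."""
    seen = set()
    kept = []
    for r in reads:
        if r.mapq < min_mapq:
            continue  # likely multi-mapping or unreliable alignment
        key = (r.chrom, r.pos, r.strand)
        if key in seen:
            continue  # duplicate of a read already kept
        seen.add(key)
        kept.append(r)
    return kept

reads = [
    Read("chrIII", 100, "+", 42),
    Read("chrIII", 100, "+", 40),  # duplicate position and strand
    Read("chrIII", 100, "-", 42),  # same position, other strand: kept
    Read("chrIII", 250, "+", 0),   # low mapping quality: dropped
]
kept = filter_reads(reads)
print(len(kept))
```

For MNase-seq, aggressive duplicate removal can also discard genuine signal at strongly positioned nucleosomes, so the choice of whether and how to deduplicate is worth checking against the sequencing depth.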
3.2.2. Data visualization
Data visualization helps to inspect the read distribution at specific loci. The Integrative Genomics Viewer (IGV), developed at the Broad Institute, is one of the most powerful visualization tools. In IGV, multiple types of annotation data can be integrated, including gene information, epigenetic and expression data, single-nucleotide polymorphisms (SNPs), repeat elements, and functional information from ENCODE and other research projects. IGV accepts many data formats, including BED, BedGraph, GFF, WIG, and BAM files, which allows comparison with publicly available data.
3.2.3. Identification of enriched regions
With respect to nucleosomes sequencing data, there are two basic tasks in analysis. One is to calculate the nucleosome profile (reads coverage) both along the genomic coordinate and near the regulatory sites (for instance the TSSs). This helps to directly check the quality of MNase digestion and DNA sequencing. The other task is to infer the precise nucleosomes positions (dyad position) using the nucleosome profile so as to identify the nucleosome alteration among different cell types.
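The first task, a profile near regulatory sites, amounts to averaging the occupancy signal over windows aligned at the TSSs. The following is a minimal sketch under stated assumptions: `occupancy` is a per-base array for one chromosome, `tss_list` holds hypothetical (position, strand) pairs, and the default flank of 500 bp is arbitrary. Windows of minus-strand genes are reversed so that downstream always points to the right.

```python
import numpy as np

def tss_profile(occupancy, tss_list, flank=500):
    """Average occupancy from -flank to +flank around each TSS."""
    rows = []
    for pos, strand in tss_list:
        lo, hi = pos - flank, pos + flank + 1
        if lo < 0 or hi > len(occupancy):
            continue  # skip TSSs too close to the chromosome ends
        window = occupancy[lo:hi]
        # Flip minus-strand genes so the gene body is always on the right.
        rows.append(window[::-1] if strand == "-" else window)
    if not rows:
        raise ValueError("no TSS window fits inside the occupancy array")
    return np.mean(rows, axis=0)

# Toy signal: occupancy equal to the coordinate itself.
toy = np.arange(100.0)
plus_only = tss_profile(toy, [(50, "+")], flank=2)
both = tss_profile(toy, [(50, "+"), (50, "-")], flank=2)
print(plus_only, both)
```

A well-digested MNase-seq sample would be expected to show a dip at the NFR and regularly spaced peaks downstream in such an aggregate profile.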
3.2.3.1. MNase-seq data
For single-end MNase-seq data, one method to make nucleosome profile is as follows . First, the length of each read was extended 73 bp in the 3′ direction, and the Watson-strand reads and Crick-strand reads were oppositely shifted 73 bp. The absolute nucleosome occupancy value of each genomic site was expressed as the number of reads covering the genomic sites. Second, nucleosome occupancy was scaled by dividing the occupancy value by the average nucleosome occupancy of the whole genome; i.e., the nucleosome occupancy was expressed as the fold change of the absolute occupancy relative to the average occupancy. Reads can also be shifted 73 bp toward the 3′ direction, which represented the midpoint .
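One reading of this procedure is sketched below. It is an interpretation, not the cited method's code: each read's 5′ end is shifted 73 bp toward its 3′ direction to estimate the dyad, the dyad is extended by 73 bp on each side to cover a 147 bp footprint, and the coverage is divided by its genome-wide mean. The function name and the (position, strand) input format are assumptions.

```python
import numpy as np

HALF = 73  # half of the ~147 bp nucleosome footprint

def occupancy_single_end(reads, genome_len):
    """Fold-change occupancy from (five_prime_pos, strand) single-end reads."""
    diff = np.zeros(genome_len + 1)
    for pos, strand in reads:
        # Watson and Crick reads are shifted in opposite directions.
        dyad = pos + HALF if strand == "+" else pos - HALF
        lo = max(dyad - HALF, 0)
        hi = min(dyad + HALF + 1, genome_len)
        if lo >= hi:
            continue  # footprint falls outside the sequence
        diff[lo] += 1  # difference array: coverage rises here
        diff[hi] -= 1  # and falls just past the footprint
    coverage = np.cumsum(diff)[:genome_len]
    mean = coverage.mean()
    # Scale by the genome-wide average so that 1.0 means average occupancy.
    return coverage / mean if mean > 0 else coverage

# Two reads from opposite strands of the same nucleosome (toy data).
occ = occupancy_single_end([(100, "+"), (246, "-")], genome_len=400)
print(occ[173], occ[99])
```

The difference-array trick keeps the cost proportional to the number of reads plus the genome length, instead of touching 147 positions per read.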
With paired-end sequencing, the nucleosome midpoint is assumed to coincide with the midpoint between the forward and reverse reads. Unless the reads come from a single cell, nucleosome positions represent average positions in the cell population. Therefore, overlapping reads have to be clustered over genomic regions.
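For paired-end data, the midpoint assumption can be written down directly. The sketch below is illustrative: fragments are hypothetical half-open (start, end) pairs, and the 120 to 180 bp size window used to keep mononucleosome-sized fragments is an assumption rather than a value from this chapter.

```python
from collections import Counter

def fragment_midpoints(fragments, min_len=120, max_len=180):
    """Count fragment midpoints for mononucleosome-sized fragments."""
    mids = Counter()
    for start, end in fragments:
        if min_len <= end - start <= max_len:
            mids[(start + end) // 2] += 1  # estimated dyad position
    return mids

# Toy fragments: two nucleosome-sized fragments and one that is too long.
mids = fragment_midpoints([(100, 247), (102, 249), (0, 400)])
print(sorted(mids.items()))
```

The resulting midpoint counts from a cell population scatter around each preferred position, which is why a clustering or peak-finding step follows.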
Calling nucleosomes actually is to find the peak positions along the nucleosome profile. DANPOS is one tool that can identify nucleosome positions . Also, it allows us to detect three categories of nucleosome dynamics, such as position shift, fuzziness change, and occupancy change, using a uniform statistical framework using MNase-seq datasets. Other tools can be found in Table 2.
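Peak finding on a nucleosome profile can be illustrated with a simple greedy scheme. This is not the DANPOS algorithm: it smooths the profile with a moving average, repeatedly takes the highest remaining position as a dyad, and blocks everything closer than 147 bp so that called nucleosomes do not overlap. The function name, the smoothing width, and the height threshold are assumptions.

```python
import numpy as np

def call_nucleosomes(profile, min_spacing=147, smooth=31, min_height=0.0):
    """Greedy peak picking; returns sorted dyad positions."""
    kernel = np.ones(smooth) / smooth
    smoothed = np.convolve(profile, kernel, mode="same")
    order = np.argsort(smoothed)[::-1]  # positions from highest to lowest
    blocked = np.zeros(len(profile), dtype=bool)
    dyads = []
    for i in order:
        if smoothed[i] <= min_height:
            break  # everything left is below the threshold
        if blocked[i]:
            continue  # too close to an already called nucleosome
        dyads.append(int(i))
        lo = max(i - min_spacing + 1, 0)
        blocked[lo:i + min_spacing] = True
    return sorted(dyads)

# Toy profile: two Gaussian bumps centered at 200 and 400.
x = np.arange(600)
toy = np.exp(-(x - 200) ** 2 / 800) + np.exp(-(x - 400) ** 2 / 800)
dyads = call_nucleosomes(toy, min_height=0.1)
print(dyads)
```

A nonzero `min_height` matters in practice: without it, the greedy loop would also report tiny residual signal in the gaps between real peaks.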
3.2.3.2. DNase-seq data, ATAC-seq, and ChIP-seq
Nucleosome positions cannot be directly inferred from DNase/ATAC/ChIP-seq datasets, but these datasets provide information about open chromatin and protein-binding regions, which are associated with nucleosome depletion. Therefore, for these datasets, peak calling is one central task. MACS identifies genome-wide locations of transcription/chromatin factor binding or histone modification; its steps include removing redundant reads, adjusting read position, calculating peak enrichment, and estimating the empirical false discovery rate (FDR) (
3.3. GC-content and cutting bias
GC content bias refers to the dependence between the GC content of a region and the count of fragments/reads mapped to it. The bias can dominate the signal of interest and lead to false positives. Worse, the bias tends to differ among samples; thus, there is no general method to remove it. Two factors contribute to this variability. One is GC content, which is heterogeneous across the genome. In yeast, the open reading frames (ORFs) with similar GC contents at silent codon positions are significantly clustered on chromosomes. Moreover, GC content varies along the genome and is often correlated with functionality. The other is the cutting bias of MNase. Kinetic analysis indicates that the rate of cleavage is 30 times greater at the 5′ side of A or T than at G or C.
Most current correction methods follow a common path. Both fragment counts and GC counts are binned to a bin-size of choice . Then, the conditional mean fragment count per GC value is modeled by assuming smoothness. At last, a predicted count is estimated for each bin based on the bin’s GC. The predictions represent one normalization for the original signal .
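The common path described above can be sketched as follows. This is a simplified illustration: instead of a smooth model of the conditional mean, it uses the plain mean count within each GC stratum as the predicted count, which is a cruder stand-in that a real method would replace with a smoother such as loess. The function name and the number of GC strata are assumptions.

```python
import numpy as np

def gc_normalize(counts, gc, n_gc_bins=20):
    """Divide each bin's count by the relative mean count of its GC stratum."""
    counts = np.asarray(counts, dtype=float)
    gc = np.asarray(gc, dtype=float)
    # Assign each genomic bin to a GC stratum covering the range 0 to 1.
    strata = np.minimum((gc * n_gc_bins).astype(int), n_gc_bins - 1)
    predicted = np.empty_like(counts)
    for s in np.unique(strata):
        mask = strata == s
        predicted[mask] = counts[mask].mean()  # conditional mean per GC value
    # Express predictions relative to the global mean count.
    scale = predicted / counts.mean()
    return np.divide(counts, scale, out=np.zeros_like(counts), where=scale > 0)

# Toy data: counts that depend only on GC content.
normalized = gc_normalize([10, 10, 20, 20], [0.3, 0.3, 0.6, 0.6])
print(normalized)
```

In this toy case the counts differ only because of GC content, so the corrected signal is flat. On real data, true biological signal that correlates with GC content would be partly removed as well, which is one reason the text notes that no general correction exists.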
4. The transcription regulation and nucleosome positioning
4.1. The +1 and −1 nucleosomes and nucleosome-free regions (NFRs) near transcription start sites (TSSs)
Nucleosome positioning is involved in gene regulation, since DNA packed on the surface of the histone octamer can occlude the binding sites of transcription factors (TFs) on genomic DNA. That is, nucleosome positioning at promoters negatively regulates gene transcription by preventing TF binding. Typically, nucleosomes are depleted around transcription start sites (TSSs), resulting in a nucleosome-free region (NFR) that is flanked by two well-positioned nucleosomes (the +1 and −1 nucleosomes). Downstream of the TSS, nucleosomes are equally spaced in a nucleosome array. At the 3′ end of the gene, near the transcription termination site (TTS), there is also an NFR, called the 3′ NFR. Additionally, poly (dA:dT) sequences are found in the 5′ and 3′ NFRs, where they act as nucleosome-excluding sequences. These characteristics of nucleosome organization are found in multiple species, including yeast, worms, flies, and humans. In such an organization, the NFRs are often the TF-binding regions. Transcriptional activation involves several steps in yeast. First, specific chemical modifications (acetylation and methylation (H3K4me3)) occur on histones of the −1 and +1 nucleosomes (Figure 12). The acetylation marks can be recognized by bromodomain modules of the SAGA histone acetyltransferase complex and Bdf1. SAGA and TFIID then deliver TBP to promoters. Then, the pre-initiation complex (PIC) is assembled.
It has been suggested that NFRs at promoters result from a competition between TF and nucleosome binding, because incorporating competition with TFs improves the prediction performance for nucleosome positioning, particularly in promoter regions. Moreover, the mechanism is not restricted to a few promoters but is the typical configuration across the genome. Interestingly, it was reported that of the 158 yeast TFs, only 10–20 significantly contribute to inducing NFRs, and these TFs are highly enriched for direct interactions with chromatin remodelers.
Therefore, in theory, the nucleosome level at promoters should be negatively associated with the gene expression level. As expected, for the inducible acid phosphatase gene PHO5, significant cell-to-cell variation was found in nucleosome positions, and the nucleosome shift correlates with changes in gene expression (Figure 13). However, nucleosome positioning is not absolute, and even with major shifts in gene expression, some cells fail to change nucleosome configuration. In human CD4+ T cells, we found a wider NFR at the promoters of housekeeping genes and highly expressed genes.
4.2. The difference of nucleosome organization among species
Current studies suggest that almost all eukaryotic organisms share the same nucleosome organization at the 5′ end of genes, namely an NFR flanked by two well-positioned nucleosomes (+1 and −1) and followed by an array of nucleosomes downstream of the TSS [37, 70, 79]. Compared with the multicellular organisms fly, worm, and human, however, yeast is very simple, and its nucleosome organization differs in several respects. First, yeast has a shorter linker DNA on average. The linker DNA is 18 bp in S. cerevisiae, ~28 bp in Drosophila melanogaster and Caenorhabditis elegans, and ~38 bp in human. Second, the dyad position of the +1 nucleosome relative to the TSS varies among organisms. In yeast, the dyad of the +1 nucleosome lies ~50–60 bp downstream of the TSS. In Drosophila, however, the dyad is found 135 bp downstream of the TSS, reflecting differences in transcriptional regulatory mechanisms. Third, the 10–11 bp periodicities of specific dinucleotides (such as AA/TA/TT-GC, WW-SS) are more pronounced in yeast than in multicellular organisms. This may be because, from single-celled to multicellular organisms, genomic DNA needs to carry more encoded information to meet more complex regulatory requirements. In the genomic DNA of multicellular organisms, more TF binding sites are embedded, which interferes with the coding of other information, such as the code for nucleosome positioning. Fourth, genes in multicellular organisms contain exons and introns, so transcription is coupled to splicing. Nucleosomes were found to be well positioned at both ends of exons in multicellular organisms. Yeast largely lacks this feature because most of its genes do not contain introns. Moreover, upon stress or mutation, nucleosome dynamics frequently occur at promoters in yeast cells, whereas in human cells nucleosomes change mainly at enhancers [82, 83].
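The linker lengths quoted above follow from simple arithmetic: the distance between the dyads of two adjacent nucleosomes equals the 147-bp core plus the linker between them. A minimal sketch of this calculation is given below; the function names and the input (a list of called dyad coordinates on one chromosome) are assumptions for illustration, and real nucleosome calls would need filtering before such an estimate is meaningful.

```python
# Hypothetical sketch: linker length from adjacent nucleosome dyad positions.
# Function names and input format are illustrative assumptions.

NUCLEOSOME_CORE_BP = 147  # core DNA length stated in the chapter


def linker_lengths(dyads):
    """Return the linker length between each pair of adjacent dyads.

    Dyad-to-dyad distance = half a core + linker + half a core,
    so linker = distance - 147. Negative values indicate overlapping calls.
    """
    ordered = sorted(dyads)
    return [b - a - NUCLEOSOME_CORE_BP for a, b in zip(ordered, ordered[1:])]


def mean_linker(dyads):
    """Average linker length over all adjacent dyad pairs."""
    linkers = linker_lengths(dyads)
    if not linkers:
        raise ValueError("need at least two dyad positions")
    return sum(linkers) / len(linkers)
```

For example, dyads spaced 165 bp apart correspond to the 18-bp linker reported for S. cerevisiae, and dyads spaced 185 bp apart correspond to the ~38-bp linker reported for human.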
5. Nucleosome alteration (dynamics) during stress and histone mutation
5.1. Nucleosome alteration upon mutating at modifiable histone residues
Histones are the fundamental components of nucleosomes, and histone mutations directly influence genome-wide nucleosome organization.
Mutations in the histone H3 N-terminal region can affect the binding of Chd1, RSC, and SWI/SNF to chromatin and thereby play a role in repositioning nucleosomes. Using native gel electrophoresis, the loss of nucleosomes in different histone mutants can be tracked quantitatively. With respect to RSC-directed repositioning, the H3 R42A and R49A mutations have the largest effect, each raising the rate up to 2.1-fold relative to wild-type nucleosomes. The H3 I51A mutation has the least effect on the products of RSC-directed remodeling (Figure 14), indicating that H3 I51A is capable of suppressing the nucleosome-unraveling function of RSC.
SWI/SNF-independent (Sin) mutants have various effects on nucleosome alteration. Class I Sin mutants, such as those at H4 R45, have the greatest effect; they may completely abolish certain protein-DNA interactions. Class II mutants have a relatively mild influence, causing only slight changes to the solvent structure and to the main-chain conformation of the histone octamer, and class III mutants merely weaken the interactions between the octamer and DNA. These changes in protein-DNA interactions lead to increased nucleosome-sliding rates.
Histone depletion also influences nucleosome organization. For instance, upon histone H4 depletion, +1 nucleosomes shift notably away from the TSS, and the +2, +3, and +4 nucleosomes also move away from the TSS to different extents. This was first found by Harm van Bakel et al., who studied nucleosome repositioning under promoter-closing conditions (Figure 15).
H3 depletion also changes nucleosome occupancy genome-wide. HHT1 and HHT2 are the H3-coding genes. When HHT1 is removed and HHT2 is placed under the control of the GAL1 promoter, HHT2 is expressed in galactose but is almost silent in dextrose, so histone H3 is depleted in S. cerevisiae cells grown in dextrose (Figure 16). In this way, Andrea J. obtained four combinations of strain type and carbon source. After 3 hours, the H3-depleted strain showed severe changes in nucleosome organization relative to the strain with normal histone levels, whereas the two strains were quite similar at the start (Figure 16B). Upon H3 depletion, nucleosome occupancy is weakened across the genome. The whole-genome correlation between the nucleosome occupancy profiles of the normal wild type and the H3-depleted strain is, as expected, lower than that between the H3 shutoff strain grown in galactose (i.e., H3 not depleted) and the normal wild type (Figure 16). More specifically, nucleosome positioning weakens progressively from the +1 and +2 nucleosomes toward the gene's transcription termination site (TTS).
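The whole-genome comparison described above rests on a correlation coefficient between two occupancy profiles sampled at the same genomic positions. The sketch below computes a Pearson correlation in plain Python; it is only an illustration of the statistic, the function name and inputs are assumptions, and the source does not state which correlation measure or binning was used in the original analysis.

```python
# Hypothetical sketch: Pearson correlation between two occupancy profiles.
# The choice of Pearson's r and the input layout are illustrative assumptions.
import math


def pearson(profile_a, profile_b):
    """Pearson correlation of two equal-length occupancy profiles."""
    if len(profile_a) != len(profile_b) or len(profile_a) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(profile_a)
    mean_a = sum(profile_a) / n
    mean_b = sum(profile_b) / n
    # Sums of centered products and squares (unnormalized covariance/variance).
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(profile_a, profile_b))
    var_a = sum((a - mean_a) ** 2 for a in profile_a)
    var_b = sum((b - mean_b) ** 2 for b in profile_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_a * var_b)
```

Under this measure, two strains with nearly identical nucleosome organization give a value close to 1, and a lower value for the depleted strain against wild type reflects the reorganization described in the text.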
5.2. Nucleosome alteration upon heat shock in yeast
After heat shock, nucleosome occupancy usually increases at repressed promoters and decreases at activated ones. This suggests a negative correlation between nucleosome occupancy and the transcription changes caused by heat shock. PAPAS is a long non-coding RNA (lncRNA) that is upregulated by heat shock and has been shown to help repress Pol I transcription. CHD4/NuRD is a remodeling complex that represses transcription by shifting promoter-bound nucleosomes into the transcriptional "off" position. A comparison of nucleosome positioning in normal and heat-shocked cells indicated that heat shock moves the promoter-bound nucleosome toward a downstream position by promoting PAPAS expression, which in turn induces recruitment of CHD4/NuRD to rDNA.
5.3. Nucleosome alteration upon changing carbon source in yeast
Changes in carbon source can alter nucleosome positioning in yeast directly or indirectly. Gal4, a transcriptional activator discovered in S. cerevisiae, has been intensively studied. Two genes, GAL1 and GAL10, are both regulated by Gal4 (Figure 17). GAL1 promoter nucleosomes were found to be absent from cells grown for many generations in galactose. ChIP experiments, however, showed that Gal4 is present both before and after the nutrient shift. A follow-up comparison in the absence of Gal80 revealed that galactose removes Gal80, an inhibitor of Gal4, from nucleosomes. The recruiting function of the freed Gal4 is then quickly activated, leading to SWI/SNF binding at the genes. As two further ChIP experiments show, this is always accompanied by removal of promoter nucleosomes.
In addition, the influence of glucose on nucleosome reassembly is affected by the presence of galactose. The transcription factor Msn2, known for its role in the stress response, not only mediates many environmental stress responses but also actively participates in restructuring the nucleosome-depleted region (NDR) during transcriptional reprogramming. Msn2 usually binds only a small fraction of stress response elements (STREs), and a glucose-to-glycerol downshift markedly promotes Msn2 occupancy near STREs (Figure 18). Moreover, the nutrient downshift also enables Msn2 to promote nucleosome repositioning over gene promoters. It is concluded that the main function of Msn2 is to remove nucleosomes bound to promoter regions during gene activation, and that it plays a negative role in these regions when gene expression is low.
5.4. Nucleosome alterations caused by mutations at modifiable histone residues in S. cerevisiae
Histone proteins can be chemically modified on particular residues. We examined the effect on nucleosome dynamics of substituting modifiable residues of the four core histones with the non-modifiable residue alanine. We mapped genome-wide nucleosomes in 22 histone mutants of S. cerevisiae and compared the nucleosome alterations relative to the wild-type strain. The results indicated that different types of histone mutation resulted in different phenotypes and a distinct reorganization of nucleosomes. Nucleosome occupancy was altered at telomeres, but not at centromeres. The first nucleosomes upstream (−1) and downstream (+1) of the TSS were more dynamic than other nucleosomes (Figure 19). Mutations in histones affected the nucleosome array downstream of the TSS. Highly expressed genes, such as ribosome genes and genes involved in glycolysis, showed increased nucleosome occupancy in many types of histone mutant. In particular, the H3K56A mutant exhibited a high percentage of dynamic genomic regions, decreased nucleosome occupancy at telomeres, increased occupancy at the +1 and −1 nucleosomes, and a slow growth phenotype under stress conditions. Our findings provide insight into the influence of histone mutations on nucleosome dynamics.
6. Htz1 dynamics on chromatin and its effect on nucleosome stability
6.1. Htz1 and nucleosome, Htz1 and transcription
The yeast histone H2A variant Htz1, called H2A.Z in mammals, plays important roles in DNA transactions. Zhang et al. studied the genome-wide dynamics of Htz1 in detail. First, Htz1 occupancy is highly reproducible (r ≥ 0.94). Second, Bdf1 (a component of the SWR1 complex), Gcn5 (a histone acetyltransferase) and histone acetylation all contribute to Htz1 occupancy, as does Swr1. At several specific locations, the SWR1 complex is indispensable for Htz1 deposition. Htz1 occupancy correlates clearly with certain histone acetylation marks, implying that Htz1 occupies genes in their repressed/basal states, and Htz1 occupancy was reduced in strains deficient in Gcn5 or Bdf1. Third, Htz1 shows a much stronger preference for promoters than H2A does. Htz1 occupancy is negatively correlated with the presence of a TATA box, suggesting that Htz1 prefers TATA-less promoters. Fourth, gene activation is associated with Htz1 loss from promoters.
Zhang et al. presented a model to explain how Htz1 regulates transcription (Figure 20). At particular repressed/basal genes, an Htz1-containing nucleosome occupies the promoter, preferentially at TATA-less promoters. Bdf1, a component of the SWR1 complex, promotes this process and helps with targeting. Loss of Bdf1 reduces Htz1 occupancy. The SWR1 complex is necessary for deposition; its recruitment involves physical interactions between SWR1 and DNA sequence-specific transcriptional regulators, physical interactions between SWR1 and promoter-bound initiation factors, and recognition of histone modifications via Bdf1 or other SWR1 components. Gcn5 not only helps target deposition but also acetylates H3K14 and other residues, which may well be the primary reason for the association between Htz1 occupancy and histone acetylation. However, Htz1 has no particular role in promoting gene repression, even though it is frequently observed at repressed genes. Rather, Htz1 maintains a balance between the repressed and basal states that permits full activation. When genes transition from the basal to the active state, chromatin remodeling factors come into action and activators bind to the enhancer. All of these events likely contribute to replacement of the Htz1 nucleosome, which promotes activation by making way for certain transcription factors (Figure 20).
Martins-Taylor et al. studied Htz1 from a different angle and revealed a relationship between Htz1 and the requirement for cell-cycle progression in establishing transcriptional silencing. Htz1 appeared to act directly to restrict the spread of silent chromatin from the telomere, and deletion of the gene encoding Htz1 made the establishment of silent chromatin independent of cell-cycle progression.
6.2. Nucleosome, Pol II, Chz1, Htz1, and Spt16
RNA polymerase II (Pol II) transcribes DNA, and its occupancy is positively associated with the transcription rate. At the beginning of transcription initiation, Pol II assembles with general transcription factors (GTFs) to form the pre-initiation complex, which binds the promoter to initiate transcription. Htz1 generally occupies Pol II promoters and affects the association of GTFs with Pol II, thus inhibiting transcription [91, 94]. Chz1 is a chaperone specific for Htz1-H2B dimers that delivers Htz1 for H2A substitution. The transcriptional elongation factor FACT is indispensable for overcoming the nucleosome block during transcriptional elongation. In yeast cells, Spt16 and Pob3 form the counterpart of FACT. Spt16 disassembles nucleosomes ahead of the elongating Pol II complex and reassembles them after it has passed. Spt16 also acts as a histone chaperone.
We revealed that Spt16 and Pol II interact with each other and together affect, or are affected by, gene transcription, as both bind at exposed gene regions and are positively correlated with the transcription rate (Figure 21). Importantly, Spt16 prefers genes without Htz1 only when Chz1 is present. This preference is probably not due to a direct interaction mechanism but more likely serves the needs of transcription initiation. A previous study found that Chz1 deletion prevents Htz1 occupancy at promoters and telomeres. Also, in the chz1 deletion mutant, Spt16 binding at ribosomal genes was lost, suggesting that Chz1 takes precedence at Htz1-bound genes and thus leaves Spt16 fewer chances to bind.
The writing work was supported by the National Natural Science Foundation of China (No. 31371339 and No. 81660471) and Natural Science Foundation of Xinjiang Province of China (No. 2015211C057).