It is well known that the photosynthetic cells of different microalgae species display distinct fluorescent properties. The efficiency of self-fluorescence excitation and emission at different wavelengths depends on the structure of the photosynthetic system, and particularly on the structure of the antenna complex of a specific strain. The peculiar structure of the light-harvesting complex of blue-green algae makes it possible to discriminate and classify known and new cells down to the species/strain level by means of microscopic spectroscopy. In this chapter, a novel fluorescence spectroscopic technique for discriminating microalgae species is presented. The method is based on special processing of a set of fluorescence spectra obtained from a single photosynthetic cell of a microalga, particularly a cyanobacterial cell. In the presented technique, single-cell self-fluorescence spectra are recorded by confocal laser scanning microscopy (CLSM), and the data are processed by linear discriminant analysis (LDA) and artificial neural networks (ANN).
- photosynthetic system
- biological diversity
- microscopic spectroscopy
- artificial neural networks
Cyanobacteria are the most ancient photosynthetic microorganisms on Earth. Today they are among the most widespread organisms in nature, which makes the ecological aspect of their investigation valuable. At the same time, thousands of strains belonging to different species are cultivated in laboratories all over the world for biotechnological applications such as biofuel cells, food production, pharmaceuticals, and fertilizers [1, 2, 3]. Noninvasive spectroscopic methods are therefore needed for monitoring the physiological state of cyanobacterial cultures and natural communities.
It is well known that analysis of the self-fluorescence of the photosynthetic system is a powerful noninvasive tool for investigating microalgae in vivo. Self-fluorescence reports on energy transfer and trapping and thus reflects the metabolic mechanisms in photosynthetic cells and their photosynthetic efficiency. The detected self-fluorescence ultimately reflects the diversity in the morphological and physiological states of photosynthetic cells [4, 5, 6].
Self-fluorescence originates from excited states that are lost before photochemistry takes place, and it usually represents a small fraction of the excited-state decay in a functional photosynthetic complex. Nevertheless, this small fraction can be easily detected by confocal laser scanning microscopy (CLSM). Confocal fluorescence microscopy probes very small excitation and detection areas, so that single cells can be studied in vivo under non-damaging conditions. Single-cell detection can reveal small peculiarities that are usually buried in ensemble-averaged experiments. It is thus a good way to study the time evolution and spectroscopic properties of individual cells. Both steady-state and time-resolved fluorescence measurements by CLSM can be used to probe the organization and functioning of photosynthetic systems.
To date, the best taxonomic differentiation is still obtained using classical inverted microscopy. Unfortunately, this method is time-consuming, relies on a human operator, and requires appropriate technical skills, which rules out its use for continuous online monitoring. Flow cytometric analysis at nearly the single-cell level, based on light scattering by the cells and on the fluorescence of chlorophylls and phycobilins, can be easily automated, but it is appropriate only for unicellular species and is useless for the numerous industrially cultured filamentous strains [7, 8]. The main problem of all chemical methods (e.g., high-performance liquid chromatography (HPLC) [9, 10]) is that most of the information about the peculiarities of individual species is lost during chemical sample preparation. The remaining information is not sufficient for species/strain discrimination within cyanobacterial genera and is suitable only for coarse differentiation of large classes of phytoplankton. Thus, analysis of in vivo fluorescence spectra is the only noninvasive technique for obtaining qualitative information about phytoplankton abundance and composition, as numerous publications have demonstrated [10, 11, 12, 13, 14, 15, 16, 17]. The relative phytoplankton abundance can be calculated once initial assumptions about the phytoplankton classes and their pigment compositions have been made [7, 12, 13].
Perhaps the first attempt to use phycoerythrins as chemotaxonomic markers was made by Glazer et al. for red algae in 1982, but until now fluorescence spectra of phycobilins have not appeared to be useful at the familial, ordinal, and class levels in taxonomic studies. Although that investigation concerns only purified high-molecular-weight phycoerythrin from red algae, it clearly demonstrates the possibility of correct taxonomic analysis on the basis of structural differences in phycobiliproteins, which can serve as intrinsic fingerprints for taxa and genera in phytoplankton diversity. Later, the correlation between the distribution of the biliproteins and the genera of Cryptophyceae was discussed. In 1985, Yentsch and Phinney proposed an ataxonomic technique that utilized the spectral fluorescence signatures of major ocean phytoplankton. Seppälä used spectral fluorescence signals to detect changes in the phytoplankton community. In 2002, Beutler et al. reported a reduced model of the fluorescence from the cyanobacterial photosynthetic apparatus designed for the in situ detection of cyanobacteria and presented a commercially available submersible instrument for online monitoring of phytoplankton structure.
However, the correct classification of cyanobacterial species on the basis of their bulk fluorescence signature is hampered by alterations in pigment composition within a single strain, which depend on the physiological state of the culture (community) and on environmental conditions. On the other hand, several researchers have shown that nutrient and light limitations do not significantly change the initial fluorescence spectra and do not impede species discrimination [17, 22].
The recent rapid development of confocal microscope functionality has opened new directions in subcellular biology research [23, 24]. Confocal laser scanning microscopes are distinguished by their high spatial and temporal resolution and are unique tools for visualizing cellular structures and analyzing dynamic processes inside single cells. One specific field of CLSM application is the investigation of the self-fluorescence of living cells. CLSM single-cell microscopic spectroscopy is one of the most powerful tools for in vivo investigation of physiological processes in photosynthetic organisms (cyanobacteria, algae, and higher plants). The self-fluorescence of single living cells reveals the relation between the physiological state and the operational activity of the photosynthetic system. Many static and dynamic effects can be studied by CLSM: self-fluorescence provides information about single-cell processes as well as about cooperation within cell communities. Changes in the spectral characteristics of living photosynthetic cells indicate changes in their physiological state and can be used to study the effects of stress and external actions [4, 5, 6]. Moreover, the diversity in single-cell self-fluorescence among species and strains can serve as the basis for ataxonomic discrimination of cyanobacterial genera.
In this chapter, a novel ataxonomic approach to the differentiation of cyanobacterial cells is presented. It is based on numerical analysis of in vivo single-cell fluorescence spectra recorded by CLSM, and the differentiation is conducted according to the structure and operation of the photosynthetic apparatus of the cells. An optimal set of parameters is selected that is sufficient for determining the taxonomic position of cyanobacteria by means of mathematical statistics. Spectroscopic data obtained for 23 cyanobacterial strains from the CALU collection were analyzed by linear discriminant analysis. It is shown that the presented technique allows accurate differentiation of cyanobacteria down to the species/strain level and enables potentially harmful strains to be distinguished automatically. All presented results were obtained using cyanobacterial strains from the CALU collection of the Core Facility Center “Centre for Culture Collection of Microorganisms” of the Research Park of St. Petersburg State University as model objects for CLSM studies.
2. Materials and methods
2.1 Cyanobacterial strains and cultivation conditions
All work on preparing cyanobacterial cultures for this research was carried out at the Core Facility Center “Centre for Culture Collection of Microorganisms” of the Research Park of St. Petersburg State University. In the CALU collection at the core facility center, cyanobacterial strains were maintained in semiliquid agar (0.8%) medium no. 6 after Gromov, in 5–6 mL test tubes under cotton plugs. The strains were stored at 14°C under constant illumination of 2000 lux and were recultivated every 2–3 months.
The cyanobacteria used in this investigation were grown in liquid medium no. 6. A stock culture was first prepared in 30 mL of medium and incubated for 2 weeks at room temperature under continuous illumination from fluorescent lamps. To maintain a constant volume, 5 mL of medium was added to the stock culture every 2 weeks. All experiments in this study were conducted with cultures presumed to be in the logarithmic phase of growth.
In this work, 23 cyanobacterial strains from the CALU collection were used:
Anabaena variabilis Kutz. CALU 824, ponds near Old Petergof, Saint Petersburg, Russia.
Arthrospira sp. CALU 1712, Gulf of Finland, Saint Petersburg, Russia.
Geitlerinema sp. CALU 1315, Lake Kyzyl-Tash, Ozersk, Chelyabinsk, Russia.
Geitlerinema sp. CALU 1718, Lake Kamenetz, Pskov, Russia.
Leptolyngbya sp. CALU 1713, river Tikhaya, Saint Petersburg, Russia.
Leptolyngbya sp. CALU 1715, Gulf of Finland, Saint Petersburg, Russia.
Leptolyngbya sp. CALU 1750, Lake Tarasovskoe, Saint Petersburg, Russia.
Lyngbya sp. CALU 1804, Lake Valdai, Novgorod, Russia.
Merismopedia sp. CALU 666 punctata Meyen f., Pinar del Rio, Rio de Soroa, Cuba.
Microcystis firma (Breb. et Lenorm) Schmidle CALU 398, Turkmenbashi Canal, Turkmenistan.
Myxosarcina chroococcoides Geitl. CALU 601, Russia.
Nostoc sp. CALU 1763, Lake Ladoga, Saint Petersburg, Russia.
Nostoc sp. CALU 1817, springs on the Island Big Solovetsky, White Sea, Russia.
Oscillatoria sp. CALU 1415, ponds in Vorkuta region, Russia.
Oscillatoria sp. CALU 1416, ponds in Vorkuta region, Russia.
Phormidium favosum CALU 624, brook Ammersbek, Hamburg, Germany.
Plectonema sp. CALU 457, pond in Strelna, Saint Petersburg, Russia.
Pleurocapsa sp. CALU 1126, Lake Ladoga, Saint Petersburg, Russia.
Spirulina platensis sp. CALU 550 (Nordst.), Czech Republic.
Synechococcus sp. CALU 535, ponds near Old Petergof, Saint Petersburg, Russia.
Synechococcus sp. CALU 756, Czech Republic.
Synechococcus sp. CALU 1409, ponds in Vorkuta region, Russia.
Synechocystis aquatilis CALU 1336, Gulf of Finland, Saint Petersburg, Russia.
Fluorescence and corresponding transmission photomicrographs obtained via CLSM for several strains from the CALU collection are presented in Figure 1. In further illustrations, strains are referred to only by their CALU numbers for clarity.
2.2 Confocal laser scanning microscopy
Confocal laser scanning microscopes are distinguished by their high spatial and temporal resolution [23, 24]. Modern laser scanning microscopes are unique tools for visualizing cellular structures and analyzing dynamic processes inside single cells. They surpass classical light microscopes especially in axial resolution, which makes it possible to acquire optical sections (slices) of a specimen. Apart from simple imaging, confocal laser scanning microscopes are designed for the quantification and analysis of image-coded information. Among other things, they allow easy determination of fluorescence intensities, distances, and areas, and of their changes over time. Newer CLSM acquisition tools also detect quantitative properties of the emitted light, such as spectral signatures and fluorescence lifetimes. The most impressive feature of modern CLSMs is their capability for single-cell microscopic spectroscopy, which yields spectroscopic information from within single cells and small regions.
In the present investigation, a Leica TCS-SP5 microscope was used to study living cyanobacterial cells. Fluorescence emission spectra of the intact cells were measured at eight excitation wavelengths corresponding to all available laser lines: 405 nm (diode UV laser); 458, 476, 488, 496, and 514 nm (Ar laser); and 543 and 633 nm (HeNe laser). In all presented experiments, the laser power settings were as follows: 29% of the Ar laser power was directed onto the sample through an acousto-optical tunable filter (AOTF), and the individual Ar lines were further set to 30% for the 458 nm line and 10% for all other lines. The 405 nm line of the diode UV laser was set to 3%, and the HeNe laser lines at 543 and 633 nm were set to 10% and 2%, respectively. An acousto-optical beam splitter (AOBS) was used to transmit the sample fluorescence to the detector. Emission spectra between 520 and 785 nm were recorded using the lambda scan function of the “Leica Confocal Software” by sequentially acquiring a series (‘stack’) of 38–45 images, each with a 6 nm fluorescence detection bandwidth and a 6 nm wavelength step. To obtain fluorescence-intensity information, images of 512 × 512 pixels were collected with a 63× glycerol immersion lens (Glycerol 80% H2O) with a numerical aperture of 1.3 (objective HCX PL APO 63.0 × 1.30 GLYC 37°C UV) and an additional digital zoom factor of 5–9 (depending on the cyanobacterial strain). One pixel corresponds to 53.5 × 53.5 nm. The photomultiplier (PMT) voltages ranged from 900 to 1100 V. The fluorescence emission images were accompanied by transmission images (in a parallel channel), collected by a transmission detector with photomultiplier voltages ranging from 300 to 500 V.
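As a rough illustration of the lambda-scan geometry described above, the following sketch lists contiguous 6 nm detection bands stepped by 6 nm across 520–785 nm and computes the frame width implied by the stated pixel size. The function names and the convention that the first band starts exactly at 520 nm are assumptions made for illustration; the actual band placement is set in the acquisition software.

```python
def lambda_scan_bands(start_nm=520.0, stop_nm=785.0, width_nm=6.0, step_nm=6.0):
    """Return (low, high, centre) in nm for every detection band that fits in the range."""
    bands = []
    low = start_nm
    while low + width_nm <= stop_nm + 1e-9:
        bands.append((low, low + width_nm, low + width_nm / 2.0))
        low += step_nm
    return bands


def field_of_view_um(pixels=512, pixel_size_nm=53.5):
    """Width of a square frame in micrometres for the given pixel count and pixel size."""
    return pixels * pixel_size_nm / 1000.0


bands = lambda_scan_bands()
print(len(bands), bands[0], bands[-1])
print(field_of_view_um())
```

Under these assumptions the scan yields 44 bands, which lies within the 38–45 images per stack reported above, and a 512 × 512 frame spans about 27.4 μm.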
For better signal yield, lambda scans were performed at the “low speed” setting (400 Hz) in bidirectional scan mode and with a pinhole setting of 1 Airy unit (the inner light circle of the diffraction pattern of a point light source, which corresponds to a diameter of 102.9 μm with the lens used). Regions of interest (ROIs) representing single cells or subcellular regions were used to calculate fluorescence spectra.
For 2D imaging, to increase sensitivity and contrast, images were recorded at the 405 nm excitation wavelength (diode UV laser) with a Leica HyD hybrid detector, which strongly improves contrast in comparison with PMTs. The HyD gain was set to 100 V. Images of 1024 × 1024 and 2048 × 2048 pixels were collected with a 63× glycerol immersion lens (Glycerol 80% H2O) with a numerical aperture of 1.3 (objective HCX PL APO 63.0 × 1.30 GLYC 37°C UV) and an additional digital zoom factor of 10–35. The fluorescence emission images were accompanied by transmission images (in a parallel channel). The images were recorded with a pinhole setting of 1 Airy unit.
In CLSM applications, the laser light density at the focal point is high, but it is generally delivered in short “dwell times” during the laser scanning process. The dwell time and the intervals between illuminations may influence photodamage and saturation of the photosynthetic apparatus of living cells. Since most chromophores bleach under high laser excitation energies, a bleach test should be performed. It has been shown experimentally that the accessory pigments phycoerythrin (PE) and phycocyanin (PC) are especially sensitive to photobleaching, while the fluorescence of chlorophyll a (Chl a) and allophycocyanin (APC) remains stable in intact living cells. During detection, the fluorescence of the main accessory pigments of each cyanobacterial strain should be monitored, and changes in their fluorescence should not exceed 10–20%. In this investigation, the power of the individual laser lines was chosen according to the photodamage they cause. Repeated spectra were recorded at the selected excitation power at a fixed point in a cell to check whether the excitation affected the cells. At the excitation energies chosen above (laser line percentages), the fluorescence spectra did not vary beyond the experimental error over 10–15 records. When the excitation energy was increased, both the height and the center of the bands varied strongly with time because of photodamage or structural breakdown in the photosynthetic systems. In experiments involving several laser lines, the first spectrum was recorded again at the end of each series to check that the cell had remained in its initial state. The whole procedure of fluorescence spectra recording used in this study was designed to minimize preparatory manipulation, so as to allow noninvasive investigation of small amounts of experimental material and to prevent any damage to living cells.
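The repeated-spectrum check described above can be sketched as a simple comparison between the first spectrum of a series and its repeat. This is a minimal illustration, not the procedure used in the study: the function names are hypothetical, measuring the change relative to the peak of the reference spectrum is an assumption, and the default tolerance of 0.10 simply takes the stricter end of the 10–20% range quoted in the text.

```python
def max_relative_change(reference, repeat):
    """Largest absolute intensity change between two recordings of the same
    spectrum, expressed as a fraction of the reference peak intensity."""
    if len(reference) != len(repeat):
        raise ValueError("spectra must have the same number of points")
    peak = max(reference)
    if peak <= 0:
        raise ValueError("reference spectrum has no positive signal")
    return max(abs(b - a) for a, b in zip(reference, repeat)) / peak


def passes_bleach_test(reference, repeat, tolerance=0.10):
    """True if the repeated spectrum stays within the allowed relative change."""
    return max_relative_change(reference, repeat) <= tolerance


# Made-up intensities, for illustration only.
first = [10.0, 50.0, 100.0, 40.0]
stable_repeat = [10.0, 48.0, 95.0, 39.0]
bleached_repeat = [9.0, 40.0, 70.0, 30.0]
print(passes_bleach_test(first, stable_repeat))
print(passes_bleach_test(first, bleached_repeat))
```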
To exclude unpredictable variations in the physiological state of the investigated cultures, fluorescence spectra were taken from cells of each strain several times, on different days and at various developmental stages of the culture. The variations in spectrum shape and intensity among cells of one strain were found to be insignificant.
2.3 Data processing
2.3.1 Data preprocessing
The main difficulty of the discrimination problem considered here arises from the high nonuniformity of the initial data and the different numbers of observations for different strains. The small size of the initial dataset and the complex nature of the experimental data required an elaborate preprocessing procedure. The original experimental data comprise 307 sets of self-fluorescence spectra obtained from cyanobacterial cells belonging to 23 different strains. Each observation in the dataset is described by a series of seven spectra taken from a single cell by CLSM. Each initial spectrum is an array of 38–45 numbers, which correspond to the fluorescence intensities at specific emission wavelengths in the range from 520 to 785 nm. In contrast to previous investigations, which used the full spectrum of the samples for classification [12, 13, 14], we used a set of integral and statistical characteristics describing the shape of each spectrum. To extract a set of classification parameters from the initial data, a computer program was developed in MATLAB. With this program, the raw spectra were normalized, interpolated, extrapolated, and smoothed to eliminate random noise and measurement fluctuations. All spectra were reduced to the same scale and array size, the first derivative of each spectrum was taken, and the fast Fourier transform (FFT) was applied to exclude the random noise caused by the low intensity of the exciting and emitted light. Specific values characterizing the shape of the resulting curves and the spectral composition of their derivatives were then calculated.
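The preprocessing chain described above can be sketched as follows. This is an illustrative outline in Python rather than the MATLAB program used in the study. The order of the steps, the moving-average smoother, normalization to unit sum, the flat extrapolation at the ends, and all function names are assumptions, and a plain discrete Fourier transform stands in for the FFT (it gives the same values, only more slowly).

```python
import cmath
import math


def resample(wavelengths, intensities, grid):
    """Linear interpolation onto a common grid; values are held flat outside the data."""
    out = []
    for x in grid:
        if x <= wavelengths[0]:
            out.append(intensities[0])
        elif x >= wavelengths[-1]:
            out.append(intensities[-1])
        else:
            j = 1
            while wavelengths[j] < x:
                j += 1
            x0, x1 = wavelengths[j - 1], wavelengths[j]
            y0, y1 = intensities[j - 1], intensities[j]
            out.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return out


def smooth(values, half_window=1):
    """Centred moving average; the window shrinks at the edges."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def normalize(values):
    """Scale the spectrum so that its points sum to one."""
    total = sum(values)
    return [v / total for v in values]


def first_derivative(values, step):
    """Central differences in the interior, one-sided differences at the ends."""
    n = len(values)
    d = [(values[1] - values[0]) / step]
    d += [(values[i + 1] - values[i - 1]) / (2 * step) for i in range(1, n - 1)]
    d.append((values[-1] - values[-2]) / step)
    return d


def dft_magnitude(values):
    """Magnitudes of the non-negative-frequency DFT coefficients."""
    n = len(values)
    return [abs(sum(v * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, v in enumerate(values)))
            for k in range(n // 2 + 1)]


def preprocess(wavelengths, intensities, grid):
    y = normalize(smooth(resample(wavelengths, intensities, grid)))
    dy = first_derivative(y, grid[1] - grid[0])
    return y, dy, dft_magnitude(dy)


# Synthetic single-peak spectrum on a 6 nm raster, resampled to a 5 nm grid.
raw_wl = [520.0 + 6.0 * i for i in range(44)]
raw_int = [5.0 + 100.0 * math.exp(-((w - 660.0) / 15.0) ** 2) for w in raw_wl]
grid = [520.0 + 5.0 * i for i in range(53)]
y, dy, spectrum = preprocess(raw_wl, raw_int, grid)
print(len(y), len(dy), len(spectrum))
```

Resampling every spectrum to one grid is what makes arrays of 38–45 points comparable; the particular 5 nm grid used here is arbitrary.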
All selected classification parameters can be divided into three groups: asymmetry and excess (skewness and excess kurtosis); fluorescence emission percentages for individual pigments in four main spectral regions (phycoerythrin, 573–586 nm; phycocyanin and allophycocyanin, 649–661 nm; chlorophyll a of PSII, 674–689 nm; chlorophyll a of PSI, 711–727 nm); and the frequency characteristics of the first-derivative Fourier transform of each plot (mean values in three specified frequency domains: 43–58 μm⁻¹, 95–110 μm⁻¹, and 123–135 μm⁻¹). A detailed description of the extraction of the classification parameters is given in Zhangirov et al.
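Two of the three parameter groups listed above can be illustrated with a short sketch. It assumes that asymmetry and excess are computed by treating the normalized spectrum as a distribution over wavelength, and that the emission percentage of a region is the share of the summed intensity falling inside it; neither convention is confirmed by the text, and the function and key names are hypothetical.

```python
# Spectral regions in nm, taken from the text; the keys are illustrative labels.
REGIONS_NM = {
    "phycoerythrin": (573.0, 586.0),
    "phycocyanin_allophycocyanin": (649.0, 661.0),
    "chlorophyll_674_689": (674.0, 689.0),
    "chlorophyll_711_727": (711.0, 727.0),
}


def shape_moments(wavelengths, intensities):
    """Skewness (asymmetry) and excess kurtosis of a spectrum treated as a
    distribution of intensity over wavelength."""
    total = sum(intensities)
    weights = [v / total for v in intensities]
    mean = sum(w * x for w, x in zip(weights, wavelengths))
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, wavelengths))
    third = sum(w * (x - mean) ** 3 for w, x in zip(weights, wavelengths))
    fourth = sum(w * (x - mean) ** 4 for w, x in zip(weights, wavelengths))
    return third / var ** 1.5, fourth / (var * var) - 3.0


def region_percentages(wavelengths, intensities, regions=REGIONS_NM):
    """Percentage of the total emission that falls inside each spectral region."""
    total = sum(intensities)
    return {
        name: 100.0 * sum(v for x, v in zip(wavelengths, intensities) if lo <= x <= hi) / total
        for name, (lo, hi) in regions.items()
    }


# Made-up five-point spectrum, for illustration only.
wl = [570.0, 580.0, 650.0, 680.0, 720.0]
flat = [1.0, 1.0, 1.0, 1.0, 1.0]
print(shape_moments([1.0, 2.0, 3.0], [1.0, 2.0, 1.0]))
print(region_percentages(wl, flat))
```

Counting two shape moments, four region percentages, and three frequency-domain means gives nine values per spectrum; this tally is an inference from the groups listed above rather than a count stated in the text.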
2.3.2 Linear discriminant analysis
Linear discriminant analysis (LDA) is a well-known statistical technique for classifying samples into two or more groups (classes) [31, 32] and is often applied in biology to various classification problems [15, 17, 29, 30]. It uses linear combinations of independent variables to form the basis of a classification scheme. In our case, the independent variables are the 63 classification parameters extracted from each set of single-cell self-fluorescence spectra.
LDA builds n linear discriminant functions, where n is the number of classes, and each observation is described by a row vector of parameters. A sample is assigned to the class whose discriminant function takes the maximal value for the row vector of that sample. Discriminant analysis has two very useful applications. First, it identifies the set of classification parameters needed to discriminate between known groups, here between known cyanobacterial strains. Second, it can be used to classify an unknown sample (within a certain probability) into a known group of species or strains. The high classification accuracy of LDA is due to the fact that it works with the distribution functions of the classification parameters and their statistical characteristics, which makes it possible to build a better classification model. However, LDA places strong restrictions on correlations between classification parameters.
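The decision rule described above, one linear discriminant function per class with the sample assigned to the class with the largest value, can be sketched in a few lines. The sketch uses the common textbook form with a pooled within-class covariance matrix and class priors estimated from class frequencies; these choices and all function names are assumptions, not a description of the MATLAB programs used in the study.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_lda(samples, labels):
    """Return {class: (weights, offset)} defining one discriminant function per class."""
    classes = sorted(set(labels))
    n, d = len(samples), len(samples[0])
    means = {}
    for k in classes:
        rows = [x for x, y in zip(samples, labels) if y == k]
        means[k] = [sum(col) / len(rows) for col in zip(*rows)]
    # Pooled within-class covariance matrix.
    cov = [[0.0] * d for _ in range(d)]
    for x, y in zip(samples, labels):
        diff = [xi - mi for xi, mi in zip(x, means[y])]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - len(classes))
    model = {}
    for k in classes:
        w = solve(cov, means[k])  # covariance^-1 * class mean
        prior = labels.count(k) / n
        offset = -0.5 * sum(wi * mi for wi, mi in zip(w, means[k])) + math.log(prior)
        model[k] = (w, offset)
    return model


def classify(model, x):
    """Assign x to the class whose discriminant function is largest."""
    scores = {k: sum(wi * xi for wi, xi in zip(w, x)) + b for k, (w, b) in model.items()}
    return max(scores, key=scores.get)


# Two made-up classes in a two-parameter feature space.
X = [[1, 2], [2, 1], [2, 3], [3, 2], [6, 7], [7, 6], [7, 8], [8, 7]]
y = ["A"] * 4 + ["B"] * 4
model = fit_lda(X, y)
print(classify(model, [2.5, 2.5]), classify(model, [6.5, 7.5]))
```

The linear solve fails when the pooled covariance matrix is singular, which is one practical face of the restriction on strongly correlated classification parameters noted above.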
In addition, LDA makes it possible to reduce the dimension of the feature space. In this form, known as linear Fisher discriminant analysis (LFDA), the samples are classified by dividing them into groups whose boundaries are determined by threshold coordinate values. The goal of the method is to find informative projections by maximizing a function constructed from the projection matrix, the between-class scatter matrix, and the within-class scatter matrix. In this procedure, the first component (canonical discriminant function) provides the greatest separation, and the classification is performed in the three-dimensional space defined by the three largest components. The best classification parameters are selected according to the criterion that the dissimilarity between parameters of different species/strains should be greater than that between parameters of the same group. LFDA is based on the solution of an eigenvalue problem: the eigenvectors with the largest eigenvalues are used to construct a lower-dimensional space, while the other dimensions are neglected.
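For reference, the Fisher criterion described above is usually written in the following textbook form. The notation is introduced here for illustration and is not taken from the chapter: C_k is the set of observations in class k, N_k their number, μ_k the class mean, μ the overall mean, and W the projection matrix whose columns are the vectors w_i.

```latex
S_W = \sum_{k=1}^{n} \sum_{x \in C_k} (x - \mu_k)(x - \mu_k)^{\mathsf T},
\qquad
S_B = \sum_{k=1}^{n} N_k \, (\mu_k - \mu)(\mu_k - \mu)^{\mathsf T}

J(W) = \frac{\left| W^{\mathsf T} S_B W \right|}{\left| W^{\mathsf T} S_W W \right|},
\qquad
S_B \, w_i = \lambda_i \, S_W \, w_i,
\quad
\lambda_1 \ge \lambda_2 \ge \dots
```

Because S_B is a sum of n rank-one terms whose weighted mean deviations sum to zero, at most n − 1 eigenvalues are nonzero, so n classes give at most n − 1 canonical discriminant functions.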
A stepwise discriminant analysis (SDA) was also used in this investigation, at the stage of selecting the most valuable classification parameters, to determine which parameters discriminate best between the specified groups of observations. The standardized coefficient of each variable in each discriminant function represents the contribution of the respective parameter to the discrimination between groups.
The calculations were performed in MATLAB using custom-built programs.
2.3.3 Artificial neural network
Artificial neural networks (ANNs) are currently being used in a variety of applications with great success [8, 34, 35, 36]. In contrast with conventional programs for data analysis, neural networks follow an adaptive approach. They are flexible and eminently suited for application to complex data structures that are not apt for other data analysis methods like cluster analysis or principal component analysis. Their first main advantage is that they do not require a user-specified problem solving algorithm (as is the case with classic programming), but instead they “learn” from examples, much like human beings. Their second main advantage is that they possess an inherent generalization ability. This means that they can identify and respond to patterns that are similar but not identical to the ones on which they have been trained.
An ANN can be described as a mathematical model with a specific structure, consisting of a number of single processing elements (called artificial neurons) arranged in interconnected layers. An active neuron multiplies each input by its weight, sums the products, and passes the sum through a transfer function to produce the output. Neurons belonging to different layers are interconnected, while neurons within one layer are independent. An ANN consists of an input layer, hidden layers, and an output layer. Each neuron transforms its input and sends its output to the other neurons to which it is connected.
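The operation of a single neuron described above, a weighted sum passed through a transfer function, can be written out directly. In this minimal sketch the function names are hypothetical, the bias term is an added assumption, and the hyperbolic tangent is used as the default transfer function only as an example.

```python
import math


def neuron_output(inputs, weights, bias=0.0, transfer=math.tanh):
    """Multiply each input by its weight, sum the products, add a bias,
    and pass the sum through the transfer function."""
    return transfer(sum(w * x for w, x in zip(weights, inputs)) + bias)


def layer_output(inputs, weight_rows, biases, transfer=math.tanh):
    """Neurons in one layer act independently on the same input vector."""
    return [neuron_output(inputs, w, b, transfer) for w, b in zip(weight_rows, biases)]


print(neuron_output([1.0, 2.0], [0.5, -0.25]))
print(layer_output([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

Feeding the output list of one layer into `layer_output` again, with the weights of the next layer, gives the forward pass of a multilayer network.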
There are many fundamentally different types and architectures of neural networks. In this work, a feed-forward ANN (FFANN) is used to solve the classification problem considered [34, 37]. Figure 2 illustrates the model of the ANN used in this work. Owing to the simplicity of the classification problem to be solved, a multilayer feed-forward neural network (NN) with one hidden layer was considered. A hyperbolic tangent was used as the activation function in the hidden layer and the Softmax function in the output layer, which allows the output layer to be interpreted as the distribution of probabilities of belonging to each of the classes. In both layers, a bias neuron with a signal equal to unity is added. The size of the input layer depends on the number of classification parameters. The number of neurons in the output layer was fixed and equal to the number of classes. The number of neurons in the hidden layer was estimated by the following equation .
Learning in ANNs is accomplished through special training algorithms based on learning rules presumed to mimic the learning mechanisms of biological systems. In supervised learning, the network is trained on a dataset of observations and optimized according to its ability to predict a set of known outcomes. The deviation of the network output from the target (true) value is computed, and the error is propagated backward from the output layer to adjust the connection weights. Since the activation function at the output layer is Softmax, cross-entropy was used as the loss function. Many training algorithms have been developed from such learning rules; in this investigation, the method of adaptive moment estimation (Adam) was chosen for further calculations.
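The combination of a Softmax output, a cross-entropy loss, and Adam updates described above can be illustrated on a toy problem in which the adjustable parameters are the three output-layer inputs themselves rather than the weights of a full network. The class and function names are hypothetical. The learning rate of 0.01 and the 500 iterations follow values reported for this study, whereas beta1, beta2, and eps are the commonly used Adam defaults and are assumptions here.

```python
import math


def softmax(logits):
    """Convert raw outputs into a probability distribution over classes."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[true_class])


def softmax_cross_entropy_grad(probs, true_class):
    """Gradient of the loss with respect to the logits: probabilities minus one-hot target."""
    return [p - (1.0 if i == true_class else 0.0) for i, p in enumerate(probs)]


class Adam:
    """Adaptive moment estimation for a flat list of parameters."""

    def __init__(self, size, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * size  # running mean of gradients
        self.v = [0.0] * size  # running mean of squared gradients
        self.t = 0

    def step(self, params, grads):
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


# Toy problem: push the probability of class 1 upward.
logits = [0.0, 0.0, 0.0]
loss_before = cross_entropy(softmax(logits), 1)
optimizer = Adam(size=3)
for _ in range(500):
    probs = softmax(logits)
    logits = optimizer.step(logits, softmax_cross_entropy_grad(probs, 1))
loss_after = cross_entropy(softmax(logits), 1)
print(loss_before, loss_after)
```

In a real network the same gradient is propagated backward through the hidden layer, and the Adam step is applied to every connection weight.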
In the training phase, a sample set of classification parameters and the known solution (the strain number of the corresponding cell) are presented iteratively to the network. The neuron weights (ANN parameters) are adjusted in small steps until the network has learned the training examples. In the experiments described in this study, the training procedure comprised 500 iterations (epochs). After training, the network is tested. In this test phase, the characteristics of a number of cyanobacterial cells with known identities are fed to the network, and the solutions are compared with these known identities. In this study, the trained network was capable of recognizing about 96% of the cyanobacterial cells in the test set. The analysis of the generalization quality of the ANN is identical to the test procedure, except that the identities of the cells are not known beforehand.
The ratio of the training set to the test set in this investigation was 70:30. The other parameters of the selected training algorithm were as follows: the acceptable error threshold was 0.01, the bandwidth parameter (size of the error control window) was 20, the moment parameter was 0.1, and the regularization parameter was 0.001. The learning rate was 0.01, and the number of training epochs lay in the range from 300 to 800.
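A 70:30 division of the observations can be sketched as below. The split is made separately within each strain so that strains with few observations still appear in both subsets; whether the split in the study was stratified in this way is not stated, so this is an assumption, as are the function name, the fixed random seed, and the made-up class sizes in the example.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_indices, test_indices), splitting each class separately."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Made-up class sizes, for illustration only.
labels = ["CALU 824"] * 10 + ["CALU 1712"] * 20
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```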
The main criterion for assessing the quality of ANN operation is the classification accuracy. There are several approaches to evaluating it. In the case considered here, the classification accuracy is calculated for each class separately, as the ratio of the number of correctly classified observations of the class to the total number of observations in that class. The average classification accuracy over all classes is then obtained. In this case, it is possible to build an error matrix of size N × N (where N is the number of classes) and to present the results as a bar chart showing the classification accuracy for each class (see Figure 8 in “Ataxonomic differentiation of cyanobacterial strains on the base of single-cell fluorescence spectra”).
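The accuracy measure described above can be computed from an N × N error (confusion) matrix, as in the following sketch. The function names and the small two-class example are illustrative assumptions; rows of the matrix correspond to true classes and columns to predicted classes.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """N x N matrix whose entry [i][j] counts observations of class i predicted as class j."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_accuracy(matrix):
    """Correctly classified observations of each class divided by the class size."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def mean_class_accuracy(matrix):
    """Average of the per-class accuracies over all classes."""
    accuracies = per_class_accuracy(matrix)
    return sum(accuracies) / len(accuracies)


# Made-up labels, for illustration only.
true_labels = ["A", "A", "A", "A", "B", "B"]
predicted = ["A", "A", "A", "B", "B", "B"]
matrix = confusion_matrix(true_labels, predicted, classes=["A", "B"])
print(matrix, per_class_accuracy(matrix), mean_class_accuracy(matrix))
```

In this example the class-averaged accuracy is 0.875, while the share of all observations classified correctly is 5/6; averaging over classes keeps strains with few observations from being swamped by larger ones.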
On the basis of the classification accuracy analysis, it is possible to evaluate the quality of ANN training as well as the quality of internal and external generalization. In our case, the quality of external generalization was evaluated on the basis of a priori knowledge about new species provided by an expert. To validate the neural network operation, the results of the NN classification were compared with the results of the LDA.
The ANN architecture presented in this work, as well as the learning algorithm and its parameters, was determined by studying various configurations. After training, the selected model consistently gives a classification accuracy of at least 95%. In this study, the ANN was simulated using MATLAB.
3. Light-harvesting system of cyanobacteria
In cyanobacteria, the antenna complexes for photosystem II (PS II) and to some extent for photosystem I (PS I) are extrinsic and formed as large multiprotein organelles, which are located on the stromal side of the thylakoid membranes. These supramolecular pigment-protein complexes, so-called phycobilisomes (PBSs), first described by Gantt, are the main light-harvesting antennae in cyanobacteria.
Phycobilisomes are primarily composed of phycobiliproteins, a colored family of water-soluble proteins. Their chemical and spectroscopic properties are determined by their structure and by the function they perform in the photosynthetic process. The three classes of phycobiliproteins are allophycocyanin (APC), phycocyanin (PC), and phycoerythrin (PE). However, in some cyanobacteria phycoerythrin can be replaced by phycoerythrocyanin (PEC), or both pigments can be lacking; phycocyanin and allophycocyanin are constitutively present in all cyanobacteria. There are only very slight species differences between detached phycobiliproteins, even between prokaryotic cyanobacteria and eukaryotic red algae.
Usually PBSs are assembled from 12 to 18 different types of polypeptides, which may be grouped into three classes: (1) phycobiliproteins, (2) linker polypeptides, and (3) PBS-associated proteins. The amino acid sequences of all components constituting the phycobilisomes of some cyanobacterial strains have been determined, and analysis of these data has revealed phylogenetic relationships.
The polypeptide composition of PBS varies widely among strains of cyanobacteria. It should be noted that the degree of PBS compositional variability, which reflects the ability of an organism to adapt to environmental changes, varies from strain to strain. Moreover, for a single strain it sometimes depends upon the environmental conditions such as nutrient availability, temperature, light quality, and light intensity.
It is well known that the total biliprotein content of cyanobacterial cells is inversely related to the quality and quantity of irradiance. Comprehensive reviews [41, 42, 43, 44, 45, 46] detail the various degrees of such chromatic adaptation. However, for cyanobacteria cultured under white light of moderate intensity and in a medium with the usual nutrient composition, no chromatic adaptation occurs, and the PBS structure remains invariable within each strain. Thus, the unique spectroscopic properties of different cyanobacterial strains, when analyzed in vivo, may become promising fingerprints for practical and laboratory applications.
Phycobilisomes are constructed from two main structural elements: a core substructure and peripheral rods that are arranged in a hemidiscoidal fashion around that core (Figure 3). Each core cylinder is made up of four disc-shaped phycobiliprotein trimers, allophycocyanin (APC), allophycocyanin B (APC-B), and APC core-membrane linker complex (APC-LCM). By the core-membrane linkers, PBSs are attached on thylakoids and structurally coupled with PSII. The peripheral cylindrical rods (six or eight) radiate from the lateral surfaces of the core substructure and are usually not in contact with the thylakoid membrane. The rods are made up of hexamers, disc-shaped phycobiliproteins (PE, PEC, and PC), and corresponding rod linker polypeptides [41, 42, 43, 44]. Most linker polypeptides are colorless proteins, but some also contain phycobilin chromophores, endowing them with the ability to harvest light as well as aid in the assembly of the phycobilisomes . For more details about phycobilisome structure, see [18, 39, 47].
The phycobilisome is attached to the membrane by multiple weak charge-charge interactions, either with proteins or with lipid head-groups. Binding is rather unstable. The core-membrane linker polypeptide provides a flexible surface, allowing interaction with a range of structurally distinct membrane complexes, including photosystem II (PSII) and photosystem I (PSI) (see Figure 4). The stability of each interaction may be modulated by covalent modification and/or the presence of accessory subunits.
Recently, it was established that phycobilisomes diffuse rapidly on the surface of the thylakoid membrane, while PSII reaction centers are normally almost immobile. Fluorescence recovery after photobleaching (FRAP) has been used to measure the mobility of phycobilisomes in intact cyanobacterial cells [48, 49], and it was clearly demonstrated that a significant proportion of phycobilisome-absorbed energy is delivered to PSI as well as to PSII [45, 49, 50].
The high mobility of phycobilisomes along the thylakoid membrane allows occasional direct interaction of the phycobilisome rods or core with PSI (Figure 4). Figure 4 shows two ways in which energy could be transferred from phycobilisomes to photosystem I: "spillover" from photosystem II with an attached phycobilisome (proposed by Su et al.) (Figure 4a, left photosynthetic complex) and direct association of the phycobilisome core with photosystem I (Figure 4a, right photosynthetic complex).
Another possible variant of the interaction between the phycobilisome and the reaction centers of the two photosystems was proposed by Gantt in Chapter 6.3 of the book. The author assumed that a special close arrangement of both photosystems around the base of the phycobilisome allows partial transfer of the absorbed energy to PSII and PSI simultaneously (Figure 4b).
4. Fluorescence spectra of intact cyanobacterial cells
The intrinsic fluorescence of photosynthetic organisms originates from excited states that are trapped by the light-harvesting system and lost before photochemistry takes place. Photoexcitation energy absorbed at the outer surface of phycobilisomes is transported sequentially through several rod chromoproteins to the inner core and then to the core-membrane linker (the terminal pigment), which acts as the final energy transmitter from the phycobilisome to the Chl a heterodimers of the two photosystems (PSII and PSI) incorporated in the thylakoid membrane. This excitation transfer is attributed to Förster dipole-dipole interaction and proceeds with an extremely high efficiency, near unity.
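For reference, the distance dependence usually associated with Förster transfer can be written as below. This is the standard textbook form rather than an expression taken from this chapter; the symbols r (donor-acceptor distance), R_0 (Förster radius), tau_D (donor excited-state lifetime without acceptor), k_T (transfer rate), and E (transfer efficiency) are introduced here only for illustration.

```latex
k_T = \frac{1}{\tau_D}\left(\frac{R_0}{r}\right)^{6},
\qquad
E = \frac{k_T}{k_T + \tau_D^{-1}} = \frac{1}{1 + (r/R_0)^{6}}
```

At r = R_0 the efficiency is exactly one half, and it approaches unity once r falls well below R_0, which is consistent with the near-unity efficiency described for closely packed phycobilisome chromophores.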
The more distal parts of the antenna system, a peripheral antenna complex (phycobilisome), maximally absorb photons at shorter wavelengths (higher energies) than do the pigments in the antenna complexes that are proximal to the reaction center. Subsequent energy transfer processes are from these high-energy pigments physically distant from the reaction center to low-energy pigments that are physically closer to the reaction center (Figure 5). With each transfer, a small amount of energy is lost as heat, and the excitation is moved closer to the reaction center, where the energy is stored by photochemistry. Note that the probability of excitation energy escape from the trap in the form of fluorescence at all transfer steps is non-zero and depends on the intensity and wavelength of the excitation light.
During the energy transfer process, the occasional quenching of the absorbed light by fluorescence can occur, and this becomes the essential property for fluorescent spectroscopy. It usually represents a small fraction of the excited states and diminishes in a functioning photosynthetic complex. Nevertheless, the fluorescence is an extremely informative quantity, because it reports on the energy transfer and trapping. Both steady-state and time-resolved fluorescence measurements are widely used methods for probing the organization and functional state of photosynthetic systems.
The fluorescence of intact living cyanobacterial cells originates from the inefficiency of the energy transfer between all components of the energy transfer chain, including the final step, the delivery to PSII or PSI (Figure 5a). Because of the occasional quenching by fluorescence, each transfer step results in a peak or shoulder on the corresponding spectrum (Figure 5b). The reason is that when phycobilisomes are bound to the thylakoid membrane, most of the energy from the phycobilisome is channeled to chlorophylls in the thylakoid membrane and thus does not mask the fluorescence of the previous steps in the energy transfer chain. In the course of the energy transfer from the initially photoexcited phycobiliprotein to the reaction centers of photosystems PSI and PSII, fluorescence is emitted from almost every type of pigment and can be used as a probe to examine the mechanism of energy transfer within the light-harvesting system [43, 44, 52].
A convenient way to monitor this energy transfer process is to irradiate a sample with light that is selectively absorbed by one set of pigments and then monitor the fluorescence that originates from a different set of pigments. If energy transfer takes place between pigments, the light absorbed by one set of pigments is emitted by another set, and the emission differs depending on the excitation wavelength. This type of fluorescence excitation experiment can be used to measure quantitatively the efficiency of energy transfer from one set of pigments to another. Moreover, different species of cyanobacteria contain different accessory pigment proteins and specific linker proteins between them; therefore the set of fluorescence emission spectra excited at different wavelengths has a unique shape for the cells of one strain and is quite distinguishable from those of other species and strains. Such sets of fluorescence emission spectra can be used for automatic differentiation of cyanobacterial species.
Figure 6 shows several characteristic sets of single-cell fluorescence spectra corresponding to Microcystis CALU 398, Merismopedia CALU 666, Leptolyngbya CALU 1715, and Phormidium CALU 624, obtained by confocal laser scanning microscope (CLSM) Leica TCS-SP5, which are placed near each set. Each spectrum in a set was obtained using a different laser line for excitation: 405, 458, 476, 488, 496, 514, 543, or 633 nm. The corresponding excitation wavelength is given above each spectrum. All spectra are normalized to the maximum intensity and shifted along the x-axis for convenience of observation. The 458 nm laser line excites mostly the in vivo fluorescence of Chl a in photosystems PSII and PSI, near 682 and 715 nm, respectively, and the emission spectrum of the cyanobacterial cells shows no appreciable emission of PC or APC. In cyanobacteria, the 458 nm excitation is preferentially absorbed by PSI, which contains more Chl a than PSII and is stoichiometrically more abundant. However, because the reaction center of PSI turns over faster than that of PSII, PSI has a lower fluorescence intensity than the PSII antenna. This is indicated by the PSI emission band at 715 nm, which is much weaker than the PSII emission band at 682 nm. Excitation at intermediate (blue and green) wavelengths (405, 488, and 496 nm) reveals the fluorescence maxima of all photosynthetic pigments, because light in this range is absorbed by all pigment-protein complexes in almost equal portions and fluorescence is emitted at all steps of the energy transfer chain (Figure 5). Direct excitation of cells in the PE absorption region at 514 and 543 nm results in an emission spectrum with two main peaks at 580 and 656 nm, which are due to PE, PC, and APC emission; for species that lack PE, the emission is concentrated mostly near 656 nm. Two chlorophyll fluorescence components can be resolved for some species in a number of spectra.
Excitation at 633 nm directly gives a prominent emission band at 656 nm that originates from C-PC, without the band at 580 nm, which cannot be excited at 633 nm even in species that contain PE (see Figure 6). Other small emission bands, corresponding to the fine pigment structure of the antenna complex, are not resolved at room temperature.
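The display convention described for Figure 6 (each spectrum normalized to its maximum and shifted along the x-axis) can be sketched as follows. This is a minimal illustration, not the processing code used in the study; the function names, the 20 nm shift step, and the toy intensity values are hypothetical.

```python
# Minimal sketch: normalize each emission spectrum to its own maximum and
# shift successive spectra along the wavelength axis for side-by-side display.
# All names, the shift step, and the sample values are hypothetical.

EXCITATION_LINES_NM = [405, 458, 476, 488, 496, 514, 543, 633]  # from the text


def normalize_to_max(intensities):
    """Scale a spectrum so that its largest value equals 1."""
    peak = max(intensities)
    if peak <= 0:
        raise ValueError("spectrum has no positive intensity")
    return [value / peak for value in intensities]


def shift_along_x(wavelengths, position, step_nm=20.0):
    """Offset the wavelength axis of the spectrum at a given position in the set."""
    return [w + position * step_nm for w in wavelengths]


def prepare_set_for_display(wavelengths, spectra_by_excitation, step_nm=20.0):
    """Return {excitation: (shifted_wavelengths, normalized_intensities)}."""
    prepared = {}
    for position, excitation in enumerate(sorted(spectra_by_excitation)):
        prepared[excitation] = (
            shift_along_x(wavelengths, position, step_nm),
            normalize_to_max(spectra_by_excitation[excitation]),
        )
    return prepared


# Toy example with two of the eight excitation lines.
toy_wavelengths = [580.0, 656.0, 682.0, 715.0]
toy_set = {458: [0.0, 1.0, 20.0, 6.0], 633: [0.0, 30.0, 12.0, 3.0]}
display_curves = prepare_set_for_display(toy_wavelengths, toy_set)
```

Normalizing each spectrum separately preserves band shapes and relative peak heights within a spectrum but discards absolute intensity differences between excitation lines, so this form is suited to visual comparison rather than to quantitative yield estimates.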
These in vivo fluorescence emission spectra reflect the structure of the light-harvesting complex of the corresponding species and the correct or incorrect functioning of its energy transfer chain. Four characteristic wavelengths, each corresponding to a fluorescence maximum or shoulder, can be easily distinguished: (1) a peak near 580 nm corresponds to the fluorescence of phycoerythrin, (2) a peak near 656–660 nm corresponds to the combined fluorescence of phycocyanin and allophycocyanin (they are indistinguishable at room temperature), (3) a peak near 682 nm corresponds to the fluorescence of chlorophyll a in photosystem II, and (4) a peak or shoulder near 720 nm represents the fluorescence from photosystem I [10, 53].
Comparative analysis of the series of fluorescence spectra for different cyanobacterial species and strains reveals visible variations in their shape. If the fluorescence spectra are taken from live cells in a normal physiological state that were cultured under the same growth conditions, the interspecies variations in pigment/Chl a ratios are more pronounced than the variations within an individual species. Species/strain differentiation can therefore be carried out on the basis of conventional multivariate analysis.
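The four characteristic bands and the pigment/Chl a ratios mentioned above suggest a simple way to turn one emission spectrum into a few numbers. The sketch below is only an illustration under stated assumptions: the band centers come from the text, while the 5 nm window, the use of the PSII band as the Chl a reference, and all function names are hypothetical and are not the 63 classification parameters used in the study.

```python
# Minimal sketch: read the intensity of each characteristic band from one
# emission spectrum and express it relative to the PSII Chl a band.
# Band centers follow the text; window width and names are hypothetical.

BANDS_NM = {"PE": 580.0, "PC_APC": 656.0, "PSII": 682.0, "PSI": 720.0}


def band_intensity(wavelengths, intensities, center_nm, half_width_nm=5.0):
    """Maximum intensity found within +/- half_width_nm of the band center."""
    window = [
        value
        for w, value in zip(wavelengths, intensities)
        if abs(w - center_nm) <= half_width_nm
    ]
    if not window:
        raise ValueError(f"no data points near {center_nm} nm")
    return max(window)


def pigment_to_chl_ratios(wavelengths, intensities):
    """Ratios of the PE, PC/APC, and PSI bands to the PSII Chl a band."""
    values = {
        name: band_intensity(wavelengths, intensities, center)
        for name, center in BANDS_NM.items()
    }
    reference = values["PSII"]
    if reference <= 0:
        raise ValueError("PSII band intensity must be positive")
    return {name: v / reference for name, v in values.items() if name != "PSII"}


# Toy spectrum sampled every 10 nm from 560 to 740 nm.
toy_wavelengths = [float(w) for w in range(560, 741, 10)]
toy_intensities = [0.0] * len(toy_wavelengths)
for wavelength, value in {580: 0.2, 660: 0.8, 680: 1.0, 720: 0.3}.items():
    toy_intensities[toy_wavelengths.index(float(wavelength))] = value

toy_ratios = pigment_to_chl_ratios(toy_wavelengths, toy_intensities)
```

Repeating this for each of the eight excitation lines would give a small feature vector per cell, which is the kind of input that a multivariate classifier can work with.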
5. Ataxonomic differentiation of cyanobacterial strains on the base of single-cell fluorescence spectra
Fluorescence spectra have been used to classify phytoplankton populations since approximately the early 1970s [54, 55]. However, because of the generally low precision and poor availability of the devices, the rate of species discrimination was relatively low. Recently, new attempts to discriminate among microalgae on the basis of absorption or fluorescence spectra were reported [7, 10, 13, 15]. But again, in the published experiments only large algal groups with considerable differences in pigment composition could be successfully separated (e.g., cryptophytes, chlorophytes, cyanobacteria, etc.). Moreover, all the authors pointed out that discrimination among cyanobacterial species is quite complex and ambiguous. The correct discrimination of cyanobacterial species on the basis of a fluorescence signature is usually hampered by alterations in the pigment composition within one strain, which depend on the environmental conditions and the physiological state of the culture. These difficulties can be overcome by using single-cell fluorescence spectra instead of bulk ones and by recording 7–8 spectra with different excitation wavelengths for each cell instead of the usual one or two.
In the presented investigation, 307 sets of 8 single-cell fluorescent spectra for 23 cyanobacterial strains, belonging to 15 genera, were analyzed. An optimal set of classification parameters was considered that is sufficient for determining the generic membership of cyanobacterial cells by means of mathematical statistics. The results of this study show that LDA and ANN are able to recognize cyanobacteria up to species/strains according to the data recorded by means of CLSM. This implies that the classifier (LDA or ANN) is capable of defining a unique niche in a multiparameter space for each of 23 cyanobacterial strains, used in this investigation.
The results of LDA, evaluated over 63 parameters extracted from 307 sets of single-cell fluorescence spectra, are presented in Figure 7 as 3D plots in the space of canonical discriminant functions. The discrimination between species is clearly good. Moreover, closely related genera (e.g., Spirulina and Oscillatoria, Synechococcus and Chlorogloea, Microcystis, Synechocystis and Myxosarcina) appear close to each other. Genera such as Leptolyngbya, Geitlerinema, and Oscillatoria, which include several strains, form large groups. However, single strains can also be discriminated inside these groups, as demonstrated in the right panel, which shows an enlarged view of region 1. This is confirmed by the classification diagram plotted in Figure 7C. The classification accuracy in the presented example was near 97.4%. The high classification accuracy is due to the fact that LDA works with the distribution functions of the classification parameters and their statistical characteristics, which makes it possible to build a good classification model.
In the legend, all cyanobacterial strains used are named and numbered according to the CALU collection. Solid curves bound the regions occupied by the seven strains (Anabaena variabilis Kutz. sp. CALU 824, Geitlerinema sp. CALU 1315, Myxosarcina chroococcoides sp. CALU 601, Nostoc sp. CALU 1763, Spirulina platensis (Nordst.) sp. CALU 550, Synechococcus CALU 756, and Synechocystis aquatilis sp. CALU 1336) used for testing the ANN classifier (in the legend they are indicated in red).
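To illustrate the idea behind the discriminant functions, the sketch below implements the simplest case of LDA: a two-class Fisher discriminant on two features. It is not the multi-class analysis over 63 parameters reported here. The two features (hypothetical PE/Chl a and PC/Chl a ratios), the toy data, and all names are assumptions made only for this example.

```python
# Minimal sketch: two-class Fisher linear discriminant on two features.
# Features, data, and names are hypothetical; the study used multi-class LDA
# over 63 parameters, which is not reproduced here.


def mean_vector(rows):
    """Component-wise mean of a list of 2D points."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter_matrix(rows, mean):
    """2x2 within-class scatter matrix around the class mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s


def fit_fisher_2d(class0, class1):
    """Return (w, threshold) with w = Sw^-1 (m1 - m0) and a midpoint threshold."""
    m0, m1 = mean_vector(class0), mean_vector(class1)
    s0, s1 = scatter_matrix(class0, m0), scatter_matrix(class1, m1)
    sw = [[s0[a][b] + s1[a][b] for b in range(2)] for a in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [
        (sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
        (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det,
    ]
    midpoint = [(m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2]
    threshold = w[0] * midpoint[0] + w[1] * midpoint[1]
    return w, threshold


def predict(model, point):
    """Return 1 if the projected point lies on the class-1 side, else 0."""
    w, threshold = model
    return 1 if w[0] * point[0] + w[1] * point[1] > threshold else 0


# Toy data: (PE/Chl a, PC/Chl a) for a strain lacking PE and one containing PE.
strain_without_pe = [(0.05, 0.90), (0.10, 1.10), (0.08, 1.00), (0.12, 0.95)]
strain_with_pe = [(0.60, 0.70), (0.70, 0.90), (0.65, 0.80), (0.75, 0.75)]
model = fit_fisher_2d(strain_without_pe, strain_with_pe)
```

The projection direction weights each feature by how well it separates the class means relative to the spread within the classes. With more than two classes, the same criterion yields several canonical discriminant functions, such as the three axes used for the plots in Figure 7.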
In the considered classification problem, the quality of the ANN operation should be determined not only by the absolute value of the classification accuracy but also by the ability of the designed ANN to recognize and properly classify unknown species that did not participate in the training process. Thus, the performance of the ANN was tested first with the aim only of discriminating between 16 known cyanobacterial strains (Figure 8a). Another seven strains were set aside as test strains to verify the ability of the ANN to recognize new strains (the so-called generalization quality). Analysis of a test set with data from the same monocultures confirmed that the parameters extracted from the sets of fluorescence spectra contained enough information to correctly identify cyanobacterial cells at the species/strain level. The trained neural network presented here did not show the highest rate of correct classification (only about 95.7%), but it showed the best recognition quality for new strains. The results of the ANN recognition are presented in Figure 8b.
The bar charts in Figure 8a represent the results of the classification of 268 experimental measurements into 16 classes. Each bar represents the classification result as a probability distribution. Each color in a bar corresponds to one of the 16 target classes (known cyanobacterial strains). The proportion of each color in a bar shows the probability of belonging to the corresponding target class. The maximal eigenclass probability is indicated above each bar.
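One plausible reading of these bars is that each one averages the class probabilities returned by the classifier over all measured cells of a single strain. The sketch below shows that averaging; whether Figure 8a was built exactly this way is an assumption, and the function names and toy values are hypothetical.

```python
# Minimal sketch: average per-cell class probabilities into one distribution
# per strain, as might be drawn as a stacked bar. Names and data are
# hypothetical; the exact construction of Figure 8a is not stated in the text.


def mean_distribution(probability_rows):
    """Component-wise mean of per-cell probability vectors."""
    n_cells = len(probability_rows)
    n_classes = len(probability_rows[0])
    return [
        sum(row[j] for row in probability_rows) / n_cells
        for j in range(n_classes)
    ]


def bar_for_strain(probability_rows, own_class_index):
    """Return the averaged distribution and the probability of the strain's own class."""
    distribution = mean_distribution(probability_rows)
    return distribution, distribution[own_class_index]


# Toy example: two cells of one strain scored against two target classes.
toy_rows = [[1.0, 0.0], [0.5, 0.5]]
toy_bar, toy_own_probability = bar_for_strain(toy_rows, own_class_index=0)
```

Because every per-cell vector sums to one, the averaged distribution also sums to one, so the colored segments of a bar fill its full height.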
In contrast to standard classifiers, a classifier built on the basis of an ANN has a so-called generalization ability. This means that the ANN is able to recognize new cyanobacterial strains that were previously unknown to it and to suggest possible variants of their generic affiliation to known classes. Figure 8c shows the ANN classification results for the 16 target classes and the 7 strains that were not present in the training set. The aim of the ANN classifier was to determine to which of the 16 known classes each of the 7 unknown strains could be attributed. The results of the ANN classification correlate well with the results predicted by LDA (Figure 7). The closely related strains in this case were 1763–666, 601–398, 756–1409, 1315–1718, 550–1416, 824–1817, and 1336–398 (in each pair, the first strain is unknown to the ANN, and the second is one of the nearest target classes). The ANN classifier assigns the strains 1336 Synechocystis, 601 Myxosarcina, and 1315 Geitlerinema to the closely related genera Microcystis and Geitlerinema, respectively. For the remaining strains, it proposed possible classification options. Minor errors in the classification of strains 756 Synechococcus, 824 Anabaena, and 550 Spirulina, which the classifier assigns to the genera Synechococcus, Nostoc, and Oscillatoria, respectively, can be explained by the fact that in the space of classification parameters they lie in the wide free regions between the groups of known strains, at approximately equal distances from the two or three nearest ones (see Figure 7). Therefore the ANN cannot make a correct decision. The false result of the ANN classifier for 1763 Nostoc may be due to an incorrect initial dataset or to false a priori information about the affiliation of strain 1763.
To validate the correctness of the neural network operation, the results of the ANN classification were compared with the results of the LDA. The neural network-based classification agrees well with the expected results and with the results of LDA. The identification performance of the network for cyanobacterial strains of the same species is slightly lower than for cells of different species, but such strains can still be distinguished well.
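The way a trained network can "suggest possible variants" for an unknown strain can be illustrated by ranking the known classes by output probability. The sketch below assumes a softmax output layer, which the text does not specify; the scores, the three class labels chosen from the strain numbers mentioned above, and the function names are hypothetical.

```python
# Minimal sketch: convert raw network outputs for one cell into probabilities
# and list the most probable known classes. The softmax output layer, the
# scores, and the names are assumptions, not details given in the text.
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw output scores."""
    shift = max(scores)
    exponentials = [math.exp(s - shift) for s in scores]
    total = sum(exponentials)
    return [e / total for e in exponentials]


def nearest_known_classes(scores, class_labels, top_k=2):
    """Return the top_k (label, probability) pairs, most probable first."""
    probabilities = softmax(scores)
    ranked = sorted(
        zip(class_labels, probabilities), key=lambda pair: pair[1], reverse=True
    )
    return ranked[:top_k]


# Toy example: outputs for a cell of a strain absent from the training set.
known_labels = ["CALU 398", "CALU 666", "CALU 1718"]
raw_scores = [2.0, 0.5, 1.8]
candidates = nearest_known_classes(raw_scores, known_labels)
```

When the two leading probabilities are close, as in this toy case, the cell lies between known groups in parameter space, which mirrors the situation described for strains located at similar distances from two or three known strains.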
The automation of cyanobacterial species differentiation is a key problem in both industrial biomass production and environmental monitoring. Unfortunately, for various reasons, none of the presently utilized methods can be implemented in online monitoring procedures. In this work, an example of the use of LDA and ANN technologies for online differentiation of cyanobacterial strains according to their in vivo single-cell fluorescence spectra is presented. The novel discrimination technique demonstrated here includes a strict procedure for recording and processing single-cell fluorescence emission spectra, which eliminates most of the usual data processing difficulties and, as a result, achieves quite a high classification accuracy. Because the initial information is obtained via fluorescence spectroscopy, the experimental data can be processed automatically. Moreover, due to the use of CLSM microscopic spectroscopy instead of conventional fluorimetry, the initial data have less variation and can be accurately sorted. Any objectionable and unpredictable impact is eliminated at the first step of obtaining the fluorescence spectra. Since a noninvasive and nondestructive method is used, information about vital cell operation (e.g., light harvesting) can additionally be taken into account to obtain the desired precision of discrimination.
The universality of the considered technique makes it possible to use it for the investigation of any phytoplankton species irrespective of their habitat or cultivation. Utilizing data from several fluorescence spectra instead of one yields more fingerprint information, which leads to taxonomic differentiation on a finer scale. The differentiation procedure presented here was carried out by means of statistical analysis on the basis of mathematical characteristics of the intrinsic fluorescence spectra of living single cells; therefore it is free from the usual subjectivity that can occur when methods of direct optical microscopy are used. Moreover, the formalization of data processing offers a wide opportunity for automating the classification of cyanobacterial strains in field samples during online monitoring of water bodies.
Undoubtedly, the data set should be expanded to include more species and phytoplankton classes/divisions, grown under different nutrient and light conditions. However, this work already demonstrates the potential of the discrimination of phytoplankton classes by means of fluorescence microscopic spectroscopy. Combining the knowledge of phytoplankton structure along with taxon-specific measurements of photosynthetic activity and biochemical cell composition can lead to new models which increase the reliability of online monitoring.