Open access peer-reviewed conference paper

Technology, Science and Culture: A Global Vision, Volume III

Written By

Luis Ricardo Hernández and Martín Alejandro Serrano Meneses

Reviewed: August 18th, 2021. Published: May 4th, 2022

DOI: 10.5772/intechopen.99973

Universidad de las Américas Puebla

Technology, Science and Culture: A Global Vision, Volume III

2020

Editors

Luis Ricardo Hernández

Martín Alejandro Serrano Meneses

Knowledge area co-editors

Aura Matilde Jiménez Garduño

Nelly Ramírez Corona

José Luis Sánchez Salas

Enrique Ajuria Ibarra

Roberto Rosas Romero

We continue this series by discussing research topics in the fields of Food Science, Intelligent Systems, Molecular Biomedicine, Water Science, and Creation and Theories of Culture. Our aims are to discuss the newest topics, theories, and research methods in each of these fields, to promote debate among top researchers and graduate students, and to generate collaborative work among them.

The interactions between recognized specialists in each field and graduate students, through different meetings, generated very interesting discussions, which are presented in this book. Thus, Dr. Luis A. Pardo, from the Molecular Biology of Neuronal Signals Group at the Max Planck Institute for Experimental Medicine, contributes the article titled “Targeting the voltage-gated potassium channel Kv10.1 for cancer therapy”. Dr. Marco Carli, Associate Professor of the Department of Engineering at the Università degli Studi 'Roma TRE', Roma, Italy, explored, along with his co-author Federica Battisti, the quality of experience for immersive media with the work “QoE and immersive media: a new challenge”. Dr. Sandra Harding, Distinguished Research and Emeritus Professor of New York University, wrote the article “Strong objectivity for new social movements”. Dr. Vijay P. Singh, Distinguished Professor, Regent Professor, and Caroline and William N. Lehrer Distinguished Chair in Water Engineering at Texas A&M University, contributes the article “Challenges in flood management”. Dr. R. Paul Singh, Distinguished Professor of Food Engineering in the Department of Biological and Agricultural Engineering at the University of California, wrote the article “A quest for sustainability in the food enterprise”. Finally, graduate students of the Universidad de las Américas Puebla present their key findings in a series of articles.

We believe that interactions between students and high-level researchers from different areas contribute to the creation of multidisciplinary points of view and thereby to the advancement of science.

The number and impact of water-related natural disasters have increased since the middle of the last century. As a result of increased climate variability and the effects of global warming, hydrometeorological hazards have grown and spread, while the resilience of societies is, in many cases, inadequate. Consequently, the risk has increased. Floods and droughts, particularly in a changing climate, require greater understanding in order to generate better forecasts and proper management of these phenomena. Mexico, like other countries in the world, and of course in the Latin America and Caribbean region, suffers from both weather extremes.

The UNESCO Chair on Hydrometeorological Risks, hosted at the Universidad de las Américas Puebla, is devoted to the analysis, measurement, modelling, and management of extreme hydrometeorological events in the context of a more urbanized world, climate change, and increasingly vulnerable regions. It focuses on the development of basic and applied research for the design of adaptation and mitigation measures, and on dissemination and the preparation of decision makers as well as the general public. In its activities it keeps a gender focus, directed in particular at reducing the vulnerability of women to hydrometeorological disasters.

The Chair acts in the following fields:

  1. Hydrometeorological risks and climate change.

  2. Modelling and forecasting of hydrometeorological risks.

  3. Integrated management of hydrometeorological risks.

  4. Gender and hydrometeorological risks.

A detailed description of the UNESCO Chair on Hydrometeorological Risks, its members, and its publications can be found on its website: https://www.udlap.mx/catedraunesco/

The Chair publishes a quarterly newsletter, in Spanish and English, which can be consulted at https://www.udlap.mx/catedraunesco/newsletters.aspx

Contents

Targeting the Voltage-Gated Potassium Channel Kv10.1 for Cancer Therapy

Luis A. Pardo

QoE and Immersive Media: A New Challenge

Federica Battisti and Marco Carli

Strong Objectivity for New Social Movements

Sandra Harding

Challenges in Flood Management

Vijay P. Singh

A Quest for Sustainability in the Food Enterprise

R. Paul Singh

Evaluation of the Cytotoxic Activity of a Species of the Buddleja Genus in a Prostate Cancer Cell Line

Sofía Isabel Cuevas Cianca, Luis Ricardo Hernández and Irene Vergara Bahena

Designing Magnetic Mesoporous Nanoparticles for Cancer Therapy

Jessica Andrea Flood-Garibay, Kenneth J. Balkus Jr and Miguel Ángel Méndez-Rojas

Exoplanet Research Using Machine Learning and Multiresolution Analysis Techniques

Miguel Jara-Maldonado, Vicente Alarcon-Aquino and Roberto Rosas-Romero

Network Intrusion Detection Using Dendritic Cells and Danger Theory

David Limon-Cantu and Vicente Alarcon-Aquino

Automatic Terrain Perception in Off-Road Environments

Ethery Ramírez-Robles and Oleg Starostenko

Analysis of Voice and Magnetic Resonance Images to Assist Diagnosis of Parkinson’s Disease with Machine Learning

Gabriel Solana-Lavalle and Roberto Rosas-Romero

A Systematic Review of Sensitivity Analysis of Activated Sludge Modeling

Rafael Andrés Borobio-Castillo, José Manuel Cabrera-Miranda, Alberto Vargas-Hidalgo and Benito Corona-Vásquez

Microbial Photobioelectrochemical Systems: A Scoping Review

Luis Erick Coy-Aceves, José Luis Sánchez-Salas, Mónica Cerro-López, Miguel Ángel Méndez-Rojas and Benito Corona-Vázquez

Methods for Persistent Organic Pollutants Removal in Wastewater: A Review

Valérie Pihen and Jose Luis Sanchez-Salas

A Critical Review on Algal-Bacterial Granular Sludge Process as Potential Economical Alternative to AOPs for Textile Wastewater Treatment

Celina Sanchez-Sanchez, Guillermo Baquerizo and Ernestina Moreno-Rodríguez

Isolation and Identification of Molds in Selected Dried Fruits and Seeds Sold in Bulk in México

David González-Albarrán, Aurelio López-Malo and Enrique Palou

Targeting the Voltage-Gated Potassium Channel Kv10.1 for Cancer Therapy

Luis A. Pardo

Abstract

Survival and quality of life of cancer patients have improved in the last decades. However, some forms of cancer still escape current treatment options and continue to have an ominous prognosis. A plausible strategy to change this situation is to identify unexploited pathways in the cancer cell that open genuinely new therapeutic avenues. Ion channels are among such targets, since they participate in all steps of the cancer process, from initiation through growth and metastasis to drug resistance. In some cases, ion channels can thus serve as therapeutic targets. Kv10.1 is particularly well suited for this purpose because, outside of the brain, the channel appears almost exclusively in cancer cells. Recent research showed that, besides its function as a canonical ion channel, Kv10.1 is required by dividing cells to complete division. To this end, healthy cells express the channel only during a short period of the cell division cycle. Cancer cells, rather than increasing the channel’s expression, maintain relatively constant levels throughout their lives, which confers a selective advantage and favors tumor progression. The mechanisms leading to abnormal expression and its consequences, and how we can take advantage of this knowledge to improve current cancer treatments, will be discussed.

Keywords: Kv10.1, ion channels, cell cycle, cancer target

1. Introduction

The aim of our research is the design of therapeutic approaches that use the voltage-gated channel Kv10.1 as a target. Kv10.1 is a voltage-gated potassium channel that was discovered in the 1960s and has been the main focus of our electrophysiological and molecular biology research for many years. During our early experiments we discovered that, contrary to the typical view of a voltage-gated ion channel, Kv10.1 plays a key role in the development of tumors, and even though there is still much to understand about how Kv10.1 helps tumor cells survive, we have been able to unravel many of its mechanisms. An overview of those discoveries is presented in this work.

2. The importance of ion channels in oncology

Ion channels represent the second largest protein family in the human genome after GPCRs. These proteins allow the flux of ions through the plasma membrane. Kv10.1 (also known as Eag1) is dependent on membrane voltage: after a depolarization, it allows the efflux of potassium ions into the extracellular space within milliseconds. The consequent change in membrane voltage then functions as a cellular signal. If we analyze the protein structure of Kv10.1, we identify a transmembrane region and a large intracellular domain that represents almost 50% of the whole protein [1]. Such a long cytoplasmic domain reveals the importance of the channel not only as a potassium gate but also as an interaction partner of signaling molecules. Canonical functions of voltage-gated potassium channels encompass action potential repolarization, control of resting potential and excitability, and volume control. However, more than 70 different genes encode voltage-gated potassium channels, and they are expressed in excitable as well as non-excitable cells. This fact suggests that voltage-gated ion channels are involved in processes beyond shaping the action potential, as has been demonstrated in recent decades.

3. Expression profile of Kv10.1 in healthy tissues and cancer

Kv10.1 has very distinctive electrophysiological features that allow us to identify it in different cells. Its activation depends on the membrane potential before the stimulus [2]: when we evaluate it in the whole-cell configuration of the patch-clamp technique, we observe that the speed of activation increases as the pre-stimulus membrane potential becomes less negative. This feature endows Kv10.1 with the ability to “remember” the previous electrical status of the membrane and regulate its gating accordingly. This phenomenon, called the Cole-Moore shift, can also be analyzed in single-channel experiments, where we observe the same response: channels open with a delay when the pre-depolarization potential is -120 mV, in contrast to the immediate opening observed when the pre-depolarization potential is -50 mV. When the function of Kv10.1 was being unraveled some years ago, a major participation during the neuronal action potential was ruled out because of its slow gating. Therefore, experiments focused on its role in synaptic membranes. Our studies on Kv10.1 knockout (KO) mice demonstrated that the channel plays a role in postsynaptic potentiation [3]. When mouse cerebellar Purkinje cells were recorded after electrical stimulation of the granule cell layer neurons (which communicate with Purkinje neurons through parallel fibers), the response in KO mice was unaffected by single or low-frequency stimuli. When a train of impulses is applied, the response increases progressively with successive stimuli, but in the wild type only up to a certain point, then becoming constant even if further impulses arrive. In KO mice, by contrast, the response of Purkinje cells did not become controlled and continued to increase during stimulation. This effect was associated only with mild behavioral alterations of mice under stress; therefore, the channel seems to play roles that can be compensated by other channels under less demanding conditions.

In any case, during the studies on Kv10.1 we found that its expression is almost exclusively confined to the central nervous system, although our first molecular and functional studies had been performed on cancer cells.

4. A selective advantage for cancer cells

We therefore looked for the expression of Kv10.1 in a wide variety of human cell lines and cancer samples and found that it was expressed in 72% of all tumor samples, whereas the healthy tissues from which the tumors originate did not express it [4]. This means that we were facing a tailor-made cancer target: a protein absent from healthy tissues outside the central nervous system but expressed in the vast majority of tumors. In addition, our studies demonstrated that tumors expressing Kv10.1 show worse clinical behavior than tumors negative for the channel. In acute myeloid leukemia, mortality was higher for Kv10.1-positive leukemias [5]. Other authors have also reported its potential as a marker of poor prognosis in ovarian, gastric, colon, esophageal, and cervical tumors [6].

Moreover, we know that imipramine can specifically block the function of Kv10.1. When the outcome of patients with brain metastatic tumors taking imipramine or other tricyclic antidepressants was compared to that of patients with similar tumors taking an antidepressant that does not block Kv10.1, we observed that survival was higher in patients receiving the Kv10.1-blocking treatment [7]. This result suggests that we could delay tumor growth by blocking Kv10.1, and it is evidence of the biological advantage that cells acquire when its expression begins.

If we look at the phylogeny of Kv10.1, we can identify the whole EAG family in species such as Trichoplax adhaerens, which arose long before the appearance of neurons. Therefore, there must be an ancestral function of Kv10.1 that does not involve neuronal activity and excitability [8].

One of the most ancient processes of life is the regulation of cell division. All cells either divide at least once or descend from the division of another cell; cell division is thus a truly universal process. To divide, a cell must pass through a series of phases that have been well characterized. The S phase is characterized by the duplication of the DNA content. The M phase is mitosis, when cell division occurs. In between those phases we find two gap phases, G1 and G2. G1 is a growth phase in which cells prepare to divide, and G2 is a checkpoint after the S phase to screen for errors during DNA duplication and, if none are found, proceed to mitosis. The role of membrane potential in this process has been known for at least fifty years. Clarence Cone showed that the membrane potential of a cell needs to oscillate during cycles of replication [9]. If the dynamics of the membrane potential are blocked, cell division stops. It is generally accepted that at the end of G1 a hyperpolarization occurs and then, from the S phase to the M phase, a depolarization takes place. Those changes are completely dependent on ion channels [10]. Bijlenga et al. had already demonstrated that myoblasts express Kv10.1 prior to fusion, which is a cell cycle-dependent process; however, insights into its precise role during the cell cycle were still lacking [11]. Our group evaluated synchronized cancer cells and could demonstrate that the expression of Kv10.1 changes during the cell cycle and is maximal during the G2 phase, which can be identified by the enrichment of other G2 protein markers [12].

5. Mechanisms of action

This elemental role of Kv10.1 led us to pose the following question: Is Kv10.1 expressed only in some cells all the time, or in all cells but only for some time? If we assume that only a very small fraction of cells is in G2 at any given time, it is possible that we simply missed expression because it occurs for very short periods. So, when analyzing in more detail healthy tissues in their replicative zones, such as the bottom of colon crypts, which contain stem cells, or the testis, which contains G2-arrested cells, we were able to demonstrate that Kv10.1 was expressed in those healthy cells together with G2 markers such as Cyclin B [4, 12]. Therefore, tissues do express Kv10.1 for short periods during replication. Regarding its expression during G2, our group showed that when cells lose expression of Kv10.1 through RNA interference, the G2 phase lasts longer than in controls. This means that Kv10.1 somehow accelerates the G2 phase and, therefore, replication. But how exactly does Kv10.1 speed up cell division? One of the most important processes during cell division is cytoskeleton rearrangement, specifically microtubule organization. When Kv10.1 is eliminated from cells, the dynamics of microtubule rearrangement are accelerated, with longer growth periods [13]. These changes correlated with changes in calcium oscillations when analyzed with fluorescent calcium sensors: cells without Kv10.1 show higher calcium oscillation frequencies. Calcium enters the cell in a voltage-dependent manner, so it makes sense that Kv10.1, which hyperpolarizes the cell, stabilizes the entry of calcium, making the oscillations less frequent [13].

Other groups have demonstrated that Kv10.1 functionally interacts with Orai1, a calcium channel. We therefore looked for physical proximity between Kv10.1 and Orai1 and, using proximity ligation assays, found a higher degree of interaction in tumor cells. This would mean that Kv10.1 controls calcium entry by regulating Orai1 and thereby improves microtubule dynamics during cell division [13, 14].

Even if many of the mechanisms by which Kv10.1 promotes cell division remain to be explained, we are certain that blocking its conductive function significantly impacts tumor growth; therefore, drugs and strategies to block Kv10.1 in animal models are also a priority of our lab.

6. Therapeutic approaches

In mouse models in which MDA-MB435S (melanoma) cells are implanted, the cells form tumors that can be easily studied. When we compared the effect of Astemizole (a non-specific Kv10.1 blocker with antihistaminic properties) vs. Cyclophosphamide, a known chemotherapeutic, we observed that both drugs can diminish tumor size after 40 days of implantation [15]. Moreover, if we analyze the effects of a non-blocking Kv10.1 antibody compared to a blocking Kv10.1 antibody and to cyclophosphamide, we observe that the blocking Kv10.1 antibody can again reduce tumor growth at the same rate as cyclophosphamide in some models. If we implant patient-derived cancer cells in those mice and test the antibodies against Kv10.1, we observe a less potent effect compared to cyclophosphamide [16].

Nowadays, chemotherapeutic treatment schemes use the combined effect of synergistic drugs. One recent observation in Kv10.1 knock-down cells was a change in mitochondrial structure, generating a more fragmented pattern. Mitochondria are essential organelles for cancer cells because of the high metabolic rate they sustain. Mitochondrial fragmentation sensitizes cells to the additional use of antimetabolic drugs. We could demonstrate that blockade of Kv10.1 increases sensitivity to antimetabolic drugs in proportion to basal Kv10.1 expression, demonstrating that this effect is Kv10.1 expression-dependent [17].

Another approach currently under study by our group is the use of a Kv10.1-targeting antibody attached to a more potent cytotoxic molecule such as TRAIL (TNF-related apoptosis-inducing ligand), which can induce apoptosis specifically in cancer cells [18]. We now have an improved design of such a molecule, using a single-domain antibody (nanobody) against Kv10.1 bound to a single-chain TRAIL, which can induce apoptosis in the central region of tumor spheroids in only 24 h at a dose of 3 ng/ml.

In conclusion, we believe that Kv10.1 represents one of the best oncological targets known to date, owing to its highly restricted expression in healthy tissues. We therefore hope that in the near future the best anti-cancer strategies can be developed by taking advantage of Kv10.1 expression.

Author details

Luis A. Pardo

Oncophysiology Group, Max Planck Institute of Experimental Medicine, Göttingen, Germany

*Address all correspondence to: pardo@em.mpg.de

QoE and Immersive Media: A New Challenge

Federica Battisti and Marco Carli

Abstract

New real-world capture and rendering systems are flooding the market. Mobile phones are now equipped with more than one camera, thus creating multi-view portable systems. Virtual reality rendering equipment is now within the reach of the consumer, and many applications are available to the user. A big effort is being made by industrial and research bodies to spread these new technologies. In this contribution, an overview of the main issues related to the quality evaluation of immersive media is presented.

Keywords: virtual reality, immersive media, quality of experience, multiple views, computer-generated data

1. Introduction

Recent years have witnessed an overwhelming rise in multimedia technologies, whose impact on the consumer is very high. The terms immersivity, virtual reality, augmented reality, and 3D content have now become familiar even to non-professionals. Driven by the entertainment sector and, more generally, by multimedia interaction, many novel services have been proposed. Immersive media can be defined as technologies that attempt to produce or imitate the physical world by exploiting computer-generated data. This status is achieved by techniques, both aural and visual, able to completely engage the user [1]. As stated by Dale Lovell in [2], “engagement is great, but immersion is the future. Immersion is when you forget the message entirely, forget you are the audience even, and instead fall into a newly manufactured reality”.

One of the first approaches toward providing the user with a feeling of immersion was the Sensorama system in 1957. It was a mechanical device that included a stereo color display, fans to generate the sensation of wind, odor emitters, a stereo sound system, and a chair mounted on a moving platform. The experience shown to users consisted of a motorcycle tour through the streets of New York. The user, sitting on the chair, was able to relive the riding experience through sounds, chair movements, and pre-recorded images. The smell of the city (gasoline vapors and snack-bar pizza) was recreated with chemicals. According to the situation surrounding the user, different effects were rendered (e.g., when the rider approached a bus, the typical bus noise and gasoline smell were sent to the user). However, the user interaction was quite limited.

Nowadays different devices are available for acquiring, processing, and rendering information in the best interactive way. They are the basic elements of immersive media, such as virtual reality, augmented reality, and mixed reality.

Virtual Reality replaces the user’s physical environment (including surrounding sound) with a computer-generated, interactive, 3D environment in which a person is immersed. One of the identifying marks of a virtual reality system is the use of head-mounted displays worn by users. These displays block out all the external world and present to the wearer a view that is under the complete control of the computer. This allows a scene to be seen in any direction from one viewpoint. When using a head-mounted display to watch such content, the viewing direction can be changed by head movements. A less immersive effect can also be obtained by rendering virtual reality content with different devices. On smartphones and tablets, the viewing direction can be changed by touch interaction or by moving the device around, thanks to built-in sensors. On a desktop computer, the mouse or keyboard can be used for interacting with omnidirectional video.

Augmented Reality combines the real world with computer-generated data. Most AR research is currently concerned with the use of video imagery that is digitally processed and augmented with computer-generated graphics. The goal is to enhance, rather than recreate, the real scenario. A commonly used example of augmented reality is the Snapchat photo filtering tool.

Mixed Reality fuses information collected from the real world with ad-hoc created digital content. In this case, the user may interact seamlessly with both. The user is generally equipped with a semi-transparent head-mounted display or with smart glasses. In mixed reality, the user must still be aware that he or she is present in the “real world.” There are three components needed to make such a system work: 1) the see-through rendering system, 2) the tracking system, and 3) mobile computing power. All these components are fundamental, and their performance highly affects the perceived Quality of Experience (QoE).

Many sectors will benefit from these technologies. In the following, a few examples are reported.

  • Automotive industry: virtual reality technology makes it possible to design a vehicle or its constituent parts in a simple and inexpensive way before proceeding with the construction of expensive prototypes. At the same time, virtual reality and augmented reality may improve maintenance services by showing how the situation should be and giving indications on the spot.

  • Tourism: in this case, the tourist can have the feeling of a trip without traveling. Immersive media can be used by travel agencies to go beyond the classical booklet of images by showing the customer virtual guided tours around the world, which can improve final user satisfaction, or the QoE.

  • Healthcare: the use of these technologies is already in place for both training and patient care. Students and healthcare professionals can train in a low-risk 3D environment before working on real scenarios.

To achieve these immersive goals, sophisticated media acquisition devices, new rendering systems, and compression techniques have been designed and, consequently, new application areas have emerged. Examples include 360° cameras, light field cameras, multiview camera setups, virtual reality equipment (audio and video), AR equipment, and tactile tools.

Especially when human subjects are involved, the impact of a new technology on the perceived experience is a fundamental issue. If the human-in-the-loop factor is not properly addressed, the novel technology may not be successful. The negative trend of stereo content, especially in the home environment, is probably due to the fact that actual 3D content production, delivery, and presentation are not compliant with 3D QoE requirements. The success of the immersive imaging market relies on the ability of 3D systems to provide added value compared to conventional monoscopic imaging (e.g., depth feeling or parallax motion), coupled with high-quality image content. If these issues are not properly dealt with, perceivable impairments may appear in the 3D content, originating at different points of the 3D chain, from content creation to display techniques. Many artifacts are common to 2D imaging systems. However, novel distortions typical of the 3D structure (e.g., crosstalk or keystone) should also be considered, especially because their presence highly impacts the perceived quality (e.g., compression artifacts due to coding). Subjects tend to prefer 2D content over 3D as soon as fatigue and discomfort are induced during content presentation. Understanding the quality of experience is therefore mandatory. However, this task is quite challenging. In the following, a brief overview of quality and the related issues is reported.

2. Quality of immersive media

The word quality is widely used in the most diverse fields. However, agreement on the idea of quality is very hard to reach, and it depends on several aspects: the application, the historical period, or even the background of each person. The concept of quality is something everybody understands but can hardly define.

Going back to ancient times, Aristotle classified every object of human apprehension into 10 categories: Substance, Quantity, Quality, Relation, Place, Time, Position, State, Action, Affection. Qualities are, hylomorphically, formal attributes, such as “white” or “grammatical”. Remaining in antiquity, quality in ancient Egypt was “a sign of perfection.”

Nowadays, scientists have tried to better define this concept to be able to measure it. Among others, relevant ones are:

  • General: Measure of excellence or state of being free from defects, deficiencies, and significant variations.

  • ISO 8402-1986 standard defines quality as “the totality of features and characteristics of a product or service that bear on its ability to satisfy stated or implied needs”.

  • Google: the standard of something as measured against other things of a similar kind; the degree of excellence of something.

  • Manufacturing: strict and consistent adherence to measurable and verifiable standards to achieve uniformity of output that satisfies specific customer or user requirements.

  • ISO 9000: a family of standards for quality management systems.

To summarize, quality is a relative concept: it can rather be expressed as a degree of quality. We can agree with this statement: “the quality of something can be determined by comparing a set of inherent characteristics with a set of requirements. We will have high quality if the characteristics meet the requirements, and low quality if the characteristics do not meet all requirements.” Nowadays, research is devoted to the evaluation of QoE, that is, “the degree of delight or annoyance of the user of an application or service. It results from the fulfillment of his or her expectations with respect to the utility and/or enjoyment of the application or service in the light of the user’s personality and current state” [3].

3. Measuring quality

Quality evaluation of digital content is critical in all applications of information delivery. This is particularly true in the case of digital images and videos. Each stage of processing, storage, compression, and enhancement may introduce perceivable distortions. For example, in image and video compression, the use of lossy schemes to reduce the amount of data may introduce artifacts such as blurring and ringing, which lead to quality degradation. Similarly, during the transmission phase, due to the limited available bandwidth and to channel noise, data might be lost or modified, thus resulting in quality degradation of the received content.

The visibility and annoyance of these impairments are directly related to the quality of the received/processed data. The possibility of measuring the overall perceived quality in order to maintain, control, or enhance the quality of the digital data is therefore fundamental. During the last two decades, many efforts have been devoted by the scientific community to the design of quality metrics. The choice of an adequate metric usually depends on the requirements of the considered application.

There are two main approaches to assessing media quality: subjective and objective. The first is carried out by human observers, while the second consists of the definition of models that predict subjective evaluations.

3.1 Objective metrics

In objective measurements of the performance of an imaging system, image quality and quality losses are determined by evaluating some parameters based on a given general mathematical, physical, or psychological model. That is, the goal is to obtain a measurable and verifiable aspect of a thing or phenomenon, expressed in numbers or quantities, such as lightness or heaviness, thickness or thinness, softness or hardness.

Objective quality metrics can be classified according to the amount of side information required to compute a given quality measurement. Using this criterion, three generic classes of objective metrics can be defined: Full Reference (FR), when both the original and the impaired data are available; Reduced Reference (RR), when some side information regarding the original media can be used; and No Reference (NR), when only the impaired image is available.

To make an objective assessment, one can use measuring devices to obtain numerical values; another approach is to use image or video quality metrics. These metrics are usually developed to take the human visual system into account and to better match subjective assessments.

FR quality metrics belong to the first class. Among the most widely adopted FR objective metrics are the Mean Squared Error (MSE) and the Peak Signal-to-Noise Ratio (PSNR). Both are pixel-wise measures of the difference between the original and the impaired media. In particular, the PSNR is a measure of the peak error between the compressed image and the original image. PSNR is given as PSNR = 20·log10(MAX(I)/√MSE), where MAX(I) represents the maximum possible value of the media. The higher the PSNR, the better the quality of the reproduction. PSNR has usually been used to measure the quality of a compressed or distorted image. It is also applied, frame by frame, to video as first information about video degradation. Other metrics are SSIM [4], MS-SSIM [5], VIF [6], MAD [7], FSIM [8], etc.
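
As an illustration of these pixel-wise FR metrics, the short Python sketch below computes MSE and PSNR for a pair of 8-bit grayscale images following the formula above; it is an illustrative example, not code from the chapter, and the synthetic test images and noise level are assumptions chosen only for the demonstration.

```python
# Minimal sketch (not from the chapter): MSE and PSNR for 8-bit grayscale images.
import numpy as np

def mse(reference: np.ndarray, distorted: np.ndarray) -> float:
    """Pixel-wise mean squared error between two images of equal shape."""
    diff = reference.astype(np.float64) - distorted.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(reference: np.ndarray, distorted: np.ndarray, max_value: float = 255.0) -> float:
    """PSNR in dB: 20*log10(MAX(I) / sqrt(MSE)); infinite for identical images."""
    err = mse(reference, distorted)
    if err == 0:
        return float("inf")
    return 20.0 * float(np.log10(max_value / np.sqrt(err)))

# Hypothetical example: an original image vs. a copy with additive Gaussian noise.
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)
noisy = np.clip(original + rng.normal(0.0, 10.0, original.shape), 0, 255).astype(np.uint8)
print(f"MSE:  {mse(original, noisy):.2f}")
print(f"PSNR: {psnr(original, noisy):.2f} dB")
```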

Objective metrics have low computational cost and clear physical meaning, and they are mathematically easy to handle for optimization purposes. However, they have been widely criticized for not correlating well with the perceived quality.

Figure 1 shows an original image and versions of it deteriorated by additive Gaussian noise of increasing intensity. Figure 2 shows the same original image and three versions of it in which different distortions have been introduced. As can be noticed, in the first case the value of the objective metric agrees with the perceptual judgment. In the second case, the objective metric returns the same score, thus indicating an equal level of distortion; from a perceptual point of view, however, the images are perceived as being of different quality.

Figure 1.

Additive Gaussian noise of increasing variance.

Figure 2.

Different distortions on the same image. The objective score is 20.42 dB.

To overcome such problems, HVS-inspired objective quality metrics have been introduced, e.g., PSNR-HVS and PSNR-HVS-M. The main difference between these metrics and the purely mathematical ones (MSE, PSNR) is that they are more heuristic, which makes a mathematical comparison of their performances more difficult. Thus, to adequately evaluate the quality of such metrics, statistical experiments are needed [9, 10].

3.2 Subjective metrics

In subjective tests, the quality of digital content is assessed by performing psychological tests with human subjects. In this case, the goal is to find attributes, characteristics, or properties that can be observed and interpreted, and maybe approximated (quantified), but cannot be directly measured, such as beauty, feel, flavor, or taste. The quality score is generated by averaging the results of a set of standardized subjective tests, and it can be considered an indicator of the perceived media quality. A pool of subjects evaluates a set of images (or videos), rating the perceived quality according to a specific scale [11]. Table 1 reports the most widely used rating scale, in which a score of 1 should be given to media perceived as ‘bad’ since they are affected by a ‘very annoying’ artifact. Similarly, a score of 5 should be given to media showing excellent quality, in which no impairments are perceivable.

MOS  Quality    Impairment
5    Excellent  Imperceptible
4    Good       Perceptible but not annoying
3    Fair       Slightly annoying
2    Poor       Annoying
1    Bad        Very annoying

Table 1.

Mean opinion score assessment table.
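
To make the averaging step concrete, the following sketch computes the Mean Opinion Score for one stimulus together with a Student's t confidence interval, a common way to report MOS reliability; the 15 ratings and the panel size are hypothetical values invented for this example, not data from any of the tests described here.

```python
# Minimal sketch (assumption: raw ratings on the 1-5 scale of Table 1):
# Mean Opinion Score and a 95% confidence interval for a single stimulus.
import numpy as np
from scipy import stats

def mos_with_ci(ratings, confidence: float = 0.95):
    """Return (MOS, half-width of the confidence interval) for one stimulus."""
    scores = np.asarray(ratings, dtype=float)
    n = scores.size
    mos = scores.mean()
    # Student's t interval on the mean of the subjective scores.
    half_width = stats.t.ppf((1 + confidence) / 2, df=n - 1) * scores.std(ddof=1) / np.sqrt(n)
    return mos, half_width

# Hypothetical example: 15 observers rating one processed video.
ratings = [4, 5, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 3, 4, 4]
mos, ci = mos_with_ci(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f} (95% CI)")
```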

Contrary to what it may seem, the subjective evaluation methodology is complex and time-consuming since, to be reliable, it must be properly designed and a large number of subjects is needed.

In more detail, the subjective test depends on the test environment (i.e., type of monitors/speakers and other test equipment, lighting/acoustic conditions, laboratory architecture, background, …), the test material (i.e., meaningful content for the envisaged scenario/application; best, typical, and worst cases, …), the test methodology (i.e., viewing distance/hearing position, subject selection, instruction phase, opinion or judgment collection, training, presentation, grading scale), and the analysis carried out on the data.

3.3 Test material

To verify the performance of an objective metric, as well as to collect subjective scores, a large database of distorted test images is usually prepared, and the Mean Opinion Score (MOS) from many human observers is collected. Then, the subjective results are compared with the objective scores of the tested metrics to identify the metric that shows the highest correlation with the subjective scores. However, some drawbacks have to be considered: usually, the size of the test database is not big enough, the number of different distortions is limited [12, 13], and methodological errors in planning and executing the experiments can occur. Since in most applications humans are the ultimate receivers of digital data, the most accurate way to determine its quality is to measure it directly through psychophysical experiments with human subjects. One of the most intensive studies in this field has been carried out by the Video Quality Experts Group (VQEG). In the image quality framework, many datasets have been created, such as LIVE [4, 14] and TID2013 [15]. Relevant efforts have also been devoted to the design and testing of video quality datasets; in this direction, among others, the LIVE Video Quality Assessment Database [16] and the EPFL-Polimi [17] video databases have been extensively adopted.
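
A minimal sketch of that comparison step is given below: it correlates objective metric outputs with MOS values using the Pearson (PLCC) and Spearman (SROCC) coefficients commonly reported in metric benchmarks. All numbers are hypothetical placeholders, not results from the datasets cited above.

```python
# Hedged sketch: comparing objective metric scores against collected MOS values.
# The 'mos' array and the per-metric score arrays are hypothetical placeholders.
import numpy as np
from scipy.stats import pearsonr, spearmanr

mos = np.array([4.2, 3.8, 2.1, 1.5, 3.0, 4.5])          # subjective scores
metric_scores = {
    "PSNR": np.array([38.0, 35.5, 27.2, 24.0, 31.0, 40.1]),
    "SSIM": np.array([0.97, 0.94, 0.80, 0.71, 0.88, 0.98]),
}

for name, scores in metric_scores.items():
    plcc, _ = pearsonr(scores, mos)     # linear correlation (prediction accuracy)
    srocc, _ = spearmanr(scores, mos)   # rank correlation (prediction monotonicity)
    print(f"{name}: PLCC = {plcc:.3f}, SROCC = {srocc:.3f}")
```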

3.4 Concluding remarks

As can be deduced from the brief and non-exhaustive analysis made in the previous paragraphs, the goal of obtaining a general-purpose objective metric is far from being achieved.

There are many difficulties, such as the availability of well-designed test datasets or the need for extensive subjective tests to collect subjective opinions. In the framework of virtual, augmented, and mixed reality, quality evaluation is even more complex. In fact, up to now, no standardized guidelines for running subjective tests have been defined. Even the use of ACR (Absolute Category Rating), ACR-HR (Absolute Category Rating with Hidden Reference), DSIS (Double-Stimulus Impairment Scale), or DSCQS (Double-Stimulus Continuous Quality Scale) for quality assessment is not well defined, since it mainly depends on the target application and the rendering device. The situation is no different for the virtual reality case.

In the next paragraphs, we give a quick introduction to one of the systems currently most widely used to acquire a scene from multiple points of view.

4. Light field

The Light Field (LF) expresses the radiance as a function of position and direction in regions of free space [18, 19]. In other words, it represents the set of light rays within a specific area. Capturing all the light rays in a scene allows generating a perspective view from any position. Therefore, LF technology can be effectively used in many applications: from accurate passive depth estimation to change of viewpoint or view synthesis, which can be useful in augmented reality content capture or movie postproduction.

Capturing an LF is a quite complex procedure from the technological point of view; in fact, the light field represents rays with varying positions and angles and, to obtain this information, it is necessary to record the scene from multiple positions.

To this aim, different techniques can be adopted: the use of camera arrays, a camera gantry, or plenoptic cameras. By spatially arranging multiple cameras into an array, the entire LF may be collected at once; this approach has been used with planar arrays of up to 128 cameras. A different system is based on moving a single camera while capturing a stationary scene to measure the incident light rays. The basic idea behind plenoptic imaging systems is the use of a micro-lens array positioned at the focal point of the camera lens, in front of the imaging sensor, as shown in Figure 3.

Figure 3.

Light field vs. 2D imaging system.

This system allows recording multiple views of a scene in a single shot, thus reducing the issues related to calibration and camera synchronization. The micro-lens array records the information on the incident light direction at different positions, i.e., it records the LF. The availability of low-cost acquisition devices enables novel applications for these imaging systems. The exploitation of the LF redundancy in the post-processing and editing phases brings photographers and art directors new opportunities. One of the main issues of this technology is related to the rendering modality. Many efforts are being devoted to the design of dedicated displays (e.g., arrays of video projectors aimed at a lenticular sheet, 3D displays, up to recently proposed tensor displays) or devices (e.g., head-mounted systems for virtual reality applications). However, up to now, these systems are very expensive and many challenges remain to be addressed (e.g., the reduced angular resolution of an LF cinema). The simplest and cheapest solution is the rendering of LF data on conventional 2D screens. Since the LF allows rendering the scene from several points of view and focus points, the questions of what and how to render the scene on a 2D display arise. To address this issue, recent works have been devoted to an in-depth analysis of the impact of different visualization techniques for LF images on a 2D display [20].
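
As a concrete illustration of rendering an LF at different focus points, the sketch below applies the classic shift-and-add refocusing idea to a 4D array of sub-aperture views. The L[u, v, y, x] layout, the integer-pixel shifts, and the 'slope' parameter are simplifying assumptions made for this example only; they do not describe the processing pipeline of any specific camera or of the works cited here.

```python
# Illustrative sketch (not from the chapter): synthetic refocusing of a
# 4D light field L[u, v, y, x] with the shift-and-add method.
import numpy as np

def refocus(light_field: np.ndarray, slope: float) -> np.ndarray:
    """Average all sub-aperture views after shifting each one in proportion
    to its angular offset; 'slope' selects the synthetic focal plane."""
    n_u, n_v, height, width = light_field.shape
    center_u, center_v = (n_u - 1) / 2.0, (n_v - 1) / 2.0
    refocused = np.zeros((height, width), dtype=np.float64)
    for u in range(n_u):
        for v in range(n_v):
            dy = int(round(slope * (u - center_u)))
            dx = int(round(slope * (v - center_v)))
            # Integer-pixel shift of the (u, v) sub-aperture view.
            refocused += np.roll(light_field[u, v], shift=(dy, dx), axis=(0, 1))
    return refocused / (n_u * n_v)

# Hypothetical example: a random 5x5 grid of 64x64 grayscale views.
lf = np.random.default_rng(0).random((5, 5, 64, 64))
image_near = refocus(lf, slope=1.0)    # focus on a nearer synthetic plane
image_far = refocus(lf, slope=-1.0)    # focus on a farther synthetic plane
print(image_near.shape, image_far.shape)
```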

The research community is also trying to define quality metrics and test datasets specifically designed for LF data. In Table 2, a list of available LF datasets, annotated with the corresponding subjective scores, is reported.

Dataset | Year | SRCs (Acquisition) | Artifacts (HRCs) | Protocol | Rendering Visualization | Stimuli
SMART | 2017 | 16 (Lytro Illum) | Coding: SSDC, HEVC-Intra, JPEG, JPEG2K | PC | EDoF images (all-in-focused view) | 256
MPI-LFA | 2017 | 14 (synthetic and captured) | 3D-HEVC, Linear Nearest Interpolation | ACR | Stereoscopic viewing | 336
VALID dataset | 2018 | 5 (Lytro Illum) | Compression: HEVC and VP9 | DSIS | 2D displays | 140
Win5-LID | 2018 | 10 (Lytro Illum and synthesis) | HEVC, JPEG2000, Linear Nearest Interpolation | Extended DSCQS | Stereo display | 200
LF Dataset | 2019 | 8 (Lytro Illum) | Gaussian blur, JPEG2000, JPEG, motion blur, white noise | DSCQS | Pseudo-sequence | 240
LFDD | 2020 | 8 (synthetic) | Image-based compression, video codecs, geometric distortion, noise | DSIS | Pseudo-sequence | 480

Table 2.

Annotated light field dataset.

In the direction of the definition and assessment of quality, some efforts have been made. Methodologies for performing subjective quality assessment experiments were investigated in [20], and the impact of compression systems in [21]. The study was conducted by designing the SMART LF image quality dataset, consisting of source images, compressed images, and subjective scores. The impact of the compression, reconstruction, and visualization phases was studied in [22], together with the definition of the Dense Light Fields dataset. The applicability and perceptual impact of existing and specifically designed compression techniques have been studied in [23]. An attempt to assess the subjective quality of experience of decoded LF images was made in [24]. A reduced-reference LF image quality metric, based on the relationship between the distortion of the estimated depth map and the LF image quality, was presented in [25]. Full-reference metrics based on multi-order derived characteristics (MDFM) [26] and on EPI [27] have also been presented. More recently, the log-Gabor feature-based light field coherence (LGF-LFC) feature has been proposed for a full-reference metric in [3].

5. Conclusions

Defining objective quality metrics for immersive media is a very challenging task. It requires a good understanding of both acquisition and rendering devices and of subjective perception, and it depends on several parameters that are difficult to identify. The knowledge acquired over the past decades from the use of images and videos has led to the definition of objective metrics, methodologies, and sufficiently defined test materials.

In the case of new media, the direct transfer of this knowledge is not possible. It is necessary to understand the application possibilities of new media and their limitations. The definition of use cases and the identification of significant parameters are needed. In addition, there is a need for databases annotated with subjective data, such as Mean Opinion Scores, eye-tracking information, and content descriptions. Open research questions relate to understanding the impact of the content on the quality of experience, to the definition of specific assessment protocols, and to the definition of effective quality metrics. It is worth underlining that what is needed is the evaluation of the Quality of Experience, rather than the ‘simple’ quality of the media. Therefore, the human factor must be included in all phases of the design of an immersive system.

Author details

Federica Battisti1* and Marco Carli2*

1 Department of Information Engineering, University of Padova, Padova, Italy

2 Department of Engineering, Roma Tre University, Rome, Italy

*Address all correspondence to: federica.battisti@unipd.it and marco.carli@uniroma3.it

Strong Objectivity for New Social Movements

Sandra Harding

Abstract

Standpoint methodology and its strong objectivity standard emerged four decades ago in the context of social justice movements of the 1960s and 1970s. Movements for poor people, African Americans, women, LGBTQ, and the disabled differed in many ways. Yet, all were firmly anti-authoritarian, criticizing the top-down policies and practices of governments and international agencies, as well as the natural and social sciences that served the interests of such institutions. The social justice movements argued that dominated groups would continue to be oppressed by research methodology, epistemology, theory, and public policy that ignored how the conditions of their marginalized lives differed from the living conditions of elite white men. They all insisted that the questions arising from their daily lives provided more effective starting points for maximally objective research results and the democratic public policies that such research was supposed to direct. This essay will focus on how, several decades later, newer social justice movements are demanding additional attention to the research practices that have bad effects on the public policy that shapes the everyday lives of peoples in such groups and elites. How do standpoint research strategies and their strong objectivity standards fare in these new social justice movements?

Keywords: new social movements, social justice, oppressed groups, public policy, standpoint research strategies

1. Introduction

It is now more than 60 years since C.P. Snow’s [1] Two Cultures pointed out that scholars in the humanities and those in the sciences lived in two different worlds. They rarely encountered each other in scholarly contexts and were mostly entirely ignorant about each other’s projects. What interested Snow was the scientifically illiterate humanists.

New groups have joined the ranks of the scientifically illiterate, in the eyes of their critics: namely, scientists themselves and the educated classes, as well as the policymakers who depend on scientific findings. These critics accuse the scientists of ignorance about the androcentrism, racism, coloniality, and Eurocentrism that damage the reliability of their research results. Science is a fully social process, they argue. What we know and do not know is shaped not only by “nature herself,” but also by what the most powerful corporations and governments want to know. They point, for example, to widespread ignorance about climate change, which has held back effective public policy in this area. And an increasing number of such critics point to the effective non-modern knowledge systems of non-Western cultures, which have served those cultures well in their distinctive social and natural environments.

Obviously, nobody wants biased research that produces inaccurate accounts of nature and social relations. We want reliable accounts on which to base public policies and our practices. However, this can seem to be a dangerous moment even to take up this question in light of the constant barrage of false claims and “science-bashing” that issues daily, as I write, from the outgoing U.S. president and other authoritarian regimes around the globe.

Yet today, as the front pages of our newspapers have revealed, who catches COVID-19 and who dies from it indicates that, in important respects, maximally objective environmental and medical/health assumptions and practices have not been guiding public policy. COVID-19 is an equal-opportunity virus, but the conditions of life for poor people and peoples of color ensure that they are more likely to catch it and have fewer resources to deal with it. Moreover, in the related economic crisis, who falls into poverty and who does not reveals similarly faulty assumptions shaping economic policy.

In response to the earlier complaints, the sciences have corrected their processes in significant ways. As most physicians now recognize, the bodies of members of these other groups are not in all respects exactly like the stereotypical model of the human as the idealized elite white man. Women’s bodies are not immature or defective versions of men’s bodies, with simply a different reproductive system characterizing them, as the old, pre-1970s accounts claimed.

Engineers also got the message. Automobile designers created the possibility of adjusting the height and position of drivers’ seats so that even small drivers, such as many white women and most people in some other ethnic groups, could see out of the front window and at the same time reach the gas or brake pedals. Yet this morning, a story on NPR gave voice to women farmers, who were complaining that tractors and other farming equipment are not user-friendly to anyone but big, very strong men. Manufacturers need to design such items for use by the full array of peoples who farm, including white women as well as men and women in ethnic groups that characteristically are shorter and less heavily muscled.

2. Calls for greater objectivity

Thus, accommodation to servicing the needs of physically and socially diverse groups has produced economic, political, and educational revisions of our policy worlds and of our daily experiences in them. These groups want research that is more objective than the conventional, supposedly universally valid research that was grounded only in dominant groups’ experiences. They do not want “subjective” research, as their critics often claim. It is the dominant models of the human and their standards that have been merely subjective, the critics counter, representing only elite groups’ experiences and interests. Rather, they want “stronger objectivity” that can more accurately chart all of our naturally and socially different lives in the worlds that we share.

Some sciences are more liable to such charges than others. High-energy physics certainly seems reasonably resistant to them; it does not seem to be about people as social beings at all. Yet one can still ask why these sciences’ projects are so highly funded by the U.S. Department of Defense. Could this have something to do with U.S. military politics rather than only with the objective desire to understand “pure nature”? Why do sciences that could effectively prepare for a pandemic—one that on last Friday alone newly infected 99 thousand U.S. citizens and killed 1000—not receive equal federal funding? It is becoming clear that today we live in a historically extraordinary moment in which deeply anti-democratic infrastructures have become increasingly visible. Such infrastructures ensure that scientific research will not be maximally objective; it will continue to serve the desires of the powerful at the expense of the needs of the vast majority of the world’s citizens. Our standards for objective research that were produced as a result of the earlier social justice movements did not go far enough.

3. The invention of standpoint methodology and its strong objectivity standard

Standpoint methodology is the name given to the research methodology intended to address such problems. It calls for “strong objectivity,” which can provide a more reliable standard for universally valid research. Though it emerged from all of the social justice movements of the 1970s, it was not so named at the time. Each of those movements proposed that reliable research to guide policy about their lives should start its projects differently: it should not address the standard issues that were the focus of mainstream natural and social sciences, but instead start off from questions arising in the everyday lives of members of groups that experienced oppression and discrimination. Health, environmental, and social science research must take the “standpoints” of the everyday lives of marginalized groups in order to produce maximally reliable research results. Through the efforts of marginalized members of the sciences, as well as of many non-marginalized scientists who immediately recognized the importance of the issue, this practice rather quickly became the strong objectivity standard for good research across most of the social sciences, as well as in the health and environmental sciences, which mix natural and social science projects. Of course, cases of both ignorance of and resistance to such practices persist today.

Feminists were the first of these groups to call it standpoint theory. This began with a half dozen political scientists, sociologists, and philosophers who, interestingly, were almost entirely working independently of each other in the U.S., Canada, and the U.K. These included the sociologist of science Hilary Rose [2] in the U.K., the sociologist of knowledge Dorothy Smith [3] in Canada, the political scientist Nancy Hartsock [4], and myself, a philosopher of science, in the U.S. We all began asking such questions in the 1970s. Soon, the sociologist Patricia Hill Collins [5] and many more African American and other feminists of color also began to refer to it as standpoint methodology, epistemology, and theory.

4. The beginning of the end of Western modernity?

Now newer social justice movements are raising additional issues. The sciences today are beginning to realize that if they want to understand how COVID-19, the associated economic crisis, and climate change actually work, they have to start off their research from the daily lives of the peoples least advantaged by such phenomena. Everyone is affected by what happens to everyone else in our shared world, but we are affected in different ways depending on the circumstances of our daily lives.

As Sheila Jasanoff [6] argued, sciences and their societies co-create and co-constitute each other. Early modern science was co-created and co-constituted with the new economic, political, social, and technical forms of life emerging in early modern Europe [7]. These sciences bore the imprint of the still-existing residues of medieval European societies and were directed by the desires of the new social classes coming into power at that time. Today we may well be experiencing the beginnings of a similarly large shift in economic, political, social, and technical forms of life, as electronic advances now permit both good and bad news to travel rapidly around the globe, apparently beyond the kinds of federal controls permissible in democratic societies, and as our existing institutions appear unable to act effectively on the linked phenomena of the pandemic, the economic collapse, and climate change. The gap between the rich and the poor has rapidly escalated over the last four years, but it was well underway before this disastrous period in U.S. and international life. Traditional Liberal governments seem unable to organize the resources necessary to block the anti-democratic effects of such processes. Are we experiencing the beginning of the end of Western modernity, its Liberal form of democracy, and its philosophy of science?

Standpoint methodologies were developed for the political projects of the 1960s and 70s social justice movements in the global North, as noted earlier. Can they be adapted to these changing circumstances of peoples’ everyday lives, as these are represented in the new global South social justice movements?

The Latin American theorists of recovering ancestral knowledges provide one of the major critical forces developed in the global South that are calling for new scientific epistemologies and ontologies. They claim to offer radically different accounts of how nature and social relations work in our everyday lives and consequently point toward the need for new political resources to advance pro-democratic outcomes. And they insist that this recovery project is necessarily entangled with gender issues. What is the relation between these projects and those directed by standpoint methodologies?

5. Recovering ancestral knowledges: Latin America

In Latin America, social studies of knowledge production have been constructed in opposition to its distinctive history of primarily Spanish and Portuguese colonialism.1 The very modernity that was co-constituted with European sciences is itself both a product of and a contributor to colonialism. Yet the Latin American opposition to its colonial history is articulated also as an opposition to the postcolonial theory with which the North has reevaluated its mostly British colonial history with Asia (e.g., [12]). A significant group of these theorists has named themselves the modernity/colonial/decolonial (MCD) group, or Decolonial for short (e.g., [13, 14, 15]).

Decolonial analyses occur in significantly different historical contexts than those in which the more familiar postcolonial accounts were generated. First, there are important chronological differences marked especially by the MCD scholars. Colonial relations in the Americas began in 1492—more than two and a half centuries before the British began to establish their colonies in India and the Middle East. For the Decolonial scholars, it is no accident that the so-called discovery of the Americas coincides with the emergence of modernity in Europe, though standard Northern histories tend not to link these two phenomena. “Modernity appears when Europe organizes the initial world-system and places itself at the center of world history over against a periphery equally constitutive of modernity” ([16], pp. 9–10). So, for Latin American theorists, modernity and Iberian colonialism co-produce and co-constitute each other. This not only shifts the beginning of modernity to a much earlier date but also inserts Iberian colonialism centrally into the history of modernity, which is something that has been largely denied by North Atlantic scholars.

Another chronological difference is that formal independence from European rule began much earlier in the Spanish, Portuguese, and French colonies in the Americas than in the British colonies (except for the United States). Most of the other colonies in the Americas achieved formal independence from Spain, Portugal, and France by 1830, except Cuba, which gained independence in 1898.2 Moreover, for the anti-colonial scholars, 1492 is the starting date of anti-colonial thinking. The Amerindians whom Cortes encountered, as well as Nahua and Quechua intellectuals in the early sixteenth century, clearly resisted both the idea and the reality of Iberian colonization [17, 18]. Anti-colonial thought has a longer and different history in Latin America than the familiar British postcolonial accounts.

Second, the origins of the Scientific Revolution are broader than assumed in conventional philosophies and histories of science, and they have roots in colonialism. Colonization of the Americas required that the conquerors interact effectively with physical worlds different from those familiar to them. Yet they lacked astronomy of the Southern hemisphere with which to navigate back to Europe across the South Atlantic. The cartography of the South Atlantic and their environments in the Americas had to be created. They also needed climatology, oceanography, and better engineering to secure the safe travels of their crews and their precious cargoes. In the Americas, they needed knowledge of the unfamiliar geographies and flora and fauna that they encountered. They needed better geology, mining, and engineering, even though they soon appropriated from the Amerindians sophisticated forms of these technologies which they improved to extract the gold and silver that they found in Mexico and Peru. In 1492, the Europeans were behind the Amerindians in these kinds of scientific and technical knowledge: they were the backward ones. Europe’s colonial projects in the Americas turned a huge part of the globe into a laboratory for European sciences [19, 20].

Third, in addition to the scientific and technical needs created by the different chronologies and geographies, the Iberian colonizers lived in social worlds different from those that shaped the coloniality of the British Empire. For the Europeans, the “discovery” of new lands across the Atlantic appeared as a solution to some of their most vexing social problems. Europeans welcomed the thought of being able to leave behind the economic and political challenges of the continual religious and political wars, as well as of overpopulation and famines. The Europeans imagined that they could start over in the “Garden of Eden” that had been “discovered” across the Atlantic.

Fourth, yet those peoples that the Spanish and Portuguese colonized were culturally different from those the British colonized centuries later. For the Amerindians, the arrival of the Europeans was a cataclysmic event. It meant the destruction of their cultural and physical worlds, the loss of sovereignty over their lands, the loss of their freedom, and the destruction and devaluation of their forms of knowledge and spirituality.

It is only relatively recently that demographic, historical, and environmental research undermined long-held assumptions that the Americas were only sparsely inhabited in 1491, and that those inhabitants were at a much more primitive stage of social and scientific development than were Europeans. In 1491, there were probably more people living in the Americas than in Europe (e.g., [21, 22]). Estimations of the actual numbers in the Americas vary hugely, from 10 million to over 100 million. Some of the world’s largest cities at the time were in the Americas [22]. Inca, Aztec, and Mayan architecture, engineering, and road systems were among the most advanced of ancient civilizations, and in some respects superior to those of the Europeans. Amerindians had extensive agricultural techniques, such as controlled fires to clear the land and increase the nutrients in the soil, and were able to preserve food that could last for years through processes of freezing, dehydration, and rehydration.

What did the Amerindians know in 1491 in addition to their agricultural, environmental, and spiritual-philosophical knowledge? The Nahua effectively mined silver and gold, as indicated, and drained the swamps, and then engineered the hanging gardens of the town that became Mexico City. Moreover, the Europeans had no way to project dates into BC eras, and no precise way to measure a solar year. The Nahua, Mayans, and Incas could do both. Amerindians also learned that they could locate their calendars on the European Christian calendars; Aztec and Inca events could be celebrated to coincide with Christian events, unbeknownst to the Europeans. And there was more knowledge production in the realm of medicine, pharmacopeia, and botany. Today indigenous knowledges are being reconstructed and are experiencing a boom perhaps never seen since the conquest.

Indigenous philosophies appeared dormant or invisible to the non-indigenous outsider until very recently and are still largely unknown to Anglo academics. Yet they have existed underground and persisted throughout the centuries inside indigenous communities. Today, indigenous peoples see as their task not only to reconstruct ancestral knowledges for their own survival but also for the survival of the rest of humanity that seems unable to halt the most pernicious aspects of modernity such as infinite economic growth, the destruction of the planet or “Pachamama”—the earth mother, and modern science and technology at the service of profit and constant wars [23]. What is especially remarkable about this process is that for the first time in colonial history, indigenous women’s voices can now be heard. Indigenous women had in the pre-intrusion era occupied important social and political positions that were undermined with colonization. Equally important was the place women occupied in indigenous cosmogonies and ontologies. These positioned women in a parallel, but not always equal position with men. It is this last point that not only defines the particularity of indigenous epistemologies, cosmogonies, and ontologies but also gives rise to one of the most contentious points in today’s feminist debates around gender.

The recuperation of ancestral knowledges is necessarily a contested terrain. The difficulty of recovering them lies not only in their fragmented and dispersed state after centuries of colonization; they have also fused with Western, Christian elements, which have altered not only the collective memory but also their existence in the present. It is not always clear what remains of the past and what is a recent invention. To complicate matters, the process of recuperation is often manipulated by the present-day political interests of both indigenous and mestizo men, but also of women.

Yet no matter how important it is to keep in mind these contradictions in the process of recuperation of ancestral knowledges, such knowledges do pose serious challenges to Western totalitarian knowledge that sees itself as the only valid knowledge. It is perhaps in the discussions about gender where the disparities seem to be the greatest. Gender permeates the entire recomposition of indigenous cosmovisions.

6. Indigenous conceptions of gender/sexuality

Indigenous conceptions of gender in both the Mesoamerican and Andean regions are based on a cosmic vision of life that is entirely different from the West.3 Cartesian dichotomies that separate mind and body, humans and nature, nature and society are foreign in these cultures. In their cosmic vision, all of these elements are interdependent; they must maintain an equilibrium for a harmonious existence. There is a fluidity that runs through the earth, heavens, water, wind, and the humans and non-humans that fuses them together. The cosmos is itself constituted by dualistic forces that are fluid, but not hierarchical as in Cartesian precepts, nor gendered. Thus, the feminine and masculine forces are complementary, of equal importance to the cosmos, and must maintain an equilibrium to guarantee the perpetuity of life.

Sociologically, this gendered division of the cosmos translates into gender complementarity, gender parallelism, or what the Aymara call chachawarmi. Man and woman constituted a paired unit. A married couple of man and woman was the basic unit of the community. Their work in tandem, although differentiated, was of equal worth. Women were not economically dependent on men. In gender-parallel structures, women constituted a lineage in which inheritance was passed down to their daughters.

And yet, historically we can see that elements of gender hierarchies were present. Gender differentiation increased as empire and state-building advanced among both the Mexicas and the Incas [24, 25]. Men as soldiers and warriors had a public face that women lacked. Men were the representatives of the community before the ruler. While noble women had class privilege and could occasionally occupy positions of power, the highest positions of power were still reserved for men. War, although understood to be as important as women's power of child birthing, constituted the center of power of indigenous realpolitik.

But it is the elements of complementarity, parallelism, and reciprocity between the genders that many indigenous men and women and their mestizo/criollo allies want to claim as either still existent or in need of resurrection. This position encounters many criticisms. Perhaps most important is the fact that this gender regime did not survive colonialism intact. Colonialism itself involved a social pact between colonized and colonizer men based on the acceptance of the subordination of indigenous women to their men in exchange for limited access for colonized men to power inside the community. Indigenous men, while emasculated in the public sphere, were granted control of women, children, and the elderly in the household and the community. These colonial gender norms have over time entrenched gender violence, something that was unknown in the pre-intrusion era. As Argentinean anthropologist Rita Segato has maintained, the separation of the public and private spheres not only privatized and minoritized indigenous women; it had lethal consequences for them [26]. More recent experiences of genocide, such as the one in Guatemala where the state forced indigenous men to rape, kill, and mutilate indigenous women, have increased violence against indigenous women dramatically, and thereby led to some of the highest femicide rates in the world.

7. Conclusion

As noted earlier, this is a moment of deep and widespread transformation of social institutions, including universities and their related educational, research and publication contexts. In this respect, it resembles the beginnings of the early modern era in European history; this may well be the “other end” of Western modernity and its philosophies of science. It seems to be a moment when we educated elites in the modern West can only now begin to glimpse the fact that Liberal democracy’s meritocracy is a contradiction in terms, as many less fortunate groups have already understood. It does not encourage us to collaborate with others or treat them as equals. We can have a meritocracy or a democracy, but not both [27].

Our recognition of this tension can be a productive event. As a start, we can learn to “walk together” respectfully with peoples whom Western modernity has marked as deeply different from us. Standpoint methodology and its strong objectivity standard can be useful resources for this project.

Author details

Sandra Harding

Philosophy, New York University, United States

*Address all correspondence to:


Challenges in Flood Management

Vijay P. Singh

Abstract

Each year floods occur in many parts of the world and cause huge damage to agriculture, homes, schools, hospitals, highways, industries, water supply systems, infrastructure, levees, dams, and the environment. They also cause loss of animal and human lives. Looking at the history of floods and the damages they have caused, it is evident that they are amongst the costliest natural disasters and impact hundreds of thousands of people each year. It is widely accepted that floods cannot be eliminated entirely. However, they can be managed to mitigate the loss of life and property. Revisiting the types and causes of floods, this presentation focuses on the challenges in flood management. The challenges are both technical, including hydrometeorologic, hydrologic, hydraulic, geotechnical, and structural; and nontechnical, including education, communication and the Internet, legal, administrative, social, political, risk analysis, and skilled professionals. The challenges are varied and fall under seemingly disparate disciplines, so the emphasis here is on their integration. Compounding these challenges is climate change, whose impact can be assessed but whose forecast in space and time is still a challenge. The presentation concludes with a personal reflection on a paradigm shift.

Keywords: natural disasters, climate change, floods, risk, management, paradigm shift

1. Introduction

Each year natural disasters strike many parts of the world. Some parts are hit by heavy rains, some by heavy snowstorms, some by floods, some by mudslides, some by windstorms, some by hurricanes/cyclones/typhoons, some by heat waves, some by cold waves, some by snow avalanches, some by droughts, some by tornadoes, some by wildfires, some by earthquakes, some by lightning, some by volcanic eruptions, some by tsunamis, some by viral/bacterial outbreaks, and some by a combination of one or more of these disasters, such as heavy winds accompanied by heavy rains, drought accompanied by heat wave, cold wave accompanied by heavy winds, or heat wave accompanied by viral outbreak, to name but a few. These natural disasters cause loss of life, damage to property, disruption of the social and cultural fabric, environmental degradation, and imbalance in the ecosystem. To illustrate the impact of some of these disasters, Table 1 lists average annual global deaths from natural disasters by decade. It is seen that floods, droughts, and earthquakes cause more loss of life than other disasters. Of these three major disasters, floods and droughts are more common and can occur during the same year, or even at the same time in different places within the same country; floods often ravage one part of a country while droughts ravage another. An example is India, where each year during the monsoon or rainy season floods occur in the Northeast and North while droughts strike the West.

Decade | Drought | Earthquake | Extreme temperature | Flood | Impact | Landslide | Mass movement (dry) | Storm | Volcanic activity | Wildfire
1900s130000173020630513180144940
1910s8500628001013800125995648107
1920s47240054935042804301199951410
1930s0237701694361470103493843187
1940s345000161870101030175301271221325
1950s020931502058300215031265101
1960s150865523611332390504218133933247
1970s119084402215550780738735734531
1980s557276015534515506231274667240040
1990s311103599329549083387211159786
2000s1154536491065401077228172132463
2010s333943302116445811010691331777152

Table 1.

Average annual global deaths from natural disasters, by decade.

[Source: https://ourworldindata.org/ofdacred-international-disaster-data].

Heavy rainstorms cause huge losses, as shown in Table 2. For example, Hurricane Harvey, which struck the Houston area in Texas, U.S., caused damages worth US$126.3 billion and 89 deaths, not to speak of untold misery and disruption in the community. It took a long time to recover from this hurricane. In a similar vein, floods cause even greater losses, as shown in Table 3. For example, floods that occurred in the Eastern U.S. on November 8, 1996, caused 187 deaths and damage worth US$4.79 billion.

No. | Name | Year | Date | Area affected | Fatalities | Cost of damage
1 | South Carolina Sea Island hurricane | 1893 | Aug-27 | Sea Island, South Carolina | 2000 | $27.9 million
2 | Galveston hurricane and storm surge | 1900 | Sep-09 | Galveston, Texas | 8000 | $602.3 million
3 | Miami hurricane and flooding | 1926 | Sep-18 | Florida Atlantic Coast, Florida | 372 | $1.49 billion
4 | South Florida hurricane and flood | 1928 | Sep-16 | Lake Okeechobee, Florida | 2500–3000 | $1.5 billion
5 | Labor Day Hurricane | 1935 | Sep-02 | Florida Keys, Florida | 500 | $100.0 million
6 | New England hurricane and flooding | 1938 | Sep-21 | New England, Long Island, New York | 700 | $5.44 billion
7 | Pacific tsunami | 1946 | Apr-01 | Hawaii, Alaska | 165 | $334.1 million
8 | Hurricane Agnes flood | 1972 | Jun-19 | Susquehanna, Lackawanna, Pennsylvania | 128 | $18.0 billion
9 | Hurricane Katrina flooding | 2005 | Aug-29 | Southern Louisiana, Louisiana | 1833 | $103.9 billion
10 | Superstorm Sandy | 2012 | Oct-29 | New Jersey, New York | 233 | $88.4 billion
11 | Hurricane Harvey | 2017 | Aug-26 | Houston, Texas | 89 | $126.3 billion

Table 2.

Storm impacts are costly (examples from the U.S.).

No. | Name | Year | Date | Area affected | Fatalities | Cost of damage
1 | Mill River Dam flood | 1874 | May-16 | Western Massachusetts | 139 | $1.0 million
2 | Johnstown flood | 1889 | May-31 | Johnstown, Pennsylvania | 2209 | $12.6 billion
3 | Brazos River flood | 1899 | Jun-17 | Freeport, Texas | 284 | $271.0 million
4 | Oregon Heppner flash flood | 1903 | Jun-14 | Heppner, Oregon | 324 | $17.1 million
5 | Statewide Ohio flood | 1913 | Mar-23 | Cincinnati, Miami River, Ohio | 467 | $82.4 billion
6 | Brazos and Colorado River flood | 1913 | Dec-05 | Freeport, Waco, Texas | 177 | $88.7 million
7 | San Antonio flood | 1921 | Sep-10 | San Antonio, Texas | 215 | $70.2 million
8 | Great Mississippi flood | 1927 | Dec-25 | Mississippi River region, Mississippi | 246 | $41.7 billion
9 | St. Francis Dam failure | 1928 | Mar-12 | Los Angeles, California | 400–600 | $291.8 million
10 | Great Northeast flood | 1936 | Mar-11 | Maryland to Maine | 200 | $85.2 billion
11 | The Ohio River flood | 1937 | Jan-30 | Pennsylvania, Ohio, West Virginia, Tennessee, Indiana, Illinois | 385 | $151.6 billion
12 | Los Angeles flood | 1938 | Feb-27 | Los Angeles, California | 115 | $1.24 billion
13 | East Coast flood | 1955 | Aug-11 | New England, Northern Virginia | 200 | $7.78 billion
14 | Hurricane Camille and flooding | 1969 | Aug-17 | The Gulf Coast of Mississippi, Mississippi | 256 | $9.70 billion
15 | Black Hills flood | 1972 | Jun-09 | Rapid City, South Dakota | 238 | $988.3 million
16 | Buffalo Creek flood | 1972 | Feb-26 | West Virginia | 125 | $64.0 million
17 | Big Thompson Canyon flood | 1976 | Jul-31 | Big Thompson Canyon, Colorado | 144 | $156.3 million
18 | Floods in eastern U.S. | 1996 | Nov-08 | Appalachians, Mid-Atlantic, Northeast | 187 | $4.79 billion
19 | Southeast U.S. flood | 1998 | Oct-17 | Tampa, Florida | 132 | $2.49 billion

Table 3.

Flood impacts are costly (examples from the U.S.).

2. Types and causes of floods

Depending on where they occur, floods can be classified into different types: watershed, riverine, urban, coastal, and glacial. These different types of floods have different spatial scales. For example, glacial outbursts cause flooding at a local level but can be more extensive if the dam is breached. Coastal flooding is confined to coastal areas and can wipe out beaches and damage wetlands and vegetation by bringing in salty seawater. Flooding is now quite common in urban areas, because urbanization converts pervious surfaces into impervious ones that do not infiltrate rainwater. The different types of floods are caused by extreme rainfall, hurricanes, tides, combined rainfall and snowmelt, improper drainage, improper watershed management, dam or levee breaching, or glacial outbursts. The ubiquitous cause is extreme rainfall, but rainfall and snowmelt together are also a common cause, especially in areas where snowfall is extensive, as in the United States.

Likewise, in the monsoon-climate countries of Asia, destructive floods occur each year, notwithstanding the massive investments made in flood defenses. In China, damages caused by floods have exceeded US$200 billion per decade. Floods during the monsoon season have been commonplace in the Yangtze and its tributaries.

3. Flood management

It is accepted that floods cannot be entirely eliminated because nature cannot be fully controlled, but they can be managed so that the damages caused by them are mitigated. Thus, flood management involves two aspects: technical and nontechnical. Technical aspects are primarily engineering, including hydrometeorologic, hydrologic, hydraulic, geotechnical, and structural; and nontechnical aspects are education, socio-economic, political, legal, communication, internet, and administrative.

3.1 Hydrologic and hydrometeorologic considerations

Hydrology is basic to flood management; it answers questions that are fundamental to designing a flood management project. The design questions are: (1) What will be the flood-producing rainfall? (2) What will be the return period of this rainfall? (3) What will be the flood magnitude due to a given rainfall event? (4) What will be the probability or return period of a given flood magnitude? (5) What will be the risk of occurrence of a flood of a given magnitude? The first three questions are answered by deterministic hydrometeorologic and hydrologic modeling, also called rainfall-runoff modeling or watershed modeling. There are many types of watershed models, such as empirical (regression type), conceptual (unit hydrograph theory), and physically based (kinematic, diffusion wave, and dynamic wave theories). A comprehensive account of most of the popular models around the globe is given in Singh [1], Singh and Woolhiser [2], and Singh and Frevert [3, 4, 5].
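
As a minimal sketch of deterministic rainfall-runoff estimation, the example below applies the SCS curve-number relation, one standard empirical approach, to convert a storm rainfall depth into direct runoff; the rainfall depth and curve number are hypothetical values chosen only for illustration, not taken from any study cited here.

```python
def scs_runoff(rainfall_in, curve_number):
    """Direct runoff depth (inches) from storm rainfall using the SCS curve-number relation."""
    s = 1000.0 / curve_number - 10.0   # potential maximum retention (inches)
    ia = 0.2 * s                       # initial abstraction (standard 0.2S assumption)
    if rainfall_in <= ia:
        return 0.0
    return (rainfall_in - ia) ** 2 / (rainfall_in + 0.8 * s)

# Hypothetical example: 5 inches of rain on a watershed with curve number 80
print(round(scs_runoff(5.0, 80), 2))   # ~2.89 inches of direct runoff
```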

When managing floods, the questions are: (1) When will the flood occur at a given location? (2) How much area will be impacted by a given flood? (3) How long will a flood last? These questions are answered by stochastic hydrologic modeling, including univariate frequency analysis, multivariate stochastic analysis, and stochastic watershed modeling. Frequency analysis is done in different ways. The most popular method in practice is the empirical method, which involves fitting a frequency distribution to empirical flood data, use of an appropriate parameter estimation technique, goodness-of-fit testing, selection of a distribution, establishing confidence bands, and risk analysis. A good account of the frequency distributions and their fitting and parameter estimation is given in Kite [6], Singh [7], Rao and Hamed [8], and Zhang and Singh [9]. Often multivariate frequency analysis may be needed not only for design but also for management. That is most appropriately done using copulas, which, along with their applications, are comprehensively described in Zhang and Singh [9]. A treatise on risk and reliability analysis in environmental and water engineering is provided by Singh et al. [10].
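
As a minimal sketch of the empirical frequency-analysis workflow described above, the following fits a Gumbel (Extreme Value Type I) distribution to a series of annual peak flows and estimates the 100-year flood quantile; the peak-flow values are invented for illustration, and a real study would also include goodness-of-fit tests and confidence bands.

```python
import numpy as np
from scipy import stats

# Hypothetical annual peak flows (m^3/s); real analyses use long observed records
peaks = np.array([410, 520, 380, 610, 455, 700, 530, 480, 640, 395,
                  560, 720, 505, 430, 660, 590, 470, 540, 615, 500])

# Fit a Gumbel (EV1) distribution by maximum likelihood
loc, scale = stats.gumbel_r.fit(peaks)

# 100-year flood: the quantile with annual exceedance probability 1/100
q100 = stats.gumbel_r.ppf(1 - 1 / 100, loc=loc, scale=scale)
print(f"Estimated 100-year flood: {q100:.0f} m^3/s")
```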

3.2 Hydraulic considerations

Hydraulics deals with flow in rivers. For designing hydraulic structures for flood control, subsequent to the hydrologic questions, there are hydraulic questions that need answering: (1) What will be the flood stage and flood discharge in a river? (2) When will the flood stage exceed the flood threshold at a given location? (3) How long a river reach will be impacted by a given flood? (4) How much area will be flooded by such a flood? (5) How long will the flood last? (6) What will be the return period of such a stage and discharge? (7) What will be the probability or return period of a given flood stage and discharge? (8) What will be the risk of such a flood stage? These questions are answered by hydraulic modeling. Deterministic flood routing, which is either empirical (a relation between upstream and downstream hydrographs), conceptual (the Muskingum method), or physically based (diffusion wave, dynamic wave), answers the first five questions. Singh [11] has given a full account of deterministic flood routing. The last three questions are answered by stochastic hydraulics involving frequency analysis and multivariate stochastic analysis. Stochastic methods in hydraulics are similar to those in hydrology [9].
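
As a minimal sketch of conceptual flood routing with the standard Muskingum storage formulation mentioned above, the following routes an inflow hydrograph through a river reach; the inflow values and the parameters K and X are hypothetical and chosen only to illustrate the recursion.

```python
import numpy as np

def muskingum_route(inflow, K, X, dt):
    """Route an inflow hydrograph through a reach using the Muskingum method."""
    denom = 2 * K * (1 - X) + dt
    c0 = (dt - 2 * K * X) / denom
    c1 = (dt + 2 * K * X) / denom
    c2 = (2 * K * (1 - X) - dt) / denom
    outflow = np.zeros_like(inflow, dtype=float)
    outflow[0] = inflow[0]        # assume an initial steady state
    for t in range(1, len(inflow)):
        outflow[t] = c0 * inflow[t] + c1 * inflow[t - 1] + c2 * outflow[t - 1]
    return outflow

# Hypothetical inflow hydrograph (m^3/s) at 6-hour intervals; K = 12 h, X = 0.2
inflow = np.array([50, 120, 300, 450, 380, 260, 160, 100, 70, 55], dtype=float)
print(np.round(muskingum_route(inflow, K=12.0, X=0.2, dt=6.0), 1))
```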

3.3 Geotechnical considerations

Geotechnical engineering primarily deals with the foundations of structures. It answers a set of basic questions that must be addressed before any construction, such as: (1) What is the most appropriate site for constructing a given structure, such as a dam? (2) Can the local soil withstand the pressure? (3) What are the local soil characteristics? (4) How high should a structure, such as a levee, be? (5) Does the foundation need reinforcement? (6) What is the reliability of a given foundation? These questions are answered by geotechnical engineering. Standard textbooks are available which provide comprehensive accounts of these and related issues.

3.4 Structural considerations

Structural engineering deals with the structural design of a flood control structure, such as a dam and its associated appurtenances (spillway, tunnels, etc.) or a levee. It computes the forces the structure must withstand and its dimensions. To that end, it answers questions such as how large the structure should be, what type of structure it should be, how reliable the structure will be, what skill set will be needed for construction, who will build the structure and who will supervise construction, and how long construction will take. These questions are answered by structural modeling, which can be deterministic (empirical, conceptual, or physically based) or reliability- and risk-based (reliability analysis and risk analysis). Standard textbooks are available which provide full accounts of these and related issues.

4. Challenges in flood management

4.1 Climate change

Climate change is a major challenge for humanity in this century. It will indeed decide the fate of our civilization. The Intergovernmental Panel on Climate Change [12] notes: “A changing climate leads to changes in the frequency, intensity, spatial extent, duration, and timing of weather and climate extremes, and can result in unprecedented extremes …” Climate change has already started to affect the extremes of atmospheric weather and climate variables (temperature, precipitation, wind) and of the natural physical environment (floods, extreme sea levels, waves, coastal waves, winds, and tornadoes). Questions often arise with regard to assessment and forecasting (where, when, and for how long) and to impact assessment (where, how much, and how serious a risk).

Three possible changes in weather extremes triggered by climate change are: less extreme cold but more extreme hot weather; more extreme cold and more extreme hot weather; and near-constant extreme cold but more extreme hot weather. Climate change has a pronounced effect on the hydrological cycle and on climate extremes, as shown in Figure 1 with and without climate change. The uppermost part of Figure 1 shows a shift in the mean to the right from without climate change to with climate change, indicating less cold and less extreme cold weather but more hot and more extreme hot weather. The middle part of Figure 1 shows an increase in variability with climate change, translating into more cold, more extreme cold, more hot, and more extreme hot weather. The bottom part of Figure 1 shows that the symmetry of the weather distribution changes with climate change such that cold and extreme cold weather remains nearly constant but there is more hot and more extreme hot weather.

Figure 1.

Effect of climate change on weather extremes [Source:https://www.ipcc.ch/report/managing-the-risks-of-extreme-events-and-disasters-to-advance-climate-change-adaptation/].
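
To make the effect of these distributional changes concrete, the sketch below uses a normal temperature distribution with invented baseline parameters and compares the probability of exceeding a fixed hot-weather threshold under a mean shift and under increased variability; all numbers are purely illustrative and are not drawn from Figure 1 or the IPCC report.

```python
from scipy import stats

threshold = 35.0                               # hypothetical "extreme heat" threshold (deg C)
baseline = stats.norm(loc=25, scale=4)         # invented baseline summer temperatures
mean_shift = stats.norm(loc=27, scale=4)       # warmer mean, same variability
more_variable = stats.norm(loc=25, scale=6)    # same mean, larger variability

for name, dist in [("baseline", baseline), ("mean shift", mean_shift),
                   ("more variable", more_variable)]:
    # survival function sf(x) = P(T > x): the chance of exceeding the hot threshold
    print(f"{name}: P(T > {threshold}) = {dist.sf(threshold):.4f}")
```

Even a modest shift in the mean or increase in the spread multiplies the exceedance probability of the fixed threshold several-fold, which is the qualitative message of Figure 1.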

Climate models are showing earlier occurrence of spring peak river flows in snowmelt- and glacier-fed rivers (already being observed), anthropogenic influence on changes in some components of the water cycle (precipitation, snowmelt) affecting floods, projected increases in heavy precipitation that would contribute to rain-generated local flooding in some catchments or regions, and potential changes in the magnitude and frequency of floods. IPCC, SREX [12] shows the impacts on precipitation, as seen in Figure 2, considering the standard deviation of wet-day intensity, the percentage of days with precipitation greater than Q95 (the 95% quantile), and the standard deviation of the fraction of days with precipitation greater than 10 mm, for June, July, and August (JJA); December, January, and February (DJF); and the annual period (ANN). In each case the standard deviation increases over most parts of the world. IPCC, SREX [12] further shows, for different parts of the world, that higher 24-hour precipitation values will occur more frequently, indicating reduced return periods. For example, in many regions a 24-hour precipitation amount that currently has a 20-year return period will have a return period of 10 years or less. This means that there will be more frequent floods (Figure 3).

Figure 2.

Effect of climate change on weather extremes (Source: IPCC, SREX [12]).

Figure 3.

Projected return period (in years) of 20-year return values of annual maximum 24-hour precipitation rates (after IPCC, SREX [12]).
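
One way to read such a change in return period is through the probability of at least one exceedance during a planning horizon, R = 1 - (1 - 1/T)^n. The sketch below compares T = 20 years and T = 10 years over an assumed 30-year horizon; the horizon length is chosen only for illustration.

```python
def exceedance_risk(return_period_years, horizon_years):
    """Probability of at least one exceedance of the T-year event in n years."""
    return 1.0 - (1.0 - 1.0 / return_period_years) ** horizon_years

for T in (20, 10):
    print(f"T = {T} yr: risk over 30 years = {exceedance_risk(T, 30):.2f}")
# T = 20 yr -> ~0.79; T = 10 yr -> ~0.96
```

Halving the return period raises the 30-year exceedance risk from roughly 79% to roughly 96%, which is why a shortening of precipitation return periods translates directly into more frequent floods.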

4.2 Integration of disciplines

For effective management of floods, the seemingly disparate disciplines that are associated with floods, directly or indirectly, should be integrated. These disciplines are: hydrometeorology, hydrology, hydraulics, agriculture, earth sciences, environmental sciences, socio-economic sciences, politics and policy making, communication science, and legal and administrative dimensions. These disciplines are on the flood-process side. On the other hand, the disciplines that provide tools for solving problems are mathematics, statistics, operations research, data science, geographical information systems, intelligent systems, and computer science. These disciplines should also be integrated with flood management.

4.3 Communication

It is vitally important that agencies responsible for flood management communicate to the public why floods occur, the likelihood of a flood in any given area, and the roles and responsibilities associated with flood risk reduction and response. The needs of people who are unable to protect themselves are messages that frequently fail to be conveyed. These messages do not “stick” or last, so they have to be repeated regularly, even to the same audiences.

4.4 Flood risk analysis

Conducting analyses of flood risks and of the contributors to increased flood risk is necessary to give substance to communications. That said, management of risk is unwanted, but necessary. No single organization, within the U.S. or internationally, can control all aspects of the population and property at risk from flooding or contributing to flooding. However, sharing risk is not desired by those who depend on or expect some other organization to provide their protection. The greater value of risk-based analyses lies in the better articulation of the roles and responsibilities affiliated with flood risk reduction and response.

4.5 Measurement

For developing flood control measures and flood management, spatial and temporal data from different disciplines are needed. More particularly, hydrometeorologic data, hydrometric data, watershed physiographic data, and land use and land cover data are needed to get started. Measurement technologies, such as remote sensing, satellites, and drones, can be employed at a large scale. Remote sensing technology can provide information on rainfall fields, including storm movement, spatial variability, temporal variability, and rainfall field coverage. Measurement techniques are also available that help describe the spatial variability of hydraulic roughness. The collected data should be subject to quality analysis and control, archived, and retrievable. The data then need processing and should be made accessible.
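
As a minimal sketch of routine quality control on hydrometeorologic data, assuming a simple rain-gauge time series with invented values and a common missing-data code, the following recodes missing values, flags physically implausible readings, and aggregates to daily totals with pandas.

```python
import pandas as pd
import numpy as np

# Hypothetical hourly rain-gauge record (mm); -999 is a common missing-data code
times = pd.date_range("2021-06-01", periods=8, freq="H")
rain = pd.Series([0.0, 1.2, -999.0, 4.5, 130.0, 2.0, 0.0, 0.3], index=times)

rain = rain.replace(-999.0, np.nan)           # recode missing values
rain = rain.where(rain.between(0.0, 100.0))   # flag negative or implausibly high values
daily_total = rain.resample("D").sum(min_count=1)
print(daily_total)
```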

4.6 Integrated hydrologic modeling

Hydrologic modeling should be integrated with remote sensing, geographical information systems (GIS), database management systems, hydraulics, land use/land cover, hydrometeorology, geomorphology, and uncertainty and risk analysis. In distributed hydrologic modeling, it is important to quantify the effect of the spatial variability of watershed characteristics on runoff dynamics, the hydrograph, and the formation of shocks. The spatial variability of infiltration, hydraulic conductivity, steady infiltration rate, and mean infiltration also affects the runoff or flood hydrograph. Spatial and temporal variability are directly dependent on scaling. Spatial scaling entails spatial heterogeneity in watershed characteristics, spatial variability in hydrologic processes, and physical spatial size involving the representative elementary area, hydrologic response units, and computational grid size. Temporal scaling, on the other hand, involves the time interval of observations, the computational time step, and the temporal variability of processes. These issues play a vital role in flood model response.
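
To illustrate why spatial variability matters in distributed modeling, the sketch below reuses the SCS curve-number relation from Section 3.1 with invented curve numbers and compares runoff computed cell by cell and then averaged against runoff computed from the spatially averaged curve number; because the rainfall-runoff relation is nonlinear, the two differ.

```python
import numpy as np

def scs_runoff(p, cn):
    """SCS curve-number direct runoff depth (inches) for rainfall p (inches)."""
    s = 1000.0 / cn - 10.0
    ia = 0.2 * s
    return np.where(p > ia, (p - ia) ** 2 / (p + 0.8 * s), 0.0)

p = 4.0                                   # storm rainfall (inches), hypothetical
cell_cn = np.array([60, 70, 75, 85, 95])  # invented curve numbers across grid cells

runoff_distributed = scs_runoff(p, cell_cn).mean()        # average of per-cell runoff
runoff_lumped = float(scs_runoff(p, cell_cn.mean()))      # runoff from the average CN
print(f"Distributed: {runoff_distributed:.2f} in, Lumped: {runoff_lumped:.2f} in")
```

The distributed estimate exceeds the lumped one for these values, a simple demonstration that averaging watershed characteristics before applying a nonlinear runoff relation changes the simulated hydrograph.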

An important issue in integrated modeling is calibration, which involves a parameter estimation algorithm, an objective function, an optimization algorithm, a termination criterion, calibration data, handling of data errors, determination of data needs (quantity and information richness), and representation of the uncertainty of the calibrated model. Artificial neural networks can also be employed for modeling or model calibration.
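
As a minimal sketch of the calibration loop just described, assuming a single-parameter linear-reservoir model and an invented "observed" discharge series, the following estimates the storage constant K by minimizing a root-mean-square-error objective with scipy; all data values and parameter bounds are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def linear_reservoir(inflow, K, dt=1.0, q0=0.0):
    """Simulate outflow from a linear reservoir dS/dt = I - Q with S = K*Q (explicit Euler)."""
    q = np.zeros_like(inflow, dtype=float)
    q[0] = q0
    for t in range(1, len(inflow)):
        q[t] = q[t - 1] + (dt / K) * (inflow[t - 1] - q[t - 1])
    return q

inflow = np.array([0, 5, 20, 35, 25, 12, 6, 3, 1, 0], dtype=float)   # hypothetical
observed = linear_reservoir(inflow, K=4.0) + np.random.default_rng(1).normal(0, 0.3, 10)

def rmse(K):
    """Objective function: root-mean-square error between simulated and observed flow."""
    return np.sqrt(np.mean((linear_reservoir(inflow, K) - observed) ** 2))

result = minimize_scalar(rmse, bounds=(0.5, 20.0), method="bounded")
print(f"Calibrated K = {result.x:.2f}")
```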

In the modern era, new tools are emerging and existing tools are being made more accurate and versatile. These tools include mechanistic models, data mining models, uncertainty analysis, entropy theory, risk analysis, multivariate stochastic analysis (copula theory), intelligent systems (ANN, fuzzy logic, etc.), optimization algorithms, decision support systems, and GIS software.

With increasing demands on hydrologic models, new challenges are emerging. For flood modeling, such challenges are the need for more data at finer spatial resolutions, regional-scale models, quantification of model uncertainty, long-term forecasting well ahead of time, determination of probable maximum precipitation and probable maximum flood, integration with climate models as well as with ecosystem models, and coupling with decision-making models (social, political, economic, environmental, etc.).

4.7 Watershed management

Floods should be managed at the watershed scale, and watershed management therefore becomes critically important. It involves land use management, drainage, soil conservation, and forest management. There is a growing need in the U.S. to provide increasing, and reliable, volumes of water for municipal, industrial, and agricultural needs. Reliability is a key criterion, especially under variable climatic conditions. Finding means to store flood waters in aquifers, or to move flood waters to areas experiencing water shortages, poses engineering and socio-political challenges that the U.S. will face increasing interest in and pressure to address.

4.8 Education

In many cases people are unaware of the flood risk to which they expose themselves and their families, while in other cases people are intentionally ignorant so that others can assume responsibility for their flood risk. Education is an essential long-term measure, but for education to make a difference it needs to be part of the K-12 education system. Education limited to project-specific town halls and briefings to elected leaders is not achieving any significant change in societal behaviors.

4.9 Skilled professionals

While there is opportunity to improve hydrology and hydraulics and structural analysis tools and models, the tools available are mostly sufficient for the need. What is lacking is experience and competence to use these tools appropriately on the most complicated projects. Identifying the right individuals and teams for unique tasks and convening multi-disciplinary teams with these special skills is a continuing issue and provides the rationale for the engineering, environmental, and social science professional fields to manage themselves and identify credentials recognizing those with advanced education and experience.

4.10 Post-flood work

Often a flood passes, leaving a lasting mark on people's lives and the environment, yet not as much attention is paid to post-flood work as should be. Post-flood work involves rehabilitation, restoration, reconstruction, timely delivery of resources, and anxiety management.

4.11 Environmental damage assessment

Floods degrade water quality, damage water supply systems, cause loss of productive soil, harm the ecosystem, and promote viral and bacterial activity that leads to disease and harm to human health.

4.12 Paradigm shift

Given the socio-economic conditions prevailing these days all over the world, it is vitally important to ask what the development paradigm should be. Thus far, there seems to have been more of a focus on concentrated development rather than distributed development. That is one reason for mounting losses due to floods. It seems that a more appropriate way to alleviate social unrest, reduce flood-caused losses, and improve the environment is to distribute development. That will also reduce urban congestion, eliminate traffic jams, save energy, and reduce health care costs.

Another point that seems to be overlooked is the connection between decision makers and stakeholders. Policies are made for people, or stakeholders, but their input is not often vigorously sought. This leads to a disconnect between policy makers and the people for whom the policies are being made. It seems like a contradiction, but it is often the case. In a democracy, policies should be people-driven, not the other way round.

Further, in flood management the focus should be on a priori planning and management, which is called the proactive approach, but in most cases it is the reactive approach that is followed. It will require a concerted effort on the part of the government agencies responsible for flood management to start adopting a proactive approach, which will save lives and reduce damages.

5. Conclusion

Floods are a natural disaster and cannot be eliminated entirely, but a priori planning and management can reduce their impact. Following a paradigm shift toward distributed development in place of concentrated development will go a long way in addressing the flood crisis which plagues many parts of the world each year. There is sufficient engineering technology available but society- and government-related issues still need to be fully addressed.

Author details

Vijay P. Singh1,2

1 Department of Biological and Agricultural Engineering, Texas A&M University, College Station, Texas, United States

2 Zachry Department of Civil and Environmental Engineering, Texas A&M University, College Station, Texas, United States

*Address all correspondence to: vsingh@ag.tamu.edu


A Quest for Sustainability in the Food Enterprise

R. Paul Singh

Abstract

The twenty-first-century global food enterprise faces numerous challenges. The most critical is how to meet the food needs of the world's rapidly growing population, which is expected to increase by 2 billion persons in the next 30 years. The food system is also under increasing threat from climate change. As a result, the resources required for increasing food production are becoming heavily constrained. Innovative approaches to mitigate these threats to the food system are needed. This paper's overall goal is to highlight challenges and opportunities in addressing the sustainability of the global food system. Various examples are drawn from the contemporary literature, including the author's research, to illustrate some of the steps needed to meet sustainability needs. Relevant issues are discussed for different food system segments, from farm production to processing, distribution, storage, retail, and food preparation for consumption.

Keywords: sustainability, food system, climate change, food losses, food waste

1. Introduction

The agricultural production system's capacity to meet the increasing population demands has been questioned in the past, notably by Malthus in 1798 in “An Essay on the Principle of Population,” where he theorized a specter of large-scale deaths due to inadequate food production and increasing population [1]. Luckily, the Malthusian prophecy did not materialize, as technological advances in farming helped raise agricultural production to feed the growing population. Another alarm regarding the food system was raised by Sir William Crookes, a brilliant experimentalist known for discovering the element thallium. Sir Crookes is also well known for his inaugural presidential speech, titled “The Wheat Problem,” that he gave on September 10, 1898, to the British Association for the Advancement of Science [2]. In this talk, using data on wheat production and the increasing human population, he raised his concern about the food system's sustainability. He noted, “we are drawing on the Earth's capital, and our drafts will not perpetually be honored. England and all… nations are in deadly peril of not having enough to eat.” Sir Crookes' concern was based on a significant threat to the day's farming system, the potential depletion of fertilizer to grow wheat and other crops. In the late 1800s, 100% of the nitrogen used in farming was mined and shipped from Peru, Baja California, and Chile as guano. Guano is bird droppings that build up over a long period. But the mining fields were being depleted of guano, and Sir Crookes could foresee that if the supply of guano were exhausted, farming would collapse and millions would starve. Being a chemist, he observed that the Earth's atmosphere has plenty of nitrogen. Sir Crookes challenged his fellow scientists to determine how to chemically fix nitrogen from the air to help create what he called “chemical manure.” One of the chemists, Fritz Haber, took him up on that challenge. Haber discovered the chemical reaction that allowed fixing nitrogen to make ammonia, and in 1918, he received a Nobel Prize. Working with Carl Bosch, he commercialized that research finding to create the Haber-Bosch process. The products of this process are used not only for agriculture but also for the manufacture of pharmaceuticals, plastics, textiles, and explosives.

Around 1900, when Sir William Crookes was concerned about the food supply, the world population was less than 2 billion; now it is about 8 billion, and by the year 2050 it is predicted to increase to 10 billion. According to current estimates, to meet the increasing population's needs and fill the food gap to 2050, an increase in agricultural production of almost 60% is required, a daunting task facing today's food and agricultural scientists [3]. Whereas the supply of guano was the main threat to the food production system in the late 1800s, today multiple threats impact the food system. These include the increasing population, the rapid rate of urbanization, a dramatic ongoing depletion of natural resources, and the various impacts of climate change.

The United Nations has recognized the global scope of the problem by issuing a call for developing sustainable development goals [4]. These goals are intended to provide a blueprint to achieve a better and more sustainable future for everyone on the planet. A list of 17 sustainable development goals was identified. These goals underpin the future developmental projects supported by the United Nations. The global food system’s sustainability has a significant role in several developmental goals such as zero hunger, good health, clean water, conserving marine resources, reversing land degradation, and climate action.

A simplified version of today's food system, from farm to fork, spans production to consumption, as seen in Figure 1. The output from agricultural production moves through the processing, storage, and distribution sectors before preparation and consumption either at home or in out-of-home establishments. Primary inputs at various steps in the system include arable land, labor, energy, and water. There are food losses at each stage; waste products, including wastewater, are generated; and greenhouse gas emissions are released into the atmosphere. Each step of the system will be considered next, with a description of some of the threats it faces.

Figure 1.

A simplified version of a modern food system from farm to fork.

2. Sustainability of agricultural production system

In response to the recurring famines in the early twentieth century caused by a lack of sufficient food supply and the rapidly increasing global population, several international research institutes focused on agricultural research were set up around the 1950s. Their mandate included developing science-based approaches to increase agricultural production. Among these institutes, the International Rice Research Institute (IRRI) in the Philippines is well known for developing many new rice varieties, resulting in a dramatic increase in rice production in South and South-East Asia. Similarly, at the International Maize and Wheat Improvement Center (CIMMYT) in Mexico City, Norman Borlaug and his colleagues developed dwarf varieties of wheat and new varieties of maize, significantly increasing the yield of these crops around the globe. Norman Borlaug received a Nobel Prize for his work at CIMMYT. World grain production has grown remarkably during the past five decades. Wheat production increased to almost four to five times what it was in the 1960s [3].

The agricultural production system is now under significant threat from the many facets of climate change as it attempts to meet the impending food gap. The global average temperature has been increasing at an alarming rate, as seen in Figure 2. The last six years have been the hottest years on Earth [5]. This dramatic increase in temperature, trending upwards at a rapid rate, has severe consequences for agriculture. Along with the increase in the global average temperature, there is also a rapid increase in greenhouse gas emissions, mainly carbon dioxide, nitrous oxide, and methane (Figure 3). In each case, dramatic shifts have been occurring since the 1960s. A variety of economic sectors contribute to global greenhouse gas emissions, such as industry, transportation, buildings, electricity and heat production, and agriculture, forestry, and land use. Up to 12% of global greenhouse gas emissions are attributed to agricultural operations (Figure 4) [3]. Estimates by the Intergovernmental Panel on Climate Change (IPCC) indicate that if there is no intervention within the agricultural sector, greenhouse gas emissions are likely to increase by about 30–40% by 2050 [6]. This estimated increase is mostly due to the increasing demands of the population, income growth, and dietary changes.

Figure 2.

Global average temperatures from 1850–2020 (http://berkeleyearth.org/global-temperature-report-for-2020/).

Figure 3.

Greenhouse gas emissions from 1850–2017 (based on data obtained from [5]).

Figure 4.

World greenhouse gas emissions from various economic sectors in percent of total 49.4 Gigaton CO2 equivalent in 2016 (based on data obtained fromhttps://www.wri.org/).

With climate change, the frequency of extreme weather events has been increasing. Examples include heat waves, melting of polar ice resulting in rising sea levels, an increase in the number of heavy precipitation events causing floods, an increase in the length of drought periods, and an increased incidence of wildfires, as observed in California and Siberia. The strong links between agriculture and weather underscore the impact of weather on farming. In many regions with irrigated arable land, as more water is drawn from underground aquifers for irrigation to overcome the effects of droughts, the aquifers are getting depleted. For example, there has been a serious depletion of aquifers in central California in recent decades, causing land subsidence and earthquakes [7]. Assuming the current rate of groundwater pumping for agriculture from the Ogallala Aquifer, it will be depleted by 60% by 2060 [8]. Water drawn from the Ogallala Aquifer is used to meet 30% of the U.S. irrigation requirements. Similar impacts of climate change are seen in the western part of the Gulf of Mexico and the Indo-Gangetic Plain, which serves as India's breadbasket. Aquifers take a very long time to replenish. Therefore, the lowering of the water table in these heavily farmed regions is of grave concern for agricultural production sustainability.

Recent studies on the impact of climate change on agricultural production indicate that there will be a 25% reduction in maize production for most regions of the globe, a 3% reduction in wheat, and an 11% reduction in rice and potatoes [9]. These estimates, indicating significant decreases in crop yield, will challenge efforts to meet the food gap predicted for the next decade. Along with reducing yield, the increased carbon dioxide levels due to greenhouse gas emissions are also projected to lower the crops' nutritional quality. For example, when wheat is grown at high carbon dioxide levels, there is 6–12% less protein, 4–6% less zinc, and 5–7% less iron [6]. The reduction of nutrients in staple crops will have severe consequences for public health. Other climate change-driven impacts include the emergence of new pests and diseases, such as citrus greening, with growing risks and disruptions in the food system. Any shortages and subsequent increases in cereal prices will put more people at risk of hunger. Innovative farming practices are being considered to help mitigate some of the negative impacts of climate change, such as increasing soil organic matter and erosion control, improved land management, genetic improvement of crops for tolerance to heat and drought, and more diversification of the food system to implement integrated production systems. To address the needs of a sustainable agricultural production sector, many academic institutions in the United States are now focused on developing “smart” farming methods, seeking technological innovations in farming that employ more efficient ways to use water and energy. One example is the multidisciplinary program referred to as SmartFarm at the University of California, Davis [10]. Similar efforts are underway at several land-grant universities in the United States.

In assessing the influence of producing foods for human consumption on the global environment, meat and dairy products rank high on the list. Meat production from livestock is responsible for using 30% of global ice-free land and 8% of global freshwater, and it generates 18% of worldwide greenhouse gas emissions [11]. Many public and private institutions are currently engaged in research on developing cultured meats produced in vitro using tissue engineering techniques. Cultured meat production has the potential to substantially lower the impact on the environment. Based on a life cycle assessment study, the environmental impact of cultured meat production in comparison to conventionally produced European meat, depending upon the product selected, shows 7–45% lower energy use, 78–96% lower greenhouse gas emissions, 99% lower land use, and 82–96% lower water use [12]. Cultured meat production offers numerous opportunities for research and development for scale-up from the laboratory to the marketplace.

The increasing trend of urbanization has created numerous megacities worldwide, for example, Mexico City, with a population of 24 million, and Tokyo, with almost 40 million. Many of the cities with large populations face inner-city food deserts. Novel approaches are being considered to meet the need for fresh foods in inner cities by developing urban agriculture, including vertical farming and the production of vegetables and other crops under controlled environments. These new farming methods in urban environments offer considerable opportunities for research and development of sustainable production, processing, and distribution systems.

3. Sustainability in food processing

In a modern food processing plant, it is not uncommon to find equipment designed and built several decades ago, during an era of plentiful water and energy. Since water use and energy use were most often not treated as design constraints, there is considerable opportunity for retrofits and new designs of systems that use water and energy efficiently. To identify such opportunities, industrial data on resource use in processing operations are crucial. Studies aimed at energy accounting conducted in food canning plants provide such data and methodologies [13]. For example, as seen in Figure 5, the energy accounting diagram of canning whole-peeled tomatoes provides quantitative information on energy use in the form of electricity and natural gas and on the mass flow of products. The energy use data obtained from accounting studies are helpful for identifying energy-intensive operations so that modifications can be developed and new equipment designed to conserve energy.

Figure 5.

Energy accounting diagram of canning of peeled tomatoes [13].
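
As a minimal sketch of the energy-accounting idea, assuming invented figures for the electricity and natural gas consumed by a canning line, the following converts both inputs to a common energy unit and reports the specific energy use per tonne of product; none of the numbers are taken from the cited study.

```python
# Hypothetical utility inputs for one shift of a canning line
electricity_kwh = 3500.0        # metered electricity use
natural_gas_m3 = 900.0          # metered natural gas use
product_tonnes = 120.0          # tonnes of canned product produced

KWH_TO_MJ = 3.6                 # exact unit conversion
GAS_HHV_MJ_PER_M3 = 38.3        # assumed higher heating value of natural gas

electricity_mj = electricity_kwh * KWH_TO_MJ
gas_mj = natural_gas_m3 * GAS_HHV_MJ_PER_M3
total_mj = electricity_mj + gas_mj

print(f"Electricity: {electricity_mj:.0f} MJ, Natural gas: {gas_mj:.0f} MJ")
print(f"Specific energy use: {total_mj / product_tonnes:.0f} MJ per tonne of product")
```

Breaking such totals down by unit operation, as in Figure 5, is what makes the energy-intensive steps visible and targetable for retrofit.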

Recent advances in sensor technology, data acquisition, and data handling offer ways to collect and retrieve data using cloud-based systems. Process data from line operations are passed on to the cloud server, stored, and made available to the equipment manufacturer for remote diagnostics and updates. Such systems offer advanced control and maintenance levels to minimize equipment breakdown and the loss of food during manufacturing operations. The development of these systems for the food industry requires skills in the computational field and electronic hardware.

A related emerging area in food manufacturing is creating digital twins of processing equipment. The digital twin technology has its origins in the aircraft industry. There is a digital twin for an airplane in flight, essentially a simulation of the plane fed with live data from the aircraft in flight to help identify any operational issues before they become severe. A similar approach is also feasible in the food processing industry. For any processing equipment, a digital twin operates in a virtual environment, providing valuable information to operators and equipment manufacturers. These systems can reduce frequent interruptions in the processing lines, thus reducing food losses during processing. While artificial intelligence and machine learning are still in their infancy, they promise to minimize human error in food processing operations.

Along with energy, considerable water is used in food processing operations. Water recovery and recycling are vital for sustainability. A typical practice in a food processing plant is to discharge water streams from various processing equipment into a common floor drain. Different water streams containing the various chemicals used in processing and cleaning equipment get mixed in the common drain, and the commingled stream is then conveyed to a water treatment facility. A potential approach to reducing water use and food waste is to treat the effluent from each piece of equipment separately, recovering any food or chemicals and recycling the water in the same or other operations as appropriate. For example, as shown in Figure 6 for canning whole-peeled tomatoes, pure water is used to aid the separation of the peel from the tomato in the disc-peeling process. The effluent from the disc peeler is water with tomato solids. By separately treating the peeler’s effluent using a filtration system, both the tomato solids and the water are recovered. Numerous such examples exist for different food processing operations where economically valuable food and chemicals can be recovered, as long as the discharge from individual operations is handled separately rather than mixed into a common waste stream. Membrane-based separation systems are most suitable for such applications. A comprehensive project conducted at the University of California demonstrated this water recovery and recycling approach at over 50 food processing plants across the United States [14]. This project also reinforced the importance of industrial collaboration in academic research to reduce water use and improve the food system’s sustainability.

Figure 6.

A disc-peeler used to separate tomato peels.
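To make the recovery idea concrete, the following minimal mass-balance sketch uses purely illustrative numbers and assumed separation efficiencies (not data from [14]) to show how a filtration step splits the peeler effluent into recovered solids, recyclable water, and a residual waste stream.

```python
def peeler_effluent_balance(effluent_kg, solids_fraction,
                            solids_capture=0.95, water_recovery=0.90):
    """Simple mass balance for filtering a peeler effluent stream.

    solids_fraction: mass fraction of tomato solids in the effluent.
    solids_capture / water_recovery: assumed separation efficiencies (illustrative).
    """
    solids_in = effluent_kg * solids_fraction
    water_in = effluent_kg - solids_in
    recovered_solids = solids_in * solids_capture      # solids retained by the membrane
    recycled_water = water_in * water_recovery          # permeate returned to the process
    waste = effluent_kg - recovered_solids - recycled_water
    return recovered_solids, recycled_water, waste

# Example with assumed values: 1000 kg of effluent at 4% solids
print(peeler_effluent_balance(1000.0, 0.04))
```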

In designing the next generation of food processing equipment, it is imperative that due consideration be given to design constraints such as low water discharge and minimal energy use. In certain situations these constraints become essential. For example, they were at the forefront of a project to design a food processing system for a manned mission to Mars under a contract with the National Aeronautics and Space Administration (NASA) [15]. Specifically, a multipurpose fruit and vegetable processor was built for operation on the Mars surface (Figure 7). The design of this equipment involved a strict constraint of zero water discharge and the use of minimal energy. Several innovations were introduced to process fresh fruits and vegetables such as tomatoes to create multiple products. Based on the results of parallel research studies to determine optimal processing conditions, a multipurpose processor was fabricated using an ohmic heating system for rapid heating of crushed tomatoes and membranes for separation processes. The final processed products were diced tomatoes, tomato juice, tomato sauce, and tomato paste. Water extracted from tomatoes during the concentration process was recovered and reused for cleaning equipment and other purposes. With minimal energy requirements, the processor, although built for space applications, is equally adaptable to small-scale processing operations on Earth. Notably, the project demonstrated that it is possible to incorporate novel concepts in designing equipment that is highly conserving in its resource use. This equipment scale is particularly well suited for processing the products of urban agriculture with minimal release of effluents in the inner-city setting.

Figure 7.

A multipurpose fruit and vegetable processor built for manned mission to Mars.

Recent developments in the area of additive manufacturing offer new opportunities for precision food processing. While the 3D printing of foods is mostly in the research stage, this technique promises minimal food loss and an efficient process with low water and energy use. Additive manufacturing processes are also being considered in new food product development involving meat analogs derived from plant proteins. Meat analogs are rapidly gaining consumer acceptance. They offer health benefits and improve the sustainability of the food system by reducing reliance on meat from livestock in the traditional diet.

4. Reducing food losses and waste for a sustainable food system

In the United States, food wastes amount to approximately $278 billion annually, equivalent to feeding nearly 260 million people [16]. Globally, more than 1 billion metric tons of food per year never make it to the market. The market value of this lost food is almost a trillion dollars, and it has a significant negative impact on the environment. Food lost and wasted each year results in about 8% of the annual greenhouse gas emissions.

Around the globe, food losses are generally in the range of about 30% [17]. Many factors contribute to food losses, and they vary depending upon the region. In sub-Saharan Africa, a considerable amount of food loss occurs at the production stage, typically on-farm or close to a farm, during the handling and storage of harvested crops. These high losses are often due to a lack of proper infrastructure for the safe storage of cereal grains and a cold chain for perishables such as fruits and vegetables. However, in these regions, food losses during home preparation are generally small. In North America and some of the more industrialized countries, food losses during the production stage are small because of the highly developed infrastructure of the storage and transportation sector. Still, losses increase notably at the home and out-of-home preparation and consumption stage. Therefore, region-based solutions are necessary to reduce food losses for a sustainable food system.

In the food processing sector, trimming, overproduction, product and packaging damage, product graded as of low market value due to esthetic reasons, and technical malfunctions of processing equipment are often cited as fundamental causes of food losses and waste [18]. To minimize these losses in the processing sector, technological know-how and resources for operators need improvement, including training the staff and reengineering processes to avoid product wastage during changes in product lines [17].

In most industrialized countries, packaged foods are often labeled with an expected shelf life to inform the consumer of how long the manufacturer assures safety and quality. While there is considerable merit in providing such information to the consumer, unfortunately, due to the lack of a standard shelf-life dating system, considerable confusion exists in interpreting shelf-life information. Furthermore, both elapsed time and environmental conditions, most notably temperature, affect food quality and safety. Consequently, using only a time-based shelf-life dating system, there is increased food wastage at the consumer level when acceptable food is discarded just because the label indicates that an expiration date has been reached. Since many of the food’s quality characteristics change due to an integrated effect of time and temperature, there has been considerable interest in developing indicators that can be used for objective interpretation of the food’s shelf life. Research in this area originated in the early 1980s [19]. Time-temperature indicators are used commercially in the distribution of vaccines and other medical drugs. They provide an objective indication of any heat abuse that a product may have received during shipment and storage and its remaining shelf life. While the early devices used biochemical or polymeric materials as indicators, with recent advances in electronic sensing and miniaturization, digital indicators are now being investigated for these applications. With cloud-based systems, data obtained from the indicators can be directly transferred to the server and used for inventory management. Such systems can be effectively used in the transportation, distribution, storage, and retail marketing of perishable foods [16, 20].
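As an illustration of how an integrated time-temperature history can be turned into a remaining-quality estimate, the sketch below assumes first-order quality loss with Arrhenius temperature dependence; the reference rate, activation energy, and temperature log are invented for the example and are not taken from [19, 20].

```python
import math

R = 8.314  # universal gas constant, J/(mol*K)

def fraction_quality_remaining(temps_c, dt_hours, k_ref=0.005, t_ref_c=4.0, ea=80_000.0):
    """Integrate first-order quality loss over a temperature history.

    temps_c: list of temperatures (in degrees C) sampled every dt_hours.
    k_ref: rate constant (1/h) at the reference temperature (assumed value).
    ea: activation energy in J/mol (assumed value).
    """
    t_ref = t_ref_c + 273.15
    integral = 0.0
    for temp_c in temps_c:
        t_k = temp_c + 273.15
        # Arrhenius scaling of the rate constant to the current temperature
        k = k_ref * math.exp(-ea / R * (1.0 / t_k - 1.0 / t_ref))
        integral += k * dt_hours
    return math.exp(-integral)   # remaining fraction of the quality attribute

# A brief warm excursion consumes shelf life faster than steady refrigeration:
print(fraction_quality_remaining([4, 4, 12, 12, 4, 4], dt_hours=6))
```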

An emerging technology, blockchain, offers considerable promise to manage and share data in the food distribution systems. Blockchain allows a decentralized approach to distributing encrypted records of data securely over peer-to-peer networks. Besides information about product flow, other data relevant to food safety, quality, and resource use can be efficiently transmitted transparently. In tracking food from production to retail, this technology, when fully implemented, offers the potential to improve the safety and quality of food delivered to the consumer. Integrated systems for accessing and processing data on distribution are especially useful in time-sensitive situations involving product recalls. Innovations in food distribution such as blockchain will be necessary for the quest to improve the sustainability of the food system.

5. Conclusions

The current food enterprise is under multiple threats from increasing population, depletion of resources, and the impact of climate change. Challenges in developing sustainable solutions to address these threats offer numerous research opportunities for innovations in the food processing sector, increase in agricultural production, and reduction in food wastage. Multidisciplinary efforts and cutting-edge developments will be necessary to approach many of the complex problems facing the food and agricultural enterprise. Since food is essential to sustain life, it is indeed the responsibility of everyone to ensure that the food system is sustainable not only for the current but also for future generations. Advances in science and technology are deemed to play a major role in addressing our food system’s future sustainability.

Author details

R. Paul Singh

Department of Biological and Agricultural Engineering, University of California, Davis, CA, USA

*Address all correspondence to: rpsingh@ucdavis.edu


Evaluation of the Cytotoxic Activity of a Species of the Buddleja Genus in a Prostate Cancer Cell Line

Sofía Isabel Cuevas Cianca, Luis Ricardo Hernández and Irene Vergara Bahena

Abstract

Over the centuries, humans have used medicinal plants to treat various diseases. Initially, these medications took the form of crude preparations such as tinctures, teas, poultices, powders, and other herbal formulations. Almost 80% of the world population uses traditional medicines for primary health care, most of which involve the use of plant extracts. The study of plants continues, mainly with the aim of discovering new secondary metabolites that can be used to restore human, animal, or plant health. Cancer is a major public health problem worldwide, and Mexico is not exempt from this problem. However, the great challenge for anticancer treatments is the specific release of the drug in the tumor tissue to avoid adverse effects on normal cells. In this investigation, a species of the Buddleja genus is studied in terms of its cytotoxic activity in a prostate cancer cell line. The results showed that, against a prostate cancer cell line, the polar extract of aerial parts has no cytotoxicity, whereas the medium-polarity extract of aerial parts has high cytotoxicity.

Keywords: Buddleja, prostate cancer, cytotoxicity, medicinal plants

1. Introduction

Cancer constitutes a major public health problem worldwide, and Mexico is not exempt from this problem. However, the great challenge for anticancer treatments is the specific release of the drug in the tumor tissue to avoid adverse effects on normal cells. In Mexico, prostate cancer had the highest incidence among cancers in men, with 41.6 cases per 100,000 inhabitants in 2018. Worldwide, prostate cancer was the second most frequently diagnosed cancer, after lung cancer, in 2018 (Figure 1). Prostate cancer also had the highest mortality among cancers in Mexican men, with 10 deaths per 100,000 inhabitants in 2018, and the highest 5-year prevalence in men, with 55,565 cases from 2013 to 2018 (Figure 2) [1].

Figure 1.

Comparison of the incidence and mortality worldwide and in Mexico of prostate cancer in men of all ages (based on [1]).

Figure 2.

Estimated 5 year prevalence in Mexico of prostate cancer in men of all ages (based on [1]).

Prostate cancer occurs when healthy prostate cells change and proliferate uncontrollably, eventually forming a tumor. A tumor can be malignant or benign: a malignant tumor can grow and spread to other parts of the body, whereas a benign tumor can grow but will not spread [2]. Some types of prostate cancer grow very slowly and may not cause symptoms or problems for years. Even when prostate cancer has spread to other parts of the body, it can often be controlled for a long time, allowing men even with advanced prostate cancer to live with good health and quality of life for many years. However, if the cancer cannot be controlled well with existing treatments, it can cause symptoms such as pain and fatigue and can sometimes lead to death. An important part of managing prostate cancer is monitoring its growth over time, to determine whether it grows slowly or quickly [3].

Over the centuries, medicinal plants have been used as crude medicines in the form of tinctures, teas, poultices, and powders to treat all kinds of diseases. Currently, 80% of the world population uses traditional medicines, most of which involve the use of plant extracts, and 50% of all medicines in clinical use worldwide come from plants, with higher plants providing no less than 25% of the total [4, 5].

The chemical study of the plant kingdom has provided a large number of potentially useful compounds, and since only a small percentage of the planet’s higher plant species have been investigated for their active compounds, the chemical study of plants remains a promising route for the discovery of pharmacologically useful compounds [6].

Recent phytochemical studies of plants, whether or not they have an ethnobotanical history of use in the treatment of cancer, have often resulted in the isolation of principles with antitumor activity, yielding active metabolites such as flavonoids and chalcones [7], alkaloids [8, 9], sesquiterpene lactones [10], diterpenes [11], and cardenolides [12], among others, which were shown to have activity against cancer cells.

The Buddleja genus (Fam. Scrophulariaceae Juss.) has around 300 species of shrubs, including both perennial and deciduous species. This group of plants is native to the Americas, from the southern United States to Chile, as well as to Africa and warm parts of Asia (Figure 3). Dioecious plants are found from the southern United States to Chile, while monoecious plants are found in Africa and Asia [13].

Figure 3.

The geographical location of plants of the Buddleja genus (recovered from www.Tropicos.org).

There are some studies on species of the Buddleja genus. The chemical composition of Buddleja polystachya essential oil was analyzed, and it was found to contain terpenes such as bulnesol and limonene; this oil showed cytotoxic activity against carcinoma cell lines [14]. The antiproliferative and apoptotic activity of Buddleja davidii extracts was studied in gastric cancer and breast cancer cell lines, where it was concluded that colchicine and luteolin induce apoptosis in tumor cells, which makes them potential drugs for the treatment of carcinoma [15].

It should be emphasized that studying the cytotoxicity of extracts and fractions of a plant of the Buddleja genus is of great importance since previous studies of plants of the same genus have yielded highly cytotoxic results, which may be an indication that the plant has anticancer activity.

2. Methodology

To determine the cytotoxic activity of the extracts, tumor cell growth was quantified through the ability of living cells to reduce the yellow dye 3-(4,5-dimethyl-2-thiazolyl)-2,5-diphenyl-2H-tetrazolium bromide (MTT) to a purple formazan product. The cells were seeded and incubated at 37°C in an atmosphere supplemented with 5% CO2. Once the cells reached 80% confluence, the products to be evaluated were added to the cell cultures at different concentrations. After 24 h of incubation of the treated cells, 40 μL/well of MTT solution (5 mg/mL in phosphate-buffered saline) were added, and the plates were incubated for 3 h under 5% CO2 and 95% air at 37°C. At the end of the 3 h of incubation, 400 μL/well of solubilizing solution were added with gentle shaking, and the microplate was kept at room temperature in darkness for 24 h. Absorbance was then determined with a microplate reader at 490 nm. The percentage of cell viability was calculated using the following formula:

$\text{\% cell viability} = \dfrac{A_t - A_b}{A_c - A_b} \times 100$ (1)

where At is the absorbance value of the test compound, Ab is the absorbance value of the blank, and Ac is the absorbance value of the control. The effects of the extracts were expressed as LC50 values (the drug concentration necessary to reduce cell viability to 50% with respect to untreated cells) [16].
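As a computational companion to Eq. (1), the sketch below computes percent viability from raw absorbances and estimates LC50 by linear interpolation between the two concentrations that bracket 50% viability; the interpolation scheme and the example values are assumptions for illustration, not the exact procedure of [16].

```python
import numpy as np

def percent_viability(a_test, a_blank, a_control):
    """Eq. (1): % cell viability = (At - Ab) / (Ac - Ab) * 100."""
    return (a_test - a_blank) / (a_control - a_blank) * 100.0

def estimate_lc50(concentrations, viabilities):
    """Interpolate the concentration giving 50% viability (simple linear bracket)."""
    c = np.asarray(concentrations, dtype=float)
    v = np.asarray(viabilities, dtype=float)
    order = np.argsort(c)
    c, v = c[order], v[order]
    for i in range(len(c) - 1):
        if (v[i] - 50.0) * (v[i + 1] - 50.0) <= 0:        # bracket around 50% found
            frac = (50.0 - v[i]) / (v[i + 1] - v[i])
            return c[i] + frac * (c[i + 1] - c[i])
    return None  # 50% viability not reached within the tested range

# Example with illustrative absorbance and concentration values:
v = [percent_viability(a, 0.05, 1.20) for a in (1.10, 0.95, 0.60, 0.30)]
print(v, estimate_lc50([10, 25, 50, 100], v))
```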

3. Results

The results obtained in this study showed that the medium-polarity extract of aerial parts of the Buddleja plant has cytotoxic activity against a prostate cancer cell line, while the polar extract of aerial parts does not. Further work is needed on the extract that showed activity, namely a chromatographic separation followed by isolation of the compounds responsible for the desired cytotoxic activity.

4. Conclusions

Overall, this study showed that the medium-polarity extract of aerial parts of the plant has cytotoxic activity against a prostate cancer cell line; the next step is to determine which compounds are responsible for this biological activity and thus obtain potential drugs for the treatment of cancer. This study provides only basic data; further studies are necessary to isolate and identify the biologically active substances from these extracts, as well as to determine, through flow cytometry, the type of cell death they cause.

Author details

Sofía Isabel Cuevas Cianca, Luis Ricardo Hernández* and Irene Vergara Bahena

Department of Chemical-Biological Sciences, Universidad de las Américas Puebla, Puebla, Mexico

*Address all correspondence to: luisr.hernandez@udlap.mx


Designing Magnetic Mesoporous Nanoparticles for Cancer Therapy

Jessica Andrea Flood-Garibay, Kenneth J. Balkus Jr and Miguel Ángel Méndez-Rojas

Abstract

Cancer is the second leading cause of mortality worldwide. The most common treatments are surgery, radiotherapy, and chemotherapy. Magnetic mesoporous nanoparticles (MMNPs) have attractive features such as high surface areas, large pore volumes, uniform and tunable pore sizes, high mechanical stability, and surface functionalization options for application as drug delivery systems. These features make them a promising platform for cancer treatment. Magnetic properties can be controlled by selecting the chemical nature and concentration of the magnetic materials to be embedded into the porous structure. These magnetic composites may be guided to allow precise targeting of a tumor using an external magnetic field. The mesoporous structure can also be loaded with different types of therapeutic agents, radiotracers, or fluorescent markers. Doping of the magnetic nanocomposite with rare earth elements may generate novel composites with physical properties useful for medical imaging or radiotherapy. The MMNPs can generate hyperthermia temperatures when exposed to an alternating magnetic field (AMF). Many promising anticancer drugs have poor solubility, a problem that can be solved by using the MMNPs as nanocarriers, improving the bioavailability of the drugs. These MMNPs could become a promising multifunctional platform for the design of chemotherapeutic, medical imaging, drug delivery, and hyperthermia agents for cancer treatment.

Keywords: mesoporous, magnetic, nanoparticles, drug delivery system, theranostic

1. Introduction

Although great advances in the treatment and cure of several public health issues have been achieved in the last decades, cancer is still a major burden worldwide. Cancer has been the second or third leading cause of death in both the United States and Mexico over the last decade [1]. Tens of millions of people are diagnosed with cancer every year, and it is considered one of the main causes of death globally. In the USA, there were more than 1,700,000 new cases of cancer diagnosed in 2018, with nearly 600,000 people dying from the disease, while in Mexico it was projected that nearly 1,200,000 cancer cases will be diagnosed in the next few years. Lung cancer is the leading cause of cancer death in the US and Mexico, and this is expected to increase in the coming years. Cancer therapies include surgery, chemotherapy, and/or radiotherapy. For some types of cancer, there is also the possibility of specific targeted therapy.

Chemotherapy involves the use of cytotoxic compounds that are not specific to cancer cells, which is why it usually causes multiple serious side effects in patients [2]. In order to decrease side effects, improve bioavailability, and achieve selective release to tumor cells, intelligent drug delivery systems (DDS) are being developed. DDS should avoid high nonspecific accumulation in tissues [3], and the drug carrier material should be biocompatible. Furthermore, a sufficient dose of the active pharmaceutical ingredient (API) should be loaded into the system, and drug release should occur without premature leakage. In this way, the API can be delivered to the target site in a controlled manner, maintaining an adequate release rate to achieve an effective local concentration of the drug [4].

The development of new nanomaterials for biomedical applications is a rapidly growing area of research. The use of nanoparticles (NPs) as drug carriers may present different advantages, such as protecting the drug from degradation, reducing renal clearance, and allowing specific bioaccumulation in cancerous tumors due to the enhanced permeability and retention (EPR) effect. Magnetic nanoparticles (MNPs), in particular iron-based ferrites, are highly attractive because their magnetic properties can be easily tuned by controlling the type and ratio of metal ion substituents. Many of them have been found to be highly stable, even under physiological conditions, as well as biocompatible. Their small size may allow them to pass through several biological barriers, increasing their systemic circulation and enhancing biodistribution. Rare-earth ions can be embedded into the crystal lattice, making it possible to transmute them into beta or gamma emitters by neutron activation. Also, the large surface of the mesoporous material can be used for the immobilization of different types of fluorescent dyes or biomarkers that improve both traceability and molecular recognition specificity. Precisely localizing a tumor site, either using a radiation detector or the luminescence of the nanomaterial, could be of great value for targeted delivery, helping to minimize the amount of radiation or chemotherapeutic agent that the patient receives, thus reducing undesirable side effects. There are several examples of nanomaterials used to deliver radionuclides in vivo [5]. However, controlling size to achieve the EPR effect, as well as functionalization and targeted delivery, remain challenges. The incorporation of radioactive isotopes into the spinel crystal structure of magnetic ferrites is a good option to achieve that goal without compromising the size, biocompatibility, stability, or magnetic properties of the proposed nanomaterials. Another strategy could be doping the mesoporous structure around the MNPs with the radioisotope ions. That may be achieved either by adding the radioisotope-containing metal salts during the mesoporous phase synthesis or by dispersing and trapping the ions into the mesoporous structure once the material is formed. The high surface area of the mesoporous structure, depending on the choice of chemical composition and crystalline phase, may present the advantage of being easily functionalized with radiosensitizers or fluorescent dyes, and/or of trapping different types of chemotherapeutic agents in the mesoporous structure to further reduce the amount of radiation required to eliminate a tumor. In particular, the possibility of improving the bioavailability and aqueous dispersibility of poorly soluble chemotherapeutic agents makes these magnetic mesoporous composites of great value for the transport and delivery of several promising anticancer drugs with poor water solubility, such as taxanes (paclitaxel, docetaxel), platinum-based drugs, curcumin, and many others (Figure 1). This is important, as poorly water-soluble drugs usually require the use of a high concentration of surfactants and co-solvents, or the administration of doses of the drug for longer periods, leading to adverse side effects [6].

Figure 1.

Examples of cytotoxic agents used for cancer chemotherapy that present low solubility and, therefore, bioavailability problems.

Therefore, the development of new strategies for the treatment of this disease is urgently needed. The development of functionalized nanoparticles for medical imaging, diagnosis, and chemo- and radiotherapeutic applications depends in part on effective tumor targeting. Conventional approaches using tumor-binding ligands have been effective in cell cultures but disappointing in vivo. Nonconventional targeting approaches, such as magnetic nanoparticles (MNPs), are promising but in the early stages of development. The preparation of magnetic nanoparticles is a very attractive and active research field. In addition to advanced clinical treatments in modern anticancer therapies, MNPs can be used in several other practical applications such as biomarkers, magnetic storage, biomolecule separation, sensors, and medical imaging contrast agents. In particular, superparamagnetic iron oxide nanoparticles (SPIONs) offer higher biocompatibility than other MNPs such as maghemite and have been widely used in several biomedical applications. Although some biocompatible, nanostructured MNPs with excellent stability, improved magnetic properties, and good biodistribution have received approval for clinical use, such as Feridex®, Resovist®, Sinerem®, Clariscan®, and Lumirem®, they have been discontinued for biomedical use as MRI agents due to potential harmful side effects following administration [7]. However, their potential use as therapeutic agents may still make these materials clinically viable, since fewer MNPs would be required compared to MRI use, reducing potential side effects; careful assessment of toxicity and biocompatibility is nevertheless a must for these magnetic materials in view of real clinical applications. Biomedical applications require MNPs to be superparamagnetic in order to avoid spontaneous aggregation in vivo while they move through systemic circulation. Aside from their potential use as MRI contrast agents, MNPs can be used for drug transport and delivery, as well as for magnetic heat generation (hyperthermia). The advantages of MNPs in nanoscale delivery systems are numerous: drug delivery can be enhanced, and the biodistribution of the nanocarrier increased, by avoiding clearance thanks to their small size and stability under physiological conditions. MNPs can be chemically modified at their surfaces by attaching functional molecules, such as proteins, antibodies, peptides, or sugars, in order to enhance bioselectivity and achieve fine-tuned drug delivery and bioaccumulation in specific targets, in particular tumor tissues [8].

The magnetic response of MNPs can be controlled by transition metal ion substitution in the crystal lattice, a strategy highly exploited for the preparation of numerous magnetic ferrites with spinel structure [9]. Substitution using transition metal and rare-earth elements is an active field of research, looking to enhance saturation magnetization (Ms), permittivity, permeability, and blocking temperature (TB). Several works available in the scientific literature report the design of small MNPs with controlled magnetic properties and low dispersion, with sizes less than 35 nm, by the formation of core-shell structures using the co-precipitation method [10, 11].

As a proof of concept of this idea, iron oxide nanoparticles containing Ho(iii) were neutron activated and injected into athymic nude mice bearing tumors of non-small cell lung cancer (NSCLC) A549 cells [5]. A 12,000 Gauss magnet was placed on the tumor for 4 hours to allow the Ho-doped magnetic nanomaterial to collect in the tumor. There was a statistically significant reduction in tumor size after 30 days and a 10-fold increase in Ho accumulation in the tumor with the magnet. While these results were promising, the Ho-doped magnetic nanomaterial presented several problems, including the difficulty of functionalizing its surface and its relatively large size, which may lower the chances of cell internalization and efficient biodistribution. Furthermore, its low-intensity magnetic properties may not be appropriate to reach tumors below the surface. The ability to functionalize the surface of the MNP allows for the introduction of radiosensitizers and chemotherapeutic drugs, as well as promoting the suspension of the MNPs. The size is important to achieve the enhanced permeability and retention (EPR) effect for tumor penetration. Finally, the magnetic properties are important because treatment of certain cancers, such as lung cancer, may require the MNPs to be directed by a magnet several centimeters away.

2. Methods of preparation

Magnetic nanoparticles can be prepared easily by co-precipitation in alkaline aqueous media. Aqueous preparation is preferable for obtaining products meant to be used in biomedical applications. Ferrite nanoparticles, either pristine or doped with rare-earth ions, can be prepared by the addition of the corresponding Fe(iii), Fe(ii), and rare-earth salt precursors in oxidation state (iii) [X(iii): Ce, Pr, Nd, Pm, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, Lu], in an appropriate stoichiometry that allows control of the total amount of rare-earth ions incorporated into the spinel structure, as previously reported [7, 10, 11]. Several rare-earth salts are available as nitrates, halides, or oxides from several commercial sources. Preparation of the rare-earth-doped magnetic nanoparticles can also follow several other modified synthetic procedures reported in the literature [9, 12]. Different types of magnetic ferrites, with an M²⁺X³⁺ₓFe³⁺₂₋ₓO₄ stoichiometry, where M = Zn, Co, Ni, Mn, or Cu, can be prepared by selecting the proper amounts of the metal salt precursors. Previous work shows that superparamagnetic and biocompatible MNPs with strict size control (from 8 to 20 nm) can be produced by this synthetic methodology [10, 11]. MNPs produced under these conditions are nearly monodisperse (10–15 nm), with zeta potential values higher than −30 mV, low blocking temperatures (TB), and high saturation magnetization (Ms), which make them small enough for internalization into tumors, stable, water-soluble, highly responsive to external magnetic fields, and suitable for biomedical applications. We have also recently explored how the incorporation of rare-earth metals induces not only structural changes but also impacts the magnetic properties, so the novel Ho-containing MNPs will have controlled magnetic and size properties [13, 14].
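As an illustration of the stoichiometric control mentioned above, the sketch below computes the moles of each metal ion needed for a target amount of an M²⁺X³⁺ₓFe³⁺₂₋ₓO₄ ferrite; the choice of Zn and Ho and the target quantity are illustrative assumptions, and the calculation is salt-agnostic (it counts metal ions, regardless of whether nitrate, halide, or oxide precursors are used).

```python
def ferrite_ion_moles(x, moles_product, divalent="Zn", dopant="Ho"):
    """Moles of each metal ion for moles_product of M(2+) X(3+)_x Fe(3+)_(2-x) O4.

    x: rare-earth doping level per formula unit (0 <= x <= 2, typically small).
    """
    if not 0.0 <= x <= 2.0:
        raise ValueError("doping level x must lie between 0 and 2")
    return {
        divalent: 1.0 * moles_product,        # one divalent ion per formula unit
        dopant: x * moles_product,            # x trivalent dopant ions
        "Fe": (2.0 - x) * moles_product,      # remaining trivalent sites filled by Fe(3+)
    }

# Example: 0.01 mol of ZnHo0.05Fe1.95O4
print(ferrite_ion_moles(0.05, 0.01))
```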

The surface of the magnetic nanoparticles can be easily modified by coating it with a layer of a conveniently selected mesoporous material (SiO2, carbon, ZnO…). Coating MNPs with a thin layer of silica can be used to produce core-shell nanoparticles with an active surface that can be easily modified and derivatized (Figure 2). Once the surface of the MNPs is modified, the functional groups present on the surface can be used to grow an additional mesoporous layer, in order to increase the internal surface area available for drug loading, or to attach different chemical functionalities such as bioactive molecules (peptides, amino acids, antibodies, sugars) and fluorescent dyes, or to immobilize rare-earth ions useful for radiotherapy or medical imaging.

Figure 2.

Schematic representation of the process for the preparation of core-shell MMNPs.

A second approach for the preparation of MMNPs is to embed the magnetic nanoparticles in the mesoporous matrix by seeding them during the formation of the mesoporous structure (Figure 3). The magnetic nanoparticles can also be trapped in the voids of the mesoporous structure by sonication, stirring, or simple mixing, depending on the affinity between the materials.

Figure 3.

Schematic representation of the process for preparation of MMNPs where the magnetic nanoparticles were trapped into the mesoporous structure.

After preparation and purification, the products obtained from any of these strategies can be characterized using several analytical techniques such as Fourier transform infrared (FT-IR) spectroscopy, Raman spectroscopy, fluorescence spectroscopy, dynamic light scattering (DLS), thermogravimetric analysis (TGA), powder X-ray diffraction (pXRD), magnetometry, energy dispersive spectroscopy (EDS), BET surface area analysis, and scanning and transmission electron microscopy (SEM and TEM, respectively). Once the magnetic mesoporous nanoparticles have been fully characterized, in vitro testing of the MMNPs can be performed using a panel of different cell lines (normal and cancer cells) in order to evaluate their biological activity. There are several methods to determine cell viability, such as the MTT viability assay, which is a quantitative colorimetric assay based on the conversion of MTT to formazan crystals by mitochondrial dehydrogenase. In vivo testing in small animal models may give further information on the effectiveness and performance of these MMNPs for cancer treatment, as well as on the toxicology of the nanomaterials. Morphological changes such as cell shrinkage, membrane blebbing, apoptotic body formation, cytoplasmic swelling, and cytopathic effects in cells treated with MMNPs may also be useful for better understanding the mechanisms of the biological interaction between the MMNPs and the cells. Epifluorescence microscopy analysis of the cell cultures, using wells differentially stained with different types of dyes, may also be useful for understanding the mechanisms of internalization and cell death.

3. Conclusions

The design, synthesis, and characterization of MMNP systems that are stable, water-soluble, and biocompatible, with good size control and distribution, is a promising field for the design of innovative nanoplatforms for cancer therapy. NPs with sizes below 100 nm and an optimal size distribution are more easily dispersed in physiological aqueous suspensions, allowing the nanoparticles to be bioavailable and facilitating cell internalization through endocytosis or pinocytosis. Loading the MMNP systems with poorly soluble anticancer drugs in the mesoporous structure, rather than on the surface of the nanoparticles, may be useful to improve the transport and bioavailability of these therapeutic agents, increasing their performance and lowering their side effects. Preliminary studies in our group showed that silica-based MMNPs are biocompatible, as no impact on cell viability was observed even at high concentrations of the mesoporous material. When the chemotherapeutic agent was loaded into the MMNPs, testing showed that cell viability was affected even when low concentrations were loaded into the nanocarrier. Comparison with cell cultures exposed to the free anticancer drug showed lower antiproliferation activity with respect to that of the drug-loaded WMS nanoparticles, indicating an enhancement of bioavailability for the chemotherapeutic agent under the conditions of this study. These preliminary results are encouraging and suggest that MMNPs could become an effective alternative for the treatment of certain types of cancer.

Acknowledgements

Financial support from ConTex-CONACYT (2019-21B) and CONACYT (JAFG, Ph.D. Scholarship) is acknowledged.

Author details

Jessica Andrea Flood-Garibay1, Kenneth J. Balkus Jr2 and Miguel Ángel Méndez-Rojas1*

1 Departamento de Ciencias Químico-Biológicas, Escuela de Ciencias, Universidad de las Américas Puebla, Puebla, Mexico

2 Department of Chemistry and Biochemistry, University of Texas at Dallas, USA

*Address all correspondence to: miguela.mendez@udlap.mx


Exoplanet Research Using Machine Learning and Multiresolution Analysis Techniques

Miguel Jara-Maldonado, Vicente Alarcon-Aquino and Roberto Rosas-Romero

Abstract

The study of planets from outside our Solar System, termed exoplanets, has opened a wide range of new possibilities. Some of the current interests in exoplanet research are related to their discovery and the characterization of their atmospheres. Finding these planets is important because it may lead to answering several questions, such as how planets and stellar systems form, and possibly to finding life outside planet Earth. There are several works that propose using artificial intelligence to ease the processes involved in exoplanet research. Many studies have focused on the detection of such celestial bodies, as well as on reducing the number of false detections. Recently, the study of exoplanet atmospheres has also received considerable attention, due to its potential for finding life on these planets. In this work, we describe an artificial intelligence approach for reducing the number of spurious detections of exoplanets using the transit technique. This approach is based on spectral multiresolution analysis techniques, which allow the artificial intelligence algorithms to better identify the exoplanet signals.

Keywords: artificial intelligence, deep learning, exoplanets, light curves, machine learning, multiresolution analysis, neural networks

1. Introduction

The term exoplanet is an abbreviation of extrasolar planet. Exoplanets are planets found outside our Solar System, either orbiting a star or not. Their study is important for several reasons, such as obtaining statistical information about planets, which in turn allows us to extend our understanding of how our Solar System was created. One of the reasons to study exoplanets is to look for habitable planets outside the Solar System, which could lead to finding life outside planet Earth (although no evidence of life has yet been found in exoplanet atmospheres) [1]. In order to search for exoplanets, several missions have been launched, such as Kepler [2, 3], the Convection, Rotation and Planetary Transits (CoRoT) space observatory [4], and the Transiting Exoplanet Survey Satellite (TESS) [5].

In order to look for exoplanets, astronomers have developed different detection techniques. Among the most used are the transit method, radial velocity, gravitational microlensing, and direct imaging, among others. In this work, we focus on the transit method. This method looks for transits, which occur when an exoplanet passes between the observer and its host star. To look for transits, scientists use light curves, which are records of the light flux received from the star at different moments in time. When an exoplanet transits its star, a reduction of the light flux characterized by a “U” or “V” shape is observed. This technique has provided the greatest number of exoplanet discoveries. However, this technique is not infallible, and it is sensitive to noise sources that may look like transits or that hide the transit signal. In order to deal with these and other difficulties (see [1]), several artificial intelligence algorithms have been developed [6, 7, 8, 9, 10, 11, 12]. These approaches aim to improve the detection and identification accuracy of exoplanet transit signals within the light curves.

In this work, we summarize the work done in [1, 13], where simulated light curves are used to test the performance of artificial intelligence and multiresolution analysis techniques for exoplanet identification.

2. Methodology

Automating the exoplanet discovery process requires a pipeline that provides clear instructions for the artificial intelligence algorithms to work with. We proposed a data pipeline in [1] that establishes the whole process of exoplanet discovery with artificial intelligence. This pipeline is shown in Figure 1. The data acquisition step refers to the process of obtaining the light curves to work with. These light curves may be obtained from real telescopes (such as the Kepler satellite) or by simulation. The light curves contain different noise sources that hinder their analysis. For this reason, the next step is to preprocess the light curves in order to reduce the influence of noise. With the transit signals enhanced, the detection step may be performed by an artificial intelligence algorithm to search for periodic signals within the light curves that could be explained by an exoplanet. Finally, the periodic signals found must be analyzed to make sure that they belong to an exoplanet and not to an event of similar geometry. In the remainder of this section, we explain how we applied this pipeline to simulated light curves generated by us to identify exoplanet signals.

Figure 1.

Proposed pipeline for exoplanet discovery.
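A minimal, self-contained skeleton of the four pipeline stages in Figure 1 might look as follows; the function names, the toy detection rule, and the simulated input are our own illustrative placeholders, not the implementation of [1].

```python
"""Skeleton of the four-stage pipeline in Figure 1 (names are illustrative, not from [1])."""
import numpy as np

def acquire():
    """Data acquisition: return a list of light curves (here, trivially simulated)."""
    return [np.random.normal(1.0, 0.001, 15000) for _ in range(3)]

def preprocess(flux):
    """Preprocessing placeholder: flattening, folding, and binning would go here."""
    return flux

def detect(flux):
    """Detection placeholder: a trained classifier would score the curve here."""
    return flux.min() < 0.997   # toy rule: any sufficiently deep dip counts as a candidate

def vet(flux):
    """Vetting placeholder: check that the candidate is not an astrophysical mimic."""
    return True

candidates = [lc for lc in map(preprocess, acquire()) if detect(lc) and vet(lc)]
print(len(candidates), "candidate(s) flagged")
```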

2.1 Light curve datasets creation

We generated two simulated datasets consisting of 10,000 light curves each. For each dataset, half of the light curves contain simulated transits and the other half do not. Each light curve contains 15,000 datapoints. These datasets can be used to train and test machine learning algorithms for exoplanet identification with controlled, though realistic, noise sources. The presented work considers four different types of transit models. Furthermore, we explain the light curve preprocessing methodology that has been used by several works such as [6, 7, 14]. The first dataset, called the Real-LC dataset, was generated using real light curves from the Mikulski Archive for Space Telescopes (MAST) with periodic events marked as non-transiting planets, and then adding simulated transits to them. The second one, called the 3-median dataset, was created by simulating the light curves and then adding the simulated transits. The following describes how the light curves were simulated.

There are several models that can be used to generate simulated transit light curves; some examples can be found in [15, 16, 17, 18, 19]. We used the BAsic Transit Model cAlculatioN (BATMAN) model proposed in [15], which is a Python package based on several models such as [16, 20]. We selected this model because it uses the model proposed by [8] and allows one to compute light curves very quickly. Also, it can be parallelized with OpenMP (in case it were necessary to produce a greater number of samples), and it includes a wide variety of limb darkening models, including the uniform, linear, quadratic, and nonlinear models, which we used. Moreover, it can generate secondary eclipses, which are useful for accounting for these astrophysical false positive phenomena. An example of a simulated transit is presented in Figure 2, which was generated using the BATMAN nonlinear model.

Figure 2.

Example of a simulated transit light curve using the BATMAN nonlinear model.
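A minimal example of generating a nonlinear limb-darkening transit with the batman-package interface might look as follows; the parameter values are illustrative (only the limb-darkening coefficients follow Table 1), and the exact settings used in [13] may differ.

```python
import numpy as np
import batman

params = batman.TransitParams()
params.t0 = 0.0                     # mid-transit time (days)
params.per = 10.0                   # orbital period (days)
params.rp = 0.1                     # planet radius in stellar radii
params.a = 15.0                     # semi-major axis in stellar radii
params.inc = 88.0                   # orbital inclination (deg)
params.ecc = 0.0                    # eccentricity
params.w = 90.0                     # argument of periastron (deg)
params.limb_dark = "nonlinear"      # nonlinear limb-darkening law
params.u = [0.5, 0.1, 0.1, -0.1]    # coefficients as listed in Table 1

t = np.linspace(-0.3, 0.3, 150)     # 150 datapoints across the transit window
model = batman.TransitModel(params, t)
flux = model.light_curve(params)    # normalized flux containing the transit dip
```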

In order to add noise, we used Eqs. (1)–(4) [7]. The generated noise adds quasi-periodic systematic trends to the simulated transit data.

$t = t - t_{\min}$ (1)

$A(t) = A + A \sin\!\left(\dfrac{2\pi t}{P_A}\right)$ (2)

$\omega(t) = \omega + \omega \sin\!\left(\dfrac{2\pi t}{P_\omega}\right)$ (3)

$F(t) = F_{\text{transit}}(t)\,\mathcal{N}\!\left(1,\ \dfrac{R_p^2/R_s^2}{\sigma_{tol}}\right)\left[1 + A(t)\sin\!\left(\dfrac{2\pi t}{\omega(t)} + \phi\right)\right]$ (4)

where F_transit(t) is the simulated transit signal created using BATMAN, t is time, A is the amplitude of the stellar variability, ω is the period of oscillation, φ is the phase shift, Rp is the planet radius, Rs is the star radius, σ_tol is the noise parameter, and N is a Gaussian distribution used to generate random numbers with a mean of 1 and a standard deviation of (Rp²/Rs²)/σ_tol, as explained in [7]. Each dataset contains two types of light curves, namely light curves containing a transit and light curves that do not. Notice that, for generating light curves without the transit signal, the F_transit(t) term of Eq. (4) is omitted.
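The noise model of Eqs. (1)–(4) can be sketched as follows; because the rendered equation in the source does not make the operator between the transit signal and the noise terms explicit, the multiplicative combination used here is an assumption.

```python
import numpy as np

def quasi_periodic_noise(t, A, omega, phi, P_A, P_omega, rp_rs2, sigma_tol,
                         f_transit=None, rng=None):
    """Apply the quasi-periodic systematics of Eqs. (1)-(4) to a transit model.

    f_transit: transit flux from BATMAN (None -> non-transit light curve).
    rp_rs2: (Rp/Rs)**2, i.e. the transit depth used to scale the Gaussian noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = t - t.min()                                         # Eq. (1): shift time to start at zero
    A_t = A + A * np.sin(2 * np.pi * t / P_A)               # Eq. (2): time-varying amplitude
    w_t = omega + omega * np.sin(2 * np.pi * t / P_omega)   # Eq. (3): time-varying period
    noise = rng.normal(1.0, rp_rs2 / sigma_tol, size=t.size)
    systematics = 1.0 + A_t * np.sin(2 * np.pi * t / w_t + phi)
    base = 1.0 if f_transit is None else f_transit          # omit F_transit for non-transit curves
    return base * noise * systematics                       # Eq. (4), multiplicative form (assumed)
```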

The parameters used to simulate the transits are presented in Table 1. These parameters were chosen from a list of 140 real exoplanets presented in the Q1-Q17 Kepler Data Release 24 [11], which were discovered using the transit method. In Table 2, the parameters used to simulate the noisy light curves are presented.

Fixed transit parameter | Values
Stellar radius (Rs) | 0.12–2.59 Solar radii
Planet radius (Rp) | 0.063–1.98 Jupiter radii
Scaled semi-major axis (a/Rs), where a is the semi-major axis | 0.0058–0.2535 AU
Argument of periastron (Ω) | 90
Mid transit time (t0) | 75 days
Transit resolution | 150 datapoints
Phase offset (Φ) | 0
Amplitude variability period (PA) | 100
Wave variability period (Pω) | 100
Light curve length | 15,000 datapoints
Limb darkening model | Uniform, linear, quadratic, nonlinear
Limb darkening coefficients (u1, u2, u3, u4) | [none], [0.5], [0.5, 0.1], [0.5, 0.1, 0.1, −0.1]
Transit duration | 0.253–0.4113 days
Transit depth | 0.0085–3.23%
Orbit eccentricity (e) | 0–0.53
Orbit inclination (i) | 78.3–96.5 deg
Orbital period (P) | 0.0253–46.69 days

Table 1.

Simulated transit parameters.

Varying transit parameter | Values
Noise parameter (σtol) | 0.25, 0.75, 1.25, 1.75, 2.25, 2.75, 3, 10
Wave amplitude (A) | 0.025, 0.05, 0.1, 0.2
Wave period (ω) | 6/24, 12/24, 24/24
Phase offset (Φ) | 0
Amplitude variability period (PA) | −1, 1, 100
Wave variability period (Pω) | −3, 1, 100

Table 2.

Noisy light curve simulation parameters.

After simulating the light curves, they can be preprocessed in order to accentuate the transits and reduce the noise sources. We used the spline fitting method proposed in [6] to preprocess the Real-LC light curves, and a 3-median filter was applied to the 3-median dataset. This process is also called flattening, and it is performed to remove confounding trends from the light curve. An example of a simulated light curve is presented in Figure 3. In this figure, each vertical blue line represents a transit, and the red line represents the mid-transit time of the first transit present in the light curve. This light curve consists of 15,000 simulated datapoints, to which a transit signal simulated using the BATMAN model was added.

Figure 3.

Simulated light curve using synthetic noise and the BATMAN model.
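A minimal sketch of the two flattening options mentioned above is given below; the spline variant is only a stand-in for the method of [6], and the smoothing factor is an assumed value.

```python
import numpy as np
from scipy.signal import medfilt
from scipy.interpolate import UnivariateSpline

def flatten_median(flux, kernel_size=3):
    """3-median filtering: replace each point by the median of its 3-point neighborhood."""
    return medfilt(np.asarray(flux, dtype=float), kernel_size=kernel_size)

def flatten_spline(time, flux, smoothing=1.0):
    """Spline-based flattening: fit a smooth trend and divide it out.

    time must be strictly increasing; the smoothing factor is an illustrative assumption,
    not the exact configuration of [6].
    """
    trend = UnivariateSpline(time, flux, s=smoothing * len(flux))(time)
    return flux / trend
```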

The next step is to phase fold the light curve to overlap all the points in the light curve, using the transit event as the center. We used the PyAstronomy Python package. An example of a folded light curve is presented in Figure 4. There is a major dip in the light flux in the middle of the folded curve, which corresponds to the transit. There are also other dips that could belong to another transit within the same light curve, although these are not centered because they do not correspond to the event being analyzed in this example.

Figure 4.

Phase folded light curve.
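Phase folding can be sketched with the PyAstronomy foldAt helper as follows; the recentering of the transit at phase zero and the assumed period and mid-transit inputs are our own choices for the example.

```python
import numpy as np
from PyAstronomy.pyasl import foldAt

def phase_fold(time, flux, period, t0):
    """Phase-fold a light curve on a candidate period, centering the transit.

    period and t0 (mid-transit time) are assumed to come from the detection step.
    """
    phases = foldAt(time, period, T0=t0)          # phases in [0, 1)
    phases[phases > 0.5] -= 1.0                   # recenter so the transit sits at phase 0
    order = np.argsort(phases)
    return phases[order], np.asarray(flux)[order]
```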

Finally, the binning step allows one to reduce the dimensionality of the dataset by grouping the values in a limited number of bins. Figure 5 explains the construction of one bin: the bins are created by calculating the mean of all the n points found inside a bin. We used 2048 bins; in other words, the length of the light curves is reduced from 15,000 to 2048 datapoints, and each bin is then represented by the mean of all the values inside that bin. An example of a binned light curve can be seen in Figure 6; it is the same light curve as the one presented in Figure 4, now binned.

Figure 5.

Binned process.

Figure 6.

Binned light curve.
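The binning step described above reduces to taking the mean over the points of each bin; a minimal sketch (assuming NumPy and uneven bin sizes handled with array_split) is shown below.

```python
import numpy as np

def bin_light_curve(flux, n_bins=2048):
    """Reduce a light curve to n_bins values, each the mean of the points in its bin."""
    chunks = np.array_split(np.asarray(flux, dtype=float), n_bins)
    return np.array([chunk.mean() for chunk in chunks])

# Example: a 15,000-point folded light curve becomes 2048 binned values.
binned = bin_light_curve(np.random.normal(1.0, 0.001, 15000))
assert binned.size == 2048
```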

2.2 Transit signal identification

In our datasets, the possible transit signals have already been detected. To determine if these detections are real, we have used different machine learning models (i.e., artificial intelligence algorithms). Moreover, we have employed multiresolution analysis techniques to preprocess the light curves and have compared the performance of the machine learning models with and without multiresolution analysis. Multiresolution analysis techniques are used to obtain the different levels of resolution of a signal, in order to “look at it from different perspectives.” This process is similar to using a microscope to observe small objects: at different magnification levels, different details of these objects become visible. An example of such a technique is wavelets. Wavelets are functions that grow and decay over a finite time interval (they are short waves, hence the name wavelet). By varying the translation and dilation parameters of the wavelet, it is possible to localize a function in both position and scale. The wavelets are convolved with the signal in order to determine how much a section of the signal resembles the wavelet. The wavelet equation is shown in Eq. (5).

$\psi_{\lambda,\tau}(u) = \dfrac{1}{\sqrt{\lambda}}\,\psi\!\left(\dfrac{u-\tau}{\lambda}\right)$ (5)

where ψ(·) is a function called the mother wavelet, used to create several wavelets by varying the dilation parameter λ > 0 and the translation parameter τ.
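A minimal sketch of the discrete wavelet decomposition of a binned light curve, assuming the PyWavelets package and an illustrative choice of wavelet family and level, is shown below; each decomposition level halves the signal length, which is what drives the execution-time reductions reported later.

```python
import numpy as np
import pywt

def dwt_features(flux, wavelet="db4", level=3):
    """Decompose a (binned) light curve with the discrete wavelet transform.

    Returns the approximation and detail coefficients. The wavelet family and
    level are illustrative assumptions, not the exact settings of [1, 13].
    """
    return pywt.wavedec(np.asarray(flux, dtype=float), wavelet, level=level)

coeffs = dwt_features(np.random.normal(1.0, 0.001, 2048))
print([c.size for c in coeffs])   # coefficient lengths shrink roughly by half per level
```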

We have also used the empirical mode decomposition and ensemble empirical mode decomposition techniques. These multiresolution analysis techniques adaptively obtain intrinsic mode functions by iterating a process called sifting. In this process, the signal is separated into its different components. A description of these processes is shown in the diagrams from Figures 7 and 8. For a more detailed explanation of these techniques, refer to [13].

Figure 7.

Empirical mode decomposition technique.

Figure 8.

Ensemble empirical mode decomposition technique.
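A minimal sketch of obtaining intrinsic mode functions by sifting, assuming the PyEMD (EMD-signal) package, is shown below; how the resulting IMFs are fed to the classifiers is left open here and may differ from [13].

```python
import numpy as np
from PyEMD import EMD, EEMD  # provided by the "EMD-signal" package

def decompose(flux, ensemble=False):
    """Split a light curve into intrinsic mode functions via (E)EMD sifting.

    Each IMF is one component of the signal at a different scale; feeding all
    IMFs versus a subset to the classifiers is a design choice not fixed here.
    """
    decomposer = EEMD() if ensemble else EMD()
    return decomposer(np.asarray(flux, dtype=float))

imfs = decompose(np.random.normal(1.0, 0.001, 2048))
print(imfs.shape)  # (number of IMFs, signal length)
```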

3. Results

Several machine learning models were tested using these techniques to preprocess the light curves. The models tested with the discrete wavelet transform were a convolutional neural network (CNN), different multilayer perceptron (MLP) architectures, least squares (LS), random forests (RF), Naïve Bayes, and a support vector machine (SVM). For the empirical mode decomposition and ensemble empirical mode decomposition techniques, we used a CNN, RF, K-nearest neighbors (KNN), and a Ridge classifier. Refer to [1, 13] for more details concerning these models and their configuration. In order to measure the performance of each model, we compared the models in terms of their accuracy and execution time. These metrics are based on the number of correctly classified exoplanets (true positives) and correctly classified nonexoplanets (true negatives). The accuracy measures how often the model was correct. The formula for this metric is presented in Eq. (6).

$\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$ (6)

The accuracies obtained by the models that used the discrete wavelet transform on both datasets are presented in Figures 9 and 10, where the blue bars represent the results obtained without using the discrete wavelet transform and the orange ones represent the results obtained using it. It is noticeable that in most cases the accuracy increases, or at least does not decrease. In Figures 11 and 12, the execution time results are presented. As can be seen, the execution times are always reduced, owing to the downsampling property of the discrete wavelet transform: at each level of resolution, the length of the signal is reduced by half.

Figure 9.

Accuracy results using the discrete wavelet transform in the Real-LC dataset.

Figure 10.

Accuracy results using the discrete wavelet transform in the 3-median dataset.

Figure 11.

Execution time results using the discrete wavelet transform in the Real-LC dataset.

Figure 12.

Execution time results using the discrete wavelet transform in the 3-median dataset.

In Figures 13 and 14, the accuracy results of the empirical mode decomposition and its ensemble variant are presented. The blue bars, again, represent the signal without multiresolution preprocessing. The orange bars represent the results obtained using the empirical mode decomposition technique, and the gray bars represent the results obtained using the ensemble empirical mode decomposition technique. Finally, Figures 15 and 16 show the execution times for these techniques. These figures demonstrate that, in most cases, using these techniques increases the performance of the identification models, both in time and in accuracy. The only case in which the execution time is severely affected by these techniques is the CNN model. We attribute this to the additional decimal precision that the data acquire after the sifting process.

Figure 13.

Accuracy results using the empirical mode decomposition and ensemble empirical mode decomposition techniques in the Real-LC dataset.

Figure 14.

Accuracy results using the empirical mode decomposition and ensemble empirical mode decomposition techniques in the 3-median dataset.

Figure 15.

Execution time results using the empirical mode decomposition and ensemble empirical mode decomposition techniques in the Real-LC dataset.

Figure 16.

Execution time results using the empirical mode decomposition and ensemble empirical mode decomposition techniques in the 3-median dataset.

4. Conclusions and future work

The huge amounts of data faced when analyzing transiting exoplanet light curves have encouraged data scientists to develop machine learning models capable of automatically identifying exoplanets. These models can reduce the time spent eyeballing the light curves while enhancing the identification accuracy. For such algorithms to exist, simulated light curves are necessary, because they provide a wide variety of labeled scenarios that can be used to train the models. For this reason, in this work we presented the methodology followed to create two datasets of simulated light curves with different parameters, labeled as transit and nontransit signals. These light curves were used to train machine learning algorithms and later to test them. Once the results obtained with the simulated data are satisfactory, real data can be used to identify transiting exoplanets and contribute to the existing catalogs of exoplanet discoveries. Furthermore, some useful preprocessing steps were explained in this work; they can be used with simulated or real data. Our results show that using multiresolution analysis techniques to preprocess the light curves improves the identification rates of the machine learning models. Future work will focus on proposing a new machine learning model based on multiresolution analysis techniques, instead of using them only to preprocess the light curves.

Acknowledgements

The authors would like to acknowledge the Mexican National Council on Science and Technology (CONACyT) and the Universidad de las Américas Puebla (UDLAP) for their support through the doctoral scholarship program. Also, the authors would like to thank Kyle A. Pearson for his feedback regarding the light curve preprocessing steps.

Author details

Miguel Jara-Maldonado, Vicente Alarcon-Aquino* and Roberto Rosas-Romero

Department of Computing, Electronics and Mechatronics, Universidad de las Américas Puebla, Puebla, Mexico

*Address all correspondence to: vicente.alarcon@udlap.mx

Network Intrusion Detection Using Dendritic Cells and Danger Theory

David Limon-Cantu and Vicente Alarcon-Aquino

Abstract

The Dendritic Cell Algorithm (DCA) is a bioinspired, population-based, supervised binary classifier designed for anomaly detection in communication networks. The proposed model is inspired by the behavior of Dendritic Cells and Danger Theory. The main contribution of this research addresses two contemporary challenges of Network-based Intrusion Detection Systems, namely feature selection and generalization capabilities, to improve classification performance. Feature selection improvement is achieved by using information gain and mutual information. A Decision Tree model is incorporated as a classification mechanism, as a substitute for the classification threshold of the DCA, in order to improve accuracy. The proposed model is assessed using two publicly available datasets, namely UNSW-NB15 and NSL-KDD. Experimental results are compared against state-of-the-art bioinspired and machine learning approaches for binary classification. The proposed approach provides competitive results when compared to other state-of-the-art approaches, such as Support Vector Machines and Artificial Neural Networks, achieving 97.25% and 93.28% accuracy for the UNSW-NB15 and NSL-KDD datasets, respectively. Future challenges include multi-class classification, further performance improvements, and online detection.

Keywords: Anomaly detection, Dendritic Cell Algorithm, Decision Tree, binary classifier, Danger Theory, Artificial Immune System

1. Introduction

Anomaly detection refers to the problem of finding unexpected behavior. Such behaviors are often known as anomalies, outliers, or discordant observations [1], and are usually patterns that do not conform to a notion of normal behavior. The detection of anomalous patterns consists of defining a region that represents normal behavior; any element distant from such a region is considered anomalous. This distinction is achieved through several methods, including searching, signature-based, anomaly-based, feature learning, and feature reduction approaches.

Intrusion Detection Systems (IDS) aim to prevent undesired usage of computer networks. This is performed using tools such as machine learning algorithms and signature-based detection to generate alerts based on the status of the protected resources. These alerts help system administrators make decisions that can affect the network systems, where factors such as response time and the accuracy of the reported status are important. IDS can be classified into two broad groups, namely Network Intrusion Detection Systems (NIDS) and Host-Based Intrusion Detection Systems (HIDS). NIDS are IDS whose main purpose is to analyze network communications, find anomalies, and predict incoming attacks, whereas HIDS are special-purpose IDS whose objective is to protect a specific computer system.

Machine learning NIDS have generated relevant results [2, 3]. Alternative approaches aim to solve relevant NIDS anomaly detection challenges, namely high computational complexity and online detection. Artificial Immune Systems (AIS) are a type of evolutionary computing algorithms and models inspired by the behavior of the Human Immune System (HIS). Their aim is to imitate the favorable qualities of their biological counterpart. Although other evolutionary computing algorithms exist, such as Genetic Algorithms (GA), the immune system is solely focused on the protection of its host system.

The Dendritic Cell Algorithm (DCA) is a computational model developed around the immune Danger Theory (DT). It is a population-based binary classifier designed for anomaly detection, in which Dendritic Cells are represented as agents known as artificial Dendritic Cells (DCs). The algorithm is able to assess whether a group of observations is anomalous or normal through temporal correlation of preprocessed features and linear equations that simulate part of the observed behavior of biological DCs. The evolution of the DCA has been marked by three different contributions, starting with the "prototype" DCA [4], followed by a more elaborate version using stochastic elements, known as the "stochastic" DCA [5], and further developed as the "deterministic" DCA [6, 7, 8, 9].

1.1 Related work

Several machine learning, bioinspired, and meta-heuristic methods have been developed for anomaly detection in communication networks. Machine learning algorithms used for intrusion detection can be divided into two broad groups. Deep learning models, such as the Convolutional Neural Network (CNN) [10] and the Deep Neural Network (DNN) [3], have achieved remarkable results and can automatically learn feature representations. Traditional machine learning techniques, conversely, are characterized by their lack of "depth" in the analysis; examples include the Support Vector Machine (SVM) [11], K-Nearest Neighbor (KNN) [12], Decision Forest [2], Random Forest [3], and Naive Bayes classifier (NB) [13].

Artificial Immune Systems (AIS) are classified into two major categories, namely network-based and population-based. Network-based algorithms make use of the Immune Network Theory and are based on Artificial Immune Networks [14]. Population-based algorithms, on the other hand, imitate immune cell behavior through artificial agent interactions and are based on Negative Selection [15], Clonal Selection [16, 17], or Danger Theory [8, 18, 19, 20, 21, 22, 23]. AIS models have focused on imitating some characteristics of the HIS, such as multiple-level detection mechanisms based on DT [20], and modifications to the DCA. Said modifications include incorporating probability theory [19], fuzzy inference systems [21], feature selection [22], and detection improvements in a semi-supervised context [23].

1.2 Contribution

The main contribution of this research is a biologically inspired NIDS approach based on the deterministic DCA [6]. This model aims to tackle two contemporary challenges of NIDS, namely feature selection and generalization capabilities, to improve classification accuracy. A comparison with different bioinspired and machine learning techniques using two publicly available benchmark datasets (NSL-KDD and UNSW-NB15) is presented. The rest of this paper is organized as follows. Section 2 details the related methodology, as well as the proposed model. Section 3 presents the dataset definitions, model parameters, and numerical results, as well as a comparison of efficiency metrics with state-of-the-art approaches for binary classification. Section 4 presents conclusions, challenges, and future work.

2. Methodology

Binary classification is the task of classifying elements of a given set into two groups, on the basis of a classification rule [18]. The objective of the proposed model consists of achieving anomaly classification based on the provided observations. The first process consists of performing feature selection and data categorization, to provide the proposed algorithm with input data. The DCA then performs context assessment and, finally, a classifier is used to produce a concrete assessment. Each observation is then classified as normal or anomalous and performance metrics are generated. The objective of this section is to introduce the mathematical and algorithmic background. The proposed methodology contains four phases, namely dataset preprocessing, algorithm initialization, detection, and classification.

The Danger Theory model [24] was proposed by French immunologist Polly Matzinger and is mainly centered on the interactions of signals emitted by cells and antigens. These signals denote when a cell or a tissue is experiencing regular or abnormal behavior, such as programmed or unexpected cell death (known as apoptosis and necrosis, respectively) or stress caused by antigens (pathogen or harmful organism signatures). The signals are categorized into three groups, namely Pathogen Associated Molecular Patterns (PAMP), Safe Signals (SS), and Danger Signals (DS). Biological Dendritic Cells are Human Immune System cells that constantly sense the environment for such signals. These signals are collected (ingested) in order to assess whether the present alterations are due to an attacking organism or to a normal process, for which an immune response is not necessary (known as a regulatory or tolerance process).

2.1 Feature selection

The DCA requires input data to be represented as three input signals, namely PAMP, SS, and DS, as well as an antigen representation (such as data IDs or attack type). Each input signal used by the algorithm denotes part of the context for the observations analyzed. As antigens in the immune system are organisms associated with disease, this signal category is related to the presence of attacks. Safe Signals are associated with the normal behavior of a biological cell life cycle; this signal category is related to normal behavior in the observed network communications. Danger Signals are emitted by cells and tissues that are stressed or damaged; this signal category indicates suspicious behavior in the network.

The preprocessing phase assigns a set of features from the original dataset to each of the signal categories (PAMP, SS, DS). This is commonly done by using expert knowledge or feature reduction methods such as PCA, Fuzzy Set Theory [18], or K-Nearest Neighbors [25]. In order to determine the features with the most influence [21, 26], the proposed approach relies on the information gain method, along with maximizing feature-class mutual information for signal categorization, followed by an average feature aggregation and normalization for each category. The information gain of an attribute F and a given dataset S is evaluated as shown in Eq. (1),

$$G(S, F) = H(S) - \sum_{v \in \mathrm{values}(F)} \frac{|S_v|}{|S|}\, H(S_v) \tag{1}$$

where values(F) represents all the possible values of a given feature F in the set S, S_v ⊆ S where v is a potential value that attribute F may take, G is the information gain function, and H represents the entropy of a system, as shown in Eq. (2),

$$H(S) = -\sum_{i=1}^{2} p_i \log_2 p_i \tag{2}$$

where p_i represents the probability of a given class i in the dataset S, based on the values of attribute F. High information gain implies that the attribute provides a large amount of information about the class in the dataset, and high-ranking attributes are preserved so that there is at least one feature per signal category. Each selected feature is then assigned to one of the three signal categories, namely PAMP, DS, and SS, by maximizing the feature-class mutual information. Given two random features F and C, the mutual information between them, I(F;C), is the amount of information that the feature C gives about F, as shown in Eq. (3),

$$I(F;C) = \sum_{f \in \mathrm{values}(F)}\ \sum_{c \in \mathrm{values}(C)} p(f,c)\, \log \frac{p(f,c)}{p(f)\, p(c)} \tag{3}$$

where p(f,c) represents the joint probability of attribute values f and c, and p(f) and p(c) are the marginal probabilities. In order to categorize the selected features, the feature-class mutual information between each attribute and the class is calculated. If a given attribute has higher mutual information with the normal class than with the anomalous class, it is categorized as SS. Conversely, if the attribute has higher mutual information with the anomalous class than with the normal class, it is categorized as PAMP. The remaining features are classified as DS.
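To make this step concrete, the following is a minimal sketch, not the authors' code, of how the information gain ranking of Eqs. (1)–(2) and a mutual-information-based categorization in the spirit of Eq. (3) could be implemented with NumPy. The feature discretization, the number of retained features, and the reading of "mutual information with the normal/anomalous class" as the per-class contribution to Eq. (3) are assumptions made here for illustration.

```python
import numpy as np

def entropy(labels):
    # Shannon entropy of a discrete label vector, Eq. (2)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    # G(S, F) = H(S) - sum_v (|S_v| / |S|) H(S_v), Eq. (1); 'feature' must be discretized
    gain = entropy(labels)
    for v in np.unique(feature):
        mask = feature == v
        gain -= mask.mean() * entropy(labels[mask])
    return gain

def class_mi_contribution(feature, labels, target_class):
    # Per-class portion of Eq. (3): sum_f p(f, c) log( p(f, c) / (p(f) p(c)) ) for c = target_class
    p_c = np.mean(labels == target_class)
    total = 0.0
    for v in np.unique(feature):
        p_f = np.mean(feature == v)
        p_fc = np.mean((feature == v) & (labels == target_class))
        if p_fc > 0:
            total += p_fc * np.log(p_fc / (p_f * p_c))
    return total

def rank_and_categorize(features, labels, keep=10):
    """features: dict of name -> discretized 1-D array; labels: 0 = normal, 1 = anomalous."""
    gains = {name: information_gain(col, labels) for name, col in features.items()}
    ranked = sorted(gains, key=gains.get, reverse=True)[:keep]
    categories = {}
    for name in ranked:
        mi_normal = class_mi_contribution(features[name], labels, 0)
        mi_anomalous = class_mi_contribution(features[name], labels, 1)
        if mi_normal > mi_anomalous:
            categories[name] = "SS"
        elif mi_anomalous > mi_normal:
            categories[name] = "PAMP"
        else:
            categories[name] = "DS"  # remaining (tied) features fall back to DS
    return categories
```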

The DCA contains a population of artificial Dendritic Cells that simulates the context assessment behavior of biological cells in the human body. Each cell in the population has a predefined migration threshold (or lifespan), after which the cell no longer senses signals or antigens; its state is aggregated into the antigen repository used for classification after all observations have been processed. Algorithm initialization is performed in order to provide the detection phase with the required parameters, namely the migration threshold and the DC population size. The preprocessing phase is summarized in Figure 1. Dataset features are defined as Dataset = {F_1, F_2, …, F_t}, where t is the total number of dataset features. The features selected by information gain, Ranked = {F_1, F_2, …, F_r} ⊆ Dataset, where r is the total number of ranked features, are then compared against normal and anomalous data in order to generate three subsets of categorized features, namely danger signals {F_1, F_2, …, F_d} ⊆ Ranked, safe signals {F_1, F_2, …, F_s} ⊆ Ranked, and PAMP signals {F_1, F_2, …, F_p} ⊆ Ranked, where d, s, and p are the total numbers of features for each signal category (DS, SS, and PAMP). Categorized features are averaged and normalized in the closed range [0, 1] in order to generate the processed dataset, in which only four predictors are present, namely DS, SS, PAMP, and the antigen representation.

Figure 1.

Dataset preprocessing.

2.2 Detection phase

The detection phase aims to generate an antigen repository. This process begins after a population of artificial DCs (or agents) is created. The agent population performs signal (PAMP_i, DS_i, SS_i, i = 1, 2, …, n, where n is the dataset size) and antigen (α) collection until a threshold is met. The antigen types collected by each cell are counted and stored as cell state signals α_g, where g represents the antigen category. For each observation fed into the algorithm, the entire DC population samples signals and antigens. The proposed approach incorporates cumulative signals known as the Costimulatory Molecule Signal (CSM), the Semi-mature Signal (smDC), and the Mature Signal (mDC) [4]. These are defined in Eq. (4),

$$C_{[\mathrm{CSM},\,\mathrm{smDC},\,\mathrm{mDC}]} = W_P C_P + W_S C_S + W_D C_D \tag{4}$$

where C_[CSM, smDC, mDC] represents the signal concentration for CSM, smDC, and mDC, respectively, W_P, W_S, and W_D are the weights used for PAMP, SS, and DS [5, 27], and C_P, C_S, and C_D are the signal concentration values for each antigen sampled by the artificial DC. The role of CSM is to limit the time an artificial DC spends on antigen sampling by imitating the cell's lifespan (or signal collection limit). The smDC and mDC signals determine the cell context for the antigens collected in the DC population and are the basis used to generate the k̂ anomaly context. When a DC has exceeded the DC maturation threshold (set in algorithm initialization), it migrates to a separate DC pool where it no longer samples antigens. A new DC is created in the original DC population pool so that the initial number of DCs is always preserved. The deterministic DCA employs k̂ ∈ ℝ to reflect the anomaly characteristic (or signature) of a migrated cell, as shown in Eq. (5), where s represents the signals received by each artificial DC, and C_mDC and C_smDC are the intermediary mature and semi-mature signals, respectively.

$$\hat{k} = \frac{1}{s}\left(C_{\mathrm{mDC}} - C_{\mathrm{smDC}}\right) \tag{5}$$

After all data instances in the dataset have been processed, the anomaly context and observed antigen count of all migrated cells are summarized using K_α, defined as the sum of all k̂ values presented by the migrated DCs for antigen category α, in proportion to the amount of antigens of that category presented by all migrated DCs, as defined in Eq. (6), where m represents the index of a DC in the migrated population.

$$K_\alpha = \frac{\sum_m \hat{k}_m}{\sum_m \alpha_m} \tag{6}$$
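As an illustration of the detection phase described by Eqs. (4)–(6), the following is a minimal Python sketch; it is not the authors' implementation (which was written in MATLAB), and the signal weights and the migration threshold range are placeholder assumptions.

```python
import random
from collections import defaultdict

# Placeholder (W_P, W_S, W_D) weights per cumulative signal; the paper cites [5, 27]
# for the actual values used.
WEIGHTS = {"CSM": (2.0, 1.0, 2.0), "smDC": (0.0, 3.0, 0.0), "mDC": (2.0, -1.0, 2.0)}

class DendriticCell:
    def __init__(self, migration_threshold):
        self.threshold = migration_threshold
        self.csm = self.smdc = self.mdc = 0.0
        self.signals = 0
        self.antigens = defaultdict(int)  # antigen category -> count

    def sample(self, pamp, ss, ds, antigen):
        # Accumulate the cumulative signals of Eq. (4)
        self.csm += sum(w * x for w, x in zip(WEIGHTS["CSM"], (pamp, ss, ds)))
        self.smdc += sum(w * x for w, x in zip(WEIGHTS["smDC"], (pamp, ss, ds)))
        self.mdc += sum(w * x for w, x in zip(WEIGHTS["mDC"], (pamp, ss, ds)))
        self.signals += 1
        self.antigens[antigen] += 1

    def k_hat(self):
        # Anomaly signature of a migrated cell, Eq. (5)
        return (self.mdc - self.smdc) / max(self.signals, 1)

def dca_detection(observations, population_size=10, max_threshold=0.001):
    """observations: iterable of (pamp, ss, ds, antigen_category) tuples.
    Returns the K_alpha anomaly metric of Eq. (6) per antigen category."""
    def new_cell():
        return DendriticCell(random.uniform(0.0, max_threshold))

    pool = [new_cell() for _ in range(population_size)]
    k_sum = defaultdict(float)        # numerator of Eq. (6)
    antigen_count = defaultdict(int)  # denominator of Eq. (6)
    for pamp, ss, ds, antigen in observations:
        for i, dc in enumerate(pool):
            dc.sample(pamp, ss, ds, antigen)
            if dc.csm > dc.threshold:  # lifespan exceeded: the cell migrates
                for category, count in dc.antigens.items():
                    k_sum[category] += dc.k_hat()
                    antigen_count[category] += count
                pool[i] = new_cell()   # keep the population size constant
    return {c: k_sum[c] / antigen_count[c] for c in antigen_count}
```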

2.3 Classification

The classification phase generates a distinction criterion for all K_α anomaly signatures obtained in the antigen repository. DCA classification was originally based on a constant classification threshold [5, 6, 8]. This threshold was commonly set as a user-defined parameter, or derived from observations obtained in the detection phase. This approach is known to have issues [28], as the assigned threshold may not properly separate normal from anomalous K_α values. The proposed model removes the use of such an anomaly threshold in favor of a Decision Tree classifier.

A Decision Tree (DT) is a supervised learning model commonly used for classification and regression tasks. The main objective of a DT is to build a model based on (simple) decision rules derived from the data predictors. Decision Trees are commonly easy to understand, as they can be visualized. Some favorable characteristics of Decision Trees are low computational complexity for prediction, not requiring large amounts of observations to generate a model, and transparency (as the generated rules can be visualized and understood). Decision Trees are also known to overfit. In order to address this, several constraints and optimization features have been developed, such as pruning, a minimum number of samples per leaf node, and a maximum tree depth [29].

A Decision Tree is built in a sequential manner, where a set of simple tests are combined logically, for example, comparing a numeric value against a threshold or a specific range, or comparing a categorical value against a set of possible categorical values. As an observation is compared against the set of rules generated by a DT, it is determined as belonging to the most frequent class present in that "region". A Decision Tree can be constructed using graphs and can be expressed as shown in Eq. (7),

$$G = (V, E) \tag{7}$$

where E ⊆ V × V, V is a set of nodes, and E is a set of edges. The set of nodes V can be further described as the union of three sets, namely D, U, and T, where D are decision nodes, U are chance nodes, and T are terminal nodes; this set is expressed in Eq. (8). Decision nodes execute decision making, in which an action is selected. A chance node randomly selects a related edge. Terminal nodes are the end of action and chance nodes. Each edge contains a parent node association, as well as a child node. Decision Trees have further functions and conditions [30].

$$V = D \cup U \cup T \tag{8}$$

2.4 Proposed model

The proposed model is summarized in Figure 2. Similar to the deterministic DCA approach, feature ranking is obtained by using information gain. Selected features are sorted into one of the three signal categories, namely SS, DS, and PAMP. The feature set selected for each category is aggregated and normalized. Segment size, migration threshold, and DC population size are set as the algorithm initializes. Data from the processed dataset are fed to the algorithm sequentially as a set of (DS_i, SS_i, PAMP_i, α_g_i), i = 1, 2, …, n, where n is the dataset size and g is the antigen category for observation i. Each cell DC_1, …, DC_p in the DC population, where p is the number of DCs in the population, receives the same set of signals and antigen. An update process is then performed on CSM_p, smDC_p, mDC_p, k_α_p, and α_g_p. After signal collection in the current iteration, the CSM status signal of every DC_p is compared against the migration threshold. If this threshold is surpassed, the DC_p is migrated and no longer performs signal and antigen collection. Its accumulated status signals k_α_m and α_g_m, where m is the migrated population size, are accumulated into the antigen repository.

Figure 2.

DCA with Decision Trees.

Finally, all DCs migrated in the current iteration are reset. Classification is performed after all data elements have been processed, using a Decision Tree (DT). Stage (1) denotes Decision Tree model building. After the model has been built, testing can be performed by providing the testing dataset and starting the algorithm again. Stage (2) achieves classification by using the previously trained DT model after all data elements have been processed. Classification metrics are finally obtained to analyze the model performance.

3. Experimental work

The proposed model was tested using the NSL-KDD and the University of New South Wales (UNSW-NB15) datasets. The dataset preprocessing and the algorithm were developed in the MATLAB R2020 environment and executed on a computer running the Linux operating system with an Intel Core i7 8700 CPU and 16.0 GB of RAM. A confusion matrix is used to describe performance. For a binary classifier, the confusion matrix consists of positive and negative classes. The positive class refers to any anomaly (attack) present in the dataset; the negative class refers to normal behavior. In order to generate the confusion matrix, the classified records are compared against the actual dataset classes (i.e., ground truth). Anomalous records that are correctly classified are called True Positives (TP), whereas anomalous records that are wrongly classified are False Negatives (FN). In the case of normal behavior, correctly classified records are known as True Negatives (TN), and wrongly classified normal records are known as False Positives (FP). These counts are then used to generate statistical measures for further analysis and comparison, namely precision, sensitivity, specificity, and accuracy. Precision reflects the proportion of records classified as anomalous that are actual anomalies and is given by Eq. (9). Sensitivity (also known as the TP rate) refers to the proportion of anomalies that are correctly classified and is given by Eq. (10). In contrast, specificity (or TN rate) is the proportion of normal behavior that is correctly classified, given by Eq. (11). Finally, accuracy reflects the proportion of correct results, whether anomalous or normal, and is given by Eq. (12).

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{9}$$
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{10}$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{11}$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{12}$$
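As a small, self-contained illustration of Eqs. (9)–(12), the following helper computes the four measures from the entries of a binary confusion matrix; the example counts are made up.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Precision, sensitivity, specificity, and accuracy from a binary confusion matrix (Eqs. 9-12)."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)      # true positive rate
    specificity = tn / (tn + fp)      # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, sensitivity, specificity, accuracy

# Example with made-up counts:
print(confusion_metrics(tp=4500, tn=3600, fp=230, fn=120))
```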

3.1 Dataset description

The UNSW-NB15 is a publicly available dataset [31]. It contains nine different attack types, namely Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms, as well as normal traffic. The dataset is divided into train and test sets. The training set contains 175,341 records (119,341 anomalous and 56,000 normal). The testing set, in turn, contains 82,332 records (45,332 anomalous and 37,000 normal). Two tools (Argus and Bro-IDS), along with 12 developed algorithms, were used to generate 49 different features, which are categorized into flow features, content features, time features, basic features, and additionally generated features. Statistical analysis, feature correlation, and complexity were evaluated and showed the train and test sets to be of similar distributions [13].

The KDD-99 dataset was developed for the Third International Knowledge Discovery and Data Mining Tools Competition and is publicly available. It was generated to support NIDS development by simulating several intrusions in a military network environment. This dataset contains four attack types, namely Denial of Service (DoS), Probe, User to Root Attack (U2R), and Remote to Local Attack (R2L), as well as normal traffic. The dataset is divided into two subsets, namely train and test. The train set contains 494,021 records (97,278 normal and 396,743 anomalous). The test set consists of 311,029 records (60,593 normal and 250,436 anomalous). In total, 41 features were generated for each connection. This dataset was widely used in IDS research. However, it has been the subject of wide criticism due to the probability distribution of the records in the testing set, as well as inconsistencies in the values of the training and testing sets. This has led to an imbalance between normal and anomalous observations, as well as several duplicate data instances [31, 32].

The NSL-KDD [32] is a publicly available dataset developed by the Canadian Institute for Cybersecurity. It was created to solve two main problems of the KDD-99 dataset, namely the distribution of the attacks in the train and test sets, and the over-inclusion of Denial of Service (DoS) attack types (neptune and smurf) in the test dataset. This dataset also provides the following improvements: the omission of redundant or duplicate records in the train and test sets, and the balancing of records between the train and test sets, in order to avoid dataset sub-sampling and to reduce computational time in model testing. Given that this dataset is an improved version of the KDD-99, it has the same features and attack types. The complete training dataset contains 125,973 records (58,630 anomalous and 67,343 normal). There is a reduced version of the train set (KDDTrain+ 20%) that contains a 20% subset of the training set. The full testing dataset contains 22,544 records (12,833 anomalous and 9711 normal). Additionally, there exists a testing dataset that does not include records that were not validated by all 21 classifiers used to match the KDD-99 ground truth labels during dataset creation [32]. The attack types for the presented datasets are detailed in Table 1.

Type | Description | Dataset
Normal | Normal transaction data. | NSL-KDD, UNSW-NB15
Fuzzers | Attempting to cause a program or network to suspend by feeding it randomly generated data. | UNSW-NB15
Analysis | A series of port scans, spam, and HTML file attacks. | UNSW-NB15
Backdoors | Technique to bypass security mechanisms stealthily. | UNSW-NB15
DoS | Malicious attempt to make a network resource unavailable by overwhelming its capacity to serve requests. | NSL-KDD, UNSW-NB15
Exploits | Leverages knowledge of a system vulnerability by exploiting it to achieve unauthorized access to a system. | UNSW-NB15
Generic | A technique that works against all block ciphers (encryption method) without consideration of their structure. | UNSW-NB15
Reconnaissance | Attacks that aim to gather information about the network. | NSL-KDD, UNSW-NB15
Shellcode | A small piece of code used to exploit a software vulnerability. | UNSW-NB15
Worms | A piece of code that replicates itself in order to spread over the network, relying on exploits to gain access. | UNSW-NB15
User to Root Attack (U2R) | The attacker gains access to a regular account on the system and exploits vulnerabilities to gain root access. | NSL-KDD
Remote to Local Attack (R2L) | An attacker without an account sends packets to a system to gain access as a user by exploiting vulnerabilities. | NSL-KDD

Table 1.

Attack types and descriptions for NSL-KDD and UNSW-NB15 datasets.

3.2 Dataset preprocessing

As part of the proposed model phases, dataset preprocessing was performed by ranking the most relevant features to be used for each signal category (PAMP, Safe, and Danger) required by the DCA. The feature ranking, selection, and categorization were based on information gain and feature-class mutual information maximization [21]. As a result, 10 and 17 features were selected for the NSL-KDD and UNSW-NB15 datasets, respectively, as shown in Tables 2 and 3. Anomalous records of any category are labeled as one, whereas normal records are labeled as zero, to fit binary classification constraints. The selected features were combined by performing normalization in the range from zero to one, and each signal category is equal to the average of its corresponding features, similar to the approach in [21]. Antigen representation was achieved by using several categorical dataset features to generate antigen categories. Attack categories can be compared to biological antigens invading the body, as they tend to have similar patterns and can also attack recurrently [33].
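As a concrete sketch of this aggregation step (illustrative only; the column groupings in the commented usage are placeholders rather than the exact feature lists of Tables 2 and 3), the selected features can be min-max normalized and averaged per signal category as follows.

```python
import pandas as pd

def build_signals(df, category_map, label_column="label", normal_value="normal"):
    """Min-max normalize the selected features to [0, 1] and average them per
    signal category (PAMP, Safe, Danger); anomalies are labeled 1 and normal records 0."""
    out = pd.DataFrame(index=df.index)
    for category, columns in category_map.items():
        block = df[columns].astype(float)
        block = (block - block.min()) / (block.max() - block.min() + 1e-12)
        out[category] = block.mean(axis=1)
    out["label"] = (df[label_column] != normal_value).astype(int)
    return out

# Hypothetical usage with placeholder NSL-KDD-style column names:
# signals = build_signals(df, {"Danger": ["count", "srv_count"],
#                              "Safe": ["logged_in", "srv_diff_host_rate", "dst_host_count"],
#                              "PAMP": ["serror_rate", "srv_serror_rate"]})
```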

Feature name | Description | Signal category
count | Number of connections to the same host as a current connection. | Danger
srv count | Number of connections to the same service as the current connection in the past 2 seconds. | Danger
logged in | Indicates if a user is logged in. | Safe
srv diff host rate | Percentage of connections to different hosts. | Safe
dst host count | Count of connections having the same destination host. | Safe
serror rate | Percentage of connections that have "SYN" errors. | PAMP
srv serror rate | Percentage of connections that have "SYN" errors. | PAMP
same srv rate | Percentage of connections to the same service. | PAMP
dst host serror rate | Percentage of connections to current host that has an S0 error. | PAMP
dst host rerror rate | Percentage of connections to current host that has an RST error. | PAMP

Table 2.

Feature descriptions for signal categorization, NSL-KDD dataset.

Feature name | Description | Signal category
sbytes | Source to destination bytes. | Danger
dbytes | Destination to source bytes. | Danger
dload | Destination bits per second. | Danger
dmean | Mean of the flow packet size transmitted by the destination. | Danger
dpkts | Destination to source packet count. | Safe
sttl | Source to destination time to live. | Safe
smean | Mean of the flow packet size transmitted by the source. | Safe
ct state ttl | No. for each state according to source/destination time to live. | Safe
ct dst sport ltm | No. of connections of the same destination address and source port in 100 connections. | Safe
ct srv dst | No. of connections that contain the same service and destination address in 100 connections. | Safe
dur | Record total duration. | PAMP
rate | Transfer rate. | PAMP
dttl | Destination to source time to live. | PAMP
sload | Source bits per second. | PAMP
ct srv src | No. of connections that contain the same service and source address in 100 connections. | PAMP
ct src dport ltm | No. of connections of the same source address and destination port in 100 connections. | PAMP
ct dst src ltm | No. of connections of the same source and destination address in 100 connections. | PAMP

Table 3.

Feature descriptions for signal categorization, UNSW-NB15 dataset.

3.3 Model parameters

The proposed model has two configurable parameters, namely the migration threshold and the DC population size. The DC population size was set to 10 artificial DCs. The impact of migration threshold selection on the DC population is still an open research question. As noted in [9], a high migration threshold results in degraded performance for the DCA. Migration threshold selection was performed by analyzing the input signals of the tested datasets. Currently, this process needs to be adjusted depending on the dataset and the selected features. The migration threshold was drawn from a uniform distribution over the closed real interval [0, 0.001]. This was chosen so that at least one cell migrates per iteration, in order to avoid oversampling in the antigen signature generation of the detection phase. The classification phase was performed by building a Decision Tree with pruning using the fitctree MATLAB model builder.

The parameters used to build the DT are detailed in Table 4. The Decision Tree contains only two predictor categories, namely "Normal" and "Anomalous". Antigen categories are defined as a combination of categorical features from the dataset, namely flag and attack category for the NSL-KDD dataset, and protocol, service, and attack category for the UNSW-NB15 dataset. Each predictor is labeled as Anomalous if it corresponds to an attack of any kind. This process aims to increase antigen signature diversity, as providing only two antigen categories to the DCA detection phase would reduce classification to a two-observation task. This also reduces the performance penalties for misclassification. One predictor is used as input for the classifier, namely K_α, as detailed in Eq. (6). An exact search is used for predictor splits. The cost of misclassification is set to one. No maximum depth is set for the training process. For each split node, a maximum of 10 category levels is set, so as not to increase computational complexity considerably. Leaf merging is also performed, where all leaves coming from the same parent are merged if their total risk value is greater than or equal to the risk value associated with the parent. The minimum number of branch nodes is set to 10. Prior probabilities are set as empirical, so class probabilities are obtained from class frequencies in the class label.

Parameter | Value
Predictor categories | Normal, Anomalous
Predictors | K_α
Predictor split | Exact search
Misclassification cost | 1
Max. categories | 10
Leaf merging | Yes
Min. branch nodes | 10
Prior probabilities | Empirical

Table 4.

DT model parameters.
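The authors built the tree with MATLAB's fitctree. As a rough, non-equivalent illustration of the same classification stage, a scikit-learn sketch might look as follows; the K_α values and labels are synthetic placeholders, and several fitctree options in Table 4 (exact categorical search, leaf merging, the maximum number of categories) have no direct scikit-learn counterpart.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic antigen repository: one K_alpha value (Eq. 6) per antigen category,
# with ground-truth labels (0 = normal, 1 = anomalous).
rng = np.random.default_rng(0)
k_alpha = np.concatenate([rng.uniform(0.0, 0.2, 10),    # normal-looking signatures
                          rng.uniform(0.6, 1.0, 10)]).reshape(-1, 1)
labels = np.array([0] * 10 + [1] * 10)

# Rough analogue of Table 4: unit misclassification cost and empirical priors are the
# scikit-learn defaults; min_samples_split loosely mirrors the branch-node minimum.
tree = DecisionTreeClassifier(min_samples_split=10, random_state=0)
tree.fit(k_alpha, labels)

# Stage (2): the learned rules replace the DCA's constant anomaly threshold.
print(tree.predict(np.array([[0.05], [0.85]])))
```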

3.4 Numerical results

The tested model performance is summarized in Table 5, where the testing performance for each dataset is highlighted in bold. The model was trained using the full training set of each dataset, namely the UNSW-NB15 training set and KDDTrain+. Classification performance was then evaluated using the UNSW-NB15 testing set and KDDTest+, and this testing performance is used for comparisons. Precision indicates the proportion of predicted anomalies that are correct: the UNSW-NB15 dataset achieved 95.01%, whereas the NSL-KDD dataset achieved 88.91%. Specificity (or true negative rate) was 94.24% for the UNSW-NB15 dataset, whereas the NSL-KDD dataset achieved 87.11%.

Dataset | Stage | Precision (%) | Sensitivity (%) | Specificity (%) | Accuracy (%) | Computation time in seconds
UNSW-NB15 | Train | 99.98 | 99.98 | 99.96 | 99.97 | 183.95
UNSW-NB15 | Test | 95.01 | 100 | 94.24 | 97.25 | 66.95
NSL-KDD | Train | 95.31 | 94.85 | 95.90 | 95.41 | 126.42
NSL-KDD | Test | 88.91 | 99.20 | 87.11 | 93.28 | 19.78

Table 5.

Experimental results.

Additionally, the UNSW-NB15 and the NSL-KDD showed 99.98% and 94.85% sensitivity, respectively. Higher sensitivity indicates that the algorithm excels at identifying anomalies, whereas higher specificity denotes that normal behavior is correctly identified. Accuracy indicates the overall proportion of correct assessments; the UNSW-NB15 dataset achieved 97.25%, whereas the NSL-KDD achieved 93.28%. Computation time was measured in seconds, with the training and testing times together giving the total algorithm runtime. The UNSW-NB15 training time was 183.95 seconds, whereas the testing time was 66.95 seconds. Conversely, the NSL-KDD training time was 126.42 seconds, and the testing time was 19.78 seconds.

Contemporary models based on the DCA are compared in Table 6, where the proposed method's results are highlighted in bold. The proposed model was able to surpass the other approaches, achieving 97.25% accuracy. The stochastic DCA [5] was tested using the UNSW-NB15 in [21]; the two proposals included in the comparison achieved results between 60.4% and 78.04% accuracy. The deterministic DCA [8] achieved 90.14% accuracy. The fuzzy inference DCA [21] achieved 89.30% accuracy. The deterministic DCA without signal categorization achieved the second-best result, with 90.23% accuracy for the UNSW-NB15. For the NSL-KDD dataset, the model accuracy was compared with two other models. The deterministic DCA with the multiplication of antigens [34] achieved the best result with 98.6% accuracy, whereas the same model without antigen multiplication achieved 96.1% accuracy. The proposed approach achieved 93.28% accuracy.

Dataset | Model | Accuracy (%)
UNSW-NB15 | Deterministic DCA with Decision Trees | 97.25
UNSW-NB15 | DCA Without Signal Categorization [22] | 90.23
UNSW-NB15 | Deterministic DCA [6] | 90.14
UNSW-NB15 | Takagi-Sugeno-Kang and Fuzzy Inference DCA [21] | 89.30
UNSW-NB15 | Improved Stochastic DCA [23] | 84.2
UNSW-NB15 | Stochastic DCA [21, 23] | 60.4–78.04
NSL-KDD | Deterministic DCA with Antigen Multiplication [34] | 98.6
NSL-KDD | Deterministic DCA without Antigen Multiplication [34] | 96.1
NSL-KDD | Deterministic DCA with Decision Trees | 93.28

Table 6.

DCA accuracy comparison.

The accuracy of contemporary methods for binary classification is presented in Table 7. Accuracies for the NSL-KDD and UNSW-NB15 datasets were compared, with the proposed model's results highlighted in bold. A comparison with state-of-the-art machine-learning-based models was performed. The best accuracy for the NSL-KDD dataset was obtained by the K-Nearest Neighbors classifier [35] with 94.92%. The second-best result was achieved by the proposed model with 93.28% accuracy, followed by a deep Long-Short Term Memory model [36] with 86.99%. Other methods compared include the Random Forest classifier [36] with 85.44% accuracy and an Artificial Neural Network [36] with 85.31% accuracy.

Dataset | Model | Accuracy (%)
UNSW-NB15 | Deep Feed-Forward Neural Network [3] | 99.19
UNSW-NB15 | Random Forest [3] | 98.86
UNSW-NB15 | Gradient Boosted Tree [3] | 97.72
UNSW-NB15 | Deterministic DCA with Decision Trees | 97.25
UNSW-NB15 | Locally Deep Support Vector Machine [2] | 93.30
NSL-KDD | K-Nearest Neighbors [35] | 94.92
NSL-KDD | Deterministic DCA with Decision Trees | 93.28
NSL-KDD | Deep Long-Short Term Memory [36] | 86.99
NSL-KDD | Random Forest [36] | 85.44
NSL-KDD | Artificial Neural Network [36] | 85.31

Table 7.

Proposed model comparison with machine learning models.

3.5 Discussion

The deterministic DCA performs context assessment by using a population of artificial DCs. Each element in the dataset is sequentially processed, and all cells in the population receive the same signals and antigens for the current iteration. When a cell's migration threshold is met, the cell does not receive any new signals or antigens, and it outputs its antigen context values, namely the accumulated antigen signature k̂_α for each antigen type α and the sum of antigens α received by the cell in its lifetime, ŝ_α; these outputs are accumulated in an antigen repository. Each cell in the population determines a spatial correlation between signals and antigen types α through the coefficient k̂_α, computed as the accumulated difference of two linear functions, namely smDC and mDC. The antigen type α is determined in the dataset preprocessing phase and can be a distinctive categorical feature that represents similar observations (i.e., attack type, protocol, source port, etc.). Once all signals and antigens in the dataset have been processed, the anomaly metric coefficient K_α is obtained as the ratio between the sum of all k̂ values for each antigen type α and the number of times antigen category α was sensed by any migrated DC. For classification, the DCA proposed a constant classification threshold based on the collected data [6]; any antigen category α above this threshold is considered an anomaly. Because the threshold calculated with the deterministic DCA equation is a constant, it may incur large classification penalties when any antigen category is misclassified, as all instances in the dataset that present this antigen category are affected. This issue may increase when the antigen category count is low, or when a large dataset is processed and K_α tends to have low variance. When the number of signal instances is large enough, the classification threshold tends to zero, and even though the normal antigen category (or categories) may be linearly separable, the classification threshold may not be adequate. This is further worsened if the mean of the safe signals is greater than or equal to the mean of the danger signals, since K_α ∈ ℝ can take negative values. To solve this, the proposed model builds a DT classifier after the detection phase. The decision rules derived from this model are used to classify the antigen repository generated from all migrated DCs. The proposed model thus avoids the dependence on a linear classification threshold, as the DT can perform classification using a non-linear approach.

The presented computation time results are related to the computational complexity of the algorithm, where the deterministic DCA presents a big-O notation of O(n²) in the worst case. Computational complexity increased with the incorporation of a DT classifier in the classification phase. As N (the DC population size) changes, the DT construction does not present an increment or reduction in computation time, since all antigen signatures are summarized in the antigen repository of size m. Conversely, increasing the number of antigen types m increases computation time. The main drawback of this model resides in its dependence on the DC migration threshold, dataset size, and antigen categories. It is necessary to provide a migration threshold that does not cause cells to migrate prematurely or too late, as over- and under-sampling of signals in a migrated cell tends to cause classification errors or to reduce antigen signature separability. This affects the DT classifier, as it may not be able to distinguish several signatures of similar magnitude, in which case all observations presenting the affected antigen category are incorrectly classified. To decrease this likelihood, it is necessary to provide a dataset selection and signal categorization that produce a relatively low average migration rate. The classification threshold proposed in the deterministic DCA is also highly dependent on the number of observations and the attack distribution in the training data. The proposed model introduces an increase in computational cost. One final issue is that Decision Trees are known to overfit when they receive a large number of observations for training, as well as when dealing with high-dimensional problems. Further DT optimization procedures in relation to dataset features may need to be implemented to solve such issues.

4. Conclusions

Anomaly detection in computer networks is a complex task that requires distinguishing normality from anomaly. Artificial Immune Systems are biologically inspired computational models designed for the development of Intrusion Detection Systems. The Dendritic Cell Algorithm (DCA) is a population-based binary classifier initially designed for network anomaly detection. The proposed model was inspired by the behavior of Dendritic Cells and the immune Danger Theory. This research proposed solutions to two relevant anomaly detection challenges, namely feature selection and generalization capabilities, to improve classification performance. The proposed model was based on the DCA and incorporated Decision Trees in the classification phase. Two publicly available datasets, namely UNSW-NB15 and NSL-KDD, were used, and the model was trained using each training set provided. A comparison was performed to assess the accuracy against other DCA models, along with state-of-the-art approaches for network anomaly detection. The proposed approach achieved 97.25% accuracy with the contemporary UNSW-NB15 dataset and provided competitive results when compared to other state-of-the-art machine learning approaches. The results using the NSL-KDD dataset achieved 93.28% accuracy and surpassed machine learning methods such as the Artificial Neural Network and Random Forest. The proposed model was able to surpass other contemporary proposals using the DCA. Relevant challenges derived from the results obtained are the following: the potential for large misclassification due to the low number of antigen categories; the model's dependence on the migration threshold and its relationship with dataset features; the lack of online detection; the dependence on a large number of observations to perform classification; and the lack of multi-class classification. There have been several proposals to address some of the presented issues, such as a variable functional migration threshold [23] and signal categorization optimizations [22]. Said approaches need to be analyzed to further improve the proposed model. Multi-resolution analysis may provide insight to solve some of the mentioned challenges, such as reducing the dependence on feature selection and enabling multi-class classification. The proposal of a segmented version of the DCA [7] may provide a framework to implement online classification, reduce computational complexity, and further increase the model's learning capabilities. Although other proposals have included the use of machine learning techniques to perform classification in the DCA [34], the proposed method provides a starting point to incorporate a robust feature selection and classification mechanism into the ongoing research and development challenges of the DCA.

Acknowledgements

The authors of this paper would like to thank the Mexican National Council of Science and Technology (CONACYT), as well as the Universidad de las Americas Puebla, Mexico, for providing funding for this research.

Author details

David Limon-Cantu and Vicente Alarcon-Aquino*

Department of Computing, Electronics and Mechatronics, Universidad de las Americas Puebla, San Andres Cholula, Puebla, Mexico

*Address all correspondence to: vicente.alarcon@udlap.mx

Automatic Terrain Perception in Off-Road Environments

Ethery Ramírez-Robles and Oleg Starostenko

Abstract

Autonomous driving is a growing research area; however, there is no fully autonomous vehicle (AV) in the world. Existing AVs have different capabilities and can drive by themselves only in specific scenarios with several constraints. This paper discusses several studies from the point of view of a modular system approach, which treats autonomous driving as separate tasks to solve. Studies are classified into object/pedestrian detection, road detection, obstacle avoidance, terrain perception, mapping of the environment, and path planning. Furthermore, various perception sensors are reviewed and compared. Important datasets and metrics found in the literature are presented. Finally, one of our experiments obtained a weighted IoU of 83.88% in the segmentation of five classes. Since this is a work in progress, more research needs to be done, but our proposal shows promising results in terrain perception in off-road environments.

Keywords: autonomous driving, terrain perception, semantic segmentation

1. Introduction

Autonomous driving is a growing research area; recently, it has received a lot of attention due to its many advantages. According to a study by Morgan Stanley Research, autonomous vehicles (AVs) can save money through reduced labor costs, improved productivity, lower fuel consumption, and fewer accidents [1]. There are two types of scenes in autonomous driving, on-road and off-road. In the first type, we can find paved roads, lane markings, defined cues, etc. In the second, there are uneven surfaces, no clear delimiters, vegetation, and different terrains. Several projects have brought significant advances and had a meaningful impact on the state of the art of autonomous driving. The DARPA Challenge held in 2004 was one of the first important competitions; initially, it was oriented toward military applications, but the focus then changed to civilian purposes in urban scenarios. None of the contestants finished the first edition; in the second edition, five teams completed the challenge without human intervention.

Autonomous vehicles (AVs) are complex systems. The Society of Automotive Engineers (SAE) [2] defines six autonomy levels in cars, from 0 to 5. In level 0, there is no driving automation; the human driver performs all driving tasks. In level 1, some tasks are performed by the car, such as adaptive cruise control, stability control, and anti-lock braking systems. Partial driving automation is level 2; in this level, there are combined automated functions, like acceleration and deceleration, in defined situations. Level 3, known as conditional automation, is when the vehicle can control some functions under limited conditions for a certain period. In level 4, the vehicle is capable of fulfilling all driving tasks under certain conditions. Level 5 is full automation; there is no need for a human driver, and the car can drive under all conditions.

There are two main classifications for system architecture in AVs, based on their connectivity and their algorithmic design [3]. In the first, we find ego-only systems and connected systems; in the second, there are modular and end-to-end systems. The majority of the research focuses on modular systems, since this is an easier way to implement an AV. In this work, we present several proposals found in the literature from the point of view of modular systems. In addition, there is a short review of the most common sensors used for perception in AVs. Finally, we implement an existing model for segmenting different terrain types using a lightweight and fast network that can be used on mobile devices.

2. Related work

As mentioned before, there exist different classifications for system architecture in AVs; based on their connectivity, we find ego-only systems and connected systems. Ego-only systems are those in which a single self-sufficient vehicle carries out all the necessary automated driving operations at all times. In contrast, connected systems depend on other vehicles and infrastructure to make decisions. This last approach is still in an initial phase, but in the future, with the growing area of the Internet of Things (IoT), it will become possible. It is expected to have vehicle-to-vehicle (V2V) communication, vehicle-to-infrastructure (V2I) communication, and vehicle-to-everything (V2X) communication. A large amount of data will be available to vehicles, so more informed decisions can be taken; however, new challenges will arise, and AVs could become even more complex.

A second classification is based on the algorithmic design, where we find modular systems and end-to-end systems. In modular systems, autonomous driving is seen as a set of separate tasks to accomplish. Every module represents one task that can be solved separately, and then the results of each are integrated to form a complete system. However, this approach is prone to error propagation. On the other hand, in end-to-end systems, the whole set of modules is treated as a black box: in general, the system receives data from the sensors, and the output is the directions for the actuators of the vehicle.

In this work, we will discuss different proposals found in the literature based on modular systems. Some tasks of this classification are object/pedestrian detection, road detection, obstacle avoidance, terrain perception, mapping of the environment, and path planning.

2.1 Object/pedestrian detection

Object detection identifies and locates instances of objects in an image. In this task, when an object is detected, it is marked with a rectangular bounding box. Some general steps in object detection are preprocessing, Region of Interest (ROI) extraction, object classification, and localization. In the preprocessing step, some subtasks are performed, such as exposure and gain adjustment, camera calibration, and image rectification. Extracting regions of interest can also be implemented as a preprocessing step. Approaches that use ROI extraction usually have a higher computational cost, since the system becomes more complex; however, the results are better. Another disadvantage of ROI extraction is processing time; in modular systems, time is an essential consideration because other modules also need to be executed so that the vehicle can make and implement decisions in real time.

The most common approach is Deep Convolutional Neural Networks (DCNN). One of the best-known DCNNs is YOLO (You Only Look Once) [4]. YOLO works with a single neural network that predicts bounding boxes, confidence scores for those boxes, and a class probability map. This network processes 45 FPS, and there is a modified version that is faster, but its accuracy is lower. A different method is proposed in [5]; it consists of a multi-scale CNN. First, a proposal sub-network is used, followed by a detection sub-network. The proposal network could work as a detector, but it is not strong, since its sliding windows do not cover objects well. Thus, the reason to include a detection network was to increase detection accuracy.

Alternatively, Tabor et al. [6] not only apply their own initial implementation of a CNN but also consider aggregated channel features (ACF) and the deformable parts model (DPM). ACF uses a sliding-window approach in which candidate bounding boxes are considered at regular intervals throughout images. DPM uses a two-stage classification process to model parts of an object that move relative to each other and to the object centroid. Another approach is Region Proposal Networks (RPN) followed by boosted forests [7], which is a simpler but effective method. The RPN generates the candidate boxes as well as convolutional feature maps, while the Boosted Forest classifies the proposals using the convolutional features.

2.2 Road detection

There is no general definition of the road detection problem. Mei et al. [8] define the problem as "detecting the region in front of the robot that is mechanically traversable by the robot that is apt to be chosen by a human to drive." This definition can be applied to off-road environments, where there are no defined roads as in cities. Approaches in the literature usually consider the scenario, since some methods are more reliable in urban scenarios than off-road.

Lane and road boundary detectors have been proposed despite the lack of boundaries in some unstructured scenarios. Jiménez et al. [9] present a new algorithm based on this idea, using a laser scanner and a digital map when available. They apply two methods in parallel to increase the robustness of their results. The first method studies variations in the detection of each layer of the laser scanner; boundaries are detected when there are sidewalks higher than the road. The second method studies the separation between intersecting sections of consecutive laser scanner layers; the solution proposed in this method is to identify areas with constant radius differences within a predefined tolerance, which allows the roadway area to be determined.

Cameras are a common form of perception; some works consider the color model of the terrain surfaces and the illumination conditions to extract and segment roads [9]. The problem, in that case, is formulated as a joint classification. Moreover, Procházka [10] uses a Monte-Carlo algorithm to segment road regions, estimating the probability density function (PDF) of road pixels from a sequence of observations; the sequential Monte-Carlo estimation is the one that approximates the PDF. In contrast, Li et al. [11] combine camera information with laser data. They apply a preprocessing step to detect roads and then analyze texture features in grayscale images, while the laser sensors provide a traversable region near the front of the vehicle.

2.3 Obstacle avoidance

This task satisfies the objective of non-intersection or non-collision with objects, and it is closely related to path planning. Obstacle avoidance is a crucial aspect of autonomous driving; however, some researchers place more emphasis on optimizing the avoidance of crashes, while others only satisfy this task without solving it in the best form. Some important aspects to consider are the vehicle's characteristics, such as the turning radius and the velocity. Similar to some path planning proposals, some authors use cubic splines [12] to generate several candidate paths considering obstacles, and the best path is selected using optimization techniques. Other approaches use fuzzy algorithms [13] to control AVs, considering vehicle dynamics and the geometry of the obstacles.

2.4 Terrain perception

This task is a vital component of AVs in off-road environments. In contrast with cities, off-road scenes are more unstructured, and surfaces are not expected to be flat. AVs must be able to decide whether the terrain ahead is easily passable, passable with caution, or better avoided. Usually, the information processed in this task comes from images. Another widely used sensor is the laser; the information obtained with this kind of sensor helps build a 3D map of the scene to understand terrains with different altitudes.

Cameras are usually mounted on top of AVs, but in some cases, such as small robots, cameras view only the ground, so only one type of terrain is perceived at a time. In automobiles, the perspective is different; bigger pictures are obtained, also containing information from the sky. Some researchers use a more classical approach, where it is common to see feature extractors and classifiers. There are different forms of feature extraction; for example, Filitchkin and Byl [14] use a bag of visual words built from speeded-up robust features (SURF). Other works use local binary patterns (LBP) and local ternary patterns (LTP) [15]. Besides that, some approaches create a combination of features, for instance, color and edge directivity (CEDD) and fuzzy color and texture histogram (FCTH) [16].

A common classifier, used not only in computer vision tasks but also in other areas, is the Support Vector Machine (SVM), which is one of the most robust prediction methods in the literature; note, however, that this classifier uses supervised learning. Random forests have been found useful to classify asphalt, tiles, and grass with information from cameras and lasers [15]. A different approach is the use of CNNs [17, 18]; usually, no preprocessing steps are performed, and only RGB images are the inputs to the network. One disadvantage is the large amount of data needed to train this kind of network.
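As an illustration of the classical feature-extractor-plus-classifier pipelines mentioned above, the following sketch (not taken from any of the cited works) trains an SVM on local binary pattern histograms using scikit-image and scikit-learn; the image arrays and labels are placeholders.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def lbp_histogram(gray_image, points=8, radius=1):
    """Uniform LBP histogram as a simple texture descriptor for a terrain patch."""
    lbp = local_binary_pattern(gray_image, points, radius, method="uniform")
    bins = points + 2  # number of uniform patterns plus the non-uniform bin
    hist, _ = np.histogram(lbp, bins=bins, range=(0, bins), density=True)
    return hist

# Placeholder data: random grayscale patches standing in for labeled terrain images
# (0 = grass, 1 = asphalt in this toy example).
rng = np.random.default_rng(0)
patches = rng.random((20, 64, 64))
labels = np.array([0, 1] * 10)

features = np.array([lbp_histogram(p) for p in patches])
classifier = SVC(kernel="rbf").fit(features, labels)
print(classifier.predict(features[:2]))
```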

2.5 Mapping of the environment

Mapping presents a digital representation of the environment; it helps to decide on a safer path to follow. Usually, 2- and 3-dimensional (2D and 3D) information is used. Creating 3D maps can be computationally expensive and can increase processing time. Some approaches use a priori maps, in which the system compares real-time readings with previous data. The main disadvantage of a priori maps is changes in the environment; specifically, in off-road scenes it is difficult to have the same characteristics all the time, for example, due to vegetation growth.

Some representations used in this task are superpixels, stixels, and 3D primitives [19]. In pixel-based representations, each pixel is a separate entity; because of this, the complexity is higher for high-resolution images. Superpixels are groups of pixels used to reduce this complexity; these groups are obtained by segmenting the image into small regions that should be similar in color and texture. Stixels are presented as a medium-level representation of 3D traffic scenes with the goal of bridging the gap between pixels and objects; they are represented by a set of rectangular sticks standing vertically on the ground to approximate surfaces. 3D primitives are blocks of basic 3D geometric forms such as cubes, pyramids, and cones, among others.

With the help of 2D and 3D information obtained from LIDAR and other sensors, the system can acquire a sense of the geometric structure around the vehicle. One way to map the environment is by using semantic segmentation combined with CNNs [20]. The combination of neural networks with other approaches creates more robust methods than approaches that use a single algorithm. Nevertheless, there are sometimes difficulties in estimating the pose of the lasers, which is required for the proper registration of the range measurements. As a result, Parra-Tsunekawa et al. [21] proposed the use of an extended Kalman filter to estimate, in real time, the instantaneous pose of the vehicle and the laser rangefinders by considering various measurements acquired by different sensors.
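For intuition about this kind of sensor fusion, the following is a deliberately simplified one-dimensional linear Kalman filter sketch; the cited work uses an extended Kalman filter over the full vehicle pose, and the noise values and measurements here are made up.

```python
def kalman_1d(measurements, process_var=1e-3, measurement_var=0.05):
    """Fuse noisy 1-D readings into a smoothed estimate.
    Simplified linear case; an EKF would additionally linearize a nonlinear motion model."""
    estimate, error = 0.0, 1.0  # initial state estimate and its variance
    history = []
    for z in measurements:
        # Predict: the state is assumed constant here, so only the uncertainty grows
        error += process_var
        # Update: blend prediction and measurement according to their uncertainties
        gain = error / (error + measurement_var)
        estimate += gain * (z - estimate)
        error *= (1.0 - gain)
        history.append(estimate)
    return history

# Made-up noisy readings of a quantity near 1.0 (e.g., one pose component):
print(kalman_1d([1.1, 0.9, 1.05, 0.98, 1.02]))
```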

2.6 Path planning

In this task, the main goal is to find a geometric path from an initial point to an endpoint. Sometimes the vehicle dynamics are considered in the problem, even though this means the work can only be applied to vehicles with the same characteristics [22, 23]. A better approach is to address path planning with a general solution that is not tied to any specific vehicle [24]. There are two main approaches, global route planning and local path planning. Global planners search for routes from the origin to the final destination; some proposals focus on efficiency in real-time traffic, others can compute directions in milliseconds, and others consider space requirements. Local planners find trajectories in real time considering obstacles, and their objective is to complete the global route. Despite the different approaches, there is an ongoing controversy among some researchers about whether an AV should drive like a human or should look for the optimum path.

Some proposals use SVM [22] and Genetic Algorithms (GA) [25]; with these algorithms, other methods are usually applied in a first step, for instance, the A* algorithm, a typical graph search algorithm for pathfinding. Artificial Neural Networks (ANN) are also extensively used; the variants most common in autonomous driving include CNNs [23] and Fully Convolutional Networks (FCN) [24].
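To illustrate the A* graph search mentioned above, the following minimal sketch (our own illustration, not taken from the cited works) finds a path on a small 2D occupancy grid using 4-connected moves and a Manhattan-distance heuristic.

```python
# Minimal A* sketch on a 2D occupancy grid (0 = free, 1 = obstacle).
# 4-connected moves and a Manhattan heuristic; purely illustrative.
import heapq

def astar(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    heuristic = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(heuristic(start), 0, start, None)]   # (f-cost, g-cost, node, parent)
    came_from, g_cost = {}, {start: 0}
    while open_set:
        _, g, node, parent = heapq.heappop(open_set)
        if node in came_from:
            continue                                  # already expanded with a better cost
        came_from[node] = parent
        if node == goal:                              # reconstruct path back to the start
            path = [node]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return path[::-1]
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < g_cost.get((nr, nc), float("inf")):
                    g_cost[(nr, nc)] = ng
                    heapq.heappush(open_set,
                                   (ng + heuristic((nr, nc)), ng, (nr, nc), node))
    return None  # no path found

print(astar([[0, 0, 0], [1, 1, 0], [0, 0, 0]], (0, 0), (2, 0)))
```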

3. Sensing hardware

In this type of system, sensor redundancy is commonly used. Sensors are a necessary part of AVs; there are different devices to perceive the environment. Some of the most used are presented next.

  • Monocular vision: RGB images are usually obtained with cameras mounted on the front upper part of the vehicle. One of the most significant advantages is cost: cameras are cheaper than other devices, and the results are generally good. Nevertheless, this type of sensor is affected by weather and illumination. Several studies use cameras to perceive color, which is important in some tasks.

  • Light detection and ranging (LIDAR): This type of sensor is commonly seen in AVs; the data obtained can be helpful to achieve better success rates. LIDARs work by emitting light pulses and measuring the reflection to estimate distances to objects. Until a few years ago, their main disadvantage was their size and cost; however, sizes have recently started to decrease, and these sensors can now be found even in mobile devices like the iPhone 12. Nonetheless, these new smaller sensors do not have the same detection range as the bigger ones. LIDARs are generally used for mapping the environment and detecting objects.

  • Stereo imaging: This modality provides information similar to that of LIDARs. 3D data is obtained through two cameras and can be used for some of the basic tasks that a LIDAR can perform; the accuracy and reliability are not the same, but stereo imaging is cheaper.

  • Radio detection and ranging (RADAR): This kind of sensor works in the same way as LIDARs, with the difference that it uses radio waves instead of a laser and its resolution is lower. One of the main differences between these two sensors is that RADARs can detect objects at longer distances than LIDARs; neither is affected by illumination conditions.

  • Global positioning systems (GPS): GPS devices are commonly used in many systems, not only AVs. They communicate with several satellites to provide geographical information about where the receiver is located in the world. These devices provide precise information; however, there are scenarios where the signal can be lost, for example, in tunnels, tree-lined streets, or underpasses. In these scenarios, an Inertial Measurement Unit (IMU) is very important; it can improve the accuracy and help estimate the position of the vehicle.

  • Vehicle dynamics: In this category, we find all the sensors typically installed in vehicles. They perceive speed, yaw rate, and acceleration. These sensors are useful in the implementation of navigation control. Nevertheless, automobiles sometimes do not provide an easy way to obtain this information from the communication bus.

4. Datasets and evaluation metrics

Datasets are important to train and evaluate algorithms. In the literature, there are several known datasets to use on autonomous driving projects. Some of the most popular are PASCAL VOC [26], KITTI Vision Benchmark [27], MS-COCO [28], ImageNet [29], Berkeley DeepDrive [30], nuScenes [31], Oxford RobotCar [32], Waymo Open [33], and Cityscapes [34]. There are other small datasets like Freiburg Forest Dataset (FFD) [18], Hand-Labeled DARPA LAGR Datasets [35], and NREC Agricultural Person-Detection Dataset [36]. Every dataset contains a different structure, but in general, all have a set for training and others for evaluation. In the area of autonomous driving, the majority of datasets contain images, but there are others containing information from LIDAR sensors. Only a few contain other kinds of data like depth, near-infrared, radar, GPS, vegetation indexes, etc.

One of the most common metrics to evaluate classification algorithms is accuracy [Eq. (1)]. It is defined as the number of correct predictions over the total number of predictions made. In binary classification, it can also be calculated in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) [Eq. (2)]. In the case of imbalanced data, accuracy does not give a reliable picture of performance. In those cases, precision [Eq. (3)] and recall [Eq. (4)] are better metrics to use. The first metric answers the question: what proportion of positive identifications was actually correct? The second answers the question: what proportion of actual positives was identified correctly?

\text{Accuracy} = \dfrac{\text{Number of correct predictions}}{\text{Total number of predictions made}} \qquad (1)

\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN} \qquad (2)

\text{Precision} = \dfrac{TP}{TP + FP} \qquad (3)

\text{Recall} = \dfrac{TP}{TP + FN} \qquad (4)
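As a small sketch of how Eqs. (2)–(4) can be computed, the snippet below uses illustrative label and prediction arrays rather than any result from this work.

```python
# Sketch of Eqs. (2)-(4) for a binary classifier; labels and predictions are illustrative.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))
tn = np.sum((y_pred == 0) & (y_true == 0))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))

accuracy = (tp + tn) / (tp + tn + fp + fn)   # Eq. (2)
precision = tp / (tp + fp)                   # Eq. (3)
recall = tp / (tp + fn)                      # Eq. (4)
print(accuracy, precision, recall)
```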

A common metric used in object detection, road detection, and terrain perception is the Jaccard index, also known as Intersection over Union (IoU) [Eq. (5)]. This metric measures the similarity between two finite sets, in this case, the ground truth and the prediction. Depending on the task, the ground truth can be a bounding box or a mask. IoU is defined as the area of overlap between the ground truth and the prediction divided by the area of the union of both. The metric range goes from 0 to 1 (0–100%), where 0 means no overlap at all and 1 is a perfect overlap of masks.

\text{IoU} = \dfrac{\text{Area of overlap}}{\text{Area of union}} \qquad (5)
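A minimal sketch of Eq. (5) for two binary masks follows; the masks are illustrative arrays, not data from the experiments reported below.

```python
# Sketch of Eq. (5): IoU between a ground-truth mask and a predicted mask
# (boolean NumPy arrays of the same shape); masks here are illustrative.
import numpy as np

def iou(gt_mask, pred_mask):
    intersection = np.logical_and(gt_mask, pred_mask).sum()
    union = np.logical_or(gt_mask, pred_mask).sum()
    return intersection / union if union > 0 else 1.0  # both empty: perfect overlap

gt = np.zeros((4, 4), dtype=bool); gt[1:3, 1:3] = True
pr = np.zeros((4, 4), dtype=bool); pr[1:3, 2:4] = True
print(iou(gt, pr))  # 2 overlapping pixels / 6 pixels in the union = 0.333...
```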

There are other forms of evaluation proposed by several researchers that are not standardized metrics. In some papers on obstacle avoidance, not only whether the vehicle hits an obstacle is evaluated but also the distance of the AV to surrounding objects. For path planning, the evaluation can be very subjective since there is no exact and unique path to follow. An important consideration is the mechanical characteristics of the AV: not all vehicles can traverse the same roads, for example, an all-terrain vehicle compared to a commercial automobile or a military vehicle. Mei et al. [8] proposed a metric called mechanical traversability, defined as the percentage of extracted road pixels that are mechanically traversable. Another form of evaluating an AV is proposed by Bojarski et al. [23], which measures the percentage of autonomy of the vehicle [Eq. (6)]:

\text{autonomy} = \left(1 - \dfrac{\text{number of interventions} \cdot 6\ \text{seconds}}{\text{elapsed time}}\right) \cdot 100 \qquad (6)

They assumed that human intervention in an AV would require 6 seconds to take control of the vehicle, re-center it, and restart the autonomous mode. The elapsed time is the total time in seconds of the simulated test.
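A one-function sketch of Eq. (6) is shown below; the intervention count and elapsed time are illustrative numbers.

```python
# Sketch of Eq. (6): percentage of autonomy as defined by Bojarski et al. [23].
def autonomy(num_interventions, elapsed_time_s, penalty_s=6):
    """elapsed_time_s: total simulated test time in seconds."""
    return (1 - (num_interventions * penalty_s) / elapsed_time_s) * 100

print(autonomy(num_interventions=10, elapsed_time_s=600))  # 90.0% autonomous
```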

5. Our proposal

Our proposal is focused on the terrain perception stage for off-road environments. Our approach uses an existing public dataset, the Freiburg Forest Dataset (FFD) [18], to train a convolutional neural network to segment five different classes in daylight under good weather conditions.

FFD is an open dataset that contains multi-modal/spectral images. There are 230 training images and 136 validation images. It also contains manually annotated pixel-wise ground truth segmentation masks. Besides RGB images, the other modalities included are two vegetation indexes, the Normalized Difference Vegetation Index (NDVI) and the Enhanced Vegetation Index (EVI), as well as Near-Infrared (NIR) and depth data. The five classes in this dataset are Obstacle, Trail, Sky, Grass, and Vegetation. All the data was captured at 20 Hz with a camera resolution of 1024 × 768 pixels (Figure 1).

Figure 1.

Sample image from the Freiburg Forest dataset with its ground truth mask.

The semantic segmentation was achieved using convolutional neural networks. For this step, we selected DeepLab [35], a model created by Google to perform semantic segmentation. This model is commonly used to segment objects like persons, vehicles, animals, etc. There are different versions of DeepLab; the latest, v3+, implements an encoder-decoder structure with an atrous spatial pyramid pooling (ASPP) module. DeepLab supports different network backbones such as Xception [36], MobileNet [37], ResNet [38], and PNASNet [39]. Besides, there are pretrained checkpoints that can be retrained with different data.

For this work, we selected MobileNet as the backbone due to its fast and lightweight structure. We selected two checkpoints: the first is pretrained on ADE20K and the second on MS-COCO. The main reason for selecting these two checkpoints is the content of those datasets: both contain labels related to off-road environments, such as tree, sand, and ground, among others. Since the data is in a different format, all the RGB images and the PNG ground truth images were transformed into TFRecord format.
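As a rough illustration of this conversion step, the sketch below packs RGB/mask file pairs into a TFRecord file with TensorFlow; the file names and feature keys are illustrative placeholders and may differ from the exact keys expected by a given DeepLab version.

```python
# Sketch: packing RGB image / ground-truth mask pairs into a TFRecord file.
# File names and feature keys are illustrative, not necessarily those used in this work.
import os
import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def write_tfrecord(pairs, output_path):
    """pairs: list of (rgb_path, mask_path) tuples pointing to image files on disk."""
    with tf.io.TFRecordWriter(output_path) as writer:
        for rgb_path, mask_path in pairs:
            with open(rgb_path, "rb") as f_rgb, open(mask_path, "rb") as f_mask:
                example = tf.train.Example(features=tf.train.Features(feature={
                    "image/encoded": _bytes_feature(f_rgb.read()),
                    "image/segmentation/class/encoded": _bytes_feature(f_mask.read()),
                    "image/filename": _bytes_feature(os.path.basename(rgb_path).encode()),
                }))
            writer.write(example.SerializeToString())

# Hypothetical usage: write_tfrecord([("img_0001.jpg", "mask_0001.png")], "train.tfrecord")
```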

All the training and evaluation were run in Python on a Laptop with a Core i7-8750H and an NVIDIA GTX 1050Ti GPU.

6. Results

In this work, we study several perspectives on how to approach the problem of autonomous driving. As mentioned before, the two main approaches are modular and end-to-end. Our work is based on the task of terrain perception. Our approach was to apply transfer learning to retrain an existing model to segment different terrains. Two checkpoints were selected, and five classes were segmented: (0) Object, (1) Vegetation, (2) Sky, (3) Soil, and (4) Grass.

We use Intersection over Union as the metric to evaluate the performance of our approach. The mean IoU (mIoU) is also presented, but it is not an ideal way to evaluate since it does not consider how often each class appears in the data, so rare classes can skew the average. To present a more representative metric, we computed the weighted IoU, which averages the IoU of each class weighted by the number of pixels belonging to that class.
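The small sketch below contrasts the two aggregations; the per-class IoU values and pixel counts are illustrative, not those of Table 1.

```python
# Sketch contrasting mean IoU with pixel-weighted IoU; the per-class IoU values
# and pixel counts below are illustrative placeholders.
import numpy as np

iou_per_class = np.array([0.20, 0.85, 0.79, 0.70, 0.76])
pixels_per_class = np.array([1e4, 5e6, 2e6, 1e6, 3e6])   # class frequency in the data

miou = iou_per_class.mean()
weighted_iou = np.average(iou_per_class, weights=pixels_per_class)
print(miou, weighted_iou)   # weighting reduces the impact of the rare class 0
```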

Table 1 presents the results for each class. As can be seen, the best overall results were obtained by the checkpoint pretrained on MS-COCO, with a weighted IoU of 83.88%. We believe the reason is the larger number of images containing off-road scenes in that dataset. Nevertheless, class 0 (objects) was not detected at all in FFD_MS-COCO. For that specific class, the best results were obtained with FFD_ADE20K, with an IoU of 20.45%. It is important to mention that this class is not present in every image, so the network does not have enough examples to learn from and give better predictions.

Class | ADE20K (%) | MS-COCO (%)
Class 0 | 20.45 | 0
Class 1 | 85.00 | 85.57
Class 2 | 78.83 | 89.46
Class 3 | 69.77 | 75.07
Class 4 | 75.70 | 80.10
mIoU | 65.95 | 66.04
Weighted IoU | 79.67 | 83.88

Table 1.

IoU comparison results.

Figure 2 shows some qualitative results; the first column shows the original images, the second column the ground truth masks, and the remaining columns the results of the two experiments. As shown, in some of the FFD_ADE20K results the bottom part of the image was detected as sky, and in others as soil. This problem persisted in most of the images; it also occurred with FFD_MS-COCO, but only in a few images.

Figure 2.

Qualitative results.

7. Conclusions

This research presents different approaches found in the literature on modular systems for autonomous vehicles. We focused on modular systems since they are a more tractable way to address the problem of autonomous driving, and one of their main advantages is redundancy. This type of system needs to be redundant and reliable since errors can have dangerous consequences, including human fatalities. Alternatively, end-to-end systems have received increasing attention in recent years. More proposals using this approach are expected in the future; however, at this moment these systems have more limitations, such as the lack of hard-coded safety measures.

Our approach is based on the terrain perception stage of modular systems. We selected an existing model that obtains results similar to those found in the literature. Our model is lightweight enough to run on mobile devices while remaining sufficiently robust. Two checkpoints were compared, with the best obtaining a weighted IoU of 83.88%.

We expect to improve the semantic segmentation results by augmenting the dataset in future experiments so the network has more data to learn from. Further research could explore different parameters and hyperparameters of the model and their influence on the results. Since the selected architecture is oriented to run on mobile devices, we aim to implement and test video segmentation on smartphones.

Author details

Ethery Ramírez-Robles* and Oleg Starostenko

Department of Computing, Electronics and Mechatronics, Universidad de las Américas Puebla, Puebla, México

*Address all correspondence to: ethery.ramirezrs@udlap.mx

Analysis of Voice and Magnetic Resonance Images to Assist Diagnosis of Parkinson’s Disease with Machine Learning

Gabriel Solana-Lavalle and Roberto Rosas-Romero

Abstract

Parkinson’s disease (PD) is a chronic neurodegenerative disease that affects 1% of the population and whose diagnosis is considered one of the most challenging in the area of neurology. The goal of our work is to assist physicians with the correct diagnosis and early detection of PD. This chapter provides a review of previous work on PD detection under two perspectives, voice analysis and Magnetic Resonance Imaging analysis, by comparing our work with those from other authors. For the case of voice-based PD detection, accuracy reaches 95.9% in female patients and 94.36% in male patients on the largest available dataset. Another contribution in this area is the analysis of voice features to assist the clinical interpretation of the binary result of voice-based detection. For the case of structural Magnetic Resonance Imaging (sMRI)-based PD detection, detection accuracy reaches 96.97% for female patients and 99.01% for male patients using the Parkinson’s Progression Marker Initiative dataset. We provide a discussion about the finding of new regions of interest to assist in the detection of PD on sMRI. There is also a comparison between voice-based and MRI-based PD detection methods. Finally, a perspective on future work for PD detection is discussed.

Keywords:Parkinson’s disease, machine learning, biomedical engineering, magnetic resonance imaging, voice analysis, diagnostic tool

1. Introduction

Parkinson’s disease (PD) is a chronic neurodegenerative disease that affects over six million people worldwide. Because PD is most common in people over the age of 50, the number of PD patients is expected to double by 2040 due to the increase in life expectancy [1]. The loss of dopaminergic cells in the substantia nigra region of the brain reduces the amount of dopamine in PD patients, causing dyscontrol in several areas of the brain. Some of the main symptoms of PD are motor symptoms, such as tremors, rigidity, and slow movement (bradykinesia). These symptoms, however, become apparent at an intermediate-advanced stage of the disease, when the patient may have had the disease for over ten years [2].

The diagnosis of PD is considered one of the most challenging in the area of neurology. Autopsies of the brains of PD patients have shown that 35% of the cases clinically diagnosed with PD were incorrect [3]. Usually, the diagnosis is made by a physician who looks for cardinal symptoms of the disease and starts dopaminergic therapy as a differential diagnosis. However, these symptoms appear at a late stage, and the patient may have lived with PD for years. This, added to the similarity to other parkinsonian disorders that in some cases share the same motor symptoms as PD, may cost the patient crucial time and money, as inadequate treatments could be given by physicians. On the other hand, if detected in time, PD patients can improve their quality of life by taking the correct medication and therapy [2].

Great efforts are being made to find biomarkers that shed some light on the causes and development of PD. Advances in technology provide alternatives to help physicians correctly diagnose PD patients at an early stage and, at the same time, obtain relevant information for understanding the disease. As shown in this article, non-invasive techniques such as medical images and voice recordings, combined with machine learning and signal processing, have proven to be adequate tools for solving the problem of PD detection with great accuracy.

The interest in using voice recordings for PD detection comes from the knowledge that voice disorders are prodromal symptoms present in over 90% of PD patients at an early stage. Some of the alterations include dysphonia (defective use of the voice), hypophonia (reduced vocal loudness), and imprecise hypokinetic articulation [4]. The advantages of this method are that voice recordings can be obtained without going to a hospital, and the economic cost is low, among others.

On the other hand, medical imaging techniques are important tools for understanding and helping diagnose PD. The images give information about neuro-anatomical and pathophysiological processes related to the disease [5]. Some of the most used imaging techniques for neurological disease detection are DaTscan, Magnetic Resonance Imaging (MRI), and Diffusion Tensor Imaging (DTI). DaTscan images detect the concentration levels of dopamine in different regions of the brain, but the availability and cost of the studies may be prohibitive for patients. Structural Magnetic Resonance Imaging (sMRI) is a technique that provides structural information of the tissues and connectivity of the brain. It is available in most countries and is economically viable compared to other studies.

Work on voice-based and sMRI-based detection of PD and their clinical interpretation is reviewed in this article. The rest of the article is structured as follows: Section 2 presents voice-based analysis, including the database characteristics, classification results, and clinical interpretation of the extracted features. Section 3 introduces the analysis of sMRI of PD patients, the detection performance, and regions of interest for the diagnosis of both female and male patients. Section 4 gives conclusions and future work on the area of PD detection.

2. Voice-based analysis applied to the diagnosis of Parkinson’s disease

In general, PD detection based on voice analysis consists of two stages: feature extraction and classification; however, to train classifiers, two additional stages are used: feature selection and performance assessment. The main reasons to analyze voice for PD detection are: (1) voice-based analysis is a low-cost and non-invasive technique; (2) speech problems start at early stages of the disease, so voice analysis is appropriate for early detection; and (3) we have conducted research so that the detection of PD is extended to provide clinicians with quantitative information to help in the understanding of a binary result [6]. In the following, a description of voice-based PD detection is given, and the advantages of conducting separate tests for men and women are highlighted. Another issue to explain is the importance of the size of the dataset used when conducting separate PD detection experiments. It is also explained that the features contributing most to a high detection performance are those obtained with extraction processes that resemble the way the auditory system works.

The first step in the analysis of voice recordings consists of the extraction of features. Different groups of features have been used by researchers, of which baseline features are the most common. Baseline features include jitter, shimmer, and detrended fluctuation analysis (DFA), among others, and are the most traditional set of features. Other commonly used features are Mel Frequency Cepstral Coefficients (MFCC), Wavelet transform, and Tunable Q-factor Wavelet Transform (TQWT) features. These features are obtained by using banks of filters, which extract information over multiple frequency bands at different bandwidths, so that the higher the frequency content, the larger the bandwidth. Among all the features, a reduced set of relevant features is obtained through a selection process where correlated features are eliminated. An observation within the used dataset is characterized by 754 features extracted from voice recordings. These features belong to six groups; however, in our work, two of these groups were not relevant for the classification of voice recordings.
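As an illustration of this kind of filter-bank feature extraction, the following sketch computes MFCCs over 25 ms frames and averages them into a single feature vector. It assumes the librosa library and a hypothetical recording file; the parameters are illustrative and not those of the dataset's original extraction pipeline.

```python
# Sketch: extracting MFCCs from a sustained /a/ recording and averaging them over
# 25 ms frames; file name and parameter values are illustrative.
import librosa
import numpy as np

y, sr = librosa.load("vowel_a.wav", sr=44100)            # hypothetical recording
frame = int(0.025 * sr)                                  # 25 ms analysis window
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=frame, hop_length=frame)
feature_vector = mfcc.mean(axis=1)                       # average over frames
print(feature_vector.shape)                              # (13,)
```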

The classifiers are trained with the sets of selected relevant features. In our work, we have used four different classifiers. The classification result is binary, since the subject under analysis is identified either as a patient with PD or as not having PD. The different stages within the methodology (feature extraction, feature selection, classification, and interpretation) are shown in Figure 1.

Figure 1.

Main stages of voice-based PD detection: feature extraction, feature selection, classification, and clinical interpretation.

3. Dataset

Most of the previous works have conducted PD detection by using a population of patients and controls without separate studies for female and male subjects. One reason for not conducting separate studies has to do with dataset size, which is usually not large enough for such separate analyses. However, the work by Sakar et al. [7] has provided the research community with the largest voice-based dataset publicly available so far. This dataset was built from 756 voice recordings, with 754 voice features extracted from each recording by using different signal processing techniques. The recordings were obtained from 107 male patients, 81 female patients, and 64 controls. This number of observations within the dataset is high enough to obtain statistically relevant results after partitioning it into two datasets according to gender. A total of 252 individuals were involved in the generation of this dataset. The involvement of each individual consisted of pronouncing the vowel /a/ three times in front of a microphone sampling at 44.1 kHz. Each recording lasts 220 seconds (9,702,000 samples per recording). Each recording was divided into frames of 25 ms to conduct stationary signal processing for feature extraction. Feature vectors from different frames were averaged. Six signal processing techniques were applied for feature extraction. This dataset is found in the Machine Learning Repository of the University of California, Irvine. It was generated at the Department of Neurology, Cerrahpaşa Faculty of Medicine, Istanbul University, from 188 PD patients (107 men and 81 women) with an age range between 33 and 87 years old and from 64 controls (23 men and 41 women) with an age range between 41 and 82 years old.

4. Classification results

Feature selection was applied to reduce the dimensionality of the feature vectors. Feature selection was conducted by running Wrappers feature subset selection, which results in an optimal subset of features for a specific classifier. Feature subset selection was accomplished for each classifier. The most relevant groups of features selected by Wrappers were the Mel Frequency Cepstral Coefficients (MFCC) and the Tunable Q-factor Wavelet Transform (TQWT) features. MFCCs are based on the way the human auditory system works. The computation of MFCCs involves the use of multiple band-pass filters whose bandwidth increases as the central frequency becomes higher. In the work by Sakar et al. [7], the two most relevant groups of features were TQWT and MFCCs, and the work by Solana-Lavalle et al. [8] is also based on these features.

Multiple classifiers have been applied to the problem of voice-based PD detection, such as k-Nearest Neighbors (kNN), the Multi-Layer Perceptron (MLP), Random Forest (RF), and the Support Vector Machine (SVM). However, after conducting separate studies for male and female populations, it was found that the classifiers with the highest detection performance were (1) the SVM with a radial basis function (RBF) kernel and (2) the MLP [6]. The highest accuracy reported is 94%, which is a considerable improvement over previous works that used the same dataset [7, 8]. In addition, the complexity of the last reported model has been reduced from 50 to only 20 features.
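The following sketch illustrates wrapper-style feature subset selection around an RBF-kernel SVM. It uses scikit-learn's SequentialFeatureSelector on synthetic data as a stand-in for the Wrappers procedure and the voice features described above, so it is an approximation rather than the authors' exact implementation.

```python
# Sketch: wrapper-style feature subset selection around an RBF-kernel SVM,
# approximated with scikit-learn's SequentialFeatureSelector on synthetic data.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=240, n_features=100, n_informative=15,
                           random_state=0)               # stand-in for voice features

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
selector = SequentialFeatureSelector(svm, n_features_to_select=20,
                                     direction="forward", cv=5)
selector.fit(X, y)

X_sel = selector.transform(X)                             # 20 selected features
print(cross_val_score(svm, X_sel, y, cv=5).mean())        # accuracy with the reduced set
```

The forward-selection wrapper re-trains the classifier for every candidate feature, which is expensive but, as in the reviewed works, tailors the selected subset to the specific classifier being used.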

5. Classification results in male and female populations

In the work by Tsanas et al. [9], it was claimed that PD detection would be more adequately addressed by conducting separate analyses on male and female subjects. At that time (2012), such experiments were not possible due to the reduced size of publicly available datasets. However, an adequate number of recordings for separate statistical studies has been available since the introduction of the dataset by Sakar et al. [7].

In the first work by Solana-Lavalle et al. [8], the datasets for PD detection on male and female subjects were unbalanced, i.e., the number of PD patients was greater than the number of controls. Experiments with balanced sets of PD patients and controls were later conducted by Solana-Lavalle et al. [6] with interesting results: it was found that detection performance increases if balanced datasets are used to train and test the classifiers.

A comparison of the different methods of voice-based PD detection proposed by the research community is shown in Table 1. It is observed that different datasets have been analyzed; however, the largest one is the dataset introduced by Sakar et al. [7]. The method that achieves the highest detection performance on the largest dataset is the one proposed by Solana-Lavalle et al. [6]. In addition to reaching the highest detection performance, this method is characterized by the smallest feature vector size.

Author, year | Dataset | Results
Peker, 2016 [10] | 195 sound measurements from 8 healthy people and 23 PD patients | Accuracy = 0.99, sensitivity = 0.96, specificity = 1
Guruler, 2017 [11] | 195 sound measurements from 8 healthy subjects and 23 with PD | Accuracy = 0.99, sensitivity = 1, specificity = 0.99, F1 score = 0.99
Sakar et al., 2017 [12] | 42 patients with PD and 8 healthy controls | Accuracy = 0.96, MCC = 0.77
Braga et al., 2019 [13] | 22 speakers with PD and 30 healthy speakers | Accuracy = 0.99 for RF classifier
Sakar et al., 2013 [4] | 20 patients with PD and 20 healthy individuals | Accuracy = 0.85, sensitivity = 0.85, specificity = 0.9
Raza et al., 2020 [14] | 195 voice samples from 8 healthy people and 23 PD patients | Accuracy = 0.97
Vital et al., 2021 [15] | 1200 voice samples from 51 healthy people and 62 PD patients | Accuracy = 1
Peker et al., 2015 [16] | 195 sound measurements from 23 PD patients and 8 healthy people | Accuracy = 1, sensitivity = 1, specificity = 1
Tsanas et al., 2011 [17] | 10 healthy controls and 33 patients with PD | Accuracy = 0.977 and accuracy = 99.03
Montaña et al., 2018 [18] | 27 healthy controls and 27 patients with PD | Accuracy = 0.944
Sakar et al., 2019 [7] | 756 voice recordings from 64 healthy individuals and 188 patients with PD | Accuracy = 0.86, MCC = 0.59
Proposed approach | 756 voice recordings from 64 healthy individuals and 188 patients with PD | Accuracy = 0.947, sensitivity = 0.984, specificity = 0.9268, precision = 0.9722, false alarm rate = 0.0277, MCC = 0.8686

Table 1.

Voice-based Parkinson’s disease detection.

6. Clinical interpretation

The binary output from a classifier implies that a clinician will need further tests to gather strong evidence when deciding whether the patient presents PD or not. Thus, a deeper quantitative analysis of the results must be carried out if the binary results from multiple voice-based tests are contradictory. For this reason, the work by Solana-Lavalle et al. [19] provides an analysis of the most important features used to classify a subject as a PD patient or a control. By using Principal Component Analysis (PCA), the features with the highest contribution to the detection of PD were obtained and analyzed. It was found that the features that best explain the diagnosis result in female subjects are related to higher frequencies, such as the 32nd and 33rd TQWT coefficients. On the other hand, for male subjects, the features with the highest contribution to PD detection are related to lower frequencies, such as the fifth TQWT coefficient. The mean and the standard deviation of the most important features were computed for the PD patients and controls, and a comparison (PD patients vs. controls) is made using box plots.

According to the box plots, there is a clear separation between both groups in most cases. This analysis could help the physician, during the interpretation of a binary result, to understand how affected the voice is and the likelihood that a patient belongs to one group or the other.

7. Analysis of MRI to assist the diagnosis of Parkinson’s disease

Medical images are an important tool to assist the detection and track the progression of neurodegenerative diseases. For PD detection, structural Magnetic Resonance Imaging (sMRI) provides relevant information on the thickness and structure of brain tissues. A quantitative analysis is recommended to assist the visual interpretation of the physician [20, 21]. When working with sMRI, some parameters should be taken into account, including the strength of the magnetic field (measured in teslas), contrast, noise, and relaxation times (T1 and T2), among others. These factors may vary depending on the characteristics of the equipment. The work by Solana et al. [6] aims to identify the regions of the brain that are affected by the disease. It shows how different regions of the brain contribute to the classification, depending on the gender of the patient and the strength of the magnetic field (1.5 T or 3 T).

Voxel-based morphometry (VBM) is a technique to determine differences in local concentrations of gray matter by comparing MRI voxels between two templates or atlases, where an atlas or template represents a group of subjects. For PD detection, one group corresponds to PD patients and the other to controls. To apply VBM, images are extracted from multiple individuals; then these images are registered and integrated to generate a brain atlas that represents that particular group of individuals. This study is useful since PD patients are characterized by a decrease in gray matter volume when compared with controls. The motivation for applying VBM to PD detection is to identify regions of interest for subsequent classification.

According to reported research efforts, VBM-based PD detection from MRI consists of the following stages: (1) VBM to identify regions of interest, (2) feature extraction from regions of interest, (3) selection of the most relevant features for subsequent classification of regions of interest, (4) classification, and (5) performance assessment. The different stages for VBM-based PD detection are shown in Figure 2.

Figure 2.

Main stages of VBM-based PD detection from MRI.

8. Dataset description

To conduct VBM-based PD detection on MRI on separate datasets, one for men and another for women, the largest publicly available collection of MR images is the Parkinson’s Progression Markers Initiative (PPMI) dataset. This dataset is the result of collecting clinical data (including images) for PD research around the world. Clinical data include genomics, patient data, and imaging data. The PPMI dataset is publicly shared to accelerate research discoveries and to assist the treatment and diagnosis of PD. PPMI’s T1-weighted MR images have been applied to VBM-based PD detection. T1-weighted MR images were generated using a 1.5–3 T scanner with (1) a scanning time between 20 and 30 min, (2) a slice thickness of 1.5 mm or less, and (3) three different views: axial, sagittal, and coronal. The MR images were obtained from 226 men with PD, 86 male controls, 104 women with PD, and 64 female controls.

9. Classifiers

For classification over the regions of interest, detected with VBM, texture information is very useful since the measurement of texture requires statistical analysis to determine how voxel intensity values are distributed. Texture measurement involves the computation of the first-order and second-order statistics of the regions of interest.

The number of features extracted from one atlas is very large because of the number of regions of interest, the number of directions for second-order statistics (co-occurrence matrix), the number of views (sagittal, axial, coronal), the number of different first-order statistics features, and the number of second-order statistics features. Thus, Principal Component Analysis and Wrappers were applied to detect the most relevant features for the classification of regions of interest.
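As a rough illustration of these texture features and the subsequent dimensionality reduction, the sketch below computes first-order histogram statistics and second-order co-occurrence-matrix statistics for a single region and then applies PCA. It assumes scikit-image and scikit-learn, and a random 8-bit patch stands in for a real sMRI region of interest.

```python
# Sketch: first- and second-order texture statistics for one region of interest,
# followed by PCA; the region is a random patch and all parameters are illustrative.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
roi = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)    # stand-in for an sMRI ROI

# First-order statistics from the intensity histogram
first_order = [roi.mean(), roi.std(), np.median(roi)]

# Second-order statistics from the gray-level co-occurrence matrix (4 directions)
glcm = graycomatrix(roi, distances=[1],
                    angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                    levels=256, symmetric=True, normed=True)
second_order = [graycoprops(glcm, prop).mean()
                for prop in ("contrast", "homogeneity", "energy", "correlation")]

features = np.array(first_order + second_order)
print(features)

# With feature vectors from many subjects stacked row-wise, PCA reduces dimensionality:
X = rng.normal(size=(50, features.size))                     # placeholder feature matrix
X_reduced = PCA(n_components=3).fit_transform(X)
print(X_reduced.shape)
```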

10. Regions of interest

From the results of applying VBM to brain MRI, it has been found that the regions of interest for PD detection in men are the basal ganglia, brainstem, fourth ventricle, lateral ventricle, cerebellum, frontal lobe, temporal lobe, putamen, and thalamus. Signals for involuntary movement and instincts are generated within the putamen and thalamus. Other regions of interest lie in the upper cortex, which is related to brain functions such as reasoning and decision making. On the other hand, the application of VBM to female brain MRI shows that the regions of interest for PD detection in women are the occipital lobe, basal ganglia, a small part of the cerebellum, frontal lobe, thalamus, brainstem, and temporal gyrus. The last three regions are associated with visual stimuli processing and spatial awareness. Regions of interest within the cortex area are not as large as those in men. These results are significant since most works on automated PD detection have focused on the striatum region of the brain to detect damage. Another finding is that the regions of interest in men are bigger than those in women, which agrees with medical findings stating that men are almost twice as prone to PD as women. It is also found that there are more regions of interest in women than in men and that the regions of interest in men and women are generally scattered over the same brain zones; regions of interest in men are found within areas where more, smaller regions of interest occur for women. The number of selected features for PD detection in women is smaller than in men.

Another finding from Solana et al. [6] is that the regions of interest from which most features are selected for PD detection vary if the image is acquired with a different magnetic field. When the scanner uses 1.5 T to obtain the MRI images, features from the striatum region of the brain are chosen by the classification algorithm. On the other hand, when 3 T MRI are analyzed, features from regions like the primary somatosensory cortex, the cerebellum, and the temporal lobe are selected, as shown in Figures 3 and 4. Detection of PD with MRI achieves good performance for both genders and magnetic fields. When classifying female patients’ MRI, accuracies of 96.77% and 93.28% were obtained for 1.5 and 3 T, respectively. For male patients’ MRI, excellent results were obtained, with 99.01% and 95.56% accuracy for 1.5 and 3 T, respectively. Table 2 shows the results obtained by different methods in recent years and how they compare to the proposed work.

Figure 3.

The most relevant regions of interest for PD detection in 1.5 and 3 T MRI of female patients are highlighted in red, yellow, and green.

Figure 4.

The most relevant regions of interest for PD detection in 1.5 and 3 T MRI of male patients are highlighted in red, yellow, and green.

Author, year | Dataset | Performance
Long et al., 2012 [22] | MRI from 19 PD patients and 27 healthy subjects | Accuracy 86.96%, sensitivity 78.95%, specificity 92.59%
Lei et al., 2018 [23] | PPMI MRI dataset | Accuracy 86.48%
Sivaranjini et al., 2020 [24] | CNN | Accuracy 88.9%
Esmaeilzadeh et al., 2018 [25] | PPMI MRI dataset and personal information (age, gender) | Accuracy 100%
Shah et al., 2018 [26] | PPMI MRI dataset | Accuracy 93%
Salvatore et al., 2014 [27] | MRI from 28 PD patients and 28 healthy controls | Accuracy, sensitivity and specificity above 90%
Shinde et al., 2019 [28] | Neuromelanin MRI | Accuracy 85%
Amoroso et al., 2018 [29] | PPMI MRI dataset | Accuracy 93%, sensitivity 93%, specificity 92%
Proposed method | PPMI MRI dataset | Accuracy 99.01% (men) and 96.97% (women), sensitivity 99.35% (men) and 100% (women), specificity 100% (men) and 96.15% (women)

Table 2.

A comparison between different works on PD detection based on MRI.

11. Conclusions

Parkinson’s disease (PD) detection is an active area of research. These efforts are oriented to help provide a better quality of life for PD patients. Voice-based detection of PD is a non-invasive and inexpensive alternative for the early detection of the disease. According to neurology studies, the female brain and the male brain are functionally different, and this is the motivation to conduct separate studies according to gender. Fortunately, the availability of large datasets allows such research efforts. The work by Solana-Lavalle et al. [8, 19] is based on the largest publicly available dataset to train and test different classifiers so that separate studies for male and female patients can be carried out. Experimental results show that the most relevant features for accurate classification are highly dependent on gender. In the case of male patients, low-frequency voice content is the most significant, while for female patients, high frequencies give better results. Most of the features selected in the feature selection process are extracted using the Tunable Q-factor Wavelet Transform (TQWT) and the Mel Frequency Cepstral Coefficients (MFCC). Both groups of features are obtained through the use of banks of filters, so these extraction mechanisms operate in a way similar to the human auditory system. The accuracy obtained by the classifying algorithms reaches up to 95.9%, showing the best results with the male population. Also, a statistical analysis of the variability of the most significant features for each gender is done to assist the clinical interpretation of the classification result (PD positive or PD negative).

Another method to detect neurological alterations is through medical images such as Magnetic Resonance Imaging, DaTscan, and Diffusion Tensor Imaging. Physicians have used these image modalities to help diagnose PD. However, they rely on visual inspection, which is prone to misdiagnosis due to human error. For this reason, a quantitative analysis of these images is suggested. Solana et al. [6] proposed a method that uses structural MRI combined with signal processing and machine learning classifiers to assist the diagnosis of PD. This method achieves competitive results and insights. The classification results deliver an accuracy of 99.01% in male patients and 96.97% in female patients.

Voxel-based morphometry is a statistical study that has been used to identify brain regions that show differences between PD patients and controls. Features based on first-order (histogram) and second-order statistics (co-occurrence matrix) have been extracted from the regions of interest identified by VBM. Since the number of features extracted from multiple regions of interest is very large, feature selection techniques such as wrapper-based feature subset selection have to be used. The aim of using feature subset selection is to identify the most important features for discrimination and to reduce computational complexity. Regions of interest for PD detection usually include the striatum. However, by using feature subset selection, it has been possible to identify several regions outside the striatum, suggesting an affectation in those areas of the brain. These regions include the somatosensory cortex, temporal gyrus, and cerebellum.

Future work on the detection of PD could make use of other imaging techniques such as Functional Magnetic Resonance Imaging and Diffusion Tensor Imaging, which provide information about the activity within the brain and about the connectivity of the brain, respectively. Thus, these modalities are good candidates to provide new information about PD and an alternative to assist physicians with the early detection of the disease. On the other hand, some of the best classification results on voice recordings are obtained using deep learning techniques, which demand the availability of a larger dataset. To the best of our knowledge, deep learning has not been applied to the largest dataset from Sakar et al. [7], and this could be an opportunity to compare these new learning techniques with classical approaches.

Acknowledgements

The authors would like to acknowledge the support of the National Council for Research and Technology (CONACYT) in Mexico (Scholarship 934454 and stimulus 68150).

Author details

Gabriel Solana-Lavalle* and Roberto Rosas-Romero

Departamento de Computación, Electrónica y Mecatrónica, Universidad de las Américas Puebla, San Andrés Cholula, Puebla, Mexico

*Address all correspondence to: gabriel.solanale@udlap.mx and roberto.rosas@udlap.mx

A Systematic Review of Sensitivity Analysis of Activated Sludge Modeling

Rafael Andrés Borobio-Castillo, José Manuel Cabrera-Miranda, Alberto Vargas-Hidalgo and Benito Corona-Vásquez

Abstract

A series of sensitivity analyses have been performed on activated sludge models for wastewater treatment. A comparison is presented for both local and global approaches, and the most used methods are reported. It is observed that sensitivities depend on the modeling objectives. Furthermore, local methods are applicable only to linear models; thus, the global ones are often preferred. Due to the current wastewater resource recovery trend, more sensitivity analyses regarding phosphorus removal and model refinement will be required. Finally, knowledge gaps are identified in association with uncertainty in the influent fractions and variance-based methods for factor interaction. Sensitivity analyses are quality assurance tools that, if applied properly, are expected to improve the understanding of complex phenomena as well as decision making.

Keywords:activated sludge models (ASM), benchmark simulation model (BSM), membrane bioreactor (MBR), uncertainty, sensitivity analysis, local sensitivity analysis (LSA), global sensitivity analysis (GSA)

1. Introduction

Disposal of urban wastewater (WW) with adequate treatment is a major concern in developing countries. In most of them, a considerable amount of WW is discharged into the environment (rivers, lakes, and oceans) as raw WW or poorly treated WW. Consequently, surface water and groundwater get polluted, affecting human health, aquatic ecosystems, food production, and drinking water availability [1]. Thereupon, it is vital to treat wastewater to mitigate the environmental impact.

Wastewater treatment plants (WWTP) are infrastructure dedicated to water sanitation. The most commonly applied process is activated sludge, a biological treatment consisting of a bioreactor coupled with a secondary settler. Within the bioreactor, biomass (heterotrophic, autotrophic, and/or phosphorus accumulating) is synthesized for the biodegradation of pollutants as well as for the removal of nutrients such as nitrogen and phosphorus [2]. Then, the secondary settler concentrates the biomass for its removal and further solids treatment. Finally, a clarified effluent results after treatment.

However, in developing countries, most WWTPs only aim for primary (physical) and secondary (biological) treatment, without tertiary treatment or sludge treatment (anaerobic digestion) [3]. Hence, the lack of advanced treatment techniques, as well as inefficient operation/control of the WWTPs, results in increased water pollution.

Lately, a trend has emerged to conceive of wastewater treatment plants as water resource recovery facilities (WRRF). This is because it is possible to recover organic matter, nutrient-rich by-products, energy, and water itself, representing economic revenue for the WRRF [4]. Consequently, there is a need for designing new infrastructure and for process optimization to meet stringent water quality standards together with resource recovery.

Either for design or diagnosis, it is vital to consider the processes influencing WWTP performance. In the AS process, performance is governed by the interaction of raw wastewater fluctuations (quality and quantity), the biokinetics, the mixing conditions, the aeration system, and the operational conditions [5]. Due to process complexity, mathematical models have arisen as an ideal tool for the assessment of AS performance, allowing continuous feedback to be provided in an understandable, faster, and cheaper manner.

Over the past few decades, process models have been established for designing, upgrading, and optimizing wastewater treatment plants [4]. In the wastewater industry, specifically in the biological processes area, the activated sludge models (ASM) were introduced for the latter, given their capability to approximately simulate the process kinetics taking place in the bioreactor in a simpler fashion. Mind that the ASMs are deemed core models, i.e., they can be modified according to modelers' needs. Meanwhile, benchmarking frameworks, also known as BSMs, have been proposed to assess environmental and economic aspects in an activated sludge plant-wide context [6]. Moreover, membrane bioreactor (MBR) models (an activated sludge process plus membrane filtration) have recently emerged as an alternative for meeting stringent water regulations and for resource recovery of water, given the high-quality effluent after membrane treatment [7, 8]. Mind that the above-mentioned activated sludge modeling frameworks have been developed by various modeling task groups of the International Water Association (IWA); thus, they will be referred to collectively as the IWA models.

According to Saltelli et al. [9], building any kind of model requires specifying the model archetype, parameters, resolutions, calibration data including its acceptance criteria, and so on. Nevertheless, sometimes information and data are missing or not well known, resulting in uncertainty in each of the previous requirements. Hence, model implementation highly depends on the understanding of the AS process, as a lack of it increases model uncertainty.

Mind that model applicability relies on how close model inputs and outputs are to the real-plant data. Therefore, appropriate modeling practices based on high-quality data collection and model calibration are essential. According to Rieger et al. [10], a good modeling practice (GMP) for the activated sludge process consists of five phases: project definition, data collection and reconciliation, plant model set-up, calibration and validation, and scenario simulation. Inherently, uncertainty is present in the input data required in each phase (e.g., influent flow rate, pollutant fluxes, seasonal conditions, model parameters), and if it is not heeded and reduced, it will spoil model applicability.

Consequently, Belia et al. [11] stated the importance of identifying the sources of uncertainty in WWTP modeling for project risk reduction and model validation. Thereby, the identification and classification of the sources of uncertainty as input data (influent, operational settings, etc.), model data (e.g., structure and process interaction), model parameters (hydraulic, biokinetic, settling), and technical aspects (solver settings and computational thresholds) within each of the GMP phases is strongly recommended [11]. After source identification, an uncertainty analysis is to be conducted. It consists of propagating the uncertainty of the model inputs (also called input factors) into the desired model output(s) via Monte Carlo simulations or probabilistic methods [9, 12], hence determining probabilistic distributions of the model output given uncertain input factors.
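A minimal sketch of such Monte Carlo uncertainty propagation is shown below, using a toy saturation-type rate with two uncertain input factors; the distributions and values are illustrative only and do not come from any reviewed study.

```python
# Sketch: propagating input-factor uncertainty through a model via Monte Carlo.
# The "model" is a toy saturation-type rate with two uncertain factors; values are illustrative.
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
mu_max = rng.normal(4.0, 0.4, n)        # uncertain maximum rate (1/d)
K_S = rng.uniform(5.0, 20.0, n)         # uncertain half-saturation constant (g COD/m3)
S = 30.0                                # fixed substrate concentration (g COD/m3)

output = mu_max * S / (K_S + S)         # model output for every Monte Carlo sample

print(np.mean(output), np.std(output))
print(np.percentile(output, [2.5, 97.5]))   # empirical 95% uncertainty band
```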

Nevertheless, the IWA models (including model refinements) are often over-parameterized. Hence, a vast number of input factors may be uncertain, complicating calibration, principally for complex models. Therefore, after an uncertainty analysis, a sensitivity analysis is conducted to quantify how much uncertainty is related to an individual input factor or a group of them [13]. So, a sensitivity analysis (SA) is a method used to characterize and prioritize uncertainty. According to Al et al. [14], an SA can be used as a quality assurance technique for modelers as it promotes a better understanding of activated sludge model behavior.

A series of sensitivity analyses have been performed on activated sludge models for wastewater treatment. These can be classified according to their nature, that is to say, local or global. Local approaches, also called one-at-a-time (OAT) approaches or LSA, assess parameter sensitivity as a function of partial derivatives of the outputs given small perturbations of an input factor, for control/identification of problems [15]. Nevertheless, local approaches are fiercely criticized due to method limitations such as linearity, normality assumptions, and local variations of the input space [16]. Still, there is a significant amount of local sensitivity analysis (LSA) work in the activated sludge modeling field.
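A minimal sketch of such a local, derivative-based (OAT) sensitivity estimate is shown below for the same toy rate used above, using central finite differences around a nominal point; it is an illustration of the idea, not a method from the reviewed papers.

```python
# Sketch: local (OAT) sensitivity of a toy saturation-type output to each input factor,
# approximated with central finite differences; all values are illustrative.
import numpy as np

def model(theta):
    mu_max, K_S = theta
    S = 30.0
    return mu_max * S / (K_S + S)

theta0 = np.array([4.0, 10.0])        # nominal parameter values
perturbation = 0.01                   # 1% relative perturbation

for i, name in enumerate(["mu_max", "K_S"]):
    dtheta = np.zeros_like(theta0)
    dtheta[i] = perturbation * theta0[i]
    dy = (model(theta0 + dtheta) - model(theta0 - dtheta)) / (2 * dtheta[i])
    # Relative (normalized) sensitivity: dy/dtheta scaled by theta/y
    print(name, dy * theta0[i] / model(theta0))
```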

To overcome the limitations of LSAs, global approaches have been established for the assessment of the entire domain of the input space of parameter variation. Global sensitivity analysis (GSA) methods can be deemed an analysis of variance (ANOVA); thus, they apportion the variance among the uncertain input factors to elucidate their influence on the model output [9, 15]. Therefore, unlike local approaches, GSAs allow studying mathematical models as a whole, and some methods even account for the effect of factor interaction [14, 16]. Fortunately, over the past years, most activated sludge modelers have taken the previous considerations into account and conducted several GSA methods to reduce model uncertainty in predicting system performance.
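For the global, variance-based case, the sketch below computes first- and total-order Sobol indices for the same toy rate, assuming the SALib package is available; the bounds and sample size are illustrative.

```python
# Sketch: variance-based (Sobol) global sensitivity indices for a toy saturation-type
# output, assuming the SALib package; bounds and sample size are illustrative.
import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

problem = {
    "num_vars": 2,
    "names": ["mu_max", "K_S"],
    "bounds": [[3.0, 5.0], [5.0, 20.0]],
}

X = saltelli.sample(problem, 1024)                  # sample matrix over the full input space
Y = X[:, 0] * 30.0 / (X[:, 1] + 30.0)               # model evaluated on every sample

Si = sobol.analyze(problem, Y)
print(Si["S1"])   # first-order indices (main effects)
print(Si["ST"])   # total-order indices (including interactions)
```

Unlike the local estimate, these indices summarize how the output variance is apportioned over the whole input domain, including factor interactions.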

However, for both local and global approaches in the activated sludge modeling area, a wide range of sensitivity analysis methods has been published, with different focuses, i.e., introducing an SA method or applying one. The latter is done under different modeling goals and their respective scenarios, demonstrating the applicability of sensitivity analysis in the field. Consequently, due to the advantages of SAs, together with the need for complex models to improve process understanding, it is expected that more AS stakeholders will rely on sensitivity analysis results for process design, control, and upgrading.

Yet, to the authors' knowledge, there is no systematic review concerning sensitivity analysis of the IWA models. Hence, according to the statements mentioned above, the objective of this review is to (I) compare the sensitivity analyses performed on the IWA models, distinguishing between local and global approaches, (II) report the methods used, (III) look for similarities and misinterpretations found in the reviewed papers, (IV) determine whether the purpose was to develop a methodology or to apply a sensitivity analysis, (V) catalog the papers according to their aim (e.g., control, operation, etc.), and (VI) point out lacunae in knowledge concerning sensitivity analysis of the IWA models.

According to these objectives, this paper presents the collective effort of the authors to collect the most relevant and up-to-date works in the activated sludge modeling area. We summarize what we consider the most relevant features that current and future AS modeling practitioners must heed. To improve readers' understanding, the paper is divided into six major sections. First, an overview of the IWA models is presented for the readers to become au fait with the activated sludge models, the benchmark simulation models, and the membrane bioreactor models. Then, some of the most applied sensitivity analysis methods in the area are presented, principally distinguishing them as local or global approaches. In the third section, we outline the systematic selection of papers across the activated sludge modeling area. The results of the systematic review are presented in section four. Finally, the discussion of the results and our main conclusions are reported in sections five and six, respectively.

2. The IWA models

Mathematical modeling of activated sludge systems is an optimal technique for WWTP design and operation, human resource training, and research [10]. Therefore, the International Water Association (IWA) has developed activated sludge models together with benchmark models for assessing control strategies of the AS process, even in a plant-wide context including primary treatment and sludge digestion [5, 6]. Moreover, due to the advantages of treating and reusing wastewater, membrane bioreactor (MBR) models have gained attention [17]. Therefore, the activated sludge models (ASM), the benchmark simulation models (BSM), and the models for membrane bioreactor modeling are briefly discussed below.

2.1 Activated sludge models (ASM)

The activated sludge models were introduced in the 1980s with the core model known as ASM1 [10]. Its main purpose was to assess the activated sludge process utilizing simple relationships to mimic the biokinetics occurring within the bioreactor. It consists of a set of biokinetic rates for biological WW treatment based on Monod-like equations [Eq. (1)] for soluble and particulate compounds or state variables (denoted by S and X, respectively) [5].

\dfrac{S}{K_S + S} \quad \text{or} \quad \dfrac{X}{K_X + X} \qquad (1)
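A minimal numerical illustration of the Monod-type switching function in Eq. (1) is given below; the parameter values are illustrative.

```python
# Sketch of the Monod-type switching functions in Eq. (1) for a soluble (S) or
# particulate (X) compound; parameter values are illustrative.
import numpy as np

def monod(concentration, half_saturation):
    return concentration / (half_saturation + concentration)

S = np.linspace(0, 60, 7)      # soluble substrate, g COD/m3
K_S = 10.0                     # half-saturation coefficient, g COD/m3
print(monod(S, K_S))           # approaches 1 as S >> K_S and 0 as S -> 0
```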

Since the introduction of ASM1, many attempts have been made to improve the models' capability to reproduce biological nutrient removal. Table 1 presents a brief overview of the most applied ASMs developed by the IWA. Mind that other models have been developed; however, only the most applied ones and those strictly related to biological wastewater treatment were considered. Notice that the models have different scopes together with a variation in the number of model parameters. Moreover, it is important to notice the similarity in the overall process between ASM1 and ASM3 as well as between ASM2d and ASM3 BioP. However, these differ in the state variables to be modeled and their parameters.

Model | Overall process | State variables | # of parameters | # of processes | Reference
ASM1 | C/N | 13 | 23 | 8 | [5]
ASM2d (a) | C/N/P | 19 | 74 | 21 | [5]
ASM3 | C/N | 13 | 46 | 12 | [5]
ASM3 BioP | C/N/P | 17 | 83 | 23 | [18]

Table 1.

Overview of the activated sludge models.

(a) Fermentable COD fraction state variables.


C, carbon removal; N, nitrogen removal; P, phosphorus removal.

For example, the ASM3 was developed to deal with the ASM1 limitations concerning the kinetics of nitrogen and alkalinity for heterotrophic microorganisms [5]. ASM2d is a modified version of ASM2 (not included in Table 1), as ASM2 does not consider denitrification by phosphorus accumulating organisms (PAOs) or glycogen storage as carbon storage for PAOs [5]. Finally, the ASM3 BioP adds biological phosphorus removal to the ASM3. It differs from ASM2d in that it does not include chemical P precipitation (although it is easily implemented), it uses endogenous respiration rates and lower anoxic rates (compared to aerobic ones), and it neglects fermentation [18], so influent fractionation becomes simpler.

As core models, the activated sludge models can be subjected to refinements to meet modelers' needs, usually to represent the AS process more accurately. For example, to overcome the limitation of the ASM3 of modeling nitrification as a single-step process (SNH4 → SNO3), Iacopozzi et al. [19] developed a two-step nitrification process (SNH4 → SNO2 → SNO3). Hence, they were able to represent the separation of autotrophic biomass into ammonia and nitrite oxidizers within their model. Other modifications have been made to portray microbial processes in detail [20, 21, 22], including modeling the AS process in an MBR scheme, discussed later.

However, the ASMs (including refinements) were developed for assessing the efficiency of the bioreactor. Consequently, the need for a modeling framework that couples the bioreactor, the secondary clarifier, and the sludge treatment together with key performance indicators, among other features, resulted in the development of the benchmark simulation models.

2.2 Benchmark simulation models (BSM)

The benchmark simulation model No. 1 (BSM1) is a modeling framework for evaluating an AS process through WWTP simulations and performance indexes [23]. The BSM1 models the bioreactor following the ASM kinetics, dividing the reactor into anaerobic, anoxic, and oxic (aerobic) zones according to the AS model being reproduced. The secondary clarifier is modeled with the Takács settling model [24]. The BSM1 also considers recycle flowrates as well as the wastage and return of activated sludge to the system.

Moreover, the BSM1 framework allows the modeler to assess plant performance through the effluent quality index (EQI, in kg pollution units d−1) and an operational cost index (OCI) [6]. The EQI measures the quality of the water discharged to the environment by summing the main effluent pollutant fluxes (BOD5, COD, TKN, NOx-N, and TSS) with weighting factors. Similarly, the OCI is a weighted sum of the different costs within the system, such as energy requirements (aeration, pumping, mixing), sludge disposal, external carbon sources, and methane production (as income) if available. The OCI does not yield an actual operational cost, although one could easily be calculated from it.
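The following minimal sketch illustrates, under stated assumptions, how an EQI-style weighted sum of pollutant loads could be computed from effluent time series; the weighting factors and all numerical values are illustrative placeholders rather than the official BSM parameters, which should be taken from [6].

```python
import numpy as np

# Illustrative weighting factors (pollution units per g of pollutant); the
# actual values should be taken from the BSM documentation [6].
WEIGHTS = {"TSS": 2.0, "COD": 1.0, "TKN": 30.0, "NOx": 10.0, "BOD5": 2.5}


def eqi(conc: dict, flow: np.ndarray) -> float:
    """EQI-style index in kg pollution units per day.

    conc: effluent concentration time series in g/m3, keyed as in WEIGHTS
    flow: effluent flow-rate time series in m3/d on the same uniform time grid
    A full implementation integrates the load over the evaluation period; with
    a uniform time grid the time average used here is equivalent.
    """
    load = sum(WEIGHTS[k] * conc[k] for k in WEIGHTS) * flow  # g PU/d per step
    return float(load.mean()) / 1000.0                        # kg PU/d


# Toy series with constant effluent quality (all numbers are made up).
n = 96  # e.g. 15-min sampling over one day
values = {"TSS": 12.0, "COD": 45.0, "TKN": 3.0, "NOx": 9.0, "BOD5": 2.5}
conc = {k: np.full(n, v) for k, v in values.items()}
print(round(eqi(conc, np.full(n, 20000.0)), 1))  # time-averaged EQI in kg PU/d
```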

The BSM1 is limited to assessing local control strategies for the AS process, without accounting for interactions with primary and sludge treatment. Consequently, the benchmark simulation model No. 2 (BSM2) was developed as a plant-wide assessment model [6, 25]. Its framework couples the features of the BSM1 with a primary clarifier model [26] and a sludge treatment line that includes an anaerobic digester following the ADM1 kinetics [27], a sludge thickener, and a dewatering unit. The BSM2 thus allows the evaluation of unit-process interactions in a wider context than the BSM1.

2.3 Membrane bioreactor (MBR) models

According to Judd [7], membrane bioreactors (MBR), a promising biological-physical technology for WW treatment that couples an AS process with microfiltration (MF) or ultrafiltration (UF), considerably reduce the WWTP footprint while achieving higher effluent quality and lower sludge yields. Note that the activated sludge models can also be simulated for membrane bioreactors, since both systems are alike from a biochemical engineering standpoint [17]. Moreover, the BSM frameworks can be modified to include membrane bioreactor processes.

For modeling the biological system, the ASMs can be used with or without modifications. The refined ASMs for MBRs are extended versions that mainly incorporate the release and degradation of soluble microbial products (SMP) and extracellular polymeric substances (EPS) [28, 29, 30]. EPSs are mixtures of organics (proteins, lipids, DNA residuals, etc.) that support bacterial growth in high-density biomass communities (as in MBRs), while SMPs are soluble excreta produced during biomass growth and decay that serve as indicators of substrate consumption and biomass decay rates [17]. According to Hai et al. [17], EPS and SMP play a major role in membrane fouling, as they can adhere to the membrane surface and thereby limit its permeability.

Moreover, to model MBR operation more precisely, physical sub-models can be coupled. An example can be found in Mannina et al. [8], who proposed a physical model of cake deposition, deep-bed filtration, and membrane resistances to simulate variations in transmembrane pressure and resistance (e.g., pore fouling and sludge cake, among others). The model includes mathematical representations of particle drag and buoyancy forces, the probability of particle deposition on the membrane, biomass and sludge attachment and detachment rates (including the backwashing effect), together with the cake deposition, deep-bed filtration, and membrane resistances themselves [8].
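The fouling sub-model of Mannina et al. [8] is far more detailed than can be reproduced here; the sketch below only illustrates the generic resistance-in-series idea (Darcy’s law) of how membrane, cake, and pore-fouling resistances translate a permeate flux into a transmembrane pressure, with all resistance values assumed purely for illustration.

```python
# Generic resistance-in-series sketch (Darcy's law); values are assumed and
# the actual physical sub-model in Mannina et al. [8] is far more detailed.
MU = 1.0e-3           # Pa*s, permeate viscosity (approximately water at 20 C)
R_MEMBRANE = 1.0e12   # 1/m, intrinsic membrane resistance (assumed)
R_CAKE = 5.0e11       # 1/m, sludge-cake resistance (assumed)
R_PORE = 2.0e11       # 1/m, pore-fouling resistance (assumed)


def transmembrane_pressure(flux_lmh: float) -> float:
    """Return TMP (kPa) for a given permeate flux in L/(m2*h)."""
    flux = flux_lmh / 1000.0 / 3600.0                   # m3/(m2*s), i.e. m/s
    tmp = MU * flux * (R_MEMBRANE + R_CAKE + R_PORE)    # Pa
    return tmp / 1000.0                                 # kPa


print(round(transmembrane_pressure(20.0), 2))  # about 9.44 kPa for 20 LMH
```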

3. Sensitivity analysis

Sensitivity analysis (SA) is a tool that allows modelers to appreciate the dependency between input factors and model outputs and to investigate the relevance of each factor to those outputs [16]. It thereby elucidates which model inputs cause most of the uncertainty in the model outputs for the studied scenario, usually via Monte Carlo simulations [15]. The scope of SA is thus factor prioritization and factor fixing (of non-influential factors) and, in some cases, the identification of factor interactions, potentially reducing model uncertainty [9].

Sensitivity analysis methods fall into two classes: local sensitivity analysis (LSA) and global sensitivity analysis (GSA) [13]. According to Saltelli et al. [9], local approaches, i.e., varying one factor at a time (OAT), are not recommended for non-linear models, as they do not explore the multi-dimensional input space and therefore miss important effects such as factor interactions (FI). GSA methods, in contrast, vary all factors together, as in an analysis of variance (ANOVA), and thus inform the modeler about each factor’s global influence on the model output variance [9]. Local (LSA) and global (GSA) approaches are briefly explained below.

3.1 Local sensitivity analysis

A local sensitivity analysis (LSA) is a simple analysis in which only one factor (OAT approach) changes value between consecutive simulations [13]. An advantage of this method is that the modeler can rapidly determine the influence of the perturbed parameter over a local range. To the authors’ knowledge, the most common LSA method applied in AS modeling is the normalized sensitivity index (NSI) described in Eq. (2).

\[ \mathrm{NSI} = \frac{\theta}{Y} \cdot \frac{\Delta Y}{\Delta \theta} \tag{2} \]

where Δθ and ΔY are the observed differences in model input and output, respectively, and θ and Y are their nominal values. Note that the NSI appears in the literature under different names, although the underlying expression is the same as Eq. (2). Some authors have also reported other LSA techniques [31, 32, 33].
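A minimal sketch of an OAT evaluation of Eq. (2) is given below; the toy model, the 1 % perturbation size, and the function names are assumptions for illustration only.

```python
def nsi(model, theta: float, perturbation: float = 0.01) -> float:
    """Normalized sensitivity index of Eq. (2) via a one-factor-at-a-time perturbation.

    model:        callable mapping a single input factor to a scalar output
    theta:        nominal value of the input factor
    perturbation: relative perturbation applied to theta (1 % by default)
    """
    y = model(theta)
    d_theta = perturbation * theta
    d_y = model(theta + d_theta) - y
    return (theta / y) * (d_y / d_theta)


# Toy check: for y = theta**2 the NSI should be close to 2.
print(round(nsi(lambda th: th ** 2, 3.0), 2))
```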

3.2 Global sensitivity analysis

Unlike LSA, global sensitivity analysis considers the entire probability distributions of the input factors, thus assessing the whole domain of the input space [16]. The most widely used GSA methods can be classified as elementary-effects methods, linear regression methods, variance-based methods, and derivative-based methods. Accordingly, Morris screening, standardized regression coefficients, Sobol sensitivity indices, Extended-FAST, and derivative-based global sensitivity measures are discussed below, as these were the GSA methods applied in the sensitivity analysis literature on activated sludge modeling covered by this review.

3.2.1 Morris screening method

The Morris screening method measures a factor’s sensitivity by summarizing Elementary Effects (EEs), i.e., by averaging replicated local measures. An EE (see Eq. (3)) quantifies the variation in the model output (y) caused by a perturbation of a factor (xi), replicated several times [29, 30]. In Eq. (3), A is the model output after the perturbation and B the output without it, while Δ is a step size taken from {1/(p − 1), …, 1 − 1/(p − 1)}, which depends on the number of levels p of the n-dimensional p-level grid and is comparable with the uncertainty range.

\[ EE_i(x_1, \ldots, x_n) = \frac{y(x_1, \ldots, x_{i-1},\, x_i + \Delta,\, x_{i+1}, \ldots, x_n) - y(x_1, \ldots, x_n)}{\Delta} = \frac{A - B}{\Delta} \tag{3} \]

For measuring the sensitivity indices, Morris screening standardizes the model inputs and outputs (xn and y) by their means and standard deviations. The mean (μ) of the EEs measures the factor’s influence on the model output uncertainty, whereas their standard deviation (σ) characterizes the nature of that influence: high values of σ indicate that the output variance stems from non-linearity or interactions. To avoid the cancellation of opposite signs, the EEs are usually summarized by the absolute mean (μ*) [34]. Whenever μ* exceeds a chosen threshold, the factor is considered influential, and vice versa. The standard error of the mean (σi · r^(−0.5)) provides further information about the factor’s effect: when the factor lies above or below the threshold line (μ*i = 2 σi · r^(−0.5)), its effect is attributed to linearity or to interactions, respectively [35]. According to Morris [35], the number of simulations (or replicates) equals r·(n + 1).
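The sketch below illustrates the idea behind Morris-style screening using a simplified one-step-from-a-random-base design rather than the full trajectory design of Morris [35]; the toy model, grid settings, and number of replicates are assumptions for illustration. Each replicate costs one base run plus one perturbed run per factor, i.e., r·(n + 1) model evaluations in total, as stated above.

```python
import numpy as np


def morris_screening(model, n_factors: int, r: int = 10, p: int = 4, seed: int = 0):
    """Crude Morris-style screening on the unit hypercube [0, 1]^n.

    For each of r replicates a random base point is drawn on a p-level grid and
    every factor is perturbed once by delta = p / (2 * (p - 1)), giving r
    elementary effects (EEs) per factor as in Eq. (3). Returns (mu_star, sigma):
    the absolute mean and the standard deviation of the EEs for each factor.
    """
    rng = np.random.default_rng(seed)
    delta = p / (2.0 * (p - 1))
    ees = np.empty((r, n_factors))
    for k in range(r):
        # Base point restricted so that base + delta stays inside [0, 1].
        base = rng.choice(np.linspace(0.0, 1.0 - delta, p // 2), size=n_factors)
        b = model(base)                          # output without perturbation (B)
        for i in range(n_factors):
            x = base.copy()
            x[i] += delta                        # perturb factor i only
            ees[k, i] = (model(x) - b) / delta   # (A - B) / delta, Eq. (3)
    return np.abs(ees).mean(axis=0), ees.std(axis=0, ddof=1)


# Toy model: factor 0 is linear, factors 1 and 2 interact, factor 3 is inert.
mu_star, sigma = morris_screening(
    lambda x: 4.0 * x[0] + 3.0 * x[1] * x[2] + 0.0 * x[3], n_factors=4)
print(np.round(mu_star, 2), np.round(sigma, 2))  # sigma flags the interacting factors
```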

3.2.2 Standardized regression coefficients method

The standardized regression coefficients (SRC) are sensitivity measures obtained by fitting a first-order linear multivariate model (with coefficients b0, …, bi) to a scalar output of the Monte Carlo simulations, thereby correlating the model inputs and outputs (θi and y, see Eq. (4)). The SRCs (βi) are quantified by scaling the regression coefficients (bi) by the standard deviations of the model inputs and output (Eq. (5)).

\[ y = b_0 + \sum_i b_i \theta_i \tag{4} \]