Looking at Data Science through the Lens of Scheduling and Load Balancing
\n
1. Introduction
\n
Private corporate networks, as well as the Internet, generate and share data at ever-increasing rates. This unconstrained growth can easily lead to disorganization and, as a consequence, to missed opportunities to analyze these data and extract knowledge from them. There is an essential difference between the concepts of data and information. Data cannot express something outside a particular field of expertise; information, in turn, enables the coherent transmission of knowledge. Data science aims to close the gap between data and knowledge through the use of computational tools. More specifically, data science is a tool for converting raw data into knowledge [1]. The field leverages many methods originating from computer science and statistics [2]. Figure 1 illustrates a Venn diagram relating the research areas with the greatest influence on data science.
\n
Figure 1.
Venn diagram correlating the influence of other research areas on data science.
\n
Although data science receives significant influence from expert knowledge, it is plausible to say that a data scientist knows more about computer science than a statistician and more about statistics than a computer scientist [3]. The field also encompasses the intersection of data analytics and machine learning. Therefore, data science comprises a heterogeneous group of studies and methodologies, such as big data, machine learning, data analytics, and statistics, which challenges the implementation of a single platform capable of incorporating all the available techniques.
\n
A variety of widely adopted platforms is available for data analysis and knowledge extraction, for example, Tableau,1 Dataiku,2 Microsoft Azure Machine Learning Studio,3 and Orange BioLab,4 each one suitable for a specific step of a data science process. A workflow can be formulated based on the coordinated application of different tools to extract knowledge from massive datasets. In this context, the use of cloud platforms for data science grows steadily because they offer scalability and distributed execution of individual tasks.
\n
In data science, a large dataset allows the generation of a more in-depth model, which provides more robust insights because more instances compose the statistical analysis of the data. One of the most relevant aspects of a dataset is the quality of the available data. Thus, before the use of any statistical method, the dataset must go through a cleaning process that ensures the uniformity of values and the elimination of duplicated data. On the one hand, a large dataset with high-quality data enables an insightful model. On the other hand, the computational power required to process the data is directly proportional to the size of the available dataset. In this scenario, high-performance computing (HPC) provides the infrastructure (clusters, grids, and cloud platforms) required to optimize the processing time of data science workflows. In particular, data science demands are transformed into a collection of tasks, with or without dependencies among them, which must be efficiently scheduled across the computational resources (memory, processors, cores, cluster nodes, graphical processing unit (GPU) cores, grid nodes, and virtual machines, for example) to provide results in an acceptable time interval. To map such tasks to resources, a scheduling policy takes place, in which load balancing algorithms are important to provide a better execution equilibrium among the tasks and a fast response time, mainly when considering dynamic or heterogeneous environments. While some articles explore the use of HPC for data science tasks [4, 5, 6], to the best of our knowledge, there are no studies that conduct an in-depth analysis of how scheduling and load balancing affect data science workflows.
\n
Hence, the present book chapter proposes an analysis of scheduling and load balancing from the perspective of data science scenarios. Furthermore, it presents concepts, environments, and tools for data science to summarize the theoretical background required to understand the definition and execution of data science workflows. Even though its focus lies on presenting concepts, the chapter also illustrates new trends concerning the intersection of data science, scheduling, and load balancing.
\n
The remainder of this chapter is organized as follows: Section 2 presents an in-depth explanation of concepts, workflow, problem classes, and tools used by data science. Section 3 explores scheduling and load balancing as tools to leverage the computational power required by data science applications. Section 4 points to open challenges and trends in the use of HPC applied to data science problems. Finally, Section 5 concludes the chapter with closing remarks and directions for future work.
\n
\n
\n
2. Fundamental concepts
\n
This section presents the fundamental concepts related to data science. These concepts are key to understanding how HPC, more specifically scheduling and load balancing, impacts data science processes, as discussed later in the chapter. The remainder of the section discusses the fundamental components of a data science pipeline, as observed in real-world scenarios.
\n
\n
2.1 Data science workflow
\n
Data science is highly dependent on its application domain and employs complex methods. Nevertheless, it has a very organized pipeline, which varies in the number of steps required to extract knowledge. Current work explores pipelines of five to seven steps, but in all cases, the process yields similar outputs. This section presents the most complete process, composed of seven steps, widely used by both companies and researchers. Figure 2 depicts the flow of information step by step. The seven proposed steps can be enumerated as:
business understanding;
data extraction;
data preparation;
data exploring;
data model;
results evaluation; and
implementation.
\n\n
Figure 2.
Flow diagram of data science steps.
\n
Step 1 refers to the process of understanding the context in which the data are embedded and what the expected output is. This is a highly time-consuming part of a project. However, the data scientist must have a deep understanding of the application domain to validate the model’s structure as well as its outputs. After the scope of the project is understood, Step 2 consists of exploring the data that correlate with the problem identified in the previous step. These data may or may not be hosted by the client. If the client does not have useful data available, the data scientist must look for a synthetic or publicly available dataset from which to extract the knowledge. Furthermore, in Step 3, techniques are employed to clean the data because there is a high chance that they are unorganized or unreadable, so it is necessary to preprocess and standardize them. An example of this step is a dataset that has a column with country names, but in some registers the value of this column is “Brasil” and in others it is “Brazil”; both values symbolize the same information but are encoded in different languages. Regarding Step 4, once the data are organized, it is indispensable to execute a detailed analysis in order to identify patterns or insights that would be valuable to the client. In this stage, the data scientist usually uses plotting techniques to make the data more readable and uncover information.
\n
In Step 5, the previously identified insights serve as input. At this step, it is vital to fully understand the data since, without that knowledge, it is very hard to fit a model that correctly represents them. At this stage, the data scientist uses computer science expertise to choose the best approach to plan and validate the model. In Step 6, the outputs generated by the model are evaluated in order to analyze how useful they actually are. Usually, this evaluation is conducted by the client and the data scientist together, who examine graphs, numbers, and tables and decide whether the model generated acceptable results. Finally, in Step 7, once the results are validated, the model is ready to be implemented and deployed in production; thus, it is applied to prediction tasks with real data as input. The architecture used to implement the model is very important, that is, it is necessary to understand how the client application will use it, since if the client needs a real-time response, the structure will be very different from that of a non-real-time scenario.
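The sketch below illustrates Steps 5 and 6 in a minimal form, assuming Python with scikit-learn; the synthetic dataset and the chosen model stand in for whatever Steps 1–4 would actually produce.

```python
# Minimal sketch of Steps 5-6: fit a model on a training split (Step 5)
# and evaluate it on held-out data (Step 6). The synthetic dataset is a
# placeholder for the insights produced in the earlier steps.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Hold out 25% of the instances so the evaluation uses unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)                                          # Step 5
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))    # Step 6
```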
\n
\n
\n
2.2 Solving problems with data science
\n
Data science is not exclusively employed in business scenarios; it can be generalized to a plethora of applications, such as Obama’s campaign in the 2012 US presidential election. In the context of this election, technology was applied to identify which voters should receive more attention and marketing influence. Some analysts highlighted the use of data science as fundamental to Obama’s victory. However, data science is not limited to the analysis of scenarios such as the one in the above example. Many other challenges can benefit from solutions based on data science methodologies. For instance, some problem classes in data science are pattern detection, anomaly detection, cleaning, alignment, classification, regression, knowledge base construction, and density estimation [7]. These classes of problems are explored next.
\n
\n
2.2.1 Pattern detection
\n
The patterns existing in a dataset are not always easily identifiable due to the organization of the data. Pattern detection is the method employed to discover such regularities in the dataset. Identifying patterns hidden in the data is a relevant task in several scenarios, for example, to group clients with similar characteristics, such as those with the same taste or opinion.
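A minimal sketch of the client-grouping example, assuming Python with scikit-learn; the two features (annual spend and number of visits) and the synthetic values are hypothetical.

```python
# Group clients with similar characteristics by letting a clustering
# algorithm discover the hidden pattern in the data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic client profiles: low-spend/low-visit and high-spend/high-visit.
clients = np.vstack([
    rng.normal([200, 5], [30, 1], size=(50, 2)),
    rng.normal([900, 20], [50, 3], size=(50, 2)),
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(clients)
print(kmeans.labels_[:5], kmeans.labels_[-5:])  # clients grouped by similarity
```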
\n
\n
\n
2.2.2 Anomaly detection
\n
The distribution of a dataset regards the positions of the data points relative to each other. Usually, this distribution is represented on a Cartesian plane, in which each point is an instance of the dataset. Within this representation, regions with a defined concentration of data points become clusters, and outliers are the data points that lie too far from these clusters. Anomaly detection aims to classify each data point in a dataset as an outlier or not. For example, banks employ this approach for fraud detection: the past transactions of a client form clusters, and each new transaction is treated as unclassified data to be checked against them.
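A minimal sketch of the fraud-detection example, assuming Python with scikit-learn; the transaction features (amount and hour of day) and their values are hypothetical, and an isolation forest stands in for whatever detector a real system would use.

```python
# Flag outlier transactions: past transactions define the "normal" regions,
# and a new transaction far from them is reported as an anomaly.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
past_transactions = rng.normal(loc=[80.0, 12.0], scale=[15.0, 3.0], size=(500, 2))

detector = IsolationForest(contamination=0.01, random_state=1)
detector.fit(past_transactions)

new_transactions = np.array([[85.0, 11.0],     # typical purchase
                             [2500.0, 3.0]])   # far from the client's clusters
print(detector.predict(new_transactions))      # 1 = normal, -1 = outlier
```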
\n
\n
\n
2.2.3 Cleaning
\n
Dataset contents can present a broad variety of formats, for example, dates, numeric values, and text data. In many cases, values may also be formatted differently across dataset entries, even for the same field, due to human error or lack of standardization. This situation introduces errors in the dataset, resulting in possible information loss when processing it. The best solution to this problem is to employ methods that manage the dataset contents and standardize the data. For example, in a dataset containing clients’ birthdays, it is possible that a user fills the field with an invalid date. This case results in information loss because the analysis cannot extract useful information related to that user’s birthday.
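A minimal cleaning sketch for the birthday example, assuming Python with pandas; the column names and values are hypothetical.

```python
# Clean a small client table: remove duplicated rows, coerce invalid
# birthdays to NaT, and drop the rows whose date information is lost.
import pandas as pd

clients = pd.DataFrame({
    "name": ["Ana", "Ana", "Joao", "Mary"],
    "birthday": ["1990-05-17", "1990-05-17", "1985-02-31", "not a date"],
})

clients = clients.drop_duplicates()
clients["birthday"] = pd.to_datetime(
    clients["birthday"], format="%Y-%m-%d", errors="coerce")
clients = clients.dropna(subset=["birthday"])  # invalid dates cannot be analyzed
print(clients)
```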
\n
\n
\n
2.2.4 Alignment
\n
An organized and standardized dataset is fundamental for the generation of trustworthy outputs from data science processes. Data alignment is an essential step in dataset standardization. It involves updating the dataset to avoid the use of multiple values to represent the same information. For example, in a dataset containing a gender field, users may use both “M” and “Male” to represent the same gender. In this context, it is fundamental to unify both entries into one because the information they carry is the same.
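A minimal alignment sketch for the gender example, assuming Python with pandas; the mapping table is hypothetical.

```python
# Map multiple encodings of the same information onto one canonical value.
import pandas as pd

users = pd.DataFrame({"gender": ["M", "Male", "male", "F", "Female"]})

canonical = {"m": "Male", "male": "Male", "f": "Female", "female": "Female"}
users["gender"] = users["gender"].str.lower().map(canonical)
print(users["gender"].unique())   # ['Male' 'Female']
```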
\n
\n
\n
2.2.5 Classification
\n
Classification regards assigning specific labels to entries in a dataset. A label is any information with a limited scope of possibilities, for example, a dozen options present in a specific dataset field. Examples of labels include sentiments, states, and a bounded range of integers. Such a field can then be used to group multiple dataset entries according to the unique values that the label may take. For example, in a dataset about clients’ purchases, each product may be associated with a set of keywords. These keywords can then be used to classify the types of purchases a particular client makes, enabling targeted recommendations for other products.
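A minimal sketch of the purchase example, assuming Python with scikit-learn; the keywords, category names, and the choice of a bag-of-words naive Bayes classifier are all illustrative.

```python
# Classify purchases into a limited set of categories from product keywords.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

keywords = ["laptop ssd ram", "running shoes socks", "keyboard mouse monitor",
            "soccer ball jersey", "graphics card cpu"]
labels = ["electronics", "sports", "electronics", "sports", "electronics"]

classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(keywords, labels)
print(classifier.predict(["wireless mouse", "tennis racket shoes"]))
```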
\n
\n
\n
2.2.6 Regression
\n
Datasets do not always contain fields with labels that enable the classification of data. Nevertheless, in some problems, it is necessary to estimate values without a restricted group of options, for example, for a field that contains real numbers. This scenario requires a regression approach to estimate the value an entry should receive, rather than selecting from the limited options available in labels. For example, in a dataset with prices and sizes of houses in New York, it is possible to use regression to estimate the price of a new house according to its size. Although the regression problem has an output similar to classification, its output does not have a limited set of values, as occurs in labeling.
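A minimal sketch of the housing example, assuming Python with scikit-learn; the sizes and prices are synthetic, not real New York data.

```python
# Estimate a continuous price from house size with a linear regression.
import numpy as np
from sklearn.linear_model import LinearRegression

sizes = np.array([[50], [75], [100], [130], [180]])                   # square meters
prices = np.array([400_000, 560_000, 720_000, 910_000, 1_250_000])    # synthetic prices

model = LinearRegression().fit(sizes, prices)
print(model.predict([[120]]))   # estimated price for a 120 m^2 house
```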
\n
\n
\n
2.2.7 Knowledge base construction
\n
Datasets are essential for data science, and the problem of knowledge base construction refers to the process of compiling information to create them. Frequently, this process requires the use of cleaning and alignment methods to standardize the data. There is a broad group of knowledge bases on the Internet, for example, Kaggle,5 the UCI Repository,6 Quandl,7 and MSCOCO.8
\n
\n
\n
2.2.8 Density estimation
\n
Density estimation focuses on identifying the clusters formed by the sets of data points that represent the entries in a dataset. This process is a fundamental step in generating the clusters required by anomaly detection methodologies, as described above. Clustering is another suitable technique to identify groups of entries that may contain related knowledge within a dataset.
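A minimal density-estimation sketch, assuming Python with scikit-learn; the one-dimensional synthetic data and the Gaussian kernel bandwidth are illustrative choices.

```python
# Estimate where the data concentrate: low estimated density indicates a
# point far from every cluster, which is what anomaly detection relies on.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(2)
points = np.concatenate([rng.normal(0.0, 1.0, 300),
                         rng.normal(8.0, 0.5, 200)]).reshape(-1, 1)

kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(points)
print(kde.score_samples([[0.0], [8.0], [20.0]]))  # log-density per query point
```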
\n
\n
\n
\n
2.3 Data science tools
\n
It is difficult to find a tool that fits all data science processes because, as previously mentioned, there are multiple steps with a variety of methods available for each one. Hence, there are specific tools for each step, which provide the most appropriate results. Table 1 summarizes the most commonly cited tools for each of the data science workflow steps. The table also has a row, Data Storage, that is not considered a step of the process but is fundamental to the results; it refers to all technologies used for persisting the data in an environment. For the data model step, some programming languages are cited, but it is hard for a data scientist to employ a language without libraries. For example, when using Python, it is very common to rely on libraries such as pandas, scikit-learn, NumPy, and ggplot.
3. Exploring scheduling and load balancing on data science demands
\n
The scheduling problem, in a general view, comprises both a set of resources and a set of consumers [8]. Its focus is to find an appropriate policy to manage the use of resources by several consumers in order to optimize a particular performance metric chosen as a parameter. The evaluation of a scheduling proposal commonly considers two features: (1) performance and (2) efficiency [9]. More specifically, the evaluation comprises the obtained scheduling as well as the time spent to execute the scheduler policies. For example, if the parameter to analyze the achieved scheduling is the application execution time, the lower this value, the better the scheduler performance. In turn, efficiency refers to the policies adopted by the scheduler and can be evaluated using computational complexity functions [10].
\n
The general scheduling problem is the unification of two terms in everyday use in the literature. There is often an implicit distinction between the terms scheduling and allocation. Nevertheless, it can be argued that these are merely alternative formulations of the same problem, with allocation posed in terms of resource allocation (from the resources’ point of view), and scheduling viewed from the consumers’ point of view. In this sense, allocation and scheduling are merely two terms describing the same general mechanism from different viewpoints. One important issue when treating scheduling is the grain of the consumers [11]. For example, we can have a graph of tasks, a set of processes, and jobs that need resources to execute. In this context, scheduling schemes for multiprogrammed parallel systems can be viewed at two levels. At the first level, processors are allocated to a specific job. At the second level, processes from a job are scheduled using this pool of processors.
\n
We define static scheduling considering the scheduling grain as a task [8]. If data such as information about the processors, the execution time of the tasks, the size of the data, the communication pattern, and the dependency relations among the tasks are known in advance, we have a static or deterministic scheduling model. In this approach, each executable image in the system has a static assignment to a particular set of processors. Scheduling decisions are made deterministically or probabilistically at compile time and remain constant during runtime. The static approach is simple to implement. However, it has two significant disadvantages [11]. First, the workload distribution and the behavior of many applications cannot be predicted before program execution. Second, static scheduling assumes that the characteristics of the computing resources and the communication network are known in advance and remain constant. Such an assumption may not apply to grid environments, for instance.
\n
In the general form of a static task scheduling problem, an application is represented by a directed acyclic graph (DAG) in which nodes represent application tasks, and edges represent intertask data dependencies [12].
\n
Each node label shows the computation cost (expected computation time) of the task, and each edge label shows the intertask communication cost (expected communication time) between tasks. The objective of this problem is to map tasks onto processors and order their executions so that task-precedence requirements are satisfied and the minimum overall completion time is obtained.
\n
In the case that all information regarding the state of the system as well as the resource needs of a process is known, an optimal assignment can be proposed [11]. Even with all the information required for scheduling, the static method is often computationally expensive, to the point of being infeasible; hence, suboptimal solutions are adopted in practice. There are two general categories of suboptimal solutions for the scheduling problem: (1) approximate and (2) heuristic. Approximate scheduling uses the same methods as the optimal one, but instead of exploring all possible solutions, it stops when a good one is achieved. Heuristic scheduling uses standard parameters and ideas that affect the behavior of the parallel system. For example, we can group processes with a higher communication rate on the same local network, or sort tasks and processors in lists following predefined criteria in order to perform an efficient mapping among them (list scheduling).
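A minimal list-scheduling sketch under the DAG model described above, in Python; the task graph, costs, and the earliest-finish-time rule are illustrative, not a specific heuristic from the cited literature.

```python
# List scheduling over a small DAG: tasks are taken in a topological order
# and each is assigned to the processor that yields the earliest finish time,
# honoring precedence and communication costs.
tasks = {"A": 2, "B": 3, "C": 4, "D": 2}                              # computation costs
edges = {("A", "B"): 1, ("A", "C"): 2, ("B", "D"): 1, ("C", "D"): 1}  # communication costs
order = ["A", "B", "C", "D"]                                          # a topological order
num_procs = 2

proc_free = [0.0] * num_procs   # time at which each processor becomes free
finish = {}                     # task -> (finish time, processor)

for task in order:
    best = None
    for p in range(num_procs):
        # The task starts once the processor is free and every predecessor has
        # finished (plus communication cost when placed on another processor).
        ready = proc_free[p]
        for (u, v), comm in edges.items():
            if v == task:
                pred_finish, pred_proc = finish[u]
                ready = max(ready, pred_finish + (comm if pred_proc != p else 0))
        end = ready + tasks[task]
        if best is None or end < best[0]:
            best = (end, p)
    finish[task] = best
    proc_free[best[1]] = best[0]

print(finish)
print("makespan:", max(t for t, _ in finish.values()))
```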
\n
Dynamic scheduling works with the idea that little (or no) a priori knowledge about the needs and the behavior of the application is available [9]. It is also unknown in which environment the process will execute during its lifetime. The arrival of new tasks, the relations among them, and data about the target architecture are unpredictable, and the runtime environment makes the consumer-resource mapping decisions. The responsibility for global scheduling can be assigned either to a single processor (physically nondistributed) or exercised by a set of processors (physically distributed). Within this last classification, the taxonomy may also distinguish between mechanisms that involve cooperation between the distributed components (cooperative) and those in which the individual processors make decisions independently of the actions of the other processors (noncooperative). In the cooperative case, each processor has the responsibility to carry out its portion of the scheduling, but all processors work toward a common system-wide goal.
\n
Data science comprises the manipulation of large sets of data to extract knowledge [13, 14]. To accomplish this, input data are passed through processing engines to generate valuable outputs. In particular, this second step is usually implemented as sequential programs that execute artificial intelligence and statistics-based computational methods. We can take advantage of the several processing cores that exist in today’s processors by mapping this sequential demand onto a multithreaded program. To accomplish this, the Pthreads library and OpenMP are the most common approaches for writing multithreaded parallel programs, in which each thread can be mapped to a different core, thus exploiting the full power of a multiprocessor HPC architecture.
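A minimal sketch of exploiting the available cores, in Python with the standard multiprocessing module; it is analogous in spirit to the Pthreads/OpenMP approach mentioned above rather than a drop-in replacement, and the CPU-bound function is a hypothetical stand-in.

```python
# Map a CPU-bound data science step onto all available cores.
from multiprocessing import Pool, cpu_count

def heavy_step(chunk):
    # Stand-in for a CPU-bound statistical or AI-based computation.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    cores = cpu_count()
    chunks = [data[i::cores] for i in range(cores)]   # one chunk per core
    with Pool(processes=cores) as pool:
        partials = pool.map(heavy_step, chunks)
    print(sum(partials))
```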
\n
In addition to multiprocessor architectures, it is possible to transform a sequential code into a message passing interface (MPI)-based parallel one, thus targeting distributed architectures such as clusters and grids [15]. In this case, contrary to the prior alternative, which uses standard multiprocessor systems, the efficient use of MPI needs a parallel machine that generally has higher financial costs. Also, a distributed program is more error prone, since problems in the nodes or the network can bring the whole application down. Repairing such problems is sometimes not trivial, requiring graphical tools to observe the processes’ interactions. Finally, in addition to multicore and multicomputer architectures, we also have the use of GPUs, where graphics cards provide a set of non-general-purpose processors that execute vector computations much faster than conventional general-purpose processors [14]. The challenge consists in transforming a sequential code into a parallel one in a way that is transparent from the user’s viewpoint, so that the data science demand can run faster on parallel deployments. Moreover, the combination of these three aforementioned parallel techniques is also a challenge, since optimizations commonly vary from one application to another.
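A minimal message-passing sketch, assuming the mpi4py package is installed and the script is launched with something like "mpirun -np 4 python script.py"; the squared-sum workload is a hypothetical stand-in for a data science step.

```python
# Distribute a CPU-bound step across MPI processes and combine the results.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    data = list(range(1_000_000))
    chunks = [data[i::size] for i in range(size)]   # one chunk per process
else:
    chunks = None

chunk = comm.scatter(chunks, root=0)                # distribute the work
partial = sum(x * x for x in chunk)                 # local CPU-bound step
total = comm.reduce(partial, op=MPI.SUM, root=0)    # combine the results

if rank == 0:
    print(total)
```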
\n
Cloud computing environments today also represent a viable solution to run data science demands [16]. Providers such as Amazon EC2, Microsoft Azure, and Google Cloud offer HPC-driven architectures that exploit multiprocessor, multicomputer, and GPU parallelism. In particular, different from standard distributed systems, cloud computing presents the resource elasticity feature, whereby an initial deployment can be changed on the fly following the input demand. Thus, it is possible to scale resources in or out (through the removal or addition of containers/virtual machines) or to scale down or up (by performing resource resizing in virtual units) in a way that is transparent to the user. Logically, the data science application itself must be written in such a way as to take advantage of newly available resources in addition to the current set of working resources.
\n
CPU load is the most common metric to drive resource elasticity for data science demands, since most of them execute CPU-bound artificial intelligence-based algorithms. Even network data manipulation through the TCP protocol uses CPU cycles, since TCP is a software protocol executed in the kernel of the operating system and runs software routines to provide data transfer reliability.
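A minimal sketch of a reactive, CPU-load-driven elasticity rule in Python; the thresholds, VM limits, and the decision function itself are hypothetical placeholders, not a particular provider's API.

```python
# Reactive elasticity: scale out when recent CPU utilization stays above an
# upper threshold, scale in when it stays below a lower one.
UPPER, LOWER = 80.0, 30.0   # CPU utilization percentages (illustrative)

def elasticity_decision(cpu_samples, num_vms, min_vms=1, max_vms=16):
    """Return the new number of VMs/containers given recent CPU samples."""
    avg = sum(cpu_samples) / len(cpu_samples)
    if avg > UPPER and num_vms < max_vms:
        return num_vms + 1          # scale out: add a VM/container
    if avg < LOWER and num_vms > min_vms:
        return num_vms - 1          # scale in: consolidate resources
    return num_vms                  # load within the target band

print(elasticity_decision([85, 92, 88], num_vms=4))   # -> 5
print(elasticity_decision([12, 20, 18], num_vms=4))   # -> 3
```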
\n
Load balancing and resource scheduling are sometimes seen as having the same functionality. However, there is a slight difference: the scheduling policy is one component of resource scheduling, and such a policy may or may not employ load balancing algorithms [14]. The basic idea of load balancing is to attempt to balance the load on all processors in such a way that all processes on all nodes proceed at approximately the same rate. The most significant reason to launch load balancing is the existence of processing power that is not used efficiently, mainly in dynamic and heterogeneous environments, including grids. In this context, schedulers’ policies can use load balancing mechanisms for several purposes, such as: (1) to choose the amount of work to be sent to a process; (2) to move work from an overloaded processor to another that presents a light load; (3) to choose a place (node) to launch a new process of a parallel application; and (4) to decide about process migration. Load balancing is especially essential for parallel applications that present synchronization points, at which all processes must arrive before proceeding to the next step.
\n
The most fundamental topic in load balancing consists of determining the measure of the load [13, 15]. There are many different possible measures of load, including: (1) the number of tasks in a queue; (2) the CPU load average; (3) the CPU utilization at a specific moment; (4) the I/O utilization; (5) the amount of free CPU; (6) the amount of free memory; (7) the amount of communication among the processes; and so on. Besides these, any combination of the above indicators can be used. Considering the scope of processes from the operating system, such measures influence the decisions about when to trigger the load balancing, which processes will be involved, and the destination places where these processes will execute. Especially for this last topic, other factors to consider when selecting where to place a process include the nearness to resources, processor and operating system capabilities, and specialized hardware/software features. To make the mechanism useful, we must first determine when to balance the load. Doing so comprises two phases: (1) detecting that a load unbalance exists and (2) determining whether the cost of load balancing exceeds its possible benefits.
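A minimal Python sketch of a combined load measure and the two-phase trigger described above; the metric weights, thresholds, and migration cost are hypothetical values chosen only for illustration.

```python
# Combine several load indicators into one index, detect imbalance across
# nodes, and check that the expected benefit outweighs the balancing cost.
def load_index(cpu, memory, io, weights=(0.6, 0.3, 0.1)):
    """Weighted combination of utilization percentages (0-100)."""
    return weights[0] * cpu + weights[1] * memory + weights[2] * io

def should_balance(node_metrics, imbalance_threshold=25.0, migration_cost=5.0):
    loads = [load_index(*m) for m in node_metrics]
    imbalance = max(loads) - min(loads)
    # Phase 1: is there an unbalance? Phase 2: is it worth paying for it?
    return imbalance > imbalance_threshold and imbalance / 2 > migration_cost

nodes = [(95, 70, 40), (20, 30, 10), (35, 40, 20)]   # (cpu, mem, io) per node
print(should_balance(nodes))                         # True: migrate some work
```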
\n
The use of load balancing in data science demands can vary depending on the structure of the parallel application: Master-Slave, Bag of Tasks, Divide-and-Conquer, Pipeline, or Bulk-Synchronous Parallel [15, 16]. In the first two, we usually have a centralized environment where it is easy to know data about the whole set of resources, dispatching tasks to them according to their load and theoretical capacity. A traditional example of a combination of these parallel structures is the MapReduce framework. In divide-and-conquer applications, the parallel application has a recursive nature, and new levels of child nodes are created when the upper level cannot execute the tasks in an acceptable time interval. The challenge consists of dividing the tasks correctly according to the capacity of the resources. Pipeline-based applications, in their turn, have a set of stages that each incoming task must cross. In order to maintain the cadence between the stages, they must execute in the same time interval, so that an outgoing task from stage n serves as the direct input for stage n + 1. However, guaranteeing this is not a trivial procedure because the stages commonly present different complexities in terms of execution algorithms. Finally, bulk-synchronous applications are composed of supersteps, each one with local processing, arbitrary communication, and barrier phases. Load balancing is vital to guarantee that the slowest process does not compromise the performance of the entire application.
\n
\n
\n
4. Open opportunities and trends
\n
This section builds on the previous two sections, detailing open opportunities and trends that arise when joining resource scheduling and load balancing with the area of data science. We compile these aspects as follows:
Automatic transformation of a sequential data science demand into a parallel one—today, data science executes locally to query databases and to build knowledge graphs. Sometimes these tasks are time-consuming, so it is pertinent to transform a sequential demand into a parallel one that executes faster on multicore, multinode, and GPU architectures.
Use of GPU cards as accelerators for data science algorithms—writing data science demands that combine R and Python with OpenCL or CUDA, thus pairing CPU and GPU code that runs fast and in parallel to address a particular data science demand.
Combination of multimetric load balancing engines to handle data science efficiently—data science typically encompasses intensive I/O access (including main memory and hard disk) and a high volume of CPU cycles to process CPU-bound algorithms. The idea is to execute data science demands and learn their behavior, thus proposing adaptable load balancing metrics that take different parameters as input.
Task migration heuristics—when developing long-running data science parallel codes, it is essential to develop task migration alternatives to reschedule demands from one resource to another. This is particularly pertinent in dynamic environments, either at the application or the infrastructure level.
Cloud elasticity to address data science demands—cloud elasticity adapts the number of resources to the current demand. Thus, we propose a combination of vertical and horizontal elasticity, together with reactive and proactive approaches to detect abnormal situations. We can use both consolidation and inclusion of resources, aiming to always provide the most appropriate number of resources for a particular and momentary data science demand.
Definition of a standard API to deal with data science—enterprises frequently have several departments, each one with its own data science demands. We envisage an opportunity in developing a standard framework (with a standard API) to support the data science demands of the whole enterprise. The idea is to provide a dashboard with a collection of data science functions, also expressing the expected input and output of each one.
Smart correlation of events—enterprises regularly have timestamped data in several databases. We see an opportunity, each time a problem is found, to take its timestamp and search the data sources for eventual correlations. Thus, we can perceive relations such as: (1) if this happens, these other things will also happen; and (2) this event happened because a set of prior events happened beforehand.
Benchmarks to evaluate the mapping of data science tasks to HPC resources—how do we know whether a particular scheduling outperforms another for executing a particular data science demand? We see an opportunity in the exploration of benchmarks to evaluate scheduling and load balancing techniques that manipulate data science tasks. Such benchmarks must define what they expect as input and provide a set of metrics as output. The output can be a single value, a collection of values (as a data vector), or a collection of elements of a data structure (e.g., timestamp and data are useful to develop user profiles and to track assets).
Simulation environments to execute data science demands on distributed resources while keeping everything in a local program—simulation environments, such as SimGrid or GridSim, make it possible to use a sequential program to test and simulate complex parallel demands on a set of virtual resources. Thus, we can save time when testing different parameters and algorithms while developing scheduling and load balancing algorithms for data science.
Definition of metrics to evaluate the scheduling and/or load balancing of data science tasks—CPU load, memory footprint, disk space, network throughput, and cache hit rate are examples of metrics commonly employed in distributed systems. Data science is a new area of knowledge, and we encourage the definition of new metrics to compare the execution of data science demands.
\n\n
\n
\n
5. Conclusion
\n
The continuous generation of data by different industry segments presents a valuable opportunity for analysis and knowledge extraction through data science methods. There is a high interest in studies that explore the application of data science to a variety of scenarios, each one with distinct characteristics that reflect on the composition of available datasets. Furthermore, there is not a single data science methodology that is applied to all possible data science problems. Consequently, the most common approach to data science problems is to define a sequence of methods that depend on the characteristics of the dataset and the intended results.
\n
The constant growth in dataset sizes and the complexity of specific data science methods also impose a considerable challenge to provide the computational power required to process data and extract meaningful knowledge. In this context, cloud, fog, and grid computing architectures present themselves as ideal solutions to apply data science processes to massively sized datasets.
\n
The distributed nature of such environments raises a series of new challenges, some of which are widely studied in the literature. Nevertheless, the unique characteristics of data science workloads bring new aspects to these challenges, which require renewed attention from the scientific community.
\n
This chapter focused on the specific challenge of scheduling and load balancing in the context of computational environments applied to data science. We presented an overview of data science processes, in addition to how scheduling and load balancing methodologies impact these processes and what aspects to consider when using distributed environments applied to data science. In particular, the challenge of enabling the automatic transformation of sequential data science demands into parallel ones is of particular interest because it abstracts part of the complexity involved in parallelizing data science tasks. As a result, such an automatic transformation promotes wider adoption of distributed environments as standard tools for large-scale data science processes.
\n
Another notable challenge is to develop cloud elasticity techniques tailored to data science tasks. Such techniques must consider the specific requirements of data science processes to ensure the proper reservation of resources and migration of tasks, in order to guarantee high throughput in such scenarios. These and the other investigated challenges represent prime research opportunities to increase the performance of data science processes.
\n
Fundamental concepts",level:"1"},{id:"sec_2_2",title:"2.1 Data science workflow",level:"2"},{id:"sec_3_2",title:"2.2 Solving problems with data science",level:"2"},{id:"sec_3_3",title:"2.2.1 Pattern detection",level:"3"},{id:"sec_4_3",title:"2.2.2 Anomaly detection",level:"3"},{id:"sec_5_3",title:"2.2.3 Cleaning",level:"3"},{id:"sec_6_3",title:"2.2.4 Alignment",level:"3"},{id:"sec_7_3",title:"2.2.5 Classification",level:"3"},{id:"sec_8_3",title:"2.2.6 Regression",level:"3"},{id:"sec_9_3",title:"2.2.7 Knowledge base construction",level:"3"},{id:"sec_10_3",title:"2.2.8 Density estimation",level:"3"},{id:"sec_12_2",title:"2.3 Data science tools",level:"2"},{id:"sec_14",title:"3. Exploring scheduling and load balancing on data science demands",level:"1"},{id:"sec_15",title:"4. Open opportunities and trends",level:"1"},{id:"sec_16",title:"5. Conclusion",level:"1"}],chapterReferences:[{id:"B1",body:'\nAmirian P, van Loggerenberg F, Lang T. Data Science and Analytics. Cham: Springer International Publishing; 2017. pp. 15-37\n'},{id:"B2",body:'\nGibert K, Horsburgh JS, Athanasiadis IN, Holmes G. Environmental data science. Environmental Modelling and Software. 2018;106:4-12. Special Issue on Environmental Data Science. Applications to Air quality and Water cycle\n'},{id:"B3",body:'\nGrus J. Data Science from Scratch: First Principles with Python. 1st ed. O’Reilly Media, Inc.; 2015. Available from: https://www.amazon.com/Data-Science-Scratch-Principles-Python/dp/149190142X\n'},{id:"B4",body:'\nAhmad A, Paul A, Din S, Rathore MM, Choi GS, Jeon G. Multilevel data processing using parallel algorithms for analyzing big data in high-performance computing. International Journal of Parallel Programming. 2018;46(3):508-527\n'},{id:"B5",body:'\nBomatpalli T, Wagh R, Balaji S. High performance computing and big data analytics paradigms and challenges. International Journal of Computer Applications. 2015;116(04):28-33\n'},{id:"B6",body:'\nSingh D, Reddy CK. A survey on platforms for big data analytics. Journal of Big Data. 2014;2(1):8\n'},{id:"B7",body:'\nDorr BJ, Greenberg CS, Fontana P, Przybocki M, Le Bras M, Ploehn C, et al. A new data science research program: evaluation, metrology, standards, and community outreach. International Journal of Data Science and Analytics. 2016;1(3):177-197\n'},{id:"B8",body:'\nChasapis D, Moreto M, Schulz M, Rountree B, Valero M, Casas M. Power efficient job scheduling by predicting the impact of processor manufacturing variability. In: Proceedings of the ACM International Conference on Supercomputing (ICS’19). New York, NY, USA: ACM; 2019. pp. 296-307\n'},{id:"B9",body:'\nLiu L, Tan H, Jiang SH-C, Han Z, Li X-Y, Huang H. Dependent task placement and scheduling with function configuration in edge computing. In: Proceedings of the International Symposium on Quality of Service (IWQoS’19). New York, NY, USA: ACM; 2019. pp. 20:1-20:10\n'},{id:"B10",body:'\nPalyvos-Giannas D, Gulisano V, Papatriantafilou M. Haren: A framework for ad-hoc thread scheduling policies for data streaming applications. In: Proceedings of the 13th ACM International Conference on Distributed and Event-based Systems (DEBS ’19). New York, NY, USA: ACM; 2019. pp. 19-30\n'},{id:"B11",body:'\nFeng Y, Zhu Y. PES: Proactive event scheduling for responsive and energy-efficient mobile web computing. In: Proceedings of the 46th International Symposium on Computer Architecture (ISCA ’19). New York, NY, USA: ACM; 2019. pp. 66-78\n'},{id:"B12",body:'\nTopcuoglu H, Hariri S, Min-You WU. 
Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Transactions on Parallel and Distributed Systems. 2002;13(3):260-274\n'},{id:"B13",body:'\nMenon H, Kale L. A distributed dynamic load balancer for iterative applications. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC’13). New York, NY, USA: ACM; 2013. pp. 15:1-15:11\n'},{id:"B14",body:'\nSchepis L, Cuomo F, Petroni A, Biagi M, Listanti M, Scarano G. Adaptive data update for cloud-based internet of things applications. In: Proceedings of the ACM MobiHoc Workshop on Pervasive Systems in the IoT Era (PERSIST-IoT’19). New York, NY, USA: ACM; 2019. pp. 13-18\n'},{id:"B15",body:'\nBak S, Menon H, White S, Diener M, Kale L. Integrating openmp into the charm++ programming model. In: Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware (ESPM2’17). New York, NY, USA: ACM; 2017. pp. 4:1-4:7\n'},{id:"B16",body:'\nGandhi R, Liu HH, Hu YC, Lu G, Padhye J, Yuan L, et al. Duet: Cloud scale load balancing with hardware and software. SIGCOMM Computer Communication Review. 2014;44(4):27-38\n'}],footnotes:[{id:"fn1",explanation:"\nhttps://www.tableau.com/\n"},{id:"fn2",explanation:"\nhttps://www.dataiku.com/\n"},{id:"fn3",explanation:"\nhttps://azure.microsoft.com/en-us/services/machine-learning-studio/\n"},{id:"fn4",explanation:"\nhttps://orange.biolab.si/\n"},{id:"fn5",explanation:"\nhttps://www.kaggle.com\n"},{id:"fn6",explanation:"\nhttps://archive.ics.uci.edu/ml/index.php\n"},{id:"fn7",explanation:"\nhttps://www.quandl.com/\n"},{id:"fn8",explanation:"\nhttp://cocodataset.org/home\n"}],contributors:[{corresp:null,contributorFullName:"Diórgenes Eugênio da Silveira",address:null,affiliation:'
da Rosa Righi",slug:"rodrigo-da-rosa-righi"},{id:"122097",title:"Dr.",name:"Arthur",middleName:null,surname:"Gomez",fullName:"Arthur Gomez",slug:"arthur-gomez"}]},{id:"25888",title:"Online Production Scheduling and Re-Scheduling in Autonomous, Intelligent Distributed Environments",slug:"online-production-scheduling-and-re-scheduling-in-autonomous-intelligent-distributed-environments",signatures:"Edgar Chacón, Juan Cardillo, Rafael Chacón and Germán Darío Zapata",authors:[{id:"11581",title:"Dr.",name:"Edgar",middleName:null,surname:"Chacón",fullName:"Edgar Chacón",slug:"edgar-chacon"},{id:"12410",title:"Prof.",name:"German",middleName:null,surname:"Zapata",fullName:"German Zapata",slug:"german-zapata"},{id:"75346",title:"Dr.",name:"Juan",middleName:null,surname:"Cardillo",fullName:"Juan Cardillo",slug:"juan-cardillo"},{id:"75353",title:"MSc.",name:"Rafael",middleName:null,surname:"Chacon",fullName:"Rafael Chacon",slug:"rafael-chacon"}]},{id:"25889",title:"Minimizing Makespan in Flow Shop Scheduling Using a Network Approach",slug:"minimizing-makespan-in-flow-shop-scheduling-using-a-network-approach",signatures:"Amin Sahraeian",authors:[{id:"118784",title:"MSc.",name:"Amin",middleName:null,surname:"Sahraeian",fullName:"Amin Sahraeian",slug:"amin-sahraeian"}]},{id:"25890",title:"Lot Processing in Hybrid Flow Shop Scheduling Problem",slug:"lot-processing-in-hybrid-flow-shop-scheduling-problem",signatures:"Larysa Burtseva, Rainier Romero, Salvador Ramirez, Victor Yaurima, Félix F. González-Navarro and Pedro Flores Perez",authors:[{id:"11073",title:"Dr",name:"Larysa",middleName:null,surname:"Burtseva",fullName:"Larysa Burtseva",slug:"larysa-burtseva"},{id:"11489",title:"Dr.",name:"Victor",middleName:"Hugo",surname:"Yaurima-Basaldua",fullName:"Victor Yaurima-Basaldua",slug:"victor-yaurima-basaldua"},{id:"40949",title:"MSc.",name:"Rainier",middleName:null,surname:"Romero",fullName:"Rainier Romero",slug:"rainier-romero"},{id:"40950",title:"BSc.",name:"Salvador",middleName:null,surname:"Ramirez",fullName:"Salvador Ramirez",slug:"salvador-ramirez"},{id:"108502",title:"Dr.",name:"Pedro",middleName:null,surname:"Flores",fullName:"Pedro Flores",slug:"pedro-flores"},{id:"115722",title:"Dr.",name:"Félix F.",middleName:null,surname:"González-Navarro",fullName:"Félix F. 
González-Navarro",slug:"felix-f.-gonzalez-navarro"}]},{id:"25891",title:"Analyzing Different Production Times Applied to the Job Shop Scheduling Problem",slug:"analyzing-different-production-times-applied-to-the-job-shop-scheduling-problem",signatures:"Arthur Tórgo Gómez, Antonio Gabriel Rodrigues and Rodrigo da Rosa Righi",authors:[{id:"69889",title:"Prof.",name:"Rodrigo",middleName:null,surname:"da Rosa Righi",fullName:"Rodrigo da Rosa Righi",slug:"rodrigo-da-rosa-righi"},{id:"122097",title:"Dr.",name:"Arthur",middleName:null,surname:"Gomez",fullName:"Arthur Gomez",slug:"arthur-gomez"},{id:"127211",title:"MSc.",name:"Antonio",middleName:null,surname:"Rodrigues",fullName:"Antonio Rodrigues",slug:"antonio-rodrigues"}]},{id:"25892",title:"Adaptive Production Scheduling and Control in One-Of-A-Kind Production",slug:"adaptive-production-scheduling-and-control-in-one-of-a-kind-production",signatures:"Wei Li and Yiliu Tu",authors:[{id:"67772",title:"Dr.",name:null,middleName:null,surname:"Tu",fullName:"Tu",slug:"tu"},{id:"73805",title:"Dr.",name:"Wei",middleName:null,surname:"Li",fullName:"Wei Li",slug:"wei-li"}]},{id:"25893",title:"Simulation-Based Modular Scheduling System of Semiconductor Manufacturing",slug:"simulation-based-modular-scheduling-system-of-semiconductor-manufacturing",signatures:"Li Li, Qiao Fei Ma Yumin and Ye Kai",authors:[{id:"11038",title:"Dr.",name:"Li",middleName:null,surname:"Li",fullName:"Li Li",slug:"li-li"}]},{id:"25894",title:"Production Scheduling on Practical Problems",slug:"production-scheduling-on-practical-problems",signatures:"Marcius Fabius Henriques de Carvalho and Rosana Beatriz Baptista Haddad",authors:[{id:"66222",title:"Dr.",name:"Rosana",middleName:"Beatriz",surname:"Haddad",fullName:"Rosana Haddad",slug:"rosana-haddad"},{id:"121008",title:"Dr.",name:"Marcius",middleName:null,surname:"Carvalho",fullName:"Marcius Carvalho",slug:"marcius-carvalho"}]},{id:"25895",title:"Achieving Cost Competitiveness with an Agent-Based Integrated Process Planning and Production Scheduling System",slug:"achieving-cost-competitiveness-with-an-agent-based-integrated-process-planning-and-production-schedu",signatures:"Ming Lim and David Zhang",authors:[{id:"64872",title:"Dr.",name:"Ming",middleName:"K",surname:"Lim",fullName:"Ming Lim",slug:"ming-lim"},{id:"114743",title:"Prof.",name:"David",middleName:null,surname:"Zhang",fullName:"David Zhang",slug:"david-zhang"}]},{id:"25896",title:"Using Timed Coloured Petri Nets for Modelling, Simulation and Scheduling of Production Systems",slug:"using-timed-coloured-petri-nets-for-modelling-simulation-and-scheduling-of-production-systems",signatures:"Andrzej Bozek",authors:[{id:"65177",title:"MSc.",name:"Andrzej",middleName:null,surname:"Bozek",fullName:"Andrzej Bozek",slug:"andrzej-bozek"}]}]}],publishedBooks:[{type:"book",id:"3250",title:"Theory and Applications of Monte Carlo Simulations",subtitle:null,isOpenForSubmission:!1,hash:"bed7d51effabfc6156f85819c2b51b2b",slug:"theory-and-applications-of-monte-carlo-simulations",bookSignature:"Victor (Wai Kin) Chan",coverURL:"https://cdn.intechopen.com/books/images_new/3250.jpg",editedByType:"Edited by",editors:[{id:"157011",title:"Prof.",name:"Wai Kin (Victor)",surname:"Chan",slug:"wai-kin-(victor)-chan",fullName:"Wai Kin (Victor) Chan"}],equalEditorOne:null,equalEditorTwo:null,equalEditorThree:null,productType:{id:"1",chapterContentType:"chapter",authoredCaption:"Edited by"}},{type:"book",id:"5095",title:"Montecarlo Simulation of Two Component Aerosol 
Vertex Decomposability of Path Complexes and Stanley's Conjectures
Seyed Mohammad Ajdani and Francisco Bulnes
DOI: 10.5772/intechopen.101083
1. Introduction
Let R = K[x_1, …, x_n], where K is a field. Fix an integer n ≥ t ≥ 2 and let G be a directed graph. A sequence x_{i_1}, …, x_{i_t} of distinct vertices is called a path of length t if there are t − 1 distinct directed edges e_1, …, e_{t−1}, where e_j is a directed edge from x_{i_j} to x_{i_{j+1}}. Then, the path ideal of G of length t is the monomial ideal I_t(G) = (x_{i_1} ⋯ x_{i_t} : x_{i_1}, …, x_{i_t} is a path of length t in G) in the polynomial ring R = K[x_1, …, x_n]. The distance d(x, y) of two vertices x and y of a graph G is the length of the shortest path from x to y. The path complex Δ_t(G) is defined by
Δ_t(G) = ⟨{x_{i_1}, …, x_{i_t}} : x_{i_1}, …, x_{i_t} is a path of length t in G⟩.
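To make the definition concrete, the following minimal Python sketch (added here; it is not part of the original text, and all function and variable names are ours) enumerates the paths of length t of a small directed graph and prints the facets of Δ_t(G) together with the corresponding generators of I_t(G):

def paths_of_length_t(vertices, edges, t):
    # A "path of length t" here is a sequence of t distinct vertices joined
    # by t - 1 directed edges, following the convention of this chapter.
    succ = {v: [] for v in vertices}
    for a, b in edges:
        succ[a].append(b)
    found = []

    def extend(path):
        if len(path) == t:
            found.append(tuple(path))
            return
        for w in succ[path[-1]]:
            if w not in path:
                extend(path + [w])

    for v in vertices:
        extend([v])
    return found

# Directed 4-cycle C_4: x1 -> x2 -> x3 -> x4 -> x1
V = ["x1", "x2", "x3", "x4"]
E = [("x1", "x2"), ("x2", "x3"), ("x3", "x4"), ("x4", "x1")]

paths = paths_of_length_t(V, E, 3)
print(sorted({tuple(sorted(p)) for p in paths}))   # facets of Delta_3(C_4)
print(sorted("*".join(sorted(p)) for p in paths))  # generators of I_3(C_4)

For this example the facets are exactly the four consecutive triples of the cycle.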
Path ideals of graphs were first introduced by Conca and De Negri [1, 2] in the context of monomial ideals of linear type. Recently, the path ideal of cycles has been extensively studied by several mathematicians. In [3], it is shown that I2Cn is sequentially Cohen-Macaulay, if and only if, n=3 or n=5. Generalizing this result, in [4], it is proved that ItCn, (t>2), is sequentially Cohen-Macaulay, if and only if n=t or n=t+1 or n=2t+1. Also, the Betti numbers of the ideal ItCn and ItLn is computed explicitly in [5]. In particular, it has been shown that [6, 7]:
The projective dimension of the path ideal of a cycle C_n or a line L_n is given by
pd(I_t(C_n)) = 2p if d ≠ 0, and 2p − 1 if d = 0;  pd(I_t(L_n)) = 2p − 1 if d ≠ t, and 2p if d = t. (E1)
The regularity of the path ideal of a cycle C_n or a line L_n is given by
reg(I_t(C_n)) = (t − 1)p + d + 1;  reg(I_t(L_n)) = p(t − 1) + 1 if d < t, and p(t − 1) + t if d = t, (E2)
where p and d are the integer parameters attached to n and t in the cited results [5, 6, 7].
In [8] it has been shown that Δ_t(G) is a simplicial tree if G is a rooted tree and t ≥ 2. One of the interesting problems in combinatorial commutative algebra is Stanley's conjectures, which have been studied by many researchers. Let R be an N^n-graded ring and M a Z^n-graded R-module. Then, Stanley [9] conjectured that
depth M ≤ sdepth M. (E3)
He also conjectured in [10] that each Cohen-Macaulay simplicial complex is partitionable. Herzog, Soleyman Jahan, and Yassemi [11, 12, 13, 14] showed that the conjecture about partitionability is a special case of Stanley's first conjecture. In this chapter, we first study algebraic properties of Δ_t(C_n). In Section 2, we recall some definitions and results which will be needed later. In Section 3, for all t > 2 we show that the following conditions are equivalent:
ΔtCn is matroid;
ΔtCn is vertex decomposable;
Δt(Cn) is shellable;
ΔtCn is Cohen-Macaulay;
n=t or t+1.
(see Theorem 3.1).
In Section 4, for all t ≥ 2 we show that Δ_t(G) is vertex decomposable if and only if G = Hpnq or G = Hpn. In Section 5, vertex decomposability of path complexes of Dynkin graphs is established. In Section 6, as an application of our results, we show that if n = t or t + 1, then Δ_t(C_n) is partitionable and Stanley's conjecture holds for K[Δ_t(C_n)] and K[Δ_t(G)], where G = Hpnq or G = Hpn.
2. Preliminaries
In this section, we recall some definitions and results which will be needed later.
Definition 2.1.A simplicial complexΔ over a set of vertices V=x1…xn, is a collection of subsets of V, with the property that:
xi∈Δ, for all i;
if F∈Δ, then all subsets of F are also in Δ (including the empty set).
An element of Δ is called a face of Δ; the complement of a face F is V \ F and is denoted by F^c. Also, the complement of the simplicial complex Δ = ⟨F_1, …, F_r⟩ is Δ^c = ⟨F_1^c, …, F_r^c⟩. The dimension of a face F of Δ, dim F, is |F| − 1, where |F| is the number of elements of F, and dim ∅ = −1. The faces of dimensions 0 and 1 are called vertices and edges, respectively. A non-face of Δ is a subset F of V with F ∉ Δ. We denote by N(Δ) the set of all minimal non-faces of Δ. The maximal faces of Δ under inclusion are called facets of Δ. The dimension of the simplicial complex Δ, dim Δ, is the maximum of the dimensions of its facets. If all facets of Δ have the same dimension, then Δ is called pure.
Let FΔ=F1…Fq be the facet set of Δ. It is clear that FΔ determines Δ completely and we write Δ=F1…Fq. A simplicial complex with only one facet is called a simplex. A simplicial complex Γ is called a subcomplex of Δ, if FΓ⊂FΔ.
For v∈V, the subcomplex of Δ obtained by removing all faces F∈Δ with v∈F is denoted by Δ\\v. That is,
Δ \ v = {F ∈ Δ : v ∉ F}. (E4)
The link of a face F ∈ Δ, denoted by link_Δ(F), is the simplicial complex on V whose faces are the G ∈ Δ such that G ∩ F = ∅ and G ∪ F ∈ Δ. The link of a vertex v ∈ V is simply denoted by link_Δ(v):
link_Δ(v) = {F ∈ Δ : v ∉ F, F ∪ {v} ∈ Δ}. (E5)
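As a small illustration (added here, consistent with the formulas used later for Δ_2(C_n)): for the edge complex Δ_2(C_4) = ⟨{x1,x2}, {x2,x3}, {x3,x4}, {x4,x1}⟩ one has Δ_2(C_4) \ x4 = ⟨{x1,x2}, {x2,x3}⟩ and link_{Δ_2(C_4)}(x4) = ⟨{x1}, {x3}⟩.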
Let Δ be a simplicial complex over n vertices x1…xn. For F⊂x1…xn, we set:
x_F = ∏_{x_i ∈ F} x_i. (E6)
We define the facet ideal of Δ, denoted by I(Δ), to be the ideal of S generated by {x_F : F ∈ F(Δ)}. The non-face ideal or the Stanley-Reisner ideal of Δ, denoted by I_Δ, is the ideal of S generated by the square-free monomials {x_F : F ∈ N(Δ)}. Also, we call K[Δ] := S/I_Δ the Stanley-Reisner ring of Δ.
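For example (a small illustration added here): if Δ = ⟨{x1,x2}, {x2,x3}⟩ on V = {x1,x2,x3}, then F(Δ) = {{x1,x2}, {x2,x3}} and N(Δ) = {{x1,x3}}, so I(Δ) = (x1x2, x2x3), I_Δ = (x1x3), and K[Δ] = K[x1,x2,x3]/(x1x3).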
Definition 2.2. A simplicial complex Δ on x1…xn is said to be a matroid if, for any two facets F and G of Δ and any xi∈F, there exists a xj∈G such that F\\xi∪xj is a facet of Δ.
Definition 2.3. A simplicial complex Δ is recursively defined to be vertex decomposable, if it is either a simplex, or else has some vertex v so that,
Both Δ\\v and linkΔv are vertex decomposable, and
No face of linkΔv is a facet of Δ\\v.
A vertex v satisfying condition (b) is called a shedding vertex.
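The recursion in Definition 2.3 is easy to test by brute force on very small complexes. The Python sketch below (added here; it is ours, exponential in the number of vertices, and intended only as an illustration of the definition) represents a complex by its list of facets:

def maximal(sets):
    # Inclusion-maximal members of a family of sets.
    sets = [frozenset(s) for s in sets]
    return [s for s in sets if not any(s < other for other in sets)]

def deletion(facets, v):
    # Facets of Delta \ v (the maximal faces avoiding v).
    return maximal([f - {v} for f in facets])

def link(facets, v):
    # Facets of link_Delta(v).
    return maximal([f - {v} for f in facets if v in f])

def is_vertex_decomposable(facets):
    facets = maximal(facets)
    if len(facets) == 1:                     # a simplex
        return True
    for v in set().union(*facets):
        del_v, link_v = deletion(facets, v), link(facets, v)
        if not link_v:
            continue
        # Condition (b): no face of link(v) may be a facet of Delta \ v,
        # i.e. no facet of Delta \ v is contained in a facet of link(v).
        if any(d <= l for d in del_v for l in link_v):
            continue
        if is_vertex_decomposable(link_v) and is_vertex_decomposable(del_v):
            return True                      # v is a shedding vertex
    return False

# Delta_2(C_5): edge complex of the 5-cycle -- vertex decomposable (cf. Lemma 3.2)
C5_edges = [frozenset(e) for e in [{1,2},{2,3},{3,4},{4,5},{5,1}]]
print(is_vertex_decomposable(C5_edges))      # True

# Delta_3(C_5): consecutive triples of the 5-cycle -- not vertex decomposable,
# consistent with Theorem 3.1, since 5 is neither t nor t+1 for t = 3.
C5_triples = [frozenset(s) for s in [{1,2,3},{2,3,4},{3,4,5},{4,5,1},{5,1,2}]]
print(is_vertex_decomposable(C5_triples))    # False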
Definition 2.4. A simplicial complex Δ is shellable if the facets of Δ can be ordered F_1, …, F_s such that, for all 1 ≤ i < j ≤ s, there exist some v ∈ F_j \ F_i and some l ∈ {1, …, j − 1} with F_j \ F_l = {v}.
A simplicial complex Δ is called disconnected if the vertex set V of Δ is a disjoint union V = V_1 ∪ V_2 such that no face of Δ has vertices in both V_1 and V_2. Otherwise, Δ is connected. It is well known that every Cohen-Macaulay simplicial complex of positive dimension is connected (see Remark 2.2).
Definition 2.5. Given a simplicial complex Δ on V, we define Δ∨, the Alexander dual of Δ, by
Δ∨ = {V \ F : F ∉ Δ}. (E7)
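For instance (an illustration added here): for Δ = ⟨{x1,x2}, {x2,x3}⟩ on V = {x1,x2,x3}, the non-faces of Δ are {x1,x3} and {x1,x2,x3}, so Δ∨ = ⟨{x2}⟩ and I_{Δ∨} = (x1, x3); since Δ^c = ⟨{x3}, {x1}⟩, this agrees with the identity I_{Δ∨} = I(Δ^c) recalled below.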
It is known that for the complex Δ one has I_{Δ∨} = I(Δ^c). Let I ≠ 0 be a homogeneous ideal of S and let N be the set of non-negative integers. For every i ∈ N ∪ {0}, one defines:
t_i^S(I) = max{ j : β_{i,j}^S(I) ≠ 0 }, (E8)
where β_{i,j}^S(I) is the (i, j)-th graded Betti number of I as an S-module. The Castelnuovo-Mumford regularity of I is given by
reg(I) = sup{ t_i^S(I) − i : i ∈ Z }. (E9)
We say that the ideal I has a d-linear resolution, if I is generated by homogeneous polynomials of degree d and βi,jSI=0, for all j≠i+d and i≥0. For an ideal that has a d-linear resolution, the Castelnuovo-Mumford regularity would be d. If I is a graded ideal of S, then we write Id for the ideal generated by all homogeneous polynomials of degree d belonging to I.
Definition 2.6. A graded ideal I is componentwise linear if Id has a linear resolution for all d.
Also, we write Id for the ideal generated by the squarefree monomials of degree d belonging to I.
Definition 2.7. A graded S-module M is called sequentially Cohen-Macaulay (over K), if there exists a finite filtration of graded S-modules,
0 = M_0 ⊂ M_1 ⊂ ⋯ ⊂ M_r = M (E10)
such that each Mi/Mi−1 is Cohen-Macaulay, and the Krull dimensions of the quotients are increasing:
dim M_1/M_0 < dim M_2/M_1 < ⋯ < dim M_r/M_{r−1}. (E11)
The Alexander dual allows us to build a bridge between (sequentially) Cohen-Macaulay ideals and (componentwise) linear ideals.
Definition 2.8 (Alexander duality). For a square-free monomial ideal I = (M_1, …, M_q) ⊂ S = K[x_1, …, x_n], the Alexander dual of I, denoted by I∨, is defined to be
I∨ = P_{M_1} ∩ ⋯ ∩ P_{M_q}, (E12)
where P_{M_i} denotes the prime ideal generated by the variables dividing M_i. With this notation, one has the following duality criteria:
The ideal I is a componentwise linear ideal if and only if S/I∨ is sequentially Cohen-Macaulay.
The ideal I has a q-linear resolution if and only if S/I∨ is Cohen-Macaulay of dimension n − q.
Remark 2.1. Two special cases we will be considering in this chapter are when G is a cycle C_n or a line graph L_n on vertices x_1, …, x_n, with edge sets
E(C_n) = {{x_1,x_2}, {x_2,x_3}, …, {x_{n−1},x_n}, {x_n,x_1}};  E(L_n) = {{x_1,x_2}, {x_2,x_3}, …, {x_{n−1},x_n}}.
Remark 2.2. All Cohen-Macaulay simplicial complexes of positive dimension are connected.
3. Vertex decomposability path complexes of cycles
As the main result of this section, it is shown that Δ_t(C_n) is a matroid, vertex decomposable, shellable, and Cohen-Macaulay if and only if n = t or n = t + 1. For the proof, we shall need the following lemmas and propositions.
Lemma 3.1.LetΔtPnbe a simplicial complex on the pathPn=x1…xnand2≤t≤n. ThenΔtPnis vertex decomposable.
Proof. If t=n, then ΔnPn is a simplex which is vertex decomposable. Let 2≤t<n then one has
Δ_t(P_n) = ⟨{x_1, …, x_t}, {x_2, …, x_{t+1}}, …, {x_{n−t+1}, …, x_n}⟩. (E13)
So ΔtPn\\xn=x1…xtx2…xt+1…xn−t…xn−1. Now, we use induction on the number of vertices of Pn and by induction hypothesis ΔtPn\\xn is vertex decomposable. On the other hand, it is clear that linkΔtPnxn=xn−t+1…xn−1. Thus, linkΔtPnxn is a simplex which is not a facet of ΔtPn\\xn. Therefore, ΔtPn is vertex decomposable.
Lemma 3.2.LetΔ2Cnbe a simplicial complex onx1…xn. ThenΔ2Cnis vertex decomposable.
Proof. Since Δ2Cn=x1x2x2x3…xn−1xnxnx1 then we have
Δ_2(C_n) \ x_n = ⟨{x_1,x_2}, {x_2,x_3}, …, {x_{n−2},x_{n−1}}⟩. (E14)
By Lemma 3.1, Δ_2(C_n) \ x_n is vertex decomposable. Also, it is trivial that link_{Δ_2(C_n)}(x_n) = ⟨{x_{n−1}}, {x_1}⟩ is vertex decomposable and that no face of link_{Δ_2(C_n)}(x_n) is a facet of Δ_2(C_n) \ x_n. Therefore, Δ_2(C_n) is vertex decomposable.□
Lemma 3.3.LetΔtCnbe a simplicial complex onx1…xnand3≤t≤n−2. ThenΔtCnis not Cohen-Macaulay.
Proof. It suffices to show that I_{Δ_t(C_n)∨} does not have a linear resolution. Since I_{Δ_t(C_n)∨} = I(Δ_t(C_n)^c), one can easily check that I_{Δ_t(C_n)∨} = I_{n−t}(C_n). By the regularity formula (E2) we have
reg(I_{Δ_t(C_n)∨}) = (n − t − 1)p + d + 1. (E15)
Since 3 ≤ t ≤ n − 2, one has reg(I_{Δ_t(C_n)∨}) ≠ n − t, and hence by the duality criterion in Definition 2.8, Δ_t(C_n) is not Cohen-Macaulay.
Proposition 3.1.LetΔtCnbe a simplicial complex onx1…xnandt≥3. ThenΔtCnis vertex decomposable if and only ifn=tort+1.
Proof. By Lemma 3.3, it suffices to show that if n=t or t+1, then ΔtCn is vertex decomposable. If n=t, then ΔnCn is a simplex which is vertex decomposable.
If t=n−1, then we have
Δ_{n−1}(C_n) = ⟨{x_1, …, x_{n−1}}, {x_2, …, x_n}, {x_3, …, x_n, x_1}, …, {x_n, x_1, …, x_{n−2}}⟩.
Now, we use induction on the number of vertices of Cn and show that Δn−1Cn is vertex decomposable. It is clear that Δn−1Cn\\xn=x1…xn−1 is a simplex, which is vertex decomposable.
On the other hand,
link_{Δ_{n−1}(C_n)}(x_n) = ⟨{x_1, …, x_{n−2}}, …, {x_{n−1}, x_1, …, x_{n−3}}⟩ = Δ_{n−2}(C_{n−1}). (E16)
By induction hypothesis linkΔn−1Cnxn is vertex decomposable. It is easy to see that no face of linkΔn−1Cnxn is a facet of Δn−1Cn\\xn. Therefore, Δn−1Cn is vertex decomposable.
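For example (a small check added here, not part of the original argument): for n = 4 and t = 3 the facet list above reads Δ_3(C_4) = ⟨{x1,x2,x3}, {x2,x3,x4}, {x3,x4,x1}, {x4,x1,x2}⟩, the boundary of the tetrahedron on x1, …, x4; one can verify directly that every vertex of this complex is a shedding vertex, so it is vertex decomposable, as the proposition predicts.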
Proposition 3.2.Δ2Cnis a matroid if and only ifn=3or4.
Proof. If n = 3 or 4, then it is easy to see that Δ_2(C_n) is a matroid. For the converse, it suffices to show that Δ_2(C_n) is not a matroid for any n ≥ 5. Consider the two facets {x_1,x_2} and {x_{n−1},x_n} and the vertex x_1 ∈ {x_1,x_2}. The exchange property would require ({x_1,x_2} \ {x_1}) ∪ {x_{n−1}} = {x_2,x_{n−1}} or ({x_1,x_2} \ {x_1}) ∪ {x_n} = {x_2,x_n} to be a facet of Δ_2(C_n), but neither is. So Δ_2(C_n) is not a matroid for any n ≥ 5.
For simplicial complexes, one has the following chain of implications:
matroid ⇒ vertex decomposable ⇒ shellable ⇒ Cohen-Macaulay.
These implications are strict in general; however, by the following theorem, for the path complexes Δ_t(C_n) the reverse implications are also valid.
Theorem 3.1. Let t ≥ 3. Then the following conditions are equivalent:
Δ_t(C_n) is a matroid;
Δ_t(C_n) is vertex decomposable;
Δ_t(C_n) is shellable;
Δ_t(C_n) is Cohen-Macaulay;
n = t or t + 1.
Proof. (i)⇒ (ii), (ii)⇒ (iii) and (iii)⇒ (iv) is well known.
(iv) ⇒ (v): Follows from Lemma 3.3 and Proposition 3.1.
(v)⇒ (i): If n=t, then ΔtCn is a simplex which is a matroid.
If n=t+1, then
Δ_t(C_n) = ⟨{x_1, …, x_t}, {x_2, …, x_{t+1}}, {x_3, …, x_{t+1}, x_1}, …, {x_{t+1}, x_1, …, x_{t−1}}⟩.
For any two facets F and G of ΔtCn one has ∣F∩G∣=t−1. We claim that for any two facets F and G of ΔtCn and any xi∈F, there exists a xj∈G such that F\\xi∪xj is a facet of ΔtCn. We have to consider two cases. If xi∈F and xi∉G, then we choose xj∈G such that xj∉F. Thus, F\\xi∪xj=G which is a facet of ΔtCn.
For the other case, if x_i ∈ F and x_i ∈ G, then we may take x_j = x_i. Therefore, (F \ {x_i}) ∪ {x_i} = F is a facet of Δ_t(C_n), which completes the proof.
4. Vertex decomposability path complexes of trees
As the main result of this section, for all t ≥ 2 we characterize all trees G for which Δ_t(G) is vertex decomposable. Let Hpnq denote the double starlike tree obtained by attaching p pendant vertices to one pendant vertex of the path P_n and q pendant vertices to the other pendant vertex of P_n. Also, let Hpn be the graph obtained by attaching p pendant vertices to one pendant vertex of the path P_n.
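The double starlike trees are easy to generate explicitly; the following small Python sketch (added here; the vertex labels x_i, y_j, z_j are an assumption made only for this illustration) returns the edge list of Hpnq:

def double_starlike(p, n, q):
    # Path x1 - x2 - ... - xn, with p pendant vertices y1..yp attached to x1
    # and q pendant vertices z1..zq attached to xn.
    path = [("x%d" % i, "x%d" % (i + 1)) for i in range(1, n)]
    left = [("y%d" % j, "x1") for j in range(1, p + 1)]
    right = [("z%d" % j, "x%d" % n) for j in range(1, q + 1)]
    return path + left + right

print(double_starlike(2, 3, 2))
# e.g. [('x1', 'x2'), ('x2', 'x3'), ('y1', 'x1'), ('y2', 'x1'), ('z1', 'x3'), ('z2', 'x3')]

Taking q = 0 gives the one-sided tree Hpn.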
Remark 4.1. Let Pn=x1…xn be a path on vertices x1…xn and H2n be a graph obtained by attaching two pendant vertices to pendant vertex xn. Then, ΔtH2n is vertex decomposable for all t≥2.
Proof. The claim follows from Lemma 3.1.
Proposition 4.1.LetPn=x1…xnbe a path on verticesx1…xnandHpnbe a graph obtained by attachingppendant vertices to pendant vertexxn. ThenΔtHpnis vertex decomposable for allt≥2.
Proof. We prove the proposition by induction on p, the number of pendant vertices attached to the pendant vertex x_n of P_n. If p = 0 or 1, then Hpn is a path and by Lemma 3.1 Δ_t(Hpn) is vertex decomposable. If p = 2, then by Remark 4.1 Δ_t(Hpn) is vertex decomposable. Now, let p > 2 and let y_1, …, y_p be the p pendant vertices attached to the pendant vertex x_n of P_n; then one has
Hpn\\y1=Hp−1nE17
and
ΔtHpn\\y1=ΔtHp−1n.E18
Therefore by induction hypothesis ΔtHp−1n is vertex decomposable. So ΔtHpn\\y1 is vertex decomposable. If t=3, then we have
linkΔ3Hpny1=xn−1xny2xn…ypxnE19
It is easy to see that linkΔ3Hpny1 is vertex decomposable and y1 is a shedding vertex. If t=2 or t>3, one has
linkΔtHpny1=xn−t+2…xn.E20
thus linkΔtHpny1 is a simplex, which is not a facet of ΔtHpn\\y1, therefore ΔtHpn is vertex decomposable.
Lemma 4.1. Let p = 2 and q ≥ 2. Then Δ_t(H2nq) is vertex decomposable for all 2 ≤ t ≤ n + 2.
Proof. Let H2nq denote the double starlike tree obtained by attaching two pendant vertices y_1, y_2 to the pendant vertex x_1 of the path P_n, and let y_1′, …, y_q′ be the pendant vertices attached to the pendant vertex x_n of P_n. By Proposition 4.1, Δ_t(H2nq) \ y_1 is vertex decomposable. Now, we prove that link_{Δ_t(H2nq)}(y_1) is vertex decomposable. If t = 3, then link_{Δ_3(H2nq)}(y_1) = ⟨{x_1,x_2}, {x_1,y_2}⟩, which is vertex decomposable. If t = n + 2, then
linkΔn+2H2nqy1=x1…xny1′x1…xny2′…x1…xnyq′.
It is easy to see that linkΔn+2H2nqy1 is vertex decomposable. If t=2 or 4≤t≤n+1, then we have linkΔtH2nqy1=x1…xt−1. Thus, linkΔtH2nqy1 is a simplex, which is vertex decomposable. It is clear that y1 is a shedding vertex. □
Proposition 4.2. Let Q_1, Q_2 be two paths of maximum length k in a tree G and let y be a leaf of G such that y ∈ Q_1 ∩ Q_2 and |Q_1 ∩ Q_2| = L. Then Δ_k(G) is not vertex decomposable.
Proof. Suppose Q_1 = y_1, y_2, …, y_{k−L}, x_1, x_2, …, x_{L−1}, y and
Q_2 = y_1′, y_2′, …, y_{k−L}′, x_1, x_2, …, x_{L−1}, y are two paths of length k in G such that Q_1 ∩ Q_2 = {x_1, x_2, …, x_{L−1}, y} and deg(y) = 1. Then link_{Δ_k(G)}({x_1, …, x_{L−1}, y}) is disconnected and pure of positive dimension. By Remark 2.2, Δ_k(G) is not Cohen-Macaulay and hence Δ_k(G) is not vertex decomposable. □
Proposition 4.3. Let G be a double starlike tree that is not a path. Then Δ_t(G) is vertex decomposable for all 2 ≤ t ≤ n + 2.
Proof. Let G = Hpnq denote the double starlike tree obtained by attaching p pendant vertices to one pendant vertex of the path P_n and q pendant vertices to the other pendant vertex of P_n. We prove the statement by induction on p, the number of pendant vertices attached to the pendant vertex x_1 of P_n. If p = 0 or p = 1, then by Proposition 4.1, Δ_t(G) is vertex decomposable. If p = 2, then by Lemma 4.1, Δ_t(G) is vertex decomposable. Now, let p > 2 and let y_1, …, y_p be the p pendant vertices attached to the pendant vertex x_1 of P_n. Since G \ y_1 is again a double starlike tree with p − 1 pendant vertices, by the induction hypothesis Δ_t(G \ y_1) is vertex decomposable, and so Δ_t(G) \ y_1 = Δ_t(G \ y_1) is vertex decomposable. Let t = 2; then link_{Δ_2(G)}(y_1) = ⟨{x_1}⟩ is a simplex and vertex decomposable. Let t = 3; then link_{Δ_3(G)}(y_1) = ⟨{x_2,x_1}, {y_2,x_1}, …, {y_p,x_1}⟩ is vertex decomposable. Let 3 < t ≤ n + 1; then link_{Δ_t(G)}(y_1) = ⟨{x_1, x_2, …, x_{t−1}}⟩ is a simplex and vertex decomposable. Let t = n + 2; then link_{Δ_t(G)}(y_1) = ⟨{x_1, …, x_n, y_1′}, {x_1, …, x_n, y_2′}, …, {x_1, …, x_n, y_q′}⟩, where y_1′, …, y_q′ are the pendant vertices attached to x_n; this is the path complex of a starlike tree, which is vertex decomposable. It is easy to see that no face of link_{Δ_t(G)}(y_1) is a facet of Δ_t(G) \ y_1. So Δ_t(G) is vertex decomposable.
Now we are ready to prove the main result of this section.
Theorem 4.1. Let G be a tree that is not a path. Then Δ_t(G) is vertex decomposable for all t ≥ 2 if and only if G = Hpnq or G = Hpn.
Proof. (⇒) We argue by contradiction. Suppose G ≠ Hpnq and G ≠ Hpn. Then there exist two paths of maximum length k in G that have L common vertices, one of which is a leaf. Therefore, by Proposition 4.2, Δ_k(G) is not vertex decomposable, which is a contradiction. (⇐) By Proposition 4.1 and Proposition 4.3, the proof is complete.
5. Vertex decomposability path complexes of Dynkin graphs
Dynkin diagrams first appeared [15] in connection with the classification of simple Lie groups. Among Dynkin diagrams, a special role is played by the simply laced Dynkin diagrams A_n, D_n, E_6, E_7, and E_8. Dynkin diagrams are closely related to Coxeter graphs, which appeared in geometry (see [16]).
After that, Dynkin diagrams appeared in many branches of mathematics and beyond, in particular in representation theory. In [17], Gabriel introduced the notion of a quiver (directed graph) and its representations. He proved the famous Gabriel theorem on representations of quivers over an algebraically closed field. Let Q be a finite quiver and Q¯ the undirected graph obtained from Q by deleting the orientation of all arrows.
Theorem 5.1 (Gabriel's theorem). A connected quiver Q is of finite type if and only if the graph Q¯ is one of the following simply laced Dynkin diagrams: A_n, D_n, E_6, E_7 or E_8.
Let L_n be a line graph on vertices x_1, …, x_n and let x_j y_j be the whisker of L_n at x_j, with 3 ≤ j ≤ n − 1.
We obtain conditions under which Δ_t(L_n ∪ W(x_j)) is vertex decomposable and, as an application of our results, establish vertex decomposability of path complexes of Dynkin graphs. Throughout this section, we regard L_n ∪ W(x_j) as an undirected graph. By Lemma 3.1, we have the following corollary:
Corollary 5.1.LetAnbe a Dynkin graph and2≤t≤n. ThenΔtAnis vertex decomposable.
Proposition 5.1.LetLnbe a line graph on verticesx1…xnandxn−1yn−1be whisker ofLnatxn−1.
Then, ΔtLn∪Wxn−1 is vertex decomposable for all 2≤t≤n.
Proof. This follows immediately from Proposition 4.1.
Corollary 5.2.LetDnbe a Dynkin graph andn≥4. ThenΔtDnis vertex decomposable for all2≤t≤n.
Proof. We know that Dn=Ln∪Wxn−1. So by proposition 4.3 ΔtDn is vertex decomposable. □
Theorem 5.2.LetLnbe a line graph on verticesx1…xnandxjyjbe whisker ofLnatxjwith3≤j≤n−2.
Then Δ_t(L_n ∪ W(x_j)) is vertex decomposable if and only if 2 ≤ t ≤ 3 or n ≥ t > α, where α = min{d(y_j, x_1), d(y_j, x_n)}.
Proof. We first show that ΔtLn∪Wxj is not vertex decomposable for all 4≤t≤α. It is well known that if Δ is a Cohen-Macaulay simplicial complex, then linkΔF is Cohen-Macaulay for each face F of Δ.
Also, we know that all Cohen-Macaulay complexes of positive dimension are connected. It is easy to see that linkΔtLn∪Wxjxjyj is disconnected and pure of positive dimension.
This implies that Δ_t(L_n ∪ W(x_j)) is not Cohen-Macaulay, and hence Δ_t(L_n ∪ W(x_j)) is not vertex decomposable. Without loss of generality we may assume that α = d(y_j, x_1). If t = 2 or n ≥ t > α, then Δ_t(L_n ∪ W(x_j)) \ y_j = Δ_t(L_n), which is vertex decomposable by Lemma 3.1; on the other hand, link_{Δ_t(L_n ∪ W(x_j))}(y_j) = ⟨{x_j, x_{j+1}, …, x_{j+t−2}}⟩ is a simplex and therefore vertex decomposable.
If t = 3, then Δ_3(L_n ∪ W(x_j)) \ y_j = Δ_3(L_n), which is vertex decomposable, and link_{Δ_3(L_n ∪ W(x_j))}(y_j) = ⟨{x_{j−1}, x_j}, {x_j, x_{j+1}}⟩ is vertex decomposable. In both cases it is easy to see that y_j is a shedding vertex, which completes the proof.
Corollary 5.3. Let E_6 be a Dynkin graph. Then Δ_t(E_6) is vertex decomposable if and only if 2 ≤ t ≤ 3 or t = 5.
Proof. Since E_6 = L_5 ∪ W(x_3), the claim follows from Theorem 5.2.
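To see how α enters here (a small check added for the reader): in E_6 = L_5 ∪ W(x_3), the whisker vertex y_3 satisfies, with the vertex-counting convention for path length fixed in the Introduction, d(y_3, x_1) = d(y_3, x_5) = 4, so α = 4; Theorem 5.2 then allows exactly 2 ≤ t ≤ 3 or 5 ≥ t > 4, i.e. t = 5, matching the corollary.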
Corollary 5.4. Let E_7 be a Dynkin graph. Then Δ_t(E_7) is vertex decomposable if and only if 2 ≤ t ≤ 3 or 5 ≤ t ≤ 6.
Proof. We know that E_7 = L_6 ∪ W(x_3), and the proof follows from Theorem 5.2.
Corollary 5.5. Let E_8 be a Dynkin graph. Then Δ_t(E_8) is vertex decomposable if and only if 2 ≤ t ≤ 3 or 5 ≤ t ≤ 7.
6. Stanley decompositions
Let R be any standard graded K-algebra over an infinite field K, i.e., R is a finitely generated graded algebra R = ⊕_{i≥0} R_i such that R_0 = K and R is generated by R_1. There are several characterizations of the depth of such an algebra; we use the one that depth R is the maximal length of a regular R-sequence consisting of linear forms. Let x_F = ∏_{i∈F} x_i be a squarefree monomial for some F ⊆ [n] and let Z ⊆ {x_1, …, x_n}. The K-subspace x_F K[Z] of S = K[x_1, …, x_n] is the subspace generated by the monomials x_F u, where u is a monomial in the polynomial ring K[Z]. It is called a squarefree Stanley space if {x_i : i ∈ F} ⊆ Z. The dimension of this Stanley space is |Z|. Let Δ be a simplicial complex on x_1, …, x_n. A squarefree Stanley decomposition D of K[Δ] is a finite direct sum ⊕_i u_i K[Z_i] of squarefree Stanley spaces which is isomorphic, as a Z^n-graded K-vector space, to K[Δ], i.e.
K[Δ] ≅ ⊕_i u_i K[Z_i]. (E21)
We denote by sdepth(D) the minimal dimension of a Stanley space in D, and we define sdepth(K[Δ]) = max sdepth(D), where the maximum is taken over all Stanley decompositions D of K[Δ]. Stanley conjectured in [9] the following upper bound for the depth of K[Δ]:
depth K[Δ] ≤ sdepth K[Δ]. (E22)
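As a small worked example (added here): for Δ = ⟨{x1,x2}, {x2,x3}⟩ one has K[Δ] = K[x1,x2,x3]/(x1x3), and K[Δ] ≅ K[x1,x2] ⊕ x3K[x2,x3] is a squarefree Stanley decomposition whose Stanley spaces both have dimension 2; hence sdepth K[Δ] ≥ 2, while depth K[Δ] = 2 because Δ is a connected, one-dimensional, shellable complex, so (E22) holds in this case.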
Also, we recall another conjecture of Stanley. Let Δ again be a simplicial complex on x_1, …, x_n with facets G_1, …, G_t. The complex Δ is called partitionable if there exists a partition Δ = ∪_{i=1}^{t} [F_i, G_i], where F_i ⊆ G_i are suitable faces of Δ. Here, the interval [F_i, G_i] is the set of faces {H ∈ Δ : F_i ⊆ H ⊆ G_i}. In [10, 18], Stanley conjectured that each Cohen-Macaulay simplicial complex is partitionable [16, 17, 18, 19, 20]. This conjecture is a special case of the previous one. Indeed, Herzog, Soleyman Jahan, and Yassemi [14] proved that, for a Cohen-Macaulay simplicial complex Δ on x_1, …, x_n, depth K[Δ] ≤ sdepth K[Δ] holds if and only if Δ is partitionable. Since each vertex decomposable simplicial complex is shellable and each shellable complex is partitionable [4, 6, 7, 19, 20], as a consequence of our results we obtain:
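For instance (an illustration added here): the shellable complex Δ = ⟨{x1,x2}, {x2,x3}⟩ is partitionable via Δ = [∅, {x1,x2}] ∪ [{x3}, {x2,x3}]: the first interval contains ∅, {x1}, {x2}, {x1,x2}, the second contains {x3}, {x2,x3}, and together they cover every face of Δ exactly once.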
Corollary 6.1. If n = t or t + 1, then Δ_t(C_n) is partitionable and Stanley's conjecture holds for K[Δ_t(C_n)].
Corollary 6.2. Let G be a tree that is not a path. If G = Hpnq or G = Hpn, then Δ_t(G) is vertex decomposable for all t ≥ 2 and Stanley's conjecture holds for K[Δ_t(G)].
Keywords: vertex decomposable, simplicial complex, matroid, path
Abstract: Monomials are the link between commutative algebra and combinatorics. With a simplicial complex Δ, one can associate two square-free monomial ideals: the Stanley-Reisner ideal I_Δ, whose generators correspond to the non-faces of Δ, and the facet ideal I(Δ), a generalization of edge ideals of graphs, whose generators correspond to the facets of Δ. The facet ideal of a simplicial complex was first introduced by Faridi in 2002. The edge ideal I(G) of a simple graph G was first considered by R. Villarreal in 1990; he studied algebraic properties of I(G) using the combinatorial language of G. In combinatorial commutative algebra, one attaches a monomial ideal to a combinatorial object and then studies the algebraic properties of this ideal via the combinatorial properties of the object. One of the interesting problems in this area is Stanley's conjectures, which have been studied by many researchers. Let R be an N^n-graded ring and M a Z^n-graded R-module. Stanley conjectured that depth M ≤ sdepth M. He also conjectured that each Cohen-Macaulay simplicial complex is partitionable. In this chapter, we study the relation between vertex decomposability of some simplicial complexes and Stanley's conjectures.
(This chapter appears in the volume Advanced Topics of Topology, edited by Francisco Bulnes, IntechOpen.)