Multiset-Based Knowledge Representation for the Assessment and Optimization of Large-Scale Sociotechnical Systems

This chapter is dedicated to a new knowledge representation model that brings classical operations research and modern knowledge engineering together. The kernel of the introduced model is recursively generated multisets, selected according to predefined restrictions and optimization criteria. Sets of multisets are described by the so-called multiset grammars (MGs), which project the conceptual background of well-known string-generating grammars onto the universe of multisets. The syntax and semantics of MGs and of their practice-oriented development, unitary multiset grammars and metagrammars, are considered.


Introduction
Large-scale sociotechnical systems (STS) usually have a hierarchical structure that includes personnel and various technical devices, which, in turn, consume material, financial, and information resources, as well as energy. As a result, they produce new resources (objects), which are delivered to other similar systems. The main features of such STS are large dimensionality and high volatility of their structures, equipment, consumed and produced objects and, in general, of their operational logic and dynamics [1][2][3][4][5].
Knowledge and data representation models used in STS provide comparatively easy and convenient management of very large knowledge bases and databases with dynamic structures and content [6][7][8][9][10]. These models are based on objects other than the matrices, vectors, and graphs traditionally used in operations research and systems analysis [11][12][13][14], and they are much more convenient for practical problems. On the other hand, in the general case these models lack the strict theoretical background and fundamental algorithmics of, for example, mathematical programming, which provides strictly optimal solutions for decision-makers. As a result, practically all decision-support software for such STS is based on various heuristics whose correctness and adequacy are usually not proved in the mathematical sense. Consequently, the quality of decisions based on such heuristics may in many cases be far from optimal.
This chapter is dedicated to an introductory survey of the developed knowledge representation model, which brings together classical operations research and modern knowledge engineering. This convergence creates new opportunities for formalizing and solving complicated problems by integrating the best features of mathematical programming (a strict search for optimal solutions in a solution space defined by goal functions and boundary conditions) and constraint programming [15][16][17] (a natural and easily updated top-down representation of decision-making logic in various situations). The kernel of the considered model is multisets (MS), an object of classical mathematics that has been known for a long time and has been intensively applied during the last 20 years [18][19][20][21][22][23][24][25][26][27][28][29]. This background is generalized to recursively generated, or, for short, recursive multisets (RMS) by introducing the so-called multiset grammars, or, again for short, multigrammars (MGs), which were described by the author in [30,31]. Multigrammars, in turn, are a peculiar "projection" of the conceptual basis of Chomsky's classical formal grammars [32,33], which operate on strings of symbols, onto the universe of multisets, in such a way that MGs provide generation of one multiset from another and selection (filtration) of those multisets that satisfy the necessary integral conditions: boundary restrictions and/or optimality criteria.
MGs may be considered a Prolog-like constraint programming language for solving problems in operations research and systems analysis. Taking into account the relative novelty of the multigrammatical approach and the lack of familiar associations for the mathematical constructions presented below, we introduce the main content of the chapter with a short informal description of the main elements of this approach in Section 2. Basic formal definitions are presented in Section 3. Section 4 is dedicated to multiset grammars, while Section 5 is dedicated to a detailed consideration of the so-called unitary multigrammars (UMGs) and unitary multimetagrammars (UMMGs), which are the main tools for the aforementioned problem formalization and solution.

Informal description
Let us consider a company that consists of a director, three departments, and one separate laboratory. This fact may be represented simply as follows:
$$\text{company} \to 1\cdot\text{director},\ 3\cdot\text{department},\ 1\cdot\text{laboratory}.$$
In this notation, the whole structure of the company, detailed down to employee positions, may be described by a set of such constructions, one for each compound unit (company, department, and laboratory). This set of constructions describes the following set, created by multiplying and summing the quantities of identical positions:
$$\{n_1\cdot a_1, \ldots, n_k\cdot a_k\},$$
where $n_i\cdot a_i$ means that there are $n_i$ positions of type $a_i$ in this company. Let us join to the company structure the knowledge about the employees' monthly salaries, represented in the same unified manner:
$$
\begin{aligned}
&\text{director} \to 10000\cdot\text{eur}, \quad \text{head-department} \to 5000\cdot\text{eur}, \quad \text{head-laboratory} \to 3000\cdot\text{eur},\\
&\text{analyst} \to 1500\cdot\text{eur}, \quad \text{assistant} \to 500\cdot\text{eur}.
\end{aligned}
$$
After applying the same multiplying-summing procedure to the joined set of constructions, we obtain the resulting multiset $\{100000\cdot\text{eur}\}$ with a single element, which defines the company's total monthly financial resource necessary for paying its employees.
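The multiplying-summing procedure can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: constructions are stored as a dictionary from a head object to its body, and the hypothetical function `expand` repeatedly replaces $n$ copies of a head by $n$ times its body. Only the `company` construction and the salaries come from the text; the `department` and `laboratory` bodies are hypothetical, which is why the computed total (77,500 eur) differs from the 100,000 eur above.

```python
from collections import Counter

# Detailing constructions: head object -> body {object: multiplicity}.
# "company" follows the text; "department" and "laboratory" are hypothetical.
RULES = {
    "company": {"director": 1, "department": 3, "laboratory": 1},
    "department": {"head-department": 1, "laboratory": 2},
    "laboratory": {"head-laboratory": 1, "analyst": 2, "assistant": 3},
}

# Monthly salary constructions from the text.
SALARY = {
    "director": {"eur": 10000},
    "head-department": {"eur": 5000},
    "head-laboratory": {"eur": 3000},
    "analyst": {"eur": 1500},
    "assistant": {"eur": 500},
}


def expand(multiset, rules):
    """Multiply and sum: replace every n*a whose a has a construction by n times its body.

    Assumes the constructions are acyclic; otherwise the loop would not terminate.
    """
    result = Counter(multiset)
    while True:
        heads = [a for a in result if a in rules]
        if not heads:
            return result
        a = heads[0]
        n = result.pop(a)
        for b, k in rules[a].items():
            result[b] += n * k


positions = expand(Counter({"company": 1}), RULES)
print(positions)
# Joining both sets of constructions yields the total monthly salary.
print(expand(Counter({"company": 1}), {**RULES, **SALARY}))  # -> Counter({'eur': 77500})
```

Because the expansion is linear, the order in which heads are replaced does not affect the result.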
The presented knowledge representation concerns systems analysis, that is, obtaining the integral parameters of a system given its structure and local parameters.
Consider a more sophisticated task, related to systems design and concerning the development of a company structure given its integral parameters. The goal is to determine a rational number of departments, the number of laboratories in one department, and the numbers of analysts and assistants in one laboratory. The total salary is no more than 120,000 eur; the number of analysts in one laboratory may range from 1 to 3, while the number of assistants may range from 2 to 6. The total number of employees must be maximal. There may be three different variants of company structure: (1) three departments and one laboratory; (2) two departments and three laboratories; and (3) four departments. The corresponding set of constructions is as follows, where $m$ is the number of laboratories in one department, while $n$ and $l$ are the numbers of analysts and assistants in one laboratory:
$$
\begin{aligned}
&\text{company} \to 1\cdot\text{director},\ 3\cdot\text{department},\ 1\cdot\text{laboratory}, && (6)\\
&\text{company} \to 1\cdot\text{director},\ 2\cdot\text{department},\ 3\cdot\text{laboratory}, && (7)\\
&\text{company} \to 1\cdot\text{director},\ 4\cdot\text{department}, && (8)\\
&\text{department} \to 1\cdot\text{head-department},\ m\cdot\text{laboratory}, && (9)\\
&\text{laboratory} \to 1\cdot\text{head-laboratory},\ n\cdot\text{analyst},\ l\cdot\text{assistant}. && (10)
\end{aligned}
$$
The constructions defining the employees' salaries, and the other aforementioned restrictions, are as follows (for definiteness, let us assume that the number of laboratories in one department does not exceed five):
$$
\begin{aligned}
&\text{director} \to 1\cdot\text{employee},\ 10000\cdot\text{eur}, && (11)\\
&\text{head-department} \to 1\cdot\text{employee},\ 5000\cdot\text{eur}, && (12)\\
&\text{head-laboratory} \to 1\cdot\text{employee},\ 3000\cdot\text{eur}, && (13)\\
&\text{analyst} \to 1\cdot\text{employee},\ 1500\cdot\text{eur}, && (14)\\
&\text{assistant} \to 1\cdot\text{employee},\ 500\cdot\text{eur}, && (15)\\
&\text{eur} \le 120000, && (16)\\
&\text{employee} = \max, && (17)\\
&1 \le m \le 5, && (18)\\
&1 \le n \le 3, && (19)\\
&2 \le l \le 6. && (20)
\end{aligned}
$$
As seen, along with the already introduced "detailing" constructions, there are additional constructions that define the sets of values of the variables occurring in the detailing ones, as well as conditions that determine an optimization criterion (there may be several such criteria) and bounds on the quantities of some objects in the resulting sets. Evidently, due to the alternatives in the description of the company structure (there are three of them) and the variables in some of the "detailing" constructions, there may be more than one resulting set like Eq. (4). These sets have the form $\{x\cdot\text{employee}, y\cdot\text{eur}\}$, where $x$ is the number of employees, while $y$ is the total salary corresponding to this variant. Conditions (16)-(20) select those sets that satisfy them in the described sense. In general, Eqs. (16)-(20) may be interpreted as a query determining a subset of all possible variants described by Eqs. (6)-(15).
To "mark" the "detailing" constructions used while creating a resulting set, one can add elements like $1\cdot\text{variant-}i$ to their "bodies", for example:
$$\text{company} \to 1\cdot\text{director},\ 3\cdot\text{department},\ 1\cdot\text{laboratory},\ 1\cdot\text{variant-1}.$$
If so, then the resulting sets will have the form $\{x\cdot\text{employee}, y\cdot\text{eur}, 1\cdot\text{variant-}i\}$. To include the values of variables in these sets, it is sufficient to represent them in the resulting sets in the "usual" form $j\cdot v$, where $v$ is a variable and $j$ is its value, so the considered example leads to sets like $\{x\cdot\text{employee}, y\cdot\text{eur}, 1\cdot\text{variant-}i, j_m\cdot m, j_n\cdot n, j_l\cdot l\}$. As seen, the knowledge and query representation language briefly introduced by this example, being easy to understand and to use, allows formalization of multicriteria optimization problems traditionally associated with mathematical programming. On the other hand, the "detailing" constructions have the form of productions (rules), which are widely used in knowledge engineering and form the common background of Prolog-like declarative (nonprocedural) knowledge representation [34][35][36]. As will be shown below, such constructions may be used not only for structuring but in many other cases, enabling the description of the behavior and interaction of various systems, as well as their mutual impacts. For these reasons, this informally described technique is taken as the basis for the mathematical toolkit considered thoroughly in the following sections.
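A brute-force sketch of the design query is given below. It assumes the reading of the task given in the text: three company variants, a department body $1\cdot$head-department, $m\cdot$laboratory, a laboratory body $1\cdot$head-laboratory, $n\cdot$analyst, $l\cdot$assistant, the listed salaries (each position also yielding $1\cdot$employee), and ranges $1 \le m \le 5$, $1 \le n \le 3$, $2 \le l \le 6$. It enumerates every variant and value tuple, expands each into a terminal multiset with the variant marker and variable values attached, then applies $\text{eur} \le 120000$ and $\text{employee} = \max$. It only illustrates the semantics and is not an efficient algorithm; the names `COMPANY_VARIANTS`, `RULES`, `DOMAINS`, `expand`, and `solve` are hypothetical.

```python
from collections import Counter
from itertools import product

# Company variants (1)-(3); the "variant-i" objects mark which one was used.
COMPANY_VARIANTS = [
    {"director": 1, "department": 3, "laboratory": 1, "variant-1": 1},
    {"director": 1, "department": 2, "laboratory": 3, "variant-2": 1},
    {"director": 1, "department": 4, "variant-3": 1},
]
# Remaining constructions; string multiplicities are the variables m, n, l.
RULES = {
    "department": {"head-department": 1, "laboratory": "m"},
    "laboratory": {"head-laboratory": 1, "analyst": "n", "assistant": "l"},
    "director": {"employee": 1, "eur": 10000},
    "head-department": {"employee": 1, "eur": 5000},
    "head-laboratory": {"employee": 1, "eur": 3000},
    "analyst": {"employee": 1, "eur": 1500},
    "assistant": {"employee": 1, "eur": 500},
}
DOMAINS = {"m": range(1, 6), "n": range(1, 4), "l": range(2, 7)}


def expand(body, values):
    """Substitute variable values and multiply-and-sum down to terminal objects."""
    pending, result = Counter(body), Counter()
    while pending:
        a, n = pending.popitem()
        if a in RULES:
            for b, k in RULES[a].items():
                pending[b] += n * (values[k] if isinstance(k, str) else k)
        else:
            result[a] += n
    return result


def solve(max_eur=120000):
    """Enumerate all variants and value tuples, then apply eur <= max_eur and employee = max."""
    candidates = []
    for body in COMPANY_VARIANTS:
        for m, n, l in product(*DOMAINS.values()):
            values = {"m": m, "n": n, "l": l}
            tms = expand(body, values)
            tms.update(values)  # record variable values in the "usual" form j*v
            candidates.append(tms)
    feasible = [v for v in candidates if v["eur"] <= max_eur]
    best = max(v["employee"] for v in feasible)
    return [v for v in feasible if v["employee"] == best]


for tms in solve():
    print(dict(tms))
```

Under these assumptions the sketch selects exactly one multiset: variant 2 with $m = 5$, $n = 1$, $l = 6$, that is, 107 employees and a total salary of 117,500 eur.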

Basic operations on multisets
Classical set theory is based on the concept of a set as an unordered collection of elements that differ from one another. The theory of multisets admits equal ("indistinguishable") elements:
$$v = \{\underbrace{a_1, \ldots, a_1}_{n_1\ \text{times}}, \ldots, \underbrace{a_m, \ldots, a_m}_{n_m\ \text{times}}\}. \qquad (26)$$
Expression (26) is written as
$$v = \{n_1\cdot a_1, \ldots, n_m\cdot a_m\}, \qquad (27)$$
where $v$ is called a multiset, $a_1, \ldots, a_m$ are objects, $n_1, \ldots, n_m$ are the multiplicities of these objects, and $n_1\cdot a_1, \ldots, n_m\cdot a_m$ are multiobjects. Following Eq. (27), one may consider $v$ as a set of multiobjects; also, from a substantive point of view, the set $\{a_1, \ldots, a_m\}$ and the multiset $\{1\cdot a_1, \ldots, 1\cdot a_m\}$ are equivalent. The empty multiset, like the empty set, is designated as $\{\emptyset\}$. The multiplicity of an object may be zero, which is equivalent to the absence of this object in the multiset:
$$\{0\cdot a\} = \{\emptyset\}.$$
The fact that object $a$ or multiobject $n\cdot a$ belongs to ("enters") multiset $v$ is designated by the same symbol $\in$: $a \in v$, $n\cdot a \in v$. There are five main operations on multisets used below: join, intersection, addition, subtraction, and multiplication by a constant [26,27].
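Consider two multisets:
$$v = \{n_1\cdot a_1, \ldots, n_m\cdot a_m\}, \qquad v' = \{n'_1\cdot a'_1, \ldots, n'_{m'}\cdot a'_{m'}\}.$$
Let $\beta(v)$ denote the set of objects entering $v$, that is, $\beta(v) = \{a_1, \ldots, a_m\}$, and, in the definitions of join, intersection, addition, and subtraction below, let $n$ and $n'$ denote the multiplicities of object $a$ in $v$ and $v'$, respectively ($n\cdot a \in v$, $n'\cdot a \in v'$). The result of their join (written $v \boldsymbol{\cup} v'$) is the multiset
$$v \boldsymbol{\cup} v' = \bigcup_{a \in \beta(v) \cap \beta(v')} \{\max(n, n')\cdot a\} \;\cup \bigcup_{a \in \beta(v) - \beta(v')} \{n\cdot a\} \;\cup \bigcup_{a \in \beta(v') - \beta(v)} \{n'\cdot a\},$$
where $\cup$, $\cap$, and $-$ designate the set-theoretic join, intersection, and subtraction of two sets, respectively, while $\bigcup$ designates the set-theoretic join of the sets determined by the conditions written beneath it.
The result of the intersection of multisets $v$ and $v'$ (written $v \boldsymbol{\cap} v'$) is the multiset
$$v \boldsymbol{\cap} v' = \bigcup_{a \in \beta(v) \cap \beta(v')} \{\min(n, n')\cdot a\}.$$
The result of the addition of multisets $v$ and $v'$ (written with a bold $\mathbf{+}$) is the multiset
$$v \mathbf{+} v' = \bigcup_{a \in \beta(v) \cap \beta(v')} \{(n + n')\cdot a\} \;\cup \bigcup_{a \in \beta(v) - \beta(v')} \{n\cdot a\} \;\cup \bigcup_{a \in \beta(v') - \beta(v)} \{n'\cdot a\}.$$
The result of subtracting multiset $v'$ from multiset $v$ (written with a bold $\mathbf{-}$) is the multiset
$$v \mathbf{-} v' = \bigcup_{\substack{a \in \beta(v) \cap \beta(v')\\ n > n'}} \{(n - n')\cdot a\} \;\cup \bigcup_{a \in \beta(v) - \beta(v')} \{n\cdot a\}.$$
Finally, the result of multiplying multiset $v$ by an integer $n$ (written $v * n$) is the multiset
$$v * n = \{(n \times n_1)\cdot a_1, \ldots, (n \times n_m)\cdot a_m\}$$
(here, the usual multiplication of integers is written as $\times$). There are also two basic relations on multisets: inclusion ($\subseteq$) and strict inclusion ($\subset$).
Multiset $v$ is included in multiset $v'$, that is, $v \subseteq v'$, if
$$\forall\, n\cdot a \in v \ \ \exists\, n'\cdot a \in v': \ n \le n',$$
and multiset $v$ is strictly included in multiset $v'$, that is, $v \subset v'$, if $v \subseteq v'$ and $v \ne v'$. All the operations defined above are known from widespread sources (e.g., the aforementioned [26,27]). At the same time, the filtering operations defined below operate on sets of multisets (SMS), creating subsets of these sets by selecting the multisets that satisfy some conditions, which are the operands of these operations.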
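Python's `collections.Counter` already implements multiset semantics for most of these operations: `|` takes the maximum of multiplicities, `&` the minimum, `+` adds them, and `-` subtracts them, dropping non-positive results. The sketch below wraps them under the names used in this section; `times`, `included`, and `strictly_included` are written by hand, and all function names are hypothetical.

```python
from collections import Counter


def join(v, w):
    """Multiset join: maximum of multiplicities."""
    return v | w


def intersection(v, w):
    """Multiset intersection: minimum of multiplicities."""
    return v & w


def add(v, w):
    """Multiset addition: sum of multiplicities."""
    return v + w


def subtract(v, w):
    """Multiset subtraction: differences, with non-positive results dropped."""
    return v - w


def times(v, n):
    """Multiplication of a multiset by a nonnegative integer n."""
    return Counter({a: n * k for a, k in v.items() if n * k > 0})


def included(v, w):
    """v is included in w if every multiplicity in v is at most the one in w."""
    return all(w[a] >= k for a, k in v.items() if k > 0)


def strictly_included(v, w):
    """Strict inclusion: inclusion and inequality (zero entries ignored)."""
    return included(v, w) and +v != +w


v = Counter({"a": 3, "b": 1})
w = Counter({"a": 1, "c": 2})
print(join(v, w), intersection(v, w), add(v, w), subtract(v, w), times(v, 2))
```

The unary `+` in `strictly_included` normalizes away zero entries, so that, for example, a `Counter` holding `{"a": 0}` compares equal to an empty one, matching the convention $\{0\cdot a\} = \{\emptyset\}$.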
There are two types of conditions: boundary and optimizing. A boundary condition may be elementary or concatenated (for short, "chain"). An elementary boundary condition (EBC) may have one of the following forms:
$$n \rho a, \qquad a \rho n, \qquad a \rho a',$$
where $a$ and $a'$ are objects, $n$ is an integer, and $\rho \in \{<, =, \le\}$. A chain boundary condition (CBC) is constructed from elementary ones by writing them sequentially:
$$e_1 \rho_1 e_2 \rho_2 \ldots e_m \rho_m e_{m+1}, \qquad (39)$$
where $e_1, \ldots, e_{m+1}$ are objects or nonnegative integers, while $\rho_1, \ldots, \rho_m$ are relation symbols ($<$, $=$, $\le$).
The semantics of an EBC is as follows. Let $V$ be a set of multisets, and $v \in V$. Multiset $v$ satisfies EBC $n \rho a$ if $n'\cdot a \in v$ and $n \rho n'$ is true. Similarly, $v$ satisfies EBC $a \rho n$ if $n'\cdot a \in v$ and $n' \rho n$ is true. Finally, $v$ satisfies EBC $a \rho a'$ if $n\cdot a \in v$, $n'\cdot a' \in v$, and $n \rho n'$ is true. There is one addition to all these definitions, concerning the particular case when object $a$ ($a'$) does not enter $v$, which is treated as equivalent to $n = 0$ ($n' = 0$). The semantics of a CBC is defined as follows. CBC (39) is replaced by the EBC sequence
$$e_1 \rho_1 e_2,\ e_2 \rho_2 e_3,\ \ldots,\ e_i \rho_i e_{i+1},\ \ldots,\ e_m \rho_m e_{m+1}, \qquad (40)$$
and $v \in V$ is considered to satisfy CBC (39) if it satisfies all the EBCs in Eq. (40).
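A minimal sketch of EBC/CBC checking follows. It assumes that a chain is stored as a flat list alternating terms and relation symbols, that `int` terms denote themselves and `str` terms denote objects, and that the relation set is $\{<, =, \le\}$, spelled `"<"`, `"="`, `"<="`. An absent object has multiplicity 0, as the addition to the definitions requires. The function names are hypothetical.

```python
import operator
from collections import Counter

# Relation symbols assumed to be <, = and <= (i.e., rho in {<, =, <=}).
RELATIONS = {"<": operator.lt, "=": operator.eq, "<=": operator.le}


def term_value(term, v):
    """An int term denotes itself; an object name denotes its multiplicity in v (0 if absent)."""
    return term if isinstance(term, int) else v[term]


def satisfies(v, chain):
    """Check chain e1 rho1 e2 ... rho_m e_{m+1} as the EBC sequence e_i rho_i e_{i+1}."""
    terms, rels = chain[0::2], chain[1::2]
    return all(
        RELATIONS[rho](term_value(x, v), term_value(y, v))
        for x, rho, y in zip(terms, rels, terms[1:])
    )


v = Counter({"director": 1, "analyst": 3, "assistant": 2, "employee": 6})
print(satisfies(v, [1, "<=", "director", "<=", "assistant", "<=", 3]))
```

Splitting the chain into consecutive pairs is exactly the replacement of a CBC by its EBC sequence; the multiset above fails the chain $\text{analyst} = \text{assistant} < 5$ because its first link, $3 = 2$, is false.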
The result of applying boundary condition $b$ to SMS $V$ is written $V{\downarrow}b$ and consists of all multisets $v \in V$ that satisfy $b$.
… $3\cdot\text{employee}\}$, and the boundary conditions are $2 \le \text{analyst} \le 4$, $\text{assistant} < \text{employee}$, $1 \le \text{director} \le \text{assistant} \le 3$, and $\text{analyst} = \text{assistant} < 5$. Table 1 contains the results of applying the listed boundary conditions to $V$. ∎
Table 1. Results of application of boundary conditions.
An optimizing condition has the form $a = opt$, where $a$ is an object and $opt \in \{\min, \max\}$. Its semantics is as follows. Multiset $v \in V$ satisfies condition $a = \min$ if, for every $v' \in V$ such that $v \ne v'$, the multiplicity $n$ in multiobject $n\cdot a \in v$ is not greater than the multiplicity $n'$ in multiobject $n'\cdot a \in v'$, that is, $n \le n'$. Similarly, $v \in V$ satisfies condition $a = \max$ if, for every $v' \in V$ such that $v \ne v'$, the multiplicity $n$ in multiobject $n\cdot a \in v$ is not less than the multiplicity $n'$ in multiobject $n'\cdot a \in v'$, that is, $n \ge n'$.
A filter is the join of the boundary subfilter $F_{\le}$ and the optimizing subfilter $F_{opt}$:
$$F = F_{\le} \cup F_{opt},$$
where $F_{\le}$ is a set of boundary conditions and $F_{opt}$ is a set of optimizing conditions. The result of filtering the set of multisets $V$ by filter $F$ is denoted as $V{\downarrow}F$ and is defined by the expression
$$V{\downarrow}F = \bigcap_{o \in F_{opt}} V'{\downarrow}o, \qquad V' = \bigcap_{i=1}^{k} V{\downarrow}c_i,$$
where $c_1, \ldots, c_k$ are the EBCs making up $F_{\le}$. As seen, set $V$ is first filtered by the boundary conditions, so that the multisets satisfying all of them are selected, and the intermediate result $V'$ is then filtered by the optimizing conditions, so that the multisets satisfying all of them are included in the final result.
Filtration is performed as follows: … Due to the commutativity of the set-theoretic join and intersection operations, filtration inside each subfilter may be executed in arbitrary order.
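The two-stage filtration $V{\downarrow}F$ can be sketched as below. Boundary conditions are passed as arbitrary predicates (for instance, chain checks), and optimizing conditions as `(object, "min" | "max")` pairs. Each optimizing condition is evaluated against the whole intermediate set $V'$, and the results are intersected rather than applied one after another. The function name and the sample data are hypothetical.

```python
from collections import Counter


def apply_filter(V, boundary, optimizing):
    """Compute V filtered by F: keep multisets meeting all boundary predicates (V'),
    then intersect the results of each optimizing condition evaluated over V'."""
    V1 = [v for v in V if all(cond(v) for cond in boundary)]
    selected = list(V1)
    for obj, opt in optimizing:
        if not V1:
            break
        # Extreme multiplicity of obj over the whole intermediate set V'.
        target = (min if opt == "min" else max)(v[obj] for v in V1)
        selected = [v for v in selected if v[obj] == target]
    return selected


V = [
    Counter({"employee": 5, "eur": 900}),
    Counter({"employee": 7, "eur": 1200}),
    Counter({"employee": 7, "eur": 1000}),
    Counter({"employee": 9, "eur": 2000}),
]
budget = [lambda v: v["eur"] <= 1500]
print(apply_filter(V, budget, [("employee", "max")]))
print(apply_filter(V, budget, [("employee", "max"), ("eur", "min")]))  # -> []
```

With the second optimizing condition the result is empty: no multiset in $V'$ has both the maximal number of employees and the minimal salary. Applying the two conditions sequentially would instead return the cheaper seven-employee multiset, which is a different operation.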

Multiset grammars
As mentioned above, multiset grammars are a tool for generating multisets from other multisets or, equivalently, for generating sets of multisets.
By analogy with classical grammars operating on strings of symbols [32,33], we define a multigrammar as a pair
$$S = \langle v_0, R\rangle,$$
where $v_0$ is a multiset called the kernel, while $R$, called the scheme, is a finite set of so-called rules, which are used to generate new multisets from already generated ones. A rule has the form
$$v \to v', \qquad (48)$$
where $v$ (the left part of the rule) and $v'$ (the right part of the rule) are multisets, and $v \ne \{\emptyset\}$. The semantics of a rule is as follows. Let $\bar v$ be a multiset; we say that rule (48) is applicable to $\bar v$ if $v \subseteq \bar v$, and the result of its application is the multiset
$$\bar v' = \bar v \mathbf{-} v \mathbf{+} v'.$$
Speaking informally, if $\bar v$ includes $v$, then the latter is replaced by $v'$. The application of rule $r \in R$ to multiset $\bar v$ is denoted as $\bar v \Rightarrow_r \bar v'$, and any sequence $\bar v \Rightarrow_r \ldots \Rightarrow_{r'} \bar v'$ is called a generation chain.
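The set of multisets defined by MG $S = \langle v_0, R\rangle$ is denoted as $V_S$. The iterative representation of MG semantics, that is, of the generation of SMS $V_S$ by MG $S$, is the following:
$$V_{(0)} = \{v_0\}, \qquad V_{(i+1)} = V_{(i)} \cup \bigcup_{\bar v \in V_{(i)}} \bigcup_{r \in R} \pi(\bar v, r),$$
where
$$\pi(\bar v, v \to v') = \begin{cases} \{\bar v \mathbf{-} v \mathbf{+} v'\}, & v \subseteq \bar v,\\ \{\emptyset\}, & \text{otherwise}. \end{cases} \qquad (53)$$
As seen, function (53) implements the application of rule $v \to v'$ to multiset $\bar v$ as described above. As a result of the $(i+1)$-th step of generation, a new SMS is formed by applying all rules $r \in R$ to all multisets $\bar v \in V_{(i)}$, and it is joined to SMS $V_{(i)}$. If multiset $\bar v'$ is generated from multiset $\bar v$ by some sequence of such steps, this is denoted as $\bar v \Rightarrow^{*} \bar v'$.
$V_S$ is the fixed point of the described process, that is, $V_S = V_{(\infty)}$; if $V_S$ is finite, then $V_S = V_{(i)}$ for any $i$ such that $V_{(i+1)} = V_{(i)}$. In the introduced notation, $V_S$ includes the subset $\overline{V}_S \subseteq V_S$ of so-called terminal multisets (TMS) $\bar v \in \overline{V}_S$, such that $\pi(\bar v, r) = \{\emptyset\}$ for all $r \in R$, that is, no multiset may be generated from a terminal multiset. Set $\overline{V}_S$ is called final; the final set consists of terminal multisets only.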
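The fixed-point iteration can be sketched directly. Multisets are stored as `frozenset`s of `(object, multiplicity)` pairs so that they can be members of a Python `set`; the hypothetical `apply_rule` plays the role of function $\pi$, returning an empty set when the rule is not applicable. The `max_steps` guard is an assumption, not part of the definition, and stops the loop when $V_S$ might be infinite.

```python
from collections import Counter


def freeze(c):
    """Hashable form of a multiset: a frozenset of (object, multiplicity) pairs."""
    return frozenset((a, n) for a, n in c.items() if n > 0)


def apply_rule(v, rule):
    """pi(v, left -> right): {v - left + right} if left is included in v, else empty."""
    left, right = rule
    cv = Counter(dict(v))
    if all(cv[a] >= n for a, n in left.items()):
        return {freeze(cv - left + right)}
    return set()


def generate(kernel, rules, max_steps=1000):
    """Iterate V(i+1) = V(i) joined with all rule applications until a fixed point."""
    V = {freeze(kernel)}
    for _ in range(max_steps):
        new = set(V)
        for v in V:
            for r in rules:
                new |= apply_rule(v, r)
        if new == V:
            terminal = {v for v in V if not any(apply_rule(v, r) for r in rules)}
            return V, terminal
        V = new
    raise RuntimeError("no fixed point within max_steps; V_S may be infinite")


rules = [
    (Counter({"a": 1}), Counter({"c": 2})),
    (Counter({"b": 1, "c": 1}), Counter({"d": 1})),
]
V_S, terminal = generate(Counter({"a": 1, "b": 1}), rules)
print(len(V_S), terminal)
```

For the kernel $\{1\cdot a, 1\cdot b\}$, the first (context-free) rule produces $\{1\cdot b, 2\cdot c\}$, the second (context-sensitive) rule then produces $\{1\cdot c, 1\cdot d\}$, and the latter is the only terminal multiset.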
By analogy with classical string-generating grammars, multigrammars may be context-sensitive or context-free (CF). In CF multigrammars, the left part of every rule has the form $\{1\cdot a\}$, while in context-sensitive ones there are no limitations on either part of a rule, except that the left part must be a nonempty multiset.

Unitary multiset grammars and metagrammars
The starting point for unitary multigrammars (UMGs), developed on this basis, is a simplified representation of CF rules: instead of
$$\{1\cdot a\} \to \{n_1\cdot a_1, \ldots, n_m\cdot a_m\},$$
they are written as
$$a \to n_1\cdot a_1, \ldots, n_m\cdot a_m. \qquad (56)$$
Construction (56) is called a unitary rule (UR), object $a$ is its head, and the unordered sequence (list) $n_1\cdot a_1, \ldots, n_m\cdot a_m$ is its body.
Let us consider the formal definition of UMG and an illustrating example. A unitary multigrammar is a pair $S = \langle a_0, R\rangle$, where $a_0$ is the so-called title object, and $R$, as in multigrammars, is a scheme, that is, a set of unitary rules (56).
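The iterative representation of UMG semantics, i.e., of the generation of SMS $V_S$, where $S = \langle a_0, R\rangle$, is the following:
$$V_{(0)} = \{\{1\cdot a_0\}\}, \qquad V_{(i+1)} = V_{(i)} \cup \bigcup_{\bar v \in V_{(i)}} \bigcup_{r \in R} \pi(\bar v, r),$$
$$V_S = V_{(\infty)}, \qquad (59)$$
$$\overline{V}_S = \{\bar v \mid \bar v \in V_S,\ \beta(\bar v) \subseteq \overline{A}_S\}, \qquad (60)$$
where
$$\pi(\bar v, \langle a \to n_1\cdot a_1, \ldots, n_m\cdot a_m\rangle) = \begin{cases} \{(\bar v \mathbf{-} \{n\cdot a\}) \mathbf{+} \{n_1\cdot a_1, \ldots, n_m\cdot a_m\} * n\}, & n\cdot a \in \bar v,\ n > 0,\\ \{\emptyset\}, & \text{otherwise}. \end{cases} \qquad (61)$$
Here, $\overline{A}_S$ is the set of so-called terminal objects: $a \in \overline{A}_S$ if and only if $R$ does not include URs whose head is $a$ (i.e., $a$ occurs only in UR bodies). $\overline{A}_S$ is a subset of the set $A_S$ of all objects occurring in scheme $R$ of UMG $S$, and $\beta(\bar v)$ is the set of objects entering $\bar v$. A multiset generated by UMG $S$ all of whose objects are terminal is also called a terminal multiset (as seen, this notion of TMS does not contradict the one defined above for MGs). In Eq. (61), UR $a \to n_1\cdot a_1, \ldots, n_m\cdot a_m$ is written in angle brackets for unambiguity.
As seen, Eq. (59) defines $V_S$, the set of all multisets generated by UMG $S$, while Eq. (60), by condition $\beta(\bar v) \subseteq \overline{A}_S$, selects $\overline{V}_S$, the set of terminal multisets (STMS), from $V_S$.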
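A sketch of UMG generation differs from the general multigrammar case only in how a rule is applied. It assumes that, as in the multiplying-summing procedure of Section 2, a unitary rule with head $a$ replaces all $n$ copies of $a$ at once by $n$ times its body; terminal multisets are those containing no object that heads some rule. The function names and the sample scheme are hypothetical.

```python
from collections import Counter


def freeze(c):
    """Hashable form of a multiset: a frozenset of (object, multiplicity) pairs."""
    return frozenset((a, n) for a, n in c.items() if n > 0)


def apply_ur(v, rule):
    """Apply UR head -> body: replace all n copies of head by n times the body."""
    head, body = rule
    cv = Counter(dict(v))
    n = cv.pop(head, 0)
    if n == 0:
        return set()
    for b, k in body.items():
        cv[b] += n * k
    return {freeze(cv)}


def generate_umg(title, rules, max_steps=1000):
    """Return (V_S, terminal multisets) for the UMG <title, rules>."""
    heads = {head for head, _ in rules}
    V = {freeze(Counter({title: 1}))}
    for _ in range(max_steps):
        new = set(V)
        for v in V:
            for r in rules:
                new |= apply_ur(v, r)
        if new == V:
            # Terminal multisets contain only objects that head no rule.
            terminal = {v for v in V if all(a not in heads for a, _ in v)}
            return V, terminal
        V = new
    raise RuntimeError("no fixed point within max_steps; V_S may be infinite")


# Hypothetical scheme with two alternative rules for "department".
rules = [
    ("company", {"director": 1, "department": 2}),
    ("department", {"head": 1, "lab": 3}),
    ("department", {"head": 1, "lab": 2}),
]
V_S, terminal = generate_umg("company", rules)
print(len(V_S), len(terminal))  # -> 4 2
```

Because both copies of `department` are replaced together, the alternatives yield two terminal multisets (6 or 4 labs) and never a mixture with 5 labs.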
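Let us now give a strict definition of the notion of unitary multimetagrammar. UMMG $S = \langle a_0, R, F\rangle$ defines the set of terminal multisets $\overline{V}_S$ as follows: … and, finally, if $r$ is a unitary metarule $a \to \mu_1\cdot a_1, \ldots, \mu_m\cdot a_m$, then $r \bullet \langle n_1, \ldots, n_l\rangle$ is a unitary rule, where
$$r \bullet \langle n_1, \ldots, n_l\rangle = a \to (\mu_1 \bullet \langle n_1, \ldots, n_l\rangle)\cdot a_1, \ldots, (\mu_m \bullet \langle n_1, \ldots, n_l\rangle)\cdot a_m, \qquad (75)$$
$$\mu \bullet \langle n_1, \ldots, n_l\rangle = \begin{cases} n_j, & \mu = \gamma_j,\ 1 \le j \le l,\\ \mu, & \mu \in N. \end{cases} \qquad (76)$$
As seen, according to Eqs. (75) and (76), all multiplicities-variables of unitary metarule $a \to \mu_1\cdot a_1, \ldots, \mu_m\cdot a_m$ are replaced by their corresponding values from the tuple $\langle n_1, \ldots, n_l\rangle$, while all multiplicities-constants (elements of the set $N$ of positive integers) remain unchanged. Evidently, if all $\mu_1, \ldots, \mu_m$ are constants, that is, if the unitary metarule is a UR, it remains unchanged.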
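The substitution $r \bullet \langle n_1, \ldots, n_l\rangle$ amounts to one lookup per multiplicity. In the sketch below, a metarule body is a list of `(multiplicity, object)` pairs where a multiplicity is either an `int` constant or a `str` variable name, and the value tuple is passed as a dictionary keyed by variable name (an assumption made for readability). The function names are hypothetical.

```python
def instantiate(metarule, values):
    """r • <n_1, ..., n_l>: replace variable multiplicities (str) by their values,
    keep constant multiplicities (int) unchanged."""
    head, body = metarule
    return head, [(values[mu] if isinstance(mu, str) else mu, obj) for mu, obj in body]


def instantiate_scheme(scheme, values):
    """Apply one value assignment to every metarule of the scheme."""
    return [instantiate(r, values) for r in scheme]


scheme = [
    ("department", [(1, "head-department"), ("m", "laboratory")]),
    ("laboratory", [(1, "head-laboratory"), ("n", "analyst"), ("l", "assistant")]),
]
print(instantiate_scheme(scheme, {"m": 3, "n": 2, "l": 5}))
```

Since a single dictionary is applied to every metarule, all occurrences of a variable in the scheme receive the same value, and a metarule with only constant multiplicities comes back unchanged.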
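Let us note that the scope of multiplicities-variables is the whole UMMG scheme, that is, if there are $n > 1$ occurrences of one and the same variable $\gamma$ in different unitary metarules (and, of course, in one and the same unitary metarule), they are all substituted by the same value from the applied sequence $\langle n_1, \ldots, n_l\rangle$.
Example 7. Let us consider UMMG $S = \langle \text{company}, R, F\rangle$, where scheme $R$ contains the following three unitary metarules: … and filter $F$ includes the following conditions, the first being boundary, the second optimizing, and the last two variable declarations: …
To include the values of variables in the result, let us add the unitary metarule
$$a'_0 \to 1\cdot a_0,\ \gamma_1\cdot\boldsymbol{\gamma}_1, \ldots, \gamma_l\cdot\boldsymbol{\gamma}_l, \qquad (78)$$
where $a'_0$ is a new title object and $\boldsymbol{\gamma}_1, \ldots, \boldsymbol{\gamma}_l$ are new objects corresponding to variables $\gamma_1, \ldots, \gamma_l$, to scheme $R$, thus creating scheme $R'$, which contains Eq. (78) and all elements of $R$; let us also substitute all optimizing conditions of the form $\gamma = opt$ in filter $F$ by $\boldsymbol{\gamma} = opt$, thus converting them to the "canonical" form (65). Remember that $\boldsymbol{\gamma}$ is an object, not a variable, and, moreover, a terminal object, because there is no UR or UMR with head $\boldsymbol{\gamma}$ in $R$. The obtained filter will be denoted as $F'$.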
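As seen now, UMMG $S' = \langle a'_0, R', F'\rangle$ generates terminal multisets of the form
$$\{n_{i_1}\cdot a_{i_1}, \ldots, n_{i_k}\cdot a_{i_k}, n_1\cdot\boldsymbol{\gamma}_1, \ldots, n_l\cdot\boldsymbol{\gamma}_l\}, \qquad (79)$$
where $n_1, \ldots, n_l$ are the values of variables $\gamma_1, \ldots, \gamma_l$, and TMS (79) is selected to $\overline{V}_{S'}$ if and only if TMS
$$\{n_{i_1}\cdot a_{i_1}, \ldots, n_{i_k}\cdot a_{i_k}\} \qquad (80)$$
satisfies all conditions entering $F$ and concerning terminal objects $a_{i_1}, \ldots, a_{i_k}$, and TMS $\{n_1\cdot\boldsymbol{\gamma}_1, \ldots, n_l\cdot\boldsymbol{\gamma}_l\}$ satisfies all optimizing conditions of the form $\boldsymbol{\gamma}_i = opt \in F'$ corresponding to $\gamma_i = opt \in F$.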
It is not difficult to define $\overline{V}_S$ by subtracting the multisets of the form $\{n_1\cdot\boldsymbol{\gamma}_1, \ldots, n_l\cdot\boldsymbol{\gamma}_l\}$ from all $v' \in \overline{V}_{S'}$, but from the practical point of view, it is more useful to consider $\overline{V}_{S'}$ rather than $\overline{V}_S$ as the result of applying unitary multimetagrammar $S$: all $v' \in \overline{V}_{S'}$ contain the values $n_1, \ldots, n_l$ of variables $\gamma_1, \ldots, \gamma_l$ as the multiplicities of terminal objects $\boldsymbol{\gamma}_1, \ldots, \boldsymbol{\gamma}_l$, and computing these values is often the main purpose of such an application.
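The construction of $S'$ from $S$ can be sketched as a small transformation on data. The sketch assumes that metarules are stored as a head plus a list of `(multiplicity, object)` pairs with variables as `str` names, that optimizing conditions are `(name, "min" | "max")` pairs, and that the new terminal object for a variable `g` is named `value-of-g`; that naming scheme and the function name `augment` are hypothetical. For brevity, only the optimizing part of the filter is transformed; boundary conditions and variable declarations are left as they are.

```python
def augment(title, scheme, optimizing, variables, new_title):
    """Build <a0', R', F'> from <a0, R, F>.

    Adds the metarule new_title -> 1*title, g*value-of-g for each variable g,
    and retargets optimizing conditions on variables to the new terminal objects.
    """
    # Hypothetical naming of the terminal object that carries a variable's value.
    obj = {g: f"value-of-{g}" for g in variables}
    extra = (new_title, [(1, title)] + [(g, obj[g]) for g in variables])
    new_optimizing = [(obj.get(name, name), opt) for name, opt in optimizing]
    return new_title, [extra] + list(scheme), new_optimizing


title2, scheme2, opt2 = augment(
    "company",
    [("laboratory", [(1, "head-laboratory"), ("n", "analyst"), ("l", "assistant")])],
    [("employee", "max"), ("n", "min")],
    ["m", "n", "l"],
    "company'",
)
print(scheme2[0], opt2)
```

Once the variables are substituted, the added metarule contributes exactly the multiobjects $n_i\cdot\boldsymbol{\gamma}_i$ to every terminal multiset, which is how the variable values become visible in the result.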
Example 8. As may be seen, the problem described in Section 2 is to obtain the number $m$ of laboratories in one department, as well as the numbers $n$ and $l$ of analysts and assistants, respectively, in one laboratory. Although Eqs. (18)-(20) do not contain optimizing conditions of the form $\gamma = opt$, generating a TMS like
$$\{100\cdot\text{employee}, 115000\cdot\text{eur}, 3\cdot\mathbf{m}, 2\cdot\mathbf{n}, 5\cdot\mathbf{l}\} \qquad (81)$$
is much more useful than generating a TMS like $\{100\cdot\text{employee}, 115000\cdot\text{eur}\}$, because TMS (81) is more informative (here, the objects corresponding to variables $m$, $n$, $l$ are denoted by bold $\mathbf{m}$, $\mathbf{n}$, $\mathbf{l}$). ∎ So we shall use $\overline{V}_{S'}$ as the result of applying unitary multimetagrammar $S = \langle a_0, R, F\rangle$, even if $F$ does not include variable-containing optimizing conditions.
To finish with the syntax and semantics of UMGs/UMMGs, let us note that the class of unitary multigrammars is a strict subclass of filtering unitary multiset grammars (UMGs ⊂ FUMGs): every UMG is a FUMG with an empty filter. On the other hand, FUMGs are a strict subclass of unitary multiset metagrammars (FUMGs ⊂ UMMGs): every FUMG is a UMMG without variable multiplicities and the corresponding variable declarations inside the filter.
UMG/UMMG algorithmics and applications are considered in a separate chapter of this book.