Program Slicing Based on Monadic Semantics

Syntax:
p ::= program ide is b
b ::= d begin c end
d ::= const ide = l.e | var ide : t | d1; d2
c ::= ide := l.e | c1; c2 | skip | read ide | write l.e | while l.e do c endwhile | if l.e then c1 else c2 endif


Introduction
A program slice consists of those statements of a program that may directly or indirectly affect the variables computed at a given program point (Weiser, 1984). The pair of a program point p and a variable set V, denoted by <p, V>, is called a slicing criterion. Program slicing has applications in software testing and debugging, measurement, re-engineering, program comprehension and so on (Kamkar, 1995; Tip, 1995; Harman, 2001; Binkley, 1996; Gallagher, 1991).
Program slicing algorithms can be roughly classified into static and dynamic slicing methods, according to whether they use only statically available information or compute those statements that influence the value of a variable occurrence for a specific program input. Most of the existing slicing algorithms rely on relation graphs such as system dependence graphs (SDG) or program dependence graphs (PDG). These slicing methods are incremental and sequential; they are neither combinatorial nor easily parallelizable for multi-core systems. However, modern programming languages support modularized programming, and programs might consist of a set of modules. Program analysis should reflect this design technology, and its methods (including program slicing) should be flexible, combinable, and parallelizable in order to improve efficiency.
As the behavior of a program is determined by the semantics of the language, it is reasonable to expect an approach for program slicing based on formal semantics of a program. On the basis of this view, this paper proposes an approach for program slicing based on modular monadic semantics, called modular monadic slicing. It can compute slices directly on abstract syntax, without explicit construction of intermediate structures such as dependence graphs.
Program slicing methods that focus on the semantics of programs can be found in (Hausler, 1989; Ouarbya, 2002; Venkatesh, 1991). These methods are based on the standard denotational semantics of a programming language. As mentioned in (Moggi, 1991; Liang & Hudak, 1996; Wansbrough, 1997; Mosses, 1998; Zhang & Xu, 2004), traditional denotational semantics lacks modularity and reusability. A better solution is to use monads (Moggi, 1991) to structure denotational semantics, with the help of monad transformers (Moggi, 1991; Wadler, 1992; Espinosa, 1995), which can transform a given monad into a new one with new operations. S. Liang et al. used monads and monad transformers to specify the semantics of programming languages, and called the result modular monadic semantics. In this paper, we employ it in our program slicing algorithm.
In our previous work (Zhang et al., 2004, 2005, 2006; Zhang & Xu, 2005; Wu et al., 2006), we abstracted the computation of program slicing as a simple slice monad transformer, which took only the label set L as its parameter, without explicitly reflecting the change of the slice table Slices. In (Zhang, 2007), we presented the theoretical foundation for our previous work. Based on this theory of monadic slicing, and from the view of practical implementation, this paper redesigns the static slice monad transformer. The extensibility and reusability of our monadic method will be shown by easily introducing a new program feature (such as pointers) into the slicing analysis.
The rest of the paper is organized as follows. In Section 2, we briefly introduce and illustrate the concepts of modular monadic semantics through a simple example language. The computation of program slicing is abstracted as a slice monad transformer in Section 3. In Section 4, we discuss and illustrate our static slicing algorithm in detail. In Section 5, we show how our slicing algorithm can be readily adapted to an extension of the example language with pointers. In Sections 6 and 7, we address the implementation and the time and space complexity analysis. We conclude this paper with directions for future work in Section 8.
Throughout the paper, the presentation follows the monadic semantics style. We use Haskell notation with some freedom in the use of mathematical symbols and declarations. For brevity and convenience, we omit the type constructors in some definitions.

Modular monadic semantics
In this section, we briefly review the theory of monads (Wadler & Thiemann, 2003; Moggi, 1989), monad transformers and modular monadic semantics (Liang, 1998). Readers familiar with these topics may skip the section, except for the last three paragraphs (about the syntax and monadic semantics of an example language).

Monads and monad transformers
Monads, a notion originally coming from philosophy, were discovered in category theory in the 1950s and introduced to the semantics community by Moggi in the 1990s (Moggi, 1989). After this work, Wadler popularized Moggi's ideas in the functional programming community, especially in Haskell (Wadler, 1995). In the monad-based view of computation, a monad is a way to structure computations in terms of values and sequences of computations using those values (Newbern, 2002). The monad determines how combined computations form a new computation and frees the programmer from having to code the combination manually each time it is required. From this view, a monad can be thought of as a strategy for combining computations into more complex computations.
In Haskell, monads are implemented as a type constructor class with two member operations/functions.
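As a minimal sketch, the two-operation class can be written as follows. The Prelude's own Monad class is hidden so that the simplified form described in the text can be declared directly (the standard class additionally has an Applicative superclass and a >> method). The Maybe instance and the helpers safeDiv and example are illustrative assumptions, not part of the original presentation.

```haskell
import Prelude hiding (Monad (..))

infixl 1 >>=

-- A type constructor m is a monad if it supports these two operations.
class Monad m where
  return :: a -> m a                  -- the unit
  (>>=)  :: m a -> (a -> m b) -> m b  -- "bind", the extension operation

-- Example instance: Maybe models computations that may fail.
instance Monad Maybe where
  return        = Just
  Nothing >>= _ = Nothing
  Just x  >>= f = f x

-- Hypothetical helper: division that fails on a zero divisor.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- Two computations combined by bind; a failure in either step propagates.
example :: Int -> Int -> Int -> Maybe Int
example a b c = safeDiv a b >>= \q -> safeDiv q c >>= \r -> return (r + 1)
```

For instance, example 20 2 5 divides twice and then adds one, while example 1 0 1 fails at the first division and the failure is passed through the remaining steps without any explicit check.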
Here, return is the Haskell name for the unit and >>= (pronounced "bind") is the extension operation of the monad. The above definition of the monad class means that a parameterized type m (which one may think of as a function from types to types) is a monad if it supports the two operations return and >>= with the types given. Using the combinator analogy, a monad m is a combinator that can be applied to different values, and m a is a combinator applied to a value of type a. The return operation puts a value into a monadic combinator. The >>= operation takes the value from a monadic combinator and passes it to a function to produce a monadic combinator containing a new value, possibly of a different type. The >>= operation is known as "bind" because it binds the value in a monadic combinator to the first argument of an operation.
To be proper monadic combinators, the return and >>= operations must work together according to some simple laws. The monad laws state in essence that the >>= operation (sequential composition) is associative, and that return is its unit/identity. Failure to satisfy these laws will result in monads that do not behave properly and may cause subtle problems when using the do-notation.
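The three laws can be spelled out and checked on sample values for the standard Maybe monad. This is only a sanity check on a few inputs, not a proof; the functions half and dec are hypothetical examples chosen for illustration.

```haskell
-- Two sample functions used to instantiate the laws (hypothetical).
half :: Int -> Maybe Int
half n = if even n then Just (n `div` 2) else Nothing

dec :: Int -> Maybe Int
dec n = if n > 0 then Just (n - 1) else Nothing

-- Left identity:  return a >>= f  ==  f a
leftIdentity :: Int -> Bool
leftIdentity a = (return a >>= half) == half a

-- Right identity:  m >>= return  ==  m
rightIdentity :: Maybe Int -> Bool
rightIdentity m = (m >>= return) == m

-- Associativity:  (m >>= f) >>= g  ==  m >>= (\x -> f x >>= g)
associativity :: Maybe Int -> Bool
associativity m = ((m >>= half) >>= dec) == (m >>= (\x -> half x >>= dec))

-- Check the three laws on a handful of sample values.
lawsHold :: Bool
lawsHold = all leftIdentity [0 .. 10]
        && all rightIdentity samples
        && all associativity samples
  where samples = Nothing : map Just [0 .. 10]
```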
A monad (call it m) therefore defines a type of computation. The nature of the computation is captured by the choice of the type m. The return operation constructs a trivial computation that just renders its argument as its result. The >>= operation combines two computations together to make more complex computations of that type.
To make the use of monads more convenient, we adopt syntactic sugar that is similar to S. Liang's notation and to the do-notation in Haskell.
In practice, computations cannot be performed in isolation, and a single computation may need the features of two different monads. In this case, we need a monad that combines the features of the two monads into a single computation. It is impossible in general to combine two monads to form a new monad. Moreover, it is inefficient and poor practice to write a new monad instance with the required characteristics each time a new combination is desired. Instead, there is a technique, called monad transformers (Liang, 1998), which can transform a given monad into a new one that has the new operations and maintains those of the former monad. The concept of monad transformers was rediscovered by D. Espinosa in Moggi's original work. He developed a system, Semantic Lego, which implemented Moggi's original monad constructors to give a modular semantics for languages.
In Haskell, a monad transformer can be defined as any type constructor t such that if m is a monad, so is "t m", by using the two-parameter constructor class MonadTrans. The member function lift lifts a monadic computation in the inner monad m into the combined monad "t m". Furthermore, we expect a monad transformer to add features without changing the nature of an existing computation. This is ensured by the properties of the lift function (also called the monad transformer laws). The monad transformer laws guarantee the basic lifting property that any program which does not use the added features should behave in the same way after a monad transformer is applied. Intuitively, these laws say that lifting a null computation brings about a null computation, and that lifting a sequence of computations is equivalent to first lifting them individually and then combining them in the lifted monad.
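Written as equations, the two monad transformer laws take the following commonly stated form, where the subscripts indicate in which monad return and bind are taken, c is a computation in m, and k is a function into m. The symbols c and k are notation introduced here for illustration.

```latex
\begin{align*}
\mathit{lift} \circ \mathit{return}_{m} &= \mathit{return}_{t\,m} \\
\mathit{lift}\,(c \mathbin{>\!\!>\!\!=}_{m} k) &= \mathit{lift}\,c \mathbin{>\!\!>\!\!=}_{t\,m} (\mathit{lift} \circ k)
\end{align*}
```

The first equation corresponds to "lifting a null computation brings about a null computation", and the second to lifting a sequence being the same as sequencing the lifted parts.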
For example, Figure 1 gives the environment monad transformer, EnvT (Liang, 1998), which can be used to add environment reading functionality to other monads. In Figure 1, the functions rdEnv and inEnv return the current environment and perform a computation in a given environment, respectively.
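Since Figure 1 is not reproduced here, the following is a minimal sketch of how such an environment transformer might look, assuming the usual function-based representation. The names rdEnv and inEnv come from the text; liftEnv, lookupVar, demo and the Env type are hypothetical names introduced for the example.

```haskell
-- EnvT r m a: a computation in m that can read an environment of type r.
newtype EnvT r m a = EnvT { runEnvT :: r -> m a }

instance Functor m => Functor (EnvT r m) where
  fmap f (EnvT g) = EnvT (fmap f . g)

instance Applicative m => Applicative (EnvT r m) where
  pure x            = EnvT (\_ -> pure x)
  EnvT f <*> EnvT g = EnvT (\r -> f r <*> g r)

instance Monad m => Monad (EnvT r m) where
  -- bind passes the inherited environment to both subcomputations
  EnvT g >>= k = EnvT (\r -> g r >>= \a -> runEnvT (k a) r)

-- Lifting embeds a computation of the inner monad; it ignores the environment.
liftEnv :: m a -> EnvT r m a
liftEnv c = EnvT (\_ -> c)

-- rdEnv returns the current environment.
rdEnv :: Monad m => EnvT r m r
rdEnv = EnvT return

-- inEnv performs a computation in a given environment.
inEnv :: r -> EnvT r m a -> EnvT r m a
inEnv r (EnvT g) = EnvT (\_ -> g r)

-- Hypothetical example over Maybe: look a name up in an association list.
type Env = [(String, Int)]

lookupVar :: String -> EnvT Env Maybe Int
lookupVar x = rdEnv >>= \env -> liftEnv (lookup x env)

demo :: Maybe Int
demo = runEnvT (inEnv [("x", 2), ("y", 40)] prog) []
  where prog = do { a <- lookupVar "x"; b <- lookupVar "y"; return (a + b) }
```

Here demo runs prog in the environment supplied by inEnv rather than in the empty outer environment, so both lookups succeed.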

Modular monadic semantics
Modular monadic semantics specifies the semantics of a programming language by mapping terms to computations, where the details of the environment, store, etc. are hidden within a monad. This differs from traditional denotational semantics, which maps a term (with an environment or a continuation) to an answer. Modular monadic semantics is composed of two parts: modular semantic building blocks and monad transformers. Semantic building blocks define the monadic semantics of individual source language features. They are independent of each other. The crucial property of modular monadic semantics is the division of the monad m into a series of monad transformers, each representing a computation. As mentioned in the previous section, monad transformers provide the power to represent the abstract notion of programming language features, but still allow us to access low-level semantic details. The concept of lifting allows us to consider the interactions between various features. In some sense, monad transformers can be designed once and for all (Liang, 1998), since they are entirely independent of the language being described. From this view, we can treat the computation of program slicing as an entity that is independent of the language being analyzed. This will be discussed in the next section. Before doing so, we illustrate the modular monadic semantic description of a very simple imperative programming language W.
The W language considered in this paper is very similar to the language described in (Slonneger & Kurtz, 1995). The abstract syntax and semantic building blocks of the W language are provided in Figures 2 and 3, respectively. In Figure 3, the identifier Fix denotes a fixpoint operator; xtdEnv and lkpEnv are the updating and lookup operators of environments Env, respectively; updSto and alloc are the updating and allocation functions of stores Loc, respectively; rdEnv and inEnv are the basic operators of the environment monad EnvMonad (given in Figure 1); putValue and getValue are the writing and reading functions of I/O actions, respectively. Following Venkatesh's assumption for expressions in (Venkatesh, 1990), we also assume that the labeled expressions have no side effects. The expressions, whose syntax is left unspecified for the sake of generality, consist of operations over identifiers and are uniquely labeled. The label is for the entire expression.
In modular monadic semantics, the monad definition is simply a composition of the corresponding monad transformers, applied to a base monad. In this paper, we use the input/output monad IO as the base monad. We then select some monad transformers, say StateT and EnvT, and apply them to the base monad IO, forming the combined monad ComptM. The environment monad transformer adds an environment to the given monad. The return function ignores the environment, while >>= passes the inherited environment to both subcomputations. Equipped with the monad transformers, the resulting monad ComptM can support all of the semantic building blocks in Figure 3, which gives the formal semantic description we expected.
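The layering can be sketched with the transformers library, using ReaderT as a stand-in for EnvT and StateT for the store, both applied to IO. This is an illustration under stated assumptions, not the definition from the paper: the order of the layers, the concrete Env and Store types, and the helpers assign, writeVar and runCompt are all hypothetical, and right-hand sides are taken as already evaluated.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Reader (ReaderT, ask, local, runReaderT)
import Control.Monad.Trans.State (StateT, get, modify, runStateT)

type Loc   = Int
type Env   = [(String, Loc)]   -- identifiers to locations
type Store = [(Loc, Int)]      -- locations to values

-- The combined monad: an environment layer and a state layer over IO.
type ComptM = ReaderT Env (StateT Store IO)

rdEnv :: ComptM Env
rdEnv = ask

inEnv :: Env -> ComptM a -> ComptM a
inEnv r = local (const r)

-- updSto writes a value to a location (lifted through the environment layer).
updSto :: Loc -> Int -> ComptM ()
updSto l v = lift (modify (\s -> (l, v) : filter ((/= l) . fst) s))

-- Semantics of "ide := v" for an already evaluated right-hand side.
assign :: String -> Int -> ComptM ()
assign ide v = do
  env <- rdEnv
  case lookup ide env of
    Just l  -> updSto l v
    Nothing -> liftIO (ioError (userError ("unbound identifier: " ++ ide)))

-- Semantics of "write ide": fetch the value and print it with the base monad.
writeVar :: String -> ComptM ()
writeVar ide = do
  env <- rdEnv
  sto <- lift get
  liftIO (print (lookup ide env >>= \l -> lookup l sto))

-- Run a computation with an initial environment and store.
runCompt :: Env -> Store -> ComptM a -> IO (a, Store)
runCompt env sto c = runStateT (runReaderT c env) sto
```

Each building block only uses the operations of the layer it needs, which is what allows further layers to be added without rewriting the existing blocks.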

Static slice monad transformer
As mentioned above, each monad transformer represents a single notion of computation.
Domains (from the semantic description of W): r: Env (Environments); loc: Loc (Stores); s: State (States); v: Value (Values).
Since static program slicing can be viewed as a computation, we can abstract it as a language-independent notion of computation by using a static slice monad transformer SliceT. Its definition is given in Figure 4, where l denotes the set of labels of expressions that are required to compute the current statement, and st denotes a slice table Slices, which records the current slice (a set of labels) of each variable. In a similar way as in (Zhang, 2005, 2007), the following theorems are straightforward. These theorems guarantee the correctness of the definition of the slice monad transformer.
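Because Figure 4 and the Slices definition are not reproduced here, the following sketch shows one plausible shape for them. It is an assumption rather than the original definition: Slices is modeled as an association list from variables to label sets, and SliceT reads the label set l like an environment while threading the table st like a state. The operator names lkpSli, updSli, mrgSli, getSli and setSli follow the text (getSli and setSli by analogy with the PT operators of Section 5); rdLabel is a hypothetical name.

```haskell
import Data.List (sort, union)

type Var    = String
type Labels = [Int]             -- a set of expression labels
type Slices = [(Var, Labels)]   -- slice table: one label set per variable

-- lkpSli looks up the slice of a variable (empty if none is recorded).
lkpSli :: Var -> Slices -> Labels
lkpSli x = maybe [] id . lookup x

-- updSli records a new slice for a variable, replacing any old one.
updSli :: (Var, Labels) -> Slices -> Slices
updSli (x, l) st = (x, l) : filter ((/= x) . fst) st

-- mrgSli merges two tables by taking the union of the slices per variable.
mrgSli :: Slices -> Slices -> Slices
mrgSli a b = [ (x, sort (lkpSli x a `union` lkpSli x b)) | x <- vars ]
  where vars = map fst a `union` map fst b

-- Assumed shape of the transformer: reads l, threads st through m.
newtype SliceT m a = SliceT { runSliceT :: Labels -> Slices -> m (a, Slices) }

instance Monad m => Functor (SliceT m) where
  fmap f (SliceT g) = SliceT (\l st -> g l st >>= \(a, st') -> return (f a, st'))

instance Monad m => Applicative (SliceT m) where
  pure a = SliceT (\_ st -> return (a, st))
  SliceT f <*> SliceT g =
    SliceT (\l st -> f l st >>= \(h, st1) -> g l st1 >>= \(a, st2) -> return (h a, st2))

instance Monad m => Monad (SliceT m) where
  SliceT g >>= k = SliceT (\l st -> g l st >>= \(a, st') -> runSliceT (k a) l st')

-- getSli and setSli return and set the current slice table.
getSli :: Monad m => SliceT m Slices
getSli = SliceT (\_ st -> return (st, st))

setSli :: Monad m => Slices -> SliceT m ()
setSli st = SliceT (\_ _ -> return ((), st))

-- rdLabel returns the current label set (hypothetical operator name).
rdLabel :: Monad m => SliceT m Labels
rdLabel = SliceT (\l st -> return (l, st))
```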

A monadic static slicing algorithm
The static slice for a variable in a program is the collection of all possible computations of values of that variable. In this section, we only consider end slicing for a single variable, i.e. the slicing criterion is <p, v>, where v is the variable of interest and p is the end program point. One can easily generalize this to a set of points and a set of variables at each point by taking the union of the individual slices (Binkley, 1996).
The main idea of the monadic static slicing algorithm can be briefly stated as follows. To obtain a static slice, we first apply the slice transformer SliceT to the semantic building blocks of the program being analyzed, so that the resulting semantic description includes the program slicing feature. According to this semantic description, we then compute the static slices of each statement in sequence. Finally, we obtain the static slices of all single variables in the program. In fact, as the analysis of a program proceeds, the Slices table, which includes the current individual program slices for all variables of this program, is modified step by step according to the monadic slicing algorithm.
Concretely, with respect to the example language W mentioned in Section 2, Figure 5 gives the main part of our monadic static slicing algorithm, that is, the rules of when and how to modify the current slice table. It adds the computation of static slicing into program analysis modularly, with the help of the monad transformer SliceT given in Section 3. SliceT can be composed with other transformers such as EnvT and StateT, and applied to the monad IO, forming the underlying monad ComptM. In addition, the updSli operation of the Slices type is applied to record the result of the static slicing in program analysis. In the case of the language W, the operator updSli needs to be added only when describing the semantics of the assignment statement and of the initial assignment within a variable declaration, as shown in Figure 5.
A static slice includes the statements that possibly affect the variable in the slicing criterion. Therefore, for capturing these possible statements, in Figure 5 we ought to add the operator mrgSli into semantic descriptions of conditional statement and loop statement.
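The way the slice table evolves can be illustrated with a small, non-monadic sketch written in the spirit of the rules described above. It is not a reproduction of Figure 5. The assumptions are: expressions are reduced to a label and the list of variables they reference; an assignment records the expression label, the labels of the enclosing conditions, and the slices of the referenced variables; a conditional merges both branches; a loop is re-analyzed until the table stabilizes. The Read rule, the example program and the names need, unions, slice and sliceOf are hypothetical.

```haskell
import Data.List (sort, union)

type Var    = String
type Label  = Int
type Labels = [Label]
type Slices = [(Var, Labels)]

-- A labeled expression l.e, reduced to its label and referenced variables.
data LExp = LExp Label [Var]

data Cmd = Assign Var LExp | Seq Cmd Cmd | Skip
         | Read Label Var            -- assumption: read carries its own label
         | Write LExp | While LExp Cmd | If LExp Cmd Cmd

unions :: [Labels] -> Labels
unions = sort . foldr union []

lkpSli :: Var -> Slices -> Labels
lkpSli x = maybe [] id . lookup x

-- Tables are kept sorted so that they can be compared for the fixpoint test.
updSli :: (Var, Labels) -> Slices -> Slices
updSli (x, l) st = sort ((x, l) : filter ((/= x) . fst) st)

mrgSli :: Slices -> Slices -> Slices
mrgSli a b = [ (x, unions [lkpSli x a, lkpSli x b]) | x <- vars ]
  where vars = sort (map fst a `union` map fst b)

-- Labels needed by l.e under the context labels ls and the table st.
need :: Labels -> Slices -> LExp -> Labels
need ls st (LExp l refs) = unions ([l] : ls : [lkpSli r st | r <- refs])

slice :: Labels -> Cmd -> Slices -> Slices
slice ls (Assign x e) st = updSli (x, need ls st e) st
slice ls (Seq c1 c2)  st = slice ls c2 (slice ls c1 st)
slice _  Skip         st = st
slice ls (Read l x)   st = updSli (x, unions [[l], ls]) st
slice _  (Write _)    st = st
slice ls (If e c1 c2) st = mrgSli (slice ls' c1 st) (slice ls' c2 st)
  where ls' = need ls st e
slice ls (While e c)  st
  | st' == st = st                        -- the slice table has stabilized
  | otherwise = slice ls (While e c) st'
  where st' = mrgSli st (slice (need ls st e) c st)

-- Hypothetical program: 1: read n; 2: i := 1; 3: sum := 0;
-- 4: while i <= n do 5: sum := sum + i; 6: i := i + 1 endwhile; 7: write sum
example :: Cmd
example = foldr1 Seq
  [ Read 1 "n", Assign "i" (LExp 2 []), Assign "sum" (LExp 3 [])
  , While (LExp 4 ["i", "n"])
      (Seq (Assign "sum" (LExp 5 ["sum", "i"])) (Assign "i" (LExp 6 ["i"])))
  , Write (LExp 7 ["sum"]) ]

sliceOf :: Var -> Labels
sliceOf x = lkpSli x (slice [] example [])
```

On this hypothetical program, sliceOf "i" excludes labels 3 and 5, since the updates of sum never influence i, whereas sliceOf "sum" picks up label 6 only on the second pass over the loop, which is why the loop rule iterates.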
After the last statement of a program is analyzed, we can obtain, from the resulting Slices table, the static slice of each single variable (say var) of the program, which is the set of labels of all expressions that influence the variable var.
To get the final result of static slicing, i.e., a syntactically valid subprogram, we follow Venkatesh (Venkatesh, 1990) and define Syn(s, L) for the language W in Figure 6, where s is the W program being analyzed. It guides us in constructing a syntactically valid subprogram of s from the set L. It only gives a strategy for returning the final subprogram from a given set L, so it can be changed to cater to different needs. For example, if one does not consider variable declarations as part of slices, then one might change the corresponding term in Figure 6 as follows: "var ide : t" : "skip".
For the correctness proofs of our monadic static slicing algorithms, we refer to their termination theorem and their consistency with PDG-based slicing algorithms, given in (Zhang, 2007).
To say more about the algorithm, we now illustrate how to use the rules in Figure 5 to compute the static slice w.r.t. <8, sum> of an example W program in Figure 7. Each of its expressions is uniquely labeled by the label (marked in the source program) of the place where the expression occurs. So the fourth expression is "i := 1".
According to the rule for assignment statements in Figure 5, after the third expression (i.e. "sum := 0") is analyzed, its intermediate set L (whose initial value is ∅) is changed to L′ according to the current slice table T, which includes the static slices of the variables "i" and "sum", written briefly as L(i) and L(sum), respectively. Since this expression is an assignment, the related data in Slices needs to be updated through updSli.

Extending W language with pointers
The modular monadic approach mentioned previously is flexible enough that we can easily introduce a new program feature to analysis. In this section, we will illustrate this power by considering an extension of the language W with pointers. We shall show how to adapt the implementation to this extension, with a small change in our existing monadic slice algorithm.
The introduction of pointers leads to aliasing problems (Horwitz, 1989; Hind, 1999), i.e. multiple variables may access the same memory location, so we need pointer analysis to obtain the corresponding data dependence information. In order to represent unbounded data structures in a finite way in the presence of pointers, we consider all heap space allocated in the same procedure as forming a single array, and deal with this array as a whole. The extended algorithm combines point-to analysis by data-flow iteration with forward monadic slicing. Guided by this idea, the key issue to be addressed is the treatment of assignments. The other statements (such as conditional statements, loop statements, etc.) can be handled by adding point-to computation to the existing slicing methods. Before going on, we introduce a data structure for point-to analysis.
Similar to the Slices datatype in Section 3, we design an abstract datatype PT for point-to analysis. The point-to table, PT, is a table of pairs of a single variable and its associated point-to set (a set of variables). It has five operators, getPT, setPT, lkpPT, updPT and mrgPT, which respectively return the current table of point-to sets, set it, look up the point-to set corresponding to a variable in a given table, update the point-to sets corresponding to a list of variables in a given table, and merge two tables of point-to sets into one.
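A minimal sketch of the pure part of this datatype is given below, assuming an association-list representation. The argument order and exact types of lkpPT, updPT and mrgPT are assumptions made for the example; getPT and setPT would be state-like operations of the underlying monad and are omitted.

```haskell
import Data.List (sort, union)

type Var = String
type PT  = [(Var, [Var])]   -- each variable paired with its point-to set

-- lkpPT looks up the point-to set of a variable (empty if none is recorded).
lkpPT :: Var -> PT -> [Var]
lkpPT x = maybe [] id . lookup x

-- updPT gives every variable in xs the point-to set s, replacing old entries.
updPT :: [Var] -> [Var] -> PT -> PT
updPT xs s pt = [(x, s) | x <- xs] ++ filter ((`notElem` xs) . fst) pt

-- mrgPT merges two tables by taking the union of the sets per variable.
mrgPT :: PT -> PT -> PT
mrgPT a b = [ (x, sort (lkpPT x a `union` lkpPT x b)) | x <- vars ]
  where vars = sort (map fst a `union` map fst b)
```

Merging is what a conditional or loop would use to combine the point-to information of alternative paths, in the same way that mrgSli combines slices.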
With pointers, we sometimes need to update the slices or the point-to sets of several variables at the same time, so we introduce the extended operator xtdSli for the Slices datatype and the operator xtdPT for the PT datatype.
Now we can study in depth the assignment statements with pointers. For simplicity, we only consider single dereference of a pointer (e.g. *x), since a multiple dereference (e.g. **x) can be divided into several single dereferences. We decompose an assignment into a left-value expression (such as x or *x) and a right-value expression (also denoted l.e, but possibly containing *y or &y). So we need to extend the definition of Refs(l.e) in Section 4. The variables appearing in a right-value expression can be divided into three categories: reference variables, dereference variables and address variables. So we have

$$\mathrm{Refs}(l.e) = \{x \mid x \text{ is a reference variable}\} \cup \{y \mid y \text{ is a dereference variable}\} \cup \bigcup \{\mathrm{lkpPT}(y, \mathrm{getPT}) \mid y \text{ is a dereference variable}\}$$
The detailed algorithm for Refs(l.e) is shown in Figure 8, where the function PtInfo(l.e) obtains the point-to information generated by l.e.
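Since Figure 8 is not reproduced here, the following sketch only illustrates the idea under stated assumptions. The expression type Expr is hypothetical (the paper leaves expression syntax unspecified), refs follows the three categories of variables described above, and ptInfo uses the standard point-to rules for &y, x and *y as a stand-in for PtInfo.

```haskell
import Data.List (sort, union)

type Var = String
type PT  = [(Var, [Var])]

-- Hypothetical right-value expression syntax.
data Expr = Num Int
          | Ref Var        -- x   (reference variable)
          | Deref Var      -- *y  (dereference variable)
          | AddrOf Var     -- &y  (address variable)
          | Add Expr Expr

lkpPT :: Var -> PT -> [Var]
lkpPT x = maybe [] id . lookup x

-- Variables whose slices the expression depends on.
refs :: PT -> Expr -> [Var]
refs _  (Num _)    = []
refs _  (Ref x)    = [x]
refs pt (Deref y)  = sort ([y] `union` lkpPT y pt)  -- y and everything y may point to
refs _  (AddrOf _) = []                             -- taking an address reads no value
refs pt (Add a b)  = sort (refs pt a `union` refs pt b)

-- Point-to information generated by the expression (assumed standard rules).
ptInfo :: PT -> Expr -> [Var]
ptInfo _  (AddrOf y) = [y]
ptInfo pt (Ref x)    = lkpPT x pt
ptInfo pt (Deref y)  = sort (foldr union [] [lkpPT z pt | z <- lkpPT y pt])
ptInfo _  _          = []
```

For example, with p pointing to a and b, the expression x + *p depends on a, b, p and x, while &y depends on no variable but generates the point-to set containing y.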
The algorithm in Figure 8 addresses the reference and point-to information of a right-value expression, and hence facilitates the extension of the existing slicing algorithm in Figure 5. The final extension for the static slices of a W program with pointers is shown in Figure 9, which is obtained by adding the bold and blue terms to Figure 5.
By introducing point-to analysis into our previous monadic slicing, we have presented (in Figure 9) an approach to monadic slicing for programs with pointers. This approach obtains the point-to information through data-flow iteration. Unlike traditional methods, where point-to information and slicing are analyzed in two different phases, our method computes them in the same phase, by combining forward monadic slicing with data-flow iteration. Instead of recording point-to information for every statement, we only need to record the information for the statements currently being analyzed. So our method saves space without losing precision. In addition, our approach preserves the properties of compositionality and language flexibility of the original monadic slicing method.

Implementation and complexity analysis
In this section, we describe the implementation of the monadic slicing algorithms and analyze their complexity.
Because of the use of monad transformers, modular denotational semantics achieves a high level of modularity and extensibility. Despite this, it is still executable: there is a clear operational interpretation of the semantics (Wadler, 1995). In (Liang, 1995, 1998; Wadler, 1995), some modular compilers/interpreters using monad transformers were constructed. On the basis of these works, our monadic approach to static program slicing is feasible. Based on Labra's language prototyping system LPS (Labra et al., 2001), we developed a simple monadic slicing prototype, MSlicer (for more, see (Zhang, 2007) or its website: https://sourceforge.net/projects/lps). Its implementation language is Haskell, which is a purely functional language with lazy evaluation. Laziness allows Haskell to deal with infinite data, because Haskell only loads data as it is needed, and because the garbage collector discards data after it has been processed. Using higher-order functions from the libraries, Haskell modules can be written to concisely describe each language feature (Peterson et al., 1997; Thompson, 1996). Features such as arbitrary precision integer arithmetic, list comprehensions and infinite lists all come in handy for the effective monadic slicing of a large program.
Figure 10 gives the framework of the monadic slicer MSlicer. Figure 11 gives the static slice results for the example program in Figure 7 from our current monadic slicer. The final results also include the output value, analysis trace, static slice table and CPU time.
In practice, in order to obtain good performance, we chose the Haskell type IntSet instead of the original set type of Labels (i.e. [Int]) in Section 3. The implementation of IntSet is based on big-endian Patricia trees (Okasaki & Gill, 1998; Morrison, 1968). This data structure performs especially well on binary operations like union and intersection. Many operations have a worst-case complexity of O(min(n, W)), where n is the number of elements and W is the number of bits in an Int (32 or 64). This means that the operation can become linear in the number of elements, with a maximum of W. The measures of system size used below are those associated with the data structure of the program slice table Slices (which is a hash table).
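As a small illustration of this choice, label sets can be represented with Data.IntSet from the containers library, so that extending the intermediate label set reduces to a few set unions. The function extend is a hypothetical helper written for this example and is not taken from MSlicer.

```haskell
import qualified Data.IntSet as IntSet

type Labels = IntSet.IntSet

-- New label set: the expression label l, the current set ls, and the
-- slices of all variables referenced by the expression.
extend :: Int -> Labels -> [Labels] -> Labels
extend l ls refSlices = IntSet.insert l (IntSet.unions (ls : refSlices))

-- Example: label 5 under context {1,2,4}, referencing slices {3} and {2}.
demo :: [Int]
demo = IntSet.toList
         (extend 5 (IntSet.fromList [1, 2, 4])
                   [IntSet.fromList [3], IntSet.fromList [2]])
```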
In a modular compiler/interpreter, our slice monad transformer can be modularly and safely combined with the semantic building blocks, so the complexity analysis is restricted to L and Syn(s, L) of a concrete programming language. In the case of our example language W, the analysis concerns the intermediate label set. Since we finally obtain the static slices of all variables after the last statement is analyzed, the program slice of each variable costs O(m × n) on average. In fact, n = O(m^2) at worst; see the loop statements shown in Figure 12. So computing all of the static slices costs O(m^3) time in the worst case.
To analyze the space complexity of the algorithms, we pay attention to the constructions Refs(l.e), Slices, L and L′. We need space O(v × v) and O(v × m) to save Refs(l.e) and Slices, respectively. According to the definition of the slice monad transformer SliceT in Figure 4, we need more intermediate labels when SliceT is applied to loop statements (e.g. while statements). So it takes space O(k × m) to save intermediate labels, where k is the maximal number of times the loop statements in the program are analyzed (until the slice stabilizes). The label set L costs space O(m). Therefore, the total space cost is the sum of these terms, O(v × v + v × m + k × m + m). By analyzing the complexity of the algorithms in Figures 8 and 9, we find that the cost of point-to analysis is less than the cost of slicing. So our extension of the algorithm to pointers adds no additional complexity.
We have tested the complexity analysis of our monadic static algorithms by using the program with l while loop statements shown in Figure 12, which is similar to the while program in ref. (Binkley & Gallagher, 1996). From the results given in Figure 13, we can see that n = O(m 2 ) at worst. This shows that the prototype monadic slicer MSlicer without optimization (such as BDD or SEQUITUR (Zhang et al, 2003)

Related work and comparisons
The original program slicing method was expressed as a sequence of data flow analysis problems (Weiser, 1984). An alternative approach relies on program dependence graphs (PDG) (Ottenstein & Ottenstein, 1984). Most of the existing slicing methods evolved from these two approaches. A few program slicing methods focus on the semantics of programs.
G. Canfora et al.'s conditioned slicing (Canfora et al., 1998) adds a condition to the slicing criterion. Statements that do not satisfy the condition are deleted from the slice. M. Harman et al.'s amorphous slicing (Harman & Danicic, 1997) allows any simplifying transformations which preserve this semantic projection. These two methods are not really based on the formal semantics of a program. P. A. Hausler et al.'s denotational slicing (Hausler, 1989; Ouarbya et al., 2002) employs the functional semantics of a programming language in the denotational (and static) program slicer. G. A. Venkatesh (Venkatesh, 1991) also considered denotational slicing, with formal slicing algorithms for both dynamic and static slicing. This approach is indeed based on the standard denotational semantics of a programming language. The language Venkatesh considered is a very simple one without pointers. We have extended it in this paper to a more realistic programming language containing pointers, but take an entirely different approach called modular monadic slicing.
Compared with the existing static slicing algorithms, the monadic static slicing algorithm has good flexibility, combinability and parallelizability properties, because it abstracts the computation of static slicing as an independent entity, the static slice monad transformer. Our algorithm allows static slices to be computed directly on abstract syntax, with no need to explicitly construct intermediate structures such as dependence graphs.
With respect to accuracy, in Section 4 and in (Zhang, 2007) we have stated that the slice results of the monadic static slicing algorithm are not less precise than PDG-based ones. According to the complexity analysis (in Section 6) for monadic slicing algorithms, their time complexity for each variable is O(m^3) on average. The intra-procedural slicing algorithms based on dataflow equations can compute a slice in O(v × n × e) time, or on average in O(n × e) time for each variable, where n is the number of vertices in the control flow graph (CFG) and e is the number of edges in the CFG (Weiser, 1984). Although the PDG-based algorithms extract slices in linear time (i.e. O(V + E), where V and E are the numbers of vertices and edges in the slice, respectively) after the PDG has been computed, a PDG can be constructed in O(n × e + n × d) time, where d is the number of definitions in the program (Tip, 1995). Here V, n, e and d are of the same order as m, so the total time of PDG-based algorithms (including the PDG construction time) is also nearly O(m^3).

Conclusions and future work
In this paper, we have proposed a new approach to program slicing. We call it modular monadic program slicing, as it is based on modular monadic semantics. We have abstracted the computation of program slicing as a language-independent object, the slice monad transformer. Therefore, modular monadic slicing has good flexibility and reusability properties compared with the existing program slicing algorithms. The modular monadic slicing algorithm allows program slices to be computed directly on abstract syntax, with no need to explicitly construct intermediate structures such as data flow graphs or dependence graphs.
As the behavior of a program is determined by the semantics of the language, it is reasonable to base program slicing on modular monadic semantics. Furthermore, it is feasible, because modular monadic semantics is executable and some modular compilers/interpreters already exist.
In future work, we will analyze slicing for programs with special features such as concurrency, object orientation, exceptions and side effects, by combining the slice monad transformer with existing ones such as those for concurrency (Papaspyrou, 2001), object orientation (Labra, 2002), non-determinism, exceptions and side effects (Moggi, 1991; Wadler, 1992, 2003). At the same time, we will improve our prototype monadic slicer and give more experimental comparisons with other slicing methods.