A Class of Parametric Tree-Based Clustering Methods A Class of Parametric Tree-Based Clustering Methods

We introduce a class of tree-based clustering methods based on a single parameter W and show how to generate the full collection of cluster sets C(W), without duplication, by varying W according to conditions identified during the algorithm’s execution. The number of clusters within C(W) for a given W is determined automatically, using a graph representation in which cluster elements are represented by nodes and their pairwise con- nections are represented by edges. We identify features of the clusters produced which lead to special procedures to accelerate the computation. Finally, we introduce a related node-based variant of the algorithm based on a parameter Y which can be used to generate clusters with complementary features, and a method that combines both variants based on a parameter Z and a weight that determines the contribution of each variant.


Introduction
Clustering methods have long been a mainstay of statistics and machine learning [1][2][3], and have experienced a surge in importance with the advent of Big Data Analytics [4,5]. A highly successful use of clustering in practical applications has been to seek out particular kinds of clustering methods that are effective in particular settings, based on the finding that different classes of problems respond best to specific classes of clustering methods. This finding motivates the work of this paper, which introduces a new class of tree-based clustering methods with an ability to modify the kinds of clusters produced by changing the value of a particular parameter. Moreover, we show all members of class can be generated without duplication by a process that adaptively determines each new parameter value from the information produced by executing the class member that precedes it.
We are motivated to use a tree-based algorithm due to their applications in genome analysis [6][7][8], image segmentation [9,10], statistics [11] and microaggregation [12]. The most common forms of the tree-based clustering methods in the literature [8,[13][14][15] begin with a minimum spanning tree and then successively delete edges according to various criteria. However, our approach has a greater level of flexibility than these commonly applied methods due to the fact that the clusters produced include those that cannot be obtained by removing edges of a minimum spanning tree.
We introduce special techniques for accelerating the execution of our basic approach by exploiting its underlying properties and then introduce a closely related clustering algorithm that replaces an "edge-based" focus with a complementary "node-based" focus. We unify these two classes of approaches by identifying a third class that marries their complementary features, and which provides additional variation by means of a weight that permits the contribution of these complementary approaches to be varied along a continuum. We conclude by demonstrating how the procedures for accelerating the first method can be expressed in a more general form to accelerate the execution of the combined procedure as well.
The ability to generate a family of clustering methods from each of the three basic clustering designs by varying a single parameter (and the weight employed by the third method) invites empirical research to determine parameter ranges that are effective for specific types of clustering applications, opening the possibility to produce clusters exhibiting features different from those customarily obtained.

Cluster problem formulation
The clustering problem in our treatment is formulated by reference to a graph G = (N, E) where N = {1, …, n} is a set of nodes (cluster elements) and E is a set of edges (pairwise connections between elements) given by E ⊂N × N = {(p,q): p,q∈N}. The notation (p,q) is understood to represent an unordered pair (hence (p,q) = (q,p), and is equivalently represented by the set notation {p,q}). Each edge e = (p,q) ∈ E has an associated cost (or length) denoted by c(e) (= c(p,q)). It is not necessary to assume that G is complete or connected. We also do not require that the costs c(e) be nonnegative.
The goal is to partition N into sets (clusters) N k , k∈ K = {1, …, ko}, where the value ko is automatically determined by the clustering process. We also identify an associated set of edges E k ⊂ {(p,q), p,q∈ N k }, where the subgraph (N k ,E k ) of G constitutes a min cost spanning tree over the nodes of N k . In contrast to those tree-based clustering approaches that begin with a min cost spanning tree over all of G and selectively delete particular edges, our algorithm produces subgraphs (N k ,E k ), k ∈ K, that may not be possible to obtain by deleting edges from such a tree.
The class of clustering methods we describe is based on specifying the value of a parameter W, whose value uniquely determines the outcome of each clustering method within the class. W is expressed as an additive threshold for selecting edges and hence nodes to be added to a current construction (collection of subgraphs), and observe that W can equally be expressed as a multiplicative threshold in the case where the costs are nonnegative and the two approaches are equivalent in this instance.
We start with any selected value W = Wo≥ 0 and after obtaining a collection of clusters C(W) for a given W we systematically modify W so that over successive iterations all possible cluster collections C(W) for W ≥ Wo will be generated without duplication. The complete range of cluster collections results by choosing Wo = 0 (or Wo = 1 in the multiplicative version).

Algorithm to generate the cluster collections C(W)
In overview, we index the edges of E in ascending cost order so that c(e(1)) ≤ c(e(2)) ≤ … ≤ c (e(|E|)), and identify the nodes of edge e(s) by writing e(s) = (p(s), q(s)). We start with each cluster N k consisting of just the node k, that is, each cluster is a degenerate single node tree given by.
The associated set E k of edges in the tree corresponding to N k is empty (E k = ∅). As the algorithm progresses, the composition of the clusters will change and the index set K of clusters will change accordingly.
In addition, we keep a cost value denoted by MinCost(k) for each k ∈ K which identifies the cost of the minimum cost edge e ∈ E k . To begin, since no cluster yet contains an edge, we define MinCost(k) = Large, a large positive number, for all k ∈ K. (We will not have to examine the set E k to identify MinCost(k) = Min(c(e): e ∈ E k ) because the structure of the algorithm will insure that MinCost(k) will equal the cost of the first edge added to E k . In general, while we describe the composition of E k and the manner in which it changes, the organization of the algorithm assures that it is unnecessary to keep track of E k since the sets N k , for k∈ K, will identify the elements in the clusters produced.) We also maintain a list L(i) for each i∈ N that names the cluster that node i belongs to. Hence, initially, L(i) = (i) since i∈ Ni = {i} for all i∈ N. The redundancy provided by this list enables updates to be performed efficiently. Subsequently, L(i) is modified as node i becomes the member of a new cluster N k . As this is done, the list K will come to have "holes" in it, i.e., will not consist of consecutive indexes. (At the end of the algorithm we can rename the clusters indexes, if desired, so that K = {1, 2, …, ko} where ko = |K|.) Finally, during the process of generating the cluster collection C(W) for the current W value, we will identify a value Wnext so that the process may then be repeated for W: = Wnext to generate a new collection of clusters. As previously noted, by starting with W = Wo = 0 (or W = Wo = 1 in the multiplicative version), and then successively identifying Wnext each time a cluster collection C(W) is generated, we can ultimately generate all possible collections C(W), without duplication. The process terminates when W becomes large enough that C(W) consists of a min cost spanning tree over each connected component of G. (A simple condition for identifying this termination point is identified below.) Building on these observations, we now state the full form of our algorithm.

Begin Outer Loop
While W < Large Initialization(B). Let i′ = p(1) and i″ = q(1) and select e(1) (= (i′, i″)), to create the first nondegenerate cluster (containing more than one node and hence more than 0 edges) by identifying k′ = L(i′) and k″ = L(i″) and absorbing N k″ into N k′ to create the cluster N k′ : = N k′ ∪ N k″ = {i′, i″} with edge set E k′ = e(1). Set MinCost(k′) = c(e(1)) and conclude by eliminating the superfluous cluster N k″ (now contained within N k′ ) by setting K: = K \ {k″}. Finally, initialize the edge index s by setting s = 1.

Begin Inner Loop
While s < sLast Set s: = s + 1 and identify edge e(s) = (p(s), q(s)). Let i′ = p(s), i″ = q(s) and let k′ = L(i′) and k″ = L(i″). There are three cases: Case (1): If k′ = k″ (i′ and i″ belong to the same cluster), then continue to the next iteration of the Inner Loop.
Case (2): If c(e(s)) > W + MinCost0, for MinCost0 = Min(MinCost(k′), MinCost(k″)), then edge e(s) is forbidden to be added to join the clusters N k′ and N k″ into a single cluster. In this case, compute Wnext = Min(Wnext, c(e(s)) -MinCost0) and continue to the next iteration of the Inner Loop.
// The node and edge sets for the collection of clusters C(W) for the current W are given.
// by N k and E k for k ∈ K. The node sets can alternatively be recovered by reference to. We employ the customary convention that a loop of the form "While x < Constant" will be bypassed if the beginning value of x does not satisfy "x < Constant" and that the execution of the loop will not be interrupted if x is changed so that x ≥ Constant within the loop (though the execution will then terminate at the loop's conclusion). Hence, for example, in the Inner Loop when s: = s + 1 results in s = sLast, the loop will continue its execution until the current iteration ends.
We now make several observations about the algorithm.
Remark 1: The multiplicative version of the C(W) Algorithm results by modifying Case (2) to replace W + MinCost0 by W•MinCost0 and to replace Wnext = Min(Wnext, c(e(s)) -MinCost0) by Wnext = Min(Wnext, c(e(s))/MinCost0). (Hence, addition is replaced by multiplication and subtraction is replaced by division.) These approaches will generate the same collection of clusters under the assumption that all c(e) > 0 for the following reason: a positive value W′ can always be found for the multiplicative case that will cause Wnext to screen out the same set of elements as any positive value W for the additive case, and vice versa. This relationship can also be extended to cover the situation where all c(e)are nonnegative.
Remark 2: The assignment W = Wnext at the end of the outer loop can be replaced by setting W:= Wnext + Δ for a chosen increment Δ to generate only a subset of the possible C(W) collections. Experimentation with a given class of cluster applications may additionally lead to identifying upper and lower bounds on W (or specific intervals for W) that prove most effective for that class. (When this terminating condition occurs in a connected graph, the method will have generated a min cost spanning tree.) Moreover, if the algorithm is repeated for W = Large, the same outcome will result.
Remark 5: When Wo = 0 (or Wo = 1 for the multiplicative case), each resulting node-disjoint subgraph (N k , E k ) in the collection C(W) consists of a tree in which the cost c(e) for all edges e ∈ E k is the same.
Remark 6: In a complete graph, the algorithm will leave at most one node isolated (with N k = {k} and E k = ∅) at the conclusion of the Inner Loop for any W. In a graph that is not complete or not connected, no node that is not isolated in G will be left isolated in the collection C(W) for W sufficiently large. (To permit additional isolated nodes, a limit c lim may be imposed that prevents C(W) from including any edges e such that c(e) > c lim .) Remark 7: When there are tied (duplicate) cost values c(e), all orderings of e(1) to e(|E|) satisfying c(e(1)) ≤ c(e(2)) ≤ … ≤ c(e(|E|)) will produce the same collection of clusters C(W) in the following sense: For a given value of W, all orderings will produce the same node sets N k defining C(W), and the sum of costs over the edge sets E k will also be the same, though the edges within these sets may differ.

Fundamental relationships for accelerating the algorithm
A number of key relationships hold for the C(W) Algorithm that make it possible to accelerate its execution. We discuss the relationships here in broad outline and then incorporate them in Section 7 within a template for a computer code that applies not only to C(W) but to additional related types of cluster collections C(Y) and C(Z) whose algorithms are described in Sections 5 and 6.

Early termination of the inner loop
The Inner Loop can typically terminate far in advance of satisfying the condition s = sLast for sLast = |E|, hence making it unnecessary to examine all edges of the graph.
First note that the process of examining the edges in ascending cost order implies that once c(e(s)) > W + MinCost(k) for a given k = ko∈ K, then the inequality c(e(v)) > W + MinCost(ko) will also hold for all subsequent edges e(v) for v > s. Hence, by Case (2) of the algorithm, no nodes or edges will be adjoined to the cluster sets N ko and E ko for v > s. In addition, it will be unnecessary to update Wnext by reference to ko in the future.
It may further be observed that the MinCost(k) values are generated in a sequence that makes it possible to readily identify (without sorting) the values k(1), k(2), …, k(m), so that Min Cost(k(1)) ≤ MinCost(k(2)) ≤ … ≤ MinCost(k(m)). It is convenient to define m so that these values refer just to those k ∈ K such that MinCost(k) < Large. (Recall that MinCost(k) = Large implies that N k consists of a single node k, and E k = ∅.) Thus if c(e(s)) > W + MinCost(k(m)), we know that none of the clusters indexed from k(1) to k(m) can take part in the creation of new clusters. Alternatively, if we start by checking whether c(e(s)) > W + MinCost(k(h)) holds for h = 1 and work forward until finding the first index k(h*) for which the inequality does not hold, then on future encounters with Case (2) it is possible to start from k(h*) rather than k(1) to begin checking whether c(e(s)) > W + MinCost(k(h)).
In consideration of these relationships, it should be kept in mind that when two clusters k′ and k″ are joined, then MinCost(k″) will no longer be referenced (since the cluster k″ will no longer exist). To see the consequences of this, suppose that k′ and k″ are interchanged, if necessary, so that MinCost(k′) ≤ MinCost(k″). Then when N k″ is absorbed into N k′ , the following two possibilities arise: i. MinCost(k′) < Large (hence MinCost(k′) identifies the cost of an edge previously added) and MinCost(k′) will be unchanged; ii. MinCost(k′) = Large, and the new MinCost(k′) will be the value c(e(s)) of the edge e(s) currently added.
This implies that in the sequence MinCost(k(1)) ≤ MinCost(k(2)) ≤ … ≤ MinCost(k(m)), the value MinCost(k″) will drop out, and the value MinCost(k′) will either be unchanged and retain its position, or else it will change from a Large value to become the new value MinCost(k(m)) at the end of the ordered list.
However, applying this knowledge to shortcut the checks performed in Case (2) does not make it possible to save appreciable computation, since the amount of effort to perform the checks of Case (2) is not great in any case. Instead, we can make use of the foregoing relationships in a simpler manner without having to keep track of the values k(1), k(2), …, k(m).
To accomplish this, we record the number of elements n k in each node set N k by initializing all n k = 1, and then setting n k′ : = n k′ + n k″ when N k″ is absorbed into N k′ in Case (3). We also record the number of times t(i) each node i is encountered as a node i′ = p(s) or i″ = q(s) by initializing t(i) = 0 for all i, and then setting t(i′): = t(i′) + 1 and t(i″): = t(i″) + 1 when the edge e(s) is examined in the prelude to Case (1)of the algorithm (and also for i′ and i″ in the Initialization). Note that t(i) is bounded by tMax(i) which is the number of nodes adjacent to i in the graph G (where tMax(i) = n -1 if G is complete).
We are interested in determining when t(i) = tMax(i) for an isolated node. We can conveniently identify the condition of being isolated by i = L(i). In conjunction with the preceding records, this makes it possible to keep track of the number nTrack of nodes that cannot take part in any further steps of adding an edge to C(W), and hence permitting the inner loop to terminate when nTrack = n.
Specifically, by initializing nTrack = 0, the first time c(e(s)) > W + MinCost(k)occurs for a given k = k′ or k″ in Case (2), we set nTrack: = nTrack + n k . (To identify this first occurrence, initialize FirstTime(k) = True, and then set FirstTime(k) = False at the point of setting nTrack: = nTrack + nk.) We also set nTrack: = nTrack + 1 whenever t(i) is incremented for i = i′ and i″ in the prelude to Cases (1) to (3) to yield t(i) = tMax(i) under the condition that i = L(i). By checking for nTrack = n at each point where nTrack changes its value, we can then terminate the inner loop when this condition occurs.
Having performed the foregoing operations to terminate early for W = Wo, we may take advantage of another useful relationship to terminate early for all W > Wo. In particular, let sEnd(W) equal the value of s for the final edge e(s) added to C(W) for a given W. Then for values W′ and W″ such that W″ > W′, we are assured that sEnd(W″) ≤ sEnd(W′). Consequently, we can exploit this fact by introducing a variable sEnd which is set to sEnd = s at the conclusion of Case (3), which will cause sEnd to be the index s of the final edge added in constructing the current C(W). Then it is only necessary to set sLast = sEnd after the termination of the Inner Loop, thus overriding the initialization sLast = |E| to permit the next execution of the Inner Loop to terminate earlier. We can also allow the final execution of the Inner Loop to terminate earlier by the fact that the spanning tree generated on this execution will have n − 1 edges, while all other constructions must have fewer than this number of edges.

Advanced starting for successive W values
We can also accelerate the computation of the algorithm by saving information to produce an advanced start on successive executions of the Inner Loop. The underlying relationships are as follows. After completing the Inner Loop for W = Wo (while saving this information), upon assigning W the value Wnext = W(v 2 ), the fact that Wnext < W(v) for v < v 2 implies that the algorithm will perform exactly the same sequence of steps until reaching s = s2(v 2 ), at which point the edge e(s) for s = s2(v 2 ), will be added to the construction (although this edge was not added on the previous execution of the inner loop).
Consequently, all edges e(s3(v)) for s3(v) < s2(v 2 ) will again be added to the current construction, and the values s2(v) for v < v 2 will also be unchanged. Hence, letting v* = Max(v: s3(v) < s2(v 2 )) we can start the current construction by simply adding the edges e(s) for s = s3(1) to s3(v*), followed by adding the edge e(s) for s = s2(v 2 ) (whose index s2(v 2 ) therefore becomes recorded as the new index s3(v* + 1)). Then the customary Inner Loop for W > Wo can be executed starting with s initialized by setting s = s2(v 2 ) instead of s = 1. Subsequent executions of the Inner Loop continue to save the same information, which is used again to create an advanced start in the manner described.
By this means, we avoid examining all edges e(s) for s < s2(v 2 ) that were not added to the previous construction. We also avoid having to re-do the checks to determine that the remaining edges qualify to be added. Together this can amount to a considerable savings in computation.
A possibility arises to save additional computation by using more memory. Each time a new candidate for Wnext is identified, in the process of identifying the indexes s2(1), s2(2), …, s2(v 2 ), we can save a current copy of the arrays N k , E k , MinCost(k) and K used by the algorithm, avoiding the burden of excessive memory by overwriting the previous copy each time a new one is made. Then the latest copy will be available at the point where the edge e(s) for s = s2(v 2 ) is added on the current execution of the loop, making it possible to recover the arrays without having to regenerate them to resume the current loop.
However, it may not be possible to take advantage of a current copy of the saved arrays on every iteration of the Inner Loop (unless previous copies are not overwritten when new ones are made). After re-starting by recovering the arrays for s2(v 2 ), if now a new Wnext value is determined for s > s2(v 2 ) (referring to the v 2 of the previous execution) then we can proceed by again making a copy of the arrays for the next execution of the loop. But if it no new value of Wnext is found for s > s2(v 2 ), then the previous value W(v 2 -1) (for s = s2(v 2 -1) will be the new final Wnext value, and no copies of the arrays remain in memory for this value.
Consequently, in this latter case we resort to the construction that does not rely on the copied arrays, generating the arrays instead in the process of adding edges. Thus, on the next execution of the inner loop we will again have the copies available. Hence in this fashion we will be able to take advantage of the copied arrays at least on every second execution of the loop, if not more frequently.
As previously noted, the foregoing relationships and their implications are embodied in a format suitable for creating a computer code in Section 7, after we first describe two additional algorithmic variants that can be exploited by analogous relationships.

Algorithm C(Y): a node-based algorithmic variant
It is possible to formulate a node-based variant of the C(W) algorithm which follows a closely related format and is supported by a similar rationale.
In the node-based approach, we replace the parameter W by a parameter Y which is linked to costs associated with nodes in N k rather than to costs associated with edges in E k . (More precisely, the costs associated with nodes are also derived from edges-i.e., the edges that meet these nodes-though these edges are different from those referenced in the C(W) algorithm.) Accompanying this parameter change, we replace the value MinCost(k) associated with the sets indexed by k ∈ K with a value MinCostB(i) associated with the nodes i∈ N, and more particularly, we replace MinCost(k′) for k′ = L(i′) by MinCostB(i′), and replace MinCost(k″) for k″ = L(i″) by MinCostB(i″)).
This replacement changes the updating rule when N k″ is absorbed into N k′ in Case (3) The reason for these changes is as follows. In the node-based version, to permit the edge e(s) = (i′, i″) (= (p(s),q(s))) to be added and hence to join the subgraphs (N k′ , E k′ ) and (N k″ , E k″ ), we require that c(e(s)) ≤ Y + MinCostB(i) for both i = i′ and i″. Hence we require c(e(s)) ≤ Y + MinCostB0, for MinCostB0 = Min(MinCostB(i′), MinCostB(i″)). On the other hand, if c(e(s)) > Y + MinCostB0, we are prevented from adding edge e(s), and by the preceding relationships this causes the first part of Case (2) to retain exactly the same form as in the C(W) algorithm.
To update MinCostB(i′) and MinCostB(i″) in Case (3), we must account for the fact that each of these two values is affected only by the cost of the edge e(s), and hence will either retain its present value or become equal to c(e(s)), according to which is smaller. (It may be noted that once node i for i = i′ or i″ has been assigned an edge cost c(e(s)), then MinCostB(i) will not change thereafter, since any edge e(s) that is added later to meet node I will have a cost no less than that of the earlier edge.) Based on these observations, we can state the form of the C(Y) algorithm as follows.

C(Y) Algorithm (Node-Based Version)
Inputs: The graph G(N, E), cost vector c(e), e ∈ E, initial Yo value for Y.

End of C(Y) Algorithm
The Remarks concerning the C(W) algorithm in Section 3 can be applied as well to the C(Y) algorithm.
Now we show how to join the C(W) and C(Y) algorithms.

Algorithm C(Z): a combination of C(W) and C(Y)
Each of the C(W) and C(Y) algorithms has features lacking in the other. However, the C(Y) algorithm has a potential deficiency, which resides in the fact that it is subject to "drift"-a phenomenon where the costs of edges in an edge set E k can grow along a chain, where each new edge added to E k has a higher cost than the previous one. Such an eventuality can arise because the cost of an edge in a chain is limited only by the cost of the previous edge.
It is possible to combat drift and also take advantage of the different features of the C(W) and C(Y) algorithms by joining these algorithms to create an algorithm C(Z) that incorporates the MinCost evaluation criteria of both C(W) and C(Y) simultaneously.
Let α be a nonnegative weight applied to the edge selection criterion of C(W) and let β = 1 -αbe a nonnegative weight applied to the edge selection criterion of C(Y). We construct Algorithm C(Z) so that it will be the same as C(W) if α = 1 and will be the same as C(Y) if α = 0 (β = 1).
For notational convenience, we refer to the value MinCost(k) of the C(W) algorithm as MinCostA(k). Then the MinCost evaluation criterion of C(Z) is given by.
To apply this criterion, we create values MinCostC(i′) and MinCostC(i″) for nodes i′ = p(s) and i″ = q(s), and for k′ = L(i′) and k″ = L(i″), given by Associated with the foregoing values, we define We state the C(Z) algorithm by reference to these definitions.

C(Z) Algorithm.
Inputs: The graph G(N, E), cost vector c(e), e ∈ E, initial Zo value for Z.

Begin Outer Loop
While Z < Large

End of C(Z) Algorithm
The next section shows how to carry out accelerated updates in the context of the preceding algorithm.

Implementing accelerated updates for the C(Z) algorithm
We build on the relationships identified in Section 4 to apply them to the more general C(Z) algorithm. By the implications described earlier, our general updates also apply directly to the C(W) and C(Y) algorithms by respectively replacing Z with W and Y (hence Znext with Wnext and Ynext) and replacing MinCostC(•) by MinCost(•) and MinCostB(•).

Early termination for the C(Z) inner loop
Early termination for the Inner Loop of C(Z) is effected by creating an Initialization(C) immediately following Initialization(B) (hence inheriting the assignments of Initialization(B)) and modifying the Inner Loop as follows. We apply these changes for Z = Zo and thereafter take advantage of setting sLast = sEnd as indicated below, in order to allow for the advanced updating of C(Z) for successive Z values.

Initialization(C) for Z = Zo:
Set n k = 1, k ∈ K \ {k′}, and n k′ = 2 (hence n k = |N k |, k ∈ K); set t(i) = 0 for i∈ N \{i′, i″}and set t(i′) = t(i″) = 1 (hence t(i) identifies the number of edges e(s) = (i,j) for all s currently examined in the Inner Loop (and its initialization). Finally, set nTrack = 0 and set FirstTime(k) = True, k ∈ K (to identify the sets that have not been prevented from having an edge added to them).

Modification of Inner Loop for Z = Zo:
In the prelude to Case (1) Because of the special modifications for Z = Zo that do not apply for Z > Zo, it is convenient to insert the portion of the algorithm for Z = Zo at the very beginning of the C(Z) Algorithm, before the Outer Loop.

Advanced updating for successive Z values
We draw on the relationships of Section 4.2 to create the instructions for updating C(Z) to reduce the amount of computation required on successive iterations of the Inner Loop.
We adopt the following notation of Section 4.2, re-expressed in terms of the C(Z) algorithm: s2 ( Assume the algorithm for Z = Zo has been inserted before the start of the Outer Loop, as indicated in Section 7.1.

Modification After the Inner Loop for all Z:
Following the instructions Z = Znext, and sLast = sEnd:If v 2 = 1, terminate.

Modification Before the Inner Loop for Z > Zo:
Insert a Preliminary Loop before the Inner Loop for Z > Zo as follows: (After Initialization(A) and Initialization(B))

Added Modification to Initialization for Z = Zo:
Copy = False (where Copy indicates whether a copy is made of the arrays indicated in the modification below).

Added Modification of the Inner Loop for all Z:
Each time Case (2) yields a new value of Znext (after checking for both k′ and k″ for Z = Zo) record a copy of ne, the set K and arrays N k ,E k , MinCostA(k), k ∈K and the arrays L(i), 2 As before, we adopt the convention whereby the loop is bypassed if v 3 = 1 but will be executed on the iteration where v: = v + 1 results in v = v 3 .
MinCostB(i), MinCostC(i), i∈ N (writing over any previous copy). Let Copy = True if such a copy is made. (As remarked earlier, it is not necessary to keep track of the E k array.)

Added Modification for the Preliminary Loop
If Copy = False, execute the Preliminary Loop. Otherwise, if Copy = True, instead of executing the Preliminary Loop read the copies of the saved arrays into the active form of these arrays, followed by setting s = s2(v 2 ) and Copy = False. (The Inner Loop immediately follows this modification.) The complete C(Z) Algorithm that incorporates all of these changes is shown in the Appendix for convenience.

Conclusions
The new classes of tree-based clustering algorithms represented by C(W), C(Y) and in the most general case by C(Z), afford the possibility to generate clusters with a range of different characteristics as the parameters W, Y and Z are varied. The fact that the key parameter can be varied adaptively to generate all cluster collections in its class without duplication invites empirical studies to identify parameter ranges that are effective for particular types of clustering applications.
The C(Z) Algorithm can be made still more general by changing the implicit definition of MinCostC(i) for i = i′ and i″ by defining MinCostA(k), for k = k′ and k″, to be any convex combination of the costs of edges in the set E(N k ) = {e = (i,j) ∈ N k }⊂ E (hence (N k ,E(N k )) is the subgraph of G spanned by the nodes of N k ) and defining MinCostB(i) to be any convex combination of the edges of E(N k ) that are adjacent to node i in G. However, this requires updating these MinCost values in a more complex way than in the current form of Case (3).
Future work to exploit the properties of these algorithms can include an investigation of the choice of the parameter α (and hence β = 1 -α) in Algorithm C(Z) to similarly determine ranges that are effective for particular types of clustering applications.
interest for experimental purposes. The case where added memory is used is identified in Case (2) so that this memory need not be used if desired. (In that case, the variable Copy will always remain False, as in Initialization(B).)

C(Z) Algorithm with Accelerated Updates
Inputs: The graph G (N, E), cost vector c(e), e ∈ E, initial Zo value for Z.

Begin Outer Loop for Z > Zo
While Z < Large If (Copy = False) then.
Execute Case (3) For added memory case: Set Copy = True and record a copy of ne, the set K and arrays N k , E k , MinCostA(k), k ∈K and the arrays L(i), MinCostB(i), MinCostC(i), i∈ N (writing over any previous copy).
Continue to the next iteration of the Inner Loop.
Case (1): If k′ = k″ (i′ and i″ belong to the same cluster), then continue to the next iteration of the Inner Loop.

For added memory case:
Set Copy = True and record a copy of ne, the set K and arrays N k , E k , MinCostA(k), k ∈K and the arrays L(i), MinCostB(i), MinCostC(i), i∈ N (writing over any previous copy).