Periodic properties for 2-phenylindole-3-carbaldehyde derivatives.

## Abstract

Algorithms for classification and taxonomy are proposed based on criteria such as information entropy and its production. A set of 59 antitubulin agents with trimethoxyphenyl (TMP), indole, and C=O bridge inhibit the gastric cancer cell line MKN-45. On the basis of the structure-activity relation of TMPs, derivatives are designed and classified using seven structural parameters of different moieties. Many classification methods are based on information entropy. When such procedures are applied to sets of moderate size, an excessive number of results appear that are compatible with the data and suffer a combinatorial explosion. However, following the equipartition conjecture, a selection criterion emerges among the different variants resulting from classification among hierarchical trees. Information entropy allows the compounds to be classified and agrees with principal component analyses. A table of periodic properties of TMPs is obtained. The features denote positions R1–4 on the benzo ring and X–R5/6 on the pyridine ring of the indole cycle. Inhibitors in the same group are suggested to present similar properties; those in the same group and period will present maximum resemblance.

### Keywords

- periodic law
- periodic property
- periodic table
- information entropy
- equipartition conjecture
- anticancer activity

## 1. Introduction

Experimentally, antitubulin analogues were synthesized and tested for antitubulin activity, revealing principles of ligand interaction with tubulin and the related bioactivity [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]. Molecular modeling studies of antitubulin agents were performed to aid in the design of better inhibitors [14, 15, 16]. In computer-aided drug design studies, comparative molecular field analysis (CoMFA) combined with docking calculations was applied to protein-ligand-binding complexes [17, 18, 19, 20, 21]. A class of antitubulin agents that bind at the colchicine (COL) site and contain an indole ring was developed and examined for binding, inhibition of tubulin polymerization, and/or anticancer effects. The properties discovered are helpful for the design of better inhibitors. Half inhibitory concentrations (IC_{50}) for the inhibition of gastric cancer cell line MKN-45 were collected for 59 COL-like compounds with indole and trimethoxyphenyl (TMP) rings (Figure 1), which bind at the COL site [22]. The IC_{50} values were measured for 24 compounds and taken from the literature for the others: 71 compounds were collected. Trial CoMFA calculations for all of them gave a low leave-one-out determination coefficient *q*^{2} ≈ 0.2. Examination of the functional groups showed that those of three compounds are much bulkier than the others and that those of eight compounds differ markedly from the rest. Compounds were excluded on these grounds, leaving 59 substances in the CoMFA calculation. With these data, a three-dimensional (3D) quantitative structure-activity relationship (SAR, QSAR) examination was performed with CoMFA [23], combined with docking calculations for the compounds, to illustrate the correlation of functional-group variations with anticancer effect. The same approach was employed to examine QSAR for a number of other protein-ligand-binding complexes. The functional-group substitutions are located at sites around the indole ring, i.e., the R_{1–6} functional-group sites.
Comparative QSAR modeling of 2-phenylindole-3-carbaldehyde derivatives as potential antimitotic agents was performed [24]. The KIT kinase mutants showed unique mechanisms of drug resistance to imatinib and sunitinib in gastrointestinal stromal tumor patients [25]. Gene expression profiling of gastric cancer was reported [26]. The natural product COL, obtained from *Colchicum autumnale*, is a bioactive alkaloid used in the treatment of a number of diseases [27]. It received considerable attention in the basic study of neoplasia because of its capacity to interrupt *mitosis*, arresting the process in metaphase [28]. COL acts as an inhibitor of the polymerization of tubulin (a protein that contains eight Trp units) [29]. It was used as a probe to understand the role of *microtubules* in cells because of its high affinity for tubulin, whose structure presents a binding site (*colchicine domain*) [30, 31]. Tubulin is a target for cancer treatment: a number of drugs were developed to target it [32]. By binding to it, ligands interfere with its polymerization dynamics and exhibit an antitumor effect. In addition to developed drugs (*viz*. taxol, vibrestine), which bind to it at the taxol/vibrestine-binding sites, COL has its own tubulin binding site and showed anticancer effects, although with significant toxicity. Developing COL-like compounds with lower toxicity represented an effort to find better ligands that target tubulin at the COL-binding site [33, 34]. A simple computerized algorithm useful for establishing a relation between chemical structures was proposed [35, 36]. Its starting point is the use of information entropy for pattern recognition. Information entropy is formulated on the basis of a *similarity matrix* between pairs of chemical species. Because information entropy is weakly discriminating for classification purposes, the more powerful concepts of *entropy production* and its *equipartition conjecture* were introduced [37].
In previous articles, the classifications by periodic properties of local anesthetics [38, 39, 40], inhibitors of human immunodeficiency virus [41, 42, 43], and anticancer drugs [44, 45] were analyzed. The first goal of the present report is to develop the learning potential of the algorithm and, since molecules are more naturally described by a structured representation of varying size, to study general approaches to the processing of structured information. The second goal is to present a periodic classification of TMPs. A further objective is to validate the periodic table (PT) with an external property not used in its development.

## 2. Computational method

The key problem in classification studies is to define *similarity indices* when several criteria of comparison are involved. The first step in quantifying similarity for TMPs is to list the most important moieties. A *vector of properties* <*i*_{1},*i*_{2},…*i*_{*k*},…> is associated with each TMP *i*, whose components correspond to different characteristic groups in the molecule, in a hierarchical order according to their expected importance for pharmacological potency. If moiety *m* is more important than moiety *k*, then *m* < *k*. The components *i*_{*k*} take the values “1” or “0” according to whether a similar moiety of rank *k* is present or absent in TMP *i*, compared with the reference one. The analysis includes two regions of structural variation in TMP molecules: positions R_{1–4} on the benzo ring and positions X and R_{5/6} on the pyridine ring of the indole cycle. The TMPs inhibit the gastric cancer cell line MKN-45. The *structural elements* of a TMP molecule can be *ranked* according to their contribution to MKN-45 inhibition in the order: R_{1} > R_{4} > R_{2} > X > R_{5} > R_{3} > R_{6}. Index *i*_{1} = 1 denotes R_{1} = H (*i*_{1} = 0, otherwise), *i*_{2} = 1 means R_{4} = H, *i*_{3} = 1 signifies R_{2} = H, *i*_{4} = 1 stands for X = N, *i*_{5} = 1 indicates R_{5} = H, *i*_{6} = 1 represents R_{3} = OMe, and *i*_{7} = 1 implies R_{6} = CH_{2}–OH. In TMP 42, R_{1} = R_{4} = R_{2} = R_{5} = H, X = N, R_{3} = OMe and R_{6} = CH_{2}–OH; its associated vector is therefore <1111111>. TMP 42 was selected as the *reference* because it shows the greatest MKN-45 inhibition. Vectors were associated with the 59 TMPs with gastric anticancer activities. Vector <1111110> is associated with TMP 1 since R_{1} = R_{4} = R_{2} = R_{5} = R_{6} = H, X = N and R_{3} = OMe. Denote by *r*_{*ij*} (0 ≤ *r*_{*ij*} ≤ 1) the similarity index of a pair of TMPs associated with vectors *i* and *j*; the similarity relation is characterized by the *similarity matrix* **R** = [*r*_{*ij*}].
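The encoding rule can be illustrated with a minimal Python sketch. The dictionary keys and the substituent labels (e.g., the string `"CH2OH"`) are hypothetical names chosen for this illustration; only the ranking and the reference values come from the text.

```python
from typing import Dict, Tuple

# Rank order i1..i7 and the value each site takes in reference TMP 42.
# Site keys and string labels are illustrative assumptions, not source notation.
REFERENCE_FEATURES: Tuple[Tuple[str, str], ...] = (
    ("R1", "H"), ("R4", "H"), ("R2", "H"), ("X", "N"),
    ("R5", "H"), ("R3", "OMe"), ("R6", "CH2OH"),
)

def property_vector(substituents: Dict[str, str]) -> Tuple[int, ...]:
    """Return <i1,...,i7>: 1 where the site matches the reference, else 0."""
    return tuple(
        1 if substituents[site] == reference else 0
        for site, reference in REFERENCE_FEATURES
    )

tmp42 = {"R1": "H", "R4": "H", "R2": "H", "X": "N",
         "R5": "H", "R3": "OMe", "R6": "CH2OH"}
tmp1 = dict(tmp42, R6="H")  # TMP 1 differs from the reference only at R6

print(property_vector(tmp42))
print(property_vector(tmp1))
```

The tuple order follows the ranking R_{1} > R_{4} > R_{2} > X > R_{5} > R_{3} > R_{6}, so the first component always carries the most important moiety.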
The similarity index between a pair of TMPs <*i*_{1},*i*_{2},…*i*_{*k*}…> and <*j*_{1},*j*_{2},…*j*_{*k*}…> is defined as:

$$r_{ij} = \sum_{k} t_{k}\,(a_{k})^{k} \qquad (k = 1, 2, \ldots) \tag{1}$$

where 0 ≤ *a*_{*k*} ≤ 1 and *t*_{*k*} = 1 if *i*_{*k*} = *j*_{*k*}, but *t*_{*k*} = 0 if *i*_{*k*} ≠ *j*_{*k*}. The definition assigns a weight (*a*_{*k*})^{*k*} to any property involved in the description of molecule *i* or *j*. The MKN-45 gastric cancer inhibition data reported by Lin *et al*. were used for the present classification study. The *grouping algorithm* uses the *stabilized* similarity matrix obtained by applying the *max-min composition rule* o, defined by:

$$(\mathbf{R} \circ \mathbf{S})_{ij} = \max_{k}\left[\min\left(r_{ik}, s_{kj}\right)\right] \tag{2}$$

where **R** = [*r*_{*ij*}] and **S** = [*s*_{*ij*}] are matrices of the same type and (**R**o**S**)_{*ij*} is the (*i*,*j*)-th entry of matrix **R**o**S** [46, 47, 48, 49]. When the max-min composition rule is applied iteratively, so that **R**(*n* + 1) = **R**(*n*) o **R**, there exists an integer *n* such that **R**(*n*) = **R**(*n* + 1) = … The resulting matrix **R**(*n*) is called the *stabilized similarity matrix*. Stabilization matters in the classification process because it generates a partition into disjoint classes. From now on, it is understood that the stabilized similarity matrix is used, denoted **R**(*n*) = [*r*_{*ij*}(*n*)]. The *grouping rule* is as follows: *i* and *j* are assigned to the same class if *r*_{*ij*}(*n*) ≥ *b*. The class of *i* is the set of TMPs *j* that fulfill the grouping rule *r*_{*ij*}(*n*) ≥ *b*. The matrix of classes is:

where *s* denotes any index of a molecule belonging to a class (similarly for *t*). The *information entropy h* measures the surprise that the source emitting the sequences can give [50, 51]. Consider the use of a qualitative spot test to determine the presence of Fe in a water sample. With no prior history of the sample, the analyst must begin by assuming that the two outcomes, 0 (Fe absent) and 1 (Fe present), are equiprobable with probability 1/2. When up to two metals may be present in the sample solution (e.g., Fe, Ni or both), there are four possible outcomes, ranging from neither present (0, 0) to both present (1, 1), each with equal probability 1/2^{2}. Which of the four possibilities holds is determined by two tests, each with two observable states. Likewise, with three metals, there are eight possibilities, each with probability 1/2^{3}, and three tests are needed. This pattern clearly relates uncertainty to the information needed to resolve it. The number of possibilities is expressed as a power of 2. The power to which 2 must be raised to give the number of possibilities *N* is the logarithm to base 2 of that number. Both information and uncertainty are defined in terms of the logarithm to base 2 of the number of possible analytical outcomes: log_{2} *N*. The initial uncertainty can also be defined in terms of the probability of occurrence of each outcome: *I* = *H* = log_{2} *N* = log_{2} 1/*p* = −log_{2} *p*, where *I* is the information contained in the answer given that there were *N* possibilities, *H* is the initial uncertainty resulting from the need to consider the *N* possibilities, and *p* is the probability of each outcome if all *N* possibilities are equally likely.
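The spot-test counting argument can be checked numerically. The following is a minimal Python sketch; the function name is an illustrative choice, and the function is written with the general weighted sum so that unequal probabilities are also accepted.

```python
import math
from typing import Sequence

def shannon_entropy_bits(probabilities: Sequence[float]) -> float:
    """Uncertainty H = -sum(p * log2 p) in bits; zero-probability terms add nothing."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probabilities if p > 0.0)

# One, two, and three metals give 2, 4, and 8 equiprobable outcomes,
# so the uncertainty equals the number of yes/no tests required.
for n_metals in (1, 2, 3):
    n_outcomes = 2 ** n_metals
    uncertainty = shannon_entropy_bits([1.0 / n_outcomes] * n_outcomes)
    print(n_metals, n_outcomes, uncertainty)
```

For equiprobable outcomes the sum reduces to log_{2} *N*, which is why each additional metal adds exactly one bit.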
The equation can be extended to the case in which the outcomes are not equally probable. If it is known from past experience that some metals are more likely to be present than others, the expression is adjusted so that the logarithms of the individual probabilities are suitably weighted: *H* = −Σ *p*_{*i*} log_{2} *p*_{*i*}, where Σ *p*_{*i*} = 1. Consider the first case again, but suppose that past experience showed that 90% of the samples contained no Fe. The degree of uncertainty is calculated as: *H* = −(0.9 log_{2} 0.9 + 0.1 log_{2} 0.1) = 0.469 bits. For a single event occurring with probability *p*, the degree of surprise is proportional to –ln *p*. Generalizing the result to a random variable *X* (which can take *N* possible values *x*_{1}, …, *x*_{*N*} with probabilities *p*_{1}, …, *p*_{*N*}), the average surprise received on learning the value of *X* is –Σ *p*_{*i*} ln *p*_{*i*}. The information entropy associated with the similarity matrix **R** is:

$$h(\mathbf{R}) = -\sum_{i,j} r_{ij} \ln r_{ij} - \sum_{i,j} \left(1 - r_{ij}\right) \ln\left(1 - r_{ij}\right) \tag{4}$$
Denote by *C*_{*b*} the set of classes and by **R**_{*b*} the similarity matrix at grouping level *b*. Information entropy satisfies the following properties. (1) *h*(**R**) = 0 if *r*_{*ij*} = 0 or *r*_{*ij*} = 1. (2) *h*(**R**) is maximum if *r*_{*ij*} = 0.5, i.e., when the imprecision is maximum. (3) *h*(**R**_{*b*}) ≤ *h*(**R**) for any *b*, i.e., classification leads to a loss of entropy. (4) *h*(**R**_{*b*1}) ≤ *h*(**R**_{*b*2}) if *b*_{1} < *b*_{2}, i.e., entropy is a monotone function of the grouping level *b*. In the classification process, each *hierarchical tree* corresponds to a dependence of information entropy on the grouping level, and a plot *h–b* is obtained. The *equipartition conjecture of entropy production* of Tondeur and Kvaalen is proposed as a selection criterion among the different variants resulting from classification among hierarchical trees. According to the conjecture, for a given charge or duty, the best configuration of a dendrogram is the one in which entropy production is most uniformly distributed, i.e., closest to a kind of equipartition. One proceeds here by analogy, using *information entropy* instead of thermodynamic entropy. Equipartition implies a linear dependence, i.e., a constant production of information entropy along the scale of *b*, so that the *equipartition line* is described by:

Since the classification is discrete, a way of expressing equipartition is a regular staircase function. The best variant is chosen as the one that minimizes the sum of the squared differences:

*Learning procedures* similar to those encountered in *stochastic methods* are implemented as follows [52]. Consider a given classification taken as *good* or ideal from practice or experience, which corresponds to a *reference* similarity matrix **S** = [*s*_{*ij*}] obtained for equal weights *a*_{1} = *a*_{2} = … = *a* and an arbitrary number of fictitious properties. Next, consider the same set of molecules as in the good classification together with the actual properties. The similarity index *r*_{*ij*} is calculated with Eq. (1), giving matrix **R**. The number of properties for **R** and **S** may differ. The learning procedure consists in trying to find classification results for **R** as close as possible to the *good* classification. The first weight *a*_{1} is kept constant and only the following weights *a*_{2}, *a*_{3},… are subjected to random variations. A new similarity matrix is obtained with Eq. (1) and the new weights. The distance between the classifications characterized by **R** and **S** is given by:

The definition was proposed by Kullback to measure the distance between two probability distributions; here it measures the distance between matrices **R** and **S** [53]. Since a classification corresponds to each matrix, the two classifications are compared through this distance, which is a non-negative quantity that approaches zero as the resemblance between **R** and **S** increases. The result of the procedure is a set of weights allowing adequate classification. The algorithm was used in the generation of complex dendrograms *via* information entropy [54]. Our program MolClas is a simple, reliable, efficient, and fast procedure for molecular classification, based on the equipartition conjecture of information entropy production according to Eqs. (1)–(7). It reads the number of properties and the molecular indices. It allows the optimization of the coefficients. It optionally reads the starting coefficients and the number of iteration cycles. The correlation matrix is either calculated by the program or read from input. MolClas allows the transformation of the correlation matrix from [−1, 1] to [0, 1]. The program calculates the similarity matrix of the properties in symmetric storage mode, computes classifications, tests whether classifications are different, calculates distances between classifications, computes the similarity matrices of classifications, works out the information entropy of classifications, optimizes coefficients, performs single- and complete-linkage hierarchical cluster analyses, and plots classification diagrams. It was written not only to analyze the equipartition conjecture of entropy production but also to explore the world of molecular classification. MolClas is different from the program MolClass reported in the literature [55].
While MolClas classifies molecules based on hierarchical dichotomic (Boolean) descriptors, MolClass discovers SARs from molecular patterns (*fingerprints*) extracted from experimental datasets and needs to interrogate large databases (PubChem, ChEMBL, ChemBank). Code MolClas is available on the Internet (
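The distance used in the learning step can be sketched in Python. This is not the MolClas implementation: the sketch assumes the two-term (Bernoulli) form of the Kullback divergence applied entry by entry, which has the stated properties of being non-negative and vanishing when **R** = **S**. The function names and the clipping constant `eps` are illustrative assumptions.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]

def _term(r: float, s: float) -> float:
    """r * ln(r / s), with the usual convention 0 * ln 0 = 0."""
    return 0.0 if r == 0.0 else r * math.log(r / s)

def kullback_distance(R: Matrix, S: Matrix, eps: float = 1e-12) -> float:
    """Assumed form: sum over entries of r ln(r/s) + (1-r) ln((1-r)/(1-s))."""
    total = 0.0
    for row_r, row_s in zip(R, S):
        for r, s in zip(row_r, row_s):
            s = min(max(s, eps), 1.0 - eps)  # keep both logarithms finite
            total += _term(r, s) + _term(1.0 - r, 1.0 - s)
    return total

R = [[0.9, 0.6], [0.6, 0.9]]
S = [[0.9, 0.8], [0.8, 0.9]]
print(kullback_distance(R, R), kullback_distance(R, S))
```

In a learning loop of the kind described, the weights *a*_{2}, *a*_{3},… would be perturbed at random and a change kept whenever this distance decreases.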

## 3. Calculation results and discussion

The matrix of Pearson correlation coefficients was computed between pairs of vector properties <*i*_{1},*i*_{2},*i*_{3},*i*_{4},*i*_{5},*i*_{6},*i*_{7}> for the 59 TMPs. Pearson correlations are displayed in the partial correlation diagram, which contains high (*r* ≥ 0.75), medium (0.50 ≤ *r* < 0.75), low (0.25 ≤ *r* < 0.50), and *zero* (*r* < 0.25) partial correlations. Pairs of inhibitors with high partial correlations have similar vector properties. However, the results should be taken with care because the TMP with constant vector <1111111> (Entry 42) has zero standard deviation, which yields the maximum partial correlation *r* = 1 with any TMP; this is an artifact. With the equipartition conjecture, the intercorrelations are illustrated in the partial correlation diagram, which contains 1382 high (Figure 2, *red lines*), 109 medium (*orange*), 161 low (*yellow*), and 59 *zero* (*black*) partial correlations. Six out of the 58 high partial correlations of Entry 42 were corrected: its correlations with Entries 3 and 47 are medium, those with Entries 12, 15, and 43 are low, and that with Entry 46 is *zero*.
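The origin of the artifact for Entry 42 can be seen in a minimal Python sketch of the Pearson coefficient between two property vectors. The function names are illustrative, and returning `None` for a constant vector is a choice made here; how the original software handles the vanishing denominator is not stated in the text.

```python
import math
from typing import Optional, Sequence

def pearson(u: Sequence[float], v: Sequence[float]) -> Optional[float]:
    """Pearson r between two vectors; None when either has zero spread."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    spread_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    spread_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if spread_u == 0.0 or spread_v == 0.0:
        return None  # constant vector: the coefficient is undefined
    return cov / (spread_u * spread_v)

def band(r: float) -> str:
    """Thresholds of the partial correlation diagram."""
    if r >= 0.75:
        return "high"
    if r >= 0.50:
        return "medium"
    if r >= 0.25:
        return "low"
    return "zero"

reference = (1, 1, 1, 1, 1, 1, 1)       # Entry 42, constant vector
print(pearson(reference, (1, 1, 1, 1, 1, 1, 0)))
print(band(pearson((1, 1, 1, 1, 1, 1, 0), (1, 1, 1, 1, 1, 0, 0))))
```

Because the reference vector is constant, any value reported for its correlations has to come from a convention rather than from the data, which is why those entries need separate correction.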

The grouping rule in the case with equal weights *a*_{*k*} = 0.5 for *b*_{1} = 0.97 allows the classes:

*C*_{*b*1} = (1,5–8,10,11,13,16,17,26–28,41,42,44,45,48,58,59),(2,4,9,18,19,49),(3),(12),(14,20–25,29–33,35,50–55),(15,43),(34,36–40,56,57),(46),(47)
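The chain from similarity index to classes can be sketched in Python on a toy set of four vectors (not the 59 TMPs). The sketch assumes equal weights *a*_{*k*} = 0.5, a similarity index in which feature *k* contributes (*a*_{*k*})^{*k*} when the two vectors agree, and the standard max-min composition; all function names are illustrative.

```python
from typing import List, Sequence

Matrix = List[List[float]]

def similarity_index(u: Sequence[int], v: Sequence[int], a: float = 0.5) -> float:
    """Sum of a**k over the ranks k (1-based) where both vectors agree."""
    return sum(a ** k for k, (x, y) in enumerate(zip(u, v), start=1) if x == y)

def similarity_matrix(vectors: Sequence[Sequence[int]], a: float = 0.5) -> Matrix:
    return [[similarity_index(u, v, a) for v in vectors] for u in vectors]

def max_min(R: Matrix, S: Matrix) -> Matrix:
    """Max-min composition: entry (i, j) is max over k of min(R[i][k], S[k][j])."""
    n = len(R)
    return [[max(min(R[i][k], S[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]

def stabilize(R: Matrix) -> Matrix:
    """Iterate R(n+1) = R(n) o R until the matrix stops changing."""
    current = R
    while True:
        following = max_min(current, R)
        if following == current:
            return current
        current = following

def classes(R_stable: Matrix, b: float) -> List[List[int]]:
    """Grouping rule: i and j share a class when r_ij(n) >= b."""
    seen, result = set(), []
    for i in range(len(R_stable)):
        if i in seen:
            continue
        members = [j for j in range(len(R_stable)) if j == i or R_stable[i][j] >= b]
        seen.update(members)
        result.append(members)
    return result

toy = [(1, 1, 1, 1, 1, 1, 1), (1, 1, 1, 1, 1, 1, 0),
       (1, 1, 1, 1, 0, 1, 0), (1, 1, 0, 1, 1, 1, 0)]
R_stable = stabilize(similarity_matrix(toy))
print(classes(R_stable, 0.97))
print(classes(R_stable, 0.86))
```

Lowering *b* merges classes, so the same stabilized matrix yields the whole hierarchy of partitions.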

Nine classes are obtained with associated entropy *h*(**R**_{*b*1}) = 39.44. The *dendrogram* (binary tree) corresponding to <*i*_{1},*i*_{2},*i*_{3},*i*_{4},*i*_{5},*i*_{6},*i*_{7}> and *C*_{*b*1} is calculated [56, 57, 58]; it provides a binary taxonomy that separates the same nine classes: from top to bottom, the data bifurcate into classes 3, 4, 8, 9, 1, 2, 5, 6, and 7 with 1, 1, 1, 1, 20, 6, 19, 2, and 8 TMPs, respectively [59]. TMPs 42, 26, etc., which show the greatest inhibitory activity, are grouped into the same class. TMPs in the same class appear highly correlated in the partial correlation diagram. At level *b*_{2} = 0.86, the set of classes is:

*C*_{*b*2} = (1,4–8,10,11,13,14,16–42,44,45,48–59),(2,9),(3,47),(12,15),(43),(46).

Six classes result and the entropy decreases to *h*(**R**_{*b*2}) = 16.18. The dendrogram corresponding to <*i*_{1},*i*_{2},*i*_{3},*i*_{4},*i*_{5},*i*_{6},*i*_{7}> and *C*_{*b*2} separates the same six classes: from top to bottom, the data bifurcate into classes 5, 6, 1, 2, 3, and 4 with 1, 1, 51, 2, 2, and 2 TMPs, respectively. Again, TMPs with the greatest inhibitory potency belong to the same class. TMPs in the same class appear highly correlated in the partial correlation diagram and dendrogram. An analysis of the sets containing from 1 to 59 classes was performed, in agreement with the partial correlation diagram and dendrograms. In view of the partial correlation diagram and dendrograms, we split the data into seven classes: (1,26–28,41,42,45,58,59), (5–8,10,11,13,16,17,44,48), (14,20–25,29–33,35,50–55), (34,36–40,56,57), (2,4,9,18,19,49), (3,47), and (12,15,43,46). Figure 3 displays the corresponding tree. Again, TMPs with the greatest activity fall in the same class.

The illustration of the classification above in a radial tree (Figure 4) shows the same classes, in qualitative agreement with the partial correlation diagram and dendrograms. Once more, TMPs with the greatest potency are included in the same grouping.

Program SplitsTree analyzes cluster analysis (CA) data [60]. Based on *split decomposition*, it takes a *distance matrix* as input and produces a graph that represents the relations between taxa. For ideal data, the graph is a tree, whereas less ideal data give rise to a tree-like network, which is interpreted as possible evidence for different and conflicting data. As split decomposition does not attempt to force data onto a tree, it gives a good indication of how *tree*-like the given data are. The splits graph for the 59 TMPs (Figure 5) shows that most TMP groups collapse: (1,2,4–11,13,16–19,26–28,41,42,44,45,48,49,58,59), (3,47), (12,15,43), (14,20–25,29–33,35,50–55), and (34,36–40,56,57); classes 1, 2, and 5 coincide. No conflicting relation appears between TMPs. The splits graph is in partial agreement with the partial correlation diagram, dendrograms, and radial tree.

Usually in quantitative structure-property relationships (QSPRs), the data file contains fewer than 100 molecules and thousands of *X*-variables. There are so many *X*-variables that nobody can discover by *inspection* patterns, trends, clusters, etc. in the molecules. *Principal component analysis* (PCA) is a technique useful to *summarize* the information contained in the **X**-matrix and make it understandable [61, 62, 63, 64, 65, 66]. PCA works by decomposing the **X**-matrix as the product of two matrices **P** and **T**. The *loading matrix* (**P**), with information about the variables, contains a few vectors, the *principal components* (PCs), which are obtained as linear combinations of the original *X*-variables. In the *score matrix* (**T**), with information about the molecules, each molecule is described by its projections onto the PCs instead of the original variables: **X** = **TP’** + **E**. The information not contained in these matrices remains as *unexplained X-variance* in a *residual matrix* (**E**). Each PC*i* is a new coordinate expressed as a linear combination of the original features *x*_{*j*}: PC*i* = Σ_{*j*} *b*_{*ij*}*x*_{*j*}. The new coordinates PC*i* are called *scores* or *factors*, whereas the coefficients *b*_{*ij*} are called *loadings*. The scores are ordered according to the information they carry about the total variability among molecules. *Score-score plots* show the positions of the molecules in the new coordinate system, whereas *loading-loading plots* show the positions of the properties that represent the molecules in the new coordinate system. The PCs have two properties. (1) They are extracted in decreasing order of importance: the first PC contains more information than the second, the second more than the third, and so on. (2) Every PC is orthogonal to the others: no correlation exists between the information contained in different PCs. A PCA was performed for the TMPs.
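The two properties of the PCs can be verified on a small example. The following Python sketch extracts loadings by power iteration with deflation on the covariance matrix; this is a stand-in technique chosen to keep the example dependency-free, not the procedure used for the TMPs, and the data matrix is a toy set of binary vectors.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

def centered_and_covariance(X: Sequence[Sequence[float]]) -> Tuple[Matrix, Matrix]:
    """Mean-center the columns of X and return (Xc, sample covariance)."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    Xc = [[row[j] - means[j] for j in range(m)] for row in X]
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(m)] for a in range(m)]
    return Xc, C

def leading_eigenpair(C: Matrix, iterations: int = 1000) -> Tuple[float, List[float]]:
    """Power iteration; the uneven start avoids an obviously symmetric guess."""
    m = len(C)
    v = [1.0 / (j + 1) for j in range(m)]
    for _ in range(iterations):
        w = [sum(C[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-15:
            break
        v = [x / norm for x in w]
    Cv = [sum(C[a][b] * v[b] for b in range(m)) for a in range(m)]
    return sum(x * y for x, y in zip(v, Cv)) / sum(x * x for x in v), v

def pca(X: Sequence[Sequence[float]], n_components: int = 2):
    """Return (scores T, loadings P, explained-variance fractions)."""
    Xc, C = centered_and_covariance(X)
    m = len(C)
    total_variance = sum(C[a][a] for a in range(m))
    loadings, explained = [], []
    for _ in range(n_components):
        eigenvalue, v = leading_eigenpair(C)
        loadings.append(v)
        explained.append(eigenvalue / total_variance)
        # Deflation: remove the component just found before extracting the next.
        C = [[C[a][b] - eigenvalue * v[a] * v[b] for b in range(m)] for a in range(m)]
    scores = [[sum(r[j] * v[j] for j in range(m)) for v in loadings] for r in Xc]
    return scores, loadings, explained

toy = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1], [1, 1, 1]]
scores, loadings, explained = pca(toy, n_components=2)
print(explained)
```

The explained-variance fractions play the role of the percentages quoted for the factors, and one minus their running sum is the corresponding error.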
The importance of PCA factors *F*_{1–7} for {*i*_{1},*i*_{2},*i*_{3},*i*_{4},*i*_{5},*i*_{6},*i*_{7}} was calculated. In particular, the first factor *F*_{1} alone explains 27% of the variability of the data (73% error), the first two factors *F*_{1/2} together account for 45% of the variance (55% error), the first three factors *F*_{1–3} explain 60% of the variability (40% error), etc. The factor loadings of the PCA were computed, as was the PCA *F*_{1}*–F*_{2} profile for the vector property. For *F*_{1}, variable *i*_{6} has the greatest weight in the profile; however, *F*_{1} cannot be reduced to the two variables {*i*_{5},*i*_{6}} without a 48% error. For *F*_{2}, variable *i*_{4} has the greatest weight, and *F*_{2} can be reduced to the two variables {*i*_{4},*i*_{5}} with a 5% error. For *F*_{3}, variable *i*_{7} has the greatest weight, and *F*_{3} can be reduced to the two variables {*i*_{4},*i*_{7}} with a 3% error. For *F*_{4}, variable *i*_{3} has the greatest weight; however, *F*_{4} cannot be reduced to the two variables {*i*_{2},*i*_{3}} without a 15% error. For *F*_{5}, variable *i*_{1} has the greatest weight, and *F*_{5} can be reduced to the two variables {*i*_{1},*i*_{6}} with a 6% error. For *F*_{6}, variable *i*_{2} has the greatest weight; however, *F*_{6} cannot be reduced to the two variables {*i*_{1},*i*_{2}} without a 25% error. For *F*_{7}, variable *i*_{5} has the greatest weight; nevertheless, *F*_{7} cannot be reduced to the two variables {*i*_{5},*i*_{6}} without a 36% error. In the PCA *F*_{2}*–F*_{1} scores plot (Figure 6), TMPs with the same vector property collapse: (1,26–28,41,45,58,59), (2,9), (4,18,19,49), (5–8,10,11,13,16,17,44,48), (14,20–25,29–33,35,50–55), and (34,36–40,56,57).
Seven TMP classes are clearly distinguished: class 1 with 9 compounds (0 < *F*_{1} < *F*_{2}, *right*), class 2 with 11 substances (*F*_{1} < *F*_{2} ≈ 0, *middle*), class 3 with 19 molecules (*F*_{1} ≫ *F*_{2}, *bottom right*), class 4 with 8 organics (0 < *F*_{1} ≪ *F*_{2}, *top*), class 5 (6 units, *F*_{1} < *F*_{2} ≈ 0, *middle*), class 6 (2 units, *F*_{1} ≪ *F*_{2} < 0, *left*) and class 7 (4 units, *F*_{1} < *F*_{2} < 0, *bottom*). The classification is in agreement with the partial correlation diagram, dendrograms, radial tree, and splits graph.

From the PCA factor loadings of the TMPs, the *F*_{2}*–F*_{1} loadings plot (Figure 7) depicts the seven properties. As a complement to the scores plot, the loadings plot confirms that TMPs in class 1, located on the right side, present a contribution of R_{3} = OMe situated on the same side. The TMPs in class 3, at the bottom, have a more pronounced contribution of X = N in the same location. Two classes of properties are clearly distinguished in the loadings plot: class 1 {R_{1},R_{4},R_{2},R_{3}} (*F*_{1} > *F*_{2} > 0, *right*) and class 2 {X,R_{5},R_{6}} (*F*_{1} < *F*_{2}, *left*).

Instead of 59 TMPs in the ℜ^{7} space of seven vector properties, we consider seven properties in the ℜ^{59} space of 59 TMPs. The dendrogram for the vector properties separates properties {R_{1},R_{4},R_{2},R_{3}} (class 1) from {X,R_{5},R_{6}} (class 2), in agreement with the PCA loadings plot. The splits graph for the properties indicates no conflicting relation between vector components and separates properties {R_{1},R_{4},R_{2},R_{3}} (class 1) from {X,R_{5},R_{6}} (class 2), in agreement with the PCA loadings plot and dendrogram. A PCA was performed for the vector properties. The first factor *F*_{1} alone explains 51% of the variance (49% error), the first two factors *F*_{1/2} together account for 71% of the variability (29% error), the first three factors *F*_{1–3} explain 82% of the variance (18% error), etc. In the PCA *F*_{2}*–F*_{1} scores plot, property R_{4} appears superimposed on R_{1}. Two classes of properties are distinguished: class 1 {R_{1},R_{4},R_{2},R_{3}} (*F*_{1} > *F*_{2}, *right*) and class 2 {X,R_{5},R_{6}} (*F*_{1} < *F*_{2}, *left*), in agreement with the PCA loadings plot, dendrogram and splits graph. The format of the PT of TMPs (Table 1) shows that TMPs are classified first by *i*_{1}, then by *i*_{2}, *i*_{3}, *i*_{4}, *i*_{5}, *i*_{6}, and *i*_{7}. Vertical groups are defined by {*i*_{1},*i*_{2},*i*_{3},*i*_{4}} and horizontal periods by {*i*_{5},*i*_{6},*i*_{7}}. Periods of eight elements are considered; e.g., group g0011 denotes <*i*_{1},*i*_{2},*i*_{3},*i*_{4}> = <0011>, as in <0011100> (R_{1} ≠ H, R_{4} ≠ H, R_{2} = H, X = N, R_{5} = H, R_{3} ≠ OMe, R_{6} ≠ CH_{2}–OH), etc. TMPs in the same column appear close in the partial correlation diagram, dendrograms, radial tree, splits graph, and PCA scores.

Table 1. Periodic table of TMPs. Each entry lists the substituents in the order –R_{1} –R_{4} –R_{2} –X –R_{5} –R_{3} –R_{6}; columns are groups <*i*_{1},*i*_{2},*i*_{3},*i*_{4}> and rows are periods <*i*_{5},*i*_{6},*i*_{7}>.

Period | g0011 | g0101 | g0111 | g1001 |
---|---|---|---|---|
p100 | –OMe –OMe –H –N –H –H –H | | –OMe –H –H –N –H –H –H | –H –OMe –OMe –N –H –H –H |
p110 | | –OMe –H –OMe –N –H –OMe –H | –OMe –H –H –N –H –OMe –H | |

Period | g1011 | g1101 | g1110 | g1111 |
---|---|---|---|---|
p010 | | | | –H –H –H –N –CO–CH=CH_{2} –OMe –H<br>–H –H –H –N –Me –OMe –H<br>–H –H –H –N –Pr –OMe –H<br>–H –H –H –N –Bu –OMe –H<br>–H –H –H –N –CH_{2}–CH_{2}–N(CH_{3})_{2} –OMe –H<br>–H –H –H –N –CH_{2}–CH_{2}–CO–OH –OMe –H<br>–H –H –H –N –CH_{2}–Ph –OMe –H<br>–H –H –H –N –CH_{2}–Pyr –OMe –H<br>–H –H –H –N –CO–Ph –OMe –H<br>–H –H –H –N –CO–2-Furan –OMe –H<br>–H –H –H –N –CO–C(CH_{3})_{3} –OMe –H<br>–H –H –H –N –CO–O–Ph –OMe –H<br>–H –H –H –N –SO_{2}–Ph –OMe –H<br>–H –H –H –N –Et –OMe –H<br>–H –H –H –N –i-Pr –OMe –H<br>–H –H –H –N –CH_{2}–CO–OH –OMe –H<br>–H –H –H –N –CO–2-Thiofuran –OMe –H<br>–H –H –H –N –CO–O–C(CH_{3})_{3} –OMe –H<br>–H –H –H –N –CO–N(CH_{3})_{2} –OMe –H |
p100 | –H –OMe –H –N –H –H –H | –H –H –OMe –N –H –H –H<br>–H –H –OMe –N –H –H –Me | | –H –H –H –N –H –F –H<br>–H –H –H –N –H –OEt –H<br>–H –H –H –N –H –OPr –H<br>–H –H –H –N –H –O-i-Pr –H<br>–H –H –H –N –H –NO_{2} –H<br>–H –H –H –N –H –Br –H<br>–H –H –H –N –H –O–CH_{2}–O– –H<br>–H –H –H –N –H –NHMe –H<br>–H –H –H –N –H –N(Me)_{2} –H<br>–H –H –H –N –H –OH –H<br>–H –H –H –N –H –NH_{2} –H |
p110 | | –H –H –OMe –N –H –OMe –H<br>–H –H –NH_{2} –N –H –OMe –H<br>–H –H –OH –N –H –OMe –H<br>–H –H –O–CH_{2}–Ph –N –H –OMe –H | –H –H –H –O –H –OMe –H<br>–H –H –H –O –H –OMe –Me<br>–H –H –H –O –H –OMe –Pr<br>–H –H –H –S –H –OMe –H<br>–H –H –H –S –H –OMe –Me<br>–H –H –H –S –H –OMe –Pr<br>–H –H –H –O –H –OMe –Et<br>–H –H –H –S –H –OMe –Et | –H –H –H –N –H –OMe –H<br>–H –H –H –N –H –OMe –Me<br>–H –H –H –N –H –OMe –Et<br>–H –H –H –N –H –OMe –Pr<br>–H –H –H –N –H –OMe –CO–O–CH_{3}<br>–H –H –H –N –H –OMe –CH_{2}–C≡CH<br>–H –H –H –N –H –OMe –CO–OH<br>–H –H –H –N –H –OMe –CH_{2}–N(CH_{3})_{2} |
p111 | | | | –H –H –H –N –H –OMe –CH_{2}–OH |
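The group and period labels of the PT follow directly from the property vector: the first four components give the group and the last three the period. A minimal Python sketch, with illustrative function names and made-up compound identifiers, is:

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def pt_position(vector: Sequence[int]) -> Tuple[str, str]:
    """Split <i1,...,i7> into a group label (i1..i4) and a period label (i5..i7)."""
    bits = "".join(str(bit) for bit in vector)
    return "g" + bits[:4], "p" + bits[4:]

def periodic_table(vectors: Dict[str, Sequence[int]]) -> Dict[Tuple[str, str], List[str]]:
    """Collect compound identifiers into (group, period) cells."""
    cells: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for name, vector in vectors.items():
        cells[pt_position(vector)].append(name)
    return dict(cells)

# Hypothetical identifiers; only the vectors matter for the placement.
example = {
    "a": (0, 0, 1, 1, 1, 0, 0),
    "b": (1, 1, 1, 1, 1, 1, 0),
    "c": (1, 1, 1, 1, 1, 1, 1),
    "d": (1, 1, 1, 1, 1, 1, 0),
}
print(periodic_table(example))
```

Compounds that share a cell have identical property vectors, which is why they collapse onto a single point in the scores plot.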

The variation of property *P* (inhibition of gastric cancer cell line MKN-45) of vector <*i*_{1},*i*_{2},*i*_{3},*i*_{4},*i*_{5},*i*_{6},*i*_{7}>, expressed in the decimal system as *P* = 10^{6}*i*_{1} + 10^{5}*i*_{2} + 10^{4}*i*_{3} + 10^{3}*i*_{4} + 10^{2}*i*_{5} + 10*i*_{6} + *i*_{7}, *vs*. the structural parameters {*i*_{1},*i*_{2},*i*_{3},*i*_{4},*i*_{5},*i*_{6},*i*_{7}} was examined for the TMPs. The property was not used in the development of the PT and serves to validate it. Most points appear superimposed, as do lines *i*_{2/6} on *i*_{1} and *i*_{7} on *i*_{4}. The results show the order of importance of the parameters: *i*_{1} > *i*_{2} > *i*_{3} > *i*_{4} > *i*_{5} > *i*_{6} > *i*_{7}, in agreement with the PT of properties with vertical groups defined by {*i*_{1},*i*_{2},*i*_{3},*i*_{4}} and horizontal periods by {*i*_{5},*i*_{6},*i*_{7}}. The variation of property *P* of vector <*i*_{1},*i*_{2},*i*_{3},*i*_{4},*i*_{5},*i*_{6},*i*_{7}> in base 10 *vs*. the group number in the PT, for the TMPs, reveals minima corresponding to compounds with <*i*_{1},*i*_{2},*i*_{3},*i*_{4}> *ca*. <0011> (group g0011) and maxima *ca*. <1111> (group g1111). Periods p010, p100, p110, and p111 represent rows 1–4, respectively. For groups 3 and 6, period p110 is superimposed on p100, and for group 8, all periods coincide. The corresponding function *P*(*i*_{1},*i*_{2},*i*_{3},*i*_{4},*i*_{5},*i*_{6},*i*_{7}) shows a series of periodic *waves* clearly delimited by minima or maxima, which suggest a periodic behavior that recalls the form of a trigonometric function. For <*i*_{1},*i*_{2},*i*_{3},*i*_{4},*i*_{5},*i*_{6},*i*_{7}>, a maximum is clearly shown. The distance in <*i*_{1},*i*_{2},*i*_{3},*i*_{4},*i*_{5},*i*_{6},*i*_{7}> units between each pair of consecutive maxima is eight, which coincides with the TMP sets in successive periods. The maxima occupy analogous positions in the curve and are in phase. The representative points in phase should correspond to the elements of the same group in the PT.
For the maxima of <*i*_{1},*i*_{2},*i*_{3},*i*_{4},*i*_{5},*i*_{6},*i*_{7}>, there is coherence between the two representations; however, the consistency is not general. The comparison of the waves shows two differences: (1) periods are incomplete and (2) periods 2 and 3 are somewhat staircase-like. The most characteristic points of the plot are the maxima, which lie about group g1111. The values of <*i*_{1},*i*_{2},*i*_{3},*i*_{4},*i*_{5},*i*_{6},*i*_{7}> are repeated as the periodic law (PL) states. An empirical function *P*(*p*) reproduces the different <*i*_{1},*i*_{2},*i*_{3},*i*_{4},*i*_{5},*i*_{6},*i*_{7}> values; a minimum of *P*(*p*) has meaning only if it is compared with the previous *P*(*p*–1) and later *P*(*p* + 1) points, needing to fulfill:
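The decimal expression of *P* simply reads the property vector as a base-10 number, so the ordering of *P* values reproduces the hierarchy *i*_{1} > *i*_{2} > … > *i*_{7}. A minimal Python sketch, with an illustrative function name:

```python
from typing import Sequence

def decimal_property(vector: Sequence[int]) -> int:
    """P = 10^6 i1 + 10^5 i2 + ... + 10 i6 + i7 for a seven-component vector."""
    value = 0
    for bit in vector:
        value = 10 * value + bit  # Horner's scheme in base 10
    return value

print(decimal_property((1, 1, 1, 1, 1, 1, 1)))
print(decimal_property((0, 0, 1, 1, 1, 0, 0)))
```

Because a change in *i*_{1} shifts *P* by 10^{6} while a change in *i*_{7} shifts it by 1, sorting by *P* orders the compounds first by group and then by period.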

Sequenced relationship (8) has to be done again at determined gaps peer to the dimension of the period and is equal to:

Because relationship (9) is just suitable for minima, additional universal others are wanted for all positions *p*; *D*(*p*) = *P*(*p* + 1) – *P*(*p*) differences are computed by allocating each value to TMP *p*:

In the place of *D*(*p*), the values of *R*(*p*) = *P*(*p* + 1)/*P*(*p*) are obtained by assigning *R*(*p*) to TMP *p*; whether PL is universal, components in similar group in equivalent locations in dissimilar periodic waves assure:

either

either

However, the results show that this is not the case, so PL is not general and presents anomalies. The variation of *D*(*p*) *vs*. group number shows that for group 6, periods p100 and p110 collapse. This reveals a lack of coherence between the <*i*_{1},*i*_{2},*i*_{3},*i*_{4},*i*_{5},*i*_{6},*i*_{7}> Cartesian and PT representations. If the consistency were rigorous, all positions in each period would present the same sign: in general, there is a tendency for the positions to give *D*(*p*) > 0 for the lower groups but not for group 8; however, the latter result should be taken with care because *D*(*p*) are calculated using data from the next period. In detail, irregularities exist in which TMPs for successive periods are not always in phase. The variation of *R*(*p*) *vs*. group number shows that for groups 3 and 6, periods p100 and p110 collapse and, for group 8, all periods coincide, confirming the lack of coherence between the Cartesian and PT representations. If the consistency were rigorous, all positions in each period would present *R*(*p*) either smaller or larger than one. There is a tendency for the positions to give *R*(*p*) > 1 for the lower groups but not for group 8; however, the latter should be taken with care because *R*(*p*) are calculated from the next period. The irregularities are thus confirmed: TMPs for successive periods are not always in phase.
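
The tests applied to *P*(*p*) in this section can be illustrated with a short Python sketch. It is a minimal sketch and not the authors' procedure: the function names are hypothetical, the values of *P* are assumed to be supplied as a list ordered by position *p*, and `same_sign` is an assumed helper that checks whether the *D*(*p*) values of one group taken from different periods are all positive or all negative.

```python
from typing import List, Sequence


def differences(P: Sequence[float]) -> List[float]:
    """D(p) = P(p+1) - P(p), assigned to position p."""
    return [P[k + 1] - P[k] for k in range(len(P) - 1)]


def ratios(P: Sequence[float]) -> List[float]:
    """R(p) = P(p+1) / P(p), assigned to position p."""
    if any(value == 0 for value in P[:-1]):
        # The ratio is undefined where P(p) = 0 (e.g. an all-zero vector).
        raise ValueError("R(p) is undefined where P(p) = 0")
    return [P[k + 1] / P[k] for k in range(len(P) - 1)]


def local_minima(P: Sequence[float]) -> List[int]:
    """Indices p with P(p) < P(p-1) and P(p) < P(p+1)."""
    return [
        k
        for k in range(1, len(P) - 1)
        if P[k] < P[k - 1] and P[k] < P[k + 1]
    ]


def same_sign(values: Sequence[float]) -> bool:
    """True if all values are strictly positive or all strictly negative."""
    return all(v > 0 for v in values) or all(v < 0 for v in values)
```

For a given group, collecting *D*(*p*) from each period and passing the list to `same_sign` indicates whether that group is in phase across periods; the analogous check for *R*(*p*) compares each ratio with one instead of zero.
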

## 4. Conclusion

Several criteria were selected to reduce the analysis to a manageable quantity of trimethoxyphenyl, indole, carbonyl-bridge antitubulins, referred to structural parameters related to positions R_{1–4} on benzo, R_{5/6} on pyridine, and heteroatom X in indole. Molecular *structural elements* were *ranked* according to inhibitory activity: R_{1} > R_{4} > R_{2} > X > R_{5} > R_{3} > R_{6}. In compound 42, R_{1} = R_{4} = R_{2} = R_{5} = H, X = N, R_{3} = OMe, and R_{6} = CH_{3}–OH <1111111>, which was selected as *reference*. Many classification algorithms are based on *information entropy*. For moderate-sized sets, an excessive number of results appear compatible with the data and suffer a combinatorial explosion; however, following the *equipartition conjecture*, one has a selection criterion, according to which the best configuration is that in which entropy production is most uniformly distributed. The method avoids the problem of continuous variables because, for the compound with constant <1111111> vector, the null standard deviation causes a Pearson correlation coefficient of one. The classification is in agreement with principal component analyses. Code MolClas is a simple, reliable, efficient, and fast procedure for the classification of molecules, based on the equipartition conjecture of information entropy production. The code was developed not only to examine the equipartition conjecture but also to explore the world of molecular classification.

The periodic law does not have the rank of the laws of physics: (1) antitubulin inhibitory potencies are not repeated, although perhaps their chemical nature is; (2) order relations are repeated with exceptions. The analysis compels the statement: the relationships that any molecule *p* has with its neighbor *p* + 1 are approximately repeated for each period. Periodicity is not general; however, if a natural order of molecules is accepted, the law must be phenomenological. The antiproliferative potency was not used to generate the periodic table and serves to validate it. The examination of other antitubulin properties would give an insight into the possible generalization of the periodic table.

## Acknowledgments

The authors acknowledge support from Generalitat Valenciana (Project No. PROMETEO/2016/094) and Valencia Catholic University *Saint Vincent Martyr* (Project No. PRUCV/2015/617).