Leaching Mechanisms of Trace Elements from Coal and Host Rock Using a Data Mining Method

Coal and host rock, including gangue dumps, are important sources of toxic elements with a high potential to contaminate surface water and groundwater. Surface water in coal mining areas and groundwater in active and abandoned coal mines have been observed to be polluted by trace elements such as arsenic, mercury, lead, selenium, and cadmium. Understanding the leaching behavior and mechanisms of these elements helps control the pollution they cause. The leaching and migration of trace elements are controlled mainly by two factors: the occurrence of the elements and the surrounding environment. Traditionally, element occurrence and leaching mechanisms have been investigated with geochemical methods. In this research, a data mining method was applied to find the relationships and patterns concealed in the data matrix. From a geochemical point of view, these patterns reflect the occurrence and leaching mechanisms of trace elements in coal and host rock. An unsupervised machine learning method, principal component analysis (PCA), was applied to reduce the dimensions of the data matrices of the solid and liquid samples, and the transformed data were then clustered with a Gaussian mixture model to find their coexisting patterns.


Introduction
Coal is a complex system that contains most elements in the periodic table. Coal originated from organic matter containing virtually every element in the periodic table; the main element is carbon, but trace elements are also present. Elements with relatively high contents in coal and host rock include iron (Fe) and aluminum (Al), each of which usually accounts for 1-20% of the rock, and sodium (Na), potassium (K), calcium (Ca), and magnesium (Mg), each of which usually accounts for 0.01-10% of the rock. Trace elements are those present at the 10-10,000 ppm level in coal, rocks, soil, etc. A variety of chemicals are associated with coal, found either in the coal itself or in the rock layers above and beneath the coal seams [1]. Some trace elements are of great health concern. For example, lead (Pb) accounts for most cases of pediatric heavy metal poisoning and makes it difficult for children to learn, pay attention, and succeed in school. Mercury (Hg) exposure puts newborns at risk of neurological deficits and adults at increased cardiovascular risk. Arsenic (As) can cause heavy metal poisoning in adults and can accumulate in the body under chronic exposure. With the development of artificial intelligence and machine learning techniques, multivariate parameter problems can be solved, or mined, to discover knowledge or criteria. In geochemistry, such problems can be addressed with multivariate analysis methods, which can be classified as supervised, unsupervised, or semi-supervised depending on whether the target parameters are labeled.
Unsupervised algorithms include principal component analysis (PCA), factor analysis (FA), clustering analysis (CA), and positive matrix factorization (PMF), while supervised algorithms include linear regression, logistic regression, support vector machine (SVM), decision tree (DT), random forest (RF), artificial neural network (ANN), and discriminant analysis (DA).
When the target parameter can be labeled, a supervised machine learning algorithm should be preferred, because more accurate and stable models can be expected. In the USA, one study sought to identify the sources of salt ions (Mg, Cl, and Na). Because the samples were collected from known sites or environments (oceans, atmospheric deposition, weathering of common rocks, minerals and soils, salt deposits and brines, landfills, wastewater and water treatment, and agriculture), they could be labeled, and discriminant analysis and clustering analysis were applied [35]. In Belgium, a Bayesian isotope mixing model was used to estimate the proportional contributions of multiple nitrate sources in surface water [36]. In coal mines, water inrush constantly threatens production and human health and causes financial losses, and source apportionment techniques are used to determine the source of the inrush water [37]. Water inrushes can be attributed to four sources: the Quaternary sand-gravel pore aquifer, the Dyas sandstone aquifer, the Ordovician and Carboniferous limestone aquifers, and abandoned coal mine districts. Different sources show different features and require suitable treatment strategies. Setting up a discriminant model requires an established geochemical and data mining analytical protocol. Because the samples were collected from identified aquifers, a supervised machine learning method could be used. Huang et al. [37] proposed the Piper-PCA-Bayes-LOOCV discrimination model to determine water inrush types in coal mines. The Piper diagram is a geochemical technique for displaying water characteristics, and it was used in that research to screen out abnormal samples. PCA was used to reduce the dimensions of the sample matrix so that fewer variates could represent all of the original variates. Then, a supervised ML model, Bayes DA, was trained and applied to discriminate water sources.
LOOCV (leave-one-out cross-validation) was used to validate and improve the quality of the model. Wang et al. used discriminant analysis to determine water bursting sources in coal mines [38].
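Leave-one-out cross-validation can be sketched in a few lines of Python with NumPy. The sketch below is not the Bayes DA of Huang et al. [37]: as a stand-in, it uses a simple nearest-centroid classifier (an assumption chosen for brevity), and the function names `nearest_centroid_predict` and `loocv_accuracy` are hypothetical. What it illustrates is the validation loop itself: each sample is held out once, the model is trained on the remaining n - 1 samples, and the held-out sample is predicted.

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, x):
    """Stand-in classifier (assumption): assign x to the class with the closest centroid."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    distances = np.linalg.norm(centroids - x, axis=1)  # Euclidean distance to each centroid
    return classes[np.argmin(distances)]

def loocv_accuracy(X, y):
    """Leave-one-out CV: train on n - 1 samples, predict the held-out one, repeat n times."""
    n = len(y)
    correct = 0
    for i in range(n):
        mask = np.arange(n) != i  # every sample except the i-th
        pred = nearest_centroid_predict(X[mask], y[mask], X[i])
        correct += int(pred == y[i])
    return correct / n
```

In practice, `X` would hold the (possibly PCA-reduced) hydrochemical parameters of each water sample and `y` its known aquifer label; the returned fraction is the LOOCV classification accuracy, and any classifier, including a Bayes discriminant, could replace the nearest-centroid stand-in.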
Compared with supervised ML methods, unsupervised ML algorithms are used more frequently, because samples are not always labeled. Pumure et al. [39] investigated the occurrence of selenium and arsenic in coal with a two-step PCA, finding that ultrasound-leachable selenium concentrations were associated with 14 Å d-spacing phyllosilicate clays (chlorite, montmorillonite, and vermiculite, all 2:1 layered clays), while ultrasound-leachable arsenic concentrations were closely related to the concentration of illite, another 2:1 phyllosilicate clay. PCA and PMF are often used to identify the sources of trace elements. For example, lake sediment in southwest China was analyzed using PCA [40], showing that Cd, Hg, Pb, Zn, and As came mainly from nonpoint anthropogenic sources, especially atmospheric emissions from nonferrous metal smelting and coal consumption [41]. In Costa Rica, eight important sources of PM2.5 and PM10 were identified using PMF. Vehicle exhaust, residual oil combustion, and fresh sea salt were the first three sources; crustal or dust aerosols, organic carbon and sulfate, secondary sulfate, secondary nitrate, and heavy fuels were the other potential sources [42]. In Pakistan, factor analysis was used to identify sources of surface soil contamination: Ni, Cr, Zn, and Cu were found to originate from industrial activities and vehicular emissions, anthropogenic activities such as automobiles contributed Pb, Cd, and Co, and some other important contaminants, including Fe and Mn, were of natural origin [43]. In Turkey, PCA was used to find latent factors influencing water quality, and mineral pollution, nutrient pollution, and organic pollution were identified as the major factors.

Site description and sampling
This study was carried out in the Xuzhou-Datun coal mine district, located in northwestern Jiangsu Province, eastern China (Figure 1). Xuzhou lies in the Huanghuai Plain, in the southern part of northern China. The sedimentary strata overlying the Archean system are, from bottom to top, the Simian, Cambrian, middle-lower Ordovician, middle-upper Carboniferous, Permian, Jurassic, Cretaceous, Tertiary, and Quaternary systems. The hydrogeological cell selected for this study is isolated by a series of faults and includes the Sanhejian, Yaoqiao, and Longdong coal mines shown in Figure 1. In this area, groundwater flows from northeast to southwest.
The coal seams being mined are located in the Carboniferous and Permian systems; the former includes the Benxi and Taiyuan formations, and the latter includes the Shanxi and Lower Shihezi formations, listed from bottom to top in both systems. The Permian strata contain mostly low-sulfur gas coal and fat coal. The lower formation in the Carboniferous system has a higher sulfur content than the upper layers. The sulfur mass percentage in the Permian Shanxi formation coal seams is about 0.83% in coal seam No. 7 and 1.09% in coal seam No. 9. In coal seams No. 17 and No. 19 in the Taiyuan formation, the average sulfur contents were 1.87% and 3.49%, respectively. The two mined coal seams in the Permian system, No. 2 and No. 7, were included in this study; they are located in the middle Lower Shihezi formation (No. 2) and the Shanxi formation (No. 7). These two formations are 187-302.95 m and 81.67-136.13 m thick, respectively. White feldspar, quartz granule sandstone, and siliceous-mudstone cementation are the main components of the lower Shanxi formation; siltstone, siderite, carbonaceous mudstone, and plant-fossil clasts can also be found. Gray mudstone, sandy mudstone, and sandstone are the major rocks in the middle Shanxi formation, with some siliceous mudstone and siderite also present.
There are six aquifers in the sedimentary strata of the hydrogeological cell: a grit aquifer in the Quaternary; a conglomerate aquifer in the Jurassic; two sandstone aquifers, one in the Lower Shihezi formation and one above the coal seam in the Shanxi formation; and two limestone aquifers, one in the Carboniferous Taiyuan formation (180-200 m thick) and the other in the Ordovician (600 m thick). The last two aquifers are the main water sources of the coal seams.

Leaching experiments and sample test
A total of 16 water samples and 28 rock/coal samples were collected from the study area. Water samples were collected in 1000 mL Nalgene bottles that had been acid-cleaned and rinsed twice with the water to be sampled. The pe and pH of the water samples were measured in the field with a JENCO 6010 pH/ORP meter. Coal and rock samples were collected from the working area of the mine and placed in plastic bags that were immediately sealed.
Major ions and physical parameters of the water samples were determined according to Chinese standard protocols at the Jiangsu Provincial Coal Geology Research Institute. Solid samples were acid-digested to determine trace element concentrations. Trace element concentrations in the water, coal, and rock samples were determined by ICP-MS and ICP-AES. ICP-MS analysis was carried out at the China University of Mining and Technology using an X-Series ICP-MS (Thermo Electron Co.) with Rh as the internal standard; the limit of detection was 0.5 pg/mL and the analytical deviation was less than 2%. ICP-AES analysis was carried out at Nanjing University using a JY38S ICP-AES; the limit of detection and deviation for this instrument are 0.01 μg/mL and less than 2%, respectively.
Leaching experiments were conducted in batch mode, following previous studies [44,45], to simulate conditions in a coal seam where water movement is slow and dissolution reactions tend toward equilibrium. To simulate a "closed environment" (with low pO2; see Stumm and Morgan [46] for details), the bottles were closed with rubber stoppers and samples were withdrawn using syringes. The pe of the solution during the experiments was measured with a JENCO 6010 pH/ORP meter.
Three subsamples were used for each sample, one per 1000 mL aliquot of deionized water at the following pH values: 2, 5.6, 7, and 12. The flasks were sealed and shaken every 2 h for up to 10 days, and the temperature was kept at about 40°C with a water bath. Leachate solutions were collected using syringes at 2, 6, 24, and 48 h. HNO₃ (0.5 mol/L) was added to all the samples. Leachate aliquots were titrated with HCl or NaOH, depending on the pH conditions, to compare the leaching behavior of elements in acidic, neutral, and alkaline environments. In addition to the leaching experiments, water samples, including those collected from Zhaoyang Lake and Yunlong Lake shown in Figure 1, were shaken every 2 h for up to 10 days at a constant temperature of 40°C.

Multivariate analysis
Univariate statistical analysis of large datasets can be cumbersome and can lead to misunderstanding and errors in interpretation, whereas multivariate statistical techniques are more robust. Multivariate techniques are therefore more useful for treating environmental data and identifying anomalous patterns. During the migration of trace elements from coal seams to groundwater and surface water, both the solid and liquid phases of a complex matrix system are involved. In each phase, the elements show different or similar coexisting patterns and migration behaviors, including dissolution, transport, and adsorption. Multivariate analysis can therefore be used to identify similar and different components, which point to similar and dissimilar occurrences in the solids and to migration mechanisms during water-rock interaction.
In hydrochemical studies, PCA has been widely used to reduce dimensions and analyze the relationships among variates and samples [32][33][34][47][48][49][50][51]. PCA is a typical unsupervised analytical method. To compute the PCA result, the data are first standardized by mean-centering each column of the original data matrix and then dividing each value in the column by the column standard deviation. PCA reduces the large data matrix to smaller matrices of PC loadings and scores. The PC loadings are derived from the eigenvectors of the correlation matrix, and the PC scores are obtained by projecting the standardized data onto those eigenvectors. Each score therefore combines information on all of the variables into a single number, with the loadings indicating the relative contribution that each variable makes to that score. The PCs are calculated so that they account for the correlations present in the original data while being uncorrelated with each other. Typically, the data can be reduced to two or three dimensions that represent the majority of the variance in the original data; sometimes more dimensions must be included to represent more of the variance [33]. Based on the PCA result, the loadings and scores were then clustered in the reduced dimensions. Because the coordinate axes were rotated to maximize the element loadings, the rotated components are labeled RCs.
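The standardization, eigendecomposition, and rotation described above can be sketched with NumPy as follows. This is a minimal illustration, not the psych implementation used in this study: it assumes PCA on the correlation matrix, scales the eigenvectors by the square root of their eigenvalues to obtain loadings, and applies a plain varimax rotation. Varimax is the default rotation of `psych::principal`, but this sketch omits Kaiser normalization, so rotated values may differ slightly. The function names `pca_correlation` and `varimax` are hypothetical.

```python
import numpy as np

def pca_correlation(X, n_components=2):
    """PCA on the correlation matrix of X (rows = samples, columns = variables)."""
    # Standardize: mean-center each column, then divide by its sample standard deviation.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)          # correlation matrix of the variables
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh, because R is symmetric
    order = np.argsort(eigvals)[::-1]         # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()       # proportion of variance per component
    k = n_components
    loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])  # correlation of each variable with each PC
    scores = Z @ eigvecs[:, :k]               # projection of each sample onto the PCs
    return loadings, scores, explained

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a loading matrix (no Kaiser normalization)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        L = loadings @ rotation
        target = L ** 3 - (gamma / p) * L @ np.diag(np.sum(L ** 2, axis=0))
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                     # nearest orthogonal matrix
        new_criterion = np.sum(s)
        if new_criterion < criterion * (1 + tol):
            break                             # criterion stopped improving
        criterion = new_criterion
    return loadings @ rotation
```

Each row of the rotated loading matrix gives one element's coordinates on the RC axes, and these 2D points are what the clustering step groups. Because the rotation is orthogonal, each variable's communality (the sum of its squared loadings) is unchanged by it.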
A biplot of the PCA result is usually drawn to show the patterns of both parameters and samples. However, the loadings and scores on the PCA axes show different aspects of the result: in our study, the loadings in each plot show the coexisting patterns of elements, and the scores show the coexisting patterns of samples. We focus on the coexisting patterns of elements to reveal their migration mechanisms. Clustering the loadings shows similarities and differences among elements and parameters, from which their coexisting behavior can be summarized. Clustering the scores shows similarities and differences among samples, from which the coexisting behavior of samples, that is, the types of solid and liquid samples, can be summarized. The clustering method was based on the Gaussian mixture (GM) model. Unlike the K-means algorithm, which separates groups with hard boundaries, the GM model allows groups to overlap, so the probability that each point belongs to each group can be calculated.
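The soft assignment that distinguishes the GM model from K-means can be illustrated with a small expectation-maximization (EM) loop in NumPy. This is a hypothetical sketch, not the mclust code: it fits a single unconstrained (full-covariance) model with a fixed number of components, initializes the means with a farthest-point heuristic and the covariances as isotropic, and adds a small ridge `reg` to each covariance for numerical stability; all of these choices are assumptions made for brevity. The matrix `resp` holds, for every point, the probability of belonging to each group.

```python
import numpy as np

def log_gaussian(X, mean, cov):
    """Log density of a multivariate normal evaluated at every row of X."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum((diff @ np.linalg.inv(cov)) * diff, axis=1)  # squared Mahalanobis distance
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def e_step(X, weights, means, covs):
    """Responsibilities (membership probabilities) and total log-likelihood."""
    log_dens = np.column_stack([np.log(w) + log_gaussian(X, m, c)
                                for w, m, c in zip(weights, means, covs)])
    top = log_dens.max(axis=1, keepdims=True)  # log-sum-exp for numerical stability
    log_norm = top + np.log(np.exp(log_dens - top).sum(axis=1, keepdims=True))
    return np.exp(log_dens - log_norm), float(log_norm.sum())

def gmm_em(X, k, n_iter=100, reg=1e-6):
    n, d = X.shape
    # Farthest-point initialization of the means (deterministic; an assumption).
    idx = [0]
    for _ in range(1, k):
        dist = np.min([np.sum((X - X[i]) ** 2, axis=1) for i in idx], axis=0)
        idx.append(int(np.argmax(dist)))
    means = X[idx].astype(float)
    init_var = X.var(axis=0).mean()  # isotropic initial covariances (an assumption)
    covs = np.array([init_var * np.eye(d) for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        resp, _ = e_step(X, weights, means, covs)        # E-step: soft memberships
        nk = resp.sum(axis=0)                            # M-step: weighted re-estimation
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    resp, loglik = e_step(X, weights, means, covs)
    return weights, means, covs, resp, loglik
```

Taking `resp.argmax(axis=1)` reproduces a hard, K-means-like labeling, while the full rows of `resp` keep the classification probability for each group; a point lying between two clusters receives intermediate probabilities instead of being forced across a stiff border.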
We used R as the analytical tool; the packages psych and mclust were used to compute the PCA and the GM model clustering results, respectively.

Geochemical analysis
A total of 16 water samples were collected from the study site: 12 coal mine waters, two surface waters, and two carbonate waters. The major ion concentrations are plotted in a Piper diagram (Figure 2). Figure 2 suggests that the carbonate water and coal mine water are medium-mineralized, whereas the surface water is low-mineralized. The surface water is Na-Mg-Ca-Cl-SO₄ type … [SO₄²⁻], [Cl⁻], TDS, and hardness were also higher than the Chinese regulatory limits. The combination of higher Ca²⁺, Mg²⁺, HCO₃⁻, and SO₄²⁻ concentrations in the groundwater suggests that coupled reactions involving sulfide oxidation and carbonate dissolution largely control solute acquisition in the study area [52].
PCA was used to reduce the dimensions of the water data matrix; in this case, the dimensions are the water parameters. Water samples are described by tens of conventional inorganic and organic parameters, some of which indicate the environment and reaction pathways, while others are redundant or collinear. PCA not only addresses parameter redundancy and collinearity but also reveals the principal components of the data matrix, and the relationships among parameters, and between parameters and samples, can be shown using the parameter loadings and sample scores, respectively.
In this study, PCA was computed in the conventional way, and the principal components and the variance each one explains were calculated. Sixteen parameters were measured in the original table, so the PCA produced 16 new components representing the original parameters, ordered by the variance they explain. The first six components explained 29, 21, 17, 10, 9, and 5% of the variance, respectively. Balancing more explained variance against fewer components, we chose two principal components to represent the sample data. The GM method was used to group the ions and trace elements in the water samples, as shown in Figure 3. The parameters were clustered into four groups: … and pH; group 4 includes As, Hg, Se, Cd, and Pb. Because the samples were collected in or around the coal mine district, the clustering result is representative, and the groups were distinctly separated from one another. The clustering result suggests that group 2 represents carbonate dissolution and group 4 represents the trace elements, so trace element contamination can be identified from this result.
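The reported percentages can be checked with simple arithmetic. Assuming the PCA was performed on the correlation matrix of the 16 standardized parameters, as described in the Multivariate analysis section, the eigenvalues sum to 16 (the trace of that matrix), so the proportion of variance explained (PVE) by component j is its eigenvalue divided by 16:

```latex
\begin{aligned}
\mathrm{PVE}_j &= \frac{\lambda_j}{\sum_{i=1}^{16} \lambda_i} = \frac{\lambda_j}{16},
  \qquad \lambda_1 \approx 0.29 \times 16 = 4.64,\\
\mathrm{PVE}_1 + \mathrm{PVE}_2 &= 29\% + 21\% = 50\%,\\
\sum_{j=1}^{6} \mathrm{PVE}_j &= 29\% + 21\% + 17\% + 10\% + 9\% + 5\% = 91\%.
\end{aligned}
```

The two retained components therefore carry about half of the total variance of the water data, while the first six together carry about 91%.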

Leaching mechanism of trace elements from the coal host rock
To investigate the leaching mechanism of trace elements from the coal host rock, both rock samples and water samples were tested. The rock samples were collected from the coal roof and processed with a standard treatment to determine their element contents. The milled rock samples were mixed with deionized water in batch experiments to observe and evaluate the leaching behavior and mechanisms of the trace elements from rock to water. The major and trace element concentrations in the host rock and leachate are listed in Table 1 of Shan et al. [53]. We hypothesized that the occurrence and leaching mechanisms of the trace elements in the solid samples are related to their concentrations in the water samples. Therefore, PCA was applied to reduce the dimensions of the rock and water data, and the results for the solid and liquid samples are discussed in parallel.
For the rock samples, 18 elements were measured and PCA was applied. The first two components explained 91% of the total variance, so these two PCs were used to represent the data. For the water samples, 16 ions and trace elements were measured and the same analysis was applied; the first two PCs explained 87% of the total variance and were used to represent the water samples. With the new PCs, each parameter receives a loading on every component, so the parameters of the rock and water samples can be plotted in two-dimensional (2D) scatter diagrams. Figure 4 shows the elements of the rock samples, and Figure 5 shows the ions and elements of the water samples. The PCA-treated data were clustered using the expectation maximization (EM) algorithm, which can produce several candidate clustering models. Considering the BIC score and the parsimony of each model, the parameters in the rock samples were clustered into three groups. The first group includes Mo, Pb, Cr, V, Ti, and Al (solid circles); the second group includes Zn, Ba, Mn, Fe, Mg, As, Hg, Se, and Cd (hollow squares); and the third group includes Cu, Sr, and Ca (solid triangles). As mentioned before, the clustering helps analyze the occurrence of elements in solid samples. Cr has a high affinity for clay and ash yield in gangue [3]. Zhou et al. [2] reported strong associations of Pb and Se with Fe in gangue, indicating a high sulfide-mineral affinity. Zn and Cd were found to be highly associated with pyrite and sphalerite. Xiong et al. [26] found that Cd occurs mainly in sulfide form in coal host rock. As and Mo occur mainly in carbonate- and silicate-related forms. Finkelman et al. [3] found that Mo, Pb, Cr, Ti, and Al occur mainly in clay minerals, As, Hg, Cd, and Zn mainly in sulfide form, and Ca and Sr mainly in carbonate-related forms.
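The BIC criterion used to choose among the EM clustering models can be sketched as follows. mclust reports BIC as 2 log L − ν log n, where L is the maximized likelihood, ν the number of free parameters, and n the number of points clustered, and it prefers the model with the largest value. This sketch counts parameters only for the unconstrained full-covariance model (mclust's VVV); mclust also fits constrained covariance models with fewer parameters, which the sketch does not reproduce. The function names are hypothetical, and the candidate log-likelihoods would come from fitted models.

```python
import math

def n_params_full_gmm(k, d):
    """Free parameters of a k-component, d-dimensional GMM with unconstrained covariances:
    (k - 1) mixing weights + k*d means + k*d*(d+1)/2 covariance entries."""
    return (k - 1) + k * d + k * d * (d + 1) // 2

def mclust_style_bic(loglik, n_params, n):
    """BIC in the mclust convention: 2*logL - nu*log(n); larger values are preferred."""
    return 2.0 * loglik - n_params * math.log(n)

def select_model(candidates, n):
    """candidates maps a model name to (loglik, n_params); return the name with the largest BIC."""
    return max(candidates, key=lambda name: mclust_style_bic(*candidates[name], n))
```

With 18 rock elements clustered in two dimensions, for example, an unconstrained three-component model already has (3 − 1) + 3·2 + 3·3 = 17 free parameters, so the penalty term strongly favors more parsimonious models, which is why BIC is weighed together with the conciseness of each model.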
These PCA results corroborate previous studies. As Figure 4 shows, the first group represents clay-affinity elements, the second group represents elements with sulfur-mineral affinity, and the third group represents carbonate-related elements.
The ions and trace elements in the rock leachate were clustered into three groups: the first includes Al, Si, Cr, Mn, Fe, Cd, and Pb; the second includes Ti, V, As, Se, Mo, and Hg; and the third includes Zn, Sr, and Ba. The coexisting patterns of ions and elements in the water are controlled not only by their occurrence in the rock but also by water-rock interaction and adsorption, so the clustering results for the solid and liquid samples are not exactly the same. However, the two results can be compared to identify certain or probable reaction mechanisms along the water-rock interaction pathway, and comparing the three water groups with the three solid groups allows a preliminary deduction. The first group of elements in the water samples suggests a reaction pathway involving clay minerals and water: when clay minerals react with water, illite may transform to kaolinite, releasing elements such as Cr. Cd was clustered into the second group in the rock analysis but into group 1 in the water analysis. This can be explained in two ways: first, Pb and Cd are hosted in both sulfur minerals and clay minerals; second, Pb and Cd are controlled not only by dissolution but also by adsorption. At low pH, metals tend to be released, whereas at higher pH they can be adsorbed. According to our observations, Pb and Cd concentrations in surface water in the coal mine district were evidently higher than those in the non-coal-mine district. As, Hg, and Se show similar patterns in the solid and liquid samples, indicating that they were controlled by the dissolution of sulfur minerals. Although the sulfur-mineral content of our rock samples was not high, the oxidation and dissolution processes were pronounced, leading to the release of toxic trace elements.

Leaching mechanism of trace elements from coal
The major and trace element concentrations in coal and its leachate are listed in Table 1 of Shan et al. [53]. The same analytical method used for the rock was applied to the coal and coal leachate, and the PCA and clustering results are shown in Figures 6 and 7. Two principal components explained 96% and 91% of the variance for the coal and the leachate, respectively. As Figure 6 shows, the elements in coal are clustered into four groups: group 1 includes Mo, Pb, Cr, V, Cu, Ti, Al, Hg, and Se; group 2 includes Zn and Cd; group 3 includes Ba, Mn, Sr, Mg, and Ca; and group 4 includes Fe and As. The ions and trace elements in the coal leachate (Figure 7) were clustered into three groups: group 1 includes Al, Se, and Pb; group 2 includes Si, As, Sr, Mo, and Hg; and group 3 includes Ti, Cr, Mn, Fe, Zn, Cd, and Ba. Finkelman et al. [3] investigated the occurrence of most trace elements and found that 65% of Ti, 90% of Al, 75% of Cr, and 25% and 30% of Cu and Mo, respectively, are in clay minerals; little Pb and Se are in clay form; 75% and 65% of Zn and Cd, respectively, occur in monosulfide form; and 70% and 90% of As and Hg, respectively, occur in sulfide form. Pumure et al. [39] argued that As and Se usually occur in clay minerals. Pb has been found in sulfides such as pyrite and galena [54] and in organic form [55]. Combining the literature review and the PCA-clustering analysis, group 1 for the coal samples represents clay affinity, groups 2 and 4 represent sulfur-mineral elements, and group 3 is related to carbonate minerals. Group 2 contains only two elements, Zn and Cd, which is consistent with some previous studies [2,56]. We conclude that As, Hg, and Cd occur mainly in sulfide minerals, while Pb, Cr, and Se occur mainly in clay minerals. Zn and Cd are the primary elements in sphalerite, which is more likely to form an independent mineral in coal than in the host rock.
The clustering results for the coal leachate differed somewhat from those for the coal. Compared with rock, coal is a more complex matrix consisting of organic and mineral matter, the latter including crystalline minerals, non-crystalline mineraloids, and elements with non-mineral associations [55]. However, some patterns can be identified. Group 1 includes Al, Se, and Pb, similar to group 1 in the coal analysis, so it represents elements originating from clay minerals. Group 2 represents elements related to sulfur-bearing minerals. As and Hg show similar behavior patterns in the solid and liquid matrices, so their leached products in water came mainly from the dissolution of their host minerals, the sulfides. As in the host rock analysis, even a low content of sulfur minerals may lead to elevated trace element concentrations. The trace elements Se, Cr, and Pb also show similar behavior patterns in the solid and liquid matrices, suggesting dissolution of their host minerals. According to the literature and the coexisting analysis, these elements usually occur in continental-facies minerals, such as clay minerals.

Conclusion
A data mining workflow combining principal component analysis and the Gaussian mixture model was applied to determine the occurrence and leaching mechanisms of trace elements transferred from coal and rock to surface water and groundwater. In the rock samples, Se, Cd, Hg, and As were associated with sulfide minerals, Be and V occurred in carbonate minerals, and Cr and Pb occurred mainly in clay minerals. In coal, As and Hg occurred mainly in sulfide minerals, while Se, Cr, and Pb were embedded in clay minerals.
When the host rock is leached with water, As, Hg, and Se originate from the oxidation and dissolution of sulfur minerals, especially pyrite, while Cr is mainly controlled by the transformation of clay minerals. When coal is leached with water, As and Hg show a high affinity for sulfur minerals, and Se and Cr appear to be controlled by water-rock interaction with clay minerals. This suggests that Se exists in sulfide minerals, clay minerals, and organic matter, so its leaching mechanism is not unique, and multiple mechanisms may control or influence its leaching behavior. Cd and Pb showed apparent differences between the solid and liquid samples, which are probably explained not only by the release process but also by adsorption. These typical metal elements are easily adsorbed in alkaline and neutral environments, so the released metals were adsorbed by clay minerals and organic matter. Their migration mechanisms and long-term environmental impact need further study.