Statistical evaluation of the geochemical data for prospecting polymetallic mineralization in the Suoi Thau – Sang Than region, Northeast Vietnam

The Suoi Thau-Sang Than region in Northeast Vietnam is considered a high-potential area for polymetallic deposits. In this study, 1,720 geochemical samples were used to investigate polymetallic mineralization. Polymetallic ore occurrences were discovered in the study region, and statistical and multivariate analysis helped to define geochemical anomalies in three zones, namely Suoi Thau, Sang Than, and Ban Kep. The statistical and cluster analyses of the geochemical data indicate that Cu, Pb, and Zn are good indicator elements and that most of them follow a lognormal or gamma distribution. Based on the third-order threshold, the geochemical anomalies of Cu, Pb, and Zn reflect the concentration of copper forming ore bodies in the mineralized zone and are clearly concentrated in three distinct zones. Trend surface analysis, employed to determine spatial variations and relationships among these indicator elements and the anomalous areas, revealed relative changes in the content of the indicator elements that can be considered regular. Moreover, the goodness-of-fit results show that a third-degree trend surface model best describes the Pb, Zn, and Cu data. These results indicate that the models can be useful in studying geochemical anomalies and analyzing the concentration tendency of indicator elements in the Suoi Thau-Sang Than region. The statistical analysis also suggests that the river bottom sediments of the region have a remarkable potential for investigating polymetallic mineralization. Finally, the geochemical data can help to evaluate geochemical anomalies of the pathfinder elements and to map the mineral potential of the Suoi Thau-Sang Than region in Northeast Vietnam.


INTRODUCTION
Geochemical anomalies, which frequently appear in many mineral deposits, are values that differ from the normal background, and they make it possible to identify a mineral deposit. During geochemical data processing, samples whose values are much higher or lower than the background are considered anomalies (Reimann 2000, Chen J.R. et al. 2015, Chen D. et al. 2019).
To obtain representative sample sets and avoid misinterpreting the studied object, many methods have been applied, of which statistical and multivariate analysis is regarded as the most popular (Hawkes & Webb 1962, Williams 1967, Beus & Grigoryan 1975, Reimann & Filzmoser 2000, Chen J.R. et al. 2015, Chen D. et al. 2019). Threshold values (mean ±3 SDEV, where SDEV is the standard deviation) established for each type of mineral deposit can be applied to the sample values to identify a new mineral deposit (Aitchison 1986, Rose et al. 1991, Jolliffe 2002, Ghadimi et al. 2016, Chen D. et al. 2019).
In Northeast Vietnam, the Suoi Thau-Sang Than region is an important location that has attracted considerable attention from geologists prospecting for polymetallic ore (i.e., Cu, Pb, Zn, and Au) (Tri & Khuc 2011). Furthermore, it plays a critical role in providing valuable metals for industry, as zinc, lead, copper, and gold can be found together in polymetallic mineralization such as the Fe and Cu deposits of Suoi Thau (Rankin 2011, USGS 2014, Graedel et al. 2015). The area has been covered by geological mapping and mineral prospecting at scales of 1:500,000-1:50,000 from 1965 until now (Dovjikov 1965, Bat 1989, Minh 1992, Son 2003). However, geological sample collection and geochemical data processing are still insufficient to identify potential areas of polymetallic deposits. Consequently, further investigation of the Suoi Thau-Sang Than region is important for defining new polymetallic ore occurrences.
In this study, statistical and multivariate analysis are applied to 1,720 geochemical samples in order to assess the Suoi Thau, Sang Than, and Ban Kep zones in Northeast Vietnam in terms of their potential for new polymetallic ore occurrences.

GEOLOGICAL SETTINGS
The Suoi Thau-Sang Than region belongs to the Tung Ba block in Northeast Vietnam (Fig. 1A, B). The lithology of the study area consists mainly of Devonian sedimentary rocks (i.e., clay shale, carbonate rocks, and marly sandstone), Triassic gabbros, and Paleozoic granitoid rocks (Minh 1992, Son 2003, Hung 2010). Quaternary sediments (i.e., conglomerate, sandstone, and gravel) are distributed mainly along rivers in the northeast and southwest parts of the study area. The Suoi Thau-Sang Than region is located in the northwest part of a synclinorium complex that extends from the northwest to the southeast and contains overlapping secondary synclinorium complexes (Fig. 1B, C). The Duong Thuong-Du Gia overthrust/reverse fault in the northern part and the Ban Coc-Minh Ngoc reverse dip-slip fault in the southern part play an important role in controlling the Tung Ba structural block (Dovjikov 1965). In this area, the magmatic intrusive rocks were strongly affected by these faults and by other smaller fault systems, which has made the structure of the area more complicated (Son 2003).
There are three main polymetallic mineralization zones in the study area, namely Suoi Thau, Sang Than, and Ban Kep. They run from the northwest to the southeast with a length of 380-3,800 m and are mainly hosted by Devonian sedimentary rocks (Sinh 1985, Bat 1989, Minh 1992) (Fig. 1C). According to Son (2003) and Sang (2011), uneven concentrations of Cu, Pb, and Zn were found in these mineralized zones, as illustrated by Hung et al. (2020).

Bottom sediments and sample collection
Bottom sediment samples are frequently employed in geochemical exploration when prospecting for mineral deposits. In this study, 1,720 geochemical samples of recent bottom sediments were collected at 25-50 m intervals along rivers and streams. To extract the fine, recent sediment, the surface sediment (0-3 cm depth) was taken with a flat hand shovel as sub-samples from all points (approximately 50-100 m on both river banks) with low current velocities. Depending on the grain size of the sediment, approximately 25-130 g of recent bottom sediment was used for each sample.

Fig. 1. [...] Leloup et al. (1995, 2001); DNCV - the Con Voi mountain range (A). Tectonic sketch map of Northeast Vietnam modified from Dovjikov (1965) [...]

Based on the different characteristics of the bottom sediments, the samples were divided into three zones, namely Suoi Thau (518 samples), Sang Than (699 samples), and Ban Kep (503 samples), and each sample set was processed separately. Inductively Coupled Plasma Mass Spectrometry (ICP-MS) was used to measure the concentration of 16 chemical elements (i.e., Cu, Pb, Zn, Fe, Mn, Ti, Co, Ni, Cr, Mg, Ca, Ba, V, Na, Y, and Zr).

Data transformation
All 16 element variables (i.e., Cu, Pb, Zn, Fe, Mn, Ti, Co, Ni, Cr, Mg, Ca, Ba, V, Na, Y, and Zr) measured in the bottom sediment samples were processed in this study. Variables that do not have a symmetric distribution were tested for normality on the basis of skewness (statistical distribution test) and then transformed (Reimann & Filzmoser 2000). Lognormal and gamma transformations were also carried out to bring the skewed variables closer to normality (Aitchison 1986, Egozcue et al. 2003, Carranza 2011).
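The skewness check and log transformation described above can be sketched as follows. This is only an illustration under stated assumptions: the cut-off `limit`, the function names, and the Cu values are hypothetical and do not come from the study, and only the logarithmic transformation (not the gamma case) is shown.

```python
import math
import statistics


def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_transform_if_skewed(values, limit=1.0):
    """Return (values, flag); log10-transform when |skewness| exceeds `limit`.

    `limit` is a hypothetical cut-off; the paper does not state one.
    All values must be positive for the logarithm to exist.
    """
    if abs(skewness(values)) <= limit:
        return list(values), False
    return [math.log10(v) for v in values], True


# Hypothetical Cu contents (ppm), invented for illustration only.
cu_ppm = [12, 15, 18, 20, 22, 25, 30, 45, 120, 900]
cu_log, changed = log_transform_if_skewed(cu_ppm)
print("skewness before:", skewness(cu_ppm))
print("skewness after: ", skewness(cu_log), "transformed:", changed)
```

For strongly right-skewed contents such as the invented values above, the transformed variable is noticeably closer to symmetry, which is the purpose of the transformation step.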

Multivariate analysis
In the assessment of statistical data, multivariate analysis methods are used to clarify and explain the relationships among the variables.
Correlation coefficients and cluster analysis help to evaluate relationships between the elements and the element groups; the results were tested using Geostatistic 9.0.
Cluster analysis aims at reducing a large data set to meaningful subgroups of individuals or items. The split is based on the similarities of the objects over a range of defined features. Ward's method (Ward 1963) is a criterion applied in hierarchical cluster analysis: it is a general agglomerative hierarchical clustering procedure in which the criterion for selecting the pair of clusters to combine at each stage is the optimum value of an objective function.
The covariance and correlation coefficient matrices are represented by their eigenvalues and eigenvectors, and varimax rotation was applied to improve the factor loadings. Hierarchical cluster analysis based on Pearson's correlation coefficient was performed with Ward's method, and the results are given in a dendrogram.
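A minimal sketch of hierarchical clustering of elements with Ward's criterion is given below. It is not the procedure of the software used in the study: it assumes that the dissimilarity between two elements is 2(1 - r), which is proportional to the squared Euclidean distance between standardized variables, and it updates distances with the Lance-Williams formula for Ward's method. The element contents are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ward_on_correlation(columns):
    """Cluster variables; return merges as (members_a, members_b, distance)."""
    names = list(columns)
    clusters = {i: (name,) for i, name in enumerate(names)}
    # Squared dissimilarity 2 * (1 - r) between every pair of variables.
    d2 = {}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson(columns[names[i]], columns[names[j]])
            d2[(i, j)] = 2.0 * (1.0 - r)
    merges = []
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), dij = min(d2.items(), key=lambda kv: kv[1])
        ni, nj = len(clusters[i]), len(clusters[j])
        merges.append((clusters[i], clusters[j], math.sqrt(max(dij, 0.0))))
        # Lance-Williams update of squared distances for Ward's method.
        updated = {}
        for k in clusters:
            if k in (i, j):
                continue
            nk = len(clusters[k])
            dki = d2[tuple(sorted((k, i)))]
            dkj = d2[tuple(sorted((k, j)))]
            updated[k] = ((ni + nk) * dki + (nj + nk) * dkj - nk * dij) / (ni + nj + nk)
        d2 = {p: v for p, v in d2.items() if i not in p and j not in p}
        for k, v in updated.items():
            d2[(k, next_id)] = v  # k is always smaller than next_id
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        next_id += 1
    return merges


# Hypothetical contents of four elements at six sites (illustration only).
data = {
    "Cu": [1, 2, 3, 4, 5, 6],
    "Co": [1.1, 2.0, 3.2, 3.9, 5.1, 6.0],
    "V": [6, 1, 4, 2, 5, 3],
    "Ni": [6.2, 1.1, 3.9, 2.2, 5.0, 3.1],
}
merges = ward_on_correlation(data)
for left, right, dist in merges:
    print(left, "+", right, "at linkage distance", round(dist, 3))
```

The list of merges is the information a dendrogram displays: strongly correlated elements join at small linkage distances, and weakly related groups join last.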

Trend surface analysis
Trend analysis is described by Davis (2002) as a statistical method for separating map data into two components, one of which is a "signal" of a geographic nature and the other is "noise" of a local nature. In geochemistry, the "trend surface" is used to describe geochemical parameters in terms of a "regional trend" and "local anomalies". The trend surface is defined as a function of the geographical position of the observation site (control point). A trend surface fitted to geochemical data can be represented by the model equation:

$$Z_k = \varphi_k + L_k \tag{1}$$

where $Z_k$ is the variable at point $k$, $\varphi_k = \varphi(x_k, y_k)$ represents the trend, and $L_k$ is the residual at point $k$. $\varphi_k$ is the value at point $k$ of the function $\varphi(x, y)$, with:

$$\varphi(x, y) = a_{00} + a_{10}x + a_{01}y + a_{20}x^2 + a_{11}xy + \dots + a_{pq}x^p y^q \tag{2}$$

The coefficients $a_{00}, a_{10}, a_{01}, a_{20}, a_{11}, \dots, a_{pq}$ are the coefficients to be determined in the trend model.
Trend surface analysis relies mainly on the residual component (the trend residual map, or trend deviation) to detect structural anomalies of the trend, and thereby to delineate and identify geochemical and geophysical anomalies, ore nodes, or geological structures such as high-order folds and small faults.
Generally, x and y form a rectangular set of coordinates; however, latitudes and longitudes are also used (Vistelius & Hurst 1964). For particular applications, p + q ≤ r, where r denotes the degree of the trend surface. Depending on the working requirements and on the characteristics and properties of the study object, first-degree (linear, r = 1), second-degree (quadratic, r = 2), third-degree (cubic, r = 3), and fourth-degree (quartic, r = 4) trend models were fitted.
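The polynomial trend model with p + q ≤ r can be fitted by ordinary least squares, and the residuals then carry the local deviations. The sketch below is one possible implementation, assuming plain normal equations solved by Gaussian elimination; the function names and the synthetic grid data are hypothetical and are not taken from the study.

```python
def poly_terms(degree):
    """Exponent pairs (p, q) with p + q <= degree, ordered a00, a10, a01, a20, ..."""
    return [(t - q, q) for t in range(degree + 1) for q in range(t + 1)]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_trend(xs, ys, zs, degree):
    """Least-squares trend surface; returns ({(p, q): a_pq}, residuals)."""
    terms = poly_terms(degree)
    rows = [[x ** p * y ** q for p, q in terms] for x, y in zip(xs, ys)]
    m = len(terms)
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    atz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(m)]
    coeffs = solve(ata, atz)
    trend = [sum(c * v for c, v in zip(coeffs, r)) for r in rows]
    residuals = [z - t for z, t in zip(zs, trend)]
    return dict(zip(terms, coeffs)), residuals


# Synthetic 4 x 4 grid: a planar regional trend plus one local anomaly.
xs = [float(i) for i in range(4) for _ in range(4)]
ys = [float(j) for _ in range(4) for j in range(4)]
zs = [2.0 + 3.0 * x - 1.0 * y for x, y in zip(xs, ys)]
zs[5] += 10.0  # hypothetical local anomaly at (x, y) = (1, 1)

coeffs, residuals = fit_trend(xs, ys, zs, degree=1)
peak = max(range(len(residuals)), key=lambda k: residuals[k])
print("largest positive residual at", (xs[peak], ys[peak]))
```

For well-conditioned problems of low degree the normal equations are adequate; for higher degrees or real coordinates in metres, centring and scaling x and y before fitting is advisable.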
In this research, the second-degree trend model is used to study the content of the indicator elements for prospecting polymetallic minerals in the Suoi Thau-Sang Than region.
Trend surface and trend residual maps give an intuitive picture of the morphology, spatial variation, and complexity of the study object, and aid the detection and delineation of its local geometry. Geologically, these local features can be anomalous element contents in the region.

Statistical distribution characteristics of the elements
The statistical distribution rules of polymetallic ores and related elements can be recognized by determining their statistical distribution models. The geochemical data of the whole region and of each zone were analyzed separately. For the whole region, the element concentrations rank as Co > Zn > Pb > Fe > Mn > Cu > Y > Ti > Cr > Mg > Zr > Ca > Ba > Ni > V > Na (Tab. 1). In particular, Co, Zn, Pb, Fe, Mn, and Cu account for more than 90%, representing a clear association for polymetallic ore. Therefore, these elements can be selected as pathfinders when prospecting for polymetallic mineralization.
Statistics were computed both for the primary sample set and for the samples selected by the three-sigma limit method (Tab. 2). The basic statistical parameters include the mean value, the variance, and the coefficient of variation. The distribution models of the elements were also tested by skewness and kurtosis, and most of the element concentrations follow a lognormal or gamma distribution (Tab. 2). The test of the distribution models and the statistical evaluation were carried out with the Geostatistic 9.0 software (Robertson 2008).
The statistical distributions of Cu, Pb, and Zn in the secondary geochemical field do not conform to the normal distribution, and the data were therefore transformed to a gamma or three-parameter lognormal distribution (Tab. 2). As a whole, the contents of Cu, Pb, and Zn are generally higher than the Clarke values of the crust (Cu* = 68 ppm, Pb* = 13 ppm, Zn* = 76 ppm; Fortescue 1992). The Cu content varies from uneven to very uneven, so it can form distinct local geochemical anomalies; the Pb and Zn contents, however, are distributed fairly uniformly to unevenly, and their variation is smaller than that of Cu. Hence, the possibility of Pb and Zn forming local anomalies in the primary geochemical field is not as clear as for Cu. Nevertheless, these data can be used to detect geochemical dispersion haloes, which serve to delineate prospective areas for polymetallic ores in the Suoi Thau-Sang Than region.

Cluster analysis
The results of the correlation analysis were used to form the pair correlation matrix of the best indicator elements in the geochemical field of the whole region and of each specific area. The pair correlation matrices of the good indicator elements are presented in Tables 3-6. The numbers above the diagonal are the correlation coefficients, and the numbers below the diagonal are the Student's t-test results. Among the indicator elements, Cu, Pb, Zn, and Co display a close association, especially the pairs Cu-Co and Pb-Zn, and together they form an element association that serves as an indicator for prospecting polymetallic ores. The results for each zone are similar to those for the whole area, but the strength of the relationships varies; in particular, Pb and Zn display a less close relationship in the Sang Than zone.
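The relation between a correlation coefficient and its Student's t-test value can be illustrated with the standard textbook formulas. This sketch is an assumption about the kind of test involved, not a reproduction of the formula used for Tables 3-6, and the numbers in the example are hypothetical.

```python
import math


def t_statistic(r, n):
    """Student's t for a Pearson correlation r computed from n samples."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def critical_r(t_crit, n):
    """Smallest |r| that is significant for a critical t with n - 2 degrees of freedom."""
    return t_crit / math.sqrt(t_crit ** 2 + (n - 2))


# Hypothetical example: r = 0.45 observed on n = 50 samples,
# tested against a two-sided critical value t = 2.01 (taken from t tables).
r_obs, n_obs, t_crit = 0.45, 50, 2.01
t_obs = t_statistic(r_obs, n_obs)
print("t =", round(t_obs, 2), "significant:", abs(t_obs) > t_crit)
print("critical |r| =", round(critical_r(t_crit, n_obs), 3))
```

Testing |t| against the critical t and testing |r| against the critical r are equivalent, so either half of a correlation table can be used to judge significance.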
Based on the results of the pair-correlation analysis among these good indicator elements in the secondary geochemical field, dendrograms were constructed to determine the relationships between the studied objects. The similarities between the indicator elements were assessed by Pearson's correlation coefficients. Dendrograms were established for the general sample set and for each zone in the Suoi Thau-Sang Than region (Fig. 2).
Note (Tabs. 3-6): $t_{0.05,\,350} = 1.96$ is the critical value for testing the significance of the cross-correlation coefficient, 0.05 is the level of significance, 350 is the number of degrees of freedom, and $|r_m| = 0.107$ is the cross-correlation for match position $m$. The $t_{\alpha,\,n-2}$ and $r_m$ parameters are given by the formula: [formula not recovered]
The correlation coefficients show a closer relationship among Cu, Pb, Zn, and Co than between Fe and Mn (Tabs. 3-6). This indicates that Fe and Mn are usually removed, which leads to wider Fe-Mn dispersion haloes, because these elements are abundant both as rock-forming and as ore-forming elements. The calculated results show a tight relationship among Cu, Pb, and Zn, which can therefore be regarded as a paragenetic element association.
The paragenetic relationships among the element groups are represented in the dendrogram. The elements are divided into two groups, a polymetallic ore group (i.e., Cu, Pb, Zn, and Co) and a rock-forming group (i.e., V, Ni, Cr, Mg, and Ti). Besides the close relationship of Cu, Pb, Zn, and Co, a relatively continuous linkage level and local branching of the V-Ni, Cr-Mg, and Cr-Ti-V groups can be observed, which shows that V, Ni, Cr, Mg, and Ti are not syngenetic with the polymetallic ores in the region.
The combination of multivariate correlation and dendrogram analysis makes it possible to estimate the significance of the syngenetic element association for prospecting polymetallic ores in the study area. Consequently, Cu, Pb, Zn, and Co are considered syngenetic elements. Although other elements show high values, they are not indicators for prospecting polymetallic ores, or they reflect another type of mineralization in this region.

Geochemical anomaly modeling
In the study area, primary and secondary geochemical fields coexist. The primary geochemical field forms simultaneously with the ore-forming process in the mineralized zones, around the ore bodies and ore zones. In this field, the contents of the major ore-forming and associated elements are higher than those in the surrounding rocks. The field is often much larger than the ore bodies and ore zones and is distributed around them. The morphology and size of the primary geochemical field make it possible to infer the distribution, depth, morphology, strike, and dip of the ore body, as well as the level of denudation. The original ore bodies may be exposed or hidden.
The ore bodies, mineralization zones, and the primary geochemical field can be destroyed and transformed under exogenous conditions. Some elements and minerals are dissolved, washed out, and carried away, while others are accumulated and enriched. The material constituents are thus redistributed to form the secondary geochemical field in the weathering environment. This field can appear on the terrain surface and cover the original ore bodies, or it can lie towards lower terrain or valleys, and it is often much larger than the ore bodies. The secondary geochemical field is of great importance in detecting the location of hidden ore bodies in the prospecting area. In order to recognize polymetallic ore in the study region, geochemical anomaly diagrams and trend analysis methods were used to model the extent of spatial variation of the elements on the basis of the location data of the site.

Fig. 2. Dendrogram of the good indicator elements in the Sang Than (A), Ban Kep (B), Suoi Thau (C), and entire area (D), respectively. The numbers indicate the linkage distance of the cluster analysis using Ward's agglomerative clustering algorithm

Mapping geochemical anomalies related to indicator elements
To model the spatial field of the good indicator elements as a guide for prospecting and discovering new polymetallic ores in the Suoi Thau-Sang Than region, secondary geochemical anomalies of Cu, Pb, and Zn were mapped. The geochemical anomaly diagrams of these elements aim at elucidating the distribution, concentration, and accumulation of geochemical anomalies with respect to specific ore bodies in the chosen area (Fig. 3). On this basis, the anomalies associated with the mineralization can be explained and selected, and the anomalies that are not related to ore can be eliminated. Furthermore, trend surface and residual trend maps of the indicator elements in the weathered crust and the bedrock were established to provide supporting information on the mineralization. Geochemical anomalies unrelated to the mineralization often represent localized secondary accumulations, concentrated on low terrain slopes, and are highly dependent on the morphology of the current terrain.

Trend surface analysis
Using the metal samples collected for the entire region (352 samples) and the geochemical samples of each specific area (1,720 samples), this study developed first-degree, second-degree, and higher-degree trend surface models for Cu, Pb, and Zn. These models are considered direct or indirect indicators for prospecting polymetallic mineralization in the Suoi Thau-Sang Than region. The fitted trend surface models are consistent with the indicator elements (Tab. 8). Geochemical anomalies, trend diagrams, and trend deviations of the elements were plotted with the Surfer 13.0 software (Figs. 3, 4). The goodness-of-fit of the trend surface models to the indicator elements, tested with the multiple correlation coefficient, is presented in Tables 8 and 9. The multiple correlation coefficients of the trend surface models of Cu, Pb, and Zn are generally above 0.4 and reach 0.7 for Zn. These results indicate that the trend models fit the indicator elements well. The trend surface analysis shows that the differences among the first-, second-, third-, and fourth-degree trend surface models are not large (Tab. 8). Moreover, the linear changes in the trend surfaces and trend equations illustrate well the spatial variation of the elements in the selected area.
The geochemical anomalies and anomalous fields of the indicator elements are determined by mapping isoconcentration contour lines of different content levels in accordance with the geochemical background and the local anomalous values. The results of the statistical processing are used to determine the geochemical background value from the local average value and to select the anomaly thresholds of the first order (mean ±1 SDEV), second order (mean ±2 SDEV), and third order (mean ±3 SDEV) (Tab. 7, Fig. 3), in which the mean and SDEV values are calculated after the lognormal and gamma transformation. Based on the geochemical anomaly diagrams of the indicator elements, combined with the records of the prospecting works used for checking, the anomalies associated with the mineralization can be selected and those not related to the polymetallic mineralization can be rejected.
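The three orders of thresholds can be sketched as follows, assuming a lognormal model in which the mean and SDEV are computed on log10-transformed contents and then converted back to ppm. Only the upper (positive) thresholds are shown, the gamma case is not covered, and the function names and sample values are hypothetical.

```python
import math
import statistics


def anomaly_thresholds(values_ppm):
    """Return (background, {order: threshold}) from log10-transformed data."""
    logs = [math.log10(v) for v in values_ppm]  # values must be positive
    mean = statistics.fmean(logs)
    sd = statistics.stdev(logs)  # sample standard deviation
    background = 10 ** mean
    thresholds = {k: 10 ** (mean + k * sd) for k in (1, 2, 3)}
    return background, thresholds


def anomaly_order(value_ppm, thresholds):
    """0 = background; 1, 2, 3 = highest order of threshold exceeded."""
    order = 0
    for k in (1, 2, 3):
        if value_ppm > thresholds[k]:
            order = k
    return order


# Invented contents (ppm), chosen so that the log values are 1, 2, and 3.
sample = [10.0, 100.0, 1000.0]
background, thresholds = anomaly_thresholds(sample)
for value in (50.0, 5000.0, 50000.0, 200000.0):
    print(value, "ppm -> anomaly order", anomaly_order(value, thresholds))
```

Working in log space keeps the thresholds positive and makes them multiplicative in the original units, which is the usual reason for transforming skewed contents before applying the mean ± k SDEV rule.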
In general, the geochemical anomalies of Cu, Pb, and Zn reflect the concentration of copper forming ore bodies in the mineralized zone. The geochemical anomalies of the indicator elements are clearly concentrated in three distinct areas (Fig. 4). They generally have an isotropic or elliptical form that extends along the northwest-southeast direction, consistent with the direction of the mineralization zone. Most of the geochemical anomalies coincide with the distribution area of the Ban Dom Formation. The anomalies are quite large and have a complex morphology, especially the copper anomalies close to the mineralized zone, which reflect the presence of ore bodies.
To choose the best degree of trend surface, this study considers the coefficients $a_{00}$ and $a_{pq}$ of equation (2) and selects the model for which $a_{00}$ is the greatest and $a_{pq}$ is the smallest. The trend surface is found to vary strongly as the x and y values change. Therefore, the third-degree trend surface model can be used for Cu, Pb, and Zn in the Suoi Thau-Sang Than region.
The trend deviation, or residual trend model, of each indicator element was also constructed; it shows how far the indicator element at each control point departs from its trend surface model. The trend deviation maps help to detect relative concentrations in ore-bearing areas. The positions of known ore occurrences coincide well with locations of large deviations (Fig. 4). Some high element concentrations occur without any known mineral occurrence around them; hence, these locations should be a focus of investigation in the next steps.

Statistical significance of the trend
It is customary to test the statistical significance of the trend-surface equation after it has been determined.
The statistical significance of the trend surface is assessed from the sums of squares by an analysis of variance (ANOVA) with the F-test. Table 10 presents the ANOVA for this case.
The analysis of variance is also useful for selecting the degree of the trend surface. Table 10 shows the F-test results step by step, from degree 1 to 2, degree 2 to 3, and degree 3 to 4, respectively. The F-value calculated for the improvement of fit obtained when proceeding from degree r to degree r + 1 is close to 1.0 if the residuals of the trend surface are uncorrelated and normally distributed. For Pb, $F_1 = 125.87$, which is much greater than 1.0, and $F_{1(0.05)} = 3.02$. The first value ($F_1 = 125.87$) is significant, whereas the second one ($F_{1(0.05)} = 3.02$) is not, indicating that the best choice for Pb is probably the cubic trend surface. Similarly, for Zn and Cu the degree of the trend surface is chosen on the basis of the multiple correlation coefficient (R), because most of their F-values are lower than $F_{(0.05)}$ (Tab. 10).
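The stepwise F-test for raising the degree of a trend surface can be sketched as below. The sketch assumes the usual ANOVA layout in which the gain in explained sum of squares is tested against the residual mean square of the higher-degree surface; all sums of squares in the example are hypothetical, and the critical F value must still be taken from tables.

```python
def n_coefficients(degree):
    """Number of polynomial coefficients a_pq with p + q <= degree."""
    return (degree + 1) * (degree + 2) // 2


def f_increment(ss_total, ss_resid_low, ss_resid_high, n, degree_low):
    """F-test for going from degree_low to degree_low + 1.

    Returns (F, (df_gain, df_resid), R_high), where R_high is the multiple
    correlation coefficient of the higher-degree surface.
    """
    m_low = n_coefficients(degree_low)
    m_high = n_coefficients(degree_low + 1)
    gain = ss_resid_low - ss_resid_high  # extra sum of squares explained
    df_gain = m_high - m_low             # number of added coefficients
    df_resid = n - m_high                # residual degrees of freedom
    f_value = (gain / df_gain) / (ss_resid_high / df_resid)
    r_high = (1.0 - ss_resid_high / ss_total) ** 0.5
    return f_value, (df_gain, df_resid), r_high


# Hypothetical sums of squares for n = 352 samples (illustration only).
f_value, dfs, r_high = f_increment(
    ss_total=1000.0, ss_resid_low=800.0, ss_resid_high=700.0, n=352, degree_low=1
)
print("F =", round(f_value, 2), "with df", dfs, "and R =", round(r_high, 3))
```

If the computed F exceeds the tabulated critical value for the given degrees of freedom, the added terms improve the fit significantly and the higher degree is retained; otherwise the lower degree is preferred.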
As a whole, the trend surface models describe the relative changes in the content of the indicator elements, which can be considered regular (multiple correlation coefficients R > 0.3) (Tab. 9). The third-degree trend surface models of Cu, Pb, and Zn can be regarded as the best fit for the Cu, Pb, and Zn content data (R > 0.6). These models can be used to study geochemical anomalies and to analyze the concentration tendency of the Cu, Pb, and Zn indicator elements. Therefore, the third-degree trend surface model helps to investigate areas with trend structure anomalies, ore locations, and nodes with high concentrations, making exploration more effective.

CONCLUSIONS
Based on 1,720 geochemical samples, statistical, multivariate, and trend analysis methods were used to study polymetallic mineralization in the Suoi Thau, Sang Than, and Ban Kep zones in Northeast Vietnam. The following conclusions can be drawn. First, the results of the frequency analysis show that Co, Zn, Pb, Fe, Mn, and Cu have a tight association with polymetallic ore, suggesting that these elements can be selected as indicators for prospecting polymetallic mineralization. Furthermore, Pb, Zn, and Cu form wide anomalies in the Suoi Thau, Ban Kep, and Sang Than zones, which provide useful information for prospecting polymetallic ores in the region. Additionally, the correlation matrix and dendrogram analyses allow the elements to be divided into a polymetallic ore-forming group (i.e., Cu, Pb, Zn, and Co) and a rock-forming group (i.e., V, Ni, Cr, Mg, and Ti) in the study region.
Next, the threshold value (mean ±3 SDEV) is used to identify anomalous regions and background values of the indicator elements for determining polymetallic mineralization in the region. Generally, the prospective areas agree well with the known polymetallic mineral sites. Additionally, there is an anomalous region in the northeastern part of the Suoi Thau-Sang Than region with no identified mineralization, which calls for further investigation.
The trend analysis and the distribution of the localized anomalous areas of the indicator elements reveal that the local factors of Cu, Pb, and Zn are similar. This indicates a close relationship between the polymetallic mineralization and the northwest-southeast fault system and the gabbro blocks within this region.
Finally, the statistical analysis of the bottom sediments shows a remarkable potential for prospecting polymetallic mineralization in the region, owing to the association with the local anomalies that were established during the statistical processing. The research results also highlight the possibility of a positive spatial correlation with the presence of polymetallic mineralization. This is consistent with the results of the statistical analysis, with the delineation of the prospective areas based on the threshold values of the geochemical data sets, and with the results of the trend analysis of the indicator elements.