Document Type : Original Research Paper

Authors

1 Department of Mining Engineering, Faculty of Engineering, National University of Trujillo, Trujillo, Peru

2 Faculty of Chemical Engineering, National University of the Altiplano of Puno, Puno, Peru

3 Department of Industrial Engineering, Faculty of Engineering, National University of Trujillo, Trujillo, Peru

DOI: 10.22044/jme.2025.16568.3239

Abstract

The geochemical and spatial characterization of legacy mine tailings is essential for identifying reprocessing opportunities and informing environmental management. However, the high compositional complexity of polymetallic tailings requires robust multivariate approaches. This study compares the performance of four unsupervised clustering algorithms (Euclidean K-Means, Riemannian K-Means, Gaussian Mixture Model (GMM), and Agglomerative Clustering) applied to 927 samples from the Quiulacocha tailings deposit in Peru, using six major elements (Zn, Pb, Cu, Fe, Ag, Au) and spatial coordinates. All four methods consistently identified three main geochemical domains: Cluster 1 was enriched in Cu and Au, Cluster 2 in Pb and Fe, and Cluster 3 in Zn, Ag, and Fe. Covariance-based methods (Riemannian K-Means and Agglomerative Clustering) outperformed the others in internal validation (Silhouette scores up to 0.58) and consistency (Adjusted Rand Index = 1.00), yielding more interpretable and geologically coherent partitions. The centered log-ratio (CLR) transformation reduced clustering performance, highlighting the importance of preserving raw geochemical variance for spatial segmentation. These findings demonstrate the effectiveness of multivariate clustering for unraveling compositional heterogeneity in tailings and delineating domains of potential economic value. The approach provides a quantitative framework for supporting reprocessing decisions, reducing risk, and guiding future research on mine waste valorization.
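Two quantities named in the abstract, the CLR transformation and the Adjusted Rand Index, can be illustrated with a minimal standard-library Python sketch. This is not the authors' implementation: the function names `clr` and `adjusted_rand_index`, the sample composition, and the label vectors are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def clr(composition):
    """Centered log-ratio: log of each part minus the mean of the logs.

    Equivalent to log(x_i / g(x)), where g(x) is the geometric mean.
    All parts must be strictly positive.
    """
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two partitions of the same samples."""
    n = len(labels_a)
    # Pair counts within contingency cells and within each partition.
    sum_cells = sum(math.comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(math.comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(math.comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / math.comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:  # degenerate case, e.g. a single cluster
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical composition (illustrative values, not data from the study).
sample = [2.0, 1.0, 0.5, 20.0]
clr_values = clr(sample)  # CLR coordinates always sum to zero

# Hypothetical label vectors: same grouping, different cluster names.
method_1 = [0, 0, 1, 1, 2, 2]
method_2 = [2, 2, 0, 0, 1, 1]
agreement = adjusted_rand_index(method_1, method_2)
```

Because the index ignores how clusters are numbered, two methods that group the samples identically score 1.00 even when their cluster labels differ, which is how the consistency value reported above should be read.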

