Simulated annealing in feature selection approach for modeling aboveground carbon stock at the transition between Brazilian Savanna and Atlantic Forest biomes


  • Laís Almeida Araújo, Federal University of Lavras
  • Isáira Leite e Lopes, Federal University of Lavras
  • Rafael Menali Oliveira, Federal University of Lavras
  • Sérgio Henrique Godinho Silva, Federal University of Lavras
  • Carolina Souza Jarochinski e Silva, Federal University of Lavras
  • Lucas Rezende Gomide, Federal University of Lavras



Keywords: Data mining, Biomass, Simulated Annealing, Random Forest.


Forest ecosystems play an important role in carbon storage. The objective of this study was to investigate how effectively the Simulated Annealing metaheuristic selects variables to maximize the accuracy of aboveground carbon prediction at the tree level. We used data from uneven-aged forests located in the Rio Grande Basin, Minas Gerais, Brazil, where the carbon stock of 227 trees was measured. Carbon stock was estimated for each tree compartment (total, stem, branch, and leaf) with five approaches: the classic Spurr linear model, a pan-tropical equation, stepwise linear regression, Random Forest (RF), and the hybrid SARF method (Simulated Annealing and Random Forest). In SARF, the metaheuristic selects the variables that are then used in the RF. The methods were evaluated by the root mean square error (RMSE), the coefficient of determination (R²), and residual plots. The pan-tropical equation performed better than the Spurr model because its residuals were more homogeneous. The stepwise technique reduced both the number of variables and the error of the estimates, mainly for the validation set. SARF fitted better than RF: averaged over all compartments, it reduced the number of variables by 99.2% and the error of the estimates by 9%. In general, variables such as volume, basic wood density, canopy projection area, diameter at 0%, diameter at breast height, height, and latitude contributed strongly to carbon prediction regardless of the tree compartment. Among the methods tested, SARF is an alternative to the traditional methods, as it can extract accurate information from a large data set.
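The variable-selection loop behind SARF, together with the two evaluation metrics named above, can be sketched as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' implementation: the function names (`rmse`, `r_squared`, `anneal_select`), the single-bit-flip neighbourhood, the geometric cooling schedule, and all parameter values are hypothetical choices. In particular, `toy_cost` is only a stand-in for the quantity SARF would minimise in practice, namely the validation RMSE of a Random Forest fitted on the selected variables.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def anneal_select(n_features, cost, n_iter=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over boolean feature masks (hypothetical settings).

    `cost` maps a mask (list of bools) to a value to minimise. In a SARF-like
    setup it would return the validation RMSE of a Random Forest trained on
    the selected columns; here it is left as a user-supplied callable.
    """
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    if not any(current):  # never start from an empty subset
        current[rng.randrange(n_features)] = True
    current_cost = cost(current)
    best, best_cost = current[:], current_cost
    temp = t0
    for _ in range(n_iter):
        # Neighbour: flip the inclusion of one randomly chosen variable.
        candidate = current[:]
        j = rng.randrange(n_features)
        candidate[j] = not candidate[j]
        if any(candidate):  # skip empty subsets
            cand_cost = cost(candidate)
            delta = cand_cost - current_cost
            # Metropolis rule: always accept improvements, and accept worse
            # subsets with probability exp(-delta / temp).
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, current_cost = candidate, cand_cost
                if current_cost < best_cost:
                    best, best_cost = current[:], current_cost
        temp *= cooling  # geometric cooling
    return best, best_cost


# Toy stand-in for the model error (assumption): variables 0 and 3 are
# "informative"; each missing one costs 1.0 and each selected variable 0.1.
INFORMATIVE = {0, 3}


def toy_cost(mask):
    selected = {i for i, keep in enumerate(mask) if keep}
    return 1.0 * len(INFORMATIVE - selected) + 0.1 * len(selected)


mask, best_cost = anneal_select(10, toy_cost)
print([i for i, keep in enumerate(mask) if keep], best_cost)
```

To move from this sketch toward the setting described in the abstract, `toy_cost` would be replaced by a function that fits a Random Forest on the masked columns and returns `rmse` on a held-out validation set. Accepting occasional worse subsets at high temperature is what lets the search escape local minima that a purely greedy stepwise procedure could get stuck in.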

Author Biographies

Laís Almeida Araújo, Federal University of Lavras

Department of Forest Sciences

Isáira Leite e Lopes, Federal University of Lavras

Department of Forest Sciences

Rafael Menali Oliveira, Federal University of Lavras

Department of Forest Sciences

Sérgio Henrique Godinho Silva, Federal University of Lavras

Department of Soil Science

Carolina Souza Jarochinski e Silva, Federal University of Lavras

Department of Forest Sciences

Lucas Rezende Gomide, Federal University of Lavras

Department of Forest Sciences








Research article