Schmidt, James (2025): Using Random Forest Machine Learning to Identify Homes at High Risk from Wildfires in California Counties.
Abstract
Wildfires driven by extreme winds, such as the Camp Fire in 2018 and the Eaton and Palisades fires in 2025, account for a large share of structure losses due to wildfires in California. Because these types of events are relatively rare, their risks are difficult to estimate using conventional simulation techniques. This study explores the use of the Random Forest machine learning algorithm as an alternative method for estimating wildfire risk to structures. Environmental variables are estimated for 57,000 structures destroyed in wildfires in California and for 6.2 million unburned structures with the potential for wildfire exposure. A Random Forest model, trained on both the burned and unburned structures, identifies which variables are most effective in distinguishing between the two and which unburned structures belong in the High-Risk category.
The six environmental variables found to be the most important in identifying High-Risk structures are:

- the annual Red Flag Warning hours (RFW)
- the average Energy Release Component (ERC)
- the Wildland Urban Interface Zone (WUI)
- the Normalized Difference Vegetation Index (NDVI)
- the annual number of downslope wind events (DW)
- the proportion of sustained winds of 20 mph or greater on high fire danger days (SW20)

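
As an illustration of how a Random Forest can rank variables such as the six named above, the following is a minimal sketch using scikit-learn. It is not the study's code or data: the feature values, the label-generating coefficients, and the hyperparameters are all invented for illustration, and impurity-based importance is only one of several possible importance measures (the abstract does not say which one the study used).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Variable names taken from the abstract; every value below is synthetic.
FEATURES = ["RFW", "ERC", "WUI", "NDVI", "DW", "SW20"]

rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, len(FEATURES)))

# ASSUMPTION: made-up labels driven mostly by the first column (RFW) and
# weakly by the fifth (DW). These coefficients do not come from the paper.
logit = 2.0 * X[:, 0] + 0.5 * X[:, 4] - 2.0
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)  # 1 = burned

# Hyperparameters are illustrative, not the study's settings.
model = RandomForestClassifier(n_estimators=200, max_depth=6, random_state=0)
model.fit(X, y)

# Impurity-based importances, sorted from most to least important.
ranking = sorted(
    zip(FEATURES, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name:5s} {score:.3f}")
```

On this synthetic data the ranking simply recovers the variable that was used to generate the labels; with real structure data the ordering would depend on the actual relationships among the variables.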
By adjusting the maximum tree-depth parameter, the Random Forest model is calibrated to produce a state-wide percentage of High-Risk structures of 12%, matching estimates by the California Department of Insurance (CDI). The CDI estimates are based on a weighted average of insurance industry risk models. Although the Random Forest model matches the CDI estimate for the percentage of High-Risk structures at the state level, the percentages by county differ significantly from the CDI numbers. The largest reductions in the percentage of High-Risk structures occur in the Central Sierra counties of Tuolumne and Mariposa (-48% and -34%, respectively). The largest increases occur in Mono County in the Eastern Sierras (+53%) and Ventura County in Southern California (+42%).
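
The depth calibration described above could look roughly like the sketch below, which tries several `max_depth` values and keeps the one whose High-Risk share is closest to the 12% target. Several things here are assumptions rather than details from the paper: the data are synthetic, "High-Risk" is taken to mean an unburned structure that the model classifies as burned, `class_weight="balanced"` is one possible way to handle the imbalance between burned and unburned structures, and the candidate depths are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

TARGET_SHARE = 0.12  # state-wide High-Risk share quoted in the abstract

rng = np.random.default_rng(1)
n = 6000
X = rng.normal(size=(n, 6))  # six synthetic environmental variables

# ASSUMPTION: invented label model producing a small burned minority.
logit = 1.5 * X[:, 0] + 1.0 * X[:, 4] - 4.0
burned = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))
y = burned.astype(int)
X_unburned = X[~burned]


def high_risk_share(max_depth):
    """Fit a forest at this depth and return the share of unburned
    structures that it classifies as burned (assumed High-Risk rule)."""
    model = RandomForestClassifier(
        n_estimators=100,
        max_depth=max_depth,
        class_weight="balanced",  # assumption: reweight the rare burned class
        random_state=0,
    )
    model.fit(X, y)
    return float(model.predict(X_unburned).mean())


# Evaluate a hypothetical grid of depths and pick the closest to the target.
shares = {depth: high_risk_share(depth) for depth in (2, 4, 6, 8, 10, 12)}
best_depth = min(shares, key=lambda d: abs(shares[d] - TARGET_SHARE))

for depth, share in shares.items():
    print(f"max_depth={depth:2d}  High-Risk share={share:.3f}")
print("closest to target:", best_depth)
```

Because the calibration fixes only the state-wide total, the share within any subgroup (such as a county) is free to differ from an external estimate, which is consistent with the county-level differences the abstract reports.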
Wind characteristics appear to be the primary reason for the differences in county risk ratings. Counties with fewer Red Flag Warning hours, fewer downslope wind days, and a smaller proportion of winds above 20 mph tend to have a smaller percentage of High-Risk structures than estimated by the CDI.
| Item Type: | MPRA Paper |
|---|---|
| Original Title: | Using Random Forest Machine Learning to Identify Homes at High Risk from Wildfires in California Counties |
| English Title: | Using Random Forest Machine Learning to Identify Homes at High Risk from Wildfires in California Counties |
| Language: | English |
| Keywords: | wildfire Random Forest California structures risk simulation wind WUI NDVI ERC |
| Subjects: | D - Microeconomics > D8 - Information, Knowledge, and Uncertainty > D81 - Criteria for Decision-Making under Risk and Uncertainty<br>R - Urban, Rural, Regional, Real Estate, and Transportation Economics > R2 - Household Analysis > R23 - Regional Migration ; Regional Labor Markets ; Population ; Neighborhood Characteristics<br>Y - Miscellaneous Categories > Y1 - Data: Tables and Charts |
| Item ID: | 126685 |
| Depositing User: | James Schmidt |
| Date Deposited: | 07 Nov 2025 02:45 |
| Last Modified: | 07 Nov 2025 02:45 |
| URI: | https://mpra.ub.uni-muenchen.de/id/eprint/126685 |

