Munich Personal RePEc Archive

Using Random Forest Machine Learning to Identify Homes at High Risk from Wildfires in California Counties

Schmidt, James (2025): Using Random Forest Machine Learning to Identify Homes at High Risk from Wildfires in California Counties.


Abstract

Wildfires driven by extreme winds, such as the Camp Fire in 2018 and the Eaton and Palisades fires in 2025, account for a large share of structure losses due to wildfires in California. Because these types of events are relatively rare, their risks are difficult to estimate using conventional simulation techniques. This study explores the use of the Random Forest machine learning algorithm as an alternative method for estimating wildfire risk to structures. Environmental variables are estimated for 57,000 structures destroyed in wildfires in California and for 6.2 million unburned structures with the potential for wildfire exposure. A Random Forest model, trained on both the burned and unburned structures, identifies which variables are most effective in distinguishing between the two and which unburned structures belong in the High-Risk category.

The six environmental variables found to be the most important in identifying High-Risk structures are:
· the annual Red Flag Warning hours (RFW)
· the average Energy Release Component (ERC)
· the Wildland Urban Interface Zone (WUI)
· the Normalized Difference Vegetation Index (NDVI)
· the annual number of downslope wind events (DW)
· the proportion of sustained winds of 20 mph or greater on high fire danger days (SW20)
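As a rough illustration of how a Random Forest can rank variables like the six above, the following sketch trains scikit-learn's RandomForestClassifier on synthetic data and prints its impurity-based feature importances. Everything except the six variable abbreviations is an assumption made for illustration: the sample sizes, the random placeholder values, the tree settings, and the use of a majority-vote prediction to flag High-Risk structures. It is not the study's data or configuration, and WUI is treated as numeric here purely for simplicity.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Variable abbreviations taken from the abstract; all values below are synthetic.
FEATURES = ["RFW", "ERC", "WUI", "NDVI", "DW", "SW20"]

rng = np.random.default_rng(0)
n_burned, n_unburned = 300, 3000  # toy sizes, far smaller than the study's

# Placeholder data: burned structures are shifted upward on the wind-related
# columns (RFW, DW, SW20) so the model has a signal to find.
X_unburned = rng.normal(0.0, 1.0, size=(n_unburned, len(FEATURES)))
X_burned = rng.normal(0.0, 1.0, size=(n_burned, len(FEATURES)))
X_burned[:, [0, 4, 5]] += 1.5

X = np.vstack([X_burned, X_unburned])
y = np.concatenate([np.ones(n_burned, dtype=int), np.zeros(n_unburned, dtype=int)])

# Hyperparameters are illustrative assumptions, not the paper's settings.
model = RandomForestClassifier(n_estimators=100, max_depth=6, random_state=0)
model.fit(X, y)

# Impurity-based importances: one score per variable, summing to 1.
importances = dict(zip(FEATURES, model.feature_importances_))
ranking = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")

# Unburned structures that the model classifies as "burned" are treated here
# as High-Risk (an assumed labelling rule).
high_risk_share = float(np.mean(model.predict(X_unburned) == 1))
print(f"High-Risk share of unburned structures: {high_risk_share:.1%}")
```

On real data, permutation importance (sklearn.inspection.permutation_importance) is a common cross-check, since impurity-based scores can favor variables with many distinct values.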

By adjusting the maximum tree-depth parameter, the Random Forest model is calibrated to produce a state-wide percentage of High-Risk structures of 12% in order to match estimates by the California Department of Insurance (CDI). The CDI estimates are based on a weighted average of insurance industry risk models. Although the Random Forest model matches the CDI estimates for the percentage of High-Risk structures at the state level, the percentage by county differs significantly from the CDI numbers. The largest reductions in the percentage of High-Risk structures occur in the Central Sierra counties of Tuolumne and Mariposa (-48% and -34%, respectively). The largest increases occur in Mono County in the Eastern Sierras (+53%) and Ventura County in Southern California (+42%).
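The calibration step described above can be sketched as a simple search over candidate tree depths, keeping the depth whose High-Risk share among unburned structures is closest to the 12% target. The helper names (high_risk_share, calibrate_depth), the candidate depths, the synthetic data, the class_weight="balanced" setting, and the majority-vote labelling rule are all assumptions for illustration; the abstract does not state how the search was carried out.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

TARGET_SHARE = 0.12  # state-wide High-Risk share cited in the abstract


def high_risk_share(max_depth, X, y, X_unburned):
    """Fit a forest at the given depth; return the share of unburned rows flagged."""
    model = RandomForestClassifier(
        n_estimators=50,
        max_depth=max_depth,
        class_weight="balanced",  # assumption: offsets the burned/unburned imbalance
        random_state=0,
    )
    model.fit(X, y)
    return float(np.mean(model.predict(X_unburned) == 1))


def calibrate_depth(X, y, X_unburned, depths, target=TARGET_SHARE):
    """Return the depth whose High-Risk share is closest to the target, plus all shares."""
    shares = {d: high_risk_share(d, X, y, X_unburned) for d in depths}
    best = min(shares, key=lambda d: abs(shares[d] - target))
    return best, shares


# Synthetic placeholder data (six columns standing in for the six variables).
rng = np.random.default_rng(1)
X_unburned = rng.normal(0.0, 1.0, size=(3000, 6))
X_burned = rng.normal(1.0, 1.0, size=(300, 6))
X = np.vstack([X_burned, X_unburned])
y = np.concatenate([np.ones(300, dtype=int), np.zeros(3000, dtype=int)])

depths = [2, 4, 6, 8, 10, 12]
best_depth, shares = calibrate_depth(X, y, X_unburned, depths)
for d in depths:
    print(f"max_depth={d}: High-Risk share = {shares[d]:.1%}")
print(f"Depth closest to the {TARGET_SHARE:.0%} target: {best_depth}")
```

A grid search keeps the sketch transparent. If the share changed monotonically with depth on the real data, a bisection over depths would reach the same answer with fewer model fits.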

Wind characteristics appear to be the primary reason for the differences in county risk ratings. Counties with fewer Red Flag Warning hours, fewer downslope wind days, and a smaller proportion of winds above 20 mph tend to have a smaller percentage of High-Risk structures than estimated by the CDI.
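The county-level comparison against the CDI figures can be tabulated with a few lines of code. The sketch below uses made-up county names and shares, since the abstract reports only the differences and not the underlying values. It also assumes the differences are expressed in percentage points, which the abstract does not specify. The function name share_differences is hypothetical.

```python
# Hypothetical High-Risk shares per county (fractions of structures).
# These numbers are placeholders, not values from the study or from the CDI.
model_share = {"County A": 0.10, "County B": 0.45}
cdi_share = {"County A": 0.40, "County B": 0.20}


def share_differences(model_share, cdi_share):
    """Return (county, model minus CDI) in percentage points, most negative first."""
    common = model_share.keys() & cdi_share.keys()
    diffs = {
        county: round((model_share[county] - cdi_share[county]) * 100, 1)
        for county in common
    }
    return sorted(diffs.items(), key=lambda kv: kv[1])


ranked = share_differences(model_share, cdi_share)
for county, diff in ranked:
    print(f"{county}: {diff:+.1f} percentage points")
```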


MPRA is a RePEc service hosted by the University Library LMU Munich.