NSSL/SPC Spring Forecasting Experiment Blog: Metrics: Pick your number

Thursday, June 07, 2012

How exactly do you choose your favorite model?

Because that is the model you fall back on when uncertainty is large. The model you are most familiar with. The model you use to recalibrate yourself when a big event is forecast.

Have you chosen wisely? What standards have you used to evaluate your "favorite" model? What metrics have you used to evaluate your model? How long was the data set that you used to perform this verification? Or did you validate your model based on a few cases using a few parameters?

Most managers, when choosing which models are superior, ask for a number. A number they can use to justify an upgrade or justify removal of a modeling system. They ask what metrics are most relevant and then ask for that number and the metrics/numbers for the competing modeling systems.

Yet every metric has both strengths and weaknesses, and each can be applied only to certain variables over certain time intervals. Sometimes they are informative, sometimes not. We have been exploring some of the more popular metrics, such as the Fractions Skill Score, the Gilbert Skill Score, and the Critical Success Index (CSI), by pairing these scores with subjective measures of skill.
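For readers who have not worked with these scores, here is a minimal sketch of how they are commonly defined, written in plain Python. It is an illustration rather than the verification code used in the experiment. The function names, the tiny 5x5 grids, and the choice to treat points outside the domain as zeros are all assumptions made for this example.

```python
def contingency(forecast, observed):
    """Count hits, false alarms, misses, correct negatives for two binary grids."""
    hits = false_alarms = misses = correct_negatives = 0
    for f_row, o_row in zip(forecast, observed):
        for f, o in zip(f_row, o_row):
            if f and o:
                hits += 1
            elif f:
                false_alarms += 1
            elif o:
                misses += 1
            else:
                correct_negatives += 1
    return hits, false_alarms, misses, correct_negatives


def csi(hits, false_alarms, misses):
    """Critical Success Index: hits / (hits + false alarms + misses)."""
    denom = hits + false_alarms + misses
    return hits / denom if denom else float("nan")


def gss(hits, false_alarms, misses, correct_negatives):
    """Gilbert Skill Score: CSI adjusted for hits expected by random chance."""
    n = hits + false_alarms + misses + correct_negatives
    hits_random = (hits + false_alarms) * (hits + misses) / n
    denom = hits + false_alarms + misses - hits_random
    return (hits - hits_random) / denom if denom else float("nan")


def neighborhood_fractions(grid, radius):
    """Fraction of event points in a square window around each grid point.

    Assumption for this sketch: points outside the domain count as zeros.
    """
    rows, cols = len(grid), len(grid[0])
    size = (2 * radius + 1) ** 2
    fractions = []
    for i in range(rows):
        row = []
        for j in range(cols):
            total = 0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < rows and 0 <= jj < cols:
                        total += grid[ii][jj]
            row.append(total / size)
        fractions.append(row)
    return fractions


def fss(forecast, observed, radius):
    """Fractions Skill Score: 1 - MSE(fractions) / reference MSE."""
    pf = neighborhood_fractions(forecast, radius)
    po = neighborhood_fractions(observed, radius)
    mse = sum((f - o) ** 2 for fr, orow in zip(pf, po) for f, o in zip(fr, orow))
    ref = sum(f * f for fr in pf for f in fr) + sum(o * o for orow in po for o in orow)
    return 1 - mse / ref if ref else float("nan")


# Hypothetical case: one observed storm cell, forecast one grid point too far east.
observed = [[0] * 5 for _ in range(5)]
forecast = [[0] * 5 for _ in range(5)]
observed[2][2] = 1
forecast[2][3] = 1

h, fa, m, cn = contingency(forecast, observed)
print("CSI:", csi(h, fa, m))
print("GSS:", gss(h, fa, m, cn))
print("FSS (point):", fss(forecast, observed, 0))
print("FSS (3x3 neighborhood):", fss(forecast, observed, 1))
```

In this made-up case the forecast misses by a single grid point. The point-based scores (CSI, GSS, and FSS with no neighborhood) give it no credit at all, while FSS with a 3x3 neighborhood comes out near 0.67. Same forecast, different number, depending on which metric you pick.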

But exactly how do you quantify skill in forecasting the mode or evolution of convection?

Can you compare two different models and rank them when one is purposefully different in appearance from the other? And when these (let's call them) obvious biases are present, what metric can account for them? How can you be objective in that setting?

Many forecasters, researchers, and modelers are suffering through these practical considerations because only conditional metrics are available with which to evaluate a model. Most people prefer to have "eyes on the output" (i.e., subjective impressions) to identify the strengths and weaknesses of a system so that it can be developed further.

As it stands, we settle for models with well-known biases. But in this climate of change we really need to develop a framework for evaluation that, at its core, can distinguish skill and reliability across models and can capture the subjective impressions of the people who use them.

