Wednesday, June 08, 2011

Certainty, doubt, and verification

Today's forecast on CI focused on the area from northeast KS southwest along a front down toward the TX-OK panhandles. It was straightforward enough. How far southwest will the cap break? Will there be enough moisture in the warm sector near the frontal convergence? Will the dryline serve as a focus for CI, given the dry slot developing just ahead of the dryline along the southern extent of the front and a transition zone (a zone of reduced moisture)?

So we went to work mining the members of the ensemble, scrutinizing the deterministic models for surface moisture evolution, examining the convergence fields, and looking at ensemble soundings. The conclusion from the morning was two moderate risk areas: one in northeast KS and another covering the triple point, dryline, and cold front. The afternoon forecast backed off the dryline-triple point given the observed dry slot and the dry sounding from LMN at 1800 UTC.

The other issue was that the dryline area was so dry, and the PBL so deep, that convective temperature would be reached but with minimal CAPE (10-50 J kg-1). The dry LMN sounding was assumed to be representative of the larger mesoscale environment. This was wrong, as the 00 UTC sounding at LMN indicated an increase in moisture of 6 g kg-1 aloft and 3 g kg-1 at the surface.

Another aspect of this case was our scrutiny of the boundary layer and the presence of open-cell convection and horizontal convective rolls. We discussed, again, that at 4 km grid spacing we are close to resolving these types of features. We are close because the scale of the rolls (to be resolved they need to be larger than about 7 times the grid spacing) scales with the boundary layer depth. So on a day like today, when the PBL is deep, the rolls should be close to resolvable. On the other hand, additional diffusion is needed in light wind conditions, and when it is not applied the scale of the rolls collapses to the scale of the grid. In order to believe the model we must take these considerations into account. In order to discount the model, we are unsure what to look for beyond indications of "noise" (e.g. features barely resolved on the grid, roll scales close to 5 times the grid spacing).
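As a rough, hedged illustration of the scale argument above, the short Python sketch below compares an assumed HCR wavelength (aspect ratio times PBL depth, with the aspect ratio and depths as placeholders) against the 5-7 times grid-spacing thresholds for a 4 km grid. It is only arithmetic on the rules of thumb quoted in this post, not a statement about what the model actually resolves.

```python
# Rough scale comparison for horizontal convective rolls (HCRs) against the
# 5-7 times grid spacing rule of thumb quoted above. The roll aspect ratio
# (wavelength / PBL depth) is an assumed placeholder; observed values are
# often near 2-4 but can be considerably larger.

def roll_wavelength_km(pbl_depth_km, aspect_ratio):
    """Approximate HCR wavelength as aspect_ratio times the PBL depth."""
    return aspect_ratio * pbl_depth_km

def resolvability(wavelength_km, dx_km):
    """Classify a feature scale against the 5x / 7x grid-spacing thresholds."""
    ratio = wavelength_km / dx_km
    if ratio >= 7:
        return ratio, "resolvable"
    if ratio >= 5:
        return ratio, "marginal"
    return ratio, "under-resolved"

if __name__ == "__main__":
    dx = 4.0                                   # model grid spacing (km)
    for aspect_ratio in (3.0, 6.0, 10.0):      # assumed roll aspect ratios
        for pbl_depth in (1.5, 3.0):           # shallow vs. deep boundary layer (km)
            wl = roll_wavelength_km(pbl_depth, aspect_ratio)
            ratio, verdict = resolvability(wl, dx)
            print(f"aspect {aspect_ratio:>4.1f}, PBL {pbl_depth:.1f} km -> "
                  f"wavelength ~{wl:.0f} km ({ratio:.1f} dx): {verdict}")
```

The take-away is simply that the wavelength, and therefore the resolvability, hinges on both the boundary layer depth and the assumed aspect ratio, which is why a deep-PBL day moves the rolls closer to the resolvable range.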

The HCRs were present today as per this image from Wichita:

However, just because HCRs were present does not mean I can prove they were instrumental in CI. So when we saw the forecast today for HCRs along the front, and storms subsequently developed, we had some potential evidence. Given the distance from the radar, it may be difficult, if not impossible, to prove that HCRs intersected the front and contributed to CI.

This brings up another major point: in order to really know what happened today we need a lot of observational data. Major field project data. Not just surface data, but soundings, profilers, and low-level radar data. On the scale of The Thunderstorm Project, only for numerical weather prediction. How else can we say with any certainty that the features we were using to make our forecast were present and contributing to CI? This is the scope of data collection we would require for months in order to get a sufficient number of cases to verify the models (state variables and processes such as HCRs). Truly an expensive undertaking, yet one where a number of people could benefit from a single data set and the field of NWP could improve tremendously. And let's not forget about forecasters, who could benefit from having better models, better understanding, and better tools to help them.

I will update the blog after we verify this case tomorrow morning.

Monday, June 06, 2011

Wrong but verifiable

The fine-resolution guidance we are analyzing can get the forecast wrong yet probabilistically verify. It may seem strange, but the models do not have to be perfect; they just have to be smooth enough (tuned, bias corrected) to be reliable. The smoothing is done on purpose to account for the fact that the discretized equations cannot resolve features smaller than 5-7 times the grid spacing. It is also done because the models have little skill below 10-14 times the grid spacing. As has been explained to me, this is approximately the scale at which the forecasts become statistically reliable. For example, a forecast of 10 percent probability, in the reliable sense, will verify 10 percent of the time.
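To make the reliability idea concrete, here is a minimal sketch of the check described above: bin the forecast probabilities and compare each bin's average forecast to the observed relative frequency. The forecasts and outcomes are synthetic placeholders constructed to be reliable by design, not real model output.

```python
# Minimal reliability check: bin forecast probabilities and compare each bin's
# mean forecast to the observed relative frequency. A reliable 10% forecast
# should verify about 10% of the time.
import numpy as np

def reliability_table(forecast_probs, outcomes, bins=np.linspace(0.0, 1.0, 11)):
    """Return (bin center, mean forecast, observed frequency, count) for each bin."""
    rows = []
    idx = np.digitize(forecast_probs, bins) - 1
    idx = np.clip(idx, 0, len(bins) - 2)          # guard the upper edge
    for b in range(len(bins) - 1):
        mask = idx == b
        if not mask.any():
            continue
        rows.append((
            0.5 * (bins[b] + bins[b + 1]),        # bin center
            forecast_probs[mask].mean(),          # average forecast probability
            outcomes[mask].mean(),                # observed relative frequency
            int(mask.sum()),                      # sample size
        ))
    return rows

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    probs = rng.uniform(0.0, 1.0, 10000)                           # synthetic forecast probabilities
    events = (rng.uniform(0.0, 1.0, 10000) < probs).astype(float)  # outcomes consistent with them
    for center, fcst, freq, n in reliability_table(probs, events):
        print(f"bin {center:.2f}: mean forecast {fcst:.2f}, observed frequency {freq:.2f}, n={n}")
```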

This makes competing with the model tough unless we have skill at deriving not only similar probabilities, but also placing those probabilities in close proximity, in space and time, to the observations. Rewording this statement: draw the radar at forecast hour X probabilistically. If you draw those probabilities to cover a large area you won't necessarily verify. But if you know the number of storms, their intensity, and their longevity, and place them close to what was observed, you can verify as well as the models. Which means humans can be just as wrong but still verify their forecasts well.
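To pin down the "wrong but verifiable" point, here is a toy one-dimensional sketch with synthetic fields and an arbitrary box-car smoothing width. Both forecasts displace the storms by the same amount; the only difference is that one is drawn sharply and the other is hedged into modest neighborhood probabilities, and the hedged one earns the better Brier score.

```python
# Toy 1-D illustration of "wrong but verifiable": both forecasts put the
# storms in the wrong place, but the smoothed (hedged) version earns a much
# better Brier score than the sharp one. All fields here are synthetic.
import numpy as np

n = 200
obs = np.zeros(n)
obs[95:100] = 1.0            # where storms were actually observed
sharp = np.zeros(n)
sharp[110:115] = 1.0         # a confident forecast, displaced by ~15 grid points

def box_smooth(field, half_width):
    """Spread a sharp forecast into neighborhood probabilities with a box-car kernel."""
    kernel = np.ones(2 * half_width + 1)
    kernel /= kernel.size
    return np.convolve(field, kernel, mode="same")

def brier_score(prob, outcome):
    """Mean squared difference between forecast probability and observed outcome."""
    return float(np.mean((prob - outcome) ** 2))

smoothed = box_smooth(sharp, half_width=25)   # same displaced forecast, hedged in space

print("sharp, displaced forecast:    Brier =", round(brier_score(sharp, obs), 4))
print("smoothed, displaced forecast: Brier =", round(brier_score(smoothed, obs), 4))
```

The displacement error is identical in both cases; only the degree of hedging differs, which is exactly why a smooth forecast can verify well without being "right" in the deterministic sense.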

Let us think through drawing the radar. This is exactly what we are trying to do, in a limited sense, in the HWT for the Convection Initiation and Severe Storms Desks over 3-hour periods. The trick is the 3-hour period, over which the models and forecasters can effectively smooth their forecasts. We isolate the areas of interest and try to use the best forecast guidance to come up with a mental model of what is possible and probable. We try to add detail to that area by increasing the probabilities in some places and removing them from others. But we still feel we are ignoring certain details. In CI, we feel like we should be trying to capture episodes. An episode is where CI occurs in close proximity to other CI within a certain time frame, presumably because of a similar physical mechanism.
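Purely to pin down what "episode" means here, the following is a minimal sketch (not the HWT definition) that links CI events into episodes when they fall within assumed space-time thresholds of one another; the 50 km and 60 minute values and the sample events are placeholders.

```python
# A hedged sketch of grouping CI events into "episodes": events that fall
# within placeholder space-time thresholds of an existing episode member are
# linked to that episode. The thresholds (50 km, 60 min) and the sample
# events are invented for illustration; this simple greedy pass is not a
# full clustering algorithm.
from math import hypot

def group_episodes(events, max_dist_km=50.0, max_dt_min=60.0):
    """events: list of (x_km, y_km, t_min) tuples. Returns a list of episodes."""
    episodes = []
    for ev in sorted(events, key=lambda e: e[2]):      # process in time order
        for episode in episodes:
            if any(hypot(ev[0] - x, ev[1] - y) <= max_dist_km
                   and abs(ev[2] - t) <= max_dt_min
                   for x, y, t in episode):
                episode.append(ev)
                break
        else:
            episodes.append([ev])                      # start a new episode
    return episodes

if __name__ == "__main__":
    sample = [(0, 0, 0), (30, 10, 20), (60, 20, 45), (300, 5, 30)]  # (x km, y km, t min)
    for i, episode in enumerate(group_episodes(sample), start=1):
        print(f"episode {i}: {episode}")
```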

By doing this we are essentially trying to provide context and perspective, but also a sense of understanding and anticipation. By knowing the mechanism, we hope to look for that mechanism, or symptoms of it, in observations in order to anticipate CI. We also hope to be able to identify failure modes.

In speaking with forecasters over the last few weeks, there is a general feeling that it is very difficult to decide both when to accept and when to reject the model guidance. The models don't have to be perfect in individual fields (correct values or low RMS error) but rather just need to be relatively correct (errors can cancel). How can we realistically predict model success or model failure? Can we predict when forecasters will get this assessment wrong?

Adding Value

Today the CI (convective initiation) forecast team opted to forecast for northern Nebraska, much of South Dakota, southern and eastern North Dakota, and far west-central Minnesota for the 3-hr window of 21-00 UTC. The general setup was one with an anomalously deep trough ejecting northeast over the intermountain West. Low-level moisture was not particularly deep, as a strong blocking ridge had persisted over the southern and eastern United States for much of the past week. With that said, the strength of the ascent associated with the ejecting trough, the presence of a deepening surface low, and a strengthening surface front were such that most numerical models insisted that precipitation would break out across the CI forecast domain. The $64,000 question was, "Where?"

One model in particular, the High-Resolution Rapid Refresh (HRRR), insisted that robust storm development would occur across central and northeastern South Dakota during the late afternoon hours. It just so happened that this CI episode fell outside the CI team's forecast of a "Moderate Risk" of convective initiation. As the CI forecast team pored over more forecast information than any single individual could possibly retain, we could not make sense of how or why the HRRR was producing precipitation where it was. The environment would (should?) be characterized by decreasing low-level convergence as the low-level wind fields responded to the strengthening surface low to the west. Furthermore, the surface front (and other boundaries) were well removed from the area. Still, several runs of the HRRR insisted storms would develop there.

It's situations like this where humans can still improve upon storm-scale numerical models. By monitoring observations, and using the most powerful computers in existence (our brains), humans can add value to numerical forecasts. Knowing when to go against a model, and knowing when it is important to worry about the nitty-gritty details of a forecast, are traits that good forecasters have to have. Numerical forecasts are rapidly approaching the point where, on a day-to-day basis, humans are hard pressed to beat them. And, in my opinion, forecasters should not be spending much time trying to determine whether the models are wrong by 1 degree Fahrenheit for afternoon high temperatures in the middle of summer in Oklahoma. Even if the human is correct and improves the forecast, was there much value added? Contrast this with a forecaster improving the forecast by 1 F when dealing with temperatures around 31-33 F and a precipitation forecast. In this case the human can add a lot of value to the forecast. Great forecasters know when to accept numerical guidance, and when there is an opportunity to improve upon it (and then actually improve it). Today, that's just what the CI forecast team did. The HRRR was wrong in its depiction of thunderstorms developing in northeast South Dakota by 00 UTC (7 PM CDT), and the humans were right...

...and as I write this post at 9:30 PM CDT, a lone supercell moves slowly eastward across northeastern South Dakota. Maybe the HRRR wasn't as wrong as I thought...