NSSL/SPC Spring Forecasting Experiment Blog: 05/17/2009

Friday, May 22, 2009

My impressions from the week May 11-15th

Finally got a chance to type some words (I hope not too many) after getting back to the UK.

First I would like to say thank you for a very enjoyable week. I feel that I've come away having learned a great deal about the particular difficulties of predicting severe convection in the United States, and having gained more insight into the challenges that new storm-permitting models are bringing. I was left with huge admiration for the skill of the SPC forecasters, particularly in quickly synthesising such a large volume of diverse information (observational and model) to produce impressive forecasts. It was also good to see how an understanding of the important (and sometimes subtle) atmospheric processes and conceptual models is used in the decision-making process.
I was also struck by the wealth of storm-permitting numerical models that were available and how remarkably good some of the model forecasts were, and can appreciate the amount of effort that is required to get so many systems and products up, running and visible.

One interesting discussion we touched on briefly was how probabilistic forecasts should be interpreted and verified. The issue raised was whether it is sensible to verify a single probabilistic forecast.
So if the probability of a severe event is say 5% or 15% does that mean the forecast is poor if there were no events recorded inside those contours? It could be argued that if the probabilities are as low as that, then it is not a poor forecast if events are missed, because in a reliability sense we would expect to miss more than we get. But the probabilities given were meant to represent the chance of an event occurring within 25 miles of a location, so if the area is much larger than that it implies much larger probabilities of something occurring somewhere within that area. So it may be justifiable to assess the forecast of whether something occurred within the warning area as if it is almost deterministic, which may be why it seemed intuitively correct to do it like that in the forecast assessments. The problem then is that the verification approach does not really assess what the forecast was trying to convey.
An alternative is to predict the probability of something happening within the area enclosed by a contour (rather than within a radius around points inside the contour), which would then be less ambiguous for the forecast assessment. The problem then is that the probabilities will vary with the size of the area as well as the perceived risk (larger area = larger probability), which means that for contoured probabilities, any inner contours that are supposed to represent a higher risk (15% contour inside 5% contour), can't really represent a higher risk at all if they cover a smaller area (which they invariably will!). So at the end of all this I'm still left pondering.
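To put rough numbers on the area-size effect, here is a back-of-envelope sketch. It assumes (unrealistically) that a contoured area can be split into a number of independent 25-mile neighbourhoods that all share the same point probability. The function name and the neighbourhood counts are made up for illustration and are not taken from any SPC procedure.

```python
def prob_any_event(p_point, n_neighbourhoods):
    """Chance of at least one event somewhere in an area.

    Assumes the area consists of n independent neighbourhoods,
    each with the same probability p_point of seeing an event.
    Real severe weather is spatially correlated, so this
    overstates the effect; it only shows the direction of it.
    """
    return 1.0 - (1.0 - p_point) ** n_neighbourhoods


# Hypothetical contours: a large 5% area and a small 15% area inside it.
large_low_risk = prob_any_event(0.05, 20)
small_high_risk = prob_any_event(0.15, 5)

print(f"5% contour, 20 neighbourhoods: {large_low_risk:.2f}")
print(f"15% contour, 5 neighbourhoods: {small_high_risk:.2f}")
```

Under these assumptions the large 5% area has about a 64% chance of at least one event, while the small 15% area has about 56%, which is the ambiguity described above: the inner "higher risk" contour ends up with the lower area probability.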

The practically perfect probability forecasts of updraft helicity were impressive for the forecast periods we looked at. Even the forecasts that weren't so good seemed to appear better when the information was presented in that way and they seemed to enclose the main areas of risk very well.
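For readers who have not met the idea, a practically perfect field is usually built by smoothing point events (severe reports, or grid points where updraft helicity exceeds a threshold) with a Gaussian kernel. The sketch below is a minimal illustration of that smoothing only. The grid size, the value of `sigma` and the function name are assumptions chosen for the example, not the settings used in the experiment.

```python
import math


def practically_perfect(events, sigma):
    """Smooth a 0/1 grid of events with a normalised 2-D Gaussian.

    events: list of rows, each a list of 0/1 flags.
    sigma:  kernel width in grid lengths (an assumed value here).
    Kernel weight that falls outside the grid is simply dropped.
    """
    ny, nx = len(events), len(events[0])
    radius = int(math.ceil(3 * sigma))
    kernel = {
        (dy, dx): math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
    }
    total = sum(kernel.values())

    field = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            if not events[j][i]:
                continue
            # Spread this event over its neighbours.
            for (dy, dx), weight in kernel.items():
                jj, ii = j + dy, i + dx
                if 0 <= jj < ny and 0 <= ii < nx:
                    field[jj][ii] += weight / total
    return field


# One hypothetical event in the middle of an 11 x 11 grid.
events = [[0] * 11 for _ in range(11)]
events[5][5] = 1
field = practically_perfect(events, sigma=1.5)
print(f"value at the event: {field[5][5]:.3f}")
```

The smoothing turns isolated points into a broad envelope, which may be part of why even the weaker forecasts looked better when presented this way.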

The character of the 4km and 1km forecasts seemed to be similar to what we have seen in the UK (although we don't get the same severity of storms of course). The 1km could produce a somewhat unrealistic speckling of showers before organising on to larger scales. Some of the speckling was precipitation produced from very shallow convection in the boundary layer below the lid and was spurious. (We've also seen in 1km forecasts what appear to be boundary-layer rolls producing rainfall when they shouldn't, but the cloud bands do verify against satellite imagery even though the rainfall is wrong.)
The 4km models appeared to have a delay in initiation in weakly forced situations (e.g. 14th May) which wasn't apparent in the strongly forced cases (e.g. 13th May). It appeared to me that the 1km forecasts were more likely to generate bow-echoes that propagate ahead (compared to 4km) and on balance this seemed overdone. There was also an occasion from the previous Friday when the 1km forecast generated an MCV correctly when the other models couldn't, so perhaps it indicates that more organised behaviour is more likely at 1km than 4km - and sometimes this is beneficial and sometimes it is not. It implies that there may be general microphysics or turbulence issues across models that need to be addressed.

It was noticeable that the high-resolution models were not being interpreted literally by the participants - in the sense that it was understood that a particular storm would not occur exactly as predicted, and it was the characteristics of the storms (linear or supercell etc) and the area of activity that were deemed most relevant. Having an ensemble helped to emphasise that any single realisation would not be exactly correct. This is reassuring, as a danger might be that kilometre-scale models are taken too much at face value because the rainfall looks so realistic (i.e. just like radar).

It seemed to me that the spread of the CAPS 4km ensemble wasn't particularly large for the few cases we looked at - the differences were mostly local variability, probably because there wasn't much variation in the larger-scale forcing. The differences between different models seemed greater on the whole. The members that stood out as being most unlike the rest were the ones that developed a faster-propagating bow-echo. This was also a characteristic of the 1km model and was maybe a benefit of having the 1km model as it did give a different perspective, or added to the confusion, however you look at it! One of the main things that came up was the sheer volume of information that was available and the difficulty of mentally assimilating all that information in a short space of time. The ensemble products were found to be useful I thought - particularly for guidance in where to draw the probability lines. However, it was thought too time-consuming to be able to investigate the dynamical reasons why some members were doing one thing and other members something else (although forecaster intuition did go a long way). Getting the balance between a relevant overview and sufficient detail is tricky I guess and won't be solved overnight. Perhaps 20 members were too many.

The MODE verification system gave intuitively sensible results and definitely works better than traditional point-based measures. It would be very interesting to see how MODE statistics compared with the human assessment of some of the models over the whole experiment.

One of the things I came away with was that the larger-scale (mesoscale rather than local or storm scale) dynamical features appear to have played a dominant role whatever other processes may also be at work. The envelope of activity is mostly down to the location of the fronts and upper-level troughs. If they are wrong then the forecast will be poor whatever the resolution. An ensemble should capture that larger-scale uncertainty.

Thanks once again. Hope the rest of the experiment goes well.
Nigel Roberts

Wednesday, May 20, 2009

Wednesday

Wednesday morning we talked about severe reports. Is "practically perfect" the way to go? How do we deal with people-sparse regions? Can or should we add an uncertainty to the location and time and veracity of each report?

With our current capability, we can't reliably forecast whether a storm will be a wind or hail producer. I think that is what John Hart suggested.

Yesterday, the forecast ensemble was too eager to produce high updraft helicity (severe weather) for the first forecast period, 20-0Z, but the 0-4Z period was forecast almost perfectly. Ryan said UH is usually better than surface wind and hail (graupel).

The MODE area ratio is not as useful as area "bias". Ratio is small-over-big and doesn't tell you if the forecast is biased high or low.
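A two-line illustration of why the ratio hides the sign of the error. The function names and the areas are made up for the example and are not MODE output.

```python
def area_ratio(fcst_area, obs_area):
    """Small-over-big ratio: always between 0 and 1."""
    return min(fcst_area, obs_area) / max(fcst_area, obs_area)


def area_bias(fcst_area, obs_area):
    """Forecast-over-observed: above 1 is too big, below 1 too small."""
    return fcst_area / obs_area


# Hypothetical object areas in grid squares.
for fcst, obs in [(200, 100), (50, 100)]:
    print(fcst, obs, area_ratio(fcst, obs), area_bias(fcst, obs))
```

Both cases give a ratio of 0.5, but the bias is 2.0 for the over-forecast and 0.5 for the under-forecast.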

I summarized bias, GSS, and MODE results for yesterday. For CSI, the radar assimilation run jumped out to an early lead, but joined the control run at near-zero after 3 hours. MODE had some spotty matches, but no clear winner.
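For anyone unfamiliar with the traditional scores mentioned here, this sketch computes frequency bias, CSI and GSS (equitable threat score) from a 2x2 contingency table using the standard textbook definitions. The counts are invented for illustration and are not the experiment's results.

```python
def contingency_scores(hits, false_alarms, misses, correct_negatives):
    """Frequency bias, CSI and GSS from a 2x2 contingency table."""
    n = hits + false_alarms + misses + correct_negatives
    bias = (hits + false_alarms) / (hits + misses)
    csi = hits / (hits + false_alarms + misses)
    # Hits expected from a random forecast with the same event frequencies.
    hits_random = (hits + false_alarms) * (hits + misses) / n
    gss = (hits - hits_random) / (hits + false_alarms + misses - hits_random)
    return {"bias": bias, "csi": csi, "gss": gss}


# Hypothetical grid-point counts for one forecast hour.
scores = contingency_scores(hits=30, false_alarms=20, misses=10,
                            correct_negatives=940)
print(scores)
```

With these made-up counts the bias is 1.25 (over-forecast), CSI is 0.5 and GSS is a little lower than CSI because some hits would be expected by chance.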

Dave Ahijevych

Tuesday

We talked about Monday evening weather in Montana and how the forecasts went.

NMM had a high false alarm rate but was the only model to correctly predict the severe storm in ID. ARW missed the storm by 200km to the southeast. Another comparison we always make here at the HWT is the 0Z vs the 12Z runs of the NMM. For this day, I think the 12Z NMM was much better than the 0Z. Steve thought it was "somewhat" better. The 12Z run had fewer false alarms in central MT and captured the ID storm area better.

The probability matched mean is an interesting way of summarizing the ensemble of forecasts. It has the spread of the ensemble, but the sharpness of an individual run. I'm not sure but I was told to check out Ebert/McBride for a reference on this.
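As it is usually described, the probability matched mean takes the locations from the ensemble mean and the values from the pooled distribution of all members. The sketch below follows that description on flattened fields. The function name and the tiny three-member ensemble are invented for illustration, and operational versions may differ in detail (for example in how values are picked from the pooled list).

```python
def probability_matched_mean(members):
    """Probability matched mean of equally sized, flattened fields.

    members: list of lists, one list of grid-point values per member.
    """
    n_mem = len(members)
    n_pts = len(members[0])

    # Ensemble mean supplies the spatial pattern (where the maxima are).
    mean = [sum(m[k] for m in members) / n_mem for k in range(n_pts)]

    # Pool every value from every member, largest first, then keep
    # every n_mem-th value so that n_pts values remain.
    pooled = sorted((v for m in members for v in m), reverse=True)
    matched = pooled[::n_mem]

    # Hand the largest kept value to the grid point with the largest
    # ensemble mean, the second largest to the second, and so on.
    order = sorted(range(n_pts), key=lambda k: mean[k], reverse=True)
    pmm = [0.0] * n_pts
    for rank, k in enumerate(order):
        pmm[k] = matched[rank]
    return pmm


# Three hypothetical members that put the heavy rain in different places.
members = [[0, 0, 10, 2], [0, 8, 0, 2], [6, 0, 0, 2]]
print(probability_matched_mean(members))
```

In this toy case the plain ensemble mean peaks at about 3.3, whereas the probability matched mean keeps the member maximum of 10 at the point where the ensemble mean is largest.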

The Monday forecast was pretty insignificant, so we also talked about the high-end derecho on Friday May 8. There were some differences between the 1 and 4-km CAPS solutions of this event. (I forget what they were).

David Ahijevych

Monday

Here's my summary of yesterday's (Monday's) activity at the HWT.

This is a very quiet week for mid-May.

The DTC has been well-received. I presented Jamie's verification .ppt and gave it away to several interested people. The need for objective verification is great. There are so many models and so little time to analyze everything after the fact. Mike Coniglio led two discussions of Friday's MODE verification output. The CAPS model without radar data assimilation lagged behind the CAPS model with radar data assimilation. MMI was similar for the two models, but the MODE centroid distance was a distinguishing factor. The model that lagged behind had greater centroid distance. This wouldn't have been possible to quantify with conventional verification metrics.
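To show what centroid distance measures, here is a minimal sketch. It assumes the forecast and observed objects have already been identified and are given as lists of grid points. The function names, the objects and the 4 km grid spacing are assumptions for the example, and this is not how MET/MODE is implemented internally.

```python
import math


def centroid(points):
    """Mean (x, y) position of an object's grid points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def centroid_distance(fcst_obj, obs_obj, grid_spacing_km=4.0):
    """Distance in km between the centroids of two objects."""
    fx, fy = centroid(fcst_obj)
    ox, oy = centroid(obs_obj)
    return math.hypot(fx - ox, fy - oy) * grid_spacing_km


# Hypothetical observed object, and a forecast copy displaced by
# 3 grid points in x and 4 in y (the same storm in the wrong place).
obs = [(x, y) for x in range(10, 14) for y in range(20, 23)]
fcst = [(x + 3, y + 4) for x, y in obs]
print(centroid_distance(fcst, obs))
```

The two objects here have identical size and shape, so an area-based score could not separate them, but the centroid distance of 20 km picks up the displacement directly.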

We also subjectively evaluated the Friday storms over the central U.S. The 0Z NMM had a false alarm storm in the morning that disrupted the afternoon forecast. The simulated squall line was much weaker than the observation. This was not as much of a problem with the NSSL model. The 12Z NMM was not a whole lot better with convective mode and individual storm evolution, but its 0-2 h and 6-12 h forecasts had better storm placement than the older 0Z NMM.

As an aside, ARW runs with Thompson microphysics have less intense simulated radar reflectivity than observed.

For Monday afternoon and evening's severe weather forecast, we chose Billings, MT as the center point. It was the only place to have the possibility of severe weather. We broke up into 2 teams and came up with a less than 5% chance. Two actual reports were northwest of our predicted zone in northern Idaho. Radar indicated some small storms in our predicted zone.

Dave Ahijevych

Model Evaluation Tools

I would like to thank all of the HWT personnel for a fun and interesting week - May 10-15. The experience was well worth it. How quickly I (being in the research community) have lost touch with the daily challenges that an operational forecaster faces. It was good to get back to those roots with a little hand analysis of maps!

I would like to thank you for engaging with the DTC and helping us to evaluate MET/MODE during the Spring Experiment. It is great to have eyes looking at this on a daily basis to give us some good feedback on how the tools are performing. It seemed that while I was there the participants were encouraged by the performance of MODE and its ability to capture objectively what forecasters felt subjectively. This is a great first step towards more meaningful forecast evaluations which, we hope, will ultimately feed back to improve overall forecasts by removing systematic biases.

Please feel free to visit the DTC's HWT page at: http://www.dtcenter.org/plots/hwt/

You were all great hosts. Thanks again!

Posted by Jamie W.

Monday, May 18, 2009

Recap of Week 2 from a Forecaster's Perspective

After spending a week at the HWT, I must say I'm encouraged to see how far the NWP world has come in recent years. For instance, in an effort to keep my mind occupied on my flight to Norman last week, I thought it would be neat to read a paper produced by the SPC on the Super Tornado Outbreak of 1974. If I remember correctly, the old LFM model had a model grid spacing of 190.5 km! Reading this and then coming to the HWT and seeing model output on a scale as low as 1 km was absolutely amazing in my opinion. This is a testament to all the model developers out there who work diligently on a daily basis to produce better models for forecasters in the field. If nothing else, the HWT opportunity made me realize and appreciate the efforts of the model developers more than I ever had previously.

Although these models can provide improved guidance on basic severe wx questions, such as convective mode and intensity, they only show output (simulated refl, updraft helicity, etc.) on a very small scale. If this is taken at face value, critical forecasting decisions can be made without having an adequate handle on the overall synoptic and mesoscale pattern. Thus, even with all the high resolution model output, one must still interrogate the atmosphere utilizing a forecast funnel methodology in an effort to develop a convective mode framework to work from. Sadly, if high resolution model output is taken at face value without any "behind the scenes" work beforehand, I can see many blown/missed forecasts as forecasters would be forecasting "blind." Many factors must be taken into account when developing a convective forecast and unfortunately just looking at the new high res model output will likely lead to more questions than answers. In order to answer these questions, a detailed analysis done beforehand can allow one to see why a particular model may be producing one thing as opposed to another. Looking back at some of the old severe wx forecasting handbooks, one thing remains clear: much can be learned about the developing synoptic/mesoscale patterns through pattern recognition. Some of the old bow echo/derecho papers (Johns and Hirt, 1987) and a whole list of others have reiterated the fact that much can be gained by recognizing the overall synoptic pattern. How many times last week were the models producing a bow type signature during the overnight hours? Situations like these commonly need deep vertical shear and unfortunately not much shear was available for organized cold pools when the H50 flow was only 5-10 knots. This is just one instance where having a good conceptual model in the back of your mind can assist in the forecasting process.

As for the models, more often than not, I was pleased by the 4-km AFWA runs. For the activity that developed on Tue (05/12), the 00/12 UTC AFWA runs had a better handle on the low-level moisture intrusion up the Palo Duro Canyon just SE of AMA. A supercell resulted which led to several wind/hail reports. A look back the following day at the Practically Perfect Forecast based on updraft helicity showed a bullseye centered over the area based on the AFWA output. This is more than likely a testament to different initial conditions as the AFWA utilizes the NASA LIS data. This can pay huge dividends for offices along the TX Caprock where these low-level moisture intrusions have been documented to assist in tornadogenesis across the canyon locations along with a backed wind profile (meso-low formation).

Posted by Chris G.

Sunday, May 17, 2009

Spring Experiment Week 3 Participants

The Spring Experiment organizers would like to welcome the following participants to Week 3 of the 2009 NSSL/SPC Spring Experiment:

Dave Ahijevych (NCAR/DTC, Boulder, CO)
Lance Bosart and Tom Galarneau (University at Albany-SUNY)
Geoff Manikin (NOAA/NWS/NCEP EMC, Camp Springs, MD)
Morris Weisman (NCAR, Boulder, CO)
Jon Zeitler (NOAA/NWS San Antonio/Austin, TX)
Jack Hales (NOAA/NWS/NCEP SPC)
John Hart (NOAA/NWS/NCEP SPC)
Jon Racy (NOAA/NWS/NCEP SPC)

About the NSSL Spring Experiment

The NOAA HWT Spring Forecasting Experiment is a yearly experiment that investigates the use of convection-allowing model forecasts as guidance for the prediction of severe convective weather. A variety of model output is examined and evaluated daily during the experiment and experimental severe weather forecasts are created and verified. The variety of model output allows us to explore different types of guidance, including products derived from both ensembles and deterministic forecasts.

The 2018 Spring Forecasting Experiment will be held from April 30th through June 1st in the HWT facility at the National Weather Center in Norman. The Experiment is scheduled to run Monday through Friday from 8am to 4pm. The Experiment will continue the focus on probabilistic forecast generation over shorter time periods than current Storm Prediction Center operational products. A major effort has been made to coordinate convection-allowing ensemble configurations between contributing agencies, resulting in a Community Leveraged Unified Ensemble (CLUE) containing 81 members with 3 km grid spacing. This ensemble will be used heavily in the forecast process and be used in verification exercises to compare different ensemble design strategies. More information about the unique CLUE members can be found in the 2017 Operations Plan.