I am really behind on the blog posts. Last week presented some challenges, especially for severe storms down in south Texas. We had a few days where cutoff lows approaching south Texas provided sufficient vertical shear and ample moisture and instability. The setup was favorable, but our non-meteorological limitation was the border with Mexico: we have neither severe storm reports nor radar coverage in Mexico. Forecasting near a border like this also imposes a spatial specificity problem. In most cases there is room for error and uncertainty, especially with longer (16 hr) spatial forecasts of severe weather. On one particular day the ensemble probabilities were split: one ensemble favored the US with extension into Mexico, one favored Mexico with extension into the US, and a third was further northwest, split unevenly across the two countries.
So the challenge quickly becomes all about specificity: where do you put the highest probabilities, and where are the uncertainties large (i.e., on which side of the border)? The evolution of convection also quickly comes into question: as you anticipate where the most reports might be (where will storms be when they are in the most favorable environment?), you must also account for if and when storms will grow upscale, how fast the resulting system will move, and whether it will remain favorable for generating severe weather.
We have framed this in previous experiments as: "Can we reliably and accurately draw what the radar will look like in 1, 3, 6, 12, etc. hours?" This aspect in particular is what makes high-resolution guidance valuable: it is precisely a tool that offers a depiction of what the radar will look like. Furthermore, an ensemble of such guidance offers a whole set of "what if" scenarios. The idea is to cover the phase space so that the ensemble has a higher chance of depicting the observations. This is also why the ensemble mean tends to verify better (for some variables) than any individual member of an ensemble.
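Why the ensemble mean beats any single member is easy to demonstrate with synthetic data. The sketch below is purely illustrative (the "truth" field and the independent member errors are made up, and real member errors are not fully independent): averaging cancels the uncorrelated part of the errors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical smooth truth plus independent errors for 27 members.
truth = np.sin(np.linspace(0, 2 * np.pi, 100))
members = truth + rng.normal(0.0, 1.0, size=(27, 100))

def rmse(forecast, truth):
    """Root-mean-square error of a forecast against the truth."""
    return float(np.sqrt(np.mean((forecast - truth) ** 2)))

member_rmses = [rmse(m, truth) for m in members]
mean_rmse = rmse(members.mean(axis=0), truth)
# Averaging shrinks the independent error by ~1/sqrt(27), so the
# ensemble mean beats every individual member here.
```

This only holds for variables where averaging makes physical sense; for intermittent fields like reflectivity the mean smears out the very features we care about, which is one reason spaghetti and probability products exist.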
Utilizing all these members of an ensemble can become overwhelming. To cope with this onslaught (especially when you have 3 ensembles with 27 total members), we create so-called spaghetti diagrams. These typically involve proxy variables for severe storms: model output that can be correlated with the severe phenomenon we are forecasting. This year we have been looking at simulated reflectivity, hourly-maximum (HM) storm updrafts, HM updraft helicity, and HM wind speed. Further, given the number of ensembles, we have so-called "buffet diagrams" where each ensemble is color-coded but each and every member is depicted. We have also focused heavily on probabilities for each of the periods we have been forecasting.
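The simplest way these member fields become probabilities is the fraction of members exceeding a threshold at each grid point. A minimal sketch under assumed inputs (the 27-member "updraft helicity" grid and the threshold are invented here; the experiment's actual products also involve neighborhoods and smoothing):

```python
import numpy as np

def ensemble_exceedance_prob(fields, threshold):
    """Fraction of members exceeding `threshold` at each grid point.

    fields: array of shape (n_members, ny, nx).
    """
    fields = np.asarray(fields)
    return (fields > threshold).mean(axis=0)

# Toy 27-member field on a 50x50 grid (values are made up).
rng = np.random.default_rng(1)
uh = rng.gamma(2.0, 20.0, size=(27, 50, 50))
prob = ensemble_exceedance_prob(uh, 75.0)  # e.g. UH > 75 m^2/s^2
```

With 27 members the raw probabilities are quantized in steps of 1/27, which is part of why the spaghetti view of the individual members remains useful alongside the probability field.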
In this case all the probabilities are somewhat uncalibrated. Put another way, the exact values of the probabilities do not correspond directly to what we are forecasting, nor have we worked out how to map them from the model world to the real world. We do have calibrated guidance for one ensemble, but not for the other two. It turns out you need a long data set to perform calibration for rare-event forecasting like severe weather.
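Calibration here means comparing forecast probabilities against observed event frequencies and remapping one to the other. A minimal sketch of the first step, a reliability table (the forecasts and outcomes below are hypothetical; the sparse-bin problem it exposes is exactly why rare events need a long record):

```python
import numpy as np

def reliability_table(forecast_probs, outcomes, n_bins=10):
    """Observed event frequency within each forecast-probability bin.

    A calibration map sends each raw probability to the observed
    frequency of its bin; bins with few rare-event cases are noisy.
    """
    p = np.asarray(forecast_probs, dtype=float)
    y = np.asarray(outcomes, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p, edges) - 1, 0, n_bins - 1)
    obs_freq = np.full(n_bins, np.nan)
    for b in range(n_bins):
        hit = idx == b
        if hit.any():
            obs_freq[b] = y[hit].mean()
    return edges, obs_freq

# Hypothetical overconfident forecasts: 95% probabilities verify half the time.
probs = np.array([0.05, 0.05, 0.05, 0.95, 0.95, 0.95, 0.95])
obs = np.array([0, 0, 0, 1, 1, 0, 0])
edges, freq = reliability_table(probs, obs, n_bins=2)
```

With a table like this, a 0.95 model-world probability would be remapped to the 0.5 it actually verifies at; with only a handful of severe-weather days per bin, that remapping is far too noisy to trust, hence the long data set requirement.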
Let's come back to the forecasts. Given that each ensemble had a different solution, it was time to examine whether we could discount any of them given what we thought was the likely scenario. We decided to remove one of the ensembles from consideration. The factors that led to this decision were an area of convection that was somewhat displaced relative to observations prior to forecast time, and an otherwise similar evolution of convection. We decided to put some probabilities in the Big Bend area of Texas to account for early and ongoing convection. This turned out to be a relatively sound decision.
This process took about 2 hours, and we didn't really dig into the details of any individual model with complete understanding. Such are the operational time constraints. There was much discussion on this day about evolution: part of it was the upscale growth (which occurred), but also whether that convection produced any severe reports. Since the MCS that formed was almost entirely in Mexico, we won't know whether severe weather occurred. Just another day in the HWT.
Extending the conversation about real-time high-resolution convection-allowing modeling.
Wednesday, May 16, 2012
Monday, May 07, 2012
Getting started
Today was an interesting day, as we made a joint decision to pick the domain where we would collectively issue our forecasts. It was decided the clean-slate CI and severe domain would be in south Texas. According to the models, this area could produce multiple types of severe weather (pulse severe outside the stronger flow, and a more organized threat farther north in and behind the frontal zone, where the flow and shear were stronger) as well as multiple triggers for CI (the cold front moving south, the higher terrain from Mexico into NM, and potentially the sea breeze near Houston).
It was increasingly clear that adding value by moving from coarse to high temporal resolution is difficult because of how accurate we are requiring the models to be. The model capability may be good, simulating the correct convective mode and evolution, but getting that result at the right time and in the right place still determines the goodness of the forecast. So no matter what kind of spatial or temporal smoothing we apply to derive probabilities, we are still at the mercy of processes in the model that can be early or late, and thus displaced, or displaced and increasingly incorrect in timing. This is not new, mind you, but it underscores the difference between capability and skill.
In the forecast setting, with operational timeliness requirements, there is little room for capability alone. This is not to say that such models don't get used; it just means they have little utility. Operational forecasters are skilled with the available guidance, so you can't just put models of unknown skill in their laps and expect immediate high impact (value). The strengths and weaknesses need to be evaluated first. We do this in the experiment by relating the subjective impressions of participants to objective skill score measures.
And we do critically evaluate them. But let me be clear: probabilities never tell the whole story. The model processes can be just as important in generating forecaster confidence in model solutions, because the details can be used as evidence to support or refute processes that can be observed. Finding clues for CI is rather difficult because the boundary layer is the least well observed. We have surface observations, which can be a proxy for boundary layer processes, but not everything that happens in the boundary layer happens at the surface.
A similar situation holds for the severe weather component. We can see storms by interrogating model reflectivity, but large reflectivity values are not highly correlated with severe weather. We don't necessarily even know whether the rotating storms in the model are surface based, which would pose a higher threat than, say, elevated strong storms. Efforts to use additional fields as conditional proxies alongside the severe variables are underway. These take time to evaluate and refine before we can incorporate them into probability fields. Again, these methods can be used to derive evidence that a particular region is favored, or not, for severe weather.
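A toy version of such a conditional proxy: flag rotating-storm points only where a second field suggests the storm is surface based. The fields and thresholds below are hypothetical assumptions, purely to show the shape of the idea (updraft helicity masked by surface-based CAPE):

```python
import numpy as np

def conditional_severe_proxy(uh, sbcape, uh_thresh=75.0, cape_thresh=500.0):
    """Flag grid points with strong rotation AND surface-based instability.

    Thresholds are illustrative assumptions, not the experiment's values.
    """
    uh = np.asarray(uh)
    sbcape = np.asarray(sbcape)
    return (uh >= uh_thresh) & (sbcape >= cape_thresh)

# Toy 2x2 grid: only the point with both high UH and high SBCAPE is flagged.
uh = np.array([[100.0, 100.0], [10.0, 10.0]])
sbcape = np.array([[1500.0, 100.0], [1500.0, 100.0]])
flags = conditional_severe_proxy(uh, sbcape)
```

The evaluation burden mentioned above is in choosing those thresholds and fields so the conditioned proxy actually discriminates severe from non-severe cases, not in the masking itself.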
Coming back to our forecast for today there was evidence for both elevated storms and surface based organized storms, and evidence to suggest that the cold front may not be the initiator of storms even though it was in close proximity. We will verify our forecasts in the morning and see if we can make some sense out of all the data, in the hopes of finding some semblance of signal that stands out above the noise.
2012 HWT-EFP
Today is the first official day of the Hazardous Weather Testbed Experimental Forecast Program's Spring Experiment. We will have two official desks this year: Severe and Convection Initiation (CI). Both desks will explore the use of high-resolution convection-permitting models in making forecasts. On the severe side, these include the total severe storm probabilities of the Day 1 1630 UTC convective outlook plus 3 forecast periods similar to the enhanced thunder product (20-00, 00-04, and 04-12 UTC), while the CI desk will make forecasts of CI and convection coverage for 3 four-hour periods (16-20, 20-00, and 00-04 UTC).
We have 3 ensembles that will be used heavily: the so-called Storm Scale Ensemble of Opportunity (SSEO; 7 members, including the NSSL-WRF, the NMM-B nest, and the hi-res window runs, with 2 time-lagged members), the AFWA ensemble (Air Force, 10 members), and the SSEF (CAPS, 12 members).
We will be updating throughout the week as events unfold (not necessarily in real time) and will try to put together a week in review. Let the forecasting begin.
Thursday, May 03, 2012
Data Assimilation Stats
I am debugging and modifying code to complement some of our data assimilation (DA) evaluations. In recent years, efforts have been made to provide higher temporal resolution of composite reflectivity. I wanted to take a more statistical, visualization-oriented approach to these evaluations. One way to do that is to data-mine using object-based methods, in this case storm objects. I developed an algorithm to identify storms in composite reflectivity using a double-area, double-threshold method with the typical spread-growth approach. The 15-minute temporal resolution is good enough to identify what is going on at the beginning of the simulations, when one model has DA and the other does not; everything else is held constant.
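The double-threshold idea can be sketched as follows: grow connected regions from every pixel above a low reflectivity threshold, and keep only the regions containing at least one pixel above a high threshold. This is a simplified stand-in for the actual algorithm, with assumed thresholds and 4-connectivity:

```python
import numpy as np
from collections import deque

def storm_objects(refl, low=35.0, high=50.0):
    """Label storm objects via a double-threshold method (simplified sketch).

    4-connected regions of pixels >= `low` are kept only if they contain
    at least one pixel >= `high` (seed on cores, spread to the envelope).
    """
    ny, nx = refl.shape
    labels = np.zeros((ny, nx), dtype=int)
    next_label = 0
    for i in range(ny):
        for j in range(nx):
            if refl[i, j] >= low and labels[i, j] == 0:
                # Flood-fill the candidate region from this pixel.
                queue = deque([(i, j)])
                labels[i, j] = -1  # visited, label pending
                pixels = []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        yy, xx = y + dy, x + dx
                        if (0 <= yy < ny and 0 <= xx < nx
                                and refl[yy, xx] >= low
                                and labels[yy, xx] == 0):
                            labels[yy, xx] = -1
                            queue.append((yy, xx))
                if max(refl[y, x] for y, x in pixels) >= high:
                    next_label += 1
                    for y, x in pixels:
                        labels[y, x] = next_label
                else:
                    for y, x in pixels:
                        labels[y, x] = -2  # region fails the high threshold
    labels[labels < 0] = 0
    return labels

# Toy grid: one object with a >= 50 dBZ core, one weak region that is discarded.
refl = np.zeros((10, 10))
refl[1:4, 1:4] = 40.0
refl[2, 2] = 55.0
refl[6:8, 6:8] = 40.0
objects = storm_objects(refl)
```

Once objects are labeled like this, per-object attributes (maximum reflectivity, pixel count) fall out directly, which is what the statistics below are built on.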
Among the variables extracted are maximum composite reflectivity, maximum 1 km reflectivity, and pixel count for each object at every 15-minute output time. To make the upcoming plot more readable I have taken the natural log of the pixel count (so a value around 4 equates to roughly 55 pixels). The plot is a conditional box plot of ln(pixel count) by model time step, with 0 being 0000 UTC and 24 being 0600 UTC. I have used a technique called linked highlighting to show the model run using data assimilation in an overlay (top). Note that the model without DA does not initiate storms until 45 minutes into the simulation (bottom). The takeaway is the scale at which storms are assimilated for this one case (over much of the model domain): at the start time the median is 4.2 (roughly 67 pixels), while when the run without DA initiates storms they are on the low end, with a median of 2.6 (about 13 pixels).
This is one aspect we will be able to explore next week. Once things are working well, we can analyze the skill scores from this object based approach.
Saturday, April 14, 2012
Verification: Updated storm reports
Here are the NSSL (top), SSEO_0000 (middle), and SSEO_1200 (bottom) plots showing the model reports overlaid with the observed reports so far. The SSEO lacks hail and wind. Red dots indicate current observed storm reports. Black contours can be compared directly to the shaded model fields. The ensemble plots have 7 and 4 members, respectively. All go through 1200 UTC tomorrow morning.
UPDATE 1: I have rerun the graphics and they are displayed below.
NSSL-WRF (only looking at the top panel of "test") compares favorably to the observations, at least in this smoothed representation. It does appear to be shifted too far east and south (the slight offset in the outer contours relative to the shading). But it did not capture the concentrated area of tornadoes in central KS. Despite "looking good" I think the skill scores would be somewhat low. I will try to run the numbers this week for all the models displayed so that each individual model can be compared and we can see which one, if any, stood out from the pack.
The SSEO ensembles are below:
High Risk: Uncertainty
Well, the atmosphere is showing her cards slowly but surely. The big questions this morning were initiation and coverage in OK, especially along the dryline. Convection-allowing model guidance was flip-flopping in every way possible. Let's remind ourselves that this is normal: the models only marginally resolve storms, let alone the initiation process.
Even given the recent outbreaks, convection-allowing models have a hard time predicting supercells, especially when the observed storms remain discrete. Models have all kinds of filters to get rid of computational noise, and it is likely partly this noise that contributes to the initiation of small storms. This is speculation, but it is a good first guess. The evidence comes from monitoring every time step of the model and seeing how long storms last: small storms do appear in the model, remain small, and are thus short-lived. To be maintained, I argue, they must grow to a scale large enough for the model to fully resolve them.
Back to the uncertainty today. Many 0000 UTC hi-res models were not that robust with the dryline storms, and even at 1200 UTC they were not that robust, except for 1 or 2. Even the SREF that I saw yesterday via Patrick Marsh's weather discussion was a potpourri of model solutions dependent on dynamic core.
So now that the dryline appears to be initiating storms, the question is how many. Given the current observations, your guess is as good as mine. A slug of moisture (dewpoints in the mid to upper 60s) is sitting in western OK, in and around where yesterday's supercells dumped upwards of 2" of rain, while temps warm into the 80s. That is going to mean low LCL heights throughout the state. The dryline itself is just east of Liberal, KS and west of Gage, OK. Good clearing is now occurring in western OK, though a touch of cirrus is still drifting through. Much of the low cloud has cleared, and a cumulus field stretches along the dryline down to Lubbock. Clearly the dryline is capable of initiating storms, and the abundant moisture (favorable environment) is not going to be at issue today.
SSEO
Here is another 24-hour graphic from the SSEO. This shows the probabilities of rotating storms, including a spaghetti overlay of all 7 members' UH tracks. This will get your attention. Courtesy of Greg Carbin, WCM, SPC.
The NSSL-WRF is part of this ensemble. The general idea I am extracting from this graphic is that there will potentially be multiple supercell corridors (and possibly tornado corridors). The ensemble suggests every major city in the Plains is under threat. Talk about potential hyperbole!
I know of one web page that has some information regarding these members if you want to see more detail from each member:
http://www.erh.noaa.gov/rah/hiresnwp/hi.res.nwp.compare.ncep.nam.nest.18.utc.php
UPDATE 1:
Timing for SSEO UH (remember 7 members):
The ramp-up in the hi-res guidance starts at 2200 UTC, indicated by the tallest histogram bar in the time plot. The largest UH values occur in the darker blue to violet shades between 0000 and 0600 UTC. The threat ramps back up after that, too.
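A timing plot like this can be boiled down to a per-hour exceedance count across members. A sketch under assumed inputs (the member/time/grid dimensions and the UH threshold below are invented for illustration):

```python
import numpy as np

def uh_timing_counts(uh, threshold=25.0):
    """Grid points per output time where any member's UH exceeds threshold.

    uh has shape (n_members, n_times, ny, nx); plotting these counts
    against time shows when the threat ramps up and back down.
    """
    exceed = np.asarray(uh) > threshold
    return exceed.any(axis=0).sum(axis=(1, 2))

# Toy tracks for 2 members over 3 output times on a 4x4 grid.
uh = np.zeros((2, 3, 4, 4))
uh[0, 1, 0, 0] = 30.0
uh[1, 1, 0, 1] = 30.0
uh[0, 2, 2, 2] = 30.0
counts = uh_timing_counts(uh)
```

Taking `any` over members before counting treats overlapping member tracks as a single threat area; summing over members instead would weight the count by agreement, which is the other common choice.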
UPDATE 2: 4 more members became available from 1200 UTC. The ramp-up starts at 1900 UTC now. But the dryline remained dry in these runs. That does NOT jibe with current observational trends.