It's been over a month since I completed most of the following analysis, which I've been planning to write up more concretely. This post is intended as a summary of the basic questions I've been trying to address, although there are some mathematical issues I've worked out that I won't cover here. First, a few basic facts and some terminology regarding Earth's atmosphere. If you're already familiar with lapse rate discussions you can skip the next few paragraphs.
Almost everywhere, as you go up from the surface, the temperature of the air decreases. Rare occasions when this is not true constitute temperature inversions. However, this familiar situation is actually an artifact of the way things work here on Earth's surface - there is nothing intrinsic in atmospheres that forces temperature to increase or decrease with altitude. Temperature starts to increase again once you go above around 10 miles altitude. The lower portion of the atmosphere, where the temperature declines with altitude, is called the troposphere. The altitude where temperature decline switches to increase roughly defines the tropopause, and the upper region where temperature increases again is the stratosphere.
Another demonstration that there's nothing intrinsically fundamental about temperatures declining with altitude in an atmosphere-like fluid is the behavior of Earth's oceans. Thanks to the thermohaline circulation, the deep ocean is much colder than the surface almost everywhere (effectively the deep oceans are most closely coupled to the polar surface waters, rather than to the equatorial ones), so for the oceans, unlike the atmosphere, temperature rises as you go up. On the other hand, under solid ground temperature goes up as you go deeper thanks to Earth's radioactively heated interior, i.e. temperature declines with height. So we have quite a mix of temperature gradients with altitude in our near vicinity - under ground (negative), under water (positive), in the troposphere (negative), and the stratosphere (positive). The explanation for each particular value depends on the details of heat flow in each case.
The decline in temperature with altitude in the troposphere is known as the "lapse rate". The heat flows responsible for Earth's tropospheric lapse rate come from the heating of Earth's surface by the sun (to which the atmosphere is largely transparent), the flow of this heat into the atmosphere, and then the subsequent radiation from the atmosphere into deep space. The stratosphere is above most of these heat flows, and is heated more directly by what it absorbs from the sun, allowing it to be warmer.
The lapse rate in the atmosphere is limited by stability considerations, due to the relationship between density, pressure, and temperature in a gas, and the decline in pressure with height caused by gravity. If the temperature drops too quickly with altitude you effectively have dense gas sitting on top of lighter gas, which quickly leads to convective turnover putting the colder, denser air on the bottom, and making the average temperature gradient more positive than you started with. Any lapse rate larger than a certain limit is unstable in this fashion. This limit is called the "adiabatic lapse rate", the temperature decrease you would get just by lowering the pressure on a parcel of air (with no heat flow in or out) in the same way pressure naturally decreases with altitude.
For air containing water vapor (or any other gas that might condense out and release latent heat) the lapse rate limit is lower once you reach an altitude where the temperature is cold enough for some of that vapor to condense. The reason is that when you lower the pressure for (and cool) a given parcel of "moist" air, some of the water vapor turns back to liquid water and releases its latent heat content, and that heat means the final temperature of the air parcel is warmer than under "dry" conditions. This "moist adiabatic lapse rate" is itself temperature-dependent since at warmer temperatures more water vapor condenses out for any given temperature drop, giving more warming. So in particular in Earth's tropics, where moist air abounds, this lower "moist adiabatic" limit largely prevails.
For Earth's atmosphere the "dry" adiabatic limit is 9.8 K/km, meaning that when there's no water vapor condensing out, temperatures can decrease by up to 9.8 degrees with each kilometer increase in altitude. The "moist" adiabatic limit depends on the specific humidity - this graph shows some of the details. For low specific humidity (over -20 C ice, say) it can come close to the dry 9.8 K/km. For saturated air at 35 degrees C, the moist adiabatic limit is about three times lower.
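For readers who want to check these numbers, here is a rough sketch using the standard textbook expressions for the dry and saturated adiabatic lapse rates. The constants, the Bolton-style fit for saturation vapor pressure, the 1000 hPa reference pressure, and all function names are my own assumptions for illustration, not taken from the graph referenced above.

```python
import math

# Assumed round-number physical constants (illustrative values).
G = 9.81      # gravitational acceleration, m/s^2
CP = 1004.0   # specific heat of dry air at constant pressure, J/(kg K)
RD = 287.0    # gas constant for dry air, J/(kg K)
LV = 2.5e6    # latent heat of vaporization of water, J/kg
EPS = 0.622   # ratio of molar masses, water vapor to dry air


def saturation_vapor_pressure_hpa(temp_c):
    """Magnus-type fit (Bolton form) for saturation vapor pressure over water, hPa."""
    return 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))


def dry_lapse_rate_k_per_km():
    """Dry adiabatic lapse rate g / c_p, converted to K/km."""
    return 1000.0 * G / CP


def moist_lapse_rate_k_per_km(temp_c, pressure_hpa=1000.0):
    """Saturated (moist) adiabatic lapse rate in K/km at the given temperature."""
    t = temp_c + 273.15
    es = saturation_vapor_pressure_hpa(temp_c)
    rs = EPS * es / (pressure_hpa - es)  # saturation mixing ratio, kg/kg
    numerator = 1.0 + LV * rs / (RD * t)
    denominator = CP + LV ** 2 * rs * EPS / (RD * t ** 2)
    return 1000.0 * G * numerator / denominator


if __name__ == "__main__":
    print("dry:", dry_lapse_rate_k_per_km())
    for temp_c in (-20.0, 0.0, 35.0):
        print(temp_c, moist_lapse_rate_k_per_km(temp_c))
```

Under these assumed constants the saturated rate at 35 degrees C comes out roughly a third of the dry rate, and the rate at -20 degrees C is much closer to the dry value, which is the qualitative picture described above.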
This temperature dependence of the moist adiabatic limit has a simple implication: any increase in surface temperature for "moist" regions of the planet will result in an even greater increase in temperature at any altitude that is currently close to the moist adiabatic curve, because that moist curve gets steeper as the surface temperature rises. Correspondingly, any decrease in surface temperature will result in a greater decrease in the temperature a few km up, because the moist curve flattens as surface temperature decreases. That is, any change (up or down) of surface temperature in Earth's tropical regions should be amplified - multiplied by a factor greater than one - in the middle of the troposphere.
There have been several papers looking into this question of tropical amplification, comparing observations with the expectation of amplified warming with altitude. RealClimate covered the issue in some detail back in 2007, and looked at it again last fall describing their more formal response to a paper by Douglass that claimed the amplification was absent in observations. Of course, since it's a natural feature of the atmosphere's handling of latent heat flow, models consistently show this amplification (whatever the cause of warming or cooling), and the latest IPCC report also compared the expected amplification with observations in section 126.96.36.199 (of IPCC AR4 WG1).
That same report also included several figures of temperature changes associated with model simulations, including Figure 9.1. This figure plotted modeled temperature changes for the 20th century with latitude and altitude relative to different realistic "forcings". The pattern of tropical tropospheric amplification (for all causes of warming and cooling) was evident in each case, though clearest for the forcing with the largest effects, from greenhouse gases. The associated text discussed the patterns (as well as the time-dependence of changes) as characteristic of the different forcings, allowing for attribution of temperature changes based on observations. In particular, the text pointed out the cooling of the stratosphere, a key part of the greenhouse-gas pattern, matches observations and is an important piece of evidence implicating greenhouse gases in 20th century warming.
It wasn't too long before this section on characteristic patterns or fingerprints and figure 9.1 was severely misinterpreted by none other than Viscount Monckton. In what appears to be the first use of the term in the fall of 2007, Monckton claimed that the models predicted a "hot spot" that was a unique fingerprint of greenhouse-gas warming, and not present in observations. Monckton's misunderstanding somehow resonated with a few of the usual bloggers, and the claim was echoed back and forth in a way that made mutual understanding very difficult to reach. To be clear, the term "hot spot" is never used by the IPCC, and I don't believe it was used by any of the scientists commenting on the issue prior to Monckton's misinterpretation. It's a misnomer in the first place, since the region claimed to be "hot" is in the coldest region of the atmosphere - the issue is the rate of change of temperature in response to forcings, not absolute value. More importantly, if Monckton's "hot spot" is interpreted as the same as the tropical tropospheric amplification issue discussed by Douglass and Santer and the IPCC in 188.8.131.52, it has absolutely nothing to do with greenhouse gas forcing, and is the same lapse rate change issue I've already described.
In any case, the big debate at Lucia's led to an interesting comment from Willis Eschenbach - search for comment #7672 on that page. Willis claimed he had done the following analysis:
Amplification at timescales from 1 to 340 months. Amplification is calculated as the tropospheric trend over the time period divided by the surface trend over the same period. Trends calculated by linear regression, 95% CI = 1.96 * std. error of trend. Data from UAH MSU and Had
with the following result:
and then, adding RSS as well as UAH he found the following very similar picture:
Shortly after this, Willis posted a more formal version of his results in this thread at ClimateAudit, with the rather more stable-looking comparison of many different satellite and surface observations here:
This time-dependent look at the amplification question seemed a rather interesting approach, but it wasn't clear to me exactly what Willis Eschenbach was showing in these graphs, what real error bars might be (correlations inevitably distort the standard deviation numbers) and whether this time-dependent metric really provided a better estimate in some way of amplification than the more traditional estimates based on the full time-series looking at trend or standard-deviation ratios.
Comparing satellite measurements with surface measurements makes some sense. The various satellite temperature time-series (UAH T2LT, UAH T2, RSS TLT and RSS TMT) consist of observations of emitted radiation from the atmosphere, with layers of the atmosphere identified through pressure-broadening of emission lines. Atmospheric radiative properties can then be used to turn the observed radiation into temperature estimates for broadly-defined layers of the atmosphere. In the case of the above-listed datasets, the layers correspond to the lower (T2LT/TLT) and mid (T2/TMT) troposphere, just the atmospheric regions (in the tropics, at least) where we expect to see this amplification.
Eschenbach used the Hadley center's surface temperature data series for comparison; the various surface temperature records are all in reasonable agreement, so the choice shouldn't make much difference. I downloaded the Hadley tropical and global average temperature series to run the comparisons. For tropical tropospheric temperatures I used the "Trpcs" series from the UAH dataset, and the "-20.0/20.0" series from RSS; for global temperatures I just used their respective global average temperatures. Eschenbach's tropical data was obtained by a stricter cut via the Climate Explorer website, as he explained.
Eschenbach's metric then consists of looking at sequences of months of corresponding data in the satellite and surface data. For any given number of months he takes all possible sequences of that length in the datasets. For each such sequence he finds the ordinary least-squares slope of the graph of Tsatellite vs Tsurface. In other words, time-period doesn't enter into Eschenbach's metric after a set of temperature-pairs has been selected. Then, across all sequences of the same length, Eschenbach averages that slope (and computes the standard deviation), which is what is plotted above.
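As a minimal sketch of the metric as I've just described it (not Eschenbach's own code, which I haven't seen), the windowed regression can be written as follows. The function names are hypothetical, and I've assumed the two inputs are equal-length lists of monthly anomalies aligned by date.

```python
from statistics import mean, stdev


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x; None if x has no variance."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0.0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def windowed_regression_metric(surface, satellite, n_months):
    """Average (and s.d.) of the T_satellite vs T_surface slope over every
    run of n_months consecutive months. Time itself never enters the fit."""
    slopes = []
    for start in range(len(surface) - n_months + 1):
        slope = ols_slope(surface[start:start + n_months],
                          satellite[start:start + n_months])
        if slope is not None:  # skip windows with constant surface temperature
            slopes.append(slope)
    spread = stdev(slopes) if len(slopes) > 1 else 0.0
    return mean(slopes), spread
```

Calling `windowed_regression_metric(surface, satellite, n)` for each window length n gives one point (with its spread) on a curve of the kind plotted above.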
To show I've captured Eschenbach's metric and that the above graphs are relatively stable under minor changes in datasets such as just described, the following is my plot of the UAH and RSS vs Hadley comparison in a replication of his method:
A more common metric for amplification has been the ratio between the actual time-dependent trend in one temperature series and that in the other, i.e. finding the ratio between dT_satellite/dt and dT_surface/dt where those slopes are independently determined for each time series. The same short-sequence approach Eschenbach uses can be applied in this case - here by calculating the trend with time for the surface and satellite data separately, taking the ratio, and then averaging over all sequences of the same number of months.
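A rough sketch of this "raw" trend ratio, under the same assumptions as before (equal-length, date-aligned lists of monthly anomalies; hypothetical function names), might look like this. Skipping windows whose surface trend is exactly zero is my own choice to avoid division by zero; near-zero trends are deliberately left in.

```python
from statistics import mean, stdev


def time_trend(series):
    """Least-squares slope of the series against its month index (per month)."""
    n = len(series)
    t_bar = (n - 1) / 2.0
    y_bar = mean(series)
    stt = sum((t - t_bar) ** 2 for t in range(n))
    sty = sum((t - t_bar) * (y - y_bar) for t, y in enumerate(series))
    return sty / stt


def raw_trend_ratio(surface, satellite, n_months):
    """Mean and s.d. of (satellite trend) / (surface trend) over all windows
    of n_months consecutive months, plus the number of windows used."""
    ratios = []
    for start in range(len(surface) - n_months + 1):
        surf_trend = time_trend(surface[start:start + n_months])
        sat_trend = time_trend(satellite[start:start + n_months])
        if surf_trend != 0.0:  # exactly flat surface windows are skipped
            ratios.append(sat_trend / surf_trend)
    spread = stdev(ratios) if len(ratios) > 1 else 0.0
    return mean(ratios), spread, len(ratios)
```

Because each window contributes a ratio of two separately fitted slopes, a single window with a nearly flat surface trend can dominate the average, which is worth keeping in mind when reading the short-sequence end of any such curve.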
Comparison of this "raw" trend ratio with Eschenbach's metric for these same datasets shows significant differences, for example for UAH T2LT (lower troposphere):
for UAH T2 (mid-troposphere):
while for RSS TLT (lower troposphere) the two seem a little closer:
The wild variations for short series arise largely because the short sequences often include ones where the surface trend is very close to zero, leading to spuriously large (positive or negative) ratios which distort the averages. However the short-term patterns generally show large positive amplifications more or less in line with the Eschenbach numbers. The real discrepancy is for long time series where things smooth out; unfortunately the data sets also become less independent (obviously there is only one long time series covering the roughly 30 years of satellite records) and it's not clear how much simple measurement errors might be influencing the different results.
To get a better feel for the random measurement error issue I created a collection of artificial "surface" and "troposphere" datasets by running Tamino's ARMA process to generate an underlying "real" temperature series, multiplying that "real" series by an amplification factor to get a "real troposphere" series, and then adding random measurement noise to both series. The results for several simulated series of this sort are in this plot:
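The construction can be sketched as below. This is only an illustration of the recipe: I've assumed an ARMA(1,1) process, and the coefficients `phi`, `theta`, and `innovation_sd` are placeholder values, not Tamino's actual parameters. The function names and seeds are likewise hypothetical.

```python
import random


def arma11(n, phi=0.8, theta=0.3, innovation_sd=0.1, seed=0):
    """Generate n steps of an ARMA(1,1) series (placeholder coefficients)."""
    rng = random.Random(seed)
    series, prev_x, prev_e = [], 0.0, 0.0
    for _ in range(n):
        e = rng.gauss(0.0, innovation_sd)
        x = phi * prev_x + e + theta * prev_e
        series.append(x)
        prev_x, prev_e = x, e
    return series


def simulate_pair(n_months=360, amplification=1.5, sigma=0.1, seed=0):
    """Return (true_surface, measured_surface, measured_satellite).

    The 'real' troposphere is the 'real' surface series times the
    amplification factor; independent white measurement noise with
    standard deviation sigma is then added to both."""
    noise = random.Random(seed + 1)
    true_surface = arma11(n_months, seed=seed)
    true_troposphere = [amplification * t for t in true_surface]
    surface = [t + noise.gauss(0.0, sigma) for t in true_surface]
    satellite = [t + noise.gauss(0.0, sigma) for t in true_troposphere]
    return true_surface, surface, satellite
```

With `sigma=0` the measured satellite series is exactly the amplification times the measured surface series, so any metric should then recover the amplification factor; raising `sigma` shows how measurement noise pulls each metric away from it.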
There are several interesting things evident here. First, the "raw trend ratio" metric for the simulated data comes very close to getting the right amplification factor once the short-time-series oscillations have stabilized. Second, Willis Eschenbach's metric systematically underestimates the amplification factor at short time-scales by an additive offset of -0.3 to -0.5 (for the various parameters I used), and gradually trends toward the right factor as you get to longer and longer time-series.
But third, the long-time-scale behavior of this simulated dataset is very different from the long-time-scale picture for the observational series - in particular, the "raw trend ratio" metric is higher than the Eschenbach metric in the simulated case, but considerably lower in the observational case (and dramatically so for the UAH data). Why?
To help better understand some of these differences and questions I decided to investigate a third potential metric, with aspects resembling both of the other two, but with I believe somewhat simpler analytical properties. This metric is obtained by taking all possible pairs of months in the data record, recording the surface temperature difference between those two points, and taking the ratio of the troposphere temperature difference to this surface temperature difference, plotted as a function of that surface temperature difference.
That is, what we're looking at in what follows is, for any pair of dates (1) and (2), the ratio:
(T_satellite(2) - T_satellite(1))/(T_surface(2) - T_surface(1))
plotted as a function of T_surface(2) - T_surface(1) (where (1) and (2) are switched if necessary to make this difference positive). Technically, the data are "binned" into increments of the surface temperature change, Delta T_surf, so that what is plotted is the average of the above ratio for Delta T_surf between two limits (say 0.40 to 0.41). In addition since we're looking at relatively independent pairs of points, the standard deviation among the binned quantities should be relatively robust.
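A minimal sketch of this pair ratio metric follows, again assuming equal-length, date-aligned lists of monthly anomalies. The function name, the default bin width of 0.01, and the `same_month_only` switch (restricting to pairs a whole number of years apart) are illustrative choices of mine.

```python
from collections import defaultdict
from statistics import mean, stdev


def pair_ratio_metric(surface, satellite, bin_width=0.01, same_month_only=False):
    """Bin the ratio dT_satellite / dT_surface over all pairs of months
    by the (positive) surface temperature difference.

    Returns {bin_lower_edge: (mean_ratio, sd_or_None, n_pairs)}."""
    bins = defaultdict(list)
    n = len(surface)
    for i in range(n):
        for j in range(i + 1, n):
            if same_month_only and (j - i) % 12 != 0:
                continue
            d_surf = surface[j] - surface[i]
            d_sat = satellite[j] - satellite[i]
            if d_surf == 0.0:
                continue  # ratio undefined for identical surface values
            # The ratio is unchanged by swapping the two dates; only the
            # bin needs the surface difference made positive.
            bins[int(abs(d_surf) // bin_width)].append(d_sat / d_surf)
    result = {}
    for index in sorted(bins):
        ratios = bins[index]
        spread = stdev(ratios) if len(ratios) > 1 else None
        result[index * bin_width] = (mean(ratios), spread, len(ratios))
    return result
```

Widening `bin_width` by a factor of 4 puts roughly 4 times as many pairs into each bin, which is one way to check whether the averages and standard deviations depend on the binning resolution.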
Looking at this first with simulated data, which had a maximum surface temperature delta of a bit over 1 degree C, this new metric, like Eschenbach's, quickly settles to a value about 0.3 below the actual amplification ratio:
but the actual value remains close to or within one standard deviation in the ratio (the plotted error bars are for one s.d.). The "sigma" value of 0.1 in this plot indicates the white noise measurement error added to the simulated surface and satellite temperature series had 0.1 degrees C standard deviation.
Adjusting the amplification and "sigma" ratio and running several examples gives a more complete picture of how this third metric behaves:
For a smaller measurement error (sigma = 0.05) it is evident this metric again starts out about 0.2 low, but is gradually trending toward the correct amplification factor of 2.0 (top curve). The next two curves (sigma = 0.1) are two realizations which start out together about 0.4 below the correct amplification, and diverge from one another for the largest surface temperature differences, thanks to lower statistics for large differences. The fourth curve (amplification 1.5, sigma 0.1) starts out about 0.3 low and doesn't noticeably converge. The last (amplification 1.0, sigma 0.1) starts out about 0.2 low and again doesn't noticeably converge.
Very roughly, then, the degree to which this metric is on the low side in the mid-range is approximately given by twice the product of the amplification and the measurement error (sigma). There's a bunch more analysis you can do to back this up, but it gives you a rough idea of what the issues are.
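As a quick arithmetic check of that rule of thumb against the shortfalls quoted for the simulated curves, the snippet below compares twice the product of amplification and sigma with each reported value. The amplification of 2.0 for the two sigma = 0.1 realizations is my reading of the description; the function name is hypothetical.

```python
def expected_shortfall(amplification, sigma):
    """Rule-of-thumb low bias of the pair ratio metric: 2 * amplification * sigma."""
    return 2.0 * amplification * sigma


# (amplification, sigma, shortfall quoted in the text)
cases = [
    (2.0, 0.05, 0.2),
    (2.0, 0.10, 0.4),
    (1.5, 0.10, 0.3),
    (1.0, 0.10, 0.2),
]

for amplification, sigma, reported in cases:
    predicted = expected_shortfall(amplification, sigma)
    print(amplification, sigma, round(predicted, 3), reported)
```

This only confirms that the rule is consistent with the handful of cases quoted here; it says nothing about how well it holds for other parameter choices.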
The measured standard deviations were not shown on the immediately preceding curve, but behave quite nicely when inverted and plotted against the change in surface temperature:
i.e. the standard deviation is very close to linearly proportional to the inverse of the measured surface temperature difference, and also proportional to that same product of amplification by sigma.
So do we see this sort of behavior in the observed data? Let's take a look first at the RSS lower troposphere data:
One concern here was whether taking arbitrary pairs of months would result in some satellite/surface discrepancies because each month's "anomaly" in these data series is calculated relative to a baseline specific for that month; differences in inter-month baselining between the satellite and surface temperature series might cause spurious temperature differences unrelated to the amplification we're trying to get at. To that end, the black curve in the above figure looked only at pairs of months that compared the same month in a year (i.e. dates an integer number of years apart), while the red curve did the same analysis looking at all possible pairs of points. Since there's considerably less data when restricting to same-month pairs, the average deviated up and down quite a bit, but clearly kept to almost exactly the same pattern as the all-pairs curve, so this concern doesn't seem to be a real issue.
The measured standard deviations in these ratios might also be affected by sampling process - to check for this the green curve here used 4-times lower resolution in "binning" the data to get averages and standard deviations, so each analysis point had 4 times as much data to look at, on average. Nevertheless, the green, red, and black curves and their standard deviations essentially fall on top of each other up to the point (delta T_surf around 0.85) where the number of samples started to become too limited to calculate standard deviations at all. The inverse standard deviation for these three analysis approaches is here:
Rather than continuing to decrease with the inverse of the surface temperature difference, in all three analyses the standard deviation remains larger than expected from the simulated data from the point where the surface temperature difference is 0.5 C on up. In any case, the consistency between these curves appears to show robust properties of the surface and satellite time series.
Applying the same pair ratio metric analysis to the other observational datasets looked at above, we get the following:
Now this is rather striking. In contrast to the Eschenbach and "raw trend ratio" metrics looked at above, which for longer time series appeared to see a drop in the amplification ratio, here the amplification ratio consistently rises for all the tropical data series, and in fact they become closer and closer to one another for the largest values of surface temperature delta. Over long enough time periods, under conditions of a generally rising surface temperature trend, all of these metrics should agree with one another.
The largest difference between pairs of monthly anomalies for the Hadley tropical surface data occurs between the peak of 0.798 in February 1998 and the low of -0.172 in January 1989. This 0.97 degree difference occurred over less than 10 years, and so issues of satellite calibration or drift in long-term measurements are perhaps less of a concern than for the full 30-year record. And remarkably, all 4 tropical satellite datasets (lower- and mid-troposphere for UAH and RSS) show almost identical differences corresponding to this same pair of dates:
January 1989 to February 1998:
so dividing by the 0.97 surface temperature difference gives an amplification of about 1.85.
Note also that there is essentially no amplification of the global temperatures in this graph - the lower troposphere ratios according to the pair ratio metric are close to 1.0, and the mid-troposphere numbers show a ratio less than 1. So the amplification (up to a factor of close to 2 for the largest temperature changes) is strictly a tropical effect.
We can look at the inverse standard deviation chart for all these curves to see how it matches the way the simulated data behaved:
The global curves have the lowest standard deviations (and so highest inverse standard deviations) and appear to follow the linear pattern found in the simulated data. The tropical curves, however, all show roughly the same flattening of standard deviation at a surface-temperature delta of about 0.5 C that we saw for RSS TLT. Note that RSS TLT has the best (lowest standard deviation, highest curve in this inverse picture) behavior and thus from the simulated data analysis is most likely to have its metric value closest to the true amplification ratio. This strongly suggests the true amplification ratio for the lower troposphere represented by TLT/T2LT should be at least 1.5, and possibly more.
A more rigorous analysis of the best figure for amplification from this data is underway; there is still a need to explain the flattening in the standard deviation values and the differences in behavior of the Eschenbach and raw trend ratio curves between the observations and the simulated data. So, a bit more to do on all this...
Anyway, anyone claiming that the satellite data shows "no tropical troposphere amplification" has just not looked at the data closely enough - it seems very clearly to be there, according to the amplification metric presented here.