Monthly zonal grids | Time series | Diagnostics 1 | Diagnostics 2 | Validation

**Monthly mean**

This plot shows weighted averages of the RO data within 5-degree latitude bins at a 200-meter height grid. The
averaging includes all data from the selected RO mission that passed the QC screening.

**Monthly standard deviation**

This plot shows the standard deviations of the RO data within 5-degree latitude bins at a 200-meter
height grid. The standard deviation includes all data from the selected RO mission that passed the QC screening.

The standard deviation is dominated by the variability of the atmosphere, but also contains a small component
due to the *observational errors* of the underlying profile data. We can estimate the uncertainty of the
monthly mean by the *standard error*:
$$\sigma_m \approx \sigma_m^{se} = \frac{s}{\sqrt{N}}$$
where $\sigma_m^{se}$ is the standard error of the mean, $s$ is the observed standard deviation, and $N$ is the
monthly data number.
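As a minimal sketch of the standard error formula above, the following computes $s/\sqrt{N}$ for the values falling in a single latitude-height bin. The function name, the sample values, and the use of the sample standard deviation (with $N-1$ in the denominator) are assumptions made for illustration; the text does not specify which estimator is used.

```python
import math
import statistics


def standard_error(values):
    """Return (mean, s, s / sqrt(N)) for the values in one latitude-height bin."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate a standard deviation")
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation (assumption: N - 1 denominator)
    return mean, s, s / math.sqrt(n)


# Hypothetical refractivity values (N-units) in one bin for one month
bin_values = [250.0, 252.0, 248.0, 251.0, 249.0]
mean, s, se = standard_error(bin_values)
```

Because of the $1/\sqrt{N}$ factor, quadrupling the number of profiles in a bin halves the standard error, provided the observed standard deviation stays the same.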

**Observational error**

This plot shows the observational errors of the monthly means. They are computed from the assumed
observational errors of the individual profiles, and scale inversely with the square root of the monthly data
number.

The total error of the mean (i.e., the difference between the observed mean and the true mean) is assumed to
be caused by two effects. First, each measurement has a random *observational error*, or *measurement
error*, associated with it. This error can only be described in terms of a statistical uncertainty. Second, the
finite number of measurements cannot fully account for all variability within the monthly bin, resulting in a
*sampling error*. Unlike the observational errors, the actually realized sampling errors can be estimated,
and eventually subtracted.

**Sampling error**

This plot shows an estimate of the error due to under-sampling of the atmospheric variability. We sample an
ECMWF operational analysis field (at a 2.5x2.5 degree horizontal resolution -- roughly corresponding to the
resolution of the RO measurements) at the same time and locations as the actual observations, and compute
the error made by averaging the sampled values rather than the whole field. This error gives us an
estimate of the sampling error.
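The procedure can be sketched for one latitude-height bin as follows. This is an illustration only: the flat list standing in for the model field, the index list standing in for the co-location step, and the sign convention (sampled mean minus full mean) are all assumptions not stated in the text.

```python
import statistics


def sampling_error(field, sample_indices):
    """Mean of the sampled model values minus the mean of the whole field.

    field          -- model values covering one latitude-height bin for the
                      whole month (all grid points and analysis times)
    sample_indices -- indices of the grid points/times closest to the
                      actual observations (hypothetical co-location result)
    """
    full_mean = statistics.fmean(field)
    sampled_mean = statistics.fmean([field[i] for i in sample_indices])
    return sampled_mean - full_mean


# Hypothetical model temperatures (K) and observation locations
model_field = [220.0, 222.0, 224.0, 226.0, 228.0, 230.0]
obs_indices = [0, 1, 1, 2]  # observations cluster in the colder part of the bin
error = sampling_error(model_field, obs_indices)
```

With this sign convention, a sampling-error-corrected monthly mean would be the observed mean minus `error`.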

**Data number**

The *data number* plots show the number of data available for processing before the QC screening.
There is a variation with latitude due to the satellite orbits and the particular scan mode of the RO technique.
The number of data also decreases at low altitudes as a consequence of the gradual loss
of signal in the moist troposphere.

**Rel. data number**

The *relative data number* plots show the number of data available for processing before the QC screening,
relative to the number of data at high altitudes. The plots demonstrate the gradual loss of
signal as the radio waves probe deeper into the atmosphere.
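A minimal sketch of this normalization is given below. The text does not say how the high-altitude reference is chosen, so taking the mean count over the top few height levels (`reference_levels`, a hypothetical parameter) is an assumption.

```python
def relative_data_number(counts, reference_levels=3):
    """Scale a vertical profile of data counts by the count at high altitudes.

    counts -- data counts ordered from the lowest to the highest altitude
    reference_levels -- number of top levels averaged to form the reference
                        (assumption; not specified in the text)
    """
    top = counts[-reference_levels:]
    reference = sum(top) / len(top)
    if reference == 0:
        raise ValueError("no data at the reference levels")
    return [c / reference for c in counts]


# Hypothetical counts for one latitude bin, lowest altitude first
counts_by_height = [120, 300, 450, 500, 500, 500]
relative = relative_data_number(counts_by_height)
```

Values below 1 at the lowest levels then directly show the fraction of occultations that still reach those altitudes.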

**A priori fraction**

The *a priori fraction* plots show estimates of the fraction of a priori information that enters the
monthly mean climate data. This a priori information is taken from a background model, e.g. ECMWF, or from
some sort of climatology, e.g. MSIS-90.

**Mean ecmwf@obs**

This plot shows weighted averages within 5-degree latitude bins at a 200-meter height grid generated from ECMWF
data co-located (in latitude, longitude, and time) with the observed data.

**Stdev ecmwf@obs**

This plot shows the standard deviation of the data within 5-degree latitude bins at a
200-meter height grid generated from ECMWF data co-located (in latitude, longitude, and time) with the observed
data. Like the observed standard deviations, this plot demonstrates the degree of variability of the atmosphere.

**(O-B)**

This plot shows the differences between zonal monthly means computed from RO data and
from ECMWF short-term forecast data. The ECMWF monthly means were computed from model data co-located (in latitude,
longitude, and time) with the observed data.

**(O-B)/B**

This plot shows the relative differences between zonal monthly means computed from RO data and
from ECMWF short-term forecast data. The ECMWF monthly means were computed from model data co-located (in latitude,
longitude, and time) with the observed data.
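The two difference plots can be sketched as element-wise operations on the observed and co-located background means. The function names and sample values are hypothetical, and whether the relative difference is displayed as a fraction or in percent is not stated in the text; the sketch returns a fraction.

```python
def o_minus_b(obs_mean, bg_mean):
    """Absolute difference O-B for each grid bin."""
    return [o - b for o, b in zip(obs_mean, bg_mean)]


def relative_o_minus_b(obs_mean, bg_mean):
    """Relative difference (O-B)/B for each grid bin, as a fraction."""
    return [(o - b) / b for o, b in zip(obs_mean, bg_mean)]


# Hypothetical zonal monthly mean refractivities (N-units) at three heights
observed = [300.0, 150.0, 50.5]
background = [297.0, 150.0, 50.0]

diff = o_minus_b(observed, background)
rel_diff = relative_o_minus_b(observed, background)
```

The relative form is useful for quantities such as refractivity that vary over orders of magnitude with height, where an absolute difference would be dominated by the lowest levels.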

The climate data sets are fundamentally 3-dimensional: time series of zonal monthly means on a 2D
latitude-height grid
$$f_{ijm} = f({\phi}_i,h_j,m)$$
where $f$ is a climate variable (refractivity, temperature, etc.), indices $i$ and $j$ denote the latitude and height bins,
and $m$ denotes the time (number of months).

We define the *long-term climate mean* as the mean over the time dimension for the
full length of the time series
$$f_{ij}^{\rm C} = f^C({\phi}_i,h_j) = \frac{1}{M}\sum\limits_{m=1}^{M} f_{ijm}$$
where $M$ is the number of months in the climate data record. The long-term climate mean
is used as a reference for constructing anomalies.
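For a single latitude-height bin $(i, j)$, the long-term climate mean reduces to a plain average over the monthly values. The sketch below also forms anomalies as the simple difference from that mean; this is an assumption, since the text only states that the long-term mean serves as the reference and does not say whether the anomalies are absolute or fractional.

```python
def long_term_mean(series):
    """Mean over the time dimension for one (latitude, height) bin."""
    return sum(series) / len(series)


def anomalies(series):
    """Monthly values minus the long-term mean (assumed absolute anomalies)."""
    reference = long_term_mean(series)
    return [value - reference for value in series]


# Hypothetical monthly mean temperatures (K) for one bin, M = 4 months
monthly_values = [210.0, 212.0, 214.0, 216.0]
climate_mean = long_term_mean(monthly_values)
monthly_anomalies = anomalies(monthly_values)
```

By construction, anomalies defined this way average to zero over the full length of the time series.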

We define a

Based on these quantities, we define the

Similarly, we define the

The anomaly fields are still 3D as they depend on latitude, altitude, and time. For plotting, the number of dimensions needs to be reduced. This is done by averaging over a set of

**Lat mean**

These *2D time-height plots* show monthly anomalies at a 200-meter height grid as a
function of time. The anomalies are based on long-term means computed from the full length
of the time series (see the details above). The latitudinal averaging is done using a simple cosine
weighting of the fundamental 5-degree latitude bins.
The plots are dominated by the seasonal cycle. In the tropics, where the seasonal cycle is weaker, influence
from the
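The cosine weighting used for the latitudinal averaging can be sketched as follows. Evaluating the cosine at the center of each 5-degree bin is an assumption; the text only says "a simple cosine weighting".

```python
import math


def cosine_weighted_mean(values, bin_centers_deg):
    """Average zonal-mean values over latitude bins with cos(latitude) weights."""
    weights = [math.cos(math.radians(lat)) for lat in bin_centers_deg]
    weighted_sum = sum(w * v for w, v in zip(weights, values))
    return weighted_sum / sum(weights)


# Centers of the twelve 5-degree bins between 30S and 30N
tropical_centers = [-27.5 + 5.0 * i for i in range(12)]

# A constant field must average to the same constant
constant_mean = cosine_weighted_mean([1.5] * 12, tropical_centers)

# Two bins at the equator and at 60 degrees: weights 1 and 0.5
two_bin_mean = cosine_weighted_mean([1.0, 4.0], [0.0, 60.0])
```

The weights account for the fact that a 5-degree latitude band covers less area near the poles than near the equator.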

**Lat mean (de-season)**

These *2D time-height plots* show monthly de-seasonalized anomalies in various latitude
zones at a 200-meter height grid as a function of time. The anomalies are based on a mean annual cycle
computed from the full length of the time series (see the details above). The latitudinal averaging is done
using a simple cosine weighting of the fundamental 5-degree latitude bins. Since the dominating seasonal
cycle has been removed, we can see other structures in the climate data more clearly: trends and periodic
phenomena, like the
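De-seasonalization can be sketched for one latitude-height bin as below. The mean annual cycle is taken here as the average of all Januaries, all Februaries, and so on, over the full record, and the de-seasonalized anomaly as the plain difference from it. Both choices, along with the function names and the `first_month` parameter, are assumptions made for illustration.

```python
def mean_annual_cycle(series, first_month=1):
    """Return 12 values: the mean of each calendar month over the whole record.

    series      -- monthly values in chronological order
    first_month -- calendar month (1-12) of the first element
    """
    sums = [0.0] * 12
    counts = [0] * 12
    for k, value in enumerate(series):
        month = (first_month - 1 + k) % 12
        sums[month] += value
        counts[month] += 1
    # Calendar months never sampled are left as None
    return [s / c if c else None for s, c in zip(sums, counts)]


def deseasonalize(series, first_month=1):
    """Subtract the mean annual cycle from every monthly value."""
    cycle = mean_annual_cycle(series, first_month)
    return [v - cycle[(first_month - 1 + k) % 12] for k, v in enumerate(series)]


# Hypothetical two-year record: same seasonal shape, second year 2 units higher
year_one = [float(m) for m in range(1, 13)]
year_two = [v + 2.0 for v in year_one]
deseasonalized = deseasonalize(year_one + year_two)
```

In this example the seasonal shape cancels completely and only the year-to-year offset remains, which is the kind of signal the de-seasonalized plots are meant to expose.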

**Lat/layer mean**

These *time series plots* show monthly anomalies in various
latitude zones and height layers as a function of time. The latitudinal averaging is done using a simple cosine
weighting of the fundamental 5-degree latitude bins. The plots are dominated by the seasonal cycle. In the tropics,
where the seasonal cycle is weaker, influence from the

**Lat/layer mean (de-season)**

These *time series plots* show monthly de-seasonalized anomalies in various latitude
zones and height layers as a function of time. The latitudinal averaging is done using a simple cosine weighting
of the fundamental 5-degree latitude bins. The dominating seasonal cycle has been removed, which allows us to
see other structures in the climate data: trends and periodic phenomena, like the

This page presents a range of diagnostic information related to sampling distributions, quality of the data, quality control (QC), and data numbers. The information is presented on a monthly basis, and some of the plots also show data broken down to daily numbers.

**File numbers**

The *file numbers* plots show the daily number of files available at different steps in the ROM SAF processing.

The

files used as input to the processing.

The

The

**Data numbers**

The *data numbers* plot shows the daily number of input files to the ROM SAF processing chain.

**Data numbers & sampling**

The *distrib* plots show the distribution of occultations over latitude, longitude,
and local time. In all four plots, the data have been divided into 36 bins. There are two latitude
plots - one for equal-angle bins and one for equal-area bins. All plots are based on data numbers
before the QC screening, and before the gradual loss of occultations in the troposphere.
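The difference between the two latitude binnings can be illustrated with the bin edges. Equal-angle bins have a constant width in degrees, while equal-area bins have a constant width in the sine of the latitude. The sketch assumes a spherical Earth and uses hypothetical function names.

```python
import math


def equal_angle_edges(n_bins=36):
    """Latitude bin edges (degrees) with constant angular width."""
    return [-90.0 + 180.0 * k / n_bins for k in range(n_bins + 1)]


def equal_area_edges(n_bins=36):
    """Latitude bin edges (degrees) enclosing equal areas on a sphere.

    Equal steps in sin(latitude) correspond to equal surface area.
    """
    return [math.degrees(math.asin(-1.0 + 2.0 * k / n_bins))
            for k in range(n_bins + 1)]


angle_edges = equal_angle_edges()
area_edges = equal_area_edges()
```

With 36 bins, the equal-angle bins are 5 degrees wide everywhere, whereas the equal-area bins are narrow near the equator and wide near the poles. A uniform global distribution of occultations therefore appears flat only in the equal-area plot.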

The

**QC screening**

The *number of occs passing QC* shows how the data numbers are reduced by the
quality screening procedure.

The

The

**QC screening**

The plot *QC impact on O-B stat* shows the impacts of the QC screening procedure on
the refractivity O-B statistics.

**Diagnostics, bending angle**

The *60-80 km noise floor* is the smallest standard deviation of the raw LC bending
angle (with respect to a fitted background bending angle)
over a 7.5 km height layer (roughly one scale height) in the 60-80 km interval. This quantity can
be regarded as an estimate of the underlying instrumental noise level for an occultation. A
variation of the noise floor distribution with time may indicate time-varying instrumental effects.
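The noise floor definition can be sketched as a sliding-window minimum over the residuals (raw LC bending angle minus fitted background). How the 7.5 km layers are actually positioned is not stated in the text; starting a layer at every sample height is an assumption, as are the function name and the synthetic data.

```python
import statistics


def noise_floor(heights_km, residuals, layer_km=7.5, lo_km=60.0, hi_km=80.0):
    """Smallest standard deviation of the residuals over any layer of
    thickness layer_km lying entirely inside [lo_km, hi_km]."""
    best = None
    for start in heights_km:
        if start < lo_km or start + layer_km > hi_km:
            continue
        window = [r for h, r in zip(heights_km, residuals)
                  if start <= h <= start + layer_km]
        if len(window) < 2:
            continue
        sd = statistics.stdev(window)
        if best is None or sd < best:
            best = sd
    return best


# Synthetic residuals (hypothetical units): noisier below 70 km than above
heights = [60.0 + 0.5 * i for i in range(41)]
residuals = [(2.0 if h < 70.0 else 1.0) * (1 if i % 2 == 0 else -1)
             for i, h in enumerate(heights)]
floor = noise_floor(heights, residuals)
```

Taking the minimum over layers, rather than a single standard deviation over the whole 60-80 km interval, picks out the quietest part of the profile and makes the estimate less sensitive to residual atmospheric or ionospheric structure.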

The

**Diagnostics, BA optimization**

The *60-80 km bias* is the mean difference between the raw LC bending angle and
a fitted background bending angle in the 60-80 km interval.

The

This page presents a range of diagnostic information related to quality of the data, quality control (QC), and data numbers. The information is mainly presented as time series of monthly data for the full length of the data record.

**File numbers**

The *file numbers* plots show the mean daily file numbers available at different steps in the ROM SAF processing.

The

of

The

a certain starting date.

**Data numbers**

The *data numbers* plot shows the (monthly mean) daily number of excess-phase data files.

The

**QC screening**

The *number of occs passing QC* shows how the data numbers are reduced by the
quality screening procedure.

The

The

**Diagnostics, bending angle**

The *60-80 km noise floor* is the smallest standard deviation of the raw LC bending
angle (with respect to a fitted MSIS bending angle)
over a 7.5 km height layer (roughly one scale height) in the 60-80 km interval. This quantity can
be regarded as an estimate of the underlying instrumental noise level for an occultation. A
variation of the noise floor distribution with time may indicate time-varying instrumental effects.

The

**Diagnostics, BA optimization**

The *60-80 km stdev* is the standard deviation of the neutral bending angle (with
respect to a fitted MSIS bending angle) in the 60-80 km interval.

The

This page is mainly intended for internal data quality monitoring relative to the requirements defined by project documents. It may be of broader interest, but it requires an understanding of the quantities being monitored and of the requirements used.

**Compliance with PRD**

For each climate variable and month, and within specified latitudinal zones and height
layers, we can plot the distribution of absolute deviations from ERA-Interim.
The median and the 60%, 70%, and 80% percentiles of this distribution are compared
with the three levels of PRD accuracy requirements. Using color codes we show, for each percentile,
whether *threshold*, *target*, or *optimal* accuracies were reached.

**PRD requirements**

The *Product Requirements Document* (PRD) states the required accuracies for the data
products, and suggests how to demonstrate compliance with these accuracies. For each
climate variable, three accuracy levels are specified (*threshold*, *target*, and *optimal*),
and the PRD states that the actual data should be evaluated against these accuracies by comparison
with reanalysis data.

The zonal fields are divided into three latitudinal zones: tropics (30°S-30°N), mid-latitudes
(30°S-60°S and 30°N-60°N), and polar regions (60°S-90°S and 60°N-90°N).
Each zone is further separated into three height layers (0-8 km, 8-15 km,
and 15-40 km). For each of these 9 latitude-height regions, the median and three percentiles (60%, 70%, and 80%)
of the absolute deviation from ERA-Interim are computed. The 67% percentile corresponds approximately to one standard
deviation for a normal distribution. *Formally, we validate the Level 3 data by the compliance
of the 60% percentile of the absolute deviations from ERA-Interim, with the PRD requirements*.
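The compliance check for one latitude-height region can be sketched as below. The requirement values are made up for illustration and are not taken from the PRD. The sketch also assumes that *optimal* is the strictest (smallest) limit and *threshold* the loosest, and that percentiles are obtained by linear interpolation between sorted values; neither detail is stated in the text.

```python
def percentile(values, q):
    """q-th percentile (0-100) by linear interpolation between sorted values."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values given")
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def accuracy_level(abs_deviations, requirements, q=60.0):
    """Return the strictest accuracy level met by the q-th percentile."""
    p = percentile(abs_deviations, q)
    for level in ("optimal", "target", "threshold"):
        if p <= requirements[level]:
            return level
    return "not compliant"


# Hypothetical absolute deviations from the reanalysis in one region (percent)
deviations = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
# Hypothetical accuracy requirements, NOT the actual PRD values
limits = {"threshold": 1.0, "target": 0.5, "optimal": 0.2}

p60 = percentile(deviations, 60.0)
level = accuracy_level(deviations, limits)
```

Repeating the check for the median and the 70% and 80% percentiles, in each of the 9 regions, yields the kind of color-coded table the compliance plots show.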

**Compliance with PRD**

For each climate variable and month, and within specified latitudinal zones and height
layers, we can plot the distribution of estimated errors of the monthly mean.
The median and the 60%, 70%, and 80% percentiles of this distribution are compared
with the three levels of PRD accuracy requirements. Using color codes we show, for each percentile,
whether *threshold*, *target*, or *optimal* accuracies were reached.

**Compliance with Service Specifications**

For each climate variable and month, and within specified latitudinal zones and height
layers, we can plot the distribution of estimated errors of the monthly mean.
The 60% percentiles of the distributions are compared
with the three levels of PRD accuracy requirements. Using color codes we show
whether *threshold*, *target*, or *optimal* accuracies were reached.
To comply with the
*Service Specifications*, the 60% percentile must at least
reach the *target* accuracy.