PartMC  2.6.1
Data Types | Functions/Subroutines
pmc_stats Module Reference

The stats_t type and associated subroutines. More...

Data Types

type  stats_1d_t
 Structure for online computation of 1D arrays of mean and variance. More...
 
type  stats_2d_t
 Structure for online computation of 2D arrays of mean and variance. More...
 
type  stats_t
 Structure for online computation of mean and variance. More...
 

Functions/Subroutines

subroutine stats_clear (stats)
 Clear data statistics collected so far. More...
 
subroutine stats_1d_clear (stats)
 Clear data statistics collected so far. More...
 
subroutine stats_2d_clear (stats)
 Clear data statistics collected so far. More...
 
subroutine stats_add (stats, data)
 Add a new data value to a stats_t structure. More...
 
subroutine stats_1d_add (stats, data)
 Add all new data values to a stats_1d_t structure. More...
 
subroutine stats_1d_add_entry (stats, data, i)
 Add a new single data value to a stats_1d_t structure. More...
 
subroutine stats_2d_add (stats, data)
 Add all new data values to a stats_2d_t structure. More...
 
subroutine stats_2d_add_row (stats, data, i)
 Add a row of new data values to a stats_2d_t structure. More...
 
subroutine stats_2d_add_col (stats, data, j)
 Add a column of new data values to a stats_2d_t structure. More...
 
subroutine stats_2d_add_entry (stats, data, i, j)
 Add a single new data value to a stats_2d_t structure. More...
 
real(kind=dp) function stats_conf_95_offset (stats)
 Compute the 95% confidence interval offset from the mean. More...
 
real(kind=dp) function, dimension(size(stats%n)) stats_1d_conf_95_offset (stats)
 Compute the 95% confidence interval offset from the mean. More...
 
real(kind=dp) function, dimension(size(stats%n, 1), size(stats%n, 2)) stats_2d_conf_95_offset (stats)
 Compute the 95% confidence interval offset from the mean. More...
 
subroutine update_mean_var (mean, var, data, n)
 Compute a running average and variance. More...
 
real(kind=dp) function student_t_95_coeff (n_sample)
 Return a fairly tight upper-bound on the Student's t coefficient for the 95% confidence interval. More...
 
real(kind=dp) function conf_95_offset (var, n_sample)
 95% confidence interval offset from mean. More...
 
subroutine stats_output_netcdf (stats, ncid, name, unit)
 Write statistics (mean and 95% conf. int.) to a NetCDF file. More...
 
subroutine stats_1d_output_netcdf (stats, ncid, name, dim_name, unit)
 Write statistics (mean and 95% conf. int.) to a NetCDF file. More...
 
subroutine stats_2d_output_netcdf (stats, ncid, name, dim_name_1, dim_name_2, unit)
 Write statistics (mean and 95% conf. int.) to a NetCDF file. More...
 
subroutine stats_1d_output_text (stats, filename, dim)
 Write statistics (mean and 95% conf. int.) to a text file. More...
 

Detailed Description

The stats_t type and associated subroutines.

Function/Subroutine Documentation

◆ conf_95_offset()

real(kind=dp) function pmc_stats::conf_95_offset ( real(kind=dp), intent(in)  var,
integer, intent(in)  n_sample 
)

95% confidence interval offset from mean.

If mean and var are the sample mean and sample variance of n data values, then

 offset = conf_95_offset(var, n)
 

means that the 95% confidence interval for the mean is [mean - offset, mean + offset].

If n_sample is one or less then zero is returned.

Parameters
[in]varSample variance of data.
[in]n_sampleNumber of samples.
Returns
Return offset from mean for the 95% confidence interval.

Definition at line 551 of file stats.F90.

◆ stats_1d_add()

subroutine pmc_stats::stats_1d_add ( type(stats_1d_t), intent(inout)  stats,
real(kind=dp), dimension(:), intent(in)  data 
)

Add all new data values to a stats_1d_t structure.

Parameters
[in,out]statsStatistics structure to add to.
[in]dataData values to add.

Definition at line 108 of file stats.F90.

◆ stats_1d_add_entry()

subroutine pmc_stats::stats_1d_add_entry ( type(stats_1d_t), intent(inout)  stats,
real(kind=dp), intent(in)  data,
integer, intent(in)  i 
)

Add a new single data value to a stats_1d_t structure.

Parameters
[in,out]statsStatistics structure to add to.
[in]dataData value to add.
[in]iIndex of data value to add.

Definition at line 140 of file stats.F90.

◆ stats_1d_clear()

subroutine pmc_stats::stats_1d_clear ( type(stats_1d_t), intent(inout)  stats)

Clear data statistics collected so far.

Parameters
[in,out]statsStatistics structure to clear.

Definition at line 61 of file stats.F90.

◆ stats_1d_conf_95_offset()

real(kind=dp) function, dimension(size(stats%n)) pmc_stats::stats_1d_conf_95_offset ( type(stats_1d_t), intent(in)  stats)

Compute the 95% confidence interval offset from the mean.

Parameters
[in]statsStatistics structure to find confidence interval for.
Returns
Return offset so that [mean - offset, mean + offset] is the 95% confidence interval.

Definition at line 315 of file stats.F90.

◆ stats_1d_output_netcdf()

subroutine pmc_stats::stats_1d_output_netcdf ( type(stats_1d_t), intent(in)  stats,
integer, intent(in)  ncid,
character(len=*), intent(in)  name,
character(len=*), intent(in), optional  dim_name,
character(len=*), intent(in), optional  unit 
)

Write statistics (mean and 95% conf. int.) to a NetCDF file.

Parameters
[in]statsStatistics structure to write.
[in]ncidNetCDF file ID, in data mode.
[in]nameVariable name in NetCDF file.
[in]dim_nameNetCDF dimension name for the variable.
[in]unitUnit of variable.

Definition at line 594 of file stats.F90.

◆ stats_1d_output_text()

subroutine pmc_stats::stats_1d_output_text ( type(stats_1d_t), intent(in)  stats,
character(len=*), intent(in)  filename,
real(kind=dp), dimension(:), intent(in)  dim 
)

Write statistics (mean and 95% conf. int.) to a text file.

The format has three columns: dim mean ci_offset where dim is the optional dimension argument, mean is the mean value and ci_offset is the 95% confidence interval offset, so the 95% CI is mean - ci_offset, mean + ci_offset.

Parameters
[in]statsStatistics structure to write.
[in]filenameFilename to write to.
[in]dimDimension array (independent variable).

Definition at line 650 of file stats.F90.

◆ stats_2d_add()

subroutine pmc_stats::stats_2d_add ( type(stats_2d_t), intent(inout)  stats,
real(kind=dp), dimension(:, :), intent(in)  data 
)

Add all new data values to a stats_2d_t structure.

Parameters
[in,out]statsStatistics structure to add to.
[in]dataData values to add.

Definition at line 162 of file stats.F90.

◆ stats_2d_add_col()

subroutine pmc_stats::stats_2d_add_col ( type(stats_2d_t), intent(inout)  stats,
real(kind=dp), dimension(:), intent(in)  data,
integer, intent(in)  j 
)

Add a column of new data values to a stats_2d_t structure.

Parameters
[in,out]statsStatistics structure to add to.
[in]dataData values to add.
[in]jColumn of data value to add.

Definition at line 235 of file stats.F90.

◆ stats_2d_add_entry()

subroutine pmc_stats::stats_2d_add_entry ( type(stats_2d_t), intent(inout)  stats,
real(kind=dp), intent(in)  data,
integer, intent(in)  i,
integer, intent(in)  j 
)

Add a single new data value to a stats_2d_t structure.

Parameters
[in,out]statsStatistics structure to add to.
[in]dataData values to add.
[in]iFirst index of data value to add.
[in]jSecond index of data value to add.

Definition at line 273 of file stats.F90.

◆ stats_2d_add_row()

subroutine pmc_stats::stats_2d_add_row ( type(stats_2d_t), intent(inout)  stats,
real(kind=dp), dimension(:), intent(in)  data,
integer, intent(in)  i 
)

Add a row of new data values to a stats_2d_t structure.

Parameters
[in,out]statsStatistics structure to add to.
[in]dataData values to add.
[in]iRow of data value to add.

Definition at line 197 of file stats.F90.

◆ stats_2d_clear()

subroutine pmc_stats::stats_2d_clear ( type(stats_2d_t), intent(inout)  stats)

Clear data statistics collected so far.

Parameters
[in,out]statsStatistics structure to clear.

Definition at line 77 of file stats.F90.

◆ stats_2d_conf_95_offset()

real(kind=dp) function, dimension(size(stats%n, 1), size(stats%n, 2)) pmc_stats::stats_2d_conf_95_offset ( type(stats_2d_t), intent(in)  stats)

Compute the 95% confidence interval offset from the mean.

Parameters
[in]statsStatistics structure to find confidence interval for.
Returns
Return offset so that [mean - offset, mean + offset] is the 95% confidence interval.

Definition at line 335 of file stats.F90.

◆ stats_2d_output_netcdf()

subroutine pmc_stats::stats_2d_output_netcdf ( type(stats_2d_t), intent(in)  stats,
integer, intent(in)  ncid,
character(len=*), intent(in)  name,
character(len=*), intent(in), optional  dim_name_1,
character(len=*), intent(in), optional  dim_name_2,
character(len=*), intent(in), optional  unit 
)

Write statistics (mean and 95% conf. int.) to a NetCDF file.

Parameters
[in]statsStatistics structure to write.
[in]ncidNetCDF file ID, in data mode.
[in]nameVariable name in NetCDF file.
[in]dim_name_1First NetCDF dimension name for the variable.
[in]dim_name_2Second NetCDF dimension name for the variable.
[in]unitUnit of variable.

Definition at line 617 of file stats.F90.

◆ stats_add()

subroutine pmc_stats::stats_add ( type(stats_t), intent(inout)  stats,
real(kind=dp), intent(in)  data 
)

Add a new data value to a stats_t structure.

Parameters
[in,out]statsStatistics structure to add to.
[in]dataData value to add.

Definition at line 93 of file stats.F90.

◆ stats_clear()

subroutine pmc_stats::stats_clear ( type(stats_t), intent(inout)  stats)

Clear data statistics collected so far.

Parameters
[in,out]statsStatistics structure to clear.

Definition at line 49 of file stats.F90.

◆ stats_conf_95_offset()

real(kind=dp) function pmc_stats::stats_conf_95_offset ( type(stats_t), intent(in)  stats)

Compute the 95% confidence interval offset from the mean.

Parameters
[in]statsStatistics structure to find confidence interval for.
Returns
Return offset so that [mean - offset, mean + offset] is the 95% confidence interval.

Definition at line 299 of file stats.F90.

◆ stats_output_netcdf()

subroutine pmc_stats::stats_output_netcdf ( type(stats_t), intent(in)  stats,
integer, intent(in)  ncid,
character(len=*), intent(in)  name,
character(len=*), intent(in), optional  unit 
)

Write statistics (mean and 95% conf. int.) to a NetCDF file.

Parameters
[in]statsStatistics structure to write.
[in]ncidNetCDF file ID, in data mode.
[in]nameVariable name in NetCDF file.
[in]unitUnit of variable.

Definition at line 574 of file stats.F90.

◆ student_t_95_coeff()

real(kind=dp) function pmc_stats::student_t_95_coeff ( integer, intent(in)  n_sample)

Return a fairly tight upper-bound on the Student's t coefficient for the 95% confidence interval.

The number of degrees of freedom is one less than n_sample. If a set of $n$ numbers has sample mean $\mu$ and sample standard deviation $\sigma$, then the 95% confidence interval for the mean is $[\mu - r\sigma/\sqrt{n}, \mu + r\sigma/\sqrt{n}]$, where r = student_t_95_coeff(n_sample).

The method used here was written by MW on 2011-05-01, based on the following empirical observation. If $f(\nu) = t_{0.95,\nu}$ is the function we want, where $\nu = n - 1$ is the number of degrees-of-freedom, then set $g(\nu) = f(\nu) - L$, where $L = \Phi^{-1}(0.975)$ is the limiting value given by the Gaussian CDF $\Phi$. We observe numerically that $g'(\nu) < -1$ and $g'(\nu) \to -1$ as $\nu \to \infty$. Thus $g(\nu)$ is well-approximated by $A/\nu$ for some $A$. Furthermore, if $g(\nu^*) = A^*/\nu$, then $g(\nu) < A^*/\nu$ for $\nu > \nu^*$. We thus have $f(\nu) \le (f(\nu^*) - L) (\nu^* / \nu) + L$ for $\nu \ge \nu^*$. By using a sequence of known $(\nu^*, f(\nu^*))$ pairs we can thus construct a fairly tight upper bound.

This implementation has an error of below 0.1% for all values of n_sample.

Parameters
[in]n_sampleNumber of samples.

Definition at line 510 of file stats.F90.

◆ update_mean_var()

subroutine pmc_stats::update_mean_var ( real(kind=dp), intent(inout)  mean,
real(kind=dp), intent(inout)  var,
real(kind=dp), intent(in)  data,
integer, intent(in)  n 
)

Compute a running average and variance.

Given a sequence of data x(i) for i = 1,...,n, this should be called like

 do i = 1,n
   call update_mean_var(mean, var, x(i), i)
 end do
 

After each call the variables mean and var will be the sample mean and sample variance of the sequence elements up to i.

This computes the sample mean and sample variance using a recurrence. The initial sample mean is $m_1 = x_1$ and the initial sample variance is $v_1 = 0$ for $n = 1$, and then for $n \ge 2$ we use the mean update

\[ m_n = m_{n-1} + \frac{x_n - m_{n-1}}{n} \]

and the variance update

\[ v_n = \frac{n - 2}{n - 1} v_{n-1} + \frac{(x_n - m_{n-1})^2}{n}. \]

Then $m_n$ and $v_n$ are the sample mean and sample variance for $\{x_1,\ldots,x_n\}$ for each $n$.

The derivation of these formulas begins with the definitions for the running total

\[ t_n = \sum_{i=1}^n x_i \]

and running sum of square differences

\[ s_n = \sum_{i=1}^n (x_i - m_n)^2. \]

Then the running mean is $m_n = \frac{t_n}{n}$, the running population variance is $p_n = \frac{1}{n} \left( s_n - n m_n^2 \right)$, and the running sample variance $v_n = \frac{1}{n - 1} \left( s_n - n m_n^2 \right)$.

We can then compute the mean update above, and observing that

\[ s_n = \sum_{i=1}^n x_i^2 - n m_n^2 \]

we can compute the sum-of-square-dfferences update identity

\[ s_n = s_{n-1} + \frac{n - 1}{n} (x_n - m_{n-1})^2. \]

The algorithm then follows immediately. The population variance update is given by

\[ p_n = \frac{n-1}{n} p_{n-1} + \frac{(x_n - m_n)(x_n - m_{n-1})}{n} = \frac{n-1}{n} p_{n-1} + \frac{n-1}{n^2} (x_n - m_{n-1})^2. \]

This algorithm (in a form where $m_n$ and $s_n$ are tracked) originally appeared in:

B. P. Welford [1962] "Note on a Method for Calculating Corrected Sums of Squares and Products", Technometrics 4(3), 419-420.

Numerical tests performed by M. West on 2012-04-12 seem to indicate that there is no substantial difference between tracking $s_n$ versus $v_n$.

The same method (tracking $m_n$ and $s_n$) is presented on page 232 in Section 4.2.2 of Knuth:

D. E. Knuth [1988] "The Art of Computer Programming, Volume 2: Seminumerical Algorithms", third edition, Addison Wesley Longman, ISBN 0-201-89684-2.

An analysis of the error introduced by different variance computation methods is given in:

T. F. Chan, G. H. Golub, and R. J. LeVeque [1983] "Algorithms for Computing the Sample Variance: Analysis and Recommendations", The American Statistician 37(3), 242-247.

The relative error in $s_n$ of Welford's method (tracking $m_n$ and $s_n$) is of order $n \kappa \epsilon$, where $\epsilon$ is the machine precision and $\kappa$ is the condition number for the problem, which is given by

\[ \kappa = \frac{\|x\|_2}{\sqrt{s_n}} = \sqrt{1 + \frac{m_n^2}{p_n}} \approx \frac{m_n}{p_n}. \]

This analysis was apparently first given in:

T. F. C. Chan and J. G. Lewis [1978] "Rounding error analysis of algorithms for computing means and standard deviations", Technical Report No. 284, The Johns Hopkins University, Department of Mathematical Sciences.

Parameters
[in,out]meanMean value to update (on entry $m_{n-1}$, on exit $m_n$).
[in,out]varVariance value to update (on entry $v_{n-1}$, on exit $v_n$).
[in]dataData value $x_n$.
[in]nNumber $n$ of this data value.

Definition at line 455 of file stats.F90.