PartMC 2.6.1
The stats_t type and associated subroutines.
Data Types
type | stats_1d_t
Structure for online computation of 1D arrays of mean and variance.
type | stats_2d_t
Structure for online computation of 2D arrays of mean and variance.
type | stats_t
Structure for online computation of mean and variance.
Functions/Subroutines
subroutine | stats_clear (stats)
Clear data statistics collected so far.
subroutine | stats_1d_clear (stats)
Clear data statistics collected so far.
subroutine | stats_2d_clear (stats)
Clear data statistics collected so far.
subroutine | stats_add (stats, data)
Add a new data value to a stats_t structure.
subroutine | stats_1d_add (stats, data)
Add all new data values to a stats_1d_t structure.
subroutine | stats_1d_add_entry (stats, data, i)
Add a new single data value to a stats_1d_t structure.
subroutine | stats_2d_add (stats, data)
Add all new data values to a stats_2d_t structure.
subroutine | stats_2d_add_row (stats, data, i)
Add a row of new data values to a stats_2d_t structure.
subroutine | stats_2d_add_col (stats, data, j)
Add a column of new data values to a stats_2d_t structure.
subroutine | stats_2d_add_entry (stats, data, i, j)
Add a single new data value to a stats_2d_t structure.
real(kind=dp) function | stats_conf_95_offset (stats)
Compute the 95% confidence interval offset from the mean.
real(kind=dp) function, dimension(size(stats%n)) | stats_1d_conf_95_offset (stats)
Compute the 95% confidence interval offset from the mean.
real(kind=dp) function, dimension(size(stats%n, 1), size(stats%n, 2)) | stats_2d_conf_95_offset (stats)
Compute the 95% confidence interval offset from the mean.
subroutine | update_mean_var (mean, var, data, n)
Compute a running average and variance.
real(kind=dp) function | student_t_95_coeff (n_sample)
Return a fairly tight upper bound on the Student's t coefficient for the 95% confidence interval.
real(kind=dp) function | conf_95_offset (var, n_sample)
95% confidence interval offset from mean.
subroutine | stats_output_netcdf (stats, ncid, name, unit)
Write statistics (mean and 95% conf. int.) to a NetCDF file.
subroutine | stats_1d_output_netcdf (stats, ncid, name, dim_name, unit)
Write statistics (mean and 95% conf. int.) to a NetCDF file.
subroutine | stats_2d_output_netcdf (stats, ncid, name, dim_name_1, dim_name_2, unit)
Write statistics (mean and 95% conf. int.) to a NetCDF file.
subroutine | stats_1d_output_text (stats, filename, dim)
Write statistics (mean and 95% conf. int.) to a text file.
real(kind=dp) function pmc_stats::conf_95_offset (var, n_sample)
    real(kind=dp), intent(in) :: var
    integer, intent(in) :: n_sample
95% confidence interval offset from mean.
If mean and var are the sample mean and sample variance of n data values, then
    offset = conf_95_offset(var, n)
means that the 95% confidence interval for the mean is [mean - offset, mean + offset].
If n_sample is one or less then zero is returned.
[in] | var | Sample variance of data. |
[in] | n_sample | Number of samples. |
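As an illustration of how such an offset yields an interval, here is a minimal self-contained sketch. It does not call PartMC. The program name, the sample data, and the hard-coded coefficient (about 2.262 for 9 degrees of freedom, taken from standard Student's t tables) are assumptions for the example, and the offset is assumed to be the t coefficient times sqrt(var / n), as in the usual Student's t interval.

```fortran
program conf_offset_sketch
  implicit none
  integer, parameter :: dp = kind(0d0)
  integer, parameter :: n = 10
  ! Two-sided 95% Student's t coefficient for n - 1 = 9 degrees of
  ! freedom, from standard tables (assumed here, not computed).
  real(kind=dp), parameter :: t_coeff = 2.262157_dp
  real(kind=dp) :: x(n), mean, var, offset

  x = [4.9_dp, 5.1_dp, 5.0_dp, 4.8_dp, 5.2_dp, &
       5.0_dp, 4.7_dp, 5.3_dp, 5.1_dp, 4.9_dp]
  mean = sum(x) / real(n, kind=dp)
  ! The sample variance uses the n - 1 denominator.
  var = sum((x - mean)**2) / real(n - 1, kind=dp)
  ! Offset of the interval endpoints from the mean.
  offset = t_coeff * sqrt(var / real(n, kind=dp))
  print '(a,f8.4,a,f8.4,a)', '95% CI: [', mean - offset, ', ', &
       mean + offset, ']'
end program conf_offset_sketch
```

In PartMC itself the coefficient would presumably come from student_t_95_coeff rather than a table lookup.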
subroutine pmc_stats::stats_1d_add (stats, data)
    type(stats_1d_t), intent(inout) :: stats
    real(kind=dp), dimension(:), intent(in) :: data
Add all new data values to a stats_1d_t structure.
[in,out] | stats | Statistics structure to add to. |
[in] | data | Data values to add. |
subroutine pmc_stats::stats_1d_add_entry (stats, data, i)
    type(stats_1d_t), intent(inout) :: stats
    real(kind=dp), intent(in) :: data
    integer, intent(in) :: i
Add a new single data value to a stats_1d_t structure.
[in,out] | stats | Statistics structure to add to. |
[in] | data | Data value to add. |
[in] | i | Index of data value to add. |
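The following sketch shows one way per-entry accumulation can work. It assumes that each array entry keeps its own sample count, which the size(stats%n) dimension of stats_1d_conf_95_offset suggests. The type sketch_stats_1d_t, its mean and var fields, and the routine sketch_add_entry are hypothetical stand-ins written for this example, not the PartMC definitions.

```fortran
module stats_1d_sketch
  implicit none
  integer, parameter :: dp = kind(0d0)

  ! Hypothetical stand-in for stats_1d_t: one counter, mean, and
  ! variance per array entry.
  type sketch_stats_1d_t
     integer, allocatable :: n(:)
     real(kind=dp), allocatable :: mean(:)
     real(kind=dp), allocatable :: var(:)
  end type sketch_stats_1d_t

contains

  ! Add one value to entry i, leaving all other entries untouched.
  subroutine sketch_add_entry(stats, data, i)
    type(sketch_stats_1d_t), intent(inout) :: stats
    real(kind=dp), intent(in) :: data
    integer, intent(in) :: i
    real(kind=dp) :: anom

    stats%n(i) = stats%n(i) + 1
    if (stats%n(i) == 1) then
       stats%mean(i) = data
       stats%var(i) = 0d0
    else
       ! Running mean and sample variance update for entry i.
       anom = data - stats%mean(i)
       stats%mean(i) = stats%mean(i) + anom / real(stats%n(i), kind=dp)
       stats%var(i) = real(stats%n(i) - 2, kind=dp) &
            / real(stats%n(i) - 1, kind=dp) * stats%var(i) &
            + anom**2 / real(stats%n(i), kind=dp)
    end if
  end subroutine sketch_add_entry

end module stats_1d_sketch

program stats_1d_demo
  use stats_1d_sketch
  implicit none
  type(sketch_stats_1d_t) :: stats

  allocate(stats%n(3), stats%mean(3), stats%var(3))
  stats%n = 0
  stats%mean = 0d0
  stats%var = 0d0
  ! Two samples go into entry 2; entries 1 and 3 receive none.
  call sketch_add_entry(stats, 1d0, 2)
  call sketch_add_entry(stats, 3d0, 2)
  print *, stats%n
  print *, stats%mean
  print *, stats%var
end program stats_1d_demo
```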
subroutine pmc_stats::stats_1d_clear (stats)
    type(stats_1d_t), intent(inout) :: stats
real(kind=dp) function, dimension(size(stats%n)) pmc_stats::stats_1d_conf_95_offset (stats)
    type(stats_1d_t), intent(in) :: stats
subroutine pmc_stats::stats_1d_output_netcdf (stats, ncid, name, dim_name, unit)
    type(stats_1d_t), intent(in) :: stats
    integer, intent(in) :: ncid
    character(len=*), intent(in) :: name
    character(len=*), intent(in), optional :: dim_name
    character(len=*), intent(in), optional :: unit
Write statistics (mean and 95% conf. int.) to a NetCDF file.
[in] | stats | Statistics structure to write. |
[in] | ncid | NetCDF file ID, in data mode. |
[in] | name | Variable name in NetCDF file. |
[in] | dim_name | NetCDF dimension name for the variable. |
[in] | unit | Unit of variable. |
subroutine pmc_stats::stats_1d_output_text (stats, filename, dim)
    type(stats_1d_t), intent(in) :: stats
    character(len=*), intent(in) :: filename
    real(kind=dp), dimension(:), intent(in) :: dim
Write statistics (mean and 95% conf. int.) to a text file.
The format has three columns:
    dim mean ci_offset
where dim is the optional dimension argument, mean is the mean value and ci_offset is the 95% confidence interval offset, so the 95% CI is [mean - ci_offset, mean + ci_offset].
[in] | stats | Statistics structure to write. |
[in] | filename | Filename to write to. |
[in] | dim | Dimension array (independent variable). |
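A minimal sketch of writing such a three-column file is given below. It is self-contained and does not use PartMC. The output filename, the numeric edit descriptor, and the sample values are assumptions for illustration, since the exact number format used by stats_1d_output_text is not stated here.

```fortran
program text_output_sketch
  implicit none
  integer, parameter :: dp = kind(0d0)
  integer, parameter :: n = 3
  real(kind=dp) :: dim(n), mean(n), ci_offset(n)
  integer :: i, u

  ! Illustrative values only.
  dim = [1d0, 2d0, 3d0]
  mean = [0.5d0, 0.7d0, 0.6d0]
  ci_offset = [0.05d0, 0.04d0, 0.06d0]

  ! One row per entry: dim, mean, ci_offset.
  open(newunit=u, file='stats_sketch.txt', status='replace', &
       action='write')
  do i = 1, n
     write(u, '(3es20.10)') dim(i), mean(i), ci_offset(i)
  end do
  close(u)
end program text_output_sketch
```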
subroutine pmc_stats::stats_2d_add (stats, data)
    type(stats_2d_t), intent(inout) :: stats
    real(kind=dp), dimension(:, :), intent(in) :: data
Add all new data values to a stats_2d_t structure.
[in,out] | stats | Statistics structure to add to. |
[in] | data | Data values to add. |
subroutine pmc_stats::stats_2d_add_col (stats, data, j)
    type(stats_2d_t), intent(inout) :: stats
    real(kind=dp), dimension(:), intent(in) :: data
    integer, intent(in) :: j
Add a column of new data values to a stats_2d_t structure.
[in,out] | stats | Statistics structure to add to. |
[in] | data | Data values to add. |
[in] | j | Column of data value to add. |
subroutine pmc_stats::stats_2d_add_entry (stats, data, i, j)
    type(stats_2d_t), intent(inout) :: stats
    real(kind=dp), intent(in) :: data
    integer, intent(in) :: i
    integer, intent(in) :: j
Add a single new data value to a stats_2d_t structure.
[in,out] | stats | Statistics structure to add to. |
[in] | data | Data value to add. |
[in] | i | First index of data value to add. |
[in] | j | Second index of data value to add. |
subroutine pmc_stats::stats_2d_add_row (stats, data, i)
    type(stats_2d_t), intent(inout) :: stats
    real(kind=dp), dimension(:), intent(in) :: data
    integer, intent(in) :: i
Add a row of new data values to a stats_2d_t structure.
[in,out] | stats | Statistics structure to add to. |
[in] | data | Data values to add. |
[in] | i | Row of data value to add. |
subroutine pmc_stats::stats_2d_clear (stats)
    type(stats_2d_t), intent(inout) :: stats
real(kind=dp) function, dimension(size(stats%n, 1), size(stats%n, 2)) pmc_stats::stats_2d_conf_95_offset (stats)
    type(stats_2d_t), intent(in) :: stats
subroutine pmc_stats::stats_2d_output_netcdf (stats, ncid, name, dim_name_1, dim_name_2, unit)
    type(stats_2d_t), intent(in) :: stats
    integer, intent(in) :: ncid
    character(len=*), intent(in) :: name
    character(len=*), intent(in), optional :: dim_name_1
    character(len=*), intent(in), optional :: dim_name_2
    character(len=*), intent(in), optional :: unit
Write statistics (mean and 95% conf. int.) to a NetCDF file.
[in] | stats | Statistics structure to write. |
[in] | ncid | NetCDF file ID, in data mode. |
[in] | name | Variable name in NetCDF file. |
[in] | dim_name_1 | First NetCDF dimension name for the variable. |
[in] | dim_name_2 | Second NetCDF dimension name for the variable. |
[in] | unit | Unit of variable. |
subroutine pmc_stats::stats_add (stats, data)
    type(stats_t), intent(inout) :: stats
    real(kind=dp), intent(in) :: data
subroutine pmc_stats::stats_clear (stats)
    type(stats_t), intent(inout) :: stats
real(kind=dp) function pmc_stats::stats_conf_95_offset (stats)
    type(stats_t), intent(in) :: stats
subroutine pmc_stats::stats_output_netcdf (stats, ncid, name, unit)
    type(stats_t), intent(in) :: stats
    integer, intent(in) :: ncid
    character(len=*), intent(in) :: name
    character(len=*), intent(in), optional :: unit
real(kind=dp) function pmc_stats::student_t_95_coeff (n_sample)
    integer, intent(in) :: n_sample
Return a fairly tight upper bound on the Student's t coefficient for the 95% confidence interval.
The number of degrees of freedom is one less than n_sample. If a set of $n$ numbers has sample mean $\mu$ and sample standard deviation $\sigma$, then the 95% confidence interval for the mean is $[\mu - r \sigma / \sqrt{n}, \mu + r \sigma / \sqrt{n}]$, where r = student_t_95_coeff(n_sample).
The method used here was written by MW on 2011-05-01, based on the following empirical observation. If $f(\nu)$ is the function we want, where $\nu = n - 1$ is the number of degrees of freedom, then set $g(\nu) = f(\nu) - L$, where $L = \Phi^{-1}(0.975)$ is the limiting value given by the Gaussian CDF $\Phi$. We observe numerically that $d \log g / d \log \nu < -1$ and $d \log g / d \log \nu \to -1$ as $\nu \to \infty$. Thus $g(\nu)$ is well-approximated by $A / \nu$ for some $A$. Furthermore, if $g(\nu^*) = A^* / \nu^*$, then $g(\nu) < A^* / \nu$ for $\nu > \nu^*$. We thus have $f(\nu) \le (f(\nu^*) - L) (\nu^* / \nu) + L$ for $\nu \ge \nu^*$. By using a sequence of known $(\nu^*, f(\nu^*))$ pairs we can thus construct a fairly tight upper bound.
This implementation has an error below 0.1% for all values of n_sample.
[in] | n_sample | Number of samples. |
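The bound construction can be sketched with a single anchor pair. The code below is not the PartMC implementation: it assumes the bound has the form (f(nu_star) - L) * nu_star / nu + L for nu >= nu_star, with L the 97.5% standard normal quantile, and uses one tabulated value (nu_star = 15, coefficient about 2.13145). The names limit, nu_star, f_star, and t_upper_bound are hypothetical. With only one anchor the bound is looser than a version built from a sequence of pairs.

```fortran
program student_t_bound_sketch
  implicit none
  integer, parameter :: dp = kind(0d0)
  ! Limiting value for infinite degrees of freedom: the 97.5%
  ! quantile of the standard normal distribution.
  real(kind=dp), parameter :: limit = 1.959964_dp
  ! One known anchor pair (nu_star, f(nu_star)) from standard tables.
  integer, parameter :: nu_star = 15
  real(kind=dp), parameter :: f_star = 2.131450_dp
  integer :: nu

  ! Print the bound for a few degrees of freedom at or above nu_star.
  do nu = 15, 120, 15
     print '(i5,f10.5)', nu, t_upper_bound(nu)
  end do

contains

  ! Upper bound (f_star - limit) * nu_star / nu + limit, intended
  ! for nu >= nu_star only.
  real(kind=dp) function t_upper_bound(nu)
    integer, intent(in) :: nu
    t_upper_bound = (f_star - limit) * real(nu_star, kind=dp) &
         / real(nu, kind=dp) + limit
  end function t_upper_bound

end program student_t_bound_sketch
```

At nu = nu_star the bound reproduces the tabulated value exactly, and it decays toward the Gaussian limit as nu grows.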
subroutine pmc_stats::update_mean_var (mean, var, data, n)
    real(kind=dp), intent(inout) :: mean
    real(kind=dp), intent(inout) :: var
    real(kind=dp), intent(in) :: data
    integer, intent(in) :: n
Compute a running average and variance.
Given a sequence of data x(i) for i = 1,...,n, this should be called like
    do i = 1,n
      call update_mean_var(mean, var, x(i), i)
    end do
After each call the variables mean and var will be the sample mean and sample variance of the sequence elements up to i.
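This computes the sample mean and sample variance using a recurrence. The initial sample mean is $m_1 = x_1$ and the initial sample variance is $v_1 = 0$ for $n = 1$, and then for $n \ge 2$ we use the mean update
\[ m_n = m_{n-1} + \frac{x_n - m_{n-1}}{n} \]
and the variance update
\[ v_n = \frac{n - 2}{n - 1} v_{n-1} + \frac{(x_n - m_{n-1})^2}{n}. \]
Then $m_n$ and $v_n$ are the sample mean and sample variance for $\{x_1, \ldots, x_n\}$ for each $n$.
The derivation of these formulas begins with the definitions for the running total
\[ t_n = \sum_{i=1}^n x_i \]
and running sum of square differences
\[ s_n = \sum_{i=1}^n (x_i - m_n)^2. \]
Then the running mean is $m_n = t_n / n$, the running population variance is $p_n = s_n / n$, and the running sample variance is $v_n = s_n / (n - 1)$.
We can then compute the mean update above, and observing that
\[ s_n = \sum_{i=1}^n x_i^2 - n m_n^2 \]
we can compute the sum-of-square-differences update identity
\[ s_n = s_{n-1} + \frac{n - 1}{n} (x_n - m_{n-1})^2. \]
The algorithm then follows immediately. The population variance update is given by
\[ p_n = \frac{n - 1}{n} p_{n-1} + \frac{n - 1}{n^2} (x_n - m_{n-1})^2. \]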
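For readers who want the intermediate step, the update identity can be checked as follows. This is a sketch using the notation $m_n$ for the running mean, $s_n$ for the running sum of square differences, and $v_n$ for the running sample variance; the shorthand $d_n = x_n - m_{n-1}$ is introduced here only for the derivation.

```latex
% Shorthand (introduced for this derivation): d_n = x_n - m_{n-1},
% so that x_n = m_{n-1} + d_n and m_n = m_{n-1} + d_n / n.
\begin{align*}
s_n - s_{n-1}
  &= x_n^2 - n m_n^2 + (n - 1) m_{n-1}^2 \\
  &= \left( m_{n-1}^2 + 2 m_{n-1} d_n + d_n^2 \right)
     - \left( n m_{n-1}^2 + 2 m_{n-1} d_n + \frac{d_n^2}{n} \right)
     + (n - 1) m_{n-1}^2 \\
  &= \frac{n - 1}{n} d_n^2 .
\end{align*}
% Dividing s_n by n - 1 and using s_{n-1} = (n - 2) v_{n-1}:
\[
v_n = \frac{s_n}{n - 1}
    = \frac{n - 2}{n - 1} v_{n-1} + \frac{d_n^2}{n} .
\]
```

The first line uses $s_n = \sum_{i=1}^n x_i^2 - n m_n^2$ for both $s_n$ and $s_{n-1}$.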
This algorithm (in a form where $m_n$ and $s_n$ are tracked) originally appeared in:
B. P. Welford [1962] "Note on a Method for Calculating Corrected Sums of Squares and Products", Technometrics 4(3), 419-420.
Numerical tests performed by M. West on 2012-04-12 seem to indicate that there is no substantial difference between tracking $s_n$ versus $v_n$.
The same method (tracking $m_n$ and $s_n$) is presented on page 232 in Section 4.2.2 of Knuth:
D. E. Knuth [1998] "The Art of Computer Programming, Volume 2: Seminumerical Algorithms", third edition, Addison Wesley Longman, ISBN 0-201-89684-2.
An analysis of the error introduced by different variance computation methods is given in:
T. F. Chan, G. H. Golub, and R. J. LeVeque [1983] "Algorithms for Computing the Sample Variance: Analysis and Recommendations", The American Statistician 37(3), 242-247.
The relative error in $s_n$ of Welford's method (tracking $m_n$ and $s_n$) is of order $n \kappa \epsilon$, where $\epsilon$ is the machine precision and $\kappa$ is the condition number for the problem, which is given by
\[ \kappa = \frac{\|x\|_2}{\sqrt{s_n}} = \sqrt{1 + \frac{m_n^2}{p_n}} \approx \frac{|m_n|}{\sqrt{p_n}}. \]
This analysis was apparently first given in:
T. F. C. Chan and J. G. Lewis [1978] "Rounding error analysis of algorithms for computing means and standard deviations", Technical Report No. 284, The Johns Hopkins University, Department of Mathematical Sciences.
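To get a feel for the condition number, the sketch below evaluates it for a small data set whose mean is large compared with its spread. It assumes the definition of the condition number as the 2-norm of the data divided by the square root of the sum of square differences, which equals sqrt(1 + m**2 / p) for mean m and population variance p. The program and variable names, and the data, are chosen for this example only.

```fortran
program condition_number_sketch
  implicit none
  integer, parameter :: dp = kind(0d0)
  real(kind=dp) :: x(4), m, s, p, kappa_def, kappa_mp

  ! Data with a large mean relative to its spread (ill-conditioned).
  x = [1000.1d0, 1000.2d0, 999.9d0, 999.8d0]
  m = sum(x) / real(size(x), kind=dp)
  s = sum((x - m)**2)               ! sum of square differences
  p = s / real(size(x), kind=dp)    ! population variance
  ! Definition via the 2-norm of the data.
  kappa_def = sqrt(sum(x**2)) / sqrt(s)
  ! Equivalent form in terms of the mean and population variance.
  kappa_mp = sqrt(1d0 + m**2 / p)
  print *, kappa_def, kappa_mp
end program condition_number_sketch
```

The two printed values should agree up to rounding, and both are large here because the mean dominates the spread.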
[in,out] | mean | Mean value to update (on entry $m_{n-1}$, on exit $m_n$). |
[in,out] | var | Variance value to update (on entry $v_{n-1}$, on exit $v_n$). |
[in] | data | Data value $x_n$. |
[in] | n | Number of this data value. |
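The recurrence can be exercised with a small self-contained program. The routine sketch_update_mean_var below is a stand-in written from the mean and sample variance updates described above (variance scaled by (n - 2)/(n - 1), plus the squared difference from the previous mean divided by n); the actual PartMC code may differ in detail. The program name and test data are assumptions for the example.

```fortran
program welford_sketch
  implicit none
  integer, parameter :: dp = kind(0d0)
  integer, parameter :: n_data = 5
  real(kind=dp) :: x(n_data), mean, var, mean_ref, var_ref
  integer :: i

  x = [2d0, 4d0, 4d0, 5d0, 7d0]
  mean = 0d0
  var = 0d0
  do i = 1, n_data
     call sketch_update_mean_var(mean, var, x(i), i)
  end do

  ! Two-pass reference values for comparison.
  mean_ref = sum(x) / real(n_data, kind=dp)
  var_ref = sum((x - mean_ref)**2) / real(n_data - 1, kind=dp)
  print *, 'recurrence:', mean, var
  print *, 'two-pass:  ', mean_ref, var_ref

contains

  ! Stand-in for update_mean_var, written from the recurrence in the
  ! text; not copied from PartMC.
  subroutine sketch_update_mean_var(mean, var, data, n)
    real(kind=dp), intent(inout) :: mean, var
    real(kind=dp), intent(in) :: data
    integer, intent(in) :: n
    real(kind=dp) :: anom

    if (n == 1) then
       mean = data
       var = 0d0
    else
       ! Difference from the previous mean, x_n - m_{n-1}.
       anom = data - mean
       mean = mean + anom / real(n, kind=dp)
       var = real(n - 2, kind=dp) / real(n - 1, kind=dp) * var &
            + anom**2 / real(n, kind=dp)
    end if
  end subroutine sketch_update_mean_var

end program welford_sketch
```

The two printed lines should match up to rounding, since the recurrence and the two-pass formulas compute the same sample mean and sample variance.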