Type: | Package |
Title: | Data Research, Access, Governance Network : Statistical Disclosure Control |
Version: | 0.1.0 |
Author: | Ben Derrick |
Maintainer: | Ben Derrick <ben.derrick@uwe.ac.uk> |
Description: | A tool for checking how much information is disclosed when reporting summary statistics. |
License: | GPL-3 |
Encoding: | UTF-8 |
RoxygenNote: | 7.1.2.9000 |
Imports: | gtools |
NeedsCompilation: | no |
Packaged: | 2022-03-01 15:58:40 UTC; bf2-derrick |
Repository: | CRAN |
Date/Publication: | 2022-03-02 19:40:01 UTC |
Statistical Disclosure Control. Data Research, Access, Governance Network.
Description
A tool for checking how much information is disclosed when reporting summary statistics.
Disguise the sample mean and standard deviation
Description
Disguises the sample mean and standard deviation via a choice of methods.
Usage
disguise(usersample, method = 2)
Arguments
usersample |
A vector of all individual sample values. |
method |
Approach for disguising mean and standard deviation: 1, 2, 3 or 4. (default = 2) |
Details
*Method 1*
Randomly split the sample into two samples, A and B, of approximately equal size. For sample A, calculate and report the mean. For sample B, calculate and report the standard deviation.
*Method 2* (default)
Take a sample of size N with replacement; calculate and report the mean. Repeat with a new sample to calculate and report the standard deviation.
*Method 3*
Generate a random number (RN1) between N/2 and N. Draw a sample of size RN1 with replacement; calculate and report the mean. Generate a second random number (RN2) between N/2 and N. Draw a sample of size RN2 with replacement; calculate and report the standard deviation.
*Method 4*
As Method 3, but sampling without replacement.
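The four approaches above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's own code: `disguise_sketch()` is a hypothetical name, the random sizes RN1 and RN2 are assumed to be integers drawn uniformly from ceiling(N/2) to N, and Method 1 is assumed to put floor(N/2) values in sample A.

```r
# Hypothetical helper illustrating Methods 1 to 4; not the package's disguise().
disguise_sketch <- function(x, method = 2) {
  stopifnot(is.numeric(x), length(x) >= 3)
  n <- length(x)
  if (method == 1) {
    # Random split into two samples of approximately equal size
    idx <- sample.int(n, size = floor(n / 2))
    a <- x[idx]
    b <- x[-idx]
  } else if (method == 2) {
    # Two independent resamples of size N, with replacement
    a <- x[sample.int(n, n, replace = TRUE)]
    b <- x[sample.int(n, n, replace = TRUE)]
  } else if (method %in% c(3, 4)) {
    # Assumption: RN1 and RN2 are integers between N/2 and N
    sizes <- ceiling(n / 2):n
    rn1 <- sizes[sample.int(length(sizes), 1)]
    rn2 <- sizes[sample.int(length(sizes), 1)]
    with_repl <- (method == 3)  # Method 4 samples without replacement
    a <- x[sample.int(n, rn1, replace = with_repl)]
    b <- x[sample.int(n, rn2, replace = with_repl)]
  } else {
    stop("method must be 1, 2, 3 or 4")
  }
  # Mean comes from sample A, standard deviation from sample B
  c(mean = mean(a), sd = sd(b))
}

set.seed(1)
disguise_sketch(c(1, 1, 2, 3, 4, 4, 5), method = 3)
```

Because the mean and standard deviation come from different subsamples or resamples, the reported pair generally no longer corresponds to a single set of raw values.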
Value
Outputs disguised mean and disguised standard deviation.
References
Derrick, B., Green, L., Kember, K., Ritchie, F. & White, P., 2022, Safety in numbers: Minimum thresholding, Maximum bounds, and Little White Lies. Scottish Economic Society Annual Conference, University of Glasgow, 25th-27th April 2022.
Examples
usersample<-c(1,1,2,3,4,4,5)
disguise(usersample,method=1)
disguise(usersample,method=2)
disguise(usersample,method=3)
disguise(usersample,method=4)
Find individual sample values from the sample mean and standard deviation
Description
For integer-based scales, finds the possible sets of individual values within a sample, given the sample size, minimum possible value, maximum possible value, mean and standard deviation (and optionally the median).
Usage
solutions(
n,
min_poss,
max_poss,
usermean,
usersd,
meandp = NULL,
sddp = NULL,
usermed = NULL
)
Arguments
n |
Sample size. |
min_poss |
Minimum possible value. If the sample minimum is disclosed, it can be inserted here; otherwise use the theoretical minimum. If there is no theoretical minimum, '-Inf' can be inserted. |
max_poss |
Maximum possible value. If the sample maximum is disclosed, it can be inserted here; otherwise use the theoretical maximum. If there is no theoretical maximum, 'Inf' can be inserted. |
usermean |
Sample mean. |
usersd |
Sample standard deviation, i.e. n-1 denominator. |
meandp |
(optional, default=NULL) Number of decimal places mean is reported to, only required if including trailing zeroes. |
sddp |
(optional, default=NULL) Number of decimal places standard deviation is reported to, only required if including trailing zeroes. |
usermed |
(optional, default=NULL) Sample median. |
Details
For use with data measured on a scale with 1 unit increments. Samuelson's inequality [1] is used to further restrict the minimum and maximum. All possible combinations within this inequality are calculated [2] for factorial(n+k-1)/(factorial(k)*factorial(n-1))<65,000,000.
There is no restriction on the number of decimal places input. Reporting fewer than two decimal places reduces the chance of a unique solution for all sample values being uncovered [3].
The optional arguments meandp and sddp specify the number of digits reported after the decimal point; they are required when the reported values include trailing zeroes.
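The bounding step can be illustrated with a short base-R sketch. It is an assumption-laden illustration rather than the package's implementation: `samuelson_range()` and `n_values` are hypothetical names, rounding of the reported mean and standard deviation is ignored, and the count shown is the standard number of multisets of size n drawn from `n_values` candidate integers. With the n-1 denominator, Samuelson's inequality places every observation within usersd*(n-1)/sqrt(n) of the mean.

```r
# Hypothetical helper: integer range allowed by Samuelson's inequality when
# usersd is the sample standard deviation (n - 1 denominator).
samuelson_range <- function(n, usermean, usersd,
                            min_poss = -Inf, max_poss = Inf) {
  half_width <- usersd * (n - 1) / sqrt(n)
  lower <- max(min_poss, ceiling(usermean - half_width))
  upper <- min(max_poss, floor(usermean + half_width))
  c(lower = lower, upper = upper)
}

# Inputs from Example 2: n = 3, mean 4.00, standard deviation 2.00,
# with no theoretical minimum or maximum.
r <- samuelson_range(3, 4.00, 2.00)
r  # lower = 2, upper = 6

# Number of candidate integer values, and the number of unordered samples
# (multisets) of size 3 that can be formed from them.
n_values <- r[["upper"]] - r[["lower"]] + 1
choose(3 + n_values - 1, 3)  # 35
```

Even with infinite theoretical limits, the inequality reduces the search to a finite set of candidates, which is what makes enumeration possible.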
Value
Outputs possible combinations of original integer sample values.
References
[1] Samuelson, P.A., 1968, How deviant can you be? Journal of the American Statistical Association, Vol 63, 1522-1525.
[2] Allenby, R.B. & Slomson, A., 2010, How to count: An introduction to combinatorics. Chapman and Hall/CRC.
[3] Derrick, B., Green, L., Kember, K., Ritchie, F. & White, P., 2022, Safety in numbers: Minimum thresholding, Maximum bounds, and Little White Lies. Scottish Economic Society Annual Conference, University of Glasgow, 25th-27th April 2022.
Examples
# EXAMPLE 1
# Seven observations are taken from a five-point Likert scale (coded 1 to 5).
# The reported mean is 2.857 and the reported standard deviation is 1.574.
solutions(7,1,5,2.857,1.574)
# For this mean and standard deviation there are two possible distributions:
# 1 1 2 3 4 4 5
# 1 2 2 2 3 5 5
# Optionally adding median value of 3.
solutions(7,1,5,2.857,1.574, usermed=3)
# uniquely reveals the raw sample values:
# 1 1 2 3 4 4 5
# EXAMPLE 2
# The mean is '4.00'.
# The standard deviation is '2.00'.
# Narrower set of solutions found specifying 2dp including trailing zeroes.
solutions(3,-Inf,Inf,4.00,2.00,2,2)
# uniquely reveals the raw sample values:
# 2 4 6
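The result of Example 1 can be cross-checked independently with a small brute-force search in base R. This is a sketch for verification only and is not how the package is implemented: `brute_force()` is a hypothetical helper, it assumes finite limits, and it assumes the mean and standard deviation were both rounded to `dp` decimal places.

```r
# Hypothetical brute-force check: enumerate every unordered integer sample of
# size n between min_poss and max_poss, then keep those whose rounded mean and
# standard deviation match the reported values.
brute_force <- function(n, min_poss, max_poss, usermean, usersd, dp = 3) {
  n_values <- max_poss - min_poss + 1
  # Each row of idx is a strictly increasing index set; subtracting 0:(n - 1)
  # column-wise maps it to a non-decreasing sample (stars and bars).
  idx <- t(combn(n + n_values - 1, n))
  cand <- sweep(idx, 2, 0:(n - 1)) + min_poss - 1
  keep <- abs(round(rowMeans(cand), dp) - usermean) < 1e-9 &
    abs(round(apply(cand, 1, sd), dp) - usersd) < 1e-9
  cand[keep, , drop = FALSE]
}

res <- brute_force(7, 1, 5, 2.857, 1.574)
res
# Two rows: 1 1 2 3 4 4 5 and 1 2 2 2 3 5 5

# Adding the reported median of 3 leaves a single candidate.
res[apply(res, 1, median) == 3, , drop = FALSE]
```

Full enumeration is only practical for small samples and short scales, which is why the search range is worth narrowing first.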