Title:  Bayesian Methods to Estimate the Proportion of Liars in Coin Flip Experiments 

Description:  Implements Bayesian methods, described in HughJones (2019) <doi:10.1007/s4088101900069x>, for estimating the proportion of liars in coin flipstyle experiments, where subjects report a random outcome and are paid for reporting a "good" outcome. 
Authors:  David HughJones <[email protected]> 
Maintainer:  David HughJones <[email protected]> 
License:  MIT + file LICENSE 
Version:  0.2.0.9000 
Built:  20241101 03:54:10 UTC 
Source:  https://github.com/hughjonesd/truelies 
Given two distributions with density functions $\phi_1, \phi_2$
,
this calculates:
$\int_0^1 \int_0^{l_1}\phi_1(l_1) \phi_2(l_2) d l_2 d l_1,$
the probability that the value of the first distribution is greater.
compare_dists(dist1, dist2)
compare_dists(dist1, dist2)
dist1 
Density of distribution 1, as a oneargument function. 
dist2 
Density of distribution 2. 
A probability scalar.
d1 < update_prior(30, 50, P = 0.5, prior = stats::dunif) d2 < update_prior(25, 40, P = 0.5, prior = stats::dunif) compare_dists(d1, d2)
d1 < update_prior(30, 50, P = 0.5, prior = stats::dunif) d2 < update_prior(25, 40, P = 0.5, prior = stats::dunif) compare_dists(d1, d2)
Given two probability density functions dist1
and dist2
, difference_dist
returns the density of “dist1  dist2'.
difference_dist(dist1, dist2)
difference_dist(dist1, dist2)
dist1 , dist2

Probability density functions 
At the moment this only works when dist1 and dist2 are defined on [0, 1]
.
A probability density function defined on [1, 1]
.
d1 < update_prior(30, 50, P = 0.5, prior = stats::dunif) d2 < update_prior(32, 40, P = 0.5, prior = stats::dunif) dd < difference_dist(d1, d2) dist_hdr(dd, 0.95)
d1 < update_prior(30, 50, P = 0.5, prior = stats::dunif) d2 < update_prior(32, 40, P = 0.5, prior = stats::dunif) dd < difference_dist(d1, d2) dist_hdr(dd, 0.95)
This is a wrapper for hdrcde::hdr
. The highest density region is the
interval that covers conf_level
of the data and has the highest
average density. See:
dist_hdr(dist, conf_level, bounds = attr(dist, "limits"))
dist_hdr(dist, conf_level, bounds = attr(dist, "limits"))
dist 
A oneargument function 
conf_level 
A scalar between 0 and 1 
bounds 
A length 2 vector of the bounds of the distribution's support 
Rob J Hyndman (1996) “Computing and graphing highest density regions”. American Statistician, 50, 120126.
A length 2 vector of region endpoints
d1 < update_prior(33, 50, P = 0.5, prior = stats::dunif) dist_hdr(d1, 0.95)
d1 < update_prior(33, 50, P = 0.5, prior = stats::dunif) dist_hdr(d1, 0.95)
Find mean of a probability density function
dist_mean(dist, l = attr(dist, "limits")[1], r = attr(dist, "limits")[2])
dist_mean(dist, l = attr(dist, "limits")[1], r = attr(dist, "limits")[2])
dist 
A oneargument function returned from 
l 
Lower bound of the density's support 
r 
Upper bound of the density's support 
A scalar
d1 < update_prior(10, 40, P = 5/6, prior = stats::dunif) dist_mean(d1)
d1 < update_prior(10, 40, P = 5/6, prior = stats::dunif) dist_mean(d1)
Find quantiles given a probability density function
dist_quantile(dist, probs, bounds = attr(dist, "limits"))
dist_quantile(dist, probs, bounds = attr(dist, "limits"))
dist 
A one argument function 
probs 
A vector of probabilities 
bounds 
A length 2 vector of the bounds of the distribution's support 
A vector of quantiles
d1 < update_prior(33, 50, P = 0.5, prior = stats::dunif) dist_quantile(d1, c(0.025, 0.975))
d1 < update_prior(33, 50, P = 0.5, prior = stats::dunif) dist_quantile(d1, c(0.025, 0.975))
This function creates a prior by fitting a Beta distribution to the heads/N
vector,
using MASS::fitdistr()
. The prior is then updated using data from each
individual sample to give the posterior distributions.
empirical_bayes(heads, ...) ## Default S3 method: empirical_bayes(heads, N, P, ...) ## S3 method for class 'formula' empirical_bayes(formula, data, P, subset, ...)
empirical_bayes(heads, ...) ## Default S3 method: empirical_bayes(heads, N, P, ...) ## S3 method for class 'formula' empirical_bayes(formula, data, P, subset, ...)
heads 
A vector of numbers of the good outcome reported 
... 
Ignored 
N 
A vector of sample sizes 
P 
Probability of bad outcome 
formula 
A twosided formula of the form 
data 
A data frame or matrix. Each row represents one individual. 
subset 
A logical or numeric vector specifying the subset of data to use 
The formula interface allows calling the function directly on experimental data.
A list with two components:
prior
, the calculated empirical prior (of class densityFunction
).
posterior
, a list of posterior distributions (objects of class densityFunction
).
If heads
was named, the list will have the same names.
heads < c(Baseline = 30, Treatment1 = 38, Treatment2 = 45) N < c(50, 52, 57) res < empirical_bayes(heads, N, P = 0.5) compare_dists(res$posteriors$Baseline, res$posteriors$Treatment1) plot(res$prior, ylim = c(0, 4), col = "grey", lty = 2) plot(res$posteriors$Baseline, add = TRUE, col = "blue") plot(res$posteriors$Treatment1, add = TRUE, col = "orange") plot(res$posteriors$Treatment2, add = TRUE, col = "red") # starting from raw data: raw_data < data.frame( report = sample(c("heads", "tails"), size = 300, replace = TRUE, prob = c(.8, .2) ), group = rep(LETTERS[1:10], each = 30) ) empirical_bayes(I(report == "heads") ~ group, data = raw_data, P = 0.5)
heads < c(Baseline = 30, Treatment1 = 38, Treatment2 = 45) N < c(50, 52, 57) res < empirical_bayes(heads, N, P = 0.5) compare_dists(res$posteriors$Baseline, res$posteriors$Treatment1) plot(res$prior, ylim = c(0, 4), col = "grey", lty = 2) plot(res$posteriors$Baseline, add = TRUE, col = "blue") plot(res$posteriors$Treatment1, add = TRUE, col = "orange") plot(res$posteriors$Treatment2, add = TRUE, col = "red") # starting from raw data: raw_data < data.frame( report = sample(c("heads", "tails"), size = 300, replace = TRUE, prob = c(.8, .2) ), group = rep(LETTERS[1:10], each = 30) ) empirical_bayes(I(report == "heads") ~ group, data = raw_data, P = 0.5)
This uses simulations to estimate the power to detect a given level of lying in a
sample of size N
by this package's methods.
power_calc(N, P, lambda, alpha = 0.05, prior = stats::dunif, nsims = 200)
power_calc(N, P, lambda, alpha = 0.05, prior = stats::dunif, nsims = 200)
N 
Total number in sample 
P 
Probability of bad outcome 
lambda 
Probability of a subject lying 
alpha 
Significance level to use for the null hypothesis 
prior 
Prior over lambda. A function which takes a vector of values between 0 and 1, and returns the probability density. The default is the uniform distribution. 
nsims 
Number of simulations to run 
Estimated power, a scalar between 0 and 1.
power_calc(N = 50, P = 0.5, lambda = 0.2)
power_calc(N = 50, P = 0.5, lambda = 0.2)
Using simulations, estimate power to detect differences in lying
using compare_dists()
, given values for $\lambda$
, the
probability of lying, in each sample.
power_calc_difference(N1, N2 = N1, P, lambda1, lambda2, alpha = 0.05, alternative = c("two.sided", "greater", "less"), prior = stats::dunif, nsims = 200)
power_calc_difference(N1, N2 = N1, P, lambda1, lambda2, alpha = 0.05, alternative = c("two.sided", "greater", "less"), prior = stats::dunif, nsims = 200)
N1 
N of sample 1 
N2 
N of sample 2 
P 
Probability of bad outcome 
lambda1 
Probability of lying in sample 1 
lambda2 
Probability of lying in sample 2 
alpha 
Significance level 
alternative 
"two.sided", "greater" (sample 1 is greater), or "less". Can be abbreviated 
prior 
Prior over lambda. A function which takes a vector of values between 0 and 1, and returns the probability density. The default is the uniform distribution. 
nsims 
Number of simulations to run 
Estimated power, a scalar between 0 and 1.
power_calc_difference(N1 = 100, P = 0.5, lambda = 0, lambda2 = 0.25)
power_calc_difference(N1 = 100, P = 0.5, lambda = 0, lambda2 = 0.25)
densityFunction
.Print/plot an object of class densityFunction
.
## S3 method for class 'densityFunction' print(x, ...) ## S3 method for class 'densityFunction' plot(x, ...)
## S3 method for class 'densityFunction' print(x, ...) ## S3 method for class 'densityFunction' plot(x, ...)
x 
The object 
... 
Unused 
d1 < update_prior(33, 50, P = 0.5, prior = stats::dunif) d1 plot(d1) # show the actual R code (techies only) unclass(d1)
d1 < update_prior(33, 50, P = 0.5, prior = stats::dunif) d1 plot(d1) # show the actual R code (techies only) unclass(d1)
update_prior
uses the equation for the posterior:
$\phi(\lambda  R; N,P) = Pr(R\lambda; N,P) \phi(\lambda) /
\int Pr(R  \lambda'; N,P) \phi(\lambda') d \lambda'$
where $\phi$
is the prior and $Pr(R  \lambda; N, P)$
is the
probability of R reports of heads given that people lie with probability
$\lambda$
:
$Pr(R  \lambda; N, P) = binom(N, (1P) + \lambda P)$
update_prior(heads, N, P, prior = stats::dunif, npoints = 1000)
update_prior(heads, N, P, prior = stats::dunif, npoints = 1000)
heads 
Number of good outcomes reported 
N 
Total number in sample 
P 
Probability of bad outcome 
prior 
Prior over lambda. A function which takes a vector of values between 0 and 1, and returns the probability density. The default is the uniform distribution. 
npoints 
How many points to integrate on? 
The probability density of the posterior distribution, as a oneargument function.
posterior < update_prior(heads = 30, N = 50, P = 0.5, prior = stats::dunif) plot(posterior)
posterior < update_prior(heads = 30, N = 50, P = 0.5, prior = stats::dunif) plot(posterior)