---
title: "Confidence Interval Estimation with Hand-Crafted CDFs"
author: "Peter E. Freeman"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Confidence Interval Estimation with Hand-Crafted CDFs}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Introduction

Recall that to determine confidence interval bounds, we solve the
equation
\begin{align*}
F_Y(y_{\rm obs} \vert \theta) - q = 0
\end{align*}
for $\theta$. Here, $Y$ is our chosen statistic (e.g., $\bar{X}$),
$F_Y(\cdot)$ is the cumulative distribution function (or cdf) for
$Y$'s sampling distribution, $y_{\rm obs}$ is the observed statistic value,
and $q$ is a quantile that we determine given the confidence coefficient 
$1-\alpha$ and the type of interval we are constructing (one-sided lower
bound, one-sided upper bound, or two-sided).

In this vignette, we examine the situation in which the sampling distribution
cdf *is* analytically derivable, but (a) is not (easily) invertible by hand, 
and (b) is (to our knowledge!) not coded in an existing `R` package. In this
situation, we would develop our own cdf-calculating function, and then pass
it to `cdfinv()`.

## Example

Let's suppose we sample $n = 16$ data from a $\text{Uniform}(0,\theta)$
distribution and we wish to compute a 95% upper bound for $\theta$. Further,
let's suppose that the $10^{\rm th}$ datum is $x_{(10),\rm obs} = 0.45$, and
that we will use this datum to determine the confidence interval. (This is
obviously an academic exercise: we would normally utilize $x_{(16),\rm obs}$
and solve for the upper bound by hand.)

## Deriving the CDF

For the $\text{Uniform}(0,\theta)$ distribution, the cdf within the domain is
\begin{align*}
F_X(x) = \int_0^x \frac{1}{\theta - 0} dy = \frac{x}{\theta} \,.
\end{align*}
We utilize a result found in Casella & Berger (2002; page 229): the
cdf for the $j^{\rm th}$ order statistic is
\begin{align*}
F_{(j)}(x) = \sum_{k=j}^n \binom{n}{k} [F_X(x)]^k [1-F_X(x)]^{n-k} \,.
\end{align*}
(Our notation slightly differs from that of Casella & Berger.) Here, that
cdf would be
\begin{align*}
F_{(j)}(x) = \sum_{k=j}^n \binom{n}{k} \left(\frac{x}{\theta}\right)^k \left[1-\left(\frac{x}{\theta}\right)\right]^{n-k} \,.
\end{align*}
This is obviously *not* a cdf that we can easily invert by hand when $j < n$.

## Coding the CDF

Below we show how one might create one's own code that returns the cdf value
as a function of a coordinate $x$ and a parameter $\theta$. Note that the first
argument must be `q`...the observed statistic value passed to `cdfinv()` will be
matched to this argument name.
```{r}
# give the function a name that will not conflict with others in the namespace
punif.ord <- function(q,theta,j,n)
{
  sum(choose(n,j:n)*(q/theta)^(j:n)*(1-q/theta)^(n-j:n))
}
```

## Applying the CDF

Below we show how we can pass the function, now defined in `R`'s global environment,
to `cdfinv()`. The first three arguments are standard: the distribution "name" is
`unif.ord` (`cdfinv()` will tack a `p` on the front), the parameter of interest is
`theta`, and the observed statistic value is 0.45. The next argument specifies that
we wish to compute a (95% by default) upper bound on $\theta$, instead of the default
two-sided interval. As we know that $\theta \geq x_{(10),\rm obs}$, we specify a
lower parameter bound (`lpb`) of 0.45. The last two arguments are extra ones specifically
required for `punif.ord()`.

```{r}
require(cdfinv)
cdfinv("unif.ord","theta",0.45,bound="upper",lpb=0.45,j=10,n=16)
```

The 95% upper bound on $\theta$ determined using the $10^{\rm th}$ ordered datum is
$\hat{\theta}_U = 1.151$.