Skip to content

4 Simulation methods

4.1 Reliability: System lifetime

A system consists of three components A, B and C serially connected, such that A is positioned before B, which is again positioned before C. The system will be functioning only so long as A, B and C are all functioning. The lifetime in months of the three components are assumed to follow exponential distributions with means: 2 months, 3 months and 5 months, respectively (hence there are three random variables, X_{A}, X_B and X_C with exponential distributions with \lambda_{A} = ½, \lambda_{B}= ⅓ and \lambda_{C}= ⅕ resp.)

a)

Generate, by simulation, a large number (at least 1000 – go for 10000 or 100000 if your computer is up for it) of system lifetimes (hint: consider how the random variable Y = System lifetime is a function of the three X-variables: is it the sum, the mean, the median, the minimum, the maximum, the range or something even different?).

  • The lifetime can be seen as the minimal value of the three random component lifetimes
\text{"Lifetime"} = min(X_A,X_B,X_C)
k <- 10000
xA <- rexp(k, 1/2)
xB <- rexp(k, 1/3)
xC <- rexp(k, 1/5)
x <- cbind(xA, xB, xC)
lifetimes <- apply(x,1, min)  # The second argument in apply() function, 1, means that there should be used rows (NOT columns) for finding the minimal value

b)

Estimate the mean system lifetime.

\bar{x} = 0.9770395
> mean(lifetimes)
[1] 0.9770395

c)

Estimate the standard deviation of system lifetimes.

\bar{s} = 0.9791683
> sd(lifetimes)
[1] 0.9791683

d)

Estimate the probability that the system fails within 1 month.

P(X<1)= 0.6395
> mean(lifetimes<1)
[1] 0.6395

e)

Estimate the median system lifetime

> median(lifetimes)
[1] 0.6714813

f)

Estimate the 10th percentile of system lifetimes

> quantile(lifetimes, 0.1)
       10% 
0.09988259

g)

What seems to be the distribution of system lifetimes? (histogram etc)

  • It seems to be the exponential distribution

hist(lifetimes, col = "blue", nclass = 30)

4.2 Basic bootstrap CI

The following measurements were given for the cylindrical compressive strength (in MPa) for 11 prestressed concrete beams:

38.43, 38.43, 38.39, 38.83, 38.45, 38.35, 38.43, 38.31, 38.32, 38.48, 38.50

1000 bootstrap samples (each sample hence consisting of 11 measurements) were generated from these data, and the 1000 bootstrap means were arranged on order. Refer to the smallest as \bar{x}^*_{(1)} , the second smallest as \bar{x}^*_{(2)} and so on, with the largest being \bar{x}^*_{(1000)}. Assume that

\bar{x}^*_{(25)} = 38.3818\\ \bar{x}^*_{(26)} = 38.3818\\ \bar{x}^*_{(50)} = 38.3909\\ \bar{x}^*_{(51)} = 38.3918\\ \bar{x}^*_{(950)} = 38.5218\\ \bar{x}^*_{(951)} = 38.5236\\ \bar{x}^*_{(975)} = 38.5382\\ \bar{x}^*_{(976)} = 38.5391\\

a)

Compute a 95% bootstrap confidence interval for the mean compressive strength.

  • We need to find:
[q_{0.025}, q_{0.975}]
  • By definition 1.7
q_p = (x_{(np)}+x_{(np+1}))/2
  • So
q_{0.025} = (x_{1000*0.025}+x_{1000*0.025+1})/2 = \\ (x_{25}+ x_{26})/2 = (38.3818 + 38.3818)/2 = 38.3818
  • And
q_{0.975}= (x_{1000*0.975}+x_{1000*0.975+1})/2= \\ (x_{975}+x_{976})/2= (38.5382 + 38.5391)/2 = 38.53865
  • The answer is that we accept that
\bar{x} \in [38.3818, 38.53865]

b)

Compute a 90% bootstrap confidence interval for the mean compressive strength.

  • We know
\alpha = 1 - \frac{90\%}{100\%} = 0.1 \\
  • We need to find
[q_{\alpha/2}, q_{1-\alpha/2}] = [q_{0.05}, q_{0.95}]
  • So
q_{0.05} = (x_{50}+x_{51})/2 = (38.3909+38.3918)/2 = 38.39135 \\ q_{0.95} = (x_{950}+x_{951})/2 = (38.5218+38.5236)/2 = 38.5227
  • The answer is that we accept that
\bar{x} \in [38.39135, 38.5227]

4.3 Various bootstrap CIs

Consider the data from the exercise above. These data are entered into R as:

x <- c(38.43, 38.43, 38.39, 38.83, 38.45, 38.35,
38.43, 38.31, 38.32, 38.48, 38.50)

Now generate k = 1000 bootstrap samples and compute the 1000 means (go higher if your computer is fine with it)

a)

What are the 2.5%, and 97.5% quantiles (so what is the 95% confidence interval for \mu without assuming any distribution)?

> k <- 10000
> x <- c(38.43, 38.43, 38.39, 38.83, 38.45, 38.35,
+        38.43, 38.31, 38.32, 38.48, 38.50)
> simSamples <- replicate(k, sample(x, replace=TRUE))
> simMeans <-   apply(simSamples,2, mean)
> hist(simMeans)
> quantile(simMeans, c(0.025, 0.975))
    2.5%    97.5% 
38.38091 38.53727

b)

Find the 95% confidence interval for \mu by the parametric bootstrap assuming the normal distribution for the observations. Compare with the classical analytic approach based on the t-distribution from Chapter 2.

  • Parametric distribution:
> simSamples <- replicate(k,rnorm(length(x),xBar, sBar))
> simMean <- apply(simSamples, 2, mean)
> quantile(simMean, c(0.025,0.975))
    2.5%    97.5% 
38.36430 38.53135 
  • Based on the t-distribution:
> c(xBar - qt(0.975, df=length(x)-1)*sBar/sqrt(length(x)),
+   xBar + qt(0.975, df=length(x)-1)*sBar/sqrt(length(x)))
[1] 38.35250 38.54205
> t.test(x)

    One Sample t-test

data:  x
t = 903.89, df = 10, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 38.35250 38.54205
sample estimates:
mean of x 
 38.44727 

c)

Find the 95% confidence interval for \mu by the parametric bootstrap assuming the log-normal distribution for the observations.

> logx <-log(x)
> logxMean <- mean(logx) 
> logxSd <- sd(logx)
> sim_logx <- replicate(k, rlnorm(length(logx), logxMean, logxSd))
> sim_mean <- apply(sim_logx, 2, mean)
> quantile(sim_mean, c(0.025, 0.975))
    2.5%    97.5% 
38.36428 38.53088 

d)

Find the 95% confidence interval for the lower quartile Q_1 by the parametric bootstrap assuming the normal distribution for the observations.

> Q1 <- function(x) {quantile(x, 0.25)}
> xBar <- mean(x)
> sBar <- sd(x)
> simSamples <- replicate(k,rnorm(length(x),xBar, sBar))
> simQ1 <- apply(simSamples, 2, Q1)
> quantile(simQ1, c(0.025,0.975))
    2.5%    97.5% 
38.25936 38.46556

e)

Find the 95% confidence interval for the lower quartile Q_1 by the non-parametric bootstrap (so without any distributional assumptions)

> simSamples <- replicate(k, sample(x, replace=TRUE))
> simQ1_nonparametric <-apply(simSamples,2, Q1)
> quantile(simQ1_nonparametric, c(0.025, 0.975))
  2.5%  97.5% 
38.315 38.430 

4.4 Two-sample TV data

A TV producer had 20 consumers evaluate the quality of two different TV flat screens - 10 consumers for each screen. A scale from 1 (worst) up to 5 (best) were used and the following results were obtained:

TV screen 1 TV screen 2
1 3
2 4
1 2
3 4
2 2
1 3
2 2
3 4
1 3
1 2

a)

Compare the two means without assuming any distribution for the two samples (non-parametric bootstrap confidence interval and relevant hypothesis test interpretation).

tv1 <- c(1,2,1,3,2,1,2,3,1,1)
tv2 <- c(3,4,2,4,2,3,2,4,3,2)

k <- 100000
sim_tv1 <- replicate(k, sample(tv1, replace=TRUE))
sim_tv2 <- replicate(k, sample(tv2, replace=TRUE))
means_sim_tv1 <- apply(sim_tv1, 2, mean)
means_sim_tv2 <- apply(sim_tv2, 2, mean) 
sim_dif <- means_sim_tv1 - means_sim_tv2
quantile(sim_dif, c(0.025,0.975))
     2.5% 97.5% 
     -1.9  -0.5
  • If:
H_0 :\ \mu_1= \mu_2
  • so we reject the null hypothesis, since 0 is not included in the CI of the differences.

b)

Compare the two means assuming normal distributions for the two samples without using simulations (or rather: assuming/hoping that the sample sizes are large enough to make the results approximately valid).

 t.test(tv1, tv2)

    Welch Two Sample t-test

data:  tv1 and tv2
t = -3.1574, df = 17.932, p-value = 0.005468
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.99869 -0.40131
sample estimates:
mean of x mean of y 
      1.7       2.9 

c)

Compare the two means assuming normal distributions for the two samples simulation based (parametric bootstrap confidence interval and relevant hypothesis test interpretation – in spite of the obviously wrong assumption).

tv1 <- c(1,2,1,3,2,1,2,3,1,1)
tv2 <- c(3,4,2,4,2,3,2,4,3,2)

k <- 100000
n <- length(tv1)
sim_norm_tv1 <- replicate(k, rnorm(n, mean=mean(tv1),sd=sd(tv1) ))
sim_norm_tv2 <- replicate(k, rnorm(n, mean=mean(tv2),sd=sd(tv2) ))
means_sim_tv1 <- apply(sim_norm_tv1, 2, mean)
means_sim_tv2 <- apply(sim_norm_tv2, 2, mean) 
sim_norm_dif <- means_sim_tv1 - means_sim_tv2
quantile(sim_norm_dif, c(0.025,0.975))

      2.5%      97.5% 
-1.9371878 -0.4560157 
  • We reject the null hypothesis of \mu_1=\mu_2

4.5 Non-linear error propagation

The pressure P, and the volume V of one mole of an ideal gas are related by the equation PV = 8.31T, when P is measured in kilopascals, T is measured in kelvins, and V is measured in liters.

a)

Assume that P is measured to be 240.48 kPa and V to be 9.987 L with known measurement errors (given as standard deviations): 0.03 kPa and$ 0.002 L$. Estimate T and find the uncertainty in the estimate.

We know: $$ PV = 8.31T\ T = \frac{PV}{8.31} $$ Estimate T: $$ \hat{T} = 240.48*9.987 / 8.31 = 289.0101 $$

Differentiate the function with respect to both p and v:

f(p,v) = \frac{pv}{8.31} \\ \frac{\partial f}{\partial p} = \frac{1*p^0v}{8.31}=\frac{v}{8.31} \\ \frac{\partial f}{\partial v} = \frac{p*1*v^0}{8.31} = \frac{p}{8.31} \\

Calculate the uncertainty:

\sigma^2_{\hat{T}} = \left(\frac{\partial f}{\partial p}\right)^2*\sigma^2_p+\left(\frac{\partial f}{\partial v}\right)^2*\sigma^2_v = \\ \frac{v^2}{8.31^2} * \sigma^2_p + \frac{p^2}{8.31^2}*\sigma^2_v =\\ 9.987^2/8.31^2*0.03^2+240.48^2/8.31^2*0.002^2 = 0.3210887 \\

So the uncertainty is:

\sigma_{\hat{T}} = \sqrt{0.3210887} = 0.06818855

b)

Assume that P is measured to be 240.48kPa and T to be 289.12K with known measurement errors (given as standard deviations): 0.03kPa and 0.02K. Estimate V and find the uncertainty in the estimate.

We know: $$ V = \frac{8.31T}{P}\ $$ Estimate $$ \hat{V} = 8.31*289.12/240.48= 9.990798 $$

Differentiate the function with respect to both p and v: $$ f(p,v) = \frac{8.31t}{p} \ \frac{\partial f}{\partial t} = \frac{1*t^0*8.31}{p}=\frac{8.31}{p} \ \frac{\partial f}{\partial p} = 8.31t*-1*p^{-1-1} =- \frac{8.31t}{p^2} \ $$

Calculate the uncertainty: $$ \sigma^2_{\hat{V}} = \left(\frac{\partial f}{\partial t}\right)2*\sigma2_t+\left(\frac{\partial f}{\partial p}\right)2*\sigma2_p = \ \frac{8.312}{p2} * \sigma^2_t + \left( -\frac{8.31*t}{p^2}\sigma_p \right)^2 =\ 8.312/240.482*0.02^2 + (- 8.31*289.12/240.482*0.03)2= 0.001425149 ^2\ $$

So the uncertainty is: $$ \sigma_{\hat{T}} = \sqrt{0.001425149 ^2} = 0.001425149 $$

c)

Assume that V is measured to be 9.987 L and T to be 289.12 K with known measurement errors (given as standard deviations): 0.002 L and 0.02 K. Estimate P and find the uncertainty in the estimate.

We know:

P = \frac{8.31T}{V}\\

Estimate P:

\hat{P} = 8.31*289.12/9.987 = 240.5715

Differentiate the function with respect to both t and v:

f(t,v) = \frac{8.31t}{v} \\ \frac{\partial f}{\partial t} = \frac{1*t^0*8.31}{v}=\frac{8.31}{v} \\ \frac{\partial f}{\partial v} = 8.31t*-1*v^{-1-1} =- \frac{8.31t}{v^2} \\

Calculate the uncertainty:

\sigma^2_{\hat{P}} = \left(\frac{\partial f}{\partial t}\right)^2*\sigma^2_t+\left(\frac{\partial f}{\partial v}\right)^2*\sigma^2_v = \\ \frac{8.31^2}{v^2} * \sigma^2_t + \left( -\frac{8.31*t}{v^2}\sigma_v \right)^2 =\\ 8.31^2/9.987^2*0.02^2 + (- 8.31*289.12/9.987^2*0.002)^2= 0.00259796\\

So the uncertainty is:

\sigma_{\hat{T}} = \sqrt{0.00259796} = 0.05097019

d)

Try to answer one or more of these questions by simulation (assume that the errors are normally distributed).

Let's do simulations in order to estimate P and find the the uncertainty in the estimate, where: $$ P = \frac{8.31T}{V}\ $$

> n <- 10000
> Ts <- rnorm(n, mean= 289.12, sd = 0.02)
> Vs <- rnorm(n, mean= 9.987, sd = 0.002)
> p <- 8.31*Ts/Vs
> mean(p)
[1] 240.5706
> sd(p)
[1] 0.05104125