3 Analysis of Two Samples
3.6 Cholesterol
In a clinical trial of a cholesterol-lowering agent, the cholesterol level (in mmol/L) of 15 patients was measured before treatment and 3 weeks after starting treatment. The data are listed in the following table:
Patient | Before | After |
---|---|---|
1 | 9.1 | 8.2 |
2 | 8.0 | 6.4 |
3 | 7.7 | 6.6 |
4 | 10.0 | 8.5 |
5 | 9.6 | 8.0 |
6 | 7.9 | 5.8 |
7 | 9.0 | 7.8 |
8 | 7.1 | 7.2 |
9 | 8.3 | 6.7 |
10 | 9.6 | 9.8 |
11 | 8.2 | 7.1 |
12 | 9.2 | 7.7 |
13 | 7.3 | 6.0 |
14 | 8.5 | 6.6 |
15 | 9.5 | 8.4 |
The following is run in R:
x1 <- c(9.1, 8.0, 7.7, 10.0, 9.6, 7.9, 9.0, 7.1, 8.3,
9.6, 8.2, 9.2, 7.3, 8.5, 9.5)
x2 <- c(8.2, 6.4, 6.6, 8.5, 8.0, 5.8, 7.8, 7.2, 6.7,
9.8, 7.1, 7.7, 6.0, 6.6, 8.4)
t.test(x1, x2)
Welch Two Sample t-test
data: x1 and x2
t = 3.3, df = 27, p-value = 0.003
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.4637 1.9630
sample estimates:
mean of x mean of y
8.600 7.387
t.test(x1, x2, paired=TRUE)
Paired t-test
data: x1 and x2
t = 7.3, df = 14, p-value = 0.000004
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.8588 1.5678
sample estimates:
mean of the differences
1.213
a)
Based on these data, can a significant decrease in cholesterol levels be demonstrated at significance level \alpha = 0.001?
This is clearly a paired setting, because the same persons were measured before treatment and 3 weeks after starting treatment. So only the results from the last of the R-calls are relevant, where we can read off the results:
The (non-directional) \text{p-value}= 0.00000367 is far below \alpha = 0.001, so there is very strong evidence against the null hypothesis. Since the mean of the differences (before minus after) is positive, 1.213, we can conclude beyond reasonable doubt that the mean cholesterol level has decreased after the 3 weeks.
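As a cross-check, a paired t-test is the same as a one-sample t-test on the differences. The sketch below reuses the data from the table, computes the statistic by hand and also runs a one-sided test, on the assumption that "decrease" is read as the directional alternative that the mean of before minus after is greater than 0. The names `d`, `t_manual` and `res` are only illustrative.

```r
# Cholesterol data from the table (before and after treatment)
x1 <- c(9.1, 8.0, 7.7, 10.0, 9.6, 7.9, 9.0, 7.1, 8.3,
        9.6, 8.2, 9.2, 7.3, 8.5, 9.5)
x2 <- c(8.2, 6.4, 6.6, 8.5, 8.0, 5.8, 7.8, 7.2, 6.7,
        9.8, 7.1, 7.7, 6.0, 6.6, 8.4)

# Paired analysis = one-sample analysis of the differences
d <- x1 - x2
t_manual <- mean(d) / (sd(d) / sqrt(length(d)))

# One-sided test (assumption: H1 is "mean difference > 0")
res <- t.test(d, mu = 0, alternative = "greater")

t_manual             # hand-computed t statistic
res$p.value          # one-sided p-value
res$p.value < 0.001  # comparison with alpha = 0.001
```

Because the observed t statistic is positive, the one-sided p-value is half the two-sided one reported above, so the conclusion is unchanged.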
3.7 Pulse
Runner | Pulse end | Pulse 1min |
---|---|---|
1 | 173 | 120 |
2 | 175 | 115 |
3 | 174 | 122 |
4 | 183 | 123 |
5 | 181 | 125 |
6 | 180 | 140 |
7 | 170 | 108 |
8 | 182 | 133 |
9 | 188 | 134 |
10 | 178 | 121 |
11 | 181 | 130 |
12 | 183 | 126 |
13 | 185 | 128 |
The following was run in R:
Pulse_end <- c(173,175,174,183,181,180,170,182,188,178,181,183,185)
Pulse_1min <- c(120,115,122,123,125,140,108,133,134,121,130,126,128)
mean(Pulse_end)
[1] 179.5
mean(Pulse_1min)
[1] 125
sd(Pulse_end)
[1] 5.19
sd(Pulse_1min)
[1] 8.406
sd(Pulse_end-Pulse_1min)
[1] 5.768
a)
What is the 99% confidence interval for the mean pulse drop (meaning the drop during 1 minute from end of workout)?
- These samples are dependent (a paired situation), so we apply the one-sample theory (Method 3.9) to the differences in order to find the confidence interval:
sd_diff <- sd(Pulse_end - Pulse_1min)
mean_diff <- mean(Pulse_end) - mean(Pulse_1min)
c(mean_diff - qt(1-0.01/2, df=12)*sd_diff/sqrt(13),
mean_diff + qt(1-0.01/2, df=12)*sd_diff/sqrt(13))
[1] 49.57507 59.34801
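The same interval can also be obtained from `t.test`, which accepts `paired = TRUE` and a `conf.level` argument. This is a sketch for checking the manual calculation, not a replacement for it; `ci` is just an illustrative name.

```r
Pulse_end  <- c(173,175,174,183,181,180,170,182,188,178,181,183,185)
Pulse_1min <- c(120,115,122,123,125,140,108,133,134,121,130,126,128)

# 99% confidence interval for the mean of the paired differences
ci <- t.test(Pulse_end, Pulse_1min, paired = TRUE, conf.level = 0.99)$conf.int
ci
```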
b)
Consider now the 13 pulse end measurements (the Pulse end column in the table). What is the 95% confidence interval for the standard deviation of these?
By method 3.19
c(sqrt(12*var(Pulse_end)/qchisq(1-0.05/2, df=12)), sqrt(12*var(Pulse_end)/qchisq(0.05/2, df=12)))
[1] 3.721662 8.567283
- So the answer is that the 95% confidence interval for the standard deviation is \sigma\in[3.721662, 8.567283]
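Written out, the interval computed above has the following form, with n = 13 and s = 5.19 taken from the R output. The quantile values 23.34 and 4.40 are rounded \chi^2 quantiles with 12 degrees of freedom, so the result is approximate:

$$ \left[ \sqrt{\frac{(n-1)s^2}{\chi^2_{0.975}}},\ \sqrt{\frac{(n-1)s^2}{\chi^2_{0.025}}} \right] = \left[ \sqrt{\frac{12 \cdot 5.19^2}{23.34}},\ \sqrt{\frac{12 \cdot 5.19^2}{4.40}} \right] \approx [3.72,\ 8.57] $$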
3.8 Foil production
In the production of a certain foil (film), the foil is controlled by measuring the thickness of the foil at a number of points distributed over the width of the foil. The production is considered stable if the mean of the difference between the maximum and minimum measurements does not exceed 0.35 mm. On a given day, the following measurements are observed for a random sample of 10 foils:
Foil | Max. in mm y_{max} | Min. in mm y_{min} | Max-Min(D) |
---|---|---|---|
1 | 2.62 | 2.14 | 0.48 |
2 | 2.71 | 2.39 | 0.32 |
3 | 2.18 | 1.86 | 0.32 |
4 | 2.25 | 1.92 | 0.33 |
5 | 2.72 | 2.33 | 0.39 |
6 | 2.34 | 2.00 | 0.34 |
7 | 2.63 | 2.25 | 0.38 |
8 | 1.86 | 1.50 | 0.36 |
9 | 2.84 | 2.27 | 0.57 |
10 | 2.93 | 2.37 | 0.56 |
The following statistics may potentially be used: $$ \bar{y}_{max} = 2.508,\ \bar{y}_{min} = 2.103,\ s_{y_{max}}=0.3373,\ s_{y_{min}}=0.2834,\ s_D=0.09664 $$
a)
What is a 95% confidence interval for the mean difference?
Answer: $$ [0.3358679, 0.4741321] $$
> c(2.508-2.103 - qt(1-0.05/2, df=9)*0.09664/sqrt(10),
2.508-2.103 + qt(1-0.05/2, df=9)*0.09664/sqrt(10))
[1] 0.3358679 0.4741321
Note: The confidence interval contains those values of the mean difference that we believe in based on the data.
b)
How much evidence is there that the mean difference is different from 0.35? State the null hypothesis, t-statistic and p-value for this question.
By using Method 3.36
1. $$ t_{obs}= \frac{\bar{x}-\mu_{0}}{s/\sqrt{n}} = \frac{0.405-0.35}{0.09664/\sqrt{10}} = 1.799723 $$
> ((2.508-2.103)-0.35)/(0.09664/sqrt(10))
[1] 1.799723
2. $$ \text{p-value}= 2*P(T>|t_{obs}|) = 0.1054369 $$
> 2*(1-pt(1.799723, df=9))
[1] 0.1054369
3. Conclusion:
The \text{p-value} is 0.1054369, so there is little or no evidence against H_{0}: \mu=0.35
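If the raw differences are used instead of the rounded summary statistics, `t.test` reports the same test in one call. The sketch below assumes the values from the table; the names `y_max`, `y_min`, `D` and `res` are illustrative.

```r
# Maximum and minimum thickness for the 10 foils (from the table)
y_max <- c(2.62, 2.71, 2.18, 2.25, 2.72, 2.34, 2.63, 1.86, 2.84, 2.93)
y_min <- c(2.14, 2.39, 1.86, 1.92, 2.33, 2.00, 2.25, 1.50, 2.27, 2.37)
D <- y_max - y_min

# Two-sided one-sample t-test of H0: mu = 0.35
res <- t.test(D, mu = 0.35)
res$statistic  # t statistic
res$p.value    # two-sided p-value
res$conf.int   # 95% confidence interval for the mean difference
```

Since 0.35 lies inside the 95% confidence interval from a), the two-sided test at the 5% level does not reject H_0, in agreement with the p-value.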
3.9 Course project
In a particular degree programme it was decided to introduce a project, running through the course period, as part of the grade point evaluation. In order to assess whether this has changed the percentage of students passing the course, the following data were collected:
Quantity | Before introduction of project | After introduction of project |
---|---|---|
Number of students evaluated | 50 | 24 |
Number of students failed | 13 | 3 |
Average grade point \bar{x} | 6.420 | 7.375 |
Sample standard deviation s | 2.205 | 1.813 |
a)
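As it is assumed that the grades are approximately normally distributed in each group, the following hypothesis is tested (with \mu_1 and \mu_2 denoting the mean grade point before and after the introduction of the project): $$ H_0: \mu_1 = \mu_2, \quad H_1: \mu_1 \neq \mu_2 $$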
The test statistic, the p-value and the conclusion for this test become?
By Method 3.51 (Welch):
> s1 <- 2.205; n1 <- 50
> s2 <- 1.813; n2 <- 24
> tobs <- (6.420-7.375)/(sqrt((s1^2/n1)+(s2^2/n2)))
> tobs
[1] -1.973387
> v <- ((s1^2/n1)+(s2^2/n2))^2/
(((s1^2/n1)^2/(n1-1))+((s2^2/n2)^2/(n2-1)))
> v
[1] 54.38591
> 2*(1-pt(abs(tobs), df=v))
- Conclusion
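At the 5% level we cannot conclude that there is a significant difference in the grade point means before and after the introduction of the project, since |t_{obs}| = 1.97 gives a p-value slightly above 0.05. This means that we cannot reject H_0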
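Because only summary statistics are available here, `t.test` cannot be called directly on the data. The calculation can instead be wrapped in a small function, as in the following sketch. The helper name `welch_summary` and its arguments are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: Welch two-sample t-test from summary statistics
welch_summary <- function(m1, s1, n1, m2, s2, n2) {
  se2_1 <- s1^2 / n1   # squared standard error, group 1
  se2_2 <- s2^2 / n2   # squared standard error, group 2
  tobs <- (m1 - m2) / sqrt(se2_1 + se2_2)
  # Welch-Satterthwaite degrees of freedom
  v <- (se2_1 + se2_2)^2 / (se2_1^2 / (n1 - 1) + se2_2^2 / (n2 - 1))
  # Two-sided p-value
  p <- 2 * (1 - pt(abs(tobs), df = v))
  c(tobs = tobs, df = v, p.value = p)
}

# Before: mean 6.420, s 2.205, n 50; after: mean 7.375, s 1.813, n 24
res <- welch_summary(6.420, 2.205, 50, 7.375, 1.813, 24)
res
```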
b)
A 99% confidence interval for the mean grade point difference is?
By method 3.47:
c(6.420-7.375-qt(1-0.01/2, df=v)*sqrt(2.205^2/50+1.813^2/24),
6.420-7.375+qt(1-0.01/2, df=v)*sqrt(2.205^2/50+1.813^2/24))
[1] -2.2467772 0.3367772
- The answer: the 99% confidence interval for the mean difference is [-2.247; 0.337]
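Written out, the interval above is the Welch confidence interval with \nu = 54.39 degrees of freedom from a). The quantile t_{0.995}(\nu) \approx 2.6693 is back-calculated here from the R output and rounded, so the arithmetic is approximate:

$$ \bar{x}_1 - \bar{x}_2 \pm t_{0.995}(\nu) \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \approx -0.955 \pm 2.6693 \cdot 0.48394 \approx -0.955 \pm 1.292 $$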
c)
A 95% confidence interval for the grade point standard deviation after the introduction of the project becomes?
By method 3.19
n <- 24
s <- 1.813
alpha <- 0.05
c(sqrt((n-1)*s^2/qchisq(1-alpha/2, df=(n-1))),
sqrt((n-1)*s^2/qchisq(alpha/2, df=(n-1))))
[1] 1.409088 2.543205
- So the answer is that the 95% confidence interval for the standard deviation is \sigma\in[1.409088, 2.543205]
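Exercises 3.7 b) and 3.9 c) both use the same \chi^2-based interval for a standard deviation, so the calculation can be collected in one function. The sketch below assumes normally distributed observations; the helper name `sd_ci` and its arguments are hypothetical.

```r
# Hypothetical helper: confidence interval for a standard deviation,
# assuming normally distributed observations (chi-square based interval)
sd_ci <- function(s, n, conf.level = 0.95) {
  alpha <- 1 - conf.level
  df <- n - 1
  c(lower = sqrt(df * s^2 / qchisq(1 - alpha / 2, df = df)),
    upper = sqrt(df * s^2 / qchisq(alpha / 2, df = df)))
}

# Exercise 3.9 c): s = 1.813, n = 24
ci_grades <- sd_ci(1.813, 24)
ci_grades

# Exercise 3.7 b): the 13 pulse end measurements
Pulse_end <- c(173,175,174,183,181,180,170,182,188,178,181,183,185)
ci_pulse <- sd_ci(sd(Pulse_end), length(Pulse_end))
ci_pulse
```

The interval is not symmetric around s, because the \chi^2 distribution is skewed.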