practical data analysis – examples 43
scales=list(log=10, tck=0.5),
auto.key=list(columns=2), aspect=1)
Fitting a regression line
The plots suggest that a line is a good fit. Note however that the
data span a huge range of distances. The ratio of longest to shortest
distance is almost 3000:1. Departures from the line are of the order
of 15% at the upper end of the range, but are so small relative to this
huge range that they are not obvious.
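A quick calculation makes the point about scale concrete. The sketch below assumes a longest:shortest distance ratio of 3000 (the "almost 3000:1" above) and the slope 1.1248 of the fitted line of log(Time) on log(Distance). It compares the vertical extent of the line on a log_e scale with the size of a 15% departure on that same scale.

```r
## How large is a 15% departure, relative to the range of the data?
ratio <- 3000        # longest:shortest distance, approximately (from the text)
b <- 1.1248          # slope of the fitted line of log(Time) on log(Distance)

x_span <- log(ratio)         # horizontal extent on the log_e scale, about 8.0
y_span <- b * x_span         # vertical extent implied by the line, about 9.0
dep <- log(1.15)             # a 15% departure on the log_e scale, about 0.14

## The departure is well under 2% of the vertical extent of the plot
round(c(x_span = x_span, y_span = y_span, departure = dep,
        fraction = dep / y_span), 3)
```

On a plot that spans about 9 log units vertically, a 15% departure moves a point by only about 0.14 units, which is why it is hard to see.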
The following uses the function lm() to fit a straight line to the logged data, then extracts the regression coefficients. (The name lm is a mnemonic for linear model.)
worldrec.lm <- lm(log(Time) ~ log(Distance),
                  data=worldRecords)
coef(worldrec.lm)
  (Intercept) log(Distance)
       0.7316        1.1248
The equation gives predicted times:
\[
\text{Time} = e^{0.7316} \times \text{Distance}^{1.1248} = 2.08 \times \text{Distance}^{1.1248}
\]
Because the exponent 1.1248 is greater than 1, this implies, as would be expected, that minutes per kilometer increase (equivalently, kilometers per minute decrease) with increasing distance. Fitting a line to points that are on a log scale thus allows an immediate interpretation.
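The back-transformation and its implication for speed can be checked numerically. This is a minimal sketch that uses only the two coefficients printed by coef(worldrec.lm). It assumes, as the interpretation in kilometers and minutes suggests, that Distance is in kilometers and Time in minutes. The distances chosen are arbitrary illustrations, not data values.

```r
## Coefficients as printed by coef(worldrec.lm)
a <- 0.7316   # intercept, on the log_e scale
b <- 1.1248   # slope, i.e. the exponent of Distance

multiplier <- exp(a)   # back-transformed intercept, approximately 2.08

## Predicted time (minutes) and speed for a few illustrative distances (km)
dist <- c(0.1, 1, 10, 100)
pred_time <- multiplier * dist^b
speed <- dist / pred_time      # km per minute; equals dist^(1 - b) / multiplier

data.frame(dist,
           minutes = round(pred_time, 2),
           km_per_min = round(speed, 3),
           min_per_km = round(1 / speed, 3))
```

Since 1 - b is negative, speed falls steadily as distance increases, while minutes per kilometer rise.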
There is no difference that can be detected visually between the track races and the road races. Careful analysis will in fact find no difference.
5.2.1 Summary information from model objects
In order to avoid recalculation of the model information each time that some different information is required, we store the result from the lm() calculation in the model object worldrec.lm. (The name worldrec.lm is used to indicate that this is an lm object, with data from worldRecords. Use any name that seems helpful!)
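The sketch below shows several kinds of information being pulled from one stored model object without refitting. Because it is meant to run without the worldRecords data, it uses synthetic data generated from the fitted equation plus a little noise. The data frame sim and the object sim.lm are hypothetical names, and the distance range is chosen only so that the longest:shortest ratio is roughly 3000:1.

```r
## Illustrative only: synthetic data standing in for worldRecords
set.seed(1)
sim <- data.frame(Distance = exp(seq(log(0.1), log(290), length.out = 40)))
sim$Time <- 2.08 * sim$Distance^1.1248 * exp(rnorm(40, sd = 0.05))

## Fit once and store the result in a model object
sim.lm <- lm(log(Time) ~ log(Distance), data = sim)

## Each extractor works from the stored object; nothing is refitted
coef(sim.lm)                 # intercept and slope
head(fitted(sim.lm))         # fitted values of log(Time)
head(resid(sim.lm))          # residuals, on the log scale
summary(sim.lm)$r.squared    # one component of the summary information
```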
## Plot points; add line
plot(log(Time) ~ log(Distance),
     data=worldRecords)
abline(worldrec.lm)
Note that the function abline() can be used with the model
object as argument to add a line to the plot of log(Time) against
log(Distance).
Diagnostic plots
Insight into the adequacy of the line can be obtained by examining
the “diagnostic” plots, obtained by “plotting” the model object.
Figure 5.5 shows the first and last of the default plots:
## Code
plot(worldrec.lm, which=c(1,5),
     sub.caption=rep("", 2))
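The two panels requested by which=c(1,5) are built from quantities that can also be extracted directly from the model object: which=1 plots residuals against fitted values, and which=5 plots standardized residuals against leverage, with contours of Cook's distance. The sketch below computes those quantities. As before, it uses synthetic data so that it runs on its own; sim and fit are hypothetical names.

```r
## Illustrative sketch on synthetic data
set.seed(2)
sim <- data.frame(Distance = exp(seq(log(0.1), log(290), length.out = 40)))
sim$Time <- 2.08 * sim$Distance^1.1248 * exp(rnorm(40, sd = 0.05))
fit <- lm(log(Time) ~ log(Distance), data = sim)

## which = 1: residuals against fitted values
res_vs_fit <- data.frame(fitted = fitted(fit), resid = resid(fit))

## which = 5: standardized residuals against leverage (hat values);
## Cook's distance combines the two and is used for the plot's contours
lev <- data.frame(leverage = hatvalues(fit),
                  std_resid = rstandard(fit),
                  cooks = cooks.distance(fit))

## The three points with the largest Cook's distance
head(lev[order(-lev$cooks), ], 3)
```

The hat values always sum to the number of coefficients (here 2), which is a handy check that the leverages have been extracted correctly.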
By default, there are four “diagnostic” plots. Panel A is designed to give an indication of whether the relationship really is linear, or whether there is some further systematic component that should perhaps be modeled. It does show systematic differences from a line. The largest of these is more than a 15% difference.¹ There are mechanisms for using a smooth curve to account for the differences from a line, if these are thought important enough to model.

¹ A difference of 0.05 on a natural-log (log_e) scale translates to a difference of just over 5%. A difference of 0.15 translates to a difference of just over 16%, i.e., slightly more than 15%.
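The conversions in the footnote follow from exponentiating a log_e difference: a difference d on the log_e scale corresponds to a ratio of exp(d), i.e. a percentage difference of 100 × (exp(d) - 1). A short check:

```r
## Convert differences on the log_e scale into percentage differences
d <- c(0.05, 0.15)
pct <- 100 * (exp(d) - 1)
round(pct, 2)          # just over 5% and just over 16%

## Going the other way: a 15% difference on the log_e scale
round(log(1.15), 3)    # about 0.14
```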