Title: | Functions for University of Auckland Course STATS 201/208 Data Analysis |
---|---|
Description: | A set of functions used in teaching STATS 201/208 Data Analysis at the University of Auckland. The functions are designed to make parts of R more accessible to a large undergraduate population who are mostly not statistics majors. |
Authors: | Brant Deppa [aut] (Wrote the original R scripts this package is derived from), James Curran [aut, cre] (Wrote the original R package. Current maintainer.), Rachel Fewster [ctb], Russell Millar [ctb], Ben Stevenson [ctb], Andrew Balemi [ctb], Chris Wild [ctb], Sophie Jones [ctb], Dineika Chandra [ctr], Brendan McArdle [ctr] |
Maintainer: | James Curran <[email protected]> |
License: | GPL-2 | file LICENSE |
Version: | 3.1-40 |
Built: | 2024-10-13 05:03:39 UTC |
Source: | https://github.com/stats-uoa/s20x |
Number of international airline passengers (in thousands) recorded monthly from January 1949 to December 1960.
A time series with 144 observations.
These data come from a classic long-term experiment conducted at the East Malling Research Station, Kent, which is the centre for research into apple growing in the U.K. Commercial apple trees consist of two parts grafted together. The lowest part, the rootstock, largely determines the size of the tree, while the upper part (the scion) determines the fruit characteristics. Rootstocks propagated by cuttings (i.e. asexually produced) were once thought to result in smaller trees than those propagated from seeds (i.e. sexually produced). This hypothesis was re-examined in an experiment begun in 1918. Several trees of each of 16 types of rootstock were planted, all trees having the same scion. Rootstocks I-IX were asexually produced, while X-XVI were sexually produced. In the winter of 1933-4 a number of trees were removed to make room for more, and the data presented here consist of the above-ground weights of the 104 trees felled in this period. No trees of types VIII, XI or XIV were felled. The description is from Lee (Lee, A.J. Data analysis. An introduction based on R. University of Auckland 1994). The data are from Andrews and Herzberg (1985).
The data consist of a data frame with 104 observations on 3 variables.
[,1] | Rootstock | factor | levels (I, II, III, IV, IX, V, VI, VII, X, XII, XIII, XV, XVI) |
[,2] | Weight | integer | . |
[,3] | Propagated | factor | levels (cutting, seed) |
Data from an experiment to measure the effect of different images on emotional arousal, by measuring changes in pupil diameter. The experiment used 20 males and 20 females. Images included a nude man, nude woman, infant, and a landscape.
A data frame with 160 observations on 3 variables.
[,1] | arousal | numeric | Change in the subject's pupil size |
[,2] | gender | factor | Subject's gender (female, male) |
[,3] | picture | factor | Picture shown to subject (infant, landscape, nude female, nude male) |
Plots current vs lagged residuals along with quadrants dividing these residuals about the value zero.
autocorPlot(fit, main = "Current vs Lagged residuals", ...)
fit |
output from the function 'lm()'. |
main |
the plot title. |
... |
extra parameters to be passed to the |
autocor.plot is deprecated in line with our new policy of removing periods from function names.
data(airpass.df)
time = 1:144
airpass.fit = lm(passengers ~ time, data = airpass.df)
autocorPlot(airpass.fit)
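The quadrant idea behind autocorPlot() can be sketched in base R. This is a minimal illustration, not the s20x implementation: it assumes the built-in AirPassengers series as a stand-in for airpass.df (both appear to hold the same monthly airline passenger totals for 1949 to 1960), and the names current, lagged and quadrants are illustrative. Points concentrated in the upper-right and lower-left quadrants suggest positive autocorrelation.

```r
# Sketch only: current vs lagged residuals and quadrant counts.
# Assumption: AirPassengers stands in for airpass.df.
passengers <- as.numeric(AirPassengers)
time <- seq_along(passengers)   # 1, 2, ..., 144
fit <- lm(passengers ~ time)

res <- residuals(fit)
n <- length(res)
current <- res[-1]   # residuals e_2, ..., e_n
lagged  <- res[-n]   # residuals e_1, ..., e_(n-1)

# Count the points in each quadrant formed by the lines at zero
quadrants <- table(current = current > 0, lagged = lagged > 0)
print(quadrants)

# Lag-1 correlation: a clearly positive value means most points lie in
# the (+,+) and (-,-) quadrants, i.e. positively autocorrelated errors
r1 <- cor(current, lagged)
print(round(r1, 3))
```

To draw the plot itself, plot(lagged, current) followed by abline(h = 0, v = 0, lty = 2) gives the basic layout.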
Monthly United States beer production figures (in millions of 31-gallon barrels) for the period July 1970 to June 1978.
A time series with 96 observations.
Data collected to examine how women from various ethnic groups rate their body image. All subjects were slightly underweight for their body size.
A data frame with 246 observations on 8 variables.
[,1] | ethnicity | factor | Subject's ethnicity (Asian, Europn, Maori, Pacific) |
[,2] | married | . | . |
[,3] | bodyim | factor | Subject's rating of themself (slight.uw, right, slight.ow, mod.ow, very.ow) |
[,4] | sm.ever | . | . |
[,5] | weight | . | . |
[,6] | height | . | . |
[,7] | age | . | . |
[,8] | stressgp | . | . |
These data consist of 50 sentence lengths from each of 8 books. The books “Disclosure” and “Rising Sun” were written by Michael Crichton, whilst the others, “Four Past Midnight”, “The Dark Half”, “The Eyes of the Dragon”, “The Shining”, “The Stand” and “The Tommyknockers”, were written by Stephen King. The pages and sentences were chosen using a multistage design in which pages were first selected at random, and then sentences within each page were selected at random. These data were collected by James Curran.
The data frame consists of 400 observations on 2 variables.
[,1] | length | integer | |
[,2] | book | factor | levels (4.Past.Mid, Dark.Half, Disclosure, Eye.Drag, Rising.Sun, Shining, Stand, T.Knock) |
Draws boxplots and normal quantile-quantile plots of x for each value of the grouping variable g
boxqq(formula, ...)
## S3 method for class 'formula'
boxqq(formula, data = NULL, ...)
formula |
A symbolic specification of the form |
... |
Arguments to be passed to methods, such as graphical parameters (see |
data |
An optional data frame in which to evaluate the formula. |
Returns the plot.
boxqq(formula): Box plots and normal quantile-quantile plots
This function is deprecated and will be removed in later versions of the package.
## Zoo data
data(zoo.df)
boxqq(attendance ~ day.type, data = zoo.df)
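Because boxqq() is deprecated, the same two displays can be assembled from base R. This sketch assumes the built-in InsectSprays data (count by spray, 6 groups of 12) as a stand-in for zoo.df. It computes each group's normal Q-Q coordinates with qqnorm(plot.it = FALSE) and only draws the panels in an interactive session.

```r
# Sketch only: per-group boxplots and normal Q-Q plots in base R.
# Assumption: InsectSprays stands in for zoo.df.
data(InsectSprays)
groups <- split(InsectSprays$count, InsectSprays$spray)

# Normal Q-Q coordinates (x = theoretical, y = sample) for each group
qq <- lapply(groups, qqnorm, plot.it = FALSE)

if (interactive()) {
  op <- par(mfrow = c(2, 4))
  boxplot(count ~ spray, data = InsectSprays, main = "Boxplots")
  for (g in names(qq)) {
    plot(qq[[g]], main = paste("Spray", g),
         xlab = "Theoretical quantiles", ylab = "Sample quantiles")
    qqline(groups[[g]])   # reference line through the quartiles
  }
  par(op)
}
```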
Data for the 2001 Bursary results for 75 secondary schools in the Auckland area. For each school the decile rating of the school is recorded along with the percentage of eligible students who gain a B Bursary or better.
A data frame with 75 observations on 2 variables.
[,1] | decile | numeric | Decile rating of the school |
[,2] | pass.rate | numeric | Percentage of eligible students who gained a 'B' Bursary or better |
These data give the mean percentage of butterfat produced by different Canadian pure-bred dairy cattle. There are five different breeds and two age groups: two years old and older than five years. For each combination of breed and age, there are measurements for 10 cows.
A data frame with 100 observations on 3 variables.
[,1] | Butterfat | numeric | Mean percentage of butterfat per cow |
[,2] | Breed | factor | Breed (ayrshire, canadian, guernesy, holst.fres, jersey) |
[,3] | Age | factor | Age group (2yo, mature) |
A Handbook of Small Data Sets
Hand, D.J., Daly, F., Lunn, A.D., McConway, K.J. and Ostrowski, E. (1994). A Handbook of Small Data Sets. Boca Raton, Florida: Chapman and Hall/CRC.
Sokal, R.R. and Rohlf, F.J. (1981). Biometry, 2nd edition. San Francisco: W.H. Freeman, 368.
Sixty-six bluegills were captured from Camp Lake, Minnesota. For each bluegill we have the length of the fish, its age in years, and the radius of its key scale, which is also used as a measure of age.
A data frame with 66 observations on 3 variables.
[,1] | Age | numeric | Age of fish (years) |
[,2] | Scale.Radius | numeric | Age of fish (radius of the key scale, mm/100) |
[,3] | Length | numeric | Length at capture (mm) |
These data involve 11 laboratories and 2 brands of chalk. The laboratories tested the density of the chalk. The main interest was whether the different laboratories yielded the same density for the two different types of chalk.
A data frame with 66 observations on 3 variables.
[,1] | Density | numeric | Density of the chalk |
[,2] | Lab | integer | Laboratory where testing done |
[,3] | Chalk | factor | Chalk tested (A, B) |
Calculates and prints confidence intervals for the coefficients of a fitted linear model.
ciReg(fit, conf.level = 0.95, print.out = TRUE)
fit |
an object of class lm, i.e. the output from |
conf.level |
confidence level of the intervals. |
print.out |
if |
The function returns a two-column matrix containing the upper and lower endpoints of the intervals.
## Peruvian Indians data
data(peru.df)
fit = lm(BP ~ age + years + weight + height, data = peru.df)
ciReg(fit)
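For an lm fit, t-based coefficient intervals take the form estimate +/- t * SE, with t taken on the residual degrees of freedom. This minimal sketch assumes the built-in mtcars data in place of peru.df. It is not confirmed that ciReg() computes its intervals this way, but base R's confint() gives the same t intervals, which the code checks against.

```r
# Sketch only: t-based confidence intervals for lm coefficients.
# Assumption: mtcars stands in for peru.df.
fit <- lm(mpg ~ wt + hp, data = mtcars)
conf.level <- 0.95

est   <- coef(fit)
se    <- sqrt(diag(vcov(fit)))
tcrit <- qt(1 - (1 - conf.level) / 2, df = df.residual(fit))

# Two-column matrix of lower and upper endpoints, one row per coefficient
manual <- cbind(lower = est - tcrit * se, upper = est + tcrit * se)
print(manual)

# Base R's confint() returns the same intervals for lm fits
builtin <- confint(fit, level = conf.level)
```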
Data from a test to see whether a questionnaire was properly designed. The questionnaire measures managers' technical knowledge of computers. In the test, 19 managers completed the questionnaire and also rated their own technical expertise.
A data frame with 19 observations on 2 variables.
[,1] | score | numeric | Questionnaire score |
[,2] | selfassess | ordered factor | Self-assessed level of expertise (1 = low, 2 = medium, 3 = high) |
Draws a Cook's distance plot.
cooks20x(x, main = "Cook's Distance plot", xlab = "observation number", ylab = "Cook's distance", line = c(0.5, 1.2, 2), cex.labels = 1, axisOpts = list(xAxis = TRUE, yAxisTight = FALSE), ...)
x |
an object of class |
main |
the plot title |
xlab |
the x-axis title. |
ylab |
the y-axis title. |
line |
a vector of length 3 controlling the distances of the plot title, the x-axis title and the y-axis title from the axis in line units. |
cex.labels |
a factor controlling the font size of the labels on suspected high influence points. |
axisOpts |
a list of additional arguments that can be used to control the axes. At this point this list only contains one element |
... |
additional arguments are passed to |
Draws the plot and labels the three observations with the largest Cook's distances.
# Peruvian Indians data
data(peru.df)
peru.fit = lm(BP ~ age + years + I(years^2) + weight + height, data = peru.df)
cooks20x(peru.fit)
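Cook's distance for observation i can be written D_i = (r_i^2 / p) * h_i / (1 - h_i), where r_i is the standardized residual, h_i the leverage and p the number of estimated coefficients. The sketch below is not the cooks20x() source and assumes mtcars as illustrative data. It computes D_i by hand, checks the result against stats::cooks.distance(), and picks out the three largest values, which are the ones the plot labels.

```r
# Sketch only: Cook's distance from leverages and standardized residuals.
# Assumption: mtcars is used as illustrative data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

h <- hatvalues(fit)             # leverages h_i
r <- rstandard(fit)             # standardized residuals r_i
p <- length(coef(fit))          # number of estimated coefficients
D <- (r^2 / p) * h / (1 - h)    # Cook's distance by formula

# Indices of the three largest distances, largest first
top3 <- order(D, decreasing = TRUE)[1:3]
print(round(D[top3], 3))
```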
Data from a summer school Stats 20x course. Each observation represents a single student.
A data frame with 146 observations on 15 variables.
[,1] | Grade | factor | Final grade for the course (A, B, C, D) |
[,2] | Pass | factor | Passed the course (No, Yes) |
[,3] | Exam | numeric | Mark in the final exam |
[,4] | Degree | factor | Degree enrolled in (BA, BCom, BSc, Other) |
[,5] | Gender | factor | Gender (Female, Male) |
[,6] | Attend | factor | Regularly attended class (No, Yes) |
[,7] | Assign | numeric | Assignment mark |
[,8] | Test | numeric | Test mark |
[,9] | B | numeric | Mark for the short answer section of the exam |
[,10] | C | numeric | Mark for the long answer section of the exam |
[,11] | MC | numeric | Mark for the multiple choice section of the exam |
[,12] | Colour | factor | Colour of the exam booklet (Blue, Green, Pink, Yellow) |
[,13] | Stage1 | factor | Stage one grade (A, B, C) |
[,14] | Years.Since | numeric | Number of years since doing Stage 1 |
[,15] | Repeat | factor | Repeating the paper (No, Yes) |
Data from a summer school Stats 20x course. Each observation represents a single student. It is of interest to see if there is a relationship between a student's final examination mark and both their gender and whether they regularly attend lectures.
A data frame with 40 observations on 3 variables.
[,1] | Exam | numeric | Final exam mark (out of 100) |
[,2] | Gender | factor | Gender (Female, Male) |
[,3] | Attend | factor | Regularly attended or not (No, Yes) |
Computes a factor that has a level for each combination of the factors 'fac1' and 'fac2'.
crossFactors(x, fac2 = NULL, ...)
## Default S3 method:
crossFactors(x, fac2 = NULL, ...)
## S3 method for class 'formula'
crossFactors(formula, fac2 = NULL, data = NULL, ...)
x |
the name of the first factor or a formula in the form |
fac2 |
the name of the second factor - ignored if |
... |
Optional arguments |
formula |
a formula in the form |
data |
an optional data frame in which to evaluate the formula |
Returns a vector containing the factor which represents the interaction of the given factors.
crossFactors(default): Crossed Factors
crossFactors(formula): Crossed Factors
This function now returns a factor instead of a character string, so coercion into a factor is no longer necessary.
## arousal data:
data(arousal.df)
gender.picture = crossFactors(arousal.df$gender, arousal.df$picture)
gender.picture
## arousal data:
data(arousal.df)
gender.picture = crossFactors(~ gender * picture, data = arousal.df)
gender.picture
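Base R's interaction() builds a comparable crossed factor, with one level per combination of the two factors. This sketch assumes the built-in warpbreaks data (wool: A, B; tension: L, M, H; 9 runs per combination) as a stand-in for arousal.df. crossFactors() may label and order its levels differently.

```r
# Sketch only: a crossed factor via base R's interaction().
# Assumption: warpbreaks stands in for arousal.df.
data(warpbreaks)
wool.tension <- interaction(warpbreaks$wool, warpbreaks$tension, sep = ".")

levels(wool.tension)   # one level per wool/tension combination
table(wool.tension)    # counts per combined level
```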
Produces a 2-way table of counts and the corresponding chi-square test of independence or homogeneity.
crosstabs(formula, data)
formula |
a symbolic description of the model to be fit: ~ fac1 + fac2, where fac1 and fac2 are vectors to be cross-tabulated and treated internally as factors. |
data |
an optional data frame containing the variables in the model. |
An invisible list containing the following components:
row.props |
a matrix of row proportions, i.e. cell counts divided by row marginals. |
col.props |
a matrix of column proportions, i.e. cell counts divided by column marginals. |
Totals |
a matrix containing the cell counts and the marginal totals. |
This function is deprecated and will be removed in future versions of the package.
## Body image data:
data(body.df)
crosstabs(~ ethnicity + married, body.df)
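Since crosstabs() is deprecated, its returned components have direct base R counterparts. This minimal sketch assumes mtcars (vs by am) as illustrative data. Whether crosstabs() applies Yates' continuity correction to 2 x 2 tables, as chisq.test() does by default, is not stated above.

```r
# Sketch only: counts, proportions and a chi-square test in base R.
# Assumption: mtcars (vs by am) is used as illustrative data.
tab <- xtabs(~ vs + am, data = mtcars)

row.props <- prop.table(tab, margin = 1)   # cell counts / row totals
col.props <- prop.table(tab, margin = 2)   # cell counts / column totals
Totals    <- addmargins(tab)               # counts plus marginal totals

# Chi-square test of independence; for a 2 x 2 table chisq.test()
# applies Yates' continuity correction by default
test <- chisq.test(tab)
print(Totals)
print(test)
```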
Prices of ladies' diamond rings from a Singaporean retailer and the weight of their diamond stones.
A data frame with 48 observations on 2 variables.
[,1] | price | numeric | Price of ring (Singapore dollars) |
[,2] | weight | numeric | Weight of diamond (carats) |
Displays within-level pairwise comparisons from a two-way ANOVA with interactions. Note that this is just a display function: it ignores any cross-level pairs included in allpairs, even though these will have contributed to the computations for the Tukey adjustments. The purpose is just to organise the output from emmeans into a more convenient format.
displayPairs(allpairs, levels1, levels2, brief = TRUE, asDF = FALSE)
allpairs |
pairwise output from a command like |
levels1 |
a character string specifying which within-level comparisons from |
levels2 |
a character string specifying which within-level comparisons from |
brief |
either |
asDF |
either |
allpairs is a pairwise output from a command like pairs(emmeans(fit, ~factor1 * factor2)). If allpairs is not already a data.frame, it will be converted to a data.frame within this function. It must contain a column called contrast with text descriptions like 'lev1 lev2 - lev3 lev4', etc.
levels1 and levels2 are character strings specifying which within-level comparisons are wanted, and in which order. They must match the order specified in emmeans, so if using emmeans(fit, ~factor1 * factor2), then levels1 must belong to factor1 and levels2 must belong to factor2. All this function does is pick out the rows of allpairs with the requested contrasts, so if there are no contrasts of the requested format (e.g. because levels1 and levels2 have been switched), it will output a blank list.
If brief = TRUE, the columns labelled df, SE, and t.ratio or z.ratio will be removed for a more succinct display. If asDF = TRUE, the output is returned as a data frame suitable for further manipulation, whereas if asDF = FALSE it is returned as a list for display only.
Rachel Fewster
## Fit a two-way ANOVA to the arousal data in arousal.df.
## The factors are gender (female, male) and picture shown to
## subject (infant, landscape, nude.f, nude.m):
data(arousal.df)
arousal.fit = lm(arousal ~ gender * picture, data = arousal.df)
## Create all pairwise comparisons using emmeans:
require(emmeans)
arousal.allpairs = pairs(emmeans(arousal.fit, ~gender * picture), infer = TRUE)
## Display only the within-level comparisons:
displayPairs(arousal.allpairs, levels1 = c('female', 'male'), levels2 = c('infant', 'landscape', 'nude.f', 'nude.m'))
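The filtering step described above can be sketched without emmeans. The helper classify_contrast() below is hypothetical and is not part of s20x. It assumes labels of the form 'lev1 lev2 - lev3 lev4' with no spaces inside level names; emmeans may add parentheses around level names that contain spaces, and this sketch does not handle that case.

```r
# Hypothetical helper (not part of s20x): classify contrast labels of the
# form "a1 b1 - a2 b2", as produced by pairs(emmeans(fit, ~ f1 * f2)).
classify_contrast <- function(contrast) {
  sides <- strsplit(contrast, " - ", fixed = TRUE)
  vapply(sides, function(s) {
    left  <- strsplit(trimws(s[1]), " ", fixed = TRUE)[[1]]
    right <- strsplit(trimws(s[2]), " ", fixed = TRUE)[[1]]
    if (left[1] == right[1]) {
      "same factor1 level"   # compares factor2 levels within one factor1 level
    } else if (left[2] == right[2]) {
      "same factor2 level"   # compares factor1 levels within one factor2 level
    } else {
      "cross-level"          # both factors differ: dropped from the display
    }
  }, character(1))
}

labels <- c("female infant - male infant",
            "female infant - female landscape",
            "female infant - male landscape")
type <- classify_contrast(labels)
print(data.frame(contrast = labels, type = type))

# Keep only the within-level comparisons
within <- labels[type != "cross-level"]
```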
Plots the residuals versus the fitted (or predicted) values from a linear model. A horizontal line is drawn at y = 0, reflecting the fact that we expect the residuals to have a mean of zero. An optional lowess line is drawn if smoother is set to TRUE; this can be useful in determining whether a trend still exists in the residuals. An optional pair of lines is drawn at +/- 2 times the standard deviation of the residuals, which is estimated from the residual mean square (the within-group mean square, WGMS); this can be useful in highlighting potential outliers. If the model has one or two factors and no continuous variables, i.e. if it is a one-way or two-way ANOVA model, and levene = TRUE, then the P-value from Levene's test for equality of variance is displayed in the top left-hand corner, as long as the number of observations per group exceeds two.
eovcheck(x, ...)
## S3 method for class 'formula'
eovcheck(x, data = NULL, xlab = "Fitted values", ylab = "Residuals", col = NULL, smoother = FALSE, twosd = FALSE, levene = FALSE, ...)
## S3 method for class 'lm'
eovcheck(x, smoother = FALSE, twosd = FALSE, levene = FALSE, ...)
x |
A linear model formula. Alternatively, a fitted lm object from a linear model. |
... |
Optional arguments |
data |
A data frame in which to evaluate the formula. |
xlab |
a title for the x axis: see |
ylab |
a title for the y axis: see |
col |
a color for the lowess smoother line. |
smoother |
if TRUE then a smoothed lowess line will be added to the plot |
twosd |
if |
levene |
if |
eovcheck(formula): Testing for equality of variance plot
eovcheck(lm): Testing for equality of variance plot
# one way ANOVA - oysters
data(oysters.df)
oyster.fit = lm(Oysters ~ Site, data = oysters.df)
eovcheck(oyster.fit)
# Same model as the previous example, but using eovcheck.formula
data(oysters.df)
eovcheck(Oysters ~ Site, data = oysters.df)
# A two-way model without interaction
data(soyabean.df)
soya.fit = lm(yield ~ planttime + cultivar, data = soyabean.df)
eovcheck(soya.fit)
# A two-way model with interaction
data(arousal.df)
arousal.fit = lm(arousal ~ gender * picture, data = arousal.df)
eovcheck(arousal.fit)
# A regression model
data(peru.df)
peru.fit = lm(BP ~ height + weight + age + years, data = peru.df)
eovcheck(peru.fit)
# A time series model
data(airpass.df)
t = 1:144
month = factor(rep(1:12, 12))
airpass.df = data.frame(passengers = airpass.df$passengers, t = t, month = month)
airpass.fit = lm(log(passengers)[-1] ~ t[-1] + month[-1] + log(passengers)[-144], data = airpass.df)
eovcheck(airpass.fit)
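The +/- 2 standard deviation lines come from the residual mean square. This minimal base R sketch assumes the built-in InsectSprays one-way layout as a stand-in for oysters.df. The name flagged is illustrative, and plotting is confined to interactive sessions.

```r
# Sketch only: residuals vs fitted values with +/- 2s reference lines.
# Assumption: InsectSprays stands in for oysters.df.
data(InsectSprays)
fit <- lm(count ~ spray, data = InsectSprays)

res <- residuals(fit)
fv  <- fitted(fit)

# Residual standard deviation from the residual mean square (WGMS)
wgms <- sum(res^2) / df.residual(fit)
s <- sqrt(wgms)

# Observations outside +/- 2s are candidate outliers
flagged <- which(abs(res) > 2 * s)
print(flagged)

if (interactive()) {
  plot(fv, res, xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0)
  abline(h = c(-2, 2) * s, lty = 2)
}
```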
Calculates and prints Tukey multiple confidence intervals for contrasts in one-way or two-way ANOVA.
estimateContrasts(contrast.matrix, fit, row = TRUE, alpha = 0.05, L = NULL, FUN = identity)
contrast.matrix |
a matrix of contrast coefficients. Each row of the matrix contains the coefficients for one contrast, and each column corresponds to a level of the factor. |
fit |
output from the |
row |
if |
alpha |
the nominal error rate for the multiple confidence intervals. |
L |
number of contrasts. If NULL, L will be set to the number of rows in the contrast matrix, otherwise L will be as specified. |
FUN |
optional function to be applied to estimates and confidence intervals. Typically for backtransformation operations. |
Returns a matrix whose rows correspond to the different contrasts being estimated and whose columns correspond to the point estimate of the contrast, the Tukey lower and upper limits of the confidence interval, the unadjusted p-value, the Tukey and Bonferroni p-values.
This function is no longer exported, as it should never be called directly by the user. It will ultimately be removed.
See also: summary1way, summary2way, multipleComp
## computer data:
data(computer.df)
computer.df = within(computer.df, {selfassess = factor(selfassess)})
computer.fit = lm(score ~ selfassess, data = computer.df)
contrast.matrix = matrix(c(-1/2, -1/2, 1), byrow = TRUE, nrow = 1, ncol = 3)
contrast.matrix
s20x:::estimateContrasts(contrast.matrix, computer.fit)
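A contrast estimate is sum(c_i * mean_i), with standard error s * sqrt(sum(c_i^2 / n_i)), where s is the pooled residual standard deviation. This sketch assumes the built-in PlantGrowth data (ctrl, trt1, trt2; 10 plants each) in place of computer.df, using the same coefficients (-1/2, -1/2, 1). It reports an unadjusted t interval; estimateContrasts() is documented as giving Tukey limits, so its intervals will generally differ.

```r
# Sketch only: point estimate, SE and unadjusted t interval for a contrast.
# Assumption: PlantGrowth stands in for computer.df.
data(PlantGrowth)
fit <- lm(weight ~ group, data = PlantGrowth)

cvec  <- c(-1/2, -1/2, 1)   # trt2 versus the average of ctrl and trt1
means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
n     <- tapply(PlantGrowth$weight, PlantGrowth$group, length)
s     <- summary(fit)$sigma  # pooled residual standard deviation

est <- sum(cvec * means)
se  <- s * sqrt(sum(cvec^2 / n))

# Unadjusted 95% t interval (no multiple-comparison adjustment)
tcrit <- qt(0.975, df = df.residual(fit))
ci <- est + c(-1, 1) * tcrit * se
print(c(estimate = est, lower = ci[1], upper = ci[2]))
```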
House damage and distance from the fire station for 15 house fires. Data collected by an insurance company for homes in a particular area.
A data frame with 15 observations on 2 variables.
[,1] | damage | numeric | Damage (1000s of dollars) |
[,2] | distance | numeric | Distance from the fire station (miles) |
If hypothprob is absent: prints confidence intervals for the true proportions, a Chi-square test for uniformity and confidence intervals for differences in proportions (no corrections for multiple comparisons), and plots the proportions.
freq1way(counts, hypothprob, conf.level = 0.95, addCIs = TRUE, digits = 4, arrowwid = 0.1, estimated = 0)
counts |
A 1-way frequency table as produced by |
hypothprob |
If present, a set of probabilities to test the cell counts against. |
conf.level |
confidence level for the confidence interval, expressed as a decimal. |
addCIs |
If true, adds confidence limits to plot of sample proportions. |
digits |
used to control rounding of printout. |
arrowwid |
controls width of arrowheads. |
estimated |
default is |
If hypothprob is present: prints confidence intervals for the true proportions and a Chi-square test for the hypothesized probabilities, and plots the sample proportions (with attached confidence limits) alongside the corresponding hypothesized probabilities.
An invisible list containing the following components:
CIs |
a matrix containing the confidence intervals. |
exp |
a vector of the expected counts. |
chi |
a vector of the components of Chi-square. |
These confidence intervals have been Bonferroni adjusted for multiple comparisons. This function has been deprecated and will be removed from future versions of the package.
## Body image data:
data(body.df)
eth.table = with(body.df, table(ethnicity))
freq1way(eth.table)
freq1way(eth.table, hypothprob = c(0.2, 0.4, 0.3, 0.1))
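The main computations in freq1way() can be approximated with chisq.test(). This sketch assumes the mtcars cylinder counts as a stand-in for eth.table. The Wald intervals shown are unadjusted; replacing qnorm(0.975) with qnorm(1 - 0.05 / (2 * k)) for k cells would give one Bonferroni-style adjustment, but the interval method freq1way() actually uses is not stated above.

```r
# Sketch only: one-way frequency table tests and simple intervals.
# Assumption: mtcars cylinder counts stand in for eth.table.
counts <- table(mtcars$cyl)
n <- sum(counts)
p.hat <- as.vector(counts) / n

# Chi-square test of uniformity (default when no probabilities are given)
uniform <- chisq.test(counts)

# Chi-square test against hypothesized probabilities
hyp <- c(0.3, 0.2, 0.5)
hyp.test <- chisq.test(counts, p = hyp)
expected <- hyp.test$expected
components <- (as.vector(counts) - expected)^2 / expected  # cell contributions

# Unadjusted Wald 95% intervals for each true proportion
z <- qnorm(0.975)
half <- z * sqrt(p.hat * (1 - p.hat) / n)
cis <- cbind(lower = p.hat - half, upper = p.hat + half)
rownames(cis) <- names(counts)
print(cis)
```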
These data give the fecundity of female fruit flies, Drosophila melanogaster. The fecundity is the number of eggs laid per day over the first 14 days of the fruit fly's life. There are three strains: a control group, NS (nonselected strain); RS, a strain bred for resistance to DDT; and SS, a strain bred for susceptibility to DDT. Each strain contains 25 measurements. It is of interest to compare the level of fecundity across strains.
A data frame with 75 observations on 2 variables.
[,1] | fecundity | numeric | Number of eggs laid, per day, per fruitfly |
[,2] | strain | factor | Strain of fruitfly (NS, RS, SS) |
A Handbook of Small Data Sets
Hand, D.J., Daly, F., Lunn, A.D., McConway, K.J. and Ostrowski, E. (1994). A Handbook of Small Data Sets. Boca Raton, Florida: Chapman and Hall/CRC.
Sokal, R.R. and Rohlf, F.J. (1981). Biometry, 2nd edition. San Francisco: W.H. Freeman, 239.
Returns the version number of the s20x package. This is useful if a student has problems running commands and the maintainer needs to check the version number.
getVersion()
getVersion()
A random sample of 100 houses recently sold in Mt Eden, Auckland. For each house we have the advertised price and the actual sale price.
A data frame with 100 observations on 2 variables.
[,1] | advertised.price | numeric | Advertised price (dollars) |
[,2] | sell.price | numeric | Final sale price (dollars) |
Random sample of 152 families giving their mean income (1000s of dollars). The sample was taken by an advertising agency over their area of operations.
Displays data with intervals for each combination of the two factors and shows the mean differences between levels of the first factor for each level of the second factor. Note that there should be more than one observation for each combination of factors.
interactionPlots(y, ...)
## Default S3 method:
interactionPlots(
  y, fac1 = NULL, fac2 = NULL, xlab = NULL, xlab2 = NULL, ylab = NULL,
  data.order = TRUE, exlim = 0.1, jitter = 0.02, conf.level = 0.95,
  interval.type = c("tukey", "hsd", "lsd", "ci"), pooled = TRUE,
  tick.length = 0.1, interval.distance = 0.2, col.width = 2/3,
  xlab.distance = 0.1, xlen = 1.5, ylen = 1, ...
)
## S3 method for class 'formula'
interactionPlots(
  y, data = NULL, xlab = NULL, xlab2 = NULL, ylab = NULL,
  data.order = TRUE, exlim = 0.1, jitter = 0.02, conf.level = 0.95,
  interval.type = c("tukey", "hsd", "lsd", "ci"), pooled = TRUE,
  tick.length = 0.1, interval.distance = 0.2, col.width = 2/3,
  xlab.distance = 0.1, xlen = 1.5, ylen = 1, ...
)
y |
either a formula of the form: y~fac1+fac2 where y is the response and fac1 and fac2 are the two explanatory variables used as factors, or a single response vector |
... |
optional arguments. |
fac1 |
if 'y' is a vector, then fac1 contains the levels of factor 1 which correspond to the y value |
fac2 |
if 'y' is a vector, then fac2 contains the levels of factor 2 which correspond to the y value |
xlab |
an optional label for the x-axis. If not specified the name of fac1 will be used. |
xlab2 |
an optional label for the lines. If not specified the name of fac2 will be used. |
ylab |
An optional label for the y-axis. If not specified the name of y will be used. |
data.order |
if TRUE the levels of fac1 and fac2 will be set to unique(fac1) and unique(fac2) respectively. |
exlim |
provide extra limits. |
jitter |
the amount of horizontal jitter to show in the plot. The actual jitter is determined as the function is called, and will likely be different each time the function is used. |
conf.level |
confidence level of the intervals. |
interval.type |
four options for intervals appearing on plot: 'tukey', 'hsd', 'lsd' or 'ci'. |
pooled |
if TRUE (the default), a pooled standard deviation is used for the plotted intervals; otherwise unpooled standard deviations are used. |
tick.length |
size of tick, in inches. |
interval.distance |
distance, as a fraction of the column width, between the points and interval. This is in addition to the extra space allocated for the jitter. |
col.width |
width of a factor ‘column’, as a fraction of the space between the centres of two columns. |
xlab.distance |
distance of x-axis labels from bottom of plot, as a fraction of the overall height of the plot. |
xlen, ylen |
character interspacing factor for horizontal (x) and vertical (y) spacing of the legend. |
data |
an optional data frame containing the variables in the model. |
interactionPlots(default): Interactions Plot for Two-way Analysis of Variance
interactionPlots(formula): Interactions Plot for Two-way Analysis of Variance
data(mtcars)
interactionPlots(wt ~ vs + gear, mtcars)
## note this usage is deprecated
data(mtcars)
with(mtcars, interactionPlots(wt, vs, gear))
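The quantities on an interaction plot are the cell means and, for each level of the second factor, the difference between levels of the first. This sketch assumes the balanced warpbreaks data (9 runs per cell), which satisfies the note above about having more than one observation per combination. The interval half-width uses a pooled residual SD with a t multiplier; this is one plausible reading of interval.type = "ci" with pooled = TRUE, not a confirmed reproduction of interactionPlots().

```r
# Sketch only: cell means, first-factor differences and simple intervals.
# Assumption: warpbreaks (breaks ~ wool + tension) is illustrative data.
data(warpbreaks)
cell.means <- with(warpbreaks, tapply(breaks, list(wool, tension), mean))
print(round(cell.means, 2))

# Difference between levels of the first factor (wool A - B)
# at each level of the second factor (tension)
wool.diff <- cell.means["A", ] - cell.means["B", ]
print(round(wool.diff, 2))

# Assumed interval: mean +/- t * s / sqrt(n), with s the pooled residual
# SD from the two-way model with interaction
fit <- lm(breaks ~ wool * tension, data = warpbreaks)
s <- summary(fit)$sigma
n.cell <- with(warpbreaks, table(wool, tension))
half.width <- qt(0.975, df.residual(fit)) * s / sqrt(n.cell)
```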
The ages and lengths of 78 bluegills captured from Lake Mary, Minnesota.
A data frame with 78 observations on 2 variables.
[,1] | Age | numeric | Age of the fish (years) |
[,2] | Length | numeric | Length at capture (mm) |
Annual rainfall (in inches) for Los Angeles from 1908 to 1973.
A time series with 66 observations.
Allows a numRows by numCols matrix of plots to be displayed in a single plot. If the function is called with no arguments, the plotting device layout is reset to a single plot.
layout20x(numRows = 1, numCols = 1)
numRows |
number of rows in plot array |
numCols |
number of columns in plot array |
Function returns no value
This function is deprecated. It will be removed in future versions of the package.
data(course.df)
layout20x(1, 2)
stripchart(course.df$Exam)
boxplot(course.df$Exam)
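Since layout20x() is deprecated, base R's par(mfrow = ...) gives the same row-by-column arrangement. This minimal sketch uses mtcars as an assumed stand-in for course.df. Saving the old settings and restoring them afterwards takes the place of calling layout20x() with no arguments.

```r
# Sketch only: a 1 x 2 plot layout with base graphics.
# Assumption: mtcars$mpg stands in for course.df$Exam.
op <- par(mfrow = c(1, 2))   # 1 row by 2 columns of plots
stripchart(mtcars$mpg, xlab = "mpg")
boxplot(mtcars$mpg, ylab = "mpg")
par(op)                      # restore the previous (single-plot) layout
```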
Perform a Levene test for equal group variances in both one-way and two-way ANOVA. A table with the results is (normally) displayed.
levene.test(formula, data, digit = 5, show.table = TRUE)
formula |
a symbolic description of the model to be fitted: response ~ fac1 + fac2. |
data |
an optional data frame containing the variables in the model. |
digit |
the number of decimal places to display. |
show.table |
if FALSE, the output is suppressed. |
A list with the following elements:
df |
degrees of freedom. |
ss |
sums of squares. |
ms |
mean squares. |
f.value |
F-statistic value. |
p.value |
P-value. |
## data(computer.df)
## levene.test(score ~ factor(selfassess), computer.df)
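In its classic form, Levene's test is a one-way ANOVA on the absolute deviations of each observation from its group centre. The sketch below shows that idea in base R using the built-in PlantGrowth data. It assumes the centre is the group mean. Some variants, such as Brown-Forsythe, use the median instead. levene_sketch is a hypothetical name. This is not the code behind levene.test, and it covers only the one-way case. The returned element names mirror the Value section above.

```r
# Hypothetical sketch of a one-way Levene test (mean-centred variant assumed).
levene_sketch <- function(y, g) {
  g <- factor(g)
  z <- abs(y - ave(y, g, FUN = mean))   # absolute deviation from own group mean
  tab <- anova(lm(z ~ g))               # one-way ANOVA on those deviations
  list(df = tab$Df, ss = tab$`Sum Sq`, ms = tab$`Mean Sq`,
       f.value = tab$`F value`[1], p.value = tab$`Pr(>F)`[1])
}

# Built-in PlantGrowth data: dried plant weight for three treatment groups
res <- levene_sketch(PlantGrowth$weight, PlantGrowth$group)
res
```

A small P-value suggests that the group variances differ.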
Prices and ages of 124 Mazda cars collected from the Melbourne Age newspaper in 1991.
A data frame with 124 observations on 2 variables.
[,1] | price | numeric | Price (Australian dollars) |
[,2] | year | numeric | Year of manufacture |
These data show the monthly number of notifications of meningococcal disease in New Zealand from January 1990 to December 2001.
A data frame with 144 observations on 3 variables: Month, Year and mening.
A random selection of 38 consummated mergers from the USA, 1982, giving the number of days between the date the merger was announced and the date the merger became effective.
Draws four model checking plots: a pred-res plot (residuals against predicted values), a normal quantile-quantile (Q-Q) plot, a histogram of the residuals with a normal distribution superimposed, and a Cook's distance plot.
modcheck(x, ...)
## S3 method for class 'lm'
modcheck(x, plotOrder = 1:4, args = list(eovcheck = list(smoother = FALSE, twosd = FALSE, levene = FALSE, ...), normcheck = list(xlab = c("Theoretical Quantiles", ""), ylab = c("Sample Quantiles", ""), main = c("", ""), col = "light blue", bootstrap = FALSE, B = 5, bpch = 3, bcol = "lightgrey", shapiro.wilk = FALSE, whichPlot = 1:2, usePar = TRUE, ...), cooks20x = list(main = "Cook's Distance plot", xlab = "observation number", ylab = "Cook's distance", line = c(0.5, 0.1, 2), cex.labels = 1, axisOpts = list(xAxis = TRUE), ...)), parVals = list(mfrow = c(2, 2), xaxs = "r", yaxs = "r", pty = "s", mai = c(0.2, 0.2, 0.05, 0.05)), ...)
x |
a vector of observations, or the residuals from fitting a linear model. Alternatively, a fitted |
plotOrder |
the order of the plots. 1: pred-res plot, 2: normal Q-Q plot, 3: histogram, 4: Cook's distance plot. |
args |
a list containing three additional lists |
parVals |
the values that are set via |
... |
additional parameters. Included for future flexibility; not currently used. |
modcheck(lm)
: Model checking plots
# An exponential growth curve
e = rnorm(100, 0, 0.1)
x = rnorm(100)
y = exp(5 + 3 * x + e)
fit = lm(y ~ x, data = data.frame(x, y))
modcheck(fit)
# An exponential growth curve with the correct transformation
fit = lm(log(y) ~ x, data = data.frame(x, y))
modcheck(fit)
# Peruvian Indians data
data(peru.df)
modcheck(lm(BP ~ weight, data = peru.df))
Model checking plots
Compact layout for model checking plots.
modelcheck(x, ...)
## S3 method for class 'lm'
modelcheck(x, which = 1:3, mar = c(3, 4, 1.5, 4), ...)
x |
The fitted model. |
which |
The plot(s) to be drawn. Residuals vs fitted values (
|
mar |
Margins applied to each selected plot. |
... |
any other arguments to pass to |
modelcheck(lm)
: Model checking plots
x = 1:30
y = rnorm(30)
lm.fit = lm(y ~ x)
# Plot resids vs fitted only
modelcheck(lm.fit, 1)
# Plot resids vs fitted, and histogram and QQ plot
modelcheck(lm.fit, 1:2)
# Plot all
modelcheck(lm.fit)
Length of movements from 11 of Mozart's early symphonies and 11 of his late symphonies.
A data frame with 88 observations on 3 variables.
[,1] | Time | numeric | Time of each movement (seconds) |
[,2] | Movement | factor | Movement (M1, M2, M3, M4) |
[,3] | Period | factor | Period that the symphony was written (early, late) |
Calculates and prints, for all possible pairwise differences in means in a one-way ANOVA, the estimate, 95% confidence intervals, and the unadjusted, Tukey and Bonferroni p-values.
multipleComp(fit, conf.level = 0.95, FUN = identity)
fit |
output from the command 'lm()'. |
conf.level |
confidence level for the confidence intervals, expressed as a proportion (e.g. 0.95). |
FUN |
optional function to be applied to estimates and confidence intervals. Typically for backtransformation operations. |
Returns a list of estimates, confidence intervals and p-values.
## computer data
data(computer.df)
fit = lm(score ~ factor(selfassess), data = computer.df)
multipleComp(fit)
## butterfat data
data("butterfat.df")
fit <- lm(log(Butterfat) ~ Breed, data = butterfat.df)
multipleComp(fit, FUN = exp)
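The quantities that multipleComp reports can be approximated with base R alone. TukeyHSD gives Tukey-adjusted intervals and p-values. pairwise.t.test gives unadjusted and Bonferroni-adjusted p-values, using a pooled standard deviation by default. The sketch below uses the built-in PlantGrowth data. It is not the package's implementation, so the numbers need not match multipleComp exactly.

```r
# Base-R sketch of all pairwise comparisons after a one-way ANOVA.
fit <- aov(weight ~ group, data = PlantGrowth)

# Tukey: difference in means, simultaneous interval, adjusted p-value
tukey <- TukeyHSD(fit, conf.level = 0.95)$group     # columns: diff, lwr, upr, p adj

# Unadjusted and Bonferroni-adjusted p-values (pooled SD by default)
unadj <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                         p.adjust.method = "none")$p.value
bonf  <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                         p.adjust.method = "bonferroni")$p.value

print(round(tukey, 4))
print(unadj)
print(bonf)
```

For a model fitted to a log-transformed response, applying exp() to the diff, lwr and upr columns turns differences into multiplicative effects. This is the kind of back-transformation the FUN = exp argument is meant for.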
These data were collected to determine whether quick-drying nail polish or regular nail polish dried faster. The time for each type of nail polish to dry was recorded.
A data frame with 60 observations on 2 variables.
[,1] | polish | factor | Type of polish (Regular, Quick) |
[,2] | dry | integer | Time (in seconds) for the polish to dry |
Draws two plots side by side. The first is a normal Q-Q plot of the residuals, with a line whose intercept is the mean of the residuals and whose slope is the standard deviation of the residuals. If shapiro.wilk = TRUE, the P-value from the Shapiro-Wilk test for normality is given in the top left-hand corner of the Q-Q plot. The second is a histogram of the residuals, with a fitted normal distribution superimposed. NOTE: to leave the x-axis of the histogram blank, use xlab = c("Theoretical Quantiles", " "), i.e. leave a space between the quotes. If you do not leave a space, information will be extracted from x.
normcheck(x, ...)
## Default S3 method:
normcheck(x, xlab = c("Theoretical Quantiles", ""), ylab = c("Sample Quantiles", ""), main = c("", ""), col = "light blue", bootstrap = FALSE, B = 5, bpch = 3, bcol = "lightgrey", shapiro.wilk = FALSE, whichPlot = 1:2, usePar = TRUE, ...)
## S3 method for class 'lm'
normcheck(x, xlab = c("Theoretical Quantiles", ""), ylab = c("Sample Quantiles", ""), main = c("", ""), col = "light blue", bootstrap = FALSE, B = 5, bpch = 3, bcol = "lightgrey", shapiro.wilk = FALSE, whichPlot = 1:2, usePar = TRUE, ...)
x |
the residuals from fitting a linear model. Alternatively, a fitted |
... |
additional arguments which are passed to both |
xlab |
a title for the x-axis of both the Q-Q plot and the histogram: see |
ylab |
a title for the y-axis of both the Q-Q plot and the histogram: see |
main |
a title for both the Q-Q plot and the histogram: see |
col |
a color for the bars of the histogram. |
bootstrap |
if |
B |
the number of bootstrap samples to take. Five should be sufficient, but more can be used if desired. |
bpch |
the plotting symbol used for the bootstrap samples. Legal values are the same as any legal value for |
bcol |
the plotting colour used for the bootstrap samples. Legal values are the same as any legal value for |
shapiro.wilk |
if |
whichPlot |
legal values are |
usePar |
if |
normcheck(default)
: Testing for normality plot
normcheck(lm)
: Testing for normality plot
# An exponential growth curve
e = rnorm(100, 0, 0.1)
x = rnorm(100)
y = exp(5 + 3 * x + e)
fit = lm(y ~ x)
normcheck(fit)
# An exponential growth curve with the correct transformation
fit = lm(log(y) ~ x)
normcheck(fit)
# Same example as above except we use normcheck.default
normcheck(residuals(fit))
# Peruvian Indians data
data(peru.df)
normcheck(lm(BP ~ weight, data = peru.df))
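Drawing the two normcheck panels directly in base R can help clarify what the reference line and the superimposed curve represent. This is an illustrative sketch, not the package code. It uses qqnorm, abline, shapiro.test, hist and curve. It simulates data in the same way as the exponential growth example, and it draws to a null device (pdf(NULL)) so that it runs without a screen.

```r
# Base-R sketch of normcheck()'s two panels (illustrative, not the package code)
set.seed(201)
u <- rnorm(100)
y <- exp(5 + 3 * u + rnorm(100, 0, 0.1))   # exponential growth, as in the example
fit <- lm(log(y) ~ u)                       # model with the correct transformation
r <- residuals(fit)

sw <- shapiro.test(r)$p.value               # the value reported when shapiro.wilk = TRUE

pdf(NULL)                                   # null device; omit when plotting on screen
op <- par(mfrow = c(1, 2))
# Panel 1: normal Q-Q plot; reference line has intercept mean(r) and slope sd(r)
qqnorm(r, main = "")
abline(a = mean(r), b = sd(r))
legend("topleft", legend = paste("Shapiro-Wilk P =", format.pval(sw, digits = 3)),
       bty = "n")
# Panel 2: histogram on the density scale with a fitted normal curve
hist(r, freq = FALSE, col = "lightblue", main = "", xlab = "Residuals")
curve(dnorm(x, mean = mean(r), sd = sd(r)), add = TRUE)
par(op)
invisible(dev.off())
```

If the residuals are roughly normal, the points in the first panel lie close to the line, and the histogram follows the curve.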
Displays a stripplot or boxplot of the response variable, with intervals, for each factor level. It is used as part of a one-way ANOVA analysis.
onewayPlot(x, ...)
## Default S3 method:
onewayPlot(x, f, conf.level = 0.95, interval.type = "tukey", pooled = TRUE, strip = TRUE, vert = TRUE, verbose = FALSE, ylabel = deparse(terms(formula)[[2]]), flabel = deparse(terms(formula)[[3]]), ...)
## S3 method for class 'formula'
onewayPlot(formula, data = parent.frame(), conf.level = 0.95, interval.type = "tukey", pooled = TRUE, strip = TRUE, vert = TRUE, verbose = FALSE, ylabel = deparse(terms(formula)[[2]]), flabel = deparse(terms(formula)[[3]]), ...)
## S3 method for class 'lm'
onewayPlot(x, ..., ylabel = nms[1], flabel = nms[2])
x |
a vector of responses, a formula object or an lm object |
... |
optional arguments. |
f |
if x is a vector of responses then f contains the group labels for each observation in x. That is, the ith value in f says which group the ith observation of x belongs to. |
conf.level |
confidence level of the intervals. |
interval.type |
three options for intervals appearing on plot: 'hsd','lsd' or 'ci'. |
pooled |
if TRUE (the default), a pooled standard deviation is used for the plotted intervals; if FALSE, an unpooled one is used. |
strip |
if FALSE, boxplots are displayed instead of stripplots. |
vert |
if FALSE, horizontal stripplots are displayed instead (boxplots can only be displayed vertically). |
verbose |
if TRUE, print the intervals to the console. |
ylabel |
can be used to replace variable name of y by another string. |
flabel |
can be used to replace variable name of f by another string. |
formula |
a symbolic description of the model to be fit. |
data |
an optional data frame in which to evaluate the formula. |
onewayPlot(default)
: One-way Analysis of Variance Plot
onewayPlot(formula)
: One-way Analysis of Variance Plot
onewayPlot(lm)
: One-way Analysis of Variance Plot
## see example in 'summary1way'
## computer data:
data(computer.df)
onewayPlot(score ~ selfassess, data = computer.df)
## apple data:
data(apples.df)
twosampPlot(Weight ~ Propagated, data = apples.df)
## oyster data:
data(oysters.df)
onewayPlot(log(Oysters) ~ Site, data = oysters.df)
## oyster data:
data(oysters.df)
oyster.fit = lm(log(Oysters) ~ Site, data = oysters.df)
onewayPlot(oyster.fit)
Data from an experiment to determine the abundance of oysters recruiting at three sites in two estuaries in New South Wales: one site in Georges River and two in Port Stephens. The number of oysters was recorded on 10 cm by 10 cm panels over a two-year period.
A data frame with 87 observations on 2 variables.
[,1] | Oysters | numeric | Number of oysters on each experimental panel |
[,2] | Site | factor | Location of the experimental panels (GR = Georges River, PS1 = First Port Stephens Site, PS2 = Second Port Stephens Site) |
Plots pairwise scatter plots with histograms and correlations for the data frame.
pairs20x(x, na.rm = TRUE, ...)
x |
a data frame. |
na.rm |
if TRUE then only complete cases will be displayed |
... |
optional arguments which are passed to the generic pairs function. |
Returns the plots.
'pairs', 'panel.smooth', 'panel.cor', 'panel.hist'
## Peruvian Indians data
data(peru.df)
pairs20x(peru.df)
A random sample of Peruvian Indians born in the Andes mountains who have since migrated to lower altitudes. The sample was collected to assess the long-term effects of altitude on blood pressure.
A data frame with 39 observations on 5 variables.
[,1] | age | numeric | Subject's age |
[,2] | years | numeric | Number of years since migration |
[,3] | weight | numeric | Subject's weight (kg) |
[,4] | height | numeric | Subject's height (mm) |
[,5] | BP | numeric | Subject's systolic blood pressure (mm Hg) |
Uses the main output and some error messages from the R function 'predict', but gives more output. (Error messages are not reliable when used in S-PLUS.)
predict20x(object, newdata, cilevel = 0.95, digit = 3, print.out = TRUE, ...)
object |
an |
newdata |
prediction data frame. |
cilevel |
confidence level of the interval. |
digit |
the number of decimal places to display. |
print.out |
if |
... |
optional arguments that are passed to the generic 'predict' |
Note: The data frame, newdata, must have the same column order and data types (e.g. numeric or factor) as those used in fitting the model.
frame |
vector or matrix including predicted values, confidence intervals and prediction intervals. |
fit |
prediction values. |
se.fit |
standard error of predictions. |
residual.scale |
residual standard deviations. |
df |
residual degrees of freedom. |
cilevel |
confidence level of the interval. |
This function is deprecated. It will be removed in future versions of the package.
It is deprecated because it is no longer used in class; we prefer the standard predict method.
predict, predict.lm, as.data.frame.
# Zoo data
data(zoo.df)
zoo.df = within(zoo.df, {day.type = factor(day.type)})
zoo.fit = lm(log(attendance) ~ time + sun.yesterday + nice.day + day.type + tv.ads, data = zoo.df)
pred.zoo = data.frame(time = 8, sun.yesterday = 10.8, nice.day = 0, day.type = factor(3), tv.ads = 1.181)
predict20x(zoo.fit, pred.zoo)
# Peruvian Indians data
data(peru.df)
peru.fit = lm(BP ~ age + years + I(years^2) + weight + height, data = peru.df)
pred.peru = data.frame(age = 21, years = 2, `I(years^2)` = 4, weight = 71, height = 1629)
predict20x(peru.fit, pred.peru)
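predict20x is deprecated in favour of the standard predict method. The sketch below shows how the interval and level arguments of predict.lm give the confidence and prediction intervals that predict20x reported. It uses the built-in cars data rather than zoo.df or peru.df, so it runs without the package installed. The layout of the output data frame is an illustrative choice.

```r
# Confidence and prediction intervals from the standard predict() method
fit <- lm(dist ~ speed, data = cars)        # built-in stopping-distance data
new <- data.frame(speed = c(10, 20))        # same variable name and type as in the fit

ci  <- predict(fit, newdata = new, interval = "confidence", level = 0.95)
pri <- predict(fit, newdata = new, interval = "prediction", level = 0.95)

out <- data.frame(speed = new$speed, fit = ci[, "fit"],
                  ci.lwr = ci[, "lwr"], ci.upr = ci[, "upr"],
                  pi.lwr = pri[, "lwr"], pi.upr = pri[, "upr"])
out
```

Each prediction interval is wider than the corresponding confidence interval. The prediction interval covers a single new observation, whereas the confidence interval covers only the mean response.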
Uses the main output and some error messages from the R function 'predict', but gives more output. (Error messages are not reliable when used in S-PLUS.)
predictCount(object, newdata, cilevel = 0.95, digit = 3, print.out = TRUE, ...)
object |
a |
newdata |
prediction data frame. |
cilevel |
confidence level of the interval. |
digit |
the number of decimal places to display. |
print.out |
if |
... |
optional arguments that are passed to the generic |
Note: The data frame, newdata, must have the same column order and data types (e.g. numeric or factor) as those used in fitting the model.
A data frame with three columns:
the predicted count.
the lower bound of the predicted count.
the upper bound of the predicted count.
predict, predict.glm, as.data.frame.
An alternative to predictCount that handles binomial as well as Poisson models.
predictGLM(object, newdata, type = "link", cilevel = 0.95, quasit = FALSE, ...)
object |
a |
newdata |
prediction data frame. |
type |
|
cilevel |
confidence level of the interval. |
quasit |
if |
... |
optional arguments that are passed to the generic |
Note: The data frame, newdata, must have the same column order and data types (e.g. numeric or factor) as those used in fitting the model.
A data frame with three columns:
the predicted count.
the lower bound of the predicted count.
the upper bound of the predicted count.
predict, predict.glm, as.data.frame.
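A common way to get intervals for a GLM prediction is to build a Wald interval on the link scale and then map its endpoints through the inverse link. This keeps the interval inside the valid range: positive values for Poisson means, and probabilities between 0 and 1 for binomial models. The sketch below does this for a simulated Poisson model. Two points are assumptions not confirmed by this page: whether predictCount and predictGLM use exactly this construction, and how the quasit argument changes it.

```r
# Sketch: link-scale Wald interval, back-transformed with the inverse link
set.seed(208)
x <- runif(50, 0, 2)
y <- rpois(50, lambda = exp(1 + 0.8 * x))       # simulated counts
fit <- glm(y ~ x, family = poisson)

new <- data.frame(x = c(0.5, 1.5))
p <- predict(fit, newdata = new, type = "link", se.fit = TRUE)
cilevel <- 0.95
z <- qnorm(1 - (1 - cilevel) / 2)               # about 1.96 for a 95% interval
inv <- family(fit)$linkinv                      # inverse link: exp() for Poisson
out <- data.frame(x = new$x,
                  fit = inv(p$fit),
                  lower = inv(p$fit - z * p$se.fit),
                  upper = inv(p$fit + z * p$se.fit))
out
```

For a binomial fit, family(fit)$linkinv is the inverse logit, so the same code yields probability intervals.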
This function is called by rowdistr.
propslsd.new(crosstablist, conf.level = 0.95, arrowlength = 0.1)
crosstablist |
A list produced by |
conf.level |
Confidence level of the intervals. |
arrowlength |
Length of the arrows. |
This function is no longer exported as it should never be called by the user. It is also deprecated and will be removed from future versions of the package.
Data from an experiment to see whether seeding clouds with silver nitrate affects the amount of rainfall.
A data frame with 50 observations on 2 variables.
[,1] | rain | numeric | Amount of rain |
[,2] | seed | factor | Whether the clouds are seeded or not (seeded, unseeded) |
Chambers, Cleveland, Kleiner, Tukey. (1983). Graphical Methods for Data Analysis.
Plots the residuals against the fitted values from the linear model lmfit. A lowess smooth showing the underlying trend is added to the plot, together with bounds one standard deviation either side of it that show the scatter about the trend. A test for a quadratic relationship between the residuals and the fitted values is also computed.
residPlot(lmfit, f = 0.5)
lmfit |
an |
f |
the smoother span. This gives the proportion of points in the plot which influence the smooth at each value. Larger values give more smoothness. |
Returns the plot.
This function is deprecated. It will be removed in future versions of the package.
# Peruvian Indians data
data(peru.df)
fit = lm(BP ~ age + years + weight + height, data = peru.df)
residPlot(fit)
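Because residPlot is deprecated, here is a base-R sketch of the plot it describes. It shows residuals against fitted values, a lowess smooth with span f, and bands one standard deviation (of the residuals about the smooth) either side of the smooth. It also includes a quadratic check. The form of that test is an assumption about what residPlot computes: a t-test on the squared fitted-values term in a regression of the residuals on the fitted values. The built-in cars data stand in for peru.df.

```r
# Base-R sketch of a residPlot()-style display (not the package code)
fit <- lm(dist ~ speed, data = cars)
r  <- residuals(fit)
fv <- fitted(fit)

sm <- lowess(fv, r, f = 0.5)                 # smooth of residuals on fitted values
spread <- sd(r[order(fv)] - sm$y)            # scatter of residuals about the smooth

pdf(NULL)                                    # null device; omit when plotting on screen
plot(fv, r, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
lines(sm)                                    # underlying trend
lines(sm$x, sm$y + spread, lty = 3)          # +1 SD band
lines(sm$x, sm$y - spread, lty = 3)          # -1 SD band
invisible(dev.off())

# Assumed form of the quadratic check: curvature in residuals vs fitted values
quad <- lm(r ~ fv + I(fv^2))
p.quad <- summary(quad)$coefficients["I(fv^2)", "Pr(>|t|)"]
p.quad
```

lowess returns its fitted values in order of increasing x. That is why the residuals are reordered with order(fv) before the scatter about the smooth is computed.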
Produces summaries and plots from a cross-tabulation. The output produced depends on the parameter 'comp'. Columns relate to response categories and rows to different populations.
rowdistr( crosstablist, comp = c("basic", "within", "between"), conf.level = 0.95, plot = TRUE, suppressText = FALSE )
crosstablist |
a list produced by 'crosstabs' or a matrix containing a 2-way table of counts (without marginal totals). |
comp |
three options: 'basic' (default), 'within', and 'between'. |
conf.level |
confidence level of the intervals. |
plot |
if |
suppressText |
if |
The 'basic' option (default) produces the response distribution for each row population together with comparative bar charts.
If comp = 'between' the resulting output displays how the probability of falling into a response class (column) differs between populations. Confidence intervals for differences in proportions are produced together with a set of barcharts with LSD intervals.
If comp = 'within' the resulting output shows the extent to which the component probabilities of the same row distribution differ. Separate Chi-square tests for uniformity are produced for each row distribution as are confidence intervals for differences in proportions within the same distribution.
The arguments plot and suppressText are mainly used when producing knitr or Sweave documents, so that just the plot or just the text can be displayed in the document.
A matrix of row proportions, i.e. cell counts divided by row marginal totals.
data(body.df)
z = crosstabs(~ ethnicity + married, data = body.df)
rowdistr(z)
rowdistr(z, comp = 'between')
rowdistr(z, comp = 'within')
## from matrix of counts
z = matrix(c(4,3,2,6,47,20,40,62,11,8,7,22,3,0,1,10), 4, 4)
rowdistr(z)
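The 'basic' row distributions and the 'within' uniformity tests described above have direct base-R counterparts. prop.table with margin = 1 gives the row proportions, and chisq.test applied to each row tests for uniformity. The sketch reuses the matrix of counts from the example. The 'between' comparison is illustrated with prop.test for one column and two rows. This is one standard interval for a difference in proportions, but not necessarily the one rowdistr uses.

```r
# Base-R sketch of rowdistr-style summaries (not the package code)
z <- matrix(c(4, 3, 2, 6, 47, 20, 40, 62, 11, 8, 7, 22, 3, 0, 1, 10), 4, 4)

# 'basic': each row's response distribution (cell count / row total)
row.props <- prop.table(z, margin = 1)
round(row.props, 3)

# 'within': chi-square test of uniformity for each row distribution
p.within <- apply(z, 1, function(counts) chisq.test(counts)$p.value)
p.within

# 'between' (one comparison): rows 1 and 2, response column 2
between <- prop.test(z[1:2, 2], rowSums(z)[1:2])$conf.int
between
```

With these counts, every expected cell in the row-wise uniformity tests is at least 5, so chisq.test issues no small-sample warning.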
These data record the number of seeds (out of 100) that germinated when given different amounts of water. The seeds were either exposed to light or kept in the dark. Four identical boxes were used for each combination of water and light.
A data frame with 48 observations on 3 variables.
[,1] | Light | integer | Seeds exposed to light (N=No, Y=Yes) |
[,2] | Water | integer | Amount of water, higher levels correspond to more water (1, 2, 3, 4, 5, 6) |
[,3] | Count | integer | Number of seeds that germinated (out of 100) |
Sheep Data
A data frame with 100 observations on 3 variables.
[,1] | Weight | integer | . |
[,2] | Copper | factor | levels (No, Yes) |
[,3] | Cobalt | factor | levels (No, Yes) |
Calculates the skewness statistic of the data in 'x'. Values close to zero correspond to reasonably symmetric data; positive values indicate right-skewed data, and negative values indicate left-skewed data.
skewness(x, ...)
x |
vector containing the data. |
... |
any other variables to be passed to |
Returns the value of the skewness.
## Merger data:
data(mergers.df)
skewness(mergers.df$mergerdays)
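One common definition of the skewness statistic is the moment ratio g1 = m3 / m2^(3/2), where mk is the average of the k-th powers of the deviations from the mean. The sketch below implements that definition. skew_sketch is a hypothetical name. skewness in s20x may apply a different small-sample adjustment, although the sign convention is the same.

```r
# Hypothetical sketch: moment-based skewness g1 = m3 / m2^(3/2)
skew_sketch <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  d <- x - mean(x)                      # deviations from the mean
  mean(d^3) / mean(d^2)^1.5
}

skew_sketch(c(1, 2, 3, 4, 5))           # symmetric data: 0
skew_sketch(c(1, 1, 1, 2, 10))          # long right tail: positive
```

Negating the data flips the sign of the statistic. This matches the interpretation above, where positive values mean right skew and negative values mean left skew.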
Male Egyptian skulls from five different epochs. Four measurements were taken on each skull: BH, basibregmatic height; BL, basialveolar length; MB, maximum breadth; and NH, nasal height. It is of interest to investigate the change in shape over time. A gradual change would indicate inbreeding of the populations. These data include only the maximum breadth measurements.
A data frame with 150 observations on 2 variables.
[,1] | measurement | integer | |
[,2] | year | integer |
A Handbook of Small Data Sets
Hand, D.J., Daly, F., Lunn, A.D., McConway, K.J. and Ostrowski, E. (1994). A Handbook of Small Data Sets. Boca Raton, Florida: Chapman and Hall/CRC.
Thomson, A. and Randall-Maciver, R. (1905). Ancient Races of the Thebaid. Oxford: Oxford University Press.
Weight and length measurements of 844 snapper (Pagrus auratus) caught in the Hauraki Gulf, near Auckland, New Zealand.
A data frame with 844 observations on 2 variables.
Fork length in centimetres. The fork length of a fish is measured from the tip of the snout to the end of the middle caudal fin rays. It is used for fish in which it is difficult to tell where the vertebral column ends. Essentially it is the measurement from the tip of the 'nose' of the fish to the 'vee' in the tail.
Weight of the fish in kilograms (kg).
Russell Millar, University of Auckland.
Data from an experiment to examine the effects of different planting times on the yield of soya beans, given four different cultivars.
A data frame with 32 observations on 3 variables.
[,1] | yield | numeric | Yield of each plant |
[,2] | cultivar | factor | Cultivar used (cult1, cult2, cult3, cult4) |
[,3] | planttime | factor | Month of planting (Novemb, Decemb) |
Littler, R. University of Waikato
Draws strip charts and normal quantile-quantile plots of x for each value of the grouping variable g.
stripqq(formula, ...)
## S3 method for class 'formula'
stripqq(formula, data = NULL, ...)
formula |
A symbolic specification of the form |
data |
An optional data frame in which to evaluate the formula |
... |
Optional arguments that are passed to the |
stripqq(formula)
: Strip charts and normal quantile-quantile plots
This function is deprecated and will be removed in later versions of the package.
## Zoo data
data(zoo.df)
stripqq(attendance ~ day.type, data = zoo.df)
Displays summary information for a one-way anova analysis. The lm object must come from a numerical response variable and a single factor. The output includes: (i) anova table; (ii) numeric summary; (iii) table of effects; (iv) plot of data with intervals.
summary1way( fit, digit = 5, conf.level = 0.95, inttype = "tukey", pooled = TRUE, print.out = TRUE, draw.plot = TRUE, ... )
fit |
an lm object, i.e. the output from |
digit |
the number of decimal places to display. |
conf.level |
confidence level of the intervals. |
inttype |
three options for the intervals appearing on the plot: 'hsd', 'lsd' or 'ci'. |
pooled |
if TRUE (the default), a pooled standard deviation is used for the plotted intervals; if FALSE, an unpooled one is used. |
print.out |
if |
draw.plot |
if |
... |
more options. |
Df |
degrees of freedom for regression, residual and total. |
Sum of Sq |
sums of squares for regression, residual and total. |
Mean Sq |
mean squares for regression and residual. |
F value |
F-statistic value. |
Pr(F) |
|
Main Effect |
|
Group Effects |
summary2way, anova, aov, dummy.coef, onewayPlot
attitudes = c(5.2,5.2,6.1,6,5.75,5.6,6.25,6.8,6.87,7.1,
              6.3,6.35,5.5,5.75,4.6,5.36,5.85,5.9)
l = rep(c('Gp1','Gp2','Gp3'), rep(6,3))
l = factor(l)
f = lm(attitudes ~ l)
result = summary1way(f)
result
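Reproducing the core of summary1way's output in base R can show where each piece comes from. This sketch uses the attitudes data from the example above, redefined so that it runs on its own. summary(aov()) gives the ANOVA table. The group effects are the group means minus the grand mean, which is also what model.tables(fit, "effects") reports for a balanced design like this one. It is an illustration, not the package code.

```r
# Base-R sketch of a one-way ANOVA summary (illustrative, not the package code)
attitudes <- c(5.2, 5.2, 6.1, 6, 5.75, 5.6, 6.25, 6.8, 6.87, 7.1,
               6.3, 6.35, 5.5, 5.75, 4.6, 5.36, 5.85, 5.9)
l <- factor(rep(c("Gp1", "Gp2", "Gp3"), rep(6, 3)))
fit <- aov(attitudes ~ l)

tab <- summary(fit)[[1]]                    # Df, Sum Sq, Mean Sq, F value, Pr(>F)
tab
grand.mean <- mean(attitudes)
grp.effects <- tapply(attitudes, l, mean) - grand.mean   # group effects
grp.effects
model.tables(fit, type = "effects")         # the same effects, via base R
```

With equal group sizes, the group effects sum to zero.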
Displays summary information for a two-way anova analysis. The lm object must come from a numerical response variable and factors. The output depends on the value of page:
summary2way( fit, page = c("table", "means", "effects", "interaction", "nointeraction"), digit = 5, conf.level = 0.95, print.out = TRUE, new = TRUE, all = FALSE, FUN = "identity", ... )
fit |
an lm object, i.e. the output from 'lm()'. |
page |
options for output: 'table', 'means', 'effects', 'interaction', 'nointeraction' |
digit |
the number of decimal places in the display. |
conf.level |
confidence level of the intervals. |
print.out |
if TRUE, print out the output on the screen. |
new |
if |
all |
Only applicable to |
FUN |
optional function to be applied to estimates and confidence intervals. Typically for backtransformation operations. |
... |
other arguments like inttype, pooled etc. |
page = 'table': ANOVA table
page = 'means': cell means matrix and numeric summary
page = 'effects': table of effects
page = 'interaction': tables of contrasts
page = 'nointeraction': tables of contrasts
A list with the following components:
Df |
degrees of freedom for regression, residual and total. |
Sum of Sq |
sums of squares for regression, residual and total. |
Mean Sq |
mean squares for regression and residual. |
F value |
F-statistic value. |
Pr(F) |
The P-value associated with each F-test. |
Grand Mean |
The overall mean of the response variable. |
Row Effects |
The main effects for the first (row) factor. |
Col Effects |
The main effects for the second (column) factor. |
Interaction Effects |
The interaction effects if an interaction model has been fitted, otherwise |
results |
If |
.
summary1way, model.tables, TukeyHSD
## Arousal data:
data(arousal.df)
arousal.fit = lm(arousal ~ gender * picture, data = arousal.df)
summary2way(arousal.fit)
## Butterfat data:
data("butterfat.df")
fit <- lm(log(Butterfat) ~ Breed + Age, data = butterfat.df)
summary2way(fit, page = "nointeraction", FUN = exp)
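Several of the summary2way pages have rough base-R counterparts. The sketch uses the built-in warpbreaks data, which has two factors (wool and tension) in a balanced design. It uses anova for the table, tapply for the cell means, model.tables for the effects, and TukeyHSD on an additive model to compare the levels of one factor. The mapping to the page values is approximate, and summary2way's own contrasts and layout may differ.

```r
# Base-R counterparts of some summary2way() pages (approximate mapping)
fit <- aov(breaks ~ wool * tension, data = warpbreaks)

tab <- anova(lm(breaks ~ wool * tension, data = warpbreaks))   # cf. page = "table"
tab
cell.means <- with(warpbreaks, tapply(breaks, list(wool, tension), mean))
cell.means                                                     # cf. page = "means"
model.tables(fit, type = "effects")                            # cf. page = "effects"

# Without an interaction term: Tukey comparisons of the tension levels
# (loosely comparable to page = "nointeraction")
fit2 <- aov(breaks ~ wool + tension, data = warpbreaks)
TukeyHSD(fit2, which = "tension")
```

Choose the additive model only if the interaction row of the ANOVA table gives little evidence of an interaction.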
Produces a table of summary statistics for the data. If the argument group is missing, calculates a matrix of summary statistics for the data in x. If group is present, the elements of group are interpreted as group labels and the summary statistics are displayed for each group separately.
summaryStats(x, ...)
## Default S3 method:
summaryStats(x, group = rep("Data", length(x)), data.order = TRUE, digits = 2, ...)
## S3 method for class 'formula'
summaryStats(x, data = NULL, data.order = TRUE, digits = 2, ...)
## S3 method for class 'matrix'
summaryStats(x, data.order = TRUE, digits = 2, ...)
x |
either a single vector of values, or a formula of the form data~group, or a matrix. |
... |
Optional arguments which are passed to the summary statistic functions. For example |
group |
a vector of group labels. |
data.order |
if |
digits |
the number of decimal places to display. |
data |
an optional data frame containing the variables in the model. |
If x is a single variable, i.e. there are no groups, then a single list is invisibly returned with the following named items:
min |
Minimum value. |
max |
Maximum value. |
mean |
Mean value. |
var |
Variance – roughly the average of the squared deviations of the data values from the sample mean (the usual sample variance divides the sum of squares by n - 1). |
sd |
Standard deviation – the square root of the variance. |
n |
Number of data values – size of the data set. |
nMissing |
If there are missing values, and |
iqr |
Midspread (IQR) – the range spanned by the central half of the data; the interquartile range. |
skewness |
Skewness statistic – indicates how skewed the data set is. Positive values indicate right-skewed data; negative values indicate left-skewed data. |
lq |
Lower quartile |
median |
Median – the middle value when the batch is ordered. |
uq |
Upper quartile |
If grouping is provided, either by using the group argument, by providing a factor in a formula, or by passing a matrix whose columns represent the groups, then the function returns a data.frame with one row, containing all the statistics above, for each group.
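A data frame with one row per group and the group labels as row names is what makes the indexing in the examples (sumStats['BA', ] and sumStats$mean) work. A cut-down base R sketch follows; grouped_stats() is a hypothetical helper that computes only four of the statistics, and the scores and degree labels are made up.

```r
# Hypothetical helper (not part of s20x): one row of statistics per group.
grouped_stats <- function(x, group) {
  pieces <- split(x, group)                 # one element of values per group
  rows <- lapply(pieces, function(v) {
    v <- v[!is.na(v)]                       # drop missing values within each group
    data.frame(n = length(v), mean = mean(v), sd = sd(v), median = median(v))
  })
  out <- do.call(rbind, rows)               # stack the one-row data frames
  rownames(out) <- names(pieces)            # group labels become row names
  out
}

# Made-up exam scores and degree labels, for illustration only.
scores <- c(55, 60, 65, 70, 80, 90)
degree <- c("BA", "BA", "BA", "BSc", "BSc", "BSc")
res <- grouped_stats(scores, degree)
res["BA", ]   # every statistic for one group
res$mean      # one statistic for every group
```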
summaryStats(default): Summary Statistics
summaryStats(formula): Summary Statistics
summaryStats(matrix): Summary Statistics
library(s20x)

## STATS20x data:
data(course.df)

## Single variable summary
with(course.df, summaryStats(Exam))

## Using a formula
summaryStats(Exam ~ Stage1, course.df)

## Using a matrix
X = cbind(rnorm(50), rnorm(50))
summaryStats(X)

## Saving and extracting the information
sumStats = summaryStats(Exam ~ Degree, course.df)
sumStats

## Just the BAs
sumStats['BA', ]

## Just the means
sumStats$mean
Data from an experiment to assess the impact of three different teaching methods on language ability. Thirty students were randomly allocated to three groups, one for each method. Each student's IQ before instruction and language test score after instruction were recorded.
A data frame with 30 observations on 3 variables.
[,1] | lang | numeric | Language test score after instruction |
[,2] | IQ | numeric | Student's IQ |
[,3] | method | factor | Teaching method (1, 2, 3) |
Salary information for all salaried employees of the Technitron Company.
A data frame with 46 observations on 8 variables.
[,1] | salary | numeric | Annual salary (dollars) |
[,2] | yrs.empl | numeric | Number of years employed at Technitron |
[,3] | prior.yrs | numeric | Number of years prior experience |
[,4] | edu | numeric | Years of education after high school |
[,5] | id | numeric | Company identification number |
[,6] | gender | numeric | Gender (0 = female, 1 = male) |
[,7] | dept | numeric | Department employee works in (1 = Sales, 2 = Purchasing, 3 = Advertising, 4 = Engineering) |
[,8] | super | numeric | Number of employees supervised |
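In this data frame gender and dept are stored as numeric codes. The sketch below, using a hypothetical helper label_codes() and a few made-up rows, converts them to labelled factors with the codings listed in the table; a model fitted afterwards then treats dept as categorical rather than as a number from 1 to 4.

```r
# Hypothetical helper: apply the codings from the table above.
label_codes <- function(df) {
  df$gender <- factor(df$gender, levels = c(0, 1), labels = c("female", "male"))
  df$dept <- factor(df$dept, levels = 1:4,
                    labels = c("Sales", "Purchasing", "Advertising", "Engineering"))
  df
}

# Made-up rows, for illustration only.
toy <- data.frame(gender = c(0, 1, 1), dept = c(4, 1, 2))
label_codes(toy)
```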
Data from an experiment to assess the effect of a new drug on the weight of the thyroid gland, using 16 laboratory animals. The animals were randomly assigned to either a control group or a treatment group; each animal's body weight was recorded at the beginning of the experiment and its thyroid weight was measured at the end.
A data frame with 16 observations on 3 variables.
[,1] | thyroid | numeric | Weight of thyroid gland after 7 days (mg) |
[,2] | body | numeric | Animal body weight before experiment began (g) |
[,3] | group | factor | Animal's group (1 = control, 2 = drug) |
Two random samples of households: one of households that purchase Crest toothpaste and one of households that do not. For each household, the age of the person responsible for purchasing toothpaste is recorded.
A data frame with 20 observations on 2 variables.
[,1] | purchasers | numeric | Age of the person in the household responsible for purchases of Crest |
[,2] | nonpurchasers | numeric | Age of the person in the household responsible for purchases of other brands of toothpaste |
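Because the two samples sit in separate columns, they can be treated as two groups: either as a matrix whose columns are the groups, or, for a formula such as age ~ group, after reshaping to long form. A sketch of the reshaping step with base R's stack(), using made-up ages rather than the real data:

```r
# Made-up ages standing in for the purchasers / nonpurchasers columns.
wide <- data.frame(purchasers = c(30, 45, 60), nonpurchasers = c(25, 35, 60))

# stack() returns a long data frame with columns `values` and `ind`
# (ind records which original column each value came from).
long <- stack(wide)
names(long) <- c("age", "group")

# Group means, as a quick check that the grouping is right.
means <- tapply(long$age, long$group, mean)
means
```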
Draws a scatter plot of y against x, together with a lowess smooth showing the underlying trend. Bounds of one standard deviation for the scatter about this trend are also plotted.
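The trend-and-band idea can be sketched in base R. The trend uses base R's lowess() with the same span argument f. How the one-standard-deviation band is obtained is an assumption here (a lowess smooth of the squared residuals, then a square root), not necessarily what trendscatter does internally. The helper name trend_band() is hypothetical.

```r
# Hypothetical helper: lowess trend plus +/- 1 SD bounds about that trend.
# Assumption: the local SD is the square root of a lowess smooth of the
# squared residuals; s20x may compute its bounds differently.
trend_band <- function(x, y, f = 0.5, draw = TRUE) {
  o <- order(x)
  x <- x[o]; y <- y[o]                       # sort so residuals align with the fit
  trend <- lowess(x, y, f = f)               # trend$x sorted x, trend$y fitted values
  res2 <- (y - trend$y)^2                    # squared residuals about the trend
  spread <- sqrt(pmax(lowess(x, res2, f = f)$y, 0))
  if (draw) {
    plot(x, y)
    lines(trend$x, trend$y, col = "blue")
    lines(trend$x, trend$y + spread, col = "red", lty = 2)
    lines(trend$x, trend$y - spread, col = "red", lty = 2)
  }
  invisible(data.frame(x = trend$x, fit = trend$y,
                       lower = trend$y - spread, upper = trend$y + spread))
}
```

For example, set.seed(1); x <- rnorm(100); y <- 2 + 3 * x + rnorm(100); trend_band(x, y, f = 0.3) draws a wigglier trend than the default f = 0.5, since a smaller span lets fewer points influence the smooth at each value.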
trendscatter(x, ...)

## Default S3 method:
trendscatter(x, y = NULL, f = 0.5, xlab = NULL, ylab = NULL, main = NULL, ...)

## S3 method for class 'formula'
trendscatter(x, f = 0.5, data = NULL, xlab = NULL, ylab = NULL, main = NULL, ...)
x | the coordinates of the points in the scatter plot. Alternatively, a formula. |
... | Optional arguments |
y | the y coordinates of the points in the plot, ignored if |
f | the smoother span. This gives the proportion of points in the plot which influence the smooth at each value. Larger values give more smoothness. |
xlab | a title for the x axis: see |
ylab | a title for the y axis: see |
main | a title for the plot: see |
data | an optional data frame containing the variables in the model. |
Returns the plot.
trendscatter(default): Trend and scatter plot
trendscatter(formula): Trend and scatter plot
library(s20x)

# A simple polynomial
x = rnorm(100)
e = rnorm(100)
y = 2 + 3 * x - 2 * x^2 + 4 * x^3 + e
trendscatter(y ~ x)

# An exponential growth curve
e = rnorm(100, 0, 0.1)
y = exp(5 + 3 * x + e)
trendscatter(log(y) ~ x)

# Peruvian Indians data
data(peru.df)
trendscatter(BP ~ weight, data = peru.df)

# Note: this usage is deprecated
with(peru.df, trendscatter(weight, BP))
Attendance records for Auckland Zoo covering 455 days from January 1, 1993. Only 440 observations are included because some values are missing. The aim was to assess whether an advertising campaign was effective in increasing attendance.
A data frame with 440 observations on 6 variables.
[,1] | attendance | numeric | Number of visitors |
[,2] | time | numeric | Time in days since the start of the study |
[,3] | sun.yesterday | numeric | Hours of sunshine the previous day |
[,4] | tv.adds | numeric | Average spending on TV advertising in the previous week (1000s of dollars per day) |
[,5] | nice.day | factor | Assessment based on number of hours of sunshine (0 = No, 1 = Yes) |
[,6] | day.type | factor | Type of day (1 = ordinary weekday, 2 = weekend day, 3 = school holiday weekday, 4 = public holiday) |