In this Practical you will:
You are asked to complete the following exercises and submit them to Canvas before the deadline. In addition to the points detailed below, 5 points are assigned to the cleanliness of the code and the resulting PDF document. Only knit documents (.pdf, .doc, or .html) will be accepted. Unknit .Rmd files will not be graded.
Statistical models typically comprise a deterministic component, which describes the core behaviour of a system, and a stochastic component, which describes the system’s randomness.
Let us suppose that we are interested in the relationship between body mass and age for some species, and our hypothesis is that mass is proportional to age. Because there are many different mathematical models that would be consistent with this verbal description of our hypothesis, it does not give us any clear way of actually testing the validity of our hypothesis. We are instead going to formalise this hypothesis as a simple linear deterministic model:
\[mass_i = \beta_0 + \beta_1 age_i\]
The parameter \(\beta_0\) describes the intercept of this model (i.e., baseline body mass at age 0), and \(\beta_1\) describes the slope (i.e., the rate at which body mass changes with age). When an individual is born, it starts out with some mass, so the intercept is probably non-zero. Let’s say our intercept here is \(\beta_0 = 2\), meaning individuals at age 0 have a mass of 2 kg. Let’s also say that we know for every unit of time the animal ages, the mass increases by 1 kg (i.e., \(\beta_1 = 1\)). For these parameter values, we can then more explicitly describe the deterministic relationship between body mass and age as:
\[mass_i = 2 + 1 \times age_i\]
R function that expresses this relationship. – 1 point(s)
Stochastic models describe the randomness of the process. Simple linear regression accounts for stochasticity by assuming that each observation \(mass_i\) is drawn from a normal distribution with mean \(\beta_0 + \beta_1 age_i\) and variance \(\sigma^{2}\).
Going back to our numbers from above, what we’re now saying is that all animals of age 5 will have a mass that’s Gaussian distributed around 7 (i.e., \(2 + 1 \times 5\)).
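As a minimal sketch of how the deterministic and stochastic components fit together, the code below wraps the deterministic model in a function and then draws masses around its prediction. The function name expected_mass(), the sample size of 1000, and the value sigma = 1 are illustrative assumptions, not values given in this practical.

```r
# Deterministic component: expected mass (kg) at a given age,
# using the parameter values from the text (beta0 = 2, beta1 = 1).
expected_mass <- function(age, beta0 = 2, beta1 = 1) {
  beta0 + beta1 * age
}

# Stochastic component: masses scatter around the expected value.
# sigma = 1 is an assumed standard deviation, chosen only for illustration.
set.seed(1)
sigma <- 1
mass_age5 <- rnorm(n = 1000, mean = expected_mass(5), sd = sigma)

# The sample mean should sit close to the deterministic prediction,
# and the sample standard deviation close to sigma.
mean(mass_age5)
sd(mass_age5)
```

Changing sigma alters how widely the simulated masses spread, but not where they are centred.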
rnorm() function. – 1.5 point(s)
runif() function. – 0.5 point(s)
Beyond making an assumption of normality on the errors, simple linear regression also assumes that the errors are independent and identically distributed (IID).
For our example, the identical part of IID means that the distribution of errors around the expected mass is the same at every age. In other words, \(\sigma^{2}\) does not change with age: the error distribution for animals of age 2 is the same as that for animals of age 3, which is the same as that for animals of age 4, and so on. The independent part means that there is no relationship among the errors. We will be discussing independence in greater detail in later lectures/practicals, but for now, let’s focus on what is meant by identically distributed errors.
The first thing our model is saying is that, for any age \(age_i\), we can expect mass to be symmetrically distributed around the expected value, and our errors \(\varepsilon_i\) to be symmetrically distributed around 0 (remember that \(\varepsilon_i = y_i - (\beta_0 + x_i \beta_1)\), where \(y_i\) is the observed mass and \(x_i\) is the age).
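To make this concrete, the following sketch simulates masses for a single age and computes the errors as observed minus expected. The age of 5, the sample size, and sigma = 1 are assumptions made for illustration only.

```r
set.seed(1)
beta0 <- 2               # intercept from the text
beta1 <- 1               # slope from the text
sigma <- 1               # assumed error standard deviation
age <- rep(5, 1000)      # 1000 hypothetical animals, all of age 5

# Simulate masses, then recover the errors: eps_i = y_i - (beta0 + x_i * beta1)
mass <- rnorm(length(age), mean = beta0 + beta1 * age, sd = sigma)
eps <- mass - (beta0 + age * beta1)

# Errors should be centred on 0, with roughly half falling on either side.
mean(eps)
mean(eps > 0)
```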
set.seed() function to set the random seed as 1 (this will ensure consistent results). – 0.5 point(s)
The second thing our model is saying is that, for any age \(age_i\), we can expect the distribution of errors to be identical.
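One way to see what identically distributed means is to simulate errors at two different ages and compare their spread. In this sketch the two age groups (2 and 10), the sample size, and sigma = 1 are illustrative assumptions.

```r
set.seed(1)
sigma <- 1   # assumed; the same value is used at every age
n <- 1000

# Errors for two hypothetical age groups, drawn from the same distribution.
eps_age2  <- rnorm(n, mean = 0, sd = sigma)
eps_age10 <- rnorm(n, mean = 0, sd = sigma)

# Similar standard deviations are what "identically distributed" implies.
sd(eps_age2)
sd(eps_age10)

# Overlay the two histograms on shared breaks to compare their shapes.
breaks <- seq(-6, 6, by = 0.5)
hist(eps_age2, breaks = breaks, col = rgb(0, 0, 1, 0.4),
     main = "Errors at two ages", xlab = "Error (kg)")
hist(eps_age10, breaks = breaks, col = rgb(1, 0, 0, 0.4), add = TRUE)
```

If the errors were not identically distributed (for example, if sigma grew with age), one histogram would be visibly wider than the other.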
add = TRUE to overlay the plots. – 1 point(s)
Simple linear regression models can be easily fit in R either by manually estimating the regression coefficients, or via the lm() function.
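The two routes can be compared on simulated data. The sketch below assumes the parameter values from earlier (\(\beta_0 = 2\), \(\beta_1 = 1\)) plus an illustrative sigma of 1, and uses the standard least-squares formulas for the manual estimates. The ages are hypothetical random draws, not data from this practical.

```r
set.seed(1)
age  <- runif(200, min = 0, max = 10)             # hypothetical ages
mass <- rnorm(200, mean = 2 + 1 * age, sd = 1)    # simulated masses

# Manual least-squares estimates:
# slope = cov(x, y) / var(x); intercept = mean(y) - slope * mean(x)
beta1_hat <- cov(age, mass) / var(age)
beta0_hat <- mean(mass) - beta1_hat * mean(age)
c(intercept = beta0_hat, slope = beta1_hat)

# The same model fit with lm(); the coefficients should match.
fit <- lm(mass ~ age)
coef(fit)
summary(fit)
```

Because the data were simulated from known parameters, both sets of estimates should also land near 2 and 1.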
lm() function. – 1 point(s)
summary() to inspect the results. – 0.5 point(s)
Wildlife-vehicle collisions (WVCs) represent a serious source of mortality for giant anteaters (Myrmecophaga tridactyla) in Brazil. Road passage structures are often used as measures to help wildlife safely cross roads, but, to be effective, animals need to use them. Noonan et al. (2021) used GPS tracking data to determine whether anteaters actively used passage structures to cross highways. Each time one of the 38 monitored animals crossed a road, it was recorded whether or not it used a passage structure. After ca. 1 year of monitoring, a total of 1575 road crossings were observed. The question is: are giant anteaters using crossing structures more than would be expected by chance alone?
Answer this by applying maximum likelihood estimation to the proportion of road crossing events that occurred via a passage structure. You will need to carry out the following steps:
table() function to create a frequency table. – 1 point(s)
seq() function. – 2 point(s)
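The general approach can be sketched with a grid search over candidate proportions. The crossing records below are made-up toy values (30 of 100 crossings via a structure), not the anteater data, and the grid resolution of 0.001 is an arbitrary choice.

```r
# Toy crossing records (hypothetical, NOT the data from Noonan et al. 2021).
crossings <- c(rep("structure", 30), rep("road", 70))

# Frequency table of crossing types.
counts <- table(crossings)
k <- counts[["structure"]]   # crossings via a passage structure
n <- sum(counts)             # total crossings

# Binomial log-likelihood evaluated over a grid of candidate proportions.
p_grid <- seq(0.001, 0.999, by = 0.001)
loglik <- dbinom(k, size = n, prob = p_grid, log = TRUE)

# The maximum likelihood estimate is the grid value with the highest log-likelihood.
p_hat <- p_grid[which.max(loglik)]
p_hat

# Plot the log-likelihood curve and mark the estimate.
plot(p_grid, loglik, type = "l", xlab = "Proportion", ylab = "Log-likelihood")
abline(v = p_hat, lty = 2)
```

For a binomial likelihood the estimate coincides with the sample proportion k / n, which offers a quick check on the grid search.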