Course website for DATA 589: Spatial Statistics

Course Description

Introductory course on spatial statistics focused on combining data with models to generate mechanistic descriptions of spatial patterns. Beginning from a discussion about the importance spatially referenced data, the course teaches methods for modelling spatial data for two routinely occurring situations 1) when the locations of points are the variable of interest (i.e., point pattern analysis), and 2) when the locations are an arbitrary artefact of the sampling design (i.e., Kriging and spatial interpolation). Emphasis is placed on understanding and interpreting the analyses in an applied context.

Course Objectives

This course combines lectures on theory and concepts with time practicing statistical tools in R based software packages. The course is designed to equip students with the tools and knowledge they need to perform a variety of analyses that can model common features of spatial data. For each topic, there will be a core lecture module and a lab based practical module. With the lectures, students will learn the theory behind core modelling tools including point process models, variograms, Kriging, co-Kriging- regression-Kriging, and regression with correlated errors, while the lab modules are intended to provide a pathway that students can follow over the longer term as their skills develop.

Learning Outcomes:

Assess spatial point data for distinctive patterns
Fit point process models via maximum likelihood
Calculate empirical variograms from spatially collected data
Model spatial autocorrelation structures
Predict from spatial autocorrelation models
Handle non-independence in spatial data
Link questions, analyses, and studying design

Course Format

The course is divided into core lectures and lab-based practicals.

Lectures: Lectures will cover the core concepts of the course. Lecture slides will be posted on GitHub and Canvas prior to the lecture. Students are encouraged to take notes, and to ask questions in the lectures. Lectures will also be recorded and a link to the lecture will be made available via Canvas (subject to any unforeseen technical issues).

Practicals: The computer-labs use structured tutorials designed to train students on the use of the open-source software program R for applying the methods learned in the lectures to data. The lectures will cover the material required for students to complete the practical assignments, however, they are designed to be complementary and not all the material in the practicals will be covered in the lectures and vice versa. These also provide training on communicating the results of statistical analyses, with a focus on reproducibility.

Assessment of student learning will be based on submitted assignments and a practical final exam (details below).

Pre-course Checklist

The material learned in most entry level statistics course provides important background and context on the topics that will be covered in this course. It is recommended that you re-familiarise yourself with this material before the course (e.g., measures of central tendency, variances, standard deviations, probability distributions, etc.).
The course requires the use of the following R packages: spatstat, maptools, sp, sf, viridis, gstat, rgdal, nlme, raster, and terra. It is recommended that you install these packages before the course.
The course uses RMarkdown for assignment submission. You are encouraged to familiarize yourself with RMarkdown before attending the first lab.

Optional Material

There is no textbook for this course, but if students are interested in expanding their knowledge beyond what is covered in the course, the following textbook is recommended:

Baddeley, Rubok, and Turner (2015). Spatial Point Patterns: Methodology and Applications with R. Chapman and Hall. (Available digitally through the UBC Library)

Course Evaluation

The assignments and evaluation for this course are structured along two distinct, but complimentary lines: i) guided practical assignments; and ii) a core research project. The practical assignments are designed to provide students with the opportunity to learn how to apply the methods learned in the lectures to real data using the statistical software R. The core research project is designed to train students in data analysis and presentation using best practices in open science. The core project is divided into 3 components aimed at replicating the typical analytical process: 1) initial hypothesis and expected outcome(s); 2) a verbal presentation of the analyses and findings; and 3) the submission of a paper describing the work.

Practical assignments	39%	Due on a ca. a weekly basis
Group Project	30%	In-class during the final week
Final Exam	31%	In-class during the final week
Total	100%

Practical Assignments (39%)

During the lab sessions, students will be guided through applied practicals. They will then be asked to complete in-class assignments based on the day’s lab. There will be a total of 3 practicals to be completed throughout the course (i.e., one due on the Monday following the lab). Each is designed to provide students with the opportunity to learn how to apply the methods learned in the lectures and labs to novel datasets using tools implemented in the statistical software R. The purpose of these lab assignments is to ensure that students have the skills necessary to complete effective statistical analyses that are central to problems involving spatial data. The skills gained by completing each week’s assignment will also help students prepare for the final exam and project. GitHub and Canvas will host the labs, assignments, and the various datasets.

Grading: Each lab assignment is worth a total of 13% of your total grade with some points coming from answering the questions correctly, and some from the cleanliness and documentation of your code. Late assignments will be penalised at a rate of 5% per day.

Group Projects (30%)

Working in groups of 3-4, students will be required to complete a data analysis assignment. This assignment is designed to provide students with an opportunity to apply what they have learned in the lectures and labs in an unguided format as they would if they were analysing spatial point data outside of a classroom setting. For this project, students will apply the modelling tools covered in the lectures on an empirical dataset (details below) and give a short presentation about their work on these data that summarises their findings. This presentation should be comprised of five sections: Introduction, Methods, Results, Discussion, References.

Introduction: Provide a brief description of the study system from which the data come and an outline of what questions you intend on exploring with the data, citing any relevant literature.

Methods: Briefly describe the data and what variables are included. Provide a detailed description of the analytical workflow that was applied to the data, citing any relevant literature and statistical packages employed. There should be enough information that anyone can reproduce the workflow if they had access to the data.

Results: Describe your statistical findings. Tables and figures should be used throughout.

Discussion: Provide a brief summary of your findings.

References: Include references to all necessary literature.

In addition, students are to create a GitHub repository for their project where all relevant code and files are to be stored.

Datasets: To complete these assignments, each group much select a species of their choosing and download the occurrence records from the Global Biodiversity Information Facility (GBIF) data repository (https://www.gbif.org/Links to an external site.). The species must be found in British Columbia. Groups will be provided provided with a set of environmental variables, and must use these to describe trends in the observed point data.

Grading: Presentations will be graded out of 100, with 75% based on the pre-provided rubric, and the GitHub repository worth an additional 25%.

Final Exam (31%)

The final exam will consist of a set of question designed to assess students’ understanding of the core concepts covered in the course. The exam will be comprehensive, and any material covered in the lectures and labs will be testable.

Grading: The final is worth a total of 30% of your total grade.

Note: Final grades may be subject to scaling based on departmental policies.

Lecture Outline

(Approximate schedule of topics covered in lectures)

Week	Lecture Topics	Practical Assignment
1	Course introduction; Spatial Intensity	Lab 01: Introduction to the spatstat package
2	Spatial Correlations; Poisson Point Process Models	Lab 02: Exploratory Data Analysis
3	Point Process Model Validation	Lab 03: Fitting and Validating Poisson Point Process Models
4	Spatial Autocorrelation; Variograms; Kriging	No Lab
5	Kriging with Covariates	Lab time used for presentations