Skip to contents
library(animovement)
library(tibble)
library(dplyr, warn.conflicts = FALSE)
library(readxl)
library(here)
#> here() starts at /home/runner/work/animovement/animovement
here::i_am("vignettes/articles/clean-tracks.Rmd")
#> here() starts at /home/runner/work/animovement/animovement

The next step in our workflow is to clean the tracks. This step commonly covers three separate components:

Although these are not always completely separate steps in practice, we will treat them as such to ensure the integrity of our tracks.

Outlier detection

If you are confident that your sensors give correct readings, you can skip this step.. Even in a typical trackball setup with optical flow sensors (e.g. computer mice), there is a chance that we will get a few spurious readings from the sensors. The aim of this step is to root out those readings.

We’ll use the data we read in the Read data article, so if you’ve missed that step you need to revisit how to read data. Let us first visualise our dx and dy values.

library(ggplot2)
df <- df |>
  mutate(displacement = sqrt(dx^2 + dy^2))
full_trace <- df |>
  ggplot(aes(time, displacement)) +
  geom_line() +
  xlab("Time (s)") +
  ylab("dx, dy (dots)")

partial_trace <- df |>
  filter(time < 20) |>
  ggplot(aes(time, displacement)) +
  geom_line() +
  xlab("Time (s)") +
  ylab("dx, dy (dots)")

hist_dist <- df |>
  ggplot(aes(displacement)) +
  geom_histogram()

library(patchwork)
full_trace / partial_trace / hist_dist
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Some animals spend a significant amount of time sitting still (as was the case here). To ensure that these zero-readings do not skew our outlier detection, let’s filter them out and have another look at the histograms.

hist_dist2 <- df |>
  filter(displacement != 0) |>
  ggplot(aes(displacement)) +
  geom_histogram()

hist_dist2
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Now we can see that there might be a few observations with extreme observations. It is now up to each researcher to decide whether to attempt filtering out outliers, or whether we are happy with what we have.

we may want to filter out all the 0 readings At first sight, it all looks fine. We could stop here and say we’re confident there are no spurious outliers. We could also apply a variety of automated outlier detection techniques. There are several ways of achieving this, but luckily the performance package has got us covered with their check_outliers() function. Since most of these methods rely on histograms/probability density functions,

library(performance)
df |>
  mutate(
    ax = dx - lag(dx),
    ay = dy - lag(dy)
  ) |>
  na.omit() |>
  filter(time < 200) |>
  filter(dx != 0) |>
  select(ax, ay) |>
  check_outliers() |>
  plot()

Interpolation

In case we removed any outliers, we can now interpolate across the gaps created.

TO BE CONTINUED…

Smoothing

All there is left to do is smooth our tracks, which is done using the smooth_tracks() function. The smoothing itself is super simple. smooth_tracks() provides a few different options:

  • roll_mean
  • roll_median
  • SOON savitsky_golay

For the rolling filters you can provide the window_width, i.e. how many observations to use in the rolling filter. The filters result in some NA values at the beginning and end of your data.

An important point about smooth_tracks() is that, instead of using our x and y values, it first back-transforms into the raw values obtained from your sensors (which are effectively “differences” between coordinates, so dx and dy) and performs the smoothing on them, before finally converting back to x and y. This may seem strange if you have previously worked with tracking data from computer vision or GPS loggers. However, whereas those modalities would return to the “true” coordinates after an outlier, mouse sensors do not. So the only way we can identify rogue values is by filtering those raw values.

Let’s try smoothing our data with a rolling_mean filter with 0.5 second (30 observations at at a sampling rate of 60Hz) window width. In case we work with multiple keypoints and/or individuals we can use group_by with our metadata for a tidyverse-friendly workflow.

df_smooth <- df |>
  smooth_track(method = "rolling_median", window_width = 5)

Let’s visualise how they compare. Note that although the difference may seem negligible when plotting paths, they may become important when computing derivatives such as velocity and acceleration.

library(ggplot2)
ggplot() +
  geom_path(data = df, aes(x, y), colour = "red") +
  geom_path(data = df_smooth, aes(x, y), colour = "blue")
#> Warning: Removed 5 rows containing missing values or values outside the scale range
#> (`geom_path()`).

Not that different as the sensors are doing a good job! But we can see that the smoothed track end a bit further to the left than the raw version.