Identifies peaks (local maxima) in a numeric time series, with optional filtering by minimum height and minimum prominence. Missing values (NA) propagate: a point whose detection window contains an NA is returned as NA. The function works inside dplyr's mutate, offers several plateau-handling modes, and has an adjustable detection window.
Usage
find_peaks(
x,
min_height = -Inf,
min_prominence = 0,
plateau_handling = c("strict", "middle", "first", "last", "all"),
window_size = 3
)
Arguments
- x
Numeric vector containing the time series data
- min_height
Minimum height threshold for peaks (default: -Inf)
- min_prominence
Minimum prominence threshold for peaks (default: 0)
- plateau_handling
String specifying how to handle plateaus. One of:
"strict" (default): No points in plateau are peaks
"middle": Middle point(s) of plateau are peaks
"first": First point of plateau is peak
"last": Last point of plateau is peak
"all": All points in plateau are peaks
- window_size
Integer specifying the size of the window to use for peak detection (default: 3). Must be odd and >= 3. Larger values detect peaks over wider ranges.
Value
A logical vector of the same length as the input where:
TRUE indicates a confirmed peak
FALSE indicates a non-peak
NA indicates peak status could not be determined, either due to missing data or because the point lies within (window_size-1)/2 positions of either end of the series
Details
The function uses a sliding window algorithm for peak detection (window size specified by window_size parameter), combined with a region-based prominence calculation method similar to that described in Palshikar (2009).
Note
The function is optimized for use with dplyr's mutate
For noisy data, consider using a larger window_size or smoothing the series before peak detection
Adjust min_height and min_prominence to filter out unwanted peaks
Choose plateau_handling based on your specific needs
Larger window_size values result in more stringent peak detection
Peak Detection
A point is considered a peak if it is the highest point within its window (default window_size of 3 compares each point with its immediate neighbors). The first and last (window_size-1)/2 points in the series cannot be peaks and are marked as NA. Larger window sizes will identify peaks that dominate over a wider range, typically resulting in fewer peaks being detected.
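The window rule above can be sketched in a few lines of base R. This is an illustrative sketch only, not the package's implementation: the helper name window_peaks_sketch is hypothetical, it applies the "strict" comparison (the centre must be strictly greater than every other point in its window), and it ignores min_height and min_prominence.

```r
# Hypothetical helper for illustration; not the code behind find_peaks().
window_peaks_sketch <- function(x, window_size = 3) {
  stopifnot(window_size >= 3, window_size %% 2 == 1)
  n <- length(x)
  half <- (window_size - 1) / 2
  out <- rep(NA, n)                       # edge points stay NA
  if (n < window_size) return(out)        # series too short: all NA
  for (i in (half + 1):(n - half)) {
    window <- x[(i - half):(i + half)]
    if (anyNA(window)) next               # undeterminable: leave as NA
    neighbours <- window[-(half + 1)]     # drop the centre point itself
    out[i] <- all(x[i] > neighbours)      # strict local maximum
  }
  out
}

x <- c(1, 3, 2, 6, 4, 5, 2)
window_peaks_sketch(x)                    # NA TRUE FALSE TRUE FALSE TRUE NA
window_peaks_sketch(x, window_size = 5)   # NA NA FALSE TRUE FALSE NA NA
```

With window_size = 5 only the value 6 dominates its whole window, which is why wider windows tend to report fewer peaks.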
Prominence
Prominence measures how much a peak stands out relative to its surrounding values. On each side of the peak, take the lowest value between the peak and the nearest higher peak (or the end of the series if no higher peak exists on that side). Prominence is the height of the peak minus the higher of these two minima.
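To make the definition concrete, here is a minimal sketch of one common way to compute prominence for a single peak. It is an assumption about the general technique, not the package's region-based code: the helper name prominence_sketch is hypothetical, and it assumes x contains no NAs and that i indexes a peak.

```r
# Hypothetical helper for illustration; assumes no NAs in x.
prominence_sketch <- function(x, i) {
  n <- length(x)
  # Walk left until a strictly higher value or the series start is reached
  l <- i
  while (l > 1 && x[l - 1] <= x[i]) l <- l - 1
  # Walk right until a strictly higher value or the series end is reached
  r <- i
  while (r < n && x[r + 1] <= x[i]) r <- r + 1
  left_min  <- min(x[l:i])   # lowest point on the left side
  right_min <- min(x[i:r])   # lowest point on the right side
  x[i] - max(left_min, right_min)
}

x <- c(1, 3, 2, 6, 4, 5, 2)
prominence_sketch(x, 2)   # 3 - max(1, 2) = 1
prominence_sketch(x, 4)   # 6 - max(1, 2) = 4
prominence_sketch(x, 6)   # 5 - max(4, 2) = 1
```

Under this sketch, a min_prominence of 2 would keep only the peak at position 4.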
Plateau Handling
Plateaus (sequences of identical values) are handled according to the plateau_handling parameter:
strict: No points in a plateau are considered peaks (traditional behavior)
middle: For plateaus of odd length, the middle point is marked as a peak. For plateaus of even length, the two middle points are marked as peaks.
first: The first point of each plateau is marked as a peak
last: The last point of each plateau is marked as a peak
all: Every point in the plateau is marked as a peak
Note that in all cases, the plateau must still qualify as a peak relative to its surrounding window (i.e., higher than all other points in the window).
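The five modes differ only in which positions of a qualifying plateau are marked. The sketch below, using a hypothetical helper plateau_picks_sketch, returns the positions each mode would select for a plateau running from index start to index end. It assumes the plateau has already passed the window test and is not taken from the package source.

```r
# Hypothetical helper for illustration; not the code behind find_peaks().
plateau_picks_sketch <- function(start, end,
                                 handling = c("strict", "middle", "first",
                                              "last", "all")) {
  handling <- match.arg(handling)
  len <- end - start + 1
  switch(handling,
    strict = integer(0),                   # no plateau point is a peak
    first  = start,
    last   = end,
    all    = start:end,
    middle = if (len %% 2 == 1) {
      start + (len - 1) / 2                # single middle point
    } else {
      start + len / 2 - c(1, 0)            # two middle points
    }
  )
}

# Plateaus in c(1, 3, 3, 3, 2, 4, 4, 1): positions 2-4 and 6-7
plateau_picks_sketch(2, 4, "middle")   # 3
plateau_picks_sketch(6, 7, "middle")   # 6 7
plateau_picks_sketch(2, 4, "first")    # 2
```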
Missing Values (NA) Handling
The function uses the following rules for handling NAs:
If a point is NA, it cannot be a peak (returns NA)
If any point in the window is NA, peak status cannot be determined (returns NA)
For prominence calculations, stretches of NAs are handled appropriately
A minimum of window_size points is required; shorter series return all NAs
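The window rule explains why a few scattered NAs can blank out a whole short series. The following sketch flags every position whose result would be NA under the rules above. The helper name na_mask_sketch is hypothetical and the code is an illustration of the stated rules, not the package's implementation.

```r
# Hypothetical helper for illustration; TRUE means "status would be NA".
na_mask_sketch <- function(x, window_size = 3) {
  n <- length(x)
  half <- (window_size - 1) / 2
  out <- rep(TRUE, n)                    # edges and too-short series are NA
  if (n < window_size) return(out)
  for (i in (half + 1):(n - half)) {
    out[i] <- anyNA(x[(i - half):(i + half)])   # any NA in the window
  }
  out
}

# Every interior window of this series touches an NA
na_mask_sketch(c(1, 3, NA, 6, 4, NA, 2))   # all TRUE
# A complete series loses only its edge points
na_mask_sketch(c(1, 3, 2, 6, 4, 5, 2))     # TRUE FALSE FALSE FALSE FALSE FALSE TRUE
```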
References
Palshikar, G. (2009). Simple Algorithms for Peak Detection in Time-Series. Proc. 1st Int. Conf. Advanced Data Analysis, Business Analytics and Intelligence.
See also
find_troughs for finding local minima
findpeaks in the pracma package for alternative peak detection methods
Examples
# Basic usage with default window size (3)
x <- c(1, 3, 2, 6, 4, 5, 2)
find_peaks(x)
#> [1] NA TRUE FALSE TRUE FALSE TRUE NA
# With larger window size
find_peaks(x, window_size = 5) # More stringent peak detection
#> [1] NA NA FALSE TRUE FALSE NA NA
# With minimum height
find_peaks(x, min_height = 4, window_size = 3)
#> [1] NA FALSE FALSE TRUE FALSE TRUE NA
# With plateau handling
x <- c(1, 3, 3, 3, 2, 4, 4, 1)
find_peaks(x, plateau_handling = "middle", window_size = 3) # Middle of plateaus
#> [1] NA FALSE TRUE FALSE FALSE TRUE TRUE NA
find_peaks(x, plateau_handling = "all", window_size = 5) # All plateau points
#> [1] NA NA FALSE FALSE FALSE TRUE TRUE NA
# With missing values
x <- c(1, 3, NA, 6, 4, NA, 2)
find_peaks(x)
#> [1] NA NA NA NA NA NA NA
# Usage with dplyr
library(dplyr)
tibble(
  time = 1:10,
  value = c(1, 3, 7, 4, 2, 6, 5, 8, 4, 2)
) %>%
  mutate(peaks = find_peaks(value, window_size = 3))
#> # A tibble: 10 × 3
#> time value peaks
#> <int> <dbl> <lgl>
#> 1 1 1 NA
#> 2 2 3 FALSE
#> 3 3 7 TRUE
#> 4 4 4 FALSE
#> 5 5 2 FALSE
#> 6 6 6 TRUE
#> 7 7 5 FALSE
#> 8 8 8 TRUE
#> 9 9 4 FALSE
#> 10 10 2 NA