Classify Values Into Sequences with Minimum Run Length Constraints
Source: R/classify_by_threshold.R
classify_by_threshold.Rd
Classifies numeric values into "high" and "low" categories based on a threshold, while enforcing minimum run lengths for both categories. Values exceeding the threshold are classified as "high", others as "low". Short runs that don't meet the minimum length requirement are reclassified into the opposite category.
Usage
classify_by_threshold(
  values,
  threshold,
  min_low_frames,
  min_high_frames,
  return_type = c("numeric", "factor")
)
Arguments
- values
Numeric vector to be classified
- threshold
Numeric value used as classification boundary between "high" and "low"
- min_low_frames
Minimum number of consecutive frames required for a "low" sequence
- min_high_frames
Minimum number of consecutive frames required for a "high" sequence
- return_type
Type of the returned vector: "numeric" (1/0) or "factor" ("high"/"low"). Default: "numeric"
Value
Vector of the same length as the input, with each value classified as either "high" or "low". Depending on return_type, this is a numeric vector (1/0) or a factor ("high"/"low"). NA values in the input remain NA in the output.
Details
The classification process occurs in two steps:
1. Initial classification based on threshold
2. Reclassification of sequences that don't meet minimum length requirements
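The function processes "low" sequences first, then "high" sequences. This order can affect the final classification when the two minimum length requirements compete: for example, a short "low" run that is reclassified as "high" can merge with adjacent "high" runs before those runs are checked against min_high_frames.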
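The two steps and the low-then-high order can be sketched with base R's rle() and inverse.rle(). This is a hypothetical illustration, not the package source: the names sketch_classify and flip_short_runs are made up for this example, it returns only the numeric (1/0) form, and it assumes that NA values are kept as NA and break runs, which may differ from the actual function.

```r
# Illustrative sketch only; sketch_classify() and flip_short_runs() are
# hypothetical names, not part of the package.
sketch_classify <- function(values, threshold, min_low_frames, min_high_frames) {
  # Step 1: initial classification (1 = "high", 0 = "low"); NA stays NA.
  cls <- as.integer(values > threshold)

  # Flip every run of class `from` that is shorter than `min_len`.
  # Assumption: NA entries form their own runs and are never flipped.
  flip_short_runs <- function(x, from, min_len) {
    r <- rle(x)
    short <- !is.na(r$values) & r$values == from & r$lengths < min_len
    r$values[short] <- 1L - from
    inverse.rle(r)
  }

  # Step 2: "low" runs are processed first, then "high" runs.
  # Runs are recomputed between the passes, so merged runs count as one.
  cls <- flip_short_runs(cls, from = 0L, min_len = min_low_frames)
  cls <- flip_short_runs(cls, from = 1L, min_len = min_high_frames)
  cls
}

# The single "low" frame is flipped first, so the two short "high" runs
# merge into one run of five, which then satisfies min_high_frames = 3.
x <- c(3, 3, 1, 3, 3, 1, 1, 1)
sketch_classify(x, threshold = 2, min_low_frames = 2, min_high_frames = 3)
# 1 1 1 1 1 0 0 0
```

Had the "high" runs been checked first in this sketch, both runs of two would have been too short and every frame would have ended up "low", which is why the processing order matters.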
Examples
# Basic usage
values <- c(1, 1.5, 2.8, 3.2, 3.0, 2.9, 1.2, 1.1)
result <- classify_by_threshold(values,
                                threshold = 2.5,
                                min_low_frames = 2,
                                min_high_frames = 3)
# Handling NAs
values_with_na <- c(1, NA, 3, 3.2, NA, 1.2)
result <- classify_by_threshold(values_with_na,
                                threshold = 2.5,
                                min_low_frames = 2,
                                min_high_frames = 2)