Title:  A Versatile Cutting Tool 

Description:  A tool for cutting data into intervals. Allows singleton intervals. Always includes the whole range of data by default. Flexible labelling. Convenience functions for cutting by quantiles etc. Handles dates, times, units and other vectors. 
Authors:  David Hugh-Jones [aut, cre], Daniel Possenriede [ctb] 
Maintainer:  David Hugh-Jones <[email protected]> 
License:  MIT + file LICENSE 
Version:  1.0.0 
Built:  2024-11-01 04:55:20 UTC 
Source:  https://github.com/hughjonesd/santoku 
santoku is a tool for cutting data into intervals. It provides the function chop(), which is similar to base R's cut() or Hmisc::cut2(). chop(x, breaks) takes a vector x and returns a factor of the same length, coding which interval each element of x falls into.
Here are some advantages of santoku:
By default, chop() always covers the whole range of the data, so you won't get unexpected NA values.
Unlike cut() or cut2(), chop() can handle single values as well as intervals. For example, chop(x, breaks = c(1, 2, 2, 3)) will create a separate factor level for values exactly equal to 2.
Flexible and easy labelling.
Convenience functions for creating quantile intervals, evenly-spaced intervals or equal-sized groups.
Convenience functions to quickly tabulate chopped data.
Can chop numbers, dates, date-times and other objects.
These advantages make santoku especially useful for exploratory analysis, where you may not know the range of your data in advance.
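As a minimal sketch of the first advantage, assuming santoku is installed, the snippet below runs base R's cut() and chop() on the same breaks. The data and breaks are illustrative only.

```r
library(santoku)

x <- 1:7
breaks <- c(2, 4, 6)

# cut() only covers (2, 6], so values outside that range become NA
base_result <- cut(x, breaks)

# chop() extends the intervals to cover the whole range of x
chop_result <- chop(x, breaks)

anyNA(base_result)
anyNA(chop_result)
table(chop_result)
```

Comparing the two anyNA() calls shows whether any element was left without an interval.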
To get started, read the vignette:
vignette("santoku")
For more details, start with the documentation for chop().
Maintainer: David Hugh-Jones [email protected]
Other contributors:
Daniel Possenriede [email protected] [contributor]
Useful links:
Report bugs at https://github.com/hughjonesd/santoku/issues
Class representing a set of intervals
## S3 method for class 'breaks'
format(x, ...)

## S3 method for class 'breaks'
print(x, ...)

is.breaks(x, ...)
x 
A breaks object 
... 
Unused 
Create a standard set of breaks
brk_default(breaks)
breaks 
A numeric vector. 
A function which returns an object of class breaks.
chop(1:10, c(2, 5, 8))
chop(1:10, brk_default(c(2, 5, 8)))
Create a breaks object manually
brk_manual(breaks, left_vec)
breaks 
A vector, which must be sorted. 
left_vec 
A logical vector, the same length as breaks. 
All breaks must be closed on exactly one side, like ..., x) [x, ... (left-closed) or ..., x] (x, ... (right-closed).
For example, if breaks = 1:3 and left_vec = c(TRUE, FALSE, TRUE), then the resulting intervals are

 T        F       T
 [ 1,  2 ] ( 2, 3 )

Singleton breaks are created by repeating a number in breaks. Singletons must be closed on both sides, so if there is a repeated number at indices i, i+1, left_vec[i] must be TRUE and left_vec[i+1] must be FALSE.
A function which returns an object of class breaks.
lbrks <- brk_manual(1:3, rep(TRUE, 3))
chop(1:3, lbrks, extend = FALSE)

rbrks <- brk_manual(1:3, rep(FALSE, 3))
chop(1:3, rbrks, extend = FALSE)

brks_singleton <- brk_manual(
  c(1, 2, 2, 3),
  c(TRUE, TRUE, FALSE, TRUE)
)
chop(1:3, brks_singleton, extend = FALSE)
brk_width() can be used with time interval classes from base R or the lubridate package.
## S3 method for class 'Duration'
brk_width(width, start)
width 

start 
If width is a Period, lubridate::add_with_rollback() is used to calculate the widths. This can be useful for e.g. calendar months.
if (requireNamespace("lubridate")) {
  year2001 <- as.Date("2001-01-01") + 0:364
  tab_width(year2001, months(1),
        labels = lbl_discrete(" to ", fmt = "%e %b %y"))
}
chop() cuts x into intervals. It returns a factor of the same length as x, representing which interval contains each element of x. kiru() is an alias for chop. tab() calls chop() and returns a contingency table() from the result.
chop(x, breaks, labels = lbl_intervals(), extend = NULL, left = TRUE,
  close_end = TRUE, raw = NULL, drop = TRUE)

kiru(x, breaks, labels = lbl_intervals(), extend = NULL, left = TRUE,
  close_end = TRUE, raw = NULL, drop = TRUE)

tab(x, breaks, labels = lbl_intervals(), extend = NULL, left = TRUE,
  close_end = TRUE, raw = NULL, drop = TRUE)
x 
A vector. 
breaks 
A numeric vector of cutpoints or a function to create cutpoints from x. 
labels 
A character vector of labels or a function to create labels. 
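extend 
Logical. If TRUE, always extend breaks to +/-Inf. If NULL, extend breaks to min(x) and/or max(x) only if necessary. If FALSE, never extend. 
left 
Logical. Left-closed or right-closed breaks? 
close_end 
Logical. Close last break at right? (If left is FALSE, close first break at left?) 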
raw 
Logical. Use raw values in labels? 
drop 
Logical. Drop unused levels from the result? 
x may be a numeric vector, or more generally, any vector which can be compared with < and == (see Ops). In particular, Date and date-time objects are supported. Character vectors are supported with a warning.
breaks may be a vector or a function.
If it is a vector, breaks gives the break endpoints. Repeated values create singleton intervals. For example breaks = c(1, 3, 3, 5) creates 3 intervals: [1, 3), {3} and (3, 5].
If breaks is a function, it is called with the x, extend, left and close_end arguments, and should return an object of class breaks. Use brk_* functions to create a variety of data-dependent breaks.
Names of breaks may be used for labels. See "Labels" below.
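The two ways of supplying breaks can be sketched as follows, assuming santoku is installed. The vector reuses the c(1, 3, 3, 5) example from the text, and brk_width(2) stands in for any brk_* function. The data are illustrative.

```r
library(santoku)

x <- 1:5

# A vector of breaks: repeating 3 creates the singleton interval {3}
by_vector <- chop(x, c(1, 3, 3, 5))
levels(by_vector)
table(by_vector)

# A brk_* function: breaks are computed from x when chop() runs
by_function <- chop(x, brk_width(2))
table(by_function)
```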
By default, left-closed intervals are created. If left is FALSE, right-closed intervals are created.
If close_end is TRUE the final break (or first break if left is FALSE) will be closed at both ends. This guarantees that all values x with min(breaks) <= x <= max(breaks) are included in the intervals.
Before version 0.9.0, close_end was FALSE by default, and also behaved differently with respect to extended breaks: see "Extending intervals" below.
Using mathematical set notation:
If left is TRUE and close_end is TRUE, breaks will look like [b1, b2), [b2, b3) ... [b_n-1, b_n].
If left is FALSE and close_end is TRUE, breaks will look like [b1, b2], (b2, b3] ... (b_n-1, b_n].
If left is TRUE and close_end is FALSE, all breaks will look like ...[b1, b2) ....
If left is FALSE and close_end is FALSE, all breaks will look like ...(b1, b2] ....
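The combinations of left and close_end above can be compared side by side. This is a minimal sketch assuming santoku is installed. extend = FALSE is set so that only the user-specified intervals appear, and the comments give the intervals expected from the set notation above.

```r
library(santoku)

x <- 1:7
breaks <- c(2, 4, 6)

# left = TRUE, close_end = TRUE:   [2, 4), [4, 6]
left_closed <- chop(x, breaks, extend = FALSE)

# left = FALSE, close_end = TRUE:  [2, 4], (4, 6]
right_closed <- chop(x, breaks, extend = FALSE, left = FALSE)

# left = TRUE, close_end = FALSE:  [2, 4), [4, 6)
open_end <- chop(x, breaks, extend = FALSE, close_end = FALSE)

data.frame(x, left_closed, right_closed, open_end)
```

The rows for x = 4 and x = 6 are the ones to watch, because those values sit exactly on a break.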
If extend is TRUE, intervals will be extended to [-Inf, min(breaks)) and (max(breaks), Inf].
If extend is NULL (the default), intervals will be extended to [min(x), min(breaks)) and (max(breaks), max(x)], only if necessary, i.e. if elements of x would be below or above the unextended breaks.
close_end is applied after breaks are extended, i.e. always to the very last or very first break. This is a change from previous behaviour. Up to version 0.8.0, close_end was applied to the user-specified intervals, then extend was applied. Note that if breaks are extended, then the extended break is always closed anyway.
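A short sketch of the three extend settings, assuming santoku is installed. The data are illustrative.

```r
library(santoku)

x <- 1:7
breaks <- c(2, 4, 6)

# extend = FALSE: values outside [2, 6] become NA
never <- chop(x, breaks, extend = FALSE)

# extend = TRUE: the outer intervals reach -Inf and Inf
always <- chop(x, breaks, extend = TRUE)

# extend = NULL (default): the outer intervals reach min(x) and max(x)
as_needed <- chop(x, breaks)

levels(never)
levels(always)
levels(as_needed)
```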
labels may be a character vector. It should have the same length as the (possibly extended) number of intervals. Alternatively, labels may be a lbl_* function such as lbl_seq().
If breaks is a named vector, then non-zero-length names of breaks will be used as labels for the interval starting at the corresponding element. This overrides the labels argument (but unnamed breaks will still use labels). This feature is experimental.
If labels is NULL, then integer codes will be returned instead of a factor.
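The labelling options can be sketched as below, assuming santoku is installed. The character vector has four entries because the breaks c(2, 4, 6) are extended to cover 1:7, giving four intervals.

```r
library(santoku)

x <- 1:7
breaks <- c(2, 4, 6)

# A character vector: one label per (possibly extended) interval
named <- chop(x, breaks, labels = c("Lowest", "Low", "Mid", "High"))

# A lbl_* function generates the labels from the breaks
sequenced <- chop(x, breaks, labels = lbl_seq())

# labels = NULL returns integer codes instead of a factor
codes <- chop(x, breaks, labels = NULL)

table(named)
table(sequenced)
codes
```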
If raw is TRUE, labels will show the actual numbers calculated by breaks. If raw is FALSE then labels may show other objects, such as quantiles for chop_quantiles() and friends, proportions of the range for chop_proportions(), or standard deviations for chop_mean_sd().
If raw is NULL then lbl_* functions will use their default (usually FALSE). Otherwise, the raw argument to chop() overrides raw arguments passed into lbl_* functions directly.
NA values in x, and values which are outside the extended endpoints, return NA.
kiru() is a synonym for chop(). If you load {tidyr}, you can use it to avoid confusion with tidyr::chop().
Note that chop(), like all of R, uses binary arithmetic. Thus, numbers may not be exactly equal to what you think they should be. There is an example below.
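The floating-point caveat can be checked directly in base R before any chopping. The sketch below assumes santoku is installed for the final two calls. The rounding step is one possible workaround chosen for illustration, not a santoku feature.

```r
library(santoku)

x <- 0.3 / 3

# In binary floating point, 0.3 / 3 is not exactly equal to 0.1
x == 0.1
isTRUE(all.equal(x, 0.1))

# So x is not placed in the singleton interval for 0.1
chop(x, c(0, 0.1, 0.1, 1), labels = c("< 0.1", "0.1", "> 0.1"))

# Assumption: rounding to a sensible precision first is acceptable for your data
chop(round(x, 10), c(0, 0.1, 0.1, 1), labels = c("< 0.1", "0.1", "> 0.1"))
```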
chop() returns a factor of the same length as x, representing the intervals containing the value of x.
tab() returns a contingency table().
base::cut(), non-standard-types for chopping objects that aren't numbers.
Other chopping functions: chop_equally(), chop_evenly(), chop_fn(), chop_mean_sd(), chop_n(), chop_proportions(), chop_quantiles(), chop_width(), fillet()
chop(1:7, c(2, 4, 6))

chop(1:7, c(2, 4, 6), extend = FALSE)

# Repeat a number for a singleton break:
chop(1:7, c(2, 4, 4, 6))

chop(1:7, c(2, 4, 6), left = FALSE)

chop(1:7, c(2, 4, 6), close_end = FALSE)

chop(1:7, brk_quantiles(c(0.25, 0.75)))

# A single break is fine if `extend` is not `FALSE`:
chop(1:7, 4)

# Floating point inaccuracy:
chop(0.3/3, c(0, 0.1, 0.1, 1), labels = c("< 0.1", "0.1", "> 0.1"))

# -- Labels --

chop(1:7, c(Lowest = 1, Low = 2, Mid = 4, High = 6))

chop(1:7, c(2, 4, 6), labels = c("Lowest", "Low", "Mid", "High"))

chop(1:7, c(2, 4, 6), labels = lbl_dash())

# Mixing names and other labels:
chop(1:7, c("<2" = 1, 2, 4, ">=6" = 6), labels = lbl_dash())

# -- Non-standard types --

chop(as.Date("2001-01-01") + 1:7, as.Date("2001-01-04"))

suppressWarnings(chop(LETTERS[1:7], "D"))

tab(1:10, c(2, 5, 8))
chop_equally() chops x into groups with an equal number of elements.
chop_equally(x, groups, ..., labels = lbl_intervals(),
  left = is.numeric(x), close_end = TRUE, raw = TRUE)

brk_equally(groups)

tab_equally(x, groups, ..., left = is.numeric(x), raw = TRUE)
x 
A vector. 
groups 
Number of groups. 
... 
Passed to 
labels 
A character vector of labels or a function to create labels. 
left 
Logical. Left-closed or right-closed breaks? 
close_end 
Logical. Close last break at right? (If left is FALSE, close first break at left?) 
raw 
Logical. Use raw values in labels? 
chop_equally() uses brk_quantiles() under the hood. If x has duplicate elements, you may get fewer groups than requested. If so, a warning will be emitted. See the examples.
chop_* functions return a factor of the same length as x.
brk_* functions return a function to create breaks.
tab_* functions return a contingency table().
Other chopping functions: chop(), chop_evenly(), chop_fn(), chop_mean_sd(), chop_n(), chop_proportions(), chop_quantiles(), chop_width(), fillet()
chop_equally(1:10, 5)

# You can't always guarantee `groups` groups:
dupes <- c(1, 1, 1, 2, 3, 4, 4, 4)
quantile(dupes, 0:4/4)
chop_equally(dupes, 4)
chop_evenly() chops x into intervals of equal width. The number of intervals is given by the intervals argument.
chop_evenly(x, intervals, ..., close_end = TRUE)

brk_evenly(intervals)

tab_evenly(x, intervals, ...)
x 
A vector. 
intervals 
Integer: number of intervals to create. 
... 
Passed to 
close_end 
Logical. Close last break at right? (If left is FALSE, close first break at left?) 
chop_evenly() sets close_end = TRUE by default.
chop_* functions return a factor of the same length as x.
brk_* functions return a function to create breaks.
tab_* functions return a contingency table().
Other chopping functions: chop(), chop_equally(), chop_fn(), chop_mean_sd(), chop_n(), chop_proportions(), chop_quantiles(), chop_width(), fillet()
chop_evenly(0:10, 5)
chop_fn() is a convenience wrapper: chop_fn(x, foo, ...) is the same as chop(x, foo(x, ...)).
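The equivalence can be tried with any function that returns numeric breaks. The sketch below assumes santoku is installed and uses base R's median() as the illustrative fn, with random data seeded for repeatability.

```r
library(santoku)

set.seed(1)
x <- rnorm(20)

# fn is called on x to compute the breaks
via_chop_fn <- chop_fn(x, median)

# The longhand form: compute the breaks yourself and pass them to chop()
via_chop <- chop(x, median(x))

identical(via_chop_fn, via_chop)
table(via_chop_fn)
```

The wrapper is mainly a convenience when x is a long expression that you would otherwise have to write twice.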
chop_fn(x, fn, ..., extend = NULL, left = TRUE, close_end = TRUE,
  raw = NULL, drop = TRUE)

brk_fn(fn, ...)

tab_fn(x, fn, ..., extend = NULL, left = TRUE, close_end = TRUE,
  raw = NULL, drop = TRUE)
x 
A vector. 
fn 
A function which returns a numeric vector of breaks. 
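... 
Further arguments to fn. 
extend 
Logical. If TRUE, always extend breaks to +/-Inf. If NULL, extend breaks to min(x) and/or max(x) only if necessary. If FALSE, never extend. 
left 
Logical. Left-closed or right-closed breaks? 
close_end 
Logical. Close last break at right? (If left is FALSE, close first break at left?) 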
raw 
Logical. Use raw values in labels? 
drop 
Logical. Drop unused levels from the result? 
chop_* functions return a factor of the same length as x.
brk_* functions return a function to create breaks.
tab_* functions return a contingency table().
Other chopping functions: chop(), chop_equally(), chop_evenly(), chop_mean_sd(), chop_n(), chop_proportions(), chop_quantiles(), chop_width(), fillet()
if (requireNamespace("scales")) {
  chop_fn(rlnorm(10), scales::breaks_log(5))
  # same as
  # x <- rlnorm(10)
  # chop(x, scales::breaks_log(5)(x))
}
Intervals are measured in standard deviations on either side of the mean.
chop_mean_sd(x, sds = 1:3, ..., raw = FALSE, sd = deprecated())

brk_mean_sd(sds = 1:3, sd = deprecated())

tab_mean_sd(x, sds = 1:3, ..., raw = FALSE)
x 
A vector. 
sds 
Positive numeric vector of standard deviations. 
... 
Passed to 
raw 
Logical. Use raw values in labels? 
sd 
In version 0.7.0, these functions changed to specifying sds as a vector. To chop 1, 2 and 3 standard deviations around the mean, write chop_mean_sd(x, sds = 1:3) instead of chop_mean_sd(x, sd = 3).
chop_* functions return a factor of the same length as x.
brk_* functions return a function to create breaks.
tab_* functions return a contingency table().
Other chopping functions: chop(), chop_equally(), chop_evenly(), chop_fn(), chop_n(), chop_proportions(), chop_quantiles(), chop_width(), fillet()
chop_mean_sd(1:10)

chop(1:10, brk_mean_sd())

tab_mean_sd(1:10)
chop_n() creates intervals containing a fixed number of elements.
chop_n(x, n, ..., close_end = TRUE, tail = "split")

brk_n(n, tail = "split")

tab_n(x, n, ..., tail = "split")
x 
A vector. 
n 
Integer. Number of elements in each interval. 
... 
Passed to 
close_end 
Logical. Close last break at right? (If left is FALSE, close first break at left?) 
tail 
String. What to do if the final interval has fewer than n elements? 
The algorithm guarantees that intervals contain no more than n elements, so long as there are no duplicates in x and tail = "split". It also guarantees that intervals contain no fewer than n elements, except possibly the last interval (or first interval if left is FALSE).
To ensure that all intervals contain at least n elements (so long as there are at least n elements in x!) set tail = "merge".
If tail = "split" and there are intervals containing duplicates with more than n elements, a warning is given.
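The two tail settings can be compared by tabulating group sizes. This is a minimal sketch assuming santoku is installed. The data have no duplicates, so the guarantees described above should apply.

```r
library(santoku)

x <- 1:10

# tail = "split": no interval has more than n elements,
# but the final interval may have fewer
split_sizes <- table(chop_n(x, 3))

# tail = "merge": every interval has at least n elements
merge_sizes <- table(chop_n(x, 3, tail = "merge"))

split_sizes
merge_sizes
```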
chop_* functions return a factor of the same length as x.
brk_* functions return a function to create breaks.
tab_* functions return a contingency table().
Other chopping functions: chop(), chop_equally(), chop_evenly(), chop_fn(), chop_mean_sd(), chop_proportions(), chop_quantiles(), chop_width(), fillet()
chop_n(1:10, 5)

chop_n(1:5, 2)
chop_n(1:5, 2, tail = "merge")

# too many duplicates
x <- rep(1:2, each = 3)
chop_n(x, 2)

tab_n(1:10, 5)

# fewer elements in one group
tab_n(1:10, 4)
chop_pretty() uses base::pretty() to calculate breakpoints which are 1, 2 or 5 times a power of 10. These look nice in graphs.
chop_pretty(x, n = 5, ...)

brk_pretty(n = 5, ...)

tab_pretty(x, n = 5, ...)
x 
A vector. 
n 
Positive integer passed to base::pretty(). 
... 
Passed to 
base::pretty() tries to return n+1 breakpoints, i.e. n intervals, but note that this is not guaranteed. There are methods for Date and POSIXct objects.
For fine-grained control over base::pretty() parameters, use chop(x, brk_pretty(...)).
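To see where the breakpoints come from, it can help to call base::pretty() directly before chopping. The sketch below assumes santoku is installed. high.u.bias is a base::pretty() argument, used here only to show the pass-through.

```r
library(santoku)

x <- 1:10

# base::pretty() picks round breakpoints that cover the range of x
pretty(x, n = 5)

# chop_pretty() chops x at those breakpoints
chop_pretty(x, n = 5)

# Extra base::pretty() arguments go through brk_pretty()
chop(x, brk_pretty(n = 5, high.u.bias = 0))
```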
chop_* functions return a factor of the same length as x.
brk_* functions return a function to create breaks.
tab_* functions return a contingency table().
chop_pretty(1:10)

chop(1:10, brk_pretty(n = 5, high.u.bias = 0))

tab_pretty(1:10)
chop_proportions() chops x into intervals at the given proportions of its range, excluding infinite values.
chop_proportions(x, proportions, ..., raw = TRUE)

brk_proportions(proportions)

tab_proportions(x, proportions, ..., raw = TRUE)
x 
A vector. 
proportions 
Numeric vector between 0 and 1: proportions of x's range.
If 
... 
Passed to 
raw 
Logical. Use raw values in labels? 
By default, labels show the raw numeric endpoints. To label intervals by the proportions, use raw = FALSE.
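The effect of raw on the labels can be sketched as follows, assuming santoku is installed. Both calls produce the same grouping and only the level names differ.

```r
library(santoku)

x <- 0:10

# Default (raw = TRUE): labels show endpoints on the scale of x
raw_labels <- chop_proportions(x, c(0.2, 0.8))

# raw = FALSE: labels show the proportions themselves
prop_labels <- chop_proportions(x, c(0.2, 0.8), raw = FALSE)

levels(raw_labels)
levels(prop_labels)
```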
chop_* functions return a factor of the same length as x.
brk_* functions return a function to create breaks.
tab_* functions return a contingency table().
Other chopping functions: chop(), chop_equally(), chop_evenly(), chop_fn(), chop_mean_sd(), chop_n(), chop_quantiles(), chop_width(), fillet()
chop_proportions(0:10, c(0.2, 0.8))
chop_proportions(0:10, c(Low = 0, Mid = 0.2, High = 0.8))
chop_quantiles() chops data by quantiles. chop_deciles() is a convenience function which chops into deciles.
chop_quantiles(x, probs, ..., left = is.numeric(x), raw = FALSE,
  weights = NULL)

chop_deciles(x, ...)

brk_quantiles(probs, ..., weights = NULL)

tab_quantiles(x, probs, ..., left = is.numeric(x), raw = FALSE)

tab_deciles(x, ...)
x 
A vector. 
probs 
A vector of probabilities for the quantiles. If 
... 
For 
left 
Logical. Left-closed or right-closed breaks? 
raw 
Logical. Use raw values in labels? 
weights 

For non-numeric x, left is set to FALSE by default. This works better for calculating "type 1" quantiles, since they round down. See stats::quantile().
If x contains duplicates, consecutive quantiles may be the same number so that some intervals get merged.
chop_* functions return a factor of the same length as x.
brk_* functions return a function to create breaks.
tab_* functions return a contingency table().
Other chopping functions: chop(), chop_equally(), chop_evenly(), chop_fn(), chop_mean_sd(), chop_n(), chop_proportions(), chop_width(), fillet()
chop_quantiles(1:10, 1:3/4)

chop_quantiles(1:10, c(Q1 = 0, Q2 = 0.25, Q3 = 0.5, Q4 = 0.75))

chop(1:10, brk_quantiles(1:3/4))

chop_deciles(1:10)

# to label by the quantiles themselves:
chop_quantiles(1:10, 1:3/4, raw = TRUE)

# duplicates:
tab_quantiles(c(1, 1, 1, 2, 3), 1:5/5)

set.seed(42)
tab_quantiles(rnorm(100), probs = 1:3/4, raw = TRUE)
chop_width() chops x into intervals of fixed width.
chop_width(x, width, start, ..., left = sign(width) > 0)

brk_width(width, start)

## Default S3 method:
brk_width(width, start)

tab_width(x, width, start, ..., left = sign(width) > 0)
x 
A vector. 
width 
Width of intervals. 
start 
Starting point for intervals. By default the smallest finite x. 
... 
Passed to 
left 
Logical. Left-closed or right-closed breaks? 
If width is negative, chop_width() sets left = FALSE and intervals will go downwards from start.
chop_* functions return a factor of the same length as x.
brk_* functions return a function to create breaks.
tab_* functions return a contingency table().
Other chopping functions: chop(), chop_equally(), chop_evenly(), chop_fn(), chop_mean_sd(), chop_n(), chop_proportions(), chop_quantiles(), fillet()
chop_width(1:10, 2)

chop_width(1:10, 2, start = 0)

chop_width(1:9, 2)

chop(1:10, brk_width(2, 0))

tab_width(1:10, 2, start = 0)
exactly() duplicates its input. It lets you define singleton intervals like this: chop(x, c(1, exactly(2), 3)). This is the same as chop(x, c(1, 2, 2, 3)) but conveys your intent more clearly.
exactly(x)
x 
A numeric vector. 
The same as rep(x, each = 2).
chop(1:10, c(2, exactly(5), 8))

# same:
chop(1:10, c(2, 5, 5, 8))
fillet() calls chop() with extend = FALSE and drop = FALSE. This ensures that you get only the breaks and labels you ask for. When programming, consider using fillet() instead of chop().
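The reason this matters when programming is that chop()'s levels depend on the data it sees. A minimal sketch, assuming santoku is installed, with illustrative inputs:

```r
library(santoku)

breaks <- c(2, 5, 8)

# chop() adapts its levels to the data
levels(chop(1:10, breaks))
levels(chop(3:4, breaks))

# fillet() returns the same levels whatever the data;
# values outside the breaks become NA
levels(fillet(1:10, breaks))
levels(fillet(3:4, breaks))
fillet(1:10, breaks)
```

Stable levels make results from different inputs easier to combine or compare.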
fillet(x, breaks, labels = lbl_intervals(), left = TRUE,
  close_end = TRUE, raw = NULL)
x 
A vector. 
breaks 
A numeric vector of cutpoints or a function to create cutpoints from x. 
labels 
A character vector of labels or a function to create labels. 
left 
Logical. Left-closed or right-closed breaks? 
close_end 
Logical. Close last break at right? (If left is FALSE, close first break at left?) 
raw 
Logical. Use raw values in labels? 
fillet() returns a factor of the same length as x, representing the intervals containing the value of x.
Other chopping functions: chop(), chop_equally(), chop_evenly(), chop_fn(), chop_mean_sd(), chop_n(), chop_proportions(), chop_quantiles(), chop_width()
fillet(1:10, c(2, 5, 8))
This label style is user-friendly, but doesn't distinguish between left- and right-closed intervals. It's good for continuous data where you don't expect points to be exactly on the breaks.
lbl_dash(symbol = em_dash(), fmt = NULL, single = "{l}", first = NULL,
  last = NULL, raw = FALSE)
symbol 
String: symbol to use for the dash. 
fmt 
String, list or function. A format for break endpoints. 
single 
Glue string: label for singleton intervals. See 
first 
Glue string: override label for the first category. Write e.g.

last 
String: override label for the last category. Write e.g.

raw 
. Use the 
If you don't want unicode output, use lbl_dash("-").
A function that creates a vector of labels.
If fmt is not NULL then it is used to format the endpoints.
If fmt is a string, then numeric endpoints will be formatted by sprintf(fmt, breaks); other endpoints, e.g. Date objects, will be formatted by format(breaks, fmt).
If fmt is a list, then it will be used as arguments to format.
If fmt is a function, it should take a vector of numbers (or other objects that can be used as breaks) and return a character vector. It may be helpful to use functions from the {scales} package, e.g. scales::label_comma().
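The three kinds of fmt can be sketched as follows, assuming santoku is installed. The helper thousands() is hypothetical, written here for illustration, and the data are made up.

```r
library(santoku)

x <- c(1500, 4200, 8800)
breaks <- c(3000, 7000)

# A string: numeric endpoints are formatted by sprintf()
chop(x, breaks, lbl_dash(" to ", fmt = "%.1f"))

# A list: used as arguments to format()
chop(x, breaks, lbl_dash(" to ", fmt = list(big.mark = ",")))

# A function: takes the endpoints and returns a character vector
thousands <- function(b) paste0(b / 1000, "k")  # hypothetical helper
chop(x, breaks, lbl_dash(" to ", fmt = thousands))
```

A function gives the most control, at the cost of having to handle every endpoint type you may pass in.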
Other labelling functions: lbl_discrete(), lbl_endpoints(), lbl_glue(), lbl_intervals(), lbl_manual(), lbl_midpoints(), lbl_seq()
chop(1:10, c(2, 5, 8), lbl_dash())

chop(1:10, c(2, 5, 8), lbl_dash(" to ", fmt = "%.1f"))

chop(1:10, c(2, 5, 8), lbl_dash(first = "<{r}"))

pretty <- function (x) prettyNum(x, big.mark = ",", digits = 1)
chop(runif(10) * 10000, c(3000, 7000), lbl_dash(" to ", fmt = pretty))
lbl_discrete() creates labels for discrete data, such as integers. For example, breaks c(1, 3, 4, 6, 7) are labelled: "1-2", "3", "4-5", "6-7".
lbl_discrete(symbol = em_dash(), unit = 1, fmt = NULL, single = NULL,
  first = NULL, last = NULL)
symbol 
String: symbol to use for the dash. 
unit 
Minimum difference between distinct values of data. For integers, 1. 
fmt 
String, list or function. A format for break endpoints. 
single 
Glue string: label for singleton intervals. See 
first 
Glue string: override label for the first category. Write e.g.

last 
String: override label for the last category. Write e.g.

No check is done that the data are discrete-valued. If they are not, then these labels may be misleading. Here, discrete-valued means that if x < y, then x <= y - unit.
Be aware that Date objects may have non-integer values. See Date.
A function that creates a vector of labels.
If fmt is not NULL then it is used to format the endpoints.
If fmt is a string, then numeric endpoints will be formatted by sprintf(fmt, breaks); other endpoints, e.g. Date objects, will be formatted by format(breaks, fmt).
If fmt is a list, then it will be used as arguments to format.
If fmt is a function, it should take a vector of numbers (or other objects that can be used as breaks) and return a character vector. It may be helpful to use functions from the {scales} package, e.g. scales::label_comma().
Other labelling functions: lbl_dash(), lbl_endpoints(), lbl_glue(), lbl_intervals(), lbl_manual(), lbl_midpoints(), lbl_seq()
tab(1:7, c(1, 3, 5), lbl_discrete())

tab(1:7, c(3, 5), lbl_discrete(first = "<= {r}"))

tab(1:7 * 1000, c(1, 3, 5) * 1000, lbl_discrete(unit = 1000))

# Misleading labels for non-integer data
chop(2.5, c(1, 3, 5), lbl_discrete())
This is useful when the left endpoint unambiguously indicates the interval. In other cases it may give errors due to duplicate labels.
lbl_endpoints(left = TRUE, fmt = NULL, single = NULL, first = NULL,
  last = NULL, raw = FALSE)

lbl_endpoint(fmt = NULL, raw = FALSE, left = TRUE)
left 
Flag. Use left endpoint or right endpoint? 
fmt 
String, list or function. A format for break endpoints. 
single 
Glue string: label for singleton intervals. See 
first 
Glue string: override label for the first category. Write e.g.

last 
String: override label for the last category. Write e.g.

raw 
. Use the 
lbl_endpoint() is defunct and gives an error since santoku 1.0.0.
A function that creates a vector of labels.
If fmt is not NULL then it is used to format the endpoints.
If fmt is a string, then numeric endpoints will be formatted by sprintf(fmt, breaks); other endpoints, e.g. Date objects, will be formatted by format(breaks, fmt).
If fmt is a list, then it will be used as arguments to format.
If fmt is a function, it should take a vector of numbers (or other objects that can be used as breaks) and return a character vector. It may be helpful to use functions from the {scales} package, e.g. scales::label_comma().
Other labelling functions: lbl_dash(), lbl_discrete(), lbl_glue(), lbl_intervals(), lbl_manual(), lbl_midpoints(), lbl_seq()
chop(1:10, c(2, 5, 8), lbl_endpoints(left = TRUE))

chop(1:10, c(2, 5, 8), lbl_endpoints(left = FALSE))

if (requireNamespace("lubridate")) {
  tab_width(
    as.Date("2000-01-01") + 0:365,
    months(1),
    labels = lbl_endpoints(fmt = "%b")
  )
}

## Not run:
# This gives breaks `[1, 2) [2, 3) {3}` which lead to
# duplicate labels `"2", "3", "3"`:
chop(1:3, 1:3, lbl_endpoints(left = FALSE))

## End(Not run)
Labels using the glue package: use "{l}" and "{r}" to show the left and right endpoints of the intervals.
lbl_glue(label, fmt = NULL, single = NULL, first = NULL, last = NULL,
  raw = FALSE, ...)
label 
A glue string passed to glue::glue(). 
fmt 
String, list or function. A format for break endpoints. 
single 
Glue string: label for singleton intervals. See 
first 
Glue string: override label for the first category. Write e.g.

last 
String: override label for the last category. Write e.g.

raw 
. Use the 
... 
Further arguments passed to 
The following variables are available in the glue string:
l is a character vector of left endpoints of intervals.
r is a character vector of right endpoints of intervals.
l_closed is a logical vector. Elements are TRUE when the left endpoint is closed.
r_closed is a logical vector, TRUE when the right endpoint is closed.
Endpoints will be formatted by fmt before being passed to glue().
A function that creates a vector of labels.
If fmt is not NULL then it is used to format the endpoints.
If fmt is a string, then numeric endpoints will be formatted by sprintf(fmt, breaks); other endpoints, e.g. Date objects, will be formatted by format(breaks, fmt).
If fmt is a list, then it will be used as arguments to format().
If fmt is a function, it should take a vector of numbers (or other objects that can be used as breaks) and return a character vector. It may be helpful to use functions from the {scales} package, e.g. scales::label_comma().
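The three forms of fmt can be compared side by side. This is a sketch under assumed inputs: the data, the breaks and the format strings are chosen for illustration only.

```r
library(santoku)

x <- c(0.5, 1.5, 2.5)

# fmt as a string: numeric endpoints are formatted with sprintf().
by_string <- chop(x, c(1, 2),
                  labels = lbl_glue("{l} to {r}", fmt = "%.2f"))

# fmt as a list: the elements are used as arguments to format().
by_list <- chop(x, c(1, 2),
                labels = lbl_glue("{l} to {r}", fmt = list(nsmall = 1)))

# fmt as a function: takes the breaks, returns a character vector.
add_commas <- function(b) prettyNum(b, big.mark = ",")
by_function <- chop(x * 1000, c(1000, 2000),
                    labels = lbl_glue("{l} to {r}", fmt = add_commas))

levels(by_string)
levels(by_list)
levels(by_function)
```

A function gives the most control; a string is usually enough for fixed decimal places or date formats.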
Other labelling functions: lbl_dash(), lbl_discrete(), lbl_endpoints(), lbl_intervals(), lbl_manual(), lbl_midpoints(), lbl_seq()
tab(1:10, c(1, 3, 3, 7),
    labels = lbl_glue("{l} to {r}", single = "Exactly {l}"))

tab(1:10 * 1000, c(1, 3, 5, 7) * 1000,
    labels = lbl_glue("{l}-{r}",
                      fmt = function(x) prettyNum(x, big.mark = ",")))

# reproducing lbl_intervals():
interval_left <- "{ifelse(l_closed, '[', '(')}"
interval_right <- "{ifelse(r_closed, ']', ')')}"
glue_string <- paste0(interval_left, "{l}", ", ", "{r}", interval_right)
tab(1:10, c(1, 3, 3, 7),
    labels = lbl_glue(glue_string, single = "{{{l}}}"))
These labels are the most exact, since they show you whether intervals are "closed" or "open", i.e. whether they include their endpoints.
lbl_intervals( fmt = NULL, single = "{{{l}}}", first = NULL, last = NULL, raw = FALSE )
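fmt
String, list or function. A format for break endpoints.
single
Glue string: label for singleton intervals. See lbl_glue() for details.
first
Glue string: override label for the first category. Write e.g. first = "<{r}" to create a label like "<18". See lbl_glue() for details.
last
String: override label for the last category. Write e.g. last = ">{l}" to create a label like ">65". See lbl_glue() for details.
raw
Deprecated. Use the raw argument to chop() instead.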
Mathematical set notation looks like this:
[a, b]: all numbers x where a <= x <= b;
(a, b): all numbers where a < x < b;
[a, b): all numbers where a <= x < b;
(a, b]: all numbers where a < x <= b;
{a}: just the number a exactly.
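To see this notation on real output, the sketch below chops a small integer vector with the default labels, first with ordinary breaks and then with a repeated break that creates a singleton interval. The input 1:5 and the breaks are illustrative choices.

```r
library(santoku)

# Ordinary breaks: intervals are left-closed by default, and the
# last interval is closed on the right so the maximum is included.
plain <- chop(1:5, c(2, 4))
levels(plain)

# Repeating a break creates a singleton interval for that value.
with_singleton <- chop(1:5, c(2, 2, 4))
levels(with_singleton)
table(with_singleton)
```

In the second call, values exactly equal to 2 get their own {2} level, and the interval that follows it is open on the left.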
A function that creates a vector of labels.
If fmt is not NULL then it is used to format the endpoints.
If fmt is a string, then numeric endpoints will be formatted by sprintf(fmt, breaks); other endpoints, e.g. Date objects, will be formatted by format(breaks, fmt).
If fmt is a list, then it will be used as arguments to format().
If fmt is a function, it should take a vector of numbers (or other objects that can be used as breaks) and return a character vector. It may be helpful to use functions from the {scales} package, e.g. scales::label_comma().
Other labelling functions: lbl_dash(), lbl_discrete(), lbl_endpoints(), lbl_glue(), lbl_manual(), lbl_midpoints(), lbl_seq()
tab(-10:10, c(-3, 0, 0, 3),
    labels = lbl_intervals())

tab(-10:10, c(-3, 0, 0, 3),
    labels = lbl_intervals(fmt = list(nsmall = 1)))

tab_evenly(runif(20), 10,
    labels = lbl_intervals(fmt = percent))
This uses the midpoint of each interval for its label.
lbl_midpoints( fmt = NULL, single = NULL, first = NULL, last = NULL, raw = FALSE )
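fmt
String, list or function. A format for break endpoints.
single
Glue string: label for singleton intervals. See lbl_glue() for details.
first
Glue string: override label for the first category. Write e.g. first = "<{r}" to create a label like "<18". See lbl_glue() for details.
last
String: override label for the last category. Write e.g. last = ">{l}" to create a label like ">65". See lbl_glue() for details.
raw
Deprecated. Use the raw argument to chop() instead.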
A function that creates a vector of labels.
If fmt is not NULL then it is used to format the endpoints.
If fmt is a string, then numeric endpoints will be formatted by sprintf(fmt, breaks); other endpoints, e.g. Date objects, will be formatted by format(breaks, fmt).
If fmt is a list, then it will be used as arguments to format().
If fmt is a function, it should take a vector of numbers (or other objects that can be used as breaks) and return a character vector. It may be helpful to use functions from the {scales} package, e.g. scales::label_comma().
Other labelling functions: lbl_dash(), lbl_discrete(), lbl_endpoints(), lbl_glue(), lbl_intervals(), lbl_manual(), lbl_seq()
chop(1:10, c(2, 5, 8), lbl_midpoints())
lbl_seq() labels intervals sequentially, using numbers or letters.
lbl_seq(start = "a")
start 
String. A template for the sequence. See below. 
start shows the first element of the sequence. It must contain exactly one character out of the set "a", "A", "i", "I" or "1". For later elements:
"a" will be replaced by "a", "b", "c", ...
"A" will be replaced by "A", "B", "C", ...
"i" will be replaced by lowercase Roman numerals "i", "ii", "iii", ...
"I" will be replaced by uppercase Roman numerals "I", "II", "III", ...
"1" will be replaced by numbers "1", "2", "3", ...
Other characters will be retained as-is.
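The substitution rules above can be sketched in base R. The helper seq_labels() below is hypothetical and written only to illustrate the rules; it is not santoku's implementation, and it assumes at most 26 labels for the letter templates.

```r
# Hypothetical helper, for illustration only: expands a template such
# as "i." or "(A)" into n sequential labels.
seq_labels <- function(start, n) {
  chars <- strsplit(start, "")[[1]]
  pos <- which(chars %in% c("a", "A", "i", "I", "1"))
  # The template must contain exactly one sequence character.
  stopifnot(length(pos) == 1)

  sequence <- switch(chars[pos],
    "a" = letters[seq_len(n)],   # assumes n <= 26
    "A" = LETTERS[seq_len(n)],   # assumes n <= 26
    "i" = tolower(as.character(utils::as.roman(seq_len(n)))),
    "I" = as.character(utils::as.roman(seq_len(n))),
    "1" = as.character(seq_len(n))
  )

  # Characters around the sequence character are kept as-is.
  prefix <- paste(chars[seq_len(pos - 1)], collapse = "")
  suffix <- paste(chars[-seq_len(pos)], collapse = "")
  paste0(prefix, sequence, suffix)
}

seq_labels("i.", 4)
seq_labels("(A)", 3)
```

The real lbl_seq() returns a labelling function to pass to chop(), rather than a character vector.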
A function that creates a vector of labels.
Other labelling functions: lbl_dash(), lbl_discrete(), lbl_endpoints(), lbl_glue(), lbl_intervals(), lbl_manual(), lbl_midpoints()
chop(1:10, c(2, 5, 8), lbl_seq())
chop(1:10, c(2, 5, 8), lbl_seq("i."))
chop(1:10, c(2, 5, 8), lbl_seq("(A)"))
Santoku can handle many non-standard types.
If objects can be compared using <, == etc. then they should be choppable.
Objects which can't be converted to numeric are handled within R code, which may be slower.
Character x and breaks are chopped with a warning.
If x and breaks are not the same type, they should be able to be cast to the same type, usually using vctrs::vec_cast_common().
Not all chopping operations make sense, for example, chop_mean_sd() on a character vector.
For indexed objects such as stats::ts() objects, indices will be dropped from the result.
If you get errors, try setting extend = FALSE (but also file a bug report).
To request support for a type, open an issue on GitHub.
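As a small illustration of chopping a non-numeric type, the sketch below cuts a vector of Date objects at a single Date break. The dates are arbitrary values chosen for the example.

```r
library(santoku)

# Ten consecutive days, cut at a single Date break.
dates <- as.Date("2000-01-01") + 0:9
by_date <- chop(dates, as.Date("2000-01-05"))

# The result is an ordinary factor with one level per interval.
table(by_date)
```

Here x and breaks share the same class, so no casting is needed.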
See also: brk-width-for-Datetime
percent() formats x as a percentage.
For a wider range of formatters, consider the scales package.
percent(x)
x 
Numeric values. 
x formatted as a percent.
percent(0.5)