Rarefy BioTIME data to an equal number of samples per year
Source: R/resamplingBioTIME.R
Takes the output of gridding and applies sample-based
rarefaction to standardise the number of samples per year within each
cell-level time series (i.e. assemblageID).
Usage
resampling(
x,
measure,
resamps = 1L,
conservative = FALSE,
summarise = TRUE,
verbose = TRUE
)
Arguments
- x
(data.frame) BioTIME gridded data to be resampled (in the format of the output of the gridding function).
- measure
(character) currency to be retained during the sample-based rarefaction. Can be either a single column name or a vector of two or more column names.
- resamps
(integer) number of repetitions. Default is 1.
- conservative
(logical). FALSE by default. If TRUE, whenever an NA is found in the measure field(s), the whole sample is removed instead of the missing observations only.
- summarise
(logical). TRUE by default. If FALSE, the function returns abundance and/or biomass summed at the SAMPLE_DESC level (i.e., per sample), rather than per species per year.
- verbose
(logical). TRUE by default. If FALSE, the warnings issued when NA values or one-year-long time series are found in x and excluded are hidden.
Value
Returns a single long form data.frame containing the total
currency or currencies of interest (sum) for each species in each year within
each rarefied time series (i.e. assemblageID). An extra integer column
called resamp indicates the specific iteration.
Details
Sample-based rarefaction prevents temporal variation in sampling
effort from affecting diversity estimates (see Gotelli N.J., Colwell R.K.
2001 Quantifying biodiversity: procedures and pitfalls in the measurement and
comparison of species richness. Ecology Letters 4(4), 379-391) by selecting
an equal number of samples across all years in a time series.
resampling counts the number of unique samples taken in each year
(sampling effort), identifies the minimum number of samples across all years,
and then uses this minimum to randomly resample each year down to that
number. With sampling effort thus standardised between years, standard
biodiversity metrics can be calculated based on an equal number of samples
(e.g. using getAlphaMetrics or getBetaMetrics).
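The counting, minimum and subsampling steps described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data frame, the helper name rarefy_once and the assumption that sample identifiers are unique across years are all invented for illustration.

```r
# Toy data (invented): one time series with 3 samples in 2001 and 2 in 2002.
# Column names mirror BioTIME conventions; sample IDs are assumed to be
# unique across years.
toy <- data.frame(
  YEAR        = c(2001, 2001, 2001, 2001, 2002, 2002, 2002),
  SAMPLE_DESC = c("a", "a", "b", "c", "d", "e", "e"),
  Species     = c("sp1", "sp2", "sp1", "sp2", "sp1", "sp1", "sp2"),
  ABUNDANCE   = c(3, 1, 2, 5, 4, 1, 2)
)

# Hypothetical helper: one pass of sample-based rarefaction.
rarefy_once <- function(df) {
  # 1. Unique samples taken in each year (sampling effort)
  samples_by_year <- lapply(split(df$SAMPLE_DESC, df$YEAR), unique)
  # 2. Minimum number of samples across all years
  min_samp <- min(lengths(samples_by_year))
  # 3. Randomly keep min_samp samples per year, without replacement
  keep <- unlist(lapply(
    samples_by_year,
    function(s) s[sample.int(length(s), min_samp)]
  ))
  rarefied <- df[df$SAMPLE_DESC %in% keep, ]
  # 4. Sum the currency per species per year
  aggregate(ABUNDANCE ~ YEAR + Species, data = rarefied, FUN = sum)
}

set.seed(1)
out <- rarefy_once(toy)
out
```

In this toy case the minimum effort is two samples, so both 2002 samples are always kept, while one of the three 2001 samples is dropped at random.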
measure is a character input specifying the chosen currency to
be used during the sample-based rarefaction. It can be a single column name
or a vector of two or more column names - e.g. for BioTIME,
measure="ABUNDANCE", measure="BIOMASS" or measure =
c("ABUNDANCE", "BIOMASS").
By default, any observations with NA within the currency field(s) are
removed. You can choose to remove the full sample where such observations are
present by setting conservative to TRUE.
resamps can be used to define multiple iterations, effectively creating
multiple alternative datasets, as in each iteration different samples will be
randomly selected for the years where the number of samples exceeds the
minimum. Note that the function always returns a single data frame, i.e. if
resamps > 1, the returned data frame is the concatenation of the individual
data frames from each iteration, each identified by a unique integer in
1:resamps (the resamp column).
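The two behaviours described above (NA handling and stacking of iterations) can be sketched in base R as follows. The toy data, the object names and the stand-in function one_iteration are assumptions made for illustration and do not reproduce the internals of resampling.

```r
# Toy data (invented): sample "a" contains one observation with a missing value.
toy <- data.frame(
  SAMPLE_DESC = c("a", "a", "b", "b"),
  Species     = c("sp1", "sp2", "sp1", "sp2"),
  ABUNDANCE   = c(3, NA, 2, 5)
)

# Default behaviour as described: drop only the observations with NA.
default_rows <- toy[!is.na(toy$ABUNDANCE), ]

# conservative = TRUE as described: drop every sample containing an NA.
bad_samples <- unique(toy$SAMPLE_DESC[is.na(toy$ABUNDANCE)])
conservative_rows <- toy[!toy$SAMPLE_DESC %in% bad_samples, ]

# Multiple iterations: each pass draws samples at random, and the results
# are stacked into one data frame tagged with the iteration number.
resamps <- 3L
one_iteration <- function(i) {
  # Hypothetical stand-in for a single rarefaction pass
  picked <- sample(c("a", "b", "c"), 2)
  data.frame(SAMPLE_DESC = picked, resamp = i)
}
set.seed(42)
all_iter <- do.call(rbind, lapply(seq_len(resamps), one_iteration))
all_iter
```

A stacked result of this shape can be separated back into per-iteration data frames with split(all_iter, all_iter$resamp).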
Examples
if (FALSE) { # \dontrun{
set.seed(42)
x <- gridding(BTsubset_meta, BTsubset_data)
resampling(x, measure = "BIOMASS", summarise = TRUE)
resampling(x, measure = "ABUNDANCE", verbose = FALSE)
resampling(x, measure = c("ABUNDANCE","BIOMASS"))
# With summarise = FALSE, values are summed at the SAMPLE_DESC level (per sample)
resampling(x, measure = "BIOMASS", summarise = FALSE, conservative = FALSE)
} # }