`flocker`

is an R package for fitting occupancy models
that incorporate sophisticated effects structures using simple
formula-based syntax. `flocker`

is built on R package
`brms`

, which in turn is a front-end for
`Stan`

.

This vignette is intended as a companion to Socolar
& Mills 2023, where we provide details of the models and
post-processing functionality available in `flocker`

in
greater detail. Here, we provide illustrative R code for several types
of model, demonstrating data simulation, model fitting, and model
post-processing. We also showcase the `brms`

syntax that
`flocker`

can use to fit a variety of sophisticated effect
structures.

Socolar & Mills (2023) introduce several terms that figure importantly in this vignette, including:

**closure-unit**: The groupings of observations over which closure is assumed. In single-species models, a closure-unit corresponds to a “site” or “point”. In multi-species models, a closure-unit is a species-site combination. In dynamic (multi-season) models, a closure-unit is a site-season combination (or species-site-season in a multi-species dynamic model).**rep-constant**,**rep-varying**: We refer to models that assume constant detection probabilities across repeat visits within closure-units as*rep-constant models*, as contrasted with*rep-varying models*that incorporate event-specific detection covariates. It turns out that rep-constant models enable a more efficient parametrization of the likelihood than rep-varying models.**unit covariates**,**event covariates**: We refer to any covariate that does not vary across sampling events within closure-units as a “unit covariate”. This includes covariates that are intrinsically properties of single closure-units (e.g. the elevations of sites in a single-species model), covariates that are intrinsically properties of groups of closure units (e.g. elevations of sites in a multi-species model), and covariates that are intrinsically properties of sampling events but happen to be constant within all closure-units (e.g. observer in a sampling design where every site is visited by exactly one observer). We refer to any covariate that varies across sampling events within covariates as an “event covariate”. Note that while unit covariates may appear in either the occupancy or the detection formula, event covariates are restricted to the detection formula. Models that incorporate event covariates are*rep-varying*(see above); those that do not are*rep-constant*.

Installation instructions are available here. To request features or report bugs (much appreciated!), please open an issue on GitHub.

To make `flocker`

and `brms`

functions globally
available within an R session run:

General purpose data simulation is provided via
`simulate_flocker_data()`

, which by default will simulate a
dataset with 30 species sampled at 50 sites using four replicate surveys
(i.e. a single-season multi-species dataset). Non-default arguments will
simulate example data for other likelihoods, including multi-season and
data-augmented occupancy models.

The simulated data `d`

are in list form, with elements for
the detection/non-detection observations `d$obs`

, unit
covariates `d$unit_covs`

, and event covariates
`d$event_covs`

. `d$obs`

is a matrix where rows are
species-site combinations, columns are replicate visits, and entries are
`1`

(detection), `0`

(nondetection), or
`NA`

(no visit). `d$unit_covs`

is a dataframe
containing covariates that vary across the rows of obs (i.e. by
closure-unit) but not across the columns within any given row (i.e. do
not vary across replicate visits). `event_covs`

is a named
list of matrices, with each matrix having the same dimensions as the
observation matrix. Each list element corresponds to a covariate that
varies across the columns of `d$obs`

(i.e. varies between
replicate visits).

`flock()`

, the main function in `flocker`

for
fitting occupancy models, expects a highly specific data format that we
describe
more fully here. The function `make_flocker_data()`

formats data for use with `flock()`

automatically. For
single-season models, `make_flocker_data()`

takes as input a
matrix or dataframe of detection/non-detection data. Rows represent
closure-units, columns represent repeated sampling events within
closure-units, and entries must be `0`

(nondetection),
`1`

(detection), or `NA`

(no corresponding
sampling event). The data must be formatted so that all `NA`

s
are trailing within their rows. For example, if some units were sampled
four times and other three times, the three sampling events must be
treated as events 1, 2, and 3 (with the fourth event `NA`

)
rather than as events 1, 3, and 4 (with the second event
`NA`

) or any other combination.

Many occupancy models also include covariates that influence
occupancy or detection probabilities. Unit covariates (see Terms and definitions) can be passed
to `make_flocker_data()`

as a dataframe with the same number
of rows as the observation matrix and data in the same order as the rows
of the observation matrix. Columns are covariates, and we recommend
using informative column names. *Event covariates* (see Terms and definitions) can be passed
as a named list of matrices whose elements `[i, j]`

are the
covariate values for the sampling event represented by the corresponding
position of the observation matrix. Again, we recommend using
informative names for the list elements. If the corresponding
observation is `NA`

, then the value of the event covariate
does not matter.

To pass data to `flocker`

, we first pass the output from
`simulate_flocker_data()`

to
`make_flocker_data()`

, which will repackage data and apply
the necessary formatting:

```
fd_rep_varying <- make_flocker_data(
obs = d$obs,
unit_covs = d$unit_covs,
event_covs = d$event_covs
)
#> Formatting data for a single-season occupancy model. For details, see make_flocker_data_static. All warnings and error messages should be interpreted in the context of make_flocker_data_static
```

The function `make_flocker_data()`

outputs an object of
class `flocker_data`

that we can pass to flocker’s model
fitting function `flock()`

. Note that this is the general
workflow users will need to follow with real data. Alternative inputs to
`make_flocker_data()`

and `flock()`

enable the
user to readily fit multi-season models as well as multi-species models
with data augmentation (see below).

To fit a model, in this case a single-season multi-species occupancy
model, we use the function `flock()`

. By supplying different
arguments to this function, all flavors of occupancy model available in
`flocker`

can be fitted. Formulas for the different
distributional parameters in the model (occupancy, detection,
colonization, extinction, and autologistic terms as applicable) are
provided as one-sided formulas to the relevant arguments of
`flock()`

(`f_occ`

, `f_det`

,
`f_col`

, `f_ex`

, and `f_auto`

as
applicable).

```
rep_varying <- flock(
f_occ = ~ uc1 + (1 + uc1 | species),
f_det = ~ uc1 + ec1 + (1 + uc1 + ec1 | species),
flocker_data = fd_rep_varying,
cores = 4,
silent = 2,
refresh = 0
)
```

Arguments supplied to `flock()`

define formulas using
`brms`

syntax for the occupancy (`f_occ`

) and
detection (`f_det`

) components, and also provide the
formatted data. At this stage, the full flexibility and power of
`brms`

formula syntax are available to the user (see
following sections for some examples). `rep_varying`

is a
`brmsfit`

object from package `brms`

and also a
`flockerfit`

object from package `flocker`

.
Post-processing functions from `brms`

will typically not work
with this object and are instead replaced by `flocker`

equivalents.

`make_flocker_data()`

will automatically format the data
for a rep-constant model when `event_covs = NULL`

and the
desired model is a single-season model without data augmentation. To
take advantage of the efficiency gains and post-processing functionality
of the rep-constant model, it is necessary to supply
`event_covs = NULL`

to `make_flocker_data()`

at
the moment of data formatting; it is insufficient to omit event
covariates from the detection formula supplied to `flock()`

after formatting the data for a rep-varying model.

```
fd_rep_constant <- make_flocker_data(
obs = d$obs,
unit_covs = d$unit_covs
)
#> Formatting data for a single-season occupancy model. For details, see make_flocker_data_static. All warnings and error messages should be interpreted in the context of make_flocker_data_static
rep_constant <- flock(
f_occ = ~ uc1 + (1 + uc1 | species),
f_det = ~ uc1 + (1 + uc1 | species),
flocker_data = fd_rep_constant,
save_pars = save_pars(all = TRUE), # for loo with moment matching
silent = 2,
refresh = 0,
cores = 4
)
```

Note that within-chain parallelization is available (uniquely so) for the rep-constant mode:

Here we provide code examples to complement the companion
publication. For a more complete vignette on multi-season models in
`flocker`

, see the multiseason
models vignette.

First, we simulate some data that are valid for use with multi-season
models. Here, we will simulate data for three seasons with one unit
covariate and one event covariate. The data will be simulated under a
colonization-extinction model with explicit inits, but we will be able
to fit other models (autologistic, equilibrium inits) to the same data
(note that `simulate_flocker_data()`

can also simulate
directly from these other model types).

```
multi_data <- simulate_flocker_data(
n_season = 3,
n_pt = 300,
n_sp = 1,
multiseason = "colex",
multi_init = "explicit",
seed = 1
)
fd_multi <- make_flocker_data(
multi_data$obs,
multi_data$unit_covs,
multi_data$event_covs,
type = "multi",
quiet = TRUE
)
```

Below, we fit the colonization-extinction model with an explicit model for occupancy in the first timestep. Depending on hardware, fitting this model might take several minutes.

```
multi_colex <- flock(
f_occ = ~ uc1,
f_det = ~ uc1 + ec1,
f_col = ~ uc1,
f_ex = ~ uc1,
flocker_data = fd_multi,
multiseason = "colex",
multi_init = "explicit",
cores = 4,
silent = 2,
refresh = 0
)
```

Here is the colonization-extinction model using equilibrium occupancy probabilities in the first timestep:

```
multi_colex_eq <- flock(
f_det = ~ uc1 + ec1,
f_col = ~ uc1,
f_ex = ~ uc1,
flocker_data = fd_multi,
multiseason = "colex",
multi_init = "equilibrium",
cores = 4,
silent = 2,
refresh = 0
)
```

Here is the autologistic model with explicit occupancy probabilities
in the first timestep. To reflect the stereotypical autologistic model
with a constant logit-scale offset separating colonization and
persistence probabilities, we use the formula `f_auto = ~ 1`

,
but it is fine to relax this constraint and use,
e.g. `f_auto = ~ uc1`

.

```
multi_auto <- flock(
f_occ = ~ uc1,
f_det = ~ uc1 + ec1,
f_col = ~ uc1,
f_auto = ~ 1,
flocker_data = fd_multi,
multiseason = "autologistic",
multi_init = "explicit",
cores = 4,
silent = 2,
refresh = 0
)
```

And the autologistic model with equilibrium occupancy probabilities in the first timestep:

Here we provide a simple example of code for a data augmented model.
For a more complete unified vignette on data-augmented models in
`flocker`

, see the data-augmented
models vignette.

Fitting the data-augmented model in `flocker`

requires
passing the observed data as a three-dimensional array with sites along
the first dimension, visits along the second, and species along the
third. Additionally, we must supply the `n_aug`

argument to
`make_flocker_data()`

, specifying how many all-zero
pseudospecies to augment the data with.

```
augmented_data <- simulate_flocker_data(
augmented = TRUE
)
fd_augmented <- make_flocker_data(
augmented_data$obs, augmented_data$unit_covs, augmented_data$event_covs,
type = "augmented", n_aug = 100,
quiet = TRUE
)
augmented <- flock(
f_occ = ~ (1 | ff_species),
f_det = ~ uc1 + ec1 + (1 + uc1 + ec1 | ff_species),
augmented = TRUE,
flocker_data = fd_augmented,
cores = 4,
silent = 2,
refresh = 0
)
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
```

Here, the random effect of species is specified using the special
grouping keyword `ff_species`

(names beginning with
`ff_`

are reserved in `flocker`

and are not
allowed as names for user-supplied covariates).

`flocker`

provides functions for four main types of
bespoke post-processing for occupancy models.
`fitted_flocker()`

computes (and optionally summarizes)
posterior distributions of fitted values at the locations of the data
used in model fitting or of new data. `get_Z()`

provides the
posterior distribution for the latent occupancy state.
`predict_flocker()`

provides posterior predictions at the
observed points (e.g. for use in posterior predictive checking) or for
new data. `loo_flocker()`

and
`loo_compare_flocker()`

both provide functionality for model
comparison. See below for details on all four types of post-processing.
Both posterior predictions and model comparison rely on subtle aspects
of the occupancy model likelihood that we explain in more detail here.

All post-processing functions from `brms`

work on
single-season rep-constant models, but do not work on any other model
types. For example:

```
predictions_rep_constant <- brms::posterior_predict(rep_constant)
loo_rep_constant <- brms::loo(rep_constant, moment_match = TRUE)
brms::conditional_effects(rep_constant)
```

The following functions work on all model types available in
`flocker`

.

Fitted values for any of the distributional parameter (one or more of
occupancy, detection, colonization, extinction, autologistic, and/or
Omega, the fitted probability that a given (pseudo)species occurs in the
metacommunity) are available via `fitted_flocker`

. For
example:

```
fitted_flocker(rep_constant)
fitted_flocker(rep_varying)
fitted_flocker(multi_colex)
fitted_flocker(augmented)
```

`fitted_flocker`

provides a replacement for
`brms::posterior_linpred()`

. While the
`brms`

-native function executes on any `flocker`

model, it returns in an opaque shape related to the
flocker data format. `fitted_flocker()`

returns in the
shape of the observations passed to `make_flocker_data()`

,
with posterior iterations stacked along its final dimension.

The function `get_Z()`

returns the posterior distribution
of occupancy probabilities across the closure-units. The shape of the
output depends on the class of model, and is an array in the shape of
the first visit in `obs`

as passed to
`make_flocker_data`

, with posterior iterations stacked along
the final dimension. Thus, for a single-season rep-varying model, the
output is a matrix where rows are posterior iterations, columns are
closure-units, and values are draws from the posterior distribution of
occupancy probabilities:

For all model types, `get_Z()`

accepts an optional
`new_data`

argument. Leaving the default
`new_data = NULL`

supplies the posterior for the true
occupancy state at the locations of the data used to fit the model.
Otherwise, the posterior is computed over the new data. For
single-season models, `new_data`

can be supplied as a
dataframe of unit covariate values or as a `flocker_data`

object. For multi-season models, only a `flocker_data`

object
is allowed. Note that if predictions are desired at sites without
observations, it is acceptable to pass an array of dummy observations
(e.g. all zeros) to `make_flocker_data()`

and then to set
`history_condition = FALSE`

in the call to
`get_Z()`

.

`get_Z()`

accepts several additional arguments that
control the way that posterior is obtained and the values returned. See
the companion
paper and `?get_Z`

for details.

The function `predict_flocker()`

provides posterior
predictions. By default, predictions are provided for the covariate data
to which the model were fit, but predictions to new data are also
possible via the `new_data`

argument. The output differs by
model type. For single-season rep-constant models, the return is a
matrix where rows are iterations, columns are units, and values are the
number of detections. For single-season rep-varying models, the return
is an array whose first dimension is units, second dimension is sampling
events, third dimension is iterations, and values are `1`

,
`0`

, or `NA`

, representing detection,
nondetection, and no corresponding sampling event. For example:

`predict_flocker()`

accepts several additional arguments
that control the way that posterior is obtained and the values of
returned. See the companion
paper and `?predict_flocker`

for details.

The most straightforward way to compare models fit with
`flocker`

is the function `loo_compare_flocker()`

.
This function takes a list of flocker_fit objects as its argument and
returns a model comparison table based on the difference in the expected
log predictive density (elpd) between models. This table is a
`compare.loo`

object from `loo::loo_compare()`

.
The “leave-one-out” holdouts consist of entire closure-units
(single-season models), series (multi-season models), or species
(augmented models), not single sampling events (see the companion
paper and here for
details of why).

`loo_compare_flocker()`

accepts as input a list of
`flockerfit`

objects and outputs a model comparison table.
For example, we can compare the rep-constant and rep-varying models that
we fit to the same initial data. Recall that the data were simulated
with event-covariate effects on detection, and as expected the
rep-varying model performs best. Note that we ensure that these
comparisons between rep-constant and rep-varying models are valid by
omitting the binomial coefficient when computing the log-likelihood for
the rep-constant model.

```
loo_compare_flocker(
list(rep_constant, rep_varying)
)
#> Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
#> Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
#> elpd_diff se_diff
#> model2 0.0 0.0
#> model1 -171.1 17.2
```

Likewise, we can compare the four flavors of multi-season model that
we fit above. Recall that the data were simulated under
colonization-extinction dynamics (rather than autologistic) and under
explicit initial occupancy probabilities (rather than equilibrium). As
expected, the `multi_colex`

model performs best:

```
loo_compare_flocker(
list(multi_colex, multi_colex_eq, multi_auto, multi_auto_eq)
)
#> elpd_diff se_diff
#> model1 0.0 0.0
#> model3 -7.8 4.1
#> model2 -27.1 7.2
#> model4 -32.9 7.9
```

Flocker also provides the function `loo_flocker()`

to
return a table of `elpd_loo`

, `p_loo`

, and
`looic`

estimates from `loo::loo()`

or
`brms::loo()`

(the latter for single-season rep-constant
models only).

`brms`

tips and tricksMastering advanced occupancy modeling via `flocker`

is
mostly a matter of mastering the syntax available in `brms`

.
Here are some useful pieces of syntax:

Priors can be implemented as they would with any `brms`

model. Priors can be specified using `set_prior()`

, with
priors specified for groups of parameters (via `class`

) or
individual parameters (via `coef`

). The priors used for a
particular model can be retrieved using
`brms::prior_summary()`

, and the names of the parameters and
their default priors can be displayed prior to model fitting using
`get_flocker_prior()`

which is a drop-in replacement for
`brms::get_prior()`

.

```
get_flocker_prior(
f_occ = ~ uc1 + (1 + uc1 | species),
f_det = ~ uc1 + ec1 + (1 + uc1 + ec1 | species),
flocker_data = fd_rep_varying
)
#> prior class coef group resp dpar nlpar lb ub source
#> (flat) b default
#> (flat) b ec1 (vectorized)
#> (flat) b uc1 (vectorized)
#> lkj(1) cor default
#> lkj(1) cor species (vectorized)
#> student_t(3, 0, 2.5) Intercept default
#> student_t(3, 0, 2.5) sd 0 default
#> student_t(3, 0, 2.5) sd species 0 (vectorized)
#> student_t(3, 0, 2.5) sd ec1 species 0 (vectorized)
#> student_t(3, 0, 2.5) sd Intercept species 0 (vectorized)
#> student_t(3, 0, 2.5) sd uc1 species 0 (vectorized)
#> (flat) b occ default
#> (flat) b uc1 occ (vectorized)
#> (flat) Intercept occ default
#> student_t(3, 0, 2.5) sd occ 0 default
#> student_t(3, 0, 2.5) sd species occ 0 (vectorized)
#> student_t(3, 0, 2.5) sd Intercept species occ 0 (vectorized)
#> student_t(3, 0, 2.5) sd uc1 species occ 0 (vectorized)
brms::prior_summary(rep_varying)
#> prior class coef group resp dpar nlpar lb ub source
#> (flat) b default
#> (flat) b ec1 (vectorized)
#> (flat) b uc1 (vectorized)
#> (flat) b occ default
#> (flat) b uc1 occ (vectorized)
#> student_t(3, 0, 2.5) Intercept default
#> (flat) Intercept occ default
#> lkj_corr_cholesky(1) L default
#> lkj_corr_cholesky(1) L species (vectorized)
#> student_t(3, 0, 2.5) sd 0 default
#> student_t(3, 0, 2.5) sd occ 0 default
#> student_t(3, 0, 2.5) sd species 0 (vectorized)
#> student_t(3, 0, 2.5) sd ec1 species 0 (vectorized)
#> student_t(3, 0, 2.5) sd Intercept species 0 (vectorized)
#> student_t(3, 0, 2.5) sd uc1 species 0 (vectorized)
#> student_t(3, 0, 2.5) sd species occ 0 (vectorized)
#> student_t(3, 0, 2.5) sd Intercept species occ 0 (vectorized)
#> student_t(3, 0, 2.5) sd uc1 species occ 0 (vectorized)
```

Note that in examples like the above, with covariates shared between
both the occupancy and detection model formulas (`uc1`

in
this example), then the prior table will contain two entries associated
with the covariate, one for the parameter governing occupancy and one
for the parameter governing detection. Specifying priors for parameters
in formulas other than detection can be done with reference to the
`dpar`

column, e.g.:

```
user_prior <- c(brms::set_prior("normal(0, 1)", coef = "uc1"),
brms::set_prior("normal(0, 3)", coef = "uc1", dpar = "occ"))
```

where the `uc1`

parameter in the occupancy component is
specified by the addition of the `dpar`

argument, and the
`uc1`

parameter in the detection component is specified
without reference to `dpar`

.

For more on priors in `brms`

, see
`?brms::set_prior`

.

Users should understand the implications of the default
`brms`

behavior to internally center the design matrix, which
affects how the prior on the intercept gets set (see
`?brms::set_prior`

). Here is an example, based on a
single-season rep-varying model, wherein we set a logistic prior on the
value of the intercepts (flat on the probability scale) when all
predictors are held at their means and a moderately regularizing prior
on the coefficients:

```
rep_varying_prior1 <- flock(
f_occ = ~ uc1,
f_det = ~ ec1,
flocker_data = fd_rep_varying,
prior =
brms::set_prior("logistic(0,1)", class = "Intercept") +
brms::set_prior("logistic(0,1)", class = "Intercept", dpar = "occ") +
brms::set_prior("normal(0,2)", class = "b") +
brms::set_prior("normal(0,2)", class = "b", dpar = "occ"),
cores = 4,
silent = 2,
refresh = 0
)
brms::prior_summary(rep_varying_prior1)
#> prior class coef group resp dpar nlpar lb ub source
#> normal(0,2) b user
#> normal(0,2) b ec1 (vectorized)
#> normal(0,2) b occ user
#> normal(0,2) b uc1 occ (vectorized)
#> logistic(0,1) Intercept user
#> logistic(0,1) Intercept occ user
```

Here is an example where we set informative priors on the intercepts when all covariates are fixed to zero and the same moderately regularizing prior on the coefficients:

```
rep_varying_prior2 <- flock(
f_occ = ~ 0 + Intercept + uc1,
f_det = ~ 0 + Intercept + ec1,
flocker_data = fd_rep_varying,
prior =
brms::set_prior("normal(0,2)", class = "b") +
brms::set_prior("normal(0,2)", class = "b", dpar = "occ") +
brms::set_prior("normal(1, 1)", class = "b", coef = "Intercept") +
brms::set_prior("normal(-1, 1)", class = "b", coef = "Intercept", dpar = "occ"),
cores = 4,
silent = 2,
refresh = 0
)
brms::prior_summary(rep_varying_prior2)
#> prior class coef group resp dpar nlpar lb ub source
#> normal(0,2) b user
#> normal(0,2) b ec1 (vectorized)
#> normal(1, 1) b Intercept user
#> normal(0,2) b occ user
#> normal(-1, 1) b Intercept occ user
#> normal(0,2) b uc1 occ (vectorized)
```

Simple formulas follow the same syntax as R’s `lm()`

function. For example:

Simple random effects follow `lme4`

syntax, including
advanced `lme4`

syntax like `||`

for uncorrelated
effects and `/`

and `:`

for expansion of multiple
grouping terms. Here’s a simple example:

When a model includes multiple random effects with the same grouping
term, by default they are modeled as correlated *within* the
occupancy or detection formulas, but as uncorrelated *between*
formulas. For example, the code below estimates a single correlation for
the intercept and slope in the occupancy sub-model.

```
mod3 <- flock(
f_occ = ~ uc1 + (1 + uc1 | species),
f_det = ~ ec1 + (1 | species),
flocker_data = fd_rep_varying
)
```

However, this assumption can easily be relaxed using the
`|<ID>|`

syntax from `brms`

. The
`<ID>`

is an arbitrary character string representing a
group of terms to model as correlated. The below code, for example,
models correlated intercepts in the occupancy and detection sub-models,
and correlated effects of `sc1`

on occupancy and
`vc1`

on detection, but no correlations between the
intercepts and the slopes in either sub-model:

```
mod4 <- flock(
f_occ = ~ uc1 + (1 |g1| species) + (0 + uc1 |g2| species),
f_det = ~ ec1 + (1 |g1| species) + (0 + ec1 |g2| species),
flocker_data = fd_rep_varying
)
```

For more on `brms`

syntax for random effects syntax, see
the documentation
here.

Via `brms`

, `flocker`

supports Gaussian
processes of arbitrary dimensionality (`brms::gp()`

) as well
as `mgcv`

syntax for thin-plate regression splines
(`brms::s()`

) and tensor product smooths
(`brms::t2()`

), and `brms`

syntax for monotonic
effects of ordinal factors via `brms::mo()`

(see
here). For example:

```
mod5 <- flock(
f_occ = ~ s(uc1),
f_det = ~ t2(uc1, ec1),
flocker_data = fd_rep_varying
)
mod6 <- flock(
f_occ = ~ 1,
f_det = ~ gp(uc1, ec1),
flocker_data = fd_rep_varying
)
```

In addition, `brms`

provides the ability to estimate
models wherein the predictors (e.g. for occupancy and detection) are
parametric nonlinear functions whose parameters have their own
covariate-based linear predictors. For more details and an example, see
the nonlinear
models vignette.

Phylogenetic effects can be included by providing a covariance matrix
as a `data2`

argument and using the `brms::gr()`

function to link species identities in `flocker_data`

with
the supplied covariance matrix. Note that phylogenetic effects can be
included in either the occupancy component, the detection component, or
both! In our experience, it can be computationally tractable to include
multiple phylogenetic effects within a single occupancy model (see Mills et al. 2022).

```
# simulate an example phylogeny
phylogeny <- ape::rtree(30, tip.label = paste0("sp_", 1:30))
# calculate covariance matrix
A <- ape::vcv.phylo(phylogeny)
mod8 <- flock(
f_occ = ~ 1 + (1|gr(species, cov = A)),
f_det = ~ 1 + ec1 + (1|species),
flocker_data = fd_rep_varying,
data2 = list(A = A)
)
mod9 <- flock(
f_occ = ~ 1 + (1|gr(species, cov = A)),
f_det = ~ 1 + ec1 + (1|gr(species, cov = A)),
flocker_data = fd_rep_varying,
data2 = list(A = A)
)
```

See
here for further details about specifying phylogenetic effects in
`brms`

.

In addition to spatial Gaussian processes, `brms`

provides
a variety of autoregressive structures, both one-dimensional (see
`brms::ar()`

, `brms::arma()`

) and two-dimensional
(see `brms::car()`

, `brms::sar()`

. See
here for details about conditional autoregressive (CAR) models in
`brms`

, and note that `flock()`

accepts a
`data2`

argument that it can pass to `brms`

as
necessary.

Our principle caution for users is that these autoregressive
structures might lead to degenerate models when applied at the visit
level (in detection formulas) or at the closure-unit level (in
occupancy, colonization, extinction, or autologistic formulas) because
observation-level random effects are often degenerate in regressions
with Bernoulli responses. Thus we recommend applying autoregressive
terms to groupings of multiple visits (detection formula) or multiple
closure-units (other formulas). However, we note that
`flocker`

*does* provide a well-identified
one-dimensional first-order autoregressive structure for occupancy
across closure-units in a single-season model. This is achieved by
co-opting the autologistic parameterization of the multi-season model
and applying it instead to closure-units arranged along a
one-dimensional spatial transect, yielding a one-dimensional analog of a
spatial autologistic occupancy model.

A second caution is to remind users that in multi-species models,
users will likely want to fit separate spatial terms by species (Doser
et al 2022). For Gaussian processes, this can be achieved via the
`by`

argument to `brms::gp()`

. For some
conditional autoregressive structures (those that allow disconnected
islands), this can be achieved by passing a block-diagonal adjacency
matrix wherein species are disconnected components.

We note that gaussian process priors for spatially varying
coefficients are readily achieved via the nonlinear formula syntax of
`brms`

, though they may require large volumes of data to
successfully fit. For more details and an example, see the nonlinear
models vignette.

See
here for relevant `brms`

documentation.

`flock`

will pass any relevant parameters forward to
`brms::brm()`

, giving the user important control over the
algorithmic details of how the model is fit. See `?brms::brm`

for details. To speed up the execution, we recommend supplying the
argument `backend = "cmdstanr"`

. This requires the
`cmdstanr`

package and a working installation of
`cmdstan`

; see
here for instructions to get started and further details.