Short Tutorial

In this tutorial, we show how to use ocf to estimate the conditional choice probabilities and the covariates’ marginal effects, and conduct inference about these statistical targets. For illustration purposes, we use the synthetic data set provided in the orf package:

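## Load ocf and the synthetic data from the orf package.
library(ocf)
data(odata, package = "orf")

## Arbitrary seed, so that the random training-test split below is reproducible.
set.seed(1986)
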
y <- as.numeric(odata[, 1])
X <- as.matrix(odata[, -1])

Conditional Probabilities

The ocf function constructs a collection of forests, one for each category of y (three in this case). We can then use the forests to predict out-of-sample conditional probabilities using the predict method. By default, predict returns a matrix with the predicted probabilities and a vector of predicted class labels (each observation is assigned to its highest-probability class).

## Training-test split.
train_idx <- sample(seq_len(length(y)), floor(length(y) * 0.5))

y_tr <- y[train_idx]
X_tr <- X[train_idx, ]

y_test <- y[-train_idx]
X_test <- X[-train_idx, ]

## Fit ocf on training sample. Use default settings.
forests <- ocf(y_tr, X_tr)

## Summary of data and tuning parameters.
summary(forests)

## Out-of-sample predictions.
predictions <- predict(forests, X_test)

table(y_test, predictions$classification)
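
The mapping from predicted probabilities to class labels, and a simple way to score the predictions, can be sketched in base R. This is only an illustration on made-up numbers: the objects probabilities and y_true below are hypothetical stand-ins for predictions$probabilities and y_test, and the tie-breaking rule is an assumption that need not match the one used by ocf.

```r
## Made-up predicted probabilities: five observations, three ordered classes.
probabilities <- matrix(c(0.7, 0.2, 0.1,
                          0.2, 0.5, 0.3,
                          0.1, 0.3, 0.6,
                          0.4, 0.4, 0.2,
                          0.3, 0.3, 0.4),
                        ncol = 3, byrow = TRUE)
y_true <- c(1, 2, 3, 2, 1)

## Assign each observation to its highest-probability class.
## Ties are resolved in favour of the lowest class (an assumption of this sketch).
classification <- max.col(probabilities, ties.method = "first")

## Share of correctly labelled observations.
accuracy <- mean(classification == y_true)

## Mean squared error between predicted probabilities and class indicators.
indicators <- outer(y_true, seq_len(ncol(probabilities)), "==") * 1
mse <- mean(rowSums((indicators - probabilities)^2))

## Confusion table, as in the tutorial.
table(y_true, classification)
```

The accuracy only looks at the labels, while the mean squared error also rewards probabilities that are close to the observed class indicators.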

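We can also implement honesty, which is a necessary condition for asymptotically normal and consistent predictions. Honest forests use different observations to place the splits and to compute the predictions within the leaves. In the following, we set honesty = TRUE to construct honest forests.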

## Honest forests.
honest_forests <- ocf(y_tr, X_tr, honesty = TRUE)
honest_predictions <- predict(honest_forests, X_test)

## Compare predictions with adaptive fit.
cbind(head(predictions$probabilities), head(honest_predictions$probabilities))
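
The idea behind honesty can be sketched with a single split in base R: one half of the sample chooses the cut point, and the other half computes the leaf estimates. This is a toy illustration on simulated data, not the routine that ocf runs internally; all object names are hypothetical.

```r
## Simulated data: the outcome jumps by one unit at x = 0.5.
set.seed(1)
n <- 200
x <- runif(n)
y <- as.numeric(x > 0.5) + rnorm(n, sd = 0.5)

## Split the sample: one half picks the split, the other fills the leaves.
split_idx <- sample(seq_len(n), n / 2)
x_split <- x[split_idx]
y_split <- y[split_idx]
x_est <- x[-split_idx]
y_est <- y[-split_idx]

## Choose the cut point that minimises the within-leaf sum of squares,
## using the splitting half only.
candidates <- sort(unique(x_split))
candidates <- candidates[-length(candidates)] # Keep both leaves non-empty.
sse <- sapply(candidates, function(cut) {
  left <- y_split[x_split <= cut]
  right <- y_split[x_split > cut]
  sum((left - mean(left))^2) + sum((right - mean(right))^2)
})
best_cut <- candidates[which.min(sse)]

## Honest leaf estimates: computed on the half that did not choose the split.
honest_leaf_means <- c(left = mean(y_est[x_est <= best_cut]),
                       right = mean(y_est[x_est > best_cut]))
honest_leaf_means
```

Because the observations that fill the leaves played no role in placing the split, the leaf means are not tuned to the noise in those observations.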

To estimate standard errors for the predicted probabilities, we set inference = TRUE. This also requires setting honesty = TRUE, as the formula for the variance is valid only for honest predictions. Estimating the standard errors considerably slows down the routine. However, we can speed it up by increasing the number of threads used to construct the forests via the n.threads argument.

## Compute standard errors.
honest_forests <- ocf(y_tr, X_tr, honesty = TRUE, inference = TRUE, n.threads = 0) # Use all CPUs.
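
Given predicted probabilities and their standard errors, normal-approximation confidence intervals follow directly. The sketch below uses made-up numbers: the matrices probabilities and standard_errors are hypothetical stand-ins, not fields of the fitted object, and clipping the bounds to the unit interval is a choice of this sketch.

```r
## Made-up predicted probabilities and standard errors for three observations.
probabilities <- matrix(c(0.60, 0.30, 0.10,
                          0.25, 0.50, 0.25,
                          0.05, 0.15, 0.80),
                        ncol = 3, byrow = TRUE)
standard_errors <- matrix(c(0.05, 0.04, 0.03,
                            0.06, 0.07, 0.05,
                            0.04, 0.05, 0.06),
                          ncol = 3, byrow = TRUE)

## Critical value for a 95% two-sided interval.
level <- 0.95
z <- qnorm(1 - (1 - level) / 2)

## Interval bounds, clipped so that they remain valid probabilities.
lower <- pmax(probabilities - z * standard_errors, 0)
upper <- pmin(probabilities + z * standard_errors, 1)

## Bounds side by side for the first class.
cbind(lower = lower[, 1], estimate = probabilities[, 1], upper = upper[, 1])
```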

Covariates’ Marginal Effects

The marginal_effects function post-processes the predictions to estimate mean marginal effects, marginal effects at the mean, or marginal effects at the median, according to the eval argument. In the following, we construct our forests in the training sample and use them to estimate the marginal effects at the mean in the test sample.

## Fit ocf on training sample.
forests <- ocf(y_tr, X_tr)

## Marginal effects at the mean on test sample.
me_atmean <- marginal_effects(forests, data = X_test, eval = "atmean")
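
The three evaluation options differ only in where the effect is computed. The base R sketch below illustrates this with a central finite difference on a known probability function. It is a minimal illustration under stated assumptions: prob_fun is a hypothetical stand-in for a fitted model, the step size h is arbitrary, and the code does not reproduce how marginal_effects computes its estimates.

```r
## Simulated covariates.
set.seed(1)
n <- 500
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))

## Hypothetical stand-in for a fitted model: P(Y = 1 | x) from a logistic curve.
prob_fun <- function(X) plogis(0.8 * X[, "x1"] - 0.5 * X[, "x2"])

## Central finite-difference effect of covariate j at each row of points.
marginal_effect <- function(points, j, h = 0.1) {
  up <- points
  down <- points
  up[, j] <- up[, j] + h
  down[, j] <- down[, j] - h
  (prob_fun(up) - prob_fun(down)) / (2 * h)
}

## One-row matrices holding the evaluation points.
x_mean <- matrix(colMeans(X), nrow = 1, dimnames = list(NULL, colnames(X)))
x_median <- matrix(apply(X, 2, median), nrow = 1, dimnames = list(NULL, colnames(X)))

## Effect of x1 under the three evaluation rules.
at_mean <- marginal_effect(x_mean, "x1")     # Marginal effect at the mean.
at_median <- marginal_effect(x_median, "x1") # Marginal effect at the median.
mean_me <- mean(marginal_effect(X, "x1"))    # Mean marginal effect.

c(at_mean = at_mean, at_median = at_median, mean = mean_me)
```

When the probability function is nonlinear, as here, the effect at the mean and the mean effect generally differ, which is why the choice of evaluation point matters.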

As before, we can set inference = TRUE to estimate the standard errors. Again, this requires the use of honest forests and considerably slows down the routine.

## Honest forests.
honest_forests <- ocf(y_tr, X_tr, honesty = TRUE) # Notice we do not need inference here!

## Compute standard errors.
honest_me_atmean <- marginal_effects(honest_forests, data = X_test, eval = "atmean", inference = TRUE)

print(honest_me_atmean, latex = TRUE)