In stability selection and consensus clustering, resampling techniques are used to enhance the reliability of the results. In this package, hyper-parameters are calibrated by maximising model stability, which is measured under the null hypothesis that all selection (or co-membership) probabilities are identical. Functions are readily implemented for the use of LASSO regression, sparse PCA, sparse (group) PLS or graphical LASSO in stability selection, and hierarchical clustering, partitioning around medoids, K means or Gaussian mixture models in consensus clustering.

The released version of the package can be installed from CRAN with:

`install.packages("sharp")`

The development version can be installed from GitHub:

`remotes::install_github("barbarabodinier/sharp")`

To illustrate the use of the main functions implemented in **sharp**,
four artificial datasets are created:

```
library(sharp)

# Dataset for regression
set.seed(1)
data_reg <- SimulateRegression(n = 200, pk = 10)
x_reg <- data_reg$xdata
y_reg <- data_reg$ydata

# Dataset for structural equation modelling
set.seed(1)
data_sem <- SimulateStructural(n = 200, pk = c(5, 2, 3))
x_sem <- data_sem$data

# Dataset for graphical modelling
set.seed(1)
data_ggm <- SimulateGraphical(n = 200, pk = 20)
x_ggm <- data_ggm$data

# Dataset for clustering
set.seed(1)
data_clust <- SimulateClustering(n = c(10, 10, 10))
x_clust <- data_clust$data
```

Check out the R package **fake**
for more details on these data simulation models.

In a regression context, stability selection is done using LASSO
regression as implemented in the R package **glmnet**.

```
stab_reg <- VariableSelection(xdata = x_reg, ydata = y_reg)
SelectedVariables(stab_reg)
```

In a structural equation modelling context, stability selection is
done using a series of LASSO regressions as implemented in the R package
**glmnet**.

```
dag <- LayeredDAG(layers = c(5, 2, 3))
stab_sem <- StructuralEquations(xdata = x_sem, adjacency = dag)
LinearSystemMatrix(vect = Stable(stab_sem), adjacency = dag)
```

In a graphical modelling context, stability selection is done using
the graphical LASSO as implemented in the R package **glassoFast**.

```
stab_ggm <- GraphicalModel(xdata = x_ggm)
Adjacency(stab_ggm)
```

Consensus clustering is done using hierarchical clustering as
implemented in the R package **stats**.

```
stab_clust <- Clustering(xdata = x_clust)
Clusters(stab_clust)
```
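To give an intuition for what consensus clustering aggregates, the following base-R sketch computes co-membership proportions over subsamples and clusters them. It is an illustration only and not the implementation used in **sharp**: the simulated data, the number of subsamples `K`, the subsample proportion `tau` and the fixed number of clusters `nc` are all assumptions made for this example, and no calibration is performed.

```r
set.seed(1)
# Three well-separated groups of 10 observations in 2 dimensions (toy data)
x <- rbind(
  matrix(rnorm(20, mean = 0), ncol = 2),
  matrix(rnorm(20, mean = 10), ncol = 2),
  matrix(rnorm(20, mean = 20), ncol = 2)
)
n <- nrow(x)
K <- 100   # number of subsamples (assumed value)
tau <- 0.5 # subsample size as a proportion of n (assumed value)
nc <- 3    # number of clusters, fixed here instead of calibrated

together <- matrix(0, n, n) # times a pair falls in the same cluster
sampled <- matrix(0, n, n)  # times a pair is drawn in the same subsample
for (iter in seq_len(K)) {
  ids <- sample(n, size = floor(tau * n))
  # Hierarchical clustering of the subsample, cut into nc clusters
  groups <- cutree(hclust(dist(x[ids, ])), k = nc)
  together[ids, ids] <- together[ids, ids] + outer(groups, groups, "==")
  sampled[ids, ids] <- sampled[ids, ids] + 1
}

# Co-membership proportions, guarding against pairs never drawn together
coprop <- together / pmax(sampled, 1)

# Consensus clusters: cluster the items using 1 - co-membership as distance
consensus <- cutree(hclust(as.dist(1 - coprop)), k = nc)
table(consensus)
```

Pairs of items that are consistently grouped together across subsamples get a co-membership proportion close to one, which is what makes the final partition less sensitive to any single run of the clustering algorithm.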

It is strongly recommended to check the calibration of the
hyper-parameters using the function `CalibrationPlot()` on
the output from any of the main functions listed above. The functions
`print()`, `summary()` and `plot()` can
also be used on the outputs from the main functions.
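As a minimal sketch of this check on the regression example, the calls below inspect a `VariableSelection()` output. The data are re-simulated so the block runs on its own; `Argmax()` and `SelectionProportions()` are used here on the assumption that you want to see the calibrated hyper-parameters and the corresponding selection proportions alongside the plot.

```r
library(sharp)

# Simulated regression data, as in the regression example above
set.seed(1)
data_reg <- SimulateRegression(n = 200, pk = 10)
stab_reg <- VariableSelection(xdata = data_reg$xdata, ydata = data_reg$ydata)

# Stability score across the explored hyper-parameter values
CalibrationPlot(stab_reg)

# Calibrated hyper-parameters and corresponding selection proportions
Argmax(stab_reg)
SelectionProportions(stab_reg)

# Generic methods for a quick overview of the results
print(stab_reg)
summary(stab_reg)
plot(stab_reg)
```

The same calls apply to the outputs of the other main functions, for example `CalibrationPlot(stab_ggm)`.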

Stability selection and consensus clustering can theoretically be
done by aggregating the results from any selection (or clustering)
algorithm on subsamples of the data. The choice of the underlying
algorithm to use is specified in argument `implementation` in
the main functions. Consensus clustering using partitioning around
medoids, K means or Gaussian mixture models is also supported in **sharp**:

```
stab_clust <- Clustering(xdata = x_clust, implementation = PAMClustering)
stab_clust <- Clustering(xdata = x_clust, implementation = KMeansClustering)
stab_clust <- Clustering(xdata = x_clust, implementation = GMMClustering)
```
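The idea of aggregating any selection algorithm over subsamples can be sketched in base R as follows. This is an illustration of the principle only, not the code run by **sharp**: the selection rule `SelectByCorrelation()` is a hypothetical helper written for this example, and the thresholds, the number of subsamples `K` and the subsample proportion `tau` are fixed by hand rather than calibrated.

```r
set.seed(1)
n <- 200
p <- 10
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(x) <- paste0("var", seq_len(p))
# Toy outcome: only the first two predictors contribute
y <- 2 * x[, 1] - 2 * x[, 2] + rnorm(n)

# Hypothetical selection rule: keep predictors whose absolute
# marginal correlation with the outcome reaches a threshold
SelectByCorrelation <- function(x, y, threshold) {
  as.numeric(abs(cor(x, y)[, 1]) >= threshold)
}

K <- 100   # number of subsamples (assumed value)
tau <- 0.5 # subsample size as a proportion of n (assumed value)
selected <- matrix(0, nrow = K, ncol = p, dimnames = list(NULL, colnames(x)))
for (iter in seq_len(K)) {
  ids <- sample(n, size = floor(tau * n))
  selected[iter, ] <- SelectByCorrelation(x[ids, ], y[ids], threshold = 0.3)
}

# Selection proportion: share of subsamples in which a predictor is selected
selprop <- colMeans(selected)

# Stably selected predictors: selection proportion above a hand-picked threshold
stable <- names(selprop)[selprop >= 0.9]
print(selprop)
print(stable)
```

Any function that returns a selection status per variable could replace `SelectByCorrelation()` in this loop, which is the flexibility that the `implementation` argument exposes.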

Other algorithms can be used by defining a wrapper function to be
called in `implementation`. Check out the documentation of
`GraphicalModel()` for an example using a shrunk estimate of
the partial correlation instead of the graphical LASSO.

Barbara Bodinier, Dragana Vuckovic, Sabrina Rodrigues, Sarah Filippi, Julien Chiquet and Marc Chadeau-Hyam. Automated calibration of consensus weighted distance-based clustering approaches using sharp. (2023) Bioinformatics.

Barbara Bodinier, Sarah Filippi, Therese Haugdahl Nost, Julien Chiquet and Marc Chadeau-Hyam. Automated calibration for stability selection in penalised regression and graphical models. (2021) Journal of the Royal Statistical Society: Series C (Applied Statistics).

Nicolai Meinshausen and Peter Bühlmann. Stability selection. (2010) Journal of the Royal Statistical Society: Series B (Statistical Methodology).

Stefano Monti, Pablo Tamayo, Jill Mesirov and Todd Golub. Consensus clustering. (2003) Machine Learning.