HVT: Collection of functions used to build hierarchical topology preserving maps

Zubin Dowlaty

2023-11-17

1 Abstract

The HVT package is a collection of R functions that facilitate building topology preserving maps for rich multivariate data analysis, with an emphasis on big data, that is, data sets with a large number of rows. Figure 1 shows an example of a 2D torus map generated with the package. The functions are organized around the following typical workflow:

  1. Data Compression: Vector quantization (VQ) or hierarchical vector quantization (HVQ), using means or medians. This step compresses the rows (long data frame) using a compression objective.

  2. Data Projection: Dimension projection of the compressed cells to 1D, 2D or 3D with Sammon's non-linear mapping algorithm. This step creates the coordinates of the topology preserving map (also called an embedding) in the desired output dimension.

  3. Tessellation: Create the cells required for object visualization using the Voronoi tessellation method. The package includes heatmap plots for hierarchical Voronoi tessellations (HVT). This step enables data insights, visualization, and interaction with the topology preserving map, which is useful for semi-supervised tasks.

  4. Prediction: Scoring new data sets and recording their assignment using the map objects from the above steps, in a sequence of maps if required.
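The workflow above can be illustrated with a minimal sketch in base R. This is not the HVT package's implementation: it assumes stats::kmeans() as a stand-in vector quantizer and MASS::sammon() for the projection, and the toy data, the cell count of 15, and the helper score_rows() are hypothetical choices made only for this example. The tessellation step is not computed here.

```r
library(MASS)  # provides sammon(); MASS ships with standard R distributions

set.seed(42)

# Toy data (assumed for illustration): 500 rows, 3 scaled numeric columns
x <- scale(matrix(rnorm(1500), ncol = 3,
                  dimnames = list(NULL, c("x", "y", "z"))))

# 1. Data compression: k-means used as a simple vector quantizer
n_cells <- 15
vq <- kmeans(x, centers = n_cells, nstart = 5, iter.max = 50)
centroids <- vq$centers  # one row per cell

# 2. Data projection: Sammon's mapping of the cell centroids to 2D
proj <- sammon(dist(centroids), k = 2, trace = FALSE)
coords <- proj$points  # one 2D coordinate pair per cell

# 4. Prediction: assign each new row to the cell with the nearest centroid
#    (score_rows is a hypothetical helper, not an HVT function)
score_rows <- function(new_x, centroids) {
  apply(new_x, 1, function(row) {
    unname(which.min(colSums((t(centroids) - row)^2)))
  })
}

new_x <- x[1:10, , drop = FALSE]
cell_id <- score_rows(new_x, centroids)
```

Here coords holds the 2D map positions of the cells and cell_id records which cell each scored row falls into. A Voronoi tessellation of coords would then draw the cell boundaries on the map.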

The tessellations produced by the HVT package give a visual summary of a topology preserving map. The image below shows a tessellation of a torus; see the vignette for more details.

Figure 1: The Voronoi tessellation for layer 1 with 900 cells, with the heat map overlaid for variable z.

2 Version History

2.1 HVT (v23.11.01) | What’s New?

17th November, 2023

In this version of the HVT package, the following new feature has been introduced:

The package can now predict cells with layers based on a sequence of maps using predictLayerHVT().

2.2 HVT (v22.12.06)

6th December, 2022

This package provides functionality to predict based on a sequence of maps.

Creating a predictive set of maps involves three steps:

  1. Compress: Compress the dataset with the HVT() function, specifying a percentage compression rate and a quantization threshold (Map A).
  2. Remove novelty cells: Manually identify the novelty cells and remove them from the dataset using the removeNovelty() function (Map B).
  3. Compress the dataset without novelty: Compress the dataset without novelty again with the HVT() function, specifying n_cells, depth and a quantization threshold (Map C).

The diagram below illustrates these steps:

Figure 2: Flow diagram for predicting based on a sequence of maps using predictLayerHVT()
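The sequence of maps can also be sketched in base R. The sketch below is an illustration of the idea only and does not call HVT(), removeNovelty() or predictLayerHVT(). It assumes stats::kmeans() as the compressor, replaces the manual identification of novelty cells with a hypothetical distance rule, and routes new rows through Map A to either Map B or Map C. That routing logic, the toy data, and the helpers nearest_cell() and predict_layers() are assumptions of this example.

```r
set.seed(7)

# Toy data (assumed): a dense bulk near the origin plus six far-away rows
bulk     <- matrix(rnorm(400), ncol = 2)
outliers <- matrix(rnorm(12, mean = 8, sd = 0.3), ncol = 2)
x <- rbind(bulk, outliers)

# Hypothetical helper: index of the nearest centroid for each row
nearest_cell <- function(new_x, centroids) {
  apply(new_x, 1, function(row) {
    unname(which.min(colSums((t(centroids) - row)^2)))
  })
}

# Map A: compress the full data set; evenly spaced rows serve as fixed
# initial centers so that the run is reproducible
init_rows <- unique(round(seq(1, nrow(x), length.out = 20)))
map_a <- kmeans(x, centers = x[init_rows, ], iter.max = 50)

# Flag novelty cells. The README describes a manual step; the rule
# "centroid farther than 5 from the origin" is a stand-in for this sketch.
novelty_cells <- which(sqrt(rowSums(map_a$centers^2)) > 5)
is_novel_row  <- map_a$cluster %in% novelty_cells

# Map B: compress the rows that fell into novelty cells
map_b <- kmeans(x[is_novel_row, , drop = FALSE], centers = 2, nstart = 5)

# Map C: compress the remaining rows again
map_c <- kmeans(x[!is_novel_row, , drop = FALSE], centers = 10,
                nstart = 5, iter.max = 50)

# Score new rows: Map A decides the layer, Map B or Map C gives the cell
predict_layers <- function(new_x) {
  cell_a   <- nearest_cell(new_x, map_a$centers)
  is_novel <- cell_a %in% novelty_cells
  cell     <- integer(nrow(new_x))
  if (any(is_novel)) {
    cell[is_novel] <- nearest_cell(new_x[is_novel, , drop = FALSE],
                                   map_b$centers)
  }
  if (any(!is_novel)) {
    cell[!is_novel] <- nearest_cell(new_x[!is_novel, , drop = FALSE],
                                    map_c$centers)
  }
  data.frame(cell_a = cell_a, layer = ifelse(is_novel, "B", "C"), cell = cell)
}

pred <- predict_layers(rbind(c(8, 8), c(0, 0)))
```

Each row of pred records the Map A cell, the layer the row was routed to, and the cell within that layer. In this toy setup a row near the far-away group is scored against Map B, while a row near the origin is scored against Map C.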

3 Installation of HVT (v23.11.01)

# install.packages("devtools")  # run once if devtools is not yet installed
library(devtools)
devtools::install_github(repo = "Mu-Sigma/HVT")
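After installation, a quick check along the following lines can confirm that the package is available. This is a small sketch that uses only base R utilities; the variable name hvt_installed is chosen for this example.

```r
# Check whether the HVT package can be loaded, without attaching it
hvt_installed <- requireNamespace("HVT", quietly = TRUE)

if (hvt_installed) {
  # Report the installed version so it can be compared with the release notes
  message("HVT version: ", as.character(utils::packageVersion("HVT")))
} else {
  message("HVT is not installed; run the install_github() call above.")
}
```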

4 Vignettes

Following are the links to the vignettes for the HVT package:

4.1 HVT Vignette

HVT Vignette: Contains descriptions of the functions used for vector quantization and construction of hierarchical Voronoi tessellations for data analysis.

4.2 HVT Model Diagnostics Vignette

HVT Model Diagnostics Vignette: Contains descriptions of the functions used to perform model diagnostics and validation for the HVT model.

4.3 HVT - Predicting Cells with Layers using predictLayerHVT

HVT - Predicting Cells with Layers using predictLayerHVT: Contains descriptions of the functions used for predicting cells with layers based on a sequence of maps.