# Generalized Score Matching on Generalized Domain Types

#### 2020-04-24

library(genscore)
if (!requireNamespace("tmvtnorm", quietly = TRUE))
stop("Please install package \"tmvtnorm\".")
library(tmvtnorm)

# Different domain types

The package offers a few types of domains on which the multivariate distribution is defined, namely "R", "R+", "uniform", "simplex", and "polynomial". As the domain is one of the two building blocks that define a distribution, we first present a guide to creating a domain object that should be passed to the two main functions gen() and estimate() (as the domain argument).

Note that since the probability densities considered in the package are defined with respect to the Lebesgue measure, the package is indifferent to whether the boundary points are included in the domain or not (i.e. $$\sum_ix_i^2<1$$ and $$\sum_ix_i^2\leq 1$$ are treated equally), except for the simplex domains.

Throughout the demonstration in this section, we assume the number of covariates is p=5.

p <- 5

## The entire real space

The most straightforward domain type is "R", which is the entire real space $$\mathbb{R}^p$$.

domain <- make_domain(type="R", p=p)
domain
#> $type #> [1] "R" #> #>$p
#> [1] 5
#>
#> $p_deemed #> [1] 5 #> #>$checked
#> [1] TRUE

## The non-negative orthant of the real space

The second most commonly used domain type would be the non-negative orthant of the $$\mathbb{R}^p$$ space, $$\mathbb{R}_+^p$$. Constructing the domain is also straightforward.

domain <- make_domain(type="R+", p=p)
domain
#> $type #> [1] "R+" #> #>$p
#> [1] 5
#>
#> $p_deemed #> [1] 5 #> #>$checked
#> [1] TRUE

## The simplex domains

Another useful type of domains the package offers is the simplices. Formally, we define the $$(p-1)$$-dimensional simplex as $$\{\boldsymbol{x}\in\mathbb{R}_+^p:\sum_{i=1}^p x_i=1, \boldsymbol{x}\succ\boldsymbol{0}\}$$.

Defining the domain is also straightforward without any additional arguments required:

domain <- make_domain(type="simplex", p=p)
domain
#> $type #> [1] "simplex" #> #>$p
#> [1] 5
#>
#> $p_deemed #> [1] 4 #> #>$checked
#> [1] TRUE
#>
#> $simplex_tol #> [1] 1e-10 The simplex_tol member is unique to this type of domain, and is used internally for checking if each row in the data matrix indeed sums to 1. It is also the only domain type that has p_deemed one less than p, whereas for other domain types these two are equal. It is because it is currently the only domain type implemented that is a Lebesgue-null subset of $$\mathbb{R}^p$$. ## Uniform-type domains This domain type assumes that each component/covariate has the same domain, which is a finite union of intervals. The lefts arguments specify the left endpoints of each interval, and rights specify the right endpoints accordingly. Formally, the domain is defined as $$\left(\cup_{i}[\mathrm{lefts}_i,\mathrm{rights}_i]\right)^p$$. For example, if we assume each covariate is larger than (or equal to) 1, then one can specify the domain as follows. domain <- make_domain(type="uniform", p=p, lefts=1, rights=Inf) domain #>$type
#> [1] "uniform"
#>
#> $p #> [1] 5 #> #>$p_deemed
#> [1] 5
#>
#> $lefts #> [1] 1 #> #>$rights
#> [1] Inf
#>
#> $left_inf #> [1] FALSE #> #>$right_inf
#> [1] TRUE
#>
#> $checked #> [1] TRUE Note again that we do not differentiate between open/closed/half-open half-closed intervals, as the probability of the random vector lying at the boundary points is assumed to be 0. If rights is just Inf and lefts is simply -Inf or 0, this corresponds to the "R" and "R+" domain types, and the domain type would be changed accordingly. domain <- make_domain(type="uniform", p=p, lefts=-Inf, rights=Inf) # Changed to R #> Warning in make_domain(type = "uniform", p = p, lefts = -Inf, rights = Inf): #> Domain type automatically changed to R. domain <- make_domain(type="uniform", p=p, lefts=0, rights=Inf) # Changed to R+ #> Warning in make_domain(type = "uniform", p = p, lefts = 0, rights = Inf): Domain #> type automatically changed to R+. Of course, the domain can also be bounded, e.g. $$[-1,1]^p$$. domain <- make_domain(type="uniform", p=p, lefts=-1, rights=1) domain #>$type
#> [1] "uniform"
#>
#> $p #> [1] 5 #> #>$p_deemed
#> [1] 5
#>
#> $lefts #> [1] -1 #> #>$rights
#> [1] 1
#>
#> $left_inf #> [1] FALSE #> #>$right_inf
#> [1] FALSE
#>
#> $checked #> [1] TRUE A more interesting case would be when the uniform domain for each component is a union of multiple intervals, e.g. $$((-\infty,-2]\cup[-1,1]\cup[2,+\infty))^p$$. domain <- make_domain(type="uniform", p=p, lefts=c(-Inf, -1, 2), rights=c(-2, 1, Inf)) domain #>$type
#> [1] "uniform"
#>
#> $p #> [1] 5 #> #>$p_deemed
#> [1] 5
#>
#> $lefts #> [1] -Inf -1 2 #> #>$rights
#> [1]  -2   1 Inf
#>
#> $left_inf #> [1] TRUE #> #>$right_inf
#> [1] TRUE
#>
#> $checked #> [1] TRUE ### Solution for infinite unions of intervals or non-uniform domains for each component Domains that are a union of infinitely many intervals are currently not supported, but in some cases they can be approximated by a finite union. For example, if the goal is to generate samples using gen(), $$\cup_{i=0}^{\infty}[2i,2i+1]$$ may be approximated by $$\cup_{i=0}^{10}[2i,2i+1]$$ in the first three lines of code below, assuming the joint density is negligible if any $$x_j>21$$. domain <- make_domain(type="uniform", p=p, lefts=seq(0, 20, by=2), rights=seq(1, 21, by=2)) # If goal is random sample generation, just truncate the infinite union by an interval large enough x <- gen(1000, setting="gaussian", abs=FALSE, eta=rep(0,p), K=diag(p), domain=domain, finite_infinity=100, seed=2, burn_in=1000, thinning=1000, verbose=FALSE, remove_outofbound=TRUE) # Generates a Gaussian hist(x, breaks=20) # Generated data far from the upper bound 21 we set above If the goal is estimation given a data matrix using estimate(), one may simply truncate the infinite union by $$\left[-\max_{i,j}\left|x_{ij}\right|, \max_{i,j}\left|x_{ij}\right|\right]$$, as below. # Suppose the goal is estimation, simply truncate to the maximum absolute value in x max_i <- ceiling((max(abs(x)) - 1) / 2) domain <- make_domain(type="uniform", p=p, lefts=seq(0, 2*max_i, by=2), rights=seq(1, 2*max_i+1, by=2)) # Estimate the inverse covariance matrix K with no penalty and no diagonal multiplier since n >> p, assuming mu = eta = 0 est <- estimate(x=x, setting="gaussian", domain=domain, centered = TRUE, mode="min_pow", param1=1, param2=3, lambda1s=0, diagonal_multiplier=1, verbose=FALSE, return_raw=TRUE) est$raw_estimates[[1]] # Should be close to diag(p) we used to generate x
#>              [,1]        [,2]         [,3]         [,4]        [,5]
#> [1,]  1.037142623 -0.19835515 -0.003285077  0.258111089  0.06697515
#> [2,] -0.198355147  1.16963917 -0.096986651 -0.123145792  0.16352970
#> [3,] -0.003285077 -0.09698665  1.156527553 -0.009055624 -0.03244605
#> [4,]  0.258111089 -0.12314579 -0.009055624  1.267582275 -0.32215930
#> [5,]  0.066975147  0.16352970 -0.032446050 -0.322159301  0.84250419

If each covariate has its own domain as a different union of intervals, refer to the polynomial-type domains below.

## Polynomial-type domains

The most complicated and flexible domain type is "polynomial". Although effort is made for simplifying the definition of this domain type, the user may find the exact rules/requirements confusing, but hopefully the examples should make it easier to follow.

Each polynomial-type domain is defined by a set of inequalities, where for each inequality a constant on the right-hand side is compared to a polynomial on the left-hand side, which must not have any interaction term and can have at most one term for each covariate. (That is, an inequality like $$x_1x_2>1$$ or $$x_1^2+x_1^3>1$$ are unfortunately not yet supported in the current version.) If there are more than one inequality, the user must specify a logical rule using "&" and "|" telling the program how to aggregate the domains defined by each inequality.

Each term can be $$\log(x)$$, $$\exp(nx)$$ with $$n$$ a nonzero integer, or a rational power of $$x$$, where $$x^{a/b}=(-1)^a|x|^{a/b}$$ ($$a$$, $$b$$ coprime) for $$x\geq 0$$ and $$x<0$$ if $$b$$ is odd, or NA if $$x<0$$ and $$b$$ is even.

For example, an inequality may look like $$1.3x_1^2-2.7 x_2^3+0.37\exp(2x_3)-1.4\log(x_4)>1.3$$, and another may look like $$0.5x_1^{-2/3}+1.91x_2^{-5/4}-0.73\exp(-3x_3)-1.7\log(x_4)<-1.3$$. If we wish to let our domain be the intersection of the two domains defined by the two inequalities, we write

domain <- make_domain(type="polynomial",
p=p,
rule="1 && 2",
ineqs=list(
list(expression="1.3x1^2-2.7* x2^3+0.37exp(2x3)-1.4log(x4)>1.3",
nonnegative=FALSE, abs=FALSE),
list(expression="5e-1x1^(-2/3)+1.91*x2^(-5/4)-0.73exp(-3*x3)-1.7*log(x4)<-1.3",
nonnegative=FALSE, abs=FALSE)
)
)
domain
#> $type #> [1] "polynomial" #> #>$p
#> [1] 5
#>
#> $p_deemed #> [1] 5 #> #>$checked
#> [1] TRUE
#>
#> $rule #> [1] "1 && 2" #> #>$ineqs
#> $ineqs[[1]] #>$ineqs[[1]]$uniform #> [1] FALSE #> #>$ineqs[[1]]$larger #> [1] TRUE #> #>$ineqs[[1]]$power_numers #> [1] 2 3 2 0 1 #> #>$ineqs[[1]]$power_denoms #> [1] 1 1 0 0 1 #> #>$ineqs[[1]]$coeffs #> [1] 1.30 -2.70 0.37 -1.40 0.00 #> #>$ineqs[[1]]$const #> [1] 1.3 #> #>$ineqs[[1]]$abs #> [1] FALSE #> #>$ineqs[[1]]$nonnegative #> [1] FALSE #> #> #>$ineqs[[2]]
#> $ineqs[[2]]$uniform
#> [1] FALSE
#>
#> $ineqs[[2]]$larger
#> [1] FALSE
#>
#> $ineqs[[2]]$power_numers
#> [1] -2 -5 -3  0  1
#>
#> $ineqs[[2]]$power_denoms
#> [1] 3 4 0 0 1
#>
#> $ineqs[[2]]$coeffs
#> [1]  0.50  1.91 -0.73 -1.70  0.00
#>
#> $ineqs[[2]]$const
#> [1] -1.3
#>
#> $ineqs[[2]]$abs
#> [1] FALSE
#>
#> $ineqs[[2]]$nonnegative
#> [1] FALSE
#>
#>
#>
#> $postfix_rule #> [1] "1 2 &" ### Inequalities In this subsection we discuss the ineqs argument of make_domain() when defining a polynomial-type domain. The argument must be a list, and each element in this list is a list itself that represents an inequality. The recommended way of representing an inequality uses a list of three members: (1) nonnegative, a logical indicating whether the domain of this inequality should be restricted to $$\mathbb{R}_+^p$$, (2) abs, a logical whether to use the absolute values $$|\boldsymbol{x}|$$ in place of $$\boldsymbol{x}$$ when evaluating the inequality, and (3) expression, a string expression of the inequality, which we explain the next. There is another highly discouraged way of representing an inequality by how the inequality is stored internally that is not covered in this guide. We call a term in an expression “uniform” if it is written as a function in “x”, and “non-uniform” if it is written as a function in “x” followed by an index, e.g. “x1” or “x2”. A uniform term can be (1) "log(x)", (2) "exp(x)", "exp(nx)" or "exp(n*x)" where n is a nonzero integer, or (3) a power in one of the following forms: "x^n", "x^(-n)", "x^(n/m)", "x^(-n/m)", "x^(n/-m)" (replace n and m by non-zero integers). A non-uniform term is similar (replacing x by e.g. x1 or x2), and can start with a coefficient, e.g. "1.2*log(x)", "-2.3x^2". An expression must have the variable part on the left-hand side, followed by one of “<”, “>”, “<=”, “>=”, and finally a number to compare to. The variable part can be (1) a single uniform term (e.g. x^(-2/3), exp(x), log(x)), (2) a single uniform term surrounded by "sum()" (e.g. sum(x^(-2/3)), sum(exp(x)), sum(log(x))), or (3) a sum of non-uniform terms separated by +/- (e.g. 1.3x1^2-0.7*x2^(2/3)+2e3log(x)+1.3e-2*exp(-x)). For (1), the same inequality will be applied to each covariate independently; (2) on the other hand is a shorthand for (3) with the same form for all components and coefficients all equal to 1 (e.g. "sum(x^2)" is just "x1^2+x2^2+...+xp^2"). In conclusion, the following are some examples of expression: "x<=-3.2e2" # (1) #> [1] "x<=-3.2e2" "x^(-2/3)>3.1" # (1) #> [1] "x^(-2/3)>3.1" "exp(x)>1.3" # (1) #> [1] "exp(x)>1.3" "exp(-23x)<=3e3" # (1) #> [1] "exp(-23x)<=3e3" "log(x) < 1.3" # (1) #> [1] "log(x) < 1.3" "sum(x)<=3e3" # (2) #> [1] "sum(x)<=3e3" "sum(x^2)>10" # (2) #> [1] "sum(x^2)>10" "sum(x^(1/3))>10" # (2) #> [1] "sum(x^(1/3))>10" "sum(x^(-2/3))<=3e3" # (2) #> [1] "sum(x^(-2/3))<=3e3" "sum(exp(-23x))<=3e3" # (2) #> [1] "sum(exp(-23x))<=3e3" "sum(log(x)) < 2" # (2) #> [1] "sum(log(x)) < 2" "x1>1" # (3) #> [1] "x1>1" "x2<=1" # (3) #> [1] "x2<=1" "x1^(2/3)-1.3x2^(-3)< 1" # (3) #> [1] "x1^(2/3)-1.3x2^(-3)< 1" "exp(x1)+2.3*x2^2 > 2" # (3) #> [1] "exp(x1)+2.3*x2^2 > 2" "1x1+2x2+3x3+4x4+5x5 <1" # (3) #> [1] "1x1+2x2+3x3+4x4+5x5 <1" "0.5*x1^(-2/3)-0.3x4^(4/-6)+2e3x3^(-6/9) < 3.5e5" # (3) #> [1] "0.5*x1^(-2/3)-0.3x4^(4/-6)+2e3x3^(-6/9) < 3.5e5" "0.5*x1^(-2/3)-x3^3 + 2log(x2)- 1.3e4exp(-25*x6)+x8-.3x5^(-3/-4) >= 2" # (3) #> [1] "0.5*x1^(-2/3)-x3^3 + 2log(x2)- 1.3e4exp(-25*x6)+x8-.3x5^(-3/-4) >= 2" ### Rule If more than one inequality is provided, the user must specify the rule to aggregate the domains defined by each function. The rule can only contain inequality numbers (indexed starting from 1 to length(domain$ineqs)), logical operators (& / |, or && / ||; no difference is made between the single and doubled operators), parentheses and space. The only other requirement is that & and | is given the same precedence, and thus only operators of the same kind can be chained without a parenthesis, i.e. 1 & 2 | 3 is not allowed; one must specify (1 & 2) | 3 or 1 | (2 & 3) to avoid ambiguity. The following are some rules allowed.

"1"
#> [1] "1"
"1 & 2" # Assuming there are at least 2 inequalities
#> [1] "1 & 2"
"1 || 2" # Assuming >= 2 inequalities
#> [1] "1 || 2"
"1 && 2 & 3" # Assuming >= 3 inequalities
#> [1] "1 && 2 & 3"
"1 | 2 || 3" # Assuming >= 3 inequalities
#> [1] "1 | 2 || 3"
"(((1 & 2) | (3 & 4 && 5) || 6 || 7) & 8 & 9 && 10) || 11 " # Assuming >= 11 inequalities
#> [1] "(((1 & 2) | (3 & 4 && 5) || 6 || 7) & 8 & 9 && 10) || 11 "

### Examples of polynomial domains

# x such that sum(x^2) > 10 && sum(x^(1/3)) > 10 with x allowed to be negative
domain <- make_domain("polynomial", p=p, rule="1 && 2",
ineqs=list(list(expression="sum(x^2)>10", abs=FALSE, nonnegative=FALSE),
list(expression="sum(x^(1/3))>10", abs=FALSE, nonnegative=FALSE)))

# x such that {x1 > 1 && log(1.3) < x2 < 1 && x3 > log(1.3) && ... && xp > log(1.3)}
domain <- make_domain("polynomial", p=p, rule="1 && 2 && 3",
ineqs=list(list(expression="x1>1", abs=FALSE, nonnegative=TRUE),
list(expression="x2<1", abs=FALSE, nonnegative=TRUE),
list(expression="exp(x)>1.3", abs=FALSE, nonnegative=FALSE)))

#' # x in R_+^p such that {sum(log(x))<2 || (x1^(2/3)-1.3x2^(-3)<1 && exp(x1)+2.3*x2>2)}
domain <- make_domain("polynomial", p=p, rule="1 || (2 && 3)",
ineqs=list(list(expression="sum(log(x))<2", abs=FALSE, nonnegative=TRUE),
list(expression="x1^(2/3)-1.3x2^(-3)<1", abs=FALSE, nonnegative=TRUE),
list(expression="exp(x1)+2.3*x2^2>2", abs=FALSE, nonnegative=TRUE)))

#' # x in R_+^p such that {x in R_+^p: sum_j j * xj <= 1}
domain <- make_domain("polynomial", p=p,
ineqs=list(
list(expression=paste(paste(sapply(1:p, function(j){paste(j, "x", j, sep="")}), collapse="+"), "<1"),
abs=FALSE, nonnegative=TRUE)))

# The l-1 ball {sum(|x|) < 1}
domain <- make_domain("polynomial", p=p,
ineqs=list(list(expression="sum(x)<1", abs=TRUE, nonnegative=FALSE)))

# Distribution Models Supported

This package assumes the densities are with respect to the Lebesgue measure, from the exponential family and form interaction graphical models, proportional to $\exp\left\{-\frac{1}{2a}{\boldsymbol{x}^a}^{\top}\mathbf{K}\boldsymbol{x}^a+\frac{1}{b}\boldsymbol{\eta}^{\top}\boldsymbol{x}^b\right\},$ where a=a_numer/a_denom and b=b_numer/b_denom with a_numer, a_denom, b_numer, b_denom are integers. We define $$\frac{1}{(0/0)}\boldsymbol{x}^{0/0}\equiv\log\boldsymbol{x}$$ and $$\frac{1}{(n/0)}\boldsymbol{x}^{n/0}\equiv\exp(n\boldsymbol{x})$$ for $$n\neq 0$$ (the $$\exp$$ case has been implemented for gen() but not yet for estimate() and thus has no name).

Note that $$\mathbf{K}$$ is generally assumed to be positive definite (except for the Aitchison $$A^d$$ model, "log_log_sum0" explained below). If $$b$$ is $$0/0$$ (i.e. $$\boldsymbol{\eta}^{\top}\log(\boldsymbol{x})$$ appears in the log density), $$\boldsymbol{\eta}\succ-1$$ is necessary for the density to have a finite normalizing constant.

When generating random samples using gen() or doing estimation using estimate(), you should specify the model by passing its name as the setting argument.

For example, the Gaussian graphical models have name "gaussian" with density proportional to $\exp\left\{-\frac{1}{2}{\boldsymbol{x}}^{\top}\mathbf{K}\boldsymbol{x}+\boldsymbol{\eta}^{\top}\boldsymbol{x}\right\};$ the square root exponential graphical models from Inouye et al. (2016) and Yu et al. (2019) have name "exp" with density proportional to $\exp\left\{-{\boldsymbol{x}^{1/2}}^{\top}\mathbf{K}\boldsymbol{x}^{1/2}+2\boldsymbol{\eta}^{\top}\boldsymbol{x}^{1/2}\right\};$ the gamma graphical models from Yu et al. (2019) have name "gamma" with density proportional to $\exp\left\{-{\boldsymbol{x}^{1/2}}^{\top}\mathbf{K}\boldsymbol{x}^{1/2}+\boldsymbol{\eta}^{\top}\log(\boldsymbol{x})\right\};$ the $$A^d$$ model on the simplex from Aitchison (1985) have name "log_log_sum0" with density proportional to $\exp\left\{-\frac{1}{2}\log(\boldsymbol{x})^{\top}\mathbf{K}\log(\boldsymbol{x})+\boldsymbol{\eta}^{\top}\log(\boldsymbol{x})\right\}$ where $$\mathbf{K}\boldsymbol{1}=\mathbf{K}^{\top}\boldsymbol{1}=\boldsymbol{0}$$. If we do not assume the rows and columns sum to 0, the setting has name "log_log".

For models with other $$a$$ and $$b$$, their names should have the form "ab_aVal_bVal" with aVal and bVal both integers or both two integers separated by "/", e.g. "ab_2_2" if $$a=b=2$$, or "ab_2_5/4" if $$a=2$$ and $$b=5/4$$, or "ab_2/3_1/2" if $$a=2/3$$ and $$b=1/2$$.

# Examples of Multivariate Graphical Models

In this section we show how to use the package to generate random samples from interaction models defined on a type of domain already implemented. We focus on the canonical example of the "R+" domain, namely the non-negative orthant $$\mathbb{R}_+^p$$. Examples of Aitchison (1985) models on the $$(p-1)$$-dimensional simplex would then follow.

## Truncated Gaussian Graphical Models on the Non-negative Orthant

### Data setup

We begin by generating a mean parameter and an inverse covariance matrix, and taking the true edge set.

n <- 200
p <- 50
tol <- 1e-8
domain <- make_domain(type="R+", p=p)
mu <- rnorm(p) * 0.5
K <- cov_cons(mode="sub", p=p, seed=1, spars=0.2, eig=0.1, subgraphs=5)
true_edges <- which(abs(K) > tol & diag(p) == 0)

Next, we generate a sample from the truncated Gaussian distribution with parameters generated above.

set.seed(1)
x <- rtmvnorm(n, mean = mu, sigma = solve(K), lower = rep(0, p), upper = rep(Inf, p), algorithm = "gibbs", burn.in.samples = 100, thinning = 10)

### Estimation using estimate() with x directly

We can obtain a path of estimated graphs over a grid of lambdas using a number of different ways. The simplest and the most convenient way would be to call the estimate() function. First suppose that in our estimation we use $$h(x)=\min(x,3)$$ and the “high” diagonal multiplier defined in Yu et al (2019). For simplicity we use the profiled estimator (lambda_eta = 0, lambda_ratio=Inf), but we remind that as shown in Yu et al (2019) using a reasonable lambda_ratio would significantly improve the result.

maxit <- 1000
nlambda <- 100
dm <- 1 + (1-1/(1+4*exp(1)*max(6*log(p)/n, sqrt(6*log(p)/n)))) # diagonal multiplier
h_mode <- "min_pow"
h_param1 <- 1
h_param2 <- 3
est <- estimate(x, setting="gaussian", domain=domain, centered=FALSE, symmetric="symmetric", scale="norm", lambda_length=nlambda, lambda_ratio=Inf, mode=h_mode, param1=h_param1, param2=h_param2, verbose=FALSE, tol=tol, maxit=maxit, BIC_refit=TRUE, diagonal_multiplier=dm, return_raw=TRUE)

The results returned contain a list of edge sets, edgess, one for each lambda.

est$edgess[[1]] # empty graph #> integer(0) length(est$edgess[[nlambda]]) == p*(p-1) # complete graph
#> [1] TRUE
length(est$edgess[[nlambda/2]]) # estimated graph for the 50th lambda #> [1] 2388 The result est also includes a matrix BIC containing the eBIC values, and BIC_refits if run with BIC_refit=TRUE. Each column corresponds to the choice of $$\gamma=0,0.5,1$$; c.f. Chen and Chen (2008). est$BICs[1:10,]
#>             [,1]       [,2]      [,3]
#>  [1,] -445.29295 -445.29295 -445.2929
#>  [2,] -408.37424 -367.14170 -325.9092
#>  [3,] -340.62780 -240.90543 -141.1831
#>  [4,] -237.98286  -64.51935  108.9442
#>  [5,] -159.50277   65.06685  289.6365
#>  [6,]  -25.86317  274.58896  575.0411
#>  [7,]  136.03505  516.53831  897.0416
#>  [8,]  325.88246  787.68786 1249.4933
#>  [9,]  538.26160 1078.32728 1618.3930
#> [10,]  805.11502 1426.80810 2048.5012
est$BIC_refits[1:10, ] #> [,1] [,2] [,3] #> [1,] -2101.744 -2101.744 -2101.744 #> [2,] -2345.880 -2304.647 -2263.415 #> [3,] -2651.017 -2551.294 -2451.572 #> [4,] -2964.166 -2790.702 -2617.239 #> [5,] -3193.223 -2968.654 -2744.084 #> [6,] -3535.282 -3234.830 -2934.378 #> [7,] -3866.533 -3486.029 -3105.526 #> [8,] -4170.900 -3709.095 -3247.290 #> [9,] -4422.488 -3882.422 -3342.357 #> [10,] -4654.398 -4032.705 -3411.012 We can also take a look at the lambda values we used in the estimation. We only specified the number of lambdas to lambda_length=100, and the lambda values are automatically generated by the function. est$lambda1s # lambda_Ks used
#>   [1] 1.305613e-01 1.181263e-01 1.068757e-01 9.669654e-02 8.748692e-02
#>   [6] 7.915445e-02 7.161558e-02 6.479473e-02 5.862351e-02 5.304006e-02
#>  [11] 4.798839e-02 4.341786e-02 3.928263e-02 3.554125e-02 3.215621e-02
#>  [16] 2.909357e-02 2.632263e-02 2.381560e-02 2.154734e-02 1.949511e-02
#>  [21] 1.763835e-02 1.595843e-02 1.443851e-02 1.306335e-02 1.181916e-02
#>  [26] 1.069348e-02 9.675001e-03 8.753530e-03 7.919822e-03 7.165518e-03
#>  [31] 6.483056e-03 5.865593e-03 5.306939e-03 4.801493e-03 4.344186e-03
#>  [36] 3.930435e-03 3.556091e-03 3.217400e-03 2.910966e-03 2.633718e-03
#>  [41] 2.382876e-03 2.155925e-03 1.950589e-03 1.764810e-03 1.596725e-03
#>  [46] 1.444649e-03 1.307057e-03 1.182570e-03 1.069939e-03 9.680351e-04
#>  [51] 8.758370e-04 7.924201e-04 7.169480e-04 6.486641e-04 5.868836e-04
#>  [56] 5.309874e-04 4.804148e-04 4.346589e-04 3.932609e-04 3.558057e-04
#>  [61] 3.219179e-04 2.912576e-04 2.635175e-04 2.384194e-04 2.157117e-04
#>  [66] 1.951668e-04 1.765786e-04 1.597608e-04 1.445448e-04 1.307780e-04
#>  [71] 1.183224e-04 1.070530e-04 9.685704e-05 8.763213e-05 7.928583e-05
#>  [76] 7.173444e-05 6.490227e-05 5.872082e-05 5.312810e-05 4.806804e-05
#>  [81] 4.348992e-05 3.934783e-05 3.560024e-05 3.220959e-05 2.914186e-05
#>  [86] 2.636632e-05 2.385512e-05 2.158310e-05 1.952747e-05 1.766763e-05
#>  [91] 1.598492e-05 1.446247e-05 1.308503e-05 1.183878e-05 1.071122e-05
#>  [96] 9.691060e-06 8.768059e-06 7.932967e-06 7.177411e-06 6.493816e-06
est$lambda2s # lambda_etas, all 0 since with lambda_ratio=Inf we are in the profiled setting #> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 #> [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 #> [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 If run with centered=FALSE, etas is also returned. The raw estimates of K are stored in raw_estimates if return_raw=TRUE in the function call. est$etas[1:5,1:5] # First 5 components for the first 5 lambdas
#>           [,1]       [,2]      [,3]      [,4]      [,5]
#> [1,] -1.233783 -0.8367694 -1.124133 -1.523860 -1.213421
#> [2,] -1.233783 -0.8367694 -1.124844 -1.523860 -1.214110
#> [3,] -1.233783 -0.8367694 -1.132828 -1.525857 -1.208743
#> [4,] -1.233783 -0.8367694 -1.147555 -1.531046 -1.191121
#> [5,] -1.235740 -0.8367694 -1.160732 -1.536887 -1.170514
est$raw_estimates[[nlambda/2]][1:5,1:5] # First few entries of the estimated K for the 50th lambda #> [,1] [,2] [,3] [,4] [,5] #> [1,] 0.11276236 -0.04158900 -0.03624884 0.036769531 0.07729333 #> [2,] -0.04158900 0.33835651 -0.06945853 -0.047081226 0.02482866 #> [3,] -0.03624884 -0.06945853 0.16939438 -0.031710551 -0.10068159 #> [4,] 0.03676953 -0.04708123 -0.03171055 0.002637566 0.07725902 #> [5,] 0.07729333 0.02482866 -0.10068159 0.077259018 0.13069943 Here we plot the refitted BIC versus lambda curve and the ROC curve. Note that points with the same color in both graphs correspond to the same lambda. ROC <- t(sapply(est$edgess, tp_fp, true_edges=true_edges, p=p)) # ROC curve
AUC(ROC) # The area under the curve
#> [1] 0.6750035
par(mfrow=c(1,2), mar=c(4,4,4,4))
colors_ <- rainbow(nlambda)
plot_BIC_ind <- !is.infinite(est$BIC_refits[,3]) plot(est$lambda1s[plot_BIC_ind], est$BIC_refits[plot_BIC_ind,3], col=colors_[plot_BIC_ind], pch=16, cex=1.5, xlab="lambda", ylab="eBIC", main="Refitted BIC (when exists)") plot(c(),c(), ylim = c(0,1), xlim = c(0,1), cex.lab=1, xlab = "False Positives", ylab = "True Positives", main = "ROC curve") points(ROC[,2], ROC[,1], type="l") points(ROC[,2], ROC[,1], pch=16, cex=1.5, col=colors_) points(c(0,1), c(0,1), type="l", lty=2) ### Estimate using elts The function estimate() internally calls get_elts(), which calculates summary statistics (elts) of x used for estimation. If one were to rerun estimate() a multiple of times, e.g. when trying out different h functions, it might be time-saving to calculate these elements using get_elts() first, and then call estimate() with these elements directly, as shown in the following example. Unlike estimate() which may take the name of the function family and the parameters of h as inputs, get_elts() takes the h function and its derivative (hp) explicitly. We may obtain these through get_h_hp(). h_hp <- get_h_hp(mode=h_mode, para=h_param1, para2=h_param2) # h(x)=min(x,3), h'(x)=(x<=3) elts_P <- get_elts(h_hp, x, setting="gaussian", domain=domain, centered=FALSE, profiled_if_noncenter=Inf, scale="norm", diagonal_multiplier=dm, use_C=TRUE, tol=tol) # P stands for Profiled We can then feed elts_P into estimate() to avoid recalculation of the summary statistics. Note that here since we have already defined the h and hp directly, we can pass them to estimate() instead of passing mode, param1 and param2, in which case estimate() internally calls get_h_hp() just as we did. Now the results should be the same as est that we obtained before. est2 <- estimate(x, setting="gaussian", domain=domain, elts=elts_P, centered=FALSE, symmetric="symmetric", scale="norm", lambda_length=nlambda, lambda_ratio=Inf, h_hp=h_hp, verbose=FALSE, tol=tol, maxit=maxit, BIC_refit=TRUE, diagonal_multiplier=dm, return_raw=TRUE) compare_two_results(est, est2) ## Should be all close to 0 #> edgess BICs K lambda1s lambda2s etas BIC_refits #> 0 0 0 0 0 0 0 ### Results for one lambda It may be convenient to have a function that estimates the graph for one specific lambda. The function get_results() is useful in this case. i <- 10 # Re-estimate at the 10th lambda lambda1 <- est$lambda1s[i]
sub_res <- get_results(elts_P, symmetric="symmetric", lambda1=lambda1, lambda2=0, tol=tol, maxit=maxit, previous_res=NULL, is_refit=FALSE)
# Check if the result matches the previous one
sum(abs(est$raw_estimates[[i]] - sub_res$K))
#> [1] 1.500903e-10
sum(abs(est$etas[i,] - sub_res$eta))
#> [1] 4.394796e-11
all(sub_res$edges == est$edgess[[i]])
#> [1] TRUE

Equivalently, we can use estimate() with one lambda value (optionally with elts_P if available).

est_sub <- estimate(x, setting="gaussian", domain=domain, centered=FALSE, symmetric="symmetric", scale="norm", lambda1s = lambda1, lambda_ratio=Inf, mode=h_mode, param1=h_param1, param2=h_param2, verbose=FALSE, tol=tol, maxit=maxit, BIC_refit=TRUE, diagonal_multiplier=dm,  return_raw=TRUE)
# Check if the result matches the previous one
sum(abs(est_sub$raw_estimates[[1]] - sub_res$K))
#> [1] 0
sum(abs(est_sub$etas[1,] - sub_res$eta))
#> [1] 0
all(est_sub$edgess[[1]] == est$edgess[[1]])
#> [1] TRUE

The same function can also be used for fitting an unpenalized model restricted to a specific edge set when e.g. calculating the refitted eBIC by restricting to the estimated edge set. One can thus manually calculate the refitted eBIC in this manner, but we encourage using the function wrapper estimate() directly, which produces this information automatically.

est_refit <- get_results(elts_P, symmetric="symmetric", lambda1=lambda1, lambda2=0, tol=tol, maxit=maxit, previous_res=sub_res, is_refit=TRUE)
# Manually calculate the BIC score (eBIC with gamma = 0)
2*n*est_refit$crit + (length(est_refit$edges)/2+length(est_refit$eta_support))*log(n) #> [1] -4654.398 # Should match the number returned previously by estimate() est$BIC_refit[i,1]
#> [1] -4654.398

### Aggregating multiple ROC curves

The avgrocs() function aggregates multiple ROC curves using vertical averaging by adopting algorithm 3 from Fawcett (2006). Here is a quick example.

ROCs <- NULL
par(mfrow=c(2,2), mar=c(3,3,3,3))
for (i in 1:3){
set.seed(i)
x <- rtmvnorm(n, mean = mu, sigma = solve(K), lower = rep(0, p), upper = rep(Inf, p), algorithm = "gibbs", burn.in.samples = 100, thinning = 10)
est <- estimate(x, setting="gaussian", domain=domain, centered=FALSE, symmetric="symmetric", scale="norm", lambda_length=nlambda, lambda_ratio=Inf, mode=h_mode, param1=h_param1, param2=h_param2, verbose=FALSE, tol=tol, maxit=maxit, BIC_refit=FALSE, diagonal_multiplier=dm, return_raw=FALSE)
ROC <- t(sapply(est$edgess, function(edges){tp_fp(edges, true_edges, p)})) ROCs[[i]] <- ROC plot(c(), c(), ylim=c(0,1), xlim=c(0,1), cex.lab=1, main=paste("ROC, trial ",i,", AUC ",round(AUC(ROC),4),sep=""), xlab="False Positives", ylab="True Positives") points(ROC[,2], ROC[,1], type="l") points(c(0,1), c(0,1), type = "l", lty = 2) } average_ROC <- avgrocs(ROCs, length(true_edges), p) plot(c(), c(), ylim=c(0,1), xlim=c(0,1), cex.lab=1, main=paste("Average ROC, AUC",round(AUC(average_ROC),4)), xlab="False Positives", ylab="True Positives") points(average_ROC[,2], average_ROC[,1], type="l") points(c(0,1), c(0,1), type = "l", lty = 2) ## Exponential Square-Root Graphical Models on the Non-negative Orthant ### Data setup Consider the exponential square-root graphical models from Inouye et al (2016) on $$\mathbb{R}_+^p$$, which is also defined as a special case of the a-b model in (1) in Yu et al (2019) with $$a=b=0.5$$. We reuse the mean and inverse covariance matrix parameters from the previous section, and generate a new sample from the exponential square-root distribution. x <- gen(n, setting="exp", abs=FALSE, eta=mu, K=K, domain=domain, finite_infinity=100, seed=2, burn_in=1000, thinning=1000, verbose=FALSE, remove_outofbound=TRUE) ### Estimation As suggested by the simulation studies in Yu et al (2019), we estimate using $$h(x)=\min(x,3)^{1.5}$$. We also try with a finite lambda_ratio as it is known to improve the result compared to the profiled estimator, although the best ratio is subject to careful tuning. h_mode <- "min_pow" h_param1 <- 1.5 h_param2 <- 3 est_exp <- estimate(x, setting="exp", domain=domain, centered=FALSE, symmetric="symmetric", scale="norm", lambda_length=nlambda, lambda_ratio=2, mode=h_mode, param1=h_param1, param2=h_param2, verbose=FALSE, tol=tol, maxit=maxit, BIC_refit=TRUE, diagonal_multiplier=dm, return_raw=FALSE) ROC_exp <- t(sapply(est_exp$edgess, tp_fp, true_edges=true_edges, p=p)) # ROC curve
AUC(ROC_exp) # The area under the curve
#> [1] 0.8418207
par(mfrow=c(1,2), mar=c(4,4,4,4))
plot_BIC_ind <- !is.infinite(est_exp$BIC_refits[,3]) plot(est_exp$lambda1s[plot_BIC_ind], est_exp$BIC_refits[plot_BIC_ind,3], col=colors_[plot_BIC_ind], pch=16, cex=1.5, xlab="lambda", ylab="eBIC", main="Refitted BIC (when exists)") plot(c(),c(), ylim = c(0,1), xlim = c(0,1), cex.lab=1, xlab = "False Positives", ylab = "True Positives", main = "ROC curve") points(ROC_exp[,2], ROC_exp[,1], type="l") points(ROC_exp[,2], ROC_exp[,1], pch=16, cex=1.5, col=colors_) points(c(0,1), c(0,1), type="l", lty=2) ## Gamma Graphical Models on the Non-negative Orthant We now try the same experiments with the gamma models ($$a=0.5$$, $$b=0$$) on $$\mathbb{R}_+^p$$, again with $$h(x)=\min(x,3)^{1.5}$$ and lambda_ratio=2. However, recall that the gamma graphical models require all entries in the linear parameter to be strictly larger than -1. mu[mu <= -1] <- abs(mu[mu <= -1]) # make sure all entries are strictly larger than 1 x <- gen(n, setting="gamma", abs=FALSE, eta=mu, K=K, domain=domain, finite_infinity=100, seed=3, burn_in=1000, thinning=1000, verbose=FALSE, remove_outofbound=TRUE) h_mode <- "min_pow" h_param1 <- 1.5 h_param2 <- 3 est_gamma <- estimate(x, setting="gamma", domain=domain, centered=FALSE, symmetric="symmetric", scale="norm", lambda_length=nlambda, lambda_ratio=2, mode=h_mode, param1=h_param1, param2=h_param2, verbose=FALSE, tol=tol, maxit=maxit, BIC_refit=TRUE, diagonal_multiplier=dm, return_raw=FALSE) ROC_gamma <- t(sapply(est_gamma$edgess, tp_fp, true_edges=true_edges, p=p)) # ROC curve
AUC(ROC_gamma) # The area under the curve
#> [1] 0.6860751
par(mfrow=c(1,2), mar=c(4,4,4,4))
plot_BIC_ind <- !is.infinite(est_gamma$BIC_refits[,3]) plot(est_gamma$lambda1s[plot_BIC_ind], est_gamma$BIC_refits[plot_BIC_ind,3], col=colors_[plot_BIC_ind], pch=16, cex=1.5, xlab="lambda", ylab="eBIC", main="Refitted BIC (when exists)") plot(c(),c(), ylim = c(0,1), xlim = c(0,1), cex.lab=1, xlab = "False Positives", ylab = "True Positives", main = "ROC curve") points(ROC_gamma[,2], ROC_gamma[,1], type="l") points(ROC_gamma[,2], ROC_gamma[,1], pch=16, cex=1.5, col=colors_) points(c(0,1), c(0,1), type="l", lty=2) ## General a-b Graphical Models on the Non-negative Orthant The same can be applied to models with a general choice of $$a$$ and $$b$$. We now experiment with $$a=1.5$$, $$b=0.5$$ on $$\mathbb{R}_+^p$$, and estimate using $$h(x)=\min(x,3)^{1/2}$$ and lambda_ratio=2. x <- gen(n, setting="ab_3/2_1/2", abs=FALSE, eta=mu, K=K, domain=domain, finite_infinity=100, seed=4, burn_in=1000, thinning=1000, verbose=FALSE, remove_outofbound=TRUE) h_mode <- "min_pow" h_param1 <- 0.5 h_param2 <- 3 est_ab <- estimate(x, setting="ab_3/2_1/2", domain=domain, centered=FALSE, symmetric="symmetric", scale="norm", lambda_length=nlambda, lambda_ratio=2, mode=h_mode, param1=h_param1, param2=h_param2, verbose=FALSE, tol=tol, maxit=maxit, BIC_refit=TRUE, diagonal_multiplier=dm, return_raw=FALSE) ROC_ab <- t(sapply(est_ab$edgess, tp_fp, true_edges=true_edges, p=p)) # ROC curve
AUC(ROC_ab) # The area under the curve
#> [1] 0.8145798
par(mfrow=c(1,2), mar=c(4,4,4,4))
plot_BIC_ind <- !is.infinite(est_ab$BIC_refits[,3]) plot(est_ab$lambda1s[plot_BIC_ind], est_ab$BIC_refits[plot_BIC_ind,3], col=colors_[plot_BIC_ind], pch=16, cex=1.5, xlab="lambda", ylab="eBIC", main="Refitted BIC (when exists)") plot(c(),c(), ylim = c(0,1), xlim = c(0,1), cex.lab=1, xlab = "False Positives", ylab = "True Positives", main = "ROC curve") points(ROC_ab[,2], ROC_ab[,1], type="l") points(ROC_ab[,2], ROC_ab[,1], pch=16, cex=1.5, col=colors_) points(c(0,1), c(0,1), type="l", lty=2) ## (Untruncated) Gaussian Graphical Models on the Entire Real Space By specifying an "R" type domain, the package also includes implementations for estimating the inverse covariance matrix from Gaussian graphical models on the entire $$R^p$$. However, for Gaussian graphical models we use the penalized version of the original score matching from Hyv"arinen (2005); see Lin et al (2016). Thus, the h and hp functions are ignored (where we fix $$h(x)=1$$), and the summary statistics (elts) are stored internally in a different way. Below we show an example where we draw samples using the same mean and inverse covariance matrix as before, and estimate with lambda_ratio=2. domain <- make_domain("R", p=p) set.seed(5) x <- rmvnorm(n, mean = mu, sigma = solve(K)) est_gauss <- estimate(x, setting="gaussian", domain=domain, centered=FALSE, symmetric="symmetric", scale="norm", lambda_length=nlambda, lambda_ratio=2, verbose=FALSE, tol=tol, maxit=maxit, BIC_refit=TRUE, diagonal_multiplier=dm, return_raw=TRUE) Alternatively, get the summary statistics first (without specifying h and hp) and pass it to estimate(). elts_gauss <- get_elts(NULL, x, setting="gaussian", domain=domain, centered=FALSE, profiled_if_noncenter=FALSE, scale="norm", diagonal_multiplier=dm, use_C=TRUE, tol=tol) est_gauss2 <- estimate(x, setting="gaussian", elts=elts_gauss, domain=domain, centered=FALSE, symmetric="symmetric", scale="norm", lambda_length=nlambda, lambda_ratio=2, verbose=FALSE, tol=tol, maxit=maxit, BIC_refit=TRUE, diagonal_multiplier=dm, return_raw=TRUE) compare_two_results(est_gauss, est_gauss2) ## Should be all close to 0 #> edgess BICs K lambda1s lambda2s etas BIC_refits #> 0 0 0 0 0 0 0 We can then draw the ROC curve as before. ROC_gauss <- t(sapply(est_gauss$edgess, tp_fp, true_edges=true_edges, p=p)) # ROC curve
AUC(ROC_gauss) # The area under the curve
#> [1] 0.9815651
par(mfrow=c(1,2), mar=c(4,4,4,4))
plot_BIC_ind <- !is.infinite(est_gauss$BIC_refits[,3]) plot(est_gauss$lambda1s[plot_BIC_ind], est_gauss$BIC_refits[plot_BIC_ind,3], col=colors_[plot_BIC_ind], pch=16, cex=1.5, xlab="lambda", ylab="eBIC", main="Refitted BIC (when exists)") plot(c(),c(), ylim = c(0,1), xlim = c(0,1), cex.lab=1, xlab = "False Positives", ylab = "True Positives", main = "ROC curve") points(ROC_gauss[,2], ROC_gauss[,1], type="l") points(ROC_gauss[,2], ROC_gauss[,1], pch=16, cex=1.5, col=colors_) points(c(0,1), c(0,1), type="l", lty=2) ## Aitchison $$A^d$$ Models on the Simplex The last interesting multivariate model we present is the $$A^d$$ model from Aitchison (1985) with density on the $$(p-1)$$-dimensional simplex proportional to $\exp\left\{-\frac{1}{2}\log(\boldsymbol{x})^{\top}\mathbf{K}\log(\boldsymbol{x})+\boldsymbol{\eta}^{\top}\log(\boldsymbol{x})\right\}$ with $$\mathbf{K}\boldsymbol{1}=\mathbf{K}^{\top}\boldsymbol{1}=\boldsymbol{0}$$ and $$\boldsymbol{\eta}\succ -1$$. With the zero column and row sum restriction on $$\mathbf{K}$$, the fields returned by get_elts() and calculations behind estimate() are very different than other previous cases, but the function prototypes remain the same, so there is no meaningful difference the user needs to be aware of. Note that scale="" is required in estimate() for simplex-type domains since the simplex is not invariant to scaling. eta <- rep(0.5, p) K <- -cov_cons("band", p, seed=1, spars=3) diag(K) <- diag(K) - rowSums(K) # So that K has row and column sums 0 eigen(K)$val # Verify that K has one 0 and (p-1) positive eigenvalues
#>  [1] 3.995773e+00 3.991890e+00 3.991634e+00 3.983272e+00 3.968596e+00
#>  [6] 3.967133e+00 3.962792e+00 3.937799e+00 3.929676e+00 3.928659e+00
#> [11] 3.905317e+00 3.881959e+00 3.881949e+00 3.867834e+00 3.839155e+00
#> [16] 3.825743e+00 3.822008e+00 3.801489e+00 3.774765e+00 3.773788e+00
#> [21] 3.760578e+00 3.743227e+00 3.727626e+00 3.724945e+00 3.720342e+00
#> [26] 3.708350e+00 3.707014e+00 3.649237e+00 3.550263e+00 3.435336e+00
#> [31] 3.306077e+00 3.162872e+00 3.007531e+00 2.842906e+00 2.669717e+00
#> [36] 2.484478e+00 2.282468e+00 2.062499e+00 1.827943e+00 1.584732e+00
#> [41] 1.339562e+00 1.099097e+00 8.697100e-01 6.573707e-01 4.675500e-01
#> [46] 3.051190e-01 1.742429e-01 7.828013e-02 1.969680e-02 2.664535e-15
true_edges <- which(abs(K) > tol & diag(p) == 0)
domain <- make_domain(type="simplex", p=p)
x <- gen(n, setting="log_log_sum0", abs=FALSE, eta=eta, K=K, domain=domain, finite_infinity=100, seed=6, burn_in=1000, thinning=1000, verbose=FALSE, remove_outofbound=TRUE)
h_mode <- "pow" # Simplex domains are bounded by nature, so no truncation needed
h_param1 <- 2
est_log_log_sum0 <- estimate(x, setting="log_log_sum0", domain=domain, centered=FALSE, symmetric="symmetric", scale="", lambda_length=nlambda, lambda_ratio=Inf, mode=h_mode, param1=h_param1, param2=NULL, verbose=FALSE, tol=tol, maxit=maxit, BIC_refit=TRUE, diagonal_multiplier=NULL,  return_raw=FALSE)

ROC_log_log_sum0 <- t(sapply(est_log_log_sum0$edgess, tp_fp, true_edges=true_edges, p=p)) # ROC curve AUC(ROC_log_log_sum0) # The area under the curve #> [1] 0.7421658 par(mfrow=c(1,2), mar=c(4,4,4,4)) plot_BIC_ind <- !is.infinite(est_log_log_sum0$BIC_refits[,3])
plot(est_log_log_sum0$lambda1s[plot_BIC_ind], est_log_log_sum0$BIC_refits[plot_BIC_ind,3], col=colors_[plot_BIC_ind], pch=16, cex=1.5, xlab="lambda", ylab="eBIC", main="Refitted BIC (when exists)")
plot(c(),c(), ylim = c(0,1), xlim = c(0,1), cex.lab=1, xlab = "False Positives", ylab = "True Positives", main = "ROC curve")
points(ROC_log_log_sum0[,2], ROC_log_log_sum0[,1], type="l")
points(ROC_log_log_sum0[,2], ROC_log_log_sum0[,1], pch=16, cex=1.5, col=colors_)
points(c(0,1), c(0,1), type="l", lty=2)

One may not necessarily wish to impose the condition that $$\mathbf{K}\mathbf{1}=\mathbf{K}^{\top}\mathbf{1}=\mathbf{0}$$, in which case estimate(x, setting="log_log_sum0", ...) should be changed to estimate(x, setting="log_log", ...). Performance is not guaranteed.

est_log_log <- estimate(x, setting="log_log", domain=domain, centered=FALSE, symmetric="symmetric", scale="", lambda_length=40, lambda_ratio=Inf, mode=h_mode, param1=h_param1, param2=NULL, verbose=FALSE, tol=tol, maxit=maxit, BIC_refit=TRUE, diagonal_multiplier=NULL,  return_raw=FALSE)
ROC_log_log <- t(sapply(est_log_log\$edgess, tp_fp, true_edges=true_edges, p=p)) # ROC curve
AUC(ROC_log_log)
#> [1] 0.6398557

Note that while setting = "log_log" is available for all domain types, setting = "log_log_sum0" is only available for simplex domains. On the other hand, these two settings are the only settings currently available for simplex domains.

This concludes our tutorial on data generation and estimation for multivariate interaction models on general domains.

# Univariate Truncated Normal Distributions on x > 0

In this section, we explore univariate truncated normal distributions with probability density function proportional to $$\exp(-(x-\mu)^2/(2\sigma^2))$$ on $$(0,+\infty)$$.

## Introduction

To begin with, we fix an $$h$$ function, and estimate the parameters from a random sample.

mode <- "min_log_pow"
param1 <- 1
param2 <- 2
n <- 1000
mu <- 0
sigma <- 1
domain <- make_domain("R+", 1)
set.seed(6)
x <- tmvtnorm::rtmvnorm(n, mean=mu, sigma=sigma^2, lower=c(0), upper=c(Inf))

We can choose to provide one of the parameters if the true value is known, or estimate both.

# Assuming sigma is known, estimated muhat should be close to the truth 0
mu_sigmasqhat(x, mode, param1, param2, sigma=sigma^2)
#> [1] 0.01015329 1.00000000
# Assuming mu is known, estimated sigmasqhat should be close to the truth 1
mu_sigmasqhat(x, mode, param1, param2, mu=mu)
#> [1] 0.000000 1.012342
# Assuming both are unknown, estimates should be close to the truth c(0,1)
mu_sigmasqhat(x, mode, param1, param2)
#> [1] -0.01763796  1.02421948

## Variance Estimation and Confidence Intervals

The varhat() function provides a numerical calculation of the asymptotic variance of the estimator of one parameter assuming knowing the other.

# The asymptotic variance for mu assuming sigmasq known
varhat(mu, sigma^2, mode, param1, param2, est_mu=TRUE, domain=domain, tol=1e-6)
#> [1] 2.795459
# The asymptotic variance for sigmasq assuming mu known
varhat(mu, sigma^2, mode, param1, param2, est_mu=FALSE, domain=domain, tol=1e-6)
#> [1] 2.379507

Plugging in the estimated values, we can thus construct a confidence interval using the asymptotic variance evaluated at the estimates.

alpha <- 0.05
muhat <- mu_sigmasqhat(x, mode, param1, param2, sigma=sigma^2)[1]
# Assuming sigmasq is known, plug in muhat
sdmu <- sqrt(varhat(muhat, sigma^2, mode, param1, param2, est_mu=TRUE, domain=domain, tol=1e-6)/n)
# The 95% confidence interval for mu
muhat + c(-1,1)*qnorm(1-alpha/2)*sdmu
#> [1] -0.09317385  0.11348043

sigmasqhat <- mu_sigmasqhat(x, mode, param1, param2, mu=mu)[2]
# Assuming mu is known, plug in sigmasqhat
sdsigmasq <- sqrt(varhat(mu, sigmasqhat, mode, param1, param2, est_mu=FALSE, domain=domain, tol=1e-6)/n)
# The 95% confidence interval for sigmasq
sigmasqhat + c(-1,1)*qnorm(1-alpha/2)*sdsigmasq
#> [1] 0.9155719 1.1091115

We now generate 10000 samples, each with sample size 1000, and test the coverage probability of the confidence intervals constructed as above.

samples <- 10000
# 1000*10000 random samples from TN(mu, sigma)
x <- matrix(tmvtnorm::rtmvnorm(n*samples, mean=mu, sigma=sigma^2, lower=c(0), upper=c(Inf)), nrow=n, ncol=samples)

# One muhat estimate for each of the 10000 samples
muhats <- apply(x, 2, mu_sigmasqhat, mode=mode, param1=param1, param2=param2, sigmasq=sigma^2)[1,]
# One standard error for each of the 10000 samples, plugging in the estimates for mu and true sigmasq
ses_mu <- sapply(muhats, function(muhat){sqrt(varhat(muhat, sigma^2, mode, param1, param2, est_mu=TRUE, domain=domain, tol=1e-6)/n)})
# Coverage probability of the confidence intervals for mu, should be close to 0.95
mean(mu >= muhats - qnorm(1-alpha/2)*ses_mu & mu <= muhats + qnorm(1-alpha/2)*ses_mu)
#> [1] 0.9514

# One sigmasqhat estimate for each of the 10000 samples
sigmasqhats <- apply(x, 2, mu_sigmasqhat, mode=mode, param1=param1, param2=param2, mu=mu)[2,]
# One standard error for each of the 10000 samples, plugging in the estimates for sigmasq and true mu
ses_sigmasq <- sapply(sigmasqhats, function(sigmasqhat){sqrt(varhat(mu, sigmasqhat^2, mode, param1, param2, est_mu=FALSE, domain=domain, tol=1e-6)/n)})
# Coverage probability of the confidence intervals for sigmasq, should be close to 0.95
mean(sigma^2 >= sigmasqhats - qnorm(1-alpha/2)*ses_sigmasq & sigma^2 <= sigmasqhats + qnorm(1-alpha/2)*ses_sigmasq)
#> [1] 0.9472

We can also compare the asymptotic variance of the estimator to the empirical variance, as well as the Cram'er-Rao lower bound.

trials <- 100
# Reshape x into a 1000*100*100 array
x <- array(x, dim=c(n,trials,samples/trials))
# The asymptotic variance for mu assuming sigmasq known
varhat(mu, sigma^2, mode, param1, param2, est_mu=TRUE, domain=domain, tol=1e-6)
#> [1] 2.795459
# Empirical variance by averaging 100 variance estimates over 100 samples, each with sample size 1000
n*mean(apply(apply(x, c(2,3), mu_sigmasqhat, mode=mode, param1=param1, param2=param2, sigmasq=sigma^2)[1,,], 1, var))
#> [1] 2.780947
# The Cram\'er-Rao lower bound on the variance of unbiased estimators of mu
crbound_mu(mu, sigma)
#> [1] 2.751938

# The asymptotic variance for sigmasq assuming mu known
varhat(mu, sigma^2, mode, param1, param2, est_mu=FALSE, domain=domain, tol=1e-6)
#> [1] 2.379507
# Empirical variance by averaging 100 variance estimates over 100 samples, each with sample size 1000
n*mean(apply(apply(x, c(2,3), mu_sigmasqhat, mode=mode, param1=param1, param2=param2, mu=mu)[2,,], 1, var))
#> [1] 2.34757
# The Cram\'er-Rao lower bound on the variance of unbiased estimators of sigmasq
crbound_sigma(mu, sigma)
#> [1] 2

## Plots from Yu et al (2019)

The following plot reproduces Figure 1 from Yu et al (2019), in which we compare the asymptotic variances and efficiencies (relative to the Cram'er-Rao lower bound) of our estimators with different $$h$$ functions when estimating the $$\mu$$ parameter, assuming $$\sigma^2=1$$ is known.

modes <- c("min_log_pow", "min_log_pow", "log_pow", "min_pow", "min_pow", "min_pow", "pow", "pow")
param1s <- c(1,1,1,1,1,2,1,2)
param2s <- c(1,2,-1,1,2,1,-1,-1)
h_names <- c("min(log(1+x),1)", "min(log(1+x),2)", "log(1+x)", "min(x,1)", "min(x,2)","min(x^2,1)","x","x^2")

mus <- seq(-4,4,length.out=100)
sigma <- 1
curves <- matrix(NA, nrow=length(h_names)+1, ncol=length(mus))
for (mu_i in 1:length(mus)){
for (hi in 1:length(h_names))
curves[hi,mu_i] <- varhat(mus[mu_i], sigma^2, modes[hi], param1s[hi], param2s[hi], est_mu=TRUE, domain=domain, tol=1e-6)
curves[length(h_names)+1,mu_i] <- crbound_mu(mus[mu_i],sigma^2)
}

par(mfrow=c(1,2), mar=c(5,5,1,1), xpd=TRUE, bty="n")
plot(c(),c(), xlim=range(mus), ylim=log(range(curves)), xlab=expression(mu[0]), ylab="log(Asymptotic var)", cex.lab=1, cex.axis=1)
order <- c(1,2,3,4,7,6,5,8)
colors <- c("red","darkorange1","gold3","green","forestgreen",
"hotpink","blue",gray.colors(1,start=0.7,end=0.7))[order]
for (hi in 1:length(h_names)){
points(mus, log(curves[hi, ]), col = colors[hi], type="l", lwd=3)
}
points(mus, log(curves[length(h_names)+1,]), lty=2, type="l", lwd=3)
legend("topright", x.intersp = 0.2, y.intersp=0.7, inset = c(0.45,0.0), lty = c(rep(1,length(h_names)),2), ncol = 1, lwd=c(rep(3,length(h_names)),2), seg.len=1, bty = "n", text.width = .1, col=c(colors[match(1:length(h_names), order)],"black"), legend=c(h_names[match(1:length(h_names), order)],"C-R lower bound"), cex=1.2)

plot(c(),c(),xlim=range(mus),ylim=range(t(curves[length(h_names)+1,]/t(curves))), xlab=expression(mu[0]), ylab="Efficiency", cex.lab=1, cex.axis=1)
for (hi in 1:length(h_names)){
points(mus, curves[length(h_names)+1,]/curves[hi,], col=colors[hi], type="l", lwd=3)
}

The following plot reproduces Figure 2 from Yu et al (2019), in which we compare the asymptotic variances and efficiencies (relative to the Cram'er-Rao lower bound) of our estimators with different $$h$$ functions when estimating the $$\sigma^2$$ parameter, assuming $$\mu=0.5$$ is known.

sigmas <- exp(seq(log(0.1),log(10),length.out=100))
mu <- 0.5
curves2 <- matrix(NA, nrow=length(h_names)+1, ncol=length(sigmas))
for (sigma_i in 1:length(sigmas)){
for (hi in 1:length(h_names))
curves2[hi,sigma_i] <- varhat(mu, sigmas[sigma_i]^2, modes[hi], param1s[hi], param2s[hi], est_mu=FALSE, domain=domain, tol=1e-6)
curves2[length(h_names)+1,sigma_i] <- crbound_sigma(mu,sigmas[sigma_i]^2)
}

par(mfrow=c(1,2), mar=c(5,5,1,1), xpd=TRUE, bty="n")
plot(c(),c(), xlim=range(sigmas), ylim=log(range(curves2)), xlab=expression(sigma[0]), ylab="log(Asymptotic var)", cex.lab=1, cex.axis=1)
order <- c(4,6,5,1,2,3,7,8)
colors <- c("red","darkorange1","gold3","green","forestgreen",
"hotpink","blue",gray.colors(1,start=0.7,end=0.7))[order]
for (hi in 1:length(h_names)){
points(sigmas, log(curves2[hi, ]), col = colors[hi], type="l", lwd=3)
}
points(sigmas, log(curves2[length(h_names)+1,]), type="l", lty=2, lwd=3)
legend("topright", x.intersp = 0.2, y.intersp=0.7, inset = c(0.5,0.3), lty = c(rep(1,length(h_names)),2), ncol = 1, lwd=c(rep(3,length(h_names)),2), seg.len=1, bty = "n", text.width = .1, col=c(colors[match(1:length(h_names), order)],"black"), legend=c(h_names[match(1:length(h_names), order)],"C-R lower bound"), cex=1)

plot(c(),c(),xlim=range(sigmas),ylim=range(t(curves2[length(h_names)+1,]/t(curves2))), xlab=expression(sigma[0]), ylab="Efficiency", cex.lab=1, cex.axis=1)
for (hi in 1:length(h_names)){
points(sigmas, curves2[length(h_names)+1,]/curves2[hi,], col=colors[hi], type="l", lwd=3)
}