---
title: "Peeling algorithm on Zadeh's Example"
author: "Claude Boivin^[Retired Statistician, Stat.ASSQ]"
date: "2022-03-03"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Peeling algorithm on Zadeh's Example}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include=FALSE}
#
# devtools::load_all(".") # only used in place of dst when testing with R-devel
library(dst) # attach package dst
#
# knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(
collapse = TRUE,
comment = ""
)
```
# Summary
The criticism of Dempster-Shafer Theory (DST) by L. A. Zadeh ^[L. A. Zadeh. A mathematical theory of evidence (book review). AI Magazine, 55(81—83), 1984] has generated a lot of discussions and articles on the subject of conflicting evidence. In 2005, R. Haenni ^[R. Haenni. Shedding New Light on Zadeh’s Criticism of Dempster’s Rule of Combination. Conference: Information Fusion, 2005 8th International Conference] showed that the surprising result obtained by applying Dempster's rule to Zadeh's example was more due to the modelling of the situation than Dempster's rule not working. I add my grain of salt to the debate on Zadeh's example by showing a formulation of the problem as a small belief network and using Dempster's rule of combination to obtain a realistic result. This belief network gives the same results as the combination of the two evidences with the disjunctive rule of combination ^[P. Smets (1993). Belief Functions: The Disjunctive Rule of Combination and the Generalized Bayesian Theorem. IRIDIA - Université Libre de Bruxelles, Brussels, Belgium]. At the same time, I show how to do the calculations using my *R* package *dst*^[https://cran.r-project.org/package=dst].
## Zadeh's Example
We suppose that a patient is examined by two doctors, A and B. A’s diagnosis is that P has either meningitis (M), with probability 0.99, or brain tumor (T), with probability 0.01. B agrees with A that the probability of a brain tumor is 0.01, but believes that it is the probability of concussion (C) rather than meningitis that is 0.99.
Zadeh considers the same space of diseases {M, T, C} for the two experts. Hence, after the combination of the two pieces of evidence by Dempster's rule, we find as a result that the belief of a brain tumor is certain.
```{r "Zadeh's example", echo = FALSE, warning=FALSE}
# Diagnosis from Expert 1. Coding the evidence with the bca function
Expert1 <- bca(tt = matrix(c(1,0,0,0,1,0,1,1,1), ncol=3, byrow=TRUE), m= c(0.99, 0.01, 0), cnames =c("M", "T", "C"), varnames = "Diagnosis1", idvar = 1)
# show the definition of Expert1
cat("Space of possibilities and Basic Chance Assignment of Expert 1")
Expert1$valuenames
cat("\r")
bcaPrint(Expert1)
#
# Diagnosis from Expert 2. Coding the evidence with the bca function
Expert2 <- bca(tt = matrix(c(0,1,0,0,0,1,1,1,1), ncol=3, byrow=TRUE), m= c(0.01, 0.99, 0), cnames =c("M", "T", "C"), varnames = "Diagnosis2", idvar = 2)
# show the definition of Expert2
cat("\r")
cat("Space of possibilities and Basic Chance Assignment of Expert 2")
Expert2$valuenames
cat("\r")
bcaPrint(Expert2)
# Combination of Expert 1 and Expert 2 using Dempster's rule
cat("\r")
cat("Combination of the two experts by Dempster's rule")
Ze1e2 <- nzdsr(dsrwon(Expert1, Expert2, relnb = 1))
zz <- tabresul(Ze1e2)
format(as.data.frame(zz$mbp), digits=2)
```
Indeed, the result does not reflect the opinions of the two experts. Before rejecting Dempster's rule as inappropriate to this situation, let's look more closely at the problem at hand.
# A correct solution using Dempster's rule of combination
## Diagnosis of the two experts
Let's take *Expert one*. *Expert number one* distributes the whole mass between the two singletons {M} and {T}. *Expert number one* does not consider {C} as a possibility. Hence we conclude that the space of possibilities of *Expert number one* cannot be {M, T, C}. We can say that *Expert number one* has restricted the space of possibilities of his/her diagnostic to the set {M, T}:
$F(D1) = \{M, T\}$. For simplicity, we write $D1 =\{M, T\}$.
The same line of reasoning is applied to *Expert number two*. The whole mass of one is allotted to the set {T, C}, and the third possibility (M) is not considered at all.
Hence $D2 = \{T, C\}$.
I show the coding of these two pieces of evidence with the function *bca* of the package *dst*.
```{r "pieces of evidence", echo = FALSE, warning=FALSE}
library(dst) # attach package dst
#
# Diagnosis from first expert (evidence e1 attached to variable D1)
e1 <- bca(tt = matrix(c(1,0,0,1,1,1), ncol=2, byrow=TRUE), m= c(0.99, 0.01, 0), cnames =c("M", "T"), varnames = "D1", idvar = 1)
#
# show the definition of e1
cat("Space of possibilities and Basic Chance Assignment of Expert 1")
e1$valuenames
cat("\r")
bcaPrint(e1)
#
# Diagnosis from second expert (evidence e2 attached to variable D2)
e2 <- bca(tt = matrix(c(1,0,0,1,1,1), ncol=2, byrow=TRUE), m= c(0.99, 0.01, 0), cnames =c("C", "T"), varnames = "D2", idvar = 2)
#
# show the definition of e2
cat("\r")
cat("Space of possibilities and Basic Chance Assignment of Expert 2")
e2$valuenames
cat("\r")
bcaPrint(e2)
```
## Linking the Experts to... the Patient
The two experts are reasoning in two different spaces of possibilities. To be able to combine their diagnosis, we need a common ground. This can be done if we introduce a third person, the patient, with a variable of interest, his/her disease (D). Then it is natural to take the union of the space of possibilities of *Expert number one* and *Expert number two* as the space of possibilities of the patient:
$D = \{M, T\} \cup \{C, T\} = \{M, T, C\}$.
Thus, the diagnosis of the patient's disease involves pooling the assessments of the two experts, using the "or" operator. This situation is described by a relation of implication between experts and patient:
r1: $D1 \cup D2 \rightarrow D$.
The relation *r1* is represented in the product space $\prod(D1, D2, D)$ by one focal set of mass one:
$m(M C M + M C C + M T M + M T T + T C T + T C C + T T T) = 1$ (for simplicity, the "+" sign is used as the $\vee$ disjunctive operator in the functions of the package dst). Now I use the function *bcaRel* to code this relation.
```{r "relation", echo = FALSE, warning=FALSE}
# 1. Defining the relation with a (0,1) matrix
tt_r1 <- matrix(c(1,0,1,0,1,0,0,1,0,1,0,0,0,1,1,0,0,1,1,0,0,1,0,0,1,0,1,0,0,1,1,0,0,1,0,0,1,1,0,0,0,1,0,1,0,1,0,1,0,1,1,1,1,1,1,1), ncol = 7,byrow = TRUE)
colnames(tt_r1) = c("M", "T", "C", "T", "M", "T", "C")
#
# 2. Setting the mass function
spec_r1 = matrix(c(rep(1,7),2, rep(1,7), 0), ncol = 2, dimnames = list(NULL, c("specnb", "mass")))
#
# 3. Names of variables names and dimension of their space of possibilities
info_r1 =matrix(c(1:3, 2,2,3), ncol = 2, dimnames = list(NULL, c("varnb", "size")) )
#
# The relation between e1, e2 and a patient p
r1 <-bcaRel(tt = tt_r1, spec = spec_r1, infovar = info_r1, varnames = c("D1", "D2", "D"), relnb = 1)
#
cat(" The relation r1")
bcaPrint(r1)
```
## The belief network
We now have all the elements of a small network made of one relation (r1) between three variables: Disease (D), Diagnosis1 (D1), Diagnosis2 (D2), and two pieces of evidence coming from *Expert one* (e1) and *Expert two* (e2).
The three variables *D*, *D1* and *D2* are the nodes of the graph. The edges (hyperedges) are given by the relation *r1* and the two pieces of evidence *e1* and *e2*.
Using the igraph package, ^[Csardi G, Nepusz T: The igraph software package for complex network research, InterJournal, Complex Systems 1695. 2006. https://igraph.org] a bipartite graph corresponding to the hypergraph can be obtained.
```{r, fig.show='hold', fig_caption: yes, echo=FALSE, message=FALSE}
# The network
if (requireNamespace("igraph", quietly = TRUE) ) {
library(igraph)
# Encode pieces of evidence and relations with an incidence matrix
rel1 <- 1*1:3 %in% r1$infovar[,1]
ev1 <- 1*1:3 %in% e1$infovar[,1]
ev2 <- 1*1:3 %in% e2$infovar[,1]
# information on variables
meddiag_vars1 <- c(r1$valuenames)
meddiag_vars <- rbind(r1$infovar)
meddiag_var_names <-names(meddiag_vars1)
rownames(meddiag_vars) <- meddiag_var_names
# infos on relations
meddiag_data_names <- c("e1", "e2", "r1")
# the incidence matrix
meddiag_hgm <- matrix(c(ev1,ev2, rel1), ncol=3, dimnames = list(c("D1", "D2", "D"), c("e1","e2", "r1")))
meddiag <- list(meddiag_hgm, meddiag_var_names, meddiag_data_names)
#
## The graph structure of the problem
#
meddiag_hg <- graph_from_biadjacency_matrix(incidence = meddiag_hgm, directed = FALSE, multiple = FALSE, weighted = NULL,add.names = NULL)
V(meddiag_hg)
# Show variables as circles, relations and evidence as rectangles
V(meddiag_hg)$shape <- c("circle", "crectangle")[V(meddiag_hg)$type+1]
V(meddiag_hg)$label.cex <- 0.6
V(meddiag_hg)$label.font <- 2
# render graph
plot(meddiag_hg, vertex.label = V(meddiag_hg)$name, vertex.size=(3+6*V(meddiag_hg)$type)*6, sub="Belief network for Zadeh's Example")
}
```
## Calculations on the Belief Network: The peeling algorithm
Our goal is the calculation of the belief function of the variable of interest "D" (Disease of the patient). We apply an algorithm called "Peeling" to the belief network. This is a process of successive elimination of variables (peeling) until only the variable of interest (*D* here) remains. The elimination of a variable has the effect of integrating its contribution to the reduced graph.
Four parameters are necessary to trigger the algorithm. The first three are already defined when constructing the hypergraph. The fourth parameter is an order of elimination of the variables that we have to set.
1. Identification of variables and their space of possibilities
$$F(D1) = \{M, T\}$$
$$F(D2) = \{M, C\}$$
$$F(D) = \{M, T, C\}$$
2. Incidence matrix of the graph (nodes and edges)
```{r Print incidence matrix, echo=FALSE}
cat("Row names are variables names (nodes).\n")
cat("Column names are for pieces of evidence and relations (edges).\n")
print(meddiag_hgm)
```
3. The names of data specifications (evidence and relations between variables)
```{r Print names of evidence and relations, echo=FALSE}
meddiag_data_names
```
4. Variable numbers are used to fix the order of elimination. Here we eliminate D1 first, then D2.
```{r Define elimination order, echo=FALSE}
format(as.data.frame(cbind(r1$infovar, r1$varnames) ) )
elim_order = c(1, 2, 3)
```
The calculations involved follow the principles of the valuation language of Shenoy ^[P. P. Shenoy. A Valuation-Based Language for Expert systems. International Journal of Approximate Reasoning 1989, 3 383--411]; see also ^[P. P. Shenoy. Valuation-Based Systems. Third School on Belief Functions and Their Applications, Stella Plage, France. September 30, 2015]. The variables are linked to functions (called valuations). A function can be a piece of evidence attached to a variable or a relation between two or more variables.
Three kinds of operations are involved in the process of variable elimination:
a) the minimal (vacuous) extension of a mass function to a larger space of possibilities;
b) the combination of two mass functions by Dempster's rule;
c) the marginalization of a mass function, i.e. eliminating a variable to reduce the function to a smaller space of possibilities. Let's do it.
First step: Eliminate variable D1 (Diagnosis1). The mass function *e1* is extended to the space $\prod(D1, D2, D)$; then *e1 extended* is combined with *r1* by Dempster's rule; finally, D1 is eliminated by marginalizing the result of the combination to $\prod(D2, D)$. The mass function obtained is named *rel_2*.
Second step: Eliminate variable D2 (Diagnosis2). Evidence *e2* is extended to the space $\prod(D2, D)$; Then *e2 extended* is combined with *rel_2* by Dempster's rule; the result of the combination is marginalized to D to produce the final result.
## The result
```{r The peeling, echo = FALSE, warning=FALSE}
# cat("\ ")
p <- peeling(vars_def = meddiag_vars1, hgm = meddiag_hgm, hg_rel_names = meddiag_data_names, elim_order = c(1, 2, 3), verbose = FALSE )
#
# add singletons with 0 mass to show all singletons in the results
p_sing <- addTobca(x = p, tt = matrix(c(1,0,0,0,0,1), ncol=3))
# "The final result after elimination of variable D2"
cat("\ ")
zz <- tabresul(p_sing)
format(as.data.frame(zz$mbp), digits=2)
```
The plausibility ratio column shows that the odds of $M \vee C$ against T are 50:1. hence, the disease of the patient must be M or C. We also see that each single hypothesis, M and C, remains highly plausible (0.99). Although there is some support for T, its plausibility is very weak at 0.019.
Finally, we use the plausibility transformation ^[Cobb, B. R. and Shenoy, P.P. (2006). On the plausibility transformation method for translating belief function models to probability models. Journal of Approximate Reasoning, 41(3), April 2006, 314–330] to look at the results from the point of view of probability distribution. We see again that the odds for M against T or for C against T are very similar.
```{r, echo = FALSE, warning=FALSE}
format(as.data.frame(plautrans(p) ), digits = 4)
```