We detail here our analysis of the dataset of Martin et al. (2023) presented in our article as a case study.

Context

We focus on the first experiment of Martin et al. (2023), which evaluated how well 22 participants detected object manipulations in an immersive environment. The authors studied the influence of four factors: the Type of manipulation (4 levels), its Distance from the observer (2 levels), the Complexity of the manipulated item (2 levels), and the field of view (not examined here). To analyze the effect of the first three factors, the authors conducted a \(4 \times 2 \times 2\) repeated-measures ANOVA using ART.

The response variable we are interested in is binary: Detected vs. Not Detected.

Reading the dataset

We start by reading the data provided by the authors:

data <- read.csv("completed_data.csv", sep = ";", header = TRUE)

data <- data[data$Type != "FOV", ]           # Discard the field-of-view conditions
data <- data[c(2, 3, 4, 8, 12, 13, 14, 15)]  # Keep the relevant columns

data$UserID <- as.factor(data$UserID)
data$TrialID <- as.factor(data$TrialID)

data$Type <- as.factor(data$Type)
data$Complexity <- factor(data$Complexity, ordered = TRUE)
data$Distance <- factor(data$Distance, ordered = TRUE)

data$Detected <- as.logical(data$Detected)

Analysis

We conduct an analysis using both ART and a parametric linear mixed-effects model (PAR). We use the authors’ original model formulation, in which UserID and TrialID are both treated as random effects:

library(ARTool)
library(lmerTest)

m_art <- art(Detected ~ 1 + Type*Distance*Complexity + (1|UserID) + (1|TrialID), data = data)
m_lmer <- lmer(Detected ~ 1 + Type*Complexity*Distance + (1|UserID) + (1|TrialID), data = data)

Results are as follows:

printAnalysis <- function(num, method, model) {
  cat(num, ". Analysis using ", method, "\n\n", sep="")
  print(anova(model))
}

printAnalysis(1, "LMER model (Parametric)", m_lmer)
## 1. Analysis using LMER model (Parametric)
## 
## Type III Analysis of Variance Table with Satterthwaite's method
##                           Sum Sq Mean Sq NumDF  DenDF F value    Pr(>F)    
## Type                     2.21940 0.73980     3 408.93  3.4592 0.0164750 *  
## Complexity               1.94406 1.94406     1 406.25  9.0903 0.0027310 ** 
## Distance                 0.52738 0.52738     1 401.83  2.4660 0.1171225    
## Type:Complexity          2.89761 0.96587     3 405.64  4.5163 0.0039488 ** 
## Type:Distance            1.19950 0.39983     3 406.84  1.8696 0.1340878    
## Complexity:Distance      1.91345 1.91345     1 405.13  8.9471 0.0029488 ** 
## Type:Complexity:Distance 3.10180 3.10180     1 399.78 14.5037 0.0001618 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
printAnalysis(2, "ART", m_art)
## 2. Analysis using ART
## 
## Analysis of Variance of Aligned Rank Transformed Data
## 
## Table Type: Analysis of Deviance Table (Type III Wald F tests with Kenward-Roger df) 
## Model: Mixed Effects (lmer)
## Response: art(Detected)
## 
##                                   F Df Df.res     Pr(>F)    
## 1 Type                      0.16651  3 408.64 0.91892657    
## 2 Distance                  4.66489  1 404.64 0.03137207   *
## 3 Complexity                5.17436  1 403.19 0.02344709   *
## 4 Type:Distance             0.46214  3 406.53 0.70887683    
## 5 Type:Complexity           5.71599  3 404.90 0.00077128 ***
## 6 Distance:Complexity      10.86615  1 405.21 0.00106569  **
## 7 Type:Distance:Complexity 12.13378  1 401.64 0.00054963 ***
## ---
## Signif. codes:   0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We observe substantial discrepancies between the results of the two methods. A further problem is that the experimental design is highly unbalanced. Given the strong interaction effects, interpreting main effects can be problematic.
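One way to inspect the imbalance is to tabulate the number of observations per cell of the design, using the dataset loaded above (a quick diagnostic; a balanced design would show the same count in every cell):

```r
# Count observations per cell of the 4 x 2 x 2 design
xtabs(~ Type + Distance + Complexity, data = data)
```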

It is also important to note that ARTool’s diagnostics indicate an issue, since some of the F values of ANOVAs on aligned responses are clearly non-zero:

print(m_art)
## Aligned Rank Transform of Factorial Model
## 
## Call:
## art(formula = Detected ~ 1 + Type * Distance * Complexity + (1 | 
##     UserID) + (1 | TrialID), data = data)
## 
## Column sums of aligned responses (should all be ~0):
##                     Type                 Distance               Complexity 
##                        0                        0                        0 
##            Type:Distance          Type:Complexity      Distance:Complexity 
##                        0                        0                        0 
## Type:Distance:Complexity 
##                        0 
## 
## F values of ANOVAs on aligned responses not of interest (should all be ~0):
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.000005 0.000432 0.008192 0.604049 0.031647 7.207387

Fictional dataset

Let us consider a different dataset that we randomly generated from a population with no effect of any of the three factors (Type, Distance, and Complexity) and no interactions. To simplify the analysis, we assume that each user completed a single detection task per combination of the three factors, so the data have no TrialID factor. The design is fully balanced, and the average detection rate is \(46\%\), which is very similar to the detection rate observed by Martin et al. (2023).
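The fictional dataset is read from a CSV file below, but data with these properties could be generated along the following lines (a sketch, not the exact script we used; the factor level names are illustrative, while the number of users and the detection probability match the description above):

```r
set.seed(42)  # for reproducibility

# Fully balanced design: 22 users x 4 Types x 2 Distances x 2 Complexities,
# with one detection task per user and condition
sim <- expand.grid(UserID     = 1:22,
                   Type       = c("T1", "T2", "T3", "T4"),
                   Distance   = c("Near", "Far"),
                   Complexity = c("Low", "High"))

# No effect of any factor and no interactions: every trial is an
# independent Bernoulli draw with the same detection probability of 46%
sim$Detected <- rbinom(nrow(sim), size = 1, prob = 0.46) == 1
```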

We repeat our analysis with the two methods:

data2 <- read.csv("binary-example.csv", sep = ",", header = TRUE)

data2$UserID <- as.factor(data2$UserID)
data2$Type <- as.factor(data2$Type)
data2$Complexity <- factor(data2$Complexity, ordered = TRUE)
data2$Distance <- factor(data2$Distance, ordered = TRUE)
data2$Detected <- as.logical(data2$Detected)

m2_art <- art(Detected ~ 1 + Type*Distance*Complexity + (1|UserID), data = data2)
m2_lmer <- lmer(Detected ~ 1 + Type*Complexity*Distance + (1|UserID), data = data2)

Results are as follows:

printAnalysis(1, "LMER model (Parametric)", m2_lmer)
## 1. Analysis using LMER model (Parametric)
## 
## Type III Analysis of Variance Table with Satterthwaite's method
##                           Sum Sq  Mean Sq NumDF DenDF F value Pr(>F)
## Type                     0.19318 0.064394     3   315  0.4040 0.7502
## Complexity               0.01136 0.011364     1   315  0.0713 0.7896
## Distance                 0.28409 0.284091     1   315  1.7825 0.1828
## Type:Complexity          0.46591 0.155303     3   315  0.9744 0.4050
## Type:Distance            0.69318 0.231061     3   315  1.4498 0.2283
## Complexity:Distance      0.01136 0.011364     1   315  0.0713 0.7896
## Type:Complexity:Distance 0.51136 0.170455     3   315  1.0695 0.3622
printAnalysis(2, "ART", m2_art)
## 2. Analysis using ART
## 
## Analysis of Variance of Aligned Rank Transformed Data
## 
## Table Type: Analysis of Deviance Table (Type III Wald F tests with Kenward-Roger df) 
## Model: Mixed Effects (lmer)
## Response: art(Detected)
## 
##                                  F Df Df.res     Pr(>F)    
## 1 Type                      1.8875  3    315  0.1315652    
## 2 Distance                 16.6000  1    315 5.8462e-05 ***
## 3 Complexity                8.8795  1    315  0.0031083  **
## 4 Type:Distance             3.0836  3    315  0.0275870   *
## 5 Type:Complexity           1.9944  3    315  0.1147466    
## 6 Distance:Complexity       8.8897  1    315  0.0030915  **
## 7 Type:Distance:Complexity  1.1267  3    315  0.3383800    
## ---
## Signif. codes:   0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The differences between the results of the two methods are striking. As we show in the article, the Type I error rate of ART for binary data is extremely high, so its results cannot be trusted.

An alternative method for analyzing such data is to use a generalized linear mixed-effects model with a binomial distribution and a logit link:

m1_glmer <- glmer(Detected ~ 1 + Type*Complexity*Distance + (1|UserID), data = data2, family = "binomial")

A problem now is how to test main and interaction effects. One option is to conduct Wald tests (note that car’s Anova() performs Type II tests by default):

library(car)
Anova(m1_glmer)
## Analysis of Deviance Table (Type II Wald chisquare tests)
## 
## Response: Detected
##                           Chisq Df Pr(>Chisq)
## Type                     1.0830  3     0.7812
## Complexity               0.0559  1     0.8130
## Distance                 1.7032  1     0.1919
## Type:Complexity          2.9068  3     0.4062
## Type:Distance            4.5232  3     0.2102
## Complexity:Distance      0.0749  1     0.7843
## Type:Complexity:Distance 3.4277  3     0.3303

We observe that the \(p\)-values are similar to those obtained with our linear mixed-effects model, showing no evidence of any main or interaction effects. An additional method for testing effects is to compare alternative models. We build a range of simpler candidate models below:

m2_glmer <- glmer(Detected ~ 1 + Type*Complexity + (1|UserID), data = data2, family = "binomial")
m3_glmer <- glmer(Detected ~ 1 + Type*Distance + (1|UserID), data = data2, family = "binomial")
m4_glmer <- glmer(Detected ~ 1 + Complexity*Distance + (1|UserID), data = data2, family = "binomial")
m5_glmer <- glmer(Detected ~ 1 + Type + (1|UserID), data = data2, family = "binomial")
m6_glmer <- glmer(Detected ~ 1 + Complexity + (1|UserID), data = data2, family = "binomial")
m7_glmer <- glmer(Detected ~ 1 + Distance + (1|UserID), data = data2, family = "binomial")
m8_glmer <- glmer(Detected ~ 1 + (1|UserID), data = data2, family = "binomial")

We can now compare them as follows:

anova(m1_glmer, m2_glmer, m3_glmer, m4_glmer, m5_glmer, m6_glmer, m7_glmer, m8_glmer)
## Data: data2
## Models:
## m8_glmer: Detected ~ 1 + (1 | UserID)
## m6_glmer: Detected ~ 1 + Complexity + (1 | UserID)
## m7_glmer: Detected ~ 1 + Distance + (1 | UserID)
## m4_glmer: Detected ~ 1 + Complexity * Distance + (1 | UserID)
## m5_glmer: Detected ~ 1 + Type + (1 | UserID)
## m2_glmer: Detected ~ 1 + Type * Complexity + (1 | UserID)
## m3_glmer: Detected ~ 1 + Type * Distance + (1 | UserID)
## m1_glmer: Detected ~ 1 + Type * Complexity * Distance + (1 | UserID)
##          npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## m8_glmer    2 389.32 397.05 -192.66   385.32                     
## m6_glmer    3 391.25 402.84 -192.62   385.25 0.0718  1     0.7887
## m7_glmer    3 389.52 401.11 -191.76   383.52 1.7276  0           
## m4_glmer    5 393.38 412.69 -191.69   383.38 0.1438  2     0.9306
## m5_glmer    5 394.09 413.41 -192.05   384.09 0.0000  0           
## m2_glmer    9 399.06 433.83 -190.53   381.06 3.0348  4     0.5520
## m3_glmer    9 395.73 430.51 -188.87   377.73 3.3241  0           
## m1_glmer   17 405.22 470.90 -185.61   371.22 6.5152  8     0.5897

The results indicate that the simplest model (m8_glmer) is the most promising one: it has the lowest AIC and BIC, and none of the more complex models yields a significant improvement.
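Because the chi-square tests reported by anova() are only meaningful for nested pairs of models (which is why some rows above show zero degrees of freedom), a cleaner check is a direct likelihood-ratio test of the intercept-only model against the full factorial model, using the models fitted above:

```r
# Likelihood-ratio test: m8_glmer (intercept only) is nested
# within m1_glmer (full factorial model)
anova(m8_glmer, m1_glmer)
```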