Recommendations for visual predictive checks in Bayesian workflow

Authors
Affiliation

Teemu Säilynoja

Aalto University

Andrew R. Johnson

Aalto University

Osvaldo A. Martin

Aalto University

Aki Vehtari

Aalto University

Under Review

This paper is under review on the experimental track of the Journal of Visualization and Interaction.

Abstract
Introduction

A key step in the Bayesian workflow for model building is the graphical assessment of model predictions, whether these are drawn from the prior or posterior predictive distribution. The goal of these assessments is to identify whether the model is a reasonable (and ideally accurate) representation of the domain knowledge and/or observed data. Many commonly used visual predictive checks can be misleading if their implicit assumptions do not match reality. Thus, there is a need for more guidance on selecting, interpreting, and diagnosing appropriate visualizations. As a visual predictive check can itself be viewed as a model fit to data, assessing when this model fails to represent the data is important for drawing well-informed conclusions.

Demonstration

We present recommendations for appropriate visual predictive checks for observations that are continuous, discrete, or a mixture of the two. We also discuss diagnostics to aid in the selection of visual methods, specifically for detecting an incorrect assumption of continuously distributed data: identifying when data are likely to be discrete or contain discrete components, detecting and estimating possible bounds in the data, and a goodness-of-fit diagnostic for density plots made through kernel density estimates.

Conclusion

We offer recommendations and diagnostic tools to mitigate ad-hoc decision-making in visual predictive checks. These contributions aim to improve the robustness and interpretability of Bayesian model criticism practices.

Research materials

Source files of this article, as well as supplementary materials providing additional detail into the examples and case studies shown in this article, can be found at https://teemusailynoja.github.io/visual-predictive-checks.

Authorship
  • Teemu Säilynoja: Conceptualization, Methodology, Software, Investigation, Writing - Review & Editing, Visualization
  • Andrew R. Johnson: Conceptualization, Writing - Review & Editing
  • Osvaldo A. Martin: Writing - Review & Editing, Visualization
  • Aki Vehtari: Conceptualization, Methodology, Writing - Review & Editing, Supervision, Funding acquisition.
License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Conflicts of interest

The authors declare that there are no competing interests.

1 Introduction

Assessing the sensibility of model predictions and their fit to observations is a key part of most model building workflows. These assessments may reveal that the model predictions poorly represent (or replicate) the observed data, prompting the modeller to either improve their model or adjust their confidence in the predictions accordingly. In this paper, we focus on examples arising from Bayesian workflows (Gabry et al. 2019; Gelman et al. 2020), such as posterior predictive checking (PPC; Box 1980; Rubin 1984; Gelman, Meng, and Stern 1996; Gelman et al. 2013), and visualizations of data, posterior, and posterior predictive distributions. Visualizing the data distribution is useful even if we do not do any modeling. In Bayesian inference, the posterior distribution presents the uncertainty in the parameter values after conditioning on data, and the posterior predictive distribution presents the uncertainty in the predictions for new observations. The posterior inference is often performed using Monte Carlo methods (Štrumbelj et al. 2024), and the posterior and posterior predictive distributions are then represented by draws from these distributions. In the same way as we visualize data distributions, we can visualize posterior and posterior predictive distributions. In posterior predictive checking, we compare the data distribution to the posterior predictive distribution. If these distributions are not similar, the model is misspecified and we should consider improving it. These visualizations are also applicable to prior predictive checking, where one might compare prior predictive samples to reference values arising from domain knowledge (Wesner and Pomeranz 2021), and to cross-validation predictive checking, where data are compared to cross-validation predictive distributions to avoid double use of the data.

The purpose of this paper is to illustrate common issues in visual predictive checking, to provide better recommendations for different use cases, and to provide diagnostics that automatically warn when a useless or misleading visualization is used. Together, we have decades of experience in teaching visual predictive checking and helping modellers use popular visual predictive checking tools such as bayesplot (Gabry and Mahr 2024) and ArviZ (Kumar et al. 2019). We have often seen students and modellers use commonly recommended visualizations without realizing that, in their context, the specific visualization is the wrong one and either useless or even misleading. Backed by arguments from recent studies in uncertainty visualization, we provide a summary of the methods we recommend. These methods aim to offer an informed basis for decision-making, be it for the modeller to improve the model or for the end user to set expectations on the performance of the model.

The visualizations we discuss should be considered broadly applicable tools that give insight into the overall goodness of model fit to the observations, as well as into possible deviations in certain parts of the predictive density. Most modeling workflows benefit from additional visualizations tailored to their specific use cases, as there is no general visual or numeric check that would reveal every aspect of the predictive capabilities of any given model. Some commonly used methods for visualizing model predictions arose from comparisons of continuous variables, and as such have varying degrees of usability for assessing predictions with discrete values. Some may also fail in continuous cases, when the set of possible values the prediction can take has sharp upper or lower bounds.

In Section 2, we inspect the common use of kernel density estimates (KDE) in summarizing the predictive distribution and the observed data, and show comparisons between commonly used visualization approaches. We highlight common cases where these visualizations may hide useful information and propose an automated goodness-of-fit test to alert the modeller to a conflict between the data and the visualization. In Section 3, we discuss the use of visualizations assuming continuous data when the data and predictions are in reality discrete, but have a high number of unique values. We also discuss an alternative way of visualizing count data when the number of cases is large, but still small enough for the modeller to be interested in assessing the predictions as individual counts. In Section 4, we focus on workflows involving binary predictions. We showcase tools that expand the visualizations beyond the typical bar graphs and binned calibration plots and instead allow for a more robust assessment of the calibration of the predictive probabilities. In Section 5 and Section 6, we show how the visual predictive checks described for binary predictions can be extended to discrete predictions with a small or medium number of individual cases.

2 Visual predictive checks for continuous data

In this section, we consider visualizations for observations from a continuous or almost everywhere continuous distribution. We focus on three data visualizations: histograms and KDE-based continuous density plots, as they are the two most commonly used visualizations for summarizing unidimensional distributions, and quantile dot plots (Kay et al. 2016), which we see as a useful alternative that we recommend in many cases. Table 1 summarizes the main advantages and disadvantages of these three visualizations, and Figure 1 shows a side-by-side comparison. Histograms and density plots are implemented in various commonly used software packages for data visualization and are the two most common choices for initial visual PPCs (Wickham 2016; Gabry and Mahr 2024; Kumar et al. 2019; Kay 2024).

(a) Histogram
(b) KDE Density plot
(c) Quantile dot plot
Figure 1: Three visualizations of the same sample from a smooth continuous distribution. a) Histogram with 30 equally spaced bins. b) Density plot using a Gaussian kernel and bandwidth 0.37 computed with the method of Sheather and Jones (1991). c) Quantile dot plot with 100 quantiles as implemented by Kay (2024).
Histogram
  Advantages:
  • Familiar to most users.
  • Can show discontinuities with the right binning choice for the data.
  Disadvantages:
  • Requires the user to select a bin width.
  • Challenges in comparing multiple histograms.
  • Binning artefacts with non-discrete valued distributions.
KDE plot
  Advantages:
  • Familiar to most users.
  • Clear visual summary when the sample is adequately smooth.
  • Easy to stack multiple plots for comparison.
  Disadvantages:
  • Tendency to oversmooth.
  • Challenges with bounded data and discontinuities.
  • The selected kernel bandwidth can greatly impact the visualization.
Quantile dot plot
  Advantages:
  • Dot positions adapt to the local smoothness of the sample.
  • Easily quantifiable tail probability estimation.
  Disadvantages:
  • Not familiar to most users.
  • If the dot area is fixed and dots are restricted to circles, the aspect ratio is locked unless vertical space is added between dots.
  • Exceptions for discrete distributions need to be implemented.
Table 1: Summary of advantages and disadvantages of the three distribution visualizations discussed in the article.

Histogram

Here, we consider the most commonly used histograms, which are centered over the data and have bins of equal width (see an example in Figure 1 (a)). The bin width is either chosen by the modeller or through a heuristic, such as the Freedman-Diaconis rule (Freedman and Diaconis 1981). Often, instead of choosing the bin width directly, the modeller inputs a desired number of bins, and an equal partitioning of the data range is computed with an adequate number of breaks.
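As a minimal sketch of how such a heuristic works, the Freedman-Diaconis bin width can be computed in a few lines of NumPy; the sample `x` and all variable names are our own illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)  # an illustrative sample

# Freedman-Diaconis rule: bin width = 2 * IQR / n^(1/3)
iqr = np.subtract(*np.percentile(x, [75, 25]))
bin_width = 2 * iqr / len(x) ** (1 / 3)
n_bins = int(np.ceil((x.max() - x.min()) / bin_width))

# NumPy exposes the same rule directly through bins="fd".
counts, edges = np.histogram(x, bins="fd")
```

The rule uses the interquartile range rather than the standard deviation, which makes it robust to heavy tails at the cost of sometimes producing very narrow bins for long-tailed data.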

KDE-based density plot

The most typical KDE-based density plot, often simply called a density plot, is a line or filled area representing a density approximation obtained through convolution of the data with some kernel function (see an example in Figure 1 (b)). Most commonly, a Gaussian kernel is used for the approximation, and a kernel bandwidth is selected through one of the widely used algorithms (Sheather and Jones 1991; Scott 1992; Silverman 2018). As the default implementation of the KDE plot in most visualization software packages is usually enough to produce aesthetically pleasing smooth summaries of the data, KDE plots are a very popular method of visually summarizing data.
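As an illustration of how strongly the bandwidth choice shapes a Gaussian KDE, the sketch below uses SciPy's `gaussian_kde`; note that SciPy ships Silverman's and Scott's rules but not the Sheather-Jones selector, so the undersmoothed variant here is produced simply by shrinking the bandwidth factor:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
x = rng.normal(size=1000)  # an illustrative sample

# Gaussian KDE with Silverman's rule of thumb for the bandwidth.
kde_default = gaussian_kde(x, bw_method="silverman")
# An undersmoothed variant: the same KDE with a quarter of the bandwidth.
kde_narrow = gaussian_kde(x, bw_method=kde_default.factor / 4)

# Evaluate on a grid spanning the data range to draw the density line.
grid = np.linspace(x.min(), x.max(), 512)
dens = kde_default(grid)
```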

Quantile dot plot

The quantile dot plot (Kay et al. 2016) is a dot plot where a set number of the observed quantiles—we use one hundred—are visualized instead of the observations themselves (see an example in Figure 1 (c)). This results in a visual density estimator with a low bias and high variance similar to that of a histogram (Wilkinson 1999). Kay et al. (2016) show that a quantile dot plot with one hundred quantiles performs very similarly to a KDE-based density plot in tasks of estimating probabilistic predictions from visualizations. Compared to KDEs, quantile dot plots have the added benefit of allowing fast visual probability estimation in the tails of the distribution. In our experience, and as shown in Section 2.4 and Section 2.6 for the cases of discontinuities and outliers, quantile dot plots also offer a better visualization than kernel density plots and histograms.
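Computing the dots of a quantile dot plot is straightforward: take a fixed number of evenly spaced quantiles of the sample and stack them into bins of dot width. A sketch of the first step, using midpoint quantile levels (one common convention; the stacking and drawing are left to a plotting package such as ggdist):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)  # an illustrative sample

# One dot per quantile: 100 evenly spaced (midpoint) quantile levels.
n_q = 100
probs = (np.arange(n_q) + 0.5) / n_q
dots = np.quantile(x, probs)  # horizontal positions of the dots
```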

Advantages and disadvantages of the three visualizations

In Table 1, we have collected what we see as the main advantages and disadvantages of histograms, KDE density plots, and quantile dot plots for visualizing observations and assessing the quality of model predictions. Both the histogram and the KDE density plot have commonly recognized drawbacks, summarized by a trade-off between over- and under-smoothing.

The choice of bin width and breakpoints causes histograms to suffer from binning artefacts (Dimitriadis, Gneiting, and Jordan 2021). Additionally, comparing the histogram of observations to multiple histograms visualizing predictive samples may be difficult.

Density plots from KDEs, as shown in Section 2.4, Section 2.5, and Section 2.6, have a tendency either to hide details such as discontinuities in the observation distribution when the chosen bandwidth is too large, or to over-fit to the sample when the bandwidth is too small. When over-fitting, comparing multiple density plots may be difficult, as there is more variation in the estimate of the underlying distribution. These drawbacks are especially common in standard implementations of KDE density plots. More specialized software often implements measures, such as automated boundary detection and less conservative bandwidth selection algorithms, to mitigate these issues (Kumar et al. 2019; Kay 2024).

Based on our experience, we prefer quantile dot plots in many cases, as they offer a low-overhead solution for plotting the data without defining a suitable binning and are flexible enough to represent distributions with long tails.

Assessing the goodness-of-fit of a density visualization

As KDE plots, histograms, and quantile dot plots all produce a visualization of a density, we can assess the representativeness of the visualization by assessing the goodness-of-fit of the implied density estimate to the data being visualized. To assess the goodness-of-fit of a density estimate \hat f, we test for the uniformity of the probability integral transformed (PIT) data \lbrace x_1,\dots, x_N\rbrace when the density approximator is used for the transform. The PIT value, u_i, of x_i w.r.t. a density estimator \hat f is defined as the cumulative distribution value, u_i = \int_{-\infty}^{x_i}\hat f(x)\,dx. \tag{1}

The underlying principle of this goodness-of-fit test is that, when computed with regard to the true underlying density f s.t. x_i \sim f, the PIT values satisfy u_i \sim \mathbb U(0,1), where \mathbb U(0,1) is the standard uniform distribution.
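This principle is easy to verify numerically: transforming draws through the CDF of their own distribution yields values indistinguishable from standard uniform draws. A small sketch (our own example):

```python
import numpy as np
from scipy.stats import norm, kstest

rng = np.random.default_rng(1)
x = rng.normal(size=2000)

# PIT with respect to the true distribution: u_i = F(x_i).
u = norm.cdf(x)

# A Kolmogorov-Smirnov test should find no evidence against uniformity.
pvalue = kstest(u, "uniform").pvalue
```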

Our uniformity test of choice, due to its graphical representation that further enhances the explainability of the results, is the graphical uniformity test proposed by Säilynoja, Bürkner, and Vehtari (2022). This test provides simultaneous 1 - \alpha level confidence bands for the uniformity of the empirical cumulative distribution function (ECDF) of the PIT values. In the graphical test, the PIT-ECDF and the simultaneous confidence intervals are evaluated at a set of equidistant points along the unit interval. An example of these simultaneous confidence intervals for the ECDF of the PIT values, F_{\text{PIT}}, is provided in Figure 3 (b). Alternatively, Figure 3 (c) shows the ECDF difference version of the same information, that is, the vertical axis of the plot is scaled to show the difference of the observed ECDF to the theoretical expectation, F_{\text{PIT}}(x) - x. Especially for large samples, this transformation allows for an easier assessment of the values near the ends of the unit interval.
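The exact construction of the simultaneous bands follows Säilynoja, Bürkner, and Vehtari (2022); as a simple stand-in, bands with approximately the right simultaneous coverage can also be obtained by Monte Carlo, simulating uniform samples and recording the maximal ECDF deviation from the diagonal. The sketch below is this simulation-based alternative, not the authors' algorithm:

```python
import numpy as np

def simultaneous_pit_bands(n, n_eval=100, alpha=0.05, n_sim=1000, seed=0):
    """Monte Carlo simultaneous 1 - alpha confidence bands for the ECDF
    of n uniform PIT values, evaluated at equidistant points."""
    rng = np.random.default_rng(seed)
    z = np.linspace(0.0, 1.0, n_eval + 1)
    max_dev = np.empty(n_sim)
    for s in range(n_sim):
        u = np.sort(rng.uniform(size=n))
        ecdf = np.searchsorted(u, z, side="right") / n
        max_dev[s] = np.abs(ecdf - z).max()
    # Band half-width such that a uniform ECDF stays inside with prob. 1 - alpha.
    d = np.quantile(max_dev, 1.0 - alpha)
    return z, np.clip(z - d, 0.0, 1.0), np.clip(z + d, 0.0, 1.0)

z, lower, upper = simultaneous_pit_bands(n=500)
```

An observed PIT ECDF escaping these bands anywhere along the unit interval is then flagged as a goodness-of-fit failure.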

As goodness-of-fit testing does not greatly increase the computational cost of constructing the visualizations, we propose implementing automated testing and recommendations for users to consider alternative visualization techniques when the fit of the chosen visualization isn’t satisfactory.

Next, we introduce how PIT can be implemented for the three visualizations.

PIT for KDE density plots

In theory, a KDE defines a proper probability distribution, and the PIT could be computed directly as in Equation 1. However, in practice, KDE density visualizations often show a truncated version of the KDE, limited to an interval centered around the domain of the observed data. For PIT computation, we limit the integration to the displayed range and normalize the result so that the truncated KDE integrates to one.

Most software implementations evaluate the KDE on a dense grid spanning the domain of the observed data, thus allowing numerical PIT computation by extracting the density values on these evaluation points and carrying out the aforementioned normalization to obtain PIT values.
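Under these assumptions, the PIT computation reduces to a cumulative trapezoidal integral over the grid, followed by normalization and interpolation. A sketch (grid size and variable names are ours):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
x = rng.normal(size=1000)

# Evaluate the KDE on a dense grid spanning the displayed range.
kde = gaussian_kde(x)
grid = np.linspace(x.min(), x.max(), 1024)
dens = kde(grid)

# Cumulative trapezoidal integral, normalized so the truncated KDE integrates to one.
cdf = np.concatenate([[0.0], np.cumsum((dens[1:] + dens[:-1]) / 2 * np.diff(grid))])
cdf /= cdf[-1]

# PIT value for each observation by interpolating the gridded CDF.
pit = np.interp(x, grid, cdf)
```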

PIT for histograms

For a histogram with equal bins of width h, we define the PIT as

\begin{align} \text{PIT}(x) = h\sum_{j=1}^{J} f_j + (x - l_{J+1})f_{J+1}, \end{align}

where J = \max\lbrace j \mid r_j \leq x\rbrace, and l_j and r_j are the left and right ends of the jth bin.

Again, the relative densities and boundaries of these bins are usually readily available in the software used for constructing these visualizations, and thus implementing the transform for histograms is a relatively simple task.
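A direct transcription of the histogram PIT above, using NumPy's normalized bin heights f_j (the bin count of 30 is our arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)

# Equal-width histogram; with density=True the heights f_j satisfy h * sum(f_j) = 1.
f, edges = np.histogram(x, bins=30, density=True)
h = edges[1] - edges[0]

# Index of the bin containing each observation (0-based).
j = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(f) - 1)

# PIT(x) = h * (mass of full bins left of x) + partial mass within x's bin.
cum = np.concatenate([[0.0], np.cumsum(f)])
pit = h * cum[j] + (x - edges[j]) * f[j]
```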

PIT for quantile dot plots

As the quantile dot plots are discrete, the PIT values computed according to Equation 1 are not guaranteed to be uniformly distributed. As demonstrated by Frühwirth-Schnatter (1996), for a discrete random variable X, one can employ a randomized interpolation method to obtain uniformly distributed values,

\begin{align} u(x) = \alpha P(X \leq x) + (1-\alpha) P(X \leq (x - 1)), \end{align} where \alpha \sim \mathbb U(0,1).
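For a concrete discrete example (ours, not from the paper), the randomized PIT of Poisson-distributed values takes two CDF evaluations per observation:

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(1)
lam = 4.0
x = rng.poisson(lam, size=2000)

# Randomized PIT: u(x) = alpha * P(X <= x) + (1 - alpha) * P(X <= x - 1),
# with alpha ~ U(0, 1) drawn independently for each observation.
alpha = rng.uniform(size=x.size)
u = alpha * poisson.cdf(x, lam) + (1 - alpha) * poisson.cdf(x - 1, lam)
```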

For quantile dot plots, where n_q is the number of quantiles, c_k is the horizontal position of the center of the kth dot, and r is the radius of the dots, we define a randomized PIT as

\begin{align} \text{PIT}(x) \sim \mathbb U\left(\frac{l(x)}{n_q}, \frac{u(x)}{n_q}\right), \end{align}

where l(x) and u(x) are the quantile indices satisfying

\begin{align} u(x) = \min\lbrace k \in \{1,\dots,n_q\} \mid x < c_k - r\rbrace, \end{align}

and

\begin{align} l(x) = \begin{cases} 0,&\quad\text{if }u(x) = 1,\\ \min\lbrace k \in \{1,\dots,n_q\} \mid |c_k - x| \leq r\rbrace,&\quad\text{if }\exists \, k \text{ s.t. } |c_k - x| \leq r,\\ \max\lbrace k \in \{1,\dots,n_q\} \mid x \geq c_k - r\rbrace,&\quad\text{otherwise}. \end{cases} \end{align}

That is, we consider the left and right edges of the dots, c_k - r, and c_k + r respectively. Now the PIT value of x is limited from above by the smallest quantile dot fully to the right of x. The lower limit of the PIT value is determined in three cases. First, for those x that are to the left of all of the quantile dots, the lower limit of the PIT value is zero. Second, if x lies between the left and right edges of one or more quantile dots, the lower limit is determined by the smallest of these quantiles. Finally, when x is not between the edges of any single quantile dot, the largest dot fully to the left of x determines the lower limit of the PIT value.

Again, as the centre points and radius of the dots are required for constructing the visualizations in the first place, implementing the PIT computations is relatively straightforward.
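The rules above translate directly into code. The sketch below is our own transcription: it takes the sorted dot centers and common radius, computes l(x) and u(x) as defined, and draws the randomized PIT; when no dot lies fully to the right of x (a case the displayed formulas leave implicit), we set u(x) = n_q so that the upper PIT limit is one:

```python
import numpy as np

def quantile_dot_pit(x, centers, r, rng):
    """Randomized PIT of a value x for a quantile dot plot with sorted
    dot centers `centers` and common dot radius `r`."""
    n_q = len(centers)
    k = np.arange(1, n_q + 1)                  # 1-based quantile indices
    fully_right = k[x < centers - r]           # dots fully to the right of x
    u_x = fully_right.min() if fully_right.size else n_q
    overlapping = k[np.abs(centers - x) <= r]  # dots whose edges contain x
    if u_x == 1:                               # x left of all dots
        l_x = 0
    elif overlapping.size:                     # x inside one or more dots
        l_x = overlapping.min()
    else:                                      # x between two dot stacks
        l_x = k[x >= centers - r].max()
    return rng.uniform(l_x / n_q, u_x / n_q)

rng = np.random.default_rng(1)
sample = rng.normal(size=1000)
centers = np.quantile(sample, (np.arange(100) + 0.5) / 100)
pits = np.array([quantile_dot_pit(v, centers, r=0.05, rng=rng) for v in sample])
```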

Continuous valued observations

When the observation density is smooth and unbounded, and the practitioner uses visualizations aimed at continuous distributions, misrepresenting the data is less common. The main benefit of goodness-of-fit testing for visualizations arises when the assumptions of the visualization do not match the properties of the underlying observation distribution. Sometimes data that seem continuous at first glance prove problematic for KDEs to visualize.

Below, we first look at the ideal case of a continuous valued observation with a smooth and unbounded underlying distribution, and then step through three examples of continuous valued observations where the observation distribution is not smooth and visualizing the observations requires extra attention. The true underlying densities are visualized in Figure 2. In the three latter cases, using the default KDE plot or histogram can hide important details of the observations, and the issues detected through goodness-of-fit testing could affect future modeling choices.

(a) Density with steps
(b) Strictly bounded density
(c) Density with a point mass
Figure 2: True densities for the examples used in Section 2.4, Section 2.5, and Section 2.6. Each of these examples poses a challenge for KDE density plots and histograms, and avoiding misrepresenting the data requires special attention.

Smooth and unbounded density

When the underlying distribution is smooth and unbounded, large issues in the goodness-of-fit of the three discussed visualizations are rare. Figure 3 (a) includes the KDE plot for an observation of 1000 standard normally distributed values. Figure 3 (b) and Figure 3 (c) show the corresponding goodness-of-fit test, with the PIT ECDF values well within the 95% simultaneous confidence intervals for uniformity. Figure 4 shows the histogram and the corresponding goodness-of-fit assessment of the same data. This time, we show only the ECDF difference version of the graphical test. Again, no issues are detected. Figure 5 in turn shows the quantile dot plot paired with the corresponding goodness-of-fit evaluation. Again, no issues are detected. Although one hundred evaluation points result in a quite smooth visualization, the discreteness of the PIT ECDF of the quantile dot plot is visible.

(a) KDE
(b) PIT ECDF
(c) PIT ECDF difference
Figure 3: Visualizing a sample from a smooth and unbounded density with a kernel density estimate. (a) the KDE of the sample in blue and the true density in black. (b) The corresponding PIT ECDF plot with 95% simultaneous confidence bands for uniformity. No goodness-of-fit issues are indicated by the plot as the PIT ECDF stays within the confidence bands. (c) The corresponding PIT ECDF difference plot, showing the deviation from the expected CDF when testing against the true distribution. The plot allows for a more dynamic use of the plotting area, making it easier to inspect the deviations from the expectation of the CDF.

Figure 4: A histogram of the sample, with the smooth and unbounded true density overlaid in black. We use the rule by Freedman and Diaconis (1981) to determine the number of bins. The PIT ECDF difference plot indicates a good fit between the visualization and the sample.

Figure 5: The quantile dot plot and the true density overlaid in black. As with all quantile dot plots in this article, we show 100 quantiles. The PIT ECDF difference plot indicates a good fit between the visualization and the sample.

Density with steps

When the observation density has steps, that is, points with unequal one-sided limits, the continuous representation offered by KDEs—which assume a smooth density—may struggle to accurately represent the discontinuity. In turn, the visualizations provided by histograms and quantile dot plots are more flexible, but if the location of the step is not known, an unfortunate histogram binning may introduce large deviations from the true density values. If the location of the step were known, a good representation could also be obtained with KDE plots or histograms by using separate bounded KDEs for the two sides of the step (see Section 2.5), or by tailoring the histogram so that the discontinuity is aligned with the boundary of two adjacent bins.

To illustrate the challenges of visualizing observations from stepped densities, we use samples from the following true density f, shown in Figure 2 (a):

\begin{align} f(x) = \begin{cases} \frac{2}{5}\Phi(-\frac{1}{2})^{-1}\mathcal{N}(x\mid 0,1), & x \leq -\frac{1}{2}\\ \frac{1}{5}, & -\frac{1}{2} < x \leq \frac{1}{2}\\ \frac{2}{5}\Phi(-\frac{1}{4})^{-1}\mathcal{N}(x\mid 0,\frac{1}{4}), & x > \frac{1}{2}, \end{cases} \end{align}

This kind of bimodal, skewed distribution with low density between the modes is present, for example, in the z-scores of published p-values (Van Zwet and Cator 2021).

Figure 6 shows the resulting density plot and the corresponding calibration for two common kernel bandwidth selection strategies: Silverman’s rule of thumb (Silverman 2018) and the Sheather-Jones (SJ) method (Sheather and Jones 1991). Of these, SJ is expected to give a more robust bandwidth selection for data from non-Gaussian distributions (Sheather and Jones 1991). Despite this, Silverman’s rule of thumb is the default strategy in many KDE density visualization implementations using Gaussian kernels. As seen in the figure, both strategies have difficulties representing the discontinuity in the observation density, and the PIT ECDF for the KDE plot using Silverman’s rule of thumb crosses the 95% simultaneous confidence interval, flagging significant goodness-of-fit issues.

Figure 7 shows the visualization and goodness-of-fit assessment for the same data when using a histogram. Although the discontinuity is strongly hinted at in the histogram, it is not located close to a bin boundary, which causes significant goodness-of-fit issues, as too much density is placed on the last values of the low density region preceding the discontinuity.

Lastly, Figure 8 shows the same process for a quantile dot plot. Again, the discontinuities are visible in the plot. Here, as the visualization follows the ECDF quantiles, the discontinuities do not cause issues for the goodness-of-fit.

Figure 6: Two kernel density plots for a continuous valued sample from a density with steps. In red, a KDE with the bandwidth selected with Silverman’s rule of thumb (Silverman 2018), and in blue, a KDE using the Sheather-Jones bandwidth selection method (Sheather and Jones 1991), which results in a smaller bandwidth and a better fit to the sample.

Figure 7: Visualizing the continuous valued sample from a density with steps with a histogram. Here, our bin width selection algorithm has resulted in an adequate fit to the data, although the local deviation from the expected CDF is clearly visible in the PIT ECDF plot.

Figure 8: Visualizing the continuous valued sample from a density with steps with a quantile dot plot. The larger step is clearly visible, and the PIT ECDF plot stays within the simultaneous confidence intervals. The 100 quantiles are able to adapt to the sample better than the histogram above, which uses a relatively low number of bins obtained from the binning algorithm.

Density with strict bounds

Bounded density functions are commonplace and are a special case of the stepped densities discussed above. If the data is known to be bounded, this knowledge allows for specialized visualizations; for example, KDE plots with boundary correction methods, such as boundary reflection, or limiting the histogram bins to the domain of the density.

A problem arises when bounded data is visualized without knowledge of the bounds. Below, we inspect our three visualizations of interest applied to data from a truncated exponential distribution with rate parameter \lambda = 1 (shown in Figure 2 (b)). The truncation is to the central 80\% interval of the untruncated distribution.
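For readers who wish to reproduce this setup, such a sample can be generated by inverse-CDF sampling: draw quantile levels uniformly from the central 80% and push them through the exponential quantile function (a sketch; the sample size is our choice):

```python
import numpy as np

rng = np.random.default_rng(1)

# Exponential(rate = 1) truncated to the central 80% of the untruncated
# distribution, sampled via the inverse CDF.
q = rng.uniform(0.1, 0.9, size=1000)  # quantile levels inside the truncation
x = -np.log1p(-q)                     # exponential quantile function, rate 1

# The resulting strict bounds of the support.
lower, upper = -np.log1p(-0.1), -np.log1p(-0.9)
```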

Figure 9 shows a comparison between two KDE plots: one without any information on the boundedness of the data, the other made using boundary reflection based on bounds estimated through an automated boundary detection algorithm implemented in the ggdist R package (Kay 2024). The goodness-of-fit test clearly indicates that the density is misrepresented near the boundaries by the unbounded KDE plot. In general, a \cap-shaped PIT ECDF plot indicates bias towards too small PIT values, and the strong upward trend at small PIT values indicates that the estimated density at the left tail is lower than expected if the data was sampled from the estimated density.

Figure 10 shows the data visualized with a histogram without limiting the bins to lie inside the bounds. Again, bins overlap the discontinuities, causing the goodness-of-fit test to indicate possible data misrepresentation.

Figure 11 shows the data visualized with a quantile dot plot using 100 quantiles. As the quantile dot plot by design places no dots outside the data range, the goodness-of-fit is satisfactory.

Figure 9: Two kernel density plots for a continuous valued sample from a bounded density. In red, a KDE using a boundary correction and automated boundary detection as implemented in the ggdist package for R (Kay 2024). The KDE in blue assumes unbounded data and uses no boundary corrections. Again, the misrepresentation of the sample close to the distribution boundaries is detected with the graphical goodness-of-fit test.

Figure 10: Visualizing the continuous valued sample from a bounded density with a histogram. The PIT ECDF line crosses the simultaneous confidence bands at the extreme PIT values, indicating issues in representing the boundaries of the data distribution. The drop in the PIT ECDF close to zero indicates underestimation of the left bound of the distribution, and respectively, the smaller upwards peak close to one indicates overestimation of the right bound.

Figure 11: Visualizing the continuous valued sample from a bounded density with a quantile dot plot. The quantile dots are correctly placed within the bounds of the data distribution, making the quantile dot plot a good alternative for visualizing bounded data.

Density with point masses

The final example of continuous valued observations with discontinuities in the density occurs when one or more point masses are present. These kinds of observation densities are often met in biomedical and economic studies, where zero-inflated, but otherwise continuous, data is common (Liu et al. 2019). Another cause of point masses can be problematic treatment of missing values, where any missing fields in multidimensional observations are replaced with some predetermined value, be it zero or some summary statistic.

Below, we inspect an example following the density depicted in Figure 2 (c), where an observed value from the standard normal distribution is replaced with 1 with probability 0.2.
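This contaminated sample is simple to generate; a sketch of the data-generating process described above:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Standard normal draws, each replaced by the point mass at 1 with probability 0.2.
x = rng.normal(size=n)
x[rng.uniform(size=n) < 0.2] = 1.0

share_at_one = (x == 1.0).mean()  # close to 0.2 for large n
```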

Figure 12 shows the KDE plot and the corresponding goodness-of-fit assessment when the SJ method is used for bandwidth selection. Although the point mass does show as an additional bump in the density, the goodness-of-fit test reveals that the KDE is not flexible enough and alerts us to the underlying point mass with a sharp jump in the PIT ECDF.

Figure 13 shows the same data visualized using a histogram. Now the discontinuity is arguably more visible in the visualization, but, as all the bins are of equal width, the PIT ECDF shows a sharp discontinuity.

Figure 14 shows the data visualized with a quantile dot plot using 100 quantiles. Again, the point mass is visible in the visualization, and the PIT ECDF indicates that the visualization is representative of the sample.

Figure 12: KDE for a continuous valued sample from a density with a point mass. The selected bandwidth is too large to adequately represent the point mass. This is detected by the goodness-of-fit test.

Figure 13: Histogram for the continuous valued sample from a density with a point mass. The selected bin width is too wide and the misrepresentation of the data is detected by the goodness-of-fit test.

Figure 14: Quantile dot plot of the continuous valued sample from a density with a point mass. The quantile dots are able to represent the point mass, and the visualization passes the goodness-of-fit test.

Detecting whether observations are discrete or a mixture of discrete and continuous

In addition to the aforementioned goodness-of-fit test, a fast and simple-to-implement method is to count the unique values in the observed data. Relatively high counts of repeated values can inform the practitioner that the observations are discrete or contain point masses, for example, zero-inflation.

If at least one non-unique value, we move to check for relative frequencies of repeated values. If the relative frequency of any value is more than 2% of the full sample, we consider the possibility that the data might contain discrete values.
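This heuristic is straightforward to implement. A minimal sketch, with a function name of our choosing and the 2% threshold exposed as a parameter:

```python
from collections import Counter

def flag_possible_discreteness(values, threshold=0.02):
    """Return the repeated values whose relative frequency exceeds
    `threshold`, suggesting discrete components or point masses."""
    n = len(values)
    counts = Counter(values)
    if len(counts) == n:  # all values unique: no evidence of discreteness
        return {}
    return {v: c / n for v, c in counts.items() if c > 1 and c / n > threshold}

# A zero-inflated sample: 30% exact zeros among otherwise distinct values.
data = [0.0] * 30 + [0.1 * k + 0.05 for k in range(70)]
flagged = flag_possible_discreteness(data)  # → {0.0: 0.3}
```

For continuous data stored at full floating-point precision, exact repeats are rare, so any flagged value is a strong hint of a discrete component; rounded or low-precision data may require a higher threshold.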

Visual predictive checking with overlaid plots

When comparing data and model predictions, one approach is to show the visualization of the data and the visualizations of the draws from the predictive distribution overlaid in one plot. The most common approach is to plot overlaid KDE plots, as it is usually easy to distinguish several overlaid KDE lines from each other. Bandwidth selection plays an important role in the comparison. On one hand, if the chosen bandwidth is narrow, the KDEs can show large variation even when repeatedly drawing from the same distribution. On the other hand, with a wide bandwidth, the KDE may be too smooth and hide important details. With histograms and quantile dot plots, overlaid plots are rarely used, as the bin boundaries and dot locations depend on the model prediction, and the resulting overlaid figure becomes difficult to read. A common solution for histograms, shown later in Section 3, is to use a shared set of bins for the predictions and the data, and to increase readability by only showing the predictions through per-bin summary statistics, such as the mean and quantile-based intervals. A drawback of this approach is that the per-bin summaries don't show the dependency between the bins, making it harder to assess the global shape of the density. To compare multiple quantile dot plots to a reference plot, we propose overlaying the reference plot with just the top dot of each stack in the quantile dot plots of the predictive samples. Figure 15 (c) shows the resulting plot, which illustrates the variation in the overall form of the predictive distribution. Again, when summarizing the quantile dot plots of the predictive draws, we lose information on the dependency between the heights of the stacks and the overall shape of the predictive density.
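The shared-bin histogram summary can be sketched with NumPy. This is an illustrative implementation under our own naming, not the article's code: the observed data defines the bin edges, every predictive draw is binned with those same edges, and each bin is summarized by the mean and a central 90% interval over draws:

```python
import numpy as np

def per_bin_summaries(y, yrep, n_bins=20, interval=0.90):
    """Bin the observed data, then summarise the predictive draws'
    counts in the *same* bins by their mean and a central interval."""
    edges = np.histogram_bin_edges(y, bins=n_bins)
    # One row of bin counts per predictive draw, all sharing the data's bins.
    rep_counts = np.stack([np.histogram(draw, bins=edges)[0] for draw in yrep])
    lo, hi = np.quantile(
        rep_counts, [(1 - interval) / 2, (1 + interval) / 2], axis=0
    )
    obs_counts = np.histogram(y, bins=edges)[0]
    return edges, obs_counts, rep_counts.mean(axis=0), lo, hi

rng = np.random.default_rng(0)
y = rng.normal(size=200)
yrep = rng.normal(size=(100, 200))  # 100 predictive draws
edges, obs, mean, lo, hi = per_bin_summaries(y, yrep)
```

Plotting `obs` as bars with `mean` and the `[lo, hi]` ribbon on top then yields a figure in the spirit of the histogram overlay in Figure 15 (b).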

Overlaid KDE plots are usually easy to interpret—assuming the smoothness assumptions match the underlying data and the draws from the predictive distribution—although they lack a quantification of when the differences are significant. For better quantification of the differences and more robust behavior in the case of non-smooth underlying distributions, we recommend the use of the graphical PIT ECDF test described in Section 2.1. In posterior predictive checks, we assess the fit between the observations and the predictive distribution of the model. The predictive distribution is not expected to give an exact match; for example, the posterior predictive distribution of a Bayesian model usually has thicker tails than the true underlying observation distribution. To avoid using the same observations for both fitting the model and assessing the predictive performance, we use leave-one-out cross-validation (LOO-CV). Flexible models and small datasets especially require LOO-CV for reliable PIT values. Without cross-validation, the PIT values would tend to be too close to 0.5 and the PIT ECDF plot would be S-shaped, suggesting many observations are too close to their respective predictive means. For fast LOO-CV computation, we recommend Pareto smoothed importance sampling (PSIS) (Vehtari et al. 2024), which estimates the leave-one-out predictive distributions without refitting the model for each left-out observation. In our experience, the PIT ECDF plots of both the posterior predictive and LOO-predictive draws are useful for assessing whether the predictive distribution of the model is well calibrated, and the visualizations give insight into the nature of possible issues in model fit.
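Setting the PSIS weighting aside, the basic (posterior predictive) PIT computation can be sketched as follows. For each observation, the PIT value is the fraction of predictive draws below it, with randomization at ties so that the PIT remains uniform for discrete draws; for LOO-PIT, the draws would additionally be importance-weighted with PSIS. Names are ours, not the article's code:

```python
import numpy as np

def pit_values(y, yrep, seed=0):
    """Empirical PIT: for each observation, the fraction of predictive
    draws below it, randomised at ties to handle discrete draws."""
    yrep = np.asarray(yrep)  # shape (draws, n_obs)
    less = (yrep < y).mean(axis=0)
    equal = (yrep == y).mean(axis=0)
    # Randomisation at ties keeps PIT uniform even for discrete data.
    return less + np.random.default_rng(seed).uniform(size=y.shape) * equal

y = np.random.default_rng(1).normal(size=500)
yrep = np.random.default_rng(2).normal(size=(1000, 500))
pit = pit_values(y, yrep)  # ~uniform on [0, 1] for a calibrated model
```

The ECDF of these PIT values, with simultaneous confidence bands, is what the graphical PIT ECDF test compares against the uniform distribution.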

(a) KDE Overlay
(b) Histogram Overlay
(c) Quantile dot plot overlay
Figure 15: Overlaid predictive checks. KDE plots can easily be overlaid, and the resulting visualization is straightforward to read. The histogram for the data is overlaid with the means and central 90% quantile intervals of the histogram bins of the posterior draws, using the binning from the histogram of the observations. For the quantile dot plot, the top dot of each stack of dots in each posterior draw is overlaid on the full quantile dot plot of the observations, visualizing the variation in the shape of the predictive distributions.

3 Visual predictive checks for count data

In this section, we focus on count data, which, although discrete, can benefit from both continuous and discrete visualizations. Continuous visualizations are commonly used for count data and can sometimes be a good choice, but there are cases where they can be misleading and visual predictive checks designed specifically for count data should be used.

In predictive checking for count data, we focus on two visualizations: the overlaid KDE plots introduced in Section 2.7 and our version of the rootogram (Kleiber and Zeileis 2016). In our experience, the higher the number of distinct counts being visualized, and the smaller the variance of the probabilities of individual counts, the more effective KDEs are for visualizing count data, although one should pay close attention to the effect of boundedness on the KDE fit, as count data is usually limited to non-negative values and sometimes also has a maximum count limit. Rootograms offer a visualization that emphasizes the discrete nature of the predictive and observation distributions. In our experience, rootograms are good for predictive checking of count data when the number of distinct counts is low or there are sharp changes in the probabilities of consecutive counts, for example, when the data exhibits zero-inflation (the probability of 0 is much higher than the probability of 1).
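The quantities behind a rootogram are simple: square-root-scale frequencies of the observed counts and the mean frequencies over the predictive draws, on a shared support. The sketch below computes just these heights (the article's version of the rootogram may differ in how the bars and expected frequencies are drawn, e.g. hanging the observed bars from the expected curve); names are ours:

```python
import math
from collections import Counter

def rootogram_heights(y, yrep):
    """Square-root-scale observed frequencies and mean predictive
    frequencies for each count value on a shared support."""
    support = range(0, max(max(y), max(max(d) for d in yrep)) + 1)
    obs = Counter(y)
    expected = Counter()
    for draw in yrep:
        expected.update(draw)
    n_draws = len(yrep)
    # (count value, sqrt observed freq, sqrt mean predicted freq)
    return [(k, math.sqrt(obs[k]), math.sqrt(expected[k] / n_draws)) for k in support]

y = [0, 0, 0, 1, 2, 2, 5]
yrep = [[0, 0, 1, 1, 2, 3, 4], [0, 1, 1, 2, 2, 2, 6]]
heights = rootogram_heights(y, yrep)
```

The square-root scale stabilizes the variance of the frequencies, which makes discrepancies at rare, high counts visible next to the much larger frequencies at low counts.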

In Section 3.1 we show how, similarly to the cases in Section 2, goodness-of-fit tests can be used to assess how well the KDE plot represents the data. In Section 3.2 we introduce our version of the rootogram, and demonstrate its use when the familiar KDE exhibits goodness-of-fit issues. The visual predictive checks in both Section 3.1 and Section 3.2 follow an example from a modeling workflow of estimating the effect of integrated pest management on reducing cockroach levels in urban apartments (Gelman, Hill, and Vehtari 2020). We focus on assessing the quality of the predictions of a generalized linear model with a negative binomial data model for the number of roaches caught in traps during the original experiment.

Visualizing count data with KDE plots

A common approach for count data, especially when the number of unique discrete values is high, is to assume that the values can be visualized as if they were from a continuous distribution. This enables the use of overlaid KDE plots for visual predictive checks. Figure 16 and Figure 17 illustrate how—after changing the default bandwidth algorithm—a KDE plot can have a satisfactory goodness-of-fit to discrete data, and how the PIT ECDF diagnostic can be used to assess the goodness-of-fit of the continuous visual representation of the discrete data.

The roach data exhibits zero-inflation, and the default algorithm of the selected visualization software fits an unbounded KDE with a relatively large bandwidth, resulting in the poor fit shown in Figure 16. After using a left-bounded KDE and the SJ bandwidth selection method, we see in Figure 17 that, away from zero, the observation distribution is well summarized by the KDE plot.
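One common way to left-bound a KDE is reflection: the sample is mirrored across the boundary before smoothing, so no probability mass leaks below it. The minimal NumPy sketch below uses a fixed bandwidth for clarity; the visualization software used in the article may apply a different boundary correction, and the SJ bandwidth would be selected separately:

```python
import numpy as np

def reflected_kde(x, grid, bandwidth):
    """Gaussian KDE with a left boundary at zero, implemented by
    reflecting the sample across the boundary before smoothing."""
    data = np.concatenate([x, -x])  # mirror the sample across 0
    diffs = (grid[:, None] - data[None, :]) / bandwidth
    # Normalise by len(x), not len(data): the reflected half folds
    # its mass back onto [0, inf), so the density integrates to 1.
    dens = np.exp(-0.5 * diffs**2).sum(axis=1) / (
        len(x) * bandwidth * np.sqrt(2 * np.pi)
    )
    return np.where(grid >= 0, dens, 0.0)

x = np.random.default_rng(3).exponential(size=500)
grid = np.linspace(0, 8, 200)
dens = reflected_kde(x, grid, bandwidth=0.3)
```

Unlike the unbounded default, this estimator does not smear mass below zero, which is what produces the spurious negative-valued bump for the zero-inflated roach counts.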