ChiSquaredTools module preview: Post-Hoc Analysis and Row/Column Clustering

Posted: Tue Nov 25, 2025 10:54 am
by GmA
Hi everyone,

As you might know from a previous post (viewtopic.php?t=4093), I am developing ChiSquaredTools, a jamovi module that provides integrated facilities for contingency table analysis. This post previews two of the analytical tools currently available: Chi-Squared Post-Hoc Analysis and Contingency Table Row/Column Clustering. The attached screenshots, dataset, and module file allow you to reproduce and explore these features.

The dataset used in the screenshots is from Greenacre (2017), "Correspondence Analysis in Practice", 3rd edition.

ANALYSIS 1: CHI-SQUARED POST-HOC ANALYSIS

Purpose
A statistically significant chi-squared test tells you that rows and columns are not independent, but it does not tell you where the association lies. Post-hoc analysis addresses this by examining each cell individually, identifying which specific row-column combinations drive the overall result.

The facility offers eight different residual measures, each with distinct properties. One of these is the Percentage of Maximum Deviation (PEM), also known as Sakoda's D Local.

What is PEM?

PEM expresses each cell's deviation from independence as a percentage of the maximum possible deviation in that cell. This idea, introduced by Cibois (1993) building on earlier work by Sakoda (1981), answers the question: "Given this cell's structural constraints (its row and column totals), how close is the observed count to the most extreme value it could possibly take?"

A positive PEM indicates attraction between the row and column categories (the observed count exceeds what independence would predict). A negative PEM indicates repulsion (the observed count falls short of the independence expectation). The percentage scale (ranging from -100 to +100) makes comparison across cells straightforward, regardless of differences in marginal totals.
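To make the definition concrete, here is a minimal Python sketch of the local PEM computation. It assumes the "most extreme value" in each direction is the one permitted by the marginal totals (min(row total, column total) upward, max(0, row total + column total - n) downward); the function name and exact bounding rule are illustrative, not the module's actual implementation:

```python
import numpy as np

def pem(table):
    """Percentage of maximum deviation (PEM) for each cell.
    Each cell's deviation from independence is expressed as a
    percentage of the largest deviation its row and column totals
    allow in that direction (sketch after Cibois, 1993)."""
    t = np.asarray(table, dtype=float)
    n = t.sum()
    row = t.sum(axis=1, keepdims=True)   # row totals, shape (R, 1)
    col = t.sum(axis=0, keepdims=True)   # column totals, shape (1, C)
    expected = row * col / n             # counts under independence
    dev = t - expected
    # Most extreme counts compatible with the margins:
    #   attraction bound: min(row total, column total)
    #   repulsion bound:  max(0, row total + column total - n)
    span = np.where(dev > 0,
                    np.minimum(row, col) - expected,
                    expected - np.maximum(0.0, row + col - n))
    return 100 * np.divide(dev, span, out=np.zeros_like(dev),
                           where=span > 0)
```

With a perfectly diagonal 2x2 table the diagonal cells reach +100 and the off-diagonal cells -100, while a table that exactly matches independence yields 0 everywhere.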

Statistical inference for PEM

PEM values on their own describe the data but do not indicate whether deviations are statistically meaningful. The module addresses this by computing bootstrap confidence intervals. At each replication, the procedure resamples the table under the multinomial model, recalculates PEM for every cell, and accumulates the distribution of each cell's PEM. The resulting percentile intervals indicate whether each cell's PEM is statistically distinguishable from zero.
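The resampling scheme described above might look like the following Python sketch. The function names, defaults, and the multinomial parameterisation (cell probabilities estimated from the observed counts) are my assumptions for illustration, not the module's code:

```python
import numpy as np

def _pem(table):
    # Local PEM: deviation from independence scaled by the largest
    # deviation the margins allow in that direction.
    t = np.asarray(table, dtype=float)
    n = t.sum()
    row = t.sum(axis=1, keepdims=True)
    col = t.sum(axis=0, keepdims=True)
    expected = row * col / n
    dev = t - expected
    span = np.where(dev > 0,
                    np.minimum(row, col) - expected,
                    expected - np.maximum(0.0, row + col - n))
    return 100 * np.divide(dev, span, out=np.zeros_like(dev),
                           where=span > 0)

def pem_bootstrap_ci(table, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap confidence intervals for every cell's PEM:
    each replicate redraws the whole table from a multinomial whose
    cell probabilities are estimated from the observed counts."""
    rng = np.random.default_rng(seed)
    t = np.asarray(table, dtype=float)
    n = int(t.sum())
    probs = (t / t.sum()).ravel()
    # One PEM matrix per bootstrap replicate.
    reps = np.stack([_pem(rng.multinomial(n, probs).reshape(t.shape))
                     for _ in range(n_boot)])
    alpha = (1 - level) / 2 * 100
    return (np.percentile(reps, alpha, axis=0),
            np.percentile(reps, 100 - alpha, axis=0))
```

A cell whose interval excludes zero is the kind of cell the forest plot filter described below would retain.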

Forest plot visualisation

The PEM forest plot (see Screenshot 1) displays point estimates with their bootstrap confidence intervals. The plot can be filtered to show only cells whose confidence intervals exclude zero, making it easy to identify which row-column combinations exhibit statistically meaningful attraction or repulsion. Positive associations appear in gold; negative associations appear in maroon.

Key references for PEM

Sakoda, J. M. (1981). A generalized index of dissimilarity. Demography, 18, 245-250.
Cibois, P. (1993). Le PEM, pourcentage de l'écart maximum: un indice de liaison entre modalités d'un tableau de contingence. Bulletin de Méthodologie Sociologique, 40, 43-63.
Lefevre, C., & Champely, S. (2009). Analyse d'un tableau de contingence: le pourcentage de l'écart maximal à l'indépendance (PEM). Revue de Statistique Appliquée, 57, 5-26.

ANALYSIS 2: CONTINGENCY TABLE ROW/COLUMN CLUSTERING

Purpose
Sometimes a contingency table contains categories that are statistically indistinguishable in their relationship with the other variable. For instance, two age groups may have essentially identical purchasing patterns across stores. If so, merging these categories simplifies the table without losing meaningful information. The clustering facility identifies such groupings automatically.

How the clustering works

The procedure uses hierarchical clustering with the chi-squared statistic as the distance measure and Ward's method as the linkage criterion. The algorithm proceeds as follows:

- Step 0: Compute the chi-squared statistic for the full table. This is the baseline.

- Iterative merging: At each step, the algorithm evaluates every possible pair of rows (or columns) that could be merged. For each candidate pair, it temporarily combines the two categories by summing their frequencies, recalculates the chi-squared statistic for the reduced table, and computes how much chi-squared decreased. The pair producing the smallest decrease is merged permanently.

- Stopping criterion: Merging continues until the remaining chi-squared falls below a critical value that controls the family-wise error rate at 0.05. This critical value, drawn from tables published by Pearson and Hartley (1972) and reproduced in Greenacre (2017, Exhibit A.1, p. 254), accounts for the multiple implicit comparisons made during the clustering process.
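The steps above can be sketched in Python as follows. The helper names and my reading of the stopping rule (refuse any merge that would push the remaining chi-squared below the critical value, so that significant distinctions are preserved) are assumptions for illustration, not the module's code:

```python
import numpy as np

def chi2_stat(table):
    # Pearson chi-squared statistic of a contingency table.
    t = np.asarray(table, dtype=float)
    e = t.sum(1, keepdims=True) * t.sum(0, keepdims=True) / t.sum()
    return float(((t - e) ** 2 / e).sum())

def cluster_rows(table, labels, critical):
    """Greedy row clustering (sketch): repeatedly fuse the pair of
    rows whose merger costs the least chi-squared, stopping before a
    merge would drop the remaining chi-squared below `critical`."""
    t = np.asarray(table, dtype=float)
    labels = list(labels)
    steps = []  # (label_i, label_j, chi2 after merge, chi2 lost)
    while len(labels) > 1:
        # Evaluate every candidate pair; keep the cheapest merge.
        best = None
        for i in range(len(labels)):
            for j in range(i + 1, len(labels)):
                m = np.delete(t, j, axis=0)  # drop row j...
                m[i] = t[i] + t[j]           # ...after folding it into row i
                loss = chi2_stat(t) - chi2_stat(m)
                if best is None or loss < best[0]:
                    best = (loss, i, j, m)
        loss, i, j, m = best
        if chi2_stat(m) < critical:
            # Merging further would erase a significant distinction.
            break
        steps.append((labels[i], labels[j], chi2_stat(m), loss))
        labels[i] += "+" + labels[j]
        del labels[j]
        t = m
    return labels, steps
```

Two rows with identical profiles merge at zero chi-squared cost, which is exactly the "smallest decrease first" behaviour the merging sequence table reports.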

Interpreting the output

The Row Merging Sequence table (see Screenshot 2) shows each step: which items were merged, the resulting chi-squared, the absolute reduction, and the percentage reduction. The "Last significant step" indicates where the clustering should stop if you want to preserve statistically meaningful distinctions.

The Significant Row Groups table (see Screenshot 3) lists the final clusters. Categories within the same group can be treated as statistically equivalent in their relationship with the column variable. In the example, stores C and D form one group, whilst stores A, B, and E form another.

Interpreting the dendrogram

The dendrogram (Screenshot 3) visualises the hierarchical structure. The vertical axis shows the chi-squared statistic in reverse: the bottom of the tree corresponds to high chi-squared (near the initial value), whilst the top corresponds to low chi-squared (as categories are merged). The red dashed line marks the significance threshold. Clusters forming below this line are statistically justified; those forming above it are not.

Key references for clustering
Greenacre, M. (1988). Clustering the rows and columns of a contingency table. Journal of Classification, 5, 39-51.
Greenacre, M. (2017). Correspondence Analysis in Practice, 3rd edition. Chapman and Hall/CRC.
Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236-244.
Pearson, E. S., & Hartley, H. O. (1972). Biometrika Tables for Statisticians, Volume 2. Cambridge University Press.

OTHER FACILITIES
The module includes additional analyses not shown here: chi-squared testing with multiple methods (traditional, adjusted, permutation, and Monte Carlo), association measures with over 20 coefficients, and further post-hoc measures (standardised, adjusted standardised, and moment-corrected residuals, the Quetelet index, the IJ association factor, and median polish residuals).

FEEDBACK WELCOME
The module is approaching submission to the jamovi library. Should you have any suggestions or comments, feel free to reach out.

The module and dataset are too large to attach here, so I have uploaded them to my Google Drive:
https://drive.google.com/file/d/18U2SlN ... sp=sharing


Thank you for your time.