Contingency table not including all values

Posted: Thu Jan 05, 2023 9:01 am
by Ember
Hi,
I'm doing a chi-squared/Fisher's exact test on a data set with 26 samples in group A and 21 in group B. However, the contingency table only includes 21 values, i.e. it leaves out the last 5 values from group A. Is that how the contingency table/program works, or is something going wrong? Shouldn't the analysis include all 47 values? I ask because the results differ quite a lot depending on which values are put in rows 22-26 (the ones not being included in the analysis).

I'm quite new to statistics and have been stuck trying to figure this out for days now, so I would truly appreciate some advice here!

Thanks and all the best,
Emma

Re: Contingency table not including all values

Posted: Thu Jan 05, 2023 9:07 am
by jonathon
attach a .omv file here (you may need to zip it up first) and we'll take a look.

Re: Contingency table not including all values

Posted: Thu Jan 05, 2023 9:32 am
by Ember
Attaching a copy of part of the main file; the main file looks the same...

Thank you in advance!

Re: Contingency table not including all values

Posted: Thu Jan 05, 2023 11:26 am
by jonathon
so contingency tables work on pairs of observations -- in the case of your data, you've got 21 complete pairs, and so that's what shows up in your contingency table.

the contingency table summarises all the different combinations:

yes, yes
yes, no
no, yes
no, no

but if some of your pairs are missing values, they can't be summarised like this. if you do want to include the missing values, then assign them a value.
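To illustrate the point about complete pairs, here is a minimal Python sketch. The data are made up (they are not taken from the attached file), and the names `col_a` and `col_b` are hypothetical; the only thing borrowed from the thread is the 26 versus 21 row counts. It pairs the two columns row by row and counts only the rows where both values are present.

```python
from collections import Counter
from itertools import zip_longest

# Hypothetical data: 26 answers in column A, only 21 in column B.
col_a = ["yes", "no"] * 13          # 26 values
col_b = ["yes", "yes", "no"] * 7    # 21 values

# Pair the columns row by row; rows 22-26 have no partner in B (None).
rows = list(zip_longest(col_a, col_b, fillvalue=None))

# Keep only complete pairs, which is what ends up in the table.
complete = [(a, b) for a, b in rows if a is not None and b is not None]
table = Counter(complete)

# Print the four cells of the 2x2 table.
for combo in [("yes", "yes"), ("yes", "no"), ("no", "yes"), ("no", "no")]:
    print(combo, table[combo])

print("rows in data:", len(rows))
print("rows counted in table:", sum(table.values()))
```

Under these assumptions the data have 26 rows but the table total is 21, because the last five rows have no value in the second column.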

cheers

Re: Contingency table not including all values

Posted: Thu Jan 05, 2023 12:54 pm
by MAgojam
Hey Emma,
I had a look at your attached file.

I can tell you that the contingency tables are doing their job correctly, but I suggest you check the data yourself.
Note that your file has three blank rows (27-29) across the four variables, which should be deleted.
Does NA indicate missing data, or is it a further level alongside Yes and No?
If NA stands for missing data, you should declare it as such (with "Data>Setup>Missing values" on the variable); otherwise it will be interpreted as another level of the variable.
The Chi-squared test of association assumes that the row and column marginal totals are the sums of the counts actually observed for the combinations of levels of the row and column variables.
If you set NA as the missing value for both Alcohol A and Alcohol B, the contingency table for the two variables will contain 14 observations.
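The difference between NA as a level and NA as a missing value can be sketched in Python. This is only an illustration with invented answers (not the values from the attached file), and the variable names `alcohol_a` and `alcohol_b` are hypothetical stand-ins for the two columns.

```python
from collections import Counter

# Hypothetical answers for two variables; "NA" marks an unanswered item.
alcohol_a = ["Yes", "No", "NA", "Yes", "No", "NA"]
alcohol_b = ["Yes", "NA", "No", "No", "No", "Yes"]

pairs = list(zip(alcohol_a, alcohol_b))

# Case 1: "NA" is not declared missing, so it counts as a third level.
as_level = Counter(pairs)
levels_a = sorted({a for a, _ in as_level})

# Case 2: "NA" is declared missing, so any pair containing it is dropped.
as_missing = Counter((a, b) for a, b in pairs if "NA" not in (a, b))
levels_a_missing = sorted({a for a, _ in as_missing})

print("NA as level:  ", levels_a, "total", sum(as_level.values()))
print("NA as missing:", levels_a_missing, "total", sum(as_missing.values()))
```

In the first case the table gains an extra NA row and column; in the second, the table keeps only Yes and No and its total shrinks to the number of complete pairs.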
Are you sure using a Chi-squared association is what you need?

Cheers,
Maurizio