Just estimating a population parameter for a multilevel sample

reason180 · Post by **reason180** » Thu Dec 15, 2022 7:40 pm

It seems that we have all kinds of sophisticated models (GLM, mixed effects, etc.) for analyzing "effects," but if we want to simply estimate a population parameter (like a mean), we're mostly stuck with a one-sample t test or a binomial test. But what if the sample is "multilevel," as in the following example?

Each of 100 research participants is asked to guess which of two cities has the larger population. Each participant's response can be either correct or incorrect. The two cities are either (i) San Francisco and Dallas, or (ii) Miami and Philadelphia, or (iii) Las Vegas and Atlanta, or (iv) New Orleans and Minneapolis, or (v) Baltimore and Denver. Participants have been randomly assigned to make their judgment about City Pair i, ii, iii, iv, or v. Moreover, the researcher selected those five city pairs at random from the total set of American cities.

Traditionally, if one wants to know whether accuracy is significantly different from 0.5, one can conduct a binomial test, which assumes that participants have been randomly selected from a population of potential participants. But this ignores that the city pair selections are also random. Is there a way to estimate the population proportion in a way that treats both participants and city-pairs as having been randomly sampled? I can think of ways to hack a standard mixed-effects analysis to get an answer that's approximately correct. But given the problem's conceptual simplicity, is there a more standard way of approaching it?

mcfanda@gmail.com · Post by **mcfanda@gmail.com** » Fri Dec 16, 2022 5:55 pm

Hi
the binomial test is almost identical to a logistic model in which each participant's outcome is the dependent variable and only the intercept is present (no predictors). The intercept a of the model is the log-odds of success, so exp(a)/(1+exp(a)) gives you the probability of success. The inferential test associated with the intercept tests the null hypothesis that the success probability is .50. The only difference between the binomial test and the logistic model is that the logistic model uses a z-test (or chi-squared test) to obtain the p-value, but the results are nearly indistinguishable.
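As a rough illustration of this equivalence, here is a minimal Python sketch using only the standard library. The counts (60 correct out of 100) are hypothetical and not from this thread, and the function names are made up for the example. For an intercept-only logistic model the maximum-likelihood intercept has a closed form, so no fitting routine is needed. The Wald z-test is a large-sample approximation, so the two p-values will be close but not identical.

```python
import math

def inv_logit(a):
    """Convert a log-odds value back to a probability."""
    return math.exp(a) / (1.0 + math.exp(a))

def binomial_two_sided_p(k, n, p0=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed one under H0."""
    pmf = [math.comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))

def intercept_only_logistic(k, n):
    """Closed-form MLE for an intercept-only logistic model (needs 0 < k < n).
    Returns the intercept, its standard error, the Wald z, and a two-sided p."""
    a = math.log(k / (n - k))                # log-odds of success
    se = math.sqrt(1.0 / k + 1.0 / (n - k))  # asymptotic standard error
    z = a / se                               # tests H0: a = 0, i.e. p = .50
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value
    return a, se, z, p

# Hypothetical data: 60 correct responses out of 100 participants.
k, n = 60, 100
a, se, z, p_wald = intercept_only_logistic(k, n)
p_exact = binomial_two_sided_p(k, n)

print(f"intercept a = {a:.4f}, back-transformed p = {inv_logit(a):.4f}")
print(f"Wald p = {p_wald:.4f}, exact binomial p = {p_exact:.4f}")
```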

Given this equivalence, one can obtain a "multilevel binomial test" by estimating a logistic mixed model in which the dependent variable is the participant outcome, the intercept is the only fixed effect, and city pair is the clustering variable (the higher-level units), with random intercepts across city pairs. Here, too, the intercept a of the model is the log-odds of success, so exp(a)/(1+exp(a)) gives you the probability of success, and the inferential test tests the null hypothesis p = .50, but the possible variation due to the clustering is taken into account.
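Written out, the random-intercept model described here could look like the following. The notation is an assumption for illustration: i indexes participants, j indexes city pairs, u_j is the random intercept for city pair j, and sigma_u^2 is its variance.

```latex
\operatorname{logit}\,\Pr(y_{ij} = 1 \mid u_j) = a + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)

\Pr(y_{ij} = 1 \mid u_j = 0) = \frac{e^{a}}{1 + e^{a}}
```

Note that exp(a)/(1+exp(a)) is the success probability for a typical city pair (u_j = 0). When sigma_u^2 is large, this can differ somewhat from the probability averaged over all city pairs.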

The only drawback of this approach is that the model may not be very powerful given the small number of clusters (i.e., city pairs).

reason180 · Post by **reason180** » Fri Dec 16, 2022 6:16 pm

@mcfanda@gmail.com Thank you. Yes, this is exactly what I was referring to as "hacking" a linear mixed-effects model! I called it hacking because I was not sure it would be legitimate to have an intercept-only model (but apparently it is legitimate). The point about the small number of city pairs is important. Probably a good strategy to take in this situation would be to say something like, "Whereas the potential variability related to city pairs could be considered a 'random effect,' I elected not to model it as such, given the small sample (N = 5) of city pairs. Instead, I conducted a logistic regression in which city pair (indicator-coded) was the predictor and accuracy was the criterion. The result of interest was the intercept parameter estimate."

mcfanda@gmail.com · Post by **mcfanda@gmail.com** » Wed Dec 21, 2022 2:52 pm

Yes, what you wrote is reasonable. You can even add city-pair as a (fixed) factor in the logistic and show whether there are differences due to the specific pair.
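One thing worth checking when reading the intercept of such a fixed-factor model is the coding scheme. Under indicator (treatment) coding, the intercept is the log-odds for the reference city pair, whereas under deviation (sum-to-zero) coding it is the unweighted mean of the per-pair log-odds. The sketch below illustrates this with hypothetical counts (not data from this thread). Because a logistic model with one categorical predictor is saturated, its maximum-likelihood estimates reproduce the observed per-pair proportions, so they can be computed directly.

```python
import math

def logit(p):
    """Probability -> log-odds."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Log-odds -> probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical (correct, total) counts per city pair; 20 participants each.
counts = {"i": (14, 20), "ii": (11, 20), "iii": (9, 20),
          "iv": (13, 20), "v": (12, 20)}

# Observed log-odds of a correct response within each city pair.
pair_logits = {pair: logit(c / n) for pair, (c, n) in counts.items()}

# Indicator coding with pair "i" as reference:
# the intercept is the log-odds for pair "i" only.
reference = "i"
intercept_indicator = pair_logits[reference]
slopes = {pair: v - intercept_indicator
          for pair, v in pair_logits.items() if pair != reference}

# Deviation (sum-to-zero) coding:
# the intercept is the unweighted mean of the per-pair log-odds.
intercept_deviation = sum(pair_logits.values()) / len(pair_logits)

print(f"indicator-coded intercept -> p = {inv_logit(intercept_indicator):.3f}")
print(f"deviation-coded intercept -> p = {inv_logit(intercept_deviation):.3f}")
```

If the goal is a single overall accuracy estimate rather than the accuracy for one particular city pair, the deviation-coded intercept is probably closer to what is wanted.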

reason180 · Post by **reason180** » Sun Mar 05, 2023 5:27 pm

A related question:

If I have a two-level criterion variable, I can easily use a binomial test or a chi-square goodness-of-fit test to check not only for a significant deviation from 0.5, but also for a deviation from whatever I specify the null hypothesis to be (0.33, for example).

But what if I want to construct an intercept-only generalized linear mixed-effects model in which the two levels of the criterion variable are not equally likely under the null hypothesis? (Thus, the logistic intercept equals something other than zero under the null hypothesis.) Is there a standard way to do that?

Thanks in advance.
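One possible approach, not confirmed by anyone in this thread, is to compare the estimated intercept with logit(p0) instead of with zero: z = (a - logit(p0)) / SE(a). The sketch below shows the idea for the simpler non-mixed, intercept-only case using the closed-form estimate; the counts and the function name are hypothetical. In a mixed model, the same comparison would presumably use the fixed intercept and its standard error. Equivalently, one could add a constant offset of logit(p0) so that the usual test of intercept = 0 addresses the desired null, assuming the software supports offsets.

```python
import math

def logit(p):
    """Probability -> log-odds."""
    return math.log(p / (1.0 - p))

def wald_test_against(k, n, p0):
    """Wald z-test of H0: success probability = p0 for an intercept-only
    logistic model with k successes in n trials (needs 0 < k < n)."""
    a_hat = math.log(k / (n - k))            # estimated intercept (log-odds)
    se = math.sqrt(1.0 / k + 1.0 / (n - k))  # asymptotic standard error
    z = (a_hat - logit(p0)) / se             # distance from the null log-odds
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value
    return a_hat, se, z, p

# Hypothetical data: 45 correct out of 100, tested against a null of 0.33.
a_hat, se, z, p_value = wald_test_against(45, 100, 0.33)
print(f"a_hat = {a_hat:.4f}, null log-odds = {logit(0.33):.4f}")
print(f"z = {z:.3f}, two-sided p = {p_value:.4f}")
```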
