Unbalanced Data, Continuous vs Categorical Coding
Posted: Fri Jul 17, 2020 8:24 pm
I am running a generalized mixed model with GAMLj in Jamovi and have a problem with unbalanced data. First, the experiment looks like this:
Participants take a personality measure and then play 50 rounds of a repeated dictator game in which they are receivers. They are offered three possible payouts: 0 points, 25 points and 50 points (bad, medium and good). They can react to any of these by Punishing the dictator or by Switching to a new dictator.
My long-format data looks something like this, with each row representing a round. So I have a mixed-model logistic regression predicting Punish or Switch that looks like this:
Punish or Switch ~ Personality * Round Payout + (1 + Personality | Subject)
If I code Round Payout as continuous I do not get a significant interaction effect, but if I code it as categorical I do. The results can be seen here, where I get significant results for Medium payouts (in the predicted direction, and there are strong theoretical reasons why the effect would show up specifically for medium payouts).
We are concerned that the imbalance in the data is what prevents the interaction from reaching significance when payout is coded as continuous. We have 740 data points for bad payouts, 280 for medium payouts and only 24 for good payouts. This is understandable: one would expect most people to be happy to continue to the next round when they get a Good payout and most unhappy when they get a bad payout. As a result, the number of data points differs vastly depending on the payout. With this in mind, could anyone recommend how to approach this and what is appropriate here?
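To make the difference between the two codings concrete, here is a minimal Python sketch of how the Personality slope (on the log-odds scale) varies with payout under each one. It is only an illustration: the function names and every coefficient value are made up, payout is rescaled to 0, 1, 2, and dummy coding with Bad as the reference level is an assumption (GAMLj may center the continuous predictor or use a different contrast scheme).

```python
# Hypothetical illustration: coefficient values below are invented, not estimates.
PAYOUTS = (0, 25, 50)  # bad, medium, good (points)


def slope_continuous(b_pers, b_inter, payout):
    """Personality slope when payout is continuous (scaled to 0, 1, 2).

    A single interaction coefficient forces the slope to change by the
    same amount from bad to medium as from medium to good.
    """
    return b_pers + b_inter * (payout / 25)


def slope_categorical(b_pers, b_med, b_good, payout):
    """Personality slope when payout is categorical (assumed dummy coding,
    bad = reference). Medium and good each get their own interaction
    coefficient, so the slope at each level is free to differ.
    """
    return b_pers + b_med * (payout == 25) + b_good * (payout == 50)


# Continuous coding: the medium slope is always the midpoint of bad and good.
cont = [slope_continuous(0.5, 0.25, p) for p in PAYOUTS]

# Categorical coding: a medium-only effect can be represented directly.
cat = [slope_categorical(0.5, 1.0, 0.0, p) for p in PAYOUTS]

print("continuous :", cont)
print("categorical:", cat)
```

Under the continuous coding, an effect that is specific to medium payouts (same slope at bad and good, different at medium) cannot be expressed by the single linear interaction term, regardless of how balanced the data are. That is a property of the coding itself, separate from the imbalance question.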