Beginner stats question


by BobEm » Tue Mar 05, 2019 4:47 pm

I'm a beginner at statistics and jamovi.

Question: I have survey data of a group of 800 employees. I'm trying to understand a few things, but for starters:

Independent Variable: Shift. There are 5 shifts: Day, Afternoon, Night, 12-hr Day, 12-hr Night

Dependent Variable: Self-reported mental health. 5 choices to "How much has mental health disrupted your work in the last 4 wks?" Choices are: Extremely disruptive; Very disruptive; Somewhat disruptive; Not so disruptive; Not at all disruptive.

I'm trying to learn whether there's a significant relationship between shift and answers to the mental health question, but can't figure out how. I tried categorizing the mental health question as ordinal (rather than nominal) and used the StatKat module, which suggested the Kruskal Wallis test. But that test requires the Dependent Variable to be continuous.

I'd appreciate any guidance. FYI: This is not for research and does not require a research level of rigor. It is for an applied situation in which we're just trying to get insight into what *might* be going on. Anyone I present this to will certainly know even less about stats than I do.

As background: I'm trying to do some self-learning and completed a beginner stats MOOC via Notre Dame University. I also started working through the "Learning Statistics with Jamovi" ebook. (I don't want you to think that I just came straight to the Forum for an easy answer without trying to work it out on my own. I've been trying.)

Thank you.
BobEm
 
Posts: 2
Joined: Tue Mar 05, 2019 4:29 pm

by MAgojam » Tue Mar 05, 2019 10:17 pm

Hi BobEm,
you can answer your question using the Chi-Square test of independence.

In jamovi you can find it in Frequencies->[Contingency Tables]->Independent samples test of association.

The Chi-Square independence test is used to determine if there is a significant relationship between two categorical variables.
The frequency of each category for a variable is compared between the categories of the second variable. The data is displayed in a contingency table where each row represents a category for a variable and each column represents a category for the other variable.

For example, you want to examine the relationship between "Self-reported mental health" (Extremely disruptive; Very disruptive; Somewhat disruptive; Not so disruptive; Not at all disruptive) and "Shift" (Day; Afternoon; Night; 12-hr Day; 12-hr Night).
The Chi-square independence test can be used to examine this relationship. The null hypothesis for this test is that there is no relationship between "Self-reported mental health" and "Shift".
The alternative hypothesis is that there is a relationship between "Self-reported mental health" and "Shift" (e.g. "Extremely disruptive" answers are more common on the "12-hr Day" shift than on the "12-hr Night" shift).
If the Chi square test is significant, in Statistics you can select Phi and Cramer's V to learn about the strength of this relationship.
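For anyone who wants to see what the test computes, here is a minimal sketch in plain Python. The counts are made up for illustration (they are not the survey data), and the function name `chi_square_independence` is just a placeholder. It returns the chi-square statistic, the degrees of freedom, and Cramer's V; it does not compute a p-value, which jamovi reports for you.

```python
import math

# Hypothetical counts, NOT the survey data: rows are shifts, columns are answers.
observed = [
    [10, 20, 30],
    [20, 20, 20],
]

def chi_square_independence(table):
    """Return (chi2, degrees_of_freedom, cramers_v, expected) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    # Cramer's V rescales chi2 to the range 0..1 using the smaller table dimension.
    k = min(len(row_totals), len(col_totals))
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))
    return chi2, dof, cramers_v, expected

chi2, dof, cramers_v, expected = chi_square_independence(observed)
print(f"chi2={chi2:.3f} dof={dof} V={cramers_v:.3f}")
```

With the real data the table would have 5 rows (shifts) and 5 columns (answers), giving 16 degrees of freedom.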

If the Chi square test is significant, you may want to know which combinations are driving that result.
At the moment jamovi does not offer a post hoc test for this, but if you select:
Cells->Counts->Expected, the contingency table will show both the observed and the expected frequencies. You can use these to calculate the standardized residuals for each combination and dig deeper into your question.
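The residual calculation can be sketched as follows, again with made-up counts and a placeholder function name. "Standardized residual" means different things in different packages, so the sketch returns two common versions: the Pearson residual, (observed - expected) / sqrt(expected), and the adjusted residual, which also divides by the row and column proportions. A frequent rule of thumb is to look more closely at cells whose adjusted residual is beyond roughly +/-2.

```python
import math

# Hypothetical observed counts (rows: two shifts, columns: three answers).
observed = [
    [10, 20, 30],
    [20, 20, 20],
]

def residuals(table):
    """Return (pearson, adjusted) residual tables for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson, adjusted = [], []
    for i, row in enumerate(table):
        p_row, a_row = [], []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count
            p_row.append((o - e) / math.sqrt(e))
            # The adjusted version accounts for the row and column margins.
            var = e * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            a_row.append((o - e) / math.sqrt(var))
        pearson.append(p_row)
        adjusted.append(a_row)
    return pearson, adjusted

pearson, adjusted = residuals(observed)
for p_row, a_row in zip(pearson, adjusted):
    print([round(x, 2) for x in p_row], [round(x, 2) for x in a_row])
```

A positive residual means more responses in that cell than independence would predict, a negative one means fewer.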

I hope these few pointers help you dig deeper.

Cheers
Maurizio
MAgojam
 
Posts: 46
Joined: Thu Jun 08, 2017 2:33 pm
Location: Parma (Italy)

by jonathon » Tue Mar 05, 2019 10:32 pm

hey,

(oh, i wrote this response and i see that maurizio has responded before me. i'll add some stuff to the bottom with my take on his approach)

so normally what happens is data sets come through with the values 1,2,3,4,5 and then a user adds "Extremely disruptive", "very disruptive", etc. as labels to each of these values. In your case, you'll have to do the reverse, which is a little more tricky. For this, you'll need to 'recode' the values. here's a blog post on it:

https://blog.jamovi.org/2018/10/23/tran ... ables.html

so you'll create a second variable which recodes the values "extremely disruptive" -> 1, "very disruptive" -> 2, etc.

then you can analyse this column with kruskal wallis ... but wait! kruskal wallis *should* be able to handle text ordinal variables ... hmm, this is something i should fix.

i'll fix this in the next version, and then you won't need to worry about this.
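As an illustration of the recoding step described above (outside jamovi), here is a small Python sketch. The mapping follows the order given in the post ("extremely disruptive" -> 1 through "not at all disruptive" -> 5); the names `LEVELS`, `CODES` and `recode` are placeholders, and the sample responses are invented.

```python
# Answer labels in the order used for the codes 1..5.
LEVELS = [
    "Extremely disruptive",
    "Very disruptive",
    "Somewhat disruptive",
    "Not so disruptive",
    "Not at all disruptive",
]
# Lower-cased lookup so that capitalisation differences do not matter.
CODES = {label.lower(): code for code, label in enumerate(LEVELS, start=1)}

def recode(response):
    """Return the 1-5 code for a response label, or None if it is not recognised."""
    return CODES.get(response.strip().lower())

# Invented sample responses, including one with stray whitespace and one unknown value.
responses = ["Very disruptive", "Not at all disruptive", "extremely disruptive ", "N/A"]
coded = [recode(r) for r in responses]
print(coded)  # [2, 5, 1, None]
```

Returning None for unrecognised labels keeps answers such as "N/A" from silently becoming a number.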

another thing, a good starting point here would be just examine the descriptives/plots for your data. run descriptives on mental health, and split by shift, and then maybe look at bar plots. if you have lots of data, then you don't even need to worry about significance testing (so, in social science research, we're trying to get by with the absolute minimum number of subjects because running experiments is a hassle, and expensive) but if you've got lots of data, this is less of a concern. you just can eye-ball the data. with lots of data, almost everything can be (statistically) significant, and you're more interested in the size of the effect.
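The "descriptives split by shift" idea amounts to a table of answer percentages within each shift. A rough Python sketch, using invented rows rather than the real survey and a placeholder function name `percent_by_shift`:

```python
from collections import Counter, defaultdict

# Hypothetical (shift, answer) pairs standing in for the survey rows.
rows = [
    ("Day", "Not at all disruptive"),
    ("Day", "Somewhat disruptive"),
    ("Day", "Not at all disruptive"),
    ("Night", "Very disruptive"),
    ("Night", "Somewhat disruptive"),
    ("Night", "Very disruptive"),
    ("Night", "Not at all disruptive"),
]

def percent_by_shift(rows):
    """Return {shift: {answer: percent of that shift's responses}}."""
    counts = defaultdict(Counter)
    for shift, answer in rows:
        counts[shift][answer] += 1
    return {
        shift: {a: 100 * c / sum(answers.values()) for a, c in answers.items()}
        for shift, answers in counts.items()
    }

table = percent_by_shift(rows)
for shift, answers in table.items():
    print(shift, {a: round(p, 1) for a, p in answers.items()})
```

Percentages within each shift are easier to compare by eye than raw counts when the shifts have different numbers of respondents.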

(you can take maurizio's approach too. both contingency tables, and kruskal wallis will work here. the neat thing about kruskal wallis is that it takes the *order* of the "mental health" responses into account. i.e. it recognises that 'Extremely disruptive' is *more* than 'very disruptive', which is *more* than ..., etc. this *can* result in more power, but because kruskal-wallis is non-parametric it can be less *power*ful to begin with. but yes, both approaches will work. use the one you feel most comfortable with)
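To show how Kruskal-Wallis uses the order of the answers, here is a minimal sketch of the H statistic in plain Python, assuming the answers have already been recoded to 1..5. It ranks all responses together (tied answers share an average rank), compares the rank sums per group, and applies the usual tie correction, which matters a lot with only five possible answers. The function name and the tiny data set are illustrative only, and no p-value is computed.

```python
from collections import Counter

def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    counts = Counter(pooled)
    # Midrank for each distinct value: the average of the rank positions it occupies.
    midrank = {}
    position = 0
    for value in sorted(counts):
        t = counts[value]
        midrank[value] = position + (t + 1) / 2
        position += t
    rank_term = sum(sum(midrank[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    tie_correction = 1 - sum(t**3 - t for t in counts.values()) / (n**3 - n)
    if tie_correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / tie_correction

# Invented ordinal codes for two groups (e.g. two shifts), with ties.
h_ties = kruskal_wallis_h([[1, 2, 2], [2, 3, 3]])
h_no_ties = kruskal_wallis_h([[1, 2, 3], [4, 5, 6]])
print(round(h_ties, 4), round(h_no_ties, 4))
```

H is compared against a chi-square distribution with (number of groups - 1) degrees of freedom, so for five shifts that would be 4.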

cheers

jonathon
jonathon
 
Posts: 836
Joined: Fri Jan 27, 2017 10:04 am

by BobEm » Wed Mar 06, 2019 4:59 pm

Thank you, Maurizio and Jonathon. I've been working through both your suggestions and, well, I'm going to need to work through them some more. The Chi-square test of independence seemed straightforward enough. Though it reminded me that there are some survey responses I want to filter out (125 people who answered "Not Applicable" to the "shift" question, which means they are office workers who I think may be diluting my findings regarding shift and mental health of production workers). The filter should be easy enough based on the user manual, but I'm doing something wrong with the syntax.

Similarly, Jonathon, the Transform syntax is not going as well as I'd hoped. I followed your suggestion and ran the descriptives and bar plots. Actually, the output seemed similar to some data that Survey Monkey -- the platform I used -- generated. The problem is that eventually there are quite a few additional relationships I want to examine, and I'd already generated more than 20 bar plots in Survey Monkey and was just creating a lot of noise. So, ultimately, I'm hoping to be able to use jamovi to home in on the most likely relationships, instead of looking at a huge number of relationships and trying to figure out which are worth closer examination. It's not that much data, but it quickly became overwhelming.

I'm guessing this is a common beginner's revelation, but I feel like I've already learned a lot about how to design future surveys so that this process will be easier.

From the perspective of a lazy person, is there any reason why I wouldn't transform my "Extremely disruptive, Very disruptive, etc." responses to numeric values in my original Excel file -- which I could do very easily, and, for the time being, use Excel to filter out the "Not applicable" responses? (I do see why this can get cumbersome if I have to keep going back to Excel and importing into jamovi. But just as a shortcut to confirm I'm on the right track?).

I will continue to work on the jamovi filters and transform function, and will keep you posted.

Thanks!

by jonathon » Wed Mar 06, 2019 11:59 pm

sounds like you're learning some good things.

BobEm wrote:From the perspective of a lazy person, is there any reason why I wouldn't transform my "Extremely disruptive, Very disruptive, etc." responses to numeric values in my original Excel file -- which I could do very easily, and, for the time being, use Excel to filter out the "Not applicable" responses?


only because it's much easier to perform these steps in jamovi than in excel :)

jonathon

by MAgojam » Thu Mar 07, 2019 9:58 am

BobEm wrote:I'm guessing this is a common beginner's revelation, but I feel like I've already learned a lot about how to design future surveys so that this process will be easier.


Hi, BobEm.
I like what you say, because ... well begun is half done!

Cheers.
Maurizio

