Hello,
I’m trying to clean my data by removing outliers that are more than 3 SD above or below the mean. I have two between-subjects groups (2 X 3 design), so the mean and SD are different for each group and condition. How can I use filters to remove the outliers?
Thank you!
Removing outliers using filters
Re: Removing outliers using filters
I can suggest using the new function "Univariate outliers identification and removal" in the jYS module.
3*SD is the traditional Z-score method.
But if you can't assume normality, you can use the modified Z-score, which is based on the median instead of the mean.
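For anyone who wants to see what the modified Z-score does, here is a minimal Python sketch. It assumes the common definition, 0.6745 * (x - median) / MAD, with the often-quoted cutoff of 3.5. The function name and the sample values are made up for illustration, and the jYS module may compute things differently.

```python
from statistics import median

def modified_z_scores(values):
    """Modified z-score: 0.6745 * (x - median) / MAD (hypothetical helper)."""
    values = list(values)
    med = median(values)
    # MAD = median of the absolute deviations from the median
    mad = median(abs(x - med) for x in values)
    if mad == 0:
        raise ValueError("MAD is zero; the modified z-score is undefined")
    return [0.6745 * (x - med) / mad for x in values]

# Made-up sample: six ordinary values and one extreme one
scores = [10, 12, 11, 13, 12, 11, 40]
# Assumed cutoff of 3.5; True marks a suspected outlier
flags = [abs(z) > 3.5 for z in modified_z_scores(scores)]
```

Because the median and MAD are barely moved by the extreme value, only the 40 ends up flagged here. With a 2 X 3 design you would apply this separately within each group and condition.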
Re: Removing outliers using filters
here's a video demonstrating the process step-by-step. you don't need to do it in so many steps, but i've done it this way to make it clearer.
https://youtu.be/bvjaiDAd3HE
step 1, we need a column which has a level for each group. because one of the variables was text, i could simply use the + operator.
step 2, we compute a Z score, using that group variable as the group_by
step 3, we use an if-statement to produce a value of 0 when the z-score is less than -2, or more than 2, and 1 otherwise (this is what the filters expect, 1 = good, 0 = filter out).
step 4 seems a bit silly, we have to copy/paste the values into a new column from step 3 ... the reason for this is because otherwise we'd end up with an infinite loop ... the filters would exclude some rows, that would update the Z calculation, which would in turn change the filters, and so on.
step 5 we point the filter at the copy/pasted column ... you see that when we activate this filter, the z-calculations update because some rows have been excluded (that's why we had to copy/paste those values).
jonathon
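The same logic as the steps above, sketched in plain Python for anyone who wants to check the result outside jamovi. This is only an illustration under a few assumptions: the function name, the row layout (group, condition, value) and the data are made up, and the z-score uses the sample SD (n - 1), which may or may not match what jamovi does internally.

```python
from collections import defaultdict
from statistics import mean, stdev

def outlier_filter(rows, threshold=3.0):
    """Return a 1/0 list (1 = keep, 0 = filter out), with z-scores
    computed separately within each group x condition cell."""
    cells = defaultdict(list)
    for group, condition, value in rows:
        cells[(group, condition)].append(value)

    # Mean and sample SD per cell, computed once on the full data
    stats = {}
    for key, vals in cells.items():
        sd = stdev(vals) if len(vals) > 1 else 0.0
        stats[key] = (mean(vals), sd)

    keep = []
    for group, condition, value in rows:
        m, sd = stats[(group, condition)]
        z = (value - m) / sd if sd > 0 else 0.0
        keep.append(1 if abs(z) <= threshold else 0)
    return keep

# Made-up data: a value of 50 is extreme in cell (A, x) but typical in (B, x)
rows = [("A", "x", v) for v in [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]]
rows += [("B", "x", v) for v in [48, 50, 52, 49, 51]]

# threshold=2.0 mirrors the video; use 3.0 for the 3 SD rule
keep = outlier_filter(rows, threshold=2.0)
```

The filter values are computed once from the full data and then left alone, which is the same job the copy/paste in step 4 does. Note that with a sample SD a value can never be more than (n - 1) / sqrt(n) SDs from its cell mean, so in a cell of 10 rows nothing can exceed 3 SD.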