Hi everyone,
I'm trying to filter out the outliers in my dataset (values more than about 2 standard deviations above or below the mean, or an equivalent criterion for non-normally distributed data). I decided to use the IQR method because the data aren't normally distributed, but I'm having a few issues with the results and I'm stuck for now. I'd appreciate it if you could help me understand a few things better. This is what I did:
I created a new computed variable using the formula MAXABSIQR([VARIABLE1], [VARIABLE2], …) < 3, which I found at https://jamovi.readthedocs.io/it/latest ... _data.html. The dependent variables are the formant values (Hz) for F1, F2 and F3 of each phoneme. The formula returns true/false, so I only need to keep the 'true' rows, which are the usable data (without outliers). Because filters can't be applied to computed variables (I kept getting an error message whenever I tried), I copied the Boolean values into a new data variable and then filtered on the 1s (= true).
To check whether any outliers remained, I made boxplots for all the phonemes. The outliers were still there, so I tried changing the value to 2.5 or 2, but I couldn't see any difference. I don't understand what the 3 in "< 3" is for.
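As I understand jamovi's documentation for computed-variable functions (treat this as an assumption, not jamovi's actual code), IQR() gives how far a value lies outside the box in IQR units (0 inside the box), and MAXABSIQR takes the largest absolute value of that across the listed variables. So MAXABSIQR(...) < 3 is true only when no variable in that row lies 3 or more IQRs beyond its box. A minimal Python sketch of that reading; iqr_distance and keep_rows are hypothetical names and the formant values are made up:

```python
from statistics import quantiles


def iqr_distance(values):
    """Distance of each value from the box (Q1 to Q3), in IQR units.

    Assumed behaviour: 0 inside the box, positive above Q3, negative below Q1.
    """
    # "inclusive" uses the same linear interpolation as R's default quantile type
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; distances are undefined")
    distances = []
    for v in values:
        if v > q3:
            distances.append((v - q3) / iqr)
        elif v < q1:
            distances.append((v - q1) / iqr)
        else:
            distances.append(0.0)
    return distances


def keep_rows(columns, k=3.0):
    """True for rows where every column lies less than k IQRs outside its box."""
    per_column = [iqr_distance(col) for col in columns]
    n_rows = len(columns[0])
    return [max(abs(d[i]) for d in per_column) < k for i in range(n_rows)]


# Made-up F1, F2, F3 values (Hz), one entry per row/token
f1 = [300, 310, 320, 330, 340, 350, 360, 900]
f2 = [1000, 1520, 1540, 1560, 1580, 1600, 1620, 1640]
f3 = [2500, 2510, 2520, 2530, 2540, 2550, 2560, 2570]

print(keep_rows([f1, f2, f3], k=3))
```

With these values the first row is dropped because of its F2 value and the last row because of its F1 value; all other rows are kept.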
I'm new to this kind of procedure, so I'd really appreciate some suggestions.
Thank you a lot in advance!
Outliers identification using IQR
Re: Outliers identification using IQR
The boxplots show as outliers any values more than 1.5 IQR below or above the box. If you change the 3 in the formula to 1.5, your formula should find the same outliers as those visible in the boxplots.
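In formula form, assuming k stands for the number after "<" in MAXABSIQR(...) < k and Q1, Q3 are the bottom and top of the box, a value x is kept when:

```latex
\mathrm{IQR} = Q_3 - Q_1, \qquad
\text{keep } x \iff Q_1 - k\,\mathrm{IQR} \;<\; x \;<\; Q_3 + k\,\mathrm{IQR}
```

With k = 3 the limits are far from the box; with k = 1.5 they match (up to values lying exactly on the limit) the points a boxplot draws as outliers.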
An easy way to understand what the 3 (or 2.5 or 2) means is to do it graphically by hand:
- graph the boxplot
- measure the box (top to bottom)
- draw an upper limit 3 box heights above the top of the box
- do the same below the bottom of the box for the lower limit
- see which values fall outside the limits you have drawn
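The drawing steps above can be sketched in Python. This is a minimal illustration assuming quartiles from Python's statistics.quantiles (method="inclusive", the same interpolation as R's default); jamovi's quartiles may differ slightly for small samples. hand_limits and outside are hypothetical names, and the F1 values are made up:

```python
from statistics import quantiles


def hand_limits(values, k):
    """Box height = Q3 - Q1; limits are k box heights beyond each end of the box."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    box = q3 - q1
    return q1 - k * box, q3 + k * box


def outside(values, k):
    """Values that fall outside the drawn limits."""
    low, high = hand_limits(values, k)
    return [v for v in values if v < low or v > high]


# Made-up F1 values (Hz) for one phoneme, for illustration only
f1 = [300, 310, 320, 330, 340, 350, 360, 430, 500]

for k in (1.5, 2, 2.5, 3):
    print(k, hand_limits(f1, k), outside(f1, k))
```

With these made-up values, 430 lies beyond the 1.5 limit but inside the 2, 2.5 and 3 limits. A filter at 2 or more keeps it, yet the boxplot still draws it as an outlier, which is why moving from 3 to 2.5 or 2 can look like no change.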