Multivariate outliers

Dilara_
Posts: 5
Joined: Fri Oct 18, 2019 7:56 pm

Multivariate outliers

Post by Dilara_ »

How can I detect multivariate outliers in my data set? I want to use Mahalanobis distance, but it is not available in jamovi. How can I compute it manually?
jonathon
Posts: 2613
Joined: Fri Jan 27, 2017 10:04 am

Re: Multivariate outliers

Post by jonathon »

hi,

perhaps with the Rj editor?

https://blog.jamovi.org/2018/07/30/rj.html

kind regards

jonathon
MAgojam
Posts: 421
Joined: Thu Jun 08, 2017 2:33 pm
Location: Parma (Italy)

Re: Multivariate outliers

Post by MAgojam »

Hi, @Dilara_.
I am attaching a screenshot with a simple example of using the RJ editor (as suggested by Jonathon).
Perhaps the script with the Mahalanobis function can be useful to answer your question?
[Attachment: ScreenShot.png]
Cheers.
Maurizio
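
The script in the screenshot is not reproduced in the text of this thread. As a rough sketch of the kind of computation an Rj script could run (not the script from the attachment), the snippet below computes squared Mahalanobis distances with R's built-in mahalanobis() and checks them against the formula written out by hand. The column names V1 to V3 and the simulated values are assumptions for illustration; in Rj the jamovi data set is available as `data`, so the simulated data frame would be dropped.

```r
# Stand-in for the data frame that Rj exposes as `data` (made-up values)
set.seed(42)
data <- data.frame(V1 = rnorm(50), V2 = rnorm(50), V3 = rnorm(50))

X  <- as.matrix(data[, c("V1", "V2", "V3")])  # hypothetical column names
mu <- colMeans(X)                             # vector of column means
S  <- cov(X)                                  # sample covariance matrix

# Built-in: returns the SQUARED Mahalanobis distance of each row from mu
d2_builtin <- mahalanobis(X, center = mu, cov = S)

# The same quantity by hand: (x - mu)' S^-1 (x - mu) for every row x
Xc <- sweep(X, 2, mu)                         # subtract the column means
d2_manual <- rowSums((Xc %*% solve(S)) * Xc)

# Compare the two results
all.equal(unname(d2_builtin), unname(d2_manual))
```

Note that mahalanobis() returns squared distances, which matters when choosing a cutoff.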
Dilara_
Posts: 5
Joined: Fri Oct 18, 2019 7:56 pm

Re: Multivariate outliers

Post by Dilara_ »

I'll try it. Thank you
DeborahA
Posts: 22
Joined: Tue Apr 14, 2020 11:38 am

Re: Multivariate outliers

Post by DeborahA »

Hi there! I tried implementing this code but I could not for the life of me get it to work. So instead here's some simplified code that I wrote to achieve this for a student - I thought I might as well share it here.

Code:

library(dplyr)  # only needed for select()

# Select only the variables you want to use (replace V1..V4 with your own column names)
dat <- select(data, "V1", "V2", "V3", "V4")

n <- nrow(dat)  # number of cases
p <- ncol(dat)  # number of variables

Sx <- cov(dat)  # get the covariance matrix

# Squared Mahalanobis distance of each row from the column means
D2 <- mahalanobis(dat, colMeans(dat), Sx)

# Optionally, make some fancy plots (n and p are taken from the data, not hard-coded)
plot(density(D2, bw = 0.5),
     main = paste0("Squared Mahalanobis distances, n=", n, ", p=", p)); rug(D2)
qqplot(qchisq(ppoints(n), df = p), D2,
       main = bquote("Q-Q plot of Mahalanobis" ~ D^2 ~
                     "vs. quantiles of" ~ chi[.(p)]^2))
abline(0, 1, col = 'gray')

# Add the distances to your selected dataset and flag rows with a squared distance greater than 4.5.
# (4.5 is the cutoff used here; a common alternative is the chi-square critical value qchisq(0.999, df = p).)
dat$mahalanobis <- D2
dat$outlier <- dat$mahalanobis > 4.5

dat
I hope this helps someone!

It would be great to see this implemented in jamovi some time soon, as our Honours students are expected to calculate this.

Best

Deborah.
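
The fixed cutoff of 4.5 in the code above is one choice. A commonly used alternative is to compare each squared distance with a chi-square critical value whose degrees of freedom equal the number of variables, often at p < .001. The snippet below is a minimal sketch of that approach, not part of the original post: the simulated data frame stands in for jamovi's `data`, the column names V1 to V4 are placeholders, and the .001 level is an assumption you may want to change.

```r
# Stand-in for the data frame that Rj exposes as `data` (made-up values)
set.seed(1)
data <- data.frame(V1 = rnorm(100), V2 = rnorm(100),
                   V3 = rnorm(100), V4 = rnorm(100))
data[1, ] <- c(6, -6, 6, -6)          # plant one extreme row for illustration

dat <- data[, c("V1", "V2", "V3", "V4")]
dat <- dat[complete.cases(dat), ]     # cov() and mahalanobis() need complete rows

p  <- ncol(dat)                       # number of variables = degrees of freedom
D2 <- mahalanobis(dat, colMeans(dat), cov(dat))  # squared distances

cutoff  <- qchisq(0.999, df = p)      # critical value at p < .001 (assumed level)
p_value <- pchisq(D2, df = p, lower.tail = FALSE)

dat$mahalanobis <- D2
dat$p_value     <- p_value
dat$outlier     <- D2 > cutoff

flagged <- which(dat$outlier)         # row positions of flagged cases
dat[flagged, ]
```

Because the cutoff is derived from the number of variables, the same script can be reused when the selection of columns changes.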
Claire1998
Posts: 3
Joined: Thu Aug 05, 2021 5:40 pm

Re: Multivariate outliers

Post by Claire1998 »

Hi there,

I am trying to use the Rj editor code to calculate Mahalanobis distance (MD). I have a large data set with approximately 100 variables. Is there a way to number the columns, or to find the right column numbers for the code without counting them manually?

Thanks
jonathon
Posts: 2613
Joined: Fri Jan 27, 2017 10:04 am

Re: Multivariate outliers

Post by jonathon »

hey,

in R you can refer to columns by name, if that's easier than referring to them by column number.

cheers
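
As an illustration of the suggestion above, here is a small sketch in base R showing how to list the columns with their numbers, look up the number of a named column, and select columns by name. The simulated data frame stands in for jamovi's `data`, and the names V1 to V10 are placeholders for your own column names.

```r
# Stand-in for the data frame that Rj exposes as `data` (made-up values);
# as.data.frame() on an unnamed matrix names the columns V1, V2, ..., V10
set.seed(3)
data <- as.data.frame(matrix(rnorm(20 * 10), ncol = 10))

# Print every column name next to its number, so nothing is counted by hand
col_index <- data.frame(number = seq_along(names(data)), name = names(data))
col_index

# Look up the column number of one or several names
pos_single <- which(names(data) == "V7")
pos_multi  <- match(c("V3", "V7"), names(data))

# Select columns directly by name
dat_named <- data[, c("V2", "V5", "V9")]

# Select a contiguous run of columns between two names
from <- match("V3", names(data))
to   <- match("V6", names(data))
dat_range <- data[, from:to]

names(dat_range)
```

Either `dat_named` or `dat_range` can then be passed to cov() and mahalanobis() in place of a selection made by column number.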