Multivariate outliers
How can I detect multivariate outliers in my data set? I want to use the Mahalanobis distance, but it is not available in jamovi. How can I run it manually?
Re: Multivariate outliers
Hi, @Dilara_.
I am attaching a screenshot with a simple example of using the RJ editor (as suggested by Jonathon).
Perhaps the script with the Mahalanobis function can be useful to answer your question?
Cheers.
Maurizio
Re: Multivariate outliers
I'll try it. Thank you
Re: Multivariate outliers
Hi there! I tried implementing this code but I could not for the life of me get it to work. So instead here's some simplified code that I wrote to achieve this for a student - I thought I might as well share it here.
I hope this helps someone!
It would be great to see this implemented in jamovi some time soon, as our Honours students are expected to calculate this.
Best
Deborah.
Code:
library(dplyr)
# In the Rj editor, jamovi exposes the active data set as `data`
dat <- select(data, "V1", "V2", "V3", "V4") # select only the variables you want to use
n <- nrow(dat) # number of cases
p <- ncol(dat) # number of variables
Sx <- cov(dat) # get the covariance matrix
D2 <- mahalanobis(dat, colMeans(dat), Sx) # squared Mahalanobis distance of each case from the centroid
# Optionally, make some fancy plots
plot(density(D2, bw = 0.5),
     main = paste0("Squared Mahalanobis distances, n=", n, ", p=", p)); rug(D2)
qqplot(qchisq(ppoints(n), df = p), D2,
       main = bquote("Q-Q plot of Mahalanobis" ~ D^2 ~ "vs. quantiles of" ~ chi[.(p)]^2))
abline(0, 1, col = 'gray')
# Add the distances to your selected dataset and flag cases with a squared distance greater than 4.5
# (4.5 is the cutoff chosen here; adjust it to your own criterion)
dat$mahalanobis <- D2
dat$outlier <- dat$mahalanobis > 4.5
dat
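The fixed cutoff of 4.5 in the code above is one possible choice. A common alternative convention is to compare each squared distance with a chi-square critical value whose degrees of freedom equal the number of variables, often at alpha = .001. Below is a minimal sketch of that idea, assuming complete data (no missing values). It uses R's built-in `iris` measurements as a stand-in for the jamovi `data` object, and the names `vars`, `alpha`, `cutoff`, and `result` are made up for illustration.

```r
# Stand-in data: in the Rj editor you would start from jamovi's `data` instead
vars <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]

# Squared Mahalanobis distance of each row from the centroid
D2 <- mahalanobis(vars, colMeans(vars), cov(vars))

# Chi-square critical value; alpha = .001 is a convention, not a rule
alpha <- 0.001
cutoff <- qchisq(1 - alpha, df = ncol(vars))

# Flag rows whose squared distance exceeds the critical value
result <- data.frame(row = seq_len(nrow(vars)), D2 = D2, outlier = D2 > cutoff)

# Show only the flagged rows (may be empty)
result[result$outlier, ]
```

Because the critical value is computed from `ncol(vars)`, it adjusts automatically when you change the set of variables, which a hard-coded number does not.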
Re: Multivariate outliers
Hi there,
I am trying to use the code for the RJ editor to calculate MD (Mahalanobis distance). I have a large data set with approx. 100 variables. Is there a way to number the columns, or to find the right column numbers for the code without counting them manually?
Thanks
Re: Multivariate outliers
hey,
in R you can refer to columns by name, if that's easier than referring to them by column number.
cheers
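To make the suggestion above concrete, here is a small sketch of listing column positions and selecting by name instead of counting. It builds a made-up data frame called `data` with hypothetical columns `id` and `V1` to `V4` in place of the jamovi data set; in the Rj editor `data` already exists, so you would skip that first step. The names `col_index`, `pos`, `by_name`, and `by_range` are only for illustration.

```r
# Hypothetical stand-in for the jamovi data set
data <- data.frame(id = 1:5, V1 = rnorm(5), V2 = rnorm(5),
                   V3 = rnorm(5), V4 = rnorm(5))

# List every column name next to its position
col_index <- data.frame(position = seq_along(names(data)), name = names(data))
col_index

# Look up the positions of particular columns by name
pos <- match(c("V1", "V4"), names(data))

# Select by name directly ...
by_name <- data[, c("V1", "V2", "V3", "V4")]

# ... or take every column lying between two named columns
by_range <- data[, pos[1]:pos[2]]
```

Selecting by name keeps working if columns are later added or reordered, whereas hard-coded column numbers can silently point at the wrong variables.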