Data Cleaning Module?

Everything related to the development of modules in jamovi
simonmoon
Posts: 17
Joined: Thu Oct 27, 2022 3:45 pm

Data Cleaning Module?

Post by simonmoon »

Hello,
This is my first post, although I have been evaluating Jamovi for a while as a substitute for the SPSS used at my institution. So far I am impressed by the functionality, the available modules, and the enthusiastic contributors. One thing that keeps me from making the full transition is data cleaning. By "data cleaning" I mean two different things: cleaning based on assumption checks, and screening for "careless responding." Although Jamovi modules provide assumption checks, the data editor has few functions that help with the process. Sorting the data, for example, can be done in Excel, but that means leaving the program. I sort often to eyeball extreme values. Careless responding has been an emerging issue, especially with the popularity of crowd-sourcing. Identifying unusual response patterns, for example a zero responder, has become an important part of research.

I am not sure whether there are already modules that deal with these issues. If so, please let me know. If not, is anyone willing to develop modules for them?
jonathon
Posts: 2613
Joined: Fri Jan 27, 2017 10:04 am

Re: Data Cleaning Module?

Post by jonathon »

hi,

yes, the ability to sort variables is something we need to implement (although we did recently add an 'extreme values' option to the descriptives, to help check for outliers), and yes, using excel isn't a solution.

if you had time, i'd love for you to outline the minimum features you think are necessary.

with thanks

jonathon
simonmoon
Posts: 17
Joined: Thu Oct 27, 2022 3:45 pm

Re: Data Cleaning Module?

Post by simonmoon »

Sorry for the lapse in replying. The end of the semester this year has been a bit crazier than usual.

For a data cleaning module, Meade & Craig's (2012) article can be a good starting point. In their Table 2, M&C outline the data screening methods in use. Some of them, "Average Long String", "Max Long String", "Even Odd Consistency", and "Mahalanobis D", would be good candidates for a Jamovi module. Intra-individual response SD, a version of the "long string" indicators, could also be a potential function. With the popularity of online data collection, I believe a data cleaning module will become increasingly useful.
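
To make two of the indices above concrete, here is a minimal Python sketch of a maximum long string count and an intra-individual response SD for a single respondent. The function names and sample responses are made up for illustration, and this is not Meade & Craig's own procedure or an existing Jamovi feature. Even-odd consistency is left out because it needs to know which items belong to which scale.

```python
from statistics import stdev


def run_lengths(responses):
    """Return the length of every run of identical consecutive responses."""
    if not responses:
        return []
    runs = []
    current = 1
    for previous, value in zip(responses, responses[1:]):
        if value == previous:
            current += 1
        else:
            runs.append(current)
            current = 1
    runs.append(current)
    return runs


def max_long_string(responses):
    """Longest run of identical consecutive responses (0 for no responses)."""
    return max(run_lengths(responses), default=0)


def intra_individual_sd(responses):
    """Sample SD of one respondent's answers; needs at least two answers."""
    return stdev(responses)


# Hypothetical respondents answering six Likert items.
straight_liner = [3, 3, 3, 3, 3, 3]
varied = [1, 4, 2, 5, 5, 3]

for label, answers in (("straight_liner", straight_liner), ("varied", varied)):
    print(label, max_long_string(answers), intra_individual_sd(answers))
```

A long run together with an SD near zero would point to straight-lining, but any cutoff would have to be chosen for the specific survey.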

I have also checked Jamovi for other functions related to data cleaning. One thing that could be improved is the filter. The current filter cannot use calculated variables. When I tried to filter out the extreme cases that showed large outlier statistics (Cook's D in this case), I could not create the filter variable because I could not select the calculated Cook's D values. To do this, I had to save the Cook's D variable as a regular variable. This process takes too long because the analysis often has to be repeated after filtering out potential outliers. The lack of a sorting function also contributes to this issue.
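
The filter-on-Cook's-D step described above might look roughly like this outside Jamovi. This is a sketch for a simple one-predictor regression in plain Python. The function names, the sample data, and the 4/n cutoff (a common rule of thumb, not something from this thread) are all assumptions.

```python
from statistics import mean


def cooks_distances(x, y):
    """Cook's distance for each case in a simple linear regression of y on x."""
    n = len(x)
    p = 2  # estimated parameters: intercept and slope
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [
        (e ** 2 / (p * mse)) * (h / (1 - h) ** 2)
        for e, h in zip(residuals, leverages)
    ]


def rank_by_cooks_d(x, y):
    """Return (row_number, distance) pairs, 1-based, largest distance first."""
    pairs = enumerate(cooks_distances(x, y), start=1)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def flag_rows(x, y, cutoff=None):
    """Row numbers whose Cook's distance exceeds the cutoff (default 4/n)."""
    if cutoff is None:
        cutoff = 4 / len(x)  # assumed rule of thumb
    return [row for row, d in rank_by_cooks_d(x, y) if d > cutoff]


# Hypothetical data: the last case sits far off the trend.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9, 7.0, 20.0]

print(rank_by_cooks_d(x, y)[:3])
print(flag_rows(x, y))
```

Dropping the flagged rows and calling the functions again on what remains gives the repeat-after-filtering loop without saving the distances as a separate variable first.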

Please let me know if I can help out in any way.
Gpower
Posts: 4
Joined: Wed Jan 11, 2023 5:54 am

Re: Data Cleaning Module?

Post by Gpower »

My first post too!

I'm really keen to see a data cleaning module. As well as ways to identify univariate outliers, it should identify multivariate outliers and influential scores for an analysis like multiple regression. It should include Mahalanobis distance, Cook's distance, and both studentised and standardised residuals. I know you can now generate Cook's distance and residuals, but just generating them isn't really enough. If you can't re-order scores in ascending or descending order to find the extreme ones, it is essential to be able to identify the participant or row in the data set so you can then filter them out. I have a couple of R scripts for doing that, but a module would be great.
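
The kind of row identification described above can be sketched in plain Python as follows. It computes squared Mahalanobis distances and returns them with 1-based row numbers, largest first, so extreme cases can be found without sorting the data set by hand. The function names and data are hypothetical, and this is not the R script mentioned in the post.

```python
def means_and_covariance(rows):
    """Column means and sample covariance matrix (n - 1 denominator)."""
    n, k = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(k)]
    cov = [
        [
            sum((row[a] - means[a]) * (row[b] - means[b]) for row in rows) / (n - 1)
            for b in range(k)
        ]
        for a in range(k)
    ]
    return means, cov


def solve(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with partial pivoting."""
    k = len(vector)
    aug = [list(row) + [value] for row, value in zip(matrix, vector)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, k))
        x[r] = (aug[r][k] - tail) / aug[r][r]
    return x


def rank_by_mahalanobis(rows):
    """Return (row_number, squared_distance) pairs, 1-based, largest first."""
    means, cov = means_and_covariance(rows)
    ranked = []
    for number, row in enumerate(rows, start=1):
        deviation = [value - m for value, m in zip(row, means)]
        weighted = solve(cov, deviation)  # equals inverse(cov) * deviation
        ranked.append((number, sum(d * w for d, w in zip(deviation, weighted))))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical two-variable data; row 7 breaks the overall pattern.
data = [(1, 2), (2, 3), (3, 4.2), (4, 4.9), (5, 6.1), (6, 7), (3, 9)]

for number, distance in rank_by_mahalanobis(data):
    print(number, round(distance, 3))
```

The row numbers at the top of the list are the ones to inspect or filter out. Which cutoff counts as "extreme" is left to the analyst.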

If it could also do missing data analysis, that would be helpful too. I was hoping to create such a module but so far I can't even get jmvtools to work, so I might be out of my depth! I'll post separately to see if I can get help with jmvtools.