Hi
A small problem occurs with the file "cogni.csv" in:
https://github.com/mcfanda/gamlj_glm/tree/develop/data
The file contains two variables coded -.5 vs .5. If I try to convert them to "nominal", all values are converted to 0, so the information in the variable is lost. Is it because of the floating-point format?
But this made me think about a general strategy for handling variable definitions in modules. Do you think modules should respect the variable definition, or is it better for them to convert variables on the fly? Here is an example:
Suppose a module accepts factors as independent variables. What if a user wants to put a numerical variable in the "factors" slot? I see two strategies:
1) The user is not allowed to. The "factors" slot does not accept the numerical variable (via the `permitted:` option in .a.yaml), and the user is forced to go back to the data and change the variable definition. This is OK, but it is cumbersome and a bit time-consuming. Of course, one must also be sure that the conversion works. This is what R does.
2) The module takes care of the conversion. This is what SPSS does. However, because AFAIK the .init() function does not receive the full data frame, the module cannot know in advance how the numerical variable will be converted into a factor (the number of levels depends on the unique values in the variable), so the table initialization would not always be correct.
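A minimal Python sketch of strategy 1, for illustration only: the slot declares which measure types it permits and any other variable is refused. The names `FACTOR_SLOT_PERMITTED` and `accepts` and the example variables are hypothetical; in a jamovi module this restriction is declared in .a.yaml, as described above, not written by hand.

```python
# Hypothetical sketch of strategy 1: a "factors" slot only accepts variables
# whose measure type is in its permitted set; everything else is rejected and
# the user has to change the variable definition in the data.

FACTOR_SLOT_PERMITTED = {"nominal", "ordinal"}


def accepts(slot_permitted, measure_type):
    """Return True if a variable with this measure type may fill the slot."""
    return measure_type in slot_permitted


# Illustrative variables (measure types assumed, not read from cogni.csv).
variables = {"decision": "continuous", "reward": "continuous", "group": "nominal"}

for name, measure in variables.items():
    status = "accepted" if accepts(FACTOR_SLOT_PERMITTED, measure) else "rejected"
    print(f"{name} ({measure}): {status}")
```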
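A rough Python sketch of why strategy 2 makes table initialization hard, assuming (as the post says) that the init phase sees only the variable definition and not the data. The factor levels, and therefore the number of per-level rows in a results table, are the unique values of the column, which exist only once the data arrive. `factor_levels` and `init_table` are hypothetical helpers, not jamovi functions.

```python
# Hypothetical sketch of strategy 2: the module converts a numeric column to a
# factor itself. The levels are the unique values in the data, so the number
# of table rows cannot be known during the init phase.


def factor_levels(values):
    """Return the sorted unique values that would become factor levels."""
    return sorted(set(values))


def init_table(levels=None):
    """Build one row label per level; without data only a placeholder exists."""
    if levels is None:
        return ["."]  # init phase: number of rows unknown
    return [str(level) for level in levels]


print(init_table())                                       # ['.']
print(init_table(factor_levels([-0.5, 0.5, 0.5, -0.5])))  # ['-0.5', '0.5']
```

In practice this means the table built in the init phase has to be resized once the levels are known.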
What do you think is the best strategy?
Small problem and general strategies
- mcfanda@gmail.com
Re: Small problem and general strategies
hey,
> values are converted to 0 so the info in the variable is lost. Is it because of the floating format?

Yup, nominal only supports whole integer values (although you can give them whatever label you want).
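A small Python illustration of how whole-integer storage loses the -.5/.5 coding. It assumes the conversion truncates toward zero the way Python's `int()` does; jamovi's actual conversion code may differ, but Python's `round()` also gives 0 for both values, because it rounds halves to the nearest even integer. Keeping the codes as text labels (the "nominal text" route) preserves the two groups.

```python
# Storing -.5/.5 codes as whole integers collapses both groups to 0.
codes = [-0.5, 0.5, 0.5, -0.5]

as_integers = [int(x) for x in codes]  # int() truncates toward zero
print(as_integers)                     # [0, 0, 0, 0]

as_rounded = [round(x) for x in codes]  # round() rounds halves to even
print(as_rounded)                       # [0, 0, 0, 0]

# Text labels keep the distinction between the two groups.
as_text = [str(x) for x in codes]
print(sorted(set(as_text)))  # ['-0.5', '0.5']
```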
> modules should respect the variable definition or it is better that they convert variables on the fly?

Generally speaking, converting on the fly is the best approach. However, as you say, this poses a problem during the init phase when continuous variables are used where a factor is wanted. In that case I generally prevent/reject continuous variables (and require the user to change them to factors).
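One way to express this mixed policy, as a hypothetical Python sketch rather than anything jamovi provides: factors are used as-is in factor slots, continuous variables are rejected there because their levels are unknown at init, and in a numeric slot an integer-coded factor is converted on the fly. Treating text-coded factors in a numeric slot as "reject" is an added assumption, since text labels have no numeric value. The measure types (nominal, ordinal, continuous) and data types (integer, decimal, text) follow jamovi's variable setup; `check_assignment` is an illustrative name.

```python
# Hypothetical sketch: convert on the fly where the result is well defined,
# reject continuous variables in factor slots (levels unknown at init).


def check_assignment(slot_kind, measure_type, data_type):
    """Return 'use', 'convert', or 'reject' for a variable placed in a slot."""
    is_factor = measure_type in ("nominal", "ordinal")
    if slot_kind == "factor":
        # Continuous columns would need their unique values to build levels.
        return "use" if is_factor else "reject"
    if slot_kind == "numeric":
        if not is_factor:
            return "use"
        # Integer codes can be used as numbers; text labels cannot (assumed).
        return "convert" if data_type == "integer" else "reject"
    raise ValueError(f"unknown slot kind: {slot_kind}")
```

For the cogni.csv case, a -.5/.5 column (assumed continuous, decimal) placed in a factor slot yields "reject", which matches the advice to change it to a nominal text variable.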
In the case of cogni.csv, I'd reject continuous variables and require the user to change the columns `decision` and `reward` to nominal text.
cheers