How to handle max and min outputs in a statistical analysis?
Posted: Mon Jul 15, 2024 8:22 am
by Biochemist
Hello,
When performing measurements from serum samples or other types of biological samples, the assay used to quantify the parameter may yield two non-numerical outputs:
a) <Minimum
b) >Maximum
Re-measuring to obtain numerical values is often not possible, either because the samples were only available in limited amounts and are already used up, or because they were already measured undiluted.
<Minimum means the amount of analyte in the sample was below the detection limit of the assay. Is it OK to use 0 in these cases?
>Maximum means the readout for the sample was beyond the detection range, i.e. beyond the highest standard curve value. Extrapolation could be an option but provides reliable values only for a limited range. What would be the best way to handle these >Maximum outputs in a statistical analysis?
Re: How to handle max and min outputs in a statistical analysis?
Posted: Tue Jul 16, 2024 4:58 am
by Biochemist
Another option I can think of would be to consider them missing values but that would reduce the dataset.
Re: How to handle max and min outputs in a statistical analysis?
Posted: Tue Jul 16, 2024 2:12 pm
by MattC
To answer your second question first, you should definitely not treat values above and below the limits of detection (LLoD & ULoD) as missing. These are valuable data points that ought to be allowed to contribute to the results.
I don't know of a 'standard' method for deciding what values to use instead of these observations, but there are a few different approaches you can try. The first thing I would suggest is to examine the distribution of the known results using a histogram. This should give you a feel for a reasonable range in which the truncated values might lie. For example, if the gap between the LLoD and zero is small in comparison with the width of the distribution, and it looks like zero is a reasonable value, then you could either use this or perhaps LLoD/2. If it is large, then you will need to come up with some other arbitrary but reasonable-looking value just below the LLoD. The upper values are trickier (unless you know of a hard upper limit you can use, like the zero in the lower case) but the approach is essentially the same.
In the past, I've tried some more sophisticated ideas, like modelling the truncated distribution, extrapolating it outwards and substituting random numbers drawn from the missing region, but I'm not sure that these necessarily produce better results.
One thing to note - often bio/chem parameters will follow a log-normal distribution, in which case you might want to log transform the data before using the above approaches.
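As a rough illustration of the substitution idea followed by a log transform, here is a minimal Python sketch. The function name, the example readouts and the LLoD/ULoD numbers are all made up for illustration, and using the ULoD itself as the stand-in for >Maximum is only a placeholder assumption, not a recommendation.

```python
import math

def substitute_censored(readouts, llod, ulod, low_value=None, high_value=None):
    """Replace '<Minimum' / '>Maximum' markers with numeric stand-ins.

    Defaults (assumptions for this sketch): LLoD/2 below the range,
    the ULoD itself above the range.
    """
    if low_value is None:
        low_value = llod / 2.0
    if high_value is None:
        high_value = ulod
    values = []
    for r in readouts:
        if r == "<Minimum":
            values.append(low_value)
        elif r == ">Maximum":
            values.append(high_value)
        else:
            values.append(float(r))
    return values

# Hypothetical assay output with a reportable range of 2.0 to 100.0
readouts = [12.5, "<Minimum", 48.0, ">Maximum", 3.1]
values = substitute_censored(readouts, llod=2.0, ulod=100.0)

# Log-transform after substitution; this only works if every
# stand-in is strictly positive, so zero cannot be used here.
log_values = [math.log10(v) for v in values]
```

Note that a log transform rules out zero as the lower stand-in, which is one practical argument for something like LLoD/2 when the data look log-normal.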
Finally, there's nothing wrong with trying out your analyses with a few different substitution approaches to see what difference it makes. If the main conclusions are essentially unchanged, then you don't have too much to worry about. Just make sure you clearly describe your approach in the report.
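Trying out a few substitution rules side by side can be sketched like this. The data, the LLoD of 2.0 and the three rules are hypothetical; in practice you would rerun your actual analysis under each rule rather than just a mean and a median.

```python
from statistics import mean, median

LLOD = 2.0  # hypothetical lower limit of detection

# None marks a '<Minimum' result; the numbers are made-up measurements.
readouts = [None, None, 3.1, 4.7, 12.5, 48.0]

# Candidate stand-ins for values below the LLoD
rules = {"zero": 0.0, "LLoD/2": LLOD / 2.0, "LLoD": LLOD}

def summarise(readouts, fill):
    """Fill the below-limit entries with `fill`, then return (mean, median)."""
    values = [fill if r is None else r for r in readouts]
    return mean(values), median(values)

results = {name: summarise(readouts, fill) for name, fill in rules.items()}

for name, (m, med) in results.items():
    print(f"{name:>7}: mean = {m:.3f}, median = {med:.3f}")
```

In this made-up example the median is identical under all three rules, because every stand-in lies below the smallest measured value, while the mean shifts with the choice of stand-in.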
Hope this helps. Good luck.
Matt
Re: How to handle max and min outputs in a statistical analysis?
Posted: Wed Jul 17, 2024 7:35 am
by Biochemist
Thank you, Matt, for your reply. I will try out different approaches and see how they affect the results and then decide which one I will eventually use - and, of course, describe it in my report.
Re: How to handle max and min outputs in a statistical analysis?
Posted: Thu Jul 18, 2024 2:28 pm
by reason180
To me, a straightforward and convincing approach would be to convert each instance of "max" to the largest non-max value in your data set. Likewise, convert each "min" to the smallest non-min value you measured. (This approach is very similar [though not identical] to Winsorization.)
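A minimal sketch of that replacement in Python, assuming the out-of-range results are stored as the strings "<min" and ">max" (the function name and the example numbers are hypothetical):

```python
def clamp_to_observed(readouts):
    """Replace '<min' with the smallest measured value and '>max' with the largest."""
    measured = [r for r in readouts if isinstance(r, (int, float))]
    lowest, highest = min(measured), max(measured)
    clamped = []
    for r in readouts:
        if r == "<min":
            clamped.append(lowest)
        elif r == ">max":
            clamped.append(highest)
        else:
            clamped.append(r)
    return clamped

example = ["<min", 3.1, 12.5, ">max", 48.0]
clamped = clamp_to_observed(example)
```

Unlike true Winsorization, nothing that was actually measured gets altered here; only the out-of-range markers are filled in.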
Re: How to handle max and min outputs in a statistical analysis?
Posted: Fri Jul 19, 2024 9:02 am
by Biochemist
Thanks for your suggestion, reason180.
The thing is this: The >max values are only max values for the calculated parameter, which is generally a concentration. The raw data are absorbance, fluorescence or chemiluminescence values, and there is always such a value. The calculated parameter is determined from the standard curve, which of course has an upper and a lower limit. So, by looking at the raw data for the >max values, I can tell that there are differences between >max values that are just a little bit beyond the upper limit of the standard curve and >max values that are much higher. I just cannot say how large the difference is in absolute quantitative terms. So, by assigning the highest measured non-max value to all >max values, I would even out some existing differences. But I guess there is no way around that.
Re: How to handle max and min outputs in a statistical analysis?
Posted: Fri Jul 19, 2024 1:23 pm
by reason180
Are there just a few <min and >max values? If so, then what you're seeing as a problem (with this pseudo-Winsorization I've proposed) may be a benefit in that it prevents outliers from distorting your results. Alternatively, if there are lots of <min and >max values, then you really do have a problem. Perhaps a way to handle that is to do non-parametric analyses on ranks (such that all of the <min values have tied ranks and all of the >max values have tied ranks)?
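The tied-ranks idea can be sketched as follows: treat every <min as lower than anything measured and every >max as higher, then assign average ranks to ties. The string markers, the function names and the sample data are assumptions for illustration only; the resulting ranks would then feed into whatever rank-based test suits the design.

```python
import math

def to_sortable(r):
    """Map '<min' below and '>max' above every measured value."""
    if r == "<min":
        return -math.inf
    if r == ">max":
        return math.inf
    return float(r)

def midranks(readouts):
    """Return 1-based ranks, with tied values sharing their average rank."""
    keys = [to_sortable(r) for r in readouts]
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    ranks = [0.0] * len(keys)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i
        j = i
        while j + 1 < len(order) and keys[order[j + 1]] == keys[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

sample = ["<min", 5.2, "<min", ">max", 9.8, "<min"]
ranks = midranks(sample)
```

With many <min values a large share of the ranks will be tied, so any rank-based test used afterwards should be one that applies a correction for ties.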
Re: How to handle max and min outputs in a statistical analysis?
Posted: Fri Jul 19, 2024 1:47 pm
by Biochemist
OK, I see. That may indeed be a benefit in the case of just a few of these values.
There are quite a lot of <min values for some parameters (and none for others) but there are only very few >max values for all parameters.