An example is when you find that the value of k is equal to 1.2
asked Apr 12, 2014

As I understand it, k is either an integer or a percentage. The purpose of winsorization is to reduce the effect of outliers in an ordered dataset. When k is an integer, it gives the number of elements at each of the low and high ends that are to be winsorized. So if k is not an integer, it must be a percentage. Take the example of k=1.2 and an ordered dataset of 1000 data values. 1.2% of 1000 is 12, so winsorizing means replacing the first 12 data values with the 13th data value and the last 12 data values with the 988th data value.
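Here is a minimal sketch of that replacement in Python, just to make the indexing concrete. The function name `winsorize` and the choice to round k% of the dataset size to a whole count are my own assumptions, not a standard definition.

```python
def winsorize(values, k_percent):
    """Winsorize k_percent of the values at each end of the sorted data."""
    data = sorted(values)
    n = len(data)
    # Assumption: round k% of n to a whole number of values per end.
    m = round(n * k_percent / 100)
    if m == 0 or 2 * m >= n:
        return data  # nothing to replace, or nothing would be left
    low = data[m]           # the (m+1)th value, e.g. the 13th when m = 12
    high = data[n - m - 1]  # the (n-m)th value, e.g. the 988th when n = 1000
    return [low] * m + data[m:n - m] + [high] * m


# Ordered dataset of 1000 values: 1, 2, ..., 1000
data = list(range(1, 1001))
w = winsorize(data, 1.2)
```

With this dataset the first 12 entries of `w` all equal the 13th value and the last 12 all equal the 988th value.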

If the sum of the 13th to the 988th data values is S, the 13th data value is L and the 988th is H, then the mean of the winsorized data becomes (12L+S+12H)/1000. If the sum of the 24 outliers is X, the mean before winsorization is (S+X)/1000. So the change in the mean is (12(L+H)-X)/1000=0.012(L+H)-0.001X. That is, 1.2% of the sum of the lowest and highest values in the winsorized dataset, less 0.1% of the total of the outliers. The percentage applied to X depends on the size of the dataset, because it is simply 1 divided by the number of data values. If the dataset had 250 values instead of 1000, it would be 0.4% of X, for example.
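The formula for the change in the mean can be checked numerically. The sketch below uses made-up random data (an assumption purely for illustration) and compares the directly computed difference with 0.012(L+H)-0.001X.

```python
import random

random.seed(0)
# Hypothetical dataset: 1000 random values, sorted into order
data = sorted(random.gauss(50, 10) for _ in range(1000))

m = 12                      # 1.2% of 1000 values at each end
L, H = data[m], data[-m - 1]  # 13th and 988th values
S = sum(data[m:-m])         # sum of the 13th to 988th values
X = sum(data[:m]) + sum(data[-m:])  # sum of the 24 outliers

mean_before = (S + X) / 1000
mean_after = (m * L + S + m * H) / 1000

difference = mean_after - mean_before
formula = 0.012 * (L + H) - 0.001 * X

print(difference, formula)
```

The two printed numbers should agree up to floating-point rounding.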

answered Feb 17 by Top Rated User (442,220 points)