I'm new to R. I want to clean outliers and scale every column to the range 0 to 1 before feeding the sample to a random forest.
g<-c(1000,60,50,60,50,40,50,60,70,60,40,70,50,60,50,70,10)
If I do a simple scaling from 0 to 1, the result will be:
> round((g - min(g))/abs(max(g) - min(g)),1)
[1] 1.0 0.1 0.0 0.1 0.0 0.0 0.0 0.1 0.1 0.1 0.0 0.1 0.0 0.1 0.0 0.1 0.0
So my idea is to replace the values in each column that are greater than the 0.95 quantile with the largest value below the 0.95 quantile, and likewise to replace the values below the 0.05 quantile with the smallest value above it.
Thus, the vector after this replacement, before scaling:
g<-c(70,60,50,60,50,40,50,60,70,60,40,70,50,60,50,70,40)  # first and last values replaced
and after scaling:
> round((g - min(g))/abs(max(g) - min(g)),1)
[1] 1.0 0.7 0.3 0.7 0.3 0.0 0.3 0.7 1.0 0.7 0.0 1.0 0.3 0.7 0.3 1.0 0.0
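The replacement rule described above can be sketched as a small function for a single vector. This is only one possible reading of the rule: `replace_outliers` is a hypothetical name, it relies on the default behaviour of `quantile()`, and it treats values equal to a quantile as "inside".

```r
# Hypothetical helper: pull values outside the [lower, upper] quantiles
# back to the nearest observed value that lies inside that range.
replace_outliers <- function(x, lower = 0.05, upper = 0.95) {
  q <- quantile(x, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  ok <- !is.na(x)
  inside <- x[ok & x >= q[1] & x <= q[2]]
  if (length(inside) == 0) return(x)  # nothing to anchor on; leave x unchanged
  x[ok & x > q[2]] <- max(inside)     # too large: largest value still inside
  x[ok & x < q[1]] <- min(inside)     # too small: smallest value still inside
  x
}

g <- c(1000, 60, 50, 60, 50, 40, 50, 60, 70, 60, 40, 70, 50, 60, 50, 70, 10)
g_clean <- replace_outliers(g)
g_scaled <- (g_clean - min(g_clean)) / (max(g_clean) - min(g_clean))
print(round(g_scaled, 1))
```

For this `g`, the 1000 becomes 70 and the 10 becomes 40, so the printed values should match the scaled vector shown above.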
I need to apply this to every column of a data frame, so in R the implementation should look something like this:
> apply(c, 2, function(x) x[x > quantile(x, 0.95)] <- max(x[x, ... max without the quantile(x, 0.95))
Can anyone help?
Example data:
a<-c(100,6,5,6,5,4,5,6,7,6,4,7,5,6,5,7,1)
b<-c(1000,60,50,60,50,40,50,60,70,60,40,70,50,60,50,70,10)
c<-cbind(a,b)
c<-as.data.frame(c)
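A minimal sketch of how the whole procedure might be applied column by column to this data. The name `clean_and_scale` is hypothetical, the data frame is called `df` here rather than `c` to avoid confusion with base `c()`, and every column is assumed to be numeric.

```r
# Hypothetical helper: outlier replacement followed by 0-1 scaling
# for one numeric column.
clean_and_scale <- function(x, lower = 0.05, upper = 0.95) {
  q <- quantile(x, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  # Clamp to the most extreme observed values that still lie inside the quantiles.
  inside <- x[!is.na(x) & x >= q[1] & x <= q[2]]
  if (length(inside) > 0) x <- pmin(pmax(x, min(inside)), max(inside))
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) return(x - rng[1])  # constant column: all zeros, avoids 0/0
  (x - rng[1]) / (rng[2] - rng[1])
}

a <- c(100, 6, 5, 6, 5, 4, 5, 6, 7, 6, 4, 7, 5, 6, 5, 7, 1)
b <- c(1000, 60, 50, 60, 50, 40, 50, 60, 70, 60, 40, 70, 50, 60, 50, 70, 10)
df <- data.frame(a, b)

# Assigning into df_scaled[] keeps a data frame with the original column names.
df_scaled <- df
df_scaled[] <- lapply(df, clean_and_scale)
print(round(df_scaled, 1))
```

`apply(df, 2, clean_and_scale)` would also work, but it returns a matrix rather than a data frame, which is why `lapply` is used in this sketch.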