Linear regression with categorical features

I am trying to build a linear regression model, but some of my features are non-numerical, e.g. "Car color", while others are numerical, e.g. "Engine size". For the non-numerical ones, I'm not sure how to represent them as input features. The only way I can think of is to assign each color a different value (red = 1, blue = 2, green = 3, ...), but this does not seem acceptable, as it implies that green is "better" than red.

Can anyone help? I am implementing this in Java, so I would appreciate algorithms expressed in that language, or language-independent ones.

1 answer

One way to do this is dummy coding; another is effect coding.

Please refer to this article for more details; I think the author explains it better than I can here.

Coding Categorical Variables in Regression Models: Dummy and Effect Coding, by Resmi Gupta

I assume this solution falls into your language-independent category ;)

To encode the car color (I assume it can only take 3 values: red, blue, green), you can encode it as follows:

Color  Dummy_Var_One  Dummy_Var_Two
Red    1              0
Blue   0              1
Green  0              0

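Green is the reference category here: it is encoded as all zeros. In general, a categorical variable with n values needs n-1 dummy variables.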
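Since the question asks for Java, here is a minimal sketch of dummy coding for a single categorical feature. The class name DummyCoding and the method encode are made up for illustration, and treating the last listed level as the reference category is an arbitrary assumption of this sketch, chosen so that Green ends up as all zeros.

```java
import java.util.Arrays;
import java.util.List;

/** Dummy (reference-category) coding for one categorical feature. Hypothetical helper. */
final class DummyCoding {

    /**
     * Encodes a value as n-1 indicator columns, where n = levels.size().
     * Assumption of this sketch: the LAST entry of levels is the reference
     * category and is encoded as all zeros.
     */
    static double[] encode(List<String> levels, String value) {
        int index = levels.indexOf(value);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown level: " + value);
        }
        double[] row = new double[levels.size() - 1];
        if (index < row.length) {
            row[index] = 1.0; // non-reference level: switch on its own column
        }
        return row; // reference level: every column stays 0
    }

    public static void main(String[] args) {
        List<String> colors = Arrays.asList("Red", "Blue", "Green");
        for (String c : colors) {
            // Prints one encoded row per color, e.g. Red -> [1.0, 0.0]
            System.out.println(c + " -> " + Arrays.toString(encode(colors, c)));
        }
    }
}
```

The resulting columns would then be appended to the numerical features (such as engine size) to form each input row for the regression.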

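In Java, Weka's NominalToBinary filter can perform this conversion; as far as I know, it turns a nominal attribute with n values into n binary attributes.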
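For comparison, effect coding (the other method named at the top of this answer) uses the same n-1 columns, but the reference category is encoded as -1 in every column instead of 0. A rough sketch, again with a hypothetical class name and with the last listed level assumed to be the reference:

```java
import java.util.Arrays;
import java.util.List;

/** Effect coding for one categorical feature. Hypothetical helper. */
final class EffectCoding {

    /**
     * Encodes a value as n-1 columns, where n = levels.size().
     * Assumption of this sketch: the LAST entry of levels is the reference
     * category and is encoded as -1 in every column.
     */
    static double[] encode(List<String> levels, String value) {
        int index = levels.indexOf(value);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown level: " + value);
        }
        double[] row = new double[levels.size() - 1];
        if (index < row.length) {
            row[index] = 1.0;       // non-reference level: its own column is 1
        } else {
            Arrays.fill(row, -1.0); // reference level: all columns are -1
        }
        return row;
    }
}
```

With the three colors above, Red and Blue get the same rows as in the dummy-coded table, while Green becomes (-1, -1).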
