All Levels of a Factor in a Model Matrix in R

Question

I have a data.frame that contains both factor and numeric variables:

testFrame <- data.frame(First = sample(1:10, 20, replace = TRUE),
                        Second = sample(1:20, 20, replace = TRUE),
                        Third = sample(1:10, 20, replace = TRUE),
                        Fourth = rep(c("Alice", "Bob", "Charlie", "David"), 5),
                        Fifth = rep(c("Edward", "Frank", "Georgia", "Hank", "Isaac"), 4))

I want to construct a matrix that assigns dummy variables to the factors and leaves the numeric variables alone:

model.matrix(~ First + Second + Third + Fourth + Fifth, data = testFrame)
This drops one level of each factor (the reference level), which is what you would expect when fitting a model with lm(). But I want a matrix that includes a dummy (indicator) variable for every level of every factor. I am not concerned about multicollinearity because the matrix is intended for glmnet.
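To make the dropped level concrete, here is a minimal sketch that rebuilds only the Fourth column from the question as a factor and lists the columns model.matrix() produces under the default treatment contrasts (assuming the default options("contrasts") setting):

```r
# Rebuild just the Fourth column from the question, as a factor
Fourth <- factor(rep(c("Alice", "Bob", "Charlie", "David"), 5))

# Default coding for unordered factors is contr.treatment
defaultMatrix <- model.matrix(~ Fourth)

# "Alice", the first level, gets no column of its own: it is the
# reference level absorbed into the (Intercept) column
print(colnames(defaultMatrix))
# -> "(Intercept)" "FourthBob" "FourthCharlie" "FourthDavid"
```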

Is there a way to have model.matrix create the dummy for every level of the factor?

score 0 · Answer 1 · Jun 22, 2023

Yes. You don't have to modify model.matrix() itself; you only need to change the contrasts it applies. By default, model.matrix() codes unordered factors with treatment contrasts (contr.treatment), which creates a dummy variable for every level except one, the reference level. To get a separate dummy variable for every level, pass a named list of contrast matrices through the contrasts.arg argument of model.matrix(). Here's an example:

testFrame <- data.frame(First = sample(1:10, 20, replace = TRUE), Second = sample(1:20, 20, replace = TRUE), Third = sample(1:10, 20, replace = TRUE))
Fourth <- rep(c("Alice", "Bob", "Charlie", "David"), 5)
Fifth <- rep(c("Edward", "Frank", "Georgia", "Hank", "Isaac"), 4)
testFrame$Fourth <- as.factor(Fourth)
testFrame$Fifth <- as.factor(Fifth)

# contrasts = FALSE requests an identity (one column per level) coding for each factor
dummyMatrix <- model.matrix(~ ., data = testFrame,
                            contrasts.arg = lapply(testFrame[, sapply(testFrame, is.factor)],
                                                   contrasts, contrasts = FALSE))

In this example, we convert the Fourth and Fifth variables to factors and then pass testFrame to model.matrix(). The conversion matters: sapply(testFrame, is.factor) selects only factor columns, so a character column would be left out of contrasts.arg and fall back to the default treatment coding. The list passed as contrasts.arg is built with lapply(), which calls contrasts() with contrasts = FALSE on every factor column of testFrame. For each factor this returns an identity matrix, so every level gets its own indicator column.
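To see what contrasts = FALSE changes, a quick comparison on the Fourth factor from the question (a sketch assuming the default options("contrasts") setting, so the unmodified call returns contr.treatment):

```r
# Fourth column from the question, as a factor
Fourth <- factor(rep(c("Alice", "Bob", "Charlie", "David"), 5))

# Default treatment contrasts: 4 rows (levels) x 3 columns; "Alice" has no column
treatment <- contrasts(Fourth)

# contrasts = FALSE: 4 x 4 identity matrix, one indicator column per level
full <- contrasts(Fourth, contrasts = FALSE)

print(dim(treatment))       # 4 3
print(colnames(treatment))  # "Bob" "Charlie" "David"
print(full)
```

Note that dropping the intercept (for example ~ 0 + Fourth + Fifth) is not a substitute for this: without an intercept only the first factor in the formula gets all of its levels, while later factors still lose their reference level.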

The resulting dummyMatrix contains an (Intercept) column, the numeric variables First, Second, and Third unchanged, and one indicator column for every level of Fourth and Fifth.
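If you need this for several data frames, the call can be wrapped in a small helper. The sketch below uses a hypothetical function name, allLevelsMatrix(), which is not part of base R. It also adds drop = FALSE to the column selection: when a data frame has exactly one factor column, testFrame[, sapply(testFrame, is.factor)] returns the bare factor instead of a data frame, and lapply() would then iterate over its individual values rather than over columns.

```r
# allLevelsMatrix() is a hypothetical helper, not part of base R.
# It returns a model matrix with an indicator column for every level
# of every factor (or character) column in `df`.
allLevelsMatrix <- function(df) {
  # Turn character columns into factors so the selection below sees them
  isChar <- vapply(df, is.character, logical(1))
  if (any(isChar)) df[isChar] <- lapply(df[isChar], factor)

  # drop = FALSE keeps a data frame even when exactly one column is a factor
  isFac <- vapply(df, is.factor, logical(1))
  factorCols <- df[, isFac, drop = FALSE]

  model.matrix(~ ., data = df,
               contrasts.arg = lapply(factorCols, contrasts, contrasts = FALSE))
}

# Usage: a data frame with a single factor column, the case where the
# one-liner without drop = FALSE breaks
df1 <- data.frame(x = 1:6, g = factor(rep(c("a", "b", "c"), 2)))
m1 <- allLevelsMatrix(df1)
print(colnames(m1))  # "(Intercept)" "x" "ga" "gb" "gc"

# Each row has exactly one 1 among the g indicator columns
print(all(rowSums(m1[, c("ga", "gb", "gc")]) == 1))  # TRUE
```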
