How to create dummy variables based on a categorical variable of lists in R

0 votes

I have a data frame with a categorical variable that holds lists of strings of varying lengths. Consider the example below:

data <- data.frame(x = 1:5)
data$y <- list("A", c("A", "B"), "C", c("B", "D", "C"), "E")
data
  x       y
1 1       A
2 2    A, B
3 3       C
4 4 B, D, C
5 5       E

The desired result has one dummy variable for each unique string that appears anywhere in data$y, i.e.:

data.frame(x = 1:5, A = c(1,1,0,0,0), B = c(0,1,0,1,0), C = c(0,0,1,1,0), D = c(0,0,0,1,0), E = c(0,0,0,0,1))
  x A B C D E
1 1 1 0 0 0 0
2 2 1 1 0 0 0
3 3 0 0 1 0 0
4 4 0 1 1 1 0
5 5 0 0 0 0 1

My current approach, shown below, is very slow on large data frames:

unique_Strings <- unique(unlist(data$y))
n <- ncol(data)
for (i in seq_along(unique_Strings)) {
  # 1 if the i-th unique string occurs in the row's list element, otherwise 0
  data[, n + i] <- sapply(data$y, function(x) ifelse(unique_Strings[i] %in% x, 1, 0))
  colnames(data)[n + i] <- unique_Strings[i]
}

Any suggestions on how I can improve my code?

Apr 13, 2018 in Data Analytics by Sahiti
• 6,370 points
2,636 views

1 answer to this question.

0 votes

You can use mtabulate from the qdapTools package as follows:

library(qdapTools)
cbind(data[1], mtabulate(data$y))
#  x A B C D E
#1 1 1 0 0 0 0
#2 2 1 1 0 0 0
#3 3 0 0 1 0 0
#4 4 0 1 1 1 0
#5 5 0 0 0 0 1
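
If installing qdapTools is not an option, a similar result can probably be obtained in base R by cross-tabulating a row index against the unlisted strings. The following is only a sketch under the assumptions of the example in the question; the names row_id, counts, dummies and result are made up for illustration. Note that it produces 0/1 indicators, whereas mtabulate returns counts, so the two would differ if a string were repeated within one row.

```r
# Example data from the question
data <- data.frame(x = 1:5)
data$y <- list("A", c("A", "B"), "C", c("B", "D", "C"), "E")

# Row index repeated once per string held in that row's list element
row_id <- rep(seq_len(nrow(data)), lengths(data$y))

# Cross-tabulate rows against strings; the factor levels keep rows
# whose list element is empty
counts <- table(factor(row_id, levels = seq_len(nrow(data))), unlist(data$y))

# Turn counts into 0/1 indicators and bind them to the other columns
dummies <- as.data.frame.matrix(+(counts > 0))
result <- cbind(data["x"], dummies)
result
```

This avoids the loop over unique strings and the repeated sapply calls, which is where most of the time in the original approach is likely spent.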
answered Apr 13, 2018 by CodingByHeart77
• 3,750 points
