Error saying vector size cannot be NA when using R with data mining

+1 vote

I'm using R for data analytics. I connected it to Elasticsearch and retrieved a dataset of Shakespeare's Complete Works.

library("elastic")
connect()
maxi <- count(index = 'shakespeare')
s <- Search(index = 'shakespeare',size=maxi)

dat <- s$hits$hits[[1]]$`_source`$text_entry
for (i in 2:maxi) {
  dat <- c(dat , s$hits$hits[[i]]$`_source`$text_entry)
}
rm(s)

After that I want to build a tf-idf matrix, but apparently I can't because it uses too much memory (I have 4 GB of RAM). Here is my code:

library("tm")
myCorpus <- Corpus(VectorSource(dat))
myCorpus <- tm_map(myCorpus, content_transformer(tolower),lazy = TRUE)
myCorpus <- tm_map(myCorpus, content_transformer(removeNumbers),lazy = TRUE)
myCorpus <- tm_map(myCorpus, content_transformer(removePunctuation),lazy = TRUE)
myCorpus <- tm_map(myCorpus, content_transformer(removeWords), stopwords("en"),lazy = TRUE)
myTdm <- TermDocumentMatrix(myCorpus,control = list(weighting = function(x) weightTfIdf(x, normalize = FALSE)))

myCorpus is around 400 MB.

But then I do:

> m <- as.matrix(myTdm)
Error in vector(typeof(x$v), nr * nc) : vector size cannot be NA
In addition: Warning message:
In nr * nc : NAs produced by integer overflow
Nov 15, 2018 in Data Analytics by Ali
• 11,360 points
4,826 views

1 answer to this question.

0 votes

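The error comes from integer overflow: as.matrix() tries to allocate a dense matrix with nr * nc cells (terms times documents), and for your corpus that product exceeds the largest integer R can represent (2,147,483,647), so it becomes NA. Even without the overflow, a dense matrix of double-precision tf-idf weights that large would not fit in 4 GB of RAM.

You can use the removeSparseTerms() function from the tm package. It removes sparse terms from a document-term or term-document matrix, which can shrink the matrix considerably before you convert it.

Something like this: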

library("tm")
data("crude")
tdm <- TermDocumentMatrix(crude)
# Keep only terms whose sparsity is below 0.2 (threshold taken from the tm documentation example)
removeSparseTerms(tdm, 0.2)
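To see why the conversion fails and to check whether a reduced matrix is small enough to densify, a rough sketch along these lines may help. The helpers dense_cells() and fits_dense() are hypothetical (they are not part of tm), the 1 GiB budget is an assumption, and the dimensions are illustrative only, not the real size of the Shakespeare matrix. The crude corpus stands in for your myTdm.

```r
# Hypothetical helper (not part of tm): number of cells a dense matrix needs,
# computed in double precision so the product cannot overflow to NA.
dense_cells <- function(nr, nc) as.numeric(nr) * as.numeric(nc)

# Hypothetical helper: TRUE when a dense copy stays under a memory budget.
# Assumes 8 bytes per cell (double), as tf-idf weights are doubles.
fits_dense <- function(nr, nc, max_bytes = 1 * 1024^3) {
  cells <- dense_cells(nr, nc)
  cells <= .Machine$integer.max && cells * 8 <= max_bytes
}

# Illustrative dimensions only; use dim(myTdm) for the real values.
nr <- 30000L    # terms (assumed)
nc <- 110000L   # documents (assumed)

overflowed <- suppressWarnings(nr * nc)  # integer product overflows to NA
cells <- dense_cells(nr, nc)             # 3.3e9 cells
gib <- cells * 8 / 1024^3                # about 24.6 GiB for a dense copy

# The tm part runs only if tm is installed.
if (requireNamespace("tm", quietly = TRUE)) {
  library(tm)
  data("crude")                              # small corpus shipped with tm
  tdm <- TermDocumentMatrix(crude)
  tdm_small <- removeSparseTerms(tdm, 0.2)   # keep terms with sparsity below 0.2
  if (fits_dense(nrow(tdm_small), ncol(tdm_small))) {
    m <- as.matrix(tdm_small)                # small enough to densify
  }
}
```

For your data you would pass myTdm instead of tdm. A threshold of 0.2 keeps only terms that appear in more than 80% of documents, so with many short documents a value much closer to 1 (for example 0.99) is probably a better starting point. Check dim() after each attempt and lower the threshold until the matrix fits.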

answered Nov 15, 2018 by Maverick
• 10,840 points
