hclust size limit

Question

I'm new to R. I'm trying to run hclust() on about 50K items. I have 10 columns to compare and 50K rows of data. When I tried to create the distance matrix, I got: "Cannot allocate vector of 5GB".

Is there a size limit to this? If so, how do I go about doing a cluster of something this large?

DataKing99 · Answer

Classic hierarchical clustering approaches are O(n^3) in runtime and O(n^2) in memory. So yes, they scale very badly to large data sets. Anything that requires materializing the full distance matrix needs O(n^2) memory or worse.

Note that there are some specializations of hierarchical clustering, such as SLINK and CLINK, that run in O(n^2) and, depending on the implementation, may need only O(n) memory.

You might want to look into more modern clustering algorithms. Anything that runs in O(n log n) or better should work for you. There are plenty of good reasons not to use hierarchical clustering: it is usually rather sensitive to noise (i.e. it doesn't really know what to do with outliers), and the results are hard to interpret for large data sets (dendrograms are nice, but only for small data sets).
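To make the O(n^2) memory point concrete, here is a minimal sketch in base R. It first estimates how much memory the object returned by dist() needs for n rows; the exact figure in an error message depends on the real row count and on which allocation failed. It then shows one common workaround, which is my assumption rather than something recommended above: compress the data with k-means (linear in n per iteration) and run hclust() only on the cluster centers. The helper name dist_bytes, the synthetic data, and the choices of 100 centers and 5 final groups are all hypothetical.

```r
# dist() stores only the lower triangle: n * (n - 1) / 2 doubles, 8 bytes each.
dist_bytes <- function(n) n * (n - 1) / 2 * 8

# Roughly 9.3 GiB for 50,000 rows, before hclust() allocates anything itself.
dist_bytes(50000) / 2^30

# Workaround sketch on synthetic stand-in data (5,000 rows, 10 columns).
set.seed(1)
x <- matrix(rnorm(5000 * 10), ncol = 10)

# Step 1: reduce the rows to 100 representative centers with k-means.
km <- kmeans(x, centers = 100, iter.max = 100)

# Step 2: hierarchical clustering on the 100 centers only,
# so the distance matrix has 100 * 99 / 2 entries instead of n * (n - 1) / 2.
hc <- hclust(dist(km$centers), method = "average")

# Step 3: cut the tree into 5 groups and map every original row
# to the group of the center it was assigned to.
center_groups <- cutree(hc, k = 5)
groups <- center_groups[km$cluster]

table(groups)
```

The resulting dendrogram describes the 100 centers rather than the individual rows, so it stays readable, but the result depends on the k-means step and is not identical to running hclust() on the full data.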
