hclust size limit

Question

I'm new to R. I'm trying to run hclust() on about 50K rows of data, with 10 columns to compare. When I try to build the distance matrix, I get: "Cannot allocate vector of 5GB".

Is there a size limit on hclust()? If so, how do I cluster a data set this large?

SDeb · Answer 1 · Jul 10, 2019

Classic hierarchical clustering approaches are O(n^3) in runtime and O(n^2) in memory. So yes, they scale very badly to large data sets. Anything that requires materializing the full distance matrix is O(n^2) or worse in both time and memory.
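To put a number on the O(n^2) memory cost for the 50K rows in the question, here is a rough back-of-the-envelope sketch in R. It assumes the distances are stored the way dist() stores them, as the n(n-1)/2 lower-triangle values in 8-byte doubles. dist_bytes is an illustrative helper name, not a base R function.

```r
# Estimate the size of the object dist() would return for n rows.
# Assumption: lower triangle only, n * (n - 1) / 2 values, 8 bytes each.
dist_bytes <- function(n, bytes_per_value = 8) {
  n * (n - 1) / 2 * bytes_per_value
}

n <- 50000
dist_bytes(n)            # size in bytes
dist_bytes(n) / 1e9      # about 10 (GB)
dist_bytes(n) / 1024^3   # about 9.3 (GiB)
```

Doubling the number of rows roughly quadruples this figure, so the limit comes from available memory rather than from a fixed cap on the number of rows.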

Note that some specialized algorithms, such as SLINK (single linkage) and CLINK (complete linkage), run in O(n^2) time and, depending on the implementation, may need only O(n) memory.
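To illustrate the O(n) memory idea, below is a minimal sketch of SLINK in plain R. It is my own illustrative version, not what hclust() does internally: it assumes Euclidean distance, and the names slink, cut_slink, parent, and lambda are hypothetical. It never builds the n-by-n distance matrix; distances to one new point at a time are computed and then discarded. The result is Sibson's "pointer representation": point j joins the cluster containing parent[j] at height lambda[j].

```r
# SLINK sketch: single-linkage clustering without a distance matrix.
# Assumption: Euclidean distance on the rows of a numeric matrix x.
slink <- function(x) {
  xt <- t(as.matrix(x))          # columns are points, for cheap column access
  n <- ncol(xt)
  parent <- integer(n)           # parent[j]: point that j merges towards
  lambda <- numeric(n)           # lambda[j]: height at which j merges
  m <- numeric(n)                # scratch: distances to the current point
  for (i in seq_len(n)) {
    parent[i] <- i
    lambda[i] <- Inf
    if (i == 1L) next
    prev <- seq_len(i - 1L)
    # Distances from point i to every earlier point (O(n) values at a time).
    m[prev] <- sqrt(colSums((xt[, prev, drop = FALSE] - xt[, i])^2))
    for (j in prev) {
      p <- parent[j]
      if (lambda[j] >= m[j]) {
        m[p] <- min(m[p], lambda[j])
        lambda[j] <- m[j]
        parent[j] <- i
      } else {
        m[p] <- min(m[p], m[j])
      }
    }
    # Re-point any j whose merge height is not below that of its parent.
    relink <- prev[lambda[prev] >= lambda[parent[prev]]]
    parent[relink] <- i
  }
  list(parent = parent, lambda = lambda)
}

# Flat clusters at height h: j shares a label with parent[j] if lambda[j] <= h.
cut_slink <- function(fit, h) {
  n <- length(fit$parent)
  label <- integer(n)
  next_label <- 0L
  for (j in n:1) {               # parent[j] > j, so parents are labelled first
    if (fit$lambda[j] <= h) {
      label[j] <- label[fit$parent[j]]
    } else {
      next_label <- next_label + 1L
      label[j] <- next_label
    }
  }
  label
}

# Small sanity check against hclust() on data where dist() still fits.
set.seed(1)
x <- matrix(rnorm(200), ncol = 4)
fit <- slink(x)
hc <- hclust(dist(x), method = "single")
all.equal(sort(fit$lambda[is.finite(fit$lambda)]), hc$height)
```

The inner loop is ordinary interpreted R, so at 50K rows this sketch would be slow (on the order of a billion loop iterations). It is meant to show why the memory stays linear, not to be a fast implementation. For real use, a compiled implementation is the better route; the fastcluster package, for example, provides hclust.vector(), which works from the data matrix directly for some linkage methods.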

