Using a distance formula other than Euclidean distance in k-means

I am working with latitude-longitude data.

My objective is to cluster the points based on the distance between them.

The distance between two points is =ACOS(SIN(lat1)*SIN(lat2)+COS(lat1)*COS(lat2)*COS(lon2-lon1))*6371, with latitudes and longitudes in radians and the result in km (6371 km is the Earth's mean radius).

How can I use k-means in R? Is there any way I can override the distance calculation in that process?
Jun 21, 2018 in Data Analytics by DataKing99

1 answer to this question.


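K-means is based on variance minimization: it minimizes the sum of within-cluster variances, which equals the sum of squared Euclidean distances from each point to its cluster mean. This equivalence does not hold for other distances, so with a different distance function the mean is no longer the best cluster center and the algorithm is no longer guaranteed to converge.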
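To see why the objective is tied to squared Euclidean distance, here is a small R check on made-up data: the sum of squared Euclidean distances from each point to its assigned cluster mean is the same number that `kmeans()` reports as `tot.withinss`. The data are synthetic and only for illustration.

```r
# Synthetic 2-D data: two loose groups (made-up values for illustration only)
set.seed(42)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))

fit <- kmeans(x, centers = 2, nstart = 10)

# Row i holds the center (mean) of the cluster that point i was assigned to
assigned_centers <- fit$centers[fit$cluster, , drop = FALSE]

# Sum of squared Euclidean distances from each point to its own cluster mean
sq_dist_sum <- sum(rowSums((x - assigned_centers)^2))

# kmeans() reports the same quantity as the total within-cluster sum of squares
all.equal(sq_dist_sum, fit$tot.withinss)  # TRUE
```

Swapping in another distance breaks this identity, which is why `kmeans()` has no argument for a custom distance function.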

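If you want a k-means-like algorithm for other distances (where the mean is not an appropriate estimator), use k-medoids (PAM, Partitioning Around Medoids). In contrast to k-means, k-medoids converges with arbitrary distance functions, because its centers (medoids) are always actual data points and a swap is only accepted if it lowers the total distance.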
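A minimal sketch of the k-medoids route in R, assuming the `cluster` package (a recommended package that ships with standard R installations) and its `pam()` function, which accepts a precomputed dissimilarity. The helper `great_circle_km()` is a hypothetical name implementing the question's formula; it converts degrees to radians first and clamps the cosine to [-1, 1], since rounding can otherwise make `acos()` return `NaN` for identical points. The coordinates are made-up examples.

```r
library(cluster)  # recommended package shipped with R; provides pam()

# Hypothetical helper: the question's spherical-law-of-cosines distance in km.
# Inputs are in degrees, so they are converted to radians first.
great_circle_km <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to_rad <- pi / 180
  lat1 <- lat1 * to_rad
  lat2 <- lat2 * to_rad
  dlon <- (lon2 - lon1) * to_rad
  cos_angle <- sin(lat1) * sin(lat2) + cos(lat1) * cos(lat2) * cos(dlon)
  # Clamp to [-1, 1] so rounding error cannot make acos() return NaN
  acos(pmin(pmax(cos_angle, -1), 1)) * radius
}

# Made-up example points: three near Paris, three near New York
pts <- data.frame(lat = c(48.85, 48.86, 48.80, 40.71, 40.73, 40.65),
                  lon = c(2.35, 2.29, 2.40, -74.00, -73.99, -74.10))

# Full pairwise distance matrix, converted to a "dist" object for pam()
idx <- seq_len(nrow(pts))
d <- outer(idx, idx, function(i, j)
  great_circle_km(pts$lat[i], pts$lon[i], pts$lat[j], pts$lon[j]))
d <- as.dist(d)

# k-medoids: every center (medoid) is one of the original points
fit <- pam(d, k = 2, diss = TRUE)
fit$clustering      # cluster label of each point
pts[fit$id.med, ]   # coordinates of the two medoids
```

Because PAM only looks at the precomputed distances, any other dissimilarity can be used by changing how `d` is built. The trade-off is that the full pairwise matrix grows quadratically with the number of points.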

For Manhattan distance, you can also use k-medians. The median is the appropriate estimator for the L1 norm: the median minimizes the sum of absolute differences, while the mean minimizes the sum of squared differences.
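The k-medians idea can be sketched in base R. `kmedians_sketch()` is a hypothetical helper written for illustration, not a function from an existing package: it assigns each point to the nearest center by Manhattan (L1) distance and moves each center to the coordinate-wise median of its points. The first lines check the estimator claim on a toy vector; the clustering data are synthetic.

```r
# The median minimizes the sum of absolute differences; the mean does not
v <- c(1, 2, 3, 100)
sum(abs(v - median(v)))  # 100 (median is 2.5)
sum(abs(v - mean(v)))    # 147 (mean is 26.5)

# Hypothetical helper for illustration only: Lloyd-style k-medians.
# Assignment uses Manhattan (L1) distance; the update uses coordinate-wise medians.
kmedians_sketch <- function(x, k, iter_max = 100) {
  x <- as.matrix(x)
  centers <- x[sample(nrow(x), k), , drop = FALSE]  # k random data points as start
  cluster <- integer(nrow(x))
  for (iter in seq_len(iter_max)) {
    # n-by-k matrix of L1 distances from every point to every center
    d <- sapply(seq_len(k), function(j) rowSums(abs(sweep(x, 2, centers[j, ]))))
    new_cluster <- max.col(-d, ties.method = "first")  # nearest center per point
    if (all(new_cluster == cluster)) break             # assignments stable: stop
    cluster <- new_cluster
    for (j in seq_len(k)) {
      members <- x[cluster == j, , drop = FALSE]
      # Keep the old center if a cluster ends up empty
      if (nrow(members) > 0) centers[j, ] <- apply(members, 2, median)
    }
  }
  list(cluster = cluster, centers = centers)
}

# Usage on synthetic data (two made-up groups)
set.seed(7)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 6), ncol = 2))
res <- kmedians_sketch(x, k = 2)
res$centers
table(res$cluster)
```

Like k-means, this only finds a local optimum that depends on the random start, so in practice you would run it several times and keep the result with the smallest total L1 distance.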

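For your particular use case, you could also convert the latitude/longitude pairs into 3D Cartesian coordinates, then use (squared) Euclidean distance and thus standard k-means. This works reasonably because the straight-line (chord) distance between two points on a sphere grows with their great-circle distance. But your cluster centers will be somewhere underground, since the mean of points on a sphere lies inside it!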
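A sketch of the 3D approach in R, assuming degrees as input and a sphere of radius 6371 km as in the question's formula. `to_xyz()` and `to_latlon()` are hypothetical helper names. Converting a center back with `atan2()` ignores its length, which amounts to projecting the underground center radially onto the surface. The coordinates are made-up examples.

```r
# Hypothetical helper: latitude/longitude (degrees) to 3-D Cartesian coordinates (km)
to_xyz <- function(lat, lon, radius = 6371) {
  lat <- lat * pi / 180
  lon <- lon * pi / 180
  cbind(x = radius * cos(lat) * cos(lon),
        y = radius * cos(lat) * sin(lon),
        z = radius * sin(lat))
}

# Hypothetical helper: 3-D points back to latitude/longitude (degrees).
# The vector length is ignored, so points inside the sphere map to the surface point above them.
to_latlon <- function(xyz) {
  cbind(lat = atan2(xyz[, "z"], sqrt(xyz[, "x"]^2 + xyz[, "y"]^2)) * 180 / pi,
        lon = atan2(xyz[, "y"], xyz[, "x"]) * 180 / pi)
}

# Made-up example points: three near Paris, three near New York
pts <- data.frame(lat = c(48.85, 48.86, 48.80, 40.71, 40.73, 40.65),
                  lon = c(2.35, 2.29, 2.40, -74.00, -73.99, -74.10))

xyz <- to_xyz(pts$lat, pts$lon)
fit <- kmeans(xyz, centers = 2, nstart = 10)

sqrt(rowSums(fit$centers^2))  # distance of each center from Earth's center: below 6371
to_latlon(fit$centers)        # centers projected back onto the surface
```

The quantity being minimized is still the sum of squared chord lengths rather than great-circle distances, so the result can differ somewhat from clustering directly on the question's formula, especially for widely spread points.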

answered Jun 21, 2018 by Sahiti
