Cluster Analysis Steps in Business Analytics with R

Last updated on Nov 28, 2024

edureka.co

Cluster analysis is a fundamental modelling technique that groups similar observations together. The steps involved in clustering are the same for all clustering techniques.

Here are the steps for Cluster Analysis:

1. Choose the Right Variables – Identify which attributes are relevant and how much each one is worth to the analysis. Select the variables that you believe are important for identifying and understanding differences among groups of observations within the data.

2. Scale the Data – Data samples from different sources are often measured on different scales. For example, in personal data, age may range from 0 to 100, weight from 40 to 180 and height from 1 to 6 feet. When the variables in the analysis vary in range like this, the variable with the largest range will have the greatest impact on the results, so the variables should be brought onto a common scale first.

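3. Calculate Distances – Clustering groups observations according to how far apart they are, so the distances between observations are calculated next. If the variables in the analysis vary in range, the variable with the largest range will have the greatest impact on these distances.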
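The effect of scaling on distances can be illustrated with a minimal sketch. It is written in Python with only the standard library, and the sample values, the helper names `standardize` and `euclidean`, and the choice of z-scores with the population standard deviation are all assumptions made for illustration, not part of the original steps.

```python
import math
from statistics import mean, pstdev

# Hypothetical sample: (age in years, weight, height in feet).
people = [
    (25, 60, 5.5),
    (30, 150, 5.6),
    (60, 62, 5.4),
    (5, 45, 3.5),
]

def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1 (z-scores)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]  # population SD, an assumed choice
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, sds))
        for row in rows
    ]

def euclidean(a, b):
    """Straight-line distance between two observations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

scaled = standardize(people)

# Person 1 differs from person 0 mainly in weight (the widest-ranging variable);
# person 3 differs from person 0 in age, weight and height.
raw_01 = euclidean(people[0], people[1])
raw_03 = euclidean(people[0], people[3])
scaled_01 = euclidean(scaled[0], scaled[1])
scaled_03 = euclidean(scaled[0], scaled[3])

print(f"raw:    d(0,1)={raw_01:.2f}  d(0,3)={raw_03:.2f}")
print(f"scaled: d(0,1)={scaled_01:.2f}  d(0,3)={scaled_03:.2f}")
```

On the raw values, the weight difference alone makes person 1 look much farther from person 0 than person 3 is. After standardizing, every variable contributes on a comparable footing and the ordering of the two distances reverses.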

A point to note is that each attribute is measured on a different scale. If we want to combine the attributes in one equation, they must first be normalized, that is, brought onto a common scale. For example, if we analyse weather using sample data from India and the US, the scales differ because one country uses the metric system and the other uses US customary units. Our objective is therefore to bring them to the same standard. Also, the basic purpose of cluster analysis is to calculate distances.
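As a small sketch of bringing two sources to the same standard, the Python snippet below converts hypothetical US readings into metric units before combining them with hypothetical Indian readings. The choice of temperature and rainfall as the weather variables, the sample values and the function names are assumptions for illustration only.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9

def inches_to_millimetres(inches):
    """Convert a length from inches to millimetres (1 inch = 25.4 mm)."""
    return inches * 25.4

# Hypothetical readings as (temperature, rainfall).
india = [(35.0, 12.0), (28.0, 40.0)]   # degrees Celsius, millimetres
us = [(95.0, 0.5), (82.4, 1.5)]        # degrees Fahrenheit, inches

# Bring the US readings onto the metric standard used by the Indian data.
us_metric = [
    (fahrenheit_to_celsius(t), inches_to_millimetres(r)) for t, r in us
]

# Only now is it meaningful to pool the two samples for distance calculations.
combined = india + us_metric
```

Unit conversion removes differences that come purely from the measurement system. The variables may still differ in range afterwards, so scaling is usually applied as well.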

Calculation of Distance between Points in a Cluster

Here, the objective is to group similar points together into one cluster. The distance between two clusters can then be measured in several ways:

1) Take the center of one cluster and the center of the other, and calculate the distance between the two centers.

2) Or take the closest pair of points, one from each cluster, and find the distance between them.

3) Or take the farthest pair of points, one from each cluster, and find the distance between them.

Single linkage – the shortest distance between a point in one cluster and a point in the other cluster. It tends to produce elongated clusters.

Complete linkage – the longest distance between a point in one cluster and a point in the other cluster.

Average linkage – the average distance between each point in one cluster and each point in the other cluster.

Centroid – the distance between the centroids (mean vectors over the variables) of the two clusters.

Ward – combines the pair of clusters whose merger leads to the smallest increase in the within-cluster sum of squares, taken over all variables.
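The linkage definitions above can be sketched in a few lines of Python using only the standard library. The two small clusters and all function names here are hypothetical, chosen only to make the definitions concrete. Ward's criterion is shown as the increase in within-cluster sum of squares that merging the two clusters would cause.

```python
import math
from itertools import product

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(cluster):
    """Mean vector of the points in a cluster."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))

def single_linkage(c1, c2):
    """Shortest distance between a point in c1 and a point in c2."""
    return min(euclidean(a, b) for a, b in product(c1, c2))

def complete_linkage(c1, c2):
    """Longest distance between a point in c1 and a point in c2."""
    return max(euclidean(a, b) for a, b in product(c1, c2))

def average_linkage(c1, c2):
    """Average distance over every pair of points, one from each cluster."""
    return sum(euclidean(a, b) for a, b in product(c1, c2)) / (len(c1) * len(c2))

def centroid_linkage(c1, c2):
    """Distance between the two cluster centroids."""
    return euclidean(centroid(c1), centroid(c2))

def within_ss(cluster):
    """Sum of squared distances from each point to the cluster centroid."""
    center = centroid(cluster)
    return sum(euclidean(p, center) ** 2 for p in cluster)

def ward_increase(c1, c2):
    """Increase in within-cluster sum of squares if c1 and c2 were merged."""
    return within_ss(c1 + c2) - within_ss(c1) - within_ss(c2)

# Two hypothetical clusters in two dimensions.
left = [(0.0, 0.0), (0.0, 2.0)]
right = [(3.0, 0.0), (5.0, 0.0)]

for name, measure in [
    ("single", single_linkage),
    ("complete", complete_linkage),
    ("average", average_linkage),
    ("centroid", centroid_linkage),
    ("ward increase", ward_increase),
]:
    print(f"{name}: {measure(left, right):.3f}")
```

For the same pair of clusters the five measures give different values, which is why the choice of linkage can change which clusters get merged first.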

Note: These concepts may be applied to multiple techniques, and every technique offers several options to choose from. In cluster analysis, the approach that merges clusters step by step using these distance measures is called hierarchical cluster analysis, and it can be run with any of the methods above. Each method has its own advantages, disadvantages and properties.
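To show how the choice of method plugs into hierarchical cluster analysis, here is a minimal sketch of the bottom-up (agglomerative) procedure in Python. It is an illustration under stated assumptions rather than a reference implementation: the function `agglomerate`, the `LINKAGES` table and the sample points are hypothetical, only three of the linkage methods are included, and the loop simply stops at a requested number of clusters instead of building a full tree.

```python
import math
from itertools import combinations, product

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Each linkage method reduces the list of pairwise distances to one number.
LINKAGES = {
    "single": min,
    "complete": max,
    "average": lambda distances: sum(distances) / len(distances),
}

def cluster_distance(c1, c2, method):
    """Distance between two clusters under the chosen linkage method."""
    distances = [euclidean(a, b) for a, b in product(c1, c2)]
    return LINKAGES[method](distances)

def agglomerate(points, n_clusters, method="single"):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[p] for p in points]  # start with one cluster per point
    while len(clusters) > n_clusters:
        # Find the pair of clusters (i < j) with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(
                clusters[pair[0]], clusters[pair[1]], method
            ),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical, already-scaled observations.
points = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.5, 5.5), (9.0, 1.0)]
groups = agglomerate(points, n_clusters=3, method="average")

for group in groups:
    print(group)
```

Swapping `method` between `"single"`, `"complete"` and `"average"` changes only how the distance between clusters is computed. The merging loop itself stays the same.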
