Random initialization trap

When the centroids are randomly initialized, each run of k-means can end with a different WCSS (within-cluster sum of squares). A poor choice of initial centroids can lead to a suboptimal clustering.
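
WCSS adds up, over all points, the squared distance from each point to the centroid of the cluster it belongs to, and k-means tries to make this sum as small as possible. A minimal Python sketch of the calculation follows. It assumes points and centroids are tuples of numbers and `labels[i]` is the cluster index of `points[i]`. The names `wcss` and `squared_distance` are illustrative, not from any library.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def wcss(points, centroids, labels):
    """Within-cluster sum of squares: for each point, the squared distance
    to the centroid of its assigned cluster, summed over all points."""
    return sum(squared_distance(p, centroids[j]) for p, j in zip(points, labels))

# Hypothetical data: two clusters on a line, {0, 1} and {10, 11}
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
print(wcss(points, [(0.5,), (10.5,)], [0, 0, 1, 1]))  # 1.0
```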

To avoid poor initial centroids, we use k-means++, which chooses initial centroids that are spread far apart from each other.
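
The k-means++ seeding does not strictly pick the farthest points. It picks the first centroid uniformly at random, then chooses each further centroid with probability proportional to D(x)², the squared distance from point x to the nearest centroid already chosen. Distant points are therefore likely, but not certain, to become centroids. Below is a minimal pure-Python sketch. The function name `kmeans_pp_init` and the tuple-based point format are assumptions for illustration, not any library's API.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_pp_init(points, k, rng=random):
    """Choose k initial centroids with k-means++ seeding (illustrative sketch)."""
    centroids = [rng.choice(points)]  # first centroid: uniform at random
    while len(centroids) < k:
        # D(x)^2: squared distance from each point to its nearest chosen centroid
        d2 = [min(squared_distance(p, c) for c in centroids) for p in points]
        if sum(d2) == 0:
            # every point coincides with a chosen centroid; fall back to a uniform pick
            centroids.append(rng.choice(points))
            continue
        # sample the next centroid with probability proportional to D(x)^2
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids

# Usage on hypothetical data: three groups on a line
rng = random.Random(0)
points = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
print(kmeans_pp_init(points, 3, rng))
```

Points that are already centroids have D(x)² = 0, so they are never drawn again unless every point has distance zero.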

The idea is to start from well-separated centroids, so that the algorithm forms distinct clusters, tends to reach a better clustering, and converges faster.

Let's explain that with an example.

We have a dataset, shown in the scatterplot below, that we need to group into three clusters.

Depending on the random initialization of the centroids, we get clustering 1 or clustering 2, shown below.

Different clusters resulting from different initializations of the centroids

This shows that the final clusters depend on how the centroids are initialized. The circled points show how data points are grouped differently depending on the initial centroids.
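
To put numbers on this, take a small hypothetical one-dimensional dataset (not the one in the scatterplot) with points 0, 1, 10, 11, 20, 21 and k = 3. Starting from centroids 0, 10 and 20, k-means settles on the natural grouping A. Starting from centroids 0, 1 and 10, it stops at grouping B with the two right-hand groups merged:

```latex
\begin{aligned}
&\text{A: } \{0,1\},\ \{10,11\},\ \{20,21\}, && \mu = (0.5,\ 10.5,\ 20.5), && \mathrm{WCSS}_A = 6 \cdot 0.5^2 = 1.5\\
&\text{B: } \{0\},\ \{1\},\ \{10,11,20,21\}, && \mu = (0,\ 1,\ 15.5), && \mathrm{WCSS}_B = 5.5^2 + 4.5^2 + 4.5^2 + 5.5^2 = 101
\end{aligned}
```

Both groupings are stable: another assignment step leaves every point in its cluster, so k-means stops in either case, even though B's WCSS is far higher.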

k-means++ changes only how the initial centroids are chosen. After that, the standard k-means loop runs:

Step 1: pick initial centroids for the k clusters (at random in plain k-means, or with the k-means++ seeding in k-means++).

Step 2: calculate the squared distance from each data point to each centroid.

Step 3: assign each data point to the cluster whose centroid is closest.

Step 4: for each cluster, calculate the mean of the points assigned to it; these means become the new centroids.

We repeat steps 2 to 4 until the assignments stop changing or a configurable maximum number of iterations is reached.
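
Steps 2 to 4 and the repetition can be sketched in pure Python as follows. This is a minimal sketch, not any library's implementation: points are tuples, `max_iter` stands in for the configurable limit, and the function name `kmeans` is our own. The initial centroids are passed in, so the same loop can start from random picks or from k-means++ seeding.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, max_iter=100):
    """Lloyd-style k-means from the given initial centroids (illustrative sketch).
    Returns (centroids, labels, wcss)."""
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        # Steps 2-3: assign each point to the centroid with the smallest squared distance
        new_labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments did not change: converged
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of the points assigned to it
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    wcss = sum(squared_distance(p, centroids[j]) for p, j in zip(points, labels))
    return centroids, labels, wcss

# Hypothetical data: same points, two different starting centroids
points = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
print(kmeans(points, [(0.0,), (10.0,), (20.0,)])[2])  # 1.5
print(kmeans(points, [(0.0,), (1.0,), (10.0,)])[2])   # 101.0
```

Both runs converge, but the second one stops at a much higher WCSS purely because of where it started.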

Known-Space
