
KMeans

crank_ml.cluster.kmeans.KMeans

The most common way to implement batch k-means is Lloyd's algorithm. It alternates between two steps: it assigns every data point to its nearest cluster center, then moves each center to the mean of the points assigned to it. These steps repeat until the assignments stop changing.
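
As a point of reference, here is a minimal NumPy sketch of batch k-means with Lloyd's algorithm. It is not crank_ml's code: the names `lloyd_kmeans` and `_assign`, the random initialisation from data points, and the convergence check are illustrative choices.

```python
import numpy as np


def _assign(X, centers):
    # Distance from every point to every center, shape (n_points, n_clusters).
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def lloyd_kmeans(X, n_clusters, n_iter=100, seed=0):
    """Batch k-means with Lloyd's algorithm (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Initialise the centers with distinct, randomly chosen data points.
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: label every point with its nearest center.
        labels = _assign(X, centers)
        # Update step: move each center to the mean of its assigned points.
        new_centers = centers.copy()
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members) > 0:  # an empty cluster keeps its old center
                new_centers[k] = members.mean(axis=0)
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    return centers, _assign(X, centers)


X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)
centers, labels = lloyd_kmeans(X, n_clusters=2)
```

Because the initial centers are chosen at random, Lloyd's algorithm can stop in a local optimum. On these six points, for example, starting from the centers (1, 2) and (1, 4) ends with the data split by the second coordinate instead of into the groups at x = 1 and x = 10.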

This implementation instead updates the model one observation at a time. For each observation, it first finds the cluster center closest to that observation, then moves that center toward it. The halflife parameter determines how far the center moves toward the new observation.
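
The per-observation update can be sketched as follows. This sketch assumes Euclidean distance and assumes the closest center moves a fraction `halflife` of the way toward the observation, c ← c + halflife · (x − c); the text above does not give crank_ml's exact formula, and `online_kmeans_step` is a hypothetical name.

```python
import numpy as np


def online_kmeans_step(centers, x, halflife=0.5):
    """One online k-means update (sketch; the update rule is an assumption)."""
    # Squared Euclidean distance from the observation to each center.
    dists = ((centers - x) ** 2).sum(axis=1)
    closest = int(dists.argmin())
    # Move only the closest center a fraction `halflife` of the way toward x.
    centers = centers.copy()
    centers[closest] += halflife * (x - centers[closest])
    return centers, closest


centers = np.array([[0.0, 0.0], [10.0, 10.0]])
centers, k = online_kmeans_step(centers, np.array([2.0, 2.0]))
# k is 0: the center at (0, 0) moves halfway to (1, 1); (10, 10) is untouched.
```

Only one center changes per observation, which is what lets the model learn from a stream without revisiting earlier data.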

KMeans does not learn through differentiation: its cluster centers are updated analytically, so no loss function, gradients, or optimizer are involved.
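
One way to build such a model in PyTorch is to store the centers in a buffer instead of an `nn.Parameter` and update them in place under `torch.no_grad()` while the module is in training mode. `OnlineKMeansSketch` below is hypothetical: it borrows the constructor arguments from the parameter table and the train/eval pattern from the example, and it assumes the update rule c ← c + halflife · (x − c), but it is not crank_ml's implementation.

```python
import torch
from torch import nn


class OnlineKMeansSketch(nn.Module):
    """Hypothetical online k-means module (sketch, not crank_ml's code)."""

    def __init__(self, n_features, n_clusters=8, halflife=0.5):
        super().__init__()
        self.halflife = halflife
        # A buffer rather than an nn.Parameter: autograd never touches the centers.
        self.register_buffer("centers", torch.randn(n_clusters, n_features))

    @torch.no_grad()
    def forward(self, X):
        labels = torch.empty(X.shape[0], dtype=torch.long)
        for i, x in enumerate(X):
            # Index of the center with the smallest Euclidean distance to x.
            closest = int(torch.cdist(x[None], self.centers).argmin())
            labels[i] = closest
            if self.training:
                # Closed-form move toward x (assumed rule); no loss or optimizer.
                self.centers[closest] += self.halflife * (x - self.centers[closest])
        return labels


torch.manual_seed(0)
X = torch.tensor([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=torch.float32)
model = OnlineKMeansSketch(n_features=2, n_clusters=2)
before = model.centers.clone()
for _ in range(5):
    model(X)  # training mode (the nn.Module default): centers move
model.eval()
after_train = model.centers.clone()
labels = model(X)  # eval mode: assignments only, centers stay fixed
```

Processing rows one at a time keeps the update strictly online; a production implementation might vectorise or order the updates differently.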

Parameters

Parameter Description
n_features Number of input features
n_clusters (Default: 8) Maximum number of clusters to assign.
halflife (Default: 0.5) Amount by which to move the closest cluster center toward each new observation; a reasonable value is between 0 and 1.
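
Why values between 0 and 1 are reasonable can be seen by repeatedly moving a single center toward one fixed point. Assuming the update c ← c + halflife · (x − c), the remaining gap is multiplied by |1 − halflife| at every step: values between 0 and 1 move the center partway, 1 jumps straight onto the point, and values above 1 overshoot it. With halflife = 0.5 the gap halves on each update. The helper `distance_after_updates` is hypothetical.

```python
def distance_after_updates(halflife, n_updates, start=0.0, target=1.0):
    """Gap left after n updates of a 1-D center toward a fixed target.

    Assumes the (hypothetical) update rule c <- c + halflife * (target - c).
    """
    c = start
    for _ in range(n_updates):
        c += halflife * (target - c)
    return abs(target - c)


for h in (0.1, 0.5, 1.0, 1.5):
    gaps = [round(distance_after_updates(h, n), 4) for n in range(4)]
    print(f"halflife={h}: {gaps}")
# The halflife=0.5 line shows [1.0, 0.5, 0.25, 0.125]: the gap halves each update.
```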

Example

import numpy as np
import torch

from crank_ml.cluster.kmeans import KMeans

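# Six 2-D points forming two groups, at x = 1 and x = 10.
X = torch.from_numpy(np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])).float()

kmeans = KMeans(n_clusters=2, n_features=2)

# Repeated forward passes in training mode fit the cluster centers;
# no loss or optimizer is involved.
for _ in range(5):
    _ = kmeans(X)

# Switch to evaluation mode and assign each row of X to a cluster.
kmeans.eval()
labels = kmeans(X)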