Questions tagged [clustering]

Clustering is grouping (partitioning) a set of objects so that items in the same group are more similar to each other than to items in different groups, where the notion of similarity may be variously defined.

Clustering is the task of grouping (partitioning) a set of objects so that items in the same group are more similar (closer) to each other than to items in different groups. Often the notion of similarity is expressed as a distance measure, with greater distance conveying less similarity. The study of clustering algorithms (cluster analysis) originated in the social sciences but has become important in statistical data analysis (data mining) and in machine learning.

Examples of such algorithms include $K$-means and self-organizing maps.

328 questions
1
vote
1 answer

Kernel k-means formula notation

The kernel k-means problem is notated as: $ \min_{U \in \{0,1\}^{k \times s}} \max_{c_k(\cdot) \in \mathcal{H}_{K}} \sum_{j=1}^{k}\sum_{i=1}^{S} \frac{U_{ji}}{s} \left\| c_j(\cdot)-\kappa(x_i, \cdot) \right\|^2_{\mathcal{H}_K} $ U is the…
1
vote
1 answer

Is the Laplacian L derived from the affinity matrix A in spectral clustering always block diagonal?

I am currently working through the theory of spectral clustering, and most of the 'tutorials' explain the Laplacian L as being block diagonal, where the blocks are related to possible clusters. It seems to me this heavily depends on the ordering of…
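The ordering concern raised in this excerpt can be illustrated with a small sketch. It assumes the unnormalized Laplacian $L = D - A$ and a toy affinity matrix with two disconnected groups whose members are interleaved; the helper names (`laplacian`, `permute`, `is_block_diagonal`) are hypothetical and chosen only for this illustration.

```python
def laplacian(A):
    """Unnormalized graph Laplacian L = D - A for a square affinity matrix A."""
    n = len(A)
    return [[(sum(A[i]) if i == j else 0) - A[i][j] for j in range(n)]
            for i in range(n)]


def permute(M, order):
    """Reorder rows and columns of M with the same index order."""
    return [[M[i][j] for j in order] for i in order]


def is_block_diagonal(M, sizes):
    """True if every entry outside the consecutive diagonal blocks is zero."""
    block_of = [b for b, size in enumerate(sizes) for _ in range(size)]
    n = len(M)
    return all(M[i][j] == 0
               for i in range(n) for j in range(n)
               if block_of[i] != block_of[j])


# Toy data (an assumption for illustration): nodes 0 and 2 are connected,
# nodes 1 and 3 are connected, and there are no other edges.
A = [[0, 0, 1, 0],
     [0, 0, 0, 1],
     [1, 0, 0, 0],
     [0, 1, 0, 0]]

L = laplacian(A)
L_sorted = permute(L, [0, 2, 1, 3])  # list each group's nodes consecutively

print(is_block_diagonal(L, [2, 2]))         # ordering as given
print(is_block_diagonal(L_sorted, [2, 2]))  # ordering grouped by component
```

In this toy case the Laplacian is block diagonal only after the nodes are reordered so that each connected component is contiguous, which suggests that the block structure is a property of a suitable ordering rather than of every ordering.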
1
vote
0 answers

Identify outliers in a set of elements

I have a set of elements that has been partitioned into clusters based on several criteria, one of which is the length of the elements. To be precise, element $x$ cannot belong to cluster A if…
0
votes
1 answer

In Cluster Analysis, how do we calculate Purity?

In cluster analysis, how do we calculate purity? What's the equation? I'm not looking for code to do it for me. Let $\omega_k$ be cluster k, and $c_j$ be class j. So is purity practically accuracy? It looks like we're summing the amount of truly…
Iancovici
  • 63
  • 6
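For reference, the commonly used definition of purity, in the excerpt's notation with $N$ the total number of items, is $\text{purity} = \frac{1}{N} \sum_k \max_j |\omega_k \cap c_j|$. The sketch below assumes that definition (the full question is truncated, so it may use a variant); the function name `purity` and the toy labels are assumptions made for illustration.

```python
from collections import Counter


def purity(cluster_labels, class_labels):
    """Fraction of items that fall in the majority class of their cluster."""
    if len(cluster_labels) != len(class_labels):
        raise ValueError("label sequences must have the same length")
    if not cluster_labels:
        raise ValueError("at least one item is required")

    # Count how often each class occurs inside each cluster.
    per_cluster = {}
    for k, c in zip(cluster_labels, class_labels):
        per_cluster.setdefault(k, Counter())[c] += 1

    # For each cluster, keep only the size of its largest class.
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(cluster_labels)


# Toy example: two clusters of three items each.
clusters = [0, 0, 0, 1, 1, 1]
classes = ["a", "a", "b", "b", "b", "a"]
print(purity(clusters, classes))
```

Under this definition, purity equals the accuracy obtained when every cluster is assigned the label of its majority class, which is one way to read the "is purity practically accuracy?" question.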
0
votes
0 answers

Standard metric for the distance between two clusters

Let $A=\{A_1,A_2,\cdots,A_m\}$ and $B=\{B_1,B_2,\cdots,B_n\}$ be two sets of points in $k$-dimensional Euclidean space. Each point $A_i$ or $B_i$ can be thought of as a feature vector of a data sample. I want to know if two distribution of $A$ and…
govindah
  • 107
0
votes
0 answers

Rank clusters by comparing their importance

I would like to rank clusters according to their importance. From 113 demand time series data I extract the following 6 features for each time series. "Average Demand", "n95 demand", "max's deviation from n95", "Zero demand counts", "kurtosis",…
0
votes
1 answer

How do you find the mean proximity of two clusters using the Manhattan distance or the Euclidean distance?

Question Solution I don't understand how the mean proximity is calculated here. It says to take the average of the $x$ components and then add it to the average of the $y$ components of these $16$ distances. From my understanding I thought it was…
0
votes
0 answers

How does the ball tree classification algorithm work?

Good afternoon! My question is very simple. I searched the web without finding an answer. I want to understand how the ball tree algorithm works (I would prefer a step-by-step explanation with a simple example). I don't know if my question meets the…
Tou Mou
  • 131
0
votes
0 answers

Can I use a clustering algorithm or some mechanism to classify parts?

I am new to the site; if I am posting on the wrong site, I apologize, and please point me to where I can post this. I work in the automotive industry, specifically in an area that deals with packaging, logistics, and the most efficient ways…
JonH
  • 101
  • 3
0
votes
1 answer

Selecting a cluster based on minimum average distance

I have a symmetric matrix of non-Euclidean distances of size $N$ (say, 500) and I would like to select one cluster of a fixed size $K$ (say, 25), so that it has the smallest average distance within this cluster. What is a good algorithm for doing…
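One simple way to approach the fixed-size selection described in this excerpt is a greedy heuristic: start from each point in turn, repeatedly add the point with the smallest total distance to the points already chosen, and keep the best result. The sketch below is only that heuristic, with no optimality guarantee; the names `greedy_subset` and `average_within_distance` and the toy distance matrix are assumptions made for illustration.

```python
from itertools import combinations


def average_within_distance(D, members):
    """Mean of D[i][j] over all unordered pairs of members."""
    pairs = list(combinations(members, 2))
    return sum(D[i][j] for i, j in pairs) / len(pairs)


def greedy_subset(D, K):
    """Heuristically pick K indices with a small average pairwise distance."""
    n = len(D)
    if not 2 <= K <= n:
        raise ValueError("K must satisfy 2 <= K <= number of points")

    best, best_avg = None, float("inf")
    for seed in range(n):
        chosen = [seed]
        in_set = [False] * n
        in_set[seed] = True
        # to_chosen[j] = sum of distances from point j to all chosen points
        to_chosen = list(D[seed])
        while len(chosen) < K:
            nxt = min((j for j in range(n) if not in_set[j]),
                      key=lambda j: to_chosen[j])
            chosen.append(nxt)
            in_set[nxt] = True
            for j in range(n):
                to_chosen[j] += D[nxt][j]
        avg = average_within_distance(D, chosen)
        if avg < best_avg:
            best, best_avg = sorted(chosen), avg
    return best, best_avg


# Toy symmetric distance matrix: points at positions 0, 1, 2, 10, 11 on a line.
positions = [0, 1, 2, 10, 11]
D = [[abs(p - q) for q in positions] for p in positions]
subset, avg = greedy_subset(D, 3)
print(subset, avg)
```

The function only needs a symmetric matrix of pairwise distances, so it does not rely on the distances being Euclidean. With $N = 500$ and $K = 25$ the nested loops stay in the range of a few million operations.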
0
votes
1 answer

What's the common way to utilize distance functions for clustering?

What's the common way to utilize distance functions for clustering? For example, does one set thresholds on the distances and group based on them?
mavavilj
  • 7,270
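The thresholding idea mentioned in this question can be sketched as follows: link any two points whose distance is at most a chosen threshold, then take the connected groups as clusters. This is one approach among several, and it gives the same groups as cutting a single-linkage hierarchy at that threshold. The function name `threshold_clusters`, the sample points, and the threshold value are assumptions made for illustration.

```python
import math


def threshold_clusters(points, dist, threshold):
    """Group point indices so that linked points (dist <= threshold) share a group."""
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy data with Euclidean distance (math.dist requires Python 3.8+).
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
clusters = threshold_clusters(points, math.dist, 1.5)
print(clusters)
```

Any distance function can be passed as `dist`. Other methods, such as $K$-means, use distances differently: they assign each point to its nearest center instead of applying a fixed threshold.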
0
votes
1 answer

Getting started with biclustering

I have been doing some casual internet research on biclusters. (I have read the Wiki article several times.) So far, it seems as if there are few definitions or standard terminology. I was wondering if there were any standard papers or books that…
Henry B.
  • 2,028
0
votes
0 answers

Clustering with cluster size/mass constraint

I'm rather new to this field and may be using somewhat naive terminology. I need to classify (cluster) some objects while limiting the size of each cluster, so that for each cluster $C_i$, $$\sum_{o \in C_i} m(o) \leq N$$ where $m(\cdot)$ is some…
saabeilin
  • 126
0
votes
0 answers

Shortest distance of a location to X number of locations

Does anyone have advice on this problem? "Shortest distance of a location to X number of locations" First, let's assume X=3 (3 addresses). We know the following: distance in miles or km of: A1 to A1, A1 to A2, A1 to A3 | A2 to A1, A2 to A2, A2 to A3 | A3…
0
votes
1 answer

A question about stability of clustering

I'm reading a paper about interactive clustering, and I'm stuck on a definition of the stability property of a clustering (based on this paper): What I understand is that $A$ and $A$ are samples of the data, and a clustering algorithm $C$ is…
azer89
  • 125