Questions tagged [data-mining]

This tag is for questions about data mining, which is an interdisciplinary subfield of computer science. It is the computational process of discovering patterns in large data sets.

Finding patterns within massive amounts of unexplored data requires the use of sophisticated linear algebra and presents a unique challenge.

Some relevant topics used in data mining are linear discriminant analysis, principal component analysis, and support vector machines.

140 questions
5
votes
1 answer

Does the Gini index consider only a binary split for each attribute, or can it have multi-way splitting?

Does Gini-index-based classification always split the values of an attribute into two branches (a binary split), or can it split into more than $2$ branches (a multi-way split)? For more clarification, if a split on $A$ partitions $D$ into $D_1$ and $D_2$, the…
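As a rough illustration of the quantity being asked about, here is a minimal Python sketch. It assumes the usual weighted form $\mathrm{Gini}_A(D)=\sum_j \frac{|D_j|}{|D|}\,\mathrm{Gini}(D_j)$; the function names and the toy labels are hypothetical. The formula itself accepts any number of partitions, while CART by convention restricts itself to binary splits.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(partitions):
    """Weighted Gini impurity of a split of D into parts D_1, ..., D_m (any m >= 1)."""
    total = sum(len(p) for p in partitions)
    return sum(len(p) / total * gini(p) for p in partitions)

# Hypothetical toy data: the same six labels split two ways and three ways.
binary = [["yes", "yes", "no"], ["no", "no", "yes"]]
three_way = [["yes", "yes"], ["no", "no"], ["yes", "no"]]

print(gini_split(binary))     # weighted impurity of the binary split
print(gini_split(three_way))  # weighted impurity of the three-way split
```

Nothing in `gini_split` depends on the number of parts, so the binary restriction is a property of the tree-building algorithm rather than of the impurity measure.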
3
votes
0 answers

Gini coefficient vs Gini impurity - decision trees

This question concerns building decision trees. According to Wikipedia, 'Gini coefficient' should not be confused with 'Gini impurity'. However, both measures can be used when building a decision tree; they can support our choices when splitting the…
brunner
  • 31
1
vote
0 answers

Derivation of the Chebyshev distance from the Minkowski distance

I know that the Minkowski distance is defined as follows: $$d(x,y)=\lim_{r\to\infty}\left(\sum_{k=1}^n|x_k-y_k|^r\right)^{1/r}$$ and I also read that the Chebyshev distance could be considered as a form of the Minkowski distance. I wanted to get a…
Lila
  • 479
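A small numerical sketch of the limit in this question, in Python. It evaluates the order-$r$ expression inside the limit for growing $r$ and compares it with the largest coordinate difference; the function names and sample points are hypothetical. The convergence follows from the bound $\max_k|x_k-y_k| \le \left(\sum_{k=1}^n|x_k-y_k|^r\right)^{1/r} \le n^{1/r}\max_k|x_k-y_k|$, since $n^{1/r}\to 1$.

```python
def minkowski(x, y, r):
    """Order-r Minkowski distance (the expression inside the limit), r >= 1."""
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)

def chebyshev(x, y):
    """Chebyshev distance: the largest coordinate-wise absolute difference."""
    return max(abs(a - b) for a, b in zip(x, y))

# Hypothetical sample points; coordinate differences are 3, 4 and 0.5.
x = (1.0, 5.0, 2.0)
y = (4.0, 1.0, 2.5)

for r in (1, 2, 8, 32, 128):
    print(r, minkowski(x, y, r))   # decreases towards the maximum difference
print("max", chebyshev(x, y))
```

Very large `r` can overflow floating point for large coordinate differences, so this is only meant as a sanity check of the limit, not as a way to compute the Chebyshev distance.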
0
votes
1 answer

What's matrix $W$ in nonlinear PCA?

Nonlinear PCA is based on minimizing, with respect to the matrix $W$, the function: $$I = E \{ \|x-Wg(W^Tx)\|^2\}$$ where $g$ is an odd function. However, what is $W$?
mavavilj
  • 7,270
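To make the objective concrete, here is a minimal NumPy sketch of a sample estimate of $I$. It rests on assumptions not stated in the excerpt: $W$ is taken to be a $d\times k$ weight matrix whose columns are $k$ directions in the data space, $g$ is applied elementwise, and `tanh` stands in for the odd function. The function name and the random data are hypothetical.

```python
import numpy as np

def nonlinear_pca_loss(X, W, g=np.tanh):
    """Sample estimate of I = E{ ||x - W g(W^T x)||^2 }.

    X has shape (n_samples, d); W has shape (d, k) (an assumption).
    """
    Y = g(X @ W)        # each row is g(W^T x) for one sample
    R = X - Y @ W.T     # each row is the residual x - W g(W^T x)
    return float(np.mean(np.sum(R ** 2, axis=1)))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                  # hypothetical data, d = 3
W = np.linalg.qr(rng.normal(size=(3, 2)))[0]   # a 3 x 2 matrix with orthonormal columns

print(nonlinear_pca_loss(X, W))
```

Under this reading, choosing $g$ as the identity turns the expression into the linear reconstruction error $E\{\|x-WW^Tx\|^2\}$.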
0
votes
1 answer

What are correlation coefficients used for in PCA?

What are correlation coefficients used for in PCA? One can discover them through the PCA formulation, but what are they useful for?
mavavilj
  • 7,270
0
votes
1 answer

Why approximate pairwise distances only over the lower triangle of the distance matrix?

In the context of dimensionality reduction, why approximate pairwise distances only over the lower triangle of the distance matrix? $$\min_{\{\hat{x_i}\}} I = \sum_{i…
mavavilj
  • 7,270
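A short NumPy check of the symmetry behind this question. The formula in the excerpt is cut off, so the squared-difference stress used below is an assumption, as are the function name and the random data. Because a distance matrix is symmetric with a zero diagonal, summing over all ordered pairs gives exactly twice the sum over the strict lower triangle, so both sums have the same minimizers.

```python
import numpy as np

def pairwise_dist(P):
    """Euclidean distance matrix of the rows of P."""
    diff = P[:, None, :] - P[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5))    # hypothetical original points
Xh = rng.normal(size=(6, 2))   # hypothetical low-dimensional embedding

# Assumed stress terms: squared differences between corresponding distances.
E = (pairwise_dist(X) - pairwise_dist(Xh)) ** 2

full = E.sum()                   # all ordered pairs (i, j)
lower = np.tril(E, k=-1).sum()   # strict lower triangle, i > j

print(full, 2 * lower)
```

Restricting the sum to one triangle therefore halves the work without changing the optimization problem.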
0
votes
1 answer

How to interpret maximizing "separability and reciprocal of scattering" in Fisher's LDA?

How to interpret maximizing "separability and reciprocal of scattering" in Fisher's LDA? That is, suppose $s_1$ is the scattering inside (projected) class 1, which is to be minimized, and $s_2$ is the same for class 2. Then in the so-called "joint criterion" one wants to…
mavavilj
  • 7,270
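One way to see the "separability times reciprocal of scattering" idea numerically is the sketch below. It assumes the standard Fisher criterion $J(w)=\dfrac{(m_1-m_2)^2}{s_1+s_2}$, where $m_i$ is the mean of projected class $i$ and $s_i$ is its scatter; the excerpt is truncated, so this form, the function name, and the synthetic data are assumptions.

```python
import numpy as np

def fisher_criterion(X1, X2, w):
    """J(w) = (m1 - m2)^2 / (s1 + s2) for the data projected onto w."""
    p1, p2 = X1 @ w, X2 @ w
    separability = (p1.mean() - p2.mean()) ** 2
    s1 = ((p1 - p1.mean()) ** 2).sum()   # scatter inside projected class 1
    s2 = ((p2 - p2.mean()) ** 2).sum()   # scatter inside projected class 2
    return separability / (s1 + s2)

rng = np.random.default_rng(0)
# Hypothetical classes: wide along the first axis, narrow along the second.
X1 = rng.normal(loc=[0.0, 0.0], scale=[1.0, 0.3], size=(100, 2))
X2 = rng.normal(loc=[2.0, 1.0], scale=[1.0, 0.3], size=(100, 2))

# Fisher direction: solve (S1 + S2) w = m1 - m2 with within-class scatter matrices.
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w_star = np.linalg.solve(Sw, m1 - m2)

print(fisher_criterion(X1, X2, w_star))
print(fisher_criterion(X1, X2, np.array([1.0, 0.0])))
print(fisher_criterion(X1, X2, np.array([0.0, 1.0])))
```

The ratio grows when the projected means move apart or when either class becomes tighter after projection, which is the trade-off the quoted phrase describes.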
0
votes
0 answers

Extrapolation of measurement data

I would like to extrapolate the results of a measurement. The following picture shows the measurement results. The best I can do is to calculate the average of 10 measurements and use it for approximation, but the error is too large. In…
0
votes
2 answers

Factor Analysis in Text Mining task

I was faced with the task of determining the topics of a large text corpus. For example, you have 1 million arbitrary text phrases or sentences. I want to factor out the main topic from this corpus. Ordinary factor analysis works with continuous data. Is…
Julia
  • 21