Questions tagged [machine-learning]

How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?

From The Discipline of Machine Learning by Tom Mitchell:

The field of Machine Learning seeks to answer the question "How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?" This question covers a broad range of learning tasks, such as how to design autonomous mobile robots that learn to navigate from their own experience, how to data mine historical medical records to learn which future patients will respond best to which treatments, and how to build search engines that automatically customize to their user's interests. To be more precise, we say that a machine learns with respect to a particular task T, performance metric P, and type of experience E, if the system reliably improves its performance P at task T, following experience E. Depending on how we specify T, P, and E, the learning task might also be called by names such as data mining, autonomous discovery, database updating, programming by example, etc.

3322 questions
1
vote
0 answers

Nearest neighbor and Bayes error rate

I am not able to solve this and obtain the bound of at most 1.6 times the Bayes error rate. Can someone help me out? $3$-Nearest Neighbor Analysis: Show that the asymptotic error rate of the $3$-NN classifier is at most 1.6 times that of the Bayes optimal classifier. My try:
1
vote
0 answers

Why does a "True Error" of $0$, i.e. $L_{(\mathcal{D}, f)}(h^{*}) = 0$, imply a "Training Error" of $0$, i.e. $L_{S}(h^{*}) = 0$?

I have read this question and I am confused by a part of the first answer, even though the same point is raised in the comments. I don't understand why $$L_{(\mathcal{D}, f)}(h^{*}) = 0 \implies L_{S}(h^{*}) = 0$$ Why, if the "True Error" is equal to $0$, i.e.…
1
vote
2 answers

What is the difference between training the model and fitting the model?

In this book - https://www.oreilly.com/library/view/machine-learning-with/9781491989371/ - I came across a distinction between these two terms, as follows: Train - Applying a learning algorithm to data using numerical approaches like gradient descent. Fit…
Retko
  • 141
1
vote
0 answers

How to push back a stochastic term in a computational graph?

Assume the following (variational autoencoder) model. $$\begin{align} h_i=&\;g_{\lambda}(x_i)\\ z_i \sim &\; N(h_i,I_L)\\ \tilde x_i =&\; f_\psi(z_i)\\ \mathcal{L}=&\;||x_i - \tilde x_i||_2^2 \end{align}$$ If we wanted to optimize the parameters of…
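One common way to handle the sampling step in a model like this is the reparameterization trick: write $z_i = h_i + \epsilon_i$ with $\epsilon_i \sim N(0, I_L)$, so the noise becomes an input to the graph and the gradient passes through $z_i$ to $h_i$ unchanged. The sketch below is only an illustration under strong assumptions: it uses the scalar case $L = 1$, and the linear maps standing in for $g_\lambda$ and $f_\psi$ (and the function name `loss_and_grads`) are hypothetical, not taken from the question.

```python
import random

def loss_and_grads(x, lam, psi, eps):
    """Squared-error loss and its gradients w.r.t. lam and psi for a fixed noise draw."""
    h = lam * x              # h = g_lambda(x); toy linear encoder (assumption)
    z = h + eps              # reparameterized sample, distributed as N(h, 1)
    x_tilde = psi * z        # x~ = f_psi(z); toy linear decoder (assumption)
    residual = x - x_tilde
    loss = residual ** 2

    dloss_dxt = -2.0 * residual        # dL/dx~
    grad_psi = dloss_dxt * z           # dx~/dpsi = z
    grad_lam = dloss_dxt * psi * x     # dx~/dz = psi, dz/dh = 1, dh/dlam = x
    return loss, grad_lam, grad_psi

rng = random.Random(0)
eps = rng.gauss(0.0, 1.0)    # noise is sampled outside the differentiable path
loss, g_lam, g_psi = loss_and_grads(x=1.5, lam=0.8, psi=1.2, eps=eps)
print(loss, g_lam, g_psi)
```

Because `eps` is held fixed, the loss is an ordinary deterministic function of `lam` and `psi`, which is what allows standard backpropagation to be applied.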
1
vote
1 answer

How to algorithmically improve the metric for k-nearest neighbors classification

Let's say I have a dataset with $n$ rows. The $i$th row ($x^i$) has entries $x^i_j$. For each $x^i$ I have an integer label, $p^i$. New data comes along, with rows $y^i$, and I want to predict the corresponding labels $q^i$. To predict the label for…
zabop
  • 1,011
1
vote
1 answer

What is the correct formula for the penalty term in an elastic net regression?

I have a question concerning the penalty term in an elastic net regression. In The Elements of Statistical Learning by Hastie, Tibshirani & Friedman, formula (3.54) on p. 73 gives the penalty term as: $$ \lambda \cdot…
1
vote
1 answer

Random Forest Bias in Permutation Importance

I just read on several blogs something of the form: Variable importance using permutation will lead to a bias if the variables exhibit correlation. It is for instance stated by…
Mathe
  • 129
1
vote
0 answers

Understanding the backpropagation algorithm

I am currently trying to implement backpropagation as described in the Wikipedia article. It defines the gradient of the weights in layer $l$ as: $$\delta^l (a^{l-1})^T$$ where $a^{l}$ is the output of layer $l$. The article says: Note that…
Luca9984
  • 59
  • 6
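For reference, the expression $\delta^l (a^{l-1})^T$ is an outer product, so the gradient has one row per unit in layer $l$ and one column per unit in layer $l-1$. A minimal sketch in plain Python follows; the function name `weight_gradient` and the numbers are made up for illustration and are not from the Wikipedia article.

```python
def weight_gradient(delta, a_prev):
    """Outer product delta^l (a^{l-1})^T as a nested list.

    delta  : error terms of layer l, one per unit in layer l
    a_prev : outputs a^{l-1} of the previous layer
    """
    return [[d * a for a in a_prev] for d in delta]

delta = [0.5, -1.0]          # layer l has 2 units (illustrative values)
a_prev = [1.0, 2.0, 3.0]     # layer l-1 has 3 units (illustrative values)
grad = weight_gradient(delta, a_prev)
print(grad)                  # 2 rows, 3 columns
```

Entry `grad[j][k]` is the derivative of the cost with respect to the weight connecting unit `k` of layer $l-1$ to unit `j` of layer $l$, under the convention that the weight matrix multiplies $a^{l-1}$ from the left.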
1
vote
1 answer

The meaning of the realizable case in ML

While studying Chapter 3 of 'Foundations of Machine Learning', I came across the 'realizable case' and the 'non-realizable case', which had not been mentioned before. Could someone explain what these terms mean? Thanks for your time.
1
vote
0 answers

TD(0) evaluation of terminal states

I'm going through Sutton and Barto's Reinforcement Learning: An Introduction and am currently reading about Temporal-Difference Learning TD(0) methods. In the textbook they use a random walk as a toy example, which works like this: You have a chain of 5…
1
vote
1 answer

Reworking an equation to overcome arithmetic overflow

I am implementing the Maximum Entropy Markov Model (MEMM) algorithm by following Collins' notes: http://www.cs.columbia.edu/~mcollins/loglinear.pdf The problematic term (see p. 18): $\ln\sum_y\exp(\vec v \cdot f(x^{(i)},y))$ (Although not explicitly…
Howie
  • 111
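A standard way to avoid overflow in a term like $\ln\sum_y\exp(s_y)$ is to subtract the largest score before exponentiating, using $\ln\sum_y\exp(s_y) = m + \ln\sum_y\exp(s_y - m)$ with $m = \max_y s_y$. The sketch below assumes the scores $\vec v \cdot f(x^{(i)},y)$ have already been computed into a plain list; the name `log_sum_exp` is just illustrative and is not from Collins' notes.

```python
import math

def log_sum_exp(scores):
    """Return ln(sum(exp(s) for s in scores)) without overflowing."""
    m = max(scores)
    if math.isinf(m):
        # All scores are -inf (or one is +inf); the shift would give nan.
        return m
    # Every exponent is <= 0 after the shift, so exp() cannot overflow.
    return m + math.log(sum(math.exp(s - m) for s in scores))

scores = [1000.0, 1001.0, 1002.0]   # math.exp(1000.0) alone would overflow
print(log_sum_exp(scores))
```

The largest term contributes exactly `exp(0) = 1` to the sum, so the argument of the logarithm is always at least 1 and the result stays finite for finite scores.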
1
vote
0 answers

Understanding WGAN: slope of $f$ and $g$

To understand WGAN better (and hopefully the maths in it), I followed this blog. While this is a great blog, I still couldn’t understand Fig. 5 of it. The paragraph below it states: “....If we see the values $f(\xi)$ as connected with line segments, this means…
user1953366
  • 191
  • 7
1
vote
0 answers

ResNet downsampling

I'm currently studying ResNet and I have a question about downsampling. In the paper, it is written that (When the dimensions increase, the shortcut will perform identity mapping with extra zero entries padded in increasing dimension, or, use 1x1…
1
vote
0 answers

Difference between well-known books in machine learning

Hi All: I don't know much about the deep-learning field but I was looking around in case I ever wanted to try to look into it more deeply (no pun intended). My question is, with regard to neural networks/deep learning/reinforcement learning, there…
mark leeds
  • 1,514
1
vote
0 answers

How to investigate the behavior of C in Soft Margin Linear Support Vector Machine?

I know what C is and what a Soft Margin Linear SVM is, but I couldn't find any way to solve the above 3 questions. Can you please explain and give the proper answer? I think I should use the decision boundary equation W.x + b = 0. But, can't what…