Questions tagged [machine-learning]

From The Discipline of Machine Learning by Tom Mitchell:

The field of Machine Learning seeks to answer the question "How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?" This question covers a broad range of learning tasks, such as how to design autonomous mobile robots that learn to navigate from their own experience, how to data mine historical medical records to learn which future patients will respond best to which treatments, and how to build search engines that automatically customize to their user's interests. To be more precise, we say that a machine learns with respect to a particular task T, performance metric P, and type of experience E, if the system reliably improves its performance P at task T, following experience E. Depending on how we specify T, P, and E, the learning task might also be called by names such as data mining, autonomous discovery, database updating, programming by example, etc.

3322 questions
4
votes
2 answers

Perceptron find weight exercise

I am having difficulty with the following exercise. There are three different diagrams. If possible, find the perceptron weights $w_0$, $w_1$, and $w_2$ for each of them (the decision surface is clearly divided into two regions, one "positive" the…
user16168
  • 697
4
votes
2 answers

Compressing the Mandelbrot set

This question may not have a definitive answer. However, if someone is able to illuminate the topic for me, I would be very grateful. The Mandelbrot set is the set obtained from the quadratic recurrence…
4
votes
2 answers

GAN Nash equilibrium

I'm reading Ian Goodfellow’s article about Generative Adversarial Networks (https://arxiv.org/pdf/1701.00160.pdf) and, on page 22, I found a sentence that I don’t understand. It’s about the GAN convergence evaluated with the Nash game…
4
votes
1 answer

Cross-Entropy loss in Reinforcement Learning

In the context of supervised learning for classification using neural networks, when we are evaluating the performance of an algorithm we can use the cross-entropy loss, given by: $$ L = -\sum_{i=1}^n \log(\pi (f(x_i))_{y_i}) $$ where $x_i$ is a vector…
Michael Murray
  • 133
  • 1
  • 9
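
For readers skimming this entry, the cross-entropy loss quoted above can be sketched in plain Python. This is only an illustration under assumptions: the excerpt is truncated before it defines $\pi$, so the sketch assumes $\pi$ is a softmax over the network outputs $f(x_i)$, and the function names `softmax` and `cross_entropy` are hypothetical.

```python
import math

def softmax(scores):
    # Assumed meaning of pi: turn raw scores into class probabilities.
    # Subtracting the max keeps exp() numerically stable.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(outputs, labels):
    # outputs[i] plays the role of f(x_i); labels[i] is the class index y_i.
    # L = -sum_i log(pi(f(x_i))_{y_i})
    return -sum(math.log(softmax(f_x)[y]) for f_x, y in zip(outputs, labels))
```

With two equal scores, the softmax assigns probability 1/2 to each class, so a single example contributes $\log 2$ to the loss.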
4
votes
2 answers

For a PAC learnable hypothesis class, show that its sample complexity $m_{\mathcal{H}}$ is monotonically non-increasing in each of its parameters

Not sure if this is the right place to post this; if it isn't, I'd be grateful if someone could direct me to where best to post it. I'm independently taking the course Introduction to Machine language (as in, doing it by myself) using the book:…
3
votes
2 answers

VC dimension of perpendicular lines classifier

I was learning about VC dimension, and I saw an example in "Introduction to Machine Learning" showing that the VC dimension of a rectangle is 4. I'm just curious about the VC dimension of two perpendicular lines. I tried to shatter some points but I'm not…
3
votes
1 answer

Equation (3.89) seems wrong in Bishop's Pattern Recognition and Machine Learning book

In Bishop's Pattern Recognition and Machine Learning book, I seem to have found a serious mistake in a math equation; serious because all subsequent arguments rely on it. It is eq. (3.89) on page 168: $$ 0 = \frac{M}{2\alpha}…
Royalblue
  • 155
  • 5
3
votes
1 answer

Normalized distance from origin to discriminant function for linear classifiers

I'm currently studying machine learning with the book Pattern Recognition and Machine Learning (Bishop, 2006) and had a question regarding finding the distance between the origin and a linear discriminant function. For anyone curious, this is from…
Sean
  • 1,487
3
votes
0 answers

cross entropy for binary or multiclass classification

I'm building a NN classifier to predict whether a sample is of class 1 or 0. I'm trying three different network configurations: (1) one unit in the output layer with a sigmoid activation function; (2) two units in the output layer with a sigmoid activation function; (3) two…
cylon86
  • 131
  • 3
3
votes
1 answer

How does one code the generative adversarial network loss function?

I was reading Ian Goodfellow's paper on GANs and I read that the loss function for GANs is: $J^{(G)} = -J^{(D)} = \frac{1}{2} \mathbb{E}_{x \sim p_{\rm data}}\Big[ \log D(x)\Big] + \frac{1}{2} \mathbb{E}_{z} \Big[\log (1-D(G(z)))\Big]$ I saw a few…
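
As a rough illustration of the zero-sum cost quoted in this entry, the expectations can be replaced by averages over a minibatch of discriminator outputs. This is a minimal sketch, not the paper's implementation: the function names are hypothetical, `d_real` is assumed to hold values $D(x)$ on real samples, `d_fake` is assumed to hold values $D(G(z))$ on generated samples, and all values are assumed to lie strictly between 0 and 1.

```python
import math

def discriminator_cost(d_real, d_fake):
    # Monte Carlo estimate of
    # J_D = -1/2 E[log D(x)] - 1/2 E[log(1 - D(G(z)))]
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -0.5 * real_term - 0.5 * fake_term

def generator_cost(d_real, d_fake):
    # Zero-sum (minimax) game: the generator cost is the negated
    # discriminator cost.
    return -discriminator_cost(d_real, d_fake)
```

When the discriminator outputs 0.5 everywhere, the discriminator cost evaluates to $\log 2$ and the generator cost to $-\log 2$.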
3
votes
1 answer

Notation in the derivative of the hinge loss function

The hinge loss function (summed over $m$ examples): $$ l(w)= \sum_{i=1}^{m} \max\{0 ,1-y_i(w^{\top} \cdot x_i)\} $$ My calculation of the subgradient for a single component and example is: $$ l(z) = \max\{0, 1 - yz\} $$ $$ l^{\prime}(z) = \max\{0, -…
jds
  • 2,274
  • 3
  • 24
  • 35
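
The hinge loss in this entry, and one valid subgradient with respect to $w$, can be sketched in plain Python. The helper names are hypothetical. At the kink, where $y_i\, w^{\top} x_i = 1$, any value between the two one-sided derivatives is a valid subgradient; this sketch simply picks zero there.

```python
def dot(u, v):
    # Inner product of two equal-length vectors.
    return sum(ui * vi for ui, vi in zip(u, v))

def hinge_loss(w, xs, ys):
    # l(w) = sum_i max(0, 1 - y_i * w^T x_i)
    return sum(max(0.0, 1.0 - y * dot(w, x)) for x, y in zip(xs, ys))

def hinge_subgradient(w, xs, ys):
    # Each example with margin y_i * w^T x_i < 1 contributes -y_i * x_i;
    # examples with margin >= 1 contribute the zero vector.
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        if y * dot(w, x) < 1.0:
            for j, xj in enumerate(x):
                grad[j] -= y * xj
    return grad
```

For example, with `w = [0.0, 0.0]`, both examples in `xs = [[1.0, 2.0], [3.0, 4.0]]`, `ys = [1, -1]` violate the margin, so each adds 1 to the loss and $-y_i x_i$ to the subgradient.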
3
votes
0 answers

Michael Nielsen's book “Neural Networks and Deep Learning” Cauchy-Schwarz Inequality Proof

In the free online book, the following is stated: if $C$ is a cost function which depends on $v_1, v_2, \ldots, v_n$, he states that we make a move in the $\Delta v$ direction to decrease $C$ as much as possible, and that this is equivalent to minimizing $\Delta C \approx \nabla C \cdot \Delta v$. So if…
par
  • 131
3
votes
4 answers

VC-Dimension of Real Linear Classifier Proof

Does anyone know of, or have a link to, a proof of why the VC-Dimension of Linear Classifiers in $\mathbb{R}^n$ is $n+1$? That is, the set of $h_a : \mathbb{R}^n \rightarrow \{-1,1\}, h_a(b) = \operatorname{sgn}(a \cdot b + k)$ where $a,b \in \mathbb{R}^n, k \in…
3
votes
3 answers

How does a kernel method work

Hi all, I have been learning about kernel methods for a long time, but I am still not very sure how they work. In my opinion, it works as follows: say $f(x) = \sum_i \alpha_i k(x_i, x)$. First we need to decide which kernel we should use. The common one is the…
tqjustc
  • 143
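
The expansion $f(x) = \sum_i \alpha_i k(x_i, x)$ from this entry can be sketched as follows. The excerpt is cut off before it names a kernel, so the Gaussian (RBF) kernel here is purely an illustrative assumption, as are the function names and the `gamma` parameter. How the coefficients $\alpha_i$ are fitted is outside the scope of this sketch.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    # Illustrative kernel choice: k(a, b) = exp(-gamma * ||a - b||^2)
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)

def kernel_predict(x, points, alphas, kernel=rbf_kernel):
    # f(x) = sum_i alpha_i * k(x_i, x), where points[i] plays the role of x_i
    return sum(alpha * kernel(xi, x) for alpha, xi in zip(alphas, points))
```

Evaluating at a training point weights that point's coefficient by $k(x_i, x_i) = 1$ under this kernel, while more distant points contribute exponentially less.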
3
votes
1 answer

Gaussian Process Regression

Observations: $$ X= \begin{pmatrix} x_1 \\ x_2 \\ \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0.5 & 2 \\ \end{pmatrix} $$ $$ y= \begin{pmatrix} y_1 \\ y_2 \\ …
Xxx
  • 671
  • 4
  • 11