Questions tagged [machine-learning]

From The Discipline of Machine Learning by Tom Mitchell:

The field of Machine Learning seeks to answer the question "How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?" This question covers a broad range of learning tasks, such as how to design autonomous mobile robots that learn to navigate from their own experience, how to data mine historical medical records to learn which future patients will respond best to which treatments, and how to build search engines that automatically customize to their user's interests. To be more precise, we say that a machine learns with respect to a particular task T, performance metric P, and type of experience E, if the system reliably improves its performance P at task T, following experience E. Depending on how we specify T, P, and E, the learning task might also be called by names such as data mining, autonomous discovery, database updating, programming by example, etc.

3322 questions
2
votes
1 answer

Help understanding machine learning cost function

I am taking an online class on Machine Learning and I'm trying to fully understand how the cost function works. Can someone explain to me exactly what is going on in the function below: Cost function $$J(\theta_0, \theta_1) = \frac{1}{2} m…
2
votes
1 answer

SVM - Variable Input Dimension

Is it possible for a trained support vector machine (SVM) to take an input of a different length (say during the testing phase) than the length used when it was trained? For example, training data input: vector $\mathbf{x}_r \in \mathbb{R}^n$; test data input:…
2
votes
3 answers

Understanding the existence of a kernel function that transforms non linearly separable samples to separable samples in general

I'm studying section $5.11$ on support vector machines from Duda and Hart's Pattern Classification. The authors write: With an appropriate nonlinear mapping $\phi$ to a sufficiently high dimension, data from two categories can always be separated…
2
votes
1 answer

Perceptron linearly separable but not linearly separable through the origin

Can anyone explain the solution to this problem? "Provide two points, (x0, x1) and (y0, y1) in two dimensions that are linearly separable but not linearly separable through the origin. Enter a Python list with two entries of the form [[x0, x1],…
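
One way to see what the exercise asks for is a small numerical check. The sketch below is an illustration only, not the course's official solution: the points `[1, 1]` and `[2, 2]`, their labels, and both helper function names are assumptions chosen for this example. Because the second point is a positive multiple of the first, any line through the origin gives both points the same sign, so no such line can separate them, while a line with an offset can. The grid search over directions is a sanity check rather than a proof.

```python
import math

def separable_through_origin(points, labels, steps=3600):
    """Grid search over unit normals w = (cos a, sin a) for a separating line w.x = 0."""
    for k in range(steps):
        a = 2 * math.pi * k / steps
        w = (math.cos(a), math.sin(a))
        if all(y * (w[0] * x[0] + w[1] * x[1]) > 0 for x, y in zip(points, labels)):
            return True
    return False

def separates_with_offset(points, labels, w, b):
    """Check whether the line w.x + b = 0 puts every point on the side of its label."""
    return all(y * (w[0] * x[0] + w[1] * x[1] + b) > 0 for x, y in zip(points, labels))

# Hypothetical example: two points on the same ray from the origin, opposite labels.
points = [[1.0, 1.0], [2.0, 2.0]]
labels = [+1, -1]

through_origin = separable_through_origin(points, labels)
# The line -x0 - x1 + 3 = 0 passes between the two points.
with_offset = separates_with_offset(points, labels, w=(-1.0, -1.0), b=3.0)

print("separable through the origin:", through_origin)
print("separable with an offset:", with_offset)
```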
2
votes
0 answers

formula in the VAE paper

I have a question about a formula in a machine learning paper. The paper is as follows. https://arxiv.org/pdf/1906.02691.pdf On page 9, formula (1.6), I totally agree with it since it is a famous formula in Prof. Koller's book "probabilistic…
vorton
2
votes
0 answers

Correct Understanding of Bellman Optimality

I was reading about the "Bellman Principle of Optimality" (https://en.wikipedia.org/wiki/Bellman_equation): It seems that the "Bellman Principle of Optimality" states that for some problems, "the overall optimal policy" can be considered as the "sum…
stats_noob
2
votes
1 answer

Why is the inequality true?

I am studying the book "Understanding Machine Learning: From Theory to Algorithms". I am struggling to understand the solution to exercise 3 (2) on page 41. Exercise: An axis-aligned rectangle classifier in the plane is a classifier that assigns 1…
2
votes
1 answer

How to find the function parameters with some constraint?

In the context of neural networks, I am using a function to increase the difference between "good" accuracies and "bad" accuracies. For example, all accuracies below 0.8 are considered bad and all accuracies above it are considered good. The function looks as…
2
votes
0 answers

Find the number of distinct hypotheses within hypothesis space

Suppose that for a given instance space $X=\{0,1\}^3$ we're observing the following model $H$, which consists of hypotheses $h(x|\theta) = x_1\theta_1 + x_2\theta_2 + x_3\theta_3 + \theta_4$, such that $h(x|\theta)<0$, $x = (x_1, x_2, x_3)$, $x_i \in…
2
votes
0 answers

Properties of A-Softmax Loss, how to understand the decision boundary?

In the paper "SphereFace: Deep hypersphere embedding for face recognition", in Section 3.2, "Introducing Angular Margin to Softmax Loss", it is mentioned that the decision boundaries will produce an angular margin of $\frac{m-1}{m+1} \theta^1_2$ where…
2
votes
0 answers

What is the Rademacher complexity of continuous functions from $[0,1]$ to $[0,1]$?

Recall that the RC is defined as $$\mathfrak{R}_m(\mathcal{G}) := \mathbb{E}_{S\sim \mathcal{D}^m}[\hat{\mathfrak{R}}_S(\mathcal{G})]$$ where $$\hat{\mathfrak{R}}_S(\mathcal{G}) := \mathbb{E}_{\sigma} [\sup_{g\in \mathcal{G}} \frac1m \sum_{i=1}^m…
Jakob Elias
2
votes
0 answers

How to find correct upper bound of Vapnik-Chervonenkis dimension

I have a question about computing the VC dimension for a general hypothesis class $\mathcal{H}$. I know the process for computing VC dimension is as follows: Find the lower bound $n$, such that there exists a set $S$ with $|S|=n$ that can be shattered by…
Francis
2
votes
2 answers

Why is a non-uniformly learnable hypothesis class a countable union of uniformly learnable ones?

The question concerns the proof of Theorem 7.2 in Understanding Machine Learning by Shalev-Shwartz & Ben-David. The authors argue in the following way: suppose $\mathcal{H}$ is non-uniformly learnable, that is, Def: There exists an algorithm $A$ an…
2
votes
0 answers

Help with back propagation weight computation

I recently started studying the backpropagation algorithm and implementing it in Java, based on this: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/ For now, I am just copying everything step by step so I can reproduce all the…
Mark
2
votes
1 answer

How to Show that AdaBoost Weighted Error is Exactly 1/2

I am trying to prove that in an AdaBoost model $Y \rightarrow [-1,1]$, $\mathrm{err}_t' = \frac{\sum_{i=1}^{N} w'_i \mathbb{1}\{h_t(x^{(i)}) \neq t^{(i)}\}}{\sum_{i=1}^{N} w'_i} = \frac{1}{2}$. Here, $w_i' = w_i \exp(-\alpha t^{(i)} h_t(x^{(i)}))$ is the reweighted weight at…
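
The claim can be checked numerically before attempting the proof. The sketch below is a sanity check under stated assumptions, not the asker's derivation: it assumes labels and predictions in $\{-1, +1\}$ and the standard AdaBoost coefficient $\alpha = \frac{1}{2}\ln\frac{1-\mathrm{err}_t}{\mathrm{err}_t}$, which the excerpt does not state explicitly. The data, the weights, and the helper name `weighted_error` are made up for illustration.

```python
import math
import random

def weighted_error(weights, preds, targets):
    """Weighted fraction of examples where the prediction differs from the target."""
    wrong = sum(w for w, p, t in zip(weights, preds, targets) if p != t)
    return wrong / sum(weights)

random.seed(0)
n = 20
# Hypothetical targets t_i in {-1, +1} and arbitrary positive weights w_i.
targets = [random.choice([-1, 1]) for _ in range(n)]
weights = [0.1 + random.random() for _ in range(n)]
# Hypothetical weak learner h_t: wrong on the first 6 examples, right on the rest.
preds = [-t if i < 6 else t for i, t in enumerate(targets)]

err = weighted_error(weights, preds, targets)
# Assumed standard AdaBoost coefficient (not given in the excerpt).
alpha = 0.5 * math.log((1.0 - err) / err)

# Reweighting step: w_i' = w_i * exp(-alpha * t_i * h_t(x_i)).
new_weights = [w * math.exp(-alpha * t * p) for w, p, t in zip(weights, preds, targets)]
new_err = weighted_error(new_weights, preds, targets)

print("error before reweighting:", err)
print("error after reweighting:", new_err)
```

The reason this works: misclassified examples are scaled by $e^{\alpha}$ and correct ones by $e^{-\alpha}$, and with this choice of $\alpha$ both groups end up with the same total weight, proportional to $\sqrt{\mathrm{err}_t(1-\mathrm{err}_t)}$.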