Curve-fitting problems are typically solved by minimizing a cost (error) function with respect to the model's parameters. Gradient descent and Newton's method are two of the many algorithms commonly used for this minimization.
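For reference, here is a minimal sketch of the usual smooth case: fitting a straight line $y \approx a x + b$ by gradient descent on the mean squared error. The function name `fit_line_gd` and the learning rate and step count are illustrative assumptions, not taken from any particular library.

```python
def fit_line_gd(xs, ys, lr=0.1, steps=2000):
    """Fit y ~ a*x + b by plain gradient descent on the mean squared error."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # residuals r_i = predicted_i - actual_i
        residuals = [a * x + b - y for x, y in zip(xs, ys)]
        # gradient of (1/n) * sum(r_i^2) with respect to a and b
        grad_a = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b
```

Because the squared error is differentiable everywhere, the gradient is well defined at every step; that is exactly the property the $L_\infty$ cost lacks.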
The $L_\infty$ norm can also be used as the cost function for linear or polynomial regression. My question: is it possible to use gradient descent to minimize a cost defined by the $L_\infty$ norm, i.e. $\text{cost} = \max_i |\text{predicted}_i - \text{actual}_i|$? How is the gradient of this function even defined?
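To make the difficulty concrete, the sketch below evaluates the $L_\infty$ cost for a straight-line model and takes a *subgradient* step instead of a gradient step. This is one standard workaround, not necessarily the only or best one. The max of the absolute residuals is not differentiable where two residuals tie for largest, but the gradient of any single residual that attains the max is a valid subgradient. All function names (`linf_cost`, `linf_subgradient`, `fit_line_linf`) and the diminishing step size $\alpha_t = 0.5/\sqrt{t}$ are illustrative assumptions.

```python
def linf_cost(a, b, xs, ys):
    """L-infinity cost: largest absolute residual of the line a*x + b."""
    return max(abs(a * x + b - y) for x, y in zip(xs, ys))


def linf_subgradient(a, b, xs, ys):
    """Return a subgradient of linf_cost with respect to (a, b).

    Pick any residual k with maximal |r_k|; d|r_k|/d(a, b) = sign(r_k) * (x_k, 1)
    is then a valid subgradient of the max, even at points of non-differentiability.
    """
    residuals = [a * x + b - y for x, y in zip(xs, ys)]
    k = max(range(len(residuals)), key=lambda i: abs(residuals[i]))
    # sign(r_k); if r_k == 0, any value in [-1, 1] would be valid, we use +1
    s = 1.0 if residuals[k] >= 0 else -1.0
    return s * xs[k], s


def fit_line_linf(xs, ys, steps=5000, lr0=0.5):
    """Subgradient method with diminishing steps; returns (best_cost, a, b)."""
    a, b = 0.0, 0.0
    best = (linf_cost(a, b, xs, ys), a, b)
    for t in range(1, steps + 1):
        ga, gb = linf_subgradient(a, b, xs, ys)
        step = lr0 / t ** 0.5  # diminishing step size
        a -= step * ga
        b -= step * gb
        # a subgradient step need not decrease the cost, so track the best iterate
        c = linf_cost(a, b, xs, ys)
        if c < best[0]:
            best = (c, a, b)
    return best
```

For example, `fit_line_linf([0.0, 1.0, 2.0], [0.0, 1.0, 0.0])` should approach the minimax line $y = 0.5$ with cost $0.5$. Unlike gradient descent on a smooth cost, individual steps may increase the cost, which is why the sketch keeps the best iterate rather than the last one.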