Broadly speaking, when numerically minimizing a $d$-dimensional objective function:
- Gradient descent generally requires more iterations, but each iteration is fast (we only need to compute 1st derivatives).
- Newton's method generally requires fewer iterations, but each iteration is slow (we need to compute 2nd derivatives too).
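A rough way to make this trade-off concrete. It assumes a dense Hessian and a direct (e.g. Cholesky) solve for the Newton step, and it leaves out the cost of evaluating $f$, $\nabla f$ and $\nabla^2 f$ themselves, which is problem-specific:

$$
\text{total work} \approx N_{\text{iter}} \times C_{\text{iter}}, \qquad
C_{\text{iter}}^{\text{GD}} = O(d), \qquad
C_{\text{iter}}^{\text{Newton}} = O(d^2) + O(d^3)
$$

Here $O(d^2)$ is the cost of storing the Hessian, and $O(d^3)$ is the cost of solving $\nabla^2 f(x)\,\Delta x = -\nabla f(x)$. For reference, the standard textbook iteration bounds for a strongly convex $f$ with condition number $\kappa$ are $O(\kappa \log(1/\epsilon))$ for gradient descent with step $1/L$. Newton's method needs $O(\log\log(1/\epsilon))$ once it has entered its quadratically convergent phase.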
My question is: in terms of the total amount of computation required, which one generally ends up being faster, Newton's method or gradient descent? Does this depend on $d$? How?
If this is a better question for another site please let me know.
Minor update
If it matters for the sake of comparison, let's assume the function is convex and "typical" (i.e. I'm not going to explicitly choose a function that exhibits the worst-case behavior of either algorithm). I'm just trying to understand what the rule of thumb is regarding the performance of each method.
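A minimal sketch of the comparison on one such convex, non-quadratic function. It assumes NumPy. The test function $f(x) = \tfrac12 x^\top A x - b^\top x + \sum_i \log(1 + e^{x_i})$ and the helper names `make_problem`, `gradient_descent` and `newton` are hypothetical, chosen here only for illustration. Gradient descent uses a fixed step $1/L$. Newton's method uses a dense linear solve plus a backtracking line search (damped Newton). The sketch counts iterations only; it does not time anything.

```python
import numpy as np

def make_problem(d, seed=0):
    """Hypothetical test problem: f(x) = 0.5 x^T A x - b^T x + sum(softplus(x)).
    Strongly convex (A is positive definite) but not quadratic."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((d, d))
    A = M.T @ M / d + 0.1 * np.eye(d)  # positive definite
    b = rng.standard_normal(d)

    def f(x):
        return 0.5 * x @ A @ x - b @ x + np.sum(np.logaddexp(0.0, x))

    def grad(x):
        sig = 1.0 / (1.0 + np.exp(-x))  # derivative of softplus
        return A @ x - b + sig

    def hess(x):
        sig = 1.0 / (1.0 + np.exp(-x))
        return A + np.diag(sig * (1.0 - sig))  # softplus'' on the diagonal

    # Lipschitz constant of the gradient (softplus'' <= 1/4)
    L = np.linalg.eigvalsh(A)[-1] + 0.25
    return f, grad, hess, L

def gradient_descent(grad, L, x0, tol=1e-8, max_iter=100_000):
    x = x0.copy()
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            return x, k
        x = x - g / L  # fixed step 1/L: O(d) work beyond the gradient itself
    return x, max_iter

def newton(f, grad, hess, x0, tol=1e-8, max_iter=100):
    x = x0.copy()
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            return x, k
        step = np.linalg.solve(hess(x), g)  # dense solve: O(d^3) per iteration
        fx, t = f(x), 1.0
        # Backtracking (Armijo) line search along the Newton direction -step
        while t > 1e-12 and f(x - t * step) > fx - 0.25 * t * (g @ step):
            t *= 0.5
        x = x - t * step
    return x, max_iter

if __name__ == "__main__":
    d = 50
    f, grad, hess, L = make_problem(d)
    x0 = np.zeros(d)
    x_gd, it_gd = gradient_descent(grad, L, x0)
    x_nt, it_nt = newton(f, grad, hess, x0)
    print(f"gradient descent: {it_gd} iterations")
    print(f"Newton's method:  {it_nt} iterations")
    print("max |x_gd - x_newton|:", np.max(np.abs(x_gd - x_nt)))
```

Both methods should reach the same minimizer, with Newton's method taking far fewer iterations. Whether that translates into less total work depends on how the $O(d^3)$ solve per Newton step compares with the extra $O(d)$ gradient steps. That trade-off shifts as $d$ grows.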