While trying to put a neural network on my FPGA, I have run into the problem of implementing my activation function. On my computer I typically use the well-known tanh, so I am inclined to stick with that choice on the board too. However, I can't seem to find the best way to calculate tanh on an FPGA, so my question is:
Given a signed 16-bit word (with 8 bits of fraction length), what's the easiest way to implement the tanh function with reasonable accuracy?
Keep in mind that the target is an FPGA, so things like multiplications are OK (as long as I can prevent the word length from growing too fast), but divisions are tricky and I would like to avoid them as much as possible. Also, the output word format can be optimized (I can devote all but two bits of the word to the fractional part, since the range is (-1, 1)). And by reasonable accuracy, I mean at least 5 decimal places.
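To make the number format concrete, here is a minimal Python reference model of the setup described above. It is only a sketch: the names `IN_FRAC`, `OUT_FRAC`, `tanh_ref` and `saturation_index` are made up for illustration, and it assumes round-to-nearest on the output. With 14 fractional bits the output LSB is 2^-14, roughly 6.1e-5, which is the finest step the output word can represent.

```python
import math

IN_FRAC = 8    # input: signed 16-bit word, 8 fractional bits (from the question)
OUT_FRAC = 14  # output: 16-bit word, all but two bits fractional (from the question)


def tanh_ref(raw_in: int) -> int:
    """Golden model: raw input integer -> raw output integer (round to nearest)."""
    x = raw_in / (1 << IN_FRAC)
    return round(math.tanh(x) * (1 << OUT_FRAC))


def saturation_index() -> int:
    """Smallest non-negative raw input whose rounded output already equals +1.0."""
    raw = 0
    while tanh_ref(raw) < (1 << OUT_FRAC):
        raw += 1
    return raw


if __name__ == "__main__":
    print("output LSB:", 2.0 ** -OUT_FRAC)
    print("raw input where the output saturates:", saturation_index())
    print("tanh(1.0) as a raw output word:", tanh_ref(1 << IN_FRAC))
```

Any hardware implementation can be compared bit for bit against `tanh_ref`. Because tanh is odd and saturates, only the non-negative inputs below `saturation_index()` carry information, which also bounds the size of a direct look-up table.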
The options I have researched already are:
1) Look-up tables: These need no explanation, I am sure.
2) CORDIC: I was able to write a tanh CORDIC implementation using details from this paper by Walther, though I do not fully understand the 'convergence' of this algorithm, or how I can test it (my implementation returns the right answer for radian values > 1.13, so where's the problem?)
3) Minimax approximation: This requires a division, and more importantly, the input argument needs to be raised to up to the nth power (n being the degree of the polynomial), which I think will cause problems with a fixed-point format like mine.
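Two of the points in the list above can be checked in a software model before writing any HDL. The Python sketch below is an illustration under stated assumptions, not a definitive implementation. The first part models the basic hyperbolic CORDIC iteration in floating point, with the usual repeated shifts at 4 and 13 and no range extension, and sums the elementary angles to get its convergence bound (about 1.118). The second part evaluates an odd polynomial with Horner's scheme, truncating back to 14 fractional bits after every multiply so the word length does not grow with the degree. The coefficients are the Taylor coefficients of tanh, used here only as a stand-in for minimax coefficients, so the polynomial is accurate only for small inputs. All function names are hypothetical.

```python
import math


def shift_sequence(n_max: int = 16):
    """Shift indices 1..n_max, with 4 and 13 repeated (hyperbolic CORDIC)."""
    seq = []
    for i in range(1, n_max + 1):
        seq.append(i)
        if i in (4, 13):
            seq.append(i)
    return seq


def convergence_bound(n_max: int = 16) -> float:
    """Largest |argument| the basic iteration can drive to zero."""
    return sum(math.atanh(2.0 ** -i) for i in shift_sequence(n_max))


def cordic_tanh(theta: float, n_max: int = 16) -> float:
    """Rotation mode; the CORDIC gain cancels in the final ratio y / x."""
    x, y, z = 1.0, 0.0, theta
    for i in shift_sequence(n_max):
        d = 1.0 if z >= 0 else -1.0
        t = 2.0 ** -i
        x, y = x + d * y * t, y + d * x * t
        z -= d * math.atanh(t)
    return y / x  # floating-point model only; hardware would need a divider


FRAC = 14  # working format: 14 fractional bits (assumption for this sketch)
ONE = 1 << FRAC
# Taylor series of tanh: x - x^3/3 + 2x^5/15 - 17x^7/315, highest order first
COEFFS = [round(c * ONE) for c in (-17 / 315, 2 / 15, -1 / 3, 1.0)]


def poly_tanh_fixed(raw_in: int) -> int:
    """Raw Q8.8 input, raw output with FRAC fractional bits; small |x| only."""
    x = raw_in << (FRAC - 8)   # align the input to the working format
    x2 = (x * x) >> FRAC       # 32-bit product truncated back to 16 bits
    acc = COEFFS[0]
    for c in COEFFS[1:]:
        acc = ((acc * x2) >> FRAC) + c  # Horner step, truncate after multiply
    return (acc * x) >> FRAC


if __name__ == "__main__":
    print("CORDIC convergence bound:", convergence_bound())
    for theta in (0.5, 1.0, 2.0):
        print(theta, cordic_tanh(theta), math.tanh(theta))
    print("poly at 0.5:", poly_tanh_fixed(128) / ONE, math.tanh(0.5))
```

Sweeping `cordic_tanh` across the bound printed by `convergence_bound()` is one way to test convergence: inside the bound the result tracks `math.tanh`, while for an argument such as 2.0 this un-extended model stalls at the tanh of the bound. In the polynomial part, every intermediate value fits in 16 bits and every product fits in 32 bits, whatever the degree.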
Are there other computationally cheap ways of calculating this function? There's plenty of literature out there on this very subject, using everything from stochastic machines to DCT interpolation filters, but none of it gives details clear enough for me to follow.
Thanks in advance!