1

For base $10$ natural number $n$, how would one find the base that represents $n$ in the minimum (number of digits) + (the number of digits in base $b$)?

This is for a programming project related to data compression, I am not much of a mathematician. This is not from a book or homework, this is something I need for a programming project.

My question: is there an accepted way to do this or a proven solution that already exists?

My steps for attempting to solve the problem ($b$ is the base):

$\text{total_digits} = \log_b(n) + 1 + \log_b(10)$

(from How to determine the number of digits needed to represent a number in different bases?)

Then we want to minimize $ \log_b(n) + 1 + \log_b(10) + \log_b(n) + 1 $

We want minimum value of this expression. Taking the derivative of this expression with respect to $b$ and setting it equal to zero gives us:

$$\frac{\ln(b) / \ln(10)}{\left(\ln(b) / \ln(10)\right)^2 + 1/n} = \log_b(e)$$

Denominator of the left side of equation is always greater than $1/n$, drop the $1/n$ term to get:

$$\frac{\ln(b) / \ln(10)}{\left(\ln(b) / \ln(10)\right)^2 + 1} \approx \log_b(e)$$

$$\implies \left(\frac{\ln(b)}{\ln(10)}\right)^2 \approx \frac{\ln(b)}{\ln(10)} + \log_e(n)$$

Let $x = \ln(b) / \ln(10)$ and $y = \log_e(n)$, then

$$x^2 - x - y \approx 0$$

$$\implies x = \frac{1 + \sqrt{1 + 4y}}{2}$$

Substituting the value of $x$ back in:

$$\ln(b) = \frac{\ln(10)}{2} \left( \sqrt{1+4\ln_e(n)} - 1 \right)$$

Solve for $b$:

$$b \approx e^{\frac{\ln(10)}{2}(1 + \sqrt{1 + 4\ln_e(n)})} \implies x \sqrt{10} e^{\frac{1}{2}\ln(10)\sqrt{1 + 4\ln_e(n)}} \implies \lceil \sqrt{10n} \rceil$$

Please let me know if I made a mistake somewhere. However, something else this solution is lacking is that often the answer is not a valid base for n. How do I find the base that represents $n$ in the minimum (number of digits )+ (num digits in base) while ensuring that the base is valid?

Thank you!

433MEA
  • 33
  • Welcome to [math.se] SE. Take a [tour]. You'll find that simple "Here's the statement of my question, solve it for me" posts will be poorly received. What is better is for you to add context (with an [edit]): What you understand about the problem, what you've tried so far, etc.; something both to show you are part of the learning experience and to help us guide you to the appropriate help. You can consult this link for further guidance. – Another User Mar 03 '23 at 19:30
  • @coudy Suppose that $n = 1000.$ Then $1000_{1001}$ has 1 digit, and $1001$ has $4$ digits. So the sum associated with the choice of $1001$ as the base is $4 + 1 = 5.$ Alternatively, consider for example, using base $40.$ Then $1000_{40}$ will have $2$ digits precisely because $1000 < (40)^2.$ Also, $(40)$ itself has $2$ digits. Therefore, the sum associated with the alternative choice of $40$ as the base, which is $2 + 2 = 4,$ is less than the sum associated with $1001$ as a base, when $n = 1000.$ – user2661923 Mar 03 '23 at 21:18
  • @coudy I agree that in addition to the fact that the OP (i.e. original poster) has shown no work, and given no background on the problem (i.e. source of the problem), the problem was poorly stated. However, the interpretation in my last comment seems to be the only one that I can think of, that makes the problem non-trivial. – user2661923 Mar 03 '23 at 21:21
  • 1
    To the OP: Please see this article on MathSE protocol. As onerous as the article may appear to you, it provides a defense mechanism against the MathSE forum being used as a do my homework forum. In particular, please see the Edit-Tools section of the article, and the portion of the article that discusses showing work. It is irrelevant whether the problem is homework. What counts is whether the protocol is observed. – user2661923 Mar 03 '23 at 21:24
  • @user2661923, I am sorry for not following MathSE protocol. I have done my best to edit post to comply. Please let me know if I have to make other edits, and when I can post this same question again so others can actually see it. – 433MEA Mar 03 '23 at 22:16
  • 2
    To the OP: Now that you have edited your posting, I have voted to re-open your posting. Someone else has similarly voted to re-open. I will also lobby for others to re-open the posting. When your posting receives 3 more re-open votes, it will be automatically opened. – user2661923 Mar 03 '23 at 22:52
  • I'm not sure why the align environment is acting up, it works fine elsewhere. – bobeyt6 Mar 04 '23 at 00:19
  • It would be helpful to include some examples of what to minimise, with concrete $n$ and $b$. I am wondering what your $\text{total_digits}$ and the $\log_b(10)$ is for: $$\text{total_digits} = \log_b(n) + 1 + \underbrace{\log_b(10)}_{??} = \log_b(10n) + 1,$$ when $\left\lfloor\log_b(n)+1\right\rfloor$ is already the number of digits according to your linked question. – peterwhy Mar 04 '23 at 18:31

1 Answers1

1

The formula for the number of digits in base $~b \in \Bbb{Z_{\geq 2}},~$ when $~b~$ is expressed in base $~10~$ is

$$f(b) = 1 + \left\lfloor \log_{10} (b) \right\rfloor.$$

Similarly, the formula for the number of digits in $~n \in \Bbb{Z_{\geq 2}},~$ when $~n~$ is expressed in base $~b~$ is

$$g(b) = 1 + \left\lfloor \log_{b} (n) \right\rfloor = 1 + \left\lfloor \frac{\log_{10} n}{\log_{10} b}\right\rfloor.$$

So, for a given positive integer constant $n$, it is desired to find the positive integer value of $b$ that minimizes the sum $f(b) + g(b).$

It is clear that choosing $b = \left[10^k\right] - 1, ~: ~k \in \Bbb{Z^+},~$ will always be just as good, if not better, than choosing $b_1~$ such that $10^{k-1} \leq b_1 \leq \left[10^k\right] - 2.$

This is because $~f(b) = k = f(b_1),~$ and $~b > b_1 \implies g(b) \leq g(b_1).$

With the choice of base $~b~$ restricted to the set

$$\left\{\left[10^k\right] - 1 ~: ~k \in \Bbb{Z^+} ~\right\},$$

you can make the simplifying approximation that $~\log_{10}(b) = k,~$ without introducing much error, especially for larger values of $k$.

Note:
See the Addendum for a more convoluted/accurate approach.

So, for a fixed $n$, you are attempting to find $~k \in \Bbb{Z^+},~$ so that the expression

$$k + \left\{ ~1 + \left\lfloor \frac{\log_{10} n}{k}\right\rfloor ~\right\} \tag1 $$

is minimized.

Since the expression in (1) above only contains one floor expression, you can minimize it by (instead) minimizing the function

$$h(k) = k + \frac{\log_{10} n}{k} ~: ~k \in \Bbb{Z^+}.$$

Temporarily pretending that the domain of $h(k)$ is $~\Bbb{R^+},~$ you have that

$$h'(k) = 1 - \frac{\log_{10} n}{k^2}, ~h''(k) > 0.$$

So, the minimum value for $h(k)$ is achieved when $~\displaystyle k = \sqrt{\log_{10} n}.$

This implies that the optimal value for $k \in \Bbb{Z^+}~$ must be one of the elements

$$~k_1, ~k_2 ~: ~k_1 = \left\lfloor ~\sqrt{\log_{10} n} ~\right\rfloor, ~k_2 = \left\lceil ~\sqrt{\log_{10} n} ~\right\rceil.$$

Then, setting $~b_1 = 10^{k_1} - 1, ~b_2 = 10^{k_2} - 1,$ the optimal sum will be

$$\min\left\{ ~\left[ ~f(b_1) + g(b_1) ~\right], ~\left[ ~f(b_2) + g(b_2) ~\right] ~\right\}.$$


Addendum
This section discusses a (somewhat convoluted) numerical approximation refinement for computing

$$\log_{10} \left[ ~\left(10^k ~\right) - 1 ~\right] ~: ~k \in \Bbb{Z^+}.$$

You have the following:

  • Setting $~a(x) = \ln(x) \implies a'(x) = \dfrac{1}{x}.$
    This implies that $\displaystyle ~\ln(10^k) - \ln\left[ ~\left(10^k\right) - 1 ~\right] \approx \dfrac{1}{10^k}.$

  • You also have that for all $~x \in \Bbb{R^+},~$ that
    $\displaystyle \log_{10} x = \frac{\ln(x)}{\ln(10)} \approx \frac{\ln(x)}{2.30}.$

From the above two bullet points, you can conclude that

$$\log_{10} \left\{ ~\left[10^k\right] - 1 ~\right\} = k - ~\left[ ~\log_{10} 10^k ~\right] + \log_{10} \left\{ ~\left[10^k\right] - 1 ~\right\}$$

$$\approx k - \frac{1}{2.30 \times 10^k}.$$

user2661923
  • 35,619
  • 3
  • 17
  • 39