As a first-time poster on Math StackExchange with very little maths background, I apologise in advance if my question does not make sense or does not belong in the maths domain. If it does, I would appreciate any help.

My specific problem is that I want a function that maps the difference between a maximum and a minimum value into the range 0 to 1. More specifically, I have multiple strings stored as lists, and I want to measure the difference in length between the longest string and the shortest string. Of course, when the strings are of equal length the difference is zero. But when the strings have different lengths, mere subtraction does not give me what I want, as I want to cap the largest difference at 1 (as some kind of highest ratio?).

Thanks.

  • Maybe this can help? https://math.stackexchange.com/questions/3355027/a-function-or-a-factor-to-scale-a-list-of-real-numbers-from-one-range-to-another/3355071#3355071 – Matti P. Oct 23 '19 at 12:15
  • Suggested https://en.wikipedia.org/wiki/Normalization_(statistics) – The Demonix _ Hermit Oct 23 '19 at 12:20
  • Is there a theoretical maximum for the difference in length? Or what is the "maximum difference"? Is it the maximum observed value in some specific set? – Matti P. Oct 23 '19 at 12:24
  • There is no theoretical maximum; as you said, it is the maximum observed value in some specific set. But I want to scale this observed "maximum difference" for each set so that it never exceeds 1 across all sets. – user3014974 Oct 23 '19 at 12:34
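
To make the normalization idea from the comments concrete, here is a minimal Python sketch. It is one possible reading of the question, not a definitive answer: the function names `relative_length_spread` and `min_max_scale` are hypothetical, and dividing the length range by the longest length is an assumption chosen so that the result is 0 for equal lengths and always stays below 1.

```python
def relative_length_spread(seqs):
    """Return (max length - min length) / max length for one set of sequences.

    Assumption: dividing by the longest length is just one way to keep the
    value in [0, 1); it is 0 when all sequences have the same length.
    """
    lengths = [len(s) for s in seqs]
    longest = max(lengths)
    if longest == 0:
        # All sequences are empty, so there is no spread to report.
        return 0.0
    return (longest - min(lengths)) / longest


def min_max_scale(values):
    """Min-max normalization: map the observed values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up example sets, for illustration only.
sets = [["MKT", "MKTA", "MKTAY"], ["MKV", "MKL"]]
spreads = [relative_length_spread(s) for s in sets]
```

The first function works on each set independently, so it needs no knowledge of the other sets. The second one rescales a list of raw differences collected across all sets, which means the largest observed difference always maps to exactly 1.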

1 Answer

It really depends on what you want to achieve with this. It is very difficult to suggest a function without knowing its intended use.

Anyway, if any function would do, then $$\exp\left(-\bigl|\,|s_1| - |s_2|\,\bigr|\right)$$ can work, where $|s|$ denotes the length of $s$.

You can also look at the sigmoid function.
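
As a rough Python sketch of the two suggestions above: the exponential formula as written equals 1 for equal lengths and decays towards 0 as the length gap grows, so it behaves like a similarity. Taking one minus that value, to get 0 for equal lengths as the question asks, is an assumption on my part. The function names and the `scale` parameter below are hypothetical and would need tuning to the data.

```python
import math


def exp_similarity(s1, s2):
    """exp(-| |s1| - |s2| |): 1 for equal lengths, tends to 0 for large gaps."""
    return math.exp(-abs(len(s1) - len(s2)))


def exp_difference(s1, s2):
    """Assumed variant: 1 - similarity, so equal lengths give 0, large gaps approach 1."""
    return 1.0 - exp_similarity(s1, s2)


def sigmoid_difference(s1, s2, scale=1.0):
    """Rescaled logistic sigmoid of the length gap, mapped into [0, 1).

    The plain sigmoid of a non-negative gap lies in [0.5, 1), so it is
    stretched with 2 * sigmoid(d / scale) - 1. `scale` is a hypothetical
    tuning knob controlling how quickly the value saturates.
    """
    d = abs(len(s1) - len(s2))
    return 2.0 / (1.0 + math.exp(-d / scale)) - 1.0
```

With lengths differing by only a few characters, the unscaled exponential already sits very close to its limit, so for protein sequences of very different lengths a larger `scale` (or dividing the gap by a typical length first) may be more informative.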

  • or markup and margin ... –  Oct 23 '19 at 12:22
  • Well, it is a bioinformatics data analysis problem. I want to check how similar my sequences are based on a sequence length metric. The same genes can produce protein isoforms of different lengths, and it would be ideal to identify clusters of sequences that do not vary much in length. Does this help? – user3014974 Oct 23 '19 at 12:28
  • If you really need to normalize the metric, then you can try experimenting with some exponential functions as suggested. If you are trying to cluster using a machine learning algorithm, the distance measure can just as well take values greater than 1. So, you might not need normalization. – Praveen Dhinwa Oct 23 '19 at 12:42