I'm writing a program to match anagrams in order to practice coding. One way I want to try this is to assign values to letters such that adding up the letters in the individual words creates a unique value. This will allow me to match words without worrying about the order of the letters. I think I can do this by playing around with different approaches but I really want to know how to do this mathematically. Assume all letters will be uppercase (so 26 total letters) and, for this example, I don't want to worry about any kind of character other than A-Z.
Asked
Active
Viewed 1,543 times
2
-
Use binary notation? No, that won't pick up repeat letters. To distinguish even repeats, you need to pick numbers that are linearly independent over the rationals. – GEdgar Mar 29 '14 at 01:43
-
Side comment, the way I handle it when I code is to observe that word 1 is an anagram of word 2 if and only if the alphabetized list of letters in word 1 equals the alphabetized list of letters in word 2. Alphabetizing is a way of not worrying about the order of the letters. But that is a different approach than what you are laying out in your question. – Jason Zimba Mar 29 '14 at 02:26
-
An interesting feature of this idea is that it "forgets" the length of the word. That may introduce inefficiencies compared to handling the dictionary slice-by-slice (equal-word-length slices). (Not meant as a criticism...I know you are trying out a range of approaches as an exercise.) – Jason Zimba Mar 29 '14 at 11:50
-
Thanks for the feedback, Jason. I hadn't actually considered alphabetizing yet so I'll add that to my list of things to try. I'm also playing around with data preparation vs real time processing (i.e, assigning codes to the words in my dictionary file to remove the "forgetfulness" you note). It's all just an exercise in the end. Some things I'm just playing with to help me learn Python. – texrer Mar 31 '14 at 22:44
2 Answers
2
This will require large numbers, as you get $26^n$ possibilities for an $n$ lettered word. To make a unique number for each word, have the letters numbered 0 to n-1, and find the sum
unique word number=$\sum_{i=0}^{n-1}$$26^i$*letter value of letter i(0 to 25, not 1 to 26)
This will make every number unique for every word, even if they end up large.
Asimov
- 3,024
-
-
Thanks, John. So if I understand correctly: If I number my letters from 1-26, and use 0-25 for the powers then the value for Cab = 2081? C = 26^2 * 3 + A = 26^0 * 1 + B = 26^1 * 2. Is this correct? – texrer Mar 31 '14 at 22:54
-
Letters should be 0 to 25 for multiplication, and the powers are based on which letter it is from 0 to word length-1. So $CAB=(26^22)+(26^10)+(26^0*1)= 1352+0+26=1378$ – Asimov Apr 01 '14 at 00:31
-
The powers represent which letter it is in the word (1st,2nd,3rd..), the multiplication represents what letter it is (a,b,c..) – Asimov Apr 01 '14 at 00:33
0
Every letter is a different prime number, multiply them all together for a word. MAY and YAM should result in the same value.