
Math is not my first skill, so I figured it might be useful to ask mathematicians. I'm using a clustering algorithm to group 2D segments by their slope M and intercept Q in order to find coherent lines. To avoid infinite slopes, I compute the arctan of M (I actually do this for Q as well), so I get values between $-\frac{\pi}{2}$ and $\frac{\pi}{2}$.

What I would like is a way to represent M such that values near $-\frac{\pi}{2}$ are clustered with values very close to $\frac{\pi}{2}$.
Needless to say, simply applying an offset to the rotation just shifts the problem.

I'm currently using hdbscan as the clustering algorithm.

Not sure how to tag it.

Cesar
  • ^_^ is not meaningful here; some of us are old. "rects" is not much better. Meanwhile, writing `$\frac{\pi}{2}$` to display $\frac{\pi}{2}$ is easier to read than pi/2 – Henry Feb 22 '18 at 10:51
  • @Henry I've revised the question following your input. However, I must say that I find cheerfulness appropriate even for serious topics like mathematics, and for people of all ages. – Cesar Feb 22 '18 at 14:57
  • If you can define a distance function for your clustering, you could try something like $d(x,y)=\min(|x-y|,|x-y+\pi|,|x-y-\pi|)$ – Henry Feb 22 '18 at 15:08
  • @Henry I'm currently using hdbscan, and as far as I've seen, it looks like I can define a custom distance function. However, I'm not completely sure I understand your formula in the context of clustering raw segment input (x1,y1,x2,y2). Can you elaborate a little more? Thanks – Cesar Feb 23 '18 at 09:48
  • The $x$ and $y$ in @Henry 's comment are the values of $\arctan M$ for two different segments. – Taneli Huuskonen Feb 23 '18 at 11:53
  • @TaneliHuuskonen Oh, I see. I'll have to dig deeper, as it potentially clusters on N features besides the M value, but thank you for the explanation – Cesar Feb 23 '18 at 15:46
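A minimal Python sketch of Henry's distance, assuming $x$ and $y$ are $\arctan M$ values in radians (the function name `angle_distance` is hypothetical). It shows that two nearly vertical segments whose slopes have opposite signs come out as neighbours, whereas the plain difference puts them almost $\pi$ apart.

```python
import math


def angle_distance(x: float, y: float) -> float:
    """Henry's wrap-around distance between two slope angles.

    x and y are arctan(M) values in (-pi/2, pi/2). Because a line's
    direction is only defined modulo pi, the interval is treated as a
    circle of circumference pi.
    """
    d = x - y
    return min(abs(d), abs(d + math.pi), abs(d - math.pi))


# Two nearly vertical segments whose slopes have opposite signs
a = math.atan(1000.0)   # just below  pi/2
b = math.atan(-1000.0)  # just above -pi/2
print(angle_distance(a, b))  # about 0.002: treated as neighbours
print(abs(a - b))            # close to pi: plain difference misses this
```

When the slope angle is only one of several features, this one-dimensional metric does not combine directly with the others; how to weight it against them is left open here.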

1 Answer


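You can use the pair $(\cos 2\alpha,\sin 2\alpha)$ to encode $\alpha=\arctan M$. Since $2\alpha$ ranges over $(-\pi,\pi)$, angles close to $-\frac{\pi}{2}$ and angles close to $\frac{\pi}{2}$ are both mapped near the point $(-1,0)$ on the unit circle, so nearly vertical segments end up close together regardless of the sign of their slope.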
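A minimal Python sketch of this encoding (the names `encode_slope` and `encode_direction` are hypothetical). The second function is a variant I am adding, not part of the answer: it computes the same pair from a segment's direction vector via the double-angle identities, which avoids forming $M$ at all and so needs no special case for vertical segments.

```python
import math


def encode_slope(m: float) -> tuple[float, float]:
    """Encode a slope M as (cos 2a, sin 2a) with a = arctan(M).

    Doubling the angle maps a = -pi/2 + e and a = pi/2 - e to nearby
    points, so Euclidean distance no longer has a jump there.
    """
    a = math.atan(m)
    return (math.cos(2 * a), math.sin(2 * a))


def encode_direction(dx: float, dy: float) -> tuple[float, float]:
    """Same encoding computed from a segment's direction (dx, dy).

    With r^2 = dx^2 + dy^2, the double-angle identities give
    cos 2a = (dx^2 - dy^2) / r^2 and sin 2a = 2 dx dy / r^2. This avoids
    dividing by dx, so vertical segments need no special case, and
    (dx, dy) and (-dx, -dy) give the same point.
    """
    r2 = dx * dx + dy * dy
    if r2 == 0.0:
        raise ValueError("degenerate segment: both endpoints coincide")
    return ((dx * dx - dy * dy) / r2, 2 * dx * dy / r2)


p = encode_slope(1000.0)   # nearly vertical, positive slope
q = encode_slope(-1000.0)  # nearly vertical, negative slope
print(math.dist(p, q))     # about 0.004: the two are close
print(encode_direction(0.0, 1.0))  # exactly vertical: (-1.0, 0.0)
```

Because the two coordinates are ordinary real numbers, they can sit alongside other features and be clustered with the default Euclidean metric.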

EDIT: The mention of a custom distance function made me think of computing a custom distance straight from the coordinates of the endpoints, rather than dealing with slopes and intercepts. One possibility that crossed my mind is to calculate how far the endpoints are from the line that passes through the midpoints of the segments. You can do that easily enough with the cross product.

Let $(x_1,y_1)$ and $(x_2,y_2)$ be the endpoints of one segment, $(x_3,y_3)$ and $(x_4,y_4)$ those of another segment. Let $(u,v)$ be the vector from the midpoint of the first segment to the midpoint of the other one, that is, $$\begin{eqnarray*} u&=&\frac{x_3+x_4}{2}-\frac{x_1+x_2}{2},\\ v&=&\frac{y_3+y_4}{2}-\frac{y_1+y_2}{2}.\\ \end{eqnarray*} $$ Now the distance of $(x_1,y_1)$ from the line passing through the midpoints is $$\frac{|(x_1-x_2,y_1-y_2)\times(u,v)|}{2\sqrt{u^2+v^2}} =\frac{|(x_1-x_2)v-(y_1-y_2)u|}{2\sqrt{u^2+v^2}}. $$ Naturally, the point $(x_2,y_2)$ is at the same distance from the line, on the other side of it. Similarly, the distance of the endpoints of the other segment from the line is $$\frac{|(x_3-x_4,y_3-y_4)\times(u,v)|}{2\sqrt{u^2+v^2}}. $$ I suggest using the sum of the squares of the above distances, scaled to remove the constant factor $1/2$, as your custom distance function for the clustering algorithm. The expressions simplify a bit when you square them: $$ \frac{((x_1-x_2)v-(y_1-y_2)u)^2+((x_3-x_4)v-(y_3-y_4)u)^2}{u^2+v^2}. $$
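A minimal Python sketch of the final formula, assuming segments are given as `(x1, y1, x2, y2)` tuples. The names `segment_distance` and `distance_matrix` are hypothetical, and the handling of coinciding midpoints (where the formula is $0/0$) is my own assumption, not part of the answer. The resulting matrix could be handed to a clustering routine that accepts precomputed distances, for instance hdbscan with `metric='precomputed'`.

```python
import math

Segment = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def segment_distance(s: Segment, t: Segment) -> float:
    """Sum of squared endpoint distances from the line through both
    midpoints, scaled to drop the constant factor (last formula above)."""
    x1, y1, x2, y2 = s
    x3, y3, x4, y4 = t
    # (u, v): vector from the midpoint of s to the midpoint of t
    u = (x3 + x4) / 2 - (x1 + x2) / 2
    v = (y3 + y4) / 2 - (y1 + y2) / 2
    # 2D cross products of each segment's direction with (u, v)
    c1 = (x1 - x2) * v - (y1 - y2) * u
    c2 = (x3 - x4) * v - (y3 - y4) * u
    norm2 = u * u + v * v
    if norm2 == 0.0:
        # Midpoints coincide, so the line is undefined (0/0). Assumption:
        # parallel segments then lie on one line (distance 0); otherwise
        # treat them as infinitely far apart.
        cross = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        return 0.0 if cross == 0.0 else math.inf
    return (c1 * c1 + c2 * c2) / norm2


def distance_matrix(segments: list[Segment]) -> list[list[float]]:
    """Symmetric pairwise matrix of segment_distance values."""
    n = len(segments)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = segment_distance(segments[i], segments[j])
    return d


segs = [(0.0, 0.0, 1.0, 0.0),   # on the x-axis
        (2.0, 0.0, 3.0, 0.0),   # also on the x-axis
        (0.0, 1.0, 1.0, 1.0)]   # parallel, one unit above
for row in distance_matrix(segs):
    print(row)
```

Two disjoint segments on the same line (the first pair above) get distance 0, which is what grouping segments into coherent lines calls for.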

  • I've tested it quickly in Excel and the values look nicely continuous! I'll try adding it to the pipeline this weekend and let you know how it works ^_^ – Cesar Feb 23 '18 at 09:51