diff --git a/src/year1/image-processing-and-computer-vision/module1/sections/_instance_obj_detection.tex b/src/year1/image-processing-and-computer-vision/module1/sections/_instance_obj_detection.tex
index 529d8c7..b9efea9 100644
--- a/src/year1/image-processing-and-computer-vision/module1/sections/_instance_obj_detection.tex
+++ b/src/year1/image-processing-and-computer-vision/module1/sections/_instance_obj_detection.tex
@@ -98,8 +98,8 @@ Edge-based template matching that works as follows:
         \tilde{\vec{u}}_k(\tilde{P}_k) = \frac{\nabla \tilde{I}_{i,j}(\tilde{P}_k)}{\Vert \nabla \tilde{I}_{i,j}(\tilde{P}_k) \Vert}
     \]
     \item Compute the similarity as the mean of the cosine similarities of each pair of gradients:
-        \[ S(i, j) = \frac{1}{n} \sum_{k=1}^{n} \vec{u}_k(P_k) \cdot \tilde{\vec{u}}_k(\tilde{P}_k) = \frac{1}{n} \sum_{k=1}^{n} \cos \theta_k \in [-1, 1] \]
-        $S(i, j) = 1$ when the gradients perfectly match. A minimum threshold $S_\text{min}$ is used to determine if there is a match.
+        \[ S(T, \tilde{I}_{i,j}) = \frac{1}{n} \sum_{k=1}^{n} \vec{u}_k(P_k) \cdot \tilde{\vec{u}}_k(\tilde{P}_k) = \frac{1}{n} \sum_{k=1}^{n} \cos \theta_k \in [-1, 1] \]
+        $S(T, \tilde{I}_{i,j}) = 1$ when the gradients perfectly match. A minimum threshold $S_\text{min}$ is used to determine if there is a match.
 \end{enumerate}
 
 \begin{figure}[H]
@@ -109,15 +109,15 @@ Edge-based template matching that works as follows:
 \end{figure}
 
 
-\subsection{Invariance to global inversion of contrast polarity}
+\subsection{Invariance to contrast polarity inversion}
 As an object might appear on a darker or brighter background, more robust similarity functions can be employed:
 \begin{description}
-    \item[Global polarity inversion contrast]
+    \item[Global contrast polarity inversion]
         \[ S(i, j) = \frac{1}{n} \left\vert \sum_{k=1}^{n} \vec{u}_k(P_k) \cdot \tilde{\vec{u}}_k(\tilde{P}_k) \right\vert = \frac{1}{n} \left\vert \sum_{k=1}^{n} \cos \theta_k \right\vert \]
 
-    \item[Local polarity inversion contrast]
+    \item[Local contrast polarity inversion]
         \[ S(i, j) = \frac{1}{n} \sum_{k=1}^{n} \left\vert \vec{u}_k(P_k) \cdot \tilde{\vec{u}}_k(\tilde{P}_k) \right\vert = \frac{1}{n} \sum_{k=1}^{n} \left\vert \cos \theta_k \right\vert \]
 
@@ -269,11 +269,11 @@ Hough transform extended to detect an arbitrary shape.
             \item For each $\vec{r}_i$ in the corresponding row of the R-table:
             \begin{enumerate}
                 \item Compute an estimate of the barycenter as $\vec{y} = \vec{x} + \vec{r}_i$.
-                \item Cast a vote in the accumulator array $A[\vec{y}] \texttt{+=} 1$
+                \item Cast a vote in the accumulator array $A[\vec{y}] \texttt{+=} 1$.
             \end{enumerate}
         \end{enumerate}
-    \item Find the local maxima of the accumulator vector to estimate the barycenters.
-    The shape can then be visually found by overlaying the template barycenter to the found barycenters.
+    \item Find the local maxima of the accumulator vector to estimate the barycenter.
+    The shape can then be visually found by overlaying the template barycenter to the found barycenter.
 \end{enumerate}
 \end{description}
 
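As a concrete reference for the three similarity variants touched by this patch (strict, global-polarity-invariant, local-polarity-invariant), here is a minimal NumPy sketch. The name gradient_similarity and its inputs are illustrative, not part of the notes; the gradient vectors are assumed to be precomputed (e.g. with a Sobel filter) at the template edge points and at the corresponding offsets inside the image window.

import numpy as np

def gradient_similarity(template_grads, window_grads, mode="strict"):
    # template_grads, window_grads: (n, 2) arrays of gradient vectors at
    # corresponding edge points of the template and of the image window.
    # Normalize to unit vectors, guarding against zero-magnitude gradients.
    u = template_grads / np.maximum(
        np.linalg.norm(template_grads, axis=1, keepdims=True), 1e-12)
    v = window_grads / np.maximum(
        np.linalg.norm(window_grads, axis=1, keepdims=True), 1e-12)
    cos = np.sum(u * v, axis=1)   # cos(theta_k) for each point pair
    if mode == "strict":          # S in [-1, 1]; 1 = perfect match
        return cos.mean()
    if mode == "global":          # invariant to a global polarity inversion
        return abs(cos.mean())
    if mode == "local":           # invariant to local polarity inversions
        return np.abs(cos).mean()
    raise ValueError(f"unknown mode: {mode}")

A detection is then declared at window position (i, j) whenever the returned score exceeds the threshold S_min.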
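The R-table construction and voting loop of the Generalized Hough Transform edited above can likewise be sketched in a few lines. All names here are hypothetical; gradient directions are assumed to be quantized into n_bins orientation bins, and the edge points and directions are assumed to come from an edge detector run beforehand.

import numpy as np
from collections import defaultdict

def quantize(theta, n_bins):
    # Map a gradient direction (radians) to an orientation bin index.
    return int((theta % (2 * np.pi)) / (2 * np.pi) * n_bins) % n_bins

def build_r_table(edge_points, edge_dirs, barycenter, n_bins=36):
    # One row per orientation bin; each row stores the offsets
    # r_i = y - x from a template edge point to the template barycenter.
    table = defaultdict(list)
    for p, theta in zip(edge_points, edge_dirs):
        table[quantize(theta, n_bins)].append(barycenter - p)
    return table

def ght_accumulate(edge_points, edge_dirs, r_table, shape, n_bins=36):
    # Each image edge point votes for every barycenter estimate
    # y = x + r_i compatible with its gradient direction.
    A = np.zeros(shape, dtype=np.int32)
    for x, theta in zip(edge_points, edge_dirs):
        for r in r_table[quantize(theta, n_bins)]:
            y = np.round(x + r).astype(int)
            if 0 <= y[0] < shape[0] and 0 <= y[1] < shape[1]:
                A[y[0], y[1]] += 1   # the vote A[y] += 1
    return A

Local maxima of A then estimate the barycenter, exactly as in the last step of the algorithm above.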
diff --git a/src/year1/image-processing-and-computer-vision/module1/sections/_local_features.tex b/src/year1/image-processing-and-computer-vision/module1/sections/_local_features.tex
index 35e23d3..8f10f4f 100644
--- a/src/year1/image-processing-and-computer-vision/module1/sections/_local_features.tex
+++ b/src/year1/image-processing-and-computer-vision/module1/sections/_local_features.tex
@@ -406,7 +406,7 @@ After finding the keypoints, a descriptor of a keypoint is computed from the pix
 Given a pixel $(x, y)$, its gradient magnitude and direction is computed from the Gaussian smoothed image $L$:
 \[
     \begin{split}
-        \vert \nabla L(x, y) \vert &= \sqrt{ \big( L(x+1, y) - L(x-1, y) \big)^2 + \big( L(x, y+1) - L(x, y-1) \big)^2 } \\
+        \Vert \nabla L(x, y) \Vert &= \sqrt{ \big( L(x+1, y) - L(x-1, y) \big)^2 + \big( L(x, y+1) - L(x, y-1) \big)^2 } \\
         \theta_L(x, y) &= \arctan\left( \frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)} \right)
     \end{split}
 \]
@@ -416,7 +416,7 @@ After finding the keypoints, a descriptor of a keypoint is computed from the pix
 By dividing the directions into bins (e.g. bins of size $10^\circ$), it is possible to define for each keypoint a histogram by considering its neighboring pixels within a patch.
 
 For each pixel $(x, y)$ neighboring a keypoint $(x_k, y_k)$, its contribution to the histogram along the direction $\theta_L(x, y)$ is given by:
-    \[ G_{(x_k, y_k)}(x, y, \frac{3}{2} \sigma_s(x_k, y_k)) \cdot \vert \nabla L(x, y) \vert \]
+    \[ G_{(x_k, y_k)}\left( x, y, \frac{3}{2} \sigma_s(x_k, y_k) \right) \cdot \Vert \nabla L(x, y) \Vert \]
 where $G_{(x_k, y_k)}$ is a Gaussian centered on the keypoint and $\sigma_s(x_k, y_k)$ is the scale of the keypoint.
 
 The characteristic orientation of a keypoint is given by the highest peak of the orientation histogram.
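The gradient and histogram formulas edited in this file map to a short NumPy sketch. Assumptions: row-major L[y, x] indexing, np.arctan2 in place of the notes' plain arctan ratio (so directions cover the full 360-degree range), and an illustrative square window and bin count; all function names are hypothetical.

import numpy as np

def orientation_histogram(L, xk, yk, sigma_s, n_bins=36, radius=8):
    # Gaussian-weighted histogram of gradient directions around keypoint
    # (xk, yk); L is the Gaussian-smoothed image at the keypoint's scale.
    hist = np.zeros(n_bins)
    sigma = 1.5 * sigma_s   # the (3/2) * sigma_s of the Gaussian weight
    for y in range(max(1, yk - radius), min(L.shape[0] - 1, yk + radius + 1)):
        for x in range(max(1, xk - radius), min(L.shape[1] - 1, xk + radius + 1)):
            dx = L[y, x + 1] - L[y, x - 1]   # central differences
            dy = L[y + 1, x] - L[y - 1, x]
            mag = np.hypot(dx, dy)           # gradient magnitude ||grad L||
            theta = np.arctan2(dy, dx) % (2 * np.pi)
            w = np.exp(-((x - xk) ** 2 + (y - yk) ** 2) / (2 * sigma ** 2))
            hist[int(theta / (2 * np.pi) * n_bins) % n_bins] += w * mag
    return hist

def characteristic_orientation(hist):
    # Highest peak of the histogram, taken at the bin center.
    return (np.argmax(hist) + 0.5) * 2 * np.pi / len(hist)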