Part 4: Studying Digital Image Processing with OpenCV-Python - Tech It Yourself

## Thursday, 24 May 2018

1. Fourier Transform
Fourier Transform is used to analyze the frequency characteristics of various filters.
A signal $x(t)=Asin(2\Pi ft)$ with f is the frequency of signal, and if its frequency domain is taken, we can see a spike at f.
If f is high (high frequency signal), the amplitude of signal varies so fast in short time.
If f is low (low frequency signal), the amplitude of signal varies so slowin short time.
An image as a signal which is sampled in two directions X and Y. So the Fourier transform will be taken in both  directions.
In image, edges and noises are high frequency contents in an image. If there is no much changes in amplitude, it is a low frequency component.
2D Discrete Fourier Transform (DFT) is used to find the the frequency representation (frequency domain) of image.
Fast Fourier Transform (FFT) is used for calculation of DFT.
1.1. Fourier Transform in openCV
OpenCV provides the functions cv2.dft() and cv2.idft() for Fourier Transform. The result has 2 channels (real part and imaginary parts of the result). The input image should be converted to np.float32.
Using cv2.cartToPolar() to calculate both magnitude and phase.
Apply and high pass filter for the image below:
Figure: input image
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 import numpy as np import cv2 from matplotlib import pyplot as plt img = cv2.imread('opencv_logo.png',0) dft = cv2.dft(np.float32(img),flags = cv2.DFT_COMPLEX_OUTPUT) dft_shift = np.fft.fftshift(dft) #fourier transform magnitude_spectrum = 20*np.log(cv2.magnitude(dft_shift[:,:,0],dft_shift[:,:,1])) rows, cols = img.shape crow,ccol = rows/2 , cols/2 # high pass filter, create a mask, center square is 0, remaining is 1 mask = np.ones((rows,cols,2),np.uint8) mask[crow-30:crow+30, ccol-30:ccol+30] = 0 # apply mask and inverse DFT fshift = dft_shift*mask f_ishift = np.fft.ifftshift(fshift) img_back = cv2.idft(f_ishift) #magnitude calculation is based on channels img_back = cv2.magnitude(img_back[:,:,0],img_back[:,:,1]) plt.subplot(141),plt.imshow(img, cmap = 'gray') plt.title('Input Image'), plt.xticks([]), plt.yticks([]) plt.subplot(142),plt.imshow(magnitude_spectrum, cmap = 'gray') plt.title('Magnitude Spectrum'), plt.xticks([]), plt.yticks([]) plt.subplot(143),plt.imshow(img_back, cmap = 'gray') plt.title('Image after HPF'), plt.xticks([]), plt.yticks([]) plt.subplot(144),plt.imshow(img_back) plt.title('Result in JET'), plt.xticks([]), plt.yticks([]) plt.show() 

Figure: output FFT, HPF
1.2. Performance Optimization of DFT
Performance of DFT calculation is better for some array size: power of two, a product of 2’s, 3’s, and 5’s. If your array size is not satisfy the condition, you can modify the size of the array by padding zeros before calculating.
OpenCV provides a function, cv2.getOptimalDFTSize() to find the optimal size of the array
The new code:
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 import numpy as np import cv2 from matplotlib import pyplot as plt img = cv2.imread('opencv_logo.png',0) rows, cols = img.shape print rows,cols nrows = cv2.getOptimalDFTSize(rows) ncols = cv2.getOptimalDFTSize(cols) print nrows, ncols nimg = np.zeros((nrows,ncols)) nimg[:rows,:cols] = img dft = cv2.dft(np.float32(nimg),flags = cv2.DFT_COMPLEX_OUTPUT) dft_shift = np.fft.fftshift(dft) #fourier transform magnitude_spectrum = 20*np.log(cv2.magnitude(dft_shift[:,:,0],dft_shift[:,:,1])) crow,ccol = nrows/2 , ncols/2 # high pass filter, create a mask, center square is 0, remaining is 1 mask = np.ones((nrows,ncols,2),np.uint8) mask[crow-30:crow+30, ccol-30:ccol+30] = 0 # apply mask and inverse DFT fshift = dft_shift*mask f_ishift = np.fft.ifftshift(fshift) nimg_back = cv2.idft(f_ishift) img_back = nimg_back[:rows,:cols] #magnitude calculation is based on channels img_back = cv2.magnitude(img_back[:,:,0],img_back[:,:,1]) plt.subplot(141),plt.imshow(img, cmap = 'gray') plt.title('Input Image'), plt.xticks([]), plt.yticks([]) plt.subplot(142),plt.imshow(magnitude_spectrum, cmap = 'gray') plt.title('Magnitude Spectrum'), plt.xticks([]), plt.yticks([]) plt.subplot(143),plt.imshow(img_back, cmap = 'gray') plt.title('Image after HPF'), plt.xticks([]), plt.yticks([]) plt.subplot(144),plt.imshow(img_back) plt.title('Result in JET'), plt.xticks([]), plt.yticks([]) plt.show() 
1.3. Filters and Fourier Transform
Take the Fourier transform of Laplacian, Sobel, ... you can understand what kind of kernel it is.
Figure: Fourier transform of some filters
+ Low frequencies are near the origin
+ High frequencies are away from the origin

From the figure above you know what frequency region each kernel blocks, and what region each kernel passes.
So mean filter, Gaussian is low pass filter, remaining are high pass filters.
2. Template Matching
Template Matching is a method for searching and finding the location of a template image (contains objects) in an image.
OpenCV provides cv2.matchTemplate() for this feature. It slides the template image over the input image and compares the template image with a patch of input image.
There are some comparison methods:
cv2.TM_CCOEFF
cv2.TM_CCOEFF_NORMED
cv2.TM_CCORR
cv2.TM_CCORR_NORMED
cv2.TM_SQDIFF
cv2.TM_SQDIFF_NORMED
It returns a gray-scale image, where each pixel denotes how much does the neighborhood of that pixel match with template.
Note:
The problem with template matching is that it only works in very controlled environments. It works perfectly if  the template image was taken from the input image, but it won't work if the resolution is different or even if the image is a little bit turned.
If input image is of size (WxH) and template image is of size (wxh), output image will have a size of (W-w+1, H-h+1). In order to mark the region matching the template, use cv2.minMaxLoc() function to find the maximum or minimum location of the best matches then calculate top-left corner of rectangle and take (w,h) as width and height of the rectangle.
Make a demo to find the template image below:
Figure: template image
in the image:
Figure: input image
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 import cv2 import numpy as np from matplotlib import pyplot as plt inimage = cv2.imread('toy-story.jpg') img = cv2.cvtColor(inimage,cv2.COLOR_BGR2GRAY) img2 = img.copy() template = cv2.imread('template.png',0) w, h = template.shape[::-1] img = img2.copy() # Apply template Matching res = cv2.matchTemplate(img,template,cv2.TM_CCOEFF_NORMED) min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res) #find rectangle top_left = max_loc bottom_right = (top_left + w, top_left + h) cv2.circle(inimage,max_loc, 10, (0,255,0), -1) cv2.rectangle(inimage,top_left, bottom_right, (0,0,255), 2) #plot plt.subplot(121),plt.imshow(res,cmap = 'gray') plt.title('Matching Result'), plt.xticks([]), plt.yticks([]) plt.subplot(122),plt.imshow(cv2.cvtColor(inimage, cv2.COLOR_BGR2RGB)) plt.title('Detected Point'), plt.xticks([]), plt.yticks([]) plt.suptitle('TM_CCOEFF_NORMED') plt.show() 
Figure: red rectangle in output
In case, there are multiple matched objects, we will use thresholding to find best matched areas. and then extract location of these areas.
Make demo for image and template below:
Figure: input image
Figure: template
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 import cv2 import numpy as np from matplotlib import pyplot as plt inimage = cv2.imread('stars.png') img = cv2.cvtColor(inimage,cv2.COLOR_BGR2GRAY) img2 = img.copy() template = cv2.imread('star.png',0) w, h = template.shape[::-1] img = img2.copy() # Apply template Matching ress = cv2.matchTemplate(img,template,cv2.TM_CCOEFF_NORMED) threshold = 0.8 loc = np.where( ress >= threshold) for pt in zip(*loc[::-1]): cv2.circle(inimage,pt, 10, (0,255,0), -1) cv2.rectangle(inimage, pt, (pt + w, pt + h), (0,0,255), 2) #plot plt.subplot(111),plt.imshow(cv2.cvtColor(inimage, cv2.COLOR_BGR2RGB)) plt.title('Detected Point'), plt.xticks([]), plt.yticks([]) plt.suptitle('TM_CCOEFF_NORMED') plt.show() 
Figure: output
3.  Hough transform
Hough Transform is a popular technique to detect any shape. It can detect the shape even if it is broken or distorted a little bit.
In order to use this method you have to represent that shape in mathematical form.
3.1 Hough Line Transform
Let 's consider to detect a line.
In Cartesian coordinates: $y=mx+c$
In Polar coordinates $\rho =xcos\theta +ysin\theta$, line can be represented in these two terms $(\rho,\theta)$
Figure: a line in Polar coordinates
Create a 2D accumulator array and initialize it to 0. Let rows denote the $\rho$ and columns denote the $\theta$.
Size of array depends on the accuracy. If the accuracy of angles is 1 degree then number of columns is 180. If the accuracy of distance is 1 pixel, because the maximum distance of $\rho$ is the diagonal length of the image then number of rows is diagonal length of the image.
Consider a 100x100 image with a horizontal line at the middle. Do steps:
- Take the first point of the line with (x,y). Substitute (x,y) to $\rho =xcos\theta +ysin\theta$, for each value of $\theta$=0,1,2,....,180, we have one $\rho$. For every $(\rho,\theta)$ pair, you increment value by one in our accumulator in $(\rho,\theta)$ cells. So the cell (50,90) = 1 along with some other cells.
- Take the second point on the line. Do the same as above. Increment the the values in the cells corresponding to $(\rho,\theta)$. This time, the cell (50,90) = 2.
- Continue this process for every point on the line. At each point, the cell (50,90) will be incremented, while other cells may or may not be voted up.
- At the end, the cell (50,90) will have maximum votes which says, there is a line in this image at distance 50 from origin and at angle 90 degrees.
Figure: Hough Line Transform animation
OpenCV provides cv2.HoughLines() to detect lines in image with parameters:

+ First Input image should be a binary image.
Note: apply threshold or use canny edge detection before finding applying Hough transform.
+ Second and third parameters are $\rho$ and $\theta$ accuracies respectively.
+ Fourth argument is the threshold, which means minimum vote it should get for it to be considered as a line.
Note: number of votes depend upon number of points on the line. So it represents the minimum length of line that should be detected.
+ It returns an array of $(\rho,\theta)$ values.Each $(\rho,\theta)$ is a line.
Let 's make a demo to find lines for image below:
Figure: input image
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 import cv2 as cv import numpy as np import random img = cv.imread('images/lines.png') gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY) ret,thresh = cv.threshold(gray,150,255,cv.THRESH_BINARY_INV) lines = cv.HoughLines(thresh,1,5*np.pi/180,100) for line in zip(*lines[::-1]): b1 = random.randrange(1,255) g1 = random.randrange(1,255) r1 = random.randrange(1,255) rho,theta = line a = np.cos(theta) b = np.sin(theta) x0 = a*rho y0 = b*rho x1 = int(x0 + 1000*(-b)) y1 = int(y0 + 1000*(a)) x2 = int(x0 - 1000*(-b)) y2 = int(y0 - 1000*(a)) cv.line(img,(x1,y1),(x2,y2),(b1,g1,r1),2) cv.imshow('img',img) cv.waitKey(0) cv.destroyAllWindows() 
Figure: output using Hough Line Transform
The result has more than 2 lines, it depends on the accuracy of $(\rho,\theta)$. Moreover the line is not perfectly 1 pixel thin.
3.2 Probabilistic Hough Transform
Probabilistic Hough Transform is an optimization of Hough Transform. It doesn't take all the points into consideration. It only takes a random subset of points and that is sufficient for line detection. We just have to decrease the threshold.
OpenCV provides cv2.HoughLinesP(), it is implemented based on Robust Detection of Lines Using the Progressive Probabilistic Hough Transform.
+ minLineLength - Minimum length of line. Line segments shorter than this are rejected.
+ maxLineGap - Maximum allowed gap between line segments to treat them as single line.
+ It directly returns the two endpoints of lines.
Reuse the image above.

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 import cv2 import numpy as np img = cv2.imread('linen.png') gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY) edges = cv2.Canny(gray,50,150,apertureSize = 3) #using MORPH_CLOSE to extend line after using Canny kernel = np.ones((5,5),np.uint8) closing = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel) lines = cv2.HoughLinesP(closing,1,np.pi/180,120,minLineLength=100,maxLineGap=1);print(lines) for line in zip(*lines[::-1]): x1,y1,x2,y2 = line cv2.line(img,(x1,y1),(x2,y2),(0,255,0),2) cv2.imshow('detected lines',img) cv2.waitKey(0) cv2.destroyAllWindows() 
Figure: output using Probabilistic Hough Transform
3.3 Hough Circle Transform
A circle is represented by mathematical form;
$(x-x_{center}^{2})+(y-y_{center}^{2})=r^{2}$
where:
$(x_{center},y_{center})$ : center of circle
$r$ : radius of circle
We have 3 parameters $(x_{center},y_{center}), r$, so we need a 3D accumulator for hough transform. It is ineffective so Hough Gradient Method is used. This method uses the gradient information of edges.
OpenCV provides cv.HoughCircles() to detect circles in image. Just modify 2 parameters to achieve the accuracy.
+ param1: First method-specific parameter. In case of HOUGH_GRADIENT , it is the higher threshold of the two passed to the Canny edge detector (the lower one is twice smaller).
+ param2 : Second method-specific parameter. In case of HOUGH_GRADIENT , it is the accumulator threshold for the circle centers at the detection stage. The smaller it is, the more false circles may be detected. Circles, corresponding to the larger accumulator values, will be returned first.
Let 's make a demo for image below:
Figure: input image
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 import numpy as np import cv2 as cv import cv2 img = cv.imread('circles.png') gray_img = cv.cvtColor(img,cv.COLOR_BGR2GRAY) circles = cv.HoughCircles(gray_img,cv2.cv.CV_HOUGH_GRADIENT,1,20, param1=300,param2=20,minRadius=0,maxRadius=0) circles = np.uint16(np.around(circles)) for i in circles[0,:]: # draw the outer circle cv.circle(img,(i,i),i,(255,255,0),2) # draw the center of the circle cv.circle(img,(i,i),2,(0,0,255),3) cv.imshow('detected circles',img) cv.waitKey(0) cv.destroyAllWindows() 
Adjust param1, param2 to achieve expectation.
4. Distance Transform
This is an operator applied to binary images. The result  is a gray-scale image that is the same size to the input image, except that the gray level intensities of points inside foreground regions are changed to show the distance to the closest boundary.
Imagine that you simultaneously start a fire at all points on the boundary of a foreground region in a binary image and letting the fire burn its way into the interior. If we then label each point in the interior with the amount of time that the fire took to reach that point, then we have effectively computed the distance transform of that region.
There are several kinds of distance transform, depending on the method to calculate the distance between pixels such as Euclidean Distance, City Block Distance, Chessboard Distance.
Figure: distance transform using Chessboard Distance (source)
OpenCV provides cv.distanceTransform() with some kinds of distance transform.
Figure: some kinds of distance transform
Let make a demo with the image below:
Figure: input image

 1 2 3 4 5 6 7 8 9 import cv2 as cv import cv2 import numpy as np import random img = cv.imread('images/disttrans.png') gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY) ret,thresh = cv.threshold(gray,127,255,cv.THRESH_BINARY) dist_transform = cv.distanceTransform(thresh,cv2.cv.CV_DIST_L2,5) cv.imwrite('dist_transform.png',dist_transform) 
Figure: the interior is brighter than outside
5. Image Segmentation with Watershed Algorithm
Any grayscale image can be viewed as a topographic surface where high intensity denotes hills and low intensity denotes valleys.

Figure: grayscale image
Figure: grayscale image can be viewed as a topographic surface (source)
Start filling valleys (local minimal) with colored water (labels). As the water rises, water from different valleys, obviously with different colors will start to merge. To avoid that, you build barriers in the locations where water merges.
Figure: water starts filling valleys and barriers are on the peak of hills (source)
Until all the peaks are under water, the barriers are the segmentation result.
Figure: barriers are in red - segmentation result
OpenCV implemented a marker-based watershed algorithm. With this algorithm, we have to specify which valleys need to be merged. So we give different labels for objects:
+ Label the region which is the foreground or object with one color (or intensity)
+ Label the region which is background or non-object with another color.
+ Label the region which is undetermined with 0.
Then our marker will be updated with the labels we gave, and the boundaries of objects will have a value of -1.
6. Application of Distance Transform and Watershed Algorithm
Segment the touching coins in image below:
Figure: the touching coins
Code and steps to do in comments
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 import numpy as np import cv2 as cv import cv2 from matplotlib import pyplot as plt img = cv.imread('images/coins.png') gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY) ret, thresh = cv.threshold(gray,0,255,cv.THRESH_BINARY_INV+cv.THRESH_OTSU) #create marker marker = np.ones((gray.shape),np.int32) #remove small dots and get sure background kernel = np.ones((3,3),np.uint8) sure_bg = cv.morphologyEx(thresh, cv.MORPH_OPEN, kernel, iterations = 2) #it is ok to use sure background above or can use this sure background #sure_bg = cv.dilate(opening,kernel,iterations=3) # Finding sure foreground area dist_transform = cv.distanceTransform(thresh,cv2.cv.CV_DIST_L2,5) ret, sure_fg = cv.threshold(dist_transform,0.7*dist_transform.max(),255,0) sure_fg = np.uint8(sure_fg) #markers for unknown areas=sure_bg-sure_fg unknown = cv.subtract(sure_bg,sure_fg) marker[unknown==255] = 0 #markers for sureforeground contours, hierarchy = cv.findContours(sure_fg, cv.RETR_CCOMP, cv.CHAIN_APPROX_SIMPLE) idx = len(contours)+1 for cnt in contours: cv.drawContours(marker, [cnt], 0, idx, -1 ) idx = idx -1 # apply watershed cv.watershed(img, marker) img[marker == -1] = [255,0,0] #display segmentation result plt.subplot(111),plt.imshow(img) plt.title('Result in JET'), plt.xticks([]), plt.yticks([]) plt.show() 
Figure: segmented result using Watershed