Understanding an Image

Grid-like representation

A 2D image can be represented as a rectangular grid, composed of many square cells, called pixels.

How do these pixels look like?

Just try to imagine a chessboard:


Just like the black and white squares on the chessboard, pixels are nicely aligned in straight lines, both horizontally and vertically. We will refer to the horizontal ones as rows and to the vertical ones as columns. It is easy to see that a chessboard has 8 rows and 8 columns.

But can you guess how many rows and columns of pixels this image has?


To find this out using Python and OpenCV:

1) Save the image in the folder where you wish to run your Python code.
2) Import OpenCV library in your Python interpreter or .py file (if you are not familiar with running Python code, just look it up, it should be pretty easy).

import cv2

3) Load the image in memory (we will refer to this operation as reading an image).

image = cv2.imread('<replace_with_image_name>.jpg')

4) Print the image dimensions.

print image.shape

The program should print (232, 223, 3). The first two numbers are the number of rows and number of columns. So, 232 x 223 = 51736 pixels in total. For now, don’t worry about the third number, it represents the number of colors or, more formally, the number of color channels. We will talk about it in the last section.

Identifying pixels

Now that you know how to find the number of pixels in an image, how do you differentiate between them?

First of all through their address. In real life we use street names, house numbers, city, country etc. Pixels could also be seen as having a street name and a house number.

Street Name = How many rows are there upwards of the pixel?
House Number = How many columns are there to the left of the pixel?

A pixel location is formally defined as a pair (row, column), both indexed from 0 (their numbering starts from 0, instead of 1). See if you can guess the location of the red, green and blue pixels on the following grid.


The right answers are:
Red pixel at (3,5)
Green pixel at (13,8)
Blue pixel at (10,21)

In Python, for accessing a pixel at location (row, column), we would have to specify the image that we are referring to, and the row and column in square brackets. For example, the pixel in the center of the image would be accessed as follows:

import cv2

image = cv2.imread('<replace_with_image_name>.jpg')
no_rows = image.shape[0]
no_cols = image.shape[1]

# Remember that image.shape returns (no_rows, no_cols, no_channels)
# [0] fetches the first number in brackets, [1] - the second one and so on

row = no_rows/2
col = no_cols/2
print image[row, col]

If you run this on the image with Spiderman, the script should print [ 63 75 141].

What color do you expect it to have?

We can visualize the subimage that contains this only one pixel. Generally, if you want to crop a rectangular portionwith top-left pixel at (start_row, start_col) and bottom-right pixel at (end_row, end_col), you would use:

sub_image = image[start_row:end_row + 1, start_col:end_col + 1]

and for our particular case:

pixel_image = image[row:row+1, col:col+1]

Because this is just a single-pixel image, it is more difficult to visualize, so I recommend resizing the image before displaying it. This can be done as follows:

desired_no_rows = 100
desired_no_cols = 100
pixel_image = cv2.resize(pixel_image, (desired_no_cols, desired_no_rows))

Now, if you just want to display the image obtained, there is the cv2.imshow(window_name, image) function, which is perfect especially for debugging, when you change an image a number of times and you want to make sure that you are doing the right thing after each step.

For writing an image to disk, you would use the cv2.imwrite(image_name, image) function.

Code for our case:

cv2.imshow('This looks like one pixel', pixel_image)
cv2.waitKey() # For resuming running the code once a key is pressed
cv2.imwrite('pixel.png', pixel_image)

And the pixel that I got is this one


Looks reddish, which makes sense because the center of the image falls on a red patch from Spiderman’s costume.

Another useful tool when analyzing images is the iteration through every pixel. This might turn out helpful in finding pixels or regions with certain properties, or in couting certain events that occur in the image.

Row by row (one street at a time):

no_rows = image.shape[0]
no_cols = image.shape[1]
for row in range(no_rows):
    for col in range(no_cols):
        print image[row, col], 'at row', row, 'column', col

Column by column (one house number at a time):

no_rows = image.shape[0]
no_cols = image.shape[1]
for col in range(no_cols):
    for row in range(no_rows):
        print image[row, col], 'at row', row, 'column', col

Color representation

Let us get back to the chessboard analogy. Each cell was either black or white. Pixels are the same, but instead of just 2 possible values to describe one pixel

alt {0 = black, 1 = white}

we have 256 possibilities for in-between, transitional colors for grayscale images

alt {0 = black, 1, 2, …, 128 = gray, …, 255 = white}

and 256 x 256 x 256 possibilities for colored(BGR) images.

alt x (256 variations in the strength of blue)

alt x (256 variations in the strength of green)

alt (256 variations in the strength of red)

So the color of pixels can be represented in a 3-dimensional space.
Remember [ 63 75 141]?
The description of the pixel in the center.
And the third number in (232, 223, 3)?
Which showed the number of color channels.

Well, one component (axis) is to measure the strength of blue(63), one to measure the strength of green(75) and one to measure the strength of red(141). It turns out that combinations of these 3 colors are enough to create most of the visually perceivable colors by humans. Thus, the BGR color space can be viewed as a cube, where 3 opposite points represent blue, green and red. All the other points represent combinations of these 3 core colors.


Where blue(255,0,0) and red(0,0,255) meet, we get magenta(255,0,255).
Where red(0,0,255) and green(0,255,0) meet, we get yellow(0,255,255).
Where green(0,255,0) and blue(255,0,0) meet, we get cyan(255,255,0).

What can you notice about colors in the BGR color space?

Their components are summed up independently.

What BGR components do you think black has?
What BGR components do you think white has?

Have a look at the highlighted region in the image with Spiderman…


…put under the microscope


Source code to reproduce this effect for any image

If you look closely, you will notice that I only used black, blue, green and red pixels. Therefore no yellow and no white. They are simply an illusion. Feel free to inspect the white regions and see that white is created by strong blue, strong green and strong red. On the other hand, black is the result of the absence of blue, green and red components. This is why the BGR color space is called additive. The more color you add, the brighter the result is.

Another way to come to the same conclusion is to display one component at a time. That is, if I want to see the green component of an image, I’ll simply make the other two components equal zero. And the same with blue and green independently.

blue_component = image.copy() 
green_component = image.copy()
red_component = image.copy()

# First copy image into 3 separate images
# Then make other components equal 0
# Can use the code for iterating an image

no_rows = image.shape[0]
no_cols = image.shape[1]

# Isolate blue component
for row in range(no_rows):
    for col in range(no_cols):
        blue = blue_component[row, col, 0]
        blue_component[row, col] = (blue, 0, 0)

cv2.imwrite('blue.jpg', blue_component)

# Isolate green component
for row in range(no_rows):
    for col in range(no_cols):
        green = green_component[row, col, 1]
        green_component[row, col] = (0, green, 0)

cv2.imwrite('green.jpg', green_component)

# Isolate red component
for row in range(no_rows):
    for col in range(no_cols):
        red = red_component[row, col, 2]
        red_component[row, col] = (0, 0, red)

cv2.imwrite('red.jpg', red_component)

The wiser alternative is to use cv2.split and cv2.merge as follows:

zeros = np.zeros((no_rows, no_cols), np.uint8)
B,G,R = cv2.split(image)

blue_component = cv2.merge((B, zeros, zeros))
green_component = cv2.merge((zeros, G, zeros))
red_component = cv2.merge((zeros, zeros, R))

And the result is:


The last image is in a different colorspace, called grayscale. We have seen that this space has only 256 variations of light intensity (or the color gray). Having just one component (in this case the intensity component) can turn out very convenient, because it can enable us to view an image as a landscape, which can provide many valuable intuitions for certain image processing tasks.

Conversion from BGR to grayscale:

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

Viewed from above it is very similar to the initial image alt
From the sides we can start seeing the dark spots as valleys alt
And the white spots as mountains, peaks or plateaus alt
Transitions can be smooth or abrupt and help us distinguish borders and regions alt


After reading this article you should know or keep in mind the following:

  1. How images have a grid-like representation, with small square cells named pixels.
  2. Using cv2.imread to store image in memory.
  3. Using shape to get image dimensions.
  4. Using cv2.imshow to display an image, good for debugging purposes.
  5. Using cv2.imwrite to save an image to disk.
  6. Getting the pixel at a specific location, given by row and column.
  7. Iterating through an image.
  8. Extracting a rectangular region from an image.
  9. Using cv2.resize to adapt an image to certain dimensions.
  10. Color representation: grayscale vs. BGR.
  11. Using cv2.cvtColor to convert between different color spaces.
  12. Separation of the blue, green and red components in BGR images (cv2.split, cv2.merge).
  13. Landscape representation of grayscale images.


1.. Mirror the Image

Using the iteration process of an image, flip an image around the vertical axis. Now flip it around the horizontal axis. If you are already familiar with writing such for loops, note that you can use the

cv2.flip(image, flipCode) function.



Vertical flip:


Horizontal flip:



# Run: python mirror.py <image_name>

import cv2
import sys

# flips around the horizontal axis, done manually(with iteration)
def flipHorizontallyWithIteration(image):
    flipped = image.copy()
    no_rows = flipped.shape[0]
    no_cols = flipped.shape[1]
    for col in range(no_cols):
        halfway = no_rows/2
        for row in range(halfway):
            pix = flipped[row, col].copy()
            flipped[row, col] = flipped[no_rows - row - 1, col].copy()
            flipped[no_rows - row - 1, col] = pix
    return flipped

# flips around the vertical axis, done manually(with iteration)
def flipVerticallyWithIteration(image):
    flipped = image.copy()
    no_rows = flipped.shape[0]
    no_cols = flipped.shape[1]
    for row in range(no_rows):
        halfway = no_cols/2
        for col in range(halfway):
            pix = flipped[row, col].copy()
            flipped[row, col] = flipped[row, no_cols - col - 1].copy()
            flipped[row, no_cols - col - 1] = pix
    return flipped

# flips around the horizontal axis
def flipHorizontally(image):
    flipped = cv2.flip(image, 0)
    return flipped

# flips around the vertical axis
def flipVertically(image):
    flipped = cv2.flip(image, 1)
    return flipped

# fetch the name of the image and store it in memory
image_name = sys.argv[1]
image = cv2.imread(image_name)

vertical = flipVerticallyWithIteration(image)
horizontal = flipHorizontallyWithIteration(image)

# output the images
cv2.imwrite('vertical_flip.jpg', vertical)
cv2.imwrite('horizontal_flip.jpg', horizontal)


2.. Make a Puzzle

Write a method that first divides the image into square blocks with a given length(a certain number of pixels on each side). To make sure each block contains the same number of pixels from the initial image, adjust the image dimensions to a multiple of the requested length. Your method should then scramble the blocks in a random or different order. To make sure you correctly divide the image into blocks, add borders to them before merging them back together. You could use the

cv2.copyMakeBorder(image, pixels_on_top, pixels_on_bottom, pixels_on_left, pixels_on_right) function.



Example result (60 pixels):


Example result (30 pixels):


Example result (15 pixels):



# Run: python puzzle.py <image_name> <no pixels/block side> <border thickness in pixels>

import cv2
import numpy as np
import sys
import random

# Input: which image to scramble, 
# the number of pixels on one side of a square block, 
# the number of pixels for border between blocks
# Output: scrambled image
def scrambleImage(image, pixs, addBorder):
    scrambled = image.copy()

    no_rows = scrambled.shape[0]
    no_cols = scrambled.shape[1]

    rows_n = (no_rows/pixs)*pixs
    cols_n = (no_cols/pixs)*pixs

    # resize to make sure each block has the same number of pixels from the initial image
    scrambled = cv2.resize(scrambled, (cols_n, rows_n))

    # store all the blocks in a list, which we shuffle later and remerge
    allBlocks = []

    # for loops to extract all blocks
    for row in range(rows_n-pixs+1):
        for col in range(cols_n-pixs+1):
            if (row % pixs == 0 and col % pixs == 0):
                block = scrambled[row:row+pixs, col:col+pixs].copy()

    # randomly shuffle

    # recompute the number of rows and columns after border is added
    hm_rows = rows_n/pixs
    hm_cols = cols_n/pixs
    rows_n = rows_n + 2*hm_rows*addBorder
    cols_n = cols_n + 2*hm_cols*addBorder
    scrambled = np.zeros((rows_n, cols_n, 3), np.uint8)
    pixs = pixs + 2*addBorder

    next_block = 0

    # reassemble the blocks
    for row in range(rows_n-pixs+1):
        for col in range(cols_n-pixs+1):
            if (row % pixs == 0 and col % pixs == 0):
                cur_block = allBlocks[next_block].copy()
                next_block += 1
                with_borders = cv2.copyMakeBorder(cur_block,
                    addBorder,addBorder,addBorder,addBorder,cv2.BORDER_CONSTANT,value = 0)
                scrambled[row:row+pixs, col:col+pixs] = with_borders

    return scrambled

image_name = sys.argv[1]
pix_per_square = int(sys.argv[2])
border_thick = int(sys.argv[3])
image = cv2.imread(image_name)

scrambled = scrambleImage(image, pix_per_square, border_thick)

cv2.imwrite(str(pix_per_square) + '_scrambled_' + image_name, scrambled)


3.. Replace by the Average Pixel

Write a method that replaces a given rectangular region specified by the top-left pixel and bottom-right pixel with the average of the pixels contained.



Example result (region consisting of top 15 rows replaced by average):


Example result (all rows replaced by their average):


Example result (all columns replaced by their average):



# Run: python average.py <image_name> <start_row> <end_row> <start_col> <end_col>

import cv2
import numpy as np
import sys

# replaces a given rectangluar portion by its average
def replaceRectByAverage(image, startRow, endRow, startCol, endCol):
    blue_sum = 0.0
    green_sum = 0.0
    red_sum = 0.0
    pixel_count = 0

    rectangle = image[startRow:endRow+1, startCol:endCol+1]
    rows = rectangle.shape[0]
    cols = rectangle.shape[1]

    for row in range(rows):
        for col in range(cols):
            pixel = rectangle[row, col]
            blue_sum += float(pixel[0])
            green_sum += float(pixel[1])
            red_sum += float(pixel[2])
            pixel_count += 1

    #print blue_sum, green_sum, red_sum, pixel_count

    average_blue = int(blue_sum/float(pixel_count))
    average_green = int(green_sum/float(pixel_count))
    average_red = int(red_sum/float(pixel_count))

    #print average_blue, average_green, average_red

    new_image = image.copy()
    new_rectangle = np.zeros((1,1,3), np.uint8)
    new_rectangle[0,0] = (average_blue, average_green, average_red)
    new_rectangle = cv2.resize(new_rectangle, (cols, rows))
    new_image[startRow:endRow+1, startCol:endCol+1] = new_rectangle

    return new_image

# replaces each row by the average pixel in the row
def replaceEachRowByAverage(image):
    average_rows = image.copy()
    no_rows = image.shape[0]
    no_cols = image.shape[1]
    for row in range(no_rows):
        average_rows = replaceRectByAverage(average_rows, row, row, 0, no_cols - 1)
    cv2.imwrite('average_rows.jpg', average_rows)

# replaces each column by the average pixel in the column
def replaceEachColByAverage(image):
    average_cols = image.copy()
    no_rows = image.shape[0]
    no_cols = image.shape[1]
    for col in range(no_cols):
        average_cols = replaceRectByAverage(average_cols, 0, no_rows - 1, col, col)
    cv2.imwrite('average_cols.jpg', average_cols)

image_name = sys.argv[1]
sr = int(sys.argv[2]) # start row
er = int(sys.argv[3]) # end row
sc = int(sys.argv[4]) # start col
ec = int(sys.argv[5]) # end col

image = cv2.imread(image_name)

average_image = replaceRectByAverage(image, sr, er, sc, ec)
cv2.imwrite('average_' + image_name, average_image)



If you found this useful please take a moment to share this with your friends.

Subscribe to our mailing list

  5 comments for “Understanding an Image

  1. David Miller
    December 16, 2014 at 10:01 am

    That’s awesome, thank you for the so simple and clear-cut explanation.

  2. t
    December 17, 2014 at 2:31 am

    How were the 3D (“Viewed from above”) images generated? I’m not familiar with OpenCV, so apologies if it’s something really obvious.

  3. January 5, 2015 at 11:57 pm

    Isolating the components of an image is better accomplished with the cv2.split function. Explicitly looping the rows and columns is very slow and expensive.

    • January 6, 2015 at 11:58 am

      Thanks for the comment. I added a snippet to show the use of cv2.split and cv2.merge functions.

Leave a Reply

Your email address will not be published. Required fields are marked *