Getting started with exploring GTSRB

The GTSRB dataset, compiled and generously published by the real-time computer vision research group at the Institut für Neuroinformatik, was originally used in a competition for classifying single images of traffic signs. It consists of a training set of 39,209 labeled images and a test set of 12,630 unlabeled images. The training set covers 43 classes, that is, 43 types of traffic signs. We will go through all the classes and exhibit several samples of each.

The dataset can be downloaded via http://benchmark.ini.rub.de/Dataset/GTSRB_Final_Training_Images.zip (located in the Downloads | Training dataset section on the page). Unzip the downloaded file and there will be a folder called Images containing 43 folders (00000, 00001, ... up to 00042); they represent the 43 classes of images. The images have the following characteristics (a quick sanity check of this layout follows the list):

  • The image files are in PPM (short for portable pixmap) format.
  • The number of images from each class ranges from 210 to 2250. So it is an unbalanced multi-class classification problem.
  • Each image contains one traffic sign.
  • The sizes of the images are not uniform, ranging from 15*15 to 250*250 pixels, and images are not necessarily square.
  • Images contain a border of up to 10% around the actual sign, so the sign is not necessarily centered within the image.
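To verify this layout, we can count the .ppm files in each class folder. This is a minimal sketch; it assumes the archive was unzipped so that the class folders sit under GTSRB/Final_Training/Images/, the same path used throughout this section:

> images_root <- "GTSRB/Final_Training/Images/" 
> class_dirs <- list.files(images_root)   # the class folders, 00000 to 00042 
> length(class_dirs)    # expect 43 
> # Count only the .ppm image files; each folder also holds an annotation CSV 
> image_counts <- sapply(class_dirs, function(d) 
+     length(list.files(paste(images_root, d, sep=""), pattern = "\\.ppm$"))) 
> range(image_counts)   # expect roughly 210 to 2250 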

Let's start by plotting a sample, 00000_00002.ppm, in the 00000 folder.

We use the pixmap package (https://cran.r-project.org/web/packages/pixmap) to read the PPM file:

> library('pixmap') 
> image <- read.pnm('GTSRB/Final_Training/Images/00000/00000_00002.ppm',cellres=1) 

Now we obtain a pixmapRGB object with attributes red, green, and blue (the pixel values for each of the three channels), as well as size, which holds the height and width of the image. We can access the red, green, and blue channels as follows:

> red_matrix <- matrix(image@red, nrow = image@size[1], ncol = image@size[2]) 
> green_matrix <- matrix(image@green, nrow = image@size[1], ncol = image@size[2]) 
> blue_matrix <- matrix(image@blue, nrow = image@size[1], ncol = image@size[2]) 

We visualize the original image and its three channels individually:

> plot(image, main=sprintf("Original")) 
> rotate <- function(x) t(apply(x, 2, rev)) 
> par(mfrow=c(1, 3)) 
> image(rotate(red_matrix), col = grey.colors(255), main=sprintf("Red")) 
> image(rotate(green_matrix), col = grey.colors(255), main=sprintf("Green")) 
> image(rotate(blue_matrix), col = grey.colors(255), main=sprintf("Blue")) 

Note that here we reuse the rotate function we defined in the last chapter so that the images display in the correct orientation. This is the original image:

The following images show the output for the red, green, and blue channels, respectively:

It is a 20 km/h speed limit sign.

We can now go through 43 classes of signs and display three samples for each type by defining the following function:

> plot_samples <- function(training_path, class, num_sample){ 
+     classes <- c("Speed limit (20km/h)", "Speed limit (30km/h)", 
+                  "Speed limit (50km/h)", "Speed limit (60km/h)", 
+                  "Speed limit (70km/h)", "Speed limit (80km/h)", 
+                  "End of speed limit (80km/h)", 
+                  "Speed limit (100km/h)", "Speed limit (120km/h)", 
+                  "No passing", 
+                  "No passing for vehicles over 3.5 metric tons", 
+                  "Right-of-way at the next intersection", 
+                  "Priority road", "Yield", "Stop", "No vehicles", 
+                  "Vehicles over 3.5 metric tons prohibited", 
+                  "No entry", "General caution", 
+                  "Dangerous curve to the left", 
+                  "Dangerous curve to the right", 
+                  "Double curve", "Bumpy road", "Slippery road", 
+                  "Road narrows on the right", "Road work", 
+                  "Traffic signals", "Pedestrians", 
+                  "Children crossing", "Bicycles crossing", 
+                  "Beware of ice/snow", "Wild animals crossing", 
+                  "End of all speed and passing limits", 
+                  "Turn right ahead", "Turn left ahead", "Ahead only", 
+                  "Go straight or right", "Go straight or left", 
+                  "Keep right", "Keep left", "Roundabout mandatory", 
+                  "End of no passing", 
+                  "End of no passing by vehicles over 3.5 metric tons") 
+     # Folder names are zero-padded to five digits 
+     if (class < 10) { 
+       path <- paste(training_path, "0000", class, "/", sep="") 
+     } else { 
+       path <- paste(training_path, "000", class, "/", sep="") 
+     } 
+     par(mfrow=c(1, num_sample)) 
+     # Randomly display num_sample samples 
+     # Note: all_files also lists the GT-*.csv annotation file, so the 
+     # printed count is one higher than the number of images 
+     all_files <- list.files(path = path) 
+     title <- paste('Class', class, ':', classes[class+1]) 
+     print(paste(title, "          (", length(all_files), 
+                 " samples)", sep="")) 
+     # Sample only from the .ppm image files 
+     files <- sample(all_files[grepl("\\.ppm$", all_files)], num_sample) 
+     for (file in files) { 
+       image <- read.pnm(paste(path, file, sep=""), cellres=1) 
+       plot(image) 
+     } 
+     mtext(title, side = 3, line = -23, outer = TRUE) 
+ } 

Call the function with class=0:

> training_path <- "GTSRB/Final_Training/Images/" 
> plot_samples(training_path, 0, 3) 
[1] "Class 0 : Speed limit (20km/h)          (211 samples)" 

Three samples are displayed:

Repeat this function call with a different class to go through the remaining 42 types; a loop over all the classes, as sketched next, also works:
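The following minimal loop (using plot_samples and training_path as already defined) walks through all 43 classes, pausing after each one so the plots can be inspected; the readline() prompt is just one convenient way to pause:

> for (class_id in 0:42) { 
+   plot_samples(training_path, class_id, 3) 
+   readline(prompt = "Press [Enter] for the next class") 
+ } 

Below are the individual calls and their outputs for the next few classes: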

> plot_samples(training_path, 1, 3) 
[1] "Class 1 : Speed limit (30km/h)          (2221 samples)" 

Three samples from class 1 are displayed as follows:

> plot_samples(training_path, 2, 3) 
[1] "Class 2 : Speed limit (50km/h)          (2251 samples)" 

Three images from class 2 are shown here:

> plot_samples(training_path, 3, 3) 
[1] "Class 3 : Speed limit (60km/h)          (1411 samples)" 

Here is the result for class 3:

> plot_samples(training_path, 4, 3) 
[1] "Class 4 : Speed limit (70km/h)          (1981 samples)" 

We plot three images from class 4:

Here we skip the remainder, but it is clear that the images were captured under various conditions: different weather, illumination, occlusion, rotation, and so on. Instead of showing them all, we list the sample size of each class for easy reference:

ID | Type                                               | Number of samples
0  | Speed limit (20km/h)                               | 211
1  | Speed limit (30km/h)                               | 2221
2  | Speed limit (50km/h)                               | 2251
3  | Speed limit (60km/h)                               | 1411
4  | Speed limit (70km/h)                               | 1981
5  | Speed limit (80km/h)                               | 1861
6  | End of speed limit (80km/h)                        | 421
7  | Speed limit (100km/h)                              | 1441
8  | Speed limit (120km/h)                              | 1411
9  | No passing                                         | 1471
10 | No passing for vehicles over 3.5 metric tons       | 2011
11 | Right-of-way at the next intersection              | 1321
12 | Priority road                                      | 2101
13 | Yield                                              | 2161
14 | Stop                                               | 781
15 | No vehicles                                        | 631
16 | Vehicles over 3.5 metric tons prohibited           | 421
17 | No entry                                           | 1111
18 | General caution                                    | 1201
19 | Dangerous curve to the left                        | 211
20 | Dangerous curve to the right                       | 361
21 | Double curve                                       | 331
22 | Bumpy road                                         | 391
23 | Slippery road                                      | 511
24 | Road narrows on the right                          | 271
25 | Road work                                          | 1501
26 | Traffic signals                                    | 601
27 | Pedestrians                                        | 241
28 | Children crossing                                  | 541
29 | Bicycles crossing                                  | 271
30 | Beware of ice/snow                                 | 451
31 | Wild animals crossing                              | 781
32 | End of all speed and passing limits                | 241
33 | Turn right ahead                                   | 690
34 | Turn left ahead                                    | 421
35 | Ahead only                                         | 1201
36 | Go straight or right                               | 391
37 | Go straight or left                                | 211
38 | Keep right                                         | 2071
39 | Keep left                                          | 301
40 | Roundabout mandatory                               | 361
41 | End of no passing                                  | 241
42 | End of no passing by vehicles over 3.5 metric tons | 241

Obviously, the signs, our regions of interest (ROI), are not centered within the images, and the image sizes vary. As a result, we need to separate the ROI from each image and standardize its size (resizing it to 32*32, as most researchers have done) before we can analyze and classify the data. For this we resort to the annotations provided along with the images. Each class folder contains an annotation file, for example, GT-00000.csv in the 00000 folder. Each annotation file contains the following useful fields:

  • Filename: The filename of the image
  • Roi.X1: The x coordinate of the top-left corner of the ROI bounding box
  • Roi.Y1: The y coordinate of the top-left corner of the ROI bounding box
  • Roi.X2: The x coordinate of the bottom-right corner of the ROI bounding box
  • Roi.Y2: The y coordinate of the bottom-right corner of the ROI bounding box
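To see what these fields look like, we can peek at one annotation file. This is just an illustrative check (the same read.csv call reappears in the preprocessing code later), and it assumes the GTSRB/Final_Training/Images/ path used above:

> annotation_00000 <- read.csv( 
+     file="GTSRB/Final_Training/Images/00000/GT-00000.csv", 
+     header=TRUE, sep=";") 
> head(annotation_00000[, c("Filename", "Roi.X1", "Roi.Y1", 
+                           "Roi.X2", "Roi.Y2")]) 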

Here is an example of the ROI in a sample:

Now we define the preprocessing function for a raw image, which includes separating the ROI and resizing it to 32*32:

> source("http://bioconductor.org/biocLite.R") 
> biocLite("EBImage") 
> library("EBImage") 
> roi_resize <- function(input_matrix, roi){ 
+     # Crop the ROI (rows are y, columns are x) and resize it to 32*32 
+     roi_matrix <- input_matrix[roi[1, 'Roi.Y1']:roi[1, 'Roi.Y2'], 
+                                roi[1, 'Roi.X1']:roi[1, 'Roi.X2']] 
+     return(resize(roi_matrix, 32, 32)) 
+ } 

Note that the resize function is from the EBImage package:

https://bioconductor.org/packages/release/bioc/html/EBImage.html
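biocLite() was the standard Bioconductor installer at the time of writing; on current Bioconductor releases it has been replaced by BiocManager. If the source()/biocLite() calls above fail, the following equivalent should install the same package:

> install.packages("BiocManager") 
> BiocManager::install("EBImage") 
> library("EBImage") 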

We try it out on our first sample (red channel only):

> # read annotation csv file 
> annotation <- read.csv(file="GTSRB/Final_Training/Images/00000/GT-00000.csv", header=TRUE, sep=";") 
> roi <- annotation[3, ] 
> red_matrix_cropped <- roi_resize(red_matrix, roi) 
> par(mfrow=c(1, 2)) 
> image(rotate(red_matrix), col = grey.colors(255) , main=sprintf("Original")) 
> image(rotate(red_matrix_cropped), col = grey.colors(255) , main=sprintf("Preprocessed")) 

We get the preprocessed red channel on the right:

Similarly, we can process the other two channels. Based on these three channels, how can we construct the feature space? Discarding any channel might result in loss of information, while simply stacking them up could lead to redundancy. A better solution is to combine the three channels into one. In the color world, Y'UV is an encoding system that encodes brightness information separately from color information. It is typically used as part of a color image pipeline and in computer graphics hardware. Y'UV represents human perception of color in terms of three components: Y', the luminance (brightness), and U and V, the chrominance (color). Y'UV can be converted from RGB using:

  • Y' = 0.299R + 0.587G + 0.114B
  • U = 0.492(B - Y')
  • V = 0.877(R - Y')

For our feature space, we take only the brightness channel, Y'.
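As a quick illustration before processing everything, the following sketch reuses roi, the channel matrices, roi_resize, and rotate from above: it crops the remaining two channels the same way the red channel was cropped, combines all three into the Y' channel, and displays the result:

> green_matrix_cropped <- roi_resize(green_matrix, roi) 
> blue_matrix_cropped <- roi_resize(blue_matrix, roi) 
> # Y' = 0.299R + 0.587G + 0.114B, computed element-wise 
> y_matrix_cropped <- 0.299 * red_matrix_cropped + 
+                     0.587 * green_matrix_cropped + 
+                     0.114 * blue_matrix_cropped 
> image(rotate(y_matrix_cropped), col = grey.colors(255), 
+       main=sprintf("Brightness Y'")) 

This is exactly the conversion applied to every training image in the loading function that follows.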

Now that we have the last piece of the preprocessing ready, let's put it all together and load and process (ROI cropping + resizing + conversion to Y') the entire labeled dataset:

> load_labeled_data <- function(training_path, classes){ 
+   # Initialize the pixel features X and target y 
+   X <- matrix(, nrow = 0, ncol = 32*32) 
+   y <- vector() 
+   # Load images from each of the 43 classes 
+   for(i in classes) { 
+     print(paste('Loading images from class', i)) 
+     if (i < 10) { 
+       annotation_path <- paste(training_path, "0000", i, 
+                                "/GT-0000", i, ".csv", sep="") 
+       path <- paste(training_path, "0000", i, "/", sep="") 
+     } else { 
+       annotation_path <- paste(training_path, "000", i, 
+                                "/GT-000", i, ".csv", sep="") 
+       path <- paste(training_path, "000", i, "/", sep="") 
+     } 
+     annotation <- read.csv(file=annotation_path, header=TRUE, 
+                            sep=";") 
+     for (row in 1:nrow(annotation)) { 
+       # Read each image 
+       image_path <- paste(path, annotation[row, "Filename"], sep="") 
+       image <- read.pnm(image_path, cellres=1) 
+       # Parse the RGB color space 
+       red_matrix <- matrix(image@red, nrow = image@size[1], 
+                            ncol = image@size[2]) 
+       green_matrix <- matrix(image@green, nrow = image@size[1], 
+                              ncol = image@size[2]) 
+       blue_matrix <- matrix(image@blue, nrow = image@size[1], 
+                             ncol = image@size[2]) 
+       # Crop the ROI and resize to 32*32 
+       red_matrix_cropped <- roi_resize(red_matrix, 
+                                        annotation[row, ]) 
+       green_matrix_cropped <- roi_resize(green_matrix, 
+                                          annotation[row, ]) 
+       blue_matrix_cropped <- roi_resize(blue_matrix, 
+                                         annotation[row, ]) 
+       # Convert to brightness, that is, the Y' channel 
+       x <- 0.299 * red_matrix_cropped + 0.587 * 
+            green_matrix_cropped + 0.114 * blue_matrix_cropped 
+       X <- rbind(X, matrix(x, 1, 32*32)) 
+       y <- c(y, i) 
+     } 
+   } 
+   return(list("x" = X, "y" = y)) 
+ } 

After defining the data loading function as shown previously, we apply it to the entire raw dataset:

> classes <- 0:42 
> data <- load_labeled_data(training_path, classes) 

Be patient as it might take a couple of hours to read and process 39,209 images. Just in case anything unexpected happens, a good practice is to save the data object so that we can restore it anytime later:

> # Save the data object to a file 
> saveRDS(data, file = "43 classes.rds") 
> # Restore the data object 
> data <- readRDS(file = "43 classes.rds") 

Just do a quick check on the ready-to-use data:

> data.x <- data$x 
> data.y <- data$y 
> dim(data.x) 
[1] 39209  1024 

Correct dimension!

> summary(as.factor(data.y)) 
   0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22  
 210 2220 2250 1410 1980 1860  420 1440 1410 1470 2010 1320 2100 2160  780  630  420 1110 1200  210  360  330  390  
  23   24   25   26   27   28   29   30   31   32   33   34   35   36   37   38   39   40   41   42  
 510  270 1500  600  240  540  270  450  780  240  689  420 1200  390  210 2070  300  360  240  240 

Correct class sizes (each count printed earlier by plot_samples was one higher because it also counted the annotation CSV file in the folder), and again they are rather unbalanced!

Never skip checking the class balance for classification.
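One quick way to do that check here is to plot the per-class counts we just summarized; a minimal sketch with base graphics:

> class_counts <- table(data.y) 
> barplot(class_counts, xlab="Class ID", ylab="Number of samples", 
+         main="Class distribution of the GTSRB training set") 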

After ensuring that the data is loaded and processed properly, we do more exploratory analysis on the distribution of the features, that is, the pixels. As an example, we take the 16 pixels from the central 4*4 block (the 222nd to 225th, 254th to 257th, 286th to 289th, and 318th to 321st features) in each image from class 1 (Speed limit, 30km/h), 14 (Stop), 20 (Dangerous curve to the right), and 27 (Pedestrians). We display their histograms:

> central_block <- c(222:225, 254:257, 286:289, 318:321) 
> par(mfrow=c(2, 2)) 
> for(i in c(1, 14, 20, 27)) { 
+   hist(c(as.matrix(data.x[data.y==i, central_block])),  
+        main=sprintf("Histogram for class %d", i),  
+        xlab="Pixel brightness") 
+ } 

The resulting pixel brightness histograms are displayed as follows:

The brightness of the central pixels is distributed differently among these four classes. For instance, the majority of the central pixels from class 20 are dark, as the sign (Dangerous curve to the right) has a thick black stroke through the center; while in class 14, the stop sign has a white stroke (the left part of the O) near the central area. Pixels taken from other positions can also be distinctly distributed among different classes.

The exploratory analysis we just conducted helps us to move forward with building classification models based on pixels.
