The GTSRB dataset, compiled and generously published by the real-time computer vision research group at the Institut für Neuroinformatik, was originally used in a competition for classifying single images of traffic signs. It consists of a training set of 39,209 labeled images and a test set of 12,630 unlabeled images. The training set covers 43 classes, that is, 43 types of traffic signs. We will go through all the classes and exhibit several samples of each.
The dataset can be downloaded from http://benchmark.ini.rub.de/Dataset/GTSRB_Final_Training_Images.zip (located in the Downloads | Training dataset section on the page). Unzip the downloaded file and you will find a folder called Images containing 43 subfolders (00000, 00001, ... up to 00042), one per class.
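Alternatively, the archive can be fetched and unpacked from within R, as in the following minimal sketch (assuming the URL above is still live and you want the files in your working directory):
> url <- "http://benchmark.ini.rub.de/Dataset/GTSRB_Final_Training_Images.zip"
> # Download in binary mode and extract into the working directory
> download.file(url, destfile = "GTSRB_Final_Training_Images.zip", mode = "wb")
> unzip("GTSRB_Final_Training_Images.zip")  # unpacks to GTSRB/Final_Training/Images/
The images have the following properties: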
- The image files are in PPM (short for portable pixmap) format.
- The number of images per class ranges from 210 to 2,250, so this is an imbalanced multi-class classification problem (see the quick check after this list).
- Each image contains one traffic sign.
- The sizes of the images are not uniform, ranging from 15*15 to 250*250 pixels, and images are not necessarily square.
- Images contain a border of up to 10% around the actual sign, so the sign is not necessarily centered within the image.
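To verify the imbalance just mentioned, a small base-R sketch counts the .ppm files in each class folder (assuming the archive was unzipped into the working directory as described above):
> training_path <- "GTSRB/Final_Training/Images/"
> folders <- sprintf("%05d", 0:42)  # "00000" through "00042"
> # Count only the image files in each class folder
> counts <- sapply(folders, function(f)
+     length(list.files(paste(training_path, f, sep=""), pattern="\\.ppm$")))
> range(counts)  # expect 210 and 2250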
Let's start by plotting a sample, 00000_00002.ppm, in the 00000 folder.
We use the pixmap package (https://cran.r-project.org/web/packages/pixmap) to read the PPM file:
> library('pixmap')
> image <- read.pnm('GTSRB/Final_Training/Images/00000/00000_00002.ppm', cellres=1)
Now we obtain a pixmapRGB object with the attributes red, green, and blue (the pixel values for each of the three channels), as well as size, which holds the height and width of the image. We can access the red, green, and blue channels as follows:
> red_matrix <- matrix(image@red, nrow = image@size[1], ncol = image@size[2])
> green_matrix <- matrix(image@green, nrow = image@size[1], ncol = image@size[2])
> blue_matrix <- matrix(image@blue, nrow = image@size[1], ncol = image@size[2])
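As a quick optional check (a small sketch), the size slot confirms the image dimensions, and read.pnm has already scaled the channel values to the [0, 1] range:
> image@size          # height and width of the image in pixels
> range(red_matrix)   # pixel values lie in [0, 1]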
We visualize the original image and its three channels individually:
> plot(image, main=sprintf("Original"))
> rotate <- function(x) t(apply(x, 2, rev))
> par(mfrow=c(1, 3))
> image(rotate(red_matrix), col = grey.colors(255), main=sprintf("Red"))
> image(rotate(green_matrix), col = grey.colors(255), main=sprintf("Green"))
> image(rotate(blue_matrix), col = grey.colors(255), main=sprintf("Blue"))
Note that we reuse the rotate function defined in the last chapter to orient the images correctly for viewing. This is the original image:
The following images show the output for the red, green, and blue channels, respectively:
It is a 20 km/h speed limit sign.
We can now go through 43 classes of signs and display three samples for each type by defining the following function:
> plot_samples <- function(training_path, class, num_sample){
+     classes <- c("Speed limit (20km/h)", "Speed limit (30km/h)", "Speed limit (50km/h)",
+                  "Speed limit (60km/h)", "Speed limit (70km/h)", "Speed limit (80km/h)",
+                  "End of speed limit (80km/h)", "Speed limit (100km/h)",
+                  "Speed limit (120km/h)", "No passing",
+                  "No passing for vehicles over 3.5 metric tons",
+                  "Right-of-way at the next intersection", "Priority road", "Yield", "Stop",
+                  "No vehicles", "Vehicles over 3.5 metric tons prohibited", "No entry",
+                  "General caution", "Dangerous curve to the left",
+                  "Dangerous curve to the right", "Double curve", "Bumpy road",
+                  "Slippery road", "Road narrows on the right", "Road work",
+                  "Traffic signals", "Pedestrians", "Children crossing", "Bicycles crossing",
+                  "Beware of ice/snow", "Wild animals crossing",
+                  "End of all speed and passing limits", "Turn right ahead",
+                  "Turn left ahead", "Ahead only", "Go straight or right",
+                  "Go straight or left", "Keep right", "Keep left", "Roundabout mandatory",
+                  "End of no passing",
+                  "End of no passing by vehicles over 3.5 metric tons")
+     if (class < 10) {
+         path <- paste(training_path, "0000", class, "/", sep="")
+     } else {
+         path <- paste(training_path, "000", class, "/", sep="")
+     }
+     par(mfrow=c(1, num_sample))
+     # List only the .ppm image files (each folder also contains an annotation CSV)
+     all_files <- list.files(path = path, pattern = "\\.ppm$")
+     title <- paste('Class', class, ':', classes[class+1])
+     print(paste(title, " (", length(all_files), " samples)", sep=""))
+     # Randomly display num_sample samples
+     files <- sample(all_files, num_sample)
+     for (file in files) {
+         image <- read.pnm(paste(path, file, sep=""), cellres=1)
+         plot(image)
+     }
+     mtext(title, side = 3, line = -23, outer = TRUE)
+ }
Call the function with class=0:
> training_path <- "GTSRB/Final_Training/Images/"
> plot_samples(training_path, 0, 3)
[1] "Class 0 : Speed limit (20km/h) (210 samples)"
Three samples are displayed:
Repeat this function call with a different class (or use a loop, as sketched next) to go through the remaining 42 types.
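A minimal loop sketch (each call produces a new plot, so it is best run interactively):
> for (class_id in 1:42) {
+     plot_samples(training_path, class_id, 3)
+ }
For illustration, we step through the next few classes one at a time: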
> plot_samples(training_path, 1, 3)
[1] "Class 1 : Speed limit (30km/h) (2220 samples)"
Three samples from class 1 are displayed as follows:
> plot_samples(training_path, 2, 3)
[1] "Class 2 : Speed limit (50km/h) (2250 samples)"
Three images from class 2 are shown here:
> plot_samples(training_path, 3, 3)
[1] "Class 3 : Speed limit (60km/h) (1410 samples)"
Here is the result for class 3:
> plot_samples(training_path, 4, 3)
[1] "Class 4 : Speed limit (70km/h) (1980 samples)"
We plot three images from class 4:
Here we skip the remainder, but it is clear that the images were captured under varying conditions of weather, illumination, occlusion, rotation, and so on. Instead, we list the number of samples in each class for easy reference:
| ID | Type | Number of samples |
| --- | --- | --- |
| 0 | Speed limit (20km/h) | 210 |
| 1 | Speed limit (30km/h) | 2220 |
| 2 | Speed limit (50km/h) | 2250 |
| 3 | Speed limit (60km/h) | 1410 |
| 4 | Speed limit (70km/h) | 1980 |
| 5 | Speed limit (80km/h) | 1860 |
| 6 | End of speed limit (80km/h) | 420 |
| 7 | Speed limit (100km/h) | 1440 |
| 8 | Speed limit (120km/h) | 1410 |
| 9 | No passing | 1470 |
| 10 | No passing for vehicles over 3.5 metric tons | 2010 |
| 11 | Right-of-way at the next intersection | 1320 |
| 12 | Priority road | 2100 |
| 13 | Yield | 2160 |
| 14 | Stop | 780 |
| 15 | No vehicles | 630 |
| 16 | Vehicles over 3.5 metric tons prohibited | 420 |
| 17 | No entry | 1110 |
| 18 | General caution | 1200 |
| 19 | Dangerous curve to the left | 210 |
| 20 | Dangerous curve to the right | 360 |
| 21 | Double curve | 330 |
| 22 | Bumpy road | 390 |
| 23 | Slippery road | 510 |
| 24 | Road narrows on the right | 270 |
| 25 | Road work | 1500 |
| 26 | Traffic signals | 600 |
| 27 | Pedestrians | 240 |
| 28 | Children crossing | 540 |
| 29 | Bicycles crossing | 270 |
| 30 | Beware of ice/snow | 450 |
| 31 | Wild animals crossing | 780 |
| 32 | End of all speed and passing limits | 240 |
| 33 | Turn right ahead | 689 |
| 34 | Turn left ahead | 420 |
| 35 | Ahead only | 1200 |
| 36 | Go straight or right | 390 |
| 37 | Go straight or left | 210 |
| 38 | Keep right | 2070 |
| 39 | Keep left | 300 |
| 40 | Roundabout mandatory | 360 |
| 41 | End of no passing | 240 |
| 42 | End of no passing by vehicles over 3.5 metric tons | 240 |
Obviously, the signs, our regions of interest (ROI), are not centered within the images, and the image sizes vary. We therefore need to separate the ROI from each image and standardize its size (resizing it to 32*32, as most researchers have done) before we can analyze and classify the data. For this we resort to the annotations provided along with the images. Each class folder contains an annotation file, for example, GT-00000.csv in the 00000 folder. Each annotation file contains the following useful fields:
- Filename: The filename of the image
- Roi.X1: The x coordinate of the top-left corner of the ROI bounding box
- Roi.Y1: The y coordinate of the top-left corner of the ROI bounding box
- Roi.X2: The x coordinate of the bottom-right corner of the ROI bounding box
- Roi.Y2: The y coordinate of the bottom-right corner of the ROI bounding box
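Before writing the preprocessing function, we can peek at an annotation file to confirm the column names and the semicolon separator (a quick sketch):
> annotation <- read.csv(file="GTSRB/Final_Training/Images/00000/GT-00000.csv",
+                        header=TRUE, sep=";")
> # Show the first few rows of the fields we need
> head(annotation[, c("Filename", "Roi.X1", "Roi.Y1", "Roi.X2", "Roi.Y2")])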
Here is an example of the ROI in a sample:
Now we define the preprocessing function for a raw image, which includes separating the ROI and resizing it to 32*32:
> source("http://bioconductor.org/biocLite.R") > biocLite("EBImage") > library("EBImage") > roi_resize <- function(input_matrix, roi){ + roi_matrix <- input_matrix[roi[1, 'Roi.Y1']:roi[1, 'Roi.Y2'], roi[1, 'Roi.X1']:roi[1, 'Roi.X2']] + return(resize(roi_matrix, 32, 32)) + }
Note that the resize function is from the EBImage package:
https://bioconductor.org/packages/release/bioc/html/EBImage.html
We try it out on our first sample (red channel only):
> # Read the annotation CSV file
> annotation <- read.csv(file="GTSRB/Final_Training/Images/00000/GT-00000.csv",
+                        header=TRUE, sep=";")
> roi <- annotation[3, ]  # the row for our earlier sample, 00000_00002.ppm
> red_matrix_cropped <- roi_resize(red_matrix, roi)
> par(mfrow=c(1, 2))
> image(rotate(red_matrix), col = grey.colors(255), main=sprintf("Original"))
> image(rotate(red_matrix_cropped), col = grey.colors(255), main=sprintf("Preprocessed"))
We get the preprocessed red channel on the right:
Similarly, we can process the other two channels. Based on these three channels, how can we construct the feature space? Discarding any channel might result in a loss of information, while simply stacking them up could lead to redundancy, so combining the three channels into one is a better solution. Y'UV is a color encoding system that encodes brightness information separately from color information. It is typically used as part of a color image pipeline and in computer graphics hardware. Y'UV represents human perception of color in terms of three components: Y', the luminance (brightness), and U and V, the chrominance (color). Y'UV can be converted from RGB using:
- Y' = 0.299R + 0.587G + 0.114B
- U = 0.492(B - Y')
- V = 0.877(R - Y')
For our feature space, we take only the brightness channel, Y'.
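These formulas translate directly into a small helper. The sketch below uses a hypothetical function name, rgb_to_yuv, and assumes we have also cropped the green and blue channels of our sample as described; only the Y' output is used downstream:
> rgb_to_yuv <- function(r, g, b) {
+     y <- 0.299 * r + 0.587 * g + 0.114 * b  # luminance (brightness)
+     u <- 0.492 * (b - y)                    # chrominance
+     v <- 0.877 * (r - y)                    # chrominance
+     return(list(y = y, u = u, v = v))
+ }
> y_matrix <- rgb_to_yuv(red_matrix_cropped, green_matrix_cropped,
+                        blue_matrix_cropped)$y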
Now that we have the last piece of the preprocessing ready, let's put everything together and load and process (crop the ROI, resize, and convert to Y') the entire labeled dataset:
> load_labeled_data <- function(training_path, classes){
+     # Initialize the pixel features X and target y
+     X <- matrix(, nrow = 0, ncol = 32*32)
+     y <- vector()
+     # Load images from each of the 43 classes
+     for(i in classes) {
+         print(paste('Loading images from class', i))
+         if (i < 10) {
+             annotation_path <- paste(training_path, "0000", i, "/GT-0000", i, ".csv", sep="")
+             path <- paste(training_path, "0000", i, "/", sep="")
+         } else {
+             annotation_path <- paste(training_path, "000", i, "/GT-000", i, ".csv", sep="")
+             path <- paste(training_path, "000", i, "/", sep="")
+         }
+         annotation <- read.csv(file=annotation_path, header=TRUE, sep=";")
+         for (row in 1:nrow(annotation)) {
+             # Read each image
+             image_path <- paste(path, annotation[row, "Filename"], sep="")
+             image <- read.pnm(image_path, cellres=1)
+             # Parse the RGB color space
+             red_matrix <- matrix(image@red, nrow = image@size[1], ncol = image@size[2])
+             green_matrix <- matrix(image@green, nrow = image@size[1], ncol = image@size[2])
+             blue_matrix <- matrix(image@blue, nrow = image@size[1], ncol = image@size[2])
+             # Crop the ROI and resize
+             red_matrix_cropped <- roi_resize(red_matrix, annotation[row, ])
+             green_matrix_cropped <- roi_resize(green_matrix, annotation[row, ])
+             blue_matrix_cropped <- roi_resize(blue_matrix, annotation[row, ])
+             # Convert to brightness, that is, the Y' channel
+             x <- 0.299 * red_matrix_cropped + 0.587 * green_matrix_cropped +
+                  0.114 * blue_matrix_cropped
+             X <- rbind(X, matrix(x, 1, 32*32))
+             y <- c(y, i)
+         }
+     }
+     return(list("x" = X, "y" = y))
+ }
After defining the data loading function as shown previously, we apply it to the entire raw dataset:
> classes <- 0:42
> data <- load_labeled_data(training_path, classes)
Be patient, as it might take a couple of hours to read and process 39,209 images (see the aside after the next code block for why). Just in case anything unexpected happens, it is good practice to save the data object so that we can restore it at any time later:
> # Save the data object to a file
> saveRDS(data, file = "43 classes.rds")
> # Restore the data object
> data <- readRDS(file = "43 classes.rds")
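As an aside on the loading time: growing X with rbind copies the whole matrix on every iteration. The self-contained demo below (dummy data, not part of the pipeline) contrasts that pattern with preallocation; the same idea can be applied inside load_labeled_data if speed matters:
> n <- 2000; p <- 32*32
> system.time({
+     X1 <- matrix(, nrow = 0, ncol = p)
+     for (i in 1:n) X1 <- rbind(X1, matrix(runif(p), 1, p))  # repeated copying
+ })
> system.time({
+     X2 <- matrix(0, nrow = n, ncol = p)
+     for (i in 1:n) X2[i, ] <- runif(p)                      # writes in place
+ })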
Just do a quick check on the ready-to-use data:
> data.x <- data$x
> data.y <- data$y
> dim(data.x)
[1] 39209 1024
Correct dimension!
> summary(as.factor(data.y))
   0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22
 210 2220 2250 1410 1980 1860  420 1440 1410 1470 2010 1320 2100 2160  780  630  420 1110 1200  210  360  330  390
  23   24   25   26   27   28   29   30   31   32   33   34   35   36   37   38   39   40   41   42
 510  270 1500  600  240  540  270  450  780  240  689  420 1200  390  210 2070  300  360  240  240
Correct class sizes, and again they are rather imbalanced!
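To put a number on the imbalance (simple arithmetic on the counts above):
> class_counts <- summary(as.factor(data.y))
> max(class_counts) / min(class_counts)
[1] 10.71429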
After ensuring that the data has been loaded and processed properly, we do some more exploratory analysis on the distribution of the features, that is, the pixels. As an example, we take the 16 pixels forming the central 4*4 block (pixels 222 to 225, 254 to 257, 286 to 289, and 318 to 321) of each image from class 1 (Speed limit (30km/h)), class 14 (Stop), class 20 (Dangerous curve to the right), and class 27 (Pedestrians), and display their histograms:
> central_block <- c(222:225, 254:257, 286:289, 318:321)
> par(mfrow=c(2, 2))
> for(i in c(1, 14, 20, 27)) {
+     hist(c(as.matrix(data.x[data.y==i, central_block])),
+          main=sprintf("Histogram for class %d", i),
+          xlab="Pixel brightness")
+ }
The resulting pixel brightness histograms are displayed as follows:
The brightness of the central pixels is distributed differently across these four classes. For instance, the majority of the central pixels in class 20 are dark, as the sign (Dangerous curve to the right) has a thick black stroke through the center, while in class 14 the stop sign has a white stroke (the left part of the letter O) near the central area. Pixels taken from other positions can likewise be distributed distinctly across classes.
The exploratory analysis we have just conducted helps us move forward with building classification models based on these pixel features.