The GTSRB dataset, compiled and generously published by the real-time computer vision research group at the Institut für Neuroinformatik, was originally used in a competition for classifying single images of traffic signs. It consists of a training set of 39,209 labeled images and a test set of 12,630 unlabeled images. The training set covers 43 classes, that is, 43 types of traffic signs. We will go through all the classes and exhibit several samples of each.
The dataset can be downloaded from http://benchmark.ini.rub.de/Dataset/GTSRB_Final_Training_Images.zip (located in the Downloads | Training dataset section on the page). Unzip the downloaded file and you will find a folder called Images containing 43 subfolders (00000, 00001, ... up to 00042), one per class.
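Alternatively, the archive can be fetched and unpacked from within R, as in the following minimal sketch (assuming the URL above is still live and you want the files in your working directory):
> url <- "http://benchmark.ini.rub.de/Dataset/GTSRB_Final_Training_Images.zip"
> # Download in binary mode and extract into the working directory
> download.file(url, destfile = "GTSRB_Final_Training_Images.zip", mode = "wb")
> unzip("GTSRB_Final_Training_Images.zip")  # unpacks to GTSRB/Final_Training/Images/
The images have the following properties: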
- The image files are in PPM (short for portable pixmap) format.
- The number of images per class ranges from 210 to 2,250, so this is an imbalanced multi-class classification problem (see the quick check after this list).
- Each image contains one traffic sign.
- The sizes of the images are not uniform, ranging from 15*15 to 250*250 pixels, and images are not necessarily square.
- Images contain a border of up to 10% around the actual sign, so the sign is not necessarily centered within the image.
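To verify the imbalance just mentioned, a small base-R sketch counts the .ppm files in each class folder (assuming the archive was unzipped into the working directory as described above):
> training_path <- "GTSRB/Final_Training/Images/"
> folders <- sprintf("%05d", 0:42)  # "00000" through "00042"
> # Count only the image files in each class folder
> counts <- sapply(folders, function(f)
+     length(list.files(paste(training_path, f, sep=""), pattern="\\.ppm$")))
> range(counts)  # expect 210 and 2250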
Let's start by plotting a sample, 00000_00002.ppm, in the 00000 folder.
We use the pixmap package (https://cran.r-project.org/web/packages/pixmap) to read the PPM file:
> library('pixmap')
> image <- read.pnm('GTSRB/Final_Training/Images/00000/00000_00002.ppm', cellres=1)
Now we obtain a pixmapRGB object with the attributes red, green, and blue (the pixel values for each of the three channels), as well as size, which holds the height and width of the image. We can access the red, green, and blue channels as follows:
> red_matrix <- matrix(image@red, nrow = image@size[1], ncol = image@size[2])
> green_matrix <- matrix(image@green, nrow = image@size[1], ncol = image@size[2])
> blue_matrix <- matrix(image@blue, nrow = image@size[1], ncol = image@size[2])
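As a quick optional check (a small sketch), the size slot confirms the image dimensions, and read.pnm has already scaled the channel values to the [0, 1] range:
> image@size          # height and width of the image in pixels
> range(red_matrix)   # pixel values lie in [0, 1]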
We visualize the original image and its three channels individually:
> plot(image, main=sprintf("Original"))
> rotate <- function(x) t(apply(x, 2, rev))
> par(mfrow=c(1, 3))
> image(rotate(red_matrix), col = grey.colors(255), main=sprintf("Red"))
> image(rotate(green_matrix), col = grey.colors(255), main=sprintf("Green"))
> image(rotate(blue_matrix), col = grey.colors(255), main=sprintf("Blue"))
Note that we reuse the rotate function defined in the last chapter to orient the images correctly for viewing. This is the original image:
The following images show the output for the red, green, and blue channels, respectively:
It is a 20 km/h speed limit sign.
We can now go through 43 classes of signs and display three samples for each type by defining the following function:
> plot_samples <- function(training_path, class, num_sample){
+     classes <- c("Speed limit (20km/h)", "Speed limit (30km/h)", "Speed limit (50km/h)",
+                  "Speed limit (60km/h)", "Speed limit (70km/h)", "Speed limit (80km/h)",
+                  "End of speed limit (80km/h)", "Speed limit (100km/h)",
+                  "Speed limit (120km/h)", "No passing",
+                  "No passing for vehicles over 3.5 metric tons",
+                  "Right-of-way at the next intersection", "Priority road", "Yield", "Stop",
+                  "No vehicles", "Vehicles over 3.5 metric tons prohibited", "No entry",
+                  "General caution", "Dangerous curve to the left",
+                  "Dangerous curve to the right", "Double curve", "Bumpy road",
+                  "Slippery road", "Road narrows on the right", "Road work",
+                  "Traffic signals", "Pedestrians", "Children crossing", "Bicycles crossing",
+                  "Beware of ice/snow", "Wild animals crossing",
+                  "End of all speed and passing limits", "Turn right ahead",
+                  "Turn left ahead", "Ahead only", "Go straight or right",
+                  "Go straight or left", "Keep right", "Keep left", "Roundabout mandatory",
+                  "End of no passing",
+                  "End of no passing by vehicles over 3.5 metric tons")
+     if (class < 10) {
+         path <- paste(training_path, "0000", class, "/", sep="")
+     } else {
+         path <- paste(training_path, "000", class, "/", sep="")
+     }
+     par(mfrow=c(1, num_sample))
+     # List only the .ppm image files (each folder also contains an annotation CSV)
+     all_files <- list.files(path = path, pattern = "\\.ppm$")
+     title <- paste('Class', class, ':', classes[class+1])
+     print(paste(title, " (", length(all_files), " samples)", sep=""))
+     # Randomly display num_sample samples
+     files <- sample(all_files, num_sample)
+     for (file in files) {
+         image <- read.pnm(paste(path, file, sep=""), cellres=1)
+         plot(image)
+     }
+     mtext(title, side = 3, line = -23, outer = TRUE)
+ }
Call the function with class=0:
> training_path <- "GTSRB/Final_Training/Images/"
> plot_samples(training_path, 0, 3)
[1] "Class 0 : Speed limit (20km/h) (210 samples)"
Three samples are displayed:
Repeat this function call with a different class (or use a loop, as sketched next) to go through the remaining 42 types.
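A minimal loop sketch (each call produces a new plot, so it is best run interactively):
> for (class_id in 1:42) {
+     plot_samples(training_path, class_id, 3)
+ }
For illustration, we step through the next few classes one at a time: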
> plot_samples(training_path, 1, 3)
[1] "Class 1 : Speed limit (30km/h) (2220 samples)"
Three samples from class 1 are displayed as follows:
> plot_samples(training_path, 2, 3)
[1] "Class 2 : Speed limit (50km/h) (2250 samples)"
Three images from class 2 are shown here:
> plot_samples(training_path, 3, 3)
[1] "Class 3 : Speed limit (60km/h) (1410 samples)"
Here is the result for class 3:
> plot_samples(training_path, 4, 3)
[1] "Class 4 : Speed limit (70km/h) (1980 samples)"
We plot three images from class 4:
Here we skip the remainder, but it is clear that the images were captured under varying conditions of weather, illumination, occlusion, rotation, and so on. Instead, we list the number of samples in each class for easy reference:
| ID | Type | Number of samples |
| --- | --- | --- |
| 0 | Speed limit (20km/h) | 210 |
| 1 | Speed limit (30km/h) | 2220 |
| 2 | Speed limit (50km/h) | 2250 |
| 3 | Speed limit (60km/h) | 1410 |
| 4 | Speed limit (70km/h) | 1980 |
| 5 | Speed limit (80km/h) | 1860 |
| 6 | End of speed limit (80km/h) | 420 |
| 7 | Speed limit (100km/h) | 1440 |
| 8 | Speed limit (120km/h) | 1410 |
| 9 | No passing | 1470 |
| 10 | No passing for vehicles over 3.5 metric tons | 2010 |
| 11 | Right-of-way at the next intersection | 1320 |
| 12 | Priority road | 2100 |
| 13 | Yield | 2160 |
| 14 | Stop | 780 |
| 15 | No vehicles | 630 |
| 16 | Vehicles over 3.5 metric tons prohibited | 420 |
| 17 | No entry | 1110 |
| 18 | General caution | 1200 |
| 19 | Dangerous curve to the left | 210 |
| 20 | Dangerous curve to the right | 360 |
| 21 | Double curve | 330 |
| 22 | Bumpy road | 390 |
| 23 | Slippery road | 510 |
| 24 | Road narrows on the right | 270 |
| 25 | Road work | 1500 |
| 26 | Traffic signals | 600 |
| 27 | Pedestrians | 240 |
| 28 | Children crossing | 540 |
| 29 | Bicycles crossing | 270 |
| 30 | Beware of ice/snow | 450 |
| 31 | Wild animals crossing | 780 |
| 32 | End of all speed and passing limits | 240 |
| 33 | Turn right ahead | 689 |
| 34 | Turn left ahead | 420 |
| 35 | Ahead only | 1200 |
| 36 | Go straight or right | 390 |
| 37 | Go straight or left | 210 |
| 38 | Keep right | 2070 |
| 39 | Keep left | 300 |
| 40 | Roundabout mandatory | 360 |
| 41 | End of no passing | 240 |
| 42 | End of no passing by vehicles over 3.5 metric tons | 240 |
Obviously, the signs, our regions of interest (ROI), are not centered within the images, and the image sizes vary. We therefore need to separate the ROI from each image and standardize its size (resizing it to 32*32, as most researchers have done) before we can analyze and classify the data. For this we resort to the annotations provided along with the images. Each class folder contains an annotation file, for example, GT-00000.csv in the 00000 folder. Each annotation file contains the following useful fields:
- Filename: The filename of the image
- Roi.X1: The x coordinate of the top-left corner of the ROI bounding box
- Roi.Y1: The y coordinate of the top-left corner of the ROI bounding box
- Roi.X2: The x coordinate of the bottom-right corner of the ROI bounding box
- Roi.Y2: The y coordinate of the bottom-right corner of the ROI bounding box
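Before writing the preprocessing function, we can peek at an annotation file to confirm the column names and the semicolon separator (a quick sketch):
> annotation <- read.csv(file="GTSRB/Final_Training/Images/00000/GT-00000.csv",
+                        header=TRUE, sep=";")
> # Show the first few rows of the fields we need
> head(annotation[, c("Filename", "Roi.X1", "Roi.Y1", "Roi.X2", "Roi.Y2")])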
Here is an example of the ROI in a sample:
Now we define the preprocessing function for a raw image, which includes separating the ROI and resizing it to 32*32:
> source("http://bioconductor.org/biocLite.R") > biocLite("EBImage") > library("EBImage") > roi_resize <- function(input_matrix, roi){ + roi_matrix <- input_matrix[roi[1, 'Roi.Y1']:roi[1, 'Roi.Y2'], roi[1, 'Roi.X1']:roi[1, 'Roi.X2']] + return(resize(roi_matrix, 32, 32)) + }
Note that the resize function is from the EBImage package:
https://bioconductor.org/packages/release/bioc/html/EBImage.html
We try it out on our first sample (red channel only):
> # Read the annotation CSV file
> annotation <- read.csv(file="GTSRB/Final_Training/Images/00000/GT-00000.csv",
+                        header=TRUE, sep=";")
> roi <- annotation[3, ]  # the row for our earlier sample, 00000_00002.ppm
> red_matrix_cropped <- roi_resize(red_matrix, roi)
> par(mfrow=c(1, 2))
> image(rotate(red_matrix), col = grey.colors(255), main=sprintf("Original"))
> image(rotate(red_matrix_cropped), col = grey.colors(255), main=sprintf("Preprocessed"))
We get the preprocessed red channel on the right:
Similarly, we can process the other two channels. Based on these three channels, how can we construct the feature space? Discarding any channel might result in a loss of information, while simply stacking them up could lead to redundancy, so combining the three channels into one is a better solution. Y'UV is a color encoding system that encodes brightness information separately from color information. It is typically used as part of a color image pipeline and in computer graphics hardware. Y'UV represents human perception of color in terms of three components: Y', the luminance (brightness), and U and V, the chrominance (color). Y'UV can be converted from RGB using:
- Y' = 0.299R + 0.587G + 0.114B
- U = 0.492(B - Y')
- V = 0.877(R - Y')
For our feature space, we take only the brightness channel, Y'.
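These formulas translate directly into a small helper. The sketch below uses a hypothetical function name, rgb_to_yuv, and assumes we have also cropped the green and blue channels of our sample as described; only the Y' output is used downstream:
> rgb_to_yuv <- function(r, g, b) {
+     y <- 0.299 * r + 0.587 * g + 0.114 * b  # luminance (brightness)
+     u <- 0.492 * (b - y)                    # chrominance
+     v <- 0.877 * (r - y)                    # chrominance
+     return(list(y = y, u = u, v = v))
+ }
> y_matrix <- rgb_to_yuv(red_matrix_cropped, green_matrix_cropped,
+                        blue_matrix_cropped)$y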
Now that we have the last piece of the preprocessing ready, let's put everything together and load and process (crop the ROI, resize, and convert to Y') the entire labeled dataset:
> load_labeled_data <- function(training_path, classes){
+     # Initialize the pixel features X and target y
+     X <- matrix(, nrow = 0, ncol = 32*32)
+     y <- vector()
+     # Load images from each of the 43 classes
+     for(i in classes) {
+         print(paste('Loading images from class', i))
+         if (i < 10) {
+             annotation_path <- paste(training_path, "0000", i, "/GT-0000", i, ".csv", sep="")
+             path <- paste(training_path, "0000", i, "/", sep="")
+         } else {
+             annotation_path <- paste(training_path, "000", i, "/GT-000", i, ".csv", sep="")
+             path <- paste(training_path, "000", i, "/", sep="")
+         }
+         annotation <- read.csv(file=annotation_path, header=TRUE, sep=";")
+         for (row in 1:nrow(annotation)) {
+             # Read each image
+             image_path <- paste(path, annotation[row, "Filename"], sep="")
+             image <- read.pnm(image_path, cellres=1)
+             # Parse the RGB color space
+             red_matrix <- matrix(image@red, nrow = image@size[1], ncol = image@size[2])
+             green_matrix <- matrix(image@green, nrow = image@size[1], ncol = image@size[2])
+             blue_matrix <- matrix(image@blue, nrow = image@size[1], ncol = image@size[2])
+             # Crop the ROI and resize
+             red_matrix_cropped <- roi_resize(red_matrix, annotation[row, ])
+             green_matrix_cropped <- roi_resize(green_matrix, annotation[row, ])
+             blue_matrix_cropped <- roi_resize(blue_matrix, annotation[row, ])
+             # Convert to brightness, that is, the Y' channel
+             x <- 0.299 * red_matrix_cropped + 0.587 * green_matrix_cropped +
+                  0.114 * blue_matrix_cropped
+             X <- rbind(X, matrix(x, 1, 32*32))
+             y <- c(y, i)
+         }
+     }
+     return(list("x" = X, "y" = y))
+ }
After defining the data loading function as shown previously, we apply it to the entire raw dataset:
> classes <- 0:42
> data <- load_labeled_data(training_path, classes)
Be patient, as it might take a couple of hours to read and process 39,209 images (see the aside after the next code block for why). Just in case anything unexpected happens, it is good practice to save the data object so that we can restore it at any time later:
> # Save the data object to a file
> saveRDS(data, file = "43 classes.rds")
> # Restore the data object
> data <- readRDS(file = "43 classes.rds")
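As an aside on the loading time: growing X with rbind copies the whole matrix on every iteration. The self-contained demo below (dummy data, not part of the pipeline) contrasts that pattern with preallocation; the same idea can be applied inside load_labeled_data if speed matters:
> n <- 2000; p <- 32*32
> system.time({
+     X1 <- matrix(, nrow = 0, ncol = p)
+     for (i in 1:n) X1 <- rbind(X1, matrix(runif(p), 1, p))  # repeated copying
+ })
> system.time({
+     X2 <- matrix(0, nrow = n, ncol = p)
+     for (i in 1:n) X2[i, ] <- runif(p)                      # writes in place
+ })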
Just do a quick check on the ready-to-use data:
> data.x <- data$x
> data.y <- data$y
> dim(data.x)
[1] 39209 1024
Correct dimension!
> summary(as.factor(data.y))
   0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22
 210 2220 2250 1410 1980 1860  420 1440 1410 1470 2010 1320 2100 2160  780  630  420 1110 1200  210  360  330  390
  23   24   25   26   27   28   29   30   31   32   33   34   35   36   37   38   39   40   41   42
 510  270 1500  600  240  540  270  450  780  240  689  420 1200  390  210 2070  300  360  240  240
Correct class sizes, and again they are rather imbalanced!
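To put a number on the imbalance (simple arithmetic on the counts above):
> class_counts <- summary(as.factor(data.y))
> max(class_counts) / min(class_counts)
[1] 10.71429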
After ensuring that the data has been loaded and processed properly, we do some more exploratory analysis on the distribution of the features, that is, the pixels. As an example, we take the 16 pixels forming the central 4*4 block (pixels 222 to 225, 254 to 257, 286 to 289, and 318 to 321) of each image from class 1 (Speed limit (30km/h)), class 14 (Stop), class 20 (Dangerous curve to the right), and class 27 (Pedestrians), and display their histograms:
> central_block <- c(222:225, 254:257, 286:289, 318:321)
> par(mfrow=c(2, 2))
> for(i in c(1, 14, 20, 27)) {
+     hist(c(as.matrix(data.x[data.y==i, central_block])),
+          main=sprintf("Histogram for class %d", i),
+          xlab="Pixel brightness")
+ }
The resulting pixel brightness histograms are displayed as follows:
The brightness of the central pixels is distributed differently across these four classes. For instance, the majority of the central pixels in class 20 are dark, as the sign (Dangerous curve to the right) has a thick black stroke through the center, while in class 14 the stop sign has a white stroke (the left part of the letter O) near the central area. Pixels taken from other positions can likewise be distributed distinctly across classes.
The exploratory analysis we have just conducted helps us move forward with building classification models based on these pixel features.