First, let's think back to the original example: we had a single-layer network, which we wanted to get from a 4 x 3 matrix to a 4 x 1 vector. Now, we have to get from an MNIST image that is 28 x 28 pixels to one single number. This number is our network's guess about which number the image actually represents.
The following screenshot represents a rough example of what we can expect to find in the MNIST data: some grayscale images of handwritten digits next to their labels (which are stored separately):