An Example: More Stats

Remember the script we did yesterday for simple statistics, where you entered numbers in one at a time, and the script calculated the count, sum, and average? Let's modify that script today to store the numbers that get entered into an array. Having the numbers around after the initial input means we can do more things with them, such as sorting them or finding the median (a different number from the mean).

Here's how the new version of the statistics script looks when it's run:

% morestats.pl
Enter a number: 4
Enter a number: 5
Enter a number: 3
						(many more numbers in here that I've deleted for space)
Enter a number: 47
Enter a number: 548
Enter a number: 54
Enter a number: 5485
Enter a number:

Total count of numbers: 49
Total sum of numbers: 10430
Minimum number: 2
Maximum number: 5485
Average (mean): 212.86
Median: 45

There are two differences in obvious behavior between yesterday's version of the statistics script and this one:

  • It calculates the maximum and minimum numbers that were entered.

  • It finds the median number (the middle number in a sorted list of all the numbers).

In the code, however, there's one other significant difference between this version and the last: here we're using an array to store the input data, rather than just discarding it (when you run the script, you still end the input with a blank line). Listing 4.1 shows the code for the new script.

Listing 4.1. The morestats.pl Script
1:  #!/usr/local/bin/perl -w
2:
3:  $input = '';  # temporary input
4:  @nums = ();   # array of numbers
5:  $count = 0;   # count of numbers
6:  $sum = 0;     # sum of numbers
7:  $avg = 0;     # average
8:  $med = 0;     # median
9:
10: while () {
11:   print 'Enter a number: ';
12:   chomp ($input = <STDIN>);
13:   if ($input eq '') { last; }
14:
15:   if ($input =~ /\D/) {
16:       print "Digits only, please.\n";
17:       next;
18:   }
19:
20:   push @nums, $input;
21:   $count++;
22:   $sum += $input;
23: }
24:
25: @nums = sort { $a <=> $b }  @nums;
26: $avg = $sum / $count;
27: $med = $nums[$count / 2];
28:
29: print "\nTotal count of numbers: $count\n";
30: print "Total sum of numbers: $sum\n";
31: print "Minimum number: $nums[0]\n";
32: print "Maximum number: $nums[$#nums]\n";
33: printf("Average (mean): %.2f\n", $avg);
34: print "Median: $med\n";

The morestats.pl version of the statistics script has four main sections: initialization, data entry, sorting the data and calculating the statistics, and, finally, printing the results.

The initialization section, lines 3 through 8, is the same as it was in the previous script, except that we've added two variables: an array variable (@nums) in line 4, to store the numeric inputs, and a $med variable in line 8 for the median. As with the other variables, we don't have to initialize the @nums variable, but it looks nice and groups all our variables up at the top of the script.

Lines 10 through 23 are the new while loop for entering the input. If you compare this version to the version in yesterday's lesson, you'll see that there's actually not much that's new here. We're still accepting numbers one line at a time, still checking for digits only, and still incrementing the $count and updating the $sum for each number. The difference is in line 20, where for each turn of the loop we append the input to the @nums array using the push function.
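To see the digits-only check and push working together outside the input loop, here is a minimal sketch. The digits_only subroutine and the sample strings are made up for illustration; they are not part of Listing 4.1. Note that the \D test also rejects decimals and negative numbers, because "." and "-" are not digits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper mirroring the listing's test: true when the
# string contains no non-digit characters. (An empty string would
# also pass; the listing handles blank input before this check.)
sub digits_only {
    my ($input) = @_;
    return $input !~ /\D/;
}

my @nums;
foreach my $input ('42', '4.2', '-7', 'abc', '007') {
    if (digits_only($input)) {
        push @nums, $input;       # append to the end of the array
    } else {
        print "Rejected: $input\n";
    }
}

print "Stored: @nums\n";                  # prints "Stored: 42 007"
print "Count:  ", scalar(@nums), "\n";    # prints "Count:  2"
```

Because the array always knows its own length, scalar(@nums) would give the same number as a separate $count variable.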

With all the input in place, we move on to line 25, where we sort the @nums array using the special numeric sort routine I described earlier. Note that I don't have to define the $a or $b variables; sort sets them up for each comparison, and they matter only while the sort is running. Line 26 calculates the average (the mean), as it did in our previous version of the script, and line 27 calculates the median value.
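The reason for the numeric sort block is easier to see side by side with the default sort, which compares values as strings. This is a small sketch with made-up sample values:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative data only.
my @nums = (47, 548, 5, 54, 3);

# Default sort: compares the values as strings, character by character.
my @as_strings = sort @nums;

# Numeric sort: compares the values with the <=> operator.
my @as_numbers = sort { $a <=> $b } @nums;

print "String order:  @as_strings\n";    # prints "String order:  3 47 5 54 548"
print "Numeric order: @as_numbers\n";    # prints "Numeric order: 3 5 47 54 548"
```

With the string order, neither the minimum, the maximum, nor the middle element could be trusted, so the numeric block matters for everything that follows in the script.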

Some points about the median: given a sorted set of data, the median is roughly defined as the value in the middle (it gives the true middle value, whereas the average can be skewed if there are especially high or low values). In statistics, if the number of values in the set is odd, the median is simply the value in the middle. If the number of values is even, the median is the mean of the two values on either side of the middle. Here, I've simplified things; our median will be the middle value if the data set has an odd number of values, and the smallest value of the upper half if the data set has an even number. That'll be close enough for our purposes.
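If you'd rather have the statistical definition than the shortcut, one way to write it is sketched below. The true_median subroutine is a hypothetical helper, not something the script in Listing 4.1 uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the statistical median of a list of
# numbers, or nothing at all if the list is empty.
sub true_median {
    my @sorted = sort { $a <=> $b } @_;
    my $n = @sorted;                 # number of values
    return unless $n;                # no data, no median

    my $mid = int($n / 2);
    return $n % 2
        ? $sorted[$mid]                                # odd: the middle value
        : ($sorted[$mid - 1] + $sorted[$mid]) / 2;     # even: mean of the two middle values
}

print true_median(3, 1, 2), "\n";       # prints 2
print true_median(4, 1, 3, 2), "\n";    # prints 2.5
```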

To find the value in the middle, we need to find the array index number for the middle. The $count variable holds the number of elements (one more than the highest index number, because indexes start at 0), so all we need to do is divide it by two and use the result to index the array. Note that if there is an odd number of elements, the result of this division will be a floating-point number. This is okay; array indexes must be integers, so Perl will truncate the number before using it. With the 49 numbers in the sample run, for example, 49 / 2 is 24.5, which truncates to index 24, the 25th and middle element.
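The index arithmetic can be checked with a couple of tiny arrays. This sketch uses made-up values, and the @odd and @even names are only for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @odd  = (10, 20, 30, 40, 50);    # five elements, indexes 0 through 4
my @even = (10, 20, 30, 40);        # four elements, indexes 0 through 3

# 5 / 2 is 2.5; Perl truncates the index to 2.
my $odd_mid = $odd[@odd / 2];

# 4 / 2 is exactly 2, so no truncation is needed.
my $even_mid = $even[@even / 2];

# Writing the truncation out with int() selects the same element.
my $explicit = $odd[int(@odd / 2)];

print "$odd_mid\n";     # prints 30
print "$even_mid\n";    # prints 30
print "$explicit\n";    # prints 30
```

Spelling out int() makes no difference to the result, but it does make the intent obvious to the next person who reads the code.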

This brings us to the final summary in lines 29 to 34. Count, sum, and average are the same as before, but now we've also added minimum, maximum, and median. Minimum and maximum are easy; because our array is sorted, we don't need to calculate anything; we can just read the first and last elements of the array. And because we calculated the median earlier, all we need to do is print it as we have the other values.
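Reading the ends of a sorted array is the natural choice here, because the median needs the sort anyway. If a script only needed the extremes, one alternative (not what Listing 4.1 does) would be the min, max, and sum functions from the List::Util module that ships with Perl. A brief sketch with made-up data:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min max sum);

# Illustrative data only; deliberately left unsorted.
my @nums = (4, 5, 3, 47, 548, 54);

my $min = min @nums;    # smallest value, no sort required
my $max = max @nums;    # largest value
my $sum = sum @nums;    # total of all values

print "Minimum number: $min\n";                  # prints "Minimum number: 3"
print "Maximum number: $max\n";                  # prints "Maximum number: 548"
printf("Average (mean): %.2f\n", $sum / @nums);  # prints "Average (mean): 110.17"
```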
