The summary statistics of a dataset can be obtained with the describe function in scipy.stats. The process is as follows:
- Import the relevant packages:
import numpy as np
from scipy import stats
- Initialize an array:
a = np.arange(10)
The preceding code initializes a one-dimensional array holding the integers 0 through 9.
- Fetch the summary statistics of the dataset (array):
stats.describe(a)
DescribeResult(nobs=10, minmax=(0, 9), mean=4.5, variance=9.1666666666666661,
skewness=0.0, kurtosis=-1.2242424242424244)
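Note that describe returns all of these statistics in a single DescribeResult object: the number of observations, the minimum and maximum, the mean, the variance (computed with ddof=1 by default), the skewness, and the kurtosis (Fisher's definition, under which a normal distribution scores 0).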
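The individual statistics can also be read off the result by name, since DescribeResult behaves like a named tuple. The following is a minimal sketch that reuses the array from the example above; the variable names (result, n_obs, low, high, and so on) are chosen here for illustration only.

```python
import numpy as np
from scipy import stats

a = np.arange(10)
result = stats.describe(a)

# Fields of the result are available by name
n_obs = result.nobs
low, high = result.minmax
mean = result.mean
variance = result.variance  # sample variance (ddof=1 by default)

# Cross-check the variance against NumPy using the same ddof
numpy_variance = np.var(a, ddof=1)

print(n_obs, low, high, mean, variance, numpy_variance)
```

Reading fields by name avoids relying on the position of each statistic in the result.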
The preceding output is for a one-dimensional dataset. In the following example, let us look at calculating the summary statistics of a two-dimensional array (dataset):
- Initialize a two-dimensional array:
b = [[1, 2], [3, 4]]
- Calculate the summary statistics of the two-dimensional array:
stats.describe(b)
The output of the preceding line of code is:
DescribeResult(nobs=2, minmax=(array([1, 2]), array([3, 4])),
mean=array([ 2., 3.]), variance=array([ 2., 2.]),
skewness=array([ 0., 0.]), kurtosis=array([-2., -2.]))
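Because describe operates along axis=0 by default, each statistic is computed per column, which is why nobs is 2 and the remaining fields are arrays with one entry per column. The output covers all the statistical measures that we discussed earlier in the section.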
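The axis along which the statistics are computed can be changed through the axis parameter of describe. The sketch below, which assumes the same two-dimensional data as above, compares the default column-wise result with a row-wise result and with a result over the flattened array; the names by_column, by_row, and overall are illustrative.

```python
import numpy as np
from scipy import stats

b = np.array([[1, 2], [3, 4]])

by_column = stats.describe(b)           # default axis=0: one value per column
by_row = stats.describe(b, axis=1)      # one value per row
overall = stats.describe(b, axis=None)  # whole array treated as one sample

print(by_column.mean)
print(by_row.mean)
print(overall.nobs, overall.mean)
```

With axis=None the four values are treated as a single sample, so the mean and variance come back as scalars rather than arrays.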