Getting familiar with the dataset

The dataset was originally used in Andrea Dal Pozzolo's PhD thesis, Adaptive Machine Learning for Credit Card Fraud Detection (ULB MLG), and has since been released by its authors for public use (www.ulb.ac.be/di/map/adalpozz/data/creditcard.Rdata). The dataset contains more than 284,000 instances, of which only 492 are fraudulent (about 0.17%).
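The degree of imbalance can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the exact instance count of 284,807 reported in the descriptive statistics of this section; the helper name class_ratio is hypothetical and introduced only for illustration.

```python
# Counts quoted in the text. TOTAL_TRANSACTIONS is the exact row count
# reported in the descriptive statistics (284,807).
TOTAL_TRANSACTIONS = 284_807
FRAUD_TRANSACTIONS = 492


def class_ratio(positives: int, total: int) -> float:
    """Return the share of positive (fraudulent) instances in the dataset."""
    if total <= 0:
        raise ValueError("total must be a positive integer")
    if not 0 <= positives <= total:
        raise ValueError("positives must lie between 0 and total")
    return positives / total


fraud_ratio = class_ratio(FRAUD_TRANSACTIONS, TOTAL_TRANSACTIONS)

# Accuracy of a trivial model that labels every transaction as legitimate.
baseline_accuracy = 1 - fraud_ratio

print(f"Fraud share:       {fraud_ratio:.4%}")
print(f"Baseline accuracy: {baseline_accuracy:.4%}")
```

Because a model that never predicts fraud already reaches an accuracy above 99.8%, plain accuracy says very little about how well fraud is detected on data this imbalanced.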

The target class value is 0 if the transaction was not fraudulent and 1 if it was. To preserve the confidentiality of the data, the original features were transformed using Principal Component Analysis (PCA). The dataset's features therefore comprise 28 principal components (V1 to V28), together with the transaction's amount and the time elapsed since the first transaction in the dataset. Descriptive statistics about the dataset are provided as follows:

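| Feature | Time       | V1       | V2       | V3        | V4       |
|---------|------------|----------|----------|-----------|----------|
| count   | 284,807    | 284,807  | 284,807  | 284,807   | 284,807  |
| mean    | 94,813.86  | 1.17E-15 | 3.42E-16 | -1.37E-15 | 2.09E-15 |
| std     | 47,488.15  | 1.96     | 1.65     | 1.52      | 1.42     |
| min     | 0.00       | -56.41   | -72.72   | -48.33    | -5.68    |
| max     | 172,792.00 | 2.45     | 22.06    | 9.38      | 16.88    |

| Feature | V5       | V6       | V7        | V8       | V9        |
|---------|----------|----------|-----------|----------|-----------|
| count   | 284,807  | 284,807  | 284,807   | 284,807  | 284,807   |
| mean    | 9.60E-16 | 1.49E-15 | -5.56E-16 | 1.18E-16 | -2.41E-15 |
| std     | 1.38     | 1.33     | 1.24      | 1.19     | 1.10      |
| min     | -113.74  | -26.16   | -43.56    | -73.22   | -13.43    |
| max     | 34.80    | 73.30    | 120.59    | 20.01    | 15.59     |

| Feature | V10      | V11      | V12       | V13      | V14      |
|---------|----------|----------|-----------|----------|----------|
| count   | 284,807  | 284,807  | 284,807   | 284,807  | 284,807  |
| mean    | 2.24E-15 | 1.67E-15 | -1.25E-15 | 8.18E-16 | 1.21E-15 |
| std     | 1.09     | 1.02     | 1.00      | 1.00     | 0.96     |
| min     | -24.59   | -4.80    | -18.68    | -5.79    | -19.21   |
| max     | 23.75    | 12.02    | 7.85      | 7.13     | 10.53    |

| Feature | V15      | V16      | V17       | V18      | V19      |
|---------|----------|----------|-----------|----------|----------|
| count   | 284,807  | 284,807  | 284,807   | 284,807  | 284,807  |
| mean    | 4.91E-15 | 1.44E-15 | -3.80E-16 | 9.57E-16 | 1.04E-15 |
| std     | 0.92     | 0.88     | 0.85      | 0.84     | 0.81     |
| min     | -4.50    | -14.13   | -25.16    | -9.50    | -7.21    |
| max     | 8.88     | 17.32    | 9.25      | 5.04     | 5.59     |

| Feature | V20      | V21      | V22       | V23      | V24      |
|---------|----------|----------|-----------|----------|----------|
| count   | 284,807  | 284,807  | 284,807   | 284,807  | 284,807  |
| mean    | 6.41E-16 | 1.66E-16 | -3.44E-16 | 2.58E-16 | 4.47E-15 |
| std     | 0.77     | 0.73     | 0.73      | 0.62     | 0.61     |
| min     | -54.50   | -34.83   | -10.93    | -44.81   | -2.84    |
| max     | 39.42    | 27.20    | 10.50     | 22.53    | 4.58     |

| Feature | V25      | V26      | V27       | V28       | Amount    |
|---------|----------|----------|-----------|-----------|-----------|
| count   | 284,807  | 284,807  | 284,807   | 284,807   | 284,807   |
| mean    | 5.34E-16 | 1.69E-15 | -3.67E-16 | -1.22E-16 | 88.35     |
| std     | 0.52     | 0.48     | 0.40      | 0.33      | 250.12    |
| min     | -10.30   | -2.60    | -22.57    | -15.43    | 0.00      |
| max     | 7.52     | 3.52     | 31.61     | 33.85     | 25,691.16 |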

Descriptive statistics of the credit card transaction dataset
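The five statistics reported for each feature (count, mean, std, min, and max) can be reproduced with a short script. The following is a minimal sketch using only the Python standard library; the function name describe_columns and the four-row SAMPLE_CSV are hypothetical stand-ins for illustration and are not values from the real dataset. It assumes the data is available as CSV with one column per feature, which is not the .Rdata format of the file linked earlier.

```python
import csv
import io
import statistics


def describe_columns(rows, columns):
    """Return {column: {count, mean, std, min, max}} for numeric columns."""
    summary = {}
    for name in columns:
        values = [float(row[name]) for row in rows]
        summary[name] = {
            "count": len(values),
            "mean": statistics.fmean(values),
            # Sample standard deviation (divides by n - 1).
            "std": statistics.stdev(values),
            "min": min(values),
            "max": max(values),
        }
    return summary


# Made-up toy rows that only mimic the column layout of the dataset.
SAMPLE_CSV = """Time,V1,Amount,Class
0,-1.0,10.0,0
10,0.5,20.0,0
20,1.5,0.0,1
30,-1.0,50.0,0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
stats = describe_columns(rows, ["Time", "V1", "Amount"])

for name, column_stats in stats.items():
    print(name, {key: round(value, 2) for key, value in column_stats.items()})
```

With pandas, DataFrame.describe() reports the same five statistics along with the quartiles. The means of V1 to V28 in the table, all on the order of 1E-15, are consistent with centered principal components whose means are zero up to floating-point error.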