Aggregation in R

In R, aggregation is typically done with the aggregate function. To illustrate, we first read the data into a data frame:

    > goal_stats <- read.csv('champ_league_stats_semifinalists.csv')
    > goal_stats
                  Club                 Player Goals GamesPlayed
    1  Atletico Madrid            Diego Costa     8           9
    2  Atletico Madrid             Arda Turan     4           9
    3  Atletico Madrid            Raúl García     4          12
    4  Atletico Madrid           Adrián López     2           9
    5  Atletico Madrid            Diego Godín     2          10
    6      Real Madrid      Cristiano Ronaldo    17          11
    7      Real Madrid            Gareth Bale     6          12
    8      Real Madrid          Karim Benzema     5          11
    9      Real Madrid                   Isco     3          12
    10     Real Madrid         Ángel Di María     3          11
    11   Bayern Munich          Thomas Müller     5          12
    12   Bayern Munich           Arjen Robben     4          10
    13   Bayern Munich            Mario Götze     3          11
    14   Bayern Munich Bastian Schweinsteiger     3           8
    15   Bayern Munich        Mario Mandzukić     3          10
    16         Chelsea        Fernando Torres     4           9
    17         Chelsea               Demba Ba     3           6
    18         Chelsea           Samuel Eto'o     3           9
    19         Chelsea            Eden Hazard     2           9
    20         Chelsea                Ramires     2          10

We now compute the goals-per-game ratio for each player, as a measure of how clinical they are in front of goal:

    > goal_stats$GoalsPerGame <- goal_stats$Goals / goal_stats$GamesPlayed
    > goal_stats
                  Club                 Player Goals GamesPlayed GoalsPerGame
    1  Atletico Madrid            Diego Costa     8           9    0.8888889
    2  Atletico Madrid             Arda Turan     4           9    0.4444444
    3  Atletico Madrid            Raúl García     4          12    0.3333333
    4  Atletico Madrid           Adrián López     2           9    0.2222222
    5  Atletico Madrid            Diego Godín     2          10    0.2000000
    6      Real Madrid      Cristiano Ronaldo    17          11    1.5454545
    7      Real Madrid            Gareth Bale     6          12    0.5000000
    8      Real Madrid          Karim Benzema     5          11    0.4545455
    9      Real Madrid                   Isco     3          12    0.2500000
    10     Real Madrid         Ángel Di María     3          11    0.2727273
    11   Bayern Munich          Thomas Müller     5          12    0.4166667
    12   Bayern Munich           Arjen Robben     4          10    0.4000000
    13   Bayern Munich            Mario Götze     3          11    0.2727273
    14   Bayern Munich Bastian Schweinsteiger     3           8    0.3750000
    15   Bayern Munich        Mario Mandzukić     3          10    0.3000000
    16         Chelsea        Fernando Torres     4           9    0.4444444
    17         Chelsea               Demba Ba     3           6    0.5000000
    18         Chelsea           Samuel Eto'o     3           9    0.3333333
    19         Chelsea            Eden Hazard     2           9    0.2222222
    20         Chelsea                Ramires     2          10    0.2000000
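
As a minimal sketch, the same derived column could also be added with base R's transform function, which evaluates the expression with the data frame's columns in scope. The four-row data frame below is a hand-typed subset of the table above, used only so that the example runs without the CSV file:

```r
# Hand-typed subset of goal_stats (an assumption made for illustration;
# the original data comes from the CSV file read earlier)
goal_stats <- data.frame(
  Club        = c("Real Madrid", "Real Madrid", "Chelsea", "Chelsea"),
  Player      = c("Cristiano Ronaldo", "Gareth Bale", "Fernando Torres", "Demba Ba"),
  Goals       = c(17, 6, 4, 3),
  GamesPlayed = c(11, 12, 9, 6)
)

# transform() returns a copy of the data frame with the new column appended
goal_stats <- transform(goal_stats, GoalsPerGame = Goals / GamesPlayed)
print(goal_stats)
```

Whether to use transform or the `$<-` assignment is largely a matter of style; both yield the same column here.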
  

Suppose that we wanted to know the highest goals-per-game ratio for each team. We would calculate this as follows:

    > aggregate(x = goal_stats[, c('GoalsPerGame')], by = list(goal_stats$Club), FUN = max)
              Group.1         x
    1 Atletico Madrid 0.8888889
    2   Bayern Munich 0.4166667
    3         Chelsea 0.5000000
    4     Real Madrid 1.5454545
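
The default column names Group.1 and x in this output are not very descriptive. One possible alternative, sketched here on a hand-typed subset of the data (an assumption so that the example runs on its own), is the formula interface of aggregate, which keeps the original column names:

```r
# Hand-typed subset of goal_stats with the ratio already computed
# (assumed values taken from the table above)
goal_stats <- data.frame(
  Club         = c("Real Madrid", "Real Madrid", "Chelsea", "Chelsea"),
  Player       = c("Cristiano Ronaldo", "Gareth Bale", "Fernando Torres", "Demba Ba"),
  GoalsPerGame = c(17 / 11, 6 / 12, 4 / 9, 3 / 6)
)

# Read the formula as "GoalsPerGame grouped by Club"
best_per_club <- aggregate(GoalsPerGame ~ Club, data = goal_stats, FUN = max)
print(best_per_club)
```

The result is still a data frame with one row per club, but its columns are named Club and GoalsPerGame rather than Group.1 and x.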
  

The tapply function applies a function to subsets of a vector, where the subsets are defined by the levels of one or more grouping factors. The same per-club maximum can be computed with tapply as follows:

    > tapply(goal_stats$GoalsPerGame, goal_stats$Club, max)
    Atletico Madrid   Bayern Munich         Chelsea     Real Madrid 
          0.8888889       0.4166667       0.5000000       1.5454545
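
Unlike aggregate, which returns a data frame, tapply returns a named array, so individual groups can be looked up by name. The following is a small sketch under the assumption of a hand-typed subset of the data; the variable name best is only illustrative:

```r
# Hand-typed subset of goal_stats with the ratio already computed
# (assumed values taken from the table above)
goal_stats <- data.frame(
  Club         = c("Real Madrid", "Real Madrid", "Chelsea", "Chelsea"),
  GoalsPerGame = c(17 / 11, 6 / 12, 4 / 9, 3 / 6)
)

# One maximum per club, returned as an array named by club
best <- tapply(goal_stats$GoalsPerGame, goal_stats$Club, max)

# Look up a single club by name
chelsea_best <- as.numeric(best["Chelsea"])
print(chelsea_best)

# Name of the club whose best player has the highest ratio
top_club <- names(best)[which.max(best)]
print(top_club)
```

This named form is convenient for quick lookups, while the data frame returned by aggregate is easier to merge with other tables.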
  