In R, aggregation is carried out as follows. We first read the data into a data frame:
> goal_stats=read.csv('champ_league_stats_semifinalists.csv')
> goal_stats
              Club                 Player Goals GamesPlayed
1  Atletico Madrid            Diego Costa     8           9
2  Atletico Madrid             Arda Turan     4           9
3  Atletico Madrid            Raúl García     4          12
4  Atletico Madrid           Adrián López     2           9
5  Atletico Madrid            Diego Godín     2          10
6      Real Madrid      Cristiano Ronaldo    17          11
7      Real Madrid            Gareth Bale     6          12
8      Real Madrid          Karim Benzema     5          11
9      Real Madrid                   Isco     3          12
10     Real Madrid         Ángel Di María     3          11
11   Bayern Munich          Thomas Müller     5          12
12   Bayern Munich           Arjen Robben     4          10
13   Bayern Munich            Mario Götze     3          11
14   Bayern Munich Bastian Schweinsteiger     3           8
15   Bayern Munich        Mario Mandzukić     3          10
16         Chelsea        Fernando Torres     4           9
17         Chelsea               Demba Ba     3           6
18         Chelsea           Samuel Eto'o     3           9
19         Chelsea            Eden Hazard     2           9
20         Chelsea                Ramires     2          10
We now compute the goals-per-game ratio for each player, as a measure of how clinical they are in front of goal:
> goal_stats$GoalsPerGame <- goal_stats$Goals/goal_stats$GamesPlayed
> goal_stats
              Club                 Player Goals GamesPlayed GoalsPerGame
1  Atletico Madrid            Diego Costa     8           9    0.8888889
2  Atletico Madrid             Arda Turan     4           9    0.4444444
3  Atletico Madrid            Raúl García     4          12    0.3333333
4  Atletico Madrid           Adrián López     2           9    0.2222222
5  Atletico Madrid            Diego Godín     2          10    0.2000000
6      Real Madrid      Cristiano Ronaldo    17          11    1.5454545
7      Real Madrid            Gareth Bale     6          12    0.5000000
8      Real Madrid          Karim Benzema     5          11    0.4545455
9      Real Madrid                   Isco     3          12    0.2500000
10     Real Madrid         Ángel Di María     3          11    0.2727273
11   Bayern Munich          Thomas Müller     5          12    0.4166667
12   Bayern Munich           Arjen Robben     4          10    0.4000000
13   Bayern Munich            Mario Götze     3          11    0.2727273
14   Bayern Munich Bastian Schweinsteiger     3           8    0.3750000
15   Bayern Munich        Mario Mandzukić     3          10    0.3000000
16         Chelsea        Fernando Torres     4           9    0.4444444
17         Chelsea               Demba Ba     3           6    0.5000000
18         Chelsea           Samuel Eto'o     3           9    0.3333333
19         Chelsea            Eden Hazard     2           9    0.2222222
20         Chelsea                Ramires     2          10    0.2000000
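The derived column can also be added with transform(), which some readers find easier to read than repeated $ indexing. The following is a minimal sketch, not the text's own method: it assumes a small inline copy of four rows from the table above (so that it runs without the CSV file), and the choice of rows is purely illustrative.

```r
# Hypothetical inline copy of four rows of the data shown above,
# so the example runs without the CSV file.
goal_stats <- data.frame(
  Club = c("Atletico Madrid", "Atletico Madrid", "Real Madrid", "Real Madrid"),
  Player = c("Diego Costa", "Arda Turan", "Cristiano Ronaldo", "Gareth Bale"),
  Goals = c(8, 4, 17, 6),
  GamesPlayed = c(9, 9, 11, 12),
  stringsAsFactors = FALSE
)

# transform() returns a copy of the data frame with the new column added;
# column names can be used directly inside the call.
goal_stats <- transform(goal_stats, GoalsPerGame = Goals / GamesPlayed)
print(goal_stats)
```

Both forms produce the same GoalsPerGame column; the $ assignment modifies the data frame in place, whereas transform() returns a new data frame that has to be assigned back.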
Suppose that we want to know the highest goals-per-game ratio for each team. We can calculate this as follows:
> aggregate(x=goal_stats[,c('GoalsPerGame')], by=list(goal_stats$Club), FUN=max)
          Group.1         x
1 Atletico Madrid 0.8888889
2   Bayern Munich 0.4166667
3         Chelsea 0.5000000
4     Real Madrid 1.5454545
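The Group.1 and x column names in the output above are defaults chosen by aggregate(). As a hedged alternative, the formula interface of aggregate() keeps the original column names. The sketch below assumes a small inline subset of the data (six rows, chosen for illustration) with the ratio already computed.

```r
# Hypothetical inline subset of the data shown above (two players per club).
goal_stats <- data.frame(
  Club = c("Atletico Madrid", "Atletico Madrid",
           "Real Madrid", "Real Madrid",
           "Chelsea", "Chelsea"),
  Player = c("Diego Costa", "Arda Turan",
             "Cristiano Ronaldo", "Gareth Bale",
             "Fernando Torres", "Demba Ba"),
  Goals = c(8, 4, 17, 6, 4, 3),
  GamesPlayed = c(9, 9, 11, 12, 9, 6),
  stringsAsFactors = FALSE
)
goal_stats$GoalsPerGame <- goal_stats$Goals / goal_stats$GamesPlayed

# Formula interface: "GoalsPerGame grouped by Club".
# The result keeps the column names Club and GoalsPerGame.
max_by_club <- aggregate(GoalsPerGame ~ Club, data = goal_stats, FUN = max)
print(max_by_club)
```

The result is a data frame with one row per club, which makes it convenient to merge back into other tables.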
The tapply function applies a function to subsets of a vector, where the subsets are defined by one or more grouping factors. It can be used to obtain the same result as follows:
> tapply(goal_stats$GoalsPerGame, goal_stats$Club, max)
Atletico Madrid   Bayern Munich         Chelsea     Real Madrid 
      0.8888889       0.4166667       0.5000000       1.5454545 
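Unlike aggregate(), tapply() returns a named array rather than a data frame, so individual groups can be looked up by name. The sketch below illustrates this, and also shows one possible way to recover which player holds the best ratio in each club using split() and which.max(). It assumes a small inline subset of the data; the variable names best_ratio and top_player are hypothetical.

```r
# Hypothetical inline subset of the data shown above (two players per club).
goal_stats <- data.frame(
  Club = c("Atletico Madrid", "Atletico Madrid",
           "Real Madrid", "Real Madrid",
           "Chelsea", "Chelsea"),
  Player = c("Diego Costa", "Arda Turan",
             "Cristiano Ronaldo", "Gareth Bale",
             "Fernando Torres", "Demba Ba"),
  Goals = c(8, 4, 17, 6, 4, 3),
  GamesPlayed = c(9, 9, 11, 12, 9, 6),
  stringsAsFactors = FALSE
)
goal_stats$GoalsPerGame <- goal_stats$Goals / goal_stats$GamesPlayed

# tapply() returns an array whose names are the group labels.
best_ratio <- tapply(goal_stats$GoalsPerGame, goal_stats$Club, max)
print(best_ratio["Chelsea"])

# Split the data frame by club, then pick the player in the row
# with the largest ratio within each piece.
top_player <- sapply(
  split(goal_stats, goal_stats$Club),
  function(d) d$Player[which.max(d$GoalsPerGame)]
)
print(top_player)
```

Note that which.max() returns only the first maximum, so ties within a club would report a single player.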