HW3 Paweł Fijałkowski XAI2022L #17

Binary file added .DS_Store
Binary file not shown.
5 changes: 5 additions & 0 deletions .gitignore
@@ -0,0 +1,5 @@
Homeworks/Homework-III/.DS_Store
Homeworks/.DS_Store
.DS_Store
Homeworks/.DS_Store
Homeworks/.DS_Store
Binary file added Homeworks/.DS_Store
Binary file not shown.
Binary file added Homeworks/Homework-I/.DS_Store
Binary file not shown.
Binary file not shown.
7,390 changes: 7,390 additions & 0 deletions Homeworks/Homework-I/Fijalkowski_Pawel/EPL_2021.csv

Large diffs are not rendered by default.

106 changes: 106 additions & 0 deletions Homeworks/Homework-I/Fijalkowski_Pawel/Homework1-XAI-2022.Rmd
@@ -0,0 +1,106 @@
---
title: "Homework1-XAI2022-Pawel-Fijalkowski"
author: "Paweł Fijałkowski"
date: '2022-03-30'
output:
html_document:
df_print: paged
---


## Initial:


Because the package was removed from CRAN, we had to install it directly from GitHub. We used RStudio Cloud, the only environment where the installation *somehow* worked.

```{r, load_data, message=FALSE}
EPL_2021 <- read.csv("EPL_2021.csv")
head(EPL_2021)
```

Each row in the data set is a shot attempt. The provided features ("league", "shotType", "home_team", "away_team", ...)
are meant to help predict the "xG" value, which is essentially the probability that the attempt ends in a
positive result (a goal).

Here we select the features relevant to our problem and transform the shot coordinates: (X, Y) -> (distance, angle).
```{r, message=FALSE}
library(dplyr)
data <- EPL_2021 %>% select(result, X, Y, xG, h_a, situation, shotType, home_goals, away_goals, lastAction) %>%
mutate(status = ifelse(result == "Goal", "1", "0")) %>%
mutate(distanceToGoal = sqrt((105 - (X * 105)) ^ 2 + (32.5 - (Y * 68)) ^ 2)) %>%
mutate(angleToGoal = abs(atan((7.32 * (105 - (X * 105))) / ((105 - (X * 105))^2 +
(32.5 - (Y * 68)) ^ 2 - (7.32 / 2) ^ 2)) * 180 / pi)) %>%
select(-X, -Y, -result)
```
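
The transformation above can be restated as a small standalone function, which makes it easier to check on a single shot. This is only a sketch: `shot_geometry` and its argument names are hypothetical, while the constants (105, 68, 32.5, 7.32) are copied from the chunk above. The angle term follows from the tangent of the difference between the directions to the two goalposts, which gives `tan(angle) = w * dx / (dx^2 + dy^2 - (w / 2)^2)` for goal width `w`. Note that half of the 68 m width would be 34, so the lateral reference 32.5 is kept here purely because the source uses it.

```r
# Hypothetical helper: restates the mutate() calls above as a plain function.
# All default constants are taken from the source chunk, not verified independently.
shot_geometry <- function(X, Y,
                          pitch_length = 105, pitch_width = 68,
                          goal_y = 32.5, goal_width = 7.32) {
  dx <- pitch_length - X * pitch_length  # metres from the shot to the goal line
  dy <- goal_y - Y * pitch_width         # lateral offset from the source's reference point
  distance <- sqrt(dx^2 + dy^2)
  # Angle (in degrees) under which the shooter sees the goal mouth.
  angle <- abs(atan(goal_width * dx /
                      (dx^2 + dy^2 - (goal_width / 2)^2)) * 180 / pi)
  data.frame(distanceToGoal = distance, angleToGoal = angle)
}

# Two illustrative shots: one close and central, one far and wide.
geometry <- shot_geometry(X = c(0.9, 0.7), Y = c(0.5, 0.2))
print(geometry)
```

The closer, more central shot should come out with a shorter distance and a wider angle than the second one.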


## Models:

We will predict the probability of scoring a goal (status) for each shot attempt.

```{r, message=FALSE}
library(DALEX)
library(DALEXtra)
library(ranger)
```


```{r, train_model, message = FALSE}
data_processed <- data %>% mutate(h_a = as.factor(h_a),
situation = as.factor(situation),
shotType = as.factor(shotType),
lastAction = as.factor(lastAction),
status = as.numeric(status))

model <- ranger::ranger(status ~., data = data_processed, classification = TRUE, probability = TRUE)
```
## Explanations

#### Making prediction:

```{r obs40}
obs40 <- data_processed[40, !(colnames(data_processed) %in% c("status"))]
predict_40 <- predict(model, obs40)
ifelse(predict_40$predictions[2] <= 0.5, 0, 1)
data_processed[40, "status"]
```
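
When comparing a thresholded prediction with the true label, two details are easy to get wrong: the label must come from the same row as the prediction, and the probability must come from the column of the positive class. A minimal sketch with toy values follows. `predicted_class` is a hypothetical helper, and the sketch assumes the probability matrix carries the class labels as column names, which is worth confirming with `colnames()` on the real predictions.

```r
# Hypothetical helper, not part of ranger or DALEX.
# Picks the positive-class column by name instead of by position.
predicted_class <- function(prob_matrix, positive = "1", threshold = 0.5) {
  stopifnot(positive %in% colnames(prob_matrix))
  as.integer(prob_matrix[, positive] > threshold)
}

# Toy stand-ins for the prediction matrix and the status column.
toy_probs <- matrix(c(0.9, 0.1,
                      0.3, 0.7),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(NULL, c("0", "1")))
toy_status <- c(0, 1)

# Use one index for both the prediction and the label.
row_id <- 1
pred <- predicted_class(toy_probs[row_id, , drop = FALSE])
actual <- toy_status[row_id]
print(pred == actual)
```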

The model correctly predicted that attempt #40 would not end in a goal.

Let's create an explainer!
```{r, explainer}
explainer <- explain(model, data = data_processed, y = as.numeric(data_processed$status))
```

Break Down for observation #40

```{r predict_parts40}
plot(predict_parts(explainer = explainer, new_observation = obs40))
```
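
To make the idea behind the plot concrete, here is a base R sketch of a break-down style decomposition on toy data. It is not the DALEX implementation: `break_down_sketch`, `toy_model`, and `toy_data` are hypothetical, and the variable order is simply the column order. Starting from the average prediction, each variable is fixed in turn at the observation's value, and the change in the average prediction is recorded as that variable's contribution.

```r
# Hypothetical sketch of the break-down idea in base R.
break_down_sketch <- function(predict_fun, data, obs, order = colnames(data)) {
  current <- data
  previous <- mean(predict_fun(current))   # average prediction (intercept)
  out <- c(intercept = previous)
  for (v in order) {
    current[[v]] <- obs[[v]]               # fix variable v at the observation's value
    updated <- mean(predict_fun(current))
    out[v] <- updated - previous           # change attributed to variable v
    previous <- updated
  }
  out
}

# Toy data and a toy "model" with an interaction between the two features.
toy_data <- data.frame(xG = c(0.05, 0.10, 0.40, 0.70),
                       angleToGoal = c(10, 20, 35, 60))
toy_model <- function(d) pmin(1, d$xG * (0.5 + d$angleToGoal / 100))
toy_obs <- toy_data[1, ]

bd <- break_down_sketch(toy_model, toy_data, toy_obs)
print(bd)
```

By construction, the intercept plus all contributions adds up to the model's prediction for the explained observation. Because the toy model contains an interaction, a different variable order would generally give different contributions.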

The angle to the goal makes a strongly negative contribution to the predicted probability of scoring.

SHAP for observation #40

```{r shap40}
plot(predict_parts(explainer = explainer, new_observation = obs40, type = 'shap', B = 4))
```
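
Order-dependent contributions can be averaged over several random variable orderings to reduce the influence of any single order. The base R sketch below illustrates that averaging on toy data. It assumes that `B` plays the role of the number of random orderings; `ordered_contributions`, `shap_sketch`, and the toy objects are hypothetical and do not reproduce DALEX internals.

```r
# Hypothetical helper: contributions for one fixed variable ordering.
ordered_contributions <- function(predict_fun, data, obs, order) {
  current <- data
  previous <- mean(predict_fun(current))
  out <- setNames(numeric(length(order)), order)
  for (v in order) {
    current[[v]] <- obs[[v]]               # fix variable v at the observation's value
    updated <- mean(predict_fun(current))
    out[v] <- updated - previous
    previous <- updated
  }
  out
}

# Hypothetical sketch: average the contributions over B random orderings.
shap_sketch <- function(predict_fun, data, obs, B = 4) {
  vars <- colnames(data)
  runs <- lapply(seq_len(B), function(i) {
    ordered_contributions(predict_fun, data, obs, sample(vars))[vars]
  })
  colMeans(do.call(rbind, runs))
}

toy_data <- data.frame(xG = c(0.05, 0.10, 0.40, 0.70),
                       angleToGoal = c(10, 20, 35, 60),
                       distanceToGoal = c(30, 22, 12, 6))
toy_model <- function(d) {
  pmin(1, d$xG * (0.5 + d$angleToGoal / 100) + 1 / (10 + d$distanceToGoal))
}
toy_obs <- toy_data[1, ]

set.seed(2022)
shap_values <- shap_sketch(toy_model, toy_data, toy_obs, B = 4)
print(shap_values)
```

Each ordering's contributions sum to the gap between the observation's prediction and the average prediction, so the averaged values do as well. A small `B` keeps the computation cheap but leaves the estimates noisy.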

We can clearly see that the low quality of the attempt (xG) and angleToGoal both contributed negatively.

Next, we find two observations in which the same feature has opposite effects.

```{r 37}
obs37 <- data_processed[37, !(colnames(data_processed) %in% c("status"))]
plot(predict_parts(explainer = explainer, new_observation = obs37))
```

```{r 120}
obs120 <- data_processed[120, !(colnames(data_processed) %in% c("status"))]
plot(predict_parts(explainer = explainer, new_observation = obs120))
```

As we can see, the "xG" value can contribute negatively (observation 120) or positively (observation 37) to the predicted status.



1,780 changes: 1,780 additions & 0 deletions Homeworks/Homework-I/Fijalkowski_Pawel/Homework1-XAI-2022.html

Large diffs are not rendered by default.

Binary file added Homeworks/Homework-II/.DS_Store
Binary file not shown.
Binary file added Homeworks/Homework-II/Fijalkowski_Pawel/.RData
Binary file not shown.
118 changes: 118 additions & 0 deletions Homeworks/Homework-II/Fijalkowski_Pawel/.Rhistory
@@ -0,0 +1,118 @@
install.packages("devtools")
devtools::install_github("JaseZiv/worldfootballR")
library(devtools)
library(devtools)
install.packages("devtools")
EPL_2021 <- read.csv("EPL_2021.csv")
head(EPL_2021)
View(EPL_2021)
distinct(EPL_2021$result)
set(EPL_2021$result)
unique(EPL_2021$result)
colnames(EPL_2021)
library(dplyr)
install.packages("dplyr")
library(dplyr)
df_1 <- EPL_2021 %>% select(result, X, Y, xG, h_a, situation, shotType, home_goals, away_goals, lastAction) %>%
mutate(status = ifelse(result == "Goal", "1", "0")) %>%
mutate(distanceToGoal = sqrt((105 - (X * 105)) ^ 2 + (32.5 - (Y * 68)) ^ 2)) %>%
mutate(angleToGoal = abs(atan((7.32 * (105 - (X * 105))) / ((105 - (X * 105))^2 +
(32.5 - (Y * 68)) ^ 2 - (7.32 / 2) ^ 2)) * 180 / pi)) %>%
select(-X, -Y, -result)
View(df_1)
library(dplyr)
data <- EPL_2021 %>% select(result, X, Y, xG, h_a, situation, shotType, home_goals, away_goals, lastAction) %>%
mutate(status = ifelse(result == "Goal", "1", "0")) %>%
mutate(distanceToGoal = sqrt((105 - (X * 105)) ^ 2 + (32.5 - (Y * 68)) ^ 2)) %>%
mutate(angleToGoal = abs(atan((7.32 * (105 - (X * 105))) / ((105 - (X * 105))^2 +
(32.5 - (Y * 68)) ^ 2 - (7.32 / 2) ^ 2)) * 180 / pi)) %>%
select(-X, -Y, -result)
library(autoEDA)
install.packages("autoEDA")
library(autoEDA)
install.packages("autoEDA")
View(data)
library(DALEX)
install.packages("DALEX")
library(DALEX)
library(DALEXtra)
install.packages("DALEXtra")
install.packages("ranger")
library(DALEX)
library(DALEXtra)
library(ranger)
model <- ranger::ranger(status ~., data = data, classification = TRUE, probability = TRUE)
data_processed <- data %>% mutate(h_a = as.factor(h_a),
situation = as.factor(situation),
shotType = as.factor(shotType),
lastAction = as.factor(lastAction),
status = as.numeric(status))
model <- ranger::ranger(status ~., data = data_processed, classification = TRUE, probability = TRUE)
obs37 <- data_processed[37, !(colnames(newdata) %in% c("status"))]
obs37 <- data_processed[37, !(colnames(data_processed) %in% c("status"))]
predict_37 <- predict(model, obs37)
# print(ifelse(predict_12$predictions[2]<=0.5,0,1))
# value_12 <- newdata[12, "status"]
# print(value_12)
View(predict_37)
obs37 <- data_processed[37, !(colnames(data_processed) %in% c("status"))]
predict_37 <- predict(model, obs37)
ifelse(predict_12$predictions[2] <= 0.5, 0, 1)
obs37 <- data_processed[37, !(colnames(data_processed) %in% c("status"))]
predict_37 <- predict(model, obs37)
ifelse(predict_37$predictions[2] <= 0.5, 0, 1)
data_processed[12, "status"]
predict_37$predictions[2]
obs40 <- data_processed[40, !(colnames(data_processed) %in% c("status"))]
predict_40 <- predict(model, obs40)
ifelse(predict_40$predictions[2] <= 0.5, 0, 1)
data_processed[12, "status"]
predict_40$predictions[2]
explainer <- explain(model, data = data_processed, y = as.numeric(data_processed$status))
plot(predict_parts(explainer = explainer, new_observation = obs40))
shap <- predict_parts(explainer = explainer, new_observation = obs40, type = 'shap', B = 4)
plot(shap)
heatmap(as.matrix(data_processed[, !(colnames(data_processed) %in% c("status"))]))
obs37 <- data_processed[37, !(colnames(data_processed) %in% c("status"))]
plot(predict_parts(explainer = explainer, new_observation = obs37))
obs120 <- data_processed[120, !(colnames(data_processed) %in% c("status"))]
plot(predict_parts(explainer = explainer, new_observation = obs120))
EPL_2021 <- read.csv("EPL_2021.csv")
head(EPL_2021)
library(dplyr)
data <- EPL_2021 %>% select(result, X, Y, xG, h_a, situation, shotType, home_goals, away_goals, lastAction) %>%
mutate(status = ifelse(result == "Goal", "1", "0")) %>%
mutate(distanceToGoal = sqrt((105 - (X * 105)) ^ 2 + (32.5 - (Y * 68)) ^ 2)) %>%
mutate(angleToGoal = abs(atan((7.32 * (105 - (X * 105))) / ((105 - (X * 105))^2 +
(32.5 - (Y * 68)) ^ 2 - (7.32 / 2) ^ 2)) * 180 / pi)) %>%
select(-X, -Y, -result)
library(DALEX)
library(DALEXtra)
library(ranger)
data_processed <- data %>% mutate(h_a = as.factor(h_a),
situation = as.factor(situation),
shotType = as.factor(shotType),
lastAction = as.factor(lastAction),
status = as.numeric(status))
model <- ranger::ranger(status ~., data = data_processed, classification = TRUE, probability = TRUE)
obs40 <- data_processed[40, !(colnames(data_processed) %in% c("status"))]
predict_40 <- predict(model, obs40)
ifelse(predict_40$predictions[2] <= 0.5, 0, 1)
data_processed[12, "status"]
explainer <- explain(model, data = data_processed, y = as.numeric(data_processed$status))
plot(predict_parts(explainer = explainer, new_observation = obs40))
plot(predict_parts(explainer = explainer, new_observation = obs40, type = 'shap', B = 4))
obs37 <- data_processed[37, !(colnames(data_processed) %in% c("status"))]
plot(predict_parts(explainer = explainer, new_observation = obs37))
obs120 <- data_processed[120, !(colnames(data_processed) %in% c("status"))]
plot(predict_parts(explainer = explainer, new_observation = obs120))
EPL_2021 <- read.csv("EPL_2021.csv")
head(EPL_2021)
library(dplyr)
data <- EPL_2021 %>% select(result, X, Y, xG, h_a, situation, shotType, home_goals, away_goals, lastAction) %>%
mutate(status = ifelse(result == "Goal", "1", "0")) %>%
mutate(distanceToGoal = sqrt((105 - (X * 105)) ^ 2 + (32.5 - (Y * 68)) ^ 2)) %>%
mutate(angleToGoal = abs(atan((7.32 * (105 - (X * 105))) / ((105 - (X * 105))^2 +
(32.5 - (Y * 68)) ^ 2 - (7.32 / 2) ^ 2)) * 180 / pi)) %>%
select(-X, -Y, -result)
model <- ranger::ranger(xG~., data = data, classification = TRUE, probability = TRUE)