The testGGPlot() function fails because ggplot2 stores identical plots differently depending on layers #72

Open
kplevoet opened this issue Oct 21, 2021 · 0 comments


kplevoet commented Oct 21, 2021

It has already been noted (see this issue) that the testGGPlot() function is "too rigorous", in that it detects differences between an expected plot and a (student-)generated plot that actually look identical in R. The problem is not with testGGPlot(), however, which does what it is supposed to do. The issue is that ggplot2 stores a graph differently depending on which layers are used to build it. This can be illustrated with the following two graphs, which both draw a LOESS smoother (with its confidence band) of dist against speed in the cars data:

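```r
library(ggplot2)

# The same LOESS smoother, once via a geom layer and once via a stat layer
p1 <- ggplot(cars, aes(x = speed, y = dist)) + geom_smooth(method = "loess", formula = y ~ x)
p2 <- ggplot(cars, aes(x = speed, y = dist)) + stat_smooth(method = "loess", formula = y ~ x)
```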

The first graph uses a geom layer and the second a stat layer, but both look identical when drawn in R (as can easily be checked with print(p1) and print(p2)). However, the objects p1 and p2 returned by ggplot2 are not the same, as the function all.equal() shows:

```r
all.equal(p1, p2)
 [1] "Component “layers”: Component 1: Component 2: Names: 3 string mismatches"
 [2] "Component “layers”: Component 1: Component 2: Component 1: 1 element mismatch"
 [3] "Component “layers”: Component 1: Component 2: Component 2: 'is.NA' value mismatch: 0 in current 1 in target"
 [4] "Component “layers”: Component 1: Component 2: Component 3: 'is.NA' value mismatch: 1 in current 0 in target"
 [5] "Component “layers”: Component 1: Component 4: Names: 5 string mismatches"
```

[R output truncated]
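One way to confirm that the difference lies only in how the plots are stored, and not in what is drawn, is to compare the data each layer renders. ggplot2's layer_data(plot, i) builds the plot and returns the data frame drawn by layer i. The sketch below assumes the list-based plot objects of ggplot2 3.x used in this issue; for these two plots the stored objects should compare unequal while the rendered data should agree.

```r
library(ggplot2)

p1 <- ggplot(cars, aes(x = speed, y = dist)) + geom_smooth(method = "loess", formula = y ~ x)
p2 <- ggplot(cars, aes(x = speed, y = dist)) + stat_smooth(method = "loess", formula = y ~ x)

# The stored plot objects differ ...
isTRUE(all.equal(p1, p2))

# ... but layer_data() returns what layer 1 actually draws
# (the fitted curve plus its confidence band), and that can be compared directly
d1 <- layer_data(p1, 1)
d2 <- layer_data(p2, 1)
isTRUE(all.equal(d1, d2))
```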

All differences between p1 and p2 appear to lie in the first element of each object's layers component, so we can have a look at what this element contains:

```r
names(p1[["layers"]][[1]])
 [1] "mapping"     "geom_params" "show.legend" "stat_params" "stat"        "inherit.aes" "geom"        "position"    "super"
[10] "data"        "aes_params"
names(p2[["layers"]][[1]])
 [1] "mapping"     "geom_params" "show.legend" "stat_params" "stat"        "inherit.aes" "geom"        "position"    "super"
[10] "data"        "aes_params"
```

Further exploration shows that the differences reside in the geom_params and stat_params components. For geom_params, the only difference is the order of the elements:

```r
p1[["layers"]][[1]]$geom_params
$na.rm
[1] FALSE

$orientation
[1] NA

$se
[1] TRUE

p2[["layers"]][[1]]$geom_params
$se
[1] TRUE

$na.rm
[1] FALSE

$orientation
[1] NA
```
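Because the two geom_params lists differ only in order, sorting both by element name before comparing them makes the difference disappear. This is a minimal sketch assuming the ggplot2 3.x layer structure shown above; sort_by_name() is a hypothetical helper, not part of ggplot2 or testGGPlot().

```r
library(ggplot2)

p1 <- ggplot(cars, aes(x = speed, y = dist)) + geom_smooth(method = "loess", formula = y ~ x)
p2 <- ggplot(cars, aes(x = speed, y = dist)) + stat_smooth(method = "loess", formula = y ~ x)

# Hypothetical helper: reorder a named list alphabetically by element name
sort_by_name <- function(x) x[order(names(x))]

g1 <- p1[["layers"]][[1]]$geom_params
g2 <- p2[["layers"]][[1]]$geom_params

identical(g1, g2)                              # element order is taken into account
all.equal(sort_by_name(g1), sort_by_name(g2))  # element order is ignored
```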

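For stat_params, however, the elements of p1 are a subset of the elements of p2 (again in a different order). The additional elements in p2 (n, fullrange, level, method.args and span) hold the default values of stat_smooth(), which are not stored when geom_smooth() is used, so both layers still fit the same smoother: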

```r
p1[["layers"]][[1]]$stat_params
$na.rm
[1] FALSE

$orientation
[1] NA

$se
[1] TRUE

$method
[1] "loess"

$formula
y ~ x

p2[["layers"]][[1]]$stat_params
$method
[1] "loess"

$formula
y ~ x

$se
[1] TRUE

$n
[1] 80

$fullrange
[1] FALSE

$level
[1] 0.95

$na.rm
[1] FALSE

$orientation
[1] NA

$method.args
list()

$span
[1] 0.75
```
It is not immediately obvious how to deal with this. Writing a custom function that tests geom_params and stat_params for equality (and assuming that no other components differ for other plot types) is probably too laborious. The solution seems to be to tell students explicitly which layers to use.
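An alternative that avoids enumerating each layer's defaults would be to compare what the layers draw rather than how they are stored. The helper same_drawn_layers() below is hypothetical (it is not part of ggplot2, and nothing in this issue suggests testGGPlot() works this way); it builds both plots with ggplot_build() and compares the per-layer data frames with all.equal(). It is a sketch assuming ggplot2 3.x, not a drop-in replacement.

```r
library(ggplot2)

# Hypothetical helper: TRUE when both plots have the same number of layers
# and each layer draws the same data (up to a numeric tolerance)
same_drawn_layers <- function(expected, generated, tolerance = 1.5e-8) {
  # ggplot_build() computes stats, positions and scales; $data holds one
  # data frame per layer with the values that are actually drawn
  d_exp <- ggplot_build(expected)$data
  d_gen <- ggplot_build(generated)$data
  if (length(d_exp) != length(d_gen)) return(FALSE)
  same <- vapply(
    seq_along(d_exp),
    function(i) isTRUE(all.equal(d_exp[[i]], d_gen[[i]], tolerance = tolerance)),
    logical(1)
  )
  all(same)
}

p1 <- ggplot(cars, aes(x = speed, y = dist)) + geom_smooth(method = "loess", formula = y ~ x)
p2 <- ggplot(cars, aes(x = speed, y = dist)) + stat_smooth(method = "loess", formula = y ~ x)
p3 <- ggplot(cars, aes(x = speed, y = dist)) + geom_smooth(method = "lm", formula = y ~ x)

same_drawn_layers(p1, p2)  # geom layer vs stat layer, same LOESS fit
same_drawn_layers(p1, p3)  # different smoothing method
```

Such a check ignores everything that is not layer data (titles, axis labels, themes), and it could accept two different geoms that happen to produce identical data, so it would complement rather than replace a structural comparison.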
