Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make java home configurable per-module #3716

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

albertpchen
Copy link

This PR aims to address #3480.

Changes:

  • adds coursier-jvm as a dependency to the util module, this library is for fetching JVMs using coursier and the coursier jvm-index
  • add a def JavaHome: Task[Option[PathRef]] task to ZincWorkerModule. This defaults to None, which is the existing behavior of using mill's java home.
  • add ZincWorkerModule.ForJvm dynamic cross module. This fetches the coursier jvm-index and populates the list of zinc worker modules based on that. This allows you to fetch a different Jvm and create a ZincWorkerModule e.g. ZincWorkerModule.ForJvm("temurin:1.11.0.24"). This uses coursier to fetch and cache the jvm and the jvm-index
  • updates the mockito and commons-io examples to use the new configuration options. Now these examples should run even when mill itself is not running on java 11.
    • This also required adding -encoding utf-8 to the projects' javac options.
  • update the JmhModule.generateBenchmarkSources task to not set jvmArgs = javacOpts. Since we now need the -encoding argument and that is not a valid jvm argument. The old behavior seems like a bug to me?
  • update the run and test tasks to use the zincWorker's java home if a different one specified

A few things I'm not sure about:

  • the ZincWorkerModule.ForJvm dynamic cross module will try to fetch the jvm-index on initialization. is this OK?
  • the tests will now fetch Jvms using coursier. maybe this is a bit too much to run tests? I'm not sure now to mock this out though/what the best way to test these changes are.

@albertpchen
Copy link
Author

I think I'm running into the same issue as this sbt/sbt-assembly#496. coursier-jvm pulls in shapless which has the problematic class. not sure what the correct fix is but I can think of a few options

  1. add export LC_ALL=C.UTF-8 to the workflow as mentioned in the issue
  2. since we might not want shapeless as a dependency anyway, reimplement the parts of coursier-jvm that we need. I don't think this should be too hard, since coursier already has all the fetching and caching logic

@lihaoyi
Copy link
Member

lihaoyi commented Oct 11, 2024

At a first skim this looks great. I didnt know coursier could fetch JVMs or that it was so easy to integrate

Is it possible to configure a module to point it at a JVM not listed in coursier's index? I imagine there would be some use cases for that

The cross module stuff is lazily initialized then cached, so it should be OK to perform heavy downloads and initialization, but is it possible to move the download into a task? That would help benefit from parallelization and log handling and error reporting and all that

@alexarchambault should probably review this as well, since he knows more about coursier than i do. One question is whether coursier JVM downloads are safe to run in parallel, or if they need to be locked/retries in mill and/or made safe upstream

@albertpchen
Copy link
Author

Is it possible to configure a module to point it at a JVM not listed in coursier's index? I imagine there would be some use cases for that

yes this is possible. I added def javaHome: Task[Option[PathRef]] task to the base ZincWorkerModule trait that can be overridden to set the JVM to an arbitrary path. we could also expose a configuration option to change the url we fetch the index from or make the in-memory JvmIndex mapping a configuration option. I'm not sure how many knobs should be exposed though, so I just added the one javaHome option.

The cross module stuff is lazily initialized then cached, so it should be OK to perform heavy downloads and initialization, but is it possible to move the download into a task? That would help benefit from parallelization and log handling and error reporting and all that

How do you make the cross value for a cross module the output of a task? I'm not sure how, since it needs to be invoked outside of a Task { ... } block

@lihaoyi
Copy link
Member

lihaoyi commented Oct 11, 2024

Ah I see, you mean the index itself, and not the JVMs that the index references.

I think what we should do here is avoid using a Cross module to list out the various ZincWorkerModule.ForJvms, and instead just make it something that the user has to instantiate in their build. Something like:

object myZincWorker = new ZincWorkerModule.ForJvm("graalvm-community:23.0.0") 
object core extends JavaModule {
  override def zincWorker = ModuleRef(myZincWorker)
  object test extends JavaTests with TestModule.Junit4
}

That is a tiny bit more boilerplate in the build file, but avoids the need for Mill to know about the contents of the coursier index entirely, and avoids having a huge cross module inside ZincWorkerModule that is mostly unused.

We still want to provide a way to download the index and list out its entries for debugging purposes, but that can be a normal task or command since it won't change the shape of the moduletree/taskgraph

@lihaoyi
Copy link
Member

lihaoyi commented Oct 11, 2024

@alexarchambault one question about https://github.com/coursier/jvm-index: do we offer any guarantees on that? e.g. does Coursier need to fetch it from Github every time? I'm concerned about having an operational dependency here, e.g. I don't want people's Mill builds to start failing if Github has an outage (that's also why we moved the Mill assembly jar from Github to Maven Central in 0.11.x)

To avoid a dependency on Github, maybe we can publish the index to maven central somewhere, since that's already a piece of infrastructure we are bound to? Or we can bundle snapshots of it with Mill when we make releases

@lihaoyi
Copy link
Member

lihaoyi commented Oct 11, 2024

Come to think of it, if the index is bundled with Mill, then having it used to initialize the Cross module becomes not an issue at all

@alexarchambault
Copy link
Contributor

one question about https://github.com/coursier/jvm-index: do we offer any guarantees on that? e.g. does Coursier need to fetch it from Github every time? I'm concerned about having an operational dependency here, e.g. I don't want people's Mill builds to start failing if Github has an outage (that's also why we moved the Mill assembly jar from Github to Maven Central in 0.11.x)

It's fetched by coursier just like a snapshot artifact. So it lives in the coursier cache (cs get https://github.com/coursier/jvm-index/raw/master/index.json fetches it and prints its path), and coursier checks for updates if the last check is longer than the coursier TTL (24h by default). If it's not in cache, it needs to be downloaded from GitHub, yes.

@alexarchambault
Copy link
Contributor

It used to be pushed to Maven Central too, some time ago. This can always be re-instated (coursier/jvm-index#277).

@alexarchambault
Copy link
Contributor

Come to think of it, if the index is bundled with Mill, then having it used to initialize the Cross module becomes not an issue at all

It's up to you, but beware that it's getting more and more heavyweight (~1.8 MB uncompressed, 110 kB gzipped)

@alexarchambault
Copy link
Contributor

@albertpchen It looks like you inlined quite some code from coursier-jvm. Is it because of the shapeless issue you mentioned? It seems it's pulled by coursier-jvm by mistake (or not to break bin compat, I don't recall). You can safely exclude it manually when you pull it in Mill.

@lihaoyi
Copy link
Member

lihaoyi commented Oct 11, 2024

@alexarchambault I think 110kb is fine to bundle, the other issue is that it would fall stale relatively wuickly as new versions are added

I glanced through the indices, and it seems like a lot of the download paths are computed based on the artifact version. Could the logic to compute these be bundled into Mill, rather than bundling the generated JSON? That would have the effect that new versions of known distirbutions would automatically be supported, and we would only need to make a new release if new distributions are added or the URL pattern for a distribution changes (both of which should be much rarer than existing distirbutions adding new versions).

We could still bundle a list of known good versions, for debugging and error reporting purposes, but it would mean we could support unknown good versions as well

@alexarchambault
Copy link
Contributor

alexarchambault commented Oct 11, 2024

I glanced through the indices, and it seems like a lot of the download paths are computed based on the artifact version

Somehow, except there are often subtleties. In https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.7+7/OpenJDK17U-jdk_x64_mac_hotspot_17.0.7_7.tar.gz for example, 17.0.7+7 is the "build" version, and it ends up as both 17.0.7+7 and 17.0.7_7 in the URL.

@lihaoyi
Copy link
Member

lihaoyi commented Oct 11, 2024

I glanced through the indices, and it seems like a lot of the download paths are computed based on the artifact version

Somehow, except there are often subtleties. In https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.7+7/OpenJDK17U-jdk_x64_mac_hotspot_17.0.7_7.tar.gz for example, 17.0.7+7 is the "build" version, and it ends up as both 17.0.7+7 and 17.0.7_7 in the URL.

Yes, but that is accounted for by the Scala code in coursier/jvm-index right? So if we bundle the few kb of classfiles we get all these special cases included, as well as support for any future versions (unless they add additional special cases)

@alexarchambault
Copy link
Contributor

alexarchambault commented Oct 11, 2024

Yes, but that is accounted for by the Scala code in coursier/jvm-index right?

What do you mean? Right now, the index basically look like this:

$ cat "$(cs get https://github.com/coursier/jvm-index/raw/master/index.json)" | jq '.darwin.amd64["jdk@temurin"]'
…
  "1.17": "tgz+https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17%2B35/OpenJDK17-jdk_x64_mac_hotspot_17_35.tar.gz",
  "1.17.0.1": "tgz+https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.1%2B12/OpenJDK17U-jdk_x64_mac_hotspot_17.0.1_12.tar.gz",
  "1.17.0.10": "tgz+https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.10%2B7/OpenJDK17U-jdk_x64_mac_hotspot_17.0.10_7.tar.gz",
  "1.17.0.11": "tgz+https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.11%2B9/OpenJDK17U-jdk_x64_mac_hotspot_17.0.11_9.tar.gz",
  "1.17.0.12": "tgz+https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.12%2B7/OpenJDK17U-jdk_x64_mac_hotspot_17.0.12_7.tar.gz",
…

There's some logic in the coursier-jvm module, so that the superfluous 1. is ignored, and versions such as just 17 actually match the highest 17.x version… if there's no exact match (there's one here, but 17.0 will actually match the highest 17.0.x version)

@alexarchambault
Copy link
Contributor

But we can probably "compress" that a bit, with a pattern and "build" version listings (in the snippet above, 17+35, 17.0.1+12, etc., rather than 17, 17.0.1, etc.) and allow users to pass newer build versions if they'd like. But these build versions don't look handy…

@lihaoyi
Copy link
Member

lihaoyi commented Oct 11, 2024

@alexarchambault i mean the scala code in https://github.com/coursier/jvm-index/tree/master/src; isnt that what generates the json?

@alexarchambault
Copy link
Contributor

i mean the scala code in https://github.com/coursier/jvm-index/tree/master/src; isnt that what generates the json?

It does, yeah

@lihaoyi
Copy link
Member

lihaoyi commented Oct 11, 2024

Could that coxe be bundled with Mill to generate the download URL from a jdk version that gets dynamically passed in? I see some parts of it are passing github tokens around, but its unclear to me how much of the logic needs to talk to github vs how much is just mangling strings locally

@albertpchen
Copy link
Author

@albertpchen It looks like you inlined quite some code from coursier-jvm. Is it because of the shapeless issue you mentioned? It seems it's pulled by coursier-jvm by mistake (or not to break bin compat, I don't recall). You can safely exclude it manually when you pull it in Mill.

yeah thanks for the tip. I've updated the PR

def ivyDeps = Agg(
build.Deps.coursier,
build.Deps.coursierJvm.exclude(
"com.github.alexarchambault" -> "argonaut-shapeless_6.3_2.13",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment here explaining why it is excluded, for future reference


// The first eight bytes are magic numbers followed by two bytes for major version and two bytes for minor version
// We are overriding to java 23 which corresponds to class file version 67
os.read.bytes(coreClassFile.get, 4, 4).toSeq == Seq[Byte](0, 0, 0, 67)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also add a test to exercise .test and .run and verify that System.getProperty("java.version") returns the expected value. We should also exercise this for two different java versions, to verify that it's not just picking up the environmental default

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should overriding the java version also affect .launcher and .assembly?

  • launcher is explicitly not meant to be portable: it's meant to be something you run on the same machine where it was built, so any JVM you installed would still be present

  • assembly is meant to be portable, e.g. run on a different machine from which it was built, so we probably can't rely on the configured java home for that

@lihaoyi
Copy link
Member

lihaoyi commented Oct 13, 2024

So I looked up how Gradle does this, and it appears they depend on the Adoptium service API https://api.adoptium.net/. I'm reluctant to have an additional service dependency from Mill, because it means that we need to start worrying about rate limiting, uptime, SLAs, outages, etc.. "Pull from latest master on Github" also definitely counts as "service dependency", because a Github outage or someone fat-fingering a bad push would then be able to cause an outage in Mill users.

Here's how I think we should proceed:

  1. @alexarchambault can we re-enable publishing stable versions of coursier/jvm-index to maven central? That would give us an immutable artifact to depend on and provide stronger guarantees than pulling stuff directly off of github, improving hermeticity/reproducibility reducing the blast radius if anything goes wrong (github down, someone fat-fingers a push, etc.). We pay a price of staleness, but I think that's a tradeoff that's worth it.

  2. Once that's done, we move this PR's runtime dependency to maven central instead of github/jvm-index master, and we can merge this and close out the bounty.

  3. We can have a longer-term follow up discussion of how to reduce the staleness issue. I have some ideas, e.g. breaking up the JVM index into separate sonatype artifacts per-version, so that newer artifact version-numbers/download-urls can be added piecemeal and be picked up by Mill/Coursier without having to re-publish the whole index and bump the index version in Mill. But that can be done later and shouldn't block the merging of this PR

Copy link
Member

@lefou lefou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should come with documentation and examples:

  • How to configure a specific Java version?
  • How to find available Java versions?
  • What's the default?
  • How can I use a locally installed JVM which path is hold in a system environment variable MY_PROJECT_JAVA_HOME?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants