Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(Repository): Use KnownProvenance instead of VcsInfo #8764

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

pepper-jk
Copy link
Contributor

@pepper-jk pepper-jk commented Jun 17, 2024

In order to allow source artifacts as well as local source code to be scanned, we require a more generallized Repository, which allows any type of KnownProvenance. While we still set the signature of Repository to allow KnownProvenances, this commit focuses on accomidateing RepositoryProvenance as then main input for now.

Therefore we wrap VcsInfo with RepositoryProvenance, whenever it is used as input, and unwrap it, when it is produced as output. This gives us a quick update of the now depreacted code, without the necessity to support all KnownProvenance types from the get go.

We also update any realted tests, mostly the expected OrtResults, but also some Repository definitions and calls.

To avoid having vcs and vcsProcessed appear in the OrtResult output, we change them to be a method (fun) instead of a variable (val). This is still a soft refactor, to keep the widely used vcsProcessed available for now. We might still remove it later.

Signed-off-by: Jens Keim [email protected]

Part of: #8803.

@pepper-jk pepper-jk force-pushed the repo-known-provenance branch 8 times, most recently from c5852c3 to dd222f1 Compare June 20, 2024 14:32
@pepper-jk pepper-jk force-pushed the repo-known-provenance branch 4 times, most recently from 3f1306d to fdf3ebb Compare June 20, 2024 16:20
Copy link

codecov bot commented Jun 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 67.79%. Comparing base (37f4aa5) to head (ba2a2a2).

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #8764   +/-   ##
=========================================
  Coverage     67.79%   67.79%           
  Complexity     1164     1164           
=========================================
  Files           243      243           
  Lines          7711     7711           
  Branches        861      861           
=========================================
  Hits           5228     5228           
  Misses         2127     2127           
  Partials        356      356           
Flag Coverage Δ
funTest-non-docker 33.96% <100.00%> (ø)
test 38.12% <0.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pepper-jk pepper-jk force-pushed the repo-known-provenance branch 13 times, most recently from 26e0a8f to a210a69 Compare June 26, 2024 16:25
@pepper-jk pepper-jk force-pushed the repo-known-provenance branch 2 times, most recently from 729a29d to 2f69343 Compare June 26, 2024 16:55
@pepper-jk pepper-jk marked this pull request as ready for review June 26, 2024 17:11
@pepper-jk pepper-jk requested review from a team as code owners June 26, 2024 17:11
@pepper-jk pepper-jk changed the title Draft: feat(Repository): Use KnownProvenance instead of VcsInfo feat(Repository): Use KnownProvenance instead of VcsInfo Jun 26, 2024
@@ -83,6 +83,11 @@ data class VcsInfo(
* Return a [VcsInfoCurationData] with the properties from this [VcsInfo].
*/
fun toCuration() = VcsInfoCurationData(type, url, revision, path)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message nits:

  • Duplicate "method" in first sentence.
  • Please use passive voice, i.e. no "we", "I", "us" etc.

Copy link
Contributor Author

@pepper-jk pepper-jk Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I overhauled the commit message after the changes to the commit.

EDIT: had to force push the old version, in order to see your reviews on the correct commit.
I will go through everything first before pushing any changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed new commits with code changes only.
I will squash them once they are reviewed and accepted.
Afterwards I will update the commit messages and force push.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed the squashed commits with the overhauled messages.

@@ -83,6 +83,11 @@ data class VcsInfo(
* Return a [VcsInfoCurationData] with the properties from this [VcsInfo].
*/
fun toCuration() = VcsInfoCurationData(type, url, revision, path)

/**
* Return true if this vcs information matches the other vcs information.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/vcs/VCS/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs for this function require some special attention: The matching disregards the path property (otherwise just vcsInfoA == vcsInfoB could have been used for matching). So in a way this is a bit similar to a PackageCuration's isApplicableDisregardingVersion() function, and I'd name it as equalsDisregardingPath().

Copy link
Contributor Author

@pepper-jk pepper-jk Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I missed that. I'll change the name and refactored any related changes accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I change the name and the doc, should the same doc changes be applied to the matches method in RespositoryProvenance in the next commit:
f43b5d9#diff-db17c034876de657175ab28d452f2812913baba034a764c609990260ce101751

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily. The current docs already make clear the only the normalized VCS info is compared, so I think we're good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* provenance's normalized [VcsInfo][vcsInfo].
*/
override fun matches(other: Provenance): Boolean =
other is RepositoryProvenance && other.vcsInfo.normalize().matches(vcsInfo.normalize())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this disregarding the resolvedRevision? And if that's a mistake, why not simply use provenanveA == provenanceB for matching Provenances?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was not done on purpose.

why not simply use provenanveA == provenanceB for matching Provenances?

Because Repository used the equalsDisregardingPath style matches before, and I wanted to ensure the same behavior as before. IIRC some tests would also fail with just == between VcsInfo, which provenanveA == provenanceB would imply, correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which provenanveA == provenanceB would imply, correct?

Yes, == on data classes is a "recursive equals()".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comparison of resolvedRevision.

@@ -80,7 +80,9 @@ data class RepositoryProvenance(
val resolvedRevision: String
) : KnownProvenance {
init {
require(resolvedRevision.isNotBlank()) { "The resolved revision must not be blank." }
require(vcsInfo == VcsInfo.EMPTY || resolvedRevision.isNotBlank()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message nits:

  • s/vcs/VCS/
  • s/git/Git/
  • Odd indentation before "Both"
  • Again use passive voice

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic-wise, I'd prefer to have the clause order swapped. Usually we require resolvedRevision.isNotBlank() but if it is blank, the whole VcsInfo must be empty: resolvedRevision.isNotBlank() || vcsInfo == VcsInfo.EMPTY.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • s/vcs/VCS/

vcs is part of a test case name string. so changing the capitalization would erode the reference.
Maybe I should change the format to "fail if no vcs matches" to make this more clear and also explicitly call them test cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swapped the logic around.

@@ -42,6 +42,7 @@ import org.ossreviewtoolkit.model.AnalyzerResult
import org.ossreviewtoolkit.model.AnalyzerRun
import org.ossreviewtoolkit.model.OrtResult
import org.ossreviewtoolkit.model.Repository
import org.ossreviewtoolkit.model.RepositoryProvenance
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message:

  • Again use passive voice
  • Typo in "generallized" -> "generalized"
  • "Repository" -> "Repository class" (use backticks so it's clear you refer to the class name)
  • "accomidateing" -> "accommodating"
  • "then main" -> "the main"
  • "depreacted" -> "deprecated"
  • "realted" -> "related"


/**
* The configuration of the repository, parsed from [ORT_REPO_CONFIG_FILENAME].
*/
val config: RepositoryConfiguration = RepositoryConfiguration()
) {
companion object {
private const val DEFAULT_REVISION = ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more "empty" than "default". And I'd simply inline it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

nestedRepositories = emptyMap(),
config = RepositoryConfiguration()
)
}

fun vcs(): VcsInfo {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, this could be a getter-property annotated with JsonIgnore. Same below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Let's see if the tests like me.

@@ -80,7 +80,9 @@ data class RepositoryProvenance(
val resolvedRevision: String
) : KnownProvenance {
init {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why haven't you decided for keeping the require as-is, and make the tests construct instances with non-empty resolved revision?

I'm asking to me it seems that if repository provenance is sued, the VCS info should not be empty, because otherwise it would be an Unkown provenance, wouldn't it?

Copy link
Contributor Author

@pepper-jk pepper-jk Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are probably correct. Changing the tests¹ to use "main" as revision instead of blank, like I did for model/src/test/kotlin/OrtResultTest.kt, might be the better way to solve the issue with the tests, than allowing blank resolvedRevisions in RepositoryProvenance for VcsInfo.Empty.

However, the Analyzer also allows VcsInfo.Empty to be passed from workingTree to the Repository. And with VcsInfo.Empty workingTree will very likely also return an empty revision. So in order to not change that behavior of the Analyzer and Repository in this case, I added this exception to RepositoryProvenance.

I have not yet noticed any issues regarding non-Repository uses of the RepositoryProvenance.

Still you are bringing up some good points, maybe the Analyzer behavior could change in this case? Then we could revert the exception to RepositoryProvenance. We could dial this in once we handle other Provenances in the Analyzer.

¹ evaluator/src/test/kotlin/TestData.kt
model/src/test/kotlin/licenses/TestData.kt
reporter/src/testFixtures/kotlin/TestData.kt


/**
* A map of nested repositories, for example Git submodules or Git-Repo modules. The key is the path to the
* nested repository relative to the root of the main repository.
*/
@JsonInclude(JsonInclude.Include.NON_EMPTY)
val nestedRepositories: Map<String, VcsInfo> = emptyMap(),
val nestedRepositories: Map<String, KnownProvenance> = emptyMap(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nested repositories can remain VcsInfo or alternatively repository provenance, because by definition this can only be a repository.

Copy link
Contributor Author

@pepper-jk pepper-jk Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So Provenances have no hierarchy? The information required to create nestedRepositories is only available when using Git as a VCS?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, Git and GitRepo are the only implementors of WorkingTree.getNested().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now reverted the nestedRepositories changes (ac06be1) and will squash them into the prior commit (665ab20) once all checks pass.

Rebasing on current main was also no issue, so I guess I will be done soon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed the rebased and squashed version.

@fviernau
Copy link
Member

Can we have a GitHub issue which describes exactly the goal of this effort? In particular, mention explicitly in which cases we want to support local file provenance. (e.g. analyzer, scanner, scan storages, ...). I'm still worried about the added complexity of this refactoring, compared to the value it will provide, but having the goal clearly statet would at least ensure we are reviewing against the same outlined goal.

@sschuberth
Copy link
Member

Can we have a GitHub issue which describes exactly the goal of this effort?

I have started such an issue at #8803 for people to comment on.

@pepper-jk pepper-jk force-pushed the repo-known-provenance branch 3 times, most recently from ca44715 to 665ab20 Compare June 28, 2024 13:03
pepper-jk pushed a commit to pepper-jk/ort that referenced this pull request Jul 1, 2024
As was pointed out during the code review [1], there is now need
to change `nestedRepositories` to support `Provenance`s, as only
`git` is supported for `nestedRepositories`, which always contains
`vcsInfo`. In order to reduce the amount of changes for this PR,
any changes regarding `nestedRepositories` are now reverted.

[1] oss-review-toolkit#8764 (comment)

Signed-off-by: Jens Keim <[email protected]>
pepper-jk pushed a commit to pepper-jk/ort that referenced this pull request Jul 1, 2024
As was pointed out during the code review [1], there is now need
to change `nestedRepositories` to support `Provenance`s, as only
`git` is supported for `nestedRepositories`, which always contains
`vcsInfo`. In order to reduce the amount of changes for this PR,
any changes regarding `nestedRepositories` are now reverted.

[1] oss-review-toolkit#8764 (comment)

Signed-off-by: Jens Keim <[email protected]>
…ath`

Up until now the method `VcsInfo.matches(VcsInfo)` was only defined
and used inside the `Repository` class. With shifting from `VcsInfo`
to `RepositoryProvenance`, the matcher needs to be available inside
of `RepositoryProvenance`.
For that purpose, it gets moved into `VcsInfo` proper, in order to
be available anywhere `VcsInfo` is used.
It further gets a new name `equalsDisregardingPath`, since it disregards
the path, when matching `VcsInfo`s. [1]

[1] oss-review-toolkit@9d271b5#r1655333611

Signed-off-by: Jens Keim <[email protected]>
In preparation of replacing `VcsInfo` in `Repository` with
`KnownProvenance`, a method to match `Provenance` against
each other was added.
This inherited method, allows any `Provenace` to be matched
against any other. Since the type is verified to be equal
before matching any attributes, it can even match against
`UnknownProvenance`, no need to limit it to `KnownProvenance`.

Signed-off-by: Jens Keim <[email protected]>
This will allow tests using `Repository` objects with `VcsInfo.EMPTY` to
continue to function, even though `RepositoryProvenance` does not usually
allow blank revisions. By restricting the exception to `VcsInfo.EMPTY` any
other behavior of `RepositoryProvenance` should be retained.

Some unrelated `OrtResultTest`s had to be given non-blank revisions:
Both test cases `"fail if no vcs matches"` and `"use the correct vcs"`
don't hinge on the given blank revision, so to assure they do not fail
for unrelated reasons, the default Git revision "main" was added to the
example repos.

Signed-off-by: Jens Keim <[email protected]>
In order to allow source artifacts as well as local source code to
be scanned, a more generalized `Repository` class is required, which
allows any type of `KnownProvenance`. While the signature of the
`Repository` class is set to allow any `KnownProvenance` object,
this commit focuses on accommodating `RepositoryProvenance` as the
main input for now.

Therefore `VcsInfo` is wrapped within `RepositoryProvenance`, whenever
it is used as input, and unwraped, when it is produced as output.
This results in a faster update of the now deprecated code, without
the necessity to support all `KnownProvenance` types from the get go.

Any related tests are also updated, mostly the expected `OrtResults`,
but also some `Repository` definitions and calls.

To avoid having `vcs` and `vcsProcessed` appear in the `OrtResult`
output, getter properties were created and marked as `@JsonIgnore`.

Signed-off-by: Jens Keim <[email protected]>
@pepper-jk
Copy link
Contributor Author

@sschuberth @fviernau I know, you are still considering if this refactoring is the right way, but if you could find the time to review the current changes that would be great.

Since most changes regard code quality and commit messages, it would be great to know if the changes know meet your standards. Not only for this PR, but also for future work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants