This issue was moved to a discussion.

Work through data format example for GSD-2020-7471 #140

Closed
westonsteimel opened this issue Feb 5, 2022 · 10 comments
Labels
data format, help wanted (Extra attention is needed), idea (A new idea to explore as a group)

Comments

@westonsteimel
Contributor

westonsteimel commented Feb 5, 2022

I know @kurtseifried has documented some great stuff about potential data formats here, but I think it would be quite helpful to work through an actual example record, and I suggest having a try with GSD-2020-7471.

I like this one because it presents several common challenges that I would like us as a community to work on addressing. For instance, how do we want to handle package naming and versioning differences across various package managers? Here, the vulnerability is for Django, and for PyPI specifically we have PYSEC-2020-35 in the OSV format; however, what about for the Debian package where the name is python-django and there are fixes backported to earlier versions than PyPI?

If we end up using something like the OSV format for the primary GSD namespace, is this one OSV record whose affected entries cover the various ecosystems, package names, and versions? Or is it something like an entire OSV record for each ecosystem as a separate namespace (so each OS or packaging ecosystem could potentially have its own custom description, etc.)?

Or do we want a separate GSD id for each one and some parent record that unifies them?
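To make the first two options concrete, here is a rough Python sketch (not a proposal for the final format). The field names follow the OSV schema, but the version values are only illustrative and the Debian fixed version is an explicit placeholder rather than real Debian data. The `split_by_ecosystem` helper is hypothetical and simply shows that a per-ecosystem view can be derived from a single combined record.

```python
# Option A: a single OSV-style record with one "affected" entry per ecosystem.
# Version values are illustrative; the Debian one is a placeholder, not real data.
single_record = {
    "id": "GSD-2020-7471",
    "aliases": ["CVE-2020-7471", "PYSEC-2020-35"],
    "affected": [
        {
            "package": {"ecosystem": "PyPI", "name": "django"},
            "ranges": [{
                "type": "ECOSYSTEM",
                "events": [{"introduced": "3.0"}, {"fixed": "3.0.3"}],
            }],
        },
        {
            "package": {"ecosystem": "Debian", "name": "python-django"},
            "ranges": [{
                "type": "ECOSYSTEM",
                "events": [{"introduced": "0"},
                           {"fixed": "<debian-fixed-version>"}],
            }],
        },
    ],
}


def split_by_ecosystem(record):
    """Option B: derive one record per ecosystem from the combined record."""
    per_ecosystem = {}
    for entry in record["affected"]:
        ecosystem = entry["package"]["ecosystem"]
        if ecosystem not in per_ecosystem:
            # Copy the shared fields (id, aliases, ...) and start a fresh list.
            child = {k: v for k, v in record.items() if k != "affected"}
            child["affected"] = []
            per_ecosystem[ecosystem] = child
        per_ecosystem[ecosystem]["affected"].append(entry)
    return per_ecosystem
```

Going the other way (merging per-ecosystem records into one) is harder once each ecosystem carries its own description, which is part of why the choice matters.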

In this case I'd expect to see something for at least the following:

  • PyPI
  • Anaconda (typically same name and affected version ranges as PyPI)
  • Debian
  • Ubuntu
  • Alpine
  • Gentoo
  • Probably a bunch more Linux distros?
  • source control repository with commit-level info

I'll try to throw in some example JSON of a few possible approaches when I have some more time, but please start sharing any ideas you all have!

@joshbressers
Collaborator

I'd rather not overload fields to accomplish this. For example, I'm not a fan of the affected field in OSV; I think it's overly complex.

The last time I discussed this with anyone I was leaning towards one ID per package/ecosystem, but that was before we had a concept of a namespace.

I currently lean in the direction of using a lot of namespaces. For example, if we had a PURL-based namespace we would get a lot of functionality for free.
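As a rough illustration of what a PURL-based namespace could key on, here is a simplified Python sketch. The `make_purl` and `purl_key` helpers are hypothetical and skip the percent-encoding, qualifiers, and subpaths that the package URL spec defines, so a real implementation would more likely use an existing PURL library.

```python
def make_purl(pkg_type, name, version=None, namespace=None):
    """Build a simplified package URL: pkg:type/namespace/name@version."""
    parts = [pkg_type]
    if namespace:
        parts.append(namespace)
    parts.append(name)
    purl = "pkg:" + "/".join(parts)
    if version:
        purl += "@" + version
    return purl


def purl_key(purl):
    """Drop the version so the same package always maps to the same key."""
    return purl.split("@", 1)[0]


# The same upstream project under two ecosystem-specific names.
pypi = make_purl("pypi", "django", "3.0.2")
debian = make_purl("deb", "python-django", namespace="debian")
print(pypi)    # pkg:pypi/django@3.0.2
print(debian)  # pkg:deb/debian/python-django
```

The package type and namespace already encode the ecosystem and distro, so the naming difference between django and python-django stops being something the record format itself has to model.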

@kurtseifried
Contributor

Thinking out loud: namespaces give us the best of both worlds. You can have vendor/project/organization-specific namespaces, e.g. "debian.org" with whatever their data and formats are, and standards-based namespaces, e.g. "packageurl" or "purl" or "CPE" or "OVAL" or whatever format. That way people at least have a hint about what the data is and how to go about processing it. Ideally we also have tools to parse our files.
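A minimal sketch of how a parsing tool could use the namespace name as that hint. The record layout, the namespace keys ("osv", "purl"), and the parser functions here are all assumptions made up for illustration, not the actual GSD file format or gsd-tools API.

```python
def parse_osv(data):
    """Package names from an OSV-shaped namespace."""
    return sorted(a["package"]["name"] for a in data.get("affected", []))


def parse_purl_list(data):
    """Package names from a list of package URLs (version stripped)."""
    return sorted(p.split("@", 1)[0].rsplit("/", 1)[-1] for p in data)


# Hypothetical registry: standards-based namespaces get a known parser.
PARSERS = {"osv": parse_osv, "purl": parse_purl_list}


def package_names(record):
    """Return {namespace: names}, or None where no parser is registered."""
    names = {}
    for namespace, data in record.get("namespaces", {}).items():
        parser = PARSERS.get(namespace)
        # Vendor-specific namespaces stay opaque instead of being guessed at.
        names[namespace] = parser(data) if parser else None
    return names


record = {
    "namespaces": {
        "osv": {"affected": [{"package": {"ecosystem": "PyPI", "name": "django"}}]},
        "purl": ["pkg:pypi/django@3.0.2", "pkg:deb/debian/python-django"],
        "debian.org": {"raw": "vendor-specific payload"},
    }
}
print(package_names(record))
```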

@westonsteimel
Contributor Author

Yeah, I think I've come to a similar conclusion. Similar to what we have with the cve.org and NVD namespaces, I was thinking we would also have a namespace per organization, holding more of a raw format of what each one provides (I was thinking of starting with the GitHub security advisories and GitLab community advisories). Having that data is helpful, especially when the sources don't agree with each other, as it allows people to assign a degree of confidence based on the sources they trust. When sources disagree, we can flag that for manual review and reach out to get it corrected upstream when possible. GitLab at least is very happy to get that feedback. GitHub can be a bit more difficult since the advisories are often at a per-project level.
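Flagging disagreements could be as simple as the sketch below. The input shape (namespace mapped to package mapped to fixed version) is an assumption, and the conflicting gitlab.com value is deliberately made up to show a mismatch; it is not real advisory data.

```python
def find_disagreements(claims):
    """Return packages whose fixed version differs between source namespaces.

    claims: {namespace: {package: fixed_version}}
    """
    by_package = {}
    for namespace, packages in claims.items():
        for package, fixed in packages.items():
            by_package.setdefault(package, {})[namespace] = fixed
    # Keep only packages where the sources report more than one distinct value.
    return {
        package: sources
        for package, sources in by_package.items()
        if len(set(sources.values())) > 1
    }


claims = {
    "github.com": {"pkg:pypi/django": "3.0.3"},
    "gitlab.com": {"pkg:pypi/django": "3.0.4"},  # made-up conflicting value
    "nvd.nist.gov": {"pkg:pypi/django": "3.0.3"},
}
for package, sources in find_disagreements(claims).items():
    print(f"needs manual review: {package} -> {sources}")
```

Comparing version strings for equality only works if every source is first normalized to the same key and version format, which is one more argument for a PURL-style identifier.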

@westonsteimel
Contributor Author

But I also think there is value in having some sort of standard, easily consumable format for most of those. One thing I was thinking of is letting the OSV project take care of some of that. I was already thinking of getting the various Linux distro alerts set up to feed into their existing pipeline. One really nice thing that OSV tries to do is provide an array of explicit affected versions, so that anything consuming it doesn't have to understand the weirdness of each ecosystem's version range logic.
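Here is a small sketch of why the explicit list is convenient for consumers. The `versions` field is part of the OSV schema; the `is_affected` helper is hypothetical and the version list is a short illustrative subset, not the full set for this vulnerability.

```python
def is_affected(record, ecosystem, name, version):
    """Exact-match lookup against the explicit versions list; no range parsing."""
    for entry in record.get("affected", []):
        package = entry.get("package", {})
        if package.get("ecosystem") == ecosystem and package.get("name") == name:
            if version in entry.get("versions", []):
                return True
    return False


record = {
    "id": "PYSEC-2020-35",
    "affected": [
        {
            "package": {"ecosystem": "PyPI", "name": "django"},
            # Illustrative subset only.
            "versions": ["3.0", "3.0.1", "3.0.2"],
        }
    ],
}
print(is_affected(record, "PyPI", "django", "3.0.2"))  # True
print(is_affected(record, "PyPI", "django", "3.0.3"))  # False
```

The trade-off is that the lookup is only as good as the enumeration: a version missing from the list reads as unaffected, so the ranges are still needed as the source of truth.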

@westonsteimel
Contributor Author

Anyway, I'll try to work on some example JSON of what I was thinking, hopefully sometime today.

@westonsteimel
Contributor Author

Sorry, guess I didn't get to it yet. I'm thinking about too many things at once. Are there any objections to feeding the data from GitHub Advisories into a github.com namespace and GitLab Community Advisories into a gitlab.com namespace, similar to what we're doing with the CVE and NVD data?

@westonsteimel
Contributor Author

And if we think that is okay, would gsd-tools be a good place to integrate those? If so, I can create issues to track that work there and work on it as well (I kinda already started anyway).

@joshbressers
Collaborator

I think that's a perfectly reasonable place. The CVE/NVD import tooling is here:
https://github.com/cloudsecurityalliance/gsd-tools/tree/main/securitylist

Maybe start on a feature branch or fork that we can merge once it's all working.

@westonsteimel
Contributor Author

Cool, I'll start working on that and then come back to figuring out more of the details on this one. I do like the idea of having a bunch of the affected package URLs somewhere, and preferably not having to parse ecosystem-specific version ranges.

joshbuker transferred this issue from CloudSecurityAlliance/gsd-database on Mar 3, 2023
@joshbuker
Member

Related discussion in the OSV repo: ossf/osv-schema#123

joshbuker added the help wanted and idea labels on Mar 29, 2023
CloudSecurityAlliance locked and limited conversation to collaborators on Mar 29, 2023
joshbuker converted this issue into discussion #190 on Mar 29, 2023
