Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update VCLLVM (now Pallas) to LLVM 17, update to newest VerCors version, and convert more instructions to COL #1159

Open
wants to merge 50 commits into
base: dev
Choose a base branch
from

Conversation

superaxander
Copy link
Member

@superaxander superaxander commented Feb 28, 2024

Summary

This PR updates VCLLVM to LLVM 17 and updates it to work with the current VerCors. Additionally more instructions converted into COL, there is support for loops, pointers, structs, and more. This also includes changes to the C and PVL frontends for pointers and ByValueClass encoding. (i.e. structs)

Detailed list of changes

General

  • Ignore a quantifier in SimplifyNestedQuantifiers if it contains a non-inline trigger (it was erroneously rewriting a generated trigger)
  • Make the ParBlockEncoder treat a unary minus as a constant operator (this was needed for one of the HaliVer examples to verify)
  • Update to silicon commit viperproject/silicon@2030e3e to improve verification performance for quantified heap chunks (PR: Consolidating quantified field and predicate chunks viperproject/silicon#860)
  • Add clang-format to the commit hooks to format the C++ code before every commit

Pallas

  • Rename VCLLVM to Pallas
  • Update Pallas to LLVM 17 from LLVM 15
  • Switch from Pallas as a standalone application to a plugin that can be loaded by the LLVM framework
  • Add support for more LLVM-IR instructions
    • AllocA (allocating variables on the stack eliminating the need for the mem2reg pass before verification)
    • GetElementPtr (indexing structs arrays and vectors, currently only with statically known offsets)
    • Load/Store (loading and storing values behind pointers)
    • SExt, ZExt, Trunc (since we only deal with mathematical integers these are translated as no-ops for now)
  • Add support for more LLVM-IR types
    • Structs
    • Pointers (including some very basic function pointer support)
    • Arrays/Vectors (these are currently treated as the same)
  • Add support for more LLVM-IR elements:
    • Global variables
    • Loops (Branch instructions are transformed into ifs + goto and then the goto's are replaced with actual (while) loops. This uses LLVM's loop analyses to find the header and body of the loop)
    • More constant types
    • Implicit pointer casts (based on C support described below, but with automatic type inference since LLVM omits pointer types)
  • Pass in debug locations from the LLVM-IR to VerCors so that error locations in the original source files can be pointed out

C (changes superceding #1172 and #1227)

  • Change permission syntax such that it always takes a pointer except when unfolding a struct permission
    • Permission to write a field f of struct value a becomes Perm(&a.f, write) instead of Perm(a.f, write)
    • Permission to write all fields of a struct value a becomes Perm(a, write) (or if a is a pointer to a struct you'd have Perm(a, write) ** Perm(*a, write))
  • Replace struct encoding with a new one based on ADTs (called ByValueClass in the code) containing pointers (related: Encoding structs (and pointers to structs) #1194, Split Class into ByReferenceClass and ByValueClass #1227)
    • Allow casting pointers to structs to pointers to their first element (possibly multiple layers deep), also allow casting these pointers back up
    • Allow passing around void pointers to have pointers that might not always point to the same type
    • Allow getting a pointer to a struct field
    • Ensure struct values are always treated as values instead of references (this was already true in most cases but could not be guaranteed because unrelated passes were allowed to treat the struct as a Java/PVL class)
  • Enable getting the address of local and global variables (related: Allow the use of pointers to things that are not a reference type #1172)
    • Variables that have their address taken are automatically upgraded to pointer types
    • Currently no automatic permission annotations are generated for local variables so this may be confusing if you have a local variable that you need to declare access for in a loop invariant. (however in every case where this would occur you would not have been able to verify the program previously because of the address-of operator) Generating permission would likely be feasible
  • Add a new pointer type that can never be null (also replaces some of the changes in Encode that malloc may return NULL in C (weaken postcondition) #1239)
  • Improve some of the names (which used to be unknownN in the resulting viper file to help with debugging
  • Remove Viper field access from triggers where DerefPointer or PointerSubscript are the top-level expression (this is generally better for verification performance)

PVL

  • Add the * and & operators to PVL to work with pointer values (which could also already be used through the pointer<T> ADT)
  • Add VerCors option --contract-import-file which allows importing method contracts and types from a PVL file which will replace the function signature/contract of the LLVM function with the same name (this will likely be removed later once Robert's specification stuff gets merged in but is necessary now to test the LLVM frontend)

To do:

  • Update the wiki

@pieter-bos
Copy link
Member

Hey nice work, just a quick general comment while you work on this: I've realized that we're starting to do too much logic in the Lang{_}ToCol passes (i.e. the "unskippable" part), things should be in the transformation stage. The solution is to just adopt more nodes into the "core" of col: I think all reasonable language features should live there, so col-core is a ~superset of all our input languages. I don't mind duplicates (so by all means make a col equivalent of LLVM nodes so the initial translation is simple), but I'd encourage you to think about language features we should add into col.

@superaxander
Copy link
Member Author

Hey nice work, just a quick general comment while you work on this: I've realized that we're starting to do too much logic in the Lang{_}ToCol passes (i.e. the "unskippable" part), things should be in the transformation stage. The solution is to just adopt more nodes into the "core" of col: I think all reasonable language features should live there, so col-core is a ~superset of all our input languages. I don't mind duplicates (so by all means make a col equivalent of LLVM nodes so the initial translation is simple), but I'd encourage you to think about language features we should add into col.

Yeah that makes sense I've so far not touched any of the transformation stuff so I don't quite know what goes in rewrite/, resolve/, etc. But indeed I already felt like some of the things I'm adding will be relevant more broadly.

@superaxander
Copy link
Member Author

My current plan is to remove the changes from #1172 such that this can be merged without affecting any other parts of the code. I had some issues with the new Ubuntu 24.02 image (which I'm using to get LLVM-17) but the last few runs seem to have worked instead of being suddenly cancelled.

@superaxander superaxander marked this pull request as ready for review October 1, 2024 12:17
@superaxander
Copy link
Member Author

Marking this as ready for review since I basically just want to add some tests now before merging this

def transSupportArrowsHelper(
seen: Set[TClass[G]]
): Seq[(TClass[G], TClass[G])] = {
val t: TClass[G] = TClass(this.ref, typeArgs.map(v => TVar(v.ref)))
// TODO: Does this break things if we have a ByValueClass with supers?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would it? Do you want to disallow plain assignment for byref classes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing I can come up with that's relevant is that in C, casting a MyStruct* to FirstMemberOfMyStruct* is well-defined. So in a way, a CStruct ast node does not have any supertypes, but the LangCToCol pass could compile it into a byvalue class, with the only supertype being equal to the first member of the struct. You'd have to account for that in assignment, though.

In addition, since you're targeting C, assigning different struct types makes no sense. So maybe struct types with supers should be disallowed? Then maybe byval classes should be separate from byref classes (not saying you have to do this, just thinking out loud)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's probably best to not support ByValueClasses with super types for now

@@ -12,7 +12,7 @@ trait EndpointImpl[G]
override def layout(implicit ctx: Ctx): Doc =
Group(Text("endpoint") <+> ctx.name(this) <+> "=" <+> init)

def t: TClass[G] = TClass(cls, typeArgs)
def t: TClass[G] = TByReferenceClass(cls, typeArgs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this might be a bit of a code smell: while this is the right choice for now as it reflects what was already the case, I did not have in mind to restrict endpoints of choreographies to be either byref or byval. I only expect fields and methods.

Maybe we can discuss in-office if it's possible or not to leave it open for both here? I thought the whole point of this PR was to leave it open in rewriting code whether a class is byref or byval, but I'm sure I'm missing a lot of (c++) context.

src/col/vct/col/ast/lang/llvm/LLVMGlobalVariableImpl.scala Outdated Show resolved Hide resolved
@@ -109,6 +109,9 @@ case class SourceName(name: String) extends NameStrategy {
Some(SourceName.stringToName(name))
}

// Used to disambiguate whether to show a ByValueClass as a class or a struct
case class TypeName(name: String) extends OriginContent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you focus on the C frontend, why distinguish between ByValueClass that require a "class" prefix? In addition, there's no PVL syntax, right? So might aswell settle on that ByValueClasses are structs, and that ByRef is class...?

Spec.findClass(name, ctx)
.getOrElse(throw NoSuchNameError("class", name, t))
)
case t @ TByValueClass(ref, _) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case: if you make class/struct distinction, maybe you can change the "class" error message below to "struct" as well.

)) { dispatch(main) }
if (mMap.isEmpty) { Let(b, v, m) }
else {
mMap.foreach(postM =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably just slow, maybe we can have a look at this code later

src/rewrite/vct/rewrite/EncodeForkJoin.scala Outdated Show resolved Hide resolved
@@ -143,7 +143,7 @@ case class EncodeIntrinsicLock[Pre <: Generation]() extends Rewriter[Pre] {

override def dispatch(decl: Declaration[Pre]): Unit =
decl match {
case cls: Class[Pre] =>
case cls: ByReferenceClass[Pre] =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated: should intrinsicLockInvariant for ByValueClass throw an exception or something? Or maybe should we remove it from the Class trait?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I assumed it would be fine to just make the lock invariant a fixed true

src/rewrite/vct/rewrite/EncodeResourceValues.scala Outdated Show resolved Hide resolved
src/rewrite/vct/rewrite/EncodeResourceValues.scala Outdated Show resolved Hide resolved
Copy link
Contributor

@bobismijnnaam bobismijnnaam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There you go! Good work!

val (patternsHere, body) = patterns.collect {
// We only want to inline lets that are defined inside the quantifier
letBindings.having(ScopedStack()) { dispatch(f.body) }
localHeapVariables.scope {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: maybe the concept of variables and heap variables can be unified behind one scope/trait? They both have reasonable get/set implementations, and heap variables further extend that with an address. Just a thought, might not be worth the effort either?

@@ -82,7 +82,7 @@ case class GenerateSingleOwnerPermissions[Pre <: Generation](
),
)

case cls: Class[Pre] if enabled =>
case cls: ByReferenceClass[Pre] if enabled =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an exercise you might consider integrating heap variables into this pass, though maybe you already do so somewhere else?

node match {
// Same logic as CollectLocalDeclarations
case Scope(vars, impl) =>
val (newVars, newImpl) = variables.collect {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, scope.rewrite()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the difference, maybe it's even rewriteDefault in this case?

(a, b) match {
case (None, _) | (_, None) => None
case (Some(a), Some(b))
if a == b || rw.dispatch(a) == rw.dispatch(b) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing things in Post is highly dubious! Might trigger comparison of some uninitialized refs. I know sycl used to do this every once in a while and it's a pain.

Is it maybe possible to make a more explicit ordering for a fragment of types? As this is for llvm, you only need to do this for say ints, structs, and the spec types (seq/map etc.). And for the spec types you can maybe just say to not let them interleave with llvm types maybe, to simplify things? Relying on the behaviour of another rewriter is fragile. It's also fine if your ordering is very limited and needs to be extended every once in a while. What about only using moreSpecific, and maybe making moreSpecific more complete for the identity cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I knew this was dubious when I wrote it. I do it because I'm mixing LLVM and PVL in places and their types will only match after the LangSpecificToCol pass has finished. This was mostly a cop-out for not having to write out every LLVM type that might match a PVL type and vice-versa.

}
if (subMap.isEmpty) { value }
else {
// TODO: Support multiple guesses?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only if there's an actual use case, I don't imagine any sane compiler does this (I'm probably wrong! :D)

src/rewrite/vct/rewrite/lang/LangSpecificToCol.scala Outdated Show resolved Hide resolved
src/rewrite/vct/rewrite/lang/LangTypesToCol.scala Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants