datatypes validation & CLI error messages #71

Open
rapw3k opened this issue Feb 22, 2021 · 6 comments


rapw3k commented Feb 22, 2021

Hi @ashleysommer
Thanks a lot for the hints.
Following issue #70, using the following inputs:

The shape is here: https://raw.githubusercontent.com/rapw3k/DEMETER/master/models/SHACL/demeterAgriProfile-SHACL.ttl
The example data graph is here: https://box.psnc.pl/f/c95eb51962/?raw=1

Issue 1
I checked those errors in the shapes file about property shapes without sh:path. These shapes were generated automatically from owl:disjointWith statements, which Astrea translates into the statements below (https://astrea.linkeddata.es/documentation.html).

?shapeUrl a sh:NodeShape ;
   sh:not ?propertyShape . 
?propertyShape a sh:PropertyShape ;
   sh:class ?disjointType .

I understand this is not correct, so I can just remove the type statement and it works.

?shapeUrl a sh:NodeShape ;
   sh:not ?propertyShape . 
?propertyShape 
   sh:class ?disjointType .
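
As a rough illustration of the check involved, the sketch below looks for subjects typed sh:PropertyShape that carry no sh:path. It is library-free and models triples as plain tuples; the function name and the `ex:` identifiers are hypothetical, and this is not how pySHACL itself loads shapes.

```python
# Library-free sketch: triples are plain (subject, predicate, object) tuples.
SH = "http://www.w3.org/ns/shacl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def property_shapes_missing_path(triples):
    """Return subjects typed sh:PropertyShape that have no sh:path triple."""
    typed = {s for s, p, o in triples if p == RDF_TYPE and o == SH + "PropertyShape"}
    with_path = {s for s, p, o in triples if p == SH + "path"}
    return sorted(typed - with_path)


# Hypothetical shapes mirroring the generated pattern above.
generated = [
    ("ex:ShapeA", RDF_TYPE, SH + "NodeShape"),
    ("ex:ShapeA", SH + "not", "ex:Inner"),
    ("ex:Inner", RDF_TYPE, SH + "PropertyShape"),
    ("ex:Inner", SH + "class", "ex:DisjointType"),
]

# The workaround described above: drop the explicit type statement.
fixed = [t for t in generated if t != ("ex:Inner", RDF_TYPE, SH + "PropertyShape")]

print(property_shapes_missing_path(generated))  # ['ex:Inner']
print(property_shapes_missing_path(fixed))      # []
```

Running a check like this over the shapes file before validation would point at the offending shapes directly.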

One issue with the command line, however, is that I didn't get any information about the error, so I didn't know what was happening. All I got was a runtime error (macOS), as below:

RAPMAC-3:SHACL rap$ pyshacl -s /Users/rap/GitRepositories/GitHub/DEMETER/models/SHACL/demeterAgriProfile-SHACL.ttl -e /Users/rap/GitRepositories/GitHub/DEMETER/models/cross-domain.ttl -i rdfs -a -j -f human /Users/rap/Downloads/pilot5.2-afc-observation-point-simplified.ttl
  File "/usr/local/lib/python3.9/site-packages/pyshacl/cli.py", line 169, in main
    is_conform, v_graph, v_text = validate(args.data, **validator_kwargs)
  File "/usr/local/lib/python3.9/site-packages/pyshacl/validate.py", line 390, in validate
    conforms, report_graph, report_text = validator.run()
  File "/usr/local/lib/python3.9/site-packages/pyshacl/validate.py", line 223, in run
    shapes = self.shacl_graph.shapes  # This property getter triggers shapes harvest.
  File "/usr/local/lib/python3.9/site-packages/pyshacl/shapes_graph.py", line 164, in shapes
    self._build_node_shape_cache()
  File "/usr/local/lib/python3.9/site-packages/pyshacl/shapes_graph.py", line 208, in _build_node_shape_cache
    raise ShapeLoadError(


Validator encountered a Runtime Error. Please report this to the PySHACL issue tracker.

Issue 2
I fixed the shape, accessible in the same location above, and tried adding the target ontology into the mix with the -e option.
The target ontology is available here: https://raw.githubusercontent.com/rapw3k/DEMETER/master/models/cross-domain.ttl
Now, I get a validation error:

RAPMAC-3:SHACL rap$ pyshacl -s /Users/rap/GitRepositories/GitHub/DEMETER/models/SHACL/demeterAgriProfile-SHACL.ttl -e /Users/rap/GitRepositories/GitHub/DEMETER/models/cross-domain.ttl -i rdfs -a -j -f human /Users/rap/Downloads/pilot5.2-afc-observation-point-simplified.ttl
Validation Report
Conforms: False
Results (1):
Constraint Violation in DatatypeConstraintComponent (http://www.w3.org/ns/shacl#DatatypeConstraintComponent):
	Severity: sh:Violation
	Source Shape: <https://astrea.linkeddata.es/shapes#0f6502053608fc059a721676b6e67115>
	Focus Node: <http://www.w3id.org/afarcloud/pCoord?lat=45.75&amp;long=4.85>
	Value Node: Literal("POINT(45.75 4.85)" = None, datatype=ns0:wktLiteral)
	Result Path: geo:hasSerialization
	Message: Value is not Literal with datatype rdfs:Literal

This is because the data graph says that:

@prefix ns0: <http://www.opengis.net/ont/geosparql#> .
ns0:asWKT "POINT(45.75 4.85)"^^ns0:wktLiteral

However, the ontology says that asWKT has range wktLiteral and is a subproperty of hasSerialization, which has range rdfs:Literal (see extracts below). Additionally, the ontology defines wktLiteral as a Datatype (and according to the spec, "Each instance of rdfs:Datatype is a subclass of rdfs:Literal"). So I don't really understand why there is a validation error.

@prefix geo: <http://www.opengis.net/ont/geosparql#> .
geo:asWKT rdf:type owl:DatatypeProperty ;
          rdfs:subPropertyOf geo:hasSerialization ;
          rdfs:domain geo:Geometry ;
          rdfs:range geo:wktLiteral ;
...
geo:hasSerialization rdf:type owl:DatatypeProperty ;
                     rdfs:domain geo:Geometry ;
                     rdfs:range rdfs:Literal ;
...
geo:wktLiteral rdf:type rdfs:Datatype ;
               rdfs:comment "A Well-known Text serialization of a geometry object."@en ;

ashleysommer commented Feb 23, 2021

Hi @rapw3k
Yes, I know that error reporting, diagnostics, and debugging are very difficult in PySHACL, especially when using the CLI tool. That is something we are aiming to fix in future versions.

For issue 2:
This is a complex problem, but I think I know what is causing it.
Neither the GeoSPARQL ontology nor your ontology defines geo:wktLiteral to be a subclass of rdfs:Literal. That is, there is no geo:wktLiteral rdfs:subClassOf rdfs:Literal triple in your data graph when validating, even after RDFS expansion.

I know it looks like defining geo:hasSerialization rdfs:range rdfs:Literal should imply that any value of asWKT will receive rdfs:Literal as its datatype. In fact, range inference tries to give the value the class rdfs:Literal. A datatype and a class are two different things here, and you cannot add a class to a literal. E.g., the RDFS inferencer cannot assert:
"POINT(45.75 4.85)"^^geo:wktLiteral rdf:type rdfs:Literal, because a literal cannot be in the subject position of a triple.
And even if it could, this would not cause the sh:datatype constraint to pass because adding a class here does not change the datatype of the literal.
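
To make the reasoning above concrete, here is a simplified, library-free model of the RDFS range rule. It is only a sketch: the `Lit` type, the function name, and the prefixed strings are hypothetical stand-ins, and real inferencers are considerably more involved.

```python
from typing import NamedTuple


class Lit(NamedTuple):
    """Minimal stand-in for an RDF literal: lexical form plus datatype."""
    lexical: str
    datatype: str


RDF_TYPE = "rdf:type"


def apply_range_rule(triples, ranges):
    """Simplified range rule: if p has range C and (s, p, o) holds,
    infer (o, rdf:type, C). Literal objects are skipped, because a
    literal cannot appear in the subject position of a triple."""
    inferred = set()
    for s, p, o in triples:
        if p in ranges and not isinstance(o, Lit):
            inferred.add((o, RDF_TYPE, ranges[p]))
    return inferred


point = Lit("POINT(45.75 4.85)", "geo:wktLiteral")
data = {("ex:geom", "geo:hasSerialization", point)}
ranges = {"geo:hasSerialization": "rdfs:Literal"}

inferred = apply_range_rule(data, ranges)
print(inferred)        # set(): nothing can be asserted about the literal
print(point.datatype)  # geo:wktLiteral, unchanged by inference
```

Either way, the literal's datatype is untouched, and that datatype is what a sh:datatype constraint compares against.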

So after expanding the graph, the literal "POINT(45.75 4.85)"^^geo:wktLiteral will have the class rdfs:Literal, but it will have the datatype geo:wktLiteral, and geo:wktLiteral is not defined as a subclass of rdfs:Literal (and for reference, rdfs:Literal is itself not a datatype either).

I haven't tested it, but I think you can fix this problem by removing sh:datatype rdfs:Literal from shape <https://astrea.linkeddata.es/shapes#0f6502053608fc059a721676b6e67115>, because it already has sh:nodeKind sh:Literal which I think is what you want in this case.

@ashleysommer
Collaborator

Looking again at your explanation of issue1:

?shapeUrl a sh:NodeShape ;
   sh:not ?propertyShape . 
?propertyShape a sh:PropertyShape ;
   sh:class ?disjointType .

It looks like this is a bug in Astrea's shape generation. I believe the correct output should use a NodeShape, like this:

?shapeUrl a sh:NodeShape ;
   sh:not ?shape2 . 
?shape2 a sh:NodeShape ;
   sh:class ?disjointType .

Is there a way we can submit a bug report to their software?


rapw3k commented Feb 23, 2021

Thanks a lot @ashleysommer for all the information.
Regarding Astrea, I am in touch with the developers, and I will point this issue out to them.
Regarding the GeoSPARQL terms (we are just re-using them in our ontology - statement import :)), removing sh:datatype rdfs:Literal from https://astrea.linkeddata.es/shapes#0f6502053608fc059a721676b6e67115 indeed fixed the issue, thanks! I really like this validator.

But to get a bit more into the reasoning behind this: as you said, the GeoSPARQL ontology does not define geo:wktLiteral to be a subclass of rdfs:Literal, i.e. there is no geo:wktLiteral rdfs:subClassOf rdfs:Literal triple ....
What I was pointing out is that, according to the spec https://www.w3.org/TR/rdf-schema/, this would be implicit (see below), right?
Nevertheless, I guess I can also point out this situation (datatypes) to the SHACL generator (Astrea).

2.4 rdfs:Datatype
rdfs:Datatype is the class of datatypes. All instances of rdfs:Datatype correspond to the RDF model of a datatype described in the RDF Concepts specification [RDF11-CONCEPTS]. rdfs:Datatype is both an instance of and a subclass of rdfs:Class. **Each instance of rdfs:Datatype is a subclass of rdfs:Literal**.


ashleysommer commented Mar 13, 2021

Hi @rapw3k
Sorry for taking so long to respond to your previous message.

Each instance of rdfs:Datatype is a subclass of rdfs:Literal

This is interesting, and I actually didn't know that. I wonder what that means for datatype validation in pySHACL.

For example,
If I have a literal "29e1"^^ex:MyDataType, and ex:MyDataType rdf:type rdfs:Datatype (all datatypes are instances of rdfs:Datatype), and we know that ex:MyDataType rdfs:subClassOf rdfs:Literal (each instance of rdfs:Datatype is a subclass of rdfs:Literal), does that mean rdfs:Literal can be used to match ex:MyDataType in a datatype constraint?


ashleysommer commented Mar 13, 2021

I think I can do:

  • sh:datatype rdfs:Literal -> Matches any RDF Literal, i.e. acts the same as sh:nodeKind sh:Literal
  • sh:datatype rdfs:Datatype -> Matches any RDF Literal that has an explicitly defined datatype, i.e. "29e1"^^ex:MyDataType matches, but "29" doesn't. I'm not quite sure about this one; maybe all Literals are implicitly rdfs:Datatype even without a datatype specified.

I'll do some testing.
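
The two proposed rules could be sketched roughly as follows. This is a library-free illustration of the proposal only, not pySHACL's actual implementation; the `Lit` type and `matches_datatype` function are hypothetical, and a literal written without `^^` is modelled as having no datatype, following the first reading in the list above.

```python
from typing import NamedTuple, Optional

RDFS = "http://www.w3.org/2000/01/rdf-schema#"


class Lit(NamedTuple):
    """Minimal stand-in for an RDF literal."""
    lexical: str
    datatype: Optional[str] = None  # None models a literal written without ^^


def matches_datatype(value, required):
    """Sketch of the proposed sh:datatype matching rules."""
    if not isinstance(value, Lit):
        return False  # IRIs and blank nodes never satisfy sh:datatype
    if required == RDFS + "Literal":
        return True  # behaves like sh:nodeKind sh:Literal
    if required == RDFS + "Datatype":
        return value.datatype is not None  # explicit datatype required
    return value.datatype == required  # ordinary exact datatype match


wkt = Lit("POINT(45.75 4.85)", "http://www.opengis.net/ont/geosparql#wktLiteral")
print(matches_datatype(wkt, RDFS + "Literal"))                          # True
print(matches_datatype(Lit("29e1", "ex:MyDataType"), RDFS + "Datatype"))  # True
print(matches_datatype(Lit("29"), RDFS + "Datatype"))                   # False
```

Under these rules the wktLiteral value from issue 2 would pass a sh:datatype rdfs:Literal constraint.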

@ashleysommer
Collaborator

Hi @rapw3k
Sorry for the delay on this. I've made the changes mentioned above, and I believe the issue reported ("issue 2" above) is resolved. Can you please test on PySHACL v0.17.1 and let me know if it fixes your specific issue?
