diff --git a/Metadata_XSD/POD_v1_1.xsd b/Metadata_XSD/POD_v1_1.xsd new file mode 100644 index 00000000..ef48eaa7 --- /dev/null +++ b/Metadata_XSD/POD_v1_1.xsd @@ -0,0 +1,591 @@ + + + + + + Top element of an EDI catalog. This wrapper element is for XML [and + JSON] implementation only and is not defined in the Project Open Data standard. + + + + + + + Label: Metadata Type This attribute contains an IRI for the + JSON-LD data type. This should be dcat:Catalog for the Catalog. + + + + + + + + Label: Schema Version The field contains the URI that + identifies the version of the Project Open Data schema being used. + + + + + + Label: Data Dictionary The field contains the URL for the + JSON Schema file that defines the schema used. + + + + + + + + + + + + + Label: Public Access Level / This field refers to the degree to which + this dataset could be made available to the public, regardless of whether it is + currently available to the public. For example, if a member of the public can walk + into your agency and obtain a dataset, that entry is public even if there are no + files online. A restricted public dataset is one only available under certain + conditions or to certain audiences (such as researchers who sign a waiver). A + non-public dataset is one that could never be made available to the public for + privacy, security, or other reasons as determined by your agency. + Must be one of the following: “public”, “restricted public”, + “non-public” + + + + + + + + + + + + Label: Access URL / URL providing indirect access to a dataset, for + example via API or a graphical interface. + This should be the URL for an indirect means of accessing the data, + such as API documentation, a ‘wizard’ or other graphical interface which is used to + generate a download, feed, or a request form for the data. When accessLevel is + “restricted public” but the dataset is available online indirectly, this field + should be the URL that provides indirect access. 
This should not be a direct + download URL. It is usually assumed that accessURL is an HTML + webpage. + Required if the file is accessible indirectly, through means other + than direct download. + + + + + Label: Frequency / The frequency with which the dataset is + published. + Accepted values: ISO 8601 Repeating Duration (or + irregular) + Must be an ISO 8601 repeating duration unless this is not possible + because the accrual periodicity is completely irregular, in which case the value + should simply be irregular. The value should not include a start or end date but + rather simply express the duration of time between data publishing. For example, a + dataset which is updated on an annual basis would be R/P1Y; every three months would + be R/P3M; weekly would be R/P1W; and daily would be R/P1D. Further examples and + documentation can be found here + [https://project-open-data.cio.gov/iso8601_guidance#accrualperiodicity]. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Label: Bureau Code / Represent each bureau responsible for the dataset + according to the codes found in OMB Circular A-11, Appendix C (PDF, CSV). Start with + the agency code, then a colon, then the bureau code. + Ex: The Office of the Solicitor (86) at the Department of the Interior + (010) would be: "010:86". If a second bureau was also responsible, format it like + this: "010:86","010:04". + + + + + + + + + + Label: Data Standard / URI used to identify a standardized + specification the dataset and/or distribution conforms to. + This is used to identify a standardized specification the + dataset/distribution conforms to. If this is a technical specification associated + with a particular serialization of a distribution, this should be specified with + conformsTo at the distribution level. It’s recommended that this be a URI that + serves as a unique identifier for the standard. The URI may or may not also be a URL + that provides documentation of the specification. 
+ + + + + Label: Contact Point This is a container for two fields that together + make up the contact information for the dataset. contactPoint should always contain + both the person’s appropriately formatted full name (fn) and email + (hasEmail). + + + + + + + + + + + + Label:Dataset A container element for the array of Dataset objects. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Label: Metadata Type / IRI for the JSON-LD data type. This + should be dcat:Dataset for each Dataset. + + + + + + + + + + Label: Data Dictionary / URL to the data dictionary for the dataset or + distribution of dataset found at downloadURL. Note that documentation other than a + data dictionary can be referenced using Related Documents + (references). + This is used to specify a data dictionary or schema that defines + fields or column headings in the dataset or distribution of dataset found at + downloadURL. If this is a machine readable file, it’s recommended to be specified + with describedBy at the distribution level along with the associated + describedByType. At the dataset level it’s assumed to be a human readable HTML + webpage or PDF document. Documentation that is not specifically a data dictionary + belongs in “references” + + + + + Label: Data Dictionary Type / The machine-readable file format (IANA + Media Type [http://www.iana.org/assignments/media-types/media-types.xhtml] or MIME + Type [http://en.wikipedia.org/wiki/Internet_media_type]) of the distribution’s + describedBy URL. This is especially important if describedBy is a machine readable + file. + + + + + + Label: Data Quality / Indicates whether the dataset meets the agency’s + Information Quality Guidelines (true/false). 
+ Must be a boolean value of true or false (not contained within quote + marks) + + + + + Label: Description / Human-readable description (e.g., an abstract) + with sufficient detail to enable a user to quickly understand whether the asset or + distribution is of interest. + This should be human-readable and understandable to an average + person. + + + + + + + + + + + + + + + + + Label: Metadata Type / IRI for the JSON-LD data type. This + should be dcat:Distribution for each Distribution. + + + + + + + + + Label: Download URL / URL providing direct access to a downloadable + file of a dataset. + This must be the direct download URL. Other means of accessing the + dataset should be expressed using accessURL. This should always be accompanied by + mediaType. + Required if the file is available for public + download. + + + + + This should be included with hasEmail as part of a record’s + contactPoint (see above example). + Enter name Firstname LastName + + + + + + Label: Format / A human-readable description of the file format of a + distribution. + This should be a human-readable description of the file format of the + dataset that provides useful information that might not be apparent from mediaType. + Note that API should always be used to distinguish web APIs. + + + + + This should be formatted per vCard specifications (see example below) + and included with fn as part of a record’s contactPoint. + + + + + + + + + + + + Label: Unique Identifier A unique identifier for the dataset or API as + maintained within an Agency catalog or database. + This field allows third parties to maintain a consistent record for + datasets even if title or URLs are updated. Agencies may integrate an existing + system for maintaining unique identifiers. Each identifier must be unique across the + agency’s catalog and remain fixed. It is highly recommended that a URI (preferably + an HTTP URL) be used to provide a globally unique identifier. 
Identifier URLs should + be designed and maintained to persist indefinitely regardless of whether the URL of + the resource itself changes. + + + + + + Label: Release Date / Date of formal issuance. + Accepted Values: ISO 8601 Date + Dates should be ISO 8601 [http://en.wikipedia.org/wiki/ISO_8601] of + least resolution. In other words, as much of YYYY-MM-DDThh:mm:ss.sTZD as is relevant + to this dataset. + + + + + + Label: Tags Tags (or keywords) help users discover your dataset; + please include terms that would be used by technical and non-technical + users. + + Avoid duplicate keywords in the same record. + + + + + + + + The item element holds content that would be individual strings in "an array of strings" in a json + document. This is a convenience for the XML format. + + + + + Label: Homepage URL / This field is not intended for an agency’s + homepage (e.g. www.agency.gov), but rather if a dataset has a human-friendly hub or + landing page that users can be directed to for all resources tied to the + dataset. + + + + + + Label: Language / The language of the dataset. + This should adhere to the RFC 5646 + [http://tools.ietf.org/html/rfc5646] standard. This language subtag lookup provides a + good tool for checking and verifying language codes. A language tag is comprised of + either one or two parts, the language subtag (such as en for English, es for + Spanish, wo for Wolof) and the regional subtag (such as US for United States, GB for + Great Britain, MX for Mexico), separated by a hyphen. Regional subtags should only + be provided when needed to distinguish a language tag from another one (such as + American vs. British English). + + + + + + + + + Label: License / The license or non-license (i.e. Public Domain) + status with which the dataset or API has been published. See Open Licenses + [https://project-open-data.cio.gov/open-licenses/] for more information. Required if + applicable. + See list of license-free declarations and licenses. 
+ [https://project-open-data.cio.gov/license-examples/] + + + + + Label: Media Type / The machine-readable file format (IANA Media Type + [http://www.iana.org/assignments/media-types/media-types.xhtml] or MIME Type + [http://en.wikipedia.org/wiki/Internet_media_type]) of the distribution’s + downloadURL. + This must describe the exact files available at downloadURL using a + media type (IANA Media Type also known as MIME Type). For common Microsoft Office + files, see Office Open XML MIME types + [http://blogs.msdn.com/b/vsofficedeveloper/archive/2008/05/08/office-2007-open-xml-mime-types.aspx]. + + + + + Label: Last Update Most recent date on which the dataset was changed, + updated or modified. + Dates should be ISO 8601 of highest resolution. In other words, as + much of YYYY-MM-DD as is relevant to this dataset. If this file is brand-new, enter + the issued date here as well. If there is a need to reflect that the dataset is + continually updated, ISO 8601 formatting can account for this with repeating + intervals. For instance, R/P1D for daily, R/P2W for every two weeks, and R/PT5M for + every five minutes. + + + + + + + + + + + + + + The plaintext name of the entity publishing this + dataset. + + + + + Label: Primary IT Investment UII / For linking a dataset with an IT + Unique Investment Identifier (UII). + + + + + Provide an array of programs related to this data asset, from the + Federal Program Inventory. + + + + + + + + + + Label: Related Documents / Related documents such as technical + information about a dataset, developer documentation, etc. + Enclose each URL within its own tag. + + + + + + + The item url holds content that would be individual url strings in "an array of strings" in a json + document. This is a convenience for the XML format. + + + + + Label: Publisher This is a container for a publisher object which + groups together the fields: name and subOrganization. 
The subOrganization field can + also contain a publisher object which allows one to describe an organization’s + hierarchy. Where greater specificity is desired, include as many levels of publisher + as is useful, in ascending order, using the below format. + + + + + + + + + + The metadata type as defined by JSON-LD data types. This + should be org:Organization for each publisher. + + + + + + + + Label: Rights / This may include information regarding access or + restrictions based on privacy, security, or other policies. This should also serve + as an explanation for the selected “accessLevel” including instructions for how to + access a restricted file, if applicable, or explanation for why a “non-public” or + “restricted public” data asset is not “public,” if applicable. Text, 255 + characters. + Required if accessLevel is "restricted public" or + "non-public" + + + + + Label: Spatial / The range of spatial applicability of a dataset. + Could include a spatial region like a bounding box or a named + place. + This field should contain one of the following types of content: (1) a + bounding coordinate box for the dataset represented in latitude / longitude pairs + where the coordinates are specified in decimal degrees and in the order of: minimum + longitude, minimum latitude, maximum longitude, maximum latitude; (2) a latitude / + longitude pair (in decimal degrees) representing a point where the dataset is + relevant; (3) a geographic feature expressed in Geography Markup Language using the + Simple Features Profile [http://www.ogcnetwork.net/gml-sf]; or (4) a geographic + feature from the GeoNames database [http://www.geonames.org/]. + Required if dataset is spatial. + + + + + A parent organizational entity described using the same publisher + object fields. + + + + + Label: System of Records / If the system is designated as a system of + records under the Privacy Act of 1974, provide the URL to the System of Records + Notice (SORN) related to this dataset. 
+ + + + + Label: Temporal / The range of temporal applicability of a dataset + (i.e., a start and end date of applicability for the data). + Accepted values: ISO 8601 Date + This field should contain an interval of time defined by start and end + dates. Dates should be formatted as pairs of {start datetime/end datetime} in the + ISO 8601 format [http://en.wikipedia.org/wiki/ISO_8601]. ISO 8601 specifies that + datetimes can be formatted in a number of ways, ranging from a simple four-digit year + (e.g., 2013) to a much more specific YYYY-MM-DDTHH:MM:SSZ, where the T specifies a + separator between the date and time and the time is expressed in 24 hour notation in the + UTC (Zulu) time zone (e.g., 2011-02-14T12:00:00Z/2013-07-04T19:34:00Z). Use a + solidus (“/”) to separate start and end times. If there is a need to reflect that + the dataset is continually updated, ISO 8601 formatting can account for this with + repeating intervals [http://en.wikipedia.org/wiki/ISO_8601#Time_intervals]. For + instance, updated monthly starting in January 2010 and continuing through the + present would be represented as: R/2010-01/P1M. Updated every 5 minutes beginning on + February 15, 2010 would be represented as: R/2010-02-15/PT5M. + Required if applicable. + + + + + Label: Category / Main thematic category of the + dataset. + Separate multiple categories with a comma. + + + + + + + + Label: Title Human-readable name of the asset or distribution. Should + be in plain English and include sufficient detail to facilitate search and + discovery. + + + + + Label: Metadata Context This attribute contains the URL or JSON + object for the JSON-LD Context that defines the schema used. + + + + + Label: Metadata Catalog ID This attribute contains the IRI for the + JSON-LD Node Identifier of the Catalog. This should be the URL of the data.json file + itself. 
+ + + diff --git a/Metadata_XSD/POD_v1_1.xsl b/Metadata_XSD/POD_v1_1.xsl new file mode 100644 index 00000000..967f6c21 --- /dev/null +++ b/Metadata_XSD/POD_v1_1.xsl @@ -0,0 +1,131 @@ + + + + + + + + + + + + { + + + "dataset":[ + + ] + } + + + + "@":"", + + + + "@":"" , + + + + { + + }, + + + + "":{ + + }, + + + + "":"mailto:" + + + + "":[ + + ], + + + + "" , + + + + + + "":{ + + }, + + + + "":{ + + } + + + + "":[ + "" + ], + + + + "":, + + + + "":[ + { + + }, + + ] , + + + + { + + }, + + + + + + + "":[ + + ], + + + + "":[ + + ], + + + + "", + + + + "":[ + + ], + + + + "@":"", + + + + "":"", + + + + \ No newline at end of file diff --git a/Metadata_XSD/README.md b/Metadata_XSD/README.md new file mode 100644 index 00000000..88e66414 --- /dev/null +++ b/Metadata_XSD/README.md @@ -0,0 +1,47 @@ +Notes: + +The following files can be found in this folder: +1. POD_v1_1.xsd + * This is an xml representation of the (federal) Project Open Data v1.1 json schema + + 2. dataGovSample.xml + * This is the content from the sample file (for the 'extended' POD fields) + https://project-open-data.cio.gov/v1.1/examples/catalog-sample-extended.json + expressed in xml valid to POD_v1_1.xsd + +3. POD_v1_1.xsl + * This is an xsl stylesheet that converts xml valid to POD_v1_1.xsd to json. + * This stylesheet was developed using xml that reproduced the information found in + the file https://project-open-data.cio.gov/v1.1/examples/catalog-sample-extended.json. + * It is possible that there are xml files that could be valid to the POD_v1_1.xsd for + which this stylesheet makes an incomplete or incorrect conversion. + + 4. dataGovSample.json + * This is the json file created from dataGovSample.xml using POD_v1_1.xsl. + +5. nistSample.xml + * This is a file containing two records that are present in NIST's data.gov json + + 6. nistSample.json + * This is the json file created from nistSample.xml using POD_v1_1.xsl. + + + +As of 26 June 2015: + +1. 
The files dataGovSample.xml and nistSample.xml are valid to POD_v1_1.xsd. + +2. The files dataGovSample.json and nistSample.json were created from their + similarly named xml files using POD_v1_1.xsl. + +3. The files dataGovSample.json and nistSample.json were validated against the + Federal v1.1 schema using the Project Open Data Dashboard. + +Note about testing the xml to json conversion: + * The xsl stylesheet is declared as version="2.0" and has been mostly run + using Saxon 9x in the Oxygen environment. + * The stylesheet has been successfully tested (although not extensively) using + a declaration of version="1.0" and run under Saxon 6.5.5 in Oxygen. No testing + has been done with other parsers and/or transformation engines. + + \ No newline at end of file diff --git a/Metadata_XSD/README.txt b/Metadata_XSD/README.txt new file mode 100644 index 00000000..5ca3172b --- /dev/null +++ b/Metadata_XSD/README.txt @@ -0,0 +1,47 @@ +Notes: + +The following files can be found in this folder: +1. POD_v1_1.xsd + * This is an xml representation of the Federal v1.1 json schema + + 2. dataGovSample.xml + * This is the content from the sample file (for the 'extended' POD fields) + https://project-open-data.cio.gov/v1.1/examples/catalog-sample-extended.json + expressed in xml valid to POD_v1_1.xsd + +3. POD_v1_1.xsl + * This is an xsl stylesheet that converts xml valid to POD_v1_1.xsd to json. + * This stylesheet was developed using xml that reproduced the information found in + the file https://project-open-data.cio.gov/v1.1/examples/catalog-sample-extended.json. + * It is possible that there are xml files that could be valid to the POD_v1_1.xsd for + which this stylesheet makes an incomplete or incorrect conversion. + + 4. dataGovSample.json + * This is the json file created from dataGovSample.xml using POD_v1_1.xsl. + +5. nistSample.xml + * This is a file containing two records that are present in NIST's data.gov json + + 6. 
nistSample.json + * This is the json file created from nistSample.xml using POD_v1_1.xsl. + + + +As of 26 June 2015: + +1. The files dataGovSample.xml and nistSample.xml are valid to POD_v1_1.xsd. + +2. The files dataGovSample.json and nistSample.json were created from their + similarly named xml files using POD_v1_1.xsl. + +3. The files dataGovSample.json and nistSample.json were validated against the + Federal v1.1 schema using the Project Open Data Dashboard. + +Note about testing the xml to json conversion: + * The xsl stylesheet is declared as version="2.0" and has been mostly run + using Saxon 9x in the Oxygen environment. + * The stylesheet has been successfully tested (although not extensively) using + a declaration of version="1.0" and run under Saxon 6.5.5 in Oxygen. No testing + has been done with other parsers and/or transformation engines. + + \ No newline at end of file diff --git a/Metadata_XSD/dataGovSample.json b/Metadata_XSD/dataGovSample.json new file mode 100644 index 00000000..445069fa --- /dev/null +++ b/Metadata_XSD/dataGovSample.json @@ -0,0 +1,97 @@ +{ + "@type": "dcat:Catalog", + "@context": "https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld", + "@id": "http://www.agency.gov/data.json", + "conformsTo": "https://project-open-data.cio.gov/v1.1/schema", + "describedBy": "https://project-open-data.cio.gov/v1.1/schema/catalog.json", + "dataset": [{ + "title": "U.S. 
Widget Manufacturing Statistics", + "description": "This dataset provides national statistics on the production of widgets", + "contactPoint": { + "@type": "vcard:Contact", + "fn": "Jane Doe", + "hasEmail": "mailto:jane.doe@agency.gov" + }, + "keyword": [ + "widget", + "manufacturing", + "factory" + ], + "modified": "2011-11-19", + "publisher": { + "@type": "org:Organization", + "name": "Widget Services", + "subOrganizationOf": { + "@type": "org:Organization", + "name": "Office of Citizen Services and Innovative Technologies", + "subOrganizationOf": { + "@type": "org:Organization", + "name": "General Services Administration", + "subOrganizationOf": { + "@type": "org:Organization", + "name": "U.S. Government" + } + } + } + }, + "accessLevel": "public", + "identifier": "http://dx.doi.org/10.7927/H4PZ56R2", + "bureauCode": ["018:10"], + "programCode": ["018:001"], + "accrualPeriodicity": "R/P1Y", + "conformsTo": "https://project-open-data.cio.gov/v1.1/schema", + "describedBy": "https://project-open-data.cio.gov/v1.1/schema/catalog.json", + "dataQuality": true, + "distribution": [ + { + "description": "Widgets data as a CSV file", + "downloadURL": "https://data.agency.gov/datasets/widgets-statistics/widgets.csv", + "format": "CSV", + "mediaType": "text/csv", + "title": "widgets.csv", + "@type": "dcat:Distribution" + }, + { + "description": "Widgets data as a zipped CSV file with attached data dictionary", + "downloadURL": "https://data.agency.gov/datasets/widgets-statistics/widgets-all.zip", + "format": "Zipped CSV", + "mediaType": "application/zip", + "title": "widgets-all.zip", + "@type": "dcat:Distribution" + }, + { + "conformsTo": "http://www.agency.gov/widget-data-standard/", + "describedBy": "http://www.agency.gov/widgets/schema.json", + "describedByType": "application/schema+json", + "description": "Widget data as a JSON feed", + "downloadURL": "http://www.agency.gov/feeds/widgets-all.json", + "format": "JSON", + "mediaType": "application/json", + "title": 
"widgets-all.json", + "@type": "dcat:Distribution" + }, + { + "accessURL": "https://data.agency.gov/api/widgets-statistics/", + "description": "A fully queryable REST API with JSON and XML output", + "format": "API", + "title": "Widgets REST API", + "@type": "dcat:Distribution" + } + ], + "issued": "2011-11-22", + "landingPage": "http://agency.gov/widgets/data", + "language": ["en-US"], + "license": "http://creativecommons.org/publicdomain/zero/1.0/", + "primaryITInvestmentUII": "021-006227212", + "references": [ + "http://agency.gov/docs/widgets-1.html", + "http://agency.gov/docs/widgets-2.html" + ], + "rights": "This dataset has been given an international public domain dedication for worldwide reuse", + "spatial": "United States", + "systemOfRecords": "http://www.agency.gov/widgets/sorn/", + "temporal": "2009-09-01T12:00:00Z/2010-05-31T12:00:00Z", + "theme": ["manufacturing"], + "@type": "dcat:Dataset" + }] +} \ No newline at end of file diff --git a/Metadata_XSD/dataGovSample.xml b/Metadata_XSD/dataGovSample.xml new file mode 100644 index 00000000..67ae4177 --- /dev/null +++ b/Metadata_XSD/dataGovSample.xml @@ -0,0 +1,103 @@ + + + dcat:Catalog + https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld + http://www.agency.gov/data.json + https://project-open-data.cio.gov/v1.1/schema + https://project-open-data.cio.gov/v1.1/schema/catalog.json + + U.S. Widget Manufacturing Statistics + This dataset provides national statistics on the production of widgets + + vcard:Contact + Jane Doe + jane.doe@agency.gov + + + + widget + manufacturing + factory + + 2011-11-19 + + + org:Organization + Widget Services + + org:Organization + Office of Citizen Services and Innovative Technologies + + org:Organization + General Services Administration + + org:Organization + U.S. 
Government + + + + + public + http://dx.doi.org/10.7927/H4PZ56R2 + 018:10 + 018:001 + R/P1Y + https://project-open-data.cio.gov/v1.1/schema + https://project-open-data.cio.gov/v1.1/schema/catalog.json + true + + Widgets data as a CSV file + https://data.agency.gov/datasets/widgets-statistics/widgets.csv + CSV + text/csv + widgets.csv + dcat:Distribution + + + Widgets data as a zipped CSV file with attached data dictionary + https://data.agency.gov/datasets/widgets-statistics/widgets-all.zip + Zipped CSV + application/zip + widgets-all.zip + dcat:Distribution + + + http://www.agency.gov/widget-data-standard/ + http://www.agency.gov/widgets/schema.json + application/schema+json + Widget data as a JSON feed + http://www.agency.gov/feeds/widgets-all.json + JSON + application/json + widgets-all.json + dcat:Distribution + + + https://data.agency.gov/api/widgets-statistics/ + A fully queryable REST API with JSON and XML output + API + Widgets REST API + dcat:Distribution + + 2011-11-22 + http://agency.gov/widgets/data + + en-US + + http://creativecommons.org/publicdomain/zero/1.0/ + 021-006227212 + + http://agency.gov/docs/widgets-1.html + http://agency.gov/docs/widgets-2.html + + This dataset has been given an international public domain dedication for worldwide reuse + United States + http://www.agency.gov/widgets/sorn/ + 2009-09-01T12:00:00Z/2010-05-31T12:00:00Z + + manufacturing + + dcat:Dataset + + diff --git a/Metadata_XSD/nistSample.json b/Metadata_XSD/nistSample.json new file mode 100644 index 00000000..d57dd137 --- /dev/null +++ b/Metadata_XSD/nistSample.json @@ -0,0 +1,116 @@ + + { + + "@type":"dcat:Catalog" , + "@context":"https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld", + + "@id":"http://www.agency.gov/data.json", + "conformsTo":"https://project-open-data.cio.gov/v1.1/schema", "describedBy":"https://project-open-data.cio.gov/v1.1/schema/catalog.json", + "dataset":[ + + { + "title":"NIST Chem-BLAST Gateway for PDB - SRD 155", 
"description":"Chemical Block Layered Alignment of Substructure or Chem-BLAST uses a method for finding chemical compounds within a large collection. In this method, all chemical compounds are annotated in terms of standard chemical structural fragments. These fragments are then organized into a data tree based on their chemical substructures. Search engines have been developed to use this data tree. These search engines use the Chem-BLAST technique to search on the fragments and look for their chemical structural neighbors. The technique was originally developed in the context of the HIV Structural database to enable a query on inhibitors of HIV protease. (See http://xpdb.nist.gov/hivsdb/hivsdb.html.) Recently the method has been significantly improved to extend to the ligands found in the Protein Data Bank (PDB). (See http://xpdb.nist.gov/chemblast/pdb.html.) The method establishes a tree-like relationship between the rings found in three-letter codes that denote ligands of the structures found in the PDB. Semantic Web relations are established between the structural scaffolds of the ligands and organizes them in an XML database utilizing the Web’s Resource Description Framework (RDF). An Adobe Flex-based interface is used to present this information on the Web. Plans are under way to extend this work to non-ring type scaffolds as well. Chem-BLAST has also been extended to structures in PubChem. http://xpdb.nist.gov/chemblast/pdb.pl which includes several non-ring standard reused groups such as sulfates. 
Efforts are under way to use the underlying principles of Chem-BLAST to enable query on non-structural data, such as cell image data http://xpdb.nist.gov/cell/image.pl.", + "contactPoint":{ + + "@type":"vcard:Contact" , "fn":"NIST Standard Reference Data", + "hasEmail":"mailto:data@nist.gov" + + }, + + "keyword":[ + + "Cell-image-data" , + "Chem-BLAST" , + "Enabling-scientific-linked-data-by-automation" , + "Federated-terms-building" , + "Global-data-integration-challenge-solutions" , + "Infrastructure-for-semantic-terminology" , + "Latin-like-root-terminology-for-science" , + "Machine-friendly-vocabulary" , + "On-demand-ontology-nuggets" , + "PDB-ligands" , + "PubChem-structures" , + "Re-used-nuggets-of-ontology" , + "Re-used-scalable-terminology" , + "Re-used-use-case-friendly-terminology" , + "Rule-based-linking-of-data" , + "Rule-based-structural-data-graphs" , + "Rule-based-vocabulary-building" , + "Sanskrit-like-root-terminology-for-science" , + "Structural-resource-for-drug-design" , + "Thermodynamic-data" + ], + "modified":"2003", + "publisher":{ + + "@type":"org:Organization" , "name":"National Institute of Standards and Technology" + }, + "accessLevel":"public", "identifier":"EBC9DB05EDEF5B0EE043065706812DF86", + "bureauCode":[ + "006:55" + ], + + "programCode":[ + "006:045" + ], + "describedBy":"http://xpdb.nist.gov/pdb_chem_blast/help.html", "landingPage":"http://www.ceramics.nist.gov/srd/scd/scdquery.htm", "language":[ + + "en-US" + ], "license":"http://www.nist.gov/data/license.cf","theme":[ + + "materials science" + ], + "@type":"dcat:Dataset" + }, + { + "title":"NIST Atlas of the Spectrum of a Platinum/Neon Hollow-Cathode Lamp in the Region 1130-4330 Å - SRD 112", "description":"The Atlas provides lists of spectral lines with wavelengths, intensities, and energy level classifications, as well as graphical tracings of the observed spectrum of a platinum hollow-cathode lamp containing neon carrier gas. 
The spectrum was recorded photographically and photoelectrically with a 10.7 m normal-incidence vacuum spectrograph. Wavelengths and intensities were determined for about 5600 lines in the region 1130 Å to 4330 Å.", + "contactPoint":{ + + "@type":"vcard:Contact" , "fn":"NIST Standard Reference Data", + "hasEmail":"mailto:data@nist.gov" + + }, + + "keyword":[ + + "Atomic" , + "Pt" , + "UV" , + "atlas" , + "calibration" , + "hollow cathode lamp" , + "intensities" , + "platinum" , + "spectrograph" , + "spectroscopy" , + "spectrum" , + "ultraviolet" , + "wavelength" + ], + "modified":"2003-07", + "publisher":{ + + "@type":"org:Organization" , "name":"National Institute of Standards and Technology" + }, + "accessLevel":"public", "identifier":"FDB5909746675200E043065706813E54111", + "bureauCode":[ + "006:55" + ], + + "programCode":[ + "006:045" + ], + "landingPage":"http://www.nist.gov/pml/data/platinum/index.cfm", "language":[ + + "en-US" + ], "license":"http://www.nist.gov/data/license.cfm", + "references":[ + + "http://nvlpubs.nist.gov/nistpubs/jres/097/1/V97-1.pdf", + "http://dx.doi.org/10.6028/jres.097.002" + ], + "@type":"dcat:Dataset" + } + ] + } + \ No newline at end of file diff --git a/Metadata_XSD/nistSample.xml b/Metadata_XSD/nistSample.xml new file mode 100644 index 00000000..6a3a213d --- /dev/null +++ b/Metadata_XSD/nistSample.xml @@ -0,0 +1,104 @@ + + + dcat:Catalog + https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld + http://www.agency.gov/data.json + https://project-open-data.cio.gov/v1.1/schema + https://project-open-data.cio.gov/v1.1/schema/catalog.json + + NIST Chem-BLAST Gateway for PDB - SRD 155 + Chemical Block Layered Alignment of Substructure or Chem-BLAST uses a method for finding chemical compounds within a large collection. In this method, all chemical compounds are annotated in terms of standard chemical structural fragments. These fragments are then organized into a data tree based on their chemical substructures. 
Search engines have been developed to use this data tree. These search engines use the Chem-BLAST technique to search on the fragments and look for their chemical structural neighbors. The technique was originally developed in the context of the HIV Structural database to enable a query on inhibitors of HIV protease. (See http://xpdb.nist.gov/hivsdb/hivsdb.html.) Recently the method has been significantly improved to extend to the ligands found in the Protein Data Bank (PDB). (See http://xpdb.nist.gov/chemblast/pdb.html.) The method establishes a tree-like relationship between the rings found in three-letter codes that denote ligands of the structures found in the PDB. Semantic Web relations are established between the structural scaffolds of the ligands and organizes them in an XML database utilizing the Web’s Resource Description Framework (RDF). An Adobe Flex-based interface is used to present this information on the Web. Plans are under way to extend this work to non-ring type scaffolds as well. Chem-BLAST has also been extended to structures in PubChem. http://xpdb.nist.gov/chemblast/pdb.pl which includes several non-ring standard reused groups such as sulfates. Efforts are under way to use the underlying principles of Chem-BLAST to enable query on non-structural data, such as cell image data http://xpdb.nist.gov/cell/image.pl. 
+ + vcard:Contact + NIST Standard Reference Data + data@nist.gov + + + Cell-image-data + Chem-BLAST + Enabling-scientific-linked-data-by-automation + Federated-terms-building + Global-data-integration-challenge-solutions + Infrastructure-for-semantic-terminology + Latin-like-root-terminology-for-science + Machine-friendly-vocabulary + On-demand-ontology-nuggets + PDB-ligands + PubChem-structures + Re-used-nuggets-of-ontology + Re-used-scalable-terminology + Re-used-use-case-friendly-terminology + Rule-based-linking-of-data + Rule-based-structural-data-graphs + Rule-based-vocabulary-building + Sanskrit-like-root-terminology-for-science + Structural-resource-for-drug-design + Thermodynamic-data + + 2003 + + org:Organization + National Institute of Standards and Technology + + public + + EBC9DB05EDEF5B0EE043065706812DF86 + 006:55 + 006:045 + http://xpdb.nist.gov/pdb_chem_blast/help.html + http://www.ceramics.nist.gov/srd/scd/scdquery.htm + + en-US + + http://www.nist.gov/data/license.cf + + materials science + + dcat:Dataset + + + NIST Atlas of the Spectrum of a Platinum/Neon Hollow-Cathode Lamp in the Region 1130-4330 Å - SRD 112 + The Atlas provides lists of spectral lines with wavelengths, intensities, and energy level classifications, as well as graphical tracings of the observed spectrum of a platinum hollow-cathode lamp containing neon carrier gas. The spectrum was recorded photographically and photoelectrically with a 10.7 m normal-incidence vacuum spectrograph. Wavelengths and intensities were determined for about 5600 lines in the region 1130 Å to 4330 Å. 
+ + vcard:Contact + NIST Standard Reference Data + data@nist.gov + + + Atomic + Pt + UV + atlas + calibration + hollow cathode lamp + intensities + platinum + spectrograph + spectroscopy + spectrum + ultraviolet + wavelength + + 2003-07 + + org:Organization + National Institute of Standards and Technology + + public + + + FDB5909746675200E043065706813E54111 + 006:55 + 006:045 + http://www.nist.gov/pml/data/platinum/index.cfm + en-US + http://www.nist.gov/data/license.cfm + + http://nvlpubs.nist.gov/nistpubs/jres/097/1/V97-1.pdf + http://dx.doi.org/10.6028/jres.097.002 + + dcat:Dataset + + +