Monday, September 12, 2005

Federal XML Work Group August 17 meeting, ET.gov, XML NDRG & GJXDM NDR

Jim Burch, Deputy Director of DOJ’s Bureau of Justice Assistance, said the prescription drug monitoring program was part of the effort to develop standards that would enable information sharing. He said the Office of Justice Programs (OJP) had invested heavily in GJXDM and faced increasing questions from the performance measuring community. Burch has asked OJP partners to keep in mind the importance of performance measurement.

He went on to describe the difficulty of establishing a basis for performance measurement. Is it reasonable to measure the number of information sharing projects? Should the OJP ask for outcome measurement? Burch said IJIS and the University of New Orleans were working to establish a basis for performance measurement.

Burch said the National Sex Offender Registry was an important milestone for XML and Service Oriented Architecture (SOA). The OJP had been given a sixty-day deadline and was able to come in two days early and under budget. The site received twenty-seven million hits in the first two days, with a peak rate of almost 1,000 hits per second. Burch said it showed the power of XML and SOA and demonstrated that it had been a worthwhile investment. He emphasized that the Bureau of Justice Assistance is not the lone ranger and is looking for partners.

Here Owen Ambur said, “We will know that XML is a success when we don't need to talk about it anymore.” With reference to the proposed specification of Strategy Markup Language (StratML), he said strategic plans must be flexible and reactive. Otherwise, they will become shelfware.

He then said he wanted to say a few quick words about the CIO Council's emerging technology life cycle management process, hosted at ET.gov. In addition to StratML, the emerging technology components identified on the ET.gov site include PDF/A, which is nearing approval as an ISO standard for archival records. PDF/A strips out dynamic features that cause records to lack integrity and be unreliable.

Ambur listed some of the other components recently identified at ET.gov. NCCS at the Urban Software Institute has proposed Uniform Data Elements and Definitions (UDED). Ambur emphasized that citizen-centric government cannot design federal stovepipes; local governments and private organizations must be engaged. Ambur himself submitted Collaboration Markup Language (CollabML) and, at the suggestion of his boss, has also proposed specification of a common enterprise architecture (EA) metamodel for government-wide application.

He noted that IPv6 is not yet included in the FEA TRM, but since direction has been given for agencies to begin implementing it, it should be fast-tracked for inclusion in the TRM through the process the CIO Council plans to announce soon for updating and maintaining the FEA models. The ET.gov site is a potential channel for input into the FEA model maintenance process with respect to the TRM and SRM.

Ambur concluded his remarks by saying that ET.gov would only work if it is a place where government employees can go for useful information, and if they choose to use it to identify and discover emerging technologies around which they'd like to build communities of practice. He then introduced Mark Crawford, saying Crawford would discuss the Naming Design Rules & Guidelines (NDRG). Ambur explained they wanted to develop guidelines that would link to existing standards such as GJXDM. They are scheduled for an October delivery. He noted that the role of the xmlCoP is to identify good practices and make recommendations. It is not the role of the xmlCoP to try to issue mandates. Only OMB can issue rules that are binding on agencies.

Crawford began his presentation by saying they wanted to create an NDRG that was rigid enough to be normative but flexible enough for both data-centric and document-centric worlds. The purpose of the NDRG is to enable the development of a clearly defined namespace schema that will ensure consistency across federal agencies, a versioning schema that will support consistency in government schema, a federal canonical schema for base data types, Naming and Design Rules (NDRs) by government agencies or communities of practice that will build on the Federal NDRG, and a reference to use for mapping different agency NDRs to each other. The proposed NDRG must also enable the development of consistent, reusable XML components, including schema, schema modules such as reusable code lists and identifier lists, simple and complex types, elements, and attributes. It must provide tools to facilitate ease of development, validation and interoperability. The NDRG document is intended for use by all federal agencies and their contractors. It will be linked at core.gov. Audiences who have yet to review the NDRG include developers of federal enterprise schema, government organizations looking for guidance, agency level developers, and private sector organizations.

The NDRG sources are voluntary consensus standards bodies (OASIS Universal Business Language Technical Committee, UN/CEFACT) and government NDRs (Dept. of the Navy, Environmental Protection Agency, and Global Justice XML Data Model).

Here, Crawford showed a slide that could only have been prepared for a Washington, DC audience:

The key words MUST, MUST NOT, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL are to be interpreted as described in Internet Engineering Task Force (IETF) Request for Comments (RFC) 2119. Non-capitalized forms of these words are used in the regular English sense.


Then Crawford made a few remarks on the use of must versus should. He indicated the NDRG group thought it was important to lean towards should.

He went on to say that modularity is the key to reuse. The modularity model must be structured, flexible and consistent. There are three approaches to modularity under consideration: monolithic (single namespace, one schema per process, idea or information requirement, no imports), reuse (root schema with all content imported, unique namespace for each schema, common modules for reusables), unique (root schema with all content imported, unique namespace for each schema, unique schema modules for each data element).

Crawford said the consensus was going towards reuse. The Navy, IRS and others are all using this approach. GJXDM is generally considered to be an example of reuse, although some have argued it is monolithic in character. It seems that the unique approach was considered by GJXDM but ultimately rejected.

The modularity model imports data types from existing standards. UBL and other standards are converging with UN/CEFACT data type schema modules. Standards bodies are converging on a single approach to code lists. The NDRG group hopes to persuade code list owners to publish and make lists publicly available for further adoption.

Here, Crawford offered a series of data types: amount (xsd:decimal), binary object (xsd:base64Binary), code (xsd:normalizedString), date time (xsd:dateTime), identifier (xsd:normalizedString), indicator (xsd:boolean), measure (xsd:decimal), numeric (xsd:decimal), quantity (xsd:decimal), text (xsd:string).
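To make the mapping concrete, here is a minimal Python sketch that turns the list above into a small XML Schema module. It is only an illustration: the type names (AmountType and so on), the function name, and the target namespace are my own assumptions, not anything Crawford presented, and the real UN/CEFACT data types carry supplementary attributes that this sketch leaves out.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Data type -> XSD built-in, as listed in the presentation.
# The CamelCase names are hypothetical labels chosen for this sketch.
CORE_TYPES = {
    "Amount": "decimal",
    "BinaryObject": "base64Binary",
    "Code": "normalizedString",
    "DateTime": "dateTime",
    "Identifier": "normalizedString",
    "Indicator": "boolean",
    "Measure": "decimal",
    "Numeric": "decimal",
    "Quantity": "decimal",
    "Text": "string",
}


def build_core_types_schema(target_namespace):
    """Return an xsd:schema element with one simpleType per core data type."""
    ET.register_namespace("xsd", XSD_NS)
    schema = ET.Element(
        f"{{{XSD_NS}}}schema", {"targetNamespace": target_namespace}
    )
    for name, base in CORE_TYPES.items():
        simple = ET.SubElement(
            schema, f"{{{XSD_NS}}}simpleType", {"name": f"{name}Type"}
        )
        # Each type is a plain restriction of the XSD built-in, with no facets.
        ET.SubElement(
            simple, f"{{{XSD_NS}}}restriction", {"base": f"xsd:{base}"}
        )
    return schema
```

Calling ET.tostring(build_core_types_schema("urn:example:core-types"), encoding="unicode") gives the schema as text; the namespace passed in is a placeholder.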

He emphasized the importance of flexibility and using existing standards. Initially, they wanted to mandate hierarchical URN schemas, but the consensus was that the NDRG would propose that, while developers should use URN hierarchy schemas, they may use URL hierarchy schemas. (Note to developers: I would interpret this to mean that, if you use URL hierarchy schemas, you need a good reason. Otherwise, go with URN.)

Here, Owen Ambur interjected that this had been somewhat of a religious discussion, eventually leading to an agreement that, “We can get to heaven either way.”

Crawford resumed his remarks, reviewing the seven levels of namespace domains: NID of US, organization hierarchy (gov), specific government hierarchy (EPA, OMB, DOD, Treasury, etc.), agency level hierarchy (USN, USAF, IRS, FMS, etc.), resource type (schema or other, as identified), resource status, and resource name. Here, a participant asked if IPv6 might come into eight levels of namespaces. Crawford replied, “no”.
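As a rough illustration of how the seven levels might combine into a single namespace name, here is a small Python sketch. The colon-delimited layout, the lowercasing, the function name, and the example values are all my assumptions; the presentation listed the levels but these notes do not record the exact URN syntax the NDRG will adopt.

```python
def build_namespace_urn(government, agency, resource_type, status, name,
                        nid="us", organization="gov"):
    """Join the seven namespace levels into one colon-delimited URN.

    Level order follows the list in the presentation: NID, organization
    hierarchy, specific government hierarchy, agency level hierarchy,
    resource type, resource status, resource name.
    """
    parts = [nid, organization, government, agency,
             resource_type, status, name]
    for part in parts:
        # Reject empty levels and characters that would break the hierarchy.
        if not part or ":" in part or any(ch.isspace() for ch in part):
            raise ValueError(f"invalid namespace level: {part!r}")
    return "urn:" + ":".join(part.lower() for part in parts)
```

For example, build_namespace_urn("Treasury", "IRS", "schema", "draft", "example") yields urn:us:gov:treasury:irs:schema:draft:example, where "draft" and "example" are made-up values for the status and name levels.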

Crawford spoke about versioning, saying there was a consensus on:
name - major . non-zero [. revision ].

Minor versioning of namespaces, use of namespaces for schema location, and URN or URL for schema location remain to be determined. (Note - if you have any opinion about this, now is the time to speak up.)

Crawford began to talk about schema content, noting that schema was all about data. He showed a slide showing the transformation of data into XML and illustrating the role of ISO 11179 in this process. He explained that associated classes would be treated as a simple data element with a global element declaration and complex type definition.

Crawford concluded by saying that the NDRG group would continue to work through comments on the first draft and finalize modularity, versioning and namespaces. He then offered to take questions.

Brand Neiman suggested that ISO 11179 lacked sufficient semantics for machine processability and that related efforts are evolving toward RDF and OWL. Neiman observed that XML is easier to read and harder to process, whereas RDF is harder to read and easier to process. Neiman suggested scheduling a joint xmlCoP/SICoP meeting to consider the respective use cases for XML and RDF/XML.

Crawford responded that he considered the two standards to be complementary, rather than competitive. Owen Ambur interjected that he agreed that it was not a question of either/or, but rather a need to look at the business case.

Bob Green asked if those “in the URL camp” were aware of the problems that can result if the URL does not properly resolve. Green said, “whole system can choke”. He encouraged the use of URN.

Following the midmorning break, Paul Embley explained the work of the Global XML Task Force and the Global Justice XML Data Model and Design Rules. He began by saying that GJXDM is for use by anyone who needs it.

The current draft is still being vetted by Global XSTF. Their primary focus is rule accuracy against version 3.1; the final draft is due to release on October 31. Embley said they were two days behind schedule.

The GJXDM Design Rules incorporate GJXDM, OASIS LegalXML IJ TC GJXDM draft MNDR, Federal XML NDR Working Group Draft NDRG, and OASIS UBL NDR. The committee’s work was influenced by the NIEM Steering Committee, Federal Enterprise Architecture, the IJIS Institute, OASIS Integrated Justice TC, National Center for State Courts, and the Federal XML NDR Working Group. Here, Embley said his committee believes in plagiarism.

The specification will be a product of Global XSTF and will specify how GJXDM is actually defined. Its format will be as close as possible to the UBL NDR and use/copy appropriate wording from other NDR documents. It will include definitions, principles, rules, rationales and explanations, and examples for rules.

Embley emphasized the specification will not be a projection of UBL on GJXDM, nor a comparison of UBL and GJXDM, nor a methodology for building Information Exchange Package Documentation.

The scope of the specification of GJXDM 3.1 will focus on the definition of conformant schemas, conformant reference schemas, subsets, documentation, and GJXDM-conformant instances.

Principles are the basis for the rules; all rules spring from principles:
Format: [Principle number ]
Currently there are 22 principles.

General Rule Format.

Embley said versioning was not complete and threw the floor open for questions. He was asked how his work differed from the previous presentation. Embley allowed there were not great differences and that his committee would highlight those differences.

Minutes of the Federal NDR COI, August 11, 2005
XML Working Group, Meeting Notes, August 17, 2005
Presentations
