Schemas

Paul Prescod <paul@prescod.net> wrote:
> Oren Ben-Kiki wrote:
> > ... If people want to have a single
> > namespace for XHTML, they should first go and re-write the above
> > paragraph in the XSchema draft. Right?
>
> What would you rewrite it to? Let's say you have three schemas for XHTML
> on your hard drive or on the web. Let's say that you have a document
> like this:
>
> <MYDOC>
> <P xmlns="http://www.w3.org/TR/XHTML">This is supposed to be loose.</P>
> <P xmlns="http://www.w3.org/TR/XHTML">This is supposed to be strict.</P>
> </MYDOC>
>
> How would you select the schema to use for each?

I'll try to formulate the outline of a general solution (you did ask :-).

<Outline>

1. Allow a document to declare "I was created with the intention of
satisfying the following schema". A schema identification is a MIME type. I
know what the chances of all that are, so we'd probably see schemas named by
URIs. It doesn't really matter for the rest of the points.

For the people who claim that a document should not declare its own
stylesheet/schema/whatever - One must specify _something_ that allows the
application to know what document type this is. This is known, today, as a
MIME type. Why mix things up by inventing another typing system?

How the document is associated with a MIME type isn't really within XML's
realm. But just like HTML had to add a META tag, even though HTTP is expected
to provide the relevant data, I'd expect there to be a directive which would
specify the meta information of an XML document (<?META?>) - be it a MIME
type, an XSchema, or whatever.

For David Megginson, who wants the same document to live under more than one
XSchema - (i) this problem applies to MIME types in general, bring it up
there; (ii) applications are free to ignore the document's specified meta
data and use a different one (they often do, anyway).

2. All of this has nothing whatsoever to do with namespaces. The XSchema
itself refers to names - and hence, makes use of namespaces to uniquely
define these names. Period. There's no other restriction or special
relationship between schemas and names.

3. Actually fetching the schema is a tricky matter. I have some notions but
that's not a critical issue - there are several obvious ways to tackle it;
I'm not going to fuss over it unless something twisted is suggested :-)

4. Given that Schemas can export and import stuff, the problem of
"modularization" is already solved. If you want to mix and match MathML with
XHTML and YourML, simply create your own super-schema which refers to these
three, and you are done. The power you'll have mixing pieces from different
schemas would depend on only one thing - the strength of the schema language
itself. Making intelligent use of namespaces reduces the need for powerful
constructs in this language - but does not remove the need altogether. I
fully trust the XSchema WG to come up with an adequate set of constructs, if
they put their mind to it.

</Outline>

With regard to your sample document. The application won't be able to
distinguish between two 'p' elements unless there was _something_ different
about them. This something might be anything at all expressible in the
XSchema language. For example, suppose I could say in an XSchema "if attr
'A' contains 'VALUE', the element obeys a rule from this schema, otherwise a
rule from that schema" then your document would look like:

<MYDOC>
<P xmlns="http://www.w3.org/TR/XHTML" type="loose">This is supposed to be
loose.</P>
<P xmlns="http://www.w3.org/TR/XHTML" type="strict">This is supposed to be
strict.</P>
</MYDOC>
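
A rough Python sketch of that idea, just to make it concrete: both P elements carry the same namespace-qualified name, so the choice has to hang on something else, here the 'type' attribute. The schema names in SCHEMA_BY_TYPE are hypothetical; nothing in the XSchema drafts defines this mechanism.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/TR/XHTML"

# Hypothetical mapping from the 'type' attribute value to a schema name.
SCHEMA_BY_TYPE = {"loose": "xhtml-loose", "strict": "xhtml-strict"}

DOC = """<MYDOC>
<P xmlns="http://www.w3.org/TR/XHTML" type="loose">This is supposed to be loose.</P>
<P xmlns="http://www.w3.org/TR/XHTML" type="strict">This is supposed to be strict.</P>
</MYDOC>"""


def select_schemas(xml_text):
    """Return (qualified name, chosen schema) for every P element."""
    root = ET.fromstring(xml_text)
    chosen = []
    for p in root.iter("{%s}P" % XHTML_NS):
        # The qualified name is identical for both; only the attribute differs.
        chosen.append((p.tag, SCHEMA_BY_TYPE.get(p.get("type"))))
    return chosen


if __name__ == "__main__":
    for tag, schema in select_schemas(DOC):
        print(tag, "->", schema)
```

The namespace tells you which vocabulary a name belongs to; it does not, by itself, tell you which of several schemas for that vocabulary applies.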

A more typical example might be:

<MYDOC>
<xhtml:p xmlns:xhtml="http://www.w3.org/TR/XHTML">
This is a paragraph with some
<math:equation>equation</math:equation>
in it.
</xhtml:p>
</MYDOC>

This would require that in an XSchema one would be able to say "the element
defined in this schema may also contain the following attributes/elements".
Much simpler than your case, and demonstrates why namespaces are a great
help for mixing and matching.
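
To see why, here is a small Python sketch that lists which namespace each element of such a mixed document belongs to. It assumes both prefixes are declared on the root element, and the math namespace URI (http://example.org/math) is a placeholder made up for the illustration, not a real MathML identifier.

```python
import xml.etree.ElementTree as ET

# Assumption: both prefixes are declared; the math URI is a placeholder.
DOC = """<MYDOC xmlns:xhtml="http://www.w3.org/TR/XHTML"
       xmlns:math="http://example.org/math">
<xhtml:p>
This is a paragraph with some
<math:equation>equation</math:equation>
in it.
</xhtml:p>
</MYDOC>"""


def namespaces_used(xml_text):
    """Return (namespace URI, local name) for each element, in document order."""
    root = ET.fromstring(xml_text)
    result = []
    for elem in root.iter():
        if elem.tag.startswith("{"):
            ns, local = elem.tag[1:].split("}", 1)
        else:
            ns, local = "", elem.tag
        result.append((ns, local))
    return result


if __name__ == "__main__":
    for ns, local in namespaces_used(DOC):
        print(local, "belongs to", ns or "(no namespace)")
```

Each element name is unambiguous on its own, so a processor can hand 'equation' to whatever knows about the math vocabulary without guessing.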

<SideNote>
I fully agree with you on what should be done in XHTML for now, which is to
remove any reference to namespaces from the XHTML specs until this is
finalized in some recommendation. Only then, when we know how it is to be
done, should namespaces be added to the relevant drafts/recommendations.
Doing anything else is, IMVHO, irress^H^H^H^H^H^Hpremature.
</SideNote>

Share & Enjoy,

Oren Ben-Kiki

Schemas - XML-Data, DCD, DTD, SOX

See a maintained catalog at http://www.schema.net/catalog/ that people are more than welcome to use.
Guide to schema at http://msdn.microsoft.com/xml/XMLGuide/schema-overview.asp
The catalog itself is at http://www.schema.net/public-text/catalog.soc - James Tauber

Conceptually, the various schema languages are not very different - xmlschema, XML-Data, SOX, DCD, and DDML are all trying to replace DTDs with an instance syntax. That is, they define an XML language that does the same thing as DTDs: defines elements, attributes, and so on.

For the most common operations -- defining elements and attributes and assigning data types (integer, date, etc.) to attributes and PCDATA-only elements -- both are technically very similar. For more exotic operations -- inheritance, reusing definitions from other schemas, etc. -- they differ in both scope and technique. XML Schema offers more features than XML-Data Reduced (the part of XML-Data implemented by IE 5.0); it is not known how these will evolve in future drafts.

The most significant difference between the two is simply that XML-Data Reduced is implemented and XML Schema is not. However, many people seem to be waiting for XML Schema to become a recommendation before starting any schema work. When that happens, I suspect you will see a lot of software that uses schemas. I also suspect that XML-Data, as well as the other schema proposals, will largely fade away in favor of the standard.

Various groups submitted schema proposals to the W3C, and a W3C Working Group studied them and came up with one of their own (http://www.w3.org/TR/xmlschema-1/ and http://www.w3.org/TR/xmlschema-2/). One of the four proposals came out of the xml-dev mailing list, and the other three came from vendors; some of these vendors had enough invested in the schemas they worked on that they continue to use them because the W3C version won't be a Recommendation for a while. Microsoft was involved in the XML-Data one (and another one called DCD), and continues to use a simplified version of it called XML-Data Reduced. While most of these vendors will probably switch over to the W3C schema once it's official, I won't make the call on Microsoft's plans.

To summarize: of the four schema proposals submitted to the W3C, Microsoft sort of supports one of the two that they worked on, but most people are waiting for the W3C-authored one before getting serious with schemas.

Bob DuCharme www.snee.com/bob see www.snee.com/bob/xmlann for "XML: The Annotated Specification" from Prentice Hall.

A fairly lengthy guide to what IE5 has in the way of schema support at http://biztalk.org/btSchemasGuide.asp

There is a page, also at the BizTalk site, containing links to the current W3C schema activity documents: http://biztalk.org/btSchemasGuidelines.asp - Andrew Layman

Simon St.Laurent wrote:
I'd say DTDs have at least another few years of active use in them. I suspect (though it isn't certain) that while they're less powerful than schemas, they involve less processing as well. For some situations, even 'non-legacy' ones, that may ensure a very long life.

1. Use is driven in part by tools and in part by the functionality of tools (what do you need to do, what can you do that with).

2. Use is driven by the availability of tools and ease of obtaining them (can I get them when I need them, can I afford to get them).

3. Use is driven by what I know how to do or what I can learn in the time available (do I know what I need to know, do I know where to go to find out).

It's a good idea to write down what you need in semi-formal, human readable prose first. For example, you might use a standard table with several fields to fill in, or even UML. Most of your target customers won't know DTD syntax or XML Schema, so you'll need to present the information in a form that they can understand.

Once you've done that (and it's easily 90 per cent of the work), actually writing a DTD, XML-Schema, or both is a matter of representing your ideal model within the limits of DTD and/or XML Schema capabilities.

Eve Maler and Jeanne El Andaloussi describe one possible business approach for designing abstract models in their book DEVELOPING SGML DTDS: FROM TEXT TO MODEL TO MARKUP (ISBN 0-13-309881-8); don't feel bound by the exact details of their process, but reading the book should give you a good idea of what you're up against. The really hard part is finding all of the requirements and getting buy-in from the target customers (this is what SGML/XML consultants generally spend most of their time doing).

When you're ready to actually write the DTD or other schema, you can take a look at my own book, STRUCTURING XML DOCUMENTS (ISBN 0-13-642299-3) to learn about some of the decisions and trade-offs involved in document type design. - David Megginson david@megginson.com http://www.megginson.com/

Maybe our XMLOutline. Converts any tab-indented structure to DTD and XML. Send me an example and let's see. John Hicks Cerium Component Software XMLOutline | XMLdb | XMLServlet http://ceriumworks.com "Software as a conversation with a community."

Data in the DOM tree can be changed and nodes added as required. The DOM tree can then be scanned and turned into XML data for transmission.

References:
http://www.python.org/ has an XML addition which is very instructive, though different from your requirements.
http://www.alphaworks.ibm.com/ offers source code and a C++ Builder example (XML4C).
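
The DOM round trip described above (change some data, add a node, turn the tree back into XML) might look roughly like this using Python's standard xml.dom.minidom. The sample document and the 'note' element are invented for the illustration; the same steps apply with any DOM implementation.

```python
from xml.dom.minidom import parseString

# Hypothetical sample data, not taken from any real system.
SAMPLE = "<order><item>widget</item><status>pending</status></order>"


def update_status(xml_text, new_status):
    """Parse, change one value, add one node, and serialize again."""
    dom = parseString(xml_text)
    root = dom.documentElement

    # Change existing data: replace the text inside <status>.
    status = root.getElementsByTagName("status")[0]
    status.firstChild.data = new_status

    # Add a node: append a new <note> element with a text child.
    note = dom.createElement("note")
    note.appendChild(dom.createTextNode("updated"))
    root.appendChild(note)

    # Scan the tree and turn it back into XML text for transmission.
    return root.toxml()


if __name__ == "__main__":
    print(update_status(SAMPLE, "shipped"))
```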

http://www.jclark.com/xml/xp/index.html has an SP parser and a Jade parser

Something that may be useful if you are going the C & DOM route is the dom_interface.h header file at http://www.sinica.edu.tw/~ricko/src/dom_interface.h or http://www.ascc.net/xml/en/utf-8/schemas.html

This file gives a C version of the DOM level 1 API. It is basically just a big struct of pointers to individual DOM functions, which you have to provide. It gives the design of the interface, but not the implementation: you don't have to search through CORBA and W3C documentation as much to get to first base.

If you use this, it should make porting your program (e.g. if other DOM implementations in C also use the API, to C++ DOMs, or even across to Java) easier. Even if you do make a DTD to C struct converter, you can still have a DOM interface to it, to allow very generic operations such as navigation. Also, increasingly more programmers will be familiar with the DOM interface and concepts. You can figure out the efficiency tradeoffs in your own context. - Rick Jelliffe

Rick Jelliffe wrote:
> So does the recipient system look through the document and then
> request a schema server for the appropriate minimal schema to be
> generated and sent, or does the server already have a separate schema
> generated for each instance?

Mmm ... neither really. You only (mostly, but we'll keep it simple for
now) need the schema for the root node, and you can generate it
automatically (recursively). All I have done is written a routine that
'digs out' the schema definition for a node and all its possible
children and attributes. This means that I can request a schema at any
level of detail. The same principle is applied to my data. A few
scenarios might illustrate this:

1. A server or user requests an article from my XML server:

http://view.IED-IED.ied-support.net/documents ...
/article[@ArticleType="interview"]

and the following is returned (note the schema URL):

<Article
Title="This is an article"
ArticleType="interview"
xmlns="x-schema:http://view.ied-ied.ied-support.net ...
/schema/Article"
>
<ArticleText>
<Para>
Try visiting
<ExternalSite>this lovely site</ExternalSite>
. You'll like it.
</Para>
<Para>More text</Para>
</ArticleText>
</Article>

Now, when the parser follows the URL for the schema my server dishes up:

<Schema
xmlns="urn:schemas-microsoft-com:xml-data"
name="article"
xmlns:dt="urn:schemas-microsoft-com:datatypes"
>
<AttributeType name="Title"/>
<AttributeType
name="ArticleType"
dt:type="enumeration"
dt:values="Interview Article BookReview"
/>
<ElementType name="ExternalSite" content="textOnly"/>
<ElementType name="Para" content="mixed" order="many">
<element type="ExternalSite"/>
</ElementType>
<ElementType name="ArticleText" content="eltOnly">
<element type="Para" minOccurs="1" maxOccurs="*"/>
</ElementType>
<ElementType
name="Article"
content="eltOnly"
model="closed"
>
<attribute type="Title" required="no"/>
<attribute type="ArticleType" required="no"/>
<element type="ArticleText" minOccurs="1" maxOccurs="1"/>
</ElementType>
</Schema>

2. If I were now to request just the first paragraph of the article:

http://view.IED-IED.ied-support.net/documents ...
/article[@ArticleType="interview"]/*/para[1]

I would get back:

<Para
xmlns="x-schema:http://view.ied-ied.ied-support.net ...
/schema/Para"
>
Try visiting
<ExternalSite>this lovely site</ExternalSite>
. You'll like it.
</Para>

All my routine has to do to construct the URL for the schema is to take
the general schema area for the same server the data is from and append
the name of the element type. This could obviously be modified so that
there is a separate 'schema server', for example if there was a
centralised repository like BizTalk, or whatever. Note that the returned
XML is not a 'fragment', as mentioned in previous emails, but a correctly
formed document. (I wrote a long piece ages ago about why I preferred
to think of XML documents as units of transfer, and distinct from
'documents' as we normally conceive them.) However, there are situations
where we do wrap this XML document in a fragment container, for example
if dealing with an editor when we would need to know where to put the
data back to if it had been changed.
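
As a sketch only, the URL construction described above could be written like this in Python. The "/schema/" path and the example.org host are assumptions standing in for the real (elided) server layout, and the function names are made up for the illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the server keeps all its schemas under this path.
SCHEMA_AREA = "/schema/"


def schema_url(data_url, element_type):
    """Build the schema URL on the same server the data came from."""
    parts = urlsplit(data_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc,
                       SCHEMA_AREA + element_type, "", ""))


def schema_namespace(data_url, element_type):
    """Prefix the URL with x-schema:, as in the xmlns values shown above."""
    return "x-schema:" + schema_url(data_url, element_type)


if __name__ == "__main__":
    # Hypothetical data URL on a placeholder host.
    print(schema_namespace("http://data.example.org/documents/article", "Para"))
```

Pointing at a separate schema server would only mean swapping the host part rather than reusing the one from the data URL.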

Anyway, when the parser follows the URL for the schema all the server
need dish up now is:

<Schema
xmlns="urn:schemas-microsoft-com:xml-data"
name="para"
xmlns:dt="urn:schemas-microsoft-com:datatypes"
>
<ElementType name="ExternalSite" content="textOnly"/>
<ElementType name="Para" content="mixed" order="many">
<element type="ExternalSite"/>
</ElementType>
</Schema>
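
For anyone wanting to try this, here is one possible Python sketch of "digging out" the definitions for a node and all its possible children and attributes. It is not the actual routine from my server, just an illustration that assumes the full schema is available as a single XML-Data document; the datatype attributes are left out of the sample for brevity.

```python
import xml.etree.ElementTree as ET

XDR = "urn:schemas-microsoft-com:xml-data"

# Abbreviated copy of the Article schema (dt: attributes omitted).
FULL_SCHEMA = """<Schema xmlns="urn:schemas-microsoft-com:xml-data" name="article">
  <AttributeType name="Title"/>
  <AttributeType name="ArticleType"/>
  <ElementType name="ExternalSite" content="textOnly"/>
  <ElementType name="Para" content="mixed" order="many">
    <element type="ExternalSite"/>
  </ElementType>
  <ElementType name="ArticleText" content="eltOnly">
    <element type="Para" minOccurs="1" maxOccurs="*"/>
  </ElementType>
  <ElementType name="Article" content="eltOnly" model="closed">
    <attribute type="Title" required="no"/>
    <attribute type="ArticleType" required="no"/>
    <element type="ArticleText" minOccurs="1" maxOccurs="1"/>
  </ElementType>
</Schema>"""


def needed_names(schema_root, element_name):
    """Collect element and attribute type names reachable from element_name."""
    elements = {e.get("name"): e
                for e in schema_root.findall("{%s}ElementType" % XDR)}
    need_elems, need_attrs = set(), set()
    stack = [element_name]
    while stack:
        name = stack.pop()
        if name in need_elems or name not in elements:
            continue
        need_elems.add(name)
        for ref in elements[name]:
            if ref.tag == "{%s}element" % XDR:
                stack.append(ref.get("type"))      # follow child elements
            elif ref.tag == "{%s}attribute" % XDR:
                need_attrs.add(ref.get("type"))    # remember attributes
    return need_elems, need_attrs


def sub_schema(schema_text, element_name):
    """Return a Schema element holding only what element_name needs."""
    root = ET.fromstring(schema_text)
    need_elems, need_attrs = needed_names(root, element_name)
    for decl in list(root):
        local = decl.tag.split("}", 1)[-1]
        keep = ((local == "ElementType" and decl.get("name") in need_elems)
                or (local == "AttributeType" and decl.get("name") in need_attrs))
        if not keep:
            root.remove(decl)
    # Mirror the lowercase schema names used in the examples above.
    root.set("name", element_name.lower())
    return root


if __name__ == "__main__":
    print([d.get("name") for d in sub_schema(FULL_SCHEMA, "Para")])
```

Asking for "Para" keeps only the ExternalSite and Para definitions; asking for "Article" keeps everything.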

In other words, as I said in my previous contribution, why would you
bother delivering 100k of schema for a 1k document? The only reason
people are thinking that they would do this is because they are still
thinking that 1 document = 1 file. File systems are uncool, man ...
databases are where it's at daddy-o. (There's a Timothy Leary revival
this side of the water so I'm just practising.) For example, with my
database-driven approach I could easily extend the functionality to
allow:

http://view.ied-ied.ied-support.net/schema/1.2/Para

In previous discussions on DTDs versus XML approaches to schemas I have
argued that this ability to dynamically generate only enough of the
schema as you need, (and the ability to cope with namespaces, which I
haven't covered here) is my major reason for preferring XML schemas over
DTDs. I'd even go further and predict that it is one of the major reasons
that XML schemas will win out over DTDs (the other is the ability to mix
schemas).

Does this confuse or clarify the point, Rick? :-)

Best regards,

Mark Birbeck
http://www.iedigital.net/

Tools for developing XML-Data schemas?

Extensibility's _XML Authority_ can export XML-Data, as well as DTDs, DDML, DCD, and SOX. It's reasonably cheap (miraculously cheap for a non-free product in this environment), supports data typing, and is pretty friendly.
See http://www.extensibility.com for more information.

Simon St.Laurent XML: A Primer / Building XML Applications

Inside XML DTDs: Scientific and Technical Sharing Bandwidth / Cookies
http://www.simonstl.com

SOX 2 documentation at http://www.w3.org/TR has a lot of very good ideas. Try to avoid the XML Schema draft itself until you have a good grip on what is going on: it is liable to be revised substantially, as many drafts are.
See also "The SGML & XML Cookbook", ISBN 0-13-614223-0, which I think is very useful. I also have a series of articles on experimental schema ideas at http://www.ascc.net/xml/en/utf-8/schema.html - Rick Jelliffe

Verbosity is a disproportion of the schema w.r.t. the data or its use; if you are passing the same kind of data between servers all night, the verbosity question does not arise.

> If one server sends:
>
> <statusReport>
>   <time>1201</time>
>   <station>123</station>
>   <status>56</status>
> </statusReport>
>
> why bother sending more schema info than the name of the root document
> and the two children that it has?

But so could a DTD. I could have

<!DOCTYPE statusReport SYSTEM
"http://www.ricko.com.zx/dtd/statusreport?gi=statusReport+time+station+status">

and have the DTD generated at the server. The DTD could be generated from data marked up in instance syntax, for ease of implementation, if you like! The server that generates the data can also generate the correct URI.
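
A minimal Python sketch of what such a server-side generator might do with that URI. The interpretation of the 'gi' parameter is an assumption made for the illustration: the first name is taken as the root element, the rest as its children in sequence, each holding character data only.

```python
from urllib.parse import urlsplit, parse_qs


def dtd_from_uri(uri):
    """Generate element declarations from the 'gi' query parameter."""
    query = parse_qs(urlsplit(uri).query)
    # parse_qs decodes '+' as a space, so the names come back space-separated.
    names = query["gi"][0].split()
    root, children = names[0], names[1:]

    lines = []
    if children:
        # Assumption: children appear once each, in the order listed.
        lines.append("<!ELEMENT %s (%s)>" % (root, ",".join(children)))
    else:
        lines.append("<!ELEMENT %s (#PCDATA)>" % root)
    for child in children:
        lines.append("<!ELEMENT %s (#PCDATA)>" % child)
    return "\n".join(lines)


if __name__ == "__main__":
    print(dtd_from_uri(
        "http://www.ricko.com.zx/dtd/statusreport"
        "?gi=statusReport+time+station+status"))
```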

>For this same reason, I must say I am surprised that fans of XML can be looking to use non-XML syntaxes to define any type of data, unless totally unavoidable/impractical.

DTDs are XML syntax.

Norm Walsh's xml.com article "Schemas for XML" at http://www.xml.com/pub/1999/07/schemas/index.html.

There is an XML editor called "XML Spy" (http://www.xmlspy.com) which can find the validation error. It uses the MSXML parser itself.

See my presentations on schemas on: http://www.informatik.tu-darmstadt.de/DVS1/staff/bourret/bourret.htm -- Ron Bourret

Try the xlxp-dev list - archives at http://www.fsc.fujitsu.com/hybrick/xlxp-dev/maillist.html

You can send comments on the XLink spec to: www-xml-linking-comments@w3.org

and comments on the XML Schemas spec to: www-xml-schema-comments@w3.org

DTD: The same information can in XML be specified using mostly elements, or using mostly attributes. Example with attributes:
>
> <!ELEMENT service EMPTY>
> <!ATTLIST service
> name CDATA #REQUIRED
> language (en|de|fr) 'en'
> description CDATA #IMPLIED
> uri CDATA #REQUIRED
> >
>
> Similar data with elements:
>
> <!ELEMENT service (name,language,description?,uri)>
> <!ELEMENT name (#PCDATA)>
> <!ELEMENT language (#PCDATA)>
> <!ELEMENT description (#PCDATA)>
> <!ELEMENT uri (#PCDATA)>
>
> Are there any guidelines on where each format is most suitable?
This issue is covered in: http://www.oasis-open.org/cover/elementsAndAttrs.html -- Fredrik Lindgren -- Upright Engineering AB
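
From the application's side the two styles carry the same information, which a short Python sketch can show. The sample service records below are invented for the illustration, and the reader function is only one way to treat both forms uniformly.

```python
import xml.etree.ElementTree as ET

FIELDS = ("name", "language", "description", "uri")

# Hypothetical instances of the two DTD styles quoted above.
AS_ATTRIBUTES = ('<service name="search" language="en" '
                 'uri="http://example.org/search"/>')
AS_ELEMENTS = ("<service><name>search</name><language>en</language>"
               "<uri>http://example.org/search</uri></service>")


def read_service(xml_text):
    """Read the service fields from attributes, falling back to child elements."""
    service = ET.fromstring(xml_text)
    record = {}
    for field in FIELDS:
        value = service.get(field)            # attribute style
        if value is None:
            value = service.findtext(field)   # element style
        if value is not None:
            record[field] = value
    return record


if __name__ == "__main__":
    print(read_service(AS_ATTRIBUTES) == read_service(AS_ELEMENTS))
```

Which form to prefer is therefore mostly a design question (ordering, repetition, nested structure, defaults) rather than one of what can be expressed.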

I understand the logic behind the !ATTLIST tag but I'm unfamiliar with the e-dtype and the e-dSize references. Can anyone shed some light on this for me?

Actually, it turns out that Extensibility posted this publicly, and if anyone else is curious about how they're doing data typing within XML 1.0, see: http://www.extensibility.com/best/bestdataproc.htm --
Simon St.Laurent
XML: A Primer (2nd Ed - September)
Building XML Applications
Inside XML DTDs: Scientific and Technical
Sharing Bandwidth / Cookies
http://www.simonstl.com

XML Schemas are still numerous and ill-defined, but Microsoft is charging ahead with an EDI flavor of XML (see "http://www.biztalk.org"). For bibliographic metadata, you might wish to look at OCLC's Dublin Core effort (see "http://www.oclc.org/oclc/research/projects/core/oldindex.htm" or go to "http://www.oclc.org" and search on "Dublin Core" for more links).