User's Guide to ISO/IEC 15445:2000 HyperText Markup Language (HTML)

Abrahamson   -   Price

Second edition 2003-04-08 incorporating TC1,
for ISO/IEC 15445:2000 first edition 2000-05-15,
corrected version 2003-06-01.


Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.

International Standards are drafted in accordance with the rules given in ISO/IEC Directives, Part 3.

In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1. Draft International Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as an International Standard requires approval by at least 75% of the national bodies casting a vote.

Attention is drawn to the possibility that some of the elements of this International Standard may be the subject of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.

International Standard ISO/IEC 15445 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC34, Document description languages. JTC1/SC34 has worked on this project in close cooperation with the World Wide Web Consortium. This International Standard makes normative reference to the W3C Recommendation for HTML 4.01.

Annexes A and B form a normative part of this International Standard.

This corrected version of the International Standard includes normative technical changes, altering the requirements and recommendations for the use of the W3C Recommendation for HTML 4.01, and extends the support for accessibility to the World Wide Web. The changes result in part from practical experience with the language defined by this International Standard and in part from the World Wide Web Consortium's adoption of HTML 4.01 as the reference specification for the HTML 4 language. Details of the changes are provided in the Supplement to the corrected version of ISO/IEC 15445:2000.

Foreword to the User's Guide

In November 1996 we were authorized to act as the project editors of the ISO/IEC International Standard 15445:2000 for HTML, informally known as "ISO-HTML". The formal specification that we developed, published on May 15th 2000, is intended for SGML experts who are familiar with the SGML family of International Standards, and as such is challenging to read. However, we wanted the standard to be accessible to readers who do not spend their time working on SGML standards. This User's Guide is intended to encourage and assist people wishing to develop high quality IT applications on the World Wide Web and set high standards of document design and management. We assume a familiarity with the W3C Recommendation for HTML 4.01, but the reader is not expected to be an expert in SGML.

We have received help and encouragement from many people during the development of the International Standard and this User's Guide. Many members of the IETF HTML Working Group commented on the early strawman which led to the formal introduction of the ISO-HTML project. We also received assistance from the staff of the World Wide Web Consortium (W3C) and from people in W3C member organizations. We have worked in close cooperation with the W3C Working Group which developed the HTML Recommendation, and at the invitation of the W3C, we have taken the W3C Recommendation for HTML 4.01 as a referenced text. The ISO/IEC Working Group responsible for the SGML family of standards have provided us with direction, and encouraged and supported our close liaison with the W3C. We have also received help directly from members of National Bodies and from members of the public commenting in general mailing lists. A special word of thanks and appreciation is due to Dave Raggett who accepted an invitation to act as Invited Expert at the Dublin meeting held in July 1997 which established the principle of technically harmonized text, and made ISO-HTML a true subset of the W3C HTML specification.

This Guide is not a formal document, neither is it intended as a reference specification, and it is not appropriate to cite it as such. However, if you the reader find it useful, then we will have met our objectives.

David M. Abrahamson
Trinity College Dublin.
d a v i d at c s dot t c d dot i e

Roger Price
University of Massachusetts Lowell.
r p r i c e at c s dot u m l dot e d u

Foreword to second edition of the User's Guide

This second edition of the User's Guide, and the International Standard, are now generated from the same source file using ISO 8879 based technology, thus simplifying maintenance and ensuring technical alignment. The common source file is marked up using the Pre-HTML DTD specified in this User's Guide and then transformed to conforming instances of ISO/IEC 15445 using the technique described in the chapter "Document preparation".

The additional material added by the Guide is marked up with the attribute class="UG". A W3C CSS2 style sheet associates class "UG" with the style of this paragraph. The text introduced by Technical Corrigendum 1 is highlighted in this style. If there is to be a Technical Corrigendum 2, it will be highlighted in this style, and so on...

Acknowledgements

We would like to thank Russell O'Connor, Michael Huang, Nicolas Lesbats and Edward Welbourne for their helpful comments and suggestions.

Copyright notice

Copyright © 2000-2005 Roger Price, David Abrahamson. All Rights Reserved.

This guide is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

The names of the copyright holders may NOT be used in advertising or publicity pertaining to this document or its contents without specific, written prior permission.

This document describes ISO/IEC 15445:2000 which is subject to IETF, W3C (MIT, Inria, Keio) and ISO/IEC copyright. Because the U.S. Department of Energy has supported the development of International Standards by JTC1/SC34 (under contract DE-AC05-84OR21400), it makes the following assertion about the International Standard:

The U.S. Government retains a paid-up, nonexclusive, irrevocable, world-wide license to publish or reproduce the published form of these documents, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, or to allow others to do so, for U.S. Government purposes.

Introduction

The HyperText Markup Language (HTML) is an application of the International Standard ISO 8879 -- Standard Generalized Markup Language (SGML). It provides a simple way of structuring hypertext documents and of placing references in one document which point to another. These references, called "links", may be presented to readers of a document in such a way that a simple "click" summons the other document, which is then presented to the reader. The reader has the impression of moving from one document to another. This simple user interface has been wildly successful and as a result the World Wide Web, the "web", has become extremely popular.

In the frenzy of the growth, much of the discipline and good practice of the mature SGML world has been lost, and browser developers have added additional features to the markup language such as new tags and new semantics for tags. As a result, many documents have been created which can only be rendered faithfully on a limited number of browsers. Common web practice is to hide any syntactic problems detected by the browsers and thus the reader is not aware that a page being browsed is not always faithful to the original authored document.

The International Standard was developed in an effort to ensure that it will remain possible for an author to produce simple hypertext for the web and be confident that a conforming browser will be able to render the document faithfully. ISO/IEC 15445 represents a core of the language to be supported by all conforming browsers, authoring and validating systems. This International Standard is a refinement of the World Wide Web Consortium's (W3C's) Recommendation for HTML 4.0: it provides further rules to condition and refine the use of the W3C Recommendation in a way which emphasizes the use of stable and mature features, and represents accepted SGML practice. Documents which conform to this International Standard also conform to the strict DTD provided by the W3C Recommendation for HTML 4.01.

ISO-HTML omits all deprecated features of the language, features whose role is purely cosmetic, and features which are still unstable or immature. This has been done in preparation for the expected wide adoption of style sheets by authors and browser manufacturers. Certain optional facilities such as markup omission of the document and other major elements have been removed to produce more robust texts in keeping with recognized good SGML practice. This does not reduce in any way the expressive power of the language.

This International Standard makes a clear and important distinction between conforming systems and validating systems. A conforming system operates correctly when handling documents which conform to this International Standard, but is not required to operate correctly when the documents do not conform. A validating system is more powerful: it detects all SGML and HTML errors in a document, and must be able to certify that a document is valid ISO-HTML. Browsers are frequently conforming systems, whereas authoring tools should check for validity. Authoring tools which issue broken, non-conforming pages are a major cause of the low quality of many sites.

NOTE: A conforming system is not sufficient to validate an ISO-HTML document. A validating system is required.

This International Standard does not define error handling procedures for user agents: it emphasises validation at source rather than error handling at the destination.

A minimal ISO-HTML document has the form:

<!DOCTYPE HTML PUBLIC "ISO/IEC 15445:2000//DTD HTML//EN">
<HTML>
<HEAD>
<TITLE>Les unit&eacute;s de base</TITLE>

... other head elements ...

</HEAD>
<BODY>
<P>La seconde...

... remainder of document body ...

</BODY>
</HTML>

This User's Guide follows the convention of presenting element and attribute names in upper case, although there is no formal requirement for the practice.

NOTE: ISO-HTML is an application of SGML and the SGML declaration used calls for upper case folding of all names except entity names. (XHTML™ is an application of XML™ and has an SGML declaration which does not call for upper case folding, ie. XHTML names are case sensitive whereas names in ISO-HTML and the W3C Recommendation for HTML 4.01 are not.)
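The effect of the folding rule can be illustrated with a small Python sketch. It is not part of the International Standard: fold_name is a hypothetical helper that stands in for the folding the SGML declaration asks for, and the entity table is the HTML entity list shipped with Python's html.entities module.

```python
from html.entities import name2codepoint


def fold_name(name: str) -> str:
    """Fold an element or attribute name to upper case (hypothetical helper)."""
    return name.upper()


# Element and attribute names compare equal once folded.
assert fold_name("title") == fold_name("TITLE") == "TITLE"

# Entity names are not folded, so the case of the name selects the character.
lower = chr(name2codepoint["eacute"])  # small letter e with acute
upper = chr(name2codepoint["Eacute"])  # capital letter E with acute
print(lower, upper)
```

In other words, <title> and <TITLE> name the same element type, but &eacute; and &Eacute; are two different entities.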

In order to support world wide use of the markup language, the internationalization facilities specified by the IETF in RFC2070 have been included in the International Standard. It is recognised that full compliance to RFC2070 will be progressive and the conformance clause allows for progressive compliance to the use of ISO 10646.

References to the W3C Recommendation

To facilitate the use of this User's Guide, frequent references are provided to the W3C Recommendation for HTML 4.01. These take two forms: hyperlinks to the W3C's electronic sources, and clause number references to the W3C's printed version of their specification in the style [W3C 12.3].

Related work

The World Wide Web Consortium have prepared a Recommendation for XHTML™ which recasts the W3C Recommendation for HTML 4.01 as an application of XML.

1   Scope

The scope of this International Standard is a conforming application of ISO 8879, SGML. This International Standard describes the way in which the HTML language specified by the following clauses in the W3C Recommendation for HTML 4.01 shall be used, and does so by identifying all the differences between the HTML language specified by the W3C Recommendation for HTML 4.01 and the HTML language defined by this International Standard:

The scope excludes any material in the W3C Recommendation for HTML 4.01 not listed in this clause. It also excludes any standardization of models, services, systems, protocols or applications which are likely to make use of the ISO-HTML language. ISO-HTML does not define the "look and feel" of any conforming product, and provides only sufficient semantics to allow a reader who is familiar with the W3C Recommendation for HTML 4.01 to have an intuitive idea of the requirement.

2   Conformance

This International Standard distinguishes between conforming documents, validating systems, conforming systems and character set conformance.

The distinction between validating systems and conforming systems is very important.

A validating system is one that is able to verify that the document it is processing contains correct HTML. If the document is correct, the validator certifies it as such; if not, the validator identifies the errors. The notion of validation is currently poorly defined on the World Wide Web and many authors assume wrongly that their browser may be used to check out the pages they write.

I tried it with my browser and it worked!

is an all too common mistake, and is the source of many errors and broken pages.

ISO-HTML insists that a validator requires an SGML parser since ISO-HTML makes full use of the underlying SGML language. Conforming systems do not require an SGML parser since they merely promise to operate correctly provided that the documents they process are already validated as conforming to ISO-HTML.

NOTE: It is possible for a system that is simply "conforming" to identify many errors in an invalid document, and notification of such errors could be of value to a user, but it is not "validating" unless it can detect all errors.

2.1   Conforming documents

A document which conforms to this International Standard shall

  1. Be a conforming HTML document consisting of a required document type declaration, followed by a single document instance, contained within an <HTML> [W3C 7.3] document element. The document type declaration may be surrounded by white space consisting of RS, RE, SPACE, TAB and HTML comments. The document instance may also be followed by such white space.
  2. Meet the requirements of this International Standard.

In other words, to be a conforming document, documents are required to have the following structure:

  1. Optional white space.

    "White space" is the term used by programmers for the characters between tokens, even if the style sheet makes them appear in some other colour. We will use the common term.

  2. A required document type declaration.
  3. Optional white space.
  4. A single document instance, contained in an <HTML> [W3C 7.3] document element.
  5. Optional white space.

White space consists of the SGML-defined characters RS (record start), RE (record end), SEPCHAR (tab) and SPACE [8879 9.2.1 figure 2], and ISO-HTML comments.
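The five-part structure above can be sketched as a quick pre-check in Python. This is only an illustration under stated assumptions: the function name is hypothetical, RS and RE are taken to be the line feed and carriage return characters, comments are matched in the simple "<!-- ... -->" form, and the pattern does not detect a second document instance. It is in no way a substitute for validation with an SGML parser.

```python
import re

# White space between the top-level parts: SPACE, TAB, the record start and
# record end characters (assumed here to be LF and CR), and comments.
WS = r"(?:[ \t\r\n]|<!--(?:(?!-->).)*-->)*"
DOCTYPE = r"<!DOCTYPE\s[^>]*>"
HTML_ELEMENT = r"<HTML\b.*</HTML\s*>"

OUTER_STRUCTURE = re.compile(
    rf"\A{WS}{DOCTYPE}{WS}{HTML_ELEMENT}{WS}\Z",
    re.IGNORECASE | re.DOTALL,
)


def has_required_outer_structure(text: str) -> bool:
    """Return True if text looks like: white space, document type
    declaration, white space, <HTML> element, white space."""
    return OUTER_STRUCTURE.match(text) is not None


sample = (
    "<!-- a leading comment -->\n"
    '<!DOCTYPE HTML PUBLIC "ISO/IEC 15445:2000//DTD HTML//EN">\n'
    "<HTML>\n<HEAD><TITLE>t</TITLE></HEAD>\n"
    "<BODY><P>x</P></BODY>\n</HTML>\n"
)
print(has_required_outer_structure(sample))
```

A document that passes this check may still contain many errors; the check only rejects documents whose outermost layout is already wrong, for example one with text after the </HTML> end tag.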

2.2   Validating systems

An HTML system is a validating HTML system if

  1. It is a validating SGML parser as defined by ISO 8879 subclause 15.4; and
  2. It is able to process any conforming HTML document; and
  3. It finds and reports an HTML error if one exists; and
  4. It does not report an HTML error where none exists.

The International Standard does not say how the validating system is to report the errors: whether this is "one at a time" or "all at once" is left to the implementor. The SP parser provides the -E option with which the user may specify a maximum number of error messages to be displayed. This is useful for checking pages of possibly very low quality.

NOTE: This requires more than a validating SGML parser is able to offer, nevertheless a validation by an SGML parser is an essential first step. Some of the ISO-HTML errors a validating system is required to detect cannot be detected by an SGML parser, and require further processing.

2.2.1   Documentation of validating systems

Validating systems are required by ISO-HTML to display a text identifying them clearly as validating systems.

Validating systems conforming to this International Standard shall display the following identification text prominently and in the national language of the documentation:

  1. In a prominent location in the front matter of publications (normally the title and cover pages),
  2. On identifying displays of programs (presumably the introductory page, not all pages),
  3. In promotional and training material.

The HTML validating system identification text is:

An HTML validating system conforming to International Standard ISO/IEC 15445—HyperText Markup Language, and International Standard ISO 8879—Standard Generalized Markup Language (SGML).

NOTE: The validating system identification text is copyrighted by the ISO/IEC, but may be used without further permission or further reference to the ISO/IEC.

NOTE: Neither the ISO nor the IEC provides a certification service, nor do they provide an icon to identify validating or conforming systems. The ISO and IEC icons are copyrighted and cannot be used without the permission of those organisations. The International Standard gives permission to use the identification text but not the icon.

2.3   Conforming systems

A conforming HTML system is an HTML system which is able to process all documents conforming to this standard.

The International Standard says nothing about error handling or the processing of non-conforming documents. The basic creed is that in a high quality web application, all documents are validated as conforming before publication, and that conforming documents are sent to conforming user agents to obtain correct results.

Nevertheless, a prudent implementor of a program which is just a conforming system would be wise to guard against broken HTML, perhaps maliciously fed to the program in an attempt to provoke a buffer overrun and defeat security mechanisms.

2.3.1   Documentation of conforming HTML systems

Conforming systems are documented in much the same way as validating systems. The only difference is the identifying text itself. It is important that the documentation not claim or suggest that a conforming system may be used to validate ISO-HTML documents.

Conforming systems shall display the following identification text prominently and in the national language of the documentation:

  1. In a prominent location in the front matter of publications (normally the title and cover pages),
  2. On identifying displays of programs,
  3. In promotional and training material.

The HTML conforming system identification text is:

An HTML system conforming to International Standard ISO/IEC 15445—HyperText Markup Language.

The documentation shall not claim or imply that the system may be used to validate HTML documents.

2.4   Character set conformance

The SGML declaration provided with this International Standard calls for the use of ISO/IEC 10646 Universal Multiple-Octet Coded Character Set (UCS). ISO/IEC 10646 specifies a large number of facilities from which different selections may be made to suit individual applications. ISO/IEC 10646 is potentially very large and although the described character set portion identified by the DESCSET keyword [8879 13.1.1.2] calls for the whole character set, the International Standard does not require that it is fully implemented in any user agent. As a result it is only practicable to envisage limited conformance to ISO/IEC 10646 as defined in this subclause.

ISO-HTML takes the same approach as was taken by ISO 2022, and this subclause is based on ISO 2022 clause 3.

Under limited conformance, the following is required:

  1. When the characters described by ISO/IEC 10646 are used, they shall be implemented with the meanings and coded representation specified in ISO/IEC 10646.
  2. If a server is unable to express a document using the character set supported by the user agent, it should instead deliver a document in a limited character set such as ISO/IEC 646 (often called ASCII) and explain the problem to the user agent.

    NOTE: The International Standard does not say how the problem is to be explained. This is left entirely to the implementor to decide. Neither does the International Standard discuss any negotiation that might be done, or the operation of the HTTP protocol.

  3. Code positions that are either reserved for registration or reserved for future standardization shall not be used.
  4. No registered escape sequence shall be used with a meaning other than that defined by ISO/IEC 10646.

The UTF-1 transformation format of ISO/IEC 10646, registered by IANA as ISO-10646-UTF-1, has been removed from ISO/IEC 10646 and should not be used.
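The fallback described in item 2 above might look like the following Python sketch. The International Standard leaves the mechanism to the implementor, so everything here is an assumption made for illustration: the function name encode_for_agent is hypothetical, numeric character references are just one possible way of keeping the text readable in a limited character set, and the step of explaining the problem to the user agent is not shown.

```python
from typing import Tuple


def encode_for_agent(text: str, agent_charset: str) -> Tuple[bytes, str]:
    """Encode text in the charset the user agent supports, falling back
    to ASCII when that is not possible (hypothetical policy)."""
    try:
        return text.encode(agent_charset), agent_charset
    except (UnicodeEncodeError, LookupError):
        # Fallback: plain ASCII, with each character outside ASCII written
        # as a numeric character reference such as &#233;
        return text.encode("ascii", errors="xmlcharrefreplace"), "us-ascii"


print(encode_for_agent("unit\u00e9s", "iso-8859-1"))
print(encode_for_agent("unit\u00e9s", "ascii"))
```

The second element of the returned pair is the charset label a server could send along with the document.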

3   Normative references

The following normative documents contain provisions which, through reference in this text, constitute provisions of this International Standard. For dated references, subsequent amendments to, or revisions of, any of these publications do not apply. However, parties to agreements based on this International Standard are encouraged to investigate the possibility of applying the most recent editions of the normative documents indicated below. For undated references, the latest edition of the normative document referred to applies. Members of IEC and ISO maintain registers of currently valid International Standards.

NOTE: In an ISO/IEC specification, a normative reference has the effect of including all the provisions of the referenced text into the referencing text. The W3C Recommendation itself contains normative references, but it is implicit that the effect is not one of "total normative inclusion". The W3C normative references appear to be closer in spirit to ISO/IEC informative references defining good practice, and we recommend that they should be treated as such.

This International Standard refers normatively to:

4   Definitions

For the purposes of this International Standard, the definitions given in ISO 8879:1986 and the following definitions apply:

4.1   Browser
A user agent whose main function is to present documents to a user.
4.2   Character
(Source: RFC1866) An atom of information, for example a letter or a digit. Graphic characters have associated glyphs, whereas control characters have associated processing semantics.
4.3   Character encoding scheme
(Source: RFC1866) A function whose domain is the set of sequences of octets, and whose range is the set of sequences of characters from a character repertoire; that is, a sequence of octets and a character encoding scheme determining a sequence of characters.
4.4   Character repertoire
(Source: RFC1866) A finite set of characters; eg. the range of a coded character set.
4.5   Code position
(Source: RFC1866) An integer in the domain of a coded character set. The coded character set maps the code position to a character.
4.6   Coded character set
(Source: RFC1866) A function whose domain is a subset of the integers and whose range is a character repertoire; that is, for some set of integers (usually of the form {0, 1, 2, ..., N-1}), a coded character set and an integer in that set determine a character. Conversely, a character and a coded character set determine the character's code position (or, in rare cases, a few code positions).
4.7   CRLF
(Source: RFC1521) The sequence of the two ISO/IEC 646:1991 characters CR (13) and LF (10) which, taken together, in this order, denote a line break.
4.8   Form data set
(Source: RFC1866) A sequence of name/value pairs; the names given by an HTML document and the values given by the user.
4.9   Fragment identifier
(Source: RFC1866) The portion of an HREF attribute value following the `#' character.
4.10   HTML browser
A browser which presents HTML documents.
4.11   HTML document
A document structured in accordance with this International Standard.
4.12   Hyperlink
A relationship between two anchors, called the source and the target. The link goes from the source to the target. The source is also known as the tail, and the target is also known as the destination or head.
4.13   User Agent (in the World Wide Web)
A software or hardware device which accepts user input and presents to the user the World Wide Web's interpretation of that input.

All the definitions of SGML are incorporated into ISO-HTML.

The multiple definitions and techniques for the representation of characters may be the source of confusion. The following figure shows some of the ideas involved. It is based on the character set defined by ISO 8859-1:1987 "8-bit single-byte coded graphic character sets", Part 1: Latin alphabet No. 1.

[Figure: Relationship of character name to glyph, numerical value and ISOlat1 entity]

Figure 1: Illustration of some character representation definitions.
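The same relationships can be explored with a short Python sketch. It is an illustration only: describe is a hypothetical helper, the character names come from Python's unicodedata module (that is, from the Unicode character database rather than from ISO 8859-1 itself), and the entity names come from the HTML entity list in html.entities.

```python
import unicodedata
from html.entities import codepoint2name


def describe(ch: str) -> dict:
    """Collect several representations of one Latin-1 character."""
    code = ord(ch)
    return {
        "name": unicodedata.name(ch),              # character name
        "glyph": ch,                               # what the reader sees
        "code_position": code,                     # integer in the coded character set
        "latin1_octet": ch.encode("iso-8859-1"),   # octet under the encoding scheme
        "entity": "&" + codepoint2name[code] + ";",  # entity reference
    }


for key, value in describe("\u00e9").items():
    print(key, "=", repr(value))
```

For the small letter e with acute this gives code position 233, the single octet E9 (hexadecimal) in ISO 8859-1, and the entity reference &eacute; used in the minimal document shown in the Introduction.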

5   Symbols and abbreviated terms

The following symbols and abbreviated terms are used in this International Standard:

5.1   HTML
HyperText Markup Language. Pertaining to this standard.
5.2   HTML 4.0
A Recommendation for the HyperText Markup Language developed by the World Wide Web Consortium.
5.3   HTTP
IETF RFC2068 Hypertext Transfer Protocol.
5.4   IANA
Internet Assigned Numbers Authority.
5.5   IETF
Internet Engineering Task Force.
5.6   RFC
Request for Comments. An Internet Engineering Task Force specification.
5.7   SGML
Standard Generalized Markup Language. Notation provided through use of ISO 8879.
5.8   URI
Uniform Resource Identifier as defined by RFC2396.
5.9   URL
Uniform Resource Locator as defined by RFC2396.
5.10   WWW
World Wide Web.
5.11   W3C
World Wide Web Consortium, founded in 1994 to develop common standards for the evolution of the World Wide Web. It is an industry consortium, hosted by the Massachusetts Institute of Technology Laboratory for Computer Science (MIT/LCS) in the United States, the Institut National de Recherche en Informatique et en Automatique (Inria) in Europe and the Keio University Shonan Fujisawa Campus in Asia.

6   Requirements

This International Standard has been designed to satisfy the following requirements:

The International Standard states the requirements it had to meet in terms of the relationship between ISO-HTML and SGML and the need for ISO-HTML to be viewable with browsers which conform to the W3C Recommendation for HTML 4.01.

The underlying requirements were to:

  1. Provide a stable core for the W3C Recommendation.
  2. Distinguish between "conforming" and "validating" systems, and provide a basis for contracts which make reference to HTML. For example an organisation which receives HTML documentation from a subcontractor might wish to establish formal acceptance criteria as part of an ISO 9000 based quality plan. The acceptance criteria could be based on an ISO-HTML validator as defined by the International Standard.
  3. Reinforce the W3C text. The W3C Recommendation for HTML 4.01 sometimes "recommends" or "deprecates" practices which we believe should be required or forbidden, but the requirement cannot be made normative by the W3C because of the need for backward compatibility. There is no backward compatibility requirement for ISO-HTML and such requirements can be made normative. An example of such a practice is the required structuring of sections and subsections.
  4. Encourage good SGML practice, such as the use of the id attribute rather than the name attribute. This allows an SGML parser to check that the value is unique.
  5. Facilitate the use of HTML in situations, such as government procurement, where the use of ISO/IEC International Standards is required.
  6. Reinforce the practice of separating content and style by excluding all elements such as <FONT> [W3C 15.2.2] and attributes such as BGCOLOR which provide style rather than structure.
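The uniqueness check mentioned in requirement 4 is normally done by the SGML parser itself. Purely as an illustration of what is being checked, the following Python sketch collects id attribute values with the standard html.parser module and reports any value used more than once. The names IdCollector and duplicate_ids are hypothetical, values are compared exactly as written (an assumption), and html.parser is not an SGML parser, so this is not a validation.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Record the value of every ID attribute seen on a start tag."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports attribute names in lower case.
        for name, value in attrs:
            if name == "id" and value is not None:
                self.ids.append(value)


def duplicate_ids(text: str) -> list:
    """Return the sorted list of ID values that occur more than once."""
    collector = IdCollector()
    collector.feed(text)
    collector.close()
    return sorted(v for v, n in Counter(collector.ids).items() if n > 1)


print(duplicate_ids('<P ID="intro">a</P><P ID="scope">b</P><P ID="intro">c</P>'))
```

No such check is possible for the name attribute, since its declared value does not tell the parser that the value must be unique.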

7   Use of the referenced text

Throughout this User's Guide, references to the printed version of the referenced text are given in the abbreviated style [W3C 12.3].

The set of element types provided by this International Standard is a subset of the set of element [type]s defined by the W3C Recommendation for HTML 4.01. The set of attributes provided for each element type included in this International Standard is a subset of the corresponding set defined by the W3C Recommendation for HTML 4.01. The set of element types and the sets of attributes are defined by the DTD provided with this International Standard.

Where refinements are defined for element types and attributes, the semantics are a subset of the semantics defined by the W3C Recommendation for HTML 4.01 in the sense that the set of documents conforming to this International Standard is a subset of those conforming to the W3C Recommendation for HTML 4.01.

NOTE: For clarity, and as required by ISO 8879, this International Standard makes a distinction between an individual element with a given generic identifier and the class of all such elements. The class is called an element type, the instance is called an element and the generic identifier is called an element type name.

ISO 8879 distinguishes between element type [8879 11.2.1] and element [8879 7.3], which is an instance of the type, whereas the W3C Recommendation for HTML 4.01 uses the term "element" for both element and element type. This guide follows the ISO practice, and when quoting from the W3C Recommendation for HTML 4.01 inserts the missing word in square brackets when it is needed, eg. [type].

7.1   Element [type]s defined by the W3C Recommendation

While the syntax of ISO-HTML is defined by the DTD provided by the International Standard, the semantics of the following element types are defined normatively in the W3C Recommendation for HTML 4.01:

  1. <ABBR> [W3C 9.2.1]—Abbreviation
  2. <ACRONYM> [W3C 9.2.1]—Acronym
  3. <B> [W3C 15.2.1]—Bold character style
  4. <BDO> [W3C 8.2.4]—Bidirectional override
  5. <BR> [W3C 9.3.2]—Line break
  6. <CAPTION> [W3C 11.2.2]—Table caption
  7. <CITE> [W3C 9.2.1]—Citation
  8. <CODE> [W3C 9.2.1]—Program code
  9. <DD> [W3C 10.3]—Definition data
  10. <DEL> [W3C 9.4]—Deleted material
  11. <DFN> [W3C 9.2.1]—Defining instance
  12. <DIV> [W3C 7.5.4]—Document division
  13. <DL> [W3C 10.3]—Definition list
  14. <DT> [W3C 10.3]—Definition term
  15. <EM> [W3C 9.2.1]—Emphasized text
  16. <FIELDSET> [W3C 17.10]—Group of form items
  17. <FORM> [W3C 17.3]—Forms
  18. <HR> [W3C 15.3]—Horizontal rule
  19. <I> [W3C 15.2.1]—Italic character style
  20. <INS> [W3C 9.4]—Inserted material
  21. <KBD> [W3C 9.2.1]—Keyboard input
  22. <LEGEND> [W3C 17.10]—Fieldset label
  23. <LI> [W3C 10.2]—List item
  24. <META> [W3C 7.4.4]—Document meta-information
  25. <OL> [W3C 10.2]—Ordered list
  26. <OPTGROUP> [W3C 17.6]—Group of user choices
  27. <OPTION> [W3C 17.6]—User choice
  28. <P> [W3C 9.3.1]—Paragraph
  29. <PARAM> [W3C 13.3.2]—Agent interface parameter
  30. <PRE> [W3C 9.3.4]—Preformatted text
  31. <SAMP> [W3C 9.2.1]—Sample output
  32. <SELECT> [W3C 17.6]—Form selection
  33. <SPAN> [W3C 7.5.4]—Generic container
  34. <STRONG> [W3C 9.2.1]—Strong emphasis
  35. <SUB> [W3C 9.2.3]—Subscript character style
  36. <SUP> [W3C 9.2.3]—Superscript character style
  37. <TEXTAREA> [W3C 17.7]—Multi-line text field
  38. <TFOOT> [W3C 11.2.3]—Table footer
  39. <THEAD> [W3C 11.2.3]—Table header
  40. <TITLE> [W3C 7.4.2]—Document title
  41. <TT> [W3C 15.2.1]—Monospaced character style
  42. <UL> [W3C 10.2]—Unordered list
  43. <VAR> [W3C 9.2.1]—Generic variable

NOTE: In case you are curious, the lettered list is the official ISO style for lists.

7.2   Element [type]s refined by ISO-HTML

The definitions of the following element types are refined by the International Standard:

  1. <A> [W3C 12.2]—Source and target anchors
  2. <ADDRESS> [W3C 7.5.6]—Author's address
  3. <AREA> [W3C 13.6.1]—Image map region
  4. <BLOCKQUOTE> [W3C 9.2.2]—Block quotation
  5. <BODY> [W3C 7.5.1]—Document body
  6. <BUTTON> [W3C 17.5]—Selectable input mechanism
  7. <COL> [W3C 11.2.4]—Table column properties
  8. <COLGROUP> [W3C 11.2.4]—Table column group properties
  9. <HEAD> [W3C 7.4.1]—Document header
  10. <HTML> [W3C 7.3]—Document instance
  11. <H1> [W3C 7.5.5]—Major section header
  12. <H2> [W3C 7.5.5]—Section header
  13. <H3> [W3C 7.5.5]—Subsection header
  14. <H4> [W3C 7.5.5]—Subsubsection header
  15. <H5> [W3C 7.5.5]—Subsubsubsection header
  16. <H6> [W3C 7.5.5]—Minor subsubsubsection header
  17. <IMG> [W3C 13.2]—Inline images
  18. <INPUT> [W3C 17.4]—User input field
  19. <LABEL> [W3C 17.9.1]—Form field label
  20. <LINK> [W3C 12.3]—Interdocument relations
  21. <MAP> [W3C 13.6.1]—Client-side image map
  22. <OBJECT> [W3C 13.3]—Simple agent
  23. <Q> [W3C 9.2.2]—Quote
  24. <STYLE> [W3C 14.2.3]—Style specification
  25. <TABLE> [W3C 11.2.1]—Tables
  26. <TBODY> [W3C 11.2.3]—Table body
  27. <TD> [W3C 11.2.6]—Table data cell
  28. <TH> [W3C 11.2.6]—Table header cell
  29. <TR> [W3C 11.2.5]—Table row

Any element type not listed in this or the preceding subclause is excluded from the International Standard.

7.3   Attributes omitted by ISO-HTML

The W3C Recommendation for HTML 4.01 provides a number of attributes that are not supported by the International Standard. They have been omitted because they are used to describe appearance rather than structure, or because the feature is considered to be still too unstable or immature for an International Standard.

  1. ALIGN—Omitted from all elements on which it occurs.
  2. ALINK—Omitted from all elements on which it occurs.
  3. ALT—Omitted from <INPUT> [W3C 17.4].
  4. ARCHIVE—Omitted from <OBJECT> [W3C 13.3].
  5. BACKGROUND—Omitted from <BODY> [W3C 7.5.1].
  6. BGCOLOR—Omitted from all elements on which it occurs.
  7. BORDER—Omitted from all elements on which it occurs.
  8. CELLPADDING—Omitted from <TABLE> [W3C 11.2.1].
  9. CELLSPACING—Omitted from <TABLE> [W3C 11.2.1].
  10. CHAR—Omitted from all elements on which it occurs.
  11. CHAROFF—Omitted from all elements on which it occurs.
  12. CLEAR—Omitted from <BR> [W3C 9.3.2].
  13. COMPACT—Omitted from all elements on which it occurs.
  14. COORDS—Omitted from <A> [W3C 12.2].
  15. FRAME—Omitted from <TABLE> [W3C 11.2.1].
  16. HEIGHT—Omitted from all elements on which it occurs.
  17. HSPACE—Omitted from all elements on which it occurs.
  18. LINK—Omitted from <BODY> [W3C 7.5.1].
  19. NAME—Omitted from <FORM> [W3C 17.3].
  20. NAME—Omitted from <IMG> [W3C 13.2].
  21. NOSHADE—Omitted from <HR> [W3C 15.3].
  22. NOWRAP—Omitted from <TD> [W3C 11.2.6] and <TH> [W3C 11.2.6].
  23. ONBLUR—Omitted from all elements on which it occurs.
  24. ONCHANGE—Omitted from all elements on which it occurs.
  25. ONCLICK—Omitted from all elements on which it occurs.
  26. ONDBLCLICK—Omitted from all elements on which it occurs.
  27. ONFOCUS—Omitted from all elements on which it occurs.
  28. ONKEYDOWN—Omitted from all elements on which it occurs.
  29. ONKEYPRESS—Omitted from all elements on which it occurs.
  30. ONKEYUP—Omitted from all elements on which it occurs.
  31. ONLOAD—Omitted from all elements on which it occurs.
  32. ONMOUSEDOWN—Omitted from all elements on which it occurs.
  33. ONMOUSEMOVE—Omitted from all elements on which it occurs.
  34. ONMOUSEOUT—Omitted from all elements on which it occurs.
  35. ONMOUSEOVER—Omitted from all elements on which it occurs.
  36. ONMOUSEUP—Omitted from all elements on which it occurs.
  37. ONRESET—Omitted from all elements on which it occurs.
  38. ONSELECT—Omitted from all elements on which it occurs.
  39. ONSUBMIT—Omitted from all elements on which it occurs.
  40. ONUNLOAD—Omitted from all elements on which it occurs.
  41. RULES—Omitted from <TABLE> [W3C 11.2.1].
  42. SHAPE—Omitted from <A> [W3C 12.2].
  43. SIZE—Omitted from <HR> [W3C 15.3].
  44. SRC—Omitted from <INPUT> [W3C 17.4].
  45. START—Omitted from <OL> [W3C 10.2].
  46. STYLE—Omitted from all elements on which it occurs.
  47. TARGET—Omitted from all elements on which it occurs.
  48. TEXT—Omitted from <BODY> [W3C 7.5.1].
  49. TYPE—Omitted from <LI> [W3C 10.2], <OL> [W3C 10.2] and <UL> [W3C 10.2].
  50. USEMAP—Omitted from <INPUT> [W3C 17.4].
  51. VALIGN—Omitted from all elements on which it occurs.
  52. VALUE—Omitted from <LI> [W3C 10.2].
  53. VERSION—Omitted from <HTML> [W3C 7.3].
  54. VLINK—Omitted from all elements on which it occurs.
  55. VSPACE—Omitted from all elements on which it occurs.
  56. WIDTH—Omitted from all elements on which it occurs.

8   General provisions

This clause in the International Standard covers matters that are not associated with a particular element.

8.1   Byte order

When an HTML text is transmitted in one of the multibyte forms UCS-2 or UCS-4, this International Standard follows RFC2070 and recommends:

  1. That it be transmitted in big-endian byte order—high order byte first.
  2. That the transmitted document always begin with a ZERO-WIDTH NON-BREAKING SPACE character (hexadecimal FEFF or 0000FEFF) which, when byte-reversed becomes FFFE or FFFE0000, a character guaranteed never to be assigned. Thus a user agent receiving an FFFE as the first two octets of a text would know that bytes have to be reversed for the remainder of the text.
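
The byte-order test in the second recommendation can be sketched in Python. This is a minimal illustration, not part of the International Standard: the function names `detect_ucs2_byte_order` and `decode_ucs2` are hypothetical, only the two-octet UCS-2 case is handled, and the text is assumed to contain no characters outside the range that UCS-2 can represent.

```python
def detect_ucs2_byte_order(data: bytes) -> str:
    """Return 'big', 'little' or 'unknown' from the first two octets."""
    if data[:2] == b"\xfe\xff":
        return "big"      # FEFF arrived intact: the recommended order
    if data[:2] == b"\xff\xfe":
        return "little"   # FFFE seen: every octet pair must be reversed
    return "unknown"      # no leading FEFF, so nothing can be inferred


def decode_ucs2(data: bytes) -> str:
    """Decode UCS-2 text and drop the leading FEFF marker.

    Assumption: for the characters UCS-2 can represent, Python's
    UTF-16 codecs give the same result as a UCS-2 decoder.
    """
    order = detect_ucs2_byte_order(data)
    if order == "unknown":
        raise ValueError("text does not begin with FEFF or FFFE")
    codec = "utf-16-be" if order == "big" else "utf-16-le"
    return data[2:].decode(codec)
```

A receiver would call `decode_ucs2` on the raw octets; a text sent in the recommended big-endian order and the same text sent byte-reversed decode to the same string.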

8.2   Block and text element types

The International Standard defines two classes of structure: block element types and text element types. The two classes are defined in the ISO-HTML DTD by the entities:

%block;
The block class contains the element types <BLOCKQUOTE> [W3C 9.2.2], <DIV> [W3C 7.5.4], <DL> [W3C 10.3], <FIELDSET> [W3C 17.10], <FORM> [W3C 17.3], <HR> [W3C 15.3], <OL> [W3C 10.2], <P> [W3C 9.3.1], <PRE> [W3C 9.3.4], <TABLE> [W3C 11.2.1] and <UL> [W3C 10.2].

NOTE: The %block; class corresponds to the %block; parameter entity in the W3C Recommendation for HTML 4.01, but excludes the %heading; element [type]s and the <ADDRESS> [W3C 7.5.6] element [type].

%text;
The text class contains parsed character data (PCDATA) [8879 4.228 and 11.2.4] and the subclasses physical styles, logical styles and special.

NOTE: The %text; class corresponds to the %inline; parameter entity in the W3C Recommendation for HTML 4.01, but without the %formctrl; element [type].

The subclasses are defined by entities:

%physical.styles;
The physical styles subclass contains the element types <B> [W3C 15.2.1], <I> [W3C 15.2.1], <SUB> [W3C 9.2.3], <SUP> [W3C 9.2.3] and <TT> [W3C 15.2.1].

NOTE: The physical styles are called %fontstyle; in the W3C Recommendation for HTML 4.01. ISO-HTML adds <SUB> [W3C 9.2.3] and <SUP> [W3C 9.2.3] taken from %special;, and sorts the set into alphabetical order.

%logical.styles;
The logical styles subclass contains the element types <ABBR> [W3C 9.2.1], <ACRONYM> [W3C 9.2.1], <CITE> [W3C 9.2.1], <CODE> [W3C 9.2.1], <DFN> [W3C 9.2.1], <EM> [W3C 9.2.1], <KBD> [W3C 9.2.1], <SAMP> [W3C 9.2.1], <STRONG> [W3C 9.2.1] and <VAR> [W3C 9.2.1].

NOTE: %logical.styles; are called %phrase; in the W3C Recommendation for HTML 4.01. The ISO-HTML DTD presents the elements in alphabetical order.

%special;
The special subclass contains the element types <A> [W3C 12.2], <BDO> [W3C 8.2.4], <BR> [W3C 9.3.2], <IMG> [W3C 13.2], <OBJECT> [W3C 13.3], <MAP> [W3C 13.6.1], <Q> [W3C 9.2.2] and <SPAN> [W3C 7.5.4].

NOTE: The ISO-HTML special subclass corresponds to %special; in the W3C Recommendation for HTML 4.01, but excludes <SUB> [W3C 9.2.3] and <SUP> [W3C 9.2.3] which ISO-HTML considers to be physical styles. Those that are included are in alphabetical order.

The distinction between block elements and text elements is described in detail in the W3C Recommendation for HTML 4.01.

9   Invocation

The DTD provided by this International Standard has the following formal public identifiers:

"ISO/IEC 15445:2000//DTD HyperText Markup Language//EN"
"ISO/IEC 15445:2000//DTD HTML//EN"

NOTE: The second formal public identifier is shorter, but has exactly the same meaning as the first.

9.1   Document type declaration

The DTD is typically invoked by one of the following declarations:

<!DOCTYPE HTML PUBLIC "ISO/IEC 15445:2000//DTD HyperText Markup Language//EN">
<!DOCTYPE HTML PUBLIC "ISO/IEC 15445:2000//DTD HTML//EN">

The document type declaration shall not include a document type declaration subset [8879 11.1].

NOTE: The DTD provides an optional mechanism to facilitate the production of conforming documents. The optional mechanism, which is not a part of this International Standard, allows an SGML parser to verify the correct nesting of sections and requires the use of an alternative document type declaration which is described in the User's Guide to ISO/IEC 15445. The Guide also provides descriptions of the SGML techniques used in the documentation preparation process.

9.1.1   Document type declaration for preparation of ISO-HTML

The exclusion of the document type declaration subset [8879 11.1] by the International Standard prevents the use of parameter entities [8879 B.6] in conforming documents. Parameter entities declared in the subset can be useful in documents in the same way that macros are useful in programming languages. We will explain later how to take advantage of the power of parameter entities when preparing ISO-HTML documents. This will require a modified document type which is invoked by the document type declaration:

<!DOCTYPE Pre-HTML PUBLIC 
   "-//ISO-HTML User's Guide//DTD ISO-HTML Preparation//EN" 
[<!ENTITY % Preparation "INCLUDE" >

general entity declarations...

]>

This modified document type declaration is not a part of the International Standard, but is useful in preparing documents which conform to ISO-HTML.

NOTE: The International Standard and this User's Guide were prepared from a common source marked up using the modified document type declaration.

9.2   Architectural support declaration

In order to use the HTML document type definition as a base architecture for other SGML applications, one of the following architectural support declarations should be used:

<!ENTITY % HtmlDtd PUBLIC "ISO/IEC 15445:2000//DTD HTML//EN">

<?IS10744 ArcBase HTML>

<!NOTATION HTML PUBLIC
  "-//ISO-HTML User's Guide//NOTATION HTML Architecture//EN">
<!ATTLIST #NOTATION HTML
  ArcDTD    CDATA #FIXED "%HtmlDtd" -- Meta-DTD entity --
  ArcDocF   NAME  #FIXED "HTML"     -- Document element name --
  ArcNamrA  NAME  #IMPLIED          -- Default: no renaming --
                                    -- See [HyTime A.3.4.2] --
>

or

<?IS10744
  arch name="html"
  public-id="ISO/IEC 15445:2000//DTD HyperText Markup Language//EN"
  dtd-system-id="ftp://ftp.cs.tcd.ie/isohtml/15445.dtd"
  renamer-att="HTMLnames"
  doc-elem-form="HTML"
>

NOTE: These two architectural support declarations are equivalent.

NOTE: In the first form, the attribute ArcNamrA may be defined as ArcNamrA NAME #FIXED "HTMLnames" if renaming is required [HyTime, A.3.5.2].

NOTE: The Processing instruction [8879, clause 8] based mechanisms used in the second form have not yet been published (February 2003). The International Standard forgot to give the first form.

9.3   Comments in the DTD

The International Standard makes the comments in the DTD a part of the normative text.

The comments in the DTD which use the expressions "shall" or "shall not" are normative requirements of this International Standard. Comments which use the expression "should" or "should not" are recommendations of this International Standard. Comments which use the verbs "recommend" or "deprecate" are recommendations and deprecations of this International Standard.

NOTE: DTD comments in the W3C Recommendation for HTML 4.01 are informative only.

10   Availability of the DTD

The document type definition (DTD) [8879 11.1] provided by ISO-HTML is divided into three parts which are grouped within a single file. Part 1 is a set of entity definitions required by the DTD and forms the ISO-HTML entity set. Part 2 defines the ISO-HTML element types and their content models, and Part 3 defines the attribute sets for each element type and provides additional normative refinements.

The International Standard also provides an SGML declaration [8879 13] which gives instructions to the SGML parser.

NOTE: The ISO-HTML SGML declaration is essentially the same as the SGML declaration in the W3C Recommendation for HTML 4.01.

The formal SGML definitions, i.e. the ISO-HTML DTD and the ISO-HTML SGML declaration, are part of the text of this International Standard and are protected by copyrights held by the IETF, the W3C (MIT, Inria, Keio) and the ISO/IEC. Permission to copy is granted provided the following copyright notice is included with all copies:

Permission to copy in any form is granted for use with validating and conforming systems and applications as defined in ISO/IEC 15445:2000, provided this copyright notice is included with all copies.

This provision allows you to make electronic copies of the file that contains the ISO-HTML DTD and the file that contains the SGML declaration. Make sure that the copies that you use are pristine. They should have the following 128-bit checksums, computed with the MD5 Message-Digest Algorithm specified by RFC1321 and calculated by the GNU md5sum utility for text (not binary) files:



52a4de8d16bc469f42801924384d84fa  15445.dcl
cb098831761d5d7458084d6076c2d6eb  15445.dtd
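
A copy can be checked against these values with a few lines of Python. The sketch below is an illustration only: the names `EXPECTED_MD5`, `md5_of_file` and `is_pristine` are hypothetical, and it reads the raw bytes of the file, which is assumed to match the result of md5sum on systems where text and binary modes do not differ.

```python
import hashlib

# Checksums quoted in this clause of the guide.
EXPECTED_MD5 = {
    "15445.dcl": "52a4de8d16bc469f42801924384d84fa",
    "15445.dtd": "cb098831761d5d7458084d6076c2d6eb",
}


def md5_of_file(path: str) -> str:
    """Return the MD5 digest of a file as 32 hexadecimal digits."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large files need not fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_pristine(path: str, name: str) -> bool:
    """Compare the file at `path` with the checksum listed for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `is_pristine("15445.dtd", "15445.dtd")` would be expected to return True for an unmodified copy in the current directory.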

 

NOTE: The OASIS catalogue fragment described in this User's Guide is not a part of the International Standard. It may be copied without payment under the terms of the GNU General Public License.

NOTE: The checksums which appear in this clause are an example of automatically computed text in an ISO-HTML page. The technique is described in chapter Document preparation.

11   Style

This International Standard requires a complete separation of style and content.

The International Standard is based on the well established principle that it is good document design to separate the content of a document from the intended style in which it is to be presented to a reader. This facilitates the reprocessing of documents in ways that were not envisaged when they were created, and thus protects the content owners' long term investment in documents.

A <STYLE> [W3C 14.2.3] element may be used in the head of a document as a container for a style sheet. The style sheet language is not defined by this International Standard.

Although the International Standard does not specify a style sheet language, this User's Guide recommends that authors of ISO-HTML documents use Cascading Style Sheets as specified by the World Wide Web Consortium.

Wherever this International Standard describes a possible presentation, e.g. as a button, the styling information is intended to help the reader understand the semantics of the element or attribute. It is not intended as a normative style requirement.

12   Comments in HTML

All comments in HTML document instances shall appear in comment declarations. There shall be exactly one comment per comment declaration.

SGML differentiates between a comment [8879 10.3] which appears between pairs of double hyphens:

-- This is a comment --

and a comment declaration [8879 10.3] which has the form

<!--comment--   --comment--  --comment--  >

Notice that a comment may be followed by whitespace. The degenerate case

<!>

is allowed by SGML. A common beginner's mistake is to place multiple hyphens in a comment for decorative purposes:

<!----------------------------------------------------
    Joe: have the Whizz-Bang lawyers check this out:
  ---------------------------------------------------->

This example is not valid SGML and it is not valid ISO-HTML, since the additional hyphens are not present in multiples of four.

Validating systems should find an SGML error in such invalid examples (the characters Joe: have the Whizz-Bang lawyers check this out: should not appear in whitespace). We leave you to count the hyphens and appreciate that you should not write -- within a comment.

The International Standard requires that all comments in ISO-HTML documents appear in comment declarations [8879 10.3]. There shall be one and only one comment per comment declaration. For example:

<!-- This is a single comment 
     in a comment declaration. -->

The intention of this provision is to facilitate the use of popular user agents which are unable to parse SGML and which cannot handle comments outside comment declarations.

The International Standard allows white space following the comment, so an author could write:

<!-- This is a single comment 
     followed by white space. --
>
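
The one-comment rule can be checked mechanically. The following Python sketch is an illustration under stated assumptions rather than a validating SGML parser: the function name `is_single_comment_declaration` is hypothetical, the argument is assumed to be one complete declaration from `<!` to `>`, and Python's notion of white space stands in for the SGML separator characters.

```python
def is_single_comment_declaration(decl: str) -> bool:
    """True if `decl` is a comment declaration holding exactly one comment."""
    # The declaration must open with "<!" immediately followed by the
    # "--" that begins the comment.
    if not decl.startswith("<!--"):
        return False
    # The next "--" ends the comment, so the comment text itself
    # can never contain "--".
    close = decl.find("--", 4)
    if close == -1:
        return False
    # After the comment only white space may precede the closing ">".
    tail = decl[close + 2:]
    return tail.endswith(">") and tail[:-1].strip() == ""
```

The check rejects the degenerate `<!>`, which contains no comment at all, as well as declarations decorated with runs of hyphens, since the first `--` after the opening always closes the comment.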

13   Refinement of element types

The following subchapters describe the refinements that the International Standard makes to those element [type]s defined by the W3C Recommendation for HTML 4.01 which are included in ISO-HTML.

13.1   The A element type—Source and target anchors

The attributes of the <A> [W3C 12.2] element are restricted to:

The International Standard recommends that authors of ISO-HTML documents use both the ID attribute and the NAME attribute. If both are used, then they shall be given identical values since this allows an SGML parser to verify that the values for different anchors are distinct.

13.2   The ADDRESS element type—Author's address

The <ADDRESS> [W3C 7.5.6] element indicates the author or originator of a document or major part of a document. The International Standard discourages its use for general markup by requiring that it appear only in the content of the elements: <BLOCKQUOTE> [W3C 9.2.2], <BODY> [W3C 7.5.1], <DIV> [W3C 7.5.4], <FIELDSET> [W3C 17.10], <FORM> [W3C 17.3] and <OBJECT> [W3C 13.3].

The <ADDRESS> [W3C 7.5.6] element should not be used to mark up, for example, a list of addresses of the members of a club.

13.3   The AREA element type—Image map region

ISO-HTML restricts the attributes of the <AREA> [W3C 13.6.1] element to:

The International Standard requires that a value be provided for the ALT attribute, and that one of HREF or NOHREF be specified.

13.4   The BLOCKQUOTE element type—Block quotation

ISO-HTML strengthens a recommendation in the W3C Recommendation for HTML 4.01 by insisting that the contents of the <BLOCKQUOTE> [W3C 9.2.2] element be specified without surrounding quotation marks. These may be added by a user agent through the use of a style sheet.

NOTE: Authors have recognized that popular browsers often present the <BLOCKQUOTE> [W3C 9.2.2] contents indented left and right, and they have misused the element to obtain this formatting effect for text which was not a block quotation. True block quotations were marked up with quotation marks such as ". The W3C try to provide backward compatibility in the W3C Recommendation for HTML 4.01 and this prevents them requiring the omission of quotation marks. ISO-HTML does not have a backward compatibility requirement, and can insist on quotation mark omission.

13.4.1   Example

This example quotes from article 129C of the European Union Treaty. Here is the markup:

<BLOCKQUOTE
   LANG=fr
   TITLE="Trait&eacute; sur l'Union Europ&eacute;enne, Article 129 C.">
<p>
Afin de r&eacute;aliser les objectifs vis&eacute;s &agrave; l'article
129B, la Communaut&eacute; :
<p>
met en oeuvre toute action qui peut s'av&eacute;rer n&eacute;cessaire
pour assurer l'interoperabilit&eacute; des r&eacute;seaux, en
particulier dans le domaine de l'harmonisation des
normes techniques ;
</BLOCKQUOTE>

The quotation contains two paragraphs which begin with <p> start tags. Note that the end tags </p> have been omitted. This is allowed in ISO-HTML's SGML-based markup by the omitted tag minimization [8879 11.2.2] specified in the DTD:

<!ELEMENT P  - O  (%text;)+ >

The "O" says that end tags may be omitted. In the World Wide Web Consortium's Recommendation for XHTML™, which is an application of XML, such end tag omission is not allowed and the two end tags </p> would have to be provided. XML has disallowed all tag omission.

Here is a possible rendering of the quotation:

<< Afin de réaliser les objectifs visés à l'article 129B, 
   la Communauté :

   met en oeuvre toute action qui peut s'avérer nécessaire
   pour assurer l'interoperabilité des réseaux, en
   particulier dans le domaine de l'harmonisation des
   normes techniques ; >>

Although there is no requirement in SGML or ISO-HTML to place the value of the TITLE attribute on a single line, we encourage authors to do this, to facilitate the use of popular browsers while they move towards fuller conformance.

13.5   The BODY element type—Document body

The start tag is required but the end tag is optional. We recommend that authors include the end tag if the document is to be the subject of further processing.

13.5.1   Preparation

In order to facilitate the preparation of conforming ISO-HTML documents, the User's Guide provides a stricter definition for the content model of the <BODY> [W3C 7.5.1] element.

<!ELEMENT BODY  - O  ((%block;)*,(H1,DIV1)*)  +(DEL|INS) >

This content model makes use of the element <DIV1>, which is not a part of ISO-HTML, to enforce strictly progressive nesting of sections. The <DIV1> tags generated during the preparation process will be removed after the document has been validated as conforming to the strict nesting requirement.

NOTE: Authors are not required to place <DIV1> tags in documents; they are deduced automatically by the SGML parser.

13.6   The BUTTON element type—Selectable input mechanism

The International Standard requires that the <BUTTON> [W3C 17.5] element not contain the <A> [W3C 12.2], <BUTTON> [W3C 17.5], <FIELDSET> [W3C 17.10], <FORM> [W3C 17.3], <INPUT> [W3C 17.4], <LABEL> [W3C 17.9.1], <SELECT> [W3C 17.6] or <TEXTAREA> [W3C 17.7] elements. If the <BUTTON> [W3C 17.5] element contains an <IMG> [W3C 13.2] element, the International Standard requires that the <IMG> [W3C 13.2] not have an ISMAP or USEMAP attribute.

The attributes of the <BUTTON> [W3C 17.5] element are restricted to:

ISO-HTML requires that the TYPE attribute be provided, and when the TYPE is specified as submit, the NAME and VALUE attributes shall be provided.
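
The requirement on TYPE, NAME and VALUE can be expressed as a small check. This Python sketch is illustrative only: the function name `button_attribute_errors` is hypothetical, attribute names and the value of TYPE are assumed to be case-insensitive, and the check does not test whether the attributes belong to the restricted set.

```python
def button_attribute_errors(attrs: dict) -> list:
    """List the violations for the attributes of one BUTTON start tag.

    `attrs` maps attribute names to their values as written by the author.
    """
    present = {name.upper(): value for name, value in attrs.items()}
    errors = []
    if "TYPE" not in present:
        errors.append("TYPE is required")
    elif present["TYPE"].lower() == "submit":
        # A submit button must carry both halves of the control.
        for needed in ("NAME", "VALUE"):
            if needed not in present:
                errors.append(needed + " is required when TYPE is submit")
    return errors
```

An empty list means that the start tag satisfies this particular requirement.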

13.7   The COL element type—Table column properties

The International Standard restricts the attributes of the <COL> [W3C 11.2.4] element to:

13.8   The COLGROUP element type—Table column group properties

The International Standard restricts the attributes of the <COLGROUP> [W3C 11.2.4] element to:

The SPAN attribute should only be used if the <COLGROUP> [W3C 11.2.4] element has no content.

13.9   The HEAD element type—Document header

The header of a document provides information about the document rather than the content of the document. Such meta-information is potentially very important for libraries and applications based on large document collections. We recommend that authors give careful attention to their document headers as part of the overall architecture and design of their applications.

The start tag of the <HEAD> [W3C 7.4.1] element is required by ISO-HTML and shall not be omitted.

Scripting is not yet considered to be sufficiently stable and mature to be included in an International Standard, so the <HEAD> [W3C 7.4.1] element content model does not include the <SCRIPT> [W3C 18.2.1] element.

13.10   The HTML element type—Document instance

In SGML vocabulary, the element which contains the document instance is known as the document element [8879 4.99 and 7.2]. Many historic HTML documents omitted the document element tags, and the W3C Recommendation for HTML 4.01, in an effort at backward compatibility, continues to allow omission of the document element start and end tags. ISO-HTML has no backward compatibility requirement, and requires that both the start and end tags of the <HTML> [W3C 7.3] element be present. They shall not be omitted.

13.10.1   Preparation

This User's Guide provides a specification for an "HTML in preparation" document which facilitates validation. Since the preparation documents are technically not ISO-HTML, their document element is changed to <Pre-HTML> to avoid any possible confusion.

13.11   The H1 element type—Major section header

13.11.1   Introduction

The structural elements BODY, H1, P, ... were invented in the late 60s and have re-appeared in many SGML-based markup languages since. An historic example is the general document DTD (GDOC) [SGML Annex E.1]. The notion of sectioning that the elements provide is most clear in the industrial strength DocBook DTD where a chapter corresponds to the BODY of an HTML page. In DocBook, a typical chapter is

<chapter><title>My Chapter</title>
<para> ... </para>
<sect1><title>First section</title>
<para> ... </para>
<example> ... </example>
</sect1>
</chapter>

There are three ideas here:

A document designer, when creating a DTD, needs to have at least two elements which represent these three ideas in order to fully structure the document. DocBook has chosen elements to represent the nested section and the text of the title. HTML has only one element, which represents the text of the title.

The following table shows the correspondence between DocBook, GDOC, HTML and Pre-HTML:

Table 1: Comparison of sectioning in DocBook, GDOC, HTML and Pre-HTML

DocBook   GDOC            HTML          Pre-HTML
chapter   h0              missing       missing
sect1     h1              missing       DIV1
sect2     h2              missing       DIV2
para      p               P             P
title     h0t, ..., h3t   H1, ..., H6   H1, ..., H6

HTML appears to put H1, ..., H6 in the "wrong" place, confusing the text of a title with the beginning of a new nested section.

ISO-HTML considers that the H1, H2,... of HTML still identify sections even though they contain only the section title. The "H1 section" exists up to the next H1 or the end of the body.

The <DIV> [W3C 7.5.4] element in HTML does not have the same nested section semantics as DocBook's sectn. This is why ISO-HTML, which is very strict about document structuring, does not allow <DIV> [W3C 7.5.4] to be intermixed with nested sections.

13.11.2   Nesting of sections

ISO-HTML takes a very strict view of the nesting of sections. Sections are considered to be important building blocks in documents, and maintaining the integrity of their relationships is considered vital. ISO-HTML considers that the <H1> [W3C 7.5.5] element specifies the beginning of a major section of a document and contains the title of that major section. In the past, many authors have used section header elements only for their appearance, typically giving the author a set of larger fonts with a visual browser. The W3C offer the following light deprecation of this usage:

Some people consider skipping heading levels to be bad practice

but accept headings in any order, in an effort to promote backward compatibility.

ISO-HTML considers that the <H1> [W3C 7.5.5] through <H6> [W3C 7.5.5] elements identify sections of increasing depth and requires that the trees formed by the containment of sections be rooted at the <H1> [W3C 7.5.5] element, and that no intermediate level be skipped.

The International Standard requires that the <H1> [W3C 7.5.5] element not be followed by an <H3> [W3C 7.5.5], <H4> [W3C 7.5.5], <H5> [W3C 7.5.5], or <H6> [W3C 7.5.5] element without an intervening <H2> [W3C 7.5.5] element. This requirement is expressed as normative text in the DTD, but cannot be specified in the DTD content models without introducing additional elements which are not a part of the language. It is possible to make the introduction of new elements entirely automatic, without them appearing in the source document, but the use of general purpose SGML tools such as sgmlnorm, which parse documents and re-issue them with all start and end tags included, poses a problem since these "normalized" documents are not valid ISO-HTML.
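
Outside an SGML parser, the nesting rule can also be checked directly on the sequence of heading levels. The Python sketch below is one possible reading of the rule, not the mechanism used by the DTD: the function name `heading_nesting_errors` is hypothetical, and the caller is assumed to supply the levels (1 to 6) of the headings in document order.

```python
def heading_nesting_errors(levels):
    """Return (position, message) pairs for headings that break the rule.

    `levels` lists heading levels in document order,
    e.g. [1, 2, 3, 2] for H1 H2 H3 H2.
    """
    errors = []
    previous = 0  # no section is open yet, so only H1 may come first
    for position, level in enumerate(levels):
        # A heading may go at most one level deeper than its predecessor;
        # returning to any shallower level is always allowed.
        if level > previous + 1:
            if previous == 0:
                message = "H%d appears before any H1" % level
            else:
                message = "H%d follows H%d without an intervening H%d" % (
                    level, previous, previous + 1)
            errors.append((position, message))
        previous = level
    return errors
```

For the sequence H1 H3 the function reports one error at position 1, while H1 H2 H3 H2 H3 H4 H1 H2 produces none.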

The attributes of the <H1> [W3C 7.5.5] element are restricted to:

13.11.3   Preparation

To make it possible for an SGML parser to validate the correct nesting of sections, this User's Guide provides an "almost ISO-HTML" document type definition which may be used to facilitate preparation of valid ISO-HTML. The document element of this "preparation ISO-HTML" has been changed from <HTML> [W3C 7.3] to <Pre-HTML> to avoid any confusion. The <Pre-HTML> DTD automatically introduces new elements required for the validation process. A simple program or a procedure based on architectural forms may be used later to remove the unwanted elements to produce valid ISO-HTML.

The ISO-HTML DTD may be switched to the <Pre-HTML> DTD through use of the Preparation parameter entity. If Preparation has the value INCLUDE, the alternate definition of the <H1> [W3C 7.5.5] element requires the correct nesting of headings and sections:

<!ELEMENT H1    - -     (%text;)+ >
<!ELEMENT DIV1  O O     ((%block;)*,(H2,DIV2)*) >

For further details, see SGML engineering.

The recommended way of specifying that the <Pre-HTML> DTD is to be used is by preceding the document instance with the ISO-HTML preparation document type declaration. This has the effect of setting the Preparation parameter entity to the value INCLUDE.

The <DIV1> through <DIV6> elements are for internal use only within the DTD and are not a part of the language. They shall not appear in any ISO-HTML document or associated style sheet.
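
As an illustration of the kind of simple program that could remove the unwanted elements from a normalized preparation document, here is a Python sketch. It is not the tool used to produce this guide: the function name `strip_preparation_divs` is hypothetical, the generated tags are assumed to carry no attributes, and the sketch neither renames the <Pre-HTML> document element nor rewrites the document type declaration.

```python
import re

# Matches the start and end tags of DIV1 through DIV6 in any letter case.
# Assumption: the tags were generated by a normalizer and have no attributes.
_DIVN_TAG = re.compile(r"</?DIV[1-6]\s*>", re.IGNORECASE)


def strip_preparation_divs(markup: str) -> str:
    """Remove DIV1..DIV6 tags, leaving their content and plain DIV untouched."""
    return _DIVN_TAG.sub("", markup)
```

Because the pattern works on text rather than on a parsed document, it would also alter matching strings inside comments; a procedure based on architectural forms avoids that limitation.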

13.11.4   Example of structured headings

[Figure: nested boxes illustrate nested sections]

Figure 2: Progressive nesting of sections.

13.12   The H2 element type—Section header

ISO-HTML considers that the <H2> [W3C 7.5.5] element specifies the beginning of a section of a document and contains the title of that section.

The International Standard requires that the <H2> [W3C 7.5.5] element not be followed by an <H4> [W3C 7.5.5], <H5> [W3C 7.5.5], or <H6> [W3C 7.5.5] element without an intervening <H3> [W3C 7.5.5] element. An <H2> [W3C 7.5.5] element shall be preceded by an <H1> [W3C 7.5.5] element.

The attributes of the <H2> [W3C 7.5.5] element are restricted to:

13.13   The H3 element type—Subsection header

ISO-HTML considers that the <H3> [W3C 7.5.5] element specifies the beginning of a subsection of a document and contains the title of the subsection.

The <H3> [W3C 7.5.5] element shall not be followed by an <H5> [W3C 7.5.5] or <H6> [W3C 7.5.5] element without an intervening <H4> [W3C 7.5.5] element. An <H3> [W3C 7.5.5] element shall be preceded by an <H2> [W3C 7.5.5] element.

The attributes of the <H3> [W3C 7.5.5] element are restricted to:

13.14   The H4 element type—Subsubsection header

ISO-HTML considers that the <H4> [W3C 7.5.5] element specifies the beginning of a subsubsection of a document and contains the title of the subsubsection.

The <H4> [W3C 7.5.5] element shall not be followed by an <H6> [W3C 7.5.5] element without an intervening <H5> [W3C 7.5.5] element. An <H4> [W3C 7.5.5] element shall be preceded by an <H3> [W3C 7.5.5] element.

The attributes of the <H4> [W3C 7.5.5] element are restricted to:

13.15   The H5 element type—Subsubsubsection header

ISO-HTML considers that the <H5> [W3C 7.5.5] element specifies the beginning of a subsubsubsection of a document and contains the title of the subsubsubsection.

An <H5> [W3C 7.5.5] element shall be preceded by an <H4> [W3C 7.5.5] element.

The attributes of the <H5> [W3C 7.5.5] element are restricted to:

13.16   The H6 element type—Minor subsubsubsection header

ISO-HTML considers that the <H6> [W3C 7.5.5] element specifies the beginning of a minor subsubsubsection of a document and contains the title of the minor subsubsubsection.

An <H6> [W3C 7.5.5] element shall be preceded by an <H5> [W3C 7.5.5] element.
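The ordering rules in 13.12 through 13.16 can be read as a single rule: a heading may go at most one level deeper than the heading before it, and the first heading must be an H1. The following Python sketch illustrates that reading. It is not part of the International Standard; the function names are hypothetical, and the regular expression is only a rough stand-in for an SGML parser (it would, for instance, also pick up headings that appear inside comments).

```python
import re

# Matches the start tag of H1 .. H6 in either case; end tags and <HR> do not match.
_HEADING_START = re.compile(r"<H([1-6])\b", re.IGNORECASE)


def heading_levels(markup):
    """Return the heading levels found in the markup, in document order."""
    return [int(match.group(1)) for match in _HEADING_START.finditer(markup)]


def heading_order_problems(levels):
    """Return (position, level) for every heading that skips a level.

    A heading of level n is accepted only when the preceding heading has
    level n - 1 or deeper; the document is treated as starting at level 0,
    so the first heading must be an H1.
    """
    problems = []
    previous = 0
    for position, level in enumerate(levels):
        if level > previous + 1:
            problems.append((position, level))
        previous = level
    return problems
```

For example, the sequence H1, H2, H3, H2 passes, whereas H1 followed directly by H3 is reported at position 1.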

The attributes of the <H6> [W3C 7.5.5] element are restricted to:

13.17   The IMG element type—Inline images

The attributes of the <IMG> [W3C 13.2] element are restricted to:

The International Standard requires that the SRC and ALT attributes be provided. At most one of the attributes ISMAP and USEMAP may be provided.

13.18   The INPUT element type—User input field

The TYPE attribute of the <INPUT> [W3C 17.4] element discriminates between several different types of input field. The set of applicable attributes depends on the value of the TYPE attribute as specified in the following subchapters. By default the value of the TYPE attribute is "text".

The value "button" for the attribute TYPE is not available in ISO-HTML. Authors wishing to place button-like devices in documents should use the <BUTTON> [W3C 17.5] element.

For all values of the TYPE attribute, the <INPUT> [W3C 17.4] element carries the following attributes:

ISO-HTML restricts the other attributes of the <INPUT> [W3C 17.4] element to ACCEPT, ACCESSKEY, CHECKED, DISABLED, MAXLENGTH, NAME, READONLY, SIZE, TABINDEX, TYPE and VALUE. Their use depends on the value of the TYPE attribute as specified in the following subchapters.

Pairs of NAME, VALUE attributes are known as controls and are described in clause 17.2 Controls. When they are submitted for processing they are known as successful controls and described in clause 17.13.2 Successful controls in the W3C Recommendation for HTML 4.01.

For some values of attribute TYPE, the attribute TABINDEX is available: its value is a non-negative integer. An SGML number [8879 9.3] is merely a token in which the characters are restricted to digits. 14 and 00014 are not the same number/token since the character strings are not the same. ISO-HTML recommends that the number be given an integer interpretation, with leading zeroes ignored, in the manner of a programming language.
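The difference between the token and its recommended integer interpretation can be sketched in Python as follows. The function name is hypothetical, and the sketch ignores any length limit that an SGML declaration places on number tokens.

```python
import re

# An SGML number token consists of digits only.
_SGML_NUMBER = re.compile(r"[0-9]+")


def tabindex_value(token):
    """Interpret an SGML number token as an integer, ignoring leading zeroes."""
    if not _SGML_NUMBER.fullmatch(token):
        raise ValueError(f"not an SGML number: {token!r}")
    return int(token)
```

As tokens, "14" and "00014" differ; under the integer interpretation both give 14.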

13.18.1   TYPE=checkbox

An <INPUT> [W3C 17.4] element with TYPE=checkbox specifies a boolean choice. A set of <INPUT> [W3C 17.4] elements in the same <FORM> [W3C 17.3] element with the same NAME attribute value represents an n-of-many choice.

The other attribute values are as follows:

13.18.2   TYPE=file

An <INPUT> [W3C 17.4] element with TYPE=file provides a means for users to attach a file to a form's content. The <INPUT> [W3C 17.4] is typically structured within a <FIELDSET> [W3C 17.10] containing text and an associated <BUTTON> [W3C 17.5] which when selected invokes a file browser to select a file name. The file name can also be entered directly in the text field. See RFC1867 for further details.

It is important that a user agent not send any file that the user has not explicitly authorized to be sent. Thus ISO-HTML interpreting agents are expected to confirm any default file names that might be suggested. ISO-HTML requires that fields specifying files not be hidden.

The other attribute values are as follows:

13.18.3   TYPE=hidden

An <INPUT> [W3C 17.4] element with TYPE=hidden declares that a field should not be rendered—it is hidden from the user. The user does not interact with the field; instead, the VALUE attribute specifies the value of the field. The NAME and VALUE attributes are required, and are returned to the server when the form is submitted.

This input element may be used to provide state information in a form.

The other attribute values are as follows:

13.18.4   TYPE=password

An <INPUT> [W3C 17.4] element with TYPE=password specifies a single line text field into which users may type a password. As the user types, the characters are usually echoed as `*' to hide the password from prying eyes.

Application designers should note that this offers only light security protection. Although the password is masked by the browser from casual observers, it may be transmitted back to the server in clear text, and can be read by anyone with low-level access to the network. It is possible to specify encryption using the ACTION attribute of <FORM> [W3C 17.3]; however, the details are beyond the scope of the Guide.

The other attribute values are as follows:

13.18.5   TYPE=radio

An <INPUT> [W3C 17.4] element with TYPE=radio specifies a boolean choice: "on" or "off". A set of <INPUT> [W3C 17.4] elements in a <FORM> [W3C 17.3] element with the same NAME attribute value collectively represents a 1-of-many choice. Only one is "on", and all the others are "off".

The other attribute values are as follows:

ISO-HTML requires that at all times one and only one of the radio buttons in a set be checked. Initially, if none of the <INPUT> [W3C 17.4] elements in a set of radio buttons specifies CHECKED, then the user agent shall mark the first radio button of the set as checked.
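The initial state required of a user agent can be sketched in Python as follows. The function name and the representation of a radio set as a list of booleans are assumptions made for illustration; rejecting a set in which several buttons specify CHECKED is our own choice, since the text above only says that exactly one button may be checked.

```python
def initially_checked(radio_set):
    """Return the index of the radio button that is "on" when the form is first shown.

    radio_set lists, in document order, whether each INPUT element of the
    set specifies CHECKED.
    """
    if not radio_set:
        raise ValueError("a radio set needs at least one button")
    checked = [index for index, flag in enumerate(radio_set) if flag]
    if len(checked) > 1:
        raise ValueError("more than one radio button specifies CHECKED")
    # When no button specifies CHECKED, the first one is marked as checked.
    return checked[0] if checked else 0
```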

13.18.6   TYPE=reset

An <INPUT> [W3C 17.4] element with TYPE=reset specifies an input option, usually represented by a button, that instructs a user agent to reset the form's fields to their initial states.

This behaviour is also offered by the <BUTTON> [W3C 17.5] element which should be preferred.

There is an inconsistency between the behaviour of the <BUTTON> [W3C 17.5] element type with attribute TYPE=reset when contained in a <FIELDSET> [W3C 17.10], and the behaviour of the <INPUT> [W3C 17.4] element type with attribute TYPE=reset when contained in a <FIELDSET> [W3C 17.10].

In the case of <BUTTON> [W3C 17.5], the reset action is limited to the contents of the <FIELDSET> [W3C 17.10], but in the case of <INPUT> [W3C 17.4], the International Standard omits to state the limitation. See reported defect 8. We recommend that authors and application designers assume that the same limitation exists for <BUTTON> [W3C 17.5] and <INPUT> [W3C 17.4].

The other attribute values are as follows:

13.18.7   TYPE=submit

An <INPUT> [W3C 17.4] element with TYPE=submit represents an input option, typically a button, that instructs a user agent to submit the form.

This behaviour is also offered by the <BUTTON> [W3C 17.5] element which should be preferred.

The other attribute values are as follows:

13.18.8   TYPE=text

An <INPUT> [W3C 17.4] element with TYPE=text specifies a single line text field into which users may type a string.

The other attribute values are as follows:

13.19   The LABEL element type—Form field label

The International Standard requires that the <LABEL> [W3C 17.9.1] element refer to a form field in the content of the <FORM> [W3C 17.3] element which contains the <LABEL> [W3C 17.9.1].

13.20   The LINK element type

ISO-HTML restricts the attributes of the <LINK> [W3C 12.3] element to:

13.20.1   Example

In this example the current document is "Chapter2.html", and the links describe the relationships with the preceding and following chapters:

<HEAD>
 <LINK REL="Index"     HREF="../index.html">
 <LINK REL="Next"      HREF="Chapter3.html">
 <LINK REV="Previous"  HREF="Chapter3.html">
 <LINK REV="Next"      HREF="Chapter1.html">
</HEAD>

If the HREF is unchanged, changing REL to REV or vice versa requires reversing the semantics of the REL/REV attribute.
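In the example, REL="Next" and REV="Previous" with the same HREF state the same relationship. A minimal Python sketch of the swap, assuming a hypothetical table of inverse link types that covers only the pair used above:

```python
# Hypothetical table of inverse link types; only the pair used in the example.
INVERSE_LINK_TYPE = {"Next": "Previous", "Previous": "Next"}


def flip_link(attribute, link_type):
    """Return the equivalent (attribute, link type) pair for an unchanged HREF."""
    other = {"REL": "REV", "REV": "REL"}[attribute.upper()]
    return other, INVERSE_LINK_TYPE[link_type]
```

Applying the function twice gives back the original pair.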

13.21   The MAP element type—Client-side image map

The International Standard requires that the NAME attribute be provided.

In order to resolve the ID/NAME case folding contradiction, we recommend that authors satisfy the competing requirements of SGML and the W3C Recommendation for HTML 4.01 by restricting themselves to the 40 characters "ABCDEFGHIJKLMNOPQRSTUVWXYZ.-_:0123456789" for ID and NAME values, and for the corresponding HREF values.

In SGML terms, the attribute value specification shall be processed as if the declared value were NAME.

Entity references and character references are replaced, entity ends and record starts are removed, record end and separator characters are replaced by a space. Any sequence of space characters is replaced by a single space and leading and trailing spaces are deleted, [8879 7.9.3 and 10.1.7].
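The white space part of this processing can be sketched in Python as follows. The sketch is a simplification: it omits the replacement of entity and character references and the removal of entity ends, and it assumes that record start is the line feed character, record end the carriage return and the only separator character the tab, which should be checked against the SGML declaration in use. The function name is hypothetical.

```python
import re

RECORD_START = "\n"   # assumed: character 10
RECORD_END = "\r"     # assumed: character 13
SEPARATOR = "\t"      # assumed: character 9


def normalize_attribute_value(literal):
    """Apply the white space rules described above to an attribute value literal."""
    text = literal.replace(RECORD_START, "")                    # record starts are removed
    text = text.replace(RECORD_END, " ").replace(SEPARATOR, " ")  # replaced by a space
    text = re.sub(r" +", " ", text)                             # runs of spaces collapse
    return text.strip(" ")                                      # leading/trailing spaces go
```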

The International Standard recommends that authors of ISO-HTML documents use both the ID attribute and the NAME attribute. If both are used, then they shall be given identical values since this allows an SGML parser to verify that the values for different anchors are distinct.

13.21.1   Accessibility

The first edition of the International Standard provided only <AREA> [W3C 13.6.1] elements to specify the shape of the map. These are essentially graphic and are not suitable for sight impaired or blind users. See defect 4.

The W3C Recommendation for HTML 4.01 extends the content model of the <MAP> [W3C 13.6.1] element type to include block elements as well as <AREA> [W3C 13.6.1] elements. The block elements provide a richer means of describing the map areas, allowing alternative descriptions of the areas suitable for speech browsers. They are intended to improve accessibility, and the International Standard recommends that they be used by authors and rendered by browsers. Although the International Standard expresses the requirement as a recommendation by the use of the word "should", the use of block level content should be understood as a strict requirement.

Authors should use the block-level content of the <MAP> [W3C 13.6.1] element when creating accessible documents. Each region should be specified using an <A> [W3C 12.2] element to define its associated link and shape. User agents should render the block-level content of a <MAP> [W3C 13.6.1] element.

Here is an example of a national accessibility requirement.

13.21.2   Example

The following example shows the use of block-level content to describe five polygons placed in a figure. Each polygon is inscribed in a circle radius R. Selecting one of the polygons leads to a formula for the surface area S.

[Figure: five regular polygons, each inscribed in a circle: triangle, square, hexagon, decagon and duodecagon]

If the circle has radius R, then the surface area S of the inscribed polygon is:

Triangle: S = (3 * R**2 * sqrt(3)) / 4

Square: S = 2 * R**2

Hexagon: S = (3 * R**2 * sqrt(3)) / 2

Decagon: S = (5 * R**2 * sqrt(10 - 2 * sqrt(5))) / 4

Duodecagon: S = 3 * R**2
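The five formulas can be checked against the general expression for a regular polygon with n sides inscribed in a circle of radius R, S = (n / 2) * R**2 * sin(2 * pi / n). This general expression is standard geometry rather than something taken from the International Standard, and the names in the following Python sketch are our own.

```python
import math


def inscribed_polygon_area(sides, radius):
    """Area of a regular polygon with the given number of sides inscribed in a circle."""
    return 0.5 * sides * radius ** 2 * math.sin(2 * math.pi / sides)


# The closed forms quoted in the example, keyed by number of sides.
CLOSED_FORMS = {
    3: lambda r: (3 * r ** 2 * math.sqrt(3)) / 4,
    4: lambda r: 2 * r ** 2,
    6: lambda r: (3 * r ** 2 * math.sqrt(3)) / 2,
    10: lambda r: (5 * r ** 2 * math.sqrt(10 - 2 * math.sqrt(5))) / 4,
    12: lambda r: 3 * r ** 2,
}


def closed_forms_agree(radius=1.0):
    """Return True when every closed form matches the general expression."""
    return all(
        math.isclose(inscribed_polygon_area(n, radius), formula(radius), rel_tol=1e-12)
        for n, formula in CLOSED_FORMS.items()
    )
```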

The markup used in this example is as follows. It includes an <AREA> [W3C 13.6.1] specification of the selectable areas for browsers which cannot handle block content in a <MAP> [W3C 13.6.1].

<!-- This map describes a 632x128 pixel
     drawing of five polygons. -->
<map id="POLYGONMAP" name="POLYGONMAP">
  <p>
  <a href="#TRIANGLE"
     shape="rect"
     coords="  0,0, 125,127">Triangle</a> or
  <a href="#SQUARE"
     shape="rect"
     coords="126,0, 251,127">square</a> or
  <a href="#HEXAGON"
     shape="rect"
     coords="252,0, 377,127">hexagon</a> or
  <a href="#DECAGON"
     shape="rect"
     coords="378,0, 503,127">decagon</a> or
  <a href="#DUODECAGON"
     shape="rect"
     coords="504,0, 631,127">duodecagon</a>.

  <!-- Markup for browsers which cannot
       handle block content in MAP -->
  <area href="#TRIANGLE"
        shape="rect" coords="  0,0, 125,127"
        alt="Triangle inscribed in a circle">
  <area href="#SQUARE"
        shape="rect" coords="126,0, 251,127"
        alt="Square inscribed in a circle">
  <area href="#HEXAGON"
        shape="rect" coords="252,0, 377,127"
        alt="Hexagon inscribed in a circle">
  <area href="#DECAGON"
        shape="rect" coords="378,0, 503,127"
        alt="Decagon inscribed in a circle">
  <area href="#DUODECAGON"
        shape="rect" coords="504,0, 631,127"
        alt="Duodecagon inscribed in a circle">
</map>

<!-- Offer the visitor a choice of polygon. -->
<p>
<img src="polygon.png"
     class="fullwidth"
     alt="Five regular polygons each inscribed in a circle"
     title="Choose a polygon"
     usemap="#POLYGONMAP">

13.22   The OBJECT element type—Simple agent

The attributes of the <OBJECT> [W3C 13.3] element are restricted to:

13.23   The Q element type—Quote

The contents of the <Q> [W3C 9.2.2] element shall not be surrounded with quotation marks. These may be added by the user agent through the use of a style sheet.

13.23.1   Example

A <Q LANG=de>quotation in German</Q> and
a <Q LANG=fr>quotation in French</Q>.

might be rendered as:

A ,,quotation in German'' and a << quotation in French >>.

13.24   The STYLE element type—Style specification

The <STYLE> [W3C 14.2.3] element contains style sheet information which shall be passed to the user agent's style manager. Any style sheet language may be used, and none is defined by the International Standard.

It is a user agent error to render the style sheet information as if it were part of a document's text.

We recommend that authors:

  1. Offer a range of styles for their documents to take into account the different types of user agent on which the document may be rendered, and the special needs of the readers, e.g. larger fonts for the visually impaired.
  2. Do not use style as an intrinsic part of the content. For example, the statement "The correct answer is shown in green, the others are in red" would be useless with a user agent which does not render in colour.
  3. Specify the default style sheet language using the <META> [W3C 7.4.4] element.

13.25   The TABLE element type—Tables

The attributes of the <TABLE> [W3C 11.2.1] element are restricted to:

13.26   The TBODY element type—Table body

In ISO-HTML the start tag is required for the <TBODY> [W3C 11.2.3] element.

13.27   The TD element type—Table data cell

The attributes of the <TD> [W3C 11.2.6] element are restricted to:

13.28   The TH element type—Table header cell

The attributes of the <TH> [W3C 11.2.6] element are restricted to:

13.29   The TR element type—Table row

It is recommended that authors pay attention to the following points in order to avoid inconsistent rendering of their tables.

The cells of a <TR> [W3C 11.2.5] element should occupy exactly the same number of columns as the number of columns specified by the <COL> [W3C 11.2.4] or <COLGROUP> [W3C 11.2.4] elements in the containing <TABLE> [W3C 11.2.1] element, if present, taking into account the effect of the ROWSPAN and COLSPAN attributes of the <TD> [W3C 11.2.6] and <TH> [W3C 11.2.6] elements, the SPAN attributes of the <COL> [W3C 11.2.4] and <COLGROUP> [W3C 11.2.4] elements and the padding of incomplete rows by a user agent.
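An author can check this recommendation by counting the columns each row occupies. The following Python sketch is one way to do the counting, under simplifying assumptions: each cell is reduced to a (COLSPAN, ROWSPAN) pair, overlapping cells are not detected, and the special value 0 is not handled. The function name is hypothetical.

```python
def columns_per_row(rows):
    """Return the number of columns occupied by each row of a table.

    rows is a list of rows; each row lists its cells in document order as
    (colspan, rowspan) pairs. Columns held by a cell hanging down from an
    earlier row through ROWSPAN count towards the rows they cover.
    """
    hanging = {}   # column index -> rows still covered, the current row included
    widths = []
    for row in rows:
        still_hanging = {col: left - 1 for col, left in hanging.items() if left > 1}
        col = 0
        for colspan, rowspan in row:
            while col in hanging:          # skip columns covered from above
                col += 1
            if rowspan > 1:
                for covered in range(col, col + colspan):
                    still_hanging[covered] = rowspan - 1
            col += colspan
        widths.append(max([col] + [covered + 1 for covered in hanging]))
        hanging = still_hanging
    return widths
```

Comparing every entry of the result with the column count given by the COL or COLGROUP elements shows which rows are incomplete or too long.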

The attributes of the <TR> [W3C 11.2.5] element are restricted to:

14   Document preparation

This chapter describes an SGML-based process for preparing ISO-HTML conforming documents. The process is not a part of the International Standard, but is intended to make it easier to conform to the International Standard. The principal advantages are:

  1. Documents are validated by an SGML parser which verifies conformance to the DTD.
  2. The parser is also able to validate the correct progressive nesting of sections required by ISO-HTML.
  3. The author may split a page into convenient pieces and recombine them during the preparation process. This is particularly convenient for authors working with a set of pages each containing the same piece of "boiler-plate" information.
  4. Often a page will contain pieces of information which may change from time to time. This form of ephemeral data may be placed in a set of entities grouped together to facilitate maintenance and updating.

    NOTE: This is the technique used to specify the many links between the User's Guide and the W3C Recommendation for HTML 4.01.

  5. An SGML parser is able to incorporate computed text automatically.
  6. Different versions of a document may be produced from the same source file under the control of an external process such as a Makefile.

More complex SGML-based processes are possible. For example, the source document may be structured using a richer DTD or a richly structured document database. This has advantages when a document represents a major investment and is used to generate a range of output. The processing of such documents is beyond the scope of this User's Guide.

14.1   Pre-HTML

The process uses the document type declaration internal subset [8879 11.1] which is a feature of the DOCTYPE declaration not supported by the International Standard or the W3C Recommendation for HTML 4.01. In order to clearly identify the documents-in-preparation as being different from ISO-HTML or HTML 4, we give them a different document element <Pre-HTML>. This document element is only valid for documents-in-preparation.

The internal subset appears between square brackets in the DOCTYPE declaration as shown in the following figure.

[Figure: use of the DOCTYPE internal subset]

Figure 3: Use of the DOCTYPE internal subset.

Before describing the contents of the figure, a short discussion of entities in SGML may be useful. An SGML entity [8879 B.6] may be thought of as a chunk of document — a programmer might prefer to use the term macro. There are two types of entity in SGML:

  1. Parameter entities. These entities are defined and called (the SGML world uses the term referenced) in the document type declaration, including the subset. They are used mainly to provide convenient references to chunks of DTD and to other useful constructs that may be placed in a DTD.

    Parameter entities are also referenced in a document instance in the status keyword specification of marked sections where they provide the keywords INCLUDE or IGNORE for optional sections of text.

  2. General entities. These entities are defined in a document type declaration, including the subset, but are called/referenced in document instances. A well known example is &agrave; used to provide a lower case a with a grave accent which appears at the end of the word voilà.

NOTE: The two types of entity serve the same basic purpose. The reason for having two types is to have two name spaces. The document author need not be concerned about overloading an entity name already chosen by the support people who define the document type declaration.

The "legal" ENTITY declaration [8879 10.5] in the subset has a % character before the entity name. This indicates that "legal" is a parameter entity for use in the subset. The notation "%legal;" [8879 9.4.4] is a reference to the parameter entity and in the example shown, an SGML parser will resolve the parameter entity to a declaration of the general entity &fineprint; which may be used in the document. The resolution process is indirect: an OASIS catalogue fragment, usually in a file "catalog", points to the file which contains the general entity definition. The lookup is done using the Formal Public Identifier [8879 10.2], in the example given: "-//Whiz-Bang//TEXT Legal//EN". The result is to make the general entity &fineprint; available for use in the document.

At first sight this process may seem complex, but in a large production environment it has many advantages. The document author can work without having to be concerned about which file contains the latest fine print. The system administrator manages the OASIS catalogue and the legal department can work independently on their fine print. We have shown an external file "fineprint.txt" which contains only one general entity declaration. In practice the external file may contain hundreds of entity declarations, for example, the official list of all the publicly available URLs and URIs offered by a corporation.

The ISO-HTML page produced by the process does not contain an internal subset or any indication of the existence of the parameter entity %legal; or the general entity &fineprint;.

NOTE: The catalogue fragment may be in the same "catalog" file as the OASIS catalogue fragment described in "SGML engineering" and the sample SGML catalog fragment provided by the W3C Recommendation for HTML 4.01.

14.2   Preparation process

There are two preparation processes, both using the sgmlnorm feature of the SP parser to produce a version of the document-in-preparation in which

14.2.1   Using a "scrubber"

In this process, the intermediate document produced by sgmlnorm contains the <DIV1> ... <DIV6> element tags which are not permitted in ISO-HTML. They are removed by a "scrubber" which also replaces <Pre-HTML> start and end tags by <HTML> [W3C 7.3] start and end tags. In addition, the scrubber places the ISO-HTML document type declaration at the head of the file.

This was the process initially used by the editors. The incantation for the International Standard was of the form:

sgmlnorm -e -g -w all -E 5 15445.Pre-HTML | scrubber > 15445.html

14.2.2   Using an architectural form

In this process, there is no intermediate document. We use the DTD for ISO-HTML as an architectural form to which the output of sgmlnorm is to conform. Since the <DIV1> ... <DIV6> element tags are not a part of ISO-HTML, they are ignored and do not appear in the output. In order to set up the process, we place the following declaration in the internal subset:

<!-- Use ISO-HTML as architectural form -->
 <!ENTITY % HtmlDtd PUBLIC "ISO/IEC 15445:2000//DTD HTML//EN">
 <?IS10744 ArcBase HTML>
 <!NOTATION HTML PUBLIC
   "-//ISO-HTML User's Guide//NOTATION HTML Architecture//EN">
 <!ATTLIST #NOTATION HTML
   ArcDTD    CDATA #FIXED "%HtmlDtd" -- Meta-DTD entity --
   ArcDocF   NAME  #FIXED "HTML"     -- Document element name --
   ArcNamrA  NAME  #IMPLIED          -- Default: no renaming --
                                     -- See [HyTime A.3.4.2] --
 >

The incantation which produces the International Standard now takes the form:

sgmlnorm -A html -d -e -g -w all -E 5 15445.Pre-HTML > 15445.html

where the option "-A html" specifies use of notation HTML as a meta-DTD, and the option "-d" asks sgmlnorm to place the document type declaration for the meta-DTD, i.e. ISO-HTML, at the top of the output document instance.

14.3   An example — Document time stamps

It is common to see a time stamp at the foot of an HTML page such as that of the Free Software Foundation: Updated: 1 Jan 1998 rms. It is possible to use the <Pre-HTML> techniques to set this time stamp automatically. We will assume that you are using a Makefile to build your pages.

  1. In the Makefile, just before you parse a page in which you wish to place a time stamp, insert the following shell commands:
    	echo "<!ENTITY lastchange '" > lastchange
    	date >> lastchange
    	echo "' >" >> lastchange

    NOTE: The three lines are indented with a tab, not spaces.

  2. In the document type declaration subset, between the square brackets, place the following declaration and reference:
    <!ENTITY % lastchange PUBLIC
               "-//ISO-HTML User's Guide//TEXT Last change time stamp//EN" >
    %lastchange;
  3. In your catalogue, add the entry:
            -- Last change time stamp --
    PUBLIC  "-//ISO-HTML User's Guide//TEXT Last change time stamp//EN" lastchange
  4. At the foot of your document, or wherever you want the time stamp to appear, add the following markup:
    <hr>
    <p>Last change was on &lastchange;
    <hr>
  5. See the foot of this document for an example of the result.

You can adapt the formal public identifiers, entity names and time stamp text to your own needs.

NOTE: The parameter entity, the general entity and the temporary file which contains the time stamp all have the same name but since they are in different name spaces there is no ambiguity.
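Where a Makefile is not convenient, the temporary file of step 1 can be produced by a short script instead. The following Python sketch is one possibility, not the process used by the editors: the function names are hypothetical, the time stamp uses an ISO 8601 style format rather than the output of the date command, and the declaration is written on a single line.

```python
from datetime import datetime


def entity_declaration(name, text):
    """Return a general entity declaration whose literal is delimited by apostrophes."""
    if "'" in text:
        raise ValueError("the replacement text must not contain an apostrophe")
    return f"<!ENTITY {name} '{text}' >\n"


def lastchange_entity(now=None):
    """Return the declaration of the general entity lastchange for the given moment."""
    moment = now or datetime.now()
    return entity_declaration("lastchange", moment.isoformat(sep=" ", timespec="seconds"))


def write_lastchange(path="lastchange"):
    """Write the declaration to the file named in the catalogue entry of step 3."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write(lastchange_entity())
```

Calling write_lastchange() just before the page is parsed has the same effect as the three shell commands.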

14.4   Document versions and optional content

Authors are often interested in having a single document which describes something which has options, levels, releases or variations. That is, some part of the content is to be included only if a description of "version 2.11" is needed, or if the reader has the required reading authority. The author would like to be able to specify to the SGML production process which parts of the document are to be included.

This is easy to do if the source file is marked up using the Pre-HTML DTD for documents-in-preparation. To include or exclude text, we use SGML marked sections [8879 10.4] managed from the document type declaration internal subset [8879 11.1] which is available in Pre-HTML.

The first version of some product was "easy to use", but following urgent safety improvements, the new version is "easy and safe to use". We handle this as follows:

An alternative, more direct process includes or excludes text using the -i option of sgmlnorm.

15   SGML engineering

This chapter describes the SGML techniques that are used in the formal specification of ISO-HTML and Pre-HTML. Validating systems are required to support these techniques, but conforming systems are not.

The engineering is based on a three step process:

  1. The ISO-HTML or Pre-HTML document instance always contains a DOCTYPE declaration [8879 11.1], which identifies the set of features to be used. The formal public identifiers in the DOCTYPE declarations are used as keys in an OASIS catalogue which identifies the file containing the DTD shared by ISO-HTML and Pre-HTML. In the case of ISO-HTML, the DTD is complete; there is no internal subset, and conforming systems are not required to support such a construction. In the case of Pre-HTML, a further parameter entity declaration in the internal subset completes the DTD.
  2. The DTD contains a default value IGNORE for the %Preparation; parameter entity which manages the customization of the DTD. This default value is overridden by Pre-HTML documents which specify the value INCLUDE for the %Preparation; parameter entity.
  3. The SGML parser parses the formal definition of ISO-HTML or Pre-HTML, taking into account the value of the %Preparation; parameter entity.

15.1   Step 1—DTD identification

The DOCTYPE declarations [8879 11.1] for ISO-HTML are:

<!DOCTYPE HTML PUBLIC "ISO/IEC 15445:2000//DTD HyperText Markup Language//EN">
<!DOCTYPE HTML PUBLIC "ISO/IEC 15445:2000//DTD HTML//EN">

and the declaration for Pre-HTML is

<!DOCTYPE Pre-HTML PUBLIC 
   "-//ISO-HTML User's Guide//DTD ISO-HTML Preparation//EN" 
[<!ENTITY % Preparation "INCLUDE"> 

general entity declarations...

]> 

The formal public identifiers (FPI) [8879 10.2] in the DOCTYPE declarations for ISO-HTML and Pre-HTML are used as keys to identify the corresponding entries in a catalogue which is usually placed in a file named catalog. The catalogue associates the same file name with the three FPIs:

PUBLIC  "ISO/IEC 15445:2000//DTD HyperText Markup Language//EN"   15445.dtd
PUBLIC  "ISO/IEC 15445:2000//DTD HTML//EN"                        15445.dtd
PUBLIC  "-//ISO-HTML User's Guide//DTD ISO-HTML Preparation//EN"  15445.dtd

Parsers such as SP which support use of a catalogue use the FPIs to find the name of the file containing the DTD.
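The lookup can be pictured as a simple table from formal public identifier to file name. The following Python sketch reads only PUBLIC entries and skips comments written between pairs of hyphens; it is a rough illustration with hypothetical names, not a full implementation of the OASIS catalogue format, and it would be misled by an identifier that itself contains two consecutive hyphens.

```python
import re

_COMMENT = re.compile(r"--.*?--", re.DOTALL)
_PUBLIC_ENTRY = re.compile(r'PUBLIC\s+"([^"]*)"\s+(\S+)')


def parse_catalog(text):
    """Map each formal public identifier in the catalogue text to its file name."""
    without_comments = _COMMENT.sub(" ", text)
    return dict(_PUBLIC_ENTRY.findall(without_comments))


SAMPLE_CATALOG = '''
        -- DTD shared by ISO-HTML and Pre-HTML --
PUBLIC  "ISO/IEC 15445:2000//DTD HyperText Markup Language//EN"   15445.dtd
PUBLIC  "ISO/IEC 15445:2000//DTD HTML//EN"                        15445.dtd
PUBLIC  "-//ISO-HTML User's Guide//DTD ISO-HTML Preparation//EN"  15445.dtd
'''
```

With the sample above, all three identifiers resolve to the same file, 15445.dtd.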

NOTE: The file name is system dependent. A different name may be needed on restricted operating systems.

15.2   Step 2—Declaration of parameter entities

The Pre-HTML internal subset contains a declaration for the parameter entity %Preparation; for which the default value is defined in the ISO-HTML DTD. The value in the Pre-HTML internal subset takes precedence over the default value provided in the ISO-HTML DTD (see [8879 9.4.4.1]).

If %Preparation; has the value INCLUDE, the mechanisms which require correct nesting of the elements <H1> [W3C 7.5.5] through <H6> [W3C 7.5.5] are included. If the value is IGNORE, the mechanisms which require correct nesting of headings are omitted.

The ISO-HTML DTD defines the inverse parameter entity %NoPreparation;. The result is that the DTD specifies the following parameter entities:

15.3   Step 3—Parsing the formal definition

The SGML parser parses the files making up the formal definition of ISO-HTML, taking into account the values of the %Preparation; and %NoPreparation; parameter entities specified in step 2. The parameter entities control the inclusion or exclusion of marked sections, see [8879 10.4], thus changing the formal definitions.

A typical effect of the parameter entity is in the modification of the element <BODY> [W3C 7.5.1].

<![ %Preparation;   
  [ 
      <!ELEMENT BODY  - O  ((%block;)*, (H1,DIV1)* ) 
                              +(DEL|INS) >
  ]]>
<![ %NoPreparation; 
  [
      <!ELEMENT BODY  - O  (%block;|H1|H2|H3|H4|H5|H6)+ 
                              +(DEL|INS) >
  ]]>

When the author's DOCTYPE declaration calls for ISO-HTML, this is the same as:

<!ELEMENT BODY  - O  (%block;|H1|H2|H3|H4|H5|H6)+ 
                        +(DEL|INS) >

but when the author's DOCTYPE declaration calls for Pre-HTML, this is the same as:

<!ELEMENT BODY  - O  ((%block;)*, (H1,DIV1)* ) 
                        +(DEL|INS) >
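The selection made by the parser can be imitated with a few lines of Python. The sketch below handles only marked sections whose status keyword is given by a single parameter entity reference, as in the fragment above; nested marked sections and the other status keywords are not supported, and the names are hypothetical.

```python
import re

# <![ %Name; [ content ]]>
_MARKED_SECTION = re.compile(r"<!\[\s*%([A-Za-z][\w.-]*);\s*\[(.*?)\]\]>", re.DOTALL)


def resolve_marked_sections(text, entities):
    """Keep the content of INCLUDE sections and drop the content of IGNORE sections."""
    def select(match):
        keyword = entities[match.group(1)].upper()
        if keyword == "INCLUDE":
            return match.group(2)
        if keyword == "IGNORE":
            return ""
        raise ValueError(f"unsupported status keyword: {keyword}")
    return _MARKED_SECTION.sub(select, text)


DTD_FRAGMENT = """<![ %Preparation;
  [
      <!ELEMENT BODY  - O  ((%block;)*, (H1,DIV1)* )
                              +(DEL|INS) >
  ]]>
<![ %NoPreparation;
  [
      <!ELEMENT BODY  - O  (%block;|H1|H2|H3|H4|H5|H6)+
                              +(DEL|INS) >
  ]]>"""

PRE_HTML = {"Preparation": "INCLUDE", "NoPreparation": "IGNORE"}
ISO_HTML = {"Preparation": "IGNORE", "NoPreparation": "INCLUDE"}
```

Resolving DTD_FRAGMENT with PRE_HTML leaves only the declaration that uses DIV1; resolving it with ISO_HTML leaves only the flat declaration.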

16   Folding to upper case

16.1   Some SGML vocabulary

Before we begin the discussion of folding to upper case, we need to review some SGML vocabulary. Consider the following attribute definition list declaration [SGML 11.3]:

 <!ATTLIST ...
    LANG  NAME   #IMPLIED -- RFC1766 language value --
    ID    ID     #IMPLIED -- Document-wide unique id --
    HREF  CDATA  #IMPLIED -- Universal Resource Identifier, RFC1630 --
    NAME  CDATA  #IMPLIED -- Target anchor --
 >

Each of the four attribute definitions in the list consists of three parts. For example in the third attribute definition:

Attributes with certain "declared values"/"types" have their values automatically folded to upper case in certain conditions. What are these conditions? They are given in the SGML declaration, in the section:

   NAMING   ...
            NAMECASE GENERAL YES
                     ENTITY   NO

The declaration NAMING ... NAMECASE GENERAL YES means that those syntactic items which are names, are to be folded to upper case. For example, if an attribute has the type ID, IDREF, IDREFS, NAME, NAMES, NMTOKEN, NMTOKENS, NUTOKEN or NUTOKENS then its value is to be automatically folded [SGML 13.4.5]. Of these, only ID, IDREF, IDREFS and NAME appear in ISO-HTML.

NOTE: We are talking about the types here, not the attribute names. It is easy to confuse the attribute name NAME and the type NAME.

The declaration NAMING ... NAMECASE ENTITY NO means that those syntactic items which are the names of entities, are not to be folded to upper case. For example, if an attribute has the type ENTITY or ENTITIES then its value is not automatically folded [SGML 13.4.5]. This situation does not occur in ISO-HTML, but could occur in a Pre-HTML document.

You will see those ISO-HTML attributes whose values are folded to upper case by inspecting the ISO-HTML DTD and noting those which have a declared value/type of ID, IDREF, IDREFS or NAME.
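The rule can be summarized in a few lines of Python. The sketch assumes values restricted to ASCII, since the folding performed by an SGML parser is governed by the SGML declaration and need not agree with Python's upper-casing for other characters; the function name is hypothetical.

```python
# Declared values whose attribute values are folded when NAMECASE GENERAL is YES.
FOLDED_TYPES = {
    "ID", "IDREF", "IDREFS", "NAME", "NAMES",
    "NMTOKEN", "NMTOKENS", "NUTOKEN", "NUTOKENS",
}


def value_after_folding(declared_value, value, namecase_general=True):
    """Return the attribute value as the parser reports it after any case folding."""
    if namecase_general and declared_value.upper() in FOLDED_TYPES:
        return value.upper()
    return value
```

With the attribute definition list of 16.1, an ID value "intro" becomes "INTRO", whereas the same string given for HREF or for the attribute NAME, both of type CDATA, is left unchanged. Passing namecase_general=False imitates the XHTML declaration.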

In XHTML which is an application of XML, the SGML declaration becomes

   NAMING   ...
            NAMECASE GENERAL  NO
                     ENTITY   NO

which removes all case folding. XHTML is case sensitive.

16.2   Folding of anchors

The following table summarizes the situation for anchors.

Table 2: Which anchor attribute values are folded to upper case in ISO-HTML?
Attribute name   Attribute type (declared value)   Automatic folding?
ID               ID                                Yes
HREF             CDATA                             No
NAME             CDATA                             No

We suggest that you now read the clause in the W3C Recommendation for HTML 4.01 which discusses 12.2.3 Anchors with the id attribute. In summary:

Clearly there is a contradiction between the automatic folding of the ID, but not the NAME and HREF. The example suggests that names are to be equal before folding, but the equality test is applied after any folding.

The case folding behaviour of browsers and other tools is in general undefined. The very useful tool HTML tidy, which cleans up the broken HTML generated by many authoring tools, checks that when attributes ID and NAME are used together on an element, they have the same value. However this test is made without any folding. As a result, if the values contain lower case characters, and the document is later passed through the SP tool sgmlnorm, the document no longer satisfies HTML tidy, even though from a strict SGML point of view, nothing has changed.

As far as case folding is concerned, the International Standard requires that conforming documents satisfy the requirements of the W3C Recommendation for HTML 4.01 and those of SGML, but without saying how. We recommend that authors satisfy these requirements by restricting themselves to the 40 characters "ABCDEFGHIJKLMNOPQRSTUVWXYZ.-_:0123456789" for ID and NAME values, and for the corresponding HREF values.

NOTE: In the markup for the International Standard, all the values of the ID and NAME attributes, and the corresponding HREF values, are written in upper case. This allows the markup to pass through the SGML parser sgmlnorm and remain acceptable to HTML tidy.
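
An author who wishes to check values against the recommended 40 characters could use something like the following sketch. It is an illustration only: the names ALLOWED, is_fold_safe and href_fragment are hypothetical, and the check covers the character set alone. It does not test the separate SGML requirement that an ID value begin with a letter.

```python
# The 40 characters recommended for ID and NAME values and the
# corresponding HREF values.
ALLOWED = frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZ.-_:0123456789")


def is_fold_safe(value: str) -> bool:
    """True if value is non-empty and uses only the 40 recommended characters."""
    return bool(value) and all(ch in ALLOWED for ch in value)


def href_fragment(href: str) -> str:
    """Return the text after the first '#', or '' if there is none."""
    _, _, fragment = href.partition("#")
    return fragment


# Hypothetical attribute values taken from an anchor and a link to it.
print(is_fold_safe("CLAUSE-16.2"))
print(is_fold_safe("clause-16.2"))
print(is_fold_safe(href_fragment("guide.html#CLAUSE-16.2")))
```

A value that passes this check is unchanged by upper-case folding, so a comparison gives the same result whether it is made before or after folding.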


Annex A

(normative in the International Standard)

SGML declaration

The SGML declaration for ISO-HTML is provided by this file:


<!SGML  "ISO 8879:1986 (WWW)"
--   ISO/IEC 15445 Hypertext Markup Language (ISO-HTML)
     SGML Declaration

     Copyright (C) 2000 IETF, W3C (MIT, Inria, Keio), ISO/IEC
               All Rights Reserved

     Permission to copy in any form is granted for use with
     validating and conforming systems and applications as defined 
     in ISO/IEC 15445, provided this copyright notice is included
     with all copies.
--
CHARSET
         -- First 17 planes of ISO 10646. --
         BASESET  "ISO Registration Number 177//CHARSET
                   ISO/IEC 10646-1:1993 UCS-4 with
                   implementation level 3//ESC 2/5 2/15 4/6"
         DESCSET  0       9       UNUSED
                  9       2       9
                  11      2       UNUSED
                  13      1       13
                  14      18      UNUSED
                  32      95      32
                  127     1       UNUSED
                  128     32      UNUSED
                  160     55136   160
                  55296   2048    UNUSED
                  57344   1056768 57344

-- 
        ISO/IEC 10646 does not define all positions. For example, it reserves
        positions with hexadecimal values 0000D800 - 0000DFFF, used in the
        UTF-16 encoding of UCS-4, as well as the last two code values in each
        plane of UCS-4, ie. all values of the hexadecimal form xxxxFFFE and
        xxxxFFFF. Undefined code values and the corresponding numeric
        character references should not be included in an HTML document, and
        they shall be ignored if encountered when processing an HTML document.
--
CAPACITY          SGMLREF
                  TOTALCAP        150000
                  GRPCAP          150000
                  ENTCAP          150000

SCOPE    DOCUMENT
SYNTAX
         SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
                  17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 127
         BASESET "ISO 646IRV:1991//CHARSET
                  International Reference Version
                  (IRV)//ESC 2/8 4/2"
         DESCSET  0 128 0

         FUNCTION
                  RE          13
                  RS          10
                  SPACE       32
                  TAB SEPCHAR  9 -- Deprecated --

         NAMING   LCNMSTRT ""
                  UCNMSTRT ""
                  LCNMCHAR ".-_:"
                  UCNMCHAR ".-_:"
                  NAMECASE GENERAL YES
                           ENTITY   NO
         DELIM    GENERAL  SGMLREF
                  HCRO     "&#38;#x" -- 38 is Ampersand --
                  SHORTREF SGMLREF

         NAMES    SGMLREF
         QUANTITY SGMLREF
                  ATTCNT      60
                  ATTSPLEN 65536 -- These are the largest values --
                  LITLEN   65536 -- permitted in the declaration. --
                  NAMELEN  65536 -- Avoid fixed limits in actual --
                  PILEN    65536 -- implementations of user agents. --
                  TAGLVL     100
                  TAGLEN   65536
                  GRPGTCNT   150
                  GRPCNT      64
FEATURES
         MINIMIZE
                  DATATAG     NO
                  OMITTAG    YES
                  RANK        NO
                  SHORTTAG   YES
         LINK
                  SIMPLE      NO
                  IMPLICIT    NO
                  EXPLICIT    NO
         OTHER
                  CONCUR      NO
                  SUBDOC      NO
                  FORMAL     YES
APPINFO  NONE
>
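
The DESCSET portion of the CHARSET clause above can be read as a list of triples: first position, number of positions, and either the base set position or UNUSED. The following Python sketch, which is an illustration and not part of the International Standard, transcribes that list and tests whether a given code position is described. The names DESCSET and is_described are hypothetical. The sketch follows the DESCSET entries only; it does not exclude the undefined code values of the form xxxxFFFE and xxxxFFFF mentioned in the comment in the declaration.

```python
# (first position, count, base position) transcribed from the DESCSET
# entries of the declaration; None stands for UNUSED.
DESCSET = [
    (0, 9, None),
    (9, 2, 9),
    (11, 2, None),
    (13, 1, 13),
    (14, 18, None),
    (32, 95, 32),
    (127, 1, None),
    (128, 32, None),
    (160, 55136, 160),
    (55296, 2048, None),
    (57344, 1056768, 57344),
]


def is_described(code_point: int) -> bool:
    """True if the DESCSET maps code_point to a base set position."""
    for start, count, base in DESCSET:
        if start <= code_point < start + count:
            return base is not None
    return False  # outside the first 17 planes


for cp in (0x09, 0x0B, 0x41, 0x7F, 0xA0, 0xD800, 0x10FFFF):
    print(hex(cp), is_described(cp))
```

The last entry ends at position 57344 + 1056768 = 1114112, which is hexadecimal 110000, one past the last position of the seventeenth plane.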

Annex B

(normative in the International Standard)

Entities, element types and attributes

Part 1 of the DTD for ISO-HTML contains parameter entity definitions used in Parts 2 and 3, and the short reference mapping [8879 11.5] which converts the deprecated horizontal tab into a space. Part 2 contains the elements and their content models. Part 3 provides the attribute definitions and additional normative refinements that ISO-HTML places on the elements.

The document type definition (DTD) for ISO-HTML is provided by this file.

After the International Standard was published, it was discovered that there was a discrepancy between the W3C Recommendations and the ISO/IEC specification in the formal public identifier [8879 10.2] used to identify the set of entities defined by the W3C for the characters of ISO 8859-1 8-bit single-byte coded graphic character sets — Latin alphabet No. 1 commonly known as ``ISO latin 1''.

The formal public identifier used in the ISO/IEC DTD:

-//W3C//ENTITIES Full Latin 1//EN//HTML

contains a public text description [8879 10.2.2.2] ``Full Latin 1''. However, the W3C Recommendations had used ``Latin 1'' and ``Latin1''. Had the public text description identified an ISO publication, it would have been created in accordance with the rule given by [8879 10.2.2.2]:

It consists of the last element of the publication title, without the part number designation (if any).

If this rule had been applicable, then the public text description would have been ``Latin alphabet No. 1'', giving the formal public identifier

-//W3C//ENTITIES Latin alphabet No. 1//EN//HTML

However the ISO rule is not applicable to W3C publications.

The solution chosen is to consider all four public text descriptions to be valid and equivalent, which means that the four formal public identifiers:

-//W3C//ENTITIES Latin alphabet No. 1//EN//HTML
-//W3C//ENTITIES Full Latin 1//EN//HTML
-//W3C//ENTITIES Latin 1//EN//HTML
-//W3C//ENTITIES Latin1//EN//HTML

specify the same entity set.

The DTD defined in this clause references the entity set specified by the W3C to define the characters of ISO 8859-1 8-bit single-byte coded graphic character sets — Latin alphabet No. 1. The reference uses a formal public identifier ``-//W3C//ENTITIES Full Latin 1//EN//HTML'' which contains the public text description ``Full Latin 1''. The public text descriptions ``Latin alphabet No. 1'', ``Latin 1'' and ``Latin1'' are permitted alternatives which describe the same entity set.

A similar situation arises for the reference by the DTD defined in this clause to the entity set specified by the W3C for mathematical, Greek and symbolic characters. The reference uses a formal public identifier ``-//W3C//ENTITIES Symbolic//EN//HTML'' which contains the public text description ``Symbolic''. However, in HTML 4.01 subclause A.2.1, Errors that were corrected, the W3C changed the public text description to ``Symbols''.

We recommend that system administrators use the same technique as used for DTD identification to identify the entity sets. The formal public identifiers (FPI) [8879 10.2] of the entity sets are used as keys to identify the corresponding entries in a catalogue which is usually placed in a file named catalog. The catalogue associates the same file name with the equivalent FPIs:

PUBLIC "-//W3C//ENTITIES Latin alphabet No. 1//EN//HTML"
       ISOlatin1.entities
PUBLIC "-//W3C//ENTITIES Full Latin 1//EN//HTML"
       ISOlatin1.entities
PUBLIC "-//W3C//ENTITIES Latin 1//EN//HTML"
       ISOlatin1.entities
PUBLIC "-//W3C//ENTITIES Latin1//EN//HTML"
       ISOlatin1.entities

PUBLIC "-//W3C//ENTITIES Symbolic//EN//HTML"
       Symbols.entities
PUBLIC "-//W3C//ENTITIES Symbols//EN//HTML"
       Symbols.entities

PUBLIC "-//W3C//ENTITIES Special//EN//HTML"
       Special.entities

NOTE: The file name is system dependent. A different name may be needed on restricted operating systems.
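
The effect of such a catalogue can be sketched in a few lines of Python. This is an illustration only, not a description of how any particular SGML system resolves identifiers: the names CATALOG_TEXT, ENTRY and parse_catalog are hypothetical, and the sketch handles only simple PUBLIC entries with unquoted file names, as shown above. Real catalogue files may contain other kinds of entry.

```python
import re

# The catalogue entries shown above, reproduced as a string.
CATALOG_TEXT = '''
PUBLIC "-//W3C//ENTITIES Latin alphabet No. 1//EN//HTML"
       ISOlatin1.entities
PUBLIC "-//W3C//ENTITIES Full Latin 1//EN//HTML"
       ISOlatin1.entities
PUBLIC "-//W3C//ENTITIES Latin 1//EN//HTML"
       ISOlatin1.entities
PUBLIC "-//W3C//ENTITIES Latin1//EN//HTML"
       ISOlatin1.entities

PUBLIC "-//W3C//ENTITIES Symbolic//EN//HTML"
       Symbols.entities
PUBLIC "-//W3C//ENTITIES Symbols//EN//HTML"
       Symbols.entities

PUBLIC "-//W3C//ENTITIES Special//EN//HTML"
       Special.entities
'''

# PUBLIC keyword, quoted formal public identifier, unquoted file name.
ENTRY = re.compile(r'PUBLIC\s+"([^"]+)"\s+(\S+)')


def parse_catalog(text: str) -> dict:
    """Map each formal public identifier to its file name."""
    return {fpi: filename for fpi, filename in ENTRY.findall(text)}


catalog = parse_catalog(CATALOG_TEXT)

# The equivalent identifiers resolve to the same file.
print(catalog["-//W3C//ENTITIES Full Latin 1//EN//HTML"])
print(catalog["-//W3C//ENTITIES Latin1//EN//HTML"])
```

Because each equivalent formal public identifier is simply a separate key with the same value, documents using any of the alternatives can be processed on the same system.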

The DTD defined in this clause references the entity set specified by the W3C to define mathematical, Greek and symbolic characters. The reference uses a formal public identifier ``-//W3C//ENTITIES Symbolic//EN//HTML'' which contains the public text description ``Symbolic''. The public text description ``Symbols'' is a permitted alternative which describes the same entity set.

NOTE: The User's Guide to this International Standard describes a way in which system administrators may allow simultaneous use of these alternatives.


<!-- 15445.dtd
     ISO/IEC 15445:2000  Hypertext Markup Language (HTML) 
     Document Type Definition.

     Copyright (C) 2000-2003, IETF, W3C (MIT, Inria, Keio), ISO/IEC.
               All Rights Reserved.

     Permission to copy in any form is granted for use with
     validating and conforming systems and applications as defined
     in ISO/IEC 15445:2000, provided this copyright notice is included
     with all copies.  

     The DTD is typically invoked by one of the following declarations:

     <!DOCTYPE HTML PUBLIC "ISO/IEC 15445:2000//DTD HyperText Markup Language//EN">
     <!DOCTYPE HTML PUBLIC "ISO/IEC 15445:2000//DTD HTML//EN">

     In order to use the HTML document type definition as a base architecture for
     other SGML applications, one of the following architectural support
     declarations should be used:

     <?IS10744
       arch name="html"
       public-id="ISO/IEC 15445:2000//DTD HyperText Markup Language//EN"
       dtd-system-id="ftp://ftp.cs.tcd.ie/isohtml/15445.dtd"
       renamer-att="HTMLnames"
       doc-elem-form="HTML"
     >

     <!ENTITY % HtmlDtd PUBLIC "ISO/IEC 15445:2000//DTD HTML//EN">
     <?IS10744 ArcBase HTML>
     <!NOTATION HTML PUBLIC
                     "-//ISO-HTML User's Guide//NOTATION HTML Architecture//EN">
     <!ATTLIST #NOTATION HTML
               ArcDTD    CDATA #FIXED "%HtmlDtd"
               ArcDocF   NAME  #FIXED "HTML"
               ArcNamrA  NAME  #IMPLIED
     >
-->
                <!-- Part 1 - Entity set -->

<!-- The Preparation parameter entity shall be set to IGNORE for HTML, 
     and to INCLUDE for a document to be submitted to the preparation
     process -->
<!ENTITY % Preparation "IGNORE" >

<!-- This definition generates the inverse entity 
     NoPreparation which is internal to the DTD -->
<![ %Preparation; [
<!ENTITY % NoPreparation "IGNORE"    -- Inverse of Preparation = INCLUDE -->
                   ]]>
<!ENTITY % NoPreparation "INCLUDE"   -- Inverse of Preparation = IGNORE -->
<!-- End of definition -->


        <!-- Tokens defined by other standards -->

<!ENTITY % Content-Type "CDATA" -- MIME content type, RFC1521 -->
<!ENTITY % HTTP-Method "(get | post)" -- as per HTTP/1.1 RFC2068  -->
<!ENTITY % URI "CDATA" -- Universal Resource Identifier, RFC1630 -->

        <!-- Element tokens -->

<!ENTITY % special "A | BDO | BR | IMG | OBJECT | 
                    MAP | Q | SPAN" >

<!-- Logical character styles -->
<!ENTITY % logical.styles "ABBR | ACRONYM | CITE | CODE | DFN | EM |
                           KBD | SAMP | STRONG | VAR" >

<!-- Physical character styles -->
<!ENTITY % physical.styles "B | I | SUB | SUP | TT" >

        <!-- Model groups -->

<!-- Block-like elements eg. paragraphs and lists -->
<!ENTITY % block "BLOCKQUOTE | DIV | DL | FIELDSET | FORM |
                  HR | OL | P | PRE | TABLE | UL" >

<!-- Form fields - input elements that should appear only within forms -->
<!ENTITY % form.fields "BUTTON | INPUT | LABEL | SELECT | TEXTAREA" >

<!-- Character level elements and text strings -->
<!ENTITY % text "#PCDATA | %physical.styles; | %logical.styles; | %special;
                         | %form.fields;" >

<!-- Elements that may appear in a section or table -->
<!ENTITY % section.content "(%block; | %text; | ADDRESS)+" >
<!ENTITY % table.content   "(%block; | %text;)*" >

        <!-- Generic attributes -->

<!ENTITY % core
   "CLASS      CDATA      #IMPLIED -- Comma separated list of class values --
    --The name space of the ID attribute is shared with the name space of
      the NAME attribute.  Both ID and NAME attributes may be provided for
      the <A> and <MAP> elements. When both ID and NAME values are provided
      for an element, the values shall be identical.  It is an error for an
      ID or NAME value to be associated with more than one element in a
      document.

      It is recommended that authors of documents specify both the ID
      attribute and the NAME attribute for the <A> and <MAP> elements.
    --
    ID         ID         #IMPLIED -- Document-wide unique id --
    TITLE      CDATA      #IMPLIED -- Advisory title or amplification --" >

        <!-- Internationalization attributes -->

<!ENTITY % i18n
   "DIR        (ltr|rtl)  #IMPLIED -- Direction for weak/neutral text --
    LANG       NAME       #IMPLIED -- RFC1766 language value --" >

        <!-- Presentation styles -->

<!ENTITY % shape     "(circle | default | poly | rect)" >
<!ENTITY % InputType "(checkbox | file | hidden | password | 
                       radio | reset | submit | text)" >

<!-- SHORTREF mapping for the tab character -->
<!-- Use of the tab character is deprecated.  However, to facilitate
     the preparation of conforming documents by authors who use it,
     the tab character is tolerated and is mapped into a single space. -->
<!ENTITY   nontab  " " >
<!SHORTREF tabmap  "&#TAB;" nontab >
<!USEMAP   tabmap  HTML >

        <!-- Specify character entity sets defined by W3C -->

<!ENTITY % HTMLlat1    PUBLIC "-//W3C//ENTITIES Full Latin 1//EN//HTML" >
<!ENTITY % HTMLsymbol  PUBLIC "-//W3C//ENTITIES Symbolic//EN//HTML" >
<!ENTITY % HTMLspecial PUBLIC "-//W3C//ENTITIES Special//EN//HTML" >
<!-- Reference character entities -->
%HTMLlat1;%HTMLsymbol;%HTMLspecial;

                <!-- Part 2 - Document structure -->

<!-- Further normative requirements on the elements defined in this part
     of the DTD are provided in Part 3.-->
<!--      ELEMENTS    MIN  CONTENT  (EXCEPTIONS) -->
<!ELEMENT HTML        - -  (HEAD, BODY) >
<!ELEMENT HEAD        - O  (TITLE) +(LINK | META | STYLE) >
<!ELEMENT TITLE       - -  (#PCDATA) -(LINK | META | STYLE) >
<!ELEMENT LINK        - O  EMPTY >
<!ELEMENT META        - O  EMPTY >
<!ELEMENT STYLE       - -  CDATA >

<!-- The following marked section is informative only -->
<![ %Preparation; [
<!ELEMENT Pre-HTML    - -  (HEAD, BODY) >
<!ATTLIST Pre-HTML %i18n;  -- Internationalization DIR and LANG -->
<!ELEMENT BODY        - O  ((%block;)*,(H1,DIV1)* ) +(DEL|INS) >
<!ELEMENT H1          - -  (%text;)+ >
<!ELEMENT DIV1        O O  ((%block;)*, (H2,DIV2)* ) >
<!ELEMENT H2          - -  (%text;)+ >
<!ELEMENT DIV2        O O  ((%block;)*, (H3,DIV3)* ) >
<!ELEMENT H3          - -  (%text;)+ >
<!ELEMENT DIV3        O O  ((%block;)*, (H4,DIV4)* ) >
<!ELEMENT H4          - -  (%text;)+ >
<!ELEMENT DIV4        O O  ((%block;)*, (H5,DIV5)* ) >
<!ELEMENT H5          - -  (%text;)+ >
<!ELEMENT DIV5        O O  ((%block;)*, (H6,DIV6)* ) >
<!ELEMENT H6          - -  (%text;)+ >
<!ELEMENT DIV6        O O  ((%block;)*) >
                 ]]>
<!-- The following marked section is normative -->
<![ %NoPreparation; [
<!ELEMENT BODY        - O  (%block;|H1|H2|H3|H4|H5|H6)+ +(DEL|INS) >
<!ELEMENT (H1|H2|H3|H4|H5|H6) - - (%text;)+ >
                   ]]>
<!ELEMENT DIV         - -  %section.content; >
<!ELEMENT ADDRESS     - -  (%text;)+ -(IMG|OBJECT|MAP) >
<!ELEMENT P           - O  (%text;)+ >
<!ELEMENT (OL|UL)     - -  (LI)+ >
<!ELEMENT LI          - O  (%text; | %block;)+ >
<!ELEMENT DL          - -  (DT|DD)+ >
<!ELEMENT DT          - O  (%text;)+ >
<!ELEMENT DD          - O  %section.content; -(ADDRESS) >
<!ELEMENT PRE         - -  (%text;)+ -(IMG|MAP|OBJECT|SUB|SUP) >
<!ELEMENT BLOCKQUOTE  - -  (%block;)+ >
<!ELEMENT Q           - -  (%text;)+ >
<!ELEMENT FORM        - -  (%block;)+ -(FORM) >

<!-- #PCDATA required to absorb leading white space -->
<!ELEMENT FIELDSET    - -  (#PCDATA,LEGEND,(%block; | %text; | ADDRESS)+)
                            -(FIELDSET) >
<!ELEMENT INPUT       - O  EMPTY >
<!ELEMENT BUTTON      - -  (%text;)+ -(A|FIELDSET|FORM|%form.fields;) >
<!ELEMENT LABEL       - -  (%text;)+ -(LABEL) >
<!ELEMENT LEGEND      - -  (#PCDATA) >
<!ELEMENT SELECT      - -  (OPTGROUP|OPTION)+ >
<!ELEMENT OPTGROUP    - -  (OPTION)+ >
<!ELEMENT OPTION      - O  (#PCDATA) >
<!ELEMENT TEXTAREA    - -  (#PCDATA) >
<!ELEMENT HR          - O  EMPTY >
<!ELEMENT TABLE       - -  (CAPTION?, (COL*|COLGROUP*), 
                              THEAD?, TFOOT?, TBODY+) >
<!ELEMENT CAPTION     - -  (%text;)+ >
<!ELEMENT (THEAD,TFOOT,TBODY) - O  (TR)+ >
<!ELEMENT COL         - O  EMPTY >
<!ELEMENT COLGROUP    - O  (COL)* >
<!ELEMENT TR          - O  (TH|TD)+ >  
<!ELEMENT (TH|TD)     - O  %table.content; >
<!ELEMENT (%logical.styles;|%physical.styles;)
                      - -  (%text;)+ >
<!ELEMENT A           - -  (%text;)* -(A) >
<!ELEMENT IMG         - O  EMPTY >
<!ELEMENT OBJECT      - -  (PARAM | %section.content;)* >
<!ELEMENT PARAM       - O  EMPTY >
<!ELEMENT BR          - O  EMPTY >
<!-- Authors should use the block-level content of the <MAP> element when
     creating accessible documents.  Each region should be specified using
     an <A> element to define its associated link and shape.  User agents
     should render the block-level content of a <MAP> element. -->
<!ELEMENT MAP         - -  ((%block;)|AREA)+ >
<!ELEMENT AREA        - O  EMPTY >
<!ELEMENT SPAN        - -  (%text;)+ >
<!ELEMENT (DEL|INS)   - -  (%text;)+ >
<!ELEMENT BDO         - -  (%text;)+ >

                <!-- Part 3 - Attribute definition lists -->

<!--      ELEMENTS 
    NAME       VALUE       DEFAULT --> 
<!ATTLIST A
  --Case shall not be taken into account when determining a match
    between an ID value and a NAME value, between an ID value and 
    an HREF value or between a NAME value and an HREF value.  
    Comparisons should be made with the values folded to upper case.

    The NAME attribute value specification shall be processed as if the
    declared value were NAME.

    It is recommended that authors of HTML documents specify both ID
    and NAME attributes, and use values restricted to the 40 characters
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ.-_:0123456789".  When both attributes
    are specified, they shall have identical values.

    COORDS shall not be specified if SHAPE has the value `default'.
  --
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG --
    ACCESSKEY  CDATA      #IMPLIED -- Accessibility key character --
    CHARSET    CDATA      #IMPLIED -- Character encoding as per RFC2045 --
    COORDS     CDATA      #IMPLIED -- Comma separated list of values --
    HREF       %URI;      #IMPLIED -- Source anchor is URI of target --
    HREFLANG   NAME       #IMPLIED -- Language code of resource --
    NAME       CDATA      #IMPLIED -- Target anchor --
    REL        CDATA      #IMPLIED -- Forward link types --
    REV        CDATA      #IMPLIED -- Reverse link types --
    SHAPE      %shape;        rect -- Control interpretation of coords --
    TABINDEX   NUMBER     #IMPLIED -- Position in tabbing order --
    TYPE       CDATA      #IMPLIED -- Advisory content type -->

<!ATTLIST ADDRESS
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG -->

<!ATTLIST AREA     
  --One of HREF or NOHREF shall be specified.  

    COORDS shall not be specified if SHAPE has the value `default'.

    Authors are very strongly recommended to provide meaningful ALT 
    attributes to support interoperability with speech-based or text-only 
    agents.  The language and direction of the text provided by the ALT 
    attribute are defined by the containing elements.
  --
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG --
    ACCESSKEY  CDATA      #IMPLIED -- Accessibility key character --
    ALT        CDATA     #REQUIRED -- Description for text-only UAs --
    COORDS     CDATA      #IMPLIED -- Comma separated list of values --
    HREF       %URI;      #IMPLIED -- This region acts as hypertext link --
    NOHREF     (nohref)   #IMPLIED -- This region has no action --
    SHAPE      %shape;        rect -- Control interpretation of coords --
    TABINDEX   NUMBER     #IMPLIED -- Position in tabbing order -->

<!ATTLIST BDO
    %core;                         -- Element CLASS, ID and TITLE --
    DIR        (ltr|rtl) #REQUIRED -- Direction of writing --
    LANG       NAME       #IMPLIED -- RFC1766 language value -->

<!ATTLIST BLOCKQUOTE
  --The contents of the <BLOCKQUOTE> element shall not be surrounded with
    quotation marks.  These may be added by the user agent through the use
    of a style sheet.
  --
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG --
    CITE       %URI;      #IMPLIED -- URI for source document or message -->

<!ATTLIST BODY
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG -->

<!ATTLIST BR
    %core;                         -- Element CLASS, ID and TITLE -->

<!ATTLIST BUTTON
  --The <BUTTON> element shall not contain the <A>, <BUTTON>, <FIELDSET>,
    <FORM>, <INPUT>, <LABEL>, <SELECT> or <TEXTAREA> elements.

    If the <BUTTON> element contains an <IMG> element, the <IMG> shall not
    have an ISMAP or USEMAP attribute.
    
    The TYPE attribute shall be provided, and when the TYPE is
    specified as `submit', the NAME and VALUE attributes shall be provided.

    The NAME attribute is required if the TYPE attribute has the value 
    `submit'.

    If the TYPE attribute has value `reset', and the <BUTTON> is contained 
    in a <FIELDSET>, the reset action is limited to the contents of the 
    <FIELDSET>.

    The VALUE attribute is required if the TYPE attribute has the value
    `submit' and specifies the value to be returned if the button
    is selected.

    The <BUTTON> element should be used only in the content of a <FORM>
    element.
  --
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG --
    ACCESSKEY  CDATA      #IMPLIED -- Accessibility key character --
    DISABLED   (disabled) #IMPLIED -- Control unavailable in this context --
    NAME       CDATA      #IMPLIED -- Required for all except submit, reset -- 
    TABINDEX   NUMBER     #IMPLIED -- Position in tabbing order --
    TYPE  (submit|reset)    submit -- For use as form submit/reset button --
    VALUE      CDATA      #IMPLIED -- Passed to server when submitted -->

<!ATTLIST CAPTION
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG -->

<!ATTLIST COL
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG --
    SPAN       NUMBER            1 -- Number of cols spanned -->

<!ATTLIST COLGROUP
  --The SPAN attribute should only be used if the <COLGROUP> element
    has no content.
  --
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG --
    SPAN       NUMBER            1 -- Number of cols spanned by group -->

<!ATTLIST DD
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG -->

<!ATTLIST DEL
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG --
    CITE       %URI;      #IMPLIED -- Information on reason for change --
    DATETIME   CDATA      #IMPLIED -- When changed, subset of ISO/IEC 8601 -->

<!ATTLIST DIV
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG -->

<!ATTLIST DL
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG -->

<!ATTLIST DT
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG -->

<!ATTLIST FIELDSET
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG -->

<!ATTLIST FORM
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG --
    ACCEPT     CDATA      #IMPLIED -- List of MIME types for file upload --
    ACCEPT-CHARSET CDATA  #IMPLIED -- List of supported char sets --
    ACTION     %URI;     #REQUIRED -- Server-side form handler --
    ENCTYPE    %Content-Type; "application/x-www-form-urlencoded"
    METHOD     %HTTP-Method;   get -- See HTTP specification -->

<!ATTLIST HEAD
    %i18n;                         -- Internationalization DIR and LANG --
    PROFILE    %URI;      #IMPLIED -- Named dictionary of meta info -->

<!ATTLIST HR
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG -->

<!ATTLIST HTML 
    %i18n;                         -- Internationalization DIR and LANG -->

<!ATTLIST (H1 | H2 | H3 | H4 | H5 | H6)
  --The <H1> element shall not be followed by an <H3>, <H4>, <H5> or
    <H6> element without an intervening <H2> element.  

    The <H2> element shall not be followed by an <H4>, <H5> or <H6>
    element without an intervening <H3> element.

    The <H3> element shall not be followed by an <H5> or <H6> element
    without an intervening <H4> element.

    The <H4> element shall not be followed by an <H6> element without an 
    intervening <H5> element.

    An <H2> element shall be preceded by an <H1> element.

    An <H3> element shall be preceded by an <H2> element.

    An <H4> element shall be preceded by an <H3> element.

    An <H5> element shall be preceded by an <H4> element.

    An <H6> element shall be preceded by an <H5> element.
  --
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG -->

<!ATTLIST IMG
  --If the <IMG> element is contained in a <BUTTON> element, the <IMG>
    shall not have an ISMAP or USEMAP attribute.

    If the ISMAP attribute is present in an <IMG> element, that <IMG>
    element shall be contained in an <A> element with an HREF attribute
    present.

    At most one of the attributes ISMAP and USEMAP may be provided.
  --
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG --
    ALT        CDATA     #REQUIRED -- Text for text-only user agent --
    ISMAP      (ismap)    #IMPLIED -- Use server image map --
    LONGDESC   %URI;      #IMPLIED -- Extended description for text UA --
    SRC        %URI;     #REQUIRED -- URI of image to embed --
    USEMAP     %URI;      #IMPLIED -- Use client-side image map -->

<!ATTLIST INPUT
  --If the attribute TYPE has the value `checkbox', values shall be 
    provided for the NAME and VALUE attributes.
  
    If the attribute TYPE has the value `file', a value shall be 
    provided for the NAME attribute; HTML interpreting agents should 
    request user confirmation of any default file names that might 
    be suggested, and fields specifying files shall not be hidden.

    If the attribute TYPE has the value `hidden', values shall be 
    provided for the NAME and VALUE attributes.
  
    If the attribute TYPE has the value `password', a value shall be 
    provided for the NAME attribute.

    If the attribute TYPE has the value `radio', values shall be 
    provided for the NAME and VALUE attributes. At all times, 
    one and only one of the radio buttons shall be checked.  
    Initially, if none of the <INPUT> elements in a set of radio 
    buttons specifies CHECKED, then the user agent shall mark the 
    first radio button of the set as checked.

    If the attribute TYPE has the value `submit', and a value is 
    specified for the VALUE attribute, then a value shall be provided 
    for the NAME attribute.

    If the attribute TYPE has the value `text', values shall be 
    provided for the NAME and VALUE attributes.
  
    The MAXLENGTH and TABINDEX values shall be considered as integers 
    with any leading zeroes ignored.

    The <INPUT> element should be used only in the content of a <FORM>
    element.
  --
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG --
    ACCEPT     CDATA      #IMPLIED -- List of MIME types for file upload --
    ACCESSKEY  CDATA      #IMPLIED -- Accessibility key character --
    CHECKED    (checked)  #IMPLIED -- For radio buttons, checkboxes --
    DISABLED   (disabled) #IMPLIED -- Control unavailable in this context --
    MAXLENGTH  NUMBER     #IMPLIED -- Max chars for text fields --
    NAME       CDATA      #IMPLIED -- Required for all except submit, reset --
    READONLY   (READONLY) #IMPLIED -- For text --
    SIZE       CDATA      #IMPLIED -- Specific to each type of field --
    TABINDEX   NUMBER     #IMPLIED -- Position in tabbing order --
    TYPE       %InputType;    text -- Widget --
    VALUE      CDATA      #IMPLIED -- Required for radio, checkboxes -->

<!ATTLIST INS
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG --
    CITE       %URI;      #IMPLIED -- Information on reason for change --
    DATETIME   CDATA      #IMPLIED -- When changed, subset of ISO/IEC 8601 -->

<!ATTLIST LABEL
  --The <LABEL> element shall refer to a form field in the content of the 
    <FORM> element which contains the <LABEL>.

    The <LABEL> element should be used only in the content of a <FORM>
    element.
  --
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG --
    ACCESSKEY  CDATA      #IMPLIED -- Accessibility key character --
    FOR        IDREF      #IMPLIED -- Points to associated field -->

<!ATTLIST LEGEND
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG --
    ACCESSKEY  CDATA      #IMPLIED -- Accessibility key character -->

<!ATTLIST LI
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG -->

<!ATTLIST LINK
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG --
    CHARSET    CDATA      #IMPLIED -- Character encoding as per RFC2045 --
    HREF       %URI;      #IMPLIED -- URI for link resource --
    HREFLANG   NAME       #IMPLIED -- Language code of resource --
    MEDIA      CDATA      #IMPLIED -- Destination media of referenced doc --
    REL        CDATA      #IMPLIED -- Forward link types --
    REV        CDATA      #IMPLIED -- Reverse link types --
    TYPE       CDATA      #IMPLIED -- Advisory Internet content type -->

<!ATTLIST MAP
  --The value of the NAME attribute is case sensitive, and the attribute 
    value specification shall be processed as if the declared value were 
    NAME.

    It is recommended that authors of HTML documents specify both ID
    and NAME attributes, and use values restricted to the 40 characters
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ.-_:0123456789".  When both attributes
    are specified, they shall have identical values.
  --
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG --
    NAME       CDATA     #REQUIRED -- Referenced by USEMAP in <IMG> -->

<!ATTLIST META     
    %i18n;                         -- Internationalization DIR and LANG --
    CONTENT    CDATA     #REQUIRED -- Associated information --
    HTTP-EQUIV NAME       #IMPLIED -- HTTP response header name --
    NAME       NAME       #IMPLIED -- Meta-information name --
    SCHEME     CDATA      #IMPLIED -- Nature of content -->

<!ATTLIST OBJECT   
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG --
    CLASSID    %URI;      #IMPLIED -- Identifies implementation --
    CODEBASE   %URI;      #IMPLIED -- Needed by some systems --
    CODETYPE   CDATA      #IMPLIED -- Internet content type for code --
    DATA       %URI;      #IMPLIED -- Reference to objects data --
    DECLARE    (declare)  #IMPLIED -- Flag: declare but dont instantiate --
    NAME       CDATA      #IMPLIED -- Submit as part of form --
    STANDBY    CDATA      #IMPLIED -- Show this msg while loading --
    TABINDEX   NUMBER     #IMPLIED -- Position in tabbing order --
    TYPE       CDATA      #IMPLIED -- Internet content type for data --
    USEMAP     %URI;      #IMPLIED -- Reference to image map -->

<!ATTLIST OL
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG -->

<!ATTLIST OPTGROUP
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG --
    DISABLED   (disabled) #IMPLIED -- Control unavailable in this context --
    LABEL      CDATA     #REQUIRED -- For use in hierarchical menus -->

<!ATTLIST OPTION
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG --
    DISABLED   (disabled) #IMPLIED -- Control unavailable in this context --
    LABEL      CDATA      #IMPLIED -- For use in hierarchical menus --
    SELECTED   (selected) #IMPLIED -- Pre-selected option --
    VALUE      CDATA      #IMPLIED -- Defaults to content -->

<!ATTLIST P
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG -->

<!ATTLIST PARAM
    ID         ID         #IMPLIED -- Document-wide unique id --
    NAME       CDATA     #REQUIRED -- Name of parameter --
    TYPE       CDATA      #IMPLIED -- Internet Media Type --
    VALUE      CDATA      #IMPLIED -- Value of parameter --
    VALUETYPE  (data|ref|object)
                              data -- Interpret value as -->

<!ATTLIST PRE
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG -->

<!ATTLIST Q
  --The textual contents of the <Q> element shall not be surrounded with
    quotation marks.  These may be added by the user agent through the
    use of a style sheet.
  --
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG --
    CITE       %URI;      #IMPLIED -- URI for source document or message -->

<!ATTLIST SELECT
  --The <SELECT> element should be used only in the content of a <FORM>
    element.
  --
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG --
    DISABLED   (disabled) #IMPLIED -- Control unavailable in this context --
    MULTIPLE   (multiple) #IMPLIED -- Default is single selection --
    NAME       CDATA     #REQUIRED -- Field name --
    SIZE       NUMBER     #IMPLIED -- Rows visible --
    TABINDEX   NUMBER     #IMPLIED -- Position in tabbing order -->

<!ATTLIST SPAN
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG -->

<!ATTLIST STYLE
  --The <STYLE> element contains style sheet information which shall be
    passed to the user agent's style manager.  Any style sheet language
    may be used.  It is a user agent error to render the style sheet 
    information as if it were part of a document's text.
  --
    %i18n;                         -- Internationalization DIR and LANG --
    MEDIA      CDATA      #IMPLIED -- Designed for use with these media --
    TITLE      CDATA      #IMPLIED -- Advisory title --
    TYPE       CDATA     #REQUIRED -- Internet content type for style lang. -->

<!ATTLIST TABLE
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG --
    SUMMARY    CDATA     #REQUIRED -- Purpose/structure for speech output -->

<!ATTLIST TBODY
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG -->

<!ATTLIST TD
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG --
    ABBR       CDATA      #IMPLIED -- Abbreviation for header cell --
    AXIS       CDATA      #IMPLIED -- Names groups of related headers --
    COLSPAN    NUMBER            1 -- Number of columns spanned by cell --
    HEADERS    IDREFS     #IMPLIED -- List of ID's for header cells --
    ROWSPAN    NUMBER            1 -- Number of rows spanned by cell --
    SCOPE      (col|colgroup|row|rowgroup)
                          #IMPLIED -- Scope covered by header cells -->

<!ATTLIST TEXTAREA
  --The <TEXTAREA> element should be used only in the content of a <FORM>
    element.
  --
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG --
    ACCESSKEY  CDATA      #IMPLIED -- Accessibility key character --
    COLS       NUMBER    #REQUIRED -- Number required in av char widths --
    DISABLED   (disabled) #IMPLIED -- Control unavailable in this context --
    NAME       CDATA     #REQUIRED -- Name of form field --
    READONLY   (readonly) #IMPLIED -- For text --
    ROWS       NUMBER    #REQUIRED -- Number of rows required --
    TABINDEX   NUMBER     #IMPLIED -- Position in tabbing order -->

<!ATTLIST TFOOT
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG -->

<!ATTLIST TH
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG --
    ABBR       CDATA      #IMPLIED -- Abbreviation for header cell --
    AXIS       CDATA      #IMPLIED -- Names groups of related headers --
    COLSPAN    NUMBER            1 -- Number of columns spanned by cell --
    HEADERS    IDREFS     #IMPLIED -- List of ID's for header cells --
    ROWSPAN    NUMBER            1 -- Number of rows spanned by cell --
    SCOPE      (col|colgroup|row|rowgroup)
                          #IMPLIED -- Scope covered by header cells -->

<!ATTLIST THEAD
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG -->

<!ATTLIST TITLE
    %i18n;                         -- Internationalization DIR and LANG -->

<!ATTLIST TR
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG -->

<!ATTLIST UL
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG -->

        <!-- Attribute group definition lists -->

<!ATTLIST (%physical.styles;)
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG -->

<!ATTLIST (%logical.styles;)
    %core;                         -- Element CLASS, ID and TITLE --
    %i18n;                         -- Internationalization DIR and LANG -->

<!-- End of file -->

Annex C

Maintenance of the International Standard — Defect report index

Every effort has been made to provide a language specification that is correct and rigorously specified. However, since change is inevitable, facilities have been provided to manage the maintenance of this text.

Error notifications should be made via your national body or via a liaison organization such as the World Wide Web Consortium.

The defects in reports 1 through 6 have been corrected by Technical Corrigendum 1. The remaining defect reports are working documents for use by JTC1/SC34 and the editors of the International Standard. They should be considered as Work in Progress and should not be used for reference.

Defects are corrected following the procedure for "rapid promulgation" [JTC1 14.4.2.3] specified in clauses 14.4.3 through 14.4.10 of the JTC1 Directives.

NOTE: We present defects in a style based on form G17 in the JTC1 Directives.

C.1   Defect report 1   Latin alphabet No. 1 entity set public text description

Part 1 - To be completed by the WG secretariat

Defect report number: DR 15445/001

WG Secretariat: Project editors

Date circulated by WG Secretariat: 2000-12-10

Deadline for response from editor: 2000-12-10

Part 2 - To be completed by the submitter

Submitter: W3C

For review by: JTC1/SC34/WG3 members

Defect report concerning: ISO/IEC 15445:2000 HyperText Markup Language (HTML)

Qualifier: Omission

References:

  1. E-mail from Gerald Oskoboiny, 2000-10-18
  2. https://rogerprice.org/15445/15445.html#dtd
  3. http://www.w3.org/TR/html401/sgml/entities.html#h-24.2
  4. http://www.w3.org/TR/html401/appendix/changes.html#h-A.2.1

Nature of defect: The formal public identifier -//W3C//ENTITIES Full Latin 1//EN//HTML used by ISO/IEC 15445:2000 for the ISO Latin alphabet No. 1 entities contains the public text description `Full Latin 1' and not `Latin 1' or `Latin1' as used by the W3C Recommendations for HTML 4.0 and 4.01.

Solution proposed by submitter: Allow a range of formal public identifiers in the catalog file.

Part 3 - Editor's response

This is a technical defect in the International Standard. We recommend accepting the submitter's proposal. See the new text introduced into the International Standard (highlighted in yellow), a description of the proposed solution and the required Technical Corrigendum.
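
The following Python sketch illustrates what "a range of formal public identifiers" could look like to a processing system. It is not part of the International Standard. The file name latin1.ent and the helper names are hypothetical, and the two alternative identifiers are assumed to be formed by substituting the public text descriptions `Latin 1' and `Latin1' into the identifier quoted above.

```python
from typing import NamedTuple, Optional


class FPI(NamedTuple):
    """The parts of a formal public identifier."""
    owner: str
    text_class: str
    description: str
    language: str
    version: Optional[str]


def parse_fpi(fpi: str) -> FPI:
    """Split a formal public identifier into its '//'-separated parts."""
    parts = fpi.split("//")
    # A leading "-" or "+" marks an unregistered or registered owner.
    if parts[0] in ("-", "+"):
        parts = parts[1:]
    owner, text, language = parts[0], parts[1], parts[2]
    version = parts[3] if len(parts) > 3 else None
    # The public text class is the first word; the rest is the description.
    text_class, _, description = text.partition(" ")
    return FPI(owner, text_class, description, language, version)


# Assumption: these three descriptions are all accepted for the same entity set.
ACCEPTED_DESCRIPTIONS = ("Full Latin 1", "Latin 1", "Latin1")

# Hypothetical catalog: every accepted identifier maps to the same file.
CATALOG = {
    f"-//W3C//ENTITIES {description}//EN//HTML": "latin1.ent"
    for description in ACCEPTED_DESCRIPTIONS
}


def resolve(fpi: str) -> Optional[str]:
    """Return the file for a public identifier, or None if it is unknown."""
    return CATALOG.get(fpi)
```

With such a table, a document that uses any of the three identifiers resolves to the same entity set, which is the effect the submitter asks for.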

C.2   Defect report 2   Symbols entity set public text description

Part 1 - To be completed by the WG secretariat

Defect report number: DR 15445/002

WG Secretariat: Project editors

Date circulated by WG Secretariat: 2000-12-10

Deadline for response from editor: 2000-12-10

Part 2 - To be completed by the submitter

Submitter: W3C

For review by: JTC1/SC34/WG3 members

Defect report concerning: ISO/IEC 15445:2000 HyperText Markup Language (HTML)

Qualifier: Omission

References:

  1. E-mail from Gerald Oskoboiny, 2000-10-18
  2. https://rogerprice.org/15445/15445.html#dtd
  3. http://www.w3.org/TR/html401/sgml/entities.html#h-24.2
  4. http://www.w3.org/TR/html401/appendix/changes.html#h-A.2.1

Nature of defect: The formal public identifier -//W3C//ENTITIES Symbolic//EN//HTML used by ISO/IEC 15445:2000 for the symbol entities contains the public text description `Symbolic' and not `Symbols' as amended in the W3C Recommendation for HTML 4.01.

Solution proposed by submitter: Allow a range of formal public identifiers in the catalog file.

Part 3 - Editor's response

This is a technical defect in the International Standard. We recommend accepting the submitter's proposal. See the new text introduced into the International Standard (highlighted in yellow), a description of the proposed solution and the required Technical Corrigendum.

C.3   Defect report 3   Simultaneous ID and NAME attributes

Part 1 - To be completed by the WG secretariat

Defect report number: DR 15445/003

WG Secretariat: Project editors

Date circulated by WG Secretariat: 2000-12-10

Deadline for response from editor: 2000-12-10

Part 2 - To be completed by the submitter

Submitter: Project editors

For review by: JTC1/SC34/WG3 members

Defect report concerning: ISO/IEC 15445:2000 HyperText Markup Language (HTML)

Qualifier: Omission

References:

  1. http://www.w3.org/TR/html401/appendix/changes.html#h-A.1 `Changes between 24 April 1998 HTML 4.0 and 24 December 1999 HTML 4.01 versions'

Nature of defect: Subclause 12.2.3 Anchors with the id attribute of the W3C Recommendation for HTML 4.01 now specifies that it is legal for attributes ID and NAME to appear in the same start tag when they are both defined for an element, and that they must have identical values. ISO/IEC 15445:2000 first edition recommended use of the ID attribute but required that the ID and NAME values be distinct [Annex B, part 1, parameter entity core]. Note that the W3C Recommendation for HTML 4.01 permits use of both attributes to specify an element's unique identifier for the elements: <A> [W3C 12.2], <APPLET> [W3C 13.4], <FORM> [W3C 17.3], <FRAME> [W3C 16.2.2], <IFRAME> [W3C 16.5], <IMG> [W3C 13.2] and <MAP> [W3C 13.6.1], but of these, <APPLET> [W3C 13.4], <FRAME> [W3C 16.2.2] and <IFRAME> [W3C 16.5] are excluded from the International Standard, and <FORM> [W3C 17.3] and <IMG> [W3C 13.2] have no NAME attribute.

Solution proposed by submitter: Change the corresponding normative text in the ISO-HTML DTD to allow attributes NAME and ID to appear in the same start tag when they are both defined for an element, and require that they have identical values.

Part 3 - Editor's response

This is a technical defect in the International Standard. We recommend accepting the submitter's proposal. See the new text introduced into the International Standard, the two short descriptions of the proposed solutions, and the required Technical Corrigendum.

C.4   Defect report 4   Accessibility of client side maps

Part 1 - To be completed by the WG secretariat

Defect report number: DR 15445/004

WG Secretariat: Project editors

Date circulated by WG Secretariat: 2000-12-10

Deadline for response from editor: 2001-01-31

Part 2 - To be completed by the submitter

Submitter: Project editors

For review by: JTC1/SC34/WG3 members

Defect report concerning: ISO/IEC 15445:2000 HyperText Markup Language (HTML)

Qualifier: Omission

References:

  1. http://www.w3.org/TR/html401/appendix/changes.html#h-A.1 `Changes between 24 April 1998 HTML 4.0 and 24 December 1999 HTML 4.01 versions'
  2. E-mail from Steven Pemberton, Chair, W3C HTML WG, 2000-12-20

Nature of defect: Subclause 13.6.1 Client-side image maps of the W3C Recommendation for HTML 4.01 introduces an extended mixed content model for the <MAP> [W3C 13.6.1] element type ((%block;) | AREA)+ which allows %block; elements in addition to <AREA> [W3C 13.6.1] elements, and recommends rendering the block-level content to improve accessibility. ISO/IEC 15445:2000 provides only <AREA> [W3C 13.6.1] elements.

Solution proposed by submitter:

Part 3 - Editor's response

The W3C HTML WG have advised us that

We recommend that ISO/IEC 15445:2000 provide the same support for accessibility as the W3C Recommendation for HTML 4.01, by extending the <MAP> [W3C 13.6.1] element type content model to ((%block;) | AREA)+ and adding SHAPE and COORDS attributes to <A> [W3C 12.2]. Note that the restricted definition of the %block; parameter entity in ISO/IEC 15445:2000 prevents %heading; and <ADDRESS> [W3C 7.5.6] elements appearing in a client side map.

C.5   Defect report 5   HTML 4.01

Part 1 - To be completed by the WG secretariat

Defect report number: DR 15445/005

WG Secretariat: Project editors

Date circulated by WG Secretariat: 2000-12-10

Deadline for response from editor: 2000-12-10

Part 2 - To be completed by the submitter

Submitter: Project editors

For review by: JTC1/SC34/WG3 members

Defect report concerning: ISO/IEC 15445:2000 HyperText Markup Language (HTML)

Qualifier: Omission

References:

  1. http://www.w3.org/TR/html401/appendix/changes.html#h-A.1 `Changes between 24 April 1998 HTML 4.0 and 24 December 1999 HTML 4.01 versions'

Nature of defect: The International Standard refers to HTML 4.0 `as amended by the W3C errata'; however, the W3C have made HTML 4.01 the specification of the `HTML 4' language, and there are now no W3C errata. References in the International Standard to the W3C errata are now incorrect.

Solution proposed by submitter: Make the W3C Recommendation for HTML 4.01 the reference text.

Part 3 - Editor's response

This is a technical defect in the International Standard. We recommend accepting the submitter's proposal. See the required Technical Corrigendum.

C.6   Defect report 6   FORM content model

Part 1 - To be completed by the WG secretariat

Defect report number: DR 15445/006

WG Secretariat: Project editors

Date circulated by WG Secretariat: 2001-01-15

Deadline for response from editor: 2001-01-31

Part 2 - To be completed by the submitter

Submitter: Project editors

For review by: JTC1/SC34/WG3 members

Defect report concerning: ISO/IEC 15445:2000 HyperText Markup Language (HTML)

Qualifier: Omission

References:

  1. Specification of the HTML 4 <FORM> [W3C 17.3] element type.
  2. E-mail from Nicolas Lesbats, Technical University of Compiègne, 2001-01-12.

Nature of defect: The <FORM> [W3C 17.3] element type specified by W3C HTML 4 has the content model (%block;|SCRIPT)+. The content model for the same element type defined by ISO/IEC 15445:2000 is (%block; | %text; | %form.fields; | ADDRESS)+ which allows text content. This `generosity' allows authors to create documents which conform to ISO/IEC 15445 but do not conform to W3C HTML 4. This is a defect since all documents which conform to ISO/IEC 15445 should also conform to the W3C Recommendation for HTML 4.01.

Solution proposed by submitter: Make the following changes to the ISO/IEC 15445 DTD:

  1. Reduce the content model of the <FORM> [W3C 17.3] element type to (%block;)+ to obtain:
    <!ELEMENT FORM - - (%block;)+ -(FORM) >
  2. Extend the definition of inline, `text level' elements to
    <!ENTITY % text '#PCDATA | %physical.styles; | %logical.styles; | %special;
                             | %form.fields;' >
  3. Reduce the declaration of the <FIELDSET> [W3C 17.10] element type to
    <!ELEMENT FIELDSET - - (#PCDATA,LEGEND,(%block; | %text; | ADDRESS)+)
                                 -(FIELDSET) >
  4. Remove the parameter entity reference %form.fields; from the declaration of the element type <LABEL> [W3C 17.9.1] to obtain:
    <!ELEMENT LABEL - - (%text;)+ -(LABEL) >
  5. Remove the declaration of the parameter entity %form.content;.

Part 3 - Editor's response

This is a technical defect in the International Standard. We recommend accepting the submitter's proposal. See the required Technical Corrigendum.

C.7   Defect report 7   Case folding of ID, NAME and HREF attribute values

Part 1 - To be completed by the WG secretariat

Defect report number: DR 15445/007

WG Secretariat: Project editors

Date circulated by WG Secretariat: tba

Deadline for response from editor: tba

Part 2 - To be completed by the submitter

Submitter: Project editors

For review by: JTC1/SC34/WG3 members

Defect report concerning: ISO/IEC 15445:2000 HyperText Markup Language (HTML)

Qualifier: Omission

References:

  1. The DTD for IS 15445, element types A and MAP.

Nature of defect: The International Standard identifies the case folding contradiction and says that "case must not be taken into account", but does not say what is required of the authors.

Solution proposed by submitter: Add text to the DTD to recommend that authors satisfy the competing requirements of SGML and the W3C Recommendation for HTML 4.01 by restricting themselves to the 40 characters "ABCDEFGHIJKLMNOPQRSTUVWXYZ.-_:0123456789" for ID and NAME values, and for the corresponding HREF values.

Part 3 - Editor's response

This is a technical defect in the International Standard. We recommend accepting the submitter's proposal. See the proposed Draft Technical Corrigendum.
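
An author who wishes to follow this recommendation could check ID and NAME values mechanically. The Python sketch below is an illustration only: the function names are hypothetical, and the requirement that a value begin with a letter is an assumption based on the SGML rules for names rather than on the text of this defect report. The helper also reports the case where ID and NAME are both specified but differ.

```python
import re

# The 40 characters recommended for ID and NAME values.
ALLOWED = "ABCDEFGHIJKLMNOPQRSTUVWXYZ.-_:0123456789"

# Assumption: as an SGML name, the value must also start with a letter.
_SAFE_NAME = re.compile(r"[A-Z][A-Z0-9.\-_:]*\Z")


def is_case_safe(value: str) -> bool:
    """Return True if value uses only the 40 recommended characters."""
    return bool(_SAFE_NAME.match(value))


def check_anchor(id_value, name_value):
    """Return a list of problems for an element carrying ID and/or NAME.

    Either argument may be None when the attribute is not specified.
    """
    problems = []
    for label, value in (("ID", id_value), ("NAME", name_value)):
        if value is not None and not is_case_safe(value):
            problems.append(f"{label} value {value!r} is not case-safe")
    if id_value is not None and name_value is not None and id_value != name_value:
        problems.append("ID and NAME differ")
    return problems
```

A value such as SECTION-2.1 passes the check, whereas Section2 does not, since its lower case letters would be affected by case folding.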

C.8   Defect report 8   Element type INPUT with attribute TYPE=reset

Part 1 - To be completed by the WG secretariat

Defect report number: DR 15445/008

WG Secretariat: Project editors

Date circulated by WG Secretariat: tba

Deadline for response from editor: tba

Part 2 - To be completed by the submitter

Submitter: Edward Welbourne

For review by: JTC1/SC34/WG3 members

Defect report concerning: ISO/IEC 15445:2000 HyperText Markup Language (HTML)

Qualifier: Omission

References:

  1. The DTD for IS 15445, element type INPUT with attribute TYPE=reset.
  2. E-mail from Edward Welbourne, Mon, 28 Oct 2002.

Nature of defect: When a <BUTTON> [W3C 17.5] has attribute TYPE=reset, its effects are limited by any enclosing <FIELDSET> [W3C 17.10]; but when an <INPUT> [W3C 17.4] has attribute TYPE=reset, which should have the same effect, there is no statement in the International Standard of the limitation due to an enclosing <FIELDSET> [W3C 17.10].

Solution proposed by submitter: Add text to the DTD to state the limitation.

Part 3 - Editor's response

This is an omission in the International Standard. We recommend accepting the submitter's proposal. See the proposed Draft Technical Corrigendum.

C.9   Defect report 9   Alternative syntax for architectural support declaration

Part 1 - To be completed by the WG secretariat

Defect report number: DR 15445/009

WG Secretariat: Project editors

Date circulated by WG Secretariat: tba

Deadline for response from editor: tba

Part 2 - To be completed by the submitter

Submitter: Russell O'Connor

For review by: JTC1/SC34/WG3 members

Defect report concerning: ISO/IEC 15445:2000 HyperText Markup Language (HTML)

Qualifier: Omission

References:

  1. Clause 9.2 Architectural support declaration
  2. E-mail from Russell O'Connor, Mon, 20 Jan 2003.
  3. E-mail from W. Eliot Kimber, Fri, 07 Feb 2003.

Nature of defect: Clause 9.2 provides an architectural support declaration using a PI-based syntax. However, ISO/IEC 10744:1997 (HyTime) in Annex A.3 provides a different syntax based on an attribute definition list declaration. The International Standard offers no explanation for this discrepancy.

Solution proposed by submitter: Add the second syntax.

Part 3 - Editor's response

The two syntaxes are both valid, but the PI-based syntax has not yet been published. We recommend accepting the submitter's proposal. See the proposed Draft Technical Corrigendum.

The alternative syntax proposed by the submitter is used in the production of the International Standard and the User's Guide.

Bibliography

[1]   "Hypertext Markup Language - 2.0". T. Berners-Lee, D. Connolly. IETF RFC1866, November 1995. Category: Standards Track. http://www.ietf.org/rfc/rfc1866.txt

[2]   "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", N. Freed, N. Borenstein. IETF RFC2046, November 1996. Category: Standards Track. Obsoletes: 1521, 1522, 1590. http://www.ietf.org/rfc/rfc2046.txt

User's Guide Bibliography

There is an excellent online bibliography by Robin Cover for SGML and XML topics. Detailed references for international standards are available at the ISO's WWW site and details of the ISO/IEC JTC1 programme of work are available at the JTC1 WWW site. W3C documents will be found at the W3C site. The IETF RFCs will be found at the IETF WWW site. A bibliography of `Dublin Core Relevant Publications' is available.

[3]   H. Alvestrand. Tags for the Identification of Languages Internet Engineering Task Force, March 1995. RFC1766

[4]   Tim Berners-Lee, R. Fielding and L. Masinter. Uniform Resource Identifiers (URI): Generic Syntax Internet Engineering Task Force, August 1998. RFC2396

[5]   Tim Berners-Lee, Daniel Connolly. Hypertext Markup Language—2.0 Internet Engineering Task Force RFC1866 1995.

[6]   Tim Bray, Jean Paoli and C.M. Sperberg-McQueen. Extensible Markup Language (XML) 1.0 World Wide Web Consortium REC-xml-19980210, 1998.

[7]   N. Freed, N. Borenstein. Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies Internet Engineering Task Force, December 2nd, 1996. RFC2045

[8]   Roy T. Fielding, Jim Gettys, Jeffrey C. Mogul, Henrik Frystyk Nielsen, Tim Berners-Lee. Hypertext Transfer Protocol—HTTP/1.1 Internet Engineering Task Force, 1997 RFC2068.

[9]   Charles F. Goldfarb. The SGML Handbook First edition. Oxford University Press, 1990. ISBN 0-19-853737-9.

[10]   Barbara F. Grimes. Ethnologue, Languages of the World 12th edition. Summer Institute of Linguistics, Dallas 1992

[11]   Paul Grosso, Ed., Entity Management. OASIS Technical Resolution 9401:1997 (Amendment 2 to TR 9401), Organization for the Advancement of Structured Information Standards (OASIS), 1997 September 10.

[12]   ISO/IEC Joint Technical Committee 1. JTC1 Directives: Procedures for the technical work of ISO/IEC JTC1 1999-09-23

[13]   Rohit Khare and Scott Lawrence, HTTP Upgrade to TLS, Internet Engineering Task Force, May 2000, RFC 2817.

[14]   Håkon W. Lie, Bert Bos. Cascading Style Sheets, level 1 World Wide Web Consortium, 1996. REC-CSS1-961217

[15]   Ernesto Nebel, Larry Masinter. Form-based File Upload in HTML Internet Engineering Task Force, November 1995. RFC1867

[16]   Steven Pemberton and others. XHTMLTM 1.0: The Extensible HyperText Markup Language: A Reformulation of HTML 4 in XML 1.0. World Wide Web Consortium. REC-xhtml1-20000126, 2000.

[17]   Dave Raggett, Arnaud Le Hors, Ian Jacobs. HTML 4.0 Specification W3C Recommendation REC-html40-971218, 18-Dec-1997. World Wide Web Consortium.

[18]   Dave Raggett, Arnaud Le Hors, Ian Jacobs. HTML 4.01 Specification W3C Recommendation REC-html401-19991224, 24-Dec-1999. World Wide Web Consortium.

[19]   Dave Raggett, Charlie Kindel, Lou Montulli, Eric Sink, Wayne Gramlich, Jonathan Hirschman, Tim Berners-Lee, Dan Connolly. Inserting objects into HTML. (work in progress) World Wide Web Consortium, 1996. WD-object-960422

[20]   Dave Raggett. HTML Tables Internet Engineering Task Force, May 1996. RFC1942

[21]   Registered Internet MIME types

[22]   J. Reynolds, Jon Postel. Assigned Numbers Internet Engineering Task Force, October 1994. RFC1700

[23]   Norman Walsh and Leonard Muellner, DocBook: The Definitive Guide O'Reilly & Associates, Inc., published version 2.0.8, 2003-01-02. ISBN 1-56592-580-7.

[24]   François Yergeau, Gavin Nicol, Glenn Adams, Martin Dürst. Internationalization of the Hypertext Markup Language Internet Engineering Task Force, January 1997. RFC2070

Long descriptions

Long description of terms used for character representation

The figure illustrates the different ways of referring to a character. Each character is given a name such as "CAPITAL LETTER E WITH GRAVE ACCENT", and the characters are placed in an ordered set known as the character repertoire. The elements (the characters) of the set are assigned decimal numbers 0, 1, 2, 3, and so on. The decimal number for a character is called the code position, and the code position for "CAPITAL LETTER E WITH GRAVE ACCENT" is 200. SGML calls these decimal numbers the character numbers.

The function from "code position" to "character name" is called the coded character set by RFC 1866. The second column in the figure shows the code position as a hexadecimal value which represents a binary pattern. The ordered set of binary patterns is called the code set by SGML.

The 1 to 1 relation between the binary pattern and the character name is called the coded character set by ISO 8859-1. The function from name to pattern is called character set by SGML and the function from pattern to name is called character encoding scheme by RFC 1866.

To facilitate entry of characters not on a keyboard, entity sets such as "ISO latin 1" provide entities for accented characters. The "CAPITAL LETTER E WITH GRAVE ACCENT" may be entered as &Egrave;. A character may also be entered using its decimal code position in the form of a numeric character reference, such as &#200;. The figure also provides in the final column an approximation for the printed glyph.

NOTE: An interesting use of numeric character references is to obfuscate the markup of an e-mail address in a web page, so that it is not harvested by spam-bots.
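
The relations described above can be tried out with the Python standard library. The sketch below is an illustration only and the function names are hypothetical. Note that Python reports the Unicode character name, "LATIN CAPITAL LETTER E WITH GRAVE", which is worded slightly differently from the name used in the figure.

```python
import html
import unicodedata


def describe(ch: str) -> dict:
    """Collect the different ways of referring to one character."""
    return {
        "name": unicodedata.name(ch),             # Unicode character name
        "code_position": ord(ch),                 # decimal character number
        "hex": f"{ord(ch):02X}",                  # the same value in hexadecimal
        "latin1_bytes": ch.encode("iso-8859-1"),  # binary pattern in ISO 8859-1
        "numeric_reference": f"&#{ord(ch)};",     # numeric character reference
    }


def obfuscate(text: str) -> str:
    """Replace every character with its decimal numeric character reference."""
    return "".join(f"&#{ord(c)};" for c in text)


# The entity reference and the numeric reference denote the same character.
e_grave = html.unescape("&Egrave;")
same = html.unescape("&#200;") == e_grave
info = describe(e_grave)
```

The function obfuscate illustrates the use mentioned in the NOTE: a user agent decodes the references back to the original address, while the address does not appear literally in the markup.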

Long description of progressive nesting of sections

The figure illustrates the progressive nesting of sections. The model is one of geographic entities containing one another. The sections have a rank: an <H1> is called a continent, an <H2> is called a country, an <H3> is called a province, an <H4> is called a city, and so on. The idea is that a province may contain a city but not the other way around. The nesting must also be progressive, i.e. if a continent contains a province, there must be an intermediate country.

An <H1> continent may contain more than one <H2> country, and an <H2> country may contain more than one <H3> province.
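
The progressive nesting rule can be expressed as a simple check on the sequence of heading ranks in a document. The Python sketch below is an illustration only: the function name is hypothetical, and it assumes that the first heading of a document must be an <H1>.

```python
def check_progressive(ranks):
    """Return the indexes of headings that skip an intermediate rank.

    ranks is the sequence of heading ranks in document order,
    for example [1, 2, 3] for <H1>, <H2>, <H3>.
    """
    errors = []
    # Assumption: rank 0 stands for the document itself,
    # so the first heading must be an <H1>.
    previous = 0
    for index, rank in enumerate(ranks):
        # A heading may go at most one rank deeper than the one before it.
        # Returning to any shallower rank is always allowed.
        if rank > previous + 1:
            errors.append(index)
        previous = rank
    return errors
```

For the ranks [1, 2, 3, 4, 2, 3] the result is an empty list, whereas for [1, 3] the heading at index 1 is reported because a continent would contain a province with no intermediate country.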

Long description of figure which has a map

The figure shows a graphic containing a row of five equal circles. The circles are inscribed with regular polygons: a triangle, a square, a hexagon (6 sided), a decagon (10 sided) and a duodecagon (12 sided). Clicking on one of the polygons leads to a text giving a formula for the surface area.

Long description of figure for internal subset

The figure shows a "document in preparation" which contains:

The figure shows a piece of the catalogue which associates the entity "legal" with the file legal.txt. The figure shows that the file legal.txt contains a declaration of the general entity "fineprint". The reference to parameter entity "legal" has the effect of declaring the entity "fineprint", which is then available for reference in the body of the document.

NOTE: The legal text is in such a small font that it is impossible to read it.


ICS 35.240.30

Price of the International Standard is based on a printed size of 20 pages.


PURL: http://purl.org/NET/ISO+IEC.15445/Users-Guide.html   Last change was on 2024-06-19Z15:56:35 UTC