Misplaced Pages

Standard Generalized Markup Language

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.

The Standard Generalized Markup Language (SGML; ISO 8879:1986) is a standard for defining generalized markup languages for documents. ISO 8879 Annex A.1 states that generalized markup is "based on two postulates":


83-486: DocBook SGML and LinuxDoc are examples which used SGML tools. SGML is an ISO standard: "ISO 8879:1986 Information processing – Text and office systems – Standard Generalized Markup Language (SGML)", of which there are three versions: SGML is part of a trio of enabling ISO standards for electronic documents developed by ISO/IEC JTC 1/SC 34 (ISO/IEC Joint Technical Committee 1, Subcommittee 34 – Document description and processing languages): SGML

166-539: A numeric character reference. Consider the Chinese character "中", whose numeric code in Unicode is hexadecimal 4E2D, or decimal 20,013. A user whose keyboard offers no method for entering this character could still insert it in an XML document encoded either as &#20013; or &#x4e2d;. Similarly, the string "I <3 Jörg" could be encoded for inclusion in an XML document as I &lt;3 J&#xF6;rg. &#0;
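The references described above can be checked with any conforming parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the element names `c` and `m` are placeholders chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal references to U+4E2D decode to the same character.
decimal = ET.fromstring("<c>&#20013;</c>").text
hexadecimal = ET.fromstring("<c>&#x4e2d;</c>").text

# "<" has to be escaped as &lt;, and "ö" may be written as a numeric reference.
message = ET.fromstring("<m>I &lt;3 J&#xF6;rg</m>").text

print(decimal, hexadecimal, message)
```

After parsing, the application sees ordinary characters; it cannot tell whether the author typed the character directly or used a reference.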

249-475: A document type declaration associated with any of the instances. Note: If there is a document type declaration, the instance can be parsed with or without reference to it. Tag-validity was introduced in SGML (ENR+WWW) to support XML, which allows documents with no DOCTYPE declaration but which can be parsed without a grammar, or documents which have a DOCTYPE declaration that makes no XML Infoset contributions to

332-458: A document, such as whether a document instance is integrally-stored or free of entity references. A type-valid SGML document is defined by the standard as An SGML document in which, for each document instance, there is an associated document type declaration (DTD) to whose DTD that instance conforms. A tag-valid SGML document is defined by the standard as An SGML document, all of whose document instances are fully tagged. There need not be

415-605: A joint project of HAL Computer Systems and O'Reilly & Associates and eventually spawned its own maintenance organization (the Davenport Group) before moving in 1998 to the SGML Open consortium, which subsequently became OASIS. DocBook is currently maintained by the DocBook Technical Committee at OASIS. DocBook is available in both SGML and XML forms, as a DTD. RELAX NG and W3C XML Schema forms of

498-501: A key group of software companies used DocBook since their representatives were involved in its initial design. Eventually, however, DocBook was adopted by the open source community where it has become a standard for creating documentation for many projects, including FreeBSD, KDE, GNOME desktop documentation, the GTK+ API references, the Linux kernel documentation (which, as of July 2016,

581-448: A list of syntax rules provided in the specification. Some key points in the fairly lengthy list include: The definition of an XML document excludes texts that contain violations of well-formedness rules; they are simply not XML. An XML processor that encounters such a violation is required to report such errors and to cease normal processing. This policy, occasionally referred to as "draconian error handling", stands in notable contrast to
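The "report and cease normal processing" policy can be observed directly. This is a small sketch with Python's standard `xml.etree.ElementTree`; the helper name `parse_or_report` is hypothetical and exists only for this illustration.

```python
import xml.etree.ElementTree as ET

def parse_or_report(text):
    """Return (root, None) for well-formed input, or (None, message) otherwise."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as err:
        # The parser stops at the first well-formedness violation.
        return None, str(err)

good, _ = parse_or_report("<p>ok</p>")
bad, error = parse_or_report("<p><b>unclosed</p>")
print(good.text, bad, error)
```

No partial tree is returned for the second input; the mismatched tag is treated as fatal rather than repaired.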

664-522: A mechanism whereby an XML processor can reliably, without any prior knowledge, determine which encoding is being used. Encodings other than UTF-8 and UTF-16 are not necessarily recognized by every XML parser (and in some cases not even UTF-16, even though the standard mandates it to also be recognized). XML provides escape facilities for including characters that are problematic to include directly. For example: There are five predefined entities: All permitted Unicode characters may be represented with
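The five predefined entities are `&lt;`, `&gt;`, `&amp;`, `&apos;` and `&quot;`. As a rough sketch, Python's standard `xml.sax.saxutils` helpers escape the first three by default and accept a mapping for the two quote entities; the sample string and the `QUOTES` name are assumptions made for illustration.

```python
from xml.sax.saxutils import escape, unescape

# escape() handles &, < and > itself; the quote entities are passed explicitly.
QUOTES = {'"': "&quot;", "'": "&apos;"}

raw = "AT&T says \"x < y\" & 'y > x'"
escaped = escape(raw, QUOTES)

# unescape() takes the inverse mapping to restore the original text.
restored = unescape(escaped, {entity: char for char, entity in QUOTES.items()})

print(escaped)
print(restored == raw)
```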

747-546: A more compact non-XML syntax; the two syntaxes are isomorphic and James Clark's conversion tool, Trang, can convert between them without loss of information. RELAX NG has a simpler definition and validation framework than XML Schema, making it easier to use and implement. It also has the ability to use datatype framework plug-ins; a RELAX NG schema author, for example, can require values in an XML document to conform to definitions in XML Schema Datatypes. Schematron

830-599: A notion of unambiguity which closely resembles the LL(1) conditions and specifies various differences. There appears to be no definitive classification of full SGML against a known class of formal grammar. Plausible classes may include tree-adjoining grammars and adaptive grammars. DocBook DocBook is a semantic markup language for technical documentation. It was originally intended for writing technical documents related to computer hardware and software, but it can be used for any other sort of documentation. As

913-475: A pane that appears as a frameset, but is actually implemented with div tags and cookies (so that it is progressive). DocBook offers a large number of features that may be overwhelming to a new user. For those who want the convenience of DocBook without a steep learning curve, Simplified DocBook was designed. It is a small subset of DocBook designed for single documents such as articles or white papers (i.e., "books" are not supported). The Simplified DocBook DTD


996-506: A rich datatyping system and allow for more detailed constraints on an XML document's logical structure. XSDs also use an XML-based format, which makes it possible to use ordinary XML tools to help process them. xs:schema element that defines a schema: RELAX NG (Regular Language for XML Next Generation) was initially specified by OASIS and is now a standard (Part 2: Regular-grammar-based validation of ISO/IEC 19757 – DSDL). RELAX NG schemas may be written in either an XML-based syntax or
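Because an XSD is itself an XML document, an ordinary XML parser can read it. The sketch below uses a small hypothetical schema (the `note`, `to` and `body` element names are invented for illustration) and lists the elements it declares with Python's standard `xml.etree.ElementTree`. Note that this only inspects the schema; the standard library does not perform XSD validation.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"  # the W3C XML Schema namespace

# Hypothetical schema, written only for this example.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

root = ET.fromstring(SCHEMA)

# Walk the schema like any other XML tree and collect declared element names.
declared = [e.get("name") for e in root.iter(f"{{{XS}}}element")]
print(declared)
```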

1079-634: A schema can do so for DocBook. Many graphical or WYSIWYG XML editors come with the ability to edit DocBook like a word processor. Tables, list items, and other stylized content can be copied and pasted into the DocBook editor and will be preserved in the DocBook XML output. Because DocBook conforms to a well-defined XML schema, documents can be validated and processed using any tool or programming language that includes XML support. DocBook began in 1991 in discussion groups on Usenet and eventually became

1162-513: A semantic language, DocBook enables its users to create document content in a presentation-neutral form that captures the logical structure of the content; that content can then be published in a variety of formats, including HTML, XHTML, EPUB, PDF, man pages, WebHelp and HTML Help, without requiring users to make any changes to the source. In other words, when a document is written in DocBook format it becomes easily portable into other formats, rather than needing to be rewritten. DocBook

1245-612: A terser markup, via the SHORTREF feature. This markup style is now associated with wiki markup, e.g. wherein two equals-signs (==), at the start of a line, are the "heading start-tag", and two equals signs (==) after that are the "heading end-tag". SGML markup languages whose concrete syntax enables the SHORTTAG VALUE feature do not require attribute values containing only alphanumeric characters to be enclosed within quotation marks, either double " " (LIT) or single ' ' (LITA), so that

1328-493: A text-based processor could use bold instead of italics. Semantically, this document is a "book", with a "title", that contains two "chapters" each with their own "titles". Those "chapters" contain "paragraphs" that have text in them. The markup is fairly readable in English. In more detail, the root element of the document is book. All DocBook elements are in an XML Namespace, so the root element has an xmlns attribute to set

1411-421: A validity error must be able to report it, but may continue normal processing. A DTD is an example of a schema or grammar. Since the initial publication of XML 1.0, there has been substantial work in the area of schema languages for XML. Such schema languages typically constrain the set of elements that may be used in a document, which attributes may be applied to them, the order in which they may appear, and

1494-510: A vast number of semantic element tags. They are divided into three broad categories, namely structural, block-level, and inline. Structural tags specify broad characteristics of their contents. The book element, for example, specifies that its child elements represent the parts of a book. This includes a title, chapters, glossaries, appendices, and so on. DocBook's structural tags include, but are not limited to: Structural elements can contain other structural elements. Structural elements are

1577-527: A vocabulary to refer to the constructs within an XML document, but does not provide any guidance on how to access this information. A variety of APIs for accessing XML have been developed and used, and some have been standardized. Existing APIs for XML processing tend to fall into these categories: Stream-oriented facilities require less memory and, for certain tasks based on a linear traversal of an XML document, are faster and simpler than other alternatives. Tree-traversal and data-binding APIs typically require

1660-458: Is a lexical, event-driven API in which a document is read serially and its contents are reported as callbacks to various methods on a handler object of the user's design. SAX is fast and efficient to implement, but difficult to use for extracting information at random from the XML, since it tends to burden the application author with keeping track of what part of the document is being processed. It
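The callback style, and the bookkeeping it pushes onto the application, can be sketched with Python's standard `xml.sax` module. The `TitleCollector` class and the sample document are hypothetical; the handler has to maintain its own stack of open elements to know where each title occurs.

```python
import xml.sax

class TitleCollector(xml.sax.ContentHandler):
    """Collects every <title> together with the path of elements enclosing it."""

    def __init__(self):
        super().__init__()
        self.path = []       # stack of currently open element names
        self.titles = []     # (path, text) pairs found so far
        self._buffer = None  # text pieces of the title being read, if any

    def startElement(self, name, attrs):
        self.path.append(name)
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append(("/".join(self.path), "".join(self._buffer)))
            self._buffer = None
        self.path.pop()

handler = TitleCollector()
xml.sax.parseString(
    b"<book><title>B</title><chapter><title>C</title></chapter></book>",
    handler,
)
print(handler.titles)
```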

1743-456: Is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification of 1998 and several other related specifications, all of them free open standards, define XML. The design goals of XML emphasize simplicity, generality, and usability across


1826-726: Is a language for making assertions about the presence or absence of patterns in an XML document. It typically uses XPath expressions. Schematron is now a standard (Part 3: Rule-based validation of ISO/IEC 19757 – DSDL). DSDL (Document Schema Definition Languages) is a multi-part ISO/IEC standard (ISO/IEC 19757) that brings together a comprehensive set of small schema languages, each targeted at specific problems. DSDL includes RELAX NG full and compact syntax, Schematron assertion language, and languages for defining datatypes, character repertoire constraints, renaming and entity expansion, and namespace-based routing of document fragments to different validators. DSDL schema languages do not have

1909-410: Is an XML language. In its current version (5.x), DocBook's language is formally defined by a RELAX NG schema with integrated Schematron rules. (There are also W3C XML Schema+Schematron and Document Type Definition (DTD) versions of the schema available, but these are considered non-standard.) As a semantic language, DocBook documents do not describe what their contents "look like", but rather

1992-570: Is an XML industry data standard. XML is used extensively to underpin various publishing formats. One of the applications of XML is in the transfer of Operational meteorology (OPMET) information based on IWXXM standards. The material in this section is based on the XML Specification. This is not an exhaustive list of all the constructs that appear in XML; it provides an introduction to the key constructs most often encountered in day-to-day use. XML documents consist entirely of characters from

2075-498: Is better suited to situations in which certain types of information are always handled the same way, no matter where they occur in the document. Pull parsing treats the document as a series of items read in sequence using the iterator design pattern. This allows for writing of recursive descent parsers in which the structure of the code performing the parsing mirrors the structure of the XML being parsed, and intermediate parsed results can be used and accessed as local variables within

2158-497: Is closer to the COCOA format. As a document markup language, SGML was originally designed to enable the sharing of machine-readable large-project documents in government, law, and industry. Many such documents must remain readable for several decades—a long time in the information technology field. SGML also was extensively applied by the military, and the aerospace, technical reference, and industrial publishing industries. The advent of

2241-425: Is currently at version 1.1. Ingo Schwarze, the author of OpenBSD's mandoc, considers DocBook inferior to the semantic mdoc macro for man pages. In an attempt to write a DocBook-to-mdoc converter (previous converters like docbook-to-man do not cover semantic elements), he finds the semantic parts "bloated, redundant, and incomplete at the same time" compared to elements covered in mdoc. Moreover, Schwarze finds

2324-621: Is intolerant of syntax omissions, and does not require a DTD for checking well-formedness. Both start tags and end tags may be omitted from a document instance, provided: For example, if OMITTAG YES is specified in the SGML Declaration (enabling the OMITTAG feature), and the DTD includes the following declarations: then this excerpt: which omits two <title> tags and two </title> tags, would represent valid markup. Omitting tags

2407-442: Is not permitted because the null character is one of the control characters excluded from XML, even when using a numeric character reference. An alternative encoding mechanism such as Base64 is needed to represent such characters. Comments may appear anywhere in a document outside other markup. Comments cannot appear before the XML declaration. Comments begin with <!-- and end with -->. For compatibility with SGML,
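Both points about the null character can be sketched with Python's standard library: a numeric reference to U+0000 is rejected by the parser, while Base64 carries the same byte as plain ASCII text. The element name `d` and the sample payload are assumptions for illustration.

```python
import base64
import xml.etree.ElementTree as ET

# A reference to U+0000 is not allowed, even in escaped form.
try:
    ET.fromstring("<d>&#0;</d>")
    null_accepted = True
except ET.ParseError:
    null_accepted = False

# Base64 turns arbitrary bytes, including NUL, into characters XML permits.
payload = b"a\x00b"
encoded = base64.b64encode(payload).decode("ascii")
element = ET.fromstring(f"<d>{encoded}</d>")
decoded = base64.b64decode(element.text)

print(null_accepted, encoded, decoded == payload)
```

The receiving application has to know that the element content is Base64; XML itself attaches no such meaning to the text.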

2490-535: Is optional – the same excerpt could be tagged like this: and would still represent valid markup. Note: The OMITTAG feature is unrelated to the tagging of elements whose declared content is EMPTY as defined in the DTD: Elements defined like this have no end tag, and specifying one in the document instance would result in invalid markup. This is syntactically different from XML empty elements in this regard. Tags can be replaced with delimiter strings, for

2573-504: Is provided as part of the distribution of the DocBook 5 schema and specification package. DocBook files are used to prepare output files in a wide variety of formats. Nearly always, this is accomplished using DocBook XSL stylesheets. These are XSLT stylesheets that transform DocBook documents into a number of formats (HTML, XSL-FO for later conversion into PDF, etc.). These stylesheets can be sophisticated enough to generate tables of contents, glossaries, and indexes. They can oversee


2656-409: Is supported by various technical reports, in particular SGML descended from IBM's Generalized Markup Language (GML), which Charles Goldfarb, Edward Mosher, and Raymond Lorie developed in the 1960s. Goldfarb, editor of the international standard, coined the "GML" term using their surname initials. Goldfarb also wrote the definitive work on SGML syntax in "The SGML Handbook". The syntax of SGML

2739-488: Is that, because it is a direct child of the book; it does not need to be named specially for a human reader. However, because the format was defined by a DTD, it did have to be named as such. The root element does not have or need a version, as the version is built into the DTD declaration at the top of a pre-DocBook 5 document. DocBook 4.x documents are not compatible with DocBook 5, but can be converted into DocBook 5 documents via an XSLT stylesheet. One (db4-upgrade.xsl)

2822-568: Is the NET (Null End Tag) construction: <ITALICS/this/, which is structurally equivalent to <ITALICS>this</ITALICS>. Additionally, the SHORTTAG NETENABL IMMEDNET feature allows shortening tags surrounding an empty text value, but forbids shortening full tags: can be written as wherein the first slash (/) stands for the NET-enabling "start-tag close" (NESTC), and

2905-507: Is transitioning to Sphinx/reStructuredText), and the work of the Linux Documentation Project. Until DocBook 5, DocBook was defined normatively by a Document Type Definition (DTD). Because DocBook was built originally as an application of SGML, the DTD was the only available schema language. DocBook 4.x formats can be SGML or XML, but the XML version does not have its own namespace. DocBook 4.x formats had to live within

2988-504: The Internet. It is a textual data format with strong support via Unicode for different human languages. Although the design of XML focuses on documents, the language is widely used for the representation of arbitrary data structures, such as those used in web services. Several schema systems exist to aid in the definition of XML-based languages, while programmers have developed many application programming interfaces (APIs) to aid

3071-453: The Unicode repertoire. Except for a small number of specifically excluded control characters, any character defined by Unicode may appear within the content of an XML document. XML includes facilities for identifying the encoding of the Unicode characters that make up the document, and for expressing characters that, for one reason or another, cannot be used directly. Unicode code points in

3154-450: The XML profile has made SGML suitable for widespread application for small-scale, general-purpose use. SGML (ENR+WWW) defines two kinds of validity. According to the revised Terms and Definitions of ISO 8879 (from the public draft): A conforming SGML document must be either a type-valid SGML document, a tag-valid SGML document, or both. Note: A user may wish to enforce additional constraints on

3237-422: The concrete syntax of the document. Although full SGML allows implicit markup and some other kinds of tags, the XML specification (s4.3.1) states: Each XML document has both a logical and a physical structure. Physically, the document is composed of units called entities. An entity may refer to other entities to cause their inclusion in the document. A document begins in a "root" or document entity. Logically,

3320-410: The infoset augmentation facility and attribute defaults. RELAX NG and Schematron intentionally do not provide these. A cluster of specifications closely related to XML have been developed, starting soon after the initial publication of XML 1.0. It is frequently the case that the term "XML" is used to refer to XML together with one or more of these other technologies that have come to be seen as part of

3403-534: The regular expression notation of automata theory, because automata theory provides a theoretical foundation for some aspects of the notion of conformance to a content model. No assumption should be made about the general applicability of automata to content models. A report on an early implementation of a parser for basic SGML, the Amsterdam SGML Parser, notes the DTD-grammar in SGML must conform to


3486-536: The DocBook Project development team maintain the key application for producing output from DocBook source documents: A set of XSLT stylesheets (as well as a legacy set of DSSSL stylesheets) that can generate high-quality HTML and print (FO/PDF) output, as well as output in other formats, including RTF, man pages and HTML Help. Web help is a chunked HTML output format in the DocBook XSL stylesheets that

3569-459: The DocBook specification not specific enough about the use of tags, the language non-portable across versions, rough in details and overall inconsistent. Norman Walsh is the principal author of the book DocBook: The Definitive Guide, the official documentation of DocBook. This book is available online under the GFDL, and also as a print publication. XML Extensible Markup Language (XML)

3652-429: The XML core. Some other specifications conceived as part of the "XML Core" have failed to find wide adoption, including XInclude, XLink, and XPointer. The design goals of XML include, "It shall be easy to write programs which process XML documents." Despite this, the XML specification contains almost no information about how programmers might go about doing such processing. The XML Infoset specification provides

3735-503: The XML processor inserts in the DTD itself and in the XML document wherever they are referenced, like character escapes. DTD technology is still used in many applications because of its ubiquity. A newer schema language, described by the W3C as the successor of DTDs, is XML Schema, often referred to by the initialism for XML Schema instances, XSD (XML Schema Definition). XSDs are far more powerful than DTDs in describing XML languages. They use

3818-610: The XML version are available. Starting with DocBook 5, the RELAX NG version is the "normative" form from which the other formats are generated. DocBook originally started out as an SGML application, but an equivalent XML application was developed and has now replaced the SGML one for most uses. (Starting with version 4 of the SGML DTD, the XML DTD continued with this version numbering scheme.) Initially,

3901-434: The allowable parent/child relationships. The oldest schema language for XML is the document type definition (DTD), inherited from SGML. DTDs have the following benefits: DTDs have the following limitations: Two peculiar features that distinguish DTDs from other schema types are the syntactic support for embedding a DTD within XML documents and for defining entities, which are arbitrary fragments of text or markup that

3984-598: The base language for communication protocols such as SOAP and XMPP. It is one of the message exchange formats used in the Asynchronous JavaScript and XML (AJAX) programming technique. Many industry data standards, such as Health Level 7, OpenTravel Alliance, FpML, MISMO, and National Information Exchange Model are based on XML and the rich features of the XML schema specification. In publishing, Darwin Information Typing Architecture

4067-401: The behavior of programs that process HTML, which are designed to produce a reasonable result even in the presence of severe markup errors. XML's policy in this area has been criticized as a violation of Postel's law ("Be conservative in what you send; be liberal in what you accept"). The XML specification defines a valid XML document as a well-formed XML document which also conforms to

4150-423: The case of C1 characters, this restriction is a backwards incompatibility; it was introduced to allow common encoding errors to be detected. The code point U+0000 (Null) is the only character that is not permitted in any XML 1.1 document. The Unicode character set can be encoded into bytes for storage or transmission in a variety of different ways, called "encodings". Unicode itself defines encodings that cover

4233-468: The columns running from right to left, so "after" in that case would be to the left. DocBook semantics are entirely neutral to these kinds of language-based concepts. Inline-level tags are elements like emphasis, hyperlinks, etc. They wrap text within a block-level element. These elements do not cause the text to break when rendered in a paragraph format, but typically they cause the document processor to apply some kind of distinct typographical treatment to


4316-432: The current namespace. Also, the root element of a DocBook document must have a version that specifies the version of the format that the document is built on. (XML documents can include elements from multiple namespaces at once, like the id attributes in the example.) A book element must contain a title, or an info element containing a title. This must be before any child structural elements. Following
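The structure just described (a namespaced book root with a version attribute, a title first, then chapters) can be inspected with any namespace-aware XML tool. Below is a minimal sketch with Python's standard `xml.etree.ElementTree`; the document text is a small illustrative DocBook 5 book written for this example, not necessarily the one the article originally showed, and the snippet checks structure only rather than validating against the DocBook schema.

```python
import xml.etree.ElementTree as ET

DB = "http://docbook.org/ns/docbook"  # DocBook 5 namespace

# Illustrative document: a book with a title and two chapters.
DOC = """<book xmlns="http://docbook.org/ns/docbook" version="5.0" xml:id="demo">
  <title>Very simple book</title>
  <chapter xml:id="c1">
    <title>Chapter 1</title>
    <para>Hello world!</para>
  </chapter>
  <chapter xml:id="c2">
    <title>Chapter 2</title>
    <para>Hello again, world!</para>
  </chapter>
</book>"""

root = ET.fromstring(DOC)
ns = {"db": DB}

version = root.get("version")
book_title = root.find("db:title", ns).text
chapter_titles = [c.find("db:title", ns).text for c in root.findall("db:chapter", ns)]

# The title has to come before any structural child such as a chapter.
title_first = list(root)[0].tag == f"{{{DB}}}title"

print(version, book_title, chapter_titles, title_first)
```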

4399-429: The data structure and contain metadata. What is within the tags is data, encoded in the way the XML standard specifies. An additional XML schema (XSD) defines the necessary metadata for interpreting and validating XML. (This is also referred to as the canonical schema.) An XML document that adheres to basic XML rules is "well-formed"; one that adheres to its schema is "valid." IETF RFC 7303 (which supersedes

4482-442: The direct use of almost any Unicode character in element names, attributes, comments, character data, and processing instructions (other than the ones that have special symbolic meaning in XML itself, such as the less-than sign, "<"). The following is a well-formed XML document including Chinese, Armenian and Cyrillic characters: The XML specification defines an XML document as a well-formed text, meaning that it satisfies
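As a rough illustration of non-ASCII names and content, the sketch below parses a one-element document whose element name is Chinese, whose attribute is Armenian and whose text is Cyrillic. The document text is an assumption made for this example; it is parsed with Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Element name in Chinese, attribute in Armenian, character data in Cyrillic.
DOC = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<俄语 լեզու="ռուսերեն">данные</俄语>'
).encode("utf-8")

root = ET.fromstring(DOC)
print(root.tag, root.attrib, root.text)
```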

4565-437: The document fails to conform to that schema. XML editing tools can also use schema information to avoid creating non-conforming documents in the first place. Because DocBook is XML, documents can be created and edited with any text editor. A dedicated XML editor is likewise a functional DocBook editor. DocBook provides schema files for popular XML schema languages, so any XML editor that can provide content completion based on

4648-437: The document is composed of declarations, elements, comments, character references, and processing instructions, all of which are indicated in the document by explicit markup. For introductory information on a basic, modern SGML syntax, see XML. The following material concentrates on features not in XML and is not a comprehensive summary of SGML syntax. SGML generalizes and supports a wide range of markup languages as found in

4731-436: The document's SGML Declaration it is always possible to know whether a document is supported by a particular processor. Many SGML features relate to markup minimization. Other features relate to concurrent (parallel) markup (CONCUR), to linking processing attributes (LINK), and to embedding SGML documents within SGML documents (SUBDOC). The notion of customizable features was not appropriate for Web use, so one goal of XML

4814-518: The document. The standard calls this fully tagged. Integrally stored reflects the XML requirement that elements end in the same entity in which they started. Reference-free reflects the HTML requirement that entity references are for special characters and do not contain markup. SGML validity commentary, especially commentary that was made before 1997 or that is unaware of SGML (ENR+WWW), covers type-validity only. The SGML emphasis on validity supports

4897-403: The enclosed text, by changing the font, size, or similar attributes. (The DocBook specification does say that it expects different typographical treatment, but it does not offer specific requirements as to what this treatment may be.) That is, a DocBook processor doesn't have to transform an emphasis tag into italics. A reader-based DocBook processor could increase the size of the words, or,

4980-516: The entire repertoire; well-known ones include UTF-8 (which the XML standard recommends using, without a BOM) and UTF-16. There are many other text encodings that predate Unicode, such as ASCII and various ISO/IEC 8859; their character repertoires are in every case subsets of the Unicode character set. XML allows the use of any of the Unicode-defined encodings and any other encodings whose characters also appear in Unicode. XML also provides

5063-471: The following declarations: (and "&#RE;&#RS;" is a short-reference delimiter in the concrete syntax), then: is equivalent to: SGML has many features that defied convenient description with the popular formal automata theory and the contemporary parser technology of the 1980s and the 1990s. The standard warns in Annex H: The SGML model group notation was deliberately designed to resemble


5146-498: The following ranges are valid in XML 1.0 documents: XML 1.1 extends the set of allowed characters to include all the above, plus the remaining characters in the range U+0001–U+001F. At the same time, however, it restricts the use of C0 and C1 control characters other than U+0009 (Horizontal Tab), U+000A (Line Feed), U+000D (Carriage Return), and U+0085 (Next Line) by requiring them to be written in escaped form (for example U+0001 must be written as &#x01; or its equivalent). In
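The XML 1.0 character rule can be expressed as a simple range check. The ranges below are those of the `Char` production in the XML 1.0 specification (U+0009, U+000A, U+000D, U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF); the function names `is_xml10_char` and `first_invalid` are hypothetical helpers written for this sketch.

```python
# Code point ranges permitted in XML 1.0 documents (the spec's Char production).
XML10_RANGES = [
    (0x9, 0x9), (0xA, 0xA), (0xD, 0xD),
    (0x20, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF),
]

def is_xml10_char(ch):
    """Return True if the single character ch may appear in an XML 1.0 document."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in XML10_RANGES)

def first_invalid(text):
    """Return the index of the first disallowed character, or None if all are fine."""
    for index, ch in enumerate(text):
        if not is_xml10_char(ch):
            return index
    return None

print(is_xml10_char("\t"), is_xml10_char("\x01"), first_invalid("ok\x0bno"))
```

A check like this is useful before serializing untrusted text, since most serializers escape markup characters but do not remove forbidden control characters.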

5229-685: The functions performing the parsing, or passed down (as function parameters) into lower-level functions, or returned (as function return values) to higher-level functions. Examples of pull parsers include Data::Edit::Xml in Perl, StAX in the Java programming language, XMLPullParser in Smalltalk, XMLReader in PHP, ElementTree.iterparse in Python, SmartXML in Red, System.Xml.XmlReader in
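As an illustration of the pull style, the sketch below uses `ElementTree.iterparse` from Python's standard library, one of the parsers named above. The `log`/`entry` vocabulary and the `error_messages` helper are invented for this example; the calling code asks for the next item when it is ready, and intermediate results live in ordinary local variables.

```python
import io
import xml.etree.ElementTree as ET

DOC = (
    b"<log>"
    b"<entry level='info'>start</entry>"
    b"<entry level='error'>disk full</entry>"
    b"</log>"
)

def error_messages(stream):
    """Yield the text of each <entry level='error'> as the parser reaches it."""
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "entry":
            if elem.get("level") == "error":
                yield elem.text
            # Release the element's content once it has been handled.
            elem.clear()

errors = list(error_messages(io.BytesIO(DOC)))
print(errors)
```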

5312-502: The markup norm is using angle brackets as start- and end-tag delimiters in an SGML document (per the standard-defined reference concrete syntax), it is possible to use other characters, provided a suitable concrete syntax is defined in the document's SGML declaration. For example, an SGML interpreter might be programmed to parse GML, wherein the tags are delimited with a left colon and a right full stop, and an :e prefix denotes an end tag: :xmp.Hello, world:exmp.. According to

5395-413: The meaning of those contents. For example, rather than explaining how the abstract for an article might be visually formatted, DocBook simply says that a particular section is an abstract. It is up to an external processing tool or application to decide where on a page the abstract should go and what it should look like or whether or not it should be included in the final output at all. DocBook provides

5478-435: The mid 1980s. These ranged from terse Wiki-like syntaxes to RTF-like bracketed languages to HTML-like matching-tag languages. SGML did this by a relatively simple default reference concrete syntax augmented with a large number of optional features that could be enabled in the SGML Declaration. Not every SGML parser can necessarily process every SGML document. Because each processor's System Declaration can be compared to

5561-550: The older RFC 3023) provides rules for the construction of media types for use in XML messages. It defines three media types: application/xml (text/xml is an alias), application/xml-external-parsed-entity (text/xml-external-parsed-entity is an alias) and application/xml-dtd. They are used for transmitting raw XML files without exposing their internal semantics. RFC 7303 further recommends that XML-based languages be given media types ending in +xml, for example, image/svg+xml for SVG. Further guidelines for

5644-584: The only permitted top-level elements in a DocBook document. Block-level tags are elements like paragraph, lists, etc. Not all these elements can directly contain text. Sequential block-level elements render one "after" another. After, in this case, can differ depending on the language. In most Western languages, "after" means below: text paragraphs are printed down the page. Other languages' writing systems can have different directionality; for example, in Japanese, paragraphs are often printed in downward columns, with

5727-465: The previous markup example could be written: One feature of SGML markup languages is the "presumptuous empty tagging", such that the empty end tag </> in <ITALICS>this</> "inherits" its value from the nearest previous full start tag, which, in this example, is <ITALICS> (in other words, it closes the most recently opened item). The expression is thus equivalent to <ITALICS>this</ITALICS>. Another feature

5810-449: The processing of XML data. The main purpose of XML is serialization, i.e. storing, transmitting, and reconstructing arbitrary data. For two disparate systems to exchange information, they need to agree upon a file format. XML standardizes this process. It is therefore analogous to a lingua franca for representing information. As a markup language, XML labels, categorizes, and structurally organizes information. XML tags represent

5893-674: The reference syntax, letter case (upper- or lower-case) is not distinguished in tag names, so the three tags <quote>, <QUOTE>, and <quOtE> are equivalent. (A concrete syntax might change this rule via the NAMECASE NAMING declarations.) SGML has features for reducing the number of characters required to mark up a document, which must be enabled in the SGML Declaration. SGML processors need not support every available feature, thus allowing applications to tolerate many types of inadvertent markup omissions; however, SGML systems usually are intolerant of invalid structures. XML

5976-418: The requirement for generalized markup that markup should be rigorous. (ISO 8879 A.1) An SGML document may have three parts: An SGML document may be composed from many entities (discrete pieces of text). In SGML, the entities and element types used in the document may be specified with a DTD; the different character sets, features, delimiter sets, and keywords are specified in the SGML Declaration to create

6059-519: The restrictions of being defined by a DTD. The most significant restriction was that an element name uniquely defines its possible contents. That is, an element named info must contain the same information no matter where it is in the DocBook file. As such, there are many kinds of info elements in DocBook 4.x: bookinfo, chapterinfo, etc. Each has a slightly different content model, but they do share some of their content model. Additionally, they repeat context information. The book's info element

6142-487: The rules of a Document Type Definition (DTD). In addition to being well formed, an XML document may be valid. This means that it contains a reference to a Document Type Definition (DTD), and that its elements and attributes are declared in that DTD and follow the grammatical rules for them that the DTD specifies. XML processors are classified as validating or non-validating depending on whether or not they check XML documents for validity. A processor that discovers
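The distinction between well-formedness and validity can be made concrete with a non-validating processor. The sketch below uses Python's standard `xml.etree.ElementTree`, which checks well-formedness only and does not validate against a DTD; the helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as well-formed XML.

    This does not check validity: no DTD is consulted, so a document can
    pass here and still break the grammar its DTD declares.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True
```

Checking validity as described above would need a validating processor, which the Python standard library does not provide.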

6225-461: The second slash stands for the NET. NOTE: XML defines NESTC with a /, and NET with a > (angle bracket), so the corresponding construct in XML appears as <QUOTE/>. The third feature is 'text on the same line', allowing a markup item to be ended with a line-end; this is especially useful for headings and such, and requires either SHORTREF or DATATAG minimization. For example, if the DTD includes

6308-409: The selection of particular designated portions of a master document to produce different versions of the same document (such as a "tutorial" or a "quick-reference guide", where each of these consists of a subset of the material). Users can write their own customized stylesheets or even a full-fledged program to process the DocBook into an appropriate output format as their needs dictate. Norman Walsh and

6391-469: The string "--" (double-hyphen) is not allowed inside comments; this means comments cannot be nested. The ampersand has no special significance within comments, so entity and character references are not recognized as such, and there is no way to represent characters outside the character set of the document encoding. An example of a valid comment: <!--no need to escape <code> & such in comments--> XML 1.0 (Fifth Edition) and XML 1.1 support
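The comment rules above can be checked with Python's standard `xml.dom.minidom`, which is built on the Expat parser. This is a small sketch: the helper name `comment_texts` and the wrapper element `r` are assumptions made for illustration.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

def comment_texts(xml_text):
    """Return the text of comments directly under the root element,
    or None if the document is not well formed."""
    try:
        doc = minidom.parseString(xml_text)
    except ExpatError:
        return None
    return [
        node.data
        for node in doc.documentElement.childNodes
        if node.nodeType == node.COMMENT_NODE
    ]

# "<" and "&" need no escaping inside a comment.
GOOD = "<r><!--no need to escape <code> & such in comments--></r>"
# A double hyphen inside a comment makes the document not well formed.
BAD = "<r><!-- outer -- inner --></r>"
```

With these inputs, `comment_texts(GOOD)` returns the comment text unchanged, while `comment_texts(BAD)` returns `None` because the parser rejects the embedded double hyphen.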

6474-532: The title are the structural children, in this case, two chapter elements. Each of these must have a title. They contain para block elements, which can contain free text and other inline elements like the emphasis in the second paragraph of the first chapter. Rules are formally defined in the DocBook XML schema. Appropriate programming tools can validate an XML document (DocBook or otherwise) against its corresponding schema to determine if (and where)

6557-522: The use of XML in a networked context appear in RFC 3470, also known as IETF BCP 70, a document covering many aspects of designing and deploying an XML-based language. XML has come into common use for the interchange of data over the Internet. Hundreds of document formats using XML syntax have been developed, including RSS, Atom, Office Open XML, OpenDocument, SVG, COLLADA, and XHTML. XML also provides

6640-472: The use of much more memory, but are often found more convenient for use by programmers; some include declarative retrieval of document components via the use of XPath expressions. XSLT is designed for declarative description of XML document transformations, and has been widely implemented both in server-side packages and Web browsers. XQuery overlaps XSLT in its functionality, but is designed more for searching of large XML databases. Simple API for XML (SAX)
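As a small illustration of declarative retrieval from an in-memory tree, the sketch below uses the limited XPath subset supported by Python's standard `xml.etree.ElementTree`. The sample document and the function name `chapter_titles` are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, loosely modelled on a book with chapters.
SAMPLE = (
    "<book>"
    "<chapter><title>First</title></chapter>"
    "<chapter><title>Second</title></chapter>"
    "</book>"
)

def chapter_titles(xml_text):
    """Load the whole document into memory, then select titles by path."""
    root = ET.fromstring(xml_text)
    # The path expression states what to select, not how to walk the tree.
    return [title.text for title in root.findall("./chapter/title")]
```

The whole document is held in memory before the query runs, which is the trade-off against streaming interfaces that the passage describes.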

6723-426: The vendor support of XML Schemas yet, and are to some extent a grassroots reaction of industrial publishers to the lack of utility of XML Schemas for publishing. Some schema languages not only describe the structure of a particular XML format but also offer limited facilities to influence processing of individual XML files that conform to this format. DTDs and XSDs both have this ability; they can for instance provide

6806-472: Was introduced in version 1.76.1. The documentation for web help also provides an example of web help and is part of the DocBook XSL distribution. The major features are its fully CSS-based page layout, search of the help content, and a table of contents in collapsible-tree form. Search has stemming, match highlighting, explicit page-scoring, and the standard multilingual tokenizer. The search and TOC are in

6889-429: Was to minimize optional features. However, XML's well-formedness rules cannot support Wiki-like languages, leaving them unstandardized and difficult to integrate with non-text information systems. The usual (default) SGML concrete syntax resembles this example, which is the default HTML concrete syntax: SGML provides an abstract syntax that can be implemented in many different types of concrete syntax. Although
