KIBI Network Specifications :: Kixt

Kixt XML

Abstract

The following specification defines a means of creating XML documents which are usable with Kixt charsets.

1. Introduction

1.1 Purpose and Scope

The Kixt Transmission Format allows for the association of metadata with documents through the use of headers. However, using this format to specify character sets in a manner compatible with non–Kixt-aware XML processors is impossible, due to its reliance on characters prohibited in XML documents. This specification defines a subset of the XML syntax which may be used to encode texts written in a Kixt charset. The resulting document is called a Kixt XML document.

The file extension .xkixt or .xkx is suggested for saved Kixt XML documents.

1.2 Relationship to Other Specifications

This document is part of the Kixt family of specifications. It is also built upon XML and RDF technologies.

2. Character sets

A Kixt Charset Definition is XML compatible if it is UTF-8 compatible and the objects of the compatibility properties are equal to those defined in https://charset.KIBI.network/Kixt/XML for all characters so defined.

The Unicode character set, as well as the ASCII subset thereof, is assumed to be XML compatible. Whether other character sets are XML compatible is left undefined by this specification.

The character set of a Kixt XML document may be any XML compatible character set. Kixt XML documents must not be fully normalized, as they do not necessarily contain Unicode contents.

2.1 Character encoding

Kixt XML documents must be transmitted as one of Generalized UTF-8, Fullwidth-BE, or Fullwidth-LE. Kixt XML documents transmitted as Fullwidth-BE or Fullwidth-LE must begin with the codepoint FEFF. Generalized UTF-8 Kixt XML documents may also begin with FEFF, but this is not required.
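The FEFF requirement above can be sketched as a small check. This is an illustration only: it assumes the document has already been decoded into a list of integer codepoints, and the encoding labels are plain strings chosen for this example rather than identifiers defined by this specification. The function name check_leading_codepoint is likewise hypothetical.

```python
BOM = 0xFEFF

# Illustrative labels (assumed for this sketch, not defined by the specification).
FULLWIDTH = {"Fullwidth-BE", "Fullwidth-LE"}
ALLOWED = FULLWIDTH | {"Generalized UTF-8"}


def check_leading_codepoint(encoding, codepoints):
    """Return True if the decoded codepoints satisfy the FEFF rule for `encoding`."""
    if encoding not in ALLOWED:
        # Not one of the three permitted transmission encodings.
        return False
    if encoding in FULLWIDTH:
        # Fullwidth documents must begin with FEFF.
        return len(codepoints) > 0 and codepoints[0] == BOM
    # Generalized UTF-8 may begin with FEFF, but this is not required.
    return True
```

The check deliberately works on codepoints rather than bytes, so it makes no assumption about how each encoding lays out FEFF on the wire.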

3. The Format

3.1 Restrictions on the XML syntax

Kixt XML documents must follow the syntax defined by XML, with the following additional constraints:

  1. Kixt XML documents must not contain the codepoints 000A, 000D, 0085, or 2028.

    As a consequence of this rule, the only valid XML <S> whitespace in a Kixt XML document is 0020. This does not prevent the presence of other, non-syntactic whitespace, however.

  2. Within any XML <Name>, <NCName>, <Nmtoken>, or <PubidLiteral>, Kixt XML documents must contain only codepoints defined in https://charset.KIBI.network/Kixt/XML.

  3. Kixt XML documents must not contain an <EncodingDecl> encoding declaration.

  4. Kixt XML documents must not contain any codepoints not assigned in the current character set.
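As a rough illustration of constraints 1 and 4, the following sketch scans a sequence of codepoints and reports violations. It assumes the input is already decoded into integers and that a single character set applies to the whole sequence, which a real processor could not assume once the character set changes partway through a document. The function name find_violations and the assigned parameter are hypothetical, introduced only for this example.

```python
# Codepoints prohibited by constraint 1.
FORBIDDEN = frozenset({0x000A, 0x000D, 0x0085, 0x2028})


def find_violations(codepoints, assigned=None):
    """Return (index, codepoint, rule) tuples for constraints 1 and 4.

    `assigned` is a hypothetical set of the codepoints assigned in the
    current character set; pass None to skip the constraint 4 check.
    """
    violations = []
    for index, cp in enumerate(codepoints):
        if cp in FORBIDDEN:
            violations.append((index, cp, 1))
        elif assigned is not None and cp not in assigned:
            violations.append((index, cp, 4))
    return violations
```

Constraints 2 and 3 depend on XML parsing context (names and the XML declaration) and are not covered by this sketch.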

3.2 Defining the character set

The starting character set for a Kixt XML document is Unicode.

The character set for the contents of any XML element can be changed by setting the attribute with local name charset and namespace name https://spec.KIBI.network/Kixt/-/XML/ on that element. If the value of this attribute is the IRI of a supported, XML compatible character set, then this is the character set of the element's contents. Otherwise, the character set of the element's contents is the same as that for its parent, or, in the case of the root element, the document as a whole.
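The inheritance rule described above might be implemented along the following lines. This is a sketch under several assumptions: it uses Python's xml.etree.ElementTree (which is not Kixt-aware), it represents the starting character set with the placeholder string "Unicode", and it takes a supported set which is assumed to contain only the IRIs of character sets that are both supported and XML compatible. The example.org IRIs and the resolve_charsets name are hypothetical.

```python
import xml.etree.ElementTree as ET

KIXT_XML_NS = "https://spec.KIBI.network/Kixt/-/XML/"
# ElementTree exposes namespaced attributes as "{namespace}localname".
CHARSET_ATTR = "{" + KIXT_XML_NS + "}charset"
UNICODE = "Unicode"  # placeholder label for the starting character set


def resolve_charsets(element, supported, inherited=UNICODE):
    """Return (element, charset) pairs in document order.

    An element's contents use the attribute value if it names a supported
    character set; otherwise they use the parent's (or document's) charset.
    """
    value = element.get(CHARSET_ATTR)
    current = value if value in supported else inherited
    resolved = [(element, current)]
    for child in element:
        resolved.extend(resolve_charsets(child, supported, current))
    return resolved


# A single-line sample, since Kixt XML documents may not contain line breaks.
SAMPLE = (
    '<a xmlns:k="https://spec.KIBI.network/Kixt/-/XML/">'
    '<b k:charset="https://example.org/cs"><c/></b>'
    '<d k:charset="https://example.org/unknown"/>'
    '</a>'
)
```

Calling resolve_charsets(ET.fromstring(SAMPLE), {"https://example.org/cs"}) assigns the example character set to b and its child c, while a and d (whose IRI is not in the supported set) remain in Unicode. Passing an empty supported set leaves every element in Unicode.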

4. Security

Using non-Unicode character sets within a document may make scripted Kixt XML documents more difficult to sanitize. Unless the character sets supported by a sanitization filter are known, it is advised that processors of scripted documents which may contain unsafe information decline to recognize any character set IRIs, effectively locking the character set into Unicode.

5. Changelog

Added Security section.

Initial specification.