Internet Draft                                               S. Fujimoto
Document: draft-fujimoto-sipping-header-lang-00         Fujitsu Labs LTD
Expires: March 2003                                       September 2002


               SIP Header Language Information Extension

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026 [1].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This draft explains the problems that arise when the UTF-8
   character set is used to represent text in languages other than
   English. It also describes the requirements for solving those
   problems and proposes an extension to the SIP message syntax
   specification.

Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC-2119 [2].

Table of Contents

   1. Introduction
   2. Scope
   3.
Requirements
      3.1 Backward Compatibility
      3.2 Data Size
      3.3 Multiple Languages on one field value
      3.4 Extensibility
      3.5 Country Information
   4. SIP Header I18N
      4.1 Language Tag
      4.2 SIP Header i18n Extension
   5. Formal Syntax
   6. Examples
      6.1 Display Name
      6.2 Generic Header Value
   7. Security Considerations
   8. References
   9. Author's Addresses

1. Introduction

   SIP [3] allows any UTF-8 [4] characters to be used in SIP message
   header fields, e.g. the display-name in the From: field. However,
   there is no standard way to add language information that gives
   UAs a hint for choosing an appropriate font set when rendering the
   text on the user interface. This problem arises because the same
   CJK (Chinese, Japanese, Korean) Kanji characters are used to
   express different letters in different languages. Additionally,
   people who use alphabetic letters pronounce the same word
   differently in their own languages. We believe that adding
   language information to UTF-8 text will solve these kinds of
   problems.

   UTF-8 can represent any UCS-2/4 character, including the recently
   proposed "language tag" characters, which specify the language of
   the character string that follows.
   However, deployed SIP implementations may not have enough knowledge
   to handle them correctly.

   RFC2231 [5] defines a standard way to add language information to
   MIME part 3 [6]. However, that specification does not provide an
   encoding type that allows UTF-8 characters to be represented as
   is, since mail delivery infrastructures are still "8-bit unsafe".

   This Internet-Draft describes the requirements for adding language
   information to SIP message headers, and proposes a method for
   adding language information to UTF-8 header field values.

2. Scope

   General requirements for header field values with non-UTF-8
   characters are out of scope of this draft.

3. Requirements

3.1 Backward Compatibility

   This requirement states:

   The extensions for SIP header internationalization (i18n-ext) MUST
   fully conform to the current SIP specifications. Even if an
   implementation is not compliant with i18n-ext, these headers MUST
   be recognized as valid SIP headers. This means that every SIP
   implementation can relay SIP messages that carry i18n-ext headers.

3.2 Data Size

   This requirement states:

   The i18n-ext SHOULD NOT increase the size of header data
   drastically. If i18n-ext is achieved by adding information to the
   original header value, the result MUST work correctly with line
   folding.

3.3 Multiple Languages on one field value

   This requirement states:

   The i18n-ext SHOULD be applicable to a header value that consists
   of more than one language, e.g. a string that combines Chinese and
   Japanese.

3.4 Extensibility

   This requirement states:

   The i18n-ext MUST make it easy to add new languages for header
   values. The i18n-ext SHOULD make it easy to apply the extension to
   headers that are added in the future. The i18n-ext MUST provide a
   means to determine whether an implementation can handle a specific
   part of an i18n-ext header value.

3.5 Country Information

   This requirement states:

   The i18n-ext MUST provide enough information for culture-dependent
   processing. This means that even when the same characters and
   wording are used, it is possible to switch the pronunciation
   between US and UK styles when the text is read aloud.

4. SIP Header I18N

4.1 Language Tag

   This specification uses 'Language-Tag' to identify the language of
   a UTF-8 string. The 'Language-Tag' syntax is imported from RFC3066
   "Tags for the Identification of Languages" [7].

4.2 SIP Header i18n Extension

   This document defines a new syntax component, 'i18n-text', which
   can be used wherever RFC3261 [3] uses 'qdtext' or 'UTF8-TRIM'.
   'qdtext' is used for the display-name in the From and To headers,
   and 'UTF8-TRIM' is used for other UTF-8 enabled headers.

   This specification reserves the characters "=", "?", and """ for
   special purposes, and those characters MUST be escaped within
   'i18n-text'. All compliant implementations MUST replace the "?"
   character with "=3F" wherever 'i18n-text' is allowed.

   'i18n-text' starts with the preamble sequence "=?", followed by a
   language tag, followed by the delimiter "?", followed by an escaped
   UTF-8 string, and ends with the epilogue sequence "?=".

   Escaped UTF-8 strings are represented as follows:

   1) Any octet of a UTF-8 printable character MAY be represented as
      an "=" character followed by a 2-letter hex value.

   2) The "=", "?", and """ characters MUST be represented in the hex
      value representation defined in 1).

5. Formal Syntax

   The following syntax specification uses the augmented Backus-Naur
   Form (BNF) as described in RFC-2234 [8].

   'i18n-text' MAY be used where 'qdtext' or 'UTF8-TRIM' is used in
   RFC-3261. However, using both 'qdtext' and 'i18n-text' within the
   same 'quoted-string' is not allowed.

   i18n-text           = "=?" Language-Tag "?"
                         escaped-utf8-string "?="

   'Language-Tag' is imported from RFC3066 "Tags for the
   Identification of Languages" [7].

   Language-Tag        = Primary-subtag *( "-" Subtag )
   Primary-subtag      = 1*8ALPHA
   Subtag              = 1*8(ALPHA / DIGIT)

   escaped-utf8-char   = "=" 2*2HEXDIGIT
   HEXDIGIT            = %x30-39 / %x41-46
   escaped-utf8-string = 1*( UTF8-NON-ASCII / %x21 / %x23-3C / %x3E /
                         %x40-7E / escaped-utf8-char )

6. Examples

6.1 Display Name

   From: "=?ja-JP?(escaped-japanese-display-name in UTF8)?="
   From: "=?en-GB?Alice in Wonderland?="

6.2 Generic Header Value

   Subject: =?en-AU?How are you today=20=3F?=
   Organization: =?it?Ciao?= =?en?Travel Inc.?=

7. Security Considerations

   This draft does not discuss security issues and is not believed to
   raise any security issues for fully conforming implementations of
   SIP.

8. References

   [1] Bradner, S., "The Internet Standards Process -- Revision 3",
       BCP 9, RFC 2026, October 1996.

   [2] Bradner, S., "Key words for use in RFCs to Indicate Requirement
       Levels", BCP 14, RFC 2119, March 1997.

   [3] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
       Peterson, J., Sparks, R., Handley, M., and Schooler, E., "SIP:
       Session Initiation Protocol", RFC 3261, June 2002.

   [4] Yergeau, F., "UTF-8, a transformation format of ISO 10646",
       RFC 2279, January 1998.

   [5] Freed, N. and Moore, K., "MIME Parameter Value and Encoded Word
       Extensions: Character Sets, Languages, and Continuations",
       RFC 2231, November 1997.

   [6] Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part
       Three: Message Header Extensions for Non-ASCII Text", RFC 2047,
       November 1996.

   [7] Alvestrand, H., "Tags for the Identification of Languages",
       BCP 47, RFC 3066, January 2001.

   [8] Crocker, D.
       and Overell, P. (Editors), "Augmented BNF for Syntax
       Specifications: ABNF", RFC 2234, Internet Mail Consortium and
       Demon Internet Ltd., November 1997.

Author's Addresses

   Shingo Fujimoto
   Fujitsu Laboratories LTD
   Okubocho Nishiwaki 64
   Akashi HYOGO JAPAN

   Phone: +81 78-934-8248
   Email: shingo_fujimoto@jp.fujitsu.com
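
   The escaping rules of Section 4.2 and the grammar of Section 5 can
   be illustrated with a short, non-normative Python sketch. It is not
   part of the proposal. The names encode_i18n_text and
   decode_i18n_text are hypothetical. The sketch assumes that space,
   control characters, and DEL are escaped in addition to the three
   reserved characters, because the grammar admits none of them
   unescaped, and that a decoder tolerates literal spaces, because the
   examples in Section 6 contain them.

```python
import re

# Hypothetical helper names; none of them are defined by the draft.
RESERVED = frozenset('=?"')  # characters reserved in Section 4.2
LANGUAGE_TAG = r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*"
# "=?" Language-Tag "?" escaped-utf8-string "?=". The body pattern is an
# assumption: it tolerates literal spaces, as the Section 6 examples do.
I18N_TEXT = re.compile(r'=\?(' + LANGUAGE_TAG + r')\?([^?"]+)\?=')
ESCAPE = re.compile(rb"=([0-9A-F]{2})")        # escaped-utf8-char
BAD_ESCAPE = re.compile(rb"=(?![0-9A-F]{2})")  # "=" not starting an escape


def encode_i18n_text(text, language_tag):
    """Wrap text as i18n-text, escaping the reserved characters."""
    if not re.fullmatch(LANGUAGE_TAG, language_tag):
        raise ValueError("invalid language tag: %r" % language_tag)
    if not text:
        raise ValueError("escaped-utf8-string needs at least one character")
    parts = []
    for ch in text:
        code = ord(ch)
        # Reserved characters MUST be escaped. Escaping space, controls
        # and DEL as well is this sketch's choice (see the lead-in).
        if ch in RESERVED or code < 0x21 or code == 0x7F:
            parts.append("=%02X" % code)
        else:
            parts.append(ch)  # printable ASCII and non-ASCII stay as is
    return "=?%s?%s?=" % (language_tag, "".join(parts))


def decode_i18n_text(value):
    """Return a (language_tag, text) pair for each i18n-text in value."""
    results = []
    for match in I18N_TEXT.finditer(value):
        raw = match.group(2).encode("utf-8")
        if BAD_ESCAPE.search(raw):
            raise ValueError("malformed escape in %r" % match.group(0))
        # Escapes denote octets, so they are resolved at the byte level
        # before the result is decoded as UTF-8.
        octets = ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
        results.append((match.group(1), octets.decode("utf-8")))
    return results


print(encode_i18n_text("How are you today ?", "en-AU"))
print(decode_i18n_text("Organization: =?it?Ciao?= =?en?Travel Inc.?="))
```

   Resolving the escapes on octets rather than on characters lets a
   sender escape individual bytes of a multi-byte UTF-8 sequence, which
   rule 1) of Section 4.2 permits. A header value carrying several
   'i18n-text' components, as in the Organization example, yields one
   pair per component.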