
RFC1843 - HZ - A Data Format for Exchanging Files of Arbitrarily Mixed Chinese and ASCII characters

Posted: May 5, 2007


Network Working Group                                             F. Lee
Request for Comments: 1843                           Stanford University
Category: Informational                                      August 1995

HZ - A Data Format for Exchanging Files of
Arbitrarily Mixed Chinese and ASCII characters

Status of this Memo

This memo provides information for the Internet community. This memo
does not specify an Internet standard of any kind. Distribution of
this memo is unlimited.

Abstract

The content of this memo is identical to an article of the same title
written by the author on September 4, 1989. In this memo, GB stands
for GB2312-80. Note that the title is kept only for historical
reasons. HZ has been widely used for purposes other than "file
exchange".

1. Introduction

Most existing computer systems which can handle a text file of
arbitrarily mixed Chinese and ASCII characters use 8-bit codes. To
exchange such text files through electronic mail on ASCII computer
systems, it is necessary to encode them in a 7-bit format. A generic
binary to ASCII encoder is not sufficient, because there is currently
no universal standard for such 8-bit codes. For example, CCDOS and
Macintosh's Chinese OS use different internal codes. Fortunately,
there is a PRC national standard, GuoBiao (GB), for the encoding of
Chinese characters, and Chinese characters encoded in the above
systems can be easily converted to GB by a simple formula. (* The ROC
standard BIG-5 is outside the scope of this article.)

HZ is a 7-bit data format proposed for arbitrarily mixed GB and ASCII
text file exchange. HZ is also intended for the design of terminal
emulators that display and edit mixed Chinese and ASCII text files in
real time.

2. Specification

The format of HZ is described in the following.

Without loss of generality, we assume that all Chinese characters
(HanZi) have already been encoded in GB. A GB (GB1 and GB2) code is
a two byte code, where the first byte is in the range $21-$77
(hexadecimal), and the second byte is in the range $21-$7E.

A graphical ASCII character is a byte in the range $21-$7E. A non-
graphical ASCII character is a byte in the range $0-$20 or of the
value $7F.

Since the range of a graphical ASCII character overlaps that of a GB
byte, a byte in the range $21-$7E is interpreted according to the
mode it is in. There are two modes, namely ASCII mode and GB mode.
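
The byte ranges above can be written down as small predicates. The following is a minimal sketch; the function names are illustrative assumptions and are not part of HZ. It shows why a mode is needed: the same two bytes can be read either as two graphical ASCII characters or as one GB code.

```python
def is_graphical_ascii(b: int) -> bool:
    """Graphical ASCII: a byte in $21-$7E."""
    return 0x21 <= b <= 0x7E


def is_non_graphical_ascii(b: int) -> bool:
    """Non-graphical ASCII: a byte in $0-$20, or the value $7F."""
    return 0x00 <= b <= 0x20 or b == 0x7F


def is_gb_pair(b1: int, b2: int) -> bool:
    """GB code: first byte in $21-$77, second byte in $21-$7E."""
    return 0x21 <= b1 <= 0x77 and 0x21 <= b2 <= 0x7E


# The bytes '<' and ':' are both graphical ASCII, and together they also
# form a well-formed GB pair, so only the current mode can tell which
# reading is intended.
pair = b"<:"
ambiguous = all(is_graphical_ascii(b) for b in pair) and is_gb_pair(*pair)
```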

By convention, a non-graphical ASCII character should only appear in
ASCII mode.

The default mode is ASCII mode.

In ASCII mode, a byte is interpreted as an ASCII character, unless a
'~' is encountered. The character '~' is an escape character. By
convention, it must be immediately followed ONLY by '~', '{' or '\n'
(<LF>), with the following special meaning.

o The escape sequence '~~' is interpreted as a '~'.
o The escape-to-GB sequence '~{' switches the mode from ASCII to
  GB.
o The escape sequence '~\n' is a line-continuation marker to be
  consumed with no output produced.

In GB mode, characters are interpreted two bytes at a time as (pure)
GB codes until the escape-from-GB code '~}' is read. This code
switches the mode from GB back to ASCII. (Note that the escape-
from-GB code '~}' ($7E7D) is outside the defined GB range.)

The decoding process is clear from the above description.
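
As an illustration only, the decoding rules can be sketched in Python as follows. The function name `hz_decode` is an assumption of this sketch. So is the use of Python's `gb2312` codec, which expects each GB byte with its high bit set, to turn a GB code into a printable character. Rejecting malformed input with an exception is likewise a choice made here, not something the format requires.

```python
def hz_decode(data: bytes) -> str:
    """Decode a complete HZ byte string (sketch, strict about bad input)."""
    out = []
    gb_mode = False  # the default mode is ASCII mode
    i = 0
    while i < len(data):
        b = data[i]
        if not gb_mode and b != 0x7E:
            if b > 0x7F:
                raise ValueError(f"8-bit byte at offset {i}")
            out.append(chr(b))  # plain ASCII, copied through
            i += 1
            continue
        # Every other unit is two bytes long: an escape or a GB code.
        if i + 1 >= len(data):
            raise ValueError(f"truncated two-byte unit at offset {i}")
        b2 = data[i + 1]
        if not gb_mode:
            if b2 == 0x7E:        # '~~' stands for a literal '~'
                out.append("~")
            elif b2 == 0x7B:      # '~{' switches to GB mode
                gb_mode = True
            elif b2 != 0x0A:      # '~\n' is consumed, producing no output
                raise ValueError(f"invalid escape at offset {i}")
        elif b == 0x7E and b2 == 0x7D:   # '~}' switches back to ASCII mode
            gb_mode = False
        elif 0x21 <= b <= 0x77 and 0x21 <= b2 <= 0x7E:
            # Assumption: map the GB code to text via the gb2312 codec.
            # Unassigned code points raise UnicodeDecodeError.
            out.append(bytes([b | 0x80, b2 | 0x80]).decode("gb2312"))
        else:
            raise ValueError(f"bytes outside the GB range at offset {i}")
        i += 2
    return "".join(out)
```

For example, `hz_decode(b"a~~b")` yields the three characters `a~b`, and `hz_decode(b"ab~\ncd")` yields `abcd` with no newline in the output.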

The encoding process is straightforward. Note that an (ASCII) '~' is
always encoded as '~~'. A sequence of GB codes is enclosed in '~{'
and '~}'.
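
A matching encoder might look like the sketch below. The name `hz_encode` is illustrative. The sketch assumes that every non-ASCII character in the input exists in GB2312, and it relies on Python's `gb2312` codec to obtain the GB code, clearing the high bit of each byte. It emits no line-continuation markers.

```python
def hz_encode(text: str) -> bytes:
    """Encode text as HZ without any line-length limit (sketch)."""
    out = bytearray()
    gb_mode = False
    for ch in text:
        if ord(ch) < 0x80:
            if gb_mode:
                out += b"~}"          # leave GB mode before any ASCII byte
                gb_mode = False
            # An ASCII '~' is always encoded as '~~'.
            out += b"~~" if ch == "~" else ch.encode("ascii")
        else:
            # Assumption: the gb2312 codec yields the GB code with the high
            # bit set on both bytes; it raises UnicodeEncodeError for
            # characters outside GB2312.
            pair = bytes(b & 0x7F for b in ch.encode("gb2312"))
            if not gb_mode:
                out += b"~{"          # enter GB mode once per run of GB codes
                gb_mode = True
            out += pair
    if gb_mode:
        out += b"~}"                  # never end the stream in GB mode
    return bytes(out)
```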

3. Remarks & Recommendations

We choose to encode any ASCII character except '~' as it is, rather
than as a two byte code, and we choose ASCII as the default mode for
the following reasons. The computer systems we use are ASCII based. An
HZ file containing pure ASCII characters (i.e. no Chinese characters)
except '~' is precisely a pure ASCII file. In general, the English
(ASCII) portion of an HZ file is directly readable.

The escape character '~' is chosen not only because it is commonly
used in the ASCII world, but also because '~' ($7E) is outside the
defined range ($21-$77) of the first byte of a GB code.

In ASCII mode, other potential escape sequences, i.e., two byte
sequences beginning with '~' (other than '~~', '~{', '~\n') are
currently invalid HZ sequences. Hence, they can be used for future
extension of HZ with total upward compatibility.
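
A tool that wants to report such reserved sequences, rather than reject the whole stream, could scan for them as sketched below. The function name and the return shape, a list of (offset, bytes) pairs, are assumptions of this example. The scan has to track the mode, because in GB mode a '~' may legitimately appear as the second byte of a GB code.

```python
def find_reserved_escapes(data: bytes) -> list[tuple[int, bytes]]:
    """Return (offset, sequence) for each currently invalid '~x' escape."""
    reserved = []
    gb_mode = False
    i = 0
    while i < len(data):
        if gb_mode:
            # GB mode is read two bytes at a time; only '~}' is special.
            if data[i:i + 2] == b"~}":
                gb_mode = False
            i += 2
        elif data[i] == 0x7E:
            seq = data[i:i + 2]
            if seq == b"~{":
                gb_mode = True
            elif seq not in (b"~~", b"~\n"):
                reserved.append((i, seq))   # candidate for a future extension
            i += 2
        else:
            i += 1
    return reserved
```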

The line-continuation marker '~\n' is useful if one wants to encode
long lines in the original text into short lines in this data format
without introducing extra newline characters in the decoding process.

There is no limit on the length of a line. In fact, the whole file
could be one long line or even contain no newline characters. Any
DECODER of this HZ data format should not and has no need to operate
on the concept of a line.

It is easy to write encoders and decoders for HZ. An encoder or
decoder needs to look ahead at most one character in the input data
stream.
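
The two remarks above suggest that a decoder can be driven by arbitrary chunks of input, keeping only the current mode and at most one pending byte as state. The class below is a sketch of that idea; the name `HZStreamDecoder` and its `feed` method are hypothetical, and the `gb2312` codec is again assumed for turning GB codes into text.

```python
class HZStreamDecoder:
    """Incremental HZ decoder with no notion of lines (sketch)."""

    def __init__(self) -> None:
        self.gb_mode = False   # the default mode is ASCII mode
        self.pending = None    # first byte of an unfinished two-byte unit

    def feed(self, chunk: bytes) -> str:
        """Decode one chunk; chunk boundaries may fall anywhere."""
        out = []
        for b in chunk:
            if self.pending is None:
                if self.gb_mode or b == 0x7E:
                    self.pending = b       # one more byte is needed to decide
                else:
                    out.append(chr(b))     # input is assumed to be 7-bit
                continue
            first, self.pending = self.pending, None
            if self.gb_mode:
                if (first, b) == (0x7E, 0x7D):      # '~}'
                    self.gb_mode = False
                else:
                    # Assumption: malformed or unassigned codes become U+FFFD.
                    raw = bytes([first | 0x80, b | 0x80])
                    out.append(raw.decode("gb2312", errors="replace"))
            elif b == 0x7E:                          # '~~'
                out.append("~")
            elif b == 0x7B:                          # '~{'
                self.gb_mode = True
            elif b != 0x0A:                          # '~\n' produces nothing
                raise ValueError(f"invalid escape sequence ~{chr(b)!r}")
        return "".join(out)
```

Feeding the stream one byte at a time gives the same text as feeding it in a single call, which is the property a terminal emulator would rely on.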

Given the current mode, it is also possible and easy to decode an HZ
data stream by scanning backward. One implication is that
"backspaces" can be handled correctly by a terminal emulator.

To facilitate the effective use of programs supporting line/page
skips such as "more" on UNIX with a terminal emulator understanding
the HZ format, it is RECOMMENDED that the ENCODER (which outputs in
HZ) sets a maximum line size of less than 80 characters. Since '\n'
is an ASCII character, the syntax of HZ then automatically implies
that GB codes appearing at the end of a line must be terminated with
the escape-from-GB code '~}', and the line-continuation marker '~\n'
should be inserted appropriately. The price to paid is that the
encoded file size is slightly larger.
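
One way to follow this recommendation is a greedy encoder that always keeps enough room on the current line for '~}' and the continuation '~'. The sketch below is only one possible policy. The name `hz_encode_wrapped`, the `max_len` parameter and its default of 78 are assumptions, as is the use of Python's `gb2312` codec to look up GB codes.

```python
def hz_encode_wrapped(text: str, max_len: int = 78) -> bytes:
    """Encode text as HZ with no output line longer than max_len (sketch)."""
    if max_len < 7:
        raise ValueError("max_len must leave room for '~{', one GB code, '~}~'")
    lines = []
    for src_line in text.split("\n"):
        out = bytearray()
        col = 0            # length of the current output line so far
        gb_mode = False
        for ch in src_line:
            if ord(ch) < 0x80:
                unit = b"~~" if ch == "~" else ch.encode("ascii")
                if gb_mode:
                    out += b"~}"
                    col += 2
                    gb_mode = False
                if col + len(unit) + 1 > max_len:   # keep 1 byte for '~'
                    out += b"~\n"
                    col = 0
                out += unit
                col += len(unit)
            else:
                pair = bytes(b & 0x7F for b in ch.encode("gb2312"))
                if gb_mode and col + 2 + 3 > max_len:
                    out += b"~}~\n"     # close GB mode, then continue the line
                    col = 0
                    gb_mode = False
                elif not gb_mode and col + 2 + 2 + 3 > max_len:
                    out += b"~\n"
                    col = 0
                if not gb_mode:
                    out += b"~{"
                    col += 2
                    gb_mode = True
                out += pair
                col += 2
        if gb_mode:
            out += b"~}"   # GB codes at the end of a line must be terminated
        lines.append(bytes(out))
    return b"\n".join(lines)
```

A real newline in the input ends the output line as it is, while a break introduced by the encoder is always preceded by '~' so that a decoder removes it again.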

It is important to understand the following distinction. Note that
the above recommendation does NOT change the HZ format. It is simply
an encoding "style" which follows the syntax of HZ. Note that this
"style" is not built into HZ. It is an additional convention built
"on top of" HZ. Other applications may require different "styles",
but the same basic HZ DECODER will always work. The essence of HZ is
to provide such a flexible basic data format for files of arbitrarily
mixed Chinese and ASCII characters.

4. Examples

To illustrate the "stylistic" issue of HZ encoding, we give the
following three examples of encoded text, which should produce the
same decoded output. (The recommendation in the last section refers
to Example 2.)

Example 1: (Suppose there is no line size limit.)
This sentence is in ASCII.
The next sentence is in GB.~{<:Ky2;S{#,NpJ)l6HK!#~}Bye.

Example 2: (Suppose the maximum line size is 42.)
This sentence is in ASCII.
The next sentence is in GB.~{<:Ky2;S{#,~}~
~{NpJ)l6HK!#~}Bye.

Example 3: (Suppose a new line is started whenever there is a mode
switch.)
This sentence is in ASCII.
The next sentence is in GB.~
~{<:Ky2;S{#,NpJ)l6HK!#~}~
Bye.
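
The claim that all three examples decode to the same text can be checked mechanically. The sketch below uses the `hz` codec from Python's standard library rather than a hand-written decoder. That codec is an independent implementation, so agreement here is a sanity check and not a proof about HZ itself. The variable names are illustrative.

```python
# The three encoded texts from the examples above, as byte strings.
ex1 = (b"This sentence is in ASCII.\n"
       b"The next sentence is in GB.~{<:Ky2;S{#,NpJ)l6HK!#~}Bye.")
ex2 = (b"This sentence is in ASCII.\n"
       b"The next sentence is in GB.~{<:Ky2;S{#,~}~\n"
       b"~{NpJ)l6HK!#~}Bye.")
ex3 = (b"This sentence is in ASCII.\n"
       b"The next sentence is in GB.~\n"
       b"~{<:Ky2;S{#,NpJ)l6HK!#~}~\n"
       b"Bye.")

# Decode each example with the standard-library codec and compare.
decoded = [ex.decode("hz") for ex in (ex1, ex2, ex3)]
all_equal = decoded[0] == decoded[1] == decoded[2]
print(all_equal)
```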

Acknowledgement

Edmund Lai was the first one who brought my attention to this topic.
Discussions with Ed, Tin-Fook Ngai, Yagui Wei and Ricky Yeung were
very helpful in shaping the ideas in this article. Thanks to Tin-Fook
for his careful review of the draft and numerous interesting
suggestions.

References

[1] Fung Fung Lee, "HZ - A Data Format for Exchanging Files of
Arbitrarily Mixed Chinese and ASCII Characters," September 4,
1989.
As part of //ftp.ifcss.org/software/unix/convert/HZ-2.0.tar.gz

Security Considerations

Security issues are not addressed in this memo.

Author's Address

Fung Fung Lee
Computer Systems Laboratory
Stanford University
Stanford, CA 94309

Phone: +1 415 723 1450