Apache POI - HPSF Internals

HPSF Internals

Introduction

A Microsoft Office document is internally organized like a filesystem with directories and files. Microsoft calls these files streams. A document can have properties attached to it, like author, title, number of words etc. These metadata are not stored in the main stream of, say, a Word document, but instead in a dedicated stream with a special format. Usually this stream's name is \005SummaryInformation, where \005 represents the character with a decimal value of 5.

A single piece of information in the stream is called a property, for example the document title. Each property has an integral ID (e.g. 2 for title), a type (telling that the title is a string of bytes) and a value (what this is should be obvious). A stream containing properties is called a property set stream.

This document describes the internal structure of a property set stream, i.e. the HPSF. It does not describe how a Microsoft Office document is organized internally and how to retrieve a stream from it. See the POIFS documentation for that kind of stuff.

The HPSF is not only used in the Summary Information stream in the top-level document of a Microsoft Office document. Often there is also a property set stream named \005DocumentSummaryInformation with additional properties. Embedded documents may have their own property set streams. You cannot tell by a stream's name whether it is a property set stream or not. Instead you have to open the stream and look at its bytes.

Data Types

Before delving into the details of the property set stream format we have to take a short look at data types. Integral values are stored in the so-called little endian format. In this format the bytes that make up an integral value are stored in the "wrong" order. For example, the decimal value 4660 is 0x1234 in hexadecimal notation. If you think this should be represented by a byte 0x12 followed by another byte 0x34, you are right. This is called the big endian format. In the little endian format, however, this order is reversed and the low-value byte comes first: 0x34 followed by 0x12, written here as 0x3412.

The following table gives an overview of some important data types:

| Name | Length | Example (Big Endian) | Example (Little Endian) |
| --- | --- | --- | --- |
| Bytes | 1 byte | 0x12 | 0x12 |
| Word | 2 bytes | 0x1234 | 0x3412 |
| DWord | 4 bytes | 0x12345678 | 0x78563412 |
| ClassID (a sequence of one DWord, two Words and eight Bytes) | 16 bytes | 0xE0859FF2F94F6810AB9108002B27B3D9 resp. E0859FF2-F94F-6810-AB-91-08-00-2B-27-B3-D9 | 0xF29F85E04FF91068AB9108002B27B3D9 resp. F29F85E0-4FF9-1068-AB-91-08-00-2B-27-B3-D9 |

The ClassID examples are given here in two different notations. The second notation, without the "0x" at the beginning and with dashes inside, shows the internal grouping into one DWord, two Words and eight Bytes. Watch out: Microsoft documentation and tools show class IDs a little bit differently, like F29F85E0-4FF9-1068-AB91-08002B27B3D9. However, that representation is (intentionally?) misleading with respect to endianness.
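The byte reordering shown in the table can be reproduced with a few lines of Python. This is only a sketch: `swap_classid` and `dashed` are hypothetical helper names introduced for illustration, and the code simply follows the grouping described above (one DWord, two Words, eight Bytes).

```python
import struct

# Little endian: the low-value byte comes first.
word_bytes = struct.pack("<H", 0x1234)        # bytes 0x34, 0x12
dword_bytes = struct.pack("<I", 0x12345678)   # bytes 0x78, 0x56, 0x34, 0x12


def swap_classid(raw: bytes) -> bytes:
    """Swap a 16-byte ClassID between the two notations of the table.

    Hypothetical helper: the DWord and both Words are byte-reversed,
    the trailing eight Bytes stay as they are.
    """
    if len(raw) != 16:
        raise ValueError("a ClassID is exactly 16 bytes long")
    dword, word1, word2 = struct.unpack_from("<IHH", raw, 0)
    return struct.pack(">IHH", dword, word1, word2) + raw[8:]


def dashed(raw: bytes) -> str:
    """Render 16 bytes grouped as DWord-Word-Word-8 Bytes."""
    head = f"{raw[0:4].hex()}-{raw[4:6].hex()}-{raw[6:8].hex()}"
    tail = "-".join(f"{b:02x}" for b in raw[8:])
    return f"{head}-{tail}".upper()


big = bytes.fromhex("E0859FF2F94F6810AB9108002B27B3D9")
little = swap_classid(big)
print(word_bytes.hex(), dword_bytes.hex())
print(dashed(big))
print(dashed(little))
```

Because the swap only reverses bytes inside fixed groups, applying it twice yields the original sequence again.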

HPSF Overview

A property set stream consists of three main parts:

  1. The header,
  2. the section list and
  3. the section(s) containing the properties.

The Header

The first bytes in a property set stream are the header. It has a fixed length and looks like this:

| Offset | Type | Contents | Remarks |
| --- | --- | --- | --- |
| 0 | Word | 0xFFFE | If the first four bytes of a stream do not contain these values, the stream is not a property set stream. |
| 2 | Word | 0x0000 | |
| 4 | DWord | | Denotes the operating system and the OS version under which this stream was created. The operating system ID is in the DWord's higher word (after little endian decoding): 0x0000 for Win16, 0x0001 for Macintosh and 0x0002 for Win32 - that's all. The reader is most likely aware of the fact that there are some more operating systems. However, Microsoft does not seem to know. |
| 8 | ClassID | 0x00000000000000000000000000000000 | Most property set streams have this value but this is not required. |
| 24 | DWord | 0x01000000 (the little endian form of 1) or greater | Section count. This field's value should be equal to 1 or greater. Microsoft claims that this is a "reserved" field, but it seems to tell how many sections (see below) are following in the stream. This would really make sense because otherwise you could not know where and how far you should read section data. |
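As a rough illustration of the header layout, the following Python sketch decodes the 28 header bytes and performs the "first four bytes" check described above. The function names and the returned dictionary keys are made up for this example; they are not part of HPSF or of any library.

```python
import struct

HEADER_LENGTH = 28  # 2 + 2 + 4 + 16 + 4 bytes, per the table above
OS_NAMES = {0x0000: "Win16", 0x0001: "Macintosh", 0x0002: "Win32"}


def is_property_set_stream(stream: bytes) -> bool:
    """Check the two leading Words (0xFFFE, then 0x0000)."""
    if len(stream) < HEADER_LENGTH:
        return False
    byte_order, fmt = struct.unpack_from("<HH", stream, 0)
    return byte_order == 0xFFFE and fmt == 0x0000


def parse_header(stream: bytes) -> dict:
    """Decode the fixed-length header (hypothetical helper)."""
    if not is_property_set_stream(stream):
        raise ValueError("not a property set stream")
    os_dword, class_id, section_count = struct.unpack_from("<I16sI", stream, 4)
    return {
        "os_id": os_dword >> 16,         # higher word: operating system ID
        "os_version": os_dword & 0xFFFF, # lower word: OS version
        "class_id": class_id,            # raw 16 bytes, not byte-swapped
        "section_count": section_count,
    }


# Hand-made sample header: Win32, all-zero ClassID, one section.
sample = (
    bytes.fromhex("FE FF 00 00 05 01 02 00")
    + bytes(16)
    + bytes.fromhex("01 00 00 00")
)
header = parse_header(sample)
print(OS_NAMES.get(header["os_id"], "unknown"), header["section_count"])
```

The sample bytes are invented for the example; a real stream would be obtained through POIFS.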

Section List

Following the header is the section list. This is an array of pairs each consisting of a section format ID and an offset. This array has as many pairs of ClassID and DWord fields as the section count field in the header says. The Summary Information stream contains a single section, the Document Summary Information stream contains two.

| Type | Contents | Remarks |
| --- | --- | --- |
| ClassID | Section format ID | 0xF29F85E04FF91068AB9108002B27B3D9 for the single section in the Summary Information stream. 0xD5CDD5022E9C101B939708002B2CF9AE for the first section in the Document Summary Information stream. |
| DWord | Offset | The number of bytes between the beginning of the stream and the beginning of the section within the stream. |
| ClassID | Section format ID | ... |
| DWord | Offset | ... |
| ... | ... | ... |
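Reading the section list could look roughly like the sketch below. It assumes the section list starts directly after the 28-byte header and that the section count has already been read from the header; `parse_section_list` is a hypothetical name, and the format IDs are returned as the raw 16 bytes found in the stream without any byte swapping.

```python
import struct

HEADER_LENGTH = 28   # the section list starts right after the header
ENTRY_LENGTH = 20    # 16-byte ClassID + 4-byte DWord offset


def parse_section_list(stream: bytes, section_count: int) -> list:
    """Return (format_id_bytes, section_offset) pairs (hypothetical helper)."""
    entries = []
    for index in range(section_count):
        position = HEADER_LENGTH + index * ENTRY_LENGTH
        format_id, offset = struct.unpack_from("<16sI", stream, position)
        entries.append((format_id, offset))
    return entries


# Hand-made sample: a dummy header followed by one section list entry
# whose section would start at byte 48, right behind the list.
raw_format_id = bytes.fromhex("E0859FF2F94F6810AB9108002B27B3D9")
sample = bytes(HEADER_LENGTH) + raw_format_id + struct.pack("<I", 48)
entries = parse_section_list(sample, 1)
print(entries[0][0].hex(), entries[0][1])
```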

Section

A section is divided into three parts: the section header (with the section length and the number of properties in the section), the properties list (with type and offset of each property), and the properties themselves. Here are the details:

| Part | Type | Contents | Remarks |
| --- | --- | --- | --- |
| Section header | DWord | Length | The length of the section in bytes. |
| | DWord | Property count | The number of properties in the section. |
| Properties list | DWord | Property ID | The property ID tells what the property means. For example, an ID of 0x0002 in the Summary Information stands for the document's title. See the Property IDs chapter below for more details. |
| | DWord | Offset | The number of bytes between the beginning of the section and the property. |
| | ... | ... | ... |
| Properties | DWord | Property type ("variant") | This is the property's data type, e.g. an integer value, a byte string or a Unicode string. See the Property Types chapter for details! |
| | Field length depends on the property type ("variant") | Property value | This field's length depends on the property's type. These are the bytes that make up the DWord, the byte string or some other data of fixed or variable length. The property value is always stored in an area whose length is a multiple of 4. If the property is shorter, e.g. a byte string of 13 bytes, the remaining bytes are padded with 0x00 bytes. |
| | ... | ... | ... |
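The section layout can be walked with a short Python sketch. It only locates each property and reads its type DWord; decoding the value is left out. `parse_section` and `padded_length` are hypothetical helpers, and the sample section is hand-made for illustration.

```python
import struct


def padded_length(length: int) -> int:
    """Round a value length up to the next multiple of 4."""
    return (length + 3) & ~3


def parse_section(stream: bytes, section_offset: int) -> dict:
    """Map property ID -> (variant type, absolute offset of the value)."""
    length, count = struct.unpack_from("<II", stream, section_offset)
    properties = {}
    for index in range(count):
        entry = section_offset + 8 + index * 8
        pid, relative = struct.unpack_from("<II", stream, entry)
        # Property offsets count from the beginning of the section.
        start = section_offset + relative
        (variant_type,) = struct.unpack_from("<I", stream, start)
        properties[pid] = (variant_type, start + 4)
    return {"length": length, "properties": properties}


# Hand-made section: one property with ID 2, type 3 (VT_I4), value 42.
section = (
    struct.pack("<II", 24, 1)      # section header: length, property count
    + struct.pack("<II", 2, 16)    # properties list: ID 2 at offset 16
    + struct.pack("<Ii", 3, 42)    # property: type DWord, then the value
)
parsed = parse_section(section, 0)
variant_type, value_offset = parsed["properties"][2]
print(variant_type, struct.unpack_from("<i", section, value_offset)[0])
print(padded_length(13))  # a 13-byte string occupies 16 bytes
```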

Property IDs

As seen above, a section holds a property list: an array with property IDs and offsets. The property ID gives each property a meaning. For example, in the Summary Information stream the property ID 2 says that this property is the document's title.

If you want to know a property ID's meaning, it is not sufficient to know the ID itself. You must also know the section format ID. For example, in the Document Summary Information stream the property ID 2 means not the document's title but its category. Due to Microsoft's infinite wisdom the section format ID is not part of the section. Thus if you have only a section without the stream it is in, you cannot make any sense of the properties because you do not know what they mean.

So each section format ID has its own namespace of property IDs. Microsoft defined some "well-known" property IDs for the Summary Information and the Document Summary Information streams. You can extend them with your own additional IDs. This will be described below.

Property IDs in The Summary Information Stream

The Summary Information stream has a single section with a section format ID of 0xF29F85E04FF91068AB9108002B27B3D9. The following table defines the meaning of its property IDs. Each row associates a property ID with a name and an ID string. (The property type is given here just for informational purposes. As we have seen above, the type is always given along with the value.)

The property name is a readable string which could be displayed to the user. However, this string is useful only for users who understand English. The property name does not help with other languages.

The property ID string is about the same but looks more technical and is nothing a user should bother with. You could take the ID string and map it to an appropriate display string in a particular language. Of course you could do that with the property ID as well and with less overhead, but people (including software developers) tend to be better at remembering symbolic constants than at remembering numbers.

| Property ID | Property Name | Property ID String | Property Type |
| --- | --- | --- | --- |
| 2 | Title | PID_TITLE | VT_LPSTR |
| 3 | Subject | PID_SUBJECT | VT_LPSTR |
| 4 | Author | PID_AUTHOR | VT_LPSTR |
| 5 | Keywords | PID_KEYWORDS | VT_LPSTR |
| 6 | Comments | PID_COMMENTS | VT_LPSTR |
| 7 | Template | PID_TEMPLATE | VT_LPSTR |
| 8 | Last Saved By | PID_LASTAUTHOR | VT_LPSTR |
| 9 | Revision Number | PID_REVNUMBER | VT_LPSTR |
| 10 | Total Editing Time | PID_EDITTIME | VT_FILETIME |
| 11 | Last Printed | PID_LASTPRINTED | VT_FILETIME |
| 12 | Create Time/Date | PID_CREATE_DTM | VT_FILETIME |
| 13 | Last Saved Time/Date | PID_LASTSAVE_DTM | VT_FILETIME |
| 14 | Number of Pages | PID_PAGECOUNT | VT_I4 |
| 15 | Number of Words | PID_WORDCOUNT | VT_I4 |
| 16 | Number of Characters | PID_CHARCOUNT | VT_I4 |
| 17 | Thumbnail | PID_THUMBNAIL | VT_CF |
| 18 | Name of Creating Application | PID_APPNAME | VT_LPSTR |
| 19 | Security | PID_SECURITY | VT_I4 |

Property IDs in The Document Summary Information Stream

The Document Summary Information stream has two sections with a section format ID of 0xD5CDD5022E9C101B939708002B2CF9AE for the first one. The following table defines the meaning of the property IDs in the first section. See the preceding section for interpreting the table.

| Property ID | Property name | Property ID string | VT type |
| --- | --- | --- | --- |
| 0 | Dictionary | PID_DICTIONARY | [Special format] |
| 1 | Code page | PID_CODEPAGE | VT_I2 |
| 2 | Category | PID_CATEGORY | VT_LPSTR |
| 3 | PresentationTarget | PID_PRESFORMAT | VT_LPSTR |
| 4 | Bytes | PID_BYTECOUNT | VT_I4 |
| 5 | Lines | PID_LINECOUNT | VT_I4 |
| 6 | Paragraphs | PID_PARCOUNT | VT_I4 |
| 7 | Slides | PID_SLIDECOUNT | VT_I4 |
| 8 | Notes | PID_NOTECOUNT | VT_I4 |
| 9 | HiddenSlides | PID_HIDDENCOUNT | VT_I4 |
| 10 | MMClips | PID_MMCLIPCOUNT | VT_I4 |
| 11 | ScaleCrop | PID_SCALE | VT_BOOL |
| 12 | HeadingPairs | PID_HEADINGPAIR | VT_VARIANT \| VT_VECTOR |
| 13 | TitlesOfParts | PID_DOCPARTS | VT_LPSTR \| VT_VECTOR |
| 14 | Manager | PID_MANAGER | VT_LPSTR |
| 15 | Company | PID_COMPANY | VT_LPSTR |
| 16 | LinksUpToDate | PID_LINKSDIRTY | VT_BOOL |
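Since a property ID only has a meaning together with its section format ID, a lookup has to be keyed on both. The sketch below illustrates this with a few rows taken from the two tables above; the dictionary layout and the function name `property_id_string` are assumptions made for this example, and the format IDs are written as hex strings in the notation used in this document.

```python
SUMMARY_INFORMATION = "F29F85E04FF91068AB9108002B27B3D9"
DOCUMENT_SUMMARY_INFORMATION = "D5CDD5022E9C101B939708002B2CF9AE"

# One namespace of property IDs per section format ID
# (only a few rows of each table are reproduced here).
PROPERTY_ID_STRINGS = {
    SUMMARY_INFORMATION: {
        2: "PID_TITLE",
        3: "PID_SUBJECT",
        4: "PID_AUTHOR",
    },
    DOCUMENT_SUMMARY_INFORMATION: {
        2: "PID_CATEGORY",
        3: "PID_PRESFORMAT",
        15: "PID_COMPANY",
    },
}


def property_id_string(format_id: str, pid: int) -> str:
    """Resolve a property ID within the namespace of its section format ID."""
    namespace = PROPERTY_ID_STRINGS.get(format_id.upper(), {})
    return namespace.get(pid, f"unknown property ID {pid}")


print(property_id_string(SUMMARY_INFORMATION, 2))
print(property_id_string(DOCUMENT_SUMMARY_INFORMATION, 2))
```

The same numeric ID 2 resolves to two different ID strings, which is exactly why a section without its format ID cannot be interpreted.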

Property Types

A property consists of a DWord type field followed by the property value. The property type is an integer value and tells how the data bytes following it are to be interpreted. In the Microsoft world it is also known as the variant.

The Usage column says where a variant type may occur. Not all of them are allowed in a property set but just those marked with a [P]. [V] - may appear in a VARIANT, [T] - may appear in a TYPEDESC, [P] - may appear in an OLE property set, [S] - may appear in a Safe Array.

| Variant ID | Variant Type | Usage | Description |
| --- | --- | --- | --- |
| 0 | VT_EMPTY | [V] [P] | nothing |
| 1 | VT_NULL | [V] [P] | SQL style Null |
| 2 | VT_I2 | [V] [T] [P] [S] | 2 byte signed int |
| 3 | VT_I4 | [V] [T] [P] [S] | 4 byte signed int |
| 4 | VT_R4 | [V] [T] [P] [S] | 4 byte real |
| 5 | VT_R8 | [V] [T] [P] [S] | 8 byte real |
| 6 | VT_CY | [V] [T] [P] [S] | currency |
| 7 | VT_DATE | [V] [T] [P] [S] | date |
| 8 | VT_BSTR | [V] [T] [P] [S] | OLE Automation string |
| 9 | VT_DISPATCH | [V] [T] [P] [S] | IDispatch * |
| 10 | VT_ERROR | [V] [T] [S] | SCODE |
| 11 | VT_BOOL | [V] [T] [P] [S] | True=-1, False=0 |
| 12 | VT_VARIANT | [V] [T] [P] [S] | VARIANT * |
| 13 | VT_UNKNOWN | [V] [T] [S] | IUnknown * |
| 14 | VT_DECIMAL | [V] [T] [S] | 16 byte fixed point |
| 16 | VT_I1 | [T] | signed char |
| 17 | VT_UI1 | [V] [T] [P] [S] | unsigned char |
| 18 | VT_UI2 | [T] [P] | unsigned short |
| 19 | VT_UI4 | [T] [P] | unsigned long (4 byte unsigned int) |
| 20 | VT_I8 | [T] [P] | signed 64-bit int |
| 21 | VT_UI8 | [T] [P] | unsigned 64-bit int |
| 22 | VT_INT | [T] | signed machine int |
| 23 | VT_UINT | [T] | unsigned machine int |
| 24 | VT_VOID | [T] | C style void |
| 25 | VT_HRESULT | [T] | Standard return type |
| 26 | VT_PTR | [T] | pointer type |
| 27 | VT_SAFEARRAY | [T] | (use VT_ARRAY in VARIANT) |
| 28 | VT_CARRAY | [T] | C style array |
| 29 | VT_USERDEFINED | [T] | user defined type |
| 30 | VT_LPSTR | [T] [P] | null terminated string |
| 31 | VT_LPWSTR | [T] [P] | wide null terminated string |
| 64 | VT_FILETIME | [P] | FILETIME |
| 65 | VT_BLOB | [P] | Length prefixed bytes |
| 66 | VT_STREAM | [P] | Name of the stream follows |
| 67 | VT_STORAGE | [P] | Name of the storage follows |
| 68 | VT_STREAMED_OBJECT | [P] | Stream contains an object |
| 69 | VT_STORED_OBJECT | [P] | Storage contains an object |
| 70 | VT_BLOB_OBJECT | [P] | Blob contains an object |
| 71 | VT_CF | [P] | Clipboard format |
| 72 | VT_CLSID | [P] | A Class ID |
| 0x1000 | VT_VECTOR | [P] | simple counted array |
| 0x2000 | VT_ARRAY | [V] | SAFEARRAY* |
| 0x4000 | VT_BYREF | [V] | void* for local use |
| 0x8000 | VT_RESERVED | | |
| 0xFFFF | VT_ILLEGAL | | |
| 0xFFF | VT_ILLEGALMASKED | | |
| 0xFFF | VT_TYPEMASK | | |
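The following sketch shows how a type DWord might be split into its base type and the VT_VECTOR flag, and how two simple property types could be decoded. Several points are assumptions not stated in this document: that a VT_LPSTR value is stored as a DWord byte count followed by the bytes including the terminating 0x00, and that the string bytes use code page 1252 (in a real stream the code page would come from the code page property). `read_property` is a hypothetical helper and handles only three types.

```python
import struct

VT_I2, VT_I4, VT_LPSTR = 2, 3, 30
VT_VECTOR = 0x1000
VT_TYPEMASK = 0x0FFF


def split_variant_type(raw_type: int) -> tuple:
    """Return (base type, is_vector) for a type DWord."""
    return raw_type & VT_TYPEMASK, bool(raw_type & VT_VECTOR)


def read_property(data: bytes, offset: int, encoding: str = "cp1252"):
    """Decode one property starting at its type DWord (sketch only)."""
    (raw_type,) = struct.unpack_from("<I", data, offset)
    base_type, is_vector = split_variant_type(raw_type)
    if is_vector:
        raise NotImplementedError("vectors are not handled in this sketch")
    value_offset = offset + 4
    if base_type == VT_I2:
        return struct.unpack_from("<h", data, value_offset)[0]
    if base_type == VT_I4:
        return struct.unpack_from("<i", data, value_offset)[0]
    if base_type == VT_LPSTR:
        # Assumed layout: DWord byte count, then the bytes with a 0x00 end.
        (count,) = struct.unpack_from("<I", data, value_offset)
        raw = data[value_offset + 4:value_offset + 4 + count]
        return raw.split(b"\x00", 1)[0].decode(encoding)
    raise NotImplementedError(f"variant type {base_type} is not handled")


# Hand-made samples; the string is padded with 0x00 to a multiple of 4.
int_property = struct.pack("<Ii", VT_I4, -7)
string_property = struct.pack("<II", VT_LPSTR, 6) + b"Title\x00" + b"\x00\x00"
print(read_property(int_property, 0))
print(read_property(string_property, 0))
print(split_variant_type(VT_LPSTR | VT_VECTOR))
```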

The Dictionary

What a dictionary is good for is explained in the HPSF HOW-TO. This chapter explains how it is organized internally.

The dictionary has a simple header consisting of a single UInt value. It tells how many entries the dictionary comprises:

| Name | Data type | Description |
| --- | --- | --- |
| nrEntries | UInt | Number of dictionary entries |

The dictionary entries follow the header. Each one looks like this:

| Name | Data type | Description |
| --- | --- | --- |
| key | UInt | The unique number of this property, i.e. the PID |
| length | UInt | The length of the property name associated with the key |
| value | String | The property's name, terminated with a 0x00 character |

The entries are not aligned, i.e. each one follows its predecessor without any gap or fill characters.
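A dictionary with this layout could be read as sketched below. The sketch assumes that a UInt is a 4-byte little endian unsigned integer, that the length field counts the bytes of the name including the terminating 0x00, and that names are stored in a single-byte code page (1252 is used here purely as an example). `parse_dictionary` is a hypothetical helper.

```python
import struct


def parse_dictionary(data: bytes, offset: int = 0, encoding: str = "cp1252") -> dict:
    """Map PID -> property name (sketch under the stated assumptions)."""
    (entry_count,) = struct.unpack_from("<I", data, offset)
    position = offset + 4
    names = {}
    for _ in range(entry_count):
        key, length = struct.unpack_from("<II", data, position)
        position += 8
        raw = data[position:position + length]
        names[key] = raw.split(b"\x00", 1)[0].decode(encoding)
        # Entries are not aligned: the next one starts immediately.
        position += length
    return names


# Hand-made dictionary with two entries of different, unpadded lengths.
sample = (
    struct.pack("<I", 2)
    + struct.pack("<II", 2, 4) + b"Foo\x00"
    + struct.pack("<II", 3, 11) + b"LongerName\x00"
)
print(parse_dictionary(sample))
```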

References

In order to assemble the HPSF description I used only information publicly available on the Internet. The references given below have been very helpful. If you have any amendments or corrections, please let us know! Thank you!

  1. In Understanding OLE documents, Ken Kyler gives an introduction to OLE2 documents and especially to property sets. He names the property names, types, and IDs of the Summary Information and Document Summary Information stream.
  2. The ActiveX Programmer's Reference at http://www.dwam.net/docs/oleref/ seems a little outdated, but that's what I have found.
  3. An overview of the VT_ types is in Variant Type Definitions.
  4. What is a FILETIME? The answer can be found under http://www.vbapi.com/ref/f/filetime.html or http://www.cs.rpi.edu/courses/fall01/os/FILETIME.html. In short: The FILETIME structure holds a date and time associated with a file. The structure represents a 64-bit integer specifying the number of 100-nanosecond intervals which have passed since January 1, 1601. This 64-bit value is split into the two dwords stored in the structure.
  5. Microsoft provides some public information in the MSDN Library. Use the search function to try to find what you are looking for, e.g. "codepage" or "document summary information" etc.
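The FILETIME description in reference 4 can be turned into a small conversion routine. The sketch below assumes that the low dword is stored first, that both dwords are little endian, and that the value is to be interpreted as UTC; `filetime_to_datetime` is a hypothetical helper name.

```python
import struct
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond intervals since January 1, 1601.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(raw: bytes) -> datetime:
    """Convert the 8 bytes of a FILETIME into a datetime (UTC assumed)."""
    low, high = struct.unpack("<II", raw)
    intervals = (high << 32) | low
    # Ten 100-nanosecond intervals make one microsecond.
    return FILETIME_EPOCH + timedelta(microseconds=intervals // 10)


# 116444736000000000 intervals after the FILETIME epoch is 1970-01-01.
unix_epoch_raw = struct.pack("<Q", 116444736000000000)
print(filetime_to_datetime(unix_epoch_raw).isoformat())
```

Note that `timedelta` has microsecond resolution, so the last decimal digit of the interval count is dropped by the integer division.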

by Rainer Klute

 