HPSF HOW-TO

How To Use the HPSF API

This HOW-TO is organized in five sections. You should read them sequentially because the later sections build upon the earlier ones.

The first section explains how to read the most important standard properties of a Microsoft Office document. Standard properties are things like title, author, creation date, etc. It is quite likely that you will find what you need here and won't have to read the other sections.
The second section goes a small step further and focuses on reading additional standard properties. It also talks about exceptions that may be thrown when dealing with HPSF and shows how you can read properties of embedded objects.
The third section explains how to write standard properties. HPSF provides some high-level classes and methods which make writing standard properties easy. They are based on the low-level writing functions explained in the fifth section.
The fourth section tells how to read non-standard properties. Non-standard properties are application-specific triples consisting of an ID, a type, and a value.
The fifth section tells you how to write property set streams using HPSF's low-level methods. You should understand the fourth section before you think about writing properties at the low level. Check the Javadoc API documentation to find out about the details!

Note

Please note: HPSF's writing functionality is not present in POI releases up to and including 2.5. In order to write properties you have to download a 3.0.x POI release, or retrieve the POI development version from the Subversion repository.

Reading Standard Properties

Note

This section explains how to read the most important standard properties of a Microsoft Office document. Standard properties are things like title, author, creation date, etc. This section introduces the summary information stream, which is used to keep these properties. Chances are that you will find what you need here and won't have to read the other sections.

If all you are interested in is getting the textual content of all the document properties, e.g. for full-text indexing, take a look at org.apache.poi.hpsf.extractor.HPSFPropertiesExtractor. However, if you want full access to the properties, please read on!

The first thing you should understand is that a Microsoft Office file is not one large bunch of bytes. It has an internal filesystem structure with files and directories. You can access these files and directories using the API that the POI filesystem (POIFS) component provides. A file or document in a POI filesystem is also called a stream. The properties of, say, an Excel document are stored apart from the actual spreadsheet data, in separate streams. The good news is that this separation makes the properties independent of the concrete type of Microsoft Office file. In the following text we will always say "POI filesystem" instead of "Microsoft Office file" for three reasons: a POI filesystem is not necessarily created by or for a Microsoft Office application, the term is shorter, and we want to avoid the name of That Redmond Company.

The following example shows how to read the "title" property. Reading other properties is similar. See the API documentation of the class org.apache.poi.hpsf.SummaryInformation to learn which methods are available.

The standard properties this section focuses on can be found in a document called \005SummaryInformation located in the root of the POI filesystem. The notation \005 in the document's name means the character with a decimal value of 5. In order to read the "title" property, an application has to perform the following steps:

Open the document \005SummaryInformation located in the root of the POI filesystem.
Create an instance of the class SummaryInformation from that document.
Call the SummaryInformation instance's getTitle() method.
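A note on the \005 notation used in step 1: in Java source code it maps directly onto a string escape. "\005" is an octal escape for a single character with value 5, and it is equivalent to "\u0005". The following stdlib-only sketch makes this concrete. The class StreamNames and its printable() helper are hypothetical and not part of HPSF. For the document summary information stream, HPSF itself provides the constant DocumentSummaryInformation.DEFAULT_STREAM_NAME.

```java
// Hypothetical helper, not HPSF API: shows what the "\005" prefix of
// property set stream names looks like in Java source.
final class StreamNames {
    // "\005" is an octal escape: ONE character whose value is 5.
    static final String SUMMARY = "\005SummaryInformation";
    // The same kind of prefix, written with a Unicode escape instead.
    static final String DOC_SUMMARY = "\u0005DocumentSummaryInformation";

    /** Renders control characters as backslash + 3 octal digits so names print readably. */
    static String printable(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c < 0x20) {
                sb.append(String.format("\\%03o", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(SUMMARY.length());         // 19: one control char + 18 letters
        System.out.println((int) SUMMARY.charAt(0));  // 5
        System.out.println(printable(DOC_SUMMARY));   // \005DocumentSummaryInformation
    }
}
```

Printing a raw stream name to a terminal would show an invisible control character. A helper like printable() can therefore be handy in log output.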

Sounds easy, doesn't it? Here are the steps in detail.

Open the document \005SummaryInformation in the root of the POI filesystem

An application that wants to open a document in a POI filesystem (POIFS) proceeds as shown by the following code fragment. The full source code of the sample application is available in the examples section of the POI source tree as ReadTitle.java.

 
    import java.io.*;

    import org.apache.poi.hpsf.*;
    import org.apache.poi.poifs.eventfilesystem.*;

    // ...

    public static void main(String[] args)
        throws IOException
    {
        final String filename = args[0];
        POIFSReader r = new POIFSReader();
        r.registerListener(new MyPOIFSReaderListener(),
                           "\005SummaryInformation");
        r.read(new FileInputStream(filename));
    }

The first interesting statement is

    POIFSReader r = new POIFSReader();

It creates an org.apache.poi.poifs.eventfilesystem.POIFSReader instance, which we need to read the POI filesystem. Before the application actually opens the POI filesystem, we have to tell the POIFSReader which documents we are interested in. In this case the application should do something with the document \005SummaryInformation.

 
    r.registerListener(new MyPOIFSReaderListener(),
                       "\005SummaryInformation");

This method call registers an org.apache.poi.poifs.eventfilesystem.POIFSReaderListener with the POIFSReader. The POIFSReaderListener interface specifies the method processPOIFSReaderEvent(), which processes a document. The class MyPOIFSReaderListener implements the POIFSReaderListener interface and thus the processPOIFSReaderEvent() method. The eventing POI filesystem calls this method when it finds the \005SummaryInformation document. In the sample application, MyPOIFSReaderListener is a static nested class in the ReadTitle.java source file.

Now everything is prepared and reading the POI filesystem can start:

    r.read(new FileInputStream(filename));

The following source code fragment shows the MyPOIFSReaderListener class and how it retrieves the title.

 
    static class MyPOIFSReaderListener implements POIFSReaderListener
    {
        public void processPOIFSReaderEvent(POIFSReaderEvent event)
        {
            SummaryInformation si = null;
            try
            {
                si = (SummaryInformation)
                    PropertySetFactory.create(event.getStream());
            }
            catch (Exception ex)
            {
                throw new RuntimeException
                    ("Property set stream \"" +
                     event.getPath() + event.getName() + "\": " + ex);
            }
            final String title = si.getTitle();
            if (title != null)
                System.out.println("Title: \"" + title + "\"");
            else
                System.out.println("Document has no title.");
        }
    }

The line

    SummaryInformation si = null;

declares a SummaryInformation variable and initializes it with null. We need an instance of this class to access the title. The instance is created in a try block:

    si = (SummaryInformation)
        PropertySetFactory.create(event.getStream());

The expression event.getStream() returns the input stream containing the bytes of the property set stream named \005SummaryInformation. This stream is passed into the create() method of the factory class org.apache.poi.hpsf.PropertySetFactory, which returns an org.apache.poi.hpsf.PropertySet instance. It is more or less safe to cast this result to SummaryInformation, a convenience class with methods like getTitle(), getAuthor(), etc. The cast works as long as the stream really contains summary information, which its name suggests but does not guarantee.

The PropertySetFactory.create() method may throw all sorts of exceptions. We'll deal with them in the next section. For now we just catch all exceptions and throw a RuntimeException containing the text of the original exception.

If all goes well, the sample application retrieves the title and prints it to the standard output. As you can see, you must be prepared for the case that the POI filesystem does not have a title.

 
    final String title = si.getTitle();
    if (title != null)
        System.out.println("Title: \"" + title + "\"");
    else
        System.out.println("Document has no title.");

Please note that a POI filesystem does not necessarily contain the \005SummaryInformation stream. As far as I know, the documents created by the Microsoft Office suite have one. However, an Excel spreadsheet exported from StarOffice 5.2 won't have a \005SummaryInformation stream. In this case the POIFSReader won't throw an exception. It simply never calls the processPOIFSReaderEvent() method. You have been warned!
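A missing stream produces no error at all. An application that needs to know whether the stream was found therefore has to track that itself. The following stdlib-only sketch simulates the register-by-name dispatch described above with a hypothetical MiniEventReader class. It is not POI's POIFSReader and uses a plain Map in place of a POI filesystem. It detects the missing stream with a flag that only the listener sets.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an event-driven reader (NOT POI's POIFSReader):
// listeners are registered per document name and are called only for
// documents that are actually present.
final class MiniEventReader {
    interface Listener {
        void onDocument(String name, byte[] data);
    }

    private final Map<String, List<Listener>> byName = new LinkedHashMap<>();

    void registerListener(Listener listener, String name) {
        byName.computeIfAbsent(name, k -> new ArrayList<>()).add(listener);
    }

    /** Calls the listeners registered for each document that exists. */
    void read(Map<String, byte[]> filesystem) {
        for (Map.Entry<String, byte[]> doc : filesystem.entrySet()) {
            List<Listener> listeners =
                    byName.getOrDefault(doc.getKey(), Collections.emptyList());
            for (Listener l : listeners) {
                l.onDocument(doc.getKey(), doc.getValue());
            }
        }
    }

    /** Reads the filesystem and reports whether the named document was delivered. */
    static boolean wasDelivered(Map<String, byte[]> filesystem, String name) {
        MiniEventReader reader = new MiniEventReader();
        boolean[] seen = {false};                       // set only if the listener runs
        reader.registerListener((n, data) -> seen[0] = true, name);
        reader.read(filesystem);
        return seen[0];                                 // false: no exception, just silence
    }
}
```

With the real POIFSReader, you could do something similar: add a field to MyPOIFSReaderListener, set it in processPOIFSReaderEvent(), and have the caller check it after r.read(...) has returned.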

Additional Standard Properties, Exceptions And Embedded Objects

Note

This section focuses on reading additional standard properties, which are kept in the document summary information stream. It also talks about exceptions that may be thrown when dealing with HPSF and shows how you can read properties of embedded objects.

A couple of additional standard properties are not contained in the \005SummaryInformation stream explained above. Examples of such properties are a document's category or the number of multimedia clips in a PowerPoint presentation. Microsoft has invented an additional stream named \005DocumentSummaryInformation to hold these properties. With two minor differences, you can proceed exactly as described above to read the properties stored in \005DocumentSummaryInformation:

Instead of \005SummaryInformation, use \005DocumentSummaryInformation as the stream's name.
Replace all occurrences of the class SummaryInformation with DocumentSummaryInformation.

And of course you cannot call getTitle(), because DocumentSummaryInformation has different query methods, e.g. getCategory(). See the Javadoc API documentation for the details.

In the previous section the application simply caught all exceptions and was in no way interested in any details. However, a real application will likely want to know what went wrong and act appropriately. Besides any I/O exceptions, there are three HPSF- or POI-specific exceptions you should know about:

NoPropertySetStreamException: This exception is thrown if the application tries to create a PropertySet instance from a stream that is not a property set stream. (SummaryInformation and DocumentSummaryInformation are subclasses of PropertySet.) A faulty property set stream counts as not being a property set stream at all. An application should be prepared to deal with this case even if it opens streams named \005SummaryInformation or \005DocumentSummaryInformation. These are just names. A stream's name by itself does not ensure that the stream contains the expected contents or that those contents are correct.
UnexpectedPropertySetTypeException: This exception is thrown if a certain type of property set is expected somewhere (e.g. a SummaryInformation or DocumentSummaryInformation) but the provided property set is not of that type.
MarkUnsupportedException: This exception is thrown if an input stream that is to be parsed into a property set does not support the InputStream.mark(int) operation. The POI filesystem uses the DocumentInputStream class, which does support this operation, so you are safe here. However, if you read a property set stream from another kind of input stream, things may be different.
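It helps to look at what a parser has to do to understand the first and last of these exceptions. It inspects the fixed header at the start of the stream and then ideally rewinds, which requires mark()/reset(). The sketch below is a simplified, stdlib-only illustration and not HPSF's implementation. Its header layout follows the published property set stream format: a little-endian byte order mark 0xFFFE, a 2-byte format field, a 4-byte OS version, a 16-byte class ID and a 4-byte section count. The class and method names are hypothetical. The sketch also shows one common way to avoid MarkUnsupportedException, which is to wrap a stream that lacks mark support in a java.io.BufferedInputStream.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Simplified illustration, NOT HPSF's implementation. Assumed header layout:
// byte order (2) | format (2) | OS version (4) | class ID (16) | section count (4)
final class PropertySetSniffer {
    static final int HEADER_LENGTH = 2 + 2 + 4 + 16 + 4;   // 28 bytes

    /** Wraps streams without mark/reset support (e.g. FileInputStream). */
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /** Checks only the fixed header fields, nothing beyond them. */
    static boolean looksLikePropertySet(byte[] header) {
        if (header.length < HEADER_LENGTH) {
            return false;
        }
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        int byteOrder = buf.getShort(0) & 0xFFFF;       // expected: 0xFFFE
        int format = buf.getShort(2) & 0xFFFF;          // expected: 0 or 1
        long sections = buf.getInt(24) & 0xFFFFFFFFL;   // expected: at least 1
        return byteOrder == 0xFFFE && (format == 0 || format == 1) && sections >= 1;
    }

    /** Peeks at the header and rewinds, so the stream can still be parsed. */
    static boolean peek(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IOException("mark/reset not supported; use ensureMarkSupported()");
        }
        in.mark(HEADER_LENGTH);
        try {
            byte[] header = new byte[HEADER_LENGTH];
            int n = 0;
            while (n < HEADER_LENGTH) {
                int r = in.read(header, n, HEADER_LENGTH - n);
                if (r < 0) {
                    return false;   // too short to be a property set stream
                }
                n += r;
            }
            return looksLikePropertySet(header);
        } finally {
            in.reset();             // back to the position where mark() was called
        }
    }
}
```

Passing these checks does not make a stream a valid property set stream; it only rules out obviously unsuitable data. With HPSF you would typically pass the wrapped stream to PropertySetFactory.create(), e.g. PropertySetFactory.create(PropertySetSniffer.ensureMarkSupported(in)), and rely on NoPropertySetStreamException for the real verdict.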

Many Microsoft Office documents contain embedded objects, for example an Excel sheet within a Word document. Embedded objects may have property sets of their own. An application can open these property set streams as described above. The only difference is that they are not located in the POI filesystem's root but in a nested directory instead. Just register a POIFSReaderListener for the property set streams you are interested in.

Writing Standard Properties

Note

This section explains how to write standard properties. HPSF provides some high-level classes and methods which make writing standard properties easy. They are based on the low-level writing functions explained in another section.

As explained above, standard properties are located in the summary information and document summary information streams of typical POI filesystems. You have already learned about the classes SummaryInformation and DocumentSummaryInformation and their get...() methods for reading standard properties. These classes also provide set...() methods for writing properties.

After setting properties in SummaryInformation or DocumentSummaryInformation, you have to write them to a disk file. The following sample program shows how you can

read a disk file into a POI filesystem,
read the document summary information from the POI filesystem,
set a property to a new value,
write the modified document summary information back to the POI filesystem, and
write the POI filesystem to a disk file.

The complete source code of this program is available as ModifyDocumentSummaryInformation.java in the examples section of the POI source tree.

Note

Dealing with the summary information stream is analogous to handling the document summary information and therefore does not need to be explained here in detail. See the HPSF API documentation to learn about the set...() methods of the class SummaryInformation.

The first step is to read the POI filesystem into memory:

 
    InputStream is = new FileInputStream(poiFilesystem);
    POIFSFileSystem poifs = new POIFSFileSystem(is);
    is.close();

The code snippet above assumes that the variable poiFilesystem holds the name of a disk file. It reads the file from an input stream and creates a POIFSFileSystem object in memory. After having read the file, the input stream should be closed as shown.

In order to read the document summary information stream the application must open the element \005DocumentSummaryInformation in the POI filesystem's root directory. However, the POI filesystem does not necessarily contain a document summary information stream, and the application should be able to deal with that situation. The following code does so by creating a new DocumentSummaryInformation if there is none in the POI filesystem:

 
    DirectoryEntry dir = poifs.getRoot();
    DocumentSummaryInformation dsi;
    try
    {
        DocumentEntry dsiEntry = (DocumentEntry)
            dir.getEntry(DocumentSummaryInformation.DEFAULT_STREAM_NAME);
        DocumentInputStream dis = new DocumentInputStream(dsiEntry);
        PropertySet ps = new PropertySet(dis);
        dis.close();
        dsi = new DocumentSummaryInformation(ps);
    }
    catch (FileNotFoundException ex)
    {
        /* There is no document summary information. We have to create a
         * new one. */
        dsi = PropertySetFactory.newDocumentSummaryInformation();
    }

In the source code above, the statement

    DirectoryEntry dir = poifs.getRoot();

gets hold of the POI filesystem's root directory as a DirectoryEntry. The getEntry() method of DirectoryEntry is used to access a file or directory entry in a directory. However, if the file to be opened does not exist, a FileNotFoundException is thrown. Therefore, opening the document summary information entry should be done in a try block:

    DocumentEntry dsiEntry = (DocumentEntry)
        dir.getEntry(DocumentSummaryInformation.DEFAULT_STREAM_NAME);

DocumentSummaryInformation.DEFAULT_STREAM_NAME represents the string "\005DocumentSummaryInformation", i.e. the standard name of a document summary information stream. If this stream exists, the getEntry() method returns a DocumentEntry. To read the DocumentEntry's contents, create a DocumentInputStream:

    DocumentInputStream dis = new DocumentInputStream(dsiEntry);

Up to this point we have used POI's POIFS component. Now HPSF enters the stage. A property set is created from the input stream's data:

 
    PropertySet ps = new PropertySet(dis);
    dis.close();
    dsi = new DocumentSummaryInformation(ps);

If the data really constitutes a property set, a PropertySet object is created. Otherwise a NoPropertySetStreamException is thrown. Once the data has been read, the input stream should be closed.

Since we know, or at least hope, that the stream named "\005DocumentSummaryInformation" is not just any property set but really contains the document summary information, we try to create a new DocumentSummaryInformation from the property set. If the stream is not a document summary information stream, the sample application fails with an UnexpectedPropertySetTypeException.

If the POI filesystem does not contain a document summary information stream, we can create a new one in the catch clause. The PropertySetFactory's method newDocumentSummaryInformation() establishes a new and empty DocumentSummaryInformation instance:

    dsi = PropertySetFactory.newDocumentSummaryInformation();

Whether we read the document summary information from the POI filesystem or created it from scratch, in either case we now have a DocumentSummaryInformation instance we can write to. Writing is quite simple, as the following line of code shows:

    dsi.setCategory("POI example");

This statement sets the "category" property to "POI example". Any former "category" value will be lost. If there hasn't been a "category" property yet, a new one will be created.

DocumentSummaryInformation of course has methods to set the other standard properties, too. See the API documentation for all of them.

Once all properties are set as needed, they should be stored into the file on disk. The first step is to write the DocumentSummaryInformation into the POI filesystem:

    dsi.write(dir, DocumentSummaryInformation.DEFAULT_STREAM_NAME);

The DocumentSummaryInformation's write() method takes two parameters: the first is the DirectoryEntry in the POI filesystem, and the second is the name of the stream to create in that directory. If this stream already exists, it will be overwritten.

Note

If you modified not only the document summary information but also the summary information, you have to write both of them to the POI filesystem.

At this point the POI filesystem is still only a data structure in memory. It must be written to a disk file to make the changes permanent. The following lines write the POI filesystem back to the file it was read from. Please note that in production-quality code you should never write directly to the original file, because in case of an error everything would be lost. Here it is done this way to keep the example short.

 
    OutputStream out = new FileOutputStream(poiFilesystem);
    poifs.writeFilesystem(out);
    out.close();
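In production code, one common way to avoid overwriting the original file directly is to write to a temporary file in the same directory. The original is then replaced only after writing has succeeded. The following is a minimal stdlib sketch of that approach. SafeWrite and its Content interface are hypothetical names and not part of POI.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch: write the new content to a temporary file next to the target,
// and only replace the original once writing has completed without error.
final class SafeWrite {
    /** Anything that can write itself to an OutputStream. */
    interface Content {
        void writeTo(OutputStream out) throws IOException;
    }

    static void replaceFile(Path target, Content content) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "poifs-", ".tmp");
        try {
            try (OutputStream out = Files.newOutputStream(tmp)) {
                content.writeTo(out);
            }
            // Reached only if writing succeeded; now the original is replaced.
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(tmp);   // cleans up after a failed write; no-op after the move
        }
    }
}
```

This assumes POIFSFileSystem.writeFilesystem(OutputStream), as used in the snippet above. A call could then look like SafeWrite.replaceFile(Paths.get(poiFilesystem), poifs::writeFilesystem). On file systems that support it, StandardCopyOption.ATOMIC_MOVE can also make the final replacement atomic.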

User-Defined Properties

If you compare the source code excerpts above with the file containing the full source code, you will notice that I left out the following lines of code. They deal with the special topic of custom properties.

 
    DocumentSummaryInformation dsi = ...
    ...
    CustomProperties customProperties = dsi.getCustomProperties();
    if (customProperties == null)
        customProperties = new CustomProperties();

    /* Insert some custom properties into the container. */
    customProperties.put("Key 1", "Value 1");
    customProperties.put("Schlüssel 2", "Wert 2");
    customProperties.put("Sample Number", Integer.valueOf(12345));
    customProperties.put("Sample Boolean", Boolean.TRUE);
    customProperties.put("Sample Date", new Date());

    /* Read a custom property. */
    Object value = customProperties.get("Sample Number");

    /* Write the custom properties back to the document summary
     * information. */
    dsi.setCustomProperties(customProperties);

Custom properties are properties that users can define themselves. In Microsoft Word, for example, a user can define these extra properties and give each of them a name, a type, and a value. The custom properties are stored in the document summary information stream along with the standard properties.

The source code example shows how to retrieve the custom properties as a whole from a DocumentSummaryInformation instance using the getCustomProperties() method. The result is a CustomProperties instance or null if no user-defined properties exist.

Since CustomProperties implements the Map interface, you can read and write properties with the usual Map methods. However, CustomProperties imposes some restrictions on the types of keys and values:

The key is a string.
The value is one of String, Boolean, Long, Integer, Short, or java.util.Date.
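If your keys and values come from elsewhere, such as user input or a configuration file, you may want to check them against these restrictions before calling put(). The helper below is a hypothetical, stdlib-only sketch that encodes exactly the list above. It is not part of HPSF and says nothing about how CustomProperties itself reacts to other types. The Javadoc of your HPSF version remains the authoritative source.

```java
import java.util.Arrays;
import java.util.Date;
import java.util.List;

// Hypothetical pre-check, not HPSF API: mirrors the key/value restrictions
// for CustomProperties as listed in this HOW-TO.
final class CustomPropertyTypes {
    static final List<Class<?>> SUPPORTED = Arrays.<Class<?>>asList(
            String.class, Boolean.class, Long.class, Integer.class, Short.class, Date.class);

    /** Returns true if the key is a String and the value has one of the listed types. */
    static boolean isSupported(Object key, Object value) {
        if (!(key instanceof String) || value == null) {
            return false;   // this sketch also treats null values as unsupported
        }
        for (Class<?> type : SUPPORTED) {
            if (type.isInstance(value)) {
                return true;
            }
        }
        return false;
    }
}
```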

The CustomProperties class has been designed for easy access using just keys and values. The underlying Microsoft-specific data structure for custom properties is more complicated: it is possible to have multiple properties with the same name, or properties without a name at all. However, this extra complexity does not provide noteworthy additional benefits. When reading custom properties from a document summary information stream, the CustomProperties class ignores properties without a name and keeps only the "last" (whatever that means) of those properties having the same name. You can find out whether a CustomProperties instance dropped any properties with the isPure() method.
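The reduction described above can be modelled in a few lines. This stdlib-only sketch is a hypothetical simulation of the documented behavior and not HPSF's CustomProperties code. Unnamed properties are dropped, a later property replaces an earlier one with the same name, and a flag records whether anything was lost, much like isPure(). The sketch assumes that "later" simply means later in the input list.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation, not HPSF code: reduces raw (id, name, value)
// custom properties to a name -> value map and tracks whether data was lost.
final class NameReduction {
    static final class RawProperty {
        final long id;
        final String name;      // may be null: an unnamed property
        final Object value;

        RawProperty(long id, String name, Object value) {
            this.id = id;
            this.name = name;
            this.value = value;
        }
    }

    final Map<String, Object> byName = new LinkedHashMap<>();
    boolean pure = true;        // becomes false as soon as anything is dropped

    NameReduction(List<RawProperty> raw) {
        for (RawProperty p : raw) {
            if (p.name == null) {
                pure = false;               // unnamed: ignored
                continue;
            }
            if (byName.containsKey(p.name)) {
                pure = false;               // same name seen before: earlier value dropped
            }
            byName.put(p.name, p.value);    // the later property wins
        }
    }
}
```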

You can read and write the full spectrum of custom properties with HPSF's low-level methods. They are explained in the following sections.

Reading Non-Standard Properties

Note

This section tells how to read non-standard properties. Non-standard properties are application-specific ID/type/value triples.

Overview

Now comes the really hardcore stuff. As mentioned above, SummaryInformation and DocumentSummaryInformation are just special cases of the general concept of a property set. This concept says that a property set consists of properties and that each property is an entity with an ID, a type, and a value.

Okay, that was still rather easy. However, to make things more complicated, Microsoft in its infinite wisdom decided that a property set shall be broken into one or more sections. Each section holds a bunch of properties. But since that's still not complicated enough, a section may have an optional dictionary that maps property IDs to property names. We'll explain later what that means.
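The nesting just described can be pictured as a small data model. A property set holds sections, and each section holds ID/type/value properties plus an optional dictionary from property IDs to names. The classes below are hypothetical, stdlib-only stand-ins that only mirror this structure. HPSF's real PropertySet, Section and Property classes are richer and differ in detail.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical model of the property set structure, not HPSF's classes.
final class PropertySetModel {
    /** A single property: a numeric ID, a type code and a value. */
    static final class Prop {
        final long id;
        final long type;
        final Object value;

        Prop(long id, long type, Object value) {
            this.id = id;
            this.type = type;
            this.value = value;
        }
    }

    /** A section: a format ID, its properties and an optional dictionary. */
    static final class Sec {
        final UUID formatId;
        final List<Prop> properties = new ArrayList<>();
        final Map<Long, String> dictionary = new HashMap<>();   // may stay empty

        Sec(UUID formatId) {
            this.formatId = formatId;
        }

        /** Returns the property with the given ID, or null if there is none. */
        Prop find(long id) {
            for (Prop p : properties) {
                if (p.id == id) {
                    return p;
                }
            }
            return null;
        }

        /** Returns the dictionary name for a property ID, or null if there is none. */
        String nameOf(long id) {
            return dictionary.get(id);
        }
    }

    /** A property set is an ordered list of one or more sections. */
    final List<Sec> sections = new ArrayList<>();
}
```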

The procedure to get to the properties is the following:

Use the PropertySetFactory class to create a PropertySet object from a property set stream. If you don't know whether an input stream is a property set stream, just try to call PropertySetFactory.create(java.io.InputStream): you'll either get a PropertySet instance back or an exception is thrown.
Call the PropertySet's method getSections() to get the sections contained in the property set. Each section is an instance of the Section class.
Each section has a format ID. The format ID of the first section in a property set determines the property set's type. For example, the first (and only) section of the summary information property set has a format ID of F29F85E0-4FF9-1068-AB-91-08-00-2B-27-B3-D9. You can get the format ID with Section.getFormatID().
The properties contained in a Section can be retrieved with Section.getProperties(). The result is an array of Property instances.
A property has an ID, a type, and a value. The Property class has methods to retrieve them.
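The format ID check from the third step can be sketched with java.util.UUID. The first ID below is the summary information format ID quoted above, written in standard GUID notation. The second is the well-known format ID of the document summary information section. The FormatIds helper is hypothetical; HPSF represents these IDs with its own ClassID class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper, not HPSF API: classifies a property set by the
// format ID of its FIRST section.
final class FormatIds {
    static final UUID SUMMARY_INFORMATION =
            UUID.fromString("F29F85E0-4FF9-1068-AB91-08002B27B3D9");
    static final UUID DOCUMENT_SUMMARY_INFORMATION =
            UUID.fromString("D5CDD502-2E9C-101B-9397-08002B2CF9AE");

    private static final Map<UUID, String> KNOWN = new HashMap<>();
    static {
        KNOWN.put(SUMMARY_INFORMATION, "SummaryInformation");
        KNOWN.put(DOCUMENT_SUMMARY_INFORMATION, "DocumentSummaryInformation");
    }

    /** Returns the property set type for a first-section format ID, or "unknown". */
    static String typeOf(UUID firstSectionFormatId) {
        return KNOWN.getOrDefault(firstSectionFormatId, "unknown");
    }
}
```

Any other format ID points to an application-specific property set, which is exactly the case this section is about.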

A Sample Application

Let's have a look at a sample Java application that dumps all property set streams contained in a POI file system. The full source code of this program can be found as ReadCustomPropertySets.java in the examples area of the POI source code tree. Here are the key sections:

 
    import java.io.*;
    import java.util.*;

    import org.apache.poi.hpsf.*;
    import org.apache.poi.poifs.eventfilesystem.*;
    import org.apache.poi.util.HexDump;

The most important package the application needs is org.apache.poi.hpsf.*. This package contains the HPSF classes. Most classes named below are from the HPSF package. Of course we also need the POIFS event file system's classes and java.io.*, since we are dealing with POI I/O. From the java.util package we use the List and Iterator interfaces. The class org.apache.poi.util.HexDump provides methods to dump byte arrays as nicely formatted strings.
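If you want a similar dump without depending on POI's utility class, a few lines of stdlib Java are enough. This hypothetical sketch prints 16 bytes per line after a hexadecimal offset. The layout is an assumption chosen for readability and is not the exact output format of org.apache.poi.util.HexDump.

```java
// Hypothetical stdlib-only hex dump; not org.apache.poi.util.HexDump.
final class SimpleHexDump {
    /** Formats bytes as lines of "OFFSET: XX XX ...", 16 bytes per line. */
    static String dump(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += 16) {
            sb.append(String.format("%08X:", off));
            int end = Math.min(off + 16, data.length);
            for (int i = off; i < end; i++) {
                sb.append(String.format(" %02X", data[i] & 0xFF));   // unsigned byte
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

For example, SimpleHexDump.dump(new byte[] {(byte) 0xFE, (byte) 0xFF, 0, 0}) returns "00000000: FE FF 00 00" followed by a newline.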

 
    public static void main(String[] args)
        throws IOException
    {
        final String filename = args[0];
        POIFSReader r = new POIFSReader();

        /* Register a listener for *all* documents. */
        r.registerListener(new MyPOIFSReaderListener());
        r.read(new FileInputStream(filename));
    }

The POIFSReader is set up so that the listener MyPOIFSReaderListener is called for every file in the POI file system.

The Property Set

The listener class tries to create a PropertySet from each stream using the PropertySetFactory.create() method:

 
    static class MyPOIFSReaderListener implements POIFSReaderListener
    {
        public void processPOIFSReaderEvent(POIFSReaderEvent event)
        {
            PropertySet ps = null;
            try
            {
                ps = PropertySetFactory.create(event.getStream());
            }
            catch (NoPropertySetStreamException ex)
            {
                out("No property set stream: \"" + event.getPath() +
                    event.getName() + "\"");
                return;
            }
            catch (Exception ex)
            {
                throw new RuntimeException
                    ("Property set stream \"" +
                     event.getPath() + event.getName() + "\": " + ex);
            }

            /* Print the name of the property set stream: */
            out("Property set stream \"" + event.getPath() +
                event.getName() + "\":");

Creating the PropertySet happens in a try block because not every stream in the POI file system contains a property set. If a stream contains something else, PropertySetFactory.create() throws a NoPropertySetStreamException. The program catches and logs it, then continues with the next stream. All other types of exceptions, however, terminate the program because they are rethrown as a runtime exception. If all went well, we can print the name of the property set stream.(创建 PropertySet 在 try 块中完成，因为不是 POI 文件系统中的每个流都包含一个属性集。如果是其他文件，PropertySetFactory.create() 会抛出 NoPropertySetStreamException，它会被捕获并记录下来。然后程序继续下一个流。但是，所有其他类型的异常都会通过引发运行时异常而导致程序终止。如果一切顺利，我们可以打印属性集流的名称。)
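The following sketch shows what "not a property set" means at the byte level. It is the kind of header test a factory such as PropertySetFactory.create() can run before parsing, written with the standard library only. It is not HPSF's code: the class name PropertySetHeaderCheck is hypothetical. The sketch checks only the first header fields described in Microsoft's [MS-OLEPS] specification: a little-endian byte order mark 0xFFFE, a version of 0 or 1, and a section count of at least 1 at offset 24.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper: a simplified property set stream header test (not HPSF's implementation). */
final class PropertySetHeaderCheck {

    /** Header layout: byte order (2), version (2), system id (4), class id (16), section count (4). */
    static final int HEADER_SIZE = 28;

    static boolean looksLikePropertySetStream(byte[] data) {
        if (data == null || data.length < HEADER_SIZE) {
            return false;
        }
        // All multi-byte fields in a property set stream are little-endian.
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int byteOrder = buf.getShort(0) & 0xFFFF;          // must be 0xFFFE (bytes FE FF on disk)
        int version = buf.getShort(2) & 0xFFFF;            // 0 or 1
        long sectionCount = buf.getInt(24) & 0xFFFFFFFFL;  // unsigned section count
        return byteOrder == 0xFFFE && (version == 0 || version == 1) && sectionCount >= 1;
    }

    private PropertySetHeaderCheck() { }
}
```

A stream such as "/WordDocument" fails this test at the first field, which corresponds to the "No property set stream" lines in the sample output.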

The Sections(部分)

The next step is to print the number of sections followed by the sections themselves:

 
    /* Print the number of sections: */
    final long sectionCount = ps.getSectionCount();
    out(" No. of sections: " + sectionCount);
    /* Print the list of sections: */
    List sections = ps.getSections();
    int nr = 0;
    for (Iterator i = sections.iterator(); i.hasNext();)
    {
        /* Print a single section: */
        Section sec = (Section) i.next();
        // See below for the complete loop body.
    }

The PropertySet's method getSectionCount() returns the number of sections.(PropertySet 的方法 getSectionCount() 返回节数。)

To retrieve the sections, use the getSections() method. This method returns a java.util.List containing instances of the Section class in their proper order.

The sample code shows a loop that retrieves the Section objects one by one and prints some information about each one. Here is the complete body of the loop:

 
    /* Print a single section: */
    Section sec = (Section) i.next();
    out(" Section " + nr++ + ":");
    String s = hex(sec.getFormatID().getBytes());
    s = s.substring(0, s.length() - 1);
    out(" Format ID: " + s);
    /* Print the number of properties in this section. */
    int propertyCount = sec.getPropertyCount();
    out(" No. of properties: " + propertyCount);
    /* Print the properties: */
    Property[] properties = sec.getProperties();
    for (int i2 = 0; i2 < properties.length; i2++)
    {
        /* Print a single property: */
        Property p = properties[i2];
        int id = p.getID();
        long type = p.getType();
        Object value = p.getValue();
        out(" Property ID: " + id + ", type: " + type +
            ", value: " + value);
    }

The Section's Format ID(节的格式 ID)

The first method called on the Section instance is getFormatID(). As explained above, the format ID of the first section in a property set determines the type of the property set. Its type is ClassID, which is essentially a sequence of 16 bytes. A real application that uses its own custom type of property set should define a unique format ID. When reading a property set stream, it should check that the format ID equals that unique format ID. The sample program just prints the format ID it finds in a section:(在 Section 实例上调用的第一个方法是 getFormatID()。如上所述，属性集中第一部分的格式 ID 决定了属性集的类型。它的类型是 ClassID，本质上是一个 16 字节的序列。使用自己的自定义属性集类型的实际应用程序应该定义唯一的格式 ID，并且在读取属性集流时，应该检查格式 ID 是否等于该唯一格式 ID。示例程序只打印它在一个部分中找到的格式 ID：)

 
    String s = hex(sec.getFormatID().getBytes());
    s = s.substring(0, s.length() - 1);
    out(" Format ID: " + s);

As you can see, the getFormatID() method returns a ClassID object. An array containing the bytes can be retrieved with ClassID.getBytes(). In order to get a nicely formatted printout, the sample program uses the hex() helper method which in turn uses the POI utility class HexDump in the org.apache.poi.util package. Another helper method is out() which just saves typing System.out.println().(如您所见，getFormatID() 方法返回一个 ClassID 对象。可以使用 ClassID.getBytes() 检索包含字节的数组。为了获得格式良好的打印输出，示例程序使用 hex() 辅助方法，该方法又使用 org.apache.poi.util 包中的 POI 实用程序类 HexDump。另一个辅助方法是 out()，它只保存输入 System.out.println()。)
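The hex() and out() helpers are not shown in the excerpts above. Below is a stdlib-only sketch of what they might look like. It is an assumption, not the sample program's code. This hex() prints the bytes as space-separated uppercase pairs and omits the offset column and ASCII rendering that HexDump produces. It does append a line break, so the substring() call in the sample still removes only that last character.

```java
/** Hypothetical stand-ins for the sample program's helpers (stdlib only, not POI's HexDump). */
final class SampleHelpers {

    /** Formats bytes as "F2 9F 85 E0 ..." followed by a single '\n'. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative Java bytes print as two hex digits.
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.append('\n').toString();
    }

    /** Saves typing System.out.println(). */
    static void out(String msg) {
        System.out.println(msg);
    }

    private SampleHelpers() { }
}
```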

The Properties(属性)

Before getting the properties, you can find out how many properties a section contains by calling Section.getPropertyCount(). The sample application uses this method to print the number of properties to the standard output:

    int propertyCount = sec.getPropertyCount();
    out(" No. of properties: " + propertyCount);

Now it's time to get to the properties themselves. You can retrieve a section's properties with the method Section.getProperties():(现在是时候了解这些属性了。您可以使用 Section.getProperties() 方法检索节的属性：)

Property[] properties = sec.getProperties();

As you can see, the result is an array of Property objects. This class has three methods to retrieve a property's ID, type, and value. The following code snippet shows how to call them:(如您所见，结果是一组 Property 对象。此类具有三种方法来检索属性的 ID、类型和值。以下代码片段显示了如何调用它们：)

 
    for (int i2 = 0; i2 < properties.length; i2++)
    {
        /* Print a single property: */
        Property p = properties[i2];
        int id = p.getID();
        long type = p.getType();
        Object value = p.getValue();
        out(" Property ID: " + id + ", type: " + type +
            ", value: " + value);
    }

Sample Output(样本输出)

The output of the sample program might look like the following. It shows the summary information and the document summary information property sets of a Microsoft Word document. However, unlike the code in the first and second sections of this HOW-TO, the application does not contain any code specific to the SummaryInformation and DocumentSummaryInformation classes.(示例程序的输出可能如下所示。它显示 Microsoft Word 文档的摘要信息和文档摘要信息属性集。但是，与本 HOW-TO 的第一和第二部分不同，该应用程序没有任何特定于 SummaryInformation 和 DocumentSummaryInformation 类的代码。)

 
    Property set stream "/SummaryInformation":
      No. of sections: 1
      Section 0:
      Format ID: 00000000 F2 9F 85 E0 4F F9 10 68 AB 91 08 00 2B 27 B3 D9 ....O..h....+'..
      No. of properties: 17
      Property ID: 1, type: 2, value: 1252
      Property ID: 2, type: 30, value: Titel
      Property ID: 3, type: 30, value: Thema
      Property ID: 4, type: 30, value: Rainer Klute (Autor)
      Property ID: 5, type: 30, value: Test (Stichwörter)
      Property ID: 6, type: 30, value: This is a document for testing HPSF
      Property ID: 7, type: 30, value: Normal.dot
      Property ID: 8, type: 30, value: Unknown User
      Property ID: 9, type: 30, value: 3
      Property ID: 18, type: 30, value: Microsoft Word 9.0
      Property ID: 12, type: 64, value: Mon Jan 01 00:59:25 CET 1601
      Property ID: 13, type: 64, value: Thu Jul 18 16:22:00 CEST 2002
      Property ID: 14, type: 3, value: 1
      Property ID: 15, type: 3, value: 20
      Property ID: 16, type: 3, value: 93
      Property ID: 19, type: 3, value: 0
      Property ID: 17, type: 71, value: [B@13582d
    Property set stream "/DocumentSummaryInformation":
      No. of sections: 2
      Section 0:
      Format ID: 00000000 D5 CD D5 02 2E 9C 10 1B 93 97 08 00 2B 2C F9 AE ............+,..
      No. of properties: 14
      Property ID: 1, type: 2, value: 1252
      Property ID: 2, type: 30, value: Test
      Property ID: 14, type: 30, value: Rainer Klute (Manager)
      Property ID: 15, type: 30, value: Rainer Klute IT-Consulting GmbH
      Property ID: 5, type: 3, value: 3
      Property ID: 6, type: 3, value: 2
      Property ID: 17, type: 3, value: 111
      Property ID: 23, type: 3, value: 592636
      Property ID: 11, type: 11, value: false
      Property ID: 16, type: 11, value: false
      Property ID: 19, type: 11, value: false
      Property ID: 22, type: 11, value: false
      Property ID: 13, type: 4126, value: [B@56a499
      Property ID: 12, type: 4108, value: [B@506411
      Section 1:
      Format ID: 00000000 D5 CD D5 05 2E 9C 10 1B 93 97 08 00 2B 2C F9 AE ............+,..
      No. of properties: 7
      Property ID: 0, type: 0, value: {6=Test-JaNein, 5=Test-Zahl, 4=Test-Datum, 3=Test-Text, 2=_PID_LINKBASE}
      Property ID: 1, type: 2, value: 1252
      Property ID: 2, type: 65, value: [B@c9ba38
      Property ID: 3, type: 30, value: This is some text.
      Property ID: 4, type: 64, value: Wed Jul 17 00:00:00 CEST 2002
      Property ID: 5, type: 3, value: 27
      Property ID: 6, type: 11, value: true
    No property set stream: "/WordDocument"
    No property set stream: "/CompObj"
    No property set stream: "/1Table"

There are some interesting items to note:(有一些有趣的项目需要注意：)

The first property set (summary information) consists of a single section, the second property set (document summary information) consists of two sections.(第一个属性集（摘要信息）由一个部分组成，第二个属性集（文档摘要信息）由两个部分组成。)
Each section type (identified by its format ID) has its own domain of property IDs. For example, in the second property set the properties with ID 2 have different meanings in the two sections. By the way, the format IDs of these two sections are not equal, but you have to look hard to find the difference: they differ in a single byte (02 vs. 05).(每个部分类型（由其格式 ID 标识）都有自己的属性 ID 域。例如，在第二个属性集中，ID 为 2 的属性在这两个部分中具有不同的含义。顺便说一句，这两个部分的格式 ID 并不相等，但您必须仔细看才能发现差异：它们只有一个字节不同（02 与 05）。)
The properties are not stored in any particular order within a section, although they tend to be roughly sorted by ID.(这些属性在该部分中没有任何特定的顺序，尽管它们稍微倾向于按其 ID 排序。)
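A section makes no promise about the order of its properties. An application that wants stable, readable output can therefore sort them by ID itself. Here is a minimal stdlib sketch. The Prop class is a hypothetical stand-in for HPSF's Property that carries only an ID and a value. The IDs in the usage below are taken from the first section of the sample output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: sort property-like objects by their ID before printing them. */
final class SortById {

    /** Minimal stand-in for a property: an ID and a value. */
    static final class Prop {
        final long id;
        final Object value;

        Prop(long id, Object value) {
            this.id = id;
            this.value = value;
        }
    }

    /** Returns a new list sorted by ascending property ID; the input array is left unchanged. */
    static List<Prop> sortedById(Prop[] props) {
        List<Prop> sorted = new ArrayList<>(Arrays.asList(props));
        sorted.sort(Comparator.comparingLong((Prop p) -> p.id));
        return sorted;
    }

    private SortById() { }
}
```

For example, the IDs 18, 12 and 13, in the order they appear in the sample output, come back as 12, 13, 18.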

Property IDs(属性 ID)

Properties in the same section are distinguished by their IDs. This is similar to variables in a programming language like Java, which are distinguished by their names. But unlike variable names, property IDs are simple integral numbers. There is another similarity, however. Just like a Java variable has a certain scope (e.g. a member variable of a class), a property ID also has its scope of validity: the section.

Two property IDs in sections with different section format IDs don't have the same meaning even though their IDs might be equal. For example, ID 4 in the first (and only) section of a summary information property set denotes the document's author, while ID 4 in the first section of the document summary information property set means the document's byte count. The sample output above does not show a property with an ID of 4 in the first section of the document summary information property set. That means that the document does not have a byte count. However, there is a property with an ID of 4 in the second section: This is a user-defined property ID - we'll get to that topic in a minute.(具有不同部分格式 ID 的部分中的两个属性 ID 具有不同的含义，即使它们的 ID 可能相同。例如，摘要信息属性集的第一（也是唯一）部分中的 ID 4 表示文档的作者，而文档摘要信息属性集的第一部分中的 ID 4 表示文档的字节数。上面的示例输出未在文档摘要信息属性集的第一部分显示 ID 为 4 的属性。这意味着该文档没有字节数。但是，在第二部分中有一个 ID 为 4 的属性：这是用户定义的属性 ID - 我们将在稍后讨论该主题。)
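One way to model this scoping is to key the name lookup first on the section's format ID and then on the property ID. The sketch below is hypothetical and stdlib-only. It hard-codes just the two meanings of ID 4 discussed above. It uses the format IDs of the summary information section and the first document summary information section, as printed in the sample output.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: property names are looked up per format ID, then per property ID. */
final class ScopedPropertyNames {

    static final String SUMMARY_INFORMATION = "F29F85E0-4FF9-1068-AB91-08002B27B3D9";
    static final String DOCUMENT_SUMMARY_INFORMATION = "D5CDD502-2E9C-101B-9397-08002B2CF9AE";

    private static final Map<String, Map<Long, String>> NAMES = new HashMap<>();

    static {
        Map<Long, String> si = new HashMap<>();
        si.put(4L, "PID_AUTHOR");
        NAMES.put(SUMMARY_INFORMATION, si);

        Map<Long, String> dsi = new HashMap<>();
        dsi.put(4L, "PID_BYTECOUNT");
        NAMES.put(DOCUMENT_SUMMARY_INFORMATION, dsi);
    }

    /** Returns the name of a property ID within the given section type, or null if unknown. */
    static String name(String formatId, long propertyId) {
        Map<Long, String> scope = NAMES.get(formatId);
        return scope == null ? null : scope.get(propertyId);
    }

    private ScopedPropertyNames() { }
}
```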

So, how can you find out what the meaning of a certain property ID in the summary information and the document summary information property set is? The standard property sets as such don't have any hints about the meanings of their property IDs. For example, the summary information property set does not tell you that the property ID 4 stands for the document's author. This is external knowledge. Microsoft defined standard meanings for some of the property IDs in the summary information and the document summary information property sets. As a help to the Java and POI programmer, the class PropertyIDMap in the org.apache.poi.hpsf.wellknown package defines constants for the "well-known" property IDs. For example, there is the definition(那么，如何找出摘要信息和文档摘要信息属性集中的某个属性ID的含义是什么？这样的标准属性集没有任何关于其属性 ID 含义的提示。例如，摘要信息属性集不会告诉您属性 ID 4 代表文档的作者。这是外在的知识。 Microsoft 为摘要信息和文档摘要信息属性集中的某些属性 ID 定义了标准含义。作为对 Java 和 POI 程序员的帮助，org.apache.poi.hpsf.wellknown 包中的类 PropertyIDMap 定义了“知名”属性 ID 的常量。例如，有定义)

public final static int PID_AUTHOR = 4;

These definitions allow you to use symbolic names instead of numbers.

To support the reverse direction as well, i.e. to map property IDs to property names, the class PropertyIDMap defines two static methods: getSummaryInformationProperties() and getDocumentSummaryInformationProperties(). Both return java.util.Map objects which map property IDs to strings. Such a string hints at the property's meaning. For example, PropertyIDMap.getSummaryInformationProperties().get(4) returns the string "PID_AUTHOR". An application could use this string as a key to a localized string which is displayed to the user, e.g. "Author" in English or "Verfasser" in German. HPSF might provide such language-dependent ("localized") mappings in a later release.(为了也支持另一种方式，即将属性 ID 映射到属性名称，PropertyIDMap 类定义了两个静态方法：getSummaryInformationProperties() 和 getDocumentSummaryInformationProperties()。两者都返回将属性 ID 映射到字符串的 java.util.Map 对象。这样的字符串给出了关于属性含义的提示。例如，PropertyIDMap.getSummaryInformationProperties().get(4) 返回字符串“PID_AUTHOR”。应用程序可以将此字符串用作显示给用户的本地化字符串的键，例如英语中的“Author”或德语中的“Verfasser”。 HPSF 可能会在以后的版本中提供这种与语言相关的（“本地化”）映射。)
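The localization idea can be sketched as follows. This is a hypothetical, stdlib-only helper, not part of HPSF. It turns keys such as "PID_AUTHOR" into display labels for a locale and falls back to the key itself when no label is known.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: turn HPSF-style property name strings such as "PID_AUTHOR" into display labels. */
final class PropertyLabels {

    private static final Map<String, Map<String, String>> LABELS = new HashMap<>();

    static {
        Map<String, String> en = new HashMap<>();
        en.put("PID_AUTHOR", "Author");
        en.put("PID_TITLE", "Title");
        LABELS.put("en", en);

        Map<String, String> de = new HashMap<>();
        de.put("PID_AUTHOR", "Verfasser");
        de.put("PID_TITLE", "Titel");
        LABELS.put("de", de);
    }

    /** Returns the label for the key in the locale's language, falling back to the key itself. */
    static String label(String key, Locale locale) {
        Map<String, String> byKey = LABELS.get(locale.getLanguage());
        String label = byKey == null ? null : byKey.get(key);
        return label != null ? label : key;
    }

    private PropertyLabels() { }
}
```

In a real application, java.util.ResourceBundle would be the more idiomatic home for such labels. The plain maps just keep the sketch self-contained.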

Usually you won't have to deal with those two maps. Instead you should call the Section.getPIDString(int) method. It returns the string associated with the specified property ID in the context of the Section object.(通常您不必处理这两个地图。相反，您应该调用 Section.getPIDString(int) 方法。它返回与 Section 对象上下文中的指定属性 ID 关联的字符串。)

Above you learned that property IDs have a meaning in the scope of a section only. However, there are two exceptions to the rule: The property IDs 0 and 1 have a fixed meaning in all sections:(上面您了解到，属性 ID 仅在部分范围内有意义。但是，该规则有两个例外：属性 ID 0 和 1 在所有部分中具有固定含义：)

Property ID(属性 ID)	Meaning(含义)
0(0)	The property's value is a dictionary, i.e. a mapping from property IDs to strings.(属性的值是一个字典，即从属性 ID 到字符串的映射。)
1(1)	The property's value is the number of a codepage, i.e. a mapping from character codes to characters. All strings in the section containing this property must be interpreted using this codepage. Typical property values are 1252 (8-bit "western" characters, Windows-1252, close to ISO-8859-1), 1200 (16-bit Unicode characters, UTF-16), or 65001 (8-bit Unicode characters, UTF-8).(该属性的值是代码页的编号，即从字符代码到字符的映射。包含此属性的部分中的所有字符串都必须使用此代码页进行解释。典型的属性值为 1252（8 位“西方”字符，Windows-1252，接近 ISO-8859-1）、1200（16 位 Unicode 字符，UTF-16）或 65001（8 位 Unicode 字符，UTF-8）。)


Property types(属性类型)

A property is nothing without its value. It is stored in a property set stream as a sequence of bytes. You must know the property's type in order to properly interpret those bytes and reasonably handle the value. A property's type is one of the so-called Microsoft-defined "variant types". When you call Property.getType() you'll get a long value which denotes the property's variant type. The class Variant in the org.apache.poi.hpsf package holds most of those long values as named constants. For example, the constant VT_I4 = 3 means a signed integer value of four bytes. Examples of other types are VT_LPSTR = 30 meaning a null-terminated string of 8-bit characters, VT_LPWSTR = 31 which means a null-terminated Unicode string, or VT_BOOL = 11 denoting a boolean value.(没有值的属性毫无意义。它作为字节序列存储在属性集流中。您必须知道属性的类型才能正确解释这些字节并合理处理该值。属性的类型是所谓的 Microsoft 定义的“变体类型”之一。当您调用 Property.getType() 时，您将获得一个表示属性变体类型的 long 值。 org.apache.poi.hpsf 包中的 Variant 类将这些 long 值中的大部分保存为命名常量。例如，常数 VT_I4 = 3 表示四个字节的有符号整数值。其他类型的示例有 VT_LPSTR = 30 表示以空字符结尾的 8 位字符字符串，VT_LPWSTR = 31 表示以空字符结尾的 Unicode 字符串，或 VT_BOOL = 11 表示布尔值。)

In most cases you won't need a property's type because HPSF does all the work for you.(在大多数情况下，您不需要属性类型，因为 HPSF 会为您完成所有工作。)

Property values(属性值)

When an application wants to retrieve a property's value and calls Property.getValue(), HPSF has to interpret the bytes making up the value according to the property's type. The type determines how many bytes the value consists of and what to do with them. For example, if the type is VT_I4, HPSF knows that the value is four bytes long and that these bytes comprise a signed integer value in the little-endian format. This is quite different from, e.g., the type VT_LPWSTR. In this case HPSF has to scan the value bytes for a Unicode null character and collect everything from the beginning to that null character as a Unicode string.(当应用程序想要检索属性的值并调用 Property.getValue() 时，HPSF 必须根据属性的类型解释构成该值的字节。类型决定了值包含多少字节以及如何处理它们。例如，如果类型是 VT_I4，HPSF 知道该值是四个字节长，并且这些字节包含小端格式的有符号整数值。这与例如 VT_LPWSTR 类型完全不同。在这种情况下，HPSF 必须扫描值字节以查找 Unicode 空字符，并将从开头到该空字符的所有内容收集为 Unicode 字符串。)
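The two decoding rules just described can be sketched with java.nio. This is an illustration under stated assumptions, not HPSF's own decoding code. The class name is hypothetical, and the VT_LPWSTR decoder follows the simplified description above by scanning for the null character. In an actual property set stream, a VT_LPWSTR value is also preceded by a four-byte character count, which this sketch leaves out.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of decoding two variant types from raw value bytes. */
final class VariantDecoding {

    /** VT_I4: four bytes, signed, little-endian, starting at offset. */
    static int decodeI4(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, offset, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    /** VT_LPWSTR (simplified): UTF-16LE code units up to, but not including, a 0x0000 terminator. */
    static String decodeLpwstr(byte[] bytes, int offset) {
        int end = offset;
        // Advance two bytes (one UTF-16 code unit) at a time until the null character.
        while (end + 1 < bytes.length && (bytes[end] != 0 || bytes[end + 1] != 0)) {
            end += 2;
        }
        return new String(bytes, offset, end - offset, StandardCharsets.UTF_16LE);
    }

    private VariantDecoding() { }
}
```

For example, the bytes FC 0A 09 00 decode to 592636, the value of property 23 in the sample output.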

The good news is that HPSF does another job for you, too: It maps the variant type to an adequate Java type.(好消息是 HPSF 还为您完成了另一项工作：它将变体类型映射到适当的 Java 类型。)

Variant type:(变体类型：)	Java type:(Java类型：)
VT_I2(VT_I2)	java.lang.Integer(java.lang.Integer)
VT_I4(VT_I4)	java.lang.Long(java.lang.Long)
VT_FILETIME(VT_FILETIME)	java.util.Date(java.util.Date)
VT_LPSTR(VT_LPSTR)	java.lang.String(java.lang.String)
VT_LPWSTR(VT_LPWSTR)	java.lang.String(java.lang.String)
VT_CF(VT_CF)	byte[](byte[])
VT_BOOL(VT_BOOL)	java.lang.Boolean(java.lang.Boolean)

The bad news is that there are still a couple of variant types HPSF does not yet support. If it encounters one of these types it returns the property's value as a byte array and leaves it to be interpreted by the application.(坏消息是仍然有几个变体类型 HPSF 还不支持。如果遇到其中一种类型，它会将属性的值作为字节数组返回，并让应用程序对其进行解释。)

An application retrieves a property's value by calling the Property.getValue() method. This method's return type is Object. The getValue() method looks up the property's variant type, reads the property's value bytes, creates an instance of an adequate Java type, assigns it the property's value and returns it. Primitive types like int or long are returned as instances of the corresponding wrapper classes, e.g. Integer or Long.(应用程序通过调用 Property.getValue() 方法来检索属性的值。此方法的返回类型是 Object。 getValue() 方法查找属性的变体类型，读取属性的值字节，创建适当 Java 类型的实例，为其分配属性值并返回它。 int 或 long 等原始类型将作为相应包装类的实例返回，例如 Integer 或 Long。)
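The following stdlib sketch mimics the getValue() behavior described above for a handful of types. It checks the variant type, builds a suitable Java object, and falls back to a copy of the raw bytes for types it does not know. The class and method names are hypothetical. The numeric type values (VT_I2 = 2, VT_I4 = 3, VT_BOOL = 11, VT_FILETIME = 64) match the types seen in the sample output. The VT_FILETIME branch relies on the standard Windows FILETIME definition: a 64-bit count of 100-nanosecond intervals since January 1, 1601 (UTC).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.Date;

/** Hypothetical sketch of a getValue()-style mapping from variant type to Java object. */
final class VariantValues {

    static final long VT_I2 = 2;
    static final long VT_I4 = 3;
    static final long VT_BOOL = 11;
    static final long VT_FILETIME = 64;

    /** Milliseconds between 1601-01-01 and 1970-01-01 (UTC). */
    static final long EPOCH_DIFF_MILLIS = 11644473600000L;

    static Object getValue(long type, byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        // Java cannot switch on a long, hence the if/else chain.
        if (type == VT_I2) {
            return Integer.valueOf(buf.getShort());       // signed 16-bit
        } else if (type == VT_I4) {
            return Long.valueOf(buf.getInt());            // signed 32-bit, as in the table above
        } else if (type == VT_BOOL) {
            return Boolean.valueOf(buf.getShort() != 0);  // 0 = false, anything else = true
        } else if (type == VT_FILETIME) {
            long hundredNanos = buf.getLong();
            return new Date(hundredNanos / 10000 - EPOCH_DIFF_MILLIS);
        }
        // Unsupported type: hand a copy of the raw bytes to the application.
        return Arrays.copyOf(raw, raw.length);
    }

    private VariantValues() { }
}
```

Printing such a byte[] fallback directly yields strings like [B@13582d. That is what the sample output shows for the thumbnail (type 71) and the other types printed as byte arrays.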

Dictionaries

The property with ID 0 has a very special meaning: It is a dictionary mapping property IDs to property names. We have seen already that the meanings of standard properties in the summary information and the document summary information property sets have been defined by Microsoft. The advantage is that the labels of properties like "Author" or "Title" don't have to be stored in the property set. However, a user can define custom fields in, say, Microsoft Word. For each field the user has to specify a name, a type, and a value.(ID 为 0 的属性有一个非常特殊的含义：它是一个将属性 ID 映射到属性名称的字典。我们已经看到，微软已经定义了摘要信息和文档摘要信息属性集中标准属性的含义。优点是“作者”或“标题”等属性的标签不必存储在属性集中。但是，用户可以在 Microsoft Word 中定义自定义字段。对于每个字段，用户必须指定名称、类型和值。)

The names of the custom-defined fields (i.e. the property names) are stored in the dictionary of the document summary information's second section. The dictionary is a map which associates property IDs with property names.(自定义字段的名称（即属性名称）存储在文档摘要信息第二部分的字典中。字典是将属性 ID 与属性名称相关联的映射。)

The method Section.getPIDString(int) returns not only the well-known property names of the summary information and document summary information property sets but also the names of custom-defined properties. It should also work with custom-defined properties in custom-defined sections.(Section.getPIDString(int) 方法不仅返回摘要信息和文档摘要信息属性集的众所周知的属性名称，而且还返回自定义属性。它还应该与自定义部分中的自定义属性一起使用。)
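The sketch below shows one way such a lookup can combine the two sources. The precedence rule (section dictionary first, then the well-known names, then a placeholder) is an assumption made for illustration. It does not describe how getPIDString() is actually implemented. The class name and the "[undefined]" placeholder are hypothetical. The dictionary contents are copied from property 0 of the second section in the sample output.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: resolve a property ID to a name using a section's dictionary and well-known names. */
final class PidNames {

    static final String UNDEFINED = "[undefined]";

    static String resolve(Map<Long, String> dictionary, Map<Long, String> wellKnown, long pid) {
        String name = dictionary.get(pid);
        if (name == null) {
            name = wellKnown.get(pid);
        }
        return name != null ? name : UNDEFINED;
    }

    /** The dictionary printed for section 1 of the sample's document summary information. */
    static Map<Long, String> sampleDictionary() {
        Map<Long, String> dict = new HashMap<>();
        dict.put(2L, "_PID_LINKBASE");
        dict.put(3L, "Test-Text");
        dict.put(4L, "Test-Datum");
        dict.put(5L, "Test-Zahl");
        dict.put(6L, "Test-JaNein");
        return Collections.unmodifiableMap(dict);
    }

    private PidNames() { }
}
```

With the sample dictionary, ID 4 resolves to "Test-Datum". This matches the date value (type 64) stored under that ID.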

Codepage support(代码页支持)

The property with ID 1 holds the number of the codepage which was used to encode the strings in this section. If this property is not available in a section, the platform's default character encoding will be used. This works fine as long as the document being read has been written on a platform with the same default character encoding. However, if you receive a document from another region of the world and the codepage is undefined, you are in trouble.(ID 为 1 的属性保存用于对本节中的字符串进行编码的代码页编号。如果该属性在某个部分中不可用，则将使用平台的默认字符编码。只要正在读取的文档是在具有相同默认字符编码的平台上编写的，这就可以正常工作。但是，如果您收到来自世界其他地区的文档并且代码页未定义，那么您就有麻烦了。)

HPSF's codepage support is only as good as the character encoding support of the Java Virtual Machine (JVM) the application runs on. If HPSF encounters a codepage number it assumes that the JVM has a character encoding with a corresponding name. For example, if the codepage is 1252, HPSF uses the character encoding "cp1252" to read or write strings. If the JVM does not have that character encoding installed or if the codepage number is illegal, an UnsupportedEncodingException will be thrown. This works quite well with Java 2 Standard Edition (J2SE) versions since 1.4. However, under J2SE 1.3 or lower you are out of luck. You should install a newer J2SE version to process codepages with HPSF.(HPSF 的代码页支持仅与运行应用程序的 Java 虚拟机 (JVM) 的字符编码支持一样好。如果 HPSF 遇到代码页编号，则假定 JVM 具有具有相应名称的字符编码。例如，如果代码页是 1252，HPSF 使用字符编码“cp1252”来读取或写入字符串。如果 JVM 没有安装该字符编码或者代码页号非法，则会抛出 UnsupportedEncodingException。这与自 1.4 以来的 Java 2 标准版 (J2SE) 版本配合得非常好。但是，在 J2SE 1.3 或更低版本下，您就不走运了。您应该安装更新的 J2SE 版本来使用 HPSF 处理代码页。)

There are some exceptions to the rule saying that a character encoding's name is derived from the codepage number by prepending the string "cp" to it. In these cases the codepage number is mapped to a well-known character encoding name. Here are a few examples:(该规则有一些例外情况，即字符编码的名称是通过在代码页号前面加上字符串“cp”来派生的。在这些情况下，代码页编号映射到众所周知的字符编码名称。这里有一些例子：)

Codepage 932: is mapped to the character encoding "SJIS".
Codepage 1200: is mapped to the character encoding "UTF-16".
Codepage 65001: is mapped to the character encoding "UTF-8".
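The naming rule and its exceptions can be sketched with java.nio.charset. This is a hypothetical helper, not HPSF's mapping table, which lives in the classes named in the next paragraph and covers more codepages. The sketch hard-codes only the three exceptions listed above and otherwise prepends "cp". Unlike the behavior described for HPSF, Charset.forName() reports an unknown name with an unchecked UnsupportedCharsetException rather than an UnsupportedEncodingException.

```java
import java.nio.charset.Charset;

/** Hypothetical sketch of a codepage-number to Java charset-name mapping. */
final class Codepages {

    static String charsetName(int codepage) {
        switch (codepage) {
            case 932:
                return "SJIS";
            case 1200:
                return "UTF-16";
            case 65001:
                return "UTF-8";
            default:
                // General rule: prepend "cp", e.g. 1252 -> "cp1252".
                return "cp" + codepage;
        }
    }

    /** Decodes string bytes using the section's codepage. */
    static String decode(byte[] bytes, int codepage) {
        // Charset.forName throws UnsupportedCharsetException if the JVM lacks the encoding.
        return new String(bytes, Charset.forName(charsetName(codepage)));
    }

    private Codepages() { }
}
```

One caveat about the 1200 entry: Windows codepage 1200 denotes little-endian UTF-16. When Java's "UTF-16" decoder finds no byte order mark, it assumes big-endian, so in your own code "UTF-16LE" may be the safer choice for that codepage.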

More of these mappings between codepage and character encoding name are hard-coded in the classes org.apache.poi.hpsf.Constants and org.apache.poi.hpsf.VariantSupport. Probably there will be a need to add more mappings. The HPSF author will appreciate any hints.(代码页和字符编码名称之间的更多映射在类 org.apache.poi.hpsf.Constants 和 org.apache.poi.hpsf.VariantSupport 中进行了硬编码。可能需要添加更多映射。 HPSF 作者将不胜感激任何提示。)

Writing Properties(编写属性)

Note (注意)

This section describes how to write properties. (本节介绍如何编写属性。)

Overview of Writing Properties(编写属性概述)

Writing properties is possible at a high level and at a low level:(在高级和低级写入属性是可能的：)

On the high level, HPSF's convenience classes (see the third section) let you create or change entries in the summary information or document summary information streams. This is what most users will want.(在高层，HPSF 的便利类（参见第三部分）可让您在摘要信息或文档摘要信息流中创建或更改条目。这是大多数用户所需要的。)
On the low level, there are no convenience classes or methods. You have to deal with things like property IDs and variant types to write properties. Therefore you should have read the fourth section to understand the description of the low-level writing functions.(在底层，没有便利类或方法。您必须处理诸如属性 ID 和变体类型之类的事情来编写属性。因此，您应该阅读第四部分以了解低级写入函数的描述。)

HPSF's writing capabilities come with the classes PropertySet, Section, Property, and some helper classes.(HPSF 的编写功能包括 PropertySet、Section、Property 类和一些辅助类。)

Low-Level Writing: An Overview

When you are going to write a property set stream your application has to perform the following steps:(当您要编写属性集流时，您的应用程序必须执行以下步骤：)

Create a PropertySet instance.(创建一个 PropertySet 实例。)
Get hold of a Section. You can either retrieve the one that is always present in a new PropertySet, or you have to create a new Section and add it to the PropertySet.(抓住一个部分。您可以检索始终存在于新 PropertySet 中的部分，或者您必须创建一个新部分并将其添加到 PropertySet。)
Set any Section fields as you like.(根据需要设置任何部分字段。)
Create as many Property objects as you need. Set each property's ID, type, and value. Add the Property objects to the Section.(根据需要创建尽可能多的 Property 对象。设置每个属性的 ID、类型和值。将 Property 对象添加到 Section。)
Create further Sections if you need them.(如果需要，请创建更多部分。)
Eventually retrieve the property set as a byte stream using PropertySet.toInputStream() and write it to a POIFS document.(最终使用 PropertySet.toInputStream() 将属性集检索为字节流并将其写入 POIFS 文档。)
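The stdlib sketch below makes the last step less abstract. It builds the first 48 bytes of a property set stream with one section, following the layout in Microsoft's [MS-OLEPS] specification: a 28-byte header, then one pair of format ID and section offset. It is hypothetical and stops before the section itself (size, property count, ID/offset table and values), which PropertySet.toInputStream() produces for you. The system identifier and class ID are simply left as zero here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch of the leading bytes of a property set stream with one section. */
final class PropertySetStreamStart {

    /**
     * Writes the 28-byte header plus one (format ID, offset) pair.
     * The format ID must already be in on-disk byte order.
     */
    static byte[] build(byte[] formatIdOnDisk) {
        if (formatIdOnDisk.length != 16) {
            throw new IllegalArgumentException("A format ID has 16 bytes");
        }
        ByteBuffer buf = ByteBuffer.allocate(48).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort((short) 0xFFFE); // byte order mark, stored as FE FF
        buf.putShort((short) 0);      // version
        buf.putInt(0);                // system identifier (placeholder)
        buf.put(new byte[16]);        // class ID (all zero)
        buf.putInt(1);                // number of sections
        buf.put(formatIdOnDisk);      // format ID of section 0
        buf.putInt(48);               // section 0 starts right after this table
        return buf.array();
    }

    private PropertySetStreamStart() { }
}
```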

Low-level Writing Functions In Details(低级写函数详解)

Writing properties is introduced by an artificial but simple example: a program creating a new document (aka POI file system) which contains only a single document: a summary information property set stream. The latter will hold the document's title only. This is artificial in that it does not contain any Word, Excel or other kind of useful application document data. A document containing just a property set is without any practical use. However, it is perfectly fine for an example because it makes the example very simple and easy to understand, and you will quickly get used to writing properties in real applications.(编写属性由一个人工但简单的示例介绍：一个程序创建一个仅包含单个文档的新文档（又名 POI 文件系统）：摘要信息属性集流。后者将仅保留文档的标题。这是人为的，因为它不包含任何 Word、Excel 或其他类型的有用应用程序文档数据。仅包含属性集的文档没有任何实际用途。然而，作为一个例子就很好了，因为它非常简单易懂，而且你很快就会习惯在实际应用程序中编写属性。)

The application expects the name of the POI file system to be written on the command line. The title property it writes is "Sample title".(应用程序希望将 POI 文件系统的名称写入命令行。它写入的标题属性是“示例标题”。)

Here's the application's source code. You can also find it in the "examples" section of the POI source code distribution. Explanations follow below.(这是应用程序的源代码。您还可以在 POI 源代码分发的“示例”部分找到它。说明如下。)

 
    package org.apache.poi.hpsf.examples;

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    import org.apache.poi.hpsf.Property;
    import org.apache.poi.hpsf.PropertySet;
    import org.apache.poi.hpsf.Section;
    import org.apache.poi.hpsf.SummaryInformation;
    import org.apache.poi.hpsf.Variant;
    import org.apache.poi.hpsf.WritingNotSupportedException;
    import org.apache.poi.hpsf.wellknown.PropertyIDMap;
    import org.apache.poi.hpsf.wellknown.SectionIDMap;
    import org.apache.poi.poifs.filesystem.POIFSFileSystem;

    /**
     * <p>This class is a simple sample application showing how to create a property
     * set and write it to disk.</p>
     *
     * @author Rainer Klute
     * @since 2003-09-12
     */
    public class WriteTitle
    {
        /**
         * <p>Runs the example program.</p>
         *
         * @param args Command-line arguments. The first and only command-line
         * argument is the name of the POI file system to create.
         * @throws IOException if any I/O exception occurs.
         * @throws WritingNotSupportedException if HPSF does not (yet) support
         * writing a certain property type.
         */
        public static void main(final String[] args)
            throws WritingNotSupportedException, IOException
        {
            /* Check whether we have exactly one command-line argument. */
            if (args.length != 1)
            {
                System.err.println("Usage: " + WriteTitle.class.getName() +
                    " destinationPOIFS");
                System.exit(1);
            }
            final String fileName = args[0];

            /* Create a mutable property set. Initially it contains a single section
             * with no properties. */
            final PropertySet mps = new PropertySet();

            /* Retrieve the section the property set already contains. */
            final Section ms = mps.getSections().get(0);

            /* Turn the property set into a summary information property set. This is
             * done by setting the format ID of its first section to
             * SectionIDMap.SUMMARY_INFORMATION_ID. */
            ms.setFormatID(SectionIDMap.SUMMARY_INFORMATION_ID);

            /* Create an empty property. */
            final Property p = new Property();

            /* Fill the property with appropriate settings so that it specifies the
             * document's title. */
            p.setID(PropertyIDMap.PID_TITLE);
            p.setType(Variant.VT_LPWSTR);
            p.setValue("Sample title");

            /* Place the property into the section. */
            ms.setProperty(p);

            /* Create the POI file system the property set is to be written to. */
            final POIFSFileSystem poiFs = new POIFSFileSystem();

            /* For writing the property set into a POI file system it has to be
             * handed over to the POIFS.createDocument() method as an input stream
             * which produces the bytes making up the property set stream. */
            final InputStream is = mps.toInputStream();

            /* Create the summary information property set in the POI file
             * system. It is given the default name most (if not all) summary
             * information property sets have. */
            poiFs.createDocument(is, SummaryInformation.DEFAULT_STREAM_NAME);

            /* Write the whole POI file system to a disk file and close the file. */
            final OutputStream out = new FileOutputStream(fileName);
            try
            {
                poiFs.writeFilesystem(out);
            }
            finally
            {
                out.close();
            }
        }
    }

The application first checks that there is exactly one single argument on the command line: the name of the file to write. If this single argument is present, the application stores it in the fileName variable. It will be used in the end when the POI file system is written to a disk file.(应用程序首先检查命令行上是否只有一个参数：要写入的文件的名称。如果存在此单个参数，则应用程序将其存储在 fileName 变量中。将在最后将 POI 文件系统写入磁盘文件时使用。)

 
       if (args.length != 1) 
      
       { 
      
        System.err.println("Usage: " + WriteTitle.class.getName() + 
      
        "destinationPOIFS"); 
      
        System.exit(1); 
      
       } 
      
       final String fileName = args[0];

Let's create a property set now. Older HPSF releases did not allow this with the PropertySet class, which was read-only: it had no public constructor for an empty property set and no methods to modify its contents. A separate class, MutablePropertySet, was used for writing instead.(现在让我们创建一个属性集。较早的 HPSF 版本不允许使用 PropertySet 类来做这件事，因为它是只读的：它没有用于创建空属性集的公共构造函数，也没有任何修改其内容的方法。当时写入要使用单独的 MutablePropertySet 类。)

In the HPSF version used by this example, PropertySet itself is writable. The sample application calls its no-args constructor in order to establish an empty property set:(在本示例使用的 HPSF 版本中，PropertySet 本身是可写的。示例应用程序调用其无参数构造函数以建立一个空属性集：)

final PropertySet mps = new PropertySet();

As said, we have an empty property set now. Later we will put some contents into it.(如前所述，我们现在有一个空属性集。稍后我们会将一些内容放入其中。)

The PropertySet created by the no-args constructor is not really empty: it contains a single section without properties. We can either retrieve that section and fill it with properties, or replace it with another section. We can also add further sections to the property set. The sample application retrieves the section that is already there:

final Section ms = mps.getSections().get(0);

The getSections() method returns the property set's sections as a list, i.e. an instance of java.util.List. Calling get(0) returns the list's first (or zeroth, if you prefer) element.

The alternative to retrieving the existing Section would have been to create a new Section like this:

Section s = new Section();

The Section the sample application retrieved from the PropertySet is still empty: it contains no properties and does not have a format ID. As you have read above, the format ID of the first section in a property set determines the property set's type. Since our property set should become a SummaryInformation property set, we have to set the format ID of its first (and only) section to F29F85E0-4FF9-1068-AB91-08002B27B3D9. However, you don't have to remember that ID: HPSF defines it as the constant SectionIDMap.SUMMARY_INFORMATION_ID. The sample application writes it to the section using the setFormatID() method:

ms.setFormatID(SectionIDMap.SUMMARY_INFORMATION_ID);
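The format ID is a GUID. The following sketch, which does not use HPSF at all, shows two byte orders for the SummaryInformation format ID: the order in which the hex digits are written, and the GUID layout with its first three fields stored little-endian, which is how the MS-OLEPS specification serializes a format ID into a property set stream. The class FormatIdBytes and its methods are hypothetical, and nothing here assumes which representation HPSF's constant uses internally.

```java
/**
 * Hypothetical helper: converts a GUID string into 16 bytes, once in
 * written order and once in the serialized (mixed-endian) GUID layout.
 */
class FormatIdBytes {

    /** Bytes in the order the hex digits appear in the string; hyphens are ignored. */
    static byte[] displayOrder(String guid) {
        String hex = guid.replace("-", "");
        if (hex.length() != 32) {
            throw new IllegalArgumentException("Not a GUID: " + guid);
        }
        byte[] b = new byte[16];
        for (int i = 0; i < 16; i++) {
            b[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return b;
    }

    /** Serialized layout: Data1 (4 bytes), Data2 and Data3 (2 bytes each) are little-endian. */
    static byte[] serializedOrder(String guid) {
        byte[] d = displayOrder(guid);
        byte[] s = d.clone();
        for (int i = 0; i < 4; i++) {
            s[i] = d[3 - i];          // reverse Data1
        }
        s[4] = d[5];                  // swap Data2
        s[5] = d[4];
        s[6] = d[7];                  // swap Data3
        s[7] = d[6];
        return s;                     // Data4 (last 8 bytes) stays as written
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte x : bytes) {
            sb.append(String.format("%02X", x & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String id = "F29F85E0-4FF9-1068-AB91-08002B27B3D9";
        System.out.println(toHex(displayOrder(id)));
        System.out.println(toHex(serializedOrder(id)));
    }
}
```

The second line printed (E0 85 9F F2 F9 4F 68 10 AB 91 ...) is the byte sequence you would find when looking at a SummaryInformation stream in a hex dump.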

Next, the sample application creates a property:

final Property p = new Property();

A Property object must have an ID, a type, and a value (see above for details). The class provides methods to set these attributes:

 
p.setID(PropertyIDMap.PID_TITLE);
p.setType(Variant.VT_LPWSTR);
p.setValue("Sample title");
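To make the three attributes more concrete, here is a sketch of how the title's value is laid out in the stream according to the MS-OLEPS specification: a 2-byte type (VT_LPWSTR, 0x001F), 2 bytes of padding, a 4-byte character count including the terminating null, the UTF-16LE characters, and zero padding up to a multiple of 4 bytes. The property ID (PID_TITLE, 2) is not part of this value; it is stored separately in the section's table of property IDs and offsets. The class LpwstrEncoding is hypothetical; HPSF performs this encoding for you when the property set is written.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of a VT_LPWSTR typed property value (MS-OLEPS). */
class LpwstrEncoding {

    /** Property ID of the title; stored in the section's ID/offset table, not in the value. */
    static final int PID_TITLE = 0x0002;

    /** Variant type of a null-terminated UTF-16 string. */
    static final int VT_LPWSTR = 0x001F;

    /** Encodes a string as a VT_LPWSTR typed property value. */
    static byte[] encode(String value) {
        // UTF-16LE characters plus the terminating null character
        byte[] chars = (value + '\0').getBytes(StandardCharsets.UTF_16LE);
        int padded = (chars.length + 3) & ~3;     // round up to a multiple of 4
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + padded)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort((short) VT_LPWSTR);          // type
        buf.putShort((short) 0);                  // 2 bytes of padding
        buf.putInt(chars.length / 2);             // character count, including the null
        buf.put(chars);                           // remaining bytes stay zero: padding
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] encoded = encode("Sample title");
        System.out.println("PID " + PID_TITLE + ", VT " + VT_LPWSTR + ": "
                + encoded.length + " bytes");
    }
}
```

For "Sample title" (12 characters) the count is 13, the characters take 26 bytes, and 2 bytes of padding bring the value to 36 bytes in total.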

The Property class has a constructor which you can use to pass in all three attributes in a single call. See the Javadoc API documentation for details!

With its ID, type, and value set, the property still has to be added to the section:

ms.setProperty(p);

The sample property set is complete now: we have a PropertySet containing a Section containing a Property. Of course we could have added more sections to the property set and more properties to the sections, but we wanted to keep things simple.

The property set has to be written to a POI file system. The following statement creates an empty one:

final POIFSFileSystem poiFs = new POIFSFileSystem();

Writing the property set includes converting it into a sequence of bytes. The PropertySet class has the method toInputStream() for this purpose. It returns the bytes making up the property set stream as an InputStream:

final InputStream is = mps.toInputStream();
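The bytes delivered by that stream begin with the property set stream header defined in the MS-OLEPS specification: a 2-byte byte order mark (0xFFFE, stored as FE FF), a 2-byte version, a 4-byte system identifier, a 16-byte CLSID, and a 4-byte number of sections, 28 bytes in total. The sketch below reads and checks just that header. PropertySetHeader is a hypothetical class, and its check is a plausibility test, not full validation; a hand-made header stands in for the output of toInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical plausibility check of a property set stream header. */
class PropertySetHeader {

    static final int HEADER_SIZE = 28;

    /** Reads up to HEADER_SIZE bytes; fewer are returned if the stream ends early. */
    static byte[] readHeader(InputStream in) {
        byte[] header = new byte[HEADER_SIZE];
        int n = 0;
        try {
            while (n < HEADER_SIZE) {
                int r = in.read(header, n, HEADER_SIZE - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Arrays.copyOf(header, n);
    }

    /** True if byte order, version, and section count look like MS-OLEPS. */
    static boolean looksLikePropertySet(byte[] header) {
        if (header.length < HEADER_SIZE) {
            return false;
        }
        ByteBuffer b = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        int byteOrder = b.getShort(0) & 0xFFFF;       // must be 0xFFFE
        int version = b.getShort(2) & 0xFFFF;         // 0 or 1
        long sections = b.getInt(24) & 0xFFFFFFFFL;   // number of sections
        return byteOrder == 0xFFFE && (version == 0 || version == 1) && sections >= 1;
    }

    public static void main(String[] args) {
        // Hand-made header standing in for the bytes of mps.toInputStream()
        byte[] fake = new byte[HEADER_SIZE];
        fake[0] = (byte) 0xFE;
        fake[1] = (byte) 0xFF;
        fake[24] = 1;                                  // one section
        InputStream in = new ByteArrayInputStream(fake);
        System.out.println(looksLikePropertySet(readHeader(in)));
    }
}
```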

If you read from this input stream, you would receive all the property set's bytes. However, you will very likely never do that. Instead, you pass the input stream to the POIFSFileSystem.createDocument() method, like this:

poiFs.createDocument(is, SummaryInformation.DEFAULT_STREAM_NAME);

Besides the InputStream, createDocument() takes a second parameter: the name of the document to be created. For a SummaryInformation property set stream, the default name is available as the constant SummaryInformation.DEFAULT_STREAM_NAME.
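The default stream name begins with the control character U+0005, written \005 in Java source, which is why it looks odd in a directory listing. The sketch below only demonstrates the string values; StreamNames and its helpers are hypothetical, and its constants merely copy the values of SummaryInformation.DEFAULT_STREAM_NAME and DocumentSummaryInformation.DEFAULT_STREAM_NAME.

```java
/** Hypothetical helpers around the standard property set stream names. */
class StreamNames {

    // Same values as SummaryInformation.DEFAULT_STREAM_NAME and
    // DocumentSummaryInformation.DEFAULT_STREAM_NAME (octal escape \005 = U+0005).
    static final String SUMMARY = "\005SummaryInformation";
    static final String DOC_SUMMARY = "\005DocumentSummaryInformation";

    /** True if the name starts with the U+0005 prefix used by property set streams. */
    static boolean hasPropertySetPrefix(String name) {
        return !name.isEmpty() && name.charAt(0) == '\005';
    }

    /** Replaces the invisible control character with the text "\005" for display. */
    static String printable(String name) {
        return name.replace("\005", "\\005");
    }

    public static void main(String[] args) {
        System.out.println(printable(SUMMARY));
        System.out.println(printable(DOC_SUMMARY));
    }
}
```

Using the predefined constants rather than typing the names by hand avoids forgetting the invisible prefix.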

The last step is to write the POI file system to a disk file. The try-with-resources statement makes sure the output stream is closed afterwards:

try (FileOutputStream out = new FileOutputStream(fileName))
{
    poiFs.writeFilesystem(out);
}
