POI-HPBF - Java API To Access Microsoft Publisher Format Files

Overview

HPBF is the POI Project's pure Java implementation of the Publisher file format.

Currently, HPBF is at an early stage while we work out the file format. So far, we have basic text extraction support and can read some parts of the file. Writing is not yet supported, because we cannot yet make sense of the Contents stream, which we think holds many offsets to other parts of the file.

Our initial aim is to produce a text extractor for the format (now done) and to extract hyperlinks from within the document (partly supported). Additional low-level code to process the file format may follow if there is demand and enough developer interest.

Text extraction is available via the org.apache.poi.hpbf.extractor.PublisherTextExtractor class.
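
As a rough illustration of the extractor mentioned above, the following is a minimal sketch rather than an official sample. It assumes the poi-scratchpad jar is on the classpath. The class name PublisherTextDump, the helper extractText, and the default file name sample.pub are hypothetical names chosen for this example. Only PublisherTextExtractor and its getText() method come from POI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.poi.hpbf.extractor.PublisherTextExtractor;

// Hypothetical example class; not part of POI itself.
public class PublisherTextDump {

    // Opens the given .pub file and returns whatever text HPBF can extract.
    static String extractText(Path pubFile) throws IOException {
        // Both resources are closed automatically, extractor first, then the stream.
        try (InputStream in = Files.newInputStream(pubFile);
             PublisherTextExtractor extractor = new PublisherTextExtractor(in)) {
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        // "sample.pub" is an assumed placeholder; pass a real path as the first argument.
        Path pubFile = Paths.get(args.length > 0 ? args[0] : "sample.pub");
        System.out.println(extractText(pubFile));
    }
}
```

Because HPBF is still at an early stage, the returned text may be incomplete for some documents.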

At this time, there is no usermodel API or similar. There is only low-level support for certain parts of the file, and by no means all of it.

Our current understanding of the file format is documented here.

As of 2017, we are unaware of a public format specification for Microsoft Publisher .pub files. This format was not included in the Microsoft Open Specifications Promise with the rest of the Microsoft Office file formats. As of 2009 and 2016, Microsoft had no plans to document the .pub file format. If this changes in the future, perhaps we will see a spec published as part of the Microsoft Office File Format Open Specification Technical Documentation.

Note
This code currently lives in the scratchpad area of the POI SVN repository. To use this component, ensure you have the Scratchpad Jar on your classpath, or a dependency defined on the poi-scratchpad artifact; the main POI jar is not enough! See the POI Components Map for more details.

by Nick Burch

 