POI-HSLF - A Quick Guide

Overview

Basic Text Extraction

For basic text extraction, use org.apache.poi.sl.extractor.SlideShowExtractor. It accepts a slideshow, which can be created from a file or stream via org.apache.poi.sl.usermodel.SlideShowFactory. Its getText() method returns the text from the slides.
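A minimal sketch of the basic extraction path, assuming Apache POI 5.x with poi-scratchpad on the classpath (HSLF lives there; .pptx files would additionally need poi-ooxml). The class name BasicTextExtraction and the helper extractText are hypothetical names chosen for illustration.

```java
import java.io.File;
import java.io.IOException;

import org.apache.poi.sl.extractor.SlideShowExtractor;
import org.apache.poi.sl.usermodel.SlideShow;
import org.apache.poi.sl.usermodel.SlideShowFactory;

public class BasicTextExtraction {

    /** Hypothetical helper: returns the slide text of a slideshow file. */
    static String extractText(File file) throws IOException {
        // SlideShowFactory detects the file format and builds the matching SlideShow.
        try (SlideShow<?, ?> slideshow = SlideShowFactory.create(file)) {
            // The extractor walks the model (slides, shapes, paragraphs) to collect text.
            SlideShowExtractor<?, ?> extractor = new SlideShowExtractor<>(slideshow);
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a .ppt file
        System.out.println(extractText(new File(args[0])));
    }
}
```

By default the extractor returns slide text only; notes, comments and master text are not included unless requested.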

Specific Text Extraction

To get specific pieces of text, first create an org.apache.poi.hslf.usermodel.HSLFSlideShow (from an org.apache.poi.hslf.usermodel.HSLFSlideShowImpl, which accepts a file or an input stream). Use getSlides() and getNotes() to get the slides and notes. These can be queried for their page ID, although they should already be returned in the right order.
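A minimal sketch of opening a .ppt through HSLFSlideShowImpl and listing its slides and notes, assuming Apache POI 5.x with poi-scratchpad. The class ListSlidesAndNotes and the helper slideNumbers are hypothetical; getSlideNumber() is used here as the public-facing page number.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hslf.usermodel.HSLFNotes;
import org.apache.poi.hslf.usermodel.HSLFSlide;
import org.apache.poi.hslf.usermodel.HSLFSlideShow;
import org.apache.poi.hslf.usermodel.HSLFSlideShowImpl;

public class ListSlidesAndNotes {

    /** Hypothetical helper: returns slide numbers in the order getSlides() yields them. */
    static List<Integer> slideNumbers(InputStream in) throws IOException {
        // HSLFSlideShowImpl parses the low-level file; HSLFSlideShow wraps it in the user model.
        try (HSLFSlideShow ppt = new HSLFSlideShow(new HSLFSlideShowImpl(in))) {
            List<Integer> numbers = new ArrayList<>();
            for (HSLFSlide slide : ppt.getSlides()) {
                numbers.add(slide.getSlideNumber());
            }
            List<HSLFNotes> notes = ppt.getNotes();
            System.out.println(numbers.size() + " slides, " + notes.size() + " notes pages");
            return numbers;
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a .ppt file
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(slideNumbers(in));
        }
    }
}
```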

You can then call getTextParagraphs() on these to get their blocks of text. (A list of HSLFTextParagraph objects normally holds all the text in a given area of the page, e.g. in the title bar or in a box.) From an HSLFTextParagraph, you can extract the text and check what type of text it is (e.g. Body, Title). You can also call getTextRuns(), which returns the HSLFTextRuns that make up the HSLFTextParagraph. An HSLFTextRun is a text fragment whose characters all share the same character formatting; the paragraph formatting is defined in the parent HSLFTextParagraph.
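A sketch of walking the paragraph and run hierarchy of each slide, assuming Apache POI 5.x with poi-scratchpad. On a slide, getTextParagraphs() returns one inner list per text area; getRunType() reports the text type as a TextHeaderAtom constant. The class DumpTextRuns and the helper runTexts are hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hslf.usermodel.HSLFSlide;
import org.apache.poi.hslf.usermodel.HSLFSlideShow;
import org.apache.poi.hslf.usermodel.HSLFTextParagraph;
import org.apache.poi.hslf.usermodel.HSLFTextRun;

public class DumpTextRuns {

    /** Hypothetical helper: collects the raw text of every run on a slide. */
    static List<String> runTexts(HSLFSlide slide) {
        List<String> texts = new ArrayList<>();
        // Each inner list holds the paragraphs of one text area (title, body box, ...).
        for (List<HSLFTextParagraph> area : slide.getTextParagraphs()) {
            for (HSLFTextParagraph para : area) {
                // TextHeaderAtom type constant, e.g. title vs. body text.
                int type = para.getRunType();
                // Every run in a paragraph shares one set of character formatting.
                for (HSLFTextRun run : para.getTextRuns()) {
                    String text = run.getRawText();
                    System.out.println("type=" + type + " run=\"" + text + "\"");
                    texts.add(text);
                }
            }
        }
        return texts;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a .ppt file
        try (InputStream in = new FileInputStream(args[0]);
             HSLFSlideShow ppt = new HSLFSlideShow(in)) {
            for (HSLFSlide slide : ppt.getSlides()) {
                System.out.println("Slide " + slide.getSlideNumber());
                runTexts(slide);
            }
        }
    }
}
```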

Poor Quality Text Extraction

If speed is the most important thing for you, and you don't care about getting duplicate blocks of text, text from master sheets, or old text, then org.apache.poi.hslf.extractor.QuickButCruddyTextExtractor might be of use.

QuickButCruddyTextExtractor doesn't use the normal record parsing code; instead, it blindly searches the tree of records for every record that holds text. You will get all the text, including lots of text you normally wouldn't want, but you will get it back very fast.

There are two ways of getting the text back: getTextAsString() returns a single string containing all the text, and getTextAsVector() returns a vector of strings, one for each text record found in the file.
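A sketch of both retrieval styles, assuming Apache POI 5.x with poi-scratchpad. The result of getTextAsVector() is assigned to a List<String>, which works whether the library returns a Vector or a List. The class QuickExtraction and the helper textRecords are hypothetical; close() is called explicitly to release the underlying file system.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

import org.apache.poi.hslf.extractor.QuickButCruddyTextExtractor;

public class QuickExtraction {

    /** Hypothetical helper: one string per text record found by the blind record search. */
    static List<String> textRecords(InputStream in) throws IOException {
        QuickButCruddyTextExtractor extractor = new QuickButCruddyTextExtractor(in);
        try {
            // May include duplicates, master-sheet text and deleted text.
            return extractor.getTextAsVector();
        } finally {
            extractor.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a .ppt file
        try (InputStream in = new FileInputStream(args[0])) {
            QuickButCruddyTextExtractor extractor = new QuickButCruddyTextExtractor(in);
            try {
                // All text records concatenated into a single string.
                System.out.println(extractor.getTextAsString());
            } finally {
                extractor.close();
            }
        }
    }
}
```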

Changing Text

You can change the text via HSLFTextParagraph.setText(List<HSLFTextParagraph>,String) or HSLFTextRun.setText(String). You can add further text runs with HSLFTextParagraph.appendText(List<HSLFTextParagraph>,String,boolean) or HSLFTextParagraph.addTextRun(HSLFTextRun).

When you call HSLFTextParagraph.setText(List<HSLFTextParagraph>,String), all the text ends up with the same formatting. When you call HSLFTextRun.setText(String), the text keeps the existing formatting of that HSLFTextRun.
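A sketch contrasting the two approaches, assuming Apache POI 5.x with poi-scratchpad: the first run is replaced with HSLFTextRun.setText (so it keeps its character formatting), then a new paragraph is added with HSLFTextParagraph.appendText(..., true). The class ChangeText and the helper editFirstTextArea are hypothetical, and editing only the first text area is an arbitrary choice for illustration.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

import org.apache.poi.hslf.usermodel.HSLFSlide;
import org.apache.poi.hslf.usermodel.HSLFSlideShow;
import org.apache.poi.hslf.usermodel.HSLFTextParagraph;
import org.apache.poi.hslf.usermodel.HSLFTextRun;

public class ChangeText {

    /** Hypothetical helper: edits the first text area on a slide; false if there is no text. */
    static boolean editFirstTextArea(HSLFSlide slide, String replacement, String appended) {
        List<List<HSLFTextParagraph>> areas = slide.getTextParagraphs();
        if (areas.isEmpty() || areas.get(0).isEmpty()) {
            return false;
        }
        List<HSLFTextParagraph> area = areas.get(0);
        List<HSLFTextRun> runs = area.get(0).getTextRuns();
        if (runs.isEmpty()) {
            return false;
        }
        // Run-level change: the run keeps its existing character formatting.
        runs.get(0).setText(replacement);
        // true = start a new paragraph for the appended text.
        HSLFTextParagraph.appendText(area, appended, true);
        return true;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: input .ppt, args[1]: output .ppt
        try (InputStream in = new FileInputStream(args[0]);
             HSLFSlideShow ppt = new HSLFSlideShow(in)) {
            for (HSLFSlide slide : ppt.getSlides()) {
                editFirstTextArea(slide, "Replaced text", "Appended paragraph");
            }
            try (OutputStream out = new FileOutputStream(args[1])) {
                ppt.write(out);
            }
        }
    }
}
```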

Adding Slides

You can add new slides by calling HSLFSlideShow.createSlide(), which adds a new slide to the end of the slideshow. Slides can be reordered with HSLFSlideShow.reorderSlide(...).
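A sketch of building a small deck and reordering it, assuming Apache POI 5.x with poi-scratchpad and that reorderSlide takes the old and new 1-based slide numbers. The class AddSlides, the helper buildDeck, the slide text and the text box geometry are hypothetical illustration values.

```java
import java.awt.Rectangle;
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.poi.hslf.usermodel.HSLFSlide;
import org.apache.poi.hslf.usermodel.HSLFSlideShow;
import org.apache.poi.hslf.usermodel.HSLFTextBox;

public class AddSlides {

    /** Hypothetical helper: builds a three-slide deck, moves the last slide first, returns .ppt bytes. */
    static byte[] buildDeck() throws IOException {
        try (HSLFSlideShow ppt = new HSLFSlideShow()) {
            for (int i = 1; i <= 3; i++) {
                // createSlide() appends the new slide at the end of the slideshow.
                HSLFSlide slide = ppt.createSlide();
                HSLFTextBox box = slide.createTextBox();
                box.setText("Slide created " + i);
                box.setAnchor(new Rectangle(50, 50, 400, 60));
            }
            // Move slide number 3 to position 1.
            ppt.reorderSlide(3, 1);
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ppt.write(bos);
            return bos.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: output .ppt path
        try (FileOutputStream out = new FileOutputStream(args[0])) {
            out.write(buildDeck());
        }
    }
}
```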

Guide to Key Classes

  • org.apache.poi.hslf.usermodel.HSLFSlideShowImpl Handles reading in and writing out files. Calls org.apache.poi.hslf.record.Record to build a tree of all the records in the file, and provides access to that tree.
  • org.apache.poi.hslf.record.Record Base class of all records. Also provides the main record generation code, which builds up a tree of records for a file.
  • org.apache.poi.hslf.usermodel.HSLFSlideShow Builds up model entries from the records and presents a user-facing view of the file.
  • org.apache.poi.hslf.usermodel.HSLFSlide A user-facing view of a slide in a slideshow. Lets you get at the text of the slide and at any drawing objects on it.
  • org.apache.poi.hslf.usermodel.HSLFTextParagraph A list of HSLFTextParagraphs holds all the text in a given area of the slide; each paragraph contains one or more HSLFTextRuns.
  • org.apache.poi.hslf.usermodel.HSLFTextRun Holds a run of text that all shares the same character styling. Both the text and its styling can be modified.
  • org.apache.poi.sl.extractor.SlideShowExtractor Uses the model code to extract text from files.
  • org.apache.poi.hslf.extractor.QuickButCruddyTextExtractor Uses the record code to extract all the text from files very fast, but includes deleted text (and other bits of crud).
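To see the record layer that HSLFSlideShowImpl and Record expose underneath the user model, the following sketch prints the record tree with indentation. It assumes Apache POI 5.x with poi-scratchpad, where getRecords() returns the top-level records and getChildRecords() returns null for atoms. The class RecordTreeWalker and its helpers are hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Single-type import: shadows java.lang.Record on Java 16+.
import org.apache.poi.hslf.record.Record;
import org.apache.poi.hslf.usermodel.HSLFSlideShowImpl;

public class RecordTreeWalker {

    /** Hypothetical helper: prints the tree and returns how many records were visited. */
    static int walk(Record[] records, int depth) {
        if (records == null) {
            return 0; // atoms have no child records
        }
        int count = 0;
        for (Record r : records) {
            if (r == null) {
                continue;
            }
            System.out.println(indent(depth) + r.getClass().getSimpleName()
                    + " (type " + r.getRecordType() + ")");
            count += 1 + walk(r.getChildRecords(), depth + 1);
        }
        return count;
    }

    private static String indent(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a .ppt file
        try (InputStream in = new FileInputStream(args[0]);
             HSLFSlideShowImpl impl = new HSLFSlideShowImpl(in)) {
            System.out.println(walk(impl.getRecords(), 0) + " records");
        }
    }
}
```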

by Nick Burch

 