The New Halloween Document(新的万圣节文档)

How to use the HSSF API(如何使用 HSSF API)

Capabilities(能力)

This release of the how-to outlines functionality for the current svn trunk. Those looking for information on previous releases should look in the documentation distributed with that release.(此版本的操作指南概述了当前 svn 主干的功能。如果想要了解以前版本信息,你应该查看与该版本一起分发的文档。)

HSSF allows numeric, string, date or formula cell values to be written to or read from an XLS file. Also in this release is row and column sizing, cell styling (bold, italics, borders,etc), and support for both built-in and user defined data formats. Also available is an event-based API for reading XLS files. It differs greatly from the read/write API and is intended for intermediate developers who need a smaller memory footprint.(HSSF 允许在 XLS 文件中写入或读取数字、字符串、日期或公式单元格值。此版本还支持行和列大小、单元格样式(粗体、斜体、边框等),并支持内置和用户定义的数据格式。还有一个用于读取 XLS 文件的基于事件的 API。它与读/写 API 有很大不同,适用于需要较小内存占用的中级开发人员。)

Different APIs(不同的 API)

There are a few different ways to access the HSSF API. These have different characteristics, so you should read up on all to select the best for you.(有几种不同的方法可以访问 HSSF API。它们具有不同的特征,因此您应该阅读所有内容以选择最适合您的。)

General Use(一般使用)

User API (HSSF and XSSF)(User API(HSSF 和 XSSF))

Writing a new file(写一个新文件)

The high level API (package: org.apache.poi.ss.usermodel) is what most people should use. Usage is very simple.(大多数人应该使用高级 API(包:org.apache.poi.ss.usermodel)。用法很简单。)

Workbooks are created by creating an instance of org.apache.poi.ss.usermodel.Workbook. Either create a concrete class directly (org.apache.poi.hssf.usermodel.HSSFWorkbook or org.apache.poi.xssf.usermodel.XSSFWorkbook), or use the handy factory class org.apache.poi.ss.usermodel.WorkbookFactory.(通过创建 org.apache.poi.ss.usermodel.Workbook 的实例来创建工作簿。也可以创建一个具体的类(org.apache.poi.hssf.usermodel.HSSFWorkbook 或 org.apache.poi.xssf.usermodel.XSSFWorkbook),或者使用便捷的工厂类 org.apache.poi.ss.usermodel.WorkbookFactory。)

Sheets are created by calling createSheet() from an existing instance of Workbook, the created sheet is automatically added in sequence to the workbook. Sheets do not in themselves have a sheet name (the tab at the bottom); you set the name associated with a sheet by calling Workbook.setSheetName(sheetindex,"SheetName",encoding). For HSSF, the name may be in 8bit format (HSSFWorkbook.ENCODING_COMPRESSED_UNICODE) or Unicode (HSSFWorkbook.ENCODING_UTF_16). Default encoding for HSSF is 8bit per char. For XSSF, the name is automatically handled as unicode.(通过从现有的 Workbook 实例调用 createSheet() 创建工作表,创建的工作表会自动按顺序添加到工作簿中。工作表本身没有名称(底部的选项卡);您可以通过调用 Workbook.setSheetName(sheetindex,"SheetName",encoding) 设置与工作表关联的名称。对于 HSSF,名称可以是 8 位格式 (HSSFWorkbook.ENCODING_COMPRESSED_UNICODE) 或 Unicode (HSSFWorkbook.ENCODING_UTF_16)。 HSSF 的默认编码是每字符 8 位。对于 XSSF,名称会自动作为 unicode 处理。)

Rows are created by calling createRow(rowNumber) from an existing instance of Sheet. Only rows that have cell values should be added to the sheet. To set the row's height, you just call setRowHeight(height) on the row object. The height must be given in twips, or 1/20th of a point. If you prefer, there is also a setRowHeightInPoints method.(通过从 Sheet 的现有实例调用 createRow(rowNumber) 创建行。只有拥有单元格的行才应添加到工作表中。要设置行的高度,只需在行对象上调用 setRowHeight(height)。高度必须是缇或点的 1/20。如果您需要,可以用setRowHeightInPoints 方法。)

Cells are created by calling createCell(column, type) from an existing Row. Only cells that have values should be added to the row. Cells should have their cell type set to either Cell.CELL_TYPE_NUMERIC or Cell.CELL_TYPE_STRING depending on whether they contain a numeric or textual value. Cells must also have a value set. Set the value by calling setCellValue with either a String or double as a parameter. Individual cells do not have a width; you must call setColumnWidth(colindex, width) (use units of 1/256th of a character) on the Sheet object. (You can't do it on an individual basis in the GUI either).(单元格通过从现有行调用createCell(column, type) 来创建。只有拥有值的单元格才应添加到行中。单元格应将其类型设置为 Cell.CELL_TYPE_NUMERIC 或 Cell.CELL_TYPE_STRING,具体取决于它们是否包含数字或文本值。单元格还必须有一个值集。通过使用 String 或 double 作为参数调用 setCellValue 来设置值。单个单元格没有宽度;您必须在 Sheet 对象上调用 setColumnWidth(colindex, width) (使用字符的 1/256 的单位)。 (您不能在 GUI 中单独执行此操作)。)

Cells are styled with CellStyle objects which in turn contain a reference to an Font object. These are created via the Workbook object by calling createCellStyle() and createFont(). Once you create the object you must set its parameters (colors, borders, etc). To set a font for an CellStyle call setFont(fontobj).(单元格使用 CellStyle 对象设置样式,这些对象又包含对 Font 对象的引用。这些是通过 Workbook 对象调用 createCellStyle() 和 createFont() 创建的。创建对象后,您必须设置其参数(颜色、边框等)。使用 CellStyle 调用 setFont(fontobj) 设置字体。)

Once you have generated your workbook, you can write it out by calling write(outputStream) from your instance of Workbook, passing it an OutputStream (for instance, a FileOutputStream or ServletOutputStream). You must close the OutputStream yourself. HSSF does not close it for you.(生成工作簿后,您可以通过从工作簿实例调用 write(outputStream) 将其写出,并将其传递给 OutputStream(例如,FileOutputStream 或 ServletOutputStream)。您必须自己关闭 OutputStream。 HSSF 不会为您关闭它。)

Here is some example code (excerpted and adapted from org.apache.poi.hssf.dev.HSSF test class):(这是一些示例代码(摘自并改编自 org.apache.poi.hssf.dev.HSSF 测试类):)

short rownum;
// create a new file
// 创建一个新文件
FileOutputStream out = new FileOutputStream("workbook.xls");
// create a new workbook
// 创建一个新的工作簿
Workbook wb = new HSSFWorkbook();
// create a new sheet
// 创建一个新工作表
Sheet s = wb.createSheet();
// declare a row object reference
// 声明一个行对象引用
Row r = null;
// declare a cell object reference
// 声明一个单元格对象引用
Cell c = null;
// create 3 cell styles
// 创建 3 个单元格样式
CellStyle cs = wb.createCellStyle();
CellStyle cs2 = wb.createCellStyle();
CellStyle cs3 = wb.createCellStyle();
DataFormat df = wb.createDataFormat();
// create 2 fonts objects
// 创建 2 个字体对象
Font f = wb.createFont();
Font f2 = wb.createFont();
//set font 1 to 12 point type
//设置字体1为12点类型
f.setFontHeightInPoints((short) 12);
//make it blue
//把它变成蓝色
f.setColor( (short)0xc );
// make it bold
//arial is the default font
// 加粗
//arial 是默认字体
f.setBoldweight(Font.BOLDWEIGHT_BOLD);
//set font 2 to 10 point type
//设置字体2为10点类型
f2.setFontHeightInPoints((short) 10);
//make it red
//把它变成红色
f2.setColor( (short)Font.COLOR_RED );
//make it bold
//使其加粗
f2.setBoldweight(Font.BOLDWEIGHT_BOLD);
f2.setStrikeout( true );
//set cell stlye
//设置单元格样式
cs.setFont(f);
//set the cell format
//设置单元格格式
cs.setDataFormat(df.getFormat("#,##0.0"));
//set a thin border
//设置细边框
cs2.setBorderBottom(cs2.BORDER_THIN);
//fill w fg fill color
//填充颜色
cs2.setFillPattern((short) CellStyle.SOLID_FOREGROUND);
//set the cell format to text see DataFormat for a full list
//将单元格格式设置为文本,请参阅DataFormat以获取完整列表
cs2.setDataFormat(HSSFDataFormat.getBuiltinFormat("text"));
// set the font
// 设置字体
cs2.setFont(f2);
// set the sheet name in Unicode
// 以 Unicode 格式设置工作表名称
wb.setSheetName(0, "\u0422\u0435\u0441\u0442\u043E\u0432\u0430\u044F " +
"\u0421\u0442\u0440\u0430\u043D\u0438\u0447\u043A\u0430" );
// in case of plain ascii
// wb.setSheetName(0, "HSSF Test");
// create a sheet with 30 rows (0-29)
// 如果是纯Ascii,使用wb.setSheetName(0, "HSSF Test");
// 创建一个 30 行 (索引0-29) 的工作表
int rownum;
for (rownum = (short) 0; rownum < 30; rownum++)
{
// create a row
// 创建一行
r = s.createRow(rownum);
// on every other row
//每隔一行
if ((rownum % 2) == 0)
{
// make the row height bigger (in twips - 1/20 of a point)
// 增大行高(以缇为单位——1/20 点)
r.setHeight((short) 0x249);
}
//r.setRowNum(( short ) rownum);
// create 10 cells (0-9) (the += 2 becomes apparent later
//r.setRowNum(( short ) rownum);
// 创建 10 个单元格 (0-9),cellnum+=2在后面的代码会体现出来
for (short cellnum = (short) 0; cellnum < 10; cellnum += 2)
{
// create a numeric cell
// 创建一个数字单元格
c = r.createCell(cellnum);
// do some goofy math to demonstrate decimals
// 做一些幼稚的数学来演示小数
c.setCellValue(rownum * 10000 + cellnum
+ (((double) rownum / 1000)
+ ((double) cellnum / 10000)));
String cellValue;
// create a string cell (see why += 2 in the
// 创建一个字符串单元格,这就是cellnum += 2的原因
c = r.createCell((short) (cellnum + 1));
// on every other row
//每隔一行
if ((rownum % 2) == 0)
{
// set this cell to the first cell style we defined
// 将此单元格设置为我们定义的第一个单元格样式
c.setCellStyle(cs);
// set the cell's string value to "Test"
// 将单元格的字符串值设置为“Test”
c.setCellValue( "Test" );
}
else
{
c.setCellStyle(cs2);
// set the cell's string value to "\u0422\u0435\u0441\u0442"
// 将单元格的字符串值设置为“Тест”
c.setCellValue( "\u0422\u0435\u0441\u0442" );
}
// make this column a bit wider
// 使这一列更宽一些
s.setColumnWidth((short) (cellnum + 1), (short) ((50 * 8) / ((double) 1 / 20)));
}
}
//draw a thick black border on the row at the bottom using BLANKS
// advance 2 rows
//使用BLANKS在底部的行上画一个粗黑边框
//前进2行
rownum++;
rownum++;
r = s.createRow(rownum);
// define the third style to be the default
// except with a thick black border at the bottom
// 将第三种样式定义为默认样式
// 底部有一个粗黑色边框
cs3.setBorderBottom(cs3.BORDER_THICK);
//create 50 cells
//创建50个单元格
for (short cellnum = (short) 0; cellnum < 50; cellnum++)
{
//create a blank type cell (no value)
//创建一个空白类型单元格(无值)
c = r.createCell(cellnum);
// set it to the thick black border style
// 设置为黑色粗边框样式
c.setCellStyle(cs3);
}
//end draw thick black border
//结束画粗黑边框
// demonstrate adding/naming and deleting a sheet
// create a sheet, set its title then delete it
// 演示添加/命名和删除工作表
// 创建工作表,设置其标题,然后将其删除
s = wb.createSheet();
wb.setSheetName(1, "DeletedSheet");
wb.removeSheetAt(1);
//end deleted sheet
//结束已删除的工作表
// write the workbook to the output stream
// close our file (don't blow out our file handles
// 将工作簿写入输出流
// 关闭我们的文件(不要破坏文件句柄)
wb.write(out);
out.close();

Reading or modifying an existing file(读取或修改现有文件)

Reading in a file is equally simple. To read in a file, create a new instance of org.apache.poi.poifs.Filesystem, passing in an open InputStream, such as a FileInputStream for your XLS, to the constructor. Construct a new instance of org.apache.poi.hssf.usermodel.HSSFWorkbook passing the Filesystem instance to the constructor. From there you have access to all of the high level model objects through their assessor methods (workbook.getSheet(sheetNum), sheet.getRow(rownum), etc).(读入文件同样简单。要读入文件,先创建 org.apache.poi.poifs.Filesystem 的新实例,将InputStream(例如XLS的FileInputStream)传递给构造函数。再把Filesystem 实例传递给构造函数,构造一个新的 org.apache.poi.hssf.usermodel.HSSFWorkbook 实例。从那里您可以通过他们的评估方法(workbook.getSheet(sheetNum)、sheet.getRow(rownum) 等)访问所有高级模型对象。)

Modifying the file you have read in is simple. You retrieve the object via an assessor method, remove it via a parent object's remove method (sheet.removeRow(hssfrow)) and create objects just as you would if creating a new xls. When you are done modifying cells just call workbook.write(outputstream) just as you did above.(修改读入的文件很简单。您可以通过评估器方法检索对象,通过父对象的删除方法 (sheet.removeRow(hssfrow)) 将其删除,然后像创建新 xls 一样创建对象。完成修改单元格后,只需像上面那样调用 workbook.write(outputstream) 即可。)

An example of this can be seen in org.apache.poi.hssf.usermodel.examples.HSSFReadWrite.(org.apache.poi.hssf.usermodel.examples.HSSFReadWrite 中有一个例子可以参考。)

Event API (HSSF Only)(Event API(仅限 HSSF))

The event API is newer than the User API. It is intended for intermediate developers who are willing to learn a little bit of the low level API structures. Its relatively simple to use, but requires a basic understanding of the parts of an Excel file (or willingness to learn). The advantage provided is that you can read an XLS with a relatively small memory footprint.(事件 API 比用户 API 更新。它适用于想要学习一点低级 API 结构的中级开发人员。它使用起来相对简单,但需要对 Excel 文件的各个部分有基本的了解(或想要学习)。它的优势是您可以使用相对较小的内存占用读取XLS。)

One important thing to note with the basic Event API is that it triggers events only for things actually stored within the file. With the XLS file format, it is quite common for things that have yet to be edited to simply not exist in the file. This means there may well be apparent "gaps" in the record stream, which you either need to work around, or use the Record Aware extension to the Event API.(使用基础的Event API 需要注意的一件重要事情是,它只对文件中实际存储的内容触发事件。对于 XLS 文件格式,尚未编辑的内容不存在于文件中是很常见的。这意味着记录流中很可能存在明显的“差距”,您需要解决这些问题或者使用 Event API 的Record Aware扩展。)

To use this API you construct an instance of org.apache.poi.hssf.eventmodel.HSSFRequest. Register a class you create that supports the org.apache.poi.hssf.eventmodel.HSSFListener interface using the HSSFRequest.addListener(yourlistener, recordsid). The recordsid should be a static reference number (such as BOFRecord.sid) contained in the classes in org.apache.poi.hssf.record. The trick is you have to know what these records are. Alternatively you can call HSSFRequest.addListenerForAllRecords(mylistener). In order to learn about these records you can either read all of the javadoc in the org.apache.poi.hssf.record package or you can just hack up a copy of org.apache.poi.hssf.dev.EFHSSF and adapt it to your needs. TODO: better documentation on records.(要使用此API,您需要构建org.apache.poi.hssf.eventmodel.HSSFRequest的实例,并使用 HSSFRequest.addListener(yourlistener, recordsid) 注册您的实现了org.apache.poi.hssf.eventmodel.HSSFListener 接口的类。recordsid 应当是包含在 org.apache.poi.hssf.record 中的类中的静态引用数值(例如 BOFRecord.sid)。您必须知道这些记录是什么,或者,您可以调用 HSSFRequest.addListenerForAllRecords(mylistener)。为了了解这些记录,您可以阅读 org.apache.poi.hssf.record 包中的所有 javadoc,或者您可以直接破解 org.apache.poi.hssf.dev.EFHSSF 的副本并根据您的需要对其进行修改。 TODO:更好的记录文档。)

Once you've registered your listeners in the HSSFRequest object you can construct an instance of org.apache.poi.poifs.filesystem.FileSystem (see POIFS howto) and pass it your XLS file inputstream. You can either pass this, along with the request you constructed, to an instance of HSSFEventFactory via the HSSFEventFactory.processWorkbookEvents(request, Filesystem) method, or you can get an instance of DocumentInputStream from Filesystem.createDocumentInputStream("Workbook") and pass it to HSSFEventFactory.processEvents(request, inputStream). Once you make this call, the listeners that you constructed receive calls to their processRecord(Record) methods with each Record they are registered to listen for until the file has been completely read.(一旦你在 HSSFRequest 对象中注册了你的监听器,你就可以构造一个 org.apache.poi.poifs.filesystem.FileSystem 的实例(参见 POIFS howto)并将它传递给你的 XLS 文件输入流。您可以通过 HSSFEventFactory.processWorkbookEvents(request, Filesystem) 方法将此连同您构造的请求一起传递给 HSSFEventFactory 的实例,或者您可以从 Filesystem.createDocumentInputStream("Workbook") 获取 DocumentInputStream 的实例并把它传递到 HSSFEventFactory.processEvents(request, inputStream)。进行此调用后,您构建的侦听器将收到对其 processRecord(Record) 方法的调用,其中每个 Record 都会被侦听,直到文件被完全读取。)

A code excerpt from org.apache.poi.hssf.dev.EFHSSF (which is in CVS or the source distribution) is reprinted below with excessive comments:(org.apache.poi.hssf.dev.EFHSSF 的代码摘录(在 CVS 或源代码分发中)在下面列出,并带有很多注释:)

/**
* This example shows how to use the event API for reading a file.
*/
/**
* 此示例演示如何使用Event API读取文件
*/
public class EventExample
implements HSSFListener
{
private SSTRecord sstrec;
/**
* This method listens for incoming records and handles them as required.
* @param record The record that was found while reading.
*/
/**
* 此方法侦听传入的记录,并根据需要处理它们.
* @param record 读取的记录
*/
public void processRecord(Record record)
{
switch (record.getSid())
{
// the BOFRecord can represent either the beginning of a sheet or the workbook
// BOFRecord 表示工作表或工作簿的开头
case BOFRecord.sid:
BOFRecord bof = (BOFRecord) record;
if (bof.getType() == bof.TYPE_WORKBOOK)
{
System.out.println("Encountered workbook");
// assigned to the class level member
// 分配给类级别成员
} else if (bof.getType() == bof.TYPE_WORKSHEET)
{
System.out.println("Encountered sheet reference");
}
break;
case BoundSheetRecord.sid:
BoundSheetRecord bsr = (BoundSheetRecord) record;
System.out.println("New sheet named: " + bsr.getSheetname());
break;
case RowRecord.sid:
RowRecord rowrec = (RowRecord) record;
System.out.println("Row found, first column at "
+ rowrec.getFirstCol() + " last column at " + rowrec.getLastCol());
break;
case NumberRecord.sid:
NumberRecord numrec = (NumberRecord) record;
System.out.println("Cell found with value " + numrec.getValue()
+ " at row " + numrec.getRow() + " and column " + numrec.getColumn());
break;
// SSTRecords store a array of unique strings used in Excel.
//SSTRecords存储Excel中使用的唯一字符串数组
case SSTRecord.sid:
sstrec = (SSTRecord) record;
for (int k = 0; k < sstrec.getNumUniqueStrings(); k++)
{
System.out.println("String table value " + k + " = " + sstrec.getString(k));
}
break;
case LabelSSTRecord.sid:
LabelSSTRecord lrec = (LabelSSTRecord) record;
System.out.println("String cell found with value "
+ sstrec.getString(lrec.getSSTIndex()));
break;
}
}
/**
* Read an excel file and spit out what we find.
*
* @param args Expect one argument that is the file to read.
* @throws IOException When there is an error processing the file.
*/
/**
* 读取一个excel文件并获取内容.
*
* @param args 预期的参数是要读取的文件。
* @throws IOException 处理文件时出错。
*/
public static void main(String[] args) throws IOException
{
// create a new file input stream with the input file specified
// at the command line
// 使用命令行中指定的输入文件创建新的文件输入流
FileInputStream fin = new FileInputStream(args[0]);
// create a new org.apache.poi.poifs.filesystem.Filesystem
// 创建一个新的 org.apache.poi.poifs.filesystem.Filesystem
POIFSFileSystem poifs = new POIFSFileSystem(fin);
// get the Workbook (excel part) stream in a InputStream
// 在 InputStream 中获取 Workbook(excel 部分)流
InputStream din = poifs.createDocumentInputStream("Workbook");
// construct out HSSFRequest object
// 构造出 HSSFRequest 对象
HSSFRequest req = new HSSFRequest();
// lazy listen for ALL records with the listener shown above
// 使用上面的侦听器延迟侦听所有记录
req.addListenerForAllRecords(new EventExample());
// create our event factory
// 创建我们的事件工厂
HSSFEventFactory factory = new HSSFEventFactory();
// process our events based on the document input stream
// 根据文档输入流处理我们的事件
factory.processEvents(req, din);
// once all the events are processed close our file input stream
// 处理完所有事件后,关闭我们的文件输入流
fin.close();
// and our document input stream (don't want to leak these!)
// 并且关闭文档输入流(避免它泄露!)
din.close();
System.out.println("done.");
}
}

Record Aware Event API (HSSF Only)(记录感知事件 API(仅限 HSSF))

This is an extension to the normal Event API. With this, your listener will be called with extra, dummy records. These dummy records should alert you to records which aren't present in the file (eg cells that have yet to be edited), and allow you to handle these.(这是对普通Event API 的扩展。这样,您的监听器将被额外的虚拟记录调用。这些虚拟记录应当提醒您文件中不存在的记录(例如,尚未编辑的单元格),并允许您处理这些记录。)

There are three dummy records that your HSSFListener will be called with:(您的 HSSFListener 将调用三个虚拟记录:)

  • org.apache.poi.hssf.eventusermodel.dummyrecord.MissingRowDummyRecord
    This is called during the row record phase (which typically occurs before the cell records), and indicates that the row record for the given row is not present in the file.
    (org.apache.poi.hssf.eventusermodel.dummyrecord.MissingRowDummyRecord在行记录阶段(通常发生在单元格记录之前)被调用,并指示给定行的行记录不存在于文件中。)
  • org.apache.poi.hssf.eventusermodel.dummyrecord.MissingCellDummyRecord
    This is called during the cell record phase. It is called when a cell record is encountered which leaves a gap between it an the previous one. You can get multiple of these, before the real cell record.
    (org.apache.poi.hssf.eventusermodel.dummyrecord.MissingCellDummyRecord在单元记录阶段被调用。当遇到与前一个单元格记录之间存在差距的单元格记录时调用它。在真正的单元格记录之前,您可以获得其中的多个。)
  • org.apache.poi.hssf.eventusermodel.dummyrecord.LastCellOfRowDummyRecord
    This is called after the last cell of a given row. It indicates that there are no more cells for the row, and also tells you how many cells you have had. For a row with no cells, this will be the only record you get.
    (org.apache.poi.hssf.eventusermodel.dummyrecord.LastCellOfRowDummyRecord在给定行的最后一个单元格之后调用。它表示该行没有更多单元格,并且还告诉您有多少个单元格。对于没有单元格的行,这将是您获得的唯一记录。)

To use the Record Aware Event API, you should create an org.apache.poi.hssf.eventusermodel.MissingRecordAwareHSSFListener, and pass it your HSSFListener. Then, register the MissingRecordAwareHSSFListener to the event model, and start that as normal.(要使用 Record Aware Event API,您应该创建一个 org.apache.poi.hssf.eventusermodel.MissingRecordAwareHSSFListener,并将其传递给您的 HSSFListener。然后,将 MissingRecordAwareHSSFListener 注册到事件模型,并正常启动。)

One example use for this API is to write a CSV outputter, which always outputs a minimum number of columns, even where the file doesn't contain some of the rows or cells. It can be found at /src/examples/src/org/apache/poi/examples/hssf/eventusermodel/XLS2CSVmra.java, and may be called on the command line, or from within your own code. The latest version is always available from subversion.(此 API 的一个示例用途是编写一个 CSV 输出器,它始终输出最少数量的列,即使文件不包含某些行或单元格。它可以在 /src/examples/src/org/apache/poi/examples/hssf/eventusermodel/XLS2CSVmra.java 中找到,并且可以在命令行上或从您自己的代码中调用。最新版本总是可以从 subversion获得。)

In POI versions before 3.0.3, this code lived in the scratchpad section. If you're using one of these older versions of POI, you will either need to include the scratchpad jar on your classpath, or build from a subversion checkout.(在 3.0.3 之前的 POI 版本中,此代码位于scratchpad部分。如果您使用这些旧版本的 POI 之一,您将需要在类路径中包含scratchpad jar,或者从subversion checkout构建。)

XSSF and SAX (Event API)(XSSF 和 SAX(Event API))

If memory footprint is an issue, then for XSSF, you can get at the underlying XML data, and process it yourself. This is intended for intermediate developers who are willing to learn a little bit of low level structure of .xlsx files, and who are happy processing XML in java. Its relatively simple to use, but requires a basic understanding of the file structure. The advantage provided is that you can read a XLSX file with a relatively small memory footprint.(如果内存占用是一个问题,那么对于 XSSF,您可以获取底层 XML 数据,并自己处理它。这适用于愿意学习一点 .xlsx 文件的低级结构并且乐于在 java 中处理 XML 的中级开发人员。它使用起来相对简单,但需要对文件结构有基本的了解。提供的优势是您可以用相对较小的内存占用读取XLSX文件。)

One important thing to note with the basic Event API is that it triggers events only for things actually stored within the file. With the XLSX file format, it is quite common for things that have yet to be edited to simply not exist in the file. This means there may well be apparent "gaps" in the record stream, which you need to work around.(使用基础Event API 需要注意的一件重要事情,它只为文件中实际存储的内容触发事件。对于 XLSX 文件格式,尚未编辑的内容在文件中不存在是很常见的。这意味着记录流中很可能存在明显的“差距”,您需要解决这些问题。)

To use this API you construct an instance of org.apache.poi.xssf.eventmodel.XSSFReader. This will optionally provide a nice interface on the shared strings table, and the styles. It provides methods to get the raw xml data from the rest of the file, which you will then pass to SAX.(要使用此 API,您需要构建 org.apache.poi.xssf.eventmodel.XSSFReader 的实例。它在共享字符串表和样式上提供了一个很好的接口。它提供了从文件的其余部分获取原始 xml 数据的方法,然后将其传递给 SAX。)

This example shows how to get at a single known sheet, or at all sheets in the file. It is based on the example in svn src/examples/src/org/apache/poi/examples/xssf/eventusermodel/FromHowTo.java(此示例展示如何获取单个已知工作表或文件中的所有工作表。它基于svn src/examples/src/org/apache/poi/examples/xssf/eventusermodel/FromHowTo.java中的示例)

import java.io.InputStream;
import java.util.Iterator;
import org.apache.poi.util.XMLHelper;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.ParserConfigurationException;
public class ExampleEventUserModel {
public void processOneSheet(String filename) throws Exception {
OPCPackage pkg = OPCPackage.open(filename);
XSSFReader r = new XSSFReader( pkg );
SharedStringsTable sst = r.getSharedStringsTable();
XMLReader parser = fetchSheetParser(sst);
// To look up the Sheet Name / Sheet Order / rID,
// you need to process the core Workbook stream.
// Normally it's of the form rId# or rSheet#
// 要查找工作表名称/工作表顺序/rID,您需要处理核心工作簿流。通常是 rId# 或 rSheet# 的形式
InputStream sheet2 = r.getSheet("rId2");
InputSource sheetSource = new InputSource(sheet2);
parser.parse(sheetSource);
sheet2.close();
}
public void processAllSheets(String filename) throws Exception {
OPCPackage pkg = OPCPackage.open(filename);
XSSFReader r = new XSSFReader( pkg );
SharedStringsTable sst = r.getSharedStringsTable();
XMLReader parser = fetchSheetParser(sst);
Iterator<InputStream> sheets = r.getSheetsData();
while(sheets.hasNext()) {
System.out.println("Processing new sheet:\n");
InputStream sheet = sheets.next();
InputSource sheetSource = new InputSource(sheet);
parser.parse(sheetSource);
sheet.close();
System.out.println("");
}
}
public XMLReader fetchSheetParser(SharedStringsTable sst) throws SAXException, ParserConfigurationException {
XMLReader parser = XMLHelper.newXMLReader();
ContentHandler handler = new SheetHandler(sst);
parser.setContentHandler(handler);
return parser;
}
/**
* See org.xml.sax.helpers.DefaultHandler javadocs
*/
/**
* 请查阅API文档 org.xml.sax.helpers.DefaultHandler
*/
private static class SheetHandler extends DefaultHandler {
private SharedStringsTable sst;
private String lastContents;
private boolean nextIsString;
private SheetHandler(SharedStringsTable sst) {
this.sst = sst;
}
public void startElement(String uri, String localName, String name,
Attributes attributes) throws SAXException {
// c => cell
// c => 单元格
if(name.equals("c")) {
// Print the cell reference
//打印单元格引用
System.out.print(attributes.getValue("r") + " - ");
// Figure out if the value is an index in the SST
// 判断该值是否是 SST 中的索引
String cellType = attributes.getValue("t");
if(cellType != null && cellType.equals("s")) {
nextIsString = true;
} else {
nextIsString = false;
}
}
// Clear contents cache
// 清除内容缓存
lastContents = "";
}
public void endElement(String uri, String localName, String name)
throws SAXException {
// Process the last contents as required.
// Do now, as characters() may be called more than once
// 根据需要处理最后的内容。现在做,因为 characters() 可能会被多次调用
if(nextIsString) {
int idx = Integer.parseInt(lastContents);
lastContents = sst.getItemAt(idx).getString();
nextIsString = false;
}
// v => contents of a cell
// Output after we've seen the string contents
// v => 单元格的内容
//在我们看到字符串内容后输出
if(name.equals("v")) {
System.out.println(lastContents);
}
}
public void characters(char[] ch, int start, int length) {
lastContents += new String(ch, start, length);
}
}
public static void main(String[] args) throws Exception {
ExampleEventUserModel example = new ExampleEventUserModel();
example.processOneSheet(args[0]);
example.processAllSheets(args[0]);
}
}

For a fuller example, including support for fetching number formatting information and applying it to numeric cells (eg to format dates or percentages), please see the XLSX2CSV example in svn(有关更完整的示例,包括支持获取数字格式信息并将其应用于数字单元格(例如格式化日期或百分比),请参阅 svn 中的 XLSX2CSV 示例)

An example is also provided showing how to combine the user API and the SAX API by doing a streaming parse of larger worksheets and a traditional user-model parse of the rest of a workbook.(还提供了一个示例,展示了如何通过对较大的工作表进行流式解析和对工作簿的其余部分进行传统的用户模型解析来组合用户 API 和 SAX API。)

SXSSF (Streaming Usermodel API)(SXSSF(流式Usermodel API))

SXSSF (package: org.apache.poi.xssf.streaming) is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk.(SXSSF(包:org.apache.poi.xssf.streaming)是 XSSF 的 API 兼容的流式扩展,在必须生成非常大的电子表格并且堆空间有限时使用。 SXSSF 通过限制对滑动窗口内行的访问来实现其低内存占用,而 XSSF 允许访问文档中的所有行。不在窗口中的旧行会变得不可访问,因为它们被写入了磁盘。)

You can specify the window size at workbook construction time via new SXSSFWorkbook(int windowSize) or you can set it per-sheet via SXSSFSheet#setRandomAccessWindowSize(int windowSize)(您可以通过 new SXSSFWorkbook(int windowSize) 在工作簿构建时指定窗口大小,也可以通过 SXSSFSheet#setRandomAccessWindowSize(int windowSize) 为每个工作表设置它)

When a new row is created via createRow() and the total number of unflushed records would exceed the specified window size, then the row with the lowest index value is flushed and cannot be accessed via getRow() anymore.(当通过 createRow() 创建新行并且未刷新的记录总数将超过指定的窗口大小时,索引值最低的行将被刷新并且不能再通过 getRow() 访问。)

The default window size is 100 and defined by SXSSFWorkbook.DEFAULT_WINDOW_SIZE.(默认窗口大小为 100,由 SXSSFWorkbook.DEFAULT_WINDOW_SIZE 定义。)

A windowSize of -1 indicates unlimited access. In this case all records that have not been flushed by a call to flushRows() are available for random access.(windowSize值为-1时表示无限制访问。在这种情况下,所有未通过调用 flushRows() 刷新的记录都可用于随机访问。)

Note that SXSSF allocates temporary files that you must always clean up explicitly, by calling the dispose method.(请注意,SXSSF会分配临时文件,您必须始终通过调用dispose方法来明确清理这些文件。)

SXSSFWorkbook defaults to using inline strings instead of a shared strings table. This is very efficient, since no document content needs to be kept in memory, but is also known to produce documents that are incompatible with some clients. With shared strings enabled all unique strings in the document has to be kept in memory. Depending on your document content this could use a lot more resources than with shared strings disabled.(SXSSFWorkbook 默认使用内联字符串而不是共享字符串表。这是非常高效的,因为不需要将文档内容保存在内存中,众所周知的事,它生成的文档与某些客户端不兼容。启用共享字符串后,文档中的所有唯一字符串都必须保存在内存中。与禁用共享字符串相比,根据您的文档内容,这可能会占用更多的资源。)

Please note that there are still things that still may consume a large amount of memory based on which features you are using, e.g. merged regions, hyperlinks, comments, ... are still only stored in memory and thus may require a lot of memory if used extensively.(请注意,根据您使用的功能,有一些东西可能会消耗大量内存,例如合并区域、超链接、注释……存储在内存中。因此如果要广泛使用,可能需要大量内存。)

Carefully review your memory budget and compatibility needs before deciding whether to enable shared strings or not.(在决定是否启用共享字符串之前,请仔细检查您的内存预算和兼容性需求。)

The example below writes a sheet with a window of 100 rows. When the row count reaches 101, the row with rownum=0 is flushed to disk and removed from memory, when rownum reaches 102 then the row with rownum=1 is flushed, etc.(下面的示例编写了一个具有 100 行窗口的工作表。当行数达到 101 时,rownum=0 的行被刷新到磁盘并从内存中删除,当 rownum 达到 102 时,rownum=1 的行被刷新,以此类推。)

mport junit.framework.Assert;
mport org.apache.poi.ss.usermodel.Cell;
mport org.apache.poi.ss.usermodel.Row;
mport org.apache.poi.ss.usermodel.Sheet;
mport org.apache.poi.ss.usermodel.Workbook;
mport org.apache.poi.ss.util.CellReference;
mport org.apache.poi.xssf.streaming.SXSSFWorkbook;
public static void main(String[] args) throws Throwable {
SXSSFWorkbook wb = new SXSSFWorkbook(100); // keep 100 rows in memory, exceeding rows will be flushed to disk(在内存中保留 100 行,超过的行将被刷新到磁盘)
Sheet sh = wb.createSheet();
for(int rownum = 0; rownum < 1000; rownum++){
Row row = sh.createRow(rownum);
for(int cellnum = 0; cellnum < 10; cellnum++){
Cell cell = row.createCell(cellnum);
String address = new CellReference(cell).formatAsString();
cell.setCellValue(address);
}
}
// Rows with rownum < 900 are flushed and not accessible
// rownum < 900 的行被刷新且不可访问
for(int rownum = 0; rownum < 900; rownum++){
Assert.assertNull(sh.getRow(rownum));
}
// ther last 100 rows are still in memory
// 最后 100 行仍在内存中
for(int rownum = 900; rownum < 1000; rownum++){
Assert.assertNotNull(sh.getRow(rownum));
}
FileOutputStream out = new FileOutputStream("/temp/sxssf.xlsx");
wb.write(out);
out.close();
// dispose of temporary files backing this workbook on disk
// 处理磁盘上支持此工作簿的临时文件
wb.dispose();
}

The next example turns off auto-flushing (windowSize=-1) and the code manually controls how portions of data are written to disk(下一个示例关闭自动刷新 (windowSize=-1),代码手动控制部分数据写入磁盘的方式)

mport org.apache.poi.ss.usermodel.Cell;
mport org.apache.poi.ss.usermodel.Row;
mport org.apache.poi.ss.usermodel.Sheet;
mport org.apache.poi.ss.usermodel.Workbook;
mport org.apache.poi.ss.util.CellReference;
mport org.apache.poi.xssf.streaming.SXSSFWorkbook;
public static void main(String[] args) throws Throwable {
SXSSFWorkbook wb = new SXSSFWorkbook(-1); // turn off auto-flushing and accumulate all rows in memory( 关闭自动刷新并在内存中累积所有行)
Sheet sh = wb.createSheet();
for(int rownum = 0; rownum < 1000; rownum++){
Row row = sh.createRow(rownum);
for(int cellnum = 0; cellnum < 10; cellnum++){
Cell cell = row.createCell(cellnum);
String address = new CellReference(cell).formatAsString();
cell.setCellValue(address);
}
// manually control how rows are flushed to disk
// 手动控制将行刷新到磁盘
if(rownum % 100 == 0) {
((SXSSFSheet)sh).flushRows(100); // retain 100 last rows and flush all others( 保留最后 100 行并刷新所有其他行)
// ((SXSSFSheet)sh).flushRows() is a shortcut for ((SXSSFSheet)sh).flushRows(0),
// this method flushes all rows
// ((SXSSFSheet)sh).flushRows() 是 ((SXSSFSheet)sh).flushRows(0) 的快捷方式,此方法刷新所有行
}
}
FileOutputStream out = new FileOutputStream("/temp/sxssf.xlsx");
wb.write(out);
out.close();
// dispose of temporary files backing this workbook on disk
// 处理磁盘上支持此工作簿的临时文件
wb.dispose();
}

SXSSF flushes sheet data in temporary files (a temp file per sheet) and the size of these temporary files can grow to a very large value. For example, for a 20 MB csv data the size of the temp xml becomes more than a gigabyte. If the size of the temp files is an issue, you can tell SXSSF to use gzip compression:(SXSSF 刷新临时文件中的工作表数据(每个工作表一个临时文件),这些临时文件的大小可以增长到一个非常大的值。例如,对于 20 MB 的 csv 数据,临时 xml 的大小变得超过 1 GB。如果临时文件的大小有问题,您可以告诉 SXSSF 使用 gzip 压缩:)

SXSSFWorkbook wb = new SXSSFWorkbook();
wb.setCompressTempFiles(true); // temp files will be gzipped( 临时文件将被压缩)

Low Level APIs(低级 API)

The low level API is not much to look at. It consists of lots of "Records" in the org.apache.poi.hssf.record.* package, and set of helper classes in org.apache.poi.hssf.model.*. The record classes are consistent with the low level binary structures inside a BIFF8 file (which is embedded in a POIFS file system). You probably need the book: "Microsoft Excel 97 Developer's Kit" from Microsoft Press in order to understand how these fit together (out of print but easily obtainable from Amazon's used books). In order to gain a good understanding of how to use the low level APIs should view the source in org.apache.poi.hssf.usermodel.* and the classes in org.apache.poi.hssf.model.*. You should read the documentation for the POIFS libraries as well.(低级API没什么可看的。它由org.apache.poi.hssf.record.*包中的许多“记录”以及org.apache.poi.hssf.model.*中的帮助程序类集组成。记录类与BIFF8文件(嵌入POIFS文件系统)中的低级二进制结构一致。您可能需要这本书:Microsoft Press的“ Microsoft Excel 97开发人员工具包”,以了解它们的组合方式(绝版但可从Amazon的二手书中获得)。为了更好地了解如何使用低级API,您应当查看org.apache.poi.hssf.usermodel.*中的源代码以及org.apache.poi.hssf.model.*中的类。您还应该阅读POIFS库的文档。)

Generating XLS from XML(从 XML 生成 XLS)

If you wish to generate an XLS file from some XML, it is possible to write your own XML processing code, then use the User API to write out the document.(如果您希望从一些 XML 生成 XLS 文件,可以编写自己的 XML 处理代码,然后使用User API 来写出文档。)

The other option is to use Cocoon. In Cocoon, there is the HSSF Serializer, which takes in XML (in the gnumeric format), and outputs an XLS file for you.(另一种选择是使用 Cocoon。在 Cocoon 中,有一个 HSSF Serializer,它接收 XML(以 gnumeric 格式),并为您输出一个 XLS 文件。)

HSSF Class/Test Application(HSSF 类/测试应用)

The HSSF application is nothing more than a test for the high level API (and indirectly the low level support). The main body of its code is repeated above. To run it:(HSSF 应用程序只不过是对高级 API 的测试(以及间接的低级支持)。上面重复了它的代码主体。要运行它:)

  • download the poi-alpha build and untar it (tar xvzf tarball.tar.gz)(下载 poi-alpha 构建并解压它 (tar xvzf tarball.tar.gz))
  • set up your classpath as follows: export HSSFDIR={wherever you put HSSF's jar files} export LOG4JDIR={wherever you put LOG4J's jar files} export CLASSPATH=$CLASSPATH:$HSSFDIR/hssf.jar:$HSSFDIR/poi-poifs.jar:$HSSFDIR/poi-util.jar:$LOG4JDIR/jog4j.jar(如下设置您的类路径: export HSSFDIR={无论您将 HSSF 的 jar 文件放在哪里} export LOG4JDIR={无论您将 LOG4J 的 jar 文件放在哪里} export CLASSPATH=$CLASSPATH:$HSSFDIR/hssf.jar:$HSSFDIR/poi-poifs.jar:$HSSFDIR/poi-util.jar:$LOG4JDIR/jog4j.jar)
  • type: java org.apache.poi.hssf.dev.HSSF ~/myxls.xls write(类型:java org.apache.poi.hssf.dev.HSSF ~/myxls.xls write)

This should generate a test sheet in your home directory called "myxls.xls".(这应当在您的主目录中生成一个名为“myxls.xls”的测试表。)

  • Type: java org.apache.poi.hssf.dev.HSSF ~/input.xls output.xls

    This is the read/write/modify test. It reads in the spreadsheet, modifies a cell, and writes it back out. Failing this test is not necessarily a bad thing. If HSSF tries to modify a non-existant sheet then this will most likely fail. No big deal.
    (类型:java org.apache.poi.hssf.dev.HSSF ~/input.xls output.xls
    这是读/写/修改测试。它读入电子表格,修改单元格,然后将其写回。没通过测试不一定是坏事。如果 HSSF尝试修改不存在的工作表,那么这很可能会失败。没什么大不了的。)

HSSF Developer's Tools(HSSF 开发者工具)

HSSF has a number of tools useful for developers to debug/develop stuff using HSSF (and more generally XLS files). We've already discussed the app for testing HSSF read/write/modify capabilities; now we'll talk a bit about BiffViewer. Early on in the development of HSSF, it was decided that knowing what was in a record, what was wrong with it, etc. was virtually impossible with the available tools. So we developed BiffViewer. You can find it at org.apache.poi.hssf.dev.BiffViewer. It performs two basic functions and a derivative.(HSSF具有许多对开发人员有用的工具,这些工具可用于使用HSSF(以及常见的XLS文件)调试/开发内容。我们已经讨论过用于测试HSSF读/写/修改功能的应用程序;现在我们来谈谈BiffViewer。在开发HSSF的早期,用现有的工具几乎不可能知道记录中的内容,出了什么问题等。因此,我们开发了BiffViewer。您可以在org.apache.poi.hssf.dev.BiffViewer中找到它。它拥有两个基本功能和一个派生物。)

The first is "biffview". To do this you run it (assumes you have everything setup in your classpath and that you know what you're doing enough to be thinking about this) with an xls file as a parameter. It will give you a listing of all understood records with their data and a list of not-yet-understood records with no data (because it doesn't know how to interpret them). This listing is useful for several things. First, you can look at the values and SEE what is wrong in quasi-English. Second, you can send the output to a file and compare it.(第一个是“biffview”。所以,您可以使用 xls 文件作为参数运行它(假设您已在类路径中设置了所有内容,并且您知道自己在做什么)。它将为您提供所有已了解的记录及其数据列表和尚未理解的无数据记录列表(因为它不知道如何解释它们)。这个清单对几件事很有用。首先,您可以查看这些值并查看准英语有什么问题。其次,您可以输出到文件并进行比较。)

The second function is "big freakin dump", just pass a file and a second argument matching "bfd" exactly. This will just make a big hexdump of the file.(第二个函数是“big freakin dump”,只需传递一个文件和一个与“bfd”完全匹配的参数。这将使文件变成一个大的hexdump。)

Lastly, there is "mixed" mode which does the same as regular biffview, only it includes hex dumps of certain records intertwined. To use that just pass a file with a second argument matching "on" exactly.(最后,有一种“混合”模式,它与常规的 biffview 相同,只是它包含某些交织在一起的记录的十六进制转储。要使用它,只需传递一个与“on”完全匹配的文件(第二个参数)。)

In the next release cycle we'll also have something called a FormulaViewer. The class is already there, but its not very useful yet. When it does something, we'll document it.(在下一个发布周期中,我们还将拥有一个叫做 FormulaViewer 的东西。该类已经存在,但还没什么卵用。等它干活时,我们会记录。)

What's Next?(下一步是什么?)

Further effort on HSSF is going to focus on the following major areas:(HSSF 的进一步研发将集中在以下主要领域:)

  • Performance: POI currently uses a lot of memory for large sheets.(性能:POI 目前对大型工作表使用大量内存。)
  • Charts: This is a hard problem, with very little documentation.(图表:这是一个难题,文档很少。)

So jump in! (点这里!)

by Andrew C. Oliver, Glen Stampoultzis, Nick Burch, Sergei Kozello(作者:Andrew C. Oliver、Glen Stampoultzis、Nick Burch、Sergei Kozello)

 
中英文 | 中文 | 英文