<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta content="Apache Forrest" name="Generator">
<meta name="Forrest-version" content="0.9">
<meta name="Forrest-skin-name" content="pelt">
<title>Apache POI&trade; - HWPF and XWPF - Java API to Handle Microsoft Word Files</title>
<link type="text/css" href="../../skin/basic.css" rel="stylesheet">
<link media="screen" type="text/css" href="../../skin/screen.css" rel="stylesheet">
<link media="print" type="text/css" href="../../skin/print.css" rel="stylesheet">
<link type="text/css" href="../../skin/profile.css" rel="stylesheet">
<script src="../../skin/getBlank.js" language="javascript" type="text/javascript"></script><script src="../../skin/getMenu.js" language="javascript" type="text/javascript"></script><script src="../../skin/fontsize.js" language="javascript" type="text/javascript"></script>
<link rel="shortcut icon" href="../../images/favicon.ico">
</head>
<body onload="init()">
<script type="text/javascript">ndeSetTextSize();</script>
<div id="top">
<!--+
|header
+-->
<div class="header">
<!--+
|start group logo
+-->
<div class="grouplogo">
<a href="https://www.apache.org"><img class="logoImage" alt="Apache Software Foundation" src="../../images/asflogo_horizontal_color.svg" title="The Apache Software Foundation is a cornerstone of the modern Open Source software ecosystem &ndash; supporting some of the most widely used and important software solutions powering today's Internet economy."></a>
</div>
<!--+
|end group logo
+-->
<!--+
|start Project Logo
+-->
<div class="projectlogo">
<a href="https://poi.apache.org"><img class="logoImage" alt="Apache POI" src="../../images/project-header.png" title="Apache POI is well-known in the Java field as a library for reading and writing Microsoft Office file formats, such as Excel, PowerPoint, Word, Visio, Publisher and Outlook. It supports both the older (OLE2) and new (OOXML - Office Open XML) formats."></a>
</div>
<!--+
|end Project Logo
+-->
</div>
</div>
<div id="main">
<div id="publishedStrip">
<!--+
|start Subtabs
+-->
<div id="level2tabs"></div>
<!--+
|end Endtabs
+-->
<script type="text/javascript"><!--
document.write("Last Published: " + document.lastModified);
// --></script>
</div>
<!--+
|start content
+-->
<div id="content">
<h1>Apache POI&trade; - HWPF and XWPF - Java API to Handle Microsoft Word Files</h1>
<h3>Overview</h3>
<div id="front-matter"></div>
<a name="Overview"></a>
<h2 class="boxed">Overview</h2>
<div class="section">
<p>HWPF is the name of our port of the Microsoft Word 97(-2007) file format
to pure Java. It also provides limited read-only support for the older
Word 6 and Word 95 file formats.</p>
<p>The partner to HWPF for the new Word 2007 .docx format is <em>XWPF</em>.
Whilst HWPF and XWPF provide similar features, there is not a common
interface across the two of them at this time.</p>
<p>Both HWPF and XWPF could be described as "moderately functional". For some
use cases, especially around text extraction, support is very strong. For
others, support may be limited or incomplete, and it may be necessary to
dig down into low-level code. Error checking may be missing in places,
so it may be possible to accidentally generate invalid files. Enhancements
to fix such things are generally very well received!</p>
<p>As detailed in the <a href="../../components/">Components
Page</a>, HWPF is contained within the poi-scratchpad-XXX.jar, while XWPF
is in the poi-ooxml-XXX.jar. You will need to ensure you include the appropriate
jars (and their dependencies!) in your classpath to use HWPF or XWPF.</p>
<p>Please note that in version 3.12, due to a bug, you might need to include
poi-scratchpad-XXX.jar when using XWPF. This has been fixed for the next
release, as there should be no such dependency.</p>
</div>
<a name="An+overview+of+the+code"></a>
<h2 class="boxed">An overview of the code</h2>
<div class="section">
<p>
Source in the <em>org.apache.poi.hwpf.model</em> tree is the Java representation of the
internal Word format structure. This code is "internal" and should not
be used by your code. The <em>org.apache.poi.hwpf.usermodel</em>
package is the actual public, user-friendly (as far as possible) API for accessing document
parts. Source code in the
<em>org.apache.poi.hwpf.extractor</em>
tree is a wrapper around this to facilitate easy extraction of interesting things (e.g. the text),
and the
<em>org.apache.poi.hwpf.converter</em>
package contains Word-to-HTML and Word-to-FO converters (the latter can be used to generate PDF
from Word files when used with
<a href="https://xmlgraphics.apache.org/fop/">Apache FOP</a>
). There is also a small file-structure-dumping utility in the
<em>org.apache.poi.hwpf.dev</em>
package, primarily for development purposes.
</p>
<p>
The main entry point to HWPF is HWPFDocument. Currently it has a lot of references both to
internal interfaces (the
<em>org.apache.poi.hwpf.model</em>
package) and to the public API (the
<em>org.apache.poi.hwpf.usermodel</em>
package). It is possible that it will be split into two different interfaces (like WordFile
and WordDocument) in later versions.
</p>
<p>
The main entry point to XWPF is XWPFDocument. From there, you can get the
paragraphs, pictures, tables, sections, headers etc.
</p>
<p>
Currently, there are only a handful of example programs using HWPF and XWPF
available. They can be found in the source repository in the examples section, under
<a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hwpf">HWPF</a>
and
<a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xwpf">XWPF</a>.
Both HWPF and XWPF have fairly high levels of unit test coverage, and the tests
provide examples of using the various areas of functionality of both
modules. These can be found in the source repository, under
<a href="https://github.com/apache/poi/tree/trunk/poi-scratchpad/src/test/java/org/apache/poi/hwpf">HWPF</a>
and
<a href="https://github.com/apache/poi/tree/trunk/poi-ooxml/src/test/java/org/apache/poi/xwpf">XWPF</a>.
Contributions of more examples, whether inspired by the unit tests or
not, would be most welcome!
</p>
</div>
<a name="HWPF+Notes"></a>
<h2 class="boxed">HWPF Notes</h2>
<div class="section">
<p>A .doc Word document, as handled by HWPF, can be considered a single very long
text buffer. The HWPF API provides "pointers"
to document parts, such as sections, paragraphs and character runs. Typically you iterate
over the sections of the main document part, the paragraphs of each section and the character runs of each
paragraph. Each such interface is a pointer to a subrange of the document text, along with additional
properties (and they all extend the same Range parent class). There are additional Range
implementations such as Table, TableRow, TableCell, etc. Some structures, such as Bookmark or Field,
can also provide subrange pointers.
</p>
<p>Changing file content usually requires many synchronized changes in those structures, such as
updating property boundaries, position handlers, etc. Because of that, the HWPF API should be
considered not thread-safe. In addition, there is a "one pointer" rule for changing
content: you should not use two different Range instances at the same time. More
precisely, if you change file content through one range pointer, all other range
pointers except its parents become invalid. For example, if you obtain the overall range (1),
a paragraph range (2) from the overall range and a character run range (3) from the paragraph range, and then
change the text of the paragraph, the character run range is now invalid and should not be used, but
the overall range pointer is still valid. Each time you obtain a range (pointer), a new instance is
created. So if you obtain two range pointers and change the document text using the first
one, the second becomes invalid.
</p>
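<p>The "one pointer" rule can be illustrated with a toy model. The sketch below is not HWPF code:
<em>ToyRange</em> and its methods are hypothetical names invented for this illustration, and it assumes
only that a range is a pair of offsets into one shared text buffer, as described above. Under that
assumption, it shows why an edit made through one range leaves its parents usable while a range
obtained earlier goes stale.</p>

```java
// Toy model only, NOT the HWPF API: every name here is hypothetical.
// A range is modelled as (start, end) offsets into one shared text buffer.
class OnePointerRuleSketch {

    static class ToyRange {
        final StringBuilder buffer; // the single shared document text
        final ToyRange parent;      // null for the overall range
        final int start;
        int end;

        ToyRange(StringBuilder buffer, ToyRange parent, int start, int end) {
            this.buffer = buffer;
            this.parent = parent;
            this.start = start;
            this.end = end;
        }

        String text() {
            return buffer.substring(start, end);
        }

        // Inserts text at the start of this range, then widens this range
        // and all of its parents. Ranges obtained earlier from this one
        // are not told about the shift, so their offsets go stale.
        void insertBefore(String inserted) {
            buffer.insert(start, inserted);
            for (ToyRange r = this; r != null; r = r.parent) {
                r.end += inserted.length();
            }
        }
    }

    // Returns { overall text, stale run text, re-obtained run text }.
    static String[] demo() {
        StringBuilder buffer = new StringBuilder("Hello world");
        ToyRange overall = new ToyRange(buffer, null, 0, 11);      // (1)
        ToyRange paragraph = new ToyRange(buffer, overall, 0, 11); // (2)
        ToyRange run = new ToyRange(buffer, paragraph, 6, 11);     // (3) "world"

        paragraph.insertBefore("Oh, "); // edit through pointer (2)

        // Re-obtain the run from the still-valid paragraph instead of reusing (3).
        ToyRange freshRun = new ToyRange(buffer, paragraph, paragraph.end - 5, paragraph.end);
        return new String[] { overall.text(), run.text(), freshRun.text() };
    }

    public static void main(String[] args) {
        String[] result = demo();
        System.out.println("overall:   " + result[0]); // Oh, Hello world
        System.out.println("stale run: " + result[1]); // llo w
        System.out.println("fresh run: " + result[2]); // world
    }
}
```

<p>In real HWPF code the corresponding habit is to ask the document, or a still-valid parent range, for a
fresh range after each modification, rather than holding on to ranges obtained before the change.</p>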
</div>
<a name="XWPF+Patches+Required%21"></a>
<h2 class="boxed">XWPF Patches Required!</h2>
<div class="section">
<p>At the moment, XWPF covers many common use cases for reading and writing
.docx files. Whilst this is a great thing, it does mean that XWPF does
everything that the current POI committers need it to do, and so none of
the committers are actively adding new features.</p>
<p>If you need a feature that XWPF doesn't currently
provide, please do send in a patch to add the extra functionality! More details
on contributing patches are available on the <a href="../../devel/guidelines.html">"Contribution to POI" page</a>.</p>
</div>
<a name="HWPF+Patches+Required%21"></a>
<h2 class="boxed">HWPF Patches Required!</h2>
<div class="section">
<p>At the moment we unfortunately do not have someone taking care of HWPF
and fostering its development. What we need is someone to stand up, take
this thing under their wing and push it forward. Ryan Ackley,
who put a lot of effort into HWPF, is no longer on board, so HWPF is an
orphan child waiting to be adopted.</p>
<p>If <strong>you</strong> are interested in becoming the new HWPF
pointman, you should look into the Microsoft Word internals. A good
starting point seems to be Ryan Ackley's <a href="../../components/document/docoverview.html">overview</a>. An introduction to the binary
file formats is <a href="https://msdn.microsoft.com/en-us/library/cc998577%28v=office.12%29.aspx">available
from Microsoft</a>, which has some good references and links. After that,
the full details on the Word format are available from
<a href="https://msdn.microsoft.com/en-us/library/cc313153%28v=office.12%29.aspx">Microsoft</a>,
but the documentation can be a little hard to get into at first... Try reading the
<a href="../../components/document/docoverview.html">overview</a> first, and looking at the existing
code, then finally look up the documentation for specific missing features.</p>
<p>As a first step you should familiarize yourself with the source code,
examples, test cases, and the HWPF patches available at <a href="https://issues.apache.org/">Bugzilla</a> (if any). Then you
should compile an overview of</p>
<ul>
<li>the current HWPF status,</li>
<li>the patches in <a href="https://issues.apache.org/bugzilla/">Bugzilla</a> to be checked
in (and those that would be better ditched),</li>
<li>the available test cases and the test cases still to be written,</li>
<li>the available documentation and the docs to be written,</li>
<li>anything else that seems reasonable</li>
</ul>
<p>When you start coding, you will not yet have write access to the
SVN repository. Please submit your patches to <a href="https://issues.apache.org/">Bugzilla</a> and nag <a href="mailto:dev.at.poi.apache.org">the dev list</a> until someone commits
them. Besides the actual checking in of HWPF patches, current POI
committers will also do some minor reviews now and then of your source code
patches, test cases and documentation to help ensure software quality. But
most of the time you will be on your own. However, anyone offering useful
contributions over a period of time will be offered committership!</p>
<p>Please do not forget to write <a href="https://www.junit.org/">JUnit</a> test cases and documentation!
We won't accept code that doesn't come with test cases. And please
consider that other contributors should be able to understand your source
code easily. If you need any help getting started with JUnit test cases
for HWPF, please ask on the developers' mailing list! If you show that you
are prepared to stick at it you will most likely be given SVN commit
access. See <a href="../../devel/guidelines.html">"Contribution to POI" page</a>
for more details and help getting started.</p>
<p>Of course we will help you as best we can. However, presently there
is no committer who is really familiar with the Word format, so you'll be
mostly on your own. We are looking forward to you and your contributions!
The honor and glory of becoming a POI committer await!</p>
</div>
<p align="right">
<font size="-2">by&nbsp;Nicola Ken Barozzi,&nbsp;Andrew C. Oliver,&nbsp;Ryan Ackley,&nbsp;Rainer Klute</font>
</p>
</div>
<!--+
|end content
+-->
<div class="clearboth">&nbsp;</div>
</div>
<div id="footer">
<!--+
|start bottomstrip
+-->
<div class="copyright">
Copyright &copy;
2001-2026 <a href="https://www.apache.org/">The Apache Software Foundation</a>
<br>
Apache POI, POI, Apache, the Apache logo, and the Apache
POI project logo are trademarks of The Apache Software Foundation.
</div>
<div id="logos">
<a href="https://validator.w3.org/check/referer"><img style="height: 31px; width: 88px;" title="Valid HTML 4.01!" alt="Valid HTML 4.01!" src="../../skin/images/valid-html401.png" class="logoImage"></a><a href="https://jigsaw.w3.org/css-validator/check/referer"><img style="height: 31px; width: 88px;" title="Valid CSS!" alt="Valid CSS!" src="../../skin/images/vcss.png" class="logoImage"></a>
</div>
<div id="feedback">
Send feedback about the website to:
<a id="feedbackto" href="mailto:dev@poi.apache.org?subject=Feedback%C2%A0components/document/index.html">dev@poi.apache.org</a>
</div>
<!--+
|end bottomstrip
+-->
</div>
</body>
</html>