Frequently Asked Questions

What is the difference between the normal and extended edition?

The extended edition of both the PDF Library and the Report Generator offers three additional features:

  • The ability to load and edit an existing PDF document as a template
  • The ability to digitally sign a document
  • Access to the interactive Form or "AcroForm" in the document

Can I convert HTML pages to PDF with the Report Generator?

Not directly. HTML comes in lots of different flavours, whereas the Report Generator uses its own XML (similar to XHTML, but with a few extensions and exceptions specific to PDF). You cannot parse arbitrary HTML with the Report Generator - it will require some transformation, e.g. the top level tag is <pdf> not <html>. The "SampleTransformer.java" example in the download package shows one way to convert these tags, and the userguide has a useful section on "Migrating from HTML". The Report Generator conforms to the CSS2 specification, so if your HTML documents use CSS2 style sheets to separate content from presentation, the transformation will be simpler. The Tidy package, which converts HTML to XHTML, may help your conversion.

Can I convert MS Office documents (Word and Excel) to PDF?

No. Microsoft Office documents are saved in a proprietary format which we cannot parse.

Why do I get the message "Cannot connect to X11 server"?

This is only an issue with the PDF library viewer or the Graph library on UNIX when rendering to bitmap images, like PNG or GIF, and is a frustrating aspect of Java on UNIX - the java.awt.* classes need an X11 system to connect to. You have three options.

  1. Upgrade to Java 1.4 and pass the parameter "-Djava.awt.headless=true" to java when you run it - eg java -Djava.awt.headless=true PDFtoTIFF file.pdf. This is the best option, although it does require the X11 packages to be installed (even if X11 isn't running).
  2. Install and run "Xvfb" - the X virtual frame buffer. This is available for Linux and Solaris (and probably others), and acts as a "virtual" X server, without requiring a monitor to be attached. Also check the "xhost" command.
  3. Try using the Eteks pure java AWT classes at http://www.eteks.com/pja/en. We know of one user who got it working, but we haven't tried it ourselves.

N.B. Unlike the Eteks classes, the X11 libraries are native libraries, not Java libraries, so you wouldn't include them in the classpath. They would typically be installed in /usr/lib, which requires root permission.
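For option 1, the same property can also be set from code rather than on the command line. The sketch below uses only the standard java.awt classes; the class and method names (HeadlessRendering, enableHeadless, drawSquare) are illustrative and not part of our API. It only takes effect if it runs before anything else touches AWT, and it draws into an in-memory BufferedImage, which needs no display.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.GraphicsEnvironment;
import java.awt.image.BufferedImage;

final class HeadlessRendering {
    // Equivalent to passing -Djava.awt.headless=true to java. It only works if
    // called before any AWT class has queried the graphics environment, so do it
    // first thing in main() or in a static initializer.
    static boolean enableHeadless() {
        System.setProperty("java.awt.headless", "true");
        return GraphicsEnvironment.isHeadless();
    }

    // Off-screen drawing into a BufferedImage: no X11 connection is required.
    static BufferedImage drawSquare(int size) {
        BufferedImage img = new BufferedImage(size, size, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, size, size);
            g.setColor(Color.RED);
            g.fillRect(size / 4, size / 4, size / 2, size / 2);
        } finally {
            g.dispose();
        }
        return img;
    }
}
```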

The Report Generator outputs to PDF rather than a bitmap, so you won't need any form of "windows" running at all to produce graphs, regardless of operating system.

Why is a blank screen shown in Internet Explorer when a PDF document is requested from a Servlet or JSP?

This may occur when IE fails to open your PDF viewer, because this particular browser overrides the MIME type of the response with its own guess, based on the suffix of the URL. More information regarding this "feature" of Internet Explorer can be found on the MSDN website.

We have found the easiest way around this is to append a harmless "?.pdf" or a "&.pdf" to the end of the request string, e.g. http://bfo.co.uk/products/report/filteredexamples/date.jsp?.pdf
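If you're generating these links in Java, a small helper can add the suffix, choosing "?" or "&" depending on whether the URL already has a query string. This is only a sketch; PdfUrls and withPdfSuffix are hypothetical names, and it assumes the URL has no "#fragment" part.

```java
final class PdfUrls {
    // Appends a harmless ".pdf" parameter so that IE's guess, based on the URL
    // suffix, agrees with the PDF content type. Assumes no "#fragment" in the URL.
    static String withPdfSuffix(String url) {
        if (url.endsWith(".pdf")) {
            return url; // already ends in .pdf, nothing to do
        }
        return url + (url.indexOf('?') >= 0 ? "&.pdf" : "?.pdf");
    }
}
```

For example, withPdfSuffix("http://bfo.co.uk/products/report/filteredexamples/date.jsp") produces the URL shown above.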

How can I generate PDFs from ASP pages?

Java and ASP are not the most natural of bedfellows, but thanks to the platform independence of XML you can integrate the Report Generator with your Microsoft web application.

The basic elements of an application that uses ASP to generate PDF documents are as follows.

  1. The ASP Server - Generates XML for the Report Generator.
  2. The Java Application Server - Runs the BFO Report Generator parsing the ASP produced XML and generating the PDF.

To set up a test harness, probably the best free Java Application Server is Tomcat. This will run on a Windows box and comes with good documentation to get you started. Setting up the Report Generator to run in Tomcat is a simple task, and there are complete instructions on how to do this at the start of our userguide. Once the Report Generator is installed you will need to set up a PDFProxyServlet - one example, called "SampleServlet.java", can be found in your Report Generator download. If you are new to Servlets you may need some help with a few definitions, but your server documentation should have information on how to get them up and running.

So how does it work? In your application, all requests for PDF documents should be made via the Proxy Servlet running on your Java Application Server. Your servlet will need to return the URL of the ASP page that produces the XML, which the Proxy Servlet will then convert to a PDF and return to the client.

For more information our userguide has a complete explanation of the PDFProxyServlet method.

Why does my table not continue onto the next page?

With the Report Generator there are specific rules for where a page break can occur. Page 17 of the userguide has some useful information regarding this, but the most basic rule to bear in mind is that only the following tags will split if they are spread across multiple pages.

  • <TABLE>
  • <UL>
  • <P>
  • <PRE>
  • <OL>
  • <H1>..<H4>
  • <BLOCKQUOTE>

Automatic pagination will not occur inside a <td> tag. A <table> nested inside a <td> will be cut off at the bottom if it spreads across multiple pages.

Does the PDF Library support "Web Ready" or "Linearised" PDFs?

Linearisation is Adobe's method of constructing a PDF so that it appears to load faster in a Web browser. This is achieved by showing the first few pages of the document whilst the rest of the document is loaded in the background. The PDF Library and Report Generator can both read and write Linearized PDFs.

Why do I receive a "(0xd): Skipping unknown character" warning message?

The message is telling you that the Unicode character 0x0D (hex) cannot be found in the font you have selected to draw it with. In this case it is because 0x0D is a carriage return, an invisible control character with no glyph to draw.
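If the text comes from a source with Windows-style line endings, one option is to clean it before drawing it. The helper below is a hypothetical sketch (TextCleaner is not part of the library): it turns CR LF and lone CR into LF and drops the remaining C0 control characters, on the assumption that your own code handles the LF line breaks.

```java
final class TextCleaner {
    // Normalize CR LF and lone CR to LF, then drop other C0 control characters
    // (below 0x20) except TAB and LF, since fonts have no glyphs for them.
    static String stripControlChars(String s) {
        String normalized = s.replace("\r\n", "\n").replace('\r', '\n');
        StringBuilder sb = new StringBuilder(normalized.length());
        for (int i = 0; i < normalized.length(); i++) {
            char c = normalized.charAt(i);
            if (c >= 0x20 || c == '\n' || c == '\t') {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```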

Why is the document a different size when I print it?

The PDF Library and Report Generator will create pages and draw elements to the size you specify, but we can't control how the reader will print them. To ensure a PDF is printed at the correct size, check that the "Shrink oversized pages to paper size" and "Stretch undersized pages to paper size" checkboxes are unchecked in Acrobat's Print dialogue.

Can we extract text or images from an existing PDF?

Yes, since release 2.6.2 of the PDF library this is possible. See the ExtractText.java example in the PDF library package to see how.

Can we create PDF files with Chinese, Japanese and Korean characters?

Yes, the PDF Library and Report Generator support Chinese, Japanese and Korean characters. The best place to start is in the examples that come with the download, e.g. "example/samples/HelloWorld-chinese.xml", and in the userguide where there is a section devoted to Internationalization.

For more details on CJK font support please refer to the StandardCJKFont class in the API docs. When using these standard CJK fonts the required fonts need to be installed on the client machine - Adobe supplies them as the Asian font packs for Acrobat.

Is the PDF Library "Thread Safe"?

The PDF Library is thread safe in as much as you can have two separate threads manipulating two separate documents and they will not interfere with each other. If you have two separate threads manipulating the same document you are likely to run into problems.
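The safest arrangement is one document per thread. If threads really must share a document, one option is to serialize every access to it. The sketch below shows that pattern with plain java.util.concurrent; the Document class is a hypothetical stand-in for a PDF object, not the library's class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class SharedDocumentDemo {
    // Hypothetical stand-in for a document that is NOT safe for unsynchronized
    // use by several threads (ArrayList is not thread safe either).
    static final class Document {
        private final List<String> pages = new ArrayList<>();
        void addPage(String name) { pages.add(name); }
        int pageCount() { return pages.size(); }
    }

    // Several threads add pages to ONE document; every access locks the document.
    static int addPagesConcurrently(int threads, int pagesPerThread) throws InterruptedException {
        Document doc = new Document();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.submit(() -> {
                for (int p = 0; p < pagesPerThread; p++) {
                    synchronized (doc) {
                        doc.addPage("thread " + id + " page " + p);
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        synchronized (doc) {
            return doc.pageCount();
        }
    }
}
```

Locking serializes the work, so you gain little speed from the extra threads; that is why separate documents per thread are preferable where possible.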

Why doesn't my <jsp:include> work?

There may be a couple of reasons for this. If you are trying to include a .PDF file, this will not work because a PDF is binary content, but we expect a JSP included in this way to produce text content (i.e. XML that the Report Generator can understand). You can include another PDF file in your document, but this is done in the Report Generator by using the "background-pdf" attribute in the body tag.

If you are using the PDFFilter method, the content type of an HTTP response must be text/xml for it to be parsed. We have noted in some servers (e.g. early versions of Tomcat 4.x) that an inner JSP may erroneously override the Content-type set in the outer JSP - causing it to be skipped by the PDFFilter. If in doubt, ensure both the outer and the included JSP set the content type to text/xml.

Will the Report Generator work with JSPs that use custom tags or tag-libs?

Yes, as long as those tags are transformed into tags the Report Generator can parse. In the case of tag libraries, these tags will be resolved in the JSP compilation stage, well before the Report Generator takes over.

Will the Report Generator work with the Jakarta Struts framework?

Yes - we have had no reports to the contrary. See the question above.

Why does the Report Generator throw an error when I use &, < or > in my XML?

These characters are used for markup in XML documents (< starts a tag and & starts an entity reference), and cannot be used unescaped as they often are in HTML. If you need to use these characters, either wrap the text in a CDATA block, or use the entities &amp; for "&", &lt; for "<" and &gt; for ">".
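When generating XML from code, a pair of helpers like the ones below covers both options. This is a sketch using plain string replacement (XmlText is a hypothetical name); for attribute values you would also need to escape the double quote as &quot;.

```java
final class XmlText {
    // Replace the characters that have special meaning in XML markup.
    // '&' must be replaced first, or the '&' in "&lt;" would be escaped again.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wrap text in a CDATA section. A literal "]]>" would end the section early,
    // so it is split across two adjacent CDATA sections.
    static String cdata(String s) {
        return "<![CDATA[" + s.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }
}
```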

How do I set headers and footers for specific pages?

The Report Generator userguide has a section devoted to headers and footers. In version 1.1.x of the Report Generator the options for setting headers and footers are:

  • Setting them for specific pages using the page selector in the style sheet.
  • Setting them in the <body> tag and <pbr> tags, in which case the header and footer will be changed from that point forward.
  • Setting them for odd and even pages using the CSS2 right and left selectors for the body tag in the style sheet (see the "examples/dynamic/bigtable.jsp" example in the download package).

At the moment we do not have a facility for explicitly assigning a footer or header to the last page in a document when the number of pages is unknown.

How do I force a page to begin on an odd or even page?

You can set this in the <pbr> tag - eg. <pbr page-break-before="odd">, which means only do a page break if the next page is odd. You can also use <pbr page-break-before="even">, which means only do a page break if the next page is even.

I am having problems with ligatures in words like "Office" - characters are being drawn on top of each other.

The font you are using may have a zero width for the ligature 'fi'. You have a few options:

  1. See if you can download a later version of the font you are using.
  2. Insert a zero width non-joiner (U+200C) between the f and the i in any occurrence of "fi".
  3. Use the "suppress-ligatures" attribute in the <p> tag (see the Report Generator tag guide).
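Option 2 can be done programmatically before the text is passed to the PDF Library. A minimal sketch, assuming the only ligature you need to break is "fi" (Ligatures and breakFiLigatures are hypothetical names):

```java
final class Ligatures {
    // U+200C ZERO WIDTH NON-JOINER: asks the renderer not to join the
    // characters on either side of it into a ligature.
    static final char ZWNJ = '\u200C';

    // Insert a ZWNJ between every "f" and "i" pair.
    static String breakFiLigatures(String s) {
        return s.replace("fi", "f" + ZWNJ + "i");
    }
}
```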

Is it possible to put a line-break in the labels of a graph?

In the Report Generator, using &#10; will work for line breaks in labels.

For the Graph Library you can put a '\n' in the label text if you're using JSP code to create the label dynamically:

<bfg:label distance="40"><%= value1 + "\n" + value2 %></bfg:label>

or just hit Enter and put the line break into the XML explicitly:

<bfg:label distance="20">On
Multiple
Lines</bfg:label>
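The &#10; form matters when the label ends up in an XML attribute: a conforming XML parser normalizes a literal newline in an attribute value to a space, whereas the character reference &#10; survives as a real line feed. The sketch below (LabelText is a hypothetical helper, not part of the Graph Library) encodes a label for use in a double-quoted attribute.

```java
final class LabelText {
    // Encode a multi-line label for a double-quoted XML attribute. A literal
    // newline would be normalized to a space by the parser, so write &#10;.
    static String toAttributeValue(String label) {
        StringBuilder sb = new StringBuilder();
        for (char c : label.toCharArray()) {
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\n': sb.append("&#10;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```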

Does it work over HTTPS?

Yes, PDFs and graphs can be requested and returned via HTTPS. When using the Report Generator to create PDFs, you may find a CertificateException is thrown. This can occur whenever a Java process opens an SSL connection as a client, and means that the certificate presented by the webserver isn't trusted by the client (your Java process). This even happens when the client and server processes are the same Application Server. Although it's not specific to the Report Generator, you'll often see it when you reference images or stylesheets in your XML with relative URLs. There are three ways around it, in decreasing order of preference.

  • Change the base URL of the Report Generator using the META tag in the document head. Using this you can set the base URL of the report, eg. <meta name="base" value="file:C:/webapp/report.jsp"/>. This has the effect of making a relative URL like <img src="images/myimage.jpg"> resolve to the URL file:C:/webapp/images/myimage.jpg - ie. the image is loaded from the filesystem, not through the webserver, and no certificate exception occurs. It's a bit faster too, but obviously will only work for resources that are files.
  • Add the certificate of your webserver to the "cacerts" file supplied with your installation of Java - this is the list of certificates that are considered trusted by Java. This is done using the keytool application.
  • Turn off certificate validation altogether - we're assuming that as you're already using SSL you know the risks this entails. Calling the init method of this class will do the trick (note this code is completely unsupported - use at your own risk):
    import java.security.cert.*;
    import javax.net.ssl.*;
    
    // Trusts every certificate and every hostname - this disables the protection SSL provides.
    public class EasyConnect implements X509TrustManager, HostnameVerifier {
        // Install this class as the default for every HttpsURLConnection in the JVM
        public final static void init() throws Exception {
            EasyConnect e = new EasyConnect();
            SSLContext sc = SSLContext.getInstance("SSL");
            sc.init(null, new TrustManager[] { e }, null);
            HttpsURLConnection.setDefaultSSLSocketFactory(sc.getSocketFactory());
            HttpsURLConnection.setDefaultHostnameVerifier(e);
        }
        // X509TrustManager: accept any certificate chain without checking it
        public void checkClientTrusted(X509Certificate[] chain, String auth) { }
        public void checkServerTrusted(X509Certificate[] chain, String auth) { }
        // The interface requires a non-null (possibly empty) array here
        public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
        // HostnameVerifier: accept any hostname, even if it doesn't match the certificate
        public boolean verify(String urlHostname, SSLSession session) { return true; }
    }

How can I optimize the Report Generator to make response time as fast as possible?

We've optimised the Report Generator as best we can and we are always exploring ideas to speed things up even more - so the best place to start is to make sure you have the latest and greatest version. Having said that, here are some ideas.

XML Tag density - More tags means more to parse and hence more time. If you can reduce the number of tags in your document with some concise and clever markup this will always help. Use a good style sheet rather than attributes or inline styles. It is best to avoid relying on nested tables and spacer GIFs for controlling layout, instead using "colspan", "rowspan", "margin" and "padding" attributes to achieve the same effect. Using smaller tables with "table-layout=fixed" will save the Report Generator from reading the whole table to determine the width of the columns.

External Resources - A document that uses external resources such as fonts, images or other PDFs will always take longer to create than one without. External resources are cached within the document, but are not cached across multiple documents or requests.

Application design - Another good place to look is to ask some questions about your application use cases. Do all documents need to be created from scratch? Are 30 users viewing more or less the same document with slight changes? Could I use a cached template? Could I write the document once, store it on disk or database and serve it up?
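As a sketch of the last idea, a small in-memory cache can hold the finished PDF bytes so each distinct document is rendered only once. RenderedDocumentCache and the renderer function are hypothetical: the renderer stands in for whatever code runs the Report Generator, and the cache key is assumed to identify the document's content completely.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class RenderedDocumentCache {
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    private final Function<String, byte[]> renderer;
    final AtomicInteger renderCount = new AtomicInteger(); // for monitoring only

    // "renderer" builds the PDF bytes for a key; it is only called on a cache miss.
    RenderedDocumentCache(Function<String, byte[]> renderer) {
        this.renderer = renderer;
    }

    byte[] get(String key) {
        return cache.computeIfAbsent(key, k -> {
            renderCount.incrementAndGet();
            return renderer.apply(k);
        });
    }

    // Drop a cached document when the data behind it changes.
    void invalidate(String key) {
        cache.remove(key);
    }
}
```

Callers share the cached array, so they should treat it as read-only; a real deployment would also need a size limit or an expiry policy.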

Does the PDF Library support a news-paper or column type layout?

Not directly, but it can be done. Using the LayoutBox class you could first create a LayoutBox the width of one column. Fill this with your text, pictures etc., and then call the LayoutBox.split() method to split the LayoutBox into column-length chunks. Then just draw each chunk side by side on the page. A simple example illustrating this is included with the download package as "ColumnLayout.java".
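The splitting step is essentially a greedy packing of lines into column-height chunks. The sketch below models that idea without the LayoutBox API at all: each line is represented only by its height in points, and ColumnSplitter is a hypothetical name. ColumnLayout.java remains the place to look for the real LayoutBox-based version.

```java
import java.util.ArrayList;
import java.util.List;

final class ColumnSplitter {
    // Greedily pack consecutive line heights into columns no taller than
    // columnHeight. A line taller than a whole column gets a column to itself.
    static List<List<Float>> split(List<Float> lineHeights, float columnHeight) {
        List<List<Float>> columns = new ArrayList<>();
        List<Float> current = new ArrayList<>();
        float used = 0;
        for (float h : lineHeights) {
            if (!current.isEmpty() && used + h > columnHeight) {
                columns.add(current);          // this column is full
                current = new ArrayList<>();
                used = 0;
            }
            current.add(h);
            used += h;
        }
        if (!current.isEmpty()) {
            columns.add(current);
        }
        return columns;
    }

    // Left edge of column "index" when columns are drawn side by side with a gutter.
    static float columnX(int index, float left, float columnWidth, float gutter) {
        return left + index * (columnWidth + gutter);
    }
}
```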

When I copy or add pages from an existing document to a new document, form fields are not copied.

In version 2.0, if you want to copy a form field annotation from one document to another you need to move the FormElement associated with it separately. Have a look at section 2 in the PDF userguide, which has all the information you need.

"WARNING (PG1): Annotation 1/152 on page 1 is part of another PDF's form - removing" - what does this mean?

See Appendix A in the PDF userguide for a list of warning messages. Also refer to the question above.

After setting the origin of a page, why when I add a form field does it not appear where I expect?

Annotations (such as Form Fields) use absolute co-ordinates for positioning on the page, starting at (0,0) in the bottom left corner. They are not affected by calls to the setUnits method.
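The arithmetic for placing an annotation is straightforward once you know your own units. A sketch, assuming your layout positions are in millimetres measured from the top-left corner (PageCoordinates is a hypothetical helper); PDF points are 1/72 inch, measured from the bottom-left corner.

```java
final class PageCoordinates {
    static final float POINTS_PER_MM = 72f / 25.4f; // 1 inch = 25.4mm = 72 points

    // Convert a position in millimetres from the TOP-left corner into points
    // from the BOTTOM-left corner, as used by annotations.
    static float[] mmFromTopLeftToPoints(float xMm, float yMm, float pageHeightPt) {
        float x = xMm * POINTS_PER_MM;
        float y = pageHeightPt - yMm * POINTS_PER_MM;
        return new float[] { x, y };
    }
}
```

For an A4 page (roughly 595 x 842 points), a point 25.4mm in and 25.4mm down from the top-left corner lands at approximately (72, 770).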

We want to create really big PDF documents - will it cope?

The biggest document we know of created with the library was 7000 pages or so, although we haven't reproduced this ourselves as we don't have enough memory here!

The PDF specification allows for documents of up to 10GB, but because the PDF Library (and consequently the Report Generator) is designed to hold most of the document in RAM, you're not going to get anywhere close to this. Using a Cache will help, particularly if your document consists of large streams (usually bitmap images). If you're trying to save memory, use low-resolution images and the built-in fonts rather than embedding fonts and, with the Report Generator, try to limit the number of tags by making use of the padding/margin attributes instead of nesting tables. Don't forget you can increase the heap size if necessary by passing arguments to the "java" command.

Also be sure to keep up to date with revisions - we release often, and making our products both smaller and faster is high on our priority list.

How can I prevent a PDF from being saved to disk?

A PDF is simply a file, so there's really no way to do this. Even if there were some option that could be set in the document to signal Acrobat (and there isn't), the user could simply right click on the link and select "Save As", or extract the document from the cache. You can make it a little harder for them by having it returned from an HTML POST perhaps, but there is no 100% effective way to do this (and this applies to any type of file returned from a webserver, not just PDFs).

I get the message "mmiVerifyTpAndGetWorkSize: stack_height=2 should be zero" in my logs.

This message is printed by some implementations of IBM's JIT compiler, such as that supplied with WebSphere 5. We have no idea what it means, but it doesn't seem to make any difference - the program still runs correctly, so as far as we can tell you can safely ignore it. Update - May 2004: IBM have listed this as a known bug on their website, and although at the time of writing the information is pretty sparse, it appears it will be fixed in an upcoming release of their JVM.

How can I create a PDF from a JSP using the PDF library (not the Report Generator)?

It is not recommended to create PDFs from JSPs. JSPs are intended to return text only, not binary content like PDFs or images (for which you should use a Servlet). In more detail, a JSP page only has access to the PrintWriter, not the ServletOutputStream, so your response will be dependent on the encoding of the page. In addition, any newlines or spaces in your JSP will be inserted into your PDF, which, as it's a binary file, is not a good idea. We're not saying it can't be done if you know what you're doing and you're careful, but whether it will work will depend on the browser, your application server and the environment it's running in - so we don't support it.
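The encoding problem is easy to reproduce outside a servlet container. This sketch (BinaryThroughWriter is a hypothetical name) pushes the bytes of a PDF header through a character Writer, which is the only output a JSP has: with ISO-8859-1 the bytes happen to survive, but with UTF-8 every byte above 0x7F turns into two bytes and the PDF is corrupted.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

final class BinaryThroughWriter {
    // Simulate writing binary data through a character Writer: each byte becomes
    // a char (0-255), which is then re-encoded using the page's charset.
    static byte[] sendViaWriter(byte[] data, Charset pageEncoding) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, pageEncoding)) {
            for (byte b : data) {
                w.write(b & 0xFF);
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toByteArray();
    }
}
```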

I've got an OutOfMemoryError when using the PDF library or Report Generator

This is one of the most common questions we get. First, this does not mean there is a memory leak - it simply means that there is not enough memory available. By default older versions of Java only allow 64MB of heap regardless of the amount of physical memory in the machine, which is not enough when you're manipulating large documents (how big "large" is depends on what you're doing with it and the composition of the document, but chances are it's smaller than you think). You have some simple options to fix this:

  1. Increase the heap size by using the "java -Xmx" option - eg java -Xmx256M YourApp
  2. Consider using a Cache to store part of the document on disk.
  3. If reading a PDF using a PDFReader, use a File rather than a FileInputStream in the constructor.
  4. If you open an InputStream, you need to close it. The API doesn't close any streams it didn't open itself.
  5. Try upgrading your version of Java - like any program the Java interpreter has bugs, and we've had a couple of reports that this makes a difference.
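Tips 1 and 4 can be checked from code. A small sketch using only the standard library (MemoryTips is a hypothetical name): maxHeapMb reports the ceiling set by -Xmx, and countBytes shows try-with-resources closing a stream you opened yourself. In real code you would hand the stream to the library rather than just counting its bytes.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

final class MemoryTips {
    // Maximum heap the JVM will attempt to use, in MB (controlled by -Xmx).
    // If the JVM has no limit this is derived from Long.MAX_VALUE.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // A stream you open is yours to close: try-with-resources closes it
    // even if reading throws an exception.
    static long countBytes(File file) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        try (InputStream in = new FileInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }
}
```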

If this still doesn't work, you need to look at your architecture to find ways to save memory. There are a number of tricks which we recommend.

  1. Are you using form fields? Form fields are very expensive, and many customers load a template, complete the fields and then flatten the document, thereby losing any benefit of having a field in the first place. For simple text fields, consider using a LayoutBox to draw the text directly onto the page in the same location as the field. When helping a customer struggling with a large number of fields, using this approach in our tests gave smaller documents, reduced memory requirements by 40% and sped things up by a factor of 15! This is our #1 tip for improving performance.
  2. If you're loading a PDF repeatedly and completing it, consider loading it once at the start and cloning it using the PDF(PDF) constructor. Reading them in is the complicated bit, so you'll certainly see speed improvements.
  3. Try to manage your object creation carefully - every time you call "new", think about whether you could use a pre-existing object (styles, fonts, images etc.). Reading from disk is a slow operation; cloning and re-using are much quicker.

My XML fails when parsing non-ASCII characters

This is an encoding issue. By default XML is encoded in UTF-8, so if you're creating your XML with a text editor, be sure to save the XML using the UTF-8 encoding. If you're returning XML from a JSP you need to set the encoding explicitly, as JSPs encode their content as ISO-8859-1 by default. See the internationalization section of the userguide for more info on this, but if you want a quick fix, ensure the first two lines of your JSP look something like this:

<?xml version="1.0"?>
<%@ page language="java" contentType="text/xml; charset=UTF-8"%>
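If you're generating the XML from Java rather than a JSP, the same rule applies: write it with an explicit UTF-8 encoding instead of the platform default. A sketch with the standard java.nio and javax.xml APIs (Utf8Xml is a hypothetical name; it assumes the title contains no XML markup characters):

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class Utf8Xml {
    // Write the XML with an explicit UTF-8 encoding, matching the declaration,
    // rather than whatever the platform default happens to be.
    static void write(Path file, String title) throws IOException {
        try (Writer w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            w.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            w.write("<pdf><body><p>" + title + "</p></body></pdf>\n");
        }
    }

    // Parse the file back (the parser decodes the bytes itself)
    // and return the text of the first <p> element.
    static String readParagraph(Path file) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file.toFile());
        return doc.getElementsByTagName("p").item(0).getTextContent();
    }
}
```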

How can I stop the letters in my table from being stretched out?

By default the text in tables is justified. In order to prevent this you need to set align="left". Remember that each <td> element has a <p> implicitly placed around the data, so the best way to achieve this is to use a style sheet and add:

td p { align:left }

which will cause all the table data elements to align to the left.

How do I use the keycode I've been issued?

To upgrade from the demo version, you need to have purchased one of the products and been issued with a license key. Once you've been sent the key:

  • For the Graph Library:
    • If you're running it as an application, add the following line of code to your program BEFORE the first graph is created, ideally within an initialization routine
      Graph.setLicenseKey("...");
    • If you're running the Tag Library to create graphs from XML or JSP files, you need to add the license key as a "context parameter" to the "web.xml" file of your web application. It should be placed as the first block after your opening <web-app> tag. The web server will need to be restarted after this change.
      <context-param>
         <param-name>org.faceless.graph2.License</param-name>
         <param-value>...</param-value>
      </context-param>
  • For the PDF Library add the following line of code to your program BEFORE the first PDF is created, ideally within an initialization routine
    PDF.setLicenseKey("...");
  • For the Report Generator:
    • If you're running it as an application, add the following code to your program BEFORE the first report is run, ideally within an initialization routine
      ReportParser.setLicenseKey("...");
    • If you're running the PDFFilter or Sample Servlet, you need to add the license key as an "initialization parameter" to the filter or servlet. This is done with an <init-param> block in the "web.xml" file of your web application. The web server will need to be restarted after this change.

      If you're running the PDFFilter, your web.xml should look something like this:
      <filter>
         <filter-name>bforeport</filter-name>
         <filter-class>org.faceless.report.PDFFilter</filter-class>
         <init-param>
             <param-name>license</param-name>
             <param-value>...</param-value>
         </init-param>
      </filter>
      If you're running the SampleServlet, it should look something like this:
      <servlet>
         <servlet-name>ReportServlet</servlet-name>
         <servlet-class>SampleServlet</servlet-class>
         <init-param>
             <param-name>license</param-name>
             <param-value>...</param-value>
         </init-param>
      </servlet>

My license key has suddenly stopped working and I'm seeing "DEMO" again.

The only way this could happen is if you're accidentally resetting it somewhere else in your code, or if you're still using a temporary key by accident.

Search all of your code for any calls to setLicenseKey - ideally you want just one, set in the manner described above. Also make sure that you're not calling PDF.setLicenseKey or Graph.setLicenseKey after a call to ReportParser.setLicenseKey, and that if you're running the Report Generator your call is actually to ReportParser.setLicenseKey rather than PDF or Graph (this applies even if you're using the Graph Library component of the Report Generator as a standalone component).

You may also want to check your logs. This happens often enough that if a previously valid license key is overridden with an invalid one, recent versions of our products will write a warning to stderr, along with a stack trace of the second call to setLicenseKey to help you find and remove it.

Can your product work with Adobe Reader Extensions?

No. Adobe have created the Reader Extensions so that they can only be used with their Document Server product, presumably in order to avoid losing sales of Acrobat to a combination of Acrobat Reader and third party products like our own. Only Adobe products (and perhaps some of their licensees) can work with PDFs containing Reader Extensions.

How do I find out the location of an element created in the Report Generator?

This is a common question from those wanting to add something to a PDF created by the Report Generator that can't be defined in the XML, such as a custom annotation or a type of page numbering that isn't supported directly by the XML syntax. The idea is that after the XML is converted to a PDF but before it's written to the OutputStream, the PDF can be altered using the PDF library API. The trick is finding out the location of the element you're using as a marker in the document.

To do this you can add an annotation to the tag and then search for it once the PDF is generated. First, add an href to the element you want to find:

<h1 href="pdf:dummy">Heading</h1>

Then edit the code that converts the XML to a PDF so that, after the PDF is created but before it's written out, it finds that annotation (and deletes it when you're done). The new code to insert is indicated below.

PDF pdf = parser.parse(inputsource);
// BEGIN INSERTED CODE
List pages = pdf.getPages();
for (int i=0;i<pages.size();i++) {
    PDFPage page = (PDFPage)pages.get(i);
    List annots = page.getAnnotations();
    for (int j=0;j<annots.size();j++) {
        PDFAnnotation annot = (PDFAnnotation)annots.get(j);
        // Not every annotation has an action, so check for null first
        if (annot.getAction() != null && annot.getAction().getType().equals("Named:dummy")) {
            annots.remove(j);    // remove the marker annotation
            float[] rect = annot.getRectangle();
            // Now you have "rect" and "page" set to the location of the tag
            // in your document - do whatever you need to with them.
            break;
        }
    }
}
// END INSERTED CODE
pdf.render(outputstream);

This technique of modifying the PDF after it's created but before it's written can be used to do other things too - append other documents, reorder pages and so on. For those using the PDFFilter, the source code is supplied in the docs directory. Remember to put the modified version in a different package.

Extracting text from a PDF gives incorrect results

Extracting text from a PDF can fail for a number of reasons, mostly due to the way they're constructed internally. A PDF has no concept of a sentence or a word, only letters. Typically these are grouped together internally to form words which we can search for, but this isn't always the case - we've seen documents where all the capital letters in a line were printed in one go followed by the lower case letters, or documents where letters were arranged from right-to-left, the cursor moving backwards between each one. Some older documents use images to make up the letters, or fonts with no useful encoding so it's impossible to know which letter is which. The one thing all these variations have in common is it's almost impossible to extract useful text from them (and this will apply to any PDF tool, including Acrobat and our library).

99% of the time this won't be an issue, but it's for this reason that, when we're asked if it's possible to extract text from a PDF, we say "usually" rather than an outright yes.

(Also be aware that our trial version replaces all lowercase e's with lowercase a's. This is deliberate, not a fault, and isn't the case when you have a valid license key).

What sort of compression is used when converting PDF to TIFF?

This depends on the ColorModel used. If you're using a 1-bit ColorModel like PDFParser.BLACKANDWHITE then CCITT Group 4 compression is used; otherwise LZW compression is used. These are the two best options available in baseline TIFF - although Flate compression is defined in an extension to the TIFF specification, its support is limited, and LZW, despite its patent problems in the past, is still a very effective compression algorithm.

The raster format also depends on the ColorModel used - if it's an IndexColorModel then the image will be stored as an indexed image with 8bpp. Most color images are not indexed, however, and will be stored at 24bpp for RGB or 32bpp for CMYK images.
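The same rule can be applied with the standard javax.imageio TIFF writer (Java 9 or later), where CCITT Group 4 is named "CCITT T.6". This is only a sketch of the compression choice described above, not how the PDF Library writes its TIFFs; TiffCompression is a hypothetical name.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

final class TiffCompression {
    // 1-bit images get CCITT Group 4 ("CCITT T.6"), everything else LZW.
    static String compressionFor(BufferedImage img) {
        return img.getColorModel().getPixelSize() == 1 ? "CCITT T.6" : "LZW";
    }

    static byte[] writeTiff(BufferedImage img) throws IOException {
        Iterator<ImageWriter> it = ImageIO.getImageWritersByFormatName("tiff");
        if (!it.hasNext()) {
            throw new IOException("No TIFF writer available (requires Java 9 or later)");
        }
        ImageWriter writer = it.next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionType(compressionFor(img));
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ImageOutputStream out = ImageIO.createImageOutputStream(bytes)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(img, null, null), param);
        } finally {
            writer.dispose();
        }
        return bytes.toByteArray();
    }
}
```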

When embedding a v2 Graph into a PDF, the text is "fuzzy" when printed

Fuzzy text is a result of embedding a low resolution image into a PDF - it looks fine on screen (which is typically 96dpi) but when printed on a 600dpi printer it looks blocky and blurred. If using the tag library, the solution is to either set the dpi parameter on the axesgraph or piegraph element, or (if you have the extended edition of the Report Generator) set the format parameter to "rg1pdf".
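The number of pixels an image needs follows directly from its printed size. A sketch (GraphResolution is a hypothetical name), using the PDF convention of 72 points per inch:

```java
final class GraphResolution {
    // Pixels needed for an image drawn widthPt points wide (1 point = 1/72 inch)
    // so that it has one pixel per device dot at the given dpi.
    static int pixelsNeeded(float widthPt, int dpi) {
        return Math.round(widthPt / 72f * dpi);
    }
}
```

A graph drawn 288 points (4 inches) wide needs 384 pixels at 96dpi but 2400 pixels to match a 600dpi printer, which is why an image that looks sharp on screen can print blocky.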

When verifying a digital signature I get "java.security.cert.CertificateException: toDerInputStream rejects tag type -96"

This is a problem with the Sun crypto package, which incorrectly fails on some X.509 certificates. The solution is to use another JCE provider - we recommend the Bouncy Castle Crypto API from http://www.bouncycastle.org. Download the appropriate JAR and install it, either by following the instructions supplied with their package or (for a quick fix) by adding the line

java.security.Security.insertProviderAt(new org.bouncycastle.jce.provider.BouncyCastleProvider(), 1);

to your code during an initialization routine, before the signature is verified. Alternatively, try upgrading to Java 1.6.