Sunday, August 14, 2016

Hyperlinks with PDFBox-Layout

One thing that made HTML so successful is the hyperlink. Keywords are marked up, and just by clicking on them you are taken to the referenced position in the document, or even to a totally different document. So it makes perfect sense to use hyperlinks in PDF documents too, whether you are linking the entries of a TOC to the corresponding chapters, or a piece of information to a corresponding URL in Wikipedia. Linking text enriches the content and makes it more usable.

Luckily PDF (and PDFBox) supports hyperlinks, so why not use them? Because it's a pain. The PDF standard has no notion of marked up text, only the more general and abstract idea of annotated areas: you describe some area in the document by coordinates, and add some metadata telling the PDF reader what to do with that area. That's quite powerful. You can do highlighting and all kinds of actions with that, totally independent of the content, just by describing an area. And that's also the catch: it is totally independent of the content. If you are used to the notion of marked up text, this feels a bit clumsy. Let me give you an example of how to create a link:

PDDocument document = new PDDocument();

PDPage page = new PDPage();
document.addPage(page);
float upperRightX = page.getMediaBox().getUpperRightX();
float upperRightY = page.getMediaBox().getUpperRightY();

PDFont font = PDType1Font.HELVETICA;
PDPageContentStream contentStream = new PDPageContentStream(document, page);
contentStream.beginText();
contentStream.setFont(font, 18);
contentStream.moveTextPositionByAmount(0, upperRightY - 20);
contentStream.drawString("This is a link to PDFBox");
contentStream.endText();
contentStream.close();

// create a link annotation
PDAnnotationLink txtLink = new PDAnnotationLink();

// add an underline
PDBorderStyleDictionary underline = new PDBorderStyleDictionary();
underline.setStyle(PDBorderStyleDictionary.STYLE_UNDERLINE);
txtLink.setBorderStyle(underline);

// set up the markup area; string widths come in 1/1000 units,
// so they are scaled by the font size
float offset = (font.getStringWidth("This is a link to ") / 1000) * 18;
float textWidth = (font.getStringWidth("PDFBox") / 1000) * 18;
PDRectangle position = new PDRectangle();
position.setLowerLeftX(offset);
position.setLowerLeftY(upperRightY - 24f);
position.setUpperRightX(offset + textWidth);
position.setUpperRightY(upperRightY - 4);
txtLink.setRectangle(position);

// add an action
PDActionURI action = new PDActionURI();
action.setURI(""); // the target URL goes here
txtLink.setAction(action);

// and that's all ;-)
page.getAnnotations().add(txtLink);
document.save("link.pdf");
document.close();
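
The rectangle arithmetic is the part that is easiest to get wrong, so here it is in isolation. Font metrics report widths in thousandths of a unit, which is why the listing divides by 1000 and multiplies by the font size. The sketch below is plain Java without any PDFBox dependency; the class `LinkBox`, its method names, and the sample widths in the usage note are hypothetical stand-ins for what `font.getStringWidth(...)` would return.

```java
/** Hypothetical helper that mirrors the rectangle arithmetic of the listing above. */
final class LinkBox {
    final float lowerLeftX;
    final float lowerLeftY;
    final float upperRightX;
    final float upperRightY;

    LinkBox(float lowerLeftX, float lowerLeftY, float upperRightX, float upperRightY) {
        this.lowerLeftX = lowerLeftX;
        this.lowerLeftY = lowerLeftY;
        this.upperRightX = upperRightX;
        this.upperRightY = upperRightY;
    }

    /** Converts a width given in 1/1000 units (as font metrics report it) to points. */
    static float toPoints(float widthInThousandths, float fontSize) {
        return widthInThousandths / 1000f * fontSize;
    }

    /**
     * Computes the box around a word that follows a prefix on a line
     * starting at x = 0 with the given baseline.
     */
    static LinkBox around(float prefixWidth, float wordWidth, float fontSize, float baselineY) {
        float offset = toPoints(prefixWidth, fontSize);
        float width = toPoints(wordWidth, fontSize);
        // Same margins as in the listing: 4 units below the baseline, 16 above.
        return new LinkBox(offset, baselineY - 4f, offset + width, baselineY + 16f);
    }
}
```

With assumed widths of 7000 for the prefix and 3723 for the linked word, `LinkBox.around(7000f, 3723f, 18f, 772f)` puts the left edge of the box at x = 126. Every single link needs this bookkeeping, and the box has to be recomputed whenever the text moves.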

Ouch, this ain't no fun. OK, I see, you have all the freedom to mark up whatever you want. And you can do really fancy stuff with that, like highlighting things, adding tooltips and a lot more. But if you just wanna add a hyperlink... hmmm. Let's do that again with PDFBox-Layout:

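Document document = new Document();

Paragraph paragraph = new Paragraph();
paragraph.addText("This is a link to ", 18f,
   PDType1Font.HELVETICA);

// create a hyperlink annotation; the first argument is the target URL
HyperlinkAnnotation hyperlink = 
   new HyperlinkAnnotation("", LinkStyle.ul);

// create styled text annotated with the hyperlink
AnnotatedStyledText styledText = 
   new AnnotatedStyledText("PDFBox", 18f,
      PDType1Font.HELVETICA, Color.black,
      Collections.singleton(hyperlink));
paragraph.add(styledText);
document.add(paragraph);

final OutputStream outputStream = new FileOutputStream("link.pdf");
document.save(outputStream);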

This does exactly the same thing, but you just say what you want: add a hyperlink to the given text. All that odd area-marking boilerplate code is handled by PDFBox-Layout. And we can do even better using markup:

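Document document = new Document();

Paragraph paragraph = new Paragraph();
// the target URL goes between the square brackets
paragraph.addMarkup(
   "This is a link to {link[]}PDFBox{link}", 
   18f, BaseFont.Helvetica);
document.add(paragraph);

final OutputStream outputStream = new FileOutputStream("link.pdf");
document.save(outputStream);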

We just mark up the text with the hyperlink URL and that's it :-) So now we can do external URLs, but what about links into the document itself? Let's take an example. In order to link to some point in the document, we have to add an anchor at that position. After that, we can link to that anchor using the anchor's name:

paragraph.addMarkup(
   "And here comes a link to an internal anchor name {link[#hello]}hello{link}.\n\n", 
   11, BaseFont.Times);

paragraph.addMarkup(
   "\n\n{anchor:hello}Here{anchor} comes the internal anchor named *hello*\n\n", 
   15, BaseFont.Courier);

So we define an anchor "{anchor:hello}Here{anchor}" somewhere in the document with the logical name `hello`. This anchor name is then used in the link, prefixed with a hash to indicate an internal link: "{link[#hello]}hello{link}". See the example PDF links.pdf for the results. And that's all there is to say about links. Hopefully they are easy to use, with the dirty work done behind the scenes by PDFBox-Layout.
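
To make the two markup forms concrete, here is a small sketch that picks `{link[...]}` and `{anchor:...}` spans out of a string and checks that every internal link has a matching anchor. It is plain Java with regular expressions and is not how PDFBox-Layout parses markup internally; the class `MarkupLinks` and all of its methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of the link and anchor markup, not library code. */
final class MarkupLinks {

    /** A linked piece of text together with its target. */
    static final class Link {
        final String target;
        final String text;

        Link(String target, String text) {
            this.target = target;
            this.text = text;
        }
    }

    // {link[target]}text{link}: group 1 is the target, group 2 the linked text.
    private static final Pattern LINK =
        Pattern.compile("\\{link\\[([^\\]]*)\\]\\}(.*?)\\{link\\}");

    // {anchor:name} opens an anchor; group 1 is the anchor name.
    private static final Pattern ANCHOR = Pattern.compile("\\{anchor:([^}]+)\\}");

    /** Returns all links in the markup, in order of appearance. */
    static List<Link> findLinks(String markup) {
        List<Link> links = new ArrayList<>();
        Matcher matcher = LINK.matcher(markup);
        while (matcher.find()) {
            links.add(new Link(matcher.group(1), matcher.group(2)));
        }
        return links;
    }

    /** Returns the names of all anchors defined in the markup. */
    static Set<String> definedAnchors(String markup) {
        Set<String> names = new LinkedHashSet<>();
        Matcher matcher = ANCHOR.matcher(markup);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    /** Returns the names of internal links (targets starting with '#') that have no anchor. */
    static Set<String> danglingLinks(String markup) {
        Set<String> anchors = definedAnchors(markup);
        Set<String> dangling = new LinkedHashSet<>();
        for (Link link : findLinks(markup)) {
            if (link.target.startsWith("#")) {
                String name = link.target.substring(1);
                if (!anchors.contains(name)) {
                    dangling.add(name);
                }
            }
        }
        return dangling;
    }
}
```

Run over the two markup strings from the example above, `danglingLinks` comes back empty, because the link target `#hello` has a matching `{anchor:hello}`.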

A chain is no stronger than its weakest link,
and life is after all a chain.
William James