Zugferd PDF Validation Failure: Font Embedding Problem
Introduction
In this article, we'll dive into a common issue encountered when working with Zugferd PDFs: validation failures after merging a PDF with an XML file, specifically related to font embedding. This problem often arises when the resulting PDF doesn't fully comply with the ISO 19005-3:2012 standard, which governs PDF/A-3 conformance for long-term archiving, as used by Zugferd. We will explore the error report, analyze the cause, and discuss potential solutions to ensure your generated Zugferd PDFs pass validation.
The Problem: Font Embedding Issues
When generating Zugferd invoices, which combine a PDF document with an embedded XML file containing the invoice data, it's crucial that the PDF adheres to the PDF/A standard. A common pitfall is font embedding, where the fonts used in the PDF are not fully included within the file itself. This can lead to validation errors, as highlighted in the provided error report.
The error report indicates two primary issues:
- Missing Font Programs: The font programs for all fonts used for rendering within a conforming file shall be embedded within that file, as defined in ISO 32000-1:2008, 9.9. The error specifically points out that the font programs for "Helvetica" and "Helvetica-Bold" are not embedded.
- CIDSet Stream and CID Fonts: If the FontDescriptor dictionary of an embedded CID font contains a CIDSet stream, then it shall identify all CIDs which are present in the font program, regardless of whether a CID in the font is referenced or used by the PDF or not.
These errors essentially mean that the PDF is not self-contained in terms of font information, which violates the PDF/A standard's requirement for long-term accessibility and consistent rendering.
Analyzing the Error Report
Let's break down the error report to understand the specific issues:
Rule Status
Specification: ISO 19005-3:2012, Clause: 6.2.11.4.1, Test number: 1
The font programs for all fonts used for rendering within a conforming file shall be embedded within that file, as defined in ISO 32000-1:2008, 9.9
Status: Failed (2 occurrences)
PDFont
Subtype == "Type3" || Subtype == "Type0" || renderingMode == 3 || containsFontFile == true
root/document[0]/pages[2](7 0 obj PDPage)/contentStream[0](8 0 obj PDContentStream)/operators[9]/xObject[0]/contentStream[0](87 0 obj PDContentStream)/operators[5]/font[0](Helvetica)
The font program is not embedded
root/document[0]/pages[2](7 0 obj PDPage)/contentStream[0](8 0 obj PDContentStream)/operators[9]/xObject[0]/contentStream[0](87 0 obj PDContentStream)/operators[10]/font[0](Helvetica-Bold)
The font program is not embedded
Specification: ISO 19005-3:2012, Clause: 6.2.11.4.2, Test number: 2
If the FontDescriptor dictionary of an embedded CID font contains a CIDSet stream, then it shall identify all CIDs which are present in the font program, regardless of whether a CID in the font is referenced or used by the PDF or not
Status: Failed (2 occurrences)
The first error states that the font programs for "Helvetica" and "Helvetica-Bold" are not embedded. Both belong to the 14 standard fonts that PDF producers have traditionally referenced by name only, leaving the viewer to supply a matching font. PDF/A does not allow this: every font used for rendering must be embedded, so that the file renders identically without relying on fonts installed on the viewing system. The second error pertains to CID fonts, which are typically used for large or multi-byte character sets. If an embedded CID font's FontDescriptor contains a CIDSet stream, that stream must list every CID present in the font program, not just the ones the document uses.
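The rule headers in the report follow a regular "Specification / Clause / Test number" pattern, so they can be collected mechanically when a report is long. The following is a minimal sketch that assumes the report has been saved as plain text in the layout shown above; the function name extractFailedRules is hypothetical and not part of veraPDF or any library.

```php
<?php
/**
 * Extract (specification, clause, test number) triples from report text.
 * Assumes headers look like:
 *   "Specification: ISO 19005-3:2012, Clause: 6.2.11.4.1, Test number: 1"
 */
function extractFailedRules(string $reportText): array
{
    $pattern = '/Specification:\s*(.+?),\s*Clause:\s*([\d.]+),\s*Test number:\s*(\d+)/';
    preg_match_all($pattern, $reportText, $matches, PREG_SET_ORDER);

    $rules = [];
    foreach ($matches as $m) {
        $rules[] = [
            'specification' => trim($m[1]),
            'clause'        => $m[2],
            'test'          => (int) $m[3],
        ];
    }
    return $rules;
}

// Sample input copied from the report excerpt in this article.
$report = <<<TXT
Specification: ISO 19005-3:2012, Clause: 6.2.11.4.1, Test number: 1
The font program is not embedded
Specification: ISO 19005-3:2012, Clause: 6.2.11.4.2, Test number: 2
TXT;

foreach (extractFailedRules($report) as $rule) {
    echo "{$rule['specification']} clause {$rule['clause']} (test {$rule['test']})\n";
}
```

Listing the distinct clauses first makes it easier to see that this report contains only two kinds of failure, both related to fonts.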
Code Snippet and Context
The provided code snippet reveals the use of a ZugferdDocumentPdfMerger class to merge a PDF with an XML file:
$pdfMerger = new ZugferdDocumentPdfMerger($xmlFilename, $pdfFilename);
echo "ZugFerd PDF merger initialized successfully.\n";
$pdfMerger->generateDocument();
$pdfMerger->saveDocument('npdf.pdf');
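This code only hands an existing PDF and an XML file to ZugferdDocumentPdfMerger, so the unembedded fonts come either from the input PDF itself or from the way the merger (or the PDF library underneath it) carries the pages over. The object paths in the report place both fonts inside a form XObject (xObject[0]/contentStream[0]), which suggests, though it does not prove, that the font references were imported along with the original page content rather than created by the merge step. A merger that copies pages as-is cannot add font programs that the source PDF never contained.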
The additional information provided, including the OS (macOS Tahoe) and PHP version (7), helps to rule out environment-specific issues, but the core problem appears to lie in the input PDF, the PDF generation library, or the merging process itself.
Potential Solutions
To address the font embedding issue and ensure PDF/A compliance, consider the following solutions:
- Investigate the PDF Library: If you're using a third-party PDF library (e.g., TCPDF, mPDF, Dompdf), consult its documentation regarding PDF/A support and font embedding. Many libraries have specific settings or methods to ensure fonts are embedded correctly. Ensure that these settings are enabled and configured properly.
- Font Subsetting: Some PDF libraries offer font subsetting, which means only the characters actually used in the document are embedded, rather than the entire font. This can reduce file size while still complying with PDF/A. However, ensure the subsetting process is accurate and includes all necessary characters.
- Check Font Licenses: Verify that you have the necessary licenses to embed the fonts you're using. Some font licenses restrict embedding, which can cause issues when generating PDF/A documents.
- Use PDF/A-aware Libraries: Consider using PDF libraries specifically designed for PDF/A generation. These libraries often have built-in mechanisms to handle font embedding and other PDF/A requirements automatically.
- Review the Merging Process: If the issue occurs during the merging process, examine how the ZugferdDocumentPdfMerger class handles fonts. Ensure that it's not stripping out font information or causing conflicts during the merge.
- veraPDF Validation: Utilize the veraPDF validator (https://verapdf.org/) to thoroughly test your generated PDFs. veraPDF provides detailed reports on PDF/A compliance, helping you identify and fix issues.
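Validation can also be scripted so that every generated file is checked right after it is saved. The sketch below builds a veraPDF command line from PHP. It assumes the verapdf launcher is installed and on the PATH, and that the PDF/A-3b profile is the relevant one; the helper names buildVeraPdfCommand and runVeraPdf are hypothetical. Option names can differ between veraPDF versions, so confirm them with verapdf --help.

```php
<?php
/**
 * Build a veraPDF command line for validating a single file.
 * Assumption: the `verapdf` launcher is on the PATH; adjust the name or
 * path for your installation.
 */
function buildVeraPdfCommand(string $pdfPath, string $flavour = '3b', string $format = 'text'): string
{
    return sprintf(
        'verapdf --flavour %s --format %s %s',
        escapeshellarg($flavour),
        escapeshellarg($format),
        escapeshellarg($pdfPath)
    );
}

/**
 * Run the validator and return its exit code and output lines.
 * Not called here, so the example loads without veraPDF installed.
 */
function runVeraPdf(string $pdfPath): array
{
    $output = [];
    $exitCode = 0;
    exec(buildVeraPdfCommand($pdfPath) . ' 2>&1', $output, $exitCode);
    return ['exitCode' => $exitCode, 'output' => $output];
}

// Show the command that would be executed for the file saved earlier.
echo buildVeraPdfCommand('npdf.pdf'), "\n";
```

Calling runVeraPdf('npdf.pdf') directly after saveDocument() would surface font problems at generation time instead of when a recipient rejects the invoice.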
Debugging Steps
To further diagnose the problem, try the following debugging steps:
- Simplify the PDF: Generate a minimal PDF with only a few lines of text using the problematic fonts. This helps isolate whether the issue is related to the merging process or the basic PDF generation.
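- Inspect the PDF Structure: Use a PDF inspection tool (like the Apache PDFBox debugger or iText RUPS) to examine the PDF's internal structure, particularly the font dictionaries. This reveals whether each font has a FontDescriptor with an embedded font file (FontFile, FontFile2, or FontFile3) and whether its descriptor entries are correct. Inspect the input PDF as well as the merged output to see where the unembedded fonts first appear.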
- Test with Different Fonts: Try using different fonts to see if the issue is specific to "Helvetica" and "Helvetica-Bold" or if it affects all fonts.
- Update Libraries: Ensure you're using the latest versions of your PDF library and any related dependencies. Bug fixes and improvements in newer versions might address the font embedding issue.
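For a quick first look before opening an inspection tool, the font names in a PDF can be listed with a few lines of PHP. This is a rough heuristic sketch, not a validator: it only sees /BaseFont entries stored in uncompressed objects, and it infers embedding status from the name alone. A six-letter tag followed by "+" usually marks an embedded subset, while a bare Standard 14 name such as Helvetica is a strong hint that the font is only referenced. The function names are hypothetical.

```php
<?php
// The 14 standard Type 1 font names defined by the PDF specification.
const STANDARD_14_FONTS = [
    'Times-Roman', 'Times-Bold', 'Times-Italic', 'Times-BoldItalic',
    'Helvetica', 'Helvetica-Bold', 'Helvetica-Oblique', 'Helvetica-BoldOblique',
    'Courier', 'Courier-Bold', 'Courier-Oblique', 'Courier-BoldOblique',
    'Symbol', 'ZapfDingbats',
];

/**
 * Collect the distinct /BaseFont names visible in the raw PDF bytes.
 * Heuristic only: names inside compressed object streams are not seen.
 */
function listBaseFonts(string $pdfBytes): array
{
    preg_match_all('~/BaseFont\s*/([^\s/<>\[\]()]+)~', $pdfBytes, $matches);
    $names = array_values(array_unique($matches[1]));
    sort($names, SORT_STRING);
    return $names;
}

/**
 * Label each font name as a tagged subset, a Standard 14 name, or other.
 */
function classifyFonts(array $names): array
{
    $result = [];
    foreach ($names as $name) {
        if (preg_match('/^[A-Z]{6}\+/', $name) === 1) {
            $result[$name] = 'subset tag present, likely embedded';
        } elseif (in_array($name, STANDARD_14_FONTS, true)) {
            $result[$name] = 'Standard 14 name, check for a FontFile entry';
        } else {
            $result[$name] = 'unknown, inspect the FontDescriptor';
        }
    }
    return $result;
}

// Hand-written sample standing in for real PDF content.
$sample = "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>\n"
        . "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica-Bold >>\n"
        . "<< /Type /Font /Subtype /TrueType /BaseFont /ABCDEF+DejaVuSans >>";

foreach (classifyFonts(listBaseFonts($sample)) as $name => $status) {
    echo "$name: $status\n";
}
```

To run it on a real file, pass the result of file_get_contents() for the input PDF and for the merged output, then compare the two lists. Any finding should still be confirmed with veraPDF or a PDF inspection tool.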
Conclusion
Font embedding is a critical aspect of PDF/A compliance, and neglecting it leads to validation failures that can prevent your Zugferd PDFs from being accepted. In the case examined here, the report points to two concrete causes: the Helvetica and Helvetica-Bold font programs are missing, and a CIDSet stream does not match its font program. Determine whether these defects already exist in the input PDF or are introduced during the merge, fix them at that point, and re-validate the result with veraPDF or a similar tool before sending the invoice.
For more information on PDF/A standards and font embedding best practices, refer to the official ISO 19005 specifications and to the PDF Association (https://www.pdfa.org/). The veraPDF website also offers documentation and tools for validating PDF/A compliance.