Document

Description

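Represents a complete document. It is typically the top-level element of a PDF's logical structure tree and serves as the container for all other content elements, defining the primary organizational hierarchy of the document. Document elements can also be nested, so a single PDF can hold several complete documents.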

Namespace
1.7, 2.0

Category
document, grouping

Attributes

Properties

ID
Properties

A unique identifier for this structure element, used for referencing it from other elements or external sources.

Ref
Properties

An array of references to other structure elements that the content of this element refers to (e.g., footnotes, endnotes, or sidebars).

A
Properties

An attribute object, or an array of attribute objects, providing additional layout or semantic attributes for this element.

C
Properties

A class name, or an array of class names, associated with this element. Each name is looked up in the class map to obtain further style or attribute definitions.

T
Properties

An optional text label or title for this structure element. It can serve as a short human-readable descriptor (e.g., for a heading).

Lang
Properties

Identifies the primary language for the text in this element (e.g., 'en-US'), aiding in proper text processing and accessibility.

Alt
Properties

Provides alternative text describing the element’s content, primarily for accessibility (screen readers).

E
Properties

Contains expansion text used to clarify abbreviations, acronyms, or symbolic content within the element.

ActualText
Properties

Gives an exact text equivalent for non-textual or symbolic content, allowing screen readers to read it as plain text.

AF
Properties

An array of associated files or file specifications that relate to this structure element (e.g., attachments).

NS
Properties

References the namespace dictionary that defines the namespace this element's structure type belongs to, enabling extensibility without conflicting with the standard structure types.

PhoneticAlphabet
Properties

Indicates the phonetic alphabet used by the Phoneme entry of this element (e.g., IPA).

Phoneme
Properties

Contains the phonemic representation of text for pronunciation guidance or linguistic analysis.
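As a rough illustration of how a few of the properties above sit together, the following sketch renders a Document structure element as a PDF dictionary. It is a simplified sketch, not a conforming writer: the helper names `pdf_string` and `serialize_struct_elem` and the sample values are hypothetical, the `Type` and `S` keys come from the PDF specification rather than from the list above, and the parent (`P`) and kids (`K`) entries that a real structure element carries are left out.

```python
def pdf_string(text: str) -> str:
    """Render text as a PDF literal string, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def serialize_struct_elem(entries: dict) -> str:
    """Render a flat mapping of key -> already-encoded PDF value as a dictionary."""
    body = " ".join(f"/{key} {value}" for key, value in entries.items())
    return f"<< {body} >>"


# Hypothetical sample values; only ID, T and Lang come from the property list.
document_elem = {
    "Type": "/StructElem",     # marks the dictionary as a structure element
    "S": "/Document",          # structure type of the element
    "ID": pdf_string("doc-1"),
    "T": pdf_string("Annual Report (2024)"),
    "Lang": pdf_string("en-US"),
}

rendered = serialize_struct_elem(document_elem)
print(rendered)
```

In practice a PDF library would build and write these dictionaries. The sketch is only meant to show which keys are names and which are strings.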

Differences

Well-Tagged PDF:

Well-Tagged PDF provides detailed guidelines for document structure elements (such as Document, Part, Art, Sect, and Div) to ensure semantic clarity and reusability. It emphasizes clear nesting and explicit tagging of structural elements.

It requires a well-defined structure tree with proper role mapping, explicit tagging for headings and sections, and consistent application of element boundaries to support both reusability and accessibility.
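Role mapping can be pictured as following a chain from a custom tag to a standard structure type. The sketch below assumes the role map is available as a plain Python dict. The function name `resolve_role`, the sample tags, and the small set of standard types are illustrative assumptions, not an API of any PDF library.

```python
# Illustrative subset of standard structure types, not the full list.
STANDARD_TYPES = {"Document", "Part", "Art", "Sect", "Div", "P", "H1"}


def resolve_role(tag, role_map, standard_types=STANDARD_TYPES):
    """Follow role_map from tag until a standard type is reached.

    Returns the standard type, or None if the chain ends at an unmapped tag.
    Raises ValueError if the mapping loops back on itself.
    """
    seen = set()
    current = tag
    while current not in standard_types:
        if current in seen:
            raise ValueError(f"circular role mapping at {current!r}")
        seen.add(current)
        if current not in role_map:
            return None  # custom tag with no mapping to a standard type
        current = role_map[current]
    return current


# Hypothetical role map: Chapter -> Section -> Sect
role_map = {"Chapter": "Section", "Section": "Sect"}
print(resolve_role("Chapter", role_map))
```

A checker built along these lines would report tags that resolve to `None` or that raise, since both indicate an incomplete or broken role map.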


PDF/UA:

PDF/UA (PDF/UA-1 and PDF/UA-2) mandates a complete and accessible logical structure tree. It ensures that all document structure elements are tagged in a way that assistive technologies can interpret.

It mandates inclusion of all essential structural elements with correct role mapping, proper tagging of headings, lists, and tables, and alternative descriptions where needed, so that content is accessible to users with disabilities.

Use cases

Tag Relationships

Related Matterhorn Protocol checkpoints

Examples

  • A mail merge PDF typically contains a number of letters to different recipients. This implies that the PDF at the top level is one document, containing multiple documents at its child level where each such document is a letter to a recipient.
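The nesting described in the mail merge example can be sketched as a small tree in which one top-level Document holds one child Document per letter. The `StructElem` class, the `build_mail_merge` and `outline` helpers, and the recipient names are hypothetical and exist only for illustration. A real PDF would store these as structure element dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class StructElem:
    """Minimal stand-in for a structure element: a tag, a title, and children."""
    tag: str
    title: str = ""
    kids: list = field(default_factory=list)


def build_mail_merge(recipients):
    """Build one top-level Document containing one child Document per recipient."""
    root = StructElem("Document", "Mail merge")
    for name in recipients:
        letter = StructElem("Document", f"Letter to {name}")
        letter.kids.append(StructElem("P"))  # placeholder for the letter body
        root.kids.append(letter)
    return root


def outline(elem, depth=0):
    """Return the tree as indented lines, one per structure element."""
    lines = ["  " * depth + f"{elem.tag} {elem.title}".rstrip()]
    for kid in elem.kids:
        lines.extend(outline(kid, depth + 1))
    return lines


tree = build_mail_merge(["Ada", "Grace"])
print("\n".join(outline(tree)))
```

Each letter keeps its own Document element, so a consumer can treat it as a complete document while the outer Document still groups the whole mailing.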