We publish our work on The Future of Text in PDF because PDF is a frozen and, with a few exceptions, non-interactive medium, and this is for the long haul, an archival journey. The web is great, but links break over time. We want the book to remain robustly available, and we believe that PDF documents will be kept readable for a very long time since PDF is such a pervasive format.
We have developed an open metadata standard called Visual-Meta which works with PDF to allow the documents to be richly interactive. You can read more at https://visual-meta.info if you like. We have also developed the macOS Author word processor and the Reader PDF viewer. If you use Reader to read The Future of Text, your reading experience will be augmented in how you can navigate and interact. Reader is free:
Other formats of the books will also be available from the download pages.
11 thoughts on “Why PDF?”
I am a graphic designer by training. As a graphic designer, I dislike Visual-Meta because it is ugly. It’s ugly! I don’t want that ugly stuff in my beautiful publications, and I can’t imagine that other book designers would want it in their books either.
The small text size and the structure of Visual-Meta show that it is primarily intended to be read by a computer, and if it is primarily intended to be read by a computer, then we should put it in a PDF’s XML metadata (or similar format that is not part of the rendered page). If we do that, we keep the ugly stuff off our beautifully rendered pages, but it can still be read by software applications.
Accessibility is no good reason to put Visual-Meta on the rendered pages, since a PDF’s XML metadata is just as accessible to a computer (and even to a human, who can just open the file in a plain-text editor if the metadata is left uncompressed and unencrypted, which is possible). Tagged PDFs are another example of this principle.
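To make the plain-text point concrete, here is a minimal sketch. It assumes the XML metadata in question is an XMP packet stored uncompressed and unencrypted, and that it is wrapped in the usual `<?xpacket begin=...?>` and `<?xpacket end=...?>` markers. The function name `find_xmp_packets` and the sample bytes are invented for illustration; a real PDF library would be needed for compressed streams.

```python
import re

# XMP packets are delimited by <?xpacket begin=...?> and <?xpacket end=...?>.
XMP_PACKET = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)

def find_xmp_packets(pdf_bytes: bytes) -> list[str]:
    """Return the body of every uncompressed XMP packet in the raw bytes."""
    return [
        match.group(1).decode("utf-8", errors="replace").strip()
        for match in XMP_PACKET.finditer(pdf_bytes)
    ]

# Hypothetical stand-in for the bytes of a PDF whose metadata stream
# was left uncompressed; the packet content is a placeholder.
sample = (
    b"%PDF-1.7\n1 0 obj\n<< /Type /Metadata /Subtype /XML >>\nstream\n"
    b'<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/"><!-- title, author --></x:xmpmeta>\n'
    b'<?xpacket end="w"?>\nendstream\nendobj\n'
)

packets = find_xmp_packets(sample)
```

The same byte scan finds nothing if the metadata stream is compressed, which is why the comment above specifies that the metadata be left uncompressed.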
Much publishing today, perhaps even most publishing, happens in a variety of formats. Publishers have toolchains that automatically transform the same information into multiple output formats. Academic publishers make their works available as Web pages, EPUB files, PDF files, and other formats. Open standards for metadata should be as inter-translatable between these formats as possible.
HTML is not just a Web technology, not just for connections between Web sites; it is also used in non-Web publishing toolchains, including for producing PDFs (Prince XML, WeasyPrint, etc.). We can make HTML documents without external links and embed semantic data in HTML with RDFa and other formats. HTML is used as the basis for file formats that do not require a Web browser (for example, EPUB: open it in your ebook readers, or unzip it and open it in your Web browsers). I expect standalone HTML file formats such as EPUB to continue to evolve in exciting ways while adhering to open standards. For all these reasons, it is not true that PDF is somehow more suited to long-term storage than HTML and XML formats.
I am betting against Visual-Meta (which I have never seen used except in your books) and would urge you to find a better solution. My advice is to get rid of the visible ugliness of Visual-Meta and pay more attention to the embedded semantic metadata formats that are already widely used in HTML/XML and that can be used in all formats in an easily inter-translatable way.
I could say more, but I hope this is helpful even if you disagree.
You say it’s ugly but it is at the back and you don’t need to go there. In a traditional book you have the copyright page, which is similar in a way.
As for HTML, yes, we are also working on this, but Visual-Meta will remain our long-term archival format, since all the metadata is displayed and nothing is hidden.
The copyright page in a traditional book is not similar; it gives the copyright information and publisher contact information and other information that cannot be found elsewhere in the book. Visual-Meta is entirely redundant; it is information that is already rendered beautifully elsewhere in the book and in the standard table of contents metadata. It should not be in the visually rendered pages.
Not really. Many academic documents, at least, contain no citation information, especially when downloaded as an article from a journal. If you don’t want to see Visual-Meta, don’t. Do you object to books having an index and appendices in general?
All professionally published academic documents contain human-readable citation information about the document itself, and the problem of how to extract this information with a computer has already been solved with scraping algorithms. Academic citation information is also embedded in standard metadata in HTML formats, including the parts of edited collections, and something similar could be devised for PDFs.
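As an illustration of citation metadata embedded in HTML, the sketch below assumes the page uses `<meta name="citation_...">` tags of the kind many journal sites emit. The class name `CitationMetaParser` and the sample page, including its title, authors, and year, are invented for the example.

```python
from html.parser import HTMLParser

class CitationMetaParser(HTMLParser):
    """Collect <meta name="citation_*" content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        if name.startswith("citation_"):
            # A field such as citation_author may repeat, so keep a list.
            self.fields.setdefault(name, []).append(attr.get("content") or "")

# Hypothetical page head; every value here is made up.
sample_html = """
<html><head>
<meta name="citation_title" content="An Example Chapter">
<meta name="citation_author" content="Doe, Jane">
<meta name="citation_author" content="Roe, Richard">
<meta name="citation_publication_date" content="2020">
</head><body></body></html>
"""

parser = CitationMetaParser()
parser.feed(sample_html)
citation = parser.fields
```

Only the standard library is used, so the same approach works on any saved HTML file without a network connection or a browser.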
There is no way not to see Visual-Meta when it is visually rendered as part of the page except to close one’s eyes, which is rather ridiculous to ask of a reader.
And again, the implied analogy from Visual-Meta to indices and appendices is false: the latter are nonredundant information designed for humans, beautifully typeset and highly readable based on well-established user interface design principles even before a book was called a user interface. Visual-Meta violates such good design principles. HTML-based formats don’t show readers the HTML code (although readers can get to the code if they want it), and likewise well-designed PDFs shouldn’t show readers the raw Visual-Meta code. Visual-Meta tries to solve a real problem, but solves it badly. There is a better way.
OK Nathan Artist, that’s enough, I think. We don’t agree, and that’s OK.
Academic download sites, for example, provide plenty of metadata on the download page, but often there is nothing in the PDF document itself. That’s not good enough.
You think it’s awful to have an appendix and that is your opinion. Personally I don’t go to the end of all the books or academic documents I read and go through all the appendices and the index and so on, but since you feel you must do this, that is really up to you. From my experience readers are capable of stopping.
As I mentioned a few times, we are also working on HTML and JSON; those formats provide interaction but not, in my opinion, a long-lived archive.
If you think it could be done better, please do so; I think competition for richly embedded metadata that’s robust would be wonderful. If you want to ‘compete’ with this concept, I request that you also test it by printing it out and scanning it in again, something an archival format should be able to survive, and see whether all the metadata is still intact.
Why not then go “full-on permanent archive” and build on blockchain?
We are indeed looking at blockchain but will it be guaranteed readable in 500 years? I think a PDF or a printed PDF will have better chances in the very long run.
PS: PDFs can be modified and spread around, varying from the master source.
Yes, absolutely, which is very useful for reading, but I don’t think anyone would want to do their writing straight onto a PDF, even in Acrobat, do you? PDF is a publishing format, not a manuscript, and therefore has different affordances.