Plain Text with AsciidocFX

The editor AsciidocFX turns the plain text format AsciiDoc into an insightful drafting tool, even if AsciiDoc falls short as a production writing tool.

AsciidocFX customizes the well-known Atom editor to handle AsciiDoc, a plain text format that is much better than Markdown but still weaker than reStructuredText. AsciiDoc is readily extensible, and seamlessly incorporates PlantUML drawings, for example.

AsciidocFX deploys everything needed to convert the text and diagrams into PDF, HTML, and ePub books. The typography is good enough to use without special configuration. Writers can quickly draft documents combining both text and technical diagrams.

However, both the AsciidocFX editor and the Asciidoctor parser have their own flaws. These flaws prevent their adoption as a backbone for serious technical writing. Joaquim is moving the Farfetch infrastructure to DITA, but AsciidocFX earned its place as a technical scratchpad.

This presentation was delivered at TWL.


Desktop publishing is not good enough

Employers always ask for Word, and cannot see the need for anything else.

Word is good for 10-page reports written once by a single person. It really struggles with long documents that change over a long period. It's really hard for several people to work on the same long document. Word can easily be incompatible with itself.

Also, no single tool is perfect for every job. Any large documentation set that evolves over time will eventually need some change that just happens to be difficult in one tool, and would be easier to do with another tool, often a custom script. Therefore, it is paramount that writers can manipulate the documentation with multiple tools, and the best way is to have open formats that can easily be understood by very different tools.


Semantic markup

Semantic markup attempts to capture the intention behind the documentation, ideally in a way that is not tied to a specific output format.


History of semantic markup

IBM was one of the first companies to create large quantities of documentation, for its mainframes. IBM felt the pain associated with early typesetting systems, which used codes specific to each printer. A book encoded for one printer model had to be re-encoded to be printed on another printer model.

In 1969, Charles Goldfarb developed Generalized Markup Language (GML), which used declarative and printer-independent codes, and enabled the IBM BookMaster product, which allowed the large-scale writing and production of books.

The 1970s saw the development of two systems that influenced every system afterwards: SGML and TeX.

GML evolved into SGML, an international standard, and was adopted in specific industries where translation costs or regulatory demands justified the investment in custom systems, such as aviation and pharmaceutical companies. For example, Caterpillar sells many variations of large machines in different markets. SGML greatly reduced Caterpillar's costs of producing localized maintenance guides for each machine variation.

Donald Knuth developed TeX, which included sophisticated page layout algorithms and an unsurpassed ability to typeset mathematics. Unfortunately, TeX is crippled by an antiquated macro language.

The 1980s saw these ideas refined and disseminated in new systems.

  • Leslie Lamport created LaTeX, a set of macros that made TeX popular in universities.
  • Tim Berners-Lee used HTML, an application of SGML, for the WWW, thus exposing SGML.
  • GUIs and Desktop Publishing software reused the ideas pioneered by TeX. But Microsoft faked it: for example, type kerning in Windows remains notoriously bad.

In the early 1990s, IBM recreated BookMaster on top of SGML as IBM ID Doc, and similar ideas gave rise to DocBook, an international standard supported by commercial and open source software to create and publish books. O'Reilly uses DocBook. LinuxDoc was another SGML application aiming for simplicity.

In 1998, James Clark, the developer of the major SGML parser, proposed XML to make parsing easier, leading to the adoption of XML in other domains besides text processing.

IBM felt the need to reuse parts of documents as HTML pages, and reconceptualized documentation as a set of topics instead of a set of books. The internal DITA system was open-sourced and became an international standard in 2004.


LinuxDoc in 1997

SGML has a number of features to simplify the work of writers.

This slide shows the beginning of a progress report in Portuguese that I wrote in July 1997, using LinuxDoc.

If you know XML, you may suspect that the document has fewer tags (such as <title>) than expected.

First, the notation <tt/news/ is a shorthand for inline tags, equivalent to <tt>news</tt>.

Then, the tags <title>, <subtitle>, <author>, and <date> are apparently not closed. What happens is that the SGML parser uses the grammar of a LinuxDoc article to deduce the relative structure of the tags. The <subtitle> is nested inside the <title>, but <author> and <date> are not:

<title>1º Relatório de progresso
  <subtitle>Intervenções no Serviço de Informática,
            quinzena de 9 a 20 de Julho de 1997</subtitle></title>
<author>Joaquim Baptista,
        <tt><htmlurl url="mailto:px@acm.org" name="px@acm.org"></tt></author>
<date>21 de Julho de 1997</date>

What you will probably not suspect is that many opening tags are also implicit. In fact, the SGML parser infers the structural <linuxdoc> tag around the <article> tag, and the <titlepag> tag inside it.

<!doctype linuxdoc system>
<linuxdoc>
  <article opts="sgmlhack">
    <titlepag>
      <title>1º Relatório de progresso
        <subtitle>Intervenções no Serviço de Informática,
                  quinzena de 9 a 20 de Julho de 1997</subtitle></title>
      <author>Joaquim Baptista,
              <tt><htmlurl url="mailto:px@acm.org" name="px@acm.org"></tt></author>
      <date>21 de Julho de 1997</date>
      <abstract>
      Actualizei o sistema operativo da <tt>news</tt> (Solaris 2.5.1) com
      39 patches, mas ainda faltam 3, anunciados mas não disponíveis.

      Instalei o <tt>perl5</tt>, o <tt>gzip</tt> e o <tt>lynx</tt> na new, e
      recompilei o INN. Para completar a intervenção, faltou instalar
      e testar o INN.


Plain text formats

Plain text formats are less ambitious than semantic markup.

They aim to add just a little emphasis and structure to plain text. Sometimes, that little structure is enough.


History of plain text formats

Plain text formats draw on the informal conventions used in text-only email.

In 1992, TidBITS released issue #100 as structured plain text, with a program to browse it. The format, setext, was devised by Ian Feldman, and the issues circulated through the Info-Mac archives and mailing list.

Then, the first wiki adopted similar conventions to represent text. More powerful wikis such as TWiki continued the trend.

In 2002, the trend exploded into a number of interesting alternatives. MediaWiki became the base of Wikipedia and introduced the concept to the masses. Docutils with reStructuredText pursued formal semantics. Textile approached the HTML structure. AsciiDoc approached the structure of DocBook.

The Emacs tribe caught up with the trend with Org-mode.

Markdown tried to do "just enough" and, by virtue of its apparent simplicity, gained wide support in tools.

In 2009, PlantUML applied text conventions to Unified Modeling Language (UML) diagrams rather than to prose.


Markdown

For example, here is Byword editing a bit of Markdown. Note how it emphasizes the markup (title, list, and link).

Byword is a product of Metaclassy, a very shy company from Coimbra.


AsciidocFX

So, let's turn to our major reason to be here today: AsciidocFX!

AsciidocFX customizes the open-source Atom editor to support:

  • Asciidoctor, a refined reimplementation of AsciiDoc.
  • Math, in a LaTeX way.
  • PlantUML, integrating UML diagrams seamlessly in documents.
  • Charts of various kinds.
  • ... and much more.

The AsciidocFX window is divided into three panes:

  • The center pane is where you write.
  • The left pane navigates the filesystem, the current document, or recent documents.
  • The right pane previews the current document.


AsciiDoc advantages

Aiming to be a front-end for DocBook, AsciiDoc retains some sophisticated features:

  • Admonitions such as Warnings and Cautions, required for some technical writing.
  • Nested lists, including description lists.
  • Tables with a readable syntax, including paragraphs and lists in table cells.
  • Anchors and cross-references within the document.
  • Display blocks with titles (for instance, for sidebars, figures, and examples).
  • Macros (like cpp) to avoid repetition.
  • Comments within the document (but comments sometimes influence the structure of the document).
  • Blocks of code with callouts, a way to explain bits of the code.
  • Include other files with AsciiDoc documents or code samples.
  • Variables.
  • Conditional text with cpp-like if-then-else macros.
  • Blocks with types, which serve as an extension point. For example, for PlantUML diagrams.
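
Variables and conditional text are most useful when a build sets them from outside the document. The sketch below is a minimal illustration, not part of AsciidocFX: it assumes the standalone asciidoctor command-line tool, and the file names and the "internal" attribute are hypothetical. It only builds the command lines, so nothing runs unless you pass them to subprocess.run.

```python
# A hypothetical source that hides a note unless "internal" is set.
SAMPLE = """\
= Service Guide
:product: Example Service

{product} accepts orders over HTTPS.

ifdef::internal[]
NOTE: Internal only: ask the platform team about rate limits.
endif::[]
"""


def build_command(source, output, attributes=()):
    """Return the asciidoctor argv that renders source to output."""
    command = ["asciidoctor"]
    for name in attributes:
        command += ["-a", name]  # -a sets a document attribute
    command += ["-o", output, source]
    return command


public = build_command("guide.adoc", "guide.html")
internal = build_command("guide.adoc", "guide-internal.html", ["internal"])
print(" ".join(public))
print(" ".join(internal))
```

Both commands read the same source; only the second one sets the attribute that reveals the note.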


Demo

AsciidocFX is a Java application supported on Mac, Windows, and Linux.

I use the PlantUML extension for quick diagrams, and the extension requires Graphviz, which you must install separately.
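
Before a demo, it may help to check that Graphviz is actually reachable. The following is a minimal sketch using only the Python standard library; it assumes that Graphviz installs its usual "dot" program, and that the PlantUML bundled with AsciidocFX honours the GRAPHVIZ_DOT environment variable as standalone PlantUML does. The helper name find_graphviz_dot is hypothetical.

```python
import os
import shutil


def find_graphviz_dot():
    """Return the path to Graphviz's dot program, or None if not found."""
    # Assumption: PlantUML looks at GRAPHVIZ_DOT before searching the PATH.
    configured = os.environ.get("GRAPHVIZ_DOT")
    if configured and os.path.isfile(configured):
        return configured
    return shutil.which("dot")


dot = find_graphviz_dot()
print(dot if dot else "Graphviz not found: install it before using PlantUML blocks")
```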


Technical Writing bag of tricks

I used some AsciidocFX features to govern and streamline my technical writing work.

  • Comments with notes to myself, to recall useful stuff when I get back to some part of the document.
  • Admonitions with TODO tasks. In late stages, print the document and the TODO tasks stand out.
  • Back-of-the-book indexes with TODOs, including people to talk to. When talking to someone, quickly find the issues that involve them.
  • Conditional text for your TODO notes or auxiliary information structures. For example, in a multi-file AsciiDoc document, the name of the current file!
  • Embedded PlantUML to quickly draft possible drawings.
  • Includes combined with ifdef to publish multiple outputs from the same sources.
  • Generate AsciiDoc from API references, for publishing and to combine with other docs.
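
The last trick, generating AsciiDoc from an API reference, can be sketched in a few lines. This is an illustration under stated assumptions, not the generator used at Farfetch: the API description, its field names, and the "draft" attribute are all hypothetical. The TODO admonition is wrapped in conditional text, so it appears only in draft builds.

```python
def endpoint_to_asciidoc(endpoint):
    """Render one endpoint description as an AsciiDoc section."""
    lines = [f"== {endpoint['method']} {endpoint['path']}", "", endpoint["summary"], ""]
    params = endpoint.get("params", [])
    if params:
        lines += ['[cols="1,1,3", options="header"]', "|===", "|Name |Type |Description"]
        for param in params:
            # No escaping is attempted: a "|" in a description would break the table.
            lines.append(f"|`{param['name']}` |{param['type']} |{param['description']}")
        lines += ["|===", ""]
    if endpoint.get("todo"):
        # Shown only when the document is built with the "draft" attribute set.
        lines += ["ifdef::draft[]", f"WARNING: TODO {endpoint['todo']}", "endif::[]", ""]
    return "\n".join(lines)


# Hypothetical API description, for illustration only.
API = [
    {
        "method": "GET",
        "path": "/orders",
        "summary": "Lists recent orders.",
        "params": [
            {"name": "status", "type": "string", "description": "Filter by order status."},
        ],
        "todo": "confirm the default page size.",
    },
]

document = "= Orders API\n\n" + "\n".join(endpoint_to_asciidoc(e) for e in API)
print(document)
```

The generated text can be saved to a file and pulled into a larger document with an include.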


AsciidocFX homeland

I find AsciidocFX a flexible technical notepad that feels especially at home in several domains:

  • More complex ReadMe files in GitHub.
  • O'Reilly book authoring. Authors can write without learning DocBook XML and buying a proprietary editor.
  • An alternative to LaTeX. Use the same math, but use the saner AsciiDoc syntax for everything else.
  • Software specifications with UML.
  • Reports with embedded charts.
  • Quick slides using Reveal or Deck.
  • And AsciidocFX is portable (Windows, Mac, Linux), because of Java.


Crippling AsciiDoc issues

However, AsciiDoc has a number of crippling issues.

  • You run into situations where you cannot express what you want, especially if you try to emphasize code samples.
  • Comments change how the next line parses. For example, comments break a list into two separate lists.
  • When including AsciiDoc files, leave blank lines at the end, or be surprised. No local variables.
  • if-then-else can quickly get hairy when you need to combine multiple conditions.

Why the crippling issues?

Under the hood, AsciiDoc parses lines with regular expressions.

  • For hand-crafted text, you can usually (but not always) work around the annoying limitations.
  • For generated text, you easily run into issues that are hard to detect. There is no general way to just escape characters that may have special meaning.
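
To see why line-by-line regular expressions cause this kind of trouble, consider a toy parser. It is a deliberately simplified sketch, not the code or the rules of AsciiDoc or Asciidoctor: it recognizes only "* item" lines, and any other line closes the open list.

```python
import re

LIST_ITEM = re.compile(r"^\* (.+)$")


def parse_lists(lines):
    """Group consecutive '* item' lines into lists, one line at a time."""
    lists, current = [], []
    for line in lines:
        match = LIST_ITEM.match(line)
        if match:
            current.append(match.group(1))
            continue
        # Any other line, including a comment, closes the open list.
        if current:
            lists.append(current)
            current = []
    if current:
        lists.append(current)
    return lists


plain = ["* one", "* two", "* three"]
commented = ["* one", "* two", "// TODO: check item three", "* three"]
generated = ["* is the multiplication operator"]

print(parse_lists(plain))      # a single list
print(parse_lists(commented))  # the comment splits the list in two
print(parse_lists(generated))  # generated prose misread as a list item
```

In this toy, the comment produces no output of its own, yet it still changes the structure around it. The last case shows the problem with generated text: a sentence that happens to start with "* " matches the list pattern, and the toy has no escape mechanism to say otherwise.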


AsciidocFX misbehaves

AsciidocFX adds its own issues:

  • With multiple documents, the AsciidocFX TOC can be wrong and useless.
  • AsciidocFX keeps a mysterious cache of generated images that confuses you when editing embedded PlantUML diagrams. A workaround is to keep renaming the images.
  • After some time, builds start to just fail until you relaunch AsciidocFX.


Summary: mostly usable

In summary, AsciidocFX is helpful, but not an industrial-strength solution that you can rely on.

About AsciiDoc, the text format:

  • Fairly complete structures, suitable for technical writing.
  • Extensible, so it conveniently integrates other capabilities.
  • Fragile at corner cases, requiring attention to spot issues, and then workarounds or rewrites to avoid them.

About AsciidocFX, the editor:

  • Open source, so cheap to adopt.
  • Sophisticated environment with a single install, convenient writing environment, powerful integration of tools.
  • Fragile on multi-file documents, on long writing sessions, on frequent tweaks to PlantUML diagrams.


Thanks!

For the record, Farfetch moved on to adopt DITA as the back-end format for API documentation.


Attachments and links

Slides
17 slides used in the presentation, as PDF.
asciidocfx.com
AsciidocFX home page with instructions for download and use.
