Dan Allen, Nov 20, 2017

A Standard Project Structure for Documentation

In a previous article, we established that content is sovereign, which means the site generator needs to gather content from various repositories and branches. In order for the generator to identify, aggregate, and catalog content partitioned in this way, the repositories must present a consistent structure. Otherwise, the generator isn’t going to know which files represent pages or which pages should be grouped together.

Writers also need consistency so they don’t have to be constantly switching between disparate systems of organization.

Let’s explore the benefits a standard structure would bring to documentation projects, both for writers and tools. We’ll also share a structure we’ve found works well.

The benefits of a standard structure

A standard project structure (or project layout) is a practice commonly used in software development. In Java projects, Java code is organized under src/main/java and src/test/java. Android projects build on this structure by assuming the manifest file is located in src/main and the resource files are located under src/main/res. In Ruby projects, Ruby code resides in lib and test. And the list goes on.

So why is a standard project structure beneficial? When a software project uses a standard structure, a developer can come into the project and know right away where to find the source code, how to build it, and how to run it. The same goes for the tools the developer uses.

Wouldn’t that be a great practice to apply to a documentation project?

Writers are looking for a writing system that does not involve a steep learning curve. After all, their job is to write, not to play detective. Right now, the lack of a standard structure means writers must spend a lot of time learning the ropes of each documentation project they join.

If a documentation project had a standard structure, writers would know where to find files, from documents to supporting assets. Like their developer counterparts, they could immediately be productive upon onboarding. The structure would encourage more contributions since this instant productivity would extend to community members as well. And tools could be developed to work with this structure, further improving writers’ productivity and helping them to keep the content organized.

In the words of Michael Moore: that’s a good idea, so we should steal it.

A proposed standard

Here’s a documentation project structure we’ve put together based on our experience working with a myriad of documentation projects. It includes several sample files thrown in for context.

antora.yml
modules/
  ROOT/
    assets/
      attachments/
        sample-dataset.csv
      images/
        supporting-diagram.svg
    pages/
      account-setup.adoc
      index.adoc
      install-on-fedora.adoc
      _partials/
        tool-definition.adoc
    examples/
      install-commands.sh
  admin/
    assets/
    pages/
    examples/
  api/
    assets/
    pages/
    examples/

The structure in this listing represents a documentation component. You can think of a documentation component as a documentation project.

We know the structure in the example above represents a documentation component because of the presence of an antora.yml file.

antora.yml
<documentation component here>

When we find an antora.yml file in a repository, we expect to find the standard structure of a documentation component below it. Thus, the documentation can live anywhere in the repository. It also means it can share the same repository as the software it documents. This project structure is then repeated in each branch of each repository that hosts a documentation component.
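
The discovery rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the site generator's actual implementation: the function name find_components is hypothetical, and skipping hidden folders is an assumption made to keep the walk fast.

```python
import os
from pathlib import Path


def find_components(repo_root):
    """Return the folders under repo_root that contain an antora.yml file."""
    components = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Assumption: hidden folders such as .git never hold documentation.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        if "antora.yml" in filenames:
            components.append(Path(dirpath))
            # The component owns everything below it, so stop descending.
            dirnames.clear()
    return components
```

Run against a checkout in which the documentation lives in a docs folder next to the software, this would return that one folder, wherever in the repository it sits.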

The antora.yml file contains information about the component, such as its name, title, version, and navigation data.

name: component-a
title: Component A
version: '1.0'
nav:
- modules/admin/nav.adoc

All the other files in the component reside in the modules folder. Let’s open that folder and have a look inside.

Modules

A documentation component contains one or more modules. A module is a discrete bundle of content within a component.

admin/
  assets/
    attachments/
      signing-key.gpg
    images/
      web-console-dashboard.png
  pages/
    index.adoc
    dashboard-tour.adoc
    backup/
      index.adoc
      scheduling.adoc
    security/
      index.adoc
    _partials/
      prerequisite-checklist.adoc
  examples/
    ldap.conf

Since documentation contains more than just text, the module itself is composed of a hierarchy of folders. These folders are used to organize files, first by content type, then by topic (or perhaps tag or category).

Text documents that represent pages go in the pages folder. Images go in the assets/images folder. Examples (often code snippets) go in the examples folder. Other content types are organized in a similar fashion. Each one of these folders can have an arbitrary depth of topic folders that are used to group files to make them easier to manage and navigate.
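
Because the content type is encoded in the path, a tool can classify a file without opening it. The following Python sketch assumes the folder names shown in the listings in this article. The function name classify and the type labels it returns are hypothetical.

```python
from pathlib import PurePosixPath


def classify(path):
    """Return (module, content_type, relative_path) for a path inside a
    component, or None if the file is not in a recognized folder."""
    parts = PurePosixPath(path).parts
    if len(parts) < 4 or parts[0] != "modules":
        return None
    module, rest = parts[1], parts[2:]
    if rest[0] == "pages":
        # Partials live in a reserved folder inside pages.
        if rest[1] == "_partials":
            kind, rel = "partial", rest[2:]
        else:
            kind, rel = "page", rest[1:]
    elif rest[0] == "examples":
        kind, rel = "example", rest[1:]
    elif rest[0] == "assets" and len(rest) > 2 and rest[1] in ("images", "attachments"):
        kind, rel = rest[1][:-1], rest[2:]  # "image" or "attachment"
    else:
        return None
    if not rel:
        return None
    # Topic folders are preserved in the relative path.
    return module, kind, "/".join(rel)
```

For example, classify("modules/admin/pages/backup/scheduling.adoc") yields the module admin, the type page, and the relative path backup/scheduling.adoc, while a file outside the recognized folders yields None.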

When a writer is working on the content, the module becomes the writer’s primary workspace. The writer doesn’t have to go looking elsewhere to find files that belong together. This arrangement mirrors how software developers work on source code.

Why modules?

You might be contemplating one or more of the following questions:

Do we need all this structure?
Why have modules?
Isn’t a component enough?

One thing we know for sure about content is that it multiplies, often fast and unpredictably. As more content is created, you’ll need this extra layer of organization to keep disparate files from ending up on top of one another.

You could argue certain components don’t need this much structure. In the case of a component that only has a single module, we could abbreviate the structure by folding the contents of the module folder directly into the component folder (i.e., the top-level folder). But what happens when the project expands in complexity, and you find yourself needing to add another module? You now have to go back and change the structure of the project in order to allocate space for it. That’s where the ROOT module comes in.

antora.yml
modules/
  ROOT/
    assets/
    pages/
    examples/
  ...

The first module in the structure shown in the listing above is named ROOT. Notice it’s in all uppercase. That’s because it’s special. The ROOT module contains all the content that’s directly associated with the component itself. When the content in the ROOT module is published, it gets promoted a level above the other modules, at the component root, hence the name. In contrast, the content of other modules resides in subfolders.

https://docs.project-name.com/
  component-a/
    index.html
    admin/
      index.html
    api/
      index.html
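
The promotion rule can be sketched as a small Python function. This is only an illustration: the function name published_path is hypothetical, pages are assumed to become .html files, and no version segment is included because the listing above does not show one.

```python
from pathlib import PurePosixPath


def published_path(component, module, page):
    """Map a page (path relative to the module's pages folder) to the
    location where it would be published."""
    html = PurePosixPath(page).with_suffix(".html")
    segments = [component]
    # ROOT content is promoted to the component root;
    # every other module gets its own subfolder.
    if module != "ROOT":
        segments.append(module)
    return "/".join(segments + [str(html)])
```

With this rule, index.adoc in the ROOT module of component-a lands at component-a/index.html, while index.adoc in the admin module lands at component-a/admin/index.html.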

If you start out with a ROOT module, even for a simple component, you can easily add more modules later and gradually redistribute the content without having to restructure the project. So while the extra structure seems like overkill now, in the long run, you’ll be glad you gave your content the space to grow.

Strong tooling

A compelling benefit of a standard structure is interoperability with tools. Software platforms that have established a standard project structure enjoy strong tooling. This is no accident.

Here are some examples of tools that could be created if we had a standard documentation structure:

wizards: Tools could provide wizards that set up a standard project structure based on a handful of user-defined values. Those wizards could also add, remove, or reorganize files, and know precisely where those files belong and how to handle references to them.
validation: Validation tools could be created to enforce this structure and warn you when you put a file in the wrong place.
migration: Migration tools could be made to pass content between components or modules, or move a module from one component to another. The migration tool could even handle moving content into this standard structure.
text editor / IDE: A text editor or IDE could recognize this structure and inject additional context into the AsciiDoc documents (such as defining document attributes) so the documents can be previewed with full fidelity inside the editor, despite being viewed outside of the context of the site generator.
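
As a rough illustration of the wizard idea in the list above, here is a Python sketch that scaffolds an empty component from a handful of values. The function name scaffold_component and its parameters are hypothetical. The folder names come from the listings in this article, and the generated antora.yml omits the nav key because no navigation files are created.

```python
from pathlib import Path

# Folders every module receives, per the structure proposed in this article.
MODULE_FOLDERS = ("assets/attachments", "assets/images", "pages/_partials", "examples")


def scaffold_component(target, name, title, version, modules=("ROOT",)):
    """Create an empty documentation component under target and return its path."""
    root = Path(target)
    root.mkdir(parents=True, exist_ok=True)
    for module in modules:
        for folder in MODULE_FOLDERS:
            (root / "modules" / module / folder).mkdir(parents=True, exist_ok=True)
    # Write the descriptor that marks this folder as a component.
    descriptor = f"name: {name}\ntitle: {title}\nversion: '{version}'\n"
    (root / "antora.yml").write_text(descriptor, encoding="utf-8")
    return root
```

Calling scaffold_component("docs", "component-a", "Component A", "1.0", modules=("ROOT", "admin")) would produce the skeleton shown earlier, ready for a writer to start adding pages.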

The more that tools can assume about how the content is organized, the more those tools can help you. If a standard project structure for documentation is adopted, tooling efforts around documentation would surely emerge.

Standard means referenceable

Another benefit of a standard structure hailing from the software world is that it provides a system for making references.

Several site generators support sections, otherwise known as collections, for partitioning the content. Although these mechanisms fill a role similar to modules, they have the drawback of cutting off references between pages. What we need is a contract with the site generator to track a reference to another source file all the way through to publishing. Wherever that other file ends up, you want the site generator to construct the correct link to it. And it should be possible to make a reference not only to a file in another module, but one in another version or even another component. The foundation of such a reference system is a standard, addressable structure.

By using a standard structure that consists of components and modules, files are in a reliable and, more importantly, referenceable location. That means we can use that structure with AsciiDoc and create source-to-source references that span modules or components. We’ll talk in depth about expanding AsciiDoc’s xref capabilities in a later article in this series.
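
To make the idea of an addressable structure concrete, here is a Python sketch of the coordinates such a reference would need to carry. Everything in it is a hypothetical illustration: the PageRef class, the resolve function, and the catalog mapping are assumptions, and the sketch deliberately does not propose any reference syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRef:
    """Hypothetical coordinates of a page; not the syntax of any real tool."""
    component: str
    version: str
    module: str
    page: str  # path relative to the module's pages folder

    def source_path(self):
        """Location of the page inside its component, per the standard structure."""
        return f"modules/{self.module}/pages/{self.page}"


def resolve(ref, catalog):
    """Return the published location of a referenced page.

    catalog is assumed to map each PageRef to the path the generator
    assigned to that page when it was published.
    """
    try:
        return catalog[ref]
    except KeyError:
        raise LookupError(f"unresolved reference: {ref}") from None
```

Because the coordinates identify the source file rather than the output file, the generator is free to decide where the page ends up and still construct the correct link to it.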

Tying up loose content

A standard project structure would bring many of the same benefits to documentation as it has to software.

No longer would there be confusion about where to find or append documentation files. Instead, writers would recognize the familiar structure and immediately be able to get down to work.

By offering a logical place for different types of files, the self-describing structure would help writers navigate the hierarchy and keep files organized. Tools can then be developed to work with this structure to streamline daily tasks and automated processes. Among these tools is the multi-repository, CD-friendly site generator we’re introducing in this series, which would be able to discover documentation components and know how to classify the files contained within them.

Whether we’re talking about the writers or the tools, all parties benefit from being able to rely on a standard project structure. Now that’s an idea worth stealing.

Special thanks to Sarah White and Lisa Ruff for their substantial revisions to this article. Additional thanks to Sarah White for creating the diagram featured in this article.