Content is Sovereign

Static site generators are popular tools for producing documentation sites. They consume content written in a lightweight markup language such as AsciiDoc, convert the content to HTML pages, and publish the output to a web server. Simple enough.

But there’s a crucial step missing in this process. Where does the site generator get the content to convert? And does it respect the sovereignty of that content, or is it fostering a content monolith?

That all leads us to ask the following question…​

Where does content live?

Most static site generators insist on centralizing the site’s source content. The generator reads local files that originate from the current branch of the site repository, then uses that content to make a site.

That means everyone in an organization—​writers, engineers, marketing, community contributors—​must write into the same branch of the same repository. The result is lots of content kings but zero sovereignty. Who owns the content? Who resolves conflicts? Who decides when to release? What we’ve observed is that contributors just end up stepping on each other’s toes and everyone’s left frustrated.

One branch, one repository. Sometimes even a single folder. That’s the first major obstacle we encounter when using a typical static site generator.

One workaround is to copy the disparate content and import it all into the sole branch of the site repository. But not only does that spawn a content monolith, the copies inevitably begin to diverge from their sources! And it grows hairier with every new product and product version that engineering releases.

If we weren’t hamstrung by a generator’s one branch, one repository rule, where would we store our documentation?

We could store it alongside the source code it describes. We could store the documentation for distinct software components in their own individual repositories. Perhaps we’d do both!

Multiple documentation sources

We want the freedom to store the documentation in as few or as many repositories as needed so that the most applicable workflow, policies, file organization, versioning scheme, and team permissions can be applied to each unique documentation component. What we’re talking about here is modular documentation.

So, if our documentation components could be set free from a site generator’s one branch, one repository mandate, how would all of the documentation components get incorporated into a single, unified site?

What gathers the content?

We’ve rebelled against the one branch, one repository rule and restored the sovereignty of our documentation components. But we still need to create a documentation site from our sovereign components. Who, or what, is responsible for gathering the content from these components?

To adapt a typical static site generator to this paradigm, we might add a shell script that clones a bunch of repositories and dumps the content into a source directory to feed the generator. Or, perhaps, we could use an extension for the site generator that hooks into its lifecycle and pulls down content at build time. Maybe we jury-rig a few scripts, a custom build, and a dash of manual copy-paste to grab the right content, at the right time, in the right order, and convince the generator to incorporate that content into the site. Maybe.
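
To make the workaround concrete, here is a minimal sketch of such a glue script. The repository names and paths are hypothetical, and local stand-in repositories are created up front so the sketch runs without network access; in practice the clone URLs would point at real documentation repositories.

```shell
#!/bin/sh
# Hypothetical glue script: clone each documentation repo and dump its
# content into the single source directory a conventional generator reads.
set -e
WORK=$(mktemp -d)

# Create two local stand-ins for remote documentation repositories so
# this sketch is self-contained.
for name in server-docs client-docs; do
  mkdir -p "$WORK/remotes/$name"
  ( cd "$WORK/remotes/$name" &&
    git init -q &&
    echo "= $name" > index.adoc &&
    git add index.adoc &&
    git -c user.name=docs -c user.email=docs@example.com commit -qm 'seed docs' )
done

# The workaround itself: copy every component into the generator's
# source directory before each build.
mkdir -p "$WORK/content"
for name in server-docs client-docs; do
  git clone -q "$WORK/remotes/$name" "$WORK/content/$name"
done

ls "$WORK/content"
```

Glue like this holds until the number of components, branches, and build variants grows; every new case means more ad hoc scripting.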

Or, we could rethink this whole process.

Let’s head back to the whiteboard and conduct a thought experiment. What do we want a static site generator for documentation to do? What scenarios do we want it to handle?

The site generator doesn’t own any content.

The site repository doesn’t contain any documentation content files. In this new workflow, the generator accepts an inventory of names and addresses of documentation components.

The site generator knows how to fetch documentation components.

Using the inventory, the generator goes out, finds the documentation components, and fetches all of these bundles of content, which consist of text, images, samples, and other supporting materials.

The start of this process might look something like this:

Cloning multiple documentation sources

This process could have potential. But before we go any further, let’s address another major obstacle: versions.

What about versions?

When we think about software, what immediately comes to mind? Versions.

Since software is versioned, so must be the content that documents it.

If we don’t align the content with the software version it’s documenting, clarity is lost and confusion fills the void. The user won’t know which version of the software the documentation they’re reading is for. The writer could inadvertently drop critical information about older software versions when updating the documentation for the latest release.

But versioning documentation raises more questions. Where should these versions live? How do we keep documentation for different software versions separate? Do we append a version number to the filename? Or do we put the files inside version folders?

Let’s think about how versions are handled in software. In software, we use branches and tags to specify and manage versions. Employing the practice of docs as code, we would then use branches to manage versions of the software documentation as well.

Identify documentation branches

We could even store the documentation files in the code branches, thus truly embracing the docs as code philosophy.

Consider what would happen if we didn’t use branches to specify versions, but instead used folders labeled with version numbers in a single branch. We’d have to duplicate all of the documentation for a software component every time a new version was released. And we’d have no easy way to compare, manage, and merge different instances of a document. So branches are clearly the way to go.
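
As a sketch of that branching scheme, here is what version branches in a documentation repository might look like. The repository and version numbers are illustrative:

```shell
#!/bin/sh
# Docs-as-code versioning sketch: one branch per documented release line.
set -e
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git -c user.name=docs -c user.email=docs@example.com \
    commit -q --allow-empty -m 'start the documentation'

# Cut a branch for each release line that needs documentation. A fix to
# the v3.0 docs is just a commit on the v3.0 branch, and changes can be
# compared or merged across versions with ordinary git commands.
for version in v2.5 v3.0 v3.1; do
  git branch "$version"
done

# A generator (or a writer) can enumerate the version branches:
git branch --list 'v*'
```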

Let’s continue the thought experiment and update our new site generator’s workflow now that we’ve considered documentation versions.

The site generator knows how to fetch documentation components and their branches.

Using the inventory, the generator goes out, finds the documentation components, and finds the branches of each documentation component. The generator then fetches all the materials from all the branches.

Fetching all the branches of a repository could be problematic. It’s doubtful we want the generator to use every branch, or to publish the content from every documentation branch to the production site. Some branches may even contain unedited drafts or content for unreleased software. On the other hand, writers and engineers working on a beta release probably want to see how their new documentation is looking.

To reconcile this conflict, and to avoid having to generate the whole documentation site just to preview one component, we want to be able to select different branches for different circumstances.

Identify matching documentation branches

But how does the site generator know what content to find and how much of that content to fetch?

How much should we take?

We don’t always want to generate everything, every time. If we’re able to gather content from sovereign documentation repositories, each with a set of version branches, we need to decide how much content we’re going to take.

At the most basic level, we need to tell the generator what documentation components we want, what branches we want, and where they’re located. We need a way to communicate instructions to the generator. And if we can communicate these instructions without having three to five years of scripting experience, that’s even better.

Taking another page from software development, this time the configuration as code practice, we’ll call this set of instructions a playbook.

Antora playbook diagram

Here’s a fragment of a playbook that defines the content sources for a documentation site:

  - url:
  - url: /home/writer/projects/server-docs
    branches: [v3.1, v3.0, v2.5]
  - url:
    branches: v2*
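
For context, here is a sketch of how such a fragment might sit inside a complete playbook file. The surrounding keys and the repository URL are illustrative, not a fixed schema:

```yaml
# Hypothetical playbook: the content sources for one documentation site.
site:
  title: Example Docs                                 # illustrative site metadata
content:
  sources:
  - url: https://github.com/example-org/server-docs   # hypothetical remote repo
    branches: [v3.1, v3.0, v2.5]                      # the versions to publish
  - url: /home/writer/projects/client-docs            # hypothetical local clone
    branches: v2*                                     # glob: every v2.x branch
```

Note that a source can be a remote repository URL or a local filesystem path, and branches can be named explicitly or matched with a pattern.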

Now that we’ve figured out how the site generator knows which documentation components and branches to fetch, let’s update our thought experiment.

The site generator reads and follows the instructions in a playbook.

In this new workflow, the generator receives and reads a playbook. The playbook is a configuration file that contains an inventory of names, addresses, and branches of documentation components.

A site generator designed to take instructions this way could be very versatile. Depending on the documentation site we need to generate, we could select or exclude specific versions of a documentation component.

For example, we might want to include beta documentation in an early release site:

  - url: /home/writer/projects/server-docs
    branches: [v4.0-beta, v3.1, v3.0, v2.5]

And, as the example above shows, the generator could also incorporate local content we’re working on so we can generate a site locally and preview our work.

Or, we might want to take the latest version of one documentation component and make a microsite for one software product while still including that content in our main documentation site.
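
A playbook for that microsite might be as small as this (the URL is again hypothetical):

```yaml
# Hypothetical microsite playbook: one component, latest version only.
content:
  sources:
  - url: https://github.com/example-org/server-docs   # hypothetical remote repo
    branches: v3.1                                    # just the latest release line
```

The main site’s playbook keeps listing every version of every component; only the instructions differ between the two sites, not the content.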

Having the ability to tell a site generator what content it should use gives developers and non-developers alike tremendous control over documentation sites. Now we can create a wide variety of sites from the same content. That’s content reuse at its finest. That’s also something that’s difficult (if not impossible) to do when all the content lives in the same branch of the same repository.

Take back your content!

Let’s review the new capabilities and process we’ve specced out for our improved documentation site generator so far.

The site generator doesn’t own any content.

The site repository doesn’t contain any content files. The content for the site is stored in as few or as many sovereign documentation components as our organization needs.

The site generator knows how to read and follow the instructions in a playbook.

The generator receives and reads a playbook. The playbook is a configuration file that contains an inventory of names, addresses, and branches of documentation components.

The site generator knows how to fetch documentation components and their branches.

Following the playbook configuration, the generator goes out, finds the specified documentation components, and finds the specified branches of each documentation component. The generator fetches all of the content—​text, images, samples, and other supporting materials—​from each selected branch and puts this content into a catalog. Only then does it advance to the processing step.

Here’s the whole process starting with the playbook and ending with the content organized in a catalog:

Organize files into catalog diagram

We believe this process fits with the way documentation is managed in the real world.

Documentation is written and stored in different places, is organized in different ways, and is owned and managed by different teams. Documentation isn’t centralized; it’s distributed!

With a site generator that can find and fetch content, content can reside in multiple repositories and organizations, in different hosts, and have different permissions and visibility. Teams can manage their own content, in their own way, keeping it current and under version control. Content could even come from local directories not under version control, including the worktree of a cloned repository.

Next challenges

We’re not done yet. In challenging and improving the typical documentation site generator’s capabilities and workflows, we’ve only scratched the surface.

Here are a few of the other obstacles we’ll tackle in this series of articles:

  • Code repositories typically adhere to a standard structure in order to facilitate contributions and CI/CD. What would a content repository’s structure look like?

  • If our content lives in multiple repositories and multiple branches, how do we create page references between documentation components? And while we’re at it, let’s make sure that when we create a new version we don’t have to update any page references.

  • If we decouple the content from the site generator, why not the UI too?

  • With the ability to read a playbook and fetch files from repositories and filesystems, the site generator has expanded beyond the scope of a generator. Such a monolithic piece of software won’t be able to respond to the changing information architecture, web technology, and infrastructure integrations our users and organizations need. Don’t worry, in our quest to eliminate content monoliths, we’ll embrace modularity in the software as well.

Special thanks to Sarah White and Lisa Ruff for their substantial revisions to this article. Additional thanks to Sarah White for creating the spectacular diagrams featured in this article.