Flavored Markdown A repository of variant Markdown documentation

CommonMark

(The official specification for CommonMark is at http://jgm.github.io/stmd/spec.html. Because it is written in HTML that cannot be pasted into a Markdown-compatible Jekyll page, this page instead describes CommonMark in terms of the vfmd specification.)

CommonMark syntax vs. vfmd syntax

This page compares the syntaxes of CommonMark and vfmd from the perspective of a Markdown document writer. The background for this comparison is this Hacker News conversation between John MacFarlane, the primary author of the CommonMark spec, and Roopesh Chander, the author of the vfmd spec.

Here are links to the syntaxes being compared: CommonMark and vfmd.

Philosophy

While both CommonMark and vfmd attempt to provide an unambiguous specification for Markdown, their goals and priorities appear to be somewhat different:

  1. Readability vs. ease-of-parsing

    One of CommonMark’s stated goals is to make the syntax “easier to parse”.

    For vfmd, ease-of-parsing is not a goal at all, so it does not constrain the design of the syntax. Readability and intuitive, consistent behavior are prioritized over ease-of-parsing.

  2. Extras

    As of this writing, there doesn’t seem to be a documented policy on which syntax constructs CommonMark intends to eventually cover in its spec, which it intends to support as extensions, and how the extensions should be integrated.

    CommonMark already includes fenced code blocks, and might possibly include other constructs in the future. There’s also some discussion on CommonMark extensions.

    vfmd specifies only the syntax constructs that were defined in John Gruber’s original Markdown, and includes information on how additional syntax constructs can be added while remaining in control of how they affect the handling of the core constructs.

    vfmd aims to provide a core-syntax-only spec that enables different Markdown flavours to behave consistently for the core syntax constructs, while allowing them to diverge in their own specialized syntax constructs.

Similarities

There are some aspects in which both CommonMark and vfmd have picked similar ways to diverge from the original Markdown syntax:

  1. Lists:
    • Both CommonMark and vfmd use a vertical-alignment-based list content indentation instead of the 4-space rule
    • In ordered lists, the first number is used as the starting number of the list
    • In unordered lists, changing the list bullet character starts a new list
    • Two blank lines can be used to end a list (and all sublists)
  2. HTML blocks:
    • In both CommonMark and vfmd, an HTML div (or similar) block ends at the next blank line, rather than at the next closing div tag
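
Two of the list rules above are mechanical enough to sketch in a few lines of Python. This is only an illustration, not code from either spec: the function names are hypothetical, the input is assumed to be a flat run of non-empty list lines, and only the "1." style of ordered marker is handled.

```python
import re
from itertools import groupby

def split_on_bullet_change(lines):
    """Group consecutive bullet lines; a new bullet character opens a new list.

    Assumes every line is non-empty and starts with its bullet character.
    """
    return [
        (bullet, [line[1:].strip() for line in group])
        for bullet, group in groupby(lines, key=lambda line: line[0])
    ]

def ordered_list_start(first_line):
    """Return the number on the first item of an ordered list, or None."""
    match = re.match(r"(\d+)\.\s", first_line)
    return int(match.group(1)) if match else None

print(split_on_bullet_change(["* a", "* b", "- c", "- d"]))
# [('*', ['a', 'b']), ('-', ['c', 'd'])]
print(ordered_list_start("3. third things first"))
# 3
```

A renderer following either spec would use the second value as the starting number of the list instead of always starting at 1.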

Major syntax differences

These differences are “major” in the sense that making one spec behave like the other could require significant changes to it.

Block elements

  1. “Loose” and “tight” in lists:

    In CommonMark, “loose” and “tight” apply to the whole list, while in vfmd, they apply to individual list items.

    For example, consider:

    * One
    * Two
    
    * Three
    
    * Four
    
    * Five
    * Six
    

    CommonMark considers the above list to be “loose”, and for HTML output, wraps all list items in <p> tags.

    vfmd considers only the list items “Three” and “Four” to be “loose”, and wraps only those list items in <p> tags.

  2. Start of HTML blocks:

    CommonMark requires that HTML-block-starting tags (like div tags) be placed at the start of a line; vfmd handles HTML block tags anywhere in a line, and includes any previous text as part of the block.

    For example, consider:

    Some text and
    then suddenly, a <div> *starts*
    
    Inside the `div`, but separated by blank lines
    
    </div>
    

    CommonMark treats the first chunk as a normal paragraph, and the HTML output from CommonMark is not valid HTML:

    <p>Some text and
    then suddenly, a <div> <em>starts</em></p>
    
    <p>Inside the <code>div</code>, but separated by blank lines</p>
    
    </div>
    

    vfmd recognises the <div> even if it’s not at the start of the line, and the HTML output from vfmd looks like:

    Some text and
    then suddenly, a <div> *starts*
    
    <p>Inside the <code>div</code>, but separated by blank lines</p>
    
    </div>
    
  3. HTML blocks that can have blank lines:

    vfmd allows pre / script / style elements to contain blank lines, but only if well-formed (i.e. iff matching opening-closing tags are found). CommonMark does not treat these elements specially, and recommends using &#10; in place of a blank line within pre elements.

    For example, for the input:

    <pre>
    'Twas brillig, and the slithy toves
        Did gyre and gimble in the wabe;
    
    All mimsy were the borogoves,
        And the mome raths outgrabe.
    </pre>
    

    CommonMark outputs the HTML:

    <pre>
    'Twas brillig, and the slithy toves
        Did gyre and gimble in the wabe;
    <p>All mimsy were the borogoves,
    And the mome raths outgrabe.</p>
    </pre>
    

    while vfmd outputs:

    <pre>
    'Twas brillig, and the slithy toves
        Did gyre and gimble in the wabe;
    
    All mimsy were the borogoves,
        And the mome raths outgrabe.
    </pre>
    

    See also: http://talk.commonmark.org/t/139/.
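
Returning to the first item in this list, the loose/tight difference can be sketched in Python. The whole-list rule follows the CommonMark behaviour described above. The per-item rule is an assumption inferred from the example (an item is loose only when blank lines separate it from both of its neighbours) and is not quoted from the vfmd spec. All function names are hypothetical, and the input is assumed to be a flat list with at least one item.

```python
def gaps_between_items(lines):
    """gaps[i] is True when a blank line separates item i from item i + 1."""
    gaps = []
    seen_item = False
    pending_blank = False
    for line in lines:
        if not line.strip():
            pending_blank = True
            continue
        if seen_item:
            gaps.append(pending_blank)
        seen_item = True
        pending_blank = False
    return gaps

def loose_whole_list(gaps):
    """CommonMark-style: one blank line between items makes every item loose."""
    return [any(gaps)] * (len(gaps) + 1)

def loose_per_item(gaps):
    """Assumed vfmd-style rule: loose only if blank lines sit on both sides."""
    before = [False] + gaps
    after = gaps + [False]
    return [b and a for b, a in zip(before, after)]

sample = ["* One", "* Two", "", "* Three", "", "* Four", "", "* Five", "* Six"]
gaps = gaps_between_items(sample)
print(loose_whole_list(gaps))  # [True, True, True, True, True, True]
print(loose_per_item(gaps))    # [False, False, True, True, False, False]
```

With the sample list from item 1, the first function marks all six items as loose, while the second marks only “Three” and “Four”, matching the two outputs described there.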

Span elements

  1. Emphasis:

    • Intra-word emphasis:

      CommonMark allows using ‘*’ for intra-word emphasis, but disallows using ‘_’, while vfmd disallows any intra-word emphasis.

    • Nested emphasis of the same kind:

      CommonMark is somewhat inconsistent in the treatment of em-within-em and strong-within-strong emphasis:

      • *foo *bar** is em-within-em
      • **foo* bar* is normal text without any emphasis

      vfmd treats both the above as em-within-em.

      The same difference can be seen for the strong-within-strong scenario since CommonMark treats any sequence of four or more *s or _s as normal text. vfmd treats * and _ sequences as potential emphasis indicators irrespective of how long they are.

  2. Span-level HTML:

    When handling span-level HTML, CommonMark recognizes HTML tags but not HTML elements, resulting in some invalid HTML being output.

    For example:

    • *foo <u>bar* baz</u>* outputs <em>foo <u>bar</em> baz</u>*
    • *<p>foo</p>* outputs <em><p>foo</p></em>

    vfmd recognizes span-level HTML elements and is aware of HTML nesting rules. In the above cases, vfmd prefers ignoring some Markdown styling to outputting invalid HTML.

    • *foo <u>bar* baz</u>* outputs <em>foo <u>bar* baz</u></em>
    • *<p>foo</p>* outputs *<p>foo</p>* itself
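
The outputs listed under span-level HTML can be checked for nesting problems with a small script. The sketch below uses Python's standard html.parser module. The INLINE and BLOCK sets are small illustrative subsets chosen for this example, the class and function names are hypothetical, and void elements such as <br> are not handled, so this is not a general HTML validator.

```python
from html.parser import HTMLParser

INLINE = {"em", "strong", "u", "code", "a", "span"}  # illustrative subset
BLOCK = {"p", "div", "pre", "blockquote"}            # illustrative subset

class NestingChecker(HTMLParser):
    """Track open tags on a stack and record nesting problems."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # A block element opened while an inline element is still open.
        if tag in BLOCK and any(open_tag in INLINE for open_tag in self.stack):
            self.problems.append(f"block <{tag}> inside inline element")
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # A closing tag must match the most recently opened tag.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")

def nesting_problems(html):
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    checker.problems.extend(f"unclosed <{tag}>" for tag in checker.stack)
    return checker.problems

print(nesting_problems("<em>foo <u>bar</em> baz</u>*"))  # two problems reported
print(nesting_problems("<em>foo <u>bar* baz</u></em>"))  # []
print(nesting_problems("<em><p>foo</p></em>"))           # one problem reported
print(nesting_problems("*<p>foo</p>*"))                  # []
```

Under these assumptions, both CommonMark outputs above are flagged and both vfmd outputs pass.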

Minor syntax differences

These differences are “minor” in the sense that, in my assessment, each of these by itself doesn’t require a significant change in one spec to make it behave like the other spec.

Please note that this is not an exhaustive list.

Block elements

Span elements

Encoding

vfmd enforces UTF-8 encoding while CommonMark leaves it unspecified.

However, certain aspects of the CommonMark spec (like link reference lookups) seem to require a Unicode-based encoding and knowing what the encoding is. So in practice, CommonMark implementations might have to enforce a Unicode-based encoding, or auto-detect the encoding, or obtain the encoding as an input.
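
To make the encoding point concrete, here is a minimal Python sketch of the two behaviours. It assumes that matching link reference labels involves Unicode case folding and whitespace collapsing, which is my reading of the CommonMark spec rather than a quotation from it. Both function names are hypothetical.

```python
import re

def decode_strict_utf8(raw: bytes) -> str:
    """vfmd-style input handling: reject anything that is not valid UTF-8."""
    return raw.decode("utf-8")  # raises UnicodeDecodeError on invalid bytes

def normalize_label(label: str) -> str:
    """Fold case and collapse whitespace so reference labels can be compared.

    Case folding is defined on Unicode text, so the bytes must already have
    been decoded with a known encoding before this step can run.
    """
    return re.sub(r"\s+", " ", label.strip()).casefold()

text = decode_strict_utf8("[Straße  Foo]: /url".encode("utf-8"))
label = text[1:text.index("]")]
print(normalize_label(label) == normalize_label("STRASSE foo"))  # True
```

The comparison only works because the input was decoded first, which is why an implementation would need to know, detect, or enforce the encoding.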