CMFC
tirimid
2024-10-12 (rev. 2025-04-23)

What is CMFC?

CMFC is a compiler for a markup language I designed, CMF. This page will serve as both documentation and a tutorial surrounding both of them, should anybody want to use them. CMFC was created specifically for my personal website (the one you are on right now) and does not have any particularly advanced features. Still, I think it is good enough for simple articles and sharing my basic thoughts on the internet.

Of course, this leads to the question of "why not just use Markdown and Jekyll build with GitHub pages?". The answer to this question is twofold - first of all, it's cool to make things yourself. Second of all, it irritates me how, in Markdown, paragraphs always have to begin on a double-newline separation. I would much rather write my plaintext document files in a style more similar to academic books, where the preferred style is generally no linebreaks but use of indentation to mark paragraph starts. Similarly, I'd prefer for blockquotes to be indented sections rather than just double-newline separated blocks with a > in front of them. Those are the main reasons I created CMF. And CMFC was created so that I could write documents in CMF and have them manifest as HTML files on my website.

CMF stands for Custom Markup Format, and CMFC stands for Custom Markup Format Compiler. I suggest using the .cmf file extension for CMF files.

Installation

CMFC is free software, licensed under the permissive MIT license. The source code for it is hosted on GitHub, and it is designed to work on Linux machines. Windows machines would probably work, but I make no guarantees. To prepare CMFC, clone the repository, Make it, then install it to your system as follows:

$ git clone https://github.com/tirimid/cmfc $ cd cmfc $ make # make install

The cmfc binary will be installed to /usr/bin, and can be uninstalled at any moment by going back to the cloned repository's directory and running this command:

# make uninstall

The basic functioning of CMFC

From an end-user's perspective, you only need to know the following sequence of events, executed by the cmfc binary when invoked:

  1. Read command line arguments
  2. Read in the contents of the input file and (if provided) style and docdata files
  3. Parse the docdata file (if provided), ignoring everything but DOC directives
  4. Parse the input file and construct a document AST
  5. Write out the HTML generated from the AST into the output file (or if it isn't specified, standard output), along with the contents of the style file as the document's <style>

Based on this simple process, we can understand the meaning of the command line flags which the program will accept:

Apart from the command line options, CMFC will also take one mandatory argument, the input file from which to read the CMF.

Example usage of CMFC

Say you have an input file, input.cmf, and you want an output file, output.html, with a stylesheet, style.css, applied to it. In order to create it, you would invoke the following command:

$ cmfc -o output.html -s style.css input.cmf

Pretty simple.

CMF tutorial

Paragraphs

Paragraphs can either be initiated through a double-linebreak or by placing four consecutive space characters at the beginning of a line. I prefer to use the four consecutive spaces, since that's pretty much the reason CMF/CMFC were created to begin with.

Example paragraph:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

Titles

Titles of various sizes can be created by using = immediately after a double-linebreak. Similar to Markdown, the number of =s corresponds to the HTML header size. That is, more =s means a smaller header (unless the style file specifies otherwise).

Example titles:

=Size 1 title ==Size 2 title ===Size 3 title

Ordered and unordered lists

Ordered and unordered lists can be created by using a # or a * immediately after a double-linebreak, respectively. This was inspired by Wiki syntax, which will often do something similar. The level of nesting is specified by the number of consecutive #s or *s. Each following element in the list is specified by placing a #/* at the beginning of the line.

Example ordered list:

#Ordered list item 1 #Ordered list item 2 ##Nested ordered list item 1 ##Nested ordered list item 2 ###Double-nested ordered list item 1 ###Double-nested ordered list item 2 #Ordered list item 3

Example unordered list:

*Ordered list item 1 *Ordered list item 2 **Nested ordered list item 1 **Nested ordered list item 2 ***Double-nested ordered list item 1 ***Double-nested ordered list item 2 *Ordered list item 3

Images

Images are a standalone thing in CMF, unlike in Markdown, where they are merely another text element like a link. As such, they must be declared separate from other text. To add an image to the document, you must add a !() immediately after a double-linebreak. After the !(), add the link to the image you want to show up in the document.

Example image:

!()../res/gamingfumo.png

Blockquotes

Blockquotes are meant to look how they do in academic books. To acheive such an effect, every line of a blockquote is indented by six spaces. I also suggest placing linebreaks between paragraphs and blockquote text. To initiate a blockquote, simply start a line with six spaces. Consecutive blockquote lines do not need to be similarly indented, but you should still do so for readability and visual clarity.

Example blockquote:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

Long code

Long code snippets, like in Markdown, are enclosed in triple backticks. Unlike Markdown, however, you cannot use more than three backticks in a row to require more than three consecutive backticks in the long code snippet termination line. The long code snippet should be placed immediately after a double-linebreak.

Example long code snippet:

``` this is a long code snippet ```

Tables

Tables are composed in plaintext form of cells, horizontally separated using dashes (-), and vertically separated using pipes (|). The horizontal separators are drawn overtop the vertical separators. To begin a table, place three consecutive -s immediately after a double-linebreak. Note that a single cell can take up more than one line, unlike in Markdown (another problem I have with it).

Example table:

--------------------------------------------------- |Lorem ipsum dolor sit|consectetur adipiscing elit| |amet | | --------------------------------------------------- |sed do eiusmod tempor|Ut enim ad minim veniam | |incididunt ut labore | | |et dolore magna | | |aliqua | | --------------------------------------------------- |quis nostrud |Duis aute irure dolor in | |exercitation ullamco |reprehenderit in voluptate | |laboris nisi ut |velit esse cillum dolore eu| |aliquip ex ea commodo|fugiat nulla pariatur | |consequat | | ---------------------------------------------------

Text formatting

Text formatting in CMF is done by embedding format modifiers within formattable text (as opposed to raw text, where formatting cannot be done). The supported format modifiers are as follows:

Modifier Usage Remarks
Format escape \C Interpret C (text / raw) as a character and not a modifier or something else
External link @[LINK|TEXT] TEXT (text) will become a hyperlink pointing to LINK (raw)
Footnote link [^LINK|TEXT] TEXT (text) will become a footnote with a reference pointing to the anchor specified by link (raw)
Footnote anchor [^LINK] Create a page anchor called LINK (raw) which can be accessed by clicking on a corresponding footnote link
Inline code `TEXT` Interpret TEXT (text) as an inline code snippet stylistically
Bold **TEXT** Interpret TEXT (text) as stylistically bold
Italic *TEXT* Interpret TEXT (text) as stylistically italicized

Example formatting:

\* @[https://gentoo.org|Gentoo Linux] Lorem[^anchor|Lorem ipsum dolor sit amet] [^anchor]anchor: Lorem ipsum dolor sit amet Lorem `ipsum` **dolor** *sit* amet

Collected CMF example

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

Size 1 title

Size 2 title

Size 3 title

  1. Ordered list item 1
  2. Ordered list item 2
    1. Nested ordered list item 1
    2. Nested ordered list item 2
      1. Double-nested ordered list item 1
      2. Double-nested ordered list item 2
  3. Ordered list item 3
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
this is a long code snippet
Lorem ipsum dolor sit amet consectetur adipiscing elit
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua Ut enim ad minim veniam
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur

*

Gentoo Linux

Lorem[Lorem ipsum dolor sit amet]

anchor: Lorem ipsum dolor sit amet

Lorem ipsum dolor sit amet

DOC directives and docdata

In publishing articles on this website, I have found myself needing more than just a nice way to lay out my writings. It would also be preferable if I could issue commands that specify meta-information about an article, such as what the title or author should be generated as. For this reason, CMF also supports such a system, here called DOC directives.

These DOC directives are called that because all of them begin with DOC, which stands for document. A DOC directive must be placed at the start of a line and be provided with the necessary argument in order to work. If an unknown DOC directive is encountered, CMFC will complain about that fact and refuse to build your article.

Here are all the supported directives:

Directive Usage Remarks
Title DOC-TITLE TEXT Specify document title as TEXT (text)
Author DOC-AUTHOR TEXT Specify document author as TEXT (text)
Created DOC-CREATED TEXT Specify document creation time as TEXT (text)
Revised DOC-REVISED TEXT Specify document revision time as TEXT (text)
License DOC-LICENSE TEXT Specify document license information as TEXT (text)
Raw DOC-RAW-TEXT N Make CMFC parse formatted text as raw text based on if N is 0 or 1 (0 = format, 1 = raw)

Since sometimes you will want to modify the publishing information for a bunch of related articles all at the same time, CMF has a mechanism called the docdata file to facilitate this. Instead of adding all of the above information into every CMF file, create a docdata file with all the common DOC directives. When compiling a CMF file using CMFC with the -d FILE option, FILE will be read and all of its DOC directives will be processed prior to compiling the input file. This mechanism makes it easy to batch-update things like license or authorial information.

This work by tirimid is licensed under CC BY-SA 4.0