Tokenization

Source Encoding

Ceramic source files are ASCII text. Non-ASCII bytes in string and character literals are passed through as opaque bytes.

Whitespace

ASCII space, tab, carriage return, newline, and form feed are whitespace. Whitespace separates tokens and is otherwise ignored.

Comments

Ceramic has two comment styles: block comments (/* … */) and line comments (// …). Both are treated as a single whitespace character by the lexer.

/* This is a block comment.
   It can span multiple lines. */

// This is a line comment.

Block comments do not nest; the first */ closes the comment.
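
As a rough illustration of the comment rules, the Python sketch below replaces each comment with one space and stops a block comment at the first */. The function name strip_comments is hypothetical, and the sketch deliberately ignores string and character literals, which a real lexer would have to skip over first.

```python
import re

# A block comment ends at the first "*/" (no nesting);
# a line comment runs to the end of the line.
_COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)

def strip_comments(source: str) -> str:
    """Replace every comment with a single space (hypothetical helper).

    Assumption: the input contains no string or character literals,
    so "//" and "/*" always start a comment.
    """
    return _COMMENT.sub(" ", source)
```

Because the match is non-greedy, an inner /* has no special meaning and the text after the first */ is lexed as ordinary source.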

Identifiers

Identifiers start with a letter, underscore (_), or question mark (?), followed by zero or more letters, digits, underscores, or question marks.

a    a1    a_1    abc123    a?    ?a    ?

The following are reserved keywords and may not be used as identifiers:

__ARG__ __COLUMN__ __FILE__ __LINE__ __llvm__ alias and as break case catch continue define else enum eval external false finally for forward goto if import in inline instance not onerror or overload private public record ref return rvalue static switch throw true try var variant while
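
The identifier rule and the keyword list can be checked with a short Python sketch. The name is_identifier is hypothetical, and "letter" is assumed to mean an ASCII letter, since source files are ASCII.

```python
import re

# Reserved keywords, copied from the list above.
KEYWORDS = frozenset("""
__ARG__ __COLUMN__ __FILE__ __LINE__ __llvm__ alias and as break case catch
continue define else enum eval external false finally for forward goto if
import in inline instance not onerror or overload private public record ref
return rvalue static switch throw true try var variant while
""".split())

# First character: letter, underscore, or question mark.
# Remaining characters may also be digits.
_IDENT = re.compile(r"[A-Za-z_?][A-Za-z0-9_?]*")

def is_identifier(text: str) -> bool:
    """True if text is a well-formed identifier and not a keyword."""
    return _IDENT.fullmatch(text) is not None and text not in KEYWORDS
```

Under these assumptions a lone ? is an identifier, while 1a and while are not.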

Integer Literals

Integer literals can be decimal or hexadecimal (prefixed with 0x). Underscores may appear after any digit for readability and have no effect on the value.

0    1    23    0x45abc    1_000_000    0xFFFF_FFFF
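
A minimal Python sketch of how such literals might be recognized and evaluated follows. The name parse_int_literal is hypothetical. The sketch assumes that the 0x prefix is lowercase only, that an underscore may not be the first character of the digit sequence, and that repeated or trailing underscores are accepted; the text above does not settle those last points.

```python
import re

# A digit first, then any mix of digits and underscores (assumed reading
# of "underscores may appear after any digit").
_DEC = re.compile(r"[0-9][0-9_]*")
_HEX = re.compile(r"0x[0-9A-Fa-f][0-9A-Fa-f_]*")

def parse_int_literal(text: str) -> int:
    """Return the value of a decimal or hexadecimal integer literal."""
    if _HEX.fullmatch(text):
        # Drop the "0x" prefix and the underscores before converting.
        return int(text[2:].replace("_", ""), 16)
    if _DEC.fullmatch(text):
        return int(text.replace("_", ""), 10)
    raise ValueError(f"not an integer literal: {text!r}")
```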

Floating-Point Literals

A decimal float literal is distinguished from an integer by including a . or an exponent (e/E). Hexadecimal float literals require a binary exponent (p/P). Underscores are allowed after any digit.

// Decimal
1.    1.0    1e0    1e-2    0.000_001

// Hexadecimal
0x1p0    0x1.0p0    0x1.0000_0000_0000_1p1_023
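
The two float forms can be sketched in Python as a pair of regular expressions plus a conversion step. The name parse_float_literal is hypothetical. The sketch assumes that a literal must start with a digit, that the hexadecimal exponent is written in decimal digits (as in p1_023 above), and that a hexadecimal fraction needs at least one digit after the point; none of these is stated outright in the text.

```python
import re

_DIGITS = r"[0-9][0-9_]*"
_HEXDIGITS = r"[0-9A-Fa-f][0-9A-Fa-f_]*"
_EXP = rf"[eE][+-]?{_DIGITS}"

# Decimal: digits, then either "." (optional fraction, optional exponent)
# or an exponent alone. Plain digits are an integer, not a float.
_DEC_FLOAT = re.compile(rf"{_DIGITS}(?:\.(?:{_DIGITS})?(?:{_EXP})?|{_EXP})")

# Hexadecimal: the binary exponent (p/P) is mandatory.
_HEX_FLOAT = re.compile(
    rf"0x{_HEXDIGITS}(?:\.{_HEXDIGITS})?[pP][+-]?{_DIGITS}"
)

def parse_float_literal(text: str) -> float:
    """Return the value of a decimal or hexadecimal float literal."""
    plain = text.replace("_", "")  # underscores do not affect the value
    if _HEX_FLOAT.fullmatch(text):
        return float.fromhex(plain)
    if _DEC_FLOAT.fullmatch(text):
        return float(plain)
    raise ValueError(f"not a float literal: {text!r}")
```

With these patterns, 1 and 0x1.0 are both rejected: the first has neither a point nor an exponent, and the second lacks the binary exponent.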

Character Literals

A character literal represents a single ASCII character, written between single quotes. The character may be given directly or as an escape code.

'x'    ' '    '\n'    '\''    '\x7F'

Supported escape codes:

Escape  Meaning
\0      Null
\t      Tab
\n      Newline
\f      Form feed
\r      Carriage return
\"      Double quote
\'      Single quote
\\      Backslash
\xNN    Arbitrary byte (two hex digits)
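
The escape table can be turned into a small decoder. The Python sketch below uses the hypothetical name decode_escapes and takes the text between the quotes. It assumes that any escape not listed in the table is an error and that unescaped characters are ASCII.

```python
# Byte value for each single-character escape code in the table.
_SIMPLE = {
    "0": 0x00, "t": 0x09, "n": 0x0A, "f": 0x0C, "r": 0x0D,
    '"': 0x22, "'": 0x27, "\\": 0x5C,
}
_HEX_DIGITS = "0123456789abcdefABCDEF"

def decode_escapes(body: str) -> bytes:
    """Decode the escape codes in the body of a literal into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out += ch.encode("ascii")  # assumption: source text is ASCII
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("backslash at end of literal")
        code = body[i + 1]
        if code in _SIMPLE:
            out.append(_SIMPLE[code])
            i += 2
        elif code == "x":
            digits = body[i + 2:i + 4]
            if len(digits) != 2 or any(d not in _HEX_DIGITS for d in digits):
                raise ValueError("\\x needs exactly two hex digits")
            out.append(int(digits, 16))
            i += 4
        else:
            raise ValueError(f"unknown escape code: \\{code}")
    return bytes(out)
```

Returning bytes rather than a string keeps \xNN faithful to its description as an arbitrary byte.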

String Literals

A string literal represents a sequence of ASCII characters. Ceramic has two forms:

  • "-delimited: " and \ must be escaped.
  • """-delimited: only \ must be escaped. Useful for strings that contain quotes.

"hello world"
"\"hello world\""
"""the string "hello world""""

"""
"But not with you, Derek, this star nonsense."
"Yes, yes."
"""