[NAME]
ALL.dao.grammar.lexical

[TITLE]
Lexical Structures

[DESCRIPTION]

The standard encoding for Dao program source code is UTF-8. The system encoding of the
execution environment is also supported: source code in that encoding is automatically
converted to UTF-8 before parsing.

 0.1 Comments

Dao uses the number sign # to mark comments.
  * Single line comment: from # to the end of the line;
  * Multiple line comment: delimited by paired #{ and #};

Multiple line comments may contain other #{ and #} as long as they are properly paired;
that is, multiple line comments can be nested.

 0.2 Quotation Marks

Both the single quotation mark (0x27) and the double quotation mark (0x22) can be used to
quote string literals. They must be used in pairs, and one kind of quotation mark needs no
escaping inside a string quoted with the other kind.

 0.3 Keywords

The following keywords are reserved for the language:
  * Types:
        TypeKeyword ::= type | any | int | float | complex | string
                      | enum | array | list | map | tuple
  * Structures:
        StructKeyword ::= interface | class | routine
  * Storage/scoping:
        StorageKeyword ::= const | var | invar | static
  * Permissions:
        PermKeyword ::= private | protected | public
  * Built-in constants/variables:
        ConstVarKeyword ::= none | false | true | self
  * Control statements:
        ControlKeyword ::= if | else | for | while | do
                         | switch | case | default | break | skip
                         | defer | return | yield
  * Other statements:
        OtherStmtKeyword ::= load | import | as
  * Operators:
        OperatorKeyword ::= and | or | not | in
  * Miscellaneous:
        MiscKeyword ::=

    Keyword ::= TypeKeyword | StructKeyword | StorageKeyword
              | PermKeyword | ConstVarKeyword | ControlKeyword
              | OtherStmtKeyword | OperatorKeyword | MiscKeyword

 0.4 Basic Character Class Definitions

Basic character classes:
    DecDigit    ::= '0' ... '9'
    HexDigit    ::= DecDigit | 'a' ... 'f' | 'A' ... 'F'
    AsciiLetter ::= 'a' ... 'z' | 'A' ... 'Z'

    WideChar  ::= "UTF-8 encoded unit of one or more bytes"
    WideAlpha ::= WideChar & iswalpha( WideChar ) != 0
    WideAlnum ::= WideChar & iswalnum( WideChar ) != 0

Here iswalpha() and iswalnum() are the C99 functions that test whether a wide character
belongs to a certain class. A WideChar can be more than one byte long; in that case, its
UTF-8 bytes are converted to Unicode before being passed to the C99 test functions.

 0.5 Identifiers

    AsciiIdentifier ::= ( AsciiLetter | '_' ) ( AsciiLetter | DecDigit | '_' )*
    WideIdentifier  ::= ( WideAlpha | '_' ) ( WideAlnum | '_' )*

    Identifier ::= AsciiIdentifier | WideIdentifier

 0.6 Literals

 0.7 Number Literals

Integer literals:
    DecInteger ::= DecDigit +
    HexInteger ::= ( '0x' | '0X' ) HexDigit +

    Integer ::= DecInteger | HexInteger

Floating point number literals:
    DotDec    ::= DecDigit * '.' DecDigit +
    DecDot    ::= DecDigit + '.' DecDigit *
    DecNumber ::= DecInteger | DotDec | DecDot
    SciNumber ::= DecNumber ( 'e' | 'E' ) [ '+' | '-' ] DecInteger

    Float ::= DecNumber | SciNumber

Complex number, imaginary part literal:
    ComplexImaginary ::= [ Float ] 'C'

Symbol literal:
    Symbol ::= '$' Identifier

Type holder literal:
    TypeHolder ::= '@' Identifier

 0.8 String Literal

Basic string literal:
    SingleQuoteString ::= ' ' ' ValidCharSequence ' ' '
    DoubleQuoteString ::= ' " ' ValidCharSequence ' " '

Verbatim string literal:
    VerbatimString ::= '@[' [Delimiter] ']' Characters '@[' [Delimiter] ']'

Here Delimiter can contain letters, digits, underscores, blank spaces, dots, colons,
dashes and assignment marks. It must be chosen so that neither '@[' [Delimiter] ']' nor
'@@[' [Delimiter] ']' appears in the string content.

A ValidCharSequence is a sequence of characters in which the enclosing quotation mark may
appear only as an escaped character. So the following are valid string literals:
    ' " '
    " ' "
    ' \' '
    " \" "

String literal:
    String ::= SingleQuoteString + | DoubleQuoteString + | VerbatimString

Here the repetition marks mean that two or more SingleQuoteString or DoubleQuoteString
literals can be placed one after another; they will be joined into a single string
literal during preprocessing.

 0.9 Escape Sequences in String Literal

Escape characters:
  * \\: backslash;
  * \t: horizontal tab;
  * \f: form feed (not implemented);
  * \n: line feed;
  * \r: carriage return;
  * \': single quotation mark;
  * \": double quotation mark;

Escape digits (not implemented):
  * \ooo: character with octal value ooo;
  * \xhh: character with hex value hh;
  * \uxxxx: Unicode character with hex value xxxx;
  * \uxxxxxxxx: Unicode character with hex value xxxxxxxx;

 0.10 Operators

  * Left unary operators:
        LeftUnaryOperator ::= '++' | '--' | '!' | '~' | '%' | 'not'
  * Right unary operators:
        RightUnaryOperator ::=
  * Binary operators:
        BinArith ::= '+' | '-' | '*' | '/' | '%' | '**'
        BinComp  ::= '==' | '!=' | '<' | '>' | '<=' | '>='
        BinBool  ::= '&&' | '||' | 'and' | 'or'
        BinBit   ::= '&' | '|' | '^' | '<<' | '>>'
        BinMisc  ::= 'in' | 'not in' | '?=' | '?<'

        BinaryOperator ::= BinArith | BinComp | BinBool | BinBit | BinMisc
  * Composite assignment operators:
        AssignmentOperator ::= '+=' | '-=' | '*=' | '/=' | '&=' | '|='
  * Other operators:
        OtherOperator ::= '->' | '=>' | ':' | '.' | '...'

    UnaryOperator ::= LeftUnaryOperator | RightUnaryOperator

    Operator ::= UnaryOperator | BinaryOperator
               | AssignmentOperator | OtherOperator

 0.11 Miscellaneous

 0.12 Semicolon

As in some other languages, a semicolon can be used to mark the end of a statement.
However, the use of semicolons is optional: the compiler is able to determine the end of
a statement based on some semantic rules.
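
To make two of the rules above more concrete, the comment rules of section 0.1 and the number literal grammar of section 0.7 can be sketched in Python. This is only an illustration under stated assumptions, not how the Dao compiler is implemented: the function names strip_comments and classify_number are hypothetical, the regular expressions are a hand transcription of the grammar, and string literals are deliberately ignored when stripping comments.

```python
import re

# Regex transcription of the number grammar (an assumption, not Dao's own code).
DEC_INTEGER = r"[0-9]+"
HEX_INTEGER = r"0[xX][0-9a-fA-F]+"
DOT_DEC = r"[0-9]*\.[0-9]+"
DEC_DOT = r"[0-9]+\.[0-9]*"
DEC_NUMBER = rf"(?:{DOT_DEC}|{DEC_DOT}|{DEC_INTEGER})"
SCI_NUMBER = rf"{DEC_NUMBER}[eE][+-]?{DEC_INTEGER}"
FLOAT = rf"(?:{SCI_NUMBER}|{DEC_NUMBER})"


def classify_number(text):
    """Return 'integer', 'float', 'imaginary', or None for the whole string."""
    if re.fullmatch(rf"{HEX_INTEGER}|{DEC_INTEGER}", text):
        return "integer"
    if re.fullmatch(FLOAT, text):
        return "float"
    if re.fullmatch(rf"(?:{FLOAT})?C", text):
        return "imaginary"
    return None


def strip_comments(source):
    """Remove '#' line comments and nested '#{ ... #}' block comments.

    String literals are not tracked, so a '#' inside a quoted string would be
    mistaken for a comment start; a real lexer has to handle quoting first.
    """
    out = []
    depth = 0  # current nesting level of '#{' ... '#}' blocks
    i = 0
    while i < len(source):
        pair = source[i:i + 2]
        if pair == "#{":
            depth += 1
            i += 2
        elif pair == "#}" and depth > 0:
            depth -= 1
            i += 2
        elif depth > 0:
            i += 1  # inside a block comment: discard the character
        elif source[i] == "#":
            # Single line comment: skip up to, but not including, the newline.
            while i < len(source) and source[i] != "\n":
                i += 1
        else:
            out.append(source[i])
            i += 1
    return "".join(out)
```

The depth counter is what allows #{ and #} to nest; a plain regular expression cannot match arbitrarily nested pairs. In classify_number, the integer test runs first because the grammar lets a DecInteger also match DecNumber.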