[NAME]
ALL.dao.grammar.lexical

[TITLE]
Lexical Structures

[DESCRIPTION]

The standard encoding for Dao program source code is UTF-8. The system encoding of the
execution environment is also supported: source code in that encoding is automatically
converted to UTF-8 before parsing.

 0.1  Comments 

Dao uses the number sign # to mark comments. 
  *  Single-line comment: from # to the end of the line;
  *  Multi-line comment: enclosed between #{ and #}; 
Multi-line comments may contain other #{ and #} markers, provided they are properly
paired; in other words, multi-line comments can be nested.
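
The nesting rule can be illustrated with a depth counter. The following is a minimal Python sketch, not the Dao lexer itself: the function name strip_comments is hypothetical, and for brevity it does not recognize string literals, so a # inside a quoted string would also be treated as a comment start.

```python
def strip_comments(source: str) -> str:
    """Remove '#' line comments and nested '#{ ... #}' block comments.

    Simplification (assumption): string literals are not recognized.
    """
    out = []
    depth = 0  # current nesting depth of '#{ ... #}' comments
    i = 0
    n = len(source)
    while i < n:
        pair = source[i:i + 2]
        if pair == "#{":
            depth += 1          # open a (possibly nested) block comment
            i += 2
        elif pair == "#}" and depth > 0:
            depth -= 1          # close the innermost block comment
            i += 2
        elif depth > 0:
            i += 1              # skip text inside a block comment
        elif source[i] == "#":
            # single-line comment: skip up to, but not including, the newline
            while i < n and source[i] != "\n":
                i += 1
        else:
            out.append(source[i])
            i += 1
    if depth != 0:
        raise ValueError("unterminated multi-line comment")
    return "".join(out)
```

Because the counter only returns to zero when every #{ has met its #}, an inner #} does not end the outer comment.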

 0.2  Quotation Marks 

Both the single quotation mark (0x27) and the double quotation mark (0x22) can be used to
quote string literals. They must be used in pairs, and a quotation mark of one kind needs
no escaping inside a string quoted with the other kind.

 0.3  Keywords 

The following keywords are reserved for the language: 
  *  Types: 
          
        TypeKeyword ::= type | any | int | float | complex | string
                       | enum | array | list | map | tuple

  *  Structures: 

        StructKeyword ::= interface | class | routine

  *  Storage/scoping: 

        StorageKeyword ::= const | var | invar | static
          
     
  *  Permissions: 
          
        PermKeyword ::= private | protected | public

  *  Built-in constants/variables: 

        ConstVarKeyword ::= none | false | true | self

  *  Control statements: 

        ControlKeyword ::= if | else | for | while | do 
                          | switch | case | default | break | skip
                          | defer | return | yield

  *  Other statements: 

        OtherStmtKeyword ::= load | import | as

  *  Operators: 

        OperatorKeyword ::= and | or | not | in

  *  Miscellaneous: 

        MiscKeyword ::=
          


     
   Keyword ::= TypeKeyword | StructKeyword | StorageKeyword 
             | PermKeyword | ConstVarKeyword | ControlKeyword
             | OtherStmtKeyword | OperatorKeyword | MiscKeyword
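
As an illustration of how the keyword classes above might be used when classifying a word, here is a small Python lookup table. The names KEYWORD_CLASSES, KEYWORD_TO_CLASS and keyword_class are hypothetical; the keyword lists are copied from the grammar, and MiscKeyword is left empty as it is there.

```python
# Keyword classes copied from the grammar; MiscKeyword has no members there.
KEYWORD_CLASSES = {
    "TypeKeyword": ["type", "any", "int", "float", "complex", "string",
                    "enum", "array", "list", "map", "tuple"],
    "StructKeyword": ["interface", "class", "routine"],
    "StorageKeyword": ["const", "var", "invar", "static"],
    "PermKeyword": ["private", "protected", "public"],
    "ConstVarKeyword": ["none", "false", "true", "self"],
    "ControlKeyword": ["if", "else", "for", "while", "do", "switch", "case",
                       "default", "break", "skip", "defer", "return", "yield"],
    "OtherStmtKeyword": ["load", "import", "as"],
    "OperatorKeyword": ["and", "or", "not", "in"],
    "MiscKeyword": [],
}

# Reverse index: keyword -> name of the class it belongs to.
KEYWORD_TO_CLASS = {
    word: name for name, words in KEYWORD_CLASSES.items() for word in words
}


def keyword_class(word: str):
    """Return the keyword class of `word`, or None if it is not reserved."""
    return KEYWORD_TO_CLASS.get(word)
```

A word that is not found in the table would then be a candidate identifier.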
     

 0.4  Basic Character Class Definitions 

Basic character classes: 
     
   DecDigit ::= '0' ... '9'
   HexDigit ::= DecDigit | 'a' ... 'f' | 'A' ... 'F'
   AsciiLetter ::= 'a' ... 'z' | 'A' ... 'Z'

   WideChar ::= "UTF-8 encoded unit of one or more bytes"
   WideAlpha ::= WideChar & iswalpha( WideChar ) != 0
   WideAlnum ::= WideChar & iswalnum( WideChar ) != 0
     
Here iswalpha() and iswalnum() are the C99 functions that test whether a wide character
belongs to a given character class. A WideChar may span more than one byte; in that case,
its UTF-8 bytes are converted into a Unicode character before being passed to the C99 test
functions.
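
The decode-then-test step can be sketched in Python as follows. This is only an approximation: Python's str.isalpha() and str.isalnum() use Unicode character properties, whereas the C functions depend on the current locale, so the two may disagree on some characters. The helper names are hypothetical.

```python
def decode_wide_char(unit: bytes) -> str:
    """Decode one UTF-8 encoded unit (one or more bytes) into one character."""
    ch = unit.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8
    if len(ch) != 1:
        raise ValueError("expected exactly one UTF-8 encoded character")
    return ch


def is_wide_alpha(unit: bytes) -> bool:
    # Stand-in for: iswalpha( WideChar ) != 0
    return decode_wide_char(unit).isalpha()


def is_wide_alnum(unit: bytes) -> bool:
    # Stand-in for: iswalnum( WideChar ) != 0
    return decode_wide_char(unit).isalnum()
```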

 0.5  Identifiers 

     
   AsciiIdentifier ::= ( AsciiLetter | '_' ) ( AsciiLetter | DecDigit | '_' )*
   WideIdentifier ::= ( WideAlpha | '_' ) ( WideAlnum | '_' )*

   Identifier ::= AsciiIdentifier | WideIdentifier
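
A rough Python rendering of the two identifier rules is shown below. It assumes the input is already decoded text and uses str.isalpha() and str.isalnum() as stand-ins for iswalpha() and iswalnum(); the function name is_identifier is hypothetical.

```python
import re

# AsciiIdentifier ::= ( AsciiLetter | '_' ) ( AsciiLetter | DecDigit | '_' )*
ASCII_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(text: str) -> bool:
    """Check `text` against Identifier ::= AsciiIdentifier | WideIdentifier."""
    if not text:
        return False
    if ASCII_IDENTIFIER.fullmatch(text):
        return True
    # WideIdentifier ::= ( WideAlpha | '_' ) ( WideAlnum | '_' )*
    first, rest = text[0], text[1:]
    if not (first.isalpha() or first == "_"):
        return False
    return all(ch.isalnum() or ch == "_" for ch in rest)
```

Note that this checks only the lexical shape; a reserved keyword such as while also has the shape of an identifier and would need to be excluded separately.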
     


 0.6  Literals 

 0.7  Number Literals 

Integer literals: 

   DecInteger ::= DecDigit +
   HexInteger ::= ( '0x' | '0X' ) HexDigit +

   Integer ::= DecInteger | HexInteger
     


Floating-point number literals: 

   DotDec ::= DecDigit * '.' DecDigit +
   DecDot ::= DecDigit + '.' DecDigit *
   DecNumber ::= DecInteger | DotDec | DecDot
   SciNumber ::= DecNumber ( 'e' | 'E' ) [ '+' | '-' ] DecInteger

   Float  ::= DecNumber | SciNumber
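
The number rules translate fairly directly into regular expressions. The Python sketch below treats DecNumber as either a decimal integer or a number containing a decimal point. Since Float as written also admits plain decimal integers, the hypothetical classify_number helper tests Integer first; that ordering is an assumption, not something the grammar states.

```python
import re

DEC_INTEGER = r"[0-9]+"
HEX_INTEGER = r"0[xX][0-9a-fA-F]+"
DOT_DEC = r"[0-9]*\.[0-9]+"          # e.g. ".5" or "0.5"
DEC_DOT = r"[0-9]+\.[0-9]*"          # e.g. "3." or "3.14"
DEC_NUMBER = rf"(?:{DOT_DEC}|{DEC_DOT}|{DEC_INTEGER})"
SCI_NUMBER = rf"{DEC_NUMBER}[eE][+-]?{DEC_INTEGER}"

INTEGER = re.compile(rf"(?:{HEX_INTEGER}|{DEC_INTEGER})")
FLOAT = re.compile(rf"(?:{SCI_NUMBER}|{DEC_NUMBER})")


def classify_number(text: str):
    """Return 'Integer', 'Float', or None for a candidate number literal."""
    if INTEGER.fullmatch(text):   # assumption: Integer takes precedence
        return "Integer"
    if FLOAT.fullmatch(text):
        return "Float"
    return None
```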
     

Complex number, imaginary part literal: 

   ComplexImaginary ::= [ Float ] 'C'

Symbol literal: 

   Symbol ::= '$' Identifier

Type holder literal: 

   TypeHolder ::= '@' Identifier
     


 0.8  String Literal 

Basic string literal: 
     
   SingleQuoteString ::= ' ' ' ValidCharSequence ' ' '
   DoubleQuoteString ::= ' " ' ValidCharSequence ' " '

Verbatim string literal: 

   VerbatimString ::= '@[' [Delimiter] ']' Characters '@[' [Delimiter] ']'
     
Here Delimiter may contain letters, digits, underscores, blank spaces, dots, colons,
dashes and assignment marks. It must be chosen so that neither '@[' [Delimiter] ']' nor
'@@[' [Delimiter] ']' appears in the string content.
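
The matching of the opening and closing markers can be sketched with a regular-expression backreference. This Python example is illustrative only: it assumes ASCII letters in the delimiter, it does not treat '@@[' specially, and the names VERBATIM and parse_verbatim are hypothetical.

```python
import re

# '@[' [Delimiter] ']' Characters '@[' [Delimiter] ']'
# Group 1 captures the delimiter; \1 requires the same delimiter at the end.
VERBATIM = re.compile(r"@\[([A-Za-z0-9_ .:=-]*)\](.*?)@\[\1\]", re.DOTALL)


def parse_verbatim(text: str):
    """Return (delimiter, content) for a verbatim string at the start of text."""
    match = VERBATIM.match(text)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

The non-greedy `.*?` stops at the first closing marker, which is why the delimiter must not occur inside the content.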

Here a ValidCharSequence is a sequence of characters in which the enclosing quotation mark
may appear only as an escaped character. The following are therefore valid string
literals: 
     
   ' " '
   " ' "
   ' \' '
   " \" "
     

String literal: 
     
   String ::= SingleQuoteString + | DoubleQuoteString + | VerbatimString
     
Here the repetition marks mean that two or more SingleQuoteString or DoubleQuoteString
literals can be placed one after another; they are joined into a single string literal
during preprocessing.
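
The joining of adjacent literals can be sketched in Python as below. It assumes that the literals are separated only by whitespace, follows the String rule in requiring all joined literals to use the same kind of quotation mark, and leaves escape sequences untouched. The function name join_adjacent is hypothetical.

```python
import re

# The enclosing quotation mark may appear inside only as an escaped character.
SINGLE = r"'(?:[^'\\]|\\.)*'"
DOUBLE = r'"(?:[^"\\]|\\.)*"'
LITERAL = re.compile(rf"\s*({SINGLE}|{DOUBLE})", re.DOTALL)


def join_adjacent(text: str) -> str:
    """Join adjacent quoted literals of one kind into a single raw body."""
    parts = []
    quote = None
    pos = 0
    while pos < len(text):
        match = LITERAL.match(text, pos)
        if match is None:
            if text[pos:].strip() == "":
                break  # only trailing whitespace is left
            raise ValueError(f"not a string literal at position {pos}")
        literal = match.group(1)
        if quote is None:
            quote = literal[0]
        elif literal[0] != quote:
            raise ValueError("single- and double-quoted literals are mixed")
        parts.append(literal[1:-1])  # drop the enclosing quotation marks
        pos = match.end()
    if not parts:
        raise ValueError("no string literal found")
    return "".join(parts)
```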

 0.9  Escape Sequences in String Literal 

Escape characters: 
  *  \\: backslash;
  *  \t: horizontal tab;
  *  \f: form feed; (not implemented)
  *  \n: line feed;
  *  \r: carriage return;
  *  \': single quotation mark;
  *  \": double quotation mark; 

Escape digits (not implemented): 
  *  \ooo: character with octal value ooo;
  *  \xhh: character with hex value hh;
  *  \uxxxx: Unicode character with hex value xxxx;
  *  \uxxxxxxxx: Unicode character with hex value xxxxxxxx; 
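
A minimal Python sketch of decoding the escape characters listed above follows. It covers only the entries not marked as unimplemented; leaving any other backslash sequence unchanged is an assumption of this sketch, since the text does not say how such sequences are handled. The names ESCAPES and unescape are hypothetical.

```python
# Escape characters from the list above (\f and the numeric forms are omitted
# because they are marked as not implemented).
ESCAPES = {
    "\\": "\\",
    "t": "\t",
    "n": "\n",
    "r": "\r",
    "'": "'",
    '"': '"',
}


def unescape(body: str) -> str:
    """Replace the supported escape sequences in a string literal body."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body) and body[i + 1] in ESCAPES:
            out.append(ESCAPES[body[i + 1]])
            i += 2
        else:
            out.append(ch)  # assumption: unknown sequences pass through as-is
            i += 1
    return "".join(out)
```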


 0.10  Operators 


  *  Left unary operators: 

        LeftUnaryOperator ::= '++' | '--' | '!' | '~' | '%' | 'not'

  *  Right unary operators: 

        RightUnaryOperator ::=

  *  Binary operators: 

        BinArith ::= '+' | '-' | '*' | '/' | '%' | '**'
        BinComp  ::= '==' | '!=' | '<' | '>' | '<=' | '>='
        BinBool  ::= '&&' | '||' | 'and' | 'or'
        BinBit   ::= '&' | '|' | '^' | '<<' | '>>'
        BinMisc  ::= 'in' | 'not in' | '?=' | '?<'

        BinaryOperator ::= BinArith | BinComp | BinBool | BinBit | BinMisc

  *  Composite assignment operators: 

        AssignmentOperator ::= '+=' | '-=' | '*=' | '/=' | '&=' | '|='

  *  Other operators: 

        OtherOperator ::= '->' | '=>' | ':' | '.' | '...'

   UnaryOperator ::= LeftUnaryOperator | RightUnaryOperator

   Operator ::= UnaryOperator | BinaryOperator 
              | AssignmentOperator | OtherOperator
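
Several operators above are prefixes of others (for example '*' and '**', or '.' and '...'). A common lexer convention for resolving this is longest match; the text does not say which rule Dao applies, so the following Python sketch is an assumption. It handles only the symbolic operators (the word operators such as 'and' or 'not in' are left out), and the names SYMBOL_OPERATORS and scan_operators are hypothetical.

```python
# Symbolic operators collected from the grammar above.
SYMBOL_OPERATORS = [
    "++", "--", "!", "~", "%",
    "+", "-", "*", "/", "**",
    "==", "!=", "<", ">", "<=", ">=",
    "&&", "||", "&", "|", "^", "<<", ">>",
    "?=", "?<",
    "+=", "-=", "*=", "/=", "&=", "|=",
    "->", "=>", ":", ".", "...",
]

# Try longer operators first so that '**' wins over '*', '...' over '.', etc.
BY_LENGTH = sorted(set(SYMBOL_OPERATORS), key=len, reverse=True)


def scan_operators(text: str):
    """Split a whitespace-separated run of operator symbols into tokens."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        for op in BY_LENGTH:
            if text.startswith(op, i):
                tokens.append(op)
                i += len(op)
                break
        else:
            raise ValueError(f"no operator at position {i}")
    return tokens
```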
     

 0.11  Miscellaneous 

 0.12  Semicolon 

As in some other languages, a semicolon can be used to mark the end of a statement.
However, the semicolon is optional: the compiler can determine where a statement ends
based on some semantic rules.