[NAME]
ALL.daovm.spec.bytecode

[TITLE]
Bytecode Format

[DESCRIPTION]

This document specifies the bytecode format of the Dao virtual machine.

In this bytecode format, integers are always stored in big-endian order. In the following specifications and examples, each byte is represented by two hexadecimal digits unless it is enclosed in quotation marks.

1 Header Section

The header section contains 32 bytes, divided as follows:

    Byte      # ESC, 0x1B;
    Byte      # 0x44, namely 'D';
    Byte      # 0x61, namely 'a';
    Byte      # 0x6F, namely 'o';
    Byte      # major version number, 0x2;
    Byte      # minor version number, 0x0;
    Byte      # Carriage Return (CR), 0x0D;
    Byte      # Line Feed (LF), 0x0A;
    Byte      # format class, 0x0 for the official one;
    Byte      # size of integer type, default 0x4;
    Byte[4]   # format hash (rotating hash of the ASM tags and VM opcodes);
    Byte[16]  # 16 reserved bytes;
    Byte      # Carriage Return (CR), 0x0D;
    Byte      # Line Feed (LF), 0x0A;

The ninth byte is the format class: 0x0 is reserved for the official format, and 0x1 for the encrypted format (only the main section is encrypted; see below for more information).

The four format hash bytes serve as a signature for the format in which the bytecode is encoded. The signature is the rotating hash value of a string constructed from the bytecode tag indices and names, and the virtual machine opcode indices and names:

    TagIndex1:TagName1;TagIndex2:TagName2;...; OpcodeIndex1:OpcodeName1;...

Each index is separated from its corresponding name by a colon, and each index-name pair is followed by a semicolon. The substrings for bytecode tags and opcodes are separated by a blank space.
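The header layout and the signature string can be illustrated with a short Python sketch. It is not taken from the Dao sources: the function names `rotating_hash`, `signature_string` and `parse_header` are hypothetical, the signature string is assumed to be ASCII, and the tag and opcode pairs in the usage note are invented for illustration. `rotating_hash` is a direct transcription of the rotating hash given in the text that follows this passage.

```python
MAGIC = b"\x1bDao"   # ESC, 'D', 'a', 'o'
CRLF = b"\r\n"       # 0x0D, 0x0A
HEADER_SIZE = 32


def rotating_hash(text: bytes) -> int:
    """Rotating hash: seeded with the length, masked to 31 bits per step."""
    h = len(text)
    for byte in text:
        h = ((h << 4) ^ (h >> 28) ^ byte) & 0x7FFFFFFF
    return h


def signature_string(tags, opcodes) -> str:
    """Build 'Index:Name;' pairs for tags, a blank space, then opcodes.

    Both arguments are sequences of (index, name) pairs supplied by the
    caller; this sketch does not assume any particular numbering.
    """
    tag_part = "".join(f"{index}:{name};" for index, name in tags)
    opcode_part = "".join(f"{index}:{name};" for index, name in opcodes)
    return tag_part + " " + opcode_part


def parse_header(data: bytes) -> dict:
    """Check the fixed bytes of the 32-byte header and return its fields."""
    if len(data) < HEADER_SIZE:
        raise ValueError("header must contain 32 bytes")
    if data[0:4] != MAGIC:
        raise ValueError("missing ESC 'D' 'a' 'o' signature")
    if data[6:8] != CRLF or data[30:32] != CRLF:
        raise ValueError("missing CR LF markers")
    return {
        "major": data[4],
        "minor": data[5],
        "format_class": data[8],
        "int_size": data[9],
        # Integers are stored in big-endian order.
        "format_hash": int.from_bytes(data[10:14], "big"),
        "reserved": data[14:30],
    }
```

A reader could then compare `parse_header(data)["format_hash"]` against `rotating_hash(signature_string(tags, opcodes).encode("ascii"))`, where `tags` and `opcodes` come from the virtual machine the reader is built for; for example, `signature_string([(1, "A"), (2, "B")], [(0, "NOP")])` yields `1:A;2:B; 0:NOP;` for those made-up pairs.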
The rotating hash is computed by:

    var hash = length( text );
    for(byte in text) hash = ((hash<<4)^(hash>>28)^byte)&0x7fffffff;
    return hash;

2 Source Path Section

    Byte[2]  # length of the source path;
    Byte[]   # source path (null-terminated);
    Byte     # Carriage Return (CR), 0x0D;
    Byte     # Line Feed (LF), 0x0A;

3 Main Section

The main section is encoded as structured blocks. Each block is divided into chunks of 9 bytes: the first byte always stores a tag that identifies the chunk type, and the remaining 8 bytes store data. The chunk types are:

    ASM_COPY
    ASM_TYPEOF
    ASM_TYPEDEF
    ASM_TYPEINVAR
    ASM_ROUTINE
    ASM_CLASS
    ASM_INTERFACE
    ASM_ENUM
    ASM_TYPE
    ASM_VALUE
    ASM_EVAL
    ASM_BASES
    ASM_DECOS
    ASM_PATTERNS
    ASM_CONSTS
    ASM_TYPES
    ASM_CODE
    ASM_END
    ASM_LOAD
    ASM_IMPORT
    ASM_VERBATIM
    ASM_CONST
    ASM_STATIC
    ASM_GLOBAL
    ASM_VAR
    ASM_DATA
    ASM_DATA2
    ASM_SEEK

4 Chunk Specifications:

4.1 Values:

    int:
    ASM_VALUE(1B): DAO_INTEGER(1B), Zeros(7B);
    ASM_END(1B): Value(4B/8B), Zeros(4B/0B);

    float:
    ASM_VALUE(1B): DAO_FLOAT(1B), Zeros(7B);
    ASM_END(1B): Value(4B), Zeros(4B);

    double:
    ASM_VALUE(1B): DAO_DOUBLE(1B), Zeros(7B);
    ASM_END(1B): Value(8B);

    complex:
    ASM_VALUE(1B): DAO_COMPLEX(1B), Zeros(7B);
    ASM_DATA(1B): Real(8B);
    ASM_END(1B): Imag(8B);

    string:
    ASM_VALUE(1B): DAO_STRING(1B), SizeMod16(1B), Bytes(6B);
    ASM_DATA(1B): Bytes(8B);
    ASM_END(1B): Bytes(8B);

    enum symbol:
    ASM_VALUE(1B): DAO_ENUM(1B), Zeros(1B), Type-Index(2B), Zeros(4B);
    ASM_END(1B): Value(4B), Zeros(4B);

Notes: A "Type-Index" refers to a previous block, which is located by counting backward by that "index" offset.
Only blocks that represent values are indexed, and each index is stored as a two-byte short. If a short is not sufficient to represent an index, an intermediate indexing chunk can be used:

    ASM_SEEK(1B): New-Index(2B), Zeros(6B);

Here "New-Index" is also sought backward, and is relative to the seek chunk.

    array:
    ASM_VALUE(1B): DAO_ARRAY(1B), Numeric-Type(1B), Dimensions(2B), Size(4B);
    ASM_DATA(1B): Dim1(4B), Dim2(4B);
    ASM_DATA(1B): More dimensions;
    ASM_DATA(1B): Data(4B), Data(4B); Or Data(8B);
    ASM_DATA(1B): More Data;
    ASM_END(1B): Data(8B);

    list:
    ASM_VALUE(1B): DAO_LIST(1B), Zeros(1B), Type-Index(2B), Size(4B);
    ASM_DATA(1B): Value-Index(2B), Value-Index(2B), Value-Index(2B), Value-Index(2B);
    ASM_END(1B): Value-Index(2B), Value-Index(2B), Value-Index(2B), Value-Index(2B);

    map:
    ASM_VALUE(1B): DAO_MAP(1B), Zeros(1B), Type-Index(2B), Hash-Seed(4B);
    ASM_DATA(1B): Key-Index(2B), Value-Index(2B), Key-Index(2B), Value-Index(2B);
    ASM_END(1B): Key-Index(2B), Value-Index(2B), Key-Index(2B), Value-Index(2B);

    Each Key-Index/Value-Index pair encodes one key-value pair; a zero marks the end.

    tuple:
    ASM_VALUE(1B): DAO_TUPLE(1B), SubTypeID(1B), Type-Index(2B), Size(2B), Value-Index(2B);
    ASM_DATA(1B): Value-Index(2B), Value-Index(2B), Value-Index(2B), Value-Index(2B);
    ASM_END(1B): Value-Index(2B), Value-Index(2B), Value-Index(2B), Value-Index(2B);

    namevalue:
    ASM_VALUE(1B): DAO_PAR_NAMED(1B), Zeros(1B), Name-Index(2B), Value-Index(2B), Type-Index(2B);
    ASM_END(1B): Zeros(8B);

    specialized ctype:
    ASM_VALUE(1B): DAO_CTYPE(1B), Zeros(1B), Value-Index(2B), Type-Index(2B) X 2;
    ASM_DATA(1B): Type-Index(2B) X 4;
    ASM_END(1B): Type-Index(2B) X 4;

4.2 Other Values

    copied value:
    ASM_COPY(1B): Value-Index(2B), Zeros(6B);

    type of a value:
    ASM_TYPEOF(1B): Value-Index(2B), Zeros(6B);

    const/invar type:
    ASM_TYPEINVAR(1B): Type-Index(2B), SubType(2B), Zeros(4B);

    type alias:
    ASM_TYPEDEF(1B): Name-Index(2B), Type-Index(2B), Zeros(4B);

4.3 Structures:

    routine:
    ASM_ROUTINE(1B): Name-Index(2B), Type-Index(2B), Host-Index(2B), Attrib(2B);
    ...
    ASM_END(1B): RegCount(2B), Zeros(4B), DefaultConstructor(1B), Permission(1B);

    class:
    ASM_CLASS(1B): Name/Decl-Index(2B), Parent-Index(2B), Attrib(4B);
    ASM_BASES(1B): Value-Index(2B), Value-Index(2B), Value-Index(2B), Value-Index(2B);
    ...
    ASM_END(1B): LineDef(2B), Zeros(5B), Permission(1B);

    interface:
    ASM_INTERFACE(1B): Name/Decl-Index(2B), Parent-Count(2B), Zeros(4B);
    ASM_BASES(1B): Value-Index(2B), Value-Index(2B), Value-Index(2B), Value-Index(2B);
    ...
    ASM_END(1B): LineDef(2B), Zeros(5B), Permission(1B);

    enum:
    ASM_ENUM(1B): Name-Index(2B), Enum/Flag(2B), Count(4B);
    ASM_DATA(1B): Name-Index(2B), Value(4B), Zeros(2B);
    ASM_END(1B): Name-Index(2B), Value(4B), Zeros(2B);

    type:
    ASM_TYPE(1B): Name-Index(2B), TypeID(2B), Aux-Index(2B), CodeBlockType-Index(2B);
    ASM_DATA(1B): Type-Index(2B) X 4;
    ASM_END(1B): Type-Index(2B) X 4;

    Note 1: the list of nested types is terminated by a zero Type-Index;
    Note 2: "Aux-Index" may be the index of the returned type, of a class block, etc.;

    value:
    See above;

    evaluation:
    ASM_EVAL(1B): Opcode(2B), OpB(2B), Type-Index(2B), Zeros(2B);
    ASM_DATA(1B): Value-Index(2B), Value-Index(2B), Value-Index(2B), Value-Index(2B);
    ASM_END(1B): Value-Index(2B), Value-Index(2B), Value-Index(2B), Value-Index(2B);

    bases (mixin components or interface parents):
    ASM_BASES(1B): Value-Index(2B), Value-Index(2B), Value-Index(2B), Value-Index(2B);
    ASM_DATA(1B): Value-Index(2B) X 4;
    ASM_END(1B): Value-Index(2B) X 4;

    decorators for the current routine:
    ASM_DECOS(1B): Func-Index(2B), ParList-Index(2B), Func-Index(2B), ParList-Index(2B);
    ASM_DATA(1B): Func-Index(2B), ParList-Index(2B), Func-Index(2B), ParList-Index(2B);
    ASM_END(1B): Func-Index(2B), ParList-Index(2B), Func-Index(2B), ParList-Index(2B);

    patterns for automatic decorator application:
    ASM_PATTERNS(1B): PatternString-Index(2B) X 4;
    ASM_DATA(1B): PatternString-Index(2B) X 4;
    ASM_END(1B): PatternString-Index(2B) X 4;

    consts:
    ASM_CONSTS(1B): Count(2B), Value-Index(2B), Value-Index(2B), Value-Index(2B);
    ASM_DATA(1B): Value-Index(2B), Value-Index(2B), Value-Index(2B), Value-Index(2B);
    ASM_END(1B): Value-Index(2B), Value-Index(2B), Value-Index(2B), Value-Index(2B);

    types:
    ASM_TYPES(1B): Count(2B), Zeros(2B), Var-Index(2B), Type-Index(2B);
    ASM_DATA(1B): Var-Index(2B), Type-Index(2B), Var-Index(2B), Type-Index(2B);
    ASM_END(1B): Var-Index(2B), Type-Index(2B), Var-Index(2B), Type-Index(2B);

    code:
    ASM_CODE(1B): CodeNum(2B), Line-Num-Count(2B), LineNum(2B), Count(2B);
    ASM_DATA(1B): LineDiff(2B), Count(2B), LineDiff(2B), Count(2B);
    ASM_DATA(1B): Opcode(2B), A(2B), B(2B), C(2B);
    ASM_END(1B): Opcode(2B), A(2B), B(2B), C(2B);

4.4 Statement:

    load statement:
    ASM_LOAD(1B): File-Path-Index(2B), Optional-Name-Index(2B), Zeros(4B);

    import from namespace/module:
    ASM_IMPORT(1B): Mod-Index(2B), Name-Index(2B), Scope(2B), Offset(2B);

    verbatim:
    ASM_VERBATIM(1B): Tag-Index(2B), Mode-Index(2B), Text-Index(2B), LineNum(2B);

    var declaration:
    ASM_VAR(1B): Name-Index(2B), Value-Index(2B), Type-Index(2B), Scope(1B), Perm(1B);

    const declaration:
    ASM_CONST(1B): Name-Index(2B), Value-Index(2B), Zeros(2B), Scope(1B), Permission(1B);

    static declaration:
    ASM_STATIC(1B): Name-Index(2B), Value-Index(2B), Type-Index(2B), Scope(1B), Perm(1B);

    global declaration:
    ASM_GLOBAL(1B): Name-Index(2B), Value-Index(2B), Type-Index(2B), Scope(1B), Perm(1B);

    seek:
    ASM_SEEK(1B): New-Index(2B), Zeros(6B);

5 Samples

Input code:

    io.writeln( 'Hello Dao!' );

Output disassembled bytecode:

    ASM_ROUTINE: 0, 0, 0, 0;
    ASM_VALUE: DAO_STRING, 2, 'io';
    ASM_END: '';

    ASM_EVAL: GETCG, 1, 0, 0;
    ASM_END: 1, 0, 0, 0;

    ASM_VALUE: DAO_STRING, 7, 'writel';
    ASM_END: 'n';

    ASM_EVAL: GETF, 2, 0, 0;
    ASM_END: 2, 1, 0, 0;

    ASM_VALUE: DAO_STRING, 10, 'Hello ';
    ASM_END: 'Dao!';

    ASM_CONSTS: 2, 2, 1, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_TYPES: 0, 0, 0, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_CODE: 6, 1, 1, 6;
    ASM_DATA: GETCG, 1, 5, 0;
    ASM_DATA: GETCL, 0, 0, 1;
    ASM_DATA: LOAD, 0, 0, 2;
    ASM_DATA: GETCL, 0, 1, 3;
    ASM_DATA: MCALL, 1, 2, 4;
    ASM_END: RETURN, 4, 1, 0;
    ASM_END: ;

Input code:

    load web.cgi as cgi

    import cgi.random_string

    enum Bool
    {
        False,
        True
    }

    static abc = random_string( 100 )

    var index = 123 + %abc

    class Klass
    {
        const name = "abc";
        var index = 123;

        routine Method( a :int ){
        }
    }

    routine Func()
    {
        var name = index
    }

    var klass = Klass()

Output disassembled bytecode:

    ASM_ROUTINE: 0, 0, 0, 0;
    ASM_VALUE: DAO_STRING, 7, 'web/cg';
    ASM_END: 'i';

    ASM_VALUE: DAO_STRING, 3, 'cgi';
    ASM_END: '';

    ASM_LOAD: 2, 1, 0, 0;

    ASM_EVAL: GETCG, 1, 0, 0;
    ASM_END: 1, 0, 0, 0;

    ASM_VALUE: DAO_STRING, 13, 'random';
    ASM_END: '_string';

    ASM_IMPORT: 2, 1, 0, 28;

    ASM_VALUE: DAO_STRING, 4, 'Bool';
    ASM_END: '';

    ASM_VALUE: DAO_STRING, 5, 'False';
    ASM_END: '';

    ASM_VALUE: DAO_STRING, 4, 'True';
    ASM_END: '';

    ASM_VALUE: DAO_STRING, 0, 'enum<F';
    ASM_DATA: 'alse,Tru';
    ASM_END: 'e>';

    ASM_ENUM: 1, 68, 2;
    ASM_DATA: 3, 0;
    ASM_END: 2, 1;

    ASM_TYPEINVAR: 5, 1, 0, 0;

    ASM_EVAL: GETCG, 1, 0, 0;
    ASM_END: 7, 0, 0, 0;

    ASM_VALUE: DAO_INTEGER;
    ASM_END: 100;

    ASM_VALUE: DAO_STRING, 1, '?';
    ASM_END: '';

    ASM_TYPE: 1, 66, 0, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_EVAL: CALL, 1, 0, 1;
    ASM_END: 4, 3, 0, 0;

    ASM_TYPEOF: 1, 0, 0, 0;

    ASM_VALUE: DAO_STRING, 3, 'abc';
    ASM_END: '';

    ASM_GLOBAL: 1, 3, 2, 3;

    ASM_VALUE: DAO_STRING, 5, 'index';
    ASM_END: '';

    ASM_GLOBAL: 1, 0, 5, 3;

    ASM_VALUE: DAO_STRING, 5, 'Klass';
    ASM_END: '';

    ASM_CLASS: 1, 0, 0, 0;
    ASM_END: ;

    ASM_VALUE: DAO_STRING, 12, 'class<';
    ASM_END: 'Klass>';

    ASM_TYPE: 1, 14, 2, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_VALUE: DAO_STRING, 7, 'interf';
    ASM_DATA: 'ace<clas';
    ASM_DATA: 's<Klass>';
    ASM_END: '>';

    ASM_INTERFACE: 2, 0, 0, 0;
    ASM_END: ;

    ASM_TYPE: 6, 11, 5, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_VALUE: DAO_STRING, 0, 'interf';
    ASM_DATA: 'ace<Klas';
    ASM_END: 's>';

    ASM_INTERFACE: 2, 0, 0, 0;
    ASM_END: ;

    ASM_CLASS: 8, 0, 0, 1;
    ASM_BASES: 0, 0, 0, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_VALUE: DAO_STRING, 4, 'name';
    ASM_END: '';

    ASM_CONST: 1, 13, 0, 3;

    ASM_VALUE: DAO_INTEGER;
    ASM_END: 123;

    ASM_VALUE: DAO_STRING, 3, 'int';
    ASM_END: '';

    ASM_TYPE: 1, 1, 0, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_VAR: 15, 3, 1, 3;

    ASM_VALUE: DAO_STRING, 10, 'self:K';
    ASM_END: 'lass';

    ASM_TYPE: 1, 28, 9, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_VALUE: DAO_STRING, 5, 'a:int';
    ASM_END: '';

    ASM_TYPE: 1, 28, 4, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_VALUE: DAO_STRING, 7, '@Metho';
    ASM_END: 'd';

    ASM_TYPE: 1, 65, 0, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_VALUE: DAO_STRING, 2, 'routin';
    ASM_DATA: 'e<self:K';
    ASM_DATA: 'lass,a:i';
    ASM_DATA: 'nt=>@Met';
    ASM_END: 'hod>';

    ASM_TYPE: 1, 17, 2, 0;
    ASM_END: 6, 4, 0, 0;

    ASM_VALUE: DAO_STRING, 6, 'Method';
    ASM_END: '';

    ASM_ROUTINE: 1, 2, 17, 1;
    ASM_END: ;

    ASM_ROUTINE: 1, 3, 18, 1;
    ASM_CONSTS: 2, 0, 0, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_TYPES: 2, 0, 0, 10;
    ASM_END: 1, 12, 0, 0;

    ASM_CODE: 1, 1, 0, 1;
    ASM_END: RETURN, 0, 0, 0;
    ASM_END: ;

    ASM_VALUE: DAO_STRING, 0, 'routin';
    ASM_DATA: 'e<=>Klas';
    ASM_END: 's>';

    ASM_TYPE: 1, 17, 20, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_VALUE: DAO_STRING, 12, 'Klass:';
    ASM_END: ':Klass';

    ASM_ROUTINE: 1, 2, 22, 512;
    ASM_CONSTS: 0, 0, 0, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_TYPES: 0, 0, 0, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_CODE: 1, 1, 0, 1;
    ASM_END: RETURN, 0, 0, 0;
    ASM_END: ;
    ASM_END: ;

    ASM_VALUE: DAO_STRING, 5, '@Func';
    ASM_END: '';

    ASM_TYPE: 1, 65, 0, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_VALUE: DAO_STRING, 0, 'routin';
    ASM_DATA: 'e<=>@Fun';
    ASM_END: 'c>';

    ASM_TYPE: 1, 17, 2, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_VALUE: DAO_STRING, 4, 'Func';
    ASM_END: '';

    ASM_ROUTINE: 1, 2, 0, 0;
    ASM_CONSTS: 0, 0, 0, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_TYPES: 0, 0, 0, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_CODE: 3, 1, 26, 3;
    ASM_DATA: GETVG, 0, 36, 0;
    ASM_DATA: MOVE_XX, 0, 3, 1;
    ASM_END: RETURN, 0, 0, 0;
    ASM_END: ;

    ASM_CONSTS: 0, 0, 0, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_TYPES: 0, 0, 0, 0;
    ASM_END: 0, 0, 0, 0;

    ASM_CODE: 10, 3, 3, 1;
    ASM_DATA: 10, 5, 16, 4;
    ASM_DATA: GETCG, 0, 52, 0;
    ASM_DATA: DATA_I, 1, 123, 1;
    ASM_DATA: GETVG, 0, 37, 2;
    ASM_DATA: SIZE, 2, 0, 3;
    ASM_DATA: ADD_III, 1, 3, 4;
    ASM_DATA: SETVG_II, 4, 36, 2;
    ASM_DATA: GETCG, 0, 35, 5;
    ASM_DATA: CALL, 5, 0, 6;
    ASM_DATA: MOVE_PP, 6, 3, 7;
    ASM_END: RETURN, 0, 0, 0;
    ASM_END: ;
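The string blocks in the samples can be reproduced with a small Python sketch of the 9-byte chunk layout and the string value block from section 4.1. Several things here are assumptions rather than part of the specification: the numeric values of `ASM_VALUE`, `ASM_END`, `ASM_DATA` and `DAO_STRING` are placeholders, the function names are hypothetical, and the rule used to recover the string length from `SizeMod16` (the largest length not exceeding the stored byte capacity that matches `SizeMod16` modulo 16) is inferred from the samples, not stated in the text.

```python
CHUNK_SIZE = 9  # one tag byte followed by eight data bytes

# Placeholder values: the specification names these tags and types but
# does not give their byte values, so these numbers are assumptions.
ASM_VALUE, ASM_END, ASM_DATA = 10, 18, 26
DAO_STRING = 5


def split_chunks(main: bytes) -> list:
    """Split an (unencrypted) main section into 9-byte chunks."""
    if len(main) % CHUNK_SIZE != 0:
        raise ValueError("main section is not a whole number of chunks")
    return [main[i:i + CHUNK_SIZE] for i in range(0, len(main), CHUNK_SIZE)]


def encode_string(text: bytes) -> bytes:
    """Encode a string block: ASM_VALUE, zero or more ASM_DATA, ASM_END."""
    head, rest = text[:6], text[6:]
    out = bytes([ASM_VALUE, DAO_STRING, len(text) % 16]) + head.ljust(6, b"\0")
    # Always emit a closing ASM_END chunk, even when no bytes remain.
    pieces = [rest[i:i + 8] for i in range(0, len(rest), 8)] or [b""]
    for k, piece in enumerate(pieces):
        tag = ASM_END if k == len(pieces) - 1 else ASM_DATA
        out += bytes([tag]) + piece.ljust(8, b"\0")
    return out


def decode_string(chunks: list) -> bytes:
    """Decode a string block that starts at chunks[0]."""
    first = chunks[0]
    if first[0] != ASM_VALUE or first[1] != DAO_STRING:
        raise ValueError("not a string value block")
    size_mod16 = first[2]
    payload = first[3:9]
    for chunk in chunks[1:]:
        payload += chunk[1:9]
        if chunk[0] == ASM_END:
            break
    else:
        raise ValueError("string block is not closed by ASM_END")
    # Inferred rule: trim the zero padding so that the length is the
    # largest value <= capacity congruent to SizeMod16 modulo 16.
    capacity = len(payload)
    length = capacity - ((capacity - size_mod16) % 16)
    return payload[:length]
```

Under these assumptions, `encode_string(b"Hello Dao!")` produces two chunks carrying `'Hello '` and `'Dao!'` with a size byte of 10, which matches the first sample, and the 23-byte name `interface<class<Klass>>` occupies four chunks with a size byte of 7, as in the second sample.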