HW#3 (Need help)
For the coding portion of the homework, I want you to write an interpreter in Rust for the S-IMP language defined below. Before you begin coding you should make a LL- grammar and attribute grammar for the language. The attribute grammar should have inherited attributes. Put your grammars in a file grammars.txt you include in your HW zip file. Your interpreter will be compiled by switching in to a Rust subfolder of your Hw3.zip file and typing:
cargo build
Your interpreter will be run from a command prompt with a syntax like:
s-imp name_of_program_file [args]
For example,
s-imp chess_program.simp s-imp draw-pattern 10 50
You can assume the grader has set up a path environment variable to correctly find your s-imp program above.
Your interpreter might output one of two error messages: SYNTAX ERROR, if a line a couldn’t be parsed, or SEMANTIC ERROR, in situations described below. If an error is output you should precede it with the line that caused the error.
Your interpreter should maintain a global symbol table. You can assume the supplied program file can fit in RAM and so you can read it entirely before processing . Your interpreter should then process the supplied program file line by line, for example in a while loop. To process a line it should parse that line using a recursive descent parser. One of the arguments to the recursive functions you call should be the attribute arguments for inherited attributes you need to pass down the recursion. As part of parsing a line your program may need to update the symbol table, read from, write to the standard input/output streams. Before executing any lines, your program should initialize S-IMP variables X0, X1, X2 according to the command line arguments. I.e., for the second example above, X0 has value the string s-imp; X1, the string draw- pattern; X2, the string 10; X3 the string 50.
A S-IMP program is a sequence of S-IMP statements delimited by newlines.
A statement is a number followed by one or more non-newline whitespace characters followed by either an assignment_statement, a branch_statement,
a comment_line, an input_statement, or a print_statement. It is a SYNTAX error if two statements begin with the same NUMBER.
A number consists of 1 or more of the digits 1-9. For example, 0553.
An assignment_statement consists of a variable followed by = followed by an expression. The intended meaning of an assignment is that the left hand variable after execution should have attribute values as computed from the expression. Let’s spend a moment to describe the parts of an assignment before proceeding to the next statement type.
A variable is as in HW2. For this HW variables have both a value and a type attribute. The type can be one of NUMBER or STRING.
An expression can be one of a number, a string literal (same as in HW2), a variable, the keyword ISSET followed followed by at least one whitespace character and a variable, the keyword NUMBER followed by at least one whitespace character and a variable, the keyword STRING followed by at least one whitespace character and a variable, a variable followed by at least one whitespace character followed by an operation followed by at least one whitespace character followed by a variable. Here an operation is either the + symbol or the – symbol. Your program should output SEMANTIC ERROR on expressions involving variables that are not yet assigned and program execution should stop. Below are the intended meanings of the each expression possibility:
The intended meaning of a number literal is the number written and its type is NUMBER.
Similarly, intended meaning of a string literal is the string written and its type is STRING.
The type of ISSET some_var is NUMBER and it should have value 1 is some_var was ever assigned a value and 0 otherwise.
The type of NUMBER some_var is NUMBER and the intended meaning of NUMBER some_var is some_var if some_var is a NUMBER, and if some_var is a string with only digits for characters then the corresponding number, otherwise 0. For example, if X2 was “123”, NUMBER X2 is 123. if X0 was “hell0 123”, NUMBER X0 is 0.
The type of STRING some_var is STRING and the intended meaning of STRING some_var is some_var if some_var is of type STRING, and is the string version of a number if some_var has type NUMBER. I.e., 12345 would be “12345”.
For two STRING variables, Xm + Xn is a STRING whose value is their concatenation, Xm – Xn is a STRING where all occurrences of Xn in Xm have
been replaced with the empty string. I.e., “Yellow tallow” – “ll” has value “Yeow taow”.
For two NUMBER variables Xm operation Xn is a NUMBER with value the maximum of that operation applied the two numbers and 0. I.e., if X0 was 5 and X1 was 6, X1 + X0 would have value 11, X1 – X0 would have value 1, and X0 – X1 would have value 0.
If we have a situation Xm operation Xn where Xm is a NUMBER and Xn is a STRING, a SEMANTIC error should be output and the program should stop.
Xm + Xn where Xm is a STRING variable and Xn is a NUMBER variable has value all of the characters in Xm beginning with the Xn’th character through the end of the string. For example, if Xm is “Hello World” and Xn is 1, the Xm + Xn is “ello World”.
Xm – Xn where Xm is a STRING variable and Xn is a NUMBER variable has value all of the characters in Xm except the Xn’th character through the end of the string. For example, if Xm is “Hello World” and Xn is 1, then Xm – Xn is “H”.
Returning to out discussion of statement types…
A branch_statement consists of the keyword BRANCH followed by at least one whitespace character, followed by two whitespace separated variables followed by a number. For example, BRANCH X1 X2 10. The intended meaning of such a line is if the two variables have the same type and value, the next line to execute should be number and execution should continue from there. If the two variables don’t have the same type or don’t have the same value, then we should treat this statement as doing nothing.
A comment_line consists of the letter C followed by a whitespace character, followed by any printable character string till the next newline character. Your interpreter should do nothing when evaluating such a line.
An input_statement consists of the keyword INPUT followed by at least one whitespace character followed a variable. For example, INPUT X55 . This should read from stdin a line and assign it to the variable X55 as a STRING.
A print_statement consists of the keyword PRINT followed by at least one whitespace character followed a variable followed by at least one whitespace character followed by an attribute. An attribute is either the keyword VALUE or the keyword TYPE. For example, PRINT X1 VALUE or PRINT X099 TYPE. The intended meaning of this command is to output to stdout either the value or the type of the given variable.
Below is an example S-IMP program to print hello world ten times:
010 C 020 C Hello World Program 030 C 040 X0 = 1 050 X1 = 10 055 X2 = 1 056 X3 = 0 060 X4 = “hello worldn” 065 PRINT X4 VALUE 070 X0 = X0 + X2 080 X5 = X0 – X1 090 BRANCH X5 X3 065
Point Breakdown
cargo build compiles the program. program when run reads file and tries to interpret and give error messages as described above 1pt
grammar.txt as described 1pt Statement parsing is recursive descent according to attribute grammar in grammar.txt. Inherited attributes are passed down in any recursive calls. 1pt
comment_lines, input_statements, and print_statements work as described (0.5pts each) 1.5pt
branch_statements work as described 1pt assignment_statements work as described 1.5pt