Overview
For this project, you will turn the recognizer you built for PA4 into a full-fledged parser. In other words, you will
- Reconstruct the parse tree from the parse edges. This means keeping track of how edges were derived during the parsing process.
- Complete the diagnosis of ambiguous grammars. Now that you have the
parse tree, you can finish the implementation of associativity for
%left and %right directives.
- Execute the syntax-directed translation (aka semantic actions).
This will allow you to make translate the parse tree to an AST or to
other interesting applications.
- Rejoice: you now have a complete interpreter for our CS164 language. :)
Teamwork and Repositories
You will work in the same groups as in PA4 and use the same repository. If your team has changed in any form, please fill out this form.
Implementation
In the file parser_generator.py from your PA4, you must implement a function makeParser(grammar.Grammar)
, returning an object with a parse(string)
method instead of recognize(string)
method. This parse()
method should return the semantic value (val
)
of the start symbol from the grammar (which is the root of the parse
tree), after parsing and SDT (if the input string is valid).
(Compared to PA4, in PA5, when ambiguities remain, pick one parse
tree arbitrarily but do NOT raise an exception. To ease grammar
debugging, you may want to print a diagnostic message when you encounter
two conflicting unresolved edges (for your own information), but
suppress the message before submitting.)
When your parser generator encounters an
error (such as getting a syntactically invalid input), you should print
an error to sys.stderr
and exit with a non-zero code. We have provided the function util.error()
for printing the error. Please see the util
module documentation
We have broken down the implementation of the second part of your
parser generator into the steps below. These steps are merely meant to
be a helpful guide; you are free to follow your own plan.
You can implement this part of the parser generator however you want.
You can add as many functions, classes, and methods as you wish to the
files from the starter kit. We only require that you don't change or
remove any of the classes or methods already in the starter kit. These
form your parser-generator API, which we will use in later projects. We
will also use that API to test your solution to this project.
Step 1: Reconstruct Parse Trees
We recommend you implement this as soon as possible. Seeing
reconstructed parse trees can make debugging the later stages much
easier. Along with reconstructing parse trees, you need to write code
that visualizes parse trees with graphviz. This will be useful for
debugging.
Our tree reconstruction took about 25 lines of code; printing parse trees took about 10. Hints:
- Reconstruct the parse tree (rooted at an edge) at the point
when you are inserting the edge into the graph. Note that you need to
reconstruct the parse tree before disambiguation of this edge with an
existing edge.
- All edges, not just complete edges, need to keep track of
children. As you move the dot from E+.E to E+E., the latter edge
receives a copy of references to children stored at the former edge.
The copy of children is expensive. Do it lazily by wrapping
the copy code in a lambda and invoking the lambda at the latest
convenient moment. No need to do this.
Step 2: Complete Disambiguation
You have already done operator precedence for PA4. Now that you have
the parse tree, you can do associativity easily. Please review the
detailed notes of PA4 on
disambiguation. Our core disambiguation logic required about 30 lines
of code. The associated bookkeeping added about another 20 lines.
Hint: Recall that you disambiguate between E->E1+E and E->E2-E by comparing how much input was derived from E1 vs from E2.
Step 3: Execute Semantic Actions
Start by hard-wiring actions into your generated parsers (e.g., E.action = lambda n1,n2: n1.val+n2.val
). Once your hard-coded SDT is correct, use the actual actions in the Grammar
objects. The given code from PA4 has already created the function bodies in the Grammar
objects for you. It is stored as actions
field in Production
object.
Semantic Actions
Synthesized Attributes
The following grammar specification is intended to evaluate simple arithmetic expressions:
%left '+'
%%
E -> E '+' E %{ return n1.val + n3.val %} // S
1
| /[0-9]+/ %{ return int (n1.val) %} // S
2
;
This specification contains two semantic actions: S1 and S2. They compute the translation (value) of the left-hand-side E from the values of E's children.
In the action for E -> E '+' E, we refer to the second right-hand-side E's value with n3.val because the '+' symbol also has a value, n2.val (see below).
For example, given the input 1+2+3, your parser would produce the parse tree and syntax-directed translation below:Grammar productions without actions get the default action:
%{ return n1.val %}
This convention is adopted from bison. As you might have
guessed, in semantic actions, the objects containing the attributes of
each production's symbols can be referred to by ni. The left-hand-side symbol is n0, and the right-hand-side symbols are n1, n2, etc., from left to right.
Action Creation and Execution
Each action (e.g., %{ return
n1.val %}) defines the body of a Python function. Each action must be
syntactically valid Python were it to appear as [BODY]
in the following statement:
def foo ():[BODY]
Thus, the action in this grammar is valid:
%%
S -> A B %{
if n1.val:
return 3
%} ;
...
but this is invalid:
%%
S -> A B %{ if n1.val:
return 3
%} ;
...
As shown in the examples above, actions can access the attributes of
all the symbols in the right-hand side of their productions. Actions
can also access the attributes of the left-hand side symbol. You will
pass these symbol-attribute objects as arguments to the Python function
created from the action. These function parameters must be named:
(n0) (n1) (n2) (n3)
L -> A B C ... ;
When you call action functions during SDT, each ni
argument should reference its parse-tree node's attribute object. So an action such as %{ n1 = 0 %}
only sets the action-function's local variable n1
to be 0; no parse-tree node's attribute object is modified, and n1's original reference to the parse-tree node is lost to the action.
Calling an action has the effect: n0.val = s_action (n0, n1, n2, ...)
.
In other words, your parser code that performs the SDT must set the
synthesized attribute of the LHS symbol's node to the value returned by
its action.
The attribute objects referenced by ni
must have at least one field: val
, set initially to None
. By convention, val
is a node's synthesized attribute.
Actions must execute in their own Python namespace, shared among all
actions. This namespace must contain all the modules declared by %import
in the grammar specification. We provide the util
module to assist you with creating this namespace and these Python functions. See the util
documentation for example code that creates a namespace and a function.
If an action body is syntactically invalid, Python will raise an
exception when you attempt to dynamically create a function from it. You
should treat this as an error in the grammar specification: call util.error()
with an error message. If an action raises an exception during SDT, you should also call util.error()
with an error message.
Consider this grammar:
E -> Integer '+' Integer %{ return ['+', n1.val, n3.val] %}
;
Integer -> /[0-9]+/ %{ return int(n1.val) %} ;
Let p
be a Production
object that represents E -> Integer '+' Integer
. You can execute the corresponding action body by calling p.actions(result,a,b)
, where result
, a
, and b
must be objects with val
field; essentially, result
, a
, and b
are passed to the action function as n0
, n1
, and n2
respectively.
If you've never used python feature for a calling function with a variable number of parameters, this article might be useful for you.
Epsilons and Semantic Actions
If you match an epsilon with an SDT rule, its value must be None. Consider a small grammar:
%%
S -> 'A' T;
T -> _ %{ print 'Epsilons are represented as ' + repr(n1.val) %};
It will print Epsilons are represented as None
, when parsing the string 'A'.
Deliverables
- Your complete parser, with SDT.
- Submit three test grammars (each with
semantic rules) and three inputs for each grammar. Name these folders
grammar1, grammar2, and grammar3.
Logistics
Starter-kit
You can fetch the starter-kit by running the following Mercurial commands:
- hg pull https://<your username>@/cs164_overlord/cs164sp13
- hg merge or hg update, whichever Mercurial tells you to run.
This will create in your repository a new PA5 directory containing
the starter-kit. Then, copy over the following files from your PA4
project
- grammar.py
- grammar_parser.py
- parser_generator.py
Finally, one can invoke the parser with
$ ./main.py test.grm input file
Output
Your parser must produce the following output
- Print the AST, as produced by the SDT rules, when there exists a parse tree. If there is an unresolved ambiguity, pick any parse tree.
- Print Error otherwise (i.e. when there is no parse tree).
Errata and Bug fixes
Once a bug is discovered in the starter-kit, we will publish a
corrected version on Bitbucket. Then, you can update your starter-kit
(and get the bugfix) with following commands:
- hg pull https://<your username>@/cs164_overlord/cs164sp13
- hg merge or hg update, whichever Mercurial tells you to run.
Submission
Autograder
We will use the same autograder as for the previous PAs. The same rules apply:
You have to make sure your interpreter plays nice with the autograder. Do not modify main.py,
or change the way the interpreter is invoked. Printing debugging
information will interfere with the autograder. Make sure you remove all
debugging print statements before submitting.
Using The Autograder With Your Own Tests
The structure of the tests directory has slightly changed. Now, each subfolder of tests
represent a language. Inside, the grammar is always named grammar.grm.
All the other files are input/output pairs. For example:
./tests/simple/grammar.grm
./tests/simple/1.in
./tests/simple/1.out
./tests/simple/2.in
./tests/simple/2.out
This defines a language simple, whose grammar is grammar.grm with 2 input/output pairs.
Using Our Remote Parser as Reference
You can query our remote parser using rparse.py as follow:
./rparse.py -g your_grammar.grm < file_containing_input_to_parse
./rparse.py -g your_grammar.grm "
input to parse"
The Submission Itself
You will submit by tagging the revision you want us to grade with the following tag: "PA5_SUBMIT". Case matters. To do so, please follow the instructions below:
- Make sure your interpreter plays nice with the autograder by running ./grader.py. Make sure you pass all the tests. To do so, all debugging print statements must be removed.
- Please don't forget to
add all your files to the repository (use hg add). This includes the
parser code, the grammars, and the test files.
- Commit with "hg commit -m message_here"
- Add the tag with the command "hg tag PA5_SUBMIT"
- Publish the project on the server with "hg push"
- Go to https:///<username>/cs164sp13/changesets
You should see the tag in the timeline. If you do, you are all set.
If you have discovered a last minute bug, you can remove the tag as follows:
- Run "hg tag --remove PA5_SUBMIT"
- Publish with "hg push"
Then you can keep working on the project and add the tag again when you are ready to submit.