KDevelop/KDevelop-PG-Qt Development Guide
This article is about working on KDevelop-PG-Qt itself, though it might also be useful for advanced KDevelop-PG-Qt users. For an introduction to KDevelop-PG-Qt, see Development/KDevelop-PG-Qt Introduction.
Ideas behind KDevelop-PG-Qt
A Recursive Descent Parser
The idea behind the generated parser is called "Recursive Descent". There is exactly one function for every symbol. The parser tries to replace the start-symbol with other symbols; the process ends when a rule uses a terminal. It is a kind of depth-first search: for every sub-symbol the parser invokes the corresponding function. More precisely, KDevelop-PG-Qt generates an LL(1) parser, which means it looks at exactly one token to decide which sub-symbols to choose.
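The following hand-written sketch illustrates the "one function per symbol, one token of lookahead" idea for a toy grammar. It is not output of KDevelop-PG-Qt; the names `Token`, `ToyParser`, `parseExpr`, `parseTerm` and `accepts` are made up for this example, and standard-library containers are used instead of Qt classes only to keep the block self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical token kinds for a toy grammar:
//   expr -> term ('+' term)*
//   term -> NUMBER | '(' expr ')'
enum class Token { Number, Plus, LParen, RParen, End };

class ToyParser {
public:
    explicit ToyParser(std::vector<Token> tokens) : m_tokens(std::move(tokens)) {
        m_tokens.push_back(Token::End);  // sentinel, so peek() is always valid
    }

    // Parse the start-symbol and require that all input was consumed.
    bool parse() { return parseExpr() && peek() == Token::End; }

private:
    // The single token of lookahead that every decision is based on.
    Token peek() const {
        assert(m_pos < m_tokens.size());
        return m_tokens[m_pos];
    }
    void advance() { if (peek() != Token::End) ++m_pos; }

    // One function for the symbol "expr".
    bool parseExpr() {
        if (!parseTerm()) return false;
        while (peek() == Token::Plus) {  // LL(1): the next token decides whether to repeat
            advance();
            if (!parseTerm()) return false;
        }
        return true;
    }

    // One function for the symbol "term".
    bool parseTerm() {
        switch (peek()) {  // LL(1): the next token selects the alternative
        case Token::Number:
            advance();
            return true;
        case Token::LParen:
            advance();
            if (!parseExpr()) return false;
            if (peek() != Token::RParen) return false;
            advance();
            return true;
        default:
            return false;
        }
    }

    std::vector<Token> m_tokens;
    std::size_t m_pos = 0;
};

// Convenience wrapper: does the token sequence match the toy grammar?
inline bool accepts(const std::vector<Token>& tokens) {
    return ToyParser(tokens).parse();
}
```

Calling `accepts` with the tokens for `1 + (2 + 3)` succeeds, while a sequence ending in a dangling `+` is rejected, because `parseTerm` finds no alternative that starts with the end-of-input token.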
Inject your own hand-written code
A strength of KDevelop-PG-Qt is the ability to "inject" hand-written code nearly everywhere. That is why developers of KDevelop-PG-Qt should pay attention to the following aspects:
- When implementing new constructs in the grammar, try to make it possible to use custom code wherever it could make sense.
- Allow the user to understand the generated code and to predict the output: if you ignore this advice, the user will not be able to inject code quickly.
- Do not manipulate the rules; this would go against the previous point. E.g. it would not be good to shift symbols and tokens to resolve conflicts automatically without asking the user. It would no longer be obvious to the user where the custom code will be inserted.
- Do not change the names of functions and variables in the generated classes.
- Even ASTs etc. sometimes need custom code.
There is also another aspect: grammar-constructs are better than C++-constructs, so feel free to replace common C++-code with KDevelop-PG-Qt-constructs!
KDevelop-PG-Qt's generated sources follow the visitor pattern. That is their focus: it should be possible to visit the parse-tree more than once, and it should be simple to implement visitors.
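As a rough illustration of why the visitor pattern makes repeated traversals cheap to write, here is a minimal sketch. All class and member names (`AstNode`, `NumberAst`, `SumAst`, `Visitor`, `visitNode`, `LiteralCounter`, `Evaluator`) are assumptions made for this example and are not claimed to match the classes KDevelop-PG-Qt actually generates.

```cpp
#include <cassert>

// Hypothetical, heavily simplified AST: every node carries a kind tag.
struct AstNode {
    enum Kind { NumberKind, SumKind };
    explicit AstNode(int k) : kind(k) {}
    int kind;
};

struct NumberAst : AstNode {
    explicit NumberAst(int v) : AstNode(NumberKind), value(v) {}
    int value;
};

struct SumAst : AstNode {
    SumAst(AstNode* l, AstNode* r) : AstNode(SumKind), left(l), right(r) {}
    AstNode* left;
    AstNode* right;
};

// Base visitor: dispatches on the kind tag and walks children by default.
class Visitor {
public:
    virtual ~Visitor() = default;

    void visitNode(AstNode* node) {
        if (!node) return;
        switch (node->kind) {
        case AstNode::NumberKind: visitNumber(static_cast<NumberAst*>(node)); break;
        case AstNode::SumKind:    visitSum(static_cast<SumAst*>(node));       break;
        }
    }

protected:
    virtual void visitNumber(NumberAst*) {}
    virtual void visitSum(SumAst* node) {
        visitNode(node->left);
        visitNode(node->right);
    }
};

// First pass over the tree: count the number literals.
class LiteralCounter : public Visitor {
public:
    int count() const { return m_count; }
protected:
    void visitNumber(NumberAst*) override { ++m_count; }
private:
    int m_count = 0;
};

// Second pass over the same tree: evaluate the expression.
class Evaluator : public Visitor {
public:
    int result() const { return m_result; }
protected:
    void visitNumber(NumberAst* node) override { m_result = node->value; }
    void visitSum(SumAst* node) override {
        visitNode(node->left);
        const int lhs = m_result;   // remember the left value before visiting the right side
        visitNode(node->right);
        m_result = lhs + m_result;
    }
private:
    int m_result = 0;
};
```

Each new analysis only overrides the `visit...` functions it cares about, and the same tree can be handed to any number of visitors without being rebuilt.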
Recursive Descent Parsers have a serious problem. Consider the operators in C++: they have as many as 18 different priorities (from "," to "::"). An RDP would theoretically (there are certainly some tricks to reduce it for C++) need 18 different symbols (from "CommaExpression" to "Name"), and every expression would be parsed by invoking 18 functions down to the terminal, even if the expression is a plain numeric literal. But this may be the smaller problem: the simple number will also need 18 AST-nodes. Each node would need a type-identifier, the start-position, the end-position, a pointer to a list of its one and only child, and the ListNode; on a 64-bit system you would need 48 bytes for each node. The PHP plugin also introduces a DUContext-pointer, which means 56 bytes per node. That is a lot ;).

But there is an alternative: walk through the tokens and construct the tree beginning at the bottom, adding each token to the tree in the right way. The tree will grow but the leaves will stay, which is why it is called "Bottom-Up Parsing". I am currently implementing support for this kind of parsing for arithmetic expressions.
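To make the bottom-up idea concrete, here is a small operator-precedence sketch in the style of the shunting-yard algorithm. It is only an illustration of the general technique, not the algorithm being implemented in KDevelop-PG-Qt. The types `Node` and `Tok` and the functions `precedence`, `reduce`, `parseExpression` and `countNodes` are invented for this example, only two priority levels are modelled, and the input is assumed to be a well-formed alternation of literals and binary operators.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical tree node: op == 0 marks a literal.
struct Node {
    char op;
    int value;  // only meaningful for literals
    std::unique_ptr<Node> left, right;
};

// Hypothetical token: either a literal or a binary operator.
struct Tok {
    bool isOp;
    char op;
    int value;
};

// Two priority levels are enough for the illustration.
inline int precedence(char op) { return (op == '*' || op == '/') ? 2 : 1; }

// Pop one operator and its two operands, push the combined subtree.
inline void reduce(std::vector<std::unique_ptr<Node>>& operands, std::vector<char>& ops) {
    assert(operands.size() >= 2 && !ops.empty());
    std::unique_ptr<Node> right = std::move(operands.back());
    operands.pop_back();
    std::unique_ptr<Node> left = std::move(operands.back());
    operands.pop_back();
    std::unique_ptr<Node> parent(new Node{ops.back(), 0, std::move(left), std::move(right)});
    ops.pop_back();
    operands.push_back(std::move(parent));
}

// Walk through the tokens once and grow the tree from the leaves upwards.
inline std::unique_ptr<Node> parseExpression(const std::vector<Tok>& tokens) {
    std::vector<std::unique_ptr<Node>> operands;
    std::vector<char> ops;
    for (const Tok& t : tokens) {
        if (!t.isOp) {
            // A literal becomes a leaf immediately.
            operands.push_back(std::unique_ptr<Node>(new Node{0, t.value, nullptr, nullptr}));
        } else {
            // Operators that bind at least as tightly are combined first (left-associative).
            while (!ops.empty() && precedence(ops.back()) >= precedence(t.op))
                reduce(operands, ops);
            ops.push_back(t.op);
        }
    }
    while (!ops.empty())
        reduce(operands, ops);
    assert(operands.size() == 1);
    return std::move(operands.back());
}

// Number of nodes in the tree rooted at n.
inline int countNodes(const Node* n) {
    return n ? 1 + countNodes(n->left.get()) + countNodes(n->right.get()) : 0;
}
```

In this sketch a plain numeric literal yields a tree with a single node, independent of how many priority levels exist, whereas a chain of one recursive-descent function per priority level would create one node per level.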
That does not mean that parsers should use signals and slots or that templates are forbidden, but KDevelop-PG-Qt was designed to integrate easily with Qt. So it should use Qt's classes like QString and QList.
Structure of the code
We plan to make a 0.9 release in the next few weeks, but we do not know more than that.
Ideas for the future
- Complete Bottom-Up Parsing (Difficulty: high, priority: high, in progress)
- Add a short-form for (expression|0) (Difficulty: low, priority: nice to have) DONE
- Complete the alias-support (Difficulty: ?, probably medium, priority: low)
- Make it more OOPish, remove global variables (Difficulty: high, priority: ask milian)
- Use QList etc. instead of ListNodes, but take care to preserve compatibility; it should be optional (Difficulty: low, priority: nice to have, it is for optimization)
- Use object-oriented Bison (Difficulty: low, priority: low)
- Use forward-declarations in parser.h (Difficulty: low, priority: low)
- Bootstrapping (Difficulty: high, priority: low/ask milian)
- tree2tree translators, e.g. make it easy to create DUChain-stuff (Difficulty: high, priority: low)
- Support languages other than C++, see ANTLR (Difficulty: high, priority: low, would need more OOP)
- Resolve conflicts automatically (but ask the user!), maybe give hints how to resolve them (Difficulty: medium/high, priority: low)
- Use unions; be careful, this should be optional (Difficulty: medium, priority: medium; it is for optimization and could save a lot of memory. There is already an algorithm to compute the perfect layout in svn, see https://barney.cs.uni-potsdam.de/mailman/private/kdevelop-devel/2010-April/037096.html for some implementation details)
- A KDevelop-plugin (Difficulty: medium, priority: nice to have, maybe it would be much easier with more OOP)
- Implement a tokenizer (Difficulty: medium/high, priority: low)
- Implement --symbol-text (like --token-text) (Difficulty: low, priority: would be good for debugging)
- Default-implementation for expectedToken and expectedSymbol (would require --symbol-text, difficulty: low, priority: medium)
- LALR-parsing (Difficulty: ultra-high, priority: low)
- AST forward-declarations (Difficulty: low, priority: nice to have for optimizations)
- Check if files really changed (Difficulty: low, priority: nice to have for optimizations)