Walk the parse tree
This page is a recipe book for traversing the AST that TGSqlParser.parse() produces. Each recipe is a self-contained, copy-pasteable example that you can adapt. If you want the conceptual picture first (what the AST is, how nodes are wired, what TParseTreeVisitor actually guarantees), start with Parse-tree internals.
The two ways to walk
Every recipe on this page picks one of two strategies:
| Strategy | When to use it |
|---|---|
| Visitor pattern (TParseTreeVisitor) | You want to find or process every node of some type anywhere in the tree (every function call, every column reference, every join). Override preVisit(SomeType); the framework dispatches by node type. |
| Manual recursion | You know the exact path you want to walk (the FROM clause's first table's join condition's left side). Just access typed properties: stmt.joins.getJoin(0).LeftTable. |
Visitors win for "find all X". Manual recursion wins for "decode this specific structure". Most non-trivial walks mix both.
Setup boilerplate (used by every recipe)
All recipes assume this scaffolding around them. Replace the SQL and the dialect to taste:
If your script has more than one statement separated by ;, the loop visits each one in turn. Most recipes below show what to do inside the loop.
Recipe 1 — Print every function call in a script
Classic use case: find every COUNT, SUM, LENGTH, MY_CUSTOM_FUNC etc. that appears anywhere — projections, WHERE clauses, ORDER BY, function arguments, everywhere.
For SQL like:
You get:
Why acceptChildren, not accept? accept calls preVisit on the node itself and then on its children; acceptChildren skips the self-visit. Since TSelectSqlStatement is not a TFunctionCall, the difference is invisible here. It matters when you start at a node that is itself the type you're filtering for: accept includes that root in the results, while acceptChildren reports only its descendants.
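To see the difference in isolation, here is a minimal sketch that models only the accept / acceptChildren contract described above. MiniNode, MiniVisitor, and AcceptDemo are hypothetical stand-ins written for this illustration, not the parser's own classes, and the sketch assumes children are visited through their own accept.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins (not the parser's classes): they model only the
// accept / acceptChildren contract described in the text.
public class MiniNode
{
    public string Kind;
    public List<MiniNode> Children = new List<MiniNode>();

    public MiniNode(string kind, params MiniNode[] children)
    {
        Kind = kind;
        Children.AddRange(children);
    }

    // Visit this node first, then descend.
    public void Accept(MiniVisitor v)
    {
        v.PreVisit(this);
        AcceptChildren(v);
    }

    // Skip the self-visit and go straight to the children.
    public void AcceptChildren(MiniVisitor v)
    {
        foreach (var child in Children) child.Accept(v);
    }
}

public class MiniVisitor
{
    public int FunctionCalls;

    public void PreVisit(MiniNode node)
    {
        if (node.Kind == "FunctionCall") FunctionCalls++;
    }
}

public static class AcceptDemo
{
    // Models UPPER(TRIM(name)): the root is itself a function call.
    public static MiniNode BuildTree() =>
        new MiniNode("FunctionCall",
            new MiniNode("FunctionCall",
                new MiniNode("Column")));

    public static int CountWithAccept()
    {
        var v = new MiniVisitor();
        BuildTree().Accept(v);
        return v.FunctionCalls;   // root plus the nested call
    }

    public static int CountWithAcceptChildren()
    {
        var v = new MiniVisitor();
        BuildTree().AcceptChildren(v);
        return v.FunctionCalls;   // nested call only
    }
}
```

With this tree, CountWithAccept() returns 2 and CountWithAcceptChildren() returns 1. Which one is "right" depends on whether the starting node belongs in your results.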
Recipe 2 — List every table referenced (any statement type)
TCustomSqlStatement exposes .tables, a TTableList populated for SELECT, INSERT, UPDATE, DELETE, and MERGE. Each entry is a TTable with a TableName (a TObjectName — schema.table.alias).
For:
You get:
This is the "right answer" for nine out of ten cases that look like "give me all tables". For sub-queries, CTEs, and lateral joins, .tables flattens to the same list — so you don't need to recurse manually.
Recipe 3 — List every column referenced, with its qualifying table
For:
You get:
TObjectName is the universal "named thing" node. ObjectType tells you what kind: column, table, function, schema, database, index, etc. Filtering on it is how you say "I want columns specifically".
Recipe 4 — Walk the SELECT list, the FROM list, and the WHERE clause separately
When you know the statement is a SELECT and want clause-by-clause access:
The TSelectSqlStatement shape continues with GroupByClause, HavingClause (under GroupByClause), OrderbyClause, LimitClause, FetchFirstClause, WindowClause, ForUpdateClause, etc. Hover over select. in your IDE to see the full surface.
Recipe 5 — Find every WHERE clause, anywhere (incl. subqueries)
Subqueries inside SELECT, INSERT, UPDATE, DELETE all carry their own TWhereClause nodes. Walk the tree once with a visitor:
For:
You get two lines — the outer WHERE and the inner one — because the visitor descends into the subquery automatically.
Recipe 6 — Walk an expression tree (using IExpressionVisitor)
When you already hold a TExpression (a WHERE condition, a CASE WHEN, a function argument) and want to enumerate its operands:
inOrderTraverse does an in-order walk (left operand → operator → right operand). For salary > 50000 AND department_id = 10 you get seven visits: salary, >, 50000, AND, and then the right side in the same order (department_id, =, 10).
IExpressionVisitor is the right tool when the granularity you care about is operators and operands, not "every kind of node". Use TParseTreeVisitor instead when the answer is "every kind of node".
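The visit order is easiest to check on a toy tree. The sketch below assumes a hypothetical Expr class with Left and Right operands (not the library's TExpression) and implements the left, node, right order described above. Its callback returns bool, loosely modeled on the exprVisit convention of returning false to stop the walk.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical expression node (not TExpression): a leaf holds only Text;
// an operator node holds the operator in Text plus Left and Right operands.
public class Expr
{
    public string Text;
    public Expr Left, Right;

    public Expr(string text, Expr left = null, Expr right = null)
    {
        Text = text; Left = left; Right = right;
    }

    // In-order walk: left operand, this node, right operand.
    // The callback returns false to stop the whole walk early.
    public bool InOrder(Func<Expr, bool> visit)
    {
        if (Left != null && !Left.InOrder(visit)) return false;
        if (!visit(this)) return false;
        if (Right != null && !Right.InOrder(visit)) return false;
        return true;
    }
}

public static class InOrderDemo
{
    // salary > 50000 AND department_id = 10
    public static Expr Build() =>
        new Expr("AND",
            new Expr(">", new Expr("salary"), new Expr("50000")),
            new Expr("=", new Expr("department_id"), new Expr("10")));

    // Collects every node the walk reaches, in visit order.
    public static List<string> VisitAll()
    {
        var seen = new List<string>();
        Build().InOrder(e => { seen.Add(e.Text); return true; });
        return seen;
    }

    // Stops the walk right after the first node whose text matches.
    public static List<string> StopAt(string text)
    {
        var seen = new List<string>();
        Build().InOrder(e => { seen.Add(e.Text); return e.Text != text; });
        return seen;
    }
}
```

VisitAll() yields salary, >, 50000, AND, department_id, =, 10. StopAt("AND") yields only the first four, because the false return propagates up through every pending recursive call.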
Recipe 7 — Track ancestor context with a visitor stack
TParseTreeNode has no parent pointer. If you need to know "what node am I inside right now", maintain a stack of your own:
Pattern: override preVisit(T) to push, postVisit(T) to pop, on every node-type whose context you care about. Override the inner preVisit(...) you actually want to act on, and read the stack.
This is also how you find the depth of nesting (ancestors.Count), reconstruct the path (string.Join(" → ", ancestors.Select(a => a.GetType().Name))), or detect "is this column in a subquery?".
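The push/pop pattern can be sketched without the parser at all. TreeNode, PathVisitor, and AncestorDemo below are hypothetical names invented for this illustration; the sketch assumes a pre/post callback pair around each node's children. It keeps the ancestors in a List<T> used as a stack, because Stack<T> enumerates from the most recent push and would print the path innermost-first.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical node with no parent pointer, mirroring the situation above.
public class TreeNode
{
    public string Name;
    public List<TreeNode> Children = new List<TreeNode>();

    public TreeNode(string name, params TreeNode[] children)
    {
        Name = name;
        Children.AddRange(children);
    }

    public void Accept(PathVisitor v)
    {
        v.PreVisit(this);
        foreach (var child in Children) child.Accept(v);
        v.PostVisit(this);
    }
}

public class PathVisitor
{
    // List used as a stack so enumeration runs root-first.
    private readonly List<TreeNode> ancestors = new List<TreeNode>();
    public readonly List<string> ColumnPaths = new List<string>();

    public void PreVisit(TreeNode node)
    {
        if (node.Name == "Column")
        {
            // Read the stack before pushing: it holds only the ancestors.
            var path = string.Join(" → ", ancestors.Select(a => a.Name));
            ColumnPaths.Add(path + " (depth " + ancestors.Count + ")");
        }
        ancestors.Add(node);                       // push on the way down
    }

    public void PostVisit(TreeNode node)
    {
        ancestors.RemoveAt(ancestors.Count - 1);   // pop on the way up
    }
}

public static class AncestorDemo
{
    public static List<string> Run()
    {
        var tree = new TreeNode("Select",
            new TreeNode("Column"),
            new TreeNode("Where",
                new TreeNode("Subquery",
                    new TreeNode("Select", new TreeNode("Column")))));
        var v = new PathVisitor();
        tree.Accept(v);
        return v.ColumnPaths;
    }
}
```

Run() returns two entries: "Select (depth 1)" for the outer column and "Select → Where → Subquery → Select (depth 4)" for the one inside the subquery. Checking whether the path contains a Subquery entry answers the "is this column in a subquery?" question.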
Recipe 8 — Stop walking early
TParseTreeVisitor has no built-in "abort" mechanism. If you want to short-circuit, throw a private exception and catch it at the call site:
IExpressionVisitor, by contrast, supports early-out natively — exprVisit returning false stops the walk.
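A minimal sketch of the throw-and-catch approach, using hypothetical WalkNode and FirstMatchVisitor classes rather than the real visitor base class. The marker exception is a private nested type, so no other code can throw it and catching it cannot hide a genuine error.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type (not a parser class).
public class WalkNode
{
    public string Kind;
    public List<WalkNode> Children = new List<WalkNode>();

    public WalkNode(string kind, params WalkNode[] children)
    {
        Kind = kind;
        Children.AddRange(children);
    }

    public void Accept(FirstMatchVisitor v)
    {
        v.PreVisit(this);
        foreach (var child in Children) child.Accept(v);
    }
}

public class FirstMatchVisitor
{
    // Private marker exception: only this class can throw it.
    private sealed class StopWalk : Exception { }

    private readonly string wanted;
    public int Visited;
    public bool Found;

    public FirstMatchVisitor(string wanted) { this.wanted = wanted; }

    public void PreVisit(WalkNode node)
    {
        Visited++;
        if (node.Kind == wanted) { Found = true; throw new StopWalk(); }
    }

    // The call site: start the walk and swallow only our own marker.
    public bool Search(WalkNode root)
    {
        try { root.Accept(this); }
        catch (StopWalk) { }
        return Found;
    }
}

public static class EarlyStopDemo
{
    public static FirstMatchVisitor Run()
    {
        var tree = new WalkNode("Select",
            new WalkNode("Column"),
            new WalkNode("Subquery", new WalkNode("Column")),
            new WalkNode("OrderBy"));
        var v = new FirstMatchVisitor("Subquery");
        v.Search(tree);
        return v;
    }
}
```

The tree has five nodes, but Run() reports Found == true with Visited == 3: the walk ends at the Subquery node and never reaches its child or the OrderBy sibling.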
Recipe 9 — Modify the AST and re-emit SQL
Tree walks aren't read-only. Mutate any TObjectName.toString (column/table reference) or TSourceToken.astext (raw token), then ask the script generator to re-emit:
TScriptGenerator walks the (now-modified) tree and emits the SQL back out. This is the engine behind the tableColumnRename, removeColumn, and removeCondition demos.
For source-preserving rewrites (i.e. you want comments and exact whitespace preserved), prefer mutating the underlying startToken.astext of the affected node and re-emitting the token list rather than going through TScriptGenerator — see Parse-tree internals → source positions.
Recipe 10 — Custom XML / JSON dump of the whole tree
When you want every node, in nesting order, with its type and source range:
In practice you'll override the specific node types you want to format differently (function calls, expressions, identifiers) and let the catch-all preVisit(TParseTreeNode) handle the rest. A worked-out version of this pattern lives in gsp_demo_dotnet/src/demos/lib/xmlVisitor.cs.
Common pitfalls
Forgetting that null is a valid value for any clause
Every clause property — WhereClause, OrderbyClause, joins, tables — can be null when the statement doesn't have that clause. Always null-check before dereferencing. (This applies to manual recursion only; visitors skip null fields automatically.)
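Two common shapes for that null check are sketched below. ClauseNode, SelectNode, and NullCheckDemo are hypothetical stand-ins; only the property names WhereClause and OrderbyClause are taken from this page.

```csharp
// Hypothetical stand-ins: only the property names WhereClause and
// OrderbyClause come from the text; the classes are invented here.
public class ClauseNode
{
    public string Text;
    public ClauseNode(string text) { Text = text; }
}

public class SelectNode
{
    public ClauseNode WhereClause;    // null when the statement has no WHERE
    public ClauseNode OrderbyClause;  // null when the statement has no ORDER BY
}

public static class NullCheckDemo
{
    // Explicit check: clearest when several lines depend on the clause.
    public static string DescribeWhere(SelectNode select)
    {
        if (select.WhereClause == null) return "(no WHERE clause)";
        return "WHERE " + select.WhereClause.Text;
    }

    // Null-conditional form: compact when you only need one value out.
    public static string DescribeOrderBy(SelectNode select) =>
        select.OrderbyClause?.Text ?? "(no ORDER BY clause)";
}
```

For a SelectNode that has a WHERE clause but no ORDER BY, DescribeWhere returns the clause text and DescribeOrderBy returns the fallback string instead of throwing a NullReferenceException.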
Calling accept instead of acceptChildren at the top of the walk
If you start at a node that is itself the type you're filtering for, accept will visit it once for itself and then descend; if you only want descendants, use acceptChildren. A common bug: a stat counter that double-counts the root.
Modifying the AST during a walk and getting confused output
Don't mutate the same node in preVisit and read it in postVisit of the same class. The visitor sees its own intermediate state. Either do all reads first (collect into a list, then mutate after the walk completes), or pick preVisit xor postVisit and stick with one.
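The collect-then-mutate option can look like the following sketch. NameNode, RenameCollector, and RenameDemo are hypothetical names for illustration; the point is only that the walk itself stays read-only and every write happens after it returns.

```csharp
using System.Collections.Generic;

// Hypothetical node type (not a parser class).
public class NameNode
{
    public string Text;
    public List<NameNode> Children = new List<NameNode>();

    public NameNode(string text, params NameNode[] children)
    {
        Text = text;
        Children.AddRange(children);
    }

    public void Accept(RenameCollector v)
    {
        v.PreVisit(this);
        foreach (var child in Children) child.Accept(v);
    }
}

// Phase 1: a read-only walk that only records what needs changing.
public class RenameCollector
{
    private readonly string oldName;
    public readonly List<NameNode> Matches = new List<NameNode>();

    public RenameCollector(string oldName) { this.oldName = oldName; }

    public void PreVisit(NameNode node)
    {
        if (node.Text == oldName) Matches.Add(node);
    }
}

public static class RenameDemo
{
    // Returns how many nodes were renamed.
    public static int Rename(NameNode root, string oldName, string newName)
    {
        var collector = new RenameCollector(oldName);
        root.Accept(collector);

        // Phase 2: mutate only after the walk has finished.
        foreach (var node in collector.Matches) node.Text = newName;
        return collector.Matches.Count;
    }
}
```

Because the visitor never sees a half-renamed tree, the match test in PreVisit always runs against the original names.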
Confusing TObjectName with a string
TObjectName is a node, not a string. It has .ToString() for printing and .toString() (lowercase, legacy compat) for the same — but it also has .ObjectType, .TableString, .SchemaString, .DatabaseString, etc. Always use the typed accessors when you need parts; ToString() only when you want the printed form.
Visitor methods not firing
If you override preVisit(SomeType) and it never fires, you probably overrode the overload for the wrong type. Plain C# overload resolution binds by static type, but the visitor framework selects the preVisit overload by each node's runtime type. So an override of preVisit(TExpression) fires only for nodes whose runtime type is exactly TExpression; subclasses (e.g. TFunctionCall) fire their own overloads if they exist, not the parent's. Override the most specific type that matches your need.
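One common way to get runtime-type dispatch out of overloads is double dispatch through a virtual accept method. The sketch below shows that mechanism with hypothetical ExprNode, FuncNode, and DispatchVisitor classes; whether the library is built exactly this way is an assumption, but the observable behavior matches the description above.

```csharp
using System.Collections.Generic;

// Hypothetical visitor with one overload per node type.
public class DispatchVisitor
{
    public readonly List<string> Log = new List<string>();

    public virtual void PreVisit(ExprNode node) { Log.Add("ExprNode overload"); }
    public virtual void PreVisit(FuncNode node) { Log.Add("FuncNode overload"); }
}

public class ExprNode
{
    // Inside this method `this` is statically an ExprNode, so the
    // compiler binds the call to PreVisit(ExprNode).
    public virtual void Accept(DispatchVisitor v) { v.PreVisit(this); }
}

public class FuncNode : ExprNode
{
    // Same source text, but here `this` is statically a FuncNode, so the
    // compiler binds the call to PreVisit(FuncNode).
    public override void Accept(DispatchVisitor v) { v.PreVisit(this); }
}

public static class DispatchDemo
{
    public static List<string> Run()
    {
        var v = new DispatchVisitor();
        ExprNode node = new FuncNode();  // static type ExprNode, runtime type FuncNode

        v.PreVisit(node);   // direct call: overload chosen by static type
        node.Accept(v);     // virtual call lands in FuncNode.Accept
        return v.Log;
    }
}
```

Run() logs "ExprNode overload" for the direct call and "FuncNode overload" for the call routed through Accept. A visitor that overrides only PreVisit(ExprNode) would therefore never hear about FuncNode instances during a walk.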
See also
- Parse-tree internals — the conceptual map this page builds on
- AST Tree Nodes Reference — the per-node catalog
- Tutorials → Advanced features — guided walkthroughs of AST modification and visitors
- The gsp_demo_dotnet/src/demos/visitors/ and gsp_demo_dotnet/src/demos/extractTableColumns/ demos in the source repo — runnable, complete examples