Skip to content

Walk the parse tree

This page is a recipe book for traversing the AST that TGSqlParser.parse() produces. Each recipe is a self-contained, copy-pasteable example that you can adapt. If you want the conceptual picture first — what the AST is, how nodes are wired, what TParseTreeVisitor actually guarantees — read Parse-tree internals first.

The two ways to walk

Every recipe on this page picks one of two strategies:

Strategy When to use it
Visitor pattern (TParseTreeVisitor) You want to find or process every node of some type anywhere in the tree (every function call, every column reference, every join). Override preVisit(SomeType); the framework dispatches by node type.
Manual recursion You know the exact path you want to walk (the FROM clause's first table's join condition's left side). Just access typed properties: stmt.joins.getJoin(0).LeftTable.

Visitors win for "find all X". Manual recursion wins for "decode this specific structure". Most non-trivial walks mix both.

Setup boilerplate (used by every recipe)

All recipes assume this scaffolding around them. Replace the SQL and the dialect to taste:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
using System;
using gudusoft.gsqlparser;
using gudusoft.gsqlparser.nodes;
using gudusoft.gsqlparser.stmt;

var parser = new TGSqlParser(EDbVendor.dbvoracle);
parser.sqltext = "select * from employees where salary > 50000";

if (parser.parse() != 0)
{
    Console.WriteLine(parser.Errormessage);
    return;
}

for (int i = 0; i < parser.sqlstatements.size(); i++)
{
    TCustomSqlStatement stmt = parser.sqlstatements.get(i);
    // ... your walk goes here ...
}

If your script has more than one statement separated by ;, the loop visits each one in turn. Most recipes below show what to do inside the loop.

Recipe 1 — Print every function call in a script

Classic use case: find every COUNT, SUM, LENGTH, MY_CUSTOM_FUNC etc. that appears anywhere — projections, WHERE clauses, ORDER BY, function arguments, everywhere.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
class FunctionFinder : TParseTreeVisitor
{
    public override void preVisit(TFunctionCall node)
    {
        Console.WriteLine($"  function: {node.FunctionName} at line {node.startToken.lineNo}");
    }
}

var finder = new FunctionFinder();
foreach (TCustomSqlStatement stmt in EnumerateStatements(parser))
{
    Console.WriteLine($"--- {stmt.sqlstatementtype} ---");
    stmt.acceptChildren(finder);   // walk the statement's whole subtree
}

static System.Collections.Generic.IEnumerable<TCustomSqlStatement> EnumerateStatements(TGSqlParser p)
{
    for (int i = 0; i < p.sqlstatements.size(); i++) yield return p.sqlstatements.get(i);
}

For SQL like:

1
2
3
4
SELECT count(*), max(salary)
FROM employees
WHERE upper(name) LIKE 'A%'
ORDER BY length(department);

You get:

1
2
3
4
5
--- sstselect ---
  function: count at line 1
  function: max at line 1
  function: upper at line 3
  function: length at line 4

Why acceptChildren, not accept? accept calls preVisit on the node itself, then on its children. acceptChildren skips the self-visit. Since TSelectSqlStatement is not a TFunctionCall, the difference is invisible here — but if you start at a node that is the type you're filtering for, acceptChildren avoids visiting it twice.

Recipe 2 — List every table referenced (any statement type)

TCustomSqlStatement exposes .tables, a TTableList populated for SELECT, INSERT, UPDATE, DELETE, and MERGE. Each entry is a TTable with a TableName (a TObjectName — schema.table.alias).

1
2
3
4
5
6
7
8
9
foreach (TCustomSqlStatement stmt in EnumerateStatements(parser))
{
    if (stmt.tables == null) continue;
    for (int i = 0; i < stmt.tables.size(); i++)
    {
        TTable table = stmt.tables.getTable(i);
        Console.WriteLine($"{stmt.sqlstatementtype}: {table.TableName}");
    }
}

For:

1
2
INSERT INTO audit_log SELECT id, name FROM employees JOIN departments ON ...;
UPDATE employees SET salary = salary * 1.1 WHERE department_id = 10;

You get:

1
2
3
4
sstinsert: audit_log
sstinsert: employees
sstinsert: departments
sstupdate: employees

This is the "right answer" for nine out of ten cases that look like "give me all tables". For sub-queries, CTEs, and lateral joins, .tables flattens to the same list — so you don't need to recurse manually.

Recipe 3 — List every column referenced, with its qualifying table

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
class ColumnFinder : TParseTreeVisitor
{
    public override void preVisit(TObjectName node)
    {
        // TObjectName is "thing that looks like a name": columns, tables, functions, etc.
        // Filter to column references via the ObjectType field.
        if (node.ObjectType == EDbObjectType.column)
        {
            Console.WriteLine($"  column: {node}  (table: {node.TableString})");
        }
    }
}

foreach (TCustomSqlStatement stmt in EnumerateStatements(parser))
{
    stmt.acceptChildren(new ColumnFinder());
}

For:

1
2
3
SELECT e.first_name, e.last_name, d.name AS dept
FROM employees e JOIN departments d ON e.dept_id = d.id
WHERE e.salary > 50000;

You get:

1
2
3
4
5
6
  column: e.first_name  (table: e)
  column: e.last_name  (table: e)
  column: d.name  (table: d)
  column: e.dept_id  (table: e)
  column: d.id  (table: d)
  column: e.salary  (table: e)

TObjectName is the universal "named thing" node. ObjectType tells you what kind: column, table, function, schema, database, index, etc. Filtering on it is how you say "I want columns specifically".

Recipe 4 — Walk the SELECT list, the FROM list, and the WHERE clause separately

When you know the statement is a SELECT and want clause-by-clause access:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
foreach (TCustomSqlStatement stmt in EnumerateStatements(parser))
{
    if (stmt is not TSelectSqlStatement select) continue;

    // SELECT list
    if (select.ResultColumnList != null)
    {
        Console.WriteLine("Projections:");
        for (int i = 0; i < select.ResultColumnList.size(); i++)
        {
            TResultColumn col = select.ResultColumnList.getResultColumn(i);
            Console.WriteLine($"  {col.Expr}  (alias: {col.AliasClause?.AliasName})");
        }
    }

    // FROM list
    if (select.joins != null)
    {
        Console.WriteLine("From-list joins:");
        for (int i = 0; i < select.joins.size(); i++)
        {
            TJoin join = select.joins.getJoin(i);
            Console.WriteLine($"  table: {join.Table?.TableName}");
            if (join.JoinItems != null)
            {
                for (int j = 0; j < join.JoinItems.size(); j++)
                {
                    TJoinItem item = join.JoinItems.getJoinItem(j);
                    Console.WriteLine($"    {item.JoinType} -> {item.Table?.TableName}  ON {item.OnCondition}");
                }
            }
        }
    }

    // WHERE clause
    if (select.WhereClause != null)
    {
        Console.WriteLine($"WHERE: {select.WhereClause.Condition}");
    }
}

The TSelectSqlStatement shape continues with GroupByClause, HavingClause (under GroupByClause), OrderbyClause, LimitClause, FetchFirstClause, WindowClause, ForUpdateClause, etc. Hover over select. in your IDE to see the full surface.

Recipe 5 — Find every WHERE clause, anywhere (incl. subqueries)

Subqueries inside SELECT, INSERT, UPDATE, DELETE all carry their own TWhereClause nodes. Walk the tree once with a visitor:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
class WhereFinder : TParseTreeVisitor
{
    public override void preVisit(TWhereClause node)
    {
        Console.WriteLine($"WHERE at line {node.startToken.lineNo}: {node.Condition}");
    }
}

foreach (TCustomSqlStatement stmt in EnumerateStatements(parser))
{
    stmt.acceptChildren(new WhereFinder());
}

For:

1
2
3
4
SELECT * FROM employees
WHERE department_id IN (
    SELECT id FROM departments WHERE budget > 1000000
);

You get two lines — the outer WHERE and the inner one — because the visitor descends into the subquery automatically.

Recipe 6 — Walk an expression tree (using IExpressionVisitor)

When you already hold a TExpression (a WHERE condition, a CASE WHEN, a function argument) and want to enumerate its operands:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
using gudusoft.gsqlparser.nodes;

class ExpressionWalker : IExpressionVisitor
{
    public bool exprVisit(TParseTreeNode pNode, bool isLeafNode)
    {
        string kind = isLeafNode ? "leaf" : "operator/function";
        Console.WriteLine($"  {kind}: {pNode}");
        return true;   // return false to abort the walk
    }
}

if (select.WhereClause?.Condition is TExpression cond)
{
    cond.inOrderTraverse(new ExpressionWalker());
}

inOrderTraverse does an in-order walk (left-operand → operator → right-operand). For salary > 50000 AND department_id = 10 you get five visits: salary, >, 50000, AND, then the right side recursively.

IExpressionVisitor is the right tool when the granularity you care about is operators and operands, not "every kind of node". Use TParseTreeVisitor instead when the answer is "every kind of node".

Recipe 7 — Track ancestor context with a visitor stack

TParseTreeNode has no parent pointer. If you need to know "what node am I inside right now", maintain a stack of your own:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
using System.Collections.Generic;

class ContextualVisitor : TParseTreeVisitor
{
    private readonly Stack<TParseTreeNode> ancestors = new();

    public override void preVisit(TFunctionCall node)
    {
        // ancestors[0] is the closest enclosing node we've already entered.
        // Search up the stack for the enclosing statement type, for example.
        TCustomSqlStatement enclosing = null;
        foreach (var n in ancestors)
        {
            if (n is TCustomSqlStatement s) { enclosing = s; break; }
        }
        Console.WriteLine($"function {node.FunctionName} inside {enclosing?.sqlstatementtype}");
    }

    public override void preVisit(TSelectSqlStatement node)   { ancestors.Push(node); }
    public override void postVisit(TSelectSqlStatement node)  { ancestors.Pop(); }
    public override void preVisit(TInsertSqlStatement node)   { ancestors.Push(node); }
    public override void postVisit(TInsertSqlStatement node)  { ancestors.Pop(); }
    public override void preVisit(TUpdateSqlStatement node)   { ancestors.Push(node); }
    public override void postVisit(TUpdateSqlStatement node)  { ancestors.Pop(); }
    public override void preVisit(TDeleteSqlStatement node)   { ancestors.Push(node); }
    public override void postVisit(TDeleteSqlStatement node)  { ancestors.Pop(); }
}

Pattern: override preVisit(T) to push, postVisit(T) to pop, on every node-type whose context you care about. Override the inner preVisit(...) you actually want to act on, and read the stack.

This is also how you find the depth of nesting (ancestors.Count), reconstruct the path (string.Join(" → ", ancestors.Select(a => a.GetType().Name))), or detect "is this column in a subquery?".

Recipe 8 — Stop walking early

TParseTreeVisitor has no built-in "abort" mechanism. If you want to short-circuit, throw a private exception and catch it at the call site:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
class FoundIt : Exception { public TFunctionCall Node; public FoundIt(TFunctionCall n) { Node = n; } }

class FirstFunctionFinder : TParseTreeVisitor
{
    public override void preVisit(TFunctionCall node) => throw new FoundIt(node);
}

try
{
    stmt.acceptChildren(new FirstFunctionFinder());
}
catch (FoundIt found)
{
    Console.WriteLine($"first function: {found.Node.FunctionName}");
}

IExpressionVisitor, by contrast, supports early-out natively — exprVisit returning false stops the walk.

Recipe 9 — Modify the AST and re-emit SQL

Tree walks aren't read-only. Mutate any TObjectName.toString (column/table reference) or TSourceToken.astext (raw token), then ask the script writer to re-emit:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
using gudusoft.gsqlparser.scriptWriter;

class TableRenamer : TParseTreeVisitor
{
    public override void preVisit(TObjectName node)
    {
        if (node.ObjectType == EDbObjectType.table && node.ToString() == "employees")
        {
            node.setObjectString("workforce");   // mutate the AST
        }
    }
}

foreach (TCustomSqlStatement stmt in EnumerateStatements(parser))
{
    stmt.acceptChildren(new TableRenamer());
}

var writer = new TScriptGenerator();
string rewritten = writer.generateScript(parser.sqlstatements);
Console.WriteLine(rewritten);

TScriptGenerator walks the (now-modified) tree and emits the SQL back out. This is the engine behind the tableColumnRename, removeColumn, and removeCondition demos.

For source-preserving rewrites (i.e. you want comments and exact whitespace preserved), prefer mutating the underlying startToken.astext of the affected node and re-emitting the token list rather than going through TScriptGenerator — see Parse-tree internals → source positions.

Recipe 10 — Custom XML / JSON dump of the whole tree

When you want every node, in nesting order, with its type and source range:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
class TreePrinter : TParseTreeVisitor
{
    private int depth = 0;
    public override void preVisit(TParseTreeNode node)
    {
        // This overload is called for every node where no more-specific overload matched.
        Print(node);
        depth++;
    }
    public override void postVisit(TParseTreeNode node) => depth--;

    private void Print(TParseTreeNode node)
    {
        var indent = new string(' ', depth * 2);
        var range  = $"L{node.startToken?.lineNo}C{node.startToken?.columnNo}";
        Console.WriteLine($"{indent}{node.GetType().Name}  [{range}]");
    }
}

foreach (TCustomSqlStatement stmt in EnumerateStatements(parser))
{
    stmt.accept(new TreePrinter());
}

In practice you'll override the specific node types you want to format differently (function calls, expressions, identifiers) and let the catch-all preVisit(TParseTreeNode) handle the rest. A worked-out version of this pattern lives in gsp_demo_dotnet/src/demos/lib/xmlVisitor.cs.

Common pitfalls

Forgetting that null is a valid value for any clause

Every clause property — WhereClause, OrderbyClause, joins, tables — can be null when the statement doesn't have that clause. Always null-check before dereferencing. (This applies to manual recursion only; visitors skip null fields automatically.)

Calling accept instead of acceptChildren at the top of the walk

If you start at a node that is itself the type you're filtering for, accept will visit it once for itself and then descend; if you only want descendants, use acceptChildren. A common bug: a stat counter that double-counts the root.

Modifying the AST during a walk and getting confused output

Don't mutate the same node in preVisit and read it in postVisit of the same class. The visitor sees its own intermediate state. Either do all reads first (collect into a list, then mutate after the walk completes), or pick preVisit xor postVisit and stick with one.

Confusing TObjectName with a string

TObjectName is a node, not a string. It has .ToString() for printing and .toString() (lowercase, legacy compat) for the same — but it also has .ObjectType, .TableString, .SchemaString, .DatabaseString, etc. Always use the typed accessors when you need parts; ToString() only when you want the printed form.

Visitor methods not firing

If you override preVisit(SomeType) and it never fires, you probably picked the wrong base class. C# overload resolution dispatches by static type — but the visitor framework dispatches by runtime type via overloads. If you override preVisit(TExpression), that overload fires for every node whose runtime type is exactly TExpression. Subclasses (e.g. TFunctionCall) fire their own overloads if they exist, not the parent's. Override the most specific type that matches your need.

See also