Advanced Features

This tutorial covers the more powerful capabilities of General SQL Parser .NET — AST modification, custom visitors, the script writer, and performance tuning.

Objectives

  • Modify a parsed AST and re-emit the SQL
  • Walk the AST with custom visitors via TParseTreeVisitor
  • Use TScriptGenerator and TScriptWriter to round-trip SQL through the parser
  • Apply performance optimisation techniques for large SQL files
  • Manage memory effectively for long-running parser processes

Prerequisites

  • Completed all previous tutorials in the series
  • Strong understanding of C# and SQL concepts

Time required

About 90 minutes.

1. AST modification + re-emission

TScriptGenerator regenerates SQL text from a (possibly modified) AST. This is the engine behind demos like removeColumn, tableColumnRename, and removeCondition.

using gudusoft.gsqlparser;
using gudusoft.gsqlparser.nodes;
using gudusoft.gsqlparser.scriptWriter;
using gudusoft.gsqlparser.stmt;

var parser = new TGSqlParser(EDbVendor.dbvoracle);
parser.sqltext = "SELECT id, name, email FROM users WHERE active = 1";
parser.parse();

var select = (TSelectSqlStatement)parser.sqlstatements.get(0);

// Rename the table
foreach (TTable table in select.tables)
{
    if (table.TableName.ToString().Equals("users", StringComparison.OrdinalIgnoreCase))
    {
        // Rename here by rewriting the table-name token
        // (the tableColumnRename demo shows the exact call),
        // then re-emit the statement below.
    }
}

var writer = new TScriptWriter();
var generator = new TScriptGenerator(writer);
generator.generate(select);
Console.WriteLine(writer.ScriptText);

For a fully worked example see the tableColumnRename and removeColumn demos under gsp_demo_dotnet/src/demos/ — they walk through the whole modify-then-emit pipeline against real SQL.

2. Custom visitors

Subclass TParseTreeVisitor to walk the AST and react to specific node types. This is the cleanest way to extract metadata from parsed SQL.

using gudusoft.gsqlparser;
using gudusoft.gsqlparser.nodes;

public class TableCollector : TParseTreeVisitor
{
    public HashSet<string> Tables { get; } = new(StringComparer.OrdinalIgnoreCase);

    public override void preVisit(TTable table)
    {
        if (table.TableName != null)
            Tables.Add(table.TableName.ToString());
    }
}

// Usage
var parser = new TGSqlParser(EDbVendor.dbvoracle);
parser.sqltext = """
    SELECT u.id, p.title
    FROM   users u
    JOIN   posts p ON u.id = p.user_id
    WHERE  u.id IN (SELECT user_id FROM banned_users)
    """;
parser.parse();

var collector = new TableCollector();
parser.sqlstatements.get(0).acceptChildren(collector);

foreach (var t in collector.Tables)
    Console.WriteLine($"  - {t}");

The visitor pattern composes well — write small visitors that each do one thing (collect tables, collect columns, collect functions) and run them in sequence.
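To make the sequencing concrete, here is a minimal sketch of two more single-purpose collectors run over the same AST. The `preVisit(TObjectName)` and `preVisit(TFunctionCall)` overloads, and the `FunctionName` property, are assumptions mirroring the `preVisit(TTable)` overload used above; check the library's TParseTreeVisitor surface for the exact names.

```csharp
using System;
using System.Collections.Generic;
using gudusoft.gsqlparser;
using gudusoft.gsqlparser.nodes;

// Each visitor does exactly one job; run several in sequence over one AST.
public class ColumnCollector : TParseTreeVisitor
{
    public HashSet<string> Columns { get; } = new(StringComparer.OrdinalIgnoreCase);

    // Assumed overload, mirroring preVisit(TTable) above.
    public override void preVisit(TObjectName name) => Columns.Add(name.ToString());
}

public class FunctionCollector : TParseTreeVisitor
{
    public HashSet<string> Functions { get; } = new(StringComparer.OrdinalIgnoreCase);

    // Assumed overload and FunctionName property.
    public override void preVisit(TFunctionCall func) =>
        Functions.Add(func.FunctionName.ToString());
}

// Usage: parse once, then make several cheap passes.
var parser = new TGSqlParser(EDbVendor.dbvoracle);
parser.sqltext = "SELECT upper(name), count(*) FROM users GROUP BY name";
parser.parse();

var stmt = parser.sqlstatements.get(0);
var columns = new ColumnCollector();
var functions = new FunctionCollector();
stmt.acceptChildren(columns);
stmt.acceptChildren(functions);
```

Because each pass only reads the tree, the order of the visitors does not matter, and new extractions can be added without touching existing ones.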

3. The script-writer pipeline

TScriptGenerator traverses an AST and writes formatted SQL back out. This is what TGSqlParser.toScript() uses internally.

Workflow:

  1. parser.parse() — build the AST from text.
  2. Mutate the AST (rename tables, remove columns, swap operators).
  3. TScriptWriter writer = new TScriptWriter();
  4. TScriptGenerator generator = new TScriptGenerator(writer);
  5. generator.generate(stmt);
  6. Read writer.ScriptText for the regenerated SQL.

The generator is dialect-aware and emits SQL in the same vendor flavour as the parser was constructed with.

4. Performance optimisation

Reuse parser instances

Construction is the slowest part because it loads embedded grammar tables. For batch workloads, reuse one parser per vendor:

var parser = new TGSqlParser(EDbVendor.dbvoracle);
foreach (string sql in sqlBatch)
{
    parser.sqltext = sql;
    if (parser.parse() != 0) continue;
    Process(parser.sqlstatements);
}

Parse statement-by-statement for large files

For multi-megabyte SQL scripts, split the file by ; (or the vendor's delimiter) and parse statements individually. This keeps peak memory bounded.

foreach (string statement in SplitStatements(largeFile))
{
    parser.sqltext = statement;
    if (parser.parse() == 0)
        Process(parser.sqlstatements.get(0));
}

Tokenise instead of parse when you only need lexical info

If you only need keywords / identifiers / literals — not a full AST — call tokenizeSqltext() instead. It skips the syntactic phase entirely.

parser.sqltext = sql;
parser.tokenizeSqltext();
foreach (TSourceToken tok in parser.sourcetokenlist)
    Console.WriteLine($"{tok.tokencode,-20} {tok.astext}");

Watch the cost of PL/SQL parsing

Oracle's PL/SQL grammar is large. If you are only validating top-level DML and never hit procedural blocks, its cost is negligible; when procedural blocks do appear, expect each BEGIN..END block to take roughly an order of magnitude longer to parse than a simple SELECT.
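To see whether procedural blocks dominate your parse time, measure rather than guess. A small timing helper (the commented usage shows where your own parser calls plug in; the parser-related names there are from the earlier examples):

```csharp
using System;
using System.Diagnostics;

public static class ParseTimer
{
    // Times any action. Run one untimed warm-up call first if you want to
    // exclude one-off grammar-table loading from the measurement.
    public static TimeSpan Time(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed;
    }
}

// Usage, with a parser already constructed:
//   var t = ParseTimer.Time(() => { parser.sqltext = sql; parser.parse(); });
//   Console.WriteLine($"{t.TotalMilliseconds:F1} ms");
```

Comparing the timings for a plain SELECT against a BEGIN..END block on your own workload tells you whether the PL/SQL cost matters in practice.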

Multi-threading

Each TGSqlParser instance is not thread-safe. For parallel workloads, give each thread its own parser:

var sqls = new List<string> { /* ... */ };
Parallel.ForEach(sqls, () => new TGSqlParser(EDbVendor.dbvoracle),
    (sql, _, parser) =>
    {
        parser.sqltext = sql;
        parser.parse();
        Process(parser.sqlstatements);
        return parser;
    },
    _ => { });

See Performance Optimization for benchmarks and deeper guidance.

5. Memory management

The AST keeps references to source tokens for line/column tracking. After processing a statement, drop your reference to the parser so the AST can be collected:

public static IEnumerable<MyResult> ProcessLargeFile(string path)
{
    foreach (string stmt in StreamStatementsFromFile(path))
    {
        var parser = new TGSqlParser(EDbVendor.dbvoracle) { sqltext = stmt };
        if (parser.parse() != 0) continue;
        yield return Process(parser.sqlstatements.get(0));
        // parser goes out of scope here; AST + tokens are eligible for GC
    }
}

For very long-lived processes, consider calling GC.Collect() after major batches: the AST is large enough that it can be promoted to Gen2 and linger there longer than expected.

Summary

You learned how to:

  • Modify a parsed AST and regenerate SQL via TScriptGenerator/TScriptWriter
  • Walk the AST with custom TParseTreeVisitor subclasses
  • Reuse parser instances and parse statement-by-statement for performance
  • Use tokenizeSqltext() for lexical-only workloads
  • Manage memory in long-running parser processes

What's next?