Advanced Features¶
This tutorial covers the more powerful capabilities of General SQL Parser .NET — AST modification, custom visitors, the script writer, and performance tuning.
Objectives¶
- Modify a parsed AST and re-emit the SQL
- Walk the AST with custom visitors via
IExpressionVisitor - Use
TScriptGeneratorandTScriptWriterto round-trip SQL through the parser - Apply performance optimisation techniques for large SQL files
- Manage memory effectively for long-running parser processes
Prerequisites¶
- Completed all previous tutorials in the series
- Strong understanding of C# and SQL concepts
Time required¶
About 90 minutes.
1. AST modification + re-emission¶
TScriptGenerator regenerates SQL text from a (possibly modified) AST. This is the engine behind demos like removeColumn, tableColumnRename, and removeCondition.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 | |
For a fully worked example see the tableColumnRename and removeColumn demos under gsp_demo_dotnet/src/demos/ — they walk through the whole modify-then-emit pipeline against real SQL.
2. Custom visitors¶
Implement IExpressionVisitor to walk the AST and react to specific node types. This is the cleanest way to extract metadata from parsed SQL.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 | |
The visitor pattern composes well — write small visitors that each do one thing (collect tables, collect columns, collect functions) and run them in sequence.
3. The script-writer pipeline¶
TScriptGenerator traverses an AST and writes formatted SQL back out. This is what TGSqlParser.toScript() uses internally.
Workflow:
parser.parse()— build the AST from text.- Mutate the AST (rename tables, remove columns, swap operators).
TScriptWriter writer = new TScriptWriter();TScriptGenerator generator = new TScriptGenerator(writer);generator.generate(stmt);- Read
writer.ScriptTextfor the regenerated SQL.
The generator is dialect-aware and emits SQL in the same vendor flavour as the parser was constructed with.
4. Performance optimisation¶
Reuse parser instances¶
Construction is the slowest part because it loads embedded grammar tables. For batch workloads, reuse one parser per vendor:
1 2 3 4 5 6 7 | |
Parse statement-by-statement for large files¶
For multi-megabyte SQL scripts, split the file by ; (or the vendor's delimiter) and parse statements individually. This keeps peak memory bounded.
1 2 3 4 5 6 | |
Tokenise instead of parse when you only need lexical info¶
If you only need keywords / identifiers / literals — not a full AST — call tokenizeSqltext() instead. It skips the syntactic phase entirely.
1 2 3 4 | |
Disable PL/SQL parsing if you don't need it¶
Oracle's PL/SQL grammar is large. If you're only validating top-level DML and never see procedural blocks, the cost is negligible — but if you do, expect each BEGIN..END block to take an order of magnitude longer than a SELECT.
Multi-threading¶
Each TGSqlParser instance is not thread-safe. For parallel workloads, give each thread its own parser:
1 2 3 4 5 6 7 8 9 10 | |
See Performance Optimization for benchmarks and deeper guidance.
5. Memory management¶
The AST keeps references to source tokens for line/column tracking. After processing a statement, drop your reference to the parser so the AST can be collected:
1 2 3 4 5 6 7 8 9 10 | |
For very long-lived processes, consider GC.Collect() after major batches — the AST is large enough that it can sit in Gen2 longer than expected.
Summary¶
You learned how to:
- Modify a parsed AST and regenerate SQL via
TScriptGenerator/TScriptWriter - Walk the AST with custom
IExpressionVisitorimplementations - Reuse parser instances and parse statement-by-statement for performance
- Use
tokenizeSqltext()for lexical-only workloads - Manage memory in long-running parser processes
What's next?¶
- Explore How-to Guides for specific scenarios
- Read Software Architecture for the bigger picture
- Browse the API Reference for the full surface