Performance Considerations¶
A practical guide to GSP .NET's performance characteristics — what's fast, what's slow, and what to do about it.
Typical numbers¶
On a recent x86_64 Linux machine running .NET 10:
| Workload | Typical cost |
|---|---|
| Simple SELECT (one table, one WHERE) | < 1 ms |
| Complex SELECT (multi-join, CTE, window functions) | 1–10 ms |
| Stored procedure (a few hundred lines) | 10–100 ms |
| Tokenisation only | ~5–10× faster than full parsing |
| Steady-state batch parsing (reusing one TGSqlParser) | 1000+ statements/sec for typical SELECTs |
These are order-of-magnitude figures — your mileage will vary based on SQL complexity, vendor (Oracle PL/SQL is the slowest), and CPU.
What's fast¶
- Tokenisation — running `tokenizeSqltext()` skips parsing entirely. Use this when you only need keyword/identifier/literal info.
- Re-parsing on the same `TGSqlParser` instance — once the grammar tables are loaded into memory, subsequent `parse()` calls are cheap.
- ANSI DML on most vendors — SELECT/INSERT/UPDATE/DELETE without procedural blocks parse in well under a millisecond.
What's slow¶
- Parser construction — loading the embedded grammar tables takes hundreds of milliseconds the first time. Reuse instances.
- Oracle PL/SQL — the largest single grammar in the library. Procedural blocks are ~10× slower than equivalently-sized DML.
- Pathologically nested expressions — a 1000-deep nesting of binary operators can blow the parser stack and is slow even when it succeeds. Real SQL doesn't look like this.
- Multi-megabyte single statements — uncommon, but the AST scales linearly with statement size.
Tactics¶
Reuse parser instances¶
The single biggest performance win is to construct the parser once and reuse it for every statement.
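A minimal sketch of the pattern, assuming the usual `TGSqlParser` surface (`sqltext`, `parse()` returning 0 on success, `Errormessage`); the `dbvmssql` vendor choice is just an example:

```csharp
using System;
using gudusoft.gsqlparser;

class ReuseDemo
{
    static void Main()
    {
        // Construct once: the first instantiation loads the grammar tables,
        // which is the expensive part (hundreds of milliseconds).
        var parser = new TGSqlParser(EDbVendor.dbvmssql);

        string[] batch = { "SELECT 1", "SELECT a FROM t WHERE b = 2" };
        foreach (var sql in batch)
        {
            parser.sqltext = sql;      // swap in the next statement
            if (parser.parse() != 0)   // parse() returns 0 on success
                Console.Error.WriteLine(parser.Errormessage);
        }
    }
}
```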
Parse statement-by-statement for large files¶
For multi-MB scripts, split first and parse one statement at a time. This keeps peak memory bounded.
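A sketch of the approach; the `SplitStatements` helper is hypothetical and deliberately naive (a bare `;` split mishandles semicolons inside string literals and procedural bodies), so substitute a real splitter in production:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using gudusoft.gsqlparser;

class BatchDemo
{
    // Illustrative splitter only: wrong for ';' inside literals or
    // procedural bodies.
    static IEnumerable<string> SplitStatements(string script) =>
        script.Split(';').Select(s => s.Trim()).Where(s => s.Length > 0);

    static void Main()
    {
        var parser = new TGSqlParser(EDbVendor.dbvmssql);
        foreach (var stmt in SplitStatements(File.ReadAllText("big_script.sql")))
        {
            parser.sqltext = stmt;
            if (parser.parse() == 0)
            {
                // Process this one statement's AST here, then let it go:
                // only one statement's tree is alive at any time.
            }
        }
    }
}
```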
Tokenise when you don't need an AST¶
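A tokenisation-only sketch; the Java-style `size()`/`get()` accessors on `sourcetokenlist` and the `tokentype` property reflect the port's usual naming, but check them against your version:

```csharp
using System;
using gudusoft.gsqlparser;

class TokenizeDemo
{
    static void Main()
    {
        var parser = new TGSqlParser(EDbVendor.dbvmssql);
        parser.sqltext = "SELECT name FROM users WHERE id = 42";
        parser.tokenizeSqltext();   // lexing only: no AST is built

        // Accessor names assumed from the port's Java-style lists.
        for (int i = 0; i < parser.sourcetokenlist.size(); i++)
        {
            var token = parser.sourcetokenlist.get(i);
            Console.WriteLine($"{token.tokentype}: {token}");
        }
    }
}
```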
Multi-threading¶
TGSqlParser is not thread-safe. For parallel workloads, give each thread its own instance, or pool them with ConcurrentBag<TGSqlParser>. See Performance Optimization for code samples.
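A pooling sketch along those lines; `LoadWorkload` is a hypothetical stand-in for wherever your statements come from:

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;
using gudusoft.gsqlparser;

class PoolDemo
{
    static readonly ConcurrentBag<TGSqlParser> Pool =
        new ConcurrentBag<TGSqlParser>();

    // Rent an idle parser, or pay construction cost once per new one.
    static TGSqlParser Rent() =>
        Pool.TryTake(out var p) ? p : new TGSqlParser(EDbVendor.dbvmssql);

    static void Main()
    {
        string[] sqls = LoadWorkload(); // hypothetical statement source
        Parallel.ForEach(sqls, sql =>
        {
            var parser = Rent();        // never share one instance across threads
            try
            {
                parser.sqltext = sql;
                parser.parse();
            }
            finally
            {
                Pool.Add(parser);       // recycle: construction is the costly part
            }
        });
    }

    static string[] LoadWorkload() => new[] { "SELECT 1", "SELECT 2" };
}
```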
Memory management¶
The AST holds references to the entire `sourcetokenlist` for line/column tracking. After processing a statement, drop the parser reference so the AST can be collected.
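A sketch of the scoping pattern, assuming the standard `TGSqlParser` API:

```csharp
using gudusoft.gsqlparser;

class MemoryDemo
{
    static void ProcessBatch(string[] batch)
    {
        var parser = new TGSqlParser(EDbVendor.dbvmssql);
        foreach (var sql in batch)
        {
            parser.sqltext = sql;
            if (parser.parse() == 0)
            {
                // Extract what you need here; don't stash AST nodes, since
                // every node keeps the whole sourcetokenlist reachable.
            }
        }
        // When this method returns, the last reference to the parser (and
        // therefore to its statements and token lists) is dropped, and the
        // whole graph becomes eligible for collection.
    }
}
```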
For long-running services, an explicit GC.Collect() after major batches can help — the AST objects are large enough to land in Gen2 and stick.
When to suspect the parser¶
Most performance problems in code that uses GSP are not in the parser. Common culprits, in descending order:
- Allocations in your visitor code — building strings inside a `preVisit` hook that runs on every node.
- String concatenation in a tight loop — use `StringBuilder`.
- Parser construction in a tight loop — see "Reuse parser instances" above.
- Re-parsing the same SQL repeatedly — cache the AST.
- PL/SQL parsing on workloads that don't actually need it — none of the production workloads we've seen require this.
If you've ruled all of those out and still see slow parsing, please file an issue with the offending SQL — there's almost certainly a grammar pessimisation we can fix.
Profiling¶
`dotnet-counters monitor --process-id <pid>` and `dotnet-trace collect` work fine against GSP-using processes. For micro-benchmarking, BenchmarkDotNet is the standard.
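A minimal BenchmarkDotNet harness for measuring steady-state `parse()` cost; the vendor and SQL text are illustrative:

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using gudusoft.gsqlparser;

public class ParseBench
{
    private TGSqlParser parser;

    [GlobalSetup]
    public void Setup() => parser = new TGSqlParser(EDbVendor.dbvmssql);

    [Benchmark]
    public int ParseSimpleSelect()
    {
        parser.sqltext = "SELECT a, b FROM t WHERE c = 1";
        return parser.parse();  // return the result so it isn't optimised away
    }
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<ParseBench>();
}
```

Because the parser is built in `[GlobalSetup]`, the benchmark measures only the re-parse path, matching the steady-state throughput row in the table above.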