Parser Design Principles
The General SQL Parser .NET library is built around a handful of design principles that have stayed stable across the library's lifetime. Understanding them helps you reason about why the API looks the way it does, and what to expect from a release.
1. Per-vendor accuracy over universal coverage
Every supported dialect has its own lexer and parser, generated from a vendor-specific .l/.y grammar source. There is no "ANSI core + dialect overlay" — Oracle SQL, T-SQL, and Snowflake SQL are entirely separate grammars.
The trade-off:
- Pro: vendor-specific syntax (CONNECT BY, TOP, QUALIFY) parses with the same accuracy as ANSI SQL. There's no second-class citizen.
- Pro: a parse error always tells you whether the SQL is valid for the target dialect, not "valid SQL in the abstract".
- Con: ~3× more grammar to maintain than a single-grammar parser. Mitigated by sharing tooling (gsp_dotnet_parser/) and a regression suite that runs every grammar against thousands of fixture files.
2. Strongly-typed AST
Every SQL element gets its own class. There is no generic Node { Type, Children }; instead there are TSelectSqlStatement, TFunctionCall, TJoinExpr, and so on, more than 200 classes in total, organised by responsibility under nodes/, stmt/, and the vendor-specific subnamespaces.
This means consumer code that needs to walk an AST can use type tests and pattern matching rather than string-matching on node-type IDs.
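As a rough illustration of type-based dispatch, the sketch below walks a list of nodes with a C# switch expression. The node types (SqlNode, SelectNode, FunctionCallNode, JoinNode) are hypothetical stand-ins defined inside the example; they are not the library's TSelectSqlStatement, TFunctionCall, or TJoinExpr classes, whose members are documented in the API reference.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in nodes (assumption): NOT the library's real classes.
var nodes = new List<SqlNode>
{
    new SelectNode("orders"),
    new FunctionCallNode("COUNT", 1),
    new JoinNode("orders", "customers"),
};

var descriptions = new List<string>();
foreach (var node in nodes)
{
    descriptions.Add(Describe(node));
}
Console.WriteLine(string.Join("\n", descriptions));

// Dispatch on the runtime type of each node instead of comparing a string tag.
string Describe(SqlNode node) => node switch
{
    SelectNode s => $"SELECT from {s.Table}",
    FunctionCallNode f => $"call {f.Name} with {f.ArgCount} argument(s)",
    JoinNode j => $"JOIN {j.Left} with {j.Right}",
    _ => "unhandled node type",
};

abstract record SqlNode;
sealed record SelectNode(string Table) : SqlNode;
sealed record FunctionCallNode(string Name, int ArgCount) : SqlNode;
sealed record JoinNode(string Left, string Right) : SqlNode;
```

With one class per element, the compiler checks each branch's member access, which a string-tagged generic node cannot offer.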
The trade-off is that the type surface is large. The DocFX API reference is the canonical map.
3. Backwards compatibility
Public APIs (classes, methods, property names, enum members) are stable across minor and patch releases. We don't rename TGSqlParser.sqltext or move classes between namespaces without a deprecation cycle.
This costs us some long-term cleanup but is essential for consumers building on GSP — most use cases involve traversing a deeply-nested AST, and any name change ripples through their code.
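One common way to run a deprecation cycle in .NET is to keep the old member as a forwarding alias marked with the standard [Obsolete] attribute. The sketch below assumes a hypothetical class SketchParser and a hypothetical rename from sqltext to SqlText; it does not describe a planned change to TGSqlParser, and the library may handle deprecations differently.

```csharp
using System;

var parser = new SketchParser();
parser.SqlText = "SELECT 1";

// Old call sites still compile, but produce warning CS0618 until they migrate.
#pragma warning disable CS0618
Console.WriteLine(parser.sqltext);
#pragma warning restore CS0618

// Hypothetical class (assumption): illustrates the pattern only.
class SketchParser
{
    // New, preferred name.
    public string SqlText { get; set; } = "";

    // Old name kept as a forwarding alias for the length of the deprecation cycle.
    [Obsolete("Use SqlText instead; this alias will be removed in a future major release.")]
    public string sqltext
    {
        get => SqlText;
        set => SqlText = value;
    }
}
```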
4. Self-contained binary
The parser does not depend on any third-party NuGet packages and does not connect to any database. The grammar tables for all 15 supported dialects ship as embedded resources inside the assembly.
This is a deliberate choice that supports air-gapped, regulated, or restricted environments — the assembly is the entire deliverable.
5. Round-tripping over re-implementing
The parser is paired with a script writer (scriptWriter/) that emits SQL from a (possibly modified) AST. We optimise for the common case where consumers want to modify SQL (rename columns, swap dialects, redact constants) rather than just analyse it.
This is why visitors are a first-class citizen, why every node has a String representation, and why the AST is mutable in place.
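The modify-then-emit workflow can be sketched with a miniature mutable tree. MiniSelect and MiniColumn are hypothetical types invented for this example; the library's real node classes, visitors, and script writer have their own APIs, which are not reproduced here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical miniature AST (assumption): not the library's node classes.
var select = new MiniSelect(
    new List<MiniColumn> { new MiniColumn("cust_name"), new MiniColumn("total") },
    "orders");

// A small rewriting pass that renames one column in place.
foreach (var column in select.Columns)
{
    if (column.Name == "cust_name")
    {
        column.Name = "customer_name";
    }
}

// Emit SQL text from the modified tree.
string emitted = select.ToString();
Console.WriteLine(emitted);

sealed class MiniColumn
{
    public MiniColumn(string name) => Name = name;
    public string Name { get; set; } // mutable so a pass can edit it in place
    public override string ToString() => Name;
}

sealed class MiniSelect
{
    public MiniSelect(List<MiniColumn> columns, string table)
    {
        Columns = columns;
        Table = table;
    }

    public List<MiniColumn> Columns { get; }
    public string Table { get; set; }

    // Builds the SQL text from the current state of the tree.
    public override string ToString() =>
        $"SELECT {string.Join(", ", Columns.Select(c => c.Name))} FROM {Table}";
}
```

Because the tree is edited in place and each node can render itself, the rewrite needs no separate copy of the statement.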
6. Per-vendor isolation in the codebase
The most recent architectural refactor split per-vendor logic out of the central TGSqlParser into per-vendor delegate classes (parser/T<V>SqlParser.cs) and per-vendor command tables (sqlcmds/TSqlCmds<V>.cs). The goal: changing how Oracle parses a WITH clause should never affect MSSQL, and adding a new vendor should not require editing the central dispatcher.
This is the same principle applied to per-database conditional compilation (<ItemGroup Condition="'$(includeOracle)' == 'true'">) — vendors are decoupled from each other at the build level too.
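The delegate split described in this section can be pictured as a lookup table in front of per-vendor implementations. In the sketch below, IVendorParser, OracleSketchParser, MssqlSketchParser, and the Parse helper are all hypothetical names; the actual T<V>SqlParser classes and the central dispatcher are not shown and may be wired together differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical registry (assumption): names and shapes are illustrative only.
var registry = new Dictionary<string, IVendorParser>(StringComparer.OrdinalIgnoreCase);
registry["oracle"] = new OracleSketchParser();
registry["mssql"] = new MssqlSketchParser();

string oracleResult = Parse(registry, "oracle", "SELECT 1 FROM dual");
string mssqlResult = Parse(registry, "mssql", "SELECT TOP 1 * FROM t");
Console.WriteLine(oracleResult);
Console.WriteLine(mssqlResult);

// The central dispatcher only looks up a delegate; it has no vendor-specific branches.
string Parse(Dictionary<string, IVendorParser> vendors, string vendor, string sql)
{
    if (!vendors.TryGetValue(vendor, out var parser))
    {
        throw new NotSupportedException($"No parser registered for '{vendor}'.");
    }
    return parser.Parse(sql);
}

interface IVendorParser
{
    string Parse(string sql);
}

// Each vendor lives in its own class, so editing one cannot affect the other.
sealed class OracleSketchParser : IVendorParser
{
    public string Parse(string sql) => $"[oracle] {sql}";
}

sealed class MssqlSketchParser : IVendorParser
{
    public string Parse(string sql) => $"[mssql] {sql}";
}
```

Under this shape, adding a vendor means adding one class and one registration, with no edits to the dispatch function.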
7. Errors are recoverable
A syntax error in one statement of a multi-statement script does not abort the others. The parser collects errors into SyntaxErrors and continues. For interactive tools (linters, IDE plugins), EnablePartialParsing = true builds a best-effort AST around the broken statement.
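The collect-and-continue behaviour can be shown with a toy loop. This sketch does not call the library: the "grammar" is a deliberately trivial check that a statement starts with SELECT, and the parsed and errors lists are hypothetical stand-ins for the real statement list and SyntaxErrors.

```csharp
using System;
using System.Collections.Generic;

// Toy input: the second statement contains a typo.
string script = "SELECT 1; SELEC 2; SELECT 3";

var parsed = new List<string>();
var errors = new List<string>();

string[] statements = script.Split(';', StringSplitOptions.RemoveEmptyEntries);
for (int i = 0; i < statements.Length; i++)
{
    string statement = statements[i].Trim();

    // Stand-in for a real grammar check (assumption): accept only SELECT statements.
    if (statement.StartsWith("SELECT ", StringComparison.OrdinalIgnoreCase))
    {
        parsed.Add(statement);
    }
    else
    {
        // Record the failure and keep going instead of aborting the whole script.
        errors.Add($"statement {i + 1}: unexpected input near '{statement}'");
    }
}

Console.WriteLine($"parsed {parsed.Count}, errors {errors.Count}");
```

The statements on either side of the broken one are still processed, which is the property linters and IDE plugins rely on.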
8. Cross-platform .NET
The library multi-targets net10.0 (recommended, LTS) and netstandard2.0 (compatibility facade for .NET Framework 4.6.1+, Mono, Unity, Xamarin). We do not target netstandard2.1, netcoreapp*, or any of the earlier net5.0 through net9.0 runtimes.
See also
- Software Architecture for the component-level view
- Database Compatibility for vendor coverage details