Format (pretty-print) SQL¶
This page is a step-by-step recipe for using the SQL formatter that ships with General SQL Parser .NET. If you just want to take messy SQL text and pretty-print it, copy the 20-line minimum below and you're done. Read further only when you need to customise the output.
TL;DR — the 20-line minimum¶
Save this file as Program.cs, install the package, run. That's it.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | |
Project setup:
1 2 3 4 5 | |
Output:
1 2 3 4 5 6 7 8 | |
That's the whole thing. Three using statements, two object instantiations, one method call. The rest of this page is "how to make the output look the way you want".
What each line does¶
The formatter has three moving parts:
| Type | What it is | Why you need it |
|---|---|---|
TGSqlParser |
The parser. Reads SQL text, builds an AST. | The formatter walks the AST, not the source text — so the SQL has to parse first. |
GFmtOpt |
Style options bag. ~80 fields controlling indentation, casing, line breaks, etc. | Lets you tune the output. The defaults match the screenshot above. |
FormatterFactory.pp(parser, option) |
The formatter itself. | Returns the formatted SQL as a string. |
You always need all three. There is no shorter API.
Why is the namespace pp?
The formatter started life as a pretty printer, so the namespace and the method pp(...) both keep the abbreviation. Mentally read it as "pretty-print".
Reading from a file instead of a string¶
Swap sqltext for sqlfilename:
1 2 3 4 5 6 7 8 9 10 11 | |
The parser handles UTF-8 / UTF-16 with or without BOM automatically. Don't pre-load the file yourself; let the parser do it.
Writing the output back to a file¶
The formatter returns a plain string. Use any of the standard .NET file APIs:
1 | |
Or if you want to overwrite the original:
1 | |
Picking the right dialect¶
The first argument to TGSqlParser is one of the EDbVendor enum values:
| Dialect | EDbVendor value |
|---|---|
| Oracle (incl. PL/SQL) | EDbVendor.dbvoracle |
| SQL Server (T-SQL) | EDbVendor.dbvmssql |
| MySQL | EDbVendor.dbvmysql |
| PostgreSQL | EDbVendor.dbvpostgresql |
| DB2 | EDbVendor.dbvdb2 |
| Snowflake | EDbVendor.dbvsnowflake |
| Teradata | EDbVendor.dbvteradata |
| Redshift | EDbVendor.dbvredshift |
| Greenplum | EDbVendor.dbvgreenplum |
| Hive | EDbVendor.dbvhive |
| Impala | EDbVendor.dbvimpala |
| Sybase ASE | EDbVendor.dbvsybase |
| Informix | EDbVendor.dbvinformix |
| Netezza | EDbVendor.dbvnetezza |
| MDX (Analysis Services) | EDbVendor.dbvmdx |
| MS Access | EDbVendor.dbvaccess (alias for MSSQL) |
If your script doesn't parse with the vendor you picked, that's almost always the wrong vendor. The error message will look like state:NNNN(10102) near: SOMETOKEN(line,col) — pick a different EDbVendor and try again.
Common customisation recipes¶
All customisation goes through GFmtOpt. Create one with GFmtOptFactory.newInstance(), change the fields you care about, then pass it to FormatterFactory.pp(...).
Recipe 1 — keywords lower-case¶
1 2 3 | |
1 2 3 4 | |
The four casing options that ship with the formatter:
| Value | Effect |
|---|---|
TCaseOption.CoUppercase |
SELECT |
TCaseOption.CoLowercase |
select |
TCaseOption.CoInitCap |
Select |
TCaseOption.CoNoChange |
leave it as-is |
You can apply each option independently to:
Field on GFmtOpt |
Affects |
|---|---|
caseKeywords |
SELECT, FROM, WHERE, JOIN, … |
caseIdentifier |
unquoted table/column names |
caseQuotedIdentifier |
identifiers wrapped in "..." or [...] |
caseFuncname |
function calls like count, sum |
caseDatatype |
VARCHAR, INT, NUMBER, … |
Recipe 2 — indent with 4 spaces (or with tabs)¶
1 2 3 4 | |
Recipe 3 — comma at the start of the line, not the end¶
1 2 | |
1 2 3 4 | |
The two options are LfAfterComma (default) and LfBeforeComma. Same field exists for selectFromclauseComma, parametersComma, and defaultCommaOption (which catches everything else).
Recipe 4 — wrap columns onto one line instead of stacking¶
1 2 3 | |
1 2 3 4 | |
Recipe 5 — line up AND / OR under WHERE¶
1 2 | |
1 2 3 | |
Recipe 6 — tighter spacing inside expressions¶
1 2 3 | |
Recipe 7 — multi-statement scripts¶
You don't need to do anything special. If sqltext contains five SELECTs separated by ;, the formatter formats all five and concatenates them with one blank line between, in the same order.
1 2 3 4 5 6 7 8 | |
Output:
1 2 3 4 5 6 7 8 9 10 | |
If one statement in the script has a syntax error, the formatter still emits the others — but parser.parse() returns non-zero. Check parser.SyntaxErrors to see exactly which statement failed.
Recipe 8 — round-trip a folder of .sql files¶
1 2 3 4 5 6 7 8 9 10 11 12 13 | |
Note that we create a new TGSqlParser per file. The parser holds the AST of the last parsed script; reusing one for multiple files is more error-prone than just constructing a fresh one.
Full reference: every field on GFmtOpt¶
There are about 80 settable fields on GFmtOpt. The ones you'll touch most often are listed above. The complete list lives in the source — open pp/para/GFmtOpt.cs in the package or browse the API reference.
Categories:
| Category | Field name patterns | Examples |
|---|---|---|
| Casing | case* |
caseKeywords, caseIdentifier, caseFuncname |
| Column-list layout | selectColumnlist* |
selectColumnlistStyle, selectColumnlistComma |
| FROM-clause layout | selectFromclause* |
selectFromclauseJoinOnInNewline, alignJoinWithFromKeyword |
| INSERT layout | insertColumnlist*, insertValuelist* |
|
| CREATE TABLE layout | createtable*, beStyleCreatetable* |
|
| Indent | indentLen, useTab, tabSize |
|
| Begin/end blocks | beStyleBlock*, beStyleFunctionBody* |
|
| Whitespace padding | wsPadding* |
wsPaddingOperatorArithmetic, wsPaddingParenthesesInFunction |
| CTE / WITH | cte* |
cteNewlineBeforeAs |
| CASE / WHEN | caseWhenThenInSameLine, indentCase* |
|
| Stored-procedure parameters | parameters* |
parametersStyle, parametersComma |
| Empty-line policy | *EmptyLines* |
controlled via the TEmptyLinesOption enum |
If you can't find the right knob, pick the closest section above, open GFmtOpt.cs in your editor, and search for the related word — the field names are descriptive.
Troubleshooting¶
parser.parse() returns non-zero¶
The formatter never runs if the parse fails. Check:
- Did you pick the right
EDbVendor? T-SQLtop 10, Oracleconnect by, MySQLLIMIT 10,20are all dialect-specific — using the wrong vendor will fail with a "near token" error. - Does the script use a non-standard
delimiter? MySQL stored procedures often usedelimiter $$; the parser handles this automatically only whenEDbVendor.dbvmysqlis selected. - Is there really a syntax error in the source? Read
parser.Errormessagefirst andparser.SyntaxErrorssecond — the latter has line/column numbers.
Output is empty¶
You called FormatterFactory.pp(parser, option) before parser.parse(). Always parse first. The formatter walks the AST that parse() builds — without that AST it has nothing to walk.
Output looks identical to input¶
You didn't pass the option, or you passed null. FormatterFactory.pp(parser, null) is a no-op shortcut that returns the original SQL untouched. Always pass GFmtOptFactory.newInstance() (or your own customised GFmtOpt).
Comments disappeared¶
Block comments (/* ... */) and line comments (-- ...) are preserved. If they vanished, you probably modified the AST in between parse() and pp(...) — modifying the AST after parsing can disconnect comment tokens from the nodes that owned them. Format first, then modify, not the other way around.
"How do I emit HTML or RTF output, like in the old Java version?"¶
The .NET edition only emits plain text. The Java edition's HTML/RTF rendering depended on System.Drawing and WPF, which don't exist on Linux/Mac. For coloured HTML output, pipe the formatter's plain-text output through any external syntax highlighter (Pygments, Prism, highlight.js, …).
See also¶
- Quick start — covers parsing, formatting, error handling end-to-end
- Tutorials → Basic SQL parsing — for the parser side
- API Reference — full DocFX-generated reference for
GFmtOpt,FormatterFactory,GFmtOptFactory