Skip to content

Format (pretty-print) SQL

This page is a step-by-step recipe for using the SQL formatter that ships with General SQL Parser .NET. If you just want to take messy SQL text and pretty-print it, copy the 20-line minimum below and you're done. Read further only when you need to customise the output.

TL;DR — the 20-line minimum

Save this file as Program.cs, install the package, run. That's it.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
using System;
using gudusoft.gsqlparser;
using gudusoft.gsqlparser.pp.para;
using gudusoft.gsqlparser.pp.stmtformatter;

var sql = "select e.last_name as name, e.commission_pct comm, e.salary*12 \"Annual Salary\" from scott.employees as e where e.salary>1000 or 1=1 order by e.first_name,e.last_name;";

// 1. Parse the SQL (pick the dialect that matches your script)
var parser = new TGSqlParser(EDbVendor.dbvoracle);
parser.sqltext = sql;
if (parser.parse() != 0)
{
    Console.WriteLine(parser.Errormessage);
    return;
}

// 2. Format with default options
GFmtOpt option   = GFmtOptFactory.newInstance();
string formatted = FormatterFactory.pp(parser, option);

Console.WriteLine(formatted);

Project setup:

1
2
3
4
5
mkdir formatdemo && cd formatdemo
dotnet new console
dotnet add package gudusoft.gsqlparser
# paste the code above into Program.cs
dotnet run

Output:

1
2
3
4
5
6
7
8
SELECT e.last_name AS name,
       e.commission_pct comm,
       e.salary * 12 "Annual Salary"
  FROM scott.employees AS e
 WHERE e.salary > 1000
    OR 1 = 1
 ORDER BY e.first_name,
          e.last_name;

That's the whole thing. Three using statements, two object instantiations, one method call. The rest of this page is "how to make the output look the way you want".

What each line does

The formatter has three moving parts:

Type What it is Why you need it
TGSqlParser The parser. Reads SQL text, builds an AST. The formatter walks the AST, not the source text — so the SQL has to parse first.
GFmtOpt Style options bag. ~80 fields controlling indentation, casing, line breaks, etc. Lets you tune the output. The defaults match the screenshot above.
FormatterFactory.pp(parser, option) The formatter itself. Returns the formatted SQL as a string.

You always need all three. There is no shorter API.

Why is the namespace pp?

The formatter started life as a pretty printer, so the namespace and the method pp(...) both keep the abbreviation. Mentally read it as "pretty-print".

Reading from a file instead of a string

Swap sqltext for sqlfilename:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
var parser = new TGSqlParser(EDbVendor.dbvoracle);
parser.sqlfilename = @"C:\scripts\employees.sql";   // absolute or relative path

if (parser.parse() != 0)
{
    Console.WriteLine(parser.Errormessage);
    return;
}

string formatted = FormatterFactory.pp(parser, GFmtOptFactory.newInstance());
Console.WriteLine(formatted);

The parser handles UTF-8 / UTF-16 with or without BOM automatically. Don't pre-load the file yourself; let the parser do it.

Writing the output back to a file

The formatter returns a plain string. Use any of the standard .NET file APIs:

1
File.WriteAllText("employees-formatted.sql", formatted);

Or if you want to overwrite the original:

1
File.WriteAllText(parser.sqlfilename, formatted);

Picking the right dialect

The first argument to TGSqlParser is one of the EDbVendor enum values:

Dialect EDbVendor value
Oracle (incl. PL/SQL) EDbVendor.dbvoracle
SQL Server (T-SQL) EDbVendor.dbvmssql
MySQL EDbVendor.dbvmysql
PostgreSQL EDbVendor.dbvpostgresql
DB2 EDbVendor.dbvdb2
Snowflake EDbVendor.dbvsnowflake
Teradata EDbVendor.dbvteradata
Redshift EDbVendor.dbvredshift
Greenplum EDbVendor.dbvgreenplum
Hive EDbVendor.dbvhive
Impala EDbVendor.dbvimpala
Sybase ASE EDbVendor.dbvsybase
Informix EDbVendor.dbvinformix
Netezza EDbVendor.dbvnetezza
MDX (Analysis Services) EDbVendor.dbvmdx
MS Access EDbVendor.dbvaccess (alias for MSSQL)

If your script doesn't parse with the vendor you picked, that's almost always the wrong vendor. The error message will look like state:NNNN(10102) near: SOMETOKEN(line,col) — pick a different EDbVendor and try again.

Common customisation recipes

All customisation goes through GFmtOpt. Create one with GFmtOptFactory.newInstance(), change the fields you care about, then pass it to FormatterFactory.pp(...).

Recipe 1 — keywords lower-case

1
2
3
GFmtOpt option = GFmtOptFactory.newInstance();
option.caseKeywords = TCaseOption.CoLowercase;
string formatted = FormatterFactory.pp(parser, option);
1
2
3
4
select e.last_name as name,
       e.commission_pct comm
  from scott.employees as e
 where e.salary > 1000;

The four casing options that ship with the formatter:

Value Effect
TCaseOption.CoUppercase SELECT
TCaseOption.CoLowercase select
TCaseOption.CoInitCap Select
TCaseOption.CoNoChange leave it as-is

You can apply each option independently to:

Field on GFmtOpt Affects
caseKeywords SELECT, FROM, WHERE, JOIN, …
caseIdentifier unquoted table/column names
caseQuotedIdentifier identifiers wrapped in "..." or [...]
caseFuncname function calls like count, sum
caseDatatype VARCHAR, INT, NUMBER, …

Recipe 2 — indent with 4 spaces (or with tabs)

1
2
3
4
GFmtOpt option = GFmtOptFactory.newInstance();
option.indentLen = 4;
// option.useTab = true;   // uncomment to use real tabs instead of spaces
// option.tabSize = 4;     // visual width of a tab (only used when useTab = true)

Recipe 3 — comma at the start of the line, not the end

1
2
GFmtOpt option = GFmtOptFactory.newInstance();
option.selectColumnlistComma = TLinefeedsCommaOption.LfBeforeComma;
1
2
3
4
SELECT e.last_name AS name
     , e.commission_pct comm
     , e.salary * 12 "Annual Salary"
  FROM scott.employees AS e;

The two options are LfAfterComma (default) and LfBeforeComma. Same field exists for selectFromclauseComma, parametersComma, and defaultCommaOption (which catches everything else).

Recipe 4 — wrap columns onto one line instead of stacking

1
2
3
GFmtOpt option = GFmtOptFactory.newInstance();
option.selectColumnlistStyle = TAlignStyle.AsWrapped;   // default is AsStacked
option.selectFromclauseStyle = TAlignStyle.AsWrapped;
1
2
3
4
SELECT e.last_name AS name, e.commission_pct comm,
       e.salary * 12 "Annual Salary"
  FROM scott.employees AS e
 WHERE e.salary > 1000;

Recipe 5 — line up AND / OR under WHERE

1
2
GFmtOpt option = GFmtOptFactory.newInstance();
option.andOrUnderWhere = true;
1
2
3
 WHERE e.salary > 1000
   AND e.commission_pct IS NOT NULL
   AND e.last_name LIKE 'A%';

Recipe 6 — tighter spacing inside expressions

1
2
3
GFmtOpt option = GFmtOptFactory.newInstance();
option.wsPaddingOperatorArithmetic = false;        // e.salary*12 instead of e.salary * 12
option.wsPaddingParenthesesInExpression = false;   // (1+2) instead of ( 1 + 2 )

Recipe 7 — multi-statement scripts

You don't need to do anything special. If sqltext contains five SELECTs separated by ;, the formatter formats all five and concatenates them with one blank line between, in the same order.

1
2
3
4
5
6
7
8
parser.sqltext = """
    select id from t1;
    select name from t2 where id = 1;
    update t1 set name = 'x' where id = 2;
    """;

parser.parse();
string formatted = FormatterFactory.pp(parser, GFmtOptFactory.newInstance());

Output:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
SELECT id
  FROM t1;

SELECT name
  FROM t2
 WHERE id = 1;

UPDATE t1
   SET name = 'x'
 WHERE id = 2;

If one statement in the script has a syntax error, the formatter still emits the others — but parser.parse() returns non-zero. Check parser.SyntaxErrors to see exactly which statement failed.

Recipe 8 — round-trip a folder of .sql files

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
foreach (var path in Directory.EnumerateFiles(@"C:\scripts", "*.sql", SearchOption.AllDirectories))
{
    var parser = new TGSqlParser(EDbVendor.dbvoracle) { sqlfilename = path };
    if (parser.parse() != 0)
    {
        Console.Error.WriteLine($"{path}: {parser.Errormessage}");
        continue;
    }

    string formatted = FormatterFactory.pp(parser, GFmtOptFactory.newInstance());
    File.WriteAllText(path, formatted);
    Console.WriteLine($"formatted {path}");
}

Note that we create a new TGSqlParser per file. The parser holds the AST of the last parsed script; reusing one for multiple files is more error-prone than just constructing a fresh one.

Full reference: every field on GFmtOpt

There are about 80 settable fields on GFmtOpt. The ones you'll touch most often are listed above. The complete list lives in the source — open pp/para/GFmtOpt.cs in the package or browse the API reference.

Categories:

Category Field name patterns Examples
Casing case* caseKeywords, caseIdentifier, caseFuncname
Column-list layout selectColumnlist* selectColumnlistStyle, selectColumnlistComma
FROM-clause layout selectFromclause* selectFromclauseJoinOnInNewline, alignJoinWithFromKeyword
INSERT layout insertColumnlist*, insertValuelist*
CREATE TABLE layout createtable*, beStyleCreatetable*
Indent indentLen, useTab, tabSize
Begin/end blocks beStyleBlock*, beStyleFunctionBody*
Whitespace padding wsPadding* wsPaddingOperatorArithmetic, wsPaddingParenthesesInFunction
CTE / WITH cte* cteNewlineBeforeAs
CASE / WHEN caseWhenThenInSameLine, indentCase*
Stored-procedure parameters parameters* parametersStyle, parametersComma
Empty-line policy *EmptyLines* controlled via the TEmptyLinesOption enum

If you can't find the right knob, pick the closest section above, open GFmtOpt.cs in your editor, and search for the related word — the field names are descriptive.

Troubleshooting

parser.parse() returns non-zero

The formatter never runs if the parse fails. Check:

  • Did you pick the right EDbVendor? T-SQL top 10, Oracle connect by, MySQL LIMIT 10,20 are all dialect-specific — using the wrong vendor will fail with a "near token" error.
  • Does the script use a non-standard delimiter? MySQL stored procedures often use delimiter $$; the parser handles this automatically only when EDbVendor.dbvmysql is selected.
  • Is there really a syntax error in the source? Read parser.Errormessage first and parser.SyntaxErrors second — the latter has line/column numbers.

Output is empty

You called FormatterFactory.pp(parser, option) before parser.parse(). Always parse first. The formatter walks the AST that parse() builds — without that AST it has nothing to walk.

Output looks identical to input

You didn't pass the option, or you passed null. FormatterFactory.pp(parser, null) is a no-op shortcut that returns the original SQL untouched. Always pass GFmtOptFactory.newInstance() (or your own customised GFmtOpt).

Comments disappeared

Block comments (/* ... */) and line comments (-- ...) are preserved. If they vanished, you probably modified the AST in between parse() and pp(...) — modifying the AST after parsing can disconnect comment tokens from the nodes that owned them. Format first, then modify, not the other way around.

"How do I emit HTML or RTF output, like in the old Java version?"

The .NET edition only emits plain text. The Java edition's HTML/RTF rendering depended on System.Drawing and WPF, which don't exist on Linux/Mac. For coloured HTML output, pipe the formatter's plain-text output through any external syntax highlighter (Pygments, Prism, highlight.js, …).

See also