Parsing SQL Statements
JDBC and the Limits of ResultSet Metadata
For my work in the area of data citation, I need to analyse queries, which are used for creating subsets. I am particularly interested in query parameters, sortings and filters. One of the most commonly used query languages is SQL, which is used by many relational database management systems such as MySQL. In some cases, the interaction with databases is abstract, meaning that there is hardly any SQL statements executed directly. The SQL statements are rather built on the fly by object relational mappers such as Hibernate. Other scenarios use SQL statements as String and also prepared statements, which are executed via JDBC. However, analysing SQL statements is tricky as the language is very flexible.
In order to understand what columns have been selected, it is sufficient to utilise the ResultSet Metadata and retrieve the column names from there. In my case I need to extract this imformation from the query in advance and potentially enforce a specific sorting by adding columns to the ORDER BY clause. In this scenario, I need to parse the SQL statement and retrieve this information from the statement itself. Probably the best way to do this would be to implement a parser for the SQL dialect with ANTLR (ANother Tool for Language Recognition). But this is quite a challenge, so I decided to take a shortcut: FoundationDB.
The FoundationDB Parser
FoundationDB was a NoSQL database which provided several layers for supporting different paradigms at once. I am using past tense here, because the project got acquired by Apple in 2015 and since then does pursue the open source project any more. However, the Maven libraries for the software are still available at Maven Central. FoundationDB uses its own SQL parser, which understands standard SQL queries. These queries can be interpreted as a tree and the parser library allows traversing SQL statements and analyse the nodes. We can use this tree to parse and interpret SQL statements and extract additional information.
The Foundations of FoundationDB
The FoundationDB parser can be included into your own project with the following Maven dependency:
<dependency>
<groupId>com.foundationdb</groupId>
<artifactId>fdb-sql-parser</artifactId>
<version>1.6.1</version>
</dependency>
The usage of the parser is straight forward. We use the following example SQL statement as input:
FROM tableA AS a, tableB AS b
WHERE a.firstColumn = b.secondColumn AND
b.thirdColumn < 5
ORDER BY a.thirdColumn,a.secondColumn DESC
The following function calls the parser and prints the tree of the statement.
/*
* Print a SQL statement
* @param sqlString
*/
public void parseSQLString(String sqlString) {
Parser parser = new Parser();
StatementNode stmt;
try {
stmt = this.parser.parseStatement(sqlString);
stmt.treePrint();
} catch (StandardException e) {
e.printStackTrace();
}
}
The resulting tree is listed below. The statement has also been normalized, which ensures a stable sequence of the parameters.
name: null
updateMode: UNSPECIFIED
statementType: SELECT
resultSet:
com.foundationdb.sql.parser.SelectNode@21b8d17c
isDistinct: false
resultColumns:
com.foundationdb.sql.parser.ResultColumnList@6433a2
[0]:
com.foundationdb.sql.parser.ResultColumn@5910e440
exposedName: firstcolumn
name: firstcolumn
tableName: null
isDefaultColumn: false
type: null
expression:
com.foundationdb.sql.parser.ColumnReference@6267c3bb
columnName: firstcolumn
tableName: a
type: null
[1]:
com.foundationdb.sql.parser.ResultColumn@533ddba
exposedName: secondcolumn
name: secondcolumn
tableName: null
isDefaultColumn: false
type: null
expression:
com.foundationdb.sql.parser.ColumnReference@246b179d
columnName: secondcolumn
tableName: b
type: null
[2]:
com.foundationdb.sql.parser.ResultColumn@7a07c5b4
exposedName: thirdcolumn
name: thirdcolumn
tableName: null
isDefaultColumn: false
type: null
expression:
com.foundationdb.sql.parser.ColumnReference@26a1ab54
columnName: thirdcolumn
tableName: b
type: null
fromList:
com.foundationdb.sql.parser.FromList@3d646c37
[0]:
com.foundationdb.sql.parser.FromBaseTable@41cf53f9
tableName: tablea
updateOrDelete: null
null
correlation Name: a
a
[1]:
com.foundationdb.sql.parser.FromBaseTable@5a10411
tableName: tableb
updateOrDelete: null
null
correlation Name: b
b
whereClause:
com.foundationdb.sql.parser.AndNode@2ef1e4fa
operator: and
methodName: and
type: null
leftOperand:
com.foundationdb.sql.parser.BinaryRelationalOperatorNode@306a30c7
operator: =
methodName: equals
type: null
leftOperand:
com.foundationdb.sql.parser.ColumnReference@b81eda8
columnName: firstcolumn
tableName: a
type: null
rightOperand:
com.foundationdb.sql.parser.ColumnReference@68de145
columnName: secondcolumn
tableName: b
type: null
rightOperand:
com.foundationdb.sql.parser.BinaryRelationalOperatorNode@27fa135a
operator: <
methodName: lessThan
type: null
leftOperand:
com.foundationdb.sql.parser.ColumnReference@46f7f36a
columnName: thirdcolumn
tableName: b
type: null
rightOperand:
com.foundationdb.sql.parser.NumericConstantNode@421faab1
value: 5
type: INTEGER NOT NULL
orderByList:
com.foundationdb.sql.parser.OrderByList@2b71fc7e
allAscending: false
[0]:
com.foundationdb.sql.parser.OrderByColumn@5ce65a89
ascending: true
nullsOrderedLow: false
columnPosition: -1
expression:
com.foundationdb.sql.parser.ColumnReference@25f38edc
columnName: thirdcolumn
tableName: a
type: null
[1]:
com.foundationdb.sql.parser.OrderByColumn@1a86f2f1
ascending: false
nullsOrderedLow: false
columnPosition: -1
expression:
com.foundationdb.sql.parser.ColumnReference@3eb07fd3
columnName: secondcolumn
tableName: a
type: null
This tree offers a lot of information, which can be used programmatically as well. In the top of the output, we can see that the statement was a SELECT statement and that it was not DISTINCT. Then follows the ResultSet, which contains a list of the three ResultColumns, which have been specified in the SELECT clause. We can see the column names and the table names from which they are drawn. The next block provides the referenced tables (the FROM list) and their alias names. The WHERE – block contains the operands which have been used for filtering and last but not least, there is the list of ORDER BY clauses and their sorting directions.
The Visitor
In order to access the information shown above programmatically, we need to access the content of the node one by one. This can be achieved with the visitor pattern, which traverses all the nodes of the tree. The following listing shows how the visitor pattern can be used for accessing the list of columns from the SELECT clause.
**
* Return a list of columns of a SELECT clause
* @param sql
* @return
*/
public ArrayList selectColumnsList(String sql){
SQLParser parser = new SQLParser();
BooleanNormalizer normalizer = new BooleanNormalizer(parser);
StatementNode stmt = null;
try {
stmt = parser.parseStatement(sql);
stmt = normalizer.normalize(stmt);
} catch (StandardException e) {
e.printStackTrace();
}
final ArrayList<ResultColumn> columns = new ArrayList<ResultColumn>();
Visitor v = new Visitor() {
@Override
public boolean visitChildrenFirst(Visitable node) {
if (node instanceof SelectNode)
return true;
return false;
}
@Override
public Visitable visit(Visitable node) throws StandardException {
if (node instanceof ResultColumn) {
ResultColumn resultColumn = (ResultColumn) node;
columns.add(resultColumn);
System.out.println("Column " + columns.size()+ ": " + resultColumn.getName());
}
return null;
}
@Override
public boolean stopTraversal() {
// TODO Auto-generated method stub
return false;
}
@Override
public boolean skipChildren(Visitable node) throws StandardException {
if (node instanceof FromList) {
return true;
}
return false;
}
};
try {
stmt.accept(v);
} catch (StandardException e) {
e.printStackTrace();
}
return columns;
}
This code example, we define a visitor which traverses all the ResultColumn nodes. Every time the current node is an instance of ResultColumn, we add this node to our list of columns. The nodes are only visited, if they are children of a SELECT statement. This is our entry point into the tree. We leave the tree when we reach the FROM list. We then apply the visitor to the statement, which initiates the traversal. As a result, we receive a list of columns which have been used for the result set.
In order to get the list of ORDER BY columns, we can utilise a similar approach. The following functions gives an example:
/**
* Return list of order by clauses
* @param sqlText
* @return
*/
public OrderByList orderByColumns(String sqlText){
SQLParser parser = new SQLParser();
BooleanNormalizer normalizer = new BooleanNormalizer(parser);
StatementNode stmt;
OrderByList orderByList = null;
try {
stmt = parser.parseStatement(sqlText);
stmt = normalizer.normalize(stmt);
CursorNode node = (CursorNode) stmt;
orderByList = node.getOrderByList();
int i=0;
for(OrderByColumn orderByColumn : orderByList){
i++;
String direction;
if(orderByColumn.isAscending()){
direction="ASC";
}else{
direction="DESC";
}
System.out.println("ORDER BY Column " +i+ ": " +orderByColumn.getExpression().getColumnName()+ " Direction: " + direction );
}
} catch (StandardException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return orderByList;
}
This time, we retrieve the list of ORDER BY columns directly from the CurserNode. Similar principles can be used for manipulating SQL statements and apply a different sorting for instance.