Extended SQL Parser Design
This document explains the technical decisions made in building a SQL parser in Go. The parser handles the extended SELECT syntax of SQL that integrates TensorFlow Estimators.
Lexer and Parser Generator
In 2001, when I was in graduate school, I defined an extended SQL syntax for
querying distributed relational databases, as part of the course project of
Distributed RDBMS by Prof. Li-zhu Zhou. I wrote the parser using
lex and yacc, or rather their modern versions, flex (a modern version of
lex) and bison (a modern version of yacc). lex generates C code;
flex generates C++ code. However, this time, I'd use Go.

goyacc, the Go counterpart of yacc, is a
standard Go tool. The usage is very similar to that of yacc.
However, the Go toolchain doesn't provide a tool like lex; there is
golex, which is out of maintenance.
Handwritten Lexer and Parser
Some documents, including this one, recommend handwriting lexers; however, it doesn't explain how to write the parser.
GoAcademy always provides high-quality tech blog posts. This
one is from the
author of InfluxDB. However, I
stopped at the point where it wraps the SQL statement string in an
io.Reader, because it seems obvious that we should keep the string as a
string, so that token strings can refer to the same underlying memory as
the SQL statement.
Following a link in the above GoAcademy post, I found Rob Pike's excellent 2011 talk on how to write a lexer in Go. Many later works tweak Rob's implementation in one way or another, but they usually end up with longer and less comprehensible codebases.
Therefore, I wrote both the lexer and the parser following Rob Pike's idea. After a few days of work, I realized that:
- I should borrow Rob's idea of representing SQL statements as strings,
  rather than wrapping them in an io.Reader as other works do,
- there is no need to use channels and goroutines at all, and
- it is technically intractable to write a SQL parser manually.
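Putting the first two conclusions together, a channel-free version of Rob Pike's state-function lexer might look like the sketch below. The token names and structure are my own illustration under those assumptions, not the actual SQLFlow code; the input stays a string, tokens are slices of it, and state functions run in a plain loop instead of a goroutine feeding a channel:

```go
package main

import (
	"fmt"
	"unicode"
)

type kind int

const (
	tokEOF kind = iota
	tokIdent
	tokNumber
)

type token struct {
	k   kind
	val string // a slice of the input; no copy is made
}

// lexer keeps the whole statement as a string plus two offsets,
// following Rob Pike's design minus the channel and goroutine.
type lexer struct {
	input      string
	start, pos int
	toks       []token // tokens collected synchronously
}

// stateFn is Pike's pattern: each state returns the next state.
type stateFn func(*lexer) stateFn

func (l *lexer) emit(k kind) {
	l.toks = append(l.toks, token{k, l.input[l.start:l.pos]})
	l.start = l.pos
}

func lexAny(l *lexer) stateFn {
	for l.pos < len(l.input) {
		c := rune(l.input[l.pos])
		switch {
		case unicode.IsSpace(c):
			l.pos++
			l.start = l.pos // skip whitespace
		case unicode.IsDigit(c):
			return lexNumber
		default:
			return lexIdent
		}
	}
	return nil // nil state ends the loop at EOF
}

func lexNumber(l *lexer) stateFn {
	for l.pos < len(l.input) && unicode.IsDigit(rune(l.input[l.pos])) {
		l.pos++
	}
	l.emit(tokNumber)
	return lexAny
}

func lexIdent(l *lexer) stateFn {
	for l.pos < len(l.input) && !unicode.IsSpace(rune(l.input[l.pos])) {
		l.pos++
	}
	l.emit(tokIdent)
	return lexAny
}

// lex drives the state machine with an ordinary loop -- no
// goroutine, no channel.
func lex(input string) []token {
	l := &lexer{input: input}
	for st := stateFn(lexAny); st != nil; st = st(l) {
	}
	return l.toks
}

func main() {
	for _, t := range lex("SELECT 42 FROM t") {
		fmt.Printf("%d:%s ", t.k, t.val)
	}
	fmt.Println()
}
```

In a goyacc setting, the same loop would live inside the `Lex` method that the generated parser calls to pull one token at a time.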
So, I switched to writing the lexer manually and generating the parser
using goyacc. During this work, I referred to this
manual for details
about operator associativity and precedence.
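As an illustration of what those declarations control, a goyacc grammar might declare operator precedence for SQL-like expressions as below. The token names are hypothetical, not SQLFlow's actual grammar; in yacc, each later `%left`/`%right` line binds tighter than the one before it:

```yacc
/* Lowest precedence first; each later line binds tighter. */
%left  OR
%left  AND
%right NOT
%left  '=' '<' '>' LE GE NE
%left  '+' '-'
%left  '*' '/' '%'   /* so 1 + 2 * 3 parses as 1 + (2 * 3) */
```

With `%left`, an expression like `a - b - c` groups to the left as `(a - b) - c`.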