This document explains the technical decisions made in building a SQL parser in Go. The parser parses an extended SELECT syntax of SQL that integrates TensorFlow Estimators.
In 2001, when I was in graduate school, I defined an extended SQL syntax for
querying distributed relational databases, as part of the course project of
Distributed RDBMS by Prof. Li-zhu Zhou. I wrote the parser using bison (a
modern version of yacc) and flex (a modern version of lex). yacc and lex
generate C code; bison and flex generate C++ code. However, this time, I'd use Go.
goyacc is a standard Go tool. Its usage is very similar to that of yacc.
However, the Go toolchain doesn't provide a lexer generator like lex; there is
golex, but it is out of maintenance.
Some documents, including this one, recommend handwriting lexers; however, they don't explain how to write the parser.
GoAcademy always provides high-quality tech blog posts. This
one is from the
author of InfluxDB. However, I
stopped reading at the point where it wraps a SQL statement in an
io.Reader, because it is obvious that we should keep the statement as a string, so
that token strings can refer to the same memory storage as the SQL statement.
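To illustrate the point, here is a minimal sketch (the names `token` and `Text` are hypothetical, not from any of the cited codebases): a token records only byte offsets into the input, so its text is a Go string header sharing the SQL statement's backing memory, with no per-token copy as an io.Reader-based design would require.

```go
package main

import "fmt"

// token refers back into the original SQL string by offsets; Text returns
// a substring that shares the statement's memory rather than a copy.
type token struct {
	typ        int // token kind, e.g. SELECT, IDENT, NUMBER
	start, end int // half-open byte range [start, end) into the input
}

// Text recovers the token's lexeme from the original input string.
func (t token) Text(sql string) string { return sql[t.start:t.end] }

func main() {
	sql := "SELECT * FROM train"
	tok := token{typ: 1, start: 14, end: 19} // a hypothetical IDENT token
	fmt.Println(tok.Text(sql))               // prints "train"
}
```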
Following a link in the above GoAcademy post, I found Rob Pike's excellent 2011 talk on how to write a lexer in Go. Many later works modify Rob's implementation in various ways, but they always lead to longer and less comprehensible codebases.
Therefore, I wrote both the lexer and the parser following Rob Pike's idea. After a few days of work, I realized that:
- I should borrow Rob's idea of representing SQL statements as strings, rather
  than as io.Reader as other works do,
- there is no need to use channels and goroutines at all, and
- it is technically intractable to write a SQL parser manually.
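A sketch of what dropping the channels and goroutines means in practice (hypothetical names, not the project's actual API): instead of a lexing goroutine sending tokens over a channel, the caller simply pulls tokens with a method call.

```go
package main

import (
	"fmt"
	"unicode"
)

// lexer scans the input string in place; the parser pulls tokens by
// calling Next, so no goroutine or channel is involved.
type lexer struct {
	input string
	pos   int
}

// Next returns the next whitespace-delimited word, or "" at end of input.
// A real lexer would classify tokens; this sketch only splits on spaces.
func (l *lexer) Next() string {
	for l.pos < len(l.input) && unicode.IsSpace(rune(l.input[l.pos])) {
		l.pos++
	}
	start := l.pos
	for l.pos < len(l.input) && !unicode.IsSpace(rune(l.input[l.pos])) {
		l.pos++
	}
	return l.input[start:l.pos] // substring sharing the input's memory
}

func main() {
	l := &lexer{input: "SELECT * FROM train"}
	for tok := l.Next(); tok != ""; tok = l.Next() {
		fmt.Println(tok) // prints SELECT, *, FROM, train on separate lines
	}
}
```

This pull-style interface also happens to match what a goyacc-generated parser expects: it calls into the lexer for one token at a time.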
So, I switched to writing the lexer manually and generating the parser using
goyacc. During this work, I referred to this
manual for details
about operator associativity and precedence.
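For reference, associativity and precedence in a goyacc grammar are declared the same way as in yacc: each `%left`/`%right` line introduces one precedence level, and later lines bind tighter. A hypothetical fragment (not the project's actual grammar) might read:

```goyacc
/* Later lines bind tighter: '*' and '/' take precedence over '+' and '-'. */
%left '+' '-'
%left '*' '/'
%right UMINUS   /* unary minus, attached to its rule via %prec below */

%%

expr : expr '+' expr
     | expr '-' expr
     | expr '*' expr
     | expr '/' expr
     | '-' expr %prec UMINUS
     | NUMBER
     ;
```

Without the `%left` declarations, every binary rule above would be reported as a shift/reduce conflict.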