|
@@ -0,0 +1,496 @@
|
|
|
+#+Title: Test Java Parser in Common Lisp with Esrap
|
|
|
+#+AUTHOR: Sam Flint
|
|
|
+#+EMAIL: swflint@flintfam.org
|
|
|
+#+DATE: \today
|
|
|
+#+INFOJS_OPT: view:info toc:nil path:http://flintfam.org/org-info.js
|
|
|
+#+OPTIONS: toc:nil H:5 ':t *:t
|
|
|
+#+PROPERTY: noweb no-export
|
|
|
+#+PROPERTY: comments noweb
|
|
|
+#+LATEX_HEADER: \usepackage[color]{showkeys}
|
|
|
+#+LATEX_HEADER: \parskip=5pt
|
|
|
+#+LATEX_HEADER: \lstset{texcl=true,breaklines=true,columns=fullflexible,frame=lines,literate={lambda}{$\lambda$}{1} {set}{$\gets$}1 {setq}{$\gets$}1 {setf}{$\gets$}1 {<=}{$\leq$}1 {>=}{$\geq$}1}
|
|
|
+
|
|
|
+#+BEGIN_abstract
|
|
|
+This document presents a personal exercise in both Common Lisp and the
|
|
|
+use of Context-free grammars. It's one of those just-for-fun
|
|
|
+projects, but will eventually be able to be used for both code metrics
|
|
|
+and even potentially code generation and manipulation.
|
|
|
+
|
|
|
+The parser is implemented using the Packrat Parsing technique, which
|
|
|
+provides LL$(k)$ parsing in linear time.
|
|
|
+#+END_abstract
|
|
|
+
|
|
|
+#+TOC: headlines 3
|
|
|
+#+TOC: listings
|
|
|
+
|
|
|
+* WORKING The Parser [16%]
|
|
|
+ To build this parser, I'm using a somewhat newer parsing technique
|
|
|
+ to do this called "Packrat Parsing". Packrat Parsing is an
|
|
|
+ unlimited look-ahead parsing technique based on Parser Expression
|
|
|
+ Grammars optimized for use in functional languages. To achieve
|
|
|
+ this, I'm using the excellent =ESRAP= parsing library for Common
|
|
|
+ Lisp.
|
|
|
+
|
|
|
+** DONE Common Rules
|
|
|
+ To effectively represent the programming language, there are just a
|
|
|
+ few housekeeping rules to allow us to parse. Many of them are
|
|
|
+ fairly simple, defining things like names and strings.
|
|
|
+
|
|
|
+ #+CAPTION: Common Rules
|
|
|
+ #+Name: common-rules
|
|
|
+ #+BEGIN_SRC lisp
|
|
|
+ <<string-related-rules>>
|
|
|
+
|
|
|
+ <<number-related-rules>>
|
|
|
+
|
|
|
+ <<identifiers>>
|
|
|
+
|
|
|
+ <<misc-rules>>
|
|
|
+ #+END_SRC
|
|
|
+
|
|
|
+*** DONE String Related
|
|
|
+ To accurately parse strings, we need to provide two rules, a rule
|
|
|
+ that provides string characters, and a second rule that defines a
|
|
|
+ string. Aside from these, one needs a function to ensure that
|
|
|
+ certain characters are not part of strings.
|
|
|
+
|
|
|
+#+CAPTION: String Related Generic Rules
|
|
|
+#+Name: string-related-rules
|
|
|
+#+BEGIN_SRC lisp
|
|
|
+ (defun not-doublequote (character)
|
|
|
+ (not (eql #\" character)))
|
|
|
+
|
|
|
+ (defrule string-chars
|
|
|
+ (or (not-doublequote character)
|
|
|
+ (and #\\ #\")))
|
|
|
+
|
|
|
+ (defrule string
|
|
|
+ (and #\" (* string-chars) #\")
|
|
|
+ (:destructure (q1 string q2)
|
|
|
+ (declare (ignore q1 q2))
|
|
|
+ (list :string (text string))))
|
|
|
+#+END_SRC
|
|
|
+*** DONE Number Related
|
|
|
+ Because of the nature of computers, there is the frequent need for
|
|
|
+ numeric literals. Numeric literals can, however be difficult to
|
|
|
+ parse, especially if you focus on type verification. To obviate
|
|
|
+ the need for type verification, it generalizes to allow for all
|
|
|
+ base-10 numeric literals, parsing a sign, whole part, fractional
|
|
|
+ part and signed exponent part.
|
|
|
+
|
|
|
+#+CAPTION: Number Related Parsing Rules
|
|
|
+#+Name: number-related-rules
|
|
|
+#+BEGIN_SRC lisp
|
|
|
+ (defrule number
|
|
|
+ (and (? (or "-" "+"))
|
|
|
+ (+ (or "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"))
|
|
|
+ (? (and "."
|
|
|
+ (+ (or "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"))))
|
|
|
+ (? (and (or "e" "E")
|
|
|
+ (? "-")
|
|
|
+ (+ (or "0" "1" "2" "3" "4" "5" "6" "7" "8" "9")))))
|
|
|
+ (:lambda (list)
|
|
|
+ (list :number (text list))))
|
|
|
+#+END_SRC
|
|
|
+*** DONE Identifiers
|
|
|
+ All modern programming languages use identifiers for the names of
|
|
|
+ methods, classes, functions and variables. The Java definition is
|
|
|
+ a string of alphanumeric characters starting with any
|
|
|
+ alphanumeric, the =_= or the =$= characters with the =_= or =.=
|
|
|
+ interspersed in as appropriate. Rather than simply parse as a
|
|
|
+ string, the rule will split the identifier at the =.= and return a
|
|
|
+ list of all the parts of the identifier.
|
|
|
+
|
|
|
+#+CAPTION: Identifiers
|
|
|
+#+Name: identifiers
|
|
|
+#+BEGIN_SRC lisp
|
|
|
+ (defrule identifier
|
|
|
+ (and (or (alphanumericp character) "_" "$")
|
|
|
+ (+ (or (alphanumericp character) "_" ".")))
|
|
|
+ (:text t)
|
|
|
+ (:lambda (string)
|
|
|
+ (concatenate 'list
|
|
|
+ (list :identifier)
|
|
|
+ (split-sequence #\. string))))
|
|
|
+#+END_SRC
|
|
|
+*** DONE Type Signatures
|
|
|
+
|
|
|
+*** DONE Type Formats
|
|
|
+
|
|
|
+*** DONE Miscellaneous Rules
|
|
|
+ There are a few miscellaneous rules, consisting of things like
|
|
|
+ spaces and non-integers. They are both fairly simple, and provide
|
|
|
+ a way to work easily.
|
|
|
+
|
|
|
+#+CAPTION: Miscellaneous Rules
|
|
|
+#+Name: misc-rules
|
|
|
+#+BEGIN_SRC lisp
|
|
|
+ (defun not-integer (string)
|
|
|
+ (when (find-if-not #'digit-char-p string)
|
|
|
+ t))
|
|
|
+
|
|
|
+ (defrule spaces
|
|
|
+ (+ (or #\Space #\Tab #\Newline))
|
|
|
+ (:constant nil))
|
|
|
+
|
|
|
+ (defrule package-declaration)
|
|
|
+#+END_SRC
|
|
|
+*** TODO Comments
|
|
|
+ foo
|
|
|
+
|
|
|
+#+CAPTION: Comments
|
|
|
+#+Name: comments
|
|
|
+#+BEGIN_SRC lisp
|
|
|
+ (defrule comment
|
|
|
+ (or (and "//" (* (or (alphanumericp character) "_" "." "+" "\"" "'" "," "." "<" ">")))
|
|
|
+ (and "/*" (* (or (alphanumpericp character) "_" "." "+" "\"" "'" "," "<" ">"))))
|
|
|
+ (:constant nil))
|
|
|
+#+END_SRC
|
|
|
+** WORKING Class Rules [80%]
|
|
|
+ Given the fact that Java is an inherently object-oriented language,
|
|
|
+ and in fact forces it on users, it requires class definitions for
|
|
|
+ everything.
|
|
|
+
|
|
|
+ #+CAPTION: Class Rules
|
|
|
+ #+Name: class-rules
|
|
|
+ #+BEGIN_SRC lisp
|
|
|
+ <<class-modifiers>>
|
|
|
+
|
|
|
+ <<super-class>>
|
|
|
+
|
|
|
+ <<interface-types>>
|
|
|
+
|
|
|
+ <<class-body>>
|
|
|
+
|
|
|
+ <<class-declaration>>
|
|
|
+ #+END_SRC
|
|
|
+
|
|
|
+*** DONE Class Modifiers
|
|
|
+ Java has the ability to set classes as having certain attributes,
|
|
|
+ these attributes are set using class modifiers. These are
|
|
|
+ =public=, =abstract= and =final=. However, these may occur with
|
|
|
+ just one or two though it may occur with all three, and to my
|
|
|
+ understanding, they are order independent. Rather than parsing as
|
|
|
+ a single string, we instead return a de-duplicated list of the
|
|
|
+ modifiers as a keyword list.
|
|
|
+
|
|
|
+#+CAPTION: Class Modifiers
|
|
|
+#+Name: class-modifiers
|
|
|
+#+BEGIN_SRC lisp
|
|
|
+ (defrule class-modifier
|
|
|
+ (or "public" "abstract" "final"))
|
|
|
+
|
|
|
+ (defrule class-modifiers
|
|
|
+ (* (and class-modifier (? spaces)))
|
|
|
+ (:lambda (modifier-list)
|
|
|
+ (let ((mods (map 'list #'(lambda (mod)
|
|
|
+ (intern (string-upcase (car mod)) "KEYWORD"))
|
|
|
+ modifier-list)))
|
|
|
+ (remove-duplicates mods))))
|
|
|
+#+END_SRC
|
|
|
+*** DONE Super Class
|
|
|
+ Because Java is an object oriented language, the use of classes is
|
|
|
+ implicit, and with the use of classes comes inheritance; to take
|
|
|
+ advantage of inheritance, you need to be able to signify the class
|
|
|
+ to inherit from. The purpose of this rule is to signify the super
|
|
|
+ class.
|
|
|
+
|
|
|
+#+CAPTION: Super Class
|
|
|
+#+Name: super-class
|
|
|
+#+BEGIN_SRC lisp
|
|
|
+ (defrule super-class
|
|
|
+ (and "extends" (? spaces) identifier)
|
|
|
+ (:lambda (list)
|
|
|
+ (cddr list)))
|
|
|
+#+END_SRC
|
|
|
+*** DONE Interface Types
|
|
|
+ Java supports the concept of interfaces and allows the programmer
|
|
|
+ to declare them in the class declaration. The possible
|
|
|
+ declarations are the different types, things like =byte=, =long=
|
|
|
+ and =boolean=.
|
|
|
+
|
|
|
+#+CAPTION: Interface Types
|
|
|
+#+Name: interface-types
|
|
|
+#+BEGIN_SRC lisp
|
|
|
+ (defrule interface-types
|
|
|
+ (or "byte" "short" "int" "long" "char" "float" "double" "boolean"))
|
|
|
+
|
|
|
+ (defrule interfaces
|
|
|
+ (and "implements" spaces (* (and interface-types (? (and (? spaces) "," (? spaces))))))
|
|
|
+ (:destructure (interstring space interfaces)
|
|
|
+ (declare (ignore interstring space))
|
|
|
+ (let ((interlist (map 'list
|
|
|
+ #'(lambda (interface-spec)
|
|
|
+ (intern (string-upcase (car interface-spec))
|
|
|
+ "KEYWORD"))
|
|
|
+ interfaces)))
|
|
|
+ (remove-duplicates interlist))))
|
|
|
+#+END_SRC
|
|
|
+*** WORKING Class Body
|
|
|
+ The class body contains a set of method definitions, variable
|
|
|
+ declarations and the like.
|
|
|
+
|
|
|
+#+CAPTION: Class Body
|
|
|
+#+Name: class-body
|
|
|
+#+BEGIN_SRC lisp
|
|
|
+ (defrule class-body
|
|
|
+ (* (or (alphanumericp character) "_" "." ";" spaces)))
|
|
|
+#+END_SRC
|
|
|
+*** DONE Class Declaration
|
|
|
+ To define a class in Java, you write a class declaration, which
|
|
|
+ has optional [[Class Modifiers][Class Modifiers]], the word =class=, an [[Identifiers][Identifier]], a
|
|
|
+ [[Super Class][Super Class]] declaration, an [[Interface Types][Interface Type]] list, an opening brace,
|
|
|
+ a [[Class Body][Class Body]] and a closing brace.
|
|
|
+
|
|
|
+#+CAPTION: Class Declaration
|
|
|
+#+Name: class-declaration
|
|
|
+#+BEGIN_SRC lisp
|
|
|
+ (defrule class
|
|
|
+ (and (? class-modifiers)
|
|
|
+ (? spaces)
|
|
|
+ "class"
|
|
|
+ (? spaces)
|
|
|
+ identifier
|
|
|
+ (? spaces)
|
|
|
+ (? super-class)
|
|
|
+ (? spaces)
|
|
|
+ (? interfaces)
|
|
|
+ (? spaces)
|
|
|
+ "{"
|
|
|
+ (? spaces)
|
|
|
+ class-body
|
|
|
+ (? spaces)
|
|
|
+ "}")
|
|
|
+ (:destructure (modifiers
|
|
|
+ s1
|
|
|
+ class
|
|
|
+ s2
|
|
|
+ class-name
|
|
|
+ s3
|
|
|
+ superclass
|
|
|
+ s4
|
|
|
+ interfaces
|
|
|
+ s5
|
|
|
+ o-brace
|
|
|
+ s6
|
|
|
+ body
|
|
|
+ s7
|
|
|
+ c-brace)
|
|
|
+ (declare (ignore s1 class s2 s3 s4 s5 o-brace s6 s7 c-brace))
|
|
|
+ (list :modifiers modifiers
|
|
|
+ :class class-name
|
|
|
+ :superclass superclass
|
|
|
+ :interfaces interfaces
|
|
|
+ :body body)))
|
|
|
+#+END_SRC
|
|
|
+** WORKING Method Rules [0%]
|
|
|
+ foo
|
|
|
+
|
|
|
+#+CAPTION: Method Rules
|
|
|
+#+Name: method-rules
|
|
|
+#+BEGIN_SRC lisp
|
|
|
+ <<method-modifiers>>
|
|
|
+
|
|
|
+ <<result-type>>
|
|
|
+
|
|
|
+ <<formal-parameter-list>>
|
|
|
+
|
|
|
+ <<throwing-errors>>
|
|
|
+
|
|
|
+ <<method-declaration>>
|
|
|
+#+END_SRC
|
|
|
+
|
|
|
+*** TODO Method Modifiers
|
|
|
+ foo
|
|
|
+
|
|
|
+#+CAPTION: Method Modifiers
|
|
|
+#+Name: method-modifiers
|
|
|
+#+BEGIN_SRC lisp
|
|
|
+ (defrule method-modifier
|
|
|
+ (or "public" "protected" "private" "static" "abstract" "final" "synchonized" "native"))
|
|
|
+
|
|
|
+ (defrule method-modifiers
|
|
|
+ (* (and method-modifier (? spaces)))
|
|
|
+ (:lambda (method-modifier-list)
|
|
|
+ (let ((mods (map 'list #'(lambda (modifier)
|
|
|
+ (intern (string-upcase (car modifier))
|
|
|
+ "KEYWORD"))
|
|
|
+ method-modifier-list)))
|
|
|
+ (remove-duplicates mods))))
|
|
|
+#+END_SRC
|
|
|
+*** TODO Result Type
|
|
|
+ foo
|
|
|
+
|
|
|
+#+CAPTION: Result Type
|
|
|
+#+Name: result-type
|
|
|
+#+BEGIN_SRC lisp
|
|
|
+ (defrule result-type
|
|
|
+ (or interface-types "void")
|
|
|
+ (:lambda (type)
|
|
|
+ (intern (string-upcase type) "KEYWORD")))
|
|
|
+#+END_SRC
|
|
|
+*** TODO Formal Parameter List
|
|
|
+ foo
|
|
|
+
|
|
|
+#+CAPTION: Formal Parameter List
|
|
|
+#+Name: formal-parameter-list
|
|
|
+#+BEGIN_SRC lisp
|
|
|
+ (defrule formal-param
|
|
|
+ (and interface-types (? spaces) identifier)
|
|
|
+ (:destructure (type space ident)
|
|
|
+ (declare (ignore space))
|
|
|
+ (list (intern (string-upcase type)
|
|
|
+ "KEYWORD")
|
|
|
+ ident)))
|
|
|
+
|
|
|
+ (defrule formal-param-list
|
|
|
+ (* (and formal-param (? (and "," (? spaces)))))
|
|
|
+ (:lambda (params)
|
|
|
+ (map 'list #'car params)))
|
|
|
+#+END_SRC
|
|
|
+*** TODO Error Throws
|
|
|
+ foo
|
|
|
+
|
|
|
+#+CAPTION: Throwing Errors
|
|
|
+#+Name: throwing-errors
|
|
|
+#+BEGIN_SRC lisp
|
|
|
+ (defrule throws
|
|
|
+ (and "throws" (? spaces) identifier)
|
|
|
+ (:destructure (throws s1 idents)
|
|
|
+ (declare (ignore throws s1))
|
|
|
+ idents))
|
|
|
+#+END_SRC
|
|
|
+*** TODO Method Declaration
|
|
|
+ foo
|
|
|
+
|
|
|
+#+CAPTION: Method Declaration
|
|
|
+#+Name: method-declaration
|
|
|
+#+BEGIN_SRC lisp
|
|
|
+ (defrule method-declarator
|
|
|
+ (and identifier (? spaces) "(" (? spaces) (? formal-param-list) (? spaces) ")")
|
|
|
+ (:destructure (ident s1 opar s2 params s3 cpar)
|
|
|
+ (declare (ignore s1 opar s2 s3 cpar))
|
|
|
+ (list :name ident
|
|
|
+ :parameters params)))
|
|
|
+
|
|
|
+ (defrule method-header
|
|
|
+ (and (? method-modifiers) (? spaces) result-type (? spaces) method-declarator (? spaces) (? throws))
|
|
|
+ (:destructure (mods s1 res-type s2 declarator s3 throws)
|
|
|
+ (declare (ignore s1 s2 s3))
|
|
|
+ (list :modifiers mods
|
|
|
+ :result-type res-type
|
|
|
+ :declarator declarator
|
|
|
+ :throws throws)))
|
|
|
+
|
|
|
+ (defrule method-declaration
|
|
|
+ (and method-header (? spaces) "{" (? spaces) class-body (? spaces) "}")
|
|
|
+ (:destructure (header s1 obrace s2 body s3 cbrace)
|
|
|
+ (declare (ignore s1 obrace s2 s3 cbrace))
|
|
|
+ (concatenate 'list header (list :body body))))
|
|
|
+#+END_SRC
|
|
|
+
|
|
|
+** TODO Body Rules
|
|
|
+
|
|
|
+** TODO Data Structures
|
|
|
+
|
|
|
+** TODO The Interface
|
|
|
+
|
|
|
+* TODO The Tests [0%]
|
|
|
+
|
|
|
+** TODO The First Test
|
|
|
+
|
|
|
+** TODO A Test Class
|
|
|
+
|
|
|
+** TODO A Test Tree Walker
|
|
|
+
|
|
|
+* TODO The Packaging Machinery [50%]
|
|
|
+ While this document is mostly just an experiment in parser and
|
|
|
+ tokenizer construction, I do intend it to be used for things like
|
|
|
+ automated source-to-source translation and various related
|
|
|
+ functionality.
|
|
|
+
|
|
|
+** DONE The =defpackage= Forms and =packages.lisp=
|
|
|
+ To properly load and work with this application, a set of packages
|
|
|
+ need to be defined, and to simplify the definition of such, they
|
|
|
+ will be placed in a single file.
|
|
|
+
|
|
|
+#+CAPTION: defpackage forms
|
|
|
+#+Name: defpackage-forms
|
|
|
+#+BEGIN_SRC lisp
|
|
|
+ ;;;; package.lisp
|
|
|
+
|
|
|
+ (defpackage #:java-parser
|
|
|
+ (:use :esrap
|
|
|
+ :cl)
|
|
|
+ (:import-from #:parse-number
|
|
|
+ #:parse-number))
|
|
|
+
|
|
|
+ (defpackage #:java-parser.test
|
|
|
+ (:use :lift
|
|
|
+ :java-parser
|
|
|
+ :cl))
|
|
|
+#+END_SRC
|
|
|
+
|
|
|
+** TODO The Parser
|
|
|
+ The parser is a fairly serious piece of code, comprised of various
|
|
|
+ order-sensitive rules, which together form a complete
|
|
|
+ parser/tokenizer combination.
|
|
|
+
|
|
|
+#+CAPTION: Parser Packaging
|
|
|
+#+Name: parser-packaging
|
|
|
+#+BEGIN_SRC lisp
|
|
|
+ ;;;; parser.lisp
|
|
|
+
|
|
|
+ (in-package #:java-parser)
|
|
|
+
|
|
|
+
|
|
|
+ ;; "parser.lisp" goes here
|
|
|
+
|
|
|
+
|
|
|
+#+END_SRC
|
|
|
+
|
|
|
+** TODO The Tests
|
|
|
+ Proper development of a package such as this requires regular
|
|
|
+ testing of the code to ensure it does exactly as it's supposed to,
|
|
|
+ and to help prevent errors.
|
|
|
+
|
|
|
+#+CAPTION: Tests
|
|
|
+#+Name: test-packaging
|
|
|
+#+BEGIN_SRC lisp
|
|
|
+ ;;;; test.lisp
|
|
|
+
|
|
|
+ (in-package #:java-parser.test)
|
|
|
+
|
|
|
+ ;; "test.lisp"
|
|
|
+#+END_SRC
|
|
|
+
|
|
|
+** DONE The ASDF Mumbo-Jumbo
|
|
|
+ Packaging an application, or system of functions in Lisp is fairly
|
|
|
+ simple, using *ASDF*, or the "Another System Definiton Facility".
|
|
|
+ Doing this is fairly simple, using a single file.
|
|
|
+
|
|
|
+#+CAPTION: ASDF System Definition File
|
|
|
+#+Name: asdf-file
|
|
|
+#+BEGIN_SRC lisp
|
|
|
+ ;;;; java-parser.asd
|
|
|
+
|
|
|
+ (asdf:defsystem #:java-parser
|
|
|
+ :serial t
|
|
|
+ :description "A Basic Java Parser"
|
|
|
+ :author "Samuel W. Flint <swflint@flintfam.org>"
|
|
|
+ :license "GNU GPLv3"
|
|
|
+ :depends-on (#:esrap
|
|
|
+ #:parse-number
|
|
|
+ #:lift)
|
|
|
+ :components ((:file "package.lisp")
|
|
|
+ (:file "parser.lisp")
|
|
|
+ (:file "test.lisp")))
|
|
|
+#+END_SRC
|
|
|
+
|
|
|
+* WORKING Notes and Resources
|
|
|
+ - [[http://cs.au.dk/~amoeller/dRegAut/JavaBNF.html]] is the Java BNF
|
|
|
+ that I've based the parser on.
|
|
|
+ - https://scymtym.github.io/esrap is the Common Lisp parser library
|
|
|
+ that I've decided to use.
|
|
|
+ - To accomplish unit testing and the like, I'm using the [[http://common-lisp.net/project/lift][Lift]]
|
|
|
+ framework.
|