Browse Source

Initial commit, brought stuff over

Samuel W. Flint 8 years ago
commit
f805712c8a
2 changed files with 520 additions and 0 deletions
  1. 496 0
      java-parser.org
  2. 24 0
      java-parser.org_archive

+ 496 - 0
java-parser.org

@@ -0,0 +1,496 @@
+#+Title: Test Java Parser in Common Lisp with Esrap
+#+AUTHOR: Sam Flint
+#+EMAIL: swflint@flintfam.org
+#+DATE: \today
+#+INFOJS_OPT: view:info toc:nil path:http://flintfam.org/org-info.js
+#+OPTIONS: toc:nil H:5 ':t *:t
+#+PROPERTY: noweb no-export
+#+PROPERTY: comments noweb
+#+LATEX_HEADER: \usepackage[color]{showkeys}
+#+LATEX_HEADER: \parskip=5pt
+#+LATEX_HEADER: \lstset{texcl=true,breaklines=true,columns=fullflexible,frame=lines,literate={lambda}{$\lambda$}{1} {set}{$\gets$}1 {setq}{$\gets$}1 {setf}{$\gets$}1 {<=}{$\leq$}1 {>=}{$\geq$}1}
+
+#+BEGIN_abstract
+This document presents a personal exercise in both Common Lisp and the
+use of Context-free grammars.  It's one of those just-for-fun
+projects, but will eventually be able to be used for both code metrics
+and even potentially code generation and manipulation.
+
+The parser is implemented using the Packrat Parsing technique, which
+provides LL$(k)$ parsing in linear time.
+#+END_abstract
+
+#+TOC: headlines 3
+#+TOC: listings
+
+* WORKING The Parser [16%]
+  To build this parser, I'm using a somewhat newer parsing technique
+  to do this called "Packrat Parsing".  Packrat Parsing is an
+  unlimited look-ahead parsing technique based on Parser Expression
+  Grammars optimized for use in functional languages.  To achieve
+  this, I'm using the excellent =ESRAP= parsing library for Common
+  Lisp.
+
+** DONE Common Rules
+   To effectively represent the programming language, there are just a
+   few housekeeping rules to allow us to parse.  Many of them are
+   fairly simple, defining things like names and strings.
+
+   #+CAPTION: Common Rules
+   #+Name: common-rules
+   #+BEGIN_SRC lisp
+     <<string-related-rules>>
+
+     <<number-related-rules>>
+
+     <<identifiers>>
+
+     <<misc-rules>>
+   #+END_SRC
+
+*** DONE String Related
+    To accurately parse strings, we need to provide two rules, a rule
+    that provides string characters, and a second rule that defines a
+    string.  Aside from these, one needs a function to ensure that
+    certain characters are not part of strings.
+
+#+CAPTION: String Related Generic Rules
+#+Name: string-related-rules
+#+BEGIN_SRC lisp
+  (defun not-doublequote (character)
+    (not (eql #\" character)))
+
+  (defrule string-chars
+      (or (not-doublequote character)
+         (and #\\ #\")))
+
+  (defrule string
+      (and #\" (* string-chars) #\")
+    (:destructure (q1 string q2)
+                  (declare (ignore q1 q2))
+                  (list :string (text string))))
+#+END_SRC
+*** DONE Number Related
+    Because of the nature of computers, there is the frequent need for
+    numeric literals.  Numeric literals can, however be difficult to
+    parse, especially if you focus on type verification.  To obviate
+    the need for type verification, it generalizes to allow for all
+    base-10 numeric literals, parsing a sign, whole part, fractional
+    part and signed exponent part.
+
+#+CAPTION: Number Related Parsing Rules
+#+Name: number-related-rules
+#+BEGIN_SRC lisp
+  (defrule number
+      (and (? (or "-" "+"))
+         (+ (or "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"))
+         (? (and "."
+               (+ (or "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"))))
+         (? (and (or "e" "E")
+               (? "-")
+               (+ (or "0" "1" "2" "3" "4" "5" "6" "7" "8" "9")))))
+    (:lambda (list)
+      (list :number (text list))))
+#+END_SRC
+*** DONE Identifiers
+    All modern programming languages use identifiers for the names of
+    methods, classes, functions and variables.  The Java definition is
+    a string of alphanumeric characters starting with any
+    alphanumeric, the =_= or the =$= characters with the =_= or =.=
+    interspersed in as appropriate.  Rather than simply parse as a
+    string, the rule will split the identifier at the =.= and return a
+    list of all the parts of the identifier.
+
+#+CAPTION: Identifiers
+#+Name: identifiers
+#+BEGIN_SRC lisp
+  (defrule identifier
+      (and (or (alphanumericp character) "_" "$")
+         (+ (or (alphanumericp character) "_" ".")))
+    (:text t)
+    (:lambda (string)
+      (concatenate 'list
+                   (list :identifier)
+                   (split-sequence #\. string))))
+#+END_SRC
+*** DONE Type Signatures
+
+*** DONE Type Formats
+
+*** DONE Miscellaneous Rules
+    There are a few miscellaneous rules, consisting of things like
+    spaces and non-integers.  They are both fairly simple, and provide
+    a way to work easily.
+
+#+CAPTION: Miscellaneous Rules
+#+Name: misc-rules
+#+BEGIN_SRC lisp
+  (defun not-integer (string)
+    (when (find-if-not #'digit-char-p string)
+      t))
+
+  (defrule spaces
+      (+ (or #\Space #\Tab #\Newline))
+    (:constant nil))
+
+  (defrule package-declaration)
+#+END_SRC
+*** TODO Comments
+    foo
+
+#+CAPTION: Comments
+#+Name: comments
+#+BEGIN_SRC lisp
+  (defrule comment
+      (or (and "//" (* (or (alphanumericp character) "_" "." "+" "\"" "'" "," "." "<" ">")))
+         (and "/*" (* (or (alphanumpericp character) "_" "." "+" "\"" "'" "," "<" ">"))))
+    (:constant nil))
+#+END_SRC
+** WORKING Class Rules [80%]
+   Given the fact that Java is an inherently object-oriented language,
+   and in fact forces it on users, it requires class definitions for
+   everything.
+
+   #+CAPTION: Class Rules
+   #+Name: class-rules
+   #+BEGIN_SRC lisp
+     <<class-modifiers>>
+
+     <<super-class>>
+
+     <<interface-types>>
+
+     <<class-body>>
+
+     <<class-declaration>>
+   #+END_SRC
+
+*** DONE Class Modifiers
+    Java has the ability to set classes as having certain attributes,
+    these attributes are set using class modifiers.  These are
+    =public=, =abstract= and =final=.  However, these may occur with
+    just one or two though it may occur with all three, and to my
+    understanding, they are order independent.  Rather than parsing as
+    a single string, we instead return a de-duplicated list of the
+    modifiers as a keyword list.
+
+#+CAPTION: Class Modifiers
+#+Name: class-modifiers
+#+BEGIN_SRC lisp
+  (defrule class-modifier
+      (or "public" "abstract" "final"))
+
+  (defrule class-modifiers
+      (* (and class-modifier (? spaces)))
+    (:lambda (modifier-list)
+            (let ((mods (map 'list #'(lambda (mod)
+                                       (intern (string-upcase (car mod)) "KEYWORD"))
+                             modifier-list)))
+              (remove-duplicates mods))))
+#+END_SRC
+*** DONE Super Class
+    Because Java is an object oriented language, the use of classes is
+    implicit, and with the use of classes comes inheritance; to take
+    advantage of inheritance, you need to be able to signify the class
+    to inherit from.  The purpose of this rule is to signify the super
+    class.
+
+#+CAPTION: Super Class
+#+Name: super-class
+#+BEGIN_SRC lisp
+  (defrule super-class
+      (and "extends" (? spaces) identifier)
+    (:lambda (list)
+      (cddr list)))
+#+END_SRC
+*** DONE Interface Types
+    Java supports the concept of interfaces and allows the programmer
+    to declare them in the class declaration.  The possible
+    declarations are the different types, things like =byte=, =long=
+    and =boolean=.
+
+#+CAPTION: Interface Types
+#+Name: interface-types
+#+BEGIN_SRC lisp
+  (defrule interface-types
+      (or "byte" "short" "int" "long" "char" "float" "double" "boolean"))
+
+  (defrule interfaces
+      (and "implements" spaces (* (and interface-types (? (and (? spaces) "," (? spaces))))))
+    (:destructure (interstring space interfaces)
+                  (declare (ignore interstring space))
+                  (let ((interlist (map 'list
+                                        #'(lambda (interface-spec)
+                                            (intern (string-upcase (car interface-spec))
+                                                    "KEYWORD"))
+                                        interfaces)))
+                    (remove-duplicates interlist))))
+#+END_SRC
+*** WORKING Class Body
+    The class body contains a set of method definitions, variable
+    declarations and the like.
+
+#+CAPTION: Class Body
+#+Name: class-body
+#+BEGIN_SRC lisp
+  (defrule class-body
+      (* (or (alphanumericp character) "_" "." ";" spaces)))
+#+END_SRC
+*** DONE Class Declaration
+    To define a class in Java, you write a class declaration, which
+    has optional [[Class Modifiers][Class Modifiers]], the word =class=, an [[Identifiers][Identifier]], a
+    [[Super Class][Super Class]] declaration, an [[Interface Types][Interface Type]] list, an opening brace,
+    a [[Class Body][Class Body]] and a closing brace.
+
+#+CAPTION: Class Declaration
+#+Name: class-declaration
+#+BEGIN_SRC lisp
+  (defrule class
+      (and (? class-modifiers)
+         (? spaces)
+         "class"
+         (? spaces)
+         identifier
+         (? spaces)
+         (? super-class)
+         (? spaces)
+         (? interfaces)
+         (? spaces)
+         "{"
+         (? spaces)
+         class-body
+         (? spaces)
+         "}")
+    (:destructure (modifiers
+                   s1
+                   class
+                   s2
+                   class-name
+                   s3
+                   superclass
+                   s4
+                   interfaces
+                   s5
+                   o-brace
+                   s6
+                   body
+                   s7
+                   c-brace)
+                  (declare (ignore s1 class s2 s3 s4 s5 o-brace s6 s7 c-brace))
+                  (list :modifiers modifiers
+                        :class class-name
+                        :superclass superclass
+                        :interfaces interfaces
+                        :body body)))
+#+END_SRC
+** WORKING Method Rules [0%]
+   foo
+
+#+CAPTION: Method Rules
+#+Name: method-rules
+#+BEGIN_SRC lisp
+  <<method-modifiers>>
+
+  <<result-type>>
+
+  <<formal-parameter-list>>
+
+  <<throwing-errors>>
+
+  <<method-declaration>>
+#+END_SRC
+
+*** TODO Method Modifiers
+    foo
+
+#+CAPTION: Method Modifiers
+#+Name: method-modifiers
+#+BEGIN_SRC lisp
+  (defrule method-modifier
+      (or "public" "protected" "private" "static" "abstract" "final" "synchonized" "native"))
+
+  (defrule method-modifiers
+      (* (and method-modifier (? spaces)))
+    (:lambda (method-modifier-list)
+      (let ((mods (map 'list #'(lambda (modifier)
+                                  (intern (string-upcase (car modifier))
+                                          "KEYWORD"))
+                       method-modifier-list)))
+        (remove-duplicates mods))))
+#+END_SRC
+*** TODO Result Type
+    foo
+
+#+CAPTION: Result Type
+#+Name: result-type
+#+BEGIN_SRC lisp
+  (defrule result-type
+      (or interface-types "void")
+    (:lambda (type)
+      (intern (string-upcase type) "KEYWORD")))
+#+END_SRC
+*** TODO Formal Parameter List
+    foo
+
+#+CAPTION: Formal Parameter List
+#+Name: formal-parameter-list
+#+BEGIN_SRC lisp
+  (defrule formal-param
+      (and interface-types (? spaces) identifier)
+    (:destructure (type space ident)
+                  (declare (ignore space))
+                  (list (intern (string-upcase type)
+                                "KEYWORD")
+                        ident)))
+
+  (defrule formal-param-list
+      (* (and formal-param (? (and "," (? spaces)))))
+    (:lambda (params)
+      (map 'list #'car params)))
+#+END_SRC
+*** TODO Error Throws
+    foo
+
+#+CAPTION: Throwing Errors
+#+Name: throwing-errors
+#+BEGIN_SRC lisp
+  (defrule throws
+      (and "throws" (? spaces) identifier)
+    (:destructure (throws s1 idents)
+                  (declare (ignore throws s1))
+                  idents))
+#+END_SRC
+*** TODO Method Declaration
+    foo
+
+#+CAPTION: Method Declaration
+#+Name: method-declaration
+#+BEGIN_SRC lisp
+  (defrule method-declarator
+      (and identifier (? spaces) "(" (? spaces) (? formal-param-list) (? spaces) ")")
+    (:destructure (ident s1 opar s2 params s3 cpar)
+                  (declare (ignore s1 opar s2 s3 cpar))
+                  (list :name ident
+                        :parameters params)))
+
+  (defrule method-header
+      (and (? method-modifiers) (? spaces) result-type (? spaces) method-declarator (? spaces) (? throws))
+    (:destructure (mods s1 res-type s2 declarator s3 throws)
+                  (declare (ignore s1 s2 s3))
+                  (list :modifiers mods
+                        :result-type res-type
+                        :declarator declarator
+                        :throws throws)))
+
+  (defrule method-declaration
+      (and method-header (? spaces) "{" (? spaces) class-body (? spaces) "}")
+    (:destructure (header s1 obrace s2 body s3 cbrace)
+                  (declare (ignore s1 obrace s2 s3 cbrace))
+                  (concatenate 'list header (list :body body))))
+#+END_SRC
+
+** TODO Body Rules
+
+** TODO Data Structures
+
+** TODO The Interface
+
+* TODO The Tests [0%]
+
+** TODO The First Test
+
+** TODO A Test Class
+
+** TODO A Test Tree Walker
+
+* TODO The Packaging Machinery [50%]
+  While this document is mostly just an experiment in parser and
+  tokenizer construction, I do intend it to be used for things like
+  automated source-to-source translation and various related
+  functionality.
+
+** DONE The =defpackage= Forms and =packages.lisp=
+   To properly load and work with this application, a set of packages
+   need to be defined, and to simplify the definition of such, they
+   will be placed in a single file.
+
+#+CAPTION: defpackage forms
+#+Name: defpackage-forms
+#+BEGIN_SRC lisp
+  ;;;; package.lisp
+
+  (defpackage #:java-parser
+    (:use :esrap
+          :cl)
+    (:import-from #:parse-number
+                  #:parse-number))
+
+  (defpackage #:java-parser.test
+    (:use :lift
+          :java-parser
+          :cl))
+#+END_SRC
+
+** TODO The Parser
+   The parser is a fairly serious piece of code, comprised of various
+   order-sensitive rules, which together form a complete
+   parser/tokenizer combination.
+
+#+CAPTION: Parser Packaging
+#+Name: parser-packaging
+#+BEGIN_SRC lisp
+  ;;;; parser.lisp
+
+  (in-package #:java-parser)
+
+
+  ;; "parser.lisp" goes here
+
+
+#+END_SRC
+
+** TODO The Tests
+   Proper development of a package such as this requires regular
+   testing of the code to ensure it does exactly as it's supposed to,
+   and to help prevent errors.
+
+#+CAPTION: Tests
+#+Name: test-packaging
+#+BEGIN_SRC lisp
+  ;;;; test.lisp
+
+  (in-package #:java-parser.test)
+
+  ;; "test.lisp"
+#+END_SRC
+
+** DONE The ASDF Mumbo-Jumbo
+   Packaging an application, or system of functions in Lisp is fairly
+   simple, using *ASDF*, or the "Another System Definiton Facility".
+   Doing this is fairly simple, using a single file.
+
+#+CAPTION: ASDF System Definition File
+#+Name: asdf-file
+#+BEGIN_SRC lisp
+  ;;;; java-parser.asd
+
+  (asdf:defsystem #:java-parser
+    :serial t
+    :description "A Basic Java Parser"
+    :author "Samuel W. Flint <swflint@flintfam.org>"
+    :license "GNU GPLv3"
+    :depends-on (#:esrap
+                 #:parse-number
+                 #:lift)
+    :components ((:file "package.lisp")
+                 (:file "parser.lisp")
+                 (:file "test.lisp")))
+#+END_SRC
+
+* WORKING Notes and Resources
+   - [[http://cs.au.dk/~amoeller/dRegAut/JavaBNF.html]] is the Java BNF
+     that I've based the parser on.
+   - https://scymtym.github.io/esrap is the Common Lisp parser library
+     that I've decided to use.
+   - To accomplish unit testing and the like, I'm using the [[http://common-lisp.net/project/lift][Lift]]
+     framework.

+ 24 - 0
java-parser.org_archive

@@ -0,0 +1,24 @@
+#    -*- mode: org -*-
+
+
+Archived entries from file /home/swflint/org/test-java.org
+
+
+* TODO Comments
+  :PROPERTIES:
+  :ARCHIVE_TIME: 2014-09-06 Sat 22:16
+  :ARCHIVE_FILE: ~/org/test-java.org
+  :ARCHIVE_OLPATH: The Parser/Common Rules
+  :ARCHIVE_CATEGORY: test-java
+  :ARCHIVE_TODO: TODO
+  :END:
+    foo
+
+#+CAPTION: Comments
+#+Name: comments
+#+BEGIN_SRC lisp
+  (defrule comment
+      (or (and "//" (* (not-doublequote character)) #\Newline)
+         (and "/*" (* (not-doublequote character)) "*/"))
+    (:constant nil))
+#+END_SRC