%-*- Mode: TeX -*-
%% Reader Algorithm
%% 22.1.1 5
This section describes the algorithm used by the \term{Lisp reader}
to parse \term{objects} from an \term{input} \term{character} \term{stream},
including how the \term{Lisp reader} processes \term{macro characters}.
When dealing with \term{tokens}, the reader's basic function is to distinguish
representations of \term{symbols} from those of \term{numbers}.
%%Barmar didn't like the double negatives:
% When a \term{token} is
% accumulated, it is assumed to be a \term{number} unless it does
% not satisfy the Syntax for Numbers listed in \figref\SyntaxForNumericTokens.
When a \term{token} is accumulated, it is assumed to represent a \term{number} if it
satisfies the syntax for numbers listed in \figref\SyntaxForNumericTokens.
%%Ditto:
% If it is not a \term{number}, it is then assumed to be a potential
% number unless it does not satisfy the rules governing the syntax for a
% \term{potential number}.
If it does not represent a \term{number},
it is then assumed to be a \term{potential number}
if it satisfies the rules governing the syntax for a \term{potential number}.
If a valid \term{token} is neither a representation of a \term{number}
nor a \term{potential number},
it represents a \term{symbol}.
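
The order of the three tests above can be sketched in Python as follows.
This is only an illustration of the decision order, not of the actual syntax:
the two predicates are hypothetical stand-ins (integers only, plus a crude
check for tokens that begin with an optional sign and a digit) and do not
reproduce the syntax for numbers or the rules for a \term{potential number}.

```python
import re

# Hypothetical stand-ins, NOT the real syntax rules: only decimal integers
# count as numbers here, and a potential number is crudely approximated as
# an optional sign, a digit, then any mix of digits, letters, signs, ratio
# markers, decimal points, and the extension characters ^ and _.
_INTEGER = re.compile(r"[+-]?[0-9]+\.?")
_POTENTIAL = re.compile(r"[+-]?[0-9][0-9A-Za-z+\-/.^_]*")


def is_number(token):
    """Simplified placeholder for the syntax for numbers."""
    return _INTEGER.fullmatch(token) is not None


def is_potential_number(token):
    """Simplified placeholder for the potential-number rules."""
    return _POTENTIAL.fullmatch(token) is not None


def classify(token, number_p=is_number, potential_p=is_potential_number):
    """Apply the tests in the order given in the text."""
    if number_p(token):
        return "number"
    if potential_p(token):
        return "potential number"
    return "symbol"
```

The predicates are passed as parameters so that a faithful implementation of
either syntax could be substituted without changing the order of the tests.
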
The algorithm performed by the \term{Lisp reader} is as follows:
%% 22.1.1 6
\beginlist
\item{1.}
If at end of file, end-of-file processing is performed as specified
in \funref{read}.
Otherwise,
one \term{character}, \param{x}, is read from the \term{input} \term{stream}, and
dispatched according to the \term{syntax type} of \param{x} to one
of steps 2 to 7.
%% 22.1.1 7
\item{2.}
If \param{x} is an \term{invalid} \term{character},
an error \oftype{reader-error} is signaled.
%% 22.1.1 8
\item{3.}
If \param{x} is a \term{whitespace}\meaning{2} \term{character},
then it is discarded and step 1 is re-entered.
%% 22.1.1 9
\item{4.}
If \param{x} is a \term{terminating} or \term{non-terminating} \term{macro character},
then its associated \term{reader macro function} is called with two \term{arguments},
the \term{input} \term{stream} and \param{x}.
%% 22.1.1 10
The \term{reader macro function} may read \term{characters}
from the \term{input} \term{stream};
if it does, it will see those \term{characters} following the \term{macro character}.
The \term{Lisp reader} may be invoked recursively from the \term{reader macro function}.
%% 22.1.5 16
The \term{reader macro function} must not have any side effects other than on the
\term{input} \term{stream};
because of backtracking and restarting of the \funref{read} operation,
front ends to the \term{Lisp reader} (\eg ``editors'' and ``rubout handlers'')
may cause the \term{reader macro function} to be called repeatedly during the
reading of a single \term{expression} in which \param{x} only appears once.
%% 22.1.1 11
The \term{reader macro function} may return zero values or one value.
If one value is returned,
then that value is returned as the result of the read operation;
the algorithm is done.
If zero values are returned, then step 1 is re-entered.
%% 22.1.1 12
\item{5.}
If \param{x} is a \term{single escape} \term{character},
then the next \term{character}, \param{y}, is read, or an error \oftype{end-of-file}
is signaled if at the end of file.
\param{y} is treated as if it is a \term{constituent}
whose only \term{constituent trait} is \term{alphabetic}\meaning{2}.
\param{y} is used to begin a \term{token}, and step 8 is entered.
%% 22.1.1 13
\item{6.}
If \param{x} is a \term{multiple escape} \term{character},
then a \term{token} (initially
containing no \term{characters}) is begun and step 9 is entered.
%% 22.1.1 14
\item{7.}
If \param{x} is a \term{constituent} \term{character}, then it begins a \term{token}.
After the \term{token} is read in, it will be interpreted
either as a \Lisp\ \term{object} or as being of invalid syntax.
If the \term{token} represents an \term{object},
that \term{object} is returned as the result of the read operation.
If the \term{token} is of invalid syntax, an error \oftype{reader-error} is signaled
(see step 10).
% If \param{x} is a \term{lowercase} \term{character},
% it is replaced with the corresponding \term{uppercase} \term{character}.
%% Tentatively replaced with the following to satisfy Sandra:
If \param{x} is a \term{character} with \term{case},
it might be replaced with the corresponding \term{character} of the opposite \term{case},
depending on the \term{readtable case} of the \term{current readtable},
as outlined in \secref\ReadtableCaseReadEffect.
\param{x} is used to begin a \term{token}, and step 8 is entered.
%% 22.1.1 15
%% 22.1.1 16
%% 22.1.1 17
\item{8.}
At this point a \term{token} is being accumulated, and an even number
of \term{multiple escape} \term{characters} have been encountered.
If at end of file, step 10 is entered.
Otherwise, a \term{character}, \param{y}, is read, and
one of the following actions is performed according to its \term{syntax type}:
\beginlist
\itemitem{\bull}
If \param{y} is a \term{constituent} or \term{non-terminating} \term{macro character}:
\beginlist
\itemitem{--}
% If \param{y} is a \term{lowercase} \term{character}, it is replaced with the
% corresponding \term{uppercase} \term{character}.
%% Tentatively replaced with the following to satisfy Sandra:
If \param{y} is a \term{character} with \term{case},
it might be replaced with the corresponding \term{character} of the opposite \term{case},
depending on the \term{readtable case} of the \term{current readtable},
as outlined in \secref\ReadtableCaseReadEffect.
\itemitem{--}
\param{y} is appended to the \term{token} being built.
\itemitem{--}
Step 8 is repeated.
\endlist
%% 22.1.1 18
\itemitem{\bull}
If \param{y} is a \term{single escape} \term{character}, then the next \term{character},
\param{z}, is read, or an error \oftype{end-of-file} is signaled if at end of file.
\param{z} is treated as if it is a \term{constituent}
whose only \term{constituent trait} is \term{alphabetic}\meaning{2}.
\param{z} is appended to the \term{token} being built,
and step 8 is repeated.
%% 22.1.1 19
\itemitem{\bull}
If \param{y} is a \term{multiple escape} \term{character},
then step 9 is entered.
%% 22.1.1 20
\itemitem{\bull}
If \param{y} is an \term{invalid} \term{character},
an error \oftype{reader-error} is signaled.
%% 22.1.1 21
\itemitem{\bull}
If \param{y} is a \term{terminating} \term{macro character},
then it terminates the \term{token}.
First the \term{character} \param{y} is unread (see \funref{unread-char}),
and then step 10 is entered.
%% 22.1.1 22
\itemitem{\bull}
If \param{y} is a \term{whitespace}\meaning{2} \term{character}, then it terminates
the \term{token}. First the \term{character} \param{y} is unread
if appropriate (see \funref{read-preserving-whitespace}),
and then step 10 is entered.
\endlist
%% 22.1.1 23
%% 22.1.1 24
\item{9.}
At this point a \term{token} is being accumulated, and an odd number
of \term{multiple escape} \term{characters} have been encountered.
If at end of file, an error \oftype{end-of-file} is signaled.
Otherwise, a \term{character}, \param{y}, is read, and
one of the following actions is performed according to its \term{syntax type}:
%% 22.1.1 25
\beginlist
\itemitem{\bull}
If \param{y} is a \term{constituent}, macro, or \term{whitespace}\meaning{2} \term{character},
\param{y} is treated as a \term{constituent}
whose only \term{constituent trait} is \term{alphabetic}\meaning{2}.
\param{y} is appended to the \term{token} being built, and step 9 is repeated.
%% 22.1.1 26
\itemitem{\bull}
If \param{y} is a \term{single escape} \term{character}, then the next \term{character},
\param{z}, is read, or an error \oftype{end-of-file} is signaled if at end of file.
\param{z} is treated as a \term{constituent}
whose only \term{constituent trait} is \term{alphabetic}\meaning{2}.
\param{z} is appended to the \term{token} being built,
and step 9 is repeated.
%% 22.1.1 27
\itemitem{\bull}
If \param{y} is a \term{multiple escape} \term{character},
then step 8 is entered.
%% 22.1.1 28
\itemitem{\bull}
If \param{y} is an \term{invalid} \term{character},
an error \oftype{reader-error} is signaled.
\endlist
%% 22.1.1 29
\item{10.}
An entire \term{token} has been accumulated.
The \term{object} represented by the \term{token} is returned
as the result of the read operation,
or an error \oftype{reader-error} is signaled if the \term{token} is not of valid syntax.
\endlist
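
The ten steps above can be sketched in Python as a small state machine.
This is a minimal illustration under several assumptions that are not part of
the algorithm itself: the table of \term{syntax types} is hard-coded rather than
taken from the \term{current readtable}; the \term{readtable case} is assumed
to be :upcase; end of file in step 1 always raises an exception; whitespace that
terminates a \term{token} is always unread; and step 10 returns the accumulated
text of the \term{token} instead of interpreting it as an \term{object}.
The names ``CharStream'', ``ReaderError'', and ``read'' are hypothetical.
The only \term{reader macro function} supplied is one for semicolon, which
returns zero values.

```python
# Assumed, hard-coded syntax table; a real reader consults the readtable.
WHITESPACE = set(" \t\n\r\f")
TERMINATING_MACRO = set("\"'(),;`")
NON_TERMINATING_MACRO = set("#")
INVALID = set("\b\x7f")  # assumption: Backspace and Rubout are invalid


class ReaderError(Exception):
    """Stand-in for the reader-error condition type."""


class CharStream:
    """Hypothetical input stream that supports unreading one character."""
    def __init__(self, text):
        self.text, self.pos = text, 0

    def read_char(self):
        if self.pos >= len(self.text):
            return None  # end of file
        ch = self.text[self.pos]
        self.pos += 1
        return ch

    def unread_char(self):
        self.pos -= 1


def syntax_type(ch):
    if ch in WHITESPACE:
        return "whitespace"
    if ch in TERMINATING_MACRO:
        return "terminating-macro"
    if ch in NON_TERMINATING_MACRO:
        return "non-terminating-macro"
    if ch == "\\":
        return "single-escape"
    if ch == "|":
        return "multiple-escape"
    return "invalid" if ch in INVALID else "constituent"


def skip_line_comment(stream, x):
    """Reader macro function that returns zero values (an empty tuple)."""
    ch = stream.read_char()
    while ch is not None and ch != "\n":
        ch = stream.read_char()
    return ()


def require_char(stream):
    """Read the character after a single escape; end of file is an error."""
    ch = stream.read_char()
    if ch is None:
        raise EOFError("end of file after single escape")
    return ch


def read(stream, macros=None):
    macros = {";": skip_line_comment} if macros is None else macros
    while True:                                    # step 1
        x = stream.read_char()
        if x is None:
            raise EOFError("end of file")
        kind = syntax_type(x)
        if kind == "invalid":                      # step 2
            raise ReaderError("invalid character")
        if kind == "whitespace":                   # step 3
            continue
        if kind.endswith("macro"):                 # step 4
            if x not in macros:
                raise ReaderError("no reader macro function in this sketch")
            values = macros[x](stream, x)
            if values:
                return values[0]                   # one value: done
            continue                               # zero values: step 1
        token, odd = [], False    # odd: odd number of multiple escapes seen
        if kind == "single-escape":                # step 5
            token.append(require_char(stream))
        elif kind == "multiple-escape":            # step 6
            odd = True
        else:                                      # step 7
            token.append(x.upper())
        return accumulate(stream, token, odd)


def accumulate(stream, token, odd):
    while True:                                    # steps 8 (even) and 9 (odd)
        y = stream.read_char()
        if y is None:
            if odd:
                raise EOFError("end of file inside multiple escape")
            break
        kind = syntax_type(y)
        if kind == "invalid":
            raise ReaderError("invalid character")
        if kind == "single-escape":
            token.append(require_char(stream))
        elif kind == "multiple-escape":
            odd = not odd                          # switch between 8 and 9
        elif odd:
            token.append(y)                        # treated as alphabetic
        elif kind in ("constituent", "non-terminating-macro"):
            token.append(y.upper())
        else:                                      # terminating macro, whitespace
            stream.unread_char()
            break
    return "".join(token)                          # step 10, uninterpreted
```

Steps 8 and 9 share one loop here because they differ only in how ordinary
characters and end of file are handled; the flag records which of the two
steps is current.
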
%% 22.1.1 30
%% 22.1.1 31
%%Barmar observes that this is said elsewhere, and in any case is
%%implied by the algorithm above:
% \term{Single escape} and \term{multiple escape} \term{characters}
% can be included in a \term{token} when
% preceded by another \term{single escape} \term{character}.