count-words-example
First, we will implement the word count command with a ‘while’ loop,
then with recursion. The command will, of course, be interactive.
The template for an interactive function definition is, as always:
(defun name-of-function (argument-list)
  "documentation…"
  (interactive-expression…)
  body…)
What we need to do is fill in the slots.
The name of the function should be self-explanatory and easy to remember.
‘count-words-region’ is the obvious choice. Since that name is used
for the standard Emacs command to count words, we will name our
implementation ‘count-words-example’.
The function counts words within a region. This means that the argument
list must contain symbols that are bound to the two positions, the beginning
and end of the region. These two positions can be called ‘beginning’
and ‘end’ respectively. The first line of the documentation should be
a single sentence, since that is all that is printed as documentation by a
command such as ‘apropos’. The interactive expression will be of the
form ‘(interactive "r")’, since that will cause Emacs to pass the
beginning and end of the region to the function’s argument list. All this
is routine.
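As a small illustration of the routine part, here is a sketch of a command that does nothing but report the two positions it receives. The name ‘my-region-bounds-example’ is hypothetical and is not part of the command being built; only ‘(interactive "r")’ and the argument names come from the text above.

```elisp
;; Hypothetical helper, not part of the tutorial's own code: it only
;; reports the two positions that (interactive "r") supplies.
(defun my-region-bounds-example (beginning end)
  "Report the positions of the beginning and end of the region."
  (interactive "r")
  ;; `message' echoes the text and also returns it as a string.
  (message "Region runs from %d to %d" beginning end))
```

Called from Lisp as (my-region-bounds-example 1 5), it returns the string "Region runs from 1 to 5"; called interactively, Emacs fills in the two arguments from the region, smaller position first.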
The body of the function needs to be written to do three tasks: first, to
set up conditions under which the ‘while’ loop can count words; second,
to run the ‘while’ loop; and third, to send a message to the user.
When a user calls ‘count-words-example’, point may be at the beginning
or the end of the region. However, the counting process must start at the
beginning of the region. This means we will want to put point there if it
is not already there. Executing ‘(goto-char beginning)’ ensures this.
Of course, we will want to return point to its expected position when the
function finishes its work. For this reason, the body must be enclosed in a
‘save-excursion’ expression.
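The effect of the excursion can be checked in a temporary buffer. The following is only a sketch: the function name ‘my-save-excursion-demo’ and the sample text are made up for the illustration.

```elisp
;; Hypothetical demonstration, not part of the command being written.
(defun my-save-excursion-demo ()
  "Return point before and after a `save-excursion' that moves it."
  (with-temp-buffer
    (insert "one two three")          ; point ends up after the text
    (let ((before (point)))
      (save-excursion
        (goto-char (point-min)))      ; move point inside the excursion
      ;; Point has been restored, so both values are the same position.
      (list before (point)))))
```

Evaluating (my-save-excursion-demo) returns (14 14): the jump to the beginning of the buffer is undone as soon as the ‘save-excursion’ expression finishes.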
The central part of the body of the function consists of a ‘while’ loop
in which one expression jumps point forward word by word, and another
expression counts those jumps. The true-or-false-test of the ‘while’
loop should test true so long as point should jump forward, and false when
point is at the end of the region.
We could use ‘(forward-word 1)’ as the expression for moving point
forward word by word, but it is easier to see what Emacs identifies as a
“word” if we use a regular expression search.
A regular expression search that finds the pattern for which it is searching leaves point after the last character matched. This means that a succession of successful word searches will move point forward word by word.
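To see where a successful search leaves point, one can try a literal pattern in a temporary buffer. This is a sketch only; ‘my-point-after-search’ is a hypothetical helper and the sample text is invented.

```elisp
;; Hypothetical helper for experimenting with where a search leaves point.
(defun my-point-after-search (regexp text)
  "Return point after searching forward for REGEXP in TEXT.
Signals an error if REGEXP is not found."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (re-search-forward regexp)
    (point)))
```

For example, (my-point-after-search "two" "one two three") returns 8, the position just after the ‘o’ of “two”, which occupies positions 5 through 7.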
As a practical matter, we want the regular expression search to jump over whitespace and punctuation between words as well as over the words themselves. A regexp that refuses to jump over interword whitespace would never jump more than one word! This means that the regexp should include the whitespace and punctuation that follows a word, if any, as well as the word itself. (A word may end a buffer and not have any following whitespace or punctuation, so that part of the regexp must be optional.)
Thus, what we want for the regexp is a pattern defining one or more word constituent characters followed, optionally, by one or more characters that are not word constituents. The regular expression for this is:
\w+\W*
The buffer’s syntax table determines which characters are and are not word constituents. For more information about syntax, see Syntax Tables in The GNU Emacs Lisp Reference Manual.
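One way to check what this pattern covers is to try it on a few strings. The sketch below uses ‘string-match’ rather than a buffer search, purely for convenience, and runs inside a temporary buffer so that the standard syntax table applies. The function name ‘my-first-word-match’ is hypothetical.

```elisp
;; Hypothetical helper: shows which text the word regexp matches.
(defun my-first-word-match (string)
  "Return the part of STRING matched by the word regexp, or nil."
  (with-temp-buffer                     ; fresh buffer, standard syntax table
    (when (string-match "\\w+\\W*" string)
      (match-string 0 string))))
```

With the standard syntax table, (my-first-word-match "two, three") returns "two, " (the word plus the comma and space that follow it), while (my-first-word-match "end") returns "end", since the trailing part of the pattern is optional.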
The search expression looks like this:
(re-search-forward "\\w+\\W*")
(Note that paired backslashes precede the ‘w’ and ‘W’. A single backslash has special meaning to the Emacs Lisp interpreter: it indicates that the following character is interpreted differently than usual. For example, the two characters, ‘\n’, stand for ‘newline’, rather than for a backslash followed by ‘n’. Two backslashes in a row stand for an ordinary, unspecial backslash, so the Emacs Lisp interpreter ends up seeing a single backslash followed by a letter, and thus discovers that the letter is special.)
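The effect of the doubled backslashes can be seen by measuring the strings after Lisp has read them. This is a small sketch; the function name ‘my-backslash-demo’ is made up for the illustration.

```elisp
;; Hypothetical demonstration of how the Lisp reader treats backslashes.
(defun my-backslash-demo ()
  "Return the lengths of three strings as Lisp reads them."
  (list (length "\n")           ; one character: a newline
        (length "\\n")          ; two characters: a backslash and `n'
        (length "\\w+\\W*")))   ; six characters: \ w + \ W *
```

Evaluating (my-backslash-demo) returns (1 2 6), so the regexp that the search actually receives is the six-character pattern \w+\W*.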
We need a counter to count how many words there are; this variable must
first be set to 0 and then incremented each time Emacs goes around the
‘while’ loop. The incrementing expression is simply:
(setq count (1+ count))
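The ‘setq’ matters here: ‘(1+ count)’ on its own returns the next number but does not change ‘count’. A minimal sketch of the counter at work, with a hypothetical function name and an invented number of trips around the loop:

```elisp
;; Hypothetical demonstration of the counter idiom by itself.
(defun my-counter-demo ()
  "Return a counter after three trips around a `while' loop."
  (let ((count 0)
        (trips 3))                 ; made-up number of iterations
    (while (> trips 0)
      (setq count (1+ count))      ; store the incremented value back
      (setq trips (1- trips)))
    count))
```

Evaluating (my-counter-demo) returns 3.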
Finally, we want to tell the user how many words there are in the region.
The ‘message’ function is intended for presenting this kind of
information to the user. The message has to be phrased so that it reads
properly regardless of how many words there are in the region: we don’t want
to say that “there are 1 words in the region”. The conflict between
singular and plural is ungrammatical. We can solve this problem by using a
conditional expression that evaluates different messages depending on the
number of words in the region. There are three possibilities: no words in
the region, one word in the region, and more than one word. This means that
the ‘cond’ special form is appropriate.
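The three-way choice can be tried out in isolation before it is placed inside the command. In this sketch the helper ‘my-word-count-message’ is hypothetical; it returns the text rather than displaying it, so that the result is easy to inspect.

```elisp
;; Hypothetical helper: picks the wording for a given number of words.
(defun my-word-count-message (count)
  "Return a grammatical report for COUNT words."
  (cond ((zerop count)
         "The region does NOT have any words.")
        ((= 1 count)
         "The region has 1 word.")
        (t
         (format "The region has %d words." count))))
```

For example, (my-word-count-message 1) returns "The region has 1 word." and (my-word-count-message 3) returns "The region has 3 words."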
All this leads to the following function definition:
;;; First version; has bugs!
(defun count-words-example (beginning end)
  "Print number of words in the region.
Words are defined as at least one word-constituent
character followed by at least one character that
is not a word-constituent. The buffer's syntax
table determines which characters these are."
  (interactive "r")
  (message "Counting words in region ... ")

;;; 1. Set up appropriate conditions.
  (save-excursion
    (goto-char beginning)
    (let ((count 0))

;;; 2. Run the while loop.
      (while (< (point) end)
        (re-search-forward "\\w+\\W*")
        (setq count (1+ count)))

;;; 3. Send a message to the user.
      (cond ((zerop count)
             (message
              "The region does NOT have any words."))
            ((= 1 count)
             (message
              "The region has 1 word."))
            (t
             (message
              "The region has %d words." count))))))
As written, the function works, but not in all circumstances.