The Inferno Shell
- Roger Peppé
rog@vitanuova.com
ABSTRACT
The Inferno shell
sh
is a reasonably small shell that brings together aspects of
several other shells along with Inferno's dynamically loaded
modules, which it uses for much of the functionality
traditionally built in to the shell. This paper focuses principally
on the features that make it unusual, and presents
an example ``network chat'' application written entirely
in
sh
script.
Introduction
Shells come in many shapes and sizes. The Inferno
shell
sh
(actually one of three shells supplied with Inferno)
is an attempt to combine the strengths of a Unix-like
shell, notably Tom Duff's
rc,
with some of the features peculiar to Inferno.
It owes its largest debt to
rc,
which provides almost all of the syntax
and most of the semantics too; when in doubt,
I copied
rc's
behaviour.
In fact, I borrowed as many good ideas as I could
from elsewhere, inventing new concepts and syntax
only when unbearably tempted. See Credits
for a list of those I could remember.
This paper does not attempt to give more than
a brief overview of the aspects of
sh
which it holds in common with Plan 9's
rc.
The reader is referred
to
sh(1)
(the definitive reference)
and Tom Duff's paper ``Rc - The Plan 9 Shell''.
I have occasionally pinched examples from the latter,
so the differences are easily contrasted.
Overview
Sh
is, at its simplest level, a command interpreter that will
be familiar to all those who have used the Bourne-shell,
C shell, or any of the numerous variants thereof (e.g.
bash,
ksh,
tcsh).
All of the following commands behave as expected:
date
cat /lib/keyboard
ls -l > file.names
ls -l /dis >> file.names
wc <file
echo [a-f]*.b
ls | wc
ls; date
limbo *.b &
An
rc
concept that will be less familiar to users
of more conventional shells is the rôle of
lists
in the shell.
Each simple
sh
command, and the value of any
sh
environment variable, consists of a list of words.
Sh
lists are flat, a simple ordered list of words,
where a word is a sequence of characters that
may include white-space or characters special
to the shell. The Bourne-shell and its kin
have no such concept, which means that every
time the value of any environment variable is
used, it is split into blank separated words.
For instance, the command:
x='-l /lib/keyboard'
ls $x
would in many shells pass the two arguments
``-l''
and
``/lib/keyboard''
to the
ls
command.
In
sh,
it will pass the single argument
``-l /lib/keyboard''.
The following aspects of
sh's
syntax will be familiar to users of
rc.
File descriptor manipulation:
echo hello, world > /dev/null >[1=2]
Environment variable values:
echo $var
Count number of elements in a variable:
echo $#var
Run a command and substitute its output:
rm `{grep -li microsoft *}
Lists:
echo (((a b) c) d)
List concatenation:
cat /appl/cmd/sh/^(std regex expr)^.b
To the above,
sh
adds a variant of the
`{}
operator:
"{},
which is identical except that it does not
split the input into tokens,
for example:
for i in "{echo one two three} {
echo loop
}
will only print
loop
once.
Sh
also adds a new redirection operator
<>,
which opens the standard input (by default) for
reading
and
writing.
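For example, a block can both read from and update the same file through file descriptor 0 (a sketch; datafile is a hypothetical file, and read is the external command described later in this paper):

```
<> datafile {
	contents := "{read 8192 0}
	echo -n $contents more >[1=0]
}
```

The read advances the file offset, so the subsequent write lands after the data just read.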
Command blocks
Possibly
sh's
most significant departure from the
norm is its use of command blocks as values.
In a conventional shell, a command block
groups commands together into a single
syntactic unit that can then be used wherever
a simple command might appear.
For example:
{
echo hello
echo goodbye
} > /dev/null
Sh
allows this, but it also allows a command block to appear
wherever a normal word would appear. In this
case, the command block is not executed immediately,
but is bundled up as if it was a single quoted word.
For example:
cmd = {
echo hello
echo goodbye
}
will store the contents of the braced block inside
the environment variable
$cmd.
Printing the value of
$cmd
gets the block back again, for example:
echo $cmd
gives
{echo hello;echo goodbye}
Note that when the shell parsed the block,
it ignored everything that was not
syntactically relevant to the execution
of the block; for instance, the white space
has been reduced to the minimum necessary,
and the newline has been changed to
the functionally identical semi-colon.
It is also worth pointing out that
echo
is an external module, implementing only the
standard
command(2)
interface; it has no knowledge of shell command
blocks. When the shell invokes an external command,
and one of the arguments is a command block,
it simply passes the equivalent string. Internally, built in commands
are slightly different for efficiency's sake, as we will see,
but for almost all purposes you can treat command blocks
as if they were strings holding functionally equivalent shell commands.
This equivalence also applies to the execution of commands.
When the
shell comes to execute a simple command (a sequence of
words), it examines the first word to decide what to execute.
In most shells, this word can be either the file name of
an external command, or the name of a command built in
to the shell (e.g.
exit).
Sh
follows these conventional rules, but first, it examines
the first character of the first word, and if it is an open
brace
({)
character, it treats it as a command block,
parses it, and executes it according to the normal syntax
rules of the shell. For the duration of this execution, it
sets the environment variable
$*
to the list of arguments passed to the block. For example:
{echo $*} hello world
is exactly the same as
echo hello world
Execution of command blocks is the same whether
the command block is just a string or has already been
parsed by the shell.
For example:
{echo hello}
is exactly the same as
'{echo hello}'
The only difference is that the former case has its syntax
checked for correctness as soon as the shell sees the script;
whereas if the latter contained a malformed command block,
a syntax error is raised only when the shell
comes to execute the command.
The shell's treatment of braces can be used to provide functionality
similar to the
eval
command that is built in to most other shells.
cmd = 'echo hello; echo goodbye'
'{'^$cmd^'}'
In other words, simply by surrounding a string
by braces and executing it, the string
will be executed as if it had been typed to the
shell. Note the use of the caret
(^)
string concatenation operator.
Sh
does provide `free carets' in the same way as
rc,
so in the previous example
'{'$cmd'}'
would work exactly the same, but generally,
and in particular when writing scripts, it is
good style to make the carets explicit.
Assignment and scope
The assignment operator in
sh,
in common with most other shells
is
=.
x=a b c d
assigns the four element list
(a b c d)
to the environment variable named
x.
The value can later be extracted
with the
$
operator, for example:
echo $x
will print
a b c d
Sh
also implements a form of local variable.
An execution of a braced block command
creates a new scope for the duration of that block;
the value of a variable assigned with
:=
in that block will be lost when the
block exits. For example:
x = hello
{x := goodbye }
echo $x
will print ``hello''.
Note that the scoping rules are
dynamic
- variable references are interpreted
relative to their containing scope at execution time.
For example:
x := hello
cmd := {echo $x}
{
x := goodbye
$cmd
}
will print ``goodbye'', not ``hello''. For one
way of avoiding this problem, see ``Lexical
binding'' below.
One late, but useful, addition to the shell's assignment
syntax is tuple assignment. This partially
makes up for the lack of list indexing primitives in the shell.
If the left hand side of the assignment operator is
a list of variable names, each element of the list on the
right hand side is assigned in turn to its respective variable.
The last variable mentioned gets assigned all the
remaining elements.
For example, after:
(a b c) := (one two three four five)
a
is
one,
b
is
two,
and
c
contains the three element list
(three four five).
For example:
(first var) = $var
knocks the first element off
$var
and puts it in
$first.
One important difference between
sh's
variables and variables in shells under
Unix-like operating systems derives from
the fact that Inferno's underlying process
creation primitive is
spawn,
not
fork.
This means that, even though the shell
might create a new process to accomplish
an I/O redirection, variables changed by
the sub-process are still visible in the parent
process. This applies anywhere a new process
is created that runs synchronously with respect
to the rest of the shell script - i.e. there is no
chance of parallel access to the environment.
For example, it is possible to get
access to the status value of a command executed
by the
`{}
operator:
files=`{du -a; dustatus = $status}
if {! ~ $dustatus ''} {
echo du failed
}
When the shell does spawn an asynchronous
process (background processes and pipelines
are the two occasions that it does so), the
environment is copied so changes in one
process do not affect another.
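To illustrate (a sketch): an assignment made in a background process affects only that process's copy of the environment:

```
x = outer
{x = inner} &	# runs asynchronously with a copied environment
echo $x		# still prints outer
```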
Loadable modules
The ability to pass command blocks as values is
all very well, but does not in itself provide the
programmability that is central to the power of shell scripts
and is built in to most shells, the conditional
execution of commands, for instance.
The Inferno shell is different;
it provides no programmability within the shell itself,
but instead relies on external modules to provide this.
It has a built in command
load
that loads a new module into the shell. The module
that supports standard control flow functionality
and a number of other useful tidbits is called
std.
load std
loads this module into the shell.
Std
is a Dis module that
implements the
Shellbuiltin
interface; the shell looks in the directory
/dis/sh
for the module file, in this case
/dis/sh/std.dis.
When a module is loaded, it is given the opportunity
to define as many new commands as it wants.
Perhaps slightly confusingly, these are known as
``built-in'' commands (or just ``builtins''), to distinguish
them from commands executed in a separate process
with no access to shell internals. Built-in
commands run in the same process as the shell, and
have direct access to all its internal state (environment variables,
command line options, and state stored within the implementing
module itself). It is possible to find out
what built-in commands are currently defined with
the command
loaded.
Before any modules have been loaded, typing
loaded
produces:
builtin builtin
exit builtin
load builtin
loaded builtin
run builtin
unload builtin
whatis builtin
${builtin} builtin
${loaded} builtin
${quote} builtin
${unquote} builtin
These are all the commands that are built in to the
shell proper; I'll explain the
${}
commands later.
After loading
std,
executing
loaded
produces:
! std
and std
apply std
builtin builtin
exit builtin
flag std
fn std
for std
getlines std
if std
load builtin
loaded builtin
or std
pctl std
raise std
rescue std
run builtin
status std
subfn std
unload builtin
whatis builtin
while std
~ std
${builtin} builtin
${env} std
${hd} std
${index} std
${join} std
${loaded} builtin
${parse} std
${pid} std
${pipe} std
${quote} builtin
${split} std
${tl} std
${unquote} builtin
The name of each command defined
by a loaded module is followed by the name of
the module, so you can see that in this case
std
has defined commands such as
if
and
while.
These commands are reminiscent of the
commands built in to the syntax of
other shells, but have no special syntax
associated with them: they obey the normal
argument gathering and execution semantics.
As an example, consider the
for
command.
for i in a b c d {
echo $i
}
This command traverses the list
(a b c d)
executing
{echo $i}
with
$i
set to each element in turn. In
rc,
this might be written
for (i in a b c d) {
echo $i
}
and in fact, in
sh,
this is exactly equivalent. The round brackets
denote a list and, like
rc,
all lists are flattened before passing to an
executed command.
Unlike the
for
command in
rc,
the braces around the command are
not optional; as with the arguments to
a normal command, gathering of arguments
stops at a newline. The exception to this rule
is that newlines within braces are treated as white space.
This last rule also
applies to round brackets, for example:
(for i in
a
b
c
d
{echo $i}
)
does the same thing.
This is very useful for commands that take multiple
command block arguments, and is actually the only
line continuation mechanism that
sh
provides (the usual backslash
(\)
character is not in any way special to
sh).
Control structures
Inferno commands, like shell commands in Unix
or Plan 9, return a status when they finish.
A command's status in Inferno is a short string
describing any error that has occurred;
it can be found in the environment variable
$status.
This is the value that commands defined by
std
use to determine conditional
execution - if it is empty, it is true; otherwise
false.
Std
defines, for instance, a command
~
that provides a simple pattern matching capability.
Its first argument is the string to test the patterns
against, and subsequent arguments give the patterns,
in normal shell wildcard syntax; its status is true
if there is a match.
~ sh.y '*.y'
~ std.b '*.y'
give true and false statuses respectively.
A couple of pitfalls lurk here for the unwary:
unlike its
rc
namesake, the patterns
are
expanded by the shell if left unquoted, so
one has to be careful to quote wildcard characters,
or escape them with a backslash if they are to
be used literally.
Like any other command,
~
receives a simple list of arguments, so it has to
assume that the string tested has exactly one element;
if you provide a null variable, or one with more
than one element, then you will get unexpected results.
If in doubt, use the
$"
operator to reduce the variable to a single word first.
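For example (a sketch): if $var holds more than one element, the unflattened form passes each element as a separate argument:

```
var = one two
~ $var 'one two'	# string is ``one''; ``two'' becomes a pattern
~ $"var 'one two'	# string is the single word ``one two''; this matches
```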
Used in conjunction with the
$#
operator,
~
provides a way to check the
number of elements in a list:
~ $#var 0
will be true if
$var
is empty.
This can be tested by the
if
command, which
accepts command blocks for
its arguments, executing its second argument if
the status of the first is empty (true).
For example:
if {~ $#var 0} {
echo '$var has no elements'
}
Note that the start of one argument must
come on the same line as the end of the previous,
otherwise it will be treated as a new command,
and always executed. For example:
if {~ $#var 0}
{echo '$var has no elements'} # this will always be executed
The way to get around this is to use list bracketing,
for example:
(if {~ $#var 0}
{echo '$var has no elements'}
)
will have the desired effect.
The
if
command is more general than
rc's
if,
in that it accepts an arbitrary number
of condition/action pairs, and executes each condition
in turn until one is true, whereupon it executes the associated
action. If the last condition has no action, then it
acts as the ``else'' clause in the
if.
For example:
(if {~ $#var 0} {
echo zero elements
}
{~ $#var 1} {
echo one element
}
{echo more than one element}
)
Std
provides various other control structures.
And
and
or
provide the equivalent of
rc's
&&
and
||
operators. They each take any number of command
block arguments and conditionally execute each
in turn.
And
stops executing when a block's status is false,
or
when a block's status is true:
and {~ $var 1} {~ $var '*.sbl'} {echo variable ends in .sbl}
(or {mount /dev/eia0 /n/remote}
{echo mount has failed with $status}
)
An extremely easy trap to fall into is to use
$*
inside a block assuming that its value is the
same as that outside the block. For instance:
# this will not work
if {~ $#* 2} {echo two arguments}
It will not work because
$*
is set locally for every block, whether it
is given arguments or not. A solution is to
assign
$*
to a variable at the start of the block:
args = $*
if {~ $#args 2} {echo two arguments}
While
provides looping, executing its second argument
as long as the status of the first remains true.
As the status of an empty block is always true,
while {} {echo yes}
will loop forever printing ``yes''.
Another looping command is
getlines,
which loops reading lines from its standard
input, and executing its command argument,
setting the environment variable
$line
to each line in turn.
For example:
getlines {
echo '#' $line
} < x.b
will print each line of the file
x.b
preceded by a
#
character.
Exceptions
When the shell encounters some error conditions, such
as a parsing error, or a redirection failure,
it prints a message to standard error and raises
an
exception.
In an interactive shell this is caught by the interactive
command loop; in a script it will cause an exit with
a false status, unless handled.
Exceptions can be handled and raised with the
rescue
and
raise
commands provided by
std.
An exception has a short string associated with it.
raise error
will raise an exception named ``error''.
rescue error {echo an error has occurred} {
command
}
will execute
command
and will, in the event that it raises an
error
exception, print a diagnostic message.
The name of the exception given to
rescue
can end in an asterisk
(*),
which will match any exception starting with
the preceding characters. The
*
needs quoting to avoid being expanded as a wildcard
by the shell.
rescue '*' {echo caught an exception $exception} {
command
}
will catch all exceptions raised by
command,
regardless of name.
Within the handler block,
rescue
sets the environment variable
$exception
to the actual name of the exception caught.
Exceptions can be caught only within a single
process - if an exception is not caught, then
the name of the exception becomes the
exit status of the process.
As
sh
starts a new process for commands with redirected
I/O, this means that
raise error
echo got here
behaves differently to:
raise error > /dev/null
echo got here
The former prints nothing, while the latter
prints ``got here''.
The exceptions
break
and
continue
are recognised by
std's
looping commands
for,
while,
and
getlines.
A
break
exception causes the loop to terminate;
a
continue
exception causes the loop to continue
as before. For example:
for i in * {
if {~ $i 'r*'} {
echo found $i
raise break
}
}
will print the name of the first
file beginning with ``r'' in the
current directory.
Substitution builtins
In addition to normal commands, a loaded module
can also define
substitution builtin
commands. These are different from normal commands
in that they are executed as part of the argument
gathering process of a command, and instead of
returning an exit status, they yield a list of values
to be used as arguments to a command. They
can be thought of as a kind of `active environment variable',
whose value is created every time it is referenced.
For example, the
split
substitution builtin defined by
std
splits up a single argument into strings separated
by characters in its first argument:
echo ${split e 'hello there'}
will print
h llo th r
Note that, unlike the conventional shell
backquote operator, the result of the
$
command is not re-interpreted, for example:
for i in ${split e 'hello there'} {
echo arg $i
}
will print
arg h
arg llo th
arg r
Substitution builtins can only be named
as the initial command inside a dollar-referenced
command block - they live in a different namespace
from that of normal commands.
For instance,
loaded
and
${loaded}
are quite distinct: the former prints a list
of all builtin names and their defining modules, whereas
the latter yields a list of all the currently loaded
modules.
Std
provides a number of useful commands
in the form of substitution builtins.
${join}
is the complement of
${split}:
it joins together any elements in its argument list
using its first argument as the separator, for example:
echo ${join . file tar gz}
will print:
file.tar.gz
The in-built shell operator
$"
is exactly equivalent to
${join}
with a space as its first argument.
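So, for instance, these two commands print the same thing (a sketch):

```
x = file tar gz
echo $"x		# prints: file tar gz
echo ${join ' ' $x}	# equivalent
```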
List indexing is provided with
${index},
which given a numeric index and a list
yields the
index'th
item in the list (origin 1). For example:
echo ${index 4 one two three four five}
will print
four
A pair of substitution builtins with some of
the most interesting uses are defined by
the shell itself:
${quote}
packages its argument list into a single
string in such a way that it can be later
parsed by the shell and turned back into the same list.
This entails quoting any items in the list
that contain shell metacharacters, such as
';'
or
'&'.
For example:
x='a;' 'b' 'c d' ''
echo $x
echo ${quote $x}
will print
a; b c d
'a;' b 'c d' ''
Travel in the reverse direction is possible
using
${unquote},
which takes a single string, as produced by
${quote},
and produces the original list again.
There are situations in
sh
where only a single string can be used, but
it is useful to be able to pass around the values
of arbitrary
sh
variables in this form;
${quote}
and
${unquote}
between them make this possible. For instance
the value of a
sh
list can be stored in a file and later retrieved
without loss. They are also useful to implement
various types of behaviour involving automatically
constructed shell scripts; see ``Lexical binding'', below,
for an example.
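For instance, a list can be written to a file and recovered intact (a sketch; /tmp/saved is a hypothetical file):

```
x = 'one word' two ''
echo ${quote $x} > /tmp/saved
y = ${unquote "{cat /tmp/saved}}
```

After this, $y holds the original three-element list, empty element included.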
Two more list manipulation commands provided
by
std
are
${hd}
and
${tl},
which mirror their Limbo namesakes:
${hd}
returns the first element of a list,
${tl}
returns all but the first element of a list.
For example:
x=one two three four
echo ${hd $x}
echo ${tl $x}
will print:
one
two three four
Unlike their Limbo counterparts, they
do not complain if their argument list
is not long enough; they just yield a null list.
Std
provides three other substitution builtins of
note.
${pid}
yields the process id of the current
process.
${pipe}
provides a somewhat more cumbersome equivalent of the
>{}
and
<{}
commands found in
rc,
i.e. branching pipelines.
For example:
cmp ${pipe from {old}} ${pipe from {new}}
will regression-test a new version of a command.
Using
${pipe}
yields the name of a file in the namespace
which is a pipe to its argument command.
The substitution builtin
${parse}
is used to check shell syntax without actually
executing a command. The command:
x=${parse '{echo hello, world}'}
will return a parsed version of the string
``echo hello, world'';
if an error occurs, then a
parse error
exception will be raised.
Functions
Shell functions are a facility provided
by the
std
shell module; they associate a command
name with some code to execute when
that command is named.
fn hello {
echo hello, world
}
defines a new command,
hello,
that prints a message when executed.
The command is passed arguments in the
usual way, for example:
fn removems {
for i in $* {
if {grep -s Microsoft $i} {
rm $i
}
}
}
removems *
will remove all files in the current directory
that contain the string ``Microsoft''.
The
status
command provides a way to return an
arbitrary status from a function. It takes
a single argument - its exit status
is the value of that argument. For instance:
fn false {
status false
}
fn true {
status ''
}
It is also possible to define new substitution builtins
with the command
subfn:
the value of
$result
at the end of the execution of the
command gives the value yielded.
For example:
subfn backwards {
for i in $* {
result=$i $result
}
}
echo ${backwards a b c 'd e'}
will reverse a list, producing:
d e c b a
The commands associated with shell functions
are stored as normal environment variables, and
so are exported to external commands in the usual
way.
Fn
definitions are stored in environment variables
starting
fn-;
subfn
definitions use environment variables starting
sfn-.
It is useful to know this, as the shell core knows
nothing of these functions - they look just like
builtin commands defined by
std;
looking at the current definition of
$fn-name
is the only way of finding out the body of code
associated with function
name.
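For example (a sketch), the body of a function can be inspected through its environment variable:

```
fn greet {
	echo hello, $1
}
echo $fn-greet		# prints the stored command block
```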
Other loadable
sh
modules
In addition to
std,
and
tk,
which is mentioned later, there are
several loadable
sh
modules that extend
sh's
functionality.
Expr
provides a very simple stack-based calculator,
giving simple arithmetic capability to the shell.
For example:
load expr
echo ${expr 3 2 1 + x}
will print
9.
String
provides shell level access to the Limbo
string library routines. For example:
load string
echo ${tolower 'Hello, WORLD'}
will print
hello, world
Regex
provides regular expression matching and
substitution operations. For instance:
load regex
if {! match '^[a-z0-9_]+$' $line} {
echo line contains invalid characters
}
File2chan
provides a way for a shell script to create a
file in the namespace with properties
under its control. For instance:
load file2chan
(file2chan /chan/myfile
{echo read request from /chan/myfile}
{echo write request to /chan/myfile}
)
Arg
provides support for the parsing of standard
Unix-style options.
Sh
and Inferno devices
Devices under Inferno are implemented as files,
and usually device interaction consists of simple
strings written or read from the device files.
This is a happy coincidence, as the two things
that
sh
does best are file manipulation and string manipulation.
This means that
sh
scripts can exploit the power of direct access to
devices without the need to write more long winded
Limbo programs. You do not get the type checking
that Limbo gives you, and it is not quick, but for
knocking up quick prototypes, or ``wrapper scripts'',
it can be very useful.
Consider the way that Inferno implements network
access, for example. A file called
/net/cs
implements DNS address translation. A string such as
tcp!www.vitanuova.com!telnet
is written to
/net/cs;
the translated form of the address is then read
back, in the form of a (file, text)
pair, where
file
is the name of a
clone
file in the
/net
directory
(e.g.
/net/tcp/clone),
and
text
is a translated address as understood by the relevant
network (e.g.
194.217.172.25!23).
We can write a shell function that performs this
translation, returning a triple
(directory clonefile text):
subfn cs {
addr := $1
or {
<> /net/cs {
(if {echo -n $addr >[1=0]} {
(clone addr) := `{read 8192 0}
netdir := ${dirname $clone}
result=$netdir $clone $addr
} {
echo 'cs: cannot translate "' ^
$addr ^
'":' $status >[1=2]
status failed
}
)
}
} {raise 'cs failed'}
}
The code
<> /net/cs { .... }
opens
/net/cs
for reading and writing, on the standard input;
the code inside the braces can then read and
write it.
If the address translation fails, an error will
be generated on the write, so the
echo
will fail - this is detected, and an appropriate exit status
set.
Being a substitution function, the only way that
cs
can indicate an error is by raising an exception, but
exceptions do not propagate across processes
(a new process is created as a result of the redirection),
hence the need for the status check and the raised exception
on failure.
The external program
read
is invoked to make a single read of the
result from
/net/cs.
It takes a block size, and a read offset - it
is important to set this, as the initial write of the
address to
/net/cs
will have advanced the file offset, and we will miss
a chunk of the returned address if we're not careful.
Dirname
is a little shell function that uses one of the
string
builtin functions to get the directory name from
the pathname of the
clone
file. It looks like:
load string
subfn dirname {
result = ${hd ${splitr $1 /}}
}
Now we have an address translation function, we can
access the network interface directly. There are
three main operations possible with Inferno network
devices: connecting to a remote address, announcing
the availability of a local dial-in address, and listening
for an incoming connection on a previously announced
address. They are accessed in similar ways (see
ip(3)
for details):
The dial and announce operations require a new
net
directory, which is created by reading the
clone file - this actually opens the
ctl
file in a newly created net directory, representing
one end of a network connection. Reading a
ctl
file yields the name of the new directory;
this enables an application to find the associated
data
file; reads and writes to this file go to the
other end of the network connection.
The listen operation is similar, but the new
net directory is created by reading from an existing
directory's
listen
file.
Here is a
sh
function that implements some behaviour common
to all three operations:
fn newnetcon {
(netdir constr datacmd) := $*
id := "{read 20 0}
or {~ $constr ''} {echo -n $constr >[1=0]} {
echo cannot $constr >[1=2]
raise failed
}
net := $netdir/^$id
$datacmd <> $net^/data
}
It takes the name of a network protocol directory
(e.g.
/net/tcp),
a possibly empty string to write into the control
file when the new directory id has been read,
and a command to be executed connected to
the newly opened
data
file. The code is fairly straightforward: read
the name of a new directory from standard input
(we are assuming that the caller of
newnetcon
sets up the standard input correctly); then
write the configuration string (if it is not empty),
raising an error if the write failed; then run the
command, attached to the
data
file.
We set up the
$net
environment variable so that
the running command knows its network
context, and can access other files in the
directory (the
local
and
remote
files, for example).
Given
newnetcon,
the implementation of
dial,
announce,
and
listen
is quite easy:
fn announce {
(addr cmd) := $*
(netdir clone addr) := ${cs $addr}
newnetcon $netdir 'announce '^$addr $cmd <> $clone
}
fn dial {
(addr cmd) := $*
(netdir clone addr) := ${cs $addr}
newnetcon $netdir 'connect '^$addr $cmd <> $clone
}
fn listen {
newnetcon ${dirname $net} '' $1 <> $net/listen
}
Dial
and
announce
differ only in the string that is written to the control
file;
listen
assumes it is being called in the context of
an
announce
command, so can use the value
of
$net
to open the
listen
file to wait for incoming connections.
The upshot of these function definitions is that we
can make connections to, and announce, services
on the network. The code for a simple client might look like:
dial tcp!somewhere.com!5432 {
echo connected to `{cat $net/remote}
echo hello somewhere >[1=0]
}
A server might look like:
announce tcp!somewhere.com!5432 {
listen {
echo got connection from `{cat $net/remote}
cat
}
}
Sh
and the windowing environment
The main interface to the Inferno graphics and windowing
system is a textual one, based on Ousterhout's Tk,
where commands to manipulate the graphics inside
windows are strings using a uniform syntax not
a million miles away from the syntax of
sh.
(See section 9 of Volume 1 for details).
The
tk
sh
module provides an interface to the Tk graphics
subsystem, providing not only graphics capabilities,
but also the channel communication on which
Inferno's Tk event mechanism is based.
The Tk module gives each window a unique
numeric id which is used to control that window.
load tk
wid := ${tk window 'My window'}
loads the tk module, creates a new window titled ``My window''
and assigns its unique identifier to the variable
$wid.
Commands of the form
tk $wid
tkcommand
can then be used to control graphics in the window.
When writing tk applets, it is helpful to get feedback
on errors that occur as tk commands are executed, so
here's a function that checks for errors, and minimises
the syntactic overhead of sending a Tk command:
fn x {
args := $*
or {tk $wid $args} {
echo error on tk cmd $"args':' $status
}
}
It assumes that
$wid
has already been set.
Using
x,
we could create a button in our new window:
x button .b -text {A button}
x pack .b -side top
x update
Note that the nice coincidence of the quoting rules
of
sh
and tk means that the unquoted
sh
command block argument to the
button
command gets through to tk unchanged,
there to become quoted text.
Once we've got a button, we want to know when
it has been pressed. Inferno Tk sends events
through Limbo channels, so the Tk module provides
access to simple string channels. A channel is
created with the
chan
command.
chan event
creates a channel named
event.
A
send
command takes a string to send down the channel,
and the
${recv}
builtin yields a received value. Both operations
block until the transfer of data can proceed - as with
Limbo channels, the operation is synchronous. For example:
send event 'hello, world' &
echo ${recv event}
will print ``hello, world''. Note that the send
and receive operations must execute in different
processes, hence the use of the
&
backgrounding operator.
Although for implementation reasons they are
part of the Tk module, these channel operations
are potentially useful in non-graphical scripts -
they will still work fine if there's no graphics context.
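As a sketch of such a non-graphical use (the channel name jobs and the words sent are arbitrary), a background producer can hand words to a consumer synchronously:

```sh
load tk        # chan, send and ${recv} live in the tk module
chan jobs
# the producer must run in a separate process, since send
# blocks until a receiver is ready
{for i in one two three {send jobs $i}} &
echo got ${recv jobs}
echo got ${recv jobs}
echo got ${recv jobs}
```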
The
tk namechan
command makes a channel known to Tk.
tk namechan $wid event
Then we can get events from Tk:
x .b configure -command {send event buttonpressed}
while {} {echo ${recv event}} &
This starts a background process that prints a message
each time the button is pressed.
Interaction with the window manager is handled in
a similar way. When a window is created, it is automatically
associated with a channel of the same name as the window id.
Strings arriving on this are window manager events, such as
resize
and
move.
These can be interpreted if desired, or forwarded back
to the window manager for default handling with
tk winctl.
The following is a useful idiom that does all the usual
event handling on a window:
while {} {tk winctl $wid ${recv $wid}} &
One thing worth knowing is that the default
exit
action (i.e. when the user closes the window) is
to kill all processes in the current process group, so
in a script that creates windows,
it is usual to fork the process group with
pctl newpgrp
early on, otherwise
it can end up killing the shell window that spawned it.
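Putting these pieces together, a typical prologue for a window-creating script might look like this (the window title is arbitrary):

```sh
load tk
pctl newpgrp    # fork the process group, so that closing the
                # window does not kill the spawning shell
wid := ${tk window 'My application'}
# forward window manager events back for default handling
while {} {tk winctl $wid ${recv $wid}} &
```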
An example
By way of an example, I'll present a function that implements
a simple network chat facility, allowing two people on the
network to send text messages to one another, making use
of the network functions described earlier.
The core is a function called
chat
which assumes that its standard input has
been directed to an active network connection; it creates a
window containing an entry widget and a text widget. Any text
entered into the entry widget is sent to the other end
of the connection; lines of text arriving from
the network are appended to the text widget.
The first part of the function creates the window,
forks the process group, runs the window controller
and creates the widgets inside the window:
fn chat {
load tk
pctl newpgrp
wid := ${tk window 'Chat'}
nl := '
' # newline
while {} {tk winctl $wid ${recv $wid}} &
x entry .e
x frame .f
x scrollbar .f.s -orient vertical -command {.f.t yview}
x text .f.t -yscrollcommand {.f.s set}
x pack .f.s -side left -fill y
x pack .f.t -side top -fill both -expand 1
x pack .f -side top -fill both -expand 1
x pack .e -side top -fill x
x pack propagate . 0
x bind .e '<Key-'^$nl^'>' {send event enter}
x update
chan event
tk namechan $wid event event
The middle part of
chat
loops in the background getting text entered
by the user and sending it across the network
(also putting a copy in the local text widget
so that you can see what you have sent):
while {} {
{} ${recv event}
txt := ${tk $wid .e get}
echo $txt >[1=0]
x .f.t insert end '''me: '^$txt^$nl
x .e delete 0 end
x .f.t see end
x update
} &
Note the null command on the second line,
used to wait for the receive event without
having to deal with the value (there's only
one event that can arrive on the channel, and
we know what it is).
The final piece of
chat
gets lines from the network and puts them
in the text widget. The loop will terminate when
the connection is dropped by the other party, whereupon
the window closes and the chat is finished:
getlines {
x .f.t insert end '''you: '^$line^$nl
x .f.t see end
x update
}
tk winctl $wid exit
}
Now we can wrap up the network functions and the
chat function in a shell script, to finish off the little demo:
#!/dis/sh
# include the earlier function definitions here
fn usage {
echo 'usage: chat [-s] address' >[1=2]
raise usage
}
args=$*
or {~ $#args 1 2} {usage}
(addr args) := $*
if {~ $addr -s} {
# server
or {~ $#args 1} {usage}
(addr nil) := $args
announce $addr {
echo announced on `{cat $net/local}
while {} {
net := $net
listen {
echo got connection from `{cat $net/remote}
chat &
}
}
}
} {
or {~ $#args 0} {usage}
# client
dial $addr {
echo made connection
chat
}
}
If this is placed in an executable script file
named
chat,
then
chat -s tcp!mymachine.com!5432
would announce a chat server using tcp
on
mymachine.com
(the local machine)
on port 5432.
chat tcp!mymachine.com!5432
would make a connection to
the previous server; they would both pop
up windows and allow text to be typed in from
either end.
Lexical binding
One potential problem with all this passing around
of fragments of shell script is the scope of names.
This piece of code:
fn runit {x := Two; $*}
x := One
runit {echo $x}
will print ``Two'', which is quite likely to confound the
expectations of the person writing the script if they
did not know that
runit
set the value of
$x
before calling its argument script.
Some functional languages (and the
es
shell) implement
lexical binding
to get around this problem. The idea
is to derive a new script from the old
one with all the necessary variables bound to
their current values, regardless of the context in which
the script is later called.
Sh
does not provide any explicit support for
this operation; however it is possible to fake
up a reasonably passable job.
Recall that blocks can be treated as strings if necessary,
and that
${quote}
allows the bundling of lists in such a way that they
can later be extracted again without loss. These two
features allow the writing of the following
let
function (I have omitted argument checking code here and
in later code for the sake of brevity):
subfn let {
# usage: let cmd var...
(let_cmd let_vars) := $*
if {~ $#let_cmd 0} {
echo 'usage: let {cmd} var...' >[1=2]
raise usage
}
let_prefix := ''
for let_i in $let_vars {
let_prefix = $let_prefix ^
${quote $let_i}^':='^${quote $$let_i}^';'
}
result=${parse '{'^$let_prefix^$let_cmd^' $*}'}
}
Let
takes a block of code, and the names of environment variables
to bind onto it; it returns the resulting new block of code.
For example:
fn runit {x := hello, world; $*}
x := a 'b c d' 'e'
runit ${let {echo $x} x}
will print:
a b c d e
Looking at the code it produces is perhaps more
enlightening than examining the function definition:
x := a 'b c d' 'e'
echo ${let {echo $x} x}
produces
{x:=a 'b c d' e;{echo $x} $*}
Let
has bundled up the values of the two bound variables,
stuck them onto the beginning of the code block
and surrounded the whole thing in braces.
It makes sure that it has valid syntax by using
${parse},
and it ensures that the correct arguments are
passed to the script by passing it
$*.
Note that all the variable names used inside the
body of
let
are prefixed with
let_.
This is to try to reduce the likelihood that someone
will want to lexically bind to a variable of a name used
inside
let.
The module interface
It is not within the scope of this paper to discuss in
detail the public module interface to the shell, but
it is probably worth mentioning some of the other
benefits that
sh
derives from living within Inferno.
Unlike shells in conventional systems, where
the shell is a standalone program, accessible
only through
exec(),
in Inferno,
sh
presents a module interface that allows programs
to gain lower level access to the primitives provided
by the shell. For example, Inferno programs can make use of
the shell syntax parsing directly, so
a shell command in a configuration script might be
checked for correctness before running it,
or parsed to avoid parsing overhead when running
a shell command within a loop.
More importantly, as long as it implements a superset
of the
Shellbuiltin
interface, an application can
load
itself
into the shell as a module, and define builtin commands
that directly access functionality that it can provide.
This can, with minimum effort, provide an application
with a programmable interface to its primitives.
I have modified the Inferno window manager
wm,
for example, so that instead of using a custom, fairly limited
file format, its configuration file is just
a shell script.
Wm
loads itself into the shell,
defines a new builtin command
menu
to create items in
its main menu, and runs a shell script.
The shell script has the freedom to customise
menu entries dynamically, to run arbitrary programs,
and even to publicise this interface to
wm
by creating a file with
file2chan
and interpreting writes to the file as calls
to the
menu
command:
file2chan /chan/wmmenu {} {menu ${unquote ${rget data}}}
A corresponding
wmmenu
shell function might be written to provide access to
the functionality:
fn wmmenu {
echo ${quote $*} > /chan/wmmenu
}
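A hypothetical invocation might then read as follows; the entry label and the command block are illustrative only, since the exact arguments accepted by menu depend on the modified wm:

```sh
# add a menu entry labelled Shell that starts a new shell window
wmmenu Shell {wm/sh &}
```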
Inferno has blurred the boundaries between
application and library and
sh
exploits this - the possibilities have only just begun
to be explored.
Discussion
Although it is a newly written shell, the use of tried
and tested semantics means that most of the
normal shell functionality works quite smoothly.
The separation between normal commands and
substitution builtins is arguable, but I think justifiable.
The distinction between the two classes of command
means that there is less awkwardness in the transition between
ordinary commands and internally implemented commands:
both return the same kind of thing. A normal command's
return value remains essentially a simple true/false status,
whereas the new substitution builtins return a list,
with no real distinction between true and false.
I believe that the decision to keep as much functionality as
possible out
of the core shell has paid off. Allowing command blocks
as values enables external modules to define new
control-flow primitives, which in turn means that
the core shell can be kept reasonably static,
while the design of the shell modules evolves
independently. There is a syntactic price
to pay for this generality, but I think it is worth it!
There are some aspects to the design that I do not
find entirely satisfactory. It is strange, given the
throwaway and non-explicit use of subprocesses
in the shell, that exceptions do not propagate
between processes. The model is Limbo's, but
I am not sure it works perfectly for
sh.
I feel there should probably be some difference
between:
raise error > /dev/null
and
status error > /dev/null
The shared nature of loaded modules can cause
problems; unlike environment variables, which
are copied for asynchronously running processes,
the module instances for an asynchronously running
process remain the same. This means that a
module such as
tk
must maintain mutual exclusion locks to
protect access to its data structures. This
could be solved if Limbo had some kind of polymorphic
type that enabled the shell to hold some data on
a module's behalf - it could ask the module
to copy it when necessary.
One thing that is lost going from Limbo to
sh
when using the
tk
module is the usual reference-counted garbage collection
of windows. Because a shell-script holds not
a direct handle on the window, but only a string
that indirectly refers to a handle held inside
the
tk
module, there is no way for the system to
know when the window is no longer referred to,
so, as long as a
tk
module is loaded, its windows must be
explicitly deleted.
The names defined by loaded modules will
become an issue if
loaded modules proliferate. It is not easy
to ensure that a command that you are executing
is defined by the module you think it is, given name clashes
between modules. I have been considering some
kind of scheme that would allow discrimination
between modules, but for the moment, the point
is moot - there are no module name clashes, and
I hope that that will remain the case.
Credits
Sh
is almost entirely an amalgam of other people's
ideas that I have been fortunate enough to
encounter over the years. I hope they will forgive
me for the corruption I've applied...
I have been a happy user of a version of Tom Duff's
rc
for ten years or so; without
rc,
this shell would not exist in anything like its present form.
Thanks, Tom.
It was Byron Rakitzis's UNIX version of
rc
that I was using for most of those ten years; it was his
version of the grammar that eventually became
sh's
grammar, and the name of my
glom()
function came straight from his
rc
source.
From Paul Haahr's
es,
a descendant of Byron's
rc,
and the shell that probably holds the most in common
with
sh,
I stole the ``blocks as values'' idea;
the way that blocks transform into strings
and vice versa is completely
es's.
The syntax of the
if
command also comes directly from
es.
From Bruce Ellis's
mash,
the other programmable shell for Inferno,
I took the
load
command, the
"{}
syntax and the
<>
redirection operator.
Last, but by no means least, S. R. Bourne,
the author of the original
sh,
the granddaddy of this
sh,
is indirectly responsible for all these shells.
That so much has remained unchanged from
then is a testament to the power of his original
vision.
Portions copyright © 1995-1999 Lucent Technologies Inc. All rights reserved.
Portions copyright © 2000 Vita Nuova Holdings Limited. All rights reserved.