http://www.samba.org/ftp/tridge/misc/french_cafe.txt
How Samba was written
---------------------
Andrew Tridgell
August 2003
Method 1:
---------
First off, there are a number of publicly available documents on the
CIFS/SMB protocol. The documents are incomplete and in places rather
inaccurate, but they are a very useful starting point. Perhaps the
most useful document is "draft-leach-cifs-v1-spec-02.txt" from 1997
which is a protocol specification released by SNIA and authored
primarily by Microsoft (with significant input from many other people,
including myself). This document has expired as an IETF draft, and
Microsoft has dropped their attempts to get CIFS accepted as an IETF
standard, but the document is still available if you look hard enough
with an internet search engine.
There are numerous other public specifications for various pieces of
the protocol available. I maintain a collection of the ones I know
about in http://samba.org/ftp/samba/specs/
Method 2:
---------
I call this method the "French Cafe technique". Imagine you wanted to
learn French, and there were no books, courses etc available to teach
you. You might decide to learn by flying to France and sitting in a
French Cafe and just listening to the conversations around you. You
take copious notes on what the customers say to the waiter and what
food arrives. That way you eventually learn the words for "bread",
"coffee" etc.
We use the same technique to learn about protocol additions that
Microsoft makes. We use a network sniffer to listen in on
conversations between Microsoft clients and servers and over time we
learn the "words" for "file size", "datestamp" and so on, as we observe
what is sent for each query.
Now one problem with the "French Cafe" technique is that you can only
learn words that the customers use. What if you want to learn other
words? Say for example you want to learn to swear in French? You would
try ordering something at the cafe, then stepping on the waiter's toe
or poking him in the eye when he gives you your order. As you are
being kicked out you take copious notes on the words he uses.
The equivalent of "swear words" in a network protocol are "error
packets". When implementing Samba we need to know how to respond to
error conditions. To work this out we write a program that
deliberately accesses a file that doesn't exist, or uses a buffer that
is too small or accesses a file we don't own. Then we watch what error
code is returned for each condition, and take notes.
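As a minimal sketch of the "swear words" idea, the program below provokes each error condition in turn and takes notes. The ToyServer class and its error codes are invented for illustration. They stand in for a real server reached over the network and are not real SMB codes.

```python
class ToyServer:
    """Hypothetical stand-in for a real server; the error codes are made up."""

    def __init__(self):
        self.files = {"report.txt": {"owner": "alice", "data": b"hello world"}}

    def read(self, user, name, bufsize):
        f = self.files.get(name)
        if f is None:
            return 0x02              # invented code: no such file
        if f["owner"] != user:
            return 0x05              # invented code: access denied
        if bufsize < len(f["data"]):
            return 0x7A              # invented code: buffer too small
        return 0x00                  # success


def provoke_errors(server):
    """Deliberately misbehave in known ways and record the code for each."""
    probes = {
        "file does not exist": ("alice", "missing.txt", 64),
        "buffer too small": ("alice", "report.txt", 1),
        "file we don't own": ("bob", "report.txt", 64),
    }
    return {condition: server.read(*args) for condition, args in probes.items()}


notes = provoke_errors(ToyServer())
for condition, code in notes.items():
    print(f"{condition}: error code 0x{code:02x}")
```

Against a real server the probes would be real requests and the notes would come from the sniffer, but the shape of the experiment is the same: one deliberate mistake per probe, one recorded error code per mistake.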
Method 3:
---------
Method 3 is a greatly expanded variant of the "swear words" technique
I have already mentioned. It involves writing something called a
"protocol scanner". A protocol scanner is a program that tries all
possible "words" in some section of a protocol and uses the response
to automatically deduce new information about the protocol. It is like
the French Cafe technique but with a very patient waiter.
For example, some section of the protocol might contain a 16 bit
"command word" that tells the server what operation to perform. There
are 65536 (64K) possible command words, so we try all of them and note
which ones give an error code other than "not implemented". Then we
need to work out how much supplementary data each command word needs,
so the program tries 1 byte of blank data, then 2 bytes then 3 bytes
etc until the server changes its response in some way. When the
response changes then you know (with a fairly high level of confidence
at least) that you are using the right quantity of data. You then try
using non-blank data, putting in a filename or a directory name or a
username until the server changes its response again. After a large
number of tries the program eventually finds a combination of data
that gives no error code at all - the server has accepted our request!
We have just discovered a new phrase in "French".
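The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the scanner used for Samba: toy_server, its two command words, its reply codes and the candidate list are all hypothetical, chosen only so the loop has something to discover.

```python
OK, NOT_IMPLEMENTED, BAD_LENGTH, BAD_DATA = 0, 1, 2, 3   # invented reply codes


def toy_server(command, payload):
    """Hypothetical server: two command words exist, each wanting a fixed length."""
    wanted = {0x0102: 4, 0x7001: 9}
    if command not in wanted:
        return NOT_IMPLEMENTED
    if len(payload) != wanted[command]:
        return BAD_LENGTH
    if payload.strip(b"\0") == b"":
        return BAD_DATA              # right length, but still blank
    return OK


def scan_commands(server):
    """Try every 16 bit command word; keep those that are not 'not implemented'."""
    return [c for c in range(0x10000) if server(c, b"") != NOT_IMPLEMENTED]


def find_payload_length(server, command, max_len=64):
    """Grow the blank data one byte at a time until the response changes."""
    baseline = server(command, b"")
    for n in range(1, max_len + 1):
        if server(command, b"\0" * n) != baseline:
            return n
    return None


def find_accepted_payload(server, command, length, candidates):
    """Try non-blank data (filenames, usernames...) until the server accepts it."""
    for word in candidates:
        payload = word[:length].ljust(length, b"\0")
        if server(command, payload) == OK:
            return payload
    return None


for command in scan_commands(toy_server):
    length = find_payload_length(toy_server, command)
    payload = find_accepted_payload(toy_server, command, length,
                                    [b"file.txt", b"alice"])
    print(hex(command), length, payload)
```

A real server is far less tidy than this toy. A changed response only suggests the right quantity of data, which is why the confidence is "fairly high" rather than complete.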
Once the server has accepted the new request we need to work out what
the request actually does. We know it's a valid command, but what does
it do? To determine that we send the new command then we follow it up
with a series of already understood commands that ask the server for
lots of detailed information about the files it has. Has a file size
changed? Has a date changed? Has a file changed its name? Eventually
we work out what the command does.
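One way to picture this step is a before and after comparison. In the sketch below, ToyServer, query_all and mystery are hypothetical names: query_all plays the part of the already understood commands, and mystery is the newly accepted command whose effect we pretend not to know.

```python
import copy


class ToyServer:
    """Hypothetical server holding a few files; not a real SMB implementation."""

    def __init__(self):
        self.files = {
            "a.txt": {"size": 10, "mtime": 100},
            "b.txt": {"size": 20, "mtime": 200},
        }

    def query_all(self):
        # Stands in for the already understood "tell me about your files" commands.
        return copy.deepcopy(self.files)

    def mystery(self, name):
        # The newly accepted command. The observer below cannot see this code.
        self.files[name]["size"] = 0
        self.files[name]["mtime"] += 1


def observe_effect(server, send_command):
    """Snapshot the server, send the unknown command, snapshot again, list changes."""
    before = server.query_all()
    send_command()
    after = server.query_all()
    changes = []
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old is None or new is None:
            # The file appeared or disappeared (this also covers a rename).
            changes.append((name, "exists", old is not None, new is not None))
            continue
        for attr in sorted(old):
            if old[attr] != new[attr]:
                changes.append((name, attr, old[attr], new[attr]))
    return changes


server = ToyServer()
for change in observe_effect(server, lambda: server.mystery("a.txt")):
    print(change)
```

Here the notes would show the size of a.txt dropping to zero and its date moving forward, which suggests the command is some kind of truncate.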
Method 4:
---------
The final method that is worth describing here is the "differential"
technique. This is used to discover interactions between different
command words. Using the (now rather stretched) French Cafe analogy it
is like trying to work out if you should use a different word for
coffee if you are having it with a biscuit than if you are having it
with cake. It goes like this.
You use your new knowledge of French to write a virtual waiter, a
program that is supposed to behave like a real French waiter. Then you
write another program that sends a random series of French phrases in
turn to the real waiter and your virtual waiter. Your program then
examines the replies carefully and notes any differences in how the
two waiters respond. You keep careful notes.
When the two waiters respond differently then you look at your notes
and try the same sequence of phrases again, but this time leaving one
of them out. Do the two waiters now behave in the same way? If they do
then you know that phrase is critical to the difference between the
two waiters, otherwise it isn't. In this way you can quickly determine
the minimum set of phrases that causes the two waiters to respond
differently.
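A small sketch of the two waiters and the leave-one-out step might look like this. Everything in it is an assumption made for illustration: the four phrases, the locking behaviour of real_waiter and the fact that virtual_waiter has not learned about locks yet.

```python
import random

PHRASES = ["open", "write", "lock", "close"]   # invented vocabulary


def real_waiter(phrases):
    """Hypothetical reference behaviour: a write while locked is refused."""
    locked, replies = False, []
    for p in phrases:
        if p == "lock":
            locked = True
        elif p == "close":
            locked = False
        replies.append("denied" if p == "write" and locked else "ok")
    return replies


def virtual_waiter(phrases):
    """Our imitation, which does not yet know about locks."""
    return ["ok" for _ in phrases]


def differs(phrases):
    return real_waiter(phrases) != virtual_waiter(phrases)


def find_difference(rng, length=12, tries=1000):
    """Send random phrase sequences to both waiters until they disagree."""
    for _ in range(tries):
        seq = [rng.choice(PHRASES) for _ in range(length)]
        if differs(seq):
            return seq
    return None


def minimise(seq):
    """Leave one phrase out at a time; drop it if the waiters still disagree."""
    seq = list(seq)
    i = 0
    while i < len(seq):
        shorter = seq[:i] + seq[i + 1:]
        if differs(shorter):
            seq, i = shorter, 0      # phrase was not needed; start over
        else:
            i += 1                   # phrase is critical; keep it
    return seq


found = find_difference(random.Random(0))
print("differing sequence:", found)
print("minimal sequence:  ", minimise(found))
```

With these toy waiters any differing sequence boils down to "lock" followed by "write", which points straight at the missing lock handling in the virtual waiter.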
Once you have this minimal set then you stare at it hard and use the
methods described earlier to see what's wrong with your virtual
waiter. When you fix it you try again, and keep trying until your
virtual waiter behaves the same as the real waiter.
Now imagine using all of the above techniques (plus some other similar
techniques I have not gone into here) over a period of 12 years. That's
how Samba was written.