Text-Based Internet Application Protocols

Mark Montague

University of Michigan


A copy of this presentation can be downloaded from
http://www-personal.umich.edu/~markmont/tbiap.zip

What is a protocol?

https://en.wikipedia.org/wiki/Communications_protocol

"A communications protocol is a system of digital message formats and rules for exchanging those messages in or between computing systems and in telecommunications"

More informally, a protocol is an agreement between two computer systems on how they will talk to each other.

So what is a text-based protocol?

A protocol where the messages use human-readable text rather than a binary encoding intended for software.

There are some common characteristics that many "Internet application protocols" share:

The first text-based protocol seems to be RFC 354 ("The File Transfer Protocol"), July 1972 (section IV.A), which introduced ASCII text commands and 3-digit ASCII reply codes. Versions of FTP before this used control characters to represent commands.

A common system then was the DEC PDP-10. Unix was being rewritten from assembly language into C at the time, and could not yet do networking.

Yes, those are MS-DOS-style line endings (CR LF).

Using a text-based protocol

  1. The client connects to the server via TCP using the well-known port number for the service (25 for SMTP, 80 for HTTP, and so on).
  2. Client sends a command to the server and waits for a reply.
  3. Client acts upon reply.
  4. Client repeats steps 2 & 3 until done.
  5. Client disconnects from the server.
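As a minimal Python sketch of these five steps (the function names converse and run_session are hypothetical, and the sketch assumes every command gets exactly one reply line, which is not true for SMTP's multi-line replies or for the greeting an SMTP server sends before the first command):

```python
import socket


def converse(rfile, wfile, commands):
    """Steps 2-4: send each command, read one reply line, collect the replies.

    rfile/wfile are binary file-like objects. Assumes one reply line per command.
    """
    replies = []
    for command in commands:
        # Text-based protocols end each line with CR LF.
        wfile.write(command.encode("ascii") + b"\r\n")
        wfile.flush()
        replies.append(rfile.readline().decode("ascii").rstrip("\r\n"))
    return replies


def run_session(host, port, commands):
    # Step 1: connect via TCP to the service's well-known port.
    with socket.create_connection((host, port), timeout=10) as sock:
        with sock.makefile("rb") as rfile, sock.makefile("wb") as wfile:
            replies = converse(rfile, wfile, commands)
    # Step 5: leaving the "with" blocks closes the connection.
    return replies
```

Keeping the conversation logic separate from the socket makes it easy to try out against canned server replies before pointing it at a real server.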

Simple Mail Transfer Protocol (SMTP)

SMTP is used to send email messages. IMAP and POP are used to retrieve messages (which were previously sent and delivered) from a remote mailbox.

SMTP example

$ host -t mx umich.edu
umich.edu mail is handled by 0 mx2.umich.edu.
umich.edu mail is handled by 0 mx3.umich.edu.
umich.edu mail is handled by 0 mx1.umich.edu.
$ telnet mx2.umich.edu 25
Trying 141.211.124.87...
Connected to mx2.umich.edu.
Escape character is '^]'.
220 <halloween.mr.itd.umich.edu> Simple Internet Message Transfer Agent ready
EHLO umich.edu
250-halloween.mr.itd.umich.edu Hello umich.edu
250 SIZE=104857600
MAIL FROM: <markmont@umich.edu>
250 OK
RCPT TO: <lsait-demo@umich.edu>
250 OK
DATA
354 Start mail input; end with <CRLF>.<CRLF>
From: markmont@umich.edu
To: lsait-demo@umich.edu
Subject: SMTP test message

Text-based protocols are awesome!

-- Mark
.
250 Accepted: (511A545E.7E9F.11263)
QUIT
221 <halloween.mr.itd.umich.edu> Service closing transmission channel
Connection closed by foreign host.
$ 

Be sure to do this from a wired campus network... the U-M mail gateway won't accept mail from the U-M wireless networks.

In the transcript, lines that begin with a three-digit status code are the server's replies; the commands and message text between them are what we type.

Dash after status code means "expect another line from the server"

Angle brackets around the email addresses in the MAIL FROM and RCPT TO commands are required.

What comes in the DATA section is the email message itself.

The email message itself may not have From: or To: headers; this is one of the reasons SMTP has the separate MAIL FROM and RCPT TO commands.

The delivered message

Delivered-To: lsait-demo@go.itd.umich.edu
Received: by 10.76.87.170 with SMTP id az10csp128822oab;
        Tue, 12 Feb 2013 06:41:14 -0800 (PST)
X-Received: by 10.50.157.130 with SMTP id wm2mr3899411igb.1.1360680074655;
        Tue, 12 Feb 2013 06:41:14 -0800 (PST)
Return-Path: <lsait-demo-errors@umich.edu>
Received: from halloween.mr.itd.umich.edu (mx.umich.edu. [141.211.176.133])
        by mx.google.com with ESMTP id nc3si47015492icb.52.2013.02.12.06.41.14;
        Tue, 12 Feb 2013 06:41:14 -0800 (PST)
Received-SPF: pass (google.com: best guess record for domain of lsait-demo-errors@umich.edu designates 141.211.2.210 as permitted sender) client-ip=141.211.2.210;
Authentication-Results: mx.google.com;
       spf=pass (google.com: best guess record for domain of lsait-demo-errors@umich.edu designates 141.211.2.210 as permitted sender) smtp.mail=lsait-demo-errors@umich.edu
Date: Tue, 12 Feb 2013 06:41:14 -0800 (PST)
Message-Id: <511a548a.03fe2a0a.7995.3999SMTPIN_ADDED_MISSING@mx.google.com>
Received: FROM umich.edu (zaxxon.gpcc.itd.umich.edu [141.211.2.210])
	By halloween.mr.itd.umich.edu ID 511A545E.7E9F.11263 ;
	12 Feb 2013 09:40:53 EST
From: markmont@umich.edu
To: lsait-demo@umich.edu
Subject: SMTP test message

Text-based protocols are awesome!

-- Mark

Each SMTP server that the message passes through adds its own Received header to the start of the headers. The path the message took in being delivered can be reconstructed by reading the headers from bottom to top.
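Reading the Received headers bottom to top can be automated with Python's standard email parser. A short sketch (delivery_path is a hypothetical helper name):

```python
from email import message_from_string


def delivery_path(raw_message):
    """Return the Received: header values, oldest hop first.

    Each server prepends its own Received: header, so reversing the order in
    which they appear reads the headers "bottom to top".
    """
    msg = message_from_string(raw_message)
    received = msg.get_all("Received") or []
    # Collapse folded continuation lines and runs of whitespace.
    return [" ".join(value.split()) for value in reversed(received)]
```

For the delivered message above, the first entry would be the hop from zaxxon.gpcc.itd.umich.edu into halloween.mr.itd.umich.edu, and the last would be Google's internal hop.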

How do I know what to type?

This is considerably simplified; we're not getting into Technical Specifications or Applicability Statements.

I encourage everyone to read RFCs, they tend to be much less daunting and more accessible than, say, ISO standards.

SMTP commands (RFC 2821)

Command   Description
EHLO      Extended Hello: identifies the client system to the server and asks which SMTP service extensions are supported.
HELO      Hello: the original greeting, superseded by EHLO (servers still accept it for compatibility).
MAIL      First step of a mail transaction: specifies the sender identity (which may be different from the From: header of the message itself!).
RCPT      Second step of a mail transaction: specifies a recipient (the To:/Cc: headers may be different or may not exist at all).
DATA      Third and final step of a mail transaction: "Here is the message!" The message format (headers, body, etc.) is specified by RFC 2822. Basic SMTP only supports 7-bit ASCII data; anything else must be encoded. The message data ends with a "." on a line by itself.
VRFY      Verify: look up the mailbox corresponding to a name, or make sure a mailbox exists. Disabled on most SMTP servers for privacy and anti-spam reasons.
EXPN      Expand: look up the members of a mailing list. Disabled on most SMTP servers for privacy and anti-spam reasons.
RSET      Reset: cancel the current mail transaction and start over.
HELP      For people who connect via telnet and type commands (for example, a system administrator debugging a problem between two SMTP servers).
NOOP      Have the server send an "OK" reply without doing anything else (useful for keeping the connection alive, or testing whether the server is still responding).
QUIT      Close the connection.
STARTTLS  Encrypt everything from this point on (actually defined separately, in RFC 3207).

That's every command in basic SMTP, plus STARTTLS!

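For the DATA command, if a line in the message begins with a ".", add another "." in front of it to "escape" the leading "." so the server doesn't mistake it for the end of the message. The receiving server removes the extra "." again.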
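This "dot-stuffing" step is easy to get wrong by hand. A minimal Python sketch (dot_stuff is a hypothetical name) that prepares message text for sending after DATA:

```python
def dot_stuff(body):
    """Prepare message text for sending after the SMTP DATA command.

    Converts line endings to CR LF, doubles a leading "." on any line, and
    appends the terminating line consisting of a single ".".
    """
    lines = body.splitlines()
    stuffed = ["." + line if line.startswith(".") else line for line in lines]
    return "\r\n".join(stuffed) + "\r\n.\r\n"
```

For example, a line ".hidden" goes over the wire as "..hidden", so only the final lone "." ends the message.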

There are several options for encoding non-ASCII message data, for example MIME with the Base64 or quoted-printable transfer encodings.

SMTP replies

3-DIGIT-STATUS-CODE MESSAGE

Many text-based protocols use similar status codes, and clients generally only have to look at the first digit of the code to tell whether the command was successful:

Status  Description
1xx     informational ("positive preliminary reply")
2xx     success ("positive completion reply")
3xx     awaiting further commands ("positive intermediate reply")
4xx     temp error ("transient negative completion reply")
5xx     error ("permanent negative completion reply")

For SMTP, the second digit gives the category (0=syntax, 1=information, 2=connections, etc.). The third digit depends on the second digit and identifies the specific message.
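A client has to collect every line of a reply (remember the dash after "250-" in the EHLO response) and then usually only looks at the first digit. A minimal Python sketch of both, assuming a binary file-like object for the connection (read_reply and category are hypothetical names):

```python
CATEGORIES = {
    1: "positive preliminary",
    2: "positive completion",
    3: "positive intermediate",
    4: "transient negative completion",
    5: "permanent negative completion",
}


def read_reply(rfile):
    """Read one complete SMTP reply from a binary file-like object.

    Returns (code, lines). A hyphen after the code ("250-") means more lines
    follow; a space (or nothing) after the code marks the last line.
    """
    lines = []
    while True:
        raw = rfile.readline()
        if not raw:
            raise ConnectionError("connection closed in the middle of a reply")
        line = raw.decode("ascii").rstrip("\r\n")
        code, separator, text = line[:3], line[3:4], line[4:]
        lines.append(text)
        if separator != "-":
            return int(code), lines


def category(code):
    # Clients usually only need the first digit to decide what happened.
    return CATEGORIES[code // 100]
```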

So what?

Why might we want to use a protocol directly?

(Note that you don't need to resort to using SMTP directly in order to forge email -- most email clients fully support sending spoofed email)

Text-Based protocol benefits

Text-Based protocol downsides

But weigh whether these downsides are actually relevant. If they are, you should make sure that you don't use XML anywhere either.

An Intel Core i7 2600K running at 3.4 GHz can execute 128 billion instructions every second.

A few miscellaneous notes

There are two main ways to handle data longer than a single line:

Text-based protocols can make use of generic data encoding/markup methods including XML, JSON, or Base64.

Text-based protocols can also use special data formats such as RFC 2822 (email message format), or iCalendar / RFC 5545 (calendar data format)

We saw the first way of handling multi-line data in SMTP already (line at end of message consisting of a single dot).

We'll see the second way (indicating the length) in a few minutes when we look at HTTP.

HTTP

The Hypertext Transfer Protocol (HTTP) is used by web clients (browsers) to communicate with web servers. HTTP version 1.1 is defined by RFC 2616.

Command  Description
GET      Retrieve information ("the page") for a URI.
HEAD     Get information about a URI (e.g., to see if it has changed recently) without actually retrieving the page.
POST     Send data to the resource at the URI for it to act on. Examples: process a form, upload a file.
PUT      Upload a new page to the web server at a particular URI (seldom used on ordinary web sites).
DELETE   Remove the page for a URI from the web server (seldom used on ordinary web sites).
OPTIONS  Ask what communication options are available for a URI, or for the server. Often disabled for security reasons, since legitimate clients rarely use it.
TRACE    Echo back the received request (the TRACE line and all headers). Used for debugging; should be disabled for security reasons.
CONNECT  When sent to a web proxy server, tells the proxy to become a tunnel instead (used mainly by browsers to reach HTTPS sites through a proxy; rarely typed by hand).

In practice, for ordinary web browsing this list of commands reduces to just GET, HEAD, and POST.

Response codes are similar to those of SMTP, and many people are more familiar with them (200 - OK, 301 - Moved permanently, 404 - Not found). Read RFC 2616!

HTTP example


$ telnet www-personal.umich.edu 80
Trying 141.211.13.225...
Connected to www-personal.umich.edu.
Escape character is '^]'.
GET /~markmont/tbiap/hello.html
<html>
  <body>
    <p>Hello, world!</p>
  </body>
</html>
Connection closed by foreign host.
$

There is also HTTP 1.0, as specified in RFC 1945.

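HTTP 1.0 does not require the Host: request header, so the server can't tell which name-based virtual host a request is meant for. Without name-based virtual hosts, each virtual host on a web server needs its own IP address, which is a problem in light of IPv4 address exhaustion.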

HTTP 1.1

"Encoding" defaults to none, but can optionally be anything that can be decoded by the client to get the actual document. This is how compression is done, for example.

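Remember that the "blank line" ending the headers appears on the wire as CR+LF+CR+LF: the CR+LF that ends the last header line, followed by the CR+LF of the empty line itself.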

HTTP 1.1 example


$ telnet www-personal.umich.edu 80
Trying 141.211.13.223...
Connected to www-personal.umich.edu.
Escape character is '^]'.
GET /~markmont/tbiap/hello.html HTTP/1.1
Host: www-personal.umich.edu

HTTP/1.1 200 OK
Date: Wed, 13 Feb 2013 15:29:19 GMT
Server: Apache
Accept-Ranges: bytes
Content-Length: 59
Content-Type: text/html; charset=utf-8

<html>
  <body>
    <p>Hello, world!</p>
  </body>
</html>
Connection closed by foreign host.
$ 

Note the web server gives the "Content-Length:" response header so clients will know how many bytes to expect after the end of the header, before the response is complete.
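Here is a minimal Python sketch of how a client might use Content-Length to know when the response is complete. It assumes the server does not use chunked Transfer-Encoding (which replaces Content-Length with per-chunk sizes); read_http_response is a hypothetical name:

```python
def read_http_response(rfile):
    """Read an HTTP/1.1 response whose body length is given by Content-Length.

    Assumes no chunked Transfer-Encoding. Returns (status, headers, body);
    header names are lowercased because HTTP header names are case-insensitive.
    """
    status_line = rfile.readline().decode("iso-8859-1").rstrip("\r\n")
    status = int(status_line.split(" ", 2)[1])  # e.g. "HTTP/1.1 200 OK"
    headers = {}
    while True:
        line = rfile.readline().decode("iso-8859-1").rstrip("\r\n")
        if not line:  # the blank line ends the headers
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Read exactly as many bytes as the server promised.
    body = rfile.read(int(headers.get("content-length", "0")))
    return status, headers, body
```

With the hello.html response above, this reads exactly the 59 bytes of HTML and stops, even if the connection stays open.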

The "Content-Type:" response header tells the client how to interpret the response. If something is downloading instead of displaying, the server might be sending the wrong content-type header.

Some extra software

Binary data

Method 1: Just send the binary data! Either include a length before the data, or a marker at the end.

$ echo -ne "GET /vgn-ext-templating/sites/lsa/images/banner.png HTTP/1.1\r\n
Host: www.lsa.umich.edu\r\n\r\n" | nc www.lsa.umich.edu 80 >image.png
$ vi image.png  # remove response header lines, including blank line at end
$ od -c image.png | head -5
0000000  211   P   N   G  \r  \n 032  \n  \0  \0  \0  \r   I   H   D   R
0000020  \0  \0 003 023  \0  \0  \0   T  \b 003  \0  \0  \0 316   1 232
0000040  +  \0  \0  \0 031   t   E   X   t   S   o   f   t   w   a   r
0000060  e  \0   A   d   o   b   e       I   m   a   g   e   R   e   a
0000100  d   y   q 311   e   <  \0  \0 003   "   i   T   X   t   X   M
$ file image.png 
image.png: PNG image data, 787 x 84, 8-bit colormap, non-interlaced
$ open image.png

Image retrieved into the file image.png: the web site banner for the College of LSA.

Note that SMTP cannot use this method: it can only carry text.

Yes, you could do the same thing more easily using wget or curl, but that would be using a client rather than just connecting and using the protocol directly.

The echo command is all on a single line (it's wrapped above). Note the \r\n\r\n at the end: the first \r\n ends the Host: line, and the second produces the blank line that signals the end of the HTTP 1.1 request headers, telling the server it can start processing the request.
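Instead of deleting the response headers from image.png by hand in vi, the split can be done in code, since the headers always end at the first blank line. A minimal Python sketch (split_http_response and PNG_SIGNATURE are hypothetical names; it assumes the saved response is not chunked):

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG file


def split_http_response(raw):
    """Split a raw HTTP response into (header lines, body bytes)."""
    head, separator, body = raw.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("no blank line found after the response headers")
    return head.decode("iso-8859-1").split("\r\n"), body
```

Usage might look like reading image.png in binary mode, calling split_http_response on its contents, and checking that the body starts with PNG_SIGNATURE (octal 211 P N G \r \n 032 \n in the od output above) before writing it back out.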

Binary Data

Method 2: encode the binary data as text -- for example, by using Base64.

Example: sending an image as an email attachment, using Multipurpose Internet Mail Extensions (MIME) (RFC 2045, 2046, 2047).

We'll download a test-pattern image from the PNG test suite and convert it to Base64:

$ curl --silent -O http://www.schaik.com/pngsuite/z09n2c08.png
$ od -c z09n2c08.png | head -5
0000000  211   P   N   G  \r  \n 032  \n  \0  \0  \0  \r   I   H   D   R
0000020   \0  \0  \0      \0  \0  \0      \b 002  \0  \0  \0 374 030 243
0000040  243  \0  \0  \0 247   I   D   A   T   x   ڵ  ** 321   K 016 200
0000060      020 003   К  ** 260 300 373 037 026   v 270   1   ~ 020   a
0000100    > 035   W   $ 222 327 320   n 255 241 002 025   (   8 017 171
$ openssl base64 < z09n2c08.png > z09n2c08.b64
$ cat z09n2c08.b64
iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAIAAAD8GO2jAAAAp0lEQVR42rXRSw6A
IBAD0JqwwPsfFna4MX4QYT4dVySS19BuraECFSg4D9158ktyLaEi8suhARnICSVQ
B/agF5x6UEW3HhHw0ukb9Dp3g4FOrGisswJ+dUrATPePvNCdI691T0Ui3Rwg1W0b
KHTDBjpdW5FaVwVYdPkGRl24gV2XVOTSlwFefR5A0Ccjc/S/kWn6sCKm/g0g690G
fP25QYh+VRSlA/kAVZNObtYRvvUAAAAASUVORK5CYII=
$ 

Base64 uses only printable 7-bit ASCII characters, which matters for older systems (e.g., SMTP, printers, filesystems) that did not support 8-bit text. Base64 also takes less space than hexadecimal: 4 characters for every 3 bytes (about 33% overhead), versus 2 characters for every byte (100% overhead).
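The same conversion in Python, as a small sketch using the standard base64 module (base64_lines is a hypothetical name). It wraps lines at 64 characters to match the openssl output above; MIME allows encoded lines of up to 76 characters:

```python
import base64


def base64_lines(data, width=64):
    """Base64-encode bytes and split the text into lines of `width` characters."""
    text = base64.b64encode(data).decode("ascii")
    return [text[i:i + width] for i in range(0, len(text), width)]
```

For a 224-byte file this produces 300 characters: four full 64-character lines plus a final 44-character line, just like the output of openssl base64.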

MIME is a text-based data format rather than a protocol.

We're choosing this image for the example because it is only 224 bytes.

Email message with attachment

From: markmont@umich.edu
To: lsait-demo@umich.edu
Subject: An attachment!
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="MIMEpArTbOuNdArY"

--MIMEpArTbOuNdArY
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

A pretty picture is attached.

-- Mark

--MIMEpArTbOuNdArY
Content-Type: image/png; name="z09n2c08.png"
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename="z09n2c08.png"

iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAIAAAD8GO2jAAAAp0lEQVR42rXRSw6A
IBAD0JqwwPsfFna4MX4QYT4dVySS19BuraECFSg4D9158ktyLaEi8suhARnICSVQ
B/agF5x6UEW3HhHw0ukb9Dp3g4FOrGisswJ+dUrATPePvNCdI691T0Ui3Rwg1W0b
KHTDBjpdW5FaVwVYdPkGRl24gV2XVOTSlwFefR5A0Ccjc/S/kWn6sCKm/g0g690G
fP25QYh+VRSlA/kAVZNObtYRvvUAAAAASUVORK5CYII=
--MIMEpArTbOuNdArY--

The MIME directives (the MIME-Version: and Content-*: headers, plus the boundary lines) are embedded in an ordinary RFC 2822 email message.

The first two MIME lines declare that MIME is being used and define the message type and boundary. The boundary can be any arbitrary string that does not occur in the message itself.

Each part of the message begins with a line consisting of two dashes and the boundary, followed by MIME headers, a blank line, and then the content of the part. The message is ended by the boundary with "--" both before and after it.
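For comparison, Python's standard email package can build an equivalent multipart/mixed message. This is a sketch, not how the message above was produced; build_message is a hypothetical name, and the library chooses its own boundary string when the message is serialized:

```python
from email.message import EmailMessage


def build_message(png_bytes):
    """Build a multipart/mixed message like the one shown above."""
    msg = EmailMessage()
    msg["From"] = "markmont@umich.edu"
    msg["To"] = "lsait-demo@umich.edu"
    msg["Subject"] = "An attachment!"
    msg.set_content("A pretty picture is attached.\n\n-- Mark\n")
    # Converts the message to multipart/mixed and Base64-encodes the image
    # part; a boundary string is generated when the message is serialized.
    msg.add_attachment(png_bytes, maintype="image", subtype="png",
                       disposition="inline", filename="z09n2c08.png")
    return msg
```

Calling as_string() on the result gives text with the same structure: top-level MIME headers, then each part introduced by "--" and the boundary.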

Security - HTTP

To emphasize, HTTPS is just the HTTP protocol inside the SSL/TLS protocol.

Encrypting traffic versus encrypting content is clearer if you consider email messages. Encrypting the traffic encrypts the SMTP transaction, including the message while it is in transit, but the message is stored in plaintext on both the client and the server. Encrypting the content would mean creating an encrypted mail message using, for example, S/MIME; the message would then be stored in encrypted form on both the client and the server, and it could still be sent over an encrypted SMTP connection as a separate, second layer of protection.

Security - HTTP

$ openssl s_client -crlf -connect www-personal.umich.edu:443
CONNECTED(00000003)
depth=2 /C=SE/O=AddTrust AB/OU=AddTrust External TTP Network/CN=AddTrust External CA Root
verify error:num=19:self signed certificate in certificate chain
verify return:0
---
Certificate chain
 0 s:/C=US/postalCode=48109/ST=MI/L=Ann Arbor/streetAddress=no street/O=University of Michigan/OU=ITS/CN=www-personal.umich.edu
   i:/C=US/O=Internet2/OU=InCommon/CN=InCommon Server CA
 1 s:/C=US/O=Internet2/OU=InCommon/CN=InCommon Server CA
   i:/C=SE/O=AddTrust AB/OU=AddTrust External TTP Network/CN=AddTrust External CA Root
 2 s:/C=SE/O=AddTrust AB/OU=AddTrust External TTP Network/CN=AddTrust External CA Root
   i:/C=SE/O=AddTrust AB/OU=AddTrust External TTP Network/CN=AddTrust External CA Root
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIFMzCCBBugAwIBAgIQXx0XG3GO+Q7kbUCUwzMKYDANBgkqhkiG9w0BAQUFADBR
MQswCQYDVQQGEwJVUzESMBAGA1UEChMJSW50ZXJuZXQyMREwDwYDVQQLEwhJbkNv
bW1vbjEbMBkGA1UEAxMSSW5Db21tb24gU2VydmVyIENBMB4XDTExMDgxMjAwMDAw
MFoXDTE0MDgxNjIzNTk1OVowgaIxCzAJBgNVBAYTAlVTMQ4wDAYDVQQREwU0ODEw
OTELMAkGA1UECBMCTUkxEjAQBgNVBAcTCUFubiBBcmJvcjESMBAGA1UECRMJbm8g
c3RyZWV0MR8wHQYDVQQKExZVbml2ZXJzaXR5IG9mIE1pY2hpZ2FuMQwwCgYDVQQL
EwNJVFMxHzAdBgNVBAMTFnd3dy1wZXJzb25hbC51bWljaC5lZHUwggEiMA0GCSqG
SIb3DQEBAQUAA4IBDwAwggEKAoIBAQC5ncX/D6VcqjNEWG0dxQ7Tptn1pY6n0dGX
OBzWfRv0RzRXSkbJknaek7/L1Lw4/Auwm/FDbR5CPosKE94a0tx2SRyd4LsEkPP+
G4yv39RmeYndAoQ/YVl48s/RqzeM85FROcUrS4dm8Xa6gWB+PizcHStpxHj08TUa
7HBNsPUUTu+u00Vro3NYLGHTKBnhw4o2X1qi6OGx97lk74/EC/6kpXT5lER5C0no
0E8dzDWdxdRDS9zme4rYJtMafPtR2qcAKP6XaW+0xtzU5J4SDT9jVA6gyICfMxiL
teJQqCyhrNZPQbXY7Rj91E8Ooz65cUB28IG/p+pg64EKcPtyeVgRAgMBAAGjggGz
MIIBrzAfBgNVHSMEGDAWgBRIT1r6L0qaXuBQ82t7VaXe9b40XTAdBgNVHQ4EFgQU
TSBiNzgqCRD+KBghohrCYskM9k8wDgYDVR0PAQH/BAQDAgWgMAwGA1UdEwEB/wQC
MAAwHQYDVR0lBBYwFAYIKwYBBQUHAwEGCCsGAQUFBwMCMF0GA1UdIARWMFQwUgYM
KwYBBAGuIwEEAwEBMEIwQAYIKwYBBQUHAgEWNGh0dHBzOi8vd3d3LmluY29tbW9u
Lm9yZy9jZXJ0L3JlcG9zaXRvcnkvY3BzX3NzbC5wZGYwPQYDVR0fBDYwNDAyoDCg
LoYsaHR0cDovL2NybC5pbmNvbW1vbi5vcmcvSW5Db21tb25TZXJ2ZXJDQS5jcmww
bwYIKwYBBQUHAQEEYzBhMDkGCCsGAQUFBzAChi1odHRwOi8vY2VydC5pbmNvbW1v
bi5vcmcvSW5Db21tb25TZXJ2ZXJDQS5jcnQwJAYIKwYBBQUHMAGGGGh0dHA6Ly9v
Y3NwLmluY29tbW9uLm9yZzAhBgNVHREEGjAYghZ3d3ctcGVyc29uYWwudW1pY2gu
ZWR1MA0GCSqGSIb3DQEBBQUAA4IBAQCBcZ2CUHaXnlyRu9SutRe/4Vr9sTu5anap
SS/koDDnCHHNTiJQ2BbOAtV4UJaw0NpBZxLuylkz59rcVzC4v9RcVVTvIl1339F2
/+MEVucegSmtbCf+wOcH+45wYoxtuOjfpedX0G6PjJcOzT88bxBrMo0tleiVB0gO
dKBZjgyFbdx6D5GBH7uyoBmp77kZNai24TjxCBLSmq3hzdGvFkf83I+ARv5sax2x
c4pJYsivu1c6SSu6ni2WVOo3GYW59Q5dNoHRcKBh+OSSUCLw06iSHwjK15OrbnEP
fXaXQXkIyqVUE7SkRZwpndK+0izsbCmUGlWJ6bx9xzOb/GV7UCI5
-----END CERTIFICATE-----
subject=/C=US/postalCode=48109/ST=MI/L=Ann Arbor/streetAddress=no street/O=University of Michigan/OU=ITS/CN=www-personal.umich.edu
issuer=/C=US/O=Internet2/OU=InCommon/CN=InCommon Server CA
---
No client certificate CA names sent
---
SSL handshake has read 4338 bytes and written 322 bytes
---
New, TLSv1/SSLv3, Cipher is DHE-RSA-AES256-SHA
Server public key is 2048 bit
Compression: NONE
Expansion: NONE
SSL-Session:
    Protocol  : TLSv1
    Cipher    : DHE-RSA-AES256-SHA
    Session-ID: 15B8CDF22B6969B0041A7477740D1BC9DC7721E7573CF9DBB0EB5A38180C63E5
    Session-ID-ctx: 
    Master-Key: 075D5446471883BB88965CF3C54BBEC794F350375F52AF4B688BD81BBB4750679787B44C1521A2A7DAE2DDC64D498CF7
    Key-Arg   : None
    Start Time: 1360866200
    Timeout   : 300 (sec)
    Verify return code: 19 (self signed certificate in certificate chain)
---
GET /~markmont/tbiap/hello.html
<html>
  <body>
    <p>Hello, world!</p>
  </body>
</html>
closed
$ exit

Note that we're connecting to port 443, not port 80. That's because the web server listening on port 80 expects clients to speak HTTP, not the binary SSL/TLS protocol, so we need another port where SSL/TLS is expected.

This example does not verify the server's identity; we're just accepting whatever certificate it gives us (note the "self signed certificate in certificate chain" verify error).

The client and server negotiated TLS version 1.0 (shown as "TLSv1") with 256-bit AES encryption (cipher DHE-RSA-AES256-SHA).
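The same HTTPS request can be sketched in Python with the standard ssl module, this time with the certificate checks the s_client session skipped. The function names are hypothetical, and the "Connection: close" header is an addition so that reading until the server closes the connection yields exactly one response:

```python
import socket
import ssl


def make_context():
    # Unlike the s_client session above, the default context verifies the
    # server's certificate chain against the system's trusted CAs and checks
    # that the certificate matches the host name.
    return ssl.create_default_context()


def https_get(host, path, port=443):
    """Send one HTTP/1.1 GET over TLS and return the raw response bytes."""
    request = (f"GET {path} HTTP/1.1\r\n"
               f"Host: {host}\r\n"
               "Connection: close\r\n"
               "\r\n")
    with socket.create_connection((host, port), timeout=10) as raw_sock:
        with make_context().wrap_socket(raw_sock, server_hostname=host) as tls:
            tls.sendall(request.encode("ascii"))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:  # server closed the connection
                    break
                chunks.append(data)
    return b"".join(chunks)
```

Note that the text-based HTTP conversation inside the TLS connection is exactly the same as on port 80; only the transport changes.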

Security - other protocols

Querying to determine which features are available is important because older servers did not support STARTTLS, and even on newer servers the owner may have chosen not to enable and configure it.

OpenSSL can perform steps 1-3 automatically if it is invoked with the -starttls PROTOCOL_NAME command-line option. For example: openssl s_client -connect mx1.umich.edu:25 -starttls smtp
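In a program, the same query-then-upgrade sequence might look like the following sketch, using Python's smtplib and ssl modules. ehlo_keywords and send_with_starttls are hypothetical names; the keyword parser is included to show what "querying which features are available" means at the protocol level:

```python
import smtplib
import ssl


def ehlo_keywords(reply_lines):
    """Extension keywords advertised in the text lines of an EHLO reply.

    The first line is the server's greeting; each following line starts with
    a keyword, optionally followed by parameters.
    """
    keywords = set()
    for line in reply_lines[1:]:
        if line.strip():
            keywords.add(line.split()[0].split("=")[0].upper())
    return keywords


def send_with_starttls(host, from_addr, to_addrs, message):
    """Query capabilities, upgrade to TLS if offered, then send the message."""
    with smtplib.SMTP(host, 25, timeout=30) as smtp:
        smtp.ehlo()
        if not smtp.has_extn("starttls"):
            raise RuntimeError(f"{host} does not advertise STARTTLS")
        smtp.starttls(context=ssl.create_default_context())
        smtp.ehlo()  # capabilities must be queried again after TLS starts
        smtp.sendmail(from_addr, to_addrs, message)
```

Applied to the EHLO reply in the earlier SMTP example, ehlo_keywords finds only SIZE, so that server would not have offered STARTTLS.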

IMAP

POP is also a text-based protocol for accessing delivered email messages, but we only have time to cover IMAP here

IMAP

$ openssl s_client -quiet -crlf -connect imap.gmail.com:993
depth=1 /C=US/O=Google Inc/CN=Google Internet Authority
verify error:num=20:unable to get local issuer certificate
verify return:0
* OK Gimap ready for requests from 141.211.2.207 s49if2858279eem.15
A001 CAPABILITY
* CAPABILITY IMAP4rev1 UNSELECT IDLE NAMESPACE QUOTA ID XLIST CHILDREN X-GM-EXT-1 XYZZY SASL-IR AUTH=XOAUTH AUTH=XOAUTH2
A001 OK Thats all she wrote! s49if2858279eem.15
A002 LOGIN lsait-demo@umich.edu sooper5seekrit
* CAPABILITY IMAP4rev1 UNSELECT IDLE NAMESPACE QUOTA ID XLIST CHILDREN X-GM-EXT-1 UIDPLUS COMPRESS=DEFLATE ENABLE
A002 OK lsait-demo@umich.edu LSAIT Demonstration authenticated (Success)
A003 LIST "" "*"
* LIST (\HasNoChildren) "/" "INBOX"
* LIST (\Noselect \HasChildren) "/" "[Gmail]"
* LIST (\HasNoChildren \All) "/" "[Gmail]/All Mail"
* LIST (\HasChildren \HasNoChildren \Drafts) "/" "[Gmail]/Drafts"
* LIST (\HasChildren \HasNoChildren \Important) "/" "[Gmail]/Important"
* LIST (\HasChildren \HasNoChildren \Sent) "/" "[Gmail]/Sent Mail"
* LIST (\HasNoChildren \Junk) "/" "[Gmail]/Spam"
* LIST (\HasNoChildren \Flagged) "/" "[Gmail]/Starred"
* LIST (\HasChildren \Trash) "/" "[Gmail]/Trash"
A003 OK Success
A004 SELECT INBOX
* FLAGS (\Answered \Flagged \Draft \Deleted \Seen)
* OK [PERMANENTFLAGS (\Answered \Flagged \Draft \Deleted \Seen \*)] Flags permitted.
* OK [UIDVALIDITY 1] UIDs valid.
* 5 EXISTS
* 0 RECENT
* OK [UIDNEXT 8] Predicted next UID.
A004 OK [READ-WRITE] INBOX selected. (Success)
A005 FETCH 1:* (BODY[HEADER.FIELDS (DATE FROM SUBJECT)])
* 1 FETCH (BODY[HEADER.FIELDS (DATE FROM SUBJECT)] {118}
Date: Mon, 11 Feb 2013 07:02:52 -0800
Subject: Get started with Gmail
From: Gmail Team <mail-noreply@google.com>

)
* 2 FETCH (BODY[HEADER.FIELDS (DATE FROM SUBJECT)] {126}
Date: Mon, 11 Feb 2013 07:02:52 -0800
Subject: Get Gmail on your mobile phone
From: Gmail Team <mail-noreply@google.com>

)
* 3 FETCH (BODY[HEADER.FIELDS (DATE FROM SUBJECT)] {134}
Date: Mon, 11 Feb 2013 07:02:52 -0800
Subject: Customize Gmail with colors and themes
From: Gmail Team <mail-noreply@google.com>

)
* 4 FETCH (BODY[HEADER.FIELDS (DATE FROM SUBJECT)] {101}
Date: Tue, 12 Feb 2013 06:41:14 -0800 (PST)
From: markmont@umich.edu
Subject: SMTP test message

)
* 5 FETCH (BODY[HEADER.FIELDS (DATE FROM SUBJECT)] {98}
Date: Thu, 14 Feb 2013 08:22:39 -0800 (PST)
From: markmont@umich.edu
Subject: An attachment!

)
A005 OK Success
A006 FETCH 4 (FULL)
* 4 FETCH (BODY ("TEXT" "PLAIN" NIL NIL NIL "7BIT" 46 3) ENVELOPE ("Tue, 12 Feb 2013 06:41:14 -0800 (PST)" "SMTP test message" ((NIL NIL "markmont" "umich.edu")) ((NIL NIL "markmont" "umich.edu")) ((NIL NIL "markmont" "umich.edu")) ((NIL NIL "markmont" "umich.edu")) NIL NIL NIL "<511a548a.03fe2a0a.7995.3999SMTPIN_ADDED_MISSING@mx.google.com>") FLAGS (\Seen) INTERNALDATE "12-Feb-2013 14:41:14 +0000" RFC822.SIZE 1300)
A006 OK Success
A007 FETCH 4 (BODY[])
* 4 FETCH (BODY[] {1300}
Delivered-To: lsait-demo@go.itd.umich.edu
Received: by 10.76.87.170 with SMTP id az10csp128822oab;
        Tue, 12 Feb 2013 06:41:14 -0800 (PST)
X-Received: by 10.50.157.130 with SMTP id wm2mr3899411igb.1.1360680074655;
        Tue, 12 Feb 2013 06:41:14 -0800 (PST)
Return-Path: <lsait-demo-errors@umich.edu>
Received: from halloween.mr.itd.umich.edu (mx.umich.edu. [141.211.176.133])
        by mx.google.com with ESMTP id nc3si47015492icb.52.2013.02.12.06.41.14;
        Tue, 12 Feb 2013 06:41:14 -0800 (PST)
Received-SPF: pass (google.com: best guess record for domain of lsait-demo-errors@umich.edu designates 141.211.2.210 as permitted sender) client-ip=141.211.2.210;
Authentication-Results: mx.google.com;
       spf=pass (google.com: best guess record for domain of lsait-demo-errors@umich.edu designates 141.211.2.210 as permitted sender) smtp.mail=lsait-demo-errors@umich.edu
Date: Tue, 12 Feb 2013 06:41:14 -0800 (PST)
Message-Id: <511a548a.03fe2a0a.7995.3999SMTPIN_ADDED_MISSING@mx.google.com>
Received: FROM umich.edu (zaxxon.gpcc.itd.umich.edu [141.211.2.210])
	By halloween.mr.itd.umich.edu ID 511A545E.7E9F.11263 ;
	12 Feb 2013 09:40:53 EST
From: markmont@umich.edu
To: markmont@umich.edu
Subject: SMTP test message

Text-based protocols are awesome!

-- Mark
)
A007 OK Success
A008 LOGOUT
* BYE LOGOUT Requested
A008 OK 73 good day (Success)
read:errno=0
$ 

We're actually connecting to IMAPS here (port 993), which is similar to HTTPS. Google doesn't support IMAP+STARTTLS (port 143).

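Note how tags work: the client starts each command with a unique tag (A001, A002, ...), untagged server lines begin with "*", and the server finishes its response to each command with a line that repeats that command's tag along with a status (OK, NO, or BAD).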

Note that many responses include a length in curly brackets (for example, {118}) before multi-line data, telling the client exactly how many bytes to expect. The closing parenthesis on a line by itself after that data then closes the FETCH response's parenthesized list.
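Putting tags and literals together, a client can read a whole IMAP response like this minimal Python sketch (read_imap_response and LITERAL are hypothetical names; the sketch returns raw bytes and does not parse the parenthesized data):

```python
import re

# A line ending in {N} announces a literal: exactly N bytes of data follow.
LITERAL = re.compile(rb"\{(\d+)\}\r\n\Z")


def read_imap_response(rfile, tag):
    """Read from rfile until the tagged completion line for `tag` arrives.

    Returns (status, untagged) where status is "OK", "NO", or "BAD" and
    untagged is a list of the raw "*" responses, literals included.
    """
    untagged = []
    while True:
        line = rfile.readline()
        if not line:
            raise ConnectionError("connection closed before tagged response")
        response = line
        # After a literal, the response continues on the next line, which
        # may itself announce another literal.
        while (match := LITERAL.search(line)):
            response += rfile.read(int(match.group(1)))
            line = rfile.readline()
            response += line
        if response.startswith(tag + b" "):
            return response[len(tag) + 1:].split()[0].decode("ascii"), untagged
        untagged.append(response)
```

Reading the literal by byte count, rather than looking for the ")" line, is what keeps a message body that happens to contain ")" on a line by itself from confusing the client.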

CalDAV


BEGIN:VCALENDAR
VERSION:2.0
BEGIN:VEVENT
DTSTART:20130215T180000Z
DTEND:20130215T190000Z
DTSTAMP:20130214T215116Z
UID:20130214T215116Z-1778@umich.edu
DESCRIPTION:Use new knowledge of text-based protocols to do something cool.
LOCATION:1112 LSA
SUMMARY:Friday Fun!
END:VEVENT
END:VCALENDAR

The iCalendar object on this slide is the same sort of thing you get as an ".ics" attachment when you get an email notification of a calendar event from Google.

CalDAV (authentication)

Google Calendar, like most CalDAV servers, uses HTTP Basic Access Authentication ("Basic Auth"). Basic Auth is defined in RFC 2617 and uses the Authorization request header from the HTTP standard (RFC 2616 section 14.8); it just consists of sending an HTTP request header with the authentication credentials with each request:

Authorization: Basic credentials-go-here

To create the credentials, just Base64-encode the username, a colon, and the password (Base64 is an encoding, not encryption, which is why Basic Auth should only be sent over HTTPS). For example:

$ echo -n 'lsait-demo@umich.edu:sooper5seekrit' | openssl base64
bHNhaXQtZGVtb0B1bWljaC5lZHU6c29vcGVyNXNlZWtyaXQ=
$ 
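The same header built in Python, as a small sketch with the standard base64 module (basic_auth_header is a hypothetical name):

```python
import base64


def basic_auth_header(username, password):
    """Build an HTTP Basic Auth header: Base64 of "username:password".

    Base64 is an encoding, not encryption, so only send this over HTTPS.
    """
    credentials = f"{username}:{password}".encode("utf-8")
    return "Authorization: Basic " + base64.b64encode(credentials).decode("ascii")
```

With the demo account's credentials this yields the same string as the openssl command above.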

CalDAV

$ openssl s_client -quiet -crlf -connect www.google.com:443
depth=1 /C=US/O=Google Inc/CN=Google Internet Authority
verify error:num=20:unable to get local issuer certificate
verify return:0
PUT /calendar/dav/lsait-demo@umich.edu/events/18330DF8-49AA-4443-9592-7F5BE483CE47.ics HTTP/1.1
If-None-Match: *
Host: www.google.com
Authorization: Basic XXXXXXXXXXXXXXXXXX
Content-Type: text/calendar; charset=UTF-8
Content-Length: 301

BEGIN:VCALENDAR
VERSION:2.0
BEGIN:VEVENT
DTSTART:20130215T180000Z
DTEND:20130215T190000Z
DTSTAMP:20130214T235019Z
UID:20130214T235019Z-1778@umich.edu
DESCRIPTION:Use new knowledge of text-based protocols to do something cool.
LOCATION:1112 LSA
SUMMARY:Friday Fun!
END:VEVENT
END:VCALENDAR
HTTP/1.1 201 Created
Location: https://www.google.com/calendar/dav/lsait-demo@umich.edu/events/20130214T235019Z-1778%40umich.edu.ics
Date: Fri, 15 Feb 2013 01:34:01 GMT
Expires: Fri, 15 Feb 2013 01:34:01 GMT
Cache-Control: private, max-age=0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
Content-Length: 0
Server: GSE
Content-Type: text/html; charset=UTF-8

^C
$ 

The last component of the URI path has to uniquely identify the event; we use a UUID for this, although anything could be used as long as it is unique.

The UID field in the iCalendar object is similar, and will get used in the URL that Google Calendar creates for the event; we probably should have used the same UUID with "@umich.edu" on the end, but wound up using something shorter.

Note that we have to specify the size of the iCalendar object, in bytes, in the Content-Length: header of the PUT request.
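Because the byte count must be exact, it is safer to compute it than to count by hand. A minimal Python sketch that serializes the event above and builds the PUT request (ical_event and caldav_put_request are hypothetical names). It reproduces the transcript's iCalendar object exactly, including its omission of the PRODID property that RFC 5545 requires, and does no escaping or folding of long lines:

```python
def ical_event(uid, dtstamp, start, end, summary, location, description):
    """Serialize a single-event iCalendar object as UTF-8 bytes.

    iCalendar (RFC 5545) requires CR LF after every line, including the last.
    """
    lines = [
        "BEGIN:VCALENDAR", "VERSION:2.0", "BEGIN:VEVENT",
        f"DTSTART:{start}", f"DTEND:{end}", f"DTSTAMP:{dtstamp}",
        f"UID:{uid}", f"DESCRIPTION:{description}",
        f"LOCATION:{location}", f"SUMMARY:{summary}",
        "END:VEVENT", "END:VCALENDAR",
    ]
    return "".join(line + "\r\n" for line in lines).encode("utf-8")


def caldav_put_request(host, path, body, auth_token):
    """Build the raw PUT request; Content-Length is the body's size in bytes."""
    head = (f"PUT {path} HTTP/1.1\r\n"
            "If-None-Match: *\r\n"  # only create; fail if it already exists
            f"Host: {host}\r\n"
            f"Authorization: Basic {auth_token}\r\n"
            "Content-Type: text/calendar; charset=UTF-8\r\n"
            f"Content-Length: {len(body)}\r\n"
            "\r\n")
    return head.encode("ascii") + body
```

With the values from the transcript, the body comes to 301 bytes: 277 characters of text plus a CR LF after each of the 12 lines, matching the Content-Length: 301 that was sent.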

Other text-based protocols

Also, pretty much anything that communicates over the network using just XML is a text-based protocol too.

Non-text (binary) protocols

LDAP

To understand text-based protocols, it is helpful to see what they are not. Let's take LDAP as an example of a non-text-based protocol.

The Lightweight Directory Access Protocol (LDAP) (RFC 4511) allows access to and management of directories.

Let's say we want to look up my phone number in the University of Michigan MCommunity directory. One way to do this is to use a client (ldapsearch) that is linked against the OpenLDAP client library (libldap.so). The OpenLDAP client library then does all of the work of talking to the LDAP server via the LDAP protocol and interpreting the server's replies:

$ ldapsearch -x -LLL -h ldap.umich.edu -b "ou=People,dc=umich,dc=edu" \
  "(uid=markmont)" "telephoneNumber"
dn: uid=markmont,ou=People,dc=umich,dc=edu
telephoneNumber: 734/763-7413

$ 

If you are writing a program that uses LDAP, you will use an LDAP library. I won't get into LDAP client library APIs here, since each library has its own, and this talk is about protocols rather than how library writers choose to implement interfaces to the protocols.

LDAP - the messages

But what happens behind the scenes?

  1. Client connects to server via TCP on well-known LDAP service port (port 389)
  2. Client sends an LDAP "bind request" message to the server, specifying:
    • which version of LDAP the client is talking (version 3)
    • the username as which to authenticate (an anonymous user)
    • how the client wants to authenticate (simple or SASL, we use simple here)
    • the authentication credentials (an empty password, since we're binding anonymously)
  3. The server responds with a "bind result" message that includes a status code indicating whether the bind request was successful (it was).

LDAP - the messages

  4. The client sends a "search request" message containing:
    • the search base (ou=People,dc=umich,dc=edu)
    • the search scope (search the entire subtree)
    • how aliases should be dereferenced (never)
    • size and time limits for the search (both unlimited)
    • whether to return only attribute names, without values (no)
    • the search filter (uid=markmont)
    • what attributes we want (telephoneNumber)

LDAP - the messages

  5. The server replies with a "search result entry" message containing:
    • the dn that was found (uid=markmont,ou=People,dc=umich,dc=edu)
    • a list of attribute-value pairs (telephoneNumber, 734/763-7413)
  6. The server sends a "search result done" message saying that there are no referrals and that the search was successfully completed.
  7. The client sends an "unbind" message to the server.
  8. The client disconnects from the server.

LDAP - if it were text

The LDAP search is no more complicated than the text-based protocol examples we've looked at today. In fact, you could imagine it like this:

BIND LDAP3 anonymous simple ""
200 Bind successful
SEARCH
Base: ou=People,dc=umich,dc=edu
Filter: uid=markmont
Attributes: telephoneNumber

201 Search result
dn: uid=markmont,ou=People,dc=umich,dc=edu
attribute: telephoneNumber, 734/763-7413

200 Search done
UNBIND

Of course, since LDAP is not a text-based protocol, this is not how the messages are actually exchanged.

LDAP and ASN.1

LDAP represents its messages using ASN.1. Per Wikipedia,

Abstract Syntax Notation One (ASN.1) is a standard and flexible notation that describes rules and structures for representing, encoding, transmitting, and decoding data in telecommunications and computer networking. The formal rules enable representation of objects that are independent of machine-specific encoding techniques.

The first message, the bind request, looks like this in ASN.1:

message1 {
    messageID           1,
    protocolOp          BindRequest: [APPLICATION 0] {
        version         3,
        name            "",
        authentication  simple: [CHOICE 0] ""
    }
}

LDAP and BER

However, ASN.1 notation can't be transmitted directly over a network (except as text, which would defeat the purpose of a binary protocol). So LDAP encodes the ASN.1 data structure using the Basic Encoding Rules (BER). Per Wikipedia,

The BER format specifies a self-describing and self-delimiting format for encoding ASN.1 data structures. Each data element is encoded as a type identifier, a length description, the actual data elements, and, where necessary, an end-of-content marker. [...] This format allows a receiver to decode the ASN.1 information from an incomplete stream, without requiring any pre-knowledge of the size, content, or semantic meaning of the data.

LDAP and BER

The LDAP bind request looks like this (in hex) when the ASN.1 is encoded into BER:

30 0c 02 01 01 60 07 02  01 03 04 00 80 00

Here's what this means:

30 - type (for the message) is SEQUENCE
0c - length (of the sequence) (which follows) is 12 bytes
    02 - type (of the first thing in the sequence) is INTEGER
    01 - length (of the integer) is 1 byte
        01 - value (of the integer) is 1 (this is the messageID value)
    60 - type is application 0 ("BindRequest")
    07 - length (of the application) is 7 bytes
        02 - type (of the first thing in the application) is INTEGER
        01 - length (of the integer) is 1 byte
            03 - value (of the integer) is 3 (this is the version value)
        04 - type (of the second thing in the application) is OCTET STRING
        00 - length (of the string) is 0 bytes (this means the name is "")
            (because the length is 0, no value follows)
        80 - type (of the third thing in the application) is context-specific
             (meaning, it is defined by LDAP) and means CHOICE 0, which in
             turn (for LDAP) means "simple"
        00 - length (of the simple authentication data) is 0 bytes (this
             means there is no password, and, because the length is 0
             no value follows)

This is easier to interpret if you realize it's all just a type, followed by a length, and, if the length is not 0, followed by a value, repeating until the end of the message is reached.
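To make the type-length-value pattern concrete, here is a small Python decoder sketch (parse_ber is a hypothetical name). It assumes single-byte tags (tag numbers below 31) and definite lengths, which covers the LDAP messages shown here, but it is not a complete BER parser. Tags with the 0x20 bit set (such as 30, 60, and 63) are "constructed", meaning their value is itself a series of TLVs:

```python
def parse_ber(data):
    """Decode a run of BER elements into a list of (tag, value) pairs.

    Primitive values are returned as raw bytes; constructed values (tag bit
    0x20 set) are decoded recursively into nested lists.
    """
    items = []
    i = 0
    while i < len(data):
        tag, length = data[i], data[i + 1]
        header = 2
        if length & 0x80:
            # Long form: the low 7 bits give the number of length bytes.
            count = length & 0x7F
            if count == 0:
                raise ValueError("indefinite lengths are not supported")
            length = int.from_bytes(data[i + 2:i + 2 + count], "big")
            header += count
        value = data[i + header:i + header + length]
        if len(value) != length:
            raise ValueError("element runs past the end of the data")
        items.append((tag, parse_ber(value) if tag & 0x20 else value))
        i += header + length
    return items
```

Fed the bind request bytes, it returns the same nesting as the hand-annotated breakdown: a SEQUENCE holding the messageID and an APPLICATION 0 element, which in turn holds the version, the empty name, and the empty simple password.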

LDAP - the complete search transaction

Client sends BindRequest:

  0000:  30 0c 02 01 01 60 07 02  01 03 04 00 80 00         0....`........

Server responds with BindResponse:

  0000:  30 0c 02 01 01 61 07 0a  01 00 04 00 04 00         0....a........

Client sends SearchRequest:

  0000:  30 53 02 01 02 63 4e 04  19 6f 75 3d 50 65 6f 70   0S...cN..ou=Peop
  0010:  6c 65 2c 64 63 3d 75 6d  69 63 68 2c 64 63 3d 65   le,dc=umich,dc=e
  0020:  64 75 0a 01 02 0a 01 00  02 01 00 02 01 00 01 01   du..............
  0030:  00 a3 0f 04 03 75 69 64  04 08 6d 61 72 6b 6d 6f   .....uid..markmo
  0040:  6e 74 30 11 04 0f 74 65  6c 65 70 68 6f 6e 65 4e   nt0...telephoneN
  0050:  75 6d 62 65 72                                     umber

Server responds with SearchResultEntry:

  0000:  30 52 02 01 02 64 4d 04                            0R...dM.
  0008:  26 75 69 64 3d 6d 61 72  6b 6d 6f 6e 74 2c 6f 75   &uid=markmont,ou
  0018:  3d 50 65 6f 70 6c 65 2c  64 63 3d 75 6d 69 63 68   =People,dc=umich
  0028:  2c 64 63 3d 65 64 75 30  23 30 21 04 0f 74 65 6c   ,dc=edu0#0!..tel
  0038:  65 70 68 6f 6e 65 4e 75  6d 62 65 72 31 0e 04 0c   ephoneNumber1...
  0048:  37 33 34 2f 37 36 33 2d  37 34 31 33               734/763-7413

Server sends SearchResultDone:

  0000:  30 0c 02 01 02 65 07 0a  01 00 04 00 04 00         0....e........

Client sends UnbindRequest:

  0000:  30 05 02 01 03 42 00                               0....B.


I'm not going to describe all the fields in each message; hopefully the ASN.1 description and BER encoding of the BindRequest gave you enough of an understanding to follow the rest.
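Going the other direction, a hand-rolled encoder sketch in Python can rebuild three of the client's messages byte for byte from the dumps above. The helper names are hypothetical; only short-form lengths and small integers are handled, and real programs should use an LDAP or ASN.1 library instead:

```python
def tlv(tag, value):
    """Encode one BER element; short-form length only (under 128 bytes)."""
    if len(value) > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([tag, len(value)]) + value


def integer(n):
    """Encode a small non-negative INTEGER (0-127) as a single byte."""
    if not 0 <= n <= 127:
        raise ValueError("only 0-127 handled in this sketch")
    return tlv(0x02, bytes([n]))


def bind_request(message_id, version=3, name=b"", password=b""):
    # [APPLICATION 0] BindRequest with simple ([0]) authentication.
    op = tlv(0x60, integer(version) + tlv(0x04, name) + tlv(0x80, password))
    return tlv(0x30, integer(message_id) + op)


def search_request(message_id, base, attr, value, attributes):
    # [APPLICATION 3] SearchRequest with an equalityMatch ([3]) filter.
    body = (tlv(0x04, base)
            + tlv(0x0A, b"\x02")           # scope: wholeSubtree
            + tlv(0x0A, b"\x00")           # derefAliases: neverDerefAliases
            + integer(0) + integer(0)      # sizeLimit, timeLimit: no limit
            + tlv(0x01, b"\x00")           # typesOnly: FALSE
            + tlv(0xA3, tlv(0x04, attr) + tlv(0x04, value))
            + tlv(0x30, b"".join(tlv(0x04, a) for a in attributes)))
    return tlv(0x30, integer(message_id) + tlv(0x63, body))


def unbind_request(message_id):
    # [APPLICATION 2] UnbindRequest is a NULL: tag 0x42, length 0.
    return tlv(0x30, integer(message_id) + tlv(0x42, b""))
```

For example, search_request(2, b"ou=People,dc=umich,dc=edu", b"uid", b"markmont", [b"telephoneNumber"]) produces the same 85 bytes as the SearchRequest dump.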

Non-text protocols - summary

Hopefully this LDAP example illustrated the following:

Or, in short:

Text-based protocols are awesome!

Additional reading

I highly recommend The Art of Unix Programming by Eric S. Raymond! It's a great book that addresses the intrinsic values of many aspects of system administration and programming. Chapter 5 ("Textuality") was used as a starting point for much of this presentation. The entire book is freely available on the web.

Questions?