python-network-scripting-pp4e

Network Scripting in Programming Python, 4th Edition. This chapter introduces Internet fundamentals and explores sockets, the underlying communications mechanism of the Internet.

Plumbing the Internet

The Socket Layer

In simple terms, sockets are a programmable interface to connections between programs, possibly running on different computers of a network. They allow data formatted as byte strings to be passed between processes and machines. Sockets also form the basis and low-level “plumbing” of the Internet itself: all of the familiar higher-level Net protocols, like FTP, web pages, and email, ultimately occur over sockets. Sockets are also sometimes called communications endpoints because they are the portals through which programs send and receive bytes during a conversation.

The English word "socket" originally means a hole or a receptacle. A socket names an IP address plus a port and serves as the handle of a communication link. A host on the Internet generally runs several service programs and offers several services at once. Each service opens a socket and binds it to a port, and different ports correspond to different services. True to its English meaning, a socket works like a multi-outlet power strip: a host is like a room full of numbered outlets, where some supply 220-volt AC power, some supply 110-volt AC power, and others carry cable TV programming. Client software plugs into outlets with different numbers and thereby obtains different services.

Although often used for network conversations, sockets may also be used as a communication mechanism between programs running on the same computer, taking the form of a general Inter-Process Communication (IPC) mechanism. We saw this socket usage mode briefly in Chapter 5. Unlike some IPC devices, sockets are bidirectional data streams: programs may both send and receive data through them.

In short: sockets can also serve as IPC, and they are bidirectional data streams.

To programmers, sockets take the form of a handful of calls available in a library. These socket calls know how to send bytes between machines, using lower-level operations such as the TCP network transmission control protocol. At the bottom, TCP knows how to transfer bytes, but it doesn’t care what those bytes mean. For the purposes of this text, we will generally ignore how bytes sent to sockets are physically transferred. To understand sockets fully, though, we need to know a bit about how computers are named.

Machine identifiers

  • Machine identifier = machine name (or IP address) + port number; port numbers 0..1023 are reserved for standard protocols
  • Domain name servers (DNS) map domain names to IP addresses: {domain name : IP address}
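
As a quick, hedged illustration of these identifiers (the host name queried here is just an example), the socket module can report the local machine's name and resolve a domain name to its IP address:

In [ ]:
from socket import gethostname, gethostbyname

print(gethostname())                     # this machine's name
print(gethostbyname('www.python.org'))   # DNS lookup: domain name -> IP address string
print(gethostbyname('localhost'))        # loopback address, usually 127.0.0.1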

The Protocol Layer

Python provides support for standard protocols, which automates most of the socket and message formatting details. Standard Internet protocols define a structured way to talk over sockets. They generally standardize both message formats and socket port numbers:

  • Message formats provide structure for the bytes exchanged over sockets during conversations.
  • Port numbers are reserved numeric identifiers for the underlying sockets over which messages are exchanged.

Port number rules

To make it easier for programs to locate the standard protocols, port numbers in the range of 0 to 1023 are reserved and preassigned to the standard higher-level protocols.

Table 12-1. Port Numbers Reserved for Common Protocols

Protocol            Common Function    Port Number    Python Module
HTTP                Web pages          80             http.client, http.server
NNTP                Usenet news        119            nntplib
FTP (data default)  File transfers     20             ftplib
FTP (control)       File transfers     21             ftplib
SMTP                Sending email      25             smtplib
POP3                Fetching email     110            poplib
IMAP4               Fetching email     143            imaplib
Finger              Informational      79             n/a
Telnet              Command lines      23             telnetlib

Clients and servers

On one side of a conversation, machines that support standard protocols perpetually run a set of programs that listen for connection requests on the reserved ports. On the other end of a dialog, other machines contact those programs to use the services they export. We usually call the perpetually running listener program a server and the connecting program a client. Let’s use the familiar web browsing model as an example. As shown in Table 12-1, the HTTP protocol used by the Web allows clients and servers to talk over sockets on port 80:

Server

  • A machine that hosts websites usually runs a web server program that constantly listens for incoming connection requests, on a socket bound to port 80. Often, the server itself does nothing but watch for requests on its port perpetually; handling requests is delegated to spawned processes or threads.

Clients

  • Programs that wish to talk to this server specify the server machine’s name and port 80 to initiate a connection. For web servers, typical clients are web browsers like Firefox, Internet Explorer, or Chrome, but any script can open a client-side connection on port 80 to fetch web pages from the server. The server’s machine name can also be simply “localhost” if it’s the same as the client’s.

Protocol structures

The structure of those message bytes varies from protocol to protocol and is largely hidden by the Python library. For example, the FTP protocol prevents deadlock by conversing over two sockets: one for control messages only and one to transfer file data. An FTP server listens for control messages (e.g., “send me a file”) on one port and transfers file data over another. FTP clients open socket connections to the server machine’s control port, send requests, and send or receive file data over a socket connected to a data port on the server machine. FTP also defines standard message structures passed between client and server. The control message used to request a file, for instance, must follow a standard format.

Python’s Internet Library Modules

In fact, each supported protocol is represented in Python’s standard library either by a module package of the same name as the protocol or by a module file with a name of the form xxxlib.py.

Socket Programming

Although sockets themselves transfer only byte strings, we can also transfer Python objects through them by using Python’s pickle module. Because this module converts Python objects such as lists, dictionaries, and class instances to and from byte strings, it provides the extra step needed to ship higher-level objects through sockets when required.

Beyond basic data communication tasks, the socket module also includes a variety of more advanced tools. For instance, it has calls for the following and more:

  • Converting bytes to a standard network ordering ( ntohl , htonl )
  • Querying machine name and address ( gethostname , gethostbyname )
  • Wrapping socket objects in a file object interface ( sockobj.makefile )
  • Making socket calls nonblocking ( sockobj.setblocking )
  • Setting socket timeouts ( sockobj.settimeout )
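
A brief, hedged sketch of a few of these calls (byte-order conversion and per-socket timeouts); this is not one of the chapter's listings, and exact numeric results vary by machine:

In [ ]:
from socket import socket, AF_INET, SOCK_STREAM, htonl, ntohl

print(htonl(1))                 # host to standard network byte order
print(ntohl(htonl(1)))          # and back again: 1

sockobj = socket(AF_INET, SOCK_STREAM)
sockobj.settimeout(2.0)         # socket calls now raise an exception after 2 seconds
sockobj.setblocking(True)       # back to blocking mode (clears the timeout)
sockobj.close()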

Socket Basics

Server side: open a TCP/IP socket on a port, listen for a message from a client, and send an echo reply; this is a simple one-shot listen/reply conversation per client, but it goes into an infinite loop to listen for more clients as long as this server script runs; the client may run on a remote machine, or on same computer if it uses 'localhost' for server

In [8]:
from socket import *                          # get socket constructor and constants
myHost = ''                                   # '' = all available interfaces on host
myPort = 50007                                # listen on a non-reserved port number

sockobj = socket(AF_INET, SOCK_STREAM)        # make a TCP socket object
sockobj.bind((myHost, myPort))                # bind it to server port number
sockobj.listen(5)                             # listen, allow 5 pending connects

while True:                                   # listen until process killed
    connection, address = sockobj.accept()    # wait for next client connect
    print('Server connected by', address)     # connection is a new socket
    while True:
        data = connection.recv(1024)          # read next line on client socket
        if not data: break                    # until eof when socket closed
        connection.send(b'Echo=>' + data)     # send a reply line to the client
    connection.close()

Client side: use sockets to send data to the server, and print server's reply to each message line; 'localhost' means that the server is running on the same machine as the client, which lets us test client and server on one machine; to test over the Internet, run a server on a remote machine, and set serverHost or argv[1] to machine's domain name or IP addr; Python sockets are a portable BSD socket interface, with object methods for the standard socket calls available in the system's C library;

In [17]:
import sys
from socket import *                          # portable socket interface plus constants
serverHost = 'localhost'                      # server name, or: 'starship.python.net'
serverPort = 50007                            # non-reserved port used by the server

message = [b'Hello network world']            # default text to send to server
                                              # requires bytes: b'' or str.encode()
if len(sys.argv) > 1:
    serverHost = sys.argv[1]                  # server from cmd line arg 1
    if len(sys.argv) > 2:                     # text from cmd line args 2..n
        message = (x.encode() for x in sys.argv[2:])

sockobj = socket(AF_INET, SOCK_STREAM)        # make a TCP/IP socket object
sockobj.connect((serverHost, serverPort))     # connect to server machine + port

for line in message:
    sockobj.send(line)                        # send line to server over socket
    data = sockobj.recv(1024)                 # receive line from server: up to 1K
    print('Client received:', data)           # bytes are quoted: b'...'
sockobj.close()

Server socket calls

Uses the Python socket module to create a TCP socket object. The names AF_INET and SOCK_STREAM are preassigned variables defined by and imported from the socket module; using them in combination means “create a TCP/IP socket,” the standard communication device for the Internet. More specifically, AF_INET means the IP address protocol, and SOCK_STREAM means the TCP transfer protocol. The AF_INET / SOCK_STREAM combination is the default because it is so common, but it’s typical to make this explicit.

In [24]:
sockobj = socket(AF_INET, SOCK_STREAM)

Associates the socket object with an address—for IP addresses, we pass a server machine name and port number on that machine. This is where the server identifies the machine and port associated with the socket. In server programs, the hostname is typically an empty string (''), which means the machine that the script runs on (formally, all available local and remote interfaces on the machine), and the port is a number outside the range 0 to 1023 (which is reserved for standard protocols, described earlier). Note that each unique socket dialog you support must have its own port number; if you try to open a socket on a port already in use, Python will raise an exception. Also notice the nested parentheses in this call—for the AF_INET address protocol socket here, we pass the host/port socket address to bind as a two-item tuple object (pass a string for AF_UNIX).

In [25]:
sockobj.bind((myHost, myPort))

Starts listening for incoming client connections and allows for a backlog of up to five pending requests. The value passed sets the number of incoming client requests queued by the operating system before new requests are denied (which happens only if a server isn’t fast enough to process requests before the queues fill up). A value of 5 is usually enough for most socket-based programs; the value must be at least 1.

In [31]:
sockobj.listen(5)

At this point, the server is ready to accept connection requests from client programs running on remote machines (or the same machine) and falls into an infinite loop— while True (or the equivalent while 1 for older Pythons and ex-C programmers)— waiting for them to arrive:

In [ ]:
connection, address = sockobj.accept()

Waits for the next client connection request to occur; when it does, the accept call returns a brand-new socket object over which data can be transferred from and to the connected client. Connections are accepted on sockobj , but communication with a client happens on connection , the new socket. This call actually returns a two-item tuple— address is the connecting client’s Internet address. We can call accept more than one time, to service multiple client connections; that’s why each call returns a new, distinct socket for talking to a particular client.

Once we have a client connection, we fall into another loop to receive data from the client in blocks of up to 1,024 bytes at a time, and echo each block back to the client:

In [33]:
data = connection.recv(1024)

Reads at most 1,024 more bytes of the next message sent from a client (i.e., coming across the network or IPC connection), and returns it to the script as a byte string. We get back an empty byte string when the client has finished—end-of-file is triggered when the client closes its end of the socket.

In [35]:
connection.send(b'Echo=>' + data)

Sends the latest byte string data block back to the client program, prepending the string 'Echo=>' to it first. The client program can then recv what we send here— the next reply line. Technically this call sends as much data as possible, and returns the number of bytes actually sent. To be fully robust, some programs may need to resend unsent portions or use connection.sendall to force all bytes to be sent.
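
Because send may transmit fewer bytes than requested, a fully robust server can loop on the byte count it returns until everything has gone out; here is a minimal hedged sketch of that idea (connection.sendall does the same job in a single call):

In [ ]:
def send_all_manually(connection, data):
    # keep calling send until every byte of data has been transferred
    while data:
        sent = connection.send(data)     # number of bytes actually sent this call
        data = data[sent:]               # resend whatever remains, if anything

# or simply let the socket object do the looping:
# connection.sendall(b'Echo=>' + data)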

Transferring byte strings and objects

Although the socket model is limited to transferring byte strings, you can send and receive nearly arbitrary Python objects with the standard library pickle object serialization module. Its dumps and loads calls convert Python objects to and from byte strings, ready for direct socket transfer:

In [36]:
import pickle
In [38]:
x = pickle.dumps([99,100])       # on sending end... convert to byte strings
In [41]:
x                                              # string passed to send, returned by recv
Out[41]:
'(lp0\nI99\naI100\na.'
In [42]:
pickle.loads(x)                         # on receiving end... convert back to object
Out[42]:
[99, 100]

For simpler types that correspond to those in the C language, the struct module provides the byte-string conversion we need as well:

In [43]:
import struct
In [44]:
x = struct.pack('>ii', 99 ,100)      # convert simpler types for transmission
In [45]:
x
Out[45]:
'\x00\x00\x00c\x00\x00\x00d'
In [48]:
struct.unpack('>ii',x)
Out[48]:
(99, 100)

Client socket calls

In [50]:
sockobj.connect((serverHost, serverPort))

Opens a connection to the machine and port on which the server program is listening for client connections. This is where the client specifies the string name of the service to be contacted. In the client, we can either specify the name of the remote machine as a domain name (e.g., starship.python.net) or numeric IP address. We can also give the server name as localhost (or the equivalent IP address 127.0.0.1) to specify that the server program is running on the same machine as the client; that comes in handy for debugging servers without having to connect to the Net. And again, the client’s port number must match the server’s exactly. Note the nested parentheses again—just as in server bind calls, we really pass the server’s host/port address to connect in a tuple object.

Once the client establishes a connection to the server, it falls into a loop, sending a message one line at a time and printing whatever the server sends back after each line is sent:

In [ ]:
sockobj.send(line)

Transfers the next byte-string message line to the server over the socket. Notice that the default list of lines contains byte strings (b'...'). Just as on the server, data passed through the socket must be a byte string, though it can be the result of a manual str.encode encoding call or an object conversion with pickle or struct if desired. When lines to be sent are given as command-line arguments instead, they must be converted from str to bytes; the client arranges this by encoding them in a generator expression (a call to map(str.encode, sys.argv[2:]) would have the same effect).

Running Socket Programs Locally

The server keeps running and responds to requests made each time you run the client script in the other window.

Running Socket Programs Remotely

First, upload the server’s source file to a remote machine where you have an account and a Python. The & syntax in Unix/Linux shells can be used to run the server script in the background. Now that the server is listening for connections on the Net, run the client on your local computer multiple times again. This time, the client runs on a different machine than the server, so we pass in the server’s domain or IP name as a client command-line argument. The server still uses a machine name of '' because it always listens on whatever machine it runs on.
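
Concretely, the steps might look like the following commands; the script names match this chapter's echo examples, and the remote machine name is only a placeholder for wherever you uploaded the server:

In [ ]:
# on the remote machine: start the server in the background and leave it running
!python echo-server.py &

# on the local machine: run the client, passing the server's domain name or IP address
!python echo-client.py learning-python.com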

In [51]:
!ping learning-python.com
PING learning-python.com (97.74.215.115) 56(84) bytes of data.
64 bytes from p3nlh266.shr.prod.phx3.secureserver.net (97.74.215.115): icmp_seq=1 ttl=38 time=210 ms
64 bytes from p3nlh266.shr.prod.phx3.secureserver.net (97.74.215.115): icmp_seq=2 ttl=38 time=212 ms
64 bytes from p3nlh266.shr.prod.phx3.secureserver.net (97.74.215.115): icmp_seq=3 ttl=38 time=214 ms
64 bytes from p3nlh266.shr.prod.phx3.secureserver.net (97.74.215.115): icmp_seq=4 ttl=38 time=209 ms
64 bytes from p3nlh266.shr.prod.phx3.secureserver.net (97.74.215.115): icmp_seq=5 ttl=38 time=217 ms
^C
--- learning-python.com ping statistics ---
6 packets transmitted, 5 received, 16% packet loss, time 5004ms
rtt min/avg/max/mdev = 209.249/212.888/217.931/3.055 ms

Socket pragmatics

  1. You can run the client and server like this on any two Internet-aware machines where Python is installed. All you need is a computer that allows sockets, and most do.
  2. The socket module generally raises exceptions if you ask for something invalid.
  3. Be sure to kill the server process before restarting it, or else the port number will still be in use and you'll get another exception (a common mitigation is sketched below).
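
One common way to soften that last problem during development is to set the SO_REUSEADDR option before binding, so a recently closed server port can be reused right away; a hedged sketch, not part of the chapter's original listings:

In [ ]:
from socket import socket, AF_INET, SOCK_STREAM, SOL_SOCKET, SO_REUSEADDR

sockobj = socket(AF_INET, SOCK_STREAM)
sockobj.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)   # allow rebinding a port in TIME_WAIT
sockobj.bind(('', 50007))                         # without this, a quick restart may fail
sockobj.listen(5)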

Spawning Clients in Parallel

In [52]:
import sys
from PP4E.launchmodes import QuietPortableLauncher
In [53]:
numclients = 8
def start(cmdline):
    QuietPortableLauncher(cmdline, cmdline)()
In [54]:
# start('echo-server.py')                     # spawn server locally if not yet started
args = ' '.join(sys.argv[1:])               # pass server name if running remotely
for i in range(numclients):
    start('echo-client.py %s' % args)  # spawn 8? clients to test the server

Talking to Reserved Ports

It’s also important to know that this client and server engage in a proprietary sort of discussion, and so use the port number 50007, outside the range reserved for standard protocols (0 to 1023). There’s nothing preventing a client from opening a socket on one of these special ports, however. For instance, the following client-side code connects to programs listening on the standard email, FTP, and HTTP web server ports on three different server machines:

In [56]:
from socket import *
talk to POP email server
In [57]:
sock = socket(AF_INET,SOCK_STREAM)
In [58]:
sock.connect(('pop.secureserver.net', 110)) 
In [59]:
print sock.recv(70)
+OK <28789.1401261144@p3plpop05-10.prod.phx3.secureserver.net>

In [60]:
sock.close()
talk to FTP server
In [63]:
sock = socket(AF_INET,SOCK_STREAM)
In [64]:
sock.connect(('learning-python.com', 21))
In [65]:
print sock.recv(70)
220---------- Welcome to Pure-FTPd [privsep] [TLS] ----------
220-You
In [60]:
sock.close()
talk to Python's HTTP server
In [78]:
sock = socket(AF_INET,SOCK_STREAM)
In [79]:
sock.connect(('www.python.net', 80))
In [80]:
sock.send(b'GET /\r\n') # fetch root page reply
Out[80]:
7
In [81]:
sock.recv(70)
Out[81]:
'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\r\n    "http://'
In [82]:
sock.recv(70)
Out[82]:
'www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\r\n<html xmlns="http://www.'
In [84]:
sock.close()

Python’s poplib, ftplib, http.client, and urllib.request modules provide higher-level interfaces for talking to servers on these ports.
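
For comparison with the raw-socket dialog above, here is a hedged sketch of fetching the same sort of reply through one of these higher-level modules; the host name simply mirrors the earlier example:

In [ ]:
import http.client                                  # higher-level wrapper over port 80

conn = http.client.HTTPConnection('www.python.net', 80)
conn.request('GET', '/')                            # same request sent manually above
reply = conn.getresponse()
print(reply.status, reply.reason)                   # e.g., 200 OK, or a redirect code
print(reply.read(70))                               # first bytes of the reply body
conn.close()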

Binding reserved port servers

Speaking of reserved ports, it’s all right to open client-side connections on reserved ports as in the prior section, but you can’t install your own server-side scripts for these ports unless you have special permission.

In [86]:
sock = socket(AF_INET,SOCK_STREAM)
In [88]:
sock.bind(('',80))
---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
<ipython-input-88-0c574f672e0b> in <module>()
----> 1 sock.bind(('',80))

/usr/lib/python2.7/socket.pyc in meth(name, self, *args)
    222 
    223 def meth(name,self,*args):
--> 224     return getattr(self._sock,name)(*args)
    225 
    226 for _m in _socketmethods:

error: [Errno 13] Permission denied

Handling Multiple Clients

In real-world client/server programs, it’s far more typical to code a server so as to avoid blocking new requests while handling a current client’s request. Perhaps the easiest way to do so is to service each client’s request in parallel—in a new process, in a new thread, or by manually switching (multiplexing) between clients in an event loop. This isn’t a socket issue per se, and we already learned how to start processes and threads in Chapter 5. But since these schemes are so typical of socket server programming, let’s explore all three ways to handle client requests in parallel here.

Forking Servers

Server side: open a socket on a port, listen for a message from a client, and send an echo reply; forks a process to handle each client connection; child processes share parent's socket descriptors; fork is less portable than threads--not yet on Windows, unless Cygwin or similar installed

In [92]:
import os, time, sys
from socket import *
myHost = ''                                   # '' = all available interfaces on host
myPort = 50007                                # listen on a non-reserved port number

sockobj = socket(AF_INET, SOCK_STREAM)        # make a TCP socket object
sockobj.bind((myHost, myPort))                # bind it to server port number
sockobj.listen(5)                             # listen, allow 5 pending connects

def now():
    return time.ctime(time.time())

activeChildren = []
def reapChildren():                           # reap any dead child processes
    while activeChildren:                     # else may fill up system table
        pid, stat = os.waitpid(0, os.WNOHANG) # don't hang if no child exited
        if not pid: break
        activeChildren.remove(pid)

def handleClient(connection):                 # child process: reply, exit
    time.sleep(5)                             # simulate a blocking activity
    while True:                               # read, write a client socket
        data = connection.recv(1024)          # read next line on client socket
        if not data: break                    # until eof when socket closed
        reply = 'Echo=>%s at %s' % (data, now())
        connection.send(reply.encode())       # send a reply line to the client
    connection.close()
    os._exit(0)

def dispatcher():                             # listen until process killed
    while True:                               # wait for next connection,
        connection, address = sockobj.accept()          # pass to process for service
        print('Server connected by', address, 'at', now())
        reapChildren()                        # clean up exited children now
        childPid = os.fork()                  # copy this process
        if childPid == 0:                     # if in child process: handle
            handleClient(connection)
        else:                                 # else: go accept next connect
            activeChildren.append(childPid)   # add to active child pid list
In [96]:
dispatcher()

Running the forking server

In [ ]:
!netstat -pant | grep 50007 #show 50007 port
!kill -9 pid                             # kill python server

The test proceeds as follows:

  1. The server starts running remotely.
  2. All three clients are started and connect to the server a few seconds apart.
  3. On the server, the client requests trigger three forked child processes, which all immediately go to sleep for five seconds (to simulate being busy doing something useful).
  4. Each client waits until the server replies, which happens five seconds after their initial requests.

In a more realistic application, that delay could be fatal if many clients were trying to connect at once—the server would be stuck in the action we’re simulating with time.sleep , and not get back to the main loop to accept new client requests. With process forks per request, clients can be serviced in parallel.

Killing zombies: dead-but-listed child processes

The ps -af full process listing command shows that all the dead child processes stay in the system process table (listed as <defunct> zombies).

When the reapChildren command is reactivated, dead child zombie entries are cleaned up each time the server gets a new client connection request, by calling the Python os.waitpid function. A few zombies may accumulate if the server is heavily loaded, but they will remain only until the next client connection is received (you get only as many zombies as processes served in parallel since the last accept )

In fact, if you type fast enough, you can actually see a child process morph from a real running program into a zombie. Here, for example, a child spawned to handle a new request changes to <defunct> on exit. Its connection cleans up lingering zombies, and its own process entry will be removed completely when the next request is received.

Preventing zombies with signal handlers on Linux

On some systems, it’s also possible to clean up zombie child processes by resetting the signal handler for the SIGCHLD signal delivered to a parent process by the operating system when a child process stops or exits. If a Python script assigns the SIG_IGN (ignore) action as the SIGCHLD signal handler, zombies will be removed automatically and immediately by the operating system as child processes exit; the parent need not issue wait calls to clean up after them. Because of that, this scheme is a simpler alternative to manually reaping zombies on platforms where it is supported.

In [1]:
# Demo Python's signal module; pass signal number as a command-line arg, and use
# a "kill -N pid" shell command to send this process a signal; on my Linux machine,
# SIGUSR1=10, SIGUSR2=12, SIGCHLD=17, and SIGCHLD handler stays in effect even if
# not restored: all other handlers are restored by Python after caught, but SIGCHLD
# behavior is left to the platform's implementation; signal works on Windows too,
# but defines only a few signal types; signals are not very portable in general
In [4]:
import sys, signal, time
def now():
    return time.asctime()

def onSignal(signum, stackframe):             # Python signal handler
    print('Got signal', signum, 'at', now())
    if signum == signal.SIGCHLD:              # sigchld handler may not stay in effect
        print('sigchld caught')

signum = int(sys.argv[1])
signal.signal(signum, onSignal)               # install signal handler
while True: signal.pause()                    # sleep waiting for signals

To run this script, simply put it in the background and send it signals by typing the kill -signal-number process-id shell command line; this is the shell’s equivalent of Python’s os.kill function available on Unix-like platforms only. Process IDs are listed in the PID column of ps command results. Here is this script in action catching signal numbers 10 (reserved for general use) and 9 (the unavoidable terminate signal).

In [5]:
import os, time, sys, signal
from socket import *
myHost = ''                                   # '' = all available interfaces on host
myPort = 50007                                # listen on a non-reserved port number

sockobj = socket(AF_INET, SOCK_STREAM)        # make a TCP socket object
sockobj.bind((myHost, myPort))                # bind it to server port number
sockobj.listen(5)                             # listen, allow 5 pending connects
signal.signal(signal.SIGCHLD, signal.SIG_IGN) # avoid child zombie processes

def now():
    return time.ctime(time.time())

def handleClient(connection):                 # child process: reply, exit
    time.sleep(5)                             # simulate a blocking activity
    while True:                               # read, write a client socket
        data = connection.recv(1024)          # read next line on client socket
        if not data: break                    # until eof when socket closed
        reply = 'Echo=>%s at %s' % (data, now())
        connection.send(reply.encode())       # send a reply line to the client
    connection.close()
    os._exit(0)

def dispatcher():                             # listen until process killed
    while True:                               # wait for next connection,
        connection, address = sockobj.accept()          # pass to process for service
        print('Server connected by', address, 'at', now())
        childPid = os.fork()                  # copy this process
        if childPid == 0:                     # if in child process: handle
            handleClient(connection)          # parent doesn't wait: SIG_IGN reaps

dispatcher()

Where this scheme applies, it is:
  1. Much simpler; we don’t need to manually track or reap child processes.
  2. More accurate; it leaves no zombies temporarily between client requests.

This technique is not universally supported across all flavors of Unix, however. If you care about portability, manually reaping children as we did in Example 12-4 may still be desirable.

Why multiprocessing doesn’t help with socket server portability

The multiprocessing module doesn't help here: open sockets are not correctly pickled when passed as arguments into a new process, so a multiprocessing-based socket server crashes on Windows, though it happens to work on Linux, where child processes are forked and simply inherit the socket.

Threading Servers

Because threads all run in the same process and memory space, they automatically share sockets passed between them, similar in spirit to the way that child processes inherit socket descriptors. Unlike processes, though, threads are usually less expensive to start, and work on both Unix-like machines and Windows under standard Python today. Furthermore, many (though not all) see threads as simpler to program—child threads die silently on exit, without leaving behind zombies to haunt the server.

In [ ]:
# Server side: open a socket on a port, listen for a message from a client,
# and send an echo reply; echoes lines until eof when client closes socket;
# spawns a thread to handle each client connection; threads share global
# memory space with main thread; this is more portable than fork: threads
# work on standard Windows systems, but process forks do not
In [ ]:
import time, _thread as thread           # or use threading.Thread().start()
from socket import *                     # get socket constructor and constants
myHost = ''                              # server machine, '' means local host
myPort = 50007                           # listen on a non-reserved port number

sockobj = socket(AF_INET, SOCK_STREAM)           # make a TCP socket object
sockobj.bind((myHost, myPort))                   # bind it to server port number
sockobj.listen(5)                                # allow up to 5 pending connects

def now():
    return time.ctime(time.time())               # current time on the server

def handleClient(connection):                    # in spawned thread: reply
    time.sleep(5)                                # simulate a blocking activity
    while True:                                  # read, write a client socket
        data = connection.recv(1024)
        if not data: break
        reply = 'Echo=>%s at %s' % (data, now())
        connection.send(reply.encode())
    connection.close()

def dispatcher():                                # listen until process killed
    while True:                                  # wait for next connection,
        connection, address = sockobj.accept()   # pass to thread for service
        print('Server connected by', address, 'at', now())
        thread.start_new_thread(handleClient, (connection,))

dispatcher()

Remember that a thread silently exits when the function it is running returns; unlike the process fork version, we don’t call anything like os._exit in the client handler function (and we shouldn’t—it may kill all threads in the process, including the main loop watching for new connections!). Because of this, the thread version is not only more portable, but also simpler.
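
As the import comment in the listing suggests, the same dispatcher could be written with the higher-level threading module instead of _thread; a minimal hedged sketch of just the changed loop, assuming the same sockobj, handleClient, and now as above:

In [ ]:
import threading

def dispatcher():                                   # threading.Thread variant
    while True:
        connection, address = sockobj.accept()      # wait for next connection
        print('Server connected by', address, 'at', now())
        handler = threading.Thread(target=handleClient, args=(connection,))
        handler.daemon = True                       # don't block process exit on handlers
        handler.start()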

Standard Library Server Classes

The socketserver module defines classes that implement all flavors of forking and threading servers that you are likely to be interested in.

In [ ]:
"""
Server side: open a socket on a port, listen for a message from a client, and 
send an echo reply; this version uses the standard library module socketserver to
do its work; socketserver provides TCPServer, ThreadingTCPServer, ForkingTCPServer,
UDP variants of these, and more, and routes each client connect request to a new 
instance of a passed-in request handler object's handle method; socketserver also
supports Unix domain sockets, but only on Unixen; see the Python library manual.
"""

import socketserver, time                                # get socket server, handler objects
myHost = ''                             # server machine, '' means local host
myPort = 50007                          # listen on a non-reserved port number
def now():
    return time.ctime(time.time())

class MyClientHandler(socketserver.BaseRequestHandler):
    def handle(self):                           # on each client connect
        print(self.client_address, now())       # show this client's address
        time.sleep(5)                           # simulate a blocking activity
        while True:                             # self.request is client socket
            data = self.request.recv(1024)      # read, write a client socket
            if not data: break
            reply = 'Echo=>%s at %s' % (data, now())
            self.request.send(reply.encode())
        self.request.close()

# make a threaded server, listen/handle clients forever
myaddr = (myHost, myPort)
server = socketserver.ThreadingTCPServer(myaddr, MyClientHandler)
server.serve_forever()

Multiplexing Servers with select

Technically, though, threads and processes don’t really run in parallel, unless you’re lucky enough to have a machine with many CPUs. Instead, your operating system performs a juggling act—it divides the computer’s processing power among all active tasks. It runs part of one, then part of another, and so on. All the tasks appear to run in parallel, but only because the operating system switches focus between tasks so fast that you don’t usually notice. This process of switching between tasks is sometimes called time-slicing when done by an operating system; it is more generally known as multiplexing.

In select-based asynchronous servers, a single main loop, run in a single process and thread, decides which clients should get a bit of attention each time through. Client requests and the main dispatcher loop are each given a small slice of the server’s attention if they are ready to converse.

That is, when the sources passed to select are sockets, we can be sure that socket calls like accept , recv , and send will not block (pause) the server when applied to objects returned by select . Because of that, a single-loop server that uses select need not get stuck communicating with one client or waiting for new ones while other clients are starved for the server’s attention.

Because this type of server does not need to start threads or processes, it can be efficient when transactions with clients are relatively short-lived. However, it also requires that these transactions be quick; if they are not, it still runs the risk of becoming bogged down waiting for a dialog with a particular client to end, unless augmented with threads or forks for long-running transactions.

Confusingly, select-based servers are often called asynchronous, to describe their multiplexing of short-lived transactions. Really, though, the classic forking and threading servers we met earlier are asynchronous, too, as they do not wait for completion of a given client’s request. The clearer distinction is between serial and parallel servers:

  • Synchronous means serial: process one transaction at a time. Asynchronous means parallel: overlap transactions.
  • Forking, threading, and select loops are three alternative ways to implement parallel, asynchronous servers.

A select-based echo server

This server can handle multiple clients without ever starting new processes or threads.

In [ ]:
# P822
"""
Server: handle multiple clients in parallel with select. use the select
module to manually multiplex among a set of sockets: main sockets which
accept new client connections, and input sockets connected to accepted
clients; select can take an optional 4th arg--0 to poll, n.m to wait n.m
seconds, or omitted to wait till any socket is ready for processing.
"""

import sys
import time
from select import select
from socket import socket, AF_INET, SOCK_STREAM


def now():
    return time.ctime(time.time())

myHost = ''                             # server machine, '' means local host
myPort = 50007                          # listen on a non-reserved port number
if len(sys.argv) == 3:                  # allow host/port as cmdline args too
    myHost, myPort = sys.argv[1], int(sys.argv[2])   # port arrives as a string
numPortSocks = 2                        # number of ports for client connects

# make main sockets for accepting new client requests
mainsocks, readsocks, writesocks = [], [], []
for i in range(numPortSocks):
    portsock = socket(AF_INET, SOCK_STREAM)   # make a TCP/IP socket object
    portsock.bind((myHost, myPort))           # bind it to server port number
    portsock.listen(5)                              # listen, allow 5 pending connects
    mainsocks.append(portsock)                # add to main list to identify
    readsocks.append(portsock)                # add to select inputs list
    myPort += 1                               # bind on consecutive ports

# event loop: listen and multiplex until server process killed
print('select-server loop starting')
while True:
    # print(readsocks)
    readables, writeables, exceptions = select(readsocks, writesocks, [])
    for sockobj in readables:
        if sockobj in mainsocks:                     # for ready input sockets
            # port socket: accept new client
            newsock, address = sockobj.accept()      # accept should not block
            print('Connect:', address, id(newsock))  # newsock is a new socket
            readsocks.append(newsock)                # add to select list, wait
        else:
            # client socket: read next line
            data = sockobj.recv(1024)                # recv should not block
            print('\tgot', data, 'on', id(sockobj))
            if not data:                             # if closed by the clients
                sockobj.close()                      # close here and remv from
                readsocks.remove(sockobj)            # del list else reselected
            else:
                # this may block: should really select for writes too
                reply = 'Echo=>%s at %s' % (data, now())
                sockobj.send(reply.encode())

Formally, select is called with three lists of selectable objects (input sources, output sources, and exceptional condition sources), plus an optional timeout. The timeout argument may be a real wait expiration value in seconds (use floating-point numbers to express fractions of a second), a zero value to mean simply poll and return immediately, or omitted to mean wait until at least one object is ready (as done in our server script). The call returns a triple of ready objects—subsets of the first three arguments—any or all of which may be empty if the timeout expired before sources became ready.
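
A small sketch of the three timeout forms, reusing the readsocks and writesocks lists built by the server above:

In [ ]:
from select import select

# poll: return immediately, possibly with three empty lists
readables, writeables, exceptions = select(readsocks, writesocks, [], 0)

# wait at most half a second for any source to become ready
readables, writeables, exceptions = select(readsocks, writesocks, [], 0.5)

# omit the timeout: block until at least one source is ready
readables, writeables, exceptions = select(readsocks, writesocks, [])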

If you’re interested in using select, you will probably also be interested in checking out the asyncore.py module in the standard Python library. It implements a class-based callback model, where input and output callbacks are dispatched to class methods by a precoded select event loop. As such, it allows servers to be constructed without threads or forks, and it is a select-based alternative to the socketserver module’s threading and forking classes we met in the prior sections. As for this type of server in general, asyncore is best when transactions are short—what it describes as “I/O bound” instead of “CPU bound” programs, the latter of which still require threads or forks. See the Python library manual for details and a usage example.

Twisted

For other server options, see also the open source Twisted system (http://twistedmatrix.com). Twisted is an asynchronous networking framework written in Python that supports TCP, UDP, multicast, SSL/TLS, serial communication, and more. It supports both clients and servers and includes implementations of a number of commonly used network services such as a web server, an IRC chat server, a mail server, a relational database interface, and an object broker. Although Twisted supports processes and threads for longer-running actions, it also uses an asynchronous, event-driven model to handle clients, which is similar to the event loop of GUI libraries like tkinter. It abstracts an event loop, which multiplexes among open socket connections, automates many of the details inherent in an asynchronous server, and provides an event-driven framework for scripts to use to accomplish application tasks. Twisted’s internal event engine is similar in spirit to our select-based server and the asyncore module, but it is regarded as much more advanced. Twisted is a third-party system, not a standard library tool; see its website and documentation for more details.

Summary: Choosing a Server Scheme

  1. select

    • perform very well when client transactions are relatively short and are not CPU-bound.
    • split up the processing of a client’s request in such a way that it can be multiplexed with other requests and not block the server’s main loop
    • select also seems more complex than spawning either processes or threads, because we need to manually transfer control among all tasks (for instance, compare the threaded and select versions of our echo server, even without write selects).
  2. threads or forks

    • Threads and forks are especially useful if clients require long-running processing above and beyond the socket calls used to pass data.
  3. The asyncore standard library module

  4. Twisted

Making Sockets Look Like Files and Streams

Socket makefile wrappers allow a script to use standard stream tools such as the print and input built-in functions and sys module file calls (e.g., sys.stdout.write), and to connect them to sockets only when needed.

The socket object makefile method comes in handy anytime you wish to process a socket with normal file object methods or need to pass a socket to an existing interface or program that expects a file.

The makefile method also allows us to treat normally binary socket data as text instead of byte strings, and it has additional arguments such as encoding that let us specify nondefault Unicode encodings for the transferred text.

Although text can always be encoded and decoded with manual calls after binary-mode socket transfers, makefile shifts the burden of text encodings from your code to the file wrapper object.
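
For instance, a client might wrap its socket in text-mode file objects and let the wrappers handle encoding and line endings; a minimal hedged sketch (not one of the chapter's listings), assuming the echo server shown earlier is running on the same machine:

In [ ]:
from socket import socket, AF_INET, SOCK_STREAM

sock = socket(AF_INET, SOCK_STREAM)
sock.connect(('localhost', 50007))                  # talk to the echo server above
wfile = sock.makefile('w', encoding='utf8')         # text-mode wrappers around the socket
rfile = sock.makefile('r', encoding='utf8')

print('Hello file world', file=wfile)               # str out: encoded to bytes by wrapper
wfile.flush()                                       # force the buffered line onto the wire
print(rfile.readline())                             # str in: decoded from bytes by wrapper

wfile.close(); rfile.close(); sock.close()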

A Stream Redirection Utility

Even when line buffering is requested, socket wrapper file writes (and by association, prints) are buffered until the program exits, manual flushes are requested, or the buffer becomes full.

In [ ]:
# socket-unbuff-server.py
from __future__ import print_function
from socket import *           # read three messages over a raw socket
sock = socket()
sock.bind(('', 60000))
sock.listen(5)
print('accepting...')
conn, id = sock.accept()       # blocks till client connect

for i in range(3):
    print('receiving...')
    msg = conn.recv(1024)      # blocks till data received
    print(msg)                 # gets all print lines at once unless flushed
In [ ]:
# socket-unbuff-client.py
# send three msgs over wrapped and raw socket
from __future__ import print_function
import time
from socket import *
sock = socket()                        # default=AF_INET, SOCK_STREAM (tcp/ip)
sock.connect(('localhost', 60000))
# default=full buff, 0=error, 1 not linebuff!
file = sock.makefile('w', buffering=1)

print('sending data1')
file.write('spam\n')
time.sleep(5)               # must follow with flush() to truly send now
# file.flush()               # uncomment flush lines to see the difference

print('sending data2')
# adding more file prints does not flush buffer either
print('eggs', file=file)
time.sleep(5)
# file.flush()               # output appears at server recv only upon
# flush or exit

print('sending data3')
sock.send(b'ham\n')         # low-level byte string interface sends immediately
time.sleep(5)               # received first if don't flush other two!

Buffering in other contexts: Command pipes revisited

Buffered streams and deadlock are general issues that go beyond socket wrapper files.

In [ ]:
# pipe-unbuff-writer.py
# output line buffered (unbuffered) if stdout is a terminal, buffered by default for
# other devices: use -u or sys.stdout.flush() to avoid delayed output on pipe/socket
import time, sys
for i in range(5):
    print(time.asctime())                 # print transfers per stream buffering
    sys.stdout.write('spam\n')            # ditto for direct stream file access
    time.sleep(2)                         # unless sys.stdout reset to other file
In [ ]:
# no output for 10 seconds unless Python -u flag used or sys.stdout.flush()
# but writer's output appears here every 2 seconds when either option is used
from __future__ import print_function
import os
for line in os.popen('python -u pipe-unbuff-writer.py'):    # iterator reads lines
    print(line, end='')                                     # blocks without -u!

Sockets versus command pipes

Why use sockets in this redirection role at all? Command pipes require a direct spawning relationship between the programs involved and do not support longer-lived or remotely running servers the way that sockets do.

  • With sockets, we can start client and server independently, and the server may continue running perpetually to serve multiple clients (albeit with some changes to our utility module’s listener initialization code). Moreover, passing in remote machine names to our socket redirection tools would allow a client to connect to a server running on a completely different machine.
  • Named pipes (fifos), accessed with the open call, support stronger independence of client and server, too, but unlike sockets, they are usually limited to the local machine and are not supported on all platforms.

A Simple Python File Server

implements both the server-side and the client-side logic needed to ship a requested file from server to client machines over a raw socket.

implement client and server-side logic to transfer an arbitrary file from server to client over a socket; uses a simple control-info protocol rather than separate sockets for control and data (as in ftp), dispatches each client request to a handler thread, and loops to transfer the entire file by blocks; see ftplib examples for a higher-level transport scheme

In [6]:
"""
#############################################################################
implement client and server-side logic to transfer an arbitrary file from
server to client over a socket; uses a simple control-info protocol rather
than separate sockets for control and data (as in ftp), dispatches each
client request to a handler thread, and loops to transfer the entire file
by blocks; see ftplib examples for a higher-level transport scheme;
#############################################################################
"""

import sys, os, time, _thread as thread
from socket import *

blksz = 1024
defaultHost = 'localhost'
defaultPort = 50001

helptext = """
Usage...
server=> getfile.py  -mode server            [-port nnn] [-host hhh|localhost]
client=> getfile.py [-mode client] -file fff [-port nnn] [-host hhh|localhost]
"""

def now():
    return time.asctime()

def parsecommandline():
    dict = {}                        # put in dictionary for easy lookup
    args = sys.argv[1:]              # skip program name at front of args
    while len(args) >= 2:            # example: dict['-mode'] = 'server'
        dict[args[0]] = args[1]
        args = args[2:]
    return dict

def client(host, port, filename):
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect((host, port))
    sock.send((filename + '\n').encode())      # send remote name with dir: bytes
    dropdir = os.path.split(filename)[1]       # filename at end of dir path
    file = open(dropdir, 'wb')                 # create local file in cwd
    while True:
        data = sock.recv(blksz)                # get up to 1K at a time
        if not data: break                     # till closed on server side
        file.write(data)                       # store data in local file
    sock.close()
    file.close()
    print('Client got', filename, 'at', now())

def serverthread(clientsock):
    sockfile = clientsock.makefile('r')        # wrap socket in dup file obj
    filename = sockfile.readline()[:-1]        # get filename up to end-line
    try:
        file = open(filename, 'rb')
        while True:
            bytes = file.read(blksz)           # read/send 1K at a time
            if not bytes: break                # until file totally sent
            sent = clientsock.send(bytes)
            assert sent == len(bytes)
    except:
        print('Error downloading file on server:', filename)
    clientsock.close()

def server(host, port):
    serversock = socket(AF_INET, SOCK_STREAM)     # listen on TCP/IP socket
    serversock.bind((host, port))                 # serve clients in threads
    serversock.listen(5)
    while True:
        clientsock, clientaddr = serversock.accept()
        print('Server connected by', clientaddr, 'at', now())
        thread.start_new_thread(serverthread, (clientsock,))

def main(args):
    host = args.get('-host', defaultHost)         # use args or defaults
    port = int(args.get('-port', defaultPort))    # is a string in argv
    if args.get('-mode') == 'server':             # None if no -mode: client
        if host == 'localhost': host = ''         # else fails remotely
        server(host, port)
    elif args.get('-file'):                       # client mode needs -file
        client(host, port, args['-file'])
    else:
        print(helptext)

if __name__ == '__main__':
    args = parsecommandline()
    main(args)
  1. The server function farms out each incoming client request to a thread that trans- fers the requested file’s bytes.
  2. The client function sends the server a file’s name and stores all the bytes it gets back in a local file of the same name.
  3. The most novel feature here is the protocol between client and server: the client starts the conversation by shipping a filename string up to the server, terminated with an end-of-line character and including the file’s directory path on the server. At the server, a spawned thread extracts the requested file’s name by reading the client socket, and opens and transfers the requested file back to the client, one chunk of bytes at a time.

Running the File Server and Clients

One subtle security point here: the server instance code is happy to send any server-side file whose pathname is sent from a client, as long as the server is run with a username that has read access to the requested file. If you care about keeping some of your server-side files private, you should add logic to suppress downloads of restricted files. I’ll leave this as a suggested exercise here, but we will implement such filename checks in a different getfile download tool later in this book.
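
A hedged sketch of one such check, restricting downloads to files under a designated directory; the base directory name is hypothetical, and this is not part of the book's getfile listing:

In [ ]:
import os

BASEDIR = '/home/ftp/pub'                           # hypothetical: only serve files below here

def allowed(filename):
    # resolve '..' and symlinks, then require the result to live under BASEDIR
    fullpath = os.path.realpath(os.path.join(BASEDIR, filename))
    return fullpath.startswith(os.path.join(BASEDIR, ''))

# in serverthread, before open(filename, 'rb'):
# if not allowed(filename): clientsock.close(); return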

Adding a User-Interface Frontend

For instance, it would be easy to implement a simple tkinter GUI frontend to the client-side portion of the getfile script we just met. Such a tool, run on the client machine, may simply pop up a window with Entry widgets for typing the desired filename, server, and so on. Once download parameters have been input, the user interface could either import and call the getfile.client function with appropriate option arguments, or build and run the implied getfile.py command line using tools such as os.system, os.popen, subprocess, and so on.

Using row frames and command lines
In [ ]:
"""
launch getfile script client from simple tkinter GUI;
could also use os.fork+exec, os.spawnv (see Launcher);
windows: replace 'python' with 'start' if not on path;
"""

import os
from tkinter import *
from tkinter.messagebox import showinfo
def onReturnKey():
    cmdline = ('python getfile.py -mode client -file %s -port %s -host %s' %
                      (content['File'].get(),
                       content['Port'].get(),
                       content['Server'].get()))
    os.system(cmdline)
    showinfo('getfilegui-1', 'Download complete')

box = Tk()
labels = ['Server', 'Port', 'File']
content = {}
for label in labels:
    row = Frame(box)
    row.pack(fill=X)
    Label(row, text=label, width=6).pack(side=LEFT)
    entry = Entry(row)
    entry.pack(side=RIGHT, expand=YES, fill=X)
    content[label] = entry

box.title('getfilegui-1')
box.bind('<Return>', (lambda event: onReturnKey()))
mainloop()
Using grids and function calls
In [ ]:
"""
same, but with grids and import+call, not packs and cmdline;
direct function calls are usually faster than running files;
"""

import getfile
from tkinter import *
from tkinter.messagebox import showinfo

def onSubmit():
    getfile.client(content['Server'].get(),
                   int(content['Port'].get()),
                   content['File'].get())
    showinfo('getfilegui-2', 'Download complete')

box    = Tk()
labels = ['Server', 'Port', 'File']
rownum  = 0
content = {}
for label in labels:
    Label(box, text=label).grid(column=0, row=rownum)
    entry = Entry(box)
    entry.grid(column=1, row=rownum, sticky=E+W)
    content[label] = entry
    rownum += 1

box.columnconfigure(0, weight=0)   # make expandable
box.columnconfigure(1, weight=1)
Button(text='Submit', command=onSubmit).grid(row=rownum, column=0, columnspan=2)

box.title('getfilegui-2')
box.bind('<Return>', (lambda event: onSubmit()))
mainloop()
Using a reusable form-layout class

If you’re like me, though, writing all the GUI form layout code in those two scripts can seem a bit tedious, whether you use packing or grids. In fact, it became so tedious to me that I decided to write a general-purpose form-layout class, shown in Example 12-20, which handles most of the GUI layout grunt work.

In [ ]:
"""
##################################################################
a reusable form class, used by getfilegui (and others)
##################################################################
"""

from tkinter import *
entrysize = 40

class Form:                                           # add non-modal form box
    def __init__(self, labels, parent=None):          # pass field labels list
        labelsize = max(len(x) for x in labels) + 2
        box = Frame(parent)                           # box has rows, buttons
        box.pack(expand=YES, fill=X)                  # rows has row frames
        rows = Frame(box, bd=2, relief=GROOVE)        # go=button or return key
        rows.pack(side=TOP, expand=YES, fill=X)       # runs onSubmit method
        self.content = {}
        for label in labels:
            row = Frame(rows)
            row.pack(fill=X)
            Label(row, text=label, width=labelsize).pack(side=LEFT)
            entry = Entry(row, width=entrysize)
            entry.pack(side=RIGHT, expand=YES, fill=X)
            self.content[label] = entry
        Button(box, text='Cancel', command=self.onCancel).pack(side=RIGHT)
        Button(box, text='Submit', command=self.onSubmit).pack(side=RIGHT)
        box.master.bind('<Return>', (lambda event: self.onSubmit()))

    def onSubmit(self):                                      # override this
        for key in self.content:                             # user inputs in
            print(key, '\t=>\t', self.content[key].get())    # self.content[k]

    def onCancel(self):                                      # override if need
        Tk().quit()                                          # default is exit

class DynamicForm(Form):
    def __init__(self, labels=None):
        labels = input('Enter field names: ').split()
        Form.__init__(self, labels)
    def onSubmit(self):
        print('Field values...')
        Form.onSubmit(self)
        self.onCancel()

if __name__ == '__main__':
    import sys
    if len(sys.argv) == 1:
        Form(['Name', 'Age', 'Job'])     # precoded fields, stay after submit
    else:
        DynamicForm()                    # input fields, go away after submit
    mainloop()
In [ ]:
"""
launch getfile client with a reusable GUI form class;
os.chdir to target local dir if input (getfile stores in cwd);
to do: use threads, show download status and getfile prints;
"""

from form import Form
from tkinter import Tk, mainloop
from tkinter.messagebox import showinfo
import getfile, os

class GetfileForm(Form):
    def __init__(self, oneshot=False):
        root = Tk()
        root.title('getfilegui')
        labels = ['Server Name', 'Port Number', 'File Name', 'Local Dir?']
        Form.__init__(self, labels, root)
        self.oneshot = oneshot

    def onSubmit(self):
        Form.onSubmit(self)
        localdir   = self.content['Local Dir?'].get()
        portnumber = self.content['Port Number'].get()
        servername = self.content['Server Name'].get()
        filename   = self.content['File Name'].get()
        if localdir:
            os.chdir(localdir)
        portnumber = int(portnumber)
        getfile.client(servername, portnumber, filename)
        showinfo('getfilegui', 'Download complete')
        if self.oneshot: Tk().quit()  # else stay in last localdir

if __name__ == '__main__':
    GetfileForm()
    mainloop()

One caveat worth pointing out here: the GUI is essentially dead while the download is in progress (even screen redraws aren’t handled—try covering and uncovering the window and you’ll see what I mean). We could make this better by running the download in a thread, but since we’ll see how to do that in the next chapter when we explore the FTP protocol, you should consider this problem a preview.
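
For the curious, here is a minimal hedged sketch of that idea: a hypothetical variant of GetfileForm.onSubmit that runs the transfer off the GUI thread (error handling and completion pop-ups omitted for brevity):

In [ ]:
import threading

def onSubmitThreaded(self):                          # hypothetical GetfileForm method
    servername = self.content['Server Name'].get()   # read form fields as before
    portnumber = int(self.content['Port Number'].get())
    filename   = self.content['File Name'].get()
    worker = threading.Thread(target=getfile.client,
                              args=(servername, portnumber, filename))
    worker.daemon = True                             # GUI stays responsive while it runs
    worker.start()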

In particular, getfile clients can talk only to machines that are running a getfile server. In the next chapter, we’ll discover another way to download files—FTP—which also runs on sockets but provides a higher-level interface and is available as a standard service on many machines on the Net. We don’t generally need to start up a custom server to transfer files over FTP, the way we do with getfile . In fact, the user-interface scripts in this chapter could be easily changed to fetch the desired file with Python’s FTP tools, instead of the getfile module. But instead of spilling all the beans here, I’ll just say, “Read on.”

Using Serial Ports

If you’re looking for a lower-level way to communicate with devices in general, though, you may also be interested in the topic of Python’s serial port interfaces. This isn’t quite related to Internet scripting, but it’s similar enough in spirit and is discussed often enough on the Net to merit a few words here.

In brief, scripts can use serial port interfaces to engage in low-level communication with things like mice, modems, and a wide variety of serial devices and hardware. Serial port interfaces are also used to communicate with devices connected over infrared ports (e.g., hand-held computers and remote modems). Such interfaces let scripts tap into raw data streams and implement device protocols of their own. Other Python tools such as the ctypes and struct modules may provide additional tools for creating and extracting the packed binary data these ports transfer.

At this writing, there are a variety of ways to send and receive data over serial ports in Python scripts. Notable among these options is an open source extension package known as pySerial, which allows Python scripts to control serial ports on both Windows and Linux, as well as BSD Unix, Jython (for Java), and IronPython (for .NET and Mono). Unfortunately, there is not enough space to cover this or any other serial port option in any sort of detail in this text. As always, see your favorite web search engine for up-to-date details on this front.
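
As a small taste of what pySerial code looks like, here is a minimal hedged sketch; the device name, baud rate, and bytes sent are placeholders, and the package must be installed separately (e.g., pip install pyserial):

In [ ]:
import serial                                        # third-party pySerial package

port = serial.Serial('/dev/ttyUSB0', 9600, timeout=1)   # device path is hypothetical
port.write(b'AT\r\n')                                # send raw bytes to the device
reply = port.readline()                              # read a line back, or b'' on timeout
print(reply)
port.close()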