Python调用COM的API实现Microsoft Office(PPT, Excel, Word)办公自动化

In [3]:
from IPython.core.display import Image
pic = lambda p : Image(filename='./resPPT/%s' % p)

Automating Microsoft Office with Python

Windows applications, for many years, have provided a COM API for automation. This includes Microsoft Office as well.

pywin32 is a library that lets you do many interesting things in Windows, including access these COM APIs.

For example, to open PowerPoint and draw a circle, this is what it takes:

Python实现Microsoft Office办公自动化

多年以前,微软就为Windows软件出了COM组件,便于调用其API实现自动化,也包括Microsoft Office。



In [17]:
import win32com.client

# Open PowerPoint
Application = win32com.client.Dispatch("PowerPoint.Application")

# Add a presentation
Presentation = Application.Presentations.Add()

# Add a slide with a blank layout (12 stands for blank layout)
Base = Presentation.Slides.Add(1, 12)

# Add an oval. Shape 9 is an oval.
oval = Base.Shapes.AddShape(9, 100, 100, 100, 100)
In [4]:

You'll have to try this out to see the result, but just FYI, this will open a new PowerPoint window and add a slide with a circle in it.

This opens up a lot of opportunities for slideware. Similarly, we can open an Excel application, add a circle, and change a few cells.



In [6]:
# Open Excel
Application = win32com.client.Dispatch("Excel.Application")

# Show Excel. Unlike PPT, Word & Excel open up "hidden"
Application.Visible = 1

# Add a workbook
Workbook = Application.Workbooks.Add()

# Take the active sheet
Base = Workbook.ActiveSheet

# Add an oval. Shape 9 is an oval.
oval = Base.Shapes.AddShape(9, 100, 100, 100, 100)

# In the first row, add Values: 0.0, 0.5, 1.0
Base.Cells(1, 1).Value = 'Values'
Base.Cells(1, 2).Value = 0.0
Base.Cells(1, 3).Value = 0.5
Base.Cells(1, 4).Value = 1.0
In [7]:

This means one can go about creating Excel models directly with Python.


Picturing the IMDb Top 250

Let's begin by creating a slide that shows all of the Top 250 movies on the IMDb.

First, let's load all the movies.

画图——IMDb Top 250

让我们做点儿更好玩儿的,用PPT显示IMDb的Top 250 电影

In [18]:
from lxml.html import parse

tree = parse('')
movies = tree.findall('.//table[@class="chart"]//td[@class="titleColumn"]//a')
'The Shawshank Redemption'

We can show these movies. But before that, we can't remember these numbers for the shapes (like 9 is for circle). Let's pre-define those in line with how Office uses them and import them.


In [79]:
# 使用win32com里面的makepy -d生成MSO, MSPPT
import MSO, MSPPT
g = globals()
for c in dir(MSO.constants):    g[c] = getattr(MSO.constants, c)
for c in dir(MSPPT.constants):  g[c] = getattr(MSPPT.constants, c)

Let's draw each movie poster as a little box on a 25x10 grid. We don't have the images yet, but first, let's just draw the rectangles.


In [40]:
Base = Presentation.Slides.Add(1, ppLayoutBlank)

width, height = 28, 41
for i, movie in enumerate(movies):
    # 25 columns one row, column step is 28
    x = 10 + width * (i % 25)
    # 10 rows one column, row step is 41
    y = 100 + height * (i // 25)
    r = Base.Shapes.AddShape(
            5, # Shape Rectangle
            x, y,
            width, height)
In [8]:

It would be nice to get posters into those, so let's scrape the posters.


In [16]:
import os
from urlparse import urljoin
from urllib import urlretrieve
from hashlib import md5

# We'll keep the files under an img/ folder
if not os.path.exists('img'):
def filename(movie):
    '''Filename = MD5 hash of its title in UTF8'''
    name = md5(movie.text_content().encode('utf8')).hexdigest()
    return os.path.join('img', name + '.jpg')
for movie in movies:
    if os.path.exists(filename(movie)):
    url = urljoin('', movie.get('href'))
    tree = parse(url)
    img = tree.find('.//td[@id="img_primary"]//img')
    urlretrieve(img.get('src'), filename(movie))

Now, instead of just rectangles, we'll use the posters.


In [22]:
Base = Presentation.Slides.Add(1, 12)

width, height = 28, 41
for i, movie in enumerate(movies):
    x = 10 + width * (i % 25)
    y = 100 + height * (i // 25)
    image = Base.Shapes.AddPicture(
        Left=x, Top=y,
        Width=width, Height=height)
In [9]:

Wouldn't it be nice to have these hyperlinked to the movies?


In [36]:
Base = Presentation.Slides.Add(1, 12)

width, height = 28, 41
for i, movie in enumerate(movies):
    x = 10 + width * (i % 25)
    y = 100 + height * (i // 25)
    image = Base.Shapes.AddPicture(
        Left=x, Top=y,
        Width=width, Height=height)
    url = urljoin('', movie.get('href'))
    link = image.ActionSettings(ppMouseClick).Hyperlink
    link.Address = url
    link.ScreenTip = movie.text_content().encode('cp1252')

This is ordered by rank, which is useful, but this makes it hard to locate a specific movie. What if we could sort this alphabetically?

But then, we don't want to lose the ordering by rank either. Could we, perhaps, get these movies to move on the click of a button to alphabetical or rank order?

Let's start by adding two buttons -- one to sort alphabetically and the ohter to sort by rank.




In [47]:
Base = Presentation.Slides.Add(1, 12)

# Add two buttons: alphabetical and by rating
button_alpha = Base.Shapes.AddShape(5, 400, 10, 150, 40)
button_alpha.TextFrame.TextRange.Text = 'Alphabetical'

button_rating = Base.Shapes.AddShape(5, 560, 10, 150, 40)
button_rating.TextFrame.TextRange.Text = 'By rating'

# Get the index position when sorted alphabetically
movies_alpha = sorted(movies, key=lambda v: v.text_content())
index_alpha = dict((movie.text_content(), i) for i, movie in enumerate(movies_alpha))

We'll create a function that moves an image along a path when a trigger is clicked. This will be applied to each of the images.


In [54]:
def animate(seq, image, trigger, path, duration=1.5):
    '''Move image along path when trigger is clicked'''
    effect = seq.AddEffect(
    ani = effect.Behaviors.Add(msoAnimTypeMotion)
    ani.MotionEffect.Path = path
    effect.Timing.TriggerType = msoAnimTriggerWithPrevious
    effect.Timing.TriggerShape = trigger
    effect.Timing.Duration = duration

Finally, we draw all the images. After drawing them, we specify one animation for alphabetical ordering, and another for ordering by rating.


In [55]:
seq_alpha = Base.TimeLine.InteractiveSequences.Add()
seq_rating = Base.TimeLine.InteractiveSequences.Add()
width, height = 28, 41
for i, movie in enumerate(movies):
    x = 10 + width * (i % 25)
    y = 100 + height * (i // 25)
    image = Base.Shapes.AddPicture(
        Left=x, Top=y,
        Width=width, Height=height)
    url = urljoin('', movie.get('href'))
    link = image.ActionSettings(ppMouseClick).Hyperlink
    link.Address = url
    link.ScreenTip = movie.text_content().encode('cp1252')
    # Alphabetical 
    index = index_alpha[movie.text_content()]
    animate(seq_alpha, image, trigger=button_alpha, path='M0,0 L{:.3f},{:.3f}'.format(
        (10 + width * (index % 25) - x) / 720.,
        (100 + height * (index // 25) - y) / 540.,
    # By rating
    animate(seq_rating, image, trigger=button_rating, path='M{:.3f},{:.3f} L0,0'.format(
        (10 + width * (index % 25) - x) / 720.,
        (100 + height * (index // 25) - y) / 540.,
In [10]:

Simple Treemap

Let’s do something that you’d have trouble doing manually in PowerPoint: a Treemap. The Guardian’s data store kindly makes available the top 50 banks by assets that we’ll use for this example. Our target output is a simple Treemap visualisation.


我们再用PPT做一个费事的Treemap。我们将使用Guardian’s data store里面的top 50 banks by assets数据来做一个简单的矩形树状结构图

In [85]:
Application = win32com.client.Dispatch("PowerPoint.Application")
Application.Visible = True
Presentation = Application.Presentations.Add()
Slide = Presentation.Slides.Add(1, ppLayoutBlank)

Now let’s import data from The Guardian. The spreadsheet is Google Spreadsheets and we can get just the banks and assets as a CSV file by adding "&output=csv&range=B2:C51" (via OUseful.Info).

现在我们导入数据Guardian. 数据在Google Spreadsheets,我们可以通过在链接后面增加"&output=csv&range=B2:C51"获取数据 (参考 OUseful.Info).

In [86]:
import urllib2, csv
url = ''
reader = csv.reader(urllib2.urlopen(url))
data = list((int(size.replace(',','')), bank.decode('utf8')) for bank, size in reader)

I created a simple Treemap class based on the squarified algorithm — you can play with the source code. This Treemap class can be fed the the data in the format we have, and a draw function. The draw function takes (x, y, width, height, data_item) as parameters, where data_item is a row in the data list that we pass to it.

我用squarified algorithm建了一个simple Treemap类,你可以在源代码看到。这个Treemap类可以把数据转换成指定的格式。

In [87]:
class Treemap:
    def __init__(self, width, height, data, draw):
        '''Treemap(width, height, data, fn) '''
        self.x, self.y = 0.0, 0.0
        self.scale  = (float(width * height) / self.get(data, sum)) ** 0.5
        self.width  = float(width)  / self.scale
        self.height = float(height) / self.scale
        self.draw   = draw
        self.squarify(data, [], min(self.width, self.height))

    def get(self, data, fn): return fn((x[0] for x in data))

    def layoutrow(self, row):
        if self.width >= self.height:
            dx = self.get(row, sum) / self.height
            step = self.height / len(row)
            for i,v in enumerate(row): self.draw(self.scale * self.x, self.scale * (self.y + i * step), self.scale * dx, self.scale * step, v)
            self.x += dx
            self.width -= dx
            dy = self.get(row, sum) / self.width
            step = self.width / len(row)
            for i,v in enumerate(row): self.draw(self.scale * (self.x + i * step), self.scale * self.y, self.scale * step, self.scale * dy, v)
            self.y += dy
            self.height -= dy

    def aspect(self, row, w):
        s = self.get(row, sum)
        return max(w*w*self.get(row, max)/s/s, s*s/w/w/self.get(row, max))

    def squarify(self, children, row, w):
        if not children:
            if row: self.layoutrow(row)
        c = children[0]
        if not row or self.aspect(row, w) > self.aspect(row + [c], w):
            self.squarify(children[1:], row + [c], w)
            self.squarify(children, [], min(self.height, self.width))

还包括一个画图函数。画图函数参数为(x, y, width, height, data_item),其中data_item是输入数据。

In [88]:
def draw(x, y, w, h, n):
    shape = Slide.Shapes.AddShape(msoShapeRectangle, x, y, w, h)
    shape.TextFrame.TextRange.Text = n[1] + ' (' + str(int(n[0]/1000 + 500)) + 'M)'
    shape.TextFrame.MarginLeft = shape.TextFrame.MarginRight = 0
    shape.TextFrame.TextRange.Font.Size = 12
Treemap(720, 540, data, draw)
<__main__.Treemap instance at 0x08BFF148>
In [11]:

Real-life examples

You can see how these were put to use at Gramener:

  • Vijay Karnataka's election coverage was produced entirely in PowerPoint using Python. (And in Kannada, no less!)
  • Appstore downloads -- a mini interactive application that shows the number of downloads from an app store, broken up by country, device, customer segment and time.
  • Revenue breakup of an FMCG company as a clickable application



Live tweeting

Just for kicks, let's use PowerPoint as a dashboard to show live tweets.

I picked TwitterAPI to get streaming results, but twython and Python Twitter Tools look fine too.

Live tweeting


我用TwitterAPI来获取信息,twythonPython Twitter Tools当然也行。

In [60]:
from TwitterAPI import TwitterAPI

# I'm keeping my keys and secrets in a secret file.
from secret_twitter import consumer_key, consumer_secret, access_token_key, access_token_secret
api = TwitterAPI(consumer_key, consumer_secret, access_token_key, access_token_secret)

This function will draw a tweet in a reasonably nice way on a slide. There's a block each for the profile picture, the text of the tweet, and the name of the user.


In [64]:
def draw_tweet(Base, item, pos):
    y = 40 + (pos % 4) * 120
    image = Base.Shapes.AddPicture(
        # To get the larger resolution image, just remove _normal from the URL
        item['user']['profile_image_url'].replace('_normal', ''),
        Left=20, Top=y,
        Width=100, Height=100)
        status = item['text'].encode('cp1252')
    except UnicodeEncodeError:
        status = item['text']
    text = Base.Shapes.AddShape(5, 130, y, 460, 100)
    text.Fill.ForeColor.ObjectThemeColor = msoThemeColorText1
    text.Fill.ForeColor.Brightness = +0.95
    text.Line.Visible = msoFalse
    text.TextFrame.TextRange.Text = status
    text.TextFrame.TextRange.Font.Color.ObjectThemeColor = msoThemeColorText1
    text.TextFrame.TextRange.ParagraphFormat.Alignment = ppAlignLeft
    user = Base.Shapes.AddShape(msoShapeRectangle, 600, y, 100, 100)
    user.Fill.ForeColor.ObjectThemeColor = msoThemeColorAccent6
    user.Line.Visible = False
    user.TextFrame.TextRange.Text = '@' + item['user']['screen_name']

Let's track requests for specific words, and see what we get.


In [ ]:
Base = Presentation.Slides.Add(1, ppLayoutBlank)
api.request('statuses/filter', {'track': '1'})

for pos, item in enumerate(api.get_iterator()):
    draw_tweet(Base, item, pos)
    if pos > 10: