A Comparison of Python Extension Methods and Tools

ekylin 2006-08-10



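I. The Standard Way to Write an Extension
Extending Python involves four steps:
1. Write the source code (C, C++, Java, ...);
2. Write wrapper code for the source.
This takes four steps:
- #include "Python.h";
- write a wrapper for each module function, of the form PyObject* Module_func();
- list each module function in the method table, PyMethodDef ModuleMethods[];
- write the module initialization function, void initModule().
3. Compile and link.
There are two ways to do this:
(1) Use the distutils package. The steps are:
- write a setup.py script for the module, based on distutils;
- run $ python setup.py build or $ python setup.py install as needed to produce the shared library for the extension module.
(2) Compile the module into a shared library directly with gcc: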
$ gcc -fpic -c -I/usr/include/python2.2 -I/usr/lib/python2.2/config foo.c wrap_foo.c
$ gcc -shared -o foo.so foo.o wrap_foo.o
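This produces a foo module that Python can load.
4. Use the extension.
Start Python and import foo to use the functions in the module; for example, foo.func() calls the func() function of the foo module.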
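
The gcc commands in step 3 hard-code the include directory of one Python installation. As a rough sketch, the same two commands can be assembled for whichever interpreter is running by asking the standard sysconfig module where Python.h lives. The helper name build_commands is hypothetical, and the sketch only builds the command lines; it does not run the compiler.

```python
import sysconfig


def build_commands(module, sources, compiler="gcc"):
    """Return the compile and link commands for a hand-built extension module."""
    # Directory that contains Python.h for the running interpreter.
    include_dir = sysconfig.get_paths()["include"]
    # Platform-specific suffix for extension modules; fall back to ".so".
    suffix = sysconfig.get_config_var("EXT_SUFFIX") or ".so"
    objects = [src.rsplit(".", 1)[0] + ".o" for src in sources]
    compile_cmd = [compiler, "-fpic", "-c", "-I" + include_dir] + list(sources)
    link_cmd = [compiler, "-shared", "-o", module + suffix] + objects
    return [compile_cmd, link_cmd]


if __name__ == "__main__":
    for command in build_commands("foo", ["foo.c", "wrap_foo.c"]):
        print(" ".join(command))
```

The printed lines have the same shape as the two gcc commands above, with the include path and the output suffix taken from the current installation.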

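II. Extending Python with C++
Python can also be extended in C++, with a few restrictions. If the main program (the Python interpreter) is compiled and linked by a C compiler, global or static objects with constructors cannot be used; this is not a problem if the main program is linked by a C++ compiler. In addition, functions that the Python interpreter calls (in particular the module initialization function) must be declared with extern "C". The sample below also wraps the Python header in extern "C" {...}; the standard Python headers already do this themselves when compiled as C++, so the extra wrapper matters only for headers that lack such a guard.
A sample program (the wrapper file):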
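extern "C"
{
#include "Python.h"
}

// Wrapper for print_logo(): parses one string argument and returns None.
PyObject *cppextest_print_logo(PyObject *self, PyObject *args)
{
    char *string;
    if (!PyArg_ParseTuple(args, "s", &string))
        return NULL;
    // Py_None is a shared object: take a reference before returning it.
    Py_INCREF(Py_None);
    return Py_None;
}

// Method table: maps the Python-level names to the C++ wrappers.
static PyMethodDef cppextestMethods[] = {
    {"print_logo", cppextest_print_logo, METH_VARARGS},
    {NULL, NULL},
};

// Module initialization function, called by the interpreter on import.
extern "C"
void initcppextest(void)
{
    Py_InitModule("cppextest", cppextestMethods);
}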

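III. Extending with Tools
Writing an extension by hand is not complicated, but a number of established tools can simplify the process.
(1) SWIG
Created by David Beazley, SWIG is an automatic extension building tool. It reads annotated C/C++ header files and generates wrapper code for Python, Tcl, Perl and many other scripting languages. SWIG can wrap a large number of C++ features into a Python extension module. See http://www. for details.
Assessment: SWIG is simple and supports many scripting languages, but its coverage of C++ features is incomplete.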

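(2) SIP
Created by Phil Thompson, SIP is a C++ module builder that generates wrappers specifically for C++ classes. It is best known for having been used to build the PyQt and PyKDE extension modules. See http://www./sip/ for details.
Assessment: very thorough support for C++ features, but fairly complex.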

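(3) bgen
Maintained by Jack Jansen, this tool is part of the module building tool set included in the standard Python distribution. It is used to generate the Python extension modules available in the Macintosh version of Python.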

(4) pyfort
Created by Paul Dubois, pyfort generates extension modules from Fortran code. See http://pyfortran. for details.

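(5) cxx
Also created by Paul Dubois, CXX is a library that provides a friendly API for writing Python extensions in C++. It lets many Python objects (such as lists and tuples) be used in STL algorithms, and it translates C++ exceptions into Python exceptions. See http://cxx. for details.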

(6) WrapPy
Created by Greg Couch, WrapPy generates extension modules by reading C++ header files. See http://www.cgl./home/gregc/wrappy/index.html for details.

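(7) Boost Python Library
Created by David Abrahams. This library wraps a wide and distinctive range of C++ features for Python, while requiring very little additional information to be written for the C++ classes being exposed. See http://www./libs/python/doc for details.
Assessment: Boost provides many practical C++ libraries, such as Regex (regular expressions), Graph (graph components and algorithms), Concept Check (checking concepts in generic programming), Thread (portable C++ multithreading), Python (mapping C++ classes and functions into Python) and Pool (memory pool management).
Overall, Boost is a set of high-quality, highly practical libraries with a strong emphasis on cross-platform support. Some parts of Boost are experimental, however, and should be used with caution in production development.
Boost.Python supports a broad range of C++ features, but is fairly complex.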

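IV. Using the Extension Tools
1. SWIG
SWIG builds C/C++ extensions for many targets, including Python, Tcl, Perl, CHICKEN, PHP and XML. It exposes C/C++ code through two mechanisms: interface functions and proxy classes.
1) How it works
(1) Interface functions
SWIG generates a set of accessor functions that hide the underlying implementation of a structure. For example, the structure: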
struct Vector {
Vector();
~Vector();
double x,y,z;
};
is converted into the following set of functions:
Vector *new_Vector();
void delete_Vector(Vector *v);
double Vector_x_get(Vector *v);
double Vector_y_get(Vector *v);
double Vector_z_get(Vector *v);
void Vector_x_set(Vector *v, double x);
void Vector_y_set(Vector *v, double y);
void Vector_z_set(Vector *v, double z);

These functions can then be used from the interpreter (a Tcl session in this example):
% set v [new_Vector]
% Vector_x_set $v 3.5
% Vector_y_get $v
% delete_Vector $v
% ...

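(2) Proxy classes
A proxy class, also called a shadow class, is a stand-in for a real C++ class. With proxy classes there are really two objects at work: one in the scripting language and one underlying C/C++ object. Operations affect both objects, but the user sees only one. For example, given the following C++ definition: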
class Vector {
public:
Vector();
~Vector();
double x,y,z;
};
the proxy class mechanism makes access to the structure transparent. In Python, for example, it can be used directly:
>>> v = Vector()
>>> v.x = 3
>>> v.y = 4
>>> v.z = -13
>>> ...
>>> del v
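
The relationship between the two layers can be sketched in pure Python. This is only a simulation under stated assumptions, not the code SWIG generates: a dict stands in for the underlying C++ Vector object (and is zero-initialized, which the C++ struct is not), the accessor functions that would normally be compiled C code are ordinary Python functions, and the helper _accessors is hypothetical.

```python
# Low-level layer: one function per operation, as in the interface-function list.
def new_Vector():
    return {"x": 0.0, "y": 0.0, "z": 0.0}


def delete_Vector(handle):
    handle.clear()


def _accessors(field):
    """Build the get/set pair for one member, e.g. Vector_x_get / Vector_x_set."""
    def getter(handle):
        return handle[field]

    def setter(handle, value):
        handle[field] = float(value)

    return getter, setter


Vector_x_get, Vector_x_set = _accessors("x")
Vector_y_get, Vector_y_set = _accessors("y")
Vector_z_get, Vector_z_set = _accessors("z")


class Vector:
    """Proxy class: the script-side object that forwards to the low-level one."""

    def __init__(self):
        self._handle = new_Vector()

    def __del__(self, _delete=delete_Vector):
        _delete(self._handle)

    x = property(lambda self: Vector_x_get(self._handle),
                 lambda self, value: Vector_x_set(self._handle, value))
    y = property(lambda self: Vector_y_get(self._handle),
                 lambda self, value: Vector_y_set(self._handle, value))
    z = property(lambda self: Vector_z_get(self._handle),
                 lambda self, value: Vector_z_set(self._handle, value))


if __name__ == "__main__":
    v = Vector()
    v.x = 3
    v.y = 4
    v.z = -13
    print(v.x, v.y, v.z)
```

Every attribute access on the proxy turns into a call to the matching accessor function, which is why the user sees a single object even though two are involved.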

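2) Supported features and limitations
SWIG currently supports the following C++ features:
- classes
- constructors and destructors
- virtual functions
- public inheritance (including multiple inheritance)
- static functions
- function and method overloading
- overloading of most standard operators
- references
- templates
- function pointers
- namespaces
Although SWIG can parse most C/C++ declarations, it does not provide a complete parser. The limitations involve some very complicated type declarations and certain advanced C++ features. The following are currently not supported:
- Non-conventional type declarations. For example, SWIG does not support declarations such as the following: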
/* Non-conventional placement of storage specifier (extern) */
const int extern Number;
/* Extra declarator grouping */
Matrix (foo); // A global variable
/* Extra declarator grouping in parameters */
void bar(Spam (Grok)(Doh));
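- Running SWIG directly on C++ source files is problematic. Although SWIG can parse C++ class declarations, it silently skips declarations that it does not support.
- Certain advanced C++ features are currently not supported, for example:
    - friends
    - private and protected members
    - overloading of some operators (such as new and delete)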

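3) Usage
Extending Python in C++ with SWIG involves the following steps:
- write the C++ source code;
- write an interface file with the suffix .i or .swg that names the header files and the classes to be wrapped;
- compile and link to produce a shared library;
- use the extension.

(1) Running SWIG
Once SWIG is installed, run it with a command of the following form:
$ swig [ options ] filename
The options include: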
-chicken        Generate CHICKEN wrappers
-csharp         Generate C# wrappers
-guile          Generate Guile wrappers
-java           Generate Java wrappers
-mzscheme       Generate Mzscheme wrappers
-ocaml          Generate Ocaml wrappers
-perl           Generate Perl wrappers
-php            Generate PHP wrappers
-pike           Generate Pike wrappers
-python         Generate Python wrappers
-ruby           Generate Ruby wrappers
-sexp           Generate Lisp S-Expressions wrappers
-tcl            Generate Tcl wrappers
-xml            Generate XML wrappers
-c++            Enable C++ parsing
-Dsymbol        Define a preprocessor symbol
-Fstandard      Display error/warning messages in commonly used format
-Fmicrosoft     Display error/warning messages in Microsoft format
-help           Display all options
-Idir           Add a directory to the file include path
-lfile          Include a SWIG library file.
-module name    Set the name of the SWIG module
-o outfile      Name of output file
-outdir dir     Set language specific files output directory
-swiglib        Show location of SWIG library
-version        Show SWIG version number
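This is only a subset of the command-line options; each target language has additional options of its own. Run "swig -help" or "swig -<lang> -help" to see the full list.

filename is the SWIG interface file written by the user.

(2) SWIG input
The input is the interface file, usually with the suffix .i or .swg.
It typically has the following form: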
%module mymodule
%{
#include "myheader.h"
%}
// Now list ANSI C/C++ declarations
int foo;
int bar(int x);
...
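The module name is given with the "%module" directive (or the -module command-line option). The directive must appear at the beginning of the file and names the extension module to be generated. If the name is supplied on the command line, the "%module" directive is not needed.
Header includes and other supporting declarations go inside "%{ ... %}". Everything in that block is copied verbatim into the wrapper file that SWIG generates. SWIG directives such as %rename and %ignore are written outside the block, alongside the declarations.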
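
To make the layout concrete, here is a small sketch that assembles the text of such an interface file from its three parts: the module name, the header to include, and the declarations to wrap. The function make_interface is a hypothetical helper written for this article; SWIG itself only needs the resulting text saved to a .i file.

```python
def make_interface(module, header, declarations):
    """Return the text of a minimal SWIG interface file."""
    lines = [
        "%module " + module,           # names the extension module
        "%{",                          # start of the verbatim block
        '#include "' + header + '"',
        "%}",                          # end of the verbatim block
        "// Declarations to wrap",
    ]
    lines.extend(declarations)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_interface("mymodule", "myheader.h",
                         ["int foo;", "int bar(int x);"]))
```

Called with the names used above, it reproduces the sample interface file, apart from the comment line and the trailing "...".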

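(3) SWIG output
SWIG's output is a wrapper file, possibly accompanied by other files depending on the target language. By default, an input file named file.i produces file_wrap.c, or file_wrap.cxx when the -c++ option is used. Compilers usually determine the source language (C, C++ and so on) from the file suffix, so the name of the output file can be changed with the -o option. For example: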
$ swig -c++ -python -o example_wrap.cpp example.i
The wrapper file that SWIG creates can be compiled and linked into a shared library as it is; the generated file does not need to be edited.
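
The naming rule can be sketched as a small helper that builds the SWIG command line for a Python target and reports which wrapper file to expect. The function name swig_python_command is hypothetical; it uses only the -c++, -python and -o options listed earlier and the default names described in this section, and it does not run SWIG.

```python
import os


def swig_python_command(interface, cplusplus=False, output=None):
    """Return (argv, wrapper_name) for a SWIG run that targets Python."""
    stem = os.path.splitext(os.path.basename(interface))[0]
    # Default wrapper name: file_wrap.c, or file_wrap.cxx with -c++.
    default = stem + ("_wrap.cxx" if cplusplus else "_wrap.c")
    argv = ["swig"]
    if cplusplus:
        argv.append("-c++")
    argv.append("-python")
    if output is not None:
        argv += ["-o", output]      # explicit name overrides the default
    argv.append(interface)
    return argv, output or default


if __name__ == "__main__":
    argv, wrapper = swig_python_command("example.i", cplusplus=True,
                                        output="example_wrap.cpp")
    print(" ".join(argv))
    print(wrapper)
```

The returned argument list could be handed to subprocess.run on a machine where SWIG is installed; the wrapper name is what the following compile step would take as its source file.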

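2. SIP (A Tool for Generating Python Bindings for C and C++ Libraries)
SIP is a tool that automatically generates Python bindings for C and C++ libraries. It is similar to SWIG but uses a different interface format. It is used to build PyQt and PyKDE and supports the Qt signal/slot mechanism.
SIP was originally developed in 1998 for PyQt (the Python bindings for the Qt GUI toolkit), but it is also suitable for generating bindings for other C and C++ libraries.
The name SIP reflects its origin as a small SWIG. Unlike SWIG, SIP is designed specifically for Python and C/C++ and aims to make the integration between them as tight as possible.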

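1) Supported features and limitations
The main advantages of SIP are that bindings load quickly and use little memory, especially when only a small part of a large library is used. Its main features include:
- automatic conversion between standard Python and C/C++ data types;
- overloading of functions by argument type;
- access to the protected methods of C++ classes;
- the ability to define, in Python, subclasses of C++ classes, including abstract C++ classes;
- support for ordinary C++ functions, class methods, static class methods, virtual class methods and abstract class methods;
- the ability to re-implement C++ virtual and abstract methods in Python;
- support for global variables and class variables;
- support for C++ namespaces;
- support for C++ exceptions and their conversion into Python exceptions;
- the ability to define mappings between C++ classes and similar Python data types, which are then applied automatically;
- the ability to include extractable documentation in a specification file;
- the ability to include copyright notices in a specification file so that they are automatically included in all generated source code;
- a build process that is independent of any particular platform;
- an understanding of the type-safe signal/slot callback mechanism implemented by Qt.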

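2) Usage
(1) Steps:
- write a .sip specification file;
- run $ sip -c . foo.sip to generate the C++ code in the current directory;
- write a configure.py script and run $ python configure.py to generate the Makefile;
- run $ make; make install to compile and install the extension module.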

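(2) Running SIP:
The SIP command line has the following syntax:
$ sip [options] [specification]
Here, specification is the name of the module specification file (usually with the suffix .sip). If it is omitted, stdin is used.
The command-line options are: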
-h
Display a help message.

-V
Display the SIP version number.

-a file
The name of the Scintilla API file to generate. This file contains a description of the module API in a form that the Scintilla editor component can use for auto-completion and call tips. By default the file is not generated.

-b file
The name of the build file to generate. This file contains the information about the module needed by the SIP build system to generate a platform and compiler specific Makefile for the module. By default the file is not generated.

-c dir
The name of the directory (which must exist) into which all of the generated C or C++ code is placed. By default no code is generated.

-d file
The name of the documentation file to generate. Documentation is included in specification files using the %Doc and %ExportedDoc directives. By default the file is not generated.

-e
Support for C++ exceptions is enabled. This causes all calls to C++ code to be enclosed in try/catch blocks and C++ exceptions to be converted to Python exceptions. By default exception support is disabled.

-I dir
The directory is added to the list of directories searched when looking for a specification file given in an %Include or %Import directive. This option may be given any number of times.

-j number
The generated code is split into the given number of files. This makes it easier to use the parallel build facility of most modern implementations of make. By default 1 file is generated for each C structure or C++ class.

-r
Debugging statements that trace the execution of the bindings are automatically generated. By default the statements are not generated.

-s suffix
The suffix to use for generated C or C++ source files. By default .c is used for C and .cpp for C++.

-t tag
The SIP version tag (declared using a %Timeline directive) or the SIP platform tag (declared using the %Platforms directive) to generate code for. This option may be given any number of times so long as the tags do not conflict.

-w
The display of warning messages is enabled. By default warning messages are disabled.

-x feature
The feature (declared using the %Feature directive) is disabled.

-z file
The name of a file containing more command line options.

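(3) Input:
The input is a specification file.
A simple example illustrates the syntax. Suppose a C++ library implements a class called Word. The class has one constructor, which takes a \0 terminated string as its only argument, and a method called reverse() that takes no arguments and returns a \0 terminated string.
The interface to the class is defined in the header file word.h, as follows: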
// Define the interface to the word library.
class Word {
const char *the_word;
public:
Word(const char *w);
char *reverse() const;
};

The corresponding SIP specification file is as follows:
// Define the SIP wrapper to the word library.
%Module word 0
class Word {
%TypeHeaderCode
#include "word.h"
%End
public:
Word(const char *);
char *reverse() const;
};

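SIP uses directives to map C++ features. The main directives are:
%AccessCode
%CModule                  declares that the module is implemented in C and defines its name;
%ConvertFromTypeCode      converts a C/C++ type to a Python type;
%ConvertToSubClassCode    converts a C++ instance to its most specific sub-class (based on RTTI);
%ConvertToTypeCode        converts a Python type to a C/C++ type;
%Copying                  handwritten text that is included at the head of every code file SIP generates;
%Doc                      documentation that can be extracted with the sip command;
%End                      marks the end of a block of code or text;
%ExportedDoc              documentation that is also included by modules that import this one;
%Feature                  used with %If, %Platforms and %Timeline to control whether parts of a specification file are processed;
%If
%Import                   imports the specification file of another module;
%Include                  includes another file;
%License                  specifies an optional license dictionary through the Licensee, Signature, Timestamp and Type annotations;
%MappedType               defines an automatic type conversion mapping;
%MethodCode               implementation code for global functions, class methods, operators, constructors and destructors;
%Module                   declares that the module is implemented in C++ and defines its name;
%ModuleCode               code for functions that can be called from elsewhere in the module;
%ModuleHeaderCode         declarations that are included in every generated file;
%OptionalInclude          like %Include, but processing continues if the file cannot be opened;
%Platforms                used with %If to declare platform tags;
%PostInitialisationCode   code executed immediately after the module has been initialized;
%PreInitialisationCode    code executed before the module is initialized;
%Timeline                 used with %If to declare version tags;
%TypeCode                 marks code in a class or structure so that other structures or classes can call it;
%TypeHeaderCode           names the header files that declare a structure or class so that the types in them can be used;
%VirtualCatcherCode       handwritten code related to the implementation of virtual functions.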

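SIP uses annotations to give more detailed information about arguments and functions. There are argument, class, function, enum, license and variable annotations, each with its own type and optional value. For example:
Python converts automatically between compatible types, so an int is accepted where a double is expected; in C/C++ the two are distinct, which makes overloads with similar signatures ambiguous. Adding the Constrained annotation to the argument solves this by requiring an exact type match. In the listing below, the declarations above the separator are the ambiguous ones and those below it are the annotated version: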
void foo(double);
void foo(int);
================================================
void foo(double /Constrained/);
void foo(int);
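
The effect of the annotation can be illustrated with a toy model in Python. This is not SIP's actual resolution algorithm; it assumes that overloads are tried in declaration order, foo(double) first and then foo(int), and that an unconstrained double argument also accepts an int. The function pick_overload is hypothetical.

```python
def pick_overload(value, constrained):
    """Return the name of the first declared overload that accepts value.

    Overloads are tried in declaration order: foo(double), then foo(int).
    """
    if constrained:
        # With /Constrained/ only an exact float matches foo(double).
        accepts_double = isinstance(value, float)
    else:
        # Without it, an int is silently converted to double.
        accepts_double = isinstance(value, (int, float))
    if accepts_double:
        return "foo(double)"
    if isinstance(value, int):
        return "foo(int)"
    raise TypeError("no matching overload")


if __name__ == "__main__":
    print(pick_overload(3, constrained=False))
    print(pick_overload(3, constrained=True))
```

Under these assumptions the unannotated declarations send the integer 3 to foo(double), so foo(int) is never reached, while the annotated version routes it to foo(int).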

(4) Output:
A set of generated source files, ready to be compiled and linked into a shared library.

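3. Boost
1) Introduction
Boost is a collection of open-source, highly portable C++ libraries, started by members of the library working group of the C++ standards committee. Its main features include:
- regular expressions over various character types (char, wchar_t and user-defined character types);
- multithreading (a cross-platform threading library);
- a graph data structure, as well as hash_set, hash_map, hash_multiset and hash_multimap, which are expected to enter the standard and bring C++ support for data structures close to complete;
- extending the Python language;
- smart pointers which, used alongside std::auto_ptr, help prevent memory leaks at little runtime cost;
- CRC (cyclic redundancy check), tuple, and any, which can hold values of different types;
- rapid growth, with some components expected to enter the C++ standard library.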

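Boost.Python is a C++ library that enables seamless interoperability between C++ and Python. It requires no additional tools, only a C++ compiler, and the C++ code does not have to be modified in order to be wrapped, which keeps it simple to use.
The current version has been rewritten, with a more flexible and convenient interface and new capabilities, including:
- references and pointers
- globally registered type coercions
- automatic cross-module type conversions
- efficient function overloading
- C++ to Python exception translation
- default arguments
- keyword arguments
- manipulating Python objects in C++
- exporting C++ iterators as Python iterators
- documentation strings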

2) Usage
The steps are:
- write the C++ source;
- write the C++ wrapper;
- build the wrapper with bjam;
- use the module from Python.

3) Examples:
(1) A simple C++ function:
char const* greet()
{
return "hello, world";
}
can be exposed with the following Boost.Python wrapper:
#include <boost/python.hpp>
using namespace boost::python;

BOOST_PYTHON_MODULE(hello)
{
def("greet", greet);
}
After it has been built into a shared library, it can be used from Python:
>>> import hello
>>> print hello.greet()
hello, world

(2) Classes and structs
struct World
{
World(std::string msg): msg(msg) {} // added constructor
void set(std::string msg) { this->msg = msg; }
std::string greet() { return msg; }
std::string msg;
};
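Its wrapper is:
#include <boost/python.hpp>
using namespace boost::python;

BOOST_PYTHON_MODULE(hello)
{
    // Expose World, with a constructor that takes a std::string.
    class_<World>("World", init<std::string>())
        .def("greet", &World::greet)
        .def("set", &World::set)
    ;
}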

(3) Class inheritance
struct Base { virtual ~Base(); };
struct Derived : Base {};
Its wrapper is:
class_<Base>("Base")
    /*...*/
    ;
class_<Derived, bases<Base> >("Derived")
    /*...*/
    ;

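V. Summary
This article gives only a brief introduction to several extension tools. Each tool will be covered in more detail, with code, in follow-up articles.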