Commonly Used Modules

印度阿三17 2018-09-28


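1. Modules, packages, and related syntax

What are the benefits of using modules?

The biggest benefit is that modules make code much easier to maintain. They also mean you never have to start from scratch: once a module is written, it can be imported anywhere else. When we write programs we routinely import other modules, both those built into Python and those from third parties.
Modules also prevent clashes between function and variable names. Each module has its own namespace, so functions and variables with the same name can live in different modules, and when writing your own module you do not need to worry about its names colliding with those in other modules.

Module categories

There are three kinds of modules:

Built-in standard modules (the standard library); run help('modules') to list every module available to the interpreter
Third-party open-source modules, installed over the network with pip install <module name>
Custom modules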

Importing modules

import module

from module import xx

from module.xx.xx import xx as rename  

from module.xx.xx import *

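Importing a module runs the code in that .py file (only on the first import; later imports reuse the module that is already loaded).

Custom modules

This is the simplest case: any .py file you create is a module and can be imported from another program.

Module search path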

import sys

print(sys.path)

['', '/Library/Frameworks/Python.framework/Versions/3.6/lib/python36.zip', 
'/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6', 
'/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/lib-dynload', 
'/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages']

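The interpreter walks through this list in order, looking in each directory for the module name you are importing. As soon as one directory contains a match, that module is imported and the search stops.

Note that the first element is an empty string, which stands for the current directory (when you run a script, Python puts the script's own directory first instead), so a module you define there is imported ahead of anything else with the same name.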
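To see which file a given import would pick up, you can ask the import machinery directly. The following is a minimal sketch using only the standard library; the module name no_such_module_hopefully is a made-up name chosen so that the lookup fails.

```python
import importlib.util
import sys

# The search path is an ordinary list of strings
for entry in sys.path[:3]:
    print(repr(entry))

# find_spec() performs the lookup without importing the module
spec = importlib.util.find_spec('json')
print(spec.origin)           # the file that "import json" would load

# Hypothetical module name, assumed not to exist anywhere on the path
missing = importlib.util.find_spec('no_such_module_hopefully')
print(missing)               # None when nothing on the path matches
```

Because sys.path is just a list, the order of its entries is exactly the order in which directories are searched.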

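Installing and using open-source modules

https://pypi./pypi is Python's repository of open-source modules

1. Download the package from that page, unpack it, change into its directory, and run the following commands to install it

Build from source      python setup.py build
Install from source    python setup.py install

2. Install directly with pip

pip3 install paramiko  # paramiko is the module name

pip downloads the package and installs it automatically.

Packages are normally installed into this subdirectory of your Python installation

/your_python_install_path/3.6/lib/python3.6/site-packages

By default pip downloads from the official Python servers abroad, which can be slow. You can use the Douban mirror in China instead; it syncs with the official index periodically and is much faster

sudo pip install -i http://pypi.douban.com/simple/ alex_sayhi --trusted-host pypi.douban.com   # alex_sayhi is the module name

Usage

Once installed, import the module and use it exactly as you would a built-in one. Here is a module that connects to a Linux host and runs a command

#coding:utf-8

import paramiko

ssh = paramiko.SSHClient()
# Automatically accept unknown host keys (convenient for a demo, unsafe in production)
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect('192.168.1.108', 22, 'alex', '123')

stdin, stdout, stderr = ssh.exec_command('df')
print(stdout.read())
ssh.close()

Running a command: connecting to the server with a user name and password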

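Packages

As your module files multiply you need to organise them, for example by putting everything that talks to the database in one folder and everything related to page interaction in another

.
└── my_proj
    ├── crm  # application code
    │   ├── admin.py
    │   ├── apps.py
    │   ├── models.py
    │   ├── tests.py
    │   └── views.py
    ├── manage.py
    └── my_proj  # configuration
        ├── settings.py
        ├── urls.py
        └── wsgi.py

A folder that manages several module files like this is called a package. A package is just a folder, but it should contain an __init__.py file, which may be empty; __init__.py marks the folder as a package. (Since Python 3.3 a folder without __init__.py can still be imported as a namespace package, but including the file remains the usual convention.)

How do modules in different packages import each other?

Contents of crm/views.py

def sayhi():
    print('hello world!')

Calling it from manage.py

from crm import views

views.sayhi()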

Importing across packages

import sys
# A static, hard-coded path (in PyCharm, right-click the folder and copy its path). It must point at my_proj itself so that "from proj import ..." works.
# If you appended the proj directory instead, Python would look inside it and you could simply "import settings".
# The r prefix makes this a raw string, so the backslashes are not treated as escape sequences.
sys.path.append(r"C:\Users\Administrator\PycharmProjects\myFirstpro\chapter4模块的学习\my_proj")
print(sys.path)
from proj import settings
def sayhi():
    print('Hello World!')

import sys,os
print(dir())
print('file',__file__)
BASE_DIR = os.path.dirname(os.path.dirname(__file__))
print(BASE_DIR)
sys.path.append(BASE_DIR)  # a path computed at run time

from proj import settings
def sayhi():
    print('Hello world!')

Output:
['__annotations__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'os', 'sys']
file C:/Users/Administrator/PycharmProjects/myFirstpro/chapter4模块的学习/my_proj/crm/views.py
C:/Users/Administrator/PycharmProjects/myFirstpro/chapter4模块的学习/my_proj
in proj/settings.py

import sys,os
print(dir())
print('file',__file__)

BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))  # an absolute path, so the script works no matter where it is run from
print(BASE_DIR)
sys.path.append(BASE_DIR)
from proj import settings

def sayhi():
    print('Hello world!')

Absolute and relative imports

On Linux, cd .. takes you up one directory and cd ../.. takes you up two; .. is a relative path. Python imports can use . and .. in the same way

├── __init__.py
├── crm
│   ├── __init__.py
│   ├── admin.py
│   ├── apps.py
│   ├── models.py
│   ├── tests.py
│   ├── views.py  
├── manage.py   
└── proj
    ├── __init__.py
    ├── settings.py
    ├── urls.py
    └── wsgi.py

#from crm import models
from . import models  # equivalent to "from crm import models"; the leading . makes this a relative import, because both modules sit in the same package

def sayhi():
    print('Hello world!')
# run manage.py to try it
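The relative import above can be tried without setting up a project by hand. The sketch below builds a throwaway package in a temporary directory; the package name demo_pkg and the TABLE constant are invented for the illustration and are not part of the project shown above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk: demo_pkg/{__init__,models,views}.py
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"            # hypothetical package name
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "models.py").write_text("TABLE = 'customer'\n")
(pkg / "views.py").write_text(
    "from . import models\n"       # relative import of a sibling module
    "def sayhi():\n"
    "    return 'hello from ' + models.TABLE\n"
)

# Make the parent directory importable, then import the module by its full name
sys.path.insert(0, str(root))
importlib.invalidate_caches()
views = importlib.import_module("demo_pkg.views")
greeting = views.sayhi()
print(greeting)
```

Note that views.py is imported as part of the package here. Running a file that contains a relative import directly as a script fails, because Python then has no parent package to resolve the dot against.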

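time & datetime modules

Python usually represents time in one of the following ways:

Timestamp: the offset in seconds from 00:00:00 UTC on 1 January 1970. type(time.time()) returns float.
Formatted time string
Tuple (struct_time) with nine elements. The main functions that return a struct_time are gmtime(), localtime(), and strptime(). Because Python's time module mostly calls into the C library, behaviour can differ between platforms. The elements of the tuple are:

Index    Attribute                     Values
0        tm_year (year)                e.g. 2011
1        tm_mon (month)                1 - 12
2        tm_mday (day of month)        1 - 31
3        tm_hour (hour)                0 - 23
4        tm_min (minute)               0 - 59
5        tm_sec (second)               0 - 61
6        tm_wday (weekday)             0 - 6 (0 is Monday)
7        tm_yday (day of year)         1 - 366
8        tm_isdst (daylight saving)    0, 1, or -1 (unknown)

time.localtime() converts the current time into a struct_time in the local time zone; tm_wday runs 0-6 from Monday to Sunday; tm_yday is the day of the year, 1-366; tm_isdst says whether daylight saving time is in effect (-1 means unknown).

time.gmtime()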

>>>import time
>>>time.time()  # the current timestamp
1507808689.675603

>>>time.localtime()  # converts a timestamp (default: now) into a struct_time in the local time zone
time.struct_time( tm_year=2017, tm_mon=10, tm_mday=12, tm_hour=19, tm_min=45, tm_sec=61, tm_wday=3, tm_yday=285, tm_isdst=0)  # local time


>>>time.gmtime()  # converts a timestamp (default: now) into a struct_time in UTC, which is 8 hours behind Beijing time
time.struct_time( tm_year=2017, tm_mon=10, tm_mday=12, tm_hour=11, tm_min=47, tm_sec=51, tm_wday=3, tm_yday=285, tm_isdst=0)

>>>a = time.localtime(1403232424) 
>>>a
time.struct_time( tm_year=2014, tm_mon=6, tm_mday=20, tm_hour=10, tm_min=47, tm_sec=4, tm_wday=4, tm_yday=171, tm_isdst=0)

String formatting and time.mktime()

>>>'%s-%s-%s'%(a.tm_year, a.tm_mon, a.tm_mday)  # the fields can be joined into a string
'2014-6-20'

>>>time.mktime(a)  # converts a struct_time back into a timestamp
1403232424.0

time.asctime() (the form commonly used in English-speaking countries) and time.ctime() convert a time into a str

>>>time.asctime()  # formats a time tuple or struct_time (default: now) like 'Sun Oct 1 12:04:38 2017'
'Thu Oct 12 19:52:10 2017'

>>>time.ctime()  # formats a timestamp (float seconds; default: now) the same way; equivalent to time.asctime(time.localtime(secs))
'Thu Oct 12 19:52:36 2017'
>>>time.ctime(123232)
'Fri Jan  2 18:13:52 1970'
>>>time.ctime(0)
'Thu Jan  1 08:00:00 1970'

time.sleep(secs): suspends the calling thread for the given number of seconds.

time.strftime('%Y-%m-%d %H:%M:%S %A %p %U') produces a formatted string; time.strptime(s, '%Y-%m-%d %H:%M:%S') does the reverse and parses a formatted string into a struct_time

time.strftime(a, b): a is the format string and b is a struct_time, usually the result of localtime()

>>>a
time.struct_time(tm_year=1974, tm_mon=6, tm_mday=13, tm_hour=10, tm_min=40, tm_sec=42, tm_wday=3, tm_yday=164, tm_isdst=0)
>>>time.strftime('%Y-%m-%d %H:%M:%S',a)   # formats a time tuple or struct_time (such as one returned by time.localtime() or time.gmtime()) as a string
'1974-06-13 10:40:42'
>>>time.strftime('%Y-%m-%d %H:%M:%S')  # with no second argument, the current time is used
'2017-10-12 19:56:31'
>>>time.strftime('%Y-%m-%d %H:%M:%S %A',a)  # %A: weekday name
'1974-06-13 10:40:42 Thursday'
>>>time.strftime('%Y-%m-%d %H:%M:%S %p',a)  # %p: AM or PM
'1974-06-13 10:40:42 AM'
>>>time.strftime('%Y-%m-%d %H:%M:%S %U',a)  # %U: week number of the year
'1974-06-13 10:40:42 23'

>>> import time
>>> print(time.strftime('%Y-%m-%d %H:%M:%S',time.localtime()))
2018-07-28 14:12:17

>>>s=time.strftime('%Y %m-%d %H:%M:%S')
>>>s
'2017 10-12 20:01:26'
>>>t = time.strptime(s, '%Y %m-%d %H:%M:%S')  # the reverse: parses a formatted time string into a struct_time
>>>t
time.struct_time(tm_year=2017, tm_mon=10, tm_mday=12, tm_hour=20, tm_min=1, tm_sec=26, tm_wday=3, tm_yday=285, tm_isdst=-1)

>>>time.mktime(t)  # turns the struct_time (not the string) back into a timestamp
1507809686.0
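The conversions above form a chain: timestamp, struct_time, string, and back again. A minimal sketch of the full round trip, reusing the timestamp from the earlier example (the variable names are illustrative):

```python
import time

FMT = '%Y-%m-%d %H:%M:%S'

ts = 1403232424                        # a timestamp (seconds since the epoch)
st = time.localtime(ts)                # timestamp -> struct_time (local zone)
text = time.strftime(FMT, st)          # struct_time -> formatted string
parsed = time.strptime(text, FMT)      # formatted string -> struct_time
back = int(time.mktime(parsed))        # struct_time -> timestamp

print(text)
print(back == ts)
```

The printed string depends on the local time zone of the machine, which is why no expected output is shown for it. Whole seconds survive the round trip; any fractional part of a timestamp would be lost at the strftime step.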

datetime module

datetime.datetime.now() returns the current time as a datetime object d; d.timestamp(), d.today(), d.year, d.timetuple() and so on are then available

>>>import datetime
>>>datetime.datetime.now()  # prints the time below; timestamp(), today(), year, timetuple() etc. can be called on the result
datetime.datetime(2017, 10, 12, 20, 8, 19, 393189)

datetime.date.fromtimestamp()  # converts a timestamp into a date

>>>d2= datetime.date.fromtimestamp(time.time())  # a quick way to get the year, month, and day from a timestamp
>>>d2
datetime.date(2017, 10, 12)
>>>d2.timetuple()
time.struct_time(tm_year=2017, tm_mon=10, tm_mday=12, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=3, tm_yday=285, tm_isdst=-1)  # hour, minute, and second are 0 because a date carries no time of day

The main use is date and time arithmetic

datetime.datetime.now() +/- datetime.timedelta()

>>>t = datetime.timedelta(1)
>>>datetime.datetime.now() - t
datetime.datetime(2017, 10, 11, 20, 12, 37, 106915)
>>>datetime.datetime.now() - datetime.timedelta(days=1)
datetime.datetime(2017, 10, 11, 20, 12, 37, 106915)

>>>datetime.datetime.now() - datetime.timedelta(hours=3)
datetime.datetime(2017, 10, 12, 17, 13, 14, 617201)

>>>datetime.datetime.now() + datetime.timedelta(minutes=10)
datetime.datetime(2017, 10, 12, 20, 24, 2, 178282)


>>>datetime.datetime.now() + datetime.timedelta(seconds=10)
datetime.datetime(2017, 10, 12, 20, 14, 23, 858086)

Replacing fields

>>>d = datetime.datetime.now()
>>>d.replace()
datetime.datetime(2017, 10, 12, 20, 15, 17, 457249)
>>>d.replace(year=2016)
datetime.datetime(2016, 10, 12, 20, 15, 17, 457249)
>>>d.replace(year=2016,month=8)
datetime.datetime(2016, 8, 12, 20, 15, 17, 457249)
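The arithmetic is easier to check with a fixed starting point instead of now(). The sketch below uses an arbitrary fixed datetime so the results are reproducible; all variable names are illustrative.

```python
from datetime import datetime, timedelta

start = datetime(2017, 10, 12, 20, 15, 17)      # fixed value instead of now()

yesterday = start - timedelta(days=1)
later = start + timedelta(hours=3, minutes=10)
gap = later - yesterday                         # datetime - datetime gives a timedelta
first_of_month = start.replace(day=1, hour=0, minute=0, second=0)

print(yesterday)              # 2017-10-11 20:15:17
print(later)                  # 2017-10-12 23:25:17
print(gap.total_seconds())    # 97800.0
print(first_of_month)         # 2017-10-01 00:00:00
```

replace() returns a new object and leaves start unchanged, since datetime objects are immutable.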

random module

>>> import random
>>> random.randrange(1,10)  # a random integer from 1 to 9 (10 is excluded)
>>> random.randint(1,10)  # a random integer from 1 to 10 (10 is included)

>>> random.randrange(0, 100, 2)  # a random even number from 0 to 98

>>> random.random()  # a random float in [0.0, 1.0)
>>> random.choice('abce3#$@1')  # a random character from the given sequence
'#'

>>> random.sample('abcdefghij',3)  # pick the given number of distinct characters
['a', 'd', 'b']

# generate a random string
>>> import string
>>> ''.join(random.sample(string.ascii_lowercase + string.digits, 6))
'4fvda1'

# shuffle
>>> a
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> random.shuffle(a)
>>> a
[3, 0, 7, 2, 1, 6, 5, 8, 9, 4]
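random.sample() never repeats a character, which may not be what you want for codes or tokens. The following sketch shows one possible alternative; the helper random_code and the constant ALPHABET are names made up for this example. It also uses the standard secrets module, which is intended for security-sensitive values, whereas random is not.

```python
import random
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits

def random_code(length=6, rng=random):
    """Return a random string; characters may repeat, unlike random.sample()."""
    return ''.join(rng.choice(ALPHABET) for _ in range(length))

# Two generators seeded identically produce the same sequence (useful in tests)
code_a = random_code(6, random.Random(42))
code_b = random_code(6, random.Random(42))
print(code_a == code_b)

# For passwords, tokens, and similar secrets, draw from the secrets module instead
token = ''.join(secrets.choice(ALPHABET) for _ in range(16))
print(len(token))
```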

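os module

The os module provides many functions that let your program interact directly with the operating system

Get the current working directory, i.e. the directory the Python script is working in: os.getcwd()
List all file and directory names in a given directory: os.listdir()
Delete a file: os.remove()
Delete a directory and then any parent directories left empty: os.removedirs(r"c:\python")
Check whether a path is a file: os.path.isfile()
Check whether a path is a directory: os.path.isdir()
Check whether a path is absolute: os.path.isabs()
Check whether a path really exists: os.path.exists()
Split a path into directory and file name: os.path.split()     e.g. os.path.split('/home/swaroop/byte/code/poem.txt') returns ('/home/swaroop/byte/code', 'poem.txt')
Split off the extension: os.path.splitext()       e.g. os.path.splitext('/usr/local/test.py') returns ('/usr/local/test', '.py')
Get the directory part of a path: os.path.dirname()
Get the absolute path: os.path.abspath()
Get the file name: os.path.basename()
Run a shell command: os.system()
Read the HOME environment variable: os.getenv("HOME")
All environment variables: os.environ
Set an environment variable if it is not already set, for the lifetime of the program only: os.environ.setdefault('HOME','/home/alex')
The line terminator of the current platform: os.linesep    Windows uses '\r\n'; Linux and macOS use '\n'
The platform you are running on: os.name       'nt' on Windows, 'posix' on Linux/Unix
Rename: os.rename(old, new)
Create nested directories: os.makedirs(r"c:\python\test")
Create a single directory: os.mkdir("test")
Get file attributes: os.stat(file)
Change file permissions: os.chmod(file, mode); change access and modification times: os.utime(file)
Get file size: os.path.getsize(filename)
Join a directory and a file name: os.path.join(dir, filename)
Change the working directory to dirname: os.chdir(dirname)
Get the size of the current terminal: os.get_terminal_size()
Kill a process: os.kill(10884, signal.SIGKILL)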
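Several of these functions are usually combined. The sketch below exercises a handful of them inside a temporary directory so that nothing outside it is touched; the directory and file names (logs, 2018, app.log) are arbitrary.

```python
import os
import tempfile

base = tempfile.mkdtemp()
nested = os.path.join(base, 'logs', '2018')
os.makedirs(nested)                        # creates logs/ and logs/2018/ in one call

path = os.path.join(nested, 'app.log')
with open(path, 'w') as f:
    f.write('hello')

directory, filename = os.path.split(path)  # ('<base>/logs/2018', 'app.log')
stem, ext = os.path.splitext(filename)     # ('app', '.log')
size = os.path.getsize(path)               # 5 bytes
listing = os.listdir(nested)               # ['app.log']

os.remove(path)
os.removedirs(nested)                      # removes 2018, then each parent that is now empty
print(filename, ext, size, listing, os.path.exists(base))
```

os.path.join() picks the right separator for the platform, which is why it is preferred over concatenating strings with '/' or '\\'.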

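sys module

sys.argv           list of command-line arguments; the first element is the path of the program itself
sys.exit(n)        exit the program; use exit(0) for a normal exit
sys.version        version information of the Python interpreter
sys.maxsize        the largest value a container index can take (sys.maxint was removed in Python 3, where int has no upper limit)
sys.path           the module search path, initialised in part from the PYTHONPATH environment variable
sys.platform       name of the operating system platform
sys.stdout.write('please:')  # standard output; writes without a trailing newline, which is what a progress bar needs
val = sys.stdin.readline()[:-1]  # standard input (the slice drops the trailing newline)
sys.getrecursionlimit()  # get the maximum recursion depth
sys.setrecursionlimit(1200)  # set the maximum recursion depth
sys.getdefaultencoding()  # get the interpreter's default string encoding
sys.getfilesystemencoding()  # get the encoding used to convert file names between str and bytes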
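The progress bar mentioned alongside sys.stdout.write can be sketched as follows. This is one possible way to do it, not the author's own example; the helper progress_line is a hypothetical name.

```python
import sys

def progress_line(done, total, width=20):
    """Return a one-line text progress bar for done out of total."""
    filled = int(width * done / total)
    percent = int(100 * done / total)
    return '[' + '#' * filled + '.' * (width - filled) + '] ' + str(percent) + '%'

for step in range(0, 101, 25):
    # '\r' moves the cursor back to the start of the line so the bar redraws in place
    sys.stdout.write('\r' + progress_line(step, 100))
    sys.stdout.flush()
sys.stdout.write('\n')
```

print() would add a newline after every call by default, so each update would appear on its own line; sys.stdout.write gives full control over what is emitted.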

shutil module

A high-level module for working with files, folders, and archives

shutil.copyfileobj(fsrc, fdst[, length])
Copy the contents of one file object to another

import shutil
with open('old.xml','r') as src, open('new.xml', 'w') as dst:
    shutil.copyfileobj(src, dst)

shutil.copyfile(src, dst)
Copy a file

shutil.copyfile('f1.log', 'f2.log')  # the target file does not need to exist

shutil.copymode(src, dst)
Copy only the permission bits; content, group, and owner are unchanged

shutil.copymode('f1.log', 'f2.log')  # the target file must exist

shutil.copystat(src, dst)
Copy only the status information: mode bits, atime, mtime, flags

shutil.copystat('f1.log', 'f2.log')  # the target file must exist

shutil.copy(src, dst)
Copy a file and its permissions

import shutil
shutil.copy('f1.log', 'f2.log')

shutil.copy2(src, dst)
Copy a file and its status information

import shutil
shutil.copy2('f1.log', 'f2.log')

shutil.ignore_patterns(*patterns)
shutil.copytree(src, dst, symlinks=False, ignore=None)
Recursively copy a folder

import shutil
# The target directory must not exist, and you need write permission on the parent of folder2; ignore lists the patterns to exclude
shutil.copytree('folder1', 'folder2', ignore=shutil.ignore_patterns('*.pyc', 'tmp*'))

shutil.rmtree(path[, ignore_errors[, onerror]])
Recursively delete a directory tree

import shutil
shutil.rmtree('folder1')

shutil.move(src, dst)
Recursively move a file or directory, much like the mv command; on the same filesystem it is simply a rename.

import shutil
shutil.move('folder1', 'folder3')

shutil.make_archive(base_name, format,...)
Create an archive (zip, tar, and so on) and return its path

base_name: the file name of the archive, without the format-specific extension, optionally with a path. A bare file name is saved in the current directory; otherwise the archive is saved at the given path,

e.g. data_bak => saved in the current directory
e.g. /tmp/data_bak => saved in /tmp/

format: archive type: "zip", "tar", "bztar", "gztar"
root_dir: path of the folder to archive (defaults to the current directory)
owner: user, defaults to the current user
group: group, defaults to the current group
logger: used for logging, usually a logging.Logger object

# Archive the files under /data into the program's current directory
import shutil
ret = shutil.make_archive("data_bak", 'gztar', root_dir='/data')

# Archive the files under /data into /tmp/
import shutil
ret = shutil.make_archive("/tmp/data_bak", 'gztar', root_dir='/data')

shutil handles archives by calling the zipfile and tarfile modules. In detail:
Compressing and extracting with zipfile

import zipfile

# compress
z = zipfile.ZipFile('laxi.zip', 'w')
z.write('a.log')
z.write('data.data')
z.close()

# extract
z = zipfile.ZipFile('laxi.zip', 'r')
z.extractall(path='.')
z.close()

Compressing and extracting with tarfile

import tarfile

# compress
t = tarfile.open('/tmp/egon.tar','w')
t.add('/test1/a.py',arcname='a.bak')
t.add('/test1/b.py',arcname='b.bak')
t.close()

# extract
t = tarfile.open('/tmp/egon.tar','r')
t.extractall('/egon')
t.close()
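make_archive has a counterpart, shutil.unpack_archive, that restores an archive without dealing with zipfile or tarfile directly. A minimal round-trip sketch inside a temporary directory (the names data, a.log, data_bak, and restored are arbitrary):

```python
import os
import shutil
import tempfile

work = tempfile.mkdtemp()
src = os.path.join(work, 'data')
os.makedirs(src)
with open(os.path.join(src, 'a.log'), 'w') as f:
    f.write('line 1')

# base_name has no extension; make_archive appends ".zip" and returns the full path
archive = shutil.make_archive(os.path.join(work, 'data_bak'), 'zip', root_dir=src)

dest = os.path.join(work, 'restored')
shutil.unpack_archive(archive, dest)       # the format is inferred from the extension
restored = sorted(os.listdir(dest))
print(os.path.basename(archive), restored)

shutil.rmtree(work)                        # clean up everything created above
```

Only unpack archives from sources you trust, since an archive can contain paths that point outside the destination directory.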

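Serialization modules

Serialization means converting in-memory data into a string (or bytes) so that it can be stored on disk or sent across the network, since disks and network connections only accept bytes

Two modules used for serialization

json, converts between strings and Python data types
pickle, converts between Python objects and a Python-specific byte format

JSON:

Pros: language-independent, compact

Cons: supports only basic types such as int, float, str, bool, None, list, tuple, and dict (a tuple comes back as a list)

Pickle:

Pros: designed for Python and supports almost every Python data type

Cons: usable only from Python, the stored data takes more space, and loading a pickle from an untrusted source can execute arbitrary code

The json module provides four functions: dumps, dump, loads, load

The pickle module provides four functions: dumps, dump, loads, load

import pickle
data = {'k1':123,'k2':'Hello'}

# pickle.dumps converts the data into bytes in a format that only Python understands
p_str = pickle.dumps(data)
print(p_str)

# pickle.dump does the same conversion and writes the result to a file (binary mode, so no encoding argument)
with open('D:/result.pk','wb') as fp:
    pickle.dump(data,fp)

import json
# json.dumps converts the data into a string that every programming language understands
j_str = json.dumps(data)
print(j_str)

# json.dump does the same conversion and writes the result to a file (text mode)
with open('D:/result.json','w',encoding='utf8') as fp:
    json.dump(data,fp)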

# serialize
data = {
    'roles' : [
        {'role':'monster','type':'pig','life':50},
        {'role':'hero','type':'关羽','life':80},
    ]
}
f = open('game_status','w')
f.write(str(data))  # str() turns the dict into a string
f.close()

# deserialize: turn the string back into an in-memory data type
data = {
    'roles' : [
        {'role':'monster','type':'pig','life':50},
        {'role':'hero','type':'关羽','life':80},
    ]
}
# f = open('game_status','w')
# f.write(str(data))
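f = open('game_status','r')
d = f.read()
f.close()
d = eval(d)  # turns the string back into a dict (never eval untrusted input; json is the safer choice)
print(d['roles'])

Serializing with the json module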

import json
data = {
    'roles' : [
        {'role':'monster','type':'pig','life':50},
        {'role':'hero','type':'关羽','life':80},
    ]
}
d = json.dumps(data)  # convert to a string only
print(d,type(d))  # str

d2 = json.loads(d)  # convert the string back into the corresponding data type
print(d2,type(d2))  # dict

print(d2['roles'])

Output:
{"roles": [{"role": "monster", "type": "pig", "life": 50}, {"role": "hero", "type": "\u5173\u7fbd", "life": 80}]} <class 'str'>
{'roles': [{'role': 'monster', 'type': 'pig', 'life': 50}, {'role': 'hero', 'type': '关羽', 'life': 80}]} <class 'dict'>
[{'role': 'monster', 'type': 'pig', 'life': 50}, {'role': 'hero', 'type': '关羽', 'life': 80}]

Note how loads/load and dumps/dump are used and how they differ:

dumps converts a dict to a str and loads converts a str back to a dict. dump and load do the same but work with a file object.

import json
data = {
    'roles' : [
        {'role':'monster','type':'pig','life':50},
        {'role':'hero','type':'关羽','life':80},
    ]
}

f = open("test.json", "w")
json.dump(data, f)  # convert to a string and write it to the file
f.close()  # flush to disk before reading the file back

f = open("test.json","r")
data = json.load(f)  # read the file and convert it back
f.close()
print(data['roles'])

This creates a file named test.json:
{"roles": [{"role": "monster", "type": "pig", "life": 50}, {"role": "hero", "type": "\u5173\u7fbd", "life": 80}]}

Output:
[{'role': 'monster', 'type': 'pig', 'life': 50}, {'role': 'hero', 'type': '关羽', 'life': 80}]

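Why convert data to a string that only stays in memory? (json.dumps / json.loads)

1. To share your in-memory data with someone remote over the network;

2. To define the rules by which different languages exchange data. The options are:

1. plain text; downside: cannot carry complex data types;

2. xml; downside: takes a lot of space;

3. json; simple and readable.

import json
f = open('json_file', 'w', encoding='utf-8')
d = {'name':'alex', 'age':'23'}  # dict
l = [1,2,3,5,9,'rain']  # list
json.dump(d,f)  # dump can be called several times
json.dump(l,f)
f.close()


This creates the file json_file:
{"name": "alex", "age": "23"}[1, 2, 3, 5, 9, "rain"]

import json

f = open("json_file", 'r', encoding="utf-8")
print(json.load(f))   # load cannot read a file that was dumped to more than once: json_file now holds a dict followed by a list, which is not a single JSON document, so this raises an error. To avoid the problem, dump once and load once per file.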
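If several objects really do need to live in one file, a common workaround is to write one JSON document per line and parse the file line by line. This is a different technique from the dump/load calls shown above, sketched here with an in-memory buffer standing in for the file:

```python
import io
import json

records = [{'name': 'alex', 'age': 23}, [1, 2, 3, 5, 9, 'rain']]

buffer = io.StringIO()             # stands in for a file opened in text mode
for record in records:
    buffer.write(json.dumps(record) + '\n')    # one JSON document per line

buffer.seek(0)
loaded = [json.loads(line) for line in buffer if line.strip()]
print(loaded == records)
```

This works because json.dumps never emits a raw newline inside a document unless you ask for indentation, so each line is a complete, independent value.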

pickle (the usage is the same, except that files must be opened in binary mode)

import pickle
d = {'name' : 'alex' , 'age' : 22 }
l = [1,2,3,4,'rain' ]
print(pickle.dumps(d))   # prints a bytes object: b'\x80\x03}q\x00(X\x04\x00\x00\x00nameq\x01X\x04\x00\x00\x00alexq\x02X\x03\x00\x00\x00ageq\x03K\x16u.'

import pickle
d = {'name' : 'alex' , 'age' : 22 }
l = [1,2,3,4,'rain' ]
pk = open("data.pkl", "wb")
pickle.dump(d,pk)  # writes the binary pickle data to disk
pickle.dump(l,pk)  # unlike json, several objects can be dumped to the same file
pk.close()

This creates the file data.pkl (binary, not human-readable):
�}q (X   nameq X   alexq X   ageq K u. �]q (K K K K X   rainq e.

Reading the data back from disk

import pickle
f = open("data.pkl", "rb")
d = pickle.load(f)
print(d)  #{'name': 'alex', 'age': 22}
l = pickle.load(f)
print(l) #[1, 2, 3, 4, 'rain']

Serializing with the shelve module

shelve is a simple key-value module that persists in-memory data to a file; it can store any Python data that pickle supports

# serialize
import shelve

f = shelve.open('shelve_test')  # open a shelf file

names = ["alex", "rain", "test"]
info = {'name':'alex','age':22}

f["names"] = names  # persist a list
f['info_dic'] = info  # persist a dict

f.close()

# deserialize
import shelve

d = shelve.open('shelve_test')  # open the shelf file

print(d['names'])     #['alex', 'rain', 'test']
print(d['info_dic'])  #{'name': 'alex', 'age': 22}

#del d['test']  # entries can also be deleted

>>>f = shelve.open("shelve_test")
>>>f
<shelve.DbfilenameShelf object at 0x101dce9e8>
>>>list(f.keys())
['names', 'info_dic']
>>>list(f.items())
[('names', ['alex', 'rain', 'test']), ('info_dic', {'name': 'alex', 'age': 22})]

>>>f.get('names')
['alex', 'rain', 'test']
>>>f.get('info_dic')
{'name': 'alex', 'age': 22}

>>>f['names'][1]
'rain'

>>>f['names'][1] = "Rain wang"  # this does not work: it only changes a temporary copy, so assign a whole new value to the key instead
>>>f['names'][1]
'rain'

>>>f.close()
>>>f = shelve.open("shelve_test")

import shelve
f = shelve.open("shelve_test")
f['scores'] = [1,2,3,4,5,6]
print(f.get('scores'))

f['scores'] = [1,2,3,'A',4,'B']
print(f.get('scores'))
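The in-place modification pitfall has two standard remedies: read the value, change it, and assign it back, or open the shelf with writeback=True. A minimal sketch of both, using a shelf in a temporary directory (the file name shelve_demo is arbitrary):

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'shelve_demo')

with shelve.open(path) as db:
    db['names'] = ['alex', 'rain', 'test']
    db['names'][1] = 'Rain wang'        # lost: only a temporary copy is changed
    unchanged = db['names'][1]

    names = db['names']                 # option 1: read, modify, assign back
    names[1] = 'Rain wang'
    db['names'] = names

# option 2: writeback=True caches every entry read and writes it back on close
with shelve.open(path, writeback=True) as db:
    db['names'].append('jack')

with shelve.open(path) as db:
    final = db['names']

print(unchanged, final)
```

writeback=True is convenient but keeps every accessed entry in memory and rewrites all of them on close, so option 1 is usually the better choice for large shelves.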

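XML module

XML is a format for exchanging data between different languages or programs, much like JSON, though JSON is simpler to use. In the dark ages before JSON existed, XML was the only choice, and to this day many interfaces in traditional industries such as finance are still mainly XML.

XML looks like this; the data structure is expressed with <> nodes: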

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

import xml.etree.ElementTree as ET
tree = ET.parse("xmltest.xml")  # parse the file
root = tree.getroot()  # get the root element
print(root.tag)
# walk the XML document
for child in root:
    print('-------------',child.tag,child.attrib)
    for i in child:
        print(i.tag,i.text)


Output:
data
------------- country {'name': 'Liechtenstein'}
rank 2
year 2008
gdppc 141100
neighbor None
neighbor None
------------- country {'name': 'Singapore'}
rank 5
year 2011
gdppc 59900
neighbor None
------------- country {'name': 'Panama'}
rank 69
year 2011
gdppc 13600
neighbor None
neighbor None

import xml.etree.ElementTree as ET
tree = ET.parse("xmltest.xml")  # parse the file
root = tree.getroot()  # get the root element
print(root.tag)

# visit only the year nodes
for node in root.iter('year'):
    print(node.tag,node.text)

Output:
data
year 2008
year 2011
year 2011

################### Creating your own XML document #############################

import xml.etree.ElementTree as ET

new_xml = ET.Element("namelist")     # root element
name = ET.SubElement(new_xml,"name",attrib={"enrolled":"yes"})
age = ET.SubElement(name,"age",attrib={"checked":"no"})
sex = ET.SubElement(name,"sex")
n = ET.SubElement(name,"name")
n.text = "alex"  # set the element's text (assigning to n itself would only rebind the variable)
sex.text = 'man'


name2 = ET.SubElement(new_xml,"name",attrib={"enrolled":"no"})
age = ET.SubElement(name2,"age")
age.text = '19'

et = ET.ElementTree(new_xml)  # build the document object
et.write("test.xml", encoding="utf-8",xml_declaration=True)

ET.dump(new_xml)  # print the generated XML


Contents of test.xml (indented here for readability; the file itself is written without line breaks between elements):
<?xml version='1.0' encoding='utf-8'?>
<namelist>
    <name enrolled="yes">
        <age checked="no" />
        <sex>man</sex>
        <name>alex</name>
    </name>
    <name enrolled="no">
        <age>19</age>
    </name>
</namelist>
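Besides reading and creating documents, ElementTree can modify and delete nodes. The sketch below works on a shortened copy of the country data parsed from a string, so it needs no file on disk; the rule "drop countries ranked worse than 50" is invented for the example.

```python
import xml.etree.ElementTree as ET

XML = """<data>
    <country name="Liechtenstein"><rank>2</rank><year>2008</year></country>
    <country name="Singapore"><rank>5</rank><year>2011</year></country>
    <country name="Panama"><rank>69</rank><year>2011</year></country>
</data>"""

root = ET.fromstring(XML)

# Modify: add one to every year and mark the node as updated
for year in root.iter('year'):
    year.text = str(int(year.text) + 1)
    year.set('updated', 'yes')

# Delete: drop countries ranked worse than 50.
# findall() returns a list, so removing from root while looping over it is safe.
for country in root.findall('country'):
    if int(country.find('rank').text) > 50:
        root.remove(country)

names = [c.get('name') for c in root.findall('country')]
years = [c.find('year').text for c in root.findall('country')]
print(names, years)
```

To persist the changes, wrap the root in ET.ElementTree(root) and call write(), as in the document-creation example.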

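configparser module (parsing and writing configuration files)

This module is used to create and modify common configuration files. It was called ConfigParser in Python 2 and was renamed configparser in Python 3.x.

Many programs use a configuration file format like this:

## conf.ini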
[DEFAULT]
ServerAliveInterval = 45
Compression = yes
CompressionLevel = 9
ForwardX11 = yes

[bitbucket.org]
User = hg

[topsecret.server.com]
Port = 50022
ForwardX11 = no

>>> import configparser
>>> config = configparser.ConfigParser()  # create a parser object
>>> config.sections()  # nothing has been read yet (try this in an interactive session)
[]

>>> config.read('example.ini')
['example.ini']
>>> config.sections()  # the DEFAULT section is not listed
['bitbucket.org', 'topsecret.server.com']
>>> 'bitbucket.org' in config  # test whether a section exists
True
>>> 'bytebong.com' in config
False
>>> config['bitbucket.org']['User']  # read values dictionary-style
'hg'
>>> config['DEFAULT']['Compression']
'yes'
>>> topsecret = config['topsecret.server.com']
>>> topsecret['ForwardX11']
'no'
>>> topsecret['Port']
'50022'
>>> for key in config['bitbucket.org']: print(key)  # loop over the keys of bitbucket.org (keys inherited from DEFAULT are included)
...
user
compressionlevel
serveraliveinterval
compression
forwardx11
>>> config['bitbucket.org']['ForwardX11']
'yes'

import configparser
conf = configparser.ConfigParser()  # create a parser object first
#print(conf.sections())  #[]
conf.read('conf.ini')
print(conf.sections())  # ['bitbucket.org', 'topsecret.server.com']
print(conf.default_section)  # DEFAULT
# print(list(conf["bitbucket.org"].keys()))
print(conf["bitbucket.org"]['User'])  # hg

# looping
import configparser
conf = configparser.ConfigParser()

conf.read('conf.ini')

for k,v in conf["bitbucket.org"].items():
    print(k,v)

# output:
user hg
serveraliveinterval 45
compression yes
compressionlevel 9
forwardx11 yes

import configparser
conf = configparser.ConfigParser()

conf.read('conf.ini')

if 'user' in conf["bitbucket.org"]:  # test whether an option exists
    print('in')

import configparser
conf = configparser.ConfigParser()
conf.read("conf_text.ini")
conf.add_section("group3")
conf["group3"]['name'] = "alex"
conf["group3"]['age'] = '22'
with open('conf_test_mew.ini','w') as f:  # write the modified configuration to a new file
    conf.write(f)

import configparser
conf = configparser.ConfigParser()
conf.read("conf_text.ini")

#conf.remove_option("group1",'k2')
conf.remove_section("group1")
with open('conf_test2_mew.ini','w') as f:  # save after removing
    conf.write(f)
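Every value read through the dictionary interface is a string. When a number or a yes/no flag is needed, the typed getters do the conversion. A minimal sketch using a shortened copy of the sample configuration, parsed from a string rather than a file; the Timeout key is deliberately absent to show the fallback.

```python
import configparser

INI = """
[DEFAULT]
ServerAliveInterval = 45
Compression = yes

[topsecret.server.com]
Port = 50022
ForwardX11 = no
"""

conf = configparser.ConfigParser()
conf.read_string(INI)
section = conf['topsecret.server.com']

port = section.getint('Port')                      # '50022' -> 50022
forward = section.getboolean('ForwardX11')         # 'no' -> False
interval = section.getint('ServerAliveInterval')   # inherited from DEFAULT
timeout = section.getint('Timeout', fallback=30)   # key absent, so the fallback is used

print(port, forward, interval, timeout)
```

getboolean() accepts yes/no, on/off, true/false, and 1/0 in any letter case, which matches the conventions of most ini files.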

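hashlib module

Introduction to hash algorithms

HASH

A hash function takes an input of arbitrary length (also called the pre-image) and transforms it with a hash algorithm into a fixed-length output, the hash value. The transformation is a compressing map: the space of hash values is usually far smaller than the space of inputs, so different inputs may hash to the same output, and the input cannot be uniquely determined from the hash value.

Put simply, it is a function that compresses a message of any length into a fixed-length message digest.

Hashes are mainly used in information security. They turn information of varying length into a scrambled fixed-length code (128 bits in the case of MD5) called the hash value. A hash can also be described as a mapping between data content and the address where the data is stored, which is how hash tables use it.

MD5

What is the MD5 algorithm

The MD5 Message-Digest Algorithm is a widely used cryptographic hash function that produces a 128-bit hash value and is used to check that information was transmitted intact. Its predecessors include MD2, MD3, and MD4.

What MD5 does

Takes input of any length and outputs a 128-bit value (a digital fingerprint);
Different inputs almost always give different results (collisions exist but do not occur by accident in practice);

Characteristics of MD5

Compression: the MD5 value has the same fixed length for data of any length
Easy to compute: calculating the MD5 value from the original data is cheap
Sensitivity to change: any change to the original data, even a single byte, produces a very different MD5 value
Collision resistance: the design goal was that, given the original data and its MD5 value, finding other data with the same MD5 value (forged data) should be very hard. Practical collision attacks on MD5 have since been published, so it should no longer be relied on for security.

Is MD5 reversible?

No. MD5 is a hash function, and part of the information in the original text is lost during the computation.

Uses of MD5

Detecting tampering:
- Suppose I send an electronic document. Before sending, I compute its MD5 result a. After receiving the document, the other party computes its MD5 result b. If a and b are the same, the document was not altered in transit.
- Suppose I offer a file for download. To stop someone from slipping a trojan into the installer, I can publish the MD5 result of the installation file on the website.
- SVN also uses MD5 to detect whether a file has been modified after checkout.
Avoiding plaintext storage:
- Many websites store the MD5 value of each user's password in the database rather than the password itself, so that someone who obtains the database does not directly learn the passwords. (For example, UNIX systems store user passwords in the file system after hashing them with MD5 or a similar algorithm. When a user logs in, the system hashes the password entered and compares it with the stored value to decide whether the password is correct. In this way the system can verify a login without knowing the plaintext password. This keeps passwords hidden from users with administrator rights and also makes them somewhat harder to crack.) Note that plain, unsalted MD5 is now considered too weak for password storage; a salted, deliberately slow algorithm is the current practice.
Non-repudiation (digital signatures):
- This requires a third-party certification authority. For example, A writes a file and the authority produces a digest of it with MD5 and records it. If A later claims not to have written the file, the authority regenerates the digest and compares it with the recorded one; if they match, A wrote it. This is the idea behind a "digital signature".

SHA-1

The Secure Hash Algorithm is mainly used with the Digital Signature Algorithm (DSA) defined in the Digital Signature Standard (DSS). For a message shorter than 2^64 bits, SHA-1 produces a 160-bit message digest. When the message is received, the digest can be used to verify the integrity of the data.

SHA is a family of cryptographic hash functions designed by the US National Security Agency and published by the US National Institute of Standards and Technology.

After Professor Wang Xiaoyun of Shandong University published collision attacks on MD5 and SHA-1 around 2004-2005, attention moved to SHA-224, SHA-256, SHA-384, and SHA-512. The longer the digest, the harder it is to attack, but the longer the digest takes to compute. SHA-256 is currently the most popular of these.

MD5 compared with SHA-1

Because MD5 and SHA-1 both derive from MD4, they are similar in structure and strength. The biggest difference is that the SHA-1 digest is 32 bits longer than the MD5 digest. Against brute force, producing any message whose digest equals a given digest takes on the order of 2^128 operations for MD5 and 2^160 for SHA-1. Producing two messages with the same digest takes on the order of 2^64 operations for MD5 and 2^80 for SHA-1. SHA-1 is therefore stronger against brute force. However, because SHA-1 has more rounds than MD5 (80 versus 64) and a larger buffer to process (160 bits versus 128 bits), it runs more slowly than MD5.

The Python module

hashlib is used for hashing operations. In 3.x it replaces the md5 and sha modules and provides the SHA1, SHA224, SHA256, SHA384, SHA512, and MD5 algorithms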

import hashlib

m = hashlib.md5()
m.update(b"Hello")
m.update(b"It's me")
print(m.digest())
m.update(b"It's been a long time since last time we ...")

print(m.digest())  # hash in binary (bytes) form
print(len(m.hexdigest()))  # length of the hash in hexadecimal form
'''
def digest(self, *args, **kwargs): # real signature unknown
    """ Return the digest value as a string of binary data. """
    pass

def hexdigest(self, *args, **kwargs): # real signature unknown
    """ Return the digest value as a string of hexadecimal digits. """
    pass

'''
import hashlib

# ######## md5 ########

hash = hashlib.md5()
hash.update(b'admin')  # update() requires bytes in Python 3
print(hash.hexdigest())

# ######## sha1 ########

hash = hashlib.sha1()
hash.update(b'admin')
print(hash.hexdigest())

# ######## sha256 ########

hash = hashlib.sha256()
hash.update(b'admin')
print(hash.hexdigest())


# ######## sha384 ########

hash = hashlib.sha384()
hash.update(b'admin')
print(hash.hexdigest())

# ######## sha512 ########

hash = hashlib.sha512()
hash.update(b'admin')
print(hash.hexdigest())

Using an MD5 value with hashlib

# add the following to a login routine; data1 is assumed to be the stored user record
password = input('Enter password:')
m = hashlib.md5()
m.update(password.encode())
if m.hexdigest() == data1['password']:
    print('Login successful')
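For new code, a salted and deliberately slow hash is generally preferred over a bare MD5 comparison. The sketch below is one possible approach using hashlib.pbkdf2_hmac, not the method used in the login snippet above; the helper names hash_password and verify_password and the iteration count are assumptions chosen for the example.

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None, iterations=100_000):
    """Return (salt, digest) for the password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)          # a fresh random salt for each password
    digest = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, iterations)
    return salt, digest

def verify_password(password, salt, expected, iterations=100_000):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)

salt, stored = hash_password('s3cret')     # store both salt and digest
ok = verify_password('s3cret', salt, stored)
bad = verify_password('guess', salt, stored)
print(ok, bad)
```

The salt means two users with the same password get different digests, and the iteration count makes each guess expensive for an attacker who has obtained the stored values.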

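subprocess module

We often need to run a system command or script from Python. A shell command runs outside your Python process: each command starts a new process. The older way to call a system command or script from Python is os.system,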

>>> os.system('uname -a')
Darwin Alexs-MacBook-Pro.local 15.6.0 Darwin Kernel Version 15.6.0: Sun Jun  4 21:43:07 PDT 2017; root:xnu-3248.70.3~1/RELEASE_X86_64 x86_64
0

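Three ways to run a command

subprocess.run(*popenargs, input=None, timeout=None, check=False, **kwargs)  # officially recommended
subprocess.call(*popenargs, timeout=None, **kwargs)  # does much the same as the above; an older interface
subprocess.Popen()  # the low-level interface that the others are built on

The run method

# standard form
subprocess.run(['df','-h'],stderr=subprocess.PIPE,stdout=subprocess.PIPE,check=True)

# commands that use a pipe | need to be written like this
subprocess.run('df -h|grep disk1',shell=True)  # shell=True hands the whole command line to the system shell instead of having Python split it into arguments

The call() method

# run a command and return its exit status, 0 or non-zero
>>> retcode = subprocess.call(["ls", "-l"])

# run a command; return normally if the exit status is 0, otherwise raise an exception
>>> subprocess.check_call(["ls", "-l"])

# take the command as a string and return a tuple: the first element is the exit status, the second the command output
>>> subprocess.getstatusoutput('ls /bin/ls')
(0, '/bin/ls')

# take the command as a string and return its output
>>> subprocess.getoutput('ls /bin/ls')
'/bin/ls'

# run a command and return its output; note that the output is returned, not printed, and here it is assigned to res
>>> res=subprocess.check_output(['ls','-l'])
>>> res
b'total 0\ndrwxr-xr-x 12 alex staff 408 Nov 2 11:05 OldBoyCRM\n'

The Popen() method

Common parameters:

args: the shell command, either a string or a sequence (such as a list or tuple)
stdin, stdout, stderr: the program's standard input, output, and error handles
preexec_fn: Unix only; a callable object that is called in the child process just before the command runs
shell: as above
cwd: sets the working directory of the child process
env: the environment variables of the child process. If env = None, the child inherits the parent's environment.

a=subprocess.run('sleep 10',shell=True,stdout=subprocess.PIPE)
a=subprocess.Popen('sleep 10',shell=True,stdout=subprocess.PIPE)

The difference is that Popen returns as soon as the command has been started, without waiting for the result. What is the advantage?

If the command or script you call takes 10 minutes, your main program does not have to sit waiting for 10 minutes. It can carry on with other work and check every so often whether the command has finished.

Popen returns an object through which you can get the command's result or status. The object has the following methods and attributes

poll()

Check if child process has terminated. Returns returncode

wait()

Wait for child process to terminate. Returns returncode attribute.

terminate() Terminate the process with SIGTERM

kill() Kill the process with SIGKILL

communicate() Interact with the process: send data to stdin, read the output from stdout, then wait for the process to finish

>>> a = subprocess.Popen('python3 guess_age.py',stdout=subprocess.PIPE,stderr=subprocess.PIPE,stdin=subprocess.PIPE,shell=True)

>>> a.communicate(b'22')

(b'your guess:try bigger\n', b'')

send_signal(signal.xxx) Send a system signal to the process

pid The process ID of the started process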
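The difference between run() and Popen() can be tried on any platform by using the Python interpreter itself as the child command. This is a minimal sketch; the one-line child programs are invented for the example, and sys.executable is used so that no external command such as df or ls is required.

```python
import subprocess
import sys

# run(): waits for the command to finish and collects its output
done = subprocess.run(
    [sys.executable, '-c', 'print("hello from child")'],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    universal_newlines=True,    # decode the output to str instead of bytes
    check=True,                 # raise CalledProcessError on a non-zero exit status
)
output = done.stdout.strip()

# Popen(): returns immediately; communicate() sends input, reads output, and waits
child = subprocess.Popen(
    [sys.executable, '-c', 'import sys; print(int(sys.stdin.read()) * 2)'],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, universal_newlines=True,
)
reply, _ = child.communicate('21')
status = child.returncode

print(output, reply.strip(), status)
```

Passing the command as a list avoids shell=True, so no shell is involved and arguments containing spaces or special characters need no quoting.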

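logging module

Many programs need to keep logs, and the logs contain both normal access records and error or warning output. Python's logging module provides a standard logging interface that can store logs in various formats. Log messages fall into five levels: debug(), info(), warning(), error(), and critical(). Let's see how to use it.

# simplest usage
import logging

logging.warning("user [alex] attempted wrong password more than 3 times")
logging.critical("server is down")

# output
WARNING:root:user [alex] attempted wrong password more than 3 times
CRITICAL:root:server is down

Writing the log to a file is just as easy

import logging

logging.basicConfig(filename='example.log',level=logging.INFO)
logging.debug('This message should go to the log file')
logging.info('So should this')
logging.warning('And this, too')

Here level=logging.INFO sets the logging threshold to INFO: only messages at INFO level or higher are written to the file. In this example the first message is not recorded; to record debug messages as well, change the level to DEBUG.

Custom log format

The log format above is missing the time, and a log without timestamps is not much use, so let's add it!

import logging
logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p')
logging.warning('is when this event was logged.')

# output
12/12/2010 11:46:36 AM is when this event was logged.

Logging to the screen and a file at the same time

To print the log on the screen and write it to a file at the same time, you need to know a little more

Logging with Python's logging module involves four main classes; the summary in the official documentation puts it best:

Loggers expose the interface that application code uses directly;
Handlers send the log records (created by loggers) to the appropriate destination;
Filters provide a finer-grained facility for determining which log records to output;
Formatters specify the layout of log records in the final output.

Each of the four is described below.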

logger

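Every program obtains a Logger before emitting messages. A Logger usually corresponds to a module name of the program. For example, the GUI module of a chat tool might get its Logger like this:

LOG=logging.getLogger("chat.gui")

and the core module like this:

LOG=logging.getLogger("chat.kernel")

Handlers and filters can also be attached

Logger.setLevel(lel): sets the lowest level to handle; messages below lel are ignored. debug is the lowest built-in level and critical the highest
Logger.addFilter(filt), Logger.removeFilter(filt): add or remove the given filter
Logger.addHandler(hdlr), Logger.removeHandler(hdlr): add or remove the given handler

Logger.debug(), Logger.info(), Logger.warning(), Logger.error(), Logger.critical(): log a message at the corresponding level

handler

A handler sends messages to a particular destination. Python's logging system offers many handlers: some write to the console, some to files, and some send messages over the network. If those are not enough, you can write your own. Several handlers can be attached with addHandler()

Handler.setLevel(lel): sets the level to handle; messages below lel are ignored
Handler.setFormatter(): chooses a format for this handler
Handler.addFilter(filt), Handler.removeFilter(filt): add or remove a filter object

Each Logger can have several Handlers attached. Some commonly used ones:

logging.StreamHandler writes to any file-like object such as sys.stdout or sys.stderr.
logging.FileHandler is similar to StreamHandler but writes to a file, which it opens for you
logging.handlers.RotatingFileHandler

This handler is like FileHandler but manages the file size. When the file reaches a given size, it renames the current log file and creates a new one with the original name. Suppose the log file is chat.log: when it reaches the size limit, RotatingFileHandler renames it to chat.log.1. If chat.log.1 already exists, it is first renamed to chat.log.2, and so on. Finally chat.log is created again and logging continues. Its signature is:

RotatingFileHandler( filename[, mode[, maxBytes[, backupCount]]])

filename and mode are the same as for FileHandler.

- maxBytes sets the maximum size of the log file. If maxBytes is 0 the log file can grow without limit and the renaming described above never happens.
- backupCount sets the number of backup files to keep. With a value of 2, for example, when the renaming takes place the existing chat.log.2 is not renamed but deleted.

logging.handlers.TimedRotatingFileHandler

This handler is similar to RotatingFileHandler, except that it does not decide when to start a new log file by checking the file size; it starts a new one at fixed time intervals. The renaming works in the same way, but the suffix added to old files is a timestamp rather than a number. Its signature is:

TimedRotatingFileHandler( filename [,when [,interval [,backupCount]]])

filename and backupCount mean the same as for RotatingFileHandler.

interval is the time interval.

when is a string giving the unit of the interval; it is case-insensitive. Possible values:

S seconds
M minutes
H hours
D days
W0-W6 weekly on the given weekday (0 is Monday)
midnight roll over at midnight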

formatter 组件

日志的formatter是个独立的组件，可以跟handler组合

fh = logging.FileHandler("access.log")
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

fh.setFormatter(formatter)  # bind the formatter to fh
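To see what a format string produces without setting up a handler, a Formatter can be applied to a record directly. In this sketch the record is built with logging.makeLogRecord, and its field values ("access", "GET /index.html") are made up for illustration; %(asctime)s is left out so the output is the same on every run.

```python
import logging

demo_formatter = logging.Formatter('%(name)s - %(levelname)s - %(message)s')

# Build a record by hand; normally the logger creates it for you.
record = logging.makeLogRecord({
    "name": "access",
    "levelno": logging.INFO,
    "levelname": "INFO",
    "msg": "GET /index.html",
})

line = demo_formatter.format(record)
print(line)   # access - INFO - GET /index.html
```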

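filter component

If you want to filter log content, you can define a custom filter

class IgnoreBackupLogFilter(logging.Filter):
    """Ignore log records that contain 'db backup'."""
    def filter(self, record):  # the method must have this name and signature
        return "db backup" not in record.getMessage()

# Note: filter() returns True or False, and the logger uses that value to decide whether to emit the record

Then add this filter to the logger

logger.addFilter(IgnoreBackupLogFilter())

Of the log calls below, the ones that match the filter condition are dropped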

logger.debug("test ....")
logger.info("test info ....")
logger.warning("start to run db backup job ....")
logger.error("test error ....")

A complete example that logs to both the screen and a file, with a filter

import logging


class IgnoreBackupLogFilter(logging.Filter):
    """Ignore log records that contain 'db backup'."""
    def filter(self, record):  # the method must have this name and signature
        return "db backup" not in record.getMessage()


# console handler
ch = logging.StreamHandler()
ch.setLevel(logging.INFO)
# file handler
fh = logging.FileHandler('mysql.log')
# fh.setLevel(logging.WARNING)

# formatter
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
# bind the formatter to both handlers
ch.setFormatter(formatter)
fh.setFormatter(formatter)

logger = logging.getLogger("Mysql")
logger.setLevel(logging.DEBUG)  # the logger's level is checked first, before any handler's level

# add the handlers to the logger instance
logger.addHandler(ch)
logger.addHandler(fh)

# add the filter
logger.addFilter(IgnoreBackupLogFilter())

logger.debug("test ....")
logger.info("test info ....")
logger.warning("start to run db backup job ....")
logger.error("test error ....")


Automatic log file rotation example

import logging

from logging import handlers

logger = logging.getLogger(__name__)

log_file = "timelog.log"
#fh = handlers.RotatingFileHandler(filename=log_file,maxBytes=10,backupCount=3)
fh = handlers.TimedRotatingFileHandler(filename=log_file,when="S",interval=5,backupCount=3)


formatter = logging.Formatter('%(asctime)s %(module)s:%(lineno)d %(message)s')

fh.setFormatter(formatter)

logger.addHandler(fh)


logger.warning("test1")
logger.warning("test12")
logger.warning("test13")
logger.warning("test14")


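re module

A regular expression is a matching rule for strings. Most programming languages support them; in Python the corresponding module is re

Common expression rules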

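'.'     By default matches any single character except \n; with the DOTALL flag it matches any character, including newline
'^'     Matches the start of the string; with the MULTILINE flag it also matches at the start of each line, so re.search(r"^a","\nabc\neee",flags=re.MULTILINE) finds a match
'$'     Matches the end of the string; with the MULTILINE flag it also matches at the end of each line, so re.search('foo.$','foo1\nfoo2\n',re.MULTILINE).group() returns 'foo1'
'*'     Matches the preceding character 0 or more times, re.search('a*','aaaabac').group() returns 'aaaa'
'+'     Matches the preceding character 1 or more times, re.findall("ab+","ab cd abb bba") returns ['ab', 'abb']
'?'     Matches the preceding character 0 or 1 time, re.search('b?','alex').group() matches b 0 times and returns ''
'{m}'   Matches the preceding character exactly m times, re.search('b{3}','alexbbbs').group() returns 'bbb'
'{n,m}' Matches the preceding character n to m times, re.findall("ab{1,3}","abb abc abbcbbb") returns ['abb', 'ab', 'abb']
'|'     Matches the expression on the left or on the right of |, re.search("abc|ABC","ABCBabcCD").group() returns 'ABC'
'(...)' Grouping, re.search("(abc){2}a(123|45)", "abcabca456c").group() returns 'abcabca45'


'\A'    Matches only at the start of the string; re.search(r"\Aabc","alexabc") finds nothing, equivalent to re.match('abc',"alexabc") or to ^ without MULTILINE
'\Z'    Matches only at the end of the string; similar to $, except that $ also matches just before a trailing newline
'\d'    Matches a digit 0-9
'\D'    Matches a non-digit
'\w'    Matches [A-Za-z0-9_] (plus other Unicode word characters in str patterns)
'\W'    Matches anything that \w does not match
'\s'    Matches a whitespace character such as a space, \t, \n, \r; re.search(r"\s+","ab\tc1\n3").group() returns '\t'

'(?P<name>...)' Named group, re.search("(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})","371481199306143242").groupdict() returns {'province': '3714', 'city': '81', 'birthday': '1993'}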
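A few of the rules above can be checked directly in the interpreter. The snippet below is a small sketch that reuses the sample strings from the table; the variable names are chosen here for illustration, and raw strings (the r prefix) are used so that backslashes reach the re module unchanged.

```python
import re

star = re.search(r"a*", "aaaabac").group()
plus = re.findall(r"ab+", "ab cd abb bba")
braces = re.findall(r"ab{1,3}", "abb abc abbcbbb")
multiline = re.search(r"foo.$", "foo1\nfoo2\n", re.MULTILINE).group()
whitespace = re.search(r"\s+", "ab\tc1\n3").group()

id_match = re.search(
    r"(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})",
    "371481199306143242",
)

print(star)                  # aaaa
print(plus)                  # ['ab', 'abb']
print(braces)                # ['abb', 'ab', 'abb']
print(multiline)             # foo1
print(repr(whitespace))      # '\t'
print(id_match.groupdict())  # {'province': '3714', 'city': '81', 'birthday': '1993'}
print(id_match.group("city"))  # 81
```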

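re provides the following matching functions

re.match  match from the start of the string
re.search  find the first match anywhere in the string
re.findall  return all matches as the elements of a list
re.split  split the string, using the matches as separators
re.sub  find matches and replace them
re.fullmatch  match only if the whole string matches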
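The differences between these functions are easiest to see side by side. The sketch below runs each of them on one made-up sample string.

```python
import re

text = "alex22jack23rain31"

matched = re.match(r"[a-z]+", text).group()    # anchored at the start
no_match = re.match(r"\d+", text)              # the string does not start with digits
searched = re.search(r"\d+", text).group()     # first match anywhere
found = re.findall(r"\d+", text)               # every match, as a list
parts = re.split(r"\d+", text)                 # digits act as separators
replaced = re.sub(r"\d+", "_", text)           # each run of digits becomes "_"
whole = re.fullmatch(r"[a-z]+\d+", "alex22")   # the pattern must cover the whole string
partial = re.fullmatch(r"[a-z]+", "alex22")    # trailing digits are left over

print(matched)            # alex
print(no_match)           # None
print(searched)           # 22
print(found)              # ['22', '23', '31']
print(parts)              # ['alex', 'jack', 'rain', '']
print(replaced)           # alex_jack_rain_
print(whole is not None)  # True
print(partial)            # None
```

Note the empty string at the end of the re.split result: the input ends with a separator, so the piece after it is empty.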

Software development directory conventions

Foo/
|-- bin/
|   |-- foo
|
|-- foo/
|   |-- tests/
|   |   |-- __init__.py
|   |   |-- test_main.py
|   |
|   |-- __init__.py
|   |-- main.py
|
|-- docs/
|   |-- conf.py
|   |-- abc.rst
|
|-- setup.py
|-- requirements.txt
|-- README

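A brief explanation:

bin/: holds the project's executable files; you can name it script/ or similar instead.
foo/: holds all of the project's source code. (1) All modules and packages in the source code should go in this directory, not in the top-level directory. (2) Its tests/ subdirectory holds the unit tests. (3) The program's entry point is best named main.py.
docs/: holds documentation.
setup.py: the script for installation, deployment, and packaging.
requirements.txt: lists the external Python packages the software depends on.
README: the project's description file.

conf: holds configuration files (account data, log file names or formats, and other settings that users can configure).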
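With this layout, a launcher in bin/ has to be able to import the foo package from the project's top-level directory. One possible approach, shown here only as a sketch, is to compute the project root from the script's own path and put it on sys.path. The helper names project_root and add_project_root_to_path are hypothetical and do not come from the original layout.

```python
import sys
from pathlib import Path


def project_root(script_path):
    """Return the directory two levels above a script such as Foo/bin/foo."""
    return Path(script_path).parent.parent


def add_project_root_to_path(script_path):
    """Put the project root at the front of sys.path so `import foo` works."""
    root = str(project_root(script_path))
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# In a real bin/foo script this would be called with __file__.
print(project_root("Foo/bin/foo"))   # Foo
```

Installing the project through setup.py avoids the need for this kind of path handling, which is one reason the layout includes that file.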

Source: http://www./content-4-27711.html