```python
data = list(range(1, 1000001))
result = []
for num in data:
    squared = num ** 2
    if squared % 2 == 0:
        result.append(squared)
```
This code looks fine, but how does it perform? On my machine it takes about 210 ms. Now look at the optimized version:
```python
data = range(1, 1000001)  # note: the list() call is gone
result = [num ** 2 for num in data if (num ** 2) % 2 == 0]
```
This version is not only more concise, its runtime also drops to about 150 ms. And we can optimize further:
```python
result = (num ** 2 for num in data if (num ** 2) % 2 == 0)
```
With a generator expression, memory usage drops dramatically and processing gets faster, because values are produced lazily instead of being materialized all at once.
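Both claims are easy to check on your own machine. Here is a minimal sketch using `timeit` and `sys.getsizeof`; absolute numbers will differ from the ~210 ms / ~150 ms quoted above, and exact byte counts vary by Python version:

```python
import sys
import timeit

setup = "data = range(1, 1000001)"
listcomp = "[num ** 2 for num in data if (num ** 2) % 2 == 0]"

# Average seconds per run of the list-comprehension version
print(timeit.timeit(listcomp, setup=setup, number=5) / 5)

# Memory: the materialized list vs. the lazy generator object
data = range(1, 1000001)
as_list = [num ** 2 for num in data if (num ** 2) % 2 == 0]
as_gen = (num ** 2 for num in data if (num ** 2) % 2 == 0)
print(sys.getsizeof(as_list))  # several megabytes of pointer storage
print(sys.getsizeof(as_gen))   # a couple hundred bytes
```

Note that `sys.getsizeof` on the generator only measures the generator object itself, which is exactly the point: it never holds the 500,000 results in memory.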
Why is your Python slower than everyone else's?
Python performance gaps mainly come from the following sources:
- Poor choice of data structures: lists are overused, while tuples and sets are underused
- Inefficient looping: always reaching for `for` loops instead of comprehensions and built-in functions
- Repeated computation: like recomputing `num ** 2` in the example above
- Ignoring Python's built-in tools: standard-library modules such as `functools` and `itertools` are rarely used
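The repeated `num ** 2` above is exactly the kind of waste the third point describes. One way to compute the square only once per element is an assignment expression (the walrus operator, available since Python 3.8) — a minimal sketch:

```python
data = range(1, 1000001)
# (s := num ** 2) binds the square once per element, so the
# comprehension body reuses it instead of recomputing it
result = [s for num in data if (s := num ** 2) % 2 == 0]
```

The same trick works inside a generator expression if you also want the memory savings.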
Let's look at another example, this time text processing. Suppose we need to count how often each word appears in a large text file. A common approach is:
```python
word_count = {}
with open('large_file.txt') as f:
    for line in f:
        for word in line.split():
            if word not in word_count:
                word_count[word] = 0
            word_count[word] += 1
```
A more efficient version is:
```python
from collections import defaultdict

word_count = defaultdict(int)
with open('large_file.txt') as f:
    for line in f:
        for word in line.split():
            word_count[word] += 1
```
Going a step further, we can use Counter:
```python
from collections import Counter

with open('large_file.txt') as f:
    word_count = Counter(word for line in f for word in line.split())
```
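All three variants produce identical counts. A self-contained check, using `io.StringIO` with a small sample string to stand in for `large_file.txt`:

```python
import io
from collections import Counter, defaultdict

# Sample text standing in for the file
sample = "the quick brown fox\nthe lazy dog\nthe fox"

# Plain-dict version
plain = {}
for line in io.StringIO(sample):
    for word in line.split():
        if word not in plain:
            plain[word] = 0
        plain[word] += 1

# defaultdict version
dd = defaultdict(int)
for line in io.StringIO(sample):
    for word in line.split():
        dd[word] += 1

# Counter version
counted = Counter(word for line in io.StringIO(sample) for word in line.split())

assert plain == dict(dd) == dict(counted)
print(counted.most_common(2))  # [('the', 3), ('fox', 2)]
```

Counter also gives you `most_common()` for free, which the hand-rolled dict versions would need extra sorting code to match.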
```python
import concurrent.futures
import math

def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True

numbers = range(10**6, 10**6 + 1000)

# Traditional approach
primes = [n for n in numbers if is_prime(n)]

# Parallel approach: executor.map returns booleans, so pair them
# back with the inputs to recover the primes themselves.
# The __main__ guard is required on platforms that spawn workers.
if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor() as executor:
        flags = executor.map(is_prime, numbers)
        primes = [n for n, flag in zip(numbers, flags) if flag]
```