pandas对象中拥有一组常用的数学和统计方法,跟NumPy数组相比,它们是基于没有缺失数据的加上构建的。
In [71]: df = DataFrame([[1.4,np.nan],[7.1,-4.5],[np.nan,np.nan],[0.75,-1.3]],index=['a','b','c','d'],columns=['one','two'])
In [72]: df
Out[72]:
one two
a 1.40 NaN
b 7.10 -4.5
c NaN NaN
d 0.75 -1.3
In [73]: df.sum()
Out[73]:
one 9.25
two -5.80
dtype: float64
NA值会自动排除,当然我们也可以通过skipna参数禁用该功能
有些方法可以累计统计数据
df.cumsum()
Out[74]:
one two
a 1.40 NaN
b 8.50 -4.5
c NaN NaN
d 9.25 -5.8
idmin和idmax返回间接统计,即返回达到最大致或最小值的索引
In [77]: df.idxmax()
Out[77]:
one b
two d
dtype: object
In [78]: df.idxmin()
Out[78]:
one d
two b
dtype: object
常用的描述和统计函数,详情见
http://blog.csdn.net/u011707148/article/details/76822877
相关系数和协方差
有些统计时通过参数计算得到。看个例子,它的数据来自Yahoo!Finace的股票价格和成交量
pip install pandas_datareader
In [81]: from pandas_datareader import data as web
In [82]: all_data={}
In [83]: for ticker in ['AAPL','IBM','GOOG']:
...: all_data[ticker] = web.get_data_yahoo(ticker,'1/1/2000','1/1/2010')
...:
In [84]: price = DataFrame({tic:data['Adj Close'] for tic,data in all_data.iteritems()})
In [85]: volume = DataFrame({tic:data['Volume'] for tic,data in all_data.iteritems()})
In [86]: returns = price.pct_change() #计算百分比
In [87]: returns.tail()
Out[87]:
AAPL GOOG IBM
Date
2009-12-24 0.449644 -0.585705 0.033312
2009-12-28 0.286843 0.977260 0.359968
2009-12-29 -0.309294 -0.160838 -0.278636
2009-12-30 -0.074395 0.028634 -0.075809
2009-12-31 -0.144809 -0.167703 0.092164
有关的pandas_datareader用法,详情见
https://pandas-datareader./en/latest/
获取的数据集格式为:
all_data
{'AAPL': Open High Low Close Adj Close Volume
Date
2000-01-03 3.745536 4.017857 3.631696 3.997768 3.610740 133949200
2000-01-04 3.866071 3.950893 3.613839 3.660714 3.306317 128094400
2000-01-05 3.705357 3.948661 3.678571 3.714286 3.354702 194580400
2000-01-06 3.790179 3.821429 3.392857 3.392857 3.064391 191993200
2000-01-07 3.446429 3.607143 3.410714 3.553571 3.209547 115183600
2000-01-10 3.642857 3.651786 3.383929 3.491071 3.153097 126266000
2000-01-11 3.426339 3.549107 3.232143 3.312500 2.991814 110387200
2000-01-12 3.392857 3.410714 3.089286 3.113839 2.812385 244017200
2000-01-13 3.374439 3.526786 3.303571 3.455357 3.120841 258171200
2000-01-14 3.571429 3.651786 3.549107 3.587054 3.239787 97594000
2000-01-18 3.607143 3.785714 3.587054 3.712054 3.352686 114794400
2000-01-19 3.772321 3.883929 3.691964 3.805804 3.437360 149410800
2000-01-20 4.125000 4.339286 4.053571 4.053571 3.661140 457783200
2000-01-21 4.080357 4.080357 3.935268 3.975446 3.590579 123981200
2000-01-24 3.872768 4.026786 3.754464 3.794643 3.427280 110219200
2000-01-25 3.750000 4.040179 3.656250 4.008929 3.620820 124286400
2000-01-26 3.928571 4.078125 3.919643 3.935268 3.554291 91789600
2000-01-27 3.886161 4.035714 3.821429 3.928571 3.548242 85036000
2000-01-28 3.863839 3.959821 3.593750 3.629464 3.278092 105837200
2000-01-31 3.607143 3.709821 3.375000 3.705357 3.346637 175420000
2000-02-01 3.714286 3.750000 3.571429 3.580357 3.233739 79508800
2000-02-02 3.598214 3.647321 3.464286 3.529018 3.187370 116048800
2000-02-03 3.582589 3.723214 3.580357 3.689732 3.332525 118798400
2000-02-04 3.712054 3.928571 3.700893 3.857143 3.483729 106330000
2000-02-07 3.857143 4.080357 3.783482 4.073661 3.679286 110266800
2000-02-08 4.071429 4.147321 3.973214 4.102679 3.705494 102160800
2000-02-09 4.075893 4.183036 4.015625 4.022321 3.632916 74841200
2000-02-10 4.031250 4.066964 3.928571 4.053571 3.661140 75745600
2000-02-11 4.058036 4.075893 3.866071 3.883929 3.507921 53062800
2000-02-14 3.904018 4.138393 3.879464 4.136161 3.735734 91884800
... ... ... ... ... ...
2009-11-18 29.505714 29.571428 29.142857 29.422857 26.574398 93580200
2009-11-19 29.230000 29.230000 28.542856 28.644285 25.871201 135581600
2009-11-20 28.450001 28.627142 28.251429 28.559999 25.795074 101666600
2009-11-23 29.000000 29.428572 28.992857 29.411428 26.564075 118724200
2009-11-24 29.332857 29.411428 28.985714 29.205715 26.378279 79609600
2009-11-25 29.342857 29.378571 29.108572 29.170000 26.346025 71613500
2009-11-27 28.459999 28.994286 28.338572 28.655714 25.881527 73814300
2009-11-30 28.730000 28.811428 28.395714 28.558571 25.793781 106214500
2009-12-01 28.891428 28.967142 28.118572 28.138571 25.414444 116440800
2009-12-02 28.422857 28.774286 27.964285 28.032858 25.318968 178815000
2009-12-03 28.202858 28.425714 28.038572 28.068571 25.351227 112179900
2009-12-04 28.528572 28.554285 27.182858 27.617144 24.943501 206721200
2009-12-07 27.617144 27.681429 26.954287 26.992857 24.379650 178689700
2009-12-08 27.051428 27.478571 26.957144 27.124287 24.498354 172599700
2009-12-09 27.325714 28.308571 27.187143 28.257143 25.521540 171195500
2009-12-10 28.500000 28.528572 28.017143 28.061428 25.344772 122417400
2009-12-11 28.254286 28.285715 27.632856 27.809999 25.117678 107443700
2009-12-14 27.910000 28.204287 27.508572 28.139999 25.415737 123947600
2009-12-15 27.975714 28.215714 27.610001 27.738571 25.053171 104864900
2009-12-16 27.871429 28.071428 27.792856 27.861429 25.164133 88246200
2009-12-17 27.751429 27.857143 27.285715 27.408571 24.755119 97209700
2009-12-18 27.595715 27.928572 27.514286 27.918571 25.215746 152192600
2009-12-21 28.007143 28.535715 27.952858 28.318571 25.577024 152976600
2009-12-22 28.491428 28.692858 28.379999 28.622858 25.851851 87378900
2009-12-23 28.742857 28.911428 28.687143 28.871429 26.076355 86381400
2009-12-24 29.078571 29.907143 29.049999 29.862858 26.971806 125222300
2009-12-28 30.245714 30.564285 29.944286 30.230000 27.303402 161141400
2009-12-29 30.375713 30.388571 29.818571 29.871429 26.979546 111301400
2009-12-30 29.832857 30.285715 29.758572 30.234285 27.307270 103021100
2009-12-31 30.447144 30.478571 30.080000 30.104286 27.189861 88102700
[2515 rows x 6 columns],
'GOOG': Open High Low Close Adj Close Date
2004-08-19 49.813286 51.835709 47.800831 49.982655 49.982655
2004-08-20 50.316402 54.336334 50.062355 53.952770 53.952770
2004-08-23 55.168217 56.528118 54.321388 54.495735 54.495735
2004-08-24 55.412300 55.591629 51.591621 52.239193 52.239193
2004-08-25 52.284027 53.798351 51.746044 52.802086 52.802086
2004-08-26 52.279045 53.773445 52.134586 53.753517 53.753517
2004-08-27 53.848164 54.107193 52.647663 52.876804 52.876804
2004-08-30 52.443428 52.548038 50.814533 50.814533 50.814533
2004-08-31 50.958992 51.661362 50.889256 50.993862 50.993862
2004-09-01 51.158245 51.292744 49.648903 49.937820 49.937820
2004-09-02 49.409801 50.993862 49.285267 50.565468 50.565468
2004-09-03 50.286514 50.680038 49.474556 49.818268 49.818268
2004-09-07 50.316402 50.809555 49.619015 50.600338 50.600338
2004-09-08 50.181908 51.322632 50.062355 50.958992 50.958992
2004-09-09 51.073563 51.163227 50.311420 50.963974 50.963974
2004-09-10 50.610302 53.081039 50.460861 52.468334 52.468334
2004-09-13 53.115910 54.002586 53.031227 53.549286 53.549286
2004-09-14 53.524376 55.790882 53.195610 55.536835 55.536835
2004-09-15 55.073570 56.901718 54.894241 55.790882 55.790882
2004-09-16 55.960247 57.683788 55.616535 56.772205 56.772205
2004-09-17 56.996365 58.525631 56.562988 58.525631 58.525631
2004-09-20 58.256641 60.572956 58.166977 59.457142 59.457142
2004-09-21 59.681301 59.985161 58.535595 58.699978 58.699978
2004-09-22 58.480801 59.611561 58.186901 58.968971 58.968971
2004-09-23 59.198112 61.086033 58.291508 60.184414 60.184414
2004-09-24 60.244190 61.818291 59.656395 59.691261 59.691261
2004-09-27 59.556767 60.214302 58.680054 58.909195 58.909195
2004-09-28 60.423519 63.462128 59.880554 63.193138 63.193138
2004-09-29 63.113434 67.257904 62.879314 65.295258 65.295258
2004-09-30 64.707458 65.902977 64.259140 64.558022 64.558022
... ... ... ... ...
2009-11-18 287.248322 288.309357 284.966888 287.248322 287.248322
2009-11-19 285.813690 285.928284 283.935730 285.425171 285.425171
2009-11-20 283.686676 284.732758 283.636871 283.915802 283.915802
2009-11-23 287.168610 292.204742 286.854797 290.087677 290.087677
2009-11-24 290.172363 291.054047 287.193542 290.456299 290.456299
2009-11-25 292.110107 292.433899 290.257050 291.776337 291.776337
2009-11-27 284.932007 290.142487 284.418915 288.797516 288.797516
2009-11-30 289.230896 290.745209 287.477478 290.411469 290.411469
2009-12-01 292.966888 294.506104 290.411469 293.833649 293.833649
2009-12-02 294.396515 295.397766 292.015442 292.658051 292.658051
2009-12-03 293.420197 294.620697 291.407745 291.776337 291.776337
2009-12-04 295.402771 296.304382 288.508606 291.412720 291.412720
2009-12-07 291.014221 293.245850 289.415192 292.030396 292.030396
2009-12-08 290.660522 294.227173 289.913330 292.428894 292.428894
2009-12-09 292.653076 293.564636 290.700378 293.410217 293.410217
2009-12-10 294.117584 296.244598 294.102631 294.645599 294.645599
2009-12-11 296.229645 296.264526 292.767639 294.152435 294.152435
2009-12-14 296.563416 297.539734 295.198517 296.752686 296.752686
2009-12-15 295.542236 297.076477 294.391541 295.462524 295.462524
2009-12-16 298.182343 299.064026 297.205994 297.763916 297.763916
2009-12-17 297.106384 297.704132 295.771362 295.861053 295.861053
2009-12-18 296.902130 298.346710 296.389069 297.096405 297.096405
2009-12-21 297.689178 298.800018 296.722809 298.222198 298.222198
2009-12-22 299.547211 299.626923 298.306885 299.437622 299.437622
2009-12-23 300.623199 305.290710 300.299408 304.697906 304.697906
2009-12-24 305.320587 308.603271 304.991821 308.085236 308.085236
2009-12-28 309.669281 311.826202 308.085236 310.272034 310.272034
2009-12-29 311.203522 311.253357 307.990570 308.543518 308.543518
2009-12-30 308.095184 310.202271 307.851105 310.202271 310.202271
2009-12-31 311.208527 311.532288 308.832428 308.832428 308.832428
Volume
Date
2004-08-19 44871300
2004-08-20 22942800
2004-08-23 18342800
2004-08-24 15319700
2004-08-25 9232100
2004-08-26 7128600
2004-08-27 6241200
2004-08-30 5221400
2004-08-31 4941200
2004-09-01 9181600
2004-09-02 15190400
2004-09-03 5176800
2004-09-07 5875200
2004-09-08 5009200
2004-09-09 4080900
2004-09-10 8740200
2004-09-13 7881300
2004-09-14 10880300
2004-09-15 10763900
2004-09-16 9310200
2004-09-17 9517400
2004-09-20 10679200
2004-09-21 7263000
2004-09-22 7617100
2004-09-23 8576100
2004-09-24 9166700
2004-09-27 7099600
2004-09-28 17009400
2004-09-29 30661400
2004-09-30 13823300
...
2009-11-18 3110700
2009-11-19 4352200
2009-11-20 4027400
2009-11-23 5114000
2009-11-24 3222400
2009-11-25 2933300
2009-11-27 2779500
2009-11-30 3463000
2009-12-01 4657900
2009-12-02 3338800
2009-12-03 2868000
2009-12-04 5046000
2009-12-07 3284500
2009-12-08 3059300
2009-12-09 3575200
2009-12-10 3349000
2009-12-11 3452800
2009-12-14 3841100
2009-12-15 4577800
2009-12-16 5639800
2009-12-17 5297300
2009-12-18 7089400
2009-12-21 5161600
2009-12-22 3775600
2009-12-23 4160800
2009-12-24 1723800
2009-12-28 3408400
2009-12-29 2860200
2009-12-30 2942100
2009-12-31 2448700
[1353 rows x 6 columns],
'IBM': Open High Low Close Adj Close Date
2000-01-03 112.437500 116.000000 111.875000 116.000000 86.912140
2000-01-04 114.000000 114.500000 110.875000 112.062500 83.961983
2000-01-05 112.937500 119.750000 112.125000 116.000000 86.912140
2000-01-06 118.000000 118.937500 113.500000 114.000000 85.413643
2000-01-07 117.250000 117.937500 110.625000 113.500000 85.038986
2000-01-10 117.250000 119.375000 115.375000 118.000000 88.410614
2000-01-11 117.875000 121.125000 116.625000 119.000000 89.159859
2000-01-12 119.625000 122.000000 118.250000 119.500000 89.534462
2000-01-13 119.937500 121.000000 115.750000 118.250000 88.597923
2000-01-14 120.937500 123.312500 117.500000 119.625000 89.628128
2000-01-18 119.687500 119.750000 115.000000 115.750000 86.724800
2000-01-19 115.562500 122.000000 112.687500 119.500000 89.534462
2000-01-20 123.000000 124.750000 119.000000 119.000000 89.159859
2000-01-21 121.875000 123.000000 119.937500 121.500000 91.032967
2000-01-24 121.875000 122.875000 116.562500 121.500000 91.032967
2000-01-25 116.750000 119.500000 116.000000 119.125000 89.253494
2000-01-26 119.062500 119.937500 116.000000 116.750000 87.474060
2000-01-27 118.000000 118.437500 111.625000 113.500000 85.038986
2000-01-28 112.750000 114.187500 110.062500 111.562500 83.587364
2000-01-31 111.375000 112.812500 109.625000 112.250000 84.102455
2000-02-01 112.375000 113.437500 109.125000 110.000000 82.416672
2000-02-02 112.000000 115.812500 110.375000 113.500000 85.038986
2000-02-03 115.000000 117.500000 114.062500 117.125000 87.754974
2000-02-04 118.187500 118.437500 114.812500 115.625000 86.631165
2000-02-07 116.000000 116.875000 113.125000 114.125000 85.507286
2000-02-08 116.937500 119.000000 116.187500 118.812500 89.113068
2000-02-09 119.000000 119.750000 117.000000 117.375000 88.034897
2000-02-10 117.687500 119.125000 116.875000 119.125000 89.347443
2000-02-11 119.062500 119.437500 114.562500 115.375000 86.534813
2000-02-14 116.000000 116.375000 114.500000 116.062500 87.050453
... ... ... ... ...
2009-11-18 128.050003 128.350006 127.550003 128.149994 106.745453
2009-11-19 127.279999 127.839996 126.510002 127.540001 106.237350
2009-11-20 127.169998 127.690002 126.459999 126.959999 105.754219
2009-11-23 127.699997 128.940002 127.650002 128.199997 106.787109
2009-11-24 127.900002 128.320007 127.139999 127.930000 106.562210
2009-11-25 127.879997 128.220001 127.080002 127.279999 106.020775
2009-11-27 124.589996 126.620003 124.260002 125.699997 104.704666
2009-11-30 125.120003 126.599998 124.919998 126.349998 105.246086
2009-12-01 127.290001 128.389999 126.849998 127.940002 106.570534
2009-12-02 127.320000 128.389999 127.160004 127.209999 105.962456
2009-12-03 127.599998 128.470001 127.250000 127.550003 106.245659
2009-12-04 128.399994 128.899994 126.000000 127.250000 105.995773
2009-12-07 126.879997 127.529999 126.589996 127.040001 105.820869
2009-12-08 126.970001 127.349998 126.160004 126.800003 105.620956
2009-12-09 126.699997 128.389999 126.110001 128.389999 106.945366
2009-12-10 128.130005 129.470001 128.089996 129.339996 107.736694
2009-12-11 129.009995 129.770004 128.710007 129.679993 108.019913
2009-12-14 129.649994 129.979996 129.600006 129.929993 108.228142
2009-12-15 129.460007 129.860001 127.940002 128.490005 107.028687
2009-12-16 128.740005 129.600006 128.350006 128.710007 107.211914
2009-12-17 128.000000 128.559998 127.120003 127.400002 106.120712
2009-12-18 127.970001 128.389999 127.000000 127.910004 106.545555
2009-12-21 127.800003 128.940002 127.680000 128.649994 107.161919
2009-12-22 129.300003 129.979996 129.190002 129.929993 108.228142
2009-12-23 129.699997 130.000000 129.300003 130.000000 108.286469
2009-12-24 129.889999 130.570007 129.479996 130.570007 108.761246
2009-12-28 130.990005 132.309998 130.720001 132.309998 110.210617
2009-12-29 132.279999 132.369995 131.800003 131.850006 109.827469
2009-12-30 131.229996 132.679993 130.679993 132.570007 110.427193
2009-12-31 132.410004 132.850006 130.750000 130.899994 109.036133
Volume
Date
2000-01-03 10347700
2000-01-04 8227800
2000-01-05 12733200
2000-01-06 7971900
2000-01-07 11856700
2000-01-10 8540500
2000-01-11 7873300
2000-01-12 6803800
2000-01-13 8489700
2000-01-14 10956600
2000-01-18 7643900
2000-01-19 8634500
2000-01-20 17783400
2000-01-21 7868700
2000-01-24 6499200
2000-01-25 6936900
2000-01-26 4895100
2000-01-27 8324600
2000-01-28 6669400
2000-01-31 6202700
2000-02-01 7304700
2000-02-02 7707700
2000-02-03 6213800
2000-02-04 5405300
2000-02-07 5779500
2000-02-08 5464300
2000-02-09 5374700
2000-02-10 4446600
2000-02-11 5749400
2000-02-14 4544000
...
2009-11-18 3970100
2009-11-19 5308400
2009-11-20 5354200
2009-11-23 7003500
2009-11-24 4816500
2009-11-25 3967700
2009-11-27 3319200
2009-11-30 6302700
2009-12-01 6578600
2009-12-02 4597900
2009-12-03 5760000
2009-12-04 7068500
2009-12-07 4144400
2009-12-08 5351400
2009-12-09 6071900
2009-12-10 7077800
2009-12-11 6597200
2009-12-14 5201300
2009-12-15 7862600
2009-12-16 6372500
2009-12-17 5909500
2009-12-18 9106600
2009-12-21 4772500
2009-12-22 5535500
2009-12-23 4127600
2009-12-24 4265100
2009-12-28 5800400
2009-12-29 4184200
2009-12-30 3867000
2009-12-31 4223400
[2515 rows x 6 columns]}
Series的corr方法用于计算两个Series中重叠的、非NA的、按索引对齐的值的相关系数,cov用于计算协方差:下面我们来计算AAPL和IBM的相关系数和协方差。如果直接调用两个方法,则直接返回完整的相关系数和协方差。
In [94]: returns.AAPL.corr(returns.IBM)
Out[94]: 0.27155239676044468
In [95]: returns.AAPL.cov(returns.IBM)
Out[95]: 0.061849759999671697
In [96]: returns.corr()
Out[96]:
AAPL GOOG IBM
AAPL 1.000000 0.265386 0.271552
GOOG 0.265386 1.000000 0.278625
IBM 0.271552 0.278625 1.000000
In [97]: returns.cov()
Out[97]:
AAPL GOOG IBM
AAPL 0.329952 0.048933 0.061850
GOOG 0.048933 0.174015 0.043413
IBM 0.061850 0.043413 0.157224
利用corrwith方法,可以计算其列或行跟另一个Series或DataFrame之间的相关系数。传入一个series将会返回一个相关系数值series(针对各列进行计算)
In [98]: returns.corrwith(returns.IBM)
Out[98]:
AAPL 0.271552
GOOG 0.278625
IBM 1.000000
dtype: float64
In [100]: volume = DataFrame({tic:data['Volume'] for tic,data in all_data.iteritems()})
In [101]: returns.corrwith(volume)
Out[101]:
AAPL 0.317785
GOOG 0.385612
IBM 0.440509
dtype: float64
传入axis=1,则按行计算。在计算相关系数之前,所有的数据项都会被标签对齐。