http://blog.csdn.net/pipisorry/article/details/53486777 面板数据{pandas数据结构有一维Series,二维DataFrame,这是三维Panel}pandas有一个Panel数据结构,可以将其看做一个三维版的,可以用一个由DataFrame对象组成的字典或一个三维ndarray来创建Panel对象: import pandas.io.data as web pdata = pd.Panel(dict((stk, web.get_data_yahoo(stk, '1/1/2009', '6/1/2012')) for stk in ['AAPL', 'GOOG', 'MSFT','DELL'])) Note: stk代表指标,6个指标;三维:stk,company,time. Panel中的每一项(类似于DataFrame的列)都是一个DataFrame >>> pdata <class 'pandas.core.panel.Panel'> Dimensions: 4 (items) x 868 (major_axis) x 6 (minor_axis) Items axis: AAPL to MSFT Major_axis axis: 2009-01-02 00:00:00 to 2012-06-01 00:00:00 Minor_axis axis: Open to Adj Close >>> pdata = pdata.swapaxes('items', 'minor') >>>pdata['Adj Close'] 三维度ix标签索引基于ix的标签索引被推广到了三个维度,因此可以选取指定日期或日期范围的所有数据,如下所示:>>> pdata.ix[:,'6/1/2012',:] >>>pdata.ix['Adj Close', '5/22/2012':,:] 另一个用于呈现面板数据(尤其是对拟合统计模型)的办法是“堆积式的” DataFrame 形式: >>> stacked=pdata.ix[:,'5/30/2012':,:].to_frame() >>>stacked DataFrame有一个相应的to_panel方法,它是to_frame的逆运算: >>> stacked.to_panel() <class 'pandas.core.panel.Panel'> Dimensions: 6 (items) x 3 (major_axis) x 4 (minor_axis) Items axis: Open to Adj Close Major_axis axis: 2012-05-30 00:00:00 to 2012-06-01 00:00:00 Minor_axis axis: AAPL to MSFT 皮皮Blog 字符串方法String MethodsSeries is equipped with a set of string processing methods in the strattribute that make it easy to operate on each element of the array, as in thecode snippet below. Note that pattern-matching instr generally usesregularexpressions by default (and insome cases always uses them). See more atVectorized String Methods. In [71]: s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat']) In [72]: s.str.lower() Out[72]: 0 a 1 b 2 c 3 aaba 4 baca 5 NaN 6 caba 7 dog 8 cat dtype: object 分类CategoricalsSince version 0.15, pandas can include categorical data in a DataFrame. For full docs, see thecategorical introduction and theAPI documentation. In [122]: df = pd.DataFrame({"id":[1,2,3,4,5,6], "raw_grade":['a', 'b', 'b', 'a', 'a', 'e']}) Convert the raw grades to a categorical data type. In [123]: df["grade"] = df["raw_grade"].astype("category") In [124]: df["grade"] Out[124]: 0 a 1 b 2 b 3 a 4 a 5 e Name: grade, dtype: category Categories (3, object): [a, b, e] Rename the categories to more meaningful names (assigning to Series.cat.categories is inplace!) In [125]: df["grade"].cat.categories = ["very good", "good", "very bad"] Reorder the categories and simultaneously add the missing categories (methods underSeries.cat return a newSeries per default). In [126]: df["grade"] = df["grade"].cat.set_categories(["very bad", "bad", "medium", "good", "very good"]) In [127]: df["grade"] Out[127]: 0 very good 1 good 2 good 3 very good 4 very good 5 very bad Name: grade, dtype: category Categories (5, object): [very bad, bad, medium, good, very good] Sorting is per order in the categories, not lexical order. In [128]: df.sort_values(by="grade") Out[128]: id raw_grade grade 5 6 e very bad 1 2 b good 2 3 b good 0 1 a very good 3 4 a very good 4 5 a very good Grouping by a categorical column shows also empty categories. In [129]: df.groupby("grade").size() Out[129]: grade very bad 1 bad 0 medium 0 good 2 very good 3 dtype: int64 可视化PlotDataFrame内置基于matplotlib的绘图功能直接绘制Plotting docs. In [130]: ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000)) In [131]: ts = ts.cumsum() In [132]: ts.plot() Out[132]: <matplotlib.axes._subplots.AxesSubplot at 0xaf49988c> On DataFrame, plot() is a convenience to plot all of the columns with labels: In [133]: df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index, .....: columns=['A', 'B', 'C', 'D']) .....: In [134]: df = df.cumsum() In [135]: plt.figure(); df.plot(); plt.legend(loc='best') Out[135]: <matplotlib.legend.Legend at 0xaf499d4c> 绘制盒图Python中有许多可视化模块,最流行的当属matpalotlib库[matplotlib绘图基础]。稍加提及,我们也可选择bokeh和seaborn模块[python高级绘图库seaborn]。 使用matplotlib # Import the module for plotting import matplotlib.pyplot asplt plt.show(df.plot(kind='box')) 使用pandas模块中集成R的ggplot主题来美化图表 要使用ggplot,我们只需要在上述代码中多加一行: importmatplotlib.pyplotasplt pd.options.display.mpl_style='default'# Sets the plotting display theme to ggplot2 df.plot(kind='box') 比matplotlib.pyplot主题简洁太多。 更好的是引入seaborn模块 该模块是一个统计数据可视化库: # Import the seaborn library import seaborn assns # Do the boxplot plt.show(sns.boxplot(df,widths=0.5,color="pastel")) plt.show(sns.violinplot(df,widths=0.5,color="pastel")) 绘制散点图scatterdf: age fat_percent plt.show(df.plot(kind='scatter', x='age', y='fat_percent')) Note: 不指定x,y会出错: ValueError: scatter requires and x and y column 绘制直方曲线图 plt.show(sns.distplot(df.ix[:,2],rug=True,bins=15)) 绘制其它图 with sns.axes_style("white"): plt.show(sns.jointplot(df.ix[:,1],df.ix[:,2],kind="kde")) plt.show(sns.lmplot("Benguet", "Ifugao", df)) from: http://blog.csdn.net/pipisorry/article/details/53486777 ref: 《利用Python进行数据分析》* Notebook Python: Getting Started with Data Analysis Python and R: Is Python really faster than R? |
|