① Column-to-row methods
- stack function: pandas.DataFrame.stack(self, level=-1, dropna=True)
Run ?pandas.DataFrame.stack in IPython to view the help text:
Signature: pandas.DataFrame.stack(self, level=-1, dropna=True)
Pivot a level of the (possibly hierarchical) column labels, returning a DataFrame (or Series in the case of an object with a single level of column labels) having a hierarchical index with a new inner-most level of row labels. The level involved will automatically get sorted.
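a. For an ordinary DataFrame (single-level columns), stack moves the column labels into the innermost level of the row index and returns a Series.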
In [16]: import numpy as np
    ...: import pandas as pd
    ...: df = pd.DataFrame(np.arange(6).reshape(2,3),index=['AA','BB'],columns=
    ...: ['three','two','one'])
In [18]: df.stack(level=0)
In [19]: df.stack(level=-1)
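A minimal runnable sketch of case a, using the same frame as in the session above. The name `stacked` is only illustrative. It checks that the result is a Series and that the column labels have become a new innermost level of the row index.

```python
import numpy as np
import pandas as pd

# Same frame as in the session above: single-level rows and columns.
df = pd.DataFrame(np.arange(6).reshape(2, 3),
                  index=['AA', 'BB'],
                  columns=['three', 'two', 'one'])

# With only one column level, level=0 and level=-1 name the same level.
stacked = df.stack()

print(type(stacked).__name__)   # class name of the result
print(stacked.index.nlevels)    # number of row-index levels after stacking
print(stacked.loc[('AA', 'two')])  # value formerly at row 'AA', column 'two'
```

Each value is now addressed by a (row label, former column label) pair.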
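b. For a DataFrame with hierarchical (MultiIndex) columns, stack can move any chosen column level into the rows; by default it moves the innermost column level into the innermost level of the row index.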
In [31]: import numpy as np
    ...: import pandas as pd
    ...: df = pd.DataFrame(np.arange(8).reshape(2,4),index=['AA','BB'],columns=
    ...: [['two','two','one','one'],['A','B','C','D']])
In [33]: df.stack(level=0)
In [34]: df.stack(level=1)
In [35]: df.stack(level=-1)
In [36]: df.stack(level=[0,1])
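The effect of `level` on hierarchical columns can be checked with a short sketch like the one below. The result names (`by_outer`, `by_inner`, `all_levels`) are made up for illustration. Recent pandas versions may print a FutureWarning about a new stack implementation; the shapes and lookups shown here should not depend on it.

```python
import numpy as np
import pandas as pd

# Two column levels: outer ('two'/'one') and inner ('A'..'D').
df = pd.DataFrame(np.arange(8).reshape(2, 4),
                  index=['AA', 'BB'],
                  columns=[['two', 'two', 'one', 'one'],
                           ['A', 'B', 'C', 'D']])

# level=0 moves the OUTER column level into the rows;
# the remaining columns are the inner labels A..D (with NaN gaps).
by_outer = df.stack(level=0)

# level=1 (same as level=-1 here) moves the INNER column level into the rows;
# the remaining columns are the outer labels 'one' and 'two'.
by_inner = df.stack(level=1)

# Stacking both levels leaves no column labels, so the result is a Series
# indexed by (row label, outer label, inner label).
all_levels = df.stack(level=[0, 1])

print(by_outer.shape, by_inner.shape, len(all_levels))
```

Label-based lookups such as `by_outer.loc[('AA', 'two'), 'A']` are a convenient way to confirm where each original cell ended up.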
- unstack function: pandas.DataFrame.unstack(self, level=-1, fill_value=None)
Run ?pandas.DataFrame.unstack in IPython to view the help text:
Signature: pandas.DataFrame.unstack(self, level=-1, fill_value=None)
Pivot a level of the (necessarily hierarchical) index labels, returning a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels. If the index is not a MultiIndex, the output will be a Series (the analogue of stack when the columns are not a MultiIndex). The level involved will automatically get sorted.
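a. For an ordinary DataFrame (single-level row index), unstack moves the column labels into the outermost level of the row index and returns a Series.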
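A small sketch of case a, assuming the same 2x3 frame used in the stack example. The name `unstacked` is illustrative. Compared with stack, the column label ends up as the outer level rather than the inner one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(2, 3),
                  index=['AA', 'BB'],
                  columns=['three', 'two', 'one'])

# The row index has a single level, so unstack returns a Series
# keyed by (column label, row label).
unstacked = df.unstack()

print(type(unstacked).__name__)
print(unstacked.loc[('three', 'AA')])  # value formerly at row 'AA', column 'three'
```

Note the key order: `('three', 'AA')` here, versus `('AA', 'three')` after `df.stack()`.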
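b. For the DataFrame with hierarchical columns used above, unstack behaves much like the plain case: the two column levels appear to be moved into the row index as a whole. Note that level refers to levels of the row index, not the columns. This df has a single-level row index, so passing a list for level raises an error:
In [42]: df.unstack(level=[0,1])
IndexError: Too many levels: Index has only 1 level, not 2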
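Trying level=5 runs without error as well. How should level be understood here? This was left as an open question. A likely explanation is that when the row index is not a MultiIndex, unstack ignores an integer level and simply moves the column labels into the row index.
In [45]: df.unstack(level=5)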
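To make the meaning of `level` concrete, here is a hedged sketch that uses a frame with a two-level row index. That frame (`multi`, with the made-up level names `outer` and `inner` and columns `v` and `w`) does not appear in the original session. The second half reproduces the two observations above on a frame with a single-level row index; the claim that an integer `level` is ignored there reflects pandas behaviour as far as I know and may differ between versions.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a two-level ROW index.
idx = pd.MultiIndex.from_product([['AA', 'BB'], ['x', 'y']],
                                 names=['outer', 'inner'])
multi = pd.DataFrame({'v': np.arange(4), 'w': np.arange(4, 8)}, index=idx)

# level=-1 pivots the innermost row level ('inner') into the columns.
by_inner = multi.unstack(level=-1)
# level=0 pivots the outermost row level ('outer') into the columns.
by_outer = multi.unstack(level=0)

# Frame with a single-level row index, as in the session above.
plain = pd.DataFrame(np.arange(8).reshape(2, 4),
                     index=['AA', 'BB'],
                     columns=[['two', 'two', 'one', 'one'],
                              ['A', 'B', 'C', 'D']])

# A list of two levels fails because the row index has only one level.
try:
    plain.unstack(level=[0, 1])
    list_level_failed = False
except IndexError as exc:
    list_level_failed = True
    print('IndexError:', exc)

# An out-of-range integer level is accepted and gives the same result
# as calling unstack() with no arguments.
same_as_default = plain.unstack(level=5).equals(plain.unstack())
print(same_as_default)
```

With a real MultiIndex on the rows, `level` selects which row level becomes a column level, which is the case the docstring describes.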
- melt function: pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None)
Run ?pandas.melt in IPython to view the help text:
Signature: pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None)
"Unpivots" a DataFrame from wide format to long format, optionally leaving identifier variables set. This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (`id_vars`), while all other columns, considered measured variables (`value_vars`), are "unpivoted" to the row axis, leaving just two non-identifier columns, 'variable' and 'value'.
First, try melt on an ordinary DataFrame to see how it transforms the data:
In [46]: df = pd.DataFrame(np.arange(8).reshape(2,4),index=['AA','BB'],columns=
In [47]: pd.melt(df,id_vars=['A','C'],value_vars=['B','D'],var_name='B|D',value
    ...: _name='(B|D)_value')
In [48]: pd.melt(df,id_vars=['A'],value_vars=['B','D'],var_name='B|D',value_nam
In [49]: pd.melt(df,id_vars=['A'],value_vars=['B'],var_name='B',value_name='B_v
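Conclusion: from the results above, id_vars are the original columns kept as-is in the result, and value_vars are the columns to be turned into rows. var_name renames the column that holds the melted column labels (default: variable), and value_name names the column that holds the corresponding values (default: value).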
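The conclusion can be reproduced with a small sketch. The column labels of `df` are cut off in the session above, so `list('ABCD')` is an assumption inferred from the names passed to `id_vars` and `value_vars`. The result name `long_df` is illustrative.

```python
import numpy as np
import pandas as pd

# ASSUMPTION: columns A..D (the original definition is truncated).
df = pd.DataFrame(np.arange(8).reshape(2, 4),
                  index=['AA', 'BB'],
                  columns=list('ABCD'))

long_df = pd.melt(df,
                  id_vars=['A', 'C'],         # kept as ordinary columns
                  value_vars=['B', 'D'],      # turned into rows
                  var_name='B|D',             # holds the former column label
                  value_name='(B|D)_value')   # holds the former cell value

print(long_df)
```

Each `value_vars` column contributes one block of rows, and by default the original row labels ('AA', 'BB') are replaced by a fresh integer index.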
In [50]: df1 = pd.DataFrame(np.arange(8).reshape(2,4),columns=[list('ABCD'),lis
In [51]: pd.melt(df1,col_level=0,id_vars=['A'],value_vars=['D'])
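For hierarchical columns, `col_level` tells melt which column level the names in `id_vars` and `value_vars` refer to. The second column level of `df1` is truncated in the session above, so `list('EFGH')` below is a placeholder chosen only to make the sketch runnable.

```python
import numpy as np
import pandas as pd

# ASSUMPTION: the second column level is hypothetical (truncated in the source).
df1 = pd.DataFrame(np.arange(8).reshape(2, 4),
                   columns=[list('ABCD'), list('EFGH')])

# col_level=0 means 'A' and 'D' are looked up in the first (outer) level.
melted = pd.melt(df1, col_level=0, id_vars=['A'], value_vars=['D'])

print(melted)
```

The result has flat columns again: the identifier column `A` plus the default `variable` and `value` columns.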
② Row-to-column methods
- unstack function: pandas.DataFrame.unstack(self, level=-1, fill_value=None)