
Converting between rows and columns in pandas

 北方的白桦林 2018-12-09

① Methods for converting columns to rows

  • stack function: pandas.DataFrame.stack(self, level=-1, dropna=True)
The help text can be viewed with the ?pandas.DataFrame.stack command:
    Signature: pandas.DataFrame.stack(self, level=-1, dropna=True)
    Docstring:
    Pivot a level of the (possibly hierarchical) column labels, returning a
    DataFrame (or Series in the case of an object with a single level of
    column labels) having a hierarchical index with a new inner-most level
    of row labels.
    The level involved will automatically get sorted.
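a. For an ordinary DataFrame (single-level columns), stack moves the column labels into the innermost level of the row index and returns a Series.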
    In [16]: import pandas as pd
        ...: import numpy as np
        ...: df = pd.DataFrame(np.arange(6).reshape(2,3),index=['AA','BB'],columns=['three','two','one'])
        ...: df
    Out[16]:
        three  two  one
    AA      0    1    2
    BB      3    4    5

    In [17]: df.stack()
    Out[17]:
    AA  three    0
        two      1
        one      2
    BB  three    3
        two      4
        one      5
    dtype: int32

    In [18]: df.stack(level=0)
    Out[18]:
    AA  three    0
        two      1
        one      2
    BB  three    3
        two      4
        one      5
    dtype: int32

    In [19]: df.stack(level=-1)
    Out[19]:
    AA  three    0
        two      1
        one      2
    BB  three    3
        two      4
        one      5
    dtype: int32
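As a small self-contained check of case a (a sketch only; the variable names `stacked`, `same` and `value` are illustrative and not from the original post), the following confirms that stacking a frame with single-level columns yields a Series with a two-level row index, and that `level=0` and `level=-1` both refer to the one and only column level:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(2, 3),
                  index=['AA', 'BB'], columns=['three', 'two', 'one'])

stacked = df.stack()  # column labels become the inner row level

# With single-level columns, level 0 and level -1 are the same level.
same = df.stack(level=0).equals(df.stack(level=-1))

# Individual values are addressed by (row label, former column label).
value = stacked.loc[('BB', 'two')]
```

Recent pandas releases may print a FutureWarning about the stack implementation here; that does not affect the result for this frame.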
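b. For a DataFrame with hierarchical (MultiIndex) columns, the chosen column level or levels are moved to the rows. By default the innermost column level becomes the innermost row level.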
    In [31]: import pandas as pd
        ...: import numpy as np
        ...: df = pd.DataFrame(np.arange(8).reshape(2,4),index=['AA','BB'],columns=[['two','two','one','one'],['A','B','C','D']])
        ...: df
    Out[31]:
       two    one
         A  B   C  D
    AA   0  1   2  3
    BB   4  5   6  7

    In [32]: df.stack()
    Out[32]:
          one  two
    AA A  NaN  0.0
       B  NaN  1.0
       C  2.0  NaN
       D  3.0  NaN
    BB A  NaN  4.0
       B  NaN  5.0
       C  6.0  NaN
       D  7.0  NaN

    In [33]: df.stack(level=0)
    Out[33]:
              A    B    C    D
    AA one  NaN  NaN  2.0  3.0
       two  0.0  1.0  NaN  NaN
    BB one  NaN  NaN  6.0  7.0
       two  4.0  5.0  NaN  NaN

    In [34]: df.stack(level=1)
    Out[34]:
          one  two
    AA A  NaN  0.0
       B  NaN  1.0
       C  2.0  NaN
       D  3.0  NaN
    BB A  NaN  4.0
       B  NaN  5.0
       C  6.0  NaN
       D  7.0  NaN

    In [35]: df.stack(level=-1)
    Out[35]:
          one  two
    AA A  NaN  0.0
       B  NaN  1.0
       C  2.0  NaN
       D  3.0  NaN
    BB A  NaN  4.0
       B  NaN  5.0
       C  6.0  NaN
       D  7.0  NaN

    In [36]: df.stack(level=[0,1])
    Out[36]:
    AA  one  C    2.0
             D    3.0
        two  A    0.0
             B    1.0
    BB  one  C    6.0
             D    7.0
        two  A    4.0
             B    5.0
    dtype: float64
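The three variants above can be compared side by side in a short script. This is a sketch under the same setup as the transcript; the names `inner`, `outer`, `both` and `missing` are made up for illustration, and the row and column order of the results may differ between pandas versions, so the checks below avoid relying on order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(2, 4), index=['AA', 'BB'],
                  columns=[['two', 'two', 'one', 'one'], ['A', 'B', 'C', 'D']])

inner = df.stack()             # default level=-1: inner column level moves to rows
outer = df.stack(level=0)      # outer column level moves to rows instead
both = df.stack(level=[0, 1])  # all column levels move: the result is a Series

# NaN marks (outer, inner) combinations that never existed as a column,
# e.g. there is no column ('one', 'A') in df.
missing = pd.isna(inner.loc[('AA', 'A'), 'one'])
```

The NaN values also explain why the integer data turns into floats in the stacked output: an integer column cannot hold NaN, so pandas upcasts it.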
  • unstack function: pandas.DataFrame.unstack(self, level=-1, fill_value=None)
The help text can be viewed with the ?pandas.DataFrame.unstack command:
    Signature: pandas.DataFrame.unstack(self, level=-1, fill_value=None)
    Docstring:
    Pivot a level of the (necessarily hierarchical) index labels, returning
    a DataFrame having a new level of column labels whose inner-most level
    consists of the pivoted index labels. If the index is not a MultiIndex,
    the output will be a Series (the analogue of stack when the columns are
    not a MultiIndex).
    The level involved will automatically get sorted.
a. For an ordinary DataFrame (single-level row index), unstack moves the column labels to the outermost level of the row index and returns a Series.
    In [20]: df
    Out[20]:
        three  two  one
    AA      0    1    2
    BB      3    4    5

    In [21]: df.unstack()
    Out[21]:
    three  AA    0
           BB    3
    two    AA    1
           BB    4
    one    AA    2
           BB    5
    dtype: int32

    In [22]: df.unstack(0)
    Out[22]:
    three  AA    0
           BB    3
    two    AA    1
           BB    4
    one    AA    2
           BB    5
    dtype: int32

    In [23]: df.unstack(-1)
    Out[23]:
    three  AA    0
           BB    3
    two    AA    1
           BB    4
    one    AA    2
           BB    5
    dtype: int32
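b. For a DataFrame whose columns are hierarchical but whose row index still has a single level, the behaviour looks the same as in case a: the two column levels seem to be treated as one unit and are placed in front of the row labels, giving a Series. The integer passed as level makes no visible difference, and passing a list as level raises an error.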
    In [37]: df
    Out[37]:
       two    one
         A  B   C  D
    AA   0  1   2  3
    BB   4  5   6  7

    In [38]: df.unstack()
    Out[38]:
    two  A  AA    0
            BB    4
         B  AA    1
            BB    5
    one  C  AA    2
            BB    6
         D  AA    3
            BB    7
    dtype: int32

    In [39]: df.unstack(0)
    Out[39]:
    two  A  AA    0
            BB    4
         B  AA    1
            BB    5
    one  C  AA    2
            BB    6
         D  AA    3
            BB    7
    dtype: int32

    In [40]: df.unstack(1)
    Out[40]:
    two  A  AA    0
            BB    4
         B  AA    1
            BB    5
    one  C  AA    2
            BB    6
         D  AA    3
            BB    7
    dtype: int32

    In [41]: df.unstack(-1)
    Out[41]:
    two  A  AA    0
            BB    4
         B  AA    1
            BB    5
    one  C  AA    2
            BB    6
         D  AA    3
            BB    7
    dtype: int32

    In [42]: df.unstack(level=[0,1])
    IndexError: Too many levels: Index has only 1 level, not 2
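Trying level=5 also runs without error. How should level be understood here? This is left as an open question. A likely reading, consistent with the docstring above, is that level refers to levels of the row index, not of the columns: this frame's row index has only one level, so the result is always the same Series, and an integer level does not appear to be validated in that case.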
    In [44]: df
    Out[44]:
       two    one
         A  B   C  D
    AA   0  1   2  3
    BB   4  5   6  7

    In [45]: df.unstack(level=5)
    Out[45]:
    two  A  AA    0
            BB    4
         B  AA    1
            BB    5
    one  C  AA    2
            BB    6
         D  AA    3
            BB    7
    dtype: int32
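One way to probe the open question about level is to compare a single-level row index with a genuine two-level one. The sketch below is an illustration, not part of the original experiments: the level names 'outer' and 'inner', the column 'v' and all variable names are made up. It assumes, as the docstring suggests, that level selects among the levels of the row index.

```python
import numpy as np
import pandas as pd

# Single-level row index: unstack has no row level to choose from,
# so the result matches transposing and then stacking.
flat = pd.DataFrame(np.arange(6).reshape(2, 3),
                    index=['AA', 'BB'], columns=['three', 'two', 'one'])
same_as_transpose_stack = flat.unstack().equals(flat.T.stack())

# Two-level row index (the level names 'outer'/'inner' are hypothetical).
rows = pd.MultiIndex.from_product([['AA', 'BB'], ['x', 'y']],
                                  names=['outer', 'inner'])
frame = pd.DataFrame({'v': range(4)}, index=rows)

by_inner = frame.unstack()         # default level=-1 pivots 'inner'
by_outer = frame.unstack(level=0)  # pivots 'outer' instead

# With a real MultiIndex, an out-of-range level is rejected.
try:
    frame.unstack(level=5)
    rejected = False
except IndexError:
    rejected = True
```

With the two-level index the choice of level visibly changes the result, which supports reading level as a position in the row index. The hierarchical columns of the frame in case b play no part in that choice.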
  • melt function: pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None)
The help text can be viewed with ?pandas.melt:
    Signature: pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None)
    Docstring:
    "Unpivots" a DataFrame from wide format to long format, optionally leaving
    identifier variables set.
    This function is useful to massage a DataFrame into a format where one
    or more columns are identifier variables (`id_vars`), while all other
    columns, considered measured variables (`value_vars`), are "unpivoted" to
    the row axis, leaving just two non-identifier columns, 'variable' and
    'value'.
First, try melt on an ordinary DataFrame to see how it reshapes the data.
    In [46]: df = pd.DataFrame(np.arange(8).reshape(2,4),index=['AA','BB'],columns=['A','B','C','D'])
        ...: df
    Out[46]:
        A  B  C  D
    AA  0  1  2  3
    BB  4  5  6  7

    In [47]: pd.melt(df,id_vars=['A','C'],value_vars=['B','D'],var_name='B|D',value_name='(B|D)_value')
    Out[47]:
       A  C B|D  (B|D)_value
    0  0  2   B            1
    1  4  6   B            5
    2  0  2   D            3
    3  4  6   D            7

    In [48]: pd.melt(df,id_vars=['A'],value_vars=['B','D'],var_name='B|D',value_name='(B|D)_value')
    Out[48]:
       A B|D  (B|D)_value
    0  0   B            1
    1  4   B            5
    2  0   D            3
    3  4   D            7

    In [49]: pd.melt(df,id_vars=['A'],value_vars=['B'],var_name='B',value_name='B_value')
    Out[49]:
       A  B  B_value
    0  0  B        1
    1  4  B        5
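Conclusion: from the results above, id_vars are the original columns kept unchanged in the result; value_vars are the columns whose labels are converted to rows; var_name renames the column that holds those former column labels (default: variable); value_name names the column that holds the corresponding values (default: value).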
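The defaults mentioned in the conclusion can be checked directly. The following is a minimal sketch using the same frame as above; `long_default` and `long_keep` are illustrative names. It also shows one detail visible in the outputs but not spelled out: the row labels AA and BB do not survive melt, and one way to keep them is to call reset_index() first.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(2, 4),
                  index=['AA', 'BB'], columns=['A', 'B', 'C', 'D'])

# No value_vars: every column not listed in id_vars is melted.
# No var_name/value_name: the new columns are 'variable' and 'value'.
long_default = pd.melt(df, id_vars=['A'])

# melt discards the row labels by default; reset_index() first turns them
# into an ordinary column (named 'index' here) that can serve as an id_var.
long_keep = pd.melt(df.reset_index(), id_vars=['index'])
```

Here long_default has one row per (original row, melted column) pair, so 2 rows times 3 melted columns gives 6 rows, while long_keep melts all four data columns and has 8 rows.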
With hierarchical columns, col_level selects which column level the labels in id_vars and value_vars refer to:

    In [50]: df1 = pd.DataFrame(np.arange(8).reshape(2,4),columns=[list('ABCD'),list('EFGH')])
        ...: df1
    Out[50]:
       A  B  C  D
       E  F  G  H
    0  0  1  2  3
    1  4  5  6  7

    In [51]: pd.melt(df1,col_level=0,id_vars=['A'],value_vars=['D'])
    Out[51]:
       A variable  value
    0  0        D      3
    1  4        D      7
② Methods for converting rows to columns
  • unstack function: pandas.DataFrame.unstack(self, level=-1, fill_value=None)
    In [26]: df2=df.stack()
        ...: df2
    Out[26]:
    AA  three    0
        two      1
        one      2
    BB  three    3
        two      4
        one      5
    dtype: int32

    In [27]: df2.unstack()
    Out[27]:
        three  two  one
    AA      0    1    2
    BB      3    4    5

    In [28]: df2.unstack(0)
    Out[28]:
           AA  BB
    three   0   3
    two     1   4
    one     2   5

    In [29]: df2.unstack(1)
    Out[29]:
        three  two  one
    AA      0    1    2
    BB      3    4    5

    In [30]: df2.unstack(-1)
    Out[30]:
        three  two  one
    AA      0    1    2
    BB      3    4    5
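The transcript above suggests that unstack undoes stack, and that unstacking the outer level gives the transposed layout. A short sketch of that round trip follows; the names `s`, `roundtrip`, `flipped` and `restored` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(2, 3),
                  index=['AA', 'BB'], columns=['three', 'two', 'one'])

s = df.stack()                # columns -> inner row level
roundtrip = s.unstack()       # inner row level -> columns again
flipped = s.unstack(level=0)  # outer row level -> columns instead

# Reindex columns to the original order before comparing, so the check
# does not depend on how a given pandas version orders them.
restored = roundtrip[df.columns].equals(df)
```

Because this stacked Series has a two-level row index, level=1 and level=-1 both name the inner level, which is why In [27], In [29] and In [30] print the same frame.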


