
Python 3 pandas Series Usage (Basic Notes)

 hdzgx 2020-01-07

Three ways to construct/initialize a Series:

1) Build a Series from a list

import pandas as pd
my_list=[7,'Beijing','19大',3.1415,-10000,'Happy']
s=pd.Series(my_list)
print(type(s))
print(s)
<class 'pandas.core.series.Series'>
0           7
1     Beijing
2        19大
3      3.1415
4      -10000
5       Happy
dtype: object

1.a) By default pandas uses the integers 0 to n-1 as the Series index, but you can also specify the index yourself. You can think of the index as the keys of a dict.

s=pd.Series([7,'Beijing','19大',3.1415,-10000,'Happy'],
index=['A','B','C','D','E','F'])
print(s)
A           7
B     Beijing
C        19大
D      3.1415
E      -10000
F       Happy
dtype: object
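
To make the dict analogy concrete, here is a small sketch that reuses the same values and labels as above. The variable `nums` is only an illustration and is not part of the original example.

```python
import pandas as pd

# Same values as above, with explicit string labels as the index.
s = pd.Series([7, 'Beijing', '19大', 3.1415, -10000, 'Happy'],
              index=['A', 'B', 'C', 'D', 'E', 'F'])

print(s.index.tolist())   # the labels: ['A', 'B', 'C', 'D', 'E', 'F']
print(s['B'])             # label lookup, like dict[key]: Beijing
print(s.dtype)            # object, because the values have mixed types

# Illustrative only: a list of plain integers gets an integer dtype instead.
nums = pd.Series([1, 2, 3])
print(nums.dtype)
```

The `object` dtype is why the printed Series above ends with `dtype: object`: pandas cannot pick one numeric type for a mix of strings, ints, and floats.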

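2) Build a Series from a dict, since a Series is itself essentially a key-value structure. The None value becomes NaN. In the output below the labels are sorted by key, which is what older pandas versions did; recent versions keep the dict's insertion order.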

cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
print(apts)
Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     50000.0
Name: income, dtype: float64
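
The following sketch, using a shortened version of the dict above, shows why the result is float64 even though the input values are integers. It assumes a reasonably recent pandas version.

```python
import pandas as pd

# A shortened version of the dict above; None marks a missing value.
cities = {'Beijing': 55000, 'Shanghai': 60000, 'Suzhou': None}
apts = pd.Series(cities, name='income')

print(apts.dtype)            # float64: the integers are upcast so NaN can be stored
print(apts.isna().sum())     # 1 missing entry (Suzhou)
print(apts.name)             # income
```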

3) Build a Series from a numpy array

import numpy as np
d=pd.Series(np.random.randn(5),index=['a','b','c','d','e'])
print(d)
a   -0.329401
b   -0.435921
c   -0.232267
d   -0.846713
e   -0.406585
dtype: float64
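
Because `np.random.randn` draws new numbers on every run, the values above will differ each time. If reproducible output is wanted, one option is a seeded generator, sketched below. The seed value 0 and the name `d_again` are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

labels = ['a', 'b', 'c', 'd', 'e']

# A seeded generator returns the same numbers on every run (seed 0 is arbitrary).
rng = np.random.default_rng(0)
d = pd.Series(rng.standard_normal(5), index=labels)

# Re-creating the generator with the same seed reproduces the same Series.
d_again = pd.Series(np.random.default_rng(0).standard_normal(5), index=labels)
print(d.equals(d_again))   # True
```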

Selecting data:

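1) You can treat a Series like a list and apply all the usual slicing operations. Recent pandas versions deprecate integer positions inside plain [] on a label-indexed Series, so .iloc is the safer way to select by position.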

import pandas as pd
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
print(apts)
Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     50000.0
Name: income, dtype: float64
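print(apts.iloc[3])  # 4th element by position; .iloc makes positional access explicit
60000.0
print(apts.iloc[[3,4,1]])  # several positions at once, in the order given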
Shanghai     60000.0
Suzhou           NaN
Guangzhou    45000.0
Name: income, dtype: float64
print(apts[1:])
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     50000.0
Name: income, dtype: float64
print(apts[:-2])
Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Name: income, dtype: float64
print(apts[1:]+apts[:-1])  # addition aligns on labels; a label missing on either side gives NaN
Beijing           NaN
Guangzhou     90000.0
Hangzhou      40000.0
Shanghai     120000.0
Suzhou            NaN
shenzhen          NaN
Name: income, dtype: float64
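
The sketch below repeats these selections with explicit `.iloc` (position) and `.loc` (label) accessors. It calls `sort_index()` so that the order matches the output shown above on any pandas version; that call and the name `total` are additions for illustration.

```python
import pandas as pd

cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
# sort_index() reproduces the key-sorted order of the output above.
apts = pd.Series(cities, name='income').sort_index()

print(apts.iloc[3])                        # by position: 60000.0 (Shanghai)
print(apts.iloc[[3, 4, 1]])                # Shanghai, Suzhou, Guangzhou
print(apts.loc['Guangzhou':'Shanghai'])    # label slices include both endpoints

# Arithmetic aligns on labels, not positions: Beijing and shenzhen each
# appear in only one operand, so their sums are NaN.
total = apts.iloc[1:] + apts.iloc[:-1]
print(total)
```

Label alignment is why the sum above is not simply "each element plus its neighbor": Guangzhou is added to Guangzhou, giving 90000.0.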

2) A Series also behaves like a dict: the index defined earlier is what you use to select data

import pandas as pd
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
print(apts['Shanghai']) ###
60000.0
print('Hangzhou' in apts)
True
print('Chongqing' in apts)
False
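
Two dict-like details are easy to miss: `in` tests the labels rather than the values, and `get` avoids a KeyError for a missing label. A minimal sketch with a shortened dict:

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000, 'Shanghai': 60000, 'Suzhou': None},
                 name='income')

print('Shanghai' in apts)         # True: `in` checks the index labels
print(60000 in apts)              # False: 60000 is a value, not a label
print(60000 in apts.values)       # True: test the values explicitly

print(apts.get('Chongqing'))      # None instead of a KeyError
print(apts.get('Chongqing', 0))   # 0, the supplied default
```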

3) Boolean indexing, much like numpy

import pandas as pd
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
at_most_50000=(apts<=50000)  # <= also keeps values equal to 50000 (shenzhen); NaN compares as False
print(apts[at_most_50000])
Guangzhou    45000.0
Hangzhou     20000.0
shenzhen     50000.0
Name: income, dtype: float64
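
Conditions can be combined with `&` and `|` (each wrapped in parentheses) and negated with `~`. The sketch below uses the same data; the names `mid_range` and `not_low` are illustrative. Note how NaN behaves under negation.

```python
import pandas as pd

cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')

# Both conditions must hold; the parentheses are required around each one.
mid_range = apts[(apts >= 45000) & (apts <= 55000)]
print(mid_range)   # Beijing, shenzhen, Guangzhou

# NaN fails every comparison, so negating a mask selects Suzhou as well.
not_low = apts[~(apts <= 50000)]
print(not_low)     # Beijing, Shanghai, and Suzhou (NaN)
```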

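Note: the usual numpy-style statistics such as mean, median, max, and min are available as Series methods; missing values (NaN) are skipped by default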

print(apts.mean()) 
46000.0
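
The mean above is 46000.0 because the NaN for Suzhou is left out: (55000 + 60000 + 50000 + 20000 + 45000) / 5. A short sketch of the related methods on the same data:

```python
import pandas as pd

cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')

print(apts.count())              # 5: number of non-missing values
print(apts.mean())               # 46000.0
print(apts.median())             # 50000.0
print(apts.max(), apts.min())    # 60000.0 20000.0
print(apts.mean(skipna=False))   # nan: NaN is no longer skipped
```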

Assigning to Series elements:

1) Assign directly through an index label

import pandas as pd
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
print(apts)
print('Old income of shenzhen:{}'.format(apts['shenzhen']))
Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     50000.0
Name: income, dtype: float64

Old income of shenzhen:50000.0
apts['shenzhen']=70000  ###
print(apts)
print('New income of shenzhen:{}'.format(apts['shenzhen']))
Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     70000.0
Name: income, dtype: float64

New income of shenzhen:70000.0
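
Assignment through `.loc` works the same way, and assigning to a label that does not exist yet appends a new entry. A minimal sketch with a two-city Series; the Tianjin entry is an illustrative addition, not part of the example above.

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000.0, 'shenzhen': 50000.0}, name='income')

apts.loc['shenzhen'] = 70000   # overwrite the value at an existing label
apts.loc['Tianjin'] = 40000    # a label that is not present yet is appended

print(apts)
```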

2) Don't forget the boolean indexing shown above; it also works in assignment

import pandas as pd
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
apts['shenzhen']=70000
print('New income of shenzhen:{}'.format(apts['shenzhen']))
less_than_50000=(apts<50000)  ###
print(less_than_50000)
apts[less_than_50000]=40000  ###
print(apts)
Beijing      False
Guangzhou     True
Hangzhou      True
Shanghai     False
Suzhou       False
shenzhen     False
Name: income, dtype: bool

Beijing      55000.0
Guangzhou    40000.0
Hangzhou     40000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     70000.0
Name: income, dtype: float64
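
The assignment above modifies `apts` in place. If the original values should be kept, `Series.mask` returns a modified copy instead. This is an alternative sketch, not the approach used above, and `adjusted` is an illustrative name.

```python
import pandas as pd

cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
apts['shenzhen'] = 70000

# mask() replaces values where the condition is True and returns a new Series.
adjusted = apts.mask(apts < 50000, 40000)

print(adjusted['Hangzhou'], adjusted['Guangzhou'])   # 40000.0 40000.0
print(apts['Hangzhou'], apts['Guangzhou'])           # 20000.0 45000.0 (unchanged)
```

Suzhou stays NaN in both Series, because NaN < 50000 evaluates to False.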

Arithmetic operations

import pandas as pd
import numpy as np
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
apts['shenzhen']=70000
print('New income of shenzhen:{}'.format(apts['shenzhen']))
less_than_50000=(apts<50000)  
apts[less_than_50000]=40000  
print(apts)

print(apts/2)   ###
print(apts**1.5)   ###
print(np.log(apts))   ###
apts2=pd.Series({'Beijing':10000,'Shanghai':8000,'shenzhen':6000,'Tianjin':40000,'Guangzhou':7000,'Chongqing':30000})
print(apts2)
print(apts+apts2)   ###
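
No output is shown for this block, so here is a smaller sketch of the same operations with results that are easy to check by hand. The Series `a` and `b` are reduced, illustrative versions of `apts` and `apts2`.

```python
import numpy as np
import pandas as pd

a = pd.Series({'Beijing': 55000.0, 'Shanghai': 60000.0, 'Hangzhou': 40000.0})
b = pd.Series({'Beijing': 10000, 'Shanghai': 8000, 'Tianjin': 40000})

print(a / 2)        # element-wise: 27500.0, 30000.0, 20000.0
print(np.log(a))    # numpy functions apply element-wise and return a Series

# + aligns on labels; Hangzhou and Tianjin exist on one side only, so they are NaN.
total = a + b
print(total)

# add() with fill_value=0 treats a label missing on one side as 0 instead.
filled = a.add(b, fill_value=0)
print(filled)
```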

Missing data

cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
apts['shenzhen']=70000
less_than_50000=(apts<50000)
apts[less_than_50000]=40000
print(apts)
Beijing      55000.0
Guangzhou    40000.0
Hangzhou     40000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     70000.0
Name: income, dtype: float64
apts2=pd.Series({'Beijing':10000,'Shanghai':8000,'shenzhen':6000,'Tianjin':40000,'Guangzhou':7000,'Chongqing':30000})
print(apts2)
Beijing      10000
Chongqing    30000
Guangzhou     7000
Shanghai      8000
Tianjin      40000
shenzhen      6000
dtype: int64
print('Hangzhou' in apts)   ###
print('Hangzhou' in apts2)
True
False
print(apts.notnull()) # boolean mask: True where a value is present   ###
Beijing       True
Guangzhou     True
Hangzhou      True
Shanghai      True
Suzhou       False
shenzhen      True
Name: income, dtype: bool
print(apts.isnull())   ###
Beijing      False
Guangzhou    False
Hangzhou     False
Shanghai     False
Suzhou        True
shenzhen     False
Name: income, dtype: bool
print(apts[apts.isnull()])   # use the boolean mask of missing values to select those elements
Suzhou   NaN
Name: income, dtype: float64
apts=apts+apts2   # add two Series whose indexes only partly overlap
print(apts)
Beijing      65000.0
Chongqing        NaN
Guangzhou    47000.0
Hangzhou         NaN
Shanghai     68000.0
Suzhou           NaN
Tianjin          NaN
shenzhen     76000.0
dtype: float64
apts[apts.isnull()]=apts.mean() # fill the missing positions with the mean (64000.0), not the median
print(apts)
Beijing      65000.0
Chongqing    64000.0
Guangzhou    47000.0
Hangzhou     64000.0
Shanghai     68000.0
Suzhou       64000.0
Tianjin      64000.0
shenzhen     76000.0
dtype: float64
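
The same fill can be written with `fillna`, which returns a new Series and leaves the original untouched. The sketch below starts from the summed values shown above; `total`, `by_mean`, `by_median`, and `dropped` are illustrative names. It also shows that the median would give a different fill value.

```python
import numpy as np
import pandas as pd

# The result of apts + apts2 from above, before any filling.
total = pd.Series({'Beijing': 65000.0, 'Chongqing': np.nan, 'Guangzhou': 47000.0,
                   'Hangzhou': np.nan, 'Shanghai': 68000.0, 'Suzhou': np.nan,
                   'Tianjin': np.nan, 'shenzhen': 76000.0})

by_mean = total.fillna(total.mean())       # mean of the 4 known values: 64000.0
by_median = total.fillna(total.median())   # median of the 4 known values: 66500.0
dropped = total.dropna()                   # or simply discard the missing entries

print(by_mean['Suzhou'], by_median['Suzhou'], len(dropped))   # 64000.0 66500.0 4
```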
