http://nbviewer./github/supergis/git_notebook/blob/master/pystart/pandas_quickstart.ipynb
2016

Pandas_QuickStart

Originally from http://pandas./pandas-docs/stable/

6.1 Object Creation

Creating a Series by passing a list of values, letting pandas create a default integer index:

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

s = pd.Series([1, 3, 5, np.nan, 6, 8])
s

Out[1]:
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

Creating a DataFrame by passing a numpy array, with a datetime index and labeled columns:

In [2]:
dates = pd.date_range('20130101', periods=6)
dates

Out[2]:
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [7]:
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df

Out[7]:
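Because np.random.randn draws new values on every run, the numbers shown in this notebook will not repeat. A minimal sketch of the same construction with a seeded generator follows; the seed and the use of np.random.default_rng are assumptions added here for reproducibility, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed, chosen only so the frame is the same on every run.
rng = np.random.default_rng(0)

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list('ABCD'))

print(df.shape)            # rows x columns
print(list(df.columns))    # column labels taken from list('ABCD')
print(df.index[0], df.index[-1])
```

The array shape has to match the index length and the number of column labels, here 6 rows and 4 columns.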
Creating a DataFrame by passing a dict of objects that can be converted to Series-like structures:

In [8]:
df2 = pd.DataFrame({
    'A': 1.,
    'B': pd.Timestamp('20130102'),
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': np.array([3] * 4, dtype='int32'),
    'E': pd.Categorical(["test", "train", "test", "train"]),
    'F': 'foo',
})
df2

Out[8]:
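The dict form mixes scalars with length-4 objects. A reduced sketch, using only three of the columns above, suggests how the scalars are repeated to match the index supplied by the Series:

```python
import pandas as pd

# Scalars ('A', 'F') are broadcast to the length of the Series in 'C'.
df2 = pd.DataFrame({
    'A': 1.0,
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'F': 'foo',
})

print(len(df2))
print(df2['A'].tolist())
print(df2['F'].tolist())
print(df2['C'].dtype)
```

Each column keeps its own dtype, which is why the dtypes listing that follows differs from column to column.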
In [9]:
df2.dtypes

Out[9]:
A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object

If you're using IPython, tab completion for column names (as well as public attributes) is automatically enabled. For example, typing df2. and pressing Tab offers the column names, and completing to df2.A returns that column:

In [11]:
df2.A

Out[11]:
0    1.0
1    1.0
2    1.0
3    1.0
Name: A, dtype: float64

The columns A, B, C, and D are all offered as tab completions, and E is there as well; the remaining attributes are omitted for brevity.

6.2 Viewing Data

In [14]:
df.head() Out[14]:
In [15]:
df.tail(3) Out[15]:
In [16]:
df.index

Out[16]:
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [17]:
df.values

Out[17]:
array([[-1.33401275, -0.34829657,  0.38865407, -0.22596701],
       [-0.13997444, -1.34778853,  0.81707707,  0.19247685],
       [-1.0827386 , -0.5441047 , -1.42388302, -1.24736743],
       [ 0.03478847, -0.67722051,  0.12044917,  0.7943414 ],
       [ 0.42854678, -0.61015602, -0.95089113, -0.0580473 ],
       [ 0.12563068, -0.11665286, -0.54457518, -1.57878468]])

In [18]:
df.describe() Out[18]:
In [19]:
df.T Out[19]:
In [20]:
df.sort_index(axis=1, ascending=False) Out[20]:
In [21]:
df.sort_values(by='B') Out[21]:
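The table outputs of the viewing calls above did not survive in this copy. A small sketch on a hand-written frame (the values are made up for illustration) shows what each call returns:

```python
import pandas as pd

dates = pd.date_range('20130101', periods=6)
# Illustrative values only; the notebook's df holds random numbers.
df = pd.DataFrame(
    {'A': [3, 1, 2, 6, 5, 4], 'B': [0.5, -1.0, 2.0, 0.0, 1.5, -0.5]},
    index=dates,
)

top = df.head(2)                                  # first two rows
bottom = df.tail(3)                               # last three rows
transposed = df.T                                 # rows and columns swapped
cols_desc = df.sort_index(axis=1, ascending=False)  # columns in reverse label order
by_b = df.sort_values(by='B')                     # rows ordered by column B

print(top)
print(bottom)
print(transposed.shape)
print(list(cols_desc.columns))
print(by_b['B'].tolist())
```

None of these calls modifies df; each returns a new object.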
6.3 Selection

6.3.1 Getting

In [22]:
df['A']

Out[22]:
2013-01-01   -1.334013
2013-01-02   -0.139974
2013-01-03   -1.082739
2013-01-04    0.034788
2013-01-05    0.428547
2013-01-06    0.125631
Freq: D, Name: A, dtype: float64

In [23]:
df[0:3] Out[23]:
In [24]:
df['20130102':'20130104'] Out[24]:
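The three forms of [] above behave differently: a column label returns a Series, while a slice selects rows. A sketch on a frame with predictable values (np.arange is used here only for illustration):

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list('ABCD'))

col = df['A']                          # single label: one column, as a Series
first_three = df[0:3]                  # integer slice: rows by position, end excluded
by_date = df['20130102':'20130104']    # date-string slice: rows by label, end included

print(type(col).__name__, col.name)
print(len(first_three), first_three.index[-1])
print(len(by_date), by_date.index[-1])
```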
6.3.2 Selection by Label

For getting a cross section using a label:

In [25]:
df.loc[dates[0]]

Out[25]:
A   -1.334013
B   -0.348297
C    0.388654
D   -0.225967
Name: 2013-01-01 00:00:00, dtype: float64

Selecting on multiple axes by label:

In [26]:
df.loc[:, ['A', 'B']]

Out[26]:

Label slicing includes both endpoints:

In [27]:
df.loc['20130102':'20130104',['A','B']] Out[27]:
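The inclusive endpoint is the main difference from positional slicing. A short sketch contrasting the two on an illustrative frame:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list('ABCD'))

# Label slice: 2013-01-02 through 2013-01-04, both ends included.
by_label = df.loc['20130102':'20130104', ['A', 'B']]

# Position slice: rows 1 and 2 only, the stop position is excluded.
by_position = df.iloc[1:3, 0:2]

print(by_label.shape, by_label.index[-1])
print(by_position.shape, by_position.index[-1])
```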
Reduction in the dimensions of the returned object:

In [30]:
df.loc['20130102', ['A', 'B']]

Out[30]:
A   -0.139974
B   -1.347789
Name: 2013-01-02 00:00:00, dtype: float64

For getting a scalar value:

In [31]:
df.loc[dates[0], 'A']

Out[31]:
-1.3340127475498547

For getting fast access to a scalar (equivalent to the prior method):

In [32]:
df.at[dates[0], 'A']

Out[32]:
-1.3340127475498547

6.3.3 Selection by Position

See more in Selection by Position.

Select via the position of the passed integers:

In [33]:
df.iloc[3]

Out[33]:
A    0.034788
B   -0.677221
C    0.120449
D    0.794341
Name: 2013-01-04 00:00:00, dtype: float64

By integer slices, acting similarly to numpy/python:

In [34]:
df.iloc[3:5,0:2] Out[34]:
By lists of integer position locations, similar to the numpy/python style In [35]:
df.iloc[[1,2,4],[0,2]] Out[35]:
For slicing rows explicitly In [36]:
df.iloc[1:3,:] Out[36]:
For slicing columns explicitly In [37]:
df.iloc[:,1:3] Out[37]:
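Since the table outputs are missing here, the following sketch repeats the positional selections on a frame whose values encode their own position (row i, column j holds 4*i + j; this layout is chosen only for illustration):

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

block = df.iloc[3:5, 0:2]            # rows 3-4, columns 0-1
picked = df.iloc[[1, 2, 4], [0, 2]]  # explicit row and column positions
rows = df.iloc[1:3, :]               # rows only, all columns
cols = df.iloc[:, 1:3]               # columns only, all rows

print(block.values.tolist())
print(picked.values.tolist())
print(rows.shape, cols.shape)
print(df.iloc[1, 1], df.iat[1, 1])   # same scalar through both accessors
```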
For getting a value explicitly:

In [39]:
df.iloc[1, 1]

Out[39]:
-1.3477885295869219

For getting fast access to a scalar (equivalent to the prior method):

In [40]:
df.iat[1, 1]

Out[40]:
-1.3477885295869219

6.3.4 Boolean Indexing

Using a single column's values to select data:

In [41]:
df[df.A > 0] Out[41]:
Selecting values from a DataFrame where a boolean condition is met:

In [42]:
df[df > 0] Out[42]:
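The two boolean forms above return differently shaped results. A sketch with hand-picked values (illustrative only) makes the difference visible:

```python
import pandas as pd

df = pd.DataFrame({'A': [-1.0, 2.0, -3.0, 4.0], 'B': [5.0, -6.0, 7.0, -8.0]})

# Boolean Series: keeps whole rows where the condition on column A holds.
rows = df[df.A > 0]

# Boolean DataFrame: keeps the original shape and puts NaN where the mask is False.
masked = df[df > 0]

print(rows)
print(masked)
print(int(masked.isna().sum().sum()))  # number of masked-out cells
```

The second form gives the same result as df.where(df > 0).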
Using the isin() method for filtering:

In [43]:
df2 = df.copy()

Add a column.

In [44]:
df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three']

In [45]:
df2 Out[45]:
In [46]:
df2[df2['E'].isin(['two','four'])] Out[46]:
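isin() builds a boolean Series that is then used as a row filter. A sketch with the same E labels on a simplified frame (column A here is a placeholder, not the notebook's data):

```python
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df2 = pd.DataFrame({'A': range(6)}, index=dates)
df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three']

mask = df2['E'].isin(['two', 'four'])  # True where E is one of the listed values
selected = df2[mask]
excluded = df2[~mask]                  # ~ inverts the mask

print(mask.tolist())
print(selected['E'].tolist())
print(len(excluded))
```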
6.3.5 Setting

Setting a new column automatically aligns the data by the indexes:

In [48]:
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('20130102', periods=6))
s1

Out[48]:
2013-01-02    1
2013-01-03    2
2013-01-04    3
2013-01-05    4
2013-01-06    5
2013-01-07    6
Freq: D, dtype: int64

Setting values by position:

In [49]:
df.iat[0, 1] = 0

Setting by assigning with a numpy array:

In [50]:
df.loc[:, 'D'] = np.array([5] * len(df))

The result of the prior setting operations:

In [51]:
df Out[51]:
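The notebook builds s1 but the cell that assigns it to df is not shown. A sketch of what index alignment would look like if s1 were assigned as a new column; the column name 'F' is an assumption chosen here, and the frame is a simplified stand-in for df:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.zeros((6, 2)), index=dates, columns=list('AB'))

# s1 starts one day later than df and runs one day past its end.
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('20130102', periods=6))

# Hypothetical column name 'F': values are matched on index labels, not position.
df['F'] = s1

print(df['F'])
```

The first row gets NaN because s1 has no entry for 2013-01-01, and the 2013-01-07 value is dropped because df has no such row.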
A where operation with setting. In [52]:
df2 = df.copy() In [53]:
df2[df2 > 0] = -df2 In [54]:
df2 Out[54]:
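The assignment df2[df2 > 0] = -df2 replaces only the positive cells with their negatives, so every value ends up non-positive. A sketch on a 2x2 frame with illustrative values:

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, -3.0], 'B': [-2.0, 4.0]})

df2 = df.copy()
# Only cells where the mask is True are overwritten; the rest are left as they were.
df2[df2 > 0] = -df2

print(df2)
print(bool((df2 <= 0).all().all()))
```

Working on a copy leaves the original df unchanged, which is why the notebook calls df.copy() first.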