We recommend working through the steps below in an interactive Python interpreter such as IPython.
Import the numpy and pandas libraries:
    import numpy as np
    import pandas as pd
When a Series is created from a list, its index is automatically set to the integers from 0 to len(list) - 1.
    In [1]: import numpy as np

    In [2]: import pandas as pd

    In [3]: data = pd.Series([1, 2, 3, 4, 5])

    In [4]: data
    Out[4]:
    0    1
    1    2
    2    3
    3    4
    4    5
    dtype: int64
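As a quick check of the default index, the sketch below builds the same Series and inspects its labels. The names `values`, `series`, and `labeled`, and the letter labels, are illustrative assumptions and do not come from the original session.

```python
import pandas as pd

values = [1, 2, 3, 4, 5]
series = pd.Series(values)

# With no explicit index, pandas assigns the labels 0 .. len(values) - 1.
default_labels = list(series.index)
print(default_labels)

# An explicit index can be passed instead (these labels are made up).
labeled = pd.Series(values, index=list("abcde"))
print(labeled["c"])
```

Passing `index=` is optional; leaving it out gives the integer labels shown in the session above.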
A DataFrame can be built from a pandas DatetimeIndex (created with pd.date_range) used as the row index and a two-dimensional NumPy array used as the data:
    In [5]: dates = pd.date_range('20211107', periods=6)

    In [6]: dates
    Out[6]:
    DatetimeIndex(['2021-11-07', '2021-11-08', '2021-11-09', '2021-11-10',
                   '2021-11-11', '2021-11-12'],
                  dtype='datetime64[ns]', freq='D')

    In [7]: data = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=['A', 'B', 'C', 'D'])

    In [8]: data
    Out[8]:
                       A         B         C         D
    2021-11-07 -0.543325 -1.140889  0.037109  2.039023
    2021-11-08  1.275152 -0.208459 -1.025204 -0.765965
    2021-11-09  0.646048 -0.548909  0.967998  0.260784
    2021-11-10 -0.668352 -0.347682 -0.878964 -1.851527
    2021-11-11 -0.620460  0.587318 -0.912959 -0.989953
    2021-11-12  1.479600 -1.966536 -1.360499  0.059251
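Because np.random.randn draws new values on every run, the numbers printed above will differ from yours. One possible way to make the example repeatable is a seeded generator, sketched below. The seed value 0 and the names `rng`, `values`, and `frame` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20211107", periods=6)

# A seeded generator returns the same values on every run.
rng = np.random.default_rng(seed=0)
values = rng.standard_normal((6, 4))  # 6 rows, 4 columns

frame = pd.DataFrame(values, index=dates, columns=["A", "B", "C", "D"])
print(frame.shape)
print(frame.index[0], frame.index[-1])
```

The array needs one row per date and one column per label, otherwise the DataFrame constructor raises a ValueError.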
A DataFrame can also be built from a dict whose values are Series objects; each key becomes a column name:
    In [9]: data = pd.DataFrame({'first': pd.Series([1, 2, 3, 4])})

    In [10]: data
    Out[10]:
       first
    0      1
    1      2
    2      3
    3      4
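When the dict holds several Series, pandas aligns them on their index labels. The sketch below adds a hypothetical second column, `second`, that is deliberately shorter than `first` to show how the gaps are filled; that column and its values are not part of the original example.

```python
import pandas as pd

frame = pd.DataFrame({
    "first": pd.Series([1, 2, 3, 4]),
    "second": pd.Series([10, 20]),  # shorter on purpose (illustrative data)
})

# Labels 2 and 3 have no value in "second", so they are filled with NaN.
missing = int(frame["second"].isna().sum())
print(frame)
print(missing)
```

The NaN entries also turn the `second` column into a floating-point column, since NaN is a float value.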
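View the first and last rows of the DataFrame. Both head() and tail() return five rows by default, so this four-row frame is returned in full each time:
    In [11]: data.head()
    Out[11]:
       first
    0      1
    1      2
    2      3
    3      4

    In [12]: data.tail()
    Out[12]:
       first
    0      1
    1      2
    2      3
    3      4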
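To see fewer rows, both methods accept a row count. A minimal sketch on the same frame follows; the names `first_two` and `last_two` are illustrative.

```python
import pandas as pd

data = pd.DataFrame({"first": pd.Series([1, 2, 3, 4])})

# Pass the number of rows wanted from the top or the bottom.
first_two = data.head(2)
last_two = data.tail(2)

print(first_two)
print(last_two)
```

Note that tail() keeps the original row labels, so `last_two` is indexed 2 and 3 rather than 0 and 1.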
View the row labels and column labels:
    In [15]: data.index
    Out[15]: RangeIndex(start=0, stop=4, step=1)

    In [16]: data.columns
    Out[16]: Index(['first'], dtype='object')
A DataFrame can be converted to other formats conveniently. Typing data.to_ and pressing Tab in IPython lists the available conversion methods:
    In [17]: data.to_dict()
    Out[17]: {'first': {0: 1, 1: 2, 2: 3, 3: 4}}

    In [18]: data.to_numpy()
    Out[18]:
    array([[1],
           [2],
           [3],
           [4]])

    In [19]: data.to_csv()
    Out[19]: ',first\n0,1\n1,2\n2,3\n3,4\n'

    In [20]: data.to_
                  to_clipboard()  to_feather()    to_json()
                  to_csv()        to_gbq()        to_latex()
                  to_dict()       to_hdf()        to_markdown()  >
                  to_excel()      to_html()       to_numpy()
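The conversions above include the row index in their output. If the index is not needed, a sketch along these lines may help; it relies on the `orient` parameter of to_dict and the `index` parameter of to_csv, and the names `as_lists` and `csv_lines` are illustrative.

```python
import pandas as pd

data = pd.DataFrame({"first": pd.Series([1, 2, 3, 4])})

# orient="list" maps each column name to a plain list of its values.
as_lists = data.to_dict(orient="list")

# index=False leaves the row labels out of the CSV text.
csv_text = data.to_csv(index=False)
csv_lines = csv_text.splitlines()

print(as_lists)
print(csv_lines)
```

Called without a path, to_csv returns the CSV as a string rather than writing a file, which is why the session above shows a quoted string.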
View descriptive statistics for the DataFrame:
    In [24]: data.describe()
    Out[24]:
              first
    count  4.000000
    mean   2.500000
    std    1.290994
    min    1.000000
    25%    1.750000
    50%    2.500000
    75%    3.250000
    max    4.000000
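The rows of the summary can be reproduced with individual Series methods, which makes it clearer what each figure means. The sketch below checks two of them; the names `summary` and `column` are illustrative.

```python
import pandas as pd

data = pd.DataFrame({"first": pd.Series([1, 2, 3, 4])})
summary = data.describe()
column = data["first"]

# "std" is the sample standard deviation (divides by n - 1, i.e. ddof=1).
std_from_describe = summary.loc["std", "first"]
std_direct = column.std(ddof=1)

# "25%" is the first quartile, using linear interpolation between values.
q1_from_describe = summary.loc["25%", "first"]
q1_direct = column.quantile(0.25)

print(std_from_describe, std_direct)
print(q1_from_describe, q1_direct)
```

For the values 1 to 4 the squared deviations from the mean 2.5 sum to 5, so the sample variance is 5/3 and its square root is about 1.290994, matching the table.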
Transpose the DataFrame, swapping its rows and columns:
    In [26]: data.T
    Out[26]:
           0  1  2  3
    first  1  2  3  4
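To confirm what the transpose does to the labels, the sketch below compares the shape, index, and columns before and after; the name `transposed` is illustrative.

```python
import pandas as pd

data = pd.DataFrame({"first": pd.Series([1, 2, 3, 4])})
transposed = data.T

# The 4x1 frame becomes 1x4: row labels and column labels trade places.
print(data.shape, transposed.shape)
print(list(transposed.index), list(transposed.columns))

# Transposing again brings the original column back.
restored = transposed.T["first"].tolist()
print(restored)
```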