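Purpose: split one column of data into several columns without changing the number of rows
I ran into this problem while helping my wife process some data. She ended up solving it herself, but we used different approaches, so here is mine: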
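import pandas as pd

# Path to the table file; change it to match your own data
data = pd.read_csv('E:/yan/poj_with_feature.csv')

# Check whether any column contains missing values
# (print works outside Jupyter, where display is not defined)
print(data.isnull().any())

# Keep only the 'feature' column and clean up each string
data = data['feature']
data = data.str.strip('[')          # remove the leading bracket
data = data.str.strip(']')          # remove the trailing bracket
data = data.str.replace('\n', ' ')  # replace newlines with spaces
data = pd.DataFrame(data)
print(data.head(10))
print("the shape of data:", data.shape)

# Split the column:
# the loop creates one new column per value; 8 is the number of values
# per row in this dataset, so adjust it for your own data
for i in range(8):
    # split() with no arguments splits on any run of whitespace
    data[i + 1] = data['feature'].map(lambda x: x.split()[i])
data = data.drop(['feature'], axis=1)  # drop the original combined column
print(data.head(10))

# Add a prefix to every column name: feature1, feature2, ...
data = data.add_prefix('feature')
print(data)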
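The loop above fixes the number of columns in advance. As a hedged alternative (not the method I actually used), pandas can infer the column count with `Series.str.split(expand=True)`. The sketch below runs on a small made-up DataFrame instead of the CSV file; the helper name `split_feature_column` and the sample values are hypothetical, chosen only for illustration.

```python
import pandas as pd


def split_feature_column(df, column="feature", prefix="feature"):
    """Split a bracketed, whitespace-separated string column into numeric columns.

    Hypothetical helper for illustration; assumes every row holds the same
    number of values.
    """
    # Strip '[' and ']' characters from both ends of each string
    cleaned = df[column].str.strip("[]")
    # With no separator, str.split splits on any whitespace (newlines included);
    # expand=True returns one DataFrame column per piece
    parts = cleaned.str.split(expand=True)
    # Name the columns feature1, feature2, ...
    parts.columns = [f"{prefix}{i + 1}" for i in range(parts.shape[1])]
    # The pieces are still strings, so convert them to floats
    return parts.astype(float)


# Made-up sample data, including an embedded newline like the real strings had
sample = pd.DataFrame({"feature": ["[0.1 0.2\n 0.3]", "[1.5 2.5 3.5]"]})
result = split_feature_column(sample)
print(result)
```

The row count stays the same, and the number of new columns follows from the data rather than from a hard-coded `range(8)`.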
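PS: For small amounts of data, such as a single column with one or a few rows, Excel's built-in Text to Columns feature can do the same split.
Purpose: save two lists to one .txt file, with each list stored as its own column in the file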
# Put the data into lists:
a = [0, 24, 140, 369, 564, 693, 864, 967, 1111, 1191, 1345, 1423, 1586,
     1661, 1824, 1906, 2060, 2149, 2280, 2360, 2517, 2600, 2703, 3073,
     3192, 3335, 3406, 3569, 3648, 3833, 3893, 4067, 4148, 4309, 4413,
     4548, 4649, 4815, 4886, 5180, 5262, 5493, 5560, 5799, 5888, 6010]
b = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
     0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0]
c = [a, b]

# Change the path to suit your machine
# Note: zip() stops at the shorter list, so check that a and b have the same length
with open("E://Paper/PCG/list1.txt", "w") as file:
    for x in zip(*c):
        # One row per pair, with the two values separated by a tab
        file.write("{0}\t{1}\n".format(*x))

# Read the file back to check the result
with open("E://Paper/PCG/list1.txt", "r") as file:
    d = file.readlines()
print(d)
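One thing to watch with `zip()` is that it silently drops the extra values when the lists differ in length. A minimal sketch of one way around that, using `itertools.zip_longest` and the standard `csv` module with a tab delimiter, is below. The helper name `write_columns`, the short sample lists, and the temporary output path are assumptions for illustration, not part of my original script.

```python
import csv
import os
import tempfile
from itertools import zip_longest


def write_columns(path, *columns, fill=""):
    """Write each list as one tab-separated column; short lists are padded with `fill`."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        # zip_longest keeps going until the longest list is exhausted
        for row in zip_longest(*columns, fillvalue=fill):
            writer.writerow(row)


# Short made-up lists of different lengths
times = [0, 24, 140, 369]
flags = [1, 0, 1]

# Hypothetical output location in the system temp directory
out_path = os.path.join(tempfile.gettempdir(), "list_demo.txt")
write_columns(out_path, times, flags)

# Read the file back; every value comes back as a string
with open(out_path, "r", newline="") as f:
    rows = list(csv.reader(f, delimiter="\t"))
print(rows)
```

The last row keeps `369` and leaves the second column empty instead of discarding the row.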
To write the values in scientific notation instead, you can do this:
# Put the data into lists:
a = [0, 24, 140, 369, 564, 693, 864, 967, 1111, 1191, 1345, 1423, 1586,
     1661, 1824, 1906, 2060, 2149, 2280, 2360, 2517, 2600, 2703, 3073,
     3192, 3335, 3406, 3569, 3648, 3833, 3893, 4067, 4148, 4309, 4413,
     4548, 4649, 4815, 4886, 5180, 5262, 5493, 5560, 5799, 5888, 6010]
# Convert each value to a scientific-notation string:
a = ['{:e}'.format(v) for v in a]
print(a)

b = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
     0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0]
# Convert each value to a scientific-notation string:
b = ['{:e}'.format(v) for v in b]
print(b)

# Combine the lists
c = [a, b]

# Open the file for writing
# Note: zip() stops at the shorter list, so check that a and b have the same length
with open("E://Paper/PCG/list1.txt", "w") as file:
    for x in zip(*c):
        file.write("{0}\t{1}\n".format(*x))

# Open the file for reading
with open("E://Paper/PCG/list1.txt", "r") as file:
    d = file.readlines()
print(d)
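The `'{:e}'` format keeps six digits after the decimal point by default. If fewer digits are enough, a precision can be put in the format spec. Here is a small sketch; the helper name `to_sci` and its `digits` parameter are hypothetical, and only the standard `format()` built-in is used.

```python
def to_sci(values, digits=6):
    """Format each number as a scientific-notation string with `digits` decimals."""
    # '.{digits}e' is the standard format spec: precision followed by the 'e' type
    spec = ".{}e".format(digits)
    return [format(v, spec) for v in values]


sample = [0, 24, 1111]

default_form = to_sci(sample)           # same result as '{:e}'.format(v)
short_form = to_sci(sample, digits=2)   # two digits after the decimal point

print(default_form)
print(short_form)

# The strings convert back to numbers with float()
restored = [float(s) for s in default_form]
print(restored)
```

Since the results are strings, they can be written to the file with the same `zip()` loop as before.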
These are the only two problems I have run into so far, so I'll stop here.