Employment Ecosystem Analysis: Data Preprocessing Code


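The code that connects to MySQL and loads the scraped job postings is not shown in this part. The steps below assume the postings are already loaded into a pandas DataFrame named data.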

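import pandas as pd

# Keep only postings whose job title contains "数据" ("data").
# na=False drops rows with a missing title; .copy() avoids SettingWithCopyWarning
# when new columns are assigned below.
data = data[data.job_name.str.contains('数据', na=False)].copy()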

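# Monthly salary: convert every salary string to yuan per month
import re

def salary_deal(text):
    if not isinstance(text, str):  # missing salary (NaN/None)
        return 0
    if '万/月' in text:      # 10k yuan per month
        unit = 10000
    elif '千/月' in text:    # 1k yuan per month
        unit = 1000
    elif '元/天' in text:    # yuan per day, assuming 22 working days a month
        unit = 22
    elif '元/小时' in text:  # yuan per hour: 10 hours a day x 22 days a month
        unit = 10 * 22
    elif '万/年' in text:    # 10k yuan per year, spread over 12 months
        unit = 10000 / 12
    else:                    # unrecognized format
        return 0

    # Extract the numbers, e.g. '1-1.5万/月' -> [1.0, 1.5]
    res = [float(n) for n in re.findall(r'\d+(?:\.\d+)?', text)]
    if len(res) == 1:
        return int(res[0] * unit)
    elif len(res) == 2:
        # A range: take the midpoint
        return int((res[0] + res[1]) * unit / 2)
    else:
        raise ValueError(f'unexpected salary text: {text!r}')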

data.loc[:,'salary'] = data.providesalary_text.apply(salary_deal)
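As a cross-check of the conversion arithmetic, here is a table-driven sketch of the same idea. The name to_monthly_yuan and the UNIT_TO_MONTHLY table are hypothetical, not part of the original code; the multipliers reproduce the ones salary_deal() uses (22 working days a month, 10 hours a day for hourly pay, 12 months a year). For example, '1-1.5万/月' becomes (1 + 1.5) / 2 x 10000 = 12500 yuan/month.

```python
import re

# Hypothetical lookup table: multiplier from each salary format to yuan/month,
# using the same assumptions as salary_deal()
UNIT_TO_MONTHLY = {
    "万/月": 10000,
    "千/月": 1000,
    "元/天": 22,           # 22 working days a month
    "元/小时": 10 * 22,    # 10 hours a day, 22 days a month
    "万/年": 10000 / 12,   # yearly pay spread over 12 months
}

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def to_monthly_yuan(text):
    """Return the salary (midpoint of a range) in yuan/month, or 0 for unknown formats."""
    if not isinstance(text, str):
        return 0
    for suffix, factor in UNIT_TO_MONTHLY.items():
        if suffix in text:
            nums = [float(n) for n in NUMBER.findall(text)]
            if not nums or len(nums) > 2:
                raise ValueError(f"unexpected salary text: {text!r}")
            # One number is used as-is; two numbers are averaged
            return int(sum(nums) / len(nums) * factor)
    return 0


print(to_monthly_yuan("1-1.5万/月"), to_monthly_yuan("150元/天"))
```

Keeping the units in one dictionary means a new format (say, 元/月) is a one-line addition rather than another elif branch.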

# City: keep the part before '-', e.g. '上海-浦东新区' -> '上海'
data.loc[:,'city'] = data.workarea_text.apply(lambda x:x.split('-')[0])
data.drop(columns='job_id',inplace=True)
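The lambda in the city step raises an AttributeError when workarea_text is missing for a row. A minimal vectorized alternative, assuming the same 'city-district' format, uses the pandas .str accessor, which passes missing values through as NaN. The sample values are made up for illustration.

```python
import pandas as pd

# Made-up workarea_text values in the assumed "city-district" format
workarea = pd.Series(["上海-浦东新区", "深圳", "北京-朝阳区", None])

# Split on the first '-' only and keep the city part; None stays missing (NaN)
city = workarea.str.split("-", n=1).str[0]
print(city.tolist())
```

In the script this would read data['city'] = data.workarea_text.str.split('-', n=1).str[0].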

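# Monthly salary bands. Bin edges:
# 0, 1 (a salary of 0, i.e. an unrecognized format, lands in [0, 1))
# 4000, 6000, 8000, 10000, 12000, 14000
# 20000, 30000, 40000, 200000
bins = [0, 1] + list(range(4000, 14001, 2000)) + [20000, 30000, 40000, 200000]
# Binning; right=False makes each bin left-closed, e.g. [4000, 6000).
# Salaries of 200000 or more fall outside every bin and become NaN.
data['salary_range'] = pd.cut(data.salary, bins, right=False)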
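To see how the bin edges and right=False behave, here is pd.cut applied to a few made-up salaries: 0 falls into [0, 1), 3000 into [1, 4000), and anything at or above 200000 falls outside every bin and becomes NaN.

```python
import pandas as pd

bins = [0, 1] + list(range(4000, 14001, 2000)) + [20000, 30000, 40000, 200000]

# Made-up monthly salaries in yuan
salaries = pd.Series([0, 3000, 12500, 45000, 250000])

# right=False: every interval is closed on the left, e.g. [12000, 14000)
ranges = pd.cut(salaries, bins, right=False)
print(ranges.tolist())

# Postings per band in bin order; NaN (out of range) is dropped by default
print(ranges.value_counts(sort=False))
```

value_counts(sort=False) keeps the bands in salary order, which is usually what a distribution chart of salary_range needs.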

# Company type
data.loc[:,'company_type'] = data.companytype_text

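# Education: return the first level in the list that appears in attribute_text
def education_deal(text):
    # Keywords stay in Chinese because they are matched against the raw text:
    # 中专 (technical secondary school), 大专 (junior college), 本科 (bachelor's),
    # 硕士 (master's), 博士 (PhD), 研究生 (postgraduate)
    education = ['中专', '大专', '本科', '硕士', '博士', '研究生']
    if not isinstance(text, str):
        return '其它'
    for e in education:
        if e in text:
            return e
    return '其它'  # "other": no education level found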


data.loc[:,'education'] = data.attribute_text.apply(education_deal) # Invoke function on values of Series.
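A vectorized variant, sketched here as an alternative rather than the author's method, builds one regular expression from the same keywords and uses Series.str.extract. One behavioral difference: the regex returns the keyword that appears first in the text, while education_deal() returns the first keyword in list order, so the two can disagree when a posting names more than one level. The attribute strings below are invented examples.

```python
import pandas as pd

EDUCATION_LEVELS = ["中专", "大专", "本科", "硕士", "博士", "研究生"]
# One capture group with the keywords as alternatives: (中专|大专|...)
pattern = "(" + "|".join(EDUCATION_LEVELS) + ")"

# Invented attribute_text values for illustration
attrs = pd.Series(["上海|3-4年经验|本科|招1人", "深圳|无需经验|招2人", "北京|硕士", None])

# expand=False returns a Series; no match or missing text becomes "其它"
education = attrs.str.extract(pattern, expand=False).fillna("其它")
print(education.tolist())
```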

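# Keep the analysis columns by position (depends on the column order of the MySQL table);
# .copy() avoids SettingWithCopyWarning when columns are added below
final_data = data.iloc[:, [0, 2, 6, 7, 8, 9, 10, 11]].copy()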


# Welfare index: number of whitespace-separated benefits listed in jobwelf
final_data.loc[:, 'treatment_score'] = final_data.jobwelf.apply(
    lambda x: len(x.split()) if isinstance(x, str) else 0)
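The welfare index simply counts benefit tags. A vectorized sketch of the same count with the .str accessor, on made-up jobwelf values, treats a missing value as zero benefits:

```python
import pandas as pd

# Made-up jobwelf values: benefit tags separated by spaces
jobwelf = pd.Series(["五险一金 带薪年假 餐饮补贴", "五险一金", "", None])

# str.split() with no argument splits on any whitespace; .str.len() counts the tags
treatment_score = jobwelf.str.split().str.len().fillna(0).astype(int)
print(treatment_score.tolist())
```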

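# provinces is a local module (not shown). PROVINCES is a list of provinces, each with a
# 'name' and a 'city' list whose entries have a 'name' and 'districtAndCounty'.
from provinces import PROVINCES

# Province: match the city against each province's cities and their counties/districts
def find_province(x):
    if not x:  # an empty string would match every name
        return None
    for p in PROVINCES:  # provinces
        for c in p.get('city') or []:
            if x in c.get('name', '') or x in (c.get('districtAndCounty') or []):
                return p.get('name')
    return None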

final_data.loc[:,'provinces'] = final_data.city.apply(find_province)
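find_province() scans every province and city for each row. An alternative sketch builds a dictionary once and then does constant-time lookups. PROVINCES here is a tiny hypothetical sample in the structure the code above implies, with districtAndCounty assumed to be a list; the real provinces module is not shown. Unlike the substring test in find_province(), this matches exact names, plus each city name with a trailing 市 removed so that '深圳' finds '深圳市'.

```python
# Hypothetical sample in the structure find_province() expects; not the real module
PROVINCES = [
    {"name": "广东省", "city": [
        {"name": "深圳市", "districtAndCounty": ["南山区", "福田区"]},
        {"name": "广州市", "districtAndCounty": ["天河区", "越秀区"]},
    ]},
    {"name": "上海市", "city": [
        {"name": "上海市", "districtAndCounty": ["浦东新区", "徐汇区"]},
    ]},
]


def build_province_index(provinces):
    """Map city names, city names without a trailing 市, and districts to their province."""
    index = {}
    for p in provinces:
        for c in p.get("city") or []:
            name = c.get("name", "")
            short = name[:-1] if name.endswith("市") else name
            for key in (name, short, *(c.get("districtAndCounty") or [])):
                # setdefault keeps the first province found, as the loop in find_province() does
                index.setdefault(key, p["name"])
    return index


index = build_province_index(PROVINCES)
print(index.get("深圳"), index.get("浦东新区"), index.get("北京"))
```

With the index built, the column could be filled with final_data.city.map(index), which leaves unmatched cities as NaN.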

final_data.to_excel('job_data_shichang.xlsx')

