逗号分隔值(Comma-Separated Values,CSV,有时也称为字符分隔值,因为分隔字符也可以不是逗号),其文件以纯文本形式存储表格数据(数字和文本)。
对于这种格式的数据,我们需要利用open函数来读取文件并根据逗号分隔的特点来进行处理。
股票代码,股票名称,当前价,涨跌额,涨跌幅,年初至今 SH601778,N晶科,6.29,+1.92,-43.94%,+43.94% SH688566,吉贝尔,52.66,+6.96,+15.23%,+122.29% ...
练习题案例:下载文档中的所有图片且以用户名为图片名称存储。
ID,用户名,头像 26044585,Hush,https://hbimg.huabanimg.com/51d46dc32abe7ac7f83b94c67bb88cacc46869954f478-aP4Q3V 19318369,柒十一,https://hbimg.huabanimg.com/703fdb063bdc37b11033ef794f9b3a7adfa01fd21a6d1-wTFbnO 15529690,Law344,https://hbimg.huabanimg.com/b438d8c61ed2abf50ca94e00f257ca7a223e3b364b471-xrzoQd 18311394,Jennah·,https://hbimg.huabanimg.com/4edba1ed6a71797f52355aa1de5af961b85bf824cb71-px1nZz 18009711,可洛爱画画,https://hbimg.huabanimg.com/03331ef39b5c7687f5cc47dbcbafd974403c962ae88ce-Co8AUI 30574436,花姑凉~,https://hbimg.huabanimg.com/2f5b657edb9497ff8c41132e18000edb082d158c2404-8rYHbw 17740339,小巫師,https://hbimg.huabanimg.com/dbc6fd49f1915545cc42c1a1492a418dbaebd2c21bb9-9aDqgl 18741964,桐末tonmo,https://hbimg.huabanimg.com/b60cee303f62aaa592292f45a1ed8d5be9873b2ed5c-gAJehO 30535005,TANGZHIQI,https://hbimg.huabanimg.com/bbd08ee168d54665bf9b07899a5c4a4d6bc1eb8af77a4-8Gz3K1 31078743,你的老杨,https://hbimg.huabanimg.com/c46fbc3c9a01db37b8e786cbd7174bbd475e4cda220f4-F1u7MX 25519376,尺尺寸,https://hbimg.huabanimg.com/ee29ee198efb98f970e3dc2b24c40d89bfb6f911126b6-KGvKes 21113978,C-CLong,https://hbimg.huabanimg.com/7fa6b2a0d570e67246b34840a87d57c16a875dba9100-SXsSeY 24674102,szaa,https://hbimg.huabanimg.com/0716687b0df93e8c3a8e0925b6d2e4135449cd27597c4-gWdv24 30508507,爱起床的小灰灰,https://hbimg.huabanimg.com/4eafdbfa21b2f300a7becd8863f948e5e92ef789b5a5-1ozTKq 12593664,yokozen,https://hbimg.huabanimg.com/cd07bbaf052b752ed5c287602404ea719d7dd8161321b-cJtHss 16899164,一阵疯,https://hbimg.huabanimg.com/0940b557b28892658c3bcaf52f5ba8dc8402100e130b2-G966Uz 847937,卩丬My㊊伴er彎,https://hbimg.huabanimg.com/e2d6bb5bc8498c6f607492a8f96164aa2366b104e7a-kWaH68 31010628,慢慢即漫漫,https://hbimg.huabanimg.com/c4fb6718907a22f202e8dd14d52f0c369685e59cfea7-82FdsK 13438168,海贼玩跑跑,https://hbimg.huabanimg.com/1edae3ce6fe0f6e95b67b4f8b57c4cebf19c501b397e-BXwiW6 28593155,源稚生,https://hbimg.huabanimg.com/626cfd89ca4c10e6f875f3dfe1005331e4c0fd7fd429-9SeJeQ 28201821,合伙哼哼,https://hbimg.huabanimg.com/f59d4780531aa1892b80e0ec94d4ec78dcba08ff18c416-769X6a 28255146,漫步AAA,https://hbimg.huabanimg.com/3c034c520594e38353a039d7e7a5fd5e74fb53eb1086-KnpLaL 30537613,配䦹,https://hbimg.huabanimg.com/efd81d22c1b1a2de77a0e0d8e853282b83b6bbc590fd-y3d4GJ 22665880,日后必火,https://hbimg.huabanimg.com/69f0f959979a4fada9e9e55f565989544be88164d2b-INWbaF 16748980,keer521521,https://hbimg.huabanimg.com/654953460733026a7ef6e101404055627ad51784a95c-B6OFs4 30536510,“西辞”,https://hbimg.huabanimg.com/61cfffca6b2507bf51a507e8319d68a8b8c3a96968f-6IvMSk 30986577,艺成背锅王,https://hbimg.huabanimg.com/c381ecc43d6c69758a86a30ebf72976906ae6c53291f9-9zroHF 26409800,CsysADk7,https://hbimg.huabanimg.com/bf1d22092c2070d68ade012c588f2e410caaab1f58051-ahlgLm 30469116,18啊全阿,https://hbimg.huabanimg.com/654953460733026a7ef6e101404055627ad51784a95c-B6OFs4 15514336,W/小哥,https://hbimg.huabanimg.com/a30f5967fc0acf81421dd49650397de63c105b9ead1c-nVRrNl 17473505,椿の花,https://hbimg.huabanimg.com/0e38d810e5a24f91ebb251fd3aaaed8bb37655b14844c-pgNJBP 19165177,っ思忆゜♪,https://hbimg.huabanimg.com/4815ea0e4905d0f3bb82a654b481811dadbfe5ce2673-vMVr0B 16059616,格林熊丶,https://hbimg.huabanimg.com/8760a2b08d87e6ed4b7a9715b1a668176dbf84fec5b-jx14tZ 30734152,sCWVkJDG,https://hbimg.huabanimg.com/f31a5305d1b8717bbfb897723f267d316e58e7b7dc40-GD3e22 24019677,虚无本心,https://hbimg.huabanimg.com/6fdfa9834abe362e978b517275b06e7f0d5926aa650-N1xCXE 16670283,Y-雨后天空,https://hbimg.huabanimg.com/a3bbb0045b536fc27a6d2effa64a0d43f9f5193c177f-I2vHaI 21512483,汤姆2,https://hbimg.huabanimg.com/98cc50a61a7cc9b49a8af754ffb26bd15764a82f1133-AkiU7D 16441049,笑潇啸逍小鱼,https://hbimg.huabanimg.com/ae8a70cd85aff3a8587ff6578d5cf7620f3691df13e46-lmrIi9 24795603,v,https://hbimg.huabanimg.com/a7183cc3a933aa129d7b3230bf1378fd8f5857846cc5-3tDtx3 29819152,妮玛士珍多,https://hbimg.huabanimg.com/ca4ecb573bf1ff0415c7a873d64470dedc465ea1213c6-RAkArS 19101282,陈勇敢❤,https://hbimg.huabanimg.com/ab6d04ebaff3176e3570139a65155856871241b58bc6-Qklj2E 28337572,爱意随风散,https://hbimg.huabanimg.com/117ad8b6eeda57a562ac6ab2861111a793ca3d1d5543-SjWlk2 17342758,幸运instant,https://hbimg.huabanimg.com/72b5f9042ec297ae57b83431123bc1c066cca90fa23-3MoJNj 18483372,Beau染,https://hbimg.huabanimg.com/077115cb622b1ff3907ec6932e1b575393d5aae720487-d1cdT9 22127102,栽花的小蜻蜓,https://hbimg.huabanimg.com/6c3cbf9f27e17898083186fc51985e43269018cc1e1df-QfOIBG 13802024,LoveHsu,https://hbimg.huabanimg.com/f720a15f8b49b86a7c1ee4951263a8dbecfe3e43d2d-GPEauV 22558931,白驹过隙丶梨花泪う,https://hbimg.huabanimg.com/e49e1341dfe5144da5c71bd15f1052ef07ba7a0e1296b-jfyfDJ 11762339,cojoy,https://hbimg.huabanimg.com/5b27f876d5d391e7c4889bc5e8ba214419eb72b56822-83gYmB 30711623,雪碧学长呀,https://hbimg.huabanimg.com/2c288a1535048b05537ba523b3fc9eacc1e81273212d1-nr8M4t 18906718,西霸王,https://hbimg.huabanimg.com/7b02ad5e01bd8c0a29817e362814666a7800831c154a6-AvBDaG 31037856,邵阳的小哥哥,https://hbimg.huabanimg.com/654953460733026a7ef6e101404055627ad51784a95c-B6OFs4 26830711,稳健谭,https://hbimg.huabanimg.com/51547ade3f0aef134e8d268cfd4ad61110925aefec8a-NKPEYX
import os import requests with open('files/mv.csv', mode='r', encoding='utf-8') as file_object: file_object.readline() for line in file_object: user_id, username, url = line.strip().split(',') print(username, url) # 1.根据URL下载图片 res = requests.get( url=url, headers={ "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36" } ) # 检查images目录是否存在?不存在,则创建images目录 if not os.path.exists("images"): # 创建images目录 os.makedirs("images") # 2.将图片的内容写入到文件 with open("images/{}.png".format(username), mode='wb') as img_object: img_object.write(res.content)
ini文件是Initialization File的缩写,平时用于存储软件的的配置文件。例如:MySQL数据库的配置文件。
[mysqld] datadir=/var/lib/mysql socket=/var/lib/mysql/mysql.sock log-bin=py-mysql-bin character-set-server=utf8 collation-server=utf8_general_ci log-error=/var/log/mysqld.log # Disabling symbolic-links is recommended to prevent assorted security risks symbolic-links=0 [mysqld_safe] log-error=/var/log/mariadb/mariadb.log pid-file=/var/run/mariadb/mariadb.pid [client] default-character-set=utf8
这种格式是可以直接使用open来出来,考虑到自己处理比较麻烦,所以Python为我们提供了更为方便的方式。
import configparser config = configparser.ConfigParser() config.read('files/my.ini', encoding='utf-8') # config.read('/Users/wupeiqi/PycharmProjects/luffyCourse/day09/files/my.ini', encoding='utf-8') # 1.获取所有的节点 """ result = config.sections() print(result) # ['mysqld', 'mysqld_safe', 'client'] """ # 2.获取节点下的键值 """ result = config.items("mysqld_safe") print(result) # [('log-error', '/var/log/mariadb/mariadb.log'), ('pid-file', '/var/run/mariadb/mariadb.pid')] for key, value in config.items("mysqld_safe"): print(key, value) """ # 3.获取某个节点下的键对应的值 """ result = config.get("mysqld","collation-server") print(result) """ # 4.其他 # 4.1 是否存在节点 # v1 = config.has_section("client") # print(v1) # 4.2 添加一个节点 # config.add_section("group") # config.set('group','name','wupeiqi') # config.set('client','name','wupeiqi') # config.write(open('files/new.ini', mode='w', encoding='utf-8')) # 4.3 删除 # config.remove_section('client') # config.remove_option("mysqld", "datadir") # config.write(open('files/new.ini', mode='w', encoding='utf-8'))
读取所有节点
import configparser config = configparser.ConfigParser() config.read('/Users/wupeiqi/PycharmProjects/luffyCourse/day09/files/my.conf', encoding='utf-8') # config.read('my.conf', encoding='utf-8') ret = config.sections() print(ret) >>输出 ['mysqld', 'mysqld_safe', 'client']
读取节点下的键值
import configparser config = configparser.ConfigParser() config.read('/Users/wupeiqi/PycharmProjects/luffyCourse/day09/files/my.conf', encoding='utf-8') # config.read('my.conf', encoding='utf-8') item_list = config.items("mysqld_safe") print(item_list) >>输出 [('log-error', '/var/log/mariadb/mariadb.log'), ('pid-file', '/var/run/mariadb/mariadb.pid')]
读取节点下值(根据 节点+键 )
import configparser config = configparser.ConfigParser() config.read('/Users/wupeiqi/PycharmProjects/luffyCourse/day09/files/my.conf', encoding='utf-8') value = config.get('mysqld', 'log-bin') print(value) >>输出 py-mysql-bin
检查、删除、添加节点
import configparser config = configparser.ConfigParser() config.read('/Users/wupeiqi/PycharmProjects/luffyCourse/day09/files/my.conf', encoding='utf-8') # config.read('my.conf', encoding='utf-8') # 检查 has_sec = config.has_section('mysqld') print(has_sec) # 添加节点 config.add_section("SEC_1") # 节点中设置键值 config.set('SEC_1', 'k10', "123") config.set('SEC_1', 'name', "哈哈哈哈哈") config.add_section("SEC_2") config.set('SEC_2', 'k10', "123") # 内容写入新文件 config.write(open('/Users/wupeiqi/PycharmProjects/luffyCourse/day09/files/xxoo.conf', 'w')) # 删除节点 config.remove_section("SEC_2") # 删除节点中的键值 config.remove_option('SEC_1', 'k10') config.write(open('/Users/wupeiqi/PycharmProjects/luffyCourse/day09/files/new.conf', 'w'))
可扩展标记语言,是一种简单的数据存储语言,XML 被设计用来传输和存储数据。
<data> <country name="Liechtenstein"> <rank updated="yes">2</rank> <year>2023</year> <gdppc>141100</gdppc> <neighbor direction="E" name="Austria" /> <neighbor direction="W" name="Switzerland" /> </country> <country name="Singapore"> <rank updated="yes">5</rank> <year>2026</year> <gdppc>59900</gdppc> <neighbor direction="N" name="Malaysia" /> </country> <country name="Panama"> <rank updated="yes">69</rank> <year>2026</year> <gdppc>13600</gdppc> <neighbor direction="W" name="Costa Rica" /> <neighbor direction="E" name="Colombia" /> </country> </data>
注意:在Python开发中用的相对来比较少,大家作为了解即可(后期课程在讲解微信支付、微信公众号消息处理 时会用到基于xml传输数据)。
例如:https://developers.weixin.qq.com/doc/offiaccount/Message_Management/Receiving_standard_messages.html
from xml.etree import ElementTree as ET # ET去打开xml文件 tree = ET.parse("files/xo.xml") # 获取根标签 root = tree.getroot() print(root) # <Element 'data' at 0x7f94e02763b0>
from xml.etree import ElementTree as ET content = """ <data> <country name="Liechtenstein"> <rank updated="yes">2</rank> <year>2023</year> <gdppc>141100</gdppc> <neighbor direction="E" name="Austria" /> <neighbor direction="W" name="Switzerland" /> </country> <country name="Panama"> <rank updated="yes">69</rank> <year>2026</year> <gdppc>13600</gdppc> <neighbor direction="W" name="Costa Rica" /> <neighbor direction="E" name="Colombia" /> </country> </data> """ root = ET.XML(content) print(root) # <Element 'data' at 0x7fdaa019cea0>
from xml.etree import ElementTree as ET content = """ <data> <country name="Liechtenstein" id="999" > <rank>2</rank> <year>2023</year> <gdppc>141100</gdppc> <neighbor direction="E" name="Austria" /> <neighbor direction="W" name="Switzerland" /> </country> <country name="Panama"> <rank>69</rank> <year>2026</year> <gdppc>13600</gdppc> <neighbor direction="W" name="Costa Rica" /> <neighbor direction="E" name="Colombia" /> </country> </data> """ # 获取根标签 data root = ET.XML(content) country_object = root.find("country") print(country_object.tag, country_object.attrib) gdppc_object = country_object.find("gdppc") print(gdppc_object.tag,gdppc_object.attrib,gdppc_object.text)
from xml.etree import ElementTree as ET content = """ <data> <country name="Liechtenstein"> <rank>2</rank> <year>2023</year> <gdppc>141100</gdppc> <neighbor direction="E" name="Austria" /> <neighbor direction="W" name="Switzerland" /> </country> <country name="Panama"> <rank>69</rank> <year>2026</year> <gdppc>13600</gdppc> <neighbor direction="W" name="Costa Rica" /> <neighbor direction="E" name="Colombia" /> </country> </data> """ # 获取根标签 data root = ET.XML(content) # 获取data标签的孩子标签 for child in root: # child.tag = conntry # child.attrib = {"name":"Liechtenstein"} print(child.tag, child.attrib) for node in child: print(node.tag, node.attrib, node.text)
from xml.etree import ElementTree as ET content = """ <data> <country name="Liechtenstein"> <rank>2</rank> <year>2023</year> <gdppc>141100</gdppc> <neighbor direction="E" name="Austria" /> <neighbor direction="W" name="Switzerland" /> </country> <country name="Panama"> <rank>69</rank> <year>2026</year> <gdppc>13600</gdppc> <neighbor direction="W" name="Costa Rica" /> <neighbor direction="E" name="Colombia" /> </country> </data> """ root = ET.XML(content) for child in root.iter('year'): print(child.tag, child.text)
from xml.etree import ElementTree as ET content = """ <data> <country name="Liechtenstein"> <rank>2</rank> <year>2023</year> <gdppc>141100</gdppc> <neighbor direction="E" name="Austria" /> <neighbor direction="W" name="Switzerland" /> </country> <country name="Panama"> <rank>69</rank> <year>2026</year> <gdppc>13600</gdppc> <neighbor direction="W" name="Costa Rica" /> <neighbor direction="E" name="Colombia" /> </country> </data> """ root = ET.XML(content) v1 = root.findall('country') print(v1) v2 = root.find('country').find('rank') print(v2.text)
from xml.etree import ElementTree as ET content = """ <data> <country name="Liechtenstein"> <rank>2</rank> <year>2023</year> <gdppc>141100</gdppc> <neighbor direction="E" name="Austria" /> <neighbor direction="W" name="Switzerland" /> </country> <country name="Panama"> <rank>69</rank> <year>2026</year> <gdppc>13600</gdppc> <neighbor direction="W" name="Costa Rica" /> <neighbor direction="E" name="Colombia" /> </country> </data> """ root = ET.XML(content) # 修改节点内容和属性 rank = root.find('country').find('rank') print(rank.text) rank.text = "999" rank.set('update', '2020-11-11') print(rank.text, rank.attrib) ############ 保存文件 ############ tree = ET.ElementTree(root) tree.write("new.xml", encoding='utf-8') # 删除节点 root.remove( root.find('country') ) print(root.findall('country')) ############ 保存文件 ############ tree = ET.ElementTree(root) tree.write("newnew.xml", encoding='utf-8')
<home> <son name="儿1"> <grandson name="儿11"></grandson> <grandson name="儿12"></grandson> </son> <son name="儿2"></son> </home>
from xml.etree import ElementTree as ET # 创建根标签 root = ET.Element("home") # 创建节点大儿子 son1 = ET.Element('son', {'name': '儿1'}) # 创建小儿子 son2 = ET.Element('son', {"name": '儿2'}) # 在大儿子中创建两个孙子 grandson1 = ET.Element('grandson', {'name': '儿11'}) grandson2 = ET.Element('grandson', {'name': '儿12'}) son1.append(grandson1) son1.append(grandson2) # 把儿子添加到根节点中 root.append(son1) root.append(son2) tree = ET.ElementTree(root) tree.write('oooo.xml', encoding='utf-8', short_empty_elements=False)
<user><![CDATA[你好呀]]</user>
from xml.etree import ElementTree as ET # 创建根节点 root = ET.Element("user") root.text = "<![CDATA[你好呀]]" et = ET.ElementTree(root) # 生成文档对象 et.write("test.xml", encoding="utf-8")
案例:
content = """<xml> <ToUserName><![CDATA[gh_7f083739789a]]></ToUserName> <FromUserName><![CDATA[oia2TjuEGTNoeX76QEjQNrcURxG8]]></FromUserName> <CreateTime>1395658920</CreateTime> <MsgType><![CDATA[event]]></MsgType> <Event><![CDATA[TEMPLATESENDJOBFINISH]]></Event> <MsgID>200163836</MsgID> <Status><![CDATA[success]]></Status> </xml>""" from xml.etree import ElementTree as ET info = {} root = ET.XML(content) for node in root: # print(node.tag,node.text) info[node.tag] = node.text print(info)