本文是为了科普和演示如何将VBA代码翻译成python调用的形式:
下面这段拆分Excel表的vba代码来自才哥的文章《Python对比VBA实现excel表格合并与拆分》,作者“两百斤的老涛”:
Sub 表格拆分() '屏幕刷新=false Application.ScreenUpdating = False Dim LastRow, LastCol As Long Dim Sh, Sht As Worksheet 'Sh指代当前活动页 Set Sh = ActiveSheet '当前活动页的最后一行 LastRow = Sh.Cells(Rows.Count, 1).End(xlUp).Row '当前活动页的最后一列 LastCol = Sh.Cells(1, Columns.Count).End(xlToLeft).Column '定义D为字典 Dim D As Object Set D = CreateObject("Scripting.Dictionary") Dim Col As Integer 'Col为要手动输入要拆分的列序数 Col = InputBox("输入用于分组的列序号!") '从第2行找到最后一行 For i = 2 To LastRow '查找这个要拆分行,看它在不在字典里 TempStr = CStr(Sh.Cells(i, Col)) '如果在字典里 If D.exists(TempStr) Then '将数据放到对应的页里 Set Sht = Worksheets(TempStr) '字典key值对应的项目值记录该页当前内容添加的行数,每次+1 D(TempStr) = D(TempStr) + 1 '下面一行可以注释掉了跟下面的重复了…… 'Sht.Cells(D(TempStr), 1) = Sh.Cells(i, 1) For j = 1 To LastCol Sht.Cells(D(TempStr), j) = Sh.Cells(i, j) Next Else '如果不在字典里,就添加一个新key D.Add TempStr, 1 'i = i - 1是让该行一会儿重新检索一遍就能进到if里了 i = i - 1 '在最后一页新加一页,页名就是TempStr Sheets.Add After:=Sheets(Sheets.Count) Sheets(Sheets.Count).Name = TempStr '下面一行也是可以注释掉的 'Sheets(Sheets.Count).Cells(1, 1) = Sh.Cells(1, 1) '把第一行标题行弄过去 For j = 1 To LastCol Sheets(Sheets.Count).Cells(1, j) = Sh.Cells(1, j) Next End If Next '激活初始页,视觉上保持不变 Sh.Activate 'RT,GDCDSZ MsgBox ("完成!") End Sub
下面我们将其转换为python代码来调用:
建立在已经打开Excel文件的前提下:
import win32com.client as win32 # 导入模块 import os excel_app = win32.gencache.EnsureDispatch('Excel.Application') filename = "数据源.xlsx" filename = os.path.abspath(filename) wb = excel_app.Workbooks.Open(filename)
Set Sh = ActiveSheet
等价于:
Sh = wb.ActiveSheet
对于下面这两行代码:
'当前活动页的最后一行 LastRow = Sh.Cells(Rows.Count, 1).End(xlUp).Row '当前活动页的最后一列 LastCol = Sh.Cells(1, Columns.Count).End(xlToLeft).Column
首先对于Rows和Columns可以通过顶级的’Excel.Application’对象来引用,而xlUp和xlToLeft两个常量值,我目前采用的方案是通过文档进行查阅,首先定位到vba文档的Range.End 属性,然后再点击 Direction 参数的数据类型:https://docs.microsoft.com/zh-cn/office/vba/api/excel.xldirection
于是我们翻译为:
LastRow = Sh.Cells(excel_app.Rows.Count, 1).End(-4162).Row LastCol = Sh.Cells(1, excel_app.Columns.Count).End(-4159).Column
专业的vba程序员都习惯用上面的方法获取数据的行数和列数,但一般情况下用我前面的UsedRange的方法就够了。
由于数据都直接读取到python环境中,我们直接使用python的字典,继续翻译剩下的循环部分:
D = {} Col = 2 excel_app.ScreenUpdating = False for i in range(2, LastRow+1): TempStr = Sh.Cells(i, Col).Value if TempStr in D: Sht = wb.Sheets(TempStr) D[TempStr] += 1 for j in range(1, LastCol+1): Sht.Cells(D[TempStr], j).Value = Sh.Cells(i, j).Value else: D[TempStr] = 1 excel_app.Sheets.Add(After=wb.Sheets(wb.Sheets.Count)) wb.Sheets(wb.Sheets.Count).Name = TempStr for j in range(1, LastCol+1): wb.Sheets(wb.Sheets.Count).Cells(1, j).Value = Sh.Cells(1, j).Value Sh.Activate() excel_app.ScreenUpdating = True
我再按照个人的习惯重新编写一下:
rows_dict = {} Col = 2 excel_app.ScreenUpdating = False for i in range(2, LastRow+1): k = Sh.Cells(i, Col).Value if k not in rows_dict: Sht = excel_app.Sheets.Add(After=wb.Sheets(wb.Sheets.Count)) Sht.Name = k Sht.Range(Sht.Cells(1, 1), Sht.Cells(1, LastCol)).Value = Sh.Range( Sh.Cells(1, 1), Sh.Cells(1, LastCol)).Value rows_dict[k] = 1 else: Sht = wb.Sheets(k) rows_dict[k] += 1 Sht.Range(Sht.Cells(rows_dict[k], 1), Sht.Cells( rows_dict[k], LastCol)).Value = Sh.Range(Sh.Cells(i, 1), Sh.Cells(i, LastCol)).Value Sh.Activate() excel_app.ScreenUpdating = True
最终完整代码:
import win32com.client as win32 # 导入模块 import os excel_app = win32.gencache.EnsureDispatch('Excel.Application') filename = "数据源.xlsx" filename = os.path.abspath(filename) wb = excel_app.Workbooks.Open(filename) Sh = wb.ActiveSheet LastRow = Sh.Cells(excel_app.Rows.Count, 1).End(-4162).Row LastCol = Sh.Cells(1, excel_app.Columns.Count).End(-4159).Column rows_dict = {} Col = 2 excel_app.ScreenUpdating = False for i in range(2, LastRow+1): k = Sh.Cells(i, Col).Value if k not in rows_dict: Sht = excel_app.Sheets.Add(After=wb.Sheets(wb.Sheets.Count)) Sht.Name = k Sht.Range(Sht.Cells(1, 1), Sht.Cells(1, LastCol)).Value = Sh.Range( Sh.Cells(1, 1), Sh.Cells(1, LastCol)).Value rows_dict[k] = 1 else: Sht = wb.Sheets(k) rows_dict[k] += 1 Sht.Range(Sht.Cells(rows_dict[k], 1), Sht.Cells( rows_dict[k], LastCol)).Value = Sh.Range(Sh.Cells(i, 1), Sh.Cells(i, LastCol)).Value Sh.Activate() excel_app.ScreenUpdating = True wb.SaveAs(os.path.abspath("result.xlsx")) wb.Close() excel_app.Quit()
经测试,原始vba代码在Excel环境中 运行耗时1秒以内,但运行以上python代码,耗时接近30秒。
这是因为,python通过vba读取Excel数据时,存在很频繁的交互,同时也说明并不是任何vba代码都适合用python来调用。对于大部分数据读写操作,用python自带的库会便捷很多,速度也会比vba快。
如果需要带格式拆分Excel表,使用以下vba代码即可:
Sub 带格式分列() Application.ScreenUpdating = False Set Sh = ActiveSheet max_rows = Sh.UsedRange.Rows.Count max_cols = Sh.UsedRange.Columns.Count Set Rng = Sh.Range(Sh.Cells(1, 1), Sh.Cells(max_rows, max_cols)) Rng.EntireColumn.AutoFit 'Col为要手动输入要拆分的列序数 Col = CInt(InputBox("输入用于分组的列序号!")) Range(Cells(2, Col), Cells(max_rows, Col)).AdvancedFilter Action:=xlFilterCopy, CopyToRange:=Cells(1, max_cols + 2), Unique:=True LastRow = Cells(1, max_cols + 2).End(xlDown).Row Range(Cells(1, max_cols + 2), Cells(LastRow, max_cols + 2)).RemoveDuplicates Columns:=1, Header:=xlNo LastRow = Cells(1, max_cols + 2).End(xlDown).Row For i = 1 To LastRow Name = CStr(Sh.Cells(i, max_cols + 2)) Sh.Activate Rng.AutoFilter Field:=Col, Criteria1:=Name Rng.Copy Set new_sheet = Sheets.Add(After:=Sheets(Sheets.Count)) new_sheet.Name = Name new_sheet.Range("A1").Activate new_sheet.Paste new_sheet.Range(new_sheet.Cells(1, 1), new_sheet.Cells(1, max_cols)).EntireColumn.AutoFit Next Sh.Activate Columns(max_cols + 2).Delete Shift:=xlToLeft Selection.AutoFilter Application.ScreenUpdating = True End Sub
更丰富的python实现请参考:https://blog.csdn.net/as604049322/article/details/118093775
深度剖析Excel表拆分的三项技术(已实现纯Openpyxl保留全部样式拆分,自适应单文件和多文件拆分等)