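Summary: To prepare a data set for the SOFM (self-organizing feature map) exercises in the artificial neural networks course, six sets of 5×7 dot-matrix character images were collected from the web. A program converts them into 0-1 codes for use in the course assignment.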
Keywords: ASCII, character dot matrix
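This post prepares the data for the second assignment (on competitive networks) of the 2021 artificial neural networks course. Last year's assignment used the three characters from the lecture slides as the SOM data set. This year the plan is to switch to a different data set.
▲ Figure 1.1 Data set used in the 2020 assignment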
Self-Organizing Maps and Applications
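The initial plan was to use a 7×9 dot-matrix font from the web and select the letters G, H, I, N, O, Q, U, Z, the eight distinct letters in ZHUOQING, in two different fonts. Two further groups of characters would then be generated from these two fonts at a Hamming distance of 2 from the originals, and all of them would be clustered.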
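Generating a variant at Hamming distance 2 amounts to inverting two distinct dots of a pattern. The following is a minimal sketch of that idea, not the author's procedure: the helper name `flip_bits` and the seeded generator are assumptions made for illustration, and the example uses the 35-bit letter H of Font 1 from the 5×7 data in this post rather than a 7×9 pattern (which would simply be a 63-bit string).

```python
import random

def flip_bits(bits, n_flips=2, rng=None):
    """Return a copy of a 0-1 string with exactly n_flips distinct positions inverted."""
    rng = rng or random.Random()
    chars = list(bits)
    # sample() picks distinct positions, so the Hamming distance equals n_flips
    for pos in rng.sample(range(len(chars)), n_flips):
        chars[pos] = '1' if chars[pos] == '0' else '0'
    return ''.join(chars)

# Letter H of Font 1, taken from the 5x7 data in this post (row scan, 35 bits).
H = "10001100011000111111100011000110001"

# Hypothetical seed, used only to make the example repeatable.
noisy_h = flip_bits(H, n_flips=2, rng=random.Random(2021))
distance = sum(a != b for a, b in zip(H, noisy_h))
print(noisy_h, distance)
```

Because the flipped positions are distinct, `distance` is 2 by construction, whichever positions the generator picks.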
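GHINOQUZ would serve as the training samples. Among them, H and N, and O and Q, are the hardest to tell apart, because the Hamming distance within each pair is small.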
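To make the notion of closeness concrete, the Hamming distance between two codes is the number of positions where they differ. The sketch below assumes the 35-bit row-scan codes of Font 1 from the 5×7 data in this post (the 7×9 font discussed here is not reproduced in the post), and `hamming` is a helper name chosen for illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length 0-1 strings differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))

# Font 1 codes for H, N, O and Q, copied from the 5x7 data in this post.
FONT1 = {
    'H': "10001100011000111111100011000110001",
    'N': "10001100011100110101100111000110001",
    'O': "01110100011000110001100011000101110",
    'Q': "01110100011000110001101011001101111",
}

for first, second in [('H', 'N'), ('O', 'Q'), ('H', 'O')]:
    print(first, second, hamming(FONT1[first], FONT1[second]))
```

In this 5×7 font, H and N differ in 4 of 35 dots and O and Q in 3, while H and O differ in 13, which illustrates why the first two pairs are the difficult ones for a clustering network.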
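A web search, however, showed that 5×7 dot-matrix character sets are relatively plentiful online.
Six ASCII dot-matrix fonts were collected and are shown below.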
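▲ Figure 1.1.1 5×7 dot-matrix font
▲ Figure 1.1.2 5×7 dot-matrix font
▲ Figure 1.1.3 5×7 dot-matrix font
▲ Figure 1.1.4 5×7 dot-matrix font
▲ Figure 1.1.5 5×7 dot-matrix font
▲ Figure 1.1.6 5×7 dot-matrix font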
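The dot-matrix templates obtained above are all images, so they need to be converted into 0-1 strings in row-scan order. Each character is encoded as a 0-1 string of length 35 (7 rows × 5 columns).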
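The row-scan layout can be illustrated by cutting a 35-character code back into 7 rows of 5 dots. This is a small sketch with helper names (`to_rows`, `render`) chosen for illustration. The example code is the letter A of Font 1 from the data in this post, and the `.`/`#` rendering mirrors what the conversion script prints.

```python
def to_rows(bits, rows=7, cols=5):
    """Split a row-scan 0-1 string into a list of `rows` strings of `cols` dots."""
    if len(bits) != rows * cols:
        raise ValueError("expected %d bits, got %d" % (rows * cols, len(bits)))
    return [bits[r * cols:(r + 1) * cols] for r in range(rows)]

def render(bits, rows=7, cols=5):
    """Draw the pattern with '#' for 1 (dark dot) and '.' for 0 (background)."""
    return '\n'.join(row.replace('0', '.').replace('1', '#')
                     for row in to_rows(bits, rows, cols))

A = "01110100011000110001111111000110001"  # letter A, Font 1
print(render(A))
```

Joining the rows again with `''.join(to_rows(A))` gives back the original string, so the two representations are interchangeable.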
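First, use an image editor to make every image dark foreground on a light background. If an original image has the opposite polarity, invert its colors.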
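The post does this step by hand in an image editor. As a rough sketch of the same idea in code, the function below inverts a grayscale image only when it looks light-on-dark. The polarity test (mean brightness below 128) is an assumption that holds when the background covers more area than the strokes, and `ensure_dark_on_light` is a hypothetical helper name. For real image files, Pillow's `ImageOps.invert` performs the inversion itself.

```python
def ensure_dark_on_light(gray, mid=128):
    """Return a grayscale image (list of rows of 0-255 values) as dark-on-light.

    Assumption: the background occupies most of the image, so a low mean
    brightness means the background is dark and the image must be inverted.
    """
    flat = [value for row in gray for value in row]
    mean = sum(flat) / len(flat)
    if mean >= mid:
        return [row[:] for row in gray]                  # already dark on light
    return [[255 - value for value in row] for row in gray]

# A single light dot on a dark background gets inverted.
light_on_dark = [[0, 0, 0],
                 [0, 255, 0],
                 [0, 0, 0]]
print(ensure_dark_on_light(light_on_dark))
```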
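▲ Figure 1.2.1 Converting the image to a dark foreground on a light background
In the TEASOFT software, mark the bounding box of each character in the dot-matrix image, in character order.
▲ Figure 1.2.2 Marking the character boundaries in order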
#!/usr/local/bin/python
# -*- coding: gbk -*-
#============================================================
# ASCIIDOT.PY                  -- by Dr. ZhuoQing 2021-10-31
#
# Note:
#============================================================

# The names printf, array, shape, tspgetdopfile and tspgetrange
# all come from this star import of the author's headm module.
from headm import *
from PIL import Image

# boxid[0] is the box of the whole picture; the remaining 26 are the character boxes.
boxid = [2, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]

picfile = tspgetdopfile(boxid[0])
picrange = tspgetrange(boxid[0])

#printf(picrange)
printf(picfile)

#------------------------------------------------------------
IMAGE_ROW = 7
IMAGE_COL = 5
PIXEL_THRESHOLD = 230

def image2Density(size, imagePixels):
    global boxid

    imageSize = size
    imageWidth = imageSize[0]
    imageHeight = imageSize[1]

    # Scale factors from box coordinates to image pixel coordinates
    picwidth = picrange[2] - picrange[0]
    picheight = picrange[3] - picrange[1]
    widthRatio = imageWidth / picwidth
    heightRatio = imageHeight / picheight

    asciidim = []

    for box in boxid[1:]:
        boxrange = tspgetrange(box)
        # Character box relative to the top-left corner of the picture
        boxpos = [boxrange[0] - picrange[0],
                  boxrange[1] - picrange[1],
                  boxrange[2] - picrange[0],
                  boxrange[3] - picrange[1]]

        boxheight = boxrange[3] - boxrange[1]
        boxwidth = boxrange[2] - boxrange[0]

        asciistr = ''

        for i in range(IMAGE_ROW):
            startRow = int((i * boxheight / IMAGE_ROW + boxpos[1]) * heightRatio)
            endRow = int(((i+1) * boxheight / IMAGE_ROW + boxpos[1]) * heightRatio)

            col = []

            # Column index renamed from i to j so it no longer shadows the row index
            for j in range(IMAGE_COL):
                startCol = int((j * boxwidth / IMAGE_COL + boxpos[0]) * widthRatio)
                endCol = int(((j+1) * boxwidth / IMAGE_COL + boxpos[0]) * widthRatio)

                # NOTE: the loops below sum (endRow - startRow) * (endCol - startCol)
                # pixels, so the -1 terms make the average slightly too high.
                # Kept as in the original so the published codes stay reproducible.
                pixelNum = (endRow - startRow - 1) * (endCol - startCol - 1)
                pixelSigma = 0

                for ii in range(startRow, endRow):
                    for jj in range(startCol, endCol):
                        pixelSigma += sum(imagePixels[ii, jj])

                pixelSigma = int(pixelSigma / (pixelNum))
                #printf(pixelSigma)

                # Light cell -> background (0), dark cell -> dot (1)
                if pixelSigma > PIXEL_THRESHOLD:
                    col.append('0')
                else:
                    col.append('1')

            str01 = ''.join(col)
            printf(str01.replace('0','.').replace('1','#'))
            asciistr = asciistr + str01

        printf(' ')
        asciidim.append(asciistr)

    return asciidim

#------------------------------------------------------------
img = Image.open(picfile)
r,g,b = img.split()
img = Image.merge("RGB", (r,g,b)).getdata()
#plt.imshow(img)
#plt.show()

size = img.size
print(size)

# Average the three channels and reshape to (height, width)
img = array(img).sum(axis=1)/3
imgdata = img.reshape(size[1], size[0])

#imgaverage = imgdata.sum(axis=0)
#printf(shape(imgaverage))
#plt.plot(imgaverage)
#plt.xlabel("x")
#plt.ylabel("y")
#plt.grid(True)
#plt.tight_layout()
#plt.show()

printf(shape(imgdata))
result = image2Density(size, imgdata)
for s in result:
    printf(s)

#------------------------------------------------------------
#        END OF FILE : ASCIIDOT.PY
#============================================================
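The script above depends on the author's `headm` module and on box coordinates read from TEASOFT, so it cannot be run elsewhere as it stands. The following is a self-contained sketch of its core step under simplifying assumptions: the grayscale image is a plain list of rows, the character box is already given in pixel coordinates, and each cell's mean is taken over exactly the pixels it contains. The name `box_to_bits` and the synthetic test image are made up for illustration. The threshold of 230 is the script's `PIXEL_THRESHOLD`.

```python
def box_to_bits(gray, box, rows=7, cols=5, threshold=230):
    """Sample one character box on a rows x cols grid and threshold each cell.

    gray: list of rows of grayscale values (0 = black, 255 = white)
    box:  (left, top, right, bottom) in pixel coordinates, right/bottom exclusive
    Returns a row-scan 0-1 string, '1' for dark cells.
    """
    left, top, right, bottom = box
    width, height = right - left, bottom - top
    if width < cols or height < rows:
        raise ValueError("box is smaller than the sampling grid")
    bits = []
    for r in range(rows):
        y0 = top + r * height // rows
        y1 = top + (r + 1) * height // rows
        for c in range(cols):
            x0 = left + c * width // cols
            x1 = left + (c + 1) * width // cols
            cell = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            mean = sum(cell) / len(cell)
            bits.append('0' if mean > threshold else '1')
    return ''.join(bits)

# Synthetic image: the letter A of Font 1 drawn with 4x4-pixel dots.
A = "01110100011000110001111111000110001"
SCALE = 4
gray = [[0 if A[(y // SCALE) * 5 + x // SCALE] == '1' else 255
         for x in range(5 * SCALE)]
        for y in range(7 * SCALE)]

print(box_to_bits(gray, (0, 0, 5 * SCALE, 7 * SCALE)) == A)
```

A high threshold such as 230 treats a cell as background only when it is almost entirely white, which helps when the dots of the source image do not fill their cells completely.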
▲ Figure 1.2.3 Font 1
01110100011000110001111111000110001 11110100011000111110100011000111110 01110100011000010000100001000101110 11110100011000110001100011000111110 11111100001000011110100001000011111 11111100001000011110100001000010000 01110100011000010000101111000101110 10001100011000111111100011000110001 01110001000010000100001000010001110 00001000010000100001100011000101110 10001100101010011000101001001010001 10000100001000010000100001000011111 10001110111010110001100011000110001 10001100011100110101100111000110001 01110100011000110001100011000101110 11110100011000110001111101000010000 01110100011000110001101011001101111 11110100011000110001111101000110001 01110100011000001110000011000101110 11111001000010000100001000010000100 10001100011000110001100011000101110 10001100011000110001100010101000100 10001100011000110001101011101110001 10001100010101000100010101000110001 10001100011000101010001000010000100 11111000010001000100010001000011111
▲ Figure 1.2.4 Font 2
00111010011000110001111111000110001 11110100011000111110100011000111110 01110100011000010000100001000101110 11110100011000110001100011000111110 11111100001000011110100001000011111 11111100001000011110100001000010000 01110100011000010000100111000101110 10001100011000111111100011000110001 01110001000010000100001000010001110 01111000010000100001000011000101110 10001100101010011000101001001010001 10000100001000010000100001000011111 10001110111010110101100011000110001 10001100011100110101100111000110001 01110100011000110001100011000101110 11110100011000111110100001000010000 01110100011000110001100010111000001 11110100011000111110101001001010001 01110100011000001110000011000101110 11111001000010000100001000010000100 10001100011000110001100011000101110 10001100011000110001100010101000100 10001100011000110001101011010101010 10001100010101000100010101000110001 10001100011000101111000011000101110 11111000010001000100010001000011111
▲ Figure 1.2.5 Font 3
01110010101101110001111111000110001 11110010010100101110010010100111110 01110100011000010000100001000101110 11110010100101101001010110101011110 11111100001000011110100001000011111 11111100001000011110100001000010000 01110100011000010111100011000101110 10001100011000111111100011000110001 01110001000010000100001000010001110 00111000110001000010110101101001110 10001100101010011000101001001010001 10000100001000010000100001000011111 10001110111111110101100011000110001 10001110011110110101101111001110001 01110100011000110001100011000101110 11110100011000111110100001000010000 01110100011000110001101011001001101 11110100011000111110100101001010001 01110100011000001110000011000101110 11111001000010000100001000010000100 10001100011000110001100011000101110 10001100011000111011010100111000100 10001100011010110101111111101111011 10001110110101000100010101101110001 10001110110101001110001000010000100 11110001100010000100010000100011110
▲ Figure 1.2.6 Font 4
00111010011000110001111111000110001 11110100011000111110100011000111110 01110100011000010000100001000101110 11110100011000110001100011000111110 11111100001000011110100001000011111 11111100001000011110100001000010000 01110100011000010000100111000101110 10001100011000111111100011000110001 01110001000010000100001000010001110 00111000010000100001000010100100110 10001100101010011000101001001010001 10000100001000010000100001000011111 10001110111010110101100011000110001 10001100011100110101100111000110001 01110100011000110001100011000101110 11110100011000111110100001000010000 01110100011000110001100010111000001 11110100011000111110101001001010001 01110100011000001110000011000101110 11111001000010000100001000010000100 10001100011000110001100011000101110 10001100011000110001100010101000100 10001100011000110001101011010101010 10001100010101000100010101000110001 10001100011000101111000011100101110 11111000010001000100010001000011111
▲ Figure 1.2.7 Font 5
00100010101000110001111111000110001 11110010010100101110010010100111110 01110100011000010000100001000101110 11110010010100101001010010100111110 11111100001000011110100001000011111 11111100001000011110100001000010000 01110100011000010011100011000101111 10001100011000111111100011000110001 01110001000010000100001000010001110 00111000100001000010000101001001100 10001100101010011000101001001010001 10000100001000010000100001000011111 10001110111010110101100011000110001 10001100011100110101100111000110001 01110100011000110001100011000101110 11110100011000111110100001000010000 01110100011000110001101011001001101 11110100011000111110101001001010001 01110100011000001110000011000101110 11111001000010000100001000010000100 10001100011000110001100011000101110 10001100011000110001100010101000100 10001100011000110101101011010101010 10001100010101000100010101000110001 10001100011000101010001000010000100 11111000010001000100010001000011111
▲ Figure 1.2.8 Font 6
01110100011000111111100011000110001 11110100011000111110100011000111110 01110100011000010000100001000101110 11110100011000110001100011000111110 11111100001000011110100001000011111 11111100001000011110100001000010000 01110100011000010000100111000101110 10001100011000111111100011000110001 01110001000010000100001000010001110 01111000010000100001000011000101110 10001100101010011000101001001010001 10000100001000010000100001000011111 10001110111010110001100011000110001 10001100011100110101100111000110001 01110100011000110001100011000101110 11110100011000111110100001000010000 01110100011000110001101011001101111 11110100011000111110101001001010001 01110100011000001110000011000101110 11111001000010000100001000010000100 10001100011000110001100011000101110 10001100011000110001100010101000100 10001100011000110001101011010101010 10001100011000101110100011000110001 10001100011000101110001000010000100 11111000010001000100010001000011111
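Each font above is stored as a single line of 26 space-separated codes, which appear to run from A to Z (the first code draws an A and the last a Z). A minimal sketch of loading such a line for later use as SOM input follows. The helper names are chosen for illustration, and only the first three codes of Font 1 are used to keep the example short.

```python
import string

def parse_font_line(line, code_len=35):
    """Split one line of space-separated codes and validate each of them."""
    codes = line.split()
    for code in codes:
        if len(code) != code_len or set(code) - {'0', '1'}:
            raise ValueError("bad code: %r" % code)
    return codes

def label_codes(codes, letters=string.ascii_uppercase):
    """Pair codes with letters, assuming the codes are in A-Z order."""
    return dict(zip(letters, codes))

def to_vector(code):
    """Turn a 0-1 string into a list of ints, one input component per dot."""
    return [int(ch) for ch in code]

# First three codes of Font 1 (A, B, C); a full line would hold all 26.
sample_line = ("01110100011000110001111111000110001 "
               "11110100011000111110100011000111110 "
               "01110100011000010000100001000101110")

font = label_codes(parse_font_line(sample_line))
print(sorted(font), to_vector(font['A'])[:5])
```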
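Among the six converted fonts, some characters have similar codes in all of the fonts, for example C, V, and K. Other letters differ considerably between fonts, for example A, B, and Y.
▲ Figure 2.1 The six fonts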
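One way to quantify how much a letter varies between fonts is the largest pairwise Hamming distance among its codes. The sketch below uses only the codes of A and C from Fonts 1, 3, and 5 as listed above. The function name `max_spread` and the choice of the maximum as the measure are assumptions for illustration.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length 0-1 strings differ."""
    return sum(x != y for x, y in zip(a, b))

def max_spread(codes):
    """Largest Hamming distance between any two codes of the same letter."""
    return max(hamming(a, b) for a, b in combinations(codes, 2))

# Codes of A and C in Fonts 1, 3 and 5, copied from the data above.
SAMPLES = {
    'A': ["01110100011000110001111111000110001",
          "01110010101101110001111111000110001",
          "00100010101000110001111111000110001"],
    'C': ["01110100011000010000100001000101110",
          "01110100011000010000100001000101110",
          "01110100011000010000100001000101110"],
}

spread = {letter: max_spread(codes) for letter, codes in SAMPLES.items()}
print(spread)
```

For these three fonts the code of C is identical throughout (spread 0), while the codes of A differ in up to 6 dots, consistent with the observation above.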
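To prepare a data set for the SOFM exercises in the artificial neural networks course, six sets of 5×7 dot-matrix character images were collected from the web. A program converts them into 0-1 codes for use in the course assignment.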
#!/usr/local/bin/python
# -*- coding: gbk -*-
#============================================================
# ASCIIDOT.PY                  -- by Dr. ZhuoQing 2021-10-31
#
# Note:
#============================================================

# The names printf, array, shape, tspgetdopfile and tspgetrange
# all come from this star import of the author's headm module.
from headm import *
from PIL import Image

#boxid = [3, 10, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]
#boxid = [65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91]
#boxid = [6, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 103, 104, 105, 106, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119]
#boxid = [120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 147, 148]
boxid = [149, 150, 151, 152, 153, 154, 155, 156, 157, 173, 172, 171, 170, 174, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 175]

picfile = tspgetdopfile(boxid[0])
picrange = tspgetrange(boxid[0])

#printf(picrange)
printf(picfile)

#------------------------------------------------------------
IMAGE_ROW = 7
IMAGE_COL = 5
PIXEL_THRESHOLD = 230

def image2Density(size, imagePixels):
    global boxid

    imageSize = size
    imageWidth = imageSize[0]
    imageHeight = imageSize[1]

    # Scale factors from box coordinates to image pixel coordinates
    picwidth = picrange[2] - picrange[0]
    picheight = picrange[3] - picrange[1]
    widthRatio = imageWidth / picwidth
    heightRatio = imageHeight / picheight

    asciidim = []

    for box in boxid[1:]:
        boxrange = tspgetrange(box)
        # Character box relative to the top-left corner of the picture
        boxpos = [boxrange[0] - picrange[0],
                  boxrange[1] - picrange[1],
                  boxrange[2] - picrange[0],
                  boxrange[3] - picrange[1]]

        boxheight = boxrange[3] - boxrange[1]
        boxwidth = boxrange[2] - boxrange[0]

        asciistr = ''

        for i in range(IMAGE_ROW):
            startRow = int((i * boxheight / IMAGE_ROW + boxpos[1]) * heightRatio)
            endRow = int(((i+1) * boxheight / IMAGE_ROW + boxpos[1]) * heightRatio)

            col = []

            # Column index renamed from i to j so it no longer shadows the row index
            for j in range(IMAGE_COL):
                startCol = int((j * boxwidth / IMAGE_COL + boxpos[0]) * widthRatio)
                endCol = int(((j+1) * boxwidth / IMAGE_COL + boxpos[0]) * widthRatio)

                # NOTE: the loops below sum (endRow - startRow) * (endCol - startCol)
                # pixels, so the -1 terms make the average slightly too high.
                # Kept as in the original so the published codes stay reproducible.
                pixelNum = (endRow - startRow - 1) * (endCol - startCol - 1)
                pixelSigma = 0

                for ii in range(startRow, endRow):
                    for jj in range(startCol, endCol):
                        pixelSigma += sum(imagePixels[ii, jj])

                pixelSigma = int(pixelSigma / (pixelNum))
                #printf(pixelSigma)

                # Light cell -> background (0), dark cell -> dot (1)
                if pixelSigma > PIXEL_THRESHOLD:
                    col.append('0')
                else:
                    col.append('1')

            str01 = ''.join(col)
            printf(str01.replace('0','.').replace('1','#'))
            asciistr = asciistr + str01

        printf(' ')
        asciidim.append(asciistr)

    return asciidim

#------------------------------------------------------------
img = Image.open(picfile)
r,g,b = img.split()
img = Image.merge("RGB", (r,g,b)).getdata()
#plt.imshow(img)
#plt.show()

size = img.size
print(size)

# Average the three channels and reshape to (height, width)
img = array(img).sum(axis=1)/3
imgdata = img.reshape(size[1], size[0])

#imgaverage = imgdata.sum(axis=0)
#printf(shape(imgaverage))
#plt.plot(imgaverage)
#plt.xlabel("x")
#plt.ylabel("y")
#plt.grid(True)
#plt.tight_layout()
#plt.show()

printf(shape(imgdata))
result = image2Density(size, imgdata)
for s in result:
    printf(s)

#------------------------------------------------------------
#        END OF FILE : ASCIIDOT.PY
#============================================================