Encoding problems when sending Chinese text to Kafka from Python


In a project I needed to construct test data containing Chinese characters, in the following format:

{'userid': 0, 'ts': '2022-08-03 16:33:38.487973', 'user_name': '中国人'}

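After sending, the consumed messages all showed the Chinese text as Unicode escapes (\uXXXX), and specifying utf-8 made no difference. At first I suspected that the value_serializer of the Kafka producer was wrong, but the real cause turned out to be how json.dumps was called in my code: by default it escapes every non-ASCII character.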

# -*- coding: utf-8 -*-
import datetime
import json
import time

from kafka import KafkaProducer
from kafka.errors import KafkaError


producer = KafkaProducer(sasl_mechanism='PLAIN',
                        security_protocol='SASL_PLAINTEXT',
                        sasl_plain_username='xxxxx',
                        sasl_plain_password='xxxxxxxx',
                        bootstrap_servers=['xxxxxxxxxxx'],
                        # ensure_ascii=False makes json.dumps keep non-ASCII characters instead of \uXXXX escapes
                        value_serializer=lambda m: json.dumps(m, ensure_ascii=False).encode('utf-8'),
                        api_version=(2, 0, 0))

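# Stays None if no send succeeds, so the final prints cannot raise NameError
record_metadata = None

try:
    # send() is asynchronous; future.get() blocks until the broker responds
    for i in range(100):
        now_time = str(datetime.datetime.now())
        send_json = {
            "userid": i,
            "ts": now_time,
            "user_name": "中国人",
        }
        print(send_json)
        future = producer.send('xxxxxxxxxxx', send_json)

        try:
            record_metadata = future.get(timeout=2)
        except KafkaError:
            # Decide what to do if the produce request failed...
            print("send error!")
        time.sleep(1)

    # Metadata of the last successful send, if there was one
    if record_metadata is not None:
        print(record_metadata.partition)
        print(record_metadata.offset)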

finally:
    producer.close()

With this change, the message that used to arrive as {"userid": 1, "ts": "2022-08-03 16:12:26.595478", "user_name": "\u4e2d\u56fd\u4eba"} now arrives as {"userid": 1, "ts": "2022-08-03 16:33:39.576068", "user_name": "中国人"}
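The difference can be reproduced without a Kafka cluster. The following is a minimal sketch that only exercises json.dumps and str.encode; the variable names are illustrative and not taken from the project code.

```python
import json

record = {"userid": 1, "user_name": "中国人"}

# Default behaviour (ensure_ascii=True): non-ASCII characters become \uXXXX escapes
escaped = json.dumps(record)

# ensure_ascii=False keeps the characters as-is; encode() then produces UTF-8 bytes
readable = json.dumps(record, ensure_ascii=False)

escaped_bytes = escaped.encode("utf-8")
readable_bytes = readable.encode("utf-8")

print(escaped)
print(readable)
```

Both payloads are valid JSON and parse back to the same object, so a JSON-aware consumer sees no difference. The escaped form only looks wrong when the raw message text is inspected, which is what happened here.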

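Another mistake beginners often make

1. In Python, forcing the data into a string with str() produces keys and values wrapped in single quotes. Sending that to Kafka is unfriendly to downstream consumers: a consumer written in Java or Flink SQL, for example, may fail to parse it. Use the standard serializer json.dumps for the conversion instead.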
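A short sketch of why the str() form causes trouble downstream, assuming the consumer uses a strict JSON parser (here Python's own json.loads stands in for it):

```python
import json

record = {"userid": 0, "user_name": "中国人"}

as_repr = str(record)                              # Python repr: single quotes
as_json = json.dumps(record, ensure_ascii=False)   # standard JSON: double quotes

print(as_repr)
print(as_json)

# A strict JSON parser rejects the single-quoted form
try:
    json.loads(as_repr)
    repr_is_valid_json = True
except json.JSONDecodeError:
    repr_is_valid_json = False

print(repr_is_valid_json)
```

The json.dumps output parses back to the original dict, while the str() output is rejected.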
