For the Kafka source part, see the previous post: https://www.cnblogs.com/liufei1983/p/15801848.html
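1: First, download the two JAR files needed for JDBC and MySQL: the Flink JDBC connector and the MySQL JDBC driver. Pay attention to versions. My Flink is 1.13.1, so I use flink-connector-jdbc_2.11 at version 1.13.1 as well; a mismatched version causes errors.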
2: Create a table in MySQL:
-- `sql-demo`.cdn_access_statistic definition (run this in MySQL)
CREATE TABLE `cdn_access_statistic` (
  `province` varchar(100) DEFAULT NULL,
  `access_count` bigint DEFAULT NULL,
  `total_download` bigint DEFAULT NULL,
  `download_speed` bigint DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Create the sink job in Zeppelin. Because Zeppelin runs in Docker, the MySQL URL cannot use localhost; use the IP address of the host machine instead.
%flink.ssql
DROP TABLE IF EXISTS cdn_access_statistic;
-- Please create this mysql table first in your mysql instance. Flink won't create mysql table for you.
CREATE TABLE cdn_access_statistic (
  province VARCHAR,
  access_count BIGINT,
  total_download BIGINT,
  download_speed DOUBLE
) WITH (
  'connector.type' = 'jdbc',
  'connector.url' = 'jdbc:mysql://192.168.3.XXX:3306/sql-demo',
  'connector.table' = 'cdn_access_statistic',
  'connector.username' = 'sql-demo',
  'connector.password' = 'demo-sql',
  'connector.write.flush.interval' = '1s'
);
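The sink table above uses the legacy `connector.*` option keys. As a sketch only, the same table could also be declared with the newer JDBC connector option names that Flink 1.13 documents (`connector`, `url`, `table-name`, `username`, `password`, `sink.buffer-flush.interval`). I have not run this variant in the setup described here, so treat it as an assumption; it would go in a `%flink.ssql` paragraph, and the URL keeps the same masked host IP as above.

```sql
-- Sketch: the same sink table declared with the newer JDBC connector options.
-- Assumption: flink-connector-jdbc 1.13.1 and the MySQL driver are on the classpath.
DROP TABLE IF EXISTS cdn_access_statistic;

CREATE TABLE cdn_access_statistic (
  province VARCHAR,
  access_count BIGINT,
  total_download BIGINT,
  download_speed DOUBLE
) WITH (
  'connector' = 'jdbc',
  -- Host IP of the machine running MySQL, not localhost (Zeppelin runs in Docker).
  'url' = 'jdbc:mysql://192.168.3.XXX:3306/sql-demo',
  'table-name' = 'cdn_access_statistic',
  'username' = 'sql-demo',
  'password' = 'demo-sql',
  -- Flush buffered rows to MySQL once per second.
  'sink.buffer-flush.interval' = '1s'
);
```

Use one option style per table definition; the legacy and newer keys are not meant to be mixed in the same WITH clause.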
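3: Make sure that both the Kafka source table and the MySQL sink table have been created.
4: Consume data from Kafka and store it in MySQL. You can see the data in the MySQL database changing.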
%flink.ssql
INSERT INTO cdn_access_statistic
SELECT client_ip, request_time, request_time, request_time
FROM cdn_access_log
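To confirm that rows are arriving, one option is to query the table directly in MySQL while the Flink job is running. This is a minimal sketch that assumes the `sql-demo` database and the `cdn_access_statistic` table created earlier; run it in a MySQL client, not in Zeppelin.

```sql
-- Run in MySQL. The database name contains a hyphen, so it must be quoted with backticks.
-- The row count should keep growing while the Flink job is running.
SELECT COUNT(*) AS row_count
FROM `sql-demo`.cdn_access_statistic;

-- Peek at a few of the stored rows.
SELECT province, access_count, total_download, download_speed
FROM `sql-demo`.cdn_access_statistic
LIMIT 10;
```

Running the count query a few times in a row shows whether the sink is still writing.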