The previous article analyzed the transaction log. ZooKeeper has another important log: the snapshot log.
The snapshot log is essentially a point-in-time snapshot of all of ZooKeeper's node data, persisted from memory to disk.
Snapshot logs are stored by default under the %ZOOKEEPER_DIR%/data/ directory; my data directory contained the snapshot.123c2 file examined below.
Like the transaction log, this is a binary file and cannot be read directly. ZooKeeper provides a viewer class, org.apache.zookeeper.server.SnapshotFormatter: pass the snapshot file path to its main() method. Viewing the snapshot.123c2 file produced the following output:
ZNode Details (count=74685):
----
/
  cZxid = 0x00000000000000
  ctime = Thu Jan 01 08:00:00 GMT+08:00 1970
  mZxid = 0x00000000000000
  mtime = Thu Jan 01 08:00:00 GMT+08:00 1970
  pZxid = 0x000000000123c2
  cversion = 74681
  dataVersion = 0
  aclVersion = 0
  ephemeralOwner = 0x00000000000000
  dataLength = 0
----
/hello24507
  cZxid = 0x00000000004c3b
  ctime = Tue Oct 05 17:15:30 GMT+08:00 2021
  mZxid = 0x00000000004c3b
  mtime = Tue Oct 05 17:15:30 GMT+08:00 2021
  pZxid = 0x00000000004c3b
  cversion = 0
  dataVersion = 0
  aclVersion = 0
  ephemeralOwner = 0x00000000000000
  dataLength = 10
----
/hello24508
  cZxid = 0x00000000004c3c
  ctime = Tue Oct 05 17:15:30 GMT+08:00 2021
  mZxid = 0x00000000004c3c
  mtime = Tue Oct 05 17:15:30 GMT+08:00 2021
  pZxid = 0x00000000004c3c
  cversion = 0
  dataVersion = 0
  aclVersion = 0
  ephemeralOwner = 0x00000000000000
  dataLength = 10
----
...
As shown, the output lists each node's basic metadata (node values are not displayed).
So where are snapshot logs generated? As analyzed in the earlier article on viewing the transaction log, they are generated in SyncRequestProcessor. The code is as follows:
public class SyncRequestProcessor extends ZooKeeperCriticalThread implements RequestProcessor {
    // defaults to 100000; used below
    private static int snapCount = ZooKeeperServer.getSnapCount();
    // set later
    private static int randRoll;

    public void run() {
        try {
            int logCount = 0;
            // initialize randRoll to a random value
            setRandRoll(r.nextInt(snapCount / 2));
            while (true) {
                Request si = null;
                if (toFlush.isEmpty()) {
                    si = queuedRequests.take();
                } else {
                    si = queuedRequests.poll();
                    if (si == null) {
                        flush(toFlush);
                        continue;
                    }
                }
                if (si == requestOfDeath) {
                    break;
                }
                if (si != null) {
                    // track the number of records written to the log
                    if (zks.getZKDatabase().append(si)) {
                        logCount++;
                        // after appending a transaction, check whether the number of appended
                        // transactions exceeds snapCount / 2 + randRoll; with the default
                        // snapCount of 100000, at least 50000+ transactions must be executed
                        // before a snapshot is taken
                        if (logCount > (snapCount / 2 + randRoll)) {
                            setRandRoll(r.nextInt(snapCount / 2));
                            // roll the log
                            zks.getZKDatabase().rollLog();
                            // take a snapshot
                            if (snapInProcess != null && snapInProcess.isAlive()) {
                                LOG.warn("Too busy to snap, skipping");
                            } else {
                                snapInProcess = new ZooKeeperThread("Snapshot Thread") {
                                    public void run() {
                                        try {
                                            // generate the snapshot on a dedicated thread
                                            zks.takeSnapshot();
                                        } catch (Exception e) {
                                            LOG.warn("Unexpected exception", e);
                                        }
                                    }
                                };
                                snapInProcess.start();
                            }
                            logCount = 0;
                        }
                    } else if (toFlush.isEmpty()) {
                        // optimization for read heavy workloads
                        // iff this is a read, and there are no pending
                        // flushes (writes), then just pass this to the next
                        // processor
                        if (nextProcessor != null) {
                            nextProcessor.processRequest(si);
                            if (nextProcessor instanceof Flushable) {
                                ((Flushable) nextProcessor).flush();
                            }
                        }
                        continue;
                    }
                    toFlush.add(si);
                    if (toFlush.size() > 1000) {
                        flush(toFlush);
                    }
                }
            }
        } catch (Throwable t) {
            handleException(this.getName(), t);
            running = false;
        }
        LOG.info("SyncRequestProcessor exited!");
    }
}
As shown, SyncRequestProcessor is the entry point for snapshot generation, and the snapshot is written on a dedicated thread. (snapCount is configurable; I lowered it while testing, otherwise it takes a long time to see a snapshot log appear.)
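The randomized threshold above can be illustrated in isolation. A minimal sketch follows, in plain Java and independent of the ZooKeeper classes; the class and method names are hypothetical, and the transaction loop is simulated rather than driven by real requests:

```java
import java.util.Random;

// Sketch of the snapshot-trigger rule in SyncRequestProcessor: a snapshot
// fires once logCount exceeds snapCount / 2 + randRoll, and randRoll is
// re-randomized after each snapshot so that servers in an ensemble do not
// all snapshot at the same moment.
public class SnapTriggerSketch {
    static final int SNAP_COUNT = 100000;   // default zookeeper.snapCount
    static final Random r = new Random();

    // Simulates `txns` transaction appends and returns how many snapshots
    // the threshold rule would have triggered.
    static int simulate(int txns) {
        int logCount = 0;
        int randRoll = r.nextInt(SNAP_COUNT / 2);
        int snapshots = 0;
        for (int i = 0; i < txns; i++) {
            logCount++;
            if (logCount > (SNAP_COUNT / 2 + randRoll)) {
                snapshots++;                        // takeSnapshot() would run here
                randRoll = r.nextInt(SNAP_COUNT / 2);
                logCount = 0;
            }
        }
        return snapshots;
    }

    public static void main(String[] args) {
        // each snapshot consumes between 50001 and 100000 appends,
        // so 300000 appends always yield between 3 and 5 snapshots
        System.out.println("snapshots taken: " + simulate(300000));
    }
}
```

This is why lowering snapCount in tests makes snapshots appear quickly: the trigger threshold scales directly with it.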
public class ZooKeeperServer implements SessionExpirer, ServerStats.Provider {
    public void takeSnapshot() {
        try {
            // delegates directly to FileTxnSnapLog.save(); see 3.1
            txnLogFactory.save(zkDb.getDataTree(), zkDb.getSessionWithTimeOuts());
        } catch (IOException e) {
            LOG.error("Severe unrecoverable error, exiting", e);
            // This is a severe error that we cannot recover from,
            // so we need to exit
            System.exit(10);
        }
    }
}
public class FileTxnSnapLog {
    public void save(DataTree dataTree, ConcurrentHashMap<Long, Integer> sessionsWithTimeouts)
            throws IOException {
        // take the most recent zxid and derive the snapshot file name from it
        long lastZxid = dataTree.lastProcessedZxid;
        File snapshotFile = new File(snapDir, Util.makeSnapshotName(lastZxid));
        LOG.info("Snapshotting: 0x{} to {}", Long.toHexString(lastZxid), snapshotFile);
        // serialize the in-memory DataTree
        snapLog.serialize(dataTree, sessionsWithTimeouts, snapshotFile);
    }
}
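The file name encodes the last processed zxid in lowercase hex, which is how the snapshot.123c2 file above got its name. A minimal standalone sketch (the helper mirrors what Util.makeSnapshotName produces; this is an illustrative reimplementation, not the real class):

```java
// Sketch of how ZooKeeper derives a snapshot file name from a zxid:
// the fixed prefix "snapshot." followed by the zxid in lowercase hex.
public class SnapshotNameSketch {
    static String makeSnapshotName(long zxid) {
        return "snapshot." + Long.toHexString(zxid);
    }

    public static void main(String[] args) {
        // 0x123c2 is the lastProcessedZxid of the file inspected earlier
        System.out.println(makeSnapshotName(0x123c2L)); // prints "snapshot.123c2"
    }
}
```

Because the name carries the zxid, ZooKeeper can pick the newest snapshot at startup just by comparing file names.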
public class FileSnap implements SnapShot {
    public synchronized void serialize(DataTree dt, Map<Long, Integer> sessions, File snapShot)
            throws IOException {
        if (!close) {
            OutputStream sessOS = new BufferedOutputStream(new FileOutputStream(snapShot));
            CheckedOutputStream crcOut = new CheckedOutputStream(sessOS, new Adler32());
            OutputArchive oa = BinaryOutputArchive.getArchive(crcOut);
            // write a file header first, just like the transaction log
            FileHeader header = new FileHeader(SNAP_MAGIC, VERSION, dbId);
            // serialize the DataTree; see 3.2.1
            serialize(dt, sessions, oa, header);
            // append the checksum value
            long val = crcOut.getChecksum().getValue();
            oa.writeLong(val, "val");
            oa.writeString("/", "path");
            sessOS.flush();
            crcOut.close();
            sessOS.close();
        }
    }
}
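The checksum mechanics here use plain JDK classes: everything written through the CheckedOutputStream updates an Adler32 digest, and the final digest value is appended as a trailer so a reader can detect a truncated or corrupted snapshot. A small sketch of the same pattern (the payload and class name are made up for illustration):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Adler32;
import java.util.zip.CheckedOutputStream;

public class ChecksumSketch {
    // Returns the Adler32 digest of everything written through the stream,
    // mirroring how FileSnap.serialize() obtains its trailer value.
    static long checksumOf(byte[] payload) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        CheckedOutputStream crcOut = new CheckedOutputStream(buf, new Adler32());
        crcOut.write(payload);   // stands in for the serialized header + DataTree
        crcOut.flush();
        return crcOut.getChecksum().getValue();
    }

    public static void main(String[] args) throws IOException {
        long val = checksumOf("fake snapshot payload".getBytes());
        // in serialize() this value is written after the tree data
        System.out.println("adler32 trailer: 0x" + Long.toHexString(val));
    }
}
```

Adler32 is cheaper to compute than CRC32, which matters when a snapshot covers tens of thousands of nodes.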
3.2.1 SerializeUtils.serializeSnapshot()
public class SerializeUtils {
    public static void serializeSnapshot(DataTree dt, OutputArchive oa, Map<Long, Integer> sessions)
            throws IOException {
        HashMap<Long, Integer> sessSnap = new HashMap<Long, Integer>(sessions);
        // write the sessionId -> timeout pairs first
        oa.writeInt(sessSnap.size(), "count");
        for (Entry<Long, Integer> entry : sessSnap.entrySet()) {
            oa.writeLong(entry.getKey().longValue(), "id");
            oa.writeInt(entry.getValue().intValue(), "timeout");
        }
        // then invoke the DataTree's own serialization
        dt.serialize(oa, "tree");
    }
}
3.2.2 DataTree.serialize() serializes the DataTree contents
public class DataTree {
    public void serialize(OutputArchive oa, String tag) throws IOException {
        scount = 0;
        aclCache.serialize(oa);
        serializeNode(oa, new StringBuilder(""));
        // "/" marks end of stream
        // we need to check if clear had been called in between the snapshot.
        if (root != null) {
            oa.writeString("/", "path");
        }
    }
}
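serializeNode walks the tree depth-first, emitting each node's full path (and payload) before recursing into its children, so parents always precede children in the stream. A toy version over a made-up tree type illustrates the walk order; the names here are hypothetical and not the real DataTree API:

```java
import java.util.ArrayList;
import java.util.List;

// Toy depth-first walk in the spirit of DataTree.serializeNode(): every
// node's path is emitted before its children, so the snapshot stream can be
// replayed top-down at startup (a parent always exists before its children).
public class TreeWalkSketch {
    static class Node {
        String name;
        List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    static void serializeNode(Node node, String path, List<String> out) {
        out.add(path);  // the real code writes the path plus the node's data here
        for (Node child : node.children) {
            String childPath = path.equals("/") ? "/" + child.name
                                                : path + "/" + child.name;
            serializeNode(child, childPath, out);
        }
    }

    public static void main(String[] args) {
        Node root = new Node("");
        Node app = new Node("app");
        root.children.add(app);
        app.children.add(new Node("config"));
        root.children.add(new Node("hello24507"));

        List<String> out = new ArrayList<>();
        serializeNode(root, "/", out);
        System.out.println(out); // [/, /app, /app/config, /hello24507]
    }
}
```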
The finer details of DataTree serialization are omitted here; interested readers can explore them on their own. In essence, every node in the DataTree is serialized to the file one by one, including each node's path, value, ACL, and related metadata.
The flow is not complicated. Unlike the transaction log, there is no need to pre-fill the file with padding; the data is written to disk in a single pass when the snapshot is taken.
The DataTree is one of ZooKeeper's core data structures. At startup, node information is deserialized from the snapshot file back into the DataTree so that client requests can be served quickly from memory, and every transaction updates the DataTree in real time as well. The next article will cover the DataTree in detail.