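Windows are at the heart of processing infinite streams. Windows split the stream into "buckets" of finite size, over which we can apply computations. This document focuses on how windowing is performed in Flink SQL and how programmers can get the most out of the functionality it offers.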
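Apache Flink provides several window table-valued functions (TVFs) to divide the elements of your table into windows, including TUMBLE, HOP, and CUMULATE.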
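Note that each element can logically belong to more than one window, depending on the windowing table-valued function you use. For example, HOP windowing creates overlapping windows in which a single element can be assigned to multiple windows.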
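Windowing TVFs are Flink-defined Polymorphic Table Functions (PTFs for short). PTFs are part of the SQL 2016 standard. They are special table functions that can take a table as a parameter, which makes them a powerful feature for changing the shape of a table. Because PTFs are used semantically like tables, they are invoked in the FROM clause of a SELECT statement.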
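Windowing TVFs replace the legacy Grouped Window Functions. Windowing TVFs comply more closely with the SQL standard. They are also more powerful, supporting complex window-based computations such as Window TopN and Window Join. Grouped Window Functions, by contrast, only support window aggregation.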
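To learn more about applying further computations on top of windowing TVFs, see the documentation on window aggregation, Window TopN, and Window Join.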
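Apache Flink provides 3 built-in windowing TVFs: TUMBLE, HOP, and CUMULATE. A windowing TVF returns a new relation that includes all columns of the original relation plus 3 additional columns named "window_start", "window_end", and "window_time", which indicate the assigned window. The "window_time" field is a time attribute of the window after the windowing TVF. It can be used in subsequent time-based operations, such as another windowing TVF, interval joins, or over aggregations. The value of window_time always equals window_end - 1ms.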
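As a rough illustration of the window_time rule above, the following Python sketch models the three appended columns for one window. It is not Flink code. It assumes timestamps are naive `datetime` values with millisecond precision, standing in for `TIMESTAMP(3)`. The helper name `window_columns` is hypothetical.

```python
from datetime import datetime, timedelta


def window_columns(window_start: datetime, window_end: datetime) -> dict:
    """Model of the three columns a windowing TVF appends to each row.

    Hypothetical helper: window_time is window_end minus 1 ms, i.e. the last
    millisecond that still belongs to the half-open window [start, end).
    """
    return {
        "window_start": window_start,
        "window_end": window_end,
        "window_time": window_end - timedelta(milliseconds=1),
    }


cols = window_columns(datetime(2020, 4, 15, 8, 0), datetime(2020, 4, 15, 8, 10))
print(cols["window_time"].isoformat(sep=" ", timespec="milliseconds"))
# prints: 2020-04-15 08:09:59.999
```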
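The TUMBLE function assigns each element to a window of a specified window size. Tumbling windows have a fixed size and do not overlap. For example, suppose you specify a tumbling window with a size of 5 minutes. In that case, Flink evaluates the current window and starts a new window every five minutes, as illustrated by the following figure.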
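The TUMBLE function assigns a window to each row of a relation based on a time attribute column. TUMBLE returns a new relation that includes all columns of the original relation plus 3 additional columns named "window_start", "window_end", and "window_time", which indicate the assigned window. After the window TVF, the original time attribute "timecol" becomes a regular timestamp column.
TUMBLE takes three required parameters and one optional parameter: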
TUMBLE(TABLE data, DESCRIPTOR(timecol), size [, offset ])
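data: a table parameter. It can be any relation with a time attribute column.
timecol: a column descriptor indicating which time attribute column of data should be mapped to tumbling windows.
size: a duration specifying the width of the tumbling windows.
offset: an optional parameter specifying the offset by which the window start is shifted.
Here is an example invocation on the Bid table: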
```sql
-- tables must have time attribute, e.g. `bidtime` in this table
Flink SQL> desc Bid;
+-------------+------------------------+------+-----+--------+---------------------------------+
|        name |                   type | null | key | extras |                       watermark |
+-------------+------------------------+------+-----+--------+---------------------------------+
|     bidtime | TIMESTAMP(3) *ROWTIME* | true |     |        | `bidtime` - INTERVAL '1' SECOND |
|       price |         DECIMAL(10, 2) | true |     |        |                                 |
|        item |                 STRING | true |     |        |                                 |
+-------------+------------------------+------+-----+--------+---------------------------------+

Flink SQL> SELECT * FROM Bid;
+------------------+-------+------+
|          bidtime | price | item |
+------------------+-------+------+
| 2020-04-15 08:05 |  4.00 | C    |
| 2020-04-15 08:07 |  2.00 | A    |
| 2020-04-15 08:09 |  5.00 | D    |
| 2020-04-15 08:11 |  3.00 | B    |
| 2020-04-15 08:13 |  1.00 | E    |
| 2020-04-15 08:17 |  6.00 | F    |
+------------------+-------+------+

-- NOTE: Currently Flink doesn't support evaluating individual window table-valued function,
--  window table-valued function should be used with aggregate operation,
--  this example is just used for explaining the syntax and the data produced by table-valued function.
Flink SQL> SELECT * FROM TABLE(
   TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES));
-- or with the named params
-- note: the DATA param must be the first
Flink SQL> SELECT * FROM TABLE(
   TUMBLE(
     DATA => TABLE Bid,
     TIMECOL => DESCRIPTOR(bidtime),
     SIZE => INTERVAL '10' MINUTES));
+------------------+-------+------+------------------+------------------+-------------------------+
|          bidtime | price | item |     window_start |       window_end |             window_time |
+------------------+-------+------+------------------+------------------+-------------------------+
| 2020-04-15 08:05 |  4.00 | C    | 2020-04-15 08:00 | 2020-04-15 08:10 | 2020-04-15 08:09:59.999 |
| 2020-04-15 08:07 |  2.00 | A    | 2020-04-15 08:00 | 2020-04-15 08:10 | 2020-04-15 08:09:59.999 |
| 2020-04-15 08:09 |  5.00 | D    | 2020-04-15 08:00 | 2020-04-15 08:10 | 2020-04-15 08:09:59.999 |
| 2020-04-15 08:11 |  3.00 | B    | 2020-04-15 08:10 | 2020-04-15 08:20 | 2020-04-15 08:19:59.999 |
| 2020-04-15 08:13 |  1.00 | E    | 2020-04-15 08:10 | 2020-04-15 08:20 | 2020-04-15 08:19:59.999 |
| 2020-04-15 08:17 |  6.00 | F    | 2020-04-15 08:10 | 2020-04-15 08:20 | 2020-04-15 08:19:59.999 |
+------------------+-------+------+------------------+------------------+-------------------------+

-- apply aggregation on the tumbling windowed table
Flink SQL> SELECT window_start, window_end, SUM(price)
  FROM TABLE(
    TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES))
  GROUP BY window_start, window_end;
+------------------+------------------+-------+
|     window_start |       window_end | price |
+------------------+------------------+-------+
| 2020-04-15 08:00 | 2020-04-15 08:10 | 11.00 |
| 2020-04-15 08:10 | 2020-04-15 08:20 | 10.00 |
+------------------+------------------+-------+
```
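Note: To better illustrate the windowing behavior, timestamp values are shown in simplified form without trailing zeros. For example, if the type is TIMESTAMP(3), 2020-04-15 08:05 would be displayed as 2020-04-15 08:05:00.000 in the Flink SQL Client.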
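To make the tumbling-window assignment in the example concrete, here is a minimal pure-Python model of `TUMBLE` followed by `SUM(price)` grouped by `window_start, window_end`. It is a sketch, not Flink's implementation. It assumes windows are aligned to the epoch (offset 0) and uses `Decimal` to stand in for `DECIMAL(10, 2)`. The names `tumble_start` and `tumble_sum` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from decimal import Decimal

EPOCH = datetime(1970, 1, 1)

# The six rows of the Bid table from the example: (bidtime, price, item).
BIDS = [
    (datetime(2020, 4, 15, 8, 5), Decimal("4.00"), "C"),
    (datetime(2020, 4, 15, 8, 7), Decimal("2.00"), "A"),
    (datetime(2020, 4, 15, 8, 9), Decimal("5.00"), "D"),
    (datetime(2020, 4, 15, 8, 11), Decimal("3.00"), "B"),
    (datetime(2020, 4, 15, 8, 13), Decimal("1.00"), "E"),
    (datetime(2020, 4, 15, 8, 17), Decimal("6.00"), "F"),
]


def tumble_start(ts: datetime, size: timedelta) -> datetime:
    """Start of the single tumbling window [start, start + size) containing ts."""
    return ts - (ts - EPOCH) % size


def tumble_sum(bids, size: timedelta) -> dict:
    """SUM(price) per (window_start, window_end), mirroring the GROUP BY query."""
    sums = defaultdict(Decimal)
    for bidtime, price, _item in bids:
        start = tumble_start(bidtime, size)
        sums[(start, start + size)] += price
    return dict(sums)


for (start, end), total in sorted(tumble_sum(BIDS, timedelta(minutes=10)).items()):
    print(start, end, total)
# prints:
# 2020-04-15 08:00:00 2020-04-15 08:10:00 11.00
# 2020-04-15 08:10:00 2020-04-15 08:20:00 10.00
```

Each bid lands in exactly one tumbling window. The per-window sums therefore add up to 21.00, the total of all prices.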
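The HOP function assigns elements to windows of fixed length. As with the TUMBLE windowing function, the size of the windows is configured by the window size parameter. An additional window slide parameter controls how frequently a hopping window is started. Hopping windows can therefore overlap if the slide is smaller than the window size, and in that case elements are assigned to multiple windows. Hopping windows are also known as "sliding windows".
For example, you could have windows of size 10 minutes that slide by 5 minutes. This gives you a window every 5 minutes containing the events that arrived during the last 10 minutes, as depicted by the following figure.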
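The 10-minute/5-minute example can be sketched in Python to show why one element lands in several hopping windows. This model is not Flink code. It assumes epoch-aligned windows with offset 0, and `hop_windows` is a hypothetical helper. The loop starts at the latest window start at or before the timestamp. It then walks back one slide at a time for as long as the window still covers the timestamp.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def hop_windows(ts: datetime, slide: timedelta, size: timedelta) -> list:
    """All hopping windows [start, start + size) that contain ts, oldest first."""
    # Latest window start that is <= ts (window starts are multiples of slide).
    start = ts - (ts - EPOCH) % slide
    windows = []
    # A window [start, start + size) contains ts iff start > ts - size.
    while start > ts - size:
        windows.append((start, start + size))
        start -= slide
    return windows[::-1]


for start, end in hop_windows(datetime(2020, 4, 15, 8, 11),
                              slide=timedelta(minutes=5),
                              size=timedelta(minutes=10)):
    print(start, end)
# prints:
# 2020-04-15 08:05:00 2020-04-15 08:15:00
# 2020-04-15 08:10:00 2020-04-15 08:20:00
```

When size is a multiple of slide, every element falls into size / slide windows, which is two here.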
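Based on a time attribute column, the HOP function assigns windows that each cover rows within an interval of size, with consecutive windows shifted by slide. HOP returns a new relation that includes all columns of the original relation plus 3 additional columns named "window_start", "window_end", and "window_time", which indicate the assigned window. After the window TVF, the original time attribute "timecol" becomes a regular timestamp column.
HOP takes four required parameters and one optional parameter: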
HOP(TABLE data, DESCRIPTOR(timecol), slide, size [, offset ])
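data: a table parameter. It can be any relation with a time attribute column.
timecol: a column descriptor indicating which time attribute column of data should be mapped to hopping windows.
slide: a duration specifying the interval between the starts of consecutive hopping windows.
size: a duration specifying the width of the hopping windows.
offset: an optional parameter specifying the offset by which the window start is shifted.
Here is an example invocation on the Bid table: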
```sql
-- NOTE: Currently Flink doesn't support evaluating individual window table-valued function,
--  window table-valued function should be used with aggregate operation,
--  this example is just used for explaining the syntax and the data produced by table-valued function.
> SELECT * FROM TABLE(
    HOP(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '5' MINUTES, INTERVAL '10' MINUTES));
-- or with the named params
-- note: the DATA param must be the first
> SELECT * FROM TABLE(
    HOP(
      DATA => TABLE Bid,
      TIMECOL => DESCRIPTOR(bidtime),
      SLIDE => INTERVAL '5' MINUTES,
      SIZE => INTERVAL '10' MINUTES));
+------------------+-------+------+------------------+------------------+-------------------------+
|          bidtime | price | item |     window_start |       window_end |             window_time |
+------------------+-------+------+------------------+------------------+-------------------------+
| 2020-04-15 08:05 |  4.00 | C    | 2020-04-15 08:00 | 2020-04-15 08:10 | 2020-04-15 08:09:59.999 |
| 2020-04-15 08:05 |  4.00 | C    | 2020-04-15 08:05 | 2020-04-15 08:15 | 2020-04-15 08:14:59.999 |
| 2020-04-15 08:07 |  2.00 | A    | 2020-04-15 08:00 | 2020-04-15 08:10 | 2020-04-15 08:09:59.999 |
| 2020-04-15 08:07 |  2.00 | A    | 2020-04-15 08:05 | 2020-04-15 08:15 | 2020-04-15 08:14:59.999 |
| 2020-04-15 08:09 |  5.00 | D    | 2020-04-15 08:00 | 2020-04-15 08:10 | 2020-04-15 08:09:59.999 |
| 2020-04-15 08:09 |  5.00 | D    | 2020-04-15 08:05 | 2020-04-15 08:15 | 2020-04-15 08:14:59.999 |
| 2020-04-15 08:11 |  3.00 | B    | 2020-04-15 08:05 | 2020-04-15 08:15 | 2020-04-15 08:14:59.999 |
| 2020-04-15 08:11 |  3.00 | B    | 2020-04-15 08:10 | 2020-04-15 08:20 | 2020-04-15 08:19:59.999 |
| 2020-04-15 08:13 |  1.00 | E    | 2020-04-15 08:05 | 2020-04-15 08:15 | 2020-04-15 08:14:59.999 |
| 2020-04-15 08:13 |  1.00 | E    | 2020-04-15 08:10 | 2020-04-15 08:20 | 2020-04-15 08:19:59.999 |
| 2020-04-15 08:17 |  6.00 | F    | 2020-04-15 08:10 | 2020-04-15 08:20 | 2020-04-15 08:19:59.999 |
| 2020-04-15 08:17 |  6.00 | F    | 2020-04-15 08:15 | 2020-04-15 08:25 | 2020-04-15 08:24:59.999 |
+------------------+-------+------+------------------+------------------+-------------------------+

-- apply aggregation on the hopping windowed table
> SELECT window_start, window_end, SUM(price)
  FROM TABLE(
    HOP(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '5' MINUTES, INTERVAL '10' MINUTES))
  GROUP BY window_start, window_end;
+------------------+------------------+-------+
|     window_start |       window_end | price |
+------------------+------------------+-------+
| 2020-04-15 08:00 | 2020-04-15 08:10 | 11.00 |
| 2020-04-15 08:05 | 2020-04-15 08:15 | 15.00 |
| 2020-04-15 08:10 | 2020-04-15 08:20 | 10.00 |
| 2020-04-15 08:15 | 2020-04-15 08:25 |  6.00 |
+------------------+------------------+-------+
```
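The aggregated HOP result above can be cross-checked with a small Python model that adds each bid to the sum of every hopping window containing it. This is a sketch, not Flink's implementation. It assumes epoch-aligned windows with offset 0, with `Decimal` standing in for `DECIMAL(10, 2)`. `hop_sum` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from decimal import Decimal

EPOCH = datetime(1970, 1, 1)

# The six rows of the Bid table from the example: (bidtime, price, item).
BIDS = [
    (datetime(2020, 4, 15, 8, 5), Decimal("4.00"), "C"),
    (datetime(2020, 4, 15, 8, 7), Decimal("2.00"), "A"),
    (datetime(2020, 4, 15, 8, 9), Decimal("5.00"), "D"),
    (datetime(2020, 4, 15, 8, 11), Decimal("3.00"), "B"),
    (datetime(2020, 4, 15, 8, 13), Decimal("1.00"), "E"),
    (datetime(2020, 4, 15, 8, 17), Decimal("6.00"), "F"),
]


def hop_sum(bids, slide: timedelta, size: timedelta) -> dict:
    """SUM(price) per (window_start, window_end) over hopping windows."""
    sums = defaultdict(Decimal)
    for bidtime, price, _item in bids:
        # Latest window start <= bidtime; window starts are multiples of slide.
        start = bidtime - (bidtime - EPOCH) % slide
        # Add the price to every window [start, start + size) that covers bidtime.
        while start > bidtime - size:
            sums[(start, start + size)] += price
            start -= slide
    return dict(sums)


result = hop_sum(BIDS, slide=timedelta(minutes=5), size=timedelta(minutes=10))
for (start, end), total in sorted(result.items()):
    print(start, end, total)
```

Because the windows overlap, every bid is counted twice. The per-window sums therefore total 42.00 rather than the 21.00 sum of all prices.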
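Cumulating windows are very useful in some scenarios, such as tumbling windows with early firing within a fixed window interval. For example, a daily dashboard might plot cumulative UVs (unique visitors) from 00:00 up to every minute, so that the UV value at 10:00 represents the total number of UVs from 00:00 to 10:00. CUMULATE windowing implements this easily and efficiently.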
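The CUMULATE function first assigns elements to windows that cover rows within an initial interval of one step size. Each step, the window expands by one more step size while the window start stays fixed, until the max window size is reached. You can think of the CUMULATE function in two stages. First it applies TUMBLE windowing with the max window size. Then it splits each tumbling window into several windows that share the same window start and whose window ends differ by the step size. Cumulating windows therefore do overlap and don't have a fixed size.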
For example, you could have a cumulating window with a 1-hour step and a 1-day max size. You would then get the windows [00:00, 01:00), [00:00, 02:00), [00:00, 03:00), ..., [00:00, 24:00) for every day.
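Here is a minimal Python sketch of the cumulating windows described above, using the 1-hour step and 1-day max size from the example. It follows the two-stage description in the text. First it finds the enclosing tumbling window of the max size. It then emits every window that keeps that start, ends at a multiple of step, and covers the timestamp. It assumes epoch-aligned (UTC midnight) day boundaries and offset 0. `cumulate_windows` is a hypothetical name, not a Flink API.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def cumulate_windows(ts: datetime, step: timedelta, size: timedelta) -> list:
    """All cumulating windows [start, end) that contain ts, shortest first.

    Assumes size is an integral multiple of step.
    """
    # Start of the enclosing tumbling window of the max size; it never moves.
    start = ts - (ts - EPOCH) % size
    windows = []
    end = start + step
    while end <= start + size:
        if end > ts:  # [start, end) covers ts only if it ends after ts
            windows.append((start, end))
        end += step
    return windows


day = cumulate_windows(datetime(2020, 4, 15, 0, 15), timedelta(hours=1), timedelta(days=1))
print(len(day))                # 24
print(day[0][0], day[0][1])    # 2020-04-15 00:00:00 2020-04-15 01:00:00
print(day[-1][0], day[-1][1])  # 2020-04-15 00:00:00 2020-04-16 00:00:00
```

A record at 21:30 falls into only three windows of that day: [00:00, 22:00), [00:00, 23:00), and [00:00, 24:00).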
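The CUMULATE function assigns windows based on a time attribute column. CUMULATE returns a new relation that includes all columns of the original relation plus 3 additional columns named "window_start", "window_end", and "window_time", which indicate the assigned window. After the window TVF, the original time attribute "timecol" becomes a regular timestamp column.
CUMULATE takes four required parameters and one optional parameter: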
CUMULATE(TABLE data, DESCRIPTOR(timecol), step, size [, offset ])
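data: a table parameter. It can be any relation with a time attribute column.
timecol: a column descriptor indicating which time attribute column of data should be mapped to cumulating windows.
step: a duration specifying how much the window size increases between the ends of consecutive cumulating windows.
size: a duration specifying the max width of the cumulating windows. size must be an integral multiple of step.
offset: an optional parameter specifying the offset by which the window start is shifted.
Here is an example invocation on the Bid table: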
```sql
-- NOTE: Currently Flink doesn't support evaluating individual window table-valued function,
--  window table-valued function should be used with aggregate operation,
--  this example is just used for explaining the syntax and the data produced by table-valued function.
> SELECT * FROM TABLE(
    CUMULATE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '2' MINUTES, INTERVAL '10' MINUTES));
-- or with the named params
-- note: the DATA param must be the first
> SELECT * FROM TABLE(
    CUMULATE(
      DATA => TABLE Bid,
      TIMECOL => DESCRIPTOR(bidtime),
      STEP => INTERVAL '2' MINUTES,
      SIZE => INTERVAL '10' MINUTES));
+------------------+-------+------+------------------+------------------+-------------------------+
|          bidtime | price | item |     window_start |       window_end |             window_time |
+------------------+-------+------+------------------+------------------+-------------------------+
| 2020-04-15 08:05 |  4.00 |    C | 2020-04-15 08:00 | 2020-04-15 08:06 | 2020-04-15 08:05:59.999 |
| 2020-04-15 08:05 |  4.00 |    C | 2020-04-15 08:00 | 2020-04-15 08:08 | 2020-04-15 08:07:59.999 |
| 2020-04-15 08:05 |  4.00 |    C | 2020-04-15 08:00 | 2020-04-15 08:10 | 2020-04-15 08:09:59.999 |
| 2020-04-15 08:07 |  2.00 |    A | 2020-04-15 08:00 | 2020-04-15 08:08 | 2020-04-15 08:07:59.999 |
| 2020-04-15 08:07 |  2.00 |    A | 2020-04-15 08:00 | 2020-04-15 08:10 | 2020-04-15 08:09:59.999 |
| 2020-04-15 08:09 |  5.00 |    D | 2020-04-15 08:00 | 2020-04-15 08:10 | 2020-04-15 08:09:59.999 |
| 2020-04-15 08:11 |  3.00 |    B | 2020-04-15 08:10 | 2020-04-15 08:12 | 2020-04-15 08:11:59.999 |
| 2020-04-15 08:11 |  3.00 |    B | 2020-04-15 08:10 | 2020-04-15 08:14 | 2020-04-15 08:13:59.999 |
| 2020-04-15 08:11 |  3.00 |    B | 2020-04-15 08:10 | 2020-04-15 08:16 | 2020-04-15 08:15:59.999 |
| 2020-04-15 08:11 |  3.00 |    B | 2020-04-15 08:10 | 2020-04-15 08:18 | 2020-04-15 08:17:59.999 |
| 2020-04-15 08:11 |  3.00 |    B | 2020-04-15 08:10 | 2020-04-15 08:20 | 2020-04-15 08:19:59.999 |
| 2020-04-15 08:13 |  1.00 |    E | 2020-04-15 08:10 | 2020-04-15 08:14 | 2020-04-15 08:13:59.999 |
| 2020-04-15 08:13 |  1.00 |    E | 2020-04-15 08:10 | 2020-04-15 08:16 | 2020-04-15 08:15:59.999 |
| 2020-04-15 08:13 |  1.00 |    E | 2020-04-15 08:10 | 2020-04-15 08:18 | 2020-04-15 08:17:59.999 |
| 2020-04-15 08:13 |  1.00 |    E | 2020-04-15 08:10 | 2020-04-15 08:20 | 2020-04-15 08:19:59.999 |
| 2020-04-15 08:17 |  6.00 |    F | 2020-04-15 08:10 | 2020-04-15 08:18 | 2020-04-15 08:17:59.999 |
| 2020-04-15 08:17 |  6.00 |    F | 2020-04-15 08:10 | 2020-04-15 08:20 | 2020-04-15 08:19:59.999 |
+------------------+-------+------+------------------+------------------+-------------------------+

-- apply aggregation on the cumulating windowed table
> SELECT window_start, window_end, SUM(price)
  FROM TABLE(
    CUMULATE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '2' MINUTES, INTERVAL '10' MINUTES))
  GROUP BY window_start, window_end;
+------------------+------------------+-------+
|     window_start |       window_end | price |
+------------------+------------------+-------+
| 2020-04-15 08:00 | 2020-04-15 08:06 |  4.00 |
| 2020-04-15 08:00 | 2020-04-15 08:08 |  6.00 |
| 2020-04-15 08:00 | 2020-04-15 08:10 | 11.00 |
| 2020-04-15 08:10 | 2020-04-15 08:12 |  3.00 |
| 2020-04-15 08:10 | 2020-04-15 08:14 |  4.00 |
| 2020-04-15 08:10 | 2020-04-15 08:16 |  4.00 |
| 2020-04-15 08:10 | 2020-04-15 08:18 | 10.00 |
| 2020-04-15 08:10 | 2020-04-15 08:20 | 10.00 |
+------------------+------------------+-------+
```
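To see where the rows above come from, here is a minimal Python sketch of cumulating-window assignment. It assumes (consistent with the table) that each cycle of windows starts at a multiple of SIZE aligned to the epoch, and that a record belongs to every window [start, start + k * STEP) that still contains it. The helpers window_start, cumulate, cumulate_sum and t, and the BIDS list, are hypothetical names for illustration only. They are not a Flink API, and the sketch models the semantics rather than Flink's implementation.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from decimal import Decimal

EPOCH = datetime(1970, 1, 1)


def window_start(ts, size, offset=timedelta(0)):
    """Start of the epoch-aligned window of length `size` containing `ts` (assumed alignment)."""
    # With a positive timedelta divisor, Python's % yields a non-negative remainder.
    return ts - (ts - EPOCH - offset) % size


def cumulate(ts, step, size, offset=timedelta(0)):
    """Hypothetical helper: all cumulating windows [start, end) that contain `ts`."""
    start = window_start(ts, size, offset)
    windows = []
    end = start + step
    while end <= start + size:
        if end > ts:  # window_end is exclusive
            windows.append((start, end))
        end += step
    return windows


def t(hhmm):
    """Shorthand for a bidtime on 2020-04-15."""
    return datetime.strptime("2020-04-15 " + hhmm, "%Y-%m-%d %H:%M")


# Rows of the Bid table, taken from the example output above.
BIDS = [
    (t("08:05"), Decimal("4.00"), "C"),
    (t("08:07"), Decimal("2.00"), "A"),
    (t("08:09"), Decimal("5.00"), "D"),
    (t("08:11"), Decimal("3.00"), "B"),
    (t("08:13"), Decimal("1.00"), "E"),
    (t("08:17"), Decimal("6.00"), "F"),
]


def cumulate_sum(bids, step, size):
    """Model of SUM(price) ... GROUP BY window_start, window_end over CUMULATE."""
    sums = defaultdict(Decimal)
    for bidtime, price, _item in bids:
        for win in cumulate(bidtime, step, size):
            sums[win] += price
    return dict(sorted(sums.items()))


if __name__ == "__main__":
    result = cumulate_sum(BIDS, timedelta(minutes=2), timedelta(minutes=10))
    for (start, end), total in result.items():
        print(start.strftime("%H:%M"), end.strftime("%H:%M"), total)
```

Running it prints the same eight rows as the aggregation result above. The windows [08:00, 08:02) and [08:00, 08:04) produce no rows because no bid falls into them.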
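Offset is an optional parameter that can be used to change the window assignment. It can be a positive or a negative duration, and the default window offset is 0. If a different offset value is set, the same record may be assigned to a different window.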
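For example, consider a record with timestamp 2021-06-30 00:00:04 and a Tumble window of size 10 minutes. Which window is the record assigned to?
- If offset is -16 MINUTE, the record is assigned to window [2021-06-29 23:54:00, 2021-06-30 00:04:00).
- If offset is -6 MINUTE, the record is assigned to window [2021-06-29 23:54:00, 2021-06-30 00:04:00).
- If offset is -4 MINUTE, the record is assigned to window [2021-06-29 23:56:00, 2021-06-30 00:06:00).
- If offset is 0, the record is assigned to window [2021-06-30 00:00:00, 2021-06-30 00:10:00).
- If offset is 4 MINUTE, the record is assigned to window [2021-06-29 23:54:00, 2021-06-30 00:04:00).
- If offset is 6 MINUTE, the record is assigned to window [2021-06-29 23:56:00, 2021-06-30 00:06:00).
- If offset is 16 MINUTE, the record is assigned to window [2021-06-29 23:56:00, 2021-06-30 00:06:00).
As these cases show, different offsets can have the same effect on window assignment. For a Tumble window of size 10 minutes, -16 MINUTE, -6 MINUTE and 4 MINUTE all behave identically, because they differ from each other by a multiple of the window size. Note: the window offset only changes how records are assigned to windows; it has no effect on the watermark.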
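The cases above can be reproduced with a small Python sketch. It assumes that tumbling windows are aligned to the epoch (1970-01-01 00:00:00) and then shifted by the offset, which agrees with every case listed. tumble_window is a hypothetical helper, not a Flink API. Because the start is computed modulo the window size, offsets that differ by a multiple of the size land on the same window.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def tumble_window(ts, size, offset=timedelta(0)):
    """Hypothetical helper: [start, end) of the tumbling window containing ts."""
    # Shift by the offset, then snap back to a multiple of `size`. With a positive
    # divisor, Python's % never returns a negative remainder, even for negative offsets.
    start = ts - (ts - EPOCH - offset) % size
    return start, start + size


if __name__ == "__main__":
    record = datetime(2021, 6, 30, 0, 0, 4)
    size = timedelta(minutes=10)
    for minutes in (-16, -6, -4, 0, 4, 6, 16):
        start, end = tumble_window(record, size, timedelta(minutes=minutes))
        print(f"offset {minutes:>3} MINUTE -> [{start}, {end})")
```

Each printed line matches the corresponding case in the list. For example, offset -16 MINUTE yields [2021-06-29 23:54:00, 2021-06-30 00:04:00).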
The following SQL shows how to use an offset with a Tumble window.
```sql
-- NOTE: Currently Flink doesn't support evaluating individual window table-valued function,
--  window table-valued function should be used with aggregate operation,
--  this example is just used for explaining the syntax and the data produced by table-valued function.
Flink SQL> SELECT * FROM TABLE(
   TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES, INTERVAL '1' MINUTES));
-- or with the named params
-- note: the DATA param must be the first
Flink SQL> SELECT * FROM TABLE(
   TUMBLE(
     DATA => TABLE Bid,
     TIMECOL => DESCRIPTOR(bidtime),
     SIZE => INTERVAL '10' MINUTES,
     OFFSET => INTERVAL '1' MINUTES));
+------------------+-------+------+------------------+------------------+-------------------------+
|          bidtime | price | item |     window_start |       window_end |             window_time |
+------------------+-------+------+------------------+------------------+-------------------------+
| 2020-04-15 08:05 |  4.00 |    C | 2020-04-15 08:01 | 2020-04-15 08:11 | 2020-04-15 08:10:59.999 |
| 2020-04-15 08:07 |  2.00 |    A | 2020-04-15 08:01 | 2020-04-15 08:11 | 2020-04-15 08:10:59.999 |
| 2020-04-15 08:09 |  5.00 |    D | 2020-04-15 08:01 | 2020-04-15 08:11 | 2020-04-15 08:10:59.999 |
| 2020-04-15 08:11 |  3.00 |    B | 2020-04-15 08:11 | 2020-04-15 08:21 | 2020-04-15 08:20:59.999 |
| 2020-04-15 08:13 |  1.00 |    E | 2020-04-15 08:11 | 2020-04-15 08:21 | 2020-04-15 08:20:59.999 |
| 2020-04-15 08:17 |  6.00 |    F | 2020-04-15 08:11 | 2020-04-15 08:21 | 2020-04-15 08:20:59.999 |
+------------------+-------+------+------------------+------------------+-------------------------+

-- apply aggregation on the tumbling windowed table
Flink SQL> SELECT window_start, window_end, SUM(price)
  FROM TABLE(
    TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES, INTERVAL '1' MINUTES))
  GROUP BY window_start, window_end;
+------------------+------------------+-------+
|     window_start |       window_end | price |
+------------------+------------------+-------+
| 2020-04-15 08:01 | 2020-04-15 08:11 | 11.00 |
| 2020-04-15 08:11 | 2020-04-15 08:21 | 10.00 |
+------------------+------------------+-------+
```
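The same kind of sketch can model this query. With a 10-minute size and a 1-minute offset, window boundaries fall on 08:01, 08:11, 08:21, and so on. As before, the sketch assumes epoch-aligned windows shifted by the offset. tumble_start, tumble_sum, t and BIDS are hypothetical names used only for illustration, not Flink APIs.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from decimal import Decimal

EPOCH = datetime(1970, 1, 1)


def tumble_start(ts, size, offset):
    """Start of the offset-shifted, epoch-aligned tumbling window containing ts (assumed)."""
    return ts - (ts - EPOCH - offset) % size


def t(hhmm):
    """Shorthand for a bidtime on 2020-04-15."""
    return datetime.strptime("2020-04-15 " + hhmm, "%Y-%m-%d %H:%M")


# Rows of the Bid table, taken from the example output above.
BIDS = [
    (t("08:05"), Decimal("4.00"), "C"),
    (t("08:07"), Decimal("2.00"), "A"),
    (t("08:09"), Decimal("5.00"), "D"),
    (t("08:11"), Decimal("3.00"), "B"),
    (t("08:13"), Decimal("1.00"), "E"),
    (t("08:17"), Decimal("6.00"), "F"),
]


def tumble_sum(bids, size, offset):
    """Model of SUM(price) ... GROUP BY window_start, window_end over TUMBLE."""
    sums = defaultdict(Decimal)
    for bidtime, price, _item in bids:
        start = tumble_start(bidtime, size, offset)
        sums[(start, start + size)] += price
    return dict(sorted(sums.items()))


if __name__ == "__main__":
    result = tumble_sum(BIDS, timedelta(minutes=10), timedelta(minutes=1))
    for (start, end), total in result.items():
        print(start.strftime("%H:%M"), end.strftime("%H:%M"), total)
```

It prints the two aggregated rows, 08:01 08:11 11.00 and 08:11 08:21 10.00. With an offset of timedelta(0), the same bids would instead fall into [08:00, 08:10) and [08:10, 08:20).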
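Note: to make the windowing behavior easier to follow, the timestamp values in these examples are shown without trailing zeros. For example, if the column type is TIMESTAMP(3), the value 2020-04-15 08:05 would be displayed as 2020-04-15 08:05:00.000 in the Flink SQL Client.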
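For reference, the full TIMESTAMP(3) rendering can be mimicked in Python. The sketch also applies the rule that window_time always equals window_end - 1 ms, which explains values such as 08:05:59.999 in the tables above. show_timestamp3 is a hypothetical helper that only imitates the format described in the note; it is not how the SQL Client formats values internally.

```python
from datetime import datetime, timedelta


def show_timestamp3(ts):
    """Hypothetical helper: render a timestamp with millisecond precision, like TIMESTAMP(3)."""
    return ts.isoformat(sep=" ", timespec="milliseconds")


if __name__ == "__main__":
    bidtime = datetime(2020, 4, 15, 8, 5)
    window_end = datetime(2020, 4, 15, 8, 6)
    window_time = window_end - timedelta(milliseconds=1)  # window_time = window_end - 1 ms
    print(show_timestamp3(bidtime))      # 2020-04-15 08:05:00.000
    print(show_timestamp3(window_time))  # 2020-04-15 08:05:59.999
```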
Source: Apache Flink website, "Window Functions | Apache Flink" (窗口函数)
------------------------------ Reproduction prohibited ------------------------------
To be revised...