sql执行语句流程解析
整个处理流程在exec_simple_query函数中完成,代码架构如下:
/* * exec_simple_query * * Execute a "simple Query" protocol message. */ static void exec_simple_query(const char *query_string) { ... //原始语法树获取 /* * Do basic parsing of the query or queries (this should be safe even if * we are in aborted transaction state!) */ parsetree_list = pg_parse_query(query_string); ... //循环处理sql语句 /* * Run through the raw parsetree(s) and process each one. */ foreach(parsetree_item, parsetree_list) { ... //对原始语法树进行分析和重写,生成查询语法树 querytree_list = pg_analyze_and_rewrite(parsetree, query_string, NULL, 0, NULL); //对查询语法树进行优化,生成执行计划 plantree_list = pg_plan_queries(querytree_list, CURSOR_OPT_PARALLEL_OK, NULL); ... //执行语句 /* * Run the portal to completion, and then drop it (and the receiver). */ (void) PortalRun(portal, FETCH_ALL, true, /* always top level */ true, receiver, receiver, completionTag); ... } ... }
分析重写后的查询树不是最优化的查询树,当碰到select子查询层次很深时,最低层的基表就和树根距离较远,这样就会增加查询时间。另外查询树中的各个节点信息是独立的,就有可能造成冗余查询,所以也需要做逻辑优化。所以,查询逻辑优化就是以数据库理论中的关系代数为理论基础,以查询树为载体,通过遍历查询树并在保证查询树中的语法单元的语义和最终结果不变的情况下对其进行优化;最终得到一个没有冗余的查询树。
------代码中在pg_plan_queries函数中实现
与逻辑优化对应的是物理优化,两者策略完全不同:
逻辑优化:类似于冗余和上提,下放优化策略
物理优化:基于代价的优化策略,在下文中介绍
语法优化处理基本步骤及相关代码:
List * pg_plan_queries(List *querytrees, int cursorOptions, ParamListInfo boundParams) { List *stmt_list = NIL; ListCell *query_list; foreach(query_list, querytrees) { Query *query = lfirst_node(Query, query_list); PlannedStmt *stmt; if (query->commandType == CMD_UTILITY) { /* Utility commands require no planning. */ stmt = makeNode(PlannedStmt); stmt->commandType = CMD_UTILITY; stmt->canSetTag = query->canSetTag; stmt->utilityStmt = query->utilityStmt; stmt->stmt_location = query->stmt_location; stmt->stmt_len = query->stmt_len; } else { stmt = pg_plan_query(query, cursorOptions, boundParams); } stmt_list = lappend(stmt_list, stmt); } return stmt_list; } PlannedStmt * pg_plan_query(Query *querytree, int cursorOptions, ParamListInfo boundParams) { PlannedStmt *plan; ... /* call the optimizer */ plan = planner(querytree, cursorOptions, boundParams); ... return plan; }
在planner函数中对非工具类语法进行处理,如果设置planner_hook则调用钩子函数,默认调用standard_planner函数处理。
standard_planner函数中递归处理查询树,查询时将结果存储在PlannerGlobal全局结果中。再调用create_plan函数根据PlannerInfo类型创建执行计划树。最后将PlannerGlobal和PlannerInfo中存储的基本信息转存到PlannedStmt中并返回。
PlannedStmt * planner(Query *parse, int cursorOptions, ParamListInfo boundParams) { PlannedStmt *result; if (planner_hook) result = (*planner_hook) (parse, cursorOptions, boundParams); else result = standard_planner(parse, cursorOptions, boundParams); return result; } PlannedStmt * standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams) { ... /* primary planning entry point (may recurse for subqueries) */ root = subquery_planner(glob, parse, NULL, false, tuple_fraction); /* Select best Path and turn it into a Plan */ final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL); best_path = get_cheapest_fractional_path(final_rel, tuple_fraction); top_plan = create_plan(root, best_path); ... /* build the PlannedStmt result */ result = makeNode(PlannedStmt); result->commandType = parse->commandType; ... result->stmt_len = parse->stmt_len; result->jitFlags = PGJIT_NONE; ... return result; }
在subquery_planner函数中生成,根据类型对查询语句进行分类处理。计划优化部分涉及子链接上提、函数处理、子查询上提等操作。因为计划树的生成和优化是按照类型分类处理并同时执行,所以这里放在一起介绍。
SS_process_ctes:处理查询语句中的CTE子句(with语句),CTE是一个临时的结果集;可以将子查询的结果作为一个独立的结果集使用。所以在函数处理时遍历ctelish列表,将其中的各个子结果集通过再调用subquery_planner函数进行处理,处理的结果存储在root->glob->subplans链表中。
void SS_process_ctes(PlannerInfo *root) { ListCell *lc; Assert(root->cte_plan_ids == NIL); foreach(lc, root->parse->cteList) { ... /* * Generate Paths for the CTE query. Always plan for full retrieval * --- we don't have enough info to predict otherwise. */ subroot = subquery_planner(root->glob, subquery, root, cte->cterecursive, 0.0); ... plan = create_plan(subroot, best_path); ... /* * Add the subplan and its PlannerInfo to the global lists. */ root->glob->subplans = lappend(root->glob->subplans, plan); root->glob->subroots = lappend(root->glob->subroots, subroot); ... } }
pull_up_sublinks:将命令中的 ANY(sub-SELECT) 和 EXISTS 转换为 JOIN 。这样能够将子链接和父查询进行合并,统一进行优化处理。
ANY语句转换为Semi-join语句,转换只适用于WHERE语句或者JOIN/ON语句。
EXISTS或者NOT EXISTS语句转换为Semi-join或者Anti-Semi-join。
子链接上提时,因为WHERE相关的节点信息存储在jointree中,所以会输入root->parse->jointree到pull_up_sublinks_jointree_recurse函数进行上提操作。pull_up_sublinks_jointree_recurse函数中检查jointree中存储的类型,并按照类型分类进行处理:
RangeTblRef:直接返回,不做优化
FromExpr:fromlist中包含两个域:基表信息(fromlist)和where条件表达式(quals);处理流程如下:
JoinExpr:joinexpr中包含两个域:左右基表信息(larg和rarg)和on约束条件(quals);处理流程如下:
相关代码如下:
static Node * pull_up_sublinks_jointree_recurse(PlannerInfo *root, Node *jtnode, Relids *relids) { if (jtnode == NULL) { *relids = NULL; } else if (IsA(jtnode, RangeTblRef)) { int varno = ((RangeTblRef *) jtnode)->rtindex; *relids = bms_make_singleton(varno); /* jtnode is returned unmodified */ } else if (IsA(jtnode, FromExpr)) { FromExpr *f = (FromExpr *) jtnode; List *newfromlist = NIL; Relids frelids = NULL; FromExpr *newf; Node *jtlink; ListCell *l; /* First, recurse to process children and collect their relids */ foreach(l, f->fromlist) { Node *newchild; Relids childrelids; newchild = pull_up_sublinks_jointree_recurse(root, lfirst(l), &childrelids); newfromlist = lappend(newfromlist, newchild); frelids = bms_join(frelids, childrelids); } /* Build the replacement FromExpr; no quals yet */ newf = makeFromExpr(newfromlist, NULL); /* Set up a link representing the rebuilt jointree */ jtlink = (Node *) newf; /* Now process qual --- all children are available for use */ newf->quals = pull_up_sublinks_qual_recurse(root, f->quals, &jtlink, frelids, NULL, NULL); /* * Note that the result will be either newf, or a stack of JoinExprs * with newf at the base. We rely on subsequent optimization steps to * flatten this and rearrange the joins as needed. * * Although we could include the pulled-up subqueries in the returned * relids, there's no need since upper quals couldn't refer to their * outputs anyway. */ *relids = frelids; jtnode = jtlink; } else if (IsA(jtnode, JoinExpr)) { JoinExpr *j; Relids leftrelids; Relids rightrelids; Node *jtlink; /* * Make a modifiable copy of join node, but don't bother copying its * subnodes (yet). */ j = (JoinExpr *) palloc(sizeof(JoinExpr)); memcpy(j, jtnode, sizeof(JoinExpr)); jtlink = (Node *) j; /* Recurse to process children and collect their relids */ j->larg = pull_up_sublinks_jointree_recurse(root, j->larg, &leftrelids); j->rarg = pull_up_sublinks_jointree_recurse(root, j->rarg, &rightrelids); /* * Now process qual, showing appropriate child relids as available, * and attach any pulled-up jointree items at the right place. In the * inner-join case we put new JoinExprs above the existing one (much * as for a FromExpr-style join). In outer-join cases the new * JoinExprs must go into the nullable side of the outer join. The * point of the available_rels machinations is to ensure that we only * pull up quals for which that's okay. * * We don't expect to see any pre-existing JOIN_SEMI or JOIN_ANTI * nodes here. */ switch (j->jointype) { case JOIN_INNER: j->quals = pull_up_sublinks_qual_recurse(root, j->quals, &jtlink, bms_union(leftrelids, rightrelids), NULL, NULL); break; case JOIN_LEFT: j->quals = pull_up_sublinks_qual_recurse(root, j->quals, &j->rarg, rightrelids, NULL, NULL); break; case JOIN_FULL: /* can't do anything with full-join quals */ break; case JOIN_RIGHT: j->quals = pull_up_sublinks_qual_recurse(root, j->quals, &j->larg, leftrelids, NULL, NULL); break; default: elog(ERROR, "unrecognized join type: %d", (int) j->jointype); break; } /* * Although we could include the pulled-up subqueries in the returned * relids, there's no need since upper quals couldn't refer to their * outputs anyway. But we *do* need to include the join's own rtindex * because we haven't yet collapsed join alias variables, so upper * levels would mistakenly think they couldn't use references to this * join. */ *relids = bms_join(leftrelids, rightrelids); if (j->rtindex) *relids = bms_add_member(*relids, j->rtindex); jtnode = jtlink; } else elog(ERROR, "unrecognized node type: %d", (int) nodeTag(jtnode)); return jtnode; }
由上述介绍可以知道基本的处理流程,子链接的基表都会被上提到顶层的基表链表中。
foreach(l, f->fromlist) { Node *newchild; Relids childrelids; //在这里获取基表链表信息 newchild = pull_up_sublinks_jointree_recurse(root, lfirst(l), &childrelids); newfromlist = lappend(newfromlist, newchild); frelids = bms_join(frelids, childrelids); } //这里创建一个新节点,并把新节点中基表链表赋值 /* Build the replacement FromExpr; no quals yet */ newf = makeFromExpr(newfromlist, NULL); /* Set up a link representing the rebuilt jointree */ jtlink = (Node *) newf; //这里给qulas赋值 /* Now process qual --- all children are available for use */ newf->quals = pull_up_sublinks_qual_recurse(root, f->quals, &jtlink, frelids, NULL, NULL); /* * Note that the result will be either newf, or a stack of JoinExprs * with newf at the base. We rely on subsequent optimization steps to * flatten this and rearrange the joins as needed. * * Although we could include the pulled-up subqueries in the returned * relids, there's no need since upper quals couldn't refer to their * outputs anyway. */ *relids = frelids; //这里返回新节点 jtnode = jtlink;
但是最终是怎么完成where条件表达式或者on约束条件上提的呢?下面再分析一下pull_up_sublinks_qual_recurse函数。在quals表达式中,有三种基本类型需要处理
sublink子链接语句:
NOT语句:为子链接时参考EXISTS_SUBLINK处理
AND、OR语句:遍历BoolExpr 节点,递归调用pull_up_sublinks_qual_recurse处理
实际转换流程执行,有以下限制:
var类型变量:指查询分析和查询优化中的基表目标列;或者表示子查询计划的输出结果
convert_ANY_sublink_to_join函数介绍:
JoinExpr * convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, Relids available_rels) { JoinExpr *result; Query *parse = root->parse; Query *subselect = (Query *) sublink->subselect; Relids upper_varnos; int rtindex; RangeTblEntry *rte; RangeTblRef *rtr; List *subquery_vars; Node *quals; ParseState *pstate; Assert(sublink->subLinkType == ANY_SUBLINK); /* * The sub-select must not refer to any Vars of the parent query. (Vars of * higher levels should be okay, though.) */ if (contain_vars_of_level((Node *) subselect, 1)) return NULL; /* * The test expression must contain some Vars of the parent query, else * it's not gonna be a join. (Note that it won't have Vars referring to * the subquery, rather Params.) */ upper_varnos = pull_varnos(sublink->testexpr); if (bms_is_empty(upper_varnos)) return NULL; /* * However, it can't refer to anything outside available_rels. */ if (!bms_is_subset(upper_varnos, available_rels)) return NULL; /* * The combining operators and left-hand expressions mustn't be volatile. */ if (contain_volatile_functions(sublink->testexpr)) return NULL; /* Create a dummy ParseState for addRangeTableEntryForSubquery */ pstate = make_parsestate(NULL); /* * Okay, pull up the sub-select into upper range table. * * We rely here on the assumption that the outer query has no references * to the inner (necessarily true, other than the Vars that we build * below). Therefore this is a lot easier than what pull_up_subqueries has * to go through. */ rte = addRangeTableEntryForSubquery(pstate, subselect, makeAlias("ANY_subquery", NIL), false, false); parse->rtable = lappend(parse->rtable, rte); rtindex = list_length(parse->rtable); /* * Form a RangeTblRef for the pulled-up sub-select. */ rtr = makeNode(RangeTblRef); rtr->rtindex = rtindex; /* * Build a list of Vars representing the subselect outputs. */ subquery_vars = generate_subquery_vars(root, subselect->targetList, rtindex); /* * Build the new join's qual expression, replacing Params with these Vars. */ quals = convert_testexpr(root, sublink->testexpr, subquery_vars); /* * And finally, build the JoinExpr node. */ result = makeNode(JoinExpr); result->jointype = JOIN_SEMI; result->isNatural = false; result->larg = NULL; /* caller must fill this in */ result->rarg = (Node *) rtr; result->usingClause = NIL; result->quals = quals; result->alias = NULL; result->rtindex = 0; /* we don't need an RTE for it */ return result; }
什么是半连接(SEMI-JOIN):一张表AA在另外一张表BB中找到匹配的记录,返回第一张表AA中满足条件的记录,且BB表记录不被返回。
什么是IN语句:AA IN BB 返回AA中满足BB条件的记录。
由此而知:
原代码为pull_up_subqueries
;原代码注释为Check to see if any subqueries in the jointree can be merged into this query
。
名称为子查询上提,实际是对pull_up_sublinks子链接上提操作后jointree树进行分析,尝试是否能否再进行优化。因为子链接上提操作未将子查询中的基表添加到父查询的基表(rtable链表)中。所以这里需要检查子查询是否能合并到父查询中。
具体操作为:检查jointree树中是否还存在别名的结果集,如果存在则替换为对应的查询语句的类型(RangeTblRef、FromExpr、JoinExpr)。
最终在pull_up_subqueries_recurse函数中实现上述流程;pull_up_subqueries_recurse函数介绍:函数对jointree进行解析,jointree中包含三种类型:
RangeTblRef:上提
FromExpr:遍历fromlist,递归调用pull_up_subqueries_recurse
JoinExpr:调用pull_up_subqueries_recurse函数处理左右节点,根据join类型修改相应的参数
static Node * pull_up_subqueries_recurse(PlannerInfo *root, Node *jtnode, JoinExpr *lowest_outer_join, JoinExpr *lowest_nulling_outer_join, AppendRelInfo *containing_appendrel) { Assert(jtnode != NULL); if (IsA(jtnode, RangeTblRef)) { int varno = ((RangeTblRef *) jtnode)->rtindex; RangeTblEntry *rte = rt_fetch(varno, root->parse->rtable); /* * Is this a subquery RTE, and if so, is the subquery simple enough to * pull up? * * If we are looking at an append-relation member, we can't pull it up * unless is_safe_append_member says so. */ if (rte->rtekind == RTE_SUBQUERY && is_simple_subquery(rte->subquery, rte, lowest_outer_join) && (containing_appendrel == NULL || is_safe_append_member(rte->subquery))) return pull_up_simple_subquery(root, jtnode, rte, lowest_outer_join, lowest_nulling_outer_join, containing_appendrel); /* * Alternatively, is it a simple UNION ALL subquery? If so, flatten * into an "append relation". * * It's safe to do this regardless of whether this query is itself an * appendrel member. (If you're thinking we should try to flatten the * two levels of appendrel together, you're right; but we handle that * in set_append_rel_pathlist, not here.) */ if (rte->rtekind == RTE_SUBQUERY && is_simple_union_all(rte->subquery)) return pull_up_simple_union_all(root, jtnode, rte); /* * Or perhaps it's a simple VALUES RTE? * * We don't allow VALUES pullup below an outer join nor into an * appendrel (such cases are impossible anyway at the moment). */ if (rte->rtekind == RTE_VALUES && lowest_outer_join == NULL && containing_appendrel == NULL && is_simple_values(root, rte)) return pull_up_simple_values(root, jtnode, rte); /* Otherwise, do nothing at this node. */ } else if (IsA(jtnode, FromExpr)) { FromExpr *f = (FromExpr *) jtnode; ListCell *l; Assert(containing_appendrel == NULL); /* Recursively transform all the child nodes */ foreach(l, f->fromlist) { lfirst(l) = pull_up_subqueries_recurse(root, lfirst(l), lowest_outer_join, lowest_nulling_outer_join, NULL); } } else if (IsA(jtnode, JoinExpr)) { JoinExpr *j = (JoinExpr *) jtnode; Assert(containing_appendrel == NULL); /* Recurse, being careful to tell myself when inside outer join */ switch (j->jointype) { case JOIN_INNER: j->larg = pull_up_subqueries_recurse(root, j->larg, lowest_outer_join, lowest_nulling_outer_join, NULL); j->rarg = pull_up_subqueries_recurse(root, j->rarg, lowest_outer_join, lowest_nulling_outer_join, NULL); break; case JOIN_LEFT: case JOIN_SEMI: case JOIN_ANTI: j->larg = pull_up_subqueries_recurse(root, j->larg, j, lowest_nulling_outer_join, NULL); j->rarg = pull_up_subqueries_recurse(root, j->rarg, j, j, NULL); break; case JOIN_FULL: j->larg = pull_up_subqueries_recurse(root, j->larg, j, j, NULL); j->rarg = pull_up_subqueries_recurse(root, j->rarg, j, j, NULL); break; case JOIN_RIGHT: j->larg = pull_up_subqueries_recurse(root, j->larg, j, j, NULL); j->rarg = pull_up_subqueries_recurse(root, j->rarg, j, lowest_nulling_outer_join, NULL); break; default: elog(ERROR, "unrecognized join type: %d", (int) j->jointype); break; } } else elog(ERROR, "unrecognized node type: %d", (int) nodeTag(jtnode)); return jtnode; }
/* * If this is a simple UNION ALL query, flatten it into an appendrel. We * do this now because it requires applying pull_up_subqueries to the leaf * queries of the UNION ALL, which weren't touched above because they * weren't referenced by the jointree (they will be after we do this). */ if (parse->setOperations) flatten_simple_union_all(root);
/* * Preprocess RowMark information. We need to do this after subquery * pullup, so that all base relations are present. */ preprocess_rowmarks(root);
目标列处理、withCheckOptions处理、RETURN表达式处理、HAVING语句处理、WINDOWS语句处理、LIMIT OFF语句处理
都调用preprocess_expression函数进行处理。
处理流程
static Node * preprocess_expression(PlannerInfo *root, Node *expr, int kind) { /* * Fall out quickly if expression is empty. This occurs often enough to * be worth checking. Note that null->null is the correct conversion for * implicit-AND result format, too. */ if (expr == NULL) return NULL; /* * If the query has any join RTEs, replace join alias variables with * base-relation variables. We must do this first, since any expressions * we may extract from the joinaliasvars lists have not been preprocessed. * For example, if we did this after sublink processing, sublinks expanded * out from join aliases would not get processed. But we can skip this in * non-lateral RTE functions, VALUES lists, and TABLESAMPLE clauses, since * they can't contain any Vars of the current query level. */ if (root->hasJoinRTEs && !(kind == EXPRKIND_RTFUNC || kind == EXPRKIND_VALUES || kind == EXPRKIND_TABLESAMPLE || kind == EXPRKIND_TABLEFUNC)) expr = flatten_join_alias_vars(root->parse, expr); /* * Simplify constant expressions. * * Note: an essential effect of this is to convert named-argument function * calls to positional notation and insert the current actual values of * any default arguments for functions. To ensure that happens, we *must* * process all expressions here. Previous PG versions sometimes skipped * const-simplification if it didn't seem worth the trouble, but we can't * do that anymore. * * Note: this also flattens nested AND and OR expressions into N-argument * form. All processing of a qual expression after this point must be * careful to maintain AND/OR flatness --- that is, do not generate a tree * with AND directly under AND, nor OR directly under OR. */ expr = eval_const_expressions(root, expr); /* * If it's a qual or havingQual, canonicalize it. */ if (kind == EXPRKIND_QUAL) { expr = (Node *) canonicalize_qual((Expr *) expr, false); #ifdef OPTIMIZER_DEBUG printf("After canonicalize_qual()\n"); pprint(expr); #endif } /* Expand SubLinks to SubPlans */ if (root->parse->hasSubLinks) expr = SS_process_sublinks(root, expr, (kind == EXPRKIND_QUAL)); /* * XXX do not insert anything here unless you have grokked the comments in * SS_replace_correlation_vars ... */ /* Replace uplevel vars with Param nodes (this IS possible in VALUES) */ if (root->query_level > 1) expr = SS_replace_correlation_vars(root, expr); /* * If it's a qual or havingQual, convert it to implicit-AND format. (We * don't want to do this before eval_const_expressions, since the latter * would be unable to simplify a top-level AND correctly. Also, * SS_process_sublinks expects explicit-AND format.) */ if (kind == EXPRKIND_QUAL) expr = (Node *) make_ands_implicit((Expr *) expr); return expr; }
上述流程的多个类型转换环节最终都会调用 XXX_XXX_mutator函数。XXX_XXX_mutator函数根据各个类型来实现分类转换。
这里主要介绍一下SS_process_sublinks流程中调用的process_sublinks_mutator函数,因为子链接中的节点为不确定类型,所以再函数调用时也会根据类型实行分类处理,当类型都不满足时,调用expression_tree_mutator函数进行处理。
相关代码如下:
static Node * process_sublinks_mutator(Node *node, process_sublinks_context *context) { process_sublinks_context locContext; locContext.root = context->root; if (node == NULL) return NULL; if (IsA(node, SubLink)) { 。。。 /* * Now build the SubPlan node and make the expr to return. */ return make_subplan(context->root, (Query *) sublink->subselect, sublink->subLinkType, sublink->subLinkId, testexpr, context->isTopQual); } /* * Don't recurse into the arguments of an outer PHV or aggregate here. Any * SubLinks in the arguments have to be dealt with at the outer query * level; they'll be handled when build_subplan collects the PHV or Aggref * into the arguments to be passed down to the current subplan. */ if (IsA(node, PlaceHolderVar)) { if (((PlaceHolderVar *) node)->phlevelsup > 0) return node; } else if (IsA(node, Aggref)) { if (((Aggref *) node)->agglevelsup > 0) return node; } /* * We should never see a SubPlan expression in the input (since this is * the very routine that creates 'em to begin with). We shouldn't find * ourselves invoked directly on a Query, either. */ Assert(!IsA(node, SubPlan)); Assert(!IsA(node, AlternativeSubPlan)); Assert(!IsA(node, Query)); /* * Because make_subplan() could return an AND or OR clause, we have to * take steps to preserve AND/OR flatness of a qual. We assume the input * has been AND/OR flattened and so we need no recursion here. * * (Due to the coding here, we will not get called on the List subnodes of * an AND; and the input is *not* yet in implicit-AND format. So no check * is needed for a bare List.) * * Anywhere within the top-level AND/OR clause structure, we can tell * make_subplan() that NULL and FALSE are interchangeable. So isTopQual * propagates down in both cases. (Note that this is unlike the meaning * of "top level qual" used in most other places in Postgres.) */ if (is_andclause(node)) { ... return (Node *) make_andclause(newargs); } if (is_orclause(node)) { ... return (Node *) make_orclause(newargs); } /* * If we recurse down through anything other than an AND or OR node, we * are definitely not at top qual level anymore. */ locContext.isTopQual = false; return expression_tree_mutator(node, process_sublinks_mutator, (void *) &locContext); }
make_subplan函数流程如下:
完成子计划创建后,返回。
调用preprocess_qual_conditions函数遍历jointree节点,依据节点基础类型查找qual节点,并调用preprocess_expression函数对qual节点进行处理,:
reduce_outer_joins
/* * reduce_outer_joins * Attempt to reduce outer joins to plain inner joins. * * The idea here is that given a query like * SELECT ... FROM a LEFT JOIN b ON (...) WHERE b.y = 42; * we can reduce the LEFT JOIN to a plain JOIN if the "=" operator in WHERE * is strict. The strict operator will always return NULL, causing the outer * WHERE to fail, on any row where the LEFT JOIN filled in NULLs for b's * columns. Therefore, there's no need for the join to produce null-extended * rows in the first place --- which makes it a plain join not an outer join. * (This scenario may not be very likely in a query written out by hand, but * it's reasonably likely when pushing quals down into complex views.) * * More generally, an outer join can be reduced in strength if there is a * strict qual above it in the qual tree that constrains a Var from the * nullable side of the join to be non-null. (For FULL joins this applies * to each side separately.) * * Another transformation we apply here is to recognize cases like * SELECT ... FROM a LEFT JOIN b ON (a.x = b.y) WHERE b.y IS NULL; * If the join clause is strict for b.y, then only null-extended rows could * pass the upper WHERE, and we can conclude that what the query is really * specifying is an anti-semijoin. We change the join type from JOIN_LEFT * to JOIN_ANTI. The IS NULL clause then becomes redundant, and must be * removed to prevent bogus selectivity calculations, but we leave it to * distribute_qual_to_rels to get rid of such clauses. * * Also, we get rid of JOIN_RIGHT cases by flipping them around to become * JOIN_LEFT. This saves some code here and in some later planner routines, * but the main reason to do it is to not need to invent a JOIN_REVERSE_ANTI * join type. * * To ease recognition of strict qual clauses, we require this routine to be * run after expression preprocessing (i.e., qual canonicalization and JOIN * alias-var expansion). */ void reduce_outer_joins(PlannerInfo *root) { reduce_outer_joins_state *state; /* * To avoid doing strictness checks on more quals than necessary, we want * to stop descending the jointree as soon as there are no outer joins * below our current point. This consideration forces a two-pass process. * The first pass gathers information about which base rels appear below * each side of each join clause, and about whether there are outer * join(s) below each side of each join clause. The second pass examines * qual clauses and changes join types as it descends the tree. */ state = reduce_outer_joins_pass1((Node *) root->parse->jointree); /* planner.c shouldn't have called me if no outer joins */ if (state == NULL || !state->contains_outer) elog(ERROR, "so where are the outer joins?"); reduce_outer_joins_pass2((Node *) root->parse->jointree, state, root, NULL, NIL, NIL); }
grouping_planner函数 ,首先处理LIMIT、ORDER BY、GROUP BY语句,然后根据setOperations(集合操作语句)值判断是否为UNION/INTERSECT/EXCEPT语句还是普通语句:
处理LIMIT语句
处理UNION/INTERSECT/EXCEPT语句,调用plan_set_operations函数处理,内部进行分类处理
处理普通语句,按照基本流程就行处理
static void
grouping_planner(PlannerInfo *root, bool inheritance_update,
double tuple_fraction)
{
Query *parse = root->parse;
int64 offset_est = 0;
int64 count_est = 0;
double limit_tuples = -1.0;
bool have_postponed_srfs = false;
PathTarget *final_target;
List *final_targets;
List *final_targets_contain_srfs;
bool final_target_parallel_safe;
RelOptInfo *current_rel;
RelOptInfo *final_rel;
FinalPathExtraData extra;
ListCell *lc;
static void grouping_planner(PlannerInfo *root, bool inheritance_update, double tuple_fraction) { ... //处理LIMIT语句 /* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */ if (parse->limitCount || parse->limitOffset) { tuple_fraction = preprocess_limit(root, tuple_fraction, &offset_est, &count_est); /* * If we have a known LIMIT, and don't have an unknown OFFSET, we can * estimate the effects of using a bounded sort. */ if (count_est > 0 && offset_est >= 0) limit_tuples = (double) count_est + (double) offset_est; } /* Make tuple_fraction accessible to lower-level routines */ root->tuple_fraction = tuple_fraction; if (parse->setOperations) { ... //处理UNION/INTERSECT/EXCEPT语句 /* * Construct Paths for set operations. The results will not need any * work except perhaps a top-level sort and/or LIMIT. Note that any * special work for recursive unions is the responsibility of * plan_set_operations. */ current_rel = plan_set_operations(root); ... } else { //普通语句处理 ... } ... }
除开特殊语句,其他的语句都执行普通执行流程:
注意:阅读代码前需要对查询引擎原理进行了解,不然不知道为什么这么做。
query_planner函数处理时分为普通语句和fromlist链表长度为1的语句("SELECT expression" and "INSERT ... VALUES()"
)–该类型调用函数直接处理并返回结果。重点讲一下普通语句的处理流程(普通查询会有三个要素:数据源、输出结果、查询条件,下面依次进行填充):
setup_simple_rel_arrays收集基表信息:从root->parse->rtable表设置root->append_rel_array表。
add_base_rels_to_query构建RelOptInfo数组(基表信息)(设置数据源):根据jointree类型创建RelOptInfo数组,将RelOptInfo数据存放再root->simple_rel_array中。简单来说就是创建基表的数组,再填充基表中的数据源。
build_base_rel_tlists设置目标列(设置输出结果):设置查询语句的输出结果,遍历target list(这里入参为root->processed_tlist),查找出所有的Var类型节点并添加到Var所属基表的RelOptInfo的reltargetlist中(已存在则不做重复添加)。简单来说就是填充基表的输出结果;将列名和数据源绑定:(select a1 , b1 from aa, bb where aa.a1 = bb.b1;------将a1和aa的关系绑定起来)
deconstruct_jointree设置约束条件(设置查询条件):调用deconstruct_recurse函数按照类型进行处理:
reconsider_outer_join_clauses处理外连接:
generate_base_implied_equalities创建约束条件 :将clause中所有的约束添加进行优化,并将clause中约束条件与基表信息RelOptInfo进行绑定。
(*qp_callback) (root, qp_extra)-----standard_qp_callback回调:处理排序的pathkey
create_lateral_join_info构建lateraljoin信息:子查询中的处理,略