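When you first get to know a new language, one that may differ considerably from the familiar C/C++, the first thing to understand is the overall structure of its grammar: for example, what a source file looks like as a whole.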
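At first glance, the official documentation describes the overall structure of a C# source file (compilation unit) mainly in prose. The formal grammar listed below that prose is not laid out top-down either. This makes it hard for a newcomer to get a picture of the language's overall structure.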
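Fortunately there is the open-source DotGNU project. Its official documentation states that the project was formally decommissioned in 2012, and it may have stopped receiving updates even earlier. Its grammar file has no support for lambda expressions, an important language feature. I am not sure whether the project simply never added them or whether C# did not have the feature yet when the project started (lambda expressions were introduced in C# 3.0, released in 2007).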
As of December 2012, the DotGNU project has been decommissioned, until and unless a substantial new volunteer effort arises. The exception is the libjit component, which is now a separate libjit package.
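Although the project is quite old, its grammar is written in classic yacc, which is the most intuitive form for understanding the overall structure. The description of the outer level is roughly as follows. According to it, the top level of a source file can only contain `using`, `namespace`, `class`, `enum`, `struct`, `module`, `interface` and `delegate` declarations, plus assembly attributes (`module` is not part of standard C#; it appears to be a DotGNU-specific extension).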
```yacc
///@file: DotGnu\pnet\cscc\csharp\cs_grammar.y
/*
 * Outer level of the C# input file.
 */

CompilationUnit
    : /* empty */   {
                /* The input file is empty */
                CCTypedWarning("-empty-input",
                               "file contains no declarations");
                ResetState();
            }
    | OuterDeclarationsRecoverable  {
                /* Check for empty input and finalize the parse */
                if(!HaveDecls)
                {
                    CCTypedWarning("-empty-input",
                                   "file contains no declarations");
                }
                ResetState();
            }
    | OuterDeclarationsRecoverable NonOptAttributes {
                /* A file that contains declarations and assembly attributes */
                if($2)
                {
                    InitGlobalNamespace();
                    CCPluginAddStandaloneAttrs
                        (ILNode_StandaloneAttr_create
                            ((ILNode*)CurrNamespaceNode, $2));
                }
                ResetState();
            }
    | NonOptAttributes  {
                /* A file that contains only assembly attributes */
                if($1)
                {
                    InitGlobalNamespace();
                    CCPluginAddStandaloneAttrs
                        (ILNode_StandaloneAttr_create
                            ((ILNode*)CurrNamespaceNode, $1));
                }
                ResetState();
            }
    ;

/*
 * Note: strictly speaking, declarations should be ordered so
 * that using declarations always come before namespace members.
 * We have relaxed this to make error recovery easier.
 */
OuterDeclarations
    : OuterDeclaration
    | OuterDeclarations OuterDeclaration
    ;

OuterDeclaration
    : UsingDirective
    | NamespaceMemberDeclaration
    | error     {
                /*
                 * This production recovers from errors at the outer level
                 * by skipping invalid tokens until a namespace, using,
                 * type declaration, or attribute, is encountered.
                 */
#ifdef YYEOF
                while(yychar != YYEOF)
#else
                while(yychar >= 0)
#endif
                {
                    if(yychar == NAMESPACE || yychar == USING ||
                       yychar == PUBLIC || yychar == INTERNAL ||
                       yychar == UNSAFE || yychar == SEALED ||
                       yychar == ABSTRACT || yychar == CLASS ||
                       yychar == STRUCT || yychar == DELEGATE ||
                       yychar == ENUM || yychar == INTERFACE ||
                       yychar == '[')
                    {
                        /* This token starts a new outer-level declaration */
                        break;
                    }
                    else if(yychar == '}' && CurrNamespace.len != 0)
                    {
                        /* Probably the end of the enclosing namespace */
                        break;
                    }
                    else if(yychar == ';')
                    {
                        /* Probably the end of an outer-level declaration,
                           so restart the parser on the next token */
                        yychar = YYLEX;
                        break;
                    }
                    yychar = YYLEX;
                }
#ifdef YYEOF
                if(yychar != YYEOF)
#else
                if(yychar >= 0)
#endif
                {
                    yyerrok;
                }
                NestingLevel = 0;
            }
    ;

///....

OptNamespaceMemberDeclarations
    : /* empty */
    | OuterDeclarations
    ;

NamespaceMemberDeclaration
    : NamespaceDeclaration
    | TypeDeclaration           { CCPluginAddTopLevel($1); }
    ;

TypeDeclaration
    : ClassDeclaration          { $$ = $1; }
    | ModuleDeclaration         { $$ = $1; }
    | StructDeclaration         { $$ = $1; }
    | InterfaceDeclaration      { $$ = $1; }
    | EnumDeclaration           { $$ = $1; }
    | DelegateDeclaration       { $$ = $1; }
    ;
```
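Microsoft has open-sourced its C# implementation (Roslyn), so the most authoritative answer should come from Microsoft's own code. Unfortunately, that project is itself written in C#, so its parser is not described by a yacc file. It is a hand-written parser. My guess is that the relevant code is the following: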
```csharp
///@file: roslyn\src\Compilers\CSharp\Portable\Parser
internal CompilationUnitSyntax ParseCompilationUnitCore()
{
    SyntaxToken? tmp = null;
    SyntaxListBuilder? initialBadNodes = null;
    var body = new NamespaceBodyBuilder(_pool);
    try
    {
        this.ParseNamespaceBody(ref tmp, ref body, ref initialBadNodes, SyntaxKind.CompilationUnit);

        var eof = this.EatToken(SyntaxKind.EndOfFileToken);
        var result = _syntaxFactory.CompilationUnit(body.Externs, body.Usings, body.Attributes, body.Members, eof);

        if (initialBadNodes != null)
        {
            // attach initial bad nodes as leading trivia on first token
            result = AddLeadingSkippedSyntax(result, initialBadNodes.ToListNode());
            _pool.Free(initialBadNodes);
        }

        return result;
    }
    finally
    {
        body.Free(_pool);
    }
}

private void ParseNamespaceBody(
    [NotNullIfNotNull(nameof(openBraceOrSemicolon))] ref SyntaxToken? openBraceOrSemicolon,
    ref NamespaceBodyBuilder body,
    ref SyntaxListBuilder? initialBadNodes,
    SyntaxKind parentKind)
{
    // "top-level" expressions and statements should never occur inside an asynchronous context
    Debug.Assert(!IsInAsync);

    bool isGlobal = openBraceOrSemicolon == null;

    var saveTerm = _termState;
    _termState |= TerminatorState.IsNamespaceMemberStartOrStop;
    NamespaceParts seen = NamespaceParts.None;
    var pendingIncompleteMembers = _pool.Allocate<MemberDeclarationSyntax>();
    bool reportUnexpectedToken = true;

    try
    {
        while (true)
        {
            switch (this.CurrentToken.Kind)
            {
                case SyntaxKind.NamespaceKeyword:
                    // incomplete members must be processed before we add any nodes to the body:
                    AddIncompleteMembers(ref pendingIncompleteMembers, ref body);

                    var attributeLists = _pool.Allocate<AttributeListSyntax>();
                    var modifiers = _pool.Allocate();

                    body.Members.Add(adjustStateAndReportStatementOutOfOrder(ref seen, this.ParseNamespaceDeclaration(attributeLists, modifiers)));

                    _pool.Free(attributeLists);
                    _pool.Free(modifiers);

                    reportUnexpectedToken = true;
                    break;

                case SyntaxKind.CloseBraceToken:
                    // A very common user error is to type an additional }
                    // somewhere in the file. This will cause us to stop parsing
                    // the root (global) namespace too early and will make the
                    // rest of the file unparseable and unusable by intellisense.
                    // We detect that case here and we skip the close curly and
                    // continue parsing as if we did not see the }
                    if (isGlobal)
                    {
                        // incomplete members must be processed before we add any nodes to the body:
                        ReduceIncompleteMembers(ref pendingIncompleteMembers, ref openBraceOrSemicolon, ref body, ref initialBadNodes);

                        var token = this.EatToken();
                        token = this.AddError(token,
                            IsScript ? ErrorCode.ERR_GlobalDefinitionOrStatementExpected : ErrorCode.ERR_EOFExpected);

                        this.AddSkippedNamespaceText(ref openBraceOrSemicolon, ref body, ref initialBadNodes, token);
                        reportUnexpectedToken = true;
                        break;
                    }
                    else
                    {
                        // This token marks the end of a namespace body
                        return;
                    }

                case SyntaxKind.EndOfFileToken:
                    // This token marks the end of a namespace body
                    return;

                case SyntaxKind.ExternKeyword:
                    if (isGlobal && !ScanExternAliasDirective())
                    {
                        // extern member or a local function
                        goto default;
                    }
                    else
                    {
                        // incomplete members must be processed before we add any nodes to the body:
                        ReduceIncompleteMembers(ref pendingIncompleteMembers, ref openBraceOrSemicolon, ref body, ref initialBadNodes);

                        var @extern = ParseExternAliasDirective();
                        if (seen > NamespaceParts.ExternAliases)
                        {
                            @extern = this.AddErrorToFirstToken(@extern, ErrorCode.ERR_ExternAfterElements);
                            this.AddSkippedNamespaceText(ref openBraceOrSemicolon, ref body, ref initialBadNodes, @extern);
                        }
                        else
                        {
                            body.Externs.Add(@extern);
                            seen = NamespaceParts.ExternAliases;
                        }

                        reportUnexpectedToken = true;
                        break;
                    }

                case SyntaxKind.UsingKeyword:
                    if (isGlobal && (this.PeekToken(1).Kind == SyntaxKind.OpenParenToken || (!IsScript && IsPossibleTopLevelUsingLocalDeclarationStatement())))
                    {
                        // Top-level using statement or using local declaration
                        goto default;
                    }
                    else
                    {
                        parseUsingDirective(ref openBraceOrSemicolon, ref body, ref initialBadNodes, ref seen, ref pendingIncompleteMembers);
                    }

                    reportUnexpectedToken = true;
                    break;

                case SyntaxKind.IdentifierToken:
                    if (this.CurrentToken.ContextualKind != SyntaxKind.GlobalKeyword || this.PeekToken(1).Kind != SyntaxKind.UsingKeyword)
                    {
                        goto default;
                    }
                    else
                    {
                        parseUsingDirective(ref openBraceOrSemicolon, ref body, ref initialBadNodes, ref seen, ref pendingIncompleteMembers);
                    }

                    reportUnexpectedToken = true;
                    break;

                case SyntaxKind.OpenBracketToken:
                    if (this.IsPossibleGlobalAttributeDeclaration())
                    {
                        // incomplete members must be processed before we add any nodes to the body:
                        ReduceIncompleteMembers(ref pendingIncompleteMembers, ref openBraceOrSemicolon, ref body, ref initialBadNodes);

                        var attribute = this.ParseAttributeDeclaration();
                        if (!isGlobal || seen > NamespaceParts.GlobalAttributes)
                        {
                            RoslynDebug.Assert(attribute.Target != null, "Must have a target as IsPossibleGlobalAttributeDeclaration checks for that");
                            attribute = this.AddError(attribute, attribute.Target.Identifier, ErrorCode.ERR_GlobalAttributesNotFirst);
                            this.AddSkippedNamespaceText(ref openBraceOrSemicolon, ref body, ref initialBadNodes, attribute);
                        }
                        else
                        {
                            body.Attributes.Add(attribute);
                            seen = NamespaceParts.GlobalAttributes;
                        }

                        reportUnexpectedToken = true;
                        break;
                    }

                    goto default;

                default:
                    var memberOrStatement = isGlobal ? this.ParseMemberDeclarationOrStatement(parentKind) : this.ParseMemberDeclaration(parentKind);
                    if (memberOrStatement == null)
                    {
                        // incomplete members must be processed before we add any nodes to the body:
                        ReduceIncompleteMembers(ref pendingIncompleteMembers, ref openBraceOrSemicolon, ref body, ref initialBadNodes);

                        // eat one token and try to parse declaration or statement again:
                        var skippedToken = EatToken();
                        if (reportUnexpectedToken && !skippedToken.ContainsDiagnostics)
                        {
                            skippedToken = this.AddError(skippedToken,
                                IsScript ? ErrorCode.ERR_GlobalDefinitionOrStatementExpected : ErrorCode.ERR_EOFExpected);

                            // do not report the error multiple times for subsequent tokens:
                            reportUnexpectedToken = false;
                        }

                        this.AddSkippedNamespaceText(ref openBraceOrSemicolon, ref body, ref initialBadNodes, skippedToken);
                    }
                    else if (memberOrStatement.Kind == SyntaxKind.IncompleteMember && seen < NamespaceParts.MembersAndStatements)
                    {
                        pendingIncompleteMembers.Add(memberOrStatement);
                        reportUnexpectedToken = true;
                    }
                    else
                    {
                        // incomplete members must be processed before we add any nodes to the body:
                        AddIncompleteMembers(ref pendingIncompleteMembers, ref body);

                        body.Members.Add(adjustStateAndReportStatementOutOfOrder(ref seen, memberOrStatement));
                        reportUnexpectedToken = true;
                    }

                    break;
            }
        }
    }
    finally
    {
        _termState = saveTerm;

        // adds pending incomplete nodes:
        AddIncompleteMembers(ref pendingIncompleteMembers, ref body);
        _pool.Free(pendingIncompleteMembers);
    }

    MemberDeclarationSyntax adjustStateAndReportStatementOutOfOrder(ref NamespaceParts seen, MemberDeclarationSyntax memberOrStatement)
    {
        switch (memberOrStatement.Kind)
        {
            case SyntaxKind.GlobalStatement:
                if (seen < NamespaceParts.MembersAndStatements)
                {
                    seen = NamespaceParts.MembersAndStatements;
                }
                else if (seen == NamespaceParts.TypesAndNamespaces)
                {
                    seen = NamespaceParts.TopLevelStatementsAfterTypesAndNamespaces;

                    if (!IsScript)
                    {
                        memberOrStatement = this.AddError(memberOrStatement, ErrorCode.ERR_TopLevelStatementAfterNamespaceOrType);
                    }
                }

                break;

            case SyntaxKind.NamespaceDeclaration:
            case SyntaxKind.FileScopedNamespaceDeclaration:
            case SyntaxKind.EnumDeclaration:
            case SyntaxKind.StructDeclaration:
            case SyntaxKind.ClassDeclaration:
            case SyntaxKind.InterfaceDeclaration:
            case SyntaxKind.DelegateDeclaration:
            case SyntaxKind.RecordDeclaration:
            case SyntaxKind.RecordStructDeclaration:
                if (seen < NamespaceParts.TypesAndNamespaces)
                {
                    seen = NamespaceParts.TypesAndNamespaces;
                }
                break;

            default:
                if (seen < NamespaceParts.MembersAndStatements)
                {
                    seen = NamespaceParts.MembersAndStatements;
                }
                break;
        }

        return memberOrStatement;
    }

    void parseUsingDirective(
        ref SyntaxToken? openBrace,
        ref NamespaceBodyBuilder body,
        ref SyntaxListBuilder? initialBadNodes,
        ref NamespaceParts seen,
        ref SyntaxListBuilder<MemberDeclarationSyntax> pendingIncompleteMembers)
    {
        // incomplete members must be processed before we add any nodes to the body:
        ReduceIncompleteMembers(ref pendingIncompleteMembers, ref openBrace, ref body, ref initialBadNodes);

        var @using = this.ParseUsingDirective();
        if (seen > NamespaceParts.Usings)
        {
            @using = this.AddError(@using, ErrorCode.ERR_UsingAfterElements);
            this.AddSkippedNamespaceText(ref openBrace, ref body, ref initialBadNodes, @using);
        }
        else
        {
            body.Usings.Add(@using);
            seen = NamespaceParts.Usings;
        }
    }
}
```
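Because this hand-written parser code is rather hard to follow, I went back to the official C# language specification. It does describe the entry point of a compilation unit; it is just buried deep enough that I missed it at first (*sweat*). The topmost production is compilation_unit, and from there each level is described and refined in turn. Judging by this grammar, the top level really can contain only, in this order, extern alias directives, using directives and global attributes, followed by namespace declarations and type_declarations.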
```antlr
// Source: §14.2 Compilation units
compilation_unit
    : extern_alias_directive* using_directive* global_attributes?
      namespace_member_declaration*
    ;

// Source: §22.3 Attribute specification
global_attributes
    : global_attribute_section+
    ;

// Source: §14.6 Namespace member declarations
namespace_member_declaration
    : namespace_declaration
    | type_declaration
    ;

// Source: §14.7 Type declarations
type_declaration
    : class_declaration
    | struct_declaration
    | interface_declaration
    | enum_declaration
    | delegate_declaration
    ;

// Source: §14.3 Namespace declarations
namespace_declaration
    : 'namespace' qualified_identifier namespace_body ';'?
    ;

global_attribute_section
    : '[' global_attribute_target_specifier attribute_list ']'
    | '[' global_attribute_target_specifier attribute_list ',' ']'
    ;
```
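Among the many kinds of expressions, the lambda is one of the handiest and appears frequently in many projects, so its grammar is worth a look. The key token is `=>`. What precedes it is either an explicit signature (explicit_anonymous_function_signature), a parenthesized parameter list whose parameters carry types such as `(int a, int b)`, or an implicit signature (implicit_anonymous_function_signature), which omits the types and is either a parenthesized list of names such as `(a, b)` or a single bare identifier such as `a`. This construct is hard to express in yacc because it depends heavily on context: when the parser sees `(a, b)`, it cannot tell whether this is a parenthesized expression (or a tuple) or a lambda parameter list until it reaches the `=>` after the closing parenthesis, which can be arbitrarily far away. That is more lookahead than an LALR(1) parser has.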
```antlr
// Source: §12.19.1 General
lambda_expression
    : 'async'? anonymous_function_signature '=>' anonymous_function_body
    ;

anonymous_function_signature
    : explicit_anonymous_function_signature
    | implicit_anonymous_function_signature
    ;

explicit_anonymous_function_signature
    : '(' explicit_anonymous_function_parameter_list? ')'
    ;

implicit_anonymous_function_signature
    : '(' implicit_anonymous_function_parameter_list? ')'
    | implicit_anonymous_function_parameter
    ;

implicit_anonymous_function_parameter_list
    : implicit_anonymous_function_parameter
      (',' implicit_anonymous_function_parameter)*
    ;

implicit_anonymous_function_parameter
    : identifier
    ;
```
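Searching the grammar for `'=>'` shows that it is used in places other than lambda expressions, for example local_function_body. Given the same arrow, how do we tell a lambda expression from a local function? The surrounding grammar answers this. In a local function, `=>` is preceded by a header that starts with a return type (return_type), followed by a name and a parameter list, e.g. `int localfunc() => x + y;`. A lambda, by contrast, appears in expression position, and its signature is only a bare identifier (implicit_anonymous_function_parameter) or a parenthesized parameter list, never a type followed by a name.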
Once again we see that C# is hard to describe with a general-purpose grammar tool such as yacc.
```antlr
// Source: §13.6.4 Local function declarations
local_function_declaration
    : local_function_header local_function_body
    ;

local_function_header
    : local_function_modifier* return_type identifier type_parameter_list?
      '(' formal_parameter_list? ')' type_parameter_constraints_clause*
    ;

local_function_modifier
    : 'async'
    | 'unsafe'
    ;

local_function_body
    : block
    | '=>' null_conditional_invocation_expression ';'
    | '=>' expression ';'
    ;
```
A direct consequence of this top-level grammar: there is no notion of a "global variable" as in C/C++.
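Since there are no global variables or functions, there is also no global `main` entry point as in C/C++. The entry point of an application has to be a method named `Main` declared inside some type (any class will do, and a struct works as well), and the language requires it to be `static`. It does not have to be `public`: the accessibility of the entry point does not matter. (Since C# 9, top-level statements can replace an explicit `Main`; the compiler then generates the entry point itself, and this is what the `GlobalStatement` handling in the Roslyn code above deals with.)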
Syntactically, a namespace is not required: declarations that are not placed in any namespace go into the global namespace, just as in C++.
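However, code written according to the grammar is not necessarily legal. For example, most of the following code, written by following the grammar, is rejected by the compiler :-( Programming is hard...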
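```csharp
using System;

// error CS0116: A namespace cannot directly contain members such as fields or methods
// (with pre-C# 9 compilers; C# 9+ treats this line as a top-level statement instead)
int leela = 1;

namespace harry
{
    class harry
    {
        public static int fry(int x, int y)
        {
            int localfunc() => x + y;
            // error CS0201: Only assignment, call, increment, decrement, and new object expressions can be used as a statement
            z => z + 1;
            // error CS0149: Method name expected
            int dd = ((int a) => a + 1)(1);
            return localfunc();
        }

        public static int Main()
        {
            return fry(3, 7);
        }
    };
}

namespace tsecer
{
    // error CS0116: A namespace cannot directly contain members such as fields or methods
    void tsecer(){}
}
```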