Reposted from: https://www.ucloud.cn/yun/37950.html
As is well known, Python's exec function can execute a string that contains Python source code:
>>> code = """
... a = "hello"
... print(a)
... """
>>> exec(code)
hello
>>> a
'hello'
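This capability is powerful, so use exec with caution. If you really have to use it, keep the following security issues in mind.
Global variables and built-in functions
By default, code run by exec can read the local and global variables that are visible where exec is called, and it can also modify the global variables. If the executed code is generated from user-submitted data, this default behavior is a security risk.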
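A minimal sketch of that risk follows. The function handle_request, the variable api_key, and the global name INJECTED are hypothetical names chosen for illustration; the only point is that exec without namespace arguments shares the caller's namespaces.

```python
def handle_request(user_source):
    api_key = "s3cret"   # hypothetical local that user code should never see
    leaked = []
    # No namespace arguments: the code sees this function's locals
    # and the globals of the module that defines handle_request.
    exec(user_source)
    return leaked


user_source = "leaked.append(api_key)\nglobals()['INJECTED'] = True"
leaked = handle_request(user_source)
print(leaked)                                      # the local value was read
print(handle_request.__globals__.get("INJECTED"))  # a global was created
```

The submitted string both reads a local variable of the caller and writes a new global, which is exactly the behavior the next paragraphs try to switch off.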
How can this default be changed? Pass two extra arguments, a globals dict and a locals dict, when calling exec (see the earlier article on exec for details):
>>> g = {}
>>> l = {"b": "world"}
>>> exec('hello = "hello" + b', g, l)
>>> l
{'b': 'world', 'hello': 'helloworld'}
>>> g
{'__builtins__': {...}}
>>> hello
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
...
NameError: name 'hello' is not defined
To restrict the use of built-in functions as well, define the __builtins__ key in the globals argument:
>>> g = {}
>>> l = {}
>>> exec('a = int("1")', g, l)
>>> l
{'a': 1}
>>> g = {"__builtins__": {}}
>>> exec('a = int("1")', g, l)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
NameError: name 'int' is not defined
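The two settings above can be combined in a small wrapper. This is only a sketch: run_isolated is a hypothetical helper name, not part of any library, and as the following section explains it is not a real sandbox.

```python
def run_isolated(source, names=None):
    """Run source with empty builtins and a private local namespace.

    Hypothetical helper for illustration; it limits accidental access
    only and does not stop a determined attacker.
    """
    restricted_globals = {"__builtins__": {}}
    local_names = dict(names or {})   # copy so the caller's dict is untouched
    exec(source, restricted_globals, local_names)
    return local_names


print(run_isolated('hello = "hello" + b', {"b": "world"}))

try:
    run_isolated('a = int("1")')
except NameError as exc:
    print("blocked:", exc)
```

Returning the locals dict gives the caller an explicit place to read results from, instead of letting the executed code write into the program's own namespaces.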
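We have now restricted reading and modifying global variables as well as using built-in functions. Does that make everything safe? It does not: the built-in functions, and even os.system, can still be reached in other ways.
Other routes to built-in functions and os.system
Through a function object (a function's __globals__ is the namespace it was defined in, so any function from unrestricted code that the executed code can reach will do):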
>>> def a(): pass
...
>>> a.__globals__["__builtins__"]
<module 'builtins' (built-in)>
>>> a.__globals__["__builtins__"].open
<built-in function open>
Through a built-in type object:
>>> for cls in {}.__class__.__base__.__subclasses__():
...     if cls.__name__ == "WarningMessage":
...         b = cls.__init__.__globals__["__builtins__"]
...         b["open"]
...
<built-in function open>
Getting the os module, and with it os.system:
>>> cls = [x for x in [].__class__.__base__.__subclasses__() if x.__name__ == "_wrap_close"][0]
>>> cls.__init__.__globals__["path"].os
<module 'os' from '...'>
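To confirm that the type-object route works from inside a restricted exec, here is a minimal sketch. ESCAPE_SOURCE and the variable names are made up for illustration, and the loop takes the first class it finds whose __init__ is a Python function rather than relying on a specific class name, since the set of loaded classes differs between interpreters.

```python
import builtins

# Source that uses no builtin names at all. A bare except is used because
# even the name Exception is unavailable in the restricted namespace.
ESCAPE_SOURCE = '''
found = None
for cls in ().__class__.__base__.__subclasses__():
    try:
        found = cls.__init__.__globals__["__builtins__"]
        break
    except:
        pass
'''

namespace = {}
exec(ESCAPE_SOURCE, {"__builtins__": {}}, namespace)

found = namespace["found"]
# __builtins__ is a dict in most modules but the module object in __main__.
recovered_open = found["open"] if isinstance(found, dict) else found.open
print(recovered_open is builtins.open)
```

Every step in the executed source is plain attribute access on a literal, which is why emptying __builtins__ alone does not stop it.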
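How can these two approaches be countered? One option is to forbid access to attributes whose names start with _:
If you control how the code is generated, perform the check while generating it.
If you do not, you can analyze the compiled code with the dis module (before Python 3.7, dis did not descend into the code of nested functions).
Or use the tokenize module: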
In [67]: from tokenize import tokenize

In [68]: from io import BytesIO

In [69]: code = """
   ....: a = "b"
   ....: a.__str__
   ....: def b():
   ....:     b.__get__
   ....: """

In [70]: t = tokenize(BytesIO(code.encode()).readline)

In [71]: for x in t:
   ....:     print(x)
   ....:
TokenInfo(type=59 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=58 (NL), string='\n', start=(1, 0), end=(1, 1), line='\n')
TokenInfo(type=1 (NAME), string='a', start=(2, 0), end=(2, 1), line='a = "b"\n')
TokenInfo(type=53 (OP), string='=', start=(2, 2), end=(2, 3), line='a = "b"\n')
TokenInfo(type=3 (STRING), string='"b"', start=(2, 4), end=(2, 7), line='a = "b"\n')
TokenInfo(type=4 (NEWLINE), string='\n', start=(2, 7), end=(2, 8), line='a = "b"\n')
TokenInfo(type=1 (NAME), string='a', start=(3, 0), end=(3, 1), line='a.__str__\n')
TokenInfo(type=53 (OP), string='.', start=(3, 1), end=(3, 2), line='a.__str__\n')
TokenInfo(type=1 (NAME), string='__str__', start=(3, 2), end=(3, 9), line='a.__str__\n')
TokenInfo(type=4 (NEWLINE), string='\n', start=(3, 9), end=(3, 10), line='a.__str__\n')
TokenInfo(type=1 (NAME), string='def', start=(4, 0), end=(4, 3), line='def b():\n')
TokenInfo(type=1 (NAME), string='b', start=(4, 4), end=(4, 5), line='def b():\n')
TokenInfo(type=53 (OP), string='(', start=(4, 5), end=(4, 6), line='def b():\n')
TokenInfo(type=53 (OP), string=')', start=(4, 6), end=(4, 7), line='def b():\n')
TokenInfo(type=53 (OP), string=':', start=(4, 7), end=(4, 8), line='def b():\n')
TokenInfo(type=4 (NEWLINE), string='\n', start=(4, 8), end=(4, 9), line='def b():\n')
TokenInfo(type=5 (INDENT), string='    ', start=(5, 0), end=(5, 4), line='    b.__get__\n')
TokenInfo(type=1 (NAME), string='b', start=(5, 4), end=(5, 5), line='    b.__get__\n')
TokenInfo(type=53 (OP), string='.', start=(5, 5), end=(5, 6), line='    b.__get__\n')
TokenInfo(type=1 (NAME), string='__get__', start=(5, 6), end=(5, 13), line='    b.__get__\n')
TokenInfo(type=4 (NEWLINE), string='\n', start=(5, 13), end=(5, 14), line='    b.__get__\n')
TokenInfo(type=6 (DEDENT), string='', start=(6, 0), end=(6, 0), line='')
TokenInfo(type=0 (ENDMARKER), string='', start=(6, 0), end=(6, 0), line='')
The output shows that when type is OP and string is ".", the next record is the attribute name that follows the dot. The check can therefore be written like this:
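import io
import tokenize


def check_unsafe_attributes(string):
    """Raise AttributeError if the source accesses an attribute starting with "_"."""
    g = tokenize.tokenize(io.BytesIO(string.encode("utf-8")).readline)
    pre_op = ""
    for toktype, tokval, _, _, _ in g:
        if toktype == tokenize.NAME and pre_op == "." and tokval.startswith("_"):
            msg = "access to attribute '{0}' is unsafe.".format(tokval)
            raise AttributeError(msg)
        if toktype == tokenize.OP:
            pre_op = tokval
        elif toktype not in (tokenize.NL, tokenize.COMMENT):
            # Any other token ends the "just saw a dot" state, so a name that
            # starts the next statement is not mistaken for an attribute.
            pre_op = ""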
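The dis-based option listed earlier can be sketched in a similar spirit. This is an assumption-laden illustration rather than a complete checker: find_private_attributes and ATTR_OPS are hypothetical names, the opcode list covers only the common attribute instructions, and the recursion through co_consts is there so that function bodies are inspected regardless of Python version.

```python
import dis
import types

# Opcode names that carry an attribute name (LOAD_METHOD exists only in
# some Python versions; listing it is harmless elsewhere).
ATTR_OPS = {"LOAD_ATTR", "LOAD_METHOD", "STORE_ATTR", "DELETE_ATTR"}


def find_private_attributes(code_obj):
    """Return underscore-prefixed attribute names used in code_obj and nested code."""
    found = set()
    for instr in dis.get_instructions(code_obj):
        if instr.opname in ATTR_OPS and str(instr.argval).startswith("_"):
            found.add(instr.argval)
    # Function bodies are stored as code objects in co_consts; recurse into them.
    for const in code_obj.co_consts:
        if isinstance(const, types.CodeType):
            found |= find_private_attributes(const)
    return found


source = '''
a = "b"
a.__str__
def b():
    b.__get__
'''
compiled = compile(source, "<user>", "exec")
print(sorted(find_private_attributes(compiled)))
```

Compiling first and executing only when the returned set is empty keeps the check and the execution on the same code object, so the code that was inspected is the code that runs.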