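Almost the same question as "the structure of a .pyc file". I have:
1) The code object compiled from the source code.
2) The marshaled representation of this code object.
3) A recursive disassembly of its (the code object's) code section.
4) All of its field values.
Main purpose: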
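I want to find out how the different code objects are stored and how they reference each other. That is, how are the links to child code objects stored? The module should reference all of its functions, and a function should hold references to all the other functions that can be called from it. Is a code object's id preserved when the virtual machine stores it to a .pyc file? I don't think so, because I can't see the id in the .pyc file. For example, the disassembled source code contains this instruction:
LOAD_CONST 2 (<code object baz at 0x7f380995e5d0, file "foo.py", line 7>)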
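So:
How does the virtual machine find the baz code object? I can't see any of this information (0x7f380995e5d0, file "foo.py", line 7) in the marshaled string.
Is the object id 0x7f380995e5d0 stored in the marshaled code, or is the object created each time the program runs?
If it is not stored, how are the connections between objects preserved in the marshaled code object (the .pyc file)?
I think I will investigate further with gdb, but maybe this approach (deciphering the .pyc file) can also do the job.
Current results: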
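I used all of this information to build the following file: the first column is the marshaled binary representation of the code object, and the second is the meaning of each byte sequence as far as I have worked it out.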
b'
\xe3 <don't know>
\x00\x00\x00\x00 <foo.py: co_argcount: 0>
\x00\x00\x00\x00 <foo.py: co_kwonlyargcount: 0>
\x00\x00\x00\x00 <foo.py: co_nlocals: 0>
\x03\x00\x00\x00 <foo.py: co_stacksize: 3>
@\x00\x00\x00 <foo.py: co_flags = '@' = 0x40 = 64>
s.\x00\x00\x00 <foo.py: number of bytes for module instructions = '.' = 46>
d\x00 <foo.py: co_code: 0 LOAD_CONST 0 (1)
Z\x00 <foo.py: co_code: 2 STORE_NAME 0 (a)
d\x01 <foo.py: co_code: 4 LOAD_CONST 1 (2)
Z\x01 <foo.py: co_code: 6 STORE_NAME 1 (b)
e\x00 <foo.py: co_code: 8 LOAD_NAME 0 (a)
e\x01 <foo.py: co_code: 10 LOAD_NAME 1 (b)
\x17\x00 <foo.py: co_code: 12 BINARY_ADD
Z\x02 <foo.py: co_code: 14 STORE_NAME 2 (c)
d\x02 <foo.py: co_code: 16 LOAD_CONST 2 (<code object baz at 0x7f380995e5d0, file "foo.py", line 7>)
d\x03 <foo.py: co_code: 18 LOAD_CONST 3 ('baz')
\x84\x00 <foo.py: co_code: 20 MAKE_FUNCTION 0
Z\x03 <foo.py: co_code: 22 STORE_NAME 3 (baz)
e\x03 <foo.py: co_code: 24 LOAD_NAME 3 (baz)
e\x00 <foo.py: co_code: 26 LOAD_NAME 0 (a)
e\x01 <foo.py: co_code: 28 LOAD_NAME 1 (b)
\x83\x02 <foo.py: co_code: 30 CALL_FUNCTION 2
Z\x04 <foo.py: co_code: 32 STORE_NAME 4 (multiplication)
e\x04 <foo.py: co_code: 34 LOAD_NAME 4 (multiplication)
d\x01 <foo.py: co_code: 36 LOAD_CONST 1 (2)
\x13\x00 <foo.py: co_code: 38 BINARY_POWER
Z\x05 <foo.py: co_code: 40 STORE_NAME 5 (square)
d\x04 <foo.py: co_code: 42 LOAD_CONST 4 (None)
S\x00 <foo.py: co_code: 44 RETURN_VALUE
)\x05 <foo.py: co_const: size>
\xe9\x01\x00\x00\x00 <foo.py: co_const[0]: 1>
\xe9\x02\x00\x00\x00 <foo.py: co_const[1]: 2>
c <TYPE_CODE>
\x02\x00\x00\x00 <baz: co_argcount: 2>
\x00\x00\x00\x00 <baz: co_kwonlyargcount: 0>
\x02\x00\x00\x00 <baz: co_nlocals: 2>
\x02\x00\x00\x00 <baz: co_stacksize: 2>
C\x00\x00\x00 <baz: co_flags = 'C' = 0x43 = 67>
s\x08\x00\x00\x00 <baz: co_code: size = 8 bytes>
|\x00 <baz: co_code: 0 LOAD_FAST 0 (x)
|\x01 <baz: co_code: 2 LOAD_FAST 1 (y)
\x14\x00 <baz: co_code: 4 BINARY_MULTIPLY
S\x00 <baz: co_code: 6 RETURN_VALUE
)\x01 <baz: co_const: size>
N <baz: co_const[0]: None>
\xa9\x00 <don't know>
)\x02 <baz: co_varnames: size>
\xda\x01 <baz: number of characters of next item>
x <baz: co_varnames[0]: x>
\xda\x01 <baz: number of characters of next item>
y <baz: co_varnames[1]: y>
r\x03\x00\x00\x00 <baz: don't know. But the 'r' = 'TYPE_REF'>
r\x03\x00\x00\x00 <baz: don't know. But the 'r' = 'TYPE_REF'>
\xfa\x06 <baz: next item length>
foo.py <baz: co_filename>
\xda\x03 <baz: number of characters of next item>
baz <baz: co_name: 'baz'>
\x07\x00\x00\x00 <baz: co_firstlineno: 7>
s\x02\x00\x00\x00 <baz: co_lnotab: size = 2 >
\x00\x01 <baz: co_lnotab>
r\x07\x00\x00\x00 <foo.py: co_const[3]: reference to baz>
N <foo.py: co_const[4]: None>
)\x06 <foo.py: co_names: size>
\xda\x01 <foo.py: number of characters of next item>
a <foo.py: co_names[0]: a>
\xda\x01 <foo.py: number of characters of next item>
b <foo.py: co_names[1]: b>
\xda\x01 <foo.py: number of characters of next item>
c <foo.py: co_names[2]: c>
r\x07\x00\x00\x00 <foo.py: co_names[3]: reference to baz>
Z\x0e <foo.py: number of characters of next item>
multiplication <foo.py: co_names[4]: multiplication>
Z\x06 <foo.py: number of characters of next item>
square <foo.py: co_names[5]: square>
r\x03\x00\x00\x00 <foo.py: don't know>
r\x03\x00\x00\x00 <foo.py: don't know>
r\x03\x00\x00\x00 <foo.py: don't know>
r\x06\x00\x00\x00 <foo.py: don't know>
\xda\x08 <foo.py: number of characters of next item>
<module> <foo.py: co_name>
\x03\x00\x00\x00 <foo.py: co_firstlineno>
s\n\x00\x00\x00 <foo.py: co_lnotab: size = '\n' = 0A>
\x04\x01 <foo.py: co_lnotab>
\x04\x01 <foo.py: co_lnotab>
\x08\x02 <foo.py: co_lnotab>
\x08\x07 <foo.py: co_lnotab>
\n\x01' <foo.py: co_lnotab>
Code snippets needed to reproduce this:
1) Source code:
foo.py
a = 1
b = 2
c = a + b
def baz(x,y):
    return x * y
multiplication = baz(a,b)
square = multiplication ** 2
2) The marshaled representation of foo.py.
import marshal

source_py = "foo.py"
with open(source_py) as f_source:
    source_code = f_source.read()
code_obj_compile = compile(source_code, source_py, "exec")
data = marshal.dumps(code_obj_compile)
print(data)
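The snippet above marshals the compiled code object directly. An actual .pyc file holds the same kind of marshaled bytes preceded by a short header. The following is a minimal sketch, assuming CPython 3.7 or later (where the header is 16 bytes); the helper name load_pyc_code and the three-line sample source are made up for illustration.

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile

# Assumption: CPython 3.7+, where a .pyc starts with a 16-byte header
# (4-byte magic, 4-byte flags, then 8 bytes of mtime+size or a source hash).
PYC_HEADER_SIZE = 16


def load_pyc_code(pyc_path):
    """Hypothetical helper: return (magic, code_object) read from a .pyc file."""
    with open(pyc_path, "rb") as f:
        raw = f.read()
    return raw[:4], marshal.loads(raw[PYC_HEADER_SIZE:])


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "foo.py")
    with open(src, "w") as f:
        f.write("a = 1\nb = 2\nc = a + b\n")
    # py_compile.compile returns the path of the file it wrote.
    pyc = py_compile.compile(src, cfile=os.path.join(tmp, "foo.pyc"))
    magic, code_obj = load_pyc_code(pyc)

# The restored code object is executable like any other.
namespace = {}
exec(code_obj, namespace)
print(magic == importlib.util.MAGIC_NUMBER, namespace["c"])
```

Everything after the header is what marshal.dumps would produce for the module's code object, which is why the byte-level analysis in this question applies to .pyc files as well.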
3) Complete (recursive) disassembly of the code object.
import dis
import types

dis.dis(code_obj_compile)
for x in code_obj_compile.co_consts:
    if isinstance(x, types.CodeType):
        sub_byte_code = x
        func_name = sub_byte_code.co_name
        print('\nDisassembly of %s:' % func_name)
        dis.dis(sub_byte_code)
4) Field values of all code objects.
def print_co_obj_fields(code_obj):
    # Iterate over all attributes of the code object
    # and print those that have the 'co_' prefix
    for name in dir(code_obj):
        if name.startswith('co_'):
            co_field = getattr(code_obj, name)
            print(f'{name:<20} = {co_field}')

print_co_obj_fields(code_obj_compile)
Answer #1
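The answer below refers to Python 2.7.
How does the virtual machine find the baz code object? I can't see any of this information (0x7f380995e5d0, file "foo.py", line 7) in the marshaled string. Is the object id 0x7f380995e5d0 stored in the marshaled code, or is the object created each time the program runs?
The baz code object is located inside the co_consts member. Taking your example:
>>> import marshal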
>>> import dis
>>>
>>> source_py = "foo.py"
>>>
>>> with open(source_py) as f_source:
...     source_code = f_source.read()
>>>
>>> code_obj_compile = compile(source_code, source_py, "exec")
If you disassemble the newly generated code object, you can find the reference to baz:
>>> dis.dis(code_obj_compile)
  1           0 LOAD_CONST               0 (7)
              3 STORE_NAME               0 (a)

  2           6 LOAD_CONST               1 (5)
              9 STORE_NAME               1 (b)

  3          12 LOAD_NAME                0 (a)
             15 LOAD_NAME                1 (b)
             18 BINARY_ADD
             19 STORE_NAME               2 (c)

  5          22 LOAD_CONST               2 (<code object baz at 0x7f1dcdb06bb0, file "foo.py", line 5>)
             25 MAKE_FUNCTION            0
... snip...
The baz code object is located inside the co_consts array of the parent code object, as shown below.
>>> code_obj_compile.co_consts[2]
<code object baz at 0x7f1dcdb06bb0, file "foo.py", line 5>
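The same lookup can be sketched for Python 3 with dis.get_instructions, which exposes both the raw operand (arg) and the constant it resolves to (argval). The sample source and the variable names below are illustrative and not taken from the answer.

```python
import dis
import types

source = "def baz(x, y):\n    return x * y\n"
module_code = compile(source, "foo.py", "exec")

# Collect every LOAD_CONST whose constant is itself a code object.
links = [
    (ins.arg, ins.argval)
    for ins in dis.get_instructions(module_code)
    if ins.opname == "LOAD_CONST" and isinstance(ins.argval, types.CodeType)
]

for index, child in links:
    # The operand is only an index into the parent's co_consts tuple.
    print(index, child.co_name, module_code.co_consts[index] is child)
```

The exact index depends on the Python version and on the other constants in the module, which is why the sketch reads it from the instruction instead of hard-coding it.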
It can also be disassembled:
>>> dis.dis(code_obj_compile.co_consts[2])
  6           0 LOAD_FAST                0 (x)
              3 LOAD_FAST                1 (y)
              6 BINARY_MULTIPLY
              7 RETURN_VALUE
The object is created each time the program runs, so its address changes accordingly.
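A minimal sketch of this point, assuming Python 3: unmarshaling the same bytes twice builds two separate object graphs, so the nested code objects are distinct objects with equal contents. The helper nested_code is a hypothetical name introduced for the example.

```python
import marshal
import types

source = "def baz(x, y):\n    return x * y\n"
original = compile(source, "foo.py", "exec")
data = marshal.dumps(original)


def nested_code(code_obj):
    """Return the first code object stored in code_obj.co_consts."""
    return next(c for c in code_obj.co_consts if isinstance(c, types.CodeType))


# Each load rebuilds the objects from scratch.
first = nested_code(marshal.loads(data))
second = nested_code(marshal.loads(data))

print(first is second)                  # identity: are they one object?
print(first.co_code == second.co_code)  # content: is the bytecode the same?
```

Since id() in CPython is derived from the object's memory address, nothing like 0x7f380995e5d0 needs to be, or could usefully be, written into the marshaled data.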
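If it is not stored, how are the connections between objects preserved in the marshaled code object (the .pyc file)?
To explain: if you look closely at the instruction, you will see that LOAD_CONST takes an offset as its argument (the operand).
  5          22 LOAD_CONST               2 (<code object baz at 0x7f1dcdb06bb0, file "foo.py", line 5>)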
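The offset here is 2, which tells the Python virtual machine to load the third item (index 2, counting from zero) of the co_consts array onto the evaluation stack. So the "connection" is preserved through offsets into the other metadata members.
Answer #2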
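The purpose of marshaling a code object is to store a program in a file and to restore it from that file. It therefore needs an encoding scheme for everything a Python program can contain: objects, bytecode, names and so on. Otherwise it could not restore the program from the file. To do this it uses a number of type identifiers, which can be divided into four groups:
1) Single types: {type identifier} alone, 1 byte in size.
Example: TYPE_NONE = 'N', TYPE_TRUE = 'T'.
2) Short types: {type identifier} + a 1-byte value (for strings, the length of the data that follows).
Example: TYPE_SHORT_ASCII_INTERNED = 'Z'.
3) Long types: {type identifier} + a 4-byte value (for TYPE_STRING, the length of the data that follows).
Example: TYPE_STRING = 's'.
4) Object types: {type identifier} + a combination of all the other types, including the object type itself. That is, they have a recursive structure.
Example: TYPE_CODE = 'c'.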
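The four groups can be observed directly with marshal.dumps. This is a rough sketch, assuming Python 3.4 or later (marshal version 3 or higher); the names type_code and samples are made up for the example. The type byte may carry the FLAG_REF high bit (0x80), so the sketch masks it off before printing.

```python
import marshal

FLAG_REF = 0x80  # high bit of the type byte, as defined in marshal.c


def type_code(data):
    """Return the type identifier of a marshaled value with FLAG_REF stripped."""
    return chr(data[0] & 0x7F)


samples = {
    "single": marshal.dumps(None),       # identifier only
    "short": marshal.dumps("baz"),       # identifier + 1-byte length + data
    "long": marshal.dumps(b"\x00\x01"),  # identifier + 4-byte length + data
    "object": marshal.dumps(compile("pass", "foo.py", "exec")),
}

for group, data in samples.items():
    print(group, type_code(data), len(data), bool(data[0] & FLAG_REF))
```

Whether FLAG_REF is set on a given value depends on how many references the object has at the time it is dumped, so only the masked identifier is stable.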
All of the types can be seen here: cpython / Python / marshal.c
In addition, a code object has several int fields. They have no type identifier in the marshaled string; each one is just a raw four-byte value.
int co_argcount; /* #arguments, except *args */
int co_kwonlyargcount; /* #keyword only arguments */
int co_nlocals; /* #local variables */
int co_stacksize; /* #entries needed for evaluation stack */
int co_flags; /* CO_..., see below */
int co_firstlineno; /* first source line number */
The complete code object structure is here: cpython / Include / code.h
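The co_flags field is a bit mask. As an illustration, the two values from the dump in the question (0x40 for the module, 0x43 for baz) can be decoded with a small table. The flag values below are the ones from CPython 3.7's Include/code.h; later versions may define additional flags or retire some of these, and decode_flags is a hypothetical helper name.

```python
# Assumption: flag values as in CPython 3.7's Include/code.h.
CO_FLAGS = {
    0x0001: "CO_OPTIMIZED",
    0x0002: "CO_NEWLOCALS",
    0x0004: "CO_VARARGS",
    0x0008: "CO_VARKEYWORDS",
    0x0010: "CO_NESTED",
    0x0020: "CO_GENERATOR",
    0x0040: "CO_NOFREE",
}


def decode_flags(flags):
    """Return the names of the known flag bits set in a co_flags value."""
    return [name for bit, name in sorted(CO_FLAGS.items()) if flags & bit]


print(decode_flags(0x40))  # module-level code object from the dump
print(decode_flags(0x43))  # the baz code object from the dump
```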
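It is useful to know the order in which the fields of a code object are dumped, because it lets us compute the offset of each field in the resulting string: after the one-byte type identifier, the first four bytes are co_argcount, the next four are co_kwonlyargcount, and so on. The code that dumps a code object: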
/* PyCodeObject *co: pointer to the code object.
 * p: pointer to the write buffer (WFILE in marshal.c) that
 *    accumulates the marshaled code object before it is
 *    written to the file. */
W_TYPE(TYPE_CODE, p);
w_long(co->co_argcount, p);
w_long(co->co_kwonlyargcount, p);
w_long(co->co_nlocals, p);
w_long(co->co_stacksize, p);
w_long(co->co_flags, p);
w_object(co->co_code, p);
w_object(co->co_consts, p);
w_object(co->co_names, p);
w_object(co->co_varnames, p);
w_object(co->co_freevars, p);
w_object(co->co_cellvars, p);
w_object(co->co_filename, p);
w_object(co->co_name, p);
w_long(co->co_firstlineno, p);
w_object(co->co_lnotab, p);
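As a small check of this layout, the sketch below reads the type byte and the first raw int field straight out of the marshaled bytes. It deliberately stops after co_argcount: the fields that follow differ between CPython versions (the order listed above is the Python 3.7 one), so parsing further would need version-specific code. The function baz here simply mirrors the one in foo.py.

```python
import marshal
import struct


def baz(x, y):
    return x * y


data = marshal.dumps(baz.__code__)

# Byte 0 is the type identifier, possibly with FLAG_REF (0x80) set.
type_char = chr(data[0] & 0x7F)

# Bytes 1..4 are co_argcount as a raw little-endian 32-bit integer,
# with no type identifier of its own.
(argcount,) = struct.unpack_from("<i", data, 1)

print(type_char, argcount, argcount == baz.__code__.co_argcount)
```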
Result: the marshaled string of foo.py, fully deciphered:
b'
\xe3 <foo.py: '\xe3' = 'c' (TYPE_CODE) | 0x80 (FLAG_REF)>
\x00\x00\x00\x00 <foo.py: co_argcount: 0>
\x00\x00\x00\x00 <foo.py: co_kwonlyargcount: 0>
\x00\x00\x00\x00 <foo.py: co_nlocals: 0>
\x03\x00\x00\x00 <foo.py: co_stacksize: 3>
@\x00\x00\x00 <foo.py: co_flags = '@' = 0x40 = 64>
s.\x00\x00\x00 <foo.py: number of bytes for module instructions = '.' = 46>
d\x00 <foo.py: co_code: 0 LOAD_CONST 0 (1)
Z\x00 <foo.py: co_code: 2 STORE_NAME 0 (a)
d\x01 <foo.py: co_code: 4 LOAD_CONST 1 (2)
Z\x01 <foo.py: co_code: 6 STORE_NAME 1 (b)
e\x00 <foo.py: co_code: 8 LOAD_NAME 0 (a)
e\x01 <foo.py: co_code: 10 LOAD_NAME 1 (b)
\x17\x00 <foo.py: co_code: 12 BINARY_ADD
Z\x02 <foo.py: co_code: 14 STORE_NAME 2 (c)
d\x02 <foo.py: co_code: 16 LOAD_CONST 2 (<code object baz at 0x7f380995e5d0, file "foo.py", line 7>)
d\x03 <foo.py: co_code: 18 LOAD_CONST 3 ('baz')
\x84\x00 <foo.py: co_code: 20 MAKE_FUNCTION 0
Z\x03 <foo.py: co_code: 22 STORE_NAME 3 (baz)
e\x03 <foo.py: co_code: 24 LOAD_NAME 3 (baz)
e\x00 <foo.py: co_code: 26 LOAD_NAME 0 (a)
e\x01 <foo.py: co_code: 28 LOAD_NAME 1 (b)
\x83\x02 <foo.py: co_code: 30 CALL_FUNCTION 2
Z\x04 <foo.py: co_code: 32 STORE_NAME 4 (multiplication)
e\x04 <foo.py: co_code: 34 LOAD_NAME 4 (multiplication)
d\x01 <foo.py: co_code: 36 LOAD_CONST 1 (2)
\x13\x00 <foo.py: co_code: 38 BINARY_POWER
Z\x05 <foo.py: co_code: 40 STORE_NAME 5 (square)
d\x04 <foo.py: co_code: 42 LOAD_CONST 4 (None)
S\x00 <foo.py: co_code: 44 RETURN_VALUE
)\x05 <foo.py: co_const: size>
\xe9\x01\x00\x00\x00 <foo.py: co_const[0]: 1; '\xe9' = 'i' (TYPE_INT) | 0x80 (FLAG_REF)>
\xe9\x02\x00\x00\x00 <foo.py: co_const[1]: 2; '\xe9' = 'i' (TYPE_INT) | 0x80 (FLAG_REF)>
c <foo.py: co_const[2]: 'c' = TYPE_CODE>
\x02\x00\x00\x00 <baz: co_argcount: 2>
\x00\x00\x00\x00 <baz: co_kwonlyargcount: 0>
\x02\x00\x00\x00 <baz: co_nlocals: 2>
\x02\x00\x00\x00 <baz: co_stacksize: 2>
C\x00\x00\x00 <baz: co_flags = 'C' = 0x43 = 67>
s\x08\x00\x00\x00 <baz: co_code: size = 8 bytes>
|\x00 <baz: co_code: 0 LOAD_FAST 0 (x)
|\x01 <baz: co_code: 2 LOAD_FAST 1 (y)
\x14\x00 <baz: co_code: 4 BINARY_MULTIPLY
S\x00 <baz: co_code: 6 RETURN_VALUE
)\x01 <baz: co_const: size>
N <baz: co_const[0]: None>
\xa9\x00 <baz: co_names: size = 0; '\xa9' = ')' (TYPE_SMALL_TUPLE) | 0x80 (FLAG_REF)>
)\x02 <baz: co_varnames: size>
\xda\x01 <baz: number of characters of next item; '\xda' = 'Z' (TYPE_SHORT_ASCII_INTERNED) | 0x80 (FLAG_REF)>
x <baz: co_varnames[0]: x>
\xda\x01 <baz: number of characters of next item; '\xda' = 'Z' (TYPE_SHORT_ASCII_INTERNED) | 0x80 (FLAG_REF)>
y <baz: co_varnames[1]: y>
r\x03\x00\x00\x00 <baz: co_freevars: reference to empty tuple '()'>
r\x03\x00\x00\x00 <baz: co_cellvars: reference to empty tuple '()'>
\xfa\x06 <baz: next item length>
foo.py <baz: co_filename>
\xda\x03 <baz: number of characters of next item>
baz <baz: co_name: 'baz'>
\x07\x00\x00\x00 <baz: co_firstlineno: 7>
s\x02\x00\x00\x00 <baz: co_lnotab: size = 2 >
\x00\x01 <baz: co_lnotab>
r\x07\x00\x00\x00 <foo.py: co_const[3]: reference to 'baz'>
N <foo.py: co_const[4]: None>
)\x06 <foo.py: co_names: size>
\xda\x01 <foo.py: number of characters of next item>
a <foo.py: co_names[0]: a>
\xda\x01 <foo.py: number of characters of next item>
b <foo.py: co_names[1]: b>
\xda\x01 <foo.py: number of characters of next item>
c <foo.py: co_names[2]: c>
r\x07\x00\x00\x00 <foo.py: co_names[3]: reference to 'baz'>
Z\x0e <foo.py: number of characters of next item>
multiplication <foo.py: co_names[4]: multiplication>
Z\x06 <foo.py: number of characters of next item>
square <foo.py: co_names[5]: square>
r\x03\x00\x00\x00 <foo.py: co_varnames: reference to empty tuple '()'>
r\x03\x00\x00\x00 <foo.py: co_freevars: reference to empty tuple '()'>
r\x03\x00\x00\x00 <foo.py: co_cellvars: reference to empty tuple '()'>
r\x06\x00\x00\x00 <foo.py: co_filename: reference to 'foo.py'>
\xda\x08 <foo.py: number of characters of next item>
<module> <foo.py: co_name>
\x03\x00\x00\x00 <foo.py: co_firstlineno>
s\n\x00\x00\x00 <foo.py: co_lnotab: size = '\n' = 0A>
\x04\x01 <foo.py: co_lnotab>
\x04\x01 <foo.py: co_lnotab>
\x08\x02 <foo.py: co_lnotab>
\x08\x07 <foo.py: co_lnotab>
\n\x01' <foo.py: co_lnotab>
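The 'r' (TYPE_REF) records in this dump can be reproduced in isolation. The sketch below, assuming Python 3.4 or later, marshals a tuple that holds the same string object twice: the text is written once (with FLAG_REF set on its type byte), and the second slot becomes 'r' followed by a 4-byte index into the list of flagged objects. The variable names are illustrative.

```python
import marshal
import struct

# Build the string at run time so that the tuple below holds two
# references to one and the same object.
name = "".join(["ba", "z"])
data = marshal.dumps((name, name))

# The second slot is a TYPE_REF record: the byte 'r' followed by a
# 4-byte little-endian index into the reference list.
ref_marker = data[-5:-4]
(ref_index,) = struct.unpack("<i", data[-4:])
print(data.count(b"baz"), ref_marker, ref_index)

# On load, the reference resolves to the object already rebuilt.
first, second = marshal.loads(data)
print(first is second)
```

This is the same mechanism by which r\x07 in the dump stands for the string 'baz' and r\x03 for the shared empty tuple: the index counts the FLAG_REF-marked objects in the order they were written.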
Useful information:
How to create a code object in Python?