用x86汇编语言编写Brainfuck解释器后,我决定是时候进行下一步了,用Java编写Brainfuck编译器,该程序可以生成x86汇编并将其编译为可执行文件。

当前它仅支持Windows,它仍然使用NASM和GCC作为依赖项,以将x86汇编代码转换为实际的可执行文件。这不是理想的方法,但这也是最简单的方法。

当前,编译器仍未完全优化,并且非常乐意重复增加/减少指针和/或内存位置。

这次我没有视频,但是我的解释器在60秒内执行了Mandelbrot BF文件,从此编译器获取的可执行文件在4秒内执行了该视频。

我的代码打算遵循由以下几个阶段组成的通用编译器设计:


读取源代码文件
将源文件转换为令牌流的词法分析器
语法分析器从令牌流中构建一个抽象语法树(AST),并检查诸如循环是否匹配的简单内容
中间代码生成器,它将源AST转换成中间AST
。目标代码生成器,将中间AST转换为目标AST
目标代码生成器设置目标AST并写入实际的x86汇编代码作为输出。

有一些值得注意的事情要提到:


没有语义分析器。用更完整的语言,该阶段将检查是否在兼容类型之间进行分配,等等。由于Brainfuck不够先进,因此无需进行此阶段。
现在,在以后的阶段中,现在不直接使用中间AST所有优化都将在此中间AST上完成。考虑将+++替换为伪代码addmemorycell(3)<<<替换为addpointer(-3)[-]替换为setmemorycell(0)

现在进入代码,让我们从AST和Expression接口开始:字段或方法并且与它们相等,唯一的例外是ExpressionIntegerExpressionStringExpression,它们分别具有RegisterExpressionIntegerString作为字段。

我认为不值得复制所有表达式在这里,因此我将按包将它们列出。

Register包包含:


DecrementExpression
IncrementExpression
InputExpression
LoopExpression
OutputExpression
PoinerLeftExpression

这些实际上只是Brainfuck令牌。

expression.source软件包包含:


MemoryInputExpression
MemoryLoopExpression
MemoryOutputExpression
MemoryPointerChangeExpression
MemoryValueChangeExpres sion

这些是Brainfuck令牌的中间表示形式,以便于对其进行更轻松的推理,您可以看到Increment / Decrement和PointerLeft / PointerRight已合并。

然后是expression.intermediate软件包:


AddExpression
BssSectionExpression
CallExpression
DataSectionExpression
DefineByteExpression
DefineLabelExpression
DwordExpression
ExternExpression
函数表达式
全局表达式
IdentifierExpression
JmpExpression
JmpShortExpression
JnzExpression
JzExpression
LabelExpression
MemoryAddressExpression
MovExpression
OperandExpression
PushExpression
注册
RegisterExpression
ReserveDoubleExpression
RetExpression
TestExpression
TextSectionExpression
ValueExpression
ValueListExpression

这些包含很多汇编表达式,我认为其中有很多。

为了让您感觉到它们的外观,我将使用expression.target

public interface Expression {
}


RegisterExpression

public class RegisterExpression implements Expression {
    private final Register register;

    public RegisterExpression(final Register register) {
        this.register = register;
    }

    public Register getRegister() {
        return register;
    }

    @Override
    public String toString() {
        return "Register(" + register + ")";
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        RegisterExpression that = (RegisterExpression)obj;
        return register == that.register;
    }

    @Override
    public int hashCode() {
        return Objects.hash(register);
    }
}


现在已经解释了表达式,现在是显示AST结构的时候了。

我们有Register类:

public enum Register {
    EAX,
    AX,
    AH,
    AL,
    EBX,
    BX,
    BH,
    BL,
    ECX,
    CX,
    CH,
    CL,
    EDX,
    DX,
    DH,
    DL,
    ESI,
    SI,
    EDI,
    DI,
    EBP,
    BP,
    ESP,
    SP;
}


然后是大型的AST类:

public class AST {
    private final ASTNode root;

    public AST(final ASTNode root) {
        this.root = root;
    }

    public ASTNode getRoot() {
        return root;
    }

    public Stream<String> prettyPrintStream() {
        return root.prettyPrintStream("", false);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        AST ast = (AST)obj;
        return Objects.equals(root, ast.root);
    }

    @Override
    public int hashCode() {
        return Objects.hash(root);
    }
}


现在我们可以看一下第一阶段,读取ASTNode类中的源文件。

public class ASTNode {
    private final Expression expression;

    private ASTNode parent;

    private final List<ASTNode> children = new ArrayList<>();

    public ASTNode(final Expression expression) {
        this.expression = expression;
    }

    public static ASTNode newWithChild(final Expression expression, final ASTNode node) {
        return newWithChildren(expression, Arrays.asList(node));
    }

    public static ASTNode newWithChildren(final Expression expression, final List<ASTNode> nodes) {
        ASTNode newNode = new ASTNode(expression);
        nodes.forEach(newNode::addChild);
        return newNode;
    }

    public static ASTNode newWithMappedChildren(final Expression expression, final List<ASTNode> nodes, final UnaryOperator<ASTNode> mapper) {
        ASTNode newNode = new ASTNode(expression);
        nodes.stream().map(mapper).forEach(newNode::addChild);
        return newNode;
    }

    public Expression getExpression() {
        return expression;
    }

    public <T extends Expression> T getExpression(final Class<T> clazz) {
        if (clazz.isInstance(expression)) {
            return clazz.cast(expression);
        }
        throw new IllegalStateException("Expression " + expression + " cannot be cast to class " + clazz);
    }

    public ASTNode getParent() {
        return parent;
    }

    public void addChild(ASTNode node) {
        children.add(node);
        node.parent = this;
    }

    public List<ASTNode> getChildren() {
        return children;
    }

    public ASTNode getChild(final int index, final Class<? extends Expression> clazz) {
        ASTNode node = children.get(index);
        Expression childExpression = node.getExpression();
        if (!clazz.isInstance(childExpression)) {
            throw new IllegalStateException("Expected child " + index + " to be of class " + clazz + ", but it is " + childExpression);
        }
        return node;
    }

    public <T extends Expression> T getChildExpression(final int index, final Class<T> clazz) {
        ASTNode node = children.get(index);
        return node.getExpression(clazz);
    }

    protected Stream<String> prettyPrintStream(String prefix, boolean tail) {
        return Stream.concat(
            Stream.of(prefix + (tail ? "└── " : "├── ") + expression),
            IntStream.range(0, children.size())
                .boxed()
                .flatMap(i -> children.get(i).prettyPrintStream(prefix + (tail ? "    " : "│   "), (i == children.size() - 1)))
        );
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        ASTNode node = (ASTNode)obj;
        return Objects.equals(expression, node.expression) &&
            Objects.equals(children, node.children);
    }

    @Override
    public int hashCode() {
        return Objects.hash(expression, children);
    }
}


接下来是词法分析器:

public class SourceFile {
    private final Path path;

    public SourceFile(final Path path) {
        this.path = path;
    }

    public IntStream getCharacterStream() throws IOException {
        return Files.lines(path, StandardCharsets.UTF_8)
            .flatMapToInt(CharSequence::chars);
    }
}


使用SourceFile类:

public class LexicalAnalyzer {
    private final SourceFile sourceFile;

    public LexicalAnalyzer(final SourceFile sourceFile) {
        this.sourceFile = sourceFile;
    }

    public Stream<Token> getTokens() throws IOException {
        return sourceFile.getCharacterStream()
            .mapToObj(character -> Token.forCharacter((char)character))
            .filter(token -> token != null);
    }
}


下一个是语法分析器:

public enum Token {
    POINTER_RIGHT('>'),
    POINTER_LEFT('<'),
    INCREMENT('+'),
    DECREMENT('-'),
    OUTPUT('.'),
    INPUT(','),
    JUMP_PAST('['),
    JUMP_BACK(']');

    private static final Map<Character, Token> CHARACTER_TOKEN_MAP = Arrays.stream(Token.values())
        .collect(Collectors.toMap(Token::getCharacter, i -> i));

    public static Token forCharacter(final char character) {
        return CHARACTER_TOKEN_MAP.get(character);
    }

    private final char character;

    Token(final char character) {
        this.character = character;
    }

    public char getCharacter() {
        return character;
    }
}


使用以下简单的Token类:

public class SyntaxAnalyzer {
    public AST getAST(final Stream<Token> tokens) {
        ASTNode rootNode = new ASTNode(new RootExpression());
        AST ast = new AST(rootNode);
        ASTNode currentNode = rootNode;

        int loopCounter = 0;

        for (Token token : (Iterable<Token>)tokens::iterator) {
            ASTNode newNode = null;
            ASTNode nextNode = null;

            switch (token) {
                case POINTER_RIGHT:
                    newNode = new ASTNode(new PointerRightExpression());
                    break;
                case POINTER_LEFT:
                    newNode = new ASTNode(new PointerLeftExpression());
                    break;
                case INCREMENT:
                    newNode = new ASTNode(new IncrementExpression());
                    break;
                case DECREMENT:
                    newNode = new ASTNode(new DecrementExpression());
                    break;
                case OUTPUT:
                    newNode = new ASTNode(new OutputExpression());
                    break;
                case INPUT:
                    newNode = new ASTNode(new InputExpression());
                    break;
                case JUMP_PAST:
                    loopCounter++;
                    newNode = new ASTNode(new LoopExpression());
                    nextNode = newNode;
                    break;
                case JUMP_BACK:
                    if (loopCounter == 0) {
                        throw new InvalidSyntaxException("A ] has been found without a matching opening [.");
                    }
                    loopCounter--;
                    nextNode = currentNode.getParent();
                    break;
            }

            if (newNode != null) {
                currentNode.addChild(newNode);
            }

            if (nextNode != null) {
                currentNode = nextNode;
            }
        }

        if (loopCounter > 0) {
            throw new InvalidSyntaxException("A ] is missing.");
        }

        return ast;
    }
}


如果您已深入到问题中,请停一会儿,看一下下面的Hello World程序,该程序我已经在我的代码中进行了测试:

public class InvalidSyntaxException extends RuntimeException {
    public InvalidSyntaxException(String message) {
        super(message);
    }
}


我确定所有Brainfuc k专家将立即了解它是如何工作的,如果您不这样做,那么这不是什么大问题,因为一开始我也不确定。我唯一知道的是它不是最佳程序,因为在某一点上有一个InvalidSyntaxException序列,这意味着该存储单元递减,然后递增直到它为零(我们使用8位存储单元,所以255溢出到0),然后再次增加两次。

将此Brainfuck代码放入我们的语法分析器中,我们获得以下AST(通过-[+]++类中的prettyPrintStream()方法获得:AST)。
>++++++++[-<+++++++++>]<.>>+>-[+]++>++>+++[>[->+++<<+++>]<<]
>-----.>->+++..+++.>-.<<+[>[+>+]>>]<--------------.>>.+++.------.--------.>+.>+.



然后我们可以继续使用中间代码生成器:

├── Root
│   ├── PointerRight
│   ├── Increment
│   ├── Increment
│   ├── Increment
│   ├── Increment
│   ├── Increment
│   ├── Increment
│   ├── Increment
│   ├── Increment
│   ├── Loop
│   │   ├── Decrement
│   │   ├── PointerLeft
│   │   ├── Increment
│   │   ├── Increment
│   │   ├── Increment
│   │   ├── Increment
│   │   ├── Increment
│   │   ├── Increment
│   │   ├── Increment
│   │   ├── Increment
│   │   ├── Increment
│   │   └── PointerRight
│   ├── PointerLeft
│   ├── Output
│   ├── PointerRight
│   ├── PointerRight
│   ├── Increment
│   ├── PointerRight
│   ├── Decrement
│   ├── Loop
│   │   └── Increment
│   ├── Increment
│   ├── Increment
│   ├── PointerRight
│   ├── Increment
│   ├── Increment
│   ├── PointerRight
│   ├── Increment
│   ├── Increment
│   ├── Increment
│   ├── Loop
│   │   ├── PointerRight
│   │   ├── Loop
│   │   │   ├── Decrement
│   │   │   ├── PointerRight
│   │   │   ├── Increment
│   │   │   ├── Increment
│   │   │   ├── Increment
│   │   │   ├── PointerLeft
│   │   │   ├── PointerLeft
│   │   │   ├── Increment
│   │   │   ├── Increment
│   │   │   ├── Increment
│   │   │   └── PointerRight
│   │   ├── PointerLeft
│   │   └── PointerLeft
│   ├── PointerRight
│   ├── Decrement
│   ├── Decrement
│   ├── Decrement
│   ├── Decrement
│   ├── Decrement
│   ├── Output
│   ├── PointerRight
│   ├── Decrement
│   ├── PointerRight
│   ├── Increment
│   ├── Increment
│   ├── Increment
│   ├── Output
│   ├── Output
│   ├── Increment
│   ├── Increment
│   ├── Increment
│   ├── Output
│   ├── PointerRight
│   ├── Decrement
│   ├── Output
│   ├── PointerLeft
│   ├── PointerLeft
│   ├── Increment
│   ├── Loop
│   │   ├── PointerRight
│   │   ├── Loop
│   │   │   ├── Increment
│   │   │   ├── PointerRight
│   │   │   └── Increment
│   │   ├── PointerRight
│   │   └── PointerRight
│   ├── PointerLeft
│   ├── Decrement
│   ├── Decrement
│   ├── Decrement
│   ├── Decrement
│   ├── Decrement
│   ├── Decrement
│   ├── Decrement
│   ├── Decrement
│   ├── Decrement
│   ├── Decrement
│   ├── Decrement
│   ├── Decrement
│   ├── Decrement
│   ├── Decrement
│   ├── Output
│   ├── PointerRight
│   ├── PointerRight
│   ├── Output
│   ├── Increment
│   ├── Increment
│   ├── Increment
│   ├── Output
│   ├── Decrement
│   ├── Decrement
│   ├── Decrement
│   ├── Decrement
│   ├── Decrement
│   ├── Decrement
│   ├── Output
│   ├── Decrement
│   ├── Decrement
│   ├── Decrement
│   ├── Decrement
│   ├── Decrement
│   ├── Decrement
│   ├── Decrement
│   ├── Decrement
│   ├── Output
│   ├── PointerRight
│   ├── Increment
│   ├── Output
│   ├── PointerRight
│   ├── Increment
│   └── Output


哪个生成此中间AST:


public class IntermediateCodeGenerator {
    public AST generateAST(final AST sourceAST) {
        return new AST(sourceToIntermediateNode(sourceAST.getRoot()));
    }

    private ASTNode sourceToIntermediateNode(final ASTNode node) {
        Expression expression = node.getExpression();
        if (expression instanceof RootExpression) {
            return ASTNode.newWithMappedChildren(new RootExpression(), node.getChildren(), this::sourceToIntermediateNode);
        }
        if (expression instanceof PointerRightExpression) {
            return ASTNode.newWithChild(new MemoryPointerChangeExpression(), new ASTNode(new IntegerExpression(1)));
        }
        if (expression instanceof PointerLeftExpression) {
            return ASTNode.newWithChild(new MemoryPointerChangeExpression(), new ASTNode(new IntegerExpression(-1)));
        }
        if (expression instanceof IncrementExpression) {
            return ASTNode.newWithChild(new MemoryValueChangeExpression(), new ASTNode(new IntegerExpression(1)));
        }
        if (expression instanceof DecrementExpression) {
            return ASTNode.newWithChild(new MemoryValueChangeExpression(), new ASTNode(new IntegerExpression(-1)));
        }
        if (expression instanceof OutputExpression) {
            return new ASTNode(new MemoryOutputExpression());
        }
        if (expression instanceof InputExpression) {
            return new ASTNode(new MemoryInputExpression());
        }
        if (expression instanceof LoopExpression) {
            return ASTNode.newWithMappedChildren(new MemoryLoopExpression(), node.getChildren(), this::sourceToIntermediateNode);
        }
        throw new IllegalArgumentException("Node with unknown expression type: " + expression);
    }
}



接下来是目标代码生成器:

├── Root
│   ├── MemoryPointerChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryLoop
│   │   ├── MemoryValueChange
│   │   │   └── Integer(-1)
│   │   ├── MemoryPointerChange
│   │   │   └── Integer(-1)
│   │   ├── MemoryValueChange
│   │   │   └── Integer(1)
│   │   ├── MemoryValueChange
│   │   │   └── Integer(1)
│   │   ├── MemoryValueChange
│   │   │   └── Integer(1)
│   │   ├── MemoryValueChange
│   │   │   └── Integer(1)
│   │   ├── MemoryValueChange
│   │   │   └── Integer(1)
│   │   ├── MemoryValueChange
│   │   │   └── Integer(1)
│   │   ├── MemoryValueChange
│   │   │   └── Integer(1)
│   │   ├── MemoryValueChange
│   │   │   └── Integer(1)
│   │   ├── MemoryValueChange
│   │   │   └── Integer(1)
│   │   └── MemoryPointerChange
│   │       └── Integer(1)
│   ├── MemoryPointerChange
│   │   └── Integer(-1)
│   ├── MemoryOutput
│   ├── MemoryPointerChange
│   │   └── Integer(1)
│   ├── MemoryPointerChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryPointerChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryLoop
│   │   └── MemoryValueChange
│   │       └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryPointerChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryPointerChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryLoop
│   │   ├── MemoryPointerChange
│   │   │   └── Integer(1)
│   │   ├── MemoryLoop
│   │   │   ├── MemoryValueChange
│   │   │   │   └── Integer(-1)
│   │   │   ├── MemoryPointerChange
│   │   │   │   └── Integer(1)
│   │   │   ├── MemoryValueChange
│   │   │   │   └── Integer(1)
│   │   │   ├── MemoryValueChange
│   │   │   │   └── Integer(1)
│   │   │   ├── MemoryValueChange
│   │   │   │   └── Integer(1)
│   │   │   ├── MemoryPointerChange
│   │   │   │   └── Integer(-1)
│   │   │   ├── MemoryPointerChange
│   │   │   │   └── Integer(-1)
│   │   │   ├── MemoryValueChange
│   │   │   │   └── Integer(1)
│   │   │   ├── MemoryValueChange
│   │   │   │   └── Integer(1)
│   │   │   ├── MemoryValueChange
│   │   │   │   └── Integer(1)
│   │   │   └── MemoryPointerChange
│   │   │       └── Integer(1)
│   │   ├── MemoryPointerChange
│   │   │   └── Integer(-1)
│   │   └── MemoryPointerChange
│   │       └── Integer(-1)
│   ├── MemoryPointerChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryOutput
│   ├── MemoryPointerChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryPointerChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryOutput
│   ├── MemoryOutput
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryOutput
│   ├── MemoryPointerChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryOutput
│   ├── MemoryPointerChange
│   │   └── Integer(-1)
│   ├── MemoryPointerChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryLoop
│   │   ├── MemoryPointerChange
│   │   │   └── Integer(1)
│   │   ├── MemoryLoop
│   │   │   ├── MemoryValueChange
│   │   │   │   └── Integer(1)
│   │   │   ├── MemoryPointerChange
│   │   │   │   └── Integer(1)
│   │   │   └── MemoryValueChange
│   │   │       └── Integer(1)
│   │   ├── MemoryPointerChange
│   │   │   └── Integer(1)
│   │   └── MemoryPointerChange
│   │       └── Integer(1)
│   ├── MemoryPointerChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryOutput
│   ├── MemoryPointerChange
│   │   └── Integer(1)
│   ├── MemoryPointerChange
│   │   └── Integer(1)
│   ├── MemoryOutput
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryOutput
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryOutput
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryValueChange
│   │   └── Integer(-1)
│   ├── MemoryOutput
│   ├── MemoryPointerChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   ├── MemoryOutput
│   ├── MemoryPointerChange
│   │   └── Integer(1)
│   ├── MemoryValueChange
│   │   └── Integer(1)
│   └── MemoryOutput


AST类别:

public class TargetCodeGenerator {
    private final BFOptions bfOptions;

    public TargetCodeGenerator(final BFOptions bfOptions) {
        this.bfOptions = bfOptions;
    }

    public AST generateAST(final AST intermediateAST) {
        int stderr = 2;
        int bfMemoryCellAmount = bfOptions.getMemoryCellAmount();

        ASTNode rootNode = new ASTNode(new RootExpression());
        AST ast = new AST(rootNode);

        // Header of the BF.exe program -do not modify-
        ASTNode externNode = new ASTNode(new ExternExpression());
        externNode.addChild(ASTNode.newWithChild(new FunctionExpression(), new ASTNode(new StringExpression("_calloc"))));
        externNode.addChild(ASTNode.newWithChild(new FunctionExpression(), new ASTNode(new StringExpression("_fdopen"))));
        externNode.addChild(ASTNode.newWithChild(new FunctionExpression(), new ASTNode(new StringExpression("_fprintf"))));
        externNode.addChild(ASTNode.newWithChild(new FunctionExpression(), new ASTNode(new StringExpression("_getchar"))));
        externNode.addChild(ASTNode.newWithChild(new FunctionExpression(), new ASTNode(new StringExpression("_putchar"))));
        rootNode.addChild(externNode);

        ASTNode dataSectionNode = new ASTNode(new DataSectionExpression());
        dataSectionNode.addChild(ASTNode.newWithChildren(new DefineByteExpression(), Arrays.asList(
            ASTNode.newWithChild(new IdentifierExpression(), new ASTNode(new StringExpression("write_mode"))),
            ASTNode.newWithChild(new ValueExpression(), ASTNode.newWithChildren(new ValueListExpression(), Arrays.asList(
                new ASTNode(new StringExpression("w")),
                new ASTNode(new IntegerExpression(0))
            )))
        )));
        dataSectionNode.addChild(ASTNode.newWithChildren(new DefineByteExpression(), Arrays.asList(
            ASTNode.newWithChild(new IdentifierExpression(), new ASTNode(new StringExpression("error_outofmemory"))),
            ASTNode.newWithChild(new ValueExpression(), ASTNode.newWithChildren(new ValueListExpression(), Arrays.asList(
                new ASTNode(new StringExpression("Fatal: The Operating System does not have enough memory available.")),
                new ASTNode(new IntegerExpression(0))
            )))
        )));
        rootNode.addChild(dataSectionNode);

        ASTNode bssSectionNode = new ASTNode(new BssSectionExpression());
        bssSectionNode.addChild(ASTNode.newWithChildren(new ReserveDoubleExpression(), Arrays.asList(
            ASTNode.newWithChild(new IdentifierExpression(), new ASTNode(new StringExpression("bf_memory"))),
            ASTNode.newWithChild(new ValueExpression(), new ASTNode(new IntegerExpression(1)))
        )));
        rootNode.addChild(bssSectionNode);

        ASTNode textSectionNode = new ASTNode(new TextSectionExpression());
        textSectionNode.addChild(ASTNode.newWithChild(new GlobalExpression(),
            ASTNode.newWithChild(new LabelExpression(), new ASTNode(new StringExpression("_main")))));
        textSectionNode.addChild(ASTNode.newWithChild(new DefineLabelExpression(),
            ASTNode.newWithChild(new LabelExpression(), new ASTNode(new StringExpression("_main")))));
        textSectionNode.addChild(ASTNode.newWithChildren(new MovExpression(), Arrays.asList(
            ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.EBP))),
            ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.ESP)))
        )));
        textSectionNode.addChild(ASTNode.newWithChild(new PushExpression(),
            ASTNode.newWithChild(new OperandExpression(),
                ASTNode.newWithChild(new DwordExpression(), new ASTNode(new IntegerExpression(1))))));
        textSectionNode.addChild(ASTNode.newWithChild(new PushExpression(),
            ASTNode.newWithChild(new OperandExpression(),
                ASTNode.newWithChild(new DwordExpression(), new ASTNode(new IntegerExpression(bfMemoryCellAmount))))));
        textSectionNode.addChild(ASTNode.newWithChild(new CallExpression(),
            ASTNode.newWithChild(new OperandExpression(),
                ASTNode.newWithChild(new FunctionExpression(), new ASTNode(new StringExpression("_calloc"))))));
        textSectionNode.addChild(ASTNode.newWithChildren(new AddExpression(), Arrays.asList(
            ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.ESP))),
            ASTNode.newWithChild(new OperandExpression(), new ASTNode(new IntegerExpression(8)))
        )));
        textSectionNode.addChild(ASTNode.newWithChildren(new TestExpression(), Arrays.asList(
            ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.EAX))),
            ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.EAX)))
        )));
        textSectionNode.addChild(ASTNode.newWithChild(new JzExpression(),
            ASTNode.newWithChild(new OperandExpression(),
                ASTNode.newWithChild(new LabelExpression(), new ASTNode(new StringExpression("error_exit_outofmemory"))))));
        textSectionNode.addChild(ASTNode.newWithChildren(new MovExpression(), Arrays.asList(
            ASTNode.newWithChild(new OperandExpression(),
                ASTNode.newWithChild(new MemoryAddressExpression(),
                    ASTNode.newWithChild(new IdentifierExpression(), new ASTNode(new StringExpression("bf_memory"))))),
            ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.EAX)))
        )));
        textSectionNode.addChild(ASTNode.newWithChildren(new MovExpression(), Arrays.asList(
            ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.EDI))),
            ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.EAX)))
        )));

        // Code Generated from the Intermediate AST
        convertIntermediateToTargetNodes(intermediateAST.getRoot(), "").forEach(textSectionNode::addChild);

        // Bottom of the BF.exe program -do not modify-
        textSectionNode.addChild(ASTNode.newWithChild(new JmpExpression(),
            ASTNode.newWithChild(new OperandExpression(),
                ASTNode.newWithChild(new LabelExpression(), new ASTNode(new StringExpression("normal_exit"))))));

        textSectionNode.addChild(ASTNode.newWithChild(new DefineLabelExpression(),
            ASTNode.newWithChild(new LabelExpression(), new ASTNode(new StringExpression("error_exit_outofmemory")))));
        textSectionNode.addChild(ASTNode.newWithChild(new PushExpression(),
            ASTNode.newWithChild(new OperandExpression(),
                ASTNode.newWithChild(new IdentifierExpression(), new ASTNode(new StringExpression("write_mode"))))));
        textSectionNode.addChild(ASTNode.newWithChild(new PushExpression(),
            ASTNode.newWithChild(new OperandExpression(),
                ASTNode.newWithChild(new DwordExpression(), new ASTNode(new IntegerExpression(stderr))))));
        textSectionNode.addChild(ASTNode.newWithChild(new CallExpression(),
            ASTNode.newWithChild(new OperandExpression(),
                ASTNode.newWithChild(new FunctionExpression(), new ASTNode(new StringExpression("_fdopen"))))));
        textSectionNode.addChild(ASTNode.newWithChildren(new AddExpression(), Arrays.asList(
            ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.ESP))),
            ASTNode.newWithChild(new OperandExpression(), new ASTNode(new IntegerExpression(8)))
        )));
        textSectionNode.addChild(ASTNode.newWithChild(new PushExpression(),
            ASTNode.newWithChild(new OperandExpression(),
                ASTNode.newWithChild(new IdentifierExpression(), new ASTNode(new StringExpression("error_outofmemory"))))));
        textSectionNode.addChild(ASTNode.newWithChild(new PushExpression(),
            ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.EAX)))));
        textSectionNode.addChild(ASTNode.newWithChild(new CallExpression(),
            ASTNode.newWithChild(new OperandExpression(),
                ASTNode.newWithChild(new FunctionExpression(), new ASTNode(new StringExpression("_fprintf"))))));
        textSectionNode.addChild(ASTNode.newWithChildren(new AddExpression(), Arrays.asList(
            ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.ESP))),
            ASTNode.newWithChild(new OperandExpression(), new ASTNode(new IntegerExpression(8)))
        )));
        textSectionNode.addChild(ASTNode.newWithChildren(new MovExpression(), Arrays.asList(
            ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.EAX))),
            ASTNode.newWithChild(new OperandExpression(), new ASTNode(new IntegerExpression(-1)))
        )));
        textSectionNode.addChild(ASTNode.newWithChild(new JmpShortExpression(),
            ASTNode.newWithChild(new OperandExpression(),
                ASTNode.newWithChild(new LabelExpression(), new ASTNode(new StringExpression("exit"))))));

        textSectionNode.addChild(ASTNode.newWithChild(new DefineLabelExpression(),
            ASTNode.newWithChild(new LabelExpression(), new ASTNode(new StringExpression("normal_exit")))));
        textSectionNode.addChild(ASTNode.newWithChildren(new MovExpression(), Arrays.asList(
            ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.EAX))),
            ASTNode.newWithChild(new OperandExpression(), new ASTNode(new IntegerExpression(0)))
        )));

        textSectionNode.addChild(ASTNode.newWithChild(new DefineLabelExpression(),
            ASTNode.newWithChild(new LabelExpression(), new ASTNode(new StringExpression("exit")))));
        textSectionNode.addChild(new ASTNode(new RetExpression()));
        rootNode.addChild(textSectionNode);

        return ast;
    }

    private Stream<ASTNode> convertIntermediateToTargetNodes(final ASTNode node, final String uniqueIndex) {
        Expression expression = node.getExpression();
        if (expression instanceof RootExpression) {
            return IntStream.range(0, node.getChildren().size())
                .boxed()
                .flatMap(i -> convertIntermediateToTargetNodes(node.getChildren().get(i), uniqueIndex + "_" + i));
        }
        if (expression instanceof MemoryLoopExpression) {
            return Stream.concat(
                Stream.of(
                    ASTNode.newWithChildren(new MovExpression(), Arrays.asList(
                        ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.AL))),
                        ASTNode.newWithChild(new OperandExpression(),
                            ASTNode.newWithChild(new MemoryAddressExpression(), new ASTNode(new RegisterExpression(Register.EDI))))
                    )),
                    ASTNode.newWithChildren(new TestExpression(), Arrays.asList(
                        ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.AL))),
                        ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.AL)))
                    )),
                    ASTNode.newWithChild(new JzExpression(),
                        ASTNode.newWithChild(new OperandExpression(),
                            ASTNode.newWithChild(new LabelExpression(), new ASTNode(new StringExpression("loop_end" + uniqueIndex))))),
                    ASTNode.newWithChild(new DefineLabelExpression(),
                        ASTNode.newWithChild(new LabelExpression(), new ASTNode(new StringExpression("loop_start" + uniqueIndex))))
                ),
                Stream.concat(
                    IntStream.range(0, node.getChildren().size())
                        .boxed()
                        .flatMap(i -> convertIntermediateToTargetNodes(node.getChildren().get(i), uniqueIndex + "_" + i)),
                    Stream.of(
                        ASTNode.newWithChildren(new MovExpression(), Arrays.asList(
                            ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.AL))),
                            ASTNode.newWithChild(new OperandExpression(),
                                ASTNode.newWithChild(new MemoryAddressExpression(), new ASTNode(new RegisterExpression(Register.EDI))))
                        )),
                        ASTNode.newWithChildren(new TestExpression(), Arrays.asList(
                            ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.AL))),
                            ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.AL)))
                        )),
                        ASTNode.newWithChild(new JnzExpression(),
                            ASTNode.newWithChild(new OperandExpression(),
                                ASTNode.newWithChild(new LabelExpression(), new ASTNode(new StringExpression("loop_start" + uniqueIndex))))),
                        ASTNode.newWithChild(new DefineLabelExpression(),
                            ASTNode.newWithChild(new LabelExpression(), new ASTNode(new StringExpression("loop_end" + uniqueIndex))))
                    )
                )
            );
        }
        if (expression instanceof MemoryPointerChangeExpression) {
            int changeValue = node.getChildren().get(0).getExpression(IntegerExpression.class).getInteger();
            if (changeValue == 0) {
                return Stream.empty();
            }
            return Stream.of(
                ASTNode.newWithChildren(new AddExpression(), Arrays.asList(
                    ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.EDI))),
                    ASTNode.newWithChild(new OperandExpression(), new ASTNode(new IntegerExpression(changeValue)))
                ))
            );
        }
        if (expression instanceof MemoryValueChangeExpression) {
            int changeValue = node.getChildren().get(0).getExpression(IntegerExpression.class).getInteger();
            if (changeValue == 0) {
                return Stream.empty();
            }
            return Stream.of(
                ASTNode.newWithChildren(new MovExpression(), Arrays.asList(
                    ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.AL))),
                    ASTNode.newWithChild(new OperandExpression(),
                        ASTNode.newWithChild(new MemoryAddressExpression(), new ASTNode(new RegisterExpression(Register.EDI))))
                )),
                ASTNode.newWithChildren(new AddExpression(), Arrays.asList(
                    ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.AL))),
                    ASTNode.newWithChild(new OperandExpression(), new ASTNode(new IntegerExpression(changeValue)))
                )),
                ASTNode.newWithChildren(new MovExpression(), Arrays.asList(
                    ASTNode.newWithChild(new OperandExpression(),
                        ASTNode.newWithChild(new MemoryAddressExpression(), new ASTNode(new RegisterExpression(Register.EDI)))),
                    ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.AL)))
                ))
            );
        }
        if (expression instanceof MemoryInputExpression) {
            return Stream.of(
                ASTNode.newWithChild(new CallExpression(),
                    ASTNode.newWithChild(new OperandExpression(),
                        ASTNode.newWithChild(new FunctionExpression(), new ASTNode(new StringExpression("_getchar"))))),
                ASTNode.newWithChildren(new MovExpression(), Arrays.asList(
                    ASTNode.newWithChild(new OperandExpression(),
                        ASTNode.newWithChild(new MemoryAddressExpression(), new ASTNode(new RegisterExpression(Register.EDI)))),
                    ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.AL)))
                ))
            );
        }
        if (expression instanceof MemoryOutputExpression) {
            return Stream.of(
                ASTNode.newWithChildren(new MovExpression(), Arrays.asList(
                    ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.AL))),
                    ASTNode.newWithChild(new OperandExpression(),
                        ASTNode.newWithChild(new MemoryAddressExpression(), new ASTNode(new RegisterExpression(Register.EDI))))
                )),
                ASTNode.newWithChild(new PushExpression(),
                    ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.EAX)))),
                ASTNode.newWithChild(new CallExpression(),
                    ASTNode.newWithChild(new OperandExpression(),
                        ASTNode.newWithChild(new FunctionExpression(), new ASTNode(new StringExpression("_putchar"))))),
                ASTNode.newWithChildren(new AddExpression(), Arrays.asList(
                    ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.ESP))),
                    ASTNode.newWithChild(new OperandExpression(), new ASTNode(new IntegerExpression(4)))
                ))
            );
        }
        throw new IllegalArgumentException("Node with unsupported expression type: " + expression);
    }
}


由于生成的Target AST确实很大,因此可以在Pastebin上找到它。

最后一步,我们有了Target Code Writer:

public class BFOptions {
    private final int memoryCellAmount;

    private BFOptions(final Builder builder) {
        this.memoryCellAmount = builder.memoryCellAmount;
    }

    public int getMemoryCellAmount() {
        return memoryCellAmount;
    }

    public static class Builder {
        private int memoryCellAmount;

        public Builder memoryCellAmount(final int memoryCellAmount) {
            this.memoryCellAmount = memoryCellAmount;
            return this;
        }

        public BFOptions build() {
            return new BFOptions(this);
        }
    }
}


它生成下面的ASM代码,可在Pastebin上使用。

然后是BFOptions类,用于将x86汇编代码实际编译为可执行文件:

public class TargetCodeWriter {
    public Stream<String> write(final AST targetAst) {
        return targetAst.getRoot().getChildren().stream()
            .flatMap(this::resolveNode);
    }

    private Stream<String> resolveNode(final ASTNode node) {
        Expression expression = node.getExpression();
        if (expression instanceof ExternExpression) {
            String functions = node.getChildren().stream()
                .flatMap(n -> resolveNode(n, FunctionExpression.class))
                .collect(Collectors.joining(", "));
            return Stream.of("extern " + functions);
        }
        if (expression instanceof FunctionExpression) {
            return Stream.of(resolveFunction(node));
        }
        if (expression instanceof DataSectionExpression) {
            return Stream.concat(
                Stream.of("section .data"),
                resolveChildren(node)
            );
        }
        if (expression instanceof DefineByteExpression) {
            String identifier = resolveIdentifier(node.getChild(0, IdentifierExpression.class));
            String value = resolveValue(node.getChild(1, ValueExpression.class));
            return Stream.of(identifier + " db " + value);
        }
        if (expression instanceof BssSectionExpression) {
            return Stream.concat(
                Stream.of("section .bss"),
                resolveChildren(node)
            );
        }
        if (expression instanceof ReserveDoubleExpression) {
            String identifier = resolveIdentifier(node.getChild(0, IdentifierExpression.class));
            String value = resolveValue(node.getChild(1, ValueExpression.class));
            return Stream.of(identifier + " resd " + value);
        }
        if (expression instanceof TextSectionExpression) {
            return Stream.concat(
                Stream.of("section .text"),
                resolveChildren(node)
            );
        }
        if (expression instanceof GlobalExpression) {
            String label = resolveLabel(node.getChild(0, LabelExpression.class));
            return Stream.of("global " + label);
        }
        if (expression instanceof DefineLabelExpression) {
            String label = resolveLabel(node.getChild(0, LabelExpression.class));
            return Stream.of(label + ":");
        }
        if (expression instanceof MovExpression) {
            String operand1 = resolveOperand(node.getChild(0, OperandExpression.class));
            String operand2 = resolveOperand(node.getChild(1, OperandExpression.class));
            return Stream.of("mov " + operand1 + ", " + operand2);
        }
        if (expression instanceof PushExpression) {
            String operand = resolveOperand(node.getChild(0, OperandExpression.class));
            return Stream.of("push " + operand);
        }
        if (expression instanceof CallExpression) {
            String operand = resolveOperand(node.getChild(0, OperandExpression.class));
            return Stream.of("call " + operand);
        }
        if (expression instanceof AddExpression) {
            String operand1 = resolveOperand(node.getChild(0, OperandExpression.class));
            String operand2 = resolveOperand(node.getChild(1, OperandExpression.class));
            return Stream.of("add " + operand1 + ", " + operand2);
        }
        if (expression instanceof TestExpression) {
            String operand1 = resolveOperand(node.getChild(0, OperandExpression.class));
            String operand2 = resolveOperand(node.getChild(1, OperandExpression.class));
            return Stream.of("test " + operand1 + ", " + operand2);
        }
        if (expression instanceof JzExpression) {
            String operand = resolveOperand(node.getChild(0, OperandExpression.class));
            return Stream.of("jz " + operand);
        }
        if (expression instanceof JnzExpression) {
            String operand = resolveOperand(node.getChild(0, OperandExpression.class));
            return Stream.of("jnz " + operand);
        }
        if (expression instanceof JmpExpression) {
            String operand = resolveOperand(node.getChild(0, OperandExpression.class));
            return Stream.of("jmp " + operand);
        }
        if (expression instanceof JmpShortExpression) {
            String operand = resolveOperand(node.getChild(0, OperandExpression.class));
            return Stream.of("jmp short " + operand);
        }
        if (expression instanceof RetExpression) {
            return Stream.of("ret");
        }
        throw new IllegalStateException("Unsupported expression: " + expression);
    }

    private Stream<String> resolveNode(final ASTNode node, final Class<? extends Expression> expectedExpression) {
        Expression expression = node.getExpression();
        if (!expectedExpression.isInstance(expression)) {
            throw new IllegalStateException("Cannot resolve node " + node + ": Expected " + expectedExpression + ", found " + expression);
        }
        return resolveNode(node);
    }

    private Stream<String> resolveChildren(final ASTNode node) {
        return node.getChildren().stream().flatMap(this::resolveNode);
    }

    private String resolveIdentifier(final ASTNode node) {
        return node.getChildExpression(0, StringExpression.class).getString();
    }

    private String resolveValue(final ASTNode node) {
        Expression expression = node.getExpression();
        if (expression instanceof ValueExpression) {
            return resolveValue(node.getChild(0, Expression.class));
        }
        if (expression instanceof ValueListExpression) {
            return node.getChildren().stream().map(this::resolveValue).collect(Collectors.joining(", "));
        }
        if (expression instanceof StringExpression) {
            return "\"" + resolveString(node) + "\"";
        }
        if (expression instanceof IntegerExpression) {
            return resolveInteger(node);
        }
        throw new IllegalStateException("Unsupported expression while resolving value: " + expression);
    }

    private String resolveString(final ASTNode node) {
        return node.getExpression(StringExpression.class).getString();
    }

    private String resolveInteger(final ASTNode node) {
        return String.valueOf(node.getExpression(IntegerExpression.class).getInteger());
    }

    private String resolveLabel(final ASTNode node) {
        return node.getChildExpression(0, StringExpression.class).getString();
    }

    private String resolveOperand(final ASTNode node) {
        Expression expression = node.getExpression();
        if (expression instanceof OperandExpression) {
            return resolveOperand(node.getChild(0, Expression.class));
        }
        if (expression instanceof RegisterExpression) {
            return resolveRegister(node);
        }
        if (expression instanceof DwordExpression) {
            return "dword " + resolveOperand(node.getChild(0, Expression.class));
        }
        if (expression instanceof IntegerExpression) {
            return resolveInteger(node);
        }
        if (expression instanceof FunctionExpression) {
            return resolveFunction(node);
        }
        if (expression instanceof LabelExpression) {
            return resolveLabel(node);
        }
        if (expression instanceof MemoryAddressExpression) {
            return "[" + resolveOperand(node.getChild(0, Expression.class)) + "]";
        }
        if (expression instanceof IdentifierExpression) {
            return resolveIdentifier(node);
        }
        throw new IllegalStateException("Unsupported expression while resolving operand: " + expression);
    }

    private String resolveRegister(final ASTNode node) {
        return node.getExpression(RegisterExpression.class).getRegister().name().toLowerCase(Locale.ENGLISH);
    }

    private String resolveFunction(final ASTNode node) {
        return node.getChildExpression(0, StringExpression.class).getString();
    }
}


BFCompiler类:

public class BFCompiler {
    public void compile(final Stream<String> targetCode, Path workingDirectory, String programName) throws IOException, InterruptedException {
        Path assemblyFile = workingDirectory.resolve(programName + ".asm");
        Files.deleteIfExists(assemblyFile);
        Files.write(assemblyFile, (Iterable<String>)targetCode::iterator, StandardCharsets.UTF_8, StandardOpenOption.CREATE_NEW);

        ProcessBuilder processBuilder = new ProcessBuilder().directory(workingDirectory.toFile()).redirectErrorStream(true);

        Process nasmProcess = processBuilder.command("nasm", "-f", "win32", assemblyFile.getFileName().toString()).start();
        nasmProcess.waitFor();
        List<String> nasmOutput = ProcessUtils.toInputStream(nasmProcess.getInputStream());
        if (!nasmOutput.isEmpty()) {
            throw new IllegalStateException("Compiling failed when invoking NASM: " + String.join(System.lineSeparator(), nasmOutput));
        }

        Process gccProcess = processBuilder.command("gcc", "-o", programName, programName + ".obj").start();
        gccProcess.waitFor();
        if (!nasmOutput.isEmpty()) {
            throw new IllegalStateException("Compiling failed when invoking GCC: " + String.join(System.lineSeparator(), nasmOutput));
        }
    }
}


最后是ProcessUtils类,它还显示了其他类的一些用法:

public class ProcessUtils {
    private ProcessUtils() {
        throw new UnsupportedOperationException();
    }

    public static List<String> toInputStream(final InputStream inputStream) throws IOException {
        List<String> output = new ArrayList<>();
        try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                output.add(line);
            }
        }
        return output;
    }
}


完整的存储库可以在GitHub上找到。

我想回顾一下代码的各个方面,尽管我会提到以下内容其中一些我不太满意,可能会帮助您撰写评论:


Main子类的数量可能不是一个好主意。
Expression编写所有代码考虑到数据结构可能不是一个好主意,使用Stream可能更清楚。
我不喜欢ListASTNode之间有很多耦合。
很多不同阶段的构造函数实际上没有任何参数。
从一种表达式类型尤其是:


Expression到源表达式
从源表达式到中间表达式
从中间表达式到目标表达式
从目标表达式到装配线



我不确定是否应在Stream<Token>
子类中处理解析部分。

我为撕裂此代码的精神影响做好了准备,但至少可以产生正确的结果。

评论

我喜欢这部分-“我已准备好将这段代码撕裂的精神影响力” :)

兄弟,把它发布在Github上。

#1 楼

您的TargetCodeGenerator的generateAST具有嵌入到一个方法中的多个抽象级别。

这是一个很长的方法。

它具有所有这些非常复杂的细节-当我看到诸如

    dataSectionNode.addChild(ASTNode.newWithChildren(new DefineByteExpression(), Arrays.asList(
        ASTNode.newWithChild(new IdentifierExpression(), new ASTNode(new StringExpression("write_mode"))),
        ASTNode.newWithChild(new ValueExpression(), ASTNode.newWithChildren(new ValueListExpression(), Arrays.asList(
            new ASTNode(new StringExpression("w")),
            new ASTNode(new IntegerExpression(0))
        )))
    )));


,我开始略读而不是阅读。这些超级详细的内容是怎么回事?

当然有什么帮助呢?

// Header of the BF.exe program -do not modify-


这是一个标头。

在回头讨论generateAST的复杂性之前,让我们快速看一下该注释-它说“请勿修改”。听众的评论?可能只有你。但是您也是(这样的人)知道为什么会出现警告的人。

您是否考虑过将注释转换为函数?

    ASTNode externNode = new ASTNode(new ExternExpression());
    ...
    rootNode.addChild(externNode);

    ASTNode dataSectionNode = new ASTNode(new DataSectionExpression());
    ...
    rootNode.addChild(dataSectionNode);

    ASTNode bssSectionNode = new ASTNode(new BssSectionExpression());
    ...
    rootNode.addChild(bssSectionNode);

    ASTNode textSectionNode = new ASTNode(new TextSectionExpression());
    textSectionNode.addChild(...);
    textSectionNode.addChild(...);
    textSectionNode.addChild(...);
    textSectionNode.addChild(...);
    textSectionNode.addChild(...;
    textSectionNode.addChild(...);

    // Code Generated from the Intermediate AST
    convertIntermediateToTargetNodes(intermediateAST.getRoot(), "").forEach(textSectionNode::addChild);

    // Bottom of the BF.exe program -do not modify-
    textSectionNode.addChild(...);

    textSectionNode.addChild(...);
    textSectionNode.addChild(...);
    textSectionNode.addChild(...);
    textSectionNode.addChild(...);
    textSectionNode.addChild(...);
    textSectionNode.addChild(...);
    textSectionNode.addChild(...);
    textSectionNode.addChild(...);
    textSectionNode.addChild(...);
    textSectionNode.addChild(...);

    textSectionNode.addChild(...);

    textSectionNode.addChild(...);
    textSectionNode.addChild(new ASTNode(new RetExpression()));
    rootNode.addChild(textSectionNode);


我将添加子位的主体转换为椭圆形。

我认为externNodedataSectionNodebssSectionNode都可能是addHeader方法的一部分。也许有单独的函数来创建externNodedataSectionNode等。

其余的可能进入了createBody方法-老实说,没有页脚,因为标头之后的所有内容都进入textSectionNode。好像是定义calloc和main方法之类的东西。我希望createBody考虑像textSectionNode.addChild(createNodeForCalloc())这样的行。当然,内部也必须有一个主体,因此createBody可能需要中间的AST。

您不希望看到的(这是您现在拥有的)是70行(如果不是更多的话,我只是估计一下)看起来像是由IDE的GUI工具生成的初始化代码的代码。您会不经意间浏览一下这样的代码,因为它是自动生成的。

如此重要的主要原因是因为编译隐藏在所有这些步骤之间。

在所有这些固定值/节点创建之间的是这行

    // Code Generated from the Intermediate AST
    convertIntermediateToTargetNodes(intermediateAST.getRoot(), "").forEach(textSectionNode::addChild);


这不是应该隐藏的行,因为它是实际的代码。

...

也许,当您将代码拆分成许多小函数时,您会找到一种摆脱添加Add和Mov操作之间重复的方法。他们俩似乎都接受两个论点。

    textSectionNode.addChild(ASTNode.newWithChildren(new AddExpression(), Arrays.asList(
        ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.ESP))),
        ASTNode.newWithChild(new OperandExpression(), new ASTNode(new IntegerExpression(8)))
    )));


将其转换为

textSectionNode.addChild(createNewAddNode(new RegisterExpression(Register.ESP), new IntegerExpression(8)));

的原因是,您摆脱了琐事并展示了相关代码。简而言之,

写一些东西可以从表达式创建表达式,而不必自己创建ASTNode。如果您需要更复杂的结构,则可以创建ASTNode,但是在定义程序时ASTNode似乎有些杂讯。

使用创建表达式的简化方式来生成代码,可以重新构造generateAST,这样更容易阅读并查看其功能。

#2 楼

我更改的内容(又名TOC):


将不同的AST彼此分开
尽可能将ASTNode类折叠成枚举
生成ASTNode和AST以生成一个<T extends AstExpression>

以基于枚举的变压器实现IntermediateCodeGenerator


代码可以(也不能)区分intermediateASTsourceASTtargetAST,因为它们是

要缓解这一点,第一步是使这两个不同。因此,我向ASTNodeAST引入了一个通用参数,该通用参数仅限于ASTExpression的子类型。
>

AST节点类和AST检查可以在很大程度上替换为以下枚举:

public class ASTNode<T extends AstExpression> {
    private final T expression;

    private ASTNode<T> parent;

    private final List<ASTNode<T>> children = new LinkedList<>();

    public ASTNode(final T expression) {
        this.expression = expression;
    }
    // ...
}



public enum SourceExpressions implements AstExpression {
    INCREMENT, DECREMENT, INPUT, LOOP, OUTPUT, POINTER_LEFT, POINTER_RIGHT, ROOT;

    @Override
    public boolean isLogicalLeaf() {
        return this != LOOP;
    }
}



public enum IntermediateExpressions implements AstExpression {
    ADD_MULTIPLE_OF_STORED_VALUE, INPUT, LOOP, OUTPUT, POINTER_CHANGE, VALUE_SET, VALUE_STORE, VALUE_CHANGE, VALUE, ROOT;

    @Override
    public boolean isLogicalLeaf() {
        return this != LOOP;
    }
}


这会大大减少您拥有的类的数量,并允许现有的静态分析器在处理各个表达式时推论假设。尤其有用的是“应该处理每个枚举成员”的检查。虽然我们处于以下instanceof检查中:现在,我们将这些表达式替换为命名良好的枚举,我们可以更改instanceof的工作原理:

我们可以从执行上述映射所需的数据中提取从IntermediateCodeGeneratorSourceExpressions的映射过程。

请考虑以下内容:

public enum TargetExpressions implements TargetExpression {
    ADD, BSS_SECTION, BYTE, CALL, DATA_SECTION, DEFINE_BYTE, DEFINE_LABEL, DWORD, EXTERN, FUNCTION, GLOBAL, IDENTIFIER,
    JMP, JNZ, JZ, LABEL, MEMORY_ADDRESS, MOV, MUL, OPERAND, PUSH, REGISTER, RESERVE_DOUBLE, RET, TEST, TEXT_SECTION,
    VALUE, VALUE_LIST, XOR, ROOT, JMP_SHORT;

    @Override
    public boolean isLogicalLeaf() {
        return false;
    }
}


我们需要的仅仅是映射的以下定义:

public static AST<IntermediateExpressions> generateAST(final AST<SourceExpressions> sourceAST) {
    return new AST<>(sourceToIntermediateNode(sourceAST.getRoot()));
}

private static ASTNode<IntermediateExpressions> sourceToIntermediateNode(final ASTNode<SourceExpressions> node) {
    SourceExpressions expression = node.getExpression();
    final Function<ASTNode<SourceExpressions>, ASTNode<IntermediateExpressions>> nodeFunction = transformer.get(expression);
    if (nodeFunction == null) {
        throw new IllegalArgumentException("Node with unknown expression type: " + expression);
    }
    return nodeFunction.apply(node);
}   


不幸的是,目标代码生成并没有。这些更改与中间代码生成一样,从中受益匪浅。我们要做的是,可以添加IntermediateExpressions而不是链接if语句。


我可能还做了其他一些更改,但是对于我自己的一生,我无法弄清自从我起就坐着的4.5kLoC差异中到底发生了什么

如果需要的话,请给我发笔记:)