Psellos
Life So Short, the Craft So Long to Learn

Convert ARM Assembly Code for Apple’s iOS Assembler

July 10, 2012

Recently I’ve been working on getting OCaml 4.00.0 working on iOS. As I write this, 4.00.0 is the newest version of OCaml, not yet released but available as a beta. I’m treating it as a new project, not trying to re-use any of the patches we’ve been using for OCaml 3.10.2.

The first interesting problem I hit is that Apple’s ARM assembler for iOS (called “as”, the traditional Unix name) is quite different from other ARM assemblers. Although it derives ultimately from the same GNU codebase, it appears the Apple assembler split off many years ago and has followed a separate evolutionary path.

This needs to be solved for an OCaml-on-iOS port, because part of the OCaml runtime is written in assembly code—a file named arm.S. For the work on OCaml 3.10.2, we rewrote arm.S extensively by hand. This time, I decided to write a Python script to convert ARM assembly code from the current GNU format to the format used by Apple’s iOS assembler. This keeps the changes consistent, and it ought to help when arm.S is rewritten in the future.

Note: I wrote a new, improved version of this script, described in Convert Linux ARM Assembly Code for iOS (Update 3).

So now I have a script named arm-as-to-ios that works well enough to convert arm.S to a form that can be assembled for iOS. It’s nothing fancy; currently it just makes the following changes:

  • Replace uses of =value notation by explicit loads from memory. The usual ARM assemblers interpret ldr rM, =value to mean that the value should be loaded into register M immediately (using mov) if possible, and loaded from memory (using ldr with PC-relative addressing) otherwise. The Apple assembler seems not to support this. arm-as-to-ios replaces uses of =value with explicit memory loads, emitting the pool of values into the .text segment at the end of the file.

  • Remove uses of two pseudo-ops, .type and .size. They aren’t supported by Apple’s assembler. This is done by defining null macros for them.

  • Define a macro cbz. The cbz instruction is Thumb-only. This definition replaces it with a pair of ARM instructions (cmp and beq). Note that the macro parameter syntax of Apple’s assembler is different from (possibly just more restrictive than) that of the usual GNU tools.
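
As a rough illustration of the first change, here is a minimal Python 3 sketch of the =value rewrite. It reuses the regular expression and the PL prefix from the script shown later in this post. The function name rewrite_literal_loads and the sample input are made up for this example, and it works on plain lines rather than the parsed triples the real script uses.

```python
import re

PREFIX = 'PL'  # same prefix the script uses for derived pool labels

# Same pattern as g_ldre in the script: "ldr rM, =" followed by a value.
LDR_RE = re.compile(
    r'(ldr[ \t][^,]*,[ \t]*)=(([^ \t\n@,/]|/(?!\*))*)(.*)', re.DOTALL)


def rewrite_literal_loads(lines):
    """Hypothetical helper: rewrite '=value' loads and append a pool."""
    pool = []   # values in order of first use, without duplicates
    out = []
    for line in lines:
        stripped = line.lstrip()
        indent = line[:len(line) - len(stripped)]
        mo = LDR_RE.match(stripped)
        if mo:
            value = mo.group(2)
            if value not in pool:
                pool.append(value)
            # Load from the pool label instead of using =value.
            out.append(indent + mo.group(1) + PREFIX + value + mo.group(4))
        else:
            out.append(line)
    if pool:
        # Emit the pool into the text segment after all other lines.
        out.append('        .text')
        out.append('        .align 2')
        for value in pool:
            out.append(PREFIX + value + ':')
            out.append('        .long ' + value)
    return out


# Illustrative input only; the symbol name is an assumption.
sample = [
    '        ldr     r12, =caml_young_ptr',
    '        str     r8, [r12]',
    '        ldr     r12, =caml_young_ptr',
]
for line in rewrite_literal_loads(sample):
    print(line)
```

Both loads in the sample end up referring to the same pool label, so the pool holds one .long entry no matter how often a value is used.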

Another advantage of using a script is that it might be useful to other people who need to port assembly code to iOS. Granted, there probably aren’t a lot of people doing this. But if you are, maybe the script will provide a useful starting point.

You can download the script here:

I have no doubt that I’ll need to update the script as the project progresses. I’ll keep the linked script up to date. If there are large changes I’ll make another post about them.

Here, also, is the current text of the script:

#!/usr/bin/env python
#
# arm-as-to-ios     Modify ARM assembly code for the iOS assembler
#
# Copyright (c) 2012 Psellos   http://psellos.com/
# Licensed under the MIT License:
#     http://www.opensource.org/licenses/mit-license.php
#
# Resources for running OCaml on iOS: http://psellos.com/ocaml/
#
import sys
import re
from functools import reduce  # builtin in Python 2; must be imported in Python 3

VERSION = '1.0.0'


def add_macro_defs(instrs):
    # Emit compatibility macros.
    #
    # cbz:    Thumb only; replace with cmp/beq for ARM
    # .type:  Not supported by Apple assembler
    # .size:  Not supported by Apple assembler
    #
    skippable = r'$|\.syntax[ \t]'
    i = 0
    for i in range(len(instrs)):
        if not re.match(skippable, instrs[i][1]):
            break
    instrs[i:0] = [
        ('', '', '\n'),
        ('/* Apple compatibility macros */', '', '\n'),
        ('        ', '.macro  cbz', '\n'),
        ('        ', 'cmp     $0, #0', '\n'),
        ('        ', 'beq     $1', '\n'),
        ('        ', '.endm', '\n'),
        ('        ', '.macro  .type', '\n'),
        ('        ', '.endm', '\n'),
        ('        ', '.macro  .size', '\n'),
        ('        ', '.endm', '\n'),
        ('', '', '\n')
    ]
    return instrs


# Prefix for derived symbols
#
g_prefix = 'PL'

# Regular expression for modified ldr lines
#
g_ldre = r'(ldr[ \t][^,]*,[ \t]*)=(([^ \t\n@,/]|/(?!\*))*)(.*)'


def explicit_address_loads(instrs):
    # The GNU assembler allows the following:
    #
    #     ldr rM, =symbol
    #
    # which loads rM with [mov] (immediately) if possible, or creates an
    # entry in memory for the symbol value and loads it PC-relatively
    # with [ldr].
    #
    # The Apple assembler doesn't seem to support this notation.  If the
    # value is a suitable constant, it emits a valid [mov].  Otherwise
    # it seems to emit an invalid [ldr] that always generates an error.
    # (At least I have not been able to make it work).  So, change uses
    # of =symbol to explicit PC-relative loads.
    #
    # This requires a pool containing the addresses to be loaded.  For
    # now, we just keep track of it ourselves and emit it into the text
    # segment at the end of the file.
    syms = {}
    result = []

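    def change1(acc, instr):
        # Tuple parameters are a syntax error in Python 3, so unpack
        # the accumulator and the instruction triple explicitly.
        (syms, result) = acc
        (a, b, c) = instr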
        global g_prefix
        global g_ldre
        mo = re.match(g_ldre, b, re.DOTALL)
        if mo:
            if mo.group(2) not in syms:
                syms[mo.group(2)] = len(syms)
            newb = (mo.group(1) + g_prefix + mo.group(2) + mo.group(4))
            result.append((a, newb, c))
        else:
            result.append((a, b, c))
        return (syms, result)

    def pool1(result, s):
        global g_prefix
        result.append(('', g_prefix + s + ':', '\n'))
        result.append(('        ', '.long ' + s, '\n'))
        return result

    reduce(change1, instrs, (syms, result))
    if len(syms) > 0:
        result.append(('', '', '\n'))
        result.append(('/* Pool of addresses loaded into registers */',
                        '', '\n'))
        result.append(('', '', '\n'))
        result.append(('        ', '.text', '\n'))
        result.append(('        ', '.align 2', '\n'))
        reduce(pool1, sorted(syms, key=syms.get), result)
    return result


def read_input():
    # Concatenate all the input files into a string.
    #
    def fnl(s):
        if s == '' or s[-1] == '\n':
            return s
        else:
            return s + '\n'

    if len(sys.argv) < 2:
        return fnl(sys.stdin.read())
    else:
        input = ""
        for f in sys.argv[1:]:
            try:
                fd = open(f)
                input = input + fnl(fd.read())
                fd.close()
            except IOError:
                sys.stderr.write('arm-as-to-ios: cannot open ' + f + '\n')
        return input


def parse_instrs(s):
    # Parse the string into assembly instructions while tolerating C
    # preprocessor lines.  Each instruction is represented as a triple:
    # (space/comments, instruction, end).  The end is either ';' or
    # '\n'.  Instructions can have embedded comments, but they won't get
    # fixed up if they do.  (I've never seen it in real code.)
    #
    def goodmo(mo):
        if mo is None:
            # Should never happen
            sys.stderr.write('arm-as-to-ios: internal parsing error\n')
            sys.exit(1)

    cpp_re = '([ \t]*#([^\n]*\\\\\n)*[^\n]*[^\\\\\n])\n'
    instr_re = (
        r'(([ \t]|/\*.*?\*/|@[^\n]*)*)'  # Spaces & comments
        r'(([ \t]|/\*.*?\*/|[^;\n])*)'   # "Instruction"
        r'([;\n])'                       # End
    )
    instrs = []
    while s != '':
        if re.match('[ \t]*#', s):
            mo = re.match(cpp_re, s)
            goodmo(mo)
            instrs.append((mo.group(1), '', '\n'))
        else:
            mo = re.match(instr_re, s, re.DOTALL)
            goodmo(mo)
            instrs.append((mo.group(1), mo.group(3), mo.group(5)))
        s = s[len(mo.group(0)):]
    return instrs


def main():
    instrs = parse_instrs(read_input())
    instrs = add_macro_defs(instrs)
    instrs = explicit_address_loads(instrs)
    for (a, b, c) in instrs:
        sys.stdout.write(a + b + c)


main()
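
The triple representation used by parse_instrs can be seen in isolation with a small Python 3 sketch. It borrows the instruction regular expression from the script and ignores C preprocessor lines. The function name split_triples and the sample text are assumptions made for this example.

```python
import re

# Same three-part pattern as instr_re in the script.
INSTR_RE = re.compile(
    r'(([ \t]|/\*.*?\*/|@[^\n]*)*)'   # leading spaces and comments
    r'(([ \t]|/\*.*?\*/|[^;\n])*)'    # the "instruction" text
    r'([;\n])',                       # terminator: ';' or newline
    re.DOTALL)


def split_triples(text):
    """Hypothetical helper: split text into (lead, instruction, end) triples."""
    triples = []
    pos = 0
    while pos < len(text):
        mo = INSTR_RE.match(text, pos)
        if mo is None:
            # Happens only if the text lacks a final ';' or newline.
            raise ValueError('unparseable input at offset %d' % pos)
        triples.append((mo.group(1), mo.group(3), mo.group(5)))
        pos = mo.end()
    return triples


# Illustrative input: two instructions on one line, separated by ';'.
sample = '/* save */ mov r0, #1; add r1, r1, r0\n'
for triple in split_triples(sample):
    print(triple)
```

The sample yields two triples, one ending in ';' and one ending in a newline. Because the leading comment is kept in the first field, writing the three fields back out in order reproduces the input exactly, which is how the script leaves untouched lines unchanged.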

Copy and paste the lines into a file named arm-as-to-ios (or download it from the above link). Mark it as a script with chmod:

$ chmod +x arm-as-to-ios

To use the script, specify the name of an ARM assembly file. If no files are given, the script processes its standard input. The following shows a successful assembly of arm.S:

$ PLT=/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform
$ PLTBIN=$PLT/Developer/usr/bin
$ arm-as-to-ios asmrun/arm.S | cpp > armios.S
$ $PLTBIN/as -arch armv6 -o armios.o armios.S
$ file armios.o
armios.o: Mach-O object arm
$ otool -tv armios.o | head
armios.o:
(__TEXT,__text) section
caml_call_gc:
00000000        e59fc2a0        ldr     ip, [pc, #672]  @ 0x2a8
00000004        e58ce000        str     lr, [ip]
.Lcaml_call_gc:
00000008        e59fc29c        ldr     ip, [pc, #668]  @ 0x2ac
0000000c        e58cd000        str     sp, [ip]
00000010        ed2d0b10        vstmdb  sp!, {d0-d7}
00000014        e92d50ff        push    {r0, r1, r2, r3, r4, r5, r6, r7, ip, lr}
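
The ldr instructions in this listing are PC-relative loads from the pool that the script emits. As a quick check of the arithmetic, the following Python 3 sketch decodes the 12-bit offset from an instruction word and computes the address being loaded. It relies on the ARM-mode rule that the PC reads as the address of the current instruction plus 8. The function name ldr_literal_target is made up for this example, and it handles only the simple ldr rX, [pc, #imm] form.

```python
def ldr_literal_target(address, word):
    """Hypothetical helper: target of an ARM-mode 'ldr rX, [pc, #imm]'.

    Assumes the word really is an immediate-offset ldr with pc as the
    base register; no other encodings are handled.
    """
    base_reg = (word >> 16) & 0xF      # bits 19..16: base register
    if base_reg != 15:
        raise ValueError('base register is not pc')
    offset = word & 0xFFF              # bits 11..0: unsigned immediate
    if not (word >> 23) & 1:           # U bit clear: offset is subtracted
        offset = -offset
    return address + 8 + offset        # in ARM mode, pc reads as address + 8


# Address/word pairs copied from the otool output above.
for address, word in [(0x00000000, 0xE59FC2A0), (0x00000008, 0xE59FC29C)]:
    print(hex(ldr_literal_target(address, word)))
```

The two results are 0x2a8 and 0x2ac, matching the addresses otool prints after the @ sign. These are adjacent 4-byte slots, which is consistent with a pool of .long entries.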

If you have any corrections, improvements, or other comments, leave them below or email me at jeffsco@psellos.com. I’d be very pleased to hear if the script has been helpful to anyone.

Posted by: Jeffrey
