OCamlXSim 3.1 for Mountain Lion

October 23, 2012

For those interested in building iOS Simulator apps in OCaml 4, I’ve just revamped OCamlXSim 3.1 for the latest OS X release, OS X 10.8 (Mountain Lion). The only difference is in the default iOS SDK, which I changed from iOS 5.1 to iOS 6.0. Otherwise, this was just a recompile.

You can get binary releases of OCamlXSim here:

For information on how to build from sources and how to test an installation, see the updated version of Compile OCaml for iOS Simulator.

If you’re new to this site, you might also be interested in OCamlXARM, a modified version of OCaml 4.00.0 that builds iOS apps. I also revamped it recently to work under Mountain Lion. You can read about it on Compile OCaml for iOS.

OCaml Cross Compilation Build Howto

OCamlXSim and OCamlXARM are both cross compilers, and they’re built using exactly the same approach. I think the strategy could be useful for building other OCaml cross compilers, so I thought I’d explain how the build process works in some detail. I’m not claiming that the method is original; however, I did develop it independently and it works for my host and targets.

Since the stock version of OCaml doesn’t want to be a cross compiler, the overall goal is to beguile it into being one without disrupting the build process too much. To keep things simple for now, I build a bytecode cross compiler that generates native code for the target; i.e., a cross-compiling version of ocamlopt. The approach requires that OCaml already supports the host system with at least a bytecode implementation, and the target system with a native code implementation.

Building the equivalent “optimized” cross compiler (ocamlopt.opt) doesn’t seem too much harder, given a native OCaml compiler for the host system. I’d like to get this working at some point.

Compiler Source Changes

This note just describes the commands I use to build the cross compilers. It doesn’t describe the changes to the compiler source itself. These will vary a lot depending on the target and the differences between the host and the target.

There are no source changes for OCamlXSim when building a 32-bit OS X host executable, because the host and target have virtually identical properties. Even for a 64-bit OS X executable, the changes are minimal, because the host and target are quite similar. There is one change in asmrun/signals_osdep.h, which must be modified to include the proper signal handling code in a cross-compiling environment (when the host and the target architectures are different). Another change in the code generator makes sure that emitted native int values don’t exceed 32 bits.

The compiler source changes for OCamlXARM are much more extensive, because the iOS target isn’t directly supported in the stock OCaml release. The same signal-handling change was required, and many (reasonably straightforward) changes were required in the emission of assembly code to allow for the particular syntax of the iOS assembler.

In cases where the host and target machines are very different, it may be necessary to make significant changes to the architecture-dependent code that emits instructions and data.

If you’re interested in the exact compiler changes for OCamlXSim or OCamlXARM, see their associated pages (linked above) for a description of how to retrieve the patches.

Ordinary OCaml Build

As a starting point for the build process, consider the ordinary OCaml build process:

$ ./configure
$ make world
$ make opt

The configure step does many things:

Guess the CPU type and operating system of the host.
Find a C compiler and associated assembler and linker.
Determine properties of the machine (integer sizes, endianness).
Determine properties of the system (available system calls and libraries).

Since OCaml sees itself as a native compiler, all these configuration properties are assumed to apply both to the compiler itself and to the programs it generates. This isn’t the case for a cross compiler, and the key undertaking is to separate the two.

The make world step builds the bytecode compiler (ocamlc) and bytecode runtime. The bytecode runtime consists of a native-code program named ocamlrun and a set of dynamically loadable executables for extra libraries. ocamlrun, in turn, consists of a bytecode interpreter and native-code primitives. Each dynamic library contains bytecode plus extra native-code primitives.

The make opt step builds the native code compiler (ocamlopt) and a native runtime. The native runtime consists of a set of native libraries, very similar to the bytecode runtime minus the interpreter.

When you do an ordinary compile of an OCaml program with ocamlopt, ocamlopt itself uses the bytecode runtime created in the make world step. The compiled program links against the native runtime created in the make opt step.

Cross Compiling Requirements

Getting a cross compiler out of the same build system requires reconsidering each of these configuration properties:

The CPU type is used to select the correct native code generator. So the CPU type of the host isn’t so interesting. We want to specify the CPU type of the target.
The C compiler and linker are needed for building the bytecode runtime for the host. However, we also want a target toolchain C compiler, assembler, and linker to be used for generated programs.
Similarly, the machine and system properties are correct for building the bytecode runtime on the host. But we want the target machine and system properties for building the runtime to be used by generated programs.

This suggests a two-phase build process:

Phase 1: run configure as usual to determine the properties of the host system. Post-modify the configuration properties just enough to create a native-code cross compiler for the target. Then build the native-code compiler as usual. This native-code compiler runs on the bytecode interpreter (ocamlrun) of the host, and generates native code for the target.
Phase 2: run configure on the target system to determine the properties of the target system. Then rebuild just the runtime on the host using the target toolchain and these properties of the target system. The resulting runtime works for the compiled programs.

If the target system is insufficiently Unix-like to run the configure script, it will be necessary to determine the configuration parameters by some other method.

This is how both OCamlXARM and OCamlXSim are built. For people really interested in the details, the following sections show the build process for OCamlXSim 3.1.7. You’ll find the code in an OS X shell script named xsim-build.
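As a rough sketch of how the two phases fit together, a driver could run the four steps in order and stop at the first failure. The step names config1, build1, config2, and build2 are the functions shown in the sections below; the run_steps helper is hypothetical and isn't taken from xsim-build.

```shell
#!/bin/sh
# Hypothetical driver, not part of xsim-build: run each named step in
# order and stop at the first one that fails.
run_steps () {
    for step in "$@"; do
        echo "==> $step"
        "$step" || { echo "step failed: $step" >&2; return 1; }
    done
}

# Intended use, once the four functions are defined:
#   run_steps config1 build1 config2 build2
```

Stopping early matters here because Phase 2 cleans out and overwrites parts of the Phase 1 build, so continuing after a failed step would only obscure the original error.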

Phase 1

The configuration step of Phase 1 looks essentially like this:

export PLT=/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform
export SDK=/Developer/SDKs/iPhoneSimulator6.0.sdk

config1 () {
    # Configure for building bytecode interpreter to run on Intel OS X.
    # But specify iOSSim parameters for assembly and partial link.
    ./configure \
            -cc "gcc" \
            -as "$PLT/Developer/usr/bin/gcc -arch i386 -c" \
            -aspp "$PLT/Developer/usr/bin/gcc -arch i386 -c"
    # Post-modify config/Makefile to select i386 back end for ocamlopt
    # (i386 assembly code).
    # -i.bak edits the file in place and keeps a backup in config/Makefile.bak.
    sed -i.bak \
        -e 's/^ARCH[    ]*=.*/ARCH=i386/' \
        -e 's/^MODEL[    ]*=.*/MODEL=default/' \
        -e "s#^PARTIALLD[    ]*=.*#PARTIALLD=$PLT/Developer/usr/bin/ld -r#" \
        config/Makefile
    # Post-modify utils/config.ml.
    make utils/config.ml
    sed -i.bak \
        -e 's#let[      ][      ]*mkexe[        ]*=.*#let mkexe ="'"$PLT/Developer/usr/bin/gcc -arch i386 -Wl,-objc_abi_version,2 -Wl,-no_pie -gdwarf-2 -isysroot $PLT$SDK"'"#' \
        -e 's#let[      ][      ]*bytecomp_c_compiler[  ]*=.*#let bytecomp_c_compiler ="'"$PLT/Developer/usr/bin/gcc -arch i386 -gdwarf-2 -isysroot $PLT$SDK"'"#' \
        -e 's#let[      ][      ]*native_c_compiler[    ]*=.*#let native_c_compiler ="'"$PLT/Developer/usr/bin/gcc -arch i386 -gdwarf-2 -isysroot $PLT$SDK"'"#' \
        utils/config.ml
}

The configure step itself specifies the C compiler of the host (gcc), which is needed to build the bytecode runtime. The assembler, however, isn’t needed in this phase. So the configure step can specify the target tools for the two types of assembly—in both cases, it specifies the gcc of the target toolchain. This means that the generated cross compiler will run the proper tools when it assembles its generated native code.

After generating configuration information for the host, the script then post-modifies it to become a cross compiler. Most importantly, it modifies config/Makefile to set its ARCH variable to the target architecture. As mentioned above, this is the key step that attaches the target code generator to the host compiler. The other changes specify a more particular model of CPU (not really used for OCamlXSim) and the target tool chain command for doing partial linking.

Note that for OCamlXSim, the target architecture is i386. The iOS Simulator is a 32-bit Intel hardware environment with libraries that recreate the software environment of iOS devices. In the build script for OCamlXARM, the target architecture is armv7.
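To see what the post-modification of config/Makefile amounts to, here is a small sketch that applies the same kind of substitution to a throwaway file instead of the real one. The sample contents and the temporary directory are made up for illustration, and `[[:space:]]` is used as a stand-in for a bracket expression holding a space and a tab.

```shell
#!/bin/sh
# Illustration only: a stand-in for config/Makefile with made-up values.
tmp=$(mktemp -d "${TMPDIR:-/tmp}/xsim.XXXXXX")
printf 'ARCH=amd64\nMODEL=default\nPARTIALLD=ld -r\n' > "$tmp/Makefile"

# Rewrite the ARCH line and leave every other line untouched.
sed -e 's/^ARCH[[:space:]]*=.*/ARCH=i386/' "$tmp/Makefile" > "$tmp/Makefile.new"
mv -f "$tmp/Makefile.new" "$tmp/Makefile"

# Show the line that selects the code generator.
grep '^ARCH' "$tmp/Makefile"
```

Running a check like the final grep against the real config/Makefile is a quick way to confirm that the target architecture was actually selected before starting the long build.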

This leaves the question of how the cross compiler should compile any C programs that are given on its command line, and how it should link the results into an OCaml executable. These commands are inserted at an even deeper level, to avoid interfering with the compilation and linking of the cross compiler runtime. The second set of modifications works by generating utils/config.ml and modifying its commands to be those of the target toolchain.
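The quoting in the utils/config.ml substitutions is the fiddly part: the sed expression is single-quoted, but it briefly switches to double quotes so the shell can expand the toolchain path. The sketch below shows the same pattern on a throwaway file. The TARGET_CC value and the sample file contents are hypothetical.

```shell
#!/bin/sh
# Illustration only: a stand-in for utils/config.ml with made-up values.
tmp=$(mktemp -d "${TMPDIR:-/tmp}/xsim.XXXXXX")
printf 'let mkexe = "gcc"\nlet ext_obj = ".o"\n' > "$tmp/config.ml"

# Hypothetical target compiler command (not the real iOS Simulator path).
TARGET_CC="/opt/target/bin/gcc -arch i386"

# '...="'  ends the single-quoted part just after an OCaml double quote,
# "$TARGET_CC" is expanded by the shell, and '"#' supplies the closing
# OCaml double quote and the end of the s### command.
sed -e 's#let[[:space:]][[:space:]]*mkexe[[:space:]]*=.*#let mkexe = "'"$TARGET_CC"'"#' \
    "$tmp/config.ml" > "$tmp/config.ml.new"

cat "$tmp/config.ml.new"
```

Using `#` as the sed delimiter avoids having to escape the slashes in the toolchain path, which only works as long as the path itself contains no `#`.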

The build step of Phase 1 looks like this:

build1 () {
    # Don't assemble asmrun/i386.S for Phase 1 build.  Modify
    # asmrun/Makefile temporarily to disable.  Be really sure to put
    # back for Phase 2.
    trap 'mv -f asmrun/Makefile.aside asmrun/Makefile' EXIT
    grep -q '^[         ]*ASMOBJS[      ]*=' asmrun/Makefile && \
        mv -f asmrun/Makefile asmrun/Makefile.aside
    sed -e '/^[        ]*ASMOBJS[      ]*=/s/^/#/' \
        asmrun/Makefile.aside > asmrun/Makefile
    make world && make opt
    mv -f asmrun/Makefile.aside asmrun/Makefile
    trap - EXIT
    # Save the Phase 1 shared (dynamically loadable) libraries and
    # restore them after Phase 2.  They're required by some OCaml
    # utilities, such as camlp4.
    #
    find . -name '*.so' -exec mv {} {}phase1 \;
}

This step basically just runs make world and make opt as usual. However, it turns out to be necessary to make some tricky changes before and after.

First, the assembled output of asmrun/i386.S won’t be compatible with the rest of the bytecode runtime. So we remove it from the build rule of asmrun/Makefile, and restore it later. This works because this file is needed only for native executables, and we’re producing only bytecode executables at this point.

Second, the dynamically loadable libraries of the bytecode runtime will be overwritten during Phase 2. These libraries are needed by the bytecode executables. So we move them aside temporarily, and restore them at the end of Phase 2.
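The "modify temporarily, then be really sure to put back" pattern can be isolated into a small helper. This is only a sketch under my own assumptions: the run_with_line_disabled name is hypothetical, and unlike build1 it runs in a subshell so that the EXIT trap fires as soon as the wrapped command finishes.

```shell
#!/bin/sh
# Hypothetical helper: comment out lines of FILE matching PATTERN, run a
# command, then restore FILE even if the command fails.
# usage: run_with_line_disabled FILE PATTERN CMD [ARGS...]
run_with_line_disabled () (
    file=$1; pattern=$2; shift 2
    cp -p "$file" "$file.aside" || exit 1
    # The body is a subshell, so this trap runs when the helper returns.
    trap 'mv -f "$file.aside" "$file"' EXIT
    sed -e "/$pattern/s/^/#/" "$file.aside" > "$file" || exit 1
    "$@"
)

# Intended use, mirroring build1:
#   run_with_line_disabled asmrun/Makefile '^[[:space:]]*ASMOBJS[[:space:]]*=' \
#       sh -c 'make world && make opt'
```

Copying the file aside before installing the trap means the restore never runs without a saved copy to restore from.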

Phase 2

For Phase 2, we’d like to run configure on our target system. This can be tricky in general, but for OCamlXSim it’s relatively easy. The iOS Simulator actually runs as a separate software environment on OS X, our host system. It’s possible to generate and run code in this environment by specifying the proper command-line options.

If you aren’t so lucky, the requirement is to generate three files: config/s.h, config/m.h, and config/Makefile. A possible plan is to generate these by running configure on a Unix-like system that’s as similar as possible to your target, then make any other modifications by hand.
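If you produce these files by hand, a small sanity check before starting Phase 2 can save a confusing build failure later. The check_config_files helper below is a hypothetical sketch; it only tests that the three files exist, not that their contents are right for your target.

```shell
#!/bin/sh
# Hypothetical helper: report which of the three configuration files are
# missing under the source root given as $1 (default: current directory).
check_config_files () {
    root=${1:-.}
    missing=0
    for f in config/s.h config/m.h config/Makefile; do
        if [ ! -f "$root/$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return $missing
}

# Intended use from the top of the OCaml source tree:
#   check_config_files . || exit 1
```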

The configuration step of Phase 2 looks essentially like this:

config2 () {
    # Clean out OS X runtime
    cd asmrun; make clean; cd ..
    cd stdlib; make clean; cd ..
    cd otherlibs/bigarray; make clean; cd ../..
    cd otherlibs/dynlink; make clean; cd ../..
    cd otherlibs/num; make clean; cd ../..
    cd otherlibs/str; make clean; cd ../..
    cd otherlibs/systhreads; make clean; cd ../..
    cd otherlibs/threads; make clean; cd ../..
    cd otherlibs/unix; make clean; cd ../..
    # Reconfigure for iOSSim environment
    ./configure \
            -host i386-apple-darwin10.0.0d3 \
            -cc "$PLT/Developer/usr/bin/gcc -arch i386 -gdwarf-2 -isysroot $PLT$SDK" \
            -as "$PLT/Developer/usr/bin/gcc -arch i386 -c" \
            -aspp "$PLT/Developer/usr/bin/gcc -arch i386 -c"
    # Rebuild ocamlmklib, so libraries work with iOSSim.
    rm myocamlbuild_config.ml
    cd tools
    make ocamlmklib
    cd ..
}

The purpose of Phase 2 is to build a runtime for the target. So we start by clearing out the old runtime for the host. Now that we’ve built the cross compiler, it won’t be needed.

Next, we rerun configure, specifying the C compiler and assembler of the target toolchain (in our case, the iOS Simulator). We also specify a specific -host, so that configure doesn’t attempt to guess the CPU and operating system.

Then we rebuild ocamlmklib so it works with the target toolchain rather than the host toolchain.

The build step of Phase 2 looks like this:

build2 () {
    # Make iOSSim runtime
    cd asmrun; make all; cd ..
    cd stdlib; make all allopt; cd ..
    cd otherlibs/unix; make all allopt; cd ../..
    cd otherlibs/str; make all allopt; cd ../..
    cd otherlibs/num; make all allopt; cd ../..
    cd otherlibs/dynlink; make all allopt; cd ../..
    cd otherlibs/bigarray; make all allopt; cd ../..
    cd otherlibs/systhreads; make all allopt; cd ../..
    cd otherlibs/threads; make all allopt; cd ../..
    # Restore the saved Phase 1 .so files (see above).
    find . -name '*.sophase1' -print | \
        while read f; do \
            fso="$(expr "$f" : '\(.*\)sophase1$')so"; mv -f "$f" "$fso"; \
        done
}

These commands rebuild the runtime using the new toolchain, then restore the dynamically loaded libraries of the host runtime that were saved at the end of Phase 1. These libraries are used by some of the compiling tools—notably, the camlp4 family uses the Unix library.
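The save and restore of the .so files can be tried out in isolation. The two helpers below are a hypothetical sketch of the same round trip; they use shell parameter expansion instead of expr to strip the suffix, and take the directory to search as an argument.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the save/restore of Phase 1 .so files.

save_so () {
    # Rename every foo.so under $1 to foo.sophase1.
    find "$1" -name '*.so' -exec sh -c 'mv -f "$1" "${1}phase1"' sh {} \;
}

restore_so () {
    # Rename every foo.sophase1 under $1 back to foo.so.
    find "$1" -name '*.sophase1' -print |
        while IFS= read -r f; do
            mv -f "$f" "${f%phase1}"
        done
}

# Intended use: save_so . at the end of Phase 1, restore_so . after Phase 2.
```

Because the saved names no longer end in .so, the Phase 2 clean and rebuild steps are unlikely to match them with a pattern like *.so, which is what lets the host libraries survive until they are restored.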

Serendipitously, the resulting executables and objects look just like those of a traditional OCaml release. So they can be installed using the unmodified install rule of the top-level Makefile. It works out this way because there are two distinct parts: the bytecode subsystem (which works on the host), and the native-code subsystem (which works on the target). Things don’t have to be separated this way, but it’s convenient for now.

If you have comments or questions, please leave them below, or email me at jeffsco@psellos.com.

Posted by: Jeffrey