Psellos
Life So Short, the Craft So Long to Learn

Undefined caml_atom_table

November 3, 2014

This week I spent some time tracking down a problem when linking an OCaml application for iOS. It turns out that the same problem shows up when linking for OS X. I can imagine that somebody else might see the problem someday and wonder what’s going on, so I thought I’d write it up.

The problem shows up when you have a C program that calls out to an OCaml function. I made a tiny example; the C code looks like this (main.c):

#include <stdio.h>
#include <stdlib.h>   /* atoi, exit */
#include "caml/mlvalues.h"
#include "caml/callback.h"

int main(int ac, char *av[])
{
    value *fact_closure = caml_named_value("fact");
    value result = caml_callback(*fact_closure, Val_int(atoi(av[1])));
    printf("%ld\n", Long_val(result));
    exit(0);
}

The OCaml code looks like this (fact.ml):

let rec fact n = if n < 2 then 1 else n * fact (n - 1)

let () = Callback.register "fact" fact

If you compile and link this for OS X, you see the following:

$ uname -rs
Darwin 13.3.0
$ ocamlopt -output-obj -o factobj.o fact.ml
$ cc -I /usr/local/lib/ocaml -o cfact main.c factobj.o -L /usr/local/lib/ocaml -lasmrun
Undefined symbols for architecture x86_64:
  "_caml_atom_table", referenced from:
      _caml_alloc in libasmrun.a(alloc.o)
      _caml_alloc_array in libasmrun.a(alloc.o)
      _caml_alloc_dummy in libasmrun.a(alloc.o)
      _caml_alloc_dummy_float in libasmrun.a(alloc.o)
      _intern_alloc in libasmrun.a(intern.o)
      _intern_rec in libasmrun.a(intern.o)
  "_caml_code_area_end", referenced from:
      _segv_handler in libasmrun.a(signals_asm.o)
  "_caml_code_area_start", referenced from:
      _segv_handler in libasmrun.a(signals_asm.o)

This struck me as strange, as these are clearly symbols over which I have no control. Furthermore, if you look in libasmrun.a, the symbols are in fact defined:

$ nm /usr/local/lib/ocaml/libasmrun.a | egrep 'atom_table|code_area'
0000000000000800 C _caml_atom_table
0000000000000008 C _caml_code_area_end
0000000000000008 C _caml_code_area_start
                 U _caml_code_area_end
                 U _caml_code_area_start
                 U _caml_atom_table
                 U _caml_atom_table
                 U _caml_atom_table
                 U _caml_atom_table

The C next to their names shows that the symbols are defined. The other appearances with U are the unsatisfied references that the linker is complaining about.

One interesting thing, though, is that these are so-called “common” symbols. That is, they represent uninitialized (zero-filled) values that will be added to an executable only if there are no other definitions that provide an initial value. The technical name for this in C is a “tentative definition.” (The justified, ancient name “common” comes, I believe, from Fortran of 1958, may it rest in peace.)

To make a long story short, what I found out through web searching and testing is that Apple decided to change the semantics of common symbols appearing in libraries. In particular, Apple’s archiver ar doesn’t list common symbols in the table of contents (TOC) for an archive like libasmrun.a. So, although the symbols are defined in individual modules, they don’t appear in the TOC, which is where the linker actually looks. This means that the symbols will not be found by the linker unless the module is included for other reasons.

This is a pretty big change from age-old Unix semantics, and if you search you can find a fair number of developers confused by the behavior. There’s a little more detail on this page at Stack Overflow:

OS X linker unable to find symbols from a C file which only contains variables

What’s suspicious, however, is that I’ve never seen this problem before when building OCaml apps for OS X or iOS. So why do I see it in this code?

The answer is that the code is wrong! When setting up a main program in C that calls out to OCaml, you’re supposed to call caml_main() before things get rolling in your program. The main function is actually supposed to look like this:

int main(int ac, char *av[])
{
    caml_main(av);
    value *fact_closure = caml_named_value("fact");
    value result = caml_callback(*fact_closure, Val_int(atoi(av[1])));
    printf("%ld\n", Long_val(result));
    exit(0);
}

If you make this change, everything works totally great:

$ ocamlopt -output-obj -o factobj.o fact.ml
$ cc -I /usr/local/lib/ocaml -o cfact main.c factobj.o -L /usr/local/lib/ocaml -lasmrun
$ cfact 20
2432902008176640000

In summary, although I have some mild reservations about this Apple change to ar, when building OCaml apps it’s actually helpful, as it indirectly detects the failure to call caml_main().

To see what I mean, compile and link the original (incorrect) example under a system with more traditional Unix semantics. On a cloudy 64-bit Linux system, for example, it looks like this:

$ uname -rs
Linux 3.2.20-1.29.6.amzn1.x86_64
$ ocamlopt -output-obj -o factobj.o fact.ml
$ cc -I /usr/lib64/ocaml -o cfact main.c factobj.o -L /usr/lib64/ocaml -lasmrun -lm -ldl

Notice that the link step reports no problems. If you try to run the program, however, you see this:

$ cfact 20
Segmentation fault

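The program fails because caml_main() was never called, so the OCaml code never ran and the registration of fact hasn’t taken place. With nothing registered under that name, caml_named_value() returns NULL, and main goes on to dereference it.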
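If you'd rather get a readable message than a segmentation fault on systems where the link succeeds, one option is to check the pointer before using it. This is only a sketch of a more defensive main.c, not something the OCaml documentation prescribes. It relies on caml_named_value() returning NULL when nothing is registered under the name. The usage message and error text are my own inventions, and I've declared the pointer const so the code should compile both against older headers, where caml_named_value() returns value *, and newer ones, where it returns const value *.

```c
#include <stdio.h>
#include <stdlib.h>
#include "caml/mlvalues.h"
#include "caml/callback.h"

int main(int ac, char *av[])
{
    /* Assumption: the program takes one integer argument. */
    if (ac < 2) {
        fprintf(stderr, "usage: %s n\n", av[0]);
        return 1;
    }

    /* Start the OCaml runtime; this runs fact.ml, which registers "fact". */
    caml_main(av);

    /* NULL here means nothing was registered under this name. */
    const value *fact_closure = caml_named_value("fact");
    if (fact_closure == NULL) {
        fprintf(stderr, "OCaml value \"fact\" is not registered\n");
        return 1;
    }

    value result = caml_callback(*fact_closure, Val_int(atoi(av[1])));
    printf("%ld\n", Long_val(result));
    return 0;
}
```

It builds with the same ocamlopt and cc commands as before. If you delete the caml_main() call from this version, the Linux build should print the error message instead of crashing, while the OS X link should still fail with the undefined symbols.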

I hope this may help some other lonely OCaml developer who sees an undefined atom table. If you have any comments, leave them below or email me at jeffsco@psellos.com.

Posted by: Jeffrey
