Aaron,

I have no specific solution, just a number of ideas, in the hope that
one may be something you haven't tried already.

In message <cqa6u8$1rac$1@soapbox.cs.bham.ac.uk>, A.Sloman@cs.bham.ac.uk
writes
>This may be related to the problem with blas and lapack libraries
>reported previously.
>
>I am using vision code written in C and invoked from Pop11, developed
>and tested under solaris on a Sun at Sussex University by David Young.
>It works on Solaris but we get strange problems with the dynamic linker
>on linux on a PC.
>
>David reduced the problem to a simple test case.
>
>There are two files foo.c defining a function foo, and a file baz.c
>defining function baz, where foo calls baz.
>
>We do the following to compile the files into shareable libraries, then
>later dynamically link the files into a running pop11 process, then try
>to run foo inside pop11. It should call baz, but instead causes a crash
>with 'baz' undefined.
>
>1. Compile file foo.c to produce shareable foo.so
>
> gcc -o foo.so -fpic -shared foo.c
>
>2. Compile file baz.c to produce shareable baz.so
>
> gcc -o baz.so -fpic -shared baz.c
>
Shouldn't there be a "-ldl" in there, if both are calling symbols from
shared libraries? I'm assuming that foo contains (or should contain)
something like

 hLib = dlopen("baz.so", RTLD_LAZY);

to obtain a handle to the dynamically loaded library, followed by a
dlsym() call to look up the symbol via that handle.
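As a minimal sketch of what I mean, foo could look up baz at run time along the following lines. The helper name call_baz, the library path and the signature "int baz(int)" are assumptions for illustration, not taken from David's code.

```c
/* foo.c: sketch of foo() locating baz() at run time via dlopen()/dlsym().
 * Possible build line (an assumption, untested against your setup):
 *   gcc -o foo.so -fpic -shared foo.c -ldl
 */
#include <dlfcn.h>
#include <stdio.h>

/* Assumed signature of baz; adjust to match the real baz.c. */
typedef int (*baz_fn)(int);

/* Load libpath, look up "baz" and call it with x.
 * Returns 0 and stores baz's result in *out on success, -1 on failure. */
int call_baz(const char *libpath, int x, int *out)
{
    baz_fn baz;
    const char *err;
    void *hLib = dlopen(libpath, RTLD_LAZY);

    if (hLib == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror();                               /* clear any stale error */
    *(void **)(&baz) = dlsym(hLib, "baz");   /* POSIX idiom for function pointers */
    err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err);
        dlclose(hLib);
        return -1;
    }

    *out = baz(x);
    dlclose(hLib);
    return 0;
}

/* foo() as the entry point called from Pop11; returns -1 if baz is unavailable. */
int foo(int x)
{
    int result = -1;
    if (call_baz("./baz.so", x, &result) != 0)
        return -1;
    return result;
}
```

With this arrangement foo.so no longer has an unresolved reference to baz when it is loaded, so the order and flags used by the host program matter less.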

I think you need to distinguish between shared libraries, i.e.
independent libraries that are loaded by the Linux linker when a program
starts, and dynamically loaded (DL) libraries, i.e. possibly dependent
libraries that are loaded, on demand, just prior to a symbol being
referenced in one of them*.

In your example that works, you have resolved the two libraries into one
shared one, hence the symbol references are all correct.

In your example that fails, you haven't told the compiler that each
library needs to be dynamically loadable, and I suspect that the
run-time linking performed by the DL library (dlopen(), dlclose(),
dlsym(), etc.) is not in place.

* There is another class of dynamic link-loading, which is performed by
object request brokers, where the loaded object is accessed through a
pre-defined interface, and can be loaded into separate processes,
possibly on another host, with the method call and results return
carried out by a combination of wrapping stub and proxy code via some
form of RPC mechanism.
>3. Dynamically link both .so files into pop11, using external pop11
>loading mechanisms that work on a Sun and seem to work with other
>libraries on linux.
>
>Create Pop11 file testlink.p
>
>
> exload foobaz ['foo.so' 'baz.so']
> foo(1):int;
> endexload;
>
> exacc foo(101) =>
>
Poplog seems to have its own link-loader, based on a trawl through the
specified files, although in extern_symbols.p it does load the DL shared
library to support dynamic linking.
>4. Call the external procedure foo, by running the file:
>
> pop11 testlink.p
>
>System crashes with message:
> relocation error: ./foo.so: undefined symbol: baz
>
>However on Solaris it all works as expected. Is there
>some difference in the compiler/linker conventions?
>
Have you tried using the GNU compiler and linker on Solaris to verify
that this fails? You may be able to compare the Link Command Language
output from the two systems (Solaris native vs GNU) to identify
appropriate differences.

There is a note in extern_ptr.p, which may be relevant.

NOTE
In non-SPARC systems, every procedure that calls _call_external must
have all pop registers localised (by having the dummy variable
<ALL_POP_REGISTERS> dlocal). This means that (a) _call_external does
not have to localise them itself, and (b) their values are saved in a
proper stack frame where they can be processed by a GC during callback
(hence callback need only localise and set them to pop 0/false, etc).
Because of this (and the fact that _call_external may use the pop
registers nonlocally), procedures calling _call_external should not
actually use any pop lvars (or at least, not assume their values will
survive the _call_external).

In the "Linux No Motif" version (labelled 15.53, although given the
number of changes to 15.53 over the last couple of years the version
number is now meaningless), there is a specific block for Linux/ELF that
sets the default dlopen() flags to RTLD_LAZY. Provided DLOPEN_FLAGS is
defined, the remaining mechanisms should work. SunOS/Solaris uses
RTLD_LAZY|RTLD_GLOBAL. This may well account for the difference in
behaviour: without RTLD_GLOBAL, the symbols of a dlopen()ed library are
not made available to resolve references from other dynamically loaded
libraries, so foo.so would not see baz in baz.so.
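
To illustrate the effect of the flags outside Poplog, here is a small sketch of a host loader that opens baz.so with RTLD_GLOBAL before opening foo.so and calling foo. It is not Poplog's code; the function name run_foo and the signature "int foo(int)" are assumptions based on your exload declaration.

```c
/* Sketch: load the library that defines baz with RTLD_GLOBAL, so that the
 * unresolved reference to baz in foo.so can be satisfied at call time.
 * Possible build line (an assumption): gcc -o loader loader.c -ldl
 */
#include <dlfcn.h>
#include <stdio.h>

/* Assumed signature of foo, guessed from "foo(1):int" in the exload block. */
typedef int (*foo_fn)(int);

/* Returns 0 and stores foo's result in *result on success, -1 on failure. */
int run_foo(const char *baz_path, const char *foo_path, int arg, int *result)
{
    foo_fn foo;
    const char *err;
    void *hBaz, *hFoo;

    /* RTLD_GLOBAL exports baz.so's symbols to other loaded libraries. */
    hBaz = dlopen(baz_path, RTLD_LAZY | RTLD_GLOBAL);
    if (hBaz == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n", baz_path, dlerror());
        return -1;
    }

    hFoo = dlopen(foo_path, RTLD_LAZY);
    if (hFoo == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n", foo_path, dlerror());
        dlclose(hBaz);
        return -1;
    }

    dlerror();                               /* clear any stale error */
    *(void **)(&foo) = dlsym(hFoo, "foo");
    err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym(foo) failed: %s\n", err);
        dlclose(hFoo);
        dlclose(hBaz);
        return -1;
    }

    *result = foo(arg);                      /* baz is bound lazily here */
    dlclose(hFoo);
    dlclose(hBaz);
    return 0;
}
```

If run_foo("./baz.so", "./foo.so", 101, &r) works with RTLD_GLOBAL but gives the same relocation error once the flag is dropped from the first dlopen(), that would point at the default flags rather than at gcc.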

In Poplog, extern_symbols.p defines the shXXXX functions for managing
shared object libraries. The functions shlib_error, Shlib_open,
Shlib_close and Shlib_findsym do most of the work. On most platforms,
Shlib_open and Shlib_close are wrappers around the dynamically loaded DL
functions dlopen() and dlclose().

This reinforces the suggestion that foo should be calling baz() via
dlopen("baz.so") and dlsym().
>We can avoid the crash by first compiling with the -c option to create
>two .o files, then combine them to one .so file and simply link that
>file into pop11 using exload.
>
>But we would like to have the flexibility to link sharable
>libraries in different combinations.
>
Then I suspect you need to use the dynamic loading mechanisms.

Without changing any code, you could try specifying the linker option -E
(or --export-dynamic) when linking, which adds all symbols to the
dynamic symbol table. This may match Solaris behaviour.
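
One way to check what is actually visible in the running process is a small diagnostic like the sketch below. The function name is_globally_visible is hypothetical. Note that a function defined in the executable itself is normally only found this way if the executable was linked with -E (gcc -rdynamic or -Wl,--export-dynamic).

```c
/* Sketch: report whether a symbol can be resolved through the global
 * scope of the running process (the main program, its start-up
 * libraries, and anything loaded with RTLD_GLOBAL).
 */
#include <dlfcn.h>
#include <stddef.h>

/* Returns 1 if name resolves in the global scope, 0 otherwise. */
int is_globally_visible(const char *name)
{
    void *addr;
    int found;
    void *self = dlopen(NULL, RTLD_LAZY);    /* handle for the main program */

    if (self == NULL)
        return 0;

    dlerror();                               /* clear any stale error */
    addr = dlsym(self, name);
    found = (dlerror() == NULL && addr != NULL);
    dlclose(self);
    return found;
}
```

Calling is_globally_visible("baz") after both libraries have been loaded would show whether baz has been exported to the scope that foo.so is resolved against.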
>I've scanned manuals for gcc and ld on linux and cannot find
>any clue.
>
>Has anyone ever met this sort of problem on linux, i.e. dynamically
>linking two separately compiled libraries fails to make a connection
>between a symbol used in one and defined in the other?
>
>I have heard of people who don't use pop11 having similar 'undefined
>symbol' problems with the blas and lapack mathematical libraries, but
>don't know if the problems were ever resolved.
>
>It is just possible that there is something wrong with how pop11
>externally loads sharable libraries in linux, but that seems unlikely
>given that, for example, it allows all the X11 code to be dynamically
>loaded successfully.
>
If you are using the XFree86 system, then it uses its own dynamic
linker. Your other code is, presumably, using ld.so?

Again, without any code changes, have you tried preloading the second
library by specifying it in $LD_PRELOAD?
>Is there some sort of compile-time switch to gcc which says that when
>foo is invoked it will have to resolve baz in another library also
>dynamically linked.
>
The URL
http://www.dwheeler.com/program-library/Program-Library-HOWTO/dl-libraries.html
might be a useful reference.

There is a possibility that your linker options are being lost by the
gcc tool-chain. You might need to prefix them with -Wl, (for example
-Wl,--export-dynamic) to ensure that gcc passes them through to ld.

I hope this helps,

Regards,
--
Jeff