Using C object files in Delphi

 quasiceo 2013-12-27

C is a very widely used language, and this has made the worldwide code library for C huge. The code library for Delphi is comparatively small, so it would be nice if we could use parts of that huge library directly, without translating the entire code to Delphi. Fortunately, Delphi allows you to link compiled C object files. But there is this problem with "unsatisfied externals".

Being a simple but powerful language, C gets most of its functionality from its runtime library. Almost all non-trivial C code needs some of the functions in this library. But Delphi’s runtime doesn’t contain these functions. So simply linking the C object file will make the linker complain about "unsatisfied external declarations". Luckily, C accepts any implementation of such a function, no matter in which code module it is defined. If the linker can find a function with the desired name, it can be used. You can use this to provide missing parts of the runtime yourself, in your Delphi code.

In this article I will demonstrate how to compile and link an object file into a Delphi unit, and provide the missing parts of the C runtime that it needs. For this, I will use the well known public domain regular expression search code that Henry Spencer of the University of Toronto wrote. I only slightly modified it to make it compile with C++Builder. Regular expressions are explained in short in the Delphi help files, and are a way of defining nifty search patterns.

Object files

C compilers normally generate object files, which are then linked into an executable. On 32 bit Windows, these usually have the file extension ".obj". But they come in different, incompatible formats. Microsoft’s C++ compiler, and some other compatible compilers, generate object files in a slightly modified COFF format. These can’t be used in Delphi. Delphi requires OMF formatted object files. There is no practicable way of converting normal COFF object files to OMF, so you will need the source, and a compiler that generates OMF files.
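If you are not sure which format a given ".obj" file has, its first bytes usually tell. The following is a minimal sketch, not a complete parser: it assumes that an OMF module starts with a THEADR record (type byte 0x80) and that a COFF file starts with its little-endian machine field (0x014C for 32 bit x86, 0x8664 for x64). The function and enum names are made up for this illustration.

```c
#include <stddef.h>

/* Possible results of the guess (names are hypothetical). */
enum obj_format { OBJ_UNKNOWN, OBJ_OMF, OBJ_COFF32, OBJ_COFF64 };

/*
 * Guess the object file format from the first two bytes of the file.
 * OMF:  first record is THEADR, record type 0x80.
 * COFF: first field is the machine type, stored little-endian.
 */
enum obj_format guess_obj_format(const unsigned char *data, size_t size)
{
    if (size < 2)
        return OBJ_UNKNOWN;
    if (data[0] == 0x80)
        return OBJ_OMF;
    if (data[0] == 0x4C && data[1] == 0x01)   /* 0x014C: 32 bit x86 */
        return OBJ_COFF32;
    if (data[0] == 0x64 && data[1] == 0x86)   /* 0x8664: x64 */
        return OBJ_COFF64;
    return OBJ_UNKNOWN;
}
```

Read the first two bytes of the file into a buffer and pass them in. A result of OBJ_UNKNOWN only means that this simple check did not recognize the file.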

Note that the COFF2OMF utility which comes with many versions of C++Builder is no help at all for this problem. It is only meant to convert import libraries from one format to the other. Import library files only contain information about the exported functions of a DLL, and can be generated from the DLL directly, using IMPLIB or a similar utility. They contain a very limited subset of what real C or C++ library files contain. COFF2OMF will not convert C or C++ object or real library files (see note below) in the COFF format. You really need source code and a C++Builder compiler to produce OMF object files usable with Delphi.

Update

Thaddy de Koning told me that the COFF2OMF converter from DigitalMars can do a complete conversion. I didn’t try it, but he said it is worth its money.

Update 2

Agner Fog, well known optimization guru, pointed me to his ObjConv tool, which will convert several types of object files to several others. We are also working on making C++-generated OMF and non-OMF generated object files usable in Delphi.

Update 3

Delphi XE2 and following versions can also link COFF object files, so there is no need to use any of the converters described above. Also take a look at the new section about Win64.

Borland/CodeGear/Embarcadero’s C++ Builder does generate such OMF object files. But not every Delphi user has C++Builder as well. Luckily, Embarcadero still makes the command line compiler that came with Borland C++ Builder 5 freely available. It can be downloaded from the last entry in the list on this page, if you provide some information. If you don’t have it yet, get it now.

There is another limitation to what kind of files you can use. You can only use object files that are compiled as C files, not C++ files. For some reason, the Delphi linker has problems with object files that contain C++. This means that your source files must have the extension ".c" and not ".cpp". But since you can’t use C++ classes directly anyway, that is not a severe limitation.

One note: C often uses library (".lib") files as well. These simply contain multiple object files, and some C compilers come with a librarian program to extract, insert, replace or simply list object files in them. In Delphi, you can’t link .lib files directly. But you can use the TDUMP utility that comes with Delphi and C++Builder to see what is stored in them. The free C++ compiler comes with the TLIB librarian to get at the single object files.

The code

I will not discuss the mechanism or use of regular expressions here. There is enough material available in books and on the Internet. But to exploit them with this code, you first pass a regular expression pattern to a kind of very simple compiler, that turns the textual representation into a version that can easily be interpreted by the search code. The compilation is done by the function regcomp(). To search a string for a regular expression pattern, you pass the compiled pattern and the string to the regexec() function. It will return information about whether, and where in the string, it found text matching the pattern.

The complete implementation code for the regular expression search is rather complicated and long, so I will not show that. But the header file is of course important for the Delphi code using the object file. Here it is.

/***************************************************************************/
/*                                                                         */
/* regexp.h                                                                */
/*                                                                         */
/* Copyright (c) 1986 by University of Toronto                             */
/*                                                                         */
/* This public domain file was originally written by Henry Spencer for the */
/* University of Toronto and was modified and reformatted by Rudy Velthuis */
/* for use with Borland C++ Builder 5.                                     */
/*                                                                         */
/***************************************************************************/


#ifndef REGEXP_H
#define REGEXP_H

#define RE_OK                   0
#define RE_NOTFOUND             1
#define RE_INVALIDPARAMETER     2
#define RE_EXPRESSIONTOOBIG     3
#define RE_OUTOFMEMORY          4
#define RE_TOOMANYSUBEXPS       5
#define RE_UNMATCHEDPARENS      6
#define RE_INVALIDREPEAT        7
#define RE_NESTEDREPEAT         8
#define RE_INVALIDRANGE         9
#define RE_UNMATCHEDBRACKET     10
#define RE_TRAILINGBACKSLASH    11
#define RE_INTERNAL             20
#define RE_NOPROG               30
#define RE_NOSTRING             31
#define RE_NOMAGIC              32
#define RE_NOMATCH              33
#define RE_NOEND                34
#define RE_INVALIDHANDLE        99

#define NSUBEXP  10

/*
 * The first byte of the regexp internal "program" is actually this magic
 * number; the start node begins in the second byte.
 */
#define	MAGIC	0234

#pragma pack(push, 1)

typedef struct regexp
{
    char *startp[NSUBEXP];
    char *endp[NSUBEXP];
    char regstart;              /* Internal use only. */
    char reganch;               /* Internal use only. */
    char *regmust;              /* Internal use only. */
    int regmlen;                /* Internal use only. */
    char program[1];            /* Internal use only. */
} regexp;

#ifdef __cplusplus
extern "C" {
#endif

extern int regerror;
extern regexp *regcomp(char *exp);
extern int regexec(register regexp* prog, register char *string);
extern int reggeterror(void);
extern void regseterror(int err);

extern void regdump(regexp *exp);

#ifdef __cplusplus
}
#endif

#pragma pack(pop)

#endif // REGEXP_H

The header above defines a few constant values, a structure that passes information between the regular expression code and the caller (and between the different functions of the code itself), and the functions that the user can call.

The #define values that start with RE_ are constants that are returned from the functions to indicate success or an error. NSUBEXP is the number of subexpressions a regular expression may have in this implementation. The number called MAGIC is a value that must be present in each compiled regular expression. If it is missing, the structure obviously doesn’t contain a valid compiled regular expression. Note that 0234 is not a decimal value. The leading zero tells the C compiler that this is an octal value. Like hexadecimal uses 16 as number base, and decimal uses 10, octal uses 8. The decimal value is calculated this way:

0234 (oct) = 2 * 8^2 + 3 * 8^1 + 4 * 8^0 = 128 + 24 + 4 = 156 (dec)
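The same calculation can be checked in C. This is only a small sketch to confirm the arithmetic; the helper function magic_by_hand() is made up for the illustration and is not part of regexp.h.

```c
/* Same definition as in regexp.h: the leading zero makes it octal. */
#define MAGIC 0234

/* Evaluate the octal digits 2, 3 and 4 by hand, in base 8. */
int magic_by_hand(void)
{
    return 2 * 8 * 8 + 3 * 8 + 4;
}
```

Both MAGIC and magic_by_hand() evaluate to 156, which is the decimal value the Delphi import unit has to use.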

The #pragma pack(push, 1) pushes the current alignment state, and sets it to bytewise alignment. #pragma pack(pop) restores the previous state. This is important, because it makes the structure compatible with Delphi’s packed record.
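To see what bytewise alignment changes, the sketch below declares the same fields twice, once inside the pack pragmas and once without them, and reports the offset of regmust in each. The struct and function names are made up for this comparison; the field list is the one from the header.

```c
#include <stddef.h>

#define NSUBEXP 10

#pragma pack(push, 1)
typedef struct packed_regexp
{
    char *startp[NSUBEXP];
    char *endp[NSUBEXP];
    char regstart;
    char reganch;
    char *regmust;      /* follows the two chars directly, no padding */
    int regmlen;
    char program[1];
} packed_regexp;
#pragma pack(pop)

/* Same fields with the compiler's default alignment. */
typedef struct natural_regexp
{
    char *startp[NSUBEXP];
    char *endp[NSUBEXP];
    char regstart;
    char reganch;
    char *regmust;      /* normally preceded by padding bytes */
    int regmlen;
    char program[1];
} natural_regexp;

size_t packed_regmust_offset(void)
{
    return offsetof(packed_regexp, regmust);
}

size_t natural_regmust_offset(void)
{
    return offsetof(natural_regexp, regmust);
}
```

In the packed version, regmust sits exactly two bytes after the twenty pointers. With default alignment, the compiler usually inserts padding in front of it, and a Delphi packed record would then read the field from the wrong place.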

Compiling the code

If you have C++ Builder, or BDS2006, it is a little easier to compile the code. You create a new project, and add the file "regexp.c" to it via the menu selections "Project", "Add to project", and compile the project. As a result of this, the directory will contain a file "regexp.obj".

If you have the command line compiler, and that is set up correctly, you open a command prompt, go to the directory that contains the file "regexp.c" and enter:

bcc32 -c regexp.c

Perhaps you’ll get a warning about an unused variable, or about conversions losing significant digits, but you can ignore them in this case, since you didn’t write the code anyway. I have been using this code myself for years, without any problems. After compilation, you’ll find the object file "regexp.obj" in the same directory as the source file.

To import the object file in Delphi, you should now copy the object file to the directory with your Delphi source.

Importing the object file

To use the code in the object file, you’ll have to write some declarations. The Delphi linker doesn’t know anything about the parameters of the functions, about the regexp type in the header, and about the values that were defined in the file "regexp.h". It doesn’t know what calling convention was used, either. To supply this information, you write an import unit.

Here is the interface part of the Delphi unit that is used to import the functions and values from the C object file into Delphi:

unit RegExpObj;

interface

const
  NSUBEXP = 10;

  // The first byte of the regexp internal "program" is actually this magic
  // number; the start node begins in the second byte.
  MAGIC = 156;

type
  PRegExp = ^_RegExp;
  _RegExp = packed record
    StartP: array[0..NSUBEXP - 1] of PChar;
    EndP: array[0..NSUBEXP - 1] of PChar;
    RegStart: Char;             // Internal use only.
    RegAnch: Char;              // Internal use only.
    RegMust: PChar;             // Internal use only.
    RegMLen: Integer;           // Internal use only.
    Prog: array[0..0] of Char;  // Internal use only.
  end;

function _regcomp(exp: PChar): PRegExp; cdecl;
function _regexec(prog: PRegExp; str: PChar): LongBool; cdecl;
function _reggeterror: Integer; cdecl;
procedure _regseterror(Err: Integer); cdecl;

You’ll notice that all the functions got an underscore in front of them. This is because, for historic reasons, most C compilers still generate C functions with names that start with an underscore. To import them, you’ll have to use the "underscored" names. You could tell the C++Builder compiler to omit the underscores, but I normally don’t do that. The underscores clearly show that we are using C functions. These must be declared with the C calling convention, which is called cdecl in Delphi parlance. Forgetting this can produce bugs that are very hard to trace.

The original code of Henry Spencer didn’t have the reggeterror() and regseterror() functions. I had to introduce them, because you can’t use variables in the object files from the Delphi side directly, and the code requires access to reset the error value to 0, and to get the error value. But you can use Delphi variables from the C object file. Sometimes object files even require external variables to be present. If they don’t exist, you can declare them somewhere in your Delphi code.
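On the C side, such accessor functions can be very small. The following is a sketch of how they might look, not the actual code from regexp.c. In the header the variable is called regerror; here it is given the hypothetical name re_last_error, so the sketch does not clash with the POSIX function regerror() on systems that have one.

```c
#define RE_OK 0

/* The error variable lives in the C module (hypothetical name). */
static int re_last_error = RE_OK;

/* Lets the Delphi side read the error value. */
int reggeterror(void)
{
    return re_last_error;
}

/* Lets the Delphi side set or reset the error value. */
void regseterror(int err)
{
    re_last_error = err;
}
```

Because Delphi only ever calls the two functions, it never has to know where or how the variable itself is stored.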

Ideally, the implementation part of the unit would look like this:

implementation

uses
  SysUtils;

{$LINK 'regexp.obj'}

function _regcomp(exp: PChar): PRegExp; cdecl; external;
function _regexec(prog: PRegExp; str: PChar): LongBool; cdecl; external;
function _reggeterror: Integer; cdecl; external;
procedure _regseterror(Err: Integer); cdecl; external;

end.

But if you compile that, the Delphi linker will complain about unsatisfied externals. The Delphi unit will have to provide them. Most runtime functions are simple, and can easily be coded in Delphi. Only functions that take a variable number of arguments, like printf() or scanf(), are impossible to do without resorting to assembler. Perhaps, if you could find the code of printf() or scanf() in the C++ libraries, you could extract the object file and link that file in as well. I have never tried this.
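To show why functions with a variable number of arguments are a special case, here is a minimal C sketch of one. The function sum_ints() is made up for the illustration. The callee has no declared parameter list to rely on: it walks the arguments with the va_arg macro, and only knows when to stop because the caller passes a count.

```c
#include <stdarg.h>

/* Add up `count` int arguments that follow the fixed parameter. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;
    int i;

    va_start(args, count);          /* start after the last fixed parameter */
    for (i = 0; i < count; i++)
        total += va_arg(args, int); /* fetch the next argument as an int */
    va_end(args);
    return total;
}
```

A call like sum_ints(3, 1, 2, 3) returns 6. Writing the body of such a function in Delphi is the hard part, since Delphi has no counterpart for va_arg.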

The regular expression code needs the C library functions malloc() to allocate memory, strlen() to calculate the length of a string, strchr() to find a single character in a string, strncmp() to compare two strings, and strcspn() to find the first character from one string in another string.

The first four functions are simple, and can be coded in one line of Delphi code, since Delphi has similar functions as well. But for strcspn() there is no equivalent function in the Delphi runtime library, so it must be coded by hand. Fortunately, I had (admittedly, rather ugly) C code for such a function, and I only had to translate that to Delphi. Otherwise I’d have had to read the specifications really carefully, and try to implement it myself.
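For reference, this is what strcspn() is supposed to do, written as a plain C sketch: it returns the number of characters at the start of the first string that do not occur in the second string. This is not the C code I translated, just a straightforward version under the name my_strcspn(), chosen to avoid a clash with the library function.

```c
#include <stddef.h>

/* Length of the initial part of s1 that contains no character from s2. */
size_t my_strcspn(const char *s1, const char *s2)
{
    size_t count = 0;
    const char *p;

    for (; *s1 != '\0'; s1++, count++)
        for (p = s2; *p != '\0'; p++)
            if (*s1 == *p)
                return count;       /* first character found in s2 */
    return count;                   /* none found: length of s1 */
}
```

For example, my_strcspn("hello, world", ",") returns 5, the index of the comma.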

The missing part of the implementation section of the unit looks like this:

// since this unit provides the code for _malloc, it can use FreeMem to free
// the PRegExp it gets. But normally, a _regfree() would be nice.

function _malloc(Size: Cardinal): Pointer; cdecl;
begin
  GetMem(Result, Size);
end;

function _strlen(const Str: PChar): Cardinal; cdecl;
begin
  Result := StrLen(Str);
end;

function _strcspn(s1, s2: PChar): Cardinal; cdecl;
var
  SrchS2: PChar;
begin
  Result := 0;
  while S1^ <> #0 do
  begin
    SrchS2 := S2;
    while SrchS2^ <> #0 do
    begin
      if S1^ = SrchS2^ then
        Exit;
      Inc(SrchS2);
    end;
    Inc(S1);
    Inc(Result);
  end;
end;

function _strchr(const S: PChar; C: Integer): PChar; cdecl;
begin
  Result := StrScan(S, Chr(C));
end;

function _strncmp(S1, S2: PChar; MaxLen: Cardinal): Integer; cdecl;
begin
  Result := StrLComp(S1, S2, MaxLen);
end;

As you can see, these functions must also be declared cdecl and have a leading underscore. The function names are also case sensitive, so their correct spelling is important.

In my project, I don’t use this code directly. The _RegExp structure contains information that should not be changed from outside, and is a bit awkward to use. So I wrapped it up in a few simple functions, and provided a RegFree function as well, which simply calls FreeMem, since the _malloc() I provided uses GetMem. Ideally, the regular expression code should have provided a regfree() function.

The entire C source code, the code for the import unit and the wrapper unit, as well as a very simple grep program can be found on my Downloads page.

Circular references

If you have to include several object files, it is quite possible that one object file, let’s call it one.obj, references a function in another object file, say, seven.obj. The Delphi compiler is, more or less, one pass, so if one.obj is listed before seven.obj, you will get an unsatisfied externals error. This can be avoided by rearranging your {$LINK} entries, so that seven.obj is linked in before one.obj.

But it is quite possible that seven.obj also references a function or variable in one.obj. Then, rearranging will not work. The solution is to declare the function or functions as external, before the link directives:

procedure _getLocaleName; external;     // defined in one.obj, used in seven.obj
procedure _convertCase; external;       // defined in seven.obj, used in one.obj

{$LINK 'one.obj'}
{$LINK 'seven.obj'}

This is a bit like a forward declaration and will tell the compiler that these functions will be defined in one of the linked in object files.

Of course, if you also need to use these functions, you must provide a full declaration, including parameters, return type and calling convention, but otherwise, the simple declaration of the name is enough.
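On the C side, such a circular reference could look like the sketch below. The function bodies and the extra function localeIsSet() are entirely hypothetical; only the two names from the example are reused. The two halves would normally be separate source files, and are shown in one block here so that the sketch compiles on its own.

```c
#include <ctype.h>
#include <string.h>

/* ---- would live in one.c ---- */
void convertCase(char *text);           /* defined in seven.c */

const char *getLocaleName(void)
{
    static char name[] = "en_us";
    convertCase(name);                  /* one.obj needs seven.obj */
    return name;
}

/* ---- would live in seven.c ---- */
const char *getLocaleName(void);        /* defined in one.c */

void convertCase(char *text)
{
    for (; *text != '\0'; text++)
        *text = (char)toupper((unsigned char)*text);
}

int localeIsSet(void)
{
    return strlen(getLocaleName()) > 0; /* seven.obj needs one.obj */
}
```

Whichever of the two object files is linked first, it refers to a name that the Delphi compiler has not seen yet.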

Thanks to David Heffernan for pointing this out, in this StackOverflow answer.

Win64

Under Win64, a few things are different.

Object file format

The object file format recognized for 64 bit Windows (Win64) is 64 bit COFF. This is the format produced by, for instance, the native C and C++ compiler of Microsoft Visual Studio.

It is, unfortunately, not the file format generated by the 64 bit C++Builder (BCC64) compiler, which is a completely new compiler, derived from the clang compiler, which generates object files and libraries in ELF64 format, with the extensions .o and .a respectively.

So you will have to get a version of Visual Studio, for instance Visual Studio Express for Windows Desktop, which can compile C and produce 64 bit COFF files. I have not tried this, so I’ll have to update this article as soon as I have.

David Heffernan suggested that another way to compile C sources in 64 bit COFF format is to use the cl.exe compiler that comes with the Microsoft Platform SDK.

To keep the compiler from including a reference to the chkstk() function in your object file, you should use the /Gs- (Stack Checking Calls off) command line option.

Take a look at this StackOverflow answer to find out how to compile or cross-compile to 64 bit using cl.exe.

Calling convention

There is only one calling convention in Win64 (well, actually two, but the other one is very low level), so there is no need to declare one anymore. For code that must compile for Win32 as well as for Win64, you can leave the stdcall, register or cdecl calling conventions in your source. They will be ignored by the 64 bit compiler.

Decoration

On Win64, the (Microsoft) convention is that function names are not decorated with a leading underscore. As far as I know, Delphi also follows this convention. To make code compile under both Win64 and Win32, you can use the same trick Embarcadero uses in their code for the Mac, i.e. something like this:

const
  {$IFDEF WIN64}
  _PU = '';
  {$ELSE}
  _PU = '_';
  {$ENDIF}
  
  ...

function regcomp(exp: PAnsiChar): PRegExp; cdecl; external '' name _PU + 'regcomp';

Using msvcrt.dll

Some of the following has been superseded by the introduction of the unit System.Win.Crtl.pas in Delphi XE2, which wraps some of the routines exposed by msvcrt.dll. Each of the functions is both declared with and without leading underscore. This means you probably won’t have to use the unit I wrote and describe below.

Instead of writing all these functions yourself, you could also use functions from the Microsoft Visual C++ runtime library. This is a DLL which Windows also uses, and that is why it should be present on all versions of Windows.

FWIW, this is not my idea, I got it as a suggestion from Rob Kennedy in the former Borland newsgroups. It seems the JEDI project also uses this technique in some of their sources.

Using msvcrt.dll, instead of the code above, you could simply declare most of the routines external. Be sure to use the name clause in the external declaration, because these routines do not have an underscore in the DLL:

// Note that you don't want to use the C memory manager,
// so you must still rewrite routines like _malloc() in Delphi.

function _malloc(Size: Cardinal): Pointer; cdecl;
begin
  GetMem(Result, Size);
end;

// The rest can be imported from msvcrt.dll directly.

function _strlen(const Str: PChar): Cardinal; cdecl;
  external 'msvcrt.dll' name 'strlen';

function _strcspn(s1, s2: PChar): Cardinal; cdecl;
  external 'msvcrt.dll' name 'strcspn';

function _strchr(const S: PChar; C: Integer): PChar; cdecl;
  external 'msvcrt.dll' name 'strchr';

function _strncmp(S1, S2: PChar; MaxLen: Cardinal): Integer;
  cdecl; external 'msvcrt.dll' name 'strncmp';

This will even work for complicated routines like sprintf() or scanf(). Where C requires a file handle, you simply declare a pointer. The effect is the same. Examples:

function _sprintf(S: PChar; const Format: PChar): Integer;
  cdecl; varargs; external 'msvcrt.dll' name 'sprintf';

function _fscanf(Stream: Pointer; const Format: PChar): Integer;
  cdecl; varargs; external 'msvcrt.dll' name 'fscanf';

I put a slightly tested version of an interface unit for msvcrt.dll on my Downloads page. I will change this page as soon as it is well tested.

Problems

In the meantime, I have encountered a few problems which might be interesting for the reader. For the conversion, I wrote a simple test program in C, which uses many of the routines imported from msvcrt.dll. But it turned out that some of the routines are not routines at all. They are implemented as macros, which directly access structures, and these don’t always exist or are accessible in this mix of BCB C, Delphi and Microsoft C.

getchar() and putchar()

For instance, take the getchar() routine. In stdio.h, this is declared as a macro, which accesses stdin->level, and stdin is again a macro for &_streams[0]. If this level variable is positive, the routine uses a character from the buffer, otherwise it will use _fgetc() (IOW, __fgetc() on the Delphi side). So no matter how you declare your own routine, it will simply not get called.

This meant I had to declare __streams, and initialize the level fields to something negative. The problem is that the msvcrt.dll routines will have their own versions of similar structs (there is no guarantee that the FILE struct is the same, there), and these do not set or read from the BCB variable _streams. So I wrote my own Delphi version of __fgetc() which checks if the stream parameter passed is the same as @__streams[0], indicating it is called with stdin, the standard input stream. If it is, this means it is called as _fgetc(stdin); which is what the getchar() macro amounts to. If this is the case, it calls Delphi’s Read, otherwise it uses the _fgetc() routine in msvcrt.dll.
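The effect of such a macro can be shown with a strongly simplified sketch. Nothing in it is the real BCB stdio.h: the struct fake_stream, the macro FAKE_GETC and the function fake_fgetc() are all made up. It only demonstrates that the function is not called while level is positive, and is called once level is negative.

```c
/* Simplified stand-in for a buffered stream; not Borland's real FILE. */
typedef struct fake_stream
{
    int level;                  /* characters left in the buffer */
    const unsigned char *curp;  /* next buffered character */
} fake_stream;

static int fallback_calls = 0;

/* Plays the role of _fgetc(): the only part a replacement can intercept. */
int fake_fgetc(fake_stream *stream)
{
    (void)stream;
    fallback_calls++;
    return 'F';
}

/* Macro in the style of getc(): buffer first, function only when empty. */
#define FAKE_GETC(f) \
    ((--(f)->level >= 0) ? (int)*(f)->curp++ : fake_fgetc(f))

int read_one(fake_stream *stream)
{
    return FAKE_GETC(stream);
}

int fallback_count(void)
{
    return fallback_calls;
}
```

With level set to 1, read_one() takes the character straight from the buffer and fake_fgetc() is never reached. With level set to a negative value, every read goes through fake_fgetc(), which is why the level fields must start out negative.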

I hope that this will do, but I’d like to know if it doesn’t. Please email me about any problems you encounter.

FWIW, I am aware of the fact that one of the routines (fwrite(), I think) stops at an int 3 breakpoint in ntdll.DbgBreakPoint, if it is run in the debugger. If you do an F9 or Run, the program will continue.

But the putchar() routine is a macro too, and this could increment the level again. So there may be similar problems with that routine. I did not encounter any, yet. But the changes I made for getchar() mean that probably ungetc() might not work properly (AFAICS, this is a macro too). If necessary, I might have to emulate the entire system in Delphi. Just because C uses a few macros.

This goes to show that it is not always a case of simply redirecting calls to something like msvcrt.dll after all. Macros ruin this idea.

FWIW, macros are evil evil evil.

fgetpos() and fsetpos()

It seems that in msvcrt.dll these two routines store additional data in the pos parameter. In BCB, pos is a simple long integer. Using _fgetpos() with the declaration of fpos_t in BCB’s stdio.h caused an access violation. So I wrote my own versions of these routines, using _fseek() and _ftell().

In theory, fpos_t is an opaque type, and both routines only use pointers to one. But in BCB it is declared as a long, so it won’t be larger than 4 bytes, and that is what is allocated. So if msvcrt.dll tries to store more than that into it, some kind of data will be overwritten, and your program will not work properly, or cause an access violation. That is the risk of using code that is written with a different compiler.
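Replacements built on ftell() and fseek() can be sketched in C as follows. This is an illustration of the idea only, not my actual Delphi code, and the names simple_pos, simple_fgetpos() and simple_fsetpos() are made up. The position is a plain long, as in BCB, so nothing larger than that is ever written to it.

```c
#include <stdio.h>

/* Position type as a plain long, the way BCB declares fpos_t. */
typedef long simple_pos;

/* Store the current file position in *pos; 0 on success, -1 on failure. */
int simple_fgetpos(FILE *stream, simple_pos *pos)
{
    long where = ftell(stream);

    if (where == -1L)
        return -1;
    *pos = where;
    return 0;
}

/* Move back to a position saved earlier; 0 on success. */
int simple_fsetpos(FILE *stream, const simple_pos *pos)
{
    return fseek(stream, *pos, SEEK_SET);
}
```

The price is that positions are limited to what fits in a long, but that is the same limit the BCB declaration already has.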

Conclusion

Provided you have a little knowledge of C, and are not afraid to write a replacement for a few missing C runtime library functions yourself (if you use msvcrt.dll, that number will be very limited), linking C object files to a Delphi unit is easy. It allows you to create a program that does not need a DLL, and can be deployed in one piece.

If you need help with using the free C++ Builder command line compiler (compiler version 5.5), you will find excellent help in the Borland newsgroups or forums.

Or you can use the equivalent NNTP newsgroups and a newsreader. I wish you a nice time experimenting.

Rudy Velthuis

Standard Disclaimer for External Links

These links are being provided as a convenience and for informational purposes only; they do not constitute an endorsement or an approval of any of the products, services or opinions of the corporation or organization or individual. I bear no responsibility for the accuracy, legality or content of the external site or for that of subsequent links. Contact the external site for answers to questions regarding its content.

Disclaimer and Copyright

The coding examples presented here are for illustration purposes only. The author takes no responsibility for end-user use. All content herein is copyrighted by Rudy Velthuis, and may not be reproduced in any form without the author's permission. Source code written by Rudy Velthuis presented as download is subject to the license in the files.
