Size Matters

From Free Pascal wiki
Revision as of 21:04, 8 April 2006

Introduction

This page is about binary sizes. Over the years there has been a lot of confusion about FPC and Lazarus binary sizes. Before you make any remark on the mailing list, please read this FAQ.

The main reason for this FAQ is that most discussions about this subject tend to get caught up in details too quickly. Also, people who shout "bloat" at nearly everything often cloud the global picture more than they contribute to it.

Why are the binaries so big?

Answer: They are not supposed to be big.

If you think they are big, then

  • you didn't configure FPC properly,
  • you have unrealistic expectations of the binary size, or
  • you are trying to do something that FPC is not designed for.

The last one is the least likely of the three. I'm going to treat each case briefly in the next paragraphs.

Is it bad when binaries are big?

In general, only the embedded world and people who really distribute daily by modem (e.g. in my last job we did a lot of deployment via remote desktop over ISDN) have to worry about this. And even with a 56k modem you can squeeze a megabyte through quite quickly. Note that an often cited misconception is that bigger binaries are slower in operation. In general this is not true, exotic last-cycle stuff such as code cache lines aside.

While Free Pascal is reasonably usable for embedded or system purposes, the final release engineering and tradeoffs are oriented more at general application building. For really specialized purposes, people could set up a shadow project, much like the specialized versions of certain Linux distros. Burdening the already overburdened FPC team with such specialized needs is not an option, especially since half of the serious embedded users will roll their own anyway.

Rule of thumb, how big should binaries be?

  • Anything under 1MB is considered not a problem.
    • Make sure binaries are properly stripped and smartlinked before measuring, and that ALL libraries are built using smartlinking.
    • DO NOT UPX Windows binaries, unless you have extremely good reasons to do so (see below). The size matters less than the memory load a depacked binary poses, and memory is still more expensive than disk.
  • With small apps it is a bit harder, because the RTL size is OS dependent. However, 100k standalone binaries that do something can be done, usually even below 50k.
    • On Windows, 20k binaries that use the GUI are no problem.
    • Unit sysutils contains internationalisation, textual error messages, exception handling and some other stuff that is always linked in when the unit is used (think 40-100k total).
  • Lazarus apps on Windows are about 500k, but quickly grow to 1.5 MB as more and more of the Lazarus widgets are used.
    • This is more than when recompiling with a recent Delphi, which is the price of cross-platform support and project maintainability.
    • When the point is reached where extra code doesn't add a dependency on more LCL code, this quick growth subsides.
    • The 1.5MB figure above is a rule of thumb. It depends very much on your GUI creation style and the number of different widgets (and their complexity) that you use.
    • For Lazarus apps quite a percentage of the binary is non-code, mostly strings and tables.
  • Linux/FreeBSD simple binaries are bigger than the corresponding GCC ones. This is because they don't use shared libraries (which you can easily see using "ldd ./xx").

Incorrect compiler configuration

I'm not going to explain every aspect of the compiler configuration at great length, since this is a FAQ, not the manual. This is meant as an overview only. Read the manuals and the buildfaq thoroughly for more background info.

Generally, there are several reasons why the binary would be bigger than expected. These are, in descending order of likelihood:

  1. The binary still contains debug information.
  2. The binary was not (fully) smartlinked
  3. The binary includes units that have initialisation sections that execute a lot of code.
  4. You link in complete (external) libraries statically.
  5. Optimization is not (entirely) turned on.
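For reference, a typical FPC invocation that addresses points 1, 2 and 5 above might look like this ('myprog.pas' is a placeholder; the units you link against must themselves have been built smartlinkable):

```shell
# Strip debug info, smartlink, and optimize in one go.
# -Xs : strip symbol/debug info from the final binary
# -XX : smartlink the program
# -CX : generate smartlinkable code
# -O3 : turn on optimizations
fpc -Xs -XX -CX -O3 myprog.pas
```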

In the future, shared linking against an FPC and/or Lazarus runtime library might significantly alter this picture. Of course, you will then have to distribute a big DLL with lots of other stuff in it. This is all still too far in the future to quantify what the impact on binary sizes would be.

Debug information

Free Pascal uses GDB as debugger and LD as linker. These work with a system of in-binary debug info, be it stabs or dwarf. People often see e.g. Lazarus binaries of 40MB. The correct size should be about 6MB; the rest is debug info (and maybe 6MB from not smartlinking properly).

Stabs debug info is quite bulky, but has the advantage that it is relatively independent of the binary format. In time it will be replaced by DWARF on all but the most legacy platforms.

There is often confusion with respect to the debug info, caused by the internal strip on a lot of win32 setups. Also, some versions of the win32 strip binary don't fully strip the debug info. So people toggle some flag (in the Lazarus IDE or on the FPC command line) like -Xs and assume it worked, while it didn't.

So, when in doubt, always try to strip manually and, on Windows, preferably with several different strip binaries.

In time, when 2.1.1 goes gold, these kinds of problems should become rarer, since the internal linker provides a more consistent treatment of them. However, they may apply to people using non-core targets for quite some time to come.

Smartlinking

The base principle of smartlinking is simple and commonly known: Don't link in what is not used. This of course has a good effect on binary size.

However, the compiler is merely a program and doesn't have a magic crystal ball to see what is used, so the base implementation is more like this:

  • The compiler divides the code finely into so-called "sections".
  • Then the linker determines which sections are used, basically by the rule: "if no label in the section is referenced, and no symbol in the section is reachable from outside, then it can be removed".

The last condition is necessary to avoid, for example, stripping exported functions from shared libraries.
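The same section-based mechanism can be demonstrated directly with GCC and LD (a sketch in C; FPC drives the linker for you and you normally never do this by hand):

```shell
# Each function gets its own section (-ffunction-sections); the linker
# then discards sections that nothing references (--gc-sections).
cat > demo.c <<'EOF'
int used(void)   { return 1; }
int unused(void) { return 2; }
int main(void)   { return used(); }
EOF
gcc -ffunction-sections -Wl,--gc-sections demo.c -o demo
nm demo | grep unused   # prints nothing: unused() was smartlinked away
```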

There are some problems with this simplistic view:

  • Virtual methods may be implicitly called via their VMTs. The linker doesn't know about VMTs, so they must all be linked in.
  • Tables for resource strings reference every string constant, and thus all string constants are linked in (one reason why sysutils is big).

Another important side effect that is logical, but often forgotten, is that this algorithm will link in everything referenced in the initialization and finalization parts of units, even if no functionality from those units is used. So be careful what you USE.

Anyway, most problems with smartlinking stem from the fact that FPC generally requires code to be compiled with smartlinking enabled in the first place.

The reason for this is simple. Until fairly recently, LD could only "smart" link units at the granularity of an entire .o file. This means that a separate .o file must be crafted for each symbol (and these tens of thousands of .o files are then archived into .a files). This is a time- (and linker-memory-) consuming task, so it is optional and is only turned on for release versions, not for snapshots. Often, people having problems with smartlinking use a snapshot whose RTL/FCL etc. weren't compiled with smartlinking on. The only solution is to recompile the source with smartlinking (-CX) on. See the buildfaq for more info.

In the future this will improve, when the compiler emits smartlinkable code by default, at least for the main targets. This is made possible by two distinct developments: first, the GNU linker LD can now smartlink at a finer granularity; second, the arrival of the FPC internal linker (in the 2.1.1 branch). Smartlinking using LD still has a lot of problems, because the exact assembler layout and numerous details with respect to tables must still be researched. The internal linker can now smartlink Lazarus (17 seconds for a full smartlink on my Athlon64 3700+, using about 250MB of memory), which is quite good, but it is Windows-only for now.

The internal linker also opens the door to more advanced smartlinking, like leaving out unused virtual methods (as a first estimate, 20% code size on the Lazarus examples, 5% on the IDE) and being smarter about unused resourcestrings. This is all still in alpha, and the above numbers are probably too optimistic, since Lazarus is not working with these optimizations yet.

Initialization and finalization sections

If you USE a unit, even indirectly via a different unit, and that unit contains initialization or finalization sections, then that code and its dependencies are always linked in.
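A minimal sketch in Pascal of how this plays out (the unit name and its initialization code are made up for illustration):

```pascal
{ Any program that uses this unit, directly or indirectly, gets the
  initialization code below linked in - plus everything it references -
  even if NeverCalled is never called. }
unit heavyinit;

interface

procedure NeverCalled;

implementation

uses
  sysutils; { pulled in, along with sysutils' own initialization }

procedure NeverCalled;
begin
end;

initialization
  { This line drags FormatDateTime and its dependencies into every
    binary that uses the unit. }
  Writeln(FormatDateTime('yyyy-mm-dd', Now));
end.
```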

A unit for which this is important is sysutils. For Delphi compatibility, sysutils converts runtime errors to exceptions with a textual message. All the strings in sysutils together are a bit bulky. There is nothing that can be done about this, short of removing a lot of initialization from sysutils, which would make it Delphi-incompatible. So this is more something for an embedded release, if such a team ever volunteers.

Static binaries

One can also make fully static binaries on any OS, incorporating all libraries into the binary. This is usually done to ease deployment, but the tradeoff is huge binaries. Since this is wizard territory, I only mention it for the sake of completeness. People who can do this hopefully know what they are doing.

Optimization

Optimization can also shave off a bit of code size, since optimized code is usually tighter (but only by tenths of a percent). Make sure you use -O3.