FPC JVM/Language

From Free Pascal wiki

General information

This is a compiler-only port. That means that except for the system unit and a unit that imports the JDK classes, no other standard RTL or other units are available. Furthermore, even the system unit is quite limited in terms of the functionality that it provides (details are below).

Over time, it is possible to implement most of the standard RTL and other unit functionality in a way that it can be compiled using the JVM-targeted compiler. That is however not the goal of this initial port and bug reports about missing unit-level functionality are not useful at this point (but patches are of course welcome! :).

Compiling code for the JVM target is currently quite slow, because the Java assembler that is used, Jasmin, is very slow. This is probably not so much because Jasmin is written in Java, but simply because it is quite a slow program.

The minimum JDK version required for running programs generated by the FPC JVM port is 1.5.

Used terminology and conventions

  • on the Pascal side/on the Java side: refers to code written in Pascal or Java, respectively. A number of features are only available when programming in Pascal, because the Java language has no similar concept.
  • implicit pointer types: a number of types that are not pointers in Pascal are implemented on top of Java classes or arrays. This means that they are implicitly pointers to the actual data, since a class or array is also just a pointer in Java. While possibly counterintuitive, this means that variables of such types can actually be made to behave more like their counterparts on native targets than other types.
  • fully qualified class names: the compiler currently does not support namespaces (such as org.freepascal or java.lang) as a syntactic element, nor does it support dotted unit names. This means that it is not possible to use identifiers such as java.lang.Object in the source code. All Pascal headers generated for imported classes abbreviate class names by taking the first letter of each part of the package name, followed by the full class name (e.g., java.lang.Object becomes JLObject). For classes or other types declared in Pascal source code, the regular unit.identifier syntax works. Nevertheless, all identifiers used below will appear using their fully qualified Java name, since the Pascal name can be derived from that while the opposite is not the case.

Base platform behaviour

Internally the Java/JVM target is treated by the compiler as a 32 bit target. This only means that arithmetic expressions are by default evaluated using 32 bit arithmetic, rather than by 64 bit arithmetic, just like for native 32 bit targets. The generated code will still run perfectly on 64 bit JVMs, and the compiler will also use the built-in 64 bit arithmetic opcodes of the JVM when performing 64 bit computations.

Reasons for this choice include:

  • 32 bit arithmetic can be expressed more efficiently in Java bytecode than 64 bit arithmetic
  • array indices are always 32 bit in the JVM

The extended floating point type maps to double, just like it does on other targets that lack native 80 bit floating point support.

Language feature support

While the language supported by the JVM port is as close as possible to the one supported for native FPC targets, there are some differences due to the nature of the JVM platform. Any language feature not mentioned below can be expected to behave the same as it does on native FPC targets (except for accidental omissions). This behaviour is only guaranteed at the purely semantic level, and not in any way in terms of absolute or relative performance.

General language features information

Unsupported language features

  • Turbo Pascal-style objects. Support may be added in the future, since they are very similar to records.
  • Bitpacking, or indeed any other kind of feature that influences the data layout ({$packset xxx}, {$packrecords xxx}, {$packenum xxx}, ...).
  • Class helpers. Possible to implement, but non-trivial. On the Java side, they would at best only be usable by explicitly calling the methods of these class helpers, since the Java language does not support automatically redirecting method calls from one class to another.
  • Variants. Could be implemented, although using them in Java code would not be very convenient (Java does not support operator overloading).
  • Delphi-style RTTI. Could probably be emulated.
  • Resourcestring. Unknown how difficult/easy this would be to implement.
  • Nested procedure variables. While both nested procedures and procedure variables are supported via emulation, the combination is not yet supported. This may be added in the future.
  • Non-local goto. This will probably never be implemented (it may be theoretically possible via a combination of exceptions and intraprocedural gotos).
  • Inline assembler. There is no support (yet?) for inline assembler, aka Java byte code.

Partially supported language features

  • System unit functionality. At this time, the system unit is extremely limited and misses support routines for several standard language features (including any kind of input/output, resource handling, and variants). Most such features can still be implemented in the future. The currently supported system routines are randomize/random, copy (on arrays and strings), halt, lo/hi, abs, sqr, odd, endian swapping routines, ror*/rol*/sar*/bsf*/bsr*, upcase/lowercase, runerror, insert, pos, delete, val, str and most math routines (cos, sin, etc).
  • Pointers: it is possible to declare pointer types. It is however only possible to take the address of var/out/constref-parameters and of implicit pointer types (see the section Implicit pointer types). Pointer arithmetic is not supported. Indexing pointers to non-implicit pointer types as arrays is however supported when {$pointermath on} is activated. FIXME: currently it's always enabled.
  • Variant records: the variant parts of variant records do not overlap in memory, and hence cannot be used to map the same data in different ways. As a result, they also do not save any memory unlike on native targets.
  • Call-by-reference parameters (var/out/constref): the JVM does not support call-by-reference, nor taking the address of a local variable. For implicit pointer types (see the section Implicit pointer types), this is no problem since there the compiler always has a pointer to the actual data available. For other types, call-by-reference is emulated by the compiler via copy-in/copy-out, which means that changes are not immediately visible outside the called routine. The steps followed by the compiler are:
    • construct an array of one element
    • store the parameter value in the array (in case of var/constref)
    • pass the array to the called routine
    • the routine can change the value in the array
    • on return, the potentially changed value is copied out of the array back into the original variable (in case of var/out)
  • Untyped const/var/out parameters. These are supported in the same way as they are in Delphi.NET, see http://hallvards.blogspot.com/2007/10/dn4dp24-net-vs-win32-untyped-parameters.html (with the same limitations as regular var/out parameters on the JVM target)
  • Include files. While include files will work fine at compile time, the Java class file format does not support referring to more than one source file. As a result, the compiler will only insert debugging line information for code in the main unit file. This limitation may be resolved in the future through the use of SMAP files as described in http://jcp.org/aboutJava/communityprocess/final/jsr045/index.html
  • Resources. Currently, files specified in {$r xxx} directives will be copied without any further processing into a jar file with the same name as the program, under the directory org/freepascal/rawresources. If you add this jar file to the Java class path when executing the program, you can load the resource files using the JDK's built-in resource file helpers (see java.lang.Class.getResource() and java.lang.Class.getResourceAsStream()). Delphi-style resource support may be (partially) added in the future.
  • Unit initialization code. If a unit is used from an FPC-compiled program, then the unit initialization code will be run on startup. If the main program is a Java program, then the unit initialization code will only run when the first method is entered that accesses a global constant or variable or calls a global procedure/function from that unit. Note: using classes, types or class variables from a unit does not by itself trigger executing the unit's initialization code. If you wish to manually trigger this, you can do so by adding a statement such as Object dummy = new unitName(); to your Java code.
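The copy-in/copy-out steps listed above for call-by-reference parameters can be sketched in plain Java. This is only an illustration of the described technique, not the code FPC actually generates: the class and method names are hypothetical, and increment stands in for a Pascal routine declared as procedure Increment(var x: longint).

```java
// Hypothetical stand-in for a Pascal routine "procedure Increment(var x: longint)".
// All names are illustrative; this is not FPC-generated code.
class CopyInCopyOutSketch {
    // The callee receives a one-element array instead of a reference to the variable.
    static void increment(int[] xRef) {
        xRef[0] = xRef[0] + 1; // the routine changes the value in the array
    }

    // What a call site has to do to emulate passing "value" as a var parameter.
    static int callIncrement(int value) {
        int[] temp = new int[1]; // construct an array of one element
        temp[0] = value;         // copy in (needed for var/constref)
        increment(temp);         // pass the array to the called routine
        value = temp[0];         // copy out (needed for var/out)
        return value;
    }

    public static void main(String[] args) {
        System.out.println(callIncrement(41)); // prints 42
    }
}
```

Until the copy-out assignment runs, the original variable still holds its old value, which is why changes are not immediately visible outside the called routine.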

New language features

  • {$namespace x.y.z} directive. While dotted unit names are not supported, a (global) namespace directive can be used to tell the compiler to put all definitions in the current unit under the specified Java package name.
  • Formal class definitions. These do not exist in Java, but can be required to solve problems with circular references amongst Java classes. They are the same as their Objective-C equivalents, except that they use the class rather than objcclass keyword.
  • External class definitions. These are used to import the JDK units. Again, they are very similar to their Objective-C counterparts, except that additionally a package can be specified (notice that an explicit package can only be specified for external classes; non-external class definitions always use the one from the {$namespace xxx} directive):
type
JavaClassName = class [external ['package.name'] [name 'ExternalClassName']] [(JavaSuperClassName|InterfaceName [, InterfaceName, InterfaceName])] 
[strict][private, protected, public]
 [variables declarations]
 [method declarations]
end;
  • Final fields. Both class and instance fields can be declared as final. A final class field can only be written in the class constructor, while a final instance field can only be written in a regular constructor. Outside those routines, they can only be read. The compiler will not be able to fully check this limitation if you use pointers to final fields.

Overloading

A number of different Pascal types and parameter passing conventions map to the same type signature in the JVM. The reasons are that the JVM supports fewer primitive types than Pascal, and that some parameter passing constructs have to be emulated using other types.

This means that certain overloads that are valid for native targets are invalid when targeting the JVM, because the internal method name has to encode the parameter types on that target. The compiler will detect such situations and print an error when they occur.

Cases that you may encounter include:

  • Class references. All class reference types map to the same Java type (java.lang.Class). As a result, overloading based on different class reference types is not possible.
  • Ordinal types (except for enumerations). The JVM only supports signed ordinal types. This means that overloading based on signed and unsigned ordinal types of the same size is not possible. Additionally, the currency type maps to int64, and the ansichar type to shortint. Widechar (= unicodechar) is a separate type.
  • Enumerations. Every base enumeration type is a separate type, but subtypes of enumerations map to the same type as the base type.
  • Sets. All sets of non-enumeration types map to the same type (org.freepascal.rtl.FpcBitSet). All sets of enumeration types (regardless of the enumeration involved) also map to the same type (java.util.EnumSet).
  • Array types. Every array of a particular type (integer, a class type, an enumeration, ...) maps to the same signature, regardless of the size or kind of the array. This includes fixed size, open and dynamic arrays.
  • Call-by-reference parameters and arrays. As explained earlier, call-by-reference parameters are emulated using arrays. This means that e.g. a var byte and a const array of byte parameter will map to the same type signature and cannot be overloaded. Pointers to types other than the implicit pointer types also map to arrays.
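To see why the cases above collide, it can help to print the JVM method descriptors involved. The sketch below uses the standard java.lang.invoke.MethodType class to build descriptors for a few of the Java types named in the list. The class and helper names are hypothetical, and the mapping from Pascal declarations to these Java types is taken from the list above rather than from compiler output.

```java
import java.lang.invoke.MethodType;
import java.util.EnumSet;

// Illustrative only: shows the descriptor a void method with one parameter would get.
class DescriptorSketch {
    static String descriptorOf(Class<?> parameterType) {
        return MethodType.methodType(void.class, parameterType).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // A "var byte" parameter (emulated as a one-element array) and an
        // "array of byte" parameter are both byte[] on the JVM.
        System.out.println(descriptorOf(byte[].class));  // prints ([B)V
        // Every class reference type is java.lang.Class.
        System.out.println(descriptorOf(Class.class));   // prints (Ljava/lang/Class;)V
        // Every set of an enumeration type is java.util.EnumSet.
        System.out.println(descriptorOf(EnumSet.class)); // prints (Ljava/util/EnumSet;)V
    }
}
```

Two methods with the same name and the same descriptor cannot coexist in one class, so Pascal overloads that differ only in types mapping to the same descriptor have to be rejected.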

Specific language features information

Classes

Classes are implemented as Java classes:

  • The system unit contains an extremely minimal TObject implementation based on java.lang.Object. It basically only adds a virtual destroy destructor and the Free instance method, and lacks all other standard TObject functionality. Notice that the destructor is only guaranteed to be called if you explicitly call Free, or when the dead object is collected by the garbage collector.
  • If no super class is specified, a class will inherit from TObject. It is also possible to inherit from java.lang.Object by specifying JLObject as superclass instead.
  • There is no TInterfacedObject, nor is it required. Any class type can implement any kind of interface.
  • Message handlers are not supported.
  • Java classes do not support non-virtual instance methods. The compiler emulates this Pascal language feature by translating such instance methods into virtual; final; methods. This means that it is not possible to "hide" such methods in child classes by declaring other (virtual or non-virtual) methods with the same name and type signature.
  • Java classes do not support starting a new inheritance tree by reintroducing a method. This is only a problem if the reintroduced method has the same signature as a method in a parent class. If the signature is different, there is no problem and the compiler will allow it.
  • Java classes do not support virtual class methods nor virtual constructors. These are emulated by the compiler, but the resulting code is fairly slow. Do not use them in performance-sensitive code.
  • Constructors do not have a name in Java (and a fixed name in JVM bytecode). As a result, you cannot declare multiple constructors for a class with a different name but with the same parameters, as these would map to the same Java constructor signature.
  • By default, properties are only supported on the Pascal side and no getters or setters are automatically generated for use on the Java side. As of r22959, you can use the -CTautogetterprefix=XXX and -CTautosetterprefix=YYY parameters to tell the compiler to automatically generate getters/setters for all properties with the specified prefixes.
  • It is not possible to call a virtual constructor for the current instance from inside one of its other constructors (virtual or not). The reason is that the JVM does not support virtual constructors, and no emulation for them has been found that can be safely used in that scenario under all circumstances.

Issues to take into account when using FPC-defined classes from Java code:

  • Virtual constructors are exposed as class methods. These class methods have a first parameter that specifies the class type of which you want to invoke the virtual constructor, and their name matches the constructor's name in the Pascal code. Example:
type
  TMyClass = class
    constructor Create(l: longint); virtual;
  end;

becomes

class TMyClass {
  TMyClass(int l) {
    ...
  }

  // the generic parameter declaration means "any class reference
  // type for TMyClass or one of its descendants"
  static TMyClass Create(java.lang.Class<? extends TMyClass> self, int l)
  {
    ...
  }
}
  • virtual and regular (non-static) class methods are exposed in a similar way as virtual class constructors, including the special first class parameter that indicates the class type from which the class method has been invoked (class methods in Java do not have a hidden self-parameter like in Pascal, so it has to be specified explicitly)
  • Overriding virtual class methods and constructors on the Java side can be done by overriding the class method whose name consists of the virtual constructor/class method's name followed by __fpcvirtualclassmethod__.
  • static class methods behave the same in Pascal as in Java
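From the Java side, calling such an exposed virtual constructor might look like the following sketch. The classes here are hand-written stand-ins with the shape described above. The body of Create is an assumption (plain reflection) chosen only to make the example runnable, and TMyChild is a hypothetical descendant; none of this is FPC's generated code.

```java
// Hand-written stand-in for an FPC-compiled class with a virtual constructor.
class TMyClass {
    final int value;

    TMyClass(int l) {
        this.value = l;
    }

    // Assumed implementation: instantiate whatever class "self" refers to.
    static TMyClass Create(Class<? extends TMyClass> self, int l) {
        try {
            return self.getDeclaredConstructor(int.class).newInstance(l);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Hypothetical descendant used to show that the class reference selects the type.
class TMyChild extends TMyClass {
    TMyChild(int l) {
        super(l);
    }
}

class VirtualConstructorCallSketch {
    public static void main(String[] args) {
        // The class reference is passed explicitly as the first argument.
        Class<? extends TMyClass> classRef = TMyChild.class;
        TMyClass obj = TMyClass.Create(classRef, 5);
        System.out.println(obj.getClass().getSimpleName() + " " + obj.value); // prints TMyChild 5
    }
}
```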

Interfaces

Interfaces are implemented as Java interfaces:

  • No reference counting, since they are garbage collected
  • No GUIDs
  • No support for mapping an interface method to a method in a class definition with a different name

Implicit pointer types

As mentioned under Used terminology and conventions, certain types that are not pointer-based in the Pascal language are implemented on top of Java classes or arrays. These data structures are pointers in Java, and as a result we can do some things with them that we cannot do with other types that map more directly onto a Java type.

The implicit pointer types and their corresponding Java representations are listed below:

  • Record: sealed subclasses of org.freepascal.rtl.FpcBaseRecordClass
  • Set: sets of enumerations are implemented as specialized java.util.EnumSet instances, while other sets are implemented as org.freepascal.rtl.FpcBitSet
  • Shortstring: instances of org.freepascal.rtl.ShortStringClass
  • Procedure of object: subclasses of org.freepascal.rtl.FpcBaseProcVarClass
  • non-dynamic arrays: regular Java arrays

Specific properties of implicit pointer types are:

  • It is always possible to take their address and store it in a pointer
  • When passing them to var or out parameters, changes made via the parameter are also immediately visible in the original location (and when passing them as constref parameters, changes made to the original value are also immediately visible in the parameter value)
  • Pointers to implicit pointer types can never be indexed as arrays
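The immediate visibility described above follows directly from Java reference semantics. In the minimal sketch below, PointRecord is a hypothetical stand-in for a Pascal record (real records are subclasses of org.freepascal.rtl.FpcBaseRecordClass, which is not used here), and the method stands in for a routine taking the record as a var parameter.

```java
// Hypothetical stand-in for a Pascal record with two integer fields.
class PointRecord {
    int x;
    int y;
}

class ImplicitPointerSketch {
    static PointRecord global = new PointRecord();

    // Stand-in for a routine with a "var p" record parameter: the reference itself
    // is passed, so no temporary array and no copy-out are needed.
    static boolean changeVisibleInsideCallee(PointRecord p) {
        p.x = p.x + 1;
        // If p aliases global, the new value is already visible through global.
        return global.x == p.x;
    }

    public static void main(String[] args) {
        global.x = 10;
        System.out.println(changeVisibleInsideCallee(global)); // prints true
        System.out.println(global.x);                          // prints 11
    }
}
```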

Strings

String, unicodestring and java.lang.String

The JVM internally only supports a 16 bit unicodestring type (java.lang.String). For this reason, unicodestring is generally somewhat more efficient than either ansistring or shortstring on the JVM target.

A unicodestring directly maps to the java.lang.String type. This means that if your code is supposed to be called from Java code, all public string parameters and constants should be unicodestring rather than some other string type.

In order to facilitate this process, a new {$modeswitch unicodestrings} has been added to the compiler. If used together with {$h+}, it will make unicodestring the default string type and widechar the default char type. This is similar to the change in Delphi 2009, except that it is available via a modeswitch rather than forced onto all code.

If {$modeswitch unicodestrings} is used with {$h-}, then the default string type will be shortstring but the default char type will remain widechar. Notice that {$h+} is the default for {$mode delphi}.

Ansistring and shortstring code page

The "ansi" code page is set by the Java program on startup. A default code page is chosen based on the environment on Linux and Windows. On Mac OS X, for legacy reasons the default code page is always MacRoman, which almost never is what you really want to use. See the Usage instructions for details on how to specify the default code page.

String conversions

The compiler will implicitly convert any string type to java.lang.String and vice versa.

Type aliases and subrange types

The JVM supports neither type aliases nor subrange types. As a result, none of the marked type definitions in the following example will be visible to Java code. Of course, they still work as expected in Pascal code.

<syntaxhighlight lang=pascal>
type
  tc = class
  end;
  tenum = (ea,eb,ec);
  tc2 = tc;         // Pascal-only
  intsub = 3..41;   // Pascal-only
  enumsub = eb..ec; // Pascal-only
</syntaxhighlight>

Using JDK functionality

The JVM port of FPC includes a Java utility called javapp. It is based on the source code of the standard JDK program javap, which can be used to print the contents of a Java .class file.

The javapp utility can be used to create Pascal headers for compiled Java classes. The system unit contains the subset of the standard JDK classes that is required to implement standard language functionality. The rest of the standard classes that are part of the JDK 1.5 are available via the jdk15 unit. It includes all JDK classes from the java.*, javax.* and org.* hierarchies. Other classes from the JDK are not exported because they are platform-specific and/or can change between different JDK versions.

Some things to watch out for:

  • All field names in the translated headers are prefixed by f. The reason is that Java is case-sensitive and the JDK often declares constants and fields with the same name (except for case) in a single class, which is invalid in Pascal.
  • The javapp utility cannot handle circular references involving nested classes, because they cannot be expressed in Pascal. There is one such circular reference in the standard JDK, between java.awt.Window and java.awt.Dialog. This is currently worked around by declaring java.awt.Dialog as a formal class, which means that none of its methods, fields or nested classes are available.
  • In case a field, constant or method name is a reserved Pascal keyword, it is escaped using &. You will have to do the same in your Pascal code to use them.

Run java -jar javapp.jar -help to see usage information.

Using Android SDK functionality

The principle is the same as with the JDK, except that the javapp utility cannot fully automatically translate the Android SDK class hierarchy due to a circular class reference, a bug related to constants called create, and the use of an inner class. The FPC RTL for the Android/JVM target however includes a unit called androidr14 in which those problems are manually resolved. It contains header translations for all classes in R14 of the Android SDK (corresponds to Android 4.0) from the java.*, javax.*, org.*, junit.* and android.* hierarchies.

Issues to watch out for

  • Uninitialized values. The JVM requires that all variables are initialized before their first use, and similarly that function results are initialized before they are returned. Some specific situations that are likely to arise in existing Pascal code:
    • Non-implicit pointer type variables passed to var-parameters must be initialized prior to the call. Alternatively, you can change such var-parameters into out-parameters.
    • Even if a case-statement handles all possible situations that could arise while running the program (e.g., because it handles all possible values of an enumeration), you still have to initialize, in the else branch of that case-statement, any uninitialized variable you may use afterwards.
    • If a variable is first initialized in a try block, the JVM will assume that an exception could happen before this initialization is executed and hence will consider it to be still uninitialized in the except and finally blocks, as well as after the try block.
  • Uninitialized enumeration values. In Java, enumerations are classes. When constructing a new class with enumeration fields or when declaring a global variable whose type is an enumeration, FPC will automatically initialize them with the enumeration instance corresponding to the ordinal value 0. If no such enumeration exists, the field/variable will remain nil. The Java compiler does not perform any such initializations, which means that it is technically possible for Java code to pass nil pointers as enumeration parameters. This is not supported by FPC and if your code attempts to read such parameters it will abort with a java.lang.NullPointerException.
  • Call-by-reference parameters. As described earlier, except for implicit pointer types, all call-by-reference parameters (var, out, constref) are emulated via copy-in/copy-out. This means:
    • updates to the parameter values will only be globally visible on return from the routine
    • such parameters are inconvenient to use from Java code, because there the programmer will have to manually create the temporary arrays and copy the values in and/or out
  • Calling constructors. The JVM requires that every constructor calls either another constructor for the same class, or an inherited constructor, before accessing any field or calling any method of the instance. If you do not call any other constructor at all, the compiler is required to insert a call to the parameterless constructor in the parent class. You are not allowed to call a constructor again on an already initialized instance to reinitialize it. Of all of these conditions, FPC currently only performs the automatic calling of the parent's parameterless constructor when required. It does not check the other requirements, which means you will get a run time error when you do not observe them.
  • Classes with abstract methods. The JVM requires that classes containing one or more abstract methods are marked as abstract themselves. The compiler will do this for you if you do not do so yourself. However, unlike Pascal, the JVM will throw an exception when you try to instantiate an abstract class.
  • Empty strings and dynamic arrays. In most cases, empty arrays and ansi/unicodestrings will be represented internally by a non-nil array/string with length 0. This is different from native FPC targets, where such arrays/strings are represented by nil pointers. This is a conscious choice to make it easier to interact with Java code, since in Java code a nil string will cause a NullPointerException rather than behaving like an empty string. The same goes for arrays. Assigning nil to dynamic arrays to initialize them with an empty array is still completely valid though (the compiler will convert such assignments to emptying the array).
  • Synchronized. Access to Java's synchronized functionality is currently not yet exposed (neither the method modifier nor synchronized blocks). This will still be added in the future, in a way that also works for native targets.
  • Known bugs. See FPC_JVM/Usage#Known_bugs.
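The rationale given above for representing empty strings and dynamic arrays as non-nil values can be seen from the Java side with a small sketch. The class and method names are hypothetical; the example uses only standard Java behaviour and no FPC-generated code.

```java
// Illustrative only: how ordinary Java code reacts to empty versus null strings.
class EmptyVersusNullSketch {
    // Length of a string the way Java code would typically compute it.
    static int lengthOf(String s) {
        return s.length();
    }

    public static void main(String[] args) {
        System.out.println(lengthOf(""));      // prints 0
        System.out.println(new int[0].length); // prints 0
        try {
            lengthOf(null);
        } catch (NullPointerException e) {
            // A null string does not behave like an empty string in Java.
            System.out.println("null string: NullPointerException");
        }
    }
}
```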