Difference between revisions of "ZSeries/Part 1"

 
(23 intermediate revisions by 3 users not shown)
 
=Introduction=
 
:''[[ZSeries|Go back to zSeries]]''  
+
:''[[ZSeries|Go back to zSeries]]'' — [[ZSeries/Part 2|Onward to Part 2]]
 
My name is Paul Robinson.  I am the chief programmer at Viridian Development Corporation, which has decided to develop a cross-compiler version of Free Pascal for the IBM 370/390/zSeries mainframe computer.  I decided it would be a learning experience that would allow me to better understand how the Free Pascal compiler works.  I also didn't particularly want to work with a compiler written in C (such as the GCC Pascal Compiler); I wanted to work with one written in Pascal.
 
  
There is an existing more-or-less open source compiler for the 370 architecture, for which I have a copy of the sources and run-time library.  It was modified from an earlier, incrementally updated Pascal Compiler called P6 or P7 (depending on which release it was).  I think it was P6 when it was on the Decsystem/20 mainframe back in the late 1970s and early 1980s (I have the source to that one too), and it may have been P7 by the time it was upgraded for the latest release for Control Data Corporation computers.  This version of P6/P7 was developed by the Australian Atomic Energy Commission.  Only problem with it is it's over 25 years old and only supports Standard Pascal.  No objects, no strings, and it doesn't even come up to the level of capability of Turbo Pascal 3 for DOS.  It's old, and I wanted to work with something more recent.  I can, however, borrow from it to figure out how some Pascal keywords are translated into 370 code.
+
There is an existing more-or-less open source compiler for the 370 architecture, for which I have a copy of the sources and run-time library.  It was modified from an earlier, incrementally updated Pascal Compiler called P6 or P7 (depending on which release it was).  I think it was P6 when it was on the Decsystem/20 mainframe back in the late 1970s and early 1980s (I have the source to that one too), and it may have been P7 by the time it was upgraded for the latest release for Control Data Corporation computers.  (When Niklaus Wirth was creating the Pascal language at ETH in Zurich, that's what they had, so Pascal originally started with Control Data computers.  In fact, because of the functionality provided by the existing Control Data libraries, Wirth wrote the first Pascal compiler using Fortran.  I am not kidding.)
  
I also should have a copy of the Stanford Pascal Compiler sources (also over 20 years old), and I believe that the CBT tapes archive (a huge set of over 200 mainframe source code magnetic tapes originally collected by Connecticut Bank and Trust) may contain either a Pascal Compiler or some other compiler source I can use.  It also contains lots of IBM 360/370 assembly language sources which will be useful.  All these resources and others should help in doing this implementation.
+
This version of P6/P7 Pascal Compiler (the internal comments refer to it as "Stepwise refinement of a Pascal Compiler") was developed by the Australian Atomic Energy Commission.  Only problem with it is it's over 25 years old and only supports Standard Pascal.  No objects, no strings (you can do "array [1..256] of character" but concatenating two strings?  Write your own function!), and doesn't even come up to the level of capability of Turbo Pascal 3 for DOS.  It's old, and I wanted to work with something more recent.  I can, however, borrow from it to figure out how some Pascal keywords are translated into 370 code. 
 +
 
 +
I also should have a copy of the [[Stanford Pascal Compiler]] sources (also over 20 years old), and I believe that the CBT tapes archive (a huge set of over 200 mainframe source code magnetic tapes originally collected by Connecticut Bank and Trust) may contain either a Pascal Compiler or some other compiler source I can use.  It also contains lots of IBM 360/370 assembly language sources which will be useful.  All these resources and others should help in doing this implementation.
 +
 
 +
Also, because I have regularly used the Free Pascal Compiler, I wanted to see it available on another architecture where the full Object Pascal capability would be available.  When I started this project back in January of 2012, I didn't realize how long it would take, because it was the sort of thing that is called a "slopsucker" task.  On a computer, the "slopsucker" is the task that gets whatever processor time is available after everything else has gotten a chance.  Sort of like the amount of attention you give to the sump pump in the basement of your house (if you have one), unless, of course, it fails.  So, I'm busy with other things, doing a little here, a little there, and what do you know, 9 months go by and I haven't accomplished a thing.  So I decided to "up the priority" of this task and give it more attention.  That's when I realized how many of the initial presumptions I made when writing this were wrong.  More on that, later.
 +
 
 +
=Background on this project=
 +
I am aware, first, that this will be a huge undertaking; this is not a weekend project.  I'm probably looking at a minimum of <s>six months</s> two years of work, possibly longer.  I have a story.  When I wasn't doing programming I did mobile notary work.  I was, then, a commissioned Notary Public for the Commonwealth of Virginia (I'm now commissioned in Virginia and Maryland), and one of the things that was developed was that, when people refinanced their mortgages, the company would send a notary right to the person's home or office.  Well, one of the customers happened to discuss how they wanted to upgrade the application that their people ran to handle booking of services.  It was a DOS application; they wanted to do more things with it and they wanted it to be a GUI under Windows.  At that time Free Pascal did not have the equivalent of Lazarus, so I unfortunately had to use Visual Basic to do so.  (Please do not send me hate mail; I have to use what I have available.)
 +
 
 +
Well, anyway, so I had several meetings with the customer and arranged terms including price.  I worked on it on a regular basis, implemented it through "accretion" in which you get part of it working and they can see how it's coming along.  We had to change things along the way and various fits and starts, but in the end, they were very pleased with the program that they can use for their employees who book orders on their laptop computers.  It wasn't a very big program in terms of what was going on, they have it running on about 60 computers, but from when we first sat down to decide what they wanted, until it was fully running and distributed out to everyone took one solid year. 
 +
 
 +
So the point of this shaggy dog story is that I'm aware that this is a long-term project and will probably take months.  I am also aware that the compiler won't work when I first make changes because I have to learn where the main part of the compiler hands off control to the machine-specific and OS-specific routines so that it can ask that routine, "Hey, the user is implementing a FOR statement, you need to create the code to do this." or "Hey, the guy is declaring the start of a procedure that has an integer argument, here's the information on how it's supposed to be defined."
 +
 
 +
So anyway, I know there's a lot involved, there will be fits, and starts, and things won't always go right.  But it's a learning experience, and, if you follow this as it goes along over the months as it progresses, maybe you'll learn something too.
  
 
=Before Getting Started=
 
 
The normal distribution includes most of the sources including the run-time library but '''does not''' include the sources to the compiler itself because most people do not need it and it's about another 40 MB.  You need to obtain the zip/tar archive file for the compiler from the download location you're using for the rest of Free Pascal (probably SourceForge or a mirror), extract the compiler directory from that archive, and include it with the 2.6.0 source release.
 
  
First thing is to start by creating a new directory and copying all files and subdirectories from the Compiler subdirectory (and all files in its subdirectories) to a new directory, in order not to contaminate the pristine sources of the current compiler.  This compiler will be a cross-compiler, it will run on a PC and will generate an assembly language file for the 370 Architecture with the use of the standard High-Level Assembler syntax.  That file will be uploaded to the target mainframe (real or simulated) and run through the high-level assembler there.
+
There are two ways to go about this. First thing is to start by deciding how you're going to carry the source files on your system.  If you're just trying to do a simple change in the compiler to fix something, you can begin by creating a new directory and copying all files and subdirectories from the Compiler subdirectory (and all files in its subdirectories) to a new directory, in order not to contaminate the pristine sources of the current compiler.   
 
 
You would also want to copy the '''rtl''' directory (which is normally outside of the Compiler directory) because you may need to modify some files there.  That will also require an i370 subdirectory for its run-time library, which might be different depending on which mainframe OS is targeted.  We'll worry about that later.
 
 
 
There will also be created a new '''I370''' subdirectory within the Compiler directory for all the local files related to that architecture.
 
  
Note that the pages here will just basically walk through what was done, if a correction is more than a few lines, the user will be directed to the replacement source file. Once the work is completed a zip file containing all of the new or changed source files will be available.
+
If you're planning to do serious development such as a full-blown port, you definitely want to use full source code control including version management.  For that, you need to get an SVN client and use SVN to download everything.  (If you're using Mercurial, CVS or Git, quit complaining!  Everybody has their own particular style, or toys, or whatever, and you have to go along with that particular crowd to work with them.  I mean, there are people who argue over which word processor is best and will almost "go to the mattresses" to defend their decision.  See the movie ''You've Got Mail'' about how the movie ''The Godfather'' is the answer to everything!)
  
+
Or let's consider formatting style.  Which of the following is the correct way to format a block of code?
      (1)                             (2)
    BEGIN                          BEGIN
        Instruction1;                        Instruction1;
        IF something THEN                    IF something THEN
          BEGIN                            BEGIN
              Instruction2;                        Instruction2;
              Instruction3;                        Instruction3;
          END                              END
        ELSE                                ELSE
          BEGIN                            BEGIN
              Instruction3;                        Instruction3;
              Instruction2;                        Instruction2;
          END;                              END;
    END;                            END;
Or is neither of these right?  Well, guess what: whichever one you picked, you're right.  You're also wrong.  Someone won't like the way someone else does indentations, the number of spaces for each level, whether the BEGIN should be on the same line as the IF or ELSE, or whether the THEN should be moved down and the BEGIN put there, as well as whether they capitalize keywords or use Camel Case, or whatever.  It's all a matter of taste, and mostly it's arbitrary.  You pick what style works for you to be able to read the code easily, and use that.
 
=Issues=
There are a number of issues when doing this.  The 370 has a number of quirks different from the Wintel architecture or Mac hardware:
*It's big-endian
*It uses non-IEEE floating point so it may have different limit values
*While it has more registers (16), about 5 of these are generally not usable due to conventions or hardware requirements
*The maximum amount of memory you can directly address at one time is much smaller (you can only address about 4K at a time, either as code or data).  If you're working with two pieces of data, either may be up to 4K in size that you can work with directly.
*Depending on whether you target a 370, a 390 or a zSystem you may have access to a 32-bit address space or a 64-bit address space and a much larger area than 4K.  Since the target I'm going to be using is a 370, I have to restrict the code to a 4K "page" and 32-bit addressing.
*There are several different operating systems that could be targeted, either the MUSIC/SP emulator I'm using (not going to be very popular), a program running under VM, a program running on the TSO timesharing system, a program running as a batch job on OS/VS1, a program potentially running as a screen application on the CICS terminal monitor (very similar to how Windows programs work, with a few gotchas), or a program running on Linux/370.  This will be dealt with by using generic I/O instructions and having an appropriate run-time library for the particular system.
*The IBM 370/390/zSystem uses the EBCDIC character set, PCs use ASCII (or Unicode).  Unicode support may be available but I won't depend upon it.  Where the program has to generate constants, only standard, known characters will be used because they translate fine from ASCII to EBCDIC.  If there is anything important where the value matters, hexadecimal constants will be used.
==Target Choice==
I don't really have access to a real zSystem or 390, I have a 370 simulator running on my PC that has an operating system (MUSIC/SP) and an assembler, so the cross-compiler will be restricted to 370 instructions and 32-bit addressing, with a potential for upgrading this if circumstances permit.
 
=To Begin=
This compiler is ''huge''.  It's hundreds of source files, and is going to be an enormous task.  Where do you start?  Well, you start with the main program of the command-line compiler, and you look at it.  That file is '''pp.pas'''.
 
  
Note that line numbers indicated in any source file are from the version 2.6.0 compiler sources; as lines are added, the line numbers of later places where things were found and changed will increase.  So line numbers will be referenced in a file from top to bottom so the references should match.  Also, so as not to brand this as "Windows centric", since the hope is to build a cross-compiler for I370 that could run on either Windows or Linux, when file names are specified, directory separators will use /.
+
So, anyway, if you don't have SVN, get a Linux client if you're on Linux or [http://tortoisesvn.net install the Tortoise SVN client for Windows], then download the Free Pascal sources using SVN.  If you're going to do an approved project, then you'd get write access to the repository; otherwise all you need is read access.  I won't go into how to do this because there's already plenty of information here on how to use SVN, so I'll go on to the port I'm working on.
  
Note that from this point on, all editing occurs in our "sandbox" directory separate from the original compiler.
+
This compiler will ''initially'' be a cross-compiler, it will run on a PC and will generate an assembly language file for the 370 Architecture with the use of the standard High-Level Assembler syntax.  That assembly-language source file will be uploaded to the target mainframe (real or simulated) and run through the high-level assembler there.  Eventually, as time goes by, it will be ported over and will run natively there.  (At least I ''hope'' that's how it will work out...)
  
[[Zseries|Go back to Zseries]] &mdash; [[Zseries/Part 2|Go on to Part 2]]
+
Then I discover that if you're porting for Linux, you do not have the High-Level Assembler, you have a bastardized version of Intel and AT&T syntax because the assembler isn't the powerful High-Level Macro Assembler for mainframes, it's the GCC assembler for the GNU C Compiler that uses a form of the syntax for the 386 series microcomputer. So that has to be taken into consideration.
==pp.pas==
 
:''Main Program, includes '''fpcdefs.inc''', calls units '''cmem, profile, catch, globals, compiler'''''
 
'''pp.pas''' is the main program of the compiler.  We're going to edit this file to create an IBM-370 cross-compiler.  First, we'll decide what is the switch for this, and we'll use I370.  So we'll add that to the source comments to indicate that, by inserting the middle line in the comment block (about line 35):
 
  VIS                generate a compiler for the VIS
 
  I370                generate a compiler for the IBM 370/390/zSeries mainframes
 
  DEBUG              version with debug code is generated     
 
This program includes '''fpcdefs.inc''' so we'll check that later.  We have to add the indication to only select one target compiler, so we'll select for I370 (about line 142, where we'll add everything starting at the sixth line where we check to see if i370 is defined):
 
{$ifdef support_mmx}
 
  {$ifndef i386}
 
    {$fatal I386 switch must be on for MMX support}
 
  {$endif i386}
 
{$endif support_mmx}
 
{$ifdef i370}
 
  {$ifdef CPUDEFINED}
 
    {$fatal ONLY one of the switches for the CPU type must be defined}
 
  {$endif CPUDEFINED}
 
  {$define CPUDEFINED}
 
{$endif i370}
 
The rest of the main program seems to be okay, but we will have to go through and look at all the units that this program uses, which, depending on which options have been set, are or can be: '''<code>cmem, profile, catch, globals,</code> and <code>compiler</code>'''.  The rest of this file seems okay, so we'll save it.
 
  
==fpcdefs.inc==
+
So presuming you don't use SVN and you need to manage the directories manually, you would also want to copy the '''rtl''' directory (which is normally outside of the Compiler directory) because you may need to modify some files there.  That will also require an i370 subdirectory for its run-time library, which might be different depending on which mainframe OS is targeted.  We'll worry about that later.
:''Called as include file from '''pp.pas''', called as include file from units '''catch, compiler''' (and virtually all other units), no called units noted''
 
'''fpcdefs.inc''' provides various definitions regarding what processor we're compiling for to most units and many other files.  We need to define the processor, so we'll borrow the generic one, and add or remove items as we need them.  So we'll start with the block beginning with the line <code>{$ifdef generic_cpu}</code> through the line <code>{$endif generic_cpu}</code>.  From later work, I discover I'll have to mark the machine as big endian, so I'll include that.  (Note that the definitions will include things as I discover them, so this may include things I haven't explained.)
 
  
So around line 151, between the lines
+
There will also be created a new <s>i370</s> '''s370''' subdirectory within the Compiler directory for all the local files related to that architecture.  That's another thing I learned, straight out of the Linux development for the z/System (the extended version of the 370 and 390 series): it's called the S370, not the I370.  So that gets changed, too.
{$endif mips}
 
 
{$IFDEF MACOS}     
 
We'll update this to the following:
 
{$endif mips}
 
 
{$ifdef i370}
 
  {$define cpu32bit}
 
  {$define cpu32bitaddr}
 
  {$define cpu32bitalu}
 
  {$define cpuflags}
 
  {$define cpuextended}
 
  {$define ENDIAN_BIG}
 
{$endif i370}
 
 
{$IFDEF MACOS}     
 
Otherwise, now, '''fpcdefs.inc''' looks okay.
 
  
==Unit cmem==
+
Note that the pages here will just basically walk through what was done; if a correction is more than a few lines, the user will be directed to the replacement source file.  Once the work is completed a zip file containing all of the new or changed source files will be available.
:''Called from: '''pp.pas''', file located in directory '''rtl/inc''' outside of the ''compiler'' directory, no called units noted, requires external procedures '''malloc, free, realloc, calloc'''''
 
This is used to bridge to the C memory management library and its functions <code>malloc, free, realloc, calloc</code>.  I will probably borrow these from an existing C library or simulate them.  Or this module may be rewritten.  For now, there is nothing I need to do here.
 
  
==Unit profile==
+
=Issues=
:''Called from: '''pp.pas''', file located in directory '''rtl/go32v2''' outside of the ''compiler'' directory, no called units noted''
+
There are a number of issues when doing this.  The S370/390/zSystem has a number of quirks different from the Wintel architecture or Mac hardware:
This is only used for profiling under the Go32 system, so this module is not relevant to our cross-compiler.
+
*It's big-endian (the number 1 stored as a 32-bit integer word is held internally as the bytes 00000001, while the I386 would store it as 01000000)
 +
*It uses non-IEEE floating point so it may have different limit values (the zSystem has IEEE floating point available)
 +
*While it has more registers (16), about 5 of these are generally not usable due to conventions or hardware requirements (it's worse on Linux for S/390: instead of using one register to point to a list, it uses registers 2 through 6 to hold up to five integer arguments)
 +
*The maximum amount of memory you can directly address at one time is much smaller (in ESA mode on the 370, you can only address about 4K at a time, either as code or data: the contents of one register plus a 12-bit unsigned offset, 0 through 4095).  If you're working with two pieces of data, either may be up to 4K in size that you can work with directly.  This has been expanded with the new 20-bit signed offset on the S/390 and zSystem, which means you can reach 512K in either direction from the address in the register.
 +
* Depending on whether you target a 370, a 390 or a zSystem you may have access to a 32-bit address space or a 64-bit address space and a much larger area than 4K.
 +
*There are several different operating systems that could be targeted, such as
 +
**The MUSIC/SP emulator I'm using (not going to be very popular as MUSIC was essentially deprecated by its distributor, McGill University in Montreal)
 +
**a program running on a terminal under VM/370
 +
**a program running on the TSO timesharing system
 +
**a program running as a batch job on OS/VS1
 +
**a program potentially running as a screen application on the CICS terminal monitor (very similar to how Windows programs work, with a few gotchas)
 +
**a program running on Linux/370, or
 +
**The z/390 portable mainframe emulator (http://www.z390.org), written in Java by Don Higgins, which runs on Windows and was released as open source by Automated Software Tools Corporation.  It allows running of Assembler and Cobol programs, and supports a subset of the VS/1 supervisor call set, which is entirely different from the Linux supervisor call set; the argument passing rules are also different under the old mainframe standard calling conventions.  For example, to use dynamic memory in Pascal you use the '''new''' procedure to allocate memory, and '''dispose''' to release it.  At the system level, Linux uses the C programming language names '''malloc()''' to allocate memory, and '''free()''' to release it.  In VS/1, you use the operating system macro '''GETMAIN''' to allocate memory, and '''FREEMAIN''' to release it.  Also, arguments on Linux are passed in registers 2 through 6; arguments on VS/1 are passed as pointers from a list pointed to by register 1.
  
==Unit catch==
+
This issue of where the program will run will be dealt with by using generic I/O instructions (basically private macros) and having an appropriate run-time library for the particular system.
:''Called from: '''pp.pas''', includes '''fpcdefs.inc''', no called units noted''
+
*The IBM 370/390/zSystem uses the EBCDIC character set, PCs use ASCII (or Unicode).  The zSystem allows use of ASCII, and runs code in ASCII natively, so that's no longer a problem.  But you may get some gotchas when transferring files; Windows can release source code using UTF-8, which allows for things like non-roman character sets (for doing other written languages like Japanese, Arabic, Greek or Russian natively in code) but it means you're not using plain 7-bit ASCII, you might even be using double-byte characters, which can be a surprise.  I found this out when transferring a C source file so I could test a small program on one of the Linux/z390 machines IBM makes available at no charge for people testing development of applications and porting to the z/System: text files can have extra characters you don't see because Windows handles Unicode internally, and usually seamlessly, but when those files move to a non-Windows environment some of the extra characters show up.
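The byte-order quirk in the list above is easy to see for yourself.  What follows is only a minimal sketch, not part of the compiler sources: the program name and variable names are made up for illustration, and it simply overlays a 4-byte array on a 32-bit value to show the order in which the host machine stores the bytes.  On a PC it should take the little-endian branch; the same source built for a big-endian target would take the other one.

<syntaxhighlight lang="pascal">
program EndianDemo;
{$mode objfpc}

var
  Value: LongWord;
  { Overlay four bytes on the same storage as Value (illustrative name). }
  Bytes: array[0..3] of Byte absolute Value;
  i: Integer;
begin
  Value := 1;

  { Print the bytes in the order they sit in memory, lowest address first. }
  Write('Bytes in memory order: ');
  for i := 0 to 3 do
    Write(HexStr(Bytes[i], 2), ' ');
  WriteLn;

  { If the low-order byte comes first, the host is little-endian. }
  if Bytes[0] = 1 then
    WriteLn('little-endian host (as on the I386)')
  else
    WriteLn('big-endian host (as on the S/370)');
end.
</syntaxhighlight>

This checks the machine the program runs on, not the target of a cross-compiler.  Code inside the compiler would more likely test a compile-time define for the target instead.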
'''catch.pas''' deals with handling control-c and segfaults, neither of which we'll really have to worry about since the compiler doesn't run locally on the mainframe, it runs on the user's PC.  So in the absence of anything requiring we deal with this, we'll leave it alone for now.
 
  
==Unit globals==
+
==Target Choice==
:''Called from: '''pp.pas''', includes '''fpcdefs.inc''', calls unit '''comphook'''''
+
IBM offers access to an actual z/System running Linux for up to 90 days without charge for porting applications to Linux, so I will take advantage of an actual mainframe to target.  I will also use an S/390 emulator program written in Java on my Windows PC that allows programs to be run in an emulated OS/VS1 or VSE environment.  That is, however, an EBCDIC environment so I have to be careful.  I also got it partially wrong when I was making corrections, in that I confused the processor (the S/370, S/390 or z/System CPU) with the operating system, which could be VS/1, the z390 emulator running a version of VS/1, or Linux.
'''globals.pas''', starting at about line 372, has some definitions for specific target processors.  So, again, I'll borrow from <code>{$ifdef GENERIC_CPU}</code> and change as necessary.  At line 424, we have
 
  {$endif mips}
 
so we'll add the 370 after this, and put in the following:
 
  {$endif mips}
 
  {$ifdef i370}
 
        cputype : cpu_i370;
 
        optimizecputype : cpu_i370;
 
        fputype : fpu_i370;
 
  {$endif i370}
 
Note, these values will be defined elsewhere. Note that this unit references a few other units depending on the OS and machine, but the only one relevant to us will be '''<code>comphook</code>''' so we'll put that on the "stack" of units that have to be inspected and possibly edited.
 
  
==Unit compiler==
+
=To Begin=
:''Called from: '''pp.pas''', includes '''fpcdefs.inc''', calls units '''fksysutils, sysutils, math, verbose, comphook, systems, cutils, cfileutl, cclasses, globals, options, fmodule, parser, symtable, assemble, link, dbgbase, import, export, tokens, pass_1, wpobase, wpo, cpupara, cpupi, cgcpu, cpunode, cputarg, i_i370, globtype'''''
+
This compiler is ''huge''.  It's hundreds of source files, and is going to be an enormous task.  (I compiled it once on my computer, the entire compiler, targeting itself, I386 on Windows.  It is over 250,000 lines of code and about 208 units.  But my computer is fairly fast; the entire compilation took less than 15 seconds.)
  
Starting at about line 24, this unit lists the units it uses.  Some units depend on other flags, but the units it does always require are: '''<code>verbose, comphook, systems, cutils, cfileutl, cclasses, globals, options, fmodule, parser, symtable, assemble, link, dbgbase, import, export, tokens, pass_1, wpobase, wpo, cpupara, cpupi, cgcpu, cpunode,</code>''' and '''<code>cputarg</code>'''.  These are also added to the "stack" for checking on references.
+
Okay, so, given the size of the compiler, where do you start?  Well, you start with the main program of the command-line compiler, and you look at it.  That file is '''pp.pas'''.  Recursively follow the sources of every unit referenced by it or any unit they reference until every one has been done (I more-or-less explain this in [[ZSeries/Part 2|Part 2]]) and then you know that you caught all the places you might need to declare, add or change something.  (It also gives you at least a fleeting understanding of what each unit does.)
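The "follow every unit until every one has been done" walk is just a worklist traversal.  Here is a minimal sketch of it, under loud assumptions: the function <code>UsesOf</code> and its hard-coded, heavily abbreviated uses lists are hypothetical stand-ins (a real tool would parse the <code>uses</code> clauses out of the source files), and nothing here is taken from the compiler itself.  It keeps a stack of units still to inspect and a list of units already seen, so a unit that is used from several places is only visited once.

<syntaxhighlight lang="pascal">
program UnitWalk;
{$mode objfpc}{$H+}

uses
  Classes;

{ Hypothetical, abbreviated "uses" lists for illustration only;
  a real tool would read these from the source files. }
function UsesOf(const AUnit: string): string;
begin
  if AUnit = 'pp' then
    Result := 'cmem,profile,catch,globals,compiler'
  else if AUnit = 'globals' then
    Result := 'comphook'
  else if AUnit = 'compiler' then
    Result := 'verbose,comphook,systems,globals,globtype'
  else
    Result := '';
end;

var
  Pending, Seen, Deps: TStringList;
  Current: string;
  i: Integer;
begin
  Pending := TStringList.Create;   { the "stack" of units still to inspect }
  Seen := TStringList.Create;      { units already inspected }
  Deps := TStringList.Create;      { scratch list for one unit's uses clause }
  try
    Pending.Add('pp');
    while Pending.Count > 0 do
    begin
      { Pop the most recently added unit. }
      Current := Pending[Pending.Count - 1];
      Pending.Delete(Pending.Count - 1);

      { Skip anything we have already looked at. }
      if Seen.IndexOf(Current) >= 0 then
        Continue;
      Seen.Add(Current);
      WriteLn('inspect ', Current);

      { Push every unit this one uses that has not been seen yet. }
      Deps.CommaText := UsesOf(Current);
      for i := 0 to Deps.Count - 1 do
        if Seen.IndexOf(Deps[i]) < 0 then
          Pending.Add(Deps[i]);
    end;
    WriteLn(Seen.Count, ' units visited');
  finally
    Deps.Free;
    Seen.Free;
    Pending.Free;
  end;
end.
</syntaxhighlight>

Using a stack gives a depth-first order, which matches reading one unit and immediately chasing what it pulls in; a queue would work just as well if you would rather go level by level.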
  
At this point I do not know whether or not I need to set the flag '''<code>USE_FAKE_SYSUTILS</code>'''.  If I do, I only need to add one unit, '''<code>fksysutils</code>''', but if I do need the "real" sysutils unit, then it will automatically include the units '''<code>sysutils, math</code>'''.  So I'll presume all three have to be looked at.  They also go on the stack for checking.  Doing a directory scan, apparently ''every'' architecture does use the <code>sysutils</code> unit, and ''none'' use '''<code>fksysutils</code>''', so I'll presume that to be the case.
+
I originally did this and realized I was looking at too much code.  PP, which has conditional compilation directives for various machines, calls the procedure compile, which is located in module compiler.  And that module uses conditional compilation to target certain processors or operating systems.  And then we can follow that and see where it leads.
  
This unit also defines the machine that is the target for this compiler, so we have to add a unit which will define our target architecture.  There are units for every machine, so we have to insert one.  At the code around line 116:
+
Note that line numbers indicated in any source file are from the version 2.6.0 compiler sources (2.6.2 having just been released, I'll also look at that if I can); as lines are added, the line numbers of later places where things were found and changed will increase.  So line numbers will be referenced in a file from top to bottom so the references should match.  Also, so as not to brand this as "Windows centric", since the hope is to build a cross-compiler for I370 that could run on either Windows or Linux, when file names are specified, directory separators will use /.
{$ifdef nativent}
 
  ,i_nativent
 
{$endif nativent}
 
  ,globtype;   
 
we change to:
 
{$ifdef nativent}
 
  ,i_nativent
 
{$endif nativent}
 
{$ifdef i370}
 
  ,i_i370
 
{$endif i370}
 
  ,globtype;   
 
This means we've now added a new source file which will eventually have to be created, '''i_i370.pas''' for the new unit '''<code>i_i370</code>''', which almost certainly will be in the I370 subdirectory.  Also unit '''<code>globtype</code>''' which is at the end of the list of units used.  The rest looks okay so for the moment we're done with this unit.
 
  
==Unit comphook==
:''Called from: '''globals.pas''', includes '''fpcdefs.inc''', no called units noted''
 
For the moment, there does not appear to be anything I need to do here.
 
==Unit fksysutils==
 
This is only used if the particular architecture sets the flag that says it does not use the '''<code>sysutils</code>''' and '''<code>math</code>''' units.  Given that all of them do use them, I will presume for now I do not need to create this file.
 
==Unit sysutils==
 
:''Called from: '''compiler.pas''', '''rtl/objpas/math.pas''' (outside Compiler directory), located in '''rtl/i370''' outside the ''compiler'' directory, includes '''fpcdefs.inc''', no called units noted''
 
This file is local to each architecture, so it will be in the i370 subdirectory of the rtl directory. I'll take a look at a couple of architectures to see what's expected to be included. It is used for platform-dependent calls and by math; if there are no such calls, it will simply be a stub unit. For the moment, I'll just create a stub file and come back later if I need to.
 
{
 
 
    This file is part of the Free Pascal run time library.
 
    Copyright (c) 2012 by Viridian Development Corporation
 
 
    Sysutils unit for IBM 370/390/zSystem
 
 
    See the file COPYING.FPC, included in this distribution,
 
    for details about the copyright.
 
 
    This program is distributed in the hope that it will be useful,
 
    but WITHOUT ANY WARRANTY; without even the implied warranty of
 
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 
 
  **********************************************************************}
 
unit sysutils;
 
 
interface
 
implementation
 
 
end. 
 
  
==Unit math==
:''Called from: '''compiler.pas''', located in '''rtl/objpas''' outside the ''compiler'' directory, includes '''fpcdefs.inc''', calls unit '''sysutils'''''
 
This might need to be changed later. The IBM 370 series uses a different floating-point format, and its numeric limits are not the same as IEEE. For the moment I'll leave this. Note that when it is compiled, a copy of the PPU needs to be in '''rtl/i370''' (outside of the '''Compiler''' directory).
 
==Unit verbose==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
==Unit comphook==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
==Unit systems==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
==Unit cutils==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
==Unit cfileutl==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
==Unit cclasses==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
==Unit globals==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
==Unit options==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
==Unit fmodule==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
==Unit parser==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
==Unit symtable==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
==Unit assemble==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
==Unit link==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
==Unit dbgbase==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
==Unit import==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
==Unit export==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
==Unit tokens==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
==Unit pass_1==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
==Unit wpobase==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
==Unit wpo==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
==Unit cpupara==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
==Unit cpupi==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
==Unit cgcpu==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
==Unit cpunode==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
==Unit cputarg==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
==Unit globtype==
 
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(To be added later)
 
  
==Unit i_i370==
:''Called from: '''compiler.pas''', includes '''fpcdefs.inc''', no called units noted''
 
(This will be created later once I know what has to go into it)
 
{{Stub}}
 
[[Category:Mainframes]]
[[Category:High-performance computing]]
 

Latest revision as of 13:00, 25 December 2016

Introduction

Go back to zSeries | Onward to Part 2

My name is Paul Robinson, I am the chief programmer at Viridian Development Corporation, which has decided to develop a cross-compiler version of Free Pascal for the IBM 370/390/zSeries mainframe computer. I decided it would be a learning experience, it would allow me to better understand how the Free Pascal compiler works, and because I didn't particularly want to work with a compiler written in C (such as the GCC Pascal Compiler), I wanted to work with one written in Pascal.

There is an existing more-or-less open source compiler for the 370 architecture, for which I have a copy of the sources and run-time library. It was modified from an earlier, incrementally updated Pascal compiler called P6 or P7 (depending on which release it was). I think it was P6 when it was on the Decsystem/20 mainframe back in the late 1970s and early 1980s (I have the source to that one too), and it may have been P7 by the time it was upgraded for the latest release for Control Data Corporation computers. (When Niklaus Wirth was creating the Pascal language at ETH in Zurich, that's what they had, so Pascal originally started with Control Data computers. In fact, because of the functionality provided by the existing Control Data libraries, Wirth wrote the first Pascal compiler using Fortran. I am not kidding.)

This version of P6/P7 Pascal Compiler (the internal comments refer to it as "Stepwise refinement of a Pascal Compiler") was developed by the Australian Atomic Energy Commission. Only problem with it is it's over 25 years old and only supports Standard Pascal. No objects, no strings (you can do "array [1..256] of character" but concatenating two strings? Write your own function!), and doesn't even come up to the level of capability of Turbo Pascal 3 for DOS. It's old, and I wanted to work with something more recent. I can, however, borrow from it to figure out how some Pascal keywords are translated into 370 code.

I also should have a copy of the Stanford Pascal Compiler sources (also over 20 years old), and I believe that the CBT tapes archive (a huge set of over 200 mainframe source code magnetic tapes originally collected by Connecticut Bank and Trust) may contain either a Pascal Compiler or some other compiler source I can use. It also contains lots of IBM 360/370 assembly language sources which will be useful. All these resources and others should help in doing this implementation.

Also, because I have regularly used the Free Pascal Compiler, I wanted to see it available on another architecture where the full object pascal capability would be available. When I started this project back in January of 2012, I didn't realize how long it would take, because it was the sort of thing that was what is called a "slopsucker" task. On a computer, the "slopsucker" is the task that gets whatever processor time is available after everything else has gotten a chance. Sort of like the amount of attention you give to the sump pump in the basement of your house (if you have one), unless, of course, it fails. So, I'm busy with other things, doing a little here, a little there, and what do you know, 9 months go by and I haven't accomplished a thing. So I decided to "up the priority" of this task and give it more attention. That's when I realized how much of my initial presumptions I made when writing this were wrong. More on that, later.

Background on this project

I am aware, first, that this will be a huge undertaking; this is not a weekend project. I'm probably looking at a minimum of two years' work, possibly longer. I have a story. When I wasn't doing programming I did mobile notary work. I was, then, a commissioned Notary Public for the Commonwealth of Virginia (I'm now commissioned in Virginia and Maryland), and one of the services that developed was that, when people refinanced their mortgages, the company would send a notary right to the person's home or office. Well, one of the customers happened to discuss how they wanted to upgrade the application that their people ran to handle booking of services. It was a DOS application, they wanted to do more things with it, and they wanted it to be a GUI under Windows. At that time Free Pascal did not have the equivalent of Lazarus, so I unfortunately had to use Visual Basic to do so. (Please do not send me hate mail, I have to use what I have available.)

Well, anyway, so I had several meetings with the customer and arranged terms including price. I worked on it on a regular basis, implemented it through "accretion" in which you get part of it working and they can see how it's coming along. We had to change things along the way and various fits and starts, but in the end, they were very pleased with the program that they can use for their employees who book orders on their laptop computers. It wasn't a very big program in terms of what was going on, they have it running on about 60 computers, but from when we first sat down to decide what they wanted, until it was fully running and distributed out to everyone took one solid year.

So the point of this shaggy dog story is that I'm aware that this is a long-term project and will probably take months. I am also aware that the compiler won't work when I first make changes because I have to learn where the main part of the compiler hands off control to the machine-specific and OS-specific routines so that it can ask that routine, "Hey, the user is implementing a FOR statement, you need to create the code to do this." or "Hey, the guy is declaring the start of a procedure that has an integer argument, here's the information on how it's supposed to be defined."

So anyway, I know there's a lot involved, there will be fits, and starts, and things won't always go right. But it's a learning experience, and, if you follow this as it goes along over the months as it progresses, maybe you'll learn something too.

Before Getting Started

The normal distribution includes most of the sources including the run-time library but does not include the sources to the compiler itself because most people do not need it and it's about another 40 meg. You need to obtain the zip/tar archive file for the compiler from the download location you're using for the rest of Free Pascal (probably Sourceforge or a mirror) and extract from that archive the compiler directory, and include it with the 2.6.0 source release.

There are two ways to go about this. First thing is to start by deciding how you're going to carry the source files on your system. If you're just trying to do a simple change in the compiler to fix something, you can begin by creating a new directory and copying all files and subdirectories from the Compiler subdirectory (and all files in its subdirectories) to a new directory, in order not to contaminate the pristine sources of the current compiler.

If you're planning to do serious development such as a full-blown port, you'll definitely want to use full source code control including version management. For that, you need to get an SVN client and use SVN to download everything. (If you're using Mercurial, CVS or Git, quit complaining! Everybody has their own particular style, or toys, or whatever, and you have to go along with that particular crowd to work with them. I mean, there are people who argue over which word processor is best and will almost "go to the mattresses" to defend their decision. See the movie You've Got Mail about how the movie The Godfather is the answer to everything!)

Or let's consider formatting style: Which of the following is the correct way to format a block of code:

      (1)                              (2)
    BEGIN                           BEGIN
       Instruction1;                        Instruction1;
       IF something THEN                    IF something THEN
          BEGIN                             BEGIN 
              Instruction2;                         Instruction2;
              Instruction3;                         Instruction3;
          END                               END
       ELSE                                 ELSE
          BEGIN                             BEGIN
              Instruction3;                         Instruction3;
              Instruction2;                         Instruction2;
          END;                              END;
    END;                            END;

Or are neither of these right? Well, guess what, whichever one you picked, you're right. You're also wrong. Someone won't like the way someone else does indentations, the number of spaces for each level, whether the BEGIN should be on the same line as the IF or ELSE, or whether the THEN should be moved down and the begin put there, as well as whether they capitalize keywords or use Camel Case, or whatever. It's all a matter of taste, and mostly it's arbitrary. You pick what style works for you to be able to read the code easily, and use that.

So, anyway, if you don't have SVN, get a Linux client if you're on Linux, or install the TortoiseSVN client for Windows, then download the Free Pascal sources using SVN. If you're going to do an approved project, then you'd get write access to the repository; otherwise all you need is read access. I won't go into how to do this because there's already plenty of information here on how to use SVN, so I'll go on to the port I'm working on.

This compiler will initially be a cross-compiler, it will run on a PC and will generate an assembly language file for the 370 Architecture with the use of the standard High-Level Assembler syntax. That assembly-language source file will be uploaded to the target mainframe (real or simulated) and run through the high-level assembler there. Eventually, as time goes by, it will be ported over and will run natively there. (At least I hope that's how it will work out...)

Then I discover that if you're porting for Linux, you do not have the High-Level Assembler, you have a bastardized version of Intel and AT&T syntax because the assembler isn't the powerful High-Level Macro Assembler for mainframes, it's the GCC assembler for the GNU C Compiler that uses a form of the syntax for the 386 series microcomputer. So that has to be taken into consideration.

So presuming you don't use SVN and you need to manage the directories manually, you would also want to copy the rtl directory (which is normally outside of the Compiler directory) because you may need to modify some files there. That will also require an i370 subdirectory for its run-time library, which might be different depending on which mainframe OS is targeted. We'll worry about that later.

There will also be a new s370 subdirectory (I originally called it i370) within the Compiler directory for all the local files related to that architecture. That's another thing I learned, straight out of the Linux development for the z/System (the extended version of the 370 and 390 series): it's called the S370, not the I370. So that gets changed, too.

Note that the pages here will just basically walk through what was done, if a correction is more than a few lines, the user will be directed to the replacement source file. Once the work is completed a zip file containing all of the new or changed source files will be available.

Issues

There are a number of issues when doing this. The S370/390/zSystem has a number of quirks different from the Wintel architecture or Mac hardware:

  • It's big-endian (the number 1 stored as a 32-bit integer word is held internally as the bytes 00 00 00 01, while the I386 would store it as 01 00 00 00.)
  • It uses non-IEEE floating point so it may have different limit values (the zSystem has IEEE floating point available)
  • While it has more registers (16, numbered 0 through 15), about 5 of these are generally not usable due to conventions or hardware requirements (it's worse on Linux for S/390: instead of using one register to point to a list, it uses registers 2 through 6 to hold up to 5 integer arguments.)
  • The maximum amount of memory you can directly address at one time is much smaller (in ESA mode on the 370, you can only address about 4K at a time, either as code or data: the contents of one base register plus an unsigned 12-bit offset, which gives a range of 2^12, or 4096, bytes). If you're working with two pieces of data, either may be up to 4K in size that you can work with directly. This has been expanded with the new 20-bit signed offset on the S/390 and zSystem, which means you can reach 512K on either side of the base address.
  • Depending on whether you target a 370, a 390 or a zSystem you may have access to a 32-bit address space or a 64-bit address space and a much larger area than 4K.
  • There are several different operating systems that could be targeted, such as
    • The MUSIC/SP emulator I'm using (not going to be very popular as MUSIC was essentially deprecated by its distributor, McGill University in Montreal)
    • a program running on a terminal under VM/370
    • a program running on the TSO timesharing system
    • a program running as a batch job on OS/VS1
    • a program potentially running as a screen application on the CICS terminal monitor (very similar to how Windows programs work, with a few gotchas)
    • a program running on Linux/370, or
    • The z390 portable mainframe emulator (http://www.z390.org), written in Java by Don Higgins, which runs on Windows and was released as open source by Automated Software Tools Corporation. It allows running of Assembler and Cobol programs, and supports a subset of the VS/1 supervisor call set, which is entirely different from the Linux supervisor call set; the argument passing rules are also different under the old mainframe standard calling conventions. For example, to use dynamic memory in Pascal you use the new procedure to allocate memory, and dispose to release it. At the system level, Linux uses the C library functions malloc() to allocate memory and free() to release it. In VS/1, you use the operating system macro GETMAIN to allocate memory, and FREEMAIN to release it. Also, arguments on Linux are passed in registers 2 through 6; arguments on VS/1 are passed as pointers from a list pointed to by register 1.

This issue of where the program will run will be dealt with by using generic I/O instructions (basically private macros) and having an appropriate run-time library for the particular system.
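The byte-order point from the list above can be checked with a small Free Pascal program. This is only a sketch run on the host PC, not part of the port: the program and function names (ByteOrderDemo, MemoryBytes, BigEndianBytes, BytesToHex) are made up for illustration, and ENDIAN_BIG is the symbol Free Pascal predefines on big-endian targets.

```pascal
program ByteOrderDemo;
{$mode objfpc}

uses
  SysUtils;

type
  TFourBytes = array[0..3] of Byte;

{ Bytes of Value exactly as they sit in memory on the machine running this. }
function MemoryBytes(Value: LongWord): TFourBytes;
begin
  Move(Value, Result, SizeOf(Result));
end;

{ Bytes of Value in big-endian order (most significant byte first),
  the layout used by the 370 family, regardless of the host. }
function BigEndianBytes(Value: LongWord): TFourBytes;
begin
  Result[0] := (Value shr 24) and $FF;
  Result[1] := (Value shr 16) and $FF;
  Result[2] := (Value shr 8) and $FF;
  Result[3] := Value and $FF;
end;

{ Formats four bytes as two-digit hex values separated by spaces. }
function BytesToHex(const B: TFourBytes): string;
begin
  Result := IntToHex(B[0], 2) + ' ' + IntToHex(B[1], 2) + ' ' +
            IntToHex(B[2], 2) + ' ' + IntToHex(B[3], 2);
end;

begin
  WriteLn('host memory order: ', BytesToHex(MemoryBytes(1)));
  WriteLn('big-endian order : ', BytesToHex(BigEndianBytes(1)));
{$ifdef ENDIAN_BIG}
  WriteLn('this host is big-endian');
{$else}
  WriteLn('this host is little-endian');
{$endif}
end.
```

On an ordinary PC the first line should show the 01 byte first, while the second line shows it last; anything in the compiler or run-time library that writes multi-byte values for the mainframe has to produce the second layout.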

  • The IBM 370/390/zSystem uses the EBCDIC character set, PCs use ASCII (or Unicode). The zSystem allows use of ASCII, and runs code in ASCII natively, so that's no longer a problem. But you may get some gotchas when transferring files; Windows can release source code using UTF-8, which allows for things like non-roman character sets (for doing other written languages like Japanese, Arabic, Greek or Russian natively in code) but it means you're not using plain 7-bit ASCII, you might even be using double-byte characters, which can be a surprise. I found this out when transferring a C source file so I could test a small program on one of the Linux/z390 machines IBM makes available at no charge for people testing development of applications and porting to the z/System: text files can have extra characters you don't see because Windows handles Unicode internally, and usually seamlessly, but when those files move to a non-Windows environment some of the extra characters show up.
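To give a feel for the EBCDIC difference, here is a minimal sketch that maps a few ASCII characters to their EBCDIC codes. It covers only uppercase letters, digits and the space, and the function name AsciiToEbcdic is invented for this example; a real run-time library would use a full 256-entry translation table for whichever EBCDIC code page the target uses.

```pascal
program EbcdicDemo;
{$mode objfpc}

uses
  SysUtils;

{ Maps an ASCII uppercase letter, digit or space to its EBCDIC code.
  Note the letters are NOT one contiguous run in EBCDIC: there are
  gaps after 'I' and after 'R'. Anything else maps to $6F, the
  EBCDIC question mark. }
function AsciiToEbcdic(C: Char): Byte;
begin
  case C of
    'A'..'I': Result := $C1 + (Ord(C) - Ord('A'));
    'J'..'R': Result := $D1 + (Ord(C) - Ord('J'));
    'S'..'Z': Result := $E2 + (Ord(C) - Ord('S'));
    '0'..'9': Result := $F0 + (Ord(C) - Ord('0'));
    ' ':      Result := $40;
  else
    Result := $6F;
  end;
end;

var
  Sample: string;
  I: Integer;
begin
  Sample := 'PASCAL 370';
  for I := 1 to Length(Sample) do
    WriteLn('''', Sample[I], '''  ASCII $', IntToHex(Ord(Sample[I]), 2),
            '  EBCDIC $', IntToHex(AsciiToEbcdic(Sample[I]), 2));
end.
```

The gaps in the letter ranges are the part that tends to bite: code that assumes 'A'..'Z' is one unbroken range of 26 codes works in ASCII but not in EBCDIC.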

Target Choice

IBM offers access to an actual z/System running Linux for up to 90 days without charge for porting applications to Linux, so I will take advantage of an actual mainframe to target. I will also use an S/390 emulator program written in Java on my Windows PC that allows programs to be run in an emulated OS/VS1 or VSE environment. That is, however, an EBCDIC environment so I have to be careful. I also got it partially wrong when I was making corrections in that I confused the S/370, S/390 or z/System processor, the CPU, with the operating system, which could be VS/1, the z390 emulator running a version of VS/1, or Linux.

To Begin

This compiler is huge. It's hundreds of source files, and is going to be an enormous task. (I compiled it once on my computer, the entire compiler, targeting itself, I386 on Windows. It is over 250,000 lines of code and about 208 units. But my computer is fairly fast; the entire compilation took less than 15 seconds.)

Okay, so, given the size of the compiler, where do you start? Well, you start with the main program of the command-line compiler, and you look at it. That file is pp.pas. Recursively follow the sources of every unit referenced by it or any unit they reference until every one has been done (I more-or-less explain this in Part 2) and then you know that you caught all the places you might need to declare, add or change something. (It also gives you at least a fleeting understanding of what each unit does.)
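The "follow every unit until every one has been done" idea is a plain worklist traversal. The sketch below runs it over a tiny hand-made table; the five unit names come from this article, but the table is only a simplified excerpt typed in by hand for illustration, not something extracted from the real sources, and the names Refs and Walk are invented here.

```pascal
program UnitWalk;
{$mode objfpc}

const
  UnitCount = 5;
  Names: array[0..UnitCount - 1] of string =
    ('pp', 'compiler', 'globals', 'comphook', 'globtype');
  { Refs[I, J] = True means unit I lists unit J in its uses clause. }
  Refs: array[0..UnitCount - 1, 0..UnitCount - 1] of Boolean = (
    (False, True,  False, False, False),   { pp       -> compiler }
    (False, False, True,  True,  True),    { compiler -> globals, comphook, globtype }
    (False, False, False, True,  False),   { globals  -> comphook }
    (False, False, False, False, False),   { comphook }
    (False, False, False, False, False));  { globtype }

var
  Visited: array[0..UnitCount - 1] of Boolean;
  Stack: array[0..UnitCount - 1] of Integer;

{ Visits every unit reachable from Root exactly once, printing each
  name, and returns how many units were reached. }
function Walk(Root: Integer): Integer;
var
  Top, Current, J: Integer;
begin
  FillChar(Visited, SizeOf(Visited), 0);
  Result := 0;
  Top := 0;
  Stack[Top] := Root;
  Inc(Top);
  Visited[Root] := True;
  while Top > 0 do
  begin
    Dec(Top);
    Current := Stack[Top];
    WriteLn('look at unit ', Names[Current]);
    Inc(Result);
    { Queue every referenced unit that has not been seen yet. }
    for J := 0 to UnitCount - 1 do
      if Refs[Current, J] and not Visited[J] then
      begin
        Visited[J] := True;
        Stack[Top] := J;
        Inc(Top);
      end;
  end;
end;

begin
  WriteLn(Walk(0), ' units reached from pp');
end.
```

Marking a unit as visited at the moment it is queued is what keeps a unit such as comphook, which is referenced from more than one place, from being examined twice.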

I originally did this and realized I was looking at too much code. PP, which has conditional compilation directives for various machines, calls the procedure compile, which is located in module compiler. And that module uses conditional compilation to target certain processors or operating systems. And then we can follow that and see where it leads.
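As a rough illustration of how conditional compilation selects target-specific code, here is a minimal stand-alone sketch. The symbol i370 is an assumption matching the name used in this article; it would have to be supplied by the build (for example with the compiler's -d option), and the function TargetName is invented for the example.

```pascal
program TargetSelect;
{$mode objfpc}

{ Returns a label for whichever branch was compiled in. The symbol
  i370 is hypothetical and must be defined by the build; when it is
  not defined, the $else branch is the only code the compiler sees. }
function TargetName: string;
begin
{$ifdef i370}
  Result := 'i370';
{$else}
  Result := 'not i370';
{$endif i370}
end;

begin
  WriteLn('compiled for: ', TargetName);
end.
```

Only one of the two assignments ever reaches the compiler, which is why a unit named inside such a block does not need to exist unless its symbol is defined.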

Note that line numbers indicated in any source file are from the version 2.6.0 compiler sources (having just been released, I'll also look at 2.6.2 if I can) and as such, as lines are added, other line numbers where things were found and changed will increase. So line numbers will be referenced in a file from top to bottom so the references should match. Also, so as not to brand this as "Windows centric" since the hope is to build a cross-compiler for I370 that could run on either Windows or Linux, when file names are specified, directory separators will use /.

Note that from this point on, all editing occurs in our "sandbox" directory separate from the original compiler. So let's get started, with Part 2 of this article.
