git migration

From Free Pascal wiki

This page is about migrating FPC from SVN to git

Why?

  • More flexible and modern SCM (normally faster than svn, FPC development workflow might improve)
  • Some of the developers (known: Florian, Jonas) already use git svn regularly, for a lot of reasons (easier testing on different machines, recording of commits before pushing, parallel work on different features easily possible, before/after testing before pushing easily possible).
  • Third party contributors might have an easier life.
  • Stitching of the svn repository with the old cvs repositories possible: very complete history of FPC available

Why not ?

  • Some of the developers don't use git (at least Marco, Nikolay), so it is an extra burden.
  • conversion time, retraining.
  • Client configuration needed after install ? SVN client does not need much configuration, if any.
  • see list "misc"
  • simple operations get more complicated. Complications also affect users that don't benefit from git. (doubly so if every commit must be done in a branch to emulate the current merging model)
  • uncertain merge model.
  • The third party contribution (pull request and the like) is very informal, and many of the new, inexperienced contributors routinely run formatters and the like. Practicality might be limited. In the Linux kernel these go through a layer of contributors and then branch lieutenants; we don't have such a tiered structure for vetting, and require some discretion on the part of the submitter.

Concerns/Questions

What part of SVN to migrate ?

  • More is better.
    Jonas has a very complete git mirror of the SVN+CVS part.
    (care needs to be taken: there used to be a time when copyrighted code was checked in)
    A first test conversion by Florian using subgit was attempted: completed in 5 hours, 1 crash. Looks OK.
  • Does git have some form of obliterate option (remove revs based on copyrighted sources) ? Would solve some of the importing old history problems.
Yes, it seems it does. See this StackOverflow topic --Graeme (talk) 14:42, 19 December 2017 (CET)
The SO topic does not help much, but after some experimenting I got git filter-branch --index-filter 'git merge-base --is-ancestor <hash> $GIT_COMMIT ; if [ $? -eq 1 ] ; then git rm --cached --ignore-unmatch <path in repo/file> ; fi' HEAD which removes <path in repo/file> from every commit that is neither <hash> itself nor a descendant of it. So this removes the old versions of files that were cleaned by the commit identified by <hash>. This is reasonably fast (~ 1 hour with my currently converted repository). Afterwards, the file appears to have been first committed by <hash>. Drawbacks: all earlier history of the file is lost, including that of the clean parts. All revisions afterwards get new hashes, which means that all clones made before must be discarded. --FPK (talk) 19:59, 19 December 2017 (CET)
  • In order to save on diskspace, find ways to tell user how to clone only a part.
Use git clone --depth 1 to get only the latest revision (with fetch, later on more revisions can be fetched if needed)
Is this really a problem? A git clone of FPC trunk with full history is 50% smaller (disk space usage) than a HEAD revision 'svn co' checkout using SubVersion. --Graeme (talk) 14:36, 19 December 2017 (CET)
It is not: my test repository is around 500 MB on Linux after gc, the sources themselves are around 350 MB. --FPK (talk) 20:18, 19 December 2017 (CET)

Just checked: the .svn directory of a trunk checkout alone is 800 MB. If the git clone is 500 MB, it is about 50% smaller. --AlexVinS (talk) 00:35, 20 December 2017 (CET)
But after svn cleanup it gets reduced to 327MB. --Nickysn (talk) 01:19, 20 December 2017 (CET)
With aggressive gc .git gets 425 MB --FPK (talk) 21:32, 21 December 2017 (CET)
And do not forget: using git worktrees, you need only one .git dir for "trunk" and "fixes" --FPK (talk) 21:35, 21 December 2017 (CET)
And I worry mainly about fpcbuild with its non differential history, since my most major disk constraints are VMs for release building Marcov (talk) 15:01, 21 December 2017 (CET)
--depth 1 works with submodules too--AlexVinS (talk) 15:55, 21 December 2017 (CET)
Don't worry too much: fpcbuild trunk .svn: 57.2 MB; full fpcbuild .git: 40.4 MB. So if you are still worried about disk space, we should switch to git asap --FPK (talk) 21:32, 21 December 2017 (CET)
That would resolve my scenario with the biggest (feared) size problems. Marcov (talk) 12:01, 22 December 2017 (CET)
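The shallow clone mentioned in this thread can be tried on a throwaway repository. The following is only a sketch: the repository, file names and commit messages are made up for the demonstration, and a real clone would use the server URL instead of a file:// URL.

```shell
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Build a throwaway "upstream" repository with three commits (hypothetical content).
git init -q "$work/upstream"
for n in 1 2 3; do
  echo "revision $n" > "$work/upstream/file.txt"
  git -C "$work/upstream" add file.txt
  git -C "$work/upstream" commit -q -m "commit $n"
done

# --depth is ignored for plain local paths, so a file:// URL is used here.
git clone -q --depth 1 "file://$work/upstream" "$work/shallow"
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)

# Later on, the rest of the history can be fetched when it is needed.
git -C "$work/shallow" fetch -q --unshallow
full_count=$(git -C "$work/shallow" rev-list --count HEAD)

echo "shallow clone: $shallow_count commit(s), after --unshallow: $full_count"
```

A release building VM that never needs history could stay at the shallow state and skip the second fetch.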

Build repository

The fpc build repository uses svn:external references.

Git has submodules, which are in essence the same. This needs to be properly set up.
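A rough sketch of how the build repository could reference the sources through git's submodule mechanism. The two repositories, the directory name fpcsrc and the file contents are stand-ins invented for this example. One difference to keep in mind: a submodule always pins one exact commit of the referenced repository, so the build repository has to be updated explicitly to move to a newer revision.

```shell
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Stand-in for the source repository (hypothetical content).
git init -q "$work/fpc"
echo "program hello; begin end." > "$work/fpc/hello.pp"
git -C "$work/fpc" add hello.pp
git -C "$work/fpc" commit -q -m "initial fpc sources"

# Stand-in for the build repository.
git init -q "$work/fpcbuild"
cd "$work/fpcbuild"
echo "all:" > Makefile
git add Makefile
git commit -q -m "initial build files"

# Recent git versions refuse file-based submodule URLs unless this is allowed
# explicitly; the -c option is not needed for ssh or https URLs.
git -c protocol.file.allow=always submodule --quiet add "$work/fpc" fpcsrc
git commit -q -m "reference the fpc sources as a submodule"

# .gitmodules records path and URL, the commit records the pinned revision.
pinned=$(git rev-parse HEAD:fpcsrc)
upstream=$(git -C "$work/fpc" rev-parse HEAD)
echo "submodule pinned at $pinned"
```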

Branching model ?

How to handle the fixes branch

FPC uses trunk for development. Releases are generated from the fixes branch, which consists of commits cherry picked from trunk: after a patch has proved to be good in trunk, it is cherry picked and committed to the fixes branch. This development model works very well with Subversion and has been used by FPC for almost 20 years. However, to work well it requires that the SCM supports cherry pick tracking, something git does not have. So this workflow does not work well with git.

Changing the workflow of FPC is not an option at this point. It has proved to work, and there are good reasons to use it: when making a patch, it is often hard to know whether it is good enough for fixes or not. Normally this is decided after some time, or even shortly before the next release, once it is clear whether the patch is good and safe enough to make it into fixes. A patch might not make it because it required more changes and those changes became too invasive, or because it turned out to cause incompatibilities that make the patch a candidate for the next minor release instead.

So this is probably the key question for a full git migration: can git be used with the workflow FPC uses?
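For experimenting with this question, here is a minimal sketch of the trunk to fixes step on a throwaway repository. Branch names follow the SVN layout; file names and commit messages are invented. git cherry-pick -x notes the original hash in the commit message, and git cherry lists trunk commits that have no equivalent patch in fixes yet. This is only an approximation of real tracking: git cherry compares patch contents, so a pick that needed conflict resolution is no longer recognised as already merged.

```shell
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/trunk   # name the initial branch "trunk"
echo base > base.txt; git add base.txt; git commit -q -m "base"
git branch fixes                         # fixes is branched off here

# Two independent patches land on trunk.
echo a > a.txt; git add a.txt; git commit -q -m "fix A"
fix_a=$(git rev-parse HEAD)
echo b > b.txt; git add b.txt; git commit -q -m "feature B"

# Later, "fix A" is judged safe and is cherry picked to fixes.
git checkout -q fixes
git cherry-pick -x "$fix_a" >/dev/null

# '+' marks trunk commits without an equivalent patch in fixes,
# '-' marks commits whose patch is already present there.
eligible=$(git cherry -v fixes trunk)
echo "$eligible"
```

In this toy case "feature B" is listed with '+' and "fix A" with '-'.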

So far, various workflow models exist:

A successful Git branching model

Also known as "git-flow" model.

  • Advantages
    • A simple and logical workflow that plays to the strengths of Git - branching and merging.
    • 'master', 'develop' and 'release' are good branch names which immediately make it obvious what they are for. Many developers clone a repo and want or expect a stable version. This workflow allows just that - the 'master' branch is the default branch, and is always the latest stable release of the product.
    • Features or multi-commit (complex) bug fixes show a clear commit history of related commits, that is easily tracked, viewed or even rolled back in the commit history using a tool like gitk.
  • Disadvantages
    • It only works well if the whole release policy of FPC is changed. This is very likely not going to happen.
    • cluttered and basically unreadable history
    • doubled regression testing effort during development of new stuff
      • test branch before merge
      • test branch after merge, as merging could have broken things
    • micro management of branches: every single bug fix requires and results in a new branch
This is simply not the case. Single bug fixes can be committed into the 'develop' branch as a linear history. Only more complex (multi-commit) fixes or features will result in a parallel line of development, which is quickly merged back into 'develop'. --Graeme (talk) 18:50, 17 December 2017 (CET)
You do not see the problem: if it is in develop, it will never make it into master without merging the whole of develop into master (i.e. in terms of FPC releases this would mean a new minor release x.y+1.0). So each bug fix has to go into its own hotfix branch (e.g. hotfix_bug12345) which is branched from master and merged first into develop. If it works, it will be merged back into master. --FPK (talk) 22:58, 17 December 2017 (CET)
    • One has to know before pushing how invasive a change is because it has to go in the right branch. For compiler development this is very cumbersome.
    • develop and release have no clearly described function it seems.
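The hotfix route described in the discussion above can be sketched as follows. The branch names master, develop and hotfix_bug12345 are taken from the discussion; the files and commit messages are invented, and this is just one possible reading of the model, not a tested FPC procedure.

```shell
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
echo base > base.txt; git add base.txt; git commit -q -m "release x.y.0"
git branch develop

# Ongoing development that is not ready for a release.
git checkout -q develop
echo wip > wip.txt; git add wip.txt; git commit -q -m "work in progress"

# The bug fix gets its own branch, started from master.
git checkout -q -b hotfix_bug12345 master
echo fix > fix.txt; git add fix.txt; git commit -q -m "fix bug 12345"

# First merge into develop for testing ...
git checkout -q develop
git merge -q --no-ff -m "merge hotfix_bug12345 into develop" hotfix_bug12345

# ... and, once it has proved to work, into master.
git checkout -q master
git merge -q --no-ff -m "merge hotfix_bug12345 into master" hotfix_bug12345

git log --oneline --graph --all
```

The result is that master contains the fix but not the unfinished work from develop, at the cost of one branch and two merge commits per fix.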

The cactus model

Also known as the anti-"A successful Git branching Model".

  • Advantages
    • clear and straight history
    • no time consuming micro management of branches
    • history and logs not cluttered with merge commits
    • no breakage by wrong conflict resolving during merge commits by less experienced users
  • Disadvantages
    • Git was designed to work well with branches and handle merging as a common task. This workflow suggests merging as a bad idea, which is crazy.
    • This model recommends that a human must now keep track of which commits to cherry-pick into other branches. This is bound to fail in the long run and vital commits will be missed at some point. Merging two or more branches, on the contrary, will automatically handle all commits in a branch seamlessly, so why not use what Git was designed to do well.
    • This model suggests that you can also cherry-pick from an unstable branch back into a stable release branch. This is just the wrong way round.
    • This model sees some of its own flaws and suggests a 3rd party tool to help keep track of things like cherry-picking. This simply isn't needed if you use a better workflow.
    • It suggests that 'rebase' be used often. This has always been frowned upon in the Git community, and it is widely known that rebase should be used sparingly, as it rewrites history and doesn't result in a logical evolution of the code history.
    • This workflow suggests that a merge commit is a bad thing. A merge commit is a commit like any other. It can be a simple commit (with no conflicts), or can be a commit that resolves conflicts (thus code changes were required - which this commit records). Nothing bad about that.

See the Git project itself

  • Advantages
    • The git project itself gets an extreme amount of contributions via pull requests and emailed patches. Their workflow model clearly works well and uses branching and merging extensively - even though the commit history looks quite scary. :-)
  • Disadvantages
    • No real description of the process.

User management ?

Git has no concept of users. To manage permissions on a server, a separate program is needed.

  • Correction: A separate program makes some functionality a bit easier but is not mandatory. Simple UNIX style group and file permissions with SSH public keys work very well too. --Graeme (talk) 14:32, 2 January 2018 (CET)
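As a small illustration of the plain UNIX permissions approach, git can create a bare repository whose files stay writable for the owning group. This is a sketch only: the repository path is a temporary directory, and the group, host and server path in the comments are hypothetical.

```shell
set -e
work=$(mktemp -d)

# --shared=group tells git to keep the repository files group-writable, so
# that every member of the owning UNIX group can push over SSH.
git init -q --bare --shared=group "$work/fpc.git"

shared=$(git -C "$work/fpc.git" config core.sharedRepository)
bare=$(git -C "$work/fpc.git" rev-parse --is-bare-repository)
echo "bare=$bare sharedRepository=$shared"

# On a real server the remaining steps would be along these lines
# (not run here; group name, host and path are made up):
#   chgrp -R fpcdevs /srv/git/fpc.git
#   git clone ssh://user@server/srv/git/fpc.git
```

Finer-grained rules, such as per-branch write access, are where a separate program becomes useful.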

gitolite

  • Advantages
    • uses git repo for administration
    • No server binary
  • Disadvantages
    • No web interface
    • administration needs ssh key, only ssh possible.
    • Web integration ?

gitea

  • Advantages
    • Web based
    • Fine tuning possible
  • Disadvantages
    • Separate config
    • Requires running binary all the time, on a separate port.

Git differentiates between the person that does a commit, and the author of a change/patch. The latter is always available and doesn't need any repository permissions. Somebody with write permissions can commit patches (or merge pull requests) and keep both the Committer and Author information intact. SubVersion doesn't track the author of a change, only the person that did the commit.
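The author/committer distinction is easy to see on a throwaway repository. In this sketch the names and e-mail addresses are invented: a developer with write access commits a patch that was written by somebody else.

```shell
set -e
work=$(mktemp -d)
# Identity of the person with write access who records the commit.
export GIT_COMMITTER_NAME="Core Developer" GIT_COMMITTER_EMAIL=core@example.invalid

git init -q "$work/repo"
cd "$work/repo"
echo patch > patch.txt
git add patch.txt

# --author keeps the credit with the person who wrote the patch.
git commit -q --author="Third Party <contributor@example.invalid>" -m "apply contributed patch"

who=$(git log -1 --format='author=%an committer=%cn')
echo "$who"
```

Patches applied with git am or merged from a pull request keep the author field in the same way.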

What about Lazarus ?

Dependencies on Unix-world tools / ports ?

There seem to be dependencies on quite a few tools common on Unix platforms. Unfortunately, ports of these tools may not be up-to-date and/or working well on non-Unix platforms. As an example, port of GIT 2.13.3 to OS/2 lists dependencies on a Unix shell, Python, Perl and CURL among others. Using Unix shell for launching other programs on a non-Unix platform may have various unwanted effects (remember cygwin ports which we tried to avoid on MS Windows as much as possible for similar reasons). The list of dependencies seems to suggest that quite a few operations require external tools (similarly to merging with early SVN versions). It may be difficult to test all such cases potentially important for certain workflows in advance before the final switch.

OS/2 was the first commercial operating system to ship with Java built in, so why not simply use the Java version of Git? The Eclipse and IntelliJ IDEA projects do so on a daily basis. --Graeme (talk) 14:39, 18 December 2017 (CET)

Misc.

Points collected from Marco's mails that are not handled in other sections:

  • I really dislike losing global revs.
  • Will we need to store everything in one repo (build and fpc)? It would be huge with all the history, and problematic for release building VMs etc. What are the options for partial checkouts/externals?
  • Can branches be in a not usable (multi-head) state? I heard some complaints about that, where users avoided work till it was fixed, and had flashbacks of locking VCSes all over again. Should not be possible.
  • git requiring sequences of commands for simple operations (local then global push etc.)
  • line endings should be entirely server dictated. In general, most scenarios should be simple single commands, not several with bunches of parameters.
  • from the "branching" paragraph: readable logs. Marco didn't add it because he never would have guessed it would be a problem.
  • I heard GitHub mentioned a few times; did somebody check GitHub's licensing?
Answer from Chain|Q:
https://help.github.com/articles/github-terms-of-service/
Point "D" caused an uproar a while ago, because some thought that it implied that you're granting GitHub rights to use your content as they see fit. This has been clarified since, I think.

Scenarios to document

SVN scenarios that need an equivalent: (note that equivalency means that changes should be visible on the central server)

  • install on new machine, mandatory configuration (preferably: none mandatory other than URL) for at least linux and windows supported client (tortoise?). Things to setup/configure(server dictated?)
    • crlf?
      • The defaults which git supplies (Linux, Mac and Windows) should be just fine. Repo internal storage is LF and checked out code is the native line endings of your system. The ".gitattributes" file could control line endings of specific file types.
        • this file is in the repo, iow server dictated, making it zero conf?
    • add username to commit message?
  • Checkout a working repo
    • command: git clone <url>
  • update/sync
    • Get latest changes via command: git fetch [remote_server]
      • This only downloads the changes, without applying them. In order to apply them, you have to either do a 'git merge' (which, in case you have local commits, will create a merge commit) or 'git rebase', which will try to reapply your local commits on top of the remote changes. See: [[1]] Also: [[2]]
    • Push latest changes via command: git push
  • check for modifications (svn status)
    • command: git status
  • see history log (svn log -v)
    • command (console only): git log
    • or command (gui showing all branches): gitk --all
  • commit a simple modification.
    • A two/three step process:
      • git add <file_that_changed>
      • git commit -m "commit message"
      • git push ?
        • This will refuse to work if there have been any remote changes (even if they are in different files). For comparison, Subversion will ask you to update and merge the remote version, only if there are any changes in the files you're trying to commit.
  • get the list of eligibles
  • create a branch/tag (fixes/release/rc)
    • All-in-one command: git checkout -b <new_branch_name> [starting_point]
  • merge a revision to a different branch (trunk -> fixes or fixes -> release/rc) (added later: I assumed this was tracked. The answer below might not reflect that)
    • Very easy via the GUI gitk tool. Right-click and select "cherry pick"
  • edit commit message
    • If it is the last commit, then simply use 'git gui', select "amend last commit" and modify the committed files and message. Only for local commits though!
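The clone, commit, push and update scenarios above can be walked through end to end on a local stand-in for the central server. Everything in this sketch is invented for the demonstration (the two developers "alice" and "bob", the file names, the commit messages); it shows the refused push from the "commit a simple modification" scenario and one way to resolve it with fetch and rebase.

```shell
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.invalid

# A bare repository stands in for the central server.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
echo "unit a;" > "$work/seed/a.pas"
echo "unit b;" > "$work/seed/b.pas"
git -C "$work/seed" add a.pas b.pas
git -C "$work/seed" commit -q -m "initial import"
git clone -q --bare "$work/seed" "$work/central.git"

# Checkout a working repo: two developers clone the central repository.
git clone -q "$work/central.git" "$work/alice"
git clone -q "$work/central.git" "$work/bob"

# Bob commits and pushes a change to b.pas.
cd "$work/bob"
echo "// bob" >> b.pas
git add b.pas
git commit -q -m "bob: change b.pas"
git push -q origin master

# Alice changes a different file: status, add, commit, push.
cd "$work/alice"
echo "// alice" >> a.pas
git status --short
git add a.pas
git commit -q -m "alice: change a.pas"
if git push -q origin master 2>/dev/null; then
  rejected=no
else
  rejected=yes          # refused: the remote branch has moved on
fi

# Update: fetch, replay the local commit on top of the remote, push again.
git fetch -q origin
git rebase -q origin/master
git push -q origin master

git log --oneline
```

Using rebase here keeps the history linear; a git merge at the same point would also make the push succeed but adds a merge commit.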

(new) git specific scenarios:

  •  ?

Tool for migration

subgit

Advantages

  • very flexible
  • very fast (remote cloning of the whole FPC repository takes 5-6 hours)

Disadvantages

  • external java-based tool, so some dependencies

git-svn

Advantages

  • included in git itself

Disadvantages

  • less flexible

Work to do

Migrate SVN repo.

  • 2017-12-16: A first test conversion by Florian using subgit was attempted: completed in 5 hours, 1 crash. Looks OK

Branch mapping for subgit

Proposal for branch mapping

trunk = trunk:refs/heads/master
branches = branches/merged/*:refs/heads/merged/*
branches = branches/joost/*:refs/heads/joost/*
branches = branches/laksen/*:refs/heads/laksen/*
branches = branches/maciej/*:refs/heads/maciej/*
branches = branches/olivier/*:refs/heads/olivier/*
branches = branches/paul/*:refs/heads/paul/*
branches = branches/*:refs/heads/svn/*
branches = branches/svenbarth/*:refs/heads/svenbarth/*
branches = branches/tg74/*:refs/heads/tg74/*	
tags = tags/*:refs/tags/*
shelves = shelves/*:refs/shelves/*
excludeBranches = branches/aspect
excludeBranches = branches/avr32
excludeBranches = branches/blaise
excludeBranches = branches/cpstr
excludeBranches = branches/cpstrnew
excludeBranches = branches/ctypes
excludeBranches = branches/dodi
excludeBranches = branches/florian
excludeBranches = branches/foxsen
excludeBranches = branches/fcl-web_joost
excludeBranches = branches/generics
excludeBranches = branches/genfunc
excludeBranches = branches/janbruns
excludeBranches = branches/linker
excludeBranches = branches/merged/avr
excludeBranches = branches/merged/generics
excludeBranches = branches/merged/nodeopt
excludeBranches = branches/newthreading
excludeBranches = branches/peterjan
excludeBranches = branches/ssa
excludeBranches = branches/tg74/rtl
excludeBranches = branches/tg74/tests
excludeBranches = branches/tg74/utils
excludeBranches = branches/unitrw
excludeBranches = branches/wkrenn
excludeBranches = branches/FIXES_2_2
excludePath = /fixes_2_0
excludePath = /fixes_2_4
excludePath = /trunk

Set up user management and permissions.

Set up and automate github mirror