Difference between revisions of "git migration"
Line 143:
* update/sync
**Get latest changes via command: git fetch [remote_server]
− ***This only downloads the changes, without applying them. In order to apply them, you have to either do a 'git merge' (which, in case you have local commits, will create a merge commit) or 'git rebase', which will try to reapply your local commits on top of the remote changes. See: [[http://kristopherwilson.com/2015/02/12/stop-merging-master/]]
+ ***This only downloads the changes, without applying them. In order to apply them, you have to either do a 'git merge' (which, in case you have local commits, will create a merge commit) or 'git rebase', which will try to reapply your local commits on top of the remote changes. See: [[http://kristopherwilson.com/2015/02/12/stop-merging-master/]] Also: [[https://www.atlassian.com/git/tutorials/merging-vs-rebasing]]
**Push latest changes via command: git push
* check for modifications (svn status)
Revision as of 15:22, 10 February 2018
This page is about migrating FPC from SVN to git
Why?
- More flexible and modern SCM (normally faster than svn, FPC development workflow might improve)
- Some of the developers (known: Florian, Jonas) already use git svn regularly, for a lot of reasons (easier testing on different machines, recording of commits before pushing, parallel work on different features easily possible, before/after testing before pushing easily possible).
- Third party contributors might have an easier life.
- Stitching of the svn repository with the old cvs repositories possible: very complete history of FPC available
Why not ?
- Some of the developers don't use git (at least Marco), so it is an extra burden.
- conversion time, retraining.
- Client configuration needed after install ? SVN client does not need much configuration, if any.
- see list "misc"
- simple operations get more complicated. Complications also affect users that don't benefit from git. (doubly so if every commit must be done in a branch to emulate the current merging model)
- uncertain merge model.
- The third party contribution process (pull requests and the like) is very informal, and many new, inexperienced contributors routinely run formatters and the like. Practicality might be limited. In the Linux kernel these contributions go through a layer of contributors and then branch lieutenants; we don't have such a tiered structure for vetting, and require some discretion on the part of the submitter.
Concerns/Questions
What part of SVN to migrate ?
- More is better.
Jonas has a very complete git mirror of the SVN+CVS part.
(care needs to be taken: there used to be a time when copyrighted code was checked in)
A first test conversion by Florian using subgit was attempted: completed in 5 hours, 1 crash. Looks OK.
- Does git have some form of obliterate option (remove revs based on copyrighted sources) ? Would solve some of the importing old history problems.
- Yes, it seems it does. See this StackOverflow topic --Graeme (talk) 14:42, 19 December 2017 (CET)
- The SO topic does not help much, but after some experimenting I got
git filter-branch --index-filter 'git merge-base --is-ancestor <hash> $GIT_COMMIT ; if [ $? -eq 1 ] ; then git rm --cached --ignore-unmatch <path in repo/file> ; fi' HEAD
which removes all revisions of <path in repo/file> which are not descendants of <hash>. So this removes old versions of files which were cleaned by the commit identified by <hash>. This is reasonably fast (~ 1 hour with my currently converted repository). Afterwards, the file appears to have been committed by <hash>. Drawbacks: All earlier history of the file is lost, including that of the clean parts. All revisions afterwards get new hashes, which means that all clones made before must be removed. --FPK (talk) 19:59, 19 December 2017 (CET)
- In order to save on disk space, find ways to tell users how to clone only a part.
- Use git clone --depth 1 to get only the latest revision (with fetch, later on more revisions can be fetched if needed)
- Is this really a problem? A git clone of FPC trunk with full history is 50% smaller (disk space usage) than a HEAD revision 'svn co' checkout using SubVersion. --Graeme (talk) 14:36, 19 December 2017 (CET)
- It is not: my test repository is around 500 MB on Linux after gc, the sources themselves are around 350 MB --FPK (talk) 20:18, 19 December 2017 (CET)
- Just checked, only .svn directory of trunk checkout is 800MB If git clone is 500MB it is about 50% smaller. --AlexVinS (talk) 00:35, 20 December 2017 (CET)
- But after svn cleanup it gets reduced to 327MB. --Nickysn (talk) 01:19, 20 December 2017 (CET)
- With aggressive gc .git gets 425 MB --FPK (talk) 21:32, 21 December 2017 (CET)
- And do not forget: using git worktrees, you need only one .git dir for "trunk" and "fixes" --FPK (talk) 21:35, 21 December 2017 (CET)
- And I worry mainly about fpcbuild with its non differential history, since my most major disk constraints are VMs for release building Marcov (talk) 15:01, 21 December 2017 (CET)
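The two disk-space measures mentioned in this thread (a shallow clone that is deepened later, and one .git directory shared by several working trees) can be tried out on a throwaway repository. The following is only a sketch: the repository, the file name compiler.txt and the branch names trunk and fixes are stand-ins invented for the demonstration, not the layout of a converted FPC repository.

```shell
set -eu
work=$(mktemp -d)   # scratch directory; every path below is hypothetical
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.invalid"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Stand-in for the converted repository: three commits on "trunk".
git init -q "$work/origin"
cd "$work/origin"
git symbolic-ref HEAD refs/heads/trunk
for n in 1 2 3; do
    echo "revision $n" > compiler.txt
    git add compiler.txt
    git commit -q -m "revision $n"
done
git branch fixes HEAD~1     # pretend "fixes" forked one commit earlier

# Shallow clone: only the newest commit is transferred.
# (--depth is ignored for plain local paths, hence the file:// URL.)
git clone -q --depth 1 "file://$work/origin" "$work/shallow"
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)

# Later on, the missing history can still be fetched.
git -C "$work/shallow" fetch -q --unshallow
full_count=$(git -C "$work/shallow" rev-list --count HEAD)
echo "commits before/after deepening: $shallow_count/$full_count"

# One .git directory, two checkouts: a second working tree for "fixes".
git -C "$work/origin" worktree add "$work/fixes-tree" fixes >/dev/null 2>&1
cat "$work/fixes-tree/compiler.txt"
```

In the extra working tree, .git is a small file pointing back at the main repository, so the object store is not duplicated. Whether this is enough for the release-building VMs mentioned above would still have to be measured on the real fpcbuild history.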
Build repository
The fpc build repository uses svn:external references.
Git has submodules, which are in essence the same. This needs to be properly set up.
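A minimal sketch of what such a setup could look like, using two throwaway local repositories. The names fpcbuild and fpcsrc are placeholders for this demonstration, and the protocol.file.allow override is only needed because the demo references a local path instead of a real server URL.

```shell
set -eu
work=$(mktemp -d)   # scratch directory; all names below are hypothetical
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.invalid"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Stand-in for the source repository that an svn:external points at today.
git init -q "$work/fpcsrc"
cd "$work/fpcsrc"
echo "program hello; begin end." > hello.pas
git add hello.pas
git commit -q -m "initial source import"

# Stand-in for the build repository, which references the sources.
git init -q "$work/fpcbuild"
cd "$work/fpcbuild"
git -c protocol.file.allow=always submodule --quiet add "$work/fpcsrc" fpcsrc
git commit -q -m "reference fpcsrc as a submodule"

# A fresh clone needs --recurse-submodules
# (or 'git submodule update --init' after a plain clone).
git -c protocol.file.allow=always clone -q --recurse-submodules \
    "$work/fpcbuild" "$work/build-clone" >/dev/null 2>&1
cat "$work/build-clone/.gitmodules"
```

One difference to keep in mind when setting this up: a submodule always records one specific commit of the referenced repository, whereas an svn:external without a pinned revision follows the latest revision. Moving the build repository to newer sources therefore requires an explicit commit in the build repository.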
Branching model ?
How to handle the fixes branch
FPC uses trunk for development. Releases are generated from the fixes branch, which consists of commits cherry-picked from trunk: after a patch has proved to be good in trunk, it is cherry-picked and committed to the fixes branch. This development model works very well with Subversion and has been used by FPC for almost 20 years. However, to work well it requires that the SCM supports cherry-pick tracking, something git does not have, so this workflow does not work well with git. Changing the workflow of FPC is not an option at this point: it has proved to work, and there are good reasons to use it. When making a patch, it is often hard to know whether it is good enough for fixes; normally this is decided after some time, or even shortly before the next release, once the patch is known to be good and safe enough. A patch might not make it into fixes because it required more changes and those changes became too invasive, or because it turned out to cause incompatibilities which make it a candidate for the next minor release instead. So this is probably the key question for a full git migration: can git be used with the workflow FPC uses?
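For experimenting with this question, the following sketch shows the closest built-in approximation: 'git cherry-pick -x' records the source hash in the commit message, and 'git cherry' compares patches between two branches. It is not a replacement for Subversion's merge tracking. The comparison is based on patch content, so a cherry-pick that needed conflict resolution is still reported as not yet applied. The repository and the commits "fix A" and "feature B" are invented for the demonstration.

```shell
set -eu
work=$(mktemp -d)   # scratch directory; repository contents are hypothetical
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.invalid"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

git init -q "$work/fpc"
cd "$work/fpc"
git symbolic-ref HEAD refs/heads/trunk

echo base > base.txt
git add base.txt
git commit -q -m "base"
git branch fixes                 # the fixes branch forks from trunk here

echo "fix A" > a.txt
git add a.txt
git commit -q -m "fix A"
fix_a=$(git rev-parse HEAD)

echo "feature B" > b.txt
git add b.txt
git commit -q -m "feature B"

# Port "fix A" to fixes; -x appends "(cherry picked from commit ...)".
git checkout -q fixes
git cherry-pick -x "$fix_a" >/dev/null

# Compare trunk against fixes:
#   "-" = an equivalent patch is already in fixes
#   "+" = not in fixes yet, i.e. still a candidate
eligible=$(git cherry -v fixes trunk)
echo "$eligible"
```

Here "fix A" is listed with a "-" prefix and "feature B" with a "+" prefix. Whether this is reliable enough on real FPC history, where many merges to fixes involve conflicts, is exactly what would need testing.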
So far, various workflow models exist:
A successful Git branching model
Also known as "git-flow" model.
- Advantages
- A simple and logical workflow that plays to the strengths of Git - branching and merging.
- 'master', 'develop' and 'release' are good branch names which immediately makes it obvious what they are for. Many developers clone a repo and want or expect a stable version. This workflow allows just that - the 'master' branch is the default branch, and is always the latest stable release of the product.
- Features or multi-commit (complex) bug fixes show a clear commit history or related commits, that is easily tracked, viewed or even rolled-back in the commit history using a tool like gitk.
- Disadvantages
- It only works well if the whole release policy of FPC is changed. This is very unlikely to happen.
- cluttered and basically unreadable history
- doubled regression testing effort during development of new stuff
- test branch before merge
- test branch after merge, as merging could have broken things
- micro management of branches: every single bug fix requires and results in a new branch
- This is simply not the case. Single bug fixes can be committed into the 'develop' branch as a linear history. Only more complex (multi-commit) fixes or features will result in a parallel history of development, which is quickly merged back into 'develop'. --Graeme (talk) 18:50, 17 December 2017 (CET)
- You do not see the problem: if it is in develop, it will never make it into master without merging whole develop into master (i.e. in terms of FPC releases this would mean a new minor release x.y+1.0). So each bug fix has to go into an own hotfixes branch (e.g. hotfix_bug12345) which is branched from master and merged first into develop. If it works, it will be merged back into master. --FPK (talk) 22:58, 17 December 2017 (CET)
- One has to know before pushing how invasive a change is because it has to go in the right branch. For compiler development this is very cumbersome.
- develop and release have no clearly described function it seems.
The cactus model
Also known as the anti-"A successful Git branching Model".
- Advantages
- clear and straight history
- no time consuming micro management of branches
- history and logs not cluttered with merge commits
- no breakage by wrong conflict resolving during merge commits by less experienced users
- Disadvantages
- Git was designed to work well with branches and handle merging as a common task. This workflow suggests merging as a bad idea, which is crazy.
- This model recommends that a human must now keep track of which commits to cherry-pick into other branches. This is bound to fail in the long run and vital commits will be missed at some point. Merging two or more branches, on the contrary, will automatically handle all commits in a branch seamlessly, so why not use what Git was designed to do well.
- This model suggests that you can also cherry-pick from an unstable branch back into a stable release branch. This is just the wrong way round.
- This model sees some of its own flaws and suggests a 3rd party tool to help keep track of things like cherry-picking. This simply isn't needed if you use a better workflow.
- It suggests that 'rebase' gets used often. This has always been frowned upon in the Git community, where the common advice is to use rebase sparingly as it rewrites history and doesn't result in a logical evolution of the code history.
- This workflow suggests that a merge commit is a bad thing. A merge commit is a commit like any other. It can be a simple commit (with no conflicts), or can be a commit that resolves conflicts (thus code changes were required - which this commit records). Nothing bad about that.
See the Git project itself
- Advantages
- The git project itself gets an extreme amount of contributions via pull requests and emailed patches. Their workflow model clearly works well and uses branching and merging extensively - even though the commit history looks quite scary. :-)
- Disadvantages
- No real description of the process.
User management ?
Git has no concept of users. To manage permissions on a server, a separate program is needed.
- Correction: A separate program makes some functionality a bit easier but not mandatory. Simple UNIX style group and file permissions with SSH public keys work very well too. --Graeme (talk) 14:32, 2 January 2018 (CET)
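As a rough illustration of the "UNIX group plus SSH keys" variant described in the correction above, the sketch below creates a bare repository that is writable by its owning group. The repository path, the group name fpcdev and the server layout in the comments are hypothetical.

```shell
set -eu
work=$(mktemp -d)   # scratch directory standing in for a server path

# A bare repository that every member of the owning UNIX group may push to.
git init -q --bare --shared=group "$work/fpc.git"

# --shared=group is recorded in the repository configuration, and git
# creates new files and directories group-writable from then on.
shared=$(git -C "$work/fpc.git" config core.sharedRepository)
echo "core.sharedRepository = $shared"
ls -ld "$work/fpc.git/objects"

# On a real server the remaining steps would be along these lines
# (hypothetical names, not executed here):
#   chgrp -R fpcdev /srv/git/fpc.git
#   add each developer's SSH public key to an account in group fpcdev
```

This gives all-or-nothing write access per repository. Finer-grained rules (per branch or per path) are where a separate program or server-side hooks come in.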
gitorious
- Advantages
- uses git repo for administration
- No server binary
- Disadvantages
- No web interface
- administration needs ssh key, only ssh possible.
- Web integration ?
gitea
- Advantages
- Web based
- Fine tuning possible
- Disadvantages
- Separate config
- Requires running binary all the time, on a separate port.
Git differentiates between the person that does a commit, and the author of a change/patch. The latter is always available and doesn't need any repository permissions. Somebody with write permissions can commit patches (or merge pull requests) and keep both the Committer and Author information intact. SubVersion doesn't track the author of a change, only the person that did the commit.
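The author/committer split can be seen in a small experiment. The names and e-mail addresses are made up, and --author is used here simply because it is the shortest way to show the effect. Patches applied with 'git am' keep the author recorded in the patch in the same way.

```shell
set -eu
work=$(mktemp -d)   # scratch directory; names and addresses are hypothetical
# The person with write access who records the commit:
export GIT_COMMITTER_NAME="Core Committer"
export GIT_COMMITTER_EMAIL="committer@example.invalid"

git init -q "$work/repo"
cd "$work/repo"
echo "patch content" > patch.txt
git add patch.txt

# Commit somebody else's work while keeping them as the author.
git commit -q --author="Outside Contributor <contributor@example.invalid>" \
    -m "apply contributed patch"

# Both identities are stored in the commit.
git log -1 --format='author:    %an <%ae>%ncommitter: %cn <%ce>'
```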
What about Lazarus ?
Dependencies on Unix-world tools / ports ?
There seem to be dependencies on quite a few tools common on Unix platforms. Unfortunately, ports of these tools may not be up-to-date and/or working well on non-Unix platforms. As an example, port of GIT 2.13.3 to OS/2 lists dependencies on a Unix shell, Python, Perl and CURL among others. Using Unix shell for launching other programs on a non-Unix platform may have various unwanted effects (remember cygwin ports which we tried to avoid on MS Windows as much as possible for similar reasons). The list of dependencies seems to suggest that quite a few operations require external tools (similarly to merging with early SVN versions). It may be difficult to test all such cases potentially important for certain workflows in advance before the final switch.
Misc.
Collected from mails of Marco which is not handled in other sections:
- I really dislike losing global revs.
- Will we need to store everything in one repo (build and fpc?) Would be huge with all history, and problematic for release building VMs etc. What are the options for partial checkouts/externals ?
- Can branches be in a not usable (multi-head) state? I heard some complaints about that, where users avoided work till it was fixed, and had flashbacks of locking VCSes all over again. Should not be possible.
- git requiring sequences of commands for simple operations (local then global push etc)
- line endings, should be entirely server dictated. In general, most scenarios should be simple singular commands not several with bunches of parameters.
- from the "branching" paragraph: readable logs. Marco didn't add it because he never would have guessed it would be a problem.
- I heard github mentioned a few times, did sb check GH licensing?
- Answer from Chain|Q:
- https://help.github.com/articles/github-terms-of-service/
- Point "D" caused an uproar a while ago, because some thought that it implied that you're granting GitHub rights to use your content as they see fit. This has been clarified since, I think.
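Regarding the line-ending item in the list above: rules stored in a .gitattributes file travel with the repository, so every clone gets them without client configuration. The sketch below writes such a file and asks git which attributes apply to a few paths. The patterns and file names are assumptions for illustration, and the actual list would need to be agreed on.

```shell
set -eu
work=$(mktemp -d)   # scratch directory
git init -q "$work/repo"
cd "$work/repo"

# Hypothetical rules, committed alongside the sources.
cat > .gitattributes <<'EOF'
*       text=auto
*.sh    text eol=lf
*.bat   text eol=crlf
*.ppu   binary
EOF

# Ask git which attributes it would apply (the files need not exist).
git check-attr text eol -- build.sh build.bat system.ppu
```

Files matching an eol= rule are checked out with that line ending on every platform, independent of the user's core.autocrlf setting.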
Scenarios to document
SVN scenarios that need an equivalent: (note that equivalency means that changes should be visible on the central server)
- install on new machine, mandatory configuration (preferably: none mandatory other than URL) for at least linux and windows supported client (tortoise?). Things to setup/configure(server dictated?)
- crlf?
- The defaults which git supplies (Linux, Mac and Windows) should be just fine. Repo internal storage is LF and checked out code is the native line endings of your system. The ".gitattributes" file could control line endings of specific file types.
- this file is in the repo, iow server dictated, making it zero conf?
- add username to commit message?
- Checkout a working repo
- command: git clone <url>
- update/sync
- Get latest changes via command: git fetch [remote_server]
- Push latest changes via command: git push
- check for modifications (svn status)
- command: git status
- see history log (svn log -v)
- command (console only): git log
- or command (gui showing all branches): gitk --all
- commit a simple modification.
- A two/three step process:
- git add <file_that_changed>
- git commit -m "commit message"
- git push ?
- This will refuse to work if there have been any remote changes (even if they are in different files). For comparison, Subversion will ask you to update and merge the remote version, only if there are any changes in the files you're trying to commit.
- get the list of eligibles
- create a branch/tag (fixes/release/rc)
- All-in-one command: git checkout -b <new_branch_name> [starting_point]
- merge a revision to a different branch (trunk-> fixes or fixes ->release/rc)
- Very easy via the GUI gitk tool. Right-click and select "cherry pick"
- edit commit message
- If it is the last commit, then simply use 'git gui', select "amend last commit" and modify the committed files and message. Only for local commits though!
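Several of the scenarios above (checkout, commit, update, push) can be walked through end to end against a local bare repository that plays the role of the central server. This is a sketch under assumptions: the developers alice and bob, the files a.pas and b.pas and the branch name trunk are invented, and rebase is used for the update step although merge would work as well.

```shell
set -eu
work=$(mktemp -d)   # scratch directory; all names below are hypothetical
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.invalid"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# "Server": a bare repository seeded with one commit on trunk.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/trunk
( cd "$work/seed" && echo "unit a;" > a.pas && echo "unit b;" > b.pas \
  && git add . && git commit -q -m "initial import" )
git clone -q --bare "$work/seed" "$work/server.git"

# svn checkout  ->  git clone
git clone -q "$work/server.git" "$work/alice"
git clone -q "$work/server.git" "$work/bob"

# Bob commits a change to b.pas and pushes it first.
( cd "$work/bob" && echo "// bob" >> b.pas && git add b.pas \
  && git commit -q -m "bob: touch b.pas" && git push -q origin trunk )

# Alice changes a different file: add and commit are local, push publishes.
cd "$work/alice"
echo "// alice" >> a.pas
git add a.pas
git commit -q -m "alice: touch a.pas"
if git push -q origin trunk 2>/dev/null; then
    first_push=accepted
else
    first_push=rejected   # refused although a.pas is unchanged on the server
fi
echo "first push: $first_push"

# svn update  ->  git fetch + git rebase (or git merge)
git fetch -q origin
git rebase -q origin/trunk
git push -q origin trunk
git log --oneline
```

The first push is refused because the server branch has moved, which matches the remark in the list above. After fetch and rebase the history stays linear and the second push goes through.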
(new) git specific scenarios:
- ?
Tool for migration
subgit
Advantages
- very flexible
- very fast (remote cloning of the whole FPC repository takes 5-6 hours)
Disadvantages
- external java-based tool, so some dependencies
git-svn
Advantages
- included in git itself
Disadvantages
- less flexible
Work to do
Migrate SVN repo.
- 2017-12-16: A first test conversion by Florian using subgit was attempted: completed in 5 hours, 1 crash. Looks OK
Branch mapping for subgit
Proposal for branch mapping
trunk = trunk:refs/heads/master
branches = branches/merged/*:refs/heads/merged/*
branches = branches/joost/*:refs/heads/joost/*
branches = branches/laksen/*:refs/heads/laksen/*
branches = branches/maciej/*:refs/heads/maciej/*
branches = branches/olivier/*:refs/heads/olivier/*
branches = branches/paul/*:refs/heads/paul/*
branches = branches/*:refs/heads/svn/*
branches = branches/svenbarth/*:refs/heads/svenbarth/*
branches = branches/tg74/*:refs/heads/tg74/*
tags = tags/*:refs/tags/*
shelves = shelves/*:refs/shelves/*
excludeBranches = branches/aspect
excludeBranches = branches/avr32
excludeBranches = branches/blaise
excludeBranches = branches/cpstr
excludeBranches = branches/cpstrnew
excludeBranches = branches/ctypes
excludeBranches = branches/dodi
excludeBranches = branches/florian
excludeBranches = branches/foxsen
excludeBranches = branches/fcl-web_joost
excludeBranches = branches/generics
excludeBranches = branches/genfunc
excludeBranches = branches/janbruns
excludeBranches = branches/linker
excludeBranches = branches/merged/avr
excludeBranches = branches/merged/generics
excludeBranches = branches/merged/nodeopt
excludeBranches = branches/newthreading
excludeBranches = branches/peterjan
excludeBranches = branches/ssa
excludeBranches = branches/tg74/rtl
excludeBranches = branches/tg74/tests
excludeBranches = branches/tg74/utils
excludeBranches = branches/unitrw
excludeBranches = branches/wkrenn
excludeBranches = branches/FIXES_2_2
excludePath = /fixes_2_0
excludePath = /fixes_2_4
excludePath = /trunk
Set up user management and permissions.
Set up and automate github mirror
- I host the unofficial Git mirror at (http://github.com/graemeg/freepascal.git). I can share my cron jobs and script if anybody is interested. --Graeme (talk) 14:25, 2 January 2018 (CET)
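Independently of the scripts mentioned above (which are not reproduced here), a mirror of a git repository can be kept in sync with a small relay clone. The sketch below uses local directories as stand-ins for the official repository and for the public mirror. All paths and the function name sync_mirror are hypothetical, and for GitHub the "public" remote would be an ssh:// or https:// push URL.

```shell
set -eu
work=$(mktemp -d)   # scratch directory; all paths below are hypothetical
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.invalid"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Stand-ins: "primary" is the official repository, "public.git" the mirror.
git init -q "$work/primary"
( cd "$work/primary" && echo one > f.txt && git add f.txt && git commit -q -m "one" )
git init -q --bare "$work/public.git"

# One-time setup: a bare relay clone that tracks every ref of the primary.
git clone -q --mirror "$work/primary" "$work/relay.git"
git -C "$work/relay.git" remote add public "$work/public.git"

# The part a cron job would repeat, e.g. every 15 minutes.
sync_mirror() {
    git -C "$work/relay.git" fetch -q --prune origin
    git -C "$work/relay.git" push -q --mirror public
}

sync_mirror
# New work (a commit and a tag) appears in the primary ...
( cd "$work/primary" && echo two >> f.txt && git commit -q -am "two" && git tag v0.1 )
# ... and reaches the mirror on the next run.
sync_mirror
git -C "$work/public.git" log --oneline --all
```

Note that 'git push --mirror' also deletes refs on the mirror that no longer exist in the relay, so the mirror should not be used as a place to push independent work.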