Google Summer of Code 2008
For the last few years, Google has offered a fantastic opportunity for students to help out Open Source software projects in the summer while getting paid for it. It's called Google Summer of Code™, and it provides free software projects a great way of attracting development effort while providing software developers who are still in university with some interesting and useful experiences. Find out more about the Summer of Code (SoC) from [http://code.google.com/summerofcode.html their site].
Mercurial is a Distributed Version Control System (DVCS), and we are one of the mentoring organizations for 2008's Summer of Code (see also http://code.google.com/soc/2008/hg/about.html). In recent years (notably since the Linux kernel project started looking for a new VCS), DVCSs have rapidly started to gain traction. For Open Source projects, especially, the use of a distributed system can be a great enabler, in that it's much easier to keep track of your own changes and publish them back to the community in a structured way. With centralized systems, you have a central repository that only a happy few have (write) access to, and people spend a lot of time mailing around diffs. With distributed systems, everyone has their own branch, and it's very easy to publish it on the web, compare the changes to those on some other branch or merge two branches back together.
We believe distributed version control is the future, and it looks like many other technologists believe the same. There's some healthy competition between Open Source VCSs. In one corner, there's the old CVS and the better but still centralized and complex Subversion; in the other corner, there are three main contenders in the distributed VCS space: git, Bazaar-NG and Mercurial. The competition for users is fierce and stimulates fast-paced development in all three systems. It's therefore important (and good for the competition) that projects like ours gain more developers to keep up in the race for robustness, performance and features. We view the Summer of Code as a great opportunity to help us in this competition.
Mercurial is written largely in Python (with the exception of a few performance-critical core modules), which makes it easy to develop for and allows for good code organization and clean abstractions. Moreover, it has been optimized for good performance from the start: it uses an append-only datastore, minimizes random disk access and makes smart use of compression. Finally, it has extensive features for sharing changesets over HTTP and SSH, as well as in files sent by email, and it has an innovative extension (Mercurial Queues) that helps build and organize changesets into a thoroughly useful source history.
1. Project Ideas
Here are a bunch of project ideas you might like to apply for. Of course, if you have a different idea of something in Mercurial that badly needs fixing or some feature you think would make a difference, go ahead and apply with it! Some more project ideas can be found via NewFeatureDiscussions, CategoryNewFeatures and NewIdeas.
1.1. Instantaneous status on Windows, OS X
The InotifyExtension makes a huge difference to performance on moderate to large [:Repository:repositories] on Linux. Since Windows and Mac OS X also provide file change notification APIs, it should be possible to port the inotify extension to one or both of these platforms, providing the same kind of speed-up as on Linux.
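To illustrate why notifications help, here is a minimal sketch in plain Python, assuming a hypothetical `NotifyingStatusCache` class whose `on_change` method would be called by the platform's notification API (inotify on Linux, ReadDirectoryChangesW on Windows or FSEvents on Mac OS X; wiring that up is the actual project). Instead of calling `stat()` on every tracked file, `status()` only re-examines paths that were reported as changed. Unlike Mercurial's real status code, it compares only size and mtime and never looks at file contents.

```python
import os
import tempfile


class NotifyingStatusCache:
    """Hypothetical sketch: status driven by file change notifications."""

    def __init__(self, root, tracked):
        self.root = root
        # Last known (size, mtime) of each tracked file, recorded while clean.
        self.clean = {path: self._stat(path) for path in tracked}
        # Paths reported as changed since the cache was built.
        self.dirty = set()

    def _stat(self, path):
        try:
            st = os.stat(os.path.join(self.root, path))
        except FileNotFoundError:
            return None
        return (st.st_size, st.st_mtime_ns)

    def on_change(self, path):
        """Assumed to be called by the platform notification API."""
        self.dirty.add(path)

    def status(self):
        """Return (modified, removed), examining only reported paths."""
        modified, removed = [], []
        for path in sorted(self.dirty):
            if path not in self.clean:
                continue  # untracked files are ignored in this sketch
            current = self._stat(path)
            if current is None:
                removed.append(path)
            elif current != self.clean[path]:
                modified.append(path)
        return modified, removed


def demo():
    with tempfile.TemporaryDirectory() as root:
        def write(name, text):
            with open(os.path.join(root, name), "w") as f:
                f.write(text)

        for name in ("a.txt", "b.txt", "c.txt"):
            write(name, "x")
        cache = NotifyingStatusCache(root, ["a.txt", "b.txt", "c.txt"])
        write("a.txt", "changed")                # modified and reported
        cache.on_change("a.txt")
        os.remove(os.path.join(root, "b.txt"))   # removed and reported
        cache.on_change("b.txt")
        write("c.txt", "also changed")           # modified, never reported
        return cache.status()
```

`demo()` reports `a.txt` as modified and `b.txt` as removed, but misses `c.txt`: the speed-up depends entirely on the notification layer never dropping events.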
1.2. Improved named branches
While Mercurial already supports having multiple [:Branch:branches] in one repository to some extent, this support is less than polished. It would be nice, for example, if the web interface had better ways to expose the branches and if it became possible to explicitly close old branches. In particular, the [:NamedBranches:named branch] support could use a lot of spit & polish.
1.3. Partial cloning
Currently, cloning always copies a whole repository. PartialClone and TrimmingHistory could help make cloning more efficient by limiting the cloning process in either of two dimensions: time or space. In the time dimension, we could clone only the last few [:ChangeSet:changesets] and lazily fetch the rest as needed. In the space dimension, it would be nice to be able to clone just a subtree of a repository. Both features can raise any number of thorny issues because of assumptions made throughout the current Mercurial code.
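The two dimensions can be sketched on a toy model where each changeset is a full snapshot of the tree. The names `Changeset`, `clone_recent`, `clone_subtree` and `LazyHistory` are hypothetical and do not correspond to Mercurial's code; the sketch deliberately skips the hard parts, for example that a changeset's hash depends on the hashes of all of its files and on its parents, so a partial clone cannot simply verify what it does not have.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Changeset:
    rev: int
    files: Dict[str, bytes]  # toy model: full snapshot of the tree


def clone_recent(history: List[Changeset], n: int) -> List[Changeset]:
    """Time dimension: keep only the last n changesets."""
    return history[-n:] if n > 0 else []


def in_subtree(path: str, subtree: str) -> bool:
    # Match whole path components, so "doc" does not match "docs/...".
    return path.startswith(subtree.rstrip("/") + "/")


def clone_subtree(history: List[Changeset], subtree: str) -> List[Changeset]:
    """Space dimension: keep every changeset, but only files below subtree."""
    return [
        Changeset(c.rev, {p: d for p, d in c.files.items() if in_subtree(p, subtree)})
        for c in history
    ]


class LazyHistory:
    """Time dimension with lazy fetching of older changesets."""

    def __init__(self, recent: List[Changeset], fetch: Callable[[int], Changeset]):
        self.local = {c.rev: c for c in recent}
        self.fetch = fetch  # e.g. a request to the source repository

    def get(self, rev: int) -> Changeset:
        if rev not in self.local:
            self.local[rev] = self.fetch(rev)  # fetched once, then cached
        return self.local[rev]
```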
1.4. Lightweight copies/renames
Copies and renames are currently not lightweight. When a file is copied or renamed, Mercurial stores the full contents of the source file as the initial revision of the target file in its internal history store. For renames, this is especially counter-intuitive, as renaming a large file grows the store by the file's size. It would be better if Mercurial had some way of referring to the existing revision from the new file, while preserving its bounded I/O guarantees for retrieving revisions.
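A minimal sketch of the "refer to the existing revision" idea, assuming a toy store in which each file's history is a list of entries that are either full texts or references. `FullText`, `CopyRef` and `ToyStore` are hypothetical and are not Mercurial's revlog format. Copying records a reference instead of duplicating data; to keep retrieval bounded, a copy of a copy points straight at the entry that holds the data, so any read needs at most one extra lookup.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class FullText:
    data: bytes


@dataclass(frozen=True)
class CopyRef:
    path: str  # file whose history holds the data
    rev: int   # revision in that file's history


Entry = Union[FullText, CopyRef]


class ToyStore:
    """Hypothetical per-file history store where copies are references."""

    def __init__(self) -> None:
        self.logs: Dict[str, List[Entry]] = {}

    def add(self, path: str, data: bytes) -> int:
        log = self.logs.setdefault(path, [])
        log.append(FullText(data))
        return len(log) - 1

    def copy(self, src: str, srcrev: int, dst: str) -> int:
        entry = self.logs[src][srcrev]
        # If the source is itself a copy, reuse its reference so that
        # reads never have to follow a chain of references.
        ref = entry if isinstance(entry, CopyRef) else CopyRef(src, srcrev)
        log = self.logs.setdefault(dst, [])
        log.append(ref)
        return len(log) - 1

    def read(self, path: str, rev: int) -> bytes:
        entry = self.logs[path][rev]
        if isinstance(entry, CopyRef):
            entry = self.logs[entry.path][entry.rev]  # at most one extra lookup
        return entry.data

    def stored_bytes(self) -> int:
        return sum(len(e.data) for log in self.logs.values()
                   for e in log if isinstance(e, FullText))
```

Copying a 1000-byte file twice still stores only 1000 bytes. A real design would also have to deal with later revisions of the copied file being stored as deltas against the referenced text, which this sketch ignores.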
1.5. Mercurial Queues improvements
Mercurial Queues (MqExtension) is a fairly distinctive feature of Mercurial that provides a very flexible way of accumulating history before finally committing it to the actual changelog. Currently, it has a number of rough edges that sometimes cause problems when Mercurial is suddenly interrupted or when the user acts in unforeseen ways. Additionally, rebasing with MQ, while generally pretty easy, is a more elaborate process than it needs to be. It would be nice if MQ grew a little smarter about some of the common cases and a little more robust in the face of inexperienced users.
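MQ keeps its state in files under `.hg/patches` (such as `series` and `status`). One standard technique for surviving interruptions is to never rewrite such a file in place: write a temporary file in the same directory and rename it over the original, so a reader sees either the old or the new contents. The sketch below shows that technique with a hypothetical `atomic_write` helper; whether MQ's rough edges actually stem from non-atomic writes is not established here, and the file contents in `demo()` are illustrative, not MQ's real status format.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace path so that readers see either the old or the new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # get the bytes to disk before renaming
        os.replace(tmp, path)     # atomic rename on POSIX filesystems
    except BaseException:
        # An interruption before the rename leaves the old file untouched;
        # just clean up the temporary file.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def demo():
    with tempfile.TemporaryDirectory() as patchdir:
        status = os.path.join(patchdir, "status")  # hypothetical state file
        atomic_write(status, b"first.patch\n")
        atomic_write(status, b"first.patch\nsecond.patch\n")
        with open(status, "rb") as f:
            contents = f.read()
        return contents, sorted(os.listdir(patchdir))
```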
1.6. Repository forests
The ForestExtension implements a solution for repositories that want to include a number of subrepositories (similar to svn:externals). Because DVCSs in general advocate smaller repositories, large projects can benefit from managing a coherent set of repositories together. The extension currently tries to do this, but it has proved to be less than intuitive and possibly not the best design. It would be good if Mercurial could incorporate an improved version of this extension.
1.7. TortoiseHg
TortoiseHg is a GUI front-end, similar to TortoiseCVS and TortoiseSVN. For many people, this makes interacting with Mercurial much easier. Since TortoiseHg is not very mature yet, there's a lot of room for improvement in many areas. This is also a great way of making Mercurial more accessible to new VCS users and to converts from other VCSs.
1.8. Conversion tools
Mercurial is a relatively new entrant in the VCS market, and many projects still use older VCSs such as CVS and SVN. While we already have some tools to help migrate to Mercurial in the form of the ConvertExtension, these tools could certainly be improved. Specifically, it would be very nice to be able to use Mercurial as a client for SVN or even git and/or Bazaar-NG repositories: developers could then choose their own VCS client regardless of what a project uses, which could drastically enlarge our user base.
1.9. Rebase Command
Rebasing allows a linear sequence of revisions to be moved from one parent revision to another, using merge tools to resolve conflicts. This can already be done with the MQ extension, but the process is not straightforward and is hard to reuse from other commands, such as the conversion tools.
More details can be found in RebasePlan.
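The core of rebasing can be sketched on a toy repository where each revision is a full snapshot with one parent. Each revision in the sequence is replayed onto the destination with a per-file three-way merge that uses the revision's original parent as the ancestor, and only genuine conflicts reach the merge tool. `ToyRepo`, `merge_file` and `rebase` are hypothetical names; this is not the design in RebasePlan or Mercurial's merge code, and it ignores merge revisions, renames and line-level merging.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Snapshot = Dict[str, str]
# A merge tool gets (path, base, local, other) and returns the merged text,
# or None to delete the file. Inputs are None when the file is absent.
MergeTool = Callable[[str, Optional[str], Optional[str], Optional[str]], Optional[str]]


@dataclass
class Rev:
    id: int
    parent: Optional[int]
    files: Snapshot  # toy model: full snapshot of the tree


class ToyRepo:
    def __init__(self) -> None:
        self.revs: List[Rev] = []

    def commit(self, parent: Optional[int], files: Snapshot) -> int:
        self.revs.append(Rev(len(self.revs), parent, dict(files)))
        return len(self.revs) - 1


def merge_file(path, base, local, other, tool: MergeTool):
    """Per-file three-way merge; only real conflicts reach the merge tool."""
    if local == other:
        return local
    if local == base:
        return other
    if other == base:
        return local
    return tool(path, base, local, other)


def rebase(repo: ToyRepo, revs: List[int], dest: int, tool: MergeTool) -> List[int]:
    """Replay the linear sequence `revs` on top of `dest`; return new rev ids."""
    for prev, cur in zip(revs, revs[1:]):
        if repo.revs[cur].parent != prev:
            raise ValueError("revisions must form a linear sequence")
    new_ids = []
    onto = dest
    for r in revs:
        old = repo.revs[r]
        # The original parent is the merge ancestor for each replayed revision.
        base = repo.revs[old.parent].files if old.parent is not None else {}
        local = repo.revs[onto].files
        merged = {}
        for path in sorted(set(base) | set(local) | set(old.files)):
            text = merge_file(path, base.get(path), local.get(path),
                              old.files.get(path), tool)
            if text is not None:
                merged[path] = text
        onto = repo.commit(onto, merged)
        new_ids.append(onto)
    return new_ids
```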
1.10. C version of some lower-level operations
It should be possible to gain a large performance improvement by writing C versions of some basic operations. The main candidate for this would be the code that handles the index of a [:RevlogNG:revlog]; other parts of the revlog implementation (like the heads operation that is used in the WireProtocol and when calculating tags) could also benefit quite a bit.
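To show the kind of per-entry work a C module would take over, here is a pure-Python sketch of parsing a revlog index and computing heads. It assumes the RevlogNG entry layout as we understand it (64-byte big-endian entries: a 6-byte offset and 2-byte flags packed together, six 32-bit integers, a 20-byte node id and padding) and handles only separate index files, not inline revlogs that interleave revision data with index entries. `parse_index` and `heads` are illustrative names, not Mercurial's functions.

```python
import struct
from typing import List, NamedTuple

# Offset+flags, compressed length, uncompressed length, base rev, link rev,
# parent 1, parent 2, 20-byte node id, 12 bytes of padding.
INDEX_FORMAT = ">Qiiiiii20s12x"
ENTRY_SIZE = struct.calcsize(INDEX_FORMAT)  # 64 bytes
NULLREV = -1


class IndexEntry(NamedTuple):
    offset: int
    flags: int
    comp_len: int
    uncomp_len: int
    base_rev: int
    link_rev: int
    p1: int
    p2: int
    node: bytes


def parse_index(data: bytes) -> List[IndexEntry]:
    """Parse a separate (non-inline) index file, one entry per revision."""
    if len(data) % ENTRY_SIZE:
        raise ValueError("index size is not a multiple of the entry size")
    entries = []
    for pos in range(0, len(data), ENTRY_SIZE):
        offset_flags, clen, ulen, base, link, p1, p2, node = struct.unpack_from(
            INDEX_FORMAT, data, pos)
        if pos == 0:
            # The first 4 bytes of the file hold the version header; the
            # data offset of revision 0 is always 0, so keep only the flags.
            offset_flags &= 0xFFFF
        entries.append(IndexEntry(offset_flags >> 16, offset_flags & 0xFFFF,
                                  clen, ulen, base, link, p1, p2, node))
    return entries


def heads(entries: List[IndexEntry]) -> List[int]:
    """Revisions that are not a parent of any other revision."""
    is_head = [True] * len(entries)
    for entry in entries:
        for parent in (entry.p1, entry.p2):
            if parent != NULLREV:
                is_head[parent] = False
    return [rev for rev, head in enumerate(is_head) if head]
```

Both functions are simple loops over every revision, which is exactly where interpreted Python is slow on large repositories and where a C implementation should pay off.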
2. Notes on applying
Here are some tips on what you might want to include in your application.
- Tell us something about yourself. We don't know you, so it helps if you give an outline of your background and what your prior experiences are (e.g. Open Source development, using Python, general software engineering experience or education). But don't hesitate to apply just because this would be your first time working in Open Source or with Python; at some point this was new for each of us.
- Make it clear that you've thought through your application. Don't use a project idea from this page verbatim; instead, come up with your own proposal, or expand on one from this page. Explore the code a little (it's Python and quite easy to read) and read CategoryContributing to get a feel for how the community works. Draw up a rough schedule with some intermediate milestones that progress toward your project's goal.
- Communicate why you care, what you like about DVCSs in general and Mercurial specifically, what things you think could be improved and how they could be improved. We like working on this project because it makes our own work more effective, and we hope you'll like it, too. Showing your motivation helps.
- Get feedback on your proposal from the community. The MailingLists and IRC (#mercurial on irc.freenode.net) are quite responsive. Ask around and get to know some people; see whether they think your project is feasible and how you should change its scope to better fit the timeline and the project. (If you are new to Open Source development, reading [http://producingoss.com/en/communications.html#you-are-what-you-write this part of "Producing OSS"] may help you find the right tone.)
3. Getting things done
Here is some idea of how we expect the project to be carried out.
- Working on Mercurial in the summer should be your main activity. Having a vacation for one or two weeks is fine, of course, but we want you to take the project and the time mentors put into it seriously. This also means that we want you to set some intermediate milestones to be able to keep track of your progress.
- We want you to work in the open, with our community. Get on the MailingLists, both to ask for help and to provide it to other users, spend some time on IRC (#mercurial on irc.freenode.net) discussing your work with other developers and explaining Mercurial to all the new users coming in with questions. Set up a public repository with an MQ containing your patches against the [:CrewRepository:crew repository] (on [http://freehg.org/ freehg.org], for instance; see also MercurialHosting).
- We're not going to just compare what you did at the end to what you stated you'd be doing in the beginning. We want you to put effort into the project, to think about the feature you're doing, to communicate with the community and integrate your code with the project. If you end up implementing some other cool feature or fixing some annoyance, that is great as well.
4. Mentors
- [wiki:mpm MattMackall] (mpm on IRC; creator of Mercurial)
- Alexis S. L. Carvalho (asak on IRC)
- PeterArrenbrecht (parren on IRC)
- DirkjanOchtman (djc on IRC)
5. Students Considering Application
- StefanoTortarolo (astratto on IRC) for improving rebase (see http://marc.info/?l=mercurial-devel&m=120639540315984&w=2)
- (Bucciarati on IRC) for C versions and maybe named branches
- FredericRechtenstein (Mc2 on IRC) for partial clones (see http://article.gmane.org/gmane.comp.version-control.mercurial.devel/15576)
- AnantNarayanan for instant status on Mac OS X
- AmauryGauthier for partial clones, mq, or conversion tools
"Google Summer of Code" is a trademark by Google Inc.