Nested repositories

Note: An alternate implementation of this feature was added in Mercurial 1.3 (1 July 2009). See subrepos for usage documentation.

Our intention is to integrate a subset of the functionality of the ForestExtension into the core of Mercurial, while maintaining simplicity. This isn't quite a design document: it's more an exploration of the different design decisions that might make sense, and what the tradeoffs are.

1. Similar concepts in other systems

git

svn

Perforce

2. Goals

The goal is to be able to use multiple repositories as a single, loosely coupled unit. A "parent" has a notion of several "modules" that live under it. In at least some cases, performing a command in the parent should affect the modules.

By "loosely coupled", we mean that repositories are largely independent.

Relationships are hierarchical and one-to-many: a parent knows about its modules, but they do not know about their parent or sibling repositories.

2.1. Use cases

Here are the important needs we would like to at least consider.

2.2. Terminology

Names used by sundry systems:

I'm arbitrarily choosing "module".

3. Managing modules

Modules are listed explicitly, in a directory named .hgmodules in the root of the tree (suggested by BrendanCully). Each directory under .hgmodules corresponds to a module that will be present in the working directory. For example, a directory .hgmodules/foo/bar contains information about a module that will be located in foo/bar in the working directory.

The files and directories under .hgmodules are intended to be read and written by machine.

For each module, its directory must contain the following files:

The repository directory structure for the .hgmodules given above looks like this:

  parent-repo-dir/
    .hg/
    .hgmodules/foo/bar/default
    .hgmodules/quux/default
    <working dir content from parent-repo-dir>
    foo/bar/
      .hg/
      <working dir of foo/bar module>
    quux/
      .hg/
      <working dir of quux module>
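
As an illustration only, the mapping from the .hgmodules tree to module locations could be sketched as follows. The function name list_modules is hypothetical, and the sketch assumes that a directory under .hgmodules describes a module when it directly contains at least one regular file (such as default); the proposal itself does not spell out this rule.

```python
import os


def list_modules(repo_root, modules_dir=".hgmodules"):
    """Return the sorted working-directory paths of declared modules.

    Assumption (not stated in the proposal): a directory under
    .hgmodules describes a module when it directly contains at least
    one regular file, for example ``default``.
    """
    base = os.path.join(repo_root, modules_dir)
    modules = []
    # os.walk yields nothing if base does not exist, so the result is [].
    for dirpath, dirnames, filenames in os.walk(base):
        if dirpath != base and filenames:
            rel = os.path.relpath(dirpath, base)
            modules.append(rel.replace(os.sep, "/"))
            dirnames[:] = []  # do not descend below a module directory
    return sorted(modules)
```

For the layout shown above, list_modules("parent-repo-dir") would report the modules foo/bar and quux.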

The configuration files in these directories are plain text, but not intended to be edited by hand. How do we modify them?

  1. Do we modify the add, remove, and rename commands to edit them?

  2. Do we add a hg module command that will do some or all of the editing?

Probably the latter.

3.1. Discussion

  • KellyOhair:

    • A generic 'description = words' field per repository might be helpful
    • Some kind of 'forest tag' or ability to describe the state of the entire forest with one file or simple set of changeset ids. So something like the existing forest command 'hg fsnap' ('hg module --dumpstate > state'?) then being able to re-create a forest later in time with a 'hg clone -M state'? [BenoitAllard: see identify]

    • Do we need to protect ourselves from overlapping managed files, e.g. an outer repository managing files that are inside the inner repository directories?
      • DovFeldstern: I think that we are already protected: given repo1/.hg and repo1/repo2/.hg, Mercurial does not allow adding any file under repo2 to repo1 -- if you try, it responds with abort: path 'repo2/a' is inside repo 'repo2'.
      • JensWWulf: hg will allow you to add a file to two repos if you add it to the parent repo before adding it to the underlying repo.
      • RomanBarczynski: hg should allow you to add/move/remove the file repo1/repo2/foo/bar in both repo1 and repo2 at any time, and should not ignore repo2 files in repo1 by default (e.g. repo2 contains a third-party lib with your own config file that you don't want to push upstream but do want to manage under repo1).

  • DovFeldstern:

    • Two additional suggestions which should not be hard to implement, and which would allow much more flexibility --- to the point of being able to build a full-fledged Configuration Management solution on top of Mercurial:
      1. allow the user to specify a modules tree other than .hgmodules, using an option --modules-file / hgrc setting / environment variable.

      2. provide some kind of include mechanism in the modules file, with a clear override scheme for which version to use of specific modules which are included in more than one file (e.g., the latest included file overrides previously included files; the including file overrides included files)

  • JesseGlick:

    • Can a module itself have its own .hgmodules, recursively interpreted?
  • MartinGeisler:

    • Why use a directory structure instead of a single configuration file? I see that these files are not supposed to be edited by humans, but I see several advantages of a normal configuration file:
      1. I can (probably) edit a file much more quickly with an editor than using hg module ... commands.

      2. I can edit it in more advanced ways: search/replace, moving sections around.
      3. I can easily mail the configuration to other places. With a directory structure I have to wrap it up in a tarball/zipfile first.
      4. If the configuration file is parsed using, say, ConfigObj then it could also preserve comments left behind by the user. (But in a directory structure one might simply ignore unknown files and so treat them as "comments" so this might not be a big difference.)

  • ChrisSuter:

    • Why can we not just have Mercurial figure out that it's an existing Mercurial project when you do hg add, and simply mark it as such, in the same way that you mark files as files, directories as directories, symbolic links as symbolic links, etc? The existing add, remove and rename commands would work fine, thus negating the need for hg module add et al. Any additional metadata would need to be handled in a configuration file or, better, as versioned properties of elements in the repository. (I don't know if versioned properties exist, but it's better to have them as versioned properties as that will mean that when things are renamed, the properties will follow.)

    • It should be possible to have a project that contains source to also have sub projects, i.e. the root of the tree should not be just for referencing sub projects. It doesn't sound like anything proposed here would affect that but it's not clear.
    • All changes to the configuration must be versioned. DovFeldstern: I'm not sure if this is what was meant or not, but IMO changes to the configuration should not automatically trigger a commit; rather, .hgmodules will be changed as required, and the changes should be committed like any other change when the user chooses to commit.

  • CaseyLeedom:

    • This page needs a "Status" section describing the status of this proposed change. I.e. In design phase, in coding, proposed timeline, etc. I have a need of this kind of feature and from reading the page I have no idea as to whether it's "right around the corner" or still in the "vigorous hand waving" stage.
    • CVS also has the ability to manage multiple repository subdirectories from the current working directory. In fact, it doesn't even need a "CVS" directory in the "parent" directory. It simply automatically recurses down subdirectories to find */CVS/Repository and */CVS/Entries files. (Although I'm not sure how far down it will recurse -- maybe only a single level.)

    • Has anyone discussed the idea of having changeset dependencies between modules? I.e. I make changes in modules A and B, and the changes in B require the corresponding changes in A (think of A as a library that B uses).

4. Important open questions

Does it only make sense to think about modules when we have a working directory? Presumably yes, but this introduces the need to possibly have a network connection in order to clone missing modules during a hg update or similar.

  • If not, where do modules live when we don't have a working directory? (It would be technically possible to separate a module's working directory from its repository, for example, though I'm not sure we want to go there.)

For now, I'm assuming that if there's no working directory, there are no modules.

Here's another sticky question without an obvious answer: By default, should commands that operate in the working directory recurse into modules?

  • A nice idea raised on the MailingList is to enable or disable recursion via an option in the .hgrc file.

The alternative that I lean towards is to not recurse unless explicitly instructed to. Most probably, only a few commands should arguably even be aware of modules.

This model assumes that modules will usually only be read, and checked out at a fixed revision, such that automatically running status queries or updates in them makes little sense: they won't change often enough to be worth the effort. This is in line with the usual use of externals in SVN, and with CVS vendor branches.

For people who would be actively developing in multiple repositories, however, this provides poor support. If you have a better idea, let's hear it! Note that the existing config mechanism lets you add a "--modules" option to whatever commands you think need it.

If a command like "add" is run in a parent repository's working directory, and given a path to a file in a module's working directory, what should its behaviour be? The current behaviour is to complain and fail: should this remain?
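
To make the "complain and fail" behaviour concrete, here is a minimal sketch of such a check. It is not Mercurial's actual code: the names enclosing_nested_repo and check_add are hypothetical, and the sketch assumes that a directory counts as a nested repository when it contains a .hg directory.

```python
import os


def enclosing_nested_repo(parent_root, path):
    """Return the nested repository containing path, or None.

    path is relative to parent_root. Only the ancestors of path are
    examined, from the outermost inwards; a directory counts as a
    nested repository when it contains a .hg directory (assumption).
    """
    parts = os.path.normpath(path).split(os.sep)
    current = parent_root
    for i, part in enumerate(parts[:-1]):
        current = os.path.join(current, part)
        if os.path.isdir(os.path.join(current, ".hg")):
            return "/".join(parts[: i + 1])
    return None


def check_add(parent_root, path):
    """Refuse to add a path that lies inside a nested repository."""
    repo = enclosing_nested_repo(parent_root, path)
    if repo is not None:
        raise ValueError("path %r is inside repo %r" % (path, repo))
```

A module-aware "add" could use the same lookup to delegate the operation to the module instead of failing; whether it should is exactly the open question.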

What about nested nested-repositories? If I have a .hgmodules tree in one of my modules, should a command issued at the root level also recurse into those "sub-modules"? I guess so.

/root/
  .hg/
  .hgmodules
  module1/
    .hg/
  module2/
    .hg/
    .hgmodules
    module21/
      .hg/
    module22/
      .hg/
  module3/
    .hg/

In the structure above, should a command issued at the root level also take into account module21 and module22? What if only module21 is listed in the .hgmodules of module2? What if module22 is recorded as a module of root?
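
One possible answer, sketched under assumptions: recurse only into modules that some .hgmodules actually lists, and visit each module once even if it is recorded at two levels. The function walk_modules and its in-memory declared mapping are hypothetical stand-ins for reading the .hgmodules trees from disk.

```python
def walk_modules(declared, root=""):
    """Return module paths reachable from root, depth first, each once.

    declared maps a repository path (relative to the top-level root,
    "" for the root itself) to the module paths listed in that
    repository's .hgmodules, relative to that repository.
    """
    seen = set()

    def visit(repo):
        for rel in declared.get(repo, []):
            path = rel if not repo else repo + "/" + rel
            if path in seen:
                continue  # already reached through another declaration
            seen.add(path)
            yield path
            yield from visit(path)

    return list(visit(root))
```

Under this interpretation, a module22 that is not listed anywhere is ignored, and a module22 listed both by root and by module2 is processed a single time.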

5. User interface changes

5.1. The module command

We add the "module" command, for managing modules. It has several subcommands.

  • "add" introduces a single new module. A local copy of the repository must already be present. Options:
    • "-r": the revision to use.
    • "-b": the branch to use.
    • "-u": the URL to use.

      Note by RonnyPfannschmidt: why not just use hg clone/hg init, since most of the commands need to be module-aware anyway?

      Note by ArneBab: Why not hg module clone, as in hgsubversion? It would add the module to .hgmodules and then clone it to the specified location.

  • "remove" removes one or more modules. (This could be done by modifying the regular remove command...)

  • "record" updates the changeset ID associated with each module. Uses the working directory's parent from each module. Aborts if any module has zero or two parents.
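
The abort rule for "record" could be sketched as below. The function record_modules and its input mapping are hypothetical; how the parent changeset IDs are obtained from each module is left out. The sketch validates every module before recording anything, so an abort never leaves a partial update.

```python
def record_modules(parents_by_module):
    """Return {module: changeset_id} to record for each module.

    parents_by_module maps a module path to the list of parent
    changeset IDs of that module's working directory. Raises
    ValueError if any module has zero or two parents.
    """
    bad = sorted(m for m, ps in parents_by_module.items() if len(ps) != 1)
    if bad:
        raise ValueError(
            "cannot record, modules without exactly one parent: "
            + ", ".join(bad)
        )
    return {m: ps[0] for m, ps in parents_by_module.items()}
```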

To clone optional modules, do we extend the behaviour of the built-in clone command, or add a "clone" command here (+1)?

5.2. Changes to existing commands

5.2.1. Uniform option naming

We introduce a standard -M / --modules option for commands that need to become module-aware. The name of the option is standard: its interpretation can change, depending on the command.

5.2.2. clone

5.2.3. update

Ideas that probably don't make sense:

5.2.4. add, remove, rename

5.2.5. pull

JesseGlick: I would expect pull -u (or fetch) with --modules to first update the parent, then inspect its updated .hgmodules to see what modules might be there that also need to be updated.
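
The ordering described in this comment could be sketched as follows. Both callables are hypothetical placeholders: pull_and_update stands for whatever performs "pull -u" in one repository, and read_modules for whatever lists the modules recorded in a repository's .hgmodules.

```python
def pull_with_modules(pull_and_update, read_modules, parent="."):
    """Pull and update the parent first, then each module it lists.

    The module list is read only after the parent has been updated,
    so modules added by the incoming changesets are picked up too.
    Returns the repositories processed, in order.
    """
    pull_and_update(parent)
    visited = [parent]
    for module in read_modules(parent):
        pull_and_update(module)
        visited.append(module)
    return visited
```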

5.2.6. push

5.2.7. bundle

JesseGlick: I'm not sure what bundle --modules should do, actually. The current format can only bundle changesets from one repo.

5.2.8. incoming, outgoing

5.2.9. tag

5.2.10. branch, branches

5.2.11. status

5.2.12. identify

5.3. Questionable commands

Here are some possible behaviours for commands where it's really not clear that being module-aware makes sense at all.

5.3.1. commit

We have the possibility of rolling every commit back if any commit fails, when using --modules. Do we want to do this?

JesseGlick: commit --modules would be nice (for a forest of loosely synchronized repositories) but not essential.

MarcusLindblom: add an option in .hgmodules saying whether this is allowed, so that any policy can be supported. Default to not allowed (unless forced)?
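
The all-or-nothing behaviour raised above for commit --modules could look roughly like this. It is a sketch only: commit and rollback are hypothetical callables that act on one repository each, and the sketch assumes that a commit which has succeeded can always be undone.

```python
def commit_all(repos, commit, rollback):
    """Commit every repository in order, or none of them.

    If any commit raises, the commits already made are rolled back in
    reverse order and the original exception is re-raised.
    """
    done = []
    try:
        for repo in repos:
            commit(repo)
            done.append(repo)
    except Exception:
        for repo in reversed(done):
            rollback(repo)
        raise
    return done
```

Whether rolling back is desirable at all remains the open question; the sketch only shows that the bookkeeping itself is small.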

5.3.2. Next sticky question

If we make "commit" module-aware, why not status, diff, and all the rest?

6. Implementation

Alexander Solovyov has a proof-of-concept implementation to provide subrepositories written as an extension. To make it work, a patched version of Mercurial is needed. See details in the extension docs.

There's also an implementation of subrepos as an integrated feature of the Mercurial core.

One more implementation as an extension: subrepo extension

An extension for handling external dependencies (and Mercurial subrepositories): hgdeps extension


CategoryOldFeatures

NestedRepositories (last edited 2010-10-15 03:57:48 by mpm)