Differences between revisions 15 and 17 (spanning 2 versions)
Revision 15 as of 2010-10-15 04:03:53
Size: 3363
Editor: mpm
Revision 17 as of 2012-05-08 14:44:08
Size: 3395
Editor: mpm

{i} This page does not meet our wiki style guidelines. Please help improve this page by cleaning up its formatting.


This page appears to contain material that is no longer relevant. Please help improve this page by updating its content.

Case-folding Plan

<!> This page is intended for developers.

To deal with CaseFolding on the repo side, we need to:

  • escape uppercase ASCII characters in filenames
  • escape high ASCII
    • Unicode and other characters may be case-folded as well
    • Filesystems and operating systems may do other unfortunate things to filenames which will cause interoperability trouble
  • use the same scheme by default on all systems to avoid backup and media sharing issues

A simple escaping scheme is as follows:

  • replace _ with __

  • replace A-Z with _a to _z (A becomes _a, B becomes _b, etc.)
  • replace characters 126-255 and '\:*?"<>|' with ~ followed by the character's two-digit hex code (~7e to ~ff for 126-255; note this escapes tilde as well)
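The three rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `escape_name` and `RESERVED` are hypothetical names, the input is assumed to be a byte string, and the function is not claimed to match the code that was actually committed.

```python
# Characters listed in the scheme above that must be escaped.
RESERVED = '\\:*?"<>|'


def escape_name(name: bytes) -> str:
    """Escape a store filename (hypothetical helper, for illustration only)."""
    out = []
    for byte in name:              # iterating over bytes yields integers
        ch = chr(byte)
        if ch == "_":
            out.append("__")       # rule 1: double the escape character
        elif "A" <= ch <= "Z":
            out.append("_" + ch.lower())   # rule 2: A-Z becomes _a-_z
        elif byte >= 126 or ch in RESERVED:
            out.append("~%02x" % byte)     # rule 3: ~ plus two hex digits
        else:
            out.append(ch)
    return "".join(out)
```

With this sketch, `escape_name(b"Readme.i")` gives `_readme.i`, which matches the `.hg/store/data/_readme.i` example in the implementation plan.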

Note that we rarely need to

Implementation plan:

  • add separate localrepo access methods for all store data (changelog, manifest, data/*, journal, lock) (./)

  • if .hg/data exists at localrepo init time, use old access scheme (./)

  • if not, access all store data with escaped paths inside .hg/store/ (e.g. .hg/store/00changelog.i or .hg/store/data/_readme.i) (./)
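The layout decision in the plan above could look roughly like the following. The function name `store_path` and its `escape` parameter are assumptions made for this sketch; the real localrepo access methods are not reproduced here.

```python
import os


def store_path(repo_root, name, escape=lambda component: component):
    """Map a store-relative name such as 'data/Readme.i' to a path on disk.

    Hypothetical helper: `escape` stands in for whatever escaping
    function the new layout uses and defaults to the identity.
    """
    hg = os.path.join(repo_root, ".hg")
    parts = name.split("/")
    if os.path.isdir(os.path.join(hg, "data")):
        # Old layout: .hg/data exists, so keep using unescaped paths.
        return os.path.join(hg, *parts)
    # New layout: everything lives under .hg/store/ with escaped names.
    return os.path.join(hg, "store", *[escape(p) for p in parts])
```

Checking for `.hg/data` once at init time, as the plan suggests, means existing repositories keep working unchanged while new clones pick up the escaped layout.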

This scheme will automatically escape all paths on newly cloned or created repos.

On the working directory side, the best we can do is detect collisions. A simple scheme might look something like this:

  • detect whether the filesystem is case-sensitive at checkout/update time (./)

  • scan manifest for case-folding collisions and issue a warning (./)
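A rough sketch of both steps follows. The names `fs_is_case_sensitive` and `find_case_collisions` are hypothetical, the probe relies on being able to create a temporary file in the target directory, and `str.lower()` is used as a simple stand-in for whatever folding the filesystem really applies.

```python
import os
import tempfile


def fs_is_case_sensitive(directory):
    """Probe `directory` by creating a file and looking it up in swapped case."""
    fd, path = tempfile.mkstemp(prefix="casetest", dir=directory)
    os.close(fd)
    try:
        parent, base = os.path.split(path)
        swapped = os.path.join(parent, base.swapcase())
        # On a case-insensitive filesystem the swapped name finds the same file.
        return not os.path.exists(swapped)
    finally:
        os.unlink(path)


def find_case_collisions(manifest_names):
    """Return (first, later) pairs of names that differ only by case."""
    seen = {}
    collisions = []
    for name in manifest_names:
        folded = name.lower()  # assumption: lower() approximates the folding
        if folded in seen:
            collisions.append((seen[folded], name))
        else:
            seen[folded] = name
    return collisions
```

A caller might warn only when `fs_is_case_sensitive(...)` returns `False` and `find_case_collisions(...)` is non-empty, since collisions are harmless on a case-sensitive filesystem.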

There are some further issues on the working directory and user interface side.

  • renames which only change case (e.g. foo -> Foo) will not be properly detected in the filesystem

  • user-supplied filenames may differ in case from the actual file on disk (Question: is it reasonable to require the user to specify the correct case? Probably not, see Issue646).

Also, filesystems such as the one used by OS X perform Unicode normalisation, meaning that two filenames that differ only in normal form may in fact refer to the same file.
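To make the normalisation problem concrete, here is a small example using Python's standard `unicodedata` module. The helper name `same_after_normalisation` is made up for this illustration, and the choice of NFC as the comparison form is an assumption.

```python
import unicodedata

composed = "caf\u00e9"     # 'é' as a single code point (NFC form)
decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent (NFD form)


def same_after_normalisation(a, b, form="NFC"):
    """Compare two names after bringing both to the same normal form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
```

The two strings compare unequal as raw strings, yet `same_after_normalisation(composed, decomposed)` is true, so a normalising filesystem would treat them as one file.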

Finally, there are some filename identity issues even on Unix - the files foo/bar and baz/../foo/bar are the same. These are (presumably) solved on Unix, so looking at how the solution works may offer some advice on how to deal with user input issues.
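As an illustration of the Unix case, purely lexical normalisation already identifies the two spellings. The helper name below is hypothetical, and `posixpath` is used so the result does not depend on the platform running the example.

```python
import posixpath


def same_path_lexically(a, b):
    """Compare two '/'-separated paths after collapsing '.' and '..' parts."""
    return posixpath.normpath(a) == posixpath.normpath(b)
```

Note that this comparison is only textual: if `baz` were a symlink, `baz/../foo/bar` could name a different file, and something like `os.path.realpath` would be needed to resolve it against the real filesystem.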


  • Classify file names into different types:
    • Manifest internal (case sensitive always)
    • OS Native (possibly case or normalisation insensitive)
  • Identify which type of file name is involved in the various API calls
  • Determine the correct behaviour whenever the 2 types come into contact

There are some other differences between manifest internal and OS native pathnames: the former always use / as the path separator, whereas the latter use os.sep. There are also differences between absolute and relative pathnames. These differences should be noted as well when reviewing API calls.
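The separator difference can be shown with two tiny conversion helpers. Both function names are assumptions made for this sketch and only handle relative names; they say nothing about case or normalisation.

```python
import os


def manifest_to_native(name):
    """Convert a manifest-internal name ('/' separators) to an OS native path."""
    return name.replace("/", os.sep)


def native_to_manifest(path):
    """Convert an OS native relative path back to '/' separators."""
    return path.replace(os.sep, "/")
```

Keeping the conversion in one place like this makes it easier to see, in each API call, which of the two name types is being passed around.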

In some cases, this may require carrying around additional data to preserve both the user-supplied name and the actual canonical name on the filesystem.
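One possible shape for that additional data is a small record holding both names. `ResolvedName` and `resolve` are hypothetical, and the lookup again assumes that `str.lower()` is an acceptable approximation of the filesystem's folding.

```python
from typing import NamedTuple, Optional


class ResolvedName(NamedTuple):
    user_supplied: str  # exactly what the user typed
    canonical: str      # the spelling actually stored on disk


def resolve(user_name, disk_names) -> Optional[ResolvedName]:
    """Find the on-disk spelling of `user_name`, ignoring case."""
    by_folded = {name.lower(): name for name in disk_names}
    canonical = by_folded.get(user_name.lower())
    if canonical is None:
        return None
    return ResolvedName(user_name, canonical)
```

Keeping the user's spelling alongside the canonical one lets messages echo what was typed while internal lookups use the canonical form.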

See also

CategoryWindows CategoryOldFeatures

CaseFoldingPlan (last edited 2012-11-06 23:04:58 by abuehl)