Differences between revisions 18 and 19
Revision 18 as of 2012-10-25 21:44:49 (editor: mpm, size: 3351)
Revision 19 as of 2012-11-06 23:04:58 (editor: abuehl, size: 3332)

/!\ This page is primarily intended for Mercurial's developers.

/!\ This page is no longer relevant but is kept for historical purposes.

Case-folding Plan

To deal with CaseFolding on the repo side, we need to:

  • escape uppercase ASCII characters in filenames
  • escape high-ASCII bytes
    • Unicode and other non-ASCII characters may be case-folded as well
    • filesystems and operating systems may do other unfortunate things to filenames that cause interoperability trouble
  • use the same scheme by default on all systems, to avoid problems when repositories are backed up or shared on removable media

A simple escaping scheme is as follows:

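  • replace _ with __ (so that the escape character itself stays unambiguous)
  • replace A-Z with _a to _z
  • replace characters 126-255 and the Windows-reserved characters '\:*?"<>|' with ~ followed by their two-digit hex code (~7e to ~ff for 126-255, e.g. ~3a for :); note that this escapes tilde (126) as well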
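The rules above can be sketched in Python as a pair of encode/decode functions. This is a minimal illustration of the scheme as written on this page, working on byte strings so that non-ASCII names round-trip exactly; the names `encode_name` and `decode_name` are hypothetical, and Mercurial's actual store encoding may differ in details.

```python
# Sketch of the escaping scheme on this page (hypothetical helper names).
RESERVED = set(b'\\:*?"<>|')  # Windows-reserved characters, as byte values

def encode_name(name: bytes) -> str:
    out = []
    for b in name:  # iterating over bytes yields ints
        if b == ord("_"):
            out.append("__")                 # escape the escape character
        elif ord("A") <= b <= ord("Z"):
            out.append("_" + chr(b).lower())  # A-Z -> _a.._z
        elif b >= 126 or b in RESERVED:
            out.append("~%02x" % b)           # lowercase hex, e.g. ~7e, ~3a
        else:
            out.append(chr(b))
    return "".join(out)

def decode_name(encoded: str) -> bytes:
    out = bytearray()
    i = 0
    while i < len(encoded):
        c = encoded[i]
        if c == "_":
            nxt = encoded[i + 1]
            # "__" -> "_", "_x" -> "X"
            out.append(ord("_") if nxt == "_" else ord(nxt.upper()))
            i += 2
        elif c == "~":
            out.append(int(encoded[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(c))
            i += 1
    return bytes(out)
```

Because every uppercase letter becomes `_` plus a lowercase letter and hex escapes use lowercase digits, the encoded form contains no uppercase characters at all, so it survives a case-folding filesystem unchanged. For example, `encode_name(b"README")` gives `"_r_e_a_d_m_e"`.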

Note that we rarely need to

Implementation plan:

  • add separate localrepo access methods for all store data (changelog, manifest, data/*, journal, lock) (./)

  • if .hg/data exists at localrepo init time, use the old access scheme (./)

  • if not, access all store data with escaped paths inside .hg/store/ (e.g. .hg/store/00changelog.i, or .hg/store/data/_readme.i for a file named Readme) (./)
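The layout decision in the plan above can be sketched as a small factory that inspects `.hg/` once, when the repository is opened, and returns a function mapping store names to on-disk paths. `make_store_opener` is a hypothetical name, and the escaping function is passed in as `encode` so the sketch does not assume any particular implementation of it.

```python
import os
from typing import Callable

def make_store_opener(repo_root: str, encode: Callable[[str], str]) -> Callable[[str], str]:
    """Decide the store layout once, at repository open time, and return a
    function mapping store names (e.g. "00changelog.i", "data/README.i",
    "journal", "lock") to on-disk paths. Hypothetical helper."""
    hg = os.path.join(repo_root, ".hg")
    if os.path.isdir(os.path.join(hg, "data")):
        # Old repository: keep the unescaped layout directly under .hg/.
        return lambda name: os.path.join(hg, name)
    # New repository: all store data lives under .hg/store/ with escaped names.
    store = os.path.join(hg, "store")
    return lambda name: os.path.join(store, encode(name))
```

Routing every store access through one such function is what lets newly created or cloned repositories get escaped paths automatically while old repositories keep working unchanged.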

This scheme will automatically escape all paths on newly cloned or created repos.

On the working directory side, the best we can do is detect collisions. A simple scheme might look something like this:

  • detect whether the filesystem is case-sensitive at checkout/update time (./)

  • scan the manifest for case-folding collisions and issue a warning (./)
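Both checks above can be sketched in Python. The probe in `fs_is_case_sensitive` creates a throwaway file whose name contains uppercase letters and tests whether the lowercase spelling also resolves; `find_case_collisions` groups manifest paths, and their parent directories, by a case-folded key. Both function names are hypothetical, and `str.casefold()` only approximates the folding rules a given filesystem actually applies.

```python
import os
import tempfile
from collections import defaultdict

def fs_is_case_sensitive(directory: str) -> bool:
    """Return True if names differing only in case are distinct in `directory`."""
    # The prefix contains uppercase letters, so the lowercase spelling of the
    # temporary file's name always differs from its real name.
    with tempfile.NamedTemporaryFile(prefix="HgCaseProbe", dir=directory) as probe:
        name = os.path.basename(probe.name)
        return not os.path.exists(os.path.join(directory, name.lower()))

def find_case_collisions(manifest_paths):
    """Return groups of manifest paths (or directory prefixes) that would
    collide on a case-insensitive filesystem."""
    groups = defaultdict(set)
    for path in manifest_paths:
        parts = path.split("/")
        # Check every directory prefix as well as the full path, so that
        # "Docs/a.txt" and "docs/b.txt" are reported as a Docs/docs clash.
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            groups[prefix.casefold()].add(prefix)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

At checkout time, a non-empty result from `find_case_collisions` on a filesystem where `fs_is_case_sensitive` returned False is the situation that would call for the warning.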

There are some further issues on the working directory and user interface side.

  • on a case-insensitive filesystem, renames that only change case (e.g. foo -> Foo) will not be properly detected

  • user-supplied filenames may differ in case from the actual file on disk (Question: is it reasonable to require the user to specify the correct case? Probably not, see Issue646).
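One way to avoid requiring the user to type the exact case is to resolve each path component against the directory listing. The sketch below is an assumption, not Mercurial's behaviour: the helper `canonical_case` is hypothetical, prefers an exact match, falls back to a unique case-insensitive match, and raises if none or several are found. It does not handle ".." components.

```python
import os

def canonical_case(root: str, user_path: str) -> str:
    """Map a user-supplied path to the spelling actually stored on disk,
    one component at a time. Hypothetical helper; ".." is not handled."""
    current = root
    parts = []
    for comp in user_path.replace(os.sep, "/").split("/"):
        if comp in ("", "."):
            continue
        entries = os.listdir(current)
        if comp in entries:
            match = comp  # exact spelling exists
        else:
            folded = comp.casefold()
            candidates = [e for e in entries if e.casefold() == folded]
            # Zero matches: no such file. Several: only possible on a
            # case-sensitive filesystem, and the choice would be ambiguous.
            if len(candidates) != 1:
                raise FileNotFoundError(user_path)
            match = candidates[0]
        parts.append(match)
        current = os.path.join(current, match)
    return "/".join(parts)
```

Whether a case-sensitive system should also accept a wrongly-cased name in this way is a policy question; the sketch simply allows it when the match is unique.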

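Also, some filesystems, such as HFS+ on Mac OS X, perform Unicode normalisation, so two filenames that differ only in their normal form (for example a precomposed é versus e followed by a combining accent) may in fact refer to the same file.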
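The effect can be reproduced with Python's standard `unicodedata` module: the two spellings of "café" below are different strings but become equal after normalisation. The helper `same_name` is hypothetical and uses plain NFD; HFS+ actually applies its own variant of decomposition, so this is only an approximation.

```python
import unicodedata

composed = "caf\u00e9"     # "café" with a single precomposed é (NFC form)
decomposed = "cafe\u0301"  # "café" as e + combining acute accent (NFD form)

print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

def same_name(a: str, b: str) -> bool:
    """Compare names the way a normalising filesystem roughly would."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)
```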

Finally, there are filename identity issues even on Unix: the paths foo/bar and baz/../foo/bar refer to the same file. These issues are (presumably) solved on Unix, so looking at how that solution works may offer some advice on dealing with user input.
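Python's standard library shows the two usual ways of answering "is this the same file?". `posixpath.normpath` collapses "." and ".." purely lexically, while `os.path.samefile` asks the OS to compare device and inode numbers. The wrapper names below are hypothetical.

```python
import os
import posixpath

def same_path_lexically(a: str, b: str) -> bool:
    """Compare two "/"-separated paths after collapsing "." and ".."
    components. No filesystem access, so symlinks are not taken into account."""
    return posixpath.normpath(a) == posixpath.normpath(b)

def same_file_on_disk(a: str, b: str) -> bool:
    """Ask the OS whether two existing paths name the same file (same device
    and inode). This follows symlinks and also honours the filesystem's own
    case and normalisation rules."""
    return os.path.samefile(a, b)

print(same_path_lexically("foo/bar", "baz/../foo/bar"))  # True
```

The lexical check works for names that do not exist yet but can be wrong when baz is a symbolic link; the on-disk check is authoritative but requires both files to exist.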

Proposal:

  • Classify file names into different types:
    • Manifest-internal (always case-sensitive)
    • OS-native (possibly case- or normalisation-insensitive)
  • Identify which type of file name is involved in the various API calls
  • Determine the correct behaviour wherever the two types come into contact
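The proposed classification can be sketched with `typing.NewType`, so that a type checker flags places where the two kinds of name are mixed. The type and function names are hypothetical, and the sketch assumes both names already use "/" as the separator. Manifest names are compared with plain `==`; only where a manifest name meets an OS-native one are the filesystem's rules applied.

```python
import unicodedata
from typing import NewType

# Manifest-internal names: exact, always case-sensitive, "/"-separated.
ManifestPath = NewType("ManifestPath", str)
# OS-native names: the filesystem may ignore case and/or Unicode normal form.
NativePath = NewType("NativePath", str)

def native_key(name: str, *, fold_case: bool, normalize: bool) -> str:
    """Reduce a name to the form under which the filesystem treats names as equal."""
    if normalize:
        name = unicodedata.normalize("NFD", name)
    if fold_case:
        name = name.casefold()
        if normalize:
            # casefold() can produce un-normalised text, so normalise again
            # (Unicode's canonical caseless match uses NFD(casefold(NFD(x)))).
            name = unicodedata.normalize("NFD", name)
    return name

def manifest_matches_native(m: ManifestPath, n: NativePath, *,
                            fold_case: bool, normalize: bool) -> bool:
    """Where the two types meet: use the filesystem's rules, not exact equality."""
    return (native_key(m, fold_case=fold_case, normalize=normalize)
            == native_key(n, fold_case=fold_case, normalize=normalize))
```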

There are other differences between manifest-internal and OS-native pathnames: the former always use / as the path separator, while the latter use os.sep. There are also differences between absolute and relative pathnames. When reviewing API calls, these differences should be noted as well.
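The separator and absolute/relative differences can be confined to two conversion functions, sketched below. The names are hypothetical; the sketch assumes manifest names are relative to the repository root, and it interprets relative OS paths against the current directory via `os.path.abspath`. Case and normalisation are deliberately left untouched here.

```python
import os

def manifest_to_native(repo_root: str, m: str) -> str:
    """Turn a repo-relative, "/"-separated manifest name into an absolute
    OS path that uses os.sep."""
    return os.path.join(repo_root, *m.split("/"))

def native_to_manifest(repo_root: str, native: str) -> str:
    """Turn an OS path (absolute, or relative to the current directory)
    into a repo-relative, "/"-separated manifest name."""
    rel = os.path.relpath(os.path.abspath(native), os.path.abspath(repo_root))
    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
        raise ValueError(f"{native!r} is outside the repository")
    return rel.replace(os.sep, "/")
```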

In some cases, this may require carrying around additional data to preserve both the user-supplied name and the canonical name actually used by the filesystem.
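A minimal sketch of that extra data, assuming a hypothetical `ResolvedName` record passed around instead of a bare string: lookups use the canonical spelling, while messages can still echo what the user typed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResolvedName:
    """Hypothetical pair carried through the code instead of a bare string."""
    user: str       # exactly as typed by the user; used in messages
    canonical: str  # spelling used by the filesystem/manifest; used for lookups

    def display(self) -> str:
        if self.user == self.canonical:
            return self.user
        return f"{self.user} (stored as {self.canonical})"

name = ResolvedName(user="readme.txt", canonical="ReadMe.txt")
print(name.display())  # readme.txt (stored as ReadMe.txt)
```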

See also


CategoryWindows CategoryOldFeatures

CaseFoldingPlan (last edited 2012-11-06 23:04:58 by abuehl)