Created on 2009-09-23.16:23:40 by gward, last changed 2010-02-15.14:07:27 by gward.
| msg11726 (view) |
Author: gward |
Date: 2010-02-15.14:07:27 |
|
I complained about this family of bugs on the mailing list recently
(specifically, on the unreliability of 'hg log <file>'). Matt sketched a
possible solution in his reply. I'll just paste it in here for the record.
"""
I bet there's actually a relatively lightweight algorithm for
reconstructing a complete changeset to file revision mapping that would
solve all the problems in this area (log, annotate, deletions, and
merge). Consider:
cset graph:
1-2-3-4-5-6-7
file graph:
a2-b7 (number is the linkrev)
The file graph implies that csets 2-6 all contain revision a with no
deletions. And we also know it's not present in 1. We can't delete the
file and reintroduce it without adding a parentless node to the file
graph. Now consider:
cset graph:
1-2-3-4-5-6-10-11-12-13
\
7-8-9
file graph:
a2-b4-c9
\
d10
Again, not present in 1, a in 2 and 3, b in 4. But what's happening in
c9? The parent cset of 9 must contain b (the duplicate revision on
branch case), which means b is in 8. Which leaves us with either a or b
in 7. And we can quickly test whether the file was modified in cset 7
(appears in the ctx.files()) without doing an expensive manifest lookup.
We also know that csets 5 and 6 must contain b because no other
revisions can appear between 4 and 10.
That leaves csets 11-13. The only thing that can happen here is for the
file to be deleted or remain revision d. It can't switch to c because c
is not a descendant of d. There are a couple tests we can make here. If
the file is present in 13 (via slow manifest lookup), we know it's
present in 11 and 12. Alternately, if it's not changed in 11 and 12 (via
two fast cset checks), we know it's present in 13.
I'm not sure what a complete algorithm looks like (and there are a bunch
of other cases to be considered!), but it seems pretty clear there's
enough implied information here that we can figure most of it out
directly from the graphs, without more expensive cset and manifest
lookups.
(Which is fortunate, because changing the schema is still out of the
question.)
"""
|
| msg11725 (view) |
Author: tonfa |
Date: 2010-02-15.13:28:50 |
|
Is there some write-up somewhere? Even if it's very little.
|
| msg11724 (view) |
Author: parren |
Date: 2010-02-15.13:18:21 |
|
There was a bit of discussion on it at the sprint actually, but it did
not get far.
|
| msg11720 (view) |
Author: tonfa |
Date: 2010-02-15.10:07:59 |
|
This was actually a proposed topic at the sprint (reconciliating file- and
changelog- graphs). Too bad we didn't find the time to work on that.
|
| msg11719 (view) |
Author: tonfa |
Date: 2010-02-15.10:05:42 |
|
I posted on the wrong issue. Sorry about that!
|
| msg11717 (view) |
Author: tonfa |
Date: 2010-02-15.10:04:37 |
|
What's the status?
|
| msg11665 (view) |
Author: antmar |
Date: 2010-02-11.00:28:06 |
|
On the mailing list, Greg Ward mentioned that caching annotations will be
especially useful with the more accurate algorithm mentioned here [1].
If this algorithm was implemented in the annotate command, would there be
interest to integrate the annotation caching code (see [1]) in Mercurial core?
Alternatively, this could be made part of the cache-annotations extension.
[1] http://selenic.com/pipermail/mercurial-devel/2010-January/018304.html
|
| msg10556 (view) |
Author: gward |
Date: 2009-09-23.16:27:35 |
|
On the mailing list, MPM explained the problem:
"""
The problem is that the file-level DAG is a subset of the changelog DAG
and has no way of representing two changes with (identical delta, same
parents). The only fix is to walk the entire changelog searching for all
changes that touch a given file to rebuild a complete graph. There's not
even a good way of detecting you might hit the problem without walking
the cset DAG.
Now consider that this will make all annotates on a large project take
10-100 times as long while being more accurate in only a small subset of
cases (you must have identical changes with identical parents and no
subsequent merge).
"""
So it sounds like there is no obvious/easy fix.
This may be vaguely related to issue1327.
|
| msg10555 (view) |
Author: gward |
Date: 2009-09-23.16:26:30 |
|
For the record, here's how I created this test repository:
hg init annotate
cd annotate
echo hello > hello.txt
hg ci -A -m'init'
echo world >> hello.txt
hg ci -m'change on branch (will not merge)'
hg up 0
echo world >> hello.txt
hg ci -m'duplicate change from branch (not a merge)'
sed -i -e 's/hello/goodbye/' hello.txt
hg ci -m"unrelated change"
|
| msg10554 (view) |
Author: gward |
Date: 2009-09-23.16:23:39 |
|
It appears that 'hg annotate' ignores topology and simply iterates
over revisions. Consider this history:
"""
@ changeset: 3:36b0380f33d2
| summary: unrelated change
|
| diff --git a/hello.txt b/hello.txt
| --- a/hello.txt
| +++ b/hello.txt
| @@ -1,2 +1,2 @@
| -hello
| +goodbye
| world
|
o changeset: 2:5bb8f2b05956
| summary: duplicate (not merge) patch from branch
|
| diff --git a/hello.txt b/hello.txt
| --- a/hello.txt
| +++ b/hello.txt
| @@ -1,1 +1,2 @@
| hello
| +world
|
| o changeset: 1:0e02d370eb6f
|/ summary: edit on branch (will not merge)
|
| diff --git a/hello.txt b/hello.txt
| --- a/hello.txt
| +++ b/hello.txt
| @@ -1,1 +1,2 @@
| hello
| +world
|
o changeset: 0:62107b5aea16
summary: init
diff --git a/hello.txt b/hello.txt
new file mode 100644
--- /dev/null
+++ b/hello.txt
@@ -0,0 +1,1 @@
+hello
"""
If I run
hg annotate -r3 hello.txt
then I want to know "how did each line get into revision 3?", not "how
did each line get into the repository as a whole?". But it seems like
I'm getting the answer to the latter question:
"""
$ hg annotate -r3 hello.txt
3: goodbye
1: world
"""
The problem is that rev 1 is not an ancestor of rev 3, since it never
got merged. So changes in 1 cannot affect what's in 3. Instead, the
patch in 1 was duplicated in 2, and *that* is the changeset I wanted
"annotate" to pinpoint.
|
|
| Date |
User |
Action |
Args |
| 2010-02-15 14:07:27 | gward | set | nosy:
tonfa, mg, parren, gward, friedrich, antmar messages:
+ msg11726 |
| 2010-02-15 13:28:50 | tonfa | set | nosy:
tonfa, mg, parren, gward, friedrich, antmar messages:
+ msg11725 |
| 2010-02-15 13:18:21 | parren | set | nosy:
tonfa, mg, parren, gward, friedrich, antmar messages:
+ msg11724 |
| 2010-02-15 10:07:59 | tonfa | set | nosy:
tonfa, mg, parren, gward, friedrich, antmar messages:
+ msg11720 |
| 2010-02-15 10:05:42 | tonfa | set | nosy:
tonfa, mg, parren, gward, friedrich, antmar messages:
+ msg11719 |
| 2010-02-15 10:04:37 | tonfa | set | nosy:
+ tonfa messages:
+ msg11717 |
| 2010-02-11 00:28:06 | antmar | set | nosy:
+ antmar messages:
+ msg11665 |
| 2010-01-31 22:34:01 | mg | set | nosy:
+ mg |
| 2009-10-20 15:29:45 | friedrich | set | nosy:
+ friedrich |
| 2009-09-24 07:04:08 | parren | set | nosy:
+ parren |
| 2009-09-23 16:27:36 | gward | set | messages:
+ msg10556 |
| 2009-09-23 16:26:30 | gward | set | status: unread -> chatting messages:
+ msg10555 |
| 2009-09-23 16:23:40 | gward | create | |
|