Issue1839

Title annotate ignores topology of changelog graph (sometimes?)
Priority bug Status chatting
Superseder Nosy List antmar, friedrich, gward, mg, parren, tonfa
Assigned To Topics

Created on 2009-09-23.16:23:40 by gward, last changed 2010-02-15.14:07:27 by gward.

Messages
msg11726 (view) Author: gward Date: 2010-02-15.14:07:27
I complained about this family of bugs on the mailing list recently
(specifically, on the unreliability of 'hg log <file>').  Matt sketched a
possible solution in his reply.  I'll just paste it in here for the record.

"""
I bet there's actually a relatively lightweight algorithm for
reconstructing a complete changeset to file revision mapping that would
solve all the problems in this area (log, annotate, deletions, and
merge). Consider:

cset graph:

1-2-3-4-5-6-7

file graph:

a2-b7  (number is the linkrev)

The file graph implies that csets 2-6 all contain revision a with no
deletions. And we also know it's not present in 1. We can't delete the
file and reintroduce it without adding a parentless node to the file
graph. Now consider:

cset graph:

1-2-3-4-5-6-10-11-12-13
    \
     7-8-9

file graph:

a2-b4-c9
   \
    d10

Again, not present in 1, a in 2 and 3, b in 4. But what's happening in
c9? The parent cset of 9 must contain b (the duplicate revision on
branch case), which means b is in 8. Which leaves us with either a or b
in 7. And we can quickly test whether the file was modified in cset 7
(appears in the ctx.files()) without doing an expensive manifest lookup.

We also know that csets 5 and 6 must contain b because no other
revisions can appear between 4 and 10.

That leaves csets 11-13. The only thing that can happen here is for the
file to be deleted or remain revision d. It can't switch to c because c
is not a descendant of d. There are a couple tests we can make here. If
the file is present in 13 (via slow manifest lookup), we know it's
present in 11 and 12. Alternately, if it's not changed in 11 and 12 (via
two fast cset checks), we know it's present in 13.

I'm not sure what a complete algorithm looks like (and there are a bunch
of other cases to be considered!), but it seems pretty clear there's
enough implied information here that we can figure most of it out
directly from the graphs, without more expensive cset and manifest
lookups.

(Which is fortunate, because changing the schema is still out of the
question.)
"""
msg11725 (view) Author: tonfa Date: 2010-02-15.13:28:50
Is there some write-up somewhere? Even if it's very little.
msg11724 (view) Author: parren Date: 2010-02-15.13:18:21
There was a bit of discussion on it at the sprint actually, but it did
not get far.
msg11720 (view) Author: tonfa Date: 2010-02-15.10:07:59
This was actually a proposed topic at the sprint (reconciliating file- and 
changelog- graphs). Too bad we didn't find the time to work on that.
msg11719 (view) Author: tonfa Date: 2010-02-15.10:05:42
I posted on the wrong issue. Sorry about that!
msg11717 (view) Author: tonfa Date: 2010-02-15.10:04:37
What's the status?
msg11665 (view) Author: antmar Date: 2010-02-11.00:28:06
On the mailing list, Greg Ward mentioned that caching annotations will be 
especially useful with the more accurate algorithm mentioned here [1].

If this algorithm was implemented in the annotate command, would there be 
interest to integrate the annotation caching code (see [1]) in Mercurial core? 
Alternatively, this could be made part of the cache-annotations extension.


[1] http://selenic.com/pipermail/mercurial-devel/2010-January/018304.html
msg10556 (view) Author: gward Date: 2009-09-23.16:27:35
On the mailing list, MPM explained the problem:

"""
The problem is that the file-level DAG is a subset of the changelog DAG
and has no way of representing two changes with (identical delta, same
parents). The only fix is to walk the entire changelog searching for all
changes that touch a given file to rebuild a complete graph. There's not
even a good way of detecting you might hit the problem without walking
the cset DAG.

Now consider that this will make all annotates on a large project take
10-100 times as long while being more accurate in only a small subset of
cases (you must have identical changes with identical parents and no
subsequent merge).
"""

So it sounds like there is no obvious/easy fix.

This may be vaguely related to issue1327.
msg10555 (view) Author: gward Date: 2009-09-23.16:26:30
For the record, here's how I created this test repository:

hg init annotate
cd annotate
echo hello > hello.txt
hg ci -A -m'init'
echo world >> hello.txt 
hg ci -m'change on branch (will not merge)'
hg up 0
echo world >> hello.txt
hg ci -m'duplicate change from branch (not a merge)'
sed -i -e 's/hello/goodbye/' hello.txt 
hg ci -m"unrelated change"
msg10554 (view) Author: gward Date: 2009-09-23.16:23:39
It appears that 'hg annotate' ignores topology and simply iterates
over revisions.  Consider this history:

"""
@  changeset:  3:36b0380f33d2
|  summary:    unrelated change
|
|  diff --git a/hello.txt b/hello.txt
|  --- a/hello.txt
|  +++ b/hello.txt
|  @@ -1,2 +1,2 @@
|  -hello
|  +goodbye
|   world
|
o  changeset:  2:5bb8f2b05956
|  summary:    duplicate (not merge) patch from branch
|
|  diff --git a/hello.txt b/hello.txt
|  --- a/hello.txt
|  +++ b/hello.txt
|  @@ -1,1 +1,2 @@
|   hello
|  +world
|
| o  changeset:  1:0e02d370eb6f
|/   summary:    edit on branch (will not merge)
|
|    diff --git a/hello.txt b/hello.txt
|    --- a/hello.txt
|    +++ b/hello.txt
|    @@ -1,1 +1,2 @@
|     hello
|    +world
|
o  changeset:  0:62107b5aea16
  summary:    init

  diff --git a/hello.txt b/hello.txt
  new file mode 100644
  --- /dev/null
  +++ b/hello.txt
  @@ -0,0 +1,1 @@
  +hello
"""

If I run

  hg annotate -r3 hello.txt

then I want to know "how did each line get into revision 3?", not "how
did each line get into the repository as a whole?".  But it seems like
I'm getting the answer to the latter question:

"""
$ hg annotate -r3 hello.txt
3: goodbye
1: world
"""

The problem is that rev 1 is not an ancestor of rev 3, since it never
got merged.  So changes in 1 cannot affect what's in 3.  Instead, the
patch in 1 was duplicated in 2, and *that* is the changeset I wanted
"annotate" to pinpoint.
History
Date User Action Args
2010-02-15 14:07:27gwardsetnosy: tonfa, mg, parren, gward, friedrich, antmar
messages: + msg11726
2010-02-15 13:28:50tonfasetnosy: tonfa, mg, parren, gward, friedrich, antmar
messages: + msg11725
2010-02-15 13:18:21parrensetnosy: tonfa, mg, parren, gward, friedrich, antmar
messages: + msg11724
2010-02-15 10:07:59tonfasetnosy: tonfa, mg, parren, gward, friedrich, antmar
messages: + msg11720
2010-02-15 10:05:42tonfasetnosy: tonfa, mg, parren, gward, friedrich, antmar
messages: + msg11719
2010-02-15 10:04:37tonfasetnosy: + tonfa
messages: + msg11717
2010-02-11 00:28:06antmarsetnosy: + antmar
messages: + msg11665
2010-01-31 22:34:01mgsetnosy: + mg
2009-10-20 15:29:45friedrichsetnosy: + friedrich
2009-09-24 07:04:08parrensetnosy: + parren
2009-09-23 16:27:36gwardsetmessages: + msg10556
2009-09-23 16:26:30gwardsetstatus: unread -> chatting
messages: + msg10555
2009-09-23 16:23:40gwardcreate