netsurf/docs/ideas/boxes.txt

Box tree dynamism
=================

The first building block needed for event-driven layout is for the box tree
construction and subsequent maintenance to become event driven and for the
box tree management implementation to emit events marking parts of the tree
changed so that the layout calculator can react to them.

We expect the box tree to be in normal form at all times (and certainly
whenever an event is emitted). The box tree manager must ensure that this is
the case.

[Note: the box tree manager could be plumbed into the existing content handler
with a little effort: have the box tree constructed dynamically during the
parse, but ignore box tree events and perform layout computation at the same
place as currently. As box tree events are ignored, subsequent changes to the
tree will not result in updated layout computations -- this might produce weird
effects, so an option might be to forcibly relayout the entire tree on receipt
of box tree change events received after the initial layout has been performed.
This would not be substantially different from the existing reformatting on
viewport dimension changes, and similarly heavyweight]

The box tree manager needs to know when any of the various inputs that
determine the shape of the box tree changes. These inputs comprise:

  1. Style changes
  2. DOM changes
  3. Dynamic pseudo classes
  4. Other non-DOM changes (predominantly Media Queries)

The following sections consider the each of these inputs and attempts to scope
out the impact of changes made to them.


Style changes
-------------

Style information can change as the source document is parsed and as external
stylesheets are fetched. Script may also modify style information via the CSS
Object Model or simply injecting its own style attribute values into Elements.
Changes to any of this information has the potential to cause wide-ranging
ripples across the box tree structure.


DOM changes
-----------

Given CSS 3 selectors and combinators, the impact of DOM changes is thus:

  1. Adding a new Element node:
       Selection must be re-run for all children of the new node's parent
       and their descendants (i.e. all trees derived from the new node and
       its siblings).
       [XXX: is this correct?] If it is known that neither the :nth-last-child()/:nth-last-of-type()
       pseudo-classes nor the sibling combinators are in use, the selection
       re-run can be limited to the preceding siblings and the new node (and
       their descendants).
  2. Removing a new Element node:
       Selection must be re-run for all children of the node's former parent
       and their descendants (i.e. all trees derived from the node's siblings)
  3. Moving an Element node:
       This is effectively a removal (see: 2) followed by an addition (see: 1)
       There is an optimisation to be had if the node is attached to one of
       its ancestors: in that case, selection need not be re-run on removal
       (as all of the affected nodes will be in a subtree of the node's
       siblings on re-insertion).
  4. Addition/Removal/Modification of an Element node's Attributes:
       Selection must be re-run for all children of the Element node's parent
       and their descendants (i.e. all trees derived from the node and its
       siblings).


Dynamic pseudo classes
----------------------

These require dynamic re-selection based on user activity. The most
aggressive of which is :hover (as, in theory, causes changes whenever the
user moves the mouse). So this probably requires the presence of some
debouncing logic to avoid excessive reselection every time the mouse
pointer moves 1px in any direction.

The set of affected nodes is as for node addition (where the directly
affected node serves as the added node).


Other Non-DOM changes
---------------------

Media queries will require re-selection to happen whenever any of the
queries properties (e.g. viewport dimension) changes. The impact in this
case, however, is completely unconstrained -- potentially the entire document
tree may require reselection.


Change detection
----------------

In general we want, wherever possible, to minimise the amount of work performed
to make the box tree reflect reality. This will be achieved through aggressive
filtering out of unnecessary work and reuse of existing results where possible.

The following sections consider change detection for each of the information
sources.


Detecting selection context changes
-----------------------------------

Style selection for a node is performed using a selection context
(simplistically: an ordered list of stylesheets). If a suitable hashing
scheme can be devised such that we can detect:

  a. changes to the contents of a stylesheet (e.g. as a result of rules
     or properties being added/removed/changed via the CSS Object Model)
  b. addition/removal/reordering of stylesheets within a selection context

and we record the aggregation of this information as a single hash value
within the selection context and also record this hash value alongside the
computed style for a node, then it becomes trivial to compare the recorded
value with the live value within the selection context to determine if there
is any need to do any work.

If work is necessary, and assuming no ability to determine the differences
between the prior and current set of input styles, a change to the selection
context effectively invalidates the computed styles for the entire tree. This
may require further optimisation, however.


Detecting DOM changes
---------------------

Changes to a given Element node that might affect style selection rules are:

  1. The Attributes associated with the Element

(no part of the node's QName can be altered after it is created).

In addition, the Element's position within the DOM also affects the set of
selection rules that apply to it. Specifically, given the existence of
ancestor and sibling combinators, changes to any part of the tree above and
to the left of an Element can affect its styling.

In other words, for a given Element node, we care about changes to:

  1. the Element itself
  2. any preceding sibling Element of the Element
  3. any of its ancestor Elements
  4. any of the preceding sibling Elements of its ancestor Elements

In the case where the tree structure itself does not change (i.e. only changes
to Attributes are in play), it is sufficient to mark the modified Element as
dirty and then proceed on the basis that the subtree of the Element's parent
element is dirty as a result (and, given that changes to an Element cannot
affect its preceding siblings, the processing of the parent's children can
skip over any that occur before a dirty child is encountered).

Where an Element (potentially the root of a subtree) is inserted into the DOM,
the Element itself should be marked dirty and its parent's subtree should be
processed (again, any child of the parent preceding a dirty child can be
skipped).

Where an Element is removed from the DOM, its (pre-removal) subsequent sibling
should be marked dirty and its (pre-removal) parent's subtree should be
processed (again, any child of the parent preceding a dirty child can be
skipped).

Under the (old) Mutation event model, we can use the following events to mark
nodes as dirty (and enqueue their parents for consideration):

  1. DOMAttrModified
     - Dispatched after the modification (addition/modification/removal) has
       happened
     - The event target is the Element node to mark dirty
     - The relatedNode field identifies the modified Attr
     - Its parent Element must be searched for
  2. DOMNodeInserted
     - Dispatched after the insertion is complete
     - The event target is the inserted node (which may not be an Element)
     - The relatedNode field identifies the parent node (again, may not be
       an Element)
  3. DOMNodeRemoved
     - Dispatched before the removal has begun
     - The event target is the removed node (which may not be an Element)
     - The relatedNode field identifies the parent node (again, may not be
       an Element)

Determining whether styling for descendants of the dirty node will change
requires an efficient way of evaluating whether the change to the dirty node
causes a change in the applicable selectors. One way might be to consider
ancestor/parent combinators where the dirty node appears on the left hand side
(i.e. as the ancestor/parent). If we can efficiently detect no change in the
potential set of applicable selectors (which will necessarily require knowing
which ones matched last time), then we can minimise the number of descendant
nodes for which we need to perform a full reselection.

Similarly, for subsequent siblings of the dirty node (and thus, their
descendants), we want to consider the (previous) sibling combinators where the
dirty node appears on the left hand side (i.e. as the preceding sibling).
Again, if there is no change to the set of applicable selectors, there is no
need to reselect for any subsequent sibling of the dirty node or their
descendants.

Note, however, that all dirty nodes in a tree need to be processed, so event
processing should not try to short circuit here.

In addition to DOM changes which might affect the application of styling, we
are also interested in changes to text content which may cause the creation,
destruction, modification, or invalidation of boxes intended to contain the
content. To do this, we will use the following events, marking the containing
Element as requiring reflow (this is distinct from marking it dirty for the
purposes of style selection):

  1. DOMCharacterDataModified
     - Dispatched after the modification has happened
     - The event target is the Text (or other, uninteresting) node
     - The relatedNode field identifies the parent node (which may not be an
       Element)
     - Its parent Element must be searched for
  2. DOMNodeInserted
     - Dispatched after the insertion is complete
     - The event target is the inserted node (which may not be a Text node)
     - The relatedNode field identifies the parent node (again, may not be
       an Element)
  3. DOMNodeRemoved
     - Dispatched before the removal has begun
     - The event target is the removed node (which may not be a Text node)
     - The relatedNode field identifies the parent node (again, may not be
       an Element)

Detecting changes to pseudo classes
-----------------------------------

Changes to structural pseudo classes should fall out of the generic DOM change
detection described above: the DOM tree will have to change for these to do so.

Dynamic pseudo classes, however, are a right pain, particularly in the case
where you have rules containing selectors that include combinators in
conjunction with pseudo classes. In these cases, the pseudo class might become
engaged on any other element in the tree (be that ancestral or siblings, or
both) surrounding the target element. Thus, there's no trivial way to
pre-compute an answer and test against it efficiently.

Consider something like this:

div:first-child + :hover wibble :last-child > foo { ... }

This will match a "foo" element whose parent is the last child of "foo"'s
grandparent and "foo"'s parent has an ancestor element, "wibble" and "wibble"
has an ancestor element that the user is hovering over and that ancestor
element is immediately preceded by a "div" that is the first child of their
parent. If the user isn't hovering over the intermediate ancestor, this rule
will not match, regardless of the shape of the document tree.

To make matters even more involved, the user may be considered to be hovering
over the intermediate ancestor as a result of hovering over some other
descendant of that ancestor. Also, because any ancestor of "wibble" with a
preceding first child "div" will do, the identity of that ancestor could
equally well change but the rule still match.

Perhaps, therefore, it becomes necessary to gather the set of nodes for which
each dynamic pseudo class is active and record this on the document element.
Then, for each subsequent selection, gather the set of nodes to which these
pseudo classes apply at that time. This gives two sets:

  a. The previous set of nodes for which each dynamic pseudo class was active
  b. The current set of nodes for which each dynamic pseudo class is active

The disjunctive union of these two sets produces a set of nodes for which the
active state of each dynamic pseudo class has changed between selections. Any
descendant of any node in this set, or of any sibling of any node in this set,
will require reselection.

The current set of nodes for which each dynamic pseudo class is active will be
recorded on the document element each time. [Note: it may be possible to
minimise the set sizes by taking account of the rules involving the relevant
pseudo class -- in the extreme case, no rules use the relevant pseudo class and
so the set will always be empty].


Detecting other non-DOM changes
-------------------------------

Media Queries Level 4 permits testing against a variety of mostly-static
properties of the rendering device. Some properties may change dynamically
(e.g. viewport width/height when the window is resized). Assuming that all
property values are always provided and they do not change regularly, the
simple approach would be to hash the information provided (i.e. property
names and values) and store the hash alongside computed styles. If the
hash value changes, then reselection must occur.

If the hash value changes, a more granular additional test would be to apply
the input properties to all known media queries and cache the results, thus
ensuring that range-based properties continue to match until they don't even
if there are slight changes to their declared values. The assumption is that
this matching will be performant (if it is not, more intelligence is likely
required).


Construction nuances
--------------------

Avoiding repetitious selection during document construction (assuming no
scripting causing DOM changes) would be desireable. Given the rules above,
performing selection for the Element node children of a node is best done
after the last child of the parent has been added to the document (i.e. when
the parser encounters the parent Element's closing tag in well-formed
document source).

This may, however, be difficult to detect reliably from the DOM event stream
(where only node insertion/removal events are emitted). In a well-formed
document, we know that a node's subtree is complete once either:

  a) a subsequent sibling is inserted into the node's parent; or
  b) a subsequent relation is inserted into one of the node's ancestors

In the case of the document root node, the completion of the parse is the
relevant point (as, until then, children may be changing).

Where the source document is not well-formed (i.e. the likes of the HTML5
adoption agency are in play), it is possible for nodes to be moved around
the tree or inserted in unlikely locations in the first place. Where this
happens, it will be difficult to avoid repetitious selection in a reliable
manner.

In either case, if repetitious selection is avoided at all, the box tree
will not be complete until the entire document parse has ended (and the
event-based box tree builder must take great care to ensure the tree is
in normal form at all times).

Assuming a well-formed input, all stylesheet references and inline styles
should be known when the Body Element is inserted into the document. There
is no point attempting to build any part of the box tree until all the style
information is available, so waiting until all of the external stylesheets
have been fetched is probably sensible (perhaps with some timeout to avoid
waiting forever -- if style information arrives late, then too bad, you get
to enjoy a load of repetitious selection).


Migration considerations
------------------------

The existing box tree structure conflates a variety of things. Specifically,
the box tree is not "just" a structure containing nodes to be laid out but it
also contains large swathes of information that is already stored in the DOM,
and also pieces of information that have no bearing on how the document is
laid out or displayed (e.g. form gadgets, which really belong on the DOM, and
object parameters).

Additionally, there are special-case box types for floated boxes (even using
different types to distinguish between left and right floated boxes),
linebreaks, and flex-layout boxes. There are also custom INLINE_CONTAINER
boxes (which serve as a transparent container for INLINE boxes when the
containing block only contains INLINE children, but act as a BLOCK box when
the containing block has both BLOCK and INLINE children). There's also the
magical INLINE_END box type which represents the far end of a fragmented
INLINE box.

Retaining the myriad box types (including the special cases) is possible, but
care will need to be taken to ensure that the box tree remains in normal form
as and when the type of a box changes (i.e. because the display or float
property values change). However, retaining the existing box types does
restrict what we can do (as they effectively lose information we would
otherwise want to retain). Therefore, the most sensible course of action
would likely be to hide the box type information behind some API surface
which the layout calculator can use to obtain it. This would allow the
existing layout calculator to obtain the legacy box type without needing to
bake this into the modern box model.

Boxes also contain various pieces of information created by the layout
calculator and stashed away for future use. Some generic mechanism to permit
the layout calculator to associate state with a box is is probably needed
(with such information being invalidated whenever the relevant section of the
tree changes).

There is some magic handling of trailing space characters in text elements
(simplistically: they're flagged on the preceding text box). This was probably
an optimisation at the time (as the box structure is relatively large), but
makes many things more complex. This optimisation should be removed entirely
and, instead, the boxes should reference the underlying text data attached to
the source DOM nodes (Note: some consideration needs to be given to whitespace
squashing here, to ensure that continues to happen in a sensible way,
preferably without duplicating the entire textual contents of the DOM into the
box tree).

Ideally, then, the box tree contents will be reduced to solely the information
needed to lay the document out and render it. Where possible, anything that is
already stored on the DOM should not be duplicated (as it is easy to obtain the
corresponding DOM node from the box and then query that).

It would be nice to retire talloc, too, as object lifetimes are now
well-defined.


Outwith the HTML content handler, there are several parts of the code that
expect to be able to access box objects and their contents:

  1. iframe handling

     This involves unpleasant collusion between the HTML content handler and
     the desktop UI. This layering violation should be cleaned up by
     introducing a proper orthogonal interface (if needed, or something else,
     if not). The public "html_redraw_a_box" API should die with it.

  2. save as text

     This is a similar layering violation -- the code is simply in the wrong
     place. It should be moved down into the HTML content handler and the
     higher level code should simply dispatch the operation through the content
     system.

There are also the following things that grub around inside the DOM to fish
out the box pointer:

  1. The JavaScript Canvas binding

     This is done so it can then call html__redraw_a_box, which is an API
     private to the HTML content implementation. Bad Daniel, no biscuit.

     The fix here is probably to have the binding fire a "redraw needed" event
     at the canvas dom_node (which it already knows) and for that to be caught
     and handled by the HTML content handler itself.

Additionally there are a few places that leak box pointers through APIs:

  1. The content_html_object struct contains the box pointer

     This is used by:

       a. save complete

          This simply checks that the pointer is non-NULL to determine whether
          to save the corresponding content for the object. Like save as text
          (described above), this arises as a result of a layering violation.
          The implementation should be pushed down into the HTML content
          handler and the higher level code should simply dispatch the
          operation through the content system. (This would also permit
          retirement of the similarly anachronistic html_get_document API).

       b. reload

          This cares not about the box pointer, but does process the objects to
          invalidate their caching status. Again, a layering violation: push
          the implementation down and invoke it via the content system.

  2. The textsearch_bounds content interface accepts box pointers (as does
     content_textsearch_add_match)

     While this is internal to the content infrastructure it would be better
     all round for the exposure of box pointers here to go away and be replaced
     with something more orthogonal.

There are also places where box pointers might once have been used and no
longer are, but where vestiges remain:

  1. The core text selection object contains a pointless box pointer field
     (it is initialised to NULL and never used)
  2. The Windows frontend's browser_mouse object contains a pointless box
     pointer field (it is never used).

  These should just be cleaned up.


Layout calculation
------------------

The layout calculators will be notified about changes to the box tree. Change
events will report the containing block-boxes in which the alterations were
made. This is analogous to the DOM's SubtreeModified event which is targetted
at the parent Element containing the modified subtree. (TBD whether one event
per containing block or one fat event, or some combination of the two -- in the
situation where positioned and/or floated boxes are involved, changes to the
in-flow box tree structure may result in *multiple* parent containing blocks
being affected).

The layout calculators will operate on the box tree to lay it out for a given
viewport width (where possible, only needing to reflow the altered containing
block contents, thus performing the minimal amount of work). The resulting
layout information should be cached on each box tree element (potentially with
some magic for inlines, as they get (de)fragmented depending upon available
space).

Once the position and dimensions of all boxes have been determined, the entire
document should be rendered *unclipped* to produce the intermediate structure
used for redraw. We expect this step to produce a stack of device-independent
render operations (one stack level per layer).

"Pictorally":

  [canvas] -> [command, ...]
  [     1] -> [command, ...]
  [     2] -> [command, ...]
  [     3] -> [command, ...]
  [     4] -> [command, ...]

where the document is then drawn layer-by-layer (canonically: starting at the
canvas and moving down the stack, overdrawing each layer on top of what has
been drawn before). [Note: the layers in this structure may not directly
reflect the z-indexes specified in the source stylesheets in part as a result
of the magical handling of elements which can create new stack levels -- see
Appendix E of CSS2.1 for the gory details].

With the exception of text plotting commands, the render operations are
entirely independent of the originating document and box tree. Text commands
will necessarily reference the source text data by holding references to the
relevant DOM string data, and whatever metadata is needed to index into it.
(As this data is referenced, if the DOM changes, the old text will continue
to be rendered safely until the redraw stack is updated).

To ensure that the document may be redrawn at any time, regardless of the
underlying state of the DOM, box tree, or layout calculation, and to ensure
that redraw output is consistent the redraw stack must be updated atomically
(simplistically: build a new one, flip the old and new pointers, and destroy
the old one; more optimised solutions are likely available).

[Note: we omit the precise detail about how the layout calculation is perfomed
as this is left for future work. The overall event flow and decoupling of
layout from rendering is, however, important to record].


Redraw
------

Redraw operates solely using the render stack. This is the point that clipping
is applied (e.g. to skip over any commands that would render outside the clip
boundary). Additionally, if output scaling is desired, it can be applied at
this point, too (i.e. by transforming the device-independent render operations
into device-dependent ones -- just apply a transformation matrix, right?).

Multiple buffering, if desired, is left up to the plotter implementation
provided by the client (as different frontends may wish to do things
differently).

[Note: again, the precise detail is omitted here].