Data Model Issues and Assumptions

Assumptions

We model only one person (and possibly their assistant), not groups of individuals. However, a presentity can be an aggregate name for a group of people. For example, the unified presence information for all sales staff can be represented by the presentity sales@example.com. An outside observer cannot tell whether sales@example.com is a single individual or a group whose presence information has been aggregated into one presentity.
While a person describable by SIMPLE presence information can physically be in only one place, the devices reporting that person's state may (inadvertently or intentionally) be in several places. An observer (watcher or composer) can therefore only assign probabilistic estimates as to where the person "really" is. (We will, however, not attempt to describe people by Schrödinger equations.)
We describe presentities as a set of <tuple>, <device> and <person> elements. (There can be multiple tuples and devices; the number of person elements is contentious.)
Activities, mood, place-type, privacy, sphere, status-icon and time-zone describe a person, even if the information is derived from a device or service. They are contained within a <person> element.
<user-input> describes a service (tuple) or device.
A watcher may be able to resolve the same URI appearing in two tuples into two different services by mechanisms such as caller preferences or session-level negotiation.
The data model MUST work with elementary composers that only concatenate or replace existing tuples. (It MUST also work with smart composers to be specified in the future, but that seems less likely to be a problem.)
Presence data for a single person or device may be published by different publishers (e.g., different devices) that are unaware of each other's existence.
In some cases, the watcher is smarter than the composer, particularly because the watcher may be human and the composer is almost always a program. Such watchers should not be denied information due to constraints in the data model.
We cannot assume that a composer will be able to resolve all contradictions or even recognize a contradiction. (For example, if activities "on-the-phone" and "in-a-meeting" are reported, this may be true or conflicting information.)
The data model must work correctly, including for replacement, with composers that do not understand a certain service URI scheme and that, in particular, may not know whether a URI is "special" (AOR, GRUU, whatever).
The data model must allow the watcher to deal with uncertainty and contradictory information, rather than forcing the composer to do so.
It must be possible to construct non-lossy composers, i.e., composers that pass all data to the watcher, since in some cases the watcher may have better information than the composer about the reliability or relevance of information. (One example is a watcher that is itself part of another PA that aggregates information from multiple PAs and may have access to external information or algorithms.)
The data model cannot rely solely on publication time to override earlier data, since later information is not necessarily better. (Example: What happens if the phone stops publishing on-the-phone after a call that overrode 'in-a-meeting'?)
Alignment with XCAP mechanisms and tuple identification is desirable.
At some later point, views should be labeled with source information. We don't have to solve the problem of this metadata now, but need to provide the right data model so that source labels are likely to work well.
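
The hazard of overriding purely by publication time, and the benefit of a non-lossy composer, can be sketched in a few lines. This is only an illustration under stated assumptions: the Report record, the two composer classes and the "calendar" and "phone" source labels are hypothetical names, not part of any SIMPLE or RPID schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    source: str    # hypothetical publisher label, e.g. "calendar" or "phone"
    time: int      # publication time in arbitrary ticks
    activity: str  # e.g. "in-a-meeting" or "on-the-phone"


class TimeOnlyComposer:
    """Keeps a single slot; a later report overwrites an earlier one."""

    def __init__(self):
        self.current = None

    def publish(self, report):
        if self.current is None or report.time >= self.current.time:
            self.current = report

    def withdraw(self, source):
        # The overwritten report is gone, so nothing can be restored.
        if self.current is not None and self.current.source == source:
            self.current = None

    def view(self):
        return [self.current] if self.current is not None else []


class NonLossyComposer:
    """Keeps the latest report per source and passes all of them on."""

    def __init__(self):
        self.by_source = {}

    def publish(self, report):
        self.by_source[report.source] = report

    def withdraw(self, source):
        self.by_source.pop(source, None)

    def view(self):
        return sorted(self.by_source.values(), key=lambda r: r.time)


time_only = TimeOnlyComposer()
non_lossy = NonLossyComposer()
for composer in (time_only, non_lossy):
    composer.publish(Report("calendar", 10, "in-a-meeting"))
    composer.publish(Report("phone", 20, "on-the-phone"))
    composer.withdraw("phone")  # the call ends and the phone stops publishing

print(time_only.view())
print(non_lossy.view())
```

In this sketch the time-only composer ends up with no activity at all once the call ends, whereas the non-lossy composer still reports the meeting and leaves the judgment to the watcher.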

Proposal

Comparison and replacement use only the element identifiers of tuples, persons and devices; URIs are not used for this purpose.
Default composition policy is to take the most recent of tuples with the same tuple/person-id, and retain all tuples with different tuple/person-ids (even if they have identical contact URIs), using the interpretation of multiple <person> elements defined earlier.
Different composition policies that never publish more than one <person> record are possible by having the composer discard or merge information.
Persons are somewhat different from regular tuples and devices since person information for one person is very frequently collected from a variety of device and service sensors that see aspects of the person. Examples include calendars, phones ("on-the-phone") and devices (location information, including categorical location information).
Each <person> is labeled with a view-id (or source-id?). The same rule as for tuples applies, i.e., a <person> element replaces one with the same view-id.
A watcher treats multiple <person> elements as alternate views of the state of the person. In the future, source-describing meta data may enable the watcher to better judge the value of these elements. Initially, information such as publication time or external information may help. (Example: my calendar publishes "lunch" at noon every day, based on EST. If the watcher knows that I'm in Japan, it will discount that information, even without knowing the source details. It is unlikely that a composer would be able to do this.)
If a watcher supports caller preferences or other source selection mechanisms not based on the URI (e.g., Accept headers in HTTP), it can render the multiple tuples with the same service URI as distinct contacts. If not, it can merge the OPEN tuples for user interface purposes since they are indistinguishable to the watcher. The capabilities are then the union of the capabilities of the tuples.
The same considerations apply to devices if it is considered likely that the state of one device will be published by several entities.
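
The default composition policy and the watcher-side merge described in this proposal can be sketched as follows. The Element record, its field names and the two function names are assumptions made for illustration; only the rules themselves (replace by identifier, never by URI, and take the union of capabilities of OPEN tuples sharing a contact URI) come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    kind: str        # "tuple", "person" or "device"
    elem_id: str     # tuple id, device id, or view-id for a <person>
    published: int   # publication time in arbitrary ticks
    contact: str = ""                      # service URI, tuples only
    status: str = "open"                   # "open" or "closed"
    capabilities: frozenset = frozenset()  # hypothetical capability labels


def compose(elements):
    """Keep the most recent element per (kind, id); never compare URIs."""
    latest = {}
    for e in elements:
        key = (e.kind, e.elem_id)
        if key not in latest or e.published >= latest[key].published:
            latest[key] = e
    return list(latest.values())


def merge_open_for_display(elements):
    """Watcher side: union the capabilities of OPEN tuples per contact URI."""
    merged = {}
    for e in elements:
        if e.kind == "tuple" and e.status == "open":
            merged[e.contact] = merged.get(e.contact, frozenset()) | e.capabilities
    return merged


published = [
    Element("tuple", "t1", 1, "sip:alice@example.com", "open", frozenset({"audio"})),
    Element("tuple", "t2", 2, "sip:alice@example.com", "open", frozenset({"text"})),
    Element("tuple", "t1", 5, "sip:alice@example.com", "open",
            frozenset({"audio", "video"})),
    Element("person", "12xy", 1),
    Element("person", "12ab", 3),
]

composed = compose(published)
display = merge_open_for_display(composed)
```

Here the later t1 replaces the earlier t1, t2 is retained even though it carries the same contact URI, and both <person> views survive. A watcher without a source selection mechanism would then show one contact whose capabilities are the union over t1 and t2.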

Meta Data

Longer term, presence data should be taggable with meta data identifying its source, reliability and other information that allows the recipient to judge which pieces of contradictory data to believe. As part of the proposal above, one could envision meta data for each element, e.g., <person>, as in the wholly fictitious example below:

  <person view="12xy">
    <source>
      <provider>calendar</provider>
      <provider-domain>yahoo.com</provider-domain>
      <input>manual</input>
    </source>

    ... other person information ...
  </person>

  <person view="12ab">
    <source>
      <provider>body-sensor</provider>
      <provider-domain>bigbrother.com</provider-domain>
      <input>sensor</input>
    </source>

    ... other person information ...
  </person>

If elements from multiple sources are mixed, the RPID definition would have to allow multiple instances of each element, including <status>. Also, each element would have to be able to refer, by some name, to an external element describing the source information. Such external references ("pointers") increase the difficulty of data management, as the source information needs to be removed, for privacy reasons, once all referring elements have been removed. They also introduce additional error cases in the form of dangling pointers.
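
As an illustration of how a watcher might use such per-element source information, the sketch below parses the fictitious <person> elements from the example and orders the views by a watcher-local trust table. The enclosing <presence> element without namespaces, the TRUST weights and the ranked_views function are assumptions made for this sketch; nothing here is defined by RPID.

```python
import xml.etree.ElementTree as ET

# Fictitious document mirroring the example above; the <presence> wrapper
# is an assumption so that the fragment parses as a single XML document.
DOC = """
<presence>
  <person view="12xy">
    <source>
      <provider>calendar</provider>
      <provider-domain>yahoo.com</provider-domain>
      <input>manual</input>
    </source>
  </person>
  <person view="12ab">
    <source>
      <provider>body-sensor</provider>
      <provider-domain>bigbrother.com</provider-domain>
      <input>sensor</input>
    </source>
  </person>
</presence>
"""

# Hypothetical watcher-local weights per <input> value; unknown values get 0.
TRUST = {"sensor": 2, "manual": 1}


def ranked_views(xml_text, trust=TRUST):
    """Return every <person> view, most trusted first; nothing is discarded."""
    root = ET.fromstring(xml_text)
    views = []
    for person in root.findall("person"):
        views.append({
            "view": person.get("view"),
            "provider": person.findtext("source/provider"),
            "domain": person.findtext("source/provider-domain"),
            "input": person.findtext("source/input"),
        })
    return sorted(views, key=lambda v: trust.get(v["input"], 0), reverse=True)


for v in ranked_views(DOC):
    print(v["view"], v["provider"], v["input"])
```

The ranking only orders the views; all of them remain available, so a human watcher can still overrule the weights.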

Critical Open Issues

Non-Critical Open Issues

These issues may be deferrable.

Does CLOSED for a person imply global unreachability? Is there a need for a global override?

OMA PAG spec