8 Technologies to Structure Our Information

CHAPTER 8 Technologies to Structure Our Information

Even as we’ve considered technologies to save and to search in previous chapters, we’ve also considered structure.¹⁸⁵ Any search that we do is structured by its scope. For example, in our efforts to re-find, we may be searching within a specified folder or we may be searching for email only. The scope of any web search is implicitly “stuff that my search service searches for.” We shouldn’t fool ourselves that we’re searching the whole Web.¹⁸⁶

And then we use the structure of the information we’re searching through to further constrain the results returned by a search. We don’t, for example, want any email containing “Harry” but only those emails “From: Harry.”

In a web search, even a restrictive search is likely to return far more results than we have time to consider. Again, structure helps and, in fact, is indispensable. Google’s famous PageRank algorithm, for example, uses a recursive analysis of in-link structure to rank search results.¹⁸⁷
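To make "a recursive analysis of in-link structure" concrete, here is a minimal sketch of a PageRank-style power iteration. It is a simplified textbook version and not Google's production ranking. The function name `pagerank`, the damping value of 0.85, the treatment of pages with no outgoing links, and the four-page toy web are all our assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by in-link structure.

    `links` maps each page to the list of pages it links to.
    The damping value and iteration count are illustrative assumptions.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank along to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "c"; "c" links back to "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
best = max(ranks, key=ranks.get)
```

The recursion is visible in the loop: a page's rank depends on the ranks of the pages that link to it, which in turn depend on their own in-links. In the toy web, page "c" ends up ranked highest because it collects the most in-links.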

Structure also provides a basis for navigating to information, either as a complement to search or as a primary method of information access. And structure helps people to recognize a desired item in search results.¹⁸⁸ People can scan email subject lines returned by a search query or sort a list of documents such that those most recently modified appear first.

Most important, search through any large corpus depends upon the structure of an index. A search index is an “inversion” in which the documents of a corpus are grouped by terms extracted from an analysis of document content. When search terms match index terms, a listing of documents for a term can be generated quickly without a need to search exhaustively through all documents of a corpus. Without the index, our searches become a non-interactive “batch job” that we wait hours or even days to complete.
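The "inversion" can be sketched in a few lines. The sketch below assumes a very crude analysis of content (lowercasing and splitting on non-alphanumeric characters); the names `build_index` and `search` and the three-item corpus are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Crude term extraction: lowercase alphanumeric runs (an assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(corpus):
    """Invert a corpus: map each term to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return documents containing every query term, using only the index."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical corpus of three small information items.
corpus = {
    "mail-1": "Lunch with Harry on Friday",
    "mail-2": "Harry sent the budget report",
    "note-1": "Budget ideas for the garden",
}
index = build_index(corpus)
```

Once the index is built, `search(index, "Harry budget")` consults two short term listings rather than reading every document, which is what keeps search interactive as the corpus grows.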

We’ve also implicitly considered structure as we considered technologies to save our information. An item-event log (i.e., log), for example, is structured as a sequence of events where each event includes time, location, address for information item and an action taken (e.g., “open,” “close,” “create,” “delete,” “move,” “rename,” etc.). No matter what the fidelity of our “lifelog,” whether a sequence of URIs, pictures taken at regular intervals or full-motion video, the log as a whole is quickly too long to review in its entirety. We need some structure in the form of time- and location-based entry points, for example, if we’re to focus and selectively sample (e.g., for purposes of recollection, retrieval, etc.).
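As a rough illustration, an item-event log and its time- and location-based entry points might be sketched as follows. The field names, the sample events, and the helper functions `events_between` and `events_at` are our assumptions; the sketch also assumes the log is kept sorted by time.

```python
from bisect import bisect_left
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ItemEvent:
    time: datetime
    location: str
    item_address: str  # address (e.g., a URI) of the information item
    action: str        # "open", "close", "create", "delete", "move", "rename", ...


# Hypothetical log, kept in time order.
log = [
    ItemEvent(datetime(2013, 5, 1, 9, 15), "office", "file:///projects/report.docx", "create"),
    ItemEvent(datetime(2013, 5, 1, 17, 40), "office", "file:///projects/report.docx", "close"),
    ItemEvent(datetime(2013, 5, 2, 8, 5), "home", "https://example.org/hotels", "open"),
    ItemEvent(datetime(2013, 5, 2, 8, 30), "home", "file:///projects/report.docx", "open"),
    ItemEvent(datetime(2013, 5, 3, 11, 0), "cafe", "file:///projects/report.docx", "rename"),
]


def events_between(log, start, end):
    """Time-based entry point: events with start <= time < end."""
    times = [event.time for event in log]
    return log[bisect_left(times, start):bisect_left(times, end)]


def events_at(log, location):
    """Location-based entry point: events recorded at one location."""
    return [event for event in log if event.location == location]


may_2 = events_between(log, datetime(2013, 5, 2), datetime(2013, 5, 3))
at_office = events_at(log, "office")
```

Both helpers let us sample a slice of the log without reviewing it in its entirety.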

In other words, we can’t help talking about structure even in chapters where the focus has been on technologies to save or to search.

But in this chapter we consider technologies to structure in their own right.

What are these technologies to structure? The corresponding question is answered by example for save and search. Technologies to save include devices we might wear or mount for capturing information. Technologies to save include new ways to store ever more information, ever more cheaply, for ever longer periods of time.

As an application of search technology we think right away of the web search box and the sometimes seemingly magical return of the information we seek. Behind the UI are methods for building indexes, methods for matching query terms to index terms, and methods for the ordering and display of results. Index building includes methods to crawl for content and methods to analyze this content for index terms.

Technologies to save mean that massive amounts of data can be stored digitally as the raw materials of search. Properties and patterns of this data are made visible for structural expression (automated or optional) through search technologies. Index-building is a process of automatically structuring our information. We also actively structure as a follow-on to a search as, for example, when we bookmark a search result or when we save a document we’ve found to a local folder for later use.

Lots of structuring, in other words, is happening already—enabled by technologies to save and completed by technologies to search.

What then remains to discuss in a chapter on technologies to structure?

A lot.

But first, what is structure?

8.1 STRUCTURE, STRUCTURE EVERYWHERE . . . NOR ANY BIT TO SHARE¹⁸⁹

Structure “. . . from Latin structura (‘a fitting together, adjustment, building, erection, a building, edifice, structure’), from struere, past participle structus (‘pile up, arrange, assemble, build’).” Structure is both a verb (“To give structure to; to arrange.”) and a noun (e.g., “A cohesive whole built up of distinct parts . . . The underlying shape of a solid . . . The overall form or organization of something.”).¹⁹⁰

The concept of structure is important across disciplines and art forms from biology, chemistry,¹⁹¹ and physics to music,¹⁹² literature, and poetry¹⁹³ and then to sociology,¹⁹⁴ organizational psychology,¹⁹⁵ and, of course, mechanical engineering and architecture.

Structure accomplishes a grouping and interrelating of heterogeneous elements. Through structure comes an emergence of properties we could not predict from a separate consideration of component elements. “The whole,” as the phrase goes, “is greater than the sum of the parts.”¹⁹⁶

Structures for information come in many forms and at many levels. The data of a relational database is structured into tables with rows of attribute values. The content of an XML document is structured into a hierarchy of elements where a given element may contain attributes and sub-elements.
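The two forms can be placed side by side in a short sketch using Python's standard `sqlite3` and `xml.etree.ElementTree` modules. The table name, column names, element names, and hotel data are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Relational form: a table whose rows hold attribute values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hotels (name TEXT, city TEXT, nightly_rate INTEGER)")
conn.executemany(
    "INSERT INTO hotels VALUES (?, ?, ?)",
    [
        ("Harborview", "Boston", 210),
        ("Elm Street Inn", "Boston", 145),
        ("Lakeside", "Seattle", 180),
    ],
)
boston_by_rate = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM hotels WHERE city = ? ORDER BY nightly_rate", ("Boston",)
    )
]

# XML form: a hierarchy of elements with attributes and sub-elements.
xml_text = """
<trip city="Boston">
  <hotel name="Harborview"><rate currency="USD">210</rate></hotel>
  <hotel name="Elm Street Inn"><rate currency="USD">145</rate></hotel>
</trip>
"""
root = ET.fromstring(xml_text)
xml_names = [hotel.get("name") for hotel in root.findall("hotel")]
xml_rates = [int(hotel.find("rate").text) for hotel in root.findall("hotel")]
```

The same facts appear in both, but the structure differs: the table relates values by row and column, while the XML document relates them by containment.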

The structure in some information items is the product of a formal schema or template—a structure in its own right. In other cases, the structure is implicit. Part 2 of this book is a little of both: A structure of chapters, one each for input/output technologies and technologies to save, search and structure. Some chapters, in turn, have a section concerning “caveats and disclaimers” relating to the chapter’s technology area. The six activities of PIM and six senses in which information is personal have also provided a structure of sorts for an assessment of impacts for each chapter’s technologies.

Information structures can be “strong,” providing skeletons for the flesh of content. But then we also note that the boundaries between structure and content are not fixed. What appear to be “blobs” of content at one level reveal, on closer inspection, a structure (or structure in several overlays) holding together smaller blobs of content within.

Information structure can give consolidated expression to data patterns lying otherwise diffuse, hidden, and dormant. A search index does this, for example. Structure makes the implicit explicit.¹⁹⁷

But the full expression of an information structure itself may be mostly hidden from our view. We see that some text is underlined and in a special color and that, if we click on it, a web page comes into view. Unless we’re in an editor to “view source” for the page’s HTML, we don’t see the anchor tag and the “href” attribute that defines the hyperlink.
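The hidden structure is easy to expose programmatically. The following sketch uses Python's standard `html.parser` module to pull the `href` value and the visible text out of each anchor tag; the class name `LinkCollector` and the one-line page source are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the anchor currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical page source: the reader sees only the words "hotel list".
page_source = '<p>See the <a href="https://example.org/hotels">hotel list</a> for details.</p>'
collector = LinkCollector()
collector.feed(page_source)
```

After `feed`, `collector.links` pairs what we see (the underlined text) with what we normally do not (the address the link resolves to).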

We structure as we use folders, tags, sections, “albums,” and a host of other structuring forms. We structure as we place, order, highlight, and otherwise format our information. We follow conventions of structuring even in our choice of words, their ordering and the selection and ordering of characters within. Consistencies of word choice, ordering, and spelling, in turn, provide the basis for the index structures that are so essential to search. Structure builds on structure.

We often speak as if we needed more structure, in our lives and with our information, as we use phrases like “I should get more organized” or “My information space is kind of a mess right now.”

But if we look at our information structures, we could easily conclude that we have too much structure already.

More accurately, we have too many disparate structures. Each application, each tool we use for the management of our information, comes with its own forms for structuring. Table 8.1 lists forms for structuring for several popular applications.

Table 8.1: Each application we use comes with its own ways to structure

Application	Forms for structuring
Microsoft Outlook	Email folders
Microsoft OneNote	Notebooks, Sections (top), Pages (right-hand)
Evernote	Notebooks, Tags (in a hierarchy)
Remember The Milk	Lists, Tags
Facebook	Groups, Albums, FriendLists

As we work with various applications, whether desktop or, increasingly, web-based, we structure in similar but slightly different ways using different forms for structuring.

Different applications may use the same name but “mean” very different things. Consider “tags” in Evernote vs. “tags” in OneNote. Evernote tags can be structured into a hierarchy. Tags in OneNote, on the other hand, can each be given a font, a color, and an icon but cannot be structured into a hierarchy.

We should not be surprised, therefore, that the information structures we build, as we use these different forms for structuring, are varied and inconsistent with one another. Our structures are fragments—shards, we might say—imperfectly reflecting the more coherent, integrative expression we might like to give to our information.

If only we could.

The services of the desktop file system provide at least basic unifying support across desktop applications for the “file” and “folder”¹⁹⁸ as information items with existence independent of any given application. Included with this support are basic operations to create, delete, move, rename, etc. We lose this unifying support as we shift to web applications and use the storage these web applications provide.

But we shouldn’t kid ourselves that the organization of our information was so good back in the old days—the days a decade or so back when we couldn’t so easily push information into web-based apps and when the file system (including various storage devices) was all most of us had for persistence of digital information.

If we look at our personal folder organizations today, we’re likely to see a great many abandoned systems of information organization.¹⁹⁹ We start one system of organization, only to leave it in favor of something else.

Worse, we may switch back and forth between two or more systems for organizing. We’re then reluctant to decommission one organization in favor of another. Which one? And what a pain to move everything! But then we’re uncertain where to find old information. And we’re uncertain where to keep new information. Maybe we decide to keep the information in several places inside different organizations. But then . . . if we change an item even slightly—with a comment or a highlight, for example—we’re left with different versions. Which to use later? Which is the right one?

This diversity of similar—but not the same—structures, especially as “siloed” in different applications, is part of a bigger problem of information fragmentation as briefly discussed in Part 1 of this book.

People persistently express a desire for greater integration, as an antidote to fragmentation, with comments such as “ideally I would like a unified system, I wouldn’t have all these different databases and all these different check lists and manuals.”²⁰⁰

In particular, people want to organize project-related information together but end up duplicating and maintaining related structures across different applications (e.g., file managers, email clients, web browsers and a wide range of web applications).²⁰¹ Most of the structures people use lie buried in their applications and cannot readily be re-used or even easily examined.

With the emergence of collaborative tools, these fragmentation problems are replicated and exacerbated for collaborating teams who have to create and maintain shared structures—a complex task we consider further in Chapter 10’s (Part 3) exploration of group information management (GIM).

In our field studies in the Keeping Found Things Found group, the participants we interview are frequently critical of their organizations. As we ask people to give us “tours” of their organizations of personal information we often hear comments like “I’m really messy” or “I really need to take some time to get this stuff better organized.”²⁰² Sometimes people even interrupt the interview to move or delete a folder that “really shouldn’t be there.”

Paradoxically though, we also hear people expressing embarrassment for being too organized or, more accurately, these people express embarrassment for the percentage of their time and energy that is required to maintain their organizations. We’re “damned if we do; damned if we don’t.”

We shouldn’t be so hard on ourselves.

Our structures, even if we invest considerable time and effort to make these consistent, will still lie scattered and siloed in various applications. And our investment in desktop folder structure,²⁰³ based on past experience, may later be abandoned only to add to overall clutter. Whether based on the desktop or the Web, the tools for viewing and working with information structure are poor. Without better tool support, the organizations we devise may be more trouble than they are worth (i.e., more trouble to maintain than we save in costs of keeping and finding).

Suppose our information—content and structure—were “first class”? What might this mean?

For one thing, this might mean that structures are no longer buried within different applications. Structures, instead, would have an existence independent of any given application.

But then what is information—especially digital information—without supporting applications?

We then need to stipulate, in addition, that, for information structure to be given first class treatment, it needs to be shared between, and viewed and manipulated through, a large number of applications.

To be clear, sharing in this manner is not a serial export/import of our information from one application into another . . . Export/import might have worked well enough back in a day when packages of our information—a document or a presentation or a figure—were the primary object of our efforts and where these packages were passed along from one application to the next in the manner of an assembly line (often with printing to paper as a final step).

There is nothing first class about our information in an export/import assembly line. Notwithstanding the noble goals of initiatives such as the Data Liberation Front,²⁰⁴ the export of information from one application into another remains a troublesome, time-consuming, and “lossy” process. Information structures in particular are apt to be mangled or left behind altogether. And in an age where power comes from a persistence and projection of our information, through our devices and interconnected on the Web, an assembly line focus on a single information item will no longer do.

We invoke again the metaphor used in Chapter 4 (Part 1)²⁰⁵ of our information as a house . . . or a garden.

And then we return to an original definition of application as in “the act of applying or laying on.” We apply a variety of tools to a vegetable garden: First a shovel, then a hoe, then a rake, then the hose for watering. Similarly, Chapter 4 (Part 1) considered how information for an activity (a project) might be worked through a variety of tools. Different tools, for example, might help us to organize this information according to an evolving story told first in the future tense (what are we going to do?), then in an extended present tense (what do we need to do now or soon?) and then in the past tense (what did we do?).

Let an application be a tool we “apply” to our information—the right tool for the right job²⁰⁶—rather than a thing to which we “submit.” Tools work with our information in place.

But that place is defined by structure—a structure to be shared with the applications we use and, selectively, with other people. And with ourselves over time.

We’ll consider how this might happen later. But first, what is the personal potential of shared structure (and a concomitant ability to share information “in place”)? What are the benefits?

Taking back our information, #1. Focus on the grouping item.

This is the first in a series of insets on the theme, “Taking back our information.” Drawing upon the work of the Keeping Found Things Found initiative,²⁰⁷ the series describes an approach toward gaining greater control over the information that is personal to us in any of the six senses in which information can be said to be personal.

Why bother?

Motivation on the one hand comes from the pain of a present situation where our information is scattered across a range of devices and applications. The fragmented nature of our information makes each of the six activities of PIM more difficult. Where to keep useful information for later use? How to find this information later? Similarly with meta-level activities. How to maintain and organize information that is so widely scattered? How to manage privacy and the flow of information when we haven’t a clue where our information is? How to measure and evaluate our practices of PIM when these seem so disjointed—partly on our laptops, partly on the Web, partly through a host of highly specialized apps on our palmtops and, still, partly through paper? Most important, how to make sense of our information when we have difficulty even assembling relevant information (e.g., for a decision we must make or a task we must otherwise complete) into a single view?

Motivation on the other hand comes from the power of a future situation in which we are able to work with and use our information much more effectively than today. Our information on the Web serves as proxy for ourselves, speaking for us on routine matters (e.g., current status, job availability, time and place of parties we’re organizing and so on). Our information on our palmtops frees us from a need to be physically “here” or “there”; we can be digitally present in any number of places no matter the physical distance. We look to a future where we are much less constrained by acts of clerical necessity and have greater freedom to pursue those creative “CAPTCHA” activities that make us uniquely human.²⁰⁸

Taking control of our information means, first and foremost, taking control of our information structures. Control the structures and the content will follow.

But, as the “Structure, structure everywhere . . .” section explores, information structure comes in many forms. Where to start?

Consider a focus on the “grouping item.”

The grouping item is a kind of information item whose primary purpose is to group together and provide access to other information items. Folders, tags, “section tabs,” and “pages” (for Microsoft OneNote), “albums” and “groups” (for Facebook), and “notebooks” as defined (differently) in several applications are each examples of grouping items.

Alternatively, any information item can be regarded as a grouping item. When we do so, we consider the item primarily for its function to represent and provide quick, easy access to other information items (including other grouping items). An ordinary web page, for example, has its own content but also, via its hyperlinks, provides ready access to other web pages and their content.

Grouping items are called by different names in different information management applications. Evernote, for example, provides for “notebooks,” “notebook stacks,” and “tags.” Microsoft OneNote, an alternate note-taking application, provides for “notebooks,” “section groups,” “sections,” “pages,” and “sub-pages.” A file system like that used in Microsoft Windows or on the Macintosh provides for directories or “folders.” Even indexing and search utilities such as Lucene²⁰⁹ can be seen to create a kind of grouping item in the form of a term + links to documents in which the term has been found or to which the term is otherwise associated.

Grouping items differ from one another in many respects such as the manner of their creation and the manner of and restrictions in their use. For example, folders are created by people (except, of course, for all the other folders that are created by desktop applications in the course of their installation and use). Index terms and their lists of associated documents, on the other hand, are created by an indexing utility (though people can exert indirect control over the indexing process through choices made with respect to indexing components such as the word stemmer and the word separator).

Notwithstanding these and other notable differences, all grouping items share the following features in common:

A basic “noodle” structure (see Figure 8.1) consisting of a node + outgoing links.

Links point “directly” (i.e., via an address such as a URI that is quickly and unambiguously resolved) to other information items including, when the application allows, to other grouping items. For example, a folder points to its subfolders and files. An Evernote tag points to other Evernote tags and to notes that have been “tagged” by the tag.

The grouping item is addressable by one or more URIs. Through these URIs the node of a grouping item can, in turn, be addressed by other grouping items.

The grouping item often has a name or a label by which it is represented in displays.

Figure 8.1: The explicit structure of a grouping item is a “noodle” (node + outgoing links).
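The features listed above can be sketched as a small data structure: a node that is addressable by a URI, carries a name, and holds outgoing links that point by URI to other items, including other grouping items. The class names `GroupingItem` and `Link`, the field names, and the sample URIs are our assumptions for illustration and not a published schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Link:
    target_uri: str   # direct address of the item the link points to
    label: str = ""   # how the link is represented in displays


@dataclass
class GroupingItem:
    uri: str                      # address by which other items can point here
    name: str                     # name or label shown in displays
    links: List[Link] = field(default_factory=list)

    def add_link(self, target_uri: str, label: str = "") -> None:
        self.links.append(Link(target_uri, label))

    def targets(self) -> List[str]:
        return [link.target_uri for link in self.links]


# Hypothetical example: a folder whose links include another grouping item.
wedding = GroupingItem("file:///home/susan/wedding/", "wedding")
honeymoon = GroupingItem("file:///home/susan/wedding/honeymoon/", "honeymoon")

wedding.add_link(honeymoon.uri, "honeymoon")
wedding.add_link("file:///home/susan/wedding/guest-list.xlsx", "guest list")
honeymoon.add_link("https://example.org/flights", "flight options")
```

Because a link holds only an address, the same structure describes a folder pointing to files, a tag pointing to notes, or a web page pointing to other pages.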

We use grouping items in one form or another throughout a typical day. Consider these examples:

To decide which hotel to book, Gordon copies text, pictures, and links from the Web for several alternatives, placing information as notes on a OneNote page. Alternatively, he might take notes in Evernote, giving each the tag “hotels-Boston.”

Rashmi is working to complete a complicated application process that requires her to fill out several different forms. She seeks first to place all forms in one place—perhaps in a folder or even as printouts on a physical desktop. She does this to gain a clearer sense of the effort involved and to be sure she has “everything in one place.”

In his search for the current version of a document, Oscar navigates to a folder containing several versions as separate files. He then sorts these by “last modified” date before selecting the most recently modified of these.

Ursula doesn’t have a direct address for a targeted web page but she knows that a hyperlink to this web page is the second of two that can be found in the “upper right-hand corner” of another page.

Piao wants to contribute to an email discussion. To do so, he needs both to locate the most recent post and also to locate previous posts so that he can reflect their content in his response. If he is using Gmail, this grouping is already done for him. But he doesn’t know how to do this in Microsoft Outlook so, instead, he opts to see “related messages in this conversation” and he does a reply-all to the most recent post.

Susan is planning her wedding. Nearly six months prior to the event, she creates a “wedding” folder in her file system in which she places files and also URLs (as shortcuts) that relate to different aspects of her wedding. As her tolerance for “clutter” is reached, she begins to organize items into subfolders for different aspects of her wedding such as “wedding dress,” “honeymoon,” “reception,” and “wedding vows.” On the eve of her wedding, the folder structure nicely resembles a problem decomposition for her wedding with grouping items (folders and subfolders) representing sub-areas (“honeymoon”) and specific tasks (“decide on wedding dress”).

The folder, the OneNote page, the web page, the subject line used in common among the email messages of a conversation—even the physical desktop—are forms of a grouping item. In each case, the grouping item, including its set of associations to other information items, provides an important context for the activity at hand—whether this is selecting the right document, the right hotel room, the right hyperlink, responding to the most recent email post in a conversation or, more generally, getting a sense for and making sense of the information at hand.

Taking back our information, #2. Observations and speculation about the grouping item.

A grouping item provides fast, usually certain access to the items it groups.

A grouping item provides key context. Even in the two examples of the previous inset where focus was on a single link—to the latest version of a document or to a specific web page—the selection could not be made absent a larger context provided by the grouping item that includes a representation of alternate files or links.

The grouping item supports a stepwise navigation to the desired information that both simplifies the activity of finding the desired information and also helps us to more easily determine the relevance of this information once found.²¹⁰

The context provided by the grouping item becomes especially important in cases where the activity is less well defined and we are trying to “see what we have here.”

Two additional observations can be made about the grouping item. Each observation prompts speculation concerning the study of the grouping item as a way to understand better how people categorize, conceptualize, and, generally, make sense of the information at hand.

Grouping items are an external expression of internal goal-derived categories and “ad-hoc concepts”

How do people use categories and concepts to represent their world in meaningful, useful ways? Much of the work originally done to address this important question had²¹¹ a “semantic” focus on categories such as “bird” or “dog” that we learn early in life, use every day and that seem almost to be “built into” the way we view and interact with our world.

More recently, there has been work to demonstrate the practical importance and widespread use of goal-derived categories,²¹² i.e., categories that are created ad hoc in support of the completion of a project or the solving of a problem. “Things to pack for the ski trip” and “Hotels to stay at in Boston” are examples of goal-derived categories. The formation of goal-derived categories appropriate to a given set of circumstances is an important and teachable skill.²¹³ Can this formation also be supported through tools that support the creation of grouping items and through an underlying representation of the structure in these grouping items?

More generally, people form and make use of concepts ad hoc (i.e., as needed for a particular task or situation) for which they can articulate no formal definition. An ad hoc concept or category may be represented, instead, extensionally through exemplars we are able to list or can recognize if shown. As Justice Potter Stewart famously said when facing the challenge to define “hard-core pornography,” “I know it when I see it.”²¹⁴ We may have ad hoc concepts for “things to do before I go to bed” or “things to look for in a work situation.” Many interpersonal conflicts arise from discrepancies between people with respect to a concept such as “things to expect from a friend.”

Ad hoc concepts are manifest in our grouping items²¹⁵ and so too are challenges to define these concepts in ways that would support their consistent use over time and amongst the members of a group. For want of shared definition or description of purpose, for example, the pages of a team wiki—a form of grouping item—may become a hodgepodge. Or the burden to maintain these may be borne by the one person in the group who “understands” the organization of the wiki and the meanings (purposes) of its various pages.²¹⁶ But exclusive maintenance by one person is no guarantee that grouping items and their underlying concepts will continue to make sense over time. As we approach a folder we haven’t used in a while we may often ask “what (on earth) was this for?”²¹⁷

Grouping items have latent, implicit structure, and emergent properties

The grouping item has structure explicitly represented in its outgoing links. At the same time, as we perceive these links (as mediated by the sights or possibly the sounds of a tool) these links may seem to relate to each other and to node-level information in ways less easily described explicitly. “Semi-groupings” may “pop out” as perceived patterns in the information being viewed. Our perceptions of emergent groupings are influenced, for example, by the relative location of links in a display or by display properties of their appearance such as their color, size, or shape.

In the spirit of spatial hypertext,²¹⁸ we may manipulate our perceptions of these semi-groupings as, for example, when we place notes close to one another on a page in Microsoft OneNote. Such a manipulation is often a first step toward a more explicit chunking of links under a finer-grained partition of grouping items. A grouping of links to web pages related to a “term paper on search” might, for example, be further divided into sub-groupings of links to web pages for “Lucene,” “faceted search,” “stemming and word-breaking,” etc. It is often the case that these subgroupings emerge and can then be made explicit only after an initial process of discovery that happens in the larger context of the initial grouping.²¹⁹

The creation of grouping items and the naming of these is then an essential step in the problem setting (i.e., the definition of the problem space) that precedes problem-solving. Before we figure out which hotel best meets our travel needs in Boston, we first create a “hotels in Boston” folder affording ready access to information concerning hotel alternatives. Schön²²⁰ defines problem setting as “a process in which, interactively, we name the things to which we will attend and frame the context in which we will attend to them.”

Making use of a chemical metaphor we might call the grouping item a “molecular” unit of structure. In contrast, the link, association, or proposition (such as those formed by a subject-predicate-object statement) is an “atomic” unit of structure. The notion of emergence has implications for efforts (such as the Semantic Web initiative discussed further in the “Personal potentials of shared structure” section) that aim for a more atomic level of structural representation. Some properties of a link and its referent—such as “most recent version”—can only be confirmed in the larger context of a grouping item. Even more so, relationships and emergent groupings of links require the larger context of a grouping item. The grouping item is the shared subject for a collection of propositions or a collection of links pointing to information (e.g., “Hotels in Boston” or “things to do this weekend”).

The “noodle” alone is not enough

From the two observations of this inset it is apparent that the structure of a grouping item is only partially described by the explicit structure of a node + outgoing links. The properties of the node and each of its outgoing links—for example, display properties such as size, shape, color, display location—may also help us to perceive additional structures. We can call these “gestalts.”²²¹ But then we realize that in our efforts toward a metadata representation of grouping items, a stark “noodle” representation of the grouping item is not enough. We also need to provide room for tool-specific (more generally, “namespace-specific”) properties both at the level of the node and each of its outgoing links.
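One way to picture "room for namespace-specific properties" is a plain nested record in which each tool writes only under its own namespace key, at the node and on each link. This is a speculative sketch and not the representation the insets go on to describe; the namespace names `notes-tool` and `canvas-tool`, the property names, the URIs, and the helper functions are all hypothetical.

```python
import json

# A grouping item with a tool-independent core (uri, name, link targets)
# plus properties that each tool stores under its own namespace key.
grouping_item = {
    "uri": "urn:example:notes/hotels-boston",
    "name": "Hotels in Boston",
    "properties": {"notes-tool": {"color": "blue"}},
    "links": [
        {
            "target": "https://example.org/hotel-a",
            "properties": {"canvas-tool": {"x": 40, "y": 120}},
        },
        {
            "target": "https://example.org/hotel-b",
            "properties": {
                "canvas-tool": {"x": 45, "y": 160},
                "notes-tool": {"starred": True},
            },
        },
    ],
}


def core_view(item):
    """The stark node + outgoing links, with all tool-specific properties ignored."""
    return {
        "uri": item["uri"],
        "name": item["name"],
        "links": [link["target"] for link in item["links"]],
    }


def link_properties(item, namespace):
    """Per-link properties that one tool (one namespace) has recorded."""
    return {
        link["target"]: link["properties"][namespace]
        for link in item["links"]
        if namespace in link.get("properties", {})
    }


# A tool that understands only the core can still pass the rest through intact.
round_tripped = json.loads(json.dumps(grouping_item))
canvas_positions = link_properties(grouping_item, "canvas-tool")
```

In this sketch the display locations that give rise to perceived "semi-groupings" (two links placed close together on a canvas) survive alongside the explicit structure, even when handled by a tool that knows nothing about them.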

How? How can we realize both a tool-independent “noodle” metadata representation of the explicit structure of a grouping item and also leave tool-accommodating room for additional properties of the node and its outgoing links that are so critical to our efforts to understand and manage our information? And then, given this metadata representation, what can we do with it? These are topics we consider in the next inset in the “Taking back our information” series.

8.2 PERSONAL POTENTIALS OF SHARED STRUCTURE

We share lots of information structure already. One very prominent example of global sharing is, of course, the World Wide Web.²²² As noted in Chapter 4 (Part 1),²²³ the Web has so profoundly changed the lives of most of us that we may find it hard to imagine what life was (or would be) without it. This holds true even for those of us (me, for example) who spent much of our lives without the Web.

The Web functions through shared structures and shared conventions of three kinds:

1. A shared protocol (HTTP²²⁴) for the exchange of information as web pages.

2. A shared syntax for generating web page addresses as Uniform Resource Locators (URLs²²⁵). A URL is a text string that provides information needed to locate a web page for return (via HTTP).

3. A markup language, HTML (HyperText Markup Language²²⁶), used to express the content and structure of web pages including the hyperlinks of these pages.
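To make the second of these conventions concrete, the short Python sketch below takes a URL apart into the pieces that together say where a page is and how to ask for it. The URL itself is made up (example.com is a domain reserved for illustration), and the comments give our informal reading of each part rather than the wording of the URL specification.

```python
from urllib.parse import urlsplit

# A made-up address on a domain reserved for examples.
url = "http://example.com/hotels/boston?stars=4#reviews"
parts = urlsplit(url)

print(parts.scheme)    # which protocol to use for the exchange
print(parts.netloc)    # which host serves the page
print(parts.path)      # where the page lives on that host
print(parts.query)     # optional parameters passed along with the request
print(parts.fragment)  # optional position within the returned page
```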

As a result of this sharing of structure and convention, we’re able to view a web page retrieved from anywhere in the world via a variety of devices and in several different web browsers. Pages linked together define a global structure that may eventually link together nearly all of the world’s human-generated information. The Web is obviously a huge success. And, even so, as noted in Chapter 4 (Part 1), we are only just beginning to realize its potential.

8.2.1 VISIONS OF THE SEMANTIC WEB

If we get this much benefit from a relatively modest investment in shared structures, what might we realize from even more shared structure? The tags of HTML, for example, deal mostly with the display of information with some limited semantics in the form of, for example, its treatment of hyperlinks (via anchor tags each with an href attribute taking a URL as a value).²²⁷
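The limited semantics of the anchor tag are already enough for a program to act on. As a minimal sketch, the Python below uses the standard library's HTML parser to collect the href value of every anchor tag in a page. The page content and the class name LinkCollector are invented for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every anchor tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag just opened.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


# A made-up page for illustration.
page = """
<html><body>
  <h1>Weekend plans</h1>
  <p>See <a href="http://example.com/concerts">concerts</a> and
     <a href="http://example.com/plays">school plays</a>.</p>
</body></html>
"""

collector = LinkCollector()
collector.feed(page)
print(collector.hrefs)
```

The program learns that this page points to two others. It learns nothing about what the pages mean, which is the gap the next paragraphs take up.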

What if the structure of a web page could also communicate the “meaning” of its information in a format that could be “understood” and acted upon by “machines”—your computer or mine or some server somewhere in the cloud?

Enter the Semantic Web²²⁸ as a seemingly natural—some might even say “inevitable” (it is often called “Web 3.0”)—progression in the shared structures of the Web.²²⁹

In the words of Berners-Lee, Hendler, and Lassila, “The Semantic Web will bring structure to the meaningful content of web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users.”²³⁰

As early as 1998, Berners-Lee provided a “Semantic Web Road Map . . . with the goal that it (the Web) should be useful not only for human-human communication, but also that machines would be able to participate and help.”²³¹

He observed that “One of the major obstacles to this has been the fact that most information on the Web is designed for human consumption, and even if it was derived from a database with well-defined meanings (in at least some terms) for its columns, that the structure of the data is not evident to a robot browsing the Web.”

Why not express information both ways: for people in a web page display and, in a more structured format, for “machines”?

A major part of Berners-Lee’s road map and still a key component of the Semantic Web is the encoding of virtually any assertion of interest as a subject-predicate-object statement using the Resource Description Framework or “RDF” for short.²³² RDF assertions are often (but not necessarily) serialized for persistent storage using XML (Extensible Markup Language).²³³
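The shape of a subject-predicate-object statement can be shown without any RDF tooling at all. The sketch below represents statements as plain Python tuples. It is not RDF itself and uses no RDF library; the example.org namespace and the property names (name, sentEmail, subject) are made up for illustration.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One statement: a subject, a predicate, and an object."""
    subject: str
    predicate: str
    obj: str


EX = "http://example.org/"  # made-up namespace, for illustration only

triples: List[Triple] = [
    Triple(EX + "harry", EX + "name", "Harry"),
    Triple(EX + "harry", EX + "sentEmail", EX + "message42"),
    Triple(EX + "message42", EX + "subject", "Hotels in Boston"),
]


def about(subject: str, statements: List[Triple]) -> List[Triple]:
    """Return every statement whose subject is the given resource."""
    return [t for t in statements if t.subject == subject]


for t in about(EX + "harry", triples):
    print(t.predicate, "->", t.obj)
```

Note that the object of one statement (message42) is the subject of another. Statements chained in this way form a graph rather than a table.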

Building on RDF is the RDF Schema (RDFS)²³⁴ for defining the basic elements (building blocks) of ontologies²³⁵ for use on the Semantic Web. The Web Ontology Language (OWL)²³⁶ then builds upon RDF and RDFS to provide a family of languages for the authoring of ontologies.²³⁷ Ontologies in turn specify objects/concepts of interest in an area (e.g., “event,” “location”), properties of these, and relations between them. RDF expressions can be selectively retrieved using queries conforming to the SPARQL query language.²³⁸
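Selective retrieval of this kind rests on matching patterns of triples in which some positions are left open as variables. The Python sketch below is not SPARQL and uses no SPARQL engine; it is a toy matcher, written under our own assumptions, that shows the underlying idea. The data, the "ex:" prefix, and the function names are invented. A roughly analogous SPARQL query would ask for every ?e that has type ex:Event and location ex:Boston.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Made-up data; "ex:" is a placeholder prefix, not a real vocabulary.
DATA: List[Triple] = [
    ("ex:concert1", "rdf:type", "ex:Event"),
    ("ex:concert1", "ex:location", "ex:Boston"),
    ("ex:play7", "rdf:type", "ex:Event"),
    ("ex:play7", "ex:location", "ex:Seattle"),
]


def match_one(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Try to extend a binding so that the pattern equals the triple."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                 # a variable
            if result.setdefault(term, value) != value:
                return None                      # already bound to something else
        elif term != value:                      # a constant that does not match
            return None
    return result


def query(patterns: List[Triple], data: List[Triple]) -> List[Binding]:
    """Return every binding that satisfies all of the patterns together."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        extended: List[Binding] = []
        for binding in bindings:
            for triple in data:
                candidate = match_one(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return bindings


events_in_boston = query(
    [("?e", "rdf:type", "ex:Event"), ("?e", "ex:location", "ex:Boston")],
    DATA,
)
print(events_in_boston)
```

Because the variable ?e appears in both patterns, only a resource that satisfies both statements survives. Real query engines add indexing, optional patterns, filters, and much else.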

Taken together, what do these and other initiatives of the Semantic Web provide? Berners-Lee et al. describe the power of “agents”: “The real power of the Semantic Web will be realized when people create many programs that collect web content from diverse sources, process the information and exchange the results with other programs. The effectiveness of such software agents will increase exponentially as more machine-readable web content and automated services (including other agents) become available. The Semantic Web promotes this synergy: even agents that were not expressly designed to work together can transfer data among themselves when the data come with semantics.”²³⁹

Elsewhere, Berners-Lee describes the power of the Semantic Web in terms of data integration: “To appreciate the need for better data integration, compare the enormous volume of experimental data produced in commercial and academic pharmaceutical laboratories around the world with the frustratingly slow pace of drug discovery. Life-science researchers are coming to the conclusion that in many cases no single lab, library or genomic data repository contains the information necessary to discover new drugs.” ²⁴⁰

Interest in the Semantic Web has spawned a number of projects over the years, including DBpedia,²⁴¹ an effort to extract structured data from Wikipedia for general use; Friend of a Friend (FOAF),²⁴² used to describe the relationships between people and their “things”; and GoPubMed,²⁴³ a structured search engine for biomedical texts designed to significantly improve the speed and effectiveness of information retrieval for medical professionals (in comparison to information retrieval using PubMed as a baseline).²⁴⁴

At a more personal level, Wesabe was designed to turn a jumble of personal “bank statements, credit-card accounts and so on”²⁴⁵ into information we might use (e.g., “where is my money going month by month?,” “When will I have enough savings to retire?”). Alas, Wesabe “is no more.”²⁴⁶

8.2.2 FROM THE PUBLIC TO THE PERSONAL

In the spirit of the Semantic Web and often using its components (e.g., RDF and OWL) a number of research initiatives have explored the benefits of shared structure as these might be realized by individual people in their practices of PIM.

Research efforts divide roughly by focus, whether primarily on representation or application. On the representational side and related to efforts toward “personal knowledge management” as discussed in Chapter 1 (Part 1),²⁴⁷ are a number of efforts over the years that have focused on support for the creation and use of personal knowledge bases and personal ontologies.²⁴⁸ Motivation is expressed by the following excerpt, “People often use powerful tools to manage the documents they encounter, but very rarely to store the mental knowledge they glean from those documents.”²⁴⁹

Whether or not knowledge is a “thing” to be represented directly and externally from its applications is questionable,²⁵⁰ but certainly one dominant theme of these efforts is the imposition of greater structure on information: “The personal ontology attempts to encompass a wide range of user characteristics, including personal information as well as relations to other people, preferences and interests. The ontology may be extended through inheritance and the addition of more classes, as well as class instantiation according to the needs of user stereotypes or individuals.”²⁵¹

Related to these efforts are the more application-oriented efforts to realize a semantic desktop:²⁵² “People gather information on their desktop computers, but current systems lack the ability to integrate this information based on concepts or across applications. The vision of the Semantic Desktop is to use Semantic Web technology on the desktop to support Personal Information Management (PIM). In addition to providing an interface for managing your personal data it also provides interfaces for other applications to access this, acting as a central hub for semantic information on the desktop.”²⁵³

There is some research to suggest that semantic desktops can help, giving users a more satisfying experience in their interactions (especially with personal information of the first sense, P1) and improving user efficiency.²⁵⁴ However—and this applies to any prototype evaluation—we must season the results of one evaluation with a few grains of salt since evaluations under ideal, laboratory conditions for the prototype may not translate to real-world conditions of ongoing use where people may face daily challenges of maintenance and use long after the “luster” of the prototype has dimmed.

Haystack²⁵⁵ is one of the better known efforts to bring the power of the Semantic Web down to a personal level to help people manage their information. Haystack creates a URI “to name anything of interest.”²⁵⁶ And then all information is represented via the RDF standard.

Elsewhere, Karger²⁵⁷ describes the benefits of unification (e.g., as achieved through a uniform use of RDF) in terms of the sharing of structured information between applications: “we can argue that the functionality of sending email should not be locked up in the address book, but should instead apply to any person we encounter in any application—calendar, photo album, and so on.”

And through a sharing of structured information between applications comes greater opportunity for integrative visual displays of information: “One motivation for unification is that a user may need to observe several distinct information objects in order to draw conclusions about them. Looking at them one at a time can be slow and difficult, particularly if we must return to each several times.”

The difficulties Karger identifies with our current situations of PIM are real enough—as are the benefits of a solution to reduce these difficulties. We have likely all experienced problems in the sharing of information—especially structured information—between applications. Related to this, we’ve likely experienced the difficulty of bringing all of our information into a single coherent view so that we can “make sense of things” in order to make effective use of our information.

What isn’t established is the necessity of the Haystack approach, i.e., unification via RDF. Also not established is the sufficiency of the RDF approach given the many practical difficulties of its use, which we explore further below.

8.2.3 UNFULFILLED PROMISES

The potential of the Semantic Web overall, and its applications to PIM in particular, would appear to be enormous. But now, over 15 years after the Semantic Web roadmap, the Semantic Web initiative is still mostly promise, having produced little in the way of practical solutions of widespread use beyond the research laboratory. McCool, cofounder of the large-scale RDF project TAP, notes a three-fold lack of deployment (of information, services, and applications), and this “despite substantial research funding in the US and European Union (EU).”²⁵⁸ McCool concludes that “Because it’s a complex format and requires users to sacrifice expressivity and pay enormous costs in translation and maintenance, the Semantic Web will never achieve widespread public adoption.”

In a letter to the editor posted in the Communications of the ACM in response to Horrocks’s article on ontologies,²⁵⁹ Aït-Kaci notes that “Whether the various languages proposed by the W3C are able to fly beyond toy applications has yet to be proved, especially in light of the huge financial investment being poured into the Semantic Web.”²⁶⁰

Successes are no better in efforts to apply Semantic Web components in tools of PIM. For example, although code for the original Haystack prototype has been open-sourced, there no longer appears to be active work on the prototype. Karger reflects that “I’m still a believer in the Haystack vision, but in practice we found it difficult to convince people to abandon their long-cherished pim tools in favor of a half-baked research tool.”²⁶¹

Based on their own efforts to provide support for non-expert users to work with personal ontologies, Katifori, et al.²⁶² observe that “some users had problems when familiarizing themselves with the ontology model–they found it in some cases overwhelming. . . . it seemed that the full complexity of an ontology . . . may be difficult for the end user to comprehend.”

McCool²⁶³ notes that “The ontological data model makes representation of any nontrivial factual information difficult because it can’t represent context of any kind.” He goes on to characterize the Semantic Web as a kind of “shadow web” that is entirely separate from the Web we use every day.

Singh²⁶⁴ notes that “if there is one lesson to be learned from the long history of databases, it is that it is practically impossible to describe data well enough for it to be used in arbitrary applications.”

Doctorow²⁶⁵ goes further, characterizing efforts toward a rendition of meaning in a “world of exhaustive, reliable metadata” as “a pipe-dream, founded on self-delusion, nerd hubris and hysterically inflated market opportunities.” Among the “seven insurmountable obstacles between the world as we know it and meta-utopia” he notes, for example, that “people lie” and that “There’s more than one way to describe something.”

But not even Doctorow suggests dispensing with metadata altogether. He recognizes, for example, the obvious value of in-link analyses of hyperlink structure, as practiced by search services, as a kind of derived or implicit metadata. What else? How might structures be shared—whether to communicate meaning with our machines or, less grandly, simply to do useful things?

8.2.4 MORE SPECIFIC, MORE APPLIED, IN-LINE, “SMALLER”—YES; BUT SIMPLER?

Some argue for an abandonment of the Semantic Web initiative altogether.²⁶⁶

Others argue for its simplification. In his own writings on “linked data,” Berners-Lee²⁶⁷ provides some general guidelines toward the publishing of structured data that might have wider use:

1. Use URIs as names for things.

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful information using the standards (RDF*, SPARQL).

4. Include links to other URIs so that they can discover more things.
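The four guidelines can be acted out in a few lines of Python. The sketch below makes no network requests. A dictionary stands in for the Web, so that "looking up" a URI is a dictionary lookup, and statements are plain tuples rather than RDF. Every URI and property name is invented for illustration.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# A stand-in for the Web: looking up an HTTP URI returns statements about it.
# Every URI and property name below is invented for illustration.
FAKE_WEB: Dict[str, List[Triple]] = {
    "http://example.org/id/boston": [
        ("http://example.org/id/boston", "label", "Boston"),
        ("http://example.org/id/boston", "locatedIn",
         "http://example.org/id/massachusetts"),
    ],
    "http://example.org/id/massachusetts": [
        ("http://example.org/id/massachusetts", "label", "Massachusetts"),
        ("http://example.org/id/massachusetts", "locatedIn",
         "http://example.org/id/usa"),
    ],
    "http://example.org/id/usa": [
        ("http://example.org/id/usa", "label", "United States"),
    ],
}


def look_up(uri: str) -> List[Triple]:
    """Guidelines 2 and 3: a name can be looked up and yields useful data."""
    return FAKE_WEB.get(uri, [])


def discover(start: str) -> List[str]:
    """Guideline 4: follow links to other URIs to discover more things."""
    seen: Set[str] = set()
    order: List[str] = []
    queue = deque([start])
    while queue:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.add(uri)
        order.append(uri)
        for _, _, obj in look_up(uri):
            if obj.startswith("http://"):  # the object is itself a name
                queue.append(obj)
    return order


print(discover("http://example.org/id/boston"))
```

Starting from a single name, the program reaches two more that it was never told about. That discovery by following links is what the fourth guideline is after.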

McCool²⁶⁸ believes in the need for more drastic simplification of the Semantic Web initiative drawing lessons from the success of the Web as a drastic simplification of earlier hypertext initiatives: “My proposal is to do for the Semantic Web what Tim Berners-Lee . . . did for Project Xanadu, the original hypertext project.”

Noting the irony here, we might say that the Semantic Web needs someone in the pragmatic spirit of the Tim Berners-Lee of the early 90s to correct for the complexities introduced by the visionary Tim Berners-Lee at the start of the new millennium. McCool goes on to argue for an approach that would permit an embedding of “named-entity” information directly in HTML markup and so eliminate the need for a “shadow web” of semantic data maintained separately from the Web.

There is a general call, voiced by McCool and many others, for more targeted, practical, “real” (or less contrived) applications to illustrate the value of the Semantic Web.²⁶⁹ Karger²⁷⁰ argues for “Less Semantic, More Web” noting that “The introduction of structured data can drive that revolution forward, but only if we continue to think about how end users will use that technology.”

Singh²⁷¹ argues that the “best hope for the Semantic Web is to encourage the emergence of communities of interest and practice that develop their own consensus knowledge on the basis of which they will standardize their representations. For example, such standards have emerged in narrow areas of personal information management, e.g., with the vCard standard.” From such standards may come, gradually, selectively, the shared structures needed for at least a partial fulfillment of the Semantic Web dream.²⁷²

Figure 8.2: Efforts to represent meaning through structure have produced a complexity of different initiatives and formats.²⁷³ Used courtesy of Vuk Miličić, http://milicicvuk.com/blog/2011/07/21/problems-of-the-rdf-syntax/.

The call for representations of meaning done in-line (i.e., as part of an HTML or XHTML representation for a web page) has been answered by not one but several new formats. Each may be simpler (easier for people to read and write in) than RDF/XML. However, in aggregate, new formats have added to overall complexity. Understanding similarities and differences among the formats is difficult, even for people experienced in the area. The three primary formats for in-line representation of meaning are:

• Microformats—a re-purposing of tags/attributes already available in HTML/XHTML prior to HTML5 (i.e., HTML 3 and later).²⁷⁴

• RDFa—a method of expressing RDF information in-line. RDFa was first proposed in 2004 for use with XHTML but its newer version (RDFa 1.1) is now compatible with HTML5 as well.²⁷⁵

• Microdata—the most recent of the three formats, developed specifically in the context of HTML5 and as an alternative to RDFa.²⁷⁶

In-line formats have been used in several applications and initiatives. We have, for example:

Schema.org—a joint effort by Google, Microsoft Bing, and Yahoo²⁷⁷ to define schemas that websites might optionally use in order to pass along structured data (along with free text) to search engine web crawlers. Schemas are available to structure information for many different circumstances and in many different areas (e.g., events, organizations, places, products, medical).²⁷⁸ For example, the “Event” schema provides attributes for “duration,” “location,” and “performer.” Following the structure defined by a schema, web crawlers are able to preserve property/value pairings in support of more structured queries by people (e.g., “Show me everything happening nearby this weekend”). Schemas and instances of their expression are based in the Microdata format.²⁷⁹
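To give a feel for what a crawler does with such markup, the Python sketch below reads the property/value pairs out of a small Microdata fragment. It is a deliberately simplified reader written for illustration, not a conforming Microdata parser: it assumes a single, flat item with no nesting and plain text values. The event details and the class name MicrodataItem are made up; the attribute names itemscope, itemtype, and itemprop are those of the Microdata format.

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class MicrodataItem(HTMLParser):
    """Collect itemprop/value pairs from one flat (non-nested) item."""

    def __init__(self):
        super().__init__()
        self.itemtype: Optional[str] = None
        self.properties: Dict[str, str] = {}
        self._current: Optional[str] = None  # property whose text we are reading

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "itemscope" in attributes:
            self.itemtype = attributes.get("itemtype")
        if "itemprop" in attributes:
            self._current = attributes["itemprop"]
            self.properties[self._current] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.properties[self._current] += data.strip()

    def handle_endtag(self, tag):
        # Flat items only: any closing tag ends the current property.
        self._current = None


# A made-up event marked up with Microdata attributes.
page = """
<div itemscope itemtype="https://schema.org/Event">
  <span itemprop="name">Spring Concert</span>
  <span itemprop="location">Boston Common</span>
  <span itemprop="performer">The Example Quartet</span>
</div>
"""

item = MicrodataItem()
item.feed(page)
print(item.itemtype)
print(item.properties)
```

The same text a person reads on the page ("Spring Concert", "Boston Common") becomes a set of named values that a search service can index and query by property.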

COinS (ContextObjects in Spans)²⁸⁰—a method of including bibliographic (citation) metadata in a web page for purposes of inclusion in a bibliography. If the page in view in our browser includes information written in the COinS schema, we may see a small icon to the right of the URL in the browser’s address well. We can click to include citation information in the reference database of a reference manager that supports COinS (e.g., Zotero, Mendeley, ResearchGate). In this book’s completion, I have found COinS to be incredibly useful. It has saved me enormous amounts of time I would otherwise have spent in a laborious, error-prone entry of citation information by hand. COinS is a win for both the citer and the citee (i.e., the people whose work is being cited), ensuring that information concerning the work is complete and correct. COinS information is currently written in Microdata format.

vCard/hCard. vCard²⁸¹ is a file format standard for the electronic exchange of business card information (i.e., the information found on a typical business card including name, contact information, picture portrait, etc.). “vCards are often attached to e-mail messages, but can be exchanged in other ways, such as on the World Wide Web or instant messaging.”²⁸² hCard is a microformat that is used to embed the structured property/value information of a vCard into a web page.²⁸³
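The property/value character of a vCard is easy to see once a card is read into a program. The Python sketch below parses a small, made-up card into a dictionary. It is a simplification written for illustration: it ignores line folding, repeated properties, and property parameters (anything between a semicolon and the colon), all of which a full vCard reader would need to handle.

```python
from typing import Dict

# A made-up card; the person and addresses are fictional.
CARD_TEXT = """\
BEGIN:VCARD
VERSION:3.0
FN:Harry Example
ORG:Example Corp
TITLE:Editor
EMAIL;TYPE=work:harry@example.com
END:VCARD
"""


def parse_vcard(text: str) -> Dict[str, str]:
    """Read NAME:value lines into a dictionary, dropping parameters."""
    card: Dict[str, str] = {}
    for line in text.strip().splitlines():
        name, _, value = line.partition(":")
        name = name.split(";")[0].strip().upper()  # drop ;TYPE=... parameters
        if not name or name in ("BEGIN", "END"):
            continue
        card[name] = value.strip()
    return card


card = parse_vcard(CARD_TEXT)
print(card["FN"], "|", card["TITLE"], "|", card["EMAIL"])
```

hCard carries the same properties inside a web page, using class attribute values on ordinary HTML elements in place of the NAME:value lines.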

RDFa, Microformats, Microdata—how do these in-line formats for the expression of meaning compare with one another? Which one(s) should we use? When? And how do these formats relate to the Semantic Web initiative? Are these new formats simply a pragmatic detour whose path ultimately leads back to the fulfillment of the visions originally expressed for the Semantic Web? Or do new formats presage a gradual abandonment of the Semantic Web?

These are all good questions to which only the briefest attention can be given here. In his careful comparison of the three formats, Manu Sporny²⁸⁴ notes that only RDFa has a clear mapping to RDF.

Chris Silver Smith²⁸⁵ observes that “Microformats have been established the longest of the three protocols, and used by the search engines the longest. Google and Yahoo! both introduced hCard microformat on their own webpages by marking up local listings with it” and that “Microformat’s initial advantage was that it worked seamlessly in existing HTML code, so using it within a page didn’t require any special tags that might overly restrict one’s version of HTML nor cause a page to be invalid code. The downside is that it primarily required using particular naming conventions of class attributes.”

The use of Microformats and a need to support these will likely persist as long as there are still web pages written in HTML 3 or 4. But moving forward, the extensibility built into the design of the Microdata format and its place as part of the HTML5 standard would seem to position Microdata format as the successor to Microformats. For example, Google now recommends the Microdata format for the representation of “rich snippets” and other structured data²⁸⁶ (although RDFa and Microformats are also supported).

Microdata appears also to be winning the “format battle” with RDFa. Jason Ronallo²⁸⁷ notes, for example, that “Google has supported RDFa in some fashion since 2009, and over that time has discovered a large error rate in the application of RDFa by Webmasters. Simplicity is a central reason for the development of Microdata and the search engines preferring it over RDFa.”²⁸⁸

Even so, the format battles continue with the more recent introduction of “RDFa Lite.”²⁸⁹ Sporny argues that “RDFa Lite contains all of the simplicity of Microdata coupled with the extensibility of and compatibility with RDFa.”²⁹⁰ Others, however, point to problems inherent in RDF and the extreme ambiguities concerning how to “correctly” express meaning through RDF.²⁹¹ These problems persist no matter whether “a” or “a Lite” is appended.

Which format will win? Or will two or more coexist for better (we can choose) and worse (continued complexity and confusion concerning which to use)? Will the winning (surviving?) format(s) lie on a path toward eventual realization of the Semantic Web visions? Even in a book about “The Future” (of PIM) I won’t hazard a prediction.

But we can speculate concerning how formats will be used and what this will mean for us. We can take inspiration from the growing list of schemas provided by schema.org (and supported by Bing, Google, and Yahoo), or we can consider the kinds of structured information we might work with following the hCard example.²⁹²

Web pages might embed structured information concerning calendar events (e.g., upcoming concerts, school plays, soccer matches, etc.), products such as palmtop devices (price, storage capacity, connectivity, etc.) and recipes (ingredients, preparation time, calories, etc.). We in turn could promote ourselves (e.g., on the job market or the “dating market”) through the structured information we push (e.g., education, hobbies, job history, languages spoken, etc.).

And then we can consider some specific examples where a sharing of “meaning” (structured information, metadata) might bring real benefits:

Form filling. We will own (P1) a “this is my life” resume of structured information to represent job history, credit card information, hobbies, preferences, family background, medical history, etc. Information can be structured according to shared standards (e.g., an elaborated version of the “person” schema of schema.org or the hResume microformat²⁹³).

If the standard is widely supported, we have greater freedom to switch between services such as Amazon or Expedia, knowing that the same information can be used and that we don’t need to do a time-consuming (and error-prone) re-entry of information. More generally, the completion of all manner of forms—for travel reimbursement, medical claims, job applications, etc.—should become much faster, easier, and less prone to error. We can push updates concerning our status to others who maintain information about us (P2). We can push information about ourselves out to others (P4) in ways that more clearly establish our relevance (e.g., to a prospective employer).

We’re also more likely to take the time to get the information right (or at least right for us and our aims), knowing that we need do so only once. This information is selectively, securely communicated to others on a need-to-know basis. Obviously, this information is extremely personal so it is good that it is under our “lock and key.”

Bidding for us and our resources (money, time). We can expect searches for relevant information (P6), especially for goods and services, to be much more effective. Meaning expressed through structure (metadata) will support searches for relevant information that are more precise (“only hotels within a mile of the convention center please and under $200 per night”). Just as important, shared meaning (e.g., of and through attributes like “price,” “bed type,” “on site fitness center”) will give us a basis for making sense of the results returned. We can elect to display and sort by attributes that help us to compare and contrast the results returned. Using the search for an airplane flight as an example, we might elect to display cost, date, travel time, number of plane changes, for the flights that are returned by a search.
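Once results carry named attributes, both the precise query and the compare-and-contrast display reduce to a few lines of code. The Python sketch below applies the hotel query from the text (within a mile of the venue, under $200 per night) and then sorts what remains by price. The hotels, their attribute names, and the search function are invented for illustration.

```python
from typing import Any, Dict, List

Hotel = Dict[str, Any]

# Made-up search results, each with the same shared attributes.
hotels: List[Hotel] = [
    {"name": "Harbor Inn", "price": 180.0, "miles_to_venue": 0.4, "fitness_center": True},
    {"name": "Grand Plaza", "price": 260.0, "miles_to_venue": 0.2, "fitness_center": True},
    {"name": "Budget Stay", "price": 95.0, "miles_to_venue": 3.5, "fitness_center": False},
    {"name": "Park View", "price": 150.0, "miles_to_venue": 0.9, "fitness_center": False},
]


def search(items: List[Hotel], max_price: float, max_miles: float) -> List[Hotel]:
    """Keep only hotels under the price limit and within the distance limit."""
    return [
        h for h in items
        if h["price"] < max_price and h["miles_to_venue"] <= max_miles
    ]


matches = search(hotels, max_price=200, max_miles=1.0)
by_price = sorted(matches, key=lambda h: h["price"])

# Display only the attributes we have chosen to compare.
for h in by_price:
    print(f'{h["name"]:<12} ${h["price"]:>6.2f}  {h["miles_to_venue"]} mi')
```

None of this is possible with free text alone. The filter and the sort both depend on every provider meaning the same thing by "price" and by distance.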

What differs from today’s support through services such as Expedia and Kayak is the possibility of picking and choosing among the different components of a search. Pick one application for assistance in the specification of the query; pick another for the actual completion of the search; and pick a third for results display. For example, I really like the results display provided by Hipmunk. If only I could combine it with a search (or searches) that returned the best possible matches for my query.

Shared meaning through structured information gives us a basis for issuing our own personal advertisements or our own “contracts” for bidding. We might post (P4) our interest in planning a vacation for “sometime in June” with appropriate restrictions for total cost, duration, location, etc. Let providers bid for the money we’re willing to spend. Have all bids organized into a spreadsheet for display and sorting by the features we care about.

As we considered the personal potential of shared structure at the start of this section, we drew inspiration from the vision of the Semantic Web. The Semantic Web is envisioned as a structured expression of meaning—for nearly everything of interest—to be shared globally between machines as well as people. What could the realization, if only partial, of the Semantic Web vision mean for each of us in our personal practices of PIM? Scenarios described “agents” operating on our behalf and capable of doing a wide range of things to make our lives easier, more productive, and—dare we say?—happier.

But the path toward fulfillment of the Semantic Web has been anything but straight. Detours have brought us to Microdata and RDFa Lite as likely the top two formats, moving forward, for the structured representation of meaning. Whether these detours eventually lead back to a fulfillment of the Semantic Web is not clear.

Experiences so far would seem, at the very least, to challenge the notion that meaning can be simply expressed as RDF subject-predicate-object triples or that these triples can be meaningfully shared—whether between people or computer-based applications—absent the sharing of a larger context. We note, for example, that the successes listed above for sharing of structured information—schema.org, COinS, and vCard/hCard—each depend critically upon the use of schemas.

Schemas express expected attributes, their names, and the types of values these attributes can be assigned. A schema establishes a namespace: “title” in the context of vCard means something distinctly different from “title” in the context of a COinS bibliographic reference. Schemas establish a grouping for attribute/value pairs and so a subject for these pairs. In the context of such a grouping we can meaningfully generate RDF triples, i.e., attribute/value pairs of a grouping are each “about” the same entity (e.g., the person described in a vCard or the bibliographic reference described via COinS).
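The three roles of a schema described here (expected attributes and value types, a namespace, and a shared subject for a group of attribute/value pairs) can be sketched in Python as follows. The two toy schemas are loosely inspired by vCard and by bibliographic references but are our own simplifications; the namespace labels, attribute names, and the to_triples function are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, Any]

# Toy schemas: a namespace label plus attribute name -> expected value type.
# Names are illustrative simplifications, not the real vCard or COinS fields.
CONTACT_SCHEMA = {
    "namespace": "contact",
    "attributes": {"fn": str, "title": str, "email": str},
}
CITATION_SCHEMA = {
    "namespace": "citation",
    "attributes": {"title": str, "author": str, "year": int},
}


def to_triples(subject: str, record: Dict[str, Any], schema: Dict[str, Any]) -> List[Triple]:
    """Check a record against a schema, then emit one triple per attribute."""
    triples: List[Triple] = []
    for name, value in record.items():
        expected = schema["attributes"].get(name)
        if expected is None:
            raise KeyError(f"{name!r} is not an attribute of this schema")
        if not isinstance(value, expected):
            raise TypeError(f"{name!r} should be of type {expected.__name__}")
        # The grouping supplies the subject; the namespace qualifies the name.
        triples.append((subject, f'{schema["namespace"]}:{name}', value))
    return triples


person = to_triples(
    "urn:example:harry",
    {"fn": "Harry Example", "title": "Editor"},
    CONTACT_SCHEMA,
)
reference = to_triples(
    "urn:example:ref1",
    {"title": "Hotels of Boston", "year": 2012},
    CITATION_SCHEMA,
)
print(person)
print(reference)
```

The same attribute name, "title", yields two different predicates ("contact:title" and "citation:title"), and each triple takes its subject from the grouping rather than from the attribute/value pair itself.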

But micro-expressions of structured information for events, personal contacts, bibliographic references and so on fall far short of grand unifications envisioned in the context of the Semantic Web. More important for our purposes, these tell only part of the story for the personal potential of shared structure.

What’s left to consider?

8.2.5 PERSONAL POTENTIAL REVISITED: THE MEANINGFUL SHARING OF STRUCTURE

We’ve considered the sharing of meaning through “micro” structure as expressed through subject-predicate-object triples or, alternatively, as attribute/value pairs in the larger context of a grouping to establish “subject.”

But much of the information we work with resists such a fine level of structuring. What matters, instead, is a more “macro” level of structuring. We need to organize our information so that, as we work to complete a task or make a decision, the information we need is all in one place. (See the “Taking back our information, #1” inset with its discussion of the grouping item.)

In considering a macro vs. micro level of structuring we distinguish between the “sharing of meaning through structure” (in the spirit of RDF, for example) and the “meaningful (useful) sharing of structure.” At a macro level, computers can help us to share structure through, for example, search and social media support but may not need to “know” much about meaning in order to provide this support—any more than, for example, Google, Bing, or Yahoo! search needs to “understand” a web page in order to give us fast access via a standard keyword search.

What are these macro structures? And what personal benefits can we realize as these structures are shared—whether between the applications we use, among the people in our lives or with ourselves over time?

A review of macro structures we might potentially share and the benefits of doing so can be organized by one of the two yardsticks we’ve been using to assess PIM—the six senses in which information can be personal. We’ll start with the sixth and work our way backward.

P6, information that is relevant to a current or future need. How can the sharing of structure help us with the information in this “6th sense” of the personal? P6 comes from a variety of sources but it’s easiest to think of P6 in the context of our search for information on the Web. An intriguing finding by Qu and Furnas²⁹⁴ is that our searches on the Web are often more for structure, e.g., in the form of definitive, well-organized articles, than they are for mere “bags of facts.” Subsequent searches are then often guided by the structure extracted from such an article.

Those of us who have, at one time or another, been given the task of writing on a topic—especially one with which we are not familiar—can surely relate to this finding. We often look for a good, well-cited (or “well-linked”) review on the topic. The headings and subheadings of a review article may then each be the basis for follow-on searches. These may also (with some modification by us) form the outline of our own report and also the structure that we use to organize search results we wish to keep.

If structure, more than content, is sometimes the object of a search, what implications can we draw? Search services might, at least as an option, rank matching items higher if their internal structure is more elaborated. Beyond this, couldn’t the structure (especially as expressed in the HTML/XHTML) be extracted as a first-class object to be used in its own right as part of our own organization of information and also (with attribution and modified to meet our needs) as part of the structure of our own report?
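Extracting structure in this sense is well within reach of simple tools. As a minimal sketch, the Python below pulls the headings (h1 through h6) out of a page and returns them as an outline of (level, text) pairs. The article content and the class name OutlineExtractor are made up, and the sketch assumes that headings are marked with standard heading tags, which real pages do not always do.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineExtractor(HTMLParser):
    """Collect (level, text) for every heading in a page, in document order."""

    def __init__(self):
        super().__init__()
        self.outline: List[Tuple[int, str]] = []
        self._level: Optional[int] = None  # level of the heading being read
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None


# A made-up review article.
article = """
<h1>Personal Information Management</h1>
<p>Introductory text.</p>
<h2>Keeping</h2>
<p>More text.</p>
<h2>Finding</h2>
<h3>Search</h3>
"""

extractor = OutlineExtractor()
extractor.feed(article)
for level, text in extractor.outline:
    print("  " * (level - 1) + text)
```

The resulting outline could seed a set of folders, a list of follow-on searches, or the skeleton of our own report, with attribution to the article it came from.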

P5, the information experienced by us. I happened to sit on a plane about a year ago next to a woman in her 90s who had been a dancer at the Latin Quarter²⁹⁵ in New York City during World War II. She had fascinating stories to tell. But I’ve forgotten their details . . . and her name. If only I’d written the details of my encounter down while these were still fresh in my mind. How many of us wish we were more disciplined about keeping a diary or a journal? Yet we never seem to find the time. Part of the problem in a digital age of information is that we’re not sure where to put our reflections. In a special-purpose “digital diary” app? Which one would we trust? On Facebook? Many of us may not want to be that public with our personal thoughts and reflections.

We may be more inclined to record our daily experiences—our stories—if these can serve not only as content but also as enhancement to shared structures with multiple uses:

• Stories might be told as an overlay to the item event log.²⁹⁶

• Stories might in particular be a way of organizing the pictures we take.

• Stories are a framework in which to weave in additional information—our thoughts, to-dos, or links to additional information (to the Wikipedia article on the Latin Quarter for example).

We often tell our stories with different audiences in mind. Some stories are told for our bridge or poker buddies. Or the members of our book club. Others may be told for colleagues at work. Still others are told for our children or our spouses.

Stories linked to our intended audiences on the one hand and to pictures and video clips on the other may provide a more solid basis (structure) for sharing. Many of us are unsure what to do with our pictures. Or with whom these should be shared. Sharing individual pictures can be something of a mishmash anyway. Stories provide a basis for organizing pictures and other information items (e.g., videos, web page links) and for systematically sharing these (“that’s a story I’ll tell only my spouse”; “this is a story I’ll tell my friends and colleagues”²⁹⁷).

An intriguing thought behind the sharing of structure in these examples is that we might be better about keeping a diary and better about being more organized in general if only we could get more leverage from our efforts. Even the busiest of us might invest the time in structures for our information if only these could be shared more widely (with our applications and with other people) and used in more ways.

P4, the information we share with others. As we share our experiences with others, we move from P5 to P4 (information sent to/shared with others). Shared structure for P4 information can come in forms other than a story. The classic is the structure of a resume. A search for “resume templates” returns many useful links.

But increasingly the information we use to sell ourselves and our services persists on the Web. We use principles of SEO to draw people to our site. Shared structure now comes in the form of templates we might use to improve the attractiveness of our site and its ranking in search results (for the search phrases we target). Also, shared structure may come in the form of standard layouts for the web pages on our site so that visitors, familiar with the standard, can more quickly find the information they seek. (We don’t want visitors to “time out” on our sites in frustration for not being able to find what they’re looking for.)

P3, information directed to us from others. In the chapter on technologies to search we considered the potential for a filtering of incoming information directed our way from others (P3) and, more generally, the information we experience (P5). This is done via situated searches—searches that are placed in association with a folder (or tag or other grouping item) that represents a project or area of interest for which the search is tuned to return relevant results. But how are these folders organized? And what might the folders and their organization say about the diet of information we receive? Are we getting a balance of information and viewpoints on controversial topics? Is the information we receive on professional matters balanced by information relating to finances? Fun things? Family? We can imagine sharing template structures (and supporting applications) that help us to achieve a greater balance. Working through the structures we use and the information so organized, a “Work-life balance” application might give us an assessment concerning the “weight” of information under different branches of the tree (e.g., for work, play, family, friends, community involvement, etc.).
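To make the idea of “weight” under different branches a little more concrete, here is a minimal sketch that assumes we already have a count of items filed under each folder path. The folder names and counts are invented for illustration, and a real “Work-life balance” application would surely weigh information by more than a simple item count.

```python
from collections import Counter

# Hypothetical sample: (folder path, number of items filed there).
item_counts = [
    ("work/project-a", 120),
    ("work/admin", 40),
    ("family/school", 25),
    ("play/hiking", 10),
    ("community", 5),
]


def branch_weights(counts):
    """Return each top-level branch's share of all items (0.0 to 1.0)."""
    totals = Counter()
    for path, n in counts:
        top = path.split("/", 1)[0]   # first path segment names the branch
        totals[top] += n
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {branch: n / grand_total for branch, n in totals.items()}


weights = branch_weights(item_counts)
for branch, share in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{branch:<10} {share:.1%}")
```

With these invented numbers the “work” branch accounts for 80% of the items, the kind of imbalance such an application might bring to our attention.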

P2, information about us (especially as kept by others). Structure in the case of P2 information might help us most by simply giving us an overview of all the different people and organizations that keep information about us and the members of our family. Such a structure might include categories for finances (income and expenses), legal, school/academics (for us or our children), and medical (for us and each member of our family).

P1, the information we own. Shared structure as applied to the information we own (and that is under our control) might come in the form of “Getting Organized” templates.²⁹⁸ Also, as structure is given first class treatment, we can expect that any number of websites might provide structure for us to copy and paste as a local structure (realized by folders or otherwise) that we can apply to our own information. This was the possibility discussed above for P6 information and so now we’ve come full-circle—from information relevant to us to information we own. The structures we extract can be made ours to manage our information.

Consider, for example, a small sampling of many “how to” sites available now and providing a range of step-by-step procedures:

• How to get into a top college or university²⁹⁹

• How to buy a house³⁰⁰

• (So you wanna) run a marathon?³⁰¹

• How to get a job³⁰²

Each comes with a structured set of steps (and sub-steps and even sub-sub-steps). This structure has value not only as a breakdown (decomposition) of a procedure into manageable steps, but also as a way of organizing the information we collect or generate along the way toward our own personal fulfillment of a goal. A step toward buying a house, for example, is often to find a lender. This step can do double duty as a grouping item (such as a folder) in which to place lender information and paperwork associated with the loan application.
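The “double duty” of steps as grouping items can be sketched as follows. The nested dictionary stands in for the steps of a “how to” guide; apart from “Find a lender,” the step names are invented, and the function only computes folder paths rather than creating anything on disk.

```python
from pathlib import PurePosixPath

# Hypothetical steps, sub-steps, and sub-sub-steps of a "how to" guide.
how_to = {
    "Buy a house": {
        "1 Check finances": {},
        "2 Find a lender": {"Loan application": {}},
        "3 Make an offer": {},
    }
}


def folder_paths(steps, parent=PurePosixPath(".")):
    """Turn nested steps into a flat list of folder paths, parents first."""
    paths = []
    for name, substeps in steps.items():
        here = parent / name
        paths.append(here)
        paths.extend(folder_paths(substeps, here))
    return paths


for path in folder_paths(how_to):
    print(path)
```

Each path could then be realized as a folder (or a tag, or a notebook section) in whatever storing application we already use.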

More generally, P1 is where we bring together the information that is personal to us across all the senses of personal. If information content is stored elsewhere and not under our control then at least the links to this information can be kept locally for our awareness. And the structures we impose can be local to be viewed and worked upon through tools that work with structure.³⁰³

Summarizing this section we can say that the personal potential of shared structure (and the information content so structured) is considerable. Shared “micro”-structures can greatly improve the ease and accuracy in the exchange of information about events, bibliographic references, and people. Micro-structures can form the basis of persistent queries or, alternatively, “requests for bid” made available on the Web to providers of goods and services. Micro-structures can also help us to organize the results that come back. Key to the useful sharing of microstructures, however, is also a sharing of associated schemas and namespaces to avoid inconsistencies and “collisions” in the use of names (for attributes and predicates).

Shared macro-structures—whether in the form of stories told, step-by-step “how to” guides, comprehensive reviews, personal SEO templates, or systems of personal organization—have the potential to help us with our information, in each of the six senses in which our information is personal.

But this section, especially in its review of the troubles with the Semantic Web, also points to several caveats and considerations we need to be aware of as we use and share information structure. These are discussed in the next section.

Taking back our information, #3. Representations of structure that are tool independent and also tool-accommodating.

In order to take back our information, we need, first and foremost, to take back our information structures: Reclaim our structures and the content will follow. This is done by giving structures—especially those defined through various forms of the grouping item—a first-class representation. A first-class representation of structure is one that is:

1. Tool-independent but also

2. Tool-accommodating, i.e., representations need to provide room for different applications to work with a structure, each in its own way and to persist relevant data.

In this special insert, we consider how representations can be both tool-independent and tool-accommodating through use of an XML schema called “XooML.”³⁰⁴ For a complete definition of the current version visit the keepingfoundthingsfound.com website.³⁰⁵ XooML has also been described in several papers.³⁰⁶

The following things are true for XooML and the XooML approach:³⁰⁷

XooML is a simple application of XML. We chose XML over reasonable alternatives such as JSON and RDF for two basic reasons:

We like the “document focus” of XML. Documents are assembled from XML fragments (e.g., conforming to the XooML schema). Digital documents displayed online can provide a dynamic, interactive surface for our interactions with information. Document = application. In the spirit of HTML5 any web page can be considered to be both a document and an application.

XML supports namespaces and, most important, does so in a decentralized way (no central registry). Namespaces are key if the representation of structure is to be tool-accommodating.³⁰⁸

Focus is on the grouping item, as described in previous insets. A given XooML fragment models the simple node + outgoing-link structure of a grouping item. In recent work, we’ve focused especially on file folders but XooML can be used to represent the structure of any form of grouping item (e.g., tags, “albums” of pictures, “notebooks” of notes, and so on).

Structures are “mirrored” for first-class treatment. A critical principle of the XooML approach is that people shouldn’t have to change and their information shouldn’t have to move in order for structures to become first class. We may rather like the folders, tags, albums, notebooks, etc., that we have or, at least, we may have gotten accustomed to using them. Mirroring is done via itemMirror drivers. Drivers, running from the client side, all work according to a single itemMirror object model but these vary on their “back end” depending upon the storing application and the API it supports for access to the grouping item (e.g., various Windows APIs, various Mac OS APIs, Graph API, the Dropbox API, POSIX, RESTful APIs, etc.). itemMirror—the object model, drivers, and overall approach in relation to XooML—is further described in the next insert.

Figure 8.3: A XooML schema provides for a tool-independent representation of the structure of a grouping item as a fragment (node) plus 0 or more associations (links). XooML is also tool-accommodating through the provision for NamespaceElements. An application can store data specific to its work with the structure within namespace elements—both at the fragment level and for each association of a fragment.

The “essentials” of the XooML schema are depicted in Figure 8.3. A fragment, representing the node-link structure of a grouping item, consists of:

tool-independent (fragment common) attributes +

zero or more tool-accommodating (fragment namespace) elements +

zero or more associations

This pattern partially repeats for each association which also consists of:

tool-independent (association common) attributes +

zero or more tool-accommodating (association namespace) elements

The schema supports a representation of structure that is both:

Tool-independent—a fragment (node) can have zero or more associations (links) and

Tool-accommodating³⁰⁹—at both the fragment level and the level of each association, “XooML-speaking” applications can store data specific to their work with the structure within namespace elements.³¹⁰
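The pattern just described (common attributes, plus namespace elements, plus associations that repeat the pattern) can be sketched in a few lines of Python. This is not the actual XooML schema: the namespace URIs and the names fragment, association, displayName, displayText, view, and position are placeholders chosen for illustration. Only associatedXooMLFragment is taken from the description in this chapter. For the real definition, consult the schema itself.

```python
import xml.etree.ElementTree as ET

# Placeholder namespaces: neither is the real XooML schema URI.
XOOML_NS = "http://example.org/xooml-sketch"
TOOL_NS = "http://example.org/my-outliner"   # one hypothetical application

ET.register_namespace("x", XOOML_NS)
ET.register_namespace("outliner", TOOL_NS)


def q(ns, name):
    """Build a namespace-qualified tag name in ElementTree's {uri}name form."""
    return f"{{{ns}}}{name}"


# Fragment: tool-independent (fragment common) attributes ...
fragment = ET.Element(q(XOOML_NS, "fragment"), {"displayName": "Find a lender"})
# ... plus a tool-accommodating (fragment namespace) element ...
ET.SubElement(fragment, q(TOOL_NS, "view"), {"collapsed": "false"})

# ... plus zero or more associations, each repeating part of the pattern.
links = [("Loan application", "loan-application/"), ("Rate quotes", "rate-quotes/")]
for text, target in links:
    assoc = ET.SubElement(
        fragment,
        q(XOOML_NS, "association"),
        {"displayText": text, "associatedXooMLFragment": target},
    )
    # Data private to one application lives in its own namespace element.
    ET.SubElement(assoc, q(TOOL_NS, "position"), {"x": "10", "y": "20"})

xml_text = ET.tostring(fragment, encoding="unicode")
print(xml_text)
```

An application that knows nothing of the “outliner” namespace can still read the fragment and its associations, and it can add a namespace element of its own without disturbing the data another application has stored.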

A few variations in the use of XooML are worth noting:

1. Support a metadata standard. A collection of applications might all work with the information in a namespace element. For example, applications self-described as supporting Dublin Core might each work with elements (at both the fragment and association levels) accordingly identified (e.g., xmlns:dc=“http://purl.org/dc/elements/1.1/”). Other tools might work with iCalendar namespace bundles (e.g., with xmlns:ic=“urn:ietf:params:xml:ns:icalendar-2.0”).

2. Make bibliographic references, tasks, and to-dos. As an extension to variation #1, namespace sub-elements needn’t be restricted just to “surface” information for display (e.g., position, color, shape, etc.). Elements can contain information needed for an association or a fragment to work (appear and behave) as a task, an appointment, a bibliographic reference, etc.³¹¹ For example, an iCalendar attribute bundle could provide the necessary data for a fragment or an association to behave as an event (ic:vevent) or a “to-do” (ic:vtodo).

3. Use associations or whole fragments? As #2 suggests, the namespace bundle needed to make an association work in a special way (e.g., as a task, an appointment, a reference, etc.) could just as easily be placed at the level of the fragment as a whole. Fragment or association? The answer depends upon whether we wish for the “thing” involved to behave as a grouping item in its own right—i.e., capable of linking to other items and, in turn, capable of being linked to.³¹²

4. Support RDF. Namespace bundles can also be used in support of RDF (i.e., using namespaces with the following assignment: xmlns:rdf=“http://www.w3.org/1999/02/22-rdf-syntax-ns#”). The grouping item mirrored might then serve to group together SPO triples pertaining to the same subject.

5. Represent a multidigraph. Fragments, as nodes, link one to another via an association attribute, associatedXooMLFragment (an association-common attribute). Two or more associations of a fragment can link to the same fragment or even to the fragment itself. As such, fragments in aggregate have the flexibility to represent a multiple digraph or multidigraph.³¹³ XooML has the flexibility needed, for example, to model the hyperlink structure of the Web as a whole.

6. Represent a hypergraph.³¹⁴ In example #4 and with reference to graph theory, an association is an edge (link) with three vertices (nodes): one each for the subject, predicate, and object of the proposition. More generally, an association, through the additional attributes of a namespace element, can link to any number of nodes.
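Variation #5 can be made concrete with a small sketch in which each fragment is reduced to the list of fragments its associations point to. The fragment names are invented. Two associations from one fragment to the same target, and an association from a fragment to itself, are exactly what distinguishes a multidigraph from a simple directed graph.

```python
from collections import Counter

# Hypothetical fragments: name -> targets of its associations, in order.
fragments = {
    "home": ["projects", "projects", "home"],   # parallel links and a self-link
    "projects": ["home"],
    "archive": [],                              # a fragment with zero associations
}


def edge_multiplicities(frags):
    """Count how many associations run from each source to each target."""
    return Counter(
        (src, dst) for src, targets in frags.items() for dst in targets
    )


edges = edge_multiplicities(fragments)
parallel = {edge: n for edge, n in edges.items() if n > 1}
self_links = [src for (src, dst) in edges if src == dst]

print("parallel associations:", parallel)
print("self-links:", self_links)
```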

In the XooML approach, our information stays where it is. Leave the information organized into the folders of our local file system or the synchronized folders of a web-based storing application like Dropbox. Leave the information in the albums of Facebook or the notes of Evernote or the tasks of Remember The Milk. XooML-speaking applications work with our information “as is” via APIs supported by the storing applications (i.e., the applications through which the information is currently stored).

The XooML approach doesn’t presume that these existing applications will (ever) change. Nor does it presume that we will ever move our information from these applications. Nor does XooML’s success depend upon the adoption of new standards or the dominance of some new unifying storage “vault.” The XooML approach allows, instead, for an incremental approach in which integration happens through the supported APIs of existing applications and through a gradual accumulation of XooML-speaking apps built or retrofitted to “speak XooML” in order to work with our information through these existing applications.

But how? How can apps be built or modified to speak XooML . . . correctly? clearly? consistently? And how much work are XooML-speaking apps required to invest in order to speak not only XooML but also the API of a storing application? The answer, in short, is “not much.” But this is a topic for the next inset in this “Taking back our information” series.

8.3 CAVEATS AND CONSIDERATIONS

Following the structure of chapters for saving and searching, this section would be titled “Caveats and disclaimers.” But we’ve considered a number of disclaimers already in the context of the chapter’s review of the Semantic Web. In short: The grand vision of the Semantic Web as a global, interconnected, machine-readable representation of meaning may never be realized, and neither may efforts to extend the Semantic Web into the realm of PIM.

If some form of a grand integrative representation of structure for meaning is eventually realized, it will most likely happen from the ground up. Efforts to represent structured information, in-line in the HTML/XHTML representation of web pages, and to share this information, have been successful and have proven very useful. Whether these islands of structure—for contact information, bibliographic references, events and so on—are eventually linked into larger, grander representations of meaning remains to be seen.

This section keeps a focus on caveats (warnings, concerns, exceptions). But then, in a more constructive vein, caveats are grouped by considerations that apply to any initiative to make more effective use of information structure—whether the initiative is grand and global or limited and local (as in “I need a better system for organizing my stuff”).

8.3.1 CONSIDERATION #1: WHAT IS THE SMALLEST UNIT FOR A “MEANINGFUL” SHARING OF STRUCTURE?

Efforts to share meaning in structure can occur at two distinctly different levels. Call these:

An “atomic” level in the form of the beguilingly simple subject-predicate-object (SPO) triples of RDF.

A “molecular” level in “micro”-nuggets of information conforming to a schema such as the contact information of a vCard/hCard, the bibliographic information of a COinS citation or the “events,” “organizations,” “people,” etc., as specified through schema.org.

There is a seemingly similar question to that of Consideration #1: What is the smallest unit for the structured representation of meaning? But this question is actually quite different and out of scope for a proper treatment here. The answer to this second question may very well be the 3-tuple of an SPO statement as represented through RDF. But some might argue that our “atoms” for the representation of meaning through structure need to be larger or that, conversely, they might be smaller still—i.e., that we might possibly create a structured representation of meaning out of 2-tuples (pairs).

The question of Consideration #1 is what is the smallest unit that can be meaningfully shared, amongst ourselves and with our applications? In his blog post “The Ultimate Problem of RDF and the Semantic Web,”³¹⁵ Vuk Miličić contests the frequent characterization that “RDF is just triples” as an “illusion.” But RDF is, at its essence, an expression of information in simple subject-predicate-object statements, i.e., as SPO triples. Triples are the unit for the expression of meaning. The illusion then may be in ever thinking we can infer or share meaning at the level of the SPO triple.

An example illustrates. If I happen across the statement “John’s cell phone number is +1 888-888-8888” then, even if “+1 888-888-8888” is a perfectly working phone number and even if I’m clear which John is being referred to (in an actual RDF statement, “John” would be identified by a URI), I might still have questions. When was this statement made? By whom? Based on what? A reliable source or my own scribbles hurriedly made on a scrap of paper? Sure, I could give the number a try anyway. But what if this is no longer John’s number? What if the number has been assigned to someone else who happens to be in a different time zone? (This actually happened to me once. I found myself making apologies to a groggy, irritated stranger.)

The questions above are meant to elicit metadata (data about data, information about information) concerning the statement’s provenance.³¹⁶ We rarely take a statement at face value in our daily lives. We want to know who is making the statement. When? Where? From the answers we can make a judgment about the statement’s current validity. A forecast of “rain today” may no longer be valid if it turns out that the forecast was made last week (unless, perhaps, if the forecast was made for Seattle in the wintertime).

And what happens if two statements conflict with one another as when we have two statements concerning John’s current telephone number? Do we compare metadata statements of provenance associated with each? But then, what about the metadata provenance of these metadata statements? Statements concerning when a statement was made, where, and by whom, are themselves subject to the same questions of provenance.

In simple cases where a “micro” chunk of structured information is communicated in-line, we’re spared infinite regress by making reasonable assumptions. If we trust what we see in a web page’s display, for example, we’re also inclined to trust the structured information within. This isn’t foolproof. But the assumption mostly works. We trust the hCard information we get through a person’s website. We trust the COinS information that comes from a publication database or a researcher’s site with its “list of publications.” If a website is masquerading as that for a person or an academic institution, that is another matter entirely. More likely is that the micro-information is improperly formatted or out of date or that the visible web content was updated but not the micro-information. Cross-checking and sanity checking are always advised. Reasoning “outside” the space of information, we might say, for example, “this can’t be her current email address—I got a message from her sent via another email address. I wonder if the rest of the information is wrong too?”

For some formats and facilities, updates and provenance are built in. The vCard we have may be out of date but we might be able to query SOURCE to get an updated copy. Wikis provide information concerning provenance in the form of a revision history. Increasingly, we should expect (insist) that the information we work with includes at least minimal information concerning the who, when, and where of its origin and possibly including an “expiration date.”³¹⁷

In the meaningful sharing of structure, we need to work at a “molecular” level where the sharing is not of just one but of a constellation of interrelated, schema-conforming, namespace-qualified statements. The individual statements, including statements of provenance, might themselves be fully expressed in RDF (or not). But the sharing is of a grouping of statements.
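A minimal sketch of such a “molecule” follows: a statement bundled with the who and when of its origin and an optional expiration date. The field names, the sources, the dates, and the second phone number are all invented for illustration, and the policy used to settle a conflict (discard expired statements, then prefer the most recent) is only one naive assumption among many possible.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    source: str                        # who made the statement
    asserted_on: date                  # when it was made
    expires_on: Optional[date] = None  # optional "expiration date"


def most_credible(statements, today):
    """Naive policy: drop expired statements, then prefer the most recent."""
    live = [s for s in statements if s.expires_on is None or s.expires_on >= today]
    return max(live, key=lambda s: s.asserted_on, default=None)


# Two conflicting statements about John's cell phone number (hypothetical data).
scribble = Statement("John", "cellPhone", "+1 888-888-8888",
                     source="scrap of paper", asserted_on=date(2010, 3, 1))
hcard = Statement("John", "cellPhone", "+1 888-888-0000",
                  source="John's website", asserted_on=date(2013, 6, 15))

best = most_credible([scribble, hcard], today=date(2013, 7, 1))
print(best.obj, "according to", best.source)
```

Note that the provenance fields are themselves statements open to question, which is the regress described above. The sketch simply stops at one level and trusts it.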

We also note that making statements of provenance in RDF about a statement requires a reification of the statement, i.e., the statement is given its own URI and statements of provenance refer to the statement via its URI.³¹⁸

Reification in RDF is needed not just for statements of provenance but even to represent simple sentences such as “John gave Mary the book.” We might, for example, assign a URI to the SPO triple representing that “John gave the book” and then address this statement through its URI as an object in its own right to add that “Mary is recipient” (of the act of John’s giving the book).
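The mechanics of reification can be sketched with plain tuples standing in for triples. The terms rdf:type, rdf:Statement, rdf:subject, rdf:predicate, and rdf:object are the standard RDF reification vocabulary. Everything under the example.org namespace, including the “recipient” and “assertedOn” predicates and the date, is a hypothetical stand-in.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"   # hypothetical namespace for this sketch


def reify(statement_uri, s, p, o):
    """Describe the triple (s, p, o) as a resource with its own URI."""
    return [
        (statement_uri, RDF + "type", RDF + "Statement"),
        (statement_uri, RDF + "subject", s),
        (statement_uri, RDF + "predicate", p),
        (statement_uri, RDF + "object", o),
    ]


# "John gave the book", given a URI so that other statements can refer to it.
giving = EX + "statement/1"
triples = reify(giving, EX + "John", EX + "gave", EX + "TheBook")

# The statement can now serve as the subject of further statements.
triples.append((giving, EX + "recipient", EX + "Mary"))     # "Mary is recipient"
triples.append((giving, EX + "assertedOn", "2013-05-01"))   # provenance

about_giving = [(p, o) for s, p, o in triples if s == giving]
for p, o in about_giving:
    print(p, "->", o)
```

One simple sentence has become six triples, which gives some sense of how quickly such expressions grow.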

Why bother? Well, if I want the book it matters. But then provenance also matters. If John has lent the book out on several occasions I want to be sure that the statement “John gave Mary the book” represents the current state of affairs.

Needless to say, RDF expressions, no matter how these are serialized, can get quite involved (and difficult for people to read). Source code is also hard for people to read, the rejoinder might be, and even more so compiled code. But then a counter (to this counter) is that some readability and an ability to edit directly (e.g., in a plain text editor) was a key part of the success of HTML and may be so as well in efforts toward the representation and sharing of meaning on the Web.

Levels of meaningful sharing and issues of provenance are not just the province of researchers working on the Semantic Web. We encounter a variation of the problem of provenance whenever we come across a document such as “Very important marketing report, Final Version.” “Final version”? Really? We’re wise to take such a statement with a few grains of salt. We’re more confident that the document really is the final version if we’re able to view its entry in the context of a folder listing all versions of the document, sorted by “last modified.”
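The folder listing described here is easy to sketch. The file names and modification times below are invented, and the sketch builds a temporary folder only so that it can run anywhere.

```python
import os
import tempfile


def versions_by_last_modified(folder):
    """Return the file names in a folder, most recently modified first."""
    with os.scandir(folder) as entries:
        files = [(e.stat().st_mtime, e.name) for e in entries if e.is_file()]
    files.sort(reverse=True)
    return [name for _, name in files]


# Build a throwaway folder holding three "versions" of a report.
base = 1_600_000_000   # an arbitrary timestamp (seconds since the epoch)
samples = [
    ("report_draft.docx", base),
    ("report_final.docx", base + 3_600),
    ("report_final_v2.docx", base + 7_200),
]

with tempfile.TemporaryDirectory() as folder:
    for name, mtime in samples:
        path = os.path.join(folder, name)
        open(path, "w").close()
        os.utime(path, (mtime, mtime))   # set access and modification times
    listing = versions_by_last_modified(folder)

print(listing)
```

In this invented folder the file labeled “final” turns out not to be the most recent one, which is precisely the kind of context that a single file name cannot give us.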

8.3.2 CONSIDERATION #2: HOW MUCH MEANING CAN BE SHARED (RELIABLY, USEFULLY) THROUGH STRUCTURE?

The question of Consideration #2 points to the heart of discussions concerning the relative merits of ontologies vs. taxonomies. What’s the difference?³¹⁹

Let’s start with origins and definitions:

Ontology. “Originally Latin ontologia (1606, Ogdoas Scholastica, by Jacob Lorhard (Lorhardus)), from Ancient Greek ὢν (‘on’), present participle of εἰμί (‘being, existing, essence’) + λόγος (logos, ‘account’).”³²⁰

The modern general meaning closest to our purposes is “The science or study of being; that branch of metaphysics concerned with the nature or essence of being or existence.”³²¹

Taxonomy. “It derives from the French taxonomie coined from the Greek words taxis (τάξις; order, arrangement) + -nomia (method) from -nomos (νόμος; managing, law) from nemein (manage, distribute, put in order).”³²²

The modern general meaning closest to our purpose is “Classification, esp. in relation to its general laws or principles.”³²³

Ontologies are often thought to trump taxonomies in a manner similar to the way “knowledge” is considered by some to be a stronger playing card than “information.” In the case of knowledge vs. information, why wouldn’t we want to manage knowledge representing, for example, expertise in an area, rather than mere information? (Or, even better, why not some kind of “mind meld”?) Likewise, in the contrast between ontologies and taxonomies, isn’t it better to represent the “nature or essence” of things rather than merely to classify?

If only we could. But we can’t. Elsewhere, I argue that information is a thing to be managed.³²⁴ Knowledge, by contrast, is “information in action”—to be inferred from actions and behavior—ours or an organization’s. In the other direction, we acquire new knowledge for an area through a process of instillation rather than “installation,” i.e., we learn. The information we experience must be made sense of, internalized, and integrated with the knowledge already in our heads. No such thing as a “mind meld.”

All of our efforts to “capture” knowledge and to express as a thing in its own right produce information instead, albeit often in a more complicated form that is more difficult for us to understand and maintain. And then, even so, the information so rendered may not be all that consistently understandable by our computing applications either (for reasons discussed in the previous section).

But the analogy that ontologies are to taxonomies as knowledge is to information is imperfect. I’m not a philosopher and this is not the place anyway to deal with questions concerning the differences between ontology and epistemology.³²⁵ Suffice it only to say that “ontology” as used in an informational, computational context³²⁶ is quite different from its use in philosophy.

In an informational context, efforts to produce either an “ontology” or a “taxonomy” result in external expressions of an internal understanding, i.e., information representing knowledge. Guarino et al. note that “The backbone of an ontology consists of a generalization/specialization hierarchy of concepts, i.e., a taxonomy.”³²⁷

But then, with some rough connection to the philosophical roots of ontology, we can agree that an ontology in an information context should, somehow, be “more” than a taxonomy, i.e., the aims behind the creation of an ontology are more ambitious and there is an attempt to convey more meaning in the structure of an ontology, in contrast to the structure of a taxonomy.

What then is an ontology that makes it “more” than a taxonomy?

A taxonomic structure is often a hierarchy or rooted tree³²⁸ (i.e., where, informally, every node—taxon, category—except for the root has exactly one parent).³²⁹ But this is not a restriction. We can allow for the possibility that an information item, for example, can be “classified” in more than one way. We do so when we apply more than one tag to an item or—if folders are used instead as the grouping item—when we place a link (e.g., via a “shortcut” or “alias”) to an item that isn’t “contained” within the folder.
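The informal definition of a rooted tree can be checked mechanically. In this sketch a taxonomy is represented, by assumption, as a mapping from each node to the list of its parents. The node names are invented.

```python
def is_rooted_tree(parents, root):
    """True if every node except the root has exactly one parent and
    following parents from any node leads to the root without cycling."""
    for node, node_parents in parents.items():
        expected = 0 if node == root else 1
        if len(node_parents) != expected:
            return False
    for start in parents:
        seen = set()
        node = start
        while node != root:
            if node in seen or node not in parents:
                return False   # a cycle, or a parent that is not in the taxonomy
            seen.add(node)
            node = parents[node][0]
    return True


# A strict hierarchy: each item is "contained" in exactly one folder.
strict = {
    "All": [],
    "Work": ["All"],
    "Family": ["All"],
    "Budget.xlsx": ["Work"],
}

# The same items, but the budget is also reachable from "Family"
# (e.g., through a second tag, a shortcut, or an alias).
multiply_classified = dict(strict, **{"Budget.xlsx": ["Work", "Family"]})

print(is_rooted_tree(strict, "All"))
print(is_rooted_tree(multiply_classified, "All"))
```

The second structure is no longer a tree, but it remains a perfectly serviceable taxonomy in the looser sense described above.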

What, then, distinguishes an ontology from a taxonomy?

Gruber, a well-cited authority on ontologies, offers the following definition: “In the context of computer and information sciences, an ontology defines a set of representational primitives with which to model a domain of knowledge or discourse. The representational primitives are typically classes (or sets), attributes (or properties), and relationships (or relations among class members). The definitions of the representational primitives include information about their meaning and constraints on their logically consistent application.”³³⁰

What does an ontology look like? Portions of sample ontologies are graphically depicted in Figure 8.4 and Figure 8.5.

Figure 8.4: A graphical depiction of the Java Cyber Agent Framework (JCAF) ontology for payload description.³³¹ From Wallace, Leveraging OWL-DL, SPARQL, and XSLT to Automate Java Agent Configuration, 2009. Copyright © 2013 Mediabistro Inc.

One thing evident in both figures and also in Gruber’s definition is an emphasis not only on relating the elements (classes, categories) but also on a more precise specification of the nature of the relationship(s) between two elements. This is the “P” in an SPO triple. This is also the attribute connecting the implicit subject (e.g., of a vCard or COinS reference) to the value (as object).

The relationship between elements in a taxonomy may be characterized as one of class, containment, “copied from,” or “links to” (e.g., a web page is a node in a taxonomy, with links to other web pages via its hyperlinks). But this relation is not explicitly represented in the links of the taxonomy. Links are not typed or labeled.

Does explicit representation of the relationship (the attribute, the predicate) matter? Yes. But not always and everywhere. We care whether a phone number as value is for a person’s mobile phone or land line. Even more so, we care that it is a mobile phone number and not a surface address or a credit card number. Consideration #1 already established the utility of structured information in “micro”-chunks, represented in-line in HTML or XHTML web pages and conforming to a specified schema and with attribute names unambiguous within a specified namespace.

But a more general, consistent, application-independent representation of relationships is problematic. Thinking in terms, now, of SPO triples, we can observe that, though agreement concerning the “S” and the “O” may sometimes be challenging, it is certainly doable. We do the equivalent all the time when we point from one web page to another using a hyperlink. Far more problematic is consistent specification of a link type or, equivalently, the “P” of a triple. Doable to be sure. But the costs are considerable—costs to create, costs to test, costs to update later, and the costs of the inevitable error now and then in the representations.³³³

In most situations of PIM, we must question “Are these costs ever likely to be repaid?” If our interest is mostly in a basic organization of our information according to interest areas (e.g., people, places, general topics of interest) and according to current projects, then the answer is a definite “no.”

Yes, we forgo the potential for a finer-grained support from our computing devices as envisioned under the Semantic Web initiative. But we also forgo the costs associated with a finer-grained representation needed for this support.

In reference to the word “ontology” and its use in connection to information management, Bates notes that we are burdened by philosophic etymology of the word as “describing the world as it truly is, in its essence . . .” when “. . . in fact, we do not actually know how things ‘really’ are. Put ten classificationists (people who devise classifications) in a room together and you will have ten views on how the world is organized.”³³⁴

In the final chapter (“To each of us, our own”) of this book we will consider the development of schemes for the organization of all of our information. These schemes are meant to have sufficient durability and flexibility to last us a lifetime, in contrast to “brittle” schemes of organization that we may develop with high hopes only to see these “break” under the pressure of incoming information.

These schemes are taxonomies not ontologies: Personal unifying taxonomies or PUTs. PUTs can be realized through an assembly of grouping items (see previous inserts) linking to one another and to items of information content.

Grouping items to realize a PUT may come in the form of folders, tags, section tabs, and even ordinary web pages or any combination of these.

As we’ll explore through the next inset in the “Taking back our information” series, the choice of grouping item (and storing application) shouldn’t matter providing that the applications we use do a small number of things the “same” way. But convincing application developers to do things the same way is a tough sell. Sameness should bring benefits not just eventually for the end user but also immediately for developers as the application is being built. This is a topic for Consideration #3.

8.3.3 CONSIDERATION #3: HOW MUCH NEEDS TO BE THE SAME FOR STRUCTURES TO BE SHARED?

How much do things need to be “the same” in order to make better use of structure? This is a question that’s more often of concern to developers than to end users. As end users, we might happily reap the benefits of, for example, “RDF sameness” as described in the Semantic Web initiative but still have little reason to understand or care about the way this sameness is actually implemented, unless the costs of sameness are passed along to us as a degradation in performance or as unexpected behaviors in the agents that are supposedly acting on our behalf.

Sameness matters more to developers who may be charged with using a common infrastructure in support of whatever “sameness” is meant to reign across a particular information landscape. The near uniform support we see now for copy & paste and the clipboard, though clearly a win for us as end users, came at an initial cost to application developers who were required to make modifications in their code in order to use shared (“same”) support for these features.

For every success such as the clipboard, there are many failed attempts at sameness. Consider, for example, initiatives in the hypertext/hypermedia community toward a basic sameness in the storage and semantics of hyperlinks. First came efforts toward open hypermedia systems in which anchors, links, and other structural elements are flexibly defined and have existence independently of the documents to which they apply.³³⁵ The open hypermedia initiative, in turn, inspired a movement toward structural computing³³⁶ as an attempt to generalize the techniques and lessons learned from open hypermedia efforts.

But a shared limitation of these efforts is a “heavy weight” requirement that participating applications make common use of basic utilities and structured storage. The work and the trust involved to do this has been prohibitive. Anderson, in reference to structural computing efforts aimed at integration, notes, for example, that an environment may require “installation of a database, . . . server, . . . support tools . . . clients” and that, in general, these requirements are “too steep.”³³⁷

A little bit of sameness can go a long way toward making our interactions with information easier. In a book chapter, “Unify Everything: It’s All the Same to Me,” Karger³³⁸ notes that unification can come in many different forms. He describes common support for text as one such useful unification. (And, we might say more specifically, support for ASCII and, now, variable-width encoding standards—most notably UTF-8—as a way of supporting the Unicode character set while maintaining backward compatibility with ASCII³³⁹).
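The backward compatibility mentioned here can be checked directly. The short Python sketch below (the sample strings are arbitrary) shows that text made only of ASCII characters produces exactly the same bytes under either encoding, while characters outside ASCII take two or more bytes in UTF-8.

```python
# Pure ASCII text is byte-for-byte identical in ASCII and in UTF-8.
plain = "Harry"
assert plain.encode("ascii") == plain.encode("utf-8")

# Bytes written by an ASCII-only tool therefore decode cleanly as UTF-8.
assert b"Harry".decode("utf-8") == plain

# Characters outside ASCII use more than one byte (variable width).
accented = "\u00e9"   # e with acute accent
euro = "\u20ac"       # euro sign
widths = {ch: len(ch.encode("utf-8")) for ch in ("H", accented, euro)}
```

Here `widths` records one byte for "H", two for the accented letter, and three for the euro sign. This is the sense in which the encoding is variable-width yet leaves older ASCII files valid as they are.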

But especially notable as a success story are the unifications that underlie the Web. As reviewed at the outset of this chapter, the Web is based on three key unifications: HTTP, the URL, and common support among browsers for the rendering of HTML. From these basic unifications has emerged a World Wide Web that may ultimately link nearly all human-generated information in one way or another. The Web grows daily in a highly dynamic, distributed, decentralized fashion. Any one of us can add to and extend the Web without “approval” from a central authority.

One downside of this relatively low level of unification and the informal flexibility it permits is that links break and we occasionally see the 404 error (“page not found”). Most of us likely consider this a fair tradeoff.

Proposed unifications such as a uniform use of RDF for the sharing of structured information between applications may still “go viral” in the manner of the Web unifications. But, notwithstanding the apparent simplicity of the basic SPO triple as a means for representing information, the evidence so far suggests that the costs of consistent, coordinated use of RDF are too heavy and the changes required of participating applications too extreme for RDF to gain widespread adoption.

In the book Keeping Found Things Found³⁴⁰ I make a distinction between unification and integration.³⁴¹ “With integration, pieces fit together to make a more perfect whole but still retain their identity as separate pieces. With unification, the pieces lose independence with respect to the dimension of unification.”

Integration at one level can build upon unification at a lower level. We see it all the time in the form, for example, of web pages generated dynamically through a linking in of different pieces—text, pictures, videos—that still retain their separate identity (and URLs) to be used in other ways in other web pages.³⁴²

We aim for a greater integration of our information through the structures we share—with our applications, with other people and with ourselves over time. Sharing in turn is enabled through some unification—some sameness—in an enabling infrastructure used by applications.

But requisite sameness is a cost to application developers—a cost measured not only in changes to a code base but also in dependencies that could be an ongoing source of bugs and maintenance headaches. In general, the more sameness, the more cost. We then need to be selective and strategic in the unifications we embrace.

It’s all well and good to speak of offsetting benefits in the eventual goodness delivered to the end user (and also to the developer whose application is successful with the end user). But benefits are later and costs are now. In the ideal, sameness also brings some immediate benefit—to developers—as well as delayed benefits to users. This might happen, for example, to the extent that common use of external utilities spares developers the cost of developing these utilities on their own. This is a topic for insert #4 in the “Taking back our information” series.

8.3.4 CONSIDERATION #4: HOW MUCH NEEDS TO CHANGE FOR STRUCTURES TO BE SHARED?

Just as Consideration #3 is more of a concern for developers, Consideration #4 is more of a concern for us as end users. How much must we—are we willing to—change in order to make better use of our information structures?

The question is addressed more generally in a later chapter of Part 3 to this book (the chapter titled, “PIM by Design”): How much—how quickly—are we willing to change our habits of information interaction in order to improve our practices of PIM? How soon do we need to see payback for our efforts?

These questions are especially important in a “web-widened” world where the requirement to install an application on the desktop or even as a browser plugin is often a “nonstarter”—all the more so if any money must be paid.

We return to a previous “post-mortem” comment by Karger on the difficulties of getting people to use Haystack: “in practice we found it difficult to convince people to abandon their long-cherished PIM tools in favor of a half-baked research tool.”³⁴³ I’m not so sure most of us “cherish” the applications we currently use. The relationship between us and our current applications is often more love-hate. I quip, for example, that “MS Word is an app that I hate to use . . . every day.”

Even so, a requirement that we abandon existing applications in favor of new applications—even if these are polished products rather than research prototypes—is usually a non-starter. This holds especially true if we’re asked to move our information to some new storing application or to transform our information in ways that mean we can no longer work with the information through our current applications.

Much better is if information can stay where it is but also be used in new and different ways. One accounting for the tremendous success of Dropbox with consumers is that Dropbox does—almost—exactly this. Yes, we need to move information to be in or under a designated Dropbox folder in our local file system. This is a drawback.³⁴⁴ But information is still in our local file system. Our information stays where it is but now, thanks to Dropbox (or, similarly, with SkyDrive or Google Drive), can be shared with others and synced across our devices.

How can we realize a comparable sharing of our information structures—so that these structures stay where they are (e.g., as folder structures in our local file system) but can now be shared not only with other people but also with a whole new set of applications? This is another question to be addressed in inset #4 of the “Taking back our information” series.

Taking back our information, #4. Taking back our information even as we leave it where it is.

The previous inset in this “Taking back our information” series described the XooML way of using XML to represent—“mirror”—the structure of grouping items (such as folders, tags, “albums,” “notebook,” and ordinary web pages). The XooML representation of structure is “first class” in the two senses described previously in this chapter on structure.

1. The XooML fragment is a modular, tool-independent representation of a “noodle” (a node + outgoing links). Since nodes can link to other nodes (or even to themselves), fragments in aggregate can be seen to form a multidigraph (as described in the previous insert on XooML).

2. A fragment, and each of its associations, can include any number of tool-accommodating namespace elements to store the data an application (or a collection of applications supporting a particular metadata standard) needs in order to give its special spin on the underlying structure.
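The two points above can be pictured with a small data-structure sketch in Python. The class and field names are illustrative assumptions and are not taken from the XooML schema. Each fragment holds one node and its outgoing associations, each association has room for per-application data keyed by namespace, and the fragments taken together form a multidigraph in which parallel links and self-links are both allowed.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Association:
    target: str                                          # id of the node this link points to
    namespace_data: dict = field(default_factory=dict)   # per-app data, keyed by namespace URI

@dataclass
class Fragment:
    node: str                                            # id of the grouping item being mirrored
    associations: list = field(default_factory=list)     # outgoing links only

# Hypothetical grouping items, used only for this example.
fragments = {
    "projects": Fragment("projects", [
        Association("boston-trip"),
        Association("boston-trip"),   # a second, parallel link to the same node
        Association("projects"),      # a node may link to itself
    ]),
    "boston-trip": Fragment("boston-trip", [Association("projects")]),
}

def edge_counts(frags):
    """Count directed edges (source, target) across all fragments."""
    return Counter((f.node, a.target) for f in frags.values() for a in f.associations)

edges = edge_counts(fragments)
```

A count greater than one for a (source, target) pair is what makes the aggregate a multidigraph rather than a simple directed graph.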

The XooML approach, in line with the “Caveats and Considerations” section, focuses on the grouping item as the basis for a meaningful sharing of information structure. As end users, we aren’t required to move our information or otherwise change very much of what we do already in order to take advantage of new ways of working with our information as provided by “XooML-speaking” apps.

But where do these XooML apps come from? How difficult are they to build? How difficult is XooML to use? How well do apps work together? And how much needs to be the “same,” beyond use of XooML, among XooML-speaking apps?

The short answer: itemMirror.

itemMirror³⁴⁵ is an object class supported through a simple code base that can be translated into different programming languages for use on different platforms. We playfully call the code base the zootilities (as a combination of “XooML” as it should be pronounced + “utilities”). itemMirror is currently supported through JavaScript zootilities for use in the construction of HTML5 applications. However, by the time you read this we expect to have a port of itemMirror code to Objective C for use in building applications for iPhone, iPod Touch, and iPad devices on the iOS³⁴⁶ platform.

As mediating software, itemMirror zootilities (or simply, “itemMirror”) has both a front-end and a back-end:

On the front-end, itemMirror “faces” developers and XooML-speaking applications with a simple itemMirror object model.

On the back-end, itemMirror is able to work with (that is, to read and to write changes back to) the structure of various forms of the grouping item via the APIs of storing applications such as Dropbox, Google Drive, SkyDrive, Box, and even social media applications like Facebook (e.g., via Graph API). Interaction with the APIs of storing applications is through itemMirror drivers as specified through association-common attributes and described further below.

All of a XooML-speaking application’s interactions with a grouping item, its storage, and the XooML fragment take place through the itemMirror objects.

On the front-end: itemMirror methods

An application begins a session with a user by instantiating an itemMirror object for a “seed” grouping item. Instantiation can happen either using a known XooML fragment for the grouping item or, if such a fragment is not available, by creating one.

Thereafter, instantiate additional itemMirror objects recursively for each link of the grouping item under consideration according to app-specific settings from a previous session with the user. In the desktop version of Planz, for example, expansion happened in order to reconstruct the state of the outline view from the previous session—where each heading/subheading of an outline corresponded to a different folder/subfolder (as the grouping item). Expansion was driven by an “isCollapsed” attribute in the namespace element that Planz kept for each association of a grouping item.
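The recursive expansion just described might look something like the following Python sketch. It is not Planz or itemMirror code: the folder tree is a hypothetical in-memory dictionary, and appending to a list stands in for instantiating a mirror object. What it illustrates is only the control flow, in which a collapsed association is listed as a heading but not expanded.

```python
# Hypothetical folder tree: each folder maps to a list of (child, is_collapsed),
# where is_collapsed plays the role of the per-association "isCollapsed" setting.
tree = {
    "Projects": [("Boston trip", False), ("Taxes", True)],
    "Boston trip": [("Hotel", False)],
    "Taxes": [("Receipts", False)],
    "Hotel": [],
    "Receipts": [],
}

def expand(item, depth=0, outline=None):
    """Add `item` to the outline, then recurse into each link not marked collapsed."""
    if outline is None:
        outline = []
    outline.append((depth, item))
    for child, is_collapsed in tree[item]:
        if is_collapsed:
            # Shown as a heading, but its own links are not visited.
            outline.append((depth + 1, child))
        else:
            expand(child, depth + 1, outline)
    return outline

outline = expand("Projects")
```

Starting from the seed "Projects", the outline reaches "Hotel" under "Boston trip" but stops at "Taxes", so "Receipts" is never visited until the user expands that heading.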

Methods of the itemMirror object provide support for the following:

List associations.

Create an association.

Delete an association.

Save—i.e., save the XooML fragment back to its file.

Sync—synchronize to ensure that the XooML fragment accurately reflects the structure of the grouping item it is mirroring. (In cases of conflict, the grouping item wins.)

Create namespace element—the app provides a namespace URI as an argument. Elements can be created at both the fragment level and for each of a fragment’s associations.

Delete namespace element.

Get and set of common attributes—both at the level of the fragment and for each of its associations.

Get/set namespace element.
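As a rough illustration of how several of these methods fit together, here is an in-memory sketch in Python. It is not the itemMirror API: the class name, method names, and the example namespace URI are all assumptions, a plain list stands in for the grouping item in storage, and saving to a file is omitted. The point of interest is the sync rule, where the grouping item wins any conflict.

```python
class MirrorSketch:
    """In-memory stand-in for a mirror object (hypothetical, for illustration)."""

    def __init__(self, folder):
        self.folder = folder        # the grouping item: names actually in storage
        self.associations = {}      # the fragment: name -> {namespace URI: data}

    def list_associations(self):
        return sorted(self.associations)

    def create_association(self, name):
        # Write through to the grouping item as well as the fragment.
        if name not in self.folder:
            self.folder.append(name)
        self.associations.setdefault(name, {})

    def delete_association(self, name):
        if name in self.folder:
            self.folder.remove(name)
        self.associations.pop(name, None)

    def set_namespace_data(self, name, namespace_uri, data):
        self.associations[name][namespace_uri] = data

    def sync(self):
        """Make the fragment match the grouping item; the grouping item wins."""
        actual = set(self.folder)
        for name in list(self.associations):
            if name not in actual:
                del self.associations[name]
        for name in actual:
            self.associations.setdefault(name, {})

folder = ["notes.txt", "budget.xlsx"]
mirror = MirrorSketch(folder)
mirror.sync()
mirror.set_namespace_data("notes.txt", "http://example.org/outliner", {"isCollapsed": True})

# The folder changes outside the app, e.g., through another tool.
folder.remove("budget.xlsx")
folder.append("itinerary.pdf")
mirror.sync()
```

After the second sync the fragment lists "itinerary.pdf" and "notes.txt". The association for the removed file is gone, while the app-specific data stored for "notes.txt" survives.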

A promise to application developers is that an application, developed once, will work no matter what the storing application is, whether Dropbox, Google Drive, SkyDrive, Box or, even, some applications that we don’t think of as “storing” (e.g., Facebook).

On the back-end: itemMirror drivers

Key for uniform support across all storing applications are association-level attributes, specified in the XooML, that point to the code needed to read the structure of a grouping item and its mirroring XooML fragment, and to handle the logic of synchronization between grouping item and fragment:

itemDriver

xoomlDriver

syncDriver
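One way to picture the role of such driver attributes is the Python sketch below. Everything in it is assumed for illustration: the two driver classes, the registry, the driver names "localFolder" and "taggedNotes," and the dictionary standing in for a fragment. Only the itemDriver attribute is shown; a xoomlDriver and a syncDriver could be resolved in the same manner.

```python
class LocalFolderDriver:
    """Hypothetical item driver backed by an in-memory dict of folders."""
    def __init__(self, folders):
        self.folders = folders            # folder name -> set of file names

    def list_children(self, item_id):
        return sorted(self.folders[item_id])

class TaggedNotesDriver:
    """Hypothetical item driver for a store that groups notes by tag."""
    def __init__(self, notes):
        self.notes = notes                # note title -> set of tags

    def list_children(self, item_id):
        return sorted(title for title, tags in self.notes.items() if item_id in tags)

# Registry keyed by the name a fragment would carry in its itemDriver attribute.
ITEM_DRIVERS = {
    "localFolder": LocalFolderDriver({"Boston trip": {"hotel.pdf", "flights.txt"}}),
    "taggedNotes": TaggedNotesDriver({"Packing list": {"Boston trip"}, "Recipes": {"Food"}}),
}

def list_associations(fragment):
    """App-facing call: identical no matter which driver the fragment names."""
    driver = ITEM_DRIVERS[fragment["itemDriver"]]
    return driver.list_children(fragment["item"])

as_folder = list_associations({"item": "Boston trip", "itemDriver": "localFolder"})
as_tag = list_associations({"item": "Boston trip", "itemDriver": "taggedNotes"})
```

The application calls `list_associations` the same way in both cases. Whether "Boston trip" is a folder or a tag is a detail handled by whichever driver the fragment names.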

A class project to use itemMirror³⁴⁷

itemMirror (and through itemMirror, XooML) was successfully used by a group of sixteen Master of Science in Information Management (MSIM) students as part of an independent study project at the Information School at UW during the spring quarter of 2013. The class split into five teams and was asked to build end-user applications in HTML5 using itemMirror objects.

Teams built the following apps (available to try out at http://keepingfoundthingsfound.com/itemmirror/):

• Noot, an application that used tags to categorize information

• Planz5, a way to organize information by project

• StormNote, a simple note-taking application

• Mind-mapper, a way to visualize and link information

• NoteU, an application that created an effective use of check-lists

By working with itemMirror objects—one, for example, per Dropbox folder—student applications could focus on the “front end” and the user experience, while JavaScript drivers accessed through the itemMirror objects worked directly with Dropbox to ensure that applications also worked well with each other. Apps each worked with the same folder hierarchies as shared through Dropbox but each in their own way (in support of mind-mapping, outlining, note-taking, to-do list management, and quick capture).

The essential steps of the XooML/itemMirror approach are simple (see Figure 8.6):

1. Leave the information where it is (in “storing” apps and services such as Dropbox, Google Drive, Facebook, etc.).

2. Model the structure of this information using itemMirror objects that . . .

a. support the same methods on the front-end but, . . .

b. on the back-end, work with drivers specific to a given application and its API.

c. Drivers provide read/write access to information structures that are “siloed” in the storing app.

3. Now other apps working exclusively through these itemMirror objects might provide complementary ways of working with the information structures.

4. itemMirror objects persist their “mirrors” of structure in synchronized XML fragments according to a “XooML” schema that is both app-independent and also, using XML’s namespace convention, “app-accommodating.”

a. Fragments can also support the requirements of metadata standards such as Dublin Core and iCalendar.

Figure 8.6: The essential steps of the XooML/itemMirror approach.
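Step 4 relies on XML’s namespace convention to keep shared structure and app-specific data side by side. The Python sketch below shows that general mechanism using the standard library’s ElementTree module. It does not follow the actual XooML schema: both namespace URIs and every element and attribute name are placeholders chosen for this example.

```python
import xml.etree.ElementTree as ET

# Placeholder namespaces: one for the shared structure, one for a single app.
CORE_NS = "http://example.org/fragment"
APP_NS = "http://example.org/outliner"
ET.register_namespace("frag", CORE_NS)
ET.register_namespace("outliner", APP_NS)

# Build a fragment with one association.
fragment = ET.Element(f"{{{CORE_NS}}}fragment", {"itemDriver": "localFolder"})
assoc = ET.SubElement(fragment, f"{{{CORE_NS}}}association", {"target": "hotel.pdf"})

# App-specific data lives in its own namespace, alongside the shared structure.
ET.SubElement(assoc, f"{{{APP_NS}}}settings", {"isCollapsed": "true"})

xml_text = ET.tostring(fragment, encoding="unicode")

# A different app reads the shared structure and ignores namespaces it does not know.
parsed = ET.fromstring(xml_text)
targets = [a.get("target") for a in parsed.findall(f"{{{CORE_NS}}}association")]
```

An app that knows nothing of the "outliner" namespace still recovers the list of targets, while the outlining app finds its own settings untouched. This is the sense in which a fragment can be both app-independent and app-accommodating.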

¹⁸⁵ A blog post by Alex Payne, “The Case Against Everything Buckets” (https://al3x.net/2009/01/31/against-everything-buckets.html), nicely argues for the use of structure as realized through our local “free” file system. The post cautions against “Everything Buckets” that promise that we can save and search (e.g., our notes, to-dos, web references, etc.) without a need to structure. The post notes that “everything” applications (e.g., Evernote) are backed by some database, often proprietary. The structure realized through such an application is much less easily shared than are structures realized through a local file system. The post was referenced in a later post by Adam Pash where it prompted considerable discussion (http://lifehacker.com/5666954/avoid-everything-buckets-aka-why-i-cant-get-into-apps-like-evernote, scroll to the comments at the end).

¹⁸⁶ See for example, http://www.quora.com/What-percentage-of-the-web-does-Google-index-and-how-has-it-changed-over-time.

¹⁸⁷ See http://en.wikipedia.org/wiki/PageRank; and also Bryan & Leise, 2006.

¹⁸⁸ J. Teevan, 2006b.

¹⁸⁹ With apologies to Samuel Taylor Coleridge, http://en.wikipedia.org/wiki/The_Rime_of_the_Ancient_Mariner.

¹⁹⁰ Definitions are taken from the Wiktionary (http://en.wiktionary.org/wiki/structure).

¹⁹¹ We can be very glad, for example, that the structure of water leads to an unusual property that its density decreases when cooled below 4 °C. See http://en.wikipedia.org/wiki/Properties_of_water, http://chemistry.about.com/od/chemistryfaqs/f/icefloats.htm, http://wiki.answers.com/Q/Why_does_ice_float_in_water.

¹⁹² Consider, for example, the traditional four movement structure of a “classical” symphony (http://en.wikipedia.org/wiki/Symphony) in which at least one movement is composed in sonata-allegro form, which is itself typically structured into sections of exposition, development, and recapitulation (http://en.wikipedia.org/wiki/Sonata_form).

¹⁹³ For a list of poetry forms (and structures) see http://thepoetsgarret.com/list.html. See, for example, the structure of a villanelle (http://en.wikipedia.org/wiki/Villanelle).

¹⁹⁴ There is for example, Durkheim’s theory “Structural/Functionalism” (http://www.wavesofwords.4t.com/theorywebpage.htm).

¹⁹⁵ There is discussion, for example, of an appropriate level of structure in organizations. Optimal structure varies with the organization, its charter and competition. See for example, J. P. Davis, Eisenhardt, & Bingham, 2009; Ogollah & Bolo, 2009; H. A. Simon, 1971.

¹⁹⁶ This emergent magic of structure is nicely described by Chaisson, 2002. See also Pullan & Bhadeshia, 2000; Herbert A. Simon, 1962.

¹⁹⁷ Implicit (entangled, twisted together). http://en.wiktionary.org/wiki/implicit.

¹⁹⁸ Karger, 2007.

¹⁹⁹ See Erickson, 1996, for a personal account of the challenges in creating and maintaining a system for organizing personal information.

²⁰⁰ See, for example, Bruce, Wenning, Jones, Vinson, & Jones, 2010, for the results of a longitudinal study into the ways people manage information related to a project they wish to complete. Participants, after relating locations of information related to a project and reviewing the various applications used in the management of project-related information, were asked to reflect upon their “ideal system.”

²⁰¹ For accounts of how project-related information can be scattered by the very applications we use to manage this information see Bergman et al., 2006; Boardman & Sasse, 2004.

²⁰² See, for example, Bruce et al., 2010; W. Jones, Phuwanartnurak, et al., 2005, for reviews of empirical studies done in the Keeping Found Things Found group.

²⁰³ Again, I use the term “desktop” in connection with applications, information, and information structures that are managed through the operating system of a laptop or desktop computer.

²⁰⁴ http://www.dataliberation.org/.

²⁰⁵ William Jones, 2012.

²⁰⁶ Our information structures accomplish a componentization of our information that can be contrasted with and is complementary to efforts toward a componentization of the software we use in our applications for information management. See, for example, http://en.wikipedia.org/wiki/Component-based_software_engineering. The dynamic assembly of reusable software components into a larger application has often proven to be demanding of computing resources—and the patience of users. See, for example, http://en.wikipedia.org/wiki/OpenDoc#Problems. If we can use our information structures to focus more selectively on a “chunk” of information (e.g., a paragraph, a photo, a single figure, or table) then less assembly may be required to realize the applications needed to work with this information.

²⁰⁷ For more on the Keeping Found Things Found initiative see http://keepingfoundthingsfound.com/ and http://kftf.ischool.washington.edu/.

²⁰⁸ The reference is to CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) (see http://en.wikipedia.org/wiki/CAPTCHA and also http://www.captcha.net/). As we register for this or that service on the Web, many of us have likely encountered the CAPTCHA test to type in the letters of a distorted image. The test remains valid even today as a way to distinguish people from computers and heaven help us if algorithms eventually give computers the ability to “see” the letters as well. For an entertaining and thought-provoking exploration into what it means to be human in an age of ever smarter computers see Christian, 2012.

²⁰⁹ http://lucene.apache.org/.

²¹⁰ See Bergman, Whittaker, Sanderson, Nachmias, & Ramamoorthy, 2010; Jones, Phuwanartnurak, Gill, & Bruce, 2005; Teevan, Alvarado, Ackerman, & Karger, 2004.

²¹¹ See Collins & Quillian, 1969; Rosch et al., 1976; Tulving, 1983.

²¹² Barsalou, 1983, 1991; Chrysikou, 2006.

²¹³ Chrysikou, 2006, showed improved problem-solving performance after training to construct goal-derived categories.

²¹⁴ http://www.oyez.org/justices/potter_stewart/, found through http://en.wikipedia.org/wiki/Potter_Stewart#cite_note-oyez_stew-art-10.

²¹⁵ “Concept” relates to “conceive” which derives from “Middle English conceiven, from Old French concevoir, concever, from Latin concipere (‘to take’), from con- (‘together’) + capio (‘to take’).” (http://en.wiktionary.org/wiki/conceive). A concept is a “taking together” of things (ideas, to-dos, features, etc.).

²¹⁶ Burrow, 2004; Jonathan Grudin & Poole, 2010; Phuwanartnurak, 2009.

²¹⁷ W. Jones, Phuwanartnurak, et al., 2005.

²¹⁸ Buchanan, Blandford, Thimbleby, & Jones, 2004; Gamberini & Bussolon, 2001; Marshall & Frank M. Shipman, 1995; Shipman, Hsieh, Moore, & Zacchi, 2004.

²¹⁹ As a complement to the bottom-up process of chunking (or composition), new grouping items achieving a finer-grained partition are also generated through a top-down process of decomposition. We may do this, for example, when we break a larger project such as “Plan trip to Boston” into smaller tasks more appropriately placed on a checklist (e.g., “make the hotel reservation,” “book the flight”).

²²⁰ Schon, 1984, p. 4.

²²¹ See, for example, http://en.wikipedia.org/wiki/Gestalt_psychology#Emergence.

²²² Wikipedia provides a very nice introductory article on the World Wide Web with good pointers to additional information (http://en.wikipedia.org/wiki/World_Wide_Web).

²²³ The Future of Personal Information Management, Part I (William Jones, 2012).

²²⁴ http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol. Or see http://www.w3.org/Protocols/HTTP/AsImplemented.html for the original definition as used in 1991.

²²⁵ http://en.wikipedia.org/wiki/Uniform_resource_locator. See also one of the original specifications of the URL, http://www.w3.org/Addressing/URL/url-spec.txt. Conventionally, the URL has been classed as a sub-type of the Uniform Resource Identifier (URI) (http://en.wikipedia.org/wiki/Uniform_resource_identifier). However, the “contemporary view” is slightly different: (http://www.w3.org/TR/uri-clarification/).

²²⁶ http://en.wikipedia.org/wiki/Hypertext_Markup_Language. See also http://www.w3.org/TR/1999/REC-html401-19991224/ and for HTML5 see http://www.w3.org/html/wg/drafts/html/master/.

²²⁷ http://en.wikipedia.org/wiki/Hyperlink.

²²⁸ For a more formal description of the Semantic Web and access to specifications for its major components see http://www.w3.org/standards/semanticweb/. For a more accessible overview with additional pointers to Semantic Web components and applications see http://en.wikipedia.org/wiki/Semantic_web.

²²⁹ The hypertext community has also generated technologies for information structuring, including taxonomic hypertext (Millard, Moreau, Davis, & Reich, 2000), open hypermedia (Østerbye & Wiil, 1996), spatial hypertext (Marshall & Shipman, 1995), structural computing (Nürnberg, Leggett, & Schneider, 1997), and Xanalogical structure (Theodor H. Nelson, 1999).

²³⁰ Berners-Lee, Hendler, J.A, & Lassila, O, n.d., p. 1.

²³¹ Berners-Lee, 2005.

²³² For more on RDF see the “RDF Primer” of W3C (http://www.w3.org/TR/rdf-primer/) and for a hands-on experimentation, try the W3schools.com RDF tutorial (http://www.w3schools.com/rdf/). For an overview with pointers to additional information try http://en.wikipedia.org/wiki/XML.

²³³ Formal specifications for XML can be found with W3C (http://www.w3.org/XML/). But for a hands-on tutorial see http://www.w3schools.com/xml/. And for a more accessible overview with pointers to follow-on information, try http://en.wikipedia.org/wiki/XML.

²³⁴ For the formal specification of RDFS see http://www.w3.org/TR/rdf-schema/; for a more informal overview with several good references for greater exploration see http://en.wikipedia.org/wiki/RDFS.

²³⁵ For an overview on ontologies in the context of the Semantic Web effort see (Horrocks, 2008). For an example of an ontology see Sowa’s work, http://ontology4.us/english/Ontologies/Upper-Ontologies/Sowa%2520Ontology/index.html. For one person’s efforts to render his personal ontology see http://personalontology.wordpress.com/.

²³⁶ For formal specifications relating to OWL see http://www.w3.org/TR/owl2-overview/; for a more informal overview with pointers to additional information see http://en.wikipedia.org/wiki/Web_Ontology_Language.

²³⁷ http://en.wikipedia.org/wiki/Ontology_(information_science).

²³⁸ http://en.wikipedia.org/wiki/SPARQL.

²³⁹ Berners-Lee et al., n.d., p. 5.

²⁴⁰ http://www.economist.com/node/8134382.

²⁴¹ http://en.wikipedia.org/wiki/DBpedia. See also http://en.wikipedia.org/wiki/Freebase.

²⁴² http://en.wikipedia.org/wiki/FOAF_(software).

²⁴³ http://en.wikipedia.org/wiki/GoPubMed.

²⁴⁴ For an explanation for how semantics and an assessment of meaning can improve search effectiveness see Haller, 2010.

²⁴⁵ http://www.economist.com/node/9716955.

²⁴⁶ http://wesabe.com/.

²⁴⁷ William Jones, 2012.

²⁴⁸ See, for example, Catarci, Dong, Halevy, & Poggi, 2007; Chaffee & Gauch, 2000; Dieng & Hug, 1998; Haase, Hotho, Schmidt-Thieme, & Sure, 2005; Horrocks, 2008; Huhns & Stephens, 1999; Katifori et al., 2008. See also Völkel & Haller, 2009. PIMO (http://www.dfki.uni-kl.de/~sauermann/2006/01-pimo-report/pimOntologyLanguageReport.html; http://www.semanticdesktop.org/ontologies/pimo/) is an effort to support the construction of ontologies in the context of Nepomuk (http://en.wikipedia.org/wiki/NEPOMUK_(framework), http://nepomuk.semanticdesktop.org/), a semantic desktop effort.

²⁴⁹ Davies et al., 2006.

²⁵⁰ See “No knowledge but through information” (William Jones, 2010).

²⁵¹ Katifori et al., 2008, p. 3.

²⁵² http://en.wikipedia.org/wiki/Semantic_desktop. See also http://www.semanticdesktop.org/. For more scholarly treatment, see Chirita, Gavriloaie, Ghita, Nejdl, & Paiu, 2005; Decker & Frank, 2004; Groza, Handschuh, & Moeller, 2007; Sauermann & Heim, 2008; Sauermann, 2005a, 2005b; Sauermann et al., 2006; Sauermann, Bernardi, & Dengel, 2005.

²⁵³ Sauermann & Heim, 2008, p. 467.

²⁵⁴ See, for example, Franz, Ansgar, & Staab, 2009. Of course, evaluation is not only used to demonstrate the value of a prototype or its approach. Evaluation can also be very useful in directing the design and refinement of a prototype. For an example of such an evaluation as applied to the Gnowsis Semantic Desktop prototype see Sauermann & Heim, 2008.

²⁵⁵ E. Adar, Karger, & Stein, 1999; Karger, Bakshi, Huynh, Quan, & Sinha, 2005a; Quan, Huynh, & Karger, 2003.

²⁵⁶ Karger, Bakshi, Huynh, Quan, & Sinha, 2005b.

²⁵⁷ Karger, 2007.

²⁵⁸ McCool, 2005, pp. 88 and 86.

²⁵⁹ Horrocks, 2008.

²⁶⁰ Aït-Kaci, 2009.

²⁶¹ http://haystack.csail.mit.edu/blog/2010/10/20/why-all-your-data-should-live-in-one-application/.

²⁶² Katifori et al., 2008.

²⁶³ McCool, 2005, p. 88.

²⁶⁴ Singh, 2002, p. 1.

²⁶⁵ http://www.well.com/~doctorow/metacrap.htm.

²⁶⁶ See, for example, Sean B. Palmer’s explanation of his decision to “ditch it” after eight years of effort on the Semantic Web (http://inamidst.com/whits/2008/ditching).

²⁶⁷ http://www.w3.org/DesignIssues/LinkedData.html. See also http://en.wikipedia.org/wiki/Linked_data, http://www.w3.org/TR/2013/WD-ldp-20130730/, http://en.wikipedia.org/wiki/Linked_Data#cite_note-DesignIssues-2 http://www.w3.org/wiki/LinkedData and Heath & Bizer, 2011.

²⁶⁸ McCool, 2006, p. 96.

²⁶⁹ Similar arguments can be made in relation to technologies for information structuring generated by the hypertext community including taxonomic hypertext (Millard et al., 2000), open hypermedia (Østerbye & Wiil, 1996), spatial hypertext (Marshall & Shipman, 1995), structural computing (Nürnberg et al., 1997), and Xanalogical structure (Theodor H. Nelson, 1999). While a compelling case can be made for each of these initiatives, none (to my knowledge) has scaled beyond prototypes or very small-scale deployments. More recent work has explored ways to realize some of the goals of these hypertext initiatives in ways that keep constraints and required services to a minimum (K. M. Anderson, 2005).

²⁷⁰ http://haystack.csail.mit.edu/blog/2013/06/10/keynote-at-eswc-part-3-whats-wrong-with-semantic-web-research-and-some-ideas-to-fix-it/.

²⁷¹ Singh, 2002, p. 2.

²⁷² See also community efforts toward a wiki-style authoring and shared use of a knowledge base such as Wikidata (http://www.wikidata.org/wiki/Wikidata:Main_Page) and Semantic MediaWiki (http://en.wikipedia.org/wiki/Semantic_MediaWiki; http://www.semantic-mediawiki.org/wiki/Semantic_MediaWiki). (And for comparisons between the two initiatives see http://semantic-mediawiki.org/wiki/FAQ#What_is_the_relationship_between_Semantic_MediaWiki_and_Wikidata.3F).

²⁷³ Image downloaded from http://milicicvuk.com/blog/2011/07/21/problems-of-the-rdf-syntax/.

²⁷⁴ http://en.wikipedia.org/wiki/Microformats. See also http://knowledge.wharton.upenn.edu/article.cfm?articleid=1247 (“What’s the Next Big Thing on the Web? It May Be a Small, Simple Thing—Microformats” for a discussion that motivates not only Microformats but also RDFa and Microdata formats).

²⁷⁵ See http://en.wikipedia.org/wiki/RDFa and http://www.w3.org/TR/xhtml-rdfa-primer/#html-vs.-xhtml.

²⁷⁶ http://en.wikipedia.org/wiki/Microdata_(HTML). For more technical detail, see http://www.w3.org/TR/microdata/, http://microformats.org/wiki/microdata and http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#microdata.

²⁷⁷ http://en.wikipedia.org/wiki/Schema.org/, http://schema.org/.

²⁷⁸ See http://schema.org/docs/schemas.html.

²⁷⁹ Although the website expresses the intention to track RDFa and Microformats for possible support later (http://schema.org/docs/faq.html#14). See also documentation on Google’s support for “rich snippets” (http://googlewebmastercentral.blogspot.com/2009/05/introducing-rich-snippets.html)—also expressed using Microdata format.

²⁸⁰ http://ocoins.info/; see also http://en.wikipedia.org/wiki/COinS and http://epub.mimas.ac.uk/openurl/KEV_Guidelines-200706.html#sect5_4.

²⁸¹ http://tools.ietf.org/html/rfc6350.

²⁸² http://en.wikipedia.org/wiki/VCard.

²⁸³ http://en.wikipedia.org/wiki/HCard. For a more technical description, see http://microformats.org/wiki/hcard.

²⁸⁴ “An Uber-comparison of RDFa, Microdata and Microformats, The Beautiful, Tormented Machine,” http://manu.sporny.org/2011/uber-comparison-rdfa-md-uf/. See also http://en.wikipedia.org/wiki/Semantic_HTML, http://programmers.stackexchange.com/questions/166612/schema-org-vs-microformats, http://blog.foolip.org/2009/08/23/microformats-vs-rdfa-vs-microdata/, http://ablognotlimited.com/articles/microformats-html5-microdata, http://stackoverflow.com/questions/14307792/what-is-the-relationship-between-rdf-rdfa-microformats-and-microdata, http://stackoverflow.com/questions/2986918/microformats-rdf-or-microdata, http://blog.teamtreehouse.com/writing-semantic-microformats-amp-microdata-in-html-markup and http://evan.prodromou.name/RDFa_vs_microformats. For a concise but incomplete history of microformats (as of my access on August 10, 2013, history stops at 2005) see http://microformats.org/wiki/history-of-microformats.

²⁸⁵ http://www.semclubhouse.com/microformats-rdfa-or-micro-data/.

²⁸⁶ https://support.google.com/webmasters/answer/99170.

²⁸⁷ Ronallo, 2012.

²⁸⁸ Read “The New York Times Blunders Into Linked Data, Pillages Freebase and DBPedia” (http://go-to-hellman.blogspot.com/2009/10/new-york-times-blunders-into-linked.html) for an example of the “wrong” use of RDF. The article also illustrates the intricacies of RDF and imparts, I think, a greater understanding of how easily mistakes can be made in its use.

²⁸⁹ http://www.w3.org/TR/rdfa-lite/.

²⁹⁰ http://manu.sporny.org/2012/mythical-differences/.

²⁹¹ See, for example, the series of posts by Vuk Miličić (http://milicicvuk.com/blog/2011/07/19/ultimate-problem-of-rdf-and-semantic-web/, http://milicicvuk.com/blog/2011/07/21/problems-of-the-rdf-syntax/, http://milicicvuk.com/blog/2011/07/16/problems-of-the-rdf-model-literals/).

²⁹² http://en.wikipedia.org/wiki/HCard. See, for example, the listing of “Specific Microformats” at http://en.wikipedia.org/wiki/Microformat#Specific_microformats.

²⁹³ See http://schema.org/Person and http://en.wikipedia.org/wiki/HResume, respectively.

²⁹⁴ Qu & Furnas, 2005, 2008.

²⁹⁵ http://en.wikipedia.org/wiki/Latin_Quarter_(nightclub).

²⁹⁶ As discussed in the section on saving and also in Chapter 3 of Part 1 to this book.

²⁹⁷ Selective sharing in this manner might use features of existing services such as Google+ (e.g., “circles”) and Facebook (e.g., “friend lists”). See, for example, http://timwhitlock.info/blog/2011/08/circles-vs-friend-lists/, http://www.webpronews.com/facebook-sharing-2011-08, http://www.zdnet.com/blog/facebook/facebook-engineers-bring-google-circles-to-facebook/1885, and http://www.googleplusdaily.com/2013/02/differences-google-plus-facebook.html#.UgfR2ZK1GSo.

²⁹⁸ See for example, http://www.howtogeek.com/howto/15677/zen-and-the-art-of-file-and-folder-organization/ and http://www.pcmag.com/article2/0,2817,2385612,00.asp.

²⁹⁹ http://www.ehow.com/how_138030_top-college-university.html.

³⁰⁰ http://michaelbluejay.com/house/index.html.

³⁰¹ http://www.soyouwanna.com/site/syws/marathon/marathon.html.

³⁰² http://jobsearch.about.com/od/findajob/tp/tensteps.htm.

³⁰³ As we invest more in the structuring of our information we may find application from tools such as IMapping (http://semanticweb.org/wiki/IMapping) which were originally developed in the context of research connected to the Semantic Web initiative.

³⁰⁴ XooML (pronounced “zoom’l”) stands for Cross (X) Tool Mark-up Language. XooML was briefly discussed in this book in Chapter 4 (Part 1, in section 4.1, see especially Figure 4.2).

³⁰⁵ http://keepingfoundthingsfound.com/xooml.

³⁰⁶ William Jones, Anderson, & Whittaker, 2012; William Jones & Anderson, 2011; William Jones, 2011.

³⁰⁷ People involved in the development of XooML and related work over the years include: Dawei Hou, Deen Sethanandha, Sheng Bi, Zhiyong Xie, Jasper Bleijs, Lizhang Sun and Cody Stebbins.

³⁰⁸ We do use JSON for the communication of information to and from applications that use XooML. There has been discussion over the years about introducing namespace conventions to JSON. If this occurs, we may consider a XooML-approach using JSON. For more on the discussion to support namespaces in JSON see http://davidchuprogramming.blogspot.com/2011/10/jsonnet-issue-does-not-support.html, https://www.p6r.com/articles/2010/04/05/xml-to-json-and-back/, http://www.mnot.net/blog/2011/10/12/thinking_about_namespaces_in_json.

³⁰⁹ The “tool-accommodating” extensions provided for in the XooML schema (as namespace elements) are designed to make it possible for any number of applications to work with the same structure. The more applications that can do so, the more “first class” the representation of structure. But there is a more basic reason to provide for application-specific namespace elements (at both the level of a fragment and at the level of an association): There always will be an aspect of our information interaction that is dependent upon and fundamentally intertwined with the specific tools that we use. Furthermore, this tool-dependent aspect of the information interaction cannot be fully explicated in a way that allows for its preservation separate from the tools.

³¹⁰ Namespace elements are uniquely identified via a URI (assigned as the value of the xmlns attribute). For more on namespace conventions of XML see http://www.w3schools.com/xml/xml_namespaces.asp (for a tutorial), http://stackoverflow.com/questions/1181888/what-does-xmlns-in-xml-mean (for some nice examples of use), http://www.w3.org/TR/REC-xml-names/ (for the definitive W3C source), and http://www.ibm.com/developerworks/xml/library/x-nmspace.html (for a detailed developer-centered but very accessible overview).

³¹¹ We might say that through the data stored in namespace elements (as interpreted by the application), the grouping item (or any of its associations) can differentiate in the manner of stem cells to assume the behavior of different item types such as to-dos, appointments, contacts, references, etc. This is the notion of “notions” as described in Chapter 4 (Part 1, “Sometimes a small notion”). Alternatively, namespace elements can be seen to provide a basis for flexible use of many different schemas as argued for by Karger (http://haystack.csail.mit.edu/blog/2013/06/05/keynote-at-the-european-semantic-web-conference-part-1-the-state-of-end-user-information-management/, http://haystack.csail.mit.edu/blog/2013/06/06/keynote-at-eswc-part-2-how-the-semantic-web-can-help-end-users/).

³¹² In this respect, folders as a grouping item are especially plastic. Given the right information within its associated XooML.xml file (in namespace elements), a folder can be made to appear in many different ways.

³¹³ For more on multidigraphs see http://en.wiktionary.org/wiki/multidigraph (for simple definition), http://networkx.github.io/documentation/latest/reference/classes.multidigraph.html (for more formal definition) or http://en.wikipedia.org/wiki/Multigraph#Directed_multigraph_.28edges_without_own_identity.29 (for a nice explanation for how multidigraphs relate to other graph forms). Or, for even more on graph theory, try Bollobas, Bela; Modern Graph Theory, Springer; 1st edition (August 12, 2002). ISBN 0-387-98488-7. XooML is simple but extremely flexible as a means for representing structure. However, in the context of graph theory, its flexibility is not without limit. We note, for example, that XooML is not well-suited to represent undirected multigraphs nor, equivalently, to represent a graph in which links are bi-directional. On the other hand, efforts over the years to support bi-directional links in hypertexts (e.g., Project Xanadu, http://en.wikipedia.org/wiki/Project_Xanadu) have had difficulty gaining widespread adoption whereas the very distributed, decentralized Web, even with its one-way hyperlinks that frequently break (“404 Not Found”) is succeeding brilliantly.

³¹⁴ For more on hypergraphs see http://mathworld.wolfram.com/Hypergraph.html (for simple definition) or http://en.wikipedia.org/wiki/Hypergraph (for more elaborate explanation and references).

³¹⁵ http://milicicvuk.com/blog/2011/07/19/ultimate-problem-of-rdf-and-semantic-web/.

³¹⁶ “From French provenance (‘origin’), from Middle French provenant, present participle of provenir (‘come forth, arise’), from Latin provenio (‘to come forth’).” (http://en.wiktionary.org/wiki/provenance) as in the source of an artifact (e.g., place, time or history of “ownership”).

³¹⁷ For discussions of an expiration date for information see, for example, http://bigthink.com/videos/should-information-have-an-expiration-date and http://sloanreview.mit.edu/article/should-information-have-an-expiration-date/.

³¹⁸ For more about the use of reification and issues of provenance for RDF statements see (Dividino, Sizov, Staab, & Schueler, 2009; Hartig, 2009; Jensen et al., 2010).

³¹⁹ A distinction is also often made between “classification” (i.e., a classification scheme) and “taxonomy” especially in the field of library and information science. See, for example, http://www.aiim.org/community/wiki/view/Classification-and-Taxonomy. What’s the difference? I found especially helpful the blog post by Heather Hedden, author of The Accidental Taxonomist, in which she reviews the distinction between uses of the two terms and summarizes: “Classification is for: where to put things/where does this document or item go. Taxonomy is for: how to describe content/what is this text, image, or other media about”. By this distinction we might say that a file system is a classification whereas a system of tagging is a taxonomy. However, as soon as we introduce the support for shortcuts (“aliases”, “links”) to a file system, however poorly supported, we also introduce the possible use of a folder to “tag” (via shortcut) a file or folder that is not strictly “contained” under the folder—and, so, by Hedden’s distinction, a folder structure can become a taxonomy.

³²⁰ http://en.wiktionary.org/wiki/ontology.

³²¹ http://www.oed.com/view/Entry/131551?redirectedFrom=ontology#eid.

³²² http://ewonago.wordpress.com/tag/etymology-of-taxonomy/.

³²³ http://www.oed.com/view/Entry/198305?redirectedFrom=taxonomy#eid.

³²⁴ See “No knowledge but through information” (William Jones, 2010). We often hear discussions of data, information, knowledge and even “wisdom”. But how do these terms relate? Simply put, we might say that information is data “in motion” (i.e. communicated to someone). Knowledge then, is information “in action” (i.e. apparent only indirectly through its impact on behavior). And then, wisdom is knowledge “in perspective” (to know, for example, the limits of our knowledge).

³²⁵ But for a reasonably accessible discussion see http://ethicalpolitics.org/seminars/neville.htm; see also the Wikipedia articles on each, http://en.wikipedia.org/wiki/Ontology and http://en.wikipedia.org/wiki/Epistemology.

³²⁶ See http://en.wikipedia.org/wiki/Ontology_(information_science)#cite_note-50.

³²⁷ Guarino, Oberle, & Staab, 2009.

³²⁸ http://en.wikipedia.org/wiki/Rooted_tree#rooted_tree.

³²⁹ See, for example, the discussion at http://en.wikipedia.org/wiki/Taxonomy_(general)#cite_note-5.

³³⁰ Gruber, 2009; see also http://en.wikipedia.org/wiki/Ontology_(information_science)#cite_note-50.

³³¹ From http://semanticweb.com/leveraging-owl-dl-sparql-and-xslt-to-automate-java-agent-configuration_b10690.

³³² Excerpted from http://www.obitko.com/tutorials/ontologies-semantic-web/owl-example-with-rdf-graph.html.

³³³ I experienced these costs firsthand when, while working at Boeing in the early 1980s, I was tasked to join a team to do an on-site, hands-on evaluation of CYC as part of a visit to Austin, TX. (For an entry point into more information about CYC see http://en.wikipedia.org/wiki/Cyc and also www.cyc.com.)

³³⁴ Marcia J. Bates, 2002.

³³⁵ See for example, K. M. Anderson, Sherba, & Lepthien, 2002; K. M. Anderson, Taylor, & E. James Whitehead, 1994; H. Davis, Hall, Heath, Hill, & Wilkins, 1992; Karousos, Pandis, Reich, & Tzagarakis, 2003.

³³⁶ See, for example, K. M. Anderson, Sherba, & Lepthien, 2003; K. M. Anderson, 2005; Nürnberg, Wiil, & Hicks, 2004.

³³⁷ K. M. Anderson, 2005.

³³⁸ Karger, 2007.

³³⁹ http://en.wikipedia.org/wiki/UTF-8.

³⁴⁰ W. Jones, 2007, Chapter 14, “Bringing the Pieces Together.”

³⁴¹ The making up or composition of a whole by adding together or combining the separate parts or elements; combination into an integral whole: a making whole or entire. http://dictionary.oed.com/cgi/entry/50118573?single=1&query_type=word&queryword=integration&first=1&max_to_show=10.

³⁴² This process is sometimes referred to as transclusion—a term coined by Ted Nelson (1982).

³⁴³ http://haystack.csail.mit.edu/blog/2010/10/20/why-all-your-data-should-live-in-one-application/.

³⁴⁴ Dropbox requires that we move information to be under a designated Dropbox folder in our file system—a local move. Even better would be if information needn’t move at all. Instead we would simply designate the folders—any folders—to be shared through Dropbox. The ability to do so may eventually be supported by Dropbox. In the meantime, workarounds are being developed. See, for example, http://www.apartmenttherapy.com/how-to-sync-any-local-folder-t-139040.

³⁴⁵ See keepingfoundthingsfound.com.

³⁴⁶ See http://en.wikipedia.org/wiki/IOS and http://www.apple.com/ios/.

³⁴⁷ Cody Stebbins, a junior in the Informatics program, and Lizhang Sun, a graduating student of the MSIM program, did tremendous work to complete a zootility code base in JavaScript in time for the class to use.

