This is the next post in a Fun Tech Stuff series that goes under the hood with XML and its uses in learning technologies: Learning Management Systems, SCORM, Tin Can, Metadata, and XML, as well as examples of XML in SCORM and XML in eBooks.
Metadata is information about information. Metadata typically describes an information resource (e.g., a Web page, an image, a document, an e-learning module, an ebook) by specifying its creation, purpose, author, location, and any other data helpful in utilizing it. For example, an image may contain information about its size and resolution, and a document may contain information about its length. Both the SCORM and Tin Can specifications are at heart metadata, since they define how to specify information (e.g., module structure, navigation) about a resource (e.g., the e-learning course) in an organized fashion.
Early Metadata
Metadata is not a new concept; librarians have been using it for years in card catalogs to manage their book collections. Almost every book on the market in the United States today has metadata associated with it in the form of Library of Congress Cataloging-in-Publication information, including author, title, ISBN, subject category and subcategories, number of pages, length of the physical book, Library of Congress Control Number, and cataloging numbers.

Unlike a paper catalog card, of course, this information now lives on servers and is easily searchable. That is not only because it is computerized, but because the Library of Congress defined in advance the types of information to collect about each book, which allows searches within a specific category. For example, if you wanted to find a book by an author named Joe Homer, you could search on “author,” and skip all the other books about the Greek poet Homer, the books about Homer Simpson, and books about famous home runs in baseball history. Just specifying that the word Homer is a part of the author’s name, and not a part of a title or a subject, makes your search that much more effective. Thus, a key component of any type of metadata is a specification of the individual types of information to be collected, and a definition of each.
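To make the Homer example concrete, here is a minimal Python sketch of the difference between a brute-force text search and a search restricted to one field. The record fields and the three sample books are invented for illustration; they are not a real Library of Congress schema or real catalog entries.

```python
from dataclasses import dataclass


@dataclass
class CatalogRecord:
    """One catalog entry; the field names are illustrative, not a real schema."""
    author: str
    title: str
    subject: str


# Hypothetical records, invented for this example.
CATALOG = [
    CatalogRecord("Joe Homer", "Gardening in Small Spaces", "Gardening"),
    CatalogRecord("Jane Doe", "Homer and the Epic Tradition", "Greek poetry"),
    CatalogRecord("John Roe", "The Greatest Homers in Baseball", "Baseball"),
]


def search_all_text(term: str) -> list[CatalogRecord]:
    """Brute-force search: match the term anywhere in the record."""
    term = term.lower()
    return [
        r for r in CATALOG
        if term in f"{r.author} {r.title} {r.subject}".lower()
    ]


def search_field(field: str, term: str) -> list[CatalogRecord]:
    """Fielded search: match the term only within the named field."""
    term = term.lower()
    return [r for r in CATALOG if term in getattr(r, field).lower()]


print(len(search_all_text("Homer")))           # every record mentions Homer somewhere
print(search_field("author", "Homer")[0].title)  # only the book written by Joe Homer
```

The fielded search is only possible because each record separates author, title, and subject up front, which is exactly what a metadata specification provides.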
Metadata on the Web
Wouldn’t it be nice if the Web itself had a real digital card catalog and not just a brute force text search, which often returns completely irrelevant (and occasionally quite entertaining) results? Pieces of such a catalog are starting to appear. A number of groups have developed and continue to evolve standards for the incorporation of metadata into Web sites. One of the first is the Dublin Core, maintained by the Dublin Core Metadata Initiative; two other current standards of interest are Microformats and RDFa.
Let’s take a look at the Dublin Core. It is a set of descriptors that can be used to specify information about Web resources (e.g., video, images, Web pages) as well as books, CDs, and artworks. It consists of information about the resource’s content, intellectual property data, and version:
- Content information: Title, Subject and Keywords, Description, Resource Type (e.g., image, sound, text), Source, Relation (i.e., related resource), and Coverage (i.e., the place or time period the resource covers)
- Intellectual Property information: Creator, Publisher, Contributors, and Rights Management (e.g., copyright)
- Version information: Date, Format, Resource Identifier (e.g., URL), Language, Audience, Provenance, Rights Holder, Instructional Method, and data about how the resource has changed
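As a rough illustration of how a few of these descriptors might be attached to a Web page, the Python sketch below renders a small record as HTML `<meta>` tags, using the common convention of prefixing element names with `DC.`. The record values and the `to_meta_tags` helper are hypothetical and exist only for this example.

```python
from html import escape

# Hypothetical description of a Web page, using a few Dublin Core element names.
record = {
    "title": "An Example Page",
    "creator": "Jane Doe",
    "date": "2014-03-09",
    "type": "Text",
    "language": "en",
}


def to_meta_tags(record: dict[str, str]) -> str:
    """Render each element as an HTML <meta> tag with a DC. prefix."""
    # The link tag tells readers which vocabulary the DC. prefix refers to.
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, value in record.items():
        lines.append(
            f'<meta name="DC.{element}" content="{escape(value, quote=True)}">'
        )
    return "\n".join(lines)


print(to_meta_tags(record))
```

The output is plain text that could be pasted into the head of a page; a search tool that knows the Dublin Core names can then read the title, creator, and date without guessing.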
Metadata standards such as the Dublin Core not only specify searchable types of information about a resource, they also contain a clear definition of that information. For a Dublin Core date, for example, the recommended practice is to follow the W3C profile of ISO 8601, typically written YYYY-MM-DD, so 2014-03-09 means March 9th, not September 3rd. While this may appear trivial, the automated search of metadata by computers requires that the exact meaning of each data element be agreed upon, documented, stored on the Web, and linked to, so it is obvious what metadata definitions are being utilized.
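The date example can be checked with a few lines of Python. This is only a sketch of why an agreed format matters: the slash-separated string is a made-up counterexample, not something taken from the Dublin Core documentation.

```python
from datetime import date, datetime

raw = "2014-03-09"

# With an agreed YYYY-MM-DD format there is only one reading.
parsed = date.fromisoformat(raw)
print(parsed.month, parsed.day)

# Without an agreed format, the same digits can be read two ways.
ambiguous = "03/09/2014"
month_first = datetime.strptime(ambiguous, "%m/%d/%Y").date()
day_first = datetime.strptime(ambiguous, "%d/%m/%Y").date()
print(month_first == day_first)  # the two readings disagree
```

A program reading the second string has no way to choose between the two interpretations unless the metadata definition tells it which one applies.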
Metadata and the Semantic Web
What’s the purpose of all of this metadata? One purpose is to help computers better understand the meaning behind a Web resource, so that they can do a better job helping you find what you want. Let’s say you do a search for the term “Chicago Fire.” Are you looking for the TV show, the conflagration that burned down the city in 1871, or the Chicago Fire soccer club? If a Web page contained information that better specified its meaning by noting that the Web page was about an event that occurred at a specific time (i.e., 1871), and the search engine was able to take advantage of it, then your search would return a much more meaningful set of results.
Another purpose of metadata is to enable computers to better communicate with one another as they do our bidding. Let’s say you are looking for the location of a monument in a park. If a Web site that contained that monument’s information also contained its precise latitude and longitude (e.g., 41.882406, -87.62382), that would be great, but your app would not necessarily know that those two numbers meant latitude and longitude. With the proper metadata—often just simple text tags in the Web site code—your app could then recognize and utilize that information, and present you with a map that showed the monument’s location as well as your own. The simple addition of metadata describing the meaning and context of the information on a Web site can make a huge difference in its usefulness.
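Here is a minimal sketch of how an app might pick out tagged coordinates, assuming the page marks them up with the class names used by the "geo" microformat (`geo`, `latitude`, `longitude`). The page fragment and the `GeoParser` class are hypothetical, and a real app would need to handle many more markup variations.

```python
from html.parser import HTMLParser

# Hypothetical page fragment using the class names of the "geo" microformat.
PAGE = """
<p>The monument stands at
  <span class="geo">
    <span class="latitude">41.882406</span>,
    <span class="longitude">-87.62382</span>
  </span>
</p>
"""


class GeoParser(HTMLParser):
    """Collect the text of elements tagged as latitude or longitude."""

    def __init__(self) -> None:
        super().__init__()
        self._current = None  # which coordinate we are inside, if any
        self.coords = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in ("latitude", "longitude"):
            if name in classes:
                self._current = name

    def handle_data(self, data):
        # Only text directly inside a tagged element is treated as a coordinate.
        if self._current and data.strip():
            self.coords[self._current] = float(data.strip())
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


parser = GeoParser()
parser.feed(PAGE)
print(parser.coords)
```

Without the class names, the two numbers are just text in a sentence; with them, the app knows which is the latitude and which is the longitude and can hand both to a mapping component.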
When Tim Berners-Lee, the inventor of the Web, speaks of the Semantic Web, this is what he means: people and computers working more closely together to improve the meaning and the context of the information on the Web. He envisions people creating metadata to disclose meaning and context, and computers cranking through that metadata, processing it as appropriate.
In our next post, we’ll look at how such meaning and context can be added via XML.