"Byte" magazine on a copyright catalog card at Library of Congress, 30 April 2009
This competency is the reason I chose to study library science. Since at least 2600 BC ("History of Libraries", 2015), librarians have been organizing information so that it can be found again. I am interested in archiving and retrieving datasets, and I assumed (correctly) that the concepts and methods of organizing books would apply, in principle and possibly in some detail, to organizing datasets.
Information cannot be used unless it can be found. Finding the right information is not easy, as anyone knows who has searched Google for an unpopular product whose name is a common English noun or verb. In order to help users find what they need, librarians (and data curators) need to classify items (provide metadata) and catalog them. These activities are as much an art as a science; while there are rules and best practices to help, they require imagination, creativity, common sense, a deep understanding of the user base, an understanding of potential non-typical users (edge cases), and a passion for making information as findable as possible.
I suspect mastering these skills will take a lifetime of practice, but in my studies I have learned the basics and have delved into the details of some aspects. Getting the organization of information "right", in the sense that users find what they were looking for, is the heart of library science, to me, and a required ingredient in becoming a great data curator.
Before I attended iSchool, I stumbled around elements of this competency without really understanding how to do it well. As described elsewhere, I built my first database in the late 80s; it was a relational database, so I had a passing understanding of creating primary keys and indices. In high school and college, I had many temporary secretarial jobs, where I learned the importance of filing paperwork in such a manner that it could be found again. In fact, my personal paper filing system is so suited to the vagaries of my own thinking that I can almost always find what I want on the first try.
At my first job after graduation, I learned about the mysteries of data encoding, which is a low-level form of classification in much the way that assembly language is a very elemental form of human-readable programming language. At a job where I wrote and produced a technical manual, I learned the value of an index the hard way (I didn't think to put one in, to the disappointment of the manual's users). By the time I was working at NASA's Goddard Space Flight Center, I had a fairly consistent system of directory structures and daily log entries for my personal work use. When I worked on Furl, I entered the chaotic age of tagging by trying to craft the best tags for my own personal retrieval (much less effective than my paper filing system) and watching the tags our clients used. At Cytobank, our documents and communications are scattered across multiple platforms: for instance, meeting agendas and logs are kept on Evernote or Confluence or Google Docs, which is a Wild West sort of view of How Not To Organize Company Documents. (Luckily, our customers' experiment data is much easier to find via the GUI…though how it is organized in directories on our servers is best left undiscussed.)
At iSchool, multiple classes that I've taken have touched on organizing issues: "Information Retrieval" (LIBR202), "Vocabulary Design" (LIBR247), "Beginning Cataloging and Classification" (LIBR248), and "Seminar in Contemporary Issues: Metadata" (LIBR281). I was fascinated by the Anglo-American Cataloguing Rules (AACR2) and the Library of Congress Subject Headings (LCSH) in LIBR248, and we explored faceted classification in more depth in LIBR247. I studied a tangled system of standards for data quality metadata in LIBR284, and I created an XML structure for a somewhat obscure and awkward data format in LIBR281. Thinking about organizing information for retrieval has been a part of most of my iSchool classes. I am much more prepared to organize datasets well than before I began my MLIS degree.
Instructions for Cataloging & Classification final project
Cataloging & Classification final project (group)
"Beginning Cataloging and Classification" (LIBR248) was a demanding class, and I earned the only A-minus grade I've received in iSchool (in all other classes, I have As). Unlike most LIS classes, in Dr. Ellett's class, answers were either correct or not. I found this class interesting and difficult. I present the final project from that class, which was a joint effort of a group of four students. We met almost every week of that class, and the resulting project was truly the result of equal work among all of us, as we discussed and debated all classifications and helped each other figure out the proper catalog elements according to AACR2. While I didn't memorize the rules, I understand them well enough that I believe I would certainly be able to do copy cataloging if I were a librarian and would probably need just a few hours of refreshing in order to do original cataloging.
From "Vocabulary Design" (LIBR247), I present an assignment that was the culmination of several assignments teaching us how to select and index key terms. From a set of 15 subject statements provided by Dr. Zhang, I created both a classified and an alphabetized index, following the format of section 6 of the American National Standards Institute (ANSI)/National Information Standards Organization (NISO) Z39.19-2005 standard (NISO, 2010, pp. 20-36) and standardized to terms from either the Association for Information Science and Technology (ASIS&T) Thesaurus of Information Science and Librarianship or the Library Literature and Information Science Full Text database thesaurus (n.d.) where possible. This was an excellent experience in digging deeply into the meanings of words and phrases and thinking hard about how to organize them to best facilitate retrieval.
Paper: "IMMA: Making Centuries of Climate Data Usable"
For "Resources and Information Services in the Disciplines and Professions: Science and Technology" (LIBR220), I undertook a project where I transcribed a whaler's log and extracted data to track Arctic sea ice in May 1851. The Old Weather project is working on similar log transcriptions, and they are putting the resulting data into International Maritime Meteorological Archive (IMMA) format. The IMMA schema is a way to code both data and metadata. If I had continued working on the sea ice project, I would have had to put the data I created into the IMMA format, so in "Seminar in Contemporary Issues: Metadata" (LIBR281) I did research on the schema, reviewing its history, examining the details of its format, and evaluating it. Eventually I would create an XML format to try to make the IMMA schema more usable (presented in Competency 8).
Paper: "Marine Meteorological Data Formats"
In "Seminar in Contemporary Issues: Metadata" (LIBR281), building on my investigation into the IMMA schema, I carried out an expanded review of marine meteorological metadata schemas in general. Because I plan to work with climatology datasets, I am as likely to work with ocean-oriented datasets as with atmosphere or land datasets. Being familiar with these metadata schemas will undoubtedly be useful in my future career.
In the seven years it has taken me to earn my MLIS, I have done a lot of studying and thinking about the ways library science has developed to organize information. I have a solid grasp of the basics, the nuts and bolts of things like "how to catalog a book in the Dewey Decimal System". I am extremely familiar with and adept at creating and using metadata, and I am passionate about standards. However, I do not feel that I have yet digested all of these ideas so that organization methods are ingrained in me, as automatic as merging onto a highway after 33 years of driving. I plan to continue to read about, contemplate, and practice these methods in my career, and I think with regular daily use as a data curator the overall picture–the meta concepts, if you will–will become more than a vague outline in my brain. This degree has given me all the ingredients for eventually grokking the gestalt of information organization, and I look forward to the process.
Browsing: Library, Information Science & Technology Thesaurus. (n.d.). EBSCO Host. Retrieved from http://web.b.ebscohost.com.libaccess.sjlibrary.org/ehost/thesaurus?sid=384d5c53-1521-4904-91b8-3a7e688d74bc%40sessionmgr110&vid=78&hid=108
History of libraries. (2015, February 7). Retrieved from Wikipedia: http://en.wikipedia.org/wiki/History_of_libraries
Holley, M. (2009, April 30). Copyright Byte 2 [Online image]. Retrieved from http://commons.wikimedia.org/wiki/File:Copyright_Byte_2.jpg
National Information Standards Organization (NISO). (2010, May 13). Guidelines for the construction, format, and management of monolingual controlled vocabularies (pp. 42-57). Baltimore, MD: National Information Standards Organization (NISO).
Last updated: Friday, April 17, 2015