xmlmarc ranting
Dec. 18th, 2004 03:33 am
Dorothea wrote a scathing piece on some of the problems in electronic cataloguing that I was going to respond to, but I realised my response was more of a spinoff than a reply, so it'll be here instead.
Caveat: I've been absorbed in schoolwork. I have not been following the myriad projects combining technology and cataloging. It's entirely possible that the rant I'm about to make is Old News, years old. I know there are other DTDs out there I haven't investigated.
Note for the non-librarians: MARC -- Machine Readable Cataloguing -- is a 30-year-old format which encodes bibliographic information in a way that can be read by a computer. The step in evolution before MARC was the card catalogue, so at the time it was a massive advance. But it can get a little fuggly.
When I took my Cataloging class, I decided to do my term project on MARCXML. MARC, in my techie's opinion, was a cute work-around from less technological days, but clearly outdated in this day and age. I was jazzed by the notion of using the powers of XML to do an intelligent and flexible encoding of cataloging data. I imagined something like this:
```xml
<WORK>
  <MAIN TYPE="personal_name" FORMAT="forename">
    <PERSONAL_NAME>Avi</PERSONAL_NAME>
    <DATE TYPE="start">1845</DATE>
    <DATE TYPE="end">1999</DATE>
  </MAIN>
  ...
</WORK>
```
That would break up each element into a completely machine parseable entity, ready for display in MARC format. Perfect! Handy, useful, easy to convert existing MARC records into XML and XML records back into MARC or any other format. Instead, here's what the Library of Congress schema actually calls for:
```xml
<record>
  ...
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Sandburg, Carl,</subfield>
    <subfield code="d">1878-1967.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Arithmetic /</subfield>
    <subfield code="c">Carl Sandburg ; illustrated as an anamorphic adventure by Ted Rand.</subfield>
  ...
</record>
```
What's wrong with this encoding? Let me count the ways. Firstly, the fact that MARC, while machine-readable, is not particularly human-readable is a side effect of technological limitations which are no longer in place. Once upon a time it made sense to name your machine-readable fields "100" with single-letter subfield codes. Now is not that time. For goodness' sake, it's a positive abuse of XML to have a pile of data fields (all named "datafield") tagged with the MARC number. And you kept the meaningful spacing and punctuation. By the ghost of S. R. Ranganathan, people, this is not Fortran.
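To make the pain concrete, here's a minimal Python sketch -- no real MARC library, just the standard-library ElementTree module and a well-formed version of the truncated Sandburg record above -- of what a consumer has to do just to get a clean author and title out: memorise the numeric tags and one-letter codes, and then scrub the card-catalogue punctuation back out of the data.

```python
import xml.etree.ElementTree as ET

# The Library of Congress example from above, trimmed to two datafields.
MARCXML = """<record>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Sandburg, Carl,</subfield>
    <subfield code="d">1878-1967.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Arithmetic /</subfield>
    <subfield code="c">Carl Sandburg ; illustrated as an anamorphic adventure by Ted Rand.</subfield>
  </datafield>
</record>"""

record = ET.fromstring(MARCXML)

# To get clean values you have to know that 100$a is the author and 245$a is
# the title, and then strip the ISBD display punctuation stored in the data.
author = record.find('datafield[@tag="100"]/subfield[@code="a"]').text.rstrip(" ,")
title = record.find('datafield[@tag="245"]/subfield[@code="a"]').text.rstrip(" /")
print(f"{author} - {title}")  # Sandburg, Carl - Arithmetic
```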
- Give each field type (authority, title, etc.) its own element, to start. It's readable, it's portable, and there's no reason not to.
- Name the fields. Please. Please. Don't name your field "245". Name it "Title". Readable code is everything.
- Under no circumstances should the meaningful spacing and punctuation survive. What on earth is the point of converting to XML if you're not going to take advantage of its power? A field for title and one for subtitle; a field for birth date and one for death date. Use the tool you're in. You like the way MARC looks? Fine, write an XML to MARC converter and you can view the MARC to your heart's delight. But store your data in the extensible, human-readable, portable database. Please. (A rough sketch of the kind of conversion layer this list is asking for follows below.)
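For the sake of argument, here's that sketch in Python: the MARC tag numbers get looked up in exactly one place, the punctuation gets stripped once, and everything downstream sees named elements. The output names (author, title, and so on) are my own invention for illustration, not any actual schema.

```python
import xml.etree.ElementTree as ET

# Same trimmed Sandburg record as in the sketch above.
MARCXML = """<record>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Sandburg, Carl,</subfield>
    <subfield code="d">1878-1967.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Arithmetic /</subfield>
    <subfield code="c">Carl Sandburg ; illustrated as an anamorphic adventure by Ted Rand.</subfield>
  </datafield>
</record>"""

# The only place in the whole system that should ever need to know MARC numbers.
# These element names are illustrative, not a real schema.
TAG_NAMES = {
    ("100", "a"): "author",
    ("100", "d"): "author_dates",
    ("245", "a"): "title",
    ("245", "c"): "statement_of_responsibility",
}

def marcxml_to_named(record):
    """Turn <datafield tag=...>/<subfield code=...> soup into named elements."""
    work = ET.Element("work")
    for df in record.findall("datafield"):
        for sf in df.findall("subfield"):
            name = TAG_NAMES.get((df.get("tag"), sf.get("code")))
            if name is None:
                continue  # subfields this toy mapping doesn't cover
            child = ET.SubElement(work, name)
            # Store data as data: strip the trailing display punctuation here;
            # an XML-to-MARC converter can always put it back for viewing.
            child.text = (sf.text or "").rstrip(" /:;,.")
    return work

print(ET.tostring(marcxml_to_named(ET.fromstring(MARCXML)), encoding="unicode"))
# <work><author>Sandburg, Carl</author><author_dates>1878-1967</author_dates>
# <title>Arithmetic</title><statement_of_responsibility>Carl Sandburg ; ...
```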
Ach, this project made me cry.
no subject
Date: 2004-12-18 02:20 pm (UTC)
Because, yeah, MARCXML? Ugly. Ugly beyond ugliness. A true bit of grotesquerie.

no subject
Date: 2004-12-23 05:26 am (UTC)
But why is MARCXML so bad? What was the LOC smoking?