April 24, 2008

A Pause

I’ve taken a pause in the O’Hara archive project, and this blog, but not because of disappointment about the O’Hara project. My pause is only coincidental with the bad reaction from the O’Hara estate. I’m pausing in the interest of maximum efficiency, because I had applied for, and recently learned I’ve been accepted into, a course on TEI at the Rare Book School at the University of Virginia. So I asked for and received an extension on this project, which is a for-credit independent study towards my PhD at Boston University. I’ve contracted to complete the O’Hara project by the end of July, because I’ll be so much better prepared to complete it after the UVA course, which is in June. For one thing, I’ll finally have access to scanners. It’s been pretty hard to create a digital archive that includes facsimiles of every page of a fragile first edition paperback without access to equipment and advice. (My post “How not to scan” on March 8 described my woes.) And I am loaded with questions, many more than I would have needed answers to had I taken a TEI course before I began this project.

It’s very exciting to be going to UVA, even if it’s just for a week. I am so out of it here at Boston University; neither BU nor Boston generally is what you’d call a hotbed of activity for the TEI. How I wish I were at the University of Maryland, or at Brown, or in Canada. But this summer I’ll be at UVA! Home of the IATH, which spawned many of the most important online literary editions: the Rossetti Archive, the Blake Archive, the Whitman Archive, the Piers Plowman project. Where David Seaman was making scanners available in the e-text center for students to create their own digital projects in 1993. David Seaman is teaching the course I’m taking, which is only offered every couple of years, I think. And it’s the Rare Book School; what fun to peek into that wonderful bookish world. A bit strange to be back at UVA too: I used to spend weekends there on blind dates with UVA frat boys, back in the day, when I was a student at a nearby women’s college.

So in the meantime, in the interest of maximum efficiency, I’m investigating nice out-of-copyright poets to find a good subject for my PhD dissertation. I know it will be an online edition, to include facsimiles of all witnesses of a work, and transcriptions, and annotations—like the test run I’m doing with Lunch Poems. I’d like to take a stab at a real edition next time though, not just an "archive" as in "a collection of documents." The edition as interface, as gateway into the wealth of documents.

March 31, 2008

A bad thing happened yesterday

I was contacted by the Frank O'Hara estate last night about posting this project online. I had a link to it here on this blog, and more obviously a link to it on my web site, literaryhistory.com. I added the public links in early March, just before I presented my paper on this project at the STS, and said in my paper the project would be accessible online for a month to view, but no longer since I did not have permission to publish FOH's poetry. I received an email from the estate executor last night, who was not very happy about what I did, understandably. It's not apparent why the poems are there; from the links on literaryhistory.com it looks like I am just blatantly violating copyright. I immediately took the links down of course. Previously the project was at a secret web address that I shared with my adviser, not publicly linked on my web site or anywhere else. Now I'm afraid to even make this material available at a secret link. Because it's a whole collection of files (image files, etc.) I can't easily send it to anyone as an email attachment just so that they can read it. I'm not sure what to do. I've written back to the estate apologizing and trying to explain, but haven't heard anything back from them.

March 27, 2008

Pointers, references, linking, divisions

The problem with teaching yourself something is that you can waste a lot of time looking for the information you need in the wrong place. The basic principles of linking, which I was having such trouble understanding a few weeks ago, are covered in the Guidelines on the TEI web site, in Section 3.6 of the P5 Guidelines. Today I learned two ways to link; the one I actually need is the ref tag, which allows me to include a description. I started working on my table of contents for the City Lights edition, with links from the page number to the section in the text that has the poem. Unfortunately I’m using this generic stylesheet which is putting bullets next to list items and indenting them, which is not the way the City Lights edition is formatted, but until I tackle the stylesheet task I am stuck with this. At least I know how to put in links and anchors and page breaks now, and create an index in TEI. The index is tedious, made more so by the fact that the new edition I’d like to consult has changed the page numbers and also the way the little .. .. .. looks. So I have to repeatedly consult my fragile first edition to see how I need to enter the dots, and to get the page numbers. I wish I had a scan I could work from, but I don't want to lay my nice edition flat to scan it. Here’s what the code looks like:

<list>

<item>Music .. .. .. .. .. .. .. .. <ref target="#p6">7</ref></item>

<item>Alma .. .. .. .. .. .. .. .. <ref target="#p7">8</ref></item>

<item>On Rachmaninoff's Birthday .. .. .. .. .. <ref target="#p10">11</ref></item>

<item>Poem .. .. .. .. .. .. .. .. <ref target="#p11">12</ref></item>

<!-- remaining entries to be added -->

</list>

I decide to move on to some other task now that I know how to do this; the rest is just donkey work, and I can come back to it.
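For reference, a minimal sketch of the two linking elements side by side, together with the kind of target they point at. The anchor element as the target is an assumption for illustration; any element carrying the matching xml:id would do, and the project file may well use something else.

```xml
<!-- ptr: an empty pointer, with no text of its own -->
<ptr target="#p6"/>

<!-- ref: a pointer that wraps its own text, here the printed page number -->
<ref target="#p6">7</ref>

<!-- The target: an element whose xml:id matches the part after the # -->
<!-- (using anchor here is an assumption for illustration) -->
<anchor xml:id="p6"/>
```

The leading # matters: it tells the processor to look for an xml:id inside the same document. Without it, a value like p11 is read as a relative address, as if it were the name of another file.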

Division questions. Should I use division type page or division type poem? For pages one through six I used division type page, but when a poem spans several pages, as “Alma” does, it creates problems. To close a division type page, I have to close a division mid-stanza, which requires me to close the line group mid-stanza. Surely I want to switch to division type poem once the poetry begins. I guess I should change everything. I wonder if there are any consequences.
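One possible way out of the page-versus-poem dilemma, sketched here under assumptions: make the poem the division, and record each page boundary with the empty pb (page break) element, which can sit inside a line group without closing it. The xml:id values, the page number, and the placeholder lines are invented for illustration; no text from the poem is reproduced.

```xml
<div type="poem" xml:id="alma">
  <head>Alma</head>
  <lg type="stanza">
    <l>(first line of a stanza that begins on one page)</l>
    <l>(second line)</l>
    <!-- Page boundary recorded as a milestone; the stanza stays open -->
    <pb n="9" xml:id="page9"/>
    <l>(the same stanza continues on the next page)</l>
  </lg>
</div>
```

Because pb is a milestone rather than a container, the divisions can follow the logical structure of the book while the page sequence remains recoverable for a stylesheet or for linking to page images.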

March 26, 2008

Coding names and places

I’ve now come to a part of the TEI guidelines that is important for me to understand: the coding of names and places, Section 3.5. This issue seemed of interest and value in coding Frank O’Hara’s poetry, but so problematic that I’d concluded I shouldn’t attempt it. But armed with the TEI guidelines I’m feeling more secure about being able to navigate the ambiguities. The TEI has, in its usual thorough way, considered and classified many ways that names can appear in texts, and has a coding solution for a wide variety of situations. Also, one of the additional attributes you can apply, called key, identifies the person or place being named, which makes it a natural hook for supplying an annotation. For the kind of searching and analysis that a student or scholar would do in a digital Frank O’Hara text, it seems like names of people, places, books, and paintings, New York locations, seasons and times of day, would attract the most interest. As the TEI says, “Names, dates, and numbers are likely to be of particular importance to the scholar treating a text as source for a database.”

A repeated problem in coding FOH’s poetry is the free, creative way he refers to people, dates, places. The TEI describes the problems well: “firstly, there may be a need to encode a regularised form of a name, distinct from the actual form in the source to hand; secondly, there may be a need to identify the particular person, place, etc. referred to by the name, irrespective of whether the name itself is normalized or not.” These can in fact be coded; the TEI has a way. Place names are treated similarly to person names in TEI. But what of titles, of books, songs, paintings? I haven’t reached the section for coding those yet. In Section 3.5.4 they explain how to code partial or imprecise dates and times. It would be very useful to have all the date and time references in FOH’s poems available to pull up in a list and examine. One is so aware of the presence of time in his poems, but it’s expressed in such variable ways. How nice not just to count the occurrences, but to be able to find them quickly, pull up a list of them, compare them side by side. I am getting a renewed interest in trying to code names, dates, and places in at least one of his poems to see the costs in terms of time expended and the benefits in terms of useful information provided.
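A rough sketch of what this kind of tagging could look like, using the P5 elements persName, placeName, time, and date. The line of verse is invented for illustration and is not a quotation from Lunch Poems; the key values are hypothetical identifiers, and the dates are made up.

```xml
<!-- Invented line for illustration; not O'Hara's text -->
<l>lunch with <persName key="JA01">John</persName>
   at <time when="12:20:00">12:20</time>
   on <placeName key="NYC_6AV">Sixth Avenue</placeName></l>

<!-- A partial date: only the year and month are known -->
<l>sometime in <date when="1959-07">July of that year</date></l>
```

The text of the poem is left exactly as written, while the when and key attributes carry the regularized values. A search for every time reference, or every mention of one person under whatever nickname, would then run against the attributes rather than the wording.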

Also read about ways to code abbreviations. That, along with coding editorial corrections, will be especially useful for working with manuscripts.
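A small sketch of the abbreviation coding, assuming the P5 choice element with abbr and expan. The abbreviation itself is an invented example.

```xml
<!-- The form as written, paired with the editor's expansion -->
<choice>
  <abbr>Ave.</abbr>
  <expan>Avenue</expan>
</choice>
```

This is the same pairing mechanism used for editorial corrections: both forms are recorded once, and the presentation decides which one the reader sees.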

March 25, 2008

SIC

TEI coding for editorial corrections or normalizations seems less problematic than disambiguation, more obviously of use for the textual scholar. An editor can silently correct a word, or let the original error stand, but the best technique is to code the text to show both the correction and the original. In the following example, the text is transcribed only once, and the original and the corrected word are both available. It’s a great solution, the invisible “sic.” It preserves the information of sic without the rudeness. The editor can then design a stylesheet that would present the text in any way she chose; supplying the corrected word in the main text and the original in an endnote is only one possible presentation of the encoded information. The code to signal such a situation is:

An <choice>

<corr cert="high">Autumn</corr>

<sic>Antony</sic>

</choice> it was,

That grew the more by reaping

The same text could be presented in the normalized version for student use and the original version for a different audience; stylesheet design governs which version the reader is presented with.
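How a stylesheet might make that choice can be sketched in XSLT 1.0. This is an illustration under assumptions, not the stylesheet used for this project: it assumes the document is in the TEI P5 namespace, and the parameter name view and its two values are invented here.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- Hypothetical switch: 'corrected' (the default) or 'original' -->
  <xsl:param name="view" select="'corrected'"/>

  <!-- For each choice that pairs a sic with a corr, output only one of them -->
  <xsl:template match="tei:choice[tei:sic and tei:corr]">
    <xsl:choose>
      <xsl:when test="$view = 'original'">
        <xsl:apply-templates select="tei:sic"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="tei:corr"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Run with the default, the line would read "An Autumn it was"; run with view set to 'original', it would read "An Antony it was". The encoded text is identical in both cases.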

March 24, 2008

Disambiguation as a coding goal

There were several tasks that I could have addressed on my Lunch Poems project—learning more about scanning and images so that I could prepare page facsimiles, learning how to create a custom xslt style sheet to have better control of the appearance of my coded poems. But it seemed more important to gain a better understanding of the rules of the TEI. I’ve been learning only what I needed to know, as I go along, rather than reading the TEI guidelines from start to finish. There are a lot of concepts I don’t understand. For example, the various kinds of divisions, and when and how I can use them. When I should use a title tag and when I should use a subhead tag. I don’t yet have a grasp of what the coding choices are, and why one makes one choice as opposed to another. The xslt will only control the formatting of my choices. And the digital scanning issue seems like it might also be better postponed until I have access to equipment and people who know how to use the equipment. I’ve been looking into taking a class at the University of Virginia Rare Book School this summer that would cover these things. UVA is a center for electronic publishing and I know they have scanners that they’ve been making available to students for digital projects since 1993. The course this summer sounds like it’s covering everything that I am trying to teach myself, but there will be someone very experienced showing me the way. “A practical exploration of the research, preservation, editing, and pedagogical uses of electronic texts and images in the humanities. The course will center around the creation of a set of archival-quality etexts and digital images, for which we shall also create an Encoded Archival Description guide. 
Topics include: SGML tagging and conversion; using the Text Encoding Initiative Guidelines; the form and implications of XML; publishing on the World Wide Web; and the management and use of online texts.” The teacher, David Seaman, is the former head of the Digital Library Federation.

So I’ve addressed myself to reading the TEI guidelines for the latest version, P5, for now.

One thing that strikes me in the guidelines is that this seems to be a code that is designed for language analysis, of the kind practiced by linguistic specialists rather than literary analysts. The tags the TEI supplies us with allow us to code parts of speech, with the implicit goal at all times of “disambiguation.” How should we precisely code a complicated case? The TEI supplies solutions. There are distinct tags to indicate that a word is highlighted in some generalized way, but the TEI prefers that the tags be more specific to explain why a word or phrase is highlighted. Is it a foreign word? There’s a specific way to tag that, and to indicate which foreign language. But this code “should not be used to represent foreign words which are mentioned or glossed within the text: for these use the appropriate element from section 3.3.4 Terms, Glosses, Equivalents, and Descriptions below.” There are specific ways to tag the following: “emphasized for rhetorical or linguistic effect,” “linguistically distinct, for example as archaic, technical, dialectal.” There are many variations of ways to tag quotations to capture their precise linguistic structure. To disambiguate. Whether this kind of analysis serves literary understanding or is appropriately practiced by a literary scholar is a question in my mind, though.

The TEI provides the following example of coding to disambiguate a subtle text. The text is, “A pretty common case, I believe; in all vehement debatings. She says I am too witty; Anglicé, too pert; I, that she is too wise; that is to say, being likewise put into English, not so young as she has been: in short, she is grown so much into a mother, that she had forgotten she ever was a daughter. ...” The TEI recommends (see below) that the first tagged word is disambiguated as “emphasized”; this is distinguished from the second tagged phrase, which is functioning as a quote; distinguished from the third tagged word, which is a foreign, specifically Latin, word; distinguished from the fourth tagged phrase, which is serving as a gloss on a previous word; distinguished from the fifth tagged phrase, which is functioning as a quote; distinguished from the sixth tagged phrase, which is functioning as a gloss on a previous word; distinguished from the seventh and eighth tagged words, which are simply highlighted words but are not the same as the “emph” of emphasized words.

A pretty common case, I believe; in all <emph>vehement</emph>

debatings. She says I am <q rend="italic">too witty</q>;

<foreign xml:lang="la" rend="roman">Anglicé</foreign>,

<gloss rend="italic">too pert</gloss>; I, that she is

<q rend="italic">too wise</q>; that is to say, being likewise

put into English, <gloss rend="italic">not so young as she has

been</gloss>: in short, she is grown so much into a

<hi rend="italic">mother</hi>, that she had forgotten she ever

was a <hi rend="italic">daughter</hi>.

Not surprisingly, the computer programmers have been at work creating programs that can automatically tag parts of speech. Eric Brill, formerly at Johns Hopkins and now at Microsoft, is a big figure in this research. “Unsupervised learning of disambiguation rules for part of speech tagging” is a representative article title.

March 20, 2008

Metadata and front matter, unitary and composite texts

For a break, for a treat, I’ve switched to thinking about what it means to represent Lunch Poems, City Lights edition, to get away for a minute from what Oxygen will let me do, and how I can even upload this onto the internet so that the reader can see anything.

My current representation is just the best I can do with my low level of skills, but it’s way off. For example, I’ve recorded some metadata (information about the transcription) as a part of the text.

Now, instead of just letting Oxygen do what it does automatically, I’m reading about my choices. I don’t have to start with <body>. I can have an earlier tag for front matter, <front>. But how should I use that area? For the metadata? Or for the front matter of the edition itself, page one with its title, author and publisher information, page two with copyright info, page three with list of poems, page four with dedication, etc.?

Another question is whether I should consider the book unitary or composite. It should be tagged differently depending on how I think of it. Here’s the explanation the TEI web site gives for the difference:

“TEI texts may be regarded either as unitary, that is, forming an organic whole, or as composite, that is, consisting of several components which are in some important sense independent of each other. The distinction is not always entirely obvious: for example a collection of essays might be regarded as a single item in some circumstances, or as a number of distinct items in others. In such borderline cases, the encoder must choose whether to treat the text as unitary or composite; each may have advantages and disadvantages in a given situation.

Whether unitary or composite, the text is marked with the <text> tag and may contain front matter, a text body, and back matter. In unitary texts, the text body is tagged <body>; in composite texts, where the text body consists of a series of subordinate texts or groups, it is tagged <group>. The overall structure of any text, unitary or composite, is thus defined by the following elements:

 

 * front (front matter) contains any prefatory matter (headers, title page, prefaces, dedications, etc.) found at the start of a document, before the main body.

 * body (text body) contains the whole body of a single unitary text, excluding any front or back matter.

 * group contains the body of a composite text, grouping together a sequence of distinct texts (or groups of such texts) which are regarded as a unit for some purpose, for example the collected works of an author, a sequence of prose essays, etc.

 * back (back matter) contains any appendixes, etc. following the main part of a text.

 

The overall structure of a unitary text is:

<TEI>

 <teiHeader>

<!-- .... -->

 </teiHeader>

 <text>

 <front>

<!-- front matter of copy text, if any, goes here -->

 </front>

 <body>

<!-- body of copy text goes here -->

 </body>

 <back>

<!-- back matter of copy text, if any, goes here -->

 </back>

 </text>

</TEI>”

 
This example suggests that my metadata is not the front matter; the tag for front matter applies to what is front matter in the copytext itself. The example they supply of a composite structure suggests that it isn’t appropriate for this edition, because it’s a structure that allows you to insert individual front matter and even back matter for each group in the composite collection. If you don’t need to do that, as I do not with LP, it seems like the wrong structure. So I will go back and redo my first pages of the City Lights edition. It’s time to transcribe the front matter anyway, though I don’t yet have any page images. I think I am now feeling confident enough of my ability to represent the text in some way that I’m ready to focus on the best and right way to represent it, and let go of my worries about just getting anything to work at all. What a relief, I can finally start doing some literary work.
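A sketch of how that redo might be laid out, with the transcription metadata in the header and the book's own opening pages in front. This is an outline under assumptions, not a valid finished file: the type values are invented labels, the placeholders stand in for content still to be transcribed, and the title page wording would need to be checked against the book itself.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!-- metadata about the transcription itself goes here, not in the text -->
  </teiHeader>
  <text>
    <front>
      <!-- page one: title, author, publisher -->
      <titlePage>
        <docTitle>
          <titlePart type="main">Lunch Poems</titlePart>
        </docTitle>
        <docAuthor>Frank O'Hara</docAuthor>
        <docImprint>
          <publisher>City Lights</publisher>
        </docImprint>
      </titlePage>
      <!-- type values below are hypothetical labels -->
      <div type="copyright"><!-- page two --></div>
      <div type="contents"><!-- page three: the list of poems --></div>
      <div type="dedication"><!-- page four --></div>
    </front>
    <body>
      <!-- the poems -->
    </body>
  </text>
</TEI>
```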

March 19, 2008

Things are looking decent

At last, it doesn't look too bad. I've had to work around the stupid stylesheet, that wants to automatically create a table of contents and add numbers to everything. What it's doing isn't in the text and doesn't belong in the transcription! I'll work on stylesheets next. For now I shoved the stupid table of "Contents" list to the bottom of the page because I couldn't get rid of it. Everything else looks moderately acceptable. The coding of divisions, stanzas, pages, all that stuff is very unclear to me, because it depends on what needs to be a variable when I'm formatting the stylesheet. That will be interesting to think about. Anyway I have a good enough grip on this, finally, to start asking some more interesting questions.

Same link, better content 

http://www.literaryhistory.com/fo_archive/fod/lp.htm

Inching forward

It turned out that the pointer tag I learned how to use wasn’t the one I needed to link to my graphic files; once I inserted it all I had was a url listed. I needed to use the figure tag, because the thumbnail is a graphic. Anyway, I got a thumbnail in.
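The difference between the two can be sketched as follows. The file name is a hypothetical placeholder, and the figDesc wording is invented; only the element names come from TEI P5.

```xml
<!-- ptr only points at the file; a generic stylesheet may just print the address -->
<ptr target="images/p7_thumb.jpg"/>

<!-- figure with graphic says "display this image here" -->
<figure>
  <head>Music</head>
  <graphic url="images/p7_thumb.jpg"/>
  <figDesc>Thumbnail of the page on which the poem begins</figDesc>
</figure>
```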

There’s now one full page in TEI, the beginning of my transcription of the City Lights edition of Lunch Poems, at http://www.literaryhistory.com/fo_archive/fod/lp.htm

The xml code can be viewed at http://www.literaryhistory.com/fo_archive/fod/lp.xml

 

I have to figure out now how to make an image clickable, so that it will take the user to the full-page image. I hate the way it looks. It is numbering the figures, which must be instructions supplied in the stylesheet, and I won’t be able to fix that until I create my own stylesheet. I don’t like where it puts the name of the poem; I’d like it to the side of the image. And I’d like a line skipped between the thumbnail and the poem. All that will go in my stylesheet design I think. But at least I have figured out how to insert my images.
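One way the clickable thumbnail might be encoded, as a sketch: wrap the graphic in a ref whose target is the full-page image. The file names are hypothetical, and whether this actually comes out as a clickable image depends entirely on how the stylesheet handles a graphic inside a ref.

```xml
<p>
  <!-- ref carries the link; graphic is what the reader sees -->
  <ref target="images/p7_full.jpg">
    <graphic url="images/p7_thumb.jpg"/>
  </ref>
</p>
```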

There is a lot of tedious technical stuff to figure out still. Like the things I just mentioned. There is the whole problem of the XSLT that I have to learn, but the w3schools should be a good way to get oriented for that. The problem of scanning when I lack equipment and skills. How to put in the navigational (non-graphic) links so you can get from page to page. I don’t know whether to try to work out these technical infrastructure problems some more or to work on coding the poetry. I’d really like to think about coding the poetry. There are quite a few challenges in O’Hara’s poetry, the unusual breaks and white space, the prose poetry sections, questions about how to logically and consistently code the varied kinds of stanzas he uses and couplets and numbered sections. TEI lets me provide at least some information about that kind of thing, I’d like to explore how that works. There are so many things that need to be done on this.

March 18, 2008

I had an idea

I got so mad at Oxygen I decided I'd try to trick it. I put my ptr language that it wouldn't accept inside a <p></p> since that's where they had it in the example, and Oxygen accepted it. Finally it didn't tell me it wasn't valid. But then it rejected my <head></head>, saying it wasn’t valid in this context (though it had been before). The head was the title of the poem. But I guess the link to the page image is more important than the title. I’ll figure out how to get the title in there somewhere that Oxygen won’t reject.

Oh I see now. I've been exploring the drop down list that appears after you type <p> in Oxygen. A lot of information gets nested inside the <p></p>. There are all kinds of choices in there. Including pointer and title. Now I’ve got a title and a pointer! I've linked to my page image. This is better than what I had before, which was only a header, not a real title tag! I’m on a roll!
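A likely explanation for Oxygen's complaints, offered as a guess since the file itself isn't shown here: in the TEI schema, ptr is a phrase-level element, so it has to sit inside something like a paragraph, and head is only allowed at the top of a division, before any paragraphs. An arrangement along these lines should satisfy both rules. The division type and file name are placeholders.

```xml
<div type="page">
  <!-- head comes first in a division -->
  <head>Music</head>
  <!-- ptr is phrase-level, so it goes inside a paragraph -->
  <p><ptr target="images/p7.jpg"/></p>
</div>
```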
