Thursday June 2, 2011
See also Participants list: http://lod-lam.net/summit/participants/
 
Please add to the information contained here. Especially where there's missing information (???, ...). 
 
 
Day 2 session plan:
 
Dorkshorts session plan:
 
 
Pictures of the event:
 
http://www.flickr.com/photos/captsolo/5794870054/in/photostream - Hideaki Takeda shows us how to properly hold a kendo sword. And you have to swing it too! :)
 
Open search in an afternoon:
 
Tweet archive:
 
Brief intro to LD:
 
 
Day 2 - follow-up :
  • RDFa and ePub session (???)
  • vocabulary maintenance toolkit; vocab preservation framework; alignments b/w vocabularies (e.g., crowdsourced mappings) - notes re this on LOD-LAM list (Tom Baker)
  • fundamentals of web architecture, how to make [meta]data linkable, guidance (???)
  • what practical tools people use to expose content as LD / SPARQL (???)
  • microdata (Eric Hellman)
  • ...
  • ...
  • desirable licenses for metadata (???)
  • alignments: what tools and processes are people using? (from FAO; spent lots of time aligning AGROVOC with different thesauri, built ad-hoc tools in Eclipse, ...)
  • explaining LD, references for beginners (David Weinberger)
  • (...)
  • have a dataset re theater, would want to do something with it
Shawn Simster, Andrew Ashton, Paul Keller, Scott Nesbitt, Thea Lindquist, William Gunn
Rurik Thomas Greenall
database has one line per production
a-frbr:work
dcterms:title->"Twelfth Night"
 
dcterms:creator - <http://...link to Norwegian Authority file>
 
Discussed methods of converting the data from CSV to LD; our current solution creates 7000 triples about the basic productions
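
A minimal Turtle sketch of what one converted production row might look like, reading "a-frbr:work" above as typing the row as a FRBR Work (the prefixes, the production URI, and the authority-file URI are illustrative assumptions, not the actual data):

    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix frbr:    <http://purl.org/vocab/frbr/core#> .

    # One CSV row about a production, rendered as triples (all URIs are hypothetical).
    <http://example.org/production/123>
        a               frbr:Work ;
        dcterms:title   "Twelfth Night" ;
        dcterms:creator <http://example.org/authority/person/shakespeare> .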
 
  •  
  • ... (worked with machine tags, George Oates?)
  • crowdsourcing, creating quality LOD (from Wikimedia foundation - interested to expose LOD from Wikipedia, etc.)
  •  
  •  
  •  
 
 
9 AM, intro session
 
9:15 AM, list of topics
  • Folksonomies, controlled vocabularies, etc.: how do we get them to play nice together? ( )
  • Vocab maintenance, strategy for vocab long-term preservation (Tom Baker)
  • Utilizing existing vocabs for modeling archives (Aaron Rubinstein)
  • LOD-ABC: Linked Open Data for people who need to understand more about what it is (David Weinberger)
  • "tweets" to convince to use LOD:
  • leverage other people's work
  • better: discoverability, linkability
  • ...
  • ...
  • What are we going to do for the User? (Karen Coyle)
  • Rights: rights languages (search for things with those rights), rights in general
  • Refine archival description standards to make them more flexible, suitable for LOD
  • Scaling provenance (Bradley Allen) see notes at 
  • How can we crack open the digital humanities projects, get the data out into a more open environment? -- People data in digital humanities/text projects - (Andy Ashton) - "I kind of feel like a gladiator"
  • Linking together the same object in different media, e.g. Google Book, youtube video (Doug Reside)
  • Getting structured linked data out of LAMS, getting to the juicy bits - to link to collections (Tim Sherratt)
Attending: Tim, Anra (Culture 24), David W., Jill (Apple), Susan Chun, Lori J., Mary (Bancroft), Rachel (DLF), George (OpenLibrary), Francesco (UC Berkeley), Piotr (Met), Layna (SFMOMA), Charles Moad (IAM), Uldis (Natl Library of Latvia) and a few others who came in later
 
Tim: Nat. Museum Au is migrating its CIS; what should be thought of in terms of LOD while this process is going on? Important linking points: people, geo. How can you give people tools to make this work in the data migration and do it with little money and fewer people? Are there opportunities with machine processing (Open Calais)?
 
George: IA has been saving TV for a number of years; much of this has closed captions; IA has thrown this at an NLP system to pull out key bits of data. The goal is to build a UI on top of this data to see various things (what happened on a date); not yet doing live processing of new acquisitions.
 
Lori: used Open Calais a bit but has questions about how it's missing people; what about using interns to create Wikipedia entries for medical people who aren't there, to help jump-start Open Calais?
 
Rachel: tried some similar things with newspaper project; how to make it beneficial to a larger audience and to the contributing institution.
 
Susan: using tools to create stub entries in wikipedia for later editing and integration.
 
Anra: are there lists of "entity extraction" tools out there?
 
Martin: referenced BHL and taxon name finding tools that look through large text banks
 
Rachel: referenced work of Greg Crane (Tufts/Perseus) in name/place finding
 
George: What about "how can I make my dataset useful for creating/training these tools?"
 
Martin: Take a look at the "Digging into Data" project from the National Endowment for the Humanities, IMLS, NSF, and more (http://www.diggingintodata.org/)
 
Susan and Piotr: the Text Retrieval Conference (TREC) is another example of big data sets that are up for people to experiment with and for computer scientists to hack at. http://trec.nist.gov/
 
George: OpenLibrary is exploring the idea of becoming an identifier reconciliation service. It would be good to have these types of "cleansing" services on the Internet.
 
Tim: People Australia is sort of something like this where you can get biographical links back for/about other people; but not yet LOD, things just go to an HTML page
 
Tim: Regarding Wikipedia authority creation, look at the recent discussion on Code4Lib; George suggested: what if a "Wikipedian in Residence" could be an algorithm, or more likely a coder?
 
Martin: [in life sciences] we are careful not to make assertions that are too strong:
    using "... is LIKELY a representation of ... (e.g., corn)" instead of "... is THE representation of ..."
        letting the system respond: "if you agree that ... is a representation of ..., then here are other related representations"
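
A hedged Turtle illustration of the difference (the specimen URI is made up; skos:closeMatch stands in here for a deliberately weaker "likely the same" assertion, as opposed to the hard identity claim of owl:sameAs):

    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .

    # Strong claim: full identity, hard to walk back if it turns out to be wrong.
    <http://example.org/specimen/42> owl:sameAs      <http://dbpedia.org/resource/Maize> .
    # Weaker claim: "likely a representation of", leaves room for review.
    <http://example.org/specimen/42> skos:closeMatch <http://dbpedia.org/resource/Maize> .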
        
George: in social networks there is the concept of strong ties and weak ties; in thinking of authorities, how could this be applied? Piotr mentioned WordNet as a good example of this.
 
David W.: How can you express levels of confidence in triple stores/RDF? Will discuss in detail later offline
 
 
 
ACTION ITEMS:
- Look at Stanford site of extraction tools
- Be mindful of linking when you're creating things; creates a layer of work, but do it (Rachel)
- Node thyself
- Lunch
 
 
  • Crowdsourcing LOD - Tim Sherratt - National Museum of Australia
12 People: Tim Sherratt, William Gunn, Mia Ridge, Jon Voss, Kris Carpenter, Martin Kalfatovic, Josh Greenberg, ...
Types of crowdsourcing - emergent behaviors
Flickr example of machine tags - emergent phenomenon, not driven by Flickr but by astronomers
People Australia - persistent identifiers for people; made machine tags from the People Australia service to tag people in Flickr photos with persistent ids
assembling descriptive systems from existing sets of data
Mia - assembling collections is messy, requires human intelligence - Who is the crowd?
Jon Voss - tagging for tagging's sake is perhaps a nonstarter, but seeing usability gets people interested.
Jo Pugh - example of getting the crowd involved: the Old Weather project
Mia - crowdsourcing games, people contribute edits simply because they want to be right
Motivations - usability critical, how much are you asking of people?
Seti@Home asked very little.
Old Weather & Mia's games  - simple tasks you could pick up or put down whenever
Mendeley - if you want to use an item, could you add a bit of info while doing so? Also implicit annotations captured in the background.
Ratebooks, Old Weather, GalaxyZoo, Mendeley, Seti@Home, wikitrust, NYPL Historical Food Menus
Crowdsourcing - roles of contributors - what level of trust do edits to metadata get? Individual collections have authoritative sources, general catalogs can't.
The amount of participation is inversely related to the amount of information you require from people (which is necessary for establishing authoritativeness)
Jo - Is the distinction between internal  authoritative sources and external crowdsourced info an artificial distinction?
Josh - assigning authority by source/contributor causes provenance & rights problems when building the database.
asa letourneau - Australian crown copyright causes serious issues for crowdsourcing & rights since there's so little in the public domain & permission to reuse presents a serious burden.
Mia - start with simple tools and watch the behaviors that emerge. Taught curators to use machine tags; they liked it, but the re-use didn't emerge to make a compelling case for using it.
People want to talk about relationships between things, perhaps more than simply describing a thing.
??? - What if we captured the user context around an item - when they saw it and why, to create a context for the item - valuable to track?
??? - either do the user experience part up front, or just capture all the annotations in an unstructured format & apply structure later
asa - driven by needs and wants, not tools
mrgunn - user experience, or lack thereof, is the friction encountered in developing the community and the data set
Mia - best projects are the ones that understand their community
how do you get projects to interoperate?
Josh - NYPL - tapped foodie community to annotate historical menus
 
IMPORTANT FACTORS
low entry barriers, user experience, seeing progress, reuse, game dynamics, reputation, democratic access (balancing democratic access vs. privileging authoritative sources)
what are some proxies for correctness? Number of edits, edit half-life, etc.
 
Jon - How do we realize these missed opportunities? 
Tim - How do we capture contributions as linked data?
 
Action items/outcomes:
Create a crowdsourcing system that captures structured data, not just info - how a LOD crowdsourcing project differs from a generic crowdsourcing project - capture implicitly as much as possible.
List of recommendations/best practices for capturing linked data
List of good examples
Gather data on:
how much community input is required to achieve a certain level of quality, effect of UX/UI, seeing progress, enabling reuse, effect of reward/privileging authority, gaming
 
  • Build a tool for scholars capturing unstructured data in archives -- to *reuse* data. Make it extensible, usable. First priority: network mapping (sociology) (Micki McGee)
  • Build a reconciliation/lookup service from a shared open discipline-specific resource (Susan Chun)
 
  • EPUB 3 support for RDFa (Eric Hellman) - 10 AM in Kyoto #6
People: Jodi Schneider, Jill Vermillion, Ryan Shaw, Doug Reside, Roger Macdonald, Karen Coyle, Eric Rochester, Eric Hellman, Eric Kansa, Jerry Persons, Tyng-Ruey Chuang, Jane Hunter
  • Karen worked on the first EPUB standard.
  • Eric: looked for HTML5 annotations in EPUB3, assumed they would be there, thus that RDFa would be there. Linked data as an "appropriate technology". HTML5 had a big controversy about what semantic markup would be supported out of the box (microdata vs. RDFa).
  • RDFa is supported in the package data (metadata header); not supported in the content documents.
  • Linking mechanism left in, annotation left out: there are companies doing annotation in different ways, wait and see how it pans out.
  • Applications: 
  • Marking up citations, e.g. with BIBO -- an ideal application for RDFa (see the Turtle sketch after this list)
  • Other added semantic markup with HTML5
  • ...
  • Book publishing community doesn't have the concept yet. They now get that metadata is important, but may not be thinking about intertextual metadata yet.
  • Open access community expected to pick up faster.
  • Doug: TEI http://www.tei-c.org/index.xml: overlapping hierarchies -- difficult for encoding; standoff references might be better, especially for editorial additions, non-authoritative crowdsourced annotation.
  • Eric: This is supported by EPUB3. Not using XPath
  • Eric: Assertions being made by the text should be in the text. 
  • Jane: Use pointers to controlled vocabulary, embed with URIs, not the thing itself.
  • Doug: Self-contained nature of EPUB argues against this.
  • Eric: rabbit hole of what deserves first class status (e.g. name string, ...)
  • Jane: Do want to use controlled vocabulary though
  • Eric: annotation uses named graphs, needed "to do anything practical with linked data"
  • Tools: 
 
  • Eric: Would help if libraries would express that they want to get ahold of the EPUB
  • Eric: Expect STM publishing over the next 5-10 years to migrate to EPUB rather than HTML. Reflowing, supports MathML
  • Jill: EEBO, Oxford -- converting to EPUB
  • Doug: The TEI & EPUB communities should talk to one another. Their fees are so expensive for academics (e.g. BookExpo)
  • Eric Kansa: travel literature: entity-disambiguation useful, mapping; interactivity for mobile devices
  • Jane: markup for scholarly reasons is different than 
  • Eric: Rights problems, DRM--most libraries don't have access to EPUB
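
A rough Turtle sketch of the kind of statements the BIBO citation markup mentioned above would express; if RDFa were allowed in EPUB content documents, these would be carried as RDFa attributes on the citation markup (the article URI, title, and DOI are invented for illustration):

    @prefix bibo:    <http://purl.org/ontology/bibo/> .
    @prefix dcterms: <http://purl.org/dc/terms/> .

    # A cited work described with BIBO (all values hypothetical).
    <http://example.org/cited/smith-2010>
        a             bibo:AcademicArticle ;
        dcterms:title "An Example Cited Article" ;
        bibo:doi      "10.0000/example.2010.001" .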
 
  • Tension between historical data + LODLAM: appearance of truth but need more subtle ways. Need to express confidence, lack thereof (???)
  • Authority control -- how to repurpose name authority cataloging in LOD (Jerry Simmons)
  • WWI -- recurrent use of vocabularies -- tried to use DC, doesn't work well. Use of multiple vocabularies. How do I convince archives to use persistent URLs so I can link to them? (Rob Warren)
  • Want tools for chemists to tell archivists about their data sets -- w/o becoming a data modeler or LD expert (???)
  • How can we use machine learning to scale up? Discovery/linked data -- authority records, metadata associated with these records (archival description) via annotation, crowdsourcing. Add machine learning. (Dave Lester)
  • Business case for linked data (Adrian Stevenson)
  • Crowdsourcing annotation -- join with David Lester -- explicit AND particularly implicit annotations (William Gunn)
David Henry, Mia Ridge, asa letourneau, Jane ??, Ingrid Mason, Sunghyuk Kim, John Deck, William Gunn, David Lester, Jane Hunter
How to capture annotations & knowledge about a collection (sensors, paintings, etc) such that the analysis can begin without too much "janitorial" work on the data.
Citizen digitization - researchers have imaged documents many times over, how can we capture this, perhaps in a Flickr pool, to stitch the whole item together.
Creation of a contributor wall for people who contribute?
Assessing contribution quality - specialist contributions are rare and outliers by default - how do you tell those apart from spurious and wrong outlier contributions?
 
Crowdsourcing list of best practices
tracking an annotator across annotations - where is the identity stored? retaining provenance can substitute for identity
Scalability is a major problem with collecting provenance: having to generate URIs for every annotation and store those over time - maybe decrease resolution as the annotation history ages. Storage is cheap; maintenance, curation, and preservation aren't.
 
  •  
  • How to promote LOD to organizations--need to integrate into workflow, use teamwork (Hideaki Takeda)
  • History of -- description in eternal presence -- change over time. What is owl:sameAs in this context? LAM can contribute (Doug Knox)
  • Using globally unique identifiers to track objects when they are collected. Transitive properties, relationships to track when things have changed. Specimens, observations, tissue samples, sequences -- a lot of very different specialized databases that are related together. (John Deck)
  • Annotations -- collecting them and pushing info to creator (John Deck)
  • (1) Rights. How do we make linked data open? (2) How do we make this data useful to web users? Great tools to make LD useful on the web (MacKenzie Smith)
 
Linking Open Citations (Jodi Schneider) - 11:30 in Kyoto #6
Reflections from John Willbanks
 
  • misc thoughts from ww:
  • the Citation as a first class object, not just A cites B but a richer idea of citation where it can express the nature of the citation (supports, refutes, annotates, etc)
  • relationship between annotations and citations?
  • How can we evaluate and fund science in a more transparent manner using linked citation data?
  • What do citations express ("typed citations")? What does a citation mean in different contexts?
  • Citing data vs. citing articles.
  • Citation is a speech act.
  • Extending Trackback to support "labelled" trackbacks.
  • It's easy to see who retweets you, but not as easy to know who cites you.
  • Meaning of a citation can shift. 
  • CITO ontology http://purl.org/net/cito/ allows expressing "typed" citations, such as agreement/disagreement and the like (see the Turtle sketch after this list). -- CITO is nice but inherits from a big OWL beast - is it possible to use it with others (e.g., BIBO) without inadvertently entailing contradictions?
  • Citation granularity - how to reference micropublications?
  • Ability to annotate and cite particular paragraphs or arbitrary parts of a text (e.g., on Kindle).
  • OpenURL - the "grandfather" of linked citations
  • What limitations does OpenURL have that can be solved by citation ontologies?
  • Idea: RDF snippet generator for citations. Enter URL, choose a citation type, get an RDF snippet.
  • We live in a "like button culture". "Like" is a trivial form of a link. "Like" is an easy interface for creating links.
  • Relatedness as a service: Based on citation data, give me related resources.
  • Comparability based on structured data, such as reading lists
  • Citation is a proxy for impact of a work. How do you measure the impact of a work on the Web?
  • "Auto-generating" reading lists for syllabi based on the most cited works for a topic
  • How do you determine which citations are important enough to be surfaced in user interfaces?
  • Provide an embed code to increase usage. (And "embedding code" is also a type of citation.)
  • Libraries can serve as an infrastructure for identifiers. When you cite, you need an identifier.
  • TODO list:
  • Generators and embed codes.
  • Provide persistent identifiers.
  • RDFa in EPUB hack session - 2:45 in boardroom
  • Participants: 
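
A small Turtle sketch of a "typed" citation using CiTO, as discussed above (the paper URIs are invented; the prefix follows the CITO link given in the notes):

    @prefix cito: <http://purl.org/net/cito/> .

    # Hypothetical papers: A agrees with B and disagrees with C, expressed as typed citations.
    <http://example.org/paper/A>
        cito:agreesWith    <http://example.org/paper/B> ;
        cito:disagreesWith <http://example.org/paper/C> .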
 
Rights & Open Data, Kyoto # 7
    Rights Metadata & Metadata Rights
    Paul Keller giving rundown on how Europeana has been dealing with rights questions.
    Karen: some of these strategies similar to OKF Open Data Principles
    Paul: not only policy, but also enforcement
    Paul: as an action item we should consider endorsement or discussion of the Discovery Open Metadata Principles: http://discovery.ac.uk/businesscase/principles/
    Reading through this, "Discovery Metadata" is specific to the project, not a technical term.
    Risk Management and Long Term Strategy vs. giving away your data (Rachel, referencing DPLA meeting)
    MacKenzie: ACTION: How about a 5-star-style rating (with 5-2 being called "open"):
        5: cc0 or comp
        4: attribution license with link as attribution
        3: attribution with other form of attribution
        2: attribution-share alike
        1: non-commercial
    Rachel: also add risks and rewards
    Rachel: 
    What does it mean to provide attribution when using CC-BY licensed data?
    Europeana needs a transfer of some rights in order to operate at all.
    Don't talk about ownership. Talk about risk management.
 
Long-term preservation of RDF, vocabulary maintenance session
  • How do you maintain URIs?
  • Use implementation-agnostic URIs (e.g., without .php)
  • Use a content delivery network (CDN)?
  • Use a proxy server that redirects the user to the current location of an RDF resource when its URI is resolved? (e.g., http://dx.doi.org )
  • Will the link-rot get more intense in a couple of years?
  • How can you decide if you can trust a URI provider?
  • You can trust Library of Congress to maintain their URIs.
  • Does the LOCKSS (Lots of copies keep stuff safe) principle apply to vocabularies?
  • How to make strategic partnerships between vocabulary maintainers (e.g., DCMI and FOAF Project)?
  • Stable context vs. flexibility
  • Using DNS system for versioning (by @dchud)
  • Vocabulary management toolkit:
  • Best practices for vocabulary alignment
  • Which RDF properties (mapping predicates) to use? owl:sameAs for classes (schema-level data) and SKOS matches (skos:exactMatch, skos:closeMatch) for instance data (value vocabularies) - see the Turtle sketch after this list.
  • How NOT to overstate similarity with owl:sameAs?
  • Vocabulary preservation framework
  • A mailing list will be set up by DCMI.
  • Vocabulary maintenance
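
A minimal Turtle sketch of the alignment pattern proposed above (all concept and class URIs are made up; SKOS mapping properties are used for value-vocabulary concepts, while the schema-level line mirrors the session's owl:sameAs suggestion):

    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .

    # Instance-level (value vocabulary) alignment: SKOS mapping properties
    # state a match without entailing the full identity of owl:sameAs.
    <http://example.org/vocabA/concept/manuscripts>
        skos:exactMatch <http://example.org/vocabB/concept/manuscripts> ;
        skos:closeMatch <http://example.org/vocabC/concept/handwritten-documents> .

    # Schema-level alignment between classes, as proposed in the session
    # (owl:equivalentClass is the more conventional choice for class equivalence).
    <http://example.org/vocabA/Manuscript> owl:sameAs <http://example.org/vocabB/Manuscript> .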
 
 
Digital Humanities, Prosopography, cracking open crafted data:
 
  • How to engage faculty
  • What is the “script” to sell LOD to faculty
  • Ryan Shaw at UNC Chapel Hill - Emma Goldman, Susan B. Anthony, Labbidy? collection of radical organizations at UNC
  • How easy are you going to make it for faculty to create these links?
  • Bringing together fragments of identities for people who may not be well-represented in other ways.  
  • Crowdsourcing the linking of fragments of data
  • The story for Linked Data may be natural historians - this is already their methodology.
  • Identifying canonical reference works (Epigraphic corpora)
  • Citation LOD for humanities work
  • Funding agencies could identify priorities for exposing data
  • Licensing issues need to be clear. 
  • Is there a set of killer tools for a demonstration that we could define?
  • Every domain is different - not easy to identify the “killer app”
  • Tools for scholars to work with data that they  have produced (Google Refine)?
  • Where is there abundant data that scholars haven’t been exposed to yet?  Authority files are part of library catalogs, but not part of scholarly projects.
 
Explaining LOD
  • How do we explain LOD to a non-techie audience?
  • What do you need to convince others of the benefits of LOD?
  • Killer apps
  • Money
  • Success stories
  • How do you assess the costs of doing LOD? What are the key benefits of LOD?
  • What kind of training do the implementers of LOD need?
  • How do you sell the idea of LOD?
  • Put up an FAQ explaining LOD for librarians.
  • What are the questions this FAQ should answer?
  • Do we need to use explanatory metaphors or affordances of LOD (i.e., what can be done with LOD)?
 
Users and uses
Thursday, 11:30 (reported by Karen Coyle)
 
What user tools exist that we can build on?
- BBC
- Freebase
- VIVO
- Open Library
- Zotero
 
We need open-ended tools that people can build on. Here are some things that people might want to do
  • corrections
  • timelines, maps, views
  • mashups
  • re-expose info
  • filter, sort
  • visualize
  • share, link
  • annotate
  • cite
  • create
  • debate
  • mark "like" or "wrong"
  • feedback and following
  •  
 
We want to have VERBS, things people can do
 
We need to create use cases.
 
There are different types of users who have different needs. Examples of users are teachers, journalists, scholars. 
 
Data and information creators need an incentive to create metadata. That incentive probably has something to do with visibility. Note that there is a difference between 'semantic soup' and curated LOD.
 
LOD: linked open data
LD: linked data
LCD: linked closed data
LED: linked enterprise data
 
Wishlist:
  • e-reference librarian for getting started with research
  • prototypes, so that we can try things out
  • surprises; users doing things we never would have dreamed of
 
LODLAM-DC Sept. 16, 2011
 
How to License Metadata Clinic
 
Complexity of what people at federal institutions create, whether or not it's in the public domain.
 
Do you have a digital bibliography that refers back to the data sources for each project?
 
Trevor mentions datacite.org
 
Lawyers trying to sort out what is metadata and the facts vs. creative expression.
 
"Implied license by letter" send a letter or correspondence that outlines what you want to do with that information and how you plan to share or license it and then use that to be license by letter.
 
Provenance people have a good hedge on this: until policy people figure it out, use provenance markup language to track changes
 
 
Use Case Collection Session (notes by Amanda Vizedom (@ajvizedom), please add/correct!):
 
1. Civil War Data 150 - http://www.civilwardata150.net/ (Jon Voss) (@jonvoss)
-- desire to connect data across collections about specific event and organizations (regiments, etc.).
-- eventually, to get down to the level of individual people, connect information about individual people in different collections.
-- Connecting information about individuals of other sorts - people, but also events, organizations, buildings, places - is very difficult (Simon Spero) -- need ways to search for existing stores of information about them.
 
2. Smithsonian - (Suzanne Pilsk) An existing reference source in Botany (TL2?). Want to convert it into linked data - because it's a good idea, but also because this source is already used as a standard, in fact as The Standard. It is an authoritative source, but exists only in print right now. The Botanists Committee announced that you can create a name without publishing, e.g. announce it into the ether. But the print source is out of sync with that; the authority source needs to be in the ether as well. Putting it out as linked data:
- addresses potential conflicts
- addresses print lag
- gives publishers, etc., somewhere to link to in order to declare the authority source
- gives researchers, etc., an easy way to add trackable citations to the source.
 
3. U Penn ? and Archeology - Internally they already track citations to objects held. Going to publish these lists of all citations to the objects, initially as a dumb list. The long view is to be able to link to the *other* objects cited in those articles, and to the information in their holders' data (they have natural partner organizations, e.g., the British Museum).
- addresses the common issue of different people coming and asking the same question repeatedly
- creates data that answers their questions ahead of time
- a cross-reference arose between this and the Civil War Data project
 
- would like to make the semantic markup part of the capture process
- question about having something like accession number, but for linked data.
- Question about tools for making this happen
- Active research project in this area for Archeological sites: Digital Gordion
- Discussion of BBC's tools, using GATE (from Sheffield), combining automatic and manual processes to mark up media in sport and nature domains
- Discussion of ArchivesSpace - future - going from Archivists' Toolkit to/in this direction, building on existing linked data tools in the open space.
- Note SNAC project - Social Network & Archival Context Project: http://socialarchive.iath.virginia.edu/ (@jenserventi)
 
- Non academic / public uses?
  - it makes the LAM data linkable from such places as Wikipedia 
   - from LeafSnap to the taxonomic name of a plant to museum data on that plant to information on plant origins and ....
 
- What kind of uses / connections/ links are you trying to drive?
  - virtual museum visits
  - awareness of the LAM as a resource
  - potential new supporters, draw to $-generating presence, e.g. gift shop?
  - ability for public to contribute content
  - crowdsourcing opportunities
  - capturing elements of what people find interesting about objects, how they see them, so that this can feed back into the repositories and inform the holder's practices -- be it cataloguing, museum experience shaping, etc., 
  - people searching on Amazon for a book by an author also see Smithsonian's archive data on that author?
  - open library project