Opened 14 months ago
Last modified 3 months ago
#999 infoneeded enhancement
OAI-ORE support
| Reported by: | skaplun | Owned by: | lnielsen |
|---|---|---|---|
| Priority: | major | Milestone: | |
| Component: | *general* | Version: | |
| Keywords: | OpenAIRE OAI-ORE | Cc: | |
Description
Open Archives Initiative Object Reuse and Exchange defines standards for the description and exchange of aggregations of Web resources [...]
In Invenio, aggregations might come from different sources:
- the DB:
- collections aggregate records
- records link to documents
- documents link to revisions of documents
- revisions of documents link to formats of documents
- MARC 76x-78x fields (at CERN, for example, these are used to link a record to its official publication, a conference to its talks, a talk to its contribution and proceedings, or a photo to a photo shoot; in OpenAIREplus, datasets will be linked to publications).
Implementation details will be added as comments to this ticket.
Attachments (2)
Change History (6)
comment:1 Changed 14 months ago by lnielsen
- Owner set to lnielsen
- Status changed from new to assigned
comment:2 Changed 4 months ago by skaplun
- Type changed from task to enhancement
Changed 3 months ago by lnielsen
comment:3 Changed 3 months ago by lnielsen
OAI-ORE prototype notes
Use cases of OAI-ORE:
- Data exchange (primary):
- CDS, Inspire, ADS, arXiv
- OpenAIREplus Orphan Repository, OpenAIREplus repository.
- Visualisation (secondary):
- Enhanced publications: Browsing the archives via Firefox plugin (small showcase).
Candidate aggregation examples:
- General examples:
- Collection aggregating:
- Collections
- Records
- Feed
- Record aggregating:
- Metadata record (perhaps via OAI-PMH)
- Authors
- Documents (PDFs, Images, Videos, Audio)
- Bibliographic descriptions: BibTex, MARC, MARCXML
- Comments
- External links
- Similar to relationship: DOI, arXiv id
- See also
- Record citations (this could also just be added as a relationship to other resources)
- Records
- Record translations
- Records
- Documents aggregating:
- Revisions
- Revisions aggregating:
- Formats
- Specific examples:
- Logs
- Login information?
- Photo shoot aggregating:
- Photos (isn't this the same as a record aggregating documents?)
- Conference aggregating:
- Contributions
- Proceeding
- Notes
- Posters
- Talks
- Slides
- Book aggregating:
- Chapters
- Periodical
- Journals
- Volumes -> Issues -> Record
- OpenAIRE:
- Funding scheme aggregating
- Projects
- Records (data, publications)
- Publications aggregating
- Data
- Project(s)
- Funding scheme
- Data aggregating
- Publications
- Project(s)
- Funding scheme
- See also videos
- Similar records
- https://twiki.cern.ch/twiki/bin/view/Inspire/TalkORE
Abstract data model
- Resource: anything of interest - resources are identified by HTTP URIs
- Information resource: any kind of document, image, video, etc. whose URI returns information when accessed (i.e. the web as we know it).
- Non-information resource: the HTTP URI doesn't return information; it is just a name for a "real-world" object.
- Aggregation: a set of resources (a non-information resource).
- Aggregated resource: a resource in an aggregation (which can be an aggregation).
Important: anything that should be in an aggregation must have a URL (e.g. project, funding scheme, etc.).
- Resource map: a description of one aggregation (i.e an information resource)
- Proxy: used for ordering
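The abstract data model above could be sketched in Python roughly as follows. This is only an illustration of the terms, not part of the prototype: the class names, the `add` method and the example URIs are assumptions, and proxies (ordering) are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """Anything of interest, identified by an HTTP URI."""
    uri: str


@dataclass
class Aggregation(Resource):
    """A set of resources; itself a (non-information) resource."""
    aggregated: List[Resource] = field(default_factory=list)

    def add(self, resource: Resource) -> None:
        # Anything placed in an aggregation must have an HTTP URI.
        if not resource.uri.startswith(("http://", "https://")):
            raise ValueError("aggregated resources need an HTTP URI")
        self.aggregated.append(resource)


@dataclass
class ResourceMap:
    """An information resource describing exactly one aggregation."""
    uri: str
    describes: Aggregation


# Hypothetical example: a conference aggregating a talk, which is
# itself an aggregation of its slides.
conference = Aggregation("http://foo/aggregation/conference")
talk = Aggregation("http://foo/aggregation/talk")
talk.add(Resource("http://foo/record/1/files/slides.pdf"))
conference.add(talk)
resource_map = ResourceMap("http://foo/aggregation/conference.rdf", conference)
```

Because `Aggregation` subclasses `Resource`, an aggregated resource can itself be an aggregation, as noted in the list above.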
HTTP implementation
- Each URI defined in resource maps must resolve
- Separate resource maps
- Model 1:
- http://foo/aggregation/a (aggregation - redirects with 303 via content negotiation)
- http://foo/aggregation/a.html (resource)
- http://foo/aggregation/a.rdf (resource map)
- http://foo/aggregation/a.atom (resource map)
- Model 2:
- http://foo/aggregation/a.rdf#aggregation (aggregation)
- http://foo/aggregation/a.html (resource)
- http://foo/aggregation/a.rdf (resource map)
- Pros:
- Clear standalone resource map
- Cons:
- Redirects will degrade harvester performance
- Embedded resource map via RDFa:
- Model 3 (without redirect):
- http://foo/aggregation/a.html#aggregation (aggregation)
- http://foo/aggregation/a.html (resource map + resource)
- Model 4 (with redirect):
- http://foo/aggregation/a (aggregation)
- http://foo/aggregation/a.html (resource + resource map)
- Pros:
- Resource map is embedded in splash page (no redirects needed)
- Cons:
- Size of HTML (perhaps with gzip compression it's negligible).
- Depending on size
- Load issues during harvesting
- Resource Map discovery:
- Generate site map xml
- Generate atom feed
- Via OAI-PMH (could possibly avoid redirects from aggregation to resource map)
- Insert link-tag in HTML
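As an illustration of Model 1, the 303 redirect via content negotiation might look like the sketch below. The function name `negotiate` and the media-type table are assumptions made for this example; the Accept header parsing is deliberately simplified (first match wins, q-values are ignored).

```python
# Hypothetical mapping from requested media type to URI suffix (Model 1).
REPRESENTATIONS = {
    "application/rdf+xml": ".rdf",    # resource map (RDF/XML)
    "application/atom+xml": ".atom",  # resource map (Atom)
    "text/html": ".html",             # human-readable resource
}


def negotiate(aggregation_uri, accept_header=""):
    """Return (status, location) for a request to the aggregation URI.

    Simplification: the first supported media type listed in the
    Accept header is chosen; q-values are not taken into account.
    """
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type in REPRESENTATIONS:
            return 303, aggregation_uri + REPRESENTATIONS[media_type]
    # Fall back to the HTML page when nothing matches.
    return 303, aggregation_uri + ".html"


status, location = negotiate("http://foo/aggregation/a", "application/rdf+xml")
```

Every harvest of a resource map through the aggregation URI costs one extra round trip here, which is the performance concern listed under the cons of separate resource maps.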
Risks/concerns
- Inclusion of other relationships and metadata:
- How much (see 4.5 Relationships to other Resources and Types)? Citation links, translations.
- Exporting very large aggregations
- HTTP implementation of OAI-ORE incompatible with Invenio URL scheme?
- Efficiency of protocol
- Redirects in resource map discovery
- One aggregation per resource map (means lots of HTTP requests to harvest #records).
- Enforcing structural constraints of aggregation graph
Relation with OAI-PMH
- OAI-PMH can be used to support resource map discovery
- OAI-ORE can be used to include a link to an OAI-PMH metadata record
Integration in Invenio
- URL Scheme + Data model
- Anything that needs to be referenced from an aggregation needs an HTTP URI (there are ways to express relationships with other entities though).
- The data model and URL scheme are tightly connected.
- Resource Map generation framework:
- Mapping of Invenio data to resource maps
- Module for mapping anything in Invenio to the OAI-ORE data model
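A minimal sketch of what the resource map generation step could produce, assuming a record has already been mapped to a list of URIs. The function name `build_resource_map` and the example URIs are hypothetical and not taken from the prototype. Only the `ore:describes` and `ore:aggregates` relations are emitted; further metadata that the ORE specification asks for on a resource map (such as creator and modification date) is omitted.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORE = "http://www.openarchives.org/ore/terms/"


def build_resource_map(rem_uri, aggregation_uri, aggregated_uris):
    """Serialise one aggregation as a bare-bones RDF/XML resource map."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("ore", ORE)
    root = ET.Element("{%s}RDF" % RDF)

    # The resource map describes exactly one aggregation.
    rem = ET.SubElement(root, "{%s}Description" % RDF,
                        {"{%s}about" % RDF: rem_uri})
    ET.SubElement(rem, "{%s}describes" % ORE,
                  {"{%s}resource" % RDF: aggregation_uri})

    # The aggregation lists each aggregated resource by its URI.
    agg = ET.SubElement(root, "{%s}Description" % RDF,
                        {"{%s}about" % RDF: aggregation_uri})
    for uri in aggregated_uris:
        ET.SubElement(agg, "{%s}aggregates" % ORE,
                      {"{%s}resource" % RDF: uri})

    return ET.tostring(root, encoding="unicode")


# Hypothetical record aggregating a PDF and a MARCXML description.
rdf_xml = build_resource_map(
    "http://foo/aggregation/a.rdf",
    "http://foo/aggregation/a",
    ["http://foo/record/1/files/paper.pdf",
     "http://foo/record/1/export/xm"],
)
```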
comment:4 Changed 3 months ago by lnielsen
- Status changed from assigned to infoneeded

Prototype of ORE generation