Invenio Blog

Follow news and updates on Invenio world

Introducing RDM, ILS and Framework

Lars Holm Nielsen Oct 15, 2019 Invenio

We're happy to announce a major overhaul of inveniosoftware.org. Some of the highlights of the new website include:

  • Rebranding of Invenio into three products: InvenioRDM, InvenioILS and Invenio Framework
  • A new forum
  • People in the community
  • Logo downloads

Framework, RDM and ILS

The primary reason for the website overhaul is a rebranding of Invenio into three different products: InvenioRDM, InvenioILS and Invenio Framework.

Both InvenioRDM and InvenioILS are applications built on top of the Invenio Framework. On each product page, you'll find much more information about the product as well as its current roadmap.

Talk - Discourse forum

We are also launching a new forum for users, administrators and developers, which replaces our current troubleshooting repository on GitHub and complements the current Gitter-based chatrooms.

People and institutions in the community

We have also added a new people section to better showcase the people who are making Invenio into a real community. Don't hesitate to email us if you'd like to be displayed on the list.

New logos

Last but not least, we've made some very minor modifications to the Invenio logo and created a dedicated download page where you can get SVG versions of all logos.

Towards Invenio v3.2 and Elasticsearch 7 support

Lars Holm Nielsen Aug 7, 2019 Invenio

Two CERN sprint teams, with a total of 14 developers, have each just finished a two-week sprint:

  • Sprint team 1: Focused on the Invenio v3.2 release, which will include the new Files bundle.
  • Sprint team 2: Focused on adding Elasticsearch 7 support as well as preparing the Invenio v3.3 release, which will include the new index migration utilities.

Highlights

A total of 23 new module releases were made during the sprint. The highlights from the two sprints include:

  • Elasticsearch 7 support.
  • Elasticsearch index prefixing and suffixing support for shared clusters (this is needed for the upcoming index migrator utilities).
  • Marshmallow v2 and v3 compatibility. Invenio is now able to accept both Marshmallow v2 and v3 schemas. In your Invenio instance you will need to pin the Marshmallow version that matches your schemas, and follow the upgrade guide provided by Marshmallow to upgrade your schemas.
  • Sentry support now uses the Sentry-Python library instead of the Raven library (you can still switch back to Raven by setting SENTRY_SDK = False in your configuration).
  • Rate limiting now differentiates between guests and authenticated users, and allows for external modules to provide per user rate limits.
  • Improved HTML sanitisation support in several modules.
  • Improved support for client-side infinite scroll in the REST API.
  • Housekeeping: we have fixed a significant number of build failures as well as deprecation warnings from other libraries.
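As a rough illustration of the rate-limiting change listed above, the sketch below picks a limit depending on whether a request comes from a guest or an authenticated user, and lets an external module override it per user. The function name, the limit values and the `per_user_limit` argument are hypothetical and chosen for illustration only; they are not the Invenio-App API.

```python
from typing import Optional

# Hypothetical default limits, for illustration only.
GUEST_LIMIT = "1000 per hour"
AUTHENTICATED_LIMIT = "5000 per hour"


def select_rate_limit(is_authenticated: bool,
                      per_user_limit: Optional[str] = None) -> str:
    """Return the rate limit that applies to the current request."""
    # A per-user limit supplied by an external module takes precedence.
    if per_user_limit is not None:
        return per_user_limit
    # Otherwise fall back to the default for guests or authenticated users.
    return AUTHENTICATED_LIMIT if is_authenticated else GUEST_LIMIT
```

In a real application the per-user value would be looked up per request; here it is simply passed in as an argument to keep the example self-contained.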

All of the above highlights will be released together with Invenio v3.2. Most of the individual modules have already been released; however, you are still on your own if you decide to go ahead with them prior to the Invenio v3.2 release (if you encounter problems, we are of course very interested in hearing about them, so that we can solve them before the v3.2 release).

Future plans

Invenio v3.2 and Files Bundle

The primary focus is still to release Invenio v3.2. The pending issues are limited to final testing and documentation.

Elasticsearch v2 and v5 deprecation

Invenio v3.3 will add index migration utilities that will allow Invenio users to upgrade their Elasticsearch clusters to supported Elasticsearch versions. In Invenio v3.4 we plan to then remove support for Elasticsearch v2 and v5.

Python v2 support ends January 1st 2020.

Python v2.7 will reach end of life on January 1st, 2020. Invenio will only support Python 2.7 until that date. From January 1st, 2020 we will remove Python 2.7 from our test matrices, and thus new module releases after January 1st, 2020 will very likely no longer work on Python 2.7.

A large number of the Python libraries we depend on have already removed Python 2 support, and thus we will not be able to continue Python 2 support beyond January 1st, 2020.

Invenio v3.3 - Index migration and usage statistics

Invenio v3.3 is planned for release in late 2019 or early 2020. The primary focus for Invenio v3.3 will be adding support for Elasticsearch index migration as well as releasing the Statistics Bundle (COUNTER-compliant usage statistics). The Statistics Bundle includes the following modules:

  • Invenio-Stats
  • Invenio-Queues
  • COUNTER-Robots

Invenio v3.3 may also see the release of a new module, Invenio-Records-Permissions, which will significantly simplify defining and managing access control for records.

Releases overview

  • invenio-access: v1.2.0
    • Removed DynamicPermission from Invenio-Access (deprecated since v1.0.0)
  • invenio-app: v1.2.0
    • Fixed issue with instance_path and static_folder being globals evaluated once which caused problems with fixtures in pytest-invenio.
    • Improved the rate limiting to differentiate between guests and authenticated users.
    • Added possibility for external modules to provide per user rate limits via the Flask g global request object.
    • Fixed deprecation warnings from Werkzeug.
  • invenio-assets: v1.1.3
    • Changed module to hide webpack warnings (primarily needed for the cookiecutter-invenio-instance to reduce output clutter).
  • invenio-base: v1.1.0
    • Added support for allowing instance_path and static_folder to be callables which are evaluated before being passed to the Flask application class (related to the invenio-app fix).
  • invenio-celery: v1.1.0
    • Fixed missing release on PyPI.
  • invenio-config: v1.0.2
    • Added ALLOWED_HTML_TAGS and ALLOWED_HTML_ATTRS default configuration for bleach HTML sanitisation library (values are used by Invenio-Records-REST, Invenio-Formatter and Invenio-Previewer).
  • invenio-db: v1.0.4
    • Added PostgreSQL v10 into the test matrix to ensure future compatibility.
  • invenio-formatter: v1.0.2
    • Added a new Jinja filter, sanitize_html, that uses the bleach library to sanitise data. It is meant to be used in combination with the safe template filter to prevent Cross-Site Scripting (XSS) vulnerabilities.
  • invenio-indexer: v1.1.0
    • Added Elasticsearch 7 support.
    • Added before_record_index.dynamic_connect() signal utility for connecting index receivers directly to specific indexes.
    • Fixed Elasticsearch index prefixing support.
  • invenio-logging: v1.2.0
    • Changed Sentry integration to use the sentry-python module instead of the raven library. The raven library is still supported for backward compatibility.
  • invenio-oaiserver: v1.1.1
  • invenio-oauthclient: v1.1.3
    • Fixed deprecation warnings from Flask-OAuthlib
    • Fixed issue with the ?next parameter not supporting a query string.
  • invenio-records: v1.3.0
    • Removed the CLI (deprecated since v1.1.0)
  • invenio-records-rest: v1.5.0
    • Added Elasticsearch 7 support
    • Added CSV serialiser (for allowing CSV exports)
    • Added Marshmallow v3 support
    • Added “from” and “aggs” query parameters for better supporting client-side infinite scroll use cases.
    • Changed SanitizedHTML marshmallow field to use central configuration from Invenio-Config.
    • Fixed a deprecation warning.
    • Fixed Elasticsearch index prefixing support.
    • Fixed bug with browsers not respecting the content type when caching the REST API responses. (PENDING merge)
  • invenio-rest: v1.1.1
    • Added compatibility layer for marshmallow v2 and v3
  • invenio-search: v1.2.1
    • Added Elasticsearch v7 support
    • Fixed bug with Elasticsearch index prefixing support.
    • Added index suffixing and write aliases.
    • Deprecated Elasticsearch v5 support.
    • Changed the default library used for making requests to Elasticsearch from requests to urllib3 (default recommended library).
  • invenio-theme: v1.1.4
    • Added an error handler for HTTP 429 (rate limiting error).
  • xrootdpyfs: v0.1.6
    • Fixed bug preventing previewing large ZIP files (2GB+).

Files Bundle

The Files Bundle also saw releases of the following modules:

  • invenio-files-rest: v1.0.1
  • invenio-records-files: v1.1.1
  • invenio-previewer: v1.0.1
  • invenio-iiif: v1.0.1
  • pytest-invenio: v1.2.0

We don’t recommend upgrading to these versions until Invenio v3.2 has been released. In particular, we have made breaking changes to Invenio-Records-Files from v1.0.0a11 to v1.0.0 that are likely to impact you if you depended on the unsupported alpha releases.

Stay tuned for the Invenio 3.2 release!

Invenio security releases - XSS and Host header injection

Lars Holm Nielsen Jul 15, 2019 Invenio

Two vulnerabilities have been identified in supported Invenio modules.

  • Invenio-Records (security advisory): A Cross-Site Scripting (XSS) vulnerability has been identified in Invenio-Records in the administration interface.
  • Invenio-App (security advisory): A Host header injection vulnerability has been identified in Invenio-App.

In addition, two XSS vulnerabilities have been discovered in unsupported Invenio modules:

  • Invenio-Previewer (security advisory): An XSS vulnerability affecting the JSON, Markdown and iPython Notebook previewers.
  • Invenio-Communities (security advisory): An XSS vulnerability affecting the Jinja templates.

The vulnerabilities were found after an XSS vulnerability was reported to Zenodo by Ciro Santilli. As a standard measure and after patching Zenodo, we reviewed the Invenio source code for potential similar issues to those identified in the Zenodo source code. This led to the discovery of three additional XSS vulnerabilities. The host header injection vulnerability was discovered after a standard vulnerability scan of another service running at CERN.

Releases

We have issued two new Invenio releases fixing these issues:

  • Invenio v3.0.2 and v3.1.1

The following individual modules fixing the vulnerabilities have been released:

  • Invenio-Records v1.0.2, v1.1.1 and v1.2.2
  • Invenio-App v1.0.6 and v1.1.1
  • Invenio-Previewer v1.0.0a12 (unsupported)
  • Invenio-Communities v1.0.0a20 (unsupported)

New security policy

While handling these vulnerabilities, we also took the opportunity to clearly define and document Invenio's security policy. Please have a look and let us know what you think.

Previously, we sometimes privately notified potentially affected services about a security vulnerability. We have, however, decided to discontinue this practice and instead send out an advance notification to everyone about an upcoming security release, including only the severity level of the issue. This allows everyone to plan ahead for the upcoming release and ensure they have staff available to handle it. This partly smooths the communication process, but also ensures that everyone receives the same information in a scalable way.

GitHub security advisories

We have also, for the first time, evaluated the new GitHub maintainer security advisories as a way to handle the vulnerabilities. These advisories are reviewed by GitHub and should allow a security alert to be sent to affected repositories.

For more information

If you have any questions or comments about this security release:

Keeping up with Elasticsearch

Alex Ioannidis & Lars Holm Nielsen May 24, 2019 Invenio

In our latest Invenio sprint, from May 13 to 24, we worked on making Elasticsearch upgrades, migration and reindexing simpler, with near-zero downtime. Read on to learn how you can keep up with Elasticsearch's rapid release and deprecation cycles with Invenio, and for a preview of the new features coming in Invenio v3.2.

End of life

Once you start running a production Invenio instance, you'll quickly notice that your Elasticsearch cluster becomes outdated pretty fast, due to Elasticsearch's rapid release and deprecation cycle. Invenio v3 development was started in 2016 against the newly released Elasticsearch v2. By the time Invenio v3 was finally released in June 2018, Elasticsearch had had two more major releases (v5 and v6) and v2 had already reached end of life. As can be seen from Elastic Product End of Life Dates, you can expect a major Elasticsearch version to be supported for a maximum of 2-3 years. Thus, you should plan to regularly upgrade your Elasticsearch cluster in order not to run on unmaintained software.

Index upgrades with downtime

Elasticsearch allows an index created in v5 to be read in v6, which, combined with rolling cluster upgrades, allows you to upgrade a cluster relatively easily with no downtime. However, you are out of luck as soon as you want to upgrade again from v6 to v7, since v7 cannot read an index created in v5.
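The constraint above boils down to a simple rule: a cluster can read an index created in its own major version or in the major version released immediately before it. The helper below is a hypothetical illustration of that rule, limited to the major versions mentioned in this post; it is not part of Invenio or Elasticsearch.

```python
# Elasticsearch major versions mentioned in this post, in release order.
# Note that versions 3 and 4 were never released, so v5 directly follows v2.
MAJOR_VERSIONS = [2, 5, 6, 7]


def can_read_index(created_in: int, cluster: int) -> bool:
    """Return True if a cluster can read an index created in an older major."""
    created_pos = MAJOR_VERSIONS.index(created_in)
    cluster_pos = MAJOR_VERSIONS.index(cluster)
    # Readable in the same major version or the next released one only.
    return 0 <= cluster_pos - created_pos <= 1
```

For example, `can_read_index(5, 6)` is true while `can_read_index(5, 7)` is false, which is why an index created in v5 has to be recreated before moving to v7.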

The solution is, for every major Elasticsearch upgrade, to create a new index and reindex content from your old index. This requires you to take your Invenio instance offline to prevent new content from being indexed during the reindexing process. For large-scale production systems with many millions of records, reindexing can take a long time and thus might require several hours of downtime, which can quickly become a big issue for these systems.

Fixing Elasticsearch mappings

A related problem you may hit when running an Invenio instance is that you would like to tweak the Elasticsearch mappings in order to improve search performance and precision. In Elasticsearch you cannot modify the mapping of an existing index, so, as in the upgrade case above, you need to create a new index and reindex content from the old index.

Use cases

All in all, you'll find yourself in need of tools to manage indexes and perform reindexing in order to:

  • Upgrade an Elasticsearch cluster between major versions.
  • Migrate between two Elasticsearch clusters.
  • Change Elasticsearch mappings of existing indexes.

Near-zero downtime upgrade, migration and reindexing

To solve the above use cases, we've built a new Invenio module named Invenio-Index-Migrator. The module allows you to synchronize two Elasticsearch indexes (in-cluster or cross-cluster) over an extended time period, and when you are ready, you can roll over to the new index. Depending on how you have deployed Invenio, the rollover can happen with either zero downtime or near-zero downtime (<5 minutes compared to hours).

The new module supports migrating records, but also any other index you may have in Elasticsearch, such as event logging from usage statistics. The new module works by defining migration recipes and managing them via a CLI.

Demo

Disclaimer: APIs etc may change slightly before the final release.

First, you initialize a migration recipe (here a recipe named records) which will create new indexes to hold the new data:

$ invenio index migration init "records"

Next, you kick off the index migration:

$ invenio index migration run "records"

Index migration happens in two stages:

  • Bulk migration: First, a snapshot of the current index is reindexed into the new index. Invenio-Index-Migrator is flexible and supports several methods. By default, we use the Elasticsearch Reindex API, which in most cases is orders of magnitude faster than any other method. Other methods allow you to use Invenio's default record indexing or simply implement your own.
  • Synchronization: Once the first snapshot has been fully indexed, a second job is started to keep the two indexes in sync. Essentially this works by indexing any modifications to the old index in the new one.
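To give a feel for the bulk stage, here is a minimal sketch of the request body accepted by Elasticsearch's Reindex API (`POST /_reindex`). The helper function and the index names are hypothetical; this only illustrates the request shape and is not how Invenio-Index-Migrator is implemented.

```python
import json
from typing import Optional


def build_reindex_body(source_index: str, dest_index: str,
                       remote_host: Optional[str] = None) -> dict:
    """Build the JSON body for Elasticsearch's ``POST /_reindex`` endpoint."""
    source = {"index": source_index}
    if remote_host is not None:
        # Cross-cluster case: read the source index from another cluster.
        source["remote"] = {"host": remote_host}
    return {"source": source, "dest": {"index": dest_index}}


# In-cluster example with hypothetical index names.
body = build_reindex_body("records-v1", "records-v2")
print(json.dumps(body, indent=2))
```

The body would then be sent to the cluster with whichever HTTP or Elasticsearch client you already use.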

At any point during the migration, you can check the status:

$ invenio index migration status "records"

Once you're ready, you can roll over the new index:

$ invenio index migration rollover "records"

In an in-cluster migration, the rollover works by shuffling around index aliases (since Elasticsearch does not support renaming indexes). In a cross-cluster migration you update the Invenio configuration to point to your new Elasticsearch cluster.
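As an illustration of the alias shuffling, the sketch below builds a request body for Elasticsearch's index aliases API (`POST /_aliases`), which applies all actions in a single request atomically. The function name and the alias and index names are assumptions made for this example, not the module's actual implementation.

```python
def build_rollover_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Build a ``POST /_aliases`` body that repoints an alias to a new index."""
    return {
        "actions": [
            # Detach the alias from the old index ...
            {"remove": {"index": old_index, "alias": alias}},
            # ... and attach it to the new one in the same atomic request.
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: the application always searches the "records" alias.
actions = build_rollover_actions("records", "records-v1", "records-v2")
```

Because the application only ever talks to the alias, swapping the index behind it does not require any change on the application side.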

In case something goes wrong during the index migration, you can of course also cancel the job:

$ invenio index migration cancel "records"

Bonus

During the sprint a number of other issues were addressed that will also be part of Invenio v3.2:

  • Elasticsearch v7 support was added to Invenio-Search, Invenio-Indexer and Invenio-Records-REST. Only Invenio-OAIServer still needs to be upgraded to support Elasticsearch 7.
  • Search index prefixing, which allows multiple Invenio instances to use the same Elasticsearch cluster. Elasticsearch by default does not support the concept of virtual hosts. With search index prefixing, it will be possible to use a single Elasticsearch cluster for multiple Invenio instances, given that the Invenio instances either trust each other, or you have a protection layer like Elastic X-Pack or ReadonlyREST in front of Elasticsearch.
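The idea behind index prefixing can be sketched in a few lines: each instance prepends its own prefix to every index name, so that indexes from different instances never collide in the shared cluster. The helper names and prefixes below are hypothetical and only illustrate the concept.

```python
def prefixed_index(prefix: str, index_name: str) -> str:
    """Return the index name as it would be stored in the shared cluster."""
    return f"{prefix}{index_name}"


def owned_by(prefix: str, full_index_name: str) -> bool:
    """Return True if an index in the cluster belongs to the given prefix."""
    return full_index_name.startswith(prefix)


# Two hypothetical instances sharing one cluster, both with a "records" index.
cluster_indexes = [
    prefixed_index("repo-a-", "records"),
    prefixed_index("repo-b-", "records"),
]
print([name for name in cluster_indexes if owned_by("repo-a-", name)])
```

Note that a prefix is only a naming convention: nothing stops one instance from reading another instance's indexes, which is why the instances must either trust each other or sit behind a protection layer.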

Next steps

The Invenio-Index-Migrator is only the first building block in making Elasticsearch upgrades, migration and reindexing simple. We hope to add assistant-like features that will make it even easier to keep up with the latest Elasticsearch releases.

The final version of Invenio-Index-Migrator will be released with Invenio v3.2, which is planned for July-September.

Invenio User Group Workshop 2019, June 10th

Lars Holm Nielsen May 3, 2019 Invenio

We would like to announce the 5th Invenio User Group Workshop, IUGW2019, to be held as part of Open Repositories 2019 (OR2019) on Monday, June 10th from 09:00-17:00 at the University of Hamburg (main building).

Registration (mandatory)

  1. First, register for the Open Repositories 2019 conference.

  2. Next, register for the Invenio User Group Workshop 2019.

Call for proposals

The Invenio User Group Workshop (IUGW) is a biennial workshop where the Invenio repository community, current and future users and developers from around the world, meets. The workshop consists of a series of tour de table service presentations and talks from attendees related to the Invenio digital repository framework.

Submit abstract

Deadline: May 22nd at 23:59 CEST

Acceptance: Notification on May 27th.

The Invenio User Group Workshop 2019 will address a wide range of topics related to the overall theme for Open Repositories 2019:

All the user needs

1. Understanding user needs and user experience

  • User research and engagement
  • User experience design for repository services
  • Better user experience through data and workflow integration
  • Improving repository user interfaces

2. Discovery, use and impact

  • Increasing content visibility in search engines and discovery systems
  • Open access discovery, research data discovery
  • Tools for researchers, interfaces for machines
  • The role of aggregation services
  • Measuring use and impact

3. Repositories – evolution or revolution?

  • Beyond the repository: using repository platforms for purposes not originally intended
  • Convergence with other types of systems (e.g. current research information systems, digital asset management systems, journal publishing platforms, library service platforms)
  • Interoperability vs integration: will repositories survive as stand-alone systems?
  • The developing role of repositories in the scholarly communications and research information systems ecosystem (e.g. the Next Generation Repositories vision)
  • New models for scholarly sharing (e.g. blockchain)
  • Data mining, artificial intelligence and machine learning

4. Supporting open scholarship and cultural heritage

  • Providing access to different types of materials (e.g. research data, scholarly articles, preprints and overlay journals, open access monographs, theses and dissertations, educational resources, archival and cultural heritage materials, audiovisual materials, software, interactive publications and emerging formats)
  • Workflows and support services for the repository users
  • Training, communication and outreach
  • Long-term access and preservation
  • Repositories as digital humanities and open science platforms
  • Working with large and complex data sets

5. Open and sustainable

  • Service and business models that meet user needs
  • Local systems vs repository as a service
  • The expanding role of service providers in the repository landscape, pros and cons?
  • Sustainability of the open source community model
  • Securing long-term funding for open infrastructures
  • Open business models and open governance for open infrastructures

6. Policies, licensing and the law

  • Impact of GDPR (General Data Protection Regulation) and copyright laws
  • Publisher policies, embargoes and rights retention
  • Licenses, use and re-use of content
  • ‘Closed’ material in ‘open’ repositories
  • Compliance and impact of funder policies (e.g. Plan S) on repositories

7. How can metadata and standards help our users?

  • Development and standardisation of repository metadata
  • Data models and entities
  • Linked open data and repositories
  • Persistent identifiers (e.g. DOI, Handle, URN, ORCID, ISNI)
  • Open citations
  • International Image Interoperability Framework (IIIF)

8. Repositories and global knowledge

  • Integration with other open knowledge resources (e.g. Wikimedia and Wikidata)
  • National vs global solutions
  • Repository systems and language barriers
  • Repositories in the global south
  • User needs in developing countries

InvenioRDM: a turn-key open source research data management platform

Lars Holm Nielsen Apr 29, 2019 Invenio

CERN has partnered with 10 multidisciplinary institutions and companies to build a turn-key open source research data management platform called InvenioRDM, and grow a diverse community to sustain the platform.

The InvenioRDM project is funded by the CERN Knowledge Transfer Fund, as well as all the participating partners, including:

The project has an ambitious one year schedule in which it will deliver:

  • InvenioRDM - A research data management platform based on Zenodo and Invenio v3 Framework.
  • A community of public and private institutions to sustain InvenioRDM.
  • A minimum of two existing repositories migrated to InvenioRDM, with Zenodo being one of them.

The key to successfully achieving the ambitious schedule is that InvenioRDM will be based on Zenodo, which has already been successfully validated over the past 5 years.

Our vision for the next five years is to make InvenioRDM a world-leading extensible research data management platform, used by research institutions all around the world and with businesses providing services, support and customizations on top of InvenioRDM.

What is an RDM platform?

A research data management (RDM) platform allows researchers to share and preserve scientific results. Researchers can share anything from publications, posters and presentations to datasets and software. Once a researcher has shared a result, they get a DOI (Digital Object Identifier) that allows them to properly cite their result.

The three primary purposes of RDM are to:

  • Disseminate and archive
  • Enable reproducibility
  • Enable reuse

Most importantly, research funding agencies all over the world have realised the huge potential economic and social benefits of RDM to society and are now demanding solutions.

Zenodo

CERN, in partnership with OpenAIRE, has built one such RDM service with European Union funding, called Zenodo. Zenodo has been highly successful and has in its 5 years of existence become one of the world-leading general-purpose research data repositories.

In fact, other institutions have already taken the Zenodo source code (which is open source) and started building their own local RDM solutions on it. The goal of this project is to join our efforts to build a common RDM platform from which CERN, other institutions and private businesses can all profit.

Multidisciplinary partners

The real strength of the project is that it brings together a suite of partners from multidisciplinary domains, each of which brings unique knowledge and know-how from its specific domain that will be critical to the success of the project:

  • Institutional partners:
    • University of Hamburg
    • University of Münster
    • Caltech Library
  • Health and medical science partners:
    • Northwestern University
  • Physics partners:
    • Helmholtz-Zentrum Dresden-Rossendorf
    • Brookhaven National Laboratory
    • TUBITAK - The Scientific and Technological Research Council of Turkey
  • Digital humanities partners:
    • Data Futures
  • Business partners:
    • TIND Technologies
  • Community partners:
    • OpenAIRE

Building InvenioRDM

InvenioRDM will include most of the features that Zenodo already includes today, such as DOI minting capabilities, versioning support and COUNTER-compliant usage statistics, to name a few.

The work in transforming Zenodo into a general purpose RDM-platform will involve three key areas:

Core repository

The repository platform will at its core include an extensible metadata model based on the DataCite metadata schema, with support for handling millions of records and petabytes of data. The repository will further be aligned with the Next Generation Repositories (NGR) standard.

Packaging and distribution

Key to easy adoption of InvenioRDM is ensuring that it is a real turn-key solution requiring minimal experience in installing, operating and administering the platform, or in short, getting users started in no time. Thus, a significant part of the effort will go into simplifying the installation, improving the packaging and distribution, as well as providing excellent end-user documentation.

Customization and extendability

A key requirement for InvenioRDM is that it can easily be extended and customized just enough to adapt to each particular institution. This includes, for instance, defining authentication mechanisms (SAML/LDAP/OAuth), integrating with multiple storage backend systems and, most importantly, making these customizations easy.

Contact

For more information about the InvenioRDM project, please contact:

Lars Holm Nielsen

Invenio Product Manager

CERN IT Department

Email: info@inveniosoftware.org

Invenio Training Bootcamp 2019

Nicola Tarocco Mar 25, 2019 Invenio

The Invenio v3 Bootcamp was held from the 19th to the 21st of March, 2019 at CERN. The event reached the maximum number of 30 participants, with attendees from all over the world (Czech Republic, Denmark, Germany, Finland, France, Japan, Norway, Spain, Switzerland, UK and USA). The bootcamp targeted developers wanting to learn more about Invenio and to understand how to create or customize an Invenio repository.

Format

The three days of the bootcamp were organised as a succession of talks and practical hands-on sessions, presented by 6 Invenio experts. Participants were able to develop each piece of functionality on their own laptops, with constant assistance available when needed.

The objective was to discover the Invenio framework and acquire knowledge on how to build a new repository by progressively introducing new concepts.

The main topics included:

  • Getting started with Invenio
  • Tour of functionalities and infrastructure
  • Customizations of data models and look & feel
  • Deposit of new records and references between data models
  • Access control
  • Security
  • Deployment and application architecture

Try it yourself

We've made all the material publicly available for all those who couldn't participate in the event, so you can try out the same exercises at home:

If you run into trouble, we're happy to answer any questions you may have.

Takeaways

Participants had the opportunity to experience how to work with Invenio thanks to the hands-on sessions and to understand how to apply each of the introduced functionalities to use cases of their organization.

The bootcamp attracted quite a lot of interest in Invenio, and participants were very involved and curious: we were very glad to answer all of the interesting questions that were asked.

We would like to thank everyone for their participation and for contributing to the success of the event.

Invenio User Group Workshop 2019 @ Open Repositories 2019, June 10th

Our next event is the Invenio User Group Workshop (IUGW) at the Open Repositories 2019 conference in Hamburg. The workshop will be held on June 10th with presentations from Invenio users from around the world.

The call for proposals will be announced in early April, so stay tuned.

See you in Hamburg!

Invenio v3.1.0 released

Lars Holm Nielsen Mar 11, 2019 Invenio

We are proud to announce the release of Invenio v3.1.0.

Head over to our Getting started guide to see it in action.

Python compatibility

Invenio v3.1 supports Python 2.7 (until 2019-12-31), Python 3.5 and Python 3.6. We expect to add support for Python 3.7 in the near future, once Celery v4.3 has been released.

What's new in Invenio v3.1?

Webpack build system

Invenio v3.1 comes with a new assets build system based on Webpack for building and packaging your JavaScript applications, stylesheets and much more. The system replaces the previous AMD/RequireJS based system which was deprecated in v3.0.

The old build system is still available to allow users to upgrade to Invenio v3.1 without first migrating to Webpack. The old build system will be removed in Invenio v3.3.

For more information about the new build system, please see:

Simplified scaffolding

We have simplified the scaffolding of new Invenio instances by merging the data model template into the main Cookiecutter-Invenio-Instance.

The previous approach of two separate packages -- one for the application and one for the data model -- caused friction and confusion for new users and we, therefore, decided to merge both.

This also fits with our long-term goal of providing standard data models (such as DataCite, Dublin Core, MARC21) so that users don't have to write their own data model.

Docker base image

We have released a new Docker image that can serve as a base image for your Invenio instances. The image is based on CentOS 7 and comes with Python 3.6, Node.JS, NPM and some standard libraries often needed by Invenio.

See inveniosoftware/centos7-python on DockerHub.

Pipenv

In order to manage Python dependencies more reliably and securely for your Invenio instance, we have moved to Pipenv, which also handles the virtualenv creation. This has all been integrated into the Getting started guide.

Documentation

New sections were added to the documentation, specifically on:

  • Bundles
  • Requirements
  • Build a repository
  • Managing access
  • Secure your instance
  • Infrastructure architecture

See https://invenio.readthedocs.io.

Request tracing

Invenio v3.1 has added new features for improved request tracing to allow for better troubleshooting and auditing of problems. The feature allows logging a request id, session id and user id across multiple services such as Nginx and Invenio error logs. This enables e.g. system administrators to identify exactly which Nginx access log line caused a specific error logged by Invenio.

If combined with e.g. centralised log aggregation, this can be used for e.g. viewing requests by a user in real-time, request performance statistics and many other metrics. Please note that in order to be compliant with EU General Data Protection Regulation (GDPR), you must ensure that these logs are automatically deleted after 3 months (the same is the case if you only log an IP address).

  • Cookiecutter-Invenio-Instance:
    • Nginx configuration has been updated to automatically generate a random request id and add it as the X-Request-ID header.
    • Nginx log format has been updated to log timing information, request id, session id and user id if provided by the application server in the X-Session-ID and X-User-ID HTTP headers. Nginx will remove both headers prior to sending the response to the client.
  • Invenio-App:
    • Extracts the X-Request-ID header (max 200 chars) if set in the HTTP request and makes it available on the Flask g object as g.request_id.
  • Invenio-Logging:
    • The request id is made available to all log handlers.
    • The Sentry log handler will add the request ID as a tag if available.
  • Invenio-Accounts:
    • The X-Session-ID and X-User-ID HTTP headers will be added to the HTTP response if the configuration variable ACCOUNTS_USERINFO_HEADERS is set to True. This makes the session and user id available to upstream servers like Nginx.
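
The Invenio-App step listed above can be pictured with a small, framework-free sketch. This is not the Invenio-App implementation: the function name extract_request_id is hypothetical, it reads a plain WSGI environ dictionary instead of the Flask request, and truncating an over-long value to 200 characters (rather than rejecting it) is an assumption about how the limit is applied.

```python
MAX_REQUEST_ID_LENGTH = 200  # limit quoted above for the X-Request-ID header


def extract_request_id(environ):
    """Return the request id found in a WSGI environ, or None if absent.

    WSGI servers expose the X-Request-ID header as HTTP_X_REQUEST_ID.
    Hypothetical helper for illustration only.
    """
    value = environ.get("HTTP_X_REQUEST_ID")
    if not value:
        return None
    # Assumption: over-long ids are truncated, not rejected.
    return value[:MAX_REQUEST_ID_LENGTH]
```

In a Flask application, a value obtained this way would typically be stored on the g object at the start of each request, which is where the text above says Invenio exposes it as g.request_id.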

Minor changes in v3.1

Token expiration

The expiration time of the password reset token and the email confirmation token was changed from 5 days to 30 minutes. Using one of these tokens will, as a side effect, log the user in, which means that if the link is leaked (e.g. forwarded by the users themselves), another person can use it to access the account. Flask-Security v3.1.0 addresses this issue, but has not yet been released.

Globus.org OAuth Login

Invenio v3.1 now comes with support for login with your Globus.org account. The feature was contributed by University of Chicago.

See Invenio-OAuthClient for details.

Health-check view

A /ping view has been added so that load balancers such as HAProxy can check whether the application server is responsive. It can be enabled via the APP_HEALTH_BLUEPRINT_ENABLED configuration variable.
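
As a configuration sketch, enabling the view amounts to setting the variable named above in your instance configuration. Where that configuration lives (for example an invenio.cfg file) is an assumption and depends on how your instance was scaffolded.

```python
# Instance configuration (file location is an assumption, e.g. invenio.cfg).
# Enables the /ping health-check view described above.
APP_HEALTH_BLUEPRINT_ENABLED = True
```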

Backwards incompatible changes

  • Pytest-Invenio: The celery_config fixture has been renamed to celery_config_ext due to a naming conflict with the fixture provided by Celery.

Deprecations in v3.1

The following features have been deprecated and will be removed in either Invenio v3.2 or Invenio v3.3:

Elasticsearch v2 support

Elasticsearch v2 support will be removed in Invenio v3.2. Elasticsearch v2 has reached end of life and no longer receives any bug or security fixes.

Both the support in Invenio-Search for creating indexes for v2 as well as any v2 Elasticsearch mappings in other Invenio modules will be removed.

AMD/RequireJS

Invenio's assets build system based on AMD/RequireJS will be removed in Invenio v3.3.

This involves e.g. the two CLI commands:

$ invenio npm
$ invenio assets build

Several Python modules in Invenio-Assets will be removed, including (but not limited to):

  • invenio_assets.npm
  • invenio_assets.filters
  • invenio_assets.glob
  • invenio_assets.proxies

Also, bundle definitions in other Invenio modules will be removed. These are usually located in bundles.py files, e.g.:

  • invenio_theme.bundles

Also, some static files will be removed from bundles, e.g.:

  • invenio_theme/static/js/*
  • invenio_theme/static/scss/*

DynamicPermission class

The invenio_access.DynamicPermission class will be removed in Invenio v3.2. It has been superseded by the invenio_access.Permission class. The Permission class by default denies an action if no user/role is assigned to it, whereas DynamicPermission allowed an action if no user/role was assigned.
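
The difference in default behaviour can be illustrated with a minimal sketch. The two functions below are hypothetical stand-ins and not the Invenio-Access API (the real classes work with needs, roles and actions); they only model what happens when nobody has been assigned to an action.

```python
def permission_allows(user, allowed_users):
    """Permission-style check: deny when nobody is assigned to the action."""
    return user in allowed_users


def dynamic_permission_allows(user, allowed_users):
    """DynamicPermission-style check: allow when nobody is assigned."""
    if not allowed_users:
        return True
    return user in allowed_users
```

With an empty set of assigned users, the first function denies every user while the second allows every user. Code that relied on the old behaviour therefore needs an explicit assignment after migrating.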

Records CLI

The following CLI commands will be removed in Invenio v3.2:

$ invenio records create
$ invenio records delete
$ invenio records patch

Please use the REST API instead to create, patch and delete records.
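
A rough sketch of what the equivalent REST calls might look like, using only the Python standard library. The base URL, the /api/records/ path, the content types and the record id are assumptions for illustration; check the endpoints and authentication configured for your own instance. The requests are only built here, not sent.

```python
import json
from urllib.request import Request

# Assumed endpoint; adjust to the records endpoint of your instance.
BASE_URL = "https://localhost:5000/api/records/"


def create_request(metadata):
    """Build a POST request that would create a record from a JSON body."""
    return Request(
        BASE_URL,
        data=json.dumps(metadata).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def patch_request(record_id, operations):
    """Build a PATCH request carrying a list of JSON Patch operations."""
    return Request(
        f"{BASE_URL}{record_id}",
        data=json.dumps(operations).encode("utf-8"),
        # Assumption: the instance accepts JSON Patch for partial updates.
        headers={"Content-Type": "application/json-patch+json"},
        method="PATCH",
    )


def delete_request(record_id):
    """Build a DELETE request for a single record."""
    return Request(f"{BASE_URL}{record_id}", method="DELETE")
```

Sending any of these requires passing the object to urllib.request.urlopen and, on most instances, adding credentials such as an access token header.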

AngularJS (reminder from v3.0)

In Invenio v3.0 we deprecated the AngularJS 1.4 application Invenio-Search-JS as AngularJS by that time was already outdated. We have selected React and SemanticUI as the replacement framework for AngularJS.

The new Webpack build system released in Invenio v3.1 is part of the strategy to move from AngularJS to React (note however that you can use Webpack with your favorite framework, including AngularJS).

We have started the rewrite of Invenio-Search-JS and have already released the first version of React-SearchKit, which will eventually replace Invenio-Search-JS.

Features removed in v3.1

The following already deprecated features have been removed in Invenio v3.1:

  • invenio_records.tasks was removed from the Invenio-Records module.

Maintenance policy

Invenio v3.1 will be supported with bug and security fixes until the release of Invenio v3.3, and at a minimum until 2020-03-31.

What's next?

We originally planned to release the Files and Statistics bundle in Invenio v3.1. We however decided it was more urgent to release the new Webpack build system in order to avoid too much code being written against the old build system.

In Invenio v3.2 we are planning to release the Files bundle including:

  • invenio-files-rest
    • Object storage REST API for Invenio with many supported backend storage protocols and file integrity checking.
  • invenio-iiif
    • International Image Interoperability Framework (IIIF) server for making thumbnails and zooming images.
  • invenio-previewer
    • Previewer for Markdown, JSON/XML, CSV, PDF, JPEG, PNG, TIFF, GIF and ZIP files.
  • invenio-records-files
    • Integration layer between object storage and records.

Invenio v3 Training Bootcamp

Lars Holm Nielsen Dec 21, 2018 Invenio

We are pleased to announce the first Invenio v3 Bootcamp taking place at CERN, 19-21 March 2019. The Bootcamp is intended as an introduction to developing digital repositories with the Invenio v3 framework.

Website

https://indico.cern.ch/e/invenio-bootcamp/

Planned topics

  • Creating your first Invenio instance.
  • Customizing the look and feel.
  • Working with data models.
  • Managing access to records.
  • Managing files.
  • Creating a new module from scratch.
  • Depositing records.
  • Securing your Invenio instance.
  • Deploying Invenio.

The topics will be covered through practical tutorials and presentations. Note that the final programme is subject to change, based on the input we receive from you in the registration form.

Who can register

  • All registrations are subject to our approval (i.e. don't book flights until we confirm your registration).
  • Intended audience:
    • Software developers with some prior Python experience.
  • Limited capacity:
    • We have limited capacity, so we will prioritise having as many different institutions/companies represented as possible, as well as people with concrete projects for which they plan to use Invenio v3.

How to register

https://indico.cern.ch/event/773969/registrations/46902/

Cost

The Bootcamp itself is free of charge, but you will need to cover your own expenses during the Bootcamp, including (but not limited to) lunches, dinners, coffee, accommodation, transport and social events.

Location

CERN, Geneva, Switzerland.

Invenio v3.0.0 Released

Lars Holm Nielsen Jun 7, 2018 Invenio

Welcome to Invenio 3!

We are proud to announce the release of Invenio v3.0.0. Invenio has been completely rewritten from scratch with a radically improved architecture and technical implementation. Invenio 3 is now a framework, like a Swiss Army knife, complete with battle-tested, safe and secure modules providing all the features you need to build and run a trusted digital repository.

Whilst Invenio 3 is officially released to the world today, in reality it has already been relied upon in large-scale production systems for more than 1.5 years on sites such as:

Other sites are also already in the process of being built on Invenio 3:

  • INSPIRE HEP - an aggregator for High-Energy Physics.
  • WEKO3 - repository platform for 500+ Japanese universities.

What's new

Invenio functionality is being released in bundles of modules. Invenio v3 contains the following bundles totaling more than 27 individual Invenio modules:

  • Base: the core application framework with e.g. distributed task queue support.
  • Auth: accounts management, role-based access control, OAuth 2.0 client and provider, user profiles management.
  • Metadata: record and persistent identifier management including indexing, querying and OAI-PMH server.

The following bundles are being prepared for release in v3.1:

See our roadmap for further details.

Getting started

In order to get started developing with Invenio v3 follow our getting started guide.

Next, head over to https://invenio.readthedocs.io to understand how to develop with Invenio.

In addition, each Invenio module also has extensive documentation:

Base bundle

Auth bundle

Metadata bundle

Compatibilities

Python compatibility

Invenio v3.0 supports Python 2.7, 3.5, 3.6. We highly recommend only using the latest official release in each series.

Python 2.7 end-of-life is scheduled for April 2020. Invenio will only support Python 2.7 until that date. We highly recommend that all new projects are started on the latest available Python 3 version.

Elasticsearch compatibility

Invenio v3.0 supports Elasticsearch 2, 5 and 6.

Elasticsearch v2 has reached end-of-life (February 2018) and Invenio v3.0 is the last release to support Elasticsearch v2.

PostgreSQL compatibility

Invenio v3.0 supports PostgreSQL 9.4, 9.5 and 9.6. We have not yet tested Invenio v3.0 with PostgreSQL 10.

MySQL compatibility

Invenio v3.0 supports MySQL 5.6+.

Deprecations

AMD/RequireJS

Invenio v3.0's current static assets management system, which is based on e.g. RequireJS, will be replaced with Webpack. We expect this work to be ready for Invenio v3.1, and thus we are already deprecating the current support. Specifically, this means that Invenio-Assets and Invenio-Theme will change significantly in Invenio v3.1. We would have liked to have this ready for the v3.0 release, but unfortunately time did not allow it.

AngularJS

Invenio v3.0 comes with one AngularJS 1.4 application (Invenio-Search-JS). AngularJS is by now already outdated, and we are planning a rewrite of the application in another JavaScript framework that is currently in the process of being selected. Essentially, this means that you should not extend Invenio-Search-JS at this point, since it will change significantly.

Maintenance policy

Invenio v3.0 will be supported with bug and security fixes until the release of Invenio v3.2, and for a minimum of one year.

We aim for one Invenio release with new features every 6 months. We expect upgrades between minor versions (e.g. v3.1 to v3.2) to be fairly straightforward, as in most cases only new features are added.