Invenio Blog

Follow news and updates from the Invenio world

Sprint: Elasticsearch v5 support

Lars Holm Nielsen Dec 1, 2017 Invenio Framework

This sprint was focused on:

  • Elasticsearch v5 support
  • Preparing to release the metadata bundle.
  • Removing remaining Invenio-DB warnings.

During the sprint, 60 developer days were spent, 52 commits were created and 4.3k lines were touched (3.1k additions and 1.2k deletions).

List of changes:

  • Cookiecutter-Invenio-Module:
    • Various minor fixes for build issues.
  • IDUtils (v1.0.0):
    • global: fix DOI Unicode issues
  • Invenio-Base (v1.0.0b1)
    • Cookiecutter template removal (to be replaced by upcoming cookiecutter-invenio-instance).
    • Release checklist, docs build errors, 100% test coverage.
  • Invenio-DB (v1.0.0b9)
    • Remove annoying warning.
  • Invenio-I18N (v1.0.0b4)
    • Example app rendering on ReadTheDocs
  • Invenio-Indexer (v1.0.0b1):
    • New BulkRecordIndexer class with RecordIndexer-compatible API (to be used by Invenio-Records-REST).
    • Elasticsearch v5 support.
    • Release checklist.
  • Invenio-JSONSchemas (v1.0.0a7)
    • Documentation improvements.
    • Release checklist.
  • Invenio-Mail (v1.0.0b1)
    • Fixed broken docs build (related to a Celery problem).
  • Invenio-OAIServer (v1.0.0b1)
    • Description support in the Identify verb (eprints, friends, etc.).
    • Elasticsearch v5 support.
    • Release checklist + documentation improvements.
    • License change PR
  • Invenio-OAuth2Server (v1.0.0b3)
    • Unpinned oauthlib.
  • Invenio-PIDStore (v1.0.0b2)
    • Release checklist.
  • Invenio-Records-REST (v1.0.0b5)
    • Index records after creation, update and deletion (this will later impact Invenio-Deposit).
    • Serializers refactored to be more easily composable.
    • Improved tombstone handling for the REST API (e.g. include removal reason).
    • Elasticsearch v5 support.
    • Improved documentation.
    • Dynamic aggregations.
    • Release checklist.
  • Invenio-Records-UI (v1.0.0b2)
    • Release checklist + example app docs fix for RTD.
  • Invenio-Search (v1.0.0b4)
    • Elasticsearch v5 support (via version-specific mappings).
    • Support for creating only specific indexes.
    • CLI for listing all indexes and aliases.
  • Invenio-Search-UI (v1.0.0a9)
    • Release checklist.
  • Invenio-MARC21 (v1.0.0a6)
    • Elasticsearch v5 support.

Elasticsearch 5 support

Search, Indexer, Records-REST, OAIServer and MARC21 have all been upgraded to support both Elasticsearch v2 and v5. Elasticsearch v6 is not yet supported because the elasticsearch-dsl package does not yet support v6 (support has already been merged into its master branch but has not yet been released).

Other Invenio packages have not yet been upgraded to Elasticsearch v5. For example, Records-Files, OpenAIRE, Stats, Collections, OpenDefinition and Query-Parser have not yet been tested with Elasticsearch v5.

Choosing version

You need to decide at install time which version of Elasticsearch you would like to use. For instance, to use Elasticsearch v5, install Invenio-Search like this:

$ pip install invenio-search[elasticsearch5]
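A site-specific module can pass the same install-time choice on to its own users through setuptools extras, as the setup.py step in the guide further down suggests. The sketch below is a hypothetical excerpt of such a module's setup.py: the minimum version pin, the tests extra and the aggregated all extra are illustrative assumptions, and it assumes Invenio-Search exposes an elasticsearch2 extra alongside elasticsearch5.

```python
# Hypothetical excerpt of a site-specific module's setup.py.
# Invenio-Search is moved out of install_requires and into one extra per
# Elasticsearch major version, so the version is chosen at install time.
invenio_search_version = '1.0.0b4'  # illustrative minimum version

install_requires = [
    # ... the module's other dependencies, without invenio-search ...
]

extras_require = {
    'elasticsearch2': [
        'invenio-search[elasticsearch2]>={}'.format(invenio_search_version),
    ],
    'elasticsearch5': [
        'invenio-search[elasticsearch5]>={}'.format(invenio_search_version),
    ],
    'tests': [
        'pytest>=3.0',  # illustrative test dependency
    ],
}

# Build an 'all' extra from everything except the mutually exclusive
# Elasticsearch extras, so installing '[all]' stays unambiguous.
extras_require['all'] = []
for name, reqs in list(extras_require.items()):
    if name in ('elasticsearch2', 'elasticsearch5', 'all'):
        continue
    extras_require['all'].extend(reqs)

# Both would then be passed to setuptools.setup(
#     install_requires=install_requires, extras_require=extras_require, ...)
```

A user would then install the module with, for example, pip install my-module[elasticsearch5], where my-module is a placeholder name.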

Mappings

Due to the differences between Elasticsearch versions, we have opted for version-specific mappings. This means that Invenio modules must provide one mapping per Elasticsearch version they wish to support. For example, today a mapping is placed in a path like:

mappings/records/record-v1.0.0.json

To support Elasticsearch v2 and v5 you now need two mappings:

mappings/v2/records/record-v1.0.0.json
mappings/v5/records/record-v1.0.0.json

Note that mappings for Elasticsearch v2 may use either the mappings/v2 directory or, as before, the mappings/ directory (for backward compatibility).
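The lookup rule described above (version-specific directory first, legacy directory accepted only for v2) can be sketched as follows. The helper resolve_mapping_path is hypothetical and is not the Invenio-Search implementation; the major version is passed in explicitly, whereas a real module would most likely derive it from the installed client (see the version check further down).

```python
import os


def resolve_mapping_path(mappings_dir, index_path, es_major):
    """Return the mapping file for an index and Elasticsearch major version.

    Hypothetical helper: looks in mappings/v<major>/<index_path> first and,
    for v2 only, falls back to the legacy mappings/<index_path> location.
    Returns None when no mapping exists for that version.
    """
    candidates = [
        os.path.join(mappings_dir, 'v{}'.format(es_major), index_path),
    ]
    if es_major == 2:
        # Legacy, unversioned layout is still accepted for v2.
        candidates.append(os.path.join(mappings_dir, index_path))
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None
```

With only the legacy mappings/records/record-v1.0.0.json present, v2 resolves to that file while v5 gets None, which is the signal that a v5 mapping still has to be written.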

Adding Elasticsearch v5 support to a module

If you have site-specific modules and would like to add Elasticsearch v5 support, here is a rough guide:

  • Update .travis.yml to test both v2 and v5 (example).
  • Update setup.py by moving the Invenio-Search dependency to extras_require (example).
  • Move existing mappings and add new mappings for v5 (example).
    • The most common change from v2 to v5 is replacing the string type with either the text or the keyword type.
  • Update docs/requirements.txt by adding elasticsearch5 as an extra requirement to ensure ReadTheDocs builds succeed.
  • Fix any version-specific API calls (see below).
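To illustrate the string-type change from the mapping step, here is a hypothetical helper that rewrites a v2 mapping (as a Python dict) into v5 form. It assumes the usual equivalences from the Elasticsearch 5.0 breaking changes: a not_analyzed string becomes keyword and an analyzed string becomes text. Anything else, such as "index": "no", is deliberately left untouched for manual review, so treat the output as a starting point rather than a finished mapping.

```python
import copy


def upgrade_string_fields(mapping):
    """Return a copy of a v2 mapping with common 'string' fields converted.

    Hypothetical helper covering only the two most common cases:
      {"type": "string", "index": "not_analyzed"} -> {"type": "keyword"}
      {"type": "string"} (analyzed)               -> {"type": "text"}
    The input mapping is not modified.
    """
    result = copy.deepcopy(mapping)
    _convert(result)
    return result


def _convert(node):
    """Recursively convert string fields in properties and multi-fields."""
    if isinstance(node, dict):
        if node.get('type') == 'string':
            index = node.get('index')
            if index == 'not_analyzed':
                node['type'] = 'keyword'
                del node['index']  # keyword fields are not analyzed anyway
            elif index in (None, 'analyzed'):
                node['type'] = 'text'
                node.pop('index', None)
            # Other values (e.g. "no") are left for manual review.
        for value in node.values():
            _convert(value)
    elif isinstance(node, list):
        for item in node:
            _convert(item)
```

Multi-fields (the fields key) are visited by the same recursion, so a common v2 pattern of an analyzed string with a not_analyzed raw sub-field becomes a text field with a keyword sub-field.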

Canonical way of checking the Elasticsearch version (elasticsearch.VERSION is the version tuple of the installed client library):

from elasticsearch import VERSION as ES_VERSION

if ES_VERSION[0] == 2:
    pass  # Elasticsearch v2-specific code goes here.
else:
    pass  # Elasticsearch v5-specific code goes here.

Completion Suggesters

The Completion Suggesters have changed from v2 to v5. In v2, suggesters supported an index-time payloads option, which was used to store and return metadata with suggestions. In v5, completions are now returned with their associated document in the _source field.

If you have completion suggesters for v2, you will need to make them compatible with v5. This involves:

  • API clients should read metadata from _source instead of payload. For v2, Records-REST copies the payload to _source, so you can already upgrade API clients to use the _source field now.
  • At indexing time, you need to add the payload only for v2.
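The two bullets above can be sketched as a pair of hypothetical helpers: one builds the completion field at index time and adds the payload only for v2, the other reads suggestion metadata from _source and falls back to payload. The major version is passed in explicitly to keep the sketch testable; real code would use ES_VERSION[0] from the version check above. The field layout (a plain input list) is an assumption, so adapt it to your own mapping.

```python
def build_suggest_field(inputs, metadata, es_major):
    """Build the value of a completion field for indexing (hypothetical).

    The payload option only exists in Elasticsearch v2; v5 removed
    index-time payloads and returns the document's _source instead.
    """
    field = {'input': list(inputs)}
    if es_major == 2:
        field['payload'] = metadata
    return field


def suggestion_metadata(option):
    """Read metadata from one suggestion option (hypothetical helper).

    v5 options carry the matching document in '_source'; for v2 responses
    without '_source' we fall back to the legacy 'payload'.
    """
    if '_source' in option:
        return option['_source']
    return option.get('payload', {})
```

Writing clients against suggestion_metadata first means the switch from v2 to v5 needs no client change.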

Elasticsearch v2 end-of-life

Elasticsearch v2 reaches end of life in February 2018. Elasticsearch v2 support will be removed in Invenio v3.1. Invenio v3.0 will therefore be released with Elasticsearch v2 support and maintained until the v3.0 end of life (date currently TBD).

Sprint: JSONB, bugs & bouncing search

Lars Holm Nielsen Oct 20, 2017 Invenio Framework

This sprint was focused on:

  • Metadata bundle: Finalize Invenio-Records, Invenio-Search, Invenio-JSONSchemas.
  • Fixing new bugs in Base/Auth bundles.

As a result of 40 developer days, 37 commits and 2.8k lines touched (2k additions and 0.8k deletions), the following improvements were implemented:

  • Fixed bouncing search results (the ordering of results for the same query could change depending on which Elasticsearch node answered the query).
  • JSONB is now used for record storage (thanks to Javier for the PR).
  • Rendering of JSONSchemas: allOf and $refs can now be resolved on the fly to generate a self-contained schema, e.g. for deposit forms (thanks to Pamfilos for the PR).
  • Boring fixes that make sure the Base and Auth bundles are stable.
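Bouncing results happen because different copies of a shard can order the same documents differently, for example when documents have equal sort values or when deleted documents still affect term statistics on one replica but not another. A common remedy, described in the Elasticsearch documentation, is to send a stable custom preference value per user so that their queries keep hitting the same shard copies. The sketch below derives such a value; the helper name and the choice of inputs (user id, else IP address plus user agent) are assumptions, not necessarily what Invenio-Search does.

```python
import hashlib


def search_preference(user_id=None, remote_addr='', user_agent=''):
    """Derive a stable Elasticsearch 'preference' string (hypothetical).

    Requests sent with the same preference string are routed to the same
    shard copies, so repeated queries return results in a consistent order.
    """
    if user_id is not None:
        key = 'user:{}'.format(user_id)
    else:
        # Anonymous users: approximate a session from IP and user agent.
        key = 'anon:{}:{}'.format(remote_addr, user_agent)
    # Hashing avoids leaking IPs into logs; a hex digest never starts with
    # '_', which Elasticsearch reserves for built-in preference values.
    return hashlib.sha256(key.encode('utf-8')).hexdigest()
```

The resulting string would be sent as the preference parameter of the search request, for example via elasticsearch-dsl's Search.params(preference=...).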

List of changes:

  • Invenio-Records (v1.0.0b4):
    • Beta release (release checklist).
    • Changed data storage from JSON to JSONB (requires data migration and PostgreSQL v9.4+; thanks to Javier for the PR).
    • Changed signals receiver signatures (backward-incompatible!).
    • Fixed invalid MARC in demo records causing export errors.
    • Fixed CLI for deletion of records.
    • Fixed fractional seconds problem in MySQL causing tests to fail.
  • Invenio-Search (v1.0.0b1):
    • Beta release (release checklist, documentation)
    • Fix bouncing search results.
    • Bumped Travis PostgreSQL to v9.4 to support JSONB.
  • Invenio-JSONSchemas (v1.0.0a7):
    • Fixed double registration of schema endpoint on both UI and API app.
    • Added support for resolving allOf and JSONRefs in JSONSchemas (thanks to Pamfilos for the PR).
    • Fixed some documentation issues.
  • Invenio-Records-REST (v1.0.0b3):
    • Fix bouncing search results.
    • Bumped Travis PostgreSQL to v9.4 to support JSONB.
  • Invenio-OAuth2Server (v1.0.0b2):
    • Improved authorization template design (text alignment, cover page usage and display of the number of users).
    • Fixed tests broken by changes in the cryptography package.
  • Invenio-DB (v1.0.0b8):
    • Alembic documentation refactored and integrated in docs.
  • Invenio-Records-UI:
    • Bumped Travis PostgreSQL to v9.4 to support JSONB.
  • Invenio-OAIServer (v1.0.0a14):
    • Compatibility with the new Invenio-Records signal signatures.
  • Invenio-App-ILS:
    • Added a test ensuring that all record export formats work.
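The allOf/$ref resolution mentioned for Invenio-JSONSchemas can be illustrated with a naive, standard-library-only sketch. It is not the Invenio-JSONSchemas implementation: it handles only local references such as #/definitions/..., merges allOf sub-schemas by taking the union of properties and required (other keys keep their first value), and would loop forever on recursive references.

```python
def _resolve_pointer(root, ref):
    """Follow a local JSON Pointer reference such as '#/definitions/x'."""
    if not ref.startswith('#/'):
        raise ValueError('only local references are supported: ' + ref)
    node = root
    for part in ref[2:].split('/'):
        # JSON Pointer unescaping: '~1' -> '/', then '~0' -> '~'.
        node = node[part.replace('~1', '/').replace('~0', '~')]
    return node


def _merge(target, source):
    """Naive allOf merge: union properties and required, keep other keys."""
    for key, value in source.items():
        if key == 'properties':
            target.setdefault('properties', {}).update(value)
        elif key == 'required':
            merged = set(target.get('required', [])) | set(value)
            target['required'] = sorted(merged)
        else:
            target.setdefault(key, value)


def _resolve(node, root):
    """Return a new structure with $ref inlined and allOf merged."""
    if isinstance(node, list):
        return [_resolve(item, root) for item in node]
    if not isinstance(node, dict):
        return node
    if '$ref' in node:
        # Draft-04 semantics: keys next to $ref are ignored.
        return _resolve(_resolve_pointer(root, node['$ref']), root)
    result = {key: _resolve(value, root)
              for key, value in node.items() if key != 'allOf'}
    for sub in node.get('allOf', []):
        _merge(result, _resolve(sub, root))
    return result


def resolve_schema(schema):
    """Return a self-contained copy of a schema (the input is not changed)."""
    resolved = _resolve(schema, schema)
    resolved.pop('definitions', None)  # no longer needed once inlined
    return resolved
```

The resulting schema can be handed to a form generator as-is, which is the deposit-form use case mentioned above.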

Sprint: Security and annoying warnings

Lars Holm Nielsen Aug 11, 2017 Invenio Framework

This sprint was focused on:

  • Data model issues (in Invenio-Access and Invenio-OAIServer).
  • Security issues (permanent sessions, "remember me", content security policy).
  • Working demo site (remove annoying warnings, fixed docs, SSL problems, bugs, admin interfaces, ...).

As a result of 89 developer days, 126 commits and 6.6k lines touched (4.7k additions and 1.8k deletions), the auth bundle (Accounts, Access, Profiles, OAuthClient and OAuth2Server) has been stabilized and released as a beta version.

List of changes:

  • DoJSON (v1.3.2):
    • Removed the "'Undo' is experimental" warning.
  • Flask-Menu (v0.6.0):
    • Fixed Python 3 warnings.
  • Invenio
    • Fixed login problem
  • Invenio-Access (v1.0.0b1):
    • Reviewed module and fixed data model issues.
    • Deprecated DynamicPermission in favor of Permission (aligning with Flask-Principal's deny by default behavior)
    • Added system roles with support for any user and authenticated user (could be extended to support IP-based access control). ActionUsers was previously used for a similar feature by setting user_id to None, but this is no longer possible.
    • Updated administration interface.
    • Added usage documentation (https://invenio-access.readthedocs.org).
    • Fixed superuser issues.
  • Invenio-Accounts (v1.0.0b8):
    • Fixed Content Security Policy issues
    • Removed "remember me" login support in favor of permanent sessions ("remember me" support could be used to circumvent a revoked session).
    • Removed support for login via headers (enabled by Flask-Security by default).
    • Fixed Content Security Policy problems in templates.
    • Upgraded to Flask-Security v3 (thanks to @jacquire).
  • Invenio-Admin (v1.0.0b3):
    • Disabled Content Security Policy on admin interface.
  • Invenio-App (v1.0.0b1):
    • Added Jinja bytecode caching support.
    • IPython is now the default shell.
  • Invenio-App-ILS (v1.0.0a3):
    • Added initial Selenium integration tests.
    • Fixed email sending when in debug mode.
    • Bumped all packages to latest versions.
    • Fixed Celery 4 configuration warnings.
    • Clarified force HTTPS behaviour and adapted the user guide.
  • Invenio-Celery (v1.0.0b3):
    • Fixed Celery 4 configuration warnings.
  • Invenio-DB (v1.0.0b8):
    • Disabled SQL statement printing when in debug mode (has to be enabled manually now).
  • Invenio-I18N (v1.0.0b4):
    • Fixed Content Security Policy issues in templates.
  • Invenio-OAIServer (v1.0.0a13):
    • Fixed selective harvesting by timestamp caused by Marshmallow field parsing bug.
    • Removed the updated timestamp _oai.updated from records in favor of the record model's updated date (fixes an issue with selective datetime harvesting).
    • Added support for searching by spec in the admin interface.
  • Invenio-OAuth2Server (v1.0.0b1):
    • Added feature to show scopes related to an authorized application.
    • Added client example application to enable easier testing.
    • Added a new scope, user:email, which when granted returns the user's email address in the access token.
    • Updated "authorize this application" template.
    • Fixed security issue that allowed obtaining a session cookie via an access token and thus bypassing scope protection.
    • Fixed Content Security Policy issues in templates.
    • Fixed issue when strings were not strictly URL encoded (better error message).
    • Fixed template rendering issues when no scopes were given and with example URLs.
  • Invenio-OAuthClient (v1.0.0b2):
    • Added admin interface for UserIdentity.
    • Fixed Flask-WTF v0.14/v0.13 CSRF validation issues.
    • Reorganized documentation to new structure.
    • Removed support for the "remember me" feature.
    • Removed "Linked accounts" menu item when no providers were defined.
    • Fixed Content Security Policy issues in templates.
    • Fixed issue with always redirecting to "Linked accounts" after a login.
  • Invenio-PIDStore (v1.0.0b2):
    • New release.
  • Invenio-Search-JS (v1.2.0):
    • Fixed strict URL encoding of query strings.
    • Fixed Content Security Policy issues in CSS.
  • Invenio-Search-UI (v1.0.0a7):
    • Fixed template issue.
  • Invenio-Theme (v1.0.0b4)
    • Fixed Content Security Policy issues in templates.
  • Invenio-Cache (v1.0.0b1):
    • New module providing Redis/Memcached caching support.

What's next?

The next Invenio sprint will focus on:

  • Metadata bundle:
    • JSONSchemas, PIDStore, Records, Indexer, Records-UI, Search, OAIServer, Records-REST
  • General documentation
  • Framework launch (process, branches, maintenance plan, user experience)