REANA-Commons

REANA-Commons is a component of the REANA reusable and reproducible research data analysis platform. It provides common utilities and schemas shared by the REANA cluster components.

Features:

  • common API clients for internal communication
  • centralised OpenAPI specifications for REANA components
  • AMQP connection management and communication
  • utility functions for cluster components

Usage

Detailed information on how to install and use REANA can be found at docs.reana.io.

Configuration

REANA Commons configuration.

reana_commons.config.CVMFS_REPOSITORIES = {'alice-ocdb.cern.ch': 'alice-ocdb', 'alice.cern.ch': 'alice', 'ams.cern.ch': 'ams', 'atlas-condb.cern.ch': 'atlas-condb', 'atlas-nightlies.cern.ch': 'atlas-nightlies', 'atlas.cern.ch': 'atlas', 'cms-ib.cern.ch': 'cms-ib', 'cms-opendata-conddb.cern.ch': 'cms-opendata-conddb', 'cms.cern.ch': 'cms', 'compass-condb.cern.ch': 'compass-condb', 'compass.cern.ch': 'compass', 'cvmfs-config.cern.ch': 'cvmfs-config', 'fcc.cern.ch': 'fcc', 'geant4.cern.ch': 'geant4', 'ilc.desy.de': 'ilc-desy', 'lhcb-condb.cern.ch': 'lhcb-condb', 'lhcb.cern.ch': 'lhcb', 'na61.cern.ch': 'na61', 'na62.cern.ch': 'na62', 'projects.cern.ch': 'projects', 'sft.cern.ch': 'sft', 'unpacked.cern.ch': 'unpacked'}

CVMFS repositories available for mounting.

reana_commons.config.INTERACTIVE_SESSION_TYPES = ['jupyter']

List of supported interactive systems.

reana_commons.config.K8S_CERN_EOS_AVAILABLE = None

Whether EOS is available in the current cluster or not.

This is a configuration set by the system administrators through Helm values at cluster creation time.

reana_commons.config.K8S_CERN_EOS_MOUNT_CONFIGURATION = {'volume': {'hostPath': {'path': '/var/eos'}, 'name': 'eos'}, 'volumeMounts': {'mountPath': '/eos', 'mountPropagation': 'HostToContainer', 'name': 'eos'}}

Configuration to mount EOS in Kubernetes objects.

For more information see the official documentation at https://clouddocs.web.cern.ch/containers/tutorials/eos.html.
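As a minimal sketch of how this configuration might be applied, the snippet below attaches the EOS volume and volume mount to a pod specification held as a plain dictionary. The helper name `add_eos_mount` and the sample pod layout are assumptions made for illustration; they are not part of REANA-Commons.

```python
import copy

# Value copied from the documented default above.
K8S_CERN_EOS_MOUNT_CONFIGURATION = {
    "volume": {"hostPath": {"path": "/var/eos"}, "name": "eos"},
    "volumeMounts": {
        "mountPath": "/eos",
        "mountPropagation": "HostToContainer",
        "name": "eos",
    },
}


def add_eos_mount(pod_spec, eos_config=K8S_CERN_EOS_MOUNT_CONFIGURATION):
    """Return a copy of ``pod_spec`` with EOS mounted in every container.

    Hypothetical helper: ``pod_spec`` is assumed to be shaped like the
    ``spec`` section of a Kubernetes pod manifest.
    """
    spec = copy.deepcopy(pod_spec)
    # Declare the hostPath volume once at pod level.
    spec.setdefault("volumes", []).append(copy.deepcopy(eos_config["volume"]))
    # Mount that volume in each container of the pod.
    for container in spec.get("containers", []):
        container.setdefault("volumeMounts", []).append(
            copy.deepcopy(eos_config["volumeMounts"])
        )
    return spec


pod_spec = {"containers": [{"name": "job", "image": "busybox"}]}
patched = add_eos_mount(pod_spec)
```

The input dictionary is left untouched, so the same base specification can be reused for pods that do not need EOS.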

reana_commons.config.MQ_CONNECTION_STRING = 'amqp://test:1234@reana-message-broker.default.svc.cluster.local//'

Message queue (RabbitMQ) connection string.

reana_commons.config.MQ_DEFAULT_EXCHANGE = ''

Message queue (RabbitMQ) exchange.

reana_commons.config.MQ_DEFAULT_FORMAT = 'json'

Default serializing format (to consume/produce).

reana_commons.config.MQ_DEFAULT_QUEUES = {'jobs-status': {'durable': False, 'exchange': '', 'routing_key': 'jobs-status'}, 'workflow-submission': {'durable': True, 'exchange': '', 'routing_key': 'workflow-submission'}}

Default message queues.

reana_commons.config.MQ_HOST = 'reana-message-broker.default.svc.cluster.local'

Message queue (RabbitMQ) server host name.

reana_commons.config.MQ_PASS = '1234'

Message queue (RabbitMQ) password.

reana_commons.config.MQ_PORT = 5672

Message queue (RabbitMQ) service port.

reana_commons.config.MQ_PRODUCER_MAX_RETRIES = 3

Max retries to send a message.

reana_commons.config.MQ_USER = 'test'

Message queue (RabbitMQ) user name.
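The default connection string appears to be composed from the user name, password and host documented above. The following sketch shows one way to assemble it; the function name `build_mq_connection_string` is hypothetical.

```python
# Defaults copied from the configuration values documented above.
MQ_USER = "test"
MQ_PASS = "1234"
MQ_HOST = "reana-message-broker.default.svc.cluster.local"


def build_mq_connection_string(user, password, host):
    """Compose an AMQP URL in the same shape as MQ_CONNECTION_STRING."""
    # The URL ends with "//", matching the documented default value.
    return "amqp://{user}:{password}@{host}//".format(
        user=user, password=password, host=host
    )


connection_string = build_mq_connection_string(MQ_USER, MQ_PASS, MQ_HOST)
print(connection_string)
```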

reana_commons.config.OPENAPI_SPECS = {'reana-job-controller': ('http://0.0.0.0:5000', 'reana_job_controller.json'), 'reana-server': ('http://0.0.0.0:80', 'reana_server.json'), 'reana-workflow-controller': ('http://reana-workflow-controller.default.svc.cluster.local:80', 'reana_workflow_controller.json')}

OpenAPI specifications of REANA components, given as pairs of service address and specification file name.

class reana_commons.config.REANAConfig

REANA global configuration class.

classmethod load(kind)

REANA-UI configuration.

reana_commons.config.REANA_COMPONENT_NAMING_SCHEME = '{prefix}-{component_type}-{id}'

The naming scheme the components created by REANA should follow.

It is a Python format string which takes the following arguments:

  • prefix: the REANA_COMPONENT_PREFIX
  • component_type: one of REANA_COMPONENT_TYPES
  • id: unique identifier for the component, by default UUID4.

reana_commons.config.REANA_COMPONENT_PREFIX = 'reana'

REANA component naming prefix, e.g. my-prefix-job-controller.

Useful to find the correct fully qualified name of an infrastructure component and to correctly create new runtime pods.

reana_commons.config.REANA_COMPONENT_PREFIX_ENVIRONMENT = 'REANA'

Environment variable friendly REANA component prefix.

reana_commons.config.REANA_COMPONENT_TYPES = ['run-batch', 'run-session', 'run-job', 'secretsstore']

Type of REANA components.

Note: this list is used to validate the names of REANA components created on demand, which is why it doesn’t contain REANA infrastructure components.

  • run-batch: An instance of reana-workflow-engine-_
  • run-session: An instance of an interactive session
  • run-job: An instance of a workflow’s job
  • secretsstore: An instance of a user secret store

reana_commons.config.REANA_CVMFS_PVC_TEMPLATE = {'metadata': {'name': ''}, 'spec': {'accessModes': ['ReadOnlyMany'], 'resources': {'requests': {'storage': '1G'}}, 'storageClassName': ''}}

CVMFS persistent volume claim template.

reana_commons.config.REANA_CVMFS_SC_TEMPLATE = {'metadata': {'name': ''}, 'parameters': {'repository': ''}, 'provisioner': 'cvmfs.csi.cern.ch'}

CVMFS storage class template.

reana_commons.config.REANA_INFRASTRUCTURE_COMPONENTS = ['ui', 'server', 'workflow-controller', 'cache', 'message-broker', 'db']

REANA infrastructure pods.

reana_commons.config.REANA_INFRASTRUCTURE_COMPONENTS_HOSTNAMES = {'cache': 'reana-cache.default.svc.cluster.local', 'db': 'reana-db.default.svc.cluster.local', 'message-broker': 'reana-message-broker.default.svc.cluster.local', 'server': 'reana-server.default.svc.cluster.local', 'ui': 'reana-ui.default.svc.cluster.local', 'workflow-controller': 'reana-workflow-controller.default.svc.cluster.local'}

REANA infrastructure pods hostnames.

Uses the FQDN of the infrastructure components (which should be behind a Kubernetes service) following the Kubernetes DNS-Based Service Discovery.
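The default hostnames follow the pattern `<prefix>-<component>.<namespace>.svc.cluster.local`. A minimal sketch of deriving them is shown here, assuming the default `cluster.local` cluster domain; the helper `component_hostname` is hypothetical.

```python
# Defaults copied from the configuration values in this section.
REANA_COMPONENT_PREFIX = "reana"
REANA_INFRASTRUCTURE_KUBERNETES_NAMESPACE = "default"
REANA_INFRASTRUCTURE_COMPONENTS = [
    "ui",
    "server",
    "workflow-controller",
    "cache",
    "message-broker",
    "db",
]


def component_hostname(
    component,
    prefix=REANA_COMPONENT_PREFIX,
    namespace=REANA_INFRASTRUCTURE_KUBERNETES_NAMESPACE,
):
    """Return the service FQDN of an infrastructure component."""
    # Assumes the Kubernetes cluster domain is the default "cluster.local".
    return "{prefix}-{component}.{namespace}.svc.cluster.local".format(
        prefix=prefix, component=component, namespace=namespace
    )


hostnames = {
    component: component_hostname(component)
    for component in REANA_INFRASTRUCTURE_COMPONENTS
}
```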

reana_commons.config.REANA_INFRASTRUCTURE_KUBERNETES_NAMESPACE = 'default'

Kubernetes namespace in which REANA infrastructure is currently deployed.

reana_commons.config.REANA_INFRASTRUCTURE_KUBERNETES_SERVICEACCOUNT_NAME = None

REANA infrastructure service account.

reana_commons.config.REANA_JOB_HOSTPATH_MOUNTS = []

List of dictionaries composed of name, hostPath and mountPath.

  • name: name of the mount.
  • hostPath: path in the Kubernetes cluster host nodes that will be mounted into job pods.
  • mountPath: path inside job pods where hostPath will get mounted. This is optional; by default, the same path as the hostPath is used.

This configuration should be used only when the specified locations are known to exist on all cluster nodes. For example, if all nodes in your cluster have a directory /usr/local/share/mydata, and you configure a mount with hostPath set to /usr/local/share/mydata and mountPath set to /mydata, all jobs will have /mydata mounted with the content of /usr/local/share/mydata from the Kubernetes cluster host node.
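A configuration matching the example above might look like the following sketch. The mount name `mydata` and the helper `resolve_mount_path` are assumptions chosen for illustration; the helper only demonstrates the documented fallback of mountPath to hostPath.

```python
REANA_JOB_HOSTPATH_MOUNTS = [
    {
        "name": "mydata",  # assumed mount name, chosen for illustration
        "hostPath": "/usr/local/share/mydata",
        "mountPath": "/mydata",
    },
]


def resolve_mount_path(mount):
    """Return the path inside the job pod, defaulting to the host path."""
    return mount.get("mountPath", mount["hostPath"])


resolved = [resolve_mount_path(mount) for mount in REANA_JOB_HOSTPATH_MOUNTS]
```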

reana_commons.config.REANA_LOG_FORMAT = '%(asctime)s | %(name)s | %(threadName)s | %(levelname)s | %(message)s'

REANA components log format.

reana_commons.config.REANA_LOG_LEVEL = 20

Log verbosity level for REANA components.
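The default level 20 is the numeric value of `logging.INFO` in the Python standard library. The sketch below wires the documented format and level into a logger; the logger name `reana-example` is hypothetical.

```python
import logging

# Defaults copied from the configuration values documented above.
REANA_LOG_FORMAT = (
    "%(asctime)s | %(name)s | %(threadName)s | %(levelname)s | %(message)s"
)
REANA_LOG_LEVEL = 20  # numeric value of logging.INFO

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(REANA_LOG_FORMAT))

logger = logging.getLogger("reana-example")  # hypothetical logger name
logger.setLevel(REANA_LOG_LEVEL)
logger.addHandler(handler)

logger.info("Workflow started")  # emitted: INFO (20) meets the level
logger.debug("Not shown")  # suppressed: DEBUG (10) is below the level
```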

reana_commons.config.REANA_MAX_CONCURRENT_BATCH_WORKFLOWS = 30

Upper limit on concurrent REANA batch workflows running in the cluster.

reana_commons.config.REANA_RUNTIME_KUBERNETES_NAMESPACE = 'default'

Kubernetes namespace in which REANA runtime pods should run.

By default runtime pods will run in the same namespace as the infrastructure pods.

reana_commons.config.REANA_RUNTIME_KUBERNETES_NODE_LABEL = {}

Kubernetes label (with format label_name=label_value) which identifies the nodes where the runtime pods should run.

If not set, the runtime pods run in any available node in the cluster.

reana_commons.config.REANA_RUNTIME_KUBERNETES_SERVICEACCOUNT_NAME = None

REANA runtime service account.

If no runtime namespace is deployed it will default to the infrastructure service account.

reana_commons.config.REANA_SHARED_PVC_NAME = 'reana-shared-persistent-volume'

Name of the shared CephFS PVC which will be used by all REANA jobs.

reana_commons.config.REANA_STORAGE_BACKEND = 'local'

Storage backend deployed in the current REANA cluster ['local'|'cephfs'].

reana_commons.config.REANA_USER_SECRET_MOUNT_PATH = '/etc/reana/secrets'

Default path at which user secrets are mounted in job pods and workflow engines.

reana_commons.config.REANA_WORKFLOW_UMASK = 2

Umask used for the workflow workspace.
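Under standard POSIX semantics, a umask clears the corresponding permission bits of newly created files and directories. With the default value 2 (octal 002), only the write bit for "others" is removed, as this small sketch illustrates; the helper `apply_umask` is hypothetical.

```python
REANA_WORKFLOW_UMASK = 2  # same value as octal 0o002


def apply_umask(requested_mode, umask=REANA_WORKFLOW_UMASK):
    """Return the permission bits left after the umask is applied."""
    return requested_mode & ~umask


file_mode = apply_umask(0o666)  # typical requested mode for new files
dir_mode = apply_umask(0o777)  # typical requested mode for new directories
print(oct(file_mode), oct(dir_mode))  # prints: 0o664 0o775
```

Files therefore stay group-writable, which is presumably what allows processes sharing the runtime group to write to the same workspace.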

reana_commons.config.SHARED_VOLUME_PATH = '/var/reana'

Default shared volume path.

reana_commons.config.WORKFLOW_RUNTIME_USER_GID = 0

Default group id for running job controller/workflow engine apps & jobs.

reana_commons.config.WORKFLOW_RUNTIME_USER_NAME = 'reana'

Default OS user name for running job controller.

reana_commons.config.WORKFLOW_RUNTIME_USER_UID = 1000

Default user id for running job controller/workflow engine apps & jobs.

API

REANA API client

REANA REST API base client.

class reana_commons.api_client.BaseAPIClient(service, http_client=None)

REANA API client code.

class reana_commons.api_client.JobControllerAPIClient(service, http_client=None)

REANA-Job-Controller http client class.

check_if_cached(job_spec, step, workflow_workspace)

Check if job result is in cache.

check_status(job_id)

Check status of a job.

get_logs(job_id)

Get logs of a job.

submit(workflow_uuid='', experiment='', image='', cmd='', prettified_cmd='', workflow_workspace='', job_name='', cvmfs_mounts='false', compute_backend=None, kerberos=False, kubernetes_uid=None, unpacked_img=False, voms_proxy=False)

Submit a job to RJC API.

Parameters:
  • job_name – Name of the job.
  • experiment – Experiment the job belongs to.
  • image – Identifier of the Docker image which will run the job.
  • cmd – String which represents the command to execute. It can be modified by the workflow engine, e.g. by prepending cd /some/dir/.
  • prettified_cmd – Original command submitted by the user.
  • workflow_workspace – Path to the workspace of the workflow.
  • cvmfs_mounts – String with CVMFS volumes to mount in job pods.
  • compute_backend – Job compute backend.
  • kerberos – Decides if Kerberos should be provided for the job container.
  • voms_proxy – Decides if a grid proxy should be provided for the job container.
  • kubernetes_uid – Overwrites the default user id in the job container.
  • unpacked_img – Decides if unpacked images should be used.
Returns:

A dict with the job_id.
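To make the many keyword arguments of submit easier to follow, here is a minimal sketch that assembles them with the documented defaults. The helper `build_submit_kwargs` and all sample values are hypothetical, and no request is sent.

```python
# Defaults mirror the keyword arguments of the documented ``submit`` signature.
SUBMIT_DEFAULTS = {
    "workflow_uuid": "",
    "experiment": "",
    "image": "",
    "cmd": "",
    "prettified_cmd": "",
    "workflow_workspace": "",
    "job_name": "",
    "cvmfs_mounts": "false",
    "compute_backend": None,
    "kerberos": False,
    "kubernetes_uid": None,
    "unpacked_img": False,
    "voms_proxy": False,
}


def build_submit_kwargs(**overrides):
    """Return keyword arguments for ``submit`` (hypothetical helper)."""
    unknown = set(overrides) - set(SUBMIT_DEFAULTS)
    if unknown:
        raise TypeError("Unknown submit arguments: {}".format(sorted(unknown)))
    kwargs = dict(SUBMIT_DEFAULTS)
    kwargs.update(overrides)
    return kwargs


job_kwargs = build_submit_kwargs(
    workflow_uuid="hypothetical-workflow-uuid",
    image="busybox",
    cmd="cd /some/dir/ && echo hello",  # command as modified by the engine
    prettified_cmd="echo hello",  # command as written by the user
    workflow_workspace="/var/reana/hypothetical-workspace",
    job_name="hello",
)
```

With a reachable REANA-Job-Controller, such a dictionary could presumably be passed on as `client.submit(**job_kwargs)`, where `client` is a JobControllerAPIClient instance.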

reana_commons.api_client.get_current_api_client(component)

Proxy which returns current API client for a given component.

REANA Kubernetes API client

Kubernetes API Client.

reana_commons.k8s.api_client.create_api_client(api='BatchV1')

Create Kubernetes API client using config.

Parameters: api – String which represents which Kubernetes API to spawn. By default BatchV1.
Returns: Kubernetes Python client object for a specific API, e.g. BatchV1.

REANA Kubernetes volumes.

reana_commons.k8s.volumes.get_k8s_cvmfs_volume(repository)

Render k8s CVMFS volume template.

Parameters: repository – CVMFS repository to be mounted.
Returns: k8s CVMFS volume spec as a dictionary.

reana_commons.k8s.volumes.get_reana_shared_volume()

Return REANA shared volume as k8s spec.

Depending on the configured storage backend REANA will use just a local volume in the host VM or a persistent volume claim which provides access to a network file system.

Returns: k8s shared volume spec as a dictionary.

reana_commons.k8s.volumes.get_shared_volume(workflow_workspace)

Get shared CephFS/hostPath volume to a given job spec.

Parameters: workflow_workspace – Absolute path to the job’s workflow workspace.
Returns: Tuple consisting of the Kubernetes volumeMount and the volume.

REANA AMQP Publisher

REANA-Commons module to manage AMQP connections on REANA.

class reana_commons.publisher.BasePublisher(queue, routing_key, connection=None, exchange=None, durable=False)

Base publisher to MQ.

close()

Close connection.

class reana_commons.publisher.WorkflowStatusPublisher(**kwargs)

Progress publisher to MQ.

publish_workflow_status(workflow_uuid, status, logs='', message=None)

Publish workflow status using the configured queue.

Parameters:
  • workflow_uuid – String which represents the workflow UUID.
  • status – Integer which represents the status of the workflow, as defined in the reana-db Workflow models.
  • logs – String which represents the logs which the workflow has produced as output.
  • message – Dictionary with additional information, such as the overall progress of the workflow.

class reana_commons.publisher.WorkflowSubmissionPublisher(**kwargs)

Workflow submission publisher.

publish_workflow_submission(user_id, workflow_id_or_name, parameters)

Publish workflow submission parameters.

REANA AMQP Consumer

REANA-Commons module to manage AMQP consuming on REANA.

class reana_commons.consumer.BaseConsumer(queue=None, connection=None, message_default_format=None)

Base RabbitMQ consumer.

get_consumers(Consumer, channel)

Map consumers to specific queues.

on_message(body, message)

Implement this method to manipulate the data received.
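The following sketch illustrates what an on_message implementation might do: decode the body, process it and acknowledge the message. To stay self-contained it uses a stand-in class rather than a real BaseConsumer subclass, and `FakeMessage` is a hypothetical test double that only mimics the `ack` method of a kombu message.

```python
import json


class RecordingConsumer:
    """Stand-in for a ``BaseConsumer`` subclass (hypothetical example)."""

    def __init__(self):
        self.received = []

    def on_message(self, body, message):
        """Decode the payload, store it and acknowledge the message."""
        payload = json.loads(body) if isinstance(body, str) else body
        self.received.append(payload)
        message.ack()  # tell the broker the message was handled


class FakeMessage:
    """Minimal test double exposing the ``ack`` method of a message."""

    def __init__(self):
        self.acknowledged = False

    def ack(self):
        self.acknowledged = True


consumer = RecordingConsumer()
fake_message = FakeMessage()
consumer.on_message('{"workflow_uuid": "1234", "status": 2}', fake_message)
```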

REANA Serial workflow utilities

REANA Workflow Engine Serial implementation utils.

reana_commons.serial.serial_load(workflow_file, specification, parameters=None, original=None, **kwargs)

Validate and return an expanded REANA Serial workflow specification.

Parameters: workflow_file – A specification file compliant with REANA Serial workflow specification.
Returns: A dictionary which represents the valid Serial workflow with all parameters expanded.
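Parameter expansion can be pictured with the sketch below, which substitutes `${name}` placeholders in step commands using `string.Template` from the standard library. The specification layout (`steps`, `environment`, `commands`) and the helper `expand_serial_parameters` are assumptions for illustration, not the actual serial_load implementation, and no schema validation is performed.

```python
from string import Template


def expand_serial_parameters(specification, parameters):
    """Return a copy of the specification with placeholders substituted."""
    expanded_steps = []
    for step in specification["steps"]:
        expanded = dict(step)
        # substitute() raises KeyError when a placeholder has no value.
        expanded["commands"] = [
            Template(command).substitute(parameters)
            for command in step["commands"]
        ]
        expanded_steps.append(expanded)
    return {"steps": expanded_steps}


specification = {
    "steps": [
        {
            "environment": "python:3.8",
            "commands": ["python ${script} --name ${name}"],
        }
    ]
}
expanded = expand_serial_parameters(
    specification, {"script": "hello.py", "name": "REANA"}
)
```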

REANA utilities

REANA-Commons utils.

reana_commons.utils.build_caching_info_message(job_spec, job_id, workflow_workspace, workflow_json, result_path)

Build the caching info message with correct formatting.

reana_commons.utils.build_progress_message(total=None, running=None, finished=None, failed=None, cached=None)

Build the progress message with correct formatting.

reana_commons.utils.build_unique_component_name(component_type, id=None)

Use the REANA component type and id to build a human-readable component name.

Parameters:
  • component_type – One of reana_commons.config.REANA_COMPONENT_TYPES.
  • id – Unique identifier, if not specified a new UUID4 is created.
Returns:

String representing the component name, e.g. reana-run-job-123456.
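A function with this behaviour could be sketched as follows, using the documented naming scheme, prefix and component types. The name `unique_component_name` and the choice of raising `ValueError` for unknown types are assumptions, not the library's actual behaviour.

```python
import uuid

# Defaults copied from the configuration section.
REANA_COMPONENT_PREFIX = "reana"
REANA_COMPONENT_TYPES = ["run-batch", "run-session", "run-job", "secretsstore"]
REANA_COMPONENT_NAMING_SCHEME = "{prefix}-{component_type}-{id}"


def unique_component_name(component_type, id=None):
    """Build a component name, generating a UUID4 when no id is given."""
    if component_type not in REANA_COMPONENT_TYPES:
        # Assumed error handling; the real function may behave differently.
        raise ValueError("Unknown component type: {}".format(component_type))
    return REANA_COMPONENT_NAMING_SCHEME.format(
        prefix=REANA_COMPONENT_PREFIX,
        component_type=component_type,
        id=id or str(uuid.uuid4()),
    )


fixed_name = unique_component_name("run-job", id="123456")
random_name = unique_component_name("run-session")
```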

reana_commons.utils.calculate_file_access_time(workflow_workspace)

Calculate access times of files in workspace.

reana_commons.utils.calculate_hash_of_dir(directory, file_list=None)

Calculate hash of directory.

reana_commons.utils.calculate_job_input_hash(job_spec, workflow_json)

Calculate MD5 hash of job specification and workflow JSON.
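One way to compute such a hash is sketched below: both inputs are serialised to JSON with sorted keys, so that dictionary ordering does not affect the result, and then digested with MD5. The helper `job_input_hash` and the exact serialisation are assumptions; the library may combine the inputs differently.

```python
import hashlib
import json


def job_input_hash(job_spec, workflow_json):
    """Return the MD5 hex digest of the job spec and workflow combined."""
    # sort_keys makes the serialisation independent of key order.
    serialised = json.dumps(
        {"job_spec": job_spec, "workflow": workflow_json}, sort_keys=True
    )
    return hashlib.md5(serialised.encode("utf-8")).hexdigest()


digest_a = job_input_hash({"cmd": "echo hi", "image": "busybox"}, {"steps": []})
digest_b = job_input_hash({"image": "busybox", "cmd": "echo hi"}, {"steps": []})
digest_c = job_input_hash({"cmd": "echo bye", "image": "busybox"}, {"steps": []})
```

Identical inputs yield the same digest regardless of key order, which is the property a job cache lookup would rely on.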

reana_commons.utils.check_connection_to_job_controller(port=5000)

Check connection from workflow engine to job controller.

reana_commons.utils.click_table_printer(headers, _filter, data)

Generate space separated output for click commands.

reana_commons.utils.copy_openapi_specs(output_path, component)

Copy generated and validated openapi specs to reana-commons module.

reana_commons.utils.create_cvmfs_persistent_volume_claim(cvmfs_volume)

Create CVMFS persistent volume claim.

reana_commons.utils.create_cvmfs_storage_class(cvmfs_volume)

Create CVMFS storage class.

reana_commons.utils.format_cmd(cmd)

Return command in a valid format.

reana_commons.utils.get_workflow_status_change_verb(status)

Give the correct verb conjugation depending on status tense.

Parameters: status – String which represents the status the workflow changed to.

reana_commons.utils.get_workspace_disk_usage(workspace, summarize=False, block_size=None)

Retrieve disk usage information of a workspace.
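As a rough illustration of what retrieving disk usage involves, the sketch below sums the apparent sizes of regular files under a workspace directory. It is a simplified stand-in: the helper `workspace_disk_usage` is hypothetical, and the `summarize` and `block_size` options of the documented function are not modelled.

```python
import os
import tempfile


def workspace_disk_usage(workspace):
    """Return the total size in bytes of regular files under ``workspace``."""
    total = 0
    for root, _dirs, files in os.walk(workspace):
        for name in files:
            path = os.path.join(root, name)
            # Skip symbolic links so that targets are not counted twice.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


with tempfile.TemporaryDirectory() as workspace:
    os.makedirs(os.path.join(workspace, "outputs"))
    with open(os.path.join(workspace, "outputs", "result.txt"), "w") as handle:
        handle.write("x" * 100)
    usage = workspace_disk_usage(workspace)
```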

reana_commons.utils.render_cvmfs_pvc(cvmfs_volume)

Render REANA_CVMFS_PVC_TEMPLATE.

reana_commons.utils.render_cvmfs_sc(cvmfs_volume)

Render REANA_CVMFS_SC_TEMPLATE.

REANA errors

REANA Commons errors.

exception reana_commons.errors.MissingAPIClientConfiguration

REANA Server URL is not set.

exception reana_commons.errors.REANAConfigDoesNotExist(message)

The requested REANA configuration does not exist.

exception reana_commons.errors.REANAEmailNotificationError(message)

Email notification error.

exception reana_commons.errors.REANASecretAlreadyExists

The referenced secret already exists.

exception reana_commons.errors.REANASecretDoesNotExist(missing_secrets_list=None)

The referenced REANA secret does not exist.

exception reana_commons.errors.REANAValidationError(message)

Validation error.

Changes

Version master (UNRELEASED)

  • Adds new utility to send emails.
  • Adds centralised operational options validation.
  • Fixes memory leak in Bravado client instantiation. (reanahub/reana-server#225)
  • Makes maximum number of running workflows configurable.
  • Adds configurable prefix for component names.
  • Adds central variable for the runtime pods node selector label.
  • Allows specifying unpacked Docker images.
  • Upgrades minimum version of Kubernetes Python library to 11.
  • Centralises CephFS PVC name.
  • Updates to latest CVMFS CSI driver.
  • Introduces new configuration variable REANA_INFRASTRUCTURE_KUBERNETES_NAMESPACE to define the Kubernetes namespace in which REANA infrastructure components run.
  • Introduces new configuration variable REANA_RUNTIME_KUBERNETES_NAMESPACE to define the Kubernetes namespace in which REANA runtime components run.
  • Increases default log level to INFO.
  • Adds Black formatter support.
  • Adds initfiles as an operational option for Yadage.

Version 0.6.1 (2020-05-25)

  • Upgrades Kubernetes Python client.

Version 0.6.0 (2019-12-19)

  • Adds new API for GitLab integration.
  • Adds new Kubernetes client API for ingresses.
  • Adds new APIs for management of user secrets.
  • Adds EOS storage Kubernetes configuration.
  • Adds HTCondor and Slurm compute backends.
  • Adds support for streaming file uploads.
  • Allows unpacked CVMFS and CMS open data volumes.
  • Adds Serial workflow step name and compute backend.
  • Adds support for Python 3.8.

Version 0.5.0 (2019-04-16)

  • Centralises log level and log format configuration.
  • Adds new utility to inspect the disk usage on a given workspace. (get_workspace_disk_usage)
  • Introduces the module to share Celery tasks across REANA components. (tasks.py)
  • Introduces common Celery task to determine whether REANA can execute new workflows depending on a set of conditions such as running job count. (reana_ready, check_predefined_conditions, check_running_job_count)
  • Allows the AMQP consumer to be configurable with multiple queues.
  • Introduces new queue for workflow submission. (workflow-submission)
  • Introduces new publisher for workflow submissions. (WorkflowSubmissionPublisher)
  • Centralises Kubernetes API client configuration and initialisation.
  • Adds Kubernetes specific configuration for CVMFS volumes as utils.
  • Introduces a new method, copy_openapi_specs, to automatically move validated OpenAPI specifications from components to REANA Commons openapi_specifications directory.
  • Centralises interactive session types.
  • Introduces central REANA errors through the errors.py module.
  • Skips SSL verification for all HTTPS requests performed with the BaseAPIClient.

Version 0.4.0 (2018-11-06)

  • Aggregates OpenAPI specifications of REANA components.
  • Improves AMQP re-connection handling. Switches from pika to kombu.
  • Enhances test suite and increases code coverage.
  • Changes license to MIT.

Version 0.3.1 (2018-09-04)

  • Adds parameter expansion and validation utilities for parametrised Serial workflows.

Version 0.3.0 (2018-08-10)

  • Initial public release.
  • Provides basic AMQP pub/sub methods for REANA components.
  • Utilities for caching used in different REANA components.
  • Click formatting helpers.

Please beware

Please note that REANA is in an early alpha stage of its development. The developer preview releases are meant for early adopters and testers. Please don’t rely on released versions for any production purposes yet.

Contributing

Bug reports, issues, feature requests, and other contributions are welcome. If you find a demonstrable problem that is caused by the REANA code, please:

  1. Search for already reported problems.
  2. Check if the issue has been fixed or is still reproducible on the latest master branch.
  3. Create an issue, ideally with a test case.

If you create a pull request fixing a bug or implementing a feature, you can run the tests to ensure that everything is operating correctly:

$ ./run-tests.sh

Each pull request should preserve or increase code coverage.

License

MIT License

Copyright (C) 2018-2020 CERN.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

In applying this license, CERN does not waive the privileges and immunities granted to it by virtue of its status as an Intergovernmental Organization or submit itself to any jurisdiction.