Description
- A system outage (application crash, database downtime, or network issue) occurs while experiments are being created, exported, or processed.
- After restart, some experiments stay in an in‑between status (e.g., “in progress”, “exporting”) and cannot be edited or finalized.
Solution
- Identify affected experiments:
- Use admin or database views to find experiments with flags indicating an active operation that is no longer running.
- Safely reset their states:
- Follow internal procedures or scripts to clear transient flags (e.g., “export in progress”) when the underlying operation has failed.
- Do not reset an experiment whose operation may still be running; confirm first that no job is processing it.
- Validate consistency:
- After reset, open the experiment in the UI and verify:
- Plates, materials, and analytical data are intact.
- Experiment can be edited, reprocessed, or finalized as appropriate.
- Prevent recurrence:
- Improve resilience of long‑running tasks (e.g., use job queues with retry and recovery).
- Document recovery steps as part of your operational runbook.
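The identify and reset steps above can be sketched in Python as follows. This is only a minimal illustration, not the product's actual procedure or schema: the `Experiment` record, its `last_heartbeat` and `previous_status` fields, the status names, and the 30-minute threshold are all assumptions made for the example. The idea is to treat an experiment as stuck only when it is in a transient status and no worker has reported activity for a while, and to default to a dry run so nothing changes until the list has been reviewed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical transient statuses; substitute the values your system uses.
TRANSIENT_STATUSES = {"in progress", "exporting"}


@dataclass
class Experiment:
    # Hypothetical record shape, not the product's actual schema.
    id: int
    status: str
    last_heartbeat: datetime        # last time a worker reported activity
    previous_status: str = "draft"  # stable status to fall back to


def find_stuck(experiments, now, stale_after=timedelta(minutes=30)):
    """Return experiments in a transient status with no recent worker activity."""
    return [
        e for e in experiments
        if e.status in TRANSIENT_STATUSES and now - e.last_heartbeat > stale_after
    ]


def reset_stuck(experiments, now, stale_after=timedelta(minutes=30), dry_run=True):
    """Reset stuck experiments to their previous stable status.

    With dry_run=True nothing is modified; the IDs that would be reset
    are returned so they can be reviewed first.
    """
    stuck = find_stuck(experiments, now, stale_after)
    if not dry_run:
        for e in stuck:
            e.status = e.previous_status
    return [e.id for e in stuck]


# Example data: one stale export, one recent export, one finalized experiment.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
experiments = [
    Experiment(1, "exporting", now - timedelta(hours=2)),    # stale: treated as stuck
    Experiment(2, "exporting", now - timedelta(minutes=5)),  # recent: may still be active
    Experiment(3, "finalized", now - timedelta(days=1)),     # stable status: ignored
]

would_reset = reset_stuck(experiments, now)              # dry run, no changes
did_reset = reset_stuck(experiments, now, dry_run=False)  # applies the reset
```

In this sketch only experiment 1 is reset. Experiment 2 is left alone because its recent heartbeat suggests the export may still be running, which is the case the procedure above warns against resetting. A real script would read and write these records through your admin tooling or database, and the staleness threshold should exceed the longest legitimate run time of the operation.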