Description
A batch Percepta prediction run on a large SDF file terminates early or skips records, and the log shows parse errors or invalid-structure messages. Investigation shows that the SDF contains malformed entries (e.g., truncated records, a missing M  END line, or corrupted field lines).
Solution
- Identify problematic records
- Check Percepta’s batch log for:
- Line numbers or record indices where parsing failed.
- Any messages about invalid valence, missing coordinates, or truncated records.
- Validate the SDF with an external tool
- Use an SDF checker or another cheminformatics tool to:
- Load the file.
- Identify which records cannot be parsed or trigger warnings.
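When no dedicated SDF checker is at hand, the structural part of this validation can be approximated with a short script. The following is a minimal sketch in plain Python, not a Percepta feature. It assumes that every record ends with a line starting with $$$$ and contains exactly one M  END line; the function names are hypothetical. It does not check valence or coordinates, so a cheminformatics toolkit is still needed for chemistry-level validation.

```python
def iter_sdf_records(lines):
    """Yield (index, record_lines, terminated) for each record in an SDF.

    A record is considered terminated when it is closed by a $$$$ line.
    """
    record, index = [], 0
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("$$$$"):
            yield index, record, True
            record, index = [], index + 1
        else:
            record.append(line)
    # Anything left over is a trailing record with no $$$$ delimiter.
    if any(l.strip() for l in record):
        yield index, record, False


def find_malformed_records(lines):
    """Return (record_index, reason) pairs for structurally suspect records."""
    problems = []
    for index, record, terminated in iter_sdf_records(lines):
        if not terminated:
            problems.append((index, "missing $$$$ delimiter (truncated?)"))
        end_count = sum(1 for l in record if l.rstrip() == "M  END")
        if end_count != 1:
            # 0 means the molfile block is cut off; more than 1 usually means
            # two records ran together because a $$$$ line was lost.
            problems.append(
                (index, f"expected exactly one M  END line, found {end_count}")
            )
    return problems
```

As a usage example, passing an open file handle, such as find_malformed_records(open("batch.sdf")) with a placeholder file name, returns zero-based record indices paired with a reason. Those indices can then be compared with the positions reported in the batch log.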
- Clean or split the SDF
- Remove or correct clearly corrupted records:
- If a record is truncated (missing the M  END line or the termination of its data tags), either repair it from the source or delete it from the batch.
- Optionally split a huge SDF into smaller chunks (e.g., 500–1000 structures per file) to isolate issues more easily.
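The cleaning and splitting step can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions: a record counts as well formed only if it ends with a $$$$ line and contains exactly one M  END line, the helper names and output file naming are hypothetical, and all records are held in memory, which may not suit very large files.

```python
def read_records(lines):
    """Group lines into records; each complete record ends with its $$$$ line."""
    record = []
    for line in lines:
        record.append(line.rstrip("\r\n"))
        if line.startswith("$$$$"):
            yield record
            record = []
    if any(l.strip() for l in record):
        yield record  # trailing record with no $$$$ delimiter


def is_well_formed(record):
    """Structural check only: closing $$$$ line and exactly one M  END line."""
    has_delimiter = record[-1].startswith("$$$$")
    end_count = sum(1 for l in record if l.rstrip() == "M  END")
    return has_delimiter and end_count == 1


def clean_and_chunk(lines, chunk_size=500):
    """Return (chunks, rejected).

    chunks   -- lists of well-formed records, at most chunk_size per list
    rejected -- (zero-based record index, record) pairs set aside for review
    """
    good, rejected = [], []
    for index, record in enumerate(read_records(lines)):
        if is_well_formed(record):
            good.append(record)
        else:
            rejected.append((index, record))
    chunks = [good[i:i + chunk_size] for i in range(0, len(good), chunk_size)]
    return chunks, rejected


def write_chunks(chunks, prefix):
    """Write each chunk to <prefix>_001.sdf, <prefix>_002.sdf, and so on."""
    paths = []
    for n, chunk in enumerate(chunks, start=1):
        path = f"{prefix}_{n:03d}.sdf"
        with open(path, "w", newline="\n") as out:
            for record in chunk:
                out.write("\n".join(record) + "\n")
        paths.append(path)
    return paths
```

The rejected list keeps the original record indices, so the removed entries can be repaired from the source or reviewed later rather than silently dropped.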
- Rerun Percepta batch on cleaned data
- Run predictions on the cleaned file(s).
- Confirm that:
- Jobs complete successfully.
- The number of records processed matches expectations.
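The record-count check can be sketched as follows. This assumes that the number of processed records is read from the Percepta batch output by hand, since the output format is not described here; the function names are hypothetical, and records are counted by their $$$$ delimiter lines.

```python
def count_records(lines):
    """Count records in an SDF by counting $$$$ delimiter lines."""
    return sum(1 for line in lines if line.startswith("$$$$"))


def counts_match(sdf_line_sources, processed_count):
    """Compare the total record count of the cleaned files with the
    number of records the batch run reports as processed.

    sdf_line_sources -- iterable of line iterables (e.g. open file handles)
    processed_count  -- number taken from the batch output (entered manually)
    Returns (matches, expected_total).
    """
    expected = sum(count_records(lines) for lines in sdf_line_sources)
    return expected == processed_count, expected
```

A mismatch does not say which records were skipped, only that some were, so it is a prompt to go back to the batch log for the affected file.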
- Document source of malformed data
- If the SDF was exported from an internal system or external partner:
- Communicate the issue and request corrected exports going forward.
- Update internal SOPs to include a quick validation step for large SDFs before Percepta batch runs.
- Keep an exception list
- For any remaining structures that cannot be processed (e.g., exotic or incomplete entries), maintain a small list for case‑by‑case review.