Product | Version
Lead Discovery Premium | 3.5 or Above
Keywords:
Antibody sequence clustering, VH sequences, assay integration, sequence alignment, dendrogram visualization, Lead Discovery ChemCharts, HELM notation, CDR regions, MAFFT, IMGT/Kabat numbering.
Introduction:
This article explains how to use Lead Discovery ChemCharts to cluster antibody VH sequences, align them, and link the resulting clusters to assay results (e.g., binding affinity, IC50). The workflow enables scientists to correlate sequence motifs with functional outcomes, accelerating antibody engineering and optimization.
Overview:
Antibody development relies on understanding how sequence variations in the variable heavy (VH) chain affect assay performance (e.g., binding affinity). Lead Discovery ChemCharts provides tools to:
- Align and cluster antibody sequences.
- Visualize clusters via dendrograms or MSA (Multiple Sequence Alignment).
- Overlay assay data to identify sequence-activity relationships.
- Handle complex formats like HELM for post-translational modifications.
Process Steps:
- Import and Prepare Data
Data Format:
- Import VH sequences in FASTA, GenBank, or HELM format.
- Include assay results (e.g., IC50, activity categories) linked by a unique Sequence_ID.
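Before importing, it can help to confirm that every sequence has a matching assay row. The following is a minimal Python sketch that works outside ChemCharts and assumes a FASTA file whose header IDs match the Sequence_ID column of the assay CSV. The function names, the VH_Sequence column name, and the sample records are illustrative only.

```python
import csv
import io


def parse_fasta(text):
    """Return {sequence_id: sequence} from FASTA-formatted text."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may wrap; concatenate them.
            records[current_id] += line
    return records


def join_with_assays(sequences, assay_csv_text, id_column="Sequence_ID"):
    """Attach each assay row to its sequence and list sequences with no assay row."""
    joined = []
    assay_ids = set()
    for row in csv.DictReader(io.StringIO(assay_csv_text)):
        seq_id = row[id_column]
        assay_ids.add(seq_id)
        if seq_id in sequences:
            joined.append({**row, "VH_Sequence": sequences[seq_id]})
    unmatched = sorted(set(sequences) - assay_ids)
    return joined, unmatched


# Illustrative data, not real antibody records.
fasta_text = """>AB001 example
EVQLVESGGGLVQPGG
SLRLSCAAS
>AB002
QVQLQESGPGLVKPSE
"""
assay_text = "Sequence_ID,IC50\nAB001,12.5\n"

sequences = parse_fasta(fasta_text)
table, missing = join_with_assays(sequences, assay_text)
print(table)
print("No assay data for:", missing)
```

Sequences reported as unmatched would appear without assay values after import, so it is usually easier to resolve ID mismatches at this stage.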
Antibody-Specific Tools:
- Use the Antibody Numbering data function (Biopolymer > Antibody Numbering) to standardize sequences into the IMGT or Kabat numbering scheme (Pages 76, 100–119).
- Align Sequences
Multiple Sequence Alignment (MSA):
- Run the MAFFT or Clustal Omega data function (Data Functions > Biopolymer > Multiple Sequence Alignment).
- Output: A column of aligned sequences (e.g., Aligned_Sequences).
Visualize Alignment:
- Use the MSA Visualization (Pages 147–169) to inspect CDR regions and framework motifs.
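For comparison or troubleshooting, the same kind of alignment can be produced outside ChemCharts with a locally installed MAFFT. The sketch below is not the ChemCharts data function. It assumes the mafft executable is on PATH, and the function names and file paths are hypothetical.

```python
import shutil
import subprocess


def build_mafft_command(input_fasta):
    """Command line for MAFFT with automatic strategy selection (--auto)."""
    return ["mafft", "--auto", input_fasta]


def run_mafft(input_fasta, output_fasta):
    """Align input_fasta with a local MAFFT install and write the aligned FASTA."""
    if shutil.which("mafft") is None:
        raise RuntimeError("mafft executable not found on PATH")
    # MAFFT writes the alignment to stdout and progress messages to stderr.
    result = subprocess.run(
        build_mafft_command(input_fasta),
        capture_output=True,
        text=True,
        check=True,
    )
    with open(output_fasta, "w") as handle:
        handle.write(result.stdout)
    return output_fasta


print(" ".join(build_mafft_command("VH_sequences.fasta")))
```

Calling run_mafft("VH_sequences.fasta", "VH_aligned.fasta") would write an aligned FASTA file in which all sequences are padded with gap characters to the same length.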
- Cluster Sequences
Generate Dendrogram:
- Add a Dendrogram Visualization (Pages 80–84).
- Select the aligned sequence column (Aligned_Sequences).
- Configure clustering methods (e.g., UPGMA, neighbor-joining).
Customize Clusters:
- Adjust branch lengths and styles (horizontal/vertical/radial).
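To illustrate what UPGMA does with an aligned sequence column, here is a small pure-Python sketch. It assumes a simple p-distance (the fraction of differing alignment columns) as the distance measure. The distance and implementation that ChemCharts uses internally may differ, and the sequences shown are made up.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing positions, skipping columns where both are gaps."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not compared:
        return 0.0
    return sum(x != y for x, y in compared) / len(compared)


def upgma(aligned):
    """Cluster {id: aligned_sequence} with UPGMA.

    Returns a nested tuple tree: leaves are IDs and internal nodes are
    (left_subtree, right_subtree, node_height).
    """
    ids = list(aligned)
    if not ids:
        raise ValueError("no sequences to cluster")
    # Active clusters: key -> (subtree, number_of_leaves).
    clusters = {i: (ids[i], 1) for i in range(len(ids))}
    dist = {
        (i, j): p_distance(aligned[ids[i]], aligned[ids[j]])
        for i, j in combinations(range(len(ids)), 2)
    }
    next_key = len(ids)
    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        (i, j), d = min(dist.items(), key=lambda item: item[1])
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # UPGMA rule: the distance from the merged cluster to any other
        # cluster is the size-weighted average of the two old distances.
        new_dist = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            new_dist[(k, next_key)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        dist = {p: v for p, v in dist.items() if i not in p and j not in p}
        dist.update(new_dist)
        clusters[next_key] = ((tree_i, tree_j, d / 2), n_i + n_j)
        next_key += 1
    return next(iter(clusters.values()))[0]


# Illustrative aligned fragments, not real antibody sequences.
aligned = {
    "AB001": "EVQLVESGGG",
    "AB002": "EVQLVESGGA",
    "AB003": "QVQLQESGPG",
}
tree = upgma(aligned)
print(tree)
```

In this example AB001 and AB002 differ at one position and are merged first; AB003 joins at a greater height, which is the branching pattern a dendrogram would display.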
- Link Clusters to Assays
Color Rules:
- Open Properties > Coloring (Page 84).
- Select the assay column (e.g., IC50) and apply a gradient or categorical color scheme.
- Example: red (low activity) to green (high activity). For IC50, lower values indicate higher potency, so map low IC50 to green and high IC50 to red.
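The gradient idea can be sketched in Python as follows. This is an illustration of the mapping, not the ChemCharts coloring engine. It assumes a log-scaled ramp, which is a common choice for IC50 values spanning several orders of magnitude, and the function name and range limits are hypothetical.

```python
import math


def ic50_to_rgb(ic50, ic50_min, ic50_max):
    """Map an IC50 value to an (r, g, b) tuple on a log-scaled ramp.

    Low IC50 (high potency) maps to green and high IC50 maps to red.
    Values outside [ic50_min, ic50_max] are clamped to the range.
    """
    if ic50 <= 0 or ic50_min <= 0 or ic50_max <= ic50_min:
        raise ValueError("IC50 values must be positive and ic50_max > ic50_min")
    clamped = min(max(ic50, ic50_min), ic50_max)
    low, high = math.log10(ic50_min), math.log10(ic50_max)
    t = (math.log10(clamped) - low) / (high - low)  # 0.0 = most potent
    return (round(255 * t), round(255 * (1 - t)), 0)


# Example with an assumed range of 1 to 1000 (same units as the assay column).
for value in (1, 30, 1000, 5000):
    print(value, ic50_to_rgb(value, 1, 1000))
```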
Annotations & Distributions:
- In the MSA Visualization (Pages 160–162):
  - Add assay results as tooltips or side panels.
  - Use Regions to highlight motifs (e.g., CDR3) and overlay violin/box plots for assay distributions.
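The numbers behind a per-cluster box plot can be checked with a short Python sketch. It assumes each row carries a cluster label and an IC50 value. The Cluster column name, the function name, and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import median, quantiles


def summarize_by_cluster(rows, cluster_key="Cluster", value_key="IC50"):
    """Return box-plot style summary statistics of the assay value per cluster."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[cluster_key]].append(float(row[value_key]))
    summary = {}
    for cluster, values in groups.items():
        values.sort()
        if len(values) >= 2:
            # Quartiles using the inclusive method (data treated as the full population).
            q1, _, q3 = quantiles(values, n=4, method="inclusive")
        else:
            q1 = q3 = values[0]
        summary[cluster] = {
            "n": len(values),
            "min": values[0],
            "q1": q1,
            "median": median(values),
            "q3": q3,
            "max": values[-1],
        }
    return summary


# Illustrative rows, not real assay results.
rows = [
    {"Cluster": "A", "IC50": "10"},
    {"Cluster": "A", "IC50": "20"},
    {"Cluster": "A", "IC50": "30"},
    {"Cluster": "A", "IC50": "40"},
    {"Cluster": "A", "IC50": "50"},
    {"Cluster": "B", "IC50": "900"},
]
stats = summarize_by_cluster(rows)
print(stats)
```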
- Advanced Analysis
Cluster Cards:
- Group sequences into clusters manually or by CDR3 similarity.
- Configure cards to display assay data, sequence logos, or 3D structures (Pages 59–74).
HELM Integration:
- Use the HELM Visualization (Pages 100–119) to:
  - Translate sequences to SMILES.
  - Analyze post-translational modifications (e.g., glycosylation).
Example Workflow:
- Import: VH_sequences.fasta + assay_results.csv (linked by Sequence_ID).
- Align: Generate aligned sequences using MAFFT.
- Cluster: Create a dendrogram colored by IC50 values.
- Analyze: Highlight CDR3 motifs in MSA and overlay IC50 distributions.
- Export: Save clusters as new tables for machine learning.
Additional Documentation:
- Data Import: Configure sequence columns with ContentType = chemical/x-sequence (Pages 28–29).
- Large Datasets: Pre-cluster sequences using CD-HIT for datasets of roughly 10,000 entries or more.
- Performance Tips: Use the RDKit renderer for faster rendering (Page 74).
- User Guide: All page references (e.g., Pages 76, 100–119) are from the Lead Discovery ChemCharts User Guide. You can download the guide from the Revvity Download Center.
Limitations:
- Data Size: For large datasets (roughly 10,000 sequences or more), pre-process with external tools (e.g., CD-HIT) before import.
- Assay Format: Ensure assay columns are numeric (Real for IC50) or categorical (String for activity levels).
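Non-numeric entries such as censored values (">1000") or blanks are a common reason an IC50 column fails to import as Real. A minimal Python sketch for flagging such entries before import follows. The function name and sample values are illustrative.

```python
import math


def check_ic50_column(values):
    """Split raw IC50 entries into parsed floats and entries that are not finite numbers."""
    parsed, rejected = [], []
    for index, raw in enumerate(values):
        try:
            number = float(str(raw).strip())
        except ValueError:
            rejected.append((index, raw))
            continue
        if math.isfinite(number):
            parsed.append(number)
        else:
            # "nan" and "inf" parse as floats but are not usable assay values.
            rejected.append((index, raw))
    return parsed, rejected


# Illustrative column contents.
good, bad = check_ic50_column(["12.5", " 3 ", ">1000", "", "nan"])
print("Parsed:", good)
print("Needs cleanup (row index, value):", bad)
```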
External Resources:
- IMGT Database: Standardize antibody sequences.
- RCSB PDB: Download 3D antibody structures.
- MAFFT Alignment: MAFFT Documentation.