SeFMol Platform

SeFMol User Manual and Method Guide

SeFMol is a structure-based molecular generation platform for protein-pocket-conditioned 3D molecule design. It combines semi-flexible diffusion modeling, reinforcement-learning steering, and multi-property conditioning to generate candidate molecules with improved binding affinity and drug-like properties. This manual is written for platform users and explains what to upload, how to set parameters, and how to interpret the generated results.

Structure-Based Drug Design | Semi-Flexible Diffusion | Reinforcement Learning Steering | Multi-Property Control

1. What SeFMol Does

SeFMol is designed for structure-based drug design, where the goal is to generate 3D ligand molecules directly inside a protein binding pocket. Unlike methods that treat ligands as rigid during generation, SeFMol is inspired by semi-flexible docking and allows molecular conformations to be adjusted during the denoising process, making the generated candidates more compatible with pocket geometry and interaction patterns.

What makes SeFMol different

  • Generates 3D molecules conditioned on a protein pocket.
  • Uses reinforcement learning to steer semi-flexible conformational optimization.
  • Supports molecular property guidance such as QED, SA, LogP, TPSA, HBA, HBD, Fsp3, and ROTB.

What users can expect

  • Candidate molecules with strong docking-related performance.
  • Better control over drug-like physicochemical properties.
  • Fast sampling suitable for practical lead exploration and early candidate triage.
Research-use statement: generated molecules are computational hypotheses. They should be further reviewed with docking validation, medicinal chemistry assessment, ADMET analysis, and experimental testing.
2. Method Core

Two-stage rigid training

  • Pretraining: 1,000,000 target-free molecules from Molecule3D.
  • Fine-tuning: 100,000 protein-ligand pairs from CrossDocked2020.
  • Property guidance uses 8 RDKit-calculated properties: QED, SA, LogP, TPSA, HBA, HBD, Fsp3, and ROTB.
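The property-guidance input can be pictured as a fixed-order vector of these eight properties. The sketch below assumes the order used throughout this manual ([QED, SA, LogP, TPSA, HBA, HBD, Fsp3, ROTB]) and a hypothetical helper name, `build_condition_vector`; it is an illustration, not SeFMol's code, so check the repository for the exact input format the model expects.

```python
from typing import Dict, List

# Assumed property order, taken from this manual; verify against the SeFMol repository.
PROPERTY_ORDER = ["QED", "SA", "LogP", "TPSA", "HBA", "HBD", "Fsp3", "ROTB"]


def build_condition_vector(targets: Dict[str, float]) -> List[float]:
    """Hypothetical helper: return property targets as a list in PROPERTY_ORDER.

    Unknown keys raise ValueError (so a typo such as "logp" is not silently
    ignored); a missing property raises KeyError.
    """
    unknown = set(targets) - set(PROPERTY_ORDER)
    if unknown:
        raise ValueError(f"unknown properties: {sorted(unknown)}")
    return [float(targets[name]) for name in PROPERTY_ORDER]
```

Passing the paper's default targets returns [1.0, 1.0, 1.0, 50.0, 3.0, 2.0, 0.5, 2.0]. Fixing the order in one place avoids mismatches between the values a user types and the slot each value occupies in the model input.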

SFRL: semi-flexible RL optimization

  • The denoising process is formulated as a Markov Decision Process (MDP).
  • A policy denoiser optimizes molecular states step by step inside the target pocket.
  • KL regularization constrains policy drift from the pretrained denoiser.
  • PPO-style clipping and a value function are used to stabilize optimization.
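To make these bullets concrete, here is a minimal sketch of a generic PPO clipped surrogate with a KL penalty toward a frozen reference policy, averaged over the steps of one denoising trajectory (each step is one MDP transition). The function name, the coefficient values (`clip_eps=0.2`, `kl_coef=0.01`), and the single-sample KL estimate are illustrative assumptions; SeFMol's exact loss, reward, and value-function terms are defined in the paper and repository.

```python
import math
from typing import List


def ppo_kl_loss(
    logp_new: List[float],
    logp_old: List[float],
    logp_ref: List[float],
    advantages: List[float],
    clip_eps: float = 0.2,
    kl_coef: float = 0.01,
) -> float:
    """Generic PPO clipped loss plus KL penalty (illustrative, not SeFMol's code).

    Each list has one entry per denoising step:
      logp_new: log-prob of the taken step under the policy being updated
      logp_old: log-prob under the policy that collected the rollout
      logp_ref: log-prob under the frozen pretrained denoiser
    """
    total = 0.0
    for lp_new, lp_old, lp_ref, adv in zip(logp_new, logp_old, logp_ref, advantages):
        ratio = math.exp(lp_new - lp_old)  # importance ratio pi_new / pi_old
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        surrogate = min(ratio * adv, clipped * adv)  # pessimistic clipped objective
        kl_est = lp_new - lp_ref  # crude single-sample estimate of KL(pi_new || pi_ref)
        total += -surrogate + kl_coef * kl_est  # minimize the negative objective
    return total / len(advantages)
```

Advantages would typically be computed as returns minus the value function's estimates; the value-function regression loss is omitted here for brevity. The clip keeps each update close to the rollout policy, while the KL term keeps the fine-tuned denoiser close to the pretrained one.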
Default property vector in the paper: [QED, SA, LogP, TPSA, HBA, HBD, Fsp3, ROTB] = [1.0, 1.0, 1.0, 50.0, 3.0, 2.0, 0.5, 2.0]. SeFMol also uses a fast sampling strategy that reduces denoising steps from 1000 to 50, giving about 20× acceleration.
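Reducing 1000 denoising steps to 50 amounts to visiting every 20th timestep, so the denoiser is called 50 times instead of 1000, which is where the roughly 20× figure comes from. The sketch below shows a generic strided schedule of this kind; it is an assumption for illustration, and SeFMol's actual timestep selection and update rule may differ.

```python
from typing import List


def strided_timesteps(num_train_steps: int = 1000, num_sample_steps: int = 50) -> List[int]:
    """Evenly spaced timesteps, ordered from noisiest to cleanest.

    A generic strided (DDIM-style) schedule used here for illustration only.
    """
    if num_sample_steps <= 0 or num_train_steps % num_sample_steps != 0:
        raise ValueError("num_train_steps must be a positive multiple of num_sample_steps")
    stride = num_train_steps // num_sample_steps
    # range(0, 1000, 20) gives 0, 20, ..., 980; reverse so sampling starts from the noisiest step.
    return list(range(0, num_train_steps, stride))[::-1]
```

With the defaults, the schedule starts [980, 960, 940, ...] and ends at 0, giving 50 steps and a 1000 / 50 = 20× reduction in denoiser calls.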
3. User Workflow
Figure. Overview of SeFMol: a reinforcement-learning-steered semi-flexible diffusion model for pocket-conditioned molecular generation.
Step 1: Upload Target Structure
Provide the target protein structure and make sure the binding pocket is meaningful and complete.

Step 2: Set Property Guidance
Choose how many molecules to sample and, optionally, specify target molecular properties to guide generation.

Step 3: Run Generation
Launch SeFMol inference to generate 3D molecules under pocket and property constraints.

Step 4: Inspect and Prioritize
Review structures, docking-related scores, and property values to identify promising candidates.
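Step 4 can be partly automated. The sketch below filters candidates by property floors (borrowed from the QED and SA thresholds in the Success Rate definition given later in this manual) and ranks the remaining ones by Vina score, breaking ties by QED. The `Candidate` record, its field names, and the assumption that SA is normalized so that higher means easier to synthesize are hypothetical; adapt them to the platform's actual output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    # Hypothetical record for one generated molecule; field names are illustrative.
    smiles: str
    vina: float  # docking score in kcal/mol, lower is better
    qed: float   # 0..1, higher is more drug-like
    sa: float    # assumed normalized SA, higher means easier to synthesize


def prioritize(cands: List[Candidate], min_qed: float = 0.25, min_sa: float = 0.59) -> List[Candidate]:
    """Drop candidates at or below the property floors, then sort by Vina score
    (most negative first), breaking ties with higher QED."""
    kept = [c for c in cands if c.qed > min_qed and c.sa > min_sa]
    return sorted(kept, key=lambda c: (c.vina, -c.qed))
```

Filtering before sorting means a molecule with an excellent Vina score but poor drug-likeness never reaches the top of the list, in line with the advice not to screen by Vina score alone.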

4. Input Requirements

Required input

  • Protein structure / pocket information used as the spatial condition for generation.
  • Use a clean and chemically meaningful structure whenever possible.
  • Binding-site geometry should be relevant to the design objective.
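If only a full protein structure is available, a common way to define the pocket is to keep residues near a reference (for example, co-crystallized) ligand. The sketch below does this with the standard library by reading the fixed-column PDB format; the 10 Å cutoff, the function names, and the choice to consider only `ATOM` records (ignoring waters and other `HETATM` entries) are assumptions, not SeFMol's documented preprocessing.

```python
import math
from typing import List, Set, Tuple

Coord = Tuple[float, float, float]


def parse_atoms(pdb_text: str) -> List[Tuple[str, int, str, Coord]]:
    """Return (chain, residue number, residue name, xyz) for each ATOM record,
    using the fixed PDB column layout (x, y, z in columns 31-54)."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):  # skips HETATM (ligands, waters, ions)
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((line[21], int(line[22:26]), line[17:20].strip(), xyz))
    return atoms


def pocket_residues(protein_pdb: str, ligand_coords: List[Coord], cutoff: float = 10.0) -> Set[Tuple[str, int, str]]:
    """Residues with any atom within `cutoff` angstroms of any ligand atom."""
    selected = set()
    for chain, resnum, resname, xyz in parse_atoms(protein_pdb):
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_coords):
            selected.add((chain, resnum, resname))
    return selected
```

A residue-level selection keeps whole side chains together, which helps keep the pocket chemically complete; the cutoff trades completeness against the amount of irrelevant protein surface passed to the model.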

Optional guidance

  • Property targets for QED, SA, LogP, TPSA, HBA, HBD, Fsp3, and ROTB.
  • Sample count to control how many candidate molecules are generated.
  • If you are unsure where to start, use the default property vector from the paper.
In the reported experiments, SeFMol sampled 100 molecules per protein pocket. For practical use, a smaller batch is suitable for quick testing, while larger batches are better for broader candidate exploration.
5. Parameter Definitions and Practical Ranges
Parameter | Meaning | Reference value / range | Interpretation guidance
QED | Quantitative estimate of drug-likeness | default 1.0; SR criterion > 0.25 | Higher values usually indicate a more drug-like overall profile.
SA | Synthetic accessibility | default 1.0; SR criterion > 0.59 | Useful for checking whether generated molecules remain practically synthesizable.
LogP | Lipophilicity (octanol-water partition coefficient) | default 1.0; commonly -0.4 to 5.6 | High values may help permeability but can reduce solubility.
TPSA | Topological polar surface area (Å²) | default 50.0; often < 90, SR criterion ≤ 140 | Important for polarity, membrane transport, and exposure behavior.
HBA | Number of hydrogen-bond acceptors | default 3.0; usually ≤ 10 | Helps tune intermolecular interaction patterns and polarity.
HBD | Number of hydrogen-bond donors | default 2.0; usually ≤ 5 | Useful when balancing binding interactions and developability.
Fsp3 | Fraction of sp3-hybridized carbons (3D saturation) | default 0.5; typically > 0.47, SR criterion ≥ 0.42 | Higher values often improve 3D character and scaffold richness.
ROTB | Number of rotatable bonds | default 2.0; usually ≤ 10 | Lower values often help conformational stability.
num_samples | Number of generated molecules | paper evaluation: 100 per pocket | More samples improve coverage but also increase screening workload.
Default condition vector: [QED, SA, LogP, TPSA, HBA, HBD, Fsp3, ROTB] = [1.0, 1.0, 1.0, 50.0, 3.0, 2.0, 0.5, 2.0]. This is a good baseline setting for first-time users.
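Before launching a run, it can help to check custom targets against the practical ranges in the table above. The sketch below flags values outside those ranges; the [0, 1] bounds for QED, SA, and Fsp3, the zero lower bounds, and the helper name `check_targets` are assumptions, and a flagged value is a prompt to double-check rather than an error.

```python
from typing import Dict, List, Tuple

# Practical ranges from the parameter table; [0, 1] bounds and zero lower bounds are assumptions.
RANGES: Dict[str, Tuple[float, float]] = {
    "QED": (0.0, 1.0),
    "SA": (0.0, 1.0),
    "LogP": (-0.4, 5.6),
    "TPSA": (0.0, 140.0),
    "HBA": (0.0, 10.0),
    "HBD": (0.0, 5.0),
    "Fsp3": (0.0, 1.0),
    "ROTB": (0.0, 10.0),
}


def check_targets(targets: Dict[str, float]) -> List[str]:
    """Return a warning string for every target outside its practical range.
    An unknown property name raises KeyError."""
    warnings = []
    for name, value in targets.items():
        lo, hi = RANGES[name]
        if not lo <= value <= hi:
            warnings.append(f"{name}={value} outside practical range [{lo}, {hi}]")
    return warnings
```

The default condition vector passes without warnings, while a target such as TPSA = 160 is flagged because it exceeds the SR limit of 140.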
In the paper, Success Rate (SR) is defined as the proportion of molecules satisfying nine joint constraints: Vina Dock < -8.18, QED > 0.25, SA > 0.59, -0.4 ≤ LogP ≤ 5.6, TPSA ≤ 140, Fsp3 ≥ 0.42, HBA ≤ 10, HBD ≤ 5, and ROTB ≤ 10.
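The SR definition translates directly into a filter. This sketch assumes each molecule's scores are already available in a dictionary with the hypothetical keys shown (the Vina Dock score plus the eight properties); computing those values with a docking program and RDKit is not shown.

```python
from typing import Dict, Iterable


def meets_sr(m: Dict[str, float]) -> bool:
    """True if one molecule satisfies all nine SR constraints (keys are hypothetical)."""
    return (
        m["vina_dock"] < -8.18
        and m["QED"] > 0.25
        and m["SA"] > 0.59
        and -0.4 <= m["LogP"] <= 5.6
        and m["TPSA"] <= 140
        and m["Fsp3"] >= 0.42
        and m["HBA"] <= 10
        and m["HBD"] <= 5
        and m["ROTB"] <= 10
    )


def success_rate(mols: Iterable[Dict[str, float]]) -> float:
    """Fraction of molecules meeting every constraint (0.0 for an empty batch)."""
    mols = list(mols)
    return sum(meets_sr(m) for m in mols) / len(mols) if mols else 0.0
```

Because the constraints are joint, a molecule that narrowly misses any single threshold (for example, Vina Dock = -8.0) does not count toward SR, which is why SR values are much lower than the pass rate of any individual criterion.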
6. Reported Performance
Headline metric | Value
Avg. Vina Score | -7.23
Success Rate (SR) | 11.53%
Sampling Time | 0.81 s
Completion | 98.3%

Additional reported indicator | Value | Notes
Fast sampling | 1000 → 50 steps | About 20× acceleration during sampling.
Test scale | 100 protein pockets | Used in the benchmark evaluation.
Sampling per pocket | 100 molecules | Used for model comparison.
Interaction-pattern JSD | 0.1401 | Best reported value, tied with TargetDiff.
Case studies | CDK2 / ROCK1 | SeFMol reproduced known interactions and explored new ones.
Generalization | AlphaFold structures | Also showed favorable Vina score distributions on predicted protein structures.

These are paper-reported results intended to describe the method’s performance. Actual web runs may vary across targets, pocket quality, and parameter settings.

The paper also notes a trade-off: SeFMol improves affinity and property control, but may sacrifice some diversity compared with more exploratory settings.
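Diversity in this setting is commonly measured as the average pairwise Tanimoto distance (1 minus similarity) between fingerprints of molecules generated for the same pocket. The sketch below computes that quantity from fingerprints represented as sets of on-bit indices; the fingerprint type (for example, Morgan/ECFP bits computed with RDKit) and the helper names are assumptions, and the paper's exact diversity protocol may differ.

```python
from itertools import combinations
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def diversity(fps: List[Set[int]]) -> float:
    """Mean pairwise Tanimoto distance (1 - similarity) over all molecule pairs."""
    pairs = list(combinations(fps, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)
```

Identical fingerprints give a diversity of 0.0 and fully disjoint ones give 1.0, so tracking this value per pocket shows how much chemical variety is being traded for affinity and property control.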
7. Recommended Practice
  • Use biologically meaningful and structurally clean binding pockets.
  • Start with the default property vector, then adjust one or two properties at a time.
  • Do not screen candidates by Vina score alone; combine affinity, QED, SA, TPSA, and Fsp3.
  • Cluster generated molecules before detailed review to reduce redundant chemotypes.
  • Pair SeFMol with downstream docking, ADMET, and synthesis-feasibility tools for better triage.
  • Keep records of inputs, parameters, and output files for reproducibility.
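The clustering step recommended above can be done with Taylor-Butina clustering on fingerprint similarity. The standard-library sketch below represents each fingerprint as a set of on-bit indices; the 0.6 similarity threshold and the function names are illustrative assumptions. RDKit also provides a Butina implementation (`rdkit.ML.Cluster.Butina`) that works on precomputed distances.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def butina_clusters(fps: List[Set[int]], sim_threshold: float = 0.6) -> List[List[int]]:
    """Taylor-Butina clustering; returns clusters as index lists, centroid first."""
    n = len(fps)
    neighbors = [
        [j for j in range(n) if j != i and tanimoto(fps[i], fps[j]) >= sim_threshold]
        for i in range(n)
    ]
    # Molecules with the most neighbours become centroids first (stable sort keeps index order on ties).
    order = sorted(range(n), key=lambda i: len(neighbors[i]), reverse=True)
    assigned: Set[int] = set()
    clusters: List[List[int]] = []
    for i in order:
        if i in assigned:
            continue
        members = [i] + [j for j in neighbors[i] if j not in assigned]
        assigned.update(members)
        clusters.append(members)
    return clusters
```

Reviewing one representative per cluster (the first index in each list) is a quick way to cut redundant chemotypes before detailed inspection; a higher threshold produces more, tighter clusters.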
8. Reference

Xudong Zhang, Sanqing Qu, Fan Lu, Jianmin Wang, Zhixin Tian, Shangding Gu, Yanping Zhang, Alois Knoll, Shaorong Gao, Guang Chen, Changjun Jiang, Steering Semi-Flexible Molecular Diffusion Model for Structure-Based Drug Design with Reinforcement Learning.

Code and resources: https://github.com/ispc-lab/SeFMol