Documentation status

This page was generated and edited with the assistance of an LLM and is still in development. It has not been fully vetted by the developer. Verify commands, UI labels, file paths, workflow descriptions, and scientific claims against the current code and your local workflow before relying on it.

If you notice an error, omission, or outdated guidance, please open an issue on GitHub.

pyDREAM Workflow¶

The SAXS DREAM Fit tab is the posterior-sampling side of SAXSShell. It uses the current project, Prefit parameter table, and active template runtime inputs to generate a pyDREAM bundle and launch a Bayesian refinement.

Like Prefit, DREAM depends on the upstream component and geometry inputs prepared earlier in the main UI workflow and by the supporting applications.

In plain language, use this tab when you want uncertainty estimates and a posterior distribution, not just one hand-tuned Prefit curve.

Image placeholder

Add a screenshot of the SAXS DREAM Fit tab showing the runtime controls, prior-map editor button, and the main results panels.

SAXSShell uses pyDREAM, a Python implementation of the MT-DREAM(ZS) sampler. pyDREAM runs several Markov chains at once to explore parameter space. SAXSShell then uses the sampled parameter sets to estimate both a best-fit model and the uncertainty around that fit.

What DREAM Is Doing In SAXSShell¶

Use DREAM when you want more than one optimized answer.

Prefit gives you a fast editable preview. DREAM goes further by sampling many plausible parameter combinations from the posterior distribution. That lets you look at:

one representative best-fit parameter set
uncertainty intervals for each parameter
how strongly the data support different parameter ranges
whether your fit depends on a narrow or broad region of parameter space

In this repository, the DREAM workflow currently includes:

runtime bundle generation
prior-map editing
saved settings presets
progress and log output
model-vs-experimental plots
violin plots for posterior distributions
filter-aware posterior summaries
export helpers for statistics, model-fit bundles, and violin data

Relationship To Prefit¶

The DREAM tab depends heavily on Prefit:

the parameter map is built from the current Prefit parameter table
geometry-aware templates inherit the current cluster-geometry state
template runtime inputs are rebuilt from the saved Prefit workflow state
the list of varying parameters comes from the current prior editor

If the Prefit state is out of sync with the project, fix it in Prefit first. A DREAM run would inherit the same stale parameter table and runtime inputs.

Typical Order Of Operations¶

Build a usable project in Project Setup.
Confirm the model behaves sensibly in SAXS Prefit.
Open SAXS DREAM Fit.
Review or edit the DREAM parameter map.
Review the search/filter preset and individual DREAM settings.
Write the runtime bundle.
Run DREAM.
Load the results and inspect the model plot, summary statistics, and violin plot.

Core Concepts For A Lay User¶

What Is A Chain?¶

A chain is one running copy of the Markov-chain Monte Carlo sampler. Each chain starts from a different location in parameter space and walks through candidate parameter values. Running multiple chains helps pyDREAM explore more robustly and reduces the chance that one unlucky starting point dominates the result.

In SAXSShell:

more chains usually means better exploration
more chains also usually means more runtime
the runtime bundle will raise the chain count if needed so there is at least one chain per varying parameter

What Is Burn-in?¶

Burn-in is the early part of a chain that you throw away because the sampler is still settling into the high-probability region of parameter space.

In SAXSShell, burn-in is implemented as:

compute the number of iterations in each chain
remove the earliest burnin_percent from every chain
apply posterior filtering only to the remaining samples

So burn-in happens before summary statistics, MAP selection, violin plots, and posterior filtering.
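
The burn-in steps above can be sketched in a few lines of Python. This is a minimal illustration, not SAXSShell's actual code: the function name drop_burn_in, the list-of-lists chain layout, and the round-down rule for fractional sample counts are all assumptions.

```python
import math


def drop_burn_in(chains, burnin_percent):
    """Discard the earliest burnin_percent of every chain.

    chains: list of chains, each a list of samples in iteration order.
    Rounding down is an assumption, not confirmed SAXSShell behavior.
    """
    kept = []
    for chain in chains:
        # Number of leading samples to throw away for this chain.
        n_drop = math.floor(len(chain) * burnin_percent / 100.0)
        kept.append(chain[n_drop:])
    return kept


# Four chains of 10,000 iterations; each sample is just its iteration index.
chains = [list(range(10_000)) for _ in range(4)]
post_burnin = drop_burn_in(chains, burnin_percent=20)

print(len(post_burnin[0]))  # 8000 samples remain in each chain
print(post_burnin[0][0])    # 2000, the first retained iteration index
```

Posterior filtering would then operate only on the lists returned by this step.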

What Does MAP Mean?¶

MAP stands for Maximum a Posteriori.

In Bayesian fitting, this means the parameter set with the highest posterior probability among the samples being considered. In SAXSShell, that is implemented as:

apply burn-in
apply the selected posterior filter
among the retained samples, find the one with the highest log-posterior value

That sample becomes the MAP best-fit parameter set.

Important distinction:

MAP is a best-fit method
MAP Chain Only is a violin-sample display option
MAP is not a separate posterior filter mode in the current UI

The DREAM tab includes built-in presets that change both search depth and posterior filtering behavior. The first three are the current SAXSShell profiles; the Legacy ... rows reproduce settings mined from the old MDScatter pyDREAM UI and notebooks.

Preset	Chains	Iterations	Burn-in	nSeedChains	Crossover burn-in	Default filter	Top %	Top N
Less Aggressive	4	5000	15%	24	500	All Post-burnin	20	1000
Medium	4	10000	20%	40	1000	All Post-burnin	10	500
More Aggressive	8	20000	25%	80	2000	Top % by Log-posterior	5	250
Legacy GUI Default	4	10000	20%	40	1000	All Post-burnin	10	500
Legacy SAXS Notebook	50	15000	40%	500	1000	All Post-burnin	10	500
Legacy KWhite Long	50	20000	40%	500	1000	All Post-burnin	10	500
Legacy TChaney Production	25	50000	10%	250	1000	All Post-burnin	10	500

Use them as follows:

Less Aggressive: faster first pass, broader retained posterior
Medium: balanced default for most routine runs
More Aggressive: deeper search and tighter posterior screening
Legacy ...: compatibility presets for reproducing older MDScatter pyDREAM runs

If you change any of the linked controls manually, the preset switches to Custom.

DREAM Settings Explained¶

This section explains the main pyDREAM controls in the current SAXSShell UI in plain language.

Edit Priors And Smart Prior Presets¶

The Edit Priors window starts from the current DREAM parameter map and lets you change:

which parameters vary
the distribution family for each parameter
the numeric distribution parameters
a row-level Smart Preset Status

The smart presets do not replace the current prior family. Instead, they rescale the current width of each row's distribution. In other words, the current table is the baseline and the preset multiplies that baseline width.

Width Rescaling Rule¶

Let f be the smart-preset spread factor. SAXSShell applies:

Very Strict: f = 0.40
Strict: f = 0.65
Proportional: f = 1.00
Lenient: f = 1.50
Very Lenient: f = 2.25

The exact update depends on the distribution family currently assigned to the row.

For a normal prior:

loc' = loc
scale' = f * scale

For a lognormal prior:

loc' = loc
scale' = scale
s' = f * s

For a uniform prior, SAXSShell preserves the current center and rescales the width:

c = loc + scale / 2
scale' = f * scale
loc' = c - scale' / 2

So the preset changes spread, not the intended center of the prior.
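
The width-rescaling rules above can be written as one small function. This is a sketch under stated assumptions: the family labels ("norm", "lognorm", "uniform"), the dict layout with loc, scale, and s keys, and the function name rescale_prior are illustrative and may not match SAXSShell's internal names.

```python
# Spread factors as listed in the width-rescaling rule.
SPREAD_FACTORS = {
    "Very Strict": 0.40,
    "Strict": 0.65,
    "Proportional": 1.00,
    "Lenient": 1.50,
    "Very Lenient": 2.25,
}


def rescale_prior(family, params, preset):
    """Return a rescaled copy of one prior row's distribution parameters."""
    f = SPREAD_FACTORS[preset]
    new = dict(params)
    if family == "norm":
        # loc is unchanged; only the standard deviation is scaled.
        new["scale"] = f * params["scale"]
    elif family == "lognorm":
        # loc and scale are unchanged; only the shape parameter s is scaled.
        new["s"] = f * params["s"]
    elif family == "uniform":
        # Keep the center fixed and widen or narrow the interval around it.
        center = params["loc"] + params["scale"] / 2.0
        new["scale"] = f * params["scale"]
        new["loc"] = center - new["scale"] / 2.0
    else:
        raise ValueError(f"unsupported prior family: {family}")
    return new


uniform = rescale_prior("uniform", {"loc": 2.0, "scale": 4.0}, "Lenient")
print(uniform)  # loc 1.0, scale 6.0: the center stays at 4.0
```

In the uniform example the interval [2, 6] becomes [1, 7], so the width grows by 1.5 while the center stays at 4.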

Apply To: All Parameters vs Selected Parameters¶

For the single-mode presets above, the Apply to control determines whether SAXSShell adjusts:

all parameter rows in the table
or only the currently selected parameter rows

Selected rows are expanded to their structure groups before the preset is applied. In the current implementation, a "structure group" means:

all rows sharing the same (structure, motif) pair, if those fields are set
otherwise only the specific row itself

That means if you apply Strict to one selected cluster parameter row, SAXSShell updates the whole structure/motif group together so its associated weight and related rows stay synchronized.

Mixed Size-Aware Presets¶

The two mixed presets are:

Strict Small / Lenient Large
Lenient Small / Strict Large

These always apply across all parameters because SAXSShell must rank the structures against each other before deciding which ones are "small" or "large".

First, SAXSShell builds one effective radius per weight parameter:

if a sphere radius row exists, it uses r_eff_wN
if an ellipsoid is represented by semiaxes, it converts that to an equivalent-volume radius

For ellipsoids, the equivalent radius is:

r_eq = (a * b * c)^(1/3)

where a, b, and c are the active semiaxes for that component.

Next, SAXSShell computes the median radius across all weight-linked structures:

r_med = median({r_i})
tol = max(r_med * 1e-9, 1e-9)

Each structure is then classified as:

small   if r_i < r_med - tol
large   if r_i > r_med + tol
neutral otherwise

Once the structures are labeled, the spread factor f is assigned by class.

For Strict Small / Lenient Large:

f_small = 0.65
f_large = 1.50
f_neutral = 1.00

For Lenient Small / Strict Large:

f_small = 1.50
f_large = 0.65
f_neutral = 1.00

That factor is then applied to every row belonging to that structure group using the same width-update rules described above.
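
The size classification above can be sketched as follows. The function names, the dict keyed by weight label, and the example radii are assumptions made for illustration; only the formulas and factor values come from the description above.

```python
import statistics


def equivalent_radius(a, b, c):
    """Equivalent-volume sphere radius for an ellipsoid with semiaxes a, b, c."""
    return (a * b * c) ** (1.0 / 3.0)


def classify_by_size(radii):
    """Label each structure small, large, or neutral relative to the median radius."""
    r_med = statistics.median(radii.values())
    tol = max(r_med * 1e-9, 1e-9)
    labels = {}
    for name, r in radii.items():
        if r < r_med - tol:
            labels[name] = "small"
        elif r > r_med + tol:
            labels[name] = "large"
        else:
            labels[name] = "neutral"
    return labels


MIXED_FACTORS = {
    "Strict Small / Lenient Large": {"small": 0.65, "large": 1.50, "neutral": 1.00},
    "Lenient Small / Strict Large": {"small": 1.50, "large": 0.65, "neutral": 1.00},
}

# Hypothetical structures: two spheres and one ellipsoid.
radii = {
    "w0": 3.0,
    "w1": equivalent_radius(4.0, 5.0, 6.0),
    "w2": 8.0,
}
labels = classify_by_size(radii)
preset = MIXED_FACTORS["Strict Small / Lenient Large"]
factors = {name: preset[label] for name, label in labels.items()}
print(labels)
print(factors)
```

With three structures the middle one sits exactly at the median, so it is labeled neutral and keeps f = 1.00.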

Smart Preset Status Column¶

Each row in the priors table also shows a Smart Preset Status column.

For the single-mode presets, the status usually matches the applied preset:

Very Strict
Strict
Proportional
Lenient
Very Lenient
Custom / Manual

For the mixed size-aware presets, the status is shown per structure as:

Strict
Lenient
Proportional

This is intentional: the mixed presets are global, but the status column tells you how each individual structure was classified and therefore how its priors were tightened or relaxed.

Manual Overrides¶

After a smart preset is applied, each structure can still be overridden independently:

choosing a new row-level Smart Preset Status reapplies that preset to the full structure/motif group for that row
editing the value, distribution family, or raw distribution parameters marks that structure group as Custom / Manual

So the smart presets are a starting point, not a lock.

Model Name¶

The run label stored in the runtime bundle and exported outputs. It helps you identify runs later.

Chains¶

How many DREAM chains to run.

Higher values usually improve exploration.
Higher values also increase runtime and output size.
SAXSShell may raise this automatically so there is at least one chain per varying parameter.

Iterations¶

How many sampler steps to run per chain.

If you use 4 chains and 10,000 iterations, that means 40,000 total raw samples before burn-in and filtering.

Burn-in (%)¶

What fraction of the start of each chain is discarded before any posterior summary is computed.

Example:

10,000 iterations per chain
burn-in = 20%
first 2,000 samples in each chain are discarded
last 8,000 remain available for filtering and summary statistics

History Thin¶

Controls how densely DREAM writes chain-history output to disk. A larger value keeps fewer saved history points and can reduce output size.

nSeedChains¶

The number of initial draws used to seed DREAM's proposal history.

Practical meaning:

too small can make early proposal adaptation weaker
larger values can help stabilize the sampler in harder problems

SAXSShell will raise this automatically if needed so it is at least 2 x Chains.
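
The two automatic adjustments described on this page (at least one chain per varying parameter, and nSeedChains at least 2 x Chains) can be sketched together. The function and argument names are hypothetical, and applying the seed rule to the already-raised chain count is an assumption about ordering, not confirmed SAXSShell behavior.

```python
def effective_chain_counts(chains, n_seed_chains, n_varying):
    """Return (chains, nSeedChains) after applying both minimum rules."""
    # At least one chain per varying parameter.
    chains_eff = max(chains, n_varying)
    # At least two seed draws per chain (assumed to use the raised chain count).
    seeds_eff = max(n_seed_chains, 2 * chains_eff)
    return chains_eff, seeds_eff


# 4 requested chains, 6 seed chains, 7 varying parameters.
print(effective_chain_counts(4, 6, 7))    # (7, 14)
# The Medium preset with 3 varying parameters is left unchanged.
print(effective_chain_counts(4, 40, 3))   # (4, 40)
```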

Crossover Burn-in¶

The number of early iterations during which DREAM fits its crossover probabilities.

Lay interpretation: this is part of the sampler's own adaptation period, not the same thing as the posterior burn-in percentage above.

Lambda¶

The DREAM proposal step-size scaling factor. This affects how far proposals can jump.

too small can make exploration slow
too large can lower acceptance and make the chain unstable

Most users should leave this at the default unless they have a specific reason to tune proposal behavior.

Zeta¶

A very small numerical jitter added to proposals to prevent degenerate moves. Most users should leave this unchanged.

Snooker¶

The probability of using DREAM's snooker update move. This is an advanced proposal option designed to help with difficult posterior geometries.

p_gamma_unity¶

The probability of using proposal scaling with gamma = 1. This is another advanced DREAM tuning parameter. Most users should leave the default alone.

Verbose Sampler Output¶

Whether DREAM writes frequent textual progress updates to the UI log.

Verbose Interval (s)¶

How often verbose output is allowed to update the UI. Smaller values mean more frequent log text.

Run Chains In Parallel¶

Whether DREAM is allowed to execute chains in parallel. This usually improves runtime on machines with available CPU resources.

Adapt Crossover¶

Whether DREAM adapts its crossover probabilities during the crossover-burn-in period. In most cases, leaving this enabled is appropriate.

Restart Previous Run¶

Continue an earlier DREAM run instead of starting from scratch.

History File¶

An optional existing chain-history .npy file to reuse when continuing or comparing runs.

Best-Fit Methods¶

The Best-fit method control changes how SAXSShell reduces the retained posterior to a single representative parameter set for the model plot and summary.

MAP¶

Maximum a Posteriori.

SAXSShell picks the retained sample with the highest log-posterior value after burn-in and posterior filtering.

Use this when you want the single most probable sample in the retained set.

Chain Mean MAP¶

SAXSShell finds the best retained sample within each chain and then averages those per-chain MAP parameter vectors.

Use this when you want a slightly more chain-balanced representative estimate instead of a single winning sample.

Median¶

SAXSShell computes the parameter-wise median across the retained posterior samples.

Use this when you want a robust central estimate that is less sensitive to one very sharp posterior peak.
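
The three best-fit methods can be compared side by side on a toy retained set. This is a sketch only: the method labels ("map", "chain_mean_map", "median"), the flat-list data layout, and the per-sample chain_ids list are assumptions chosen for illustration.

```python
import statistics


def best_fit(samples, log_post, chain_ids, method):
    """Reduce retained samples to one representative parameter vector.

    samples: list of parameter vectors; log_post: log-posterior per sample;
    chain_ids: index of the chain each sample came from.
    """
    if method == "map":
        best = max(range(len(samples)), key=lambda i: log_post[i])
        return list(samples[best])
    if method == "chain_mean_map":
        # Track the index of the best retained sample within each chain.
        best_per_chain = {}
        for i, chain in enumerate(chain_ids):
            current = best_per_chain.get(chain)
            if current is None or log_post[i] > log_post[current]:
                best_per_chain[chain] = i
        winners = [samples[i] for i in best_per_chain.values()]
        # Average the per-chain winners parameter by parameter.
        return [statistics.fmean(column) for column in zip(*winners)]
    if method == "median":
        return [statistics.median(column) for column in zip(*samples)]
    raise ValueError(f"unknown best-fit method: {method}")


samples = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [5.0, 50.0], [4.0, 40.0]]
log_post = [-3.0, -1.0, -2.0, -4.0, -5.0]
chain_ids = [0, 0, 1, 1, 1]

print(best_fit(samples, log_post, chain_ids, "map"))             # [2.0, 20.0]
print(best_fit(samples, log_post, chain_ids, "chain_mean_map"))  # [2.5, 25.0]
print(best_fit(samples, log_post, chain_ids, "median"))          # [3.0, 30.0]
```

The three answers differ even on this tiny example, which is why the choice of method matters for skewed or multimodal posteriors.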

Posterior Filtering¶

Posterior filtering controls which post-burn-in samples contribute to the summary statistics, best-fit selection, and violin plot data.

In SAXSShell, filtering is performed in this order:

apply burn-in to every chain
flatten the remaining samples across all chains
rank samples by log-posterior if the chosen filter requires ranking
keep the samples allowed by the active filter
compute MAP, medians, credible intervals, and fit-quality summaries from that retained set

All Post-burnin Samples¶

Keep every sample that remains after burn-in.

Use this when:

you want the broadest posterior view
you do not want extra screening beyond burn-in
you want violin plots and intervals to reflect the full post-burn-in sample cloud

Top % by Log-posterior¶

Sort all post-burn-in samples by log-posterior from highest to lowest, then keep only the top percentage.

SAXSShell keeps:

ceil(total_post_burnin_samples * top_percent / 100)
with a minimum of 1 retained sample

Use this when you want to focus on the highest-probability region without hard-coding an exact sample count.

Top N by Log-posterior¶

Sort all post-burn-in samples by log-posterior from highest to lowest, then keep only the best N samples.

SAXSShell clamps this to:

at least 1 sample
no more than the total number of post-burn-in samples

Use this when you want a fixed-size retained subset across runs.
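
The three filter modes can be sketched as a single function that returns the indices of the retained samples. The mode strings match the filter identifiers used on this page; the function name, the default arguments, and the flat list of log-posterior values are assumptions for illustration.

```python
import math


def filter_posterior(log_post, mode, top_percent=10.0, top_n=500):
    """Return indices of retained samples from flattened post-burn-in log-posteriors."""
    total = len(log_post)
    if mode == "all_post_burnin":
        return list(range(total))
    # Rank sample indices from highest to lowest log-posterior.
    ranked = sorted(range(total), key=lambda i: log_post[i], reverse=True)
    if mode == "top_percent_logp":
        keep = max(1, math.ceil(total * top_percent / 100.0))
    elif mode == "top_n_logp":
        # Clamp to at least 1 and at most the number of available samples.
        keep = min(max(1, top_n), total)
    else:
        raise ValueError(f"unknown filter mode: {mode}")
    return ranked[:keep]


log_post = [-5.0, -1.0, -3.0, -2.0, -4.0, -6.0, -0.5]

print(filter_posterior(log_post, "all_post_burnin"))                   # all 7 indices
print(filter_posterior(log_post, "top_percent_logp", top_percent=30))  # [6, 1, 3]
print(filter_posterior(log_post, "top_n_logp", top_n=2))               # [6, 1]
```

With 7 samples and Top % = 30, the product is 2.1, so the ceiling rule keeps 3 samples.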

Automatic Posterior Filter Assessment¶

After a run finishes, SAXSShell evaluates all three filter modes. The Auto-select best filter after run option controls whether the resulting recommendation is applied:

all_post_burnin
top_percent_logp
top_n_logp

It evaluates them using:

the current best-fit method
the current default Top %
the current default Top N

SAXSShell then recommends the filter with the best fit quality, comparing these criteria in order and using each later one only to break ties:

lowest RMSE
lowest mean absolute residual
highest R²

If auto-select is on, that recommendation is applied automatically. If it is off, the recommendation is reported but not applied.
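
The tie-break order can be expressed as a tuple comparison. This is a sketch, assuming the fit-quality metrics for each filter mode have already been computed; the metric key names, the example numbers, and the exact-equality treatment of ties are illustrative assumptions.

```python
def recommend_filter(metrics):
    """Pick the filter mode with the lowest RMSE, then lowest mean absolute
    residual, then highest R-squared."""
    return min(
        metrics,
        key=lambda mode: (
            metrics[mode]["rmse"],
            metrics[mode]["mean_abs_residual"],
            -metrics[mode]["r_squared"],  # negate so that higher is better
        ),
    )


# Hypothetical metrics: two modes tie on RMSE.
metrics = {
    "all_post_burnin": {"rmse": 0.020, "mean_abs_residual": 0.015, "r_squared": 0.990},
    "top_percent_logp": {"rmse": 0.018, "mean_abs_residual": 0.014, "r_squared": 0.992},
    "top_n_logp": {"rmse": 0.018, "mean_abs_residual": 0.013, "r_squared": 0.991},
}

print(recommend_filter(metrics))  # top_n_logp
```

Here top_percent_logp and top_n_logp tie on RMSE, so the lower mean absolute residual decides the recommendation before R² is consulted.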

Violin Plot Sample Sources¶

Filtered Posterior¶

Use the full set of retained samples after burn-in and posterior filtering.

This is the best choice when you want the violin plot to reflect the full screened posterior.

MAP Chain Only¶

Use only the retained samples from the single chain that contains the global MAP point.

This is not a posterior filter. It is only a violin-plot data source.

Use it when you want to inspect the local behavior of the chain that produced the winning MAP sample.
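
Selecting the MAP Chain Only source amounts to finding the chain that holds the global MAP sample and keeping only that chain's retained samples. The sketch below assumes a flat list of retained samples with a parallel chain_ids list; both the layout and the function name are hypothetical.

```python
def map_chain_samples(samples, log_post, chain_ids):
    """Return retained samples from the chain that contains the global MAP point."""
    best = max(range(len(log_post)), key=lambda i: log_post[i])
    map_chain = chain_ids[best]
    return [s for s, c in zip(samples, chain_ids) if c == map_chain]


samples = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [5.0, 50.0]]
log_post = [-3.0, -1.0, -2.0, -4.0]
chain_ids = [0, 0, 1, 1]

# The global MAP is sample 1, which belongs to chain 0.
print(map_chain_samples(samples, log_post, chain_ids))  # [[1.0, 10.0], [2.0, 20.0]]
```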

Credible Intervals¶

The interval controls set the percentiles used for reported posterior bars and summary statistics.

The defaults are:

low = 16%
high = 84%

That is a common choice because the 16th and 84th percentiles bracket the central 68% of the samples, which corresponds to a ±1 standard deviation interval for a Gaussian posterior. The posterior does not need to be Gaussian for the percentiles themselves to remain meaningful.
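
A percentile-based interval for one parameter can be computed as in the sketch below. The linear-interpolation rule and the function names are assumptions; SAXSShell may use a different percentile convention.

```python
def percentile(values, q):
    """Percentile with linear interpolation between neighbors, q in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one sample")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)                         # index at or below the target position
    hi = min(lo + 1, len(ordered) - 1)    # next index, clamped to the last sample
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def credible_interval(values, low=16.0, high=84.0):
    """Return the (low, high) percentile bounds for one parameter's samples."""
    return percentile(values, low), percentile(values, high)


# 101 evenly spaced samples from 0 to 100, so percentiles equal their rank.
samples = [float(i) for i in range(101)]
low, high = credible_interval(samples)
print(low, high)  # 16.0 84.0
```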

Practical Advice¶

Start with Medium unless you already know your model is easy or very difficult.
If the violin plot is still very broad, try more iterations before assuming the model is underdetermined.
If the retained sample count becomes tiny under Top % or Top N, the summary can look overly sharp.
If MAP and Median disagree strongly, that often means the posterior is skewed, multimodal, or both.
If chains disagree strongly, increase iterations before over-tuning advanced sampler parameters.
Rewrite the runtime bundle after changing priors, vary flags, geometry state, or template settings.