1. Data

πŸ“ Data Management

Import datasets (CSV, Excel), view summaries, and clean data.

2. Table 1

πŸ“‹ Table 1 & Matching

Generate baseline tables and perform propensity score matching.

3. General Stats

πŸ“Š General Statistics

Diagnostic tests, correlation, and agreement (Kappa, Bland-Altman).

4. Modeling

πŸ”¬ Advanced Modeling

Regression (Linear, Logistic, Firth), Survival, Advanced Inference.

5. Clinical

πŸ₯ Clinical Tools

Sample Size Calculator, Causal Inference methods.

6. Settings

βš™οΈ Settings

Configure application preferences and defaults.

Categorical Mapping: Format as `0=Control`.
πŸ” Missing Data
🧩 Impute Missing Data
πŸ“ˆ Outlier Handling

πŸ—ΊοΈ Missing Data Pattern
πŸ“Š Multiple Imputation with Rubin's Rules β€” Generate m imputed datasets and pool estimates
βš™οΈ MI Settings


πŸ’‘ Tip: Use m β‰₯ 5 imputations. More is better for high % missing data.
πŸ“ˆ MI Status

πŸ” Imputation Diagnostics
πŸ“Š Assumption Check
Transformation Preview

                                            

πŸ› οΈ Variable Config

  • Metadata: Define variable types (Categorical vs Continuous).
  • Missing Data: Standardized coding (e.g., -99, NaN) ensures accurate analysis.

🧹 Cleaning & Imputation

  • Mean/Median: Simple, fast, but reduces variance. Use for low missingness (<5%).
  • KNN (K-Nearest Neighbors): Imputes based on similar rows. Preserves local structure better.
  • MICE (Multivariate Imputation): Models each variable using others. Best for complex datasets with random missingness (MAR).
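
Mean/median imputation is simple enough to sketch directly. A minimal numpy version (the helper name `impute` and the sample values are invented; KNN and MICE need a dedicated library):

```python
import numpy as np

def impute(x, strategy="mean"):
    """Fill NaNs with the column mean or median (simple imputation;
    suitable only for low missingness, as noted above)."""
    x = np.asarray(x, dtype=float).copy()
    fill = np.nanmean(x) if strategy == "mean" else np.nanmedian(x)
    x[np.isnan(x)] = fill
    return x

ages = [34.0, np.nan, 40.0, 52.0]
mean_filled = impute(ages, "mean")      # NaN replaced by 42.0
median_filled = impute(ages, "median")  # NaN replaced by 40.0
```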

πŸ“ˆ Outlier Handling

  • IQR (Interquartile Range): Robust method. Flags points < Q1-1.5IQR or > Q3+1.5IQR.
  • Z-Score: Parametric. Flags points > 3 SD from mean. Assumes normality.
  • Actions:
    • Winsorize: Cap values at the thresholds (preserves sample size).
    • Remove: Delete values (creates missingness).
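
The IQR method and both actions above can be sketched with numpy (the helper name `iqr_fences` and the data are invented for illustration):

```python
import numpy as np

def iqr_fences(x, k=1.5):
    """Lower/upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = np.percentile(x, [25, 75])
    return q1 - k * (q3 - q1), q3 + k * (q3 - q1)

data = np.array([10.0, 12, 11, 13, 12, 11, 95])   # 95 is an obvious outlier
lo, hi = iqr_fences(data)
flagged = (data < lo) | (data > hi)     # flag points outside the fences
winsorized = np.clip(data, lo, hi)      # Winsorize: cap (preserves n)
cleaned = data[~flagged]                # Remove: delete (creates missingness)
```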

⚑ Transformation

  • Log: Reduces right-skewness (e.g., income, CRP levels). Handles x > 0.
  • Sqrt: Moderate skew reduction. Handles x >= 0.
  • Z-Score: Standardizes to Mean=0, SD=1. Essential for algorithms sensitive to scale (e.g., KNN, Clustering).
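
All three transforms are one-liners in numpy; a minimal sketch (the sample values are invented):

```python
import numpy as np

x = np.array([1.0, 10.0, 100.0, 1000.0])   # right-skewed (e.g., CRP)

log_x = np.log(x)                 # requires x > 0; compresses the right tail
sqrt_x = np.sqrt(x)               # requires x >= 0; milder compression
z_x = (x - x.mean()) / x.std()    # mean 0, SD 1 (population SD)
```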
πŸ“„ Data Preview
πŸ“Š Table 1 Options

Configuration

Variables


Table 1 Results


βš–οΈ PSM Configuration
πŸ’‘ Need ATE directly? Use Clinical Tools β†’ Causal Methods for Inverse Probability Weighting (keeps all data, no matching).

1. Select Variables

2. Quick Presets

βš™οΈ 3. Matching Settings


Matching Results


βœ… Matched Data Actions
Export Options:
Filter & Display:
Compare Variable:
Reset:
πŸ“Š Summary Statistics
πŸ” Data Preview
πŸ“ˆ Statistics by Group

πŸ”’ Sample Size & Power Calculator


Means (T-test) Setup

Group 1

Group 2


Results
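
For the two-means setup above, the required n per group follows from the usual normal-approximation formula. A minimal sketch (the helper name `n_per_group` and the example inputs are invented):

```python
from math import ceil
from scipy.stats import norm

def n_per_group(mean1, mean2, sd, alpha=0.05, power=0.80):
    """n per group for a two-sided comparison of two means
    (normal approximation; equal SDs and group sizes assumed)."""
    z_a = norm.ppf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_b = norm.ppf(power)           # 0.84 for 80% power
    delta = abs(mean1 - mean2)
    return ceil(2 * (sd * (z_a + z_b) / delta) ** 2)

n = n_per_group(mean1=10.0, mean2=12.0, sd=4.0)   # effect size d = 0.5
```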


Proportions Setup

Expected Proportions


Results


Survival Analysis Setup

Input Mode

Parameters


Results


Note: This calculates required number of EVENTS, not total subjects.

Correlation Setup

Parameters


Results


πŸ“š Reference & Interpretation Guide

πŸ’‘ Tip: This section provides detailed explanations and interpretation rules for Table 1 and Propensity Score Matching.

🚦 Quick Decision Guide

Question | Recommended Action | Goal
Do my groups differ at baseline? | Generate Table 1 (Subtab 1) | Check for significant p-values (< 0.05).
My groups are imbalanced. Can I fix? | Run PSM (Subtab 2) | Create a "synthetic" RCT where groups are balanced.
Did the matching work? | Check SMD (Subtab 2 - Results) | Look for SMD < 0.1 in the Love Plot.
What do I do with matched data? | Export / Use Matched Data | Go to Subtab 3 to export, or select "✅ Matched Data" in other analysis tabs.

πŸ“Š Baseline Characteristics (Table 1)

Concept: A standard table in medical research that compares the demographic and clinical characteristics of two or more groups (e.g., Treatment vs Placebo).

Interpretation:

  • P-value: Tests if there is a statistically significant difference between groups.
  • p < 0.05: Significant difference (Imbalance) ⚠️. This suggests confounding may be present.
  • p β‰₯ 0.05: No significant difference (Balanced) βœ….

Reporting Standards:

  • Numeric Data (Normal): Report Mean Β± SD. (e.g., Age: 45.2 Β± 10.1)
  • Numeric Data (Skewed): Report Median (IQR). (e.g., LOS: 5 (3-10))
  • Categorical Data: Report Count (%). (e.g., Male: 50 (45%))
βš–οΈ Propensity Score Matching (PSM)

Concept: A statistical technique used in observational studies to reduce selection bias. It pairs patients in the treated group with patients in the control group who have similar "propensity scores" (probability of receiving treatment).

Key Metric: Standardized Mean Difference (SMD):

  • The gold standard for checking balance after matching.
  • SMD < 0.1: Excellent Balance βœ… (Groups are comparable).
  • SMD 0.1 - 0.2: Acceptable.
  • SMD > 0.2: Imbalanced ❌.
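
For a continuous covariate, the SMD is the mean difference divided by the pooled SD. A minimal numpy sketch (the helper name `smd` and the sample values are invented):

```python
import numpy as np

def smd(x_treated, x_control):
    """Standardized mean difference using the pooled SD of both groups."""
    x1 = np.asarray(x_treated, dtype=float)
    x0 = np.asarray(x_control, dtype=float)
    pooled_sd = np.sqrt((x1.var(ddof=1) + x0.var(ddof=1)) / 2)
    return (x1.mean() - x0.mean()) / pooled_sd

balanced = smd([60, 62, 64, 66], [60, 62, 64, 66])     # identical groups -> 0
imbalanced = smd([70, 72, 74, 76], [60, 62, 64, 66])   # clearly > 0.2
```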

Caliper (Tolerance):

  • Determines how "close" a match must be.
  • Stricter (0.1Γ—SD): Better balance, but you might lose more patients (fewer matches).
  • Looser (0.5Γ—SD): More matches, but balance might be worse.

πŸ“ Common Workflow

  1. Check Original Data: Run Table 1 on the "Original Data". Note any variables with p < 0.05.
  2. Match: Go to Subtab 2, select Treatment, Outcome, and all confounding variables (especially those with p < 0.05).
  3. Verify: After matching, check the Love Plot. Ensure all dots (Matched) are within the < 0.1 zone.
  4. Re-check Table 1: Go back to Subtab 1, switch the dataset selector to "βœ… Matched Data", and generate Table 1 again. P-values should now be non-significant (or SMDs low).


ROC Curve Analysis



Chi-Square & Risk Analysis (2x2 Contingency Table)


Descriptive Statistics


Decision Curve Analysis


πŸ“š Reference & Interpretation Guide

πŸ’‘ Tip: This section provides detailed explanations and interpretation rules for all the diagnostic tests.

🚦 Quick Decision Guide

Question | Recommended Test | Example
My test is a score (e.g., 0-100) and I want to see how well it predicts a disease (Yes/No)? | ROC Curve & AUC | Risk Score vs Diabetes
I want to find the best cut-off value for my test score? | ROC Curve (Youden Index) | Finding optimal BP for Hypertension
Are these two groups (e.g., Treatment vs Control) different in outcome (Cured vs Not Cured)? | Chi-Square | Drug A vs Placebo on Recovery
Is my model clinically useful at a specific threshold? | Decision Curve (DCA) | Should we biopsy everyone with PSA > 4?
(For Agreement/Kappa, see "Agreement" tab)

βš–οΈ Interpretation Guidelines

ROC Curve & AUC

  • Single Test: Detailed analysis of one diagnostic test with threshold optimization.
  • Compare Tests: Uses Paired DeLong's Test to statistically compare two ROC curves properly.
  • AUC > 0.9: Excellent discrimination
  • AUC 0.8-0.9: Good discrimination
  • AUC 0.7-0.8: Fair discrimination
  • AUC 0.5-0.7: Poor discrimination
  • AUC = 0.5: No discrimination (random chance)
  • Youden J Index: Sensitivity + Specificity - 1 (higher is better, max = 1)
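
Finding the cut-off that maximizes Youden's J can be sketched by scanning candidate thresholds (the helper name `youden_cutoff` and the toy data are invented; "positive" means score ≥ threshold):

```python
import numpy as np

def youden_cutoff(scores, labels):
    """Return (best_threshold, best_J), J = sensitivity + specificity - 1."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_t, best_j = None, -1.0
    for t in np.unique(scores):
        pred = scores >= t
        sens = (pred & (labels == 1)).sum() / (labels == 1).sum()
        spec = (~pred & (labels == 0)).sum() / (labels == 0).sum()
        j = sens + spec - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
labels = [0,   0,   0,   1,   1,   1]   # perfectly separated toy data
t, j = youden_cutoff(scores, labels)
```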

Comparison Interpretation (DeLong)

  • P-value < 0.05: Significant difference between the two ROC curves.
  • Z-score: The size of the AUC difference relative to its standard error.

Chi-Square Test

  • P < 0.05: Statistically significant association
  • Odds Ratio (OR): If 95% CI doesn't include 1.0, it's significant
  • Risk Ratio (RR): Similar interpretation as OR
  • Use Fisher's Exact Test when expected counts < 5
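
Both tests are available in SciPy. A minimal sketch (the 2×2 counts are invented for illustration):

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# 2x2 table: rows = Drug A / Placebo, cols = Recovered / Not recovered
table = np.array([[30, 10],
                  [18, 22]])

chi2, p, dof, expected = chi2_contingency(table)  # Yates-corrected by default
odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])

# Fisher's exact test: preferred when any expected count is < 5
fisher_or, fisher_p = fisher_exact(table)
```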

Decision Curve Analysis (DCA)

  • Net Benefit: The benefit of treating true positives minus the harm of treating false positives.
  • Interpretation: The model is useful if the model's curve (Red) is higher than both:
    • Treat All (Gray line): Treating everyone assuming they have the disease.
    • Treat None (Horizontal line at 0): Treating no one.
  • Threshold Probability: The patient's/doctor's preference (e.g., how worried are they about missing a case vs unnecessary treatment?).
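
The net-benefit definition above can be sketched directly: NB = TP/n − (FP/n) × pt/(1 − pt). A minimal numpy version (the helper name `net_benefit` and the toy predictions are invented):

```python
import numpy as np

def net_benefit(pred_prob, labels, pt):
    """Net benefit of 'treat if predicted probability >= pt'."""
    pred_prob = np.asarray(pred_prob, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n = len(labels)
    treat = pred_prob >= pt
    tp = (treat & (labels == 1)).sum()
    fp = (treat & (labels == 0)).sum()
    return tp / n - fp / n * pt / (1 - pt)

probs = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
truth = [1,   1,   0,   1,   0,   0]
nb_model = net_benefit(probs, truth, pt=0.5)
nb_treat_all = net_benefit([1.0] * 6, truth, pt=0.5)   # the 'Treat All' line
```

The model is useful at pt = 0.5 here because its curve sits above both Treat All and Treat None (net benefit 0).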

Descriptive Statistics

  • Mean: Average value (affected by outliers)
  • Median: Middle value (robust to outliers)
  • SD (Standard Deviation): Spread of data around mean
  • Q1/Q3: 25th and 75th percentiles


πŸ“ˆ Continuous Correlation Analysis
πŸ“Š Correlation Matrix & Heatmap
πŸ“š Reference & Interpretation Guide
πŸ“ˆ Correlation (Relationship)

Concept: Measures the strength and direction of the relationship between two continuous variables.

1. Pearson (r):

  • Best for: Linear relationships (straight line), normally distributed data.
  • Sensitive to: Outliers.
  • Returns: R-squared (RΒ²) = proportion of variance explained

2. Spearman (rho) & Kendall (tau):

  • Best for: Monotonic relationships, non-normal data, or ranks.
  • Robust to: Outliers.
  • Kendall's Tau is often preferred for small datasets with many tied ranks.
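
The difference between the three coefficients shows up on a perfectly monotonic but non-linear relationship. A minimal SciPy sketch (the data are invented):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau

x = np.arange(1.0, 7.0)   # 1..6
y = x ** 3                # strictly increasing, but curved

r, _ = pearsonr(x, y)       # < 1: penalized for non-linearity
rho, _ = spearmanr(x, y)    # 1: the ranks agree perfectly
tau, _ = kendalltau(x, y)   # 1: every pair is concordant
```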

Interpretation of Coefficient (r, rho, or tau):

  • +1.0: Perfect Positive (As X goes up, Y goes up).
  • -1.0: Perfect Negative (As X goes up, Y goes down).
  • 0.0: No relationship.

Strength Guidelines:

  • 0.9 - 1.0: Very Strong πŸ”₯
  • 0.7 - 0.9: Strong πŸ“ˆ
  • 0.5 - 0.7: Moderate πŸ“Š
  • 0.3 - 0.5: Weak πŸ“‰
  • < 0.3: Very Weak/Negligible

Confidence Intervals (95% CI):

  • Shows the range where the true correlation likely falls
  • Wider CI = less precise estimate (usually with small samples)
πŸ’‘ Common Questions

Q: What is R-squared (RΒ²)?

  • A: RΒ² tells you the proportion of variance in Y that is explained by X. For example, RΒ² = 0.64 means 64% of the variation in Y is explained by X.

Q: Why use ICC instead of Pearson for reliability?

  • A: Pearson only measures linearity. If Rater A always gives exactly 10 points higher than Rater B, Pearson = 1.0 but they don't agree! ICC accounts for this.

Q: What if p-value is significant but r is low (0.1)?

  • A: P-value means it's likely not zero. With large samples, tiny correlations can be "significant". Focus on r-value magnitude for clinical relevance.

Q: How to interpret confidence intervals?

  • A: If 95% CI includes 0, the correlation is not statistically significant. Narrow CI = more precise estimate, Wide CI = less precise (need more data).

Q: How many variables do I need for ICC?

  • A: At least 2 (to compare two raters/methods). More raters = more reliable ICC.


Categorical Agreement (Kappa)

Bland-Altman Analysis (Continuous Data Comparison)

Intraclass Correlation Coefficient (ICC)

πŸ“š Agreement & Reliability Reference Guide

🀝 Cohen's Kappa & Fleiss' Kappa

  • Cohen's Kappa: For agreement between two raters.
  • Fleiss' Kappa: For agreement between three or more raters.

Landis–Koch (1977) scale:

  • 0.81–1.00: Almost perfect agreement ✅
  • 0.61–0.80: Substantial agreement
  • 0.41–0.60: Moderate agreement
  • 0.21–0.40: Fair agreement ⚠️
  • ≤ 0.20: Slight/Poor agreement ❌
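
Cohen's kappa is (observed agreement − chance agreement) / (1 − chance agreement). A minimal numpy sketch for two raters (the helper name `cohen_kappa` and the ratings are invented):

```python
import numpy as np

def cohen_kappa(rater1, rater2):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    r1, r2 = np.asarray(rater1), np.asarray(rater2)
    cats = np.union1d(r1, r2)
    p_o = (r1 == r2).mean()                                       # observed
    p_e = sum((r1 == c).mean() * (r2 == c).mean() for c in cats)  # by chance
    return (p_o - p_e) / (1 - p_e)

r1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 1]
r2 = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
kappa = cohen_kappa(r1, r2)   # 'Moderate' on the scale above
```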

πŸ“‰ Bland-Altman Plot

Used for continuous data to compare two measurement methods.

  • Bias (Mean Difference): Systematic difference.
  • Limits of Agreement (LoA): Interval containing 95% of differences.
  • Confidence Intervals (Shaded): Shows the precision of the Bias and LoA estimates.
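
The bias and limits of agreement reduce to the mean and SD of the paired differences. A minimal numpy sketch (the helper name `bland_altman` and the measurements are invented):

```python
import numpy as np

def bland_altman(method_a, method_b):
    """Bias and 95% limits of agreement for two measurement methods."""
    diff = np.asarray(method_a, dtype=float) - np.asarray(method_b, dtype=float)
    bias = diff.mean()                      # systematic difference
    sd = diff.std(ddof=1)
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)
    return bias, loa

a = [100, 102, 98, 101, 99]
b = [ 98, 100, 97, 100, 95]
bias, (lower, upper) = bland_altman(a, b)
```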

πŸ” Intraclass Correlation (ICC)

Measures reliability/consistency.

ICC Forms (Shrout & Fleiss, 1979):

  • ICC1: One-way random effects (raters selected at random).
  • ICC2: Two-way random effects (raters and subjects random).
  • ICC3: Two-way mixed effects (fixed raters).

Interpretation (Cicchetti, 1994):

  • > 0.75: Excellent 🌟
  • 0.60 – 0.75: Good
  • 0.40 – 0.60: Fair ⚠️
  • < 0.40: Poor ❌


πŸ“ˆ Analysis Options

Variable Selection

Method & Settings

Advanced Adjustments

Exclude Variables
πŸ”— Interaction Pairs:

Analysis Results


Binary Logistic Subgroup Analysis - Heterogeneity

Variables

Stratification & Adjustment

βš™οΈ Minimum Counts

Subgroup Analysis Results


πŸ“Š Poisson Analysis Options

Variable Selection

βš™οΈ Advanced Settings
Exclude Variables

Model Refinement

πŸ”— Interaction Pairs:

Poisson Results


πŸ“‰ Negative Binomial Analysis Options

Variable Selection

βš™οΈ Advanced Settings
Exclude Variables

Model Refinement

πŸ”— Interaction Pairs:

Negative Binomial Results


πŸ“ˆ GLM Options

Variable Selection

Distribution & Link:

Predictors


GLM Results


πŸ“ Linear Regression Options

Variable Selection

πŸ’‘ Leave predictors empty to auto-include all numeric variables

Method & Settings

βš™οΈ Stepwise Selection
βš™οΈ Bootstrap CI

Ad Hoc Exclusions

Exclude Variables

πŸ”„ GEE & LMM Analysis

Variable Selection

Model Settings

Adjustments (Covariates)

πŸ“š Core Regression Reference Guide

1. πŸ“ˆ Binary Outcomes (Logistic Regression)

Use For: Predicting Yes/No outcomes (e.g., Disease vs Healthy, Died vs Survived).

Interpretation:

  • Odds Ratio (OR):
    • OR > 1: Risk factor (Increases likelihood of event).
    • OR < 1: Protective factor (Decreases likelihood).
    • OR = 1: No association.

Methods:

  • Standard (MLE): Best for large datasets. Fails with "Perfect Separation".
  • Firth's Penalized: Use for small samples or rare events. Fixes perfect separation.
  • Auto: Automatically switches to Firth if separation is detected.

2. πŸ“‰ Continuous Outcomes (Linear Regression)

Use For: Predicting numeric values (e.g., Blood Pressure, Length of Stay, Cost).

Interpretation:

  • Beta Coefficient (Ξ²):
    • Ξ² > 0: Positive relationship (As X increases, Y increases).
    • Ξ² < 0: Negative relationship (As X increases, Y decreases).
  • R-squared (RΒ²): Percentage of variance explained by the model (>0.7 is usually strong).

Assumptions Checking:

  • Linearity: Residuals vs Fitted plot should be flat.
  • Normality: Q-Q plot points should follow the diagonal line.
  • Homoscedasticity: Scale-Location plot should have constant spread.

3. πŸ”’ Count Outcomes (Poisson / Neg. Binomial)

Use For: Count data (e.g., Number of exacerbations, Days in hospital).

Model Choice:

  • Poisson: Variance = Mean. Good for simple counts.
  • Negative Binomial: Variance > Mean (Overdispersion). Use if Poisson fails.
  • Zero-Inflated: If there are excess zeros (e.g., many patients with 0 visits).
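
A quick variance-to-mean check helps with this choice. A minimal sketch (the count data are invented):

```python
import numpy as np

counts = np.array([0, 1, 0, 2, 1, 0, 8, 0, 1, 7])   # e.g., exacerbations/year

mean, var = counts.mean(), counts.var(ddof=1)
dispersion = var / mean   # ~1 suggests Poisson; >> 1 suggests Negative Binomial
```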

Interpretation:

  • Incidence Rate Ratio (IRR): Similar to OR.
    • IRR = 1.5: Count increases by 50% for every 1-unit increase in X.

4. πŸ”„ Repeated Measures (GEE / LMM)

Use For: Clustered data (e.g., Multiple visits per patient, Eyes per patient).

Model Choice:

  • GEE (Generalized Estimating Equations): Population-averaged effects. Robust to correlation structure errors. Best for binary/count outcomes.
  • LMM (Linear Mixed Models): Subject-specific effects. Handles missing data better. Best for continuous outcomes.

Correlation Structures:

  • Exchangeable: All time points equally correlated.
  • AR(1): Correlation decays over time.
  • Unstructured: No assumption (requires more data).

5. πŸ”› Subgroup Analysis

Use For: Checking if treatment effect differs across groups (Heterogeneity).

Interpretation:

  • P-interaction < 0.05: Significant difference in effect. Report results separately for each group.
  • P-interaction β‰₯ 0.05: Consistent effect. Report the overall main effect.


Kaplan-Meier & Nelson-Aalen Curves

Variable Selection

Plot Settings


Analysis Results


Landmark Analysis for Late Endpoints

ℹ️ Principle: Landmark analysis is useful when the treatment effect is delayed (e.g., immuno-oncology) or violates proportional hazards initially.

How it works:

  1. Select a "Landmark Time" (t).
  2. Patients who died or were censored before t are excluded.
  3. Analysis is performed only on patients who survived to time t, resetting their "start" time to t.

Settings

Landmark Time (t)

Landmark Analysis Results


Cox Proportional Hazards Regression

Model Configuration

Select Covariates (Predictors)

Cox Model Results


Cox Subgroup Analysis - Treatment Heterogeneity

Variables

Stratification & Adjustment

βš™οΈ Minimum Counts

Subgroup Analysis Results


Restricted Cubic Spline (RCS) Analysis

ℹ️ Purpose: Visualize non-linear relationships between a continuous variable and the hazard ratio (HR). Useful when the risk does not increase linearly (e.g., U-shaped or J-shaped curves).

Variables

Settings


RCS Results


Select data format and variables to run the model.

πŸ“„ Data Format Selection
Select your data structure below. (See 'Reference' tab for detailed format specifications)

βš™οΈ Column Configuration

Columns for Time-Varying Cox analysis:

Identification & Time
Covariates

Time-Varying Covariates: Columns that can change over time within intervals (e.g., treatment status, lab values, symptoms)

Static Covariates: Columns constant within each patient (e.g., age at baseline, sex, initial diagnosis)



Advanced settings, risk intervals, and data inspection.

πŸ“Š Risk Intervals Definition (Wide Format Only)

Risk intervals divide the follow-up period into time windows. For example: [0, 1m, 3m, 6m, 12m] creates 4 intervals.

Last-observed values of time-varying covariates are "carried forward" within intervals.

Preview: Intervals will appear here


πŸ”§ Model Configuration
  • 0.0: Standard Cox (no regularization)
  • 0.1-1.0: Small penalty (useful if unstable)
  • >1.0: Strong regularization (shrinks coefficients)
πŸ›₯️ Data Preview (First 50 Rows)
Data Structure Summary:

Note: Only first 50 rows displayed. Full dataset is used for analysis.


ℹ️ Information & FAQ

Time-varying covariates are variables that can change their values during follow-up.

Examples:

  • Drug dosage (adjusted over time)
  • Lab values (measured periodically)
  • Treatment status (switched during study)

Unlike standard Cox regression (assumes constant covariates), TVC Cox allows modeling of dynamic effects.

Aspect | Long Format | Wide Format
Rows/Patient | Multiple | One
Structure | Interval-based | Measurement-based
Best for | Complex follow-up schedules | Regular measurements
Preparation | Ready to use | Requires transformation
Example | Medical visit notes | Quarterly labs

Three methods to define intervals:

  1. Auto-detect (Recommended):

    • Extracts time points from column names
    • E.g., 'tvc_3m' β†’ extracts time=3
    • Creates intervals: [0-3m], [3m-6m], [6m-12m], etc.
  2. Quantile-based:

    • Divides data into roughly equal event frequencies
    • Ensures sufficient data per interval
  3. Manual:

    • User specifies exact time points
    • E.g., "0, 1, 3, 6, 12, 24"
    • Maximum control but requires planning

Input (Wide Format):

ID | time | event | tvc_0m | tvc_3m | tvc_6m
1  | 12   | 1     | 100    | 110    | 120
2  | 6    | 0     | 95     | 98     | NA

Output (Long Format):

ID | start | stop | event | tvc
1  | 0     | 3    | 0     | 100
1  | 3     | 6    | 0     | 110
1  | 6     | 12   | 1     | 120
2  | 0     | 3    | 0     | 95
2  | 3     | 6    | 0     | 98

Key Points:

  • Event=1 only in final interval
  • Covariate values carried forward (last observed)
  • Each patient split into multiple rows
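
The wide-to-long conversion above can be sketched in pandas. This reproduces the example table (the helper name `wide_to_long` and the fixed interval boundaries are assumptions for illustration, not the app's actual implementation):

```python
import numpy as np
import pandas as pd

wide = pd.DataFrame({
    "ID": [1, 2], "time": [12, 6], "event": [1, 0],
    "tvc_0m": [100, 95], "tvc_3m": [110, 98], "tvc_6m": [120, np.nan],
})

def wide_to_long(df, boundaries=(0, 3, 6, 12)):
    """Split each patient into per-interval rows; carry the last observed
    tvc value forward and put event=1 only in the final interval."""
    rows = []
    for _, r in df.iterrows():
        tvc = r["tvc_0m"]
        for start, stop in zip(boundaries[:-1], boundaries[1:]):
            if start >= r["time"]:
                break                       # follow-up already ended
            col = f"tvc_{start}m"
            if col in df.columns and pd.notna(r[col]):
                tvc = r[col]                # last observation carried forward
            stop = min(stop, r["time"])
            is_last = stop == r["time"]
            rows.append({"ID": r["ID"], "start": start, "stop": stop,
                         "event": r["event"] if is_last else 0, "tvc": tvc})
    return pd.DataFrame(rows)

long_df = wide_to_long(wide)   # 5 rows, matching the output table above
```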

⏱️ Time-Varying Cox Reference

1. What is Time-Varying Cox?

Standard Cox regression assumes predictors (like treatment) are constant over time. Time-Varying Cox allows values to change.


2. Data Formats

A. Long Format (Recommended)

  • Structure: Multiple rows per patient. Each row represents a specific time interval.
  • Columns: patient_id, start_time, stop_time, event, and [covariates].
  • Status: Ready for direct analysis.
  • Example: Patient A (0-6 months, Drug=0), Patient A (6-12 months, Drug=1).

B. Wide Format

  • Structure: One row per patient with multiple columns for covariate measurements at different times.
  • Columns: patient_id, followup_time, event, tvc_baseline, tvc_3m, tvc_6m, etc.
  • Status: Requires transformation (we convert it to Long Format using "Risk Intervals").

3. Interpretation

  • Hazard Ratio (HR): Represents the instantaneous risk.
  • HR = 1.5: At any given moment, having the condition (or 1 unit higher value) increases the risk of the event by 50% compared to not having it at that same moment.

4. Assumptions

  • Proportional Hazards: Still applies! The effect of the variable (HR) is assumed constant over time, even if the variable's value changes.

Analysis Results


πŸ“š Quick Reference: Survival Analysis

🎲 When to Use What:

Method | Purpose | Output
KM Curves | Visualize time-to-event by group | Survival %, median, p-value
Nelson-Aalen | Cumulative hazard over time | H(t) curve, risk accumulation
Landmark | Late/surrogate endpoints | Filtered KM, immortal time removed
Cox | Multiple predictors of survival | HR, CI, p-value per variable + forest plot
Subgroup Analysis | Treatment effect heterogeneity | HR by subgroup, interaction test
Time-Varying Cox | Time-dependent covariates | Dynamic HR, interval-based risk

πŸ“ Detailed Interpretation:

1. Hazard Ratios (HR)

The main measure of effect in survival analysis.

  • HR = 1: No difference in risk between groups.
  • HR > 1: Increased risk of event (e.g., HR 1.5 = 50% higher risk). Hazardous / Bad Outcome.
  • HR < 1: Reduced risk of event (e.g., HR 0.7 = 30% lower risk). Protective / Good Outcome.

2. Log-Rank Test

Used in Kaplan-Meier analysis to compare survival curves.

  • P < 0.05: There is a statistically significant difference between the survival curves of the groups.

3. Landmark Analysis

Addresses Immortal Time Bias or violation of Proportional Hazards.

  • By selecting a "Landmark Time" (e.g., 6 months), you exclude patients who died or were censored before 6 months.
  • You then compare survival given that the patient has already survived to 6 months.
  • Note: This reduces your sample size but provides a fairer comparison for late-acting treatments.

πŸ” Advanced Inference

Mediation Analysis

Variables


Results


Collinearity Diagnostic

Variables


VIF Results


Model Diagnostics (OLS)

Model Specification


Diagnostic Results


Meta-Analysis Data Entry

Enter comma-separated values for effect sizes and variances.

Data Inputs


Results


Advanced Inference Reference

Mediation Analysis:

  • ACME (Indirect Effect): The portion of the effect mediated by M. (Effect of X on Y via M).
  • ADE (Direct Effect): The effect of X on Y, keeping M constant.
  • Total Effect: ACME + ADE.

Collinearity Diagnostics:

  • VIF (Variance Inflation Factor):
    • VIF > 5: Moderate multicollinearity (Caution).
    • VIF > 10: Severe multicollinearity (Consider removing variable).
  • Tolerance: 1/VIF. Values < 0.1 indicate problems.
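
VIF for each predictor is 1/(1 − R²) from regressing it on the other predictors. A minimal numpy sketch (the helper name `vif` and the simulated predictors are invented):

```python
import numpy as np

def vif(X):
    """VIF per column: regress it on the others, return 1 / (1 - R^2)."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])   # add intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        r2 = 1 - (y - A @ beta).var() / y.var()
        out.append(1 / (1 - r2))
    return out

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)                   # independent
vifs = vif(np.column_stack([x1, x2, x3]))   # x1, x2 high; x3 near 1
```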

Model Diagnostics (OLS):

  • Residuals vs Fitted: Checks linearity. Ideally, points fluctuate randomly around 0 (horizontal line).
  • Q-Q Plot: Checks normality of residuals. Points should fall along the diagonal line.
  • Cook's Distance: Measures influence. Points with Cook's D > 4/n (or > 1) are highly influential and may skew results.

Heterogeneity (Meta-Analysis):

  • I-squared (IΒ²):
    • < 25%: Low heterogeneity.
    • 25-75%: Moderate heterogeneity.
    • > 75%: High heterogeneity.
  • Q-Statistic P-value:
    • P < 0.05: Significant heterogeneity exists.
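
I² follows directly from Cochran's Q: I² = max(0, (Q − df)/Q) × 100%. A minimal fixed-effect sketch (the helper name `i_squared` and the study data are invented):

```python
import numpy as np

def i_squared(effects, variances):
    """Cochran's Q and I^2 for an inverse-variance fixed-effect model."""
    y = np.asarray(effects, dtype=float)
    w = 1 / np.asarray(variances, dtype=float)   # inverse-variance weights
    pooled = (w * y).sum() / w.sum()
    q = (w * (y - pooled) ** 2).sum()
    df = len(y) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

q_h, i2_h = i_squared([0.50, 0.52, 0.48], [0.04] * 3)   # homogeneous
q_x, i2_x = i_squared([0.10, 0.90, 0.50], [0.01] * 3)   # heterogeneous
```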

πŸ”’ Sample Size & Power Calculator


Means (T-test) Setup

Group 1

Group 2


Results


Proportions Setup

Expected Proportions


Results


Survival Analysis Setup

Input Mode

Parameters


Results


Note: This calculates required number of EVENTS, not total subjects.

Correlation Setup

Parameters


Results


🎯 Causal Inference


πŸ’‘ Need matched dataset? Use Clinical β†’ Table 1 & Matching to create balanced paired data first.

IPW & Balance Results


Stratified Results


E-Value Results


Love Plot (Covariate Balance)



Common Support (Overlap)


Causal Inference Methods

1. Propensity Score Methods (IPW/IPTW)

  • Goal: Estimate the Average Treatment Effect (ATE) by adjusting for confounding.
  • Mechanism: Observations are weighted by the inverse of the probability of the treatment they actually received (IPTW).
  • Diagnostics: Check Standardized Mean Differences (SMD). Ideally, all SMD < 0.1 after weighting.
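
Given propensity scores, the ATE weights are 1/ps for treated and 1/(1 − ps) for controls. A minimal numpy sketch (the helper name `iptw_weights` and the scores are invented; in practice the scores come from a logistic model):

```python
import numpy as np

def iptw_weights(treated, ps):
    """ATE weights: 1/ps for treated, 1/(1-ps) for controls."""
    treated = np.asarray(treated, dtype=int)
    ps = np.asarray(ps, dtype=float)
    return np.where(treated == 1, 1 / ps, 1 / (1 - ps))

treated = np.array([1, 1, 0, 0])
ps = np.array([0.8, 0.5, 0.5, 0.2])   # estimated propensity scores
w = iptw_weights(treated, ps)
```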

2. Stratified Analysis (Mantel-Haenszel)

  • Goal: Adjust for confounding by a categorical variable (stratum).
  • Mantel-Haenszel OR: A weighted average of stratum-specific odds ratios.
  • Homogeneity Test: Checks if the effect of treatment is consistent across all strata.

3. Sensitivity Analysis (E-Value)

  • Goal: Assess robustness to unmeasured confounding.
  • E-Value: The strength of association an unmeasured confounder must have to explain away the observed effect. Larger E-values imply more robust results.
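
For a risk ratio above 1, the E-value has a closed form: RR + sqrt(RR × (RR − 1)). A minimal sketch (the helper name `e_value` is invented; for RR < 1, take the reciprocal first):

```python
from math import sqrt

def e_value(rr):
    """E-value for a risk ratio RR > 1: RR + sqrt(RR * (RR - 1))."""
    return rr + sqrt(rr * (rr - 1))

ev = e_value(2.0)   # an unmeasured confounder would need RR ~ 3.41
```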
πŸ“‹ Analysis Settings Guide

Logistic Regression

  • Auto: Best for beginners
  • Firth: Stable for small samples or rare events
  • Screening P: 0.05-0.20 typical range

Survival Analysis

  • Kaplan-Meier: Non-parametric (recommended)
  • Efron: Better for tied event times

P-value Format (NEJM Standard)

  • Lower: 0.001 (display as "<0.001")
  • Upper: 0.999 (display as ">0.999")
🎏 UI Settings Guide

Recommended Settings

  • Plot Width: 10-14 inches
  • Plot Height: 5-8 inches
  • Plot DPI: 100-300 (higher = sharper)
  • Theme: Auto follows system preference
  • Decimal Places: 3 for most stats
πŸ“Š Logging Status
Info: Performance Guide

Caching

  • TTL: How long to keep cached results (seconds)
  • Typical: 3600 (1 hour)

Threading

  • Threads: CPU cores available for parallel processing
  • Typical: Set to # of CPU cores

Compression

  • Reduces memory usage for large datasets
  • May slightly increase CPU usage
πŸ“‹ Analysis Log & Guide
Recent Settings Used

Guide

  • MCC: Adjusts P-values to control family-wise error rate or FDR.
    • Bonferroni: Conservative.
    • FDR (BH): Good balance for discovery.
  • VIF: Detects multicollinearity.
    • VIF > 10: High collinearity (consider removing variable).
⚠️ Advanced Options

Validation

  • Strict: Stop on validation errors
  • Validate: Check data types and formats

Debug

  • Debug Mode: Show detailed debugging info
  • Verbose: Print intermediate steps
  • Profile: Measure CPU/memory usage