What is data-mining bias?
The systematic error of testing many strategies on the same dataset and reporting only the winners. Also called backtest overfitting or p-hacking. The apparent alpha is a statistical artifact, not a real edge.
If you test 100 strategies with no real edge on a single dataset, about 5 are expected to look statistically significant at the 5% significance level (95% confidence) by chance alone. Reporting only those 5 is dishonest because the apparent significance is an artifact of the testing process, not evidence of a real signal. The corrections are out-of-sample testing (validate the strategy on data not used to discover it), pre-registration (commit to the strategy before testing it), and conservative significance thresholds that account for the number of strategies tested (for example, a Bonferroni correction, which divides the significance level by the number of tests).
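The arithmetic above can be checked with a small simulation. This is a minimal sketch, not a production test: it assumes each strategy's daily returns are independent draws from a zero-mean normal distribution (so no strategy has a real edge), uses a normal approximation for the p-value, and applies a Bonferroni correction as one example of a conservative threshold. The function names and parameters are illustrative.

```python
import math
import random
import statistics


def p_value_zero_mean(returns):
    """Two-sided p-value for H0: mean return is zero (normal approximation)."""
    n = len(returns)
    t_stat = statistics.fmean(returns) / (statistics.stdev(returns) / math.sqrt(n))
    return math.erfc(abs(t_stat) / math.sqrt(2))


def simulate_p_values(n_strategies=100, n_days=250, seed=0):
    """Simulate strategies with NO real edge: zero-mean Gaussian daily returns."""
    rng = random.Random(seed)
    return [
        p_value_zero_mean([rng.gauss(0.0, 0.01) for _ in range(n_days)])
        for _ in range(n_strategies)
    ]


def count_significant(p_values, alpha=0.05, n_tests=1):
    """Count p-values below alpha / n_tests (Bonferroni when n_tests > 1)."""
    threshold = alpha / n_tests
    return sum(p < threshold for p in p_values)


p_values = simulate_p_values()
naive = count_significant(p_values)
corrected = count_significant(p_values, n_tests=len(p_values))
print(f"significant at 5%, uncorrected: {naive} of {len(p_values)}")
print(f"significant after Bonferroni:   {corrected} of {len(p_values)}")
```

The uncorrected count should land near 5 on a typical run, and the Bonferroni-corrected count can never be higher. Real strategy returns are correlated and fat-tailed, so treat this as an illustration of the mechanism rather than a calibrated test.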
How invest-like guards against it
The seven frameworks (Buffett, Graham, Fisher, Lynch, Greenblatt, Munger, Smith) were not discovered by backtesting. They are documented investor philosophies that pre-date the dataset by decades. The rubrics encoding each framework are derived from the investors' own writings, not from pattern-matching to historical returns. This removes the most common source of data-mining bias: choosing a strategy because it happened to perform well on the data used to test it.
The consensus screen (5+ of 7 frameworks passing) is a derived signal, but the construction is deliberately simple: an unweighted vote across the seven framework outputs with a fixed 5-of-7 cutoff. Beyond that fixed cutoff there is no hyperparameter to tune, no threshold to optimise, and no factor weighting to over-fit. The working paper documents the construction explicitly.
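The vote described above is simple enough to sketch in a few lines. This is an illustration under stated assumptions, not invest-like's actual code: it assumes each framework produces a plain pass/fail result, and the names `FRAMEWORKS`, `MIN_PASSES`, and `passes_consensus` are hypothetical.

```python
FRAMEWORKS = ("Buffett", "Graham", "Fisher", "Lynch", "Greenblatt", "Munger", "Smith")
MIN_PASSES = 5  # the "5+ of 7" cutoff described in the text


def passes_consensus(results, min_passes=MIN_PASSES):
    """Return True when at least min_passes frameworks pass.

    results maps each framework name to a pass/fail boolean (assumed input shape).
    Every framework counts equally: there are no weights.
    """
    missing = set(FRAMEWORKS) - set(results)
    if missing:
        raise ValueError(f"missing framework results: {sorted(missing)}")
    return sum(bool(results[name]) for name in FRAMEWORKS) >= min_passes


# Hypothetical stock that passes five of the seven frameworks.
example = {name: True for name in FRAMEWORKS}
example["Graham"] = False
example["Greenblatt"] = False
print(passes_consensus(example))
```

Because every framework carries the same weight and the cutoff is fixed in advance, the only thing that could be fitted to historical returns is the cutoff itself, which is why it matters that it is stated up front.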
Frequently asked questions
What is data-mining bias?
The systematic error of testing many strategies on the same dataset and reporting only the winners. Apparent alpha becomes a statistical artifact.
How does invest-like avoid it?
The seven frameworks are documented investor philosophies pre-dating the dataset by decades, not patterns discovered by backtesting. The consensus screen is a simple unweighted vote with no tuned hyperparameters.
What's the right way to read a published backtest?
Look for out-of-sample validation, pre-registered methodology, and explicit disclosure of how many alternative variants were tested before the published version was selected.
Educational only. invest-like is not a registered investment adviser; nothing on this page constitutes personalised investment advice.