Machine Learning | Ryan McComb

fig 1. The whole buoycast machine, end to end. The dots are data moving through the pipe.

Machine learning · buoycast

A little north of me, off Gillson Beach, there’s a NOAA buoy that reports the water temperature every ten minutes. buoycast turns that stream into a seven-day forecast. The whole system is the diagram above: lake history and weather go in at the top, a band of possible water temperatures comes out the bottom. The four ideas it runs on are below.

01 · The feature builder

A model never sees “a lake.” It sees rows. The feature builder takes one hour of plain facts (water now and a few hours ago, wind, sun, air temperature, season) and turns them into a row of 46 numbers. Ten years of hours becomes about 600,000 rows, each paired with what the water actually did next. Learning is just finding rules that connect the two.

fig 2. The feature builder turns ordinary lake and weather facts into one row the model can read.
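To make "one hour of facts becomes one row" concrete, here is a minimal sketch in plain Python. It is not buoycast's actual feature builder: the field names (`water_f`, `wind_mph`, `sun_wm2`, `air_f`), the lag set, and the 11 columns are all assumptions chosen for illustration, where the real row has 46 numbers.

```python
import math
from datetime import datetime, timedelta

LAGS = (1, 3, 6)  # hours back; an assumed lag set, not the real one

def build_row(records, i):
    """Turn hour i of `records` into a flat list of floats."""
    now = records[i]
    water_lags = [records[i - k]["water_f"] for k in LAGS]
    # Encode season as a point on a circle so Dec 31 sits next to Jan 1.
    angle = 2 * math.pi * now["time"].timetuple().tm_yday / 365.25
    return [
        now["water_f"],
        *water_lags,
        now["water_f"] - water_lags[0],  # change over the last hour
        now["wind_mph"],
        now["sun_wm2"],
        now["air_f"],
        now["air_f"] - now["water_f"],   # air-water temperature gap
        math.sin(angle),
        math.cos(angle),
    ]

def build_dataset(records, horizon=24):
    """Pair each usable hour's row with the water temperature `horizon` hours later."""
    rows, targets = [], []
    for i in range(max(LAGS), len(records) - horizon):
        rows.append(build_row(records, i))
        targets.append(records[i + horizon]["water_f"])
    return rows, targets

# Made-up data: 48 hours of slowly warming water under constant weather.
start = datetime(2024, 7, 1)
records = [
    {
        "time": start + timedelta(hours=h),
        "water_f": 60.0 + 0.1 * h,
        "wind_mph": 8.0,
        "sun_wm2": 300.0,
        "air_f": 72.0,
    }
    for h in range(48)
]
rows, targets = build_dataset(records, horizon=24)
```

The first few hours have no lagged readings yet and the last day has no "what the water did next", so both ends are dropped: 48 hours of toy data yield 18 usable rows.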

02 · The regression

Once every hour has become a row of numbers, the job is regression: learn a rule that maps inputs to water temperature. buoycast uses gradient boosting, which is a stack of small decision trees. The first tree makes a rough guess. The next tree looks at what the first one missed. The next tree fixes what is still wrong. After many tiny corrections, the stack has learned the shape of the lake without anyone writing rules by hand.

fig 3. Gradient boosting: each little tree fixes part of the previous error.
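The "each tree fixes what the last one missed" loop fits in a few lines. The sketch below is a from-scratch toy, not the scikit-learn model buoycast runs: it uses a single input feature, one-split trees ("stumps"), squared error, and made-up data. All function names here are hypothetical.

```python
import math

def fit_stump(xs, residuals):
    """Find the one-split tree that best fits the residuals.

    Returns (threshold, left_mean, right_mean); the split is x <= threshold.
    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def boost(xs, ys, n_trees=100, learning_rate=0.3):
    """Stack stumps, each one trained on what the stack so far still gets wrong."""
    base = sum(ys) / len(ys)  # the rough first guess: the average
    preds = [base] * len(ys)
    stumps, mse_history = [], []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        # Apply only a fraction of each correction: many tiny fixes, not a few big ones.
        preds = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, preds)
        ]
        mse_history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    model = {"base": base, "stumps": stumps, "learning_rate": learning_rate}
    return model, mse_history

def predict(model, x):
    """Add up the base guess and every stump's small correction."""
    total = model["base"]
    for threshold, left_mean, right_mean in model["stumps"]:
        total += model["learning_rate"] * (left_mean if x <= threshold else right_mean)
    return total

# Made-up data: water temperature as a curved function of air temperature.
air_f = [float(a) for a in range(50, 90, 2)]
water_f = [55.0 + 8.0 * math.sin((a - 50.0) / 12.0) for a in air_f]
model, mse_history = boost(air_f, water_f)
```

With squared error and a learning rate below 1, the training error in `mse_history` can only go down from one tree to the next, which is the "many tiny corrections" behavior described above.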

03 · The forecast

The model is trained five times, not once, and each version aims at a different part of the range of outcomes. One version predicts the cold edge, one predicts the warm edge, and the middle one predicts the median. That is why the final product is a band, not a single line. The forecast also starts from the live buoy reading, because if the thermometer says the water is 64.2 degrees right now, hour zero should not pretend otherwise.

fig 4. The output is a forecast band: the middle guess plus cold and warm plausible edges.
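One standard way to get a model to predict an edge instead of the middle is to train it against the quantile ("pinball") loss, which punishes misses on one side more than the other. The sketch below shows that loss on made-up temperatures. The five quantile levels are an assumption (the text only says there are five fits and names the two edges and the median), and `start_from_live` is a hypothetical helper showing one simple way to pin hour zero to the buoy.

```python
QUANTILES = (0.05, 0.25, 0.50, 0.75, 0.95)  # assumed levels, not confirmed by the source

def pinball_loss(y_true, y_pred, q):
    """Average quantile loss: guessing too low costs q per degree, too high costs 1 - q."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)

def best_constant(ys, q):
    """The single observed value with the lowest pinball loss at level q."""
    return min(sorted(set(ys)), key=lambda c: pinball_loss(ys, [c] * len(ys), q))

def start_from_live(band_by_hour, live_f):
    """Return a copy of the band with hour zero pinned to the live reading."""
    pinned = [dict(hour) for hour in band_by_hour]
    pinned[0] = {q: live_f for q in pinned[0]}
    return pinned

# Made-up water temperatures (°F), standing in for what the lake did in similar hours.
observed = [58.0, 59.5, 61.0, 62.0, 63.5, 64.0, 64.5, 66.0, 67.5, 70.0, 71.0]
band = {q: best_constant(observed, q) for q in QUANTILES}

# A toy two-hour forecast, then the same forecast anchored to a 64.2°F live reading.
forecast = start_from_live([band, band], live_f=64.2)
```

Minimizing the loss at `q = 0.50` lands on the median of the sample, while `q = 0.05` and `q = 0.95` land near its cold and warm ends. Swap the constant for a trained model per level and the same loss produces the band.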

04 · The check

The last step is boring on purpose: hide past seasons from the model, make it forecast them, and score the misses. Near-term forecasts mostly tie the simple “water stays where it is” baseline. Farther out, weather starts to matter, and the model pulls ahead.

Mean absolute error in degrees Fahrenheit, model versus the no-change baseline, by forecast lead. The edge column is how much lower the model’s error is than the baseline’s.
lead	model	“no change”	edge
+1 hour	0.09°F	0.09°F	tie
+1 day	0.82°F	0.79°F	tie
+3 days	1.20°F	1.62°F	+26%
+7 days	1.62°F	3.03°F	+47%
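The scoring behind a table like this is short. The sketch below is an illustration, not the project's backtest code: it defines mean absolute error, the no-change baseline, and the edge as the fractional reduction in error, then recomputes the edge column from the published numbers. The leave-one-season-out split in `hold_out_seasons` is an assumption about how "hide past seasons" might be implemented.

```python
def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def no_change_forecast(series, lead):
    """Baseline: the water `lead` steps from now equals the water now.

    Returns (predicted, actual), aligned so they can be scored directly.
    """
    return series[:-lead], series[lead:]

def edge(model_mae, baseline_mae):
    """Fraction by which the model's error undercuts the baseline's."""
    return (baseline_mae - model_mae) / baseline_mae

def hold_out_seasons(seasons):
    """Yield (training seasons, hidden season) pairs, hiding one season at a time."""
    for held_out in seasons:
        yield [s for s in seasons if s != held_out], held_out

# Score the baseline on a made-up series.
toy_series = [60.0, 61.0, 63.0, 66.0]
predicted, actual = no_change_forecast(toy_series, lead=1)
baseline_mae = mae(actual, predicted)

# Recompute the edge column from the table's (model, baseline) errors.
published = {
    "+1 hour": (0.09, 0.09),
    "+1 day": (0.82, 0.79),
    "+3 days": (1.20, 1.62),
    "+7 days": (1.62, 3.03),
}
edges = {lead: round(100 * edge(m, b)) for lead, (m, b) in published.items()}

folds = list(hold_out_seasons([2021, 2022, 2023]))
```

Run on the table's own figures, `edges` gives 26 for +3 days and 47 for +7 days, matching the published column. At +1 day the model is a few hundredths of a degree behind the baseline, which the table reports as a tie.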

These strategies aren’t mine alone, and none of them are new; they’re old, public ideas applied carefully. Most of them run parallel to the VoteHub 2026 midterm methodology, which is mainly the work of Zachary Donnini: that model checks itself the same way buoycast does, by hiding whole election years and forecasting them cold. Most of what I know about this comes from that write-up and from questions Zachary has answered for me.

That’s the whole loop: inputs, regression, forecast, check. The stack is small on purpose: Python and scikit-learn, an hourly refresh, a retrain every Sunday morning, all on open data from NOAA and Copernicus. The live version is headed for Google Cloud, with the dashboard served from a Vercel page. Code and backtests are on GitHub.