Dreams of Spring and Bayesian Inversion
Ah…it
has come to that time of year when we are all dreaming of being outside without
a jacket on. Some subset of those dreamers are also thinking about their
gardens, getting their fingers in the soil, and watching the miracle of life
push its way up out of the soil. Some of us may have already started the year’s
first plants – an act which seems to fit into the “audacity of hope” category
given the number of record low temperatures set in the last week of February
and first two weeks of March. But given the reality of trying to grow a hot
pepper in Wisconsin, getting an early start is the only way.
Coldest Winter in Decades for Southern Wisconsin
As we can see from this data provided by the Department of Atmospheric
Sciences at the University of Wisconsin, the coldest winter since 1980 for
Madison, WI had been the winter of 1985-86, when the average temperature (black
line) was 16.2 degrees Fahrenheit for the three-month period from December through February.
This
year will crush that mark, with an average of 14.0
degrees. And of course if you get away from the “heat island” effect of
Madison, you can take a couple more degrees off that number – temperatures
where I live, 30 miles southwest of Madison, have averaged 12.1 degrees for the
winter. You’d have to go back to the winter of 1885 to find a colder average temperature
in Madison!
Now one
of the great pleasures of life in a crazy cold place like this is to warm
yourself up next to a wood-burning stove – and we are blessed with an efficient
and lovely wood stove which we tend to keep going on the particularly cold
days. To see how this connects to Bayesian Inversion we need to talk about
peppers – hot peppers.
Germinating Hot Peppers
You see, it turns out that for many people growing plants of the genus Capsicum is
something of an obsession, and while I wouldn’t call myself a “Chili Head” I
certainly like spicy food and thought I’d try my hand at growing some plants
this year. The cultivars I selected were the Long Thin Cayenne (30,000-50,000
Scoville units) and the Datil (100,000-300,000 Scoville units). Hot peppers are the topic of many fiery
discussions in the gardening community and entire forums are dedicated to the subject. While
perusing these sites for some basic advice, I realized (much to my chagrin)
that it is infamously difficult to get these little suckers to germinate and
sprout. In addition, I learned that the company (which will remain nameless) from
which I ordered seeds has a mixed reputation. (A friend recently told me the company
has been accused of being experts on seed catalogues, but being a bit spotty on
seed genetics.)
So it
was with some trepidation that I set out to sprout my first seeds of the year.
We picked the Datil peppers as there seemed to be a consensus that the hotter
peppers needed more time to grow. The Datil is an exceptionally hot pepper - a
variety of the species Capsicum chinense. From my reading on the
aforementioned forums, Capsicum chinense varieties of hot
peppers have an exceptionally long germination period (12-25 days).
Additionally, it seemed that while there was a lot of experience on the forums,
there was very little data. In fact, I noticed a thread where somebody asked
for a chart of germination times and was greeted with a chorus of responses
that focused on how much germination time varied, rather than responding with
any data. Since I hear this objection so much in my professional life, my
interest was further piqued. Certainly there would be variation in germination
times, both because of seed variance and method, but it should also be possible
to assemble data and report it.
It was
with some surprise that I discovered more than 25% of the seeds had germinated
on day five. An additional 25% germinated on day six, and by day seventeen, 78%
of the seeds had germinated (see chart below).[1]
Calculating Population Proportion
Many seed companies test
a batch of seeds and report what percentage germinated on their seed packages.
As we all intuitively know, the more seeds tested the more confidence we can
have in the reported germination rate; but you’ll have some uncertainty in the
true germination rate of the whole population (or “population proportion”) even
if you do a relatively large germination sample. The consulting firm I work for
specializes in measurements and small sample statistics, and one of its power
tools is a useful calculator called “Bayesian Population Proportion,” which
allows a user to calculate confidence intervals for germination rates.
For example, if you test 200 seeds and 169 germinate, you would report an 84.5%
germination rate. But without other prior knowledge of germination rates for
that population of seed, you could only be 90% confident that the true
germination rate for the whole population was between 79.7% and 88.1%. The
width of that range may not be obvious, but it is easy to see that we could not
know the germination rate exactly unless we tested the whole
population.
Returning to my Datil
seed experiment, we can use this calculator to calculate a 90% confidence interval for
the germination rate of the entire batch of Datils. The sample size was 18
seeds, of which 14 germinated. Using the BayesianPopProportion tool we find
that the 90% confidence interval for the germination rate of the whole
population is 58% to 89%. That's plenty high for me!
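For anyone who wants to check intervals like these without the calculator, here is a minimal sketch in Python. It assumes a uniform prior on the germination rate, so that after s germinations in n seeds the rate follows a Beta(s + 1, n - s + 1) distribution, and it reports the central interval of that distribution. This is an assumption about how such a tool might work, not the firm's actual implementation, and the function names are made up for illustration.

```python
from math import comb

def beta_cdf(x, a, b):
    """CDF of Beta(a, b) at x, for positive integer a and b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a),
    which avoids any dependency outside the standard library.
    """
    n = a + b - 1
    return sum(comb(n, k) * x**k * (1 - x)**(n - k) for k in range(a, n + 1))

def beta_quantile(q, a, b, tol=1e-10):
    """Invert the CDF by bisection on the interval (0, 1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def germination_interval(successes, trials, level=0.90):
    """Central interval for the true germination rate.

    Assumes a uniform prior, so the posterior is
    Beta(successes + 1, failures + 1).
    """
    a = successes + 1
    b = trials - successes + 1
    tail = (1 - level) / 2
    return beta_quantile(tail, a, b), beta_quantile(1 - tail, a, b)

# The two examples from the text: 169 of 200 seeds, and 14 of 18 Datils.
for s, n in [(169, 200), (14, 18)]:
    low, high = germination_interval(s, n)
    print(f"{s}/{n}: {low:.1%} to {high:.1%}")
```

Under these assumptions the results should land close to the intervals quoted above, though a tool with a different prior or rounding could differ slightly.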
Using Bayesian Inversion to Calculate Required Sample Size
Different users will
have different needs in terms of confidence and germination rates. For
example, my bar for germination rate was a low 20%: I
wanted to feel 90% confident that at least 20% would germinate, or I would look
to re-buy the seeds elsewhere. After all, I was just growing the Datils on a lark,
knowing that a hot climate plant like this needs a lot of babying to produce up
here in the north, and would never be a “main crop” producer in our garden.
Someone who runs a Community Supported Agriculture (CSA) business would have a
much higher bar for germination rates. Their livelihood depends on successful
germination and seeds can make up 5-10% of their costs. Such a farmer might
need to feel 95% confident that at least 75% of their seeds would germinate. I
can imagine an even more restrictive example: I would want to
have 95% confidence that fewer than 1% of a population of Space Shuttles blow up
on launch before agreeing to ride one into space. But to achieve these various
confidence levels, how large a sample do we need in each case? The answers,
with every trial in the sample succeeding, are respectively: one seed for the
Datil sample, 13 seeds for the CSA farmer, and 300 space shuttle launches.
I created a little tool to calculate these values (Figure 3) which
I'm happy to share if you are interested. You select your requirements for
confidence level and success rate, and the tool reports the required sample
size given 0-3 failures.
Figure 3: Calculate sample size requirements for various
confidence and population proportions.
Since we bought many of
our seeds from a place with a supposedly mixed reputation this year, I’m going to
sample a wide variety of seeds just to see whether I need to re-order any.
This is where my calculator will come in handy. For the main plants in
our garden, I want to have 75% confidence that at least 60% of the seeds will germinate.
Surprisingly, I only need a sample of two seeds where both germinate to achieve
this. In any sample where one or both of the seeds fail, I can test an additional
nine seeds. As long as eight of eleven seeds germinate I have achieved my
requirements.
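The sample-size question can be sketched the same way. The Python below is a rough illustration, not the tool from Figure 3: it assumes a uniform prior, computes the probability that the true rate exceeds the required rate after a given number of successes and failures, and searches for the smallest sample that reaches the desired confidence. The function names are hypothetical.

```python
from math import comb

def prob_rate_exceeds(threshold, successes, failures):
    """P(true rate > threshold) after the observed results.

    Assumes a uniform prior, so the posterior is
    Beta(successes + 1, failures + 1); its CDF is evaluated with the
    binomial identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    a, b = successes + 1, failures + 1
    n = a + b - 1
    cdf = sum(comb(n, k) * threshold**k * (1 - threshold)**(n - k)
              for k in range(a, n + 1))
    return 1 - cdf

def required_sample_size(confidence, threshold, failures=0, max_n=10_000):
    """Smallest sample size that meets the confidence requirement
    when exactly `failures` of the sampled items fail.

    Returns None if no sample up to max_n is large enough.
    """
    for n in range(failures, max_n + 1):
        if prob_rate_exceeds(threshold, n - failures, failures) >= confidence:
            return n
    return None

# Garden example from the text: 75% confidence that at least 60% germinate.
for f in range(4):
    print(f"{f} failures: test {required_sample_size(0.75, 0.60, f)} seeds")
```

With these assumptions the garden case comes out at two seeds with no failures and eleven seeds with three failures, matching the numbers above. Other cases can come out differently from the tool's figures, since its prior and rounding are not specified here.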
Conclusion
Winter was cold. I’m
looking forward to spring. Using statistics to help in your daily life is a hot
topic.
Please let me know if
you found this article interesting by commenting below. And stop over to the
Hubbard Decision Research website if you are interested in statistical tools to download.
[1] For anyone interested, I
used the "wet paper towel in a baggie" method and kept the baggie
behind my wood stove, lying on the bricks. I did go to the trouble of finding a
place where the temperature varied between 65 and 90 degrees - most of the day
it is between 75 and 85 on the bricks, and it cools down below 70 for only a
few hours in the early mornings each day.