Split Testing: Interpreting An Example
Case Studies - Justin Premick - November 8th, 2006
I brought up the topic of split testing a while back. However, I didn’t have a sample split test to refer you to at the time.
So, I went back and found an example. Let’s take a look at a split test, what was varied, and what we might infer from our results.
The following stats are from a message that we sent out to our AWeber Test Drive subscribers to inform them about a new article on our website. Open percentages for each appear in the right-hand column.

The complete subjects were:
Learn How to Get More Customers from Free Downloads
{!firstname_fix} Learns How to Convert Free Downloads to Customers
Converting More Free Downloads to Paid Customers
Conversion Secrets for Free Downloads to Paid Customers
By looking at the open rate statistics, we see that the message with subject Conversion Secrets for Free Downloads to Paid Customers garnered the best open rate at 20.6%.
So what do we learn from this?
First of all, all four messages were sent at the same time, so differences in send date and time did not contribute to the difference in open rates. Also, the content of the messages is identical, so any effects due to content filtering would be based on the subject only, which is what we’re testing.
The use of the word “Secrets” contributed to a greater open rate. It implies that the information in the message is not widely known, and is valuable due to that scarcity.
I attribute the success of the next-best subject to personalization.
Including the recipient’s first name didn’t get us as high an open rate as using the word “Secrets,” but it did get a better open rate than the subjects that used neither “Secrets” nor personalization.
A future message might use a subject that included personalization and a psychological trigger such as the word “Secrets” to maximize open rates.
This entry was posted on Wednesday, November 8th, 2006 at 2:00 pm and is filed under Case Studies.

November 8th, 2006 at 2:53 pm
Be careful when split testing that your sample size is valid.
As a real eye-opener, try split testing identical pages. The results will NOT be evenly distributed.
This means you want to see an ever-increasing difference between your test pages, and you want the sample size to be sufficiently large (I use at least 250 hits for at least one of the pages).
I’m not sure how much confidence I would give to Justin’s example as the percentage differences between the four groups are so small. The critical question is: How large was the sample size?
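To see the point about identical pages in action, here is a minimal Python sketch that simulates a split test where both pages are exactly the same. The function name `simulate_identical_split`, the 1,000 visitors per page, and the 20% underlying rate are all assumptions made up for illustration; none of them come from the test in the post.

```python
import random

def simulate_identical_split(visitors_per_page, true_rate, seed):
    """Count responses for two pages that share the same underlying rate."""
    rng = random.Random(seed)  # seeded so each run is repeatable
    hits_a = sum(rng.random() < true_rate for _ in range(visitors_per_page))
    hits_b = sum(rng.random() < true_rate for _ in range(visitors_per_page))
    return hits_a, hits_b

# Hypothetical numbers: 1,000 visitors per page, 20% true response rate.
for seed in range(5):
    a, b = simulate_identical_split(1000, 0.20, seed)
    print(f"run {seed}: page A = {a}, page B = {b}, gap = {abs(a - b)}")
```

Even though the two pages are identical, the counts will usually differ from run to run, which is why a small gap between test groups is not enough on its own to declare a winner.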
November 8th, 2006 at 3:03 pm
Steve,
You’re absolutely correct in stating that a sample size must be sufficiently large before we start testing and drawing conclusions. After all, if you send a message to 10 people, and 2 of them open it, that’s a 20% open rate, but it’s still only 2 opens.
The sample size in this test is large enough to give statistically meaningful results. While I can’t disclose precise numbers, I can tell you that each message in the split test above was sent to and opened by thousands of subscribers.
November 13th, 2006 at 5:28 am
Justin!
You are in the business of selling, babe!
Try to become a direct marketing genius!
How to Get More Customers from Free Downloads
{!firstname_fix} How to Convert Free Downloads to Customers
Here people just think, “old shit, we use it all the time, nothing new to us,” while mostly they don’t actually do it.
How to Convert More Prospects to Paid Customers by Simply Doing One Easy Thing All People Can Do, but Most Don’t
(Yes, you got it! By using downloads.)
Conversion Secrets to Get More Paid Customers Than You Can Possibly Handle
(Does this sound great?)
Read all the Dan Kennedy stuff you can get.
http://www.dankennedy.de
http://www.powermarketingstrategy.com
November 14th, 2006 at 6:26 pm
Justin
I was looking at a pretty cool PHP split testing product that uses the Taguchi method, and its developer talked about how to measure whether a difference is statistically significant.
He used the square root of the sample size as the value that determines whether the difference is significant.
i.e., the square root of 100 is 10, which is a 10% difference.
However, at 2,400 samples you only need about a 2% difference for it to be significant.
The bigger the sample, the easier it is to get a significant result.
Hope this helps.
Steve Shepherd
founder of theexclusive.info website
PS. I use AWeber all the time and it is FANTASTIC!!
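The square-root rule of thumb from the comment above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as the comment describes it, not the PHP product's actual code; the function name `sqrt_rule_threshold` is made up here, and the rule is a rough shortcut rather than a formal significance test.

```python
import math

def sqrt_rule_threshold(sample_size):
    """Rule-of-thumb threshold as a fraction of the sample: sqrt(n) / n."""
    return math.sqrt(sample_size) / sample_size

# The two sample sizes mentioned in the comment, plus one larger one.
for n in (100, 2400, 10000):
    print(f"n = {n}: need roughly a {sqrt_rule_threshold(n):.1%} difference")
# n = 100: need roughly a 10.0% difference
# n = 2400: need roughly a 2.0% difference
# n = 10000: need roughly a 1.0% difference
```

The threshold shrinks as the sample grows, which matches the point that bigger samples make it easier to reach a significant result.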
November 15th, 2006 at 11:24 am
Steve,
Thanks for that.
I can’t really comment on how accurate that method is compared to using standard deviations, but it does seem a lot quicker/easier to use than standard deviations, so if you’ve found it to be accurate, I say run with it!
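For comparison, here is a minimal Python sketch of one common approach based on standard deviations, a two-proportion z-test. It is offered as an assumption about what "using standard deviations" could look like, not as the method anyone in this thread actually used. The function name `two_proportion_z` is made up, and the counts in the usage lines are hypothetical: only the 20.6% open rate comes from the post, while the 5,000 sends per group and the 19.0% comparison rate are invented for illustration.

```python
import math

def two_proportion_z(opens_a, sent_a, opens_b, sent_b):
    """Return (z, two-sided p-value) for the difference in two open rates."""
    rate_a = opens_a / sent_a
    rate_b = opens_b / sent_b
    # Pooled rate under the assumption that both subjects perform the same.
    pooled = (opens_a + opens_b) / (sent_a + sent_b)
    # Standard error of the difference between the two rates.
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the normal distribution, via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 1,030 of 5,000 opened (20.6%) vs. 950 of 5,000 (19.0%).
z, p = two_proportion_z(1030, 5000, 950, 5000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value (0.05 is a common cutoff) suggests the gap is unlikely to be chance alone. Note that this sketch does not guard against the edge case where nobody, or everybody, opens the message, which would make the standard error zero.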
November 16th, 2006 at 12:41 pm
I’ve created a small script that computes a value so you can know whether the difference is significant or not in an A/B split test like you’re talking about.
It is free and available here:
http://programmerer.com/ab.php
(Explanation on that page.)
Sincerely,
Sten
January 18th, 2007 at 7:58 pm
I saw this article titled "16 Tests (and Results) to Improve Email Response Rates," which also mentions using testing to find the best format. The shared results are also interesting to check: http://www.marketingsherpa.com/sample.cfm?ident=29840
June 9th, 2007 at 11:11 am
Interesting article and I like the idea of testing …. testing …. testing ….
But I have to say, with my cynic’s hat on, I’m not all that impressed by the different opening percentages. They’re all remarkably close to each other and I’m pretty certain they’re not different enough to get anywhere near statistical significance.
I’m not knocking the idea - just doubting the stats before everyone goes crazy including the word Secrets in all their postings.