Lies, **** Lies, and Statistics
#1
Lies, **** Lies, and Statistics
Hi Guys,
In any set of statistics, you will eventually encounter outliers. A common rule of thumb treats anything more than 2 standard deviations from the mean (average) as an outlier.
That being said, after looking through the stats on several vehicles, there are clearly some bogus stats in the database. For example, I like the VUE GL just as much as the next guy, but there is no way you're going to pull off 60+ MPG in the thing unless you're coasting down the backside of the Rockies.
At the other extreme, the average for the Prius 2 is 25 MPG because someone posted that they are only getting 0.5 MPG. Clearly this must be a data entry mistake, as the only way I could see someone getting that is if they drive it without wheels.
I think an automatic method of data verification/moderation should be put in place. If a user enters some data that makes it an outlier, they should be warned that their data may not be correct. If they still enter it, it should be flagged to at least be looked at by a moderator.
This way, when people come to the site they at least get a realistic range. As it is right now, the Prius 2 looks to be one of the worst performing hybrids on the market.
~X~
EDIT
As an example of the 2 standard deviation range: for the Honda Insight, the average MPG is 56.15 with a standard deviation of 9.12. This gives a range of roughly 38 MPG to 74 MPG (56.15 ± 18.24), which means there is only one value right now that is outside the range (and I'm sure that person can justify their results).
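For anyone who wants to play with the 2 standard deviation check, here is a minimal Python sketch. The function names and the sample list are made up for illustration and have nothing to do with how the site actually stores or checks its data.

```python
from statistics import mean, stdev


def bounds_from_summary(avg, sd, k=2.0):
    """Return the (low, high) range that lies within k standard deviations."""
    return avg - k * sd, avg + k * sd


def flag_outliers(mpg_values, k=2.0):
    """Return the values outside mean +/- k sample standard deviations."""
    low, high = bounds_from_summary(mean(mpg_values), stdev(mpg_values), k)
    return [v for v in mpg_values if v < low or v > high]


# Insight figures quoted above: average 56.15, standard deviation 9.12.
insight_low, insight_high = bounds_from_summary(56.15, 9.12)

# Made-up sample (NOT real database entries) with one 0.5 MPG typo in it.
sample = [45, 47, 48, 50, 46, 49, 47, 48, 0.5]
suspects = flag_outliers(sample)
```

One caveat: a bogus entry also drags the mean and inflates the standard deviation it is being tested against, so with only a handful of entries a bad value can slip through. That is one more reason to have a moderator look at flagged entries rather than trusting the check alone.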
Last edited by Xyrus; 07-07-2007 at 08:02 AM.
#3
Re: Lies, **** Lies, and Statistics
I see the error in the Vue entry, but I'm not seeing the Prius 2 mistake. I see the average is still 47+. I do agree with you and wish some of these entries could be weeded out. That would include the obvious mistakes you mentioned, as well as entries from people who enter 1 tank and then never visit the database again. How is that any kind of accurate representation?
The HCH II database has 13 entries with fewer than 100 miles (including two entries of 1 mile, one of 6 miles and one of 9 miles). There are 123 entries with fewer than 1,000 miles. This is 22%. Of course, people have to start somewhere, but only 28 of the 123 entries below 1,000 miles are active. Why are the low-mileage, inactive entries not deleted? Do they serve any purpose? What is the point of leaving in entries of 33.6 mpg over 1 mile, 19.9 mpg over 6 miles, etc.?
Just as hypermilers are not recognized until 3,000 miles, there should be some kind of minimum before a vehicle is counted in the average for a given class of vehicles in the database.
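A mileage cutoff like that could be sketched roughly as follows. The `Entry` fields and the 1,000-mile default are assumptions for illustration only; the real threshold would be up to the admins.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    miles: float   # total miles logged for this vehicle
    mpg: float     # lifetime MPG reported for this vehicle
    active: bool   # whether the owner is still updating the entry


def class_average(entries, min_miles=1000.0):
    """Average MPG over entries that have logged at least min_miles.

    Returns None when no entry meets the threshold.
    """
    eligible = [e for e in entries if e.miles >= min_miles]
    if not eligible:
        return None
    return sum(e.mpg for e in eligible) / len(eligible)


# Illustrative entries: the two low-mileage ones echo the figures above,
# the other two are invented.
entries = [
    Entry(miles=1, mpg=33.6, active=False),
    Entry(miles=6, mpg=19.9, active=False),
    Entry(miles=5000, mpg=46.0, active=True),
    Entry(miles=12000, mpg=48.0, active=True),
]
average = class_average(entries)
```

The low-mileage entries stay in the database here; they just don't count toward the class average until they cross the threshold.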
#4
Re: Lies, **** Lies, and Statistics
You are correct, Mr. Kite. If you click on the Prius 2, it comes up correctly, but the main graph shows the Prius 2 at 25 MPG.
Seems like there may be more than 1 error.
~X~
#5
Re: Lies, **** Lies, and Statistics
Sometimes people put in erroneous data to mess with the database, so such a check wouldn't really be foolproof.
If you could post the URLs of these outliers as you find them, we can do our best to take care of them. Thanks.
#6
Re: Lies, **** Lies, and Statistics
OK. I see what you are talking about now. I believe what has happened is that the text is no longer lined up with the bar it represents. So, the Prius II label has ended up on the 25 bar (which should be for the RX 400h) instead of the 48 bar above it.
Last edited by Mr. Kite; 07-10-2007 at 09:10 AM.
#7
Re: Lies, **** Lies, and Statistics
https://www.greenhybrid.com/compare/.../car/4450.html
This car has a reported tank of 75.6 gallons, which might be possible in a Prius II if you also fill up the entire interior volume with gasoline.
Here's the second example.
https://www.greenhybrid.com/compare/.../car/2141.html
This car reported a tank of 70,726 gallons. The Boeing 747-400ER's fuel capacity is 63,705 gallons.
#8
Re: Lies, **** Lies, and Statistics
Here's the Vue entry Xyrus was referring to.
https://www.greenhybrid.com/compare/.../car/5118.html
He should either fix his entry or be congratulated for his near-300 mpg tank.
If you go to the Civic Hybrid II database and sort by distance, you will see the inactive entries with single-digit miles. Can these be deleted as well? Can we come up with some guidelines here?
#9
Re: Lies, **** Lies, and Statistics
EASY EDITS:
(1) IF FUEL (MPG) < 1, error. Do not save entry.
(2) IF FUEL (MPG) > 300, error. Do not save entry.
(3) IF FUEL (GAL) < 1, error. Do not save entry. An entry of this sort *could* be valid, but it would be statistically unreliable because of the small amount of fuel used and the variability between fill-ups.
(4) IF FUEL (GAL) > 50, error. Do not save entry. It's possible someone wants to input an entry covering many, many miles (and thus many GAL). Make them separate this large and unusual entry into chunks of 50 GAL or less.
(5) IF DISTANCE < 10, error. Do not save entry.
(6) IF DISTANCE / GAL < 1 or > 300, error. Do not save entry.
These are simple edits.
You could change any of the numbers to make the valid ranges tighter, although as time goes by, we hope to be in vehicles that CAN hit 301 MPG.
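The edits above could be sketched in Python like this. It assumes MPG is computed from distance and gallons rather than typed in separately, in which case edits (1), (2) and (6) collapse into a single check. The function name, parameter names and error messages are made up for illustration.

```python
def validate_entry(distance_miles, fuel_gallons,
                   min_mpg=1.0, max_mpg=300.0,
                   min_gal=1.0, max_gal=50.0,
                   min_distance=10.0):
    """Return a list of error messages; an empty list means the entry may be saved."""
    errors = []

    # Edits (3) and (4): fuel amount must be in a plausible range.
    if fuel_gallons < min_gal:
        errors.append(f"Fuel must be at least {min_gal} gal.")
    if fuel_gallons > max_gal:
        errors.append(f"Fuel must be at most {max_gal} gal; split large entries.")

    # Edit (5): very short distances give unreliable MPG figures.
    if distance_miles < min_distance:
        errors.append(f"Distance must be at least {min_distance} miles.")

    # Edits (1), (2) and (6): computed MPG must be in a plausible range.
    # Skipped when fuel is zero or negative to avoid dividing by zero;
    # the fuel check above has already rejected such an entry.
    if fuel_gallons > 0:
        mpg = distance_miles / fuel_gallons
        if mpg < min_mpg or mpg > max_mpg:
            errors.append(f"Computed MPG {mpg:.1f} is outside {min_mpg}-{max_mpg}.")

    return errors


# The 75.6-gallon tank mentioned earlier in the thread trips edit (4);
# the 450-mile distance is invented for the example.
big_tank_errors = validate_entry(distance_miles=450, fuel_gallons=75.6)
```

Keeping the limits as parameters makes it easy to tighten the ranges later, or to raise the upper MPG limit once someone really can hit 301.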