# Dropping Users from Mileage Database

The current method is perfectly fine with no flaws. It sounds like you guys are trying to find ways to artificially skew the results higher.

*not* have 10x more relevant data than someone with 10 tanks. Both cars are in fact equally valid, as both are broken in.

I frequent a beer rating website. They don't include your ratings until you've submitted a certain number of them, as an easy way to weed out people gaming the system. Though I can't imagine too many games being played on this database. Ford employees stacking the Escape's average with fake entries?

A less bad idea is adding weight to people with more current data. If someone hasn't updated their results in 1 year, they can be made to count less than someone that updated within the last month, but it's debatable whether or not that's a good idea... and in fact I'd argue it is not.

No.

If Jason is performing a simple arithmetic mean of each driver's average, that is true. But "averaging averages" is the #1 mistake in elementary statistics, and Jason is not making it (I hope. He is at an engineering school, after all).

The number of miles driven in total by all drivers, divided by the number of gallons purchased in total by all drivers equals the average mpg of all drivers. Period.

Any time one has a set of averages, they must be broken back down into their individual components (miles driven and gallons purchased) or kept as averages and weighted. So someone who drove 50K miles at 50 mpg will overwhelmingly dominate another who drove 5K miles at 40 mpg.

The proper mean of these two drivers is 48.9 mpg: 50K miles at 50 mpg = 1,000 gal, 5K miles at 40 mpg = 125 gal. So 55K miles / 1,125 gal = 48.9 mpg. When those miles were added has no bearing.
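For anyone who wants to check that arithmetic, here is a minimal Python sketch. The function name `pooled_mpg` and the `(miles, mpg)` tuple layout are made up for illustration; nothing here is taken from how the database actually stores its data.

```python
def pooled_mpg(drivers):
    """Total miles divided by total gallons, for a list of (miles, mpg) pairs."""
    total_miles = sum(miles for miles, _ in drivers)
    # Each driver's gallons are recovered from their miles and their mpg.
    total_gallons = sum(miles / mpg for miles, mpg in drivers)
    return total_miles / total_gallons


two_drivers = [(50_000, 50), (5_000, 40)]
result = pooled_mpg(two_drivers)
print(round(result, 1))  # 48.9
```

The 50K-mile driver contributes 1,000 of the 1,125 gallons, which is why the result sits so close to 50.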

Any 'time' influence is purely psychological, with the viewer internally weighting or discounting certain drivers' inputs accordingly.

You are wrong, gonavy, because this is not an issue of taking an average of averages. See my signature? That is NOT my average mileage; it's my TOTAL mileage. My average mileage would be adding up the mileage for all my tanks and dividing by the number of tanks. Total mileage is the total number of miles divided by total number of gallons, and that's what's in my signature.

Currently Jason takes an average of totals, which is perfectly fine.

To put it another way, if one person drives 500,000 miles and gets 30 mpg (using 16,666.67 gallons of fuel), and 9 other people each drive 10,000 miles and each average 50 mpg (200 gallons each), is the average mileage 31.95 or 48? It's 48 obviously.

590,000 / 18,466.67 = 31.95

(50*9 + 30) / 10 = 48

Someone who buys that car can expect to get roughly 48 mpg. It is unlikely they would get 32 mpg, regardless of how many miles the 30-mpg guy puts on this car, since it has no effect on anyone else. He is only one person that gets 30 mpg so he shouldn't count any more than anyone else.
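Both figures in this example can be reproduced with a short Python sketch. The variable names and the `(miles, mpg)` layout are assumptions made for illustration only.

```python
# One driver with 500,000 miles at 30 mpg, nine drivers with 10,000 miles at 50 mpg.
fleet = [(500_000, 30)] + [(10_000, 50)] * 9

# Pooled figure: every mile counts once, so the high-mileage driver dominates.
total_miles = sum(miles for miles, _ in fleet)
total_gallons = sum(miles / mpg for miles, mpg in fleet)
pooled = total_miles / total_gallons

# Per-driver figure: every driver counts once, regardless of miles driven.
per_driver = sum(mpg for _, mpg in fleet) / len(fleet)

print(round(pooled, 2), per_driver)  # 31.95 48.0
```

The two numbers answer different questions: the first is the fuel economy of the fleet's combined miles, the second is what a typical owner in this group reports.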

Yes- if 10 people drove reasonably far and got 50mpg, I would believe their results, not the true arithmetic mean of 32. But that is NOT taking the average- it's doing some mental processing, basically looking for the mode of the distribution (50), and weighting its dominance relative to the sole outlier at 30, despite his massive mileage total. But it's only valid because the 10 drivers all drove at least reasonably far- not a single tank, but more than a few.

What I think you are getting at is that after a point (several thousand miles, when a driver has steadied out), all drivers should be weighted equally. I'll buy that.

That's also why real estate ranges give the mean and median- the median tends to un-weight the far outliers and give more attention to the pricing that is selling best.
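As a rough illustration of the mean, median, and mode idea, here is a Python sketch using the standard `statistics` module on the per-driver lifetime figures from the example in this thread (one driver at 30 mpg, nine at 50 mpg). The variable names are illustrative assumptions.

```python
from statistics import mean, median, mode

# Per-driver lifetime mpg: one outlier at 30, nine drivers at 50.
lifetime_mpg = [30] + [50] * 9

avg = mean(lifetime_mpg)             # pulled down slightly by the outlier
middle = median(lifetime_mpg)        # ignores how far off the outlier is
most_common = mode(lifetime_mpg)     # the value most drivers report

print(avg, middle, most_common)
```

Here the mean comes out at 48 while the median and mode are both 50, so with equally weighted drivers a single outlier moves the summary only a little.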

Yes- if 10 people drove reasonably far and got 50mpg, I would believe their results, not the true arithmetic mean of 32.

30, 50, 50, 50, 50, 50, 50, 50, 50, 50

The mean is the sum of those divided by 10, which gives you 48. Miles divided by gallons is not an average, it's a direct measurement -- one data sample. Finding the average of multiple data samples is an average.

Total mileage = total miles / total gallons.

Average mileage = mean of all individual mileage samples

In order to calculate the arithmetic mean of a set of data samples, all you need is the set of data samples. How those samples were calculated is completely irrelevant. To calculate the mean mileage, you need the set of lifetime mileages; you have no use for the miles driven or gallons of gas consumed. All you need is the ratio.
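A small Python sketch of that point, under made-up numbers: the same 30 mpg driver is logged at two different odometer totals, and the mean of the per-driver ratios comes out the same either way. The helper `lifetime_mpg` and the specific mile and gallon totals are hypothetical.

```python
def lifetime_mpg(miles, gallons):
    """One driver's lifetime figure: a single ratio, treated as one data sample."""
    return miles / gallons


# The same 30 mpg driver, recorded after few miles and after many (hypothetical totals).
short_log = lifetime_mpg(30_000, 1_000)
long_log = lifetime_mpg(600_000, 20_000)

# Nine other drivers, each with a lifetime figure of 50 mpg.
others = [50.0] * 9

mean_short = (short_log + sum(others)) / 10
mean_long = (long_log + sum(others)) / 10

print(mean_short, mean_long)  # 48.0 48.0
```

Only the ratios enter the calculation, so the extra 570,000 miles change nothing in this kind of mean.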

