Ninety gigs, down the toilet

Clearly, I know what I'm talking about here.

Yes, I am pleased to learn of Hitachi's plan to release probably-functional "one terabyte" hard drives Real Soon Now. They'll probably work fine, the price is good, it's like Bigfoot or Jesus. Huzzah.

This is, however, a good time to mention that now that consumer hard drives are nudging the "1Tb barrier", the capacity rip-off factor is about to become worse by a factor of 1.024. Again.

As I and others have written many times before, storage manufacturers are, almost without exception, in love with specifying their devices as if a kilobyte is 1000 bytes, a megabyte 1000 kilobytes, a gigabyte 1000 megabytes, and (now) a terabyte 1000 gigabytes.

According to the standard SI prefixes, this is exactly true. There are one thousand grams in a kilogram, after all.

In computer usage, though, those SI prefixes are perverted to refer to powers of two, not ten, despite the so-far-unsuccessful effort of the standards organisations to get everybody to call the computer capacities "kibibyte", "mebibyte" and so on.

So a real kilobyte, as used by every desktop computer operating system, contains two to the power of ten, 2^10, 1024, bytes. A real megabyte contains 2^20, 1,048,576, bytes. A real gigabyte contains 2^30, 1,073,741,824, bytes. A real terabyte contains 2^40, 1,099,511,627,776, bytes.
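If you'd rather not take my word for those numbers, a couple of lines of Python will generate them. This is a minimal sketch; `binary_prefix_bytes` is a made-up helper for illustration, with level 1 meaning kilo, 2 mega, and so on.

```python
# Powers-of-two ("real", in this article's sense) prefix sizes.
# Hypothetical helper: level 1 = kilo, 2 = mega, 3 = giga, 4 = tera.
PREFIX_NAMES = ["kilobyte", "megabyte", "gigabyte", "terabyte"]


def binary_prefix_bytes(level: int) -> int:
    """Bytes in a powers-of-two prefix unit: 2 ** (10 * level)."""
    return 2 ** (10 * level)


for level, name in enumerate(PREFIX_NAMES, start=1):
    print(f"1 {name} = 2^{10 * level} = {binary_prefix_bytes(level):,} bytes")
```

The last line printed is `1 terabyte = 2^40 = 1,099,511,627,776 bytes`, matching the figure above.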

As you can see, the difference between the powers of ten and the powers of two - the rip-off factor, in other words - gets worse and worse as capacities rise. Once you get to the terabyte level, the factor is very nearly 1.1.
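The rip-off factor at each level is just 1.024 raised to that level's power, since 1024^n / 1000^n = 1.024^n. Here's a minimal sketch (the `ripoff_factor` name is mine, for illustration only), extended one step further to peta:

```python
# Ratio of a powers-of-two prefix to the same-named SI (powers-of-ten) prefix.
# 1024**n / 1000**n == 1.024**n, so the gap compounds with every prefix step.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta"]


def ripoff_factor(level: int) -> float:
    """How much bigger the binary unit is than the decimal one at this level."""
    return 1024**level / 1000**level


for level, name in enumerate(PREFIXES, start=1):
    print(f"{name}: {ripoff_factor(level):.4f}")
# kilo: 1.0240, mega: 1.0486, giga: 1.0737, tera: 1.0995, peta: 1.1259
```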

There can be a further loss of capacity from the space taken up by formatting data - the metaphorical painting of the lines on the parking lot. But that varies with the filesystem you use, and the actual raw capacity you get from a drive with sticker capacity X varies, too.

That raw capacity is never high enough to cancel out the 1000/1024 rip-off factor, but it is often enough to account for the space taken up by formatting. The "320Gb" Western Digital drives in my current computer do indeed format to 298Gb, exactly what you get if you divide 320 by 1.024 three times. That's thanks to an extra 67-odd megabytes of space, which cancels out the formatting losses. They're still nowhere near 320 real formatted gigabytes, though.
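Here's the 320Gb example as a quick Python check. It assumes, as above, that the extra raw capacity and the formatting overhead roughly cancel, so the formatted size is just the sticker byte count expressed in powers-of-two gigabytes. Dividing by 1.024 three times and dividing the byte count by 2^30 give the same answer; both function names are illustrative only.

```python
# Sticker "320Gb" = 320 * 10**9 bytes. Assumes extra raw capacity and
# formatting overhead cancel out, as described for these WD drives.
def sticker_gb_to_binary_gb(sticker_gb: float) -> float:
    """Decimal (10**9-byte) gigabytes -> powers-of-two (2**30-byte) gigabytes."""
    return sticker_gb * 10**9 / 2**30


def divide_by_1024_three_times(sticker_gb: float) -> float:
    """The same conversion done the long way, one prefix step at a time."""
    result = sticker_gb
    for _ in range(3):  # giga is three prefix steps above plain bytes
        result /= 1.024
    return result


print(f"{sticker_gb_to_binary_gb(320):.2f}")     # 298.02
print(f"{divide_by_1024_three_times(320):.2f}")  # 298.02
```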

So even if the new "one terabyte" drives are similarly generous, you can only expect them to format to 0.91 real terabytes, which is 931 real gigabytes.

So, OK, maybe not technically ninety gigs down the toilet. Maybe only 69, depending on which way you look at it.
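The same arithmetic for a "one terabyte" drive, again assuming spare raw capacity just cancels out the formatting overhead. The shortfall against 1000 is where the 69-gig figure comes from. A minimal sketch:

```python
# A decimal "1TB" drive: 10**12 bytes on the box.
STICKER_BYTES = 10**12

binary_gb = STICKER_BYTES / 2**30   # powers-of-two gigabytes
binary_tb = STICKER_BYTES / 2**40   # powers-of-two terabytes
shortfall_gb = 1000 - binary_gb     # gigabytes "lost" versus the sticker number

print(f"{binary_gb:.2f} GB = {binary_tb:.4f} TB, short by {shortfall_gb:.1f} GB")
# 931.32 GB = 0.9095 TB, short by 68.7 GB
```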

Either way, that's a lot of $US5000 18-megabyte Winchesters. And there are still plenty of hard drives on the retail shelves that don't hold as much as this new one will rip you off for.

So, until someone starts selling a "1.1Tb" or larger drive, the true 1Tb barrier for single drives will not be broken.

The mismatch, of course, may be getting worse, but it arguably matters less and less, as the price per megabyte of hard drives continues to fall.

But that doesn't mean that people in the year 2020, or whenever, won't feel fleeced when their new "1Pb" drive only formats to a lousy 909 terabytes.
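One prefix further up, the sums look like this. It's a sketch of the hypothetical "1Pb" case, with the same assumption that formatting overhead is a wash; the helper names are made up for the example.

```python
# A decimal "1PB" drive is 10**15 bytes.
def decimal_pb_to_binary_tb(pb: float) -> float:
    """Decimal petabytes -> powers-of-two (2**40-byte) terabytes."""
    return pb * 10**15 / 2**40


def decimal_pb_to_binary_pb(pb: float) -> float:
    """Decimal petabytes -> powers-of-two (2**50-byte) petabytes."""
    return pb * 10**15 / 2**50


print(f"{decimal_pb_to_binary_tb(1):.1f} TB")  # 909.5 TB
print(f"{decimal_pb_to_binary_pb(1):.3f} PB")  # 0.888 PB
```

That 0.888 is the powers-of-two petabyte figure; counted in powers-of-two terabytes, the same drive holds about 909.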

20 Responses to “Ninety gigs, down the toilet”

  1. matt Says:

    The last time you wrote about this, I didn't understand the issue, and I guess I still don't. As you say yourself, computer jargon has perverted the SI units. Adding binary SI units is the most sensible way of sorting the problem out.

    Binary units are typically only used to measure objects that increase in a strict power-of-2 series, like RAM (ever seen a 3 MiB DIMM?). Things which don't, like hard-disks and network speed, tend to be measured in decimal units.

    Also, as the "rip-off" factor becomes bigger, more people will become aware of it. The usage of binary SI units becomes more sensible and, hopefully, more common. After all, memory manufacturers don't lose out in any way by labeling their products with binary units.

    But there is no "rip-off" - you've only lost 90 gigs down the toilet if you were expecting to be able to fit 1 Tebibyte onto your disk. People who don't understand what a Tebibyte is, probably weren't expecting to get one of those onto their new disk anyway! A Terabyte of data will still fit, though.

    What units does Windows use for disk-space accounting? Decimal or binary? If it's decimal, then Joe Consumer is never going to notice any discrepancy in his disk usage/space.

    (The line about $5000 18 MB Winchesters is something of a red herring, but incidentally, the first hard-disk we bought for the home PC was a 10 MB full-height monster. When we ran out of space a few years later we decided that doubling the space would be sufficient, and that further doubling it would give us more space than we could ever need. So it got upgraded to a massive 40 MB monster. Mind you, porn was a lot smaller back then too. ;)

  2. peridot Says:

    I agree. If you care about the difference, use tebibytes or real SI; if you don't care, use perverted SI. What's the problem?

  3. EEK Says:

    Moral: You'll only be getting screwed for as much as you let yourself be.

  4. Alereon Says:

    I did some digging in Hitachi's spec sheets and the Storagereview Drive Performance Database to figure out how fast this new drive is. It turns out that Hitachi's claimed "Media Transfer Rate" improvements correspond exactly to measured real world performance improvements, e.g. Hitachi claims the T7K500 has a 22.5% faster media transfer rate than the 7K500, and benchmarks show that the maximum transfer rate is 22.5% higher.

    Based on these numbers, I anticipate that the Hitachi 7K1000 will show a benchmarked Maximum Transfer Rate of 82.7MB/sec and a Minimum Transfer Rate of 51.4MB/sec. Compare that to the WD Raptor 150GB at 88.3/60.2 and the newest Seagate 750GB HDDs at 78.5/44.3. The Hitachi 7K1000 is shaping up to be the fastest 7200rpm HDD ever produced, and is edging in on the WD Raptor 150GB.

    I hope WD hurries up and pushes out some 300GB Raptor drives.

  5. Alereon Says:

    Addendum: I used the wrong factor for the Minimum Transfer Rate. The Hitachi 7K1000 should show a Minimum Transfer Rate of 44.6MB/sec.

  6. topdeck Says:

    What units does Windows use for disk-space accounting? Decimal or binary? If it’s decimal, then Joe Consumer is never going to notice any discrepancy in his disk usage/space.

    It uses binary. Everything in the IT world uses binary, with the notable exception of HDD manufacturers. I agree with Dan: it's misleading and a rip-off.

    When Joe Schmoe goes to buy a Terabyte and gets home and finds out he's only got 910 Gb, he's been ripped off. It's also important to note that when you download a file that's 100 Mb, it also uses binary.

    Even CD- & DVD-ROM manufacturers use the correct units.

  7. Daniel Rutter Says:

    Duuuh, me not so good at maths. It's 0.91Tb, which is 931Gb. Post fixed now. Rip-off still big. Not quite as big, though, from a certain point of view.

  8. Lanthanide Says:

    I really wonder why some hard drive manufacturer just doesn't come out with *correctly* sized hard drives. Are you going to buy a hard drive from manufacturer X that only has 298 gigs on it, or a hard drive from manufacturer Y that actually has the advertised 320 gigs on it?

    It looks like an easy way to gain market share, and I think the hard drive platters probably have that much space on them right now anyway; it just isn't all being used.

    Also, a while back there was a class action suit filed in the US for buyers of WD hard drives claiming that the size advertised wasn't correct. Surprisingly, the court actually upheld the complaint and awarded damages; people affected got to download a data backup program that usually cost something like $39 for free from WD. In other words, the compensation was pretty much worthless and hardly anyone would have actually taken the offer up.

  9. qupada Says:

    topdeck: actually, only CD manufacturers have it right. A "700MB" CD has 360,000 frames (80 min, 75 frames/sec), with 2352 (44100*16*2/75/8, for what it's worth) Bytes per frame (audio), or 2048 data Bytes/frame once error correction is factored in, for a total of 703.125 real megabytes.

    However, DVDs do, in fact, cheat us. A single layer DVD holds 4,700,000,000 Bytes and a dual layer 8,500,000,000 Bytes. That's 4.38GB and 7.91GB respectively.
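qupada's CD figure can be rebuilt from the numbers in that comment. This sketch just multiplies them out; it uses the comment's term "frames" for the 2352-byte units and takes the quoted DVD byte counts at face value.

```python
# Rebuild the CD capacity from the figures in the comment.
MINUTES = 80
FRAMES_PER_SECOND = 75
AUDIO_BYTES_PER_FRAME = 44100 * 16 * 2 // 75 // 8  # 2352: 44.1kHz, 16-bit, stereo
DATA_BYTES_PER_FRAME = 2048  # data bytes per frame once error correction etc. is removed

frames = MINUTES * 60 * FRAMES_PER_SECOND        # 360,000 frames
cd_data_bytes = frames * DATA_BYTES_PER_FRAME    # 737,280,000 bytes
print(f"CD: {cd_data_bytes / 2**20:.3f} powers-of-two MB")  # CD: 703.125 powers-of-two MB

# DVD capacities exactly as quoted in the comment, in powers-of-two GB.
for label, nbytes in [("single layer", 4_700_000_000), ("dual layer", 8_500_000_000)]:
    print(f"DVD {label}: {nbytes / 2**30:.3f} powers-of-two GB")
```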

  10. abb3w Says:

    OK, while we're being anal about units, "1 Tb" is one terabit; "1 TB" is one terabyte. Regardless of your feelings about the T-vs-Ti prefix distinctions, this distinction is clearly recognized industry-wide. This is why every ISP gets away with advertising speeds in Mbps instead of MBps, and why I've had to explain to half of my collection of clueless end-lusers why their DSL seems to be a factor of eight slower than what they thought they were buying.

    That said... you can blame this on the same short-sighted techies from days of yore who never thought code would still be in use by Y2K, and certainly not by orders of magnitude more machines than existed when they wrote it. The 1024 = 1000 approximation is the same kind of sloppiness. Fortunately, the decimal/binary SI distinction is mostly just cosmetically irritating. It would be nice if Google would add the binary SI prefixes to Google Calculator, though.
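The factor-of-eight DSL confusion is plain bits-versus-bytes arithmetic. A minimal sketch, assuming eight-bit bytes and ignoring protocol overhead (which makes real downloads a bit slower still); the example speeds are illustrative, not taken from the comment.

```python
BITS_PER_BYTE = 8  # assumes ordinary eight-bit bytes


def mbps_to_megabytes_per_second(megabits_per_second: float) -> float:
    """Advertised line rate (megabits/s) -> megabytes/s, same decimal prefix."""
    return megabits_per_second / BITS_PER_BYTE


print(mbps_to_megabytes_per_second(8))    # 1.0
print(mbps_to_megabytes_per_second(1.5))  # 0.1875
```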

  11. UnderLord Says:

    I whinged about the same thing last Friday.
    At least you got some response!
    I was the engineer at one of the first real PC companies in the U.K., back in the day, and the dates in those hard-drive capacity lists look off to me.
    I left Keen Computers in 1982 after years of fixing Corvus drives.
    I do recall, though, that we sold the Corvus drives as 5, 9 and 18MB capacities; maybe honesty was the rule back then in the U.K.
    True enough, at the current price per gigabyte of storage, why should we care? But I don't like to see dodgy numbers mandated by competitive marketing go unchallenged.

  12. phrantic Says:

    But they're marketing people! It's what they do! Just look at the now-confusing numbers used in CPU and GPU naming. Even Toyota rounded off my 1968cc 18R engine to two litres.

    And don't get me started on PSU ratings.

    Their job is to be as kind to their product's capabilities as possible, and our job is to call bullshit on it.

  13. Jimmy Says:

    This is hearsay, so I may be wrong on this.

    I remember reading a post by someone about this, and they stated that prior to the 100Mb drive, disk sizes were accurate. It was the maker of the first "100Mb" disk that broke the rule; probably they were short by a little, and someone suggested using the 1000-bytes-to-the-kilobyte system as a way around it.


  14. becakman Says:

    I z a we the new sherriff?
    Iz a da drooler also the ruler, to measure size, standards are nice. Back in the day we would partition different portions of the biggest drives according to the type of data files that would best be written into that partition. With the "{[(modern)]}" paging sizes of virtual memory (sp?) these days, I feel that nothing matters anymore, unless you are trying to lay off HD broadcast-quality video at full size, 1920x1080px (hundreds of MB per second, versus DV, which is a mere 3-4 MB of disk used per second of footage!), to a Sony HD deck that is more than 12.9' away via a 4-way Fibre Channel network. Oh, and the 4 terabyte RAID must be hard-wired directly into your G5, otherwise none of this is even glimmerrable.

    Having long-windedly said that, my point was that back in the day we actually cared about the block size of the partition, so the partition that I stored all my BS 2k docs on would be a 600MB partition with a 0.5k block size, and the partitions that I set up for the Photoshop scratch disk and video capture would have the maximum block size (32k or 64k), depending on which hard drive formatting utility was in use.
    We were using FWB Hard Disk Toolkit and RAID toolkit back in 1990-92, so if I saved a 2k file on the partition with a 64k block size, it would take up 64k. Again, that was back when we cared and actually controlled all processes and applications on our EXPENSIVE little Macs. QUADRA blog a blam!

    PEACE no warts on rolling logs peas in yurtz

    ERic PIt Crew out :-)
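The 2k-file-in-a-64k-block effect is allocation rounding: a filesystem that hands out space only in whole blocks rounds every file up to a multiple of the block size. This is a simplified model for illustration; real filesystems also spend space on metadata, and `allocated_bytes` is a hypothetical helper, not any real API.

```python
def allocated_bytes(file_size: int, block_size: int) -> int:
    """Bytes a file occupies if space is only handed out in whole blocks."""
    blocks = (file_size + block_size - 1) // block_size  # integer ceiling division
    return blocks * block_size


KB = 1024
print(allocated_bytes(2 * KB, 64 * KB) // KB)  # 64: a 2k file fills a whole 64k block
print(allocated_bytes(2 * KB, KB // 2) // KB)  # 2: four 0.5k blocks, nothing wasted
```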

  15. Daniel Rutter Says:

    Uh, abb3w - I think you'll find there's no such agreement about the upper and lower case status of bits and bytes. The old-fashioned way of doing it is to give B to bits (because they came first, and have a definite value) and b to bytes (which came second, and can vary in size; you whippersnappers and your assumptions about eight-bit bytes).

    The upshot is that anybody hoping to make themselves clear must define their terms every single time, not just dive in with a blithe assumption that other people will understand the way they happen to do it.

    I wrote about this in the very first Dan's Data letters column.

  16. Coding Horror Says:

    Gigabyte: Decimal vs. Binary

    Everyone who has ever purchased a hard drive finds out the hard way that there are two ways to define a gigabyte. When you buy a "500 Gigabyte" hard drive, the vendor defines it using the decimal powers...

  17. kibibyte Says:

    "In computer usage, though, those SI prefixes are perverted to refer to powers of two, not ten"

    ... except for hard drives, and DVDs, and tape drives, and older floppy drives, and networking speeds, and processor speeds, ...

    "the rip-off factor, in other words"

    What a bunch of baloney.

    Hard drives have always been measured in correct power-of-10 units. The first hard drive ever sold was the IBM 350 RAMAC in the 1950s. It featured 50,000 sectors, each of which held 100 alphanumeric 7-bit characters. I'm not seeing a connection to powers of two, are you? Through the 60s, 70s, 80s, and 90s, hard drives continued to be measured in powers of 10, as they are today. Operating systems that used "binary K" like CP/M and DOS didn't even come out until the 70s.

    Early floppy drives (8-inchers) were also measured in decimal kbytes and megabits. Meanwhile, your 56K modem was 56,000 bits per second, not 57,344. Today, your Ethernet connection is 100,000,000 bits per second, not 104,857,600. Your MP3 files are 128,000 bps, not 131,072. Your 50 GB Blu-Ray disc holds about 50,000,000,000 bytes, not 53,687,091,200. The ONLY thing that's inherently powers of two is memory. If you have evidence to the contrary, please present it, because all I've seen is complaints of fraud against drive manufacturers with no evidence to back it up.

    The problem isn't deceitful marketing; it's Microsoft. What conceivable benefit is there to reporting a 100,000,000,000 byte drive as "93 GB" in one place and "95,367 MB" in another place? None. Microsoft's notation is stupid and useless. Western Digital was absolutely correct in their response to getting sued:

    'Surely Western Digital cannot be blamed for how software companies use the term "gigabyte"—a binary usage which, according to Plaintiff's complaint, ignores both the historical meaning of the term and the teachings of the industry standards bodies. In describing its HDD's, Western Digital uses the term properly. Western Digital cannot be expected to reform the software industry. ... Apparently, Plaintiff believes that he could sue an egg company for fraud for labeling a carton of 12 eggs a "dozen," because some bakers would view a "dozen" as including 13 items.'

    Using "G-" to mean "1,073,741,824" is just wrong, plain and simple.
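The comparisons in that comment are easy to reproduce: each nominal figure times 1000^n versus 1024^n. A minimal sketch using only the numbers quoted there (the Blu-ray figure is the comment's "about" value); the helper names are made up for illustration.

```python
# (label, nominal figure, prefix level) where level 1 = k, 2 = M, 3 = G.
EXAMPLES = [
    ("56K modem (bits/s)", 56, 1),
    ("100Mbit Ethernet (bits/s)", 100, 2),
    ("128K MP3 (bits/s)", 128, 1),
    ("50GB Blu-ray disc (bytes)", 50, 3),
]


def decimal_value(n: int, level: int) -> int:
    """Nominal figure with SI prefixes: n * 1000**level."""
    return n * 1000**level


def binary_value(n: int, level: int) -> int:
    """Nominal figure read as powers of two: n * 1024**level."""
    return n * 1024**level


for label, n, level in EXAMPLES:
    print(f"{label}: {decimal_value(n, level):,} vs {binary_value(n, level):,}")

# A 100,000,000,000-byte drive reported with powers-of-two prefixes.
drive_bytes = 100_000_000_000
print(f"{drive_bytes / 1024**3:.0f} GB, {drive_bytes / 1024**2:,.0f} MB")  # 93 GB, 95,367 MB
```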

  18. JC Says:

    I'd just like to remind everyone that the real reason for this discrepancy is that the SI standards organization is headstrong and foolish.

    Remember that the computing world used the binary system and had STANDARD definitions for bit, byte, kilobyte, gigabyte, etc, for decades before SI decided they were going to *correct* us by attempting to redefine a standardized term already in use.

    Bit and byte don't exist in the decimal system. It is an invalid expression to define prefixes like kilo=1000, giga=1,000,000,000, etc, in the decimal system because a prefix is inherently a descriptor of the base unit (that unit being binary).

    In other words, there's no such thing as gibibyte, and gigabyte is never 1 billion bytes. IF the standards organization wants to define a term then logically and scientifically they can use the prefixes to mean decimal values ONLY if they use a different base unit that is defined in the decimal system. They tried to change the prefix when they needed to change the base unit!

    If they like throwing "ibi" onto words to fix what they mistakenly thought was a problem, then 1 billion bytes would be termed gigabibi. It seems silly, but if you're going to make up new words, that's the technically correct way to do it, unlike the mess they made instead.

  19. Pedant Says:

    18: JC - I suspect this is a troll, but it may be that you are simply an idiot.

    The computer industry did not exist when kilo, mega, giga, and so forth were defined. They are part of the metric system (you may have heard of it).

    The computer industry bastardised the existing definitions (which were all powers of ten), not the other way around.

    I think it's foolish, even annoying, that Microsoft chooses to use the wrong (binary) prefixes for file sizes. It loses one of the major advantages of the metric system - the idea that one can simply shift the decimal point and adjust the prefix on the unit of measure: 0.031G = 31M = 31000k for everything EXCEPT things expressed using the wrong (binary) prefixes.

    Dan - please stop pushing this foolishness. Binary prefixes are ONLY appropriate for RAM, and you know that.

  20. Daniel Rutter Says:

    Sure, that'd be great - except operating systems use powers-of-two for file and drive sizes.

    Insisting that the "27.9Mb" file someone just clicked on is actually 29.26 powers-of-ten megabytes does not strike me as a point of view that's certain to sweep the world. I sincerely hope that this nonsense fades away in the near future, but I don't expect that to actually be the case, because redefining the gigabyte as 0.931 previous gigabytes and the terabyte as 0.9095 previous terabytes is (a) confusing for normal users and (b) in deadly opposition to the desires of the hard-drive companies.

    (I think all OSes use powers-of-two for file sizes, but I'm not sure. Is there some oddball OS that uses powers-of-ten?)
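For the record, here's the arithmetic behind those figures as a quick Python sketch (the helper name is illustrative): a file the OS calls 27.9 powers-of-two megabytes, restated in powers-of-ten megabytes, plus the new gigabyte and terabyte expressed in the old ones.

```python
def binary_mb_to_decimal_mb(binary_mb: float) -> float:
    """Powers-of-two megabytes (2**20 bytes) -> powers-of-ten megabytes (10**6)."""
    return binary_mb * 2**20 / 10**6


print(f"{binary_mb_to_decimal_mb(27.9):.2f}")  # 29.26
print(f"{10**9 / 2**30:.3f}")   # 0.931: one decimal GB in old (binary) GB
print(f"{10**12 / 2**40:.4f}")  # 0.9095: one decimal TB in old (binary) TB
```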
