Choosing Optimal MySQL Data Types
Hey there! Welcome to Making the Web - my personal blog about website development. Feel free to subscribe to my RSS feed to keep up with the latest. Alternatively, subscribe by email. Hope you enjoy this article!
In MySQL, the most simple optimisation is to select the correct, most appropriate, and therefore most efficient data types. But many MySQL developers, and other database developers, sometimes fail to do this; non-MySQL developers - such as PHP programmers who use MySQL as just a necessity for their database-driven website - often fail miserably.
To help you understand the data types available, and hopefully use them correctly, I have written about some of the most common - the ones which you probably use every day:
SIGNED or UNSIGNED?
For people who don't know what signed or unsigned is, I'll give a brief explanation:
With signed integers, one bit is reserved so the numbers can be made negative. With unsigned, a bit is not reserved and numbers can only be positive. If you had a SIGNED INT, the value can be between -2,147,483,648 and 2,147,483,647. But, with an UNSIGNED INT, the value can range from 0 to 4,294,967,295.
If you will never use a negative number in a field, then you should use UNSIGNED. If you have an auto-incrementing primary key, then you'd want to use UNSIGNED because it allows for more rows.
Too big?
You should think carefully before creating fields. If you have a 'users' table on a small website, with UNSIGNED INT as the primary key, then ask yourself, "Will I ever get 4,294,967,295 users?" The most likely answer to this question is no. You've got to think about what the highest value is ever likely to be. Then, select the right numeric data type which can cover that amount. I've made a little numeric data type table with the minimum and maximum values:
Type Minimum Maximum SIGNED TINYINT -128 127 UNSIGNED TINYINT 0 255 SIGNED SMALLINT -32,768 32,767 UNSIGNED SMALLINT 0 65,535 SIGNED MEDIUMINT -8,388,608 8,388,607 UNSIGNED MEDIUMINT 0 16,777,215 SIGNED INT -2,147,483,648 2,147,483,647 UNSIGNED INT 0 4,294,967,295 SIGNED BIGINT -9,223,372,036,854,775,808 9,223,372,036,854,775,807 UNSIGNED BIGINT 0 18,446,744,073,709,551,615
What's the difference between CHAR and VARCHAR?
The difference between CHAR and VARCHAR is that CHAR is fixed length, while VARCHAR is variable length. This means that CHAR is always the same size and takes up the same amount of bytes, while VARCHAR varies. VARCHAR also adds an extra byte. When should you use CHAR?
To store MD5 passwords, which are always 32 characters long (when stored in hexadecimal form; it may be better to store them as 16 bytes in a BINARY field?).
To store same-length barcodes.
When should you use VARCHAR?
To store email addresses, which vary in size
To store peoples names, which vary in size
You'd usually want to use VARCHAR when storing most user input, because their input can vary in size.
Store as numbers
When possible, you should store data in numbers, because it can be searched and ordered more quickly. For example, IP address can be stored like this:
123.123.123.123
using VARCHAR(15). If you first translated it to an IPv4 Internet network address (in PHP, you can use the ip2long() function), you could store it as:
2071690107
using a SIGNED INT.
Choosing the right date/time type
In MySQL, there are 5 different date and time types:
DATE
TIME
DATETIME
TIMESTAMP
YEAR
DATE gives the date to the nearest day, TIME gives the time (but not the date) to the nearest second, DATETIME gives the date and time to the nearest second, TIMESTAMP gives the number of seconds since epoch, and YEAR gives the date to the nearest year.
YEAR only requires 1 byte of storage, so is the best option if you only need to know the year.
DATE and TIME are the next smallest type of date/time storage, requiring just 3 bytes.
TIMESTAMP requires 4 bytes.
DATETIME requires 8 bytes.
In almost any case, it is probably best to use TIMESTAMP, which gives the date and time to the nearest second.
I have also written a more in-depth series of articles about how efficient data storage can be achived with some background knowledge: the first part focuses on the storage of strings.
Tags: data, data types, MySQL, optimisation, space, storage












HoboBen Said,
September 3, 2007 @ 2:58 am
Another great article!
I must admit, I hate working with databases, but this has helped a lot.
TomJ Said,
September 5, 2007 @ 12:59 am
Great article. Clear, concise and informative.
Stephen G Said,
September 5, 2007 @ 5:01 am
> In almost any case, it is propably best to use TIMESTAMP,
> which gives the the date and time to the nearest second.
No recommendatoin for TIMESTAMP should be shown without an asterisk saying to be mindful that using TIMESTAMP type with the default (DEFAULT CURRENT_TIMESTAMP) can have unintended consequences. (i.e., will automatically update the first TIMESTAMP column on each insert / update.)
http://dev.mysql.com/doc/refman/5.0/en/timestamp.html
KiLVaiDeN Said,
September 5, 2007 @ 10:42 am
Cool list of tips, I didn't agree though with the conversion between ip char to integer, because it's quite useful to be able to see which ip is stored with a single look at the table and because you don't really need an integer for this information ( no ordering will likely ever occur ).
Besides, I think you didn't give enough informations about CHAR and VARCHAR. There is a performance impact on using VARCHAR against CHAR, because having variable space to store a string requires some more dynamic than having a fixed length field. Therefore, it looks like CHAR would be faster to process, even though it takes more space on the disk.
Then lastly, I'll also say that having an unsigned int instead of a medium int does impact the storage size, but doesn't necessarily impact the application performance, because the CPU doesn't need more cycles to compute larger integers, depending on the CPU registers size of course. So I'd recommend using unsigned int, because it's always a pain to alter tables the day you find out that you were greedy on the first CREATE scripts
Cheers
KiLVaiDeN
Dario Said,
September 5, 2007 @ 12:44 pm
1. Sorry for my english (I'm from Argentina).
2. I used for save date, INT UNSIGNED, and save from php as time(), then the date are saved before/after epoch time as positive or negative int.
I hope make sense
Lachlan Said,
September 5, 2007 @ 1:16 pm
Despite its 32-bit limitations, wouldn't a Unix timestamp be friendlier than the TIMESTAMP type, at least when ordering records? Or is there something 'Im missing.
Punch Said,
September 5, 2007 @ 1:55 pm
Use UNIX_TIMSTAMP() and FROM_UNIXTIME() to convert between the Unix timestamp and the MySQL one.
http://dev.mysql.com/doc/refman/5.0/en/date-and-time-functions.html
Pablo Said,
September 5, 2007 @ 2:24 pm
Frequently it's necessary to store hashes (MD5, SHA1); I've found two ways:
a) store the base64 representation (20% space saved)
b) use BINARY fields and store the binary representation, using mysql HEX/UNHEX and/or PHP pack('H*', $hash) / bin2hex (50% space saved)
D Said,
September 5, 2007 @ 5:20 pm
Instead of using a client language specific function to convert an IP address to a numeric, you could use the MySQL built int functions:
INET_ATON() Return the numeric value of an IP address
INET_NTOA() Return the IP address from a numeric value
Craig Francis Said,
September 5, 2007 @ 5:55 pm
Good list of tips.
For those commenter's mentioning using the timestamp… I would suggest using the DATETIME, mostly because its easier to read at a quick glance at the table, and its easy to pipe into PHP's strtotime(), and use that in the date() function to create any output format.
Also, if you are going to store dates after 2038, then its likely to have problems on 32bit systems when stored as an INT.
Barce Said,
September 5, 2007 @ 7:02 pm
This is definitely stuff to keep in mind, especially the part about unsigned ints.
But IMHO, I think that storing an IP as an int can be more of a readability hassle than it's worth.
The real killer problem isn't having optimal MySQL data types, it's having to run alter table commands on columns that are too small. I know it's really bad when bitwise operations are being done on an int column that should really be two or more columns. *shudder*
Thanks for the info!
Denis au fil du web » links for 2007-09-06 Said,
September 6, 2007 @ 1:49 am
[…] Making the web » Choosing Optimal MySQL Data Types Optimisation des types de données dans MySQL (tags: mysql datatype optimization howto) […]
links for 2007-09-06 « Donghai Ma Said,
September 6, 2007 @ 5:18 am
[…] Making the web » Choosing Optimal MySQL Data Types (tags: database design) […]
the contented Said,
September 9, 2007 @ 12:24 am
Hey, thank you very much for this great article! This is some information that could be useful for many, many people working with MySQL. Good work!
I want to add two MySQL datatypes: BLOB and TEXT. They can hold much more textual or binary data, but are therefore stored differently. It may cause performance issues when you query BLOB or TEXT fields very often, so many people recommend sourcing those columns out to extra tables and only query them when really needed.
Again, thanks for your post and have a lot of fun!
Greetings,
Felix Neumann
Metaholic » Blog Archive » Making the web » Choosing Optimal MySQL Data Types Said,
September 10, 2007 @ 11:30 pm
[…] Permalink | Source […]
PHP Sidor » Blog Archive » Att välja datatyp i MySQL Said,
September 13, 2007 @ 9:59 am
[…] Jag snubblade över lite tips när det gäller att välja datatyp i MySQL […]
Marc-André Said,
September 21, 2007 @ 1:35 pm
Thanks, your post became for me a fast reference.
Anonymous Said,
September 24, 2007 @ 2:59 am
I store all my strings as MD5 hashes because then they will be 32 characters in length and I can use the char type which is much more efficient.
Anonymous Said,
November 6, 2007 @ 3:19 pm
I'm kind of late to the game here, but if you're going to store an IP address as an int, wouldn't you want to use an UNSIGNED INT, rather than a SIGNed one, as suggested above? The upper value limit of a signed int 2,147,483,647, which only cover half of all possible IP addresses (you can't store anything >= 128.0.0.0), whereas of course, the unsigned int and ip address line up perfectly (as they're both 32-bit values).
Brendon Said,
November 6, 2007 @ 3:25 pm
I believe that the PHP function ip2long returns an integer which can be negative and positive - for this reason, you would need to use a SIGNED INT.
If you find evidence to suggest otherwise, please tell me.
Magnus Holmgren Said,
November 12, 2007 @ 12:54 pm
I'd like to point out that the next generation Internet Procotol, IPv6, uses 128 bit addresses. No newly written, modern applications should limit themselves to IPv4!
BCSEEATI Said,
November 27, 2007 @ 3:42 am
I found some typo.
According to the manual, SIGNED BIGINT range is -9223372036854775808 to 922337203685477580[7]. And UNSIGNED INT range is 0 to 4294967[2]95.
Brendon Said,
November 27, 2007 @ 8:04 am
Thanks, I've updated the article
Johny Said,
December 17, 2007 @ 5:21 pm
You can also find the mysql information
http://www.phpshore.com/mysql.html
Martijn Said,
February 17, 2008 @ 11:35 am
Made my day when setting up my first MySQL table ever
Big thanks.
Ido Perelmutter Said,
February 29, 2008 @ 6:51 pm
Great article. I work a lot with mySQL and I've never known almost everything in here.
Thanks!
sanjuro Said,
March 10, 2008 @ 11:08 am
Beside the obvious, is there a performance difference (speed, storage…) between using a signed and an unsigned integer ? This is what I would like to know. For instance if your values never exceed 99.
Vaibogam S Said,
March 31, 2008 @ 3:54 pm
What would be the best datatype to store 6digit unsigned number? Is it UNSIGNED SMALLINT?
Thanks!
Brendon Said,
March 31, 2008 @ 7:32 pm
@Vaibogam S, the maximum value for a 6-digit number would be 999,999, therefore making UNSIGNED MEDIUMINT the most suitable choice.
Ryan Said,
April 20, 2008 @ 3:22 pm
This is a fantastic article on MySQL data types. I will be referring to this for a long, long time. Great job!
Srinivasan Ramachandran Said,
April 22, 2008 @ 6:55 am
Neverthless to say, this is a very essential and also an eye opener to Database designing. Very helpful post. There are lot of them out there writing all the advanced stuff but this post just clears the fundamentals which is very much essential.
Regards,
Ashok Srinivasan.