This article describes how to use regexes, some bash tricks, and bad voodoo to split a SQL file into pieces in order to get it into MySQL in a reasonable amount of time. Parsing SQL is notoriously painful and error-prone, so if you spot mistakes here (I'm sure there are some), please comment!
I recently got called in to work on a project for a customer I had never worked with before. It's a very well-known media / retail company, and the job was to build a new interface for a media player / video library they required. Pretty standard stuff: some views, a couple of quicktabs, a bit of ajax coding, and a lot of pixel-f**king. Anyway, near the end of the week-long project, when it came time to stage it, I realized I had a pretty big problem. The site had 2 million users and several tables of profile data and other information. My SQL dump file was over 3GB and took 6-8 hours to import. Eek!
So, knowing that I wasn't going to write my update functions 100% perfectly the first time, and that I would likely have to import several times until I was confident everything worked, I needed to find a way to get the DB to import in a more reasonable amount of time. All I had at this point was an SQL dump of the production DB and six hours to get the code ready.
Here's what I did:
(Keep in mind this is on OSX; the GNU tools may be slightly different.)
split -a5 -p '^DROP TABLE IF EXISTS [^;]+;$' db_backup.sql _db_split
What this does is look for lines matching DROP TABLE IF EXISTS ...; (mysqldump writes one before each table), start a new file at each match, and write the pieces out as _db_splitXXXXX, where XXXXX is a unique suffix. The -p flag tells BSD split to break on a pattern rather than a size, and -a5 makes the suffix five characters long so there are enough names for all the tables.
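(If you're on Linux, GNU split has no pattern option, but csplit can do roughly the same job. A sketch, untested here: -f sets the prefix, -n the suffix width, and -z drops any empty pieces.)

csplit -z -n 5 -f _db_split db_backup.sql '/^DROP TABLE IF EXISTS/' '{*}'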
Okay, now I have a bunch of files that look like this:
ls -l _db_split*
-rw-r--r--  1 jacob  staff     837 Jul 16 23:24 _db_splitaaaaa
-rw-r--r--  1 jacob  staff     679 Jul 16 23:24 _db_splitaaaab
-rw-r--r--  1 jacob  staff     937 Jul 16 23:24 _db_splitaaaac
-rw-r--r--  1 jacob  staff    1666 Jul 16 23:24 _db_splitaaaad
-rw-r--r--  1 jacob  staff     601 Jul 16 23:24 _db_splitaaaae
-rw-r--r--  1 jacob  staff   99276 Jul 16 23:24 _db_splitaaaaf
Random suffixes aren't much use, so the next step renames each piece after the table it contains, by grepping the CREATE TABLE statement out of it (the header chunk before the first table has no CREATE TABLE, so it ends up with the bare name dump_.sql):

for f in _db_split*; do new_f=$(egrep 'CREATE TABLE `([a-z0-9_]+)' -o $f | cut -d' ' -f 3 | cut -c2-100); mv $f dump_$new_f.sql; done;
-rw-r--r--  1 jacob  staff     679 Jul 16 23:24 dump_access.sql
-rw-r--r--  1 jacob  staff     937 Jul 16 23:24 dump_accesslog.sql
-rw-r--r--  1 jacob  staff    1666 Jul 16 23:24 dump_actions.sql
-rw-r--r--  1 jacob  staff     601 Jul 16 23:24 dump_actions_aid.sql
-rw-r--r--  1 jacob  staff   99276 Jul 16 23:24 dump_apachesolr_search_node.sql
-rw-r--r--  1 jacob  staff     728 Jul 16 23:24 dump_authmap.sql
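Before going further, a quick sanity check doesn't hurt: the split is lossless, so the pieces should still add up to the original byte count (assuming the rename didn't clobber anything):

cat dump_*.sql | wc -c
wc -c < db_backup.sql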
Tables like cache, watchdog, and the audit log only need their structure, not their data. mysqldump writes a "Dumping data for table" comment between each CREATE TABLE and its INSERTs, so this function truncates a dump file at that marker:

function sql_remove_insert() { out=$(cat $1 | ruby -e 'puts STDIN.read.split("Dumping data for table")[0]'); echo "$out" > $1; }

for t in dump_cache* dump_watchdog* dump_settings_audit_log_*; do sql_remove_insert $t; done;
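If you don't have Ruby handy, plain sed can do the same truncation in place (BSD sed shown, hence the empty -i argument; GNU sed takes just -i):

sed -i '' '/Dumping data for table/,$d' dump_cache.sql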
The users tables still held a couple million rows, so this last hack strips out every extended-insert tuple whose first column (the uid) is five or more digits long, which, with rows in uid order, keeps only users with a uid under 10,000:

for t in {dump_users,dump_users_roles,dump_users_uuid,dump_password_reset_users}; do perl -pi -e 's/,\([0-9]{5,10},(?!\),).*\),/,/g' $t.sql; done;
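From here, getting the trimmed dump back into MySQL is just a matter of feeding the pieces to the client (a sketch; devdb is a placeholder for your dev database):

cat dump_*.sql | mysql -uroot devdb

Or one table at a time, so a failure only forces you to re-run one piece:

for f in dump_*.sql; do echo "importing $f"; mysql -uroot devdb < "$f"; done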
Comments
mysqldump arguments that can help
I use the following arguments with mysqldump:
alias dump_mysql='mysqldump -Q --add-drop-table -c --order-by-primary --extended-insert=FALSE --no-create-db=TRUE'
There are two key parts: disabling extended inserts (--extended-insert=FALSE) and using the longer "complete" insert statements (-c). Together these produce a complete INSERT statement for every individual record, which makes grep calls *much* easier to do.
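To see why that helps: by default mysqldump packs many rows into each statement, while the two flags above give you one self-describing statement per row. Illustrative output (made-up rows, not from a real dump):

INSERT INTO `users` VALUES (1,'alice'),(2,'bob'),(3,'carol');

versus

INSERT INTO `users` (`uid`, `name`) VALUES (1,'alice');
INSERT INTO `users` (`uid`, `name`) VALUES (2,'bob');
INSERT INTO `users` (`uid`, `name`) VALUES (3,'carol');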
Yes, that's true. In fact, if I had access to the DB to make myself another dump (itself a fairly long process), I would probably use --tables and --no-data and a few other tricks. Sadly, that wasn't the case here :)
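For the record, that would look something like this (a sketch; proddb and the table names are placeholders):

mysqldump --no-data proddb > schema.sql
mysqldump proddb --tables users users_roles > data.sql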
6-8hrs for 3GB??? That's unbelievably slow. It sounds like you are running the stock my.cnf. Here's what I use for my dev machine (2GB RAM):
[mysqld]
key_buffer = 64M
join_buffer_size = 512K
max_allowed_packet = 16M
query_cache_size = 64M
table_cache = 512
sort_buffer_size = 64K
read_buffer_size = 256K
read_rnd_buffer_size = 256K
net_buffer_length = 2K
thread_stack = 128K
tmp_table_size = 96M
max_heap_table_size = 96M
innodb_buffer_pool_size = 128M
innodb_additional_mem_pool_size = 2M
innodb_flush_log_at_trx_commit = 0
innodb_flush_method = O_DIRECT
I'd recommend spending an hour finding out what these settings do. It will save you hours of waiting around for MySQL to do its thing (or writing parsing scripts). If you've got more RAM, jack up innodb_buffer_pool_size until your active data set fits in RAM.
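To find out how big that active data set actually is, you can sum the data and index sizes from information_schema (a stock MySQL query; swap in your own credentials):

mysql -uroot -e 'SELECT table_schema, ROUND(SUM(data_length + index_length)/1024/1024) AS total_mb FROM information_schema.tables GROUP BY table_schema;'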