myisam加载数据的性能问题。

现在遇到一个关于myisam的性能问题,想请教一下。
以下是问题的详细描述,希望得到你的指点。谢谢。
问题:

如何优化查询数据到临时表的速度?客户要求是最好能达到5秒以内。

只要速度能快,不一定非要采用myisam,还请指点。

数据规模:

单表,表字段有70几个,记录数数为6000W条。用了近60G的存储空间。单条记录为1k左右大小。

基于mysql 5.5的myisam存储,数据为自增ID为主键,同时按每三百W条记录进行分区。

遇到的问题

根据lucene中检索出来的1w条记录id,id是相当分散的,然后通过

create table temptable select * from datatable where id in(1,2,3,4,5.....)

方式,将数据存入临时表。

基本上每次操作的时间都会需要用到近30秒

下面是profile的日志情况。

+----------------------+-----------+

| Status | Duration |

+----------------------+-----------+

| starting | 0.002119 |

| checking permissions | 0.000013 |

| Opening tables | 0.000512 |

| System lock | 0.000030 |

| Table lock | 0.000020 |

| init | 0.002095 |

| creating table | 0.023468 |

| After create | 0.000792 |

| System lock | 0.000006 |

| Table lock | 0.000025 |

| optimizing | 0.000371 |

| statistics | 5.897464 |

| preparing | 0.000147 |

| executing | 0.000005 |

| Sending data | 24.350588 |

| end | 0.000009 |

| query end | 0.000006 |

| freeing items | 0.000463 |

| logging slow query | 0.000008 |

| logging slow query | 0.000005 |

| cleaning up | 0.000548 |

+----------------------+-----------+

21 rows in set (0.00 sec)

运行机器的io及内存情况。

硬盘的IO

Timing cached reads: 1930 MB in 2.00 seconds = 965.19 MB/sec

Timing buffered disk reads: 712 MB in 3.01 seconds = 236.93 MB/sec

内存:12G

myisam的配置情况。

# Example MySQL config file for medium systems.
#
# This is for a system with little memory (32M - 64M) where MySQL plays
# an important part, or systems up to 128M where MySQL is used together with
# other programs (such as a web server)
#
# You can copy this file to
# /etc/my.cnf to set global options,
# mysql-data-dir/my.cnf to set server-specific options (in this
# installation this directory is /data/site/database/mysql/var) or
# ~/.my.cnf to set user-specific options.
#
# In this file, you can use all long options that a program supports.
# If you want to know which options a program supports, run the program
# with the "--help" option.

# The following options will be passed to all MySQL clients
[client]
#password = your_password
port = 3306
socket = /tmp/mysql.sock

# Here follows entries for some specific programs

# The MySQL server
[mysqld]
port = 3306
socket = /tmp/mysql.sock
skip-external-locking
key_buffer_size = 512M
max_allowed_packet = 2M
table_open_cache = 128
sort_buffer_size = 1M
net_buffer_length = 8K
read_buffer_size = 512K
read_rnd_buffer_size = 512K
myisam_sort_buffer_size = 8M
#skip-innodb
# Don't listen on a TCP/IP port at all. This can be a security enhancement,
# if all processes that need to connect to mysqld run on the same host.
# All interaction with mysqld must be made via Unix sockets or named pipes.
# Note that using this option without enabling named pipes on Windows
# (via the "enable-named-pipe" option) will render mysqld useless!
#
#skip-networking

# Replication Master Server (default)
# binary logging is required for replication
log-bin=mysql-bin

# binary logging format - mixed recommended
binlog_format=mixed

# required unique id between 1 and 2^32 - 1
# defaults to 1 if master-host is not set
# but will not function as a master if omitted
server-id = 1

# Replication Slave (comment out master section to use this)
#
# To configure this host as a replication slave, you can choose between
# two methods :
#
# 1) Use the CHANGE MASTER TO command (fully described in our manual) -
# the syntax is:
#
# CHANGE MASTER TO MASTER_HOST=, MASTER_PORT=,
# MASTER_USER=, MASTER_PASSWORD= ;
#
# where you replace , , by quoted strings and
# by the master's port number (3306 by default).
#
# Example:
#
# CHANGE MASTER TO MASTER_HOST='125.564.12.1', MASTER_PORT=3306,
# MASTER_USER='joe', MASTER_PASSWORD='secret';
#
# OR
#
# 2) Set the variables below. However, in case you choose this method, then
# start replication for the first time (even unsuccessfully, for example
# if you mistyped the password in master-password and the slave fails to
# connect), the slave will create a master.info file, and any later
# change in this file to the variables' values below will be ignored and
# overridden by the content of the master.info file, unless you shutdown
# the slave server, delete master.info and restart the slaver server.
# For that reason, you may want to leave the lines below untouched
# (commented) and instead use CHANGE MASTER TO (see above)
#
# required unique id between 2 and 2^32 - 1
# (and different from the master)
# defaults to 2 if master-host is set
# but will not function as a slave if omitted
#server-id = 2
#
# The replication master for this slave - required
#master-host =
#
# The username the slave will use for authentication when connecting
# to the master - required
#master-user =
#
# The password the slave will authenticate with when connecting to
# the master - required
#master-password =
#
# The port the master is listening on.
# optional - defaults to 3306
#master-port =
#
# binary logging - not required for slaves, but recommended
#log-bin=mysql-bin

# Point the following paths to different dedicated disks
#tmpdir = /tmp/
#log-bin = /path-to-dedicated-directory/hostname

# Uncomment the following if you are using InnoDB tables
#innodb_data_home_dir = /data/site/database/mysql/var/
#innodb_data_file_path = ibdata1:10M:autoextend
#innodb_log_group_home_dir = /data/site/database/mysql/var/
# You can set .._buffer_pool_size up to 50 - 80 %
# of RAM but beware of setting memory usage too high
#innodb_buffer_pool_size = 16M
#innodb_additional_mem_pool_size = 2M
# Set .._log_file_size to 25 % of buffer pool size
#innodb_log_file_size = 5M
#innodb_log_buffer_size = 8M
#innodb_flush_log_at_trx_commit = 1
#innodb_lock_wait_timeout = 50

[mysqldump]
quick
max_allowed_packet = 16M

[mysql]
no-auto-rehash
# Remove the next comment character if you are not familiar with SQL
#safe-updates

#*** MyISAM Specific options

# Size of the Key Buffer, used to cache index blocks for MyISAM tables.
# Do not set it larger than 30% of your available memory, as some memory
# is also required by the OS to cache rows. Even if you're not using
# MyISAM tables, you should still set it to 8-64M as it will also be
# used for internal temporary disk tables.
key_buffer_size = 64M

# Size of the buffer used for doing full table scans of MyISAM tables.
# Allocated per thread, if a full scan is needed.
read_buffer_size = 4M

# When reading rows in sorted order after a sort, the rows are read
# through this buffer to avoid disk seeks. You can improve ORDER BY
# performance a lot, if set this to a high value.
# Allocated per thread, when needed.
read_rnd_buffer_size = 32M

# MyISAM uses special tree-like cache to make bulk inserts (that is,
# INSERT ... SELECT, INSERT ... VALUES (...), (...), ..., and LOAD DATA
# INFILE) faster. This variable limits the size of the cache tree in
# bytes per thread. Setting it to 0 will disable this optimisation. Do
# not set it larger than "key_buffer_size" for optimal performance.
# This buffer is allocated when a bulk insert is detected.
bulk_insert_buffer_size = 128M

# This buffer is allocated when MySQL needs to rebuild the index in
# REPAIR, OPTIMIZE, ALTER table statements as well as in LOAD DATA INFILE
# into an empty table. It is allocated per thread so be careful with
# large settings.
myisam_sort_buffer_size = 256M

# The maximum size of the temporary file MySQL is allowed to use while
# recreating the index (during REPAIR, ALTER TABLE or LOAD DATA INFILE.
# If the file-size would be bigger than this, the index will be created
# through the key cache (which is slower).
myisam_max_sort_file_size = 10G

# If a table has more than one index, MyISAM can use more than one
# thread to repair them by sorting in parallel. This makes sense if you
# have multiple CPUs and plenty of memory.
myisam_repair_threads = 1

# Automatically check and repair not properly closed MyISAM tables.
myisam_recover

[myisamchk]
key_buffer_size = 512M
sort_buffer_size = 512M
read_buffer = 8M
write_buffer = 8M

[mysqlhotcopy]
interactive-timeout

Taxonomy upgrade extras:

1w条记录 =1w次 磁盘寻道 =0.02s * 1w = 20s

而60G数据在普通单台pc server上基本不可能完全载入内存

也许用多台server搞定,用gridsql之类的东西分布式查询
也许上SSD可以救你,
也许上小型机也能解决。(也许在外加个oracle,再加磁盘阵列)
但我没用过,只是一堆假设,我是来误人子弟的,哈哈

ssd应该可以,我们用了ssd之后速度快了很多。而且现在80G的ssd也很便宜。

可以试试分布式的 Key-Value 数据库,比如 Cassandra, 如果只是按主键查找速度是相当快的.