sogo/Misc/db/tests/helge/NOTES

# $Id$
Issues
======
- folder generation?
- adding up the item generations does not work, because records can also be
  deleted
- separate table for "proposed" appointments?
  - so that we do not have to send the appointments and they only show up in
    the scheduler once they become current
Move Databases
==============
HOST="localhost"
USER="agenor"
DB="blah2"
DB="agenor%i" % ( i, )
DB="agenortabledb"
NEWTABLE="agenor_tab_%i" % ( i, )
DB="agenor_fldinfodb"
DB="agenor_testhugeperf"
Schemas
=======
CREATE TABLE SOGo_folder_info (
  c_foldername VARCHAR(255) NOT NULL,
  c_tablename  VARCHAR(255) NOT NULL,
  c_dbname     VARCHAR(255) NOT NULL
);

INSERT INTO SOGo_folder_info (c_foldername, c_tablename, c_dbname)
  VALUES ('hh calendar', 'agenor_tab_105', 'agenortabledb');

CREATE UNIQUE INDEX SOGo_folder_info_idx
  ON SOGo_folder_info
  USING BTREE (c_foldername);
(ca 5s)
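Sketch of the folder lookup that gets timed in the Performance section below
("c_tablename, c_dbname | c_foldername=..."); not from the original test
scripts, just an illustration assuming psycopg2 and the connection values
from "Move Databases":

import time
import psycopg2

conn = psycopg2.connect(host="localhost", user="agenor",
                        dbname="agenor_fldinfodb")
cur = conn.cursor()

t0 = time.time()
cur.execute(
    "SELECT c_tablename, c_dbname FROM SOGo_folder_info"
    " WHERE c_foldername = %s",
    ("privcal_99827",))
print(cur.fetchone())             # -> (c_tablename, c_dbname) or None
print("%.1fms" % ((time.time() - t0) * 1000.0))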
---
CREATE TABLE SOGo_huge_ical (
  c_pkey INT PRIMARY KEY,
  c_data VARCHAR(32000) NOT NULL
);

CREATE TABLE SOGo_huge_quick (
  c_pkey       INT PRIMARY KEY,
  c_sourceid   VARCHAR(255) NOT NULL,
  c_startdate  INT NOT NULL,
  c_enddate    INT NOT NULL,
  c_title      VARCHAR(255) NOT NULL,
  c_attendees  VARCHAR(4096) NOT NULL,
  c_isallday   INT NOT NULL,
  c_sequenceid INT NOT NULL,
  c_generation INT NOT NULL
);

CREATE INDEX SOGo_huge_quick_idx
  ON SOGo_huge_quick
  USING BTREE (c_startdate);
(ca. 15s on 1,000,000 rows)
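Sketch of the date-range query behind the "c_start>x & c_end<x+10s" timings
in the Performance section (assumes psycopg2; x is just an example start
timestamp in seconds):

import time
import psycopg2

conn = psycopg2.connect(host="localhost", user="agenor",
                        dbname="agenor_testhugeperf")
cur = conn.cursor()

x = 1150000000                    # example epoch timestamp
t0 = time.time()
cur.execute(
    "SELECT c_pkey, c_sourceid, c_startdate, c_enddate"
    " FROM SOGo_huge_quick"
    " WHERE c_startdate > %s AND c_enddate < %s",
    (x, x + 10))
rows = cur.fetchall()
print("%d rows in %.0fms" % (len(rows), (time.time() - t0) * 1000.0))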
---
Performance
===========
agenor_fldinfodb:
  10000 folder info entries, no index: 71s
  100000 folder info entries: 12:09m, 729s, 137 inserts/s
    5992424 => 6001088, diff 8664KB data size (from ca. 16000 to 110000)
    ~94 bytes per row (raw: ~12+14+13 = 39 bytes)
  110001-250000 folder info entries: 15:59m, 959s, 145 inserts/s
    6001088 => 6014316, diff 13228KB data size (~96 bytes per row)
  without index, via Python:
    COUNT(*) => 360ms
    c_tablename, c_dbname | * => 1128ms
    c_tablename, c_dbname | c_foldername='privcal_99827' => 345ms
  with index, via Python:
    COUNT(*) => 350ms
    c_tablename, c_dbname | * => 1124ms
    c_tablename, c_dbname | c_foldername='privcal_99827' => 18ms, 5ms, 5ms
agenor_testhugeperf:
  1,000,000 entries, no index:
    10000=79.37s, 20000=162s, 30000=245s, 340000=2831s, 790000=6670s
    (~120 rows per sec) => ca. 2h for 1,000,000, 20h for 10,000,000
    30000=65MB => ~2KB per record => ~2GB for 1,000,000
    220000=440MB, 810000=1.55GB, 1,000,000=
    ~1.92GB x 20 = 40GB
    duration: ~2:50h
  without index, via Python:
    COUNT(*) => 20.8s
    pkey,sourceid,start,end | c_start>x & c_end<x+10s => 13.4s
    c_sourceid | * => 10.5s
  with index, via Python:
    COUNT(*) => 9.7s, 3s, 2.5s
    pkey,sourceid,start,end | c_start>x & c_end<x+10s => 86ms, 59ms, 17ms, 17ms
    c_sourceid | * => 9.3s, 4.8s, 4.8s
  vacuum analyze: 30s?
  without fsync, 2 concurrent connections:
  - 1108 rows per sec in a single connection! (1,000,000 = ca. 15min)
    53:59m
    53:52m
  ~8 concurrent connections:
    ~20:00:-
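Rough sketch of how the inserts/s numbers above can be measured (assumes
psycopg2; the single commit for the whole batch is an assumption of this
sketch, not from the original test scripts - fsync=off on the server is the
factor noted above for the ~1100 rows/s figure):

import time
import psycopg2

conn = psycopg2.connect(host="localhost", user="agenor",
                        dbname="agenor_fldinfodb")
cur = conn.cursor()

n = 10000
t0 = time.time()
for i in range(n):
    cur.execute(
        "INSERT INTO SOGo_folder_info (c_foldername, c_tablename, c_dbname)"
        " VALUES (%s, %s, %s)",
        ("privcal_%i" % (i,), "agenor_tab_%i" % (i,), "agenortabledb"))
conn.commit()                     # one commit for the whole batch
dt = time.time() - t0
print("%d inserts in %.1fs, %.0f inserts/s" % (n, dt, n / dt))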
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html
http://www.lyris.com/lm_help/7.8/tuning_postgresql.html
http://www.linuxjournal.com/article.php?sid=4791
http://www.argudo.org/postgresql/soft-tuning.php#s2
PostgreSQL Index
================
http://www.postgresql.org/docs/current/static/sql-createindex.html
http://postgis.refractions.net/docs/x511.html
"After building an index, it is important to force PostgreSQL to collect table statistics, which are used to optimize query plans: VACUUM ANALIZE;"
PostgreSQL provides the index methods B-tree, R-tree, hash, and GiST
- only the B-tree and GiST index methods support multicolumn indexes
---snip---
An index field can be an expression computed from the values of one or more
columns of the table row. This feature can be used to obtain fast access to
data based on some transformation of the basic data. For example, an index
computed on upper(col) would allow the clause WHERE upper(col) = 'JIM' to use
an index.
---snap---
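Applied to the schema above, an expression index could make a
case-insensitive folder lookup indexable (illustration only, not part of the
SOGo schema; assumes psycopg2):

import psycopg2

conn = psycopg2.connect(host="localhost", user="agenor",
                        dbname="agenor_fldinfodb")
cur = conn.cursor()
cur.execute(
    "CREATE INDEX SOGo_folder_info_lower_idx"
    " ON SOGo_folder_info (lower(c_foldername))")
cur.execute(
    "SELECT c_tablename, c_dbname FROM SOGo_folder_info"
    " WHERE lower(c_foldername) = lower(%s)",
    ("PrivCal_99827",))
print(cur.fetchone())
conn.commit()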
---snip---
Note: Because of the limited utility of hash indexes, a B-tree index should
generally be preferred over a hash index. We do not have sufficient evidence
that hash indexes are actually faster than B-trees even for = comparisons.
Moreover, hash indexes require coarser locks; see Section 9.7.
---snap---
PostgreSQL Arrays
=================
---snip---
It all depends, I have found array operations to be slow. So if you have just
a few elements, like less than 10 then arrays are ok, but with more e.g. 30+
elements, as in my case, imho the whole exercise is not really feasible. I am
going to re-design the schema to get rid of the arrays as soon as I have a
moment. Also the code around that part of PostgreSQL has not been visited for
a fair while and needs some polishing up. I'd avoid them.
---snap---