sogo/Misc/db/tests/helge/NOTES

# $Id$
Issues
======
- folder generation?
- adding up the item generations does not work, because records can also be
  deleted
- separate table for "proposed" appointments?
  - so that we do not have to send the appointments and they only show up in
    the scheduler once they become current
Move Databases
==============
HOST="localhost"
USER="agenor"
DB="blah2"
DB="agenor%i" % ( i, )
DB="agenortabledb"
NEWTABLE="agenor_tab_%i" % ( i, )
DB="agenor_fldinfodb"
DB="agenor_testhugeperf"
Schemas
=======
CREATE TABLE SOGo_folder_info (
  c_foldername VARCHAR(255) NOT NULL,
  c_tablename  VARCHAR(255) NOT NULL,
  c_dbname     VARCHAR(255) NOT NULL
);

INSERT INTO SOGo_folder_info (c_foldername, c_tablename, c_dbname)
  VALUES ('hh calendar', 'agenor_tab_105', 'agenortabledb');

CREATE UNIQUE INDEX SOGo_folder_info_idx
  ON SOGo_folder_info
  USING BTREE (c_foldername);
(ca 5s)
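Sketch of the folder lookup that gets timed in the Performance section below
("c_tablename, c_dbname | c_foldername=..."); not from the original test
scripts, just an illustration assuming psycopg2 and the connection values
from "Move Databases":

import time
import psycopg2

conn = psycopg2.connect(host="localhost", user="agenor",
                        dbname="agenor_fldinfodb")
cur = conn.cursor()

t0 = time.time()
cur.execute(
    "SELECT c_tablename, c_dbname FROM SOGo_folder_info"
    " WHERE c_foldername = %s",
    ("privcal_99827",))
print(cur.fetchone())             # -> (c_tablename, c_dbname) or None
print("%.1fms" % ((time.time() - t0) * 1000.0))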
---
CREATE TABLE SOGo_huge_ical (
  c_pkey INT PRIMARY KEY,
  c_data VARCHAR(32000) NOT NULL
);

CREATE TABLE SOGo_huge_quick (
  c_pkey       INT PRIMARY KEY,
  c_sourceid   VARCHAR(255) NOT NULL,
  c_startdate  INT NOT NULL,
  c_enddate    INT NOT NULL,
  c_title      VARCHAR(255) NOT NULL,
  c_attendees  VARCHAR(4096) NOT NULL,
  c_isallday   INT NOT NULL,
  c_sequenceid INT NOT NULL,
  c_generation INT NOT NULL
);

CREATE INDEX SOGo_huge_quick_idx
  ON SOGo_huge_quick
  USING BTREE (c_startdate);
(ca. 15s on 1,000,000 rows)
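Sketch of the date-range query behind the "c_start>x & c_end<x+10s" timings
in the Performance section (assumes psycopg2; x is just an example start
timestamp in seconds):

import time
import psycopg2

conn = psycopg2.connect(host="localhost", user="agenor",
                        dbname="agenor_testhugeperf")
cur = conn.cursor()

x = 1150000000                    # example epoch timestamp
t0 = time.time()
cur.execute(
    "SELECT c_pkey, c_sourceid, c_startdate, c_enddate"
    " FROM SOGo_huge_quick"
    " WHERE c_startdate > %s AND c_enddate < %s",
    (x, x + 10))
rows = cur.fetchall()
print("%d rows in %.0fms" % (len(rows), (time.time() - t0) * 1000.0))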
---
Performance
===========
agenor_fldinfodb:
  10000 folder info entries, no index: 71s
  100000 folder info entries: 12:09m, 729s, 137 inserts/s
    5992424 => 6001088, diff 8664KB data size (from ca. 16000 to 110000)
    ~94 bytes per row (raw: ~12+14+13 = 39 bytes)
  110001-250000 folder info entries: 15:59m, 959s, 145 inserts/s
    6001088 => 6014316, diff 13228KB data size (~96 bytes per row)
  without index, via Python:
    COUNT(*) => 360ms
    c_tablename, c_dbname | * => 1128ms
    c_tablename, c_dbname | c_foldername='privcal_99827' => 345ms
  with index, via Python:
    COUNT(*) => 350ms
    c_tablename, c_dbname | * => 1124ms
    c_tablename, c_dbname | c_foldername='privcal_99827' => 18ms, 5ms, 5ms
agenor_testhugeperf:
  1,000,000 entries, no index:
    10000=79.37s, 20000=162s, 30000=245s, 340000=2831s, 790000=6670s
    (~120 rows per sec) => ca. 2h for 1,000,000, 20h for 10,000,000
    30000=65MB => ~2KB per record => ~2GB for 1,000,000
    220000=440MB, 810000=1.55GB, 1,000,000=
    ~1.92GB x 20 = 40GB
    duration: ~2:50h
  without index, via Python:
    COUNT(*) => 20.8s
    pkey,sourceid,start,end | c_start>x & c_end<x+10s => 13.4s
    c_sourceid | * => 10.5s
  with index, via Python:
    COUNT(*) => 9.7s, 3s, 2.5s
    pkey,sourceid,start,end | c_start>x & c_end<x+10s => 86ms, 59ms, 17ms, 17ms
    c_sourceid | * => 9.3s, 4.8s, 4.8s
  vacuum analyze: 30s?
  without fsync, 2 concurrent connections:
  - 1108 rows per sec in a single connection! (1,000,000 = ca. 15min)
    53:59m
    53:52m
  ~8 concurrent connections:
    ~20:00:-
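Rough sketch of how the inserts/s numbers above can be measured (assumes
psycopg2; the single commit for the whole batch is an assumption of this
sketch, not from the original test scripts - fsync=off on the server is the
factor noted above for the ~1100 rows/s figure):

import time
import psycopg2

conn = psycopg2.connect(host="localhost", user="agenor",
                        dbname="agenor_fldinfodb")
cur = conn.cursor()

n = 10000
t0 = time.time()
for i in range(n):
    cur.execute(
        "INSERT INTO SOGo_folder_info (c_foldername, c_tablename, c_dbname)"
        " VALUES (%s, %s, %s)",
        ("privcal_%i" % (i,), "agenor_tab_%i" % (i,), "agenortabledb"))
conn.commit()                     # one commit for the whole batch
dt = time.time() - t0
print("%d inserts in %.1fs, %.0f inserts/s" % (n, dt, n / dt))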
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html
http://www.lyris.com/lm_help/7.8/tuning_postgresql.html
http://www.linuxjournal.com/article.php?sid=4791
http://www.argudo.org/postgresql/soft-tuning.php#s2
PostgreSQL Index
================
http://www.postgresql.org/docs/current/static/sql-createindex.html
http://postgis.refractions.net/docs/x511.html
"After building an index, it is important to force PostgreSQL to collect table statistics, which are used to optimize query plans: VACUUM ANALIZE;"
PostgreSQL provides the index methods B-tree, R-tree, hash, and GiST
- only the B-tree and GiST index methods support multicolumn indexes
---snip---
An index field can be an expression computed from the values of one or more
columns of the table row. This feature can be used to obtain fast access to
data based on some transformation of the basic data. For example, an index
computed on upper(col) would allow the clause WHERE upper(col) = 'JIM' to use
an index.
---snap---
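Applied to the schema above, an expression index could make a
case-insensitive folder lookup indexable (illustration only, not part of the
SOGo schema; assumes psycopg2):

import psycopg2

conn = psycopg2.connect(host="localhost", user="agenor",
                        dbname="agenor_fldinfodb")
cur = conn.cursor()
cur.execute(
    "CREATE INDEX SOGo_folder_info_lower_idx"
    " ON SOGo_folder_info (lower(c_foldername))")
cur.execute(
    "SELECT c_tablename, c_dbname FROM SOGo_folder_info"
    " WHERE lower(c_foldername) = lower(%s)",
    ("PrivCal_99827",))
print(cur.fetchone())
conn.commit()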
---snip---
Note: Because of the limited utility of hash indexes, a B-tree index should
generally be preferred over a hash index. We do not have sufficient evidence
that hash indexes are actually faster than B-trees even for = comparisons.
Moreover, hash indexes require coarser locks; see Section 9.7.
---snap---
PostgreSQL Arrays
=================
---snip---
It all depends, I have found array operations to be slow. So if you have just
a few elements, like less than 10 then arrays are ok, but with more e.g. 30+
elements, as in my case, imho the whole exercise is not really feasible. I am
going to re-design the schema to get rid of the arrays as soon as I have a
moment. Also the code around that part of PostgreSQL has not been visited for
a fair while and needs some polishing up. I'd avoid them.
---snap---