this morning it was a sad moment. Normally people ask me all the time about SQL optimization or SQL tuning or when to create an index and so on.
This morning i had another DOOOH moment.
Basically while trying to optimize BinBase even further I noticed that some queries take an absurd amount of time to finish. Like 500 seconds to query the bin table.
So I started to investigate and discovered that some of my queries are using seq scans over huge tables. For no apparent reason.
It turned out that the sort column was not indexed…
CREATE INDEX bin_retention_index ON bin (retention_index ASC NULLS LAST);
which was greatly improved after executing this statement.
Now there was still a huge wait time for a certain query
explain select * from BIN where bin_id not in ( SELECT bin_id from SPECTRA where sample_id = 1630733 and bin_id is not null )ORDER BY retention_index ASC
and it turned out that indexes were never used for the ‘is not null’ part. After some research and a lot of head scratching it turned out that postgres supports partial indexes for exactly this case.
create index spectra_bin_id_partial_is_not_null on spectra (bin_id) where bin_id is not null
Now afterward we some improvement but the actually slow data access is caused by something else. We always need the complete table – a few compounds and this operation takes some time.
Time to optimize the internal database lookup…