postgresql - One large table or many small ones in database? -

say want create typical todo-webapp using db postgresql. user should able create todo-lists. on lists should able make actual todo-entries.

i regard todo-list object has different properties owner, name, etc, , of course actual todo-entries have own properties content, priority, date ... .

my idea create table todo-lists of users. in table store attributes of each list. questions arises how store todo-entries themselves? of course in additional table, should rather:

1. create 1 big table entries , have field storing id of todo-list belong to, so:

todo-list: id, owner, ... todo-entries: list.id, content, ...

which give 2 tables in total. todo-entries table large. although know entries expire, hence table grows more usage not on time. write select * todo-entries todo-list-id=id id of list trying retrieve.

2. create todo-entries table on per user basis.

todo-list: id, owner, ... todo-entries-owner: list.id, content,. ..

number of entries table depends on number of users in system. select * todo-entries-owner. mid-sized tables depending on number of entries users in total.

3. create 1 todo-entries-table for each todo-list , store generated table name in field table. instance use todos-list unique id in table name like:

todo-list: id, owner, entries-list-name, ...     todo-entries-id: content, ... //the id part id todo-list id field.

in third case potentially have quite large number of tables. user might create many 'short' todo-lists. retrieve list go along lines select * todo-entries-id todo-entries-id should either field in todo-list or done implicitly concatenating 'todo-entries' todos-list unique id. btw.: how do that, should done in js or can done in postgresql directly? , related this: in select * <tablename> statement, possible have value of field of other table <tablename>? select * todo-list(id).entries-list-name or so.

the 3 possibilities go few large many small tables. personal feeling second or third solutions better. think might scale better. i'm not sure quite sure of , know 'typical' approach is.

i go more in depth of think of each of approaches, point of question:

which of 3 possibilities should go for? (or else, has normalization?)

follow up:

what (postgresql) statements like?

the viable option first. far easier manage , faster other options.

image have 1 million users, average of 3 to-do lists each, average of 5 entries per list.

scenario 1

in first scenario have 3 tables:

todo_users: 1 million records
todo_lists: 3 million records
todo_entries: 15 million records

such table sizes no problem postgresql , right indexes able retrieve data in less second (meaning simple queries; if queries become more complex (like: me todo_entries longest todo_list of top 15% of todo_users have made less 3 todo_lists in 3-month period highest todo_entries entered) slower (as in other scenarios). queries straightforward:

-- find user data based on username entered in web site -- index on 'username' essential here select * todo_users username = ?;  -- find to-do lists user userid has been retrieved previous query select * todo_lists userid = ?;  -- find entries to-do list based on todoid select * todo_entries listid = ?;

you can combine 3 queries one:

select u.*, l.*, e.* -- or select appropriate columns 3 tables todo_users u left join todo_lists l on l.userid = u.id left join todo_entries e on e.listid = l.id u.username = ?;

use of left joins means data users without lists or lists without entries (but column values null).

inserting, updating , deleting records can done similar statements , fast.

postgresql stores data on "pages" (typically 4kb in size) , pages filled, thing because reading writing page very slow compared other operations.

scenario 2

in scenario need 2 tables per user (todo_lists , todo_entries) need mechanism identify tables query.

1 million todo_lists tables few records each
1 million todo_entries tables few dozen records each

the practical solution construct full table names "basename" related username or other persistent authentication data web site. this:

username = 'jerry'; todo_list = username + '_lists'; todo_entries = username + '_entries';

and query table names. more need todo_users table anyway store personal data, usernames , passwords of 1 million users.

in cases tables small , postgresql not use indexes (nor have to). have more trouble finding appropriate tables, though, , build queries in code , feed them postgresql, meaning cannot optimize query plan. bigger problem creating tables new users (todo_list , todo_entries) or deleting obsolete lists or users. typically requires behind-the scenes housekeeping avoid previous scenario. , biggest performance penalty pages have little content waste disk space , lots of time reading , writing partially filled pages.

scenario 3

this scenario worse scenario 2. don't it, it's madness.

3 million tables todo_entries few records each

so...

stick option 1. real option.

Search This Blog

Ruby Code