postgresql - One large table or many small ones in database? -
say want create typical todo-webapp using db postgresql. user should able create todo-lists. on lists should able make actual todo-entries.
i regard todo-list object has different properties owner, name, etc, , of course actual todo-entries have own properties content, priority, date ... .
my idea create table todo-lists of users. in table store attributes of each list. questions arises how store todo-entries themselves? of course in additional table, should rather:
1. create 1 big table entries , have field storing id of todo-list belong to, so:
todo-list: id, owner, ... todo-entries: list.id, content, ... which give 2 tables in total. todo-entries table large. although know entries expire, hence table grows more usage not on time. write select * todo-entries todo-list-id=id id of list trying retrieve.
or
2. create todo-entries table on per user basis.
todo-list: id, owner, ... todo-entries-owner: list.id, content,. .. number of entries table depends on number of users in system. select * todo-entries-owner. mid-sized tables depending on number of entries users in total.
or
3. create 1 todo-entries-table for each todo-list , store generated table name in field table. instance use todos-list unique id in table name like:
todo-list: id, owner, entries-list-name, ... todo-entries-id: content, ... //the id part id todo-list id field. in third case potentially have quite large number of tables. user might create many 'short' todo-lists. retrieve list go along lines select * todo-entries-id todo-entries-id should either field in todo-list or done implicitly concatenating 'todo-entries' todos-list unique id. btw.: how do that, should done in js or can done in postgresql directly? , related this: in select * <tablename> statement, possible have value of field of other table <tablename>? select * todo-list(id).entries-list-name or so.
the 3 possibilities go few large many small tables. personal feeling second or third solutions better. think might scale better. i'm not sure quite sure of , know 'typical' approach is.
i go more in depth of think of each of approaches, point of question:
- which of 3 possibilities should go for? (or else, has normalization?)
follow up:
- what (postgresql) statements like?
the viable option first. far easier manage , faster other options.
image have 1 million users, average of 3 to-do lists each, average of 5 entries per list.
scenario 1
in first scenario have 3 tables:
todo_users: 1 million recordstodo_lists: 3 million recordstodo_entries: 15 million records
such table sizes no problem postgresql , right indexes able retrieve data in less second (meaning simple queries; if queries become more complex (like: me todo_entries longest todo_list of top 15% of todo_users have made less 3 todo_lists in 3-month period highest todo_entries entered) slower (as in other scenarios). queries straightforward:
-- find user data based on username entered in web site -- index on 'username' essential here select * todo_users username = ?; -- find to-do lists user userid has been retrieved previous query select * todo_lists userid = ?; -- find entries to-do list based on todoid select * todo_entries listid = ?; you can combine 3 queries one:
select u.*, l.*, e.* -- or select appropriate columns 3 tables todo_users u left join todo_lists l on l.userid = u.id left join todo_entries e on e.listid = l.id u.username = ?; use of left joins means data users without lists or lists without entries (but column values null).
inserting, updating , deleting records can done similar statements , fast.
postgresql stores data on "pages" (typically 4kb in size) , pages filled, thing because reading writing page very slow compared other operations.
scenario 2
in scenario need 2 tables per user (todo_lists , todo_entries) need mechanism identify tables query.
- 1 million
todo_liststables few records each - 1 million
todo_entriestables few dozen records each
the practical solution construct full table names "basename" related username or other persistent authentication data web site. this:
username = 'jerry'; todo_list = username + '_lists'; todo_entries = username + '_entries'; and query table names. more need todo_users table anyway store personal data, usernames , passwords of 1 million users.
in cases tables small , postgresql not use indexes (nor have to). have more trouble finding appropriate tables, though, , build queries in code , feed them postgresql, meaning cannot optimize query plan. bigger problem creating tables new users (todo_list , todo_entries) or deleting obsolete lists or users. typically requires behind-the scenes housekeeping avoid previous scenario. , biggest performance penalty pages have little content waste disk space , lots of time reading , writing partially filled pages.
scenario 3
this scenario worse scenario 2. don't it, it's madness.
- 3 million tables
todo_entriesfew records each
so...
stick option 1. real option.
Comments
Post a Comment