bash - SQLite database design for music chart tracker -


i've been putting little sqlite database track top 100 songs itunes rss feed. i've built script in bash hard work , it's working finally, i'm not sure if database structure correct, i'm looking feedback on best way go learning sql go @ moment don't want dig myself hole when comes building queries retrieve data in time!

i have 3 tables so;

artists_table

artist_id - pk artist_name 

songs_table

song_id - pk artist_id - fk (from artists table) 

charts_table

chart_id - pk song_id - fk (from songs table) position - (chart position 1-100) date - (date of chart position xxxx-xx-xx) 

the artists , songs table seem me, got foreign key constraint working...etc i'm not sure charts table, wrong structure?

i want track songs/artists/positions on time can generate stats...etc

thanks,

initial response

i ask data, in order answer question, keep telling me process. no doubt, important you. , wish ensure record filing system correct.

personally, never write line of code until have database designed. partly because hate rewrite code (and love code). have sequence reversed, unfortunate trend these days. means, whatever give you, have rewrite large chunks of code.

(b.1) how check if artist[song] exists ?

(b.2) how know there not more occ of specific artist/song on file ?

right now, given details in question, let's have incoming, pussycat dolls place 66 on mtv chart today:

    insert artist values ( "pussycat dolls" )    -- succeeds, intended     insert artist values ( "pussycat dolls" )    -- succeeds, unintended     insert artist values ( "pussycat dolls" )    -- succeeds, unintended 
  1. exactly pussycat dolls record placed 66th today ? when rfs grows, , have more fields in artist, eg. birth_date, of 3 records update ?

  2. ditto song.

  3. how chart identified, us top 40 ?

(b.1) how check if artist[song] exists ?

when execute code, runs inside sqlite program. exact sql string pass ? let's this:

    select $artist_id = artist_id             artist             artist_name = $artist_name     if $artist_id = null         insert artist values ( $artist_name ) 

then have going have few surprises when system goes "live". interaction eliminate them. right have few hundred artists.

  • when have few thousand artists, system slow down snails pace.

  • when things go awry, have duplicate artists, songs, charts.

record filing system

right now, have pre-1970's isam record filing system, no relational integrity, power, or speed.

if understand more dangers of rfs, in todays relational context, please read this answer.

relational database

as understand it, want integrity, power, , speed of relational database. here heading towards. obviously, incomplete, unconfirmed, there may details missing, many questions remain open. have model data, data (as opposed going it, process), , nothing data.

this approach ensure many things:

  • as data grows , added (in terms of structure, not population), existing data , code not change

  • you have data , referential integrity

  • you can obtain each of stats via single select command.

  • you can execute select against data, selects not capable of dreaming about, meaning unlimited stats. long data stored in relational form.

a database collection of facts real world, limited subject area of concern. far don't have facts, have recording of incoming rss stream. , recording has no integrity, there nothing code can rely on. heading in direction of facts:

first draft music chart trd (obsolete due progression, see below.)

response comments 1

currently, tracking 1 chart, see in model has ability track several, nice!

not really. side-effect of doing things properly. issue here 1 of identification. chart position not identified rss feed id, or chart_table.id, plus positionno plus datetime. no. chart position is identified us top 100/27 apr 15/1… side effect chartname part of identifier, , allows multiple charts, no additional coding.

in these dark days of it, people write systems 1 country, , implement statecode on place. , experience massive problems when open international customer base. point is, there no such thing state not have country, state exists in context of country. identifier state must include country identifier, (countrycode, statecode). both australia , canada have nt statecode.

if can explain how store data rss feed, might clear things somewhat.

no, please. data, , data. please review previous comments on issue, , benefits.

i away main computer @ moment, respond within next couple of hours if thats ok.

no worries. tomorrow.

your model make sense me though,

that because know data values intimately, not understand data, , when lays out you, correctly, experience pleasurable little twitches of recognition.

i don't mind having recode everything, learning curve!

that's because put cart before horse, , coded against data laid out in spreadsheet, instead of designing database first , coding against second.

if not used notation, please advised every little tick, notch, , mark, solid vs dashed lines, square vs round corners, means specific. refer idef1x notation.

response comments 2

just 1 more quick question.

fire away, until satisfied.

in diagram, there disadvantage putting artist table above song table , making song table child of parent artist instead? artists can have many songs, each song can have 1 artist. there need additional table contain artistpk , songpk. not store artistpk songs table fk, song can exist if there associated artist?

  1. notice attachment way had organised. repeat:

a database collection of facts real world, limited subject area of concern.

facts logical, not physical. when facts organised correctly (normalised, designed):

you can execute select against data, selects not capable of dreaming about, meaning unlimited stats. long data stored in relational form.

when aren't, cant. sql (not reports envisioned) against data limited limitations in model, boils down 1 thing: discrete facts being recorded in logical form, or not.

with trd have progressed recording facts real world, limited scope of app, , not non-discretion of facts.

could not store artistpk songs table fk, song can exist if there associated artist?

in working context, @ moment, true. not true in real world recording. if app or scope changes, have change great slabs of db , app. if record facts correctly, exist, not limited current app scope, no such change necessary when the app or scope changes (sure, have add objects , code, not modify existing objects , code).

in real world, song , artist discrete facts, each can exist independent of other. proposition false.

  • ave maria existed 16 centuries before karen carpenter recorded it.

  • and understand , accept artist exists without `song.

is there need additional table contain artistpk , songpk.

it isn't "additional table contain artistpk , songpk", recording discrete fact (separate independent existence of artist , song), specific artist recorded specific song. fact count on in thechartdateposition`

your proposition places song dependent on, subordinate to, artist, , not true. , stats (dreamed of or not) based on song have navigate artist::artistsong, sort or order by, etc.

artists can have many songs, each song can have 1 artist.

that half-true (true in current working context, not true in real world). truth is:

  • each artist independent
    each song independent
    each artist recorded 1-to-n songs (via artistsong)
    each song recorded 1-to-n artists (via artistsong)

for understanding, changing words above form correct propositions (as opposed stating technically correct predicates):

  • artists can have many recordedsongs
    each recordedsong can have 1 artist each recordedsong can have 1 song

so yes, there disadvantages, significant ones.

which why state, you must divorce app, usage, , model data, data, , nothing data.

solution 2

i have updated trd.

second draft music chart trd

  • courier means example data; blue indicates key (primary first); pipe indicates column separation; slash indicates alternate key (only columns not in pk shown); green indicates non-key.

  • i giving predicates. these important, many reasons. main reason here, disambiguate issues discussing.

    • if more information on predicates, visit this answer, scroll down (way down!) predicate, , read section. evaluate that trd , those predicates against it.
  • the index on chartdatesong needs explanation. @ first assumed:

       pk ( chart, date, rank ) 

    but integrity purposes, search, need:

       ak ( chart, date, artistid, songid ) 

    which better pk. switched them. need both. (i don't know nonsqlite, if has clustered indices, ak, not pk should clustered.)

       pk ( chart, date, artistid, songid )     ak ( chart, date, rank ) 

response comments 3

what scenario when song enters charts same song_name record in song_table unrelated (not cover, original, happens share same name)

in civilised countries called fraud, obtaining benefit deception, try think in devilish terms moment , answer question.

well, if happens, have cater it. how feed inform of such event ? trust doesn't. song identifier still name.

and instead of unique song record being created, existing song_id added artistssongs_table artist id, wouldn't problem?

we don't know better, not problem. no 1 watching feed knows better either. if , when receive data informing of issue, through whatever channel, , can specify it, can change it.

normally have app allows navigate hierarchies, , change them, eg. referencemaintenance app, exporer-type window on left, , combo dialogues (list of occs on top, plus detail of 1 occ on bottom) on right .

until then, not form of corruption, because constraint prevents such corruption undefined. can't held guilty of breaking law hasn't been written yet. except in rogue states.

although song can have same name, doesn't mean it's same record.

yes.

wouldn't better differentiate song artist?

they are differentiated artist.

you appreciate fact of song, , fact of artist playing song, 2 discrete facts, yes ? please question predicates not mean perfect sense, propositions database supports.

  • ave maria exists independent fact, in song

  • karen carpenter, celine dion, , yours truly exist 3 independent facts, in artist

  • karen carpenter-ave maria, celine dion-ave maria, , yours truly-ave maria exist 3 discrete facts in artistsong.

  • that 7 separate facts, 1 song, 3 artists.

response comments 4

i understand now. artistsong_table 2 items "meet" , relationship exists , unique.

yes. wouldn't state in way. term fact has technically precise meaning, on , above english meaning.

a database collection of facts real world, limited subject area of concern.

perhaps read response 3 again, understanding of fact in mind.

  • each artistsong row fact. depends on fact of artist, , fact of song. establishes fact that artist recorded song. , artistsong fact 1 other facts, lower in hierarchy, depend upon.

  • "relationship ... actually". think mean "instance". relationship exists between tables, because drew line, , implement foreign key constraint. perhaps think of fact "instance".

just make sure understand idea correctly, if add "genre" mix, correct in thinking new 'independent' table genre_table created , artistsong_table inherit pk fk?

yes. classic reference or lookup table, relationship non-identifying. don't know enough music brothelry make declarations, understand it, genre applies song; artist; , artistsong (they can play song in genre different song.genre). have given me one, model that.

the consequence of is, when inserting rows in artistsong, have have genre. if in feed, , good, if not, have processing issue deal with. simple method overcome is, implement genre "", indicates you need determine other channels.

it easy enough add classifier (eg. genre) later, because non-identifying relationship. identifying items difficult add later, because force keys change. refer para 3 under response 1.

you ready data model:

third draft music chart data model


Comments

Popular posts from this blog

java - Spring Data JPA: Why findOne(id) executing delete query internally? -

python - Mongodb How to add addtional information when aggregating? -

java - Incorrect order of records in M-M relationship in hibernate -