Age | Commit message | Author | Files | Lines | |
---|---|---|---|---|---|
2014-05-11 | Catch IOException and print faulty tweet number | Peter Wu | 1 | -1/+2 | |
On IOException (via GZIPInputStream), the tweet would be missed. Catch that too and print the tweet number in other exceptional cases. | |||||
2014-05-11 | run.sh: use build/classes instead of jar | Peter Wu | 1 | -0/+6 | |
2014-05-11 | Support compressed tweets | Peter Wu | 2 | -0/+21 | |
2014-05-11 | Wrap queries in a transaction with rollback | Peter Wu | 1 | -0/+14 | |
If a tweet cannot be processed, then it must be faulty. Do not fill the database with useless information. | |||||
2014-05-11 | Are you stupid? Converting SQLException to RuntimeException?! | Peter Wu | 1 | -15/+10 | |
2014-05-11 | More data sanitization (decode HTML entities, location fix) | Peter Wu | 3 | -4/+37 | |
2014-05-11 | Report status while importing (add --status option) | Peter Wu | 3 | -13/+112 | |
* Add status reporting to know how many tweets are already imported. * Remove spam from DataFiller when no brand is detected | |||||
2014-05-11 | Add --dbport option, override default auth parameters | Peter Wu | 1 | -2/+5 | |
Login as postgres (superuser!) is not really secure... | |||||
2014-05-10 | More nullable annotations, print tweets for parse errors too | Peter Wu | 3 | -1/+11 | |
Nullable annotations are based on the Twitter platform documentation (users, tweets). Now the line number is also printed. | |||||
2014-05-10 | Fix tweetUrl, userUrl table names, fix retweets | Peter Wu | 2 | -2/+7 | |
2014-05-10 | Move SQLException processing in processTweet to Main | Peter Wu | 2 | -37/+38 | |
The large diff comes mostly from whitespace changes. Now the tweet is printed with its line number in error cases, and all further insertions are aborted. | |||||
2014-05-10 | Convert long to int where possible, fix retweetid | Peter Wu | 4 | -15/+24 | |
If there is no retweet, the retweetid must really be NULL, not 0. In order to better match the database, convert some types to integers too. | |||||
2014-05-10 | user: Add verified, description, fix coordinates type | Peter Wu | 4 | -2/+17 | |
2014-05-10 | Fix timestamp type | Peter Wu | 2 | -7/+10 | |
Without this cast, setTimestamp would complain that a timestamp is expected, but a text type is given. | |||||
2014-05-10 | Extend tests with date tests, fix created_at values | Peter Wu | 1 | -4/+37 | |
2014-05-10 | Convert String to DateTime (for created_at) | Peter Wu | 8 | -6/+77 | |
Extra effort is made to keep the timezone information. | |||||
2014-05-10 | Centralize Gson creation and registration | Peter Wu | 5 | -20/+49 | |
2014-05-10 | More query fixes (incorrect field names, table alias) | Peter Wu | 2 | -4/+4 | |
2014-05-10 | Split url to tweetUrl and userUrl, misc fixes | Peter Wu | 3 | -56/+45 | |
* Insert User URLs. * Fix hash insertion query (copy/paste error...). * Split url to tweetUrl and userUrl as there is no "url" table anymore. * Do not execute queries directly via getStmt(), but execute them via NamedPreparedStatement such that faulty queries can be printed. * Fix buildQuery for more than two primary keys. * Drop comments from prepared statement members in DataFiller, there is nothing that you cannot learn from the variable name. Besides, there was a copy/paste error for mentions. * Change order of insertion to ensure consistency. * Inline setting parameters for queries, it is now more transparent. | |||||
2014-05-10 | NamedPreparedStatement: fix query parameters | Peter Wu | 1 | -3/+3 | |
2014-05-10 | Merge remote-tracking branch 'origin/master' | Peter Wu | 0 | -0/+0 | |
Conflicts: src/main/Main.java [pretty much ignored the help text change] | |||||
2014-05-10 | Main: print tweet number (not id) | Peter Wu | 1 | -1/+2 | |
2014-05-10 | Check for errors if a validation error occurred | Peter Wu | 2 | -12/+9 | |
This gives more helpful messages when the primitive does not match (object or array instead of string). | |||||
2014-05-10 | Coordinates is an object with an array | Peter Wu | 4 | -8/+86 | |
Ensure that the array is of a fixed length, add tests to check for that. | |||||
2014-05-10 | Verify that arrays and objects are really arrays (and objects) | Peter Wu | 2 | -5/+21 | |
2014-05-10 | Detect wrong type for string | Peter Wu | 2 | -0/+14 | |
2014-05-10 | Add validator debugger where the wrong type is returned | Peter Wu | 4 | -2/+65 | |
2014-05-10 | user.time_zone can be null too | Peter Wu | 2 | -1/+5 | |
2014-05-09 | Entities can be missing, user.place is not a string | Peter Wu | 5 | -9/+71 | |
* User: place is not a string but a Place object. * User: entities is nullable. * Tweet: in_reply_to_user_id, coordinates is nullable. * ValidatingJsonDeserializer: Treat null values as missing fields. * ValidatingJsonDeserializerTest: Test for null values. | |||||
2014-05-09 | Trace object path for JSON validation errors. | Peter Wu | 4 | -16/+18 | |
2014-05-09 | Add "--skipdb" option to allow printing results | Peter Wu | 1 | -1/+20 | |
For testing whether the data is correct or not. | |||||
2014-05-09 | First try to get a reader, then try to open database | Peter Wu | 1 | -23/+24 | |
Connecting to the database is probably more expensive, so try to read data first. | |||||
2014-05-09 | Use upsert queries, convert to named parameter statements | Peter Wu | 3 | -108/+190 | |
Get rid of ispostedby, move it to tweet table ("userid") | |||||
2014-05-09 | Get rid of useless method | Peter Wu | 2 | -57/+44 | |
2014-05-09 | Don't reject missing arguments | Peter Wu | 1 | -4/+0 | |
Now that --dbhost is optional, the argument list might be empty. | |||||
2014-05-09 | Only catch IllegalArgumentException for argument parsing | Peter Wu | 1 | -3/+6 | |
2014-05-09 | Fixup help text, allow to override DB settings | Peter Wu | 1 | -25/+34 | |
2014-05-09 | Get rid of DBConnection | Peter Wu | 4 | -105/+121 | |
That construct was hiding the Connection instance. Very bad abstraction as I really need it to support transactions. While at it, make the prepared statement objects final such that it is detected when those are not properly initialized. In Main, use try-with-resources, remove a noisy "exit succesfull" [sic] message. Due to the extra try/catch for the db connection (RuntimeException is not OK my friend!), the indentation had to be changed. | |||||
2014-05-09 | Remove debug prints, add comments | Peter Wu | 1 | -1/+3 | |
2014-05-09 | Add missing annotation for retweeted_status, more tests | Peter Wu | 2 | -0/+54 | |
2014-05-09 | Fixed a typo in the help message. | Maurice Laveaux | 1 | -3/+3 | |
2014-05-09 | Add ValidatingJsonDeserializerTest (JUnit) | Peter Wu | 2 | -1/+440 | |
2014-05-09 | Add missing validation annotations, ... | Peter Wu | 4 | -19/+62 | |
* Add missing Validator annotations for User. * Add entities and url properties for user. * Properly do a recursive check if a Validator annotation is present. | |||||
2014-05-08 | Get rid of unused crap | Peter Wu | 4 | -302/+0 | |
2014-05-08 | Replace JSON by GSON, adding extra validations | Peter Wu | 8 | -149/+248 | |
Also change reader method: tweets are not received via an observer but by submitting from the caller. Added TODO WTF here and there, formatted with Alt + Shift + F. | |||||
2014-05-08 | Add json containers for Tweet and User objects | Peter Wu | 5 | -1/+150 | |
2014-05-07 | Merge origin/master | S129778 | 1 | -1/+1 | |
2014-05-07 | data queries into database | S129778 | 5 | -14/+8391 | |
2014-05-07 | Fixed a small typo, tweets contain "user". | Maurice Laveaux | 1 | -1/+1 | |
2014-05-07 | Removed the extraneous DBQuery and simple property | Maurice Laveaux | 7 | -62/+84 | |
* InputReader reads from Scanner. |