Age | Commit message | Author | Files | Lines
2014-06-02 | Get rid of tweets.txt (HEAD, master) | Maurice Laveaux | 1 | -8170/+0
2014-06-02 | Added --cat command, fixed null bytes in usernames | Maurice Laveaux | 3 | -4/+24
2014-06-02 | Replace reply(user)id by replytweetid | Peter Wu | 3 | -7/+7
2014-06-02 | Reverted the hacky try-catch fix | Maurice Laveaux | 1 | -18/+14
2014-06-02 | Removes null bytes from tweet text and unused brandchecker | Maurice Laveaux | 2 | -46/+32
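
The null-byte commits above are presumably needed because PostgreSQL text columns cannot store the NUL character. The log does not show how the project strips them; the following is only a minimal sketch, and the class and method names are hypothetical.

```java
public final class NullByteSanitizer {
    private NullByteSanitizer() {
    }

    /**
     * Returns the text without any U+0000 characters.
     * A null input stays null so that nullable fields are left untouched.
     */
    public static String stripNullBytes(String text) {
        if (text == null) {
            return null;
        }
        // String.replace works on literal sequences, no regex involved.
        return text.replace("\u0000", "");
    }
}
```

Such a helper would be applied to every string field (tweet text, usernames) before it is bound to a query parameter.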
2014-05-28 | added nullable fix | daanpeters | 2 | -0/+4
2014-05-20 | Reply user ID can be NULL | Peter Wu | 2 | -2/+6
2014-05-12 | Fix status reporting | Peter Wu | 2 | -1/+8
2014-05-12 | Merge branch 'master' of Wu | 2 | -5/+8
2014-05-12 | Print helpful message | Peter Wu | 1 | -1/+4
2014-05-12 | Locale adjustment | S129778 | 2 | -5/+8
2014-05-12 | Fix tests for nullable properties | Peter Wu | 1 | -5/+18
2014-05-11 | Catch IOException and print faulty tweet number | Peter Wu | 1 | -1/+2
    On IOException (via GZIPInputStream), the tweet would be missed. Catch that too and print the tweet number in other exceptional cases.
use build/classes instead of jar | Peter Wu | 1 | -0/+6
2014-05-11 | Support compressed tweets | Peter Wu | 2 | -0/+21
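
The two commits above (compressed input, and catching IOException while reporting the tweet number) could be combined roughly as follows. This is a sketch under assumptions, not the project's code: it assumes one tweet per line, and the class name, method name and message wording are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

public final class CompressedTweetReader {
    private CompressedTweetReader() {
    }

    /**
     * Reads gzip-compressed input line by line (one tweet per line, assumed).
     * On an IOException, including a corrupt gzip stream, the number of the
     * tweet that could not be read is reported and the lines read so far
     * are returned.
     */
    public static List<String> readLines(InputStream compressed) {
        List<String> lines = new ArrayList<>();
        int tweetNo = 0;
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                tweetNo++;
                lines.add(line);
            }
        } catch (IOException ex) {
            // The failure happened while reading the tweet after the last good one.
            System.err.println("Cannot read tweet " + (tweetNo + 1) + ": " + ex.getMessage());
        }
        return lines;
    }
}
```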
2014-05-11 | Wrap queries in a transaction with rollback | Peter Wu | 1 | -0/+14
    If a tweet cannot be processed, then it must be faulty. Do not fill the database with useless information.
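
The idea in this commit, all inserts for one tweet succeed together or not at all, can be sketched with plain JDBC as shown here. The names `TweetTransaction`, `TweetInserts` and `insertAtomically` are hypothetical; the actual change in the repository may be structured differently.

```java
import java.sql.Connection;
import java.sql.SQLException;

public final class TweetTransaction {
    private TweetTransaction() {
    }

    /** Hypothetical callback that performs every insert belonging to one tweet. */
    public interface TweetInserts {
        void run(Connection connection) throws SQLException;
    }

    /**
     * Runs all inserts for one tweet in a single transaction. If any of them
     * fails, the transaction is rolled back so that no partial rows of a
     * faulty tweet remain, and the exception is passed on to the caller.
     */
    public static void insertAtomically(Connection connection, TweetInserts inserts)
            throws SQLException {
        boolean previousAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false);
        try {
            inserts.run(connection);
            connection.commit();
        } catch (SQLException | RuntimeException ex) {
            connection.rollback();
            throw ex;
        } finally {
            // Restore the caller's setting; the transaction has ended by now.
            connection.setAutoCommit(previousAutoCommit);
        }
    }
}
```

The caller can then catch the SQLException, print the faulty tweet with its number, and decide whether to continue.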
2014-05-11 | Are you stupid? Converting SQLException to RuntimeException?! | Peter Wu | 1 | -15/+10
2014-05-11 | More data sanitization (decode HTML entities, location fix) | Peter Wu | 3 | -4/+37
2014-05-11 | Report status while importing (add --status option) | Peter Wu | 3 | -13/+112
    * Add status reporting to know how many tweets are already imported.
    * Remove spam from DataFiller when no brand is detected
2014-05-11 | Add --dbport option, override default auth parameters | Peter Wu | 1 | -2/+5
    Login as postgres (superuser!) is not really secure...
2014-05-10 | More nullable annotations, print tweets for parse errors too | Peter Wu | 3 | -1/+11
    Nullable is based on Twitter platform (users, tweets) documentation. Now the line number is also printed.
2014-05-10 | Fix tweetUrl, userUrl table names, fix retweets | Peter Wu | 2 | -2/+7
2014-05-10 | Move SQLException processing in processTweet to Main | Peter Wu | 2 | -37/+38
    Huge diff comes from whitespace diff. Now the tweet is printed with its line number in error cases, and all further insertions are aborted.
2014-05-10 | Convert long to int where possible, fix retweetid | Peter Wu | 4 | -15/+24
    If there is no retweet, the retweetid must really be NULL, not 0. In order to better match the database, convert some types to integers too.
2014-05-10 | user: Add verified, description, fix coordinates type | Peter Wu | 4 | -2/+17
2014-05-10 | Fix timestamp type | Peter Wu | 2 | -7/+10
    Without this cast, setTimestamp would complain that a timestamp is expected, but a text type is given.
2014-05-10 | Extend tests with date tests, fix created_at values | Peter Wu | 1 | -4/+37
2014-05-10 | Convert String to DateTime (for created_at) | Peter Wu | 8 | -6/+77
    Extra effort is made to keep the timezone information.
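
As an illustration of parsing created_at while keeping the offset, here is a minimal sketch using java.time. The project itself may rely on a different date library (the subject line mentions DateTime), and the assumed input layout is the classic Twitter form such as "Sat May 10 19:04:30 +0200 2014". The class name is hypothetical.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public final class CreatedAtParser {
    // Assumed layout: day name, month name, day, time, UTC offset, year.
    // Locale.ENGLISH keeps day and month names independent of the system locale.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss Z yyyy", Locale.ENGLISH);

    private CreatedAtParser() {
    }

    /** Parses a created_at value; the UTC offset from the input is preserved. */
    public static OffsetDateTime parse(String createdAt) {
        return OffsetDateTime.parse(createdAt, FORMAT);
    }
}
```

Returning an OffsetDateTime rather than an Instant is what keeps the original offset available; converting to an Instant later is still possible with `toInstant()`.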
2014-05-10 | Centralize Gson creation and registration | Peter Wu | 5 | -20/+49
2014-05-10 | More query fixes (incorrect field names, table alias) | Peter Wu | 2 | -4/+4
2014-05-10 | Split url to tweetUrl and userUrl, misc fixes | Peter Wu | 3 | -56/+45
    * Insert User URLs.
    * Fix hash insertion query (copy/paste error...).
    * Split url to tweetUrl and userUrl as there is no "url" table anymore.
    * Do not execute queries directly via getStmt(), but execute them via NamedPreparedStatement such that faulty queries can be printed.
    * Fix buildQuery for more than two primary keys.
    * Drop comments from prepared statements members in DataFiller, there is nothing that you cannot learn from the variable name. Besides there was a copy/paste error for mentions.
    * Change order of insertion to ensure consistency.
    * Inline setting parameters for queries, it is now more transparent.
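
The log does not show how NamedPreparedStatement works internally. A common way to build such a wrapper is to translate named placeholders into JDBC's positional `?` markers and remember their order, which also makes it easy to print the query when it fails. The sketch below assumes a `:name` placeholder syntax; the class `NamedQuery` and the table and column names in the usage are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class NamedQuery {
    // A single colon followed by an identifier. The lookbehind skips
    // PostgreSQL casts such as "::timestamp". String literals that contain
    // colons are not handled by this sketch.
    private static final Pattern PARAM =
            Pattern.compile("(?<!:):([A-Za-z_][A-Za-z0-9_]*)");

    private final String sql;
    private final List<String> names;

    public NamedQuery(String namedSql) {
        List<String> found = new ArrayList<>();
        Matcher matcher = PARAM.matcher(namedSql);
        StringBuffer positional = new StringBuffer();
        while (matcher.find()) {
            found.add(matcher.group(1));
            matcher.appendReplacement(positional, "?");
        }
        matcher.appendTail(positional);
        this.sql = positional.toString();
        this.names = Collections.unmodifiableList(found);
    }

    /** The query with every named placeholder replaced by "?". */
    public String sql() {
        return sql;
    }

    /** The 1-based JDBC parameter indexes at which the given name occurs. */
    public List<Integer> indexesOf(String name) {
        List<Integer> indexes = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            if (names.get(i).equals(name)) {
                indexes.add(i + 1);
            }
        }
        return indexes;
    }
}
```

For example, `new NamedQuery("INSERT INTO tweet (tweetid, userid) VALUES (:tweetid, :userid)")` yields the SQL text `INSERT INTO tweet (tweetid, userid) VALUES (?, ?)`, and `indexesOf("userid")` returns the index to pass to the setter methods of a PreparedStatement.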
2014-05-10 | NamedPreparedStatement: fix query parameters | Peter Wu | 1 | -3/+3
2014-05-10 | Merge remote-tracking branch 'origin/master' | Peter Wu | 0 | -0/+0
    Conflicts: src/main/ [pretty much ignored the help text change]
2014-05-10 | Main: print tweet number (not id) | Peter Wu | 1 | -1/+2
2014-05-10 | Check for errors if a validation error occurred | Peter Wu | 2 | -12/+9
    This gives more helpful messages when the primitive does not match (object or array instead of string).
2014-05-10 | Coordinates is an object with an array | Peter Wu | 4 | -8/+86
    Ensure that the array is of a fixed length, add tests to check for that.
2014-05-10 | Verify that an array and object are really array (and objects) | Peter Wu | 2 | -5/+21
2014-05-10 | Detect wrong type for string | Peter Wu | 2 | -0/+14
2014-05-10 | Add validator debugger where the wrong type is returned | Peter Wu | 4 | -2/+65
2014-05-10 | user.time_zone can be null too | Peter Wu | 2 | -1/+5
2014-05-09 | Entities can be missing, is not a string | Peter Wu | 5 | -9/+71
    * User: place is not a string but a Place object.
    * User: entities is nullable.
    * Tweet: in_reply_to_user_id and coordinates are nullable.
    * ValidatingJsonDeserializer: Treat null values as missing fields.
    * ValidatingJsonDeserializerTest: Test for null values.
2014-05-09 | Trace object path for JSON validation errors. | Peter Wu | 4 | -16/+18
2014-05-09 | Add "--skipdb" option to allow printing results | Peter Wu | 1 | -1/+20
    For testing whether the data is correct or not.
2014-05-09 | First try to get a reader, then try to open database | Peter Wu | 1 | -23/+24
    Connecting to the database is probably more expensive, so try to read data first.
2014-05-09 | Use upsert queries, convert to named parameter statements | Peter Wu | 3 | -108/+190
    Get rid of ispostedby, move it to tweet table ("userid")
2014-05-09 | Get rid of useless method | Peter Wu | 2 | -57/+44
2014-05-09 | Don't reject missing arguments | Peter Wu | 1 | -4/+0
    Now that --dbhost is optional, the argument list might be empty.
2014-05-09 | Only catch IllegalArgumentException for argument parsing | Peter Wu | 1 | -3/+6
2014-05-09 | Fixup help text, allow to override DB settings | Peter Wu | 1 | -25/+34
2014-05-09 | Get rid of DBConnection | Peter Wu | 4 | -105/+121
    That construct was hiding the Connection instance. Very bad abstraction as I really need it to support transactions. While at it, make the prepared statement objects final such that it is detected when those are not properly initialized. In Main, use try-with-resources, remove a noisy "exit succesfull" [sic] message. Due to the extra try/catch for the db connection (RuntimeException is not OK my friend!), the indentation had to be changed.