mercredi 22 avril 2015

unsupported type error during pandas DataFrame.to_sql

I am trying using pandas to analyze some json data and later write to a sqlite3 database.

The input data is loaded from here and I only care about the 'bugs' information. Since there are list(for blocks, depends_on, flags, keywords) and null (for dupe_of), I use json_normalize to generate the DataFrame df. But when I use to_sql:

engine = create_engine('sqlite:///mydb.db')
df.to_sql("mydb", engine, index=False, if_exists='append')

It reports error:

  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 966, in to_sql
    dtype=dtype)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/sql.py", line 538, in to_sql
    chunksize=chunksize, dtype=dtype)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/sql.py", line 1172, in to_sql
    table.insert(chunksize)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/sql.py", line 717, in insert
    self._execute_insert(conn, keys, chunk_iter)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/sql.py", line 692, in _execute_insert
    conn.execute(self.insert_statement(), data)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 841, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/elements.py", line 322, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 938, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1070, in _execute_context
    context)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1271, in _handle_dbapi_exception
    exc_info
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1063, in _execute_context
    context)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 442, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.InterfaceError: (InterfaceError) Error binding parameter 0 - probably unsupported type. u'INSERT INTO "MozillaRepo" (blocks, classification, component, creatio
n_time, depends_on, dupe_of, flags, id, is_confirmed, is_open, keywords, last_change_time, op_sys, platform, priority, product, resolution, severity, status, summary, version) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)' ([], u'Client Software', u'Preferences', u'1999-07-30T22:55:51Z', [], None, []  ... displaying 10 of 22 total bound parameter sets ...  u'Dialup properties needs to be exposed in prefs', u'Trunk')

From the last line of the message, it should be the data type issue. But I don't know how to correct it.

The result of df.dtypes is:

blocks              object
classification      object
component           object
creation_time       object
depends_on          object
dupe_of             object
flags               object
id                   int64
is_confirmed          bool
is_open               bool
keywords            object
last_change_time    object
op_sys              object
platform            object
priority            object
product             object
resolution          object
severity            object
status              object
summary             object
version             object
dtype: object

And I can see that the database table is created as:

CREATE TABLE "mydb" (
        blocks TEXT, 
        classification TEXT, 
        component TEXT, 
        creation_time TEXT, 
        depends_on TEXT, 
        dupe_of TEXT, 
        flags TEXT, 
        id BIGINT, 
        is_confirmed BOOLEAN, 
        is_open BOOLEAN, 
        keywords TEXT, 
        last_change_time TEXT, 
        op_sys TEXT, 
        platform TEXT, 
        priority TEXT, 
        product TEXT, 
        resolution TEXT, 
        severity TEXT, 
        status TEXT, 
        summary TEXT, 
        version TEXT, 
        CHECK (is_confirmed IN (0, 1)), 
        CHECK (is_open IN (0, 1)), 
        CHECK (is_open IN (0, 1)), 
        CHECK (is_confirmed IN (0, 1))
)

Aucun commentaire:

Enregistrer un commentaire