CodeSOD: Swap the Workaround


Blane D is responsible for loading data into a Vertica 8.1 database for analysis. Vertica is a distributed, column-oriented store, for data-warehousing applications, and its driver has certain quirks.

For example, a common task that you might need to perform is swapping storage partitions around between tables to facilitate bulk data-loading. Thus, there is a SWAP_PARTITIONS_BETWEEN_TABLES() stored procedure. Unfortunately, if you call this function from within a prepared statement, one of two things will happen: the individual node handling the request will crash, or the entire cluster will crash.

No problem, right? Just don’t use a prepared statement. Unfortunately, if you use the ODBC driver for Python, every statement is converted to a prepared statement. There’s a JDBC driver, and a bridge to enable it from within Python, but it also has that problem, and it has the added cost of requiring a a JVM running.

So Blane did what any of us would do in this situation: he created a hacky-workaround that does the job, but requires thorough apologies.

def run_horrible_partition_swap_hack(self, horrible_src_table, horrible_src_partition,
                                     terrible_dest_table, terrible_dest_partition):
    """
    First things first - I'm sorry, I am a terrible person.

    This is a horrible horrible hack to avoid Vertica's partition swap bug and should be removed once patched!

    What does this atrocity do?
    It passes our partition swap into the ODBC connection string so that it gets executed outside of a prepared
    statement... that's it... I'm sorry.
    """
    conn = self.get_connection(getattr(self, self.conn_name_attr))

    hacky_sql = "SELECT SWAP_PARTITIONS_BETWEEN_TABLES('{src_table}',{src_part},{dest_part},'{dest_table}')"\
        .format(src_table=horrible_src_table,
                src_part=horrible_src_partition,
                dest_table=terrible_dest_table,
                dest_part=terrible_dest_partition)

    even_hackier_sql = hacky_sql.replace(' ', '+')

    conn_string = ';'.join(["DSN={}".format(conn.host),
                            "DATABASE={}".format(conn.schema) if conn.schema else '',
                            "UID={}".format(conn.uid) if conn.uid else '',
                            "PWD={}".format(conn.password) if conn.password else '',
                            "ConnSettings={}".format(even_hackier_sql)])  # :puke:

    odbc_conn = pyodbc.connect(conn_string)

    odbc_conn.close()

Specifically, Blane leverages the driver’s ConnSettings option in the connection string, which allows you to execute a command when a client connects to the database. This is specifically meant for setting up session parameters, but it also has the added “benefit” of not performing that action using a prepared statement.

If it’s stupid but it works, it’s probably the vendor’s fault, I suppose.

[Advertisement] Onsite, remote, bare-metal or cloud – create, configure and orchestrate 1,000s of servers, all from the same dashboard while continually monitoring for drift and allowing for instantaneous remediation. Download Otter today!

http://ift.tt/2tgstaN

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s