pgdumplib

Python3 library for reading and writing pg_dump files using the custom format.

Installation

pip install pgdumplib

Documentation

API

pgdumplib exposes a load method to create a Dump instance from a pg_dump file created in the custom format.

See the Examples section to learn how to read an existing dump or create a new one.

pgdumplib.load(filepath, converter=None)[source]

Load a pg_dump file created with -Fc (custom format) from disk

Parameters
  • filepath (os.PathLike) – The path of the dump file to load

  • converter – The data converter class to use (Default: pgdumplib.converters.DataConverter)

Raises

ValueError

Return type

pgdumplib.dump.Dump

pgdumplib.new(dbname='pgdumplib', encoding='UTF8', converter=None, appear_as='12.0')[source]

Create a new pgdumplib.dump.Dump instance

Parameters
  • dbname (str) – The database name for the dump (Default: pgdumplib)

  • encoding (str) – The data encoding (Default: UTF8)

  • converter – The data converter class to use (Default: pgdumplib.converters.DataConverter)

  • appear_as (str) – The version of Postgres to emulate (Default: 12.0)

Return type

pgdumplib.dump.Dump

The Dump class exposes methods to load an existing dump, to add an entry to a dump, to add table data to a dump, to add blob data to a dump, and to save a new dump.

Converters are available to format the data returned by read_data(). The converter is passed in when constructing a new Dump, and is also available as an argument to pgdumplib.load().

The default converter, DataConverter, returns all fields as strings, only replacing NULL with None. The SmartDataConverter attempts to convert all columns to native Python data types.

When loading or creating a dump, the table and blob data are stored in gzip compressed data files in a temporary directory that is automatically cleaned up when the Dump instance is released.
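
For example, here is a minimal sketch of loading a dump with the SmartDataConverter; the dump.out path and the public.orders table are placeholders, not part of the library:

import pgdumplib
from pgdumplib import converters

# Load an existing custom-format dump; table data read back later will be
# converted to native Python types instead of plain strings.
dump = pgdumplib.load('dump.out', converter=converters.SmartDataConverter)

for row in dump.table_data('public', 'orders'):  # hypothetical table
    print(row)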

class pgdumplib.dump.Dump(dbname='pgdumplib', encoding='UTF8', converter=None, appear_as='12.0')[source]

Create a new instance of the Dump class

Once created, the instance of Dump can be used to read existing dumps or to create new ones.

Parameters
  • dbname (str) – The database name for the dump (Default: pgdumplib)

  • encoding (str) – The data encoding (Default: UTF8)

  • converter – The data converter class to use (Default: pgdumplib.converters.DataConverter)

  • appear_as (str) – The version of Postgres to emulate (Default: 12.0)

add_entry(desc, namespace=None, tag=None, owner=None, defn=None, drop_stmt=None, copy_stmt=None, dependencies=None, tablespace=None, tableam=None, dump_id=None)[source]

Add an entry to the dump

The namespace and tag are required.

A ValueError will be raised if desc is not a value that is known in pgdumplib.constants.

The section of the entry is determined automatically based on the desc value.

When adding data, use table_data_writer() instead of invoking add_entry() directly.

If dependencies are specified, they will be validated; if a listed dump_id does not match an existing entry, a ValueError will be raised.

Other omitted values will be set to the defaults specified in the pgdumplib.dump.Entry class.

The dump_id will be auto-calculated based upon the existing entries if it is not specified.

Note

The creation of ad-hoc blobs is not supported.

Parameters
  • desc (str) – The entry description

  • namespace (str) – The namespace of the entry

  • tag (str) – The name/table/relation/etc of the entry

  • owner (str) – The owner of the object in Postgres

  • defn (str) – The DDL definition for the entry

  • drop_stmt (Optional[str]) – A drop statement used to drop the entry before it is created

  • copy_stmt (Optional[str]) – A copy statement used when there is a corresponding data section.

  • dependencies (list) – A list of dump_ids of objects that the entry is dependent upon.

  • tablespace (str) – The tablespace to use

  • tableam (str) – The table access method

  • dump_id (int) – The dump id, will be auto-calculated if left empty

Raises

ValueError

Return type

pgdumplib.dump.Entry
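
As a brief sketch of the call pattern (the demo schema and widgets table are illustrative, not part of the library):

import pgdumplib
from pgdumplib import constants

dump = pgdumplib.new('example')

# dump_id is auto-calculated; the returned Entry exposes it for dependencies
schema = dump.add_entry(
    desc=constants.SCHEMA,
    tag='demo',
    defn='CREATE SCHEMA demo;',
    drop_stmt='DROP SCHEMA demo;')

dump.add_entry(
    desc=constants.TABLE,
    namespace='demo',
    tag='widgets',
    owner='postgres',
    defn='CREATE TABLE demo.widgets (id INTEGER PRIMARY KEY);',
    drop_stmt='DROP TABLE demo.widgets;',
    dependencies=[schema.dump_id])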

blobs()[source]

Iterator that returns each blob in the dump

Return type

tuple(int, bytes)
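
A short sketch of iterating over the blobs in a loaded dump (dump.out is a placeholder path):

import pgdumplib

dump = pgdumplib.load('dump.out')

# Each iteration yields the large object's OID and its raw bytes
for oid, blob in dump.blobs():
    print(oid, len(blob))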

get_entry(dump_id)[source]

Return the entry for the given dump_id

Parameters

dump_id (int) – The dump ID of the entry to return.

Return type

pgdumplib.dump.Entry or None

load(path)[source]

Load the dump file, including extracting all data into a temporary directory

Parameters

path (os.PathLike) – The path of the dump to load

Raises
  • RuntimeError

  • ValueError

Return type

Dump

lookup_entry(desc, namespace, tag)[source]

Return the entry for the given desc, namespace, and tag

Parameters
  • desc (str) – The desc / object type of the entry

  • namespace (str) – The namespace of the entry

  • tag (str) – The tag/relation/table name

Raises

ValueError

Return type

pgdumplib.dump.Entry or None
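
For example, a sketch that looks up a table entry and resolves its dependencies with get_entry(); the dump.out path and the public.orders table are placeholders:

import pgdumplib
from pgdumplib import constants

dump = pgdumplib.load('dump.out')

entry = dump.lookup_entry(constants.TABLE, 'public', 'orders')
if entry is not None:
    # dependencies is a list of dump_ids of the objects this entry relies on
    for dump_id in entry.dependencies:
        parent = dump.get_entry(dump_id)
        print(parent.desc, parent.namespace, parent.tag)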

save(path)[source]

Save the Dump file to the specified path

Parameters

path (os.PathLike) – The path to save the dump to

Return type

NoReturn

table_data(namespace, table)[source]

Iterator that returns data for the given namespace and table

Parameters
  • namespace (str) – The namespace/schema for the table

  • table (str) – The table name

Raises

pgdumplib.exceptions.EntityNotFoundError

Return type

Generator[Union[str, Tuple[Any, …]], None, None]

table_data_writer(entry, columns)[source]

A context manager that returns a TableData instance, which is used to add table data to the dump.

When invoked for a given entry containing the table definition, the returned writer is used to append rows that are written into the dump's corresponding data entry when the dump is saved.

Parameters
  • entry (Entry) – The entry for the table to add data for

  • columns (list or tuple) – The ordered list of table columns

Return type

TableData

property version

Return the version as a tuple to make version comparisons easier.

Return type

tuple
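
A small sketch comparing the version tuple against the supported range in pgdumplib.constants (assuming dump is a loaded Dump instance):

from pgdumplib import constants

# The tuple form allows direct lexicographic comparison
if not (constants.MIN_VER <= dump.version <= constants.MAX_VER):
    raise RuntimeError('Unsupported pg_dump archive version')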

class pgdumplib.dump.Entry(dump_id, had_dumper=False, table_oid='0', oid='0', tag=None, desc=None, defn=None, drop_stmt=None, copy_stmt=None, namespace=None, tablespace=None, tableam=None, owner=None, with_oids=False, dependencies=<factory>, data_state=3, offset=0)[source]

The Entry model is a dataclass that represents a single entry in a dump's table of contents.

Custom formatted dump files are primarily comprised of entries, which contain all of the metadata and DDL required to construct the database.

For table data and blobs, there are entries that contain offset locations in the dump file that instruct the reader as to where the data lives in the file.

Variables
  • dump_id (int) – The dump id, will be auto-calculated if left empty

  • had_dumper (bool) – Indicates whether the entry has an associated data dumper (i.e. it carries table or blob data)

  • oid (str) – The OID of the object the entry represents

  • tag (str) – The name/table/relation/etc of the entry

  • desc (str) – The entry description

  • defn (str) – The DDL definition for the entry

  • drop_stmt (str) – A drop statement used to drop the entry before it is created

  • copy_stmt (str) – A copy statement used when there is a corresponding data section.

  • namespace (str) – The namespace of the entry

  • tablespace (str) – The tablespace to use

  • tableam (str) – The table access method

  • owner (str) – The owner of the object in Postgres

  • with_oids (bool) – Indicates whether the table data was dumped with OIDs

  • dependencies (list) – A list of dump_ids of objects that the entry is dependent upon.

  • data_state (int) – Indicates if the entry has data and how it is stored

  • offset (int) – If the entry has data, the offset to the data in the file

  • section (str) – The section of the dump file the entry belongs to

property section

Return the section the entry belongs to

Return type

str

class pgdumplib.dump.TableData(dump_id, tempdir, encoding)[source]

Encapsulates table data in a temporary file and provides an API for appending data one row at a time.

Do not create this class directly, instead invoke table_data_writer().

append(*args)[source]

Append a row to the table data, passing columns in as args

Column order must match the order specified when table_data_writer() was invoked.

All columns will be coerced to strings, with special handling for None, which is converted to the null marker (\N), and for datetime.datetime objects, which have the proper pg_dump timestamp format applied to them.

Return type

NoReturn
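
A brief sketch of that coercion; dump, entry, and the column names are assumed to come from earlier add_entry() calls:

import datetime
import uuid

with dump.table_data_writer(entry, ['id', 'created_at', 'note']) as writer:
    # None is written as the \N null marker, the datetime is rendered in
    # pg_dump's timestamp format, and the UUID is coerced to str
    writer.append(uuid.uuid4(), datetime.datetime.utcnow(), None)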

finish()[source]

Invoked prior to saving a dump to close the temporary data handle and switch the class into read-only mode.

For use by pgdumplib.dump.Dump only.

Return type

NoReturn

read()[source]

Read the data from disk for writing to the dump

For use by pgdumplib.dump.Dump only.

Return type

bytes

property size

Return the current size of the data on disk

Return type

int

Constants

Constants used in the reading and writing of a pg_dump file. There are additional undocumented constants, but they should not be of concern unless you are hacking on the library itself.

pgdumplib.constants.K_OFFSET_NO_DATA = 3

Specifies the entry has no data

pgdumplib.constants.K_OFFSET_POS_NOT_SET = 1

Specifies the entry has data but no offset

pgdumplib.constants.K_OFFSET_POS_SET = 2

Specifies the entry has data and an offset

pgdumplib.constants.MAX_VER = (1, 14, 0)

The maximum supported pg_dump file format version

pgdumplib.constants.MIN_VER = (1, 12, 0)

The minimum supported pg_dump file format version

pgdumplib.constants.SECTION_DATA = 'DATA'

Data section for an entry in a dump’s table of contents

pgdumplib.constants.SECTION_NONE = 'None'

Non-specific section for an entry in a dump’s table of contents

pgdumplib.constants.SECTION_POST_DATA = 'Post-Data'

Post-data section for an entry in a dump’s table of contents

pgdumplib.constants.SECTION_PRE_DATA = 'Pre-Data'

Pre-data section for an entry in a dump’s table of contents

pgdumplib.constants.VERSION = (1, 12, 0)

pg_dump file format version to create by default
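
For example, a sketch that uses the section constants to check where an entry will be restored, assuming dump is a loaded pgdumplib.dump.Dump instance (the public.orders table is a placeholder):

from pgdumplib import constants

entry = dump.lookup_entry(constants.TABLE, 'public', 'orders')
if entry is not None and entry.section == constants.SECTION_PRE_DATA:
    print('The table definition is restored in the pre-data section')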

Converters

When creating a new pgdumplib.dump.Dump instance, either directly or by using pgdumplib.load(), you can specify a converter class to use when reading data using the pgdumplib.dump.Dump.read_data() iterator.

The default converter (DataConverter) will only replace columns that have a NULL indicator (\N) with None.

The SmartDataConverter will attempt to convert individual columns to native Python data types.

To create your own data converter, simply extend the DataConverter class and override its convert() method, as in the sketch below.
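
A minimal sketch of a custom converter, assuming rows arrive as tab-delimited COPY text (the representation the built-in converters parse); UpperCaseConverter and the dump.out path are purely illustrative:

import pgdumplib
from pgdumplib import converters


class UpperCaseConverter(converters.DataConverter):
    """Hypothetical converter that upper-cases every non-NULL column"""

    @staticmethod
    def convert(row):
        # Split the tab-delimited row, mapping \N to None as the base class does
        return tuple(
            None if column == '\\N' else column.upper()
            for column in row.split('\t'))


dump = pgdumplib.load('dump.out', converter=UpperCaseConverter)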

class pgdumplib.converters.DataConverter[source]

Base Row/Column Converter

Base class used for converting row/column data when using the pgdumplib.dump.Dump.read_data() iterator.

This class just splits the row into individual columns and returns the row as a tuple of strings, only converting \N to None.

static convert(row)[source]

Convert the string based row into a tuple of columns.

Parameters

row (str) – The row to convert

Return type

tuple

class pgdumplib.converters.NoOpConverter[source]

Performs no conversion on the row passed in

static convert(row)[source]

Returns the row passed in

Parameters

row (str) – The row to convert

Return type

str

class pgdumplib.converters.SmartDataConverter[source]

Attempts to convert columns to native Python data types

Used for converting row/column data with the pgdumplib.dump.Dump.read_data() iterator.

Possible conversion types: None, str, int, datetime.datetime, decimal.Decimal, ipaddress.IPv4Address, ipaddress.IPv4Network, ipaddress.IPv6Address, ipaddress.IPv6Network, and uuid.UUID.

convert(row)[source]

Convert the string based row into a tuple of columns

Return type

Tuple[Union[None, str, int, datetime, Decimal, IPv4Address, IPv4Network, IPv6Address, IPv6Network, UUID], …]

Exceptions

pgdumplib specific exceptions

exception pgdumplib.exceptions.EntityNotFoundError(namespace, table)[source]

Raised when an attempt is made to read data from a relation in a dump file but it is not found in the table of contents.

This can happen if a schema-only dump was created OR if the namespace and table specified were not found.
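
A sketch of guarding a read with this exception (the path and table names are placeholders):

import pgdumplib
from pgdumplib import exceptions

dump = pgdumplib.load('dump.out')
try:
    for row in dump.table_data('public', 'orders'):
        print(row)
except exceptions.EntityNotFoundError:
    print('No data for public.orders in this dump')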

exception pgdumplib.exceptions.NoDataError[source]

Raised when attempting to work with data when no data entries exist

exception pgdumplib.exceptions.PgDumpLibException[source]

Common Base Exception

Examples

Reading

First, create a dump of your database using pg_dump using the custom format:

pg_dump -Fc -d [YOUR_DATABASE] -f dump.out

The following example shows how to get the data and the table definition from a dump:

import pgdumplib
from pgdumplib import constants

dump = pgdumplib.load('dump.out')
for row in dump.table_data('public', 'table-name'):
    print(row)
print(dump.lookup_entry(constants.TABLE, 'public', 'table-name').defn)

Writing

To create a dump, you add entries for the schema objects and data it should contain. The following example shows how to create a dump with a schema, extension, comment, type, tables, and table data:

import datetime
import uuid

import pgdumplib
from pgdumplib import constants

dump = pgdumplib.new('example')

schema = dump.add_entry(
    desc=constants.SCHEMA,
    tag='test',
    defn='CREATE SCHEMA test;',
    drop_stmt='DROP SCHEMA test;')

dump.add_entry(
    desc=constants.ACL,
    tag='SCHEMA test',
    defn='GRANT USAGE ON SCHEMA test TO PUBLIC;',
    dependencies=[schema.dump_id])

uuid_ossp = dump.add_entry(
    desc=constants.EXTENSION,
    tag='uuid-ossp',
    defn='CREATE EXTENSION IF NOT EXISTS "uuid-ossp" WITH SCHEMA public;',
    drop_stmt='DROP EXTENSION "uuid-ossp";')

dump.add_entry(
    desc=constants.COMMENT,
    tag='EXTENSION "uuid-ossp"',
    defn="""COMMENT ON EXTENSION "uuid-ossp" IS 'generate universally unique identifiers (UUIDs)';""",
    dependencies=[uuid_ossp.dump_id])

addr_type = dump.add_entry(
    desc=constants.TYPE,
    namespace='test',
    tag='address_type',
    owner='postgres',
    defn="""\
    CREATE TYPE test.address_type AS ENUM ('billing', 'delivery');""",
    drop_stmt='DROP TYPE test.address_type;',
    dependencies=[schema.dump_id])

test_addresses = dump.add_entry(
    desc=constants.TABLE,
    namespace='test',
    tag='addresses',
    owner='postgres',
    defn="""\
    CREATE TABLE test.addresses (
        id               UUID                     NOT NULL DEFAULT uuid_generate_v4() PRIMARY KEY,
        created_at       TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
        last_modified_at TIMESTAMP WITH TIME ZONE,
        user_id          UUID                     NOT NULL REFERENCES users (id) ON DELETE CASCADE ON UPDATE CASCADE,
        type             address_type             NOT NULL,
        address1         TEXT                     NOT NULL,
        address2         TEXT,
        address3         TEXT,
        locality         TEXT                     NOT NULL,
        region           TEXT,
        postal_code      TEXT                     NOT NULL,
        country          TEXT                     NOT NULL
    );""",
    drop_stmt='DROP TABLE test.addresses;',
    dependencies=[schema.dump_id, addr_type.dump_id, uuid_ossp.dump_id])

example = dump.add_entry(
    constants.TABLE,
    'public', 'example', 'postgres',
    'CREATE TABLE public.example (\
        id UUID NOT NULL PRIMARY KEY,\
        created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,\
        value TEXT NOT NULL);',
    'DROP TABLE public.example')

with dump.table_data_writer(example, ['id', 'created_at', 'value']) as writer:
    writer.append(uuid.uuid4(), datetime.datetime.utcnow(), 'row1')
    writer.append(uuid.uuid4(), datetime.datetime.utcnow(), 'row2')
    writer.append(uuid.uuid4(), datetime.datetime.utcnow(), 'row3')
    writer.append(uuid.uuid4(), datetime.datetime.utcnow(), 'row4')
    writer.append(uuid.uuid4(), datetime.datetime.utcnow(), 'row5')

dump.save('custom.dump')
