pgdumplib¶
Python3 library for reading and writing pg_dump files using the custom format.
Installation¶
pip install pgdumplib
Documentation¶
API¶
pgdumplib exposes a load method to create a Dump
instance from a pg_dump file created in the custom format.
See the Examples page to see how to read a dump or create one.
-
pgdumplib.
load
(filepath, converter=None)[source]¶ Load a pg_dump file created with -Fd from disk
- Parameters
filepath (os.PathLike) – The path to the dump to load
converter (pgdumplib.converters.DataConverter or None) – The data converter class to use (Default:
pgdumplib.converters.DataConverter
)
- Raises
- Return type
-
pgdumplib.
new
(dbname='pgdumplib', encoding='UTF8', converter=None, appear_as='12.0')[source]¶ Create a new
pgdumplib.dump.Dump
instance- Parameters
dbname (
str
) – The database name for the dump (Default:pgdumplib
)encoding (
str
) – The data encoding (Default:UTF8
)converter (pgdumplib.converters.DataConverter or None) – The data converter class to use (Default:
pgdumplib.converters.DataConverter
)appear_as (
str
) – The version of Postgres to emulate (Default:12.0
)
- Return type
The Dump
class exposes methods to
load
an existing dump,
to add an entry
to a dump,
to add table data
to a dump,
to add blob data
to a dump,
and to save
a new dump.
There are Converters that are available to format the data that is
returned by read_data()
. The converter
is passed in during construction of a new Dump
,
and is also available as an argument to pgdumplib.load()
.
The default converter, DataConverter
will
return all fields as strings, only replacing NULL
with
None
. The SmartDataConverter
will attempt to convert all columns to native Python data types.
When loading or creating a dump, the table and blob data are stored in
gzip compressed data files in a temporary directory that is automatically
cleaned up when the Dump
instance is released.
-
class
pgdumplib.dump.
Dump
(dbname='pgdumplib', encoding='UTF8', converter=None, appear_as='12.0')[source]¶ Create a new instance of the
Dump
classOnce created, the instance of
Dump
can be used to read existing dumps or to create new ones.- Parameters
dbname (str) – The database name for the dump (Default:
pgdumplib
)encoding (str) – The data encoding (Default:
UTF8
)converter – The data converter class to use (Default:
pgdumplib.converters.DataConverter
)
-
add_entry
(desc, namespace=None, tag=None, owner=None, defn=None, drop_stmt=None, copy_stmt=None, dependencies=None, tablespace=None, tableam=None, dump_id=None)[source]¶ Add an entry to the dump
The
namespace
andtag
are required.A
ValueError
will be raised if desc is not value that is known in :py:module:`pgdumplib.constants`.The section is
When adding data, use
table_data_writer()
instead of invokingadd_entry()
directly.If
dependencies
are specified, they will be validated and if adump_id
is specified and no entry is found with thatdump_id
, aValueError
will be raised.Other omitted values will be set to the default values will be set to the defaults specified in the
pgdumplib.dump.Entry
class.The
dump_id
will be auto-calculated based upon the existing entries if it is not specified.Note
The creation of ad-hoc blobs is not supported.
- Parameters
desc (str) – The entry description
namespace (str) – The namespace of the entry
tag (str) – The name/table/relation/etc of the entry
owner (str) – The owner of the object in Postgres
defn (str) – The DDL definition for the entry
drop_stmt (
Optional
[str
]) – A drop statement used to drop the entry beforecopy_stmt (
Optional
[str
]) – A copy statement used when there is a corresponding data section.dependencies (list) – A list of dump_ids of objects that the entry is dependent upon.
tablespace (str) – The tablespace to use
tableam (str) – The table access method
dump_id (int) – The dump id, will be auto-calculated if left empty
- Raises
- Return type
-
get_entry
(dump_id)[source]¶ Return the entry for the given dump_id
- Parameters
dump_id (int) – The dump ID of the entry to return.
- Return type
-
load
(path)[source]¶ Load the Dumpfile, including extracting all data into a temporary directory
- Parameters
path (os.PathLike) – The path of the dump to load
- Raises
- Raises
- Return type
-
lookup_entry
(desc, namespace, tag)[source]¶ Return the entry for the given namespace and tag
- Parameters
- Raises
- Return type
-
save
(path)[source]¶ Save the Dump file to the specified path
- Parameters
path (os.PathLike) – The path to save the dump to
- Return type
-
class
pgdumplib.dump.
Entry
(dump_id, had_dumper=False, table_oid='0', oid='0', tag=None, desc=None, defn=None, drop_stmt=None, copy_stmt=None, namespace=None, tablespace=None, tableam=None, owner=None, with_oids=False, dependencies=<factory>, data_state=3, offset=0)[source]¶ The entry model represents a single entry in the dataclass
Custom formatted dump files are primarily comprised of entries, which contain all of the metadata and DDL required to construct the database.
For table data and blobs, there are entries that contain offset locations in the dump file that instruct the reader as to where the data lives in the file.
- Variables
dump_id (int) – The dump id, will be auto-calculated if left empty
had_dumper (bool) – Indicates
oid (str) – The OID of the object the entry represents
tag (str) – The name/table/relation/etc of the entry
desc (str) – The entry description
defn (str) – The DDL definition for the entry
drop_stmt (str) – A drop statement used to drop the entry before
copy_stmt (str) – A copy statement used when there is a corresponding data section.
namespace (str) – The namespace of the entry
tablespace (str) – The tablespace to use
tableam (str) – The table access method
owner (str) – The owner of the object in Postgres
with_oids (bool) – Indicates …
dependencies (list) – A list of dump_ids of objects that the entry is dependent upon.
data_state (int) – Indicates if the entry has data and how it is stored
offset (int) – If the entry has data, the offset to the data in the file
section (str) – The section of the dump file the entry belongs to
-
class
pgdumplib.dump.
TableData
(dump_id, tempdir, encoding)[source]¶ Used to encapsulate table data using temporary file and allowing for an API that allows for the appending of data one row at a time.
Do not create this class directly, instead invoke
table_data_writer()
.-
append
(*args)[source]¶ Append a row to the table data, passing columns in as args
Column order must match the order specified when
table_data_writer()
was invoked.All columns will be coerced to a string with special attention paid to
None
, converting it to the null marker (\N
) anddatetime.datetime
objects, which will have the proper pg_dump timestamp format applied to them.- Return type
-
finish
()[source]¶ Invoked prior to saving a dump to close the temporary data handle and switch the class into read-only mode.
For use by
pgdumplib.dump.Dump
only.- Return type
-
read
()[source]¶ Read the data from disk for writing to the dump
For use by
pgdumplib.dump.Dump
only.- Return type
-
Constants¶
Constants used in the reading and writing of a pg_dump file. There are additional undocumented constants, but they should not be of concern unless you are hacking on the library itself.
-
pgdumplib.constants.
K_OFFSET_NO_DATA
= 3¶ Specifies the entry has no data
-
pgdumplib.constants.
K_OFFSET_POS_NOT_SET
= 1¶ Specifies the entry has data but no offset
-
pgdumplib.constants.
K_OFFSET_POS_SET
= 2¶ Specifies the entry has data and an offset
-
pgdumplib.constants.
MAX_VER
= (1, 14, 0)¶ The maximum supported version of pg_dump files ot support
-
pgdumplib.constants.
MIN_VER
= (1, 12, 0)¶ The minumum supported version of pg_dump files ot support
-
pgdumplib.constants.
SECTION_DATA
= 'DATA'¶ Data section for an entry in a dump’s table of contents
-
pgdumplib.constants.
SECTION_NONE
= 'None'¶ Non-specific section for an entry in a dump’s table of contents
-
pgdumplib.constants.
SECTION_POST_DATA
= 'Post-Data'¶ Post-data section for an entry in a dump’s table of contents
-
pgdumplib.constants.
SECTION_PRE_DATA
= 'Pre-Data'¶ Pre-data section for an entry in a dump’s table of contents
-
pgdumplib.constants.
VERSION
= (1, 12, 0)¶ pg_dump file format version to create by default
Converters¶
When creating a new pgdumplib.dump.Dump
instance, either directly
or by using pgdumplib.load()
, you can specify a converter class to
use when reading data using the pgdumplib.dump.Dump.read_data()
iterator.
The default converter (DataConverter
) will only replace columns
that have a NULL
indicator (\N
) with None
.
The SmartDataConverter
will attempt to convert individual
columns to native Python data types.
Creating your own data converter is easy and should simply extend the
DataConverter
class.
-
class
pgdumplib.converters.
DataConverter
[source]¶ Base Row/Column Converter
Base class used for converting row/column data when using the
pgdumplib.dump.Dump.read_data()
iterator.This class just splits the row into individual columns and returns the row as tuple of strings, only converting
\N
toNone
.
-
class
pgdumplib.converters.
SmartDataConverter
[source]¶ Attempts to convert columns to native Python data types
Used for converting row/column data with the
pgdumplib.dump.Dump.read_data()
iterator.Possible conversion types:
-
convert
(row)[source]¶ Convert the string based row into a tuple of columns
- Return type
Tuple
[Union
[None
,str
,int
,datetime
,Decimal
,IPv4Address
,IPv4Network
,IPv6Address
,IPv6Network
,UUID
], …]
-
Exceptions¶
pgdumplib specific exceptions
-
exception
pgdumplib.exceptions.
EntityNotFoundError
(namespace, table)[source]¶ Raised when an attempt is made to read data from a relation in a dump file but it is not found in the table of contents.
This can happen if a schema-only dump was created OR if the
namespace
andtable
specified were not found.
Examples¶
Reading¶
First, create a dump of your database using pg_dump using the custom
format:
pg_dump -Fc -d [YOUR] dump.out
The following example shows how to get the data and the table definition from a dump:
import pgdumplib
from pgdumplib import constants
dump = pgdumplib.load('dump.out')
for row in dump.table_data('public', 'table-name'):
print(row)
print(dump.lookup_entry(constants.TABLE, 'public', 'table-name').defn)
Writing¶
To create a dump, you need to add sections. The following example shows how to create a dump with a schema, extension, comment, type, tables, and table data:
import datetime
import uuid
import pgdumplib
from pgdumplib import constants
dump = pgdumpib.new('example')
schema = dump.add_entry(
desc=constants.SCHEMA,
tag='test',
defn='CREATE SCHEMA test;',
drop_stmt='DROP SCHEMA test;')
dump.add_entry(
desc=constants.ACL,
tag='SCHEMA test',
defn='GRANT USAGE ON SCHEMA test TO PUBLIC;',
dependencies=[schema.dump_id])
uuid_ossp = dump.add_entry(
desc=constants.EXTENSION,
tag='uuid-ossp',
defn='CREATE EXTENSION IF NOT EXISTS "uuid-ossp" WITH SCHEMA public;',
drop_stmt='DROP EXTENSION "uuid-ossp";')
dump.add_entry(
desc=constants.COMMENT,
tag='EXTENSION "uuid-ossp"',
defn="""COMMENT ON EXTENSION "uuid-ossp" IS generate universally unique identifiers (UUIDs)'""",
dependencies=[uuid_ossp.dump_id])
addr_type = dump.add_entry(
desc=constants.TYPE,
namespace='test',
tag='address_type',
owner='postgres',
defn="""\
CREATE TYPE test.address_type AS ENUM ('billing', delivery');""",
drop_stmt='DROP TYPE test.address_type;',
dependencies=[schema.dump_id])
test_addresses = dump.add_entry(
desc=constants.TABLE,
namespace='test',
tag='addresses',
owner='postgres',
defn="""\
CREATE TABLE addresses (
id UUID NOT NULL DEFAULT uuid_generate_v4() PRIMARY KEY,
created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
last_modified_at TIMESTAMP WITH TIME ZONE,
user_id UUID NOT NULL REFERENCES users (id) ON DELETE CASCADE ON UPDATE CASCADE,
type address_type NOT NULL,
address1 TEXT NOT NULL,
address2 TEXT,
address3 TEXT,
locality TEXT NOT NULL,
region TEXT,
postal_code TEXT NOT NULL,
country TEXT NOT NULL
);""",
drop_stmt='DROP TABLE test.addresses;',
dependencies=[schema.dump_id, addr_type.dump_id, uuid_ossp.dump_id])
example = dump.add_entry(
constants.TABLE,
'public', 'example', 'postgres',
'CREATE TABLE public.example (\
id UUID NOT NULL PRIMARY KEY,\
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,\
value TEXT NOT NULL);',
'DROP TABLE public.example')
with dump.table_data_writer(example, ['id', 'created_at', 'value']) as writer:
writer.append(uuid.uuid4(), datetime.datetime.utcnow(), 'row1');
writer.append(uuid.uuid4(), datetime.datetime.utcnow(), 'row2');
writer.append(uuid.uuid4(), datetime.datetime.utcnow(), 'row3');
writer.append(uuid.uuid4(), datetime.datetime.utcnow(), 'row4');
writer.append(uuid.uuid4(), datetime.datetime.utcnow(), 'row5');
dump.save('custom.dump')