using-redisql-with-python
Using RediSQL with Python
This tutorial will help you to get start to use RediSQL with Python3.
In this tutorial we will scrape the content of Hacker News using their API documented here.
We will use async python with asyncio
to manage the event loop, aiohttp
to retrieve data from a public API and aioredis
to communicate with Redis.
To follow this tutorial you will need a modern (> v4.0) instance of Redis running RediSQL. You can obtain RediSQL from our shop or from the github releases.
To load RediSQL is sufficient to pass it as argument to the redis-server: ./redis-server --loadmodule /path/to/redisql.so
The whole code show in this example is reachable here while we also created a more sophisticate example that stress much more the infrastructure to show that the bottle neck is not RediSQL but python and the network.
RediSQL and aioredis
Most Redis library implements methods to call the standard Redis command like SET
or GET
or RPOP
and aioredis is not an exception. This is generally a problem for Redis modules like RediSQL that instead defined their own commands. Fortunately most libraries usually expose also a lower level method that is used to implement most of the other Redis command. For what concern aioredis
the lower level method that we can use is .execute
that is implemented for both single connection and for a pool of connections.
Indeed is possible to implement all the other high level command using the low level .execute
method.
RediSQL and redis
While in this article we will talk about aioredis
, another, not asynchronous library for using Redis with python is redis
library.
In the redis
library, the low level method is .execute_command
and not .execute
as for aioredis
, other than this difference everything will apply just the same.
Creating a Redis connection
The very first thing to do is to connect to Redis, in our case we use a connection pool that has the same interface of a simple connection but is backed by a pool of different connections.
Creating the pool can be done like so:
and now the variable redis
refer to a aioredis
pool.
When we will need a new connection, the pool will either give us an idle connection or open a new connection to Redis and give us the new one.
Setting up RediSQL
Now that we have a pool of connections before to get the data into RediSQL we need to set up RediSQL. The first step is to create a database in RediSQL, this can be done easily with a call like
this call will create a new RediSQL database and it will call it HN
. If the key HN
already exists the call will return an error.
The next step is to create the structure to hold our data. In our case we will stick to something simple, a single table where we store the identifier of each item (comment or story) from HN, the author of such item, when the item was created and finally we will store the whole item as json structure in a text field.
Finally, since we storing data from the open internet inside our database, is wise to create an SQL statement to execute when doing an insert. The advantage of the statement is that is safe from SQL injections and is usually faster than re-compile the same query each time.
To create a statement we can proceed as following:
The last command create a new statement in the HN
database and associate it with the string insert_item
so that we can refer to it later.
Also note the use of the json(?4)
function, this is a function provided by the JSON1 module of SQLite and exposed by RediSQL that allow fast and efficient manipulation of json object. Using the JSON1 module is possible to have a lot of flexibility even inside a rigid SQL schema.
Is usually wise to wrap those command into a try: except:
block. Hence the final function will look like this:
Running the loop
Now that we have set up our environment we can go on and start to listen for new items posted on HN.
The API provides a simple endpoint maxitem.json
that returns the id of the latest item posted on HN. When the loop start we get maxitem and we store it into Redis. Then, when the maxitem get updated we download each of the items between the old maxitem
and the new maxitem
.
We repeat the loop forever with a sleep to avoid hammering the API endpoint.
Storing the data into RediSQL
The last interesting bit is about the store_item
function that is the one that download the item from the API and store it into RediSQL.
Downloading the item from the API is a simple HTTP GET request, then we simply check if it returns a successful status code and that it actually returns valid json.
Finally to store the element into RediSQL we execute the statement that we have create before during the set up phase. Indeed we are executing the command REDISQL.EXEC_STATEMENT HN insert_item $item_id $item_author $item_time $item
. This command will find inside the database HN
the statement insert_item
that we have previously defined as INSERT INTO hn VALUES(?1, ?2, ?3, json(?4));
. Now the item id will be substituted to ?1
, the item author will substitute ?2
, the creation time of the item will take the place of ?3
and the whole json string of the item will substitute ?4
, finally the statement is executed agains RediSQL and its result returned.
If everything went right, we have just added our first row to the database using async python.
Concluding
In this tutorial we took a rather simple problem and we use it to show how to use RediSQL with async python.
We started by setting up the database, we show how to create a database and tables inside it. Then we also show how to create statements to avoid SQL injections attack and improve the efficiency of repeated queries.
Then we obtain the data from the Hacker News API and we show how to insert the data into RediSQL using the statement that we have just created.
Hopefully this tutorial will be helpful and sufficient to get started, but if you have any question feel free to get in touch or to open an issue.
If you wish to see a similar tutorial for a different language, open an issue on github.
Last updated