I am thinking about creating an open source data management web application for various types of data.
A privileged user must be able to
- add new entity types (for example a 'user' or a 'family')
- add new properties to entity types (for example 'gender' to 'user')
- remove/modify entities and properties
These will be common tasks for the privileged user. He will do this through the web interface of the application. In the end, all data must be searchable and sortable by all types of users of the application. Two questions trouble me:
a) How should the data be stored in the database? Should I dynamically add/remove database tables and/or columns during runtime?
I am no database expert. I am stuck with the imagination that in terms of relational databases, the application has to be able to dynamically add/remove tables (entities) and/or columns (properties) at runtime. And I don't like this idea. Likewise, I am thinking if such dynamic data should be handled in a NoSQL database.
Anyway, I believe that this kind of problem has an intelligent canonical solution, which I just did not find and think of so far. What is the best approach for this kind of dynamic data management?
b) How to implement this in Python using an ORM or NoSQL?
If you recommend using a relational database model, then I would like to use SQLAlchemy. However, I don't see how to dynamically create tables/columns with an ORM at runtime. This is one of the reasons why I hope that there is a much better approach than creating tables and columns during runtime. Is the recommended database model efficiently implementable with SQLAlchemy?
If you recommend using a NoSQL database, which one? I like using Redis -- can you imagine an efficient implementation based on Redis?
Thanks for your suggestions!
Edit in response to some comments:
The idea is that all instances ("rows") of a certain entity ("table") share the same set of properties/attributes ("columns"). However, it will be perfectly valid if certain instances have an empty value for certain properties/attributes.
Basically, users will search the data through a simple form on a website. They query for e.g. all instances of an entity E with property P having a value V higher than T. The result can be sorted by the value of any property.
The datasets won't become too large. Hence, I think even the stupidest approach would still lead to a working system. However, I am an enthusiast and I'd like to apply modern and appropriate technology as well as I'd like to be aware of theoretical bottlenecks. I want to use this project in order to gather experience in designing a "Pythonic", state-of-the-art, scalable, and reliable web application.
I see that the first comments tend to recommending a NoSQL approach. Although I really like Redis, it looks like it would be stupid not to take advantage of the Document/Collection model of Mongo/Couch. I've been looking into mongodb and mongoengine for Python. By doing so, do I take steps into the right direction?
Edit 2 in response to some answers/comments:
From most of your answers, I conclude that the dynamic creation/deletion of tables and columns in the relational picture is not the way to go. This already is valuable information. Also, one opinion is that the whole idea of the dynamic modification of entities and properties could be bad design.
As exactly this dynamic nature should be the main purpose/feature of the application, I don't give up on this. From the theoretical point of view, I accept that performing operations on a dynamic data model must necessarily be slower than performing operations on a static data model. This is totally fine.
Expressed in an abstract way, the application needs to manage
- the data layout, i.e. a "dynamic list" of valid entity types and a "dynamic list" of properties for each valid entity type
- the data itself
I am looking for an intelligent and efficient way to implement this. From your answers, it looks like NoSQL is the way to go here, which is another important conclusion.
So, if you conceptualize your entities as "documents," then this whole problem maps onto a no-sql solution pretty well. As commented, you'll need to have some kind of model layer that sits on top of your document store and performs tasks like validation, and perhaps enforces (or encourages) some kind of schema, because there's no implicit backend requirement that entities in the same collection (parallel to table) share schema.
Allowing privileged users to change your schema concept (as opposed to just adding fields to individual documents - that's easy to support) will pose a little bit of a challenge - you'll have to handle migrating the existing data to match the new schema automatically.
Reading your edits, Mongo supports the kind of searching/ordering you're looking for, and will give you the support for "empty cells" (documents lacking a particular key) that you need.
If I were you (and I happen to be working on a similar, but simpler, product at the moment), I'd stick with Mongo and look into a lightweight web framework like Flask to provide the front-end. You'll be on your own to provide the model, but you won't be fighting against a framework's implicit modeling choices.