Best database questions in May 2012

Which database model should I use for dynamic modification of entities/properties during runtime?

18 votes

I am thinking about creating an open source data management web application for various types of data.

A privileged user must be able to

  • add new entity types (for example a 'user' or a 'family')
  • add new properties to entity types (for example 'gender' to 'user')
  • remove/modify entities and properties

These will be common tasks for the privileged user. He will do this through the web interface of the application. In the end, all data must be searchable and sortable by all types of users of the application. Two questions trouble me:

a) How should the data be stored in the database? Should I dynamically add/remove database tables and/or columns during runtime?

I am no database expert. I keep imagining that, with a relational database, the application would have to add/remove tables (entities) and/or columns (properties) dynamically at runtime, and I don't like this idea. Likewise, I am wondering whether such dynamic data should be handled in a NoSQL database instead.

Anyway, I believe that this kind of problem has an intelligent canonical solution, which I just have not found or thought of so far. What is the best approach for this kind of dynamic data management?

b) How to implement this in Python using an ORM or NoSQL?

If you recommend using a relational database model, then I would like to use SQLAlchemy. However, I don't see how to dynamically create tables/columns with an ORM at runtime. This is one of the reasons why I hope that there is a much better approach than creating tables and columns during runtime. Is the recommended database model efficiently implementable with SQLAlchemy?

If you recommend using a NoSQL database, which one? I like using Redis -- can you imagine an efficient implementation based on Redis?

Thanks for your suggestions!

Edit in response to some comments:

The idea is that all instances ("rows") of a certain entity ("table") share the same set of properties/attributes ("columns"). However, it will be perfectly valid if certain instances have an empty value for certain properties/attributes.

Basically, users will search the data through a simple form on a website. They query for e.g. all instances of an entity E with property P having a value V higher than T. The result can be sorted by the value of any property.

The datasets won't become too large. Hence, I think even the stupidest approach would still lead to a working system. However, I am an enthusiast and I'd like to apply modern and appropriate technology as well as I'd like to be aware of theoretical bottlenecks. I want to use this project in order to gather experience in designing a "Pythonic", state-of-the-art, scalable, and reliable web application.

I see that the first comments tend to recommend a NoSQL approach. Although I really like Redis, it looks like it would be stupid not to take advantage of the Document/Collection model of Mongo/Couch. I've been looking into mongodb and mongoengine for Python. Am I taking steps in the right direction by doing so?

Edit 2 in response to some answers/comments:

From most of your answers, I conclude that the dynamic creation/deletion of tables and columns in the relational picture is not the way to go. This already is valuable information. Also, one opinion is that the whole idea of the dynamic modification of entities and properties could be bad design.

As exactly this dynamic nature should be the main purpose/feature of the application, I don't give up on this. From the theoretical point of view, I accept that performing operations on a dynamic data model must necessarily be slower than performing operations on a static data model. This is totally fine.

Expressed in an abstract way, the application needs to manage

  1. the data layout, i.e. a "dynamic list" of valid entity types and a "dynamic list" of properties for each valid entity type
  2. the data itself

I am looking for an intelligent and efficient way to implement this. From your answers, it looks like NoSQL is the way to go here, which is another important conclusion.

So, if you conceptualize your entities as "documents," then this whole problem maps onto a NoSQL solution pretty well. As commented, you'll need to have some kind of model layer that sits on top of your document store and performs tasks like validation, and perhaps enforces (or encourages) some kind of schema, because there's no implicit backend requirement that entities in the same collection (the analogue of a table) share a schema.

Allowing privileged users to change your schema concept (as opposed to just adding fields to individual documents - that's easy to support) will pose a little bit of a challenge - you'll have to handle migrating the existing data to match the new schema automatically.

Reading your edits, Mongo supports the kind of searching/ordering you're looking for, and will give you the support for "empty cells" (documents lacking a particular key) that you need.

If I were you (and I happen to be working on a similar, but simpler, product at the moment), I'd stick with Mongo and look into a lightweight web framework like Flask to provide the front-end. You'll be on your own to provide the model, but you won't be fighting against a framework's implicit modeling choices.
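To make the "model layer on top of a document store" idea concrete, here is a minimal sketch under stated assumptions. It uses plain Python dicts as a stand-in for the document store, and every name in it (EntityType, Store, find_greater, ...) is hypothetical, not part of Mongo or any library. It only illustrates the two things the application has to manage: the data layout and the data itself.

```python
# Hypothetical in-memory stand-in for a document store plus a thin model layer.
# None of these names come from a real library.

class EntityType:
    """The data layout: a named entity type with a mutable set of properties."""

    def __init__(self, name, properties=()):
        self.name = name
        self.properties = set(properties)

    def validate(self, doc):
        # Unknown keys are rejected; missing keys are fine ("empty cells").
        unknown = set(doc) - self.properties
        if unknown:
            raise ValueError(
                f"unknown properties for {self.name}: {sorted(unknown)}")


class Store:
    def __init__(self):
        self.types = {}      # entity type name -> EntityType
        self.documents = {}  # entity type name -> list of dicts

    def add_type(self, name, properties=()):
        self.types[name] = EntityType(name, properties)
        self.documents[name] = []

    def add_property(self, type_name, prop):
        # Existing documents need no migration: they simply lack the key.
        self.types[type_name].properties.add(prop)

    def remove_property(self, type_name, prop):
        # Removing a property is the case that needs a migration step.
        self.types[type_name].properties.discard(prop)
        for doc in self.documents[type_name]:
            doc.pop(prop, None)

    def insert(self, type_name, doc):
        self.types[type_name].validate(doc)
        self.documents[type_name].append(dict(doc))

    def find_greater(self, type_name, prop, threshold, sort_by=None):
        # "All instances of E with property P having a value higher than T."
        hits = [d for d in self.documents[type_name]
                if d.get(prop) is not None and d[prop] > threshold]
        if sort_by is not None:
            # Documents lacking the sort key are placed last.
            hits.sort(key=lambda d: (d.get(sort_by) is None, d.get(sort_by)))
        return hits
```

With a real document database the dict lists would be replaced by collections and find_greater by a query, but the validation and migration responsibilities would stay in this layer.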

How to set a MySQL row to READ-ONLY?

10 votes

I have a row in a table that I do not want to be changed (ever).

Is it possible to set a MySQL row to READ-ONLY so that it cannot be updated in any way? If so, how?

If not, is it possible to set a permanent value in one of the columns of that row so that it cannot be changed? If so, how?

Thanks.

You can create a BEFORE UPDATE trigger that raises an error if a "locked" record is about to be updated (e.g. by calling a non-existent procedure); since an error occurs before the operation is undertaken, MySQL ceases to proceed with it. If you also want to prevent the record from being deleted, you'd need to create a similar trigger BEFORE DELETE.

To determine whether a record is "locked", you could create a boolean locked column:

ALTER TABLE my_table ADD COLUMN locked BOOLEAN NOT NULL DEFAULT FALSE;

UPDATE my_table SET locked = TRUE WHERE ...;

DELIMITER ;;

CREATE TRIGGER foo_upd BEFORE UPDATE ON my_table FOR EACH ROW
IF OLD.locked THEN CALL raise_error;
END IF;;

CREATE TRIGGER foo_del BEFORE DELETE ON my_table FOR EACH ROW
IF OLD.locked THEN CALL raise_error;
END IF;;

DELIMITER ;

In your case, as you want to permanently lock a very specific record, you can do without the locked column and instead hard-code the test into your trigger; for example, to "lock" the record with id_column = 1234:

DELIMITER ;;

CREATE TRIGGER foo_upd BEFORE UPDATE ON my_table FOR EACH ROW
IF OLD.id_column <=> 1234 THEN CALL raise_error;
END IF ;;

CREATE TRIGGER foo_del BEFORE DELETE ON my_table FOR EACH ROW
IF OLD.id_column <=> 1234 THEN CALL raise_error;
END IF ;;

DELIMITER ;
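The trigger idea can be tried out without a MySQL server. The following is a sketch only, using Python's built-in sqlite3 module instead of MySQL: SQLite's RAISE(ABORT, ...) stands in for the call to a non-existent procedure, and the trigger syntax differs from the MySQL statements above. The table layout and the helper try_sql are assumptions made for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (
        id_column INTEGER PRIMARY KEY,
        payload   TEXT,
        locked    INTEGER NOT NULL DEFAULT 0
    );

    -- RAISE(ABORT, ...) plays the role of the failing CALL in the MySQL version.
    CREATE TRIGGER foo_upd BEFORE UPDATE ON my_table
    FOR EACH ROW WHEN OLD.locked
    BEGIN
        SELECT RAISE(ABORT, 'row is locked');
    END;

    CREATE TRIGGER foo_del BEFORE DELETE ON my_table
    FOR EACH ROW WHEN OLD.locked
    BEGIN
        SELECT RAISE(ABORT, 'row is locked');
    END;

    INSERT INTO my_table (id_column, payload, locked) VALUES (1234, 'keep me', 1);
    INSERT INTO my_table (id_column, payload) VALUES (1, 'free');
""")


def try_sql(statement):
    """Run one statement; return False if a trigger (or any DB error) stopped it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(statement)
        return True
    except sqlite3.DatabaseError:
        return False
```

Calling try_sql with an UPDATE or DELETE that targets id_column = 1234 is refused, while the same statements against the unlocked row go through.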

Is there a GUID alternative for distributed key generation?

10 votes

My situation is:

  1. I have a number of client applications, each using a local DB (MS SQL, MS Access - sorry, this is an Enterprise system, I have to support legacy...)
  2. I don't know how the number of clients will grow - now it's ~10 but it may be ~100 in a year.
  3. Data from those tables comes to my central server and is put into one common table
  4. Sometimes existing (client) data is changed - I have to perform update/delete operations
  5. I don't want to use GUIDs (.NET type System.Guid) - they are hard to implement and support on MS Access. Besides, they are not good for performance
  6. I need a fast search on that common table, so it would be nice to use int or long int as a PK

So, I want:

  1. Something unique to avoid collisions (it will be used as a PK)
  2. It should hopefully be int or long int
  3. Must be assignable client-side before being inserted

My current solution is to take the CRC from a concatenation of:

  • ProcessorID
  • Bios date
  • User name (strings, hardware\user related data)
  • DateTime.Now (UTC)

Currently it works for me, but maybe there is a better approach to achieve my goals? Any comments, suggestions, examples, or experience of your own?

UPDATE: synchronization between client and server is a periodic action, so it can occur 2-3 times per day (it's a config variable)

If data from multiple tables comes to one central table and you need to address changes to these records, then my suggestion is to use two columns as the PK of your central table. One column could be the Identity field from the clients (not unique on its own) and the other a client code (also not unique on its own) that you assign to each client app. The combination of ID and client code will be your PK.

This solution has the advantage of not requiring any changes to the client-side apps (except perhaps an identity code to send to your central server, which you could also use as a security measure). Of course, if the customer base grows (hopefully), you need to keep a centralized table of the codes assigned to each client. Searching the central table should not be a problem because the key consists of two numbers (or a number and a short string for the identity code).
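As a rough illustration of the composite key, here is a sketch using Python's sqlite3 module as a stand-in for the central server's database. The table name central_data, the column names, and the helper sync_row are all assumptions for the example; the real system would use MS SQL and its own upsert mechanism.

```python
import sqlite3

central = sqlite3.connect(":memory:")
central.execute("""
    CREATE TABLE central_data (
        client_code INTEGER NOT NULL,  -- assigned by the server, one per client app
        local_id    INTEGER NOT NULL,  -- the client's own identity value
        payload     TEXT,
        PRIMARY KEY (client_code, local_id)
    )
""")


def sync_row(client_code, local_id, payload):
    """Insert a client's row, or overwrite it if that client already sent it."""
    with central:
        central.execute(
            "INSERT OR REPLACE INTO central_data (client_code, local_id, payload) "
            "VALUES (?, ?, ?)",
            (client_code, local_id, payload),
        )


# Two clients may reuse the same local id without colliding.
sync_row(1, 7, "from client 1")
sync_row(2, 7, "from client 2")
# A later sync from client 1 updates only its own row.
sync_row(1, 7, "changed on client 1")
```

Because both key parts are integers, lookups by (client_code, local_id) use the primary key index directly.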

Hibernate UnUniqueify a column in table

7 votes

I want a field to be non-unique on its own but unique in combination with another field. I have a table with two columns forming a composite primary key: id and object_proxy_id. This is exactly what I need, but Hibernate makes object_proxy_id unique on its own, so that value can't be duplicated in the table, and I need this column to accept duplicate values, because every user has their own object proxies and these proxies don't necessarily have to be unique.

This is what I want to achieve:

|-------------------------------|
| tbl_object_proxy              |
| ------------------------------|
| Id (pk)| object_proxy_id (pk) |
|-------------------------------|
| 1      | 150 --               |
| 1      | 149  |= must be able to be DUPLICATE which is not the case right now.
| 2      | 150 --               |
| 2      | 151                  |
|-------------------------------|

Current code:

@Entity
@Table(name = "tbl_user_settings", uniqueConstraints = {@UniqueConstraint(columnNames={"user_id"})})
@Inheritance(strategy = InheritanceType.TABLE_PER_CLASS)

public class Settings implements Serializable
{
@Id
@SequenceGenerator(name="someSequence", sequenceName="SEQ_SOMENAME", allocationSize =1)
@GeneratedValue(strategy=GenerationType.SEQUENCE, generator="someSequence")
@Column(name="id")
private int setting_id;

@OneToOne
private User user;

@ManyToOne
private SomeObject someobject;

@ElementCollection
@CollectionTable(name="tbl_collection_name", joinColumns=
@JoinColumn(name="id"), uniqueConstraints = {@UniqueConstraint(columnNames={"id", "object_proxy_id"})})
@Column(name="SomeObject")
private Set<SomeObject> objectProxy;

/*...constructors and methods...*/
}

Results in:

-- Table schema
|-------------------|                    
| tbl_user_settings |                        
|-------------------|                        
| id                |PK <<Unique>>                      
| user_id           |FK reference tbl_user <<Unique>>                        
| object_id         |FK reference tbl_object  
|-------------------|

|------------------|
| tbl_object_proxy |
|------------------|
| id               |PK reference tbl_user_settings 
| object_proxy_id  |PK reference tbl_object <<Unique>> BUT I DON'T WANT THIS TO BE UNIQUE ON ITSELF !!!!
|------------------|

EDIT: The two primary keys in tbl_object_proxy form a composite primary key.
I have tried Xeon's solution but it didn't work.

Short answer: replace the @ElementCollection by a @ManyToMany relation with a @JoinTable like this:

@ManyToMany
@JoinTable(
name="tbl_settings_objecteproxy_v2",
joinColumns = @JoinColumn(name = "id"),
inverseJoinColumns = @JoinColumn( name = "objectproxy_id"))
private Set<SomeObject> objectproxy;

See "2.2.5.3.2.1. Definition" in Hibernate Annotation Documentation

This results in the same side table, but without the unique constraint. So now this is possible:

|-------------------------------|
| tbl_object_proxy              |
| ------------------------------|
| Id (pk)| object_proxy_id (pk) |
|-------------------------------|
| 1      | 150 --               |
| 1      | 149  |= It works! The unique constraint is gone! 
| 2      | 150 --               |
| 2      | 151                  |
|-------------------------------|


Detailed answer and cause description: the @ElementCollection created a collection table with a one-to-many relation on the referenced key (collection | inverse join). To reflect that one-to-many relationship, Hibernate adds a unique constraint to the key referencing the other side table, which I didn't want. So I dropped the @ElementCollection and replaced it with a @ManyToMany relation with a @JoinTable annotation. I also tried declaring the @ManyToMany relation inside the @ElementCollection, but it kept adding the unique constraint to the referenced key.

My Settings class now looks like this:

@Entity
@Table(name = "tbl_user_settings", uniqueConstraints = {@UniqueConstraint(columnNames={"user_id"})})
@Inheritance(strategy = InheritanceType.TABLE_PER_CLASS)

public class Settings
{
@Id
@SequenceGenerator(name="someSequence", sequenceName="SEQ_SOMENAME", allocationSize =1)
@GeneratedValue(strategy=GenerationType.SEQUENCE, generator="someSequence")
@Column(name="id")
private int setting_id;

@OneToOne
private User user;

@ManyToOne
private SomeObject someobject;

@ManyToMany
@JoinTable(
name="tbl_settings_objecteproxy_v2",
joinColumns = @JoinColumn(name = "id"),
inverseJoinColumns = @JoinColumn( name = "objectproxy_id"))
private Set<SomeObject> objectProxy;

/*...constructors and methods...*/
}

Is this a correct database design?

6 votes

I'm working with the new version of an application. In the current version the database structure is changed, they say "to improve performance".

The old version of the DB had a general structure like this:

TABLE ENTITY
(
    ENTITY_ID,
    STANDARD_PROPERTY_1,
    STANDARD_PROPERTY_2,
    STANDARD_PROPERTY_3,
    ...
)

TABLE ENTITY_PROPERTIES
(
    ENTITY_ID,
    PROPERTY_KEY,
    PROPERTY_VALUE
)

so we had a main table with fields for the basic properties and a table to manage custom properties added by user.

The new version of the DB instead has a structure like this:

TABLE ENTITY
(
    ENTITY_ID,
    STANDARD_PROPERTY_1,
    STANDARD_PROPERTY_2,
    STANDARD_PROPERTY_3,
    ...
)

TABLE ENTITY_PROPERTIES_n
(
    ENTITY_ID_n,
    CUSTOM_PROPERTY_1,
    CUSTOM_PROPERTY_2,
    CUSTOM_PROPERTY_3,
    ...
)

So now, when the user adds a custom property, a new column is added to the current ENTITY_PROPERTIES_n table until the max number of columns (managed by the application) is reached; then a new table is created.

So, my question is: is this a correct way to design a DB structure? Is this the only way to "increase performance"? The old structure required a number of joins or sub-selects, but this structure doesn't seem very smart (or even correct) to me...

I have seen this done before on the assumed (often unproven) "expense" of joining - it is basically turning a row-heavy data table into a column-heavy table. They ran into their own limitation, as you imply, by creating new tables when they run out of columns.

I completely disagree with it.

Personally, I would stick with the old structure and re-evaluate the performance issues. That isn't to say the old way is the correct way, it is just marginally better than the "improvement" in my opinion, and removes the need to do large scale re-engineering of database tables and DAL code.

These tables strike me as largely static... caching would be an even better performance improvement without mutilating the database and one I would look at doing first. Do the "expensive" fetch once and stick it in memory somewhere, then forget about your troubles (note, I am making light of the need to manage the Cache, but static data is one of the easiest to manage).
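A minimal sketch of the "fetch once, keep it in memory" idea, assuming the old ENTITY_PROPERTIES layout and using Python's sqlite3 and functools.lru_cache as stand-ins for the real database and cache. The function names and sample rows are hypothetical.

```python
import sqlite3
from functools import lru_cache

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE entity_properties (
        entity_id      INTEGER NOT NULL,
        property_key   TEXT NOT NULL,
        property_value TEXT,
        PRIMARY KEY (entity_id, property_key)
    );
    INSERT INTO entity_properties VALUES (1, 'color', 'red');
    INSERT INTO entity_properties VALUES (1, 'size', 'L');
""")


@lru_cache(maxsize=1024)
def custom_properties(entity_id):
    """Fetch an entity's custom properties once; later calls hit the cache."""
    rows = db.execute(
        "SELECT property_key, property_value FROM entity_properties "
        "WHERE entity_id = ? ORDER BY property_key",
        (entity_id,),
    ).fetchall()
    return tuple(rows)  # immutable, so callers cannot corrupt the cached copy


def set_property(entity_id, key, value):
    """Write to the table, then drop the cache so readers see the change."""
    with db:
        db.execute(
            "INSERT OR REPLACE INTO entity_properties VALUES (?, ?, ?)",
            (entity_id, key, value),
        )
    custom_properties.cache_clear()
```

Clearing the whole cache on every write is the crudest possible invalidation; it is acceptable here only because the data is assumed to change rarely.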

Or, wait for the day you run into the maximum number of tables per database :-)

Others have suggested completely different stores. This is a perfectly viable possibility and if I didn't have an existing database structure I would be considering it too. That said, I see no reason why this structure can't fit into an RDBMS. I have seen it done on almost all large scale apps I have worked on. Interestingly enough, they all went down a similar route and all were mostly "successful" implementations.

SQL to Determine Tee Order in Golf Application

6 votes

I am working on a golf application that includes a scorecard system. I am storing each score for each player in the database and I need to come up with a query to determine tee order. So for example if the players have played 3 holes and the scores look like this...

Player    1  2  3
--------- -  -  -
Player 1: 3, 4, 3
Player 2: 2, 3, 3
Player 3: 2, 4, 3

... Then the order needs to look like this...

1.) Player 2
2.) Player 3
3.) Player 1

... So the players will be ordered by their scores compared to their opponents' scores. Does that make sense? Is this even possible with a query, or should I write a function to parse a 2D array in code? I am using Java in that case.

My table structure looks like this:

  • Players (player id, and player name)
  • Rounds (round id, course id)
  • Scores (round id, player id, hole number, and score)

I can see a solution that uses the window function row_number() and an additional column in the database for the ordering at each level (or a recursive CTE in SQL Server). However, SQLite does not support this.

Here is my recommendation on implementing the solution without doing a lot of querying backwards:

(1) Assign the tee order for the first tee.

(2) For each next tee, look at the previous score and the previous tee order:

(3) Assign the new tee order by looping through the previous hole's scores, ordered by score ASC (lowest score first) and previous tee order ASC. This is the ordering that reproduces the expected result in the question.

Because you only have a few players per round, it is reasonable to do this in the app layer. However, if you had a database that supported window functions, then you could more easily do a database-only solution.

I can't resist. Here is some code that will do this with a table to store the orders. You need to loop through, once per hole:

-- SQLite only allows AUTOINCREMENT on an INTEGER PRIMARY KEY
create table ThisOrder (
    ThisOrderId integer primary key autoincrement,
    RoundId int,
    Hole int,
    PlayerId int
);

Initialize it with each player in some order.

Then, insert new rows into the table for each hole:

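-- <thehole> is the hole just played; its order decides the order for hole + 1
insert into ThisOrder(RoundId, Hole, PlayerId)
    select s.RoundId, s.Hole+1, s.PlayerId
    from Scores s join
         ThisOrder o
         on s.PlayerId = o.PlayerId and
            s.RoundId = o.RoundId and
            s.Hole = o.Hole
    where s.RoundId = <roundid> and s.Hole = <thehole>
    -- lowest score first; ties keep the previous tee order
    order by s.Score ASC, o.ThisOrderId ASC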

You'll need to call this once for each hole, minus one.

Then get your ordering as:

 select *
 from ThisOrder
 where roundid = <roundid> and hole = <thehole>
 order by ThisOrderId 
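For the app-layer route, steps (1) to (3) can be sketched in a few lines. The question uses Java; this sketch is in Python and rests on one assumption, namely that the lowest score on the previous hole tees off first and ties keep the previous tee order, which is what the expected order in the question implies. The function names are hypothetical.

```python
def next_tee_order(previous_order, hole_scores):
    """Return the tee order for the next hole.

    previous_order: player ids in the order they teed off on the hole just played.
    hole_scores:    dict mapping player id -> score on that hole.
    """
    # sorted() is stable, so players with equal scores keep their previous order.
    return sorted(previous_order, key=lambda player: hole_scores[player])


def tee_order_after(initial_order, scores_by_hole):
    """Apply next_tee_order once per completed hole, starting from initial_order."""
    order = list(initial_order)
    for hole_scores in scores_by_hole:
        order = next_tee_order(order, hole_scores)
    return order


# The scorecard from the question: players 1-3 over three holes.
scores = [
    {1: 3, 2: 2, 3: 2},  # hole 1
    {1: 4, 2: 3, 3: 4},  # hole 2
    {1: 3, 2: 3, 3: 3},  # hole 3
]
print(tee_order_after([1, 2, 3], scores))  # [2, 3, 1]
```

The stable sort does the tie-breaking implicitly, so no separate "previous tee order" column is needed in memory.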

Best practice for storing usernames & password in MySQL Databases

5 votes

Possible Duplicate:
Secure hash and salt for PHP passwords

I am making a system that stores user credentials (email, username and password) in a MySQL database and have seen conflicting views on using encryption, salting and encryption types.

What are the best methods you would recommend? Encoding in MD5 or SHA1? Salting or not salting? Encrypting just the password or all 3 elements?

For the password hash, use PBKDF2; it's NIST-approved. You should use a random, non-secret salt for each password and a nontrivial (over 1000) iteration count.
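A minimal sketch of per-password salting with PBKDF2, using only Python's standard library (hashlib.pbkdf2_hmac, hmac.compare_digest, os.urandom). The question is about PHP/MySQL, so treat this as an illustration of the scheme rather than drop-in code. The iteration count, salt length and the choice of SHA-256 are assumptions; the answer itself only asks for "over 1000" iterations.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumption: tune to your hardware, far above the 1000 minimum
SALT_BYTES = 16       # assumption: a common salt length


def hash_password(password):
    """Return (salt, digest); store both next to the user record."""
    salt = os.urandom(SALT_BYTES)  # random, non-secret, unique per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected_digest)
```

The salt and digest are both stored in the database; only the password itself is never stored.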

For the username and email, probably not worth encrypting.

5 votes

I'm developing a web application on Play Framework 2.0.

Since I need to access a DB2 database, I added the following lines to my application's application.conf file:

db.mydb.driver=com.ibm.db2.jcc.DB2Driver
db.mydb.url="jdbc:db2://host:port/databaseName"
db.mydb.user=user
db.mydb.password=pass
db.mydb.jndiName=databaseName

I connected successfully to this DB but got the following exceptions:

[info] play - datasource [jdbc:db2://host:port/databaseName] bound to JNDI as databaseName
[info] play - database [databaseName] connected at jdbc:db2://host:port/databaseName
[warn] application - play_evolutions table already existed
[error] application -

! @6a8ib4hd7 - Internal server error, for request [GET /] ->

play.api.UnexpectedException: Unexpected exception [SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704, SQLERRMC=DB2ADMIN.PLAY_EVOLUTIONS, DRIVER=4.12.55]
        at play.core.ReloadableApplication$$anonfun$get$1$$anonfun$apply$3$$anonfun$1.apply(ApplicationProvider.scala:134) ~[play_2.9.1.jar:2.0]
        at play.core.ReloadableApplication$$anonfun$get$1$$anonfun$apply$3$$anonfun$1.apply(ApplicationProvider.scala:112) ~[play_2.9.1.jar:2.0]
        at scala.Option.map(Option.scala:133) ~[scala-library.jar:0.11.2]
        at play.core.ReloadableApplication$$anonfun$get$1$$anonfun$apply$3.apply(ApplicationProvider.scala:112) ~[play_2.9.1.jar:2.0]
        at play.core.ReloadableApplication$$anonfun$get$1$$anonfun$apply$3.apply(ApplicationProvider.scala:110) ~[play_2.9.1.jar:2.0]
        at scala.Either$RightProjection.flatMap(Either.scala:277) ~[scala-library.jar:0.11.2]
Caused by: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704, SQLERRMC=DB2ADMIN.PLAY_EVOLUTIONS, DRIVER=4.12.55
        at com.ibm.db2.jcc.am.hd.a(hd.java:676) ~[db2jcc4.jar:na]
        at com.ibm.db2.jcc.am.hd.a(hd.java:60) ~[db2jcc4.jar:na]
        at com.ibm.db2.jcc.am.hd.a(hd.java:127) ~[db2jcc4.jar:na]
        at com.ibm.db2.jcc.am.mn.c(mn.java:2621) ~[db2jcc4.jar:na]
        at com.ibm.db2.jcc.am.mn.d(mn.java:2609) ~[db2jcc4.jar:na]
        at com.ibm.db2.jcc.am.mn.a(mn.java:2085) ~[db2jcc4.jar:na]

In order to solve this I tried to disable Evolutions by setting the following lines in the application.conf file:

evolutions=disabled
evolutions.enable=false

But I am always getting the above exceptions.

Can anyone help me on this?

Thanks

A DB2 -204 SQLCODE means:

The object identified by name is not defined in the DB2® subsystem. This SQLCODE can be generated for any type of DB2 object.

...

Verify that the object name was correctly specified in the SQL statement, including any required qualifiers. If it is correct, ensure that the object exists in the system before resubmitting the statement.

It would seem the table DB2ADMIN.PLAY_EVOLUTIONS does not exist. If it should exist, try connecting to the database using a SQL client and execute select * from DB2ADMIN.PLAY_EVOLUTIONS with ur for read only; to verify the table does exist.

Also, you probably want to use evolution=disabled instead of evolutions=disabled (see the Play Git commit providing a way to disable EvolutionPlugin through configuratio…) in your application.conf file.

Why is my mongodb call so slow?

5 votes

Alright, so I'm building an application based on Node.js and I am using mongoose to handle my connection to mongodb. I have an endpoint like this:

getTestStream : function(req, res, conditions, callback) {   
  Activity.find()
    .limit(1000)
    .run(function(err, activities) {
      if (err){
        util.sendError(req, res, "Query Error", err);
      } else if (activities) {     
        res.send(activities);
      } else {
        util.send('nope');
      }
  });
}

For some reason this call takes 700ms+ to complete. The same call without even applying a limit made from mongodb shell returns in about 4ms. It seems like such a simple query, so what's slowing it down so much? I'm guessing I've missed something obvious in configuration somewhere, but I have no idea.

Thanks to anyone who can help on this.

Other info:

mongoose@2.6.0
mongodb@2.0.4
node@0.6.9

After experimenting for a while, I've found several contributors to the slowness; hopefully this helps anyone with a similar issue:

  • The objects I'm requesting are large, so processing them takes some time. For large objects, modify the query to return only the fields you need right now.
  • Mongoose is useful, but it can really slow down when you request a lot of items; it's better to interface directly with node-mongodb-native if you want speed for a call. (This was about a 50%+ speed increase for my scenario.)

Using these techniques I can now process 4000 records in less time than I was processing 1000 before. Thanks to anyone who commented, and special thanks to Gates VP for pointing out that mongoose wasn't really a good fit for this kind of call.

Is it faster to query a List<T> or database?

5 votes

I have recently had several situations where I need different data from the same table. One example is where I would loop through each "delivery driver" and generate a printable PDF file for each customer they are to deliver to.

In this situation, I pulled all customers and stored them into

List<Customer> AllCustomersList = customers.GetAllCustomers();

As I looped through the delivery drivers, I'd do something like this:

List<Customer> DeliveryCustomers = AllCustomersList.Where(a => a.DeliveryDriverID == DriverID).ToList();

My question: Is the way I'm doing it by querying the List object faster than querying the database each time for customer records associated with the delivery driver?

There is no exact row count above which you should query the DB instead of the in-memory List<T>.

But as a rule of thumb, databases are designed to work with large amounts of data and have optimization mechanisms (such as indexes), while a plain in-memory list has none.

So you will need to benchmark to see whether the round trip to the DB is worth it for your number of rows, and only if the difference actually matters to you.
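A benchmark along those lines could look like the sketch below. It is in Python rather than C#, and it uses an in-memory SQLite database, so it leaves out the network round trip that a real database server adds; the data set (10,000 customers, 50 drivers) and all names are made up for the example. Treat the numbers it prints as specific to your machine.

```python
import sqlite3
import timeit

# Hypothetical data: (customer id, driver id) pairs, 200 customers per driver.
customers = [(cid, cid % 50) for cid in range(10_000)]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, driver_id INTEGER)")
db.executemany("INSERT INTO customer VALUES (?, ?)", customers)
db.execute("CREATE INDEX idx_customer_driver ON customer (driver_id)")
db.commit()


def from_list(driver_id):
    """Filter the list already held in memory (a full scan on every call)."""
    return [c for c in customers if c[1] == driver_id]


def from_db(driver_id):
    """Ask the database each time (an index lookup, plus query overhead)."""
    return db.execute(
        "SELECT id, driver_id FROM customer WHERE driver_id = ? ORDER BY id",
        (driver_id,),
    ).fetchall()


if __name__ == "__main__":
    for name, fn in (("list", from_list), ("db", from_db)):
        seconds = timeit.timeit(lambda: fn(7), number=200)
        print(f"{name}: {seconds:.4f}s for 200 lookups")
```

If the list filter turns out to be the bottleneck, grouping the customers by driver once (for example into a dictionary keyed by driver id) avoids rescanning the list for every driver.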

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil"

Limiting integer data type field lengths

5 votes

I am trying to limit the number of numbers that an integer field can contain. For example, I want the field to contain a number no more than 5 long, so 99999 would be the highest valid entry.

Is this possible to do in MySQL? I have looked at the documentation but haven't found my answer.

Unfortunately, neither CHECK constraints nor user-defined types are implemented in MySQL. Maybe this will change in future versions.

Until then, you can use a trigger to correct the input, if that approach works for you:

delimiter //
CREATE TRIGGER trigger_check BEFORE INSERT ON your_table
FOR EACH ROW 
BEGIN 
    IF NEW.NUM > 99999 THEN 
        SET NEW.NUM = 0; 
    END IF; 
END
//
delimiter ;

Database normalization - who's right?

5 votes

My professor (who claims to have had a firm understanding of systems development for many years) and I are arguing about the design of our database.

As an example: My professor insists this design is right: (list of columns)

Subject_ID Description Units_Lec Units_Lab Total_Units etc...

Notice the total units column. He said that this column must be included. I tried to explain that it is unnecessary, because if you want it, then just make a query by simply adding the two.

I showed him an example I found in a book, but he insists that I shouldn't rely on books too much in building our system. The same thing applies to similar cases such as this one: student_ID prelim_grade midterm_grade prefinal_grade average etc...

He wanted me to include the average! Everywhere I look, I find articles that convince me that this is a violation of normalization. If I needed the average, I could easily compute it from the three grades. He enumerated some scenarios including ('Hey! What if the query has been accidentally deleted? What will you do? That is why you need to include it in your table!')

Do I need to restructure my database (which consists of more than 40 tables) to comply with what he wants? Am I wrong and have I just overlooked these things?

EDIT:

Another thing is that he wanted to include the total amount in the payments table, which I believe is unnecessary (just multiply the unit price of the product by the quantity). He pointed out that we need that column for computing debits and/or credits that are critical for the overall system management, and that it is needed for balancing transactions. Please tell me what you think.

You are absolutely correct! One of the rules of normalization is to eliminate attributes that can easily be derived from other attributes' values, i.e. by performing some calculation. In your case, the total units column can be obtained by simply adding the other two.

Tell your professor that having that particular column introduces a transitive dependency, and according to third normal form (3NF) it's recommended to eliminate those.
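One way to keep the derived value available without storing it is a view, which also keeps the query inside the database rather than in application code. The sketch below uses Python's sqlite3 module as a stand-in for whatever database the project uses; the table and view names are assumptions based on the column list in the question.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE subject (
        subject_id  INTEGER PRIMARY KEY,
        description TEXT,
        units_lec   INTEGER NOT NULL,
        units_lab   INTEGER NOT NULL
    );

    -- The derived value is computed on read, so it cannot drift out of sync
    -- with the columns it is computed from.
    CREATE VIEW subject_with_total AS
        SELECT subject_id, description, units_lec, units_lab,
               units_lec + units_lab AS total_units
        FROM subject;

    INSERT INTO subject VALUES (1, 'Databases', 3, 1);
""")

# Changing one source column needs no second UPDATE for the total.
db.execute("UPDATE subject SET units_lab = 2 WHERE subject_id = 1")
total = db.execute(
    "SELECT total_units FROM subject_with_total WHERE subject_id = 1"
).fetchone()[0]
print(total)  # 5
```

With a stored Total_Units column, the same UPDATE would leave the total stale unless every writer remembered to recompute it.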

Python database WITHOUT using Django (for Heroku)

4 votes

To my surprise, I haven't found this question asked elsewhere. Short version, I'm writing an app that I plan to deploy to the cloud (probably using Heroku), which will do various web scraping and data collection. The reason it'll be in the cloud is so that I can have it be set to run on its own every day and pull the data to its database without my computer being on, as well as so the rest of the team can access the data.

I used to use AWS's SimpleDB and DynamoDB, but I found SDB's storage limitations to be too small and DDB's poor querying ability to be a problem, so I'm looking for a database system (SQL or NoSQL) that can store arbitrary-length values (and ideally arbitrary data structures) and that can be queried on any field.

I've found many database solutions for Heroku, such as ClearDB, but all of the information I've seen has shown how to set up Django to access the database. Since this is intended to be a script and not a site, I'd really prefer not to dive into Django if I don't have to.

Is there any kind of database that I can hook up to in Heroku with Python without using Django?

I'd use MongoDB. Heroku has support for it, so I think it will be really easy to start and scale out: https://addons.heroku.com/mongohq

About Python: MongoDB is a really easy database. The schema is flexible and fits really well with Python dictionaries. That's something really good.

You can use PyMongo

import datetime

from pymongo import MongoClient

# Connects to localhost:27017 by default
# (MongoClient replaces the older Connection class)
client = MongoClient()

# Get your DB
db = client.my_database

# Get your collection
cars = db.cars

# Create some objects
car = {"brand": "Ford",
       "model": "Mustang",
       "date": datetime.datetime.now(datetime.timezone.utc)}

# Insert it
cars.insert_one(car)

Pretty simple, uh?

Hope it helps.

EDIT:

As Endophage mentioned, another good option for interfacing with Mongo is mongoengine. If you have lots of data to store, you should take a look at that.

Secure storage of database credentials

4 votes

I had a major problem recently where my web hosting company messed up and all my PHP files were displayed in plain text. This was a major issue for me for obvious reasons, mainly because the MySQL database details were exposed.

I am now trying to change the way in which my PHP files get the login information for the database so that this will never happen again, even if the hosting company fails me.

My current setup looks like this:

include 'info.php';

class Login {
    var $host;
    var $username;
    var $password;
    var $db;
    var $url;

    // ...
}

Inside info.php are the username, password and so on for the database. I want to make it so that the info.php file can never be viewed and only my .php files are able to access info.php in order to get the login information.

How can I set this up? This is a bit of a tricky one for me to explain, so please don't be harsh and -1 me for a bad description... just ask and I will clear up any gaps in my description.

Simply place info.php outside your webroot. This way, you can include it, but should your web hosting f*#$ up, no one else can view that file, even as plain text.

You would then include it like this:

include('../info.php');

This way, even if someone finds out that you have a file called info.php that stores all your passwords, they cannot point their browser to that file.

The above would be the ideal and most watertight solution. However, if that is not possible due to permissions, the other option would be to place all sensitive files in a directory and block direct access to that directory using a .htaccess file.

In the directory you want to block off access to, place an .htaccess file with the following contents:

deny from all