Best database questions in January 2012

cannot update record, get stuck

12 votes

i have an issue with my current database with mysql.

i have over 100 connection waiting on a select record. when i execute:

show processlist;

the select query is a big query and the others are smaler queries and inserts, updates.

i have one database with 100 tables and the select is using 5 joins.

is there a way to temporary stop the process and let the other processes run, once all the processes are completed, then the select can continue running?

i will recommend to let the query do what it needs to do, if you stop any or kill processses or queries you might have data integrity errors which can lead to major errors.

BookOfZeus and tfb785 are right, first of all you probably have indexes errors. the explain will tell you what exactly is the problem and what to look for. for example if you have 5 joins and you get row counts like, 100,000 and 100 and 1 and 1 and 1 you will multiple 100,000 * 100 which can be super slow.

read carefully what the explain tells you and optimize your query based on it.

innodb can be a good option if you if the tables are accessed very often because its row locking insted of table locking for myisam.

I would say first try to optimize your query, maybe you wont need to alter your table engine to fix the issue. if you still have issues then you might consider moving to innodb.

Would relational databases scale as well (or better) than their NoSQL counterparts if we drop the relationships?

8 votes

Disclaimer: This is a broad question, so it could be moved to a different source (if the admins find it appropriate).

All the cool kids seem to be dropping relational databases in favor of their NoSQL counterparts. Everyone will have their reasons, from scaling issues to simply being on the bleeding edge of tech. And, I am not here to question their motives.

However, what I am interested in is whether any NoSQL transitions ever validated the performance (maintenance) gains over a traditional RDBMS when relationships were dropped. Why would we want to use a RDBMS when the core reason it exists is dropped? A few reasons come to mind

  1. 30+ years of academic and work research in developing these systems
  2. A well-known language in Structured Query Language (SQL).
  3. Stable and mature ORM support across technologies (Hibernate, ActiveRecord)

Clearly, in the modern world where horizontal scaling is important, there is a need to make sure that shards are fault tolerant, updated within the time intervals required by the app, etc. However, those needs shouldn't necessarily be the responsibility of a system that stores data (case in point: ZooKeeper).

Also, I acknowledge that research should be dedicated to NoSQL and that time spent in this arena will clearly lead to better more internet worthy technologies. However, a comparison of sorts between NoSQL and traditional RDBMS offerings (minus relationships) would be useful in making business decisions.

UPDATE 1: When I refer to NoSQL databases, I am talking about data stores that may not require fixed table schemas and usually avoid join operations. Hence, the emphasis in the question on dropping the relationships in a traditional SQL RDBMS

I don't find that inter-table relationships are the main limiter for scalability. I use queries with joins regularly and get good scalability if indexes are defined well.

The greater limiter for scalability is the cost of synchronous I/O. The requirements of consistency and durability -- that the DBMS actually and reliably saves data when it tells you it saved data -- is expensive.

Several NoSQL products that are currently in vogue achieve great performance by weakening their consistency and durability guarantees in their default configuration. There are many reports of CouchDB or MongoDB losing data.

There are ways you can configure those NoSQL products to be more strict about durability, but then you sacrifice their impressive performance numbers.

Likewise, you can make an SQL database achieve high performance like the NoSQL products, by disabling the default features that ensure data safety. See RunningWithScissorsDB.

PS: If you think document-oriented databases are "cutting edge", I invite you to read about MUMPS. Everything old is new again. :-)

What databases for Mac are directly supported by the FireMonkey framework in Delphi XE2?

8 votes

Confronted with the task to develop a database application for a company working in a pure Mac environment, what are my options for a central database server to be used by a Delphi FMX application (except the obvious setting up a Windows or Linux box)?

From http://www.embarcadero.com/products/firemonkey/more-connected:

FireMonkey includes built-in component based access to MS SQL Server, Oracle, IBM DB2, MySQL, Sybase, Informix, InterBase and Firebird databases.

Oracle, MySQL and Firebird are available for Mac OS X - I did not check the others

Should I use a text file or Database?

6 votes

So I'm putting together an RSS parser which will process an RSS feed, filter it, and then download the matched items. Assume that the files being downloaded are legal torrent files.

Now I need to keep a record of the files that I have already downloaded, so they aren't done again.

I've already got it working with SQLite (create database if not exists, insert row if a select statement returns nothing), but the resulting jar file is 2.5MB+ (due to the sqlite libs).

I'm thinking that if I use a text file, I could cut down the jar file to a few hundred kilobytes.

I could keep a list of the names of files downloaded - one per line - and reading the whole file into memory, search if a file exists, etc.

The few questions that occur to me know:

  • Say if 10 files are downloaded a day, would the text file method end up taking too much resources?
  • Overall which one is faster

Anyway, what do you guys think? I could use some advice here, as I'm still new to programming and doing this as a hobby thing :)

If you need to keep track only of few informations (like name of the file), you can for sure use a simple text file.

Using a BufferedReader to read you should achieve good performance.

Oracle 11g DB returning Streams instead of Strings

6 votes

I've got a new database here and it's an upgraded version from Oracle 10g to Oracle 11g - the main problem is with LOB columns and everytime any function returns a LOB as result, the new database won't return strings like the old one did:

Old DB:

["C"]=>
string(23) "3874163,3874197,3874201"

New DB:

["C"]=>
resource(182) of type (stream)

Now when reading the streams sometimes there is an error of a non-existing stream resource beeing referenced and everything fails. I'm guessing the connection closed in the meantime without the stream beeing read and therefore the access is lost.

When changing the statements to include a casting against varchar for example:

CONVERT(VARCHAR, C, 120)

Or like this:

SELECT TO_CHAR(FUNC())

The value is returned as a string again but this is not really an optimal solution as every statement would need to be changed.

Is there any way/option to prevent LOBs from beeing delivered as streams so they are instead delivered as strings like in Oracle 10g?

Edit:
We are using the oci function-set for db access.

Not really an answer as such but a few items that I hope helps.

It looks like that there is a small difference in the way that LOBs are returned between 10g and 11g, under 11g there is some notes about the conversion from btyes to byteStreams when LOBs are over a certain value, in the JDBC reference manual (I understand that this doesnt effect OCI calls as they use a different driver set).

From what I can see in terms of the OCI8 functions within php the default operation of the fetch functions is that the LOBs are returns as a reference and need to be accessed using the ->read() ->load() etc functions (see http://au.php.net/oci_fetch_array - regarding the mode and the default).

Now I dont know if you are using the OCI functions to access your oracle system as it's not specified in your question.

Couple of other items that would help in figuring this out would be if you could let us know if you recompiled php or updated the oracle drivers with the newer client version at all.

I know its not a full solution but if you are using oci_fetch_* to return the row, add a second argument to the call of OCI_RETURN_LOBS, this will cause the fetch to return a string of the LOB field instead of a reference to a stream, or use the $variable["C"]->load() to access this LOB this will cause it to load the full stream and act like a normal string.

Hope this helps.

Is there any 'web-based' sql test environment?

6 votes

In HTML+CSS+JS world, http://jsfiddle.net/ is very helpful tool for asking / making example about web development. And I also saw several browser(javascript)-based programming language compilers and REPLs. But I can't find online / web-based test environment for database operations( especially for RDBMS ).

Is there any open/free database service with web-based interfaces for testing queries?


Added: This tool will be good for this situation; If I'm troubling with complex queries, then create a sample table via web interface and ask it on stackoverflow with the 'sample table URL'. Anyone can access to the URL and test their queries on web site. (Yes, queries are running on 'real' database system) And also the query results can be tracked, then we can even make 'ranking' for it :)

There are free "disposable" database servers like db4free and FreeSQL (offline?) and even MonoQL.

As far as the web-based interfaces and short URLs go, I don't think you'll have much luck.

To manage your data you have to stick to what is provided (usually phpMyAdmin or similar) and there is no short-URL to query mapping. One other caveat of such system is that (without the appropriate user permissions) one user could easily destroy all your test data -- and remember that (relational) database versioning is much more expensive than plain text versioning, so that's pretty much out of the question.

For non-RDBMS, I can think of try.mongodb.org -- but it suffers from the same problems.

Almost forgot, the Stack Exchange Data Explorer, lets you practice T-SQL queries (with permalinks).


PS: As a personal side-note, I think it's a cool idea and I would love to see something like that implemented, perhaps even mashed-up with SchemaBank or similar - that would be just awesome.

MS Access: WHERE-EXISTS-clause not working on views?

6 votes

Prerequisites: In MS Access 2010 create the following tables:

CREATE TABLE ATBL(ID INT);
INSERT INTO ATBL(ID) VALUES (1);
INSERT INTO ATBL(ID) VALUES (2);
INSERT INTO ATBL(ID) VALUES (3);

CREATE TABLE BTBL(ID INT);
INSERT INTO BTBL(ID) VALUES (1);
INSERT INTO BTBL(ID) VALUES (2);

Also create a view called BVIEW which uses the following SELECT statement:

SELECT A.ID FROM ATBL AS A WHERE A.ID = 1 OR A.ID = 2

Now BVIEW should have the same contents as BTBL. Nevertheless the following two queries will return different results:

SELECT A.ID FROM ATBL AS A WHERE EXISTS (SELECT 1 FROM  BTBL AS B WHERE B.ID=A.ID)
SELECT A.ID FROM ATBL AS A WHERE EXISTS (SELECT 1 FROM BVIEW AS B WHERE B.ID=A.ID)

The first query returns two records (1 and 2), but the second query returns all records from ATBL. What's wrong here? Am I missing something?

A view is really a saved SQL SELECT statement. At least, this is what a saved view in MS Access is. And you use the same inner variables A and B. IMHO, they are getting mixed. The last line really looks as

SELECT A.ID FROM ATBL AS A WHERE EXISTS (SELECT 1 FROM (SELECT A.ID FROM ATBL AS A WHERE A.ID = 1 OR A.ID = 2) AS B WHERE B.ID=A.ID)

Try to change some inner names, for example:

SELECT AA.ID FROM ATBL AS AA WHERE AA.ID = 1 OR AA.ID = 2

So, the last line will look as

SELECT A.ID FROM ATBL AS A WHERE EXISTS (SELECT 1 FROM (SELECT AA.ID FROM ATBL AS AA WHERE AA.ID = 1 OR AA.ID = 2) AS B WHERE B.ID=A.ID)

So, as we see here, MS Access even doesn't know how to isolate aliases!

Fastest and most efficient way to pre-populate database in Android

5 votes

If you want to pre-populate a database (SQLite) in Android, this is not that easy as one might think.

So I found this tutorial which is often referenced here on Stack Overflow as well.

But I don't really like that way of pre-populating the database since you take the control from the database handler and create the files yourself. I would prefer to not touch the file system and let the database handler do everything on its own.

So what I thought one could do is create the database in the database handler's onCreate() as usual but then load a file (.sql) from /assets which contains the statements to fill in the values:

INSERT INTO testTable (name, pet) VALUES ('Mike', 'Tiger');
INSERT INTO testTable (name, pet) VALUES ('Tom', 'Cat');
...

But calling execSQL() in the handler's onCreate() doesn't really work. It seems that the /assets file must not have more than 1MB and the execSQL() only executes the first statement (Mike - Tiger).

What would you do do pre-populate the database?

I suggest the following:

  1. Wrap all of your INSERT logic into a transaction (BEGIN... COMMIT, or via the beginTransaction()... endTransaction() APIs)
  2. As already suggested, utilize the bind APIs and recycle objects.
  3. Don't create any indexes until after this bulk insert is complete.

Additionally take a look at Faster bulk inserts in sqlite3?

In redis, how do i remove keys?

5 votes

I want to remove keys that match "user*".

how do I do that in redis command line?

This is not a feature right now to be able to do in one shot (see the comments in the DEL documentation). Unfortunately, you are only left with using KEYS, looping through the results, and then using DEL to remove each one.

How about using bash a bit to help?

for key in `echo 'KEYS user*' | redis-cli | awk '{print $1}'`
 do echo DEL $key
done | redis-cli

To step through it:

  1. echo 'KEYS user*' | redis-cli | awk '{print $1}' -- get all the keys and strip out the extra text you don't want with awk.
  2. echo DEL $key -- for each one, create an echo statement to remove it.
  3. | redis-cli -- take the DEL statements and pass them back into the cli.

Not suggesting this is the best approach (you might have some issues if some of your usernames have spaces in them, but hopefully you get the point).

MySQL vs Firebird vs SQLite

5 votes

We will implement a company based ( not open source ) software and I really cannot choose which one I am going to work with. I can see the differences between all of them here but I need more specific information, like is really MySql not free for not open source projects (based on this blog) or how about their comparison in deployment?

EDIT : I need to expand my question. We are just implementing an API to our existing project. This will be used as a standalone program but it is an embedded system project so the deployement is really really important. I know that SQLite is really easy to deploy but I can see there are a few disadvantages. Can you please tell me which one can be better used according to this situation ?

Thank you all

I can't answer for all of them, but SQLite is entirely free to use in any application at all. It's explicitly been put in the Public Domain; you can get a license if you insist, but you're just getting what you have total right to anyway. For deployment, SQLite is certainly deployed in many places (inside many browsers, many smartphones, etc.) so it's definitely production-quality, but you do need to be in a position to be able to work with a native code library (SQLite's written in C, and compiles to a library); as it happens, that rules it out of my current projects…

Database design - system default items and custom user items

5 votes

This question applies to any database table design, where you would have system default items and custom user defaults of the same type (ie user can add his own custom items/settings).

Here is an example of invoicing and paymenttypes, By default an invoice can have payment terms of DueOnReceipt, NET10, NET15, NET30 (this is the default for all users!) therefore you would have two tables "INVOICE" and "PAYMENT_TERM"

INVOICE
Id
...
PaymentTermId



PAYMENT_TERM (System default)
Id
Name

Now what is the best way to allow a user to store their own custom "PaymentTerms" and why? (ie user can use system default payment terms OR user's own custom payment terms that he created/added)

Option 1) Add UserId to PaymentTerm, set userid for the user that has added the custom item and system default userid set to null.

INVOICE
Id
...
PaymentTermId



PaymentTerm
Id
Name
UserId (System Default, UserId=null)

Option 2) Add a flag to Invoice "IsPaymentTermCustom" and Create a custom table "PAYMENT_TERM_CUSTOM"

INVOICE
Id
...
PaymentTermId
PaymentTermCustomId
IsPaymentTermCustom (True for custom, otherwise false for system default)

PaymentTerm
Id
Name

PAYMENT_TERM_CUSTOM
Id
Name
UserId

Now check via SQL query if the user is using a custom payment term or not, if IsPaymentTermCustom=True, it means the user is using custom payment term otherwise its false.

Option 3) ????

...

As a general rule:

  • Prefer adding columns to adding tables
  • Prefer adding rows to adding columns

Generally speaking, the considerations are:

Effects of adding a table

  • Requires the most changes to the app: You're supporting a new kind of "thing"
  • Requires more complicated SQL: You'll have to join to it somehow
  • May require changes to other tables to add a foreign key column referencing the new table
  • Impacts performance because more I/O is needed to join to and read from the new table

Note that I am not saying "never add tables". Just know the costs.

Effects of adding a column

  • Can be expensive to add a column if the table is large (can take hours for the ALTER TABLE ADD COLUMN to complete and during this time the table wil be locked, effectively bringing your site "down"), but this is a one-time thing
  • The cost to the project is low: Easy to code/maintain
  • Usually requires minimal changes to the app - it's a new aspect of a thing, rather than a new thing
  • Will perform with negligible performance difference. Will not be measurably worse, but may be a lot faster depending on the situation (if having the new column avoids joining or expensive calculations).

Effects of adding rows

  • Zero: If your data model can handle your new business idea by just adding more rows, that's the best option

(Pedants kindly refrain from making comments such as "there is no such thing as 'zero' impact", or "but there will still be more disk used for more rows" etc - I'm talking about material impact to the DB/project/code)


To answer the question: Option 1 is best (i.e. add a column to the payment option table).
The reasoning is based on the guidelines above and this situation is a good fit for those guidelines.

Further,
I would also store "standard" payment options in the same table, but with a NULL userid; that way you only have to add new payment options when you really have one, rather than for every customer even if they use a standard one.

It also means your invoice table does not need changing, which is a good thing - it means minimal impact to that part of your app.