Best sql questions in February 2011

How to design a database for User Defined Fields?

15 votes

My requirements are:

  • Need to be able to dynamically add User-Defined fields of any data type
  • Need to be able to query UDF fields quickly
  • Need to be able to do calculations on UDF fields based on datatype
  • Need to be able to sort UDF fields based on datatype

Other Information:

  • I'm looking for performance primarily
  • There are a few million Master records which can have UDF data attached
  • When I last checked, there were over 50mil UDF records in our current database
  • Most of the time, a UDF field is only attached to a few thousand of the Master records, not all of them
  • UDF fields are not joined or used as keys. They're just data used for queries or reports

Options:

  1. Create a big table with StringValue1, StringValue2... IntValue1, IntValue2,... etc. I hate this idea, but will consider it if someone can tell me it is better than other ideas and why.

  2. Create a dynamic table which adds a new column on demand as needed. I also don't like this idea since I feel performance would be slow unless you indexed every column.

  3. Create a single table containing UDFFieldName, UDFDataType, and Value. When a new UDFField gets added, generate a View which pulls just that data and parses it into whatever type is specified. Items which don't meet the parsing criteria return NULL.

  4. Create multiple UDF tables, one per data type. So we'd have tables for UDFStrings, UDFDates, etc. Probably would do the same as #3 and auto-generate a View anytime a new field gets added

  5. XML DataTypes? I haven't worked with these before but have seen them mentioned. Not sure if they'd give me the results I want, especially with performance.

  6. Something else?

If performance is the primary concern, I would go with #6... a table per UDF (really, this is a variant of #2). This answer is specifically tailored to this situation and to the data distribution and access patterns described.

Pros:

  1. Because you indicate that some UDFs have values for a small portion of the overall data set, a separate table would give you the best performance because that table will be only as large as it needs to be to support the UDF. The same holds true for the related indices.

  2. You also get a speed boost by limiting the amount of data that has to be processed for aggregations or other transformations. Splitting the data out into multiple tables lets you perform some of the aggregating and other statistical analysis on the UDF data, then join that result to the master table via foreign key to get the non-aggregated attributes.

  3. You can use table/column names that reflect what the data actually is.

  4. You have complete control to use data types, check constraints, default values, etc. to define the data domains. Don't underestimate the performance hit resulting from on-the-fly data type conversion. Such constraints also help RDBMS query optimizers develop more effective plans.

  5. Should you ever need to use foreign keys, built-in declarative referential integrity is rarely out-performed by trigger-based or application level constraint enforcement.

Cons:

  1. This could create a lot of tables. Enforcing schema separation and/or a naming convention would alleviate this.

  2. There is more application code needed to operate the UDF definition and management. I expect this is still less code needed than for the original options 1, 3, & 4.

Other Considerations:

  1. If there is anything about the nature of the data that would make sense for the UDFs to be grouped, that should be encouraged. That way, those data elements can be combined into a single table. For example, let's say you have UDFs for color, size, and cost. The tendency in the data is that most instances of this data look like

     'red', 'large', 45.03 
    

    rather than

     NULL, 'medium', NULL
    

    In such a case, you won't incur a noticeable speed penalty by combining the 3 columns in 1 table because few values would be NULL and you avoid making 2 more tables, which is 2 fewer joins needed when you need to access all 3 columns.

  2. If you hit a performance wall from a UDF that is heavily populated and frequently used, then that should be considered for inclusion in the master table.

  3. Logical table design can take you to a certain point, but when the record counts get truly massive, you also should start looking at what table partitioning options are provided by your RDBMS of choice.
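
The table-per-UDF layout can be sketched roughly as follows. This is a minimal illustration using Python's built-in sqlite3 module rather than the asker's RDBMS; the table name udf_warranty_months and the sample data are hypothetical and not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- One narrow table per UDF (hypothetical field: warranty length in months).
    -- It holds rows only for master records that actually have a value,
    -- and the column is typed and constrained for this field alone.
    CREATE TABLE udf_warranty_months (
        master_id INTEGER PRIMARY KEY REFERENCES master(id),
        value     INTEGER NOT NULL CHECK (value >= 0)
    );

    INSERT INTO master VALUES (1, 'widget'), (2, 'gadget'), (3, 'gizmo');
    INSERT INTO udf_warranty_months VALUES (1, 12), (3, 36);
""")

# Aggregate inside the small UDF table alone ...
avg_months = conn.execute(
    "SELECT AVG(value) FROM udf_warranty_months"
).fetchone()[0]

# ... and join back to master only when master attributes are needed.
rows = conn.execute("""
    SELECT m.name, u.value
    FROM udf_warranty_months AS u
    JOIN master AS m ON m.id = u.master_id
    ORDER BY u.value DESC
""").fetchall()
```

Because the UDF table contains only the records that carry the field, both the aggregate and the join touch two rows here instead of the whole master table.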

How to rotate a table 45 degrees and save the result into another table?

14 votes

Hi,

I have a table.

---------
| a | b |
---------
| a | b |
---------

I want to rotate it 45 degrees(clockwise or anti-clockwise) and save it into another table. For example, if I rotate it 45 degrees anti-clockwise, it will be:

-------------
| b |   |   |
-------------
| a | b |   |
-------------
| a |   |   |
-------------

Another example, when I rotate

-------------
| a | b | c |
-------------
| d | e | f |
-------------
| g | h | i |
-------------

It will change to

---------------------
| c |   |   |   |   |
---------------------
| b | f |   |   |   |
---------------------
| a | e | i |   |   |
---------------------
| d | h |   |   |   |
---------------------
| g |   |   |   |   |
---------------------

How to do this in SQL?

A fully working example (for SQL Server 2005+)
If you need it for another system, there are equivalents for the pieces of the puzzle below

  • row_number()
  • dense_rank()
  • un/pivot

You can find the equivalents from other Stackoverflow questions. For example, the first two are well supported by Oracle and DB2.

create table t45 (id int identity, colA char(1), colX char(1), colZ char(1))
insert t45 select 'a','b','c'
insert t45 select 'd','e','f'
insert t45 select 'g','h','i'
GO

select [1],[2],[3],[4],[5] -- for N columns, this goes to N*2-1
from
(
    select value,
        targetRow = row+col-1,
        targetCol = ROW_NUMBER() over (partition by row+col-1 order by row)
    from
    (
        select *,
            row = DENSE_RANK() over (order by id),
            col = ROW_NUMBER() over (partition by id order by
                CASE source when 'colA' then 3 -- number in reverse
                            when 'colX' then 2
                            when 'colZ' then 1 end)
        from t45
        unpivot (value for source in (colA,colX,colZ)) upv
    ) x
) p                                -- for N columns, this goes to N*2-1
pivot (max(value) for targetCol in ([1],[2],[3],[4],[5])) pv
order by targetRow

If you need to arbitrarily apply it to any table - use dynamic SQL to generate the pattern shown above.
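
To see why targetRow = row + col - 1 works, the same index arithmetic can be written outside SQL. The sketch below is plain Python and only mirrors the logic of the query above; the function name rotate45 is made up for illustration.

```python
def rotate45(grid):
    """Rotate a rectangular grid 45 degrees anti-clockwise.

    Mirrors the SQL above: columns are numbered in reverse, every cell
    lands on output row (row + reversed_col - 1), and cells that share
    an output row keep their original top-to-bottom order.
    """
    n_cols = len(grid[0])
    diagonals = {}
    for row_no, row in enumerate(grid, start=1):
        for col_no, value in enumerate(row, start=1):
            reversed_col = n_cols - col_no + 1
            target_row = row_no + reversed_col - 1
            diagonals.setdefault(target_row, []).append(value)
    return [diagonals[key] for key in sorted(diagonals)]


rotated = rotate45([["a", "b", "c"],
                    ["d", "e", "f"],
                    ["g", "h", "i"]])
# rotated == [['c'], ['b', 'f'], ['a', 'e', 'i'], ['d', 'h'], ['g']]
```

Each inner list is one output row, left-aligned as in the question's expected result.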

How does ClojureQL compare to clojure.contrib.sql?

11 votes

It looks like each one covers the basic cases like selecting certain columns and filtering by predicate pretty well, but I'm wondering how each compares for more advanced cases. Is it easier to express complex queries in one vis-à-vis the other? Is one library missing any functionality that the other covers?

ClojureQL and clojure.contrib.sql are two quite different libraries. The first aims to implement the primitives from relational algebra and compile those to SQL92. It also offers an extensible compiler that can be adapted to a database-specific SQL dialect. The second is a lightweight set of helpers for using JDBC from Clojure code.

Querying

clojure.contrib.sql

With clojure.contrib.sql, you'll have to use SQL to write your queries. Here's an example:

(sql/with-connection db
  (sql/with-query-results rs ["select * from customer"]
    (doseq [r rs] (println (:lastname r)))))

ClojureQL

As ClojureQL is mostly a query language, it provides a rich Clojure-based DSL to create SQL queries. I'll skip advanced examples and only show you the ClojureQL equivalent to the above query:

(sql/with-connection db
  (cql/with-results [rs (cql/table :customer)]
    (doseq [r rs] (println (:lastname r)))))

You can express queries of arbitrary complexity with both, but contrib.sql requires you to write SQL code. Note that the main advantage of the ClojureQL DSL over standard SQL is composability. Its table function returns an RTable object representing a query on the specified table; you can chain other ClojureQL functions over that object to create the query that you need, then dereference it to execute it. Refer to the ClojureQL examples page and documentation for more information on how to create more complex queries.

Inserting, Updating and Deleting

clojure.contrib.sql

clojure.contrib.sql provides a comprehensive set of functions to insert, update and delete rows.

  • Inserting:
    • (insert-records table & records), where records are maps
    • (insert-rows table & rows), where rows are vectors
    • (insert-values table column-names & value-groups)
  • Updating: (update-values table where-params record)
  • Inserting or Updating: (update-or-insert-values table where-params record)
  • Deleting: (delete-rows table where-params)

ClojureQL

ClojureQL provides three RTable methods to manipulate the specified table data:

  • conj! which is a shortcut to contrib.sql's insert-records
  • disj! which is a shortcut to contrib.sql's delete-rows
  • update-in! which is similar to contrib.sql's update-or-insert-values

These have the advantage of using ClojureQL predicates syntax, but for now this part of ClojureQL is not generating database agnostic SQL as it's separated from the compiler. I intend to fix that by merging code from another library I've written in the more-or-less near future.

Schema Manipulation

clojure.contrib.sql

clojure.contrib.sql only provides create-table and drop-table for creating and removing tables. Note that these are very simple functions that won't make your code portable. To alter a table you'll need to send SQL ALTER statements using the do-commands function.

ClojureQL

No schema manipulation helpers provided.

Lobos (shameless plug ;-)

This is a library I wrote to plug the hole left by these two libraries. It's a work in progress, but you already get a Clojure DSL to send any DDL statements in a database agnostic way.

Here's a basic example for creating a table:

(create (table :users (integer :id :unique)))

And altering it:

(alter :add (table :users (text :name)))

You can get more information on this library by visiting the website or the github page. It aims to provide higher-level functionality like migrations and declarative schema manipulation.

Others

clojure.contrib.sql has a couple extra lower-level helpers, see the complete documentation

There's more to say about how these libraries handle database connections but I'll leave that for another day!

P.S.: Note that both ClojureQL and Lobos are relatively young libraries that still need some work. Both descend from the original ClojureQL project, which was a DSL covering the whole SQL language. ClojureQL already has a stable API, but only provides a SQL92-compatible compiler. Lobos has compiler support for multiple databases, but is still in active development and its API can still change.

Update: I've made some changes after a suggestion from Lau. ClojureQL itself doesn't aim to be database-agnostic, but provides the means for users to replace the compiler with a database-specific one. Note that the DML part of SQL is much more standardized than the DDL part.

When should I use SQL Azure and when should I use Table Storage?

8 votes

When should I use SQL Azure and when should I use Table Storage? I was thinking: use Table Storage for transaction processing scenarios (e.g. a debit/credit accounts kind of scenario) and use SQL Azure when the data will not be used for transactional purposes (e.g. reporting). What do you think?

This is an excellent question, and one of the tougher and harder-to-reverse decisions that solution architects have to make when designing for Azure.

There are multiple dimensions to consider. On the negative side, SQL Azure is relatively expensive per gigabyte of storage, does not scale super well, and is limited to 50 GB per database (although that limit will increase). However, and this is very important, there are no transaction fees against SQL Azure, and your developers already know how to code against it.

ATS is a different animal altogether. Capable of megascalability, it is dirt cheap to store data in, but gets expensive to access frequently. It also requires a significant amount of CPU power from your nodes to manipulate. It basically forces your compute nodes to become mini database servers, as all relational activity is delegated to them.

So, in my opinion, frequently accessed data that does not need huge scalability and is not super large in size should be destined for SQL Azure, otherwise Azure Table Services.

Your specific example, transactional data from financial transactions, is a perfect fit for ATS, while meta information (account profiles, names, addresses, etc.) is perfect for SQL Azure.

Hth

Why is SQL Server '=' comparator case insensitive?

8 votes

I just realized that SQL server '=' comparator when used for text comparison is case insensitive. I have a few questions regarding this functionality:

  1. Is this the same for all databases or specific to SQL server?
  2. I have been using the lower function to ensure the text comparison is insensitive till now. Is it still a good idea to follow the same?
  3. How can we do case sensitive comparisons in SQL server?
  4. Why is '=' operator defaulting to case insensitive comparison?

No, case sensitivity has nothing to do with the equals sign.

Case sensitivity is determined by the collation for the database -- see the documentation for details.
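
A quick way to see that the collation, not the operator, decides the outcome is to run the same comparison under two collations. The sketch below uses SQLite through Python's sqlite3 module as a stand-in, so the collation names (BINARY, NOCASE) are SQLite's, not SQL Server's. In SQL Server the analogous tool is a COLLATE clause on the column or expression, for example a case-sensitive collation such as Latin1_General_CS_AS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite's default collation (BINARY) compares bytes, so case matters.
binary_equal = conn.execute("SELECT 'abc' = 'ABC'").fetchone()[0]

# Same '=' operator, different collation: now the strings compare equal.
nocase_equal = conn.execute(
    "SELECT 'abc' = 'ABC' COLLATE NOCASE"
).fetchone()[0]
```

Here binary_equal is 0 and nocase_equal is 1, even though both queries use the same equals sign.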

SQL Server: How can I select everything from a table with a prefix?

8 votes

I have the following code in a very long stored procedure, where P equals the Products table:

SELECT
P.*,
etc1,
etc2

Which would give me "ProductID" and so on.

I would like to select it with a prefix such as:

SELECT
P.* AS P_*,
etc1,
etc2

Which would give me "P_ProductID" and so on.

Is this possible to do?

Not unless you use dynamic SQL. It is very uncommon to require such a thing, though. Are you sure you need it?

Working example

create table Products (ProductID int, Price money, Description varchar(10));
insert Products select 1, 12.3, 'apples'
insert Products select 2, 2.4, 'bananas'
create table OrderDetails (OrderID int, ProductID int, Qty int)
insert into OrderDetails select 11,1, 2
insert into OrderDetails select 11,2, 4

declare @sql nvarchar(max)
select @sql = coalesce(@sql+',','') +
    'P.' + QuoteName(Column_name) + ' as ' + QuoteName('P_' + Column_name)
from INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = 'Products'
order by ORDINAL_POSITION
set @sql = '
select ' + @sql + ', O.OrderID, O.Qty
from Products P
inner join OrderDetails O on P.ProductID = O.ProductID
'
--print @sql   :: uncomment if you need to see it
exec (@sql)

Output:

P_ProductID P_Price               P_Description OrderID     Qty
----------- --------------------- ------------- ----------- -----------
1           12.30                 apples        11          2
2           2.40                  bananas       11          4
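
The same idea (read the column list from the catalog, then generate the aliased SELECT list) can also be done in application code. The following is a rough sketch against SQLite via Python's sqlite3 module, not SQL Server; the helper prefixed_columns is hypothetical, and PRAGMA table_info plays the role that INFORMATION_SCHEMA.COLUMNS plays above.

```python
import sqlite3


def prefixed_columns(conn, table, alias, prefix):
    """Build 'alias."col" AS "prefixcol", ...' for every column of table."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    cols = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    return ", ".join(f'{alias}."{c}" AS "{prefix}{c}"' for c in cols)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductID INTEGER, Price REAL, Description TEXT);
    CREATE TABLE OrderDetails (OrderID INTEGER, ProductID INTEGER, Qty INTEGER);
    INSERT INTO Products VALUES (1, 12.3, 'apples'), (2, 2.4, 'bananas');
    INSERT INTO OrderDetails VALUES (11, 1, 2), (11, 2, 4);
""")

sql = (
    "SELECT " + prefixed_columns(conn, "Products", "P", "P_")
    + ", O.OrderID AS OrderID, O.Qty AS Qty "
    "FROM Products AS P "
    "JOIN OrderDetails AS O ON P.ProductID = O.ProductID "
    "ORDER BY P.ProductID"
)
cursor = conn.execute(sql)
column_names = [d[0] for d in cursor.description]
rows = cursor.fetchall()
```

As with any generated SQL, the table name and prefix should come from trusted code, never from user input.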

SQL Query for Grouping the results based on sequence

8 votes

I have a table like this:

ID  Seq  Amt
1   1    500
1   2    500
1   3    500
1   5    500
2   10   600
2   11   600
3   1    700
3   3    700

I want to group the continuous sequence numbers into a single row like this:

ID  Start  End  TotalAmt
1   1      3    1500
1   5      5    500
2   10     11   1200
3   1      1    700
3   3      3    700

Please help to achieve this result.

;WITH numbered AS (
  SELECT
    ID, Seq, Amt,
    SeqGroup = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Seq) - Seq
  FROM atable
)
SELECT
  ID,
  Start = MIN(Seq),
  [End] = MAX(Seq),
  TotalAmt = SUM(Amt)
FROM numbered
GROUP BY ID, SeqGroup
ORDER BY 1, 2
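
The query relies on one observation: within a run of consecutive Seq values, the row number and Seq grow in lockstep, so their difference stays constant and can serve as a group key. A small plain-Python sketch of the same trick, using the sample rows from the question (the function name group_runs is made up for illustration):

```python
from itertools import groupby

rows = [  # (ID, Seq, Amt) from the question
    (1, 1, 500), (1, 2, 500), (1, 3, 500), (1, 5, 500),
    (2, 10, 600), (2, 11, 600),
    (3, 1, 700), (3, 3, 700),
]


def group_runs(rows):
    """Return (ID, Start, End, TotalAmt) for each run of consecutive Seq."""
    result = []
    for id_, id_rows in groupby(sorted(rows), key=lambda r: r[0]):
        # enumerate() plays the role of ROW_NUMBER() within each ID.
        numbered = enumerate(id_rows, start=1)
        # row_number - Seq is constant inside a run and changes at each gap.
        for _, run in groupby(numbered, key=lambda p: p[0] - p[1][1]):
            run_rows = [row for _, row in run]
            result.append((id_, run_rows[0][1], run_rows[-1][1],
                           sum(row[2] for row in run_rows)))
    return result


grouped = group_runs(rows)
```

For the sample data this yields the five rows shown in the question's desired output.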

Is There ANY Sense in SQL Data Type VARCHAR(1)?

7 votes

I've bumped into a lot of VARCHAR(1) fields in a database I've recently had to work with. I rolled my eyes: obviously the designer didn't have a clue. But maybe I'm the one who needs to learn something. Is there any conceivable reason to use a VARCHAR(1) data type rather than CHAR(1)? I would think that the RDBMS would convert the one to the other automatically.

The database is MS SQL 2K5, but evolved from Access back in the day.

Yes there is sense to it.

  • Easier for it to be definable in the language. It is consistent and easier to define varchar to allow 1-8000 than to say it needs to be 2+ or 3+ to 8000.

  • The VARying CHARacter aspect of VARCHAR(1) is exactly that. It may not be optimal for storage but conveys a specific meaning, that the data is either 1 char (classroom code) or blank (outside activity) instead of NULL (unknown/not-yet-classified).

Storage plays very little part in this - looking at a database schema for CHAR(1), you would almost expect that it must always have a 1 char value, such as credit cards must have 16 digits. That is simply not the case with some data where it can be one or optionally none.

There are also differences to using VARCHAR(1) vs CHAR(1)+NULL combination for those who say tri-state [ 1-char | 0-char | NULL ] is completely useless. It allows for SQL statements like:

select activity + '-' + classroom
from  ...

which would otherwise be more difficult if you use char(1)+NULL, which can convey the same information but has subtle differences.
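
The difference between a blank and a NULL shows up as soon as the value is concatenated. The sketch below uses SQLite through Python's sqlite3 module as a stand-in, so the operator is || rather than SQL Server's +; in SQL Server, + behaves the same way under the default CONCAT_NULL_YIELDS_NULL setting. The literal 'A' stands for a hypothetical activity code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Blank classroom (the VARCHAR(1) zero-length case): the result survives.
with_blank = conn.execute("SELECT 'A' || '-' || ''").fetchone()[0]

# NULL classroom: the whole concatenated expression becomes NULL.
with_null = conn.execute("SELECT 'A' || '-' || NULL").fetchone()[0]
```

with_blank is the string 'A-', while with_null comes back as None, which is why the NULL variant needs extra handling such as COALESCE.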

Statistical query in SQL - is this possible with NHibernate LINQ?

7 votes

Hi,

I have an application that uses a few data warehousing principles such as dimensional modeling to do reporting on a fairly simple database.

An example (simplified) entity named Call looks like this:

    public virtual long Id { get; set; }
    public virtual string OriginatorNumber { get; set; }
    public virtual string DestinationNumber { get; set; }
    public virtual DateDimension DateDimension { get; set; }

A few of the properties of the real model have been removed as they are irrelevant. The simplified DateDimension looks like this:

    public virtual long Id { get; set; }
    public virtual DateTime Date { get; set; }
    public virtual int DayOfMonth { get; set; }
    public virtual int Weekday { get; set; }

There are a LOT more columns like this - they are prepopulated for the current decade by application setup. So each date in the entire decade has a row in this table, and each Call has a link to the date that it occurred. This is all mapped in Fluent NHibernate and working fine.

If I want to do some reporting, I can do this easily with the improved NHibernate LINQ provider in 3.0. We would like to use LINQ for the improved maintainability it gives us, but if we really MUST, we'll consider HQL, ICriteria or even plain SQL.

So say I want to build a report that shows the number of calls from a certain number, divided by the day of the week they occur. I can do that easily this way:

        var query = Calls
            .Where(c => c.OriginatorNumber == "402")
            .GroupBy(c => c.DateDimension.Weekday)
            .Select(g => new { Day = g.Key, Calls = g.Count() } );

In this example, "Calls" is basically an IQueryable returned from NHibernate's LINQ provider (Query) through a repository interface. The query above gives me the correct results, NHibernate Profiler shows me that the SQL is pretty optimal, all is well.

However, if I want to do something slightly more advanced, I get stuck. Say I want the average number of calls per weekday. Not too far from the above, right? I just need to figure out the number of unique dates each weekday has in the result set, divide the total number of calls by it, and we're all set - right? Well, no, this is where I start to hit the limitations of the NHibernate LINQ provider. With LINQ to objects I could construct a query to do it - something along the lines of

.Select(g => g.Count() / g.GroupBy(c => c.DateDimension.Date).Count());

However, this does not convert into the correct query when using it in NHibernate. Rather, it turns both .Count() calls in the above to the same count(*) of call records, so the result is always 1.

I COULD of course just query for each call, weekday and date as a new anonymous object, then do the math on the application side, but according to conventional wisdom, That's Just Wrong (tm). I could end up doing it in desperation, though, even though it means pain when the table grows to a million++ calls.

The below is an SQL query that gives me the result I am looking for.

select ss.Weekday, AVG(cast(ss.Count as decimal))
from
(
select dd.Weekday, dd.Date, COUNT(*) as Count
from Call c
left outer join DateDimension dd
    on c.DateDimension_id = dd.Id
where c.OriginatorNumber = '402'
group by dd.Weekday, dd.Date
) ss
group by ss.Weekday
order by ss.Weekday

Is it possible to do this with the NHibernate LINQ provider? Or, if that is not possible, how close can I get before I have to let the application fetch the intermediary result and do the rest?

There are a lot of things you can't do with the LINQ provider. Using HQL or CreateCriteria is just something you'll have to accept with NHibernate.

I haven't tried it, but it looks like you should be able to do what you want to do using HQL or CreateCriteria (with DetachedCriteria).

If you are desperate you can also fall back to plain SQL using CreateSQLQuery.

Query for width and height, a record with each greater than the other in the same query?

6 votes

I am trying to create a single query that will pull results from a table with the following requirements:

1) Query database table for 5 records.
2) At least one record with image height greater than image width.
3) At least one record with image width greater than image height.
4) Results must be ordered by newest records first (time)

How would this be done? Thanks!

CREATE TABLE `images` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `image` char(100) NOT NULL DEFAULT '',
  `image_width` smallint(4) unsigned NOT NULL DEFAULT '0',
  `image_height` smallint(4) unsigned NOT NULL DEFAULT '0',
  `time` int(11) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`)
)

EDIT: Added time field and time order requirement. Records must be ordered by the time field (used an arbitrary int field for the example).

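-- MySQL requires parentheses around each SELECT that has its own LIMIT
(SELECT * FROM images WHERE image_width > image_height ORDER BY time DESC LIMIT 1)
UNION
(SELECT * FROM images WHERE image_height > image_width ORDER BY time DESC LIMIT 1)
UNION
(SELECT * FROM images ORDER BY time DESC LIMIT 3)
ORDER BY time DESC;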

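UNION (as opposed to UNION ALL) already removes duplicate rows, so the result can contain fewer than 5 records when the three newest rows overlap with the first two. If you want the deduplication to be explicit, wrap the union in a derived table:

SELECT DISTINCT * FROM (
    (SELECT * FROM images WHERE image_width > image_height ORDER BY time DESC LIMIT 1)
    UNION
    (SELECT * FROM images WHERE image_height > image_width ORDER BY time DESC LIMIT 1)
    UNION
    (SELECT * FROM images ORDER BY time DESC LIMIT 3)
) AS i
ORDER BY time DESC;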

Optimizing SELECT COUNT to EXISTS

6 votes

I have a query to find certain customers from a table.

SELECT COUNT(*)
  FROM CUSTOMER
 WHERE amount <> 0
   AND customerid = 22

There is an index on customerid, so the DB scans all rows with customerid = 22.

Since the result is processed by checking whether the count returns zero or more than zero, how can I optimize the query? I.e. such that at the first customer row with amount <> 0 the query returns 0 else if all rows are = 0, then return 1.

select case
         when exists (select *
                      from   customer
                      where  amount <> 0
                             and customerid = 22) then 1
         else 0
       end  as non_zero_exists
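
From application code, the EXISTS form reduces to a simple boolean check. A minimal sketch using Python's sqlite3 module with made-up sample rows; the helper name has_non_zero_amount and the index name are assumptions, and the SQL is the answer's query with the customer id turned into a parameter.

```python
import sqlite3

QUERY = """
    SELECT CASE
             WHEN EXISTS (SELECT 1
                          FROM   customer
                          WHERE  amount <> 0
                                 AND customerid = ?) THEN 1
             ELSE 0
           END AS non_zero_exists
"""


def has_non_zero_amount(conn, customer_id):
    """True if the customer has at least one row with amount <> 0."""
    return conn.execute(QUERY, (customer_id,)).fetchone()[0] == 1


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customerid INTEGER, amount NUMERIC);
    CREATE INDEX ix_customer_id ON customer (customerid);
    INSERT INTO customer VALUES (22, 0), (22, 5), (22, 0), (23, 0);
""")

customer_22 = has_non_zero_amount(conn, 22)
customer_23 = has_non_zero_amount(conn, 23)
```

Unlike COUNT(*), EXISTS lets the engine stop at the first matching row instead of visiting every row for the customer.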

Would MongoDB be a good idea for a social network site (developed in Ruby on Rails)?

6 votes

Hello,

my project (in Ruby on Rails 3) is to develop a "social network" site with the following features:

  • Users can be friends. It's mutual friendships; not asymmetric like Twitter.
  • Users can publish links, to share them. Friends of a user can see what this user has shared.
  • Friends can comment on those shared links.

So basically we have Users, Links, and Comments, and all that is connected. An interesting thing in social networks is that the User table has kind of a many-to-many relation with itself.

I think I can handle that level of complexity with SQL and RoR.

My question is: would it be a good idea to use MongoDB (or CouchDB) for such a site?

To be honest, I think the answer is no. MongoDB doesn't seem to fit really well with many-to-many relationships. I can't think of a good MongoDB way to implement the friendship relationships. And I've read that Diaspora started with MongoDB but then switched back to classic SQL.

But some articles on the web defend MongoDB for social networks, and above all I want to make a well-informed decision, and not miss a really cool aspect of MongoDB that would change my life.

Also, I've heard about graph DB, which are probably great, but they really seem too young to me, and I don't know how they'd fit with RoR (and not mentioning heroku).

So, am I missing something?

Thanks,

arthur

I like MongoDB and use it a lot, but I am of the opinion that if you are dealing with relational data, you should use the right tool for it. We have relational databases for that. Mongo and Couch are document stores.

Mongo has a serious disadvantage if you are going to be maintaining a lot of inter-document links. Writes are only guaranteed to be atomic for one document. So you could have inconsistent updates for relations if you are not careful with your schema.

The good thing about MongoDB is that it is very good at scaling. You can shard and create replica sets. Foursquare currently uses MongoDB and it has been working pretty well for them. MongoDB also does map-reduce and has decent geospatial integration. The team that develops MongoDB is excellent, and I live in NY where they are based and have met them. You probably are not going to have scaling issues though I would think starting out.

As far as Diaspora switching... I would not want to follow anything they are doing :)

Your comment about graph dbs is interesting though. I would probably not use a graph DB as my primary DB either, but when dealing with relationships, you can do amazing things with them. In fact usually the demo the guys from graph DB companies will give you is extracting relationship knowledge from a social network. However, there is nothing preventing you from playing with these in the future for network analysis.

In conclusion, when you are starting out here, you are not running into the problems of massive scale yet, and are probably limited on time and money. Keep in mind that even Facebook does not use just one technology, they have basically expanded to NoSQL for certain functionality (like Facebook messaging). There is nothing stopping you in the future from using say Mongo and gridFS for handling image uploads or geo-location etc. It is good to grow as your needs change. I think your gut feeling that you have an SQL app here is right, and the benefits gained with MongoDB would not be realized for a while.

Help me understand this SQL Query

6 votes

I'm new to Oracle Database; can someone help me understand this query? This query eliminates duplicates from a table.

DELETE FROM table_name A 
 WHERE ROWID > (SELECT min(rowid) 
                  FROM table_name B 
                 WHERE A.key_values = B.key_values);

Any suggestions for improving the query are welcome.

Edit: No, this is not homework. What I didn't understand is what is being done by the subquery, and what ROWID > does on the subquery.

This is the Source of the query

Dissecting the actual mechanics:

DELETE FROM table_name A 

This is a standard query to delete records from the table named "table_name". Here, it has been aliased as "A" to be referred to in the subquery.

WHERE ROWID > 

This places a condition on the deletion, such that for each row encountered, the ROWID must meet a condition of being greater than..

            (SELECT min(rowid) 
              FROM table_name B 
             WHERE A.key_values = B.key_values)

This is a subquery that is correlated to the main DELETE statement. It uses the value A.key_values from the outside query. So given a record from the DELETE statement, it will run this subquery to find the minimum rowid (internal record id) for all records in the same table (aliased as B now) that bear the same key_values value.

So, to put it together, say you had these rows

rowid   |  key_values
=======    ============
1          A
2          B
3          B
4          C
5          A
6          B

The subquery works out that the min(rowid) for each record based on ALL records with the same key_values is:

rowid   |  key_values    | min(rowid)
=======    ============    ===========
1          A               1
2          B               2
3          B               2  **
4          C               4
5          A               1  **
6          B               2  **

For the records marked with **, the condition

WHERE ROWID > { subquery }

becomes true, and they are deleted.
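
The walkthrough above can be reproduced on a small scale. The sketch below uses SQLite through Python's sqlite3 module, which also exposes a rowid pseudo-column; note this is only an analogy, since SQLite's rowid is an integer key rather than a physical address as in Oracle. The outer table is referenced by name instead of the alias A, which is a deliberate simplification for portability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (key_values TEXT)")
# Same six rows as the walkthrough: A, B, B, C, A, B get rowids 1..6.
conn.executemany(
    "INSERT INTO table_name (key_values) VALUES (?)",
    [(key,) for key in "ABBCAB"],
)

# Delete every row whose rowid is greater than the smallest rowid
# among the rows sharing its key_values.
conn.execute("""
    DELETE FROM table_name
    WHERE rowid > (SELECT MIN(b.rowid)
                   FROM table_name AS b
                   WHERE b.key_values = table_name.key_values)
""")

survivors = conn.execute(
    "SELECT rowid, key_values FROM table_name ORDER BY rowid"
).fetchall()
```

Rows 3, 5 and 6 are removed, leaving one row per distinct key_values.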

EDIT - additional info

This answer previously stated that ROWID increased by insertion order. That is very untrue. The truth is that rowid is just a file.block.slot-on-block - a physical address.

http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:53140678334596

Tom's Followup December 1, 2008 - 6am Central time zone:

it is quite possible that D will be "first" in the table - as it took over A's place.

If rowids always "grew", than space would never be reused (that would be an implication of rowids growing always - we would never be able to reuse old space as the rowid is just a file.block.slot-on-block - a physical address)

Get followers and following in one query using MySQL

6 votes

OK, let's say I'm making a social network like Twitter. I have a table called social where we store who follows whom.

example me ( uid = 1 )
friend1 ( uid = 2 )
friend2 ( uid = 3 )

The table

SID AUID BUID
1   1    2
2   1    3
3   2    1

The information that we get here is

user id 1(me) is following 2
user id 1(me) is following 3
user id 2 is following 1(me)

And the question is: Can we do something like the two queries below in one single query ?

function CountFollowers($uid){
    $count = $this->fetch("SELECT COUNT(BUID) as followers
                                   FROM social WHERE BUID = :uid", 
    array( 'uid' => $uid));
    return $count;
}

and

function CountFollowing($uid){
    $count = $this->fetch("SELECT COUNT(AUID) as following
                                   FROM social WHERE AUID = :uid", 
    array( 'uid' => $uid));
    return $count;
}

Thanks for looking in

This gets you two columns with the count of following and followers

SELECT (SELECT COUNT(BUID) as count FROM social WHERE BUID = :uid) as Followers
  , (SELECT COUNT(AUID) as count FROM social WHERE AUID = :uid) as Following
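
To check the combined query against the sample table from the question, here is a small sketch using Python's sqlite3 module in place of MySQL. The helper follow_counts is hypothetical; the named parameter :uid works the same way as in the PHP code above.

```python
import sqlite3

QUERY = """
    SELECT (SELECT COUNT(*) FROM social WHERE BUID = :uid) AS Followers,
           (SELECT COUNT(*) FROM social WHERE AUID = :uid) AS Following
"""


def follow_counts(conn, uid):
    """Return both counts for one user with a single round trip."""
    followers, following = conn.execute(QUERY, {"uid": uid}).fetchone()
    return {"followers": followers, "following": following}


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE social (SID INTEGER PRIMARY KEY, AUID INTEGER, BUID INTEGER);
    INSERT INTO social VALUES (1, 1, 2), (2, 1, 3), (3, 2, 1);
""")

me = follow_counts(conn, 1)
```

For uid 1 this gives one follower (user 2) and two users followed (2 and 3), matching the relationships listed in the question.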

increasing performance on a SELECT query with large 3D point data set

5 votes

I have a large dataset (around 1.9 million rows) of 3D points that I'm selecting from. The statement I use most often is similar to:

SELECT * FROM points 
WHERE x > 100 AND x < 200 
AND   y > 100 AND y < 200 
AND   z > 100 AND z < 200 
AND otherParameter > 10

I have indices on x, y, and z as well as the otherParameter. I've also tried adding a multi-part index to x,y,z but that hasn't helped.

Any advice on how to make this SELECT query quicker?

B-Tree indexes won't help much for such a query.

What you need is an R-Tree index and the minimal bounding parallelepiped query over it.

Unfortunately, MySQL does not support R-Tree indexes over 3d points, only 2d. However, you may create an index over, say, X and Y together which will be more selective than any of the B-Tree indexes on X and Y alone:

ALTER TABLE points ADD xy POINT;

UPDATE  points
SET     xy = Point(x, y);

ALTER TABLE points MODIFY xy POINT NOT NULL;


CREATE SPATIAL INDEX sx_points_xy ON points (xy);

SELECT  *
FROM    points
WHERE   MBRContains(LineString(Point(100, 100), Point(200, 200)), xy)
        AND z BETWEEN 100 and 200
        AND otherParameter > 10;

This is only possible if your table is MyISAM.

SQL Server - Counting number of times an attribute in a dataset changes (non-concurrently)

4 votes

Hi,

I have a query that returns either a 1 or 0 based on whether or not an event occurred on a given date. This is ordered by date. Basically, a simple result set is:

Date         |  Type
---------------------
2010-09-27      1
2010-10-11      1
2010-11-29      0
2010-12-06      0
2010-12-13      1
2010-12-15      0
2010-12-17      0
2011-01-03      1
2011-01-04      0

What I would now like to be able to do is to count the number of separate, non-concurrent instances of '0's there are - i.e. count how many different groups of 0s appear.

In the above instance, the answer should be 3 (1 group of 2, then another group of 2, then finally 1 to end with).

Hopefully, the above example illustrates what I am trying to get at. I have been searching for a while, but am finding it difficult to succinctly describe what I am looking for, and hence haven't found anything of relevance.

Thanks in advance,

Josh

This is a variant of the "islands" problem. My first answer uses Itzik Ben Gan's double row_number trick to identify contiguous groups of data efficiently. The combination of Type,Grp identifies each individual island in the data.

You can read more about the different approaches to tackling this problem here.

;WITH T AS (
    SELECT  *,
            ROW_NUMBER() OVER(ORDER BY Date) -
            ROW_NUMBER() OVER(PARTITION BY Type ORDER BY Date)  AS Grp
    FROM    YourTable
)
SELECT  COUNT(DISTINCT Grp)
FROM    T
WHERE Type=0
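
The double row_number trick is easier to trust after tracing it by hand. The plain-Python sketch below computes the same Grp value (overall row number minus row number within the Type) for the sample data; the function name count_islands is made up for illustration, and the input is assumed to be already ordered by Date.

```python
def count_islands(types, target=0):
    """Count contiguous groups of `target` in a date-ordered list of types."""
    seen_per_type = {}   # running ROW_NUMBER() OVER (PARTITION BY Type ...)
    islands = set()      # distinct Grp values seen for the target type
    for overall_rn, type_ in enumerate(types, start=1):
        seen_per_type[type_] = seen_per_type.get(type_, 0) + 1
        grp = overall_rn - seen_per_type[type_]
        if type_ == target:
            islands.add(grp)
    return len(islands)


sample = [1, 1, 0, 0, 1, 0, 0, 1, 0]   # Type column from the question
zero_groups = count_islands(sample)
```

Inside a run of equal types both row numbers advance together, so Grp is constant; as soon as another type intervenes, the overall row number pulls ahead and Grp changes. For the sample, the zeros get Grp values 2, 2, 3, 3 and 4, giving 3 islands.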

My second answer requires a single pass through the data. It is not guaranteed to work but is on the same principle as a technique that many people successfully use to concatenate strings without problems.

DECLARE @Count int = 0

SELECT @Count = CASE WHEN Type = 0 AND @Count <=0 THEN -@Count+1 
                     WHEN Type = 1 AND @Count > 0 THEN - @Count
                     ELSE @Count END

FROM YourTable
ORDER BY Date

SELECT ABS(@Count)
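
The CASE expression above encodes a small state machine: the magnitude of @Count is the number of groups seen so far, and its sign records whether the scan is currently inside a group of zeros (positive) or outside one (zero or negative). A plain-Python rendering of the same logic, where the ordering guarantee that the T-SQL version lacks is explicit (the function name count_zero_groups is made up):

```python
def count_zero_groups(types):
    """Single pass over date-ordered types, mirroring the CASE expression."""
    count = 0
    for type_ in types:
        if type_ == 0 and count <= 0:
            # Entering a new group of zeros: bump the magnitude, go positive.
            count = -count + 1
        elif type_ == 1 and count > 0:
            # Leaving a group of zeros: keep the magnitude, flip the sign.
            count = -count
        # Otherwise the state is unchanged (the ELSE @Count branch).
    return abs(count)


single_pass_result = count_zero_groups([1, 1, 0, 0, 1, 0, 0, 1, 0])
```

For the sample data the counter moves through 0, 0, 1, 1, -1, 2, 2, -2, 3, so the result is 3.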