Best database questions in October 2011

Is a 'blackhole' table evil?

14 votes

Reading to this question i've just learned the existence of the blackhole table trick: basically consist in using a single table to insert data, and then a trigger that split the data in many other tables.

Im wondering if this could cause problems, once the developers whos working on the project are aware of that.

What are the pro and cons of this tecnique?

Edit: The blink I got in mind when I saw the example, is about transactions: if for some reason the transaction fail, you'll find the blackhole row with the original data, for historical purpose and maybe a help with debug - but this seems to be the only +1 i can see with blackholes. Ideas?

I don't think blackhole has any real pros.

Writing the trigger code to move data around is probably not noticably less work than writing the code to insert the data in the right place in the first place.

As Christian Oudard writes, it doesn't reduce complexity - just moves it to a place where it's really hard to debug.

On the downside:

"Side effects" are usually a bad idea in software development. Triggers are side effects - I intend to do one thing (insert data in a table), and it actually does lots of other things. Now, when I'm debugging my code, I have to keep all the side effects in my head too - and the side effects could themselves have side effects.

most software spends far more time in maintenance than it does in development. Bringing new developers into the team and explaining the black hole trick is likely to increase the learning curve - for negligible benefit (in my view).

Because triggers are side effects, and it's relatively easy to set off a huge cascade of triggers if you're not careful, I've always tried to design my databases without a reliance on triggers; where triggers are clearly the right way to go, I've only let my most experienced developers create them. The black hole trick makes triggers into a normal, regular way of working. This is a personal point of view, of course.

Show table name where a value is present

7 votes

Is it possible to show the name of a table in a db where a specific value is present. I have different tables and i want to show only the table names that contains a specific value in any of the fields.

This will return lots of empty result sets, but the non-empty ones correspond to table/column combinations that fit your search. It only works for text, and detects columns that contain the value (as opposed to a full column match.)

DELIMITER |

DROP PROCEDURE IF EXISTS `SearchAllTables`|
CREATE PROCEDURE `SearchAllTables` (
    IN _search varchar(256)
    )
LANGUAGE SQL
DETERMINISTIC
SQL SECURITY DEFINER
BEGIN
    -- declare stuff
    declare _tableName varchar(64);
    declare _columnName varchar(64);
    declare _done tinyint(1) default 0;

    -- we will examine every string column in the database
    declare _columnCursor cursor for
        select TABLE_NAME, COLUMN_NAME
            from INFORMATION_SCHEMA.COLUMNS
        where TABLE_SCHEMA = database()
        and (DATA_TYPE like '%char%' 
            or DATA_TYPE like 'text');
    declare CONTINUE handler for NOT FOUND
        SET _done = 1;

    OPEN _columnCursor;
    LOOP1: LOOP
            -- get the next table/column combination
        FETCH _columnCursor INTO _tableName,_columnName;
        IF _done = 1 THEN
            CLOSE _columnCursor;
            LEAVE LOOP1;
        END IF;

            -- query the current column to see if it holds the value
        SET @query = concat(
            "select '",_tableName,"' as TableName, '",
                      _columnName,"' as ColumnName 
            from ",_tableName," 
            where ",_columnName," like concat('%',?,'%') 
            group by 1;"
        );
        SET @search = _search;
        PREPARE _stmt FROM @query;
        EXECUTE _stmt USING @search;
        DEALLOCATE PREPARE _stmt;
    END LOOP LOOP1;
END|

DELIMITER ;

Oh, yeah, and it's ugly... Maybe it'll help you, though!

Oracle: Indexing a subset of rows of a table

5 votes

I have a table which has active an inactive entries, active = 1 for active and active = 0 for inactive.

I have a variety of indexes on this table, but I only need the indexes maintained for active entries, as the application only queries against active data. Inactive data needs to be kept because it can become active again, but this is generally only done with bulk updates, which wouldn't use an index anyway.

I'm noticing indexing the inactive entries (of there are increasingly more than active entries) takes quite a bit of space.

Is there a way in Oracle (10g) to do something like this:

create index an_idx on tab (active, col1, col2, ... , coln) where active = 1?

Previous attempt:

I tried using a function based index to set the first column to null when active = 0 like so:

create index an_idx on tab (decode(active, 1, 1, null), col1, col2, ... , coln)

But Oracle still seems to index the inactive columns in this case.

Partition the table by ACTIVE, create local indexes, and make the indexes for the inactive partitions UNUSABLE. This will eliminate the time spent on indexing inactive data.

create table tab(active number, col1 number, col2 number, col3 number)
    partition by list(active) 
    (partition tab_active values(1), partition tab_inactive values(0));

create index tab_index1 on tab(col1) local;

alter index tab_index1 modify partition tab_inactive unusable;

But there are some potential downsides to this approach:

  • Not all types of indexes can be unusable.
  • It's not normal to have unusable objects in the database. People will likely complain about it or assume it's a bug and rebuild it.
  • Some operations, such as truncate, will automatically make the indexes usable again.

Heroku shared db vs Amazon RDS Performance

5 votes

I'm in the process of moving all my data from Heroku's shared db to Amazon RDS. Before switching everything over to RDS, I ran some tests locally to make sure my app works fine with it. These tests clearly slow that query time is slower on RDS. For the exact same request, I get:

On Heroku, with heroku shared db:

Completed 200 OK in 98ms (Views: 0.7ms | ActiveRecord: 56.0ms)

Locally, with RDS db instance

Completed 200 OK in 253ms (Views: 0.7ms | ActiveRecord: 127.9ms)

The ActiveRecord times are what I'm worried about here. Am I missing something? Heroku clearly states this about their shared db:

Shared databases are suitable for staging, testing, and low-scale production applications.

And yet it seems to be faster than this RDS instance I'm paying 80$/month for. Is heroku's shared db running locally? Because it's pretty obvious to me that about any database running locally inside my heroku app is going to be faster than any db that lives outside of it. Amazon says that any query taking more than 10ms is considered as a "slow query". But right now it seems that every query is gonna take at least 25ms for the roundtrip alone from the app to amazon's server + the actual query time. Or am I missing something?

From what I understand, Heroku EC2 instances run in the East availability zone, so creating an RDS instance in that same zone is pretty much like giving it a local database (I believe that's how heroku's shared databases work as well).

After setting up a staging environment for my app directly on Heroku and connecting it to my RDS instance, query times were much faster than when I tested it locally (where each SQL query had to make a roundtrip from my local machine to the RDS servers).

The only minor thing that's left unanswered is how to determine in which particular availability sub-zone my heroku app is running, so I can match it with my RDS instance (although it probably doesn't matter as much as the global availability zone).

enter image description here

Are mysql statements atomic?

5 votes

For example, I have a row with a column C1 value = 'clean', and two different computers run this query:

UPDATE T1
  SET C1 = 'dirty'
WHERE id = 1

at the same time.

Without using transactions, is it guaranteed that the value of mysql_affected_rows() would be 1 for one client and 0 for the other?

Yes and No :-)

In both cases, the access is serialised (assuming you're using a transactional engine like InnoDB) since they hit the same row, so they won't interfere with each other. In other words, the statements are atomic.

However, the affected row count actually depends on your configuration set when you open the connection. The page for mysql_affected_rows() has this to say (my bold):

For UPDATE statements, the affected-rows value by default is the number of rows actually changed. If you specify the CLIENT_FOUND_ROWS flag to mysql_real_connect() when connecting to mysqld, the affected-rows value is the number of rows "found"; that is, matched by the WHERE clause.

And from the mysql_real_connect page:

CLIENT_FOUND_ROWS: Return the number of found (matched) rows, not the number of changed rows.

So, in terms of what happens with CLIENT_FOUND_ROWS being configured, the affected rows for:

UPDATE T1 SET C1 = 'dirty' WHERE id = 1

have nothing to do with whether the data is changed, only what rows matched. This would be 1 for both queries.

On the other hand, if CLIENT_FOUND_ROWS was not set, the second query would not actually be changing the row (since it's already populated with 'dirty') and would have a row count of zero.

If you wanted the same behaviour regardless of that setting (only showing changes), you could use something like:

UPDATE T1 SET C1 = 'dirty' WHERE id = 1 AND C1 <> 'dirty'

How to store a database inside an Arduino?

5 votes

I'm working on project where I use an Arduino with a Bluetooth module and my cellphone Samsung Galaxy S II with Android OS. The idea of the project is to send some commands from my cellphone to the Arduino via Bluetooth. I want to include a database into the Arduino so that when I send login information from my cellphone, the Arduino will check the database and if the login information matches, it retrieves some data from the database and sends it to my cellphone.

How can I store a database inside the Arduino? Should I purchase an external EEPROM or RAM? And how can I deal with that database (adding, deleting and manipulating data)?

My Ardunio is of type UNO, BTW.

Just for simple login you don't need a database, you probably need just a simple table.

Consider first of all that usually EEPROMs allow from 1000 to 100000 write cycles. It means, if you write a single cell more than 100000 you have an high probability that your cell die, you cannot write it anymore.

The question is, how many logins are allowed? It is a all matter of choosing the right data structure and understand what is the amount of required memory.

Knowing the computational power of Arduino: If logins are just 2 .. 50, a simple list would be sufficient. Insertion at the end is O(1), deletion is O(n), lookup is O(n). A linked list however will allow you to reduce the number of writes for deletion to a constant small value.

If logins are more, 50 .. 1000, a sorted array with binary search is enough. Insertion is O(n), deletion is O(n), lookup is O(n log n). However the number of writes is O(n) both for deletion and insertion, and since writing is slow and can burn cells, it depends on the number of updates you want to do.

If logins are 1000 or more a binary tree is good. Insertion is O(n log n), deletion is O(n log n), lookup is O(n log n). The good thing is that for insertion and deletion you just need a small, constant number of writes.

Also an hashtable is good, but they usually use more memory. Insertion is averaged O(1), deletion is averaged O(1), lookup is averaged O(1). Insertion and deletion requires only a small constant number of write operations, less than a binary tree. As I said, this data structure uses more memory, speed comes at a cost.

You don't need a real relational database, but probably if you need too much users, you need an external EEPROM.

Of course, you have to save this data in a flash memory, internal or external, or you'll lose the database when you reset or power down the machine.

We can also say that you don't need to store the username and password, you can just store an hash of the password and username. If the hashed username and password exists, then you can allow login. In this way you can use fixed size memory and less memory. You can use MD5, it is the Android phone that have to send the MD5 hash, that is, 16 bytes, so the Arduino must only check if that MD5 hash exists in the list of users there, for example. And this is easy and fast.

SQL: eliminate the selection of redundant records that occur within X minutes

4 votes

DB in use is Firebird 2.1,in case not familiar with here is select statement sql ref:
http://ibexpert.net/ibe/index.php?n=Doc.DataRetrieval
functions ref: http://www.firebirdsql.org/file/documentation/reference_manuals/reference_material/html/langrefupd21.html

i will be happy with any sql slang [i'll convert it].

table schema:

CREATE TABLE EVENT_MASTER (
EVENT_ID                BIGINT NOT NULL,
EVENT_TIME              BIGINT NOT NULL,
DATA_F1                 VARCHAR(40),
DATA_F2                 VARCHAR(40),
PRIMARY KEY (EVENT_ID)
);

the bad news is EVENT_TIME stored as seconds elapsed since the Epoch.

data samples:

"EVENT_ID","EVENT_TIME","DATA_F1","DATA_F2"
25327,1297824698,"8604","A"
25328,1297824770,"8604","I"
25329,1297824773,"8604","A"
25330,1297824793,"8604","A"
25331,1297824809,"8604","1"
25332,1297824811,"8604","GREY"
25333,1297824812,"8604","A"
25334,1297824825,"8604","GREY"
25335,1297824831,"8604","A"
25336,1297824833,"8604","GREY"
25337,1297824838,"8604","A"
25338,1297824840,"8604","1"
25339,1297824850,"8604","A"
25340,1297824864,"8604","A"
25341,1297824875,"8804","GREY" //notice DATA_F1 is different
25342,1297824876,"8604","G"
25343,1297824877,"8604","A"
25344,1297824880,"8604","GREY"
25345,1297824895,"8604","1"
25346,1297824899,"8604","A"
25347,1297824918,"8604","GREY"
25348,1297824930,"8604","YELLOW"
25349,1297824939,"8604","GREY"
25350,1297824940,"8604",""
25351,1297824944,"8604","A"
25352,1297824945,"8604","1"
25353,1297824954,"8604","B"
25354,1297824958,"8604",""
25355,1297824964,"8604","1"
25356,1297824966,"8604","GREY"
25357,1297824974,"8604","1"
25358,1297824981,"8604","GREY"
25359,1297824983,"8604",""
25360,1297824998,"8604","GREY"
25361,1297825003,"8604","2"
25362,1297825009,"8604","G"
25363,1297825018,"8604","GREY"
25364,1297825026,"8604","F"
25365,1297825045,"8604","GREY"
25366,1297825046,"8604","1"

expected output:
distinct "DATA_F1","DATA_F2" rows within X minutes according to EVENT_TIME: like:

25341,1297824875,"8804","GREY"
25327,1297824698,"8604","A"
25328,1297824770,"8604","I"
25332,1297824811,"8604","GREY"
25348,1297824930,"8604","YELLOW"
..etc

Requirements: eliminate the selection of redundant records that occur within 5 minutes [range calcs according to EVENT_TIME column].

last i'm trying follow this pattern:

SELECT * FROM EVENT_MASTER inner join (
SELECT distinct  DATA_F1, DATA_F2 FROM EVENT_MASTER where /*the hard stuff that i need help with: (EVENT_TIME difference within X minutes)*/
) as RemovedDup ON /*EVENT_MASTER.EVENT_ID = problem is i cant select RemovedDup ID otherwise distinct becomes useless!!*/

please help ASAP.

thanks,

EDIT

adding output based on Andrei K. answer:

25331,1297824809,"8604","1"
25327,1297824698,"8604","A"
25342,1297824876,"8604","G"
25332,1297824811,"8604","GREY"
25328,1297824770,"8604","I"
25341,1297824875,"8804","GREY"
25350,1297824940,"8604",""
25352,1297824945,"8604","1" /*bug: time still within 300 seconds, this same as first record*/
25361,1297825003,"8604","2"
25351,1297824944,"8604","A"
25353,1297824954,"8604","B"
25364,1297825026,"8604","F"
25362,1297825009,"8604","G"
25347,1297824918,"8604","GREY"
25372,1297825087,"8604","ORANGE"
25348,1297824930,"8604","YELLOW"
25382,1297825216,"8604","1"
25387,1297825270,"8604","B"
25394,1297825355,"8604","BLUE"
25381,1297825211,"8604","GREY"

EDIT 2: Russell query output: not good output and its very very slow.

1297824698,"8604","A"
1297824770,"8604","I"
1297824809,"8604","1"
1297824811,"8604","GREY"
1297824825,"8604","GREY"
1297824840,"8604","1"
1297824875,"8804","GREY"
1297824876,"8604","G"
1297824880,"8604","GREY"
1297824918,"8604","GREY"
1297824930,"8604","YELLOW"
1297824939,"8604","GREY"
1297824940,"8604",""
1297824945,"8604","1"
1297824954,"8604","B"
1297824964,"8604","1"
1297824998,"8604","GREY"
1297825003,"8604","2"
1297825018,"8604","GREY"
1297825026,"8604","F"
1297825045,"8604","GREY"
1297825046,"8604","1"
1297825063,"8604","1"
1297825079,"8604","GREY"
1297825087,"8604","ORANGE"
1297825094,"8604","GREY"
1297825100,"8604","1"
1297825133,"8604","GREY"
1297825176,"8604","GREY"
1297825216,"8604","1"

EDIT 3:

based on Russell request here is: all rows WHERE DATA_F1 = '8604' AND DATA_F2 = 'GREY'

25332,1297824811,"8604","GREY"
25334,1297824825,"8604","GREY"
25336,1297824833,"8604","GREY"
25344,1297824880,"8604","GREY"
25347,1297824918,"8604","GREY"
25349,1297824939,"8604","GREY"
25356,1297824966,"8604","GREY"
25358,1297824981,"8604","GREY"
25360,1297824998,"8604","GREY"
25363,1297825018,"8604","GREY"
25365,1297825045,"8604","GREY"
25367,1297825059,"8604","GREY"
25371,1297825079,"8604","GREY"
25373,1297825094,"8604","GREY"
25376,1297825116,"8604","GREY"
25378,1297825133,"8604","GREY"
25380,1297825176,"8604","GREY"
25381,1297825211,"8604","GREY"
25384,1297825234,"8604","GREY"
25389,1297825286,"8604","GREY"
25390,1297825314,"8604","GREY"
25391,1297825323,"8604","GREY"
25393,1297825343,"8604","GREY"
25396,1297825370,"8604","GREY"
25397,1297825387,"8604","GREY"
25399,1297825416,"8604","GREY"
25401,1297825436,"8604","GREY"
25402,1297825445,"8604","GREY"
25404,1297825454,"8604","GREY"
50282,1299137344,"8604","GREY"
380151,1309849420,"8604","GREY"

As of this moment [Oct 11,2011 5:00 AM GMT] no absolute correct answer posted, and Andrei K. answer still the best try among others. so sql experts please help me find the solution otherwise i'll start to think that sql is not capable of handling the question requirements! Is it??

remark: event_time is not unique so multiple events can occur at the same second.

You'd need a very nasty recursive query to accomplish this in purely "functional" way. I don't profess to be clever enough to construct such a query, let alone make it performant.

On the other hand, allowing side-effects (i.e. temporary table) significantly simplifies things. You should even be able to make it fairly quick by adding appropriate indexes on the temporary table (not shown here). Here is the actual SQL:

CREATE GLOBAL TEMPORARY TABLE EVENT_MASTER_TMP (
    EVENT_ID                BIGINT NOT NULL,
    EVENT_TIME              BIGINT NOT NULL,
    DATA_F1                 VARCHAR(40),
    DATA_F2                 VARCHAR(40),
    PRIMARY KEY (EVENT_ID)
);

INSERT INTO EVENT_MASTER_TMP
SELECT * FROM
    (SELECT * FROM EVENT_MASTER ORDER BY EVENT_TIME) E
WHERE
    NOT EXISTS (
        SELECT *
        FROM EVENT_MASTER_TMP T
        WHERE
            E.DATA_F1 = T.DATA_F1
            AND E.DATA_F2 = T.DATA_F2
            AND E.EVENT_TIME - T.EVENT_TIME <= 5*60
    );

SELECT * FROM EVENT_MASTER_TMP;

In plain English:

  • Go through events from older to newer,
  • for each event, check if it is redundant relative to some row that is already in the temporary table
  • and if not, insert it in the temporary table, so it can be used as a criteria for the remaining events.

Executing this on your test data yields:

25327   1297824698  8604    A
25328   1297824770  8604    I
25331   1297824809  8604    1
25332   1297824811  8604    GREY
25341   1297824875  8804    GREY
25342   1297824876  8604    G
25348   1297824930  8604    YELLOW
25350   1297824940  8604    
25353   1297824954  8604    B
25361   1297825003  8604    2
25364   1297825026  8604    F

Lowering the time threshold from 5*60 to, say, 233, yield this:

25327   1297824698  8604    A
25328   1297824770  8604    I
25331   1297824809  8604    1
25332   1297824811  8604    GREY
25341   1297824875  8804    GREY
25342   1297824876  8604    G
25348   1297824930  8604    YELLOW
25350   1297824940  8604    
25351   1297824944  8604    A       <-- 246s difference
25353   1297824954  8604    B
25361   1297825003  8604    2
25364   1297825026  8604    F
25365   1297825045  8604    GREY    <-- 234s difference
25366   1297825046  8604    1       <-- 237s difference

approximate search in a database

4 votes

I have a large database with a list of institutions (universities, hospitals, etc). The names of institutions come from different sources and can be spelled differently for the same institution. They can be misspelled, for example, or words can be shortened ("uni", or "univ", or "university")

Given a name that I need to insert in to the database, is there a practical way to find if this institution is already in the database? This is not a research project, so I am looking for a solution that is reasonably fast.

I am using django and postgresql, but it does not matter I suppose.

This is the problem of record linkage. Many databases provide basic methods for this such as character-level n-gram matching, where a term like "university" is expanded into

["uni", "niv", "ive", "ver", "ers", ...]

for n = 3. The database would index all such n-grams and allow a search with some kind of weighted matching. pg_trgm seems to do exactly this, try it out.

Back up AppEngine database (Google cloud storage?)

4 votes

I have an AppEngine application that currently has about 15GB of data, and it seems to me that it is impractical to use the current AppEngine bulk loader tools to back up datasets of this size. Therefore, I am starting to investigate other ways of backing up, and would be interested in hearing about practical solutions that people may have used for backing up their AppEngine Data.

As an aside, I am starting to think that the Google Cloud Storage might be a good choice. I am curious to know if anyone has experience using the Google Cloud Storage as a backup for their AppEngine data, and what their experience has been, and if there are any pointers or things that I should be aware of before going down this path.

No matter which solution I end up with, I would like a backup solution to meet the following requirements:

1) Reasonably fast to backup, and reasonably fast to restore (ie. if a serious error/data deletion/malicious attack hits my website, I don't want to have to bring it down for multiple days while restoring the database - by fast I mean hours, as opposed to days).

2) A separate location and account from my AppEngine data - ie. I don't want someone with admin access to my AppEngine data to necessarily have write/delete access to the backup data location - for example if my AppEngine account is compromised by a hacker, or if a disgruntled employee were to decide to delete all my data, I would like to have backups that are separate from the AppEngine administrator accounts.

To summarize, given that getting the data out of the cloud seems slow/painful, what I would like is a cloud-based backup solution that emulates the role that tape backups would have served in the past - if I were to have a backup tape, nobody else could modify the contents of that tape - but since I can't get a tape, can I store a secure copy of my data somewhere, that only I have access to?

Kind Regards Alexander

There are a few options here, though none are (currently) quite what you're looking for.

With the latest release of version 1.5.5 of the SDK, we now support interfacing with Google Storage directly - you can see how, here. With this you can write data to Google Storage, but to the best of my knowledge there's no way to write a file that the app will then be unable to delete.

To actually gather the data, you could use the App Engine mapreduce API. It has built in support for writing to the App Engine blobstore; writing to Google Storage would require you to implement your own output writer, currently.

Another option, as WoLpH suggests, is to use the Datastore Admin tool to back up data to another app. With a little extra effort you could modify the remote_api stub to prohibit deletes to the target (backup) app.

One thing you should definitely do regardless is to enable two-factor authentication for your Google account; this makes it a lot harder for anyone to get control of your account, even if they discover your password.

Is it a good idea to create schema types to organize the relationships of tables

4 votes

Is it a good idea to create a schema type to separate the table relationships. I guess when you are browsing the tables in SSMS you will see them group together by schema type. But is it worth the trouble? Anyone with experience with this in real world scenarios?

enter image description here

I've generally found that to be more of a hassle than any help it's provided. What do you do with tables that are relevant to multiple areas? What happens when a table seems to belong to one area but later migrates to another area of the application? Do you change its schema and refactor all of your code?

I have used multiple schemata to make delineations when there is a VERY clear boundary between objects, but usually not something like what you have in your diagram. One example is objects which are used just for DBA support. I might put those into their own schema if they aren't used by the actual application itself.

Remove values in comma separated list from database

4 votes

I have a table in my MySQL database called 'children'. In that table is a row called 'wishes' (a comma separated list of the child's wishlist items). I need to be able to update that list so that it only removes one value. i.e. the list = Size 12 regular jeans, Surfboard, Red Sox Baseball Cap; I want to remove Surfboard.

My query right now looks like this

$select = mysql_query('SELECT * FROM children WHERE caseNumber="'.$caseNum.'" LIMIT 1 ');
    $row = mysql_fetch_array($select);

    foreach ($wish as $w) {
        $allWishes = $row['wishes'];
        $newWishes = str_replace($w, '', $allWishes);
        $update = mysql_query("UPDATE children SET wishes='$newWishes' WHERE caseNum='".$caseNum."'");
}

But the UPDATE query isn't removing anything. How can I do what I need?

Using these user-defined REGEXP_REPLACE() functions, you may be able to replace it with an empty string:

UPDATE children SET wishes = REGEXP_REPLACE(wishes, '(,(\s)?)?Surfboard', '') WHERE caseNum='whatever';

Unfortunately, you cannot just use plain old REPLACE() because you don't know where in the string 'Surfboard' appears. In fact, the regex above would probably need additional tweaking if 'Surfboard' occurs at the beginning or end.

Perhaps you could trim off leading and trailing commas left over like this:

UPDATE children SET wishes = TRIM(BOTH ',' FROM REGEXP_REPLACE(wishes, '(,(\s)?)?Surfboard', '')) WHERE caseNum='whatever';

So what's going on here? The regex removes 'Surfboard' plus an optional comma & space before it. Then the surrounding TRIM() function eliminates a possible leading comma in case 'Surfboard' occurred at the beginning of the string. That could probably be handled by the regex as well, but frankly, I'm too tired to puzzle it out.

Note, I've never used these myself and cannot vouch for their effectiveness or robustness, but it is a place to start. And, as others are mentioning in the comments, you really should have these in a normalized wishlist table, rather than as a comma-separated string.

Update

Thinking about this more, I'm more partial to just forcing the use of built-in REPLACE() and then cleaning out the extra comma where you may get two commas in a row. This is looking for two commas side by side, as though there had been no spaces separating your original list items. If the items had been separated by commas and spaces, change ',,' to ', ,' in the outer REPLACE() call.

UPDATE children SET wishes = TRIM(BOTH ',' FROM REPLACE(REPLACE(wishes, 'Surfboard', ''), ',,', ',')) WHERE caseNum='whatever';