原文来自http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html
翻译:EasyChen 转载请保留署名和出处,保持一致。
没有太多时间进行全文翻译,就把重点挑出来,用自己的话串起来,名曰 选择性翻译。 以后可能会比较多的采用这种方式。
社会书签的tag存储一直是一个比较麻烦的问题。
一个好的数据表设计,不但要能准确查出tag,还应该支持tag的AND/OR/NOT查询。我们来看看解决方案。
“MySQLicious” solution
表结构

存储实例

Intersection (AND)
“search+webservice+semweb”类的查询:
SELECT *
FROM `delicious`
WHERE tags LIKE "%search%"
AND tags LIKE "%webservice%"
AND tags LIKE "%semweb%"
Union (OR)
“search|webservice|semweb”类的查询:
SELECT *
FROM `delicious`
WHERE tags LIKE "%search%"
OR tags LIKE "%webservice%"
OR tags LIKE "%semweb%"
Minus
“search+webservice-semweb”类的查询
SELECT *
FROM `delicious`
WHERE tags LIKE "%search%"
AND tags LIKE "%webservice%"
AND tags NOT LIKE "%semweb%"
优点:
只有一个表- SQL比较直接
- 可以用mysql的全文检索来做,效率更高
缺点:
- tag的数量受到限制,通常我们都用varchar,这种字段只256个字节长。否则,你需要用text类型,速度会变慢。(Easy注,phpmore的tag用的就是TinyText)
- Like ‘%things%’不精确,当然某些应用中,这反而是需要的
“Scuttle” solution
数据表

Intersection (AND)
Query for “bookmark+webservice+semweb”:
SELECT b.*
FROM scBookmarks b, scCategories c
WHERE c.bId = b.bId
AND (c.category IN (’bookmark’, ‘webservice’, ’semweb’))
GROUP BY b.bId
HAVING COUNT( b.bId )=3
首先,所有书签-tag组合被搜出来 (c.category IN ('bookmark', 'webservice', 'semweb')), ,然后选择其中包含三个的(HAVING COUNT(b.bId)=3)
Union (OR)
Query for “bookmark|webservice|semweb”:
只需要去掉 AND查询中的HAVING子句:
SELECT b.*
FROM scBookmarks b, scCategories c
WHERE c.bId = b.bId
AND (c.category IN (’bookmark’, ‘webservice’, ’semweb’))
GROUP BY b.bId
Minus (Exclusion)
Query for “bookmark+webservice-semweb”, that is: bookmark AND webservice AND NOT semweb.
SELECT b. *
FROM scBookmarks b, scCategories c
WHERE b.bId = c.bId
AND (c.category IN (’bookmark’, ‘webservice’))
AND b.bId NOT
IN (SELECT b.bId FROM scBookmarks b, scCategories c WHERE b.bId = c.bId AND c.category = ’semweb’)
GROUP BY b.bId
HAVING COUNT( b.bId ) =2
好处: 我觉得这个方案比前一个方案好的最大理由是,可以有无限个tag。
“Toxi” solution
数据表

Intersection (AND)
Query for “bookmark+webservice+semweb”
SELECT b.*
FROM tagmap bt, bookmark b, tag t
WHERE bt.tag_id = t.tag_id
AND (t.name IN (’bookmark’, ‘webservice’, ’semweb’))
AND b.id = bt.bookmark_id
GROUP BY b.id
HAVING COUNT( b.id )=3
Union (OR)
Query for “bookmark|webservice|semweb”
SELECT b.*
FROM tagmap bt, bookmark b, tag t
WHERE bt.tag_id = t.tag_id
AND (t.name IN (’bookmark’, ‘webservice’, ’semweb’))
AND b.id = bt.bookmark_id
GROUP BY b.id
Minus (Exclusion)
Query for “bookmark+webservice-semweb”, that is: bookmark AND webservice AND NOT semweb.
SELECT b. *
FROM bookmark b, tagmap bt, tag t
WHERE b.id = bt.bookmark_id
AND bt.tag_id = t.tag_id
AND (t.name IN (’Programming’, ‘Algorithms’))
AND b.id NOT IN (SELECT b.id FROM bookmark b, tagmap bt, tag t WHERE b.id = bt.bookmark_id AND bt.tag_id = t.tag_id AND t.name = ‘Python’)
GROUP BY b.id
HAVING COUNT( b.id ) =2
Leaving out theHAVING COUNTleads to the Query for “bookmark|webservice-semweb”.
好处:
- 你可以给每个tag添加额外的信息
- 这是最规范的方案,第三范式。
坏处:
- 删除tag时,你要从多个表中删除(Easy注,Mysql5的话,可以用trigger来做)
然后我们把视线从功能转移到性能上。
A+B
250个tag

999个tag

A OR B
250个tag

添加速度比较

测试代码下载 Download the source code (PHP) LGPL协议。