- Managing tagging rule sets
- Specifying which tagging rule sets are applied at which times
- General actions for manually tagging content.
- Treatment of tags existing in external content.
- Within specific applications, common queries involving tags
For the site administrator
- Managing which public tagging rule/phrase sets are offered to users.
- Managing which outside services are offered when a user follows a phrase in viewed content.
Tagging is a highly dynamic activity. Certain tags are long-lived and designate topics of permanent interest, e.g. science, politics. Many more tags are short-lived and refer to topics of current discourse, e.g. Michael Jackson Trial, Apple Tiger. The use of tags propagates by example, and tagging serves to get content listed in the right category. Tagging is thus an advertisement of the content by its author. A secondary use of tagging is for one's own convenience in classifying archival material. In the latter case, the tagging may come from the users themselves.
In any case, we must take into account that the tagging vocabulary undergoes constant change and that no fixed set of tagging rules will remain valid permanently.
We note that a rule set may both imply tags based on content words and designate phrases of interest that may link to partner services. For example, there can be a Wikipedia rule set that links to Wikipedia whenever a Wikipedia article heading appears in the text, without implying any tags. On the other hand, there may be rule sets that do imply tags based on various text criteria.
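The two roles of a rule set can be illustrated with a minimal Python sketch. It is not the ODS implementation: the function name `apply_rules`, the rule representation as (phrase, tags) pairs, and the case-insensitive whole-word matching are all assumptions made for illustration.

```python
import re

def apply_rules(text, rules):
    """Apply phrase rules to a text.

    rules: list of (phrase, tags) pairs (hypothetical representation).
    Returns (implied_tags, phrases_of_interest).
    """
    implied_tags = []
    phrases_found = []
    for phrase, rule_tags in rules:
        # Assumption: phrases match case-insensitively on word boundaries.
        if re.search(r"\b" + re.escape(phrase) + r"\b", text, re.IGNORECASE):
            phrases_found.append(phrase)
            for tag in rule_tags:
                if tag not in implied_tags:
                    implied_tags.append(tag)
    return implied_tags, phrases_found

text = "The Michael Jackson Trial resumes; see the Semantic Web article."
rules = [
    ("michael jackson trial", ["news", "music"]),  # implies tags
    ("Semantic Web", []),                          # phrase of interest only
    ("Apple Tiger", ["apple"]),                    # does not occur in the text
]
tags, phrases = apply_rules(text, rules)
```

A rule with an empty tag list behaves like the Wikipedia example: the phrase is recognized so that it can be linked, but no tag is implied.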
The ODS DataSpace Settings page contains a link to "Content Tagging Settings."
This page is the main page governing the account's cross-application tagging settings.
- Rule sets enabled for content tagging - This is a list of tagging rule sets that are selected for the account. Each rule set has a system wide unique name, which appears as a link going to Rule Set Edit for the rule set in question. Each entry has an up and down arrow changing the position of the entry in the list and a remove link that will remove the rule set from the list. Removing the rule set does not affect the rule set itself.
- Rule sets list - This is a list of tagging rule sets owned by the account. Each entry shows the rule set name as a link to Rule Set Edit. Additionally there is a link "enable" which will add the rule set to the head of active rule sets. This link appears as the text "enabled" if the rule set is already in the list.
- Rule sets offered for use by other users - This is a scrollable list of shared rule sets not owned by the account. Any user may decide to make any rule set they own shared, so that other users may use it in classifying their material. Additionally, the owner account of the rule set is shown. Typically system provided rule sets like the list of wikipedia titles will appear here.
Above the list appears a search box. This is a text search on titles and terms included in rule sets. Looking for "semantic web" will find rule sets that have semantic and web as matched words or where semantic and web occur in the title or description.
This page allows editing the rules which compose a rule set.
- Ruleset Name - system wide unique name
- Public - checkbox. If checked, other users may use this rule set for classifying content but may not edit it.
- Tagging Rules - this section consists of:
- Query field
- Tag field
- Is phrase check-box
- Action: for existing rule: Delete, Update, Moat; for new rule: Add
The rule name link goes to a page that shows the above as editable fields. If the public check-box is not checked, all uses of the rule set from all non-owner users will be revoked, so that the rule set disappears from their list of selected rule sets if it occurred there.
- Add New Rule - This section consists of:
- Ruleset Name field
- Is Public check-box
- Query - text pattern text field.
- The syntax is either a free text expression with boolean connectives or a sequence of words. The entry 'Jennifer Lopez' is interpreted as "Jennifer Lopez", meaning the phrase with Jennifer followed by Lopez. 'Jennifer and Lopez' will match any document with Jennifer and Lopez anywhere in the text. Note that double quotes ("") denote a phrase. Since we primarily match phrases, if no quotation marks or reserved words occur in the text, we assume double quotes around it, meaning phrase match.
- Tag - this is the tag implied by a match of the text pattern.
- If multiple tags are intended, these can be separated by commas. So 'pop music' is one tag, while 'pop, music' is two tags, 'pop' and 'music'. Spaces are trimmed off the start and end of tags. Single spaces and punctuation other than commas may occur inside a tag. If the tags field is empty, 'Jennifer Lopez' will simply be recognized as a phrase of interest so that links to partner services may be associated with the occurrence of the name. If the pattern is not a phrase pattern, the tags field may not be empty.
- Is phrase check-box
- Add - This button adds the rule described above. A syntax error may be reported.
- Rules list - This is a scrollable list of tagging rules.
Each is shown with its rule name.
Each entry has a check box before it for selection for Remove, Up and Down links.
Hitting the rule name link will show the text of the rule in the Add rule form above.
Clicking Add will replace the original rule.
- We note that the rules list may be extremely long, thus only the first 40 are shown inside a div with a vertical scroll bar. After this there are previous and next buttons.
- Delete - This deletes the rule.
- Export - This will allow downloading the complete text of the rule set as an XML file (tagging_rules.xml) to the user agent.
- Import - This allows importing the complete text of the rules sets to the server from the user agent. If the XML file is syntactically correct, the imported file replaces the previous content.
This page may or may not allow editing the rule set, depending on whether the user is the owner of the rule set. If editing is not allowed, the name is static text and the add rule form is not shown. The import rules option will not be enabled but the export option will be offered.
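The handling of the Query and Tag fields of the Add New Rule form can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the set of reserved words (and, or, not) is assumed, since the reserved words are not enumerated here.

```python
RESERVED = {"and", "or", "not"}  # assumption: the boolean connectives

def normalize_pattern(query):
    """Wrap a bare word sequence in double quotes so it is a phrase match."""
    q = query.strip()
    has_reserved = any(word in RESERVED for word in q.lower().split())
    if '"' in q or has_reserved:
        return q  # already quoted, or a boolean expression: leave as entered
    return '"' + q + '"'

def is_phrase_pattern(pattern):
    """True if the pattern is a single double-quoted phrase."""
    return (pattern.startswith('"') and pattern.endswith('"')
            and pattern.count('"') == 2)

def split_tags(field):
    """Split the Tag field on commas, trimming spaces and dropping empties."""
    return [tag.strip() for tag in field.split(",") if tag.strip()]

def validate_rule(query, tags_field):
    """Return (pattern, tags); reject a non-phrase pattern without tags."""
    pattern = normalize_pattern(query)
    tags = split_tags(tags_field)
    if not tags and not is_phrase_pattern(pattern):
        raise ValueError("a non-phrase pattern must imply at least one tag")
    return pattern, tags
```

For example, `validate_rule("Jennifer Lopez", "")` yields a phrase of interest with no tags, while `validate_rule("Jennifer and Lopez", "")` is rejected because a boolean pattern needs at least one tag.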
The rule set XML format is:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<tagging-rules xmlns="http://www.openlinksw.com/tags/rule-set">
  <rule-set name="--name of the rule" shared="--0 or 1 depending on whether Public is checked or not" xmlns="http://www.openlinksw.com/tags/rule-set">
    <rule>
      <pattern>text of search pattern.</pattern>
      <is-phrase>0 or 1</is-phrase>
      <tags>tags implied by pattern, separated with commas, the tags element may be absent if no tags are implied and the pattern is a phrase pattern.</tags>
    </rule>
  </rule-set>
  ...
</tagging-rules>
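As an illustration of the format, the following Python sketch reads such a file with the standard library's `xml.etree.ElementTree`. The sample content (rule set name, patterns, tags) and the function name `load_rule_sets` are made up for the example; only the element names, attributes and namespace are taken from the format above.

```python
import xml.etree.ElementTree as ET

NS = {"t": "http://www.openlinksw.com/tags/rule-set"}

# Hypothetical sample file; the second rule has no <tags> element.
SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1" ?>
<tagging-rules xmlns="http://www.openlinksw.com/tags/rule-set">
  <rule-set name="celebrities" shared="1">
    <rule>
      <pattern>Jennifer Lopez</pattern>
      <is-phrase>1</is-phrase>
      <tags>pop, music</tags>
    </rule>
    <rule>
      <pattern>Apple Tiger</pattern>
      <is-phrase>1</is-phrase>
    </rule>
  </rule-set>
</tagging-rules>"""

def load_rule_sets(xml_bytes):
    """Parse a tagging-rules document into a dict keyed by rule set name."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for rule_set in root.findall("t:rule-set", NS):
        rules = []
        for rule in rule_set.findall("t:rule", NS):
            pattern = rule.findtext("t:pattern", default="", namespaces=NS)
            phrase = rule.findtext("t:is-phrase", default="0", namespaces=NS)
            # An absent <tags> element means the rule implies no tags.
            tags_text = rule.findtext("t:tags", default="", namespaces=NS)
            rules.append({
                "pattern": pattern.strip(),
                "is_phrase": phrase.strip() == "1",
                "tags": [t.strip() for t in tags_text.split(",") if t.strip()],
            })
        result[rule_set.get("name")] = {
            "shared": rule_set.get("shared") == "1",
            "rules": rules,
        }
    return result

rule_sets = load_rule_sets(SAMPLE)
```

A parse failure raises `ET.ParseError`, which is one way an import step could detect a syntactically incorrect file before replacing the previous content.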
Different applications will reference tagging capabilities in different contexts but will typically not have entire pages devoted to tag functions.
A blog post will have tags associated to it. This can be because of:
- Literal tag anchors (Technorati-style tag links) in the text of the post.
- The user associating tags through the posting form.
- Automatic tagging by matching the content against tagging rules.
To retag an existing post, one simply opens it for editing and enters tags in the Tags text-area. The tags in the text itself will not be removed but other tags will be removed and reinserted by the same logic as at post submit time.
The post tagging controls form a group with:
- Tags in the text: The literal tags in the body listed, separated by commas.
- User Tags: Editable text field with tags either explicitly given or inferred by rules, excluding any tags in the text.
- Suggested tags - If the post is a new submission, tags suggested by the system based on the user's active tagging rules are shown here. The list is an HTML table with a checked check-box in front of each tag. Only the 5 most plausible tags are shown. If we are editing a post, the suggested tags are not shown by default; instead a button "Suggest Tags" is shown. Pressing this will show the list of suggestions. If the suggestions list is shown, there is additionally a button "Uncheck All".
- Set Tags button.
Pressing this will:
- Delete all tags of the post.
- Add all tags from the text.
- Add all tags from the user tags text field.
- Add all checked tags from the suggested tags, if suggested tags were shown.
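The effect of the Set Tags button can be sketched in Python as below. The function name, the argument shapes and the removal of duplicates are assumptions for illustration; the order of the steps follows the list above.

```python
def set_tags(text_tags, user_tags_field, suggested=None):
    """Compute the new tag list of a post.

    text_tags:       tags found literally in the post body
    user_tags_field: raw content of the User Tags field, comma separated
    suggested:       list of (tag, checked) pairs, or None if the
                     suggestions were not shown
    """
    result = []  # starting from an empty list deletes all previous tags

    def add(tag):
        tag = tag.strip()
        if tag and tag not in result:  # assumption: duplicates are dropped
            result.append(tag)

    for tag in text_tags:
        add(tag)
    for tag in user_tags_field.split(","):
        add(tag)
    if suggested is not None:
        for tag, checked in suggested:
            if checked:
                add(tag)
    return result

new_tags = set_tags(["jazz"], "pop music, jazz",
                    [("music", True), ("news", False)])
```

Here the unchecked suggestion 'news' is ignored and 'jazz', which appears both in the text and in the user field, is stored once.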
Additionally, as an editing accelerator, a list of tags used in posts by the author can be shown. Clicking on an item will add the item into the user tags list if it is not already there. The basic tagging form should allow typing tag names without passing through any define new tag operation. The tag vocabulary is after all dynamic and open ended.
- blog-tags widget
<vm:blog-tags n="top-n-shown" />
This widget is a possible part of a blog template. It renders as a list of the top n tags of the blog, with font attributes reflecting relative importance. See the Technorati list of popular tags for font effects.
All occurrences of tags in the blog are counted, as in:
select top n tag, count (*) from tags where blog = this_blog group by tag order by count (*) desc;
Here, tags is assumed to be a table encoding the blog-post-tag relation.
If the list was truncated, ... is shown as the last entry. Each tag will be a link to a page showing posts having the tag in question on the regular blog view template page.
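The counting, truncation and font scaling of the widget can be sketched in Python. This mirrors the query above on in-memory rows; the function names, the alphabetical tie-break and the linear font scale with its 80 to 200 percent range are assumptions, not part of the widget definition.

```python
from collections import Counter

def top_tags(post_tags, n):
    """Return (shown, truncated) for rows of (post_id, tag).

    shown is a list of (tag, count), most used first;
    truncated is True if more than n distinct tags exist.
    """
    counts = Counter(tag for _post_id, tag in post_tags)
    # Assumption: ties are broken alphabetically to keep the order stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n], len(ranked) > n

def font_size(count, max_count, smallest=80, largest=200):
    """Font size in percent, scaled linearly with the tag count."""
    if max_count <= 0:
        return smallest
    return smallest + (largest - smallest) * count // max_count

rows = [(1, "pop"), (1, "music"), (2, "pop"),
        (3, "pop"), (3, "news"), (4, "music")]
shown, truncated = top_tags(rows, 2)
```

When `truncated` is true, the rendering code would append the "..." entry after the last shown tag.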
When reading a news article, the tags of the article can be shown as links underneath the text. The tags are again a relation of user account, news feed, article and tag, with the primary key consisting of all 4 parts.
Some tags of the article are in the body of the article and are provided by the source and are immutable. These are in the <a href="...technorati.com/tags..."> format. Other tags are added by the user. All tags are in the above mentioned table but only tags set by the user can be removed. Every transaction that sets tags rereads the content to see what tags come from the content.
The post tagging controls group can be identical with the blog one.
When a feed is received and new articles are detected therein, then the data will be tagged according to the tagging rules chosen by the owner user of each FeedManager instance which subscribes to said feed. The tagging coming from the feed itself is stored with the user 'nobody', the tagging implied by the tagging rules of the news reader owners is stored with the user equal to the news reader owner.
When a non-owner user of a news reader searches for tags, this user will find only tags the user itself has defined, plus tags that come from the feed itself.
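The storage and visibility rules for feed tags can be sketched in Python. The row layout follows the (user, feed, article, tag) relation; the function names and the sample users are hypothetical.

```python
FEED_USER = "nobody"  # tags coming from the feed itself are stored under this user

def rows_for_article(feed, article, feed_tags, owner_rule_tags):
    """Build tag rows (user, feed, article, tag) for a newly received article.

    feed_tags:       tags found in the feed content
    owner_rule_tags: dict mapping each subscribing owner to the tags
                     implied by that owner's tagging rules
    """
    rows = [(FEED_USER, feed, article, tag) for tag in feed_tags]
    for owner, tags in owner_rule_tags.items():
        rows.extend((owner, feed, article, tag) for tag in tags)
    return rows

def visible_tags(rows, searching_user):
    """Rows a user can find: the user's own tags plus the feed's tags."""
    return [row for row in rows if row[0] in (FEED_USER, searching_user)]

rows = rows_for_article("f1", "a1", ["politics"],
                        {"alice": ["science"], "bob": ["apple"]})
bob_view = visible_tags(rows, "bob")
carol_view = visible_tags(rows, "carol")
```

In this sketch, carol has defined no tags and so sees only the feed's own tag, while bob additionally sees the tag his rules produced.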
As an option, we may consider letting the owner make their own tags searchable by non-owner users. We may also consider making the owner's tags visible by default to all users with access to this reader instance.
For now, each subscribing instance owner's rules are applied to the incoming articles, and the resulting tags are recorded as that user's tags. The rest of automatic tagging will be specified later.
Tagging allows new types of hype metrics. These may be relevant at the scale of the news aggregation site as well as at the scale of a single set of subscriptions, i.e. a single FeedManager instance. The tags considered for such metrics should primarily be tags occurring in the feeds themselves. In the absence of these, tags inferred by the owner user's tagging rule sets may be used. Showing personalized statistics according to multiple tagging rule sets is prohibitively expensive.
Generally, arbitrary data mining queries on news cannot be supported in a hosted context. There are two reasons: first, the time and space requirements of such queries are unpredictable; second, users do not know how to write efficient queries. Export of data for off-line analysis by the user's own software may be supported.
The news hosting site may offer various canned reports. Certain assumptions have to be made on the user's information needs and preferences. These center on following trends and identifying opportunities for visibility.
The first report is tags in the feed of the last day or last week. The count of tags is shown together with the count of posts with the tag. A third column shows the change in ranking compared with the situation a day / week ago. We may borrow graphics from top 40 hits or such. New entries should be specially marked, as well as items that have gained significantly.
The report may have a cutoff at 40 tags or so, with a next page button for the next 40 tags.
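The ranking report can be sketched in Python. This is an illustration only: it assumes the input for each period is a list with one entry per (post, tag) pair, so that a tag's count equals the number of posts carrying it, and the function names and the tuple layout of a report row are hypothetical.

```python
from collections import Counter

def rank(counts):
    """Map each tag to its 1-based rank, highest count first."""
    ordered = sorted(counts, key=lambda tag: (-counts[tag], tag))
    return {tag: position + 1 for position, tag in enumerate(ordered)}

def trend_report(current_tags, previous_tags, cutoff=40):
    """Rows of (rank, tag, count, change); change is None for new entries."""
    current = Counter(current_tags)
    current_rank = rank(current)
    previous_rank = rank(Counter(previous_tags))
    report = []
    by_rank = sorted(current_rank.items(), key=lambda item: item[1])
    for tag, position in by_rank[:cutoff]:
        if tag in previous_rank:
            change = previous_rank[tag] - position  # positive: moved up
        else:
            change = None  # new entry, to be specially marked
        report.append((position, tag, current[tag], change))
    return report

this_week = ["a", "a", "a", "b", "b", "c"]
last_week = ["b", "b", "a"]
report = trend_report(this_week, last_week)
```

A next page of the report would take the slice after the cutoff in the same ranked order.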
Tags may be shown as links, going to a view of posts with the tag in question, newest first.
Reports across the whole hosting site will have to be prepared on a batch schedule. Reports on individual instances' content may be made on demand.
The page tagging controls from blog are directly applicable to wiki. The tag relation there is: wiki cluster, wiki word, tag. A direct Technorati tag in the body counts as an immutable tag.
The basic viewing page can just show the tags as text links with an edit tags link at the end. The edit tags control group can be identical with the blog one.
All applications produce RSS. The Dublin Core subject field contains the names of tags associated with the item. If explicit Technorati tag anchors are in the body, these are also in the RSS feed. Other tags are not in the body but only in the DC subject element.
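As a hedged illustration, the following Python sketch builds an RSS item carrying tags in Dublin Core subject elements. Emitting one dc:subject element per tag is an assumption (the text above does not say whether tags are repeated elements or one comma-separated value), and the function name `rss_item` is hypothetical.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core elements namespace
ET.register_namespace("dc", DC)

def rss_item(title, body_html, tags):
    """Build an RSS <item> element with one dc:subject per tag."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    # Tag anchors written in the body travel inside the description.
    ET.SubElement(item, "description").text = body_html
    for tag in tags:
        ET.SubElement(item, "{%s}subject" % DC).text = tag
    return item

item = rss_item("A post", "<p>Some text</p>", ["pop", "music"])
xml_bytes = ET.tostring(item)
```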