temboz - Ticket #63

Ticket 63: ability to automatically remove boilerplate

Many feeds include some form of boilerplate, typically "blegs" (see [371]) or various other forms of shilling (e.g. A List Apart's "Hide your shame" T-Shirt ads).

We could use something like Webstemmer to get rid of them automatically.
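
As a rough illustration of the goal, here is a much simpler stand-in than Webstemmer's layout analysis: treat any paragraph that recurs across most of a feed's entries as boilerplate. The function name, the `threshold` parameter and the blank-line paragraph splitting are assumptions made for this sketch; none of them come from Temboz or Webstemmer.

```python
from collections import Counter


def strip_repeated_paragraphs(entries, threshold=0.6):
    """Drop paragraphs that occur in at least `threshold` of the entries.

    Hypothetical helper: `entries` is a list of plain-text entry bodies
    whose paragraphs are separated by blank lines.
    """
    split = [[p.strip() for p in e.split("\n\n") if p.strip()] for e in entries]
    # Count each paragraph once per entry, so repeats inside a single
    # entry do not inflate its score.
    counts = Counter(p for paras in split for p in set(paras))
    # Require at least two occurrences, so a feed with a single entry
    # is never emptied out.
    cutoff = max(2, threshold * len(entries))
    return [
        "\n\n".join(p for p in paras if counts[p] < cutoff)
        for paras in split
    ]
```

Exact-match counting only catches boilerplate that is repeated verbatim; ads whose wording changes from entry to entry would need a fuzzier comparison.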


Remarks:

2008-Sep-23 08:05:12 by majid:
Splunk uses an interesting technique based on punctuation fingerprints to aid with parsing log messages, which may be relevant here.
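
A minimal sketch of the punctuation-fingerprint idea, applied to feed paragraphs instead of log lines: keep only the punctuation, collapse whitespace runs to a single space, and group paragraphs that share the resulting string. This is not Splunk's actual implementation; `punct_fingerprint` and `group_by_fingerprint` are hypothetical names chosen for illustration.

```python
import string
from collections import defaultdict

PUNCT = set(string.punctuation)


def punct_fingerprint(text):
    """Keep punctuation, collapse whitespace runs to one space, drop the rest."""
    out = []
    for ch in text:
        if ch in PUNCT:
            out.append(ch)
        elif ch.isspace():
            # Emit a single space per whitespace run.
            if not out or out[-1] != " ":
                out.append(" ")
    return "".join(out).strip()


def group_by_fingerprint(paragraphs):
    """Map each fingerprint to the paragraphs that produce it."""
    groups = defaultdict(list)
    for p in paragraphs:
        groups[punct_fingerprint(p)].append(p)
    return dict(groups)
```

Two blegs with different wording but the same shape, such as "Like this post? Donate at http://example.com/tip" and "Enjoyed the article? Contribute via http://example.org/give", would land in the same group. Short, ordinary sentences will also collide, so a fingerprint match is at best a hint that a paragraph is boilerplate.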

Properties:

Type:        new      Version:
Status:      new      Created:      2008-Mar-28 08:59
Severity:             Last Change:  2008-Sep-23 08:05
Priority:             Subsystem:
Assigned To: majid    Derived From:
Creator:     majid