Ticket 63: ability to automatically remove boilerplate
Many feeds include some form of boilerplate, typically "blegs" (see [371]) or various other forms of shilling (e.g. A List Apart's "Hide your shame" T-Shirt ads).
We could use something like Webstemmer to get rid of them automatically.
Remarks:
2008-Sep-23 08:05:12 by majid:
Splunk uses an interesting technique based on punctuation fingerprints to help parse log messages, which may be relevant here.
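A rough sketch of how the punctuation-fingerprint idea might carry over to feed entries, in Python. It is only loosely inspired by the remark above and is not Splunk's actual implementation: the function names, the `min_share` threshold, the `min_punct` guard and the assumption that each entry has already been split into paragraphs are all hypothetical. The fingerprint drops letters and digits, keeps punctuation, and turns each whitespace run into `_`; a paragraph whose fingerprint recurs in most entries of a feed is treated as boilerplate.

```python
import re
from collections import Counter


def punct_fingerprint(text, max_len=30):
    """Keep punctuation only; each run of whitespace becomes '_'."""
    out = []
    for ch in re.sub(r"\s+", " ", text.strip()):
        if ch == " ":
            out.append("_")
        elif not ch.isalnum():
            out.append(ch)
    return "".join(out)[:max_len]


def recurring_fingerprints(entries, min_share=0.8, min_punct=3):
    """Fingerprints seen in at least min_share of the entries.

    Fingerprints with fewer than min_punct real punctuation marks are
    ignored, since they are too weak to tell paragraphs apart.
    """
    counts = Counter()
    for paragraphs in entries:
        # Count each fingerprint at most once per entry.
        counts.update({punct_fingerprint(p) for p in paragraphs})
    threshold = min_share * len(entries)
    return {
        fp for fp, n in counts.items()
        if n >= threshold and len(fp.replace("_", "")) >= min_punct
    }


def strip_boilerplate(entries, min_share=0.8):
    """Drop paragraphs whose fingerprint recurs across the feed."""
    common = recurring_fingerprints(entries, min_share)
    return [
        [p for p in paragraphs if punct_fingerprint(p) not in common]
        for paragraphs in entries
    ]


feed = [
    ["Parsing feeds is hard, but fun.",
     "Buy our T-shirt: only $20 (plus shipping)!"],
    ["Why does Atom differ from RSS?",
     "Buy our T-shirt: only $15 (plus shipping)!"],
    ["A short note",
     "Buy our T-shirt: only $18 (plus shipping)!"],
]
cleaned = strip_boilerplate(feed)
```

The fingerprint tolerates small wording changes (here, the price) as long as the punctuation layout stays the same. The cost is that two unrelated paragraphs with the same punctuation layout would be confused, so a real implementation would probably combine it with other signals.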
Properties:
| Type: | new | Version: | |
| Status: | new | Created: | 2008-Mar-28 08:59 |
| Severity: | 3 | Last Change: | 2008-Sep-23 08:05 |
| Priority: | 3 | Subsystem: | |
| Assigned To: | majid | Derived From: | |
| Creator: | majid | | |