Latest 0.0.3
Homepage https://github.com/wcgray/Untagger
License MIT
Platforms ios 8.0
Frameworks Foundation
Authors
Untagger

Untagger is a Swift library for boilerplate removal and full-text extraction from HTML, heavily inspired by Boilerpipe. Like Boilerpipe, Untagger provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.

The algorithms used by the library are based on concepts from the paper "Boilerplate Detection using Shallow Text Features" by Christian Kohlschütter et al., presented at WSDM 2010, the Third ACM International Conference on Web Search and Data Mining, in New York City, NY, USA.
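To give a rough feel for what "shallow text features" means, here is a minimal sketch that classifies text blocks by word count and link density. It is not Untagger's or Boilerpipe's actual implementation: the `TextBlock` type, the `isLikelyContent` function, and both thresholds are hypothetical and chosen only for illustration.

```swift
import Foundation

/// A text block with two shallow features: word count and link density.
/// Hypothetical type for illustration; not part of Untagger's API.
struct TextBlock {
    let text: String
    let linkedWordCount: Int   // number of words that sat inside anchor tags

    var wordCount: Int {
        // Split on spaces and newlines; empty pieces are dropped by default.
        return text.split { $0 == " " || $0 == "\n" }.count
    }

    var linkDensity: Double {
        return wordCount == 0 ? 0 : Double(linkedWordCount) / Double(wordCount)
    }
}

/// Treats long blocks with few links as content.
/// The default thresholds are assumptions, not values taken from the paper.
func isLikelyContent(_ block: TextBlock,
                     minWords: Int = 15,
                     maxLinkDensity: Double = 0.33) -> Bool {
    return block.wordCount >= minWords && block.linkDensity <= maxLinkDensity
}

let blocks = [
    // A navigation bar: short, and every word is a link.
    TextBlock(text: "Home About Contact Login", linkedWordCount: 4),
    // A paragraph of article text: longer, with no links.
    TextBlock(text: "The committee met on Tuesday to review the proposal and agreed to publish its findings before the end of the month.",
              linkedWordCount: 0)
]

// Keep only the blocks classified as content.
let mainText = blocks.filter { isLikelyContent($0) }.map { $0.text }.joined(separator: "\n")
print(mainText)
```

In this sketch the navigation block is discarded and only the article paragraph survives. A real extractor would also use features of neighbouring blocks, as the paper describes.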

Installation

Use CocoaPods:

platform :ios, '8.0'
use_frameworks!
pod 'Untagger'

Or drag the Untagger project into your Xcode project and make Untagger a target dependency.

Usage

Import Untagger:

import Untagger

Then use it:

UntaggerManager.sharedInstance.getText(url: url) { (title, body, source, error) in
    // Handle the failure case first and stop.
    if let error = error {
        print("Error: \(error.message)")
        return
    }

    // title and body are optionals; avoid force-unwrapping them.
    print("Article title: \(title ?? "")")
    print("Article body: \(body ?? "")")
}
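Because the completion handler delivers several optional values, it can help to fold them into a single `Result` before using them. The sketch below assumes Swift 5 or later (for the standard library `Result` type). `ExtractionFailure`, `Article`, and `makeResult` are hypothetical stand-ins written for this example; Untagger's real error and callback types may differ.

```swift
// Hypothetical stand-in for the error passed to the callback.
struct ExtractionFailure: Error {
    let message: String
}

// Hypothetical value type holding the extracted text.
struct Article {
    let title: String
    let body: String
}

/// Folds the optional callback values into one Result.
/// A missing or empty body is treated as a failure; a missing title becomes "".
func makeResult(title: String?,
                body: String?,
                failure: ExtractionFailure?) -> Result<Article, ExtractionFailure> {
    if let failure = failure {
        return .failure(failure)
    }
    guard let body = body, !body.isEmpty else {
        return .failure(ExtractionFailure(message: "No body text was extracted"))
    }
    return .success(Article(title: title ?? "", body: body))
}

// Example with literal values in place of a real callback.
switch makeResult(title: "Hello", body: "Some text", failure: nil) {
case .success(let article):
    print("Article title: \(article.title)")
    print("Article body: \(article.body)")
case .failure(let failure):
    print("Error: \(failure.message)")
}
```

Inside the `getText` closure, the same helper could be called with the callback's own values, after mapping the library's error to whatever failure type the app uses.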

Author

wcgray, [email protected]

License

MIT

Latest podspec

{
    "name": "Untagger",
    "version": "0.0.3",
    "summary": "Removal and full text extraction of HTML in Swift inspired by Boilerpipe",
    "description": "Untagger is a removal and full text extraction of HTML written in Swift heavily inspired by Boilerpipe. It allows you to get the title and the text body of an HTML page.",
    "homepage": "https://github.com/wcgray/Untagger",
    "license": {
        "type": "MIT",
        "file": "LICENSE"
    },
    "authors": {
        "wcgray": "[email protected]"
    },
    "source": {
        "git": "https://github.com/wcgray/Untagger.git",
        "tag": "0.0.3"
    },
    "platforms": {
        "ios": "8.0"
    },
    "swift_version": "4.0",
    "source_files": "Untagger/**/*.{swift,m,h}",
    "public_header_files": "Untagger/**/*.h",
    "module_name": "Untagger",
    "xcconfig": {
        "HEADER_SEARCH_PATHS": "$(SDKROOT)/usr/include/libxml2"
    },
    "libraries": "xml2",
    "frameworks": "Foundation"
}
