
Changes between Version 7 and Version 8 of BluePrint/Importer

-              v7
+              v8
   * http://wiki.github.com/fizx/parsley/
   * http://developer.yahoo.com/yql/guide/
+  * PDFMiner is an open-source tool that converts PDF documents into text [http://www.unixuser.org/~euske/python/pdfminer/index.html#license (Licence)]. Adapting its source code is a good option for building the import tool [http://trac.sahanapy.org/wiki/SpreadsheetImporter Spreadsheet Importer]. by codestasher
+  * Code snippet to extract hyperlinks from HTML docs.
+{{{
+import sgmllib
+import urllib
+
+class MyParser(sgmllib.SGMLParser):
+    """Collects the href value of every <a> tag (Python 2 only)."""
+    def __init__(self, verbose=0):
+        sgmllib.SGMLParser.__init__(self, verbose)
+        self.hyperlinks = []
+    def parse(self, s):
+        # Feed the whole document, then flush any buffered data.
+        self.feed(s)
+        self.close()
+    def start_a(self, attributes):
+        # Called for each opening <a> tag; attributes is a list of (name, value) pairs.
+        for name, value in attributes:
+            if name == "href":
+                self.hyperlinks.append(value)
+    def get_hyperlinks(self):
+        return self.hyperlinks
+
+# Fetch a page and print the links it contains.
+f = urllib.urlopen("http://www.python.org")
+s = f.read()
+f.close()
+myparser = MyParser()
+myparser.parse(s)
+print myparser.get_hyperlinks()
+}}}
+by codestasher
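The snippet above relies on `sgmllib`, which exists only in Python 2. For anyone running the importer on Python 3, the same idea can be sketched with the standard `html.parser` module. This is a minimal sketch, not part of the original page: the names `LinkParser`, `extract_hyperlinks` and `fetch_hyperlinks` are hypothetical, and the UTF-8 decoding in `fetch_hyperlinks` is an assumption about the fetched page.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hyperlinks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hyperlinks.append(value)


def extract_hyperlinks(html):
    """Return the href values found in an HTML string, in document order."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return parser.hyperlinks


def fetch_hyperlinks(url):
    """Download a page and return its links (hypothetical helper)."""
    # Assumption: the page is UTF-8; undecodable bytes are replaced.
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_hyperlinks(html)


# Offline example on an inline document, so no network access is needed.
sample = (
    '<p><a href="http://www.python.org">Python</a> '
    '<a name="top">anchor</a> <a href="/docs">Docs</a></p>'
)
print(extract_hyperlinks(sample))
```

To reproduce the original behaviour against a live page, call `fetch_hyperlinks("http://www.python.org")`. Anchors without an `href` attribute are skipped, as in the `sgmllib` version.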