HTTP L1/L2/L3 Proxy Checker/Leecher [Python]

Wednesday, August 1, 2012

HTTP L1/L2/L3 Proxy Checker



 #!/usr/bin/env python  
 #www.linux-ninjas.com  
 import Queue  
 import threading  
 import urllib2  
 import time  
 input_file = 'proxylist.txt'  
 threads = 10  
 queue = Queue.Queue()  
 output = []  
 class ThreadUrl(threading.Thread):  
   """Threaded Url Grab"""  
   def __init__(self, queue):  
     threading.Thread.__init__(self)  
     self.queue = queue  
   def run(self):  
     while True:  
       #grabs host from queue  
       proxy_info = self.queue.get()  
       try:  
         proxy_handler = urllib2.ProxyHandler({'http':proxy_info})  
         opener = urllib2.build_opener(proxy_handler)  
         opener.addheaders = [('User-agent','Mozilla/5.0')]  
         urllib2.install_opener(opener)  
         req = urllib2.Request("http://www.proxylists.net/proxyjudge.php")  
         sock = urllib2.urlopen(req, timeout=7)  
         rs = sock.read(5000)  
         if '<TITLE>ProxyLists.Net - Proxy judge</TITLE>' in rs:  
             if 'Proxy is high anonymous (or no proxy)' in rs:  
                 output.append(('HighAnon',proxy_info))  
             elif 'Proxy is anonymous' in rs:  
                 output.append(('Anon',proxy_info))  
             elif 'Transparent' in rs:  
                 output.append(('Trans',proxy_info))  
         else:  
             raise Exception("Not Judging")  
       except Exception:  
         output.append(('x',proxy_info))  
       #signals to queue job is done  
       self.queue.task_done()  
 start = time.time()  
 def main():  
   #spawn a pool of threads, and pass them queue instance   
   for i in range(threads):  
     t = ThreadUrl(queue)  
     t.setDaemon(True)  
     t.start()  
   hosts = [host.strip() for host in open(input_file).readlines()]  
   #populate queue with data    
   for host in hosts:  
     queue.put(host)  
   #wait on the queue until everything has been processed     
   queue.join()  
 main()  
 for proxy,host in output:  
   if proxy in ('HighAnon', 'Anon', 'Trans'):  
     print proxy, host  
 print "Elapsed Time: %s" % (time.time() - start)  

This doesn't output results one at a time; I personally don't mind waiting, as I'm never really in a hurry to find a working proxy. You could also add a few lines at the top, and replace everything from the main() call down, to write the results out to one file per L1/L2/L3:

 #-- This section goes at the top --
 high = open('L1HighAnonList.txt', 'w') 
 anon = open('L2AnonList.txt', 'w')
 tran = open('L3TransList.txt', 'w') 
 #-- This section replaces everything from the main() call down --  
 main()  
 for proxy,host in output:  
   if proxy == 'HighAnon':  
     high.write(host + '\n')  
   elif proxy == 'Anon':  
     anon.write(host + '\n')  
   elif proxy == 'Trans':  
     tran.write(host + '\n')  
 high.close()  
 anon.close()  
 tran.close()  
 print "Elapsed Time: %s" % (time.time() - start)  

Proxy Leecher



 #!/usr/bin/env python  
 #www.linux-ninjas.com  
 import re, urllib  
 proxies = open('leechlist.txt', 'w')  
 prox = []  
 urls = ['http://proxy-hunter.blogspot.com.au/2012/07/26-07-12-l1l2l3-http-proxies-1467.html','http://proxy-hunter.blogspot.com.au/2012/07/28-07-12-l1l2l3-http-proxies-676.html','http://proxy-hunter.blogspot.com.au/2012/07/30-07-12-l1l2l3-http-proxies-716.html','http://proxy-hunter.blogspot.com.au/2012/07/31-07-12-l1l2l3-http-proxies-722.html']  
 for url in urls:  
     document = urllib.urlopen(url).read()  
     #match anything that looks like ip:port  
     proxylist = re.findall(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d+", document)  
     prox.extend(proxylist)  
 #dedupe before writing out  
 for item in set(prox):  
     proxies.write(item + '\n')  
 proxies.close()  

The leecher just grabs each URL and regex-matches anything of the form 111.222.333.444:1234 (an ip:port pair).
I would suggest NOT going to each page and copying the URLs one by one; instead, look for an RSS feed. Most proxy sites have one that shows the WHOLE post, which could let you grab practically the WHOLE database of proxies, new and old. I'd only do this once, though.
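
As a minimal sketch of the RSS approach: the same ip:port regex works directly on the raw feed XML, since full-post feeds embed the proxy lists in each entry. The feed URL below is an assumption based on the usual Blogger feed layout, so substitute whatever feed the site actually publishes:

 #!/usr/bin/env python  
 #rss leech sketch - the feed url below is hypothetical  
 import re, urllib  
 feed_url = 'http://proxy-hunter.blogspot.com.au/feeds/posts/default?max-results=500'  
 xml = urllib.urlopen(feed_url).read()  
 #same ip:port pattern as the page leecher, deduped with a set  
 found = set(re.findall(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d+", xml))  
 out = open('rssleech.txt', 'w')  
 for p in found:  
     out.write(p + '\n')  
 out.close()  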

Enjoy your proxies :) (Also, the leecher will grab SOCKS proxies too; it won't differentiate, which may be why some of the proxies fail in the Checker, as it only checks HTTP. Not SOCKS and no SSL.)

Software vs Hardware Raid On Linux

Friday, July 6, 2012

The "software or hardware RAID?" question comes up often. I think I answer the question at least once every six months or so. And the responses to such a question are often filled with, in my opinion, inordinate praise of expensive hardware RAID solutions, and inordinate scorn for software RAID. This page is my attempt to explain a Linux kernel engineer's point of view on the matter.
NOTE: This comparison excludes SAN and other external RAID solutions. Externally attached storage is outside the scope of this discussion. Externally connected solutions can obviously be SAN, software RAID, hardware RAID, or a combination thereof.

Why prefer Linux software RAID?


  • Potential for increased hardware and software biodiversity
  • Kernel engineers have much greater ability to diagnose and fix problems, as opposed to a closed source firmware. This has often been a problem in the past, with hardware RAID.
  • Disk format is public
  • ...thus, no vendor lock-in: Your data is not stored in a vendor-proprietary format.
  • A controller-independent, vendor-neutral layout means disks can be easily moved between controllers. Sometimes a complete backup+restore is required even when moving between hardware RAID models from the same vendor.
  • Eliminates single-points-of-failure (SPOF) compared to similar configurations of hardware RAID.
  • RAID5 XOR runs on host CPU, which practically guarantees that it is far faster than most hardware RAID microcontrollers.
  • RAID5 XOR speed increases as host CPU speeds increase.
  • RAID speed increases as host CPU count (multi-thread, multi-core) increases, following current market trends.
  • Cost. A CPU and memory upgrade is often cheaper and more effective than buying an expensive RAID card.
  • Level of abstraction. Linux software RAID can distribute data across ATA, SCSI, iSCSI, SAN, network or any other block device. It is block device agnostic. Hardware RAID most likely cannot even span beyond a single card (see the mdadm sketch after this list).
  • Hardware RAID has a field history of bad firmwares corrupting data, locking up, and otherwise behaving poorly under load. (certainly this is highly dependent on card model and firmware version)
  • Hardware RAID firmwares have a very limited support lifetime. You cannot get firmware updates for older hardware. Sometimes the vendor even ceases to exist.
  • Each hardware RAID has a different management interface, and level of feature support.
  • Your hardware RAID featureset is largely locked in stone, at purchase time. With software RAID, the featureset grows with time, as new features are added to Linux... no hardware upgrade required.
  • Additional RAID mode support. Most hardware controllers don't support RAID-6 as Linux software RAID does, and Linux will soon be adding RAID-5E and RAID-6E support.
  • Error handling and logging varies from vendor to vendor (and card to card), with hardware RAID.
  • Many ATA-based hardware RAID solutions either (a) fail to manage disk lifetimes via SMART, or (b) manage SMART diagnostics in a non-standard way.
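
To make the vendor-neutral layout point concrete, here is a minimal mdadm sketch (the device names are placeholders; adjust for your system). Because the array metadata lives on the member disks themselves, the same disks can be reassembled on any machine or controller:

 #create a 3-disk RAID5 array out of arbitrary block devices  
 mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sda1 /dev/sdb1 /dev/sdc1  
 #on another machine or controller, scan the disks and reassemble  
 mdadm --assemble --scan  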

Why prefer Linux hardware RAID?

  • All the efficiencies that may be derived from reducing the number of copies of each WRITE (system -> controller, when compared to the software RAID case).
  • Software RAID may saturate PCI bus bandwidth long before a hardware RAID card does (this presumes multiple devices on a single PCI bus).
  • RAID5 XOR calculations are not performed on the host CPU, freeing the host CPU for other tasks and preventing host CPU saturation under load or DoS.
  • It is easy to parallelize RAID operations, simply by purchasing more HW RAID cards, without affecting host CPU usage.
  • Battery backup on high end cards allows faster journalled rebuilds.
  • Battery-backed write-back cache may improve write throughput.

As an aside...

  • Some cards export hardware RAID capabilities, such as XOR or RAID1 offload, but allow full OS control of the operations in lieu of a firmware. Linux does not support this programming model well.

FAQ

Responses to specific points often made in soft-vs-hardware RAID discussions.
  • "Hardware RAID is always better."
    No solution is always better than another solution.
  • "Software RAID is always better."
    Ditto.

Sprunge.us CLI alternative to Pastebin

Sunday, June 10, 2012

Everyone has those days where you just need to upload a text document but you're stuck with only an internet connection and a terminal. And sometimes it feels like it would be easier just to load up a GUI web interface to copy a text document to send to friends, or between computers.

After attempting to write a pastebin CLI program where I could simply send a document and have it uploaded, I got fed up: they filter long pastes because they don't want their site 'abused'. Then I stumbled upon my new favourite site.


Sprunge.us is a free website, named after the Sprunger seen in Futurama, that 'sprunges information': text uploading made as simple as it should be.

 sprunge(1)             SPRUNGE             sprunge(1)  
 NAME  
   sprunge: command line pastebin:  
 SYNOPSIS  
   <command> | curl -F 'sprunge=<-' http://sprunge.us  
 DESCRIPTION  
   add ?<lang> to resulting url for line numbers and syntax highlighting  
 EXAMPLES  
   ~$ cat bin/ching | curl -F 'sprunge=<-' http://sprunge.us  
     http://sprunge.us/VZiY  
   ~$ firefox http://sprunge.us/VZiY?py#n-7  
 SEE ALSO  
   http://github.com/rupa/sprunge  

To install the Sprunger, simply add this alias to your shell (put it in ~/.bashrc to make it permanent):
 alias sprunge="curl -F 'sprunge=<-' http://sprunge.us" 

Usage:
 cat myfile | sprunge  

Then you'll get your http link back, e.g. http://sprunge.us/KPIJ
Try it out yourself today and <3 the Sprunger!
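
Since each paste is served back as plain text at its URL, you can pull it down on another machine with curl as well:
 curl http://sprunge.us/KPIJ  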

Imgur Album Downloader

Friday, June 1, 2012

So instead of going out to buy new computer parts, I decided I'd spend 5 minutes coding something that I needed. Since I don't run Firefox on my Linux computer, I don't have access to DownThemAll, so whenever I'd view an album of beautiful landscapes, I'd have to bookmark it and remember to come back later to download the pictures. Not anymore! My first imgur Album Downloader!

Code:
 #!/bin/bash  
 #----------  
 #Imgur Album Downloader v0.1 \ Linux-Ninjas.com
 #----------  
 function albumtemp()  
 {  
 if [ -e .albumtemp ]; then  
     rm .albumtemp  
 fi  
 }  
 #Check for url + albumtemp  
 if [ $# -eq 0 ]; then  
     echo -n "Url:"  
     read url  
     albumtemp  
 else  
     url="$1"  
     albumtemp  
 fi  
 #echo "url: $url"  
 if [ "$url" == "" ]; then  
     echo "You need to enter a url!"  
 else  
     curl "$url" -o .albumtemp  
     cat .albumtemp | grep _blank | awk '{print $2}' | cut -d \" -f 2 | xargs wget  
     albumtemp  
 fi  

Usage:
 imguralbum url

It will then download all the pictures in the album to the current directory.
Install the script into ~/bin and then add ~/bin to your $PATH variable and you are good to download all the images! :)
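
For example (assuming you saved the script as ~/bin/imguralbum; put the export line in your ~/.bashrc to make it permanent):
 chmod +x ~/bin/imguralbum  
 export PATH="$HOME/bin:$PATH"  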

Rsync Crash Course

Sunday, April 29, 2012

Today a friend asked me to give him a crash course in rsync, a great program that I use a lot to copy files from my servers to home, and vice versa. Have you ever tried copying a file over a network, only to have your terminal hang for a while until you realise that your wireless has disconnected, your cat has pulled out the phone line, or the little man inside your computer decided to have a field day and took a toilet break? Let rsync save you from knowing that pain again.

Background:
rsync is another alternative to cp/scp, except it has a lot more features, such as the ability to resume a transfer, copy from computer to computer, and transfer over ssh with a beautiful progress bar.



General Usage:
 rsync <source> <destination>
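
For example, a typical everyday invocation (the hostnames and paths here are placeholders): -a preserves permissions and timestamps, -v is verbose, -z compresses data over the wire, and -P shows the progress bar and keeps partial files so an interrupted transfer can resume:

 #copy a local directory to a server over ssh (trailing slash copies contents)  
 rsync -avzP ~/myfiles/ user@server:/backup/myfiles/  
 #pull it back down; re-running after an interruption picks up where it left off  
 rsync -avzP user@server:/backup/myfiles/ ~/myfiles/  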