Monday, December 31, 2007

Top 10 Most Popular Torrent Sites of 2007 + Top 5 Newcomers

Top 10 Most Popular Torrent Sites:


  1. Mininova
  Without a doubt the most visited BitTorrent site. In November, Mininova reached a milestone by entering the list of the 50 most visited websites on the Internet.

  2. IsoHunt
  IsoHunt continued to grow this year. In September they were forced to close their trackers to US traffic because of the issues they have with the MPAA, but this had no effect on the visitor count.

  3. The Pirate Bay
  The Pirate Bay has been in the news quite a bit this year and remains not only the most used BitTorrent tracker, but also one of the most visited BitTorrent sites. At the moment they are fighting with IsoHunt for second place in this list.

  4. Torrentz
  Torrentz is the only “torrent site” in the top 10 that doesn’t host .torrent files. Several improvements and new features have been introduced over the past year, such as a comment system, private bookmarks and a cleaner layout.

  5. BTjunkie
  BTjunkie was one of the fastest risers last year and continued to grow throughout 2007. Last month they were, like many others, forced to leave their ISP (LeaseWeb), but the transition to a new host went smoothly and didn’t result in any downtime.

  6. TorrentSpy
  TorrentSpy was the most popular BitTorrent site of 2006, but dropped to sixth place due to legal issues with the MPAA. To ensure the privacy of their users, TorrentSpy decided that it was best to block access to all users from the US, causing their traffic to plunge.

  7. TorrentPortal
  Not much news about TorrentPortal this year, but that probably is a good thing. Like most other sites they have grown quite a bit in 2007.

  8. GamesTorrents
  It’s quite a surprise to see GamesTorrents in the list of the 10 most popular BitTorrent sites of 2007. This Spanish BitTorrent site had a huge dip in traffic earlier this year but still managed to secure eighth place.

  9. TorrentReactor
  TorrentReactor.net has been around for quite a while, four years to be exact, and is still growing.

  10. BTmon
  BTmon was one of the newcomers of 2006, and it is the youngest BitTorrent site in the top 10 this year.


Top 5 Newcomers:

  1. SumoTorrent
  SumoTorrent launched this April and quickly became one of the more popular BitTorrent sites.

  2. SeedPeer
  SeedPeer launched in September and was formerly known as Meganova.

  3. Zoozle
  A BitTorrent meta-search engine, launched in January.

  4. Extratorrent
  Launched a year ago, it got a serious traffic boost earlier this year.

  5. BitTorrent.am
  BitTorrent.am is also indexed by Torrentz.com, and was launched in early 2007.

Friday, December 07, 2007

Google's storage strategy

Not a SAN in sight

With 6 billion web pages to index and millions of Google searches run daily, you would think, wouldn't you, that Google has an almighty impressive storage setup. It does, but not in the way you might think. The world's largest search company does use networked storage, but in the form of networked clusters of Linux servers: cheap, rack-'em-high, buy-'em-cheap x86 boxes with one or two internal drives.

A cluster will consist of several hundred, even thousands, of machines, each with its own internal disk. At the last public count, in April 2003, there were 15,000-plus such machines with 80GB drives. As an exercise, let's assume 16,000 machines with 1.5 disk drives, or 120GB, per machine. That totals 1,920,000 GB, roughly 1.9 petabytes. In fact Google probably has between two and five petabytes altogether, if we add in duplicated systems, test systems, news systems, Froogle systems and so forth. Why does Google use such a massively distributed system?
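The back-of-the-envelope sum, written out as a quick sketch (the machine count, drive count and drive size are the article's assumptions, not published Google figures):

    # Rough capacity estimate using the article's assumed numbers.
    machines = 16_000
    drives_per_machine = 1.5
    drive_size_gb = 80

    total_gb = machines * drives_per_machine * drive_size_gb
    print(f"{total_gb:,.0f} GB, roughly {total_gb / 1_000_000:.2f} PB")
    # -> 1,920,000 GB, roughly 1.92 PB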

It's the application

Crudely speaking, Google's storage has to do two production jobs. First it has to assimilate the results of the web crawlers which discover and index new pages. In file system terms the bulk of this activity is appending data to existing files rather than overwriting them.

The second task is to respond to the millions of online search requests, query the stored data, and come up with results. These searches can be extensively parallelised.

Google has its own GFS, the Google File System, which is described in a published paper. It has implemented this on several very large clusters of Linux machines spread across the globe in data centres.

Google's application is unique and not comparable to a general enterprise application which typically involves file data being overwritten and a much lower degree of parallelism. Google also requires that its services be up and running 7 x 24, every day of the year, no matter what. Single or even double points of failure, or network bottlenecks are simply not acceptable - ever.

Overall system configuration

Google has devised its own cluster architecture, which has evolved from the first Google system set up at Stanford in 1998 (so recent!) by the founders, Sergey Brin and Larry Page.

The nature of a Google query, such as a search for 'EMC', requires the scanning of hundreds of megabytes of data and billions of CPU cycles. But each web page that might contain the term 'EMC' can be read independently of the others, so the work is inherently parallel. Brin and Page reasoned it was better to have many cheap Linux machines do the search in parallel than to run it on an SMP Unix server. The Unix server would cost 5-10 times as much and represent a single point of failure.

Run the search in clustered Linux PC servers (cheap, very cheap), each with their own internal disk rather than a networked storage device (expensive; network link is a bottleneck) and combine the results. Even better, store the index data for the web pages separately from the web pages themselves. Run the search across the web page index, then aggregate the positive hits and search the web pages to extract the little snippets of text surrounding the search term. Aggregate these and serve them to the user.
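A minimal sketch of that scatter-gather idea, assuming toy in-memory index shards and a toy document store (the shard layout, names and data here are illustrative, not Google's actual design):

    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical index shards: term -> list of document ids.
    INDEX_SHARDS = [
        {"emc": [101, 102]},
        {"emc": [205]},
        {"storage": [301]},
    ]

    # Hypothetical document store holding the cached page text.
    DOC_STORE = {
        101: "... EMC sells storage arrays ...",
        102: "... a review of EMC hardware ...",
        205: "... EMC quarterly results ...",
        301: "... a SAN is networked storage ...",
    }

    def search_shard(shard, term):
        # Each shard can be scanned independently, so shards run in parallel.
        return shard.get(term, [])

    def snippet(doc_id, term, width=20):
        # Pull the little piece of text surrounding the hit.
        text = DOC_STORE[doc_id]
        i = text.lower().find(term)
        return text[max(0, i - width): i + len(term) + width]

    def search(term):
        # Scatter: query every index shard in parallel.
        with ThreadPoolExecutor() as pool:
            hits_per_shard = list(pool.map(lambda s: search_shard(s, term), INDEX_SHARDS))
        # Gather: merge the hit lists, then fetch snippets and serve them.
        doc_ids = [d for hits in hits_per_shard for d in hits]
        return [(d, snippet(d, term)) for d in doc_ids]

    print(search("emc"))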

Linux was chosen because it was inexpensive and more reliable than either Windows NT or any proprietary Unix version.

There is no concept of state as there would be with a commercial web transaction. Each search request is atomic, can be dealt with and forgotten.

In scaling terms this is a classic scale out or horizontal scaling scenario and not a scale up, as in adding CPUs to a server, requirement.

The index is separated into what Google calls shards and these are stored on separate index servers.

The hard drives

Given this, why not have a large disk server used by the clustered Linux machines? It's cost and reliability that drive the choice. A disk server is expensive and, as a single box, is vulnerable. Putting the hard drives in the PC servers means that the data is stored across hundreds if not thousands of drives. Google replicates data three times for redundancy, so it can afford to be cavalier about hardware failures. A drive fails? Log it, switch queries on that data to a replica and move on. It's all pretty instant.
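A toy illustration of that "log it and move on" failover, assuming a hypothetical three-way replica map and a stand-in RPC (the real placement and health-checking machinery is far more involved):

    import random

    # Hypothetical map of a data chunk to the three machines holding a replica.
    REPLICAS = {"chunk-0042": ["node-17", "node-233", "node-901"]}

    DEAD = {"node-17"}      # nodes that will fail when contacted (for the demo)
    FAILED_NODES = set()    # nodes this client has already seen fail

    def fetch_from_node(node, chunk_id):
        # Stand-in for the real RPC that reads a chunk off a node's local disk.
        if node in DEAD:
            raise ConnectionError(node)
        return f"<data for {chunk_id} from {node}>"

    def read_chunk(chunk_id):
        # Try replicas in a random order, skipping nodes already marked failed.
        candidates = [n for n in REPLICAS[chunk_id] if n not in FAILED_NODES]
        random.shuffle(candidates)
        for node in candidates:
            try:
                return fetch_from_node(node, chunk_id)
            except ConnectionError:
                FAILED_NODES.add(node)   # log it, switch to a replica, move on
        raise RuntimeError(f"all replicas of {chunk_id} are unreachable")

    print(read_chunk("chunk-0042"))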

There isn't even RAID protection. In a way the Google cluster architecture is similar to the RAIN storage idea, a redundant array of inexpensive nodes. (Techworld mentioned RAIN here. Exagrid is a supplier with RAIN storage product ideas which Techworld discussed recently here.)

The drives are IDE drives and not SCSI, which would be more expensive. Google spends far more time streaming data off a drive than waiting for the drive to find it, so latency is not that great an issue and lightning-fast 15,000rpm SCSI drives are not a requirement. In 2001, 5,400rpm 80GB Maxtor IDE drives were mentioned as being used by Google.

Google's architecture is home-grown. Its PC servers are supplied by two specialist server builders. There is no great case study material here for Sun or IBM or HP, none whatsoever. The only well-known supplier is Red Hat for Linux, and much of its distribution is discarded as not needed.

Google gets its system reliability from software and hardware duplication. It uses commodity PCs to build a high-end computing cluster.

File System

The Google file system basics are that each GFS cluster has a single GFS master node and many chunk servers. These are accessed by many, many clients. Files are divided into fixed-size chunks of 64MB. The master maintains all file system metadata. The chunk servers store chunks on their local disks as Linux files. They need not cache file data because the local systems' Linux buffer cache keeps frequently accessed data in RAM.
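A rough sketch of how a client turns a file read into a chunk lookup under those basics; the fixed 64MB chunk size comes from the GFS paper, while the paths, handles and server names below are made up:

    CHUNK_SIZE = 64 * 1024 * 1024   # GFS uses fixed-size 64MB chunks

    # Hypothetical master metadata: (file, chunk index) -> (chunk handle, chunk servers).
    MASTER_METADATA = {
        ("/crawl/pages-0001", 0): ("0xA1", ["cs-04", "cs-11", "cs-27"]),
        ("/crawl/pages-0001", 1): ("0xA2", ["cs-02", "cs-11", "cs-30"]),
    }

    def locate(path, offset):
        # Client side: turn a byte offset into a chunk index, then ask the master
        # which chunk servers hold that chunk. File data never flows through the
        # master; the client reads it straight from one of the chunk servers.
        chunk_index = offset // CHUNK_SIZE
        handle, servers = MASTER_METADATA[(path, chunk_index)]
        return handle, servers

    print(locate("/crawl/pages-0001", 70 * 1024 * 1024))   # lands in chunk 1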

To understand more about this, read the GFS paper referenced above. The assumptions behind the file system include one that component failures are the norm rather than the exception, so system component health is watched rigorously and constantly, and automatic recovery is integral to Google's operations.
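That failure-handling assumption is the kind of thing a heartbeat sweep makes concrete. A tiny illustrative sketch, with the timeout, names and bookkeeping invented here rather than taken from the paper:

    import time

    HEARTBEAT_TIMEOUT = 60   # seconds of silence before a chunk server is presumed dead (assumed value)
    TARGET_REPLICAS = 3      # Google replicates data three times

    # Hypothetical bookkeeping the master might hold.
    last_heartbeat = {"cs-04": time.time(), "cs-11": time.time() - 120, "cs-27": time.time()}
    chunk_locations = {"0xA1": {"cs-04", "cs-11", "cs-27"}}

    def sweep():
        # Mark silent chunk servers dead and flag any chunk that has fallen
        # below the target replica count for re-replication.
        now = time.time()
        dead = {n for n, t in last_heartbeat.items() if now - t > HEARTBEAT_TIMEOUT}
        for handle, nodes in chunk_locations.items():
            nodes -= dead
            if len(nodes) < TARGET_REPLICAS:
                print(f"re-replicate chunk {handle}: only {len(nodes)} live copies")

    sweep()   # prints: re-replicate chunk 0xA1: only 2 live copies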

Growth

Google has been growing at a phenomenal rate. In June 2000 it had three data centres and 4,000 Linux servers; six months earlier it had 2,000. By April 2001 it had 8,000 servers and was moving to four data centres from its then total of five. At that point it had 1 petabyte of storage. The number of servers had passed 15,000 by April 2003, and is probably well past that now.

By the end of this year Google could have around 18,000 servers and more than 5PB of storage. It is a fascinating exercise in commodity computing economics, performance and reliability but, unless your applications are inherently parallel, not a general role model, alas.

Source: http://www.techworld.com/features/index.cfm?featureID=467&printerfriendly=1

Saturday, December 01, 2007

The Fifth Continent

Sydney, Nov 24-30, 2007

Camera: Olympus E-500 | Lens: 40-150mm F3.5-4.5 | Shutter speed: 1/640 sec | Aperture value: F4.5 | Focal length: 150mm | ISO sensitivity: 100

Camera: Olympus E-500 | Lens: 40-150mm F3.5-4.5 | Shutter speed: 1/1250 sec | Aperture value: F4.1 | Focal length: 100mm | ISO sensitivity: 100

Camera: Olympus E-500 | Lens: 14-45mm F3.5-5.6 | Shutter speed: 1/160 sec | Aperture value: F5.6 | Focal length: 45mm | ISO sensitivity: 100

The Three Sisters are a famous rock formation in the Blue Mountains of New South Wales, Australia. They are close to the town of Katoomba and are one of the Blue Mountains' most famous sights, towering above the Jamison Valley. Their names are Meehni (922 m), Wimlah (918 m), and Gunnedoo (906 m). Camera: Olympus E-500 | Lens: 14-45mm F3.5-5.6 | Shutter speed: 1/160 sec | Aperture value: F7.1 | Focal length: 45mm | ISO sensitivity: 100

The first sister - Meehni. Camera: Olympus E-500 | Lens: 14-45mm F3.5-5.6 | Shutter speed: 1/100 sec | Aperture value: F5.6 | Focal length: 14mm | ISO sensitivity: 100

Camera: Olympus E-500 | Lens: 14-45mm F3.5-5.6 | Shutter speed: 1/80 sec | Aperture value: F5.6 | Focal length: 45mm | ISO sensitivity: 400

A peep at the Sydney skyline at night from the Hilton Sydney. Camera: Olympus E-500 | Lens: 14-45mm F3.5-5.6 | Shutter speed: 60 sec | Aperture value: F16 | Focal length: 14mm | ISO sensitivity: 100

The kangaroo is an Australian icon. This kangaroo looked at me with lovely eyes. Unbelievably, the distance between human and animal can be that close, yet the distance between human hearts often remains far out of reach. Camera: Olympus E-500 | Lens: 40-150mm F3.5-4.5 | Shutter speed: 1/80 sec | Aperture value: F4.5 | Focal length: 150mm | ISO sensitivity: 400

Koalas have a slow metabolism and rest motionless for about 16 to 18 hours a day, sleeping most of that time, but they spend about three of their five active hours eating. I had the luck to see this koala eating eucalyptus leaves. Camera: Olympus E-500 | Lens: 40-150mm F3.5-4.5 | Shutter speed: 1/160 sec | Aperture value: F6.3 | Focal length: 150mm | ISO sensitivity: 100

The Sydney Opera House is located in Sydney, New South Wales, Australia. Camera: Olympus E-500 | Lens: 40-150mm F3.5-4.5 | Shutter speed: 1/200 sec | Aperture value: F5.6 | Focal length: 40mm | ISO sensitivity: 100

The Sydney Opera House was made a UNESCO World Heritage Site on June 28, 2007. Camera: Olympus E-500 | Lens: 40-150mm F3.5-4.5 | Shutter speed: 1/320 sec | Aperture value: F5.6 | Focal length: 118mm | ISO sensitivity: 100

The Sydney Opera House is one of the world's most distinctive 20th century buildings, and one of the most famous performing arts venues in the world. Camera: Olympus E-500 | Lens: 40-150mm F3.5-4.5 | Shutter speed: 1/320 sec | Aperture value: F5.6 | Focal length: 113mm | ISO sensitivity: 100

The Sydney Opera House was among the 20 selected finalists in the 2007 New Seven Wonders of the World project. Camera: Olympus E-500 | Lens: 40-150mm F3.5-4.5 | Shutter speed: 1/200 sec | Aperture value: F6.3 | Focal length: 40mm | ISO sensitivity: 100

The Sydney Harbour Bridge is a steel arch bridge across Sydney Harbour. Camera: Olympus E-500 | Lens: 40-150mm F3.5-4.5 | Shutter speed: 1/160 sec | Aperture value: F5.6 | Focal length: 40mm | ISO sensitivity: 100