Jul 11, 2011

my package is published at phpclasses.org

Here is the link:

This is a bunch of classes in a namespace that let one use the filter_input() function conveniently, without remembering all those FILTER_* constants.
It provides an API for several frequently used filters, with convenient autocompletion in IDEs.
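As an illustration of the idea, here is a hypothetical wrapper in the same spirit (not necessarily the published package's API; filter_var() is used instead of filter_input() so the snippet runs standalone):

```php
<?php
// Hypothetical wrapper: method names hide the FILTER_* constants, and an
// IDE can autocomplete them. The real package wraps filter_input().
class InputFilter
{
    // Validate an integer; returns the int, or null on failure.
    public static function int($value)
    {
        $result = filter_var($value, FILTER_VALIDATE_INT);
        return $result === false ? null : $result;
    }

    // Validate an e-mail address; returns the string, or null on failure.
    public static function email($value)
    {
        $result = filter_var($value, FILTER_VALIDATE_EMAIL);
        return $result === false ? null : $result;
    }
}
```

With this, a call like InputFilter::int($_GET['id']) reads better than the raw filter_input() invocation with its constants.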

May 30, 2011

security issue with regular expressions

Ok, Yii fixed its security issue with regular expressions in validators that I was worried about.

It turns out that serious PHP applications use regular expressions as a tool for checking user input without paying attention to the documented limitations.

Everyone talks about SQL injections, but when it comes to regular expressions, you have to explain the risks even to the authors of a framework.
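One classic documented limitation of this kind in PHP (not necessarily the exact Yii bug, but it illustrates the class of problem): without the D modifier, $ also matches just before a trailing newline, so a value with a smuggled line break passes a naive "digits only" check:

```php
<?php
// A value with a trailing newline passes a naive validator:
var_dump(preg_match('/^\d+$/', "123\n"));   // int(1) -- accepted!

// Proper anchoring rejects it:
var_dump(preg_match('/^\d+$/D', "123\n"));  // int(0) -- D pins $ to the very end
var_dump(preg_match('/\A\d+\z/', "123\n")); // int(0) -- \A and \z are strict anchors
```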

In fact, I like how Qiang closes bugs in Yii within minutes; the most important thing is to make him notice them :)

May 10, 2011

XenForo integration

Hi, here I will share some ideas on how to integrate XenForo with a custom PHP web site.

A few words about XenForo's internal structure.
It has its own MVC framework that uses selected modules from Zend Framework.
It does not follow any coding standards: you can find SQL in controllers and lots of undocumented functions. It is a big heap of garbage with lots of hidden pitfalls.

It uses the vBulletin approach for templates and CSS. The templates use an XML-based markup and reside in the database. When XenForo renders a page, it fetches the templates from the database, compiles them into native PHP code, and executes them with eval().
This makes it hard to debug and to find which template contains the needed part of a page, but it allows editing the templates from the forum admin area.
The same goes for CSS: when a page loads, XenForo fetches all the styles for it from the database, merges them, and sends them to the browser in a single request.

In addition to templates, XenForo has helpers, an idea borrowed from ZF. Helpers are PHP scripts with HTML that render parts of the pages, but they reside in files, not in the database.

The first thing that helps to learn more about database calls and included files is to add the line "$config['debug'] = 1;" to library/config.php and request the page of interest with the GET parameter ?_debug=1

Let's assume the forum is installed in the "forum" folder of our application.
The XenForo framework is initialized with a call like this:

XenForo_Application::initialize('forum/library', __DIR__.'/forum');

You can see this call in the entry script of the forum (forum/index.php)
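In full, the bootstrap that forum/index.php performs before routing looks roughly like this; the paths are assumptions matching the layout above, so check your own forum/index.php for the exact lines:

```php
<?php
// Bootstrap XenForo from a custom script in the application root (sketch).
require 'forum/library/XenForo/Autoloader.php';
XenForo_Autoloader::getInstance()->setupAutoloader('forum/library');

XenForo_Application::initialize('forum/library', __DIR__ . '/forum');
XenForo_Application::set('page_start_time', microtime(true));
```

After this, XenForo classes can be instantiated from your own code.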

Our first task is to integrate sessions - to make the forum recognize the visitor.
In XenForo the sessions are stored in the database, in table xf_session. It does not use internal PHP sessions at all.
The forum operates with sessions through the XenForo_Session class (library/XenForo/Session.php)
The easiest way to activate and update a session is to call:

$session = new XenForo_Session();
$session->save();

The save() call is required to update the timestamp of the last user activity, because the session is checked against this delay before being opened (in the WHERE clause of the query).

To set the user ID of the session you can call $session->changeUserId(1);
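Putting the pieces together, a minimal sketch (this assumes XenForo_Application::initialize() has already been called; method names are taken from library/XenForo/Session.php, so verify them against your XenForo version):

```php
<?php
// Open (or create) the visitor's session; XenForo matches it by cookie.
$session = new XenForo_Session();
$session->start();

// Log the visitor in as user #1 and persist the activity timestamp.
$session->changeUserId(1);
$session->save();   // updates the last-activity mark checked in the WHERE clause
```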

(to be continued)

Sep 28, 2010

mass-mailing system development

From November 2009 till August 2010 I was developing and launching an emailing system.
The goal of the project was to create a system that sends a large volume of email letters from a large number of servers simultaneously. The system is designed to be used by multiple users through a web interface, achieving a high delivery rate.
For example, it can be used by web site owners who want to send newsletters to a large list of users without investing in their own infrastructure.

For the user, the process of sending emails consists of registering servers in the system, adding domains, creating campaigns, and monitoring the progress.

Main features:

  • Distributing a campaign among multiple working servers with multiple IPs.
  • Automatic server setup and deployment.
  • Independence from server location; minimal requirements for hardware and connection.
  • Almost linear scalability of the system.
  • Precise tuning of the delivery process, like limiting the delivery rate per recipient domain, which helps avoid being blocked by Yahoo/Gmail/etc. for flooding.
  • Working with large lists of recipients (millions of records); multiple white/black lists can be used in a campaign.
  • HTTP API for adding recipients to the lists, for integration with other applications.
  • RAR- and ZIP-compressed lists are accepted for convenience.
  • Multiple domains used in a campaign.
  • Link masking with multiple user domains.
  • Automatic domain management; sub-domain and MX record creation per IP.
  • Click accounting.
  • Bounce and unsubscribe tracking: special black lists are created and taken into account in future campaigns.
  • Web interface for users, aggregate stats and a delivery speed graph for campaigns, an admin area for user and system management.
  • Tags in templates, like “Hello [UserName]“
  • Plain text, HTML, or multipart (HTML + plain text) letters.
  • Encoding selection: 7bit, quoted-printable, or base64.
  • Selecting IPs and domains for a campaign.
  • Detecting offline servers.
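As an illustration of the template tags feature (this is a sketch, not the system's actual code), substitution of “Hello [UserName]” style tags fits in a few lines:

```php
<?php
// Replace [TagName] placeholders with per-recipient values; unknown tags
// are left as-is, so a typo stays visible instead of producing a blank.
function renderTemplate($template, array $vars)
{
    return preg_replace_callback('/\[(\w+)\]/', function ($m) use ($vars) {
        return isset($vars[$m[1]]) ? $vars[$m[1]] : $m[0];
    }, $template);
}

echo renderTemplate('Hello [UserName]', array('UserName' => 'Ann'));
```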

There are several interesting aspects in the project implementation.
Internally, the system is organized as a set of web services. Each working server runs an MTA (mail transport agent) to send the letters and a lightweight web server for the web service. The central management server communicates with the workers over HTTP (instead of SMTP) to push letters, gather delivery status and subscription letters, and manage remote configuration.
When letters are pushed to the working servers, only the template and the parameters (addresses, names, links, etc.) are sent; the individual letters are generated on the worker servers. The data sent to the working servers is encrypted.
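The push step can be sketched as follows; the cipher, key handling, and payload shape here are assumptions for illustration (the post does not specify them), using PHP's OpenSSL extension:

```php
<?php
// Encrypt a task (template + parameters) for a worker, assuming AES-256-CBC
// and a pre-shared key per worker. The real system's details may differ.
function encryptPayload(array $payload, $key)
{
    $iv = random_bytes(16);                       // fresh IV per message
    $json = json_encode($payload);
    $cipher = openssl_encrypt($json, 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);
    return base64_encode($iv . $cipher);          // IV travels with the message
}

function decryptPayload($message, $key)
{
    $raw = base64_decode($message);
    $iv = substr($raw, 0, 16);
    $json = openssl_decrypt(substr($raw, 16), 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);
    return json_decode($json, true);
}

// The management server would then POST the encrypted body to a worker,
// e.g.: curl_setopt($ch, CURLOPT_POSTFIELDS, encryptPayload($task, $key));
```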

This approach has many advantages:

  • cuts traffic
  • provides a high level of security
  • makes it possible to use remote machines and VPSes as working servers
  • unloads the central server
  • relies on standard, widely used, trusted technologies (XML, HTTP, OpenSSL, cURL)
  • allows traffic compression
The working servers need some custom software to run the web service. There are scripts that automatically install the software from the repository and generate the config files.

The software used is Nginx (web server) and PHP. No database server: just the bundled SQLite is used to store the queue of letters to send and the logs of the results. It works well, since there are no concurrent requests.

Best of all, neither the database nor the web server requires any human attention.
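A worker's queue on the bundled SQLite can be sketched like this; the table layout is an assumption, since the post does not describe the real schema:

```php
<?php
// Sketch of a worker's letter queue on the bundled SQLite (schema assumed).
$db = new PDO('sqlite::memory:');        // the real worker would use a file
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE queue (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    recipient TEXT NOT NULL,
    body      TEXT NOT NULL,
    status    TEXT NOT NULL DEFAULT 'pending'
)");

// Enqueue a generated letter.
$ins = $db->prepare('INSERT INTO queue (recipient, body) VALUES (?, ?)');
$ins->execute(array('user@example.com', 'Hello!'));

// Pop the next pending letter; with a single consumer no locking is needed.
$row = $db->query("SELECT * FROM queue WHERE status = 'pending' ORDER BY id LIMIT 1")
          ->fetch(PDO::FETCH_ASSOC);
$db->prepare('UPDATE queue SET status = ? WHERE id = ?')
   ->execute(array('sent', $row['id']));
```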
Server requirements:

  • a clean CentOS 5 installation
  • correctly configured networking
  • a free port 25: no Plesk/Qmail/Postfix running
  • 256 MB RAM
  • date/time/zone and ntp synchronization recommended

The process of automating server setup and remote management turned out to be much harder than I initially expected. There are several dozen commands to execute and files to upload to prepare a system. Finding out the exact list, the exact order, and the exact file permissions to set wasn't a trivial task.

Moreover, in the real world many unexpected problems appear: /etc/resolv.conf is not configured, the clock or time zone is wrong, port 25 is occupied by some software, autoconf is not installed, and so on.

All you can do is know where the server came from and check it if you are not sure about its configuration. After all, a detailed log is written during the setup process.

When adding a server, the user submits the root password, which is not saved anywhere; instead, a public key is uploaded, and key authorization is used for further access to the worker servers. The server setup routine is implemented as a daemon running on the management server and called by a web script. Most configuration updates, like rate limit changes, domain management, or clearing the queues for a campaign canceled by the user, are done through the web service with encrypted messages.

The system is written mostly in PHP. Version 5.3 has good memory management, and the daemons are quite stable. I got no stability complaints during several months of operation and tens of millions of letters sent. At some point the DB became filled with data from old campaigns and required optimization.

The complexity of the project forced me to refactor the core modules several times, because, as usual, new requirements appeared during the work. Everything is written with an OOP design and MVC. Also, much of it is not a web application at all.

Mar 21, 2008

Mirrors project public start

Hurray! We completed the PHP scripts for web proxies (anonymizers).

The story started 3 years ago when I published the code for web proxies on phpclasses.org. The code became popular, and I received hundreds of emails.

Then I approached the idea more seriously and launched a web proxy site: browser.grik.net
It worked unexpectedly well: the popularity grew by 10% per week without any attention from me, until it overloaded the server (a pretty weak one, I should mention).
Optimization allowed the site to serve 1-2 thousand users per day with 2 GB of daily traffic. After about a year, I turned the site off for personal reasons.

Now I am restoring the project with much bigger expectations.
The code will be published as open source, so everyone can download it and set up a good web proxy site.

I will launch at least 10 sites of my own that will point to each other, creating a whole network of web proxies.
Later, other people willing to join the network will be invited.

The current plan is to reach 100 000 users per day.

The first site is launched on the old domain: http://browser.grik.net/

Thanks for your attention; I will keep posting about the progress.

Jan 7, 2008

gCurl package - PHP classes to help using CURL

I prepared and published a package that helps me write complex scripts working with HTTP requests and responses.
You can get it from my site (gCurl.tar.bz2)
or from PHP Classes once it gets published.

I used it to handle HTTP requests/responses in a number of projects, such as the "PHP Server-side browser", the "Craigslist submitter", and various spiders and crawlers, and now it will become a part of my Mirrors project.
I have updated it for PHP 5.2, commented it generously, and added a number of usage samples.

Briefly, the package implements the commonly used routines of preparing HTTP requests and parsing server responses.
For requests, it provides means to send cookies, custom headers, GET parameters, and POST data.
Response processing includes parsing HTTP headers (including multi-line headers) and cookies, representing them as an array that is convenient to work with.
The most important feature of the package is an interface that helps assign handlers for the response headers and body.
Assigning a handler for the response body allows processing the data on the fly, in chunks, as it is received from the server.
This way one can process an HTTP response body without waiting for all the data to arrive and without consuming memory for the whole body.

Most importantly, one can assign the body handler after the response headers have been processed. This allows choosing a handler depending on the Content-Type or a cookie.
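In raw cURL terms, the same idea can be sketched with cURL's callback options (this is an illustration, not gCurl's actual interface):

```php
<?php
// Streaming handlers with plain cURL: headers are parsed line by line,
// and the body callback receives chunks as they arrive, so nothing is
// buffered in memory.
$headers = array();
$bytes = 0;

// Header handler: called once per header line.
$onHeader = function ($ch, $line) use (&$headers) {
    if (strpos($line, ':') !== false) {
        list($name, $value) = explode(':', $line, 2);
        $headers[trim($name)] = trim($value);
    }
    return strlen($line);               // cURL requires the consumed byte count
};

// Body handler: process each chunk on the fly here.
$onBody = function ($ch, $chunk) use (&$bytes) {
    $bytes += strlen($chunk);
    return strlen($chunk);
};

$ch = curl_init('http://example.com/'); // placeholder URL
curl_setopt($ch, CURLOPT_HEADERFUNCTION, $onHeader);
curl_setopt($ch, CURLOPT_WRITEFUNCTION, $onBody);
// curl_exec($ch); curl_close($ch);     // run against a real server
```

After the header callback has seen the Content-Type, the code can decide how the body callback should treat the incoming chunks, which is exactly the trick described above.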

This allows me to write very flexible scripts dealing with HTTP.
Hope it will help you as well :)

Dec 26, 2007

What you don't know about USSR

I came across an interesting article about how ordinary Western people view our traditions and the USSR, the country I was born in.

It tells a nice story about the changes in culture and society that happened after the fall of the USSR.

How did the Richest Russians become Rich