Monday, January 30, 2012

.htaccess

This article covers the different purposes of .htaccess files.

I did not know .htaccess could be used to do so many things until I read the juicy material below. I thought .htaccess was just for redirections and rewrites.
Let's list them out first and describe them later. First, the usual:

1. Redirections
2. Rewrites

and then comes the cool stuff that I never thought .htaccess could do:

3. Serving Custom Error Pages
4. Restricting Access to Specific Resources
5. Block Access to Certain Entities
6. Force an IE Rendering Mode
7. Implement Caching
8. Enabling Compression

________________________________________
Redirections
RedirectMatch 301 ^/old\.html$ http://localhost/new.html

This sets the HTTP status code to 301 (moved permanently) and redirects all requests for old.html to new.html. Because we want to match the URL with a regular expression, we use RedirectMatch rather than the plain Redirect directive, which only matches a URL-path prefix. The regular expression gives us a fine degree of control to ensure only the correct URL is matched for redirection, but adds complexity to the configuration and administration of it. The pattern is matched against the URL path, which begins with a slash, and the target is given here as a full URL.
________________________________________
Rewrites

RewriteEngine on
RewriteRule ^old\.html$ new.html

In this example, we internally rewrite requests for one file to another. This is performed transparently, without changing what is displayed in the address bar. The first directive, RewriteEngine on, simply ensures that the rewrite engine is enabled.
In order to update what is displayed in the address bar of the visitor's browser, we can use the R flag at the end of the RewriteRule, e.g.

RewriteRule ^old\.html$ http://hostname/new.html [R=301]

The R flag causes an external redirection, which is why the full URL (an example URL here) to the new page is given. We can also specify the status code when using the flag. This causes the address bar to be updated in the visitor's browser.
One common use of URL rewriting is to make unsightly URLs (containing query-string data) friendlier to visitors and search engines. Let's see this in action now:

RewriteRule ^products/([^/]+)/([^/]+)/([^/]+) product.php?cat=$1&brand=$2&prod=$3

This rule will allow visitors to use a URL like products/turntables/technics/sl1210, and have it transformed into product.php?cat=turntables&brand=technics&prod=sl1210. The parentheses in between the forward slashes in the above regular expression are capturing groups – we can use each of these as $1, $2 and $3 respectively. The [^/]+ character class within the parentheses means match any character except a forward-slash 1 or more times.
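A slightly stricter variant of the rule above might look like the following sketch. The end anchor and the L and QSA flags are additions that are not part of the original rule: the trailing /?$ stops longer paths from matching, L stops further rule processing, and QSA appends any query string the visitor supplied.

```apache
RewriteEngine on

# Sketch only: anchors the pattern at both ends and allows an optional trailing slash.
# L   = stop processing further rewrite rules once this one matches
# QSA = append the visitor's original query string to the rewritten URL
RewriteRule ^products/([^/]+)/([^/]+)/([^/]+)/?$ product.php?cat=$1&brand=$2&prod=$3 [L,QSA]
```

With this in place, a request such as products/turntables/technics/sl1210?ref=home (ref being a made-up parameter) would reach product.php with cat, brand, prod and ref all set.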
In practice, URL rewriting can be (and usually is) much more complex and achieve far greater things than this. URL rewriting is better explained using entire tutorials so we won't look at them in any further detail here.
________________________________________
Serving Custom Error Pages

It's just not cool to show the default 404 page anymore. Many sites take the opportunity offered by a file not found error to inject a little humour into their site, but at the very least, people expect the 404 page to match the style and theme of the rest of the site.
Very closely related to URL rewriting, serving a custom error page instead of the standard 404 page is easy with an .htaccess file:

ErrorDocument 404 "/404.html"

That's all we need; whenever a 404 error occurs, the specified page is displayed. We can configure pages to be displayed for many other server errors too.
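Other status codes can be handled the same way. The sketch below assumes hypothetical pages named 403.html and 500.html in the site root; the last line shows that a short text message can be given instead of a page.

```apache
# Hypothetical file names; each path is relative to the site's document root
ErrorDocument 403 /403.html
ErrorDocument 500 /500.html

# A quoted string that is not a path or URL is sent as the error message itself
ErrorDocument 503 "Sorry, we are down for maintenance."
```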
________________________________________
Restricting Access to Specific Resources
Using .htaccess files, we can enable password protection of any file or directory, for all users, or based on things like domain or IP address. This is after all one of their core uses. To prevent access to an entire directory, we would simply create a new .htaccess file, containing the following code:

AuthName "Username and password required"
AuthUserFile /path/to/.htpasswd
Require valid-user
AuthType Basic
This file should then be saved into the directory we wish to protect. The AuthName directive specifies the message to display in the username/password dialog box, the AuthUserFile should be the path to the .htpasswd file. The Require directive specifies that only authenticated users may access the protected file while the AuthType is set to Basic.
To protect a specific file, we can wrap the above code in a <Files> directive, which specifies the protected file (protected.html below is a placeholder name):

<Files "protected.html">
AuthName "Username and password required"
AuthUserFile /path/to/.htpasswd
Require valid-user
AuthType Basic
</Files>

We also require an .htpasswd file for these types of authentication, which contains one username and hashed password per line, separated by a colon. This file should be saved in a directory that is not accessible from the web. Because the passwords must be stored in hashed form, the file is normally generated with a tool such as Apache's htpasswd utility or one of the many online generators.
________________________________________
Block Access to Certain Entities
Another use of .htaccess files is to quickly and easily block all requests from an IP address or user-agent. To block a specific IP address, simply add the following directives to your .htaccess file:

order allow,deny
deny from 192.168.0.1
allow from all

The order directive tells Apache in which order to evaluate the allow/deny directives. In this case, allow is evaluated first, then deny. The allow from all directive is evaluated first (even though it appears after the deny directive) and all IPs are allowed, then if the client's IP matches the one specified in the deny directive, access is forbidden. This lets everyone in except the specified IP. Note that we can also deny access to entire IP blocks by supplying a shorter IP, e.g. 192.168.
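The same idea extends to address ranges, and newer Apache releases use a different syntax for it. The sketch below uses placeholder addresses, and the IfModule test on mod_authz_core.c is an assumption used here as a common way to tell Apache 2.4 apart from older versions.

```apache
# Apache 2.2 and earlier: Order/Allow/Deny syntax
<IfModule !mod_authz_core.c>
    Order allow,deny
    Allow from all
    # A partial address blocks the whole 192.168.x.x block
    Deny from 192.168
    # CIDR notation also works
    Deny from 10.1.0.0/16
</IfModule>

# Apache 2.4 and later: Require syntax
<IfModule mod_authz_core.c>
    <RequireAll>
        Require all granted
        Require not ip 192.168
        Require not ip 10.1.0.0/16
    </RequireAll>
</IfModule>
```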
To deny requests based on user-agent, we could do this:

RewriteCond %{HTTP_USER_AGENT} ^OrangeSpider
RewriteRule ^(.*)$ http://%{REMOTE_ADDR}/ [R=301,L]

In this example, any client with an HTTP_USER_AGENT string starting with OrangeSpider (a bad bot) is redirected back to the address that it originated from. The regular expression matches any single character (.) zero or more times (*), and the rule redirects to the address held in the %{REMOTE_ADDR} server variable. The L flag we used here instructs Apache to treat this match as the last rule, so it will not process any others before performing the redirect.
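If the goal is simply to refuse the bot rather than bounce it, a more conventional alternative is the F (forbidden) flag. This is a sketch of that alternative, not the rule from the example above; the NC flag, which makes the match case-insensitive, is an addition.

```apache
RewriteEngine on

# Match the user-agent prefix regardless of case
RewriteCond %{HTTP_USER_AGENT} ^OrangeSpider [NC]
# "-" means no substitution; F returns 403 Forbidden and stops rule processing
RewriteRule ^ - [F]
```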
________________________________________
Force an IE Rendering Mode
Alongside controlling how the server responds to certain requests, we can also do things to the visitor's browser, such as forcing IE to render pages using a specific rendering engine. For example, we can use the mod_headers module, if it is present, to set the X-UA-Compatible header:
Header set X-UA-Compatible "IE=Edge"

Adding this line to an .htaccess file will instruct IE to use the highest rendering mode available. As demonstrated by HTML5 Boilerplate, we can also avoid setting this header on files that don't require it by using a <FilesMatch> directive (the extension list below is a shortened, illustrative one):

<FilesMatch "\.(js|css|gif|png|jpe?g|ico)$">
      Header unset X-UA-Compatible
</FilesMatch>

________________________________________
Implement Caching
Caching is easy to set up and can make your site load faster. 'Nuff said! By setting a far-future expires date on elements of sites that don't change very often, we can prevent the browser from requesting unchanged resources on every request.
If you're running your site through Google PageSpeed or Yahoo's YSlow and you get the message about setting far-future expiry headers, this is how you fix it:

ExpiresActive on
ExpiresByType image/gif                 "access plus 1 month"
ExpiresByType image/png                 "access plus 1 month"
ExpiresByType image/jpg                 "access plus 1 month"
ExpiresByType image/jpeg                "access plus 1 month"
ExpiresByType video/ogg                 "access plus 1 month"
ExpiresByType audio/ogg                 "access plus 1 month"
ExpiresByType video/mp4                 "access plus 1 month"
ExpiresByType video/webm                "access plus 1 month"

You can add different ExpiresByType directives for any content that is listed in the performance tool you're using, or anything else that you want to control caching on. The first directive, ExpiresActive on, simply ensures the generation of Expires headers is switched on. These directives depend on Apache having the mod_expires module loaded.
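Since these directives depend on mod_expires, one defensive option is to wrap them in an IfModule test so that a server without the module skips them instead of failing. A minimal sketch; the text/css line is an addition that is not in the list above.

```apache
# Only applied when mod_expires is loaded; otherwise the block is ignored
<IfModule mod_expires.c>
    ExpiresActive on

    ExpiresByType image/png                 "access plus 1 month"
    ExpiresByType image/jpeg                "access plus 1 month"
    ExpiresByType text/css                  "access plus 1 month"
</IfModule>
```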
________________________________________
Enabling Compression
Another warning we may get in a performance checker refers to enabling compression, and this is also something we can fix simply by updating our .htaccess file:

FilterDeclare   COMPRESS
FilterProvider  COMPRESS  DEFLATE resp=Content-Type $text/html
FilterProvider  COMPRESS  DEFLATE resp=Content-Type $text/css
FilterProvider  COMPRESS  DEFLATE resp=Content-Type $text/javascript
FilterChain     COMPRESS
FilterProtocol  COMPRESS  DEFLATE change=yes;byteranges=no

This compression scheme works on newer versions of Apache (2.1+) using the mod_filter module. It uses the DEFLATE compression algorithm to compress content based on its response content-type. In this case we specify text/html, text/css and text/javascript (which will likely be the types of files flagged in PageSpeed/YSlow anyhow). Note that the FilterProvider syntax shown is the Apache 2.2 form; Apache 2.4 changed this directive to take an expression instead.
In the above example we start out by declaring the filter we wish to use, in this case COMPRESS, using the FilterDeclare directive. We then list the content types to which this filter should apply. The FilterChain directive then instructs the server to build a filter chain based on the FilterProvider directives we have listed. The FilterProtocol directive allows us to specify options that are applied to the filter chain whenever it is run. The options we need are change=yes (the content may be changed by the filter, in this case compressed) and byteranges=no (the filter must only be applied to complete files).
On older versions of Apache, the mod_deflate module is used to configure DEFLATE compression. We have less control of how the content is filtered in this case, but the directives are simpler:

 SetOutputFilter DEFLATE
 AddOutputFilterByType DEFLATE text/html text/css text/javascript

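In this case we set the DEFLATE output filter using the SetOutputFilter directive, and then specify the content-types we'd like to compress using the AddOutputFilterByType directive. Note that SetOutputFilter DEFLATE on its own applies compression to every response, so if only the listed types should be compressed, the AddOutputFilterByType line is sufficient by itself.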
Usually your web server will use one of these modules depending on which version of Apache is in use. Generally, you will know this beforehand, but if you are creating a generic .htaccess file that you can use on a variety of sites, or which you may share with other people, and therefore you don't know which modules may be in use, you may wish to use both of the above blocks of code wrapped in <IfModule> directives so that the correct module is used and the server doesn't throw a 500 error if we try to configure a module that isn't included. You should be aware that it's also relatively common for hosts that run a large number of sites from a single box to disable compression, as there is a small CPU performance hit for compressing on the server.
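One way to lay that out is sketched below. It assumes the Apache 2.2-era directives used earlier in this post and tests for the modules by source file name (mod_filter.c, mod_deflate.c). On other Apache versions the directive syntax may differ, so treat it as a starting point to test on the target server.

```apache
# Used only when mod_filter is loaded
<IfModule mod_filter.c>
    FilterDeclare   COMPRESS
    FilterProvider  COMPRESS  DEFLATE resp=Content-Type $text/html
    FilterProvider  COMPRESS  DEFLATE resp=Content-Type $text/css
    FilterProvider  COMPRESS  DEFLATE resp=Content-Type $text/javascript
    FilterChain     COMPRESS
    FilterProtocol  COMPRESS  DEFLATE change=yes;byteranges=no
</IfModule>

# Fallback: mod_filter is absent but mod_deflate is loaded
<IfModule !mod_filter.c>
    <IfModule mod_deflate.c>
        AddOutputFilterByType DEFLATE text/html text/css text/javascript
    </IfModule>
</IfModule>
```

If neither module is present, both blocks are skipped and the site is simply served uncompressed.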

Thursday, January 26, 2012

mod-rewrite




This article covers mod_rewrite, from gekko.

mod_rewrite provides a way to modify incoming URL requests dynamically, based on regular expression (regex) rules. These rules allow us to map arbitrary URLs onto our internal URL structure in any way we like, meaning the file name requested in the URL can be mapped to anything we wish. The rules are written in an .htaccess file.



Here is an example:


.htaccess file content 

Options +FollowSymLinks
RewriteEngine on
RewriteRule ^page1\.html$ page2.html [R=301,L] 

1. The .htaccess file is where we rewrite our site's incoming URL structure.
2. The rules are read top to bottom.
3. Order therefore matters: a specific rule, such as a single page-name redirect, should come before any broader rule that would match the same URL.

First off,
Options +FollowSymLinks
This directive instructs Apache to follow symbolic links within the site.
Symbolic links are "abbreviated nicknames" for things within the site, and some hosts disable them.
Since mod_rewrite in an .htaccess file requires this option (or SymLinksIfOwnerMatch) to be enabled, we turn it on.

Second,
RewriteEngine on
The "RewriteEngine on" directive does exactly what it says.
Mod_rewrite is normally disabled by default and this directive enables the processing of subsequent mod_rewrite directives. 

Third,
RewriteRule ^page1\.html$ page2.html [R=301,L] 
In this example, we have a caret at the beginning of the pattern and a dollar sign at the end. These are regex special characters called anchors.
The caret anchors the match to the start of the string, so the match must begin with the character that immediately follows it, in this case a "p".
The dollar sign anchor tells regex that this is the end of the string we want to match.
In this example the pattern is ^page1\.html$ and the substitution is page2.html.
page1\.html and ^page1\.html$ both match the string page1.html. However, the unanchored page1\.html matches any string that contains page1.html anywhere in it (apage1.html, for example), whereas ^page1\.html$ matches only a string that is exactly equal to page1.html.
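To make the difference concrete, here are the two forms side by side as a sketch. Only one would be active at a time; the leading slash on the target is an addition, used so that the redirect target is a URL path.

```apache
RewriteEngine on

# Unanchored: would also redirect apage1.html or page1.html.bak
# RewriteRule page1\.html /page2.html [R=301,L]

# Anchored: redirects only a request for exactly page1.html
RewriteRule ^page1\.html$ /page2.html [R=301,L]
```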

In a more complex redirect, anchors (and other special regex characters) are often essential. 

And Finally,

In the above example, we also have an [R=301,L]

These are called flags in mod_rewrite and they're optional parameters. 

R=301 instructs Apache to return a 301 status code with the delivered page and, when not included as in [R,L], defaults to 302. 

mod_rewrite can return any redirect status code that you specify in the 300-399 range, and it REQUIRES the square brackets surrounding the flags.

The L flag tells Apache that this is the last rule that it needs to process.

So this is what mod_rewrite and regex do, huh?

Indeed, it helps solve many redirect problems with a cleaner, more customizable way of accessing web pages, but the crux of mod_rewrite is security.

In the above example, ^page1\.html$ will only let a request for page1.html reach page2.html, and nothing else.
Because page1 is masked as page2, this adds another layer of security that keeps the file encapsulated.
By using mod_rewrite, developers can limit access through the incoming URL structure by defining in the regex the specific data types a page accepts as parameters.
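As a sketch of that last point, the rule below only accepts a numeric id. The names item and item.php are hypothetical.

```apache
RewriteEngine on

# Only digits are accepted for the id; anything else fails to match this rule
RewriteRule ^item/([0-9]+)$ item.php?id=$1 [L]
```

A request for item/42 is rewritten to item.php?id=42, while item/abc does not match this rule at all.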


There is a whole range of other things that you can do with mod_rewrite.
More references:
http://etext.lib.virginia.edu/services/helpsheets/unix/regex.html
http://gnosis.cx/publish/programming/regular_expressions.html
http://httpd.apache.org/docs/current/rewrite/


Some Quick References:
---------------------------------------------------------
Patterns ("wildcards") are matched against a string. Special characters:

    . (full stop) - match any character
    * (asterisk) - match zero or more of the previous symbol
    + (plus) - match one or more of the previous symbol
    ? (question) - match zero or one of the previous symbol
    \? (backslash-something) - match special characters
    ^ (caret) - match the start of a string
    $ (dollar) - match the end of a string
    [set] - match any one of the symbols inside the square braces.
    [^set] - match any symbol that is NOT inside the square braces.
    (pattern) - grouping, remember what the pattern matched as a special variable
    {n,m} - from n to m times matching the previous character (m could be omitted to mean >=n times)
    (?!expression) - match anything BUT expression at the current position. Example: "^(/(?!(favicon.ico$|js/|images/)).*)" => "/fgci/$1"
    
----------------------------------------------------------

[abc]     A single character: a, b or c
[^abc]     Any single character but a, b, or c
[a-z]     Any single character in the range a-z
[a-zA-Z]     Any single character in the range a-z or A-Z
^     Start of line
$     End of line
\A     Start of string
\z     End of string
.     Any single character
\s     Any whitespace character
\S     Any non-whitespace character
\d     Any digit
\D     Any non-digit
\w     Any word character (letter, number, underscore)
\W     Any non-word character
\b     A word boundary
(...)     Capture everything enclosed
(a|b)     a or b
a?     Zero or one of a
a*     Zero or more of a
a+     One or more of a
a{3}     Exactly 3 of a
a{3,}     3 or more of a
a{3,6}     Between 3 and 6 of a
--------------------------------------------------------------------

Regular Expression  
foo  The string "foo"
^foo  "foo" at the start of a string
foo$  "foo" at the end of a string
^foo$  "foo" when it is alone in a string
[abc]  a, b, or c
[a-z]  Any lowercase letter
[^A-Z]  Any character that is not an uppercase letter
(gif|jpg)  Matches either "gif" or "jpg"
[a-z]+  One or more lowercase letters
[0-9\.\-]  Any digit, dot, or minus sign
^[a-zA-Z0-9_]{1,}$  Any word of at least one letter, number or _
([wx])([yz])  wy, wz, xy, or xz
[^A-Za-z0-9]  Any symbol (not a number or a letter)
([A-Z]{3}|[0-9]{4})  Matches three letters or four numbers
--------------------------------------------------------------------------------