Regex YouTube Parser

Over the next several days I’m going to post a series of Regex strings that can be used to parse input for different kinds of links. My examples use PHP, so you may need to tweak the Regex to work in another language.

Today’s Regex is used to parse a string for YouTube links and extract the Video_ID. Once I have the Video_ID I’m able to create a standard embed code from nearly any YouTube URL.

Supported Links:

The following Regex supports these YouTube links and embed code snippets.

Short URL : http://youtu.be/OxWMsxa5uVk
Normal URL: http://www.youtube.com/watch?v=OxWMsxa5uVk&t=28s
HTTPS URL : https://www.youtube.com/watch?v=OxWMsxa5uVk&feature=g-logo
New Embed : <iframe width="560" height="315"
            src="http://www.youtube.com/embed/OxWMsxa5uVk"
            frameborder="0" allowfullscreen></iframe>
Old Embed : <object width="1280" height="720">
            <param name="movie"
            value="http://www.youtube.com/v/OxWMsxa5uVk?version=3&hl=en_US&rel=0"></param>
            <param name="allowFullScreen" value="true"></param>
            <param name="allowscriptaccess" value="always"></param>
            <embed src="http://www.youtube.com/v/OxWMsxa5uVk?version=3&hl=en_US&rel=0"
            type="application/x-shockwave-flash" width="1280" height="720"
            allowscriptaccess="always" allowfullscreen="true"> </embed></object>

I won’t spend a lot of time explaining the Regex because I’ve broken it up across multiple lines and commented each one.

$regexstr = '~
# Match Youtube link and embed code
(?:				 # Group to match embed codes
   (?:<iframe [^>]*src=")?	 # If iframe match up to first quote of src
   |(?:				 # Group to match if older embed
      (?:<object .*>)?		 # Match opening Object tag
      (?:<param .*</param>)*     # Match all param tags
      (?:<embed [^>]*src=")?     # Match embed tag to the first quote of src
   )?				 # End older embed code group
)?				 # End embed code groups
(?:				 # Group youtube url
   https?:\/\/		         # Either http or https
   (?:[\w]+\.)*		         # Optional subdomains
   (?:               	         # Group host alternatives.
       youtu\.be/      	         # Either youtu.be,
       | youtube\.com		 # or youtube.com 
       | youtube-nocookie\.com	 # or youtube-nocookie.com
   )				 # End Host Group
   (?:\S*[^\w\-\s])?       	 # Extra stuff up to VIDEO_ID
   ([\w\-]{11})                  # $1: VIDEO_ID, 11 letters, digits, underscores or hyphens
   [^\s]*                        # Rest of the URL, up to the next whitespace
)				 # End group
"?				 # Match end quote if part of src
(?:[^>]*>)?                      # Match any extra stuff up to the closing angle bracket
(?:				 # Group to match last embed code
   </iframe>		         # Match the end of the iframe	
   |</embed></object>	         # or Match the end of the older embed
)?				 # End Group of last bit of embed code
~ix';
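
To see what the capture group ($1) holds, here is a minimal sketch that pulls the Video_ID out with preg_match. It is not the full Regex from above: the pattern is a condensed, single-line version of the URL group only, without the embed-tag handling, and the helper name extractYouTubeId is a hypothetical one chosen for this illustration.

```php
<?php
// Hypothetical helper: returns the 11-character Video_ID, or null when no YouTube URL is found.
function extractYouTubeId(string $input): ?string
{
    // Assumption: condensed form of the URL group only (embed-tag handling omitted).
    $pattern = '~https?://(?:\w+\.)*(?:youtu\.be/|youtube\.com|youtube-nocookie\.com)'
             . '(?:\S*[^\w\-\s])?([\w\-]{11})~i';

    // preg_match returns 1 on a match; $matches[1] is the first capture group.
    return preg_match($pattern, $input, $matches) === 1 ? $matches[1] : null;
}

var_dump(extractYouTubeId('http://youtu.be/OxWMsxa5uVk'));                                // string(11) "OxWMsxa5uVk"
var_dump(extractYouTubeId('https://www.youtube.com/watch?v=OxWMsxa5uVk&feature=g-logo')); // string(11) "OxWMsxa5uVk"
var_dump(extractYouTubeId('http://example.com/'));                                        // NULL
```

The "extra stuff" group is greedy, so it skips to the last non-word character that is followed by 11 ID characters. That works for the links listed above, but a URL with another 11-character parameter value after the ID could capture the wrong piece.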

Usage Example:

This example function takes an input string, uses the above Regex to capture the Video_ID ($1), and substitutes it into $iframestr to create a standard embed code.

function ParsePostYouTube($string){
  $regexstr = <<REGEX FROM ABOVE>>; // placeholder: paste the full Regex string from above here
  $iframestr = ' <p><iframe width="500" height="284" src="http://www.youtube.com/embed/$1?wmode=transparent" frameborder="0" allowfullscreen></iframe></p> ';
 
  return preg_replace($regexstr, $iframestr, $string);
}
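
For a quick test without pasting the full Regex, a runnable variant might look like the sketch below. The function name embedYouTubeLinks, the $pattern parameter and the condensed URL-only pattern are assumptions made for this illustration. The embed template is the same one used in the function above.

```php
<?php
// Hypothetical variant of the function above that receives the pattern as an argument.
function embedYouTubeLinks(string $text, string $pattern): string
{
    // Same embed template as in the post; $1 is the captured Video_ID.
    $iframestr = ' <p><iframe width="500" height="284" src="http://www.youtube.com/embed/$1?wmode=transparent" frameborder="0" allowfullscreen></iframe></p> ';

    return preg_replace($pattern, $iframestr, $text);
}

// Assumption: condensed URL-only pattern for illustration.
// Pass the full commented Regex instead to also replace whole embed codes.
$urlOnlyPattern = '~https?://(?:\w+\.)*(?:youtu\.be/|youtube\.com|youtube-nocookie\.com)'
                . '(?:\S*[^\w\-\s])?([\w\-]{11})[^\s]*~i';

echo embedYouTubeLinks('Watch this: http://youtu.be/OxWMsxa5uVk', $urlOnlyPattern), "\n";
```

Taking the pattern as an argument keeps the function short and makes it easy to swap in the full Regex once the condensed one has served its purpose.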

I began programming in C++ when I was in college. Odd for a business major, but hey, I am a Dork. After college I got a job as a System Administrator, which put me in charge of web administration. My journey as a PHP web developer had begun. Since then I have gained an in-depth knowledge of CSS, JavaScript, XML and MySQL. With changes and advances in technology I have also begun learning AJAX. I started Blue Fire Development to do freelance work in my spare time.
