11-02-05 12:59 PM
Acmlm's Board - I2 Archive - Programming - Grabbing URL's with PHP?
  
kode54
Posts: 7/7
See the PHP manual's section on the function handling functions. call_user_func() may be what you want, but create_function() may also be handy for declaring the function in one go, or even for generating function code on the fly, if you find it more efficient to generate one function that is then executed repeatedly.
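A minimal sketch of the options mentioned above, answering windwaker's question below. mycustomfunction() is the hypothetical name from that question. call_user_func() and "variable functions" ($functionname()) are both standard PHP. Note that create_function() was deprecated in PHP 7.2 and removed in PHP 8.0, so for the build-it-on-the-fly case the sketch uses an anonymous function (closure) instead.

```php
<?php
// Hypothetical function, named as in the question.
function mycustomfunction($name)
{
    return "Hello, $name";
}

$functionname = "mycustomfunction";

// 1. call_user_func() accepts any callable, including a function name string.
$a = call_user_func($functionname, "world");

// 2. Variable function: PHP calls the function whose name the string holds.
$b = $functionname("world");

// 3. Anonymous function (closure), the modern stand-in for create_function().
$greet = function ($name) {
    return "Hi, $name";
};
$c = call_user_func($greet, "world");

// If the name could come from user input, check it against an allow-list
// instead of calling whatever string was supplied.
$allowed = array("mycustomfunction");
$d = in_array($functionname, $allowed, true) && is_callable($functionname)
    ? call_user_func($functionname, "world")
    : null;

echo "$a\n$b\n$c\n";
```

The allow-list check matters because a variable function will happily call any built-in, not just your own functions.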
windwaker
Posts: 1630/1797
Well, while we're on the subject of PHP...

How do you do something like...
$functionname = "mycustomfunction";
execute_function($functionname);

Where execute_function is a function that runs a function called mycustomfunction()?
kode54
Posts: 6/7
That does not treat user input as code. The code that gets evaluated is a fixed string from the script itself; the expression only substitutes a captured piece of the data into it. I don't think double quotes in the data can fake out that code either, and even if they could, the expression cuts off at the first single or double quote character. There may yet be a vulnerability somewhere in there, but since IPB uses almost the same code, I presume it to be safe.

Just to clarify the preg_replace "e" modifier: it specifies that your replacement string, which in this case is a string constant, is a piece of PHP code to be executed once per match. I will have to look it up again, as I am not sure whether that code can "echo" or otherwise write to standard output and have that output used as the replacement text, which is eventually returned by preg_replace(). My example simply ignores the return value and stores the data in an array created outside the function. In fact, you could do what my code does and go further, looping over an array of strings with foreach() or over a SQL result set with while().
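For reference, the PHP manual describes /e as substituting the backreferences, evaluating the result as PHP code, and using the value of that evaluation as the replacement text. The modifier was deprecated in PHP 5.5 and removed in PHP 7.0; preg_replace_callback() is its replacement. A minimal sketch of the callback style, assuming a hypothetical input string and a simplified URL pattern (not the IPB-derived expression from this thread):

```php
<?php
// Hypothetical input text; the pattern below is a simplified stand-in.
$text = "Logs are at http://example.com/logs today.";

// The callback's return value replaces the match, just as the value of
// the evaluated /e code did. Anything the callback echoes would go
// straight to the output instead of into the result.
$linked = preg_replace_callback(
    '#\bhttps?://[^\s<>"\']+#i',
    function ($m) {
        // Escape the URL before putting it into HTML.
        $url = htmlspecialchars($m[0], ENT_QUOTES);
        return '<a href="' . $url . '">' . $url . '</a>';
    },
    $text
);

echo $linked, "\n";
// Logs are at <a href="http://example.com/logs">http://example.com/logs</a> today.
```

Because the callback is ordinary PHP rather than a string passed to eval, there is no replacement-string code for the matched data to interfere with.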

As I said, regex isn't the only way. You may want to experiment with the XML extension; for handling just anchor and/or img tags, it may prove faster than scanning the raw text with a regular expression.
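A minimal sketch of the parser route, aimed at the original question of telling links apart from images. It swaps in PHP's DOM extension (DOMDocument::loadHTML()), which is a separate extension from the XML extension mentioned above and is lenient about HTML that is not well-formed XML. The sample HTML is hypothetical.

```php
<?php
// Hypothetical page content; in practice this might come from
// file_get_contents() on the page being scanned.
$html = '<html><body>'
      . '<a href="http://example.com/page">A page</a>'
      . '<img src="http://example.com/pic.png" alt="pic">'
      . '<a href="/relative">Relative</a>'
      . '</body></html>';

$doc = new DOMDocument();
// Real-world HTML is often messy; keep libxml warnings out of the output.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Links: the href attribute of every <a> element.
$links = array();
foreach ($doc->getElementsByTagName('a') as $a) {
    if ($a->hasAttribute('href')) {
        $links[] = $a->getAttribute('href');
    }
}

// Images: the src attribute of every <img> element.
$images = array();
foreach ($doc->getElementsByTagName('img') as $img) {
    if ($img->hasAttribute('src')) {
        $images[] = $img->getAttribute('src');
    }
}

print_r($links);
print_r($images);
```

Relative URLs such as /relative come back exactly as written in the page, so they would still need resolving against the page's address.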
HyperLamer
Posts: 4702/8210
Treating user input as code could lead to some nasty security flaws, though. You'd need to make sure you sealed up any possible holes.
kode54
Posts: 5/7
Here's a regular expression I more or less borrowed from Invision Power Board months ago and later modified. As you can see, it uses PHP's case-insensitive (i) modifier and also its eval (e) modifier, which treats the replacement string as a piece of code to execute instead of plain replacement text. I think you can also capture the result through preg_replace()'s return value, but this works as well:

$urls = array();

preg_replace('#(^|\s|"|'."'".')((http|https|news|ftp)://\w+[^\s\(\)\[\]"'."'".']+)#ie', '\$urls[] = "\2"', $input_text);

Capture group 2 (the complete link) of every match will be pushed into $urls. Group 1 is only there so the original code could preserve the preceding whitespace, but I added various quotation marks because I encountered IRC logs where clients or servers had added quotes or other characters around links.
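Since the e modifier makes preg_replace() execute code just to collect matches, a version that only reads the matches may be easier to reason about. A minimal sketch using preg_match_all() with the same pattern minus the e modifier; the IRC-style sample input is made up for illustration.

```php
<?php
// Hypothetical input: two IRC-log style lines, one link quoted.
$input_text = "<nick> see http://example.com/a\n"
            . "<nick2> mirror: 'ftp://example.org/f.zip' (old)";

// The same expression as above, without the e modifier.
$pattern = '#(^|\s|"|\')((http|https|news|ftp)://\w+[^\s()\[\]"\']+)#i';

preg_match_all($pattern, $input_text, $matches);

// $matches[2] holds capture group 2 (the complete link) for every match.
$urls = $matches[2];
print_r($urls);
```

No code is evaluated here, so the question of whether matched text could interfere with the replacement code does not arise.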

It's probably not a good idea to run this over the same data again and again; instead, process the data once and record somewhere that you processed it. Since you're processing a web page, that should mean less complexity than what I was doing. (Thousands of lines of IRC logs, all processed from a MySQL server, every time the page is loaded... No, I won't demonstrate.)

Also, if you know what you are doing and will always be processing well-formed X/HTML content, it may be more secure to parse the pages with the XML extension and locate all the anchor tags. Then worry about the regex if and when you need full-text scanning.
windwaker
Posts: 1577/1797
:O

Thanks tons, man! This is what I've been looking for, for quite a while.
Ramsus
Posts: 70/162
I read a man page one day (a few years ago, I think) and played around with it. Just check out the perlre manual page. Regular expressions are definitely a must if you do web development, since they are the easiest tool for filtering and securing user input. Perl's taint mode even requires you to untaint all input and external variables, which is done by capturing the part you trust with a regex, before you can use them in anything that affects the outside world.
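In PHP, the same filtering idea usually takes the form of an allow-list check with preg_match(). A minimal sketch; the username rule (3 to 16 letters, digits or underscores) and the function name is_valid_username() are hypothetical examples, not something from the thread.

```php
<?php
// Hypothetical rule: usernames are 3-16 letters, digits or underscores.
function is_valid_username($name)
{
    // ^ and \z anchor the whole string. \z is used instead of $ because
    // $ also matches just before a trailing newline.
    return preg_match('/^[A-Za-z0-9_]{3,16}\z/', $name) === 1;
}

var_dump(is_valid_username("kode54"));       // bool(true)
var_dump(is_valid_username("x"));            // bool(false)
var_dump(is_valid_username("bob\n"));        // bool(false)
var_dump(is_valid_username("a'; DROP--"));   // bool(false)
```

Describing what valid input looks like, rather than trying to list every dangerous character, is what makes this approach robust.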
windwaker
Posts: 1575/1797
Ah, I see. I'd used regular expressions before, but I'd never really written them on my own. Where'd you learn to do that?
Ramsus
Posts: 64/162
Couldn't you just use a regex like /<a href="(http:\/\/.*?)">/ ?


EDIT: In case you're not familiar with using regular expressions in PHP, you'd use the following code with a buffer full of HTML (in this case, $buffer) to get an array of URLs:

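<?php

// $buffer holds the page's HTML, e.g. as returned by file_get_contents()
// # delimiters avoid escaping every slash; i = case-insensitive,
// s = lets . also match newlines inside the link text
preg_match_all('#<a href="(http://[^"]*)">(.*?)</a>#is', $buffer, $links);
// $links[0] is an array filled with all of the anchor tags
// $links[1] is an array filled just with the URLs from those tags
// $links[2] is an array filled with the names of the links
foreach ($links[1] as $link) {
    echo "URL: $link\n";
}
?>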
windwaker
Posts: 1562/1797
I'm trying to build something in PHP that'll go to another site of mine, but I need to be able to grab URLs from the page it's viewing (because I need it to differentiate between images and links).

Any ideas?
AcmlmBoard vl.ol (11-01-05)
© 2000-2005 Acmlm, Emuz, et al


