Vectorization
When working with learning algorithms, very often having a faster piece of code means that you'll make progress faster on the problem. For example, if your learning algorithm takes 20 minutes to run to completion, that means you can "try" at most one new idea every 20 minutes. If your code takes 20 hours to run, that pretty much means you can "try" only one idea a day, since that's how long you have to wait before your code finishes and you get feedback on how well it did. If you can speed up your code so that it takes only 10 hours to run, that can literally double your productivity as a researcher!

'''Vectorization''' refers to a powerful way to speed up your algorithm by taking advantage of highly optimized linear algebra packages. Specifically, numerical computing and parallel computing researchers have put decades of work into making certain numerical operations (such as matrix-matrix multiplication, matrix-matrix addition, and matrix-vector multiplication) fast. Thus, if you can express your learning algorithm in terms of these highly optimized operations, you can make your code run much faster than if you ended up implementing some of these operations yourself.

Concretely, if <math>\textstyle x \in \Re^{n+1}</math> and <math>\textstyle \theta \in \Re^{n+1}</math> are vectors and you need to compute <math>\textstyle z = \theta^Tx</math>, you can implement

<syntaxhighlight lang="matlab">
z = 0;
for i=1:(n+1),
  z = z + theta(i) * x(i);
end;
</syntaxhighlight>

or you can simply implement

<syntaxhighlight lang="matlab">
z = theta' * x;
</syntaxhighlight>

The second piece of code is not only simpler, but it will also run ''much'' faster.

More generally, a good rule of thumb for coding in Matlab and Octave is:

::'''Whenever possible, avoid using an explicit for-loop in your code.'''
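
To see the difference on your own machine, you could time both versions of the inner product above. The following is only a rough sketch: the vector length <code>n</code> and the random data are arbitrary choices made for illustration, and the measured times will depend on your hardware and on your Matlab or Octave version. The two results may also differ by a tiny rounding error, because the additions are carried out in a different order.

<syntaxhighlight lang="matlab">
% Illustrative problem size (an assumption, not taken from the tutorial).
n = 1e6;
theta = rand(n+1, 1);
x = rand(n+1, 1);

% Explicit for-loop version of z = theta' * x.
tic;
z_loop = 0;
for i = 1:(n+1)
  z_loop = z_loop + theta(i) * x(i);
end
t_loop = toc;

% Vectorized version, handled by the built-in linear algebra routines.
tic;
z_vec = theta' * x;
t_vec = toc;

fprintf('loop: %.4f s, vectorized: %.4f s\n', t_loop, t_vec);
fprintf('relative difference: %g\n', abs(z_loop - z_vec) / abs(z_vec));
</syntaxhighlight>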
Sometimes, using the vectorization methods described here can make your code significantly harder to program, read, and/or debug (though as you gain familiarity with these methods, you'll also find vectorized code easier to read). Thus, there's a tradeoff between ease of programming/debugging and running time. In particular, if you use all the vectorization tricks the first time you write your program, your code may be much harder to read, and you may miss bugs or have a harder time finding them. Thus, you may sometimes choose to first implement your algorithm without too many vectorization tricks and verify that it is working correctly (perhaps by running it on a small problem). Only after it is working should you vectorize your code one piece at a time, pausing after each piece to verify that it still computes the same result as before. At the end, you'll hopefully have a correct, debugged, and vectorized/efficient piece of code.
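
One way to carry out this check is to keep the loop version as a reference and compare it against the vectorized version on a small random problem. The sketch below is one possible approach, not a prescribed procedure: the matrix <code>X</code> (one example per column), the sizes <code>n</code> and <code>m</code>, and the tolerance <code>tol</code> are all hypothetical names and values introduced for illustration.

<syntaxhighlight lang="matlab">
% Small illustrative problem (sizes are assumptions).
n = 4;
m = 6;
theta = randn(n+1, 1);
X = randn(n+1, m);        % hypothetical data: each column is one example

% Reference implementation with explicit loops: z_ref(j) = theta' * X(:, j).
z_ref = zeros(1, m);
for j = 1:m
  for i = 1:(n+1)
    z_ref(j) = z_ref(j) + theta(i) * X(i, j);
  end
end

% Vectorized implementation of the same computation.
z_vec = theta' * X;

% Compare up to a small tolerance, since rounding may differ slightly.
tol = 1e-10;
max_diff = max(abs(z_ref - z_vec));
assert(max_diff < tol, 'vectorized result differs from reference');
</syntaxhighlight>

Repeating a check of this kind after each piece is vectorized makes it easier to tell which change introduced a discrepancy.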