var ebf: Blog // by Eduardo Fonseca

Optimizing cv::drawContours on iOS

OpenCV requires no introduction - it is amazing1. Its portability and speed make any project that requires Computer Vision a breeze. And being open source really helps.

But sometimes moving across platforms can be tricky. In our case, the problem was that the cv::drawContours call ran almost instantly on the iOS Simulator but took 20-45 seconds on the device. Not good™.

Instruments gave me a clear picture of what could be happening:

Holy… lots of std::vector<> appends inside cv::drawContours(). Time to understand this better. Let’s step back a little and review how to detect contours with OpenCV.

Look at that silhouette!

If you follow OpenCV’s documentation, you will see that detecting contours inside an image is pretty straightforward:

/// src_gray (the grayscale input), thresh and rng (a cv::RNG) are assumed to be set up already
Mat canny_output;
vector<vector<Point> > contours;
vector<Vec4i> hierarchy;

/// Detect edges using canny
Canny( src_gray, canny_output, thresh, thresh*2, 3 );
/// Find contours
findContours( canny_output, contours, hierarchy, CV_RETR_TREE, CV_CHAIN_APPROX_SIMPLE, Point(0, 0) );

/// Draw contours
Mat drawing = Mat::zeros( canny_output.size(), CV_8UC3 );
for( size_t i = 0; i < contours.size(); i++ )
{
	Scalar color = Scalar( rng.uniform(0, 255), rng.uniform(0,255), rng.uniform(0,255) );
	drawContours( drawing, contours, (int)i, color, 2, 8, hierarchy, 0, Point() );
}

Not a big deal. You feed cv::findContours() the output of cv::Canny() and then iterate over the results, drawing each contour with cv::drawContours(). Easy.

Try it on the desktop. It flies. So, what the heck is happening on iOS then?
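If you want to reproduce that desktop test, a minimal standalone harness along these lines does the job. This is just a sketch against the OpenCV 2.x API used throughout this post; the image path and the Canny threshold are placeholders, so adjust them for your own input:

#include <opencv2/opencv.hpp>
#include <vector>

using namespace cv;
using namespace std;

int main()
{
	// "input.jpg" and thresh are placeholders; any reasonably detailed photo will do.
	Mat src = imread("input.jpg");
	if (src.empty())
		return -1;

	Mat src_gray;
	cvtColor(src, src_gray, CV_BGR2GRAY);
	blur(src_gray, src_gray, Size(3, 3));

	int thresh = 100;
	RNG rng(12345);

	Mat canny_output;
	vector<vector<Point> > contours;
	vector<Vec4i> hierarchy;

	// Same pipeline as the docs sample: Canny -> findContours -> drawContours
	Canny(src_gray, canny_output, thresh, thresh * 2, 3);
	findContours(canny_output, contours, hierarchy, CV_RETR_TREE, CV_CHAIN_APPROX_SIMPLE, Point(0, 0));

	Mat drawing = Mat::zeros(canny_output.size(), CV_8UC3);
	for (size_t i = 0; i < contours.size(); i++)
	{
		Scalar color(rng.uniform(0, 255), rng.uniform(0, 255), rng.uniform(0, 255));
		drawContours(drawing, contours, (int)i, color, 2, 8, hierarchy, 0, Point());
	}

	imwrite("contours.png", drawing);
	return 0;
}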

The cost

That left me stumped. Ok, our app has 18k-20k contours to draw, far more than the simpler examples online. But it should still be crazy fast; after all, we are just drawing what we have already detected.

That’s when the beauty of Open Source comes in. I decided to check what was happening inside cv::drawContours and opened the source file for an analysis.

There I found this snippet:

std::vector<CvSeq> seq;
std::vector<CvSeqBlock> block;

...

seq.resize(last);
block.resize(last);

for( i = first; i < last; i++ )
	seq[i].first = 0;

A little further down, things get even more interesting.


    if( contourIdx >= 0 )
    {
        CV_Assert( 0 <= contourIdx && contourIdx < (int)last );
        first = contourIdx;
        last = contourIdx + 1;
    }

    for( i = first; i < last; i++ )
    {
        Mat ci = _contours.getMat((int)i);
        if( ci.empty() )
            continue;
        int npoints = ci.checkVector(2, CV_32S);
        CV_Assert( npoints > 0 );
        cvMakeSeqHeaderForArray( CV_SEQ_POLYGON, sizeof(CvSeq), sizeof(Point),
                                 ci.data, npoints, &seq[i], &block[i] );
    }

    if( hierarchy.empty() || maxLevel == 0 )
        for( i = first; i < last; i++ )
        {
            seq[i].h_next = i < last-1 ? &seq[i+1] : 0;
            seq[i].h_prev = i > first ? &seq[i-1] : 0;
        }
    else
    {
        size_t count = last - first;
        CV_Assert(hierarchy.total() == ncontours && hierarchy.type() == CV_32SC4 );
        const Vec4i* h = hierarchy.ptr<Vec4i>();

        if( count == ncontours )
        {
            for( i = first; i < last; i++ )
            {
                int h_next = h[i][0], h_prev = h[i][1],
                    v_next = h[i][2], v_prev = h[i][3];
                seq[i].h_next = (size_t)h_next < count ? &seq[h_next] : 0;
                seq[i].h_prev = (size_t)h_prev < count ? &seq[h_prev] : 0;
                seq[i].v_next = (size_t)v_next < count ? &seq[v_next] : 0;
                seq[i].v_prev = (size_t)v_prev < count ? &seq[v_prev] : 0;
            }
        }
        else
        {
            int child = h[first][2];
            if( child >= 0 )
            {
                addChildContour(_contours, ncontours, h, child, seq, block);
                seq[first].v_next = &seq[child];
            }
        }
    }

In a nutshell, every single cv::drawContours() call resizes and initializes those seq and block vectors for the whole contours vector, even when it only needs to draw one contour. Call it once per contour, like the docs sample does, and the work grows quadratically. That’s not very efficient when you have 18k contours to process on a mobile device. So, as an experiment, I tried to optimize a bit on my side.
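To see the cost for yourself before changing anything, a rough timing of the naive loop is enough. This is just a sketch using OpenCV’s own tick counter; the drawing, contours, hierarchy and rng arguments are the same variables as in the docs sample above:

#include <iostream>
#include <opencv2/opencv.hpp>

using namespace cv;

// Times the docs-style loop: one drawContours() call per contour.
// Each call allocates and initializes CvSeq headers for *all* contours,
// so the total work grows roughly with the square of contours.size().
static void timeNaiveLoop(Mat& drawing,
                          const std::vector<std::vector<Point> >& contours,
                          const std::vector<Vec4i>& hierarchy,
                          RNG& rng)
{
	double t0 = (double)getTickCount();

	for (size_t i = 0; i < contours.size(); i++)
	{
		Scalar color(rng.uniform(0, 255), rng.uniform(0, 255), rng.uniform(0, 255));
		drawContours(drawing, contours, (int)i, color, 2, 8, hierarchy, 0, Point());
	}

	double ms = ((double)getTickCount() - t0) / getTickFrequency() * 1000.0;
	std::cout << "drawContours loop: " << ms << " ms for "
	          << contours.size() << " contours" << std::endl;
}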

Hey. Look at that.

Since we only need a top-level pass over the contour hierarchy, this did the trick for me:

int idx = 0; // start at the first top-level contour
for (; idx >= 0; idx = hierarchy[idx][0]) { // hierarchy[idx][0] is the index of the next sibling
	// Wrap just this one contour (and its hierarchy entry), so each
	// drawContours() call only has a single element to set up.
	std::vector<std::vector<cv::Point> > contour;
	contour.push_back(contours[idx]);
	std::vector<Vec4i> plainHierarchy;
	plainHierarchy.push_back(hierarchy[idx]);

	Scalar color = Scalar( rng.uniform(0, 255), rng.uniform(0,255), rng.uniform(0,255) );

	drawContours(drawing, contour, 0, color, 2, 8, plainHierarchy, INT_MAX);
}

And the results:

Touché. From 20-45 seconds to ~200ms. Mission accomplished!
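One closing note on the design choice: the per-contour random colors are the only reason cv::drawContours() has to be called in a loop at all. If a single color is enough for your use case, one call with contourIdx set to -1 draws everything and avoids the repeated setup altogether, roughly:

// contourIdx = -1 tells drawContours() to render every contour in one call,
// so the CvSeq headers are only built once.
drawContours(drawing, contours, -1, Scalar(0, 255, 0), 2, 8, hierarchy, INT_MAX);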

  1. Yes, I know Objective-C++ is weird. But, as a C++ geek, I find std::vectors of NSIntegers fun :)